
Upload: jeric-salomon

Post on 30-Mar-2015


Basic Network Troubleshooting

If a computer is unable to connect to a network or see other computers on a network, it may be necessary to troubleshoot the network. A network may not work because of any of the reasons below:

1. Network card not connected properly.
2. Bad network card drivers or software settings.
3. Firewall preventing computers from seeing each other.
4. Connection-related issues.
5. Bad network hardware.


Solution

Because of the large variety of network configurations, operating systems, and setups, not all of the information below may apply to your network or operating system. If your computer is connected to a company or large network, or you are not the administrator of the network, and you are unable to resolve your issues after following the recommendations below, contact the network administrator or company representative.

Note: If you are being prompted for a network password and do not know the password, Computer Hope is unable to assist users with obtaining a new password or finding out the old one.

Verify connections / LEDs

Verify that the network cable is properly connected to the back of the computer. In addition, when checking the connection of the network cable, ensure that the LEDs on the network card are properly illuminated. For example, a network card with a solid green LED or light usually indicates that the card is either connected or receiving a signal. Note: Generally, when the green light is flashing, this is an indication of data being sent or received.

If, however, the card does not have any lights or has orange or red lights, it is possible that the card is bad, the card is not connected properly, or the card is not receiving a signal from the network.

If you are on a small or local network and have the capability of checking a hub or switch, verify that the cables are properly connected and that the hub or switch has power.

Adapter resources

If this is a new network card being installed into the computer, ensure that the card's resources are properly set and/or are not conflicting with any hardware in the computer.

Users of Windows 95, 98, ME, 2000, or XP: verify that Device Manager has no conflicts or errors. Additional help and information about Device Manager and resources can be found on our Device Manager page.

Adapter functionality

Verify that the network card is capable of pinging or seeing itself by using the ping command. Windows / MS-DOS users: ping the computer from an MS-DOS prompt. Unix / Linux variant users: ping the computer from the shell.

To ping the card or the localhost, type either

ping 127.0.0.1

or

ping localhost

This should show a listing of replies. Because 127.0.0.1 is the loopback address, replies confirm that the TCP/IP software on the computer is working. If you receive an error, or if the transmission failed, it is likely that TCP/IP is not set up correctly, that the network card is not physically installed into the computer correctly, or that the card is bad.
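The loopback check can also be scripted. The following is a minimal sketch in Python, not part of the original guide; the helper names build_ping_command and loopback_ok are hypothetical. It assumes the system ping utility is on the PATH and relies on the echo-count option being -n on Windows and -c on Unix / Linux variants.

```python
import platform
import subprocess


def build_ping_command(host, count=4, system=None):
    """Return the ping command line for the given operating system."""
    system = system or platform.system()
    # Windows ping takes -n for the number of echo requests; Unix-like ping takes -c.
    count_flag = "-n" if system == "Windows" else "-c"
    return ["ping", count_flag, str(count), host]


def loopback_ok(host="127.0.0.1"):
    """Run ping against the loopback address; True if ping exits with status 0."""
    try:
        result = subprocess.run(build_ping_command(host),
                                capture_output=True, text=True, timeout=15)
    except (FileNotFoundError, subprocess.TimeoutExpired):
        return False  # ping utility missing, or it did not finish in time
    return result.returncode == 0


if __name__ == "__main__":
    print("loopback reachable:", loopback_ok())
```

An exit status of 0 is only a coarse signal; reading the replies printed by ping remains the more reliable check.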

Protocol

Verify that the correct protocols are installed on the computer. Most networks today use TCP/IP, but may also use or require IPX/SPX and NetBEUI.

Additional information and help with installing and reinstalling a network protocol can be found in document CH000470.

When the TCP/IP protocol is installed, unless a DHCP server or other computer assigns the IP address, the user must specify an IP address as well as a subnet mask. To do this, follow the instructions below.

1. Click Start, Settings, Control Panel.
2. Double-click the Network icon.
3. Within the Configuration tab, double-click the TCP/IP protocol icon. Note: Do not click the PPP or Dial-Up adapter; click the network card adapter.
4. In the TCP/IP properties, click the IP Address tab.
5. Select the option to specify an IP address.
6. Enter the IP address and subnet mask. An example of such an address could be:

   IP Address: 102.55.92.1
   Subnet Mask: 255.255.255.192

7. When specifying these values, the computers on the network must all have the same subnet mask and each must have a different IP address. For example, when using the above values on one computer, you would want to use an IP address of 102.55.92.2 on another computer and then specify the same subnet mask.
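To see why both computers need the same subnet mask, it can help to compute the network each address belongs to. The sketch below uses Python's standard ipaddress module and takes the example values from the steps above to be 102.55.92.1 and 102.55.92.2 with mask 255.255.255.192. The function name same_subnet and the third address (102.55.92.70) are illustrative assumptions, not part of the original guide.

```python
import ipaddress


def same_subnet(ip_a, ip_b, netmask):
    """True if both addresses fall inside the same network for the given mask."""
    net_a = ipaddress.ip_interface(f"{ip_a}/{netmask}").network
    net_b = ipaddress.ip_interface(f"{ip_b}/{netmask}").network
    return net_a == net_b


mask = "255.255.255.192"
# Both addresses are in 102.55.92.0/26, so the computers can talk directly.
print(same_subnet("102.55.92.1", "102.55.92.2", mask))   # True
# 102.55.92.70 belongs to 102.55.92.64/26, a different subnet under this mask.
print(same_subnet("102.55.92.1", "102.55.92.70", mask))  # False
```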

Firewall

If your computer network uses a firewall, ensure that all required ports are open. If possible, temporarily close the firewall software program or disconnect the computer from the firewall to verify that it is not causing the problem.
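One rough way to check whether a required port is reachable is to attempt a TCP connection to it. The sketch below uses only Python's standard socket module; the function name port_is_open is hypothetical, and the host and port you test depend on your own network. A failed connection does not prove that a firewall is responsible, since the service itself may simply not be running.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Try a TCP connection; True if it is accepted before the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For example, port_is_open("102.55.92.2", 139) would test a file-sharing port on the second example computer; both values here are illustrative.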

Additional time

In some cases, it may take a computer some additional time to detect or see the network. If, after booting the computer, you are unable to see the network, give the computer 2-3 minutes to detect the network. Windows users may also want to try pressing the F5 (refresh) key when in Network Neighborhood to refresh the network connections and possibly detect the network.

Additional troubleshooting

If, after following or verifying the above recommendations, you are still unable to connect to or see the network, attempt one or more of the recommendations below.

If you have installed or are using TCP/IP as your protocol, you can attempt to ping another computer's IP address to verify that the computer is able to send and receive data. To do this, Windows or MS-DOS users must be at a prompt, and Linux / Unix variant users must open or be at a shell.

Once at the prompt, assuming that the address of the computer you wish to ping is 102.55.92.2, you would type

ping 102.55.92.2

If you receive a response back from this address (and it is a different computer), this demonstrates that the computer is communicating over the network. If you are still unable to connect to or see the network, it is possible that other issues are present.

Another method of determining network issues is to use the tracert command if you are an MS-DOS or Windows user, or the traceroute command if you are a Linux / Unix variant user. To use this command, you must be at the command prompt or shell.

Once at the prompt, assuming that the address is again 102.55.92.2, type

tracert 102.55.92.2

or

traceroute 102.55.92.2

This should begin listing the hops between the computer and network devices. When the connection fails, determine which device is causing the issue by reviewing the traceroute listing.
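Reviewing a long listing by hand can be tedious, so here is a small sketch that scans saved tracert or traceroute output for the first hop where every probe timed out. It is written in Python and is only an illustration: the function name first_silent_hop is hypothetical, the sample listing is made up, and the parsing assumes the common layout in which each hop line starts with the hop number and timed-out probes are shown as asterisks.

```python
import re


def first_silent_hop(output):
    """Return the number of the first hop whose probes all timed out, or None."""
    for line in output.splitlines():
        match = re.match(r"\s*(\d+)\s+(.*)", line)
        if not match:
            continue  # header or blank line
        hop, rest = int(match.group(1)), match.group(2)
        # Drop the Windows "Request timed out." suffix, then see if only asterisks remain.
        probes = rest.replace("Request timed out.", "").split()
        if probes and all(p == "*" for p in probes):
            return hop
    return None


# Made-up listing for illustration only.
SAMPLE = """\
traceroute to 102.55.92.2 (102.55.92.2), 30 hops max
 1  192.168.1.1  1.2 ms  1.0 ms  1.1 ms
 2  10.0.0.1  5.4 ms  5.1 ms  5.3 ms
 3  * * *
 4  * * *
"""

print(first_silent_hop(SAMPLE))  # 3
```

A silent hop is a hint, not proof: some routers are configured not to answer these probes even though they forward traffic normally.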

Network Troubleshooting Overview

These sections introduce you to the concepts and practice of network troubleshooting:

• Introduction to Network Troubleshooting
• Network Troubleshooting Framework
• Troubleshooting Strategy

Network troubleshooting means recognizing and diagnosing networking problems with the goal of keeping your network running optimally. As a network administrator, your primary concern is maintaining connectivity of all devices (a process often called fault management). You also continually evaluate and improve your network's performance. Because serious networking problems can sometimes begin as performance problems, paying attention to performance can help you address issues before they become serious.

About Connectivity Problems

Connectivity problems occur when end stations cannot communicate with other areas of your local area network (LAN) or wide area network (WAN). Using management tools, you can often fix a connectivity problem before users even notice it. Connectivity problems include:

• Loss of connectivity - When users cannot access areas of your network, your organization's effectiveness is impaired. Immediately correct any connectivity breaks.
• Intermittent connectivity - Although users have access to network resources some of the time, they are still facing periods of downtime. Intermittent connectivity problems can indicate that your network is on the verge of a major break. If connectivity is erratic, investigate the problem immediately.
• Timeout problems - Timeouts cause loss of connectivity, but are often associated with poor network performance.

About Performance Problems

Your network has performance problems when it is not operating as effectively as it should. For example, response times may be slow, the network may not be as reliable as usual, and users may be complaining that it takes them longer to do their work. Some performance problems are intermittent, such as instances of duplicate addresses. Other problems can indicate a growing strain on your network, such as consistently high utilization rates.

If you regularly examine your network for performance problems, you can extend the usefulness of your existing network configuration and plan network enhancements, instead of waiting for a performance problem to adversely affect the users' productivity.

Solving Connectivity and Performance Problems

When you troubleshoot your network, you employ tools and knowledge already at your disposal. With an in-depth understanding of your network, you can use network software tools, such as Ping, and network devices, such as Analyzers, to locate problems, and then make corrections, such as swapping equipment or reconfiguring segments, based on your analysis.

Transcend® provides another set of tools for network troubleshooting. These tools have graphical user interfaces that make managing and troubleshooting your network easier. With Transcend Applications, you can:

• Baseline your network's normal status to use as a basis for comparison when the network operates abnormally
• Precisely monitor network events
• Be notified immediately of critical problems on your network, such as a device losing connectivity
• Establish alert thresholds to warn you of potential problems that you can correct before they affect your network
• Resolve problems by disabling ports or reconfiguring devices
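The idea of baselining and alert thresholds can be illustrated independently of any product. The sketch below is in Python and does not show how Transcend works internally: the function names, the sample utilization figures, and the rule of flagging values more than three standard deviations above the normal mean are all assumptions chosen for illustration.

```python
from statistics import mean, stdev


def build_baseline(samples):
    """Summarise normal-state samples (e.g. utilization %) as mean and standard deviation."""
    return {"mean": mean(samples), "stdev": stdev(samples)}


def exceeds_baseline(value, baseline, k=3.0):
    """True if value is more than k standard deviations above the baseline mean."""
    return value > baseline["mean"] + k * baseline["stdev"]


# Made-up daily utilization percentages recorded while the network behaved normally.
normal_week = [22.0, 25.0, 24.0, 21.0, 23.0, 26.0, 24.0]
baseline = build_baseline(normal_week)

print(exceeds_baseline(27.0, baseline))  # within normal variation
print(exceeds_baseline(60.0, baseline))  # well above the baseline
```

In practice you would pick the threshold rule to suit your own network's normal variation.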

Network Troubleshooting Framework

The International Organization for Standardization (ISO) Open Systems Interconnection (OSI) reference model is the foundation of all network communications. This seven-layer structure provides a clear picture of how network communications work.

Protocols (rules) govern communications between the layers of a single system and among several systems. In this way, devices made by different manufacturers or using different designs can use different protocols and still communicate.

By understanding how network troubleshooting fits into the framework of the OSI model, you can identify at what layer problems are located and which type of troubleshooting tools to use. For example, unreliable packet delivery can be caused by a problem with the transmission media or with a router configuration. If you are receiving high rates of FCS Errors and Alignment Errors, which you can monitor with Status Watch, then the problem is probably located at the physical layer and not the network layer. Figure 1 shows how to troubleshoot the layers of the OSI model.

Table 5 describes the data that the network management tools can collect as it relates to the OSI model layers.

Table 5. Network Data and the OSI Model Layers

Layer                        Data Collected                          Transcend NCS Tool Used
---------------------------  --------------------------------------  -------------------------------------
Application, Presentation,   Protocol information and other Remote   • LANsentry Manager
Session, Transport           Monitoring (RMON) and RMON2 data        • Traffix Manager (for more detail)

Network                      Routing information                     • Status Watch
                                                                     • LANsentry Manager (for more detail)
                                                                     • Traffix Manager (for more detail)

Data Link                    Traffic counts and other packet         • Status Watch
                             breakdowns                              • LANsentry Manager (for more detail)

Physical                     Error counts                            • Status Watch

Figure 1. OSI Reference Model and Network Troubleshooting

Troubleshooting Strategy

How do you know when you are having a network problem? The answer to this question depends on your site's network configuration and on your network's normal behavior. See Knowing Your Network for more information.

If you notice changes on your network, ask the following questions:

• Is the change expected or unusual?
• Has this event ever occurred before?
• Does the change involve a device or network path for which you already have a backup solution in place?
• Does the change interfere with vital network operations?
• Does the change affect one or many devices or network paths?

After you have an idea of how the change is affecting your network, you can categorize it as critical or noncritical. Both of these categories need resolution (except for changes that are one-time occurrences); the difference between the categories is the time that you have to fix the problem.

By using a strategy for network troubleshooting, you can approach a problem methodically and resolve it with minimal disruption to network users. It is also important to have an accurate and detailed map of your current network environment. Beyond that, a good approach to problem resolution is:

• Recognizing Symptoms
• Understanding the Problem
• Identifying and Testing the Cause of the Problem
• Solving the Problem

Recognizing Symptoms

The first step to resolving any problem is to identify and interpret the symptoms. You may discover network problems in several ways. Users may complain that the network seems slow or that they cannot connect to a server. You may pass your network management station and notice that a node icon is red. Your beeper may go off and display the message "WAN connection down."

User Comments

Although you can often solve networking problems before users notice a change in their environment, you invariably get feedback from your users about how the network is running, such as:

• They cannot print.
• They cannot access the application server.
• It takes them much longer to copy files across the network than it usually does.
• They cannot log on to a remote server.
• When they send e-mail to another site, they get a routing error message.
• Their system freezes whenever they try to Telnet.

Network Management Software Alerts

Network management software, as described in Your Network Troubleshooting Toolbox, can alert you to areas of your network that need attention. For example:

• The application displays red (Warning) icons.
• Your weekly Top-N utilization report (which indicates the 10 ports with the highest utilization rates) shows that one port is experiencing much higher utilization levels than normal.
• You receive an e-mail message from your network management station that the threshold for broadcast and multicast packets has been exceeded.

These signs usually provide additional information about the problem, allowing you to focus on the right area.
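A Top-N utilization report is essentially a sort followed by a comparison with what is normal. The following Python sketch shows the idea with made-up figures; the function names, the port labels, and the rule of flagging a port at twice its usual utilization are illustrative assumptions rather than the behavior of any particular management application.

```python
def top_n_ports(utilization, n=10):
    """Return the n (port, utilization %) pairs with the highest utilization."""
    return sorted(utilization.items(), key=lambda item: item[1], reverse=True)[:n]


def unusual_ports(current, usual, factor=2.0):
    """Ports whose current utilization is at least `factor` times their usual level."""
    return sorted(port for port, value in current.items()
                  if port in usual and value >= factor * usual[port])


# Made-up weekly averages, in percent.
usual = {"1/1": 10.0, "1/2": 20.0, "1/3": 28.0}
current = {"1/1": 12.0, "1/2": 81.0, "1/3": 30.0}

print(top_n_ports(current, n=2))      # [('1/2', 81.0), ('1/3', 30.0)]
print(unusual_ports(current, usual))  # ['1/2']
```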

Analyzing Symptoms

When a symptom occurs, ask yourself these types of questions to narrow the location of the problem and to get more data for analysis:

• To what degree is the network not acting normally (for example, does it now take one minute to perform a task that normally takes five seconds)?
• On what subnetwork is the user located?
• Is the user trying to reach a server, end station, or printer on the same subnetwork or on a different subnetwork?
• Are many users complaining that the network is operating slowly or that a specific network application is operating slowly?
• Are many users reporting network logon failures?
• Are the problems intermittent? For example, some files may print with no problems, while other printing attempts generate error messages, make users lose their connections, and cause systems to freeze.

Understanding the Problem

Networks are designed to move data from a transmitting device to a receiving device. When communication becomes problematic, you must determine why data are not traveling as expected and then find a solution. The two most common causes for data not moving reliably from source to destination are:

• The physical connection breaks (that is, a cable is unplugged or broken).
• A network device is not working properly and cannot send or receive some or all data.

Network management software can easily locate and report a physical connection break (layer 1 problem). It is more difficult to determine why a network device is not working as expected, which is often related to a layer 2 or a layer 3 problem.

To determine why a network device is not working properly, look first for:

• Valid service - Is the device configured properly for the type of service it is supposed to provide? For example, has Quality of Service (QoS), which is the definition of the transmission parameters, been established?
• Restricted access - Is an end station supposed to be able to connect with a specific device, or is that connection restricted? For example, is a firewall set up that prevents that device from accessing certain network resources?
• Correct configuration - Is there a misconfiguration of IP address, subnet mask, gateway, or broadcast address? Network problems are commonly caused by misconfiguration of newly connected or configured devices. See Manager-to-Agent Communication for more information.
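Some of the configuration mistakes named in the last item can be caught with simple arithmetic on the addresses. The sketch below uses Python's standard ipaddress module; the function name check_ip_config and the sample addresses are hypothetical, and the checks assume an ordinary subnet (not a /31 or /32 network).

```python
import ipaddress


def check_ip_config(address, netmask, gateway, broadcast=None):
    """Return a list of human-readable problems found in a host's IP settings."""
    problems = []
    iface = ipaddress.ip_interface(f"{address}/{netmask}")
    network = iface.network
    # A host must not be given the network address or the broadcast address.
    if iface.ip in (network.network_address, network.broadcast_address):
        problems.append("host address is the network or broadcast address")
    # The default gateway has to be reachable directly, so it must be in the subnet.
    if ipaddress.ip_address(gateway) not in network:
        problems.append("gateway is outside the host's subnet")
    if broadcast is not None and ipaddress.ip_address(broadcast) != network.broadcast_address:
        problems.append(f"broadcast address should be {network.broadcast_address}")
    return problems


print(check_ip_config("192.168.1.10", "255.255.255.0", "192.168.1.1", "192.168.1.255"))
print(check_ip_config("192.168.1.10", "255.255.255.0", "192.168.2.1"))
```

An empty list means none of these particular checks failed; it does not rule out other misconfigurations.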

Identifying and Testing the Cause of the Problem

After you develop a theory about the cause of the problem, test your theory. The test must conclusively prove or disprove your theory.

Two general rules of troubleshooting are:

• If you cannot reproduce a problem, then no problem exists unless it happens again on its own.
• If the problem is intermittent and you cannot replicate it, you can configure your network management software to catch the event in progress.

For example, with LANsentry Manager, you can set alarms and automatic packet capture filters to monitor your network and inform you when the problem occurs again. See Configuring Transcend NCS for more information.

Although network management tools can provide a great deal of information about problems and their general location, you may still need to swap equipment or replace components of your network until you locate the exact trouble spot.

After you test your theory, either fix the problem as described in Solving the Problem or develop another theory.

Sample Problem Analysis

This section illustrates the analysis phase of a typical troubleshooting incident.

On your network, a user cannot access the mail server. You need to establish two areas of information:

• What you know - In this case, the user's workstation cannot communicate with the mail server.
• What you do not know and need to test -
  • Can the workstation communicate with the network at all, or is the problem limited to communication with the server? Test by sending a Ping or by connecting to other devices.
  • Is the workstation the only device that is unable to communicate with the server, or do other workstations have the same problem? Test connectivity at other workstations.
  • If other workstations cannot communicate with the server, can they communicate with other network devices? Again, test the connectivity.

The analysis process follows these steps:

1. Can the workstation communicate with any other device on the subnetwork?
   • If no, then go to step 2.
   • If yes, determine if only the server is unreachable.
     • If only the server cannot be reached, this suggests a server problem. Confirm by doing step 2.
     • If other devices cannot be reached, this suggests a connectivity problem in the network. Confirm by doing step 3.

2. Can other workstations communicate with the server?
   • If no, then most likely it is a server problem. Go to step 3.
   • If yes, then the problem is that the workstation is not communicating with the subnetwork. (This situation can be caused by workstation issues or a network issue with that specific station.)

3. Can other workstations communicate with other network devices?
   • If no, then the problem is likely a network problem.
   • If yes, the problem is likely a server problem.
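The three steps above can be sketched as a small decision function. This Python version is one possible reading of the steps, not a definitive procedure: the function name diagnose and its four yes/no parameters are assumptions introduced here, and each answer still has to come from an actual connectivity test.

```python
def diagnose(ws_reaches_subnet, only_server_unreachable,
             others_reach_server, others_reach_devices):
    """Walk the three questions and return 'workstation', 'server', or 'network'."""

    def step3():
        # Step 3: can other workstations communicate with other network devices?
        return "server" if others_reach_devices else "network"

    # Step 1: the workstation reaches the subnetwork, but devices other than the
    # server are unreachable too. That points at the network; confirm with step 3.
    if ws_reaches_subnet and not only_server_unreachable:
        return step3()

    # Step 2: can other workstations communicate with the server?
    if others_reach_server:
        return "workstation"
    return step3()


# The workstation reaches nothing, while other workstations reach the server.
print(diagnose(False, False, True, True))   # workstation
# Only the server is unreachable, for everyone, and the rest of the network works.
print(diagnose(True, True, False, True))    # server
```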

When you determine whether the problem is with the server, subnetwork, or workstation, you can further analyze the problem as follows:

• For a problem with the server - Examine whether the server is running, if it is properly connected to the network, and if it is configured appropriately.
• For a problem with the subnetwork - Examine any device on the path between the users and the server.
• For a problem with the workstation - Examine whether the workstation can access other network resources and if it is configured to communicate with that particular server.

Equipment for Testing

To help identify and test the cause of problems, have available:

• A laptop computer that is loaded with a terminal emulator, TCP/IP stack, TFTP server, CD-ROM drive (to read the online documentation), and some key network management applications, such as LANsentry® Manager. With the laptop computer, you can plug into any subnetwork to gather and analyze data about the segment.
• A spare managed hub to swap for any hub that does not have management. Swapping in a managed hub allows you to quickly spot which port is generating the errors.
• A single port probe to insert in the network if you are having a problem where you do not have management capability.
• Console cables for each type of connector, labeled and stored in a secure place.

Solving the Problem

Many device or network problems are straightforward to resolve, but others yield misleading symptoms. If one solution does not work, continue with another.

A solution often involves:

• Upgrading software or hardware (for example, upgrading to a new version of agent software or installing Gigabit Ethernet devices)
• Balancing your network load by analyzing:
  • What users communicate with which servers
  • What the user traffic levels are in different segments
  Based on these findings, you can decide how to redistribute network traffic.
• Adding segments to your LAN (for example, adding a new switch where utilization is continually high)
• Replacing faulty equipment (for example, replacing a module that has port problems or replacing a network card that has a faulty jabber protection mechanism)

To help solve problems, have available:

• Spare hardware equipment (such as modules and power supplies), especially for your critical devices
• A recent backup of your device configurations to reload if flash memory gets corrupted (which can sometimes happen due to a power outage)

Your Network Troubleshooting Toolbox

A robust network troubleshooting toolbox consists of items (such as network management applications, hardware devices, and other software) to recognize, diagnose, and solve networking problems. It contains:

• Transcend Applications
• Network Management Platforms
• 3Com SmartAgent Embedded Software
• Other Commonly Used Tools

Transcend management software is optimized for managing 3Com devices and their attached networks. However, some applications, such as LANsentry Manager, can manage any vendor's networking equipment that complies with the Remote Monitoring (RMON) Management Information Base (MIB).

This section describes these Transcend applications, which you can use to troubleshoot your network:

• Transcend Central
• Status Watch
• Address Tracker
• LANsentry Manager
• Traffix Manager
• Device View

Transcend Central

Start with Transcend Central, which is an asset management and device grouping application, to understand what your network consists of and to control the Transcend NCS network management troubleshooting tools. Transcend Central is available as both a native Windows application and a Java application that you can access using a Web browser.

Using Transcend Central for troubleshooting, you can:

• Display an inventory of device, module, and port information.
• Group devices to make your troubleshooting tasks easier. By managing a collection of devices, you can simultaneously perform the same tasks on each device in a group and locate physical or logical problems on your network.
• Launch Transcend NCS applications, including some of your primary Transcend NCS troubleshooting tools:
  • Status Watch, which includes Web Reporter (from the Java version)
  • Address Tracker
  • LANsentry Manager
  • Traffix Manager
  • Device View

Status Watch

The Status Watch applications manage 3Com devices and their attached networks. Status Watch applications primarily poll for MIB-II data. Status Watch is a performance-monitoring application that allows you to monitor the operational status of your network devices and quickly identify any problems that require your attention. It works in conjunction with Web Reporter.

Web Reporter

Web Reporter is a data-reporting application that runs in a World Wide Web (WWW) browser. It generates reports from data that Status Watch collects, allowing you to compare network statistics against a baseline.

Address Tracker

Address Tracker is an address collection and discovery application that:

• Polls managed devices for all MAC addresses
• Polls managed devices and routers for IP addresses to perform MAC-to-IP address translation
• Uses Device View to disable troublesome ports

LANsentry Manager

LANsentry Manager is a set of integrated applications that displays and explores the real-time and historical data that RMON-compliant devices (probes) on the network capture. LANsentry Manager uses SNMP polling to gather RMON and RMON2 data from the probes.

Use LANsentry Manager to:

• Monitor current performance of network segments
• See trends over time
• Spot signs of current problems
• Configure alarms to monitor for specific events
• Capture packets and display their contents

LANsentry Manager works with any device (from 3Com or other vendors) that supports the RMON MIB or the RMON2 MIB.

Traffix Manager

Traffix Manager is a performance-monitoring application that provides information about layer 2 (RMON) and layer 3 conversations between nodes. It helps you to assess traffic patterns on your network. Traffix Manager:

• Monitors all the stations that the RMON2-compliant probes encounter on your network
• Captures and stores RMON and RMON2 data for your network's protocols and applications
• Displays traffic between stations in user-defined views of the network
• Graphs current or historical data on the devices selected
• Delivers reports for user-specified stations and time periods as PostScript to your printer or as HTML to your Web server
• Launches LANsentry Manager tools for in-depth analysis of a station or a conversation between stations

You can use Traffix Manager to:

• Know your network - Understand overall flow patterns and interactions between systems, and determine how your network is really being used at the application level.
• Optimize your network - Gain an insight into traffic and application usage trends to help you optimize the use and placement of current network resources and make wise decisions about capacity planning and network growth.

Traffix Manager works with any device (from 3Com or other vendors) that supports the RMON2 MIB.

Device View

The Device View application is a device configuration tool. When you troubleshoot your network, you can use Device View to determine or change a device's configuration. You can also use Device View to look at a device's statistics and to set alarms.

Device View manages only 3Com devices.

Network Management Platforms

As part of your troubleshooting toolbox, your network management platform is the first place to go to view the overall health of your network. With the platform, you can understand the logical configuration of your network and configure views of your network to understand how devices work together and the role that they play in the users' work. The network management platform that supports your Transcend software installation can provide valuable troubleshooting tools. Transcend runs on several platforms within the NT and UNIX environments.

The platform discovers the devices. Transcend imports that information from the platform to populate the core database. Unless you rediscover the network, you must update the platform manually.

Using this device database, a map displays the graphical representation of your network. Each device on your network appears as a symbol (icon) on the map. You can configure views of your network to show devices on the same subnetworks or floors.

You can monitor network performance and diagnose network performance and connectivity problems. You can also:

• Take a snapshot of your network in its normal state. The snapshot records the state of your network at a particular instant. If you later have network performance problems, you can compare the current state of your network to the snapshot.
• Quickly determine the connectivity status of a device by noting the color of its map symbol. Red usually means that communication with a device has ceased.
• Diagnose connectivity problems by determining whether two devices can communicate. If they can communicate, then examine the route between the devices, the number of packets that were sent and lost, and the roundtrip time between the two devices.
• Manage MIB information (for example, collecting and storing MIB data for trend analysis and graphing) using MIB queries. Transcend compiles MIBs and allows you to navigate up and down the MIB Tree to retrieve MIB objects from devices. You can set thresholds for MIB data and generate events when a threshold is exceeded.
• Configure the software to act on certain events. The Event Categories window informs you of any unexpected events (which arrive in the form of traps).

For more information, see the documentation that is shipped with your software.

3Com SmartAgent Embedded Software

Traditional Simple Network Management Protocol (SNMP) management places the burden of collecting network management information on the management station In this traditional model software agents collect information about throughput record errors or packet overflows and measure performance based on established thresholds Through a polling process agents pass this information to a centralized network management station whenever they receive an SNMP query Management applications then make the data useful and alert the user if there are problems on the device

As a useful companion to traditional network management methods, 3Com's SmartAgent® technology places management intelligence into the software agent that runs within a 3Com device. This scalable solution reduces the amount of computational load on the management station and helps minimize management-related network traffic.

SmartAgent software, which uses the RMON MIB, is self-monitoring, collecting and analyzing its own statistical, analytical, and diagnostic data. In this way, you can conduct network management by exception - that is, you are notified only if a problem occurs. Management by exception is unlike traditional SNMP management, in which the management software collects all data from the device through polling.
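The idea of management by exception can be illustrated with a minimal Python sketch. The `Agent` class, the statistic name, and the threshold value are hypothetical and are not part of SmartAgent software or any 3Com interface; the sketch only shows an agent that evaluates its own samples and queues a report when a threshold is exceeded.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    """Hypothetical self-monitoring agent (illustration only)."""
    thresholds: dict  # statistic name -> maximum acceptable value
    events: list = field(default_factory=list)

    def record(self, name, value):
        """Evaluate a sample locally; queue an event only on an exception."""
        limit = self.thresholds.get(name)
        if limit is not None and value > limit:
            self.events.append((name, value, limit))

    def pending_events(self):
        """Return and clear the events to send to the management station."""
        events, self.events = self.events, []
        return events


agent = Agent(thresholds={"error_frames_per_min": 50})
agent.record("error_frames_per_min", 12)  # normal sample: nothing is queued
agent.record("error_frames_per_min", 80)  # exceeds the threshold: one event
```

In a polling model, the management station would fetch every sample; here only the second sample would produce traffic toward the station.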

SmartAgent software works autonomously and reports to the network management station whenever an exceptional network event occurs. The software can also take direct action without involving the management station. Devices that contain SmartAgent software may be able to:

• Perform broadcast throttling to minimize the flow of broadcast traffic on your network

• Monitor the ratio of good frames to bad frames

• Switch a resilient link pair to the standby path if the primary path corrupts frames

• Report if traffic on vital segments drops below minimum usage levels

• Disable a port for five seconds to clear problems and then automatically reconnect it

The Transcend NCS applications LANsentry Manager and Traffix Manager make the RMON data that the SmartAgent software collects more usable by summarizing and correlating important information.

Other Commonly Used Tools

These commonly used tools can also help you troubleshoot your network:

• Network software, such as Ping, Telnet, and FTP and TFTP. You can use these applications to troubleshoot, configure, and upgrade your system.

• Network monitoring devices, such as Analyzers and Probes

• Tools, such as Cable Testers, for working on physical problems

Ping

Packet Internet Groper (Ping) allows you to quickly verify the connectivity of your network devices. Ping attempts to transmit a packet from one device to a station on the network and listens for the response to ensure that it was correctly received. You can validate connections on the parts of your network by pinging different devices.

• A successful response indicates that a valid network path exists between your station and the remote host and that the remote host is active.

• Slower response times than normal can indicate that the path is congested or obstructed.

• A failed response indicates that a connection is broken somewhere; use the message to help locate the problem. See Tips on Interpreting Ping Messages.

Strategies for Using Ping

Follow these strategies for using Ping:

• Ping devices when your network is operating normally so that you have a performance baseline for comparison. See Identifying Your Network's Normal Behavior for more information.

• Ping by IP address when:

    • You want to test devices on different subnetworks. This method allows you to Ping your network segments in an organized way, rather than having to remember all the hostnames and locations.

    • Your Domain Name System (DNS) server is down and your system cannot look up host names properly. You can Ping with IP addresses even if you cannot access hostname information.

• Ping by hostname when you want to identify DNS server problems.

• To troubleshoot problems that involve large packet sizes, Ping the remote host repeatedly, increasing the packet size each time.

• To determine if a link is erratic, perform a continuous Ping (using ping -s on UNIX), which indicates the time that it takes the device to respond to each Ping.

• To determine a route taken to a destination, use the trace route function (tracert).

• Consider creating a Ping script that periodically sends a Ping to all necessary networking devices. If a Ping failure message is received, the script can perform some action to notify you of the problem, such as paging you.

• Use the Ping functions of your network management platform. For example, in your HP OpenView map, select a device and click the right mouse button to gain access to ping functions.
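The Ping script suggested in the list above might look like the following Python sketch. It assumes that a system `ping` command is on the path and that it accepts `-c` (UNIX-like systems) or `-n` (Windows) for the packet count; the function names, the host list, and the use of `print` as the notification action are illustrative assumptions, not part of any 3Com tool.

```python
import platform
import subprocess


def ping_once(host, timeout_s=2):
    """Send one echo request with the system ping command; True on success."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    try:
        result = subprocess.run(
            ["ping", count_flag, "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, OSError):
        # No answer in time, or the ping command could not be started.
        return False
    return result.returncode == 0


def check_devices(hosts, ping=ping_once, notify=print):
    """Ping each host and call notify() once for every host that fails."""
    failed = [host for host in hosts if not ping(host)]
    for host in failed:
        notify(f"Ping failed: {host}")
    return failed
```

To run the check periodically, a scheduler such as cron could call `check_devices` with your own list of addresses, and `notify` could be replaced with a function that sends a page or an e-mail message.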

Tips on Interpreting Ping Messages

Use the following ping failure messages to troubleshoot problems:

No reply from <destination>

Indicates that the destination routes are available but that there is a problem with the destination itself.

<destination> is unreachable

Indicates that your system does not know how to get to the destination. This message means either that routing information to a different subnetwork is unavailable or that a device on the same subnetwork is down.

ICMP host unreachable from gateway

Indicates that your system can transmit to the target address using a gateway but that the gateway cannot forward the packet properly because either a device is misconfigured or the gateway is not operating.
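The three messages above can be turned into a small lookup, sketched here in Python. The exact wording that a Ping implementation prints varies by operating system, so the substrings matched below are assumptions taken from the message names in this section, and the `diagnose` function is hypothetical.

```python
# (substring to look for, advice) pairs; the gateway message is checked first
# because it is the most specific of the three.
DIAGNOSES = [
    ("icmp host unreachable from gateway",
     "The gateway cannot forward the packet: look for a misconfigured "
     "device or a gateway that is not operating."),
    ("no reply from",
     "Routes are available: look for a problem with the destination itself."),
    ("is unreachable",
     "No known path: check routing information to the other subnetwork, "
     "or look for a device that is down on the same subnetwork."),
]


def diagnose(message):
    """Map a ping failure message to the troubleshooting hint for it."""
    text = message.lower()
    for pattern, advice in DIAGNOSES:
        if pattern in text:
            return advice
    return "Unrecognized message."
```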

Telnet

Telnet, which is a login and terminal emulation program for Transmission Control Protocol/Internet Protocol (TCP/IP) networks, is a common way to communicate with an individual device. You log in to the device (a remote host) and use that remote device as if it were a local terminal.

If you have established an out-of-band Telnet connection with a device, you can use Telnet to communicate with that device even if the network is unavailable. This feature makes Telnet one of the most frequently used network troubleshooting tools. Usually, all device statistics and configuration capabilities are accessible by using Telnet to connect to the device's console. For more information about setting up an out-of-band connection, see Using Telnet, Serial Line, and Modem Connections.

You can invoke the Telnet application on your local system and set up a link to a Telnet process that is running on a remote host. You can then run a program that is located on a remote host as if you were working at the remote system.

FTP and TFTP

Most network devices support either the File Transfer Protocol (FTP) or the Trivial File Transfer Protocol (TFTP) for downloading updates of system software. Updating system software is often the solution to networking problems that are related to agent problems. Also, new software features may help correct a networking problem.

FTP provides flexibility and security for file transfer by:

• Accepting many file formats, such as ASCII and binary

• Using data compression

• Providing Read and Write access so that you can display, create, and delete files and directories

• Providing password protection

TFTP is a simple version of FTP that does not list directories or require passwords. TFTP only transfers files to and from a remote server.

Analyzers

An analyzer, which is often called a Sniffer, is a network device that collects network data on the segment to which it is attached, a process called packet capturing. Software on the device analyzes this data, which is a process referred to as protocol analysis. Most analyzers can interpret different types of protocol traffic, such as TCP/IP, AppleTalk, and Banyan VINES traffic.

You usually use analyzers for reactive troubleshooting - when you see a problem somewhere on your network, you attach an analyzer to capture and interpret the data from that area. Analyzers are particularly helpful for identifying intermittent problems. For example, if your network backbone has experienced moments of instability that prevent users from logging on to the network, you can attach an analyzer to the backbone to capture the intermittent problems when they happen again.

Probes

Like an analyzer, a probe is a network device that collects network data. Depending on its type, a probe can collect data from multiple segments simultaneously. It stores the collected data and transfers the data to an analysis site when requested. Unlike an analyzer, a probe does not interpret data.

A probe can be either a stand-alone device or an agent in a network device. The Transcend Enterprise Monitor 500 series and the SuperStack® II Monitor series are stand-alone RMON probes. LANsentry Manager and Traffix Manager use data from probes that comply with the RMON MIB or the RMON2 MIB.

You can use a probe daily to determine the health of your network. The Transcend NCS applications can interpret and report this data, alerting you to possible problems so that you can proactively manage your network. For example, an RMON2 probe can help you to analyze traffic patterns on your network. Use this data to make decisions about reconfiguring devices and end stations as needed.

Cable Testers

Cable testers examine the electrical characteristics of the wiring. They are most commonly used to ensure that building wiring and cables meet Category 5, 4, and 3 standards. For example, network technologies such as Fast Ethernet require the cabling to meet Category 5 requirements. Testers are also used to find defective and broken wiring in a building.

Knowing Your Network

You can better troubleshoot the problems on your network by:

• Knowing Your Network's Configuration

• Identifying Your Network's Normal Behavior

Knowing Your Network's Configuration

Part of understanding your network is knowing its physical and logical configuration. You should know:

• Which devices are on your network

• How the devices are configured

• Which devices are attached to the backbone

• Which devices connect your network to the outside world (WAN)

To keep track of your network's configuration, gather the following information:

• Site Network Map

• Logical Connections

• Device Configuration Information

• Other Important Data About Your Network

This data, when kept up-to-date, is extremely helpful for locating information when you experience network or device problems.

Site Network Map

A network map helps you to:

• Know exactly where each device is physically located

• Easily identify the users and applications that are affected by a problem

• Systematically search each part of your network for problems

You can create a network map using any drawing or flow chart application. Store your network map online. In addition, make sure that you always have a current version on paper in case you cannot access the online version. Figure 8 shows an example of a network map of 3Com devices.

Figure 8 Example of a Site Network Map

Consider including the following information on your network map:

• Location of important devices and workgroups (by floor, building, or area)

• Location of the network backbone, data center, and wiring closets, as appropriate for your network

• Location of your network management stations

• Location and type of remote connections

• IP subnetwork addresses for all managed switches and hubs

• Other subnetwork addresses, such as Novell IPX and AppleTalk, if appropriate for your network

• Type of media (by actual name, such as 10BASE-T, or by grouping, such as Ethernet), which you can show with callouts, colors, line weights, or line styles

• Virtual workgroups, which you can show with colors or shaded areas

• Redundant links, which you can indicate with gray or dashed lines

• Types of network applications that are used in different areas of your network

• Types of end stations that are connected to the switches and hubs

Logical Connections

With the advent of virtual LANs (VLANs), you need to know how your devices are connected logically as well as physically. For example, if you have connected two devices through the same physical switch, you can assume that they can communicate with each other. However, the devices can be in separate VLANs that restrict their communication.

Knowing the setup of your VLANs can help you to quickly narrow the scope of a problem to a VLAN instead of to a network connection.

The Transcend NCS application Enterprise VLAN Manager allows you to view the logical makeup of your network. Depending on the complexity of your network and VLAN configurations, you can use colors to show the VLANs graphically on your network map.

Device Configuration Information

Maintain online and paper copies of device configuration information. Make sure that all online data is stored with your site's regular data backup. If your site does not have a backup system, copy the information onto a backup disc (CD, Zip disk, and the like) and store it offsite.

Follow these guidelines for saving configuration information:

• Because the easiest way to recover a device's configuration is to use FTP or TFTP, save the configuration settings of each device that supports this method of uploading.

• For other devices, Telnet in and save the session (which contains configuration details) to a file. If you cannot print the configuration of a device, then create a quick rebuild guide that explains the quickest way to configure the device from a fresh install.

• For devices that store information to diskette, store this data as part of your site's regular backup.

• For routers and other important devices with text configuration files, store this data online in a revision control system. Keep the most recent version on paper. Keep previous versions.

• For PCs, keep a recovery disk for each type of PC. For any device that you use as a server, store all startup scripts and copies of registries.

Other Important Data About Your Network

For a complete picture of your network, have the following information available:

• All passwords - Store passwords in a safe place. Keep previous passwords in case you restore a device to a previous software version and need to use the old password that was valid for that version.

• Device inventory - The inventory allows you to see the device type, IP address, ports, MAC addresses, and attached devices at a glance. Software tools such as Transcend Central can help you keep track of the 3Com devices on your network. Using Transcend Central, you can group devices by type and location and have this information on hand for troubleshooting.

• MAC address-to-port number list - If your hubs or switches are not managed, you must keep a list of the MAC addresses that correlate to the ports on your hubs and switches. Generate and keep a paper copy of this list, which is crucial for deciphering captured packets using Address Tracker.

• Log book - Document your interactions, no matter how trivial, with each device that is critical to your network's operation (that is, routers, remote access devices, and security servers). For example, document that you noticed a fan making noise one morning. Your note may help you to identify why a device is over temperature a week later (because the fan stopped working).

• Change control - Maintain a change control system for all critical systems. Permanently store change control records.

• Contact details - Store, online and on paper, the details of all support contracts, support numbers, engineer details, and telephone and fax numbers.

• LANsentry Reporter - Use LANsentry Reporter to generate reports from the database.
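For the MAC address-to-port number list mentioned above, an online copy could be kept in a form as simple as the following Python sketch. The device names, port numbers, and MAC addresses are made up for illustration, and the helper names are hypothetical; the sketch only shows normalizing the address formats that different tools print so that a lookup from a captured packet succeeds.

```python
def normalize_mac(mac):
    """Return a MAC address as lowercase, colon-separated hex pairs."""
    digits = "".join(ch for ch in mac.lower() if ch in "0123456789abcdef")
    if len(digits) != 12:
        raise ValueError(f"not a MAC address: {mac!r}")
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))


# Hypothetical entries: MAC address -> (device name, port number)
MAC_TO_PORT = {
    normalize_mac("00-A0-24-1B-2C-3D"): ("hub-floor2", 7),
    normalize_mac("00a0.244f.5e6f"): ("switch-lab", 12),
}


def locate(mac):
    """Look up the device and port for a MAC address seen in a capture."""
    return MAC_TO_PORT.get(normalize_mac(mac))
```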

Identifying Your Network's Normal Behavior

By monitoring your network over a long period, you begin to understand its normal behavior. You begin to see a pattern in the traffic flow, such as which servers are typically accessed, when peak usage times occur, and so on. If you are familiar with your network when it is fully operational, you can be more effective at troubleshooting problems that arise.

Baselining Your Network

You can use a baseline analysis, which is an important indicator of overall network health, to identify problems. A baseline can serve as a useful reference of network traffic during normal operation, which you can then compare to captured network traffic while you troubleshoot network problems. A baseline analysis speeds the process of isolating network problems.

By running tests on a healthy network, you compile normal data to compare against the results that you get when your network is in trouble. For example, Ping each node to discover how long it typically takes you to receive a response from devices on your network.
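As a rough illustration of comparing against a baseline, the following Python sketch summarizes Ping round-trip times collected on a healthy network and flags later measurements that are well above them. The function names and the rule of three standard deviations above the mean are assumptions chosen for the example; they are not taken from the Transcend applications.

```python
from statistics import mean, stdev


def build_baseline(samples_ms):
    """Summarize round-trip times (ms) collected on a healthy network."""
    return {"mean": mean(samples_ms), "stdev": stdev(samples_ms)}


def is_abnormal(rtt_ms, baseline, k=3.0):
    """Flag a round-trip time more than k standard deviations above the mean."""
    return rtt_ms > baseline["mean"] + k * baseline["stdev"]
```

A stricter or looser value of `k` may suit a particular network better; the point is only that the comparison needs numbers recorded while the network was operating normally.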

Applications such as Status Watch, Address Tracker, LANsentry Manager, and Traffix Manager allow you to collect days and weeks of data and set a baseline for comparison. Through the reporting mechanisms in the following list, you can continuously assess the data from your network and ensure that its performance is optimal:

• Web Reporter generates daily or weekly reports from data collected by Status Watch.

• Traffix Manager generates weekly reports from collected data and calculates the baselines for you. Set up Utilization History and Error History reports with data resolution set to Weekly.

• LANsentry Manager History View generates daily utilization graphs, which are sampled every 30 minutes for each day over one week. Use these graphs to calculate your network baselines manually.

Identifying Background Noise

Know your network's background noise so that you can recognize real data flow. For example, one evening after everyone is gone, when no backups are running and most nodes are on, analyze the traffic on your network using the Traffix Manager application. The traffic that you see is mostly broadcast and multicast packets. Any errors that you see are the result of faulty devices (trace). This traffic is the background noise of your network - traffic that occurs for little value. If background noise is high, redesign your network.

FDDI Connectivity

Use these sections to identify and correct connectivity errors on an FDDI ring:

• FDDI Connectivity Overview

• Monitoring FDDI Connections

• Making Your FDDI Connections More Resilient

Fiber Distributed Data Interface (FDDI), which is a self-correcting technology, automatically corrects ring faults to maintain connectivity throughout most of the network. However, you should monitor your FDDI connections for wrapped rings and other problems with ring connectivity.

Understanding the Problem

As shown in Figure 9, in a thru FDDI LAN, no stations on the trunk ring have a Configuration State (SMTConfigurationState) of Wrap or Isolated. However, users who complain about network performance may have lost connectivity to other stations on the network because the FDDI network is wrapped or segmented.

Figure 9 Thru Ring

Wrapped ring

By monitoring the Peer Wrap Condition, you can see when the Configuration State changes. In a wrapped ring (Figure 10), two stations on the LAN are in a wrapped Configuration State. This condition may or may not affect the connectivity of certain stations. Although operational, your network may have a cabling problem or a problem with a link.

Figure 10 Wrapped LAN

Segmented ring

In a segmented ring (Figure 11), more than two stations are wrapped on the trunk ring. Although this mode of operation is a valid FDDI LAN configuration, your LAN is probably experiencing a degraded or degrading condition.

Figure 11 Segmented Ring

When a network connection has excessively high link errors, Station Management (SMT) shuts down the connection and tries to reconnect again. A dual-attachment trunk ring station with an A or B connection that is shut down is one of the wrap points in the network. See Making Your FDDI Connections More Resilient for information about keeping a dual-attachment station connection from wrapping.

Isolated station

Sometimes a network wraps a particular station out of the ring. Stations on either side of a problem station can be wrapped. This effectively isolates the station or links that have problems, as shown in Figure 12.

Figure 12 Wrapped Ring with Isolated Station

If a ring was already wrapped when a network wraps a station out of the ring, then a segmented ring results, as shown in Figure 13.

Figure 13 Segmented Ring with Isolated Stations

Twisted ring

In a twisted ring, an A port is connected to an A port and a B port is connected to a B port, instead of the normal A-to-B connections. A twisted ring, which always has two twist points (stations), can exist in either a Thru or Wrap state. You can monitor the Twisted Ring Condition and Undesired Connection Attempt Event for evidence of twisted ring and other connection problems.
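The Thru, wrapped, and segmented states described in this section can be summarized in a short Python sketch. It assumes that you have already collected the Configuration State of each trunk ring station (for example, from SMTConfigurationState) into a dictionary; the function name and the "Indeterminate" result for combinations that this section does not describe are assumptions made for the example.

```python
def classify_ring(config_states):
    """Classify a trunk ring from each station's Configuration State.

    config_states maps a station name to "Thru", "Wrap", or "Isolated".
    """
    wrapped = sum(1 for state in config_states.values() if state == "Wrap")
    isolated = sum(1 for state in config_states.values() if state == "Isolated")
    if wrapped == 0 and isolated == 0:
        return "Thru"       # no station reports Wrap or Isolated
    if wrapped == 2:
        return "Wrapped"    # exactly two wrap points
    if wrapped > 2:
        return "Segmented"  # more than two stations are wrapped
    return "Indeterminate"  # a combination that the text does not cover
```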

Identifying the Problem

To identify the problem, follow this process:

1. At the FDDI LAN level, verify that your network is operating.

If the network is operating, the FDDI ring may be segmented, and therefore an FDDI station or an Ethernet station on an Ethernet link may have lost connectivity to other nodes on the network.

2. Determine if a ring is in a Thru, Wrap, or Segmented state.

If the FDDI ring is segmented or wrapped, look for a problem with a link somewhere in the network or for a nonfunctioning node on your trunk ring. If the ring is operating and is not segmented, or if it is segmented but you still have connectivity to the stations in question, move to a more specific level in your network.

See Monitoring FDDI Connections for more information.

3. Determine if the poorly performing station is an Ethernet or FDDI station.

If the problem is an FDDI station, find out if it is congested (that is, if the station is so busy that it cannot accept all the network traffic that is directed to it) by determining its Bandwidth Utilization. Also determine if the station has a high frame error rate by looking at the FDDI Ring Errors.

If the problem is an Ethernet station, look for congestion by examining Ethernet Packet Loss and Bandwidth Utilization.

Solving the Problem

Identify the station that is causing the disconnection and take the appropriate steps:

• If the disconnection is caused by a wrapped ring, then fix the hardware or cabling problem at that station.

• If the station is congested, you have a device problem rather than a network problem. For example, if the congested station is a file server and every other machine on the network is retrieving and saving files using that server, consider upgrading your server or adding additional servers to the network. A variety of devices from different vendors may be communicating on an FDDI or Ethernet network; some are faster and more capable, and some are slower and more prone to congestion.

• If the station is an Ethernet station that is attached to an Ethernet segment, reevaluate the setup of your Ethernet network and make some changes to improve its performance.

You can also make FDDI connections more resilient by implementing dual homing or installing an Optical Bypass Unit (OBU) where FDDI connections are prone to fail. See Making Your FDDI Connections More Resilient for more information.

Monitoring FDDI Connections

Monitor your FDDI devices for Warning or Critical alerts in the FDDI Status tool.

Status Watch

Use Status Watch to identify these FDDI connectivity errors:

• Peer Wrap Condition

• Twisted Ring Condition

• Undesired Connection Attempt Event

Follow these steps:

1. In the Device area, select the device that is located where you suspect an FDDI ring connectivity problem.

2. Monitor the FDDI Status tool for the currently selected device.

Here are some pointers for monitoring:

• If the Peer Wrap Configuration State variable is Isolated, the device is not connected to the FDDI trunk ring. If you intend the device to remain isolated, this indication is not a serious condition. However, if the device is supposed to be connected on a trunk ring, a serious problem may exist. The device is no longer transmitting packets to the larger trunk ring.

• If the Peer Wrap flag (SMTPeerWrapFlag) is set, the device is one of the wrap points. The cause of the wrapped ring is somewhere in the portion of the network between the two stations that report the peer wrap condition.

Making Your FDDI Connections More Resilient

When devices are removed from an FDDI ring, there is a break in the fiber path that causes the ring to wrap until the ring is made whole again. To prevent the break in the FDDI connection, you can implement dual homing or install an Optical Bypass Unit (OBU).

Implementing Dual Homing

When the operation of a dual attachment node is critical to your network, dual homing adds reliability by providing a backup connection if the primary link fails. Because a dual attachment station (DAS) has two attachments to the FDDI ring (A-to-M and B-to-M), you can use one of them as a standby link if the active link fails. Using dual homing, only one of the two attachments is active at a time. In this sense, a DAS acts as if it is a single attachment station (SAS) by using its A port as the standby link.

Through SMT, a DAS can be dual homed to the same concentrator or, more commonly, to two concentrators. This arrangement provides a more stable trunk ring of concentrators. If one concentrator fails, the DAS enables the standby link to another concentrator to become the active link. See Figure 14.
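The failover behavior described above can be pictured with a toy Python model. The `DualHomedStation` class is hypothetical and does not represent SMT or any device interface; it assumes, as the text states, that the A port is the standby link, so the B port starts as the active link.

```python
class DualHomedStation:
    """Toy model of a dual-homed DAS: port B active, port A standby."""

    def __init__(self):
        self.link_up = {"A": True, "B": True}
        self.active = "B"

    def link_failed(self, port):
        """Mark a link as down; fail over if it was the active link."""
        self.link_up[port] = False
        if port == self.active:
            other = "A" if port == "B" else "B"
            # The standby becomes active only if it is still usable.
            self.active = other if self.link_up[other] else None
        return self.active
```

If both links fail, `active` becomes `None`, which corresponds to the station losing its connection to the ring.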

If the station is a dual path or dual path/dual MAC station, you can configure the dual-homed station in one of two ways:

• With both links active

• With one link active and one connection withheld as a backup, only becoming active when one link fails

Figure 14 Dual Homing Configuration

Installing an Optical Bypass Unit

You can insert an Optical Bypass Unit (OBU) into the FDDI ring as if it were a node and then plug your device into it. To use an OBU, your device needs an optical bypass interface. This interface lets the bypass know whether your device is still on the ring or not. See Figure 15.

If your device is removed or if it fails, the bypass unit diverts the optical path away from your device, keeping the ring whole. You can use a bypass on devices that are prone to failure or are likely to be removed often, such as diagnostic equipment.

Figure 15 Optical Bypass Unit Configuration

FDDI Connectivity Reference

This section explains terms that are relevant to FDDI connectivity and provides additional conceptual and problem analysis detail.

Peer Wrap Condition

A Peer Wrap (wrapped ring) condition occurs when a dual-attachment station detects a fault (often a lost connection) and reconfigures the network by wrapping the dual trunk rings to form a single ring. Normally, the two stations that are adjacent to the fault wrap to maintain full connectivity. However, if a second fault occurs before the first is repaired, the network partitions itself into two or more rings, and stations lose connectivity.

When a station reports a Peer Wrap condition, locate and repair the problem that caused the station to wrap the rings. Potential causes include:

• Faulty FDDI port hardware

• Faulty cables or connectors

• Unplugged connectors

• Powered-down stations

You can expect to find the cause of the problem between the two stations that report the Peer Wrap condition.

Twisted Ring Condition

A Twisted Ring condition occurs when certain undesirable connection types exist. See Table 7 for more information. Although similar to the Undesired Connection Attempt, the Twisted Ring condition provides specific Station Management (SMT) and port information for diagnosis.

Undesired Connection Attempt Event

An Undesired Connection Attempt event occurs when a port tries to connect to another port of a type that may result in an undesirable network topology. Whether the connection attempt is successful depends on the current setting of the station's connection policies.

Table 7 lists connections that the FDDI standard defines as undesirable. The managed devices may or may not permit these connections, depending on their FDDI station configurations.

Table 7 Undesirable Connection Types

Connection Type    Reason That the Connection Is Undesirable

A-A                Twisted primary and secondary rings

A-S                A wrapped ring

B-B                Twisted primary and secondary rings

B-S                A wrapped ring

S-A                A wrapped ring

S-B                A wrapped ring

M-M                A tree of rings topology (illegal connection)

Table 8 lists FDDI connections that create valid topologies.

Table 8 Valid Connection Types

Connection Type    Reason That the Connection Is Valid

A-B                A normal trunk peer connection

A-M                A tree connection with possible redundancy. In a single MAC node, Port B has precedence (by default) for connecting to a Port M.

B-A                A normal trunk ring peer connection

B-M                A tree connection with possible redundancy. In a single MAC node, Port B has precedence (by default) for connecting to a Port M.

S-S                A single ring of two slave stations

S-M                A normal tree connection

M-A                A tree connection that provides possible redundancy

M-B                A tree connection that provides possible redundancy

M-S                A normal tree connection
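Table 7 and Table 8 together cover all 16 pairings of the A, B, S, and M port types, so they can be expressed as a lookup. The following Python sketch is only an illustration; the names `UNDESIRABLE`, `VALID`, and `check_connection` are hypothetical, and whether a device actually permits an undesirable connection still depends on its connection policies.

```python
# Entries taken from Table 7 (local port type, remote port type) -> reason
UNDESIRABLE = {
    ("A", "A"): "Twisted primary and secondary rings",
    ("A", "S"): "A wrapped ring",
    ("B", "B"): "Twisted primary and secondary rings",
    ("B", "S"): "A wrapped ring",
    ("S", "A"): "A wrapped ring",
    ("S", "B"): "A wrapped ring",
    ("M", "M"): "A tree of rings topology (illegal connection)",
}

# Entries taken from Table 8
VALID = {
    ("A", "B"), ("A", "M"), ("B", "A"), ("B", "M"), ("S", "S"),
    ("S", "M"), ("M", "A"), ("M", "B"), ("M", "S"),
}


def check_connection(local, remote):
    """Return ("undesirable", reason) or ("valid", None) for a port pairing."""
    pair = (local.upper(), remote.upper())
    if pair in UNDESIRABLE:
        return ("undesirable", UNDESIRABLE[pair])
    if pair in VALID:
        return ("valid", None)
    raise ValueError(f"unknown port types: {local}-{remote}")
```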


Disabling the Offending Interface

Because broadcast storms can ultimately cause your whole network to become unavailable, take action immediately to disable the offending interface. You can enable the interface again after you have corrected the problem.

Address Tracker

Use Address Tracker to locate the interface that is causing the broadcast storm. Use Device View to disable the port.

Follow these steps:

1. In the Find Address window, enter the address of the interface that seems to be receiving the broadcast traffic.

You can copy the MAC or IP address from the Status Watch report and paste it into Address Tracker's Enter the Address You Want to Find field.

2. Click Find Now.

Search displays the device name.

3. Use Transcend Central to launch Device View and disable the port.

Disabling the port stops the broadcast storm before it interferes with all vital network traffic. You can re-enable this interface using Device View or the device's console later.

Correcting Spanning Tree Misconfigurations

Spanning Tree does not cause broadcast storms, but a loop in your Spanning Tree topology can create data that looks like a storm. A loop can occur in your topology if:

• Someone disables Spanning Tree on a port

• You set up your Spanning Tree configuration incorrectly

Device View

Use Device View to disable any Spanning Tree port that has a repeater attached to it and to correct Spanning Tree misconfigurations.

To correct Spanning Tree misconfigurations, use Device View to disable Spanning Tree Protocol (STP) for a port on a SuperStack® II Switch 1000, Switch 3000, Switch 3000 10/100, Switch 9000SX, Desktop Switch, LinkBuilder® FMS II Bridge/Management Module, or CoreBuilder 6000.

To disable the STP port state for a port on a SuperStack II switch:

1. Select a port and click the right mouse button.

2. From the shortcut menu, select Configure.

3. In the Port section, click the STP tab.

4. From the STP Port State list box, select Disabled.

5. Click Apply.

To disable the STP port state for a port on a LinkBuilder FMS II Bridge/Management Module:

1. Double-click the module.

2. From the shortcut menu, select Configure Bridge.

3. In the Port section, click the STP tab.

Broadcast Storms Reference

This section explains terms that are relevant to broadcast storms and provides additional conceptual and problem analysis detail.

Broadcast Packets

Broadcast packets, which are a normal part of network operation, are transmitted by a device to a broadcast address. For example, IP networks use broadcasts to resolve network addresses using Address Resolution Protocol (ARP). IPX networks use a large number of broadcast packets to operate most effectively.

Problems arise when broadcast packets endlessly propagate throughout the network, which increases the traffic volume on your network and the CPU time that each host spends processing and discarding unwanted broadcast packets.

Multicast Packets

Multicast packets, which are a normal part of network operation, are transmitted by a device to a multicast group address. Hosts that want to receive the packets indicate that they want to be members of the multicast group, and then multicast packets are distributed to that group. For example, multicast packets support the Spanning Tree Protocol. Multicast applications and underlying multicast protocols control multimedia traffic and shield hosts from processing unnecessary broadcast traffic. However, multicast traffic can also cause storms that saturate your network.

Duplicate Addresses

Use these sections to identify and correct problems caused by duplicate MAC and IP addresses:

• Duplicate Addresses Overview
• Finding Duplicate MAC Addresses
• Finding Duplicate IP Addresses

See Duplicate Addresses Reference for additional conceptual and problem analysis detail.

Networks sometimes generate duplicate MAC and IP addresses. Because duplicate addresses can cause problems with packet delivery, resolve them as soon as possible.

Understanding the Problem

Duplicate MAC addresses are caused by data link layer problems with Fiber Distributed Data Interface (FDDI) media and the passing of tokens on the FDDI ring. Duplicate IP addresses are caused by network layer problems. See these sections for more information about causes of duplicate addresses:

• Duplicate MAC Addresses
• Duplicate IP Addresses

Identifying the Problem

Identify duplicate MAC and IP addresses by following the instructions in these sections:

• Finding Duplicate MAC Addresses
• Finding Duplicate IP Addresses

Solving the Problem

Identify the cause of the duplicate address (such as user error or a hardware problem) and fix the problem if possible.

Finding Duplicate MAC Addresses

To find out if duplicate MAC addresses are occurring, monitor your network using Status Watch.

Status Watch

The Status Watch FDDI Status tool identifies duplicate FDDI MAC addresses. Status Watch reports when two or more MACs on the same ring have the same MAC address (a Duplicate Address condition).

Follow these steps:

1. In the Status Watch Summary View window, determine if any FDDI Status conditions are reported. If there are, double-click the table cell value to display the Device List window.

Another approach is to examine only the devices that you know reside on your FDDI ring. In the Status Watch main window, red device icons indicate that a threshold has been exceeded.

2. Select a device.

• If you selected the device from the Device List window, the real-time report for that device appears in the Status Watch main window.

• If you selected the device from the main window, also select the FDDI Status tool to view the real-time report.

3. Determine if a Duplicate Address condition caused the FDDI Status tool to trigger a Critical or Warning status for that device.

In Status Watch, you can specify the status severity level to apply to a Duplicate Address condition.

Finding Duplicate IP Addresses

To find out if duplicate IP addresses are occurring, monitor your network using these applications:

• Address Tracker - To find duplicate IP addresses on 3Com devices and their attached networks

• LANsentry® Manager - To find duplicate IP addresses that are collected by probes gathering RMON2 SmartAgent® data from the Enterprise Communications Analysis Module (ECAM) downloaded on your network devices

Address Tracker

Use Address Tracker to determine when and where duplicate IP addresses occur.

Follow these steps:

1. From the Find Address menu, select Find Duplicate IP Addresses.

2. Click Find Now to start your search.

LANsentry Manager

Use the Duplicates table in LANsentry Manager to compile a list of all stations with duplicate IP addresses. This table is available only on probes that have downloaded RMON2 (ECAM) SmartAgent software.

Follow these steps:

1. From the LANsentry Manager Address Map menu, select Duplicates. Address Map data is displayed as a table.

2. To export the contents of the table, click Export to launch the Data Export dialog box.

Duplicate Addresses Reference

This section explains terms that are relevant to duplicate addresses and provides additional conceptual and problem analysis detail.

Duplicate MAC Addresses

Each device on your network has a unique MAC address. This address identifies a single device on the network, allowing packets to be delivered to correct destinations.

Packets are delivered to their destinations by means of MAC-address-to-IP-address translation that the Address Resolution Protocol (ARP) provides. Therefore, if MAC addresses are duplicated on the network, ARP caches of routing devices contain erroneous destinations. In FDDI, devices monitor network traffic, searching for their own MAC address in each packet to determine whether to decode the packet. If MAC addresses are not unique, two stations cannot be distinguished from each other.

Duplicate MAC addresses can occur for the following reasons:

• Someone has manually configured a MAC address for a device instead of using the address that the vendor supplied or allowing it to be assigned dynamically, and this address is also assigned to a different device.

• In rare circumstances, loops in a bridged network can cause a MAC hardware problem or an address learning problem that creates a duplicate MAC address entry in the bridging address table.

• On DECnet Phase 4 networks, MAC addresses are set from the DECnet address. A duplicate NET address can cause a duplicate MAC address.

Duplicate IP Addresses

Because IP addresses are critical for transmission of packets on TCP/IP networks, resolve duplicate IP addresses immediately.

Duplicate IP addresses can occur when someone has configured an IP address that is identical to an IP address that is assigned to a different device. Address assignments, although possible for you to configure manually, are usually made using one of these protocols:

• Dynamic Host Configuration Protocol (DHCP) - Allows your network to dynamically assign IP addresses to nodes. With this protocol, a DHCP server temporarily assigns an IP address to a node, or you can statically configure addresses as needed.

• BOOTstrap protocol (BootP) - Allows you to statically assign IP addresses to nodes. This protocol is more efficient than RARP.

• Reverse ARP (RARP) - Allows you to statically assign IP addresses to nodes. However, because this protocol relies on the MAC address to identify the node, you cannot use it on networks that dynamically assign hardware addresses.
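
As an illustration of what a duplicate IP search does, the following minimal Python sketch groups observed address pairs by IP address and reports any IP address claimed by more than one MAC address. The input format (a list of IP and MAC pairs, such as rows copied from an ARP cache) and the function name are assumptions made for this example; they are not part of Address Tracker or LANsentry Manager.

```python
from collections import defaultdict


def find_duplicate_ips(observations):
    """Return {ip: sorted MACs} for every IP claimed by more than one MAC.

    `observations` is an iterable of (ip, mac) pairs (hypothetical input
    format, for example rows copied from an ARP cache).
    """
    macs_by_ip = defaultdict(set)
    for ip, mac in observations:
        macs_by_ip[ip].add(mac.lower())  # normalise case before comparing
    return {ip: sorted(macs) for ip, macs in macs_by_ip.items() if len(macs) > 1}


# Illustrative data only (documentation address range, made-up MACs).
sample = [
    ("192.0.2.10", "00:A0:24:11:22:33"),
    ("192.0.2.11", "00:a0:24:44:55:66"),
    ("192.0.2.10", "00:a0:24:77:88:99"),  # second device claiming .10
    ("192.0.2.11", "00:A0:24:44:55:66"),  # same device seen twice
]
duplicates = find_duplicate_ips(sample)
print(duplicates)
```

Only 192.0.2.10 is reported, because the two entries for 192.0.2.11 carry the same MAC address once case is normalised.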

Ethernet Packet Loss

Use these sections to identify and correct Ethernet packet loss:

• Ethernet Packet Loss Overview
• Searching for Packet Loss

See Ethernet Packet Loss Reference for additional conceptual and problem analysis detail.

If your Ethernet network shows signs of congestion, it may be experiencing packet loss. When your network is congested, utilization is usually high, packets are discarded because buffers are full, and collision rates are up. Problems related to Collisions are often at the heart of packet loss.

Understanding the Problem

Collisions are normal in Ethernet networks. In many cases, Collision rates of 50 percent do not cause a large decrease in throughput. The Collision rate helps mark the upper limit on your network (the maximum percentage of collisions that your network can bear), which is usually around 70 percent. If Collisions increase above this upper limit, your network can become unreliable.

When the Collision rates increase, so do Excessive Collisions, which causes a delay in transmitting data. An increase in Collisions also indicates that network utilization and network errors, such as FCS Errors, are probably increasing.

The real packet problems to watch for, however, are undetected collisions that show up as Late Collisions.
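
The 50 percent and 70 percent figures above can be turned into a simple check. The sketch below is only an illustration: it assumes the Collision rate is computed as collisions divided by frames attempted (the text does not define the denominator), and the status labels are invented for the example.

```python
def assess_collision_rate(collisions, frames_attempted, upper_limit=0.70):
    """Classify a collision rate against an upper limit (default 70 percent).

    Assumption: rate = collisions / frames attempted. The labels
    "normal", "watch", and "unreliable" are hypothetical.
    """
    if frames_attempted <= 0:
        raise ValueError("frames_attempted must be positive")
    rate = collisions / frames_attempted
    if rate > upper_limit:
        return rate, "unreliable"   # above the limit the network can bear
    if rate > 0.50:
        return rate, "watch"        # often tolerable, but approaching the limit
    return rate, "normal"


print(assess_collision_rate(30, 100))
print(assess_collision_rate(80, 100))
```

The `upper_limit` parameter is adjustable because the text describes 70 percent as a typical value rather than a fixed one.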

Identifying the Problem

To confirm that your network's problem is related to packet loss, verify that frames are being dropped on your network by examining this packet loss data:

• Alignment Errors
• Collisions
• Excessive Collisions
• FCS Errors
• CRC Errors
• Late Collisions
• Receive Discards
• Too Long Errors
• Too Short Errors
• Transmit Discards

The process of identifying the problem is discussed in Searching for Packet Loss.

Solving the Problem

If you notice that packet loss data is consistently high, then your network is too congested. In this case, segment your network with the appropriate network device (such as a switch or router). If Collision data shows increases but your network's utilization is the same, then your network may have a physical problem, such as cabling that is too long. Other problems that packet loss data can indicate include:

• Faulty connectors or improper cabling
• Excessive numbers of repeaters between network devices
• Defective Ethernet transceivers or controllers

Possible solutions to these problems are explained in the procedures in Searching for Packet Loss.

Searching for Packet Loss

When you look for packet loss, use the following applications:

• Status Watch - For Ethernet and MIB-II data collection using SNMP polling
• LANsentry Manager Network Statistics Graph - For RMON data collection using an RMON probe
• Device View - On a per-device basis, you can evaluate statistics for any port on the device

Status Watch

Status Watch monitors:

• Alignment Errors
• Excessive Collisions
• FCS Errors
• Receive Discards
• Transmit Discards

Follow these steps:

1. Determine if the thresholds for the Alignment Errors tool and FCS Errors tool are being exceeded.

Table 16 identifies the problems that this data can indicate and your possible actions. For information about problems related to a nonstandard Ethernet implementation, see Nonstandard Ethernet Problems.

Table 16. Alignment Errors, FCS Errors, and CRC Errors Data

Possible Problem: Faulty cabling
Possible Action: Examine the cable and cable connections for breaks or damage.

Possible Problem: Network noise
Possible Action: Look for improper cabling, faulty cable, faulty network equipment, or cables that are too close to equipment that emits electromagnetic interference (lamps, for example).

Possible Problem: Faulty transceiver
Possible Action: Use an analyzer to identify the problematic transceiver. If necessary, replace the transceiver, network adapter, or station.

Possible Problem: Fault at the transmitting end station
Possible Action:
1. Locate the source of the errors by looking at the module and port statistics.
2. Verify the correct operation of the transceiver or adapter card of the device that is connected to the problem port.
3. If the card appears to be operating correctly, examine the cable and cable connections for breaks or damage.

Possible Problem: Station powering up or down
Possible Action: None required. Early implementations of Ethernet transceivers generate a significant amount of in-band noise when powering up; they frequently cause Alignment Errors and FCS Errors in an otherwise stable network. When powering up, some software drivers for Ethernet controllers also initiate Time Domain Reflectometry (TDR) tests to test the Ethernet media. Network monitors report TDR tests as Alignment Errors and FCS Errors.

Possible Problem: Faulty adapter
Possible Action: Replace the adapter.

2. Determine if the Excessive Collisions tool threshold is being exceeded. Table 17 identifies the problems that this data can indicate and your possible actions.

Table 17. Collisions and Excessive Collisions Data

Possible Problem: Busy network
Possible Action: Use a bridge, router, or switch to reconfigure your network into segments with fewer stations.

Possible Problem: Faulty device (adapter, switch, hub, and the like) that does not listen before broadcasting. This problem increases the incidence of all types of collisions.
Possible Action: Isolate each adapter to see if the problem stops.

Possible Problem: Network loop
Possible Action: Ensure that no redundant connections to the same station have both connections active simultaneously.

3. Determine if the Receive Discards and Transmit Discards tools' thresholds are being exceeded.

If these errors are high in conjunction with the data that you learned in steps 1 and 2, then your network is overloaded. Segment your network.
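
The three steps above amount to a small decision procedure. The following Python sketch expresses it under stated assumptions: the input is a set of Status Watch tool names whose thresholds are exceeded (a hypothetical representation, not a Status Watch export format), and discards are treated as a sign of overload when either the step 1 or the step 2 data is also high.

```python
def diagnose_packet_loss(exceeded):
    """Map exceeded thresholds to the follow-up suggested in the steps above.

    `exceeded` is a set of tool names whose thresholds are exceeded
    (hypothetical input format).
    """
    framing = {"Alignment Errors", "FCS Errors"} & exceeded
    collisions = "Excessive Collisions" in exceeded
    discards = {"Receive Discards", "Transmit Discards"} & exceeded

    findings = []
    if framing:      # step 1
        findings.append("see Table 16 (cabling, noise, transceiver, adapter)")
    if collisions:   # step 2
        findings.append("see Table 17 (busy network, faulty device, loop)")
    # Step 3: discards together with step 1 or step 2 data suggest overload.
    if discards and (framing or collisions):
        findings.append("network is overloaded: segment it")
    return findings


print(diagnose_packet_loss({"FCS Errors", "Receive Discards"}))
```

Discards on their own return no finding here, because the text ties the overload conclusion to the combination of data.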

LANsentry Manager Network Statistics Graph

Use the LANsentry Manager Network Statistics graph to view data for:

• Collisions
• Late Collisions
• Bandwidth Utilization
• CRC Errors
• Too Long Errors
• Too Short Errors

Follow these steps:

1. Display a Network Statistics graph for the local Ethernet segment on which users have reported poor performance.

This graph shows the most recent trend in Collision rates. If you have set up a History sample, you can also look at the historical trend. If a number of segments are connected by repeaters, examine the graph for each Ethernet segment.

2. Analyze Utilization and Collision rates to determine whether collisions are caused by an overloaded segment or a faulty component.

• If Utilization rates are high - The collisions are probably caused by an overloaded segment. If you have added nodes or new applications to your network, consider reconfiguring the cabling system using bridges and routers to filter out remote collisions and to keep local traffic on one segment. This action should level the network load.

• If Utilization rates are stable and appear normal - The collisions are probably caused by faulty components. In this case, do the following:

  • If the network consists of repeaters - Compare the Network Statistics graphs for each segment connected to the repeater. Because repeaters repeat traffic across all connected segments (which makes many segments seem like one network), you should see similar levels of traffic on all segments. One segment that shows dissimilar levels of traffic and collisions may indicate faulty hardware. In this case, monitor several collisions to track the source station that is transmitting too soon after collisions, and repair the station. Packets that are transmitted too soon after collisions are unlikely to be valid. See Table 17 for more information about Collisions.

  • On other networks - Determine the segment cable length.

3. Examine the CRC Errors and Late Collisions, which often indicate cabling or component problems.

Table 16 identifies the problems that CRC Errors can indicate and your possible actions. Table 18 identifies the problems that Late Collisions data can indicate and your possible actions.

Table 18. Late Collisions Data

Possible Problem: Cabling problems
• Segment too long
• Failing cable
• Segment not grounded properly (noise)
• Improper termination
• Taps too close (10BASE-5 and 10BASE-2 only)
• Noisy cable
Possible Action: Correct the cabling problem by doing one or more of the following:
• Reduce the segment length
• Replace the cable
• Ground the cable
• Terminate the cable correctly
• Check the taps
• Check for cables too close to equipment that emits electromagnetic interference

Possible Problem: Component problems
• Deaf or partially deaf node
• Failing repeater, transceiver, or controller cards
Possible Action: Correct the component problem by doing one of the following:
• Trace the failing component and replace it
• Replace the NIC or the transceiver

4. Trace Too Short Errors and Too Long Errors to the sender.

These errors often indicate faulty routers or LAN drivers and transceiver problems. Table 19 identifies the problems that this data can indicate and your possible actions.

Possible Problem: A transceiver on your network is adding bits to the packets that are transmitted by the attached station.
Possible Action:
1. Use a network analyzer to identify the problematic transceiver.
2. If necessary, replace the transceiver, network adapter, or station.

Possible Problem: The jabber protection mechanism on a transceiver has failed; it can no longer protect the network from the jabbering produced by the attached station.
Possible Action: Replace the network card.

Possible Problem: Excessive noise on the cable.
Note: Some 10/100 Mbps cards that autodetect the network speed may connect to the network at the wrong speed, causing excessive noise.
Possible Action: Check for improper cabling, faulty cable, faulty network equipment, or cables too close to noisy electronic equipment (lamps, for example). If your network card autodetects the network speed and you have ruled out other problems, manually configure the network speed.

Possible Problem: Faulty routers (two different network types are connected and the router is not enforcing proper frame size restrictions).
Possible Action: Notify the manufacturer.

Possible Problem: Faulty LAN driver
Possible Action: Replace the driver.

Possible Problem: A normal condition on a LinkSwitch® 1000, LinkSwitch® 3000, or CoreBuilder 5000 FastModule. If you use maximum-sized (1518-byte) Ethernet frames, the device's VLT-enabled ports add a frame tag of 4 bytes, resulting in a misleading Too Long Frame error. These frames are passed successfully but will create the Too Long Frame error message.
Possible Action: If you want to eliminate the error message, reduce your Ethernet packet frames by 4 bytes.

Device View

Device View allows you to display a variety of port and device-level statistics relevant to Ethernet packet loss. Table 20 describes these statistics and their use in troubleshooting.

Table 20. Activity and Error Statistics in Device View

Statistics Group: Activity
Description: Displays the total network activity and errors on the selected port.
Use in Troubleshooting: This data shows readable packets, broadcast packets, Collisions, total errors, and runts, which cause Too Short Errors. You can interpret this data in the following ways:
• The presence of runts can often be caused by Collisions; however, if the values increase at specific times of the day, it may indicate you need to change the network topology to manage the traffic more efficiently (for example, with switches or routers).
• Runts can also be caused by a badly terminated coax cable.
• Large numbers of runts not associated with high levels of collisions can indicate a transmission problem (examine the cable).
• Particularly high numbers of Collisions compared to the total number of readable packets can point to a hardware problem (a bad adapter) or to a data loop.
• A high proportion of Broadcast Packets (>10%) on a heavily utilized network (>50% of available bandwidth) can point to an incorrectly configured bridge or router on the network.

Statistics Group: Errors
Description: Displays the number of frames with errors on the selected port.
Use in Troubleshooting: The significance of errors depends on accompanying errors and prevailing network conditions. See the following error data for more information:
• Alignment Errors: Table 16
• FCS Errors: Table 16
• Too Long Errors: Table 19
• Too Short Errors or runts: Table 19
• Late Collisions: Table 18

To display Activity and Errors statistics for a device or port, follow these steps:

1. Select the required port or device.

2. From the shortcut menu, select Activity or Errors.

The statistics available depend on the type of port or device selected. See Table 20 for troubleshooting information.

Ethernet Packet Loss Reference

This section explains terms that are relevant to Ethernet packet loss and provides additional conceptual and problem analysis detail.

Alignment Errors

An Alignment Error indicates a received frame in which both of the following are true:

• The number of bits received is an uneven byte count (that is, not an integral multiple of 8).

• The frame has a Frame Check Sequence (FCS) error.

Alignment Errors often result from MAC layer packet formation problems, cabling problems that cause corrupted or lost data, and packets that pass through more than two cascaded multiport transceivers. See FCS Errors for more information about interpreting Alignment Errors.

Collisions

Collisions indicate that two devices detect that the network is idle and try to send packets at exactly the same time (within one round-trip delay). Because only one device can transmit at a time, both devices must stop sending and attempt to retransmit. Collisions are detected by the transmitting stations.

The retransmission algorithm helps to ensure that the packets do not retransmit at the same time. However, if the two devices retry at nearly the same time, packets can collide again; the process repeats until either the packets finally pass onto the network without collisions or 16 consecutive collisions occur and the packets are discarded.
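
The text does not name the retransmission algorithm. The sketch below assumes the truncated binary exponential backoff that standard CSMA/CD Ethernet uses: after the n-th consecutive collision a station waits a random number of slot times between 0 and 2^min(n, 10) - 1, and gives up after 16 attempts. The function names and the `collides` callback are hypothetical, introduced only to make the example runnable.

```python
import random

MAX_ATTEMPTS = 16    # the frame is discarded after 16 consecutive collisions
BACKOFF_LIMIT = 10   # the exponent stops growing after 10 collisions


def backoff_slots(collision_count, rng=random):
    """Pick how many slot times to wait after the n-th consecutive collision."""
    if not 1 <= collision_count < MAX_ATTEMPTS:
        raise ValueError("collision_count must be between 1 and 15")
    exponent = min(collision_count, BACKOFF_LIMIT)
    return rng.randrange(2 ** exponent)  # uniform in 0 .. 2**exponent - 1


def send_with_backoff(collides, rng=random):
    """Try to send one frame; collides(attempt) -> bool is a stand-in for the wire.

    Returns (delivered, total slot times waited).
    """
    waited = 0
    for attempt in range(1, MAX_ATTEMPTS + 1):
        if not collides(attempt):
            return True, waited
        if attempt < MAX_ATTEMPTS:
            waited += backoff_slots(attempt, rng)
    return False, waited  # 16 collisions in a row: an Excessive Collision


print(send_with_backoff(lambda attempt: attempt < 3))  # succeeds on attempt 3
print(send_with_backoff(lambda attempt: True)[0])      # always collides
```

Because each station draws its wait independently, two stations that collided are unlikely to pick the same delay again, which is the property the paragraph above describes.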

CRC Errors

A Cyclic Redundancy Check (CRC) Error is an RMON statistic that combines FCS Errors and Alignment Errors. These errors indicate that packets were received with:

• A bad FCS and an integral number of octets (FCS Errors)
• A bad FCS and a non-integral number of octets (Alignment Errors)

CRC Errors can cause an end station to freeze. If a large number of CRC Errors are attributed to a single station on the network, replace the station's network interface board. Typically, a CRC Error rate of more than 1 percent of network traffic is considered excessive.
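
The 1 percent guideline can be checked with simple arithmetic. This minimal sketch assumes the rate is the sum of FCS Errors and Alignment Errors divided by the total number of packets seen in the same interval; the function names are illustrative and do not come from any RMON tool.

```python
def crc_error_rate(fcs_errors, alignment_errors, total_packets):
    """Combine FCS and Alignment Errors into one rate, as the CRC statistic does."""
    if total_packets <= 0:
        raise ValueError("total_packets must be positive")
    return (fcs_errors + alignment_errors) / total_packets


def is_excessive(rate, threshold=0.01):
    """True when the CRC Error rate exceeds 1 percent of traffic."""
    return rate > threshold


rate = crc_error_rate(fcs_errors=15, alignment_errors=5, total_packets=1000)
print(rate, is_excessive(rate))
```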

Excessive Collisions

Excessive Collisions indicate that 16 consecutive collisions have occurred, usually a sign that the network is becoming congested. For each excessive collision count (or after 16 consecutive collisions), a packet is dropped. If you know the normal rate of excessive collisions, then you can determine when the rate of packet loss is affecting your network's performance. See Knowing Your Network's Configuration for more information.

FCS Errors

Frame Check Sequence (FCS) Errors, a type of CRC Error, indicate that frames received by an interface are an integral number of octets long but do not pass the FCS check. The FCS is a mathematical way to ensure that all the frame's bits are correct without having the system examine each bit and compare it to the original. Packets with Alignment Errors also generate FCS Errors.
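
To make the idea of a mathematical check concrete, the sketch below appends a 32-bit checksum to a block of bytes and later verifies it. It uses the CRC-32 function from Python's standard `zlib` module as a stand-in for the Ethernet FCS; the byte order of the trailer and the helper names are assumptions made for this illustration, not a description of any particular adapter.

```python
import struct
import zlib


def append_fcs(frame_without_fcs: bytes) -> bytes:
    """Append a CRC-32 checksum as a 4-byte trailer (least significant byte first)."""
    return frame_without_fcs + struct.pack("<I", zlib.crc32(frame_without_fcs))


def fcs_is_valid(frame: bytes) -> bool:
    """Recompute the checksum over the body and compare it with the trailer."""
    if len(frame) < 4:
        return False
    body, trailer = frame[:-4], frame[-4:]
    return struct.pack("<I", zlib.crc32(body)) == trailer


frame = append_fcs(b"example payload " * 4)
corrupted = bytes([frame[0] ^ 0x01]) + frame[1:]   # flip one bit in transit

print(fcs_is_valid(frame))      # the untouched frame passes
print(fcs_is_valid(corrupted))  # the single-bit error is caught
```

The receiver never sees the original bits; it only recomputes the checksum and compares four bytes, which is why a mismatch is reported as an FCS Error.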

Both Alignment Errors and FCS Errors can be caused by equipment powering up or down or by interference (noise) on unshielded twisted-pair (10BASE-T) segments. In a network that complies with the Ethernet standard, FCS or Alignment Errors indicate bit errors during a transmission or reception. A very low rate is acceptable. Although Ethernet allows a 1 in 10^8 bit error rate, typical Ethernet performance is 1 in 10^12 or better.

Late Collisions

Late Collisions indicate that two devices have transmitted at the same time but cabling errors (most commonly, excessive network segment length or repeaters between devices) prevent either transmitting device from detecting a collision. Neither device detects a collision because the time to propagate the signal from one end of the network to the other is longer than the time to put the entire packet on the network. As a result, neither of the devices that cause the late collision senses the other's transmission until the entire packet is on the network.

Although late collisions occur for small packets, the transmitter cannot detect them. As a result, a network suffering measurable Late Collisions for large packets is losing small packets as well.
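
The timing argument can be worked through numerically. In this sketch a sender can notice a collision only while it is still transmitting, so the frame's transmit time is compared with the round-trip delay of the segment. The 10 Mbps bit rate and the 80 microsecond round-trip figure are assumptions chosen to illustrate an overlong segment; they are not measurements from the text.

```python
def transmit_time_us(frame_octets, bit_rate_mbps=10.0):
    """Time to put a whole frame on the wire, in microseconds."""
    return frame_octets * 8 / bit_rate_mbps


def collision_is_detectable(frame_octets, round_trip_delay_us, bit_rate_mbps=10.0):
    """True if the sender is still transmitting when the collision signal returns."""
    return transmit_time_us(frame_octets, bit_rate_mbps) >= round_trip_delay_us


ROUND_TRIP_US = 80.0  # hypothetical delay on a segment that is too long

for octets in (64, 1518):
    print(octets, transmit_time_us(octets),
          collision_is_detectable(octets, ROUND_TRIP_US))
```

With these numbers a 64-octet frame is fully sent in 51.2 microseconds, before the collision can be seen, while a 1518-octet frame is still on the wire and registers the collision late. That is the situation the paragraph above describes: visible Late Collisions on large packets imply silent losses of small ones.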

Nonstandard Ethernet Problems

Table 21 lists the symptoms that typically occur if a system violates the Ethernet standard.

Table 21. Symptoms of Common Ethernet Network Problems

Symptoms: FCS Errors and Alignment Errors increase significantly.
Problem: Network cabling is too long.
Notes: If you use a promiscuous network monitor, the number of Late Collisions reported by stations should correlate with the FCS and Alignment Errors reported by the monitor.

Symptoms: FCS and Alignment Errors increase proportionally with interference (sometimes referred to as noise hits).
Problem: Network segment is noisy.
Notes: Typically observed on a 10BASE-T network segment in a noisy environment. If you use multiple promiscuous monitors, the FCS and Alignment Errors among the monitors will not correlate. If the monitor can track runts (also called Too Short Errors), the number of runt packets should be significantly higher than normal.

Symptoms: FCS and Alignment Errors are much higher than normal.
Problem: Networks do not conform to the access scheme of Carrier Sense Multiple Access with Collision Detect (CSMA/CD).
Notes: Occurs when some implementations of Ethernet in the segment are not entirely compatible with IEEE 802.3 repeaters. Collision fragments linger on the network long enough to collide with retry packets at the minimum interpacket gap (IPG). The IPG is smaller on one side of the repeated network, causing a lost packet. Ethernet controllers cannot receive packets that are separated by 4.7 µs or less. Some controllers cannot sustain receptions of packets separated by as much as 9.6 µs. If runt packets are received one after another and are followed by a collision fragment, Ethernet controllers that cannot sustain reception will lose packets.

Receive Discards

Receive Discards indicate that received packets could not be delivered to a high-layer protocol because of congestion or packet errors.

Too Long Errors

A Too Long Error indicates that a packet is longer than 1518 octets (including FCS octets) but otherwise well formed. Too Long Errors are often caused by a bad transceiver, a malfunction of the jabber protection mechanism on a transceiver, or excessive noise on the cable.

Too Short Errors

A Too Short Error, also called a runt, indicates that a packet is fewer than 64 octets long (including FCS octets) but otherwise well formed.
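
The length and checksum definitions in this reference can be combined into one classifier. The sketch below is a simplified reading of those definitions: it takes the received bit count and a flag saying whether the FCS check passed, and it does not try to subdivide bad-FCS frames by size. The function name and the "unclassified" label are invented for the example.

```python
MIN_OCTETS = 64     # shorter well-formed frames are Too Short Errors (runts)
MAX_OCTETS = 1518   # longer well-formed frames are Too Long Errors


def classify_frame(bit_count, fcs_ok):
    """Classify a received frame using the definitions in this reference."""
    octets, extra_bits = divmod(bit_count, 8)
    if fcs_ok and extra_bits == 0:
        if octets < MIN_OCTETS:
            return "Too Short Error"
        if octets > MAX_OCTETS:
            return "Too Long Error"
        return "OK"
    if not fcs_ok and extra_bits:
        return "Alignment Error"   # bad FCS and a non-integral octet count
    if not fcs_ok:
        return "FCS Error"         # bad FCS and an integral octet count
    return "unclassified"          # case not covered by the definitions


for bits, ok in [(64 * 8, True), (63 * 8, True), (1519 * 8, True),
                 (100 * 8 + 3, False), (100 * 8, False)]:
    print(bits, ok, classify_frame(bits, ok))
```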

Transmit Discards

Transmit Discards indicate that packets were not transmitted because of network congestion.

FDDI Ring Errors

Use these sections to identify and correct FDDI ring errors:

• FDDI Ring Errors Overview
• Identifying Ring Errors

Fiber Distributed Data Interface (FDDI) often corrects its own problems. However, because FDDI cannot correct all errors (especially those related to hardware problems), you should monitor FDDI errors.

Understanding the Problem

FDDI ring errors that you should monitor include:

• Elasticity Buffer Error Condition
• Frame Error Condition
• Frames Not Copied Condition
• Link Error Condition

Identifying the Problem

First, determine the type of FDDI ring errors and where they are occurring. Similar to the way you identify other FDDI problems, identify the upstream and downstream neighbors of the devices that you are monitoring.

Several types of network errors can cause FDDI performance problems. For example, problems with cables or physical connections may result in a link or frame error. Elasticity buffer (EB) errors can also lead to link and frame errors.

FDDI deals with port-related errors as follows:

• The variable PORTlerAlarm is the link error rate (LER) value at which a link connection generates an alarm. When the LER is greater than the alarm setting, Station Management (SMT) sends a Status Report Frame (SRF) to notify you that there is a problem with a port.

The PORTlerAlarm threshold is set lower than the PORTlerCutoff threshold so that you are notified of a problem before the port is actually removed from the ring.

• When link errors reach the threshold defined by the variable PORTlerCutoff, SMT breaks the connection, disabling the PHY that detected the problem. A Link Error Condition is also generated.
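
The two-threshold behavior described above can be sketched as a small function. This is an illustration only: it models the link error rate and both thresholds as plain error rates, whereas real SMT implementations may encode these values differently, and the function name, return strings, and sample numbers are all hypothetical.

```python
def port_action(link_error_rate, ler_alarm, ler_cutoff):
    """Decide what happens to a port, given an alarm and a cutoff threshold.

    All three arguments are plain error rates (an assumption for this
    sketch); the alarm threshold must be lower than the cutoff threshold.
    """
    if ler_alarm >= ler_cutoff:
        raise ValueError("alarm threshold must be below the cutoff threshold")
    if link_error_rate >= ler_cutoff:
        return "break connection, disable PHY, raise Link Error Condition"
    if link_error_rate > ler_alarm:
        return "send Status Report Frame (alarm)"
    return "no action"


# Hypothetical thresholds and readings.
for rate in (1e-9, 5e-8, 1e-7):
    print(rate, port_action(rate, ler_alarm=1e-8, ler_cutoff=1e-7))
```

Keeping the alarm below the cutoff means the middle reading produces a warning while the port is still on the ring, which is the ordering the text calls for.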

FDDI deals with MAC-related errors as follows:

• When MAC frame errors reach a certain threshold, a Frame Error Condition is generated. Because the actual error can be further upstream than the immediate connection, the connection remains intact.

• For a large network, the worst-case MACFrameErrorRatio is less than 0.1 percent. However, during network configuration, frame error ratios can reach 50 percent for short periods. When you detect a sustained frame error ratio of more than 0.1 percent, a problem exists between the station that is reporting the condition and the nearest upstream MAC.
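
Because short spikes are expected during configuration, only a sustained ratio matters. The sketch below flags a problem when the frame error ratio stays above the threshold for several samples in a row. It takes the threshold as 0.1 percent (an assumption about the figure given in the text), and the requirement of three consecutive samples is an arbitrary choice made for the example.

```python
def sustained_frame_error(ratios, threshold=0.001, min_consecutive=3):
    """True if the ratio exceeds the threshold for min_consecutive samples in a row.

    `ratios` are fractions, so 0.001 means 0.1 percent. Both defaults
    are assumptions for this sketch.
    """
    run = 0
    for ratio in ratios:
        run = run + 1 if ratio > threshold else 0
        if run >= min_consecutive:
            return True
    return False


print(sustained_frame_error([0.0, 0.5, 0.0002, 0.0001]))      # brief spike only
print(sustained_frame_error([0.0005, 0.002, 0.003, 0.004]))   # stays high
```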

Solving the Problem

To solve problems related to FDDI errors, fix the hardware, cabling, or congestion problem.

Identifying Ring Errors

Use Status Watch to monitor your FDDI devices for Warning or Critical alerts.

Status Watch

Use Status Watch to identify FDDI ring errors.

Follow these steps:

1. Monitor the FDDI Status tool for the currently selected device.

2. Determine whether Status Watch is reporting Elasticity Buffer Errors or a high percentage of Frame Errors, Frames Not Copied, or Link Error Rates for the currently selected device.

FDDI Ring Errors Reference

This section provides additional conceptual and problem analysis detail.

Elasticity Buffer Error Condition

The Elasticity Buffer Error condition occurs when a port's elasticity buffer overflows or underflows. This condition usually indicates that a port's hardware is not operating within the tolerances that the FDDI standard specifies. Look for the problem in the hardware of either the port that is reporting the condition or of the immediately adjacent port.

Frame Error Condition

The Frame Error condition occurs when the percentage of frames that contain errors exceeds a preset threshold. When a device is an uplink to FDDI (that is, a device is transmitting onto FDDI), this type of condition indicates that the ring is saturated. The ring is out of buffer space, and packets are being dropped from the device's backbone port.

The problem indicated by the frame errors is usually located between the MAC that reports the condition and its upstream neighbor. Because many physical connections can lie along this path, the MACFrameErrorRatio variable identifies only the two MACs between which the problem is occurring.

Frames Not Copied Condition

The Frames Not Copied condition occurs when the percentage of frames that are dropped because of insufficient buffer space exceeds a preset threshold. This condition indicates that the station is congested and is unable to process frames as quickly as they arrive. To help eliminate congestion:

• Add more capacity to the station
• Reconfigure your network so that end stations that communicate heavily with one another are on the same bridge or switch
• Filter out certain traffic

Link Error Condition

The Link Error condition occurs when a port detects link errors at a rate that exceeds a preset threshold. When the Link Error threshold is exceeded, the station removes itself from the ring and tries to reinsert itself on the ring. This action creates a MAC Neighbor Change Event (which also occurs if a ring wraps).

Link errors may indicate an FDDI PHY hardware problem (such as a faulty transmitter) or a faulty cable or connector. Look for the problem in the portion of the network between the port that is reporting the condition and the first upstream transmitter.

MAC Neighbor Change Event

The MAC Neighbor Change event occurs when a MAC's upstream or downstream neighbor changes.

This event indicates either:

• A network reconfiguration
• Another station that is leaving or joining the ring

Network File Server Timeouts

Use these sections to identify and correct timeouts on network file servers:

• Network File Server Timeout Overview
• Looking for Obvious Errors
• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

See Network File Server Timeouts Reference for additional conceptual and problem analysis detail.

A network file server can time out if your network gets congested or if your server is having problems. Users might have problems downloading data from or to the server or copying files from or to the server. To help you understand the troubleshooting process for this type of problem, an EXAMPLE throughout this section follows the symptoms, analysis, and resolution of a typical file server timeout problem.

Understanding the Problem

When users log in, their stations make network file server calls either to determine quotas (if this feature has been enabled) or to mount user home directories. The network file server timeout messages, even when spread across multiple nodes, indicate a problem either with the network or with a server.

EXAMPLE: UNIX users notice that it takes a long time - over 30 seconds in some cases - to log in to any machine. Some machines report network file server timeout messages, but the messages have no obvious pattern and are infrequent. You begin to get a sense of the problem.

Identifying the Problem

First, rule out the obvious causes. Ask these questions:

• Can you access the network file server with Telnet?
• Have any alarms been triggered?
• Are there any new errors?

The process of identifying the problem is developed in Looking for Obvious Errors.

Solving the Problem

To determine the cause, reproduce the fault while you monitor the network. After you know the cause, you can fix the problem.

The solutions to the network file server timeout are identified in these sections:

• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

Looking for Obvious Errors

To look for obvious errors, use these applications:

• Ping and Telnet - To check for connectivity to the network file server nodes
• LANsentry Manager Alarms View - To search for triggered alarms
• LANsentry Manager Statistics View - To look for errors
• LANsentry Manager History View - To identify trends

Ping and Telnet

Determine whether you can contact network file server nodes using Ping and Telnet. If the response is extremely slow, then a problem may exist with the connections to the nodes. No delay indicates that the connections are normal, implying that the delay is occurring elsewhere. In this case, use LANsentry® Manager tools to determine whether packets are being lost or ignored.
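The connectivity check described here can be approximated in a script. The following is a minimal sketch, not part of the original procedure: it times how long a TCP connection to a server takes, which is roughly what opening a Telnet session tests. The function names and the one-second "slow" cutoff are assumptions chosen for illustration.

```python
import socket
import time


def tcp_connect_time(host, port, timeout=5.0):
    """Return the seconds taken to open a TCP connection, or None on failure."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return None


def describe(elapsed, slow_after=1.0):
    """Turn a measurement into the rough verdict used in the text."""
    if elapsed is None:
        return "no connection"
    if elapsed > slow_after:
        return "slow: check the connections to the node"
    return "normal: look for the delay elsewhere"
```

For example, describe(tcp_connect_time("fileserver", 23)) would test the Telnet port of a host named fileserver (a hypothetical name). Port 2049, the standard NFS port, is another reasonable target on a file server.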

LANsentry Manager Alarms View

Using the LANsentry Manager Alarms View, you can determine if any configured alarms have been triggered.

Search the Alarms View to see if any MAC events have been logged.

EXAMPLE: MAC events have not been logged for the network to which the UNIX users are attached.

Even though no alarms have occurred, errors may exist. For example, a low rate of background errors may exist just below the alarm threshold. Because they are based on maximum and minimum values, RMON alarms may miss constant, periodic, or low levels of errors.
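To see why a threshold alarm can stay silent while errors accumulate, consider this small sketch. The sample counts and the threshold value are invented for illustration, and the function names are not taken from any LANsentry or RMON interface.

```python
def alarm_samples(error_counts, rising_threshold):
    """Indexes of samples whose error count reaches the alarm threshold."""
    return [i for i, n in enumerate(error_counts) if n >= rising_threshold]


def background_errors(error_counts, rising_threshold):
    """Total errors in samples that stayed below the threshold."""
    return sum(n for n in error_counts if n < rising_threshold)


# Hypothetical per-sample error counts for one segment.
samples = [3, 0, 4, 0, 3, 0, 4, 0]
print(alarm_samples(samples, rising_threshold=5))      # no sample trips the alarm
print(background_errors(samples, rising_threshold=5))  # yet the errors add up
```

With these numbers no alarm fires, although 14 errors occurred, which is the situation the paragraph above warns about.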

LANsentry Manager Statistics View

Using the LANsentry Manager Statistics View, you can display a multisegment graph of utilization and error statistics.

Follow these steps:

1. Set up a graph that shows utilization and errors on all your major segments.

2. Determine whether any segments are particularly busy or error prone.

EXAMPLE: You notice that one segment of the UNIX network, HUB3, is reporting Too Long Errors and FCS Errors roughly every second sample. While the amount is not higher than normal, it is currently higher than on any other segment.

LANsentry Manager History View

Using the LANsentry Manager History View, you can display a rolling history table to determine if the errors that you are seeing are new. For example, if you have a history table that runs for 30-minute samples over two days, you can compare the most recent sample to a previous sample, looking for new errors. If your probe has the resources, use a much finer resolution sample stored for a shorter time (every 30 seconds for 2 hours) to more easily spot recent errors.

EXAMPLE: You see that the history table shows that no error rates remained constant throughout the day. However, errors that did occur were on the device HUB3 and were Too Long Errors and FCS Errors.
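The comparison of a recent history sample against an earlier one can be sketched as follows. This assumes each sample has been exported as a mapping from error name to count; that export format and the figures are hypothetical.

```python
def new_errors(previous, recent):
    """Return the error types whose count rose between two history samples."""
    return {
        name: recent[name] - previous.get(name, 0)
        for name in recent
        if recent[name] > previous.get(name, 0)
    }


# Hypothetical counts from two samples of the same segment.
earlier = {"FCS Errors": 2, "Too Long Errors": 0, "Collisions": 10}
latest = {"FCS Errors": 5, "Too Long Errors": 3, "Collisions": 10}
print(new_errors(earlier, latest))
```

Only the counters that increased are reported, so steady background values such as the collision count drop out of the result.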

Reproducing the Fault While Monitoring the Network

Although the RMON View in LANsentry Manager can show error rates and help you to identify the location of the problem, it may not provide enough data to solve the problem. To determine the cause of the problem, reproduce it while you monitor the network by using these applications:

• LANsentry Manager Top-N Graph - To locate a quiet node to use for reproducing the fault
• LANsentry Manager Packet Capture - To capture packets from the hub to which the quiet node is attached
• LANsentry Manager Packet Decode - To analyze the packets to assess network file server traffic and delays
• Address Tracker - To find the location of the problem nodes

EXAMPLE: Using LANsentry Manager, you find a hub on the network with a higher than normal error rate. However, the error rate does not seem high enough to cause login delays of 60 seconds or more.

LANsentry Manager Top-N Graph

Using the Top-N graph in the LANsentry Manager main window, locate a quiet node that has been showing the same problem. Choose a quiet node so that you do not receive excessive traffic when you try to isolate the problem.

EXAMPLE: You see that the node Monolith, which has the same Network File System (NFS) mounts as the other nodes on the network, is quiet. You decide to use this node for reproducing the fault. See Network File System (NFS) Protocol for more information about NFS.

LANsentry Manager Packet Capture

Using the LANsentry Manager Packet Capture application, capture packets from the network using predefined patterns and start-and-stop conditions.

Follow these steps:

1. Set up a capture buffer on a probe that is connected to the same hub as the quiet node. Until you know more about the problem, set a very general filter.

EXAMPLE: You select a MAC-layer filter and set a conversation filter to capture all packets to and from Monolith.

2. Telnet into and log out from the quiet node. Then reset the capture buffer. Repeat this procedure until you see the problem reflected in your captured data. To keep the buffer information clean, reset the buffer each time that you repeat the procedure.

3. When you see the delay, note the rough value of the packet count on the LANsentry Manager packet buffer.

By noting the packet count at which you think the delay has occurred, you can narrow the problem to within about 20 packets in the buffer. If you have used an extremely quiet node, you may even identify the exact packet.

LANsentry Manager Packet Decode

The LANsentry Manager Packet Decode application decodes all major protocols and displays the packet contents at three levels of detail: summary information, header information, and actual packet content.

Follow these steps:

1. Open the buffer in the Packet Decode application and locate the number of the packet at which the delay occurred.

2. Select the packet and launch a MAC-layer conversation filter. In the filter display, look for a gap in the conversation (that is, where the node sent a request and then resent it after approximately the same interval as the delay you experienced when recreating the problem).

3. Repeat the test to determine if the result concentrates on one node or if it appears on other nodes.

EXAMPLE: On the quiet node that you selected, the delay is obvious. You see an NFS request going out to a node and a repeat of the request 30 seconds later. During that time, the node did not respond. You now know that the delay occurred because nodes were not seeing responses for NFS requests. When you repeat the test on other nodes, you find that the delay is happening with more than one destination node.
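Looking for a gap between a request and its retransmission can also be done on exported capture data. The sketch below assumes each captured packet has been reduced to a packet number, a timestamp in seconds, source and destination names, and a request identifier (for NFS this could be the RPC transaction ID). That record layout, the node names other than Monolith, and the 10-second cutoff are all assumptions for illustration.

```python
from collections import namedtuple

Packet = namedtuple("Packet", "number time src dst request_id")


def find_retry_gaps(packets, min_gap=10.0):
    """Return (first_number, retry_number, gap) for requests resent after a long silence."""
    last_seen = {}
    gaps = []
    for pkt in packets:
        key = (pkt.src, pkt.dst, pkt.request_id)
        if key in last_seen:
            gap = pkt.time - last_seen[key].time
            if gap >= min_gap:
                gaps.append((last_seen[key].number, pkt.number, gap))
        last_seen[key] = pkt
    return gaps


# Hypothetical capture: request 102 gets no reply and is resent 30 seconds later.
capture = [
    Packet(1, 0.0, "monolith", "server1", 101),
    Packet(2, 0.002, "server1", "monolith", 101),
    Packet(3, 0.5, "monolith", "server2", 102),
    Packet(4, 30.5, "monolith", "server2", 102),
]
print(find_retry_gaps(capture))
```

The reply in packet 2 travels in the opposite direction, so it is not mistaken for a retry; only the repeated request to server2 is reported.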

Address Tracker

Use Address Tracker, which polls managed devices, to determine the hubs to which the problem nodes are attached. If the problem end stations are located on unmanaged devices, then you can at least narrow the problem to those unmanaged devices.

EXAMPLE: Although your network does not have managed hubs that Transcend® NCS management software can poll, it does have managed switches. When it polls the switches, Address Tracker displays the switch ports on which addresses were last seen. This information indicates the hub (but not the hub port) on which the device is located.

LANsentry Manager Packet Decode

After you know the location of the hub that has the problem node, monitor the problem from the hub using LANsentry Manager Packet Decode.

Follow these steps:

1. To capture packets from one of the nodes on the hub, set up another capture buffer and repeat the exercise that is described in LANsentry Manager Packet Capture. Because a delay may occur on a different node, use two capture buffers without stopping the first one.

Note the rough packet count where the delay appears.

2. Display a conversation filter of the packet where the delay appears and look for the gap in the conversation.

EXAMPLE: You hope that the nodes are on the same hub. You find that all the nodes are on HUB3. This result indicates that FCS Errors may be causing the timeouts. However, because the errors occur at a low rate, you decide to verify this diagnosis. You monitor the problem from the hub, logging in and out many times, and the delay eventually occurs. This time, the delay shows that the node's reply had an FCS Error, even though the node received the request. The switch would not have transmitted this packet, causing a timeout on the NFS protocol. The retry time is presumably 30 seconds. During this test, you see the problem occurring on another node.

Correcting the Fault

Without a managed hub, you may find it very difficult to discover network file server timeout errors. To find the problematic node, you must either systematically isolate nodes by monitoring each node for a prolonged period or temporarily insert a managed hub.

EXAMPLE: You notice that the captured error packet failed FCS because it was corrupted by a regular pattern during transmission. A possible reason for this occurrence is a Jabbering node. This explanation makes sense because FCS/Jabber frames increased linearly when you were monitoring the live network.

Network File Server Timeouts Reference

This section explains terms that are relevant to network file server timeouts and provides additional conceptual and problem analysis detail.

Jabbering

Jabbering occurs when a node transmits illegal-length packets and is possibly not operating within carrier specifications. In effect, another node has written bad data over a valid packet. This bad data is often interpreted as a repeated sequence of data.
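As a rough illustration of how illegal-length frames are usually categorized, the sketch below follows the Ethernet statistics definitions of the RMON MIB (64 to 1518 octets is a legal untagged frame; an oversized frame with a bad checksum counts as a jabber). How LANsentry Manager maps its "Too Long Errors" and "FCS Errors" counters onto these categories is an assumption here, and the function name is hypothetical.

```python
MIN_FRAME = 64    # octets, including the FCS, excluding framing bits
MAX_FRAME = 1518  # octets, for an untagged Ethernet frame


def classify_frame(length, fcs_ok):
    """Place a frame in an RMON-style category from its length and FCS result."""
    if length > MAX_FRAME:
        return "oversize" if fcs_ok else "jabber"
    if length < MIN_FRAME:
        return "undersize" if fcs_ok else "fragment"
    return "good" if fcs_ok else "fcs_error"
```

For instance, classify_frame(1600, fcs_ok=False) falls in the jabber category, while a 512-octet frame with a bad checksum is counted only as an FCS error.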

Network File System (NFS) Protocol

A distributed file system protocol, developed by Sun Microsystems, that allows a computer system to access files over a network as if they were on its local disks. This protocol has been incorporated into products by more than 200 companies. It is now a de facto Internet standard.

NFS is one protocol in the NFS suite of protocols, which includes NFS, RPC, XDR (External Data Representation), and others. These protocols are part of a larger architecture that Sun Microsystems refers to as Open Network Computing (ONC). ONC is a distributed applications architecture designed by Sun and currently controlled by a consortium led by Sun.

Troubleshooting TCP/IP - Detailed Steps

This article shows how to troubleshoot TCP/IP connectivity between computers on a Windows network. If you haven't already done so, disable XP's Internet Connection Firewall on all local area network connections and remove all firewall programs on the network. Improperly configured firewalls are the most common cause of TCP/IP problems.

Open a Command Prompt Window

For many of these steps, you'll be typing at the command prompt. To open a command prompt window in Windows 2000 or XP, click Start | Run, type cmd in the box, and click OK. To open a command prompt window in Windows 95, 98, or Me, click Start | Run, type command in the box, and click OK. Type one command per line, and press Enter after each one to execute it. To close the command prompt window, use the exit command.

Determine the TCP/IP Settings

Determine the TCP/IP settings of each computer on the local area network. In XP, open the Network Connections folder, right click the LAN connection, and click Status | Support | Details. For example, here are the Status and Details views for the LAN connection on an Internet Connection Sharing host.

In Windows 95/98/Me, click Start | Run, type winipcfg in the box, and click OK. Select the LAN adapter from the menu and click More Info. Here's the winipcfg view for an ICS client running Windows Me.

You can also see the TCP/IP settings from the command prompt. This is especially convenient if a computer has more than one network adapter. Use the ipconfig /all command, which is available in all versions except Windows 95. The output from this command can be long, so it's best to write it to a file. Specify the file name in the command this way:

ipconfig /all >ipconfig.txt
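Once the output is saved to a file, the settings can be pulled out with a short script. This is a minimal sketch that assumes the usual "Label . . . : value" layout of ipconfig output; the sample text and the gateway address in it are made up, and continuation lines (such as a second DNS server listed on its own line) are ignored.

```python
def parse_ipconfig(text):
    """Collect 'Label . . . : value' lines from ipconfig-style output."""
    settings = {}
    for line in text.splitlines():
        label, sep, value = line.partition(":")
        if not sep or not value.strip():
            continue  # adapter headings and blank lines carry no setting
        key = " ".join(label.replace(".", " ").split())
        settings.setdefault(key, []).append(value.strip())
    return settings


# Hypothetical excerpt in the style of Windows XP output.
SAMPLE = """
Ethernet adapter Local Area Connection:

        IP Address. . . . . . . . . . . . : 192.168.1.101
        Subnet Mask . . . . . . . . . . . : 255.255.255.0
        Default Gateway . . . . . . . . . : 192.168.1.1
"""
print(parse_ipconfig(SAMPLE))
```

Values are kept in lists because a computer with several adapters reports the same label more than once. To process a saved file, pass its contents, for example parse_ipconfig(open("ipconfig.txt").read()).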

Description of TCP/IP Settings

Here are the TCP/IP settings that are used in network troubleshooting:

• IP Address - Unique address assigned to a network adapter. A computer with multiple network adapters has an IP address for each one, and each one must be in a different subnet.

• Subnet Mask - Used in conjunction with the IP address to determine which subnet an adapter belongs to. At the simplest level, communication is only possible between two network adapters when they're in the same subnet.

• Default Gateway - IP address of a computer or router, on one of this computer's local area networks, that knows how to communicate with subnets not present on this computer. For an Internet connection, the default gateway is a router belonging to your Internet service provider, and all access to sites on the Internet goes through it. For an ICS client, the default gateway is the ICS host. If you use a hardware router, it serves as the default gateway.

• DHCP Server - If an adapter is configured to obtain an IP address automatically, this is the address of the server that provides it. It could be your ISP, an ICS host, or a hardware router.

• DNS Servers - IP address of one or more Domain Name Server computers. DNS servers translate Internet names to their IP addresses (like 63.146.109.227).
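How the IP address and subnet mask together decide whether two adapters share a subnet can be shown with Python's standard ipaddress module. This is a small sketch; the function name and the example addresses are chosen for illustration only.

```python
import ipaddress


def same_subnet(addr_a, addr_b, mask):
    """True when both addresses fall in the same subnet under the given mask."""
    net_a = ipaddress.ip_network(f"{addr_a}/{mask}", strict=False)
    net_b = ipaddress.ip_network(f"{addr_b}/{mask}", strict=False)
    return net_a == net_b


print(same_subnet("192.168.1.101", "192.168.1.123", "255.255.255.0"))
print(same_subnet("192.168.1.101", "192.168.2.5", "255.255.255.0"))
```

The first pair shares the 192.168.1.0 network and can talk directly; the second pair does not, so traffic between them would have to go through a gateway.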

Subnets

See our article on subnets for a brief description of how they work. For more details, see this Microsoft Knowledge Base article.

If two computers are supposed to be on the same subnet but aren't, something is wrong with the network hardware or software configuration. This is most likely to happen when one of them receives an IP address of 169.254.x.x, which indicates that:

• It's configured to obtain an IP address automatically.
• It couldn't find a DHCP server on the network to make the assignment.
• Windows assigned it an Automatic Private IP Address.
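A quick way to spot an automatically assigned private address in a list of settings is to test it against the 169.254.0.0/16 range. The helper name below is hypothetical; the check itself uses only the standard ipaddress module.

```python
import ipaddress

APIPA_RANGE = ipaddress.ip_network("169.254.0.0/16")


def is_apipa(address):
    """True when the address lies in the Automatic Private IP Address range."""
    return ipaddress.ip_address(address) in APIPA_RANGE


print(is_apipa("169.254.10.20"))
print(is_apipa("192.168.1.101"))
```

An address that passes this test usually means the computer asked for an address automatically and received no answer, so the next thing to check is the path to the DHCP server.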

See our article on Specific Networking Problems and Their Solutions for more information.

Pinging

The ping command is the basic tool for testing TCP/IP connectivity. It sends a special packet (called ICMP Echo) to a particular IP address and looks for a reply. If everything is working right, the reply comes back. If not, the ping times out in a few seconds. By default, the ping command repeats the process four times. Here's an example of an ICS client computer pinging a Windows XP Home Edition ICS host, using the host's IP address and its computer name.

When ping fails, you'll see one of these error messages:

• Request timed out - The IP address is valid, but there's no reply from it. If the IP address is on a local area network, the most likely cause is a firewall program blocking the ping.

• Unknown host <name> or Ping request could not find host <name> - The computer name doesn't exist on the local area network. Make sure that NetBIOS over TCP/IP is enabled.

• Destination host unreachable - The IP address isn't on a local area network, and the default gateway can't access it. Either there's no default gateway, its address is wrong, or it isn't functioning.
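The error messages above can be matched in saved ping output with a simple lookup. This sketch only restates the explanations given in the list; the function name and the exact wording of the hints are illustrative.

```python
PING_HINTS = [
    ("request timed out",
     "Address is valid but silent; on a LAN, suspect a firewall blocking ping."),
    ("unknown host",
     "Name not found; check that NetBIOS over TCP/IP is enabled."),
    ("could not find host",
     "Name not found; check that NetBIOS over TCP/IP is enabled."),
    ("destination host unreachable",
     "Gateway cannot reach the address; check the default gateway."),
]


def explain_ping_output(output):
    """Return a hint for the first known error message found, or None."""
    text = output.lower()
    for needle, hint in PING_HINTS:
        if needle in text:
            return hint
    return None


print(explain_ping_output("Ping request could not find host win98."))
```

Output that contains none of the known messages, such as a normal reply line, returns None.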

Pinging the Local Area Network

Here is a series of ping commands to use in finding where a problem occurs on a local area network. Run them in the order shown, and don't go on to the next command until all of the previous commands work properly. In this example:

• The computer being tested is named Winxp, with IP address 192.168.1.101.
• There's another computer on the network named Win98, with IP address 192.168.1.123.

Substitute the appropriate IP addresses and computer names for your network.

Command               Target                          What Ping Failure Indicates
ping 127.0.0.1        Loopback address                Corrupted TCP/IP installation
ping localhost        Loopback name                   Corrupted TCP/IP installation
ping 192.168.1.101    This computer's IP address      Corrupted TCP/IP installation
ping winxp            This computer's name            Corrupted TCP/IP installation
ping 192.168.1.123    Another computer's IP address   Bad hardware or NIC driver
ping win98            Another computer's name         NetBIOS name resolution failure
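The rule of running the commands in order and stopping at the first failure can be sketched as a small script. The targets below are the placeholder names and addresses from this example and must be replaced with your own. The ping helper relies on the exit status of the system ping command, which is an assumption: on some Windows versions a "Destination host unreachable" reply can still exit with status 0.

```python
import platform
import subprocess

# (target, what a failure indicates), in the order the checks should run.
CHECKS = [
    ("127.0.0.1", "Corrupted TCP/IP installation"),
    ("localhost", "Corrupted TCP/IP installation"),
    ("192.168.1.101", "Corrupted TCP/IP installation"),
    ("winxp", "Corrupted TCP/IP installation"),
    ("192.168.1.123", "Bad hardware or NIC driver"),
    ("win98", "NetBIOS name resolution failure"),
]


def system_ping(target):
    """Send one echo request with the system ping command; True on success."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    result = subprocess.run(
        ["ping", count_flag, "1", target],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def first_failure(checks, ping=system_ping):
    """Run the checks in order and return the first (target, meaning) that fails."""
    for target, meaning in checks:
        if not ping(target):
            return target, meaning
    return None
```

Calling first_failure(CHECKS) returns None when every ping succeeds. The ping argument can be replaced by any function that takes a target and returns True or False, which makes the logic easy to try without a network.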

To fix a corrupted TCP/IP installation on Windows XP, follow the steps in this Microsoft Knowledge Base article. For Windows 95/98/Me, un-install the TCP/IP protocol in Control Panel | Network, reboot, and re-install it. If that doesn't fix it, use this procedure on Windows 95 or 98.

Pinging the Internet

You can also use ping to find a problem with Internet access. Run these commands in the order shown, and don't go on to the next command until all of the previous commands work properly. Use the Default Gateway and DNS Server addresses that you got from the winipcfg or ipconfig /all command.

Command                  Target                What Ping Failure Indicates
ping w.x.y.z             Default Gateway       Default Gateway down
ping w.x.y.z             DNS Server            DNS Server down
ping w.x.y.z             Web site IP address   Internet service provider or web site down
ping www.something.com   Web site name         DNS Server down or web site down

REFERENCES

WWW.PRACTICALLYNETWORK.COM

WWW.COMPUTERHOPE.COM

WWW.WIKIPEDIA.ORG

THANK YOU

Verify connections / LEDs

Verify that the network cable is properly connected to the back of the computer. In addition, when checking the connection of the network cable, ensure that the LEDs on the network card are properly illuminated. For example, a network card with a solid green LED or light usually indicates that the card is either connected or receiving a signal. Note: generally, when the green light is flashing, this is an indication of data being sent or received.

If, however, the card does not have any lights or has orange or red lights, it is possible that either the card is bad, the card is not connected properly, or the card is not receiving a signal from the network.

If you are on a small or local network and have the capability of checking a hub or switch, verify that the cables are properly connected and that the hub or switch has power.

Adapter resources

If this is a new network card being installed into the computer, ensure that the card's resources are properly set and/or are not conflicting with any hardware in the computer.

Users who are using Windows 95, 98, ME, 2000, or XP: verify that Device Manager has no conflicts or errors. Additional help and information about Device Manager and resources can be found on our Device Manager page.

Adapter functionality

Verify that the network card is capable of pinging or seeing itself by using the ping command. Windows / MS-DOS users: ping the computer from an MS-DOS prompt. Unix / Linux variant users: ping the computer from the shell.

To ping the card or the localhost, type either

ping 127.0.0.1

or

ping localhost

This should show a listing of replies from the network card. If you receive an error, or if the transmission failed, it is likely that either the network card is not physically installed into the computer correctly or that the card is bad.

Protocol

Verify that the correct protocols are installed on the computer. Most networks today will utilize TCP/IP, but may also utilize or require IPX/SPX and NetBEUI.

Additional information and help with installing and reinstalling a network protocol can be found in document CH000470.

When the TCP/IP protocol is installed, unless a DHCP server or other computer assigns the IP address, the user must specify an IP address as well as a Subnet Mask. To do this, follow the below instructions.

1. Click Start / Settings / Control Panel.
2. Double-click the Network icon.
3. Within the configuration tab, double-click the TCP/IP protocol icon. Note: Do not click on the PPP or Dial-Up adapter; click on the network card adapter.
4. In the TCP/IP properties, click the IP address tab.
5. Select the option to specify an IP address.
6. Enter the IP address and Subnet Mask address. An example of such an address could be:

IP Address: 102.55.92.1
Subnet Mask: 255.255.255.192

7. When specifying these values, the computers on the network must all have the same Subnet Mask and have a different IP Address. For example, when using the above values on one computer, you would want to use an IP address of 102.55.92.2 on another computer and then specify the same Subnet Mask.

Firewall

If your computer network utilizes a firewall, ensure that all required ports are open. If possible, close the firewall software program or disconnect the computer from the firewall to ensure it is not causing the problem.

Additional time

In some cases, it may take a computer some additional time to detect or see the network. If, after booting the computer, you are unable to see the network, give the computer 2-3 minutes to detect the network. Windows users may also want to try pressing the F5 (refresh) key when in Network Neighborhood to refresh the network connections and possibly detect the network.

Additional troubleshooting

If, after following or verifying the above recommendations, you are still unable to connect or see the network, attempt one or more of the below recommendations.

If you have installed or are using TCP/IP as your protocol, you can attempt to ping another computer's IP address to verify that the computer is able to send and receive data. To do this, Windows or MS-DOS users must be at a prompt, and Linux / Unix variant users must open or be at a shell.

Once at the prompt, assuming that the address of the computer you wish to attempt to ping is 102.55.92.2, you would type:

ping 102.55.92.2

If you receive a response back from this address (and it is a different computer), this demonstrates that the computer is communicating over the network. If you are still unable to connect or see the network, it is possible that other issues may be present.

Another method of determining network issues is to use the tracert command if you are an MS-DOS or Windows user, or the traceroute command if you are a Linux / Unix variant user. To use this command, you must be at the command prompt or shell.

Once at the prompt, assuming that the address is again 102.55.92.2, type:

tracert 102.55.92.2

or

traceroute 102.55.92.2

This should begin listing the hops between the computer and network devices. When the connection fails, determine which device is causing the issue by reviewing the traceroute listing.
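Reviewing a long listing for the point of failure can be partly automated. The sketch below assumes the common output layout in which each hop line starts with the hop number and an unanswered hop shows three asterisks; the sample output and its addresses are invented. A silent hop is only a starting point, because some routers simply do not answer these probes.

```python
def first_silent_hop(trace_output):
    """Return the number of the first hop that answered no probe, else None."""
    for line in trace_output.splitlines():
        fields = line.split()
        if not fields or not fields[0].isdigit():
            continue  # skip the banner and blank lines
        if fields[1:4] == ["*", "*", "*"]:
            return int(fields[0])
    return None


# Hypothetical traceroute output with no answer at hop 3.
SAMPLE_TRACE = """\
traceroute to 192.0.2.10 (192.0.2.10), 30 hops max, 60 byte packets
 1  192.168.1.1 (192.168.1.1)  0.512 ms  0.498 ms  0.471 ms
 2  10.0.0.1 (10.0.0.1)  4.210 ms  4.188 ms  4.305 ms
 3  * * *
"""
print(first_silent_hop(SAMPLE_TRACE))
```

The same test works on Windows tracert output, where an unanswered hop also begins with the hop number followed by three asterisks.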

Network Troubleshooting Overview

These sections introduce you to the concepts and practice of network troubleshooting:

• Introduction to Network Troubleshooting
• Network Troubleshooting Framework
• Troubleshooting Strategy

Network troubleshooting means recognizing and diagnosing networking problems with the goal of keeping your network running optimally. As a network administrator, your primary concern is maintaining connectivity of all devices (a process often called fault management). You also continually evaluate and improve your network's performance. Because serious networking problems can sometimes begin as performance problems, paying attention to performance can help you address issues before they become serious.

About Connectivity Problems

Connectivity problems occur when end stations cannot communicate with other areas of your local area network (LAN) or wide area network (WAN). Using management tools, you can often fix a connectivity problem before users even notice it. Connectivity problems include:

• Loss of connectivity - When users cannot access areas of your network, your organization's effectiveness is impaired. Immediately correct any connectivity breaks.

• Intermittent connectivity - Although users have access to network resources some of the time, they are still facing periods of downtime. Intermittent connectivity problems can indicate that your network is on the verge of a major break. If connectivity is erratic, investigate the problem immediately.

• Timeout problems - Timeouts cause loss of connectivity, but are often associated with poor network performance.

About Performance Problems

Your network has performance problems when it is not operating as effectively as it should. For example, response times may be slow, the network may not be as reliable as usual, and users may be complaining that it takes them longer to do their work. Some performance problems are intermittent, such as instances of duplicate addresses. Other problems can indicate a growing strain on your network, such as consistently high utilization rates.

If you regularly examine your network for performance problems, you can extend the usefulness of your existing network configuration and plan network enhancements, instead of waiting for a performance problem to adversely affect the users' productivity.

Solving Connectivity and Performance Problems

When you troubleshoot your network, you employ tools and knowledge already at your disposal. With an in-depth understanding of your network, you can use network software tools, such as Ping, and network devices, such as Analyzers, to locate problems and then make corrections, such as swapping equipment or reconfiguring segments, based on your analysis.

Transcend® provides another set of tools for network troubleshooting. These tools have graphical user interfaces that make managing and troubleshooting your network easier. With Transcend Applications, you can:

• Baseline your network's normal status to use as a basis for comparison when the network operates abnormally
• Precisely monitor network events
• Be notified immediately of critical problems on your network, such as a device losing connectivity
• Establish alert thresholds to warn you of potential problems that you can correct before they affect your network
• Resolve problems by disabling ports or reconfiguring devices

Network Troubleshooting Framework

The International Organization for Standardization (ISO) Open Systems Interconnection (OSI) reference model is a common foundation for describing network communications. This seven-layer structure provides a clear picture of how network communications work.

Protocols (rules) govern communications between the layers of a single system and among several systems. In this way, devices made by different manufacturers or using different designs can use different protocols and still communicate.

By understanding how network troubleshooting fits into the framework of the OSI model, you can identify at what layer problems are located and which type of troubleshooting tools to use. For example, unreliable packet delivery can be caused by a problem with the transmission media or with a router configuration. If you are receiving high rates of FCS Errors and Alignment Errors, which you can monitor with Status Watch, then the problem is probably located at the physical layer and not the network layer. Figure 1 shows how to troubleshoot the layers of the OSI model.

Table 5 describes the data that the network management tools can collect as it relates to the OSI model layers.

Table 5 Network Data and the OSI Model Layers

Layer                       Data Collected                            Transcend NCS Tool Used
Application, Presentation,  Protocol information and other Remote     • LANsentry Manager
Session, Transport          Monitoring (RMON) and RMON2 data          • Traffix Manager (for more detail)
Network                     Routing information                       • Status Watch
                                                                      • LANsentry Manager (for more detail)
                                                                      • Traffix Manager (for more detail)
Data Link                   Traffic counts and other packet           • Status Watch
                            breakdowns                                • LANsentry Manager (for more detail)
Physical                    Error counts                              • Status Watch

Figure 1 OSI Reference Model and Network Troubleshooting

Troubleshooting Strategy

How do you know when you are having a network problem? The answer to this question depends on your site's network configuration and on your network's normal behavior. See Knowing Your Network for more information.

If you notice changes on your network, ask the following questions:

• Is the change expected or unusual?
• Has this event ever occurred before?
• Does the change involve a device or network path for which you already have a backup solution in place?
• Does the change interfere with vital network operations?
• Does the change affect one or many devices or network paths?

After you have an idea of how the change is affecting your network, you can categorize it as critical or noncritical. Both of these categories need resolution (except for changes that are one-time occurrences); the difference between the categories is the time that you have to fix the problem.

By using a strategy for network troubleshooting, you can approach a problem methodically and resolve it with minimal disruption to network users. It is also important to have an accurate and detailed map of your current network environment. Beyond that, a good approach to problem resolution is:

• Recognizing Symptoms
• Understanding the Problem
• Identifying and Testing the Cause of the Problem
• Solving the Problem

Recognizing Symptoms

The first step to resolving any problem is to identify and interpret the symptoms. You may discover network problems in several ways. Users may complain that the network seems slow or that they cannot connect to a server. You may pass your network management station and notice that a node icon is red. Your beeper may go off and display the message: WAN connection down.

User Comments

Although you can often solve networking problems before users notice a change in their environment, you invariably get feedback from your users about how the network is running, such as:

• They cannot print.
• They cannot access the application server.
• It takes them much longer to copy files across the network than it usually does.
• They cannot log on to a remote server.
• When they send e-mail to another site, they get a routing error message.
• Their system freezes whenever they try to Telnet.

Network Management Software Alerts

Network management software, as described in Your Network Troubleshooting Toolbox, can alert you to areas of your network that need attention. For example:

• The application displays red (Warning) icons.
• Your weekly Top-N utilization report (which indicates the 10 ports with the highest utilization rates) shows that one port is experiencing much higher utilization levels than normal.
• You receive an e-mail message from your network management station that the threshold for broadcast and multicast packets has been exceeded.

These signs usually provide additional information about the problem, allowing you to focus on the right area.
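
As a rough illustration of the Top-N report mentioned above, the following Python sketch ranks ports by utilization and flags any port that is far above its usual level. The port names, the sample figures, and the factor of 2 are assumptions made for the example; they are not taken from any Transcend report format.

```python
import heapq


def top_n_ports(utilization, n=10):
    """Return the n (port, utilization) pairs with the highest utilization."""
    return heapq.nlargest(n, utilization.items(), key=lambda item: item[1])


def unusual_ports(utilization, baseline, factor=2.0):
    """Return ports whose utilization exceeds `factor` times their baseline."""
    return sorted(
        port
        for port, value in utilization.items()
        if port in baseline and value > factor * baseline[port]
    )


# Hypothetical sample data: percent utilization per port.
sample = {"1/1": 12.0, "1/2": 85.0, "1/3": 9.5, "1/4": 30.0}
normal = {"1/1": 10.0, "1/2": 20.0, "1/3": 10.0, "1/4": 28.0}

print(top_n_ports(sample, n=2))
print(unusual_ports(sample, normal))
```

A report like this is only useful when the baseline figures were recorded while the network was behaving normally.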

Analyzing Symptoms

When a symptom occurs, ask yourself these types of questions to narrow the location of the problem and to get more data for analysis:

• To what degree is the network not acting normally (for example, does it now take one minute to perform a task that normally takes five seconds)?
• On what subnetwork is the user located?
• Is the user trying to reach a server, end station, or printer on the same subnetwork or on a different subnetwork?
• Are many users complaining that the network is operating slowly or that a specific network application is operating slowly?
• Are many users reporting network logon failures?
• Are the problems intermittent? For example, some files may print with no problems, while other printing attempts generate error messages, make users lose their connections, and cause systems to freeze.

Understanding the Problem

Networks are designed to move data from a transmitting device to a receiving device. When communication becomes problematic, you must determine why data are not traveling as expected and then find a solution. The two most common causes for data not moving reliably from source to destination are:

• The physical connection breaks (that is, a cable is unplugged or broken).
• A network device is not working properly and cannot send or receive some or all data.

Network management software can easily locate and report a physical connection break (layer 1 problem). It is more difficult to determine why a network device is not working as expected, which is often related to a layer 2 or a layer 3 problem.

To determine why a network device is not working properly, look first for:

• Valid service - Is the device configured properly for the type of service it is supposed to provide? For example, has Quality of Service (QoS), which is the definition of the transmission parameters, been established?
• Restricted access - Is an end station supposed to be able to connect with a specific device, or is that connection restricted? For example, is a firewall set up that prevents that device from accessing certain network resources?
• Correct configuration - Is there a misconfiguration of IP address, subnet mask, gateway, or broadcast address? Network problems are commonly caused by misconfiguration of newly connected or configured devices. See Manager-to-Agent Communication for more information.
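
The configuration check in the last item can be partly automated. The sketch below uses Python's standard `ipaddress` module to test whether an IPv4 address, subnet mask, gateway, and broadcast address are mutually consistent. The function name `check_ip_config` and the sample addresses are illustrative assumptions, and the check deliberately ignores special cases such as point-to-point /31 links.

```python
import ipaddress


def check_ip_config(address, netmask, gateway, broadcast):
    """Return a list of problems found; an empty list means the values agree."""
    try:
        iface = ipaddress.IPv4Interface(f"{address}/{netmask}")
        gw = ipaddress.IPv4Address(gateway)
        bcast = ipaddress.IPv4Address(broadcast)
    except ValueError as exc:
        # Covers malformed addresses and non-contiguous subnet masks.
        return [f"invalid value: {exc}"]

    network = iface.network
    problems = []
    if iface.ip in (network.network_address, network.broadcast_address):
        problems.append(f"{iface.ip} is the network or broadcast address of {network}")
    if gw not in network:
        problems.append(f"gateway {gw} is not on the local subnetwork {network}")
    if bcast != network.broadcast_address:
        problems.append(f"broadcast {bcast} should be {network.broadcast_address}")
    return problems


# Hypothetical example: the gateway is on a different subnetwork.
print(check_ip_config("192.168.1.10", "255.255.255.0", "192.168.2.1", "192.168.1.255"))
```

A consistent result does not prove that the device is configured correctly for your site; it only rules out the most common arithmetic mistakes.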

Identifying and Testing the Cause of the Problem

After you develop a theory about the cause of the problem, test your theory. The test must conclusively prove or disprove your theory.

Two general rules of troubleshooting are:

• If you cannot reproduce a problem, then no problem exists unless it happens again on its own.
• If the problem is intermittent and you cannot replicate it, you can configure your network management software to catch the event in progress.

For example, with LANsentry Manager, you can set alarms and automatic packet capture filters to monitor your network and inform you when the problem occurs again. See Configuring Transcend NCS for more information.

Although network management tools can provide a great deal of information about problems and their general location, you may still need to swap equipment or replace components of your network until you locate the exact trouble spot.

After you test your theory, either fix the problem as described in Solving the Problem or develop another theory.

Sample Problem Analysis

This section illustrates the analysis phase of a typical troubleshooting incident.

On your network, a user cannot access the mail server. You need to establish two areas of information:

• What you know - In this case, the user's workstation cannot communicate with the mail server.
• What you do not know and need to test:
   • Can the workstation communicate with the network at all, or is the problem limited to communication with the server? Test by sending a Ping or by connecting to other devices.
   • Is the workstation the only device that is unable to communicate with the server, or do other workstations have the same problem? Test connectivity at other workstations.
   • If other workstations cannot communicate with the server, can they communicate with other network devices? Again, test the connectivity.

The analysis process follows these steps:

1. Can the workstation communicate with any other device on the subnetwork?
   • If no, then go to step 2.
   • If yes, determine if only the server is unreachable.
      • If only the server cannot be reached, this suggests a server problem. Confirm by doing step 2.
      • If other devices cannot be reached, this suggests a connectivity problem in the network. Confirm by doing step 3.

2. Can other workstations communicate with the server?
   • If no, then most likely it is a server problem. Go to step 3.
   • If yes, then the problem is that the workstation is not communicating with the subnetwork. (This situation can be caused by workstation issues or a network issue with that specific station.)

3. Can other workstations communicate with other network devices?
   • If no, then the problem is likely a network problem.
   • If yes, the problem is likely a server problem.
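
One reading of steps 2 and 3 can be written as a small decision function. This is only a sketch: the function name `locate_problem` and its return values are invented for illustration, and step 1 (whether the workstation can reach anything at all) is left out because, in this reading, it refines the diagnosis rather than changing which area is suspected.

```python
def locate_problem(others_reach_server, others_reach_other_devices):
    """Classify the fault area from two connectivity test results.

    others_reach_server: can other workstations reach the server? (step 2)
    others_reach_other_devices: can other workstations reach other
        network devices? (step 3)
    """
    if others_reach_server:
        # The server and the path to it work for everyone else.
        return "workstation"
    if others_reach_other_devices:
        # The network carries traffic, but nobody reaches the server.
        return "server"
    # Nobody reaches the server or anything else.
    return "subnetwork"


print(locate_problem(others_reach_server=False, others_reach_other_devices=True))
```

In practice each input would come from a Ping or a connection attempt made at a second workstation, as described in the steps above.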

When you determine whether the problem is with the server, subnetwork, or workstation, you can further analyze the problem as follows:

• For a problem with the server - Examine whether the server is running, if it is properly connected to the network, and if it is configured appropriately.
• For a problem with the subnetwork - Examine any device on the path between the users and the server.
• For a problem with the workstation - Examine whether the workstation can access other network resources and if it is configured to communicate with that particular server.

Equipment for Testing

To help identify and test the cause of problems, have available:

• A laptop computer that is loaded with a terminal emulator, TCP/IP stack, TFTP server, CD-ROM drive (to read the online documentation), and some key network management applications, such as LANsentry® Manager. With the laptop computer, you can plug into any subnetwork to gather and analyze data about the segment.
• A spare managed hub to swap for any hub that does not have management. Swapping in a managed hub allows you to quickly spot which port is generating the errors.
• A single port probe to insert in the network if you are having a problem where you do not have management capability.
• Console cables for each type of connector, labeled and stored in a secure place.

Solving the Problem

Many device or network problems are straightforward to resolve, but others yield misleading symptoms. If one solution does not work, continue with another.

A solution often involves:

• Upgrading software or hardware (for example, upgrading to a new version of agent software or installing Gigabit Ethernet devices)
• Balancing your network load by analyzing:
   • What users communicate with which servers
   • What the user traffic levels are in different segments
  Based on these findings, you can decide how to redistribute network traffic.
• Adding segments to your LAN (for example, adding a new switch where utilization is continually high)
• Replacing faulty equipment (for example, replacing a module that has port problems or replacing a network card that has a faulty jabber protection mechanism)

To help solve problems, have available:

• Spare hardware equipment (such as modules and power supplies), especially for your critical devices
• A recent backup of your device configurations to reload if flash memory gets corrupted (which can sometimes happen due to a power outage)

Your Network Troubleshooting Toolbox

A robust network troubleshooting toolbox consists of items (such as network management applications, hardware devices, and other software) to recognize, diagnose, and solve networking problems. It contains:

• Transcend Applications
• Network Management Platforms
• 3Com SmartAgent Embedded Software
• Other Commonly Used Tools

Transcend management software is optimized for managing 3Com devices and their attached networks. However, some applications, such as LANsentry Manager, can manage any vendor's networking equipment that complies with the Remote Monitoring (RMON) Management Information Base (MIB).

This section describes these Transcend applications, which you can use to troubleshoot your network:

• Transcend Central
• Status Watch
• Address Tracker
• LANsentry Manager
• Traffix Manager
• Device View

Transcend Central

Start with Transcend Central, which is an asset management and device grouping application, to understand what your network consists of and to control the Transcend NCS network management troubleshooting tools. Transcend Central is available as both a native Windows application and a Java application that you can access using a Web browser.

Using Transcend Central for troubleshooting, you can:

• Display an inventory of device, module, and port information.
• Group devices to make your troubleshooting tasks easier. By managing a collection of devices, you can simultaneously perform the same tasks on each device in a group and locate physical or logical problems on your network.
• Launch Transcend NCS applications, including some of your primary Transcend NCS troubleshooting tools:
   • Status Watch, which includes Web Reporter (from the Java version)
   • Address Tracker
   • LANsentry Manager
   • Traffix Manager
   • Device View

Status Watch

The Status Watch applications manage 3Com devices and their attached networks. Status Watch applications primarily poll for MIB-II data. This is a performance monitoring application that allows you to monitor the operational status of your network devices and quickly identify any problems that require your attention. It works in conjunction with Web Reporter.

Web Reporter

Web Reporter is a data-reporting application that runs in a World Wide Web (WWW) browser. It generates reports from data that Status Watch collects, allowing you to compare network statistics against a baseline.

Address Tracker

Address Tracker is an address collection and discovery application that:

• Polls managed devices for all MAC addresses
• Polls managed devices and routers for IP addresses to perform MAC-to-IP address translation
• Uses Device View to disable troublesome ports

LANsentry Manager

LANsentry Manager is a set of integrated applications that displays and explores the real-time and historical data that RMON-compliant devices (probes) on the network capture. LANsentry Manager uses SNMP polling to gather RMON and RMON2 data from the probes.

Use LANsentry Manager to:

• Monitor current performance of network segments
• See trends over time
• Spot signs of current problems
• Configure alarms to monitor for specific events
• Capture packets and display their contents

LANsentry Manager works with any device (from 3Com or other vendors) that supports the RMON MIB or the RMON2 MIB.

Traffix Manager

Traffix Manager is a performance-monitoring application that provides information about layer 2 (RMON) and layer 3 conversations between nodes. It helps you to assess traffic patterns on your network. Traffix Manager:

• Monitors all the stations that the RMON2-compliant probes encounter on your network
• Captures and stores RMON and RMON2 data for your network's protocols and applications
• Displays traffic between stations in user-defined views of the network
• Graphs current or historical data on the devices selected
• Delivers reports for user-specified stations and time periods as PostScript to your printer or as HTML to your Web server
• Launches LANsentry Manager tools for in-depth analysis of a station or a conversation between stations

You can use Traffix Manager to:

• Know your network - Understand overall flow patterns and interactions between systems, and determine how your network is really being used at the application level.
• Optimize your network - Gain an insight into traffic and application usage trends to help you optimize the use and placement of current network resources and make wise decisions about capacity planning and network growth.

Traffix Manager works with any device (from 3Com or other vendors) that supports the RMON2 MIB.

Device View

The Device View application is a device configuration tool. When you troubleshoot your network, you can use Device View to determine or change a device's configuration. You can also use Device View to look at a device's statistics and to set alarms.

Device View manages only 3Com devices.

Network Management Platforms

As part of your troubleshooting toolbox, your network management platform is the first place to go to view the overall health of your network. With the platform, you can understand the logical configuration of your network and configure views of your network to understand how devices work together and the role that they play in the users' work. The network management platform that supports your Transcend software installation can provide valuable troubleshooting tools. Transcend runs on several platforms within the NT and UNIX environments.

The platform discovers the devices. Transcend imports that information from the platform to populate the core database. Unless you are rediscovering, the user must manually update the platform.

Using this device database, a map displays the graphical representation of your network. Each device on your network appears as a symbol (icon) on the map. You can configure views of your network to show devices on the same subnetworks or floors.

You can monitor network performance and diagnose network performance and connectivity problems. You can also:

• Take a snapshot of your network in its normal state. The snapshot records the state of your network at a particular instant. If you later have network performance problems, you can compare the current state of your network to the snapshot.
• Quickly determine the connectivity status of a device by noting the color of its map symbol. Red usually means that communication with a device has ceased.
• Diagnose connectivity problems by determining whether two devices can communicate. If they can communicate, then examine the route between the devices, the number of packets that were sent and lost, and the roundtrip time between the two devices.
• Manage MIB information (for example, collecting and storing MIB data for trend analysis and graphing) using MIB queries. Transcend compiles MIBs and allows you to navigate up and down the MIB Tree to retrieve MIB objects from devices. You can set thresholds for MIB data and generate events when a threshold is exceeded.
• Configure the software to act on certain events. The Event Categories window informs you of any unexpected events (which arrive in the form of traps).
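
The snapshot comparison described in the first item can be sketched in a few lines of Python. The metric names, sample values, and the 25 percent tolerance below are assumptions chosen for illustration; a management platform stores and compares its snapshots in its own way.

```python
def compare_to_baseline(baseline, current, tolerance=0.25):
    """Return {metric: (baseline_value, current_value)} for every metric
    that moved more than `tolerance` (as a fraction) away from its baseline."""
    deviations = {}
    for metric, base in baseline.items():
        if metric not in current:
            continue
        now = current[metric]
        if base == 0:
            # A relative change is undefined, so flag any nonzero value.
            changed = now != 0
        else:
            changed = abs(now - base) / abs(base) > tolerance
        if changed:
            deviations[metric] = (base, now)
    return deviations


# Hypothetical snapshot taken in the normal state, and a later reading.
snapshot = {"utilization_pct": 20.0, "round_trip_ms": 4.0, "packets_lost": 0}
today = {"utilization_pct": 22.0, "round_trip_ms": 9.0, "packets_lost": 3}

print(compare_to_baseline(snapshot, today))
```

Here the round-trip time and the packet loss stand out, while the small rise in utilization stays within the tolerance.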

For more information, see the documentation that is shipped with your software.

3Com SmartAgent Embedded Software

Traditional Simple Network Management Protocol (SNMP) management places the burden of collecting network management information on the management station. In this traditional model, software agents collect information about throughput, record errors or packet overflows, and measure performance based on established thresholds. Through a polling process, agents pass this information to a centralized network management station whenever they receive an SNMP query. Management applications then make the data useful and alert the user if there are problems on the device.

As a useful companion to traditional network management methods, 3Com's SmartAgent® technology places management intelligence into the software agent that runs within a 3Com device. This scalable solution reduces the amount of computational load on the management station and helps minimize management-related network traffic.

SmartAgent software, which uses the RMON MIB, is self-monitoring, collecting and analyzing its own statistical, analytical, and diagnostic data. In this way, you can conduct network management by exception - that is, you are notified only if a problem occurs. Management by exception is unlike traditional SNMP management, in which the management software collects all data from the device through polling.
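
The difference between polling and management by exception can be illustrated with a short Python sketch. It is not SmartAgent code: the class name `ExceptionReporter`, the threshold value, and the sample readings are all invented for the example. The agent looks at every sample locally but contacts the management station only when the value crosses the threshold.

```python
class ExceptionReporter:
    """Watch a counter locally and notify only when a threshold is crossed."""

    def __init__(self, threshold, notify):
        self.threshold = threshold
        self.notify = notify      # callable that delivers a message
        self.above = False        # remembers the last reported state

    def sample(self, value):
        if value > self.threshold and not self.above:
            self.above = True
            self.notify(f"threshold {self.threshold} exceeded: {value}")
        elif value <= self.threshold and self.above:
            self.above = False
            self.notify(f"back below threshold {self.threshold}: {value}")


# Hypothetical readings, for example broadcast frames per second.
events = []
reporter = ExceptionReporter(threshold=100, notify=events.append)
for reading in [10, 50, 120, 130, 80, 90]:
    reporter.sample(reading)

print(events)
```

Six samples produce two notifications. A polling station would have had to fetch all six readings over the network to learn the same thing.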

SmartAgent software works autonomously and reports to the network management station whenever an exceptional network event occurs. The software can also take direct action without involving the management station. Devices that contain SmartAgent software may be able to:

• Perform broadcast throttling to minimize the flow of broadcast traffic on your network
• Monitor the ratio of good frames to bad frames
• Switch a resilient link pair to the standby path if the primary path corrupts frames
• Report if traffic on vital segments drops below minimum usage levels
• Disable a port for five seconds to clear problems and then automatically reconnect it

The Transcend NCS applications LANsentry Manager and Traffix Manager make the RMON data that the SmartAgent software collects more usable by summarizing and correlating important information.

Other Commonly Used Tools

These commonly used tools can also help you troubleshoot your network:

• Network software, such as Ping, Telnet, and FTP and TFTP. You can use these applications to troubleshoot, configure, and upgrade your system.
• Network monitoring devices, such as Analyzers and Probes
• Tools, such as Cable Testers, for working on physical problems

Ping

Packet Internet Groper (Ping) allows you to quickly verify the connectivity of your network devices. Ping attempts to transmit a packet from one device to a station on the network and listens for the response to ensure that it was correctly received. You can validate connections on the parts of your network by pinging different devices:

• A successful response indicates that a valid network path exists between your station and the remote host and that the remote host is active.
• Slower response times than normal can indicate that the path is congested or obstructed.
• A failed response indicates that a connection is broken somewhere; use the message to help locate the problem. See Tips on Interpreting Ping Messages.

Strategies for Using Ping

Follow these strategies for using Ping:

• Ping devices when your network is operating normally so that you have a performance baseline for comparison. See Identifying Your Network's Normal Behavior for more information.
• Ping by IP address when:
   • You want to test devices on different subnetworks. This method allows you to Ping your network segments in an organized way, rather than having to remember all the hostnames and locations.
   • Your Domain Name System (DNS) server is down and your system cannot look up host names properly. You can Ping with IP addresses even if you cannot access hostname information.
• Ping by hostname when you want to identify DNS server problems.
• To troubleshoot problems that involve large packet sizes, Ping the remote host repeatedly, increasing the packet size each time.
• To determine if a link is erratic, perform a continuous Ping (using ping -s on UNIX), which indicates the time that it takes the device to respond to each Ping.
• To determine a route taken to a destination, use the trace route function (tracert on Windows, traceroute on UNIX).
• Consider creating a Ping script that periodically sends a Ping to all necessary networking devices. If a Ping failure message is received, the script can perform some action to notify you of the problem, such as paging you.
• Use the Ping functions of your network management platform. For example, in your HP OpenView map, select a device and click the right mouse button to gain access to ping functions.
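
The Ping script suggested above might look like the following Python sketch. It calls the operating system's own ping command once per device and reports the devices that did not answer. The function names, the two-second timeout, and the use of `print` as the notification action are assumptions for illustration. Option letters and exit-status conventions for ping vary between platforms (for example, some systems report success even when the reply is an unreachable message), so treat the result as a first indication only.

```python
import platform
import subprocess


def ping_once(host, timeout_s=2):
    """Send one echo request with the system ping command.

    Returns True if the command reports success, False otherwise.
    """
    # Windows uses -n for the packet count; most UNIX-like systems use -c.
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    try:
        result = subprocess.run(
            ["ping", count_flag, "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0


def check_devices(hosts, ping=ping_once, notify=print):
    """Ping every host and call notify() for each one that fails.

    `ping` and `notify` are injectable so the logic can be tried without
    touching the network; replace `notify` with a pager or e-mail hook.
    """
    failed = [host for host in hosts if not ping(host)]
    for host in failed:
        notify(f"Ping failed: {host}")
    return failed
```

To run the check periodically, call `check_devices` with your own list of addresses from a scheduler such as cron, or from a loop that sleeps between passes.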

Tips on Interpreting Ping Messages

Use the following ping failure messages to troubleshoot problems:

No reply from <destination>

Indicates that the destination routes are available but that there is a problem with the destination itself.

<destination> is unreachable

Indicates that your system does not know how to get to the destination. This message means either that routing information to a different subnetwork is unavailable or that a device on the same subnetwork is down.

ICMP host unreachable from gateway

Indicates that your system can transmit to the target address using a gateway but that the gateway cannot forward the packet properly, because either a device is misconfigured or the gateway is not operating.
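
A script can apply the same interpretation to captured ping output. The sketch below matches only the three message forms listed above; the function name and the category labels are invented for the example, and the exact wording that ping prints differs between operating systems and versions, so the patterns would need adjusting for a real deployment.

```python
def interpret_ping_failure(message):
    """Map a ping failure message to the area that the text above points to."""
    text = message.lower()
    # Check the gateway form first: it also contains the word "unreachable".
    if "host unreachable from gateway" in text:
        return "gateway"       # gateway cannot forward the packet
    if "no reply from" in text:
        return "destination"   # routes are available, destination has a problem
    if "is unreachable" in text:
        return "routing"       # no known route, or a local device is down
    return "unknown"


print(interpret_ping_failure("No reply from 10.0.0.5"))
```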

Telnet

Telnet, which is a login and terminal emulation program for Transmission Control Protocol/Internet Protocol (TCP/IP) networks, is a common way to communicate with an individual device. You log in to the device (a remote host) and use that remote device as if it were a local terminal.

If you have established an out-of-band Telnet connection with a device, you can use Telnet to communicate with that device even if the network is unavailable. This feature makes Telnet one of the most frequently used network troubleshooting tools. Usually, all device statistics and configuration capabilities are accessible by using Telnet to connect to the device's console. For more information about setting up an out-of-band connection, see Using Telnet, Serial Line, and Modem Connections.

You can invoke the Telnet application on your local system and set up a link to a Telnet process that is running on a remote host. You can then run a program that is located on a remote host as if you were working at the remote system.

FTP and TFTP

Most network devices support either the File Transfer Protocol (FTP) or the Trivial File Transfer Protocol (TFTP) for downloading updates of system software. Updating system software is often the solution to networking problems that are related to agent problems. Also, new software features may help correct a networking problem.

FTP provides flexibility and security for file transfer by:

• Accepting many file formats, such as ASCII and binary
• Using data compression
• Providing Read and Write access so that you can display, create, and delete files and directories
• Providing password protection

TFTP is a simple version of FTP that does not list directories or require passwords. TFTP only transfers files to and from a remote server.

Analyzers

An analyzer, which is often called a Sniffer, is a network device that collects network data on the segment to which it is attached, a process called packet capturing. Software on the device analyzes this data, which is a process referred to as protocol analysis. Most analyzers can interpret different types of protocol traffic, such as TCP/IP, AppleTalk, and Banyan VINES traffic.

You usually use analyzers for reactive troubleshooting - when you see a problem somewhere on your network, you attach an analyzer to capture and interpret the data from that area. Analyzers are particularly helpful for identifying intermittent problems. For example, if your network backbone has experienced moments of instability that prevent users from logging on to the network, you can attach an analyzer to the backbone to capture the intermittent problems when they happen again.

Probes

Like an analyzer, a probe is a network device that collects network data. Depending on its type, a probe can collect data from multiple segments simultaneously. It stores the collected data and transfers the data to an analysis site when requested. Unlike an analyzer, a probe does not interpret data.

A probe can be either a stand-alone device or an agent in a network device. The Transcend Enterprise Monitor 500 series and the SuperStack® II Monitor series are stand-alone RMON probes. LANsentry Manager and Traffix Manager use data from probes that comply with the RMON MIB or the RMON2 MIB.

You can use a probe daily to determine the health of your network. The Transcend NCS applications can interpret and report this data, alerting you to possible problems so that you can proactively manage your network. For example, an RMON2 probe can help you to analyze traffic patterns on your network. Use this data to make decisions about reconfiguring devices and end stations as needed.

Cable Testers

Cable testers examine the electrical characteristics of the wiring. They are most commonly used to ensure that building wiring and cables meet Category 5, 4, and 3 standards. For example, network technologies such as Fast Ethernet require the cabling to meet Category 5 requirements. Testers are also used to find defective and broken wiring in a building.

Knowing Your Network

You can better troubleshoot the problems on your network by:

• Knowing Your Network's Configuration
• Identifying Your Network's Normal Behavior

Knowing Your Network's Configuration

Part of understanding your network is knowing its physical and logical configuration. You should know:

• Which devices are on your network
• How the devices are configured
• Which devices are attached to the backbone
• Which devices connect your network to the outside world (WAN)

To keep track of your network's configuration, gather the following information:

• Site Network Map
• Logical Connections
• Device Configuration Information
• Other Important Data About Your Network

This data, when kept up-to-date, is extremely helpful for locating information when you experience network or device problems.

Site Network Map

A network map helps you to:

• Know exactly where each device is physically located
• Easily identify the users and applications that are affected by a problem
• Systematically search each part of your network for problems

You can create a network map using any drawing or flow chart application. Store your network map online. In addition, make sure that you always have a current version on paper in case you cannot access the online version. Figure 8 shows an example of a network map of 3Com devices.

Figure 8 Example of a Site Network Map

Consider including the following information on your network map:

• Location of important devices and workgroups (by floor, building, or area)
• Location of the network backbone, data center, and wiring closets, as appropriate for your network
• Location of your network management stations
• Location and type of remote connections
• IP subnetwork addresses for all managed switches and hubs
• Other subnetwork addresses, such as Novell IPX and AppleTalk, if appropriate for your network
• Type of media (by actual name, such as 10BASE-T, or by grouping, such as Ethernet), which you can show with callouts, colors, line weights, or line styles
• Virtual workgroups, which you can show with colors or shaded areas
• Redundant links, which you can indicate with gray or dashed lines
• Types of network applications that are used in different areas of your network
• Types of end stations that are connected to the switches and hubs

Logical Connections


With the advent of virtual LANs (VLANs), you need to know how your devices are connected logically as well as physically. For example, if you have connected two devices through the same physical switch, you might assume that they can communicate with each other. However, the devices can be in separate VLANs that restrict their communication.

Knowing the setup of your VLANs can help you to quickly narrow the scope of a problem to a VLAN instead of to a network connection.

The Transcend NCS application Enterprise VLAN Manager allows you to view the logical makeup of your network. Depending on the complexity of your network and VLAN configurations, you can use colors to show the VLANs graphically on your network map.

Device Configuration Information

Maintain online and paper copies of device configuration information. Make sure that all online data is stored with your site's regular data backup. If your site does not have a backup system, copy the information onto a backup disc (CD, Zip disk, and the like) and store it offsite.

Follow these guidelines for saving configuration information:

• Because the easiest way to recover a device's configuration is to use FTP or TFTP, save the configuration settings of each device that supports this method of uploading.

• For other devices, Telnet in and save the session (which contains configuration details) to a file. If you cannot print the configuration of a device, then create a quick rebuild guide that explains the quickest way to configure the device from a fresh install.

• For devices that store information to diskette, store this data as part of your site's regular backup.

• For routers and other important devices with text configuration files, store this data online in a revision control system. Keep the most recent version on paper. Keep previous versions.

• For PCs, keep a recovery disk for each type of PC. For any device that you use as a server, store all startup scripts and copies of registries.

Other Important Data About Your Network

For a complete picture of your network, have the following information available:

• All passwords - Store passwords in a safe place. Keep previous passwords in case you restore a device to a previous software version and need to use the old password that was valid for that version.

• Device inventory - The inventory allows you to see the device type, IP address, ports, MAC addresses, and attached devices at a glance. Software tools such as Transcend Central can help you keep track of the 3Com devices on your network. Using Transcend Central, you can group devices by type and location and have this information on hand for troubleshooting.

• MAC address-to-port number list - If your hubs or switches are not managed, you must keep a list of the MAC addresses that correlate to the ports on your hubs and switches. Generate and keep a paper copy of this list, which is crucial for deciphering captured packets using Address Tracker.

• Log book - Document your interactions, no matter how trivial, with each device that is critical to your network's operation (that is, routers, remote access devices, and security servers). For example, document that you noticed a fan making noise one morning. Your note may help you to identify why a device is over temperature a week later (because the fan stopped working).

• Change control - Maintain a change control system for all critical systems. Permanently store change control records.

• Contact details - Store, online and on paper, the details of all support contracts, support numbers, engineer details, and telephone and fax numbers.

• LANsentry Reporter - Use LANsentry Reporter to generate reports from the database.
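The MAC address-to-port number list mentioned above can be kept in any form that is easy to search. As a minimal sketch in Python (a language this guide does not itself use), the following normalizes MAC addresses written in different notations so that a lookup succeeds however a capture tool prints them. The device names, port numbers, and addresses are hypothetical.

```python
import string

def normalize_mac(mac):
    """Return a MAC address as lowercase, colon-separated hex pairs."""
    digits = "".join(c for c in mac if c in string.hexdigits).lower()
    if len(digits) != 12:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))

# Hypothetical entries: MAC address -> (unmanaged hub or switch, port number)
raw_entries = {
    "00-A0-24-12-34-56": ("hub-floor2", 3),
    "0800.2b7c.9a01": ("switch-lab", 11),
}
mac_to_port = {normalize_mac(mac): port for mac, port in raw_entries.items()}

def find_port(mac):
    """Look up where a MAC address is attached, or None if it is not listed."""
    return mac_to_port.get(normalize_mac(mac))
```

A paper copy can still be printed from the same data, so the list stays usable when the online version is unreachable.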

Identifying Your Network's Normal Behavior

By monitoring your network over a long period, you begin to understand its normal behavior. You begin to see a pattern in the traffic flow, such as which servers are typically accessed, when peak usage times occur, and so on. If you are familiar with your network when it is fully operational, you can be more effective at troubleshooting problems that arise.

Baselining Your Network

You can use a baseline analysis, which is an important indicator of overall network health, to identify problems. A baseline can serve as a useful reference of network traffic during normal operation, which you can then compare to captured network traffic while you troubleshoot network problems. A baseline analysis speeds the process of isolating network problems.

By running tests on a healthy network, you compile normal data to compare against the results that you get when your network is in trouble. For example, ping each node to discover how long it typically takes you to receive a response from devices on your network.
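The ping comparison can be sketched in a few lines of Python. This is only an illustration: the sample response times are hypothetical, and the rule of flagging anything more than three standard deviations above the healthy mean is an assumption, not a threshold taken from this guide.

```python
from statistics import mean, stdev

def build_baseline(samples_ms):
    """Summarize round-trip times (ms) collected while the network is healthy."""
    return {"mean": mean(samples_ms), "stdev": stdev(samples_ms)}

def is_abnormal(rtt_ms, baseline, k=3.0):
    """Flag a response time more than k standard deviations above the mean.

    The factor k=3.0 is an assumed default; tune it for your own network.
    """
    return rtt_ms > baseline["mean"] + k * baseline["stdev"]

# Hypothetical ping times to one node, recorded during normal operation
healthy = [1.9, 2.1, 2.0, 2.2, 1.8]
baseline = build_baseline(healthy)
```

During troubleshooting, a call such as `is_abnormal(9.5, baseline)` compares a fresh measurement against the stored baseline for that node.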

Applications such as Status Watch, Address Tracker, LANsentry Manager, and Traffix Manager allow you to collect days and weeks of data and set a baseline for comparison. Through the reporting mechanisms in the following list, you can continuously assess the data from your network and ensure that its performance is optimal:

• Web Reporter generates daily or weekly reports from data collected by Status Watch.

• Traffix Manager generates weekly reports from collected data and calculates the baselines for you. Set up Utilization History and Error History reports with data resolution set to Weekly.

• LANsentry Manager History View generates daily utilization graphs, which are sampled every 30 minutes for each day over one week. Use these graphs to calculate your network baselines manually.

Identifying Background Noise

Know your network's background noise so that you can recognize real data flow. For example, one evening after everyone is gone, when no backups are running and most nodes are on, analyze the traffic on your network using the Traffix Manager application. The traffic that you see is mostly broadcast and multicast packets. Any errors that you see are the result of faulty devices (trace). This traffic is the background noise of your network: traffic that occurs for little value. If background noise is high, redesign your network.


FDDI Connectivity

Use these sections to identify and correct connectivity errors on an FDDI ring:

• FDDI Connectivity Overview
• Monitoring FDDI Connections
• Making Your FDDI Connections More Resilient

Fiber Distributed Data Interface (FDDI), which is a self-correcting technology, automatically corrects ring faults to maintain connectivity throughout most of the network. However, you should monitor your FDDI connections for wrapped rings and other problems with ring connectivity.

Understanding the Problem

As shown in Figure 9, in a thru FDDI LAN, no stations on the trunk ring have a Configuration State (SMTConfigurationState) of Wrap or Isolated. However, users who complain about network performance may have lost connectivity to other stations on the network because the FDDI network is wrapped or segmented.

Figure 9 Thru Ring

Wrapped ring

By monitoring the Peer Wrap Condition, you can see when the Configuration State changes. In a wrapped ring (Figure 10), two stations on the LAN are in a wrapped Configuration State. This condition may or may not affect the connectivity of certain stations. Although operational, your network may have a cabling problem or a problem with a link.

Figure 10 Wrapped LAN

Segmented ring

In a segmented ring (Figure 11), more than two stations are wrapped on the trunk ring. Although this mode of operation is a valid FDDI LAN configuration, your LAN is probably experiencing a degraded or degrading condition.

Figure 11 Segmented Ring
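The distinction between thru, wrapped, and segmented rings comes down to counting the stations that report a wrapped Configuration State. The following Python sketch illustrates that rule. The state labels ("thru", "wrap_a", "wrap_b", "isolated") are hypothetical stand-ins for whatever your management tool reports, and treating a single wrapped station as a wrapped ring is an assumption made to keep the example short.

```python
def classify_ring(config_states):
    """Classify a trunk ring from the configuration state of each station."""
    states = [s.lower() for s in config_states]
    wrapped = sum(1 for s in states if "wrap" in s)
    if wrapped > 2:
        return "segmented"         # more than two wrap points
    if wrapped > 0:
        return "wrapped"           # normally exactly two wrap points
    if "isolated" in states:
        return "isolated station"  # no wrap points, but a station is off the ring
    return "thru"

# Hypothetical poll of four trunk ring stations
print(classify_ring(["thru", "wrap_a", "wrap_b", "thru"]))
```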

When a network connection has excessively high link errors, Station Management (SMT) shuts down the connection and tries to reconnect. A dual-attachment trunk ring station with an A or B connection that is shut down is one of the wrap points in the network. See Making Your FDDI Connections More Resilient for information about keeping a dual-attachment station connection from wrapping.

Isolated station

Sometimes a network wraps a particular station out of the ring. Stations on either side of a problem station can be wrapped. This effectively isolates the station or links that have problems, as shown in Figure 12.

Figure 12 Wrapped Ring with Isolated Station

If a ring was already wrapped when a network wraps a station out of the ring, then a segmented ring results, as shown in Figure 13.

Figure 13 Segmented Ring with Isolated Stations

Twisted ring

In a twisted ring, an A port is connected to an A port and a B port is connected to a B port, instead of the normal A-to-B connections. A twisted ring, which always has two twist points (stations), can exist in either a Thru or Wrap state. You can monitor the Twisted Ring Condition and Undesired Connection Attempt Event for evidence of twisted ring and other connection problems.

Identifying the Problem

To identify the problem, follow this process:

1. At the FDDI LAN level, verify that your network is operating.

If the network is operating, the FDDI ring may be segmented, and therefore an FDDI station or an Ethernet station on an Ethernet link may have lost connectivity to other nodes on the network.

2. Determine if a ring is in a Thru, Wrap, or Segmented state.

If the FDDI ring is segmented or wrapped, look for a problem with a link somewhere in the network or for a nonfunctioning node on your trunk ring. If the ring is operating and is not segmented, or if it is segmented but you still have connectivity to the stations in question, move to a more specific level in your network.

See Monitoring FDDI Connections for more information.

3. Determine if the poorly performing station is an Ethernet or FDDI station.

If the problem is an FDDI station, find out if it is congested (that is, if the station is so busy that it cannot accept all the network traffic that is directed to it) by determining its Bandwidth Utilization. Also determine if the station has a high frame error rate by looking at the FDDI Ring Errors.

If the problem is an Ethernet station, look for congestion by examining Ethernet Packet Loss and Bandwidth Utilization.

Solving the Problem

Identify the station that is causing the disconnection and take the appropriate steps:

• If the disconnection is caused by a wrapped ring, then fix the hardware or cabling problem at that station.

• If the station is congested, you have a device problem rather than a network problem. For example, if the congested station is a file server and every other machine on the network is retrieving and saving files using that server, consider upgrading your server or adding servers to the network. A variety of devices from different vendors may be communicating on an FDDI or Ethernet network; some are faster and more capable, and some are slower and more prone to congestion.

• If the station is an Ethernet station that is attached to an Ethernet segment, reevaluate the setup of your Ethernet network and make some changes to improve its performance.

You can also make FDDI connections more resilient by implementing dual homing or installing an Optical Bypass Unit (OBU) where FDDI connections are prone to fail. See Making Your FDDI Connections More Resilient for more information.


Monitoring FDDI Connections

Monitor your FDDI devices for Warning or Critical alerts in the FDDI Status tool.

Status Watch

Use Status Watch to identify these FDDI connectivity errors:

• Peer Wrap Condition
• Twisted Ring Condition
• Undesired Connection Attempt Event

Follow these steps:

1. In the Device area, select the device that is located where you suspect an FDDI ring connectivity problem.

2. Monitor the FDDI Status tool for the currently selected device.

Here are some pointers for monitoring:

• If the Peer Wrap Configuration State variable is Isolated, the device is not connected to the FDDI trunk ring. If you intend the device to remain isolated, this indication is not a serious condition. However, if the device is supposed to be connected on a trunk ring, a serious problem may exist. The device is no longer transmitting packets to the larger trunk ring.

• If the Peer Wrap flag (SMTPeerWrapFlag) is set, the device is one of the wrap points. The cause of the wrapped ring is somewhere in the portion of the network between the two stations that report the peer wrap condition.

Making Your FDDI Connections More Resilient

When devices are removed from an FDDI ring, there is a break in the fiber path that causes the ring to wrap until the ring is made whole again. To prevent the break in the FDDI connection, you can implement dual homing or install an Optical Bypass Unit (OBU).

Implementing Dual Homing

When the operation of a dual attachment node is critical to your network, dual homing adds reliability by providing a backup connection if the primary link fails. Because a dual attachment station (DAS) has two attachments to the FDDI ring (A-to-M and B-to-M), you can use one of them as a standby link if the active link fails. Using dual homing, only one of the two attachments is active at a time. In this sense, a DAS acts as if it is a single attachment station (SAS) by using its A port as the standby link.

Through SMT, a DAS can be dual homed to the same concentrator or, more commonly, to two concentrators. This arrangement provides a more stable trunk ring of concentrators. If one concentrator fails, the DAS enables the standby link to another concentrator to become the active link. See Figure 14.

If the station is a dual path or dual path/dual MAC station, you can configure the dual-homed station in one of two ways:

• With both links active
• With one link active and one connection withheld as a backup, only becoming active when one link fails

Figure 14 Dual Homing Configuration

Installing an Optical Bypass Unit

You can insert an Optical Bypass Unit (OBU) into the FDDI ring as if it were a node and then plug your device into it. To use an OBU, your device needs an optical bypass interface. This interface lets the bypass know whether your device is still on the ring. See Figure 15.

If your device is removed or if it fails, the bypass unit diverts the optical path away from your device, keeping the ring whole. You can use a bypass on devices that are prone to failure or are likely to be removed often, such as diagnostic equipment.

Figure 15 Optical Bypass Unit Configuration

FDDI Connectivity Reference

This section explains terms that are relevant to FDDI connectivity and provides additional conceptual and problem analysis detail.

Peer Wrap Condition

A Peer Wrap (wrapped ring) condition occurs when a dual-attachment station detects a fault (often a lost connection) and reconfigures the network by wrapping the dual trunk rings to form a single ring. Normally, the two stations that are adjacent to the fault wrap to maintain full connectivity. However, if a second fault occurs before the first is repaired, the network partitions itself into two or more rings and stations lose connectivity.

When a station reports a Peer Wrap condition, locate and repair the problem that caused the station to wrap the rings. Potential causes include:

• Faulty FDDI port hardware
• Faulty cables or connectors
• Unplugged connectors
• Powered-down stations

You can expect to find the cause of the problem between the two stations that report the Peer Wrap condition.

Twisted Ring Condition

A Twisted Ring condition occurs when certain undesirable connection types exist. See Table 7 for more information. Although similar to the Undesired Connection Attempt, the Twisted Ring condition provides specific Station Management (SMT) and port information for diagnosis.

Undesired Connection Attempt Event

An Undesired Connection Attempt event occurs when a port tries to connect to another port of a type that may result in an undesirable network topology. Whether the connection attempt is successful depends on the current setting of the station's connection policies.

Table 7 lists connections that the FDDI standard defines as undesirable. The managed devices may or may not permit these connections, depending on their FDDI station configurations.

Table 7 Undesirable Connection Types

Connection Type   Reason That the Connection Is Undesirable
A-A               Twisted primary and secondary rings
A-S               A wrapped ring
B-B               Twisted primary and secondary rings
B-S               A wrapped ring
S-A               A wrapped ring
S-B               A wrapped ring
M-M               A tree of rings topology (illegal connection)

Table 8 lists FDDI connections that create valid topologies.

Table 8 Valid Connection Types

Connection Type   Reason That the Connection Is Valid
A-B               A normal trunk peer connection
A-M               A tree connection with possible redundancy. In a single MAC node, Port B has precedence (by default) for connecting to a Port M.
B-A               A normal trunk ring peer connection
B-M               A tree connection with possible redundancy. In a single MAC node, Port B has precedence (by default) for connecting to a Port M.
S-S               A single ring of two slave stations
S-M               A normal tree connection
M-A               A tree connection that provides possible redundancy
M-B               A tree connection that provides possible redundancy
M-S               A normal tree connection
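Tables 7 and 8 together cover every pairing of the four port types (A, B, S, and M), so they can be encoded as a simple lookup. The Python sketch below does exactly that. The function name and return format are illustrative choices; whether a real device actually permits an undesirable connection still depends on its connection policies.

```python
# Table 7: undesirable connection types and the reason for each
UNDESIRABLE = {
    ("A", "A"): "twisted primary and secondary rings",
    ("A", "S"): "a wrapped ring",
    ("B", "B"): "twisted primary and secondary rings",
    ("B", "S"): "a wrapped ring",
    ("S", "A"): "a wrapped ring",
    ("S", "B"): "a wrapped ring",
    ("M", "M"): "a tree of rings topology (illegal connection)",
}

# Table 8: connection types that create valid topologies
VALID = {
    ("A", "B"), ("A", "M"), ("B", "A"), ("B", "M"), ("S", "S"),
    ("S", "M"), ("M", "A"), ("M", "B"), ("M", "S"),
}

def check_connection(local_port, remote_port):
    """Return (is_valid, reason) for a connection between two port types."""
    pair = (local_port.upper(), remote_port.upper())
    if pair in UNDESIRABLE:
        return False, UNDESIRABLE[pair]
    if pair in VALID:
        return True, "valid topology"
    raise ValueError(f"unknown port types: {pair}")
```

For example, `check_connection("A", "A")` reports a twisted ring, while `check_connection("A", "B")` reports a valid topology.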


Disabling the Offending Interface

Because broadcast storms can ultimately cause your whole network to become unavailable, take action immediately to disable the offending interface. You can enable the interface again after you have corrected the problem.

Address Tracker

Use Address Tracker to locate the interface that is causing the broadcast storm. Use Device View to disable the port.

Follow these steps:

1. In the Find Address window, enter the address of the interface that seems to be receiving the broadcast traffic.

You can copy the MAC or IP address from the Status Watch report and paste it into Address Tracker's Enter the Address You Want to Find field.

2. Click Find Now.

Search displays the device name.

3. Use Transcend Central to launch Device View and disable the port.

Disabling the port stops the broadcast storm before it interferes with all vital network traffic. You can re-enable this interface later using Device View or the device's console.

Correcting Spanning Tree Misconfigurations

Spanning Tree does not cause broadcast storms, but a loop in your Spanning Tree topology can create data that looks like a storm. A loop can occur in your topology if:

• Someone disables Spanning Tree on a port
• You set up your Spanning Tree configuration incorrectly

Device View

Use Device View to disable any Spanning Tree port that has a repeater attached to it and to correct Spanning Tree misconfigurations.

To correct Spanning Tree misconfigurations, use Device View to disable Spanning Tree Protocol (STP) for a port on a SuperStack® II Switch 1000, Switch 3000, Switch 3000 10/100, Switch 9000SX, Desktop Switch, LinkBuilder® FMS II Bridge/Management Module, or CoreBuilder 6000.

To disable the STP port state for a port on a SuperStack II switch:

1. Select a port and click the right mouse button.

2. From the shortcut menu, select Configure.

3. In the Port section, click the STP tab.

4. From the STP Port State list box, select Disabled.

5. Click Apply.

To disable the STP port state for a port on a LinkBuilder FMS II Bridge/Management Module:

1. Double-click the module.

2. From the shortcut menu, select Configure Bridge.

3. In the Port section, click the STP tab.

Broadcast Storms Reference

This section explains terms that are relevant to broadcast storms and provides additional conceptual and problem analysis detail.

Broadcast Packets

Broadcast packets, which are a normal part of network operation, are transmitted by a device to a broadcast address. For example, IP networks use broadcasts to resolve network addresses using Address Resolution Protocol (ARP). IPX networks use a large number of broadcast packets to operate most effectively.

Problems arise when broadcast packets endlessly propagate throughout the network, which increases the traffic volume on your network and the CPU time that each host spends processing and discarding unwanted broadcast packets.

Multicast Packets

Multicast packets, which are a normal part of network operation, are transmitted by a device to a multicast group address. Hosts that want to receive the packets indicate that they want to be members of the multicast group, and then multicast packets are distributed to that group. For example, multicast packets support the Spanning Tree Protocol. Multicast applications and underlying multicast protocols control multimedia traffic and shield hosts from processing unnecessary broadcast traffic. However, multicast traffic can also cause storms that saturate your network.

Duplicate Addresses

Use these sections to identify and correct problems caused by duplicate MAC and IP addresses:

• Duplicate Addresses Overview
• Finding Duplicate MAC Addresses
• Finding Duplicate IP Addresses

See Duplicate Addresses Reference for additional conceptual and problem analysis detail.

Networks sometimes generate duplicate MAC and IP addresses. Because duplicate addresses can cause problems with packet delivery, resolve them as soon as possible.

Understanding the Problem

Duplicate MAC addresses are caused by data link layer problems with Fiber Distributed Data Interface (FDDI) media and the passing of tokens on the FDDI ring. Duplicate IP addresses are caused by network layer problems. See these sections for more information about causes of duplicate addresses:

• Duplicate MAC Addresses
• Duplicate IP Addresses

Identifying the Problem

Identify duplicate MAC and IP addresses by following the instructions in these sections:

• Finding Duplicate MAC Addresses
• Finding Duplicate IP Addresses

Solving the Problem

Identify the cause of the duplicate address (such as user error or a hardware problem) and fix the problem if possible.

Finding Duplicate MAC Addresses

To find out if duplicate MAC addresses are occurring, monitor your network using Status Watch.

Status Watch

The Status Watch FDDI Status tool identifies duplicate FDDI MAC addresses, and Status Watch reports when two or more MACs on the same ring have the same MAC address (a Duplicate Address condition).

Follow these steps:

1. In the Status Watch Summary View window, determine if any FDDI Status conditions are reported. If there are, double-click the table cell value to display the Device List window.

Another approach is to examine only the devices that you know reside on your FDDI ring. In the Status Watch main window, red device icons indicate that a threshold has been exceeded.

2. Select a device.

• If you selected the device from the Device List window, the real-time report for that device appears in the Status Watch main window.

• If you selected the device from the main window, also select the FDDI Status tool to view the real-time report.

3. Determine if a Duplicate Address condition caused the FDDI Status tool to trigger a Critical or Warning status for that device.

In Status Watch, you can specify the status severity level to apply to a Duplicate Address condition.

Finding Duplicate IP Addresses


To find out if duplicate IP addresses are occurring, monitor your network using these applications:

• Address Tracker - To find duplicate IP addresses on 3Com devices and their attached networks

• LANsentry® Manager - To find duplicate IP addresses that are collected by probes gathering RMON2 SmartAgent® data from the Enterprise Communications Analysis Module (ECAM) downloaded on your network devices

Address Tracker

Use Address Tracker to determine when and where duplicate IP addresses occur.

Follow these steps:

1. From the Find Address menu, select Find Duplicate IP Addresses.

2. Click Find Now to start your search.

LANsentry Manager

Use the Duplicates table in LANsentry Manager to compile a list of all stations with duplicate IP addresses. This table is available only on probes that have downloaded RMON2 (ECAM) SmartAgent software.

Follow these steps:

1. From the LANsentry Manager Address Map menu, select Duplicates. Address Map data is displayed as a table.

2. To export the contents of the table, click Export to launch the Data Export dialog box.

Duplicate Addresses Reference

This section explains terms that are relevant to duplicate addresses and provides additional conceptual and problem analysis detail.

Duplicate MAC Addresses

Each device on your network has a unique MAC address. This address identifies a single device on the network, allowing packets to be delivered to correct destinations.

Packets are delivered to their destinations by means of the MAC-address-to-IP-address translation that the Address Resolution Protocol (ARP) provides. Therefore, if MAC addresses are duplicated on the network, ARP caches of routing devices contain erroneous destinations. In FDDI, devices monitor network traffic, searching for their own MAC address in each packet to determine whether to decode the packet. If MAC addresses are not unique, two stations cannot be distinguished from each other.

Duplicate MAC addresses can occur for the following reasons:

• Someone has manually configured a MAC address for a device instead of using the address that the vendor supplied or allowing it to be assigned dynamically, and this address is also assigned to a different device.

• In rare circumstances, loops in a bridged network can cause a MAC hardware problem or an address learning problem that creates a duplicate MAC address entry in the bridging address table.

• On DECnet Phase 4 networks, MAC addresses are set from the DECnet address. A duplicate NET address can cause a duplicate MAC address.

Duplicate IP Addresses

Because IP addresses are critical for transmission of packets on TCP/IP networks, resolve duplicate IP addresses immediately.

Duplicate IP addresses can occur when someone has configured an IP address that is identical to an IP address that is assigned to a different device. Address assignments, although possible for you to configure manually, are usually made using one of these protocols:

• Dynamic Host Configuration Protocol (DHCP) - Allows your network to dynamically assign IP addresses to nodes. With this protocol, a DHCP server temporarily assigns an IP address to a node, or you can statically configure addresses as needed.

• BOOTstrap protocol (BootP) - Allows you to statically assign IP addresses to nodes. This protocol is more efficient than RARP.

• Reverse ARP (RARP) - Allows you to statically assign IP addresses to nodes. However, because this protocol relies on the MAC address to identify the node, you cannot use it on networks that dynamically assign hardware addresses.
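If you can export IP-to-MAC observations from any tool (for example, an ARP table dump), a duplicate IP address shows up as one IP address claimed by more than one MAC address. The Python sketch below illustrates the check. The input format and the sample addresses are hypothetical; this is not how Address Tracker or LANsentry Manager work internally.

```python
from collections import defaultdict

def find_duplicate_ips(observations):
    """Return {ip: sorted MAC list} for each IP claimed by more than one MAC."""
    macs_by_ip = defaultdict(set)
    for ip, mac in observations:
        macs_by_ip[ip].add(mac.lower())  # ignore differences in letter case
    return {ip: sorted(macs) for ip, macs in macs_by_ip.items() if len(macs) > 1}

# Hypothetical (IP address, MAC address) pairs gathered from the network
observations = [
    ("10.0.0.5", "00:A0:24:00:00:01"),
    ("10.0.0.5", "00:a0:24:00:00:02"),  # second station claiming 10.0.0.5
    ("10.0.0.6", "00:a0:24:00:00:03"),
    ("10.0.0.6", "00:A0:24:00:00:03"),  # same station, different letter case
]
duplicates = find_duplicate_ips(observations)
```

Here only 10.0.0.5 is reported, because 10.0.0.6 was seen twice from the same MAC address.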


Ethernet Packet Loss

Use these sections to identify and correct Ethernet packet loss:

• Ethernet Packet Loss Overview
• Searching for Packet Loss

See Ethernet Packet Loss Reference for additional conceptual and problem analysis detail.

If your Ethernet network shows signs of congestion, it may be experiencing packet loss. When your network is congested, utilization is usually high, packets are discarded because buffers are full, and collision rates are up. Problems related to Collisions are often at the heart of packet loss.

Understanding the Problem

Collisions are normal in Ethernet networks. In many cases, Collision rates of 50 percent do not cause a large decrease in throughput. The Collision rate helps mark the upper limit on your network (the maximum percentage of collisions that your network can bear), which is usually around 70 percent. If Collisions increase above this upper limit, your network can become unreliable.

When the Collision rates increase, so do Excessive Collisions, which causes a delay in transmitting data. An increase in Collisions also indicates that network utilization and network errors, such as FCS Errors, are probably increasing.

The real packet problems to watch for, however, are undetected collisions that show up as Late Collisions.

Identifying the Problem

To identify that your network's problem is related to packet loss, verify that frames are being dropped on your network by examining this packet loss data:

• Alignment Errors
• Collisions
• Excessive Collisions
• FCS Errors
• CRC Errors
• Late Collisions
• Receive Discards
• Too Long Errors
• Too Short Errors
• Transmit Discards

The process of identifying the problem is discussed in Searching for Packet Loss.

Solving the Problem

If you notice that packet loss data is consistently high, then your network is too congested. In this case, segment your network with the appropriate network device (such as a switch or router). If Collision data shows increases but your network's utilization is the same, then your network may have a physical problem, such as cabling that is too long. Other problems that packet loss data can indicate include:

• Faulty connectors or improper cabling
• Excessive numbers of repeaters between network devices
• Defective Ethernet transceivers or controllers

Possible solutions to these problems are explained in the procedures in Searching for Packet Loss.
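The reasoning above (rising collisions with flat utilization point to a physical problem, while rising collisions together with rising utilization point to congestion) can be sketched as a small decision function in Python. The 70 percent limit is the approximate figure quoted earlier in this section. The 5-point tolerance used to decide that a value "is the same" and the short result labels are assumptions made for illustration only.

```python
UPPER_LIMIT_PCT = 70.0  # approximate upper limit for the collision rate

def diagnose(prev, curr, tolerance_pct=5.0):
    """Compare two samples, each a dict with 'collision_pct' and 'utilization_pct'.

    tolerance_pct is an assumed margin for treating two readings as unchanged.
    """
    collisions_up = curr["collision_pct"] - prev["collision_pct"] > tolerance_pct
    utilization_flat = (
        abs(curr["utilization_pct"] - prev["utilization_pct"]) <= tolerance_pct
    )
    if curr["collision_pct"] > UPPER_LIMIT_PCT:
        return "over_limit"   # network can become unreliable
    if collisions_up and utilization_flat:
        return "physical"     # for example, cabling that is too long
    if collisions_up:
        return "congested"    # consider segmenting the network
    return "normal"

# Hypothetical readings taken a week apart
earlier = {"collision_pct": 10, "utilization_pct": 30}
later = {"collision_pct": 25, "utilization_pct": 31}
print(diagnose(earlier, later))
```

A result of "physical" is only a hint about where to look first; the tables in Searching for Packet Loss list the specific checks.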

Searching for Packet Loss

When you look for packet loss, use the following applications:

• Status Watch - For Ethernet and MIB-II data collection using SNMP polling
• LANsentry Manager Network Statistics Graph - For RMON data collection using an RMON probe
• Device View - On a per-device basis, you can evaluate statistics for any port on the device

Status Watch

Status Watch monitors:

• Alignment Errors
• Excessive Collisions
• FCS Errors
• Receive Discards
• Transmit Discards

Follow these steps:

1. Determine if the thresholds for the Alignment Errors tool and FCS Errors tool are being exceeded.

Table 16 identifies the problems that this data can indicate and your possible actions. For information about problems related to a nonstandard Ethernet implementation, see Nonstandard Ethernet Problems.

Table 16 Alignment Errors, FCS Errors, and CRC Errors Data

Possible Problem: Faulty cabling
Possible Action: Examine the cable and cable connections for breaks or damage.

Possible Problem: Network noise
Possible Action: Look for improper cabling, faulty cable, faulty network equipment, or cables that are too close to equipment that emits electromagnetic interference (lamps, for example).

Possible Problem: Faulty transceiver
Possible Action: Use an analyzer to identify the problematic transceiver. If necessary, replace the transceiver, network adapter, or station.

Possible Problem: Fault at the transmitting end station
Possible Action:
1. Locate the source of the errors by looking at the module and port statistics.
2. Verify the correct operation of the transceiver or adapter card of the device that is connected to the problem port.
3. If the card appears to be operating correctly, examine the cable and cable connections for breaks or damage.

Possible Problem: Station powering up or down
Possible Action: None required. Early implementations of Ethernet transceivers generate a significant amount of in-band noise when powering up; they frequently cause Alignment Errors and FCS Errors in an otherwise stable network. When powering up, some software drivers for Ethernet controllers also initiate Time Domain Reflectometry (TDR) tests to test the Ethernet media. Network monitors report TDR tests as Alignment Errors and FCS Errors.

Possible Problem: Faulty adapter
Possible Action: Replace the adapter.

2. Determine if the Excessive Collisions tool threshold is being exceeded. Table 17 identifies the problems that this data can indicate and your possible actions.

Table 17. Collisions and Excessive Collisions Data

Possible Problem: Busy network
Possible Action: Use a bridge, router, or switch to reconfigure your network into segments with fewer stations.

Possible Problem: Faulty device (adapter, switch, hub, and the like) that does not listen before broadcasting. This problem increases the incidence of all types of collisions.
Possible Action: Isolate each adapter to see if the problem stops.

Possible Problem: Network loop
Possible Action: Ensure that no redundant connections to the same station have both connections active simultaneously.


3. Determine if the Receive Discards and Transmit Discards tools thresholds are being exceeded.

If these errors are high in conjunction with the data that you learned in steps 1 and 2, then your network is overloaded. Segment your network.

LANsentry Manager Network Statistics Graph

Use the LANsentry Manager Network Statistics graph to view data for:

• Collisions
• Late Collisions
• Bandwidth Utilization
• CRC Errors
• Too Long Errors
• Too Short Errors

Follow these steps:

1. Display a Network Statistics graph for the local Ethernet segment on which users have reported poor performance.

This graph shows the most recent trend in Collision rates. If you have set up a History sample, you can also look at the historical trend. If a number of segments are connected by repeaters, examine the graph for each Ethernet segment.

2. Analyze Utilization and Collision rates to determine whether collisions are caused by an overloaded segment or a faulty component.

• If Utilization rates are high - The collisions are probably caused by an overloaded segment. If you have added nodes or new applications to your network, consider reconfiguring the cabling system, using bridges and routers to filter out remote collisions and to keep local traffic on one segment. This action should level the network load.

• If Utilization rates are stable and appear normal - The collisions are probably caused by faulty components. In this case, do the following:

  • If the network consists of repeaters - Compare the Network Statistics graphs for each segment connected to the repeater. Because repeaters repeat traffic across all connected segments (which makes many segments seem like one network), you should see similar levels of traffic on all segments. One segment that shows dissimilar levels of traffic and collisions may indicate faulty hardware. In this case, monitor several collisions to track the source station that is transmitting too soon after collisions, and repair the station. Packets that are transmitted too soon after collisions are unlikely to be valid. See Table 17 for more information about Collisions.

  • On other networks - Determine the segment cable length.

3. Examine the CRC Errors and Late Collisions, which often indicate cabling or component problems.

Table 16 identifies the problems that CRC Errors can indicate and your possible actions. Table 18 identifies the problems that Late Collisions data can indicate and your possible actions.

Table 18. Late Collisions Data

Possible Problem: Cabling problems
• Segment too long
• Failing cable
• Segment not grounded properly (noise)
• Improper termination
• Taps too close (10BASE-5 and 10BASE-2 only)
• Noisy cable
Possible Action: Correct the cabling problem by doing one or more of the following:
• Reduce the segment length
• Replace the cable
• Ground the cable
• Terminate the cable correctly
• Check the taps
• Check for cables too close to equipment that emits electromagnetic interference

Possible Problem: Component problems
• Deaf or partially deaf node
• Failing repeater, transceiver, or controller cards
Possible Action: Correct the component problem by doing one of the following:
• Trace the failing component and replace it
• Replace the NIC or the transceiver

4. Trace Too Short Errors and Too Long Errors to the sender.

These errors often indicate faulty routers or LAN drivers and transceiver problems. Table 19 identifies the problems that this data can indicate and your possible actions.

Possible Problem: A transceiver on your network is adding bits to the packets that are transmitted by the attached station.
Possible Action:
1. Use a network analyzer to identify the problematic transceiver.
2. If necessary, replace the transceiver, network adapter, or station.

Possible Problem: The jabber protection mechanism on a transceiver has failed; it can no longer protect the network from the jabbering produced by the attached station.
Possible Action: Replace the network card.

Possible Problem: Excessive noise on the cable.
Note: Some 10/100 Mbps cards that autodetect the network speed may connect to the network at the wrong speed, causing excessive noise.
Possible Action: Check for improper cabling, faulty cable, faulty network equipment, or cables too close to noisy electronic equipment (lamps, for example).
If your network card autodetects the network speed and you have ruled out other problems, manually configure the network speed.

Possible Problem: Faulty routers (two different network types are connected and the router is not enforcing proper frame size restrictions)
Possible Action: Notify the manufacturer.

Possible Problem: Faulty LAN driver
Possible Action: Replace the driver.

Possible Problem: A normal condition on a LinkSwitch 1000, LinkSwitch 3000, or CoreBuilder 5000 FastModule
Possible Action: If you use maximum-sized (1518-byte) Ethernet frames, the device's VLT-enabled ports add a frame tag of 4 bytes, resulting in a misleading Too Long Frame error.
These frames are passed successfully but will create the Too Long Frame error message.
If you want to eliminate the error message, reduce your Ethernet packet frames by 4 bytes.
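The arithmetic behind the last row of the table can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the only figures used are the 1518-byte maximum frame size and the 4-byte VLT tag stated above.

```python
MAX_ETHERNET_FRAME = 1518  # bytes, including the FCS (figure from the text)
VLT_TAG_BYTES = 4          # tag added by a VLT-enabled port (figure from the text)


def tagged_length(frame_bytes):
    """Length of a frame after a VLT-enabled port has added its tag."""
    return frame_bytes + VLT_TAG_BYTES


def counted_as_too_long(frame_bytes_on_wire):
    """A monitor counts a frame as Too Long when it exceeds 1518 bytes."""
    return frame_bytes_on_wire > MAX_ETHERNET_FRAME


def largest_frame_without_error():
    """Largest untagged frame that still fits within 1518 bytes once tagged."""
    return MAX_ETHERNET_FRAME - VLT_TAG_BYTES
```

A maximum-sized frame becomes 1522 bytes after tagging and is therefore counted as Too Long, while a frame reduced by 4 bytes is exactly 1518 bytes after tagging and is not.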

Device View

Device View allows you to display a variety of port and device-level statistics relevant to Ethernet packet loss. Table 20 describes these statistics and their use in troubleshooting.

Table 20. Activity and Error Statistics in Device View

Statistics Group: Activity
Description: Displays the total network activity and errors on the selected port.
Use in Troubleshooting: This data shows readable packets, broadcast packets, Collisions, total errors, and runts, which cause Too Short Errors. You can interpret this data in the following ways:
• The presence of runts can often be caused by Collisions; however, if the values increase at specific times of the day, it may indicate you need to change the network topology to manage the traffic more efficiently (for example, with switches or routers).
• Runts can also be caused by a badly terminated coax cable.
• Large numbers of runts not associated with high levels of collisions can indicate a transmission problem (examine the cable).
• Particularly high numbers of Collisions compared to the total number of readable packets can point to a hardware problem (a bad adapter) or to a data loop.
• A high proportion of Broadcast Packets (>10%) on a heavily utilized network (>50% of available bandwidth) can point to an incorrectly configured bridge or router on the network.

Statistics Group: Errors
Description: Displays the number of frames with errors on the selected port.
Use in Troubleshooting: The significance of errors depends on accompanying errors and prevailing network conditions. See the following error data for more information:
• Alignment Errors: Table 16
• FCS Errors: Table 16
• Too Long Errors: Table 19
• Too Short Errors, or runts: Table 19
• Late Collisions: Table 18

To display Activity and Errors statistics for a device or port, follow these steps:

1. Select the required port or device.

2. From the shortcut menu, select Activity or Errors.

The statistics available depend on the type of port or device selected. See Table 20 for troubleshooting information.

Ethernet Packet Loss Reference

This section explains terms that are relevant to Ethernet packet loss and provides additional conceptual and problem analysis detail.

Alignment Errors

An Alignment Error indicates a received frame for which both of the following are true:

• The number of bits received is an uneven byte count (that is, not an integral multiple of 8).

• The frame has a Frame Check Sequence (FCS) error.

Alignment Errors often result from MAC layer packet formation problems, cabling problems that cause corrupted or lost data, and packets that pass through more than two cascaded multiport transceivers. See FCS Errors for more information about interpreting Alignment Errors.


Collisions

Collisions indicate that two devices detect that the network is idle and try to send packets at exactly the same time (within one round-trip delay). Because only one device can transmit at a time, both devices must stop sending and attempt to retransmit. Collisions are detected by the transmitting stations.

The retransmission algorithm helps to ensure that the packets do not retransmit at the same time. However, if the two devices retry at nearly the same time, packets can collide again; the process repeats until either the packets finally pass onto the network without collisions or 16 consecutive collisions occur and the packets are discarded.
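The retransmission behavior described above can be sketched as follows. The text names only the limit of 16 consecutive collisions; the truncated binary exponential backoff, the 51.2-microsecond slot time for 10 Mbps Ethernet, and the cap of 10 on the exponent are taken from the IEEE 802.3 standard rather than from this document, and the function names and the `collides` callback are hypothetical.

```python
import random

SLOT_TIME_US = 51.2   # 512 bit times at 10 Mbps (IEEE 802.3, not from this text)
MAX_ATTEMPTS = 16     # packet is discarded after 16 consecutive collisions
BACKOFF_LIMIT = 10    # exponent stops growing after the 10th collision


def backoff_slots(collision_count, rng=random):
    """Pick a random wait, in slot times, after the nth consecutive collision."""
    k = min(collision_count, BACKOFF_LIMIT)
    return rng.randrange(2 ** k)  # uniform over 0 .. 2^k - 1


def send_with_backoff(collides, rng=random):
    """Try to send one packet.

    collides(attempt) is a caller-supplied function that returns True when
    the given attempt (1-based) ends in a collision.
    Returns (delivered, attempts_used, total_wait_in_microseconds).
    """
    waited_us = 0.0
    for attempt in range(1, MAX_ATTEMPTS + 1):
        if not collides(attempt):
            return True, attempt, waited_us
        if attempt < MAX_ATTEMPTS:
            # Wait a random number of slot times before retrying.
            waited_us += backoff_slots(attempt, rng) * SLOT_TIME_US
    # 16 consecutive collisions: the packet is dropped (an Excessive Collision).
    return False, MAX_ATTEMPTS, waited_us
```

Because each station draws its own random wait, two stations that collided are unlikely to pick the same slot again, which is why repeated collisions usually point to congestion or a faulty device rather than bad luck.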

CRC Errors

A Cyclic Redundancy Check (CRC) Error is an RMON statistic that combines FCS Errors and Alignment Errors. These errors indicate that packets were received with:

• A bad FCS and an integral number of octets (FCS Errors)
• A bad FCS and a non-integral number of octets (Alignment Errors)

CRC Errors can cause an end station to freeze. If a large number of CRC Errors are attributed to a single station on the network, replace the station's network interface board. Typically, a CRC Error rate of more than 1 percent of network traffic is considered excessive.

Excessive Collisions

Excessive Collisions indicate that 16 consecutive collisions have occurred, usually a sign that the network is becoming congested. For each excessive collision count (or after 16 consecutive collisions), a packet is dropped. If you know the normal rate of excessive collisions, then you can determine when the rate of packet loss is affecting your network's performance. See Knowing Your Network's Configuration for more information.

FCS Errors

Frame Check Sequence (FCS) Errors, a type of CRC, indicate that frames received by an interface are an integral number of octets long but do not pass the FCS check. The FCS is a mathematical way to ensure that all the frame's bits are correct without having the system examine each bit and compare it to the original. Packets with Alignment Errors also generate FCS Errors.
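As a rough illustration of how an FCS check works, the sketch below computes a checksum over the frame contents at the sender and recomputes it at the receiver. It assumes that the FCS is the standard CRC-32 as implemented by Python's `zlib.crc32` and that the four FCS octets are appended least-significant byte first; neither detail is stated in this document, and the helper names are hypothetical.

```python
import struct
import zlib


def append_fcs(frame_without_fcs: bytes) -> bytes:
    """Sender side: compute the CRC-32 of the frame and append it as 4 octets."""
    fcs = zlib.crc32(frame_without_fcs) & 0xFFFFFFFF
    return frame_without_fcs + struct.pack("<I", fcs)


def fcs_ok(frame_with_fcs: bytes) -> bool:
    """Receiver side: recompute the CRC-32 and compare it with the received FCS."""
    body, received = frame_with_fcs[:-4], frame_with_fcs[-4:]
    expected = struct.pack("<I", zlib.crc32(body) & 0xFFFFFFFF)
    return expected == received


# A 60-octet payload plus the 4-octet FCS gives a minimum-sized 64-octet frame.
frame = append_fcs(bytes(60))

# Flip one bit, as cable noise might, and the check no longer passes.
corrupted = bytes([frame[0] ^ 0x01]) + frame[1:]
```

The receiver never sees the original data; it only needs the received bits and the received checksum, which is why a monitor can count FCS Errors without knowing what was sent.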

Both Alignment Errors and FCS Errors can be caused by equipment powering up or down or by interference (noise) on unshielded twisted-pair (10BASE-T) segments. In a network that complies with the Ethernet standard, FCS or Alignment Errors indicate bit errors during a transmission or reception. A very low rate is acceptable. Although Ethernet allows a 1 in 10^8 bit error rate, typical Ethernet performance is 1 in 10^12 or better.
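To put these bit error rates in terms of frames, the following sketch estimates the chance that a frame contains at least one bad bit. It assumes that bit errors are independent, which is a simplification (noise often arrives in bursts), and the function name is hypothetical.

```python
import math


def frame_error_probability(bit_error_rate, frame_octets=1518):
    """Probability that at least one bit of a frame is received in error,
    assuming independent bit errors at the given rate."""
    bits = frame_octets * 8
    # 1 - (1 - p)^n, computed with log1p/expm1 to stay accurate for tiny p.
    return -math.expm1(bits * math.log1p(-bit_error_rate))


allowed = frame_error_probability(1e-8)    # the rate Ethernet allows
typical = frame_error_probability(1e-12)   # typical performance
```

Under these assumptions, the allowed rate corresponds to roughly one damaged maximum-sized frame in 8,000, and the typical rate to roughly one in 80 million, which is why a sustained, visible FCS Error count deserves investigation.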

Late Collisions

Late Collisions indicate that two devices have transmitted at the same time, but cabling errors (most commonly, excessive network segment length or repeaters between devices) prevent either transmitting device from detecting a collision. Neither device detects a collision because the time to propagate the signal from one end of the network to the other is longer than the time to put the entire packet on the network. As a result, neither of the devices that cause the late collision senses the other's transmission until the entire packet is on the network.

Although late collisions occur for small packets, the transmitter cannot detect them. As a result, a network suffering measurable Late Collisions for large packets is losing small packets as well.

Nonstandard Ethernet Problems

Table 21 lists the symptoms that typically occur if a system violates the Ethernet standard.

Table 21. Symptoms of Common Ethernet Network Problems

Symptoms: FCS Errors and Alignment Errors increase significantly.
Problem: Network cabling is too long.
Notes: If you use a promiscuous network monitor, the number of Late Collisions reported by stations should correlate with the FCS and Alignment Errors reported by the monitor.

Symptoms: FCS and Alignment Errors increase proportionally with interference (sometimes referred to as noise hits).
Problem: Network segment is noisy.
Notes: Typically observed on a 10BASE-T network segment in a noisy environment. If you use multiple promiscuous monitors, the FCS and Alignment Errors among the monitors will not correlate.

If the monitor can track runts, also called Too Short Errors, the number of runt packets should be significantly higher than normal.

Symptoms: FCS and Alignment Errors are much higher than normal.
Problem: Networks do not conform to the access scheme of Carrier Sense Multiple Access with Collision Detect (CSMA/CD).
Notes: Occurs when some implementations of Ethernet in the segment are not entirely compatible with IEEE 802.3 repeaters.
Collision fragments linger on the network long enough to collide with retry packets at the minimum interpacket gap (IPG). The IPG is smaller on one side of the repeated network, causing a lost packet.
Ethernet controllers cannot receive packets that are separated by 4.7 µs or less. Some controllers cannot sustain receptions of packets separated by as much as 9.6 µs. If runt packets are received one after another and are followed by a collision fragment, Ethernet controllers that cannot sustain reception will lose packets.

Receive Discards

Receive Discards indicate that received packets could not be delivered to a high-layer protocol because of congestion or packet errors.

Too Long Errors

A Too Long Error indicates that a packet is longer than 1518 octets (including FCS octets) but otherwise well formed. Too Long Errors are often caused by a bad transceiver, a malfunction of the jabber protection mechanism on a transceiver, or excessive noise on the cable.

Too Short Errors

A Too Short Error, also called a runt, indicates that a packet is fewer than 64 octets long (including FCS octets) but otherwise well formed.
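The definitions of Alignment Errors, FCS Errors, Too Long Errors, and Too Short Errors in this reference can be read as a small decision procedure. The sketch below is one way to write it down; the function name and return strings are hypothetical, and "otherwise well formed" is interpreted here as "the FCS check passes", which is an assumption.

```python
MIN_OCTETS = 64     # shorter frames are runts (Too Short Errors)
MAX_OCTETS = 1518   # longer frames are Too Long Errors


def classify_frame(bit_count, fcs_passes):
    """Classify a received frame using the definitions in this reference.

    bit_count  -- number of bits received, including the FCS
    fcs_passes -- True if the Frame Check Sequence check succeeded
    """
    whole_octets = bit_count % 8 == 0
    if not fcs_passes:
        # Bad FCS: integral octet count is an FCS Error, otherwise Alignment.
        return "FCS Error" if whole_octets else "Alignment Error"
    octets = bit_count // 8
    if octets > MAX_OCTETS:
        return "Too Long Error"
    if octets < MIN_OCTETS:
        return "Too Short Error"
    return "OK"
```

An RMON CRC Error count would be the sum of the frames classified here as FCS Errors and Alignment Errors.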

Transmit Discards

Transmit Discards indicate that packets were not transmitted because of network congestion.

FDDI Ring Errors

Use these sections to identify and correct FDDI ring errors:

• FDDI Ring Errors Overview
• Identifying Ring Errors

Fiber Distributed Data Interface (FDDI) often corrects its own problems. However, because FDDI cannot correct all errors (especially those related to hardware problems), you should monitor FDDI errors.

Understanding the Problem

FDDI ring errors that you should monitor include:

• Elasticity Buffer Error Condition
• Frame Error Condition
• Frames Not Copied Condition
• Link Error Condition

Identifying the Problem

First, determine the type of FDDI ring errors and where they are occurring. Similar to the way you identify other FDDI problems, identify the upstream and downstream neighbors of the devices that you are monitoring.

Several types of network errors can cause FDDI performance problems. For example, problems with cables or physical connections may result in a link or frame error. Elasticity buffer (EB) errors can also lead to link and frame errors.

FDDI deals with port-related errors as follows:

• The variable PORTlerAlarm is the link error rate (LER) value at which a link connection generates an alarm. When the LER is greater than the alarm setting, Station Management (SMT) sends a Status Report Frame (SRF) to notify you that there is a problem with a port.

The PORTlerAlarm threshold is set lower than the PORTlerCutoff threshold so that you are notified of a problem before the port is actually removed from the ring.

• When link errors reach the threshold defined by the variable PORTlerCutoff, SMT breaks the connection, disabling the PHY that detected the problem. A Link Error Condition is also generated.
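The two-threshold behavior described above can be sketched as follows. This is an illustration only, not the SMT implementation: link error rates are expressed here as plain errors-per-bit values, the function name and return strings are hypothetical, and the example thresholds in the usage note are illustrative assumptions.

```python
def port_action(link_error_rate, ler_alarm, ler_cutoff):
    """Decide what happens to a port for a measured link error rate (LER).

    All three values are error rates (errors per bit); ler_alarm must be
    lower than ler_cutoff so that the alarm is raised before the cutoff.
    """
    if ler_alarm >= ler_cutoff:
        raise ValueError("alarm threshold must be lower than cutoff threshold")
    if link_error_rate >= ler_cutoff:
        # Cutoff reached: the connection is broken and a Link Error
        # Condition is generated.
        return "break connection"
    if link_error_rate > ler_alarm:
        # Above the alarm setting: a Status Report Frame is sent.
        return "send status report frame"
    return "no action"
```

For example, with an assumed alarm setting of 1e-8 and an assumed cutoff of 1e-7, a measured rate of 5e-8 produces a status report while the port stays on the ring, and a rate of 2e-7 removes it.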

FDDI deals with MAC-related errors as follows:

• When MAC frame errors reach a certain threshold, a Frame Error Condition is generated. Because the actual error can be further upstream than the immediate connection, the connection remains intact.

• For a large network, the worst case MACFrameErrorRatio is less than 0.1 percent. However, during network configuration, frame error ratios can reach 50 percent for short periods. When you detect a sustained frame error ratio of more than 0.1 percent, a problem exists between the station that is reporting the condition and the nearest upstream MAC.
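Because short bursts during network configuration are expected, the check that matters is whether the ratio stays above 0.1 percent. A minimal sketch of that distinction follows; the function name is hypothetical, and the number of consecutive samples that counts as "sustained" is an assumption that you would tune to your polling interval.

```python
def sustained_frame_errors(ratios_pct, threshold_pct=0.1, min_consecutive=5):
    """Return True if the frame error ratio (in percent) stays above the
    threshold for at least min_consecutive samples in a row."""
    run = 0
    for ratio in ratios_pct:
        # Extend the current run, or reset it when a sample is back to normal.
        run = run + 1 if ratio > threshold_pct else 0
        if run >= min_consecutive:
            return True
    return False
```

A single sample of 50 percent surrounded by normal readings does not trip this check, whereas a steady 0.2 percent does.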

Solving the Problem

To solve problems related to FDDI errors, fix the hardware, cabling, or congestion problem.

Identifying Ring Errors


Use Status Watch to monitor your FDDI devices for Warning or Critical alerts.

Status Watch

Use Status Watch to identify FDDI ring errors.

Follow these steps:

1. Monitor the FDDI Status tool for the currently selected device.

2. Determine whether Status Watch is reporting Elasticity Buffer Errors or a high percentage of Frame Errors, Frames Not Copied, or Link Error Rates for the currently selected device.

FDDI Ring Errors Reference

This section provides additional conceptual and problem analysis detail

Elasticity Buffer Error Condition

The Elasticity Buffer Error condition occurs when a port's elasticity buffer overflows or underflows. This condition usually indicates that a port's hardware is not operating within the tolerances that the FDDI standard specifies. Look for the problem in the hardware of either the port that is reporting the condition or of the immediately adjacent port.

Frame Error Condition

The Frame Error condition occurs when the percentage of frames that contain errors exceeds a preset threshold. When a device is an uplink to FDDI (that is, a device is transmitting onto FDDI), this type of condition indicates that the ring is saturated. The ring is out of buffer space, and packets are being dropped from the device's backbone port.

The problem indicated by the frame errors is usually located between the MAC that reports the condition and its upstream neighbor. Because many physical connections can lie along this path, the MACFrameErrorRatio variable identifies only the two MACs between which the problem is occurring.

Frames Not Copied Condition

The Frames Not Copied condition occurs when the percentage of frames that are dropped because of insufficient buffer space exceeds a preset threshold. This condition indicates that the station is congested and is unable to process frames as quickly as they arrive. To help eliminate congestion:

• Add more capacity to the station
• Reconfigure your network so that end stations that communicate heavily with one another are on the same bridge or switch
• Filter out certain traffic

Link Error Condition

The Link Error condition occurs when a port detects link errors at a rate that exceeds a preset threshold. When the Link Error threshold is exceeded, the station removes itself from the ring and tries to reinsert itself on the ring. This action creates a MAC Neighbor Change Event (which also occurs if a ring wraps).

Link errors may indicate an FDDI PHY hardware problem (such as a faulty transmitter) or a faulty cable or connector. Look for the problem in the portion of the network between the port that is reporting the condition and the first upstream transmitter.

MAC Neighbor Change Event

The MAC Neighbor Change event occurs when a MAC's upstream or downstream neighbor changes.

This event indicates either:

• A network reconfiguration
• Another station that is leaving or joining the ring

Network File Server Timeouts

Use these sections to identify and correct timeouts on network file servers:

• Network File Server Timeout Overview
• Looking for Obvious Errors
• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

See Network File Server Timeouts Reference for additional conceptual and problem analysis detail.

A network file server can time out if your network gets congested or if your server is having problems. Users might have problems downloading data from or to the server, or copying files from or to the server. To help you to understand the troubleshooting process for this type of problem, an EXAMPLE throughout this section follows the symptoms, analysis, and resolution of a typical file server timeout problem.


Understanding the Problem

When users log in, their stations make network file server calls either to determine quotas (if this feature has been enabled) or to mount user home directories. The network file server timeout messages, even when spread across multiple nodes, indicate a problem either with the network or with a server.

EXAMPLE: UNIX users notice that it takes a long time - over 30 seconds in some cases - to log in to any machine. Some machines report network file server timeout messages, but the messages have no obvious pattern and are infrequent. You begin to get a sense of the problem.

Identifying the Problem

First, rule out the obvious causes. Ask these questions:

• Can you access the network file server with Telnet?
• Have any alarms been triggered?
• Are there any new errors?

The process of identifying the problem is developed in Looking for Obvious Errors.

Solving the Problem

To determine the cause, reproduce the fault while you monitor the network. After you know the cause, you can fix the problem.

The solutions to the network file server timeout are identified in these sections:

• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

Looking for Obvious Errors

To look for obvious errors, use these applications:

• Ping and Telnet - To check for connectivity to the network file server nodes
• LANsentry Manager Alarms View - To search for triggered alarms
• LANsentry Manager Statistics View - To look for errors
• LANsentry Manager History View - To identify trends


Ping and Telnet

Determine whether you can contact network file server nodes using Ping and Telnet. If the response is extremely slow, then a problem may exist with the connections to the nodes. No delay indicates that the connections are normal, implying that the delay is occurring elsewhere. In this case, use LANsentry Manager tools to determine whether packets are being lost or ignored.

LANsentry Manager Alarms View

Using the LANsentry Manager Alarms View, you can determine if any configured alarms have been triggered.

Search the Alarms View to see if any MAC events have been logged.

EXAMPLE: MAC events have not been logged for the network on which the UNIX users are attached.

Even though no alarms have occurred, errors may exist. For example, a lower rate of background errors may exist just below the alarm threshold. Because RMON alarms are based on maximum and minimum values, they may miss constant, periodic, or low amounts of errors.

LANsentry Manager Statistics View

Using the LANsentry Manager Statistics View, you can display a multisegment graph of utilization and error statistics.

Follow these steps:

1. Set up a graph that shows utilization and errors on all your major segments.

2. Determine whether any segments are particularly busy or error prone.

EXAMPLE: You notice that one segment of the UNIX network, HUB3, is reporting Too Long Errors and FCS Errors roughly every second sample. While the amount is not higher than normal, it is currently higher than any other segment.

LANsentry Manager History View

Using the LANsentry Manager History View, you can display a rolling history table to determine if the errors that you are seeing are new. For example, if you have a history table that runs for 30-minute samples over two days, you can compare the most recent sample to a previous sample, looking for new errors. If your probe has the resources, use a much finer resolution sample stored for a shorter time (every 30 seconds for 2 hours) to more easily spot recent errors.

EXAMPLE: You see that the history table shows that no error rates remained constant throughout the day. However, errors that did occur were on the device HUB3 and were Too Long Errors and FCS Errors.
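The comparison of a recent history sample with an earlier one can be sketched as a simple difference of error counters. This is only an illustration of the idea, not how LANsentry Manager stores its history: the dictionary layout, the function name, and the counts used in the usage note are hypothetical.

```python
def new_errors(previous_sample, latest_sample):
    """Return the error counters that are higher in the latest sample.

    Each sample maps an error name to the count seen in that interval.
    Counters missing from the previous sample are treated as zero.
    """
    increases = {}
    for name, count in latest_sample.items():
        earlier = previous_sample.get(name, 0)
        if count > earlier:
            increases[name] = count - earlier
    return increases
```

Given an earlier HUB3 sample of `{"FCS Errors": 2}` and a latest sample of `{"FCS Errors": 2, "Too Long Errors": 3}`, only the Too Long Errors entry is reported as new, while the unchanged background FCS Errors are ignored.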

Reproducing the Fault While Monitoring the Network

Although the RMON View in LANsentry Manager can show error rates and help you to identify the location of the problem, it may not provide enough data to solve the problem. To determine the cause of the problem, reproduce it while you monitor the network by using these applications:

• LANsentry Manager Top-N Graph - To locate a quiet node to use for reproducing the fault

• LANsentry Manager Packet Capture - To capture packets from the hub to which the quiet node is attached

• LANsentry Manager Packet Decode - To analyze the packets to assess network file server traffic and delays

• Address Tracker - To find the location of the problem nodes

EXAMPLE: Using LANsentry Manager, you find a hub on the network with a higher than normal error rate. However, the error rate does not seem high enough to cause login delays of 60 seconds or more.

LANsentry Manager Top-N Graph

Using the Top-N graph in the LANsentry Manager main window, locate a quiet node that has been showing the same problem. Choose a quiet node so that you do not receive excessive traffic when you try to isolate the problem.

EXAMPLE: You see that the node Monolith, which has the same Network File System (NFS) mounts as the other nodes on the network, is quiet. You decide to use this node for reproducing the fault. See Network File System (NFS) Protocol for more information about NFS.

LANsentry Manager Packet Capture

Using the LANsentry Manager Packet Capture application, capture packets from the network using predefined patterns and start-and-stop conditions.

Follow these steps:

1. Set up a capture buffer on a probe that is connected to the same hub as the quiet node. Until you know more about the problem, set a very general filter.

EXAMPLE: You select a MAC-layer filter and set a conversation filter to capture all packets to and from Monolith.

2. Telnet into and log out from the quiet node. Then reset the capture buffer. Repeat this procedure until you see the problem reflected in your captured data. To keep the buffer information clean, reset the buffer each time that you repeat the procedure.

3. When you see the delay, note the rough value of the packet count on the LANsentry Manager packet buffer.

By noting the packet count at which you think the delay has occurred, you can narrow the problem to within about 20 packets in the buffer. If you have used an extremely quiet node, you may even identify the exact packet.

LANsentry Manager Packet Decode

The LANsentry Manager Packet Decode application decodes all major protocols and displays the packet contents at three levels of detail: summary information, header information, and actual packet content.

Follow these steps:

1. Open the buffer in the Packet Decode application and locate the number of the packet at which the delay occurred.

2. Select the packet and launch a MAC-layer conversation filter. In the filter display, look for a gap in the conversation (that is, where the node sent a request and then resent it at approximately the same rate as the delay you experienced when recreating the problem).

3. Repeat the test to determine if the result concentrates on one node or if it appears on other nodes.

EXAMPLE: On the quiet node that you selected, the delay is obvious. You see an NFS request going out to a node and a repeat of the request 30 seconds later. During that time, the node did not respond. You now know that the delay occurred because nodes were not seeing responses for NFS requests. When you repeat the test on other nodes, you find that the delay is happening with more than one destination node.
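Looking for a gap in a conversation amounts to finding a request that was sent again after a long silence. The sketch below shows that search over a simplified list of captured requests; the tuple layout, the function name, the 10-second minimum gap, and the server name used in the usage note are hypothetical and are not taken from the Packet Decode application.

```python
def find_retry_gaps(requests, min_gap_s=10.0):
    """Find requests that were resent after a long gap.

    requests is a list of (timestamp_s, source, destination, request_id)
    tuples sorted by time. Returns a list of
    ((source, destination, request_id), first_seen_s, resent_s) entries.
    """
    last_seen = {}
    gaps = []
    for timestamp, source, destination, request_id in requests:
        key = (source, destination, request_id)
        if key in last_seen and timestamp - last_seen[key] >= min_gap_s:
            # Same request seen again after a long silence: a retry.
            gaps.append((key, last_seen[key], timestamp))
        last_seen[key] = timestamp
    return gaps


capture = [
    (0.5, "Monolith", "server1", 42),   # original request
    (0.6, "Monolith", "server1", 43),   # unrelated request
    (30.5, "Monolith", "server1", 42),  # the same request, resent
]
gaps = find_retry_gaps(capture)
```

Here `gaps` contains one entry for request 42, first seen at 0.5 seconds and resent at 30.5 seconds, which matches the 30-second delay in the EXAMPLE.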


Address Tracker

Use Address Tracker, which polls managed devices, to determine the hubs to which the problem nodes are attached. If the problem end stations are located on unmanaged devices, then you can at least narrow the problem to those unmanaged devices.

EXAMPLE: Although your network does not have managed hubs that Transcend NCS management software can poll, it does have managed switches. When it polls the switches, Address Tracker displays the switch ports on which addresses were last seen. This information indicates the hub (but not the hub port) on which the device is located.

LANsentry Manager Packet Decode

After you know the location of the hub that has the problem node, monitor the problem from the hub using LANsentry Manager Packet Decode.

Follow these steps:

1. To capture packets from one of the nodes on the hub, set up another capture buffer and repeat the exercise that is described in LANsentry Manager Packet Capture. Because a delay may occur on a different node, use two capture buffers without stopping the first one.

Note the rough packet count where the delay appears.

2. Display a conversation filter of the packet where the delay appears and look for the gap in the conversation.

EXAMPLE: You hope that the nodes are on the same hub. You find that all the nodes are on HUB3. This result indicates that FCS Errors may be causing the timeouts. However, because the errors occur at a low rate, you decide to verify this diagnosis. You monitor the problem from the hub, logging in and out many times, and the delay eventually occurs. This time, the delay shows that the node's reply had an FCS Error even though the node received the request. The switch would not have transmitted this packet, causing a timeout on the NFS protocol. The retry time is presumably 30 seconds. During this test, you see the problem occurring on another node.

Correcting the Fault


Without a managed hub, you may find it very difficult to discover network file server timeout errors. To find the problematic node, you must either systematically isolate nodes by monitoring each node for a prolonged period or temporarily insert a managed hub.

EXAMPLE: You notice that the captured error packet failed FCS because it was corrupted by a regular pattern during transmission. A possible reason for this occurrence is a Jabbering node. This explanation makes sense because FCS/Jabber frames increased linearly when you were monitoring the live network.

Network File Server Timeouts Reference

This section explains terms that are relevant to network file server timeouts and provides additional conceptual and problem analysis detail.

Jabbering

Jabbering occurs when a node transmits illegal-length packets and is possibly not operating within carrier specifications. In effect, another node has written bad data over a valid packet. This bad data is often interpreted as a repeated sequence of data.

Network File System (NFS) Protocol

A distributed file system protocol developed by Sun Microsystems that allows a computer system to access files over a network as if they were on its local disks. This protocol has been incorporated into products by more than 200 companies. It is now a de facto Internet standard.

NFS is one protocol in the NFS suite of protocols, which includes NFS, RPC, XDR (External Data Representation), and others. These protocols are part of a larger architecture that Sun Microsystems refers to as Open Network Computing (ONC). ONC is a distributed applications architecture designed by Sun and currently controlled by a consortium led by Sun.

Troubleshooting TCP/IP - Detailed Steps

This article shows how to troubleshoot TCP/IP connectivity between computers on a Windows network. If you haven't already done so, disable XP's Internet Connection Firewall on all local area network connections, and remove all firewall programs on the network. Improperly configured firewalls are the most common cause of TCP/IP problems.


Open a Command Prompt Window

For many of these steps yoursquoll be typing at the command prompt To open a command prompt window in Windows 2000 or XP click Start | Run type cmd in the box and click OK To open a command prompt window in Windows 95 98 or Me click Start | Run type command in the box and click OK Type one command per line and press Enter after each one to execute it To close the command prompt window use the exit command

Determine the TCP/IP Settings

Determine the TCP/IP settings of each computer on the local area network. In XP, open the Network Connections folder, right click the LAN connection, and click Status | Support | Details. For example, here are the Status and Details views for the LAN connection on an Internet Connection Sharing host.

In Windows 95/98/Me, click Start | Run, type winipcfg in the box, and click OK. Select the LAN adapter from the menu, and click More Info. Here's the winipcfg view for an ICS client running Windows Me.

You can also see the TCP/IP settings from the command prompt. This is especially convenient if a computer has more than one network adapter. Use the ipconfig /all command, which is available in all versions except Windows 95. The output from this command can be long, so it's best to write it to a file. Specify the file name in the command this way:

ipconfig /all >ipconfig.txt
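
Once the output is saved to a file, the individual settings can be picked out of it with a short script. The following is a minimal Python sketch and not part of the original article. The function name parse_ipconfig and the sample text are illustrative assumptions, and the exact layout of ipconfig output differs between Windows versions.

```python
def parse_ipconfig(text):
    """Collect 'Name . . . : value' lines from saved ipconfig output.

    Assumes each setting sits on one line, with the name padded by dots
    and separated from the value by ' : '. Returns a dict that maps each
    setting name to a list of values (one per adapter that reports it).
    """
    settings = {}
    for line in text.splitlines():
        if " : " not in line:
            continue  # adapter headings and blank lines carry no value
        name, value = line.split(" : ", 1)
        name = name.strip(" .")  # drop the dotted padding around the name
        settings.setdefault(name, []).append(value.strip())
    return settings


# Hypothetical sample in the style of "ipconfig /all" output.
SAMPLE = """
Ethernet adapter Local Area Connection:

        IP Address. . . . . . . . . . . . : 192.168.1.101
        Subnet Mask . . . . . . . . . . . : 255.255.255.0
        Default Gateway . . . . . . . . . : 192.168.1.1
"""

if __name__ == "__main__":
    for name, values in parse_ipconfig(SAMPLE).items():
        print(name, "=", ", ".join(values))
```

To use it on a real capture, read ipconfig.txt into a string and pass that string to parse_ipconfig.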

Description of TCP/IP Settings

Here are the TCP/IP settings that are used in network troubleshooting:

• IP Address - Unique address assigned to a network adapter. A computer with multiple network adapters has an IP address for each one, and each one must be in a different subnet.

• Subnet Mask - Used in conjunction with the IP address to determine which subnet an adapter belongs to. At the simplest level, communication is only possible between two network adapters when they're in the same subnet.

• Default Gateway - IP address of a computer or router, on one of this computer's local area networks, that knows how to communicate with subnets not present on this computer. For an Internet connection, the default gateway is a router belonging to your Internet service provider, and all access to sites on the Internet goes through it. For an ICS client, the default gateway is the ICS host. If you use a hardware router, it serves as the default gateway.

• DHCP Server - If an adapter is configured to obtain an IP address automatically, this is the address of the server that provides it. It could be your ISP, an ICS host, or a hardware router.

• DNS Servers - IP address of one or more Domain Name Server computers. DNS servers translate Internet names to their IP addresses (like 63.146.109.227).
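
The role of the subnet mask described above can be illustrated with Python's standard ipaddress module. This is a minimal sketch rather than anything from the original article. The function name same_subnet is hypothetical, and the 255.255.255.0 mask is an assumed value for the example network.

```python
import ipaddress


def same_subnet(ip_a, ip_b, mask):
    """Return True when both addresses fall in the same subnet for this mask."""
    net_a = ipaddress.ip_interface(f"{ip_a}/{mask}").network
    net_b = ipaddress.ip_interface(f"{ip_b}/{mask}").network
    return net_a == net_b


if __name__ == "__main__":
    # Two adapters that share a subnet can talk to each other directly.
    print(same_subnet("192.168.1.101", "192.168.1.123", "255.255.255.0"))
    # An adapter in another subnet has to be reached through a gateway.
    print(same_subnet("192.168.1.101", "192.168.2.5", "255.255.255.0"))
```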

Subnets

See our article on subnets for a brief description of how they work. For more details, see this Microsoft Knowledge Base article.

If two computers are supposed to be on the same subnet but aren't, something is wrong with the network hardware or software configuration. This is most likely to happen when one of them receives an IP address of 169.254.x.x, which indicates that:

• It's configured to obtain an IP address automatically.
• It couldn't find a DHCP server on the network to make the assignment.
• Windows assigned it an Automatic Private IP Address.

See our article on Specific Networking Problems and Their Solutions for more information.
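
A 169.254.x.x address is easy to spot by eye, but the check can also be scripted. The sketch below uses Python's standard ipaddress module, whose is_link_local property covers the 169.254.0.0/16 range for IPv4. The function name is_apipa is a hypothetical name chosen for this example.

```python
import ipaddress


def is_apipa(address):
    """Return True for an IPv4 address in the 169.254.x.x automatic range."""
    ip = ipaddress.ip_address(address)
    return ip.version == 4 and ip.is_link_local


if __name__ == "__main__":
    for candidate in ("169.254.10.20", "192.168.1.101"):
        state = "automatic private address" if is_apipa(candidate) else "normal address"
        print(candidate, "->", state)
```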

Pinging

The ping command is the basic tool for testing TCP/IP connectivity. It sends a special packet (called ICMP Echo) to a particular IP address and looks for a reply. If everything is working right, the reply comes back. If not, the ping times out in a few seconds. By default, the ping command repeats the process four times. Here's an example of an ICS client computer pinging a Windows XP Home Edition ICS host, using the host's IP address and its computer name.

When ping fails, you'll see one of these error messages:

• Request timed out - The IP address is valid, but there's no reply from it. If the IP address is on a local area network, the most likely cause is a firewall program blocking the ping.

• Unknown host <name> or Ping request could not find host <name> - The computer name doesn't exist on the local area network. Make sure that NetBIOS over TCP/IP is enabled.

• Destination host unreachable - The IP address isn't on a local area network, and the default gateway can't access it. Either there's no default gateway, its address is wrong, or it isn't functioning.
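
The error messages above can be mapped to their likely causes with a simple lookup. This is an illustrative Python sketch, not something from the original article. The function name explain_ping_failure is hypothetical, and the matching assumes English-language ping output containing the phrases listed above.

```python
# Phrase to look for in ping output, paired with the likely cause from the list above.
PING_ERRORS = [
    ("request timed out",
     "Address is valid but did not reply; on a LAN, suspect a firewall."),
    ("unknown host",
     "Computer name not found; check that NetBIOS over TCP/IP is enabled."),
    ("could not find host",
     "Computer name not found; check that NetBIOS over TCP/IP is enabled."),
    ("destination host unreachable",
     "Address is off the LAN and the default gateway cannot reach it."),
]


def explain_ping_failure(output):
    """Return the likely cause for a failed ping, or None if no known phrase matches."""
    lowered = output.lower()
    for phrase, cause in PING_ERRORS:
        if phrase in lowered:
            return cause
    return None


if __name__ == "__main__":
    print(explain_ping_failure("Request timed out."))
```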

Pinging the Local Area Network

Here is a series of ping commands to use in finding where a problem occurs on a local area network. Run them in the order shown, and don't go on to the next command until all of the previous commands work properly. In this example:

• The computer being tested is named Winxp, with IP address 192.168.1.101.
• There's another computer on the network, named Win98, with IP address 192.168.1.123.

Substitute the appropriate IP addresses and computer names for your network.

Command              Target                          What Ping Failure Indicates
ping 127.0.0.1       Loopback address                Corrupted TCP/IP installation
ping localhost       Loopback name                   Corrupted TCP/IP installation
ping 192.168.1.101   This computer's IP address      Corrupted TCP/IP installation
ping winxp           This computer's name            Corrupted TCP/IP installation
ping 192.168.1.123   Another computer's IP address   Bad hardware or NIC driver
ping win98           Another computer's name         NetBIOS name resolution failure

To fix a corrupted TCP/IP installation on Windows XP, follow the steps in this Microsoft Knowledge Base article. For Windows 95/98/Me, un-install the TCP/IP protocol in Control Panel | Network, reboot, and re-install it. If that doesn't fix it, use this procedure on Windows 95 or 98.
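
The rule of running the local area network pings in order and stopping at the first failure can be sketched in Python. This is an illustration under stated assumptions, not part of the original article. The names LAN_STEPS, ping_ok, and first_failure are hypothetical, the addresses are the example values used above, and treating the ping exit status as the success indicator is an assumption that does not hold for every ping implementation or every kind of reply.

```python
import platform
import subprocess

# Ping targets in the recommended order, with what a failure at that step suggests.
LAN_STEPS = [
    ("127.0.0.1", "Corrupted TCP/IP installation"),
    ("localhost", "Corrupted TCP/IP installation"),
    ("192.168.1.101", "Corrupted TCP/IP installation"),
    ("winxp", "Corrupted TCP/IP installation"),
    ("192.168.1.123", "Bad hardware or NIC driver"),
    ("win98", "NetBIOS name resolution failure"),
]


def ping_ok(target):
    """Run the system ping four times; assume exit status 0 means success."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    result = subprocess.run(
        ["ping", count_flag, "4", target],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def first_failure(steps, ping=ping_ok):
    """Return (target, meaning) for the first failing step, or None if all pass."""
    for target, meaning in steps:
        if not ping(target):
            return target, meaning
    return None


if __name__ == "__main__":
    failure = first_failure(LAN_STEPS)
    print("All pings succeeded" if failure is None else f"{failure[0]}: {failure[1]}")
```

Passing the ping function as a parameter keeps the ordering logic separate from the system call, so the sequence can be checked with a stand-in function before it is pointed at a real network.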

Pinging the Internet

You can also use ping to find a problem with Internet access. Run these commands in the order shown, and don't go on to the next command until all of the previous commands work properly. Use the Default Gateway and DNS Server addresses that you got from the winipcfg or ipconfig /all command.

Command                  Target                What Ping Failure Indicates
ping w.x.y.z             Default Gateway       Default Gateway down
ping w.x.y.z             DNS Server            DNS Server down
ping w.x.y.z             Web site IP address   Internet service provider or web site down
ping www.something.com   Web site name         DNS Server down or web site down
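
The last row of the table leaves two possible causes when a ping by name fails. One way to tell them apart, offered here as an assumption rather than as a step from the original article, is to test name resolution on its own. The Python sketch below uses the standard socket.gethostbyname call. The function name name_failure_cause is hypothetical, and www.something.com is the placeholder name from the table.

```python
import socket


def name_failure_cause(hostname, resolve=socket.gethostbyname):
    """Suggest which cause is more likely after a ping by name has failed."""
    try:
        address = resolve(hostname)
    except OSError:
        # socket.gaierror (raised when a lookup fails) is a subclass of OSError.
        return "Name did not resolve: suspect the DNS server."
    return f"Name resolved to {address}: suspect the web site or the path to it."


if __name__ == "__main__":
    print(name_failure_cause("www.something.com"))
```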

REFERENCES

www.practicallynetwork.com

www.computerhope.com

www.wikipedia.org

This should show a listing of replies from the network card. If you receive an error, or if the transmission failed, it is likely that either the network card is not physically installed into the computer correctly or that the card is bad.

Protocol

Verify that the correct protocols are installed on the computer. Most networks today will utilize TCP/IP, but may also utilize or require IPX/SPX and NetBEUI.

Additional information and help with installing and reinstalling a network protocol can be found on document CH000470.

When the TCP/IP protocol is installed, unless a DHCP server or other computer assigns the IP address, the user must specify an IP address as well as a Subnet Mask. To do this, follow the below instructions.

1. Click Start, Settings, Control Panel.
2. Double-click the Network icon.
3. Within the configuration tab, double-click the TCP/IP protocol icon. Note: Do not click on the PPP or Dial-Up adapter; click on the network card adapter.
4. In the TCP/IP properties, click the IP address tab.
5. Select the option to specify an IP address.
6. Enter the IP address and Subnet Mask address. An example of such an address could be:

   IP Address: 102.55.92.1
   Subnet Mask: 255.255.255.192

7. When specifying these values, the computers on the network must all have the same Subnet Mask and have a different IP Address. For example, when using the above values on one computer, you would want to use an IP address of 102.55.92.2 on another computer and then specify the same Subnet Mask.
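
To see which addresses can share the example Subnet Mask, the range can be worked out with Python's standard ipaddress module. This is a minimal sketch added for illustration. The function name describe_subnet is hypothetical, and building the full host list is only practical for small subnets like this one.

```python
import ipaddress


def describe_subnet(ip, mask):
    """Summarize the subnet that an IP address and Subnet Mask define."""
    network = ipaddress.ip_interface(f"{ip}/{mask}").network
    hosts = list(network.hosts())  # usable addresses; fine for small subnets
    return {
        "network": str(network),
        "first_host": str(hosts[0]),
        "last_host": str(hosts[-1]),
        "host_count": len(hosts),
    }


if __name__ == "__main__":
    for key, value in describe_subnet("102.55.92.1", "255.255.255.192").items():
        print(key, "=", value)
```

With the example values, every computer on the network would need a distinct address between the first and last host reported here, which includes both 102.55.92.1 and 102.55.92.2.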

Firewall

If your computer network utilizes a firewall, ensure that all ports required are open. If possible, close the firewall software program or disconnect the computer from the firewall to ensure it is not causing the problem.

Additional time

In some cases, it may take a computer some additional time to detect or see the network. If, after booting the computer, you are unable to see the network, give the computer 2-3 minutes to detect the network. Windows users may also want to try pressing the F5 (refresh) key when in Network Neighborhood to refresh the network connections and possibly detect the network.

Additional troubleshooting

If, after following or verifying the above recommendations, you are still unable to connect or see the network, attempt one or more of the below recommendations.

If you have installed or are using TCP/IP as your protocol, you can attempt to ping another computer's IP address to verify if the computer is able to send and receive data. To do this, Windows or MS-DOS users must be at a prompt, and Linux / Unix variant users must open or be at a shell.

Once at the prompt, assuming that the address of the computer you wish to attempt to ping is 102.55.92.2, you would type:

ping 102.55.92.2

If you receive a response back from this address (and it is a different computer), this demonstrates that the computer is communicating over the network. If you are still unable to connect or see the network, it is possible that other issues may be present.

Another method of determining network issues is to use the tracert command if you are a MS-DOS or Windows user, or the traceroute command if you are a Linux / Unix variant user. To use this command, you must be at the command prompt or shell.

Once at the prompt, assuming that the address is again 102.55.92.2, type:

tracert 102.55.92.2

or

traceroute 102.55.92.2

This should begin listing the hops between the computer and network devices. When the connection fails, determine which device is causing the issue by reviewing the traceroute listing.
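
A script that has to run on both kinds of system can pick the right command name at run time. The following Python sketch is an illustration only. The function name trace_command is hypothetical, and it assumes that platform.system() reports "Windows" on Windows and that the matching tool is installed and on the search path.

```python
import platform
import subprocess


def trace_command(target, system=None):
    """Build the route-tracing command for the given (or current) operating system."""
    system = system or platform.system()
    tool = "tracert" if system == "Windows" else "traceroute"
    return [tool, target]


if __name__ == "__main__":
    # Prints the hop listing produced by the system tool.
    subprocess.run(trace_command("102.55.92.2"))
```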

Network Troubleshooting Overview

These sections introduce you to the concepts and practice of network troubleshooting:

• Introduction to Network Troubleshooting
• Network Troubleshooting Framework
• Troubleshooting Strategy

Network troubleshooting means recognizing and diagnosing networking problems with the goal of keeping your network running optimally. As a network administrator, your primary concern is maintaining connectivity of all devices (a process often called fault management). You also continually evaluate and improve your network's performance. Because serious networking problems can sometimes begin as performance problems, paying attention to performance can help you address issues before they become serious.

About Connectivity Problems

Connectivity problems occur when end stations cannot communicate with other areas of your local area network (LAN) or wide area network (WAN). Using management tools, you can often fix a connectivity problem before users even notice it. Connectivity problems include:

• Loss of connectivity - When users cannot access areas of your network, your organization's effectiveness is impaired. Immediately correct any connectivity breaks.

• Intermittent connectivity - Although users have access to network resources some of the time, they are still facing periods of downtime. Intermittent connectivity problems can indicate that your network is on the verge of a major break. If connectivity is erratic, investigate the problem immediately.

• Timeout problems - Timeouts cause loss of connectivity, but are often associated with poor network performance.

About Performance Problems

Your network has performance problems when it is not operating as effectively as it should. For example, response times may be slow, the network may not be as reliable as usual, and users may be complaining that it takes them longer to do their work. Some performance problems are intermittent, such as instances of duplicate addresses. Other problems can indicate a growing strain on your network, such as consistently high utilization rates.

If you regularly examine your network for performance problems, you can extend the usefulness of your existing network configuration and plan network enhancements, instead of waiting for a performance problem to adversely affect the users' productivity.

Solving Connectivity and Performance Problems

When you troubleshoot your network, you employ tools and knowledge already at your disposal. With an in-depth understanding of your network, you can use network software tools, such as Ping, and network devices, such as Analyzers, to locate problems, and then make corrections, such as swapping equipment or reconfiguring segments, based on your analysis.

Transcend® provides another set of tools for network troubleshooting. These tools have graphical user interfaces that make managing and troubleshooting your network easier. With Transcend Applications, you can:

• Baseline your network's normal status to use as a basis for comparison when the network operates abnormally
• Precisely monitor network events
• Be notified immediately of critical problems on your network, such as a device losing connectivity
• Establish alert thresholds to warn you of potential problems that you can correct before they affect your network
• Resolve problems by disabling ports or reconfiguring devices

Network Troubleshooting Framework

The International Organization for Standardization (ISO) Open Systems Interconnection (OSI) reference model is the foundation of all network communications. This seven-layer structure provides a clear picture of how network communications work.

Protocols (rules) govern communications between the layers of a single system and among several systems. In this way, devices made by different manufacturers or using different designs can use different protocols and still communicate.

By understanding how network troubleshooting fits into the framework of the OSI model, you can identify at what layer problems are located and which type of troubleshooting tools to use. For example, unreliable packet delivery can be caused by a problem with the transmission media or with a router configuration. If you are receiving high rates of FCS Errors and Alignment Errors, which you can monitor with Status Watch, then the problem is probably located at the physical layer and not the network layer. Figure 1 shows how to troubleshoot the layers of the OSI model.

Table 5 describes the data that the network management tools can collect as it relates to the OSI model layers.

Table 5: Network Data and the OSI Model Layers

Layer                        Data Collected                     Transcend NCS Tool Used
Application, Presentation,   Protocol information and other     • LANsentry Manager
Session, Transport           Remote Monitoring (RMON) and       • Traffix Manager (for more detail)
                             RMON2 data
Network                      Routing information                • Status Watch
                                                                • LANsentry Manager (for more detail)
                                                                • Traffix Manager (for more detail)
Data Link                    Traffic counts and other packet    • Status Watch
                             breakdowns                         • LANsentry Manager (for more detail)
Physical                     Error counts                       • Status Watch

Figure 1: OSI Reference Model and Network Troubleshooting

Troubleshooting Strategy

How do you know when you are having a network problem? The answer to this question depends on your site's network configuration and on your network's normal behavior. See Knowing Your Network for more information.

If you notice changes on your network, ask the following questions:

• Is the change expected or unusual?
• Has this event ever occurred before?
• Does the change involve a device or network path for which you already have a backup solution in place?
• Does the change interfere with vital network operations?
• Does the change affect one or many devices or network paths?

After you have an idea of how the change is affecting your network, you can categorize it as critical or noncritical. Both of these categories need resolution (except for changes that are one-time occurrences); the difference between the categories is the time that you have to fix the problem.

By using a strategy for network troubleshooting, you can approach a problem methodically and resolve it with minimal disruption to network users. It is also important to have an accurate and detailed map of your current network environment. Beyond that, a good approach to problem resolution is:

• Recognizing Symptoms
• Understanding the Problem
• Identifying and Testing the Cause of the Problem
• Solving the Problem

Recognizing Symptoms

The first step to resolving any problem is to identify and interpret the symptoms. You may discover network problems in several ways. Users may complain that the network seems slow or that they cannot connect to a server. You may pass your network management station and notice that a node icon is red. Your beeper may go off and display the message: WAN connection down.

User Comments

Although you can often solve networking problems before users notice a change in their environment, you invariably get feedback from your users about how the network is running, such as:

• They cannot print.
• They cannot access the application server.
• It takes them much longer to copy files across the network than it usually does.
• They cannot log on to a remote server.
• When they send e-mail to another site, they get a routing error message.
• Their system freezes whenever they try to Telnet.

Network Management Software Alerts

Network management software, as described in Your Network Troubleshooting Toolbox, can alert you to areas of your network that need attention. For example:

• The application displays red (Warning) icons.
• Your weekly Top-N utilization report (which indicates the 10 ports with the highest utilization rates) shows that one port is experiencing much higher utilization levels than normal.
• You receive an e-mail message from your network management station that the threshold for broadcast and multicast packets has been exceeded.

These signs usually provide additional information about the problem, allowing you to focus on the right area.
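
The ideas behind a Top-N report and a threshold alert can be shown with a few lines of Python. This sketch is not how the Transcend tools are implemented. The function names, port names, utilization figures, and threshold value are all hypothetical and serve only to illustrate the two calculations.

```python
import heapq


def top_n_ports(utilization, n=10):
    """Return the n (port, utilization) pairs with the highest utilization."""
    return heapq.nlargest(n, utilization.items(), key=lambda item: item[1])


def over_threshold(utilization, threshold):
    """Return the ports whose utilization exceeds the alert threshold, sorted by name."""
    return sorted(port for port, value in utilization.items() if value > threshold)


if __name__ == "__main__":
    # Hypothetical utilization percentages per port.
    sample = {"port1": 12.0, "port2": 85.5, "port3": 40.0}
    print(top_n_ports(sample, n=2))
    print(over_threshold(sample, threshold=50.0))
```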

Analyzing Symptoms

When a symptom occurs, ask yourself these types of questions to narrow the location of the problem and to get more data for analysis:

• To what degree is the network not acting normally (for example, does it now take one minute to perform a task that normally takes five seconds)?
• On what subnetwork is the user located?
• Is the user trying to reach a server, end station, or printer on the same subnetwork or on a different subnetwork?
• Are many users complaining that the network is operating slowly or that a specific network application is operating slowly?
• Are many users reporting network logon failures?
• Are the problems intermittent? For example, some files may print with no problems, while other printing attempts generate error messages, make users lose their connections, and cause systems to freeze.

Understanding the Problem

Networks are designed to move data from a transmitting device to a receiving device. When communication becomes problematic, you must determine why data are not traveling as expected and then find a solution. The two most common causes for data not moving reliably from source to destination are:

• The physical connection breaks (that is, a cable is unplugged or broken).
• A network device is not working properly and cannot send or receive some or all data.

Network management software can easily locate and report a physical connection break (layer 1 problem). It is more difficult to determine why a network device is not working as expected, which is often related to a layer 2 or a layer 3 problem.

To determine why a network device is not working properly, look first for:

• Valid service - Is the device configured properly for the type of service it is supposed to provide? For example, has Quality of Service (QoS), which is the definition of the transmission parameters, been established?

• Restricted access - Is an end station supposed to be able to connect with a specific device, or is that connection restricted? For example, is a firewall set up that prevents that device from accessing certain network resources?

• Correct configuration - Is there a misconfiguration of IP address, subnet mask, gateway, or broadcast address? Network problems are commonly caused by misconfiguration of newly connected or configured devices. See Manager-to-Agent Communication for more information.

Identifying and Testing the Cause of the Problem

After you develop a theory about the cause of the problem, test your theory. The test must conclusively prove or disprove your theory.

Two general rules of troubleshooting are:

• If you cannot reproduce a problem, then no problem exists, unless it happens again on its own.
• If the problem is intermittent and you cannot replicate it, you can configure your network management software to catch the event in progress.

For example, with LANsentry Manager, you can set alarms and automatic packet capture filters to monitor your network and inform you when the problem occurs again. See Configuring Transcend NCS for more information.

Although network management tools can provide a great deal of information about problems and their general location, you may still need to swap equipment or replace components of your network until you locate the exact trouble spot.

After you test your theory, either fix the problem as described in Solving the Problem or develop another theory.

Sample Problem Analysis

This section illustrates the analysis phase of a typical troubleshooting incident.

On your network, a user cannot access the mail server. You need to establish two areas of information:

• What you know - In this case, the user's workstation cannot communicate with the mail server.

• What you do not know and need to test -

  • Can the workstation communicate with the network at all, or is the problem limited to communication with the server? Test by sending a Ping or by connecting to other devices.
  • Is the workstation the only device that is unable to communicate with the server, or do other workstations have the same problem? Test connectivity at other workstations.
  • If other workstations cannot communicate with the server, can they communicate with other network devices? Again, test the connectivity.

The analysis process follows these steps:

1. Can the workstation communicate with any other device on the subnetwork?

  • If no, then go to step 2.
  • If yes, determine if only the server is unreachable.
    • If only the server cannot be reached, this suggests a server problem. Confirm by doing step 2.
    • If other devices cannot be reached, this suggests a connectivity problem in the network. Confirm by doing step 3.

2. Can other workstations communicate with the server?

  • If no, then most likely it is a server problem. Go to step 3.
  • If yes, then the problem is that the workstation is not communicating with the subnetwork. (This situation can be caused by workstation issues or a network issue with that specific station.)

3. Can other workstations communicate with other network devices?

  • If no, then the problem is likely a network problem.
  • If yes, the problem is likely a server problem.
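
The three steps above form a small decision tree, which can be written out as a function. The Python sketch below is one reading of those steps rather than a definitive procedure. The function name diagnose and its parameter names are hypothetical, and each argument stands for the answer you obtained by testing connectivity by hand.

```python
def diagnose(ws_reaches_any_device, only_server_unreachable,
             others_reach_server, others_reach_devices):
    """Follow the three analysis steps and name the likely problem area.

    Each argument is a True/False answer gathered from connectivity tests.
    Returns "workstation", "server", or "network".
    """
    # Step 1: decide whether to continue with step 2 or jump to step 3.
    if ws_reaches_any_device and not only_server_unreachable:
        go_to_step_2 = False  # other devices are unreachable too: confirm with step 3
    else:
        go_to_step_2 = True

    # Step 2: can other workstations communicate with the server?
    if go_to_step_2 and others_reach_server:
        return "workstation"

    # Step 3: can other workstations communicate with other network devices?
    return "server" if others_reach_devices else "network"


if __name__ == "__main__":
    # The workstation reaches nothing, but other workstations reach the server.
    print(diagnose(False, False, True, True))
```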

When you determine whether the problem is with the server, subnetwork, or workstation, you can further analyze the problem, as follows:

• For a problem with the server - Examine whether the server is running, if it is properly connected to the network, and if it is configured appropriately.

• For a problem with the subnetwork - Examine any device on the path between the users and the server.

• For a problem with the workstation - Examine whether the workstation can access other network resources and if it is configured to communicate with that particular server.

Equipment for Testing

To help identify and test the cause of problems, have available:

• A laptop computer that is loaded with a terminal emulator, TCP/IP stack, TFTP server, CD-ROM drive (to read the online documentation), and some key network management applications, such as LANsentry® Manager. With the laptop computer, you can plug into any subnetwork to gather and analyze data about the segment.

• A spare managed hub to swap for any hub that does not have management. Swapping in a managed hub allows you to quickly spot which port is generating the errors.

• A single port probe to insert in the network if you are having a problem where you do not have management capability.

• Console cables for each type of connector, labeled and stored in a secure place.

Solving the Problem

Many device or network problems are straightforward to resolve but others yield misleading symptoms If one solution does not work continue with another

A solution often involves

bull Upgrading software or hardware (for example upgrading to a new version of agent software or installing Gigabit Ethernet devices)

bull Balancing your network load by analyzing bull What users communicate with which servers bull What the user traffic levels are in different segments

Based on these findings you can decide how to redistribute network traffic

bull Adding segments to your LAN (for example adding a new switch where utilization is continually high)

bull Replacing faulty equipment (for example replacing a module that has port problems or replacing a network card that has a faulty jabber protection mechanism)

To help solve problems, have available:

• Spare hardware equipment (such as modules and power supplies), especially for your critical devices

• A recent backup of your device configurations to reload if flash memory gets corrupted (which can sometimes happen due to a power outage)


Your Network Troubleshooting Toolbox

A robust network troubleshooting toolbox consists of items (such as network management applications, hardware devices, and other software) to recognize, diagnose, and solve networking problems. It contains:

• Transcend Applications
• Network Management Platforms
• 3Com SmartAgent Embedded Software
• Other Commonly Used Tools

Transcend management software is optimized for managing 3Com devices and their attached networks. However, some applications, such as LANsentry Manager, can manage any vendor's networking equipment that complies with the Remote Monitoring (RMON) Management Information Base (MIB).

This section describes these Transcend applications, which you can use to troubleshoot your network:

• Transcend Central
• Status Watch
• Address Tracker
• LANsentry Manager
• Traffix Manager
• Device View

Transcend Central

Start with Transcend Central, which is an asset management and device grouping application, to understand what your network consists of and to control the Transcend NCS network management troubleshooting tools. Transcend Central is available as both a native Windows application and a Java application that you can access using a Web browser.

Using Transcend Central for troubleshooting, you can:

• Display an inventory of device, module, and port information.

• Group devices to make your troubleshooting tasks easier. By managing a collection of devices, you can simultaneously perform the same tasks on each device in a group and locate physical or logical problems on your network.

• Launch Transcend NCS applications, including some of your primary Transcend NCS troubleshooting tools:
  • Status Watch - includes Web Reporter (from the Java version)
  • Address Tracker
  • LANsentry Manager
  • Traffix Manager
  • Device View

Status Watch

The Status Watch applications manage 3Com devices and their attached networks. They primarily poll for MIB-II data. Status Watch is a performance-monitoring application that allows you to monitor the operational status of your network devices and quickly identify any problems that require your attention. It works in conjunction with Web Reporter.

Web Reporter

Web Reporter is a data-reporting application that runs in a World Wide Web (WWW) browser. It generates reports from data that Status Watch collects, allowing you to compare network statistics against a baseline.

Address Tracker

Address Tracker is an address collection and discovery application that:

• Polls managed devices for all MAC addresses
• Polls managed devices and routers for IP addresses to perform MAC-to-IP address translation
• Uses Device View to disable troublesome ports

LANsentry Manager

LANsentry Manager is a set of integrated applications that displays and explores the real-time and historical data that RMON-compliant devices (probes) on the network capture. LANsentry Manager uses SNMP polling to gather RMON and RMON2 data from the probes.

Use LANsentry Manager to:

• Monitor current performance of network segments
• See trends over time
• Spot signs of current problems
• Configure alarms to monitor for specific events
• Capture packets and display their contents

LANsentry Manager works with any device (from 3Com or other vendors) that supports the RMON MIB or the RMON2 MIB.


Traffix Manager

Traffix Manager is a performance-monitoring application that provides information about layer 2 (RMON) and layer 3 conversations between nodes. It helps you to assess traffic patterns on your network. Traffix Manager:

• Monitors all the stations that the RMON2-compliant probes encounter on your network
• Captures and stores RMON and RMON2 data for your network's protocols and applications
• Displays traffic between stations in user-defined views of the network
• Graphs current or historical data on the devices selected
• Delivers reports for user-specified stations and time periods as PostScript to your printer or as HTML to your Web server
• Launches LANsentry Manager tools for in-depth analysis of a station or a conversation between stations

You can use Traffix Manager to:

• Know your network - Understand overall flow patterns and interactions between systems, and determine how your network is really being used at the application level.

• Optimize your network - Gain an insight into traffic and application usage trends to help you optimize the use and placement of current network resources and make wise decisions about capacity planning and network growth.

Traffix Manager works with any device (from 3Com or other vendors) that supports the RMON2 MIB.

Device View

The Device View application is a device configuration tool. When you troubleshoot your network, you can use Device View to determine or change a device's configuration. You can also use Device View to look at a device's statistics and to set alarms.

Device View manages only 3Com devices.

Network Management Platforms

As part of your troubleshooting toolbox, your network management platform is the first place to go to view the overall health of your network. With the platform, you can understand the logical configuration of your network and configure views of your network to understand how devices work together and the role that they play in the users' work. The network management platform that supports your Transcend software installation can provide valuable troubleshooting tools. Transcend runs on several platforms within the NT and UNIX environments.

The platform discovers the devices. Transcend imports that information from the platform to populate the core database. Unless you are rediscovering, you must manually update the platform.

Using this device database, a map displays the graphical representation of your network. Each device on your network appears as a symbol (icon) on the map. You can configure views of your network to show devices on the same subnetworks or floors.

You can monitor network performance and diagnose network performance and connectivity problems. You can also:

• Take a snapshot of your network in its normal state. The snapshot records the state of your network at a particular instant. If you later have network performance problems, you can compare the current state of your network to the snapshot.

• Quickly determine the connectivity status of a device by noting the color of its map symbol. Red usually means that communication with a device has ceased.

• Diagnose connectivity problems by determining whether two devices can communicate. If they can communicate, then examine the route between the devices, the number of packets that were sent and lost, and the roundtrip time between the two devices.

• Manage MIB information (for example, collecting and storing MIB data for trend analysis and graphing) using MIB queries. Transcend compiles MIBs and allows you to navigate up and down the MIB Tree to retrieve MIB objects from devices. You can set thresholds for MIB data and generate events when a threshold is exceeded.

• Configure the software to act on certain events. The Event Categories window informs you of any unexpected events (which arrive in the form of traps).

For more information, see the documentation that is shipped with your software.

3Com SmartAgent Embedded Software

Traditional Simple Network Management Protocol (SNMP) management places the burden of collecting network management information on the management station. In this traditional model, software agents collect information about throughput, record errors or packet overflows, and measure performance based on established thresholds. Through a polling process, agents pass this information to a centralized network management station whenever they receive an SNMP query. Management applications then make the data useful and alert the user if there are problems on the device.


As a useful companion to traditional network management methods, 3Com's SmartAgent® technology places management intelligence into the software agent that runs within a 3Com device. This scalable solution reduces the amount of computational load on the management station and helps minimize management-related network traffic.

SmartAgent software, which uses the RMON MIB, is self-monitoring, collecting and analyzing its own statistical, analytical, and diagnostic data. In this way, you can conduct network management by exception - that is, you are notified only if a problem occurs. Management by exception is unlike traditional SNMP management, in which the management software collects all data from the device through polling.

SmartAgent software works autonomously and reports to the network management station whenever an exceptional network event occurs. The software can also take direct action without involving the management station. Devices that contain SmartAgent software may be able to:

• Perform broadcast throttling to minimize the flow of broadcast traffic on your network
• Monitor the ratio of good frames to bad frames
• Switch a resilient link pair to the standby path if the primary path corrupts frames
• Report if traffic on vital segments drops below minimum usage levels
• Disable a port for five seconds to clear problems and then automatically reconnect it
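The idea of management by exception can be illustrated with a small Python sketch that looks at good and bad frame counts per port and reports only the ports that cross a threshold. It is not SmartAgent code: the function names, the counter layout, and the 1 percent default threshold are all assumptions chosen for the example.

```python
def bad_frame_ratio(good_frames, bad_frames):
    """Return the fraction of frames that were bad (0.0 if no traffic)."""
    total = good_frames + bad_frames
    if total == 0:
        return 0.0
    return bad_frames / total


def exceptions_only(port_counters, max_bad_ratio=0.01):
    """Report only the ports whose bad-frame ratio exceeds the threshold.

    `port_counters` maps a port name to a (good, bad) tuple. The 1 percent
    default threshold is an assumption, not a vendor recommendation.
    """
    alerts = {}
    for port, (good, bad) in port_counters.items():
        ratio = bad_frame_ratio(good, bad)
        if ratio > max_bad_ratio:
            alerts[port] = ratio
    return alerts


# Hypothetical counters: only port-2 should be reported.
counters = {"port-1": (995, 5), "port-2": (900, 100), "port-3": (0, 0)}
alerts = exceptions_only(counters)
```

Healthy ports produce no output at all, which is the point of the approach: the management station hears only about the exceptions.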

The Transcend NCS applications LANsentry Manager and Traffix Manager make the RMON data that the SmartAgent software collects more usable by summarizing and correlating important information.

Other Commonly Used Tools

These commonly used tools can also help you troubleshoot your network:

• Network software, such as Ping, Telnet, and FTP and TFTP. You can use these applications to troubleshoot, configure, and upgrade your system.
• Network monitoring devices, such as Analyzers and Probes
• Tools, such as Cable Testers, for working on physical problems

Ping

Packet Internet Groper (Ping) allows you to quickly verify the connectivity of your network devices. Ping attempts to transmit a packet from one device to a station on the network and listens for the response to ensure that it was correctly received. You can validate connections on the parts of your network by pinging different devices:


• A successful response indicates that a valid network path exists between your station and the remote host and that the remote host is active.

• Slower response times than normal can indicate that the path is congested or obstructed.

• A failed response indicates that a connection is broken somewhere. Use the message to help locate the problem. See "Tips on Interpreting Ping Messages."

Strategies for Using Ping

Follow these strategies for using Ping:

• Ping devices when your network is operating normally so that you have a performance baseline for comparison. See "Identifying Your Network's Normal Behavior" for more information.

• Ping by IP address when:
  • You want to test devices on different subnetworks. This method allows you to Ping your network segments in an organized way, rather than having to remember all the hostnames and locations.
  • Your Domain Name System (DNS) server is down and your system cannot look up host names properly. You can Ping with IP addresses even if you cannot access hostname information.

• Ping by hostname when you want to identify DNS server problems.

• To troubleshoot problems that involve large packet sizes, Ping the remote host repeatedly, increasing the packet size each time.

• To determine if a link is erratic, perform a continuous Ping (using ping -s on UNIX), which indicates the time that it takes the device to respond to each Ping.

• To determine a route taken to a destination, use the trace route function (tracert).

• Consider creating a Ping script that periodically sends a Ping to all necessary networking devices. If a Ping failure message is received, the script can perform some action to notify you of the problem, such as paging you.

• Use the Ping functions of your network management platform. For example, in your HP OpenView map, select a device and click the right mouse button to gain access to ping functions.
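The Ping script suggested in the list above might look like the following Python sketch. It assumes that the system `ping` command is on the path and that it accepts `-n` (Windows) or `-c` (UNIX-like systems) for the packet count. The function names and the `notify` hook are hypothetical; replace `notify` with whatever paging or mail mechanism your site uses.

```python
import platform
import subprocess


def build_ping_command(host, count=1, system=None):
    """Build the ping command line for the local operating system."""
    system = system or platform.system()
    count_flag = "-n" if system == "Windows" else "-c"
    return ["ping", count_flag, str(count), host]


def ping_once(host, timeout_s=5):
    """Return True if the system ping command reports success for host."""
    try:
        result = subprocess.run(
            build_ping_command(host),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0


def sweep(hosts, probe=ping_once, notify=print):
    """Ping every host once and notify about each one that fails.

    `probe` and `notify` are injectable so that the sweep can be tested
    without touching the network.
    """
    failed = [host for host in hosts if not probe(host)]
    for host in failed:
        notify(f"Ping failed: {host}")
    return failed
```

To run the sweep periodically, one option is to call `sweep([...])` from a scheduler such as cron, or from a loop that sleeps between passes. Listing devices by IP address keeps the sweep working even when DNS is down.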

Tips on Interpreting Ping Messages

Use the following ping failure messages to troubleshoot problems:

No reply from <destination>

Indicates that the destination routes are available but that there is a problem with the destination itself.

<destination> is unreachable

Indicates that your system does not know how to get to the destination. This message means either that routing information to a different subnetwork is unavailable or that a device on the same subnetwork is down.

ICMP host unreachable from gateway

Indicates that your system can transmit to the target address using a gateway but that the gateway cannot forward the packet properly, because either a device is misconfigured or the gateway is not operating.
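A script that collects Ping output could map the three failure messages above to the area that needs attention. The Python sketch below matches on the wording shown in this section only. Actual message text varies between operating systems and Ping implementations, so the match strings, like the function and category names, are assumptions.

```python
def interpret_ping_failure(message):
    """Map a ping failure message to the area that needs attention.

    The match strings follow the wording used in this section; real ping
    implementations word their messages differently.
    """
    text = message.lower()
    # Check the gateway message first, because it also contains "unreachable".
    if "icmp host unreachable" in text:
        return "gateway"
    if "is unreachable" in text:
        return "routing"
    if "no reply from" in text:
        return "destination"
    return "unknown"


ADVICE = {
    "destination": "Routes are available; check the destination device itself.",
    "routing": "Check routing to the subnetwork or for a down local device.",
    "gateway": "Check for a misconfigured device or a gateway that is down.",
    "unknown": "Message not recognized; inspect it manually.",
}
```

For example, `ADVICE[interpret_ping_failure("No reply from 10.1.1.5")]` returns the advice for a problem at the destination (the address is a made-up example).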

Telnet

Telnet, which is a login and terminal emulation program for Transmission Control Protocol/Internet Protocol (TCP/IP) networks, is a common way to communicate with an individual device. You log in to the device (a remote host) and use that remote device as if it were a local terminal.

If you have established an out-of-band Telnet connection with a device, you can use Telnet to communicate with that device even if the network is unavailable. This feature makes Telnet one of the most frequently used network troubleshooting tools. Usually, all device statistics and configuration capabilities are accessible by using Telnet to connect to the device's console. For more information about setting up an out-of-band connection, see "Using Telnet, Serial Line, and Modem Connections."

You can invoke the Telnet application on your local system and set up a link to a Telnet process that is running on a remote host. You can then run a program that is located on a remote host as if you were working at the remote system.

FTP and TFTP

Most network devices support either the File Transfer Protocol (FTP) or the Trivial File Transfer Protocol (TFTP) for downloading updates of system software. Updating system software is often the solution to networking problems that are related to agent problems. Also, new software features may help correct a networking problem.

FTP provides flexibility and security for file transfer by:

• Accepting many file formats, such as ASCII and binary
• Using data compression
• Providing Read and Write access so that you can display, create, and delete files and directories
• Providing password protection

TFTP is a simple version of FTP that does not list directories or require passwords. TFTP only transfers files to and from a remote server.

Analyzers

An analyzer, which is often called a Sniffer, is a network device that collects network data on the segment to which it is attached, a process called packet capturing. Software on the device analyzes this data, a process referred to as protocol analysis. Most analyzers can interpret different types of protocol traffic, such as TCP/IP, AppleTalk, and Banyan VINES traffic.

You usually use analyzers for reactive troubleshooting - when you see a problem somewhere on your network, you attach an analyzer to capture and interpret the data from that area. Analyzers are particularly helpful for identifying intermittent problems. For example, if your network backbone has experienced moments of instability that prevent users from logging on to the network, you can attach an analyzer to the backbone to capture the intermittent problems when they happen again.

Probes

Like an analyzer, a probe is a network device that collects network data. Depending on its type, a probe can collect data from multiple segments simultaneously. It stores the collected data and transfers the data to an analysis site when requested. Unlike an analyzer, a probe does not interpret data.

A probe can be either a stand-alone device or an agent in a network device. The Transcend Enterprise Monitor 500 series and the SuperStack® II Monitor series are stand-alone RMON probes. LANsentry Manager and Traffix Manager use data from probes that comply with the RMON MIB or the RMON2 MIB.

You can use a probe daily to determine the health of your network. The Transcend NCS applications can interpret and report this data, alerting you to possible problems so that you can proactively manage your network. For example, an RMON2 probe can help you to analyze traffic patterns on your network. Use this data to make decisions about reconfiguring devices and end stations as needed.

Cable Testers

Cable testers examine the electrical characteristics of the wiring. They are most commonly used to ensure that building wiring and cables meet Category 5, 4, and 3 standards. For example, network technologies such as Fast Ethernet require the cabling to meet Category 5 requirements. Testers are also used to find defective and broken wiring in a building.


Knowing Your Network

You can better troubleshoot the problems on your network by:

• Knowing Your Network's Configuration
• Identifying Your Network's Normal Behavior

Knowing Your Network's Configuration

Part of understanding your network is knowing its physical and logical configuration. You should know:

• Which devices are on your network
• How the devices are configured
• Which devices are attached to the backbone
• Which devices connect your network to the outside world (WAN)

To keep track of your network's configuration, gather the following information:

• Site Network Map
• Logical Connections
• Device Configuration Information
• Other Important Data About Your Network

This data, when kept up-to-date, is extremely helpful for locating information when you experience network or device problems.

Site Network Map

A network map helps you to:

• Know exactly where each device is physically located
• Easily identify the users and applications that are affected by a problem
• Systematically search each part of your network for problems

You can create a network map using any drawing or flow chart application. Store your network map online. In addition, make sure that you always have a current version on paper in case you cannot access the online version. Figure 8 shows an example of a network map of 3Com devices.

Figure 8 Example of a Site Network Map


Consider including the following information on your network map:

• Location of important devices and workgroups (by floor, building, or area)
• Location of the network backbone, data center, and wiring closets, as appropriate for your network
• Location of your network management stations
• Location and type of remote connections
• IP subnetwork addresses for all managed switches and hubs
• Other subnetwork addresses, such as Novell IPX and AppleTalk, if appropriate for your network
• Type of media (by actual name, such as 10BASE-T, or by grouping, such as Ethernet), which you can show with callouts, colors, line weights, or line styles
• Virtual workgroups, which you can show with colors or shaded areas
• Redundant links, which you can indicate with gray or dashed lines
• Types of network applications that are used in different areas of your network
• Types of end stations that are connected to the switches and hubs

Logical Connections


With the advent of virtual LANs (VLANs), you need to know how your devices are connected logically as well as physically. For example, if you have connected two devices through the same physical switch, you might assume that they can communicate with each other. However, the devices can be in separate VLANs that restrict their communication.

Knowing the setup of your VLANs can help you to quickly narrow the scope of a problem to a VLAN instead of to a network connection.

The Transcend NCS application Enterprise VLAN Manager allows you to view the logical makeup of your network. Depending on the complexity of your network and VLAN configurations, you can use colors to show the VLANs graphically on your network map.

Device Configuration Information

Maintain online and paper copies of device configuration information. Make sure that all online data is stored with your site's regular data backup. If your site does not have a backup system, copy the information onto a backup disc (CD, Zip disk, and the like) and store it offsite.

Follow these guidelines for saving configuration information:

• Because the easiest way to recover a device's configuration is to use FTP or TFTP, save the configuration settings of each device that supports this method of uploading.

• For other devices, Telnet in and save the session (which contains configuration details) to a file. If you cannot print the configuration of a device, then create a quick rebuild guide that explains the quickest way to configure the device from a fresh install.

• For devices that store information to diskette, store this data as part of your site's regular backup.

• For routers and other important devices with text configuration files, store this data online in a revision control system. Keep the most recent version on paper. Keep previous versions.

• For PCs, keep a recovery disk for each type of PC. For any device that you use as a server, store all startup scripts and copies of registries.

Other Important Data About Your Network

For a complete picture of your network, have the following information available:

• All passwords - Store passwords in a safe place. Keep previous passwords in case you restore a device to a previous software version and need to use the old password that was valid for that version.

• Device inventory - The inventory allows you to see the device type, IP address, ports, MAC addresses, and attached devices at a glance. Software tools, such as Transcend Central, can help you keep track of the 3Com devices on your network. Using Transcend Central, you can group devices by type and location and have this information on hand for troubleshooting.

• MAC address-to-port number list - If your hubs or switches are not managed, you must keep a list of the MAC addresses that correlate to the ports on your hubs and switches. Generate and keep a paper copy of this list, which is crucial for deciphering captured packets, using Address Tracker.

• Log book - Document your interactions, no matter how trivial, with each device that is critical to your network's operation (that is, routers, remote access devices, and security servers). For example, document that you noticed a fan making noise one morning. Your note may help you to identify why a device is over temperature a week later (because the fan stopped working).

• Change control - Maintain a change control system for all critical systems. Permanently store change control records.

• Contact details - Store, online and on paper, the details of all support contracts, support numbers, engineer details, and telephone and fax numbers.

• LANsentry Reporter - Use LANsentry Reporter to generate reports from the database.

Identifying Your Network's Normal Behavior

By monitoring your network over a long period, you begin to understand its normal behavior. You begin to see a pattern in the traffic flow, such as which servers are typically accessed, when peak usage times occur, and so on. If you are familiar with your network when it is fully operational, you can be more effective at troubleshooting problems that arise.

Baselining Your Network

You can use a baseline analysis, which is an important indicator of overall network health, to identify problems. A baseline can serve as a useful reference of network traffic during normal operation, which you can then compare to captured network traffic while you troubleshoot network problems. A baseline analysis speeds the process of isolating network problems.

By running tests on a healthy network, you compile normal data to compare against the results that you get when your network is in trouble. For example, Ping each node to discover how long it typically takes you to receive a response from devices on your network.

Applications such as Status Watch, Address Tracker, LANsentry Manager, and Traffix Manager allow you to collect days and weeks of data and set a baseline for comparison. Through the reporting mechanisms in the following list, you can continuously assess the data from your network and ensure that its performance is optimal:

• Web Reporter generates daily or weekly reports from data collected by Status Watch.

• Traffix Manager generates weekly reports from collected data and calculates the baselines for you. Set up Utilization History and Error History reports with data resolution set to Weekly.

• LANsentry Manager History View generates daily utilization graphs, which are sampled every 30 minutes for each day over one week. Use these graphs to calculate your network baselines manually.
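If you calculate a baseline manually, for example from Ping response times gathered while the network is healthy, the arithmetic can be as simple as a mean and a spread. The Python sketch below is one possible approach. The "mean plus three standard deviations" rule, the function names, and the sample values are assumptions for illustration, not a method prescribed by the Transcend applications.

```python
from statistics import mean, pstdev


def build_baseline(samples_ms):
    """Summarize response times (in milliseconds) from a healthy network."""
    return {"mean": mean(samples_ms), "stdev": pstdev(samples_ms)}


def is_abnormal(rtt_ms, baseline, tolerance=3.0):
    """Flag a response time well above the baseline.

    A reading counts as abnormal when it exceeds the baseline mean by
    more than `tolerance` standard deviations (an assumed rule of thumb).
    """
    limit = baseline["mean"] + tolerance * baseline["stdev"]
    return rtt_ms > limit


# Hypothetical response times collected during normal operation.
healthy = [10, 12, 11, 9, 13]
baseline = build_baseline(healthy)
```

With this baseline, `is_abnormal(30, baseline)` is true and `is_abnormal(12, baseline)` is false. Keeping a separate baseline per device and per time of day tends to give fewer false alarms than one network-wide figure.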

Identifying Background Noise

Know your network's background noise so that you can recognize real data flow. For example, one evening after everyone is gone, when no backups are running and most nodes are on, analyze the traffic on your network using the Traffix Manager application. The traffic that you see is mostly broadcast and multicast packets. Any errors that you see are the result of faulty devices (trace). This traffic is the background noise of your network - traffic that occurs for little value. If background noise is high, redesign your network.
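To put a number on background noise from a packet capture, you could count how many frames are addressed to broadcast or multicast destinations. The Python sketch below classifies Ethernet destination MAC addresses: the all-ones address is broadcast, and any other address with the low-order bit of its first octet set is multicast. The function names and sample addresses are hypothetical, and the input is assumed to be colon-separated MAC strings.

```python
def frame_kind(dest_mac):
    """Classify an Ethernet destination address (colon-separated hex)."""
    if dest_mac.lower() == "ff:ff:ff:ff:ff:ff":
        return "broadcast"
    first_octet = int(dest_mac.split(":")[0], 16)
    # The low-order bit of the first octet marks a group (multicast) address.
    if first_octet & 1:
        return "multicast"
    return "unicast"


def noise_fraction(dest_macs):
    """Return the share of frames sent to broadcast or multicast addresses."""
    if not dest_macs:
        return 0.0
    noisy = sum(1 for mac in dest_macs if frame_kind(mac) != "unicast")
    return noisy / len(dest_macs)


# Hypothetical destination addresses from an after-hours capture.
capture = [
    "ff:ff:ff:ff:ff:ff",
    "01:00:5e:00:00:01",
    "00:a0:24:12:34:56",
    "FF:FF:FF:FF:FF:FF",
]
```

Here `noise_fraction(capture)` is 0.75. Recording this figure on a quiet evening gives a reference value to compare against later captures.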


FDDI Connectivity

Use these sections to identify and correct connectivity errors on an FDDI ring:

• FDDI Connectivity Overview
• Monitoring FDDI Connections
• Making Your FDDI Connections More Resilient

Fiber Distributed Data Interface (FDDI), which is a self-correcting technology, automatically corrects ring faults to maintain connectivity throughout most of the network. However, you should monitor your FDDI connections for wrapped rings and other problems with ring connectivity.

Understanding the Problem

As shown in Figure 9, in a Thru FDDI LAN, no stations on the trunk ring have a Configuration State (SMTConfigurationState) of Wrap or Isolated. However, users who complain about network performance may have lost connectivity to other stations on the network because the FDDI network is wrapped or segmented.


Figure 9 Thru Ring

Wrapped ring

By monitoring the Peer Wrap Condition, you can see when the Configuration State changes. In a wrapped ring (Figure 10), two stations on the LAN are in a wrapped Configuration State. This condition may or may not affect the connectivity of certain stations. Although operational, your network may have a cabling problem or a problem with a link.

Figure 10 Wrapped LAN

Segmented ring

In a segmented ring (Figure 11), more than two stations are wrapped on the trunk ring. Although this mode of operation is a valid FDDI LAN configuration, your LAN is probably experiencing a degraded or degrading condition.

Figure 11 Segmented Ring

When a network connection has excessively high link errors, Station Management (SMT) shuts down the connection and tries to reconnect. A dual-attachment trunk ring station with an A or B connection that is shut down is one of the wrap points in the network. See "Making Your FDDI Connections More Resilient" for information about keeping a dual-attachment station connection from wrapping.


Isolated station

Sometimes a network wraps a particular station out of the ring. Stations on either side of a problem station can be wrapped. This effectively isolates the station or links that have problems, as shown in Figure 12.

Figure 12 Wrapped Ring with Isolated Station

If a ring was already wrapped when a network wraps a station out of the ring, then a segmented ring results, as shown in Figure 13.

Figure 13 Segmented Ring with Isolated Stations

Twisted ring

In a twisted ring, an A port is connected to an A port and a B port is connected to a B port, instead of the normal A-to-B connections. A twisted ring, which always has two twist points (stations), can exist in either a Thru or Wrap state. You can monitor the Twisted Ring Condition and Undesired Connection Attempt Event for evidence of twisted ring and other connection problems.
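The Thru, wrapped, and segmented states described in this section differ only in how many trunk ring stations report a wrapped Configuration State, so they can be told apart by counting. The Python sketch below assumes that you have already collected one state label per trunk ring station. The simplified labels "thru", "wrap", and "isolated" and the function name are assumptions, and a single wrapped station (which a healthy ring should not report) is treated as a wrapped ring.

```python
def classify_ring(config_states):
    """Classify an FDDI trunk ring from its stations' configuration states.

    `config_states` holds one simplified label per station: "thru",
    "wrap", or "isolated". The labels are assumptions for this sketch.
    """
    wrapped = sum(1 for state in config_states if state == "wrap")
    isolated = sum(1 for state in config_states if state == "isolated")
    if wrapped == 0:
        ring = "thru"        # no station is wrapped
    elif wrapped <= 2:
        ring = "wrapped"     # normally exactly two wrap points
    else:
        ring = "segmented"   # more than two stations are wrapped
    return {"ring": ring, "wrapped": wrapped, "isolated": isolated}
```

For example, `classify_ring(["wrap", "thru", "wrap", "thru"])` reports a wrapped ring with two wrap points, while four wrapped stations with one isolated station between them report a segmented ring.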

Identifying the Problem

To identify the problem, follow this process:

1. At the FDDI LAN level, verify that your network is operating.

If the network is operating, the FDDI ring may be segmented, and therefore an FDDI station or an Ethernet station on an Ethernet link may have lost connectivity to other nodes on the network.

2. Determine if a ring is in a Thru, Wrap, or Segmented state.

If the FDDI ring is segmented or wrapped, look for a problem with a link somewhere in the network or for a nonfunctioning node on your trunk ring. If the ring is operating and is not segmented, or if it is segmented but you still have connectivity to the stations in question, move to a more specific level in your network.

See "Monitoring FDDI Connections" for more information.

3. Determine if the poorly performing station is an Ethernet or FDDI station.

If the problem is an FDDI station, find out if it is congested (that is, if the station is so busy that it cannot accept all the network traffic that is directed to it) by determining its Bandwidth Utilization. Also determine if the station has a high frame error rate by looking at the FDDI Ring Errors.

If the problem is an Ethernet station, look for congestion by examining Ethernet Packet Loss and Bandwidth Utilization.

Solving the Problem

Identify the station that is causing the disconnection and take the appropriate steps:

• If the disconnection is caused by a wrapped ring, then fix the hardware or cabling problem at that station.

• If the station is congested, you have a device problem rather than a network problem. For example, if the congested station is a file server and every other machine on the network is retrieving and saving files using that server, consider upgrading your server or adding servers to the network. A variety of devices from different vendors may be communicating on an FDDI or Ethernet network; some are faster and more capable, and some are slower and more prone to congestion.

• If the station is an Ethernet station that is attached to an Ethernet segment, reevaluate the setup of your Ethernet network and make some changes to improve its performance.

You can also make FDDI connections more resilient by implementing dual homing or installing an Optical Bypass Unit (OBU) where FDDI connections are prone to fail. See "Making Your FDDI Connections More Resilient" for more information.


Monitoring FDDI Connections

Monitor your FDDI devices for Warning or Critical alerts in the FDDI Status tool.

Status Watch

Use Status Watch to identify these FDDI connectivity errors:

• Peer Wrap Condition
• Twisted Ring Condition
• Undesired Connection Attempt Event

Follow these steps:

1. In the Device area, select the device that is located where you suspect an FDDI ring connectivity problem.

2. Monitor the FDDI Status tool for the currently selected device.

Here are some pointers for monitoring:

• If the Peer Wrap Configuration State variable is Isolated, the device is not connected to the FDDI trunk ring. If you intend the device to remain isolated, this indication is not a serious condition. However, if the device is supposed to be connected on a trunk ring, a serious problem may exist. The device is no longer transmitting packets to the larger trunk ring.

• If the Peer Wrap flag (SMTPeerWrapFlag) is set, the device is one of the wrap points. The cause of the wrapped ring is somewhere in the portion of the network between the two stations that report the peer wrap condition.

Making Your FDDI Connections More Resilient

When devices are removed from an FDDI ring, there is a break in the fiber path that causes the ring to wrap until the ring is made whole again. To prevent the break in the FDDI connection, you can implement dual homing or install an Optical Bypass Unit (OBU).

Implementing Dual Homing

When the operation of a dual attachment node is critical to your network, dual homing adds reliability by providing a backup connection if the primary link fails. Because a dual attachment station (DAS) has two attachments to the FDDI ring (A-to-M and B-to-M), you can use one of them as a standby link if the active link fails. Using dual homing, only one of the two attachments is active at a time. In this sense, a DAS acts as if it is a single attachment station (SAS) by using its A port as the standby link.

Through SMT, a DAS can be dual homed to the same concentrator or, more commonly, to two concentrators. This arrangement provides a more stable trunk ring of concentrators. If one concentrator fails, the DAS enables the standby link to another concentrator to become the active link. See Figure 14.

If the station is a dual path or dual path/dual MAC station, you can configure the dual-homed station in one of two ways:

• With both links active
• With one link active and one connection withheld as a backup, only becoming active when one link fails

Figure 14 Dual Homing Configuration

Installing an Optical Bypass Unit

You can insert an Optical Bypass Unit (OBU) into the FDDI ring as if it were a node and then plug your device into it. To use an OBU, your device needs an optical bypass interface. This interface lets the bypass know whether your device is still on the ring or not. See Figure 15.

If your device is removed or if it fails, the bypass unit diverts the optical path away from your device, keeping the ring whole. You can use a bypass on devices that are prone to failure or are likely to be removed often, such as diagnostic equipment.

Figure 15 Optical Bypass Unit Configuration

FDDI Connectivity Reference

This section explains terms that are relevant to FDDI connectivity and provides additional conceptual and problem analysis detail.

Peer Wrap Condition

A Peer Wrap (wrapped ring) condition occurs when a dual-attachment station detects a fault (often a lost connection) and reconfigures the network by wrapping the dual trunk rings to form a single ring. Normally, the two stations that are adjacent to the fault wrap to maintain full connectivity. However, if a second fault occurs before the first is repaired, the network partitions itself into two or more rings, and stations lose connectivity.

When a station reports a Peer Wrap condition, locate and repair the problem that caused the station to wrap the rings. Potential causes include:

• Faulty FDDI port hardware
• Faulty cables or connectors
• Unplugged connectors
• Powered-down stations

You can expect to find the cause of the problem between the two stations that report the Peer Wrap condition.

Twisted Ring Condition

A Twisted Ring condition occurs when certain undesirable connection types exist. See Table 7 for more information. Although similar to the Undesired Connection Attempt, the Twisted Ring condition provides specific Station Management (SMT) and port information for diagnosis.

Undesired Connection Attempt Event

An Undesired Connection Attempt event occurs when a port tries to connect to another port of a type that may result in an undesirable network topology. Whether the connection attempt is successful depends on the current setting of the station's connection policies.

Table 7 lists connections that the FDDI standard defines as undesirable. The managed devices may or may not permit these connections, depending on their FDDI station configurations.

Table 7 Undesirable Connection Types

Connection Type   Reason That the Connection Is Undesirable
A-A               Twisted primary and secondary rings
A-S               A wrapped ring
B-B               Twisted primary and secondary rings
B-S               A wrapped ring
S-A               A wrapped ring
S-B               A wrapped ring
M-M               A tree of rings topology (illegal connection)

Table 8 lists FDDI connections that create valid topologies.

Table 8 Valid Connection Types

Connection Type   Reason That the Connection Is Valid
A-B               A normal trunk peer connection
A-M               A tree connection with possible redundancy. In a single MAC node, Port B has precedence (by default) for connecting to a Port M.
B-A               A normal trunk ring peer connection
B-M               A tree connection with possible redundancy. In a single MAC node, Port B has precedence (by default) for connecting to a Port M.
S-S               A single ring of two slave stations
S-M               A normal tree connection
M-A               A tree connection that provides possible redundancy
M-B               A tree connection that provides possible redundancy
M-S               A normal tree connection
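
Tables 7 and 8 together cover all 16 pairings of the four FDDI port types (A, B, S, and M), so they can be expressed as a simple lookup. The following Python sketch is illustrative only: the function name classify_connection and the dictionary names are hypothetical and are not part of any management tool described here. The reasons are copied from the two tables.

```python
# Reasons copied from Table 7 (undesirable) and Table 8 (valid).
# Keys are (local port type, remote port type).
UNDESIRABLE = {
    ("A", "A"): "Twisted primary and secondary rings",
    ("A", "S"): "A wrapped ring",
    ("B", "B"): "Twisted primary and secondary rings",
    ("B", "S"): "A wrapped ring",
    ("S", "A"): "A wrapped ring",
    ("S", "B"): "A wrapped ring",
    ("M", "M"): "A tree of rings topology (illegal connection)",
}

VALID = {
    ("A", "B"): "A normal trunk peer connection",
    ("A", "M"): "A tree connection with possible redundancy",
    ("B", "A"): "A normal trunk ring peer connection",
    ("B", "M"): "A tree connection with possible redundancy",
    ("S", "S"): "A single ring of two slave stations",
    ("S", "M"): "A normal tree connection",
    ("M", "A"): "A tree connection that provides possible redundancy",
    ("M", "B"): "A tree connection that provides possible redundancy",
    ("M", "S"): "A normal tree connection",
}


def classify_connection(local, remote):
    """Return ("undesirable" or "valid", reason) for a port pairing."""
    key = (local.upper(), remote.upper())
    if key in UNDESIRABLE:
        return "undesirable", UNDESIRABLE[key]
    if key in VALID:
        return "valid", VALID[key]
    raise ValueError("unknown FDDI port type in %r" % (key,))


if __name__ == "__main__":
    for pair in [("A", "A"), ("A", "B"), ("M", "M"), ("S", "M")]:
        print(pair, classify_connection(*pair))
```

Whether a managed device actually permits an undesirable pairing still depends on its connection policies, which this lookup does not model.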

Disabling the Offending Interface

Because broadcast storms can ultimately cause your whole network to become unavailable, take action immediately to disable the offending interface. You can enable the interface again after you have corrected the problem.

Address Tracker

Use Address Tracker to locate the interface that is causing the broadcast storm. Use Device View to disable the port.

Follow these steps:

1. In the Find Address window, enter the address of the interface that seems to be receiving the broadcast traffic.

You can copy the MAC or IP address from the Status Watch report and paste it into Address Tracker's Enter the Address You Want to Find field.

2. Click Find Now.

Search displays the device name.

3. Use Transcend Central to launch Device View and disable the port.

Disabling the port stops the broadcast storm before it interferes with all vital network traffic. You can re-enable this interface later using Device View or the device's console.

Correcting Spanning Tree Misconfigurations

Spanning Tree does not cause broadcast storms, but a loop in your Spanning Tree topology can create data that looks like a storm. A loop can occur in your topology if:

• Someone disables Spanning Tree on a port
• You set up your Spanning Tree configuration incorrectly

Device View

Use Device View to disable any Spanning Tree port that has a repeater attached to it and to correct Spanning Tree misconfigurations.

To correct Spanning Tree misconfigurations, use Device View to disable Spanning Tree Protocol (STP) for a port on a SuperStack® II Switch 1000, Switch 3000, Switch 3000 10/100, Switch 9000SX, Desktop Switch, LinkBuilder® FMS II Bridge/Management Module, or CoreBuilder 6000.

To disable the STP port state for a port on a SuperStack II switch:

1. Select a port and click the right mouse button.

2. From the shortcut menu, select Configure.

3. In the Port section, click the STP tab.

4. From the STP Port State list box, select Disabled.

5. Click Apply.

To disable the STP port state for a port on a LinkBuilder FMS II Bridge/Management Module:

1. Double-click the module.

2. From the shortcut menu, select Configure Bridge.

3. In the Port section, click the STP tab.

Broadcast Storms Reference

This section explains terms that are relevant to broadcast storms and provides additional conceptual and problem analysis detail.

Broadcast Packets

Broadcast packets, which are a normal part of network operation, are transmitted by a device to a broadcast address. For example, IP networks use broadcasts to resolve network addresses using Address Resolution Protocol (ARP). IPX networks use a large number of broadcast packets to operate most effectively.

Problems arise when broadcast packets endlessly propagate throughout the network, which increases the traffic volume on your network and the CPU time that each host spends processing and discarding unwanted broadcast packets.

Multicast Packets

Multicast packets, which are a normal part of network operation, are transmitted by a device to a multicast group address. Hosts that want to receive the packets indicate that they want to be members of the multicast group, and then multicast packets are distributed to that group. For example, multicast packets support the Spanning Tree Protocol. Multicast applications and underlying multicast protocols control multimedia traffic and shield hosts from processing unnecessary broadcast traffic. However, multicast traffic can also cause storms that saturate your network.

Duplicate Addresses

Use these sections to identify and correct problems caused by duplicate MAC and IP addresses:

• Duplicate Addresses Overview
• Finding Duplicate MAC Addresses
• Finding Duplicate IP Addresses

See Duplicate Addresses Reference for additional conceptual and problem analysis detail.

Networks sometimes generate duplicate MAC and IP addresses. Because duplicate addresses can cause problems with packet delivery, resolve them as soon as possible.

Understanding the Problem

Duplicate MAC addresses are caused by data link layer problems with Fiber Distributed Data Interface (FDDI) media and the passing of tokens on the FDDI ring. Duplicate IP addresses are caused by network layer problems. See these sections for more information about causes of duplicate addresses:

• Duplicate MAC Addresses
• Duplicate IP Addresses

Identifying the Problem

Identify duplicate MAC and IP addresses by following the instructions in these sections:

• Finding Duplicate MAC Addresses
• Finding Duplicate IP Addresses

Solving the Problem

Identify the cause of the duplicate address (such as user error or a hardware problem) and fix the problem if possible.

Finding Duplicate MAC Addresses

To find out if duplicate MAC addresses are occurring, monitor your network using Status Watch.

Status Watch

The Status Watch FDDI Status tool identifies duplicate FDDI MAC addresses, and Status Watch reports when two or more MACs on the same ring have the same MAC address (a Duplicate Address condition).

Follow these steps:

1. In the Status Watch Summary View window, determine if any FDDI Status conditions are reported. If there are, double-click the table cell value to display the Device List window.

Another approach is to examine only the devices that you know reside on your FDDI ring. In the Status Watch main window, red device icons indicate that a threshold has been exceeded.

2. Select a device.

• If you selected the device from the Device List window, the real-time report for that device appears in the Status Watch main window.

• If you selected the device from the main window, also select the FDDI Status tool to view the real-time report.

3. Determine if a Duplicate Address condition caused the FDDI Status tool to trigger a Critical or Warning status for that device.

In Status Watch, you can specify the status severity level to apply to a Duplicate Address condition.

Finding Duplicate IP Addresses

To find out if duplicate IP addresses are occurring, monitor your network using these applications:

• Address Tracker - To find duplicate IP addresses on 3Com devices and their attached networks

• LANsentry® Manager - To find duplicate IP addresses that are collected by probes gathering RMON2 SmartAgent® data from the Enterprise Communications Analysis Module (ECAM) downloaded on your network devices

Address Tracker

Use Address Tracker to determine when and where duplicate IP addresses occur.

Follow these steps:

1. From the Find Address menu, select Find Duplicate IP Addresses.

2. Click Find Now to start your search.

LANsentry Manager

Use the Duplicates table in LANsentry Manager to compile a list of all stations with duplicate IP addresses. This table is available only on probes that have downloaded RMON2 (ECAM) SmartAgent software.

Follow these steps:

1. From the LANsentry Manager Address Map menu, select Duplicates. Address Map data is displayed as a table.

2. To export the contents of the table, click Export to launch the Data Export dialog box.

Duplicate Addresses Reference

This section explains terms that are relevant to duplicate addresses and provides additional conceptual and problem analysis detail.

Duplicate MAC Addresses

Each device on your network has a unique MAC address. This address identifies a single device on the network, allowing packets to be delivered to correct destinations.

Packets are delivered to their destinations by means of the MAC-address-to-IP-address translation that the Address Resolution Protocol (ARP) provides. Therefore, if MAC addresses are duplicated on the network, ARP caches of routing devices contain erroneous destinations. In FDDI, devices monitor network traffic, searching for their own MAC address in each packet to determine whether to decode the packet. If MAC addresses are not unique, two stations cannot be distinguished from each other.

Duplicate MAC addresses can occur for the following reasons:

• Someone has manually configured a MAC address for a device, instead of using the address that the vendor supplied or allowing it to be assigned dynamically, and this address is also assigned to a different device.

• In rare circumstances, loops in a bridged network can cause a MAC hardware problem or an address learning problem that creates a duplicate MAC address entry in the bridging address table.

• On DECnet Phase 4 networks, MAC addresses are set from the DECnet address. A duplicate NET address can cause a duplicate MAC address.

Duplicate IP Addresses

Because IP addresses are critical for transmission of packets on TCP/IP networks, resolve duplicate IP addresses immediately.

Duplicate IP addresses can occur when someone has configured an IP address that is identical to an IP address that is assigned to a different device. Address assignments, although possible for you to configure manually, are usually made using one of these protocols:

• Dynamic Host Configuration Protocol (DHCP) - Allows your network to dynamically assign IP addresses to nodes. With this protocol, a DHCP server temporarily assigns an IP address to a node, or you can statically configure addresses as needed.

• BOOTstrap protocol (BootP) - Allows you to statically assign IP addresses to nodes. This protocol is more efficient than RARP.

• Reverse ARP (RARP) - Allows you to statically assign IP addresses to nodes. However, because this protocol relies on the MAC address to identify the node, you cannot use it on networks that dynamically assign hardware addresses.
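
Outside the management applications described in this guide, the same check can be approximated from any list of IP-to-MAC bindings (for example, entries copied from ARP caches). The Python sketch below is a minimal illustration under that assumption; the function name find_duplicate_ips and the sample addresses are hypothetical, and the input format is not something the tools above export.

```python
from collections import defaultdict


def find_duplicate_ips(bindings):
    """Return {ip: sorted list of MACs} for every IP claimed by more than one MAC.

    bindings is an iterable of (ip_address, mac_address) string pairs.
    """
    macs_by_ip = defaultdict(set)
    for ip, mac in bindings:
        # Normalize case so the same MAC written two ways is not counted twice.
        macs_by_ip[ip].add(mac.lower())
    return {ip: sorted(macs) for ip, macs in macs_by_ip.items() if len(macs) > 1}


if __name__ == "__main__":
    # Hypothetical sample data using documentation address ranges.
    sample = [
        ("192.0.2.10", "00:00:5E:00:53:01"),
        ("192.0.2.11", "00:00:5E:00:53:02"),
        ("192.0.2.10", "00:00:5E:00:53:03"),  # second device claiming .10
    ]
    for ip, macs in find_duplicate_ips(sample).items():
        print(ip, "is claimed by", ", ".join(macs))
```

Swapping the roles of the two fields (grouping IP addresses by MAC address) gives a rough check for duplicate MAC addresses, although a single interface can legitimately hold several IP addresses.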

Ethernet Packet Loss

Use these sections to identify and correct Ethernet packet loss:

• Ethernet Packet Loss Overview
• Searching for Packet Loss

See Ethernet Packet Loss Reference for additional conceptual and problem analysis detail.

If your Ethernet network shows signs of congestion, it may be experiencing packet loss. When your network is congested, utilization is usually high, packets are discarded because buffers are full, and collision rates are up. Problems related to Collisions are often at the heart of packet loss.

Understanding the Problem

Collisions are normal in Ethernet networks. In many cases, Collision rates of 50 percent do not cause a large decrease in throughput. The Collision rate helps mark the upper limit on your network (the maximum percentage of collisions that your network can bear), which is usually around 70 percent. If Collisions increase above this upper limit, your network can become unreliable.

When the Collision rates increase, so do Excessive Collisions, which cause a delay in transmitting data. An increase in Collisions also indicates that network utilization and network errors, such as FCS Errors, are probably increasing.

The real packet problems to watch for, however, are undetected collisions that show up as Late Collisions.

Identifying the Problem

To identify that your network's problem is related to packet loss, verify that frames are being dropped on your network by examining this packet loss data:

• Alignment Errors
• Collisions
• Excessive Collisions
• FCS Errors
• CRC Errors
• Late Collisions
• Receive Discards
• Too Long Errors
• Too Short Errors
• Transmit Discards

The process of identifying the problem is discussed in Searching for Packet Loss.

Solving the Problem

If you notice that packet loss data is consistently high, your network is too congested. In this case, segment your network with the appropriate network device (such as a switch or router). If Collision data shows increases but your network's utilization is the same, your network may have a physical problem, such as cabling that is too long. Other problems that packet loss data can indicate include:

• Faulty connectors or improper cabling
• Excessive numbers of repeaters between network devices
• Defective Ethernet transceivers or controllers

Possible solutions to these problems are explained in the procedures in Searching for Packet Loss.

Searching for Packet Loss

When you look for packet loss, use the following applications:

• Status Watch - For Ethernet and MIB-II data collection using SNMP polling
• LANsentry Manager Network Statistics Graph - For RMON data collection using an RMON probe
• Device View - On a per-device basis, you can evaluate statistics for any port on the device

Status Watch

Status Watch monitors:

• Alignment Errors
• Excessive Collisions
• FCS Errors
• Receive Discards
• Transmit Discards

Follow these steps:

1. Determine if the thresholds for the Alignment Errors tool and FCS Errors tool are being exceeded.

Table 16 identifies the problems that this data can indicate and your possible actions. For information about problems related to a nonstandard Ethernet implementation, see Nonstandard Ethernet Problems.

Table 16 Alignment Errors, FCS Errors, and CRC Errors Data

Possible Problem: Faulty cabling
Possible Action: Examine the cable and cable connections for breaks or damage.

Possible Problem: Network noise
Possible Action: Look for improper cabling, faulty cable, faulty network equipment, or cables that are too close to equipment that emits electromagnetic interference (lamps, for example).

Possible Problem: Faulty transceiver
Possible Action: Use an analyzer to identify the problematic transceiver. If necessary, replace the transceiver, network adapter, or station.

Possible Problem: Fault at the transmitting end station
Possible Action:
1. Locate the source of the errors by looking at the module and port statistics.
2. Verify the correct operation of the transceiver or adapter card of the device that is connected to the problem port.
3. If the card appears to be operating correctly, examine the cable and cable connections for breaks or damage.

Possible Problem: Station powering up or down
Possible Action: None required.
Early implementations of Ethernet transceivers generate a significant amount of in-band noise when powering up; they frequently cause Alignment Errors and FCS Errors in an otherwise stable network.
When powering up, some software drivers for Ethernet controllers also initiate Time Domain Reflectometry (TDR) tests to test the Ethernet media. Network monitors report TDR tests as Alignment Errors and FCS Errors.

Possible Problem: Faulty adapter
Possible Action: Replace the adapter.

2. Determine if the Excessive Collisions tool threshold is being exceeded. Table 17 identifies the problems that this data can indicate and your possible actions.

Table 17 Collisions and Excessive Collisions Data

Possible Problem: Busy network
Possible Action: Use a bridge, router, or switch to reconfigure your network into segments with fewer stations.

Possible Problem: Faulty device (adapter, switch, hub, and the like) that does not listen before broadcasting. This problem increases the incidence of all types of collisions.
Possible Action: Isolate each adapter to see if the problem stops.

Possible Problem: Network loop
Possible Action: Ensure that no redundant connections to the same station have both connections active simultaneously.

3. Determine if the Receive Discards and Transmit Discards tools' thresholds are being exceeded.

If these errors are high in conjunction with the data that you learned in steps 1 and 2, your network is overloaded. Segment your network.

LANsentry Manager Network Statistics Graph

Use the LANsentry Manager Network Statistics graph to view data for:

• Collisions
• Late Collisions
• Bandwidth Utilization
• CRC Errors
• Too Long Errors
• Too Short Errors

Follow these steps:

1. Display a Network Statistics graph for the local Ethernet segment on which users have reported poor performance.

This graph shows the most recent trend in Collision rates. If you have set up a History sample, you can also look at the historical trend. If a number of segments are connected by repeaters, examine the graph for each Ethernet segment.

2. Analyze Utilization and Collision rates to determine whether collisions are caused by an overloaded segment or a faulty component.

• If Utilization rates are high - The collisions are probably caused by an overloaded segment. If you have added nodes or new applications to your network, consider reconfiguring the cabling system, using bridges and routers to filter out remote collisions and to keep local traffic on one segment. This action should level the network load.

• If Utilization rates are stable and appear normal - The collisions are probably caused by faulty components. In this case, do the following:

  • If the network consists of repeaters - Compare the Network Statistics graphs for each segment connected to the repeater. Because repeaters repeat traffic across all connected segments (which makes many segments seem like one network), you should see similar levels of traffic on all segments. One segment that shows dissimilar levels of traffic and collisions may indicate faulty hardware. In this case, monitor several collisions to track the source station that is transmitting too soon after collisions, and repair the station. Packets that are transmitted too soon after collisions are unlikely to be valid. See Table 17 for more information about Collisions.

  • On other networks - Determine the segment cable length.

3. Examine the CRC Errors and Late Collisions, which often indicate cabling or component problems.

Table 16 identifies the problems that CRC Errors can indicate and your possible actions. Table 18 identifies the problems that Late Collisions data can indicate and your possible actions.

Table 18 Late Collisions Data

Possible Problem: Cabling problems
• Segment too long
• Failing cable
• Segment not grounded properly (noise)
• Improper termination
• Taps too close (10BASE-5 and 10BASE-2 only)
• Noisy cable
Possible Action: Correct the cabling problem by doing one or more of the following:
• Reduce the segment length
• Replace the cable
• Ground the cable
• Terminate the cable correctly
• Check the taps
• Check for cables too close to equipment that emits electromagnetic interference

Possible Problem: Component problems
• Deaf or partially deaf node
• Failing repeater, transceiver, or controller cards
Possible Action: Correct the component problem by doing one of the following:
• Trace the failing component and replace it
• Replace the NIC or the transceiver

4. Trace Too Short Errors and Too Long Errors to the sender.

These errors often indicate faulty routers or LAN drivers and transceiver problems. Table 19 identifies the problems that this data can indicate and your possible actions.

Possible Problem: A transceiver on your network is adding bits to the packets that are transmitted by the attached station.
Possible Action:
1. Use a network analyzer to identify the problematic transceiver.
2. If necessary, replace the transceiver, network adapter, or station.

Possible Problem: The jabber protection mechanism on a transceiver has failed; it can no longer protect the network from the jabbering produced by the attached station.
Possible Action: Replace the network card.

Possible Problem: Excessive noise on the cable.
Note: Some 10/100 Mbps cards that autodetect the network speed may connect to the network at the wrong speed, causing excessive noise.
Possible Action: Check for improper cabling, faulty cable, faulty network equipment, or cables too close to noisy electronic equipment (lamps, for example).
If your network card autodetects the network speed and you have ruled out other problems, manually configure the network speed.

Possible Problem: Faulty routers (two different network types are connected, and the router is not enforcing proper frame size restrictions)
Possible Action: Notify the manufacturer.

Possible Problem: Faulty LAN driver
Possible Action: Replace the driver.

Possible Problem: A normal condition on a LinkSwitch® 1000, LinkSwitch® 3000, or CoreBuilder 5000 FastModule
Possible Action: If you use maximum-sized (1518-byte) Ethernet frames, the device's VLT-enabled ports add a frame tag of 4 bytes, resulting in a misleading Too Long Frame error.
These frames are passed successfully but create the Too Long Frame error message.
If you want to eliminate the error message, reduce your Ethernet packet frames by 4 bytes.

Device View

Device View allows you to display a variety of port and device-level statistics relevant to Ethernet packet loss. Table 20 describes these statistics and their use in troubleshooting.

Table 20 Activity and Error Statistics in Device View

Statistics Group: Activity
Description: Displays the total network activity and errors on the selected port.
Use in Troubleshooting: This data shows readable packets, broadcast packets, Collisions, total errors, and runts, which cause Too Short Errors. You can interpret this data in the following ways:

• Runts are often caused by Collisions; however, if the values increase at specific times of the day, you may need to change the network topology to manage the traffic more efficiently (for example, with switches or routers).

• Runts can also be caused by a badly terminated coax cable.

• Large numbers of runts not associated with high levels of collisions can indicate a transmission problem (examine the cable).

• Particularly high numbers of Collisions compared to the total number of readable packets can point to a hardware problem (a bad adapter) or to a data loop.

• A high proportion of Broadcast Packets (>10%) on a heavily utilized network (>50% of available bandwidth) can point to an incorrectly configured bridge or router on the network.

Statistics Group: Errors
Description: Displays the number of frames with errors on the selected port.
Use in Troubleshooting: The significance of errors depends on accompanying errors and prevailing network conditions. See the following error data for more information:

• Alignment Errors: Table 16
• FCS Errors: Table 16
• Too Long Errors: Table 19
• Too Short Errors or runts: Table 19
• Late Collisions: Table 18

To display Activity and Errors statistics for a device or port, follow these steps:

1. Select the required port or device.

2. From the shortcut menu, select Activity or Errors.

The statistics available depend on the type of port or device selected. See Table 20 for troubleshooting information.

Ethernet Packet Loss Reference

This section explains terms that are relevant to Ethernet packet loss and provides additional conceptual and problem analysis detail.

Alignment Errors

An Alignment Error indicates a received frame in which both of the following are true:

• The number of bits received is an uneven byte count (that is, not an integral multiple of 8).

• The frame has a Frame Check Sequence (FCS) error.

Alignment Errors often result from MAC layer packet formation problems, cabling problems that cause corrupted or lost data, and packets that pass through more than two cascaded multiport transceivers. See FCS Errors for more information about interpreting Alignment Errors.

Collisions

Collisions indicate that two devices detect that the network is idle and try to send packets at exactly the same time (within one round-trip delay). Because only one device can transmit at a time, both devices must stop sending and attempt to retransmit. Collisions are detected by the transmitting stations.

The retransmission algorithm helps to ensure that the packets do not retransmit at the same time. However, if the two devices retry at nearly the same time, packets can collide again; the process repeats until either the packets finally pass onto the network without collisions or 16 consecutive collisions occur and the packets are discarded.
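
The text above does not name the retransmission algorithm. In IEEE 802.3 it is truncated binary exponential backoff: after the nth consecutive collision, a station waits a random number of slot times between 0 and 2^min(n, 10) - 1, and gives up after 16 attempts. The Python sketch below models only that rule; the function names and the collides callback are hypothetical, and real adapters implement this in hardware.

```python
import random

MAX_ATTEMPTS = 16    # the frame is discarded after 16 consecutive collisions
BACKOFF_LIMIT = 10   # the backoff exponent stops growing after 10 collisions


def backoff_slots(collision_count, rng=random):
    """Pick the number of slot times to wait after the nth consecutive collision."""
    if not 1 <= collision_count < MAX_ATTEMPTS:
        raise ValueError("collision_count must be between 1 and 15")
    exponent = min(collision_count, BACKOFF_LIMIT)
    return rng.randrange(2 ** exponent)  # uniform in [0, 2**exponent - 1]


def send_frame(collides, rng=random):
    """Simulate sending one frame.

    collides(attempt) is a hypothetical callback that returns True if the
    given attempt (1-based) ends in a collision.
    Returns (delivered, attempts_used, total_slots_waited).
    """
    waited = 0
    for attempt in range(1, MAX_ATTEMPTS + 1):
        if not collides(attempt):
            return True, attempt, waited
        if attempt < MAX_ATTEMPTS:
            waited += backoff_slots(attempt, rng)
    # 16 consecutive collisions: counted as an Excessive Collision, frame dropped.
    return False, MAX_ATTEMPTS, waited


if __name__ == "__main__":
    # A frame that collides on its first two attempts and then gets through.
    print(send_frame(lambda attempt: attempt < 3))
    # A frame that collides on every attempt and is finally discarded.
    print(send_frame(lambda attempt: True))
```

Because each station draws its own random delay from a range that doubles with every collision, two stations that collided once are progressively less likely to collide again.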

CRC Errors

A Cyclic Redundancy Check (CRC) Error is an RMON statistic that combines FCS Errors and Alignment Errors. These errors indicate that packets were received with:

• A bad FCS and an integral number of octets (FCS Errors)
• A bad FCS and a non-integral number of octets (Alignment Errors)

CRC Errors can cause an end station to freeze. If a large number of CRC Errors are attributed to a single station on the network, replace the station's network interface board. Typically, a CRC Error rate of more than 1 percent of network traffic is considered excessive.
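
As a rough illustration of the 1 percent guideline, the Python sketch below combines FCS Error and Alignment Error counts into a CRC Error rate. It assumes that "network traffic" is measured as the total number of packets received over the same interval, which the text does not specify; the function names are hypothetical.

```python
def crc_error_rate(fcs_errors, alignment_errors, total_packets):
    """CRC Errors (FCS Errors plus Alignment Errors) as a fraction of packets."""
    if total_packets <= 0:
        return 0.0
    return (fcs_errors + alignment_errors) / total_packets


def is_excessive(rate, threshold=0.01):
    """Apply the guideline that more than 1 percent is excessive."""
    return rate > threshold


if __name__ == "__main__":
    # Hypothetical counter values read over one polling interval.
    rate = crc_error_rate(fcs_errors=150, alignment_errors=50, total_packets=10000)
    print("CRC Error rate: {:.2%}, excessive: {}".format(rate, is_excessive(rate)))
```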

Excessive Collisions

Excessive Collisions indicate that 16 consecutive collisions have occurred, usually a sign that the network is becoming congested. For each excessive collision count (that is, after 16 consecutive collisions), a packet is dropped. If you know the normal rate of excessive collisions, you can determine when the rate of packet loss is affecting your network's performance. See Knowing Your Network's Configuration for more information.

FCS Errors

Frame Check Sequence (FCS) Errors, a type of CRC Error, indicate that frames received by an interface are an integral number of octets long but do not pass the FCS check. The FCS is a mathematical way to ensure that all of the frame's bits are correct without having the system examine each bit and compare it to the original. Packets with Alignment Errors also generate FCS Errors.

Both Alignment Errors and FCS Errors can be caused by equipment powering up or down or by interference (noise) on unshielded twisted-pair (10BASE-T) segments. In a network that complies with the Ethernet standard, FCS or Alignment Errors indicate bit errors during a transmission or reception. A very low rate is acceptable. Although Ethernet allows a 1 in 10^8 bit error rate, typical Ethernet performance is 1 in 10^12 or better.
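
To put the two bit error rates in perspective, the short Python sketch below estimates how many bit errors each would produce. It reads the figures above as rates of 10^-8 and 10^-12 and assumes, purely for illustration, a 10 Mbps segment that is fully utilized for one hour; the function name is hypothetical.

```python
def expected_bit_errors(bit_rate_bps, seconds, bit_error_rate):
    """Expected number of corrupted bits for a link carrying traffic continuously."""
    return bit_rate_bps * seconds * bit_error_rate


if __name__ == "__main__":
    one_hour = 3600
    ten_mbps = 10000000
    allowed = expected_bit_errors(ten_mbps, one_hour, 1e-8)
    typical = expected_bit_errors(ten_mbps, one_hour, 1e-12)
    print("At the allowed rate: about {:.0f} bit errors per hour".format(allowed))
    print("At the typical rate: about {:.3f} bit errors per hour".format(typical))
```

Under these assumptions the allowed rate corresponds to a few hundred errored bits per hour, while the typical rate corresponds to far fewer than one, so a steady stream of FCS or Alignment Errors on a lightly loaded segment points to a fault rather than to normal operation.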

Late Collisions

Late Collisions indicate that two devices have transmitted at the same time, but cabling errors (most commonly, excessive network segment length or repeaters between devices) prevent either transmitting device from detecting a collision. Neither device detects a collision because the time to propagate the signal from one end of the network to the other is longer than the time to put the entire packet on the network. As a result, neither of the devices that cause the late collision senses the other's transmission until the entire packet is on the network.

Although late collisions occur for small packets, the transmitter cannot detect them. As a result, a network suffering measurable Late Collisions for large packets is losing small packets as well.
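
The timing argument can be made concrete with a little arithmetic. The Python sketch below compares the time needed to transmit a frame with the round-trip propagation delay of the network. It is a simplification that ignores the preamble and assumes a 10 Mbps network by default; the function names and the 60-microsecond delay in the example are hypothetical.

```python
def frame_tx_time_us(frame_octets, bit_rate_mbps=10.0):
    """Time to put a frame on the wire, in microseconds (bits / Mbps)."""
    return frame_octets * 8 / bit_rate_mbps


def collision_may_go_undetected(round_trip_delay_us, frame_octets=64, bit_rate_mbps=10.0):
    """True if the sender can finish transmitting before a collision signal returns."""
    return round_trip_delay_us > frame_tx_time_us(frame_octets, bit_rate_mbps)


if __name__ == "__main__":
    # Hypothetical over-length network with a 60 microsecond round-trip delay.
    delay = 60.0
    for size in (64, 1518):
        print(size, "octets:",
              "tx time {:.1f} us,".format(frame_tx_time_us(size)),
              "undetected collision possible:",
              collision_may_go_undetected(delay, size))
```

On such a network, a minimum-size frame has left the transmitter before the collision can be sensed, so it is lost silently, whereas a large frame is still being transmitted when the collision arrives and is counted as a Late Collision.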

Nonstandard Ethernet Problems

Table 21 lists the symptoms that typically occur if a system violates the Ethernet standard.

Table 21 Symptoms of Common Ethernet Network Problems

Symptoms: FCS Errors and Alignment Errors increase significantly.
Problem: Network cabling is too long.
Notes: If you use a promiscuous network monitor, the number of Late Collisions reported by stations should correlate with the FCS and Alignment Errors reported by the monitor.

Symptoms: FCS and Alignment Errors increase proportionally with interference (sometimes referred to as noise hits).
Problem: Network segment is noisy.
Notes: Typically observed on a 10BASE-T network segment in a noisy environment. If you use multiple promiscuous monitors, the FCS and Alignment Errors among the monitors will not correlate.
If the monitor can track runts (also called Too Short Errors), the number of runt packets should be significantly higher than normal.

Symptoms: FCS and Alignment Errors are much higher than normal.
Problem: Networks do not conform to the access scheme of Carrier Sense Multiple Access with Collision Detect (CSMA/CD).
Notes: Occurs when some implementations of Ethernet in the segment are not entirely compatible with IEEE 802.3 repeaters.
Collision fragments linger on the network long enough to collide with retry packets at the minimum interpacket gap (IPG). The IPG is smaller on one side of the repeated network, causing a lost packet.
Ethernet controllers cannot receive packets that are separated by 4.7 µs or less. Some controllers cannot sustain receptions of packets separated by as much as 9.6 µs. If runt packets are received one after another and are followed by a collision fragment, Ethernet controllers that cannot sustain reception will lose packets.

Receive Discards

Receive Discards indicate that received packets could not be delivered to a higher-layer protocol because of congestion or packet errors.

Too Long Errors

A Too Long Error indicates that a packet is longer than 1518 octets (including FCS octets) but otherwise well formed. Too Long Errors are often caused by a bad transceiver, a malfunction of the jabber protection mechanism on a transceiver, or excessive noise on the cable.

Too Short Errors

A Too Short Error, also called a runt, indicates that a packet is fewer than 64 octets long (including FCS octets) but otherwise well formed.

Transmit Discards

Transmit Discards indicate that packets were not transmitted because of network congestion.

FDDI Ring Errors

Use these sections to identify and correct FDDI ring errors:

• FDDI Ring Errors Overview
• Identifying Ring Errors

Fiber Distributed Data Interface (FDDI) often corrects its own problems. However, because FDDI cannot correct all errors (especially those related to hardware problems), you should monitor FDDI errors.

Understanding the Problem

FDDI ring errors that you should monitor include:

• Elasticity Buffer Error Condition
• Frame Error Condition
• Frames Not Copied Condition
• Link Error Condition

Identifying the Problem

First, determine the type of FDDI ring errors and where they are occurring. Similar to the way you identify other FDDI problems, identify the upstream and downstream neighbors of the devices that you are monitoring.

Several types of network errors can cause FDDI performance problems. For example, problems with cables or physical connections may result in a link or frame error. Elasticity buffer (EB) errors can also lead to link and frame errors.

FDDI deals with port-related errors as follows:

• The variable PORTlerAlarm is the link error rate (LER) value at which a link connection generates an alarm. When the LER is greater than the alarm setting, Station Management (SMT) sends a Status Report Frame (SRF) to notify you that there is a problem with a port.

The PORTlerAlarm threshold is set lower than the PORTlerCutoff threshold so that you are notified of a problem before the port is actually removed from the ring.

• When link errors reach the threshold defined by the variable PORTLERCutoff, SMT breaks the connection, disabling the PHY that detected the problem. A Link Error Condition is also generated.

FDDI deals with MAC-related errors as follows:

• When MAC frame errors reach a certain threshold, a Frame Error Condition is generated. Because the actual error can be further upstream than the immediate connection, the connection remains intact.

• For a large network, the worst case MACFrameErrorRatio is less than 0.1 percent. However, during network configuration, frame error ratios can reach 50 percent for short periods. When you detect a sustained frame error ratio of more than 0.1 percent, a problem exists between the station that is reporting the condition and the nearest upstream MAC.
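The distinction between a short spike and a sustained frame error ratio can be sketched as follows. This is only an illustration: the function names, the sample format (error frames, total frames per polling interval), and the choice of three consecutive samples as "sustained" are assumptions, not values taken from the FDDI standard or from any management tool.

```python
def frame_error_ratio(error_frames: int, total_frames: int) -> float:
    """Fraction of frames in one sample that contained errors."""
    return error_frames / total_frames if total_frames else 0.0


def sustained_frame_errors(samples, threshold=0.001, min_consecutive=3):
    """Return True if the ratio stays above `threshold` (0.1 percent by
    default) for at least `min_consecutive` samples in a row.

    samples: iterable of (error_frames, total_frames) tuples, oldest first.
    """
    run = 0
    for errors, total in samples:
        if frame_error_ratio(errors, total) > threshold:
            run += 1
            if run >= min_consecutive:
                return True
        else:
            run = 0  # a clean sample ends the streak
    return False


if __name__ == "__main__":
    spike = [(500, 1000), (0, 1000), (0, 1000)]   # brief burst during reconfiguration
    steady = [(5, 1000), (4, 1000), (6, 1000)]    # persistently above 0.1 percent
    print(sustained_frame_errors(spike))   # False
    print(sustained_frame_errors(steady))  # True
```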

Solving the Problem

To solve problems related to FDDI errors, fix the hardware, cabling, or congestion problem.

Identifying Ring Errors


Use Status Watch to monitor your FDDI devices for Warning or Critical alerts.

Status Watch

Use Status Watch to identify FDDI ring errors.

Follow these steps:

1. Monitor the FDDI Status tool for the currently selected device.

2. Determine whether Status Watch is reporting Elasticity Buffer Errors or a high percentage of Frame Errors, Frames Not Copied, or Link Error Rates for the currently selected device.

FDDI Ring Errors Reference

This section provides additional conceptual and problem analysis detail.

Elasticity Buffer Error Condition

The Elasticity Buffer Error condition occurs when a port's elasticity buffer overflows or underflows. This condition usually indicates that a port's hardware is not operating within the tolerances that the FDDI standard specifies. Look for the problem in the hardware of either the port that is reporting the condition or of the immediately adjacent port.

Frame Error Condition

The Frame Error condition occurs when the percentage of frames that contain errors exceeds a preset threshold. When a device is an uplink to FDDI (that is, a device is transmitting onto FDDI), this type of condition indicates that the ring is saturated. The ring is out of buffer space, and packets are being dropped from the device's backbone port.

The problem indicated by the frame errors is usually located between the MAC that reports the condition and its upstream neighbor. Because many physical connections can lie along this path, the MACFrameErrorRatio variable identifies only the two MACs between which the problem is occurring.

Frames Not Copied Condition

The Frames Not Copied condition occurs when the percentage of frames that are dropped because of insufficient buffer space exceeds a preset threshold. This condition indicates that the station is congested and is unable to process frames as quickly as they arrive. To help eliminate congestion:

• Add more capacity to the station
• Reconfigure your network so that end stations that communicate heavily with one another are on the same bridge or switch
• Filter out certain traffic

Link Error Condition

The Link Error condition occurs when a port detects link errors at a rate that exceeds a preset threshold. When the Link Error threshold is exceeded, the station removes itself from the ring and tries to reinsert itself on the ring. This action creates a MAC Neighbor Change Event (which also occurs if a ring wraps).

Link errors may indicate an FDDI PHY hardware problem (such as a faulty transmitter) or a faulty cable or connector. Look for the problem in the portion of the network between the port that is reporting the condition and the first upstream transmitter.

MAC Neighbor Change Event

The MAC Neighbor Change event occurs when a MAC's upstream or downstream neighbor changes.

This event indicates either:

• A network reconfiguration
• Another station that is leaving or joining the ring

Network File Server Timeouts

Use these sections to identify and correct timeouts on network file servers:

• Network File Server Timeout Overview
• Looking for Obvious Errors
• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

See Network File Server Timeouts Reference for additional conceptual and problem analysis detail.

A network file server can time out if your network gets congested or if your server is having problems. Users might have problems downloading data from or to the server, or copying files from or to the server. To help you understand the troubleshooting process for this type of problem, an EXAMPLE throughout this section follows the symptoms, analysis, and resolution of a typical file server timeout problem.


Understanding the Problem

When users log in, their stations make network file server calls either to determine quotas (if this feature has been enabled) or to mount user home directories. The network file server timeout messages, even when spread across multiple nodes, indicate a problem either with the network or with a server.

EXAMPLE: UNIX users notice that it takes a long time - over 30 seconds in some cases - to log in to any machine. Some machines report network file server timeout messages, but the messages have no obvious pattern and are infrequent. You begin to get a sense of the problem.

Identifying the Problem

First, rule out the obvious causes. Ask these questions:

• Can you access the network file server with Telnet?
• Have any alarms been triggered?
• Are there any new errors?

The process of identifying the problem is developed in Looking for Obvious Errors.

Solving the Problem

To determine the cause, reproduce the fault while you monitor the network. After you know the cause, you can fix the problem.

The solutions to the network file server timeout are identified in these sections:

• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

Looking for Obvious Errors

To look for obvious errors, use these applications:

• Ping and Telnet - To check connectivity to the network file server nodes
• LANsentry Manager Alarms View - To search for triggered alarms
• LANsentry Manager Statistics View - To look for errors
• LANsentry Manager History View - To identify trends


Ping and Telnet

Determine whether you can contact network file server nodes using Ping and Telnet. If the response is extremely slow, then a problem may exist with the connections to the nodes. No delay indicates that the connections are normal, implying that the delay is occurring elsewhere. In this case, use LANsentry® Manager tools to determine whether packets are being lost or ignored.
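A scripted version of the Telnet check might look like the sketch below. It only times how long a TCP connection to a server port takes to open, which is a rough stand-in for typing a Telnet command by hand. The function name `probe_tcp`, the host name in the usage note, and the 5-second timeout are assumptions for illustration.

```python
import socket
import time


def probe_tcp(host, port, timeout=5.0):
    """Return the time in seconds taken to open a TCP connection,
    or None if the connection is refused, unreachable, or times out."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return None
```

For example, `probe_tcp("fileserver.example", 23)` (a hypothetical host, Telnet port 23) returning a small value suggests that the connection itself is healthy and the delay lies elsewhere, while `None` or a value close to the timeout points at the connection to the node.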

LANsentry Manager Alarms View

Using the LANsentry Manager Alarms View, you can determine if any configured alarms have been triggered.

Search the Alarms View to see if any MAC events have been logged.

EXAMPLE: MAC events have not been logged for the network to which the UNIX users are attached.

Even though no alarms have occurred, errors may exist. For example, a low rate of background errors may exist just below the alarm threshold. Because RMON alarms are based on maximum and minimum threshold values, they may miss constant, periodic, or low-level errors.

LANsentry Manager Statistics View

Using the LANsentry Manager Statistics View, you can display a multisegment graph of utilization and error statistics.

Follow these steps:

1. Set up a graph that shows utilization and errors on all your major segments.

2. Determine whether any segments are particularly busy or error prone.

EXAMPLE: You notice that one segment of the UNIX network, HUB3, is reporting Too Long Errors and FCS Errors roughly every second sample. While the amount is not higher than normal, it is currently higher than on any other segment.

LANsentry Manager History View

Using the LANsentry Manager History View, you can display a rolling history table to determine if the errors that you are seeing are new. For example, if you have a history table that runs for 30-minute samples over two days, you can compare the most recent sample to a previous sample, looking for new errors. If your probe has the resources, use a much finer resolution sample stored for a shorter time (every 30 seconds for 2 hours) to more easily spot recent errors.

EXAMPLE: You see that the history table shows that no error rates remained constant throughout the day. However, errors that did occur were on the device HUB3 and were Too Long Errors and FCS Errors.
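Comparing a recent history sample against an earlier one can be sketched in a few lines. The dictionary layout (counter name mapped to the count in one sample) and the function name are assumptions for illustration; they do not reflect how LANsentry Manager stores its history tables.

```python
def new_error_types(baseline, recent):
    """Return the error counters that are nonzero in `recent`
    but were zero or absent in the `baseline` sample."""
    return sorted(
        name
        for name, count in recent.items()
        if count > 0 and baseline.get(name, 0) == 0
    )


if __name__ == "__main__":
    yesterday = {"FCS Errors": 0, "Collisions": 12}
    latest = {"FCS Errors": 3, "Too Long Errors": 1, "Collisions": 10}
    print(new_error_types(yesterday, latest))
    # ['FCS Errors', 'Too Long Errors']
```

Counters such as Collisions that were already present in the baseline are ignored, which keeps normal background noise out of the result.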

Reproducing the Fault While Monitoring the Network

Although the RMON View in LANsentry Manager can show error rates and help you to identify the location of the problem, it may not provide enough data to solve the problem. To determine the cause of the problem, reproduce it while you monitor the network by using these applications:

• LANsentry Manager Top-N Graph - To locate a quiet node to use for reproducing the fault
• LANsentry Manager Packet Capture - To capture packets from the hub to which the quiet node is attached
• LANsentry Manager Packet Decode - To analyze the packets to assess network file server traffic and delays
• Address Tracker - To find the location of the problem nodes

EXAMPLE: Using LANsentry Manager, you find a hub on the network with a higher than normal error rate. However, the error rate does not seem high enough to cause login delays of 60 seconds or more.

LANsentry Manager Top-N Graph

Using the Top-N graph in the LANsentry Manager main window, locate a quiet node that has been showing the same problem. Choose a quiet node so that you do not receive excessive traffic when you try to isolate the problem.

EXAMPLE: You see that the node Monolith, which has the same Network File System (NFS) mounts as the other nodes on the network, is quiet. You decide to use this node for reproducing the fault. See Network File System (NFS) Protocol for more information about NFS.

LANsentry Manager Packet Capture

Using the LANsentry Manager Packet Capture application, capture packets from the network using predefined patterns and start-and-stop conditions.

Follow these steps:

1. Set up a capture buffer on a probe that is connected to the same hub as the quiet node. Until you know more about the problem, set a very general filter.

EXAMPLE: You select a MAC-layer filter and set a conversation filter to capture all packets to and from Monolith.

2. Telnet into and log out from the quiet node. Then reset the capture buffer. Repeat this procedure until you see the problem reflected in your captured data. To keep the buffer information clean, reset the buffer each time that you repeat the procedure.

3. When you see the delay, note the rough value of the packet count on the LANsentry Manager packet buffer.

By noting the packet count at which you think the delay has occurred, you can narrow the problem to within about 20 packets in the buffer. If you have used an extremely quiet node, you may even identify the exact packet.

LANsentry Manager Packet Decode

The LANsentry Manager Packet Decode application decodes all major protocols and displays the packet contents at three levels of detail: summary information, header information, and actual packet content.

Follow these steps:

1. Open the buffer in the Packet Decode application and locate the number of the packet at which the delay occurred.

2. Select the packet and launch a MAC-layer conversation filter. In the filter display, look for a gap in the conversation (that is, where the node sent a request and then resent it after an interval that roughly matches the delay you experienced when recreating the problem).

3. Repeat the test to determine if the result concentrates on one node or if it appears on other nodes.

EXAMPLE: On the quiet node that you selected, the delay is obvious. You see an NFS request going out to a node and a repeat of the request 30 seconds later. During that time, the node did not respond. You now know that the delay occurred because nodes were not seeing responses for NFS requests. When you repeat the test on other nodes, you find that the delay is happening with more than one destination node.
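Looking for a gap in a conversation can be illustrated with a short sketch that scans a decoded packet list for requests that were resent without an intervening reply. The tuple layout (timestamp in seconds, direction, request identifier) and the 10-second minimum gap are assumptions for illustration; a real capture would first need to be exported and parsed into this form.

```python
def find_retry_gaps(packets, min_gap=10.0):
    """Return (request_id, gap_seconds) for every request that was resent
    at least `min_gap` seconds after the previous unanswered copy.

    packets: list of (timestamp, direction, request_id) sorted by time,
             where direction is "request" or "reply".
    """
    pending = {}  # request_id -> time the last unanswered copy was sent
    gaps = []
    for timestamp, direction, request_id in packets:
        if direction == "reply":
            pending.pop(request_id, None)  # answered, so no gap to report
        elif request_id in pending:
            gap = timestamp - pending[request_id]
            if gap >= min_gap:
                gaps.append((request_id, gap))
            pending[request_id] = timestamp
        else:
            pending[request_id] = timestamp
    return gaps


if __name__ == "__main__":
    capture = [
        (0.0, "request", 1), (0.1, "reply", 1),
        (1.0, "request", 2),            # no reply seen
        (31.0, "request", 2),           # retry 30 seconds later
        (31.2, "reply", 2),
    ]
    print(find_retry_gaps(capture))  # [(2, 30.0)]
```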


Address Tracker

Use Address Tracker, which polls managed devices, to determine the hubs to which the problem nodes are attached. If the problem end stations are located on unmanaged devices, then you can at least narrow the problem to those unmanaged devices.

EXAMPLE: Although your network does not have managed hubs that Transcend® NCS management software can poll, it does have managed switches. When it polls the switches, Address Tracker displays the switch ports on which addresses were last seen. This information indicates the hub (but not the hub port) on which the device is located.

LANsentry Manager Packet Decode

After you know the location of the hub that has the problem node, monitor the problem from the hub using LANsentry Manager Packet Decode.

Follow these steps:

1. To capture packets from one of the nodes on the hub, set up another capture buffer and repeat the exercise that is described in LANsentry Manager Packet Capture. Because a delay may occur on a different node, use two capture buffers without stopping the first one.

Note the rough packet count where the delay appears.

2. Display a conversation filter of the packet where the delay appears and look for the gap in the conversation.

EXAMPLE: You hope that the nodes are on the same hub. You find that all the nodes are on HUB3. This result indicates that FCS Errors may be causing the timeouts. However, because the errors occur at a low rate, you decide to verify this diagnosis. You monitor the problem from the hub, logging in and out many times, and the delay eventually occurs. This time, the delay shows that the node's reply had an FCS Error even though the node received the request. The switch would not have transmitted this packet, causing a timeout on the NFS protocol. The retry time is presumably 30 seconds. During this test, you see the problem occurring on another node.

Correcting the Fault


Without a managed hub, you may find it very difficult to discover network file server timeout errors. To find the problematic node, you must either systematically isolate nodes by monitoring each node for a prolonged period or temporarily insert a managed hub.

EXAMPLE: You notice that the captured error packet failed FCS because it was corrupted by a regular pattern during transmission. A possible reason for this occurrence is a Jabbering node. This explanation makes sense because FCS/Jabber frames increased linearly when you were monitoring the live network.

Network File Server Timeouts Reference

This section explains terms that are relevant to network file server timeouts and provides additional conceptual and problem analysis detail.

Jabbering

Jabbering occurs when a node transmits illegal-length packets and is possibly not operating within carrier specifications. In effect, another node has written bad data over a valid packet. This bad data is often interpreted as a repeated sequence of data.

Network File System (NFS) Protocol

A distributed file system protocol developed by Sun Microsystems that allows a computer system to access files over a network as if they were on its local disks. This protocol has been incorporated into products by more than 200 companies. It is now a de facto Internet standard.

NFS is one protocol in the NFS suite of protocols, which includes NFS, RPC, XDR (External Data Representation), and others. These protocols are part of a larger architecture that Sun Microsystems refers to as Open Network Computing (ONC). ONC is a distributed applications architecture designed by Sun and currently controlled by a consortium led by Sun.

Troubleshooting TCP/IP - Detailed Steps

This article shows how to troubleshoot TCP/IP connectivity between computers on a Windows network. If you haven't already done so, disable XP's Internet Connection Firewall on all local area network connections and remove all firewall programs on the network. Improperly configured firewalls are the most common cause of TCP/IP problems.

Open a Command Prompt Window

For many of these steps, you'll be typing at the command prompt. To open a command prompt window in Windows 2000 or XP, click Start | Run, type cmd in the box, and click OK. To open a command prompt window in Windows 95, 98, or Me, click Start | Run, type command in the box, and click OK. Type one command per line and press Enter after each one to execute it. To close the command prompt window, use the exit command.

Determine the TCP/IP Settings

Determine the TCP/IP settings of each computer on the local area network. In XP, open the Network Connections folder, right-click the LAN connection, and click Status | Support | Details. For example, here are the Status and Details views for the LAN connection on an Internet Connection Sharing host.

In Windows 95/98/Me, click Start | Run, type winipcfg in the box, and click OK. Select the LAN adapter from the menu and click More Info. Here's the winipcfg view for an ICS client running Windows Me.

You can also see the TCP/IP settings from the command prompt. This is especially convenient if a computer has more than one network adapter. Use the ipconfig /all command, which is available in all versions except Windows 95. The output from this command can be long, so it's best to write it to a file. Specify the file name in the command this way:

ipconfig /all >ipconfig.txt

Description of TCP/IP Settings

Here are the TCP/IP settings that are used in network troubleshooting:

• IP Address - Unique address assigned to a network adapter. A computer with multiple network adapters has an IP address for each one, and each one must be in a different subnet.

• Subnet Mask - Used in conjunction with the IP address to determine which subnet an adapter belongs to. At the simplest level, communication is only possible between two network adapters when they're in the same subnet.

• Default Gateway - IP address of a computer or router, on one of this computer's local area networks, that knows how to communicate with subnets not present on this computer. For an Internet connection, the default gateway is a router belonging to your Internet service provider, and all access to sites on the Internet goes through it. For an ICS client, the default gateway is the ICS host. If you use a hardware router, it serves as the default gateway.

• DHCP Server - If an adapter is configured to obtain an IP address automatically, this is the address of the server that provides it. It could be your ISP, an ICS host, or a hardware router.

• DNS Servers - IP address of one or more Domain Name Server computers. DNS servers translate Internet names to their IP addresses (like 63.146.109.227).
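How the subnet mask and IP address combine to decide whether two adapters can talk directly can be shown with Python's standard `ipaddress` module. This is a minimal sketch; the function name and the example addresses are illustrative assumptions.

```python
import ipaddress


def same_subnet(ip_a: str, ip_b: str, netmask: str) -> bool:
    """True when both addresses fall in the same subnet under `netmask`."""
    # strict=False lets ip_network accept a host address and mask it down
    # to the network address instead of raising an error.
    net_a = ipaddress.ip_network(f"{ip_a}/{netmask}", strict=False)
    net_b = ipaddress.ip_network(f"{ip_b}/{netmask}", strict=False)
    return net_a == net_b


if __name__ == "__main__":
    print(same_subnet("192.168.0.1", "192.168.0.23", "255.255.255.0"))  # True
    print(same_subnet("192.168.0.1", "192.168.1.23", "255.255.255.0"))  # False
```

When the result is False, traffic between the two adapters has to pass through a default gateway rather than being delivered directly.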

Subnets

See our article on subnets for a brief description of how they work. For more details, see this Microsoft Knowledge Base article.

If two computers are supposed to be on the same subnet but aren't, something is wrong with the network hardware or software configuration. This is most likely to happen when one of them receives an IP address of 169.254.x.x, which indicates that:

• It's configured to obtain an IP address automatically
• It couldn't find a DHCP server on the network to make the assignment
• Windows assigned it an Automatic Private IP Address

See our article on Specific Networking Problems and Their Solutions for more information.
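Spotting an Automatic Private IP Address in a list of adapter addresses is easy to script. The sketch below checks membership in the 169.254.0.0/16 range using Python's standard `ipaddress` module; the function name is an assumption for illustration.

```python
import ipaddress

# The Automatic Private IP Addressing range (169.254.x.x).
APIPA_NETWORK = ipaddress.ip_network("169.254.0.0/16")


def is_apipa(address: str) -> bool:
    """True when `address` is an automatically assigned 169.254.x.x address."""
    return ipaddress.ip_address(address) in APIPA_NETWORK


if __name__ == "__main__":
    print(is_apipa("169.254.12.7"))   # True: no DHCP server answered
    print(is_apipa("192.168.0.23"))   # False
```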

Pinging

The ping command is the basic tool for testing TCP/IP connectivity. It sends a special packet (called ICMP Echo) to a particular IP address and looks for a reply. If everything is working right, the reply comes back. If not, the ping times out in a few seconds. By default, the ping command repeats the process four times. Here's an example of an ICS client computer pinging a Windows XP Home Edition ICS host, using the host's IP address and its computer name.

When ping fails, you'll see one of these error messages:

• Request timed out - The IP address is valid, but there's no reply from it. If the IP address is on a local area network, the most likely cause is a firewall program blocking the ping.

• Unknown host <name> or Ping request could not find host <name> - The computer name doesn't exist on the local area network. Make sure that NetBIOS over TCP/IP is enabled.

• Destination host unreachable - The IP address isn't on a local area network, and the default gateway can't access it. Either there's no default gateway, its address is wrong, or it isn't functioning.
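The three error messages above can be turned into a small lookup that maps captured ping output to its likely cause. This is only a sketch: the function name is hypothetical, the hint text is condensed from the list above, and the matching assumes the English message wording shown there.

```python
# (message fragment, likely cause) pairs, condensed from the list above.
PING_HINTS = [
    ("request timed out",
     "Address is valid but did not reply; check for a firewall blocking ping."),
    ("unknown host",
     "Name not found on the LAN; check that NetBIOS over TCP/IP is enabled."),
    ("could not find host",
     "Name not found on the LAN; check that NetBIOS over TCP/IP is enabled."),
    ("destination host unreachable",
     "No usable route; check the default gateway and its address."),
]


def diagnose_ping(output: str):
    """Return a hint for the first known error message found in `output`,
    or None when no known error message is present."""
    lowered = output.lower()
    for fragment, hint in PING_HINTS:
        if fragment in lowered:
            return hint
    return None


if __name__ == "__main__":
    print(diagnose_ping("Request timed out."))
```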


Pinging the Local Area Network

Here is a series of ping commands to use in finding where a problem occurs on a local area network. Run them in the order shown, and don't go on to the next command until all of the previous commands work properly. In this example:

• The computer being tested is named Winxp, with IP address 192.168.1.101
• There's another computer on the network named Win98, with IP address 192.168.1.123

Substitute the appropriate IP addresses and computer names for your network.

Command | Target | What Ping Failure Indicates
ping 127.0.0.1 | Loopback address | Corrupted TCP/IP installation
ping localhost | Loopback name | Corrupted TCP/IP installation
ping 192.168.1.101 | This computer's IP address | Corrupted TCP/IP installation
ping winxp | This computer's name | Corrupted TCP/IP installation
ping 192.168.1.123 | Another computer's IP address | Bad hardware or NIC driver
ping win98 | Another computer's name | NetBIOS name resolution failure

To fix a corrupted TCP/IP installation on Windows XP, follow the steps in this Microsoft Knowledge Base article. For Windows 95/98/Me, uninstall the TCP/IP protocol in Control Panel | Network, reboot, and reinstall it. If that doesn't fix it, use this procedure on Windows 95 or 98.
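The rule of running the ping commands in order and stopping at the first failure can be sketched as a short script. The helper names are hypothetical. `system_ping` shells out to the operating system's ping command and relies on its exit code, which is an assumption that does not hold in every case (some Windows versions return success for "Destination host unreachable" replies), so treat the result as a first indication only.

```python
import platform
import subprocess


def system_ping(target: str, count: int = 1) -> bool:
    """Ping `target` with the OS ping command; True if it exits with code 0."""
    flag = "-n" if platform.system() == "Windows" else "-c"
    result = subprocess.run(
        ["ping", flag, str(count), target],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def first_failure(steps, ping=system_ping):
    """Run the checks in order and return the first (target, meaning)
    pair whose ping fails, or None if every step succeeds."""
    for target, meaning in steps:
        if not ping(target):
            return target, meaning
    return None


# The sequence from the local area network example above.
LAN_STEPS = [
    ("127.0.0.1", "Corrupted TCP/IP installation"),
    ("localhost", "Corrupted TCP/IP installation"),
    ("192.168.1.101", "Corrupted TCP/IP installation"),
    ("winxp", "Corrupted TCP/IP installation"),
    ("192.168.1.123", "Bad hardware or NIC driver"),
    ("win98", "NetBIOS name resolution failure"),
]
```

Calling `first_failure(LAN_STEPS)` on the computer being tested reports the earliest step that fails. Passing a different `ping` callable makes the ordering logic easy to try out without touching the network.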

Pinging the Internet

You can also use ping to find a problem with Internet access. Run these commands in the order shown, and don't go on to the next command until all of the previous commands work properly. Use the Default Gateway and DNS Server addresses that you got from the winipcfg or ipconfig /all command.

Command | Target | What Ping Failure Indicates
ping w.x.y.z | Default Gateway | Default Gateway down
ping w.x.y.z | DNS Server | DNS Server down
ping w.x.y.z | Web site IP address | Internet service provider or web site down
ping www.something.com | Web site name | DNS Server down or web site down

REFERENCE

WWW.PRACTICALLYNETWORK.COM

WWW.COMPUTERHOPE.COM

WWW.WIKIPEDIA.ORG


THANK YOU



computer 2-3 minutes to detect the network. Windows users may also want to try pressing the F5 (refresh) key when in Network Neighborhood to refresh the network connections and possibly detect the network.

Additional troubleshooting

If, after following or verifying the above recommendations, you are still unable to connect or see the network, attempt one or more of the below recommendations.

If you have installed or are using TCP/IP as your protocol, you can attempt to ping another computer's IP address to verify that the computer is able to send and receive data. To do this, Windows or MS-DOS users must be at a prompt, and Linux / Unix variant users must open or be at a shell.

Once at the prompt, assuming that the address of the computer you wish to ping is 102.55.92.2, you would type:

ping 102.55.92.2

If you receive a response back from this address (and it is a different computer), this demonstrates that the computer is communicating over the network. If you are still unable to connect or see the network, it is possible that other issues may be present.

Another method of determining network issues is to use the tracert command if you are an MS-DOS or Windows user, or the traceroute command if you are a Linux / Unix variant user. To use this command, you must be at the command prompt or shell.

Once at the prompt, assuming that the address is again 102.55.92.2, type:

tracert 102.55.92.2

or

traceroute 102.55.92.2

This should begin listing the hops between the computer and network devices. When the connection fails, determine which device is causing the issue by reviewing the traceroute listing.
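If you run these checks from a script rather than by hand, the platform difference described above (tracert on Windows, traceroute on Linux / Unix variants) can be handled in one place. The following is a minimal Python sketch, not part of the original procedure. The function names and the target address 192.0.2.10 (a reserved documentation address) are assumptions made for illustration.

```python
import platform
import subprocess


def build_ping_command(host, count=4, system=None):
    """Return the argument list for a finite ping of `host`."""
    system = (system or platform.system()).lower()
    # Windows ping takes the echo count with -n; Unix-like pings use -c.
    count_flag = "-n" if system == "windows" else "-c"
    return ["ping", count_flag, str(count), host]


def build_trace_command(host, system=None):
    """Return the route-tracing command for the platform."""
    system = (system or platform.system()).lower()
    tool = "tracert" if system == "windows" else "traceroute"
    return [tool, host]


def run(command):
    """Run a command and return (exit_code, combined_output). Not called here."""
    result = subprocess.run(command, capture_output=True, text=True)
    return result.returncode, result.stdout + result.stderr


TARGET = "192.0.2.10"  # placeholder documentation address, not from the original text
print(build_ping_command(TARGET, system="Windows"))
print(build_trace_command(TARGET, system="Linux"))
```

To send real traffic, pass the result of either builder to run(), for example run(build_ping_command("your-host")). Keeping command construction separate from execution lets you check the command line before anything touches the network.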

Network Troubleshooting Overview

These sections introduce you to the concepts and practice of network troubleshooting:

• Introduction to Network Troubleshooting
• Network Troubleshooting Framework
• Troubleshooting Strategy

Network troubleshooting means recognizing and diagnosing networking problems with the goal of keeping your network running optimally. As a network administrator, your primary concern is maintaining connectivity of all devices (a process often called fault management). You also continually evaluate and improve your network's performance. Because serious networking problems can sometimes begin as performance problems, paying attention to performance can help you address issues before they become serious.

About Connectivity Problems

Connectivity problems occur when end stations cannot communicate with other areas of your local area network (LAN) or wide area network (WAN). Using management tools, you can often fix a connectivity problem before users even notice it. Connectivity problems include:

• Loss of connectivity - When users cannot access areas of your network, your organization's effectiveness is impaired. Immediately correct any connectivity breaks.

• Intermittent connectivity - Although users have access to network resources some of the time, they are still facing periods of downtime. Intermittent connectivity problems can indicate that your network is on the verge of a major break. If connectivity is erratic, investigate the problem immediately.

• Timeout problems - Timeouts cause loss of connectivity, but are often associated with poor network performance.

About Performance Problems

Your network has performance problems when it is not operating as effectively as it should. For example, response times may be slow, the network may not be as reliable as usual, and users may be complaining that it takes them longer to do their work. Some performance problems are intermittent, such as instances of duplicate addresses. Other problems can indicate a growing strain on your network, such as consistently high utilization rates.

If you regularly examine your network for performance problems, you can extend the usefulness of your existing network configuration and plan network enhancements, instead of waiting for a performance problem to adversely affect the users' productivity.


Solving Connectivity and Performance Problems

When you troubleshoot your network, you employ tools and knowledge already at your disposal. With an in-depth understanding of your network, you can use network software tools, such as Ping, and network devices, such as Analyzers, to locate problems, and then make corrections, such as swapping equipment or reconfiguring segments, based on your analysis.

Transcend® provides another set of tools for network troubleshooting. These tools have graphical user interfaces that make managing and troubleshooting your network easier. With Transcend Applications, you can:

• Baseline your network's normal status to use as a basis for comparison when the network operates abnormally
• Precisely monitor network events
• Be notified immediately of critical problems on your network, such as a device losing connectivity
• Establish alert thresholds to warn you of potential problems that you can correct before they affect your network
• Resolve problems by disabling ports or reconfiguring devices

Network Troubleshooting Framework

The International Organization for Standardization (ISO) Open Systems Interconnection (OSI) reference model is the foundation of all network communications. This seven-layer structure provides a clear picture of how network communications work.

Protocols (rules) govern communications between the layers of a single system and among several systems. In this way, devices made by different manufacturers or using different designs can use different protocols and still communicate.

By understanding how network troubleshooting fits into the framework of the OSI model, you can identify at what layer problems are located and which type of troubleshooting tools to use. For example, unreliable packet delivery can be caused by a problem with the transmission media or with a router configuration. If you are receiving high rates of FCS Errors and Alignment Errors, which you can monitor with Status Watch, then the problem is probably located at the physical layer and not the network layer. Figure 1 shows how to troubleshoot the layers of the OSI model.

Table 5 describes the data that the network management tools can collect as it relates to the OSI model layers.


Table 5 Network Data and the OSI Model Layers

Layer | Data Collected | Transcend NCS Tool Used
Application, Presentation, Session, Transport | Protocol information and other Remote Monitoring (RMON) and RMON2 data | LANsentry Manager; Traffix Manager (for more detail)
Network | Routing information | Status Watch; LANsentry Manager (for more detail); Traffix Manager (for more detail)
Data Link | Traffic counts and other packet breakdowns | Status Watch; LANsentry Manager (for more detail)
Physical | Error counts | Status Watch

Figure 1 OSI Reference Model and Network Troubleshooting


Troubleshooting Strategy

How do you know when you are having a network problem? The answer to this question depends on your site's network configuration and on your network's normal behavior. See Knowing Your Network for more information.

If you notice changes on your network, ask the following questions:

• Is the change expected or unusual?
• Has this event ever occurred before?
• Does the change involve a device or network path for which you already have a backup solution in place?
• Does the change interfere with vital network operations?
• Does the change affect one or many devices or network paths?

After you have an idea of how the change is affecting your network, you can categorize it as critical or noncritical. Both of these categories need resolution (except for changes that are one-time occurrences); the difference between the categories is the time that you have to fix the problem.

By using a strategy for network troubleshooting, you can approach a problem methodically and resolve it with minimal disruption to network users. It is also important to have an accurate and detailed map of your current network environment. Beyond that, a good approach to problem resolution is:

• Recognizing Symptoms
• Understanding the Problem
• Identifying and Testing the Cause of the Problem
• Solving the Problem

Recognizing Symptoms

The first step to resolving any problem is to identify and interpret the symptoms. You may discover network problems in several ways. Users may complain that the network seems slow or that they cannot connect to a server. You may pass your network management station and notice that a node icon is red. Your beeper may go off and display the message "WAN connection down."

User Comments

Although you can often solve networking problems before users notice a change in their environment, you invariably get feedback from your users about how the network is running, such as:

• They cannot print
• They cannot access the application server
• It takes them much longer to copy files across the network than it usually does
• They cannot log on to a remote server
• When they send e-mail to another site, they get a routing error message
• Their system freezes whenever they try to Telnet

Network Management Software Alerts

Network management software, as described in Your Network Troubleshooting Toolbox, can alert you to areas of your network that need attention. For example:

• The application displays red (Warning) icons
• Your weekly Top-N utilization report (which indicates the 10 ports with the highest utilization rates) shows that one port is experiencing much higher utilization levels than normal
• You receive an e-mail message from your network management station that the threshold for broadcast and multicast packets has been exceeded

These signs usually provide additional information about the problem, allowing you to focus on the right area.
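To make the Top-N idea concrete, the following Python sketch ranks ports by utilization and flags any port that is well above its own baseline. It illustrates the concept only and is not how any Transcend application computes its report. The port names, sample figures, and the factor of 2.0 are hypothetical.

```python
import heapq


def top_n_ports(utilization, n=10):
    """Return the n (port, utilization %) pairs with the highest utilization."""
    return heapq.nlargest(n, utilization.items(), key=lambda item: item[1])


def unusually_busy(utilization, baseline, factor=2.0):
    """Return ports whose utilization exceeds `factor` times their baseline."""
    return sorted(
        port
        for port, value in utilization.items()
        if port in baseline and value > factor * baseline[port]
    )


# Hypothetical weekly averages (percent utilization per port).
this_week = {"port1": 12.0, "port2": 71.5, "port3": 8.3, "port4": 30.1}
normal = {"port1": 10.0, "port2": 20.0, "port3": 9.0, "port4": 28.0}

print(top_n_ports(this_week, n=2))
print(unusually_busy(this_week, normal))
```

Comparing each port with its own baseline, rather than with a single network-wide limit, matches the advice to judge changes against your network's normal behavior.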

Analyzing Symptoms

When a symptom occurs, ask yourself these types of questions to narrow the location of the problem and to get more data for analysis:

• To what degree is the network not acting normally (for example, does it now take one minute to perform a task that normally takes five seconds)?
• On what subnetwork is the user located?
• Is the user trying to reach a server, end station, or printer on the same subnetwork or on a different subnetwork?
• Are many users complaining that the network is operating slowly or that a specific network application is operating slowly?
• Are many users reporting network logon failures?
• Are the problems intermittent? For example, some files may print with no problems, while other printing attempts generate error messages, make users lose their connections, and cause systems to freeze.

Understanding the Problem

Networks are designed to move data from a transmitting device to a receiving device. When communication becomes problematic, you must determine why data are not traveling as expected and then find a solution. The two most common causes for data not moving reliably from source to destination are:

• The physical connection breaks (that is, a cable is unplugged or broken)
• A network device is not working properly and cannot send or receive some or all data

Network management software can easily locate and report a physical connection break (layer 1 problem). It is more difficult to determine why a network device is not working as expected, which is often related to a layer 2 or a layer 3 problem.

To determine why a network device is not working properly, look first for:


• Valid service - Is the device configured properly for the type of service it is supposed to provide? For example, has Quality of Service (QoS), which is the definition of the transmission parameters, been established?

• Restricted access - Is an end station supposed to be able to connect with a specific device, or is that connection restricted? For example, is a firewall set up that prevents that device from accessing certain network resources?

• Correct configuration - Is there a misconfiguration of IP address, subnet mask, gateway, or broadcast address? Network problems are commonly caused by misconfiguration of newly connected or configured devices. See Manager-to-Agent Communication for more information.
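The configuration check in the last item can be partly automated. The sketch below uses Python's standard ipaddress module to test whether an IPv4 address, subnet mask, gateway, and broadcast address are consistent with one another. The function name and the sample addresses are assumptions for illustration. It assumes an ordinary subnet (not a /31 or /32) and does not query any device.

```python
import ipaddress


def check_ip_config(address, netmask, gateway, broadcast=None):
    """Return a list of problems found in an IPv4 configuration (empty if none)."""
    problems = []
    # strict=False derives the enclosing network from a host address and mask.
    network = ipaddress.ip_network(f"{address}/{netmask}", strict=False)
    host = ipaddress.ip_address(address)
    gw = ipaddress.ip_address(gateway)

    if host in (network.network_address, network.broadcast_address):
        problems.append(f"{host} is the network or broadcast address of {network}")
    if gw not in network:
        problems.append(f"gateway {gw} is outside {network}")
    if gw == host:
        problems.append("gateway is the same as the host address")
    if broadcast is not None:
        if ipaddress.ip_address(broadcast) != network.broadcast_address:
            problems.append(
                f"broadcast {broadcast} should be {network.broadcast_address}"
            )
    return problems


# Hypothetical settings: the gateway sits on a different subnetwork.
print(check_ip_config("192.168.1.20", "255.255.255.0", "192.168.2.1"))
```

An empty list means only that the four values agree with each other. It does not prove that they are the values your network actually expects.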

Identifying and Testing the Cause of the Problem

After you develop a theory about the cause of the problem, test your theory. The test must conclusively prove or disprove your theory.

Two general rules of troubleshooting are:

• If you cannot reproduce a problem, then no problem exists unless it happens again on its own

• If the problem is intermittent and you cannot replicate it, you can configure your network management software to catch the event in progress

For example, with LANsentry Manager, you can set alarms and automatic packet capture filters to monitor your network and inform you when the problem occurs again. See Configuring Transcend NCS for more information.

Although network management tools can provide a great deal of information about problems and their general location, you may still need to swap equipment or replace components of your network until you locate the exact trouble spot.

After you test your theory, either fix the problem as described in Solving the Problem or develop another theory.

Sample Problem Analysis

This section illustrates the analysis phase of a typical troubleshooting incident.

On your network, a user cannot access the mail server. You need to establish two areas of information:

• What you know - In this case, the user's workstation cannot communicate with the mail server.

• What you do not know and need to test -

  • Can the workstation communicate with the network at all, or is the problem limited to communication with the server? Test by sending a Ping or by connecting to other devices.

  • Is the workstation the only device that is unable to communicate with the server, or do other workstations have the same problem? Test connectivity at other workstations.

  • If other workstations cannot communicate with the server, can they communicate with other network devices? Again, test the connectivity.

The analysis process follows these steps:

1. Can the workstation communicate with any other device on the subnetwork?

• If no, then go to step 2.
• If yes, determine if only the server is unreachable.
  • If only the server cannot be reached, this suggests a server problem. Confirm by doing step 2.
  • If other devices cannot be reached, this suggests a connectivity problem in the network. Confirm by doing step 3.

2. Can other workstations communicate with the server?

• If no, then most likely it is a server problem. Go to step 3.
• If yes, then the problem is that the workstation is not communicating with the subnetwork. (This situation can be caused by workstation issues or a network issue with that specific station.)

3. Can other workstations communicate with other network devices?

• If no, then the problem is likely a network problem.
• If yes, the problem is likely a server problem.
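The three questions above can be read as a small decision procedure. The Python sketch below is one possible reading of those steps, not a definitive implementation. The function name, the argument names, and the wording of the returned hints are assumptions. Each argument is the yes/no answer to one of the numbered questions.

```python
def classify_fault(workstation_reaches_peers, others_reach_server, others_reach_devices):
    """Map the answers to steps 1-3 to a (problem area, next check) pair."""
    if others_reach_server:
        # Step 2 answered "yes": the server works for everyone else.
        if workstation_reaches_peers:
            return ("workstation", "check its configuration for that particular server")
        return ("workstation", "check its connection to the subnetwork")
    if others_reach_devices:
        # Step 3 answered "yes": the network works, but no one reaches the server.
        return ("server", "check that it is running, connected, and configured")
    # Step 3 answered "no": several stations cannot reach anything.
    return ("subnetwork", "examine devices on the path between users and server")


# Example: only this user is affected, and the workstation sees nothing at all.
print(classify_fault(False, True, True))
```

In practice you would fill in the three answers from Ping tests run at the affected workstation and at one or two neighboring workstations.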

When you determine whether the problem is with the server, subnetwork, or workstation, you can further analyze the problem as follows:

• For a problem with the server - Examine whether the server is running, if it is properly connected to the network, and if it is configured appropriately.

• For a problem with the subnetwork - Examine any device on the path between the users and the server.

• For a problem with the workstation - Examine whether the workstation can access other network resources and if it is configured to communicate with that particular server.


Equipment for Testing

To help identify and test the cause of problems, have available:

• A laptop computer that is loaded with a terminal emulator, TCP/IP stack, TFTP server, CD-ROM drive (to read the online documentation), and some key network management applications, such as LANsentry® Manager. With the laptop computer, you can plug into any subnetwork to gather and analyze data about the segment.

• A spare managed hub to swap for any hub that does not have management. Swapping in a managed hub allows you to quickly spot which port is generating the errors.

• A single port probe to insert in the network if you are having a problem where you do not have management capability.

• Console cables for each type of connector, labeled and stored in a secure place.

Solving the Problem

Many device or network problems are straightforward to resolve, but others yield misleading symptoms. If one solution does not work, continue with another.

A solution often involves:

• Upgrading software or hardware (for example, upgrading to a new version of agent software or installing Gigabit Ethernet devices)

• Balancing your network load by analyzing:
  • What users communicate with which servers
  • What the user traffic levels are in different segments

  Based on these findings, you can decide how to redistribute network traffic.

• Adding segments to your LAN (for example, adding a new switch where utilization is continually high)

• Replacing faulty equipment (for example, replacing a module that has port problems or replacing a network card that has a faulty jabber protection mechanism)

To help solve problems, have available:

• Spare hardware equipment (such as modules and power supplies), especially for your critical devices

• A recent backup of your device configurations to reload if flash memory gets corrupted (which can sometimes happen due to a power outage)


Your Network Troubleshooting Toolbox

A robust network troubleshooting toolbox consists of items (such as network management applications, hardware devices, and other software) to recognize, diagnose, and solve networking problems. It contains:

• Transcend Applications
• Network Management Platforms
• 3Com SmartAgent Embedded Software
• Other Commonly Used Tools

Transcend management software is optimized for managing 3Com devices and their attached networks. However, some applications, such as LANsentry Manager, can manage any vendor's networking equipment that complies with the Remote Monitoring (RMON) Management Information Base (MIB).

This section describes these Transcend applications, which you can use to troubleshoot your network:

• Transcend Central
• Status Watch
• Address Tracker
• LANsentry Manager
• Traffix Manager
• Device View

Transcend Central

Start with Transcend Central, which is an asset management and device grouping application, to understand what your network consists of and to control the Transcend NCS network management troubleshooting tools. Transcend Central is available as both a native Windows application and a Java application that you can access using a Web browser.

Using Transcend Central for troubleshooting, you can:

• Display an inventory of device, module, and port information
• Group devices to make your troubleshooting tasks easier. By managing a collection of devices, you can simultaneously perform the same tasks on each device in a group and locate physical or logical problems on your network.
• Launch Transcend NCS applications, including some of your primary Transcend NCS troubleshooting tools:
  • Status Watch, includes Web Reporter (from the Java version)
  • Address Tracker
  • LANsentry Manager
  • Traffix Manager
  • Device View

Status Watch

The Status Watch applications manage 3Com devices and their attached networks. Status Watch applications primarily poll for MIB-II data. This is a performance monitoring application that allows you to monitor the operational status of your network devices and quickly identify any problems that require your attention. It works in conjunction with Web Reporter.

Web Reporter

Web Reporter is a data-reporting application that runs in a World Wide Web (WWW) browser. It generates reports from data that Status Watch collects, allowing you to compare network statistics against a baseline.

Address Tracker

Address Tracker is an address collection and discovery application that:

• Polls managed devices for all MAC addresses
• Polls managed devices and routers for IP addresses to perform MAC-to-IP address translation
• Uses Device View to disable troublesome ports

LANsentry Manager

LANsentry Manager is a set of integrated applications that displays and explores the real-time and historical data that RMON-compliant devices (probes) on the network capture. LANsentry Manager uses SNMP polling to gather RMON and RMON2 data from the probes.

Use LANsentry Manager to:

• Monitor current performance of network segments
• See trends over time
• Spot signs of current problems
• Configure alarms to monitor for specific events
• Capture packets and display their contents

LANsentry Manager works with any device (from 3Com or other vendors) that supports the RMON MIB or the RMON2 MIB.


Traffix Manager

Traffix Manager is a performance-monitoring application that provides information about layer 2 (RMON) and layer 3 conversations between nodes. It helps you to assess traffic patterns on your network. Traffix Manager:

• Monitors all the stations that the RMON2-compliant probes encounter on your network
• Captures and stores RMON and RMON2 data for your network's protocols and applications
• Displays traffic between stations in user-defined views of the network
• Graphs current or historical data on the devices selected
• Delivers reports for user-specified stations and time periods as PostScript to your printer or as HTML to your Web server
• Launches LANsentry Manager tools for in-depth analysis of a station or a conversation between stations

You can use Traffix Manager to:

• Know your network - Understand overall flow patterns and interactions between systems, and determine how your network is really being used at the application level.

• Optimize your network - Gain an insight into traffic and application usage trends to help you optimize the use and placement of current network resources and make wise decisions about capacity planning and network growth.

Traffix Manager works with any device (from 3Com or other vendors) that supports the RMON2 MIB.

Device View

The Device View application is a device configuration tool. When you troubleshoot your network, you can use Device View to determine or change a device's configuration. You can also use Device View to look at a device's statistics and to set alarms.

Device View manages only 3Com devices.

Network Management Platforms

As part of your troubleshooting toolbox, your network management platform is the first place to go to view the overall health of your network. With the platform, you can understand the logical configuration of your network and configure views of your network to understand how devices work together and the role that they play in the users' work. The network management platform that supports your Transcend software installation can provide valuable troubleshooting tools. Transcend runs on several platforms within the NT and UNIX environments.

The platform discovers the devices. Transcend imports that information from the platform to populate the core database. Unless you are rediscovering, the user must manually update the platform.

Using this device database, a map displays the graphical representation of your network. Each device on your network appears as a symbol (icon) on the map. You can configure views of your network to show devices on the same subnetworks or floors.

You can monitor network performance and diagnose network performance and connectivity problems. You can also:

• Take a snapshot of your network in its normal state. The snapshot records the state of your network at a particular instant. If you later have network performance problems, you can compare the current state of your network to the snapshot.

• Quickly determine the connectivity status of a device by noting the color of its map symbol. Red usually means that communication with a device has ceased.

• Diagnose connectivity problems by determining whether two devices can communicate. If they can communicate, then examine the route between the devices, the number of packets that were sent and lost, and the roundtrip time between the two devices.

• Manage MIB information (for example, collecting and storing MIB data for trend analysis and graphing) using MIB queries. Transcend compiles MIBs and allows you to navigate up and down the MIB Tree to retrieve MIB objects from devices. You can set thresholds for MIB data and generate events when a threshold is exceeded.

• Configure the software to act on certain events. The Event Categories window informs you of any unexpected events (which arrive in the form of traps).

For more information, see the documentation that is shipped with your software.

3Com SmartAgent Embedded Software

Traditional Simple Network Management Protocol (SNMP) management places the burden of collecting network management information on the management station. In this traditional model, software agents collect information about throughput, record errors or packet overflows, and measure performance based on established thresholds. Through a polling process, agents pass this information to a centralized network management station whenever they receive an SNMP query. Management applications then make the data useful and alert the user if there are problems on the device.

As a useful companion to traditional network management methods, 3Com's SmartAgent® technology places management intelligence into the software agent that runs within a 3Com device. This scalable solution reduces the amount of computational load on the management station and helps minimize management-related network traffic.

SmartAgent software, which uses the RMON MIB, is self-monitoring, collecting and analyzing its own statistical, analytical, and diagnostic data. In this way, you can conduct network management by exception - that is, you are notified only if a problem occurs. Management by exception is unlike traditional SNMP management, in which the management software collects all data from the device through polling.

SmartAgent software works autonomously and reports to the network management station whenever an exceptional network event occurs. The software can also take direct action without involving the management station. Devices that contain SmartAgent software may be able to:

• Perform broadcast throttling to minimize the flow of broadcast traffic on your network
• Monitor the ratio of good frames to bad frames
• Switch a resilient link pair to the standby path if the primary path corrupts frames
• Report if traffic on vital segments drops below minimum usage levels
• Disable a port for five seconds to clear problems and then automatically reconnect it

The Transcend NCS applications LANsentry Manager and Traffix Manager make RMON data that the SmartAgent software collects more usable by summarizing and correlating important information.
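The difference between polling and management by exception can be shown with a short Python sketch. It is a conceptual illustration only and does not represent SmartAgent's actual logic. The paired rising and falling thresholds are loosely modeled on the way RMON alarms are commonly described, and the sample values and threshold levels are hypothetical. The agent examines every sample locally but reports only threshold crossings.

```python
def exception_events(samples, rising, falling):
    """Return (sample index, kind) events for threshold crossings only."""
    events = []
    armed = True  # a new "rising" event is allowed only after a "falling" one
    for index, value in enumerate(samples):
        if armed and value >= rising:
            events.append((index, "rising"))
            armed = False
        elif not armed and value <= falling:
            events.append((index, "falling"))
            armed = True
    return events


# Hypothetical broadcast-rate samples taken by the agent at fixed intervals.
samples = [10, 35, 60, 70, 40, 15, 80]
print(exception_events(samples, rising=50, falling=20))
```

With polling, the management station would fetch all seven samples. Here only the three crossings travel across the network, which is the reduction in management traffic described above.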

Other Commonly Used Tools

These commonly used tools can also help you troubleshoot your network:

• Network software, such as Ping, Telnet, and FTP and TFTP. You can use these applications to troubleshoot, configure, and upgrade your system.

• Network monitoring devices, such as Analyzers and Probes
• Tools, such as Cable Testers, for working on physical problems

Ping

Packet Internet Groper (Ping) allows you to quickly verify the connectivity of your network devices. Ping attempts to transmit a packet from one device to a station on the network and listens for the response to ensure that it was correctly received. You can validate connections on the parts of your network by pinging different devices:

• A successful response indicates that a valid network path exists between your station and the remote host and that the remote host is active.

• Slower response times than normal can indicate that the path is congested or obstructed.

• A failed response indicates that a connection is broken somewhere; use the message to help locate the problem. See Tips on Interpreting Ping Messages.

Strategies for Using Ping

Follow these strategies for using Ping:

• Ping devices when your network is operating normally so that you have a performance baseline for comparison. See Identifying Your Network's Normal Behavior for more information.

• Ping by IP address when:
  • You want to test devices on different subnetworks. This method allows you to Ping your network segments in an organized way, rather than having to remember all the hostnames and locations.
  • Your Domain Name System (DNS) server is down and your system cannot look up host names properly. You can Ping with IP addresses even if you cannot access hostname information.

• Ping by hostname when you want to identify DNS server problems.
• To troubleshoot problems that involve large packet sizes, Ping the remote host repeatedly, increasing the packet size each time.
• To determine if a link is erratic, perform a continuous Ping (using ping -s on UNIX), which indicates the time that it takes the device to respond to each Ping.

• To determine a route taken to a destination, use the trace route function (tracert).

• Consider creating a Ping script that periodically sends a Ping to all necessary networking devices. If a Ping failure message is received, the script can perform some action to notify you of the problem, such as paging you.

• Use the Ping functions of your network management platform. For example, in your HP OpenView map, select a device and click the right mouse button to gain access to ping functions.
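The periodic Ping script suggested above might look like the following Python sketch. It is a minimal illustration under stated assumptions: the function names, the 60-second interval, and the use of a notify callback in place of a pager are all hypothetical. The ping and notify functions are passed in as arguments so that you can try the logic without sending any traffic.

```python
import platform
import subprocess
import time


def system_ping(host, timeout_seconds=5):
    """Send one echo request with the operating system's ping; True on success."""
    count_flag = "-n" if platform.system().lower() == "windows" else "-c"
    try:
        result = subprocess.run(
            ["ping", count_flag, "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_seconds,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    # Caveat: exit codes differ between ping implementations, so verify this
    # check on your own platform before relying on it.
    return result.returncode == 0


def sweep(hosts, ping, notify):
    """Ping every host once and call notify(host) for each one that fails."""
    failed = [host for host in hosts if not ping(host)]
    for host in failed:
        notify(host)
    return failed


def monitor(hosts, ping, notify, interval_seconds=60.0, rounds=None, sleep=time.sleep):
    """Repeat sweep() every interval; rounds=None runs until interrupted."""
    completed = 0
    while rounds is None or completed < rounds:
        sweep(hosts, ping, notify)
        completed += 1
        if rounds is None or completed < rounds:
            sleep(interval_seconds)
```

A typical call would be monitor(["your-router", "your-server"], system_ping, print), replacing print with whatever sends your page or e-mail message.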

Tips on Interpreting Ping Messages

Use the following ping failure messages to troubleshoot problems:

No reply from <destination>

Indicates that the destination routes are available, but that there is a problem with the destination itself.

<destination> is unreachable

Indicates that your system does not know how to get to the destination. This message means either that routing information to a different subnetwork is unavailable or that a device on the same subnetwork is down.

ICMP host unreachable from gateway

Indicates that your system can transmit to the target address using a gateway, but that the gateway cannot forward the packet properly because either a device is misconfigured or the gateway is not operating.
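If you collect Ping output in a log, the three interpretations above can be applied automatically. The Python sketch below matches only the three message forms listed in this section. The exact wording that a given ping implementation prints varies by operating system, so treat the patterns and the function name as assumptions to adapt.

```python
import re

# Each entry pairs a pattern for one message with the diagnosis given above.
DIAGNOSES = [
    (re.compile(r"^ICMP host unreachable from gateway", re.IGNORECASE),
     "gateway cannot forward: misconfigured device or gateway not operating"),
    (re.compile(r"^No reply from\s+\S+", re.IGNORECASE),
     "routes are available: problem is with the destination itself"),
    (re.compile(r"\bis unreachable\b", re.IGNORECASE),
     "no known path: routing information missing or local device down"),
]


def interpret_ping_message(message):
    """Return the diagnosis for a ping failure message, or None if unrecognized."""
    text = message.strip()
    for pattern, diagnosis in DIAGNOSES:
        if pattern.search(text):
            return diagnosis
    return None


# 192.0.2.10 is a placeholder documentation address.
print(interpret_ping_message("No reply from 192.0.2.10"))
```

Returning None for unrecognized text keeps successful replies and unfamiliar messages from being mistaken for one of the three known failures.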

Telnet

Telnet, which is a login and terminal emulation program for Transmission Control Protocol/Internet Protocol (TCP/IP) networks, is a common way to communicate with an individual device. You log in to the device (a remote host) and use that remote device as if it were a local terminal.

If you have established an out-of-band Telnet connection with a device, you can use Telnet to communicate with that device even if the network is unavailable. This feature makes Telnet one of the most frequently used network troubleshooting tools. Usually, all device statistics and configuration capabilities are accessible by using Telnet to connect to the device's console. For more information about setting up an out-of-band connection, see Using Telnet, Serial Line, and Modem Connections.

You can invoke the Telnet application on your local system and set up a link to a Telnet process that is running on a remote host. You can then run a program that is located on a remote host as if you were working at the remote system.

FTP and TFTP

Most network devices support either the File Transfer Protocol (FTP) or the Trivial File Transfer Protocol (TFTP) for downloading updates of system software. Updating system software is often the solution to networking problems that are related to agent problems. Also, new software features may help correct a networking problem.

FTP provides flexibility and security for file transfer by:

• Accepting many file formats, such as ASCII and binary
• Using data compression
• Providing Read and Write access so that you can display, create, and delete files and directories
• Providing password protection

TFTP is a simple version of FTP that does not list directories or require passwords. TFTP only transfers files to and from a remote server.

Analyzers

An analyzer, which is often called a Sniffer, is a network device that collects network data on the segment to which it is attached, a process called packet capturing. Software on the device analyzes this data, a process referred to as protocol analysis. Most analyzers can interpret different types of protocol traffic, such as TCP/IP, AppleTalk, and Banyan VINES traffic.

You usually use analyzers for reactive troubleshooting: when you see a problem somewhere on your network, you attach an analyzer to capture and interpret the data from that area. Analyzers are particularly helpful for identifying intermittent problems. For example, if your network backbone has experienced moments of instability that prevent users from logging on to the network, you can attach an analyzer to the backbone to capture the intermittent problems when they happen again.

Probes

Like an analyzer, a probe is a network device that collects network data. Depending on its type, a probe can collect data from multiple segments simultaneously. It stores the collected data and transfers the data to an analysis site when requested. Unlike an analyzer, a probe does not interpret data.

A probe can be either a stand-alone device or an agent in a network device. The Transcend Enterprise Monitor 500 series and the SuperStack® II Monitor series are stand-alone RMON probes. LANsentry Manager and Traffix Manager use data from probes that comply with the RMON MIB or the RMON2 MIB.

You can use a probe daily to determine the health of your network. The Transcend NCS applications can interpret and report this data, alerting you to possible problems so that you can proactively manage your network. For example, an RMON2 probe can help you to analyze traffic patterns on your network. Use this data to make decisions about reconfiguring devices and end stations as needed.

Cable Testers

Cable testers examine the electrical characteristics of the wiring. They are most commonly used to ensure that building wiring and cables meet Category 5, 4, and 3 standards. For example, network technologies such as Fast Ethernet require the cabling to meet Category 5 requirements. Testers are also used to find defective and broken wiring in a building.

Knowing Your Network

You can better troubleshoot the problems on your network by:

• Knowing Your Network's Configuration
• Identifying Your Network's Normal Behavior

Knowing Your Network's Configuration

Part of understanding your network is knowing its physical and logical configuration. You should know:

• Which devices are on your network
• How the devices are configured
• Which devices are attached to the backbone
• Which devices connect your network to the outside world (WAN)

To keep track of your network's configuration, gather the following information:

• Site Network Map
• Logical Connections
• Device Configuration Information
• Other Important Data About Your Network

This data, when kept up-to-date, is extremely helpful for locating information when you experience network or device problems.

Site Network Map

A network map helps you to:

• Know exactly where each device is physically located
• Easily identify the users and applications that are affected by a problem
• Systematically search each part of your network for problems

You can create a network map using any drawing or flow chart application. Store your network map online. In addition, make sure that you always have a current version on paper in case you cannot access the online version. Figure 8 shows an example of a network map of 3Com devices.

Figure 8 Example of a Site Network Map

Consider including the following information on your network map:

• Location of important devices and workgroups (by floor, building, or area)
• Location of the network backbone, data center, and wiring closets, as appropriate for your network
• Location of your network management stations
• Location and type of remote connections
• IP subnetwork addresses for all managed switches and hubs
• Other subnetwork addresses, such as Novell IPX and AppleTalk, if appropriate for your network
• Type of media (by actual name, such as 10BASE-T, or by grouping, such as Ethernet), which you can show with callouts, colors, line weights, or line styles
• Virtual workgroups, which you can show with colors or shaded areas
• Redundant links, which you can indicate with gray or dashed lines
• Types of network applications that are used in different areas of your network
• Types of end stations that are connected to the switches and hubs

Logical Connections

With the advent of virtual LANs (VLANs), you need to know how your devices are connected logically as well as physically. For example, if you have connected two devices through the same physical switch, you might assume that they can communicate with each other. However, the devices can be in separate VLANs that restrict their communication.

Knowing the setup of your VLANs can help you to quickly narrow the scope of a problem to a VLAN instead of to a network connection.

The Transcend NCS application Enterprise VLAN Manager allows you to view the logical makeup of your network. Depending on the complexity of your network and VLAN configurations, you can use colors to show the VLANs graphically on your network map.

Device Configuration Information

Maintain online and paper copies of device configuration information. Make sure that all online data is stored with your site's regular data backup. If your site does not have a backup system, copy the information onto a backup disc (CD, Zip disk, and the like) and store it offsite.

Follow these guidelines for saving configuration information:

• Because the easiest way to recover a device's configuration is to use FTP or TFTP, save the configuration settings of each device that supports this method of uploading.

• For other devices, Telnet in and save the session (which contains configuration details) to a file. If you cannot print the configuration of a device, then create a quick rebuild guide that explains the quickest way to configure the device from a fresh install.

• For devices that store information to diskette, store this data as part of your site's regular backup.

• For routers and other important devices with text configuration files, store this data online in a revision control system. Keep the most recent version on paper. Keep previous versions.

• For PCs, keep a recovery disk for each type of PC. For any device that you use as a server, store all startup scripts and copies of registries.

Other Important Data About Your Network

For a complete picture of your network, have the following information available:

• All passwords - Store passwords in a safe place. Keep previous passwords in case you restore a device to a previous software version and need to use the old password that was valid for that version.

• Device inventory - The inventory allows you to see the device type, IP address, ports, MAC addresses, and attached devices at a glance. Software tools such as Transcend Central can help you keep track of the 3Com devices on your network. Using Transcend Central, you can group devices by type and location and have this information on hand for troubleshooting.

• MAC address-to-port number list - If your hubs or switches are not managed, you must keep a list of the MAC addresses that correlate to the ports on your hubs and switches. Generate and keep a paper copy of this list, which is crucial for deciphering captured packets using Address Tracker.

• Log book - Document your interactions, no matter how trivial, with each device that is critical to your network's operation (that is, routers, remote access devices, and security servers). For example, document that you noticed a fan making noise one morning. Your note may help you to identify why a device is over temperature a week later (because the fan stopped working).

• Change control - Maintain a change control system for all critical systems. Permanently store change control records.

• Contact details - Store, online and on paper, the details of all support contracts, support numbers, engineer details, and telephone and fax numbers.

• LANsentry Reporter - Use LANsentry Reporter to generate reports from the database.

Identifying Your Network's Normal Behavior

By monitoring your network over a long period, you begin to understand its normal behavior. You begin to see a pattern in the traffic flow, such as which servers are typically accessed, when peak usage times occur, and so on. If you are familiar with your network when it is fully operational, you can be more effective at troubleshooting problems that arise.

Baselining Your Network

You can use a baseline analysis, which is an important indicator of overall network health, to identify problems. A baseline can serve as a useful reference of network traffic during normal operation, which you can then compare to captured network traffic while you troubleshoot network problems. A baseline analysis speeds the process of isolating network problems.

By running tests on a healthy network, you compile normal data to compare against the results that you get when your network is in trouble. For example, ping each node to discover how long it typically takes you to receive a response from devices on your network.
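The ping example can be sketched in a few lines of Python. The sketch assumes you have already recorded round-trip times (in milliseconds) for a node while the network was healthy; the function names and the rule of flagging anything more than three standard deviations above the mean are illustrative choices, not something this guide prescribes.

```python
from statistics import mean, stdev


def build_baseline(samples_ms):
    """Summarize round-trip times collected while the network was healthy."""
    # stdev() needs at least two samples.
    return {"mean": mean(samples_ms), "stdev": stdev(samples_ms)}


def is_abnormal(rtt_ms, baseline, k=3.0):
    """Flag a response time that is far above the healthy baseline.

    The k-standard-deviations rule is an assumption made for this sketch.
    """
    return rtt_ms > baseline["mean"] + k * baseline["stdev"]


if __name__ == "__main__":
    healthy = [10, 12, 11, 9, 13]  # hypothetical ping times in ms
    baseline = build_baseline(healthy)
    print(is_abnormal(12, baseline), is_abnormal(40, baseline))
```

Keeping one baseline per node lets you compare a response time measured during an incident against that node's normal behavior.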

Applications such as Status Watch, Address Tracker, LANsentry Manager, and Traffix Manager allow you to collect days and weeks of data and set a baseline for comparison. Through the reporting mechanisms in the following list, you can continuously assess the data from your network and ensure that its performance is optimal:

• Web Reporter generates daily or weekly reports from data collected by Status Watch.

• Traffix Manager generates weekly reports from collected data and calculates the baselines for you. Set up Utilization History and Error History reports with data resolution set to Weekly.

• LANsentry Manager History View generates daily utilization graphs, which are sampled every 30 minutes for each day over one week. Use these graphs to calculate your network baselines manually.

Identifying Background Noise

Know your network's background noise so that you can recognize real data flow. For example, one evening after everyone is gone, no backups are running, and most nodes are on, analyze the traffic on your network using the Traffix Manager application. The traffic that you see is mostly broadcast and multicast packets. Any errors that you see are the result of faulty devices (trace). This traffic is the background noise of your network - traffic that occurs for little value. If background noise is high, redesign your network.
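If you export packet counts from an off-hours capture, you can put a number on this. The sketch below is illustrative only: the dictionary keys and the function name are assumptions, and it simply reports what share of the captured packets were broadcast or multicast.

```python
def background_noise_share(packet_counts):
    """Return the fraction of packets that are broadcast or multicast.

    packet_counts is a hypothetical mapping such as
    {"unicast": 50, "broadcast": 30, "multicast": 20}.
    """
    total = sum(packet_counts.values())
    if total == 0:
        return 0.0  # nothing was captured
    noise = packet_counts.get("broadcast", 0) + packet_counts.get("multicast", 0)
    return noise / total


if __name__ == "__main__":
    idle_capture = {"unicast": 50, "broadcast": 30, "multicast": 20}
    print(background_noise_share(idle_capture))
```

Recording this figure along with the absolute packet counts gives you a reference point for later captures.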

FDDI Connectivity

Use these sections to identify and correct connectivity errors on an FDDI ring:

• FDDI Connectivity Overview
• Monitoring FDDI Connections
• Making Your FDDI Connections More Resilient

Fiber Distributed Data Interface (FDDI), which is a self-correcting technology, automatically corrects ring faults to maintain connectivity throughout most of the network. However, you should monitor your FDDI connections for wrapped rings and other problems with ring connectivity.

Understanding the Problem

As shown in Figure 9, in a thru FDDI LAN, no stations on the trunk ring have a Configuration State (SMTConfigurationState) of Wrap or Isolated. However, users who complain about network performance may have lost connectivity to other stations on the network because the FDDI network is wrapped or segmented.

Figure 9 Thru Ring

Wrapped ring

By monitoring the Peer Wrap Condition, you can see when the Configuration State changes. In a wrapped ring (Figure 10), two stations on the LAN are in a wrapped Configuration State. This condition may or may not affect the connectivity of certain stations. Although operational, your network may have a cabling problem or a problem with a link.

Figure 10 Wrapped LAN

Segmented ring

In a segmented ring (Figure 11), more than two stations are wrapped on the trunk ring. Although this mode of operation is a valid FDDI LAN configuration, your LAN is probably experiencing a degraded or degrading condition.

Figure 11 Segmented Ring
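The thru, wrapped, and segmented cases can be summarized as a small classification over the Configuration State that each trunk ring station reports. This is a sketch under assumptions: the function name and the state strings are hypothetical, a single wrapped station is treated as a wrapped ring, and a ring with isolated stations but no wrap points is reported as indeterminate because the text does not cover that case.

```python
def classify_ring(config_states):
    """Classify an FDDI trunk ring from its stations' Configuration States.

    config_states is a hypothetical list of strings such as
    ["Thru", "Wrap", "Isolated"], one entry per trunk ring station.
    """
    states = [s.strip().lower() for s in config_states]
    wrapped = states.count("wrap")
    isolated = states.count("isolated")
    if wrapped == 0 and isolated == 0:
        return "thru"        # no station is Wrap or Isolated
    if wrapped > 2:
        return "segmented"   # more than two wrap points
    if wrapped >= 1:
        return "wrapped"     # normally exactly two wrap points
    return "indeterminate"   # isolated stations only; not covered by the text


if __name__ == "__main__":
    print(classify_ring(["Thru", "Wrap", "Thru", "Wrap"]))
```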

When a network connection has excessively high link errors, Station Management (SMT) shuts down the connection and tries to reconnect. A dual-attachment trunk ring station with an A or B connection that is shut down is one of the wrap points in the network. See Making Your FDDI Connections More Resilient for information about keeping a dual-attachment station connection from wrapping.

Isolated station

Sometimes a network wraps a particular station out of the ring. Stations on either side of a problem station can be wrapped. This effectively isolates the station or links that have problems, as shown in Figure 12.

Figure 12 Wrapped Ring with Isolated Station

If a ring was already wrapped when a network wraps a station out of the ring, then a segmented ring results, as shown in Figure 13.

Figure 13 Segmented Ring with Isolated Stations

Twisted ring

In a twisted ring, an A port is connected to an A port and a B port is connected to a B port, instead of the normal A-to-B connections. A twisted ring, which always has two twist points (stations), can exist in either a Thru or Wrap state. You can monitor the Twisted Ring Condition and Undesired Connection Attempt Event for evidence of twisted ring and other connection problems.

Identifying the Problem

To identify the problem, follow this process:

1. At the FDDI LAN level, verify that your network is operating.

If the network is operating, the FDDI ring may be segmented, and therefore an FDDI station or an Ethernet station on an Ethernet link may have lost connectivity to other nodes on the network.

2. Determine if a ring is in a Thru, Wrap, or Segmented state.

If the FDDI ring is segmented or wrapped, look for a problem with a link somewhere in the network or for a nonfunctioning node on your trunk ring. If the ring is operating and is not segmented, or if it is segmented but you still have connectivity to the stations in question, move to a more specific level in your network.

See Monitoring FDDI Connections for more information.

3. Determine if the poorly performing station is an Ethernet or FDDI station.

If the problem is an FDDI station, find out if it is congested (that is, if the station is so busy that it cannot accept all the network traffic that is directed to it) by determining its Bandwidth Utilization. Also determine if the station has a high frame error rate by looking at the FDDI Ring Errors.

If the problem is an Ethernet station, look for congestion by examining Ethernet Packet Loss and Bandwidth Utilization.

Solving the Problem

Identify the station that is causing the disconnection and take the appropriate steps:

• If the disconnection is caused by a wrapped ring, then fix the hardware or cabling problem at that station.

• If the station is congested, you have a device problem rather than a network problem. For example, if the congested station is a file server and every other machine on the network is retrieving and saving files using that server, consider upgrading your server or adding additional servers to the network. A variety of devices from different vendors may be communicating on an FDDI or Ethernet network; some are faster and more capable, and some are slower and more prone to congestion.

• If the station is an Ethernet station that is attached to an Ethernet segment, reevaluate the setup of your Ethernet network and make some changes to improve its performance.

You can also make FDDI connections more resilient by implementing dual homing or installing an Optical Bypass Unit (OBU) where FDDI connections are prone to fail. See Making Your FDDI Connections More Resilient for more information.

Monitoring FDDI Connections

Monitor your FDDI devices for Warning or Critical alerts in the FDDI Status tool.

Status Watch

Use Status Watch to identify these FDDI connectivity errors:

• Peer Wrap Condition
• Twisted Ring Condition
• Undesired Connection Attempt Event

Follow these steps:

1. In the Device area, select the device that is located where you suspect an FDDI ring connectivity problem.

2. Monitor the FDDI Status tool for the currently selected device.

Here are some pointers for monitoring:

• If the Peer Wrap Configuration State variable is Isolated, the device is not connected to the FDDI trunk ring. If you intend the device to remain isolated, this indication is not a serious condition. However, if the device is supposed to be connected on a trunk ring, a serious problem may exist: the device is no longer transmitting packets to the larger trunk ring.

• If the Peer Wrap flag (SMTPeerWrapFlag) is set, the device is one of the wrap points. The cause of the wrapped ring is somewhere in the portion of the network between the two stations that report the peer wrap condition.

Making Your FDDI Connections More Resilient

When devices are removed from an FDDI ring, there is a break in the fiber path that causes the ring to wrap until the ring is made whole again. To prevent the break in the FDDI connection, you can implement dual homing or install an Optical Bypass Unit (OBU).

Implementing Dual Homing

When the operation of a dual attachment node is critical to your network, dual homing adds reliability by providing a backup connection if the primary link fails. Because a dual attachment station (DAS) has two attachments to the FDDI ring (A-to-M and B-to-M), you can use one of them as a standby link if the active link fails. Using dual homing, only one of the two attachments is active at a time. In this sense, a DAS acts as if it is a single attachment station (SAS) by using its A port as the standby link.

Through SMT, a DAS can be dual homed to the same concentrator or, more commonly, to two concentrators. This arrangement provides a more stable trunk ring of concentrators. If one concentrator fails, the DAS enables the standby link to another concentrator to become the active link. See Figure 14.

If the station is a dual path or dual path/dual MAC station, you can configure the dual-homed station in one of two ways:

• With both links active
• With one link active and one connection withheld as a backup, only becoming active when one link fails

Figure 14 Dual Homing Configuration

Installing an Optical Bypass Unit

You can insert an Optical Bypass Unit (OBU) into the FDDI ring as if it were a node and then plug your device into it. To use an OBU, your device needs an optical bypass interface. This interface lets the bypass know whether your device is still on the ring or not. See Figure 15.

If your device is removed or if it fails, the bypass unit diverts the optical path away from your device, keeping the ring whole. You can use a bypass on devices that are prone to failure or are likely to be removed often, such as diagnostic equipment.

Figure 15 Optical Bypass Unit Configuration

FDDI Connectivity Reference

This section explains terms that are relevant to FDDI connectivity and provides additional conceptual and problem analysis detail.

Peer Wrap Condition

A Peer Wrap (wrapped ring) condition occurs when a dual-attachment station detects a fault (often a lost connection) and reconfigures the network by wrapping the dual trunk rings to form a single ring. Normally, the two stations that are adjacent to the fault wrap to maintain full connectivity. However, if a second fault occurs before the first is repaired, the network partitions itself into two or more rings and stations lose connectivity.

When a station reports a Peer Wrap condition, locate and repair the problem that caused the station to wrap the rings. Potential causes include:

• Faulty FDDI port hardware
• Faulty cables or connectors
• Unplugged connectors
• Powered-down stations

You can expect to find the cause of the problem between the two stations that report the Peer Wrap condition.

Twisted Ring Condition

A Twisted Ring condition occurs when certain undesirable connection types exist. See Table 7 for more information. Although similar to the Undesired Connection Attempt, the Twisted Ring condition provides specific Station Management (SMT) and port information for diagnosis.

Undesired Connection Attempt Event

An Undesired Connection Attempt event occurs when a port tries to connect to another port of a type that may result in an undesirable network topology. Whether the connection attempt is successful depends on the current setting of the station's connection policies.

Table 7 lists connections that the FDDI standard defines as undesirable. The managed devices may or may not permit these connections, depending on their FDDI station configurations.

Table 7 Undesirable Connection Types

Connection Type    Reason That the Connection Is Undesirable
A-A                Twisted primary and secondary rings
A-S                A wrapped ring
B-B                Twisted primary and secondary rings
B-S                A wrapped ring
S-A                A wrapped ring
S-B                A wrapped ring
M-M                A tree of rings topology (illegal connection)

Table 8 lists FDDI connections that create valid topologies.

Table 8 Valid Connection Types

Connection Type    Reason That the Connection Is Valid
A-B                A normal trunk peer connection
A-M                A tree connection with possible redundancy. In a single MAC node, Port B has precedence (by default) for connecting to a Port M
B-A                A normal trunk ring peer connection
B-M                A tree connection with possible redundancy. In a single MAC node, Port B has precedence (by default) for connecting to a Port M
S-S                A single ring of two slave stations
S-M                A normal tree connection
M-A                A tree connection that provides possible redundancy
M-B                A tree connection that provides possible redundancy
M-S                A normal tree connection
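Tables 7 and 8 together cover all sixteen pairings of port types A, B, S, and M, so they can be expressed as a lookup. The following is a minimal sketch: the dictionary contents come from the two tables, while the function name and return format are assumptions made for illustration.

```python
# Connection types from Table 7 (undesirable) and Table 8 (valid).
UNDESIRABLE = {
    "A-A": "Twisted primary and secondary rings",
    "A-S": "A wrapped ring",
    "B-B": "Twisted primary and secondary rings",
    "B-S": "A wrapped ring",
    "S-A": "A wrapped ring",
    "S-B": "A wrapped ring",
    "M-M": "A tree of rings topology (illegal connection)",
}
VALID = {
    "A-B": "A normal trunk peer connection",
    "A-M": "A tree connection with possible redundancy",
    "B-A": "A normal trunk ring peer connection",
    "B-M": "A tree connection with possible redundancy",
    "S-S": "A single ring of two slave stations",
    "S-M": "A normal tree connection",
    "M-A": "A tree connection that provides possible redundancy",
    "M-B": "A tree connection that provides possible redundancy",
    "M-S": "A normal tree connection",
}


def check_connection(local_port, remote_port):
    """Return ("valid" or "undesirable", reason) for a pair of port types."""
    key = f"{local_port.upper()}-{remote_port.upper()}"
    if key in UNDESIRABLE:
        return ("undesirable", UNDESIRABLE[key])
    if key in VALID:
        return ("valid", VALID[key])
    raise ValueError(f"unknown port types: {key}")


if __name__ == "__main__":
    print(check_connection("A", "A"))
    print(check_connection("A", "B"))
```

Whether a device actually permits an undesirable connection still depends on its connection policies, as noted above.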

Disabling the Offending Interface

Because broadcast storms can ultimately cause your whole network to become unavailable, take action immediately to disable the offending interface. You can enable the interface again after you have corrected the problem.

Address Tracker

Use Address Tracker to locate the interface that is causing the broadcast storm. Use Device View to disable the port.

Follow these steps:

1. In the Find Address window, enter the address of the interface that seems to be receiving the broadcast traffic.

You can copy the MAC or IP address from the Status Watch report and paste it into Address Tracker's Enter the Address You Want to Find field.

2. Click Find Now.

Search displays the device name.

3. Use Transcend Central to launch Device View and disable the port.

Disabling the port stops the broadcast storm before it interferes with all vital network traffic. You can re-enable this interface later using Device View or the device's console.

Correcting Spanning Tree Misconfigurations

Spanning Tree does not cause broadcast storms, but a loop in your Spanning Tree topology can create data that looks like a storm. A loop can occur in your topology if:

• Someone disables Spanning Tree on a port
• You set up your Spanning Tree configuration incorrectly

Device View

Use Device View to disable any Spanning Tree port that has a repeater attached to it and to correct Spanning Tree misconfigurations.

To correct Spanning Tree misconfigurations, use Device View to disable Spanning Tree Protocol (STP) for a port on a SuperStack® II Switch 1000, Switch 3000, Switch 3000 10/100, Switch 9000SX, Desktop Switch, LinkBuilder® FMS II Bridge/Management Module, or CoreBuilder 6000.

To disable the STP port state for a port on a SuperStack II switch:

1. Select a port and click the right mouse button.

2. From the shortcut menu, select Configure.

3. In the Port section, click the STP tab.

4. From the STP Port State list box, select Disabled.

5. Click Apply.

To disable the STP port state for a port on a LinkBuilder FMS II Bridge/Management Module:

1. Double-click the module.

2. From the shortcut menu, select Configure Bridge.

3. In the Port section, click the STP tab.

Broadcast Storms Reference

This section explains terms that are relevant to broadcast storms and provides additional conceptual and problem analysis detail.

Broadcast Packets

Broadcast packets, which are a normal part of network operation, are transmitted by a device to a broadcast address. For example, IP networks use broadcasts to resolve network addresses using Address Resolution Protocol (ARP). IPX networks use a large number of broadcast packets to operate most effectively.

Problems arise when broadcast packets endlessly propagate throughout the network, which increases the traffic volume on your network and the CPU time that each host spends processing and discarding unwanted broadcast packets.

Multicast Packets

Multicast packets, which are a normal part of network operation, are transmitted by a device to a multicast group address. Hosts that want to receive the packets indicate that they want to be members of the multicast group, and then multicast packets are distributed to that group. For example, multicast packets support the Spanning Tree Protocol. Multicast applications and underlying multicast protocols control multimedia traffic and shield hosts from processing unnecessary broadcast traffic. However, multicast traffic can also cause storms that saturate your network.

Duplicate Addresses

Use these sections to identify and correct problems caused by duplicate MAC and IP addresses:

• Duplicate Addresses Overview
• Finding Duplicate MAC Addresses
• Finding Duplicate IP Addresses

See Duplicate Addresses Reference for additional conceptual and problem analysis detail.

Networks sometimes generate duplicate MAC and IP addresses. Because duplicate addresses can cause problems with packet delivery, resolve them as soon as possible.

Understanding the Problem

Duplicate MAC addresses are caused by data link layer problems with Fiber Distributed Data Interface (FDDI) media and the passing of tokens on the FDDI ring. Duplicate IP addresses are caused by network layer problems. See these sections for more information about causes of duplicate addresses:

• Duplicate MAC Addresses
• Duplicate IP Addresses

Identifying the Problem

Identify duplicate MAC and IP addresses by following the instructions in these sections:

• Finding Duplicate MAC Addresses
• Finding Duplicate IP Addresses

Solving the Problem

Identify the cause of the duplicate address (such as user error or a hardware problem) and fix the problem if possible.

Finding Duplicate MAC Addresses

To find out if duplicate MAC addresses are occurring, monitor your network using Status Watch.

Status Watch

The Status Watch FDDI Status tool identifies duplicate FDDI MAC addresses, and Status Watch reports when two or more MACs on the same ring have the same MAC address (a Duplicate Address condition).

Follow these steps:

1. In the Status Watch Summary View window, determine if any FDDI Status conditions are reported. If there are, double-click the table cell value to display the Device List window.

Another approach is to examine only the devices that you know reside on your FDDI ring. In the Status Watch main window, red device icons indicate that a threshold has been exceeded.

2. Select a device.

• If you selected the device from the Device List window, the real-time report for that device appears in the Status Watch main window.

• If you selected the device from the main window, also select the FDDI Status tool to view the real-time report.

3. Determine if a Duplicate Address condition caused the FDDI Status tool to trigger a Critical or Warning status for that device.

In Status Watch, you can specify the status severity level to apply to a Duplicate Address condition.

Finding Duplicate IP Addresses

To find out if duplicate IP addresses are occurring, monitor your network using these applications:

• Address Tracker - To find duplicate IP addresses on 3Com devices and their attached networks

• LANsentry® Manager - To find duplicate IP addresses that are collected by probes gathering RMON2 SmartAgent® data from the Enterprise Communications Analysis Module (ECAM) downloaded on your network devices

Address Tracker

Use Address Tracker to determine when and where duplicate IP addresses occur.

Follow these steps:

1. From the Find Address menu, select Find Duplicate IP Addresses.

2. Click Find Now to start your search.

LANsentry Manager

Use the Duplicates table in LANsentry Manager to compile a list of all stations with duplicate IP addresses. This table is available only on probes that have downloaded RMON2 (ECAM) SmartAgent software.

Follow these steps:

1. From the LANsentry Manager Address Map menu, select Duplicates. Address Map data is displayed as a table.

2. To export the contents of the table, click Export to launch the Data Export dialog box.

Duplicate Addresses Reference

This section explains terms that are relevant to duplicate addresses and provides additional conceptual and problem analysis detail.

Duplicate MAC Addresses

Each device on your network has a unique MAC address. This address identifies a single device on the network, allowing packets to be delivered to correct destinations.

Packets are delivered to their destinations by means of the MAC-address-to-IP-address translation that the Address Resolution Protocol (ARP) provides. Therefore, if MAC addresses are duplicated on the network, ARP caches of routing devices contain erroneous destinations. In FDDI, devices monitor network traffic, searching for their own MAC address in each packet to determine whether to decode the packet. If MAC addresses are not unique, two stations cannot be distinguished from each other.

Duplicate MAC addresses can occur for the following reasons:

• Someone has manually configured a MAC address for a device instead of using the address that the vendor supplied or allowing it to be assigned dynamically, and this address is also assigned to a different device.

• In rare circumstances, loops in a bridged network can cause a MAC hardware problem or an address learning problem that creates a duplicate MAC address entry in the bridging address table.

• On DECnet Phase 4 networks, MAC addresses are set from the DECnet address. A duplicate NET address can cause a duplicate MAC address.

Duplicate IP Addresses

Because IP addresses are critical for transmission of packets on TCP/IP networks, resolve duplicate IP addresses immediately.

Duplicate IP addresses can occur when someone has configured an IP address that is identical to an IP address that is assigned to a different device. Address assignments, although possible for you to configure manually, are usually made using one of these protocols:

• Dynamic Host Configuration Protocol (DHCP) - Allows your network to dynamically assign IP addresses to nodes. With this protocol, a DHCP server temporarily assigns an IP address to a node, or you can statically configure addresses as needed.

• Bootstrap Protocol (BootP) - Allows you to statically assign IP addresses to nodes. This protocol is more efficient than RARP.

• Reverse ARP (RARP) - Allows you to statically assign IP addresses to nodes. However, because this protocol relies on the MAC address to identify the node, you cannot use it on networks that dynamically assign hardware addresses.
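If you keep a device inventory or can export an address table, a short script can flag candidate duplicates before users notice delivery problems. The sketch below is illustrative only: it assumes a hypothetical list of (IP address, MAC address) pairs and reports every IP address that appears with more than one MAC address. It is not how Address Tracker or LANsentry Manager work internally.

```python
from collections import defaultdict


def find_duplicate_ips(inventory):
    """Return IP addresses that are claimed by more than one MAC address.

    inventory is a hypothetical iterable of (ip, mac) string pairs, for
    example exported from a device inventory or an ARP table.
    """
    macs_by_ip = defaultdict(set)
    for ip, mac in inventory:
        macs_by_ip[ip].add(mac.lower())  # normalize case before comparing
    return {ip: sorted(macs) for ip, macs in macs_by_ip.items() if len(macs) > 1}


if __name__ == "__main__":
    sample = [
        ("192.0.2.10", "00:A0:24:11:22:33"),
        ("192.0.2.11", "00:a0:24:44:55:66"),
        ("192.0.2.10", "00:a0:24:77:88:99"),  # same IP, different MAC
    ]
    print(find_duplicate_ips(sample))
```

Swapping the roles of the two fields gives the corresponding check for a MAC address that appears under more than one IP address.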


Ethernet Packet Loss

Use these sections to identify and correct Ethernet packet loss:

• Ethernet Packet Loss Overview
• Searching for Packet Loss

See Ethernet Packet Loss Reference for additional conceptual and problem analysis detail.

If your Ethernet network shows signs of congestion, it may be experiencing packet loss. When your network is congested, utilization is usually high, packets are discarded because buffers are full, and collision rates are up. Problems related to Collisions are often at the heart of packet loss.

Understanding the Problem

Collisions are normal in Ethernet networks. In many cases, Collision rates of 50 percent do not cause a large decrease in throughput. The Collision rate helps mark the upper limit on your network (the maximum percentage of collisions that your network can bear), which is usually around 70 percent. If Collisions increase above this upper limit, your network can become unreliable.

When the Collision rates increase, so do Excessive Collisions, which cause a delay in transmitting data. An increase in Collisions also indicates that network utilization and network errors, such as FCS Errors, are probably increasing.

The real packet problems to watch for, however, are undetected collisions that show up as Late Collisions.

Identifying the Problem

To determine whether your network's problem is related to packet loss, verify that frames are being dropped on your network by examining this packet loss data:

• Alignment Errors
• Collisions
• Excessive Collisions
• FCS Errors
• CRC Errors
• Late Collisions
• Receive Discards
• Too Long Errors
• Too Short Errors
• Transmit Discards

The process of identifying the problem is discussed in Searching for Packet Loss.

Solving the Problem

If you notice that packet loss data is consistently high, then your network is too congested. In this case, segment your network with the appropriate network device (such as a switch or router). If Collision data shows increases but your network's utilization is the same, then your network may have a physical problem, such as cabling that is too long. Other problems that packet loss data can indicate include:

• Faulty connectors or improper cabling
• Excessive numbers of repeaters between network devices
• Defective Ethernet transceivers or controllers

Possible solutions to these problems are explained in the procedures in Searching for Packet Loss.
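The distinction drawn in this section between congestion and a physical fault can be sketched in Python as follows. The sample format, the function name classify_packet_loss, and both thresholds (loss_threshold and flat_tolerance) are assumptions chosen for illustration only; this document does not define numeric values for "consistently high" or "the same".

```python
def classify_packet_loss(samples, loss_threshold=0.01, flat_tolerance=0.05):
    """Classify a trend from oldest-to-newest samples.

    Each sample is a dict (hypothetical format) with:
      'utilization' - fraction of bandwidth in use (0..1)
      'collisions'  - collisions counted in the interval
      'loss_rate'   - fraction of frames dropped in the interval (0..1)
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples to see a trend")
    # Consistently high packet loss points to congestion.
    if all(s["loss_rate"] >= loss_threshold for s in samples):
        return "congested: consider segmenting the network"
    first, last = samples[0], samples[-1]
    collisions_rising = last["collisions"] > first["collisions"]
    utilization_flat = (
        abs(last["utilization"] - first["utilization"]) <= flat_tolerance
    )
    # Rising collisions with unchanged utilization points to a physical fault.
    if collisions_rising and utilization_flat:
        return "possible physical problem: check cabling, connectors, repeaters"
    return "no clear indication"
```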

Searching for Packet Loss

When you look for packet loss, use the following applications:

• Status Watch - For Ethernet and MIB-II data collection using SNMP polling
• LANsentry Manager Network Statistics Graph - For RMON data collection using an RMON probe
• Device View - On a per-device basis, you can evaluate statistics for any port on the device

Status Watch

Status Watch monitors:

• Alignment Errors
• Excessive Collisions
• FCS Errors
• Receive Discards
• Transmit Discards

Follow these steps:

1. Determine if the thresholds for the Alignment Errors tool and FCS Errors tool are being exceeded.

Table 16 identifies the problems that this data can indicate and your possible actions. For information about problems related to a nonstandard Ethernet implementation, see Nonstandard Ethernet Problems.

Table 16. Alignment Errors, FCS Errors, and CRC Errors Data

Possible Problem: Faulty cabling
Possible Action: Examine the cable and cable connections for breaks or damage.

Possible Problem: Network noise
Possible Action: Look for improper cabling, faulty cable, faulty network equipment, or cables that are too close to equipment that emits electromagnetic interference (lamps, for example).

Possible Problem: Faulty transceiver
Possible Action: Use an analyzer to identify the problematic transceiver. If necessary, replace the transceiver, network adapter, or station.

Possible Problem: Fault at the transmitting end station
Possible Action:
1. Locate the source of the errors by looking at the module and port statistics.
2. Verify the correct operation of the transceiver or adapter card of the device that is connected to the problem port.
3. If the card appears to be operating correctly, examine the cable and cable connections for breaks or damage.

Possible Problem: Station powering up or down
Possible Action: None required. Early implementations of Ethernet transceivers generate a significant amount of in-band noise when powering up; they frequently cause Alignment Errors and FCS Errors in an otherwise stable network. When powering up, some software drivers for Ethernet controllers also initiate Time Domain Reflectometry (TDR) tests to test the Ethernet media. Network monitors report TDR tests as Alignment Errors and FCS Errors.

Possible Problem: Faulty adapter
Possible Action: Replace the adapter.

2. Determine if the Excessive Collisions tool threshold is being exceeded. Table 17 identifies the problems that this data can indicate and your possible actions.

Table 17. Collisions and Excessive Collisions Data

Possible Problem: Busy network
Possible Action: Use a bridge, router, or switch to reconfigure your network into segments with fewer stations.

Possible Problem: Faulty device (adapter, switch, hub, and the like) that does not listen before broadcasting. This problem increases the incidence of all types of collisions.
Possible Action: Isolate each adapter to see if the problem stops.

Possible Problem: Network loop
Possible Action: Ensure that no redundant connections to the same station have both connections active simultaneously.

3. Determine if the thresholds for the Receive Discards and Transmit Discards tools are being exceeded.

If these errors are high in conjunction with the data that you learned in steps 1 and 2, then your network is overloaded. Segment your network.

LANsentry Manager Network Statistics Graph

Use the LANsentry Manager Network Statistics graph to view data for:

• Collisions
• Late Collisions
• Bandwidth Utilization
• CRC Errors
• Too Long Errors
• Too Short Errors

Follow these steps:

1. Display a Network Statistics graph for the local Ethernet segment on which users have reported poor performance.

This graph shows the most recent trend in Collision rates. If you have set up a History sample, you can also look at the historical trend. If a number of segments are connected by repeaters, examine the graph for each Ethernet segment.

2. Analyze Utilization and Collision rates to determine whether collisions are caused by an overloaded segment or a faulty component.

• If Utilization rates are high - The collisions are probably caused by an overloaded segment. If you have added nodes or new applications to your network, consider reconfiguring the cabling system, using bridges and routers to filter out remote collisions and to keep local traffic on one segment. This action should level the network load.

• If Utilization rates are stable and appear normal - The collisions are probably caused by faulty components. In this case, do the following:

  • If the network consists of repeaters - Compare the Network Statistics graphs for each segment connected to the repeater. Because repeaters repeat traffic across all connected segments (which makes many segments seem like one network), you should see similar levels of traffic on all segments. One segment that shows dissimilar levels of traffic and collisions may indicate faulty hardware. In this case, monitor several collisions to track the source station that is transmitting too soon after collisions, and repair the station. Packets that are transmitted too soon after collisions are unlikely to be valid. See Table 17 for more information about Collisions.

  • On other networks - Determine the segment cable length.

3. Examine the CRC Errors and Late Collisions, which often indicate cabling or component problems.

Table 16 identifies the problems that CRC Errors can indicate and your possible actions. Table 18 identifies the problems that Late Collisions data can indicate and your possible actions.

Table 18. Late Collisions Data

Possible Problem: Cabling problems
• Segment too long
• Failing cable
• Segment not grounded properly (noise)
• Improper termination
• Taps too close (10BASE-5 and 10BASE-2 only)
• Noisy cable
Possible Action: Correct the cabling problem by doing one or more of the following:
• Reduce the segment length
• Replace the cable
• Ground the cable
• Terminate the cable correctly
• Check the taps
• Check for cables too close to equipment that emits electromagnetic interference

Possible Problem: Component problems
• Deaf or partially deaf node
• Failing repeater, transceiver, or controller cards
Possible Action: Correct the component problem by doing one of the following:
• Trace the failing component and replace it
• Replace the NIC or the transceiver

4. Trace Too Short Errors and Too Long Errors to the sender.

These errors often indicate faulty routers or LAN drivers and transceiver problems. Table 19 identifies the problems that this data can indicate and your possible actions.

Possible Problem: A transceiver on your network is adding bits to the packets that are transmitted by the attached station.
Possible Action:
1. Use a network analyzer to identify the problematic transceiver.
2. If necessary, replace the transceiver, network adapter, or station.

Possible Problem: The jabber protection mechanism on a transceiver has failed; it can no longer protect the network from the jabbering produced by the attached station.
Possible Action: Replace the network card.

Possible Problem: Excessive noise on the cable. Note: Some 10/100 Mbps cards that autodetect the network speed may connect to the network at the wrong speed, causing excessive noise.
Possible Action: Check for improper cabling, faulty cable, faulty network equipment, or cables too close to noisy electronic equipment (lamps, for example). If your network card autodetects the network speed and you have ruled out other problems, manually configure the network speed.

Possible Problem: Faulty routers (two different network types are connected and the router is not enforcing proper frame size restrictions)
Possible Action: Notify the manufacturer.

Possible Problem: Faulty LAN driver
Possible Action: Replace the driver.

Possible Problem: A normal condition on a LinkSwitch® 1000, LinkSwitch® 3000, or CoreBuilder 5000 FastModule
Possible Action: If you use maximum-sized 1518-byte Ethernet frames, the device's VLT-enabled ports add a frame tag of 4 bytes, resulting in a misleading Too Long Frame error. These frames are passed successfully but will create the Too Long Frame error message. If you want to eliminate the error message, reduce your Ethernet packet frames by 4 bytes.

Device View

Device View allows you to display a variety of port and device-level statistics relevant to Ethernet packet loss. Table 20 describes these statistics and their use in troubleshooting.

Table 20. Activity and Error Statistics in Device View

Statistics Group: Activity
Description: Displays the total network activity and errors on the selected port.
Use in Troubleshooting: This data shows readable packets, broadcast packets, Collisions, total errors, and runts, which cause Too Short Errors. You can interpret this data in the following ways:
• Runts are often caused by Collisions; however, if the values increase at specific times of the day, you may need to change the network topology to manage the traffic more efficiently (for example, with switches or routers).
• Runts can also be caused by a badly terminated coax cable.
• Large numbers of runts not associated with high levels of collisions can indicate a transmission problem (examine the cable).
• Particularly high numbers of Collisions compared to the total number of readable packets can point to a hardware problem (a bad adapter) or to a data loop.
• A high proportion of Broadcast Packets (>10%) on a heavily utilized network (>50% of available bandwidth) can point to an incorrectly configured bridge or router on the network.

Statistics Group: Errors
Description: Displays the number of frames with errors on the selected port.
Use in Troubleshooting: The significance of errors depends on accompanying errors and prevailing network conditions. See the following error data for more information:
• Alignment Errors: Table 16
• FCS Errors: Table 16
• Too Long Errors: Table 19
• Too Short Errors or runts: Table 19
• Late Collisions: Table 18

To display Activity and Errors statistics for a device or port, follow these steps:

1. Select the required port or device.

2. From the shortcut menu, select Activity or Errors.

The statistics available depend on the type of port or device selected. See Table 20 for troubleshooting information.

Ethernet Packet Loss Reference

This section explains terms that are relevant to Ethernet packet loss and provides additional conceptual and problem analysis detail.

Alignment Errors

An Alignment Error indicates a received frame for which both of the following are true:

• The number of bits received is not an integral multiple of 8 (that is, the frame does not contain a whole number of bytes).

• The frame has a Frame Check Sequence (FCS) error.

Alignment Errors often result from MAC layer packet formation problems, cabling problems that cause corrupted or lost data, and packets that pass through more than two cascaded multiport transceivers. See FCS Errors for more information about interpreting Alignment Errors.


Collisions

Collisions indicate that two devices detect that the network is idle and try to send packets at exactly the same time (within one round-trip delay). Because only one device can transmit at a time, both devices must stop sending and attempt to retransmit. Collisions are detected by the transmitting stations.

The retransmission algorithm helps to ensure that the packets do not retransmit at the same time. However, if the two devices retry at nearly the same time, packets can collide again; the process repeats until either the packets finally pass onto the network without collisions, or 16 consecutive collisions occur and the packets are discarded.
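The retransmission behavior described above can be illustrated with a small Python sketch of truncated binary exponential backoff, the scheme that standard Ethernet uses: after each collision the station waits a random number of slot times, the range of the random wait doubles with each consecutive collision up to a limit, and the frame is discarded after 16 consecutive collisions. The function names and the collides callback are hypothetical and exist only to make the example self-contained.

```python
import random

ATTEMPT_LIMIT = 16   # consecutive collisions after which the frame is discarded
BACKOFF_LIMIT = 10   # the backoff exponent stops growing after this many collisions


def backoff_slots(collision_count, rng=random):
    """Return the slot times to wait after the given consecutive collision.

    Returns None when the frame must be discarded (an Excessive Collision).
    """
    if collision_count >= ATTEMPT_LIMIT:
        return None
    exponent = min(collision_count, BACKOFF_LIMIT)
    return rng.randrange(2 ** exponent)  # uniform in 0 .. 2**exponent - 1


def send_frame(collides, rng=random):
    """Try to send one frame.

    collides(attempt) -> bool reports whether the given attempt (1-based)
    collided. Returns (delivered, total_slots_waited).
    """
    waited = 0
    for attempt in range(1, ATTEMPT_LIMIT + 1):
        if not collides(attempt):
            return True, waited
        slots = backoff_slots(attempt, rng)
        if slots is None:
            return False, waited  # 16th consecutive collision: discard
        waited += slots
    return False, waited
```

Because each station picks its wait independently, two stations that collided are unlikely to choose the same delay again, which is why repeated collisions become less probable as the range widens.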

CRC Errors

A Cyclic Redundancy Check (CRC) Error is an RMON statistic that combines FCS Errors and Alignment Errors. These errors indicate that packets were received with:

• A bad FCS and an integral number of octets (FCS Errors)
• A bad FCS and a non-integral number of octets (Alignment Errors)

CRC Errors can cause an end station to freeze. If a large number of CRC Errors are attributed to a single station on the network, replace the station's network interface board. Typically, a CRC Error rate of more than 1 percent of network traffic is considered excessive.
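The relationship between FCS Errors, Alignment Errors, and the combined CRC Error statistic can be sketched in Python as follows. The function names and the (bit_count, fcs_ok) input format are assumptions for this example; a real probe derives these values in hardware.

```python
def classify_frame(bit_count, fcs_ok):
    """Return 'FCS Error', 'Alignment Error', or None for a received frame.

    bit_count: number of bits received (hypothetical input).
    fcs_ok:    True if the frame passed the FCS check.
    """
    if fcs_ok:
        return None
    # A bad FCS with a whole number of octets is an FCS Error;
    # a bad FCS with leftover bits is an Alignment Error.
    return "FCS Error" if bit_count % 8 == 0 else "Alignment Error"


def crc_error_rate(frames):
    """Fraction of frames counted as CRC Errors (FCS plus Alignment Errors)."""
    frames = list(frames)
    if not frames:
        return 0.0
    errors = sum(1 for bits, ok in frames if classify_frame(bits, ok))
    return errors / len(frames)
```

A result above 0.01 from crc_error_rate corresponds to the 1 percent level that this section describes as excessive.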

Excessive Collisions

Excessive Collisions indicate that 16 consecutive collisions have occurred, usually a sign that the network is becoming congested. For each excessive collision count (that is, after 16 consecutive collisions), a packet is dropped. If you know the normal rate of excessive collisions, then you can determine when the rate of packet loss is affecting your network's performance. See Knowing Your Network's Configuration for more information.

FCS Errors

Frame Check Sequence (FCS) Errors, a type of CRC Error, indicate that frames received by an interface are an integral number of octets long but do not pass the FCS check. The FCS is a mathematical way to ensure that all the frame's bits are correct without having the system examine each bit and compare it to the original. Packets with Alignment Errors also generate FCS Errors.

Both Alignment Errors and FCS Errors can be caused by equipment powering up or down or by interference (noise) on unshielded twisted-pair (10BASE-T) segments. In a network that complies with the Ethernet standard, FCS or Alignment Errors indicate bit errors during a transmission or reception. A very low rate is acceptable. Although Ethernet allows a 1 in 10^8 bit error rate, typical Ethernet performance is 1 in 10^12 or better.

Late Collisions

Late Collisions indicate that two devices have transmitted at the same time, but cabling errors (most commonly, excessive network segment length or repeaters between devices) prevent either transmitting device from detecting a collision. Neither device detects a collision because the time to propagate the signal from one end of the network to the other is longer than the time to put the entire packet on the network. As a result, neither of the devices that cause the late collision senses the other's transmission until the entire packet is on the network.

Although late collisions occur for small packets, the transmitter cannot detect them. As a result, a network suffering measurable Late Collisions for large packets is losing small packets as well.
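The timing argument above can be made concrete with a short Python sketch. It assumes that a sender can notice a collision only while it is still transmitting, so the frame's transmit time must be at least the round-trip delay of the segment. The function names and the 60-microsecond delay used in the usage note are illustrative assumptions, not measured values.

```python
def transmit_time_us(frame_octets, rate_mbps=10.0):
    """Time in microseconds to put a frame of the given size on the wire."""
    return frame_octets * 8 / rate_mbps  # bits / (bits per microsecond)


def sender_can_detect_collision(frame_octets, round_trip_delay_us,
                                rate_mbps=10.0):
    """True if the sender is still transmitting when a collision signal returns."""
    return transmit_time_us(frame_octets, rate_mbps) >= round_trip_delay_us
```

At 10 Mbps, a minimum-sized 64-octet frame takes 51.2 microseconds to send. On a hypothetical over-length segment with a 60-microsecond round-trip delay, sender_can_detect_collision(64, 60.0) is false, while a 1518-octet frame is still being sent when the collision arrives, so the collision is seen late. This matches the observation that a network reporting Late Collisions for large packets is silently losing small ones.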

Nonstandard Ethernet Problems

Table 21 lists the symptoms that typically occur if a system violates the Ethernet standard.

Table 21. Symptoms of Common Ethernet Network Problems

Symptoms: FCS Errors and Alignment Errors increase significantly.
Problem: Network cabling is too long.
Notes: If you use a promiscuous network monitor, the number of Late Collisions reported by stations should correlate with the FCS and Alignment Errors reported by the monitor.

Symptoms: FCS and Alignment Errors increase proportionally with interference (sometimes referred to as noise hits).
Problem: Network segment is noisy.
Notes: Typically observed on a 10BASE-T network segment in a noisy environment. If you use multiple promiscuous monitors, the FCS and Alignment Errors among the monitors will not correlate. If the monitor can track runts, also called Too Short Errors, the number of runt packets should be significantly higher than normal.

Symptoms: FCS and Alignment Errors are much higher than normal.
Problem: Networks do not conform to the access scheme of Carrier Sense Multiple Access with Collision Detect (CSMA/CD).
Notes: Occurs when some implementations of Ethernet in the segment are not entirely compatible with IEEE 802.3 repeaters. Collision fragments linger on the network long enough to collide with retry packets at the minimum interpacket gap (IPG). The IPG is smaller on one side of the repeated network, causing a lost packet. Ethernet controllers cannot receive packets that are separated by 4.7 µs or less. Some controllers cannot sustain receptions of packets separated by as much as 9.6 µs. If runt packets are received one after another and are followed by a collision fragment, Ethernet controllers that cannot sustain reception will lose packets.

Receive Discards

Receive Discards indicate that received packets could not be delivered to a high-layer protocol because of congestion or packet errors.

Too Long Errors

A Too Long Error indicates that a packet is longer than 1518 octets (including FCS octets) but otherwise well formed. Too Long Errors are often caused by a bad transceiver, a malfunction of the jabber protection mechanism on a transceiver, or excessive noise on the cable.

Too Short Errors

A Too Short Error, also called a runt, indicates that a packet is fewer than 64 octets long (including FCS octets) but otherwise well formed.
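The length limits given for Too Long Errors and Too Short Errors can be expressed as a simple check. The Python sketch below assumes that frame lengths (in octets, including the FCS) are already available as a list; the function names are hypothetical.

```python
from collections import Counter

MIN_FRAME_OCTETS = 64     # shorter frames are runts (Too Short Errors)
MAX_FRAME_OCTETS = 1518   # longer frames are Too Long Errors


def classify_length(octets):
    """Return the length error for a frame size, or None if the size is valid."""
    if octets < MIN_FRAME_OCTETS:
        return "Too Short Error"
    if octets > MAX_FRAME_OCTETS:
        return "Too Long Error"
    return None


def length_error_counts(frame_lengths):
    """Count length errors over a sequence of frame sizes."""
    labels = (classify_length(n) for n in frame_lengths)
    return Counter(label for label in labels if label)
```

Note that a maximum-sized 1518-octet frame that gains a 4-byte tag becomes 1522 octets and is counted as a Too Long Error by this check, even though the frame itself may be passed successfully.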

Transmit Discards

Transmit Discards indicate that packets were not transmitted because of network congestion.

FDDI Ring Errors

Use these sections to identify and correct FDDI ring errors:

• FDDI Ring Errors Overview
• Identifying Ring Errors

Fiber Distributed Data Interface (FDDI) often corrects its own problems. However, because FDDI cannot correct all errors (especially those related to hardware problems), you should monitor FDDI errors.

Understanding the Problem

FDDI ring errors that you should monitor include:

• Elasticity Buffer Error Condition
• Frame Error Condition
• Frames Not Copied Condition
• Link Error Condition

Identifying the Problem

First, determine the type of FDDI ring errors and where they are occurring. As with other FDDI problems, identify the upstream and downstream neighbors of the devices that you are monitoring.

Several types of network errors can cause FDDI performance problems. For example, problems with cables or physical connections may result in a link or frame error. Elasticity buffer (EB) errors can also lead to link and frame errors.

FDDI deals with port-related errors as follows:

• The variable PORTlerAlarm is the link error rate (LER) value at which a link connection generates an alarm. When the LER is greater than the alarm setting, Station Management (SMT) sends a Status Report Frame (SRF) to notify you that there is a problem with a port.

The PORTlerAlarm threshold is set lower than the PORTlerCutoff threshold so that you are notified of a problem before the port is actually removed from the ring.

• When link errors reach the threshold defined by the variable PORTlerCutoff, SMT breaks the connection, disabling the PHY that detected the problem. A Link Error Condition is also generated.
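The two-threshold handling of link errors described above can be sketched in Python. The sketch treats the link error rate as a plain fraction of errored bits; the function name and the default threshold values (1e-8 for the alarm and 1e-7 for the cutoff) are illustrative assumptions, so check the values configured on your own devices.

```python
def port_ler_state(ler, ler_alarm=1e-8, ler_cutoff=1e-7):
    """Return 'ok', 'alarm', or 'cutoff' for a measured link error rate.

    ler_alarm and ler_cutoff stand in for PORTlerAlarm and PORTlerCutoff,
    expressed here as error rates (assumed example values).
    """
    if ler_alarm >= ler_cutoff:
        raise ValueError("the alarm threshold must be below the cutoff threshold")
    if ler >= ler_cutoff:
        return "cutoff"   # SMT breaks the connection
    if ler > ler_alarm:
        return "alarm"    # SMT sends a Status Report Frame
    return "ok"
```

Keeping the alarm threshold below the cutoff threshold is what gives you a warning before the port is removed from the ring.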

FDDI deals with MAC-related errors as follows:

• When MAC frame errors reach a certain threshold, a Frame Error Condition is generated. Because the actual error can be further upstream than the immediate connection, the connection remains intact.

• For a large network, the worst-case MACFrameErrorRatio is less than 0.1 percent. However, during network configuration, frame error ratios can reach 50 percent for short periods. When you detect a sustained frame error ratio of more than 0.1 percent, a problem exists between the station that is reporting the condition and the nearest upstream MAC.
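Because short spikes in the frame error ratio are expected during network configuration, a check for the "sustained" condition described above needs to look at several consecutive samples. The Python sketch below is one way to express that idea. The function name, the input format (a sequence of ratios expressed as fractions), and the requirement of three consecutive samples are assumptions for illustration; only the 0.1 percent level comes from this section.

```python
def sustained_frame_errors(ratios, threshold=0.001, min_consecutive=3):
    """True if the frame error ratio stays above the threshold long enough.

    ratios:          frame error ratios as fractions (0.001 is 0.1 percent).
    min_consecutive: consecutive samples required (assumed value).
    """
    run = 0
    for ratio in ratios:
        run = run + 1 if ratio > threshold else 0
        if run >= min_consecutive:
            return True
    return False
```

For example, a single 50 percent sample followed by normal readings does not trigger the check, whereas three successive readings of 0.2 percent do.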

Solving the Problem

To solve problems related to FDDI errors, fix the hardware, cabling, or congestion problem.

Identifying Ring Errors

Use Status Watch to monitor your FDDI devices for Warning or Critical alerts.

Status Watch

Use Status Watch to identify FDDI ring errors.

Follow these steps:

1. Monitor the FDDI Status tool for the currently selected device.

2. Determine whether Status Watch is reporting Elasticity Buffer Errors or a high percentage of Frame Errors, Frames Not Copied, or Link Error Rates for the currently selected device.

FDDI Ring Errors Reference

This section provides additional conceptual and problem analysis detail.

Elasticity Buffer Error Condition

The Elasticity Buffer Error condition occurs when a port's elasticity buffer overflows or underflows. This condition usually indicates that a port's hardware is not operating within the tolerances that the FDDI standard specifies. Look for the problem in the hardware of either the port that is reporting the condition or the immediately adjacent port.

Frame Error Condition

The Frame Error condition occurs when the percentage of frames that contain errors exceeds a preset threshold. When a device is an uplink to FDDI (that is, a device is transmitting onto FDDI), this type of condition indicates that the ring is saturated. The ring is out of buffer space, and packets are being dropped from the device's backbone port.

The problem indicated by the frame errors is usually located between the MAC that reports the condition and its upstream neighbor. Because many physical connections can lie along this path, the MACFrameErrorRatio variable identifies only the two MACs between which the problem is occurring.

Frames Not Copied Condition

The Frames Not Copied condition occurs when the percentage of frames that are dropped because of insufficient buffer space exceeds a preset threshold. This condition indicates that the station is congested and is unable to process frames as quickly as they arrive. To help eliminate congestion:

• Add more capacity to the station
• Reconfigure your network so that end stations that communicate heavily with one another are on the same bridge or switch
• Filter out certain traffic

Link Error Condition

The Link Error condition occurs when a port detects link errors at a rate that exceeds a preset threshold. When the Link Error threshold is exceeded, the station removes itself from the ring and tries to reinsert itself on the ring. This action creates a MAC Neighbor Change Event (which also occurs if a ring wraps).

Link errors may indicate an FDDI PHY hardware problem (such as a faulty transmitter) or a faulty cable or connector. Look for the problem in the portion of the network between the port that is reporting the condition and the first upstream transmitter.

MAC Neighbor Change Event

The MAC Neighbor Change event occurs when a MAC's upstream or downstream neighbor changes.

This event indicates either:

• A network reconfiguration
• Another station that is leaving or joining the ring

Network File Server Timeouts

Use these sections to identify and correct timeouts on network file servers:

• Network File Server Timeout Overview
• Looking for Obvious Errors
• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

See Network File Server Timeouts Reference for additional conceptual and problem analysis detail.

A network file server can time out if your network gets congested or if your server is having problems. Users might have problems downloading data from or to the server, or copying files from or to the server. To help you to understand the troubleshooting process for this type of problem, an EXAMPLE throughout this section follows the symptoms, analysis, and resolution of a typical file server timeout problem.


Understanding the Problem

When users log in, their stations make network file server calls either to determine quotas (if this feature has been enabled) or to mount user home directories. The network file server timeout messages, even when spread across multiple nodes, indicate a problem either with the network or with a server.

EXAMPLE: UNIX users notice that it takes a long time - over 30 seconds in some cases - to log in to any machine. Some machines report network file server timeout messages, but the messages have no obvious pattern and are infrequent. You begin to get a sense of the problem.

Identifying the Problem

First, rule out the obvious causes. Ask these questions:

• Can you access the network file server with Telnet?
• Have any alarms been triggered?
• Are there any new errors?

The process of identifying the problem is developed in Looking for Obvious Errors.

Solving the Problem

To determine the cause, reproduce the fault while you monitor the network. After you know the cause, you can fix the problem.

The solutions to the network file server timeout are identified in these sections:

• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

Looking for Obvious Errors

To look for obvious errors, use these applications:

• Ping and Telnet - To test connectivity to the network file server nodes
• LANsentry Manager Alarms View - To search for triggered alarms
• LANsentry Manager Statistics View - To look for errors
• LANsentry Manager History View - To identify trends


Ping and Telnet

Determine whether you can contact network file server nodes using Ping and Telnet. If the response is extremely slow, then a problem may exist with the connections to the nodes. No delay indicates that the connections are normal, implying that the delay is occurring elsewhere. In this case, use LANsentry® Manager tools to determine whether packets are being lost or ignored.
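If you want to time the connection check rather than judge it by eye, a small script can measure how long a TCP connection to the Telnet port takes. The Python sketch below uses only the standard library; the function name connect_delay and the 5-second timeout are assumptions for illustration, and the script is not a replacement for Ping, which uses ICMP rather than TCP.

```python
import socket
import time


def connect_delay(host, port=23, timeout=5.0):
    """Return the seconds taken to open a TCP connection, or None on failure.

    Port 23 is the standard Telnet port. A None result means the connection
    was refused, timed out, or the host could not be resolved.
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        return None
```

A result of a few milliseconds on a local network suggests that the connection path is normal and the delay lies elsewhere, while a result of several seconds, or None, points to a connectivity problem.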

LANsentry Manager Alarms View

Using the LANsentry Manager Alarms View, you can determine if any configured alarms have been triggered.

Search the Alarms View to see if any MAC events have been logged.

EXAMPLE: MAC events have not been logged for the network to which the UNIX users are attached.

Even though no alarms have occurred, errors may exist. For example, a low rate of background errors may exist just below the alarm threshold. Because RMON alarms are based on maximum and minimum values, they may miss constant, periodic, or low levels of errors.

LANsentry Manager Statistics View

Using the LANsentry Manager Statistics View, you can display a multisegment graph of utilization and error statistics.

Follow these steps:

1. Set up a graph that shows utilization and errors on all your major segments.

2. Determine whether any segments are particularly busy or error prone.

EXAMPLE: You notice that one segment of the UNIX network, HUB3, is reporting Too Long Errors and FCS Errors roughly every second sample. While the amount is not higher than normal, it is currently higher than on any other segment.

LANsentry Manager History View

Using the LANsentry Manager History View, you can display a rolling history table to determine if the errors that you are seeing are new. For example, if you have a history table that runs for 30-minute samples over two days, you can compare the most recent sample to a previous sample, looking for new errors. If your probe has the resources, use a much finer resolution sample stored for a shorter time (every 30 seconds for 2 hours) to more easily spot recent errors.

EXAMPLE: You see that the history table shows that no error rates remained constant throughout the day. However, errors that did occur were on the device HUB3 and were Too Long Errors and FCS Errors.
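The comparison of the most recent history sample against earlier samples can also be done on exported data. The Python sketch below assumes that each sample is a dictionary of error counters for one interval; this format and the function name new_errors are hypothetical and do not describe the LANsentry Manager export format.

```python
def new_errors(history):
    """Return counters whose latest value exceeds every earlier sample.

    history: oldest-to-newest list of dicts mapping a counter name to the
    count recorded in that interval (hypothetical format).
    Returns {counter: (previous_maximum, latest_value)}.
    """
    if not history:
        return {}
    *earlier, latest = history
    result = {}
    for counter, value in latest.items():
        previous_max = max((s.get(counter, 0) for s in earlier), default=0)
        if value > previous_max:
            result[counter] = (previous_max, value)
    return result
```

Counters that appear in the result are candidates for "new" errors; counters that are steady across all samples are left out.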

Reproducing the Fault While Monitoring the Network

Although the RMON View in LANsentry Manager can show error rates and help you to identify the location of the problem, it may not provide enough data to solve the problem. To determine the cause of the problem, reproduce it while you monitor the network by using these applications:

• LANsentry Manager Top-N Graph - To locate a quiet node to use for reproducing the fault

• LANsentry Manager Packet Capture - To capture packets from the hub to which the quiet node is attached

• LANsentry Manager Packet Decode - To analyze the packets to assess network file server traffic and delays

• Address Tracker - To find the location of the problem nodes

EXAMPLE: Using LANsentry Manager, you find a hub on the network with a higher than normal error rate. However, the error rate does not seem high enough to cause login delays of 60 seconds or more.

LANsentry Manager Top-N Graph

Using the Top-N graph in the LANsentry Manager main window, locate a quiet node that has been showing the same problem. Choose a quiet node so that you do not receive excessive traffic when you try to isolate the problem.

EXAMPLE: You see that the node Monolith, which has the same Network File System (NFS) mounts as the other nodes on the network, is quiet. You decide to use this node for reproducing the fault. See Network File System (NFS) Protocol for more information about NFS.

LANsentry Manager Packet Capture

Using the LANsentry Manager Packet Capture application, capture packets from the network using predefined patterns and start-and-stop conditions.

Follow these steps:

1. Set up a capture buffer on a probe that is connected to the same hub as the quiet node. Until you know more about the problem, set a very general filter.

EXAMPLE: You select a MAC-layer filter and set a conversation filter to capture all packets to and from Monolith.

2. Telnet into and log out from the quiet node. Then reset the capture buffer. Repeat this procedure until you see the problem reflected in your captured data. To keep the buffer information clean, reset the buffer each time that you repeat the procedure.

3. When you see the delay, note the rough value of the packet count on the LANsentry Manager packet buffer.

By noting the packet count at which you think the delay has occurred, you can narrow the problem to within about 20 packets in the buffer. If you have used an extremely quiet node, you may even identify the exact packet.

LANsentry Manager Packet Decode

The LANsentry Manager Packet Decode application decodes all major protocols and displays the packet contents at three levels of detail: summary information, header information, and actual packet content.

Follow these steps:

1. Open the buffer in the Packet Decode application and locate the number of the packet at which the delay occurred.

2. Select the packet and launch a MAC-layer conversation filter. In the filter display, look for a gap in the conversation (that is, where the node sent a request and then resent it after an interval that approximately matches the delay you experienced when recreating the problem).

3. Repeat the test to determine if the result concentrates on one node or if it appears on other nodes.

EXAMPLE: On the quiet node that you selected, the delay is obvious. You see an NFS request going out to a node and a repeat of the request 30 seconds later. During that time, the node did not respond. You now know that the delay occurred because nodes were not seeing responses for NFS requests. When you repeat the test on other nodes, you find that the delay is happening with more than one destination node.
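The search for a gap in a conversation can be automated once the decoded packets are available as data. The Python sketch below assumes a simplified record of (timestamp in seconds, kind, request identifier), where kind is either "request" or "reply"; this format, the function name find_retry_gaps, and the 1-second minimum gap are assumptions for illustration and do not describe the Packet Decode output.

```python
def find_retry_gaps(packets, min_gap=1.0):
    """Find requests that were resent without an intervening reply.

    packets: time-ordered list of (timestamp_s, kind, request_id) tuples,
             where kind is 'request' or 'reply' (hypothetical format).
    Returns a list of (request_id, first_sent, resent_at) tuples.
    """
    pending = {}   # request_id -> time of the most recent unanswered request
    gaps = []
    for timestamp, kind, request_id in packets:
        if kind == "reply":
            pending.pop(request_id, None)   # request answered: no gap
        elif request_id in pending:
            if timestamp - pending[request_id] >= min_gap:
                gaps.append((request_id, pending[request_id], timestamp))
            pending[request_id] = timestamp
        else:
            pending[request_id] = timestamp
    return gaps
```

In the EXAMPLE above, a request sent at one point in the capture and repeated 30 seconds later with no reply in between would be reported as a single gap of 30 seconds.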


Address Tracker

Use Address Tracker, which polls managed devices, to determine the hubs to which the problem nodes are attached. If the problem end stations are located on unmanaged devices, you can at least narrow the problem to those unmanaged devices.

EXAMPLE: Although your network does not have managed hubs that Transcend® NCS management software can poll, it does have managed switches. When it polls the switches, Address Tracker displays the switch ports on which addresses were last seen. This information indicates the hub (but not the hub port) on which the device is located.

LANsentry Manager Packet Decode

After you know the location of the hub that has the problem node, monitor the problem from the hub using LANsentry Manager Packet Decode.

Follow these steps:

1. To capture packets from one of the nodes on the hub, set up another capture buffer and repeat the exercise that is described in LANsentry Manager Packet Capture. Because a delay may occur on a different node, use two capture buffers without stopping the first one.

Note the rough packet count where the delay appears.

2. Display a conversation filter of the packet where the delay appears and look for the gap in the conversation.

EXAMPLE: You hope that the nodes are on the same hub. You find that all the nodes are on HUB3. This result indicates that FCS Errors may be causing the timeouts. However, because the errors occur at a low rate, you decide to verify this diagnosis. You monitor the problem from the hub, logging in and out many times, and the delay eventually occurs. This time, the delay shows that the node's reply had an FCS Error, even though the node received the request. The switch would not have transmitted this packet, causing a timeout on the NFS protocol. The retry time is presumably 30 seconds. During this test, you see the problem occurring on another node.

Correcting the Fault


Without a managed hub, you may find it very difficult to discover network file server timeout errors. To find the problematic node, you must either systematically isolate nodes by monitoring each node for a prolonged period or temporarily insert a managed hub.

EXAMPLE: You notice that the captured error packet failed FCS because it was corrupted by a regular pattern during transmission. A possible reason for this occurrence is a jabbering node. This explanation makes sense because FCS/Jabber frames increased linearly when you were monitoring the live network.
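To make the FCS failure concrete, here is a minimal Python sketch of a frame check. Ethernet's frame check sequence is a CRC-32, and Python's zlib.crc32 uses the same polynomial, so the sketch appends a CRC-32 to some bytes and then shows that overwriting part of the frame with a regular pattern makes the check fail. The payload and helper names are invented for the example, and a real frame would also include addresses and a type field.

```python
import struct
import zlib

def append_fcs(frame: bytes) -> bytes:
    """Append a CRC-32 frame check sequence, least significant byte first."""
    return frame + struct.pack("<I", zlib.crc32(frame) & 0xFFFFFFFF)

def fcs_ok(frame_with_fcs: bytes) -> bool:
    """Recompute the CRC-32 over the frame body and compare it with the trailer."""
    body, trailer = frame_with_fcs[:-4], frame_with_fcs[-4:]
    return struct.pack("<I", zlib.crc32(body) & 0xFFFFFFFF) == trailer

good = append_fcs(b"example NFS reply payload")
# Overwrite four bytes with a repeating pattern, as a jabbering node might.
bad = good[:8] + b"\xAA\xAA\xAA\xAA" + good[12:]

print("intact frame passes FCS:", fcs_ok(good))
print("corrupted frame passes FCS:", fcs_ok(bad))
```

A receiver that sees the second case discards the frame, which is why the NFS client in the example never sees the reply and retries after its timeout.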

Network File Server Timeouts Reference

This section explains terms that are relevant to network file server timeouts and provides additional conceptual and problem analysis detail.

Jabbering

A condition in which a node transmits illegal-length packets and is possibly not operating within carrier specifications. In effect, another node has written bad data over a valid packet. This bad data is often interpreted as a repeated sequence of data.

Network File System (NFS) Protocol

A distributed file system protocol developed by Sun Microsystems that allows a computer system to access files over a network as if they were on its local disks. This protocol has been incorporated into products by more than 200 companies. It is now a de facto Internet standard.

NFS is one protocol in the NFS suite of protocols, which includes NFS, RPC, XDR (External Data Representation), and others. These protocols are part of a larger architecture that Sun Microsystems refers to as Open Network Computing (ONC). ONC is a distributed applications architecture designed by Sun and currently controlled by a consortium led by Sun.

Troubleshooting TCP/IP - Detailed Steps

This article shows how to troubleshoot TCP/IP connectivity between computers on a Windows network. If you haven't already done so, disable XP's Internet Connection Firewall on all local area network connections and remove all firewall programs on the network. Improperly configured firewalls are the most common cause of TCP/IP problems.


Open a Command Prompt Window

For many of these steps, you'll be typing at the command prompt. To open a command prompt window in Windows 2000 or XP, click Start | Run, type cmd in the box, and click OK. To open a command prompt window in Windows 95, 98, or Me, click Start | Run, type command in the box, and click OK. Type one command per line and press Enter after each one to execute it. To close the command prompt window, use the exit command.

Determine the TCP/IP Settings

Determine the TCP/IP settings of each computer on the local area network. In XP, open the Network Connections folder, right-click the LAN connection, and click Status | Support | Details. For example, here are the Status and Details views for the LAN connection on an Internet Connection Sharing host.

In Windows 95/98/Me, click Start | Run, type winipcfg in the box, and click OK. Select the LAN adapter from the menu and click More Info. Here's the winipcfg view for an ICS client running Windows Me.

You can also see the TCP/IP settings from the command prompt. This is especially convenient if a computer has more than one network adapter. Use the ipconfig /all command, which is available in all versions except Windows 95. The output from this command can be long, so it's best to write it to a file. Specify the file name in the command this way:

ipconfig /all >ipconfig.txt
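Once the output is in a file, the settings can be pulled out with a few lines of Python. The sketch below is an assumption-laden illustration: the sample text is abbreviated and made up in the general shape of Windows XP output, and the parser simply splits each line at the first colon, so it will not handle every layout.

```python
def parse_ipconfig(text: str) -> dict:
    """Collect 'label : value' pairs from ipconfig-style output."""
    settings = {}
    for line in text.splitlines():
        label, sep, value = line.partition(":")
        label = label.strip(" .")   # drop indentation and dot leaders
        value = value.strip()
        if sep and label and value:
            # Keep the first value seen for each label (first adapter wins).
            settings.setdefault(label, value)
    return settings

# Abbreviated, made-up sample; real output has more lines and may differ.
SAMPLE = """\
Windows IP Configuration

Ethernet adapter Local Area Connection:

        DHCP Enabled. . . . . . . . . . . : Yes
        IP Address. . . . . . . . . . . . : 192.168.1.101
        Subnet Mask . . . . . . . . . . . : 255.255.255.0
        Default Gateway . . . . . . . . . : 192.168.1.1
        DNS Servers . . . . . . . . . . . : 192.168.1.1
"""

settings = parse_ipconfig(SAMPLE)
for name in ("IP Address", "Subnet Mask", "Default Gateway"):
    print(f"{name}: {settings[name]}")
```

To use it on the saved file, read ipconfig.txt into a string and pass that string to parse_ipconfig instead of SAMPLE.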

Description of TCP/IP Settings

Here are the TCP/IP settings that are used in network troubleshooting:

• IP Address – Unique address assigned to a network adapter. A computer with multiple network adapters has an IP address for each one, and each one must be in a different subnet.

• Subnet Mask – Used in conjunction with the IP address to determine which subnet an adapter belongs to. At the simplest level, communication is only possible between two network adapters when they're in the same subnet.

• Default Gateway - IP address of a computer or router on one of this computer's local area networks that knows how to communicate with subnets not present on this computer. For an Internet connection, the default gateway is a router belonging to your Internet service provider, and all access to sites on the Internet goes through it. For an ICS client, the default gateway is the ICS host. If you use a hardware router, it serves as the default gateway.

• DHCP Server – If an adapter is configured to obtain an IP address automatically, this is the address of the server that provides it. It could be your ISP, an ICS host, or a hardware router.

• DNS Servers – IP address of one or more Domain Name Server computers. DNS servers translate Internet names to their IP addresses (like 63.146.109.227).
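How the subnet mask decides whether two adapters share a subnet can be checked with Python's standard ipaddress module. This is a small sketch; the function name is made up, and the addresses are the example addresses used later in this article plus one invented address on a different subnet.

```python
import ipaddress

def same_subnet(ip_a: str, ip_b: str, mask: str) -> bool:
    """True when both addresses fall in the same network under the given mask."""
    # strict=False lets ip_network accept a host address and mask off the host bits.
    net_a = ipaddress.ip_network(f"{ip_a}/{mask}", strict=False)
    net_b = ipaddress.ip_network(f"{ip_b}/{mask}", strict=False)
    return net_a == net_b

print(same_subnet("192.168.1.101", "192.168.1.123", "255.255.255.0"))
print(same_subnet("192.168.1.101", "192.168.2.5", "255.255.255.0"))
```

The first pair shares the 192.168.1.0 network and can talk directly; the second pair does not, so traffic between them would have to go through a gateway.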

Subnets

See our article on subnets for a brief description of how they work. For more details, see this Microsoft Knowledge Base article.

If two computers are supposed to be on the same subnet but aren't, something is wrong with the network hardware or software configuration. This is most likely to happen when one of them receives an IP address of 169.254.x.x, which indicates that:

• It's configured to obtain an IP address automatically.
• It couldn't find a DHCP server on the network to make the assignment.
• Windows assigned it an Automatic Private IP Address.

See our article on Specific Networking Problems and Their Solutions for more information.
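A quick way to spot an automatically assigned private address in a list of settings is to test it against the 169.254.0.0/16 range. The helper name below is invented for this sketch.

```python
import ipaddress

APIPA_RANGE = ipaddress.ip_network("169.254.0.0/16")

def is_apipa(address: str) -> bool:
    """True when the address is in the 169.254.x.x automatic private range."""
    return ipaddress.ip_address(address) in APIPA_RANGE

for addr in ("169.254.10.20", "192.168.1.101"):
    status = "no DHCP server answered" if is_apipa(addr) else "looks normally assigned"
    print(addr, "-", status)
```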

Pinging

The ping command is the basic tool for testing TCP/IP connectivity. It sends a special packet (called ICMP Echo) to a particular IP address and looks for a reply. If everything is working right, the reply comes back. If not, the ping times out in a few seconds. By default, the ping command repeats the process four times. Here's an example of an ICS client computer pinging a Windows XP Home Edition ICS host, using the host's IP address and its computer name.

When ping fails, you'll see one of these error messages:

• Request timed out - The IP address is valid, but there's no reply from it. If the IP address is on a local area network, the most likely cause is a firewall program blocking the ping.

• Unknown host <name> or Ping request could not find host <name> - The computer name doesn't exist on the local area network. Make sure that NetBIOS over TCP/IP is enabled.

• Destination host unreachable – The IP address isn't on a local area network, and the default gateway can't access it. Either there's no default gateway, its address is wrong, or it isn't functioning.
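The three messages above can be mapped to their likely causes with a small lookup, sketched here in Python. The function name and the returned hint strings are invented for the example, and the matching is a plain substring search on whatever text ping printed.

```python
def diagnose_ping(output: str) -> str:
    """Map the text printed by ping to the likely cause described above."""
    text = output.lower()
    # Check the error messages first: an unreachable report can also
    # begin with "Reply from", so it must not be mistaken for success.
    if "request timed out" in text:
        return "no reply: on a LAN, suspect a firewall blocking the ping"
    if "unknown host" in text or "could not find host" in text:
        return "name not found: check that NetBIOS over TCP/IP is enabled"
    if "destination host unreachable" in text:
        return "unreachable: check the default gateway setting and status"
    if "reply from" in text:
        return "ok"
    return "unrecognized output"

print(diagnose_ping("Request timed out."))
print(diagnose_ping("Ping request could not find host win98."))
print(diagnose_ping("Reply from 192.168.1.123: bytes=32 time<10ms TTL=128"))
```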


Pinging the Local Area Network

Here is a series of ping commands to use in finding where a problem occurs on a local area network. Run them in the order shown, and don't go on to the next command until all of the previous commands work properly. In this example:

• The computer being tested is named Winxp, with IP address 192.168.1.101.
• There's another computer on the network named Win98, with IP address 192.168.1.123.

Substitute the appropriate IP addresses and computer names for your network.

Command              Target                         What Ping Failure Indicates
ping 127.0.0.1       Loopback address               Corrupted TCP/IP installation
ping localhost       Loopback name                  Corrupted TCP/IP installation
ping 192.168.1.101   This computer's IP address     Corrupted TCP/IP installation
ping winxp           This computer's name           Corrupted TCP/IP installation
ping 192.168.1.123   Another computer's IP address  Bad hardware or NIC driver
ping win98           Another computer's name        NetBIOS name resolution failure

To fix a corrupted TCP/IP installation on Windows XP, follow the steps in this Microsoft Knowledge Base article. For Windows 95/98/Me, uninstall the TCP/IP protocol in Control Panel | Network, reboot, and reinstall it. If that doesn't fix it, use this procedure on Windows 95 or 98.
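The ordered sequence of local pings can be automated. The following Python sketch walks the same six targets and stops at the first failure. It rests on some assumptions: ping_once treats the ping exit code as success or failure, which is only an approximation on some systems, and the function names are invented here. The demonstration at the end uses a simulated ping so that it runs without a network.

```python
import platform
import subprocess
from typing import Callable, List, Optional, Tuple

# (target, what a failure indicates), in the order given in the text.
LAN_STEPS: List[Tuple[str, str]] = [
    ("127.0.0.1", "corrupted TCP/IP installation"),
    ("localhost", "corrupted TCP/IP installation"),
    ("192.168.1.101", "corrupted TCP/IP installation"),
    ("winxp", "corrupted TCP/IP installation"),
    ("192.168.1.123", "bad hardware or NIC driver"),
    ("win98", "NetBIOS name resolution failure"),
]

def ping_once(target: str) -> bool:
    """Send one echo request; the exit code is used as a rough success test."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    result = subprocess.run(["ping", count_flag, "1", target], capture_output=True)
    return result.returncode == 0

def first_failure(steps: List[Tuple[str, str]],
                  ping: Callable[[str], bool] = ping_once) -> Optional[Tuple[str, str]]:
    """Run the steps in order and return the first failing (target, meaning)."""
    for target, meaning in steps:
        if not ping(target):
            return target, meaning
    return None

# Simulated run: pretend only the other computer's IP address fails to answer.
result = first_failure(LAN_STEPS, ping=lambda target: target != "192.168.1.123")
print(result)
```

Calling first_failure(LAN_STEPS) without the ping argument runs the real ping command against each target in turn.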

Pinging the Internet

You can also use ping to find a problem with Internet access. Run these commands in the order shown, and don't go on to the next command until all of the previous commands work properly. Use the Default Gateway and DNS Server addresses that you got from the winipcfg or ipconfig /all command.


Command                 Target               What Ping Failure Indicates
ping w.x.y.z            Default Gateway      Default Gateway down
ping w.x.y.z            DNS Server           DNS Server down
ping w.x.y.z            Web site IP address  Internet service provider or web site down
ping www.something.com  Web site name        DNS Server down or web site down

REFERENCE

WWW.PRACTICALLYNETWORK.COM

WWW.COMPUTERHOPE.COM

WWW.WIKIPEDIA.ORG



• Network Troubleshooting Framework
• Troubleshooting Strategy

Network troubleshooting means recognizing and diagnosing networking problems with the goal of keeping your network running optimally. As a network administrator, your primary concern is maintaining connectivity of all devices (a process often called fault management). You also continually evaluate and improve your network's performance. Because serious networking problems can sometimes begin as performance problems, paying attention to performance can help you address issues before they become serious.

About Connectivity Problems

Connectivity problems occur when end stations cannot communicate with other areas of your local area network (LAN) or wide area network (WAN). Using management tools, you can often fix a connectivity problem before users even notice it. Connectivity problems include:

• Loss of connectivity - When users cannot access areas of your network, your organization's effectiveness is impaired. Immediately correct any connectivity breaks.

• Intermittent connectivity - Although users have access to network resources some of the time, they are still facing periods of downtime. Intermittent connectivity problems can indicate that your network is on the verge of a major break. If connectivity is erratic, investigate the problem immediately.

• Timeout problems - Timeouts cause loss of connectivity but are often associated with poor network performance.

About Performance Problems

Your network has performance problems when it is not operating as effectively as it should. For example, response times may be slow, the network may not be as reliable as usual, and users may be complaining that it takes them longer to do their work. Some performance problems are intermittent, such as instances of duplicate addresses. Other problems can indicate a growing strain on your network, such as consistently high utilization rates.

If you regularly examine your network for performance problems, you can extend the usefulness of your existing network configuration and plan network enhancements, instead of waiting for a performance problem to adversely affect the users' productivity.


Solving Connectivity and Performance Problems

When you troubleshoot your network, you employ tools and knowledge already at your disposal. With an in-depth understanding of your network, you can use network software tools, such as Ping, and network devices, such as Analyzers, to locate problems and then make corrections, such as swapping equipment or reconfiguring segments, based on your analysis.

Transcend® provides another set of tools for network troubleshooting. These tools have graphical user interfaces that make managing and troubleshooting your network easier. With Transcend Applications, you can:

• Baseline your network's normal status to use as a basis for comparison when the network operates abnormally
• Precisely monitor network events
• Be notified immediately of critical problems on your network, such as a device losing connectivity
• Establish alert thresholds to warn you of potential problems that you can correct before they affect your network
• Resolve problems by disabling ports or reconfiguring devices

Network Troubleshooting Framework

The International Organization for Standardization (ISO) Open Systems Interconnection (OSI) reference model is the standard framework for describing network communications. This seven-layer structure provides a clear picture of how network communications work.

Protocols (rules) govern communications between the layers of a single system and among several systems. In this way, devices made by different manufacturers or using different designs can use different protocols and still communicate.

By understanding how network troubleshooting fits into the framework of the OSI model, you can identify at what layer problems are located and which type of troubleshooting tools to use. For example, unreliable packet delivery can be caused by a problem with the transmission media or with a router configuration. If you are receiving high rates of FCS Errors and Alignment Errors, which you can monitor with Status Watch, then the problem is probably located at the physical layer and not the network layer. Figure 1 shows how to troubleshoot the layers of the OSI model.

Table 5 describes the data that the network management tools can collect as it relates to the OSI model layers.


Table 5. Network Data and the OSI Model Layers

Layer          Data Collected                    Transcend NCS Tool Used
Application    Protocol information and other    • LANsentry Manager
Presentation   Remote Monitoring (RMON) and      • Traffix Manager (for more detail)
Session        RMON2 data
Transport

Network        Routing information               • Status Watch
                                                 • LANsentry Manager (for more detail)
                                                 • Traffix Manager (for more detail)

Data Link      Traffic counts and other          • Status Watch
               packet breakdowns                 • LANsentry Manager (for more detail)

Physical       Error counts                      • Status Watch

Figure 1 OSI Reference Model and Network Troubleshooting


Troubleshooting Strategy

How do you know when you are having a network problem? The answer to this question depends on your site's network configuration and on your network's normal behavior. See Knowing Your Network for more information.

If you notice changes on your network, ask the following questions:

• Is the change expected or unusual?
• Has this event ever occurred before?
• Does the change involve a device or network path for which you already have a backup solution in place?
• Does the change interfere with vital network operations?
• Does the change affect one or many devices or network paths?


After you have an idea of how the change is affecting your network, you can categorize it as critical or noncritical. Both of these categories need resolution (except for changes that are one-time occurrences); the difference between the categories is the time that you have to fix the problem.

By using a strategy for network troubleshooting, you can approach a problem methodically and resolve it with minimal disruption to network users. It is also important to have an accurate and detailed map of your current network environment. Beyond that, a good approach to problem resolution is:

• Recognizing Symptoms
• Understanding the Problem
• Identifying and Testing the Cause of the Problem
• Solving the Problem

Recognizing Symptoms

The first step to resolving any problem is to identify and interpret the symptoms. You may discover network problems in several ways. Users may complain that the network seems slow or that they cannot connect to a server. You may pass your network management station and notice that a node icon is red. Your beeper may go off and display the message: WAN connection down.

User Comments

Although you can often solve networking problems before users notice a change in their environment, you invariably get feedback from your users about how the network is running, such as:

• They cannot print.
• They cannot access the application server.
• It takes them much longer to copy files across the network than it usually does.
• They cannot log on to a remote server.
• When they send e-mail to another site, they get a routing error message.
• Their system freezes whenever they try to Telnet.

Network Management Software Alerts

Network management software, as described in Your Network Troubleshooting Toolbox, can alert you to areas of your network that need attention. For example:

• The application displays red (Warning) icons.
• Your weekly Top-N utilization report (which indicates the 10 ports with the highest utilization rates) shows that one port is experiencing much higher utilization levels than normal.
• You receive an e-mail message from your network management station that the threshold for broadcast and multicast packets has been exceeded.

These signs usually provide additional information about the problem, allowing you to focus on the right area.
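The idea behind a Top-N utilization report can be sketched in a few lines of Python. The port names, percentages, and the "twice the baseline" rule below are all invented for the illustration; they are not taken from any Transcend report format.

```python
from typing import Dict, List, Tuple

def top_n_ports(utilization: Dict[str, float], n: int = 10) -> List[Tuple[str, float]]:
    """Return the n ports with the highest utilization, highest first."""
    return sorted(utilization.items(), key=lambda item: item[1], reverse=True)[:n]

def above_baseline(current: Dict[str, float], baseline: Dict[str, float],
                   factor: float = 2.0) -> List[str]:
    """List ports whose utilization exceeds factor times their baseline value."""
    return [port for port, value in current.items()
            if port in baseline and value > factor * baseline[port]]

# Hypothetical weekly averages, in percent utilization.
baseline = {"hub1/1": 12.0, "hub1/2": 8.0, "hub3/4": 10.0}
this_week = {"hub1/1": 14.0, "hub1/2": 9.0, "hub3/4": 47.0}

print(top_n_ports(this_week, n=2))
print(above_baseline(this_week, baseline))
```

Comparing against a recorded baseline is what turns a ranking into a symptom: a port that is always near the top is less interesting than one that has moved sharply away from its normal level.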

Analyzing Symptoms

When a symptom occurs, ask yourself these types of questions to narrow the location of the problem and to get more data for analysis:

• To what degree is the network not acting normally (for example, does it now take one minute to perform a task that normally takes five seconds)?
• On what subnetwork is the user located?
• Is the user trying to reach a server, end station, or printer on the same subnetwork or on a different subnetwork?
• Are many users complaining that the network is operating slowly or that a specific network application is operating slowly?
• Are many users reporting network logon failures?
• Are the problems intermittent? For example, some files may print with no problems, while other printing attempts generate error messages, make users lose their connections, and cause systems to freeze.

Understanding the Problem

Networks are designed to move data from a transmitting device to a receiving device. When communication becomes problematic, you must determine why data are not traveling as expected and then find a solution. The two most common causes for data not moving reliably from source to destination are:

• The physical connection breaks (that is, a cable is unplugged or broken).
• A network device is not working properly and cannot send or receive some or all data.

Network management software can easily locate and report a physical connection break (layer 1 problem). It is more difficult to determine why a network device is not working as expected, which is often related to a layer 2 or a layer 3 problem.

To determine why a network device is not working properly, look first for:

• Valid service - Is the device configured properly for the type of service it is supposed to provide? For example, has Quality of Service (QoS), which is the definition of the transmission parameters, been established?

• Restricted access - Is an end station supposed to be able to connect with a specific device, or is that connection restricted? For example, is a firewall set up that prevents that device from accessing certain network resources?

• Correct configuration - Is there a misconfiguration of IP address, subnet mask, gateway, or broadcast address? Network problems are commonly caused by misconfiguration of newly connected or configured devices. See Manager-to-Agent Communication for more information.

Identifying and Testing the Cause of the Problem

After you develop a theory about the cause of the problem, test your theory. The test must conclusively prove or disprove your theory.

Two general rules of troubleshooting are:

• If you cannot reproduce a problem, then no problem exists unless it happens again on its own.

• If the problem is intermittent and you cannot replicate it, you can configure your network management software to catch the event in progress.

For example, with LANsentry Manager, you can set alarms and automatic packet capture filters to monitor your network and inform you when the problem occurs again. See Configuring Transcend NCS for more information.

Although network management tools can provide a great deal of information about problems and their general location, you may still need to swap equipment or replace components of your network until you locate the exact trouble spot.

After you test your theory, either fix the problem as described in Solving the Problem or develop another theory.

Sample Problem Analysis

This section illustrates the analysis phase of a typical troubleshooting incident.

On your network, a user cannot access the mail server. You need to establish two areas of information:

• What you know - In this case, the user's workstation cannot communicate with the mail server.

• What you do not know and need to test -

    • Can the workstation communicate with the network at all, or is the problem limited to communication with the server? Test by sending a Ping or by connecting to other devices.

    • Is the workstation the only device that is unable to communicate with the server, or do other workstations have the same problem? Test connectivity at other workstations.

    • If other workstations cannot communicate with the server, can they communicate with other network devices? Again, test the connectivity.

The analysis process follows these steps:

1. Can the workstation communicate with any other device on the subnetwork?

• If no, then go to step 2.
• If yes, determine if only the server is unreachable.
    • If only the server cannot be reached, this suggests a server problem. Confirm by doing step 2.
    • If other devices cannot be reached, this suggests a connectivity problem in the network. Confirm by doing step 3.

2. Can other workstations communicate with the server?

• If no, then most likely it is a server problem. Go to step 3.
• If yes, then the problem is that the workstation is not communicating with the subnetwork. (This situation can be caused by workstation issues or a network issue with that specific station.)

3. Can other workstations communicate with other network devices?

• If no, then the problem is likely a network problem.
• If yes, the problem is likely a server problem.

When you determine whether the problem is with the server, subnetwork, or workstation, you can further analyze the problem as follows:

• For a problem with the server - Examine whether the server is running, if it is properly connected to the network, and if it is configured appropriately.

• For a problem with the subnetwork - Examine any device on the path between the users and the server.

• For a problem with the workstation - Examine whether the workstation can access other network resources and if it is configured to communicate with that particular server.
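The three questions in the analysis steps can be written down as a small decision function. The Python sketch below is one reading of those steps, not a definitive procedure: the function name, parameter names, and returned labels are invented for the illustration, and each argument is the yes or no answer to one of the tests.

```python
def locate_problem(workstation_reaches_others: bool,
                   others_reach_server: bool,
                   others_reach_devices: bool) -> str:
    """Suggest where to look first, given the answers to the three tests."""
    # Step 2: if other workstations reach the server, the server and the
    # path to it work, so the fault is on the workstation side.
    if others_reach_server:
        if workstation_reaches_others:
            return "workstation: check its configuration for this server"
        return "workstation: check its connection to the subnetwork"
    # Step 3: nobody reaches the server, so decide between server and network.
    if others_reach_devices:
        return "server"
    return "subnetwork"

# The mail server example: the user's workstation can reach other devices,
# no workstation can reach the server, and the rest of the network responds.
print(locate_problem(True, False, True))
```

The returned label points to one of the three follow-up checks listed above; it narrows the search and still needs to be confirmed by testing.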


Equipment for Testing

To help identify and test the cause of problems, have available:

• A laptop computer that is loaded with a terminal emulator, TCP/IP stack, TFTP server, CD-ROM drive (to read the online documentation), and some key network management applications, such as LANsentry® Manager. With the laptop computer, you can plug into any subnetwork to gather and analyze data about the segment.

• A spare managed hub to swap for any hub that does not have management. Swapping in a managed hub allows you to quickly spot which port is generating the errors.

• A single port probe to insert in the network if you are having a problem where you do not have management capability.

• Console cables for each type of connector, labeled and stored in a secure place.

Solving the Problem

Many device or network problems are straightforward to resolve, but others yield misleading symptoms. If one solution does not work, continue with another.

A solution often involves:

• Upgrading software or hardware (for example, upgrading to a new version of agent software or installing Gigabit Ethernet devices)

• Balancing your network load by analyzing:
  • What users communicate with which servers
  • What the user traffic levels are in different segments

Based on these findings, you can decide how to redistribute network traffic.

• Adding segments to your LAN (for example, adding a new switch where utilization is continually high)

• Replacing faulty equipment (for example, replacing a module that has port problems or replacing a network card that has a faulty jabber protection mechanism)

To help solve problems, have available:

• Spare hardware equipment (such as modules and power supplies), especially for your critical devices

• A recent backup of your device configurations to reload if flash memory gets corrupted (which can sometimes happen due to a power outage)


Your Network Troubleshooting Toolbox

A robust network troubleshooting toolbox consists of items (such as network management applications, hardware devices, and other software) to recognize, diagnose, and solve networking problems. It contains:

• Transcend Applications
• Network Management Platforms
• 3Com SmartAgent Embedded Software
• Other Commonly Used Tools

Transcend management software is optimized for managing 3Com devices and their attached networks. However, some applications, such as LANsentry Manager, can manage any vendor's networking equipment that complies with the Remote Monitoring (RMON) Management Information Base (MIB).

This section describes these Transcend applications, which you can use to troubleshoot your network:

• Transcend Central
• Status Watch
• Address Tracker
• LANsentry Manager
• Traffix Manager
• Device View

Transcend Central

Start with Transcend Central, which is an asset management and device grouping application, to understand what your network consists of and to control the Transcend NCS network management troubleshooting tools. Transcend Central is available as both a native Windows application and a Java application that you can access using a Web browser.

Using Transcend Central for troubleshooting, you can:

• Display an inventory of device, module, and port information.

• Group devices to make your troubleshooting tasks easier. By managing a collection of devices, you can simultaneously perform the same tasks on each device in a group and locate physical or logical problems on your network.

• Launch Transcend NCS applications, including some of your primary Transcend NCS troubleshooting tools:
  • Status Watch, including Web Reporter (from the Java version)
  • Address Tracker
  • LANsentry Manager
  • Traffix Manager
  • Device View

Status Watch

The Status Watch applications manage 3Com devices and their attached networks. Status Watch applications primarily poll for MIB-II data. This is a performance monitoring application that allows you to monitor the operational status of your network devices and quickly identify any problems that require your attention. It works in conjunction with Web Reporter.

Web Reporter

Web Reporter is a data-reporting application that runs in a World Wide Web (WWW) browser. It generates reports from data that Status Watch collects, allowing you to compare network statistics against a baseline.

Address Tracker

Address Tracker is an address collection and discovery application that:

• Polls managed devices for all MAC addresses
• Polls managed devices and routers for IP addresses to perform MAC-to-IP address translation
• Uses Device View to disable troublesome ports

LANsentry Manager

LANsentry Manager is a set of integrated applications that displays and explores the real-time and historical data that RMON-compliant devices (probes) on the network capture. LANsentry Manager uses SNMP polling to gather RMON and RMON2 data from the probes.

Use LANsentry Manager to:

• Monitor current performance of network segments
• See trends over time
• Spot signs of current problems
• Configure alarms to monitor for specific events
• Capture packets and display their contents

LANsentry Manager works with any device (from 3Com or other vendors) that supports the RMON MIB or the RMON2 MIB.


Traffix Manager

Traffix Manager is a performance-monitoring application that provides information about layer 2 (RMON) and layer 3 conversations between nodes. It helps you to assess traffic patterns on your network. Traffix Manager:

• Monitors all the stations that the RMON2-compliant probes encounter on your network
• Captures and stores RMON and RMON2 data for your network's protocols and applications
• Displays traffic between stations in user-defined views of the network
• Graphs current or historical data on the devices selected
• Delivers reports for user-specified stations and time periods as PostScript to your printer or as HTML to your Web server
• Launches LANsentry Manager tools for in-depth analysis of a station or a conversation between stations

You can use Traffix Manager to:

• Know your network - Understand overall flow patterns and interactions between systems, and determine how your network is really being used at the application level.

• Optimize your network - Gain an insight into traffic and application usage trends to help you optimize the use and placement of current network resources and make wise decisions about capacity planning and network growth.

Traffix Manager works with any device (from 3Com or other vendors) that supports the RMON2 MIB.

Device View

The Device View application is a device configuration tool. When you troubleshoot your network, you can use Device View to determine or change a device's configuration. You can also use Device View to look at a device's statistics and to set alarms.

Device View manages only 3Com devices.

Network Management Platforms

As part of your troubleshooting toolbox, your network management platform is the first place to go to view the overall health of your network. With the platform, you can understand the logical configuration of your network and configure views of your network to understand how devices work together and the role that they play in the users' work. The network management platform that supports your Transcend software installation can provide valuable troubleshooting tools. Transcend runs on several platforms within the NT and UNIX environments.

The platform discovers the devices. Transcend imports that information from the platform to populate the core database. Unless you are rediscovering, the user must manually update the platform.

Using this device database, a map displays the graphical representation of your network. Each device on your network appears as a symbol (icon) on the map. You can configure views of your network to show devices on the same subnetworks or floors.

You can monitor network performance and diagnose network performance and connectivity problems. You can also:

• Take a snapshot of your network in its normal state. The snapshot records the state of your network at a particular instant. If you later have network performance problems, you can compare the current state of your network to the snapshot.

• Quickly determine the connectivity status of a device by noting the color of its map symbol. Red usually means that communication with a device has ceased.

• Diagnose connectivity problems by determining whether two devices can communicate. If they can communicate, then examine the route between the devices, the number of packets that were sent and lost, and the roundtrip time between the two devices.

• Manage MIB information (for example, collecting and storing MIB data for trend analysis and graphing) using MIB queries. Transcend compiles MIBs and allows you to navigate up and down the MIB tree to retrieve MIB objects from devices. You can set thresholds for MIB data and generate events when a threshold is exceeded.

• Configure the software to act on certain events. The Event Categories window informs you of any unexpected events (which arrive in the form of traps).

For more information, see the documentation that is shipped with your software.

3Com SmartAgent Embedded Software

Traditional Simple Network Management Protocol (SNMP) management places the burden of collecting network management information on the management station. In this traditional model, software agents collect information about throughput, record errors or packet overflows, and measure performance based on established thresholds. Through a polling process, agents pass this information to a centralized network management station whenever they receive an SNMP query. Management applications then make the data useful and alert the user if there are problems on the device.

As a useful companion to traditional network management methods, 3Com's SmartAgent® technology places management intelligence into the software agent that runs within a 3Com device. This scalable solution reduces the amount of computational load on the management station and helps minimize management-related network traffic.

SmartAgent software, which uses the RMON MIB, is self-monitoring, collecting and analyzing its own statistical, analytical, and diagnostic data. In this way, you can conduct network management by exception - that is, you are notified only if a problem occurs. Management by exception is unlike traditional SNMP management, in which the management software collects all data from the device through polling.

SmartAgent software works autonomously and reports to the network management station whenever an exceptional network event occurs. The software can also take direct action without involving the management station. Devices that contain SmartAgent software may be able to:

• Perform broadcast throttling to minimize the flow of broadcast traffic on your network
• Monitor the ratio of good frames to bad frames
• Switch a resilient link pair to the standby path if the primary path corrupts frames
• Report if traffic on vital segments drops below minimum usage levels
• Disable a port for five seconds to clear problems and then automatically reconnect it

The Transcend NCS applications LANsentry Manager and Traffix Manager make the RMON data that the SmartAgent software collects more usable by summarizing and correlating important information.

Other Commonly Used Tools

These commonly used tools can also help you troubleshoot your network:

• Network software, such as Ping, Telnet, and FTP and TFTP. You can use these applications to troubleshoot, configure, and upgrade your system.
• Network monitoring devices, such as Analyzers and Probes
• Tools, such as Cable Testers, for working on physical problems

Ping

Packet Internet Groper (Ping) allows you to quickly verify the connectivity of your network devices. Ping attempts to transmit a packet from one device to a station on the network and listens for the response to ensure that it was correctly received. You can validate connections on the parts of your network by pinging different devices.

• A successful response indicates that a valid network path exists between your station and the remote host and that the remote host is active.

• Slower response times than normal can indicate that the path is congested or obstructed.

• A failed response indicates that a connection is broken somewhere; use the message to help locate the problem. See Tips on Interpreting Ping Messages.

Strategies for Using Ping

Follow these strategies for using Ping:

• Ping devices when your network is operating normally so that you have a performance baseline for comparison. See Identifying Your Network's Normal Behavior for more information.

• Ping by IP address when:
  • You want to test devices on different subnetworks. This method allows you to Ping your network segments in an organized way, rather than having to remember all the hostnames and locations.
  • Your Domain Name System (DNS) server is down and your system cannot look up host names properly. You can Ping with IP addresses even if you cannot access hostname information.

• Ping by hostname when you want to identify DNS server problems.

• To troubleshoot problems that involve large packet sizes, Ping the remote host repeatedly, increasing the packet size each time.

• To determine if a link is erratic, perform a continuous Ping (using ping -s on UNIX), which indicates the time that it takes the device to respond to each Ping.

• To determine a route taken to a destination, use the trace route function (tracert).

• Consider creating a Ping script that periodically sends a Ping to all necessary networking devices. If a Ping failure message is received, the script can perform some action to notify you of the problem, such as paging you.

• Use the Ping functions of your network management platform. For example, in your HP OpenView map, select a device and click the right mouse button to gain access to ping functions.
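The Ping script suggested in the strategies above could look something like the following Python sketch. It is a minimal illustration, not a finished monitoring tool: the function names, the five-minute interval, and the `notify` callback are all assumptions introduced here. It relies on the operating system's own `ping` command, whose count flag is `-n` on Windows and `-c` on most UNIX-like systems; exit-status conventions vary by platform, so treat the success check as an assumption to verify on your own systems.

```python
import platform
import subprocess
import time


def ping_command(host, count=1):
    """Build the system ping command (count flag: -n on Windows, -c elsewhere)."""
    flag = "-n" if platform.system().lower() == "windows" else "-c"
    return ["ping", flag, str(count), host]


def system_ping(host):
    """Return True if the system ping command exits with status 0."""
    result = subprocess.run(
        ping_command(host),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def failed_hosts(hosts, ping=system_ping):
    """Ping each host once and return the hosts that did not answer."""
    return [host for host in hosts if not ping(host)]


def monitor(hosts, notify, interval_seconds=300, ping=system_ping):
    """Check all hosts repeatedly, calling notify(host) for each failure.

    notify is a hypothetical callback (send mail, page someone, write a log).
    """
    while True:
        for host in failed_hosts(hosts, ping):
            notify(host)
        time.sleep(interval_seconds)
```

Passing the `ping` function as a parameter keeps the checking logic separate from the system call, so you can try the script with a stand-in function before pointing it at real devices.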

Tips on Interpreting Ping Messages

Use the following ping failure messages to troubleshoot problems:

No reply from <destination>

Indicates that the destination routes are available but that there is a problem with the destination itself.

<destination> is unreachable

Indicates that your system does not know how to get to the destination. This message means either that routing information to a different subnetwork is unavailable or that a device on the same subnetwork is down.

ICMP host unreachable from gateway

Indicates that your system can transmit to the target address using a gateway but that the gateway cannot forward the packet properly, because either a device is misconfigured or the gateway is not operating.
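If you collect Ping output in a script, the three failure messages above can be mapped to their likely causes automatically. The sketch below is only an illustration: the function name and the returned hint strings are hypothetical, and the matching assumes the message wording shown above, which differs between operating systems and Ping implementations.

```python
def interpret_ping_failure(message: str) -> str:
    """Map a ping failure message to the likely cause described above."""
    text = message.lower()
    # Test the gateway message first: it also contains "unreachable".
    if "unreachable from gateway" in text:
        return "gateway cannot forward: check for a misconfigured device or a failed gateway"
    if "unreachable" in text:
        return "no known path: check routing information or a down device on the subnetwork"
    if "no reply" in text:
        return "routes are available: check the destination itself"
    return "unrecognized message"
```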

Telnet

Telnet, which is a login and terminal emulation program for Transmission Control Protocol/Internet Protocol (TCP/IP) networks, is a common way to communicate with an individual device. You log in to the device (a remote host) and use that remote device as if it were a local terminal.

If you have established an out-of-band Telnet connection with a device, you can use Telnet to communicate with that device even if the network is unavailable. This feature makes Telnet one of the most frequently used network troubleshooting tools. Usually, all device statistics and configuration capabilities are accessible by using Telnet to connect to the device's console. For more information about setting up an out-of-band connection, see Using Telnet, Serial Line, and Modem Connections.

You can invoke the Telnet application on your local system and set up a link to a Telnet process that is running on a remote host. You can then run a program that is located on a remote host as if you were working at the remote system.

FTP and TFTP

Most network devices support either the File Transfer Protocol (FTP) or the Trivial File Transfer Protocol (TFTP) for downloading updates of system software. Updating system software is often the solution to networking problems that are related to agent problems. Also, new software features may help correct a networking problem.

FTP provides flexibility and security for file transfer by:

• Accepting many file formats, such as ASCII and binary
• Using data compression
• Providing Read and Write access so that you can display, create, and delete files and directories
• Providing password protection

TFTP is a simple version of FTP that does not list directories or require passwords. TFTP only transfers files to and from a remote server.

Analyzers

An analyzer, which is often called a Sniffer, is a network device that collects network data on the segment to which it is attached, a process called packet capturing. Software on the device analyzes this data, which is a process referred to as protocol analysis. Most analyzers can interpret different types of protocol traffic, such as TCP/IP, AppleTalk, and Banyan VINES traffic.

You usually use analyzers for reactive troubleshooting - when you see a problem somewhere on your network, you attach an analyzer to capture and interpret the data from that area. Analyzers are particularly helpful for identifying intermittent problems. For example, if your network backbone has experienced moments of instability that prevent users from logging on to the network, you can attach an analyzer to the backbone to capture the intermittent problems when they happen again.

Probes

Like an analyzer, a probe is a network device that collects network data. Depending on its type, a probe can collect data from multiple segments simultaneously. It stores the collected data and transfers the data to an analysis site when requested. Unlike an analyzer, a probe does not interpret data.

A probe can be either a stand-alone device or an agent in a network device. The Transcend Enterprise Monitor 500 series and the SuperStack® II Monitor series are stand-alone RMON probes. LANsentry Manager and Traffix Manager use data from probes that comply with the RMON MIB or the RMON2 MIB.

You can use a probe daily to determine the health of your network. The Transcend NCS applications can interpret and report this data, alerting you to possible problems so that you can proactively manage your network. For example, an RMON2 probe can help you to analyze traffic patterns on your network. Use this data to make decisions about reconfiguring devices and end stations as needed.

Cable Testers

Cable testers examine the electrical characteristics of the wiring. They are most commonly used to ensure that building wiring and cables meet Category 5, 4, and 3 standards. For example, network technologies such as Fast Ethernet require the cabling to meet Category 5 requirements. Testers are also used to find defective and broken wiring in a building.


Knowing Your Network

You can better troubleshoot the problems on your network by:

• Knowing Your Network's Configuration
• Identifying Your Network's Normal Behavior

Knowing Your Network's Configuration

Part of understanding your network is knowing its physical and logical configuration. You should know:

• Which devices are on your network
• How the devices are configured
• Which devices are attached to the backbone
• Which devices connect your network to the outside world (WAN)

To keep track of your network's configuration, gather the following information:

• Site Network Map
• Logical Connections
• Device Configuration Information
• Other Important Data About Your Network

This data, when kept up-to-date, is extremely helpful for locating information when you experience network or device problems.

Site Network Map

A network map helps you to:

• Know exactly where each device is physically located
• Easily identify the users and applications that are affected by a problem
• Systematically search each part of your network for problems

You can create a network map using any drawing or flow chart application. Store your network map online. In addition, make sure that you always have a current version on paper in case you cannot access the online version. Figure 8 shows an example of a network map of 3Com devices.

Figure 8. Example of a Site Network Map


Consider including the following information on your network map:

• Location of important devices and workgroups (by floor, building, or area)
• Location of the network backbone, data center, and wiring closets, as appropriate for your network
• Location of your network management stations
• Location and type of remote connections
• IP subnetwork addresses for all managed switches and hubs
• Other subnetwork addresses, such as Novell IPX and AppleTalk, if appropriate for your network
• Type of media (by actual name, such as 10BASE-T, or by grouping, such as Ethernet), which you can show with callouts, colors, line weights, or line styles
• Virtual workgroups, which you can show with colors or shaded areas
• Redundant links, which you can indicate with gray or dashed lines
• Types of network applications that are used in different areas of your network
• Types of end stations that are connected to the switches and hubs

Logical Connections

With the advent of virtual LANs (VLANs), you need to know how your devices are connected logically as well as physically. For example, if you have connected two devices through the same physical switch, you might assume that they can communicate with each other. However, the devices can be in separate VLANs that restrict their communication.

Knowing the setup of your VLANs can help you to quickly narrow the scope of a problem to a VLAN instead of to a network connection.

The Transcend NCS application Enterprise VLAN Manager allows you to view the logical makeup of your network. Depending on the complexity of your network and VLAN configurations, you can use colors to show the VLANs graphically on your network map.

Device Configuration Information

Maintain online and paper copies of device configuration information. Make sure that all online data is stored with your site's regular data backup. If your site does not have a backup system, copy the information onto a backup disc (CD, Zip disk, and the like) and store it offsite.

Follow these guidelines for saving configuration information:

• Because the easiest way to recover a device's configuration is to use FTP or TFTP, save the configuration settings of each device that supports this method of uploading.

• For other devices, Telnet in and save the session (which contains configuration details) to a file. If you cannot print the configuration of a device, then create a quick rebuild guide that explains the quickest way to configure the device from a fresh install.

• For devices that store information to diskette, store this data as part of your site's regular backup.

• For routers and other important devices with text configuration files, store this data online in a revision control system. Keep the most recent version on paper. Keep previous versions.

• For PCs, keep a recovery disk for each type of PC. For any device that you use as a server, store all startup scripts and copies of registries.

Other Important Data About Your Network

For a complete picture of your network, have the following information available:

• All passwords - Store passwords in a safe place. Keep previous passwords in case you restore a device to a previous software version and need to use the old password that was valid for that version.

• Device inventory - The inventory allows you to see the device type, IP address, ports, MAC addresses, and attached devices at a glance. Software tools, such as Transcend Central, can help you keep track of the 3Com devices on your network. Using Transcend Central, you can group devices by type and location and have this information on hand for troubleshooting.

• MAC address-to-port number list - If your hubs or switches are not managed, you must keep a list of the MAC addresses that correlate to the ports on your hubs and switches. Generate and keep a paper copy of this list, which is crucial for deciphering captured packets, using Address Tracker.

• Log book - Document your interactions, no matter how trivial, with each device that is critical to your network's operation (that is, routers, remote access devices, security servers). For example, document that you noticed a fan making noise one morning. Your note may help you to identify why a device is over temperature a week later (because the fan stopped working).

• Change control - Maintain a change control system for all critical systems. Permanently store change control records.

• Contact details - Store, online and on paper, the details of all support contracts, support numbers, engineer details, and telephone and fax numbers.

• LANsentry Reporter - Use LANsentry Reporter to generate reports from the database.

Identifying Your Network's Normal Behavior

By monitoring your network over a long period, you begin to understand its normal behavior. You begin to see a pattern in the traffic flow, such as which servers are typically accessed, when peak usage times occur, and so on. If you are familiar with your network when it is fully operational, you can be more effective at troubleshooting problems that arise.

Baselining Your Network

You can use a baseline analysis, which is an important indicator of overall network health, to identify problems. A baseline can serve as a useful reference of network traffic during normal operation, which you can then compare to captured network traffic while you troubleshoot network problems. A baseline analysis speeds the process of isolating network problems.

By running tests on a healthy network, you compile normal data to compare against the results that you get when your network is in trouble. For example, Ping each node to discover how long it typically takes you to receive a response from devices on your network.
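As a rough illustration of comparing a measurement against a baseline, the following Python sketch summarizes Ping round-trip times recorded on a healthy network and flags a later measurement that is unusually slow. The function names, the sample values, and the rule of flagging anything more than three standard deviations above the mean are assumptions made for this example, not a method prescribed by any particular tool.

```python
from statistics import mean, stdev


def build_baseline(samples_ms):
    """Summarize healthy-network round-trip times as (mean, standard deviation).

    Needs at least two samples, because stdev is undefined for one value.
    """
    return mean(samples_ms), stdev(samples_ms)


def is_abnormal(rtt_ms, baseline, tolerance=3.0):
    """Return True if rtt_ms exceeds the baseline mean by more than
    tolerance standard deviations (an assumed rule of thumb)."""
    average, spread = baseline
    return rtt_ms > average + tolerance * spread


# Hypothetical round-trip times (milliseconds) recorded during normal operation.
healthy = [10.0, 12.0, 11.0, 9.0, 13.0]
baseline = build_baseline(healthy)
```

With these sample values, a later reading of 40 ms would be flagged as abnormal, while 12 ms would not.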

Applications such as Status Watch, Address Tracker, LANsentry Manager, and Traffix Manager allow you to collect days and weeks of data and set a baseline for comparison. Through the reporting mechanisms in the following list, you can continuously assess the data from your network and ensure that its performance is optimal:

• Web Reporter generates daily or weekly reports from data collected by Status Watch.

• Traffix Manager generates weekly reports from collected data and calculates the baselines for you. Set up Utilization History and Error History reports with data resolution set to Weekly.

• LANsentry Manager History View generates daily utilization graphs, which are sampled every 30 minutes for each day over one week. Use these graphs to calculate your network baselines manually.

Identifying Background Noise

Know your network's background noise so that you can recognize real data flow. For example, one evening after everyone is gone, no backups are running, and most nodes are on, analyze the traffic on your network using the Traffix Manager application. The traffic that you see is mostly broadcast and multicast packets. Any errors that you see are the result of faulty devices (trace). This traffic is the background noise of your network - traffic that occurs for little value. If background noise is high, redesign your network.


FDDI Connectivity

Use these sections to identify and correct connectivity errors on an FDDI ring:

• FDDI Connectivity Overview
• Monitoring FDDI Connections
• Making Your FDDI Connections More Resilient

Fiber Distributed Data Interface (FDDI), which is a self-correcting technology, automatically corrects ring faults to maintain connectivity throughout most of the network. However, you should monitor your FDDI connections for wrapped rings and other problems with ring connectivity.

Understanding the Problem

As shown in Figure 9, in a thru FDDI LAN, no stations on the trunk ring have a Configuration State (SMTConfigurationState) of Wrap or Isolated. However, users who complain about network performance may have lost connectivity to other stations on the network because the FDDI network is wrapped or segmented.

Figure 9. Thru Ring

Wrapped ring

By monitoring the Peer Wrap Condition, you can see when the Configuration State changes. In a wrapped ring (Figure 10), two stations on the LAN are in a wrapped Configuration State. This condition may or may not affect the connectivity of certain stations. Although operational, your network may have a cabling problem or a problem with a link.

Figure 10. Wrapped LAN

Segmented ring

In a segmented ring (Figure 11), more than two stations are wrapped on the trunk ring. Although this mode of operation is a valid FDDI LAN configuration, your LAN is probably experiencing a degraded or degrading condition.

Figure 11. Segmented Ring
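The thru, wrapped, and segmented conditions described so far differ only in how many trunk ring stations report a wrapped Configuration State, so they are easy to express as a small check. The Python sketch below is illustrative: the function name is hypothetical, and it assumes that you have already collected each station's Configuration State as a text value that begins with "wrap" when the station is wrapped. Combinations that the text does not define are reported as "indeterminate".

```python
def ring_state(station_states):
    """Classify an FDDI trunk ring from its stations' Configuration States.

    station_states: iterable of strings such as "Thru", "Wrap", or
    "Isolated" (the string format is an assumption for this sketch).
    """
    states = [state.lower() for state in station_states]
    wrapped = sum(1 for state in states if state.startswith("wrap"))
    isolated = sum(1 for state in states if state == "isolated")

    if wrapped == 0 and isolated == 0:
        return "thru"        # no station is in Wrap or Isolated
    if wrapped > 2:
        return "segmented"   # more than two stations are wrapped
    if wrapped == 2:
        return "wrapped"     # exactly two wrap points
    return "indeterminate"   # not covered by the definitions above
```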

When a network connection has excessively high link errors, Station Management (SMT) shuts down the connection and tries to reconnect. A dual-attachment trunk ring station with an A or B connection that is shut down is one of the wrap points in the network. See Making Your FDDI Connections More Resilient for information about keeping a dual-attachment station connection from wrapping.


Isolated station

Sometimes a network wraps a particular station out of the ring. Stations on either side of a problem station can be wrapped. This effectively isolates the station or links that have problems, as shown in Figure 12.

Figure 12. Wrapped Ring with Isolated Station

If a ring was already wrapped when a network wraps a station out of the ring, then a segmented ring results, as shown in Figure 13.

Figure 13. Segmented Ring with Isolated Stations

Twisted ring

In a twisted ring, an A port is connected to an A port and a B port is connected to a B port, instead of the normal A-to-B connections. A twisted ring, which always has two twist points (stations), can exist in either a Thru or Wrap state. You can monitor the Twisted Ring Condition and Undesired Connection Attempt Event for evidence of twisted ring and other connection problems.
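If you keep a record of how trunk ring ports are cabled, the twisted ring rule above (A-to-A or B-to-B instead of A-to-B) can be checked mechanically. This Python sketch is a hypothetical illustration: the function names and the list-of-pairs format for recording connections are assumptions, not part of any FDDI management tool.

```python
def is_twisted(local_port: str, remote_port: str) -> bool:
    """Return True for an A-to-A or B-to-B connection on the trunk ring."""
    pair = (local_port.upper(), remote_port.upper())
    return pair in {("A", "A"), ("B", "B")}


def twisted_links(links):
    """Return the (local_port, remote_port) pairs that are twisted.

    links: list of port-type pairs taken from your own cabling records.
    """
    return [link for link in links if is_twisted(*link)]
```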

Identifying the Problem

To identify the problem, follow this process:

1. At the FDDI LAN level, verify that your network is operating.

If the network is operating, the FDDI ring may be segmented, and therefore an FDDI station or an Ethernet station on an Ethernet link may have lost connectivity to other nodes on the network.

2. Determine if a ring is in a Thru, Wrap, or Segmented state.

If the FDDI ring is segmented or wrapped, look for a problem with a link somewhere in the network or for a nonfunctioning node on your trunk ring. If the ring is operating and is not segmented, or if it is segmented but you still have connectivity to the stations in question, move to a more specific level in your network.

See Monitoring FDDI Connections for more information.

3. Determine if the poorly performing station is an Ethernet or FDDI station.

If the problem is an FDDI station, find out if it is congested (that is, if the station is so busy that it cannot accept all the network traffic that is directed to it) by determining its Bandwidth Utilization. Also determine if the station has a high frame error rate by looking at the FDDI Ring Errors.

If the problem is an Ethernet station, look for congestion by examining Ethernet Packet Loss and Bandwidth Utilization.

Solving the Problem

Identify the station that is causing the disconnection and take the appropriate steps:

• If the disconnection is caused by a wrapped ring, fix the hardware or cabling problem at that station.

• If the station is congested, you have a device problem rather than a network problem. For example, if the congested station is a file server and every other machine on the network is retrieving and saving files using that server, consider upgrading your server or adding servers to the network. A variety of devices from different vendors may be communicating on an FDDI or Ethernet network; some are faster and more capable, and some are slower and more prone to congestion.

• If the station is an Ethernet station that is attached to an Ethernet segment, reevaluate the setup of your Ethernet network and make changes to improve its performance.

You can also make FDDI connections more resilient by implementing dual homing or installing an Optical Bypass Unit (OBU) where FDDI connections are prone to fail. See Making Your FDDI Connections More Resilient for more information.

Monitoring FDDI Connections

Monitor your FDDI devices for Warning or Critical alerts in the FDDI Status tool.

Status Watch

Use Status Watch to identify these FDDI connectivity errors:

• Peer Wrap Condition
• Twisted Ring Condition
• Undesired Connection Attempt Event

Follow these steps:

1. In the Device area, select the device that is located where you suspect an FDDI ring connectivity problem.

2. Monitor the FDDI Status tool for the currently selected device.

Here are some pointers for monitoring:

• If the Peer Wrap Configuration State variable is Isolated, the device is not connected to the FDDI trunk ring. If you intend the device to remain isolated, this indication is not a serious condition. However, if the device is supposed to be connected to a trunk ring, a serious problem may exist: the device is no longer transmitting packets to the larger trunk ring.

• If the Peer Wrap flag (SMTPeerWrapFlag) is set, the device is one of the wrap points. The cause of the wrapped ring is somewhere in the portion of the network between the two stations that report the peer wrap condition.

Making Your FDDI Connections More Resilient

When devices are removed from an FDDI ring, the break in the fiber path causes the ring to wrap until the ring is made whole again. To prevent the break in the FDDI connection, you can implement dual homing or install an Optical Bypass Unit (OBU).

Implementing Dual Homing

When the operation of a dual attachment node is critical to your network, dual homing adds reliability by providing a backup connection if the primary link fails. Because a dual attachment station (DAS) has two attachments to the FDDI ring (A-to-M and B-to-M), you can use one of them as a standby link if the active link fails. With dual homing, only one of the two attachments is active at a time. In this sense, a DAS acts as if it were a single attachment station (SAS), using its A port as the standby link.

Through SMT, a DAS can be dual homed to the same concentrator or, more commonly, to two concentrators. This arrangement provides a more stable trunk ring of concentrators. If one concentrator fails, the DAS enables the standby link to the other concentrator to become the active link. See Figure 14.

If the station is a dual path or dual path/dual MAC station, you can configure the dual-homed station in one of two ways:

• With both links active
• With one link active and one connection withheld as a backup, becoming active only when the active link fails
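The one-active-link behavior can be sketched as a small state model. This is only an illustration: the class and method names are hypothetical, and it assumes the default described in this document, where the B port carries traffic and the A port is held as the standby.

```python
class DualHomedStation:
    """Toy model of a dual-homed DAS with one active link at a time.

    Hypothetical names; assumes B is the primary port and A the standby.
    """

    def __init__(self):
        # True means the physical link on that port is usable.
        self.link_up = {"A": True, "B": True}

    def set_link(self, port, is_up):
        if port not in self.link_up:
            raise ValueError(f"unknown port: {port!r}")
        self.link_up[port] = is_up

    def active_port(self):
        """Return the port carrying traffic, or None if both links are down."""
        if self.link_up["B"]:   # primary link is preferred while it is up
            return "B"
        if self.link_up["A"]:   # standby takes over only when B is down
            return "A"
        return None


station = DualHomedStation()
print(station.active_port())    # primary link in use
station.set_link("B", False)    # primary link or its concentrator fails
print(station.active_port())    # standby link takes over
```

A real station makes this decision through SMT connection management; the sketch only captures the order of preference.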

Figure 14 Dual Homing Configuration

Installing an Optical Bypass Unit

You can insert an Optical Bypass Unit (OBU) into the FDDI ring as if it were a node and then plug your device into it. To use an OBU, your device needs an optical bypass interface. This interface lets the bypass unit know whether your device is still on the ring. See Figure 15.

If your device is removed or fails, the bypass unit diverts the optical path away from your device, keeping the ring whole. You can use a bypass unit on devices that are prone to failure or are likely to be removed often, such as diagnostic equipment.

Figure 15 Optical Bypass Unit Configuration

FDDI Connectivity Reference

This section explains terms that are relevant to FDDI connectivity and provides additional conceptual and problem analysis detail.

Peer Wrap Condition

A Peer Wrap (wrapped ring) condition occurs when a dual-attachment station detects a fault (often a lost connection) and reconfigures the network by wrapping the dual trunk rings to form a single ring. Normally, the two stations that are adjacent to the fault wrap to maintain full connectivity. However, if a second fault occurs before the first is repaired, the network partitions itself into two or more rings and stations lose connectivity.

When a station reports a Peer Wrap condition, locate and repair the problem that caused the station to wrap the rings. Potential causes include:

• Faulty FDDI port hardware
• Faulty cables or connectors
• Unplugged connectors
• Powered-down stations

You can expect to find the cause of the problem between the two stations that report the Peer Wrap condition.
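The relationship between faults and ring state can be illustrated with a short sketch. It is a simplification under stated assumptions: stations are numbered 0 to n-1, link i joins station i and station (i + 1) mod n, and only link faults are modeled (a powered-down station is not). The function names are hypothetical.

```python
def ring_state(num_stations, failed_links):
    """Classify a dual trunk ring as 'Thru', 'Wrap', or 'Segmented'."""
    faults = {link % num_stations for link in failed_links}
    if not faults:
        return "Thru"
    if len(faults) == 1:
        return "Wrap"        # the two neighbors of the fault wrap the rings
    return "Segmented"       # a second fault partitions the network


def wrap_points(num_stations, failed_links):
    """Stations adjacent to a failed link; these report Peer Wrap."""
    points = set()
    for link in failed_links:
        link %= num_stations
        points.update({link, (link + 1) % num_stations})
    return sorted(points)


def reachable_groups(num_stations, failed_links):
    """Groups of stations that can still reach each other."""
    faults = sorted({link % num_stations for link in failed_links})
    if len(faults) < 2:
        return [list(range(num_stations))]
    groups = []
    for idx, fault in enumerate(faults):
        station = (fault + 1) % num_stations
        last = faults[(idx + 1) % len(faults)]
        group = [station]
        while station != last:
            station = (station + 1) % num_stations
            group.append(station)
        groups.append(group)
    return groups


print(ring_state(6, []), ring_state(6, [1]), ring_state(6, [1, 4]))
print(wrap_points(6, [1]))           # stations on either side of link 1
print(reachable_groups(6, [1, 4]))   # two isolated rings
```

With one failed link, every station stays reachable and the fault lies between the two wrap points, which matches the guidance above on where to look.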

Twisted Ring Condition

A Twisted Ring condition occurs when certain undesirable connection types exist. See Table 7 for more information. Although similar to the Undesired Connection Attempt, the Twisted Ring condition provides specific Station Management (SMT) and port information for diagnosis.

Undesired Connection Attempt Event

An Undesired Connection Attempt event occurs when a port tries to connect to another port of a type that may result in an undesirable network topology. Whether the connection attempt is successful depends on the current setting of the station's connection policies.

Table 7 lists connections that the FDDI standard defines as undesirable. The managed devices may or may not permit these connections, depending on their FDDI station configurations.

Table 7. Undesirable Connection Types

Connection Type    Reason That the Connection Is Undesirable
A-A                Twisted primary and secondary rings
A-S                A wrapped ring
B-B                Twisted primary and secondary rings
B-S                A wrapped ring
S-A                A wrapped ring
S-B                A wrapped ring
M-M                A tree of rings topology (illegal connection)

Table 8 lists FDDI connections that create valid topologies.

Table 8. Valid Connection Types

Connection Type    Reason That the Connection Is Valid
A-B                A normal trunk peer connection
A-M                A tree connection with possible redundancy. In a single MAC node, Port B has precedence (by default) for connecting to a Port M
B-A                A normal trunk ring peer connection
B-M                A tree connection with possible redundancy. In a single MAC node, Port B has precedence (by default) for connecting to a Port M
S-S                A single ring of two slave stations
S-M                A normal tree connection
M-A                A tree connection that provides possible redundancy
M-B                A tree connection that provides possible redundancy
M-S                A normal tree connection
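Tables 7 and 8 together cover every pairing of the four port types, so they can be expressed as a lookup. The following is a minimal sketch; the function and constant names are hypothetical, and it does not model station connection policies, which decide whether an undesirable connection is actually refused.

```python
# Reasons copied from Table 7 (undesirable connection types).
UNDESIRABLE = {
    ("A", "A"): "Twisted primary and secondary rings",
    ("B", "B"): "Twisted primary and secondary rings",
    ("A", "S"): "A wrapped ring",
    ("B", "S"): "A wrapped ring",
    ("S", "A"): "A wrapped ring",
    ("S", "B"): "A wrapped ring",
    ("M", "M"): "A tree of rings topology (illegal connection)",
}

# Pairs listed in Table 8 (valid connection types).
VALID = {
    ("A", "B"), ("B", "A"),
    ("A", "M"), ("B", "M"), ("M", "A"), ("M", "B"),
    ("S", "S"), ("S", "M"), ("M", "S"),
}


def classify_connection(local_port, remote_port):
    """Return ('valid', None) or ('undesirable', reason) for a port pairing."""
    pair = (local_port.upper(), remote_port.upper())
    if pair in UNDESIRABLE:
        return "undesirable", UNDESIRABLE[pair]
    if pair in VALID:
        return "valid", None
    raise ValueError(f"unknown port types: {pair}")


print(classify_connection("A", "A"))
print(classify_connection("B", "M"))
```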

Disabling the Offending Interface

Because broadcast storms can ultimately cause your whole network to become unavailable, take action immediately to disable the offending interface. You can enable the interface again after you have corrected the problem.

Address Tracker

Use Address Tracker to locate the interface that is causing the broadcast storm. Use Device View to disable the port.

Follow these steps:

1. In the Find Address window, enter the address of the interface that seems to be receiving the broadcast traffic.

You can copy the MAC or IP address from the Status Watch report and paste it into Address Tracker's Enter the Address You Want to Find field.

2. Click Find Now.

Search displays the device name.

3. Use Transcend Central to launch Device View and disable the port.

Disabling the port stops the broadcast storm before it interferes with all vital network traffic. You can re-enable this interface later using Device View or the device's console.

Correcting Spanning Tree Misconfigurations

Spanning Tree does not cause broadcast storms, but a loop in your Spanning Tree topology can create data that looks like a storm. A loop can occur in your topology if:

• Someone disables Spanning Tree on a port
• You set up your Spanning Tree configuration incorrectly

Device View

Use Device View to disable any Spanning Tree port that has a repeater attached to it and to correct Spanning Tree misconfigurations.

To correct Spanning Tree misconfigurations, use Device View to disable Spanning Tree Protocol (STP) for a port on a SuperStack® II Switch 1000, Switch 3000, Switch 3000 10/100, Switch 9000SX, Desktop Switch, LinkBuilder® FMS II Bridge/Management Module, or CoreBuilder 6000.

To disable the STP port state for a port on a SuperStack II switch:

1. Select a port and click the right mouse button.

2. From the shortcut menu, select Configure.

3. In the Port section, click the STP tab.

4. From the STP Port State list box, select Disabled.

5. Click Apply.

To disable the STP port state for a port on a LinkBuilder FMS II Bridge/Management Module:

1. Double-click the module.

2. From the shortcut menu, select Configure Bridge.

3. In the Port section, click the STP tab.

Broadcast Storms Reference

This section explains terms that are relevant to broadcast storms and provides additional conceptual and problem analysis detail.

Broadcast Packets

Broadcast packets, which are a normal part of network operation, are transmitted by a device to a broadcast address. For example, IP networks use broadcasts to resolve network addresses using Address Resolution Protocol (ARP). IPX networks use a large number of broadcast packets to operate most effectively.

Problems arise when broadcast packets endlessly propagate throughout the network, which increases the traffic volume on your network and the CPU time that each host spends processing and discarding unwanted broadcast packets.

Multicast Packets

Multicast packets, which are a normal part of network operation, are transmitted by a device to a multicast group address. Hosts that want to receive the packets indicate that they want to be members of the multicast group, and multicast packets are then distributed to that group. For example, multicast packets support the Spanning Tree Protocol. Multicast applications and underlying multicast protocols control multimedia traffic and shield hosts from processing unnecessary broadcast traffic. However, multicast traffic can also cause storms that saturate your network.

Duplicate Addresses

Use these sections to identify and correct problems caused by duplicate MAC and IP addresses:

• Duplicate Addresses Overview
• Finding Duplicate MAC Addresses
• Finding Duplicate IP Addresses

See Duplicate Addresses Reference for additional conceptual and problem analysis detail.

Networks sometimes generate duplicate MAC and IP addresses. Because duplicate addresses can cause problems with packet delivery, resolve them as soon as possible.

Understanding the Problem

Duplicate MAC addresses are caused by data link layer problems with Fiber Distributed Data Interface (FDDI) media and the passing of tokens on the FDDI ring. Duplicate IP addresses are caused by network layer problems. See these sections for more information about causes of duplicate addresses:

• Duplicate MAC Addresses
• Duplicate IP Addresses

Identifying the Problem

Identify duplicate MAC and IP addresses by following the instructions in these sections:

• Finding Duplicate MAC Addresses
• Finding Duplicate IP Addresses

Solving the Problem

Identify the cause of the duplicate address (such as user error or a hardware problem) and fix the problem if possible.

Finding Duplicate MAC Addresses

To find out if duplicate MAC addresses are occurring, monitor your network using Status Watch.

Status Watch

The Status Watch FDDI Status tool identifies duplicate FDDI MAC addresses, and Status Watch reports when two or more MACs on the same ring have the same MAC address (a Duplicate Address condition).

Follow these steps:

1. In the Status Watch Summary View window, determine if any FDDI Status conditions are reported. If there are, double-click the table cell value to display the Device List window.

Another approach is to examine only the devices that you know reside on your FDDI ring. In the Status Watch main window, red device icons indicate that a threshold has been exceeded.

2. Select a device.

• If you selected the device from the Device List window, the real-time report for that device appears in the Status Watch main window.

• If you selected the device from the main window, also select the FDDI Status tool to view the real-time report.

3. Determine if a Duplicate Address condition caused the FDDI Status tool to trigger a Critical or Warning status for that device.

In Status Watch, you can specify the status severity level to apply to a Duplicate Address condition.

Finding Duplicate IP Addresses

To find out if duplicate IP addresses are occurring, monitor your network using these applications:

• Address Tracker - To find duplicate IP addresses on 3Com devices and their attached networks

• LANsentry® Manager - To find duplicate IP addresses that are collected by probes gathering RMON2 SmartAgent® data from the Enterprise Communications Analysis Module (ECAM) downloaded on your network devices

Address Tracker

Use Address Tracker to determine when and where duplicate IP addresses occur.

Follow these steps:

1. From the Find Address menu, select Find Duplicate IP Addresses.

2. Click Find Now to start your search.

LANsentry Manager

Use the Duplicates table in LANsentry Manager to compile a list of all stations with duplicate IP addresses. This table is available only on probes that have downloaded RMON2 (ECAM) SmartAgent software.

Follow these steps:

1. From the LANsentry Manager Address Map menu, select Duplicates. Address Map data is displayed as a table.

2. To export the contents of the table, click Export to launch the Data Export dialog box.

Duplicate Addresses Reference

This section explains terms that are relevant to duplicate addresses and provides additional conceptual and problem analysis detail.

Duplicate MAC Addresses

Each device on your network has a unique MAC address. This address identifies a single device on the network, allowing packets to be delivered to the correct destinations.

Packets are delivered to their destinations by means of the IP-address-to-MAC-address resolution that the Address Resolution Protocol (ARP) provides. Therefore, if MAC addresses are duplicated on the network, the ARP caches of routing devices contain erroneous destinations. In FDDI, devices monitor network traffic, searching for their own MAC address in each packet to determine whether to decode the packet. If MAC addresses are not unique, two stations cannot be distinguished from each other.

Duplicate MAC addresses can occur for the following reasons:

• Someone has manually configured a MAC address for a device, instead of using the address that the vendor supplied or allowing it to be assigned dynamically, and this address is also assigned to a different device.

• In rare circumstances, loops in a bridged network can cause a MAC hardware problem or an address learning problem that creates a duplicate MAC address entry in the bridging address table.

• On DECnet Phase 4 networks, MAC addresses are set from the DECnet address. A duplicate DECnet address can therefore cause a duplicate MAC address.

Duplicate IP Addresses

Because IP addresses are critical for transmission of packets on TCP/IP networks, resolve duplicate IP addresses immediately.

Duplicate IP addresses can occur when someone has configured an IP address that is identical to an IP address that is assigned to a different device. Address assignments, although possible for you to configure manually, are usually made using one of these protocols:

• Dynamic Host Configuration Protocol (DHCP) - Allows your network to dynamically assign IP addresses to nodes. With this protocol, a DHCP server temporarily assigns an IP address to a node, or you can statically configure addresses as needed.

• BOOTstrap protocol (BootP) - Allows you to statically assign IP addresses to nodes. This protocol is more efficient than RARP.

• Reverse ARP (RARP) - Allows you to statically assign IP addresses to nodes. However, because this protocol relies on the MAC address to identify the node, you cannot use it on networks that dynamically assign hardware addresses.
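Independently of the tools named above, the underlying check for a duplicate IP address is simple: the same IP address is seen with more than one MAC address. The sketch below illustrates that idea on a list of (IP, MAC) observations, such as entries exported from an address table. The function name and the sample data are hypothetical.

```python
from collections import defaultdict


def find_duplicate_ips(observations):
    """Return {ip: [macs]} for every IP address seen with more than one MAC.

    `observations` is an iterable of (ip_address, mac_address) string pairs.
    """
    macs_by_ip = defaultdict(set)
    for ip, mac in observations:
        macs_by_ip[ip].add(mac.lower())   # normalize case before comparing
    return {
        ip: sorted(macs)
        for ip, macs in macs_by_ip.items()
        if len(macs) > 1
    }


# Made-up sample data for illustration only.
seen = [
    ("192.0.2.10", "00:A0:24:11:22:33"),
    ("192.0.2.11", "00:a0:24:44:55:66"),
    ("192.0.2.10", "00:a0:24:77:88:99"),   # second device claiming .10
    ("192.0.2.11", "00:A0:24:44:55:66"),   # same device, different case
]
print(find_duplicate_ips(seen))
```

A hit is a candidate, not proof of a misconfiguration: some redundancy schemes deliberately move one IP address between interfaces.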

Ethernet Packet Loss

Use these sections to identify and correct Ethernet packet loss:

• Ethernet Packet Loss Overview
• Searching for Packet Loss

See Ethernet Packet Loss Reference for additional conceptual and problem analysis detail.

If your Ethernet network shows signs of congestion, it may be experiencing packet loss. When your network is congested, utilization is usually high, packets are discarded because buffers are full, and collision rates are up. Problems related to Collisions are often at the heart of packet loss.

Understanding the Problem

Collisions are normal in Ethernet networks. In many cases, Collision rates of 50 percent do not cause a large decrease in throughput. The Collision rate helps mark the upper limit on your network (the maximum percentage of collisions that your network can bear), which is usually around 70 percent. If Collisions increase above this upper limit, your network can become unreliable.

When Collision rates increase, so do Excessive Collisions, which delay the transmission of data. An increase in Collisions also indicates that network utilization and network errors, such as FCS Errors, are probably increasing.

The real packet problems to watch for, however, are undetected collisions that show up as Late Collisions.

Identifying the Problem

To determine whether your network's problem is related to packet loss, verify that frames are being dropped on your network by examining this packet loss data:

• Alignment Errors
• Collisions
• Excessive Collisions
• FCS Errors
• CRC Errors
• Late Collisions
• Receive Discards
• Too Long Errors
• Too Short Errors
• Transmit Discards

The process of identifying the problem is discussed in Searching for Packet Loss.

Solving the Problem

If you notice that packet loss data is consistently high, your network is too congested. In this case, segment your network with the appropriate network device (such as a switch or router). If Collision data shows increases but your network's utilization is the same, your network may have a physical problem, such as cabling that is too long. Other problems that packet loss data can indicate include:

• Faulty connectors or improper cabling
• Excessive numbers of repeaters between network devices
• Defective Ethernet transceivers or controllers

Possible solutions to these problems are explained in the procedures in Searching for Packet Loss.
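The reasoning in this section can be summarized as a small decision function. This is a rough sketch rather than a diagnostic tool: the function name, the three boolean inputs, and the returned wording are assumptions, and deciding what counts as "consistently high" or "rising" is left to whoever reads the counters.

```python
def diagnose_packet_loss(loss_consistently_high, collisions_rising, utilization_rising):
    """Map three coarse observations to the likely cause described above."""
    if loss_consistently_high:
        # Sustained loss points to congestion.
        return "congested: segment the network with a switch or router"
    if collisions_rising and not utilization_rising:
        # More collisions at unchanged load suggests a physical-layer fault.
        return ("possible physical problem: check cable length, connectors, "
                "repeater count, transceivers, and controllers")
    return "no clear indication: keep monitoring"


print(diagnose_packet_loss(True, True, True))
print(diagnose_packet_loss(False, True, False))
print(diagnose_packet_loss(False, False, False))
```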

Searching for Packet Loss

When you look for packet loss, use the following applications:

• Status Watch - For Ethernet and MIB-II data collection using SNMP polling
• LANsentry Manager Network Statistics Graph - For RMON data collection using an RMON probe
• Device View - On a per-device basis, you can evaluate statistics for any port on the device

Status Watch

Status Watch monitors:

• Alignment Errors
• Excessive Collisions
• FCS Errors
• Receive Discards
• Transmit Discards

Follow these steps:

1. Determine if the thresholds for the Alignment Errors tool and FCS Errors tool are being exceeded.

Table 16 identifies the problems that this data can indicate and your possible actions. For information about problems related to a nonstandard Ethernet implementation, see Nonstandard Ethernet Problems.

Table 16. Alignment Errors, FCS Errors, and CRC Errors Data

Possible problem: Faulty cabling
Possible action: Examine the cable and cable connections for breaks or damage.

Possible problem: Network noise
Possible action: Look for improper cabling, faulty cable, faulty network equipment, or cables that are too close to equipment that emits electromagnetic interference (lamps, for example).

Possible problem: Faulty transceiver
Possible action: Use an analyzer to identify the problematic transceiver. If necessary, replace the transceiver, network adapter, or station.

Possible problem: Fault at the transmitting end station
Possible action:
1. Locate the source of the errors by looking at the module and port statistics.
2. Verify the correct operation of the transceiver or adapter card of the device that is connected to the problem port.
3. If the card appears to be operating correctly, examine the cable and cable connections for breaks or damage.

Possible problem: Station powering up or down
Possible action: None required.
Early implementations of Ethernet transceivers generate a significant amount of in-band noise when powering up; they frequently cause Alignment Errors and FCS Errors in an otherwise stable network.
When powering up, some software drivers for Ethernet controllers also initiate Time Domain Reflectometry (TDR) tests to test the Ethernet media. Network monitors report TDR tests as Alignment Errors and FCS Errors.

Possible problem: Faulty adapter
Possible action: Replace the adapter.

2. Determine if the Excessive Collisions tool threshold is being exceeded. Table 17 identifies the problems that this data can indicate and your possible actions.

Table 17. Collisions and Excessive Collisions Data

Possible problem: Busy network
Possible action: Use a bridge, router, or switch to reconfigure your network into segments with fewer stations.

Possible problem: Faulty device (adapter, switch, hub, and the like) that does not listen before broadcasting. This problem increases the incidence of all types of collisions.
Possible action: Isolate each adapter to see if the problem stops.

Possible problem: Network loop
Possible action: Ensure that no redundant connections to the same station have both connections active simultaneously.

3. Determine if the thresholds for the Receive Discards and Transmit Discards tools are being exceeded.

If these errors are high in conjunction with the data that you gathered in steps 1 and 2, your network is overloaded. Segment your network.

LANsentry Manager Network Statistics Graph

Use the LANsentry Manager Network Statistics graph to view data for:

• Collisions
• Late Collisions
• Bandwidth Utilization
• CRC Errors
• Too Long Errors
• Too Short Errors

Follow these steps:

1. Display a Network Statistics graph for the local Ethernet segment on which users have reported poor performance.

This graph shows the most recent trend in Collision rates. If you have set up a History sample, you can also look at the historical trend. If a number of segments are connected by repeaters, examine the graph for each Ethernet segment.

2. Analyze Utilization and Collision rates to determine whether collisions are caused by an overloaded segment or a faulty component.

• If Utilization rates are high - The collisions are probably caused by an overloaded segment. If you have added nodes or new applications to your network, consider reconfiguring the cabling system, using bridges and routers to filter out remote collisions and to keep local traffic on one segment. This action should level the network load.

• If Utilization rates are stable and appear normal - The collisions are probably caused by faulty components. In this case, do the following:

    • If the network consists of repeaters - Compare the Network Statistics graphs for each segment connected to the repeater. Because repeaters repeat traffic across all connected segments (which makes many segments seem like one network), you should see similar levels of traffic on all segments. One segment that shows dissimilar levels of traffic and collisions may indicate faulty hardware. In this case, monitor several collisions to track the source station that is transmitting too soon after collisions, and repair the station. Packets that are transmitted too soon after collisions are unlikely to be valid. See Table 17 for more information about Collisions.

    • On other networks - Determine the segment cable length.

3. Examine the CRC Errors and Late Collisions, which often indicate cabling or component problems.

Table 16 identifies the problems that CRC Errors can indicate and your possible actions. Table 18 identifies the problems that Late Collisions data can indicate and your possible actions.

Table 18. Late Collisions Data

Possible problem: Cabling problems
• Segment too long
• Failing cable
• Segment not grounded properly (noise)
• Improper termination
• Taps too close (10BASE-5 and 10BASE-2 only)
• Noisy cable
Possible action: Correct the cabling problem by doing one or more of the following:
• Reduce the segment length
• Replace the cable
• Ground the cable
• Terminate the cable correctly
• Check the taps
• Check for cables too close to equipment that emits electromagnetic interference

Possible problem: Component problems
• Deaf or partially deaf node
• Failing repeater, transceiver, or controller cards
Possible action: Correct the component problem by doing one of the following:
• Trace the failing component and replace it
• Replace the NIC or the transceiver

4. Trace Too Short Errors and Too Long Errors to the sender.

These errors often indicate faulty routers or LAN drivers and transceiver problems. Table 19 identifies the problems that this data can indicate and your possible actions.

Table 19

Possible problem: A transceiver on your network is adding bits to the packets that are transmitted by the attached station.
Possible action:
1. Use a network analyzer to identify the problematic transceiver.
2. If necessary, replace the transceiver, network adapter, or station.

Possible problem: The jabber protection mechanism on a transceiver has failed; it can no longer protect the network from the jabbering produced by the attached station.
Possible action: Replace the network card.

Possible problem: Excessive noise on the cable.
Note: Some 10/100 Mbps cards that autodetect the network speed may connect to the network at the wrong speed, causing excessive noise.
Possible action: Check for improper cabling, faulty cable, faulty network equipment, or cables too close to noisy electronic equipment (lamps, for example).
If your network card autodetects the network speed and you have ruled out other problems, manually configure the network speed.

Possible problem: Faulty routers (two different network types are connected and the router is not enforcing proper frame size restrictions)
Possible action: Notify the manufacturer.

Possible problem: Faulty LAN driver
Possible action: Replace the driver.

Possible problem: A normal condition on a LinkSwitch® 1000, LinkSwitch® 3000, or CoreBuilder 5000 FastModule.
If you use maximum-sized (1518-byte) Ethernet frames, the device's VLT-enabled ports add a frame tag of 4 bytes, resulting in a misleading Too Long Frame error.
Possible action: These frames are passed successfully but create the Too Long Frame error message.
If you want to eliminate the error message, reduce your Ethernet frame size by 4 bytes.

Device View

Device View allows you to display a variety of port and device-level statistics relevant to Ethernet packet loss. Table 20 describes these statistics and their use in troubleshooting.

Table 20. Activity and Error Statistics in Device View

Statistics group: Activity
Description: Displays the total network activity and errors on the selected port.
Use in troubleshooting: This data shows readable packets, broadcast packets, Collisions, total errors, and runts, which cause Too Short Errors. You can interpret this data in the following ways:

• Runts are often caused by Collisions; however, if the values increase at specific times of the day, you may need to change the network topology to manage the traffic more efficiently (for example, with switches or routers).

• Runts can also be caused by a badly terminated coax cable.

• Large numbers of runts not associated with high levels of collisions can indicate a transmission problem (examine the cable).

• Particularly high numbers of Collisions compared to the total number of readable packets can point to a hardware problem (a bad adapter) or to a data loop.

• A high proportion of Broadcast Packets (>10%) on a heavily utilized network (>50% of available bandwidth) can point to an incorrectly configured bridge or router on the network.

Statistics group: Errors
Description: Displays the number of frames with errors on the selected port.
Use in troubleshooting: The significance of errors depends on accompanying errors and prevailing network conditions. See the following error data for more information:

• Alignment Errors: Table 16
• FCS Errors: Table 16
• Too Long Errors: Table 19
• Too Short Errors or runts: Table 19
• Late Collisions: Table 18

To display Activity and Errors statistics for a device or port, follow these steps:

1. Select the required port or device.

2. From the shortcut menu, select Activity or Errors.

The statistics available depend on the type of port or device selected. See Table 20 for troubleshooting information.
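One of the Activity interpretations in Table 20 can be written as a simple check on port counters. The sketch below assumes the thresholds read as "more than 10 percent broadcast packets" on a network using "more than 50 percent of available bandwidth"; the function name and its inputs are hypothetical and do not correspond to a Device View API.

```python
def broadcast_ratio_suspicious(broadcast_packets, readable_packets,
                               utilization_fraction,
                               broadcast_limit=0.10, utilization_limit=0.50):
    """True if broadcasts exceed the limit on a heavily utilized network.

    A True result only suggests looking for an incorrectly configured
    bridge or router; it is not a diagnosis on its own.
    """
    if readable_packets <= 0:
        return False    # nothing to compare against
    broadcast_fraction = broadcast_packets / readable_packets
    return (broadcast_fraction > broadcast_limit
            and utilization_fraction > utilization_limit)


print(broadcast_ratio_suspicious(1500, 10000, 0.62))   # 15% broadcast, busy link
print(broadcast_ratio_suspicious(1500, 10000, 0.20))   # 15% broadcast, idle link
```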

Ethernet Packet Loss Reference

This section explains terms that are relevant to Ethernet packet loss and provides additional conceptual and problem analysis detail.

Alignment Errors

An Alignment Error indicates a received frame for which both of the following are true:

• The number of bits received is not an integral multiple of 8 (that is, the frame does not contain a whole number of bytes).

• The frame has a Frame Check Sequence (FCS) error.

Alignment Errors often result from MAC layer packet formation problems, cabling problems that cause corrupted or lost data, and packets that pass through more than two cascaded multiport transceivers. See FCS Errors for more information about interpreting Alignment Errors.

Collisions

Collisions indicate that two devices detect that the network is idle and try to send packets at the same time (within one round-trip delay). Because only one device can transmit at a time, both devices must stop sending and attempt to retransmit. Collisions are detected by the transmitting stations.

The retransmission algorithm helps to ensure that the packets are not retransmitted at the same time. However, if the two devices retry at nearly the same time, the packets can collide again; the process repeats until either the packets finally pass onto the network without collisions, or 16 consecutive collisions occur and the packets are discarded.
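The retransmission behavior can be sketched as follows. This is a minimal model of the truncated binary exponential backoff that IEEE 802.3 specifies for half-duplex Ethernet, assuming 10 Mb/s timing (a 51.2 microsecond slot time). The function names and the `attempt_collides` callback are hypothetical; a real adapter does this in hardware.

```python
import random

SLOT_TIME_US = 51.2    # 512 bit times at 10 Mb/s
MAX_ATTEMPTS = 16      # the frame is discarded after 16 consecutive collisions
BACKOFF_LIMIT = 10     # the random range stops doubling after 10 collisions


def backoff_slots(collision_count, rng=random):
    """Pick a random wait, in slot times, after the n-th consecutive collision."""
    if not 1 <= collision_count < MAX_ATTEMPTS:
        raise ValueError("collision_count must be between 1 and 15")
    exponent = min(collision_count, BACKOFF_LIMIT)
    return rng.randrange(2 ** exponent)    # 0 .. 2**exponent - 1


def send_frame(attempt_collides, rng=random):
    """Try to send one frame.

    `attempt_collides(n)` returns True if attempt number n (1-based) collides.
    Returns (outcome, attempts_used, total_backoff_delay_us).
    """
    delay_us = 0.0
    for attempt in range(1, MAX_ATTEMPTS + 1):
        if not attempt_collides(attempt):
            return "sent", attempt, delay_us
        if attempt < MAX_ATTEMPTS:
            delay_us += backoff_slots(attempt, rng) * SLOT_TIME_US
    return "discarded", MAX_ATTEMPTS, delay_us    # counted as an excessive collision


print(send_frame(lambda n: n < 3))    # collides twice, then succeeds
print(send_frame(lambda n: True))     # always collides, so the frame is dropped
```

Because each station draws its own random wait, two stations that collided are unlikely to retry in the same slot, and the widening range lowers that chance further on each repeat.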

CRC Errors

A Cyclic Redundancy Check (CRC) Error is an RMON statistic that combines FCS Errors and Alignment Errors. These errors indicate that packets were received with:

• A bad FCS and an integral number of octets (FCS Errors)
• A bad FCS and a non-integral number of octets (Alignment Errors)

CRC Errors can cause an end station to freeze. If a large number of CRC Errors are attributed to a single station on the network, replace the station's network interface board. Typically, a CRC Error rate of more than 1 percent of network traffic is considered excessive.
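The two definitions and the 1 percent guideline can be expressed in a few lines. This sketch is illustrative only: the function names are hypothetical, each frame is represented as a (bits received, FCS check passed) pair, and the rate is computed per frame, which is one reasonable reading of "1 percent of network traffic".

```python
from collections import Counter


def classify_frame(bit_count, fcs_ok):
    """Label a received frame using the definitions above."""
    if fcs_ok:
        return "good"
    if bit_count % 8 == 0:
        return "FCS Error"          # bad FCS, whole number of octets
    return "Alignment Error"        # bad FCS, partial octet at the end


def crc_error_summary(frames, threshold=0.01):
    """Summarize CRC Errors for a list of (bit_count, fcs_ok) pairs."""
    counts = Counter(classify_frame(bits, ok) for bits, ok in frames)
    crc_errors = counts["FCS Error"] + counts["Alignment Error"]
    rate = crc_errors / len(frames) if frames else 0.0
    return {
        "fcs_errors": counts["FCS Error"],
        "alignment_errors": counts["Alignment Error"],
        "crc_errors": crc_errors,
        "rate": rate,
        "excessive": rate > threshold,
    }


# Made-up sample: 97 good frames, 2 FCS Errors, 1 Alignment Error.
sample = [(512, True)] * 97 + [(512, False)] * 2 + [(515, False)]
print(crc_error_summary(sample))
```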

Excessive Collisions

Excessive Collisions indicate that 16 consecutive collisions have occurred, usually a sign that the network is becoming congested. For each excessive collision count (that is, after 16 consecutive collisions), a packet is dropped. If you know the normal rate of excessive collisions, you can determine when the rate of packet loss is affecting your network's performance. See Knowing Your Network's Configuration for more information.

FCS Errors

Frame Check Sequence (FCS) Errors, a type of CRC Error, indicate that frames received by an interface are an integral number of octets long but do not pass the FCS check. The FCS is a mathematical way to ensure that all the frame's bits are correct without having the system examine each bit and compare it to the original. Packets with Alignment Errors also generate FCS Errors.

Both Alignment Errors and FCS Errors can be caused by equipment powering up or down or by interference (noise) on unshielded twisted-pair (10BASE-T) segments. In a network that complies with the Ethernet standard, FCS or Alignment Errors indicate bit errors during a transmission or reception. A very low rate is acceptable. Although Ethernet allows a bit error rate of 1 in 10^8, typical Ethernet performance is 1 in 10^12 or better.
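To relate a bit error rate to the frame errors that a monitor counts, the following worked calculation may help. It assumes independent bit errors and maximum-sized 1518-octet frames; both are simplifying assumptions, and the function name is hypothetical.

```python
import math


def frame_error_probability(bit_error_rate, frame_octets=1518):
    """Probability that a frame contains at least one bit error.

    Computes 1 - (1 - BER) ** bits in a numerically stable way.
    """
    bits = frame_octets * 8
    return -math.expm1(bits * math.log1p(-bit_error_rate))


for ber in (1e-8, 1e-12):
    p = frame_error_probability(ber)
    print(f"BER {ber:.0e}: about 1 errored frame in {1 / p:,.0f}")
```

Under these assumptions, a 1518-octet frame is 12,144 bits, so the allowed rate of 1 in 10^8 corresponds to roughly one errored frame in every 8,000, while the typical rate of 1 in 10^12 corresponds to roughly one in 80 million.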

Late Collisions

Late Collisions indicate that two devices have transmitted at the same time, but cabling errors (most commonly, excessive network segment length or too many repeaters between devices) prevent either transmitting device from detecting a collision. Neither device detects a collision because the time to propagate the signal from one end of the network to the other is longer than the time to put the entire packet on the network. As a result, neither of the devices that cause the late collision senses the other's transmission until the entire packet is on the network.

Although late collisions occur for small packets, the transmitter cannot detect them. As a result, a network suffering measurable Late Collisions for large packets is losing small packets as well.
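The timing argument can be made concrete with a rough calculation. IEEE 802.3 defines a slot time of 512 bit times (the time to send a minimum 64-octet frame), and a collision detected after that window counts as late. The sketch below assumes a signal velocity of 2 x 10^8 m/s, which is a typical ballpark rather than a value from this document, and treats repeater and transceiver delay as a single optional figure. All names are hypothetical.

```python
SLOT_TIME_BITS = 512    # collisions must be detected within this window


def round_trip_bit_times(path_length_m, bit_rate_bps=10_000_000,
                         velocity_m_per_s=2.0e8, extra_delay_bits=0.0):
    """Round-trip propagation delay across the collision domain, in bit times."""
    one_way_seconds = path_length_m / velocity_m_per_s
    return 2 * one_way_seconds * bit_rate_bps + extra_delay_bits


def late_collisions_possible(path_length_m, **kwargs):
    """True if a collision could arrive after the sender's slot time has passed."""
    return round_trip_bit_times(path_length_m, **kwargs) > SLOT_TIME_BITS


def is_late_collision(bits_sent_before_detection):
    """Classify one detected collision by how far into the frame it occurred."""
    return bits_sent_before_detection > SLOT_TIME_BITS


print(round_trip_bit_times(2500), late_collisions_possible(2500))
print(round_trip_bit_times(6000), late_collisions_possible(6000))
```

Real networks reach the limit at shorter distances than this cable-only estimate suggests, because repeaters and transceivers add delay; pass that in as `extra_delay_bits` if you have figures for your equipment.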

Nonstandard Ethernet Problems

Table 21 lists the symptoms that typically occur if a system violates the Ethernet standard.

Table 21. Symptoms of Common Ethernet Network Problems

Symptoms: FCS Errors and Alignment Errors increase significantly.
Problem: Network cabling is too long.
Notes: If you use a promiscuous network monitor, the number of Late Collisions reported by stations should correlate with the FCS and Alignment Errors reported by the monitor.

Symptoms: FCS and Alignment Errors increase proportionally with interference (sometimes referred to as noise hits).
Problem: Network segment is noisy.
Notes: Typically observed on a 10BASE-T network segment in a noisy environment. If you use multiple promiscuous monitors, the FCS and Alignment Errors among the monitors will not correlate. If the monitor can track runts (also called Too Short Errors), the number of runt packets should be significantly higher than normal.

Symptoms: FCS and Alignment Errors are much higher than normal.
Problem: Networks do not conform to the access scheme of Carrier Sense Multiple Access with Collision Detect (CSMA/CD).
Notes: Occurs when some implementations of Ethernet in the segment are not entirely compatible with IEEE 802.3 repeaters. Collision fragments linger on the network long enough to collide with retry packets at the minimum interpacket gap (IPG). The IPG is smaller on one side of the repeated network, causing a lost packet. Ethernet controllers cannot receive packets that are separated by 4.7 µs or less. Some controllers cannot sustain receptions of packets separated by as much as 9.6 µs. If runt packets are received one after another and are followed by a collision fragment, Ethernet controllers that cannot sustain reception will lose packets.

Receive Discards

Receive Discards indicate that received packets could not be delivered to a high-layer protocol because of congestion or packet errors.

Too Long Errors

A Too Long Error indicates that a packet is longer than 1518 octets (including FCS octets) but otherwise well formed. Too Long Errors are often caused by a bad transceiver, a malfunction of the jabber protection mechanism on a transceiver, or excessive noise on the cable.

Too Short Errors

A Too Short Error, also called a runt, indicates that a packet is fewer than 64 octets long (including FCS octets) but otherwise well formed.
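The two length checks (Too Long above 1518 octets, Too Short below 64 octets, both counted with the FCS included) can be expressed as a small Python sketch. The function name and the return labels are illustrative only.

```python
MIN_FRAME_OCTETS = 64    # shortest legal frame, FCS included
MAX_FRAME_OCTETS = 1518  # longest legal frame, FCS included

def classify_frame_length(octets):
    """Label a frame length the way the error counters above do."""
    if octets < MIN_FRAME_OCTETS:
        return "too short (runt)"
    if octets > MAX_FRAME_OCTETS:
        return "too long"
    return "ok"

for length in (60, 64, 1518, 1600):
    print(length, classify_frame_length(length))
```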

Transmit Discards

Transmit Discards indicate that packets were not transmitted because of network congestion.

FDDI Ring Errors

Use these sections to identify and correct FDDI ring errors:

• FDDI Ring Errors Overview
• Identifying Ring Errors

Fiber Distributed Data Interface (FDDI) often corrects its own problems. However, because FDDI cannot correct all errors (especially those related to hardware problems), you should monitor FDDI errors.

Understanding the Problem

FDDI ring errors that you should monitor include:

• Elasticity Buffer Error Condition
• Frame Error Condition
• Frames Not Copied Condition
• Link Error Condition

Identifying the Problem

First, determine the type of FDDI ring errors and where they are occurring. Similar to the way you identify other FDDI problems, identify the upstream and downstream neighbors of the devices that you are monitoring.

Several types of network errors can cause FDDI performance problems. For example, problems with cables or physical connections may result in a link or frame error. Elasticity buffer (EB) errors can also lead to link and frame errors.

FDDI deals with port-related errors as follows:

• The variable PORTlerAlarm is the link error rate (LER) value at which a link connection generates an alarm. When the LER is greater than the alarm setting, Station Management (SMT) sends a Status Report Frame (SRF) to notify you that there is a problem with a port.

  The PORTlerAlarm threshold is set lower than the PORTlerCutoff threshold so that you are notified of a problem before the port is actually removed from the ring.

• When link errors reach the threshold defined by the variable PORTlerCutoff, SMT breaks the connection, disabling the PHY that detected the problem. A Link Error Condition is also generated.
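The two-threshold behavior described above can be sketched in Python. This only illustrates the ordering of the alarm and cutoff thresholds and is not SMT itself: the function name, the return labels, and the example rates (expressed as errors per bit) are assumptions made for the sketch.

```python
def port_ler_action(ler, alarm, cutoff):
    """Decide what happens to a port for a measured link error rate.
    All three values are error rates (errors per bit); alarm must be
    lower than cutoff so that a warning always precedes removal."""
    if not alarm < cutoff:
        raise ValueError("alarm threshold must be lower than cutoff threshold")
    if ler >= cutoff:
        return "cutoff"   # connection broken, Link Error Condition generated
    if ler > alarm:
        return "alarm"    # Status Report Frame sent, port stays in the ring
    return "ok"

# Illustrative thresholds only
print(port_ler_action(5e-8, alarm=1e-8, cutoff=1e-7))
```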

FDDI deals with MAC-related errors as follows:

• When MAC frame errors reach a certain threshold, a Frame Error Condition is generated. Because the actual error can be further upstream than the immediate connection, the connection remains intact.

• For a large network, the worst case MACFrameErrorRatio is less than 0.1 percent. However, during network configuration, frame error ratios can reach 50 percent for short periods. When you detect a sustained frame error ratio of more than 0.1 percent, a problem exists between the station that is reporting the condition and the nearest upstream MAC.

Solving the Problem

To solve problems related to FDDI errors, fix the hardware, cabling, or congestion problem.

Identifying Ring Errors

Use Status Watch to monitor your FDDI devices for Warning or Critical alerts.

Status Watch

Use Status Watch to identify FDDI ring errors.

Follow these steps:

1. Monitor the FDDI Status tool for the currently selected device.

2. Determine whether Status Watch is reporting Elasticity Buffer Errors or a high percentage of Frame Errors, Frames Not Copied, or Link Error Rates for the currently selected device.

FDDI Ring Errors Reference

This section provides additional conceptual and problem analysis detail.

Elasticity Buffer Error Condition

The Elasticity Buffer Error condition occurs when a port's elasticity buffer overflows or underflows. This condition usually indicates that a port's hardware is not operating within the tolerances that the FDDI standard specifies. Look for the problem in the hardware of either the port that is reporting the condition or of the immediately adjacent port.

Frame Error Condition

The Frame Error condition occurs when the percentage of frames that contain errors exceeds a preset threshold. In the situation when a device is an uplink to FDDI (that is, a device is transmitting onto FDDI), this type of condition indicates that the ring is saturated. The ring is out of buffer space, and packets are being dropped from the device's backbone port.

The problem indicated by the frame errors is usually located between the MAC that reports the condition and its upstream neighbor. Because many physical connections can lie along this path, the MACFrameErrorRatio variable identifies only the two MACs between which the problem is occurring.

Frames Not Copied Condition

The Frames Not Copied condition occurs when the percentage of frames that are dropped because of insufficient buffer space exceeds a preset threshold. This condition indicates that the station is congested and is unable to process frames as quickly as they arrive. To help eliminate congestion:

• Add more capacity to the station.
• Reconfigure your network so that end stations that communicate heavily with one another are on the same bridge or switch.
• Filter out certain traffic.

Link Error Condition

The Link Error condition occurs when a port detects link errors at a rate that exceeds a preset threshold. When the Link Error threshold is exceeded, the station removes itself from the ring and tries to reinsert itself on the ring. This action creates a MAC Neighbor Change Event (which also occurs if a ring wraps).

Link errors may indicate an FDDI PHY hardware problem (such as a faulty transmitter) or a faulty cable or connector. Look for the problem in the portion of the network between the port that is reporting the condition and the first upstream transmitter.

MAC Neighbor Change Event

The MAC Neighbor Change event occurs when a MAC's upstream or downstream neighbor changes.

This event indicates either:

• A network reconfiguration
• Another station that is leaving or joining the ring

Network File Server Timeouts

Use these sections to identify and correct timeouts on network file servers:

• Network File Server Timeout Overview
• Looking for Obvious Errors
• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

See Network File Server Timeouts Reference for additional conceptual and problem analysis detail.

A network file server can time out if your network gets congested or if your server is having problems. Users might have problems downloading data from or to the server, or copying files from or to the server. To help you to understand the troubleshooting process for this type of problem, an EXAMPLE throughout this section follows the symptoms, analysis, and resolution of a typical file server timeout problem.

Understanding the Problem

When users log in, their stations make network file server calls either to determine quotas (if this feature has been enabled) or to mount user home directories. The network file server timeout messages, even when spread across multiple nodes, indicate a problem either with the network or with a server.

EXAMPLE: UNIX users notice that it takes a long time - over 30 seconds in some cases - to log in to any machine. Some machines report network file server timeout messages, but the messages have no obvious pattern and are infrequent. You begin to get a sense of the problem.

Identifying the Problem

First, rule out the obvious causes. Ask these questions:

• Can you access the network file server with Telnet?
• Have any alarms been triggered?
• Are there any new errors?

The process of identifying the problem is developed in Looking for Obvious Errors.

Solving the Problem

To determine the cause, reproduce the fault while you monitor the network. After you know the cause, you can fix the problem.

The solutions to the network file server timeout are identified in these sections:

• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

Looking for Obvious Errors

To look for obvious errors, use these applications:

• Ping and Telnet - To check connectivity to the network file server nodes
• LANsentry Manager Alarms View - To search for triggered alarms
• LANsentry Manager Statistics View - To look for errors
• LANsentry Manager History View - To identify trends

Ping and Telnet

Determine whether you can contact network file server nodes using Ping and Telnet. If the response is extremely slow, then a problem may exist with the connections to the nodes. No delay indicates that the connections are normal, implying that the delay is occurring elsewhere. In this case, use LANsentry® Manager tools to determine whether packets are being lost or ignored.
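A rough equivalent of the Telnet check can be scripted. This Python sketch times a TCP connection to a server port; a slow or failed connection points at the path to the node, as described above. The host name fileserver and the choice of port 23 (Telnet) in the usage comment are assumptions for illustration.

```python
import socket
import time

def tcp_connect_time(host, port, timeout=5.0):
    """Return the seconds taken to open a TCP connection to host:port,
    or None if the connection fails or times out."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        return None

# Hypothetical usage; replace the host name with one of your server nodes:
# elapsed = tcp_connect_time("fileserver", 23)
# print("no connection" if elapsed is None else f"connected in {elapsed:.3f} s")
```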

LANsentry Manager Alarms View

Using the LANsentry Manager Alarms View, you can determine if any configured alarms have been triggered.

Search the Alarms View to see if any MAC events have been logged.

EXAMPLE: MAC events have not been logged for the network on which the UNIX users are attached.

Even though no alarms have occurred, errors may exist. For example, a lower rate of background errors may exist just below the alarm threshold. Based on maximum and minimum values, RMON errors may miss constant, periodic, or low amounts of errors.

LANsentry Manager Statistics View

Using the LANsentry Manager Statistics View, you can display a multisegment graph of utilization and error statistics.

Follow these steps:

1. Set up a graph that shows utilization and errors on all your major segments.

2. Determine whether any segments are particularly busy or error prone.

EXAMPLE: You notice that one segment of the UNIX network, HUB3, is reporting Too Long Errors and FCS Errors roughly every second sample. While the amount is not higher than normal, it is currently higher than any other segment.

LANsentry Manager History View

Using the LANsentry Manager History View, you can display a rolling history table to determine if the errors that you are seeing are new. For example, if you have a history table that runs for 30-minute samples over two days, you can compare the most recent sample to a previous sample, looking for new errors. If your probe has the resources, use a much finer resolution sample stored for a shorter time (every 30 seconds for 2 hours) to more easily spot recent errors.
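Comparing the newest history sample with an earlier one is easy to automate once the counters are available outside the tool. The Python sketch below assumes each sample is a plain dictionary of error counts keyed by error name; that layout, the function name, and the sample numbers are hypothetical and do not represent a LANsentry Manager export format.

```python
def new_or_increased_errors(previous, latest):
    """Return the error types whose count in the latest sample exceeds the
    count in the earlier sample (types missing earlier count as zero)."""
    changes = {}
    for name, count in latest.items():
        before = previous.get(name, 0)
        if count > before:
            changes[name] = count - before
    return changes

earlier = {"FCS Errors": 2, "Too Long Errors": 0}
recent = {"FCS Errors": 5, "Too Long Errors": 3, "Alignment Errors": 0}
print(new_or_increased_errors(earlier, recent))
```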

EXAMPLE: You see that the history table shows that no error rates remained constant throughout the day. However, errors that did occur were on the device HUB3 and were Too Long Errors and FCS Errors.

Reproducing the Fault While Monitoring the Network

Although the RMON View in LANsentry Manager can show error rates and help you to identify the location of the problem, it may not provide enough data to solve the problem. To determine the cause of the problem, reproduce it while you monitor the network by using these applications:

• LANsentry Manager Top-N Graph - To locate a quiet node to use for reproducing the fault
• LANsentry Manager Packet Capture - To capture packets from the hub to which the quiet node is attached
• LANsentry Manager Packet Decode - To analyze the packets to assess network file server traffic and delays
• Address Tracker - To find the location of the problem nodes

EXAMPLE: Using LANsentry Manager, you find a hub on the network with a higher than normal error rate. However, the error rate does not seem high enough to cause login delays of 60 seconds or more.

LANsentry Manager Top-N Graph

Using the Top-N graph in the LANsentry Manager main window, locate a quiet node that has been showing the same problem. Choose a quiet node so that you do not receive excessive traffic when you try to isolate the problem.

EXAMPLE: You see that the node Monolith, which has the same Network File System (NFS) mounts as the other nodes on the network, is quiet. You decide to use this node for reproducing the fault. See Network File System (NFS) Protocol for more information about NFS.

LANsentry Manager Packet Capture

Using the LANsentry Manager Packet Capture application, capture packets from the network using predefined patterns and start-and-stop conditions.

Follow these steps:

1. Set up a capture buffer on a probe that is connected to the same hub as the quiet node. Until you know more about the problem, set a very general filter.

EXAMPLE: You select a MAC-layer filter and set a conversation filter to capture all packets to and from Monolith.

2. Telnet into and log out from the quiet node. Then reset the capture buffer. Repeat this procedure until you see the problem reflected in your captured data. To keep the buffer information clean, reset the buffer each time that you repeat the procedure.

3. When you see the delay, note the rough value of the packet count on the LANsentry Manager packet buffer.

By noting the packet count at which you think the delay has occurred, you can narrow the problem to within about 20 packets in the buffer. If you have used an extremely quiet node, you may even identify the exact packet.

LANsentry Manager Packet Decode

The LANsentry Manager Packet Decode application decodes all major protocols and displays the packet contents at three levels of detail: summary information, header information, and actual packet content.

Follow these steps:

1. Open the buffer in the Packet Decode application and locate the number of the packet at which the delay occurred.

2. Select the packet and launch a MAC-layer conversation filter. In the filter display, look for a gap in the conversation (that is, where the node sent a request and then resent it at approximately the same rate as the delay you experienced when recreating the problem).

3. Repeat the test to determine if the result concentrates on one node or if it appears on other nodes.

EXAMPLE: On the quiet node that you selected, the delay is obvious. You see an NFS request going out to a node and a repeat of the request 30 seconds later. During that time, the node did not respond. You now know that the delay occurred because nodes were not seeing responses for NFS requests. When you repeat the test on other nodes, you find that the delay is happening with more than one destination node.
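The gap search in step 2 amounts to finding a request that is repeated after a long silence. This Python sketch assumes the captured conversation is available as a list of (timestamp in seconds, description) pairs; the data layout, the names, and the sample values are illustrative and are not a Packet Decode format.

```python
def find_retry_gaps(packets, min_gap_s):
    """Return (first_time, retry_time, description) for each packet that
    repeats the previous packet's description after at least min_gap_s."""
    gaps = []
    for (t_prev, d_prev), (t_next, d_next) in zip(packets, packets[1:]):
        if d_next == d_prev and t_next - t_prev >= min_gap_s:
            gaps.append((t_prev, t_next, d_prev))
    return gaps

capture = [
    (0.00, "NFS request A"),
    (0.01, "NFS reply A"),
    (0.50, "NFS request B"),
    (30.50, "NFS request B"),   # same request resent, no reply in between
    (30.51, "NFS reply B"),
]
print(find_retry_gaps(capture, min_gap_s=10))
```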

Address Tracker

Use Address Tracker, which polls managed devices, to determine the hubs to which the problem nodes are attached. If the problem end stations are located on unmanaged devices, then you can at least narrow the problem to those unmanaged devices.

EXAMPLE: Although your network does not have managed hubs that Transcend® NCS management software can poll, it does have managed switches. When it polls the switches, Address Tracker displays the switch ports on which addresses were last seen. This information indicates the hub (but not the hub port) on which the device is located.

LANsentry Manager Packet Decode

After you know the location of the hub that has the problem node, monitor the problem from the hub using LANsentry Manager Packet Decode.

Follow these steps:

1. To capture packets from one of the nodes on the hub, set up another capture buffer and repeat the exercise that is described in LANsentry Manager Packet Capture. Because a delay may occur on a different node, use two capture buffers without stopping the first one.

Note the rough packet count where the delay appears.

2. Display a conversation filter of the packet where the delay appears and look for the gap in the conversation.

EXAMPLE: You hope that the nodes are on the same hub. You find that all the nodes are on HUB3. This result indicates that FCS Errors may be causing the timeouts. However, because the errors occur at a low rate, you decide to verify this diagnosis. You monitor the problem from the hub, logging in and out many times, and the delay eventually occurs. This time, the delay shows that the node's reply had an FCS Error even though the node received the request. The switch would not have transmitted this packet, causing a timeout on the NFS protocol. The retry time is presumably 30 seconds. During this test, you see the problem occurring on another node.

Correcting the Fault

Without a managed hub, you may find it very difficult to discover network file server timeout errors. To find the problematic node, you must either systematically isolate nodes by monitoring each node for a prolonged period or temporarily insert a managed hub.

EXAMPLE: You notice that the captured error packet failed FCS because it was corrupted by a regular pattern during transmission. A possible reason for this occurrence is a Jabbering node. This explanation makes sense because FCS/Jabber frames increased linearly when you were monitoring the live network.

Network File Server Timeouts Reference

This section explains terms that are relevant to network file server timeouts and provides additional conceptual and problem analysis detail.

Jabbering

Occurs when a node transmits illegal-length packets and is possibly not operating within carrier specifications. In effect, another node has written bad data over a valid packet. This bad data is often interpreted as a repeated sequence of data.

Network File System (NFS) Protocol

A distributed file system protocol developed by Sun Microsystems that allows a computer system to access files over a network as if they were on its local disks. This protocol has been incorporated into products by more than 200 companies. It is now a de facto Internet standard.

NFS is one protocol in the NFS suite of protocols, which includes NFS, RPC, XDR (External Data Representation), and others. These protocols are part of a larger architecture that Sun Microsystems refers to as Open Network Computing (ONC). ONC is a distributed applications architecture designed by Sun and currently controlled by a consortium led by Sun.

Troubleshooting TCP/IP - Detailed Steps

This article shows how to troubleshoot TCP/IP connectivity between computers on a Windows network. If you haven't already done so, disable XP's Internet Connection Firewall on all local area network connections, and remove all firewall programs on the network. Improperly configured firewalls are the most common cause of TCP/IP problems.

Open a Command Prompt Window

For many of these steps, you'll be typing at the command prompt. To open a command prompt window in Windows 2000 or XP, click Start | Run, type cmd in the box, and click OK. To open a command prompt window in Windows 95, 98, or Me, click Start | Run, type command in the box, and click OK. Type one command per line, and press Enter after each one to execute it. To close the command prompt window, use the exit command.

Determine the TCP/IP Settings

Determine the TCP/IP settings of each computer on the local area network. In XP, open the Network Connections folder, right click the LAN connection, and click Status | Support | Details. For example, here are the Status and Details views for the LAN connection on an Internet Connection Sharing host.

In Windows 95/98/Me, click Start | Run, type winipcfg in the box, and click OK. Select the LAN adapter from the menu, and click More Info. Here's the winipcfg view for an ICS client running Windows Me.

You can also see the TCP/IP settings from the command prompt. This is especially convenient if a computer has more than one network adapter. Use the ipconfig /all command, which is available in all versions except Windows 95. The output from this command can be long, so it's best to write it to a file. Specify the file name in the command this way:

ipconfig /all >ipconfig.txt

Description of TCP/IP Settings

Here are the TCP/IP settings that are used in network troubleshooting:

• IP Address - Unique address assigned to a network adapter. A computer with multiple network adapters has an IP address for each one, and each one must be in a different subnet.

• Subnet Mask - Used in conjunction with the IP address to determine which subnet an adapter belongs to. At the simplest level, communication is only possible between two network adapters when they're in the same subnet.

• Default Gateway - IP address of a computer or router, on one of this computer's local area networks, that knows how to communicate with subnets not present on this computer. For an Internet connection, the default gateway is a router belonging to your Internet service provider, and all access to sites on the Internet goes through it. For an ICS client, the default gateway is the ICS host. If you use a hardware router, it serves as the default gateway.

• DHCP Server - If an adapter is configured to obtain an IP address automatically, this is the address of the server that provides it. It could be your ISP, an ICS host, or a hardware router.

• DNS Servers - IP address of one or more Domain Name Server computers. DNS servers translate Internet names to their IP addresses (like 63.146.109.227).
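The relationship between an IP address and its subnet mask can be checked with Python's standard ipaddress module. This sketch assumes IPv4 and a mask of 255.255.255.0; the helper name same_subnet and the sample addresses are illustrative.

```python
import ipaddress

def same_subnet(ip_a, ip_b, subnet_mask):
    """Return True if both IPv4 addresses fall in the same subnet."""
    # strict=False allows a host address to be given instead of the network address
    net_a = ipaddress.ip_network(f"{ip_a}/{subnet_mask}", strict=False)
    net_b = ipaddress.ip_network(f"{ip_b}/{subnet_mask}", strict=False)
    return net_a == net_b

print(same_subnet("192.168.1.101", "192.168.1.123", "255.255.255.0"))
print(same_subnet("192.168.1.101", "192.168.2.5", "255.255.255.0"))
```

Two adapters for which this check fails need a gateway between them, which is where the Default Gateway setting comes in.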

Subnets

See our article on subnets for a brief description of how they work. For more details, see this Microsoft Knowledge Base article.

If two computers are supposed to be on the same subnet but aren't, something is wrong with the network hardware or software configuration. This is most likely to happen when one of them receives an IP address of 169.254.x.x, which indicates that:

• It's configured to obtain an IP address automatically.
• It couldn't find a DHCP server on the network to make the assignment.
• Windows assigned it an Automatic Private IP Address.
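Spotting an automatically assigned address can also be scripted. This Python sketch tests whether an address lies in the 169.254.0.0/16 range described above; the function name is an assumption made for the example.

```python
import ipaddress

APIPA_RANGE = ipaddress.ip_network("169.254.0.0/16")

def is_automatic_private_address(ip):
    """True for 169.254.x.x, the range used when no DHCP server answers."""
    return ipaddress.ip_address(ip) in APIPA_RANGE

print(is_automatic_private_address("169.254.10.20"))
print(is_automatic_private_address("192.168.1.101"))
```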

See our article on Specific Networking Problems and Their Solutions for more information.

Pinging

The ping command is the basic tool for testing TCP/IP connectivity. It sends a special packet (called ICMP Echo) to a particular IP address and looks for a reply. If everything is working right, the reply comes back. If not, the ping times out in a few seconds. By default, the ping command repeats the process four times. Here's an example of an ICS client computer pinging a Windows XP Home Edition ICS host, using the host's IP address and its computer name.

When ping fails, you'll see one of these error messages:

• Request timed out - The IP address is valid, but there's no reply from it. If the IP address is on a local area network, the most likely cause is a firewall program blocking the ping.

• Unknown host <name> or Ping request could not find host <name> - The computer name doesn't exist on the local area network. Make sure that NetBIOS over TCP/IP is enabled.

• Destination host unreachable - The IP address isn't on a local area network, and the default gateway can't access it. Either there's no default gateway, its address is wrong, or it isn't functioning.

Pinging the Local Area Network

Here is a series of ping commands to use in finding where a problem occurs on a local area network. Run them in the order shown, and don't go on to the next command until all of the previous commands work properly. In this example:

• The computer being tested is named Winxp, with IP address 192.168.1.101.
• There's another computer on the network, named Win98, with IP address 192.168.1.123.

Substitute the appropriate IP addresses and computer names for your network.

Command              Target                         What Ping Failure Indicates
ping 127.0.0.1       Loopback address               Corrupted TCP/IP installation
ping localhost       Loopback name                  Corrupted TCP/IP installation
ping 192.168.1.101   This computer's IP address     Corrupted TCP/IP installation
ping winxp           This computer's name           Corrupted TCP/IP installation
ping 192.168.1.123   Another computer's IP address  Bad hardware or NIC driver
ping win98           Another computer's name        NetBIOS name resolution failure
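The ordered ping sequence above can be automated along these lines. This Python sketch calls the system ping command once per target and stops at the first failure. The helper names are hypothetical, the addresses are the example values and must be replaced with your own, and treating exit code 0 as a reply is a simplification that may not hold on every Windows version.

```python
import platform
import subprocess

def ping_once(target, timeout_s=5):
    """Send one ICMP echo with the system ping command; True on exit code 0."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    try:
        result = subprocess.run(
            ["ping", count_flag, "1", target],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0

def first_failure(steps, ping=ping_once):
    """Run the pings in order and stop at the first one that fails.
    Returns (target, diagnosis), or None if every ping succeeds."""
    for target, diagnosis in steps:
        if not ping(target):
            return target, diagnosis
    return None

# Example addresses and names; substitute the values for your network
STEPS = [
    ("127.0.0.1", "Corrupted TCP/IP installation"),
    ("localhost", "Corrupted TCP/IP installation"),
    ("192.168.1.101", "Corrupted TCP/IP installation"),
    ("winxp", "Corrupted TCP/IP installation"),
    ("192.168.1.123", "Bad hardware or NIC driver"),
    ("win98", "NetBIOS name resolution failure"),
]
```

Calling first_failure(STEPS) on the computer being tested returns the first target that did not answer together with the likely cause, or None when every ping succeeds.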

To fix a corrupted TCP/IP installation on Windows XP, follow the steps in this Microsoft Knowledge Base article. For Windows 95/98/Me, un-install the TCP/IP protocol in Control Panel | Network, reboot, and re-install it. If that doesn't fix it, use this procedure on Windows 95 or 98.

Pinging the Internet

You can also use ping to find a problem with Internet access. Run these commands in the order shown, and don't go on to the next command until all of the previous commands work properly. Use the Default Gateway and DNS Server addresses that you got from the winipcfg or ipconfig /all command.

Command                  Target               What Ping Failure Indicates
ping w.x.y.z             Default Gateway      Default Gateway down
ping w.x.y.z             DNS Server           DNS Server down
ping w.x.y.z             Web site IP address  Internet service provider or web site down
ping www.something.com   Web site name        DNS Server down or web site down

REFERENCE

WWW.PRACTICALLYNETWORK.COM

WWW.COMPUTERHOPE.COM

WWW.WIKIPEDIA.ORG

THANK YOU

  • Basic Network Troubleshooting
  • Basic network troubleshooting
    • Issue
    • Cause
    • Solution
      • Verify connections LEDs
      • Adapter resources
      • Adapter functionality
      • Protocol
      • Firewall
      • Additional time
      • Additional troubleshooting
        • Network Troubleshooting Overview
          • About Connectivity Problems
          • About Performance Problems
          • Solving Connectivity and Performance Problems
          • Recognizing Symptoms
            • User Comments
            • Network Management Software Alerts
            • Analyzing Symptoms
              • Understanding the Problem
              • Identifying and Testing the Cause of the Problem
                • Sample Problem Analysis
                • Equipment for Testing
                  • Solving the Problem
                    • Your Network Troubleshooting Toolbox
                      • Transcend Central
                      • Status Watch
                        • Web Reporter
                          • Address Tracker
                          • LANsentry Manager
                          • Traffix Manager
                          • Device View
                          • Ping
                            • Strategies for Using Ping
                            • Tips on Interpreting Ping Messages
                              • Telnet
                              • FTP and TFTP
                              • Analyzers
                              • Probes
                              • Cable Testers
                              • Knowing Your Networks Configuration
                                • Site Network Map
                                • Device Configuration Information
                                • Other Important Data About Your Network
                                  • Identifying Your Networks Normal Behavior
                                    • Baselining Your Network
                                    • Identifying Background Noise
                                    • Identifying Background Noise
                                        • FDDI Connectivity
                                          • Understanding the Problem
                                          • Identifying the Problem
                                          • Solving the Problem
                                          • Status Watch
                                          • Implementing Dual Homing
                                          • Installing an Optical Bypass Unit
                                          • Peer Wrap Condition
                                          • Twisted Ring Condition
                                          • Undesired Connection Attempt Event
                                          • Address Tracker
                                          • Device View
                                          • Broadcast Packets
                                          • Multicast Packets
                                            • Duplicate Addresses
                                              • Understanding the Problem
                                              • Identifying the Problem
                                              • Solving the Problem
                                              • Status Watch
                                              • Address Tracker
                                              • LANsentry Manager
                                              • Duplicate MAC Addresses
                                              • Duplicate IP Addresses
                                                • Ethernet Packet Loss
                                                  • Understanding the Problem
                                                  • Identifying the Problem
                                                  • Solving the Problem
                                                  • Status Watch
                                                  • LANsentry Manager Network Statistics Graph
                                                  • Device View
                                                  • Alignment Errors
                                                  • Collisions
                                                  • CRC Errors
                                                  • Excessive Collisions
                                                  • FCS Errors
                                                  • Late Collisions
                                                  • Nonstandard Ethernet Problems
                                                  • Receive Discards
                                                  • Too Long Errors
                                                  • Too Short Errors
                                                  • Transmit Discards
                                                    • FDDI Ring Errors
                                                      • Understanding the Problem
                                                      • Identifying the Problem
                                                      • Solving the Problem
                                                      • Status Watch
                                                      • Elasticity Buffer Error Condition
                                                      • Frame Error Condition
                                                      • Frames Not Copied Condition
                                                      • Link Error Condition
                                                      • MAC Neighbor Change Event
                                                        • Network File Server Timeouts
                                                          • Understanding the Problem
                                                          • Identifying the Problem
                                                          • Solving the Problem
                                                          • Ping and Telnet
                                                          • LANsentry Manager Alarms View
                                                          • LANsentry Manager Statistics View
                                                          • LANsentry Manager History View
                                                          • LANsentry Manager Top-N Graph
                                                          • LANsentry Manager Packet Capture
                                                          • LANsentry Manager Packet Decode
                                                          • Address Tracker
                                                          • LANsentry Manager Packet Decode
                                                          • Jabbering
                                                          • Network File System (NFS) Protocol
                                                          • Troubleshooting TCP/IP - Detailed Steps
                                                          • Open a Command Prompt Window
                                                          • Determine the TCP/IP Settings
                                                          • Description of TCP/IP Settings
                                                          • Subnets
                                                          • Pinging
                                                          • Pinging the Local Area Network
                                                          • Pinging the Internet
Page 6: 11591337-Basic-Network-Troubleshooting

Solving Connectivity and Performance Problems

When you troubleshoot your network, you employ tools and knowledge already at your disposal. With an in-depth understanding of your network, you can use network software tools, such as Ping, and network devices, such as analyzers, to locate problems and then make corrections, such as swapping equipment or reconfiguring segments, based on your analysis.

Transcend® provides another set of tools for network troubleshooting. These tools have graphical user interfaces that make managing and troubleshooting your network easier. With Transcend applications, you can:

• Baseline your network's normal status to use as a basis for comparison when the network operates abnormally
• Precisely monitor network events
• Be notified immediately of critical problems on your network, such as a device losing connectivity
• Establish alert thresholds to warn you of potential problems that you can correct before they affect your network
• Resolve problems by disabling ports or reconfiguring devices

Network Troubleshooting Framework

The International Organization for Standardization (ISO) Open Systems Interconnection (OSI) reference model is the foundation of all network communications. This seven-layer structure provides a clear picture of how network communications work.

Protocols (rules) govern communications between the layers of a single system and among several systems. In this way, devices made by different manufacturers or using different designs can use different protocols and still communicate.

By understanding how network troubleshooting fits into the framework of the OSI model, you can identify at what layer problems are located and which type of troubleshooting tools to use. For example, unreliable packet delivery can be caused by a problem with the transmission media or with a router configuration. If you are receiving high rates of FCS Errors and Alignment Errors, which you can monitor with Status Watch, then the problem is probably located at the physical layer and not the network layer. Figure 1 shows how to troubleshoot the layers of the OSI model.

Table 5 describes the data that the network management tools can collect as it relates to the OSI model layers.


Table 5 Network Data and the OSI Model Layers

Layer                      Data Collected                      Transcend NCS Tool Used
Application, Presentation, Protocol information and other      • LANsentry Manager
Session, Transport         Remote Monitoring (RMON) and        • Traffix Manager (for more detail)
                           RMON2 data
Network                    Routing information                 • Status Watch
                                                               • LANsentry Manager (for more detail)
                                                               • Traffix Manager (for more detail)
Data Link                  Traffic counts and other packet     • Status Watch
                           breakdowns                          • LANsentry Manager (for more detail)
Physical                   Error counts                        • Status Watch

Figure 1 OSI Reference Model and Network Troubleshooting


Troubleshooting Strategy

How do you know when you are having a network problem? The answer to this question depends on your site's network configuration and on your network's normal behavior. See Knowing Your Network for more information.

If you notice changes on your network, ask the following questions:

• Is the change expected or unusual?
• Has this event ever occurred before?
• Does the change involve a device or network path for which you already have a backup solution in place?
• Does the change interfere with vital network operations?
• Does the change affect one or many devices or network paths?


After you have an idea of how the change is affecting your network, you can categorize it as critical or noncritical. Both of these categories need resolution (except for changes that are one-time occurrences); the difference between the categories is the time that you have to fix the problem.

By using a strategy for network troubleshooting, you can approach a problem methodically and resolve it with minimal disruption to network users. It is also important to have an accurate and detailed map of your current network environment. Beyond that, a good approach to problem resolution is:

• Recognizing Symptoms
• Understanding the Problem
• Identifying and Testing the Cause of the Problem
• Solving the Problem

Recognizing Symptoms

The first step to resolving any problem is to identify and interpret the symptoms. You may discover network problems in several ways. Users may complain that the network seems slow or that they cannot connect to a server. You may pass your network management station and notice that a node icon is red. Your beeper may go off and display the message "WAN connection down."

User Comments

Although you can often solve networking problems before users notice a change in their environment, you invariably get feedback from your users about how the network is running, such as:

• They cannot print.
• They cannot access the application server.
• It takes them much longer to copy files across the network than it usually does.
• They cannot log on to a remote server.
• When they send e-mail to another site, they get a routing error message.
• Their system freezes whenever they try to Telnet.

Network Management Software Alerts

Network management software, as described in Your Network Troubleshooting Toolbox, can alert you to areas of your network that need attention. For example:

• The application displays red (Warning) icons.
• Your weekly Top-N utilization report (which indicates the 10 ports with the highest utilization rates) shows that one port is experiencing much higher utilization levels than normal.
• You receive an e-mail message from your network management station that the threshold for broadcast and multicast packets has been exceeded.

These signs usually provide additional information about the problem, allowing you to focus on the right area.
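
To illustrate the idea behind a Top-N utilization report, the following Python sketch ranks ports by utilization and flags ports that are well above their baseline. The port names, the sample figures, and the factor of 2 are assumptions made for this example. They are not taken from Transcend or any other product.

```python
def top_n_ports(utilization, n=10):
    """Return the n (port, utilization %) pairs with the highest utilization."""
    return sorted(utilization.items(), key=lambda item: item[1], reverse=True)[:n]


def above_baseline(utilization, baseline, factor=2.0):
    """Return ports whose utilization exceeds factor times their baseline value.

    The factor is a hypothetical alert level; choose one that suits your network.
    """
    return [
        port
        for port, value in utilization.items()
        if port in baseline and value > factor * baseline[port]
    ]


# Hypothetical sample data: percent utilization per port.
current = {"1/1": 12.0, "1/2": 71.0, "1/3": 8.5}
normal = {"1/1": 10.0, "1/2": 15.0, "1/3": 9.0}

print(top_n_ports(current, n=2))
print(above_baseline(current, normal))
```

Comparing against a recorded baseline, rather than against a fixed number, reflects the point that "higher than normal" depends on each port's usual load.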

Analyzing Symptoms

When a symptom occurs, ask yourself these types of questions to narrow the location of the problem and to get more data for analysis:

• To what degree is the network not acting normally (for example, does it now take one minute to perform a task that normally takes five seconds)?
• On what subnetwork is the user located?
• Is the user trying to reach a server, end station, or printer on the same subnetwork or on a different subnetwork?
• Are many users complaining that the network is operating slowly or that a specific network application is operating slowly?
• Are many users reporting network logon failures?
• Are the problems intermittent? For example, some files may print with no problems, while other printing attempts generate error messages, make users lose their connections, and cause systems to freeze.

Understanding the Problem

Networks are designed to move data from a transmitting device to a receiving device. When communication becomes problematic, you must determine why data are not traveling as expected and then find a solution. The two most common causes for data not moving reliably from source to destination are:

• The physical connection breaks (that is, a cable is unplugged or broken).
• A network device is not working properly and cannot send or receive some or all data.

Network management software can easily locate and report a physical connection break (layer 1 problem). It is more difficult to determine why a network device is not working as expected, which is often related to a layer 2 or a layer 3 problem.

To determine why a network device is not working properly, look first for:

• Valid service - Is the device configured properly for the type of service it is supposed to provide? For example, has Quality of Service (QoS), which is the definition of the transmission parameters, been established?
• Restricted access - Is an end station supposed to be able to connect with a specific device, or is that connection restricted? For example, is a firewall set up that prevents that device from accessing certain network resources?
• Correct configuration - Is there a misconfiguration of IP address, subnet mask, gateway, or broadcast address? Network problems are commonly caused by misconfiguration of newly connected or configured devices. See Manager-to-Agent Communication for more information.
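
The "correct configuration" check lends itself to a quick consistency test. The following Python sketch uses the standard ipaddress module to confirm that a gateway lies inside the device's subnet and that the broadcast address matches the subnet mask. The function name and the sample addresses are hypothetical, and the sketch assumes a conventional IPv4 subnet (not a /31 or /32 network).

```python
import ipaddress


def check_ip_config(address, netmask, gateway, broadcast):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    # strict=False derives the network from a host address plus its mask.
    network = ipaddress.ip_network(f"{address}/{netmask}", strict=False)
    host = ipaddress.ip_address(address)

    # A host should not use the network address or the broadcast address
    # (assumes a conventional subnet, not /31 or /32).
    if host in (network.network_address, network.broadcast_address):
        problems.append(f"{address} is the network or broadcast address of {network}")

    # A default gateway must be directly reachable, so it must be on the same subnet.
    if ipaddress.ip_address(gateway) not in network:
        problems.append(f"gateway {gateway} is outside {network}")

    # The configured broadcast address should match the one the mask implies.
    if ipaddress.ip_address(broadcast) != network.broadcast_address:
        problems.append(f"broadcast {broadcast} should be {network.broadcast_address}")

    return problems


# Hypothetical settings with two deliberate mistakes.
for problem in check_ip_config("192.168.1.10", "255.255.255.0", "192.168.2.1", "192.168.1.0"):
    print(problem)
```

A check like this only catches settings that contradict each other. It cannot tell whether an address that is consistent is also the one the network plan intended.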

Identifying and Testing the Cause of the Problem

After you develop a theory about the cause of the problem, test your theory. The test must conclusively prove or disprove your theory.

Two general rules of troubleshooting are:

• If you cannot reproduce a problem, then no problem exists unless it happens again on its own.
• If the problem is intermittent and you cannot replicate it, you can configure your network management software to catch the event in progress.

For example, with LANsentry Manager, you can set alarms and automatic packet capture filters to monitor your network and inform you when the problem occurs again. See Configuring Transcend NCS for more information.

Although network management tools can provide a great deal of information about problems and their general location, you may still need to swap equipment or replace components of your network until you locate the exact trouble spot.

After you test your theory, either fix the problem as described in Solving the Problem or develop another theory.

Sample Problem Analysis

This section illustrates the analysis phase of a typical troubleshooting incident.

On your network, a user cannot access the mail server. You need to establish two areas of information:

• What you know - In this case, the user's workstation cannot communicate with the mail server.
• What you do not know and need to test -
  • Can the workstation communicate with the network at all, or is the problem limited to communication with the server? Test by sending a Ping or by connecting to other devices.
  • Is the workstation the only device that is unable to communicate with the server, or do other workstations have the same problem? Test connectivity at other workstations.
  • If other workstations cannot communicate with the server, can they communicate with other network devices? Again, test the connectivity.

The analysis process follows these steps:

1. Can the workstation communicate with any other device on the subnetwork?
   • If no, then go to step 2.
   • If yes, determine if only the server is unreachable.
     • If only the server cannot be reached, this suggests a server problem. Confirm by doing step 2.
     • If other devices cannot be reached, this suggests a connectivity problem in the network. Confirm by doing step 3.

2. Can other workstations communicate with the server?
   • If no, then most likely it is a server problem. Go to step 3.
   • If yes, then the problem is that the workstation is not communicating with the subnetwork. (This situation can be caused by workstation issues or a network issue with that specific station.)

3. Can other workstations communicate with other network devices?
   • If no, then the problem is likely a network problem.
   • If yes, the problem is likely a server problem.
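
The three questions above can be read as a small decision procedure. The following Python sketch is one simplified reading of those steps, not a definitive algorithm. The function name, the parameter names, and the wording of the hints are assumptions made for illustration. In this reading, the answers to steps 2 and 3 decide the verdict, and the answer to step 1 only refines the advice for the workstation case.

```python
def locate_fault(ws_reaches_subnet, others_reach_server, others_reach_devices):
    """Return (suspect, hint) from the answers to steps 1, 2, and 3.

    ws_reaches_subnet:    step 1, can the workstation reach any other device?
    others_reach_server:  step 2, can other workstations reach the server?
    others_reach_devices: step 3, can other workstations reach other devices?
    """
    if others_reach_server:
        # Step 2 answered yes: the server and the shared path work for others,
        # so the fault is specific to this workstation.
        if ws_reaches_subnet:
            hint = "check how the workstation is configured for that server"
        else:
            hint = "check the workstation and its own network connection"
        return "workstation", hint

    if others_reach_devices:
        # Step 3 answered yes: the network carries other traffic,
        # so only the server is affected.
        return "server", "check that the server is running, connected, and configured"

    # Step 3 answered no: nobody reaches anything.
    return "network", "examine the devices on the path between users and server"


print(locate_fault(True, True, True))
print(locate_fault(False, False, True))
print(locate_fault(False, False, False))
```

The inputs are the results of manual tests such as Ping. The sketch only records the reasoning and performs no network access itself.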

When you determine whether the problem is with the server, subnetwork, or workstation, you can further analyze the problem as follows:

• For a problem with the server - Examine whether the server is running, if it is properly connected to the network, and if it is configured appropriately.
• For a problem with the subnetwork - Examine any device on the path between the users and the server.
• For a problem with the workstation - Examine whether the workstation can access other network resources and if it is configured to communicate with that particular server.


Equipment for Testing

To help identify and test the cause of problems, have available:

• A laptop computer that is loaded with a terminal emulator, TCP/IP stack, TFTP server, CD-ROM drive (to read the online documentation), and some key network management applications, such as LANsentry® Manager. With the laptop computer, you can plug into any subnetwork to gather and analyze data about the segment.
• A spare managed hub to swap for any hub that does not have management. Swapping in a managed hub allows you to quickly spot which port is generating the errors.
• A single port probe to insert in the network if you are having a problem where you do not have management capability.
• Console cables for each type of connector, labeled and stored in a secure place.

Solving the Problem

Many device or network problems are straightforward to resolve, but others yield misleading symptoms. If one solution does not work, continue with another.

A solution often involves:

• Upgrading software or hardware (for example, upgrading to a new version of agent software or installing Gigabit Ethernet devices)
• Balancing your network load by analyzing:
  • What users communicate with which servers
  • What the user traffic levels are in different segments
  Based on these findings, you can decide how to redistribute network traffic.
• Adding segments to your LAN (for example, adding a new switch where utilization is continually high)
• Replacing faulty equipment (for example, replacing a module that has port problems or replacing a network card that has a faulty jabber protection mechanism)

To help solve problems, have available:

• Spare hardware equipment (such as modules and power supplies), especially for your critical devices
• A recent backup of your device configurations to reload if flash memory gets corrupted (which can sometimes happen due to a power outage)


Your Network Troubleshooting Toolbox

A robust network troubleshooting toolbox consists of items (such as network management applications, hardware devices, and other software) to recognize, diagnose, and solve networking problems. It contains:

• Transcend Applications
• Network Management Platforms
• 3Com SmartAgent Embedded Software
• Other Commonly Used Tools

Transcend management software is optimized for managing 3Com devices and their attached networks. However, some applications, such as LANsentry Manager, can manage any vendor's networking equipment that complies with the Remote Monitoring (RMON) Management Information Base (MIB).

This section describes these Transcend applications, which you can use to troubleshoot your network:

• Transcend Central
• Status Watch
• Address Tracker
• LANsentry Manager
• Traffix Manager
• Device View

Transcend Central

Start with Transcend Central, which is an asset management and device grouping application, to understand what your network consists of and to control the Transcend NCS network management troubleshooting tools. Transcend Central is available as both a native Windows application and a Java application that you can access using a Web browser.

Using Transcend Central for troubleshooting, you can:

• Display an inventory of device, module, and port information.
• Group devices to make your troubleshooting tasks easier. By managing a collection of devices, you can simultaneously perform the same tasks on each device in a group and locate physical or logical problems on your network.
• Launch Transcend NCS applications, including some of your primary Transcend NCS troubleshooting tools:
  • Status Watch, which includes Web Reporter (from the Java version)
  • Address Tracker
  • LANsentry Manager
  • Traffix Manager
  • Device View

Status Watch

The Status Watch applications manage 3Com devices and their attached networks. Status Watch applications primarily poll for MIB-II data. Status Watch is a performance-monitoring application that allows you to monitor the operational status of your network devices and quickly identify any problems that require your attention. It works in conjunction with Web Reporter.

Web Reporter

Web Reporter is a data-reporting application that runs in a World Wide Web (WWW) browser. It generates reports from data that Status Watch collects, allowing you to compare network statistics against a baseline.

Address Tracker

Address Tracker is an address collection and discovery application that:

• Polls managed devices for all MAC addresses
• Polls managed devices and routers for IP addresses to perform MAC-to-IP address translation
• Uses Device View to disable troublesome ports

LANsentry Manager

LANsentry Manager is a set of integrated applications that displays and explores the real-time and historical data that RMON-compliant devices (probes) on the network capture. LANsentry Manager uses SNMP polling to gather RMON and RMON2 data from the probes.

Use LANsentry Manager to:

• Monitor current performance of network segments
• See trends over time
• Spot signs of current problems
• Configure alarms to monitor for specific events
• Capture packets and display their contents

LANsentry Manager works with any device (from 3Com or other vendors) that supports the RMON MIB or the RMON2 MIB.


Traffix Manager

Traffix Manager is a performance-monitoring application that provides information about layer 2 (RMON) and layer 3 conversations between nodes. It helps you to assess traffic patterns on your network. Traffix Manager:

• Monitors all the stations that the RMON2-compliant probes encounter on your network
• Captures and stores RMON and RMON2 data for your network's protocols and applications
• Displays traffic between stations in user-defined views of the network
• Graphs current or historical data on the devices selected
• Delivers reports for user-specified stations and time periods as PostScript to your printer or as HTML to your Web server
• Launches LANsentry Manager tools for in-depth analysis of a station or a conversation between stations

You can use Traffix Manager to:

• Know your network - Understand overall flow patterns and interactions between systems, and determine how your network is really being used at the application level.
• Optimize your network - Gain an insight into traffic and application usage trends to help you optimize the use and placement of current network resources and make wise decisions about capacity planning and network growth.

Traffix Manager works with any device (from 3Com or other vendors) that supports the RMON2 MIB.

Device View

The Device View application is a device configuration tool. When you troubleshoot your network, you can use Device View to determine or change a device's configuration. You can also use Device View to look at a device's statistics and to set alarms.

Device View manages only 3Com devices.

Network Management Platforms

As part of your troubleshooting toolbox, your network management platform is the first place to go to view the overall health of your network. With the platform, you can understand the logical configuration of your network and configure views of your network to understand how devices work together and the role that they play in the users' work. The network management platform that supports your Transcend software installation can provide valuable troubleshooting tools. Transcend runs on several platforms within the NT and UNIX environments.

The platform discovers the devices. Transcend imports that information from the platform to populate the core database. Unless you are rediscovering, you must update the platform manually.

Using this device database, a map displays the graphical representation of your network. Each device on your network appears as a symbol (icon) on the map. You can configure views of your network to show devices on the same subnetworks or floors.

You can monitor network performance and diagnose network performance and connectivity problems. You can also:

• Take a snapshot of your network in its normal state. The snapshot records the state of your network at a particular instant. If you later have network performance problems, you can compare the current state of your network to the snapshot.
• Quickly determine the connectivity status of a device by noting the color of its map symbol. Red usually means that communication with a device has ceased.
• Diagnose connectivity problems by determining whether two devices can communicate. If they can communicate, then examine the route between the devices, the number of packets that were sent and lost, and the roundtrip time between the two devices.
• Manage MIB information (for example, collecting and storing MIB data for trend analysis and graphing) using MIB queries. Transcend compiles MIBs and allows you to navigate up and down the MIB Tree to retrieve MIB objects from devices. You can set thresholds for MIB data and generate events when a threshold is exceeded.
• Configure the software to act on certain events. The Event Categories window informs you of any unexpected events (which arrive in the form of traps).

For more information, see the documentation that is shipped with your software.

3Com SmartAgent Embedded Software

Traditional Simple Network Management Protocol (SNMP) management places the burden of collecting network management information on the management station. In this traditional model, software agents collect information about throughput, record errors or packet overflows, and measure performance based on established thresholds. Through a polling process, agents pass this information to a centralized network management station whenever they receive an SNMP query. Management applications then make the data useful and alert the user if there are problems on the device.


As a useful companion to traditional network management methods, 3Com's SmartAgent® technology places management intelligence into the software agent that runs within a 3Com device. This scalable solution reduces the amount of computational load on the management station and helps minimize management-related network traffic.

SmartAgent software, which uses the RMON MIB, is self-monitoring, collecting and analyzing its own statistical, analytical, and diagnostic data. In this way, you can conduct network management by exception - that is, you are notified only if a problem occurs. Management by exception is unlike traditional SNMP management, in which the management software collects all data from the device through polling.

SmartAgent software works autonomously and reports to the network management station whenever an exceptional network event occurs. The software can also take direct action without involving the management station. Devices that contain SmartAgent software may be able to:

• Perform broadcast throttling to minimize the flow of broadcast traffic on your network
• Monitor the ratio of good frames to bad frames
• Switch a resilient link pair to the standby path if the primary path corrupts frames
• Report if traffic on vital segments drops below minimum usage levels
• Disable a port for five seconds to clear problems and then automatically reconnect it
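
The idea of management by exception can be sketched in a few lines of Python. This is an illustration only, not SmartAgent code. The function names, the sample counters, and the 1 percent threshold are assumptions. The point is that the agent evaluates its own counters and reports only the ports that cross a threshold, instead of sending every sample to the management station.

```python
def bad_frame_ratio(good_frames, bad_frames):
    """Fraction of all frames that were bad; 0.0 when no frames were seen."""
    total = good_frames + bad_frames
    return bad_frames / total if total else 0.0


def exception_events(samples, threshold=0.01):
    """Return (port, ratio) only for ports whose bad-frame ratio exceeds threshold.

    samples is an iterable of (port, good_frames, bad_frames) tuples.
    The default threshold of 1 percent is a hypothetical value.
    """
    events = []
    for port, good, bad in samples:
        ratio = bad_frame_ratio(good, bad)
        if ratio > threshold:
            events.append((port, ratio))
    return events


# Hypothetical counters gathered locally by the agent.
counters = [("port1", 9990, 10), ("port2", 900, 100), ("port3", 0, 0)]

# Only port2 would be reported; the other ports generate no management traffic.
print(exception_events(counters))
```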

The Transcend NCS applications LANsentry Manager and Traffix Manager make the RMON data that the SmartAgent software collects more usable by summarizing and correlating important information.

Other Commonly Used Tools

These commonly used tools can also help you troubleshoot your network:

• Network software, such as Ping, Telnet, FTP, and TFTP. You can use these applications to troubleshoot, configure, and upgrade your system.
• Network monitoring devices, such as Analyzers and Probes
• Tools, such as Cable Testers, for working on physical problems

Ping

Packet Internet Groper (Ping) allows you to quickly verify the connectivity of your network devices. Ping attempts to transmit a packet from one device to a station on the network and listens for the response to ensure that it was correctly received. You can validate connections on the parts of your network by pinging different devices:

• A successful response indicates that a valid network path exists between your station and the remote host and that the remote host is active.
• Slower response times than normal can indicate that the path is congested or obstructed.
• A failed response indicates that a connection is broken somewhere; use the message to help locate the problem. See Tips on Interpreting Ping Messages.

Strategies for Using Ping

Follow these strategies for using Ping:

• Ping devices when your network is operating normally so that you have a performance baseline for comparison. See Identifying Your Network's Normal Behavior for more information.
• Ping by IP address when:
  • You want to test devices on different subnetworks. This method allows you to Ping your network segments in an organized way, rather than having to remember all the hostnames and locations.
  • Your Domain Name System (DNS) server is down and your system cannot look up host names properly. You can Ping with IP addresses even if you cannot access hostname information.
• Ping by hostname when you want to identify DNS server problems.
• To troubleshoot problems that involve large packet sizes, Ping the remote host repeatedly, increasing the packet size each time.
• To determine if a link is erratic, perform a continuous Ping, which indicates the time that it takes the device to respond to each Ping. The option varies by system: for example, ping -s on Solaris and ping -t on Windows, while ping on Linux runs continuously by default and uses -s to set the packet size.
• To determine a route taken to a destination, use the trace route function (tracert on Windows, traceroute on UNIX).
• Consider creating a Ping script that periodically sends a Ping to all necessary networking devices. If a Ping failure message is received, the script can perform some action to notify you of the problem, such as paging you.
• Use the Ping functions of your network management platform. For example, in your HP OpenView map, select a device and click the right mouse button to gain access to Ping functions.
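
The Ping script suggested in the strategies above might look like the following Python sketch. It is a minimal example under stated assumptions: the device addresses are placeholders, the notify function simply prints where a real script would page or e-mail you, and the script calls the operating system's own ping command. The meaning of the ping exit code varies by platform (some implementations report success when a router answers with an unreachable message), so treat the result as a hint rather than proof.

```python
import platform
import subprocess


def ping_once(host, timeout_s=5.0):
    """Send one echo request with the system ping command; True if it exits with 0."""
    # Windows uses -n for the packet count; UNIX-like systems use -c.
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    try:
        result = subprocess.run(
            ["ping", count_flag, "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, OSError):
        # No answer within the time limit, or the ping command is unavailable.
        return False
    return result.returncode == 0


def find_unreachable(hosts, ping=ping_once):
    """Return the hosts that did not answer; ping can be replaced for testing."""
    return [host for host in hosts if not ping(host)]


def notify(failed_hosts):
    """Placeholder notification; replace with paging or e-mail as needed."""
    for host in failed_hosts:
        print(f"ALERT: no Ping reply from {host}")


# Hypothetical device list; replace with the addresses of your own devices.
DEVICES = ["192.0.2.1", "192.0.2.10"]
```

Running `notify(find_unreachable(DEVICES))` from a scheduler such as cron or Task Scheduler gives the periodic check described above. Using IP addresses in the device list keeps the check working when DNS is down.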

Tips on Interpreting Ping Messages

Use the following Ping failure messages to troubleshoot problems:

No reply from <destination>

Indicates that the destination routes are available but that there is a problem with the destination itself.

<destination> is unreachable

Indicates that your system does not know how to get to the destination. This message means either that routing information to a different subnetwork is unavailable or that a device on the same subnetwork is down.

ICMP host unreachable from gateway

Indicates that your system can transmit to the target address using a gateway but that the gateway cannot forward the packet properly because either a device is misconfigured or the gateway is not operating.
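
If you collect Ping output in a script, the three messages above can be mapped to their explanations automatically. The following Python sketch assumes the message wording shown in this section. Actual Ping implementations word their failure messages differently, so the patterns are illustrative and would need adjusting for your system.

```python
# Patterns are checked in order, because the gateway message
# also contains the word "unreachable".
PING_FAILURE_HINTS = [
    ("icmp host unreachable",
     "Your system can reach a gateway, but the gateway cannot forward the "
     "packet: a device is misconfigured or the gateway is not operating."),
    ("no reply from",
     "Routes to the destination are available, but there is a problem with "
     "the destination itself."),
    ("is unreachable",
     "Your system does not know how to get to the destination: routing "
     "information to a different subnetwork is unavailable, or a device on "
     "the same subnetwork is down."),
]


def interpret_ping_failure(message):
    """Return the explanation for a Ping failure message, or None if unrecognized."""
    text = message.lower()
    for pattern, hint in PING_FAILURE_HINTS:
        if pattern in text:
            return hint
    return None


print(interpret_ping_failure("No reply from 10.1.1.5"))
```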

Telnet

Telnet, which is a login and terminal emulation program for Transmission Control Protocol/Internet Protocol (TCP/IP) networks, is a common way to communicate with an individual device. You log in to the device (a remote host) and use that remote device as if it were a local terminal.

If you have established an out-of-band Telnet connection with a device, you can use Telnet to communicate with that device even if the network is unavailable. This feature makes Telnet one of the most frequently used network troubleshooting tools. Usually, all device statistics and configuration capabilities are accessible by using Telnet to connect to the device's console. For more information about setting up an out-of-band connection, see Using Telnet, Serial Line, and Modem Connections.

You can invoke the Telnet application on your local system and set up a link to a Telnet process that is running on a remote host. You can then run a program that is located on a remote host as if you were working at the remote system.

FTP and TFTP

Most network devices support either the File Transfer Protocol (FTP) or the Trivial File Transfer Protocol (TFTP) for downloading updates of system software. Updating system software is often the solution to networking problems that are related to agent problems. Also, new software features may help correct a networking problem.

FTP provides flexibility and security for file transfer by:

• Accepting many file formats, such as ASCII and binary
• Using data compression
• Providing Read and Write access so that you can display, create, and delete files and directories
• Providing password protection


TFTP is a simple version of FTP that does not list directories or require passwords. TFTP only transfers files to and from a remote server.

Analyzers

An analyzer, which is often called a Sniffer, is a network device that collects network data on the segment to which it is attached, a process called packet capturing. Software on the device analyzes this data, which is a process referred to as protocol analysis. Most analyzers can interpret different types of protocol traffic, such as TCP/IP, AppleTalk, and Banyan VINES traffic.

You usually use analyzers for reactive troubleshooting - when you see a problem somewhere on your network, you attach an analyzer to capture and interpret the data from that area. Analyzers are particularly helpful for identifying intermittent problems. For example, if your network backbone has experienced moments of instability that prevent users from logging on to the network, you can attach an analyzer to the backbone to capture the intermittent problems when they happen again.

Probes

Like an analyzer, a probe is a network device that collects network data. Depending on its type, a probe can collect data from multiple segments simultaneously. It stores the collected data and transfers the data to an analysis site when requested. Unlike an analyzer, a probe does not interpret data.

A probe can be either a stand-alone device or an agent in a network device. The Transcend Enterprise Monitor 500 series and the SuperStack® II Monitor series are stand-alone RMON probes. LANsentry Manager and Traffix Manager use data from probes that comply with the RMON MIB or the RMON2 MIB.

You can use a probe daily to determine the health of your network. The Transcend NCS applications can interpret and report this data, alerting you to possible problems so that you can proactively manage your network. For example, an RMON2 probe can help you to analyze traffic patterns on your network. Use this data to make decisions about reconfiguring devices and end stations as needed.

Cable Testers

Cable testers examine the electrical characteristics of the wiring. They are most commonly used to ensure that building wiring and cables meet Category 5, 4, and 3 standards. For example, network technologies such as Fast Ethernet require the cabling to meet Category 5 requirements. Testers are also used to find defective and broken wiring in a building.

Knowing Your Network

You can better troubleshoot the problems on your network by:

• Knowing Your Network's Configuration
• Identifying Your Network's Normal Behavior

Knowing Your Network's Configuration

Part of understanding your network is knowing its physical and logical configuration. You should know:

• Which devices are on your network
• How the devices are configured
• Which devices are attached to the backbone
• Which devices connect your network to the outside world (WAN)

To keep track of your network's configuration, gather the following information:

• Site Network Map
• Logical Connections
• Device Configuration Information
• Other Important Data About Your Network

This data, when kept up-to-date, is extremely helpful for locating information when you experience network or device problems.

Site Network Map

A network map helps you to:

• Know exactly where each device is physically located
• Easily identify the users and applications that are affected by a problem
• Systematically search each part of your network for problems

You can create a network map using any drawing or flow chart application. Store your network map online. In addition, make sure that you always have a current version on paper in case you cannot access the online version. Figure 8 shows an example of a network map of 3Com devices.

Figure 8 Example of a Site Network Map

Consider including the following information on your network map:

• Location of important devices and workgroups (by floor, building, or area)
• Location of the network backbone, data center, and wiring closets, as appropriate for your network
• Location of your network management stations
• Location and type of remote connections
• IP subnetwork addresses for all managed switches and hubs
• Other subnetwork addresses, such as Novell IPX and AppleTalk, if appropriate for your network
• Type of media (by actual name, such as 10BASE-T, or by grouping, such as Ethernet), which you can show with callouts, colors, line weights, or line styles
• Virtual workgroups, which you can show with colors or shaded areas
• Redundant links, which you can indicate with gray or dashed lines
• Types of network applications that are used in different areas of your network
• Types of end stations that are connected to the switches and hubs

Logical Connections

With the advent of virtual LANs (VLANs), you need to know how your devices are connected logically as well as physically. For example, if you have connected two devices through the same physical switch, you can assume that they can communicate with each other. However, the devices can be in separate VLANs that restrict their communication.

Knowing the setup of your VLANs can help you to quickly narrow the scope of a problem to a VLAN instead of to a network connection.
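
As a rough illustration of the point above, the following Python sketch models a port-to-VLAN table for one switch and checks whether two ports share a VLAN. The port names, VLAN IDs, and the same_vlan helper are assumptions made for this example; they are not taken from any 3Com tool.

```python
# Hypothetical port-to-VLAN assignments for one physical switch.
PORT_VLANS = {
    "port1": {10},
    "port2": {10, 20},
    "port3": {30},
}


def same_vlan(port_a, port_b, table=PORT_VLANS):
    """Return True if the two ports share at least one VLAN."""
    return bool(table.get(port_a, set()) & table.get(port_b, set()))


if __name__ == "__main__":
    # Both ports are on the same switch, but only the first pair shares a VLAN.
    print(same_vlan("port1", "port2"))
    print(same_vlan("port1", "port3"))
```

If a user on port1 cannot reach a device on port3, a lookup like this points to the VLAN assignment rather than to the cabling.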

The Transcend NCS application Enterprise VLAN Manager allows you to view the logical makeup of your network. Depending on the complexity of your network and VLAN configurations, you can use colors to show the VLANs graphically on your network map.

Device Configuration Information

Maintain online and paper copies of device configuration information. Make sure that all online data is stored with your site's regular data backup. If your site does not have a backup system, copy the information onto a backup disc (CD, Zip disk, and the like) and store it offsite.

Follow these guidelines for saving configuration information:

• Because the easiest way to recover a device's configuration is to use FTP or TFTP, save the configuration settings of each device that supports this method of uploading.

• For other devices, Telnet in and save the session (which contains configuration details) to a file. If you cannot print the configuration of a device, then create a quick rebuild guide that explains the quickest way to configure the device from a fresh install.

• For devices that store information to diskette, store this data as part of your site's regular backup.

• For routers and other important devices with text configuration files, store this data online in a revision control system. Keep the most recent version on paper. Keep previous versions.

• For PCs, keep a recovery disk for each type of PC. For any device that you use as a server, store all startup scripts and copies of registries.

Other Important Data About Your Network

For a complete picture of your network, have the following information available:

• All passwords - Store passwords in a safe place. Keep previous passwords in case you restore a device to a previous software version and need to use the old password that was valid for that version.

• Device inventory - The inventory allows you to see the device type, IP address, ports, MAC addresses, and attached devices at a glance. Software tools such as Transcend Central can help you keep track of the 3Com devices on your network. Using Transcend Central, you can group devices by type and location and have this information on hand for troubleshooting.

• MAC address-to-port number list - If your hubs or switches are not managed, you must keep a list of the MAC addresses that correlate to the ports on your hubs and switches. Generate and keep a paper copy of this list, which is crucial for deciphering captured packets using Address Tracker.

• Log book - Document your interactions, no matter how trivial, with each device that is critical to your network's operation (that is, routers, remote access devices, security servers). For example, document that you noticed a fan making noise one morning. Your note may help you to identify why a device is over temperature a week later (because the fan stopped working).

• Change control - Maintain a change control system for all critical systems. Permanently store change control records.

• Contact details - Store, online and on paper, the details of all support contracts, support numbers, engineer details, and telephone and fax numbers.

• LANsentry Reporter - Use LANsentry Reporter to generate reports from the database.

Identifying Your Network's Normal Behavior

By monitoring your network over a long period, you begin to understand its normal behavior. You begin to see a pattern in the traffic flow, such as which servers are typically accessed, when peak usage times occur, and so on. If you are familiar with your network when it is fully operational, you can be more effective at troubleshooting problems that arise.

Baselining Your Network

You can use a baseline analysis, which is an important indicator of overall network health, to identify problems. A baseline can serve as a useful reference of network traffic during normal operation, which you can then compare to captured network traffic while you troubleshoot network problems. A baseline analysis speeds the process of isolating network problems.

By running tests on a healthy network, you compile normal data to compare against the results that you get when your network is in trouble. For example, Ping each node to discover how long it typically takes you to receive a response from devices on your network.
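
The comparison described above can be sketched in Python. This is a minimal illustration, not part of any Transcend application: it assumes you have already collected Ping round-trip times (in milliseconds) on a healthy network, and the function names and the three-standard-deviation cutoff are arbitrary choices made for the example.

```python
import statistics


def build_baseline(samples_ms):
    """Summarize round-trip times (ms) collected while the network is healthy."""
    return {
        "mean": statistics.mean(samples_ms),
        "stdev": statistics.pstdev(samples_ms),
    }


def is_abnormal(rtt_ms, baseline, k=3.0):
    """Flag a reading more than k standard deviations above the baseline mean.

    The default k=3.0 is an assumed cutoff; adjust it for your own network.
    """
    return rtt_ms > baseline["mean"] + k * baseline["stdev"]


if __name__ == "__main__":
    healthy = [2.0, 2.5, 3.0, 2.5]       # hypothetical samples for one node
    baseline = build_baseline(healthy)
    print(baseline)
    print(is_abnormal(2.8, baseline))    # close to normal
    print(is_abnormal(40.0, baseline))   # far above normal
```

Keeping one baseline per node lets you compare a troubled reading against that node's own history instead of a network-wide average.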

Applications such as Status Watch, Address Tracker, LANsentry Manager, and Traffix Manager allow you to collect days and weeks of data and set a baseline for comparison. Through the reporting mechanisms in the following list, you can continuously assess the data from your network and ensure that its performance is optimal:

• Web Reporter generates daily or weekly reports from data collected by Status Watch.

• Traffix Manager generates weekly reports from collected data and calculates the baselines for you. Set up Utilization History and Error History reports with data resolution set to Weekly.

• LANsentry Manager History View generates daily utilization graphs, which are sampled every 30 minutes for each day over one week. Use these graphs to calculate your network baselines manually.

Identifying Background Noise

Know your network's background noise so that you can recognize real data flow. For example, one evening after everyone is gone, no backups are running, and most nodes are on, analyze the traffic on your network using the Traffix Manager application. The traffic that you see is mostly broadcast and multicast packets. Any errors that you see are the result of faulty devices (trace). This traffic is the background noise of your network: traffic that occurs for little value. If background noise is high, redesign your network.

FDDI Connectivity

Use these sections to identify and correct connectivity errors on an FDDI ring:

• FDDI Connectivity Overview
• Monitoring FDDI Connections
• Making Your FDDI Connections More Resilient

Fiber Distributed Data Interface (FDDI), which is a self-correcting technology, automatically corrects ring faults to maintain connectivity throughout most of the network. However, you should monitor your FDDI connections for wrapped rings and other problems with ring connectivity.

Understanding the Problem

As shown in Figure 9, in a thru FDDI LAN, no stations on the trunk ring have a Configuration State (SMTConfigurationState) of Wrap or Isolated. However, users who complain about network performance may have lost connectivity to other stations on the network because the FDDI network is wrapped or segmented.

Figure 9 Thru Ring

Wrapped ring

By monitoring the Peer Wrap Condition, you can see when the Configuration State changes. In a wrapped ring (Figure 10), two stations on the LAN are in a wrapped Configuration State. This condition may or may not affect the connectivity of certain stations. Although operational, your network may have a cabling problem or a problem with a link.

Figure 10 Wrapped LAN

Segmented ring

In a segmented ring (Figure 11), more than two stations are wrapped on the trunk ring. Although this mode of operation is a valid FDDI LAN configuration, your LAN is probably experiencing a degraded or degrading condition.

Figure 11 Segmented Ring

When a network connection has excessively high link errors, Station Management (SMT) shuts down the connection and tries to reconnect. A dual-attachment trunk ring station with an A or B connection that is shut down is one of the wrap points in the network. See Making Your FDDI Connections More Resilient for information about keeping a dual-attachment station connection from wrapping.
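
The thru, wrapped, and segmented descriptions above amount to counting wrapped stations on the trunk ring. The Python sketch below shows that counting rule under simplifying assumptions: each station reports one of the strings "Thru", "Wrap", or "Isolated" (real SMT configuration states are more detailed), and the classify_ring function is a hypothetical helper, not a Status Watch feature.

```python
def classify_ring(config_states):
    """Classify a trunk ring from simplified per-station configuration states.

    config_states: iterable of "Thru", "Wrap", or "Isolated" strings
    (an assumed simplification of the SMT configuration state).
    """
    states = list(config_states)
    wrapped = sum(1 for s in states if s == "Wrap")
    isolated = sum(1 for s in states if s == "Isolated")

    if wrapped == 0 and isolated == 0:
        return "thru"        # no station is wrapped or isolated
    if wrapped == 2:
        return "wrapped"     # exactly two wrap points
    if wrapped > 2:
        return "segmented"   # more than two wrap points
    return "indeterminate"   # a case the description above does not cover


if __name__ == "__main__":
    print(classify_ring(["Thru", "Thru", "Thru", "Thru"]))
    print(classify_ring(["Wrap", "Thru", "Thru", "Wrap"]))
    print(classify_ring(["Wrap", "Wrap", "Wrap", "Wrap"]))
```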

Isolated station

Sometimes a network wraps a particular station out of the ring. Stations on either side of a problem station can be wrapped. This effectively isolates the station or links that have problems, as shown in Figure 12.

Figure 12 Wrapped Ring with Isolated Station

If a ring was already wrapped when a network wraps a station out of the ring, then a segmented ring results, as shown in Figure 13.

Figure 13 Segmented Ring with Isolated Stations

Twisted ring

In a twisted ring, an A port is connected to an A port and a B port is connected to a B port, instead of the normal A-to-B connections. A twisted ring, which always has two twist points (stations), can exist in either a Thru or Wrap state. You can monitor the Twisted Ring Condition and Undesired Connection Attempt Event for evidence of twisted ring and other connection problems.

Identifying the Problem

To identify the problem, follow this process:

1. At the FDDI LAN level, verify that your network is operating.

If the network is operating, the FDDI ring may be segmented, and therefore an FDDI station or an Ethernet station on an Ethernet link may have lost connectivity to other nodes on the network.

2. Determine if a ring is in a Thru, Wrap, or Segmented state.

If the FDDI ring is segmented or wrapped, look for a problem with a link somewhere in the network or for a nonfunctioning node on your trunk ring. If the ring is operating and is not segmented, or if it is segmented but you still have connectivity to the stations in question, move to a more specific level in your network.

See Monitoring FDDI Connections for more information.

3. Determine if the poorly performing station is an Ethernet or FDDI station.

If the problem is an FDDI station, find out if it is congested (that is, if the station is so busy that it cannot accept all the network traffic that is directed to it) by determining its Bandwidth Utilization. Also determine if the station has a high frame error rate by looking at the FDDI Ring Errors.

If the problem is an Ethernet station, look for congestion by examining Ethernet Packet Loss and Bandwidth Utilization.

Solving the Problem

Identify the station that is causing the disconnection and take the appropriate steps:

• If the disconnection is caused by a wrapped ring, then fix the hardware or cabling problem at that station.

• If the station is congested, you have a device problem rather than a network problem. For example, if the congested station is a file server and every other machine on the network is retrieving and saving files using that server, consider upgrading your server or adding servers to the network. A variety of devices from different vendors may be communicating on an FDDI or Ethernet network; some are faster and more capable, and some are slower and more prone to congestion.

• If the station is an Ethernet station that is attached to an Ethernet segment, reevaluate the setup of your Ethernet network and make some changes to improve its performance.

You can also make FDDI connections more resilient by implementing dual homing or installing an Optical Bypass Unit (OBU) where FDDI connections are prone to fail. See Making Your FDDI Connections More Resilient for more information.

Monitoring FDDI Connections

Monitor your FDDI devices for Warning or Critical alerts in the FDDI Status tool.

Status Watch

Use Status Watch to identify these FDDI connectivity errors:

• Peer Wrap Condition
• Twisted Ring Condition
• Undesired Connection Attempt Event

Follow these steps:

1. In the Device area, select the device that is located where you suspect an FDDI ring connectivity problem.

2. Monitor the FDDI Status tool for the currently selected device.

Here are some pointers for monitoring:

• If the Peer Wrap Configuration State variable is Isolated, the device is not connected to the FDDI trunk ring. If you intend the device to remain isolated, this indication is not a serious condition. However, if the device is supposed to be connected on a trunk ring, a serious problem may exist: the device is no longer transmitting packets to the larger trunk ring.

• If the Peer Wrap flag (SMTPeerWrapFlag) is set, the device is one of the wrap points. The cause of the wrapped ring is somewhere in the portion of the network between the two stations that report the peer wrap condition.

Making Your FDDI Connections More Resilient

When devices are removed from an FDDI ring, there is a break in the fiber path that causes the ring to wrap until the ring is made whole again. To prevent the break in the FDDI connection, you can implement dual homing or install an Optical Bypass Unit (OBU).

Implementing Dual Homing

When the operation of a dual attachment node is critical to your network, dual homing adds reliability by providing a backup connection if the primary link fails. Because a dual attachment station (DAS) has two attachments to the FDDI ring (A-to-M and B-to-M), you can use one of them as a standby link if the active link fails. Using dual homing, only one of the two attachments is active at a time. In this sense, a DAS acts as if it is a single attachment station (SAS) by using its A port as the standby link.

Through SMT, a DAS can be dual homed to the same concentrator or, more commonly, to two concentrators. This arrangement provides a more stable trunk ring of concentrators. If one concentrator fails, the DAS enables the standby link to another concentrator to become the active link. See Figure 14.

If the station is a dual path or dual path/dual MAC station, you can configure the dual-homed station in one of two ways:

• With both links active
• With one link active and one connection withheld as a backup, only becoming active when one link fails

Figure 14 Dual Homing Configuration

Installing an Optical Bypass Unit

You can insert an Optical Bypass Unit (OBU) into the FDDI ring as if it were a node and then plug your device into it. To use an OBU, your device needs an optical bypass interface. This interface lets the bypass know whether your device is still on the ring or not. See Figure 15.

If your device is removed or if it fails, the bypass unit diverts the optical path away from your device, keeping the ring whole. You can use a bypass on devices that are prone to failure or are likely to be removed often, such as diagnostic equipment.

Figure 15 Optical Bypass Unit Configuration

FDDI Connectivity Reference

This section explains terms that are relevant to FDDI connectivity and provides additional conceptual and problem analysis detail.

Peer Wrap Condition

A Peer Wrap (wrapped ring) condition occurs when a dual-attachment station detects a fault (often a lost connection) and reconfigures the network by wrapping the dual trunk rings to form a single ring. Normally, the two stations that are adjacent to the fault wrap to maintain full connectivity. However, if a second fault occurs before the first is repaired, the network partitions itself into two or more rings and stations lose connectivity.

When a station reports a Peer Wrap condition, locate and repair the problem that caused the station to wrap the rings. Potential causes include:

• Faulty FDDI port hardware
• Faulty cables or connectors
• Unplugged connectors
• Powered-down stations

You can expect to find the cause of the problem between the two stations that report the Peer Wrap condition.

Twisted Ring Condition

A Twisted Ring condition occurs when certain undesirable connection types exist. See Table 7 for more information. Although similar to the Undesired Connection Attempt, the Twisted Ring condition provides specific Station Management (SMT) and port information for diagnosis.

Undesired Connection Attempt Event

An Undesired Connection Attempt event occurs when a port tries to connect to another port of a type that may result in an undesirable network topology. Whether the connection attempt is successful depends on the current setting of the station's connection policies.

Table 7 lists connections that the FDDI standard defines as undesirable. The managed devices may or may not permit these connections, depending on their FDDI station configurations.

Table 7 Undesirable Connection Types

Connection Type    Reason That the Connection Is Undesirable
A-A                Twisted primary and secondary rings
A-S                A wrapped ring
B-B                Twisted primary and secondary rings
B-S                A wrapped ring
S-A                A wrapped ring
S-B                A wrapped ring
M-M                A tree of rings topology (illegal connection)

Table 8 lists FDDI connections that create valid topologies.

Table 8 Valid Connection Types

Connection Type    Reason That the Connection Is Valid
A-B                A normal trunk peer connection
A-M                A tree connection with possible redundancy. In a single MAC node, Port B has precedence (by default) for connecting to a Port M
B-A                A normal trunk ring peer connection
B-M                A tree connection with possible redundancy. In a single MAC node, Port B has precedence (by default) for connecting to a Port M
S-S                A single ring of two slave stations
S-M                A normal tree connection
M-A                A tree connection that provides possible redundancy
M-B                A tree connection that provides possible redundancy
M-S                A normal tree connection
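
Tables 7 and 8 together cover all sixteen pairings of the A, B, S, and M port types, so they can be encoded as a simple lookup. The Python sketch below does that; the check_connection function and its return format are hypothetical and only illustrate how the tables can be consulted.

```python
# Table 7: connection types that the FDDI standard defines as undesirable.
UNDESIRABLE = {
    ("A", "A"): "Twisted primary and secondary rings",
    ("A", "S"): "A wrapped ring",
    ("B", "B"): "Twisted primary and secondary rings",
    ("B", "S"): "A wrapped ring",
    ("S", "A"): "A wrapped ring",
    ("S", "B"): "A wrapped ring",
    ("M", "M"): "A tree of rings topology (illegal connection)",
}

# Table 8: connection types that create valid topologies.
VALID = {
    ("A", "B"), ("A", "M"), ("B", "A"), ("B", "M"), ("S", "S"),
    ("S", "M"), ("M", "A"), ("M", "B"), ("M", "S"),
}


def check_connection(local_port, remote_port):
    """Return ("valid", None) or ("undesirable", reason) for a port pairing."""
    pair = (local_port.upper(), remote_port.upper())
    if pair in UNDESIRABLE:
        return ("undesirable", UNDESIRABLE[pair])
    if pair in VALID:
        return ("valid", None)
    raise ValueError(f"unknown port types: {local_port}-{remote_port}")


if __name__ == "__main__":
    print(check_connection("A", "B"))
    print(check_connection("A", "A"))
```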

Disabling the Offending Interface

Because broadcast storms can ultimately cause your whole network to become unavailable, take action immediately to disable the offending interface. You can enable the interface again after you have corrected the problem.

Address Tracker

Use Address Tracker to locate the interface that is causing the broadcast storm. Use Device View to disable the port.

Follow these steps:

1. In the Find Address window, enter the address of the interface that seems to be receiving the broadcast traffic.

You can copy the MAC or IP address from the Status Watch report and paste it into Address Tracker's Enter the Address You Want to Find field.

2. Click Find Now.

Search displays the device name.

3. Use Transcend Central to launch Device View and disable the port.

Disabling the port stops the broadcast storm before it interferes with all vital network traffic. You can re-enable this interface later using Device View or the device's console.

Correcting Spanning Tree Misconfigurations

Spanning Tree does not cause broadcast storms, but a loop in your Spanning Tree topology can create data that looks like a storm. A loop can occur in your topology if:

• Someone disables Spanning Tree on a port
• You set up your Spanning Tree configuration incorrectly

Device View

Use Device View to disable any Spanning Tree port that has a repeater attached to it and to correct Spanning Tree misconfigurations.

To correct Spanning Tree misconfigurations, use Device View to disable Spanning Tree Protocol (STP) for a port on a SuperStack® II Switch 1000, Switch 3000, Switch 3000 10/100, Switch 9000SX, Desktop Switch, LinkBuilder® FMS II Bridge/Management Module, or CoreBuilder 6000.

To disable the STP port state for a port on a SuperStack II switch:

1. Select a port and click the right mouse button.

2. From the shortcut menu, select Configure.

3. In the Port section, click the STP tab.

4. From the STP Port State list box, select Disabled.

5. Click Apply.

To disable the STP port state for a port on a LinkBuilder FMS II Bridge/Management Module:

1. Double-click the module.

2. From the shortcut menu, select Configure Bridge.

3. In the Port section, click the STP tab.

Broadcast Storms Reference

This section explains terms that are relevant to broadcast storms and provides additional conceptual and problem analysis detail.

Broadcast Packets

Broadcast packets, which are a normal part of network operation, are transmitted by a device to a broadcast address. For example, IP networks use broadcasts to resolve network addresses using Address Resolution Protocol (ARP). IPX networks use a large number of broadcast packets to operate most effectively.

Problems arise when broadcast packets endlessly propagate throughout the network, which increases the traffic volume on your network and the CPU time that each host spends processing and discarding unwanted broadcast packets.

Multicast Packets

Multicast packets, which are a normal part of network operation, are transmitted by a device to a multicast group address. Hosts that want to receive the packets indicate that they want to be members of the multicast group, and then multicast packets are distributed to that group. For example, multicast packets support the Spanning Tree Protocol. Multicast applications and underlying multicast protocols control multimedia traffic and shield hosts from processing unnecessary broadcast traffic. However, multicast traffic can also cause storms that saturate your network.
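
When you review captured traffic, it can help to sort destination addresses into broadcast, multicast, and unicast. The Python sketch below shows one way to do that for MAC addresses written in the usual Ethernet (canonical) notation, where the all-ones address is the broadcast address and the least significant bit of the first octet marks a group (multicast) address. The classify_mac function name and the sample addresses are assumptions made for this example.

```python
def classify_mac(mac):
    """Classify a destination MAC address written as six hex octets.

    Accepts ":" or "-" as the separator (canonical Ethernet notation assumed).
    """
    octets = [int(part, 16) for part in mac.replace("-", ":").split(":")]
    if len(octets) != 6 or any(not 0 <= o <= 0xFF for o in octets):
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    if all(o == 0xFF for o in octets):
        return "broadcast"
    if octets[0] & 0x01:      # group bit set in the first octet
        return "multicast"
    return "unicast"


if __name__ == "__main__":
    print(classify_mac("ff:ff:ff:ff:ff:ff"))   # broadcast address
    print(classify_mac("01:80:c2:00:00:00"))   # Spanning Tree group address
    print(classify_mac("00-a0-24-12-34-56"))   # hypothetical station address
```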

Duplicate Addresses

Use these sections to identify and correct problems caused by duplicate MAC and IP addresses:

• Duplicate Addresses Overview
• Finding Duplicate MAC Addresses
• Finding Duplicate IP Addresses

See Duplicate Addresses Reference for additional conceptual and problem analysis detail.

Networks sometimes generate duplicate MAC and IP addresses. Because duplicate addresses can cause problems with packet delivery, resolve them as soon as possible.

Understanding the Problem

Duplicate MAC addresses are caused by data link layer problems with Fiber Distributed Data Interface (FDDI) media and the passing of tokens on the FDDI ring. Duplicate IP addresses are caused by network layer problems. See these sections for more information about causes of duplicate addresses:

• Duplicate MAC Addresses
• Duplicate IP Addresses

Identifying the Problem

Identify duplicate MAC and IP addresses by following the instructions in these sections:

• Finding Duplicate MAC Addresses
• Finding Duplicate IP Addresses

Solving the Problem

Identify the cause of the duplicate address (such as user error or a hardware problem) and fix the problem if possible.

Finding Duplicate MAC Addresses

To find out if duplicate MAC addresses are occurring, monitor your network using Status Watch.

Status Watch

The Status Watch FDDI Status tool identifies duplicate FDDI MAC addresses, and Status Watch reports when two or more MACs on the same ring have the same MAC address (a Duplicate Address condition).

Follow these steps:

1. In the Status Watch Summary View window, determine if any FDDI Status conditions are reported. If there are, double-click the table cell value to display the Device List window.

Another approach is to examine only the devices that you know reside on your FDDI ring. In the Status Watch main window, red device icons indicate that a threshold has been exceeded.

2. Select a device.

• If you selected the device from the Device List window, the real-time report for that device appears in the Status Watch main window.

• If you selected the device from the main window, also select the FDDI Status tool to view the real-time report.

3. Determine if a Duplicate Address condition caused the FDDI Status tool to trigger a Critical or Warning status for that device.

In Status Watch, you can specify the status severity level to apply to a Duplicate Address condition.

Finding Duplicate IP Addresses

To find out if duplicate IP addresses are occurring, monitor your network using these applications:

• Address Tracker - To find duplicate IP addresses on 3Com devices and their attached networks

• LANsentry Manager® - To find duplicate IP addresses that are collected by probes gathering RMON2 SmartAgent® data from the Enterprise Communications Analysis Module (ECAM) downloaded on your network devices

Address Tracker

Use Address Tracker to determine when and where duplicate IP addresses occur.

Follow these steps:

1. From the Find Address menu, select Find Duplicate IP Addresses.

2. Click Find Now to start your search.

LANsentry Manager

Use the Duplicates table in LANsentry Manager to compile a list of all stations with duplicate IP addresses. This table is available only on probes that have downloaded RMON2 (ECAM) SmartAgent software.

Follow these steps:

1. From the LANsentry Manager Address Map menu, select Duplicates. Address Map data is displayed as a table.

2. To export the contents of the table, click Export to launch the Data Export dialog box.

Duplicate Addresses Reference

This section explains terms that are relevant to duplicate addresses and provides additional conceptual and problem analysis detail.

Duplicate MAC Addresses

Each device on your network has a unique MAC address. This address identifies a single device on the network, allowing packets to be delivered to correct destinations.

Packets are delivered to their destinations by means of MAC-address-to-IP address translation that the Address Resolution Protocol (ARP) provides. Therefore, if MAC addresses are duplicated on the network, ARP caches of routing devices contain erroneous destinations. In FDDI, devices monitor network traffic, searching for their own MAC address in each packet to determine whether to decode the packet. If MAC addresses are not unique, two stations cannot be distinguished from each other.

Duplicate MAC addresses can occur for the following reasons:

• Someone has manually configured a MAC address for a device, instead of using the address that the vendor supplied or allowing it to be assigned dynamically, and this address is also assigned to a different device.

• In rare circumstances, loops in a bridged network can cause a MAC hardware problem or an address learning problem that creates a duplicate MAC address entry in the bridging address table.

• On DECnet Phase 4 networks, MAC addresses are set from the DECnet address. A duplicate NET address can cause a duplicate MAC address.

Duplicate IP Addresses

Because IP addresses are critical for transmission of packets on TCP/IP networks, resolve duplicate IP addresses immediately.

Duplicate IP addresses can occur when someone has configured an IP address that is identical to an IP address that is assigned to a different device. Address assignments, although possible for you to configure manually, are usually made using one of these protocols:

• Dynamic Host Configuration Protocol (DHCP) - Allows your network to dynamically assign IP addresses to nodes. With this protocol, a DHCP server temporarily assigns an IP address to a node, or you can statically configure addresses as needed.

• BOOTstrap protocol (BootP) - Allows you to statically assign IP addresses to nodes. This protocol is more efficient than RARP.

• Reverse ARP (RARP) - Allows you to statically assign IP addresses to nodes. However, because this protocol relies on the MAC address to identify the node, you cannot use it on networks that dynamically assign hardware addresses.
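
To make the idea of a duplicate IP address concrete, the Python sketch below scans a list of (IP address, MAC address) pairs, such as you might copy from an ARP table or a device inventory, and reports every IP address that is claimed by more than one MAC address. The find_duplicate_ips function and the sample data are hypothetical; Address Tracker and LANsentry Manager perform their own searches, and this sketch does not describe how they work.

```python
from collections import defaultdict


def find_duplicate_ips(entries):
    """Return {ip: sorted list of MACs} for IPs claimed by more than one MAC.

    entries: iterable of (ip_address, mac_address) string pairs.
    """
    seen = defaultdict(set)
    for ip, mac in entries:
        seen[ip].add(mac.lower())   # normalize case so one MAC is not counted twice
    return {ip: sorted(macs) for ip, macs in seen.items() if len(macs) > 1}


if __name__ == "__main__":
    # Hypothetical observations: 192.0.2.10 is claimed by two different stations.
    observed = [
        ("192.0.2.10", "00:A0:24:12:34:56"),
        ("192.0.2.11", "00:a0:24:65:43:21"),
        ("192.0.2.10", "00:a0:24:aa:bb:cc"),
        ("192.0.2.11", "00:A0:24:65:43:21"),
    ]
    print(find_duplicate_ips(observed))
```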

Ethernet Packet Loss

Use these sections to identify and correct Ethernet packet loss:

• Ethernet Packet Loss Overview
• Searching for Packet Loss

See Ethernet Packet Loss Reference for additional conceptual and problem analysis detail.

If your Ethernet network shows signs of congestion, it may be experiencing packet loss. When your network is congested, utilization is usually high, packets are discarded because buffers are full, and collision rates are up. Problems related to Collisions are often at the heart of packet loss.

Understanding the Problem

Collisions are normal in Ethernet networks. In many cases, Collision rates of 50 percent do not cause a large decrease in throughput. The Collision rate helps mark the upper limit on your network (the maximum percentage of collisions that your network can bear), which is usually around 70 percent. If Collisions increase above this upper limit, your network can become unreliable.

When the Collision rates increase, so do Excessive Collisions, which causes a delay in transmitting data. An increase in Collisions also indicates that network utilization and network errors, such as FCS Errors, are probably increasing.

The real packet problems to watch for, however, are undetected collisions that show up as Late Collisions.
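
As a rough worked example of the threshold described above, the Python sketch below computes a collision rate and compares it to a 70 percent upper limit. The text does not define how the rate is calculated, so the formula used here (collisions as a percentage of frames sent during the same interval) is an assumption, as are the function names and sample counts.

```python
UPPER_LIMIT_PERCENT = 70.0  # "usually around 70 percent"; tune for your network


def collision_rate(collisions, frames_sent):
    """Collisions as a percentage of frames sent (assumed definition)."""
    if frames_sent <= 0:
        raise ValueError("frames_sent must be positive")
    return 100.0 * collisions / frames_sent


def assess(collisions, frames_sent, limit=UPPER_LIMIT_PERCENT):
    """Return a coarse verdict for one sampling interval."""
    rate = collision_rate(collisions, frames_sent)
    return "unreliable" if rate > limit else "within limit"


if __name__ == "__main__":
    print(collision_rate(500, 1000), assess(500, 1000))
    print(collision_rate(800, 1000), assess(800, 1000))
```

A single interval above the limit is less telling than a sustained trend, so compare these values against the baseline for the same segment.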

Identifying the Problem

To identify that your network's problem is related to packet loss, verify that frames are being dropped on your network by examining this packet loss data:

• Alignment Errors
• Collisions
• Excessive Collisions
• FCS Errors
• CRC Errors
• Late Collisions
• Receive Discards
• Too Long Errors
• Too Short Errors
• Transmit Discards

The process of identifying the problem is discussed in Searching for Packet Loss.

Solving the Problem

If you notice that packet loss data is consistently high, then your network is too congested. In this case, segment your network with the appropriate network device (such as a switch or router). If Collision data shows increases but your network's utilization is the same, then your network may have a physical problem, such as cabling that is too long. Other problems that packet loss data can indicate include:

• Faulty connectors or improper cabling
• Excessive numbers of repeaters between network devices
• Defective Ethernet transceivers or controllers

Possible solutions to these problems are explained in the procedures in Searching for Packet Loss.

Searching for Packet Loss

When you look for packet loss, use the following applications:

• Status Watch - For Ethernet and MIB-II data collection using SNMP polling
• LANsentry Manager Network Statistics Graph - For RMON data collection using an RMON probe
• Device View - On a per-device basis, you can evaluate statistics for any port on the device

Status Watch

Status Watch monitors:

• Alignment Errors
• Excessive Collisions
• FCS Errors
• Receive Discards
• Transmit Discards

Follow these steps:

1. Determine if the thresholds for the Alignment Errors tool and FCS Errors tool are being exceeded.


Table 16 identifies the problems that this data can indicate and your possible actions. For information about problems related to a nonstandard Ethernet implementation, see Nonstandard Ethernet Problems.

Table 16 Alignment Errors, FCS Errors, and CRC Errors Data

Possible Problem: Faulty cabling
Possible Action: Examine the cable and cable connections for breaks or damage.

Possible Problem: Network noise
Possible Action: Look for improper cabling, faulty cable, faulty network equipment, or cables that are too close to equipment that emits electromagnetic interference (lamps, for example).

Possible Problem: Faulty transceiver
Possible Action: Use an analyzer to identify the problematic transceiver. If necessary, replace the transceiver, network adapter, or station.

Possible Problem: Fault at the transmitting end station
Possible Action:
1. Locate the source of the errors by looking at the module and port statistics.
2. Verify the correct operation of the transceiver or adapter card of the device that is connected to the problem port.
3. If the card appears to be operating correctly, examine the cable and cable connections for breaks or damage.

Possible Problem: Station powering up or down
Possible Action: None required.
Early implementations of Ethernet transceivers generate a significant amount of in-band noise when powering up; they frequently cause Alignment Errors and FCS Errors in an otherwise stable network.
When powering up, some software drivers for Ethernet controllers also initiate Time Domain Reflectometry (TDR) tests to test the Ethernet media. Network monitors report TDR tests as Alignment Errors and FCS Errors.

Possible Problem: Faulty adapter
Possible Action: Replace the adapter.

2. Determine if the Excessive Collisions tool threshold is being exceeded. Table 17 identifies the problems that this data can indicate and your possible actions.

Table 17 Collisions and Excessive Collisions Data

Possible Problem: Busy network
Possible Action: Use a bridge, router, or switch to reconfigure your network into segments with fewer stations.

Possible Problem: Faulty device (adapter, switch, hub, and the like) that does not listen before broadcasting. This problem increases the incidence of all types of collisions.
Possible Action: Isolate each adapter to see if the problem stops.

Possible Problem: Network loop
Possible Action: Ensure that no redundant connections to the same station have both connections active simultaneously.


3. Determine if the Receive Discards and Transmit Discards tools' thresholds are being exceeded.

If these errors are high in conjunction with the data that you learned in steps 1 and 2, then your network is overloaded. Segment your network.

LANsentry Manager Network Statistics Graph

Use the LANsentry Manager Network Statistics graph to view data for:

• Collisions
• Late Collisions
• Bandwidth Utilization
• CRC Errors
• Too Long Errors
• Too Short Errors

Follow these steps:

1. Display a Network Statistics graph for the local Ethernet segment on which users have reported poor performance.

This graph shows the most recent trend in Collision rates. If you have set up a History sample, you can also look at the historical trend. If a number of segments are connected by repeaters, examine the graph for each Ethernet segment.

2. Analyze Utilization and Collision rates to determine whether collisions are caused by an overloaded segment or a faulty component.

• If Utilization rates are high - The collisions are probably caused by an overloaded segment. If you have added nodes or new applications to your network, consider reconfiguring the cabling system, using bridges and routers to filter out remote collisions and to keep local traffic on one segment. This action should level the network load.

• If Utilization rates are stable and appear normal - The collisions are probably caused by faulty components. In this case, do the following:

• If the network consists of repeaters - Compare the Network Statistics graphs for each segment connected to the repeater. Because repeaters repeat traffic across all connected segments (which makes many segments seem like one network), you should see similar levels of traffic on all segments. One segment that shows dissimilar levels of traffic and collisions may indicate faulty hardware. In this case, monitor several collisions to track the source station that is transmitting too soon after collisions, and repair the station. Packets that are transmitted too soon after collisions are unlikely to be valid. See Table 17 for more information about Collisions.

• On other networks - Determine the segment cable length.

3. Examine the CRC Errors and Late Collisions, which often indicate cabling or component problems.

Table 16 identifies the problems that CRC Errors can indicate and your possible actions. Table 18 identifies the problems that Late Collisions data can indicate and your possible actions.

Table 18 Late Collisions Data

Possible Problem: Cabling problems
• Segment too long
• Failing cable
• Segment not grounded properly (noise)
• Improper termination
• Taps too close (10BASE-5 and 10BASE-2 only)
• Noisy cable
Possible Action: Correct the cabling problem by doing one or more of the following:
• Reduce the segment length
• Replace the cable
• Ground the cable
• Terminate the cable correctly
• Check the taps
• Check for cables too close to equipment that emits electromagnetic interference

Possible Problem: Component problems
• Deaf or partially deaf node
• Failing repeater, transceiver, or controller cards
Possible Action: Correct the component problem by doing one of the following:
• Trace the failing component and replace it
• Replace the NIC or the transceiver

4. Trace Too Short Errors and Too Long Errors to the sender.

These errors often indicate faulty routers or LAN drivers and transceiver problems. Table 19 identifies the problems that this data can indicate and your possible actions.

Table 19

Possible Problem: A transceiver on your network is adding bits to the packets that are transmitted by the attached station.
Possible Action:
1. Use a network analyzer to identify the problematic transceiver.
2. If necessary, replace the transceiver, network adapter, or station.

Possible Problem: The jabber protection mechanism on a transceiver has failed; it can no longer protect the network from the jabbering produced by the attached station.
Possible Action: Replace the network card.

Possible Problem: Excessive noise on the cable.
Note: Some 10/100 Mbps cards that autodetect the network speed may connect to the network at the wrong speed, causing excessive noise.
Possible Action: Check for improper cabling, faulty cable, faulty network equipment, or cables too close to noisy electronic equipment (lamps, for example).
If your network card autodetects the network speed and you have ruled out other problems, manually configure the network speed.

Possible Problem: Faulty routers (two different network types are connected and the router is not enforcing proper frame size restrictions)
Possible Action: Notify the manufacturer.

Possible Problem: Faulty LAN driver
Possible Action: Replace the driver.

Possible Problem: A normal condition on a LinkSwitch® 1000, LinkSwitch® 3000, or CoreBuilder 5000 FastModule. If you use maximum-sized (1518-byte) Ethernet frames, the device's VLT-enabled ports add a frame tag of 4 bytes, resulting in a misleading Too Long Frame error.
Possible Action: These frames are passed successfully but will create the Too Long Frame error message. If you want to eliminate the error message, reduce your Ethernet packet frames by 4 bytes.

Device View

Device View allows you to display a variety of port and device-level statistics relevant to Ethernet packet loss. Table 20 describes these statistics and their use in troubleshooting.

Table 20 Activity and Error Statistics in Device View

Statistics Group: Activity
Description: Displays the total network activity and errors on the selected port.
Use in Troubleshooting: This data shows readable packets, broadcast packets, Collisions, total errors, and runts, which cause Too Short Errors. You can interpret this data in the following ways:
• Runts are often caused by Collisions; however, if the values increase at specific times of the day, you may need to change the network topology to manage the traffic more efficiently (for example, with switches or routers).
• Runts can also be caused by a badly terminated coax cable.
• Large numbers of runts not associated with high levels of collisions can indicate a transmission problem (examine the cable).
• Particularly high numbers of Collisions compared to the total number of readable packets can point to a hardware problem (a bad adapter) or to a data loop.
• A high proportion of Broadcast Packets (>10%) on a heavily utilized network (>50% of available bandwidth) can point to an incorrectly configured bridge or router on the network.

Statistics Group: Errors
Description: Displays the number of frames with errors on the selected port.
Use in Troubleshooting: The significance of errors depends on accompanying errors and prevailing network conditions. See the following error data for more information:
• Alignment Errors - Table 16
• FCS Errors - Table 16
• Too Long Errors - Table 19
• Too Short Errors or runts - Table 19
• Late Collisions - Table 18

To display Activity and Errors statistics for a device or port, follow these steps:

1. Select the required port or device.

2. From the shortcut menu, select Activity or Errors.

The statistics available depend on the type of port or device selected. See Table 20 for troubleshooting information.

Ethernet Packet Loss Reference

This section explains terms that are relevant to Ethernet packet loss and provides additional conceptual and problem analysis detail.

Alignment Errors

An Alignment Error indicates a received frame in which both of the following are true:

• The number of bits received is an uneven byte count (that is, not an integral multiple of 8).

• The frame has a Frame Check Sequence (FCS) error.

Alignment Errors often result from MAC layer packet formation problems, cabling problems that cause corrupted or lost data, and packets that pass through more than two cascaded multiport transceivers. See FCS Errors for more information about interpreting Alignment Errors.


Collisions

Collisions indicate that two devices detect that the network is idle and try to send packets at exactly the same time (within one round-trip delay). Because only one device can transmit at a time, both devices must stop sending and attempt to retransmit. Collisions are detected by the transmitting stations.

The retransmission algorithm helps to ensure that the packets do not retransmit at the same time. However, if the two devices retry at nearly the same time, packets can collide again; the process repeats until either the packets finally pass onto the network without collisions, or 16 consecutive collisions occur and the packets are discarded.
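The retry behavior described above can be sketched as follows. The text does not name the algorithm; this example assumes the truncated binary exponential backoff that standard (IEEE 802.3) Ethernet uses, in which the random wait range doubles after each collision up to a cap. The cap of 10 comes from that standard rather than from this guide, and the function names and the `collides` callback are hypothetical.

```python
import random

MAX_ATTEMPTS = 16   # after 16 consecutive collisions the packet is discarded
BACKOFF_CAP = 10    # assumed from IEEE 802.3: the range stops doubling here

def backoff_slots(collision_count, rng=random):
    """Pick a random wait, in slot times, after the n-th consecutive collision."""
    exponent = min(collision_count, BACKOFF_CAP)
    return rng.randint(0, 2 ** exponent - 1)

def send(collides, rng=random):
    """Try to send one packet.

    collides -- hypothetical callback: returns True if the given attempt collides
    Returns (delivered, attempts, total_slots_waited).
    """
    waited = 0
    for attempt in range(1, MAX_ATTEMPTS + 1):
        if not collides(attempt):
            return True, attempt, waited
        if attempt < MAX_ATTEMPTS:
            waited += backoff_slots(attempt, rng)
    # Sixteen collisions in a row: counted as an Excessive Collision, packet dropped.
    return False, MAX_ATTEMPTS, waited
```

Because each station picks its wait independently, two stations that collided are unlikely to choose the same slot again, which is why repeated collisions on a healthy network are rare.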

CRC Errors

A Cyclic Redundancy Check (CRC) Error is an RMON statistic that combines FCS Errors and Alignment Errors. These errors indicate that packets were received with:

• A bad FCS and an integral number of octets (FCS Errors)
• A bad FCS and a non-integral number of octets (Alignment Errors)

CRC Errors can cause an end station to freeze. If a large number of CRC Errors are attributed to a single station on the network, replace the station's network interface board. Typically, a CRC Error rate of more than 1 percent of network traffic is considered excessive.

Excessive Collisions

Excessive Collisions indicate that 16 consecutive collisions have occurred, usually a sign that the network is becoming congested. For each excessive collision count (or after 16 consecutive collisions), a packet is dropped. If you know the normal rate of excessive collisions, then you can determine when the rate of packet loss is affecting your network's performance. See Knowing Your Network's Configuration for more information.

FCS Errors

Frame Check Sequence (FCS) Errors, a type of CRC, indicate that frames received by an interface are an integral number of octets long but do not pass the FCS check. The FCS is a mathematical way to ensure that all the frame's bits are correct without having the system examine each bit and compare it to the original. Packets with Alignment Errors also generate FCS Errors.
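To see how such a check works in principle, the following sketch appends a checksum to a frame and verifies it on receipt. It assumes the FCS behaves like the CRC-32 that Python's `zlib.crc32` computes; the little-endian byte order of the trailer and the helper names are assumptions made for the example, not details taken from this guide.

```python
import zlib

def append_fcs(frame: bytes) -> bytes:
    """Return the frame with a 4-byte CRC-32 checksum appended."""
    return frame + zlib.crc32(frame).to_bytes(4, "little")

def fcs_ok(frame_with_fcs: bytes) -> bool:
    """Recompute the checksum over the body and compare it to the trailer."""
    body, trailer = frame_with_fcs[:-4], frame_with_fcs[-4:]
    return zlib.crc32(body).to_bytes(4, "little") == trailer

good = append_fcs(b"example payload")
# Simulate a single bit flipped in transit (for instance by noise on the cable).
bad = bytes([good[0] ^ 0x01]) + good[1:]
```

Here `fcs_ok(good)` holds and `fcs_ok(bad)` does not: the receiver detects the damage from the checksum alone, without access to the original frame.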

Both Alignment Errors and FCS Errors can be caused by equipment powering up or down or by interference (noise) on unshielded twisted-pair (10BASE-T) segments. In a network that complies with the Ethernet standard, FCS or Alignment Errors indicate bit errors during a transmission or reception. A very low rate is acceptable. Although Ethernet allows a 1 in 10^8 bit error rate, typical Ethernet performance is 1 in 10^12 or better.
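To relate a bit error rate to the frame errors you actually observe, you can estimate the chance that a single frame contains at least one bad bit. The sketch below assumes that bit errors are independent of one another, which is a simplification; the function name is hypothetical.

```python
import math

def frame_error_probability(bit_error_rate, frame_octets=1518):
    """Chance that a frame of the given size contains at least one bit error."""
    bits = frame_octets * 8
    # 1 - (1 - p)^bits, computed with log1p/expm1 to stay accurate for tiny p.
    return -math.expm1(bits * math.log1p(-bit_error_rate))

worst_case = frame_error_probability(1e-8)   # the rate Ethernet allows
typical = frame_error_probability(1e-12)     # typical performance
```

For maximum-sized frames, the allowed rate works out to roughly 1.2 damaged frames per 10,000, while the typical rate is about four orders of magnitude lower.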

Late Collisions

Late Collisions indicate that two devices have transmitted at the same time, but cabling errors (most commonly, excessive network segment length or repeaters between devices) prevent either transmitting device from detecting a collision. Neither device detects a collision because the time to propagate the signal from one end of the network to the other is longer than the time to put the entire packet on the network. As a result, neither of the devices that cause the late collision senses the other's transmission until the entire packet is on the network.

Although late collisions occur for small packets, the transmitter cannot detect them. As a result, a network suffering measurable Late Collisions for large packets is losing small packets as well.
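The timing argument can be made concrete with a short calculation. This sketch compares the time needed to transmit a frame with the signal delay on the network; using the round-trip delay for the comparison is an assumption of the example, and the function names and the 60-microsecond figure below are hypothetical.

```python
def transmit_time_us(frame_octets, rate_mbps=10):
    """Time to put a whole frame on the wire, in microseconds."""
    return frame_octets * 8 / rate_mbps

def collision_can_be_missed(round_trip_us, frame_octets, rate_mbps=10):
    """True if the sender finishes transmitting before a collision could be sensed."""
    return round_trip_us > transmit_time_us(frame_octets, rate_mbps)
```

At 10 Mbps, a minimum-sized 64-octet frame takes 51.2 microseconds to send. On an over-long network with a 60-microsecond round trip, `collision_can_be_missed(60, 64)` is true while `collision_can_be_missed(60, 1518)` is false, which matches the observation that small packets are lost silently while large packets report Late Collisions.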

Nonstandard Ethernet Problems

Table 21 lists the symptoms that typically occur if a system violates the Ethernet standard.

Table 21 Symptoms of Common Ethernet Network Problems

Symptoms: FCS Errors and Alignment Errors increase significantly.
Problem: Network cabling is too long.
Notes: If you use a promiscuous network monitor, the number of Late Collisions reported by stations should correlate with the FCS and Alignment Errors reported by the monitor.

Symptoms: FCS and Alignment Errors increase proportionally with interference (sometimes referred to as noise hits).
Problem: Network segment is noisy.
Notes: Typically observed on a 10BASE-T network segment in a noisy environment. If you use multiple promiscuous monitors, the FCS and Alignment Errors among the monitors will not correlate.
If the monitor can track runts, also called Too Short Errors, the number of runt packets should be significantly higher than normal.

Symptoms: FCS and Alignment Errors are much higher than normal.
Problem: Networks do not conform to the access scheme of Carrier Sense Multiple Access with Collision Detect (CSMA/CD).
Notes: Occurs when some implementations of Ethernet in the segment are not entirely compatible with IEEE 802.3 repeaters.
Collision fragments linger on the network long enough to collide with retry packets at the minimum interpacket gap (IPG). The IPG is smaller on one side of the repeated network, causing a lost packet.
Ethernet controllers cannot receive packets that are separated by 4.7 µs or less. Some controllers cannot sustain receptions of packets separated by as much as 9.6 µs. If runt packets are received one after another and are followed by a collision fragment, Ethernet controllers that cannot sustain reception will lose packets.

Receive Discards

Receive Discards indicate that received packets could not be delivered to a high-layer protocol because of congestion or packet errors.

Too Long Errors

A Too Long Error indicates that a packet is longer than 1518 octets (including FCS octets) but otherwise well formed. Too Long Errors are often caused by a bad transceiver, a malfunction of the jabber protection mechanism on a transceiver, or excessive noise on the cable.

Too Short Errors

A Too Short Error, also called a runt, indicates that a packet is fewer than 64 octets long (including FCS octets) but otherwise well formed.
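The frame-level definitions in this reference (Alignment Errors, FCS Errors, Too Long Errors, and Too Short Errors) can be combined into one small classifier. This is a sketch of the definitions as stated here, not of how any particular probe counts frames; the function name and the "OK" label are hypothetical.

```python
MIN_OCTETS = 64     # shortest legal frame, including FCS octets
MAX_OCTETS = 1518   # longest legal frame, including FCS octets

def classify(bit_count, fcs_ok):
    """Name the error counter a received frame would fall under."""
    whole_octets = bit_count % 8 == 0
    if not fcs_ok:
        # A bad FCS is an FCS Error when the length is a whole number of
        # octets, and an Alignment Error when it is not.
        return "FCS Error" if whole_octets else "Alignment Error"
    octets = bit_count // 8
    if octets > MAX_OCTETS:
        return "Too Long Error"    # otherwise well formed, but oversized
    if octets < MIN_OCTETS:
        return "Too Short Error"   # otherwise well formed runt
    return "OK"
```

A CRC Error count, as defined earlier, would then be the sum of the frames classified as FCS Errors and Alignment Errors.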

Transmit Discards

Transmit Discards indicate that packets were not transmitted because of network congestion.

FDDI Ring Errors

Use these sections to identify and correct FDDI ring errors:

• FDDI Ring Errors Overview
• Identifying Ring Errors

Fiber Distributed Data Interface (FDDI) often corrects its own problems. However, because FDDI cannot correct all errors (especially those related to hardware problems), you should monitor FDDI errors.

Understanding the Problem

FDDI ring errors that you should monitor include:

• Elasticity Buffer Error Condition
• Frame Error Condition
• Frames Not Copied Condition
• Link Error Condition

Identifying the Problem

First, determine the type of FDDI ring errors and where they are occurring. Similar to the way you identify other FDDI problems, identify the upstream and downstream neighbors of the devices that you are monitoring.

Several types of network errors can cause FDDI performance problems. For example, problems with cables or physical connections may result in a link or frame error. Elasticity buffer (EB) errors can also lead to link and frame errors.

FDDI deals with port-related errors as follows:

• The variable PORTlerAlarm is the link error rate (LER) value at which a link connection generates an alarm. When the LER is greater than the alarm setting, Station Management (SMT) sends a Status Report Frame (SRF) to notify you that there is a problem with a port.

The PORTlerAlarm threshold is set lower than the PORTlerCutoff threshold so that you are notified of a problem before the port is actually removed from the ring.

• When link errors reach the threshold defined by the variable PORTlerCutoff, SMT breaks the connection, disabling the PHY that detected the problem. A Link Error Condition is also generated.

FDDI deals with MAC-related errors as follows:

• When MAC frame errors reach a certain threshold, a Frame Error Condition is generated. Because the actual error can be further upstream than the immediate connection, the connection remains intact.

• For a large network, the worst-case MACFrameErrorRatio is less than 0.1 percent. However, during network configuration, frame error ratios can reach 50 percent for short periods. When you detect a sustained frame error ratio of more than 0.1 percent, a problem exists between the station that is reporting the condition and the nearest upstream MAC.
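The two-threshold behavior for ports and the "sustained" test for frame errors can be sketched as follows. Treating the link error rate as a plain errors-per-bit number is an assumption of this example, as are the function names and the choice of five consecutive samples to mean "sustained"; the guide does not specify how long an error ratio must persist.

```python
def port_action(ler, ler_alarm, ler_cutoff):
    """Map a port's link error rate to the SMT reaction described above."""
    if ler >= ler_cutoff:
        return "cutoff"   # SMT breaks the connection and disables the PHY
    if ler > ler_alarm:
        return "alarm"    # SMT sends a Status Report Frame
    return "ok"

def sustained_frame_errors(ratios, limit=0.001, samples=5):
    """True if the last `samples` frame error ratios all exceed 0.1 percent."""
    recent = ratios[-samples:]
    return len(recent) == samples and all(r > limit for r in recent)
```

Requiring several consecutive samples above the limit keeps a brief spike, such as the short periods of up to 50 percent seen during network configuration, from being reported as a fault.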

Solving the Problem

To solve problems related to FDDI errors, fix the hardware, cabling, or congestion problem.

Identifying Ring Errors


Use Status Watch to monitor your FDDI devices for Warning or Critical alerts.

Status Watch

Use Status Watch to identify FDDI ring errors.

Follow these steps:

1. Monitor the FDDI Status tool for the currently selected device.

2. Determine whether Status Watch is reporting Elasticity Buffer Errors or a high percentage of Frame Errors, Frames Not Copied, or Link Error Rates for the currently selected device.

FDDI Ring Errors Reference

This section provides additional conceptual and problem analysis detail.

Elasticity Buffer Error Condition

The Elasticity Buffer Error condition occurs when a port's elasticity buffer overflows or underflows. This condition usually indicates that a port's hardware is not operating within the tolerances that the FDDI standard specifies. Look for the problem in the hardware of either the port that is reporting the condition or of the immediately adjacent port.

Frame Error Condition

The Frame Error condition occurs when the percentage of frames that contain errors exceeds a preset threshold. When a device is an uplink to FDDI (that is, a device is transmitting onto FDDI), this type of condition indicates that the ring is saturated. The ring is out of buffer space, and packets are being dropped from the device's backbone port.

The problem indicated by the frame errors is usually located between the MAC that reports the condition and its upstream neighbor. Because many physical connections can lie along this path, the MACFrameErrorRatio variable identifies only the two MACs between which the problem is occurring.

Frames Not Copied Condition

The Frames Not Copied condition occurs when the percentage of frames that are dropped because of insufficient buffer space exceeds a preset threshold. This condition indicates that the station is congested and is unable to process frames as quickly as they arrive. To help eliminate congestion:

• Add more capacity to the station
• Reconfigure your network so that end stations that communicate heavily with one another are on the same bridge or switch
• Filter out certain traffic

Link Error Condition

The Link Error condition occurs when a port detects link errors at a rate that exceeds a preset threshold. When the Link Error threshold is exceeded, the station removes itself from the ring and tries to reinsert itself on the ring. This action creates a MAC Neighbor Change Event (which also occurs if a ring wraps).

Link errors may indicate an FDDI PHY hardware problem (such as a faulty transmitter) or a faulty cable or connector. Look for the problem in the portion of the network between the port that is reporting the condition and the first upstream transmitter.

MAC Neighbor Change Event

The MAC Neighbor Change event occurs when a MAC's upstream or downstream neighbor changes.

This event indicates either:

• A network reconfiguration
• Another station that is leaving or joining the ring

Network File Server Timeouts

Use these sections to identify and correct timeouts on network file servers:

• Network File Server Timeout Overview
• Looking for Obvious Errors
• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

See Network File Server Timeouts Reference for additional conceptual and problem analysis detail.

A network file server can time out if your network gets congested or if your server is having problems. Users might have problems downloading data from or to the server, or copying files from or to the server. To help you to understand the troubleshooting process for this type of problem, an EXAMPLE throughout this section follows the symptoms, analysis, and resolution of a typical file server timeout problem.


Understanding the Problem

When users log in, their stations make network file server calls, either to determine quotas (if this feature has been enabled) or to mount user home directories. The network file server timeout messages, even when spread across multiple nodes, indicate a problem either with the network or with a server.

EXAMPLE: UNIX users notice that it takes a long time - over 30 seconds in some cases - to log in to any machine. Some machines report network file server timeout messages, but the messages have no obvious pattern and are infrequent. You begin to get a sense of the problem.

Identifying the Problem

First, rule out the obvious causes. Ask these questions:

• Can you access the network file server with Telnet?
• Have any alarms been triggered?
• Are there any new errors?

The process of identifying the problem is developed in Looking for Obvious Errors.

Solving the Problem

To determine the cause, reproduce the fault while you monitor the network. After you know the cause, you can fix the problem.

The solutions to the network file server timeout are identified in these sections:

• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

Looking for Obvious Errors

To look for obvious errors, use these applications:

• Ping and Telnet - To check for connectivity to the network file server nodes
• LANsentry Manager Alarms View - To search for triggered alarms
• LANsentry Manager Statistics View - To look for errors
• LANsentry Manager History View - To identify trends


Ping and Telnet

Determine whether you can contact network file server nodes using Ping and Telnet. If the response is extremely slow, then a problem may exist with the connections to the nodes. No delay indicates that the connections are normal, implying that the delay is occurring elsewhere. In this case, use LANsentry® Manager tools to determine whether packets are being lost or ignored.
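If you want to repeat this connectivity check from a script rather than by hand, a rough equivalent is to time how long a TCP connection to the server takes to open. This sketch uses only Python's standard `socket` module; port 23 is the usual Telnet port, and the function name and the host name in the usage note are hypothetical.

```python
import socket
import time

def connect_time(host, port=23, timeout=5.0):
    """Return the seconds taken to open a TCP connection, or None if it fails."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return None
```

For example, `connect_time("fileserver.example")` returning a few milliseconds suggests that basic connectivity is normal and the delay lies elsewhere, while a value of several seconds, or `None`, points at the connection itself.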

LANsentry Manager Alarms View

Using the LANsentry Manager Alarms View, you can determine if any configured alarms have been triggered.

Search the Alarms View to see if any MAC events have been logged.

EXAMPLE: MAC events have not been logged for the network on which the UNIX users are attached.

Even though no alarms have occurred, errors may exist. For example, a lower rate of background errors may exist just below the alarm threshold. Based on maximum and minimum values, RMON errors may miss constant, periodic, or low amounts of errors.
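The following sketch shows why a threshold alarm alone can miss such errors. The data is made up for the illustration, and the simple "any sample above the threshold" rule is an assumption; real RMON alarms have rising and falling thresholds with more detailed semantics.

```python
def alarm_fired(samples, threshold):
    """A simple threshold alarm: fires if any sample exceeds the threshold."""
    return any(s > threshold for s in samples)

def background_rate(samples):
    """Average errors per sample, which exposes steady low-level errors."""
    return sum(samples) / len(samples)

# Made-up error counts that sit just under an alarm threshold of 10.
errors_per_sample = [9, 9, 8, 9, 9, 9]
```

With this data, `alarm_fired(errors_per_sample, 10)` is false even though the background rate is close to nine errors per sample, which is why the Statistics and History views are worth checking when no alarm has been triggered.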

LANsentry Manager Statistics View

Using the LANsentry Manager Statistics View, you can display a multisegment graph of utilization and error statistics.

Follow these steps:

1. Set up a graph that shows utilization and errors on all your major segments.

2. Determine whether any segments are particularly busy or error prone.

EXAMPLE: You notice that one segment of the UNIX network, HUB3, is reporting Too Long Errors and FCS Errors roughly every second sample. While the amount is not higher than normal, it is currently higher than on any other segment.

LANsentry Manager History View

Using the LANsentry Manager History View, you can display a rolling history table to determine if the errors that you are seeing are new. For example, if you have a history table that runs for 30-minute samples over two days, you can compare the most recent sample to a previous sample, looking for new errors. If your probe has the resources, use a much finer resolution sample stored for a shorter time (every 30 seconds for 2 hours) to more easily spot recent errors.

EXAMPLE: You see that the history table shows that no error rates remained constant throughout the day. However, errors that did occur were on the device HUB3 and were Too Long Errors and FCS Errors.
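The remark about probe resources follows from simple arithmetic on the two sampling schemes mentioned above. The function name in this sketch is hypothetical.

```python
def sample_count(interval_seconds, duration_seconds):
    """Number of history samples a probe must store for a given interval and span."""
    return duration_seconds // interval_seconds

coarse = sample_count(30 * 60, 2 * 24 * 3600)   # 30-minute samples over two days
fine = sample_count(30, 2 * 3600)               # 30-second samples over two hours
```

The coarse table needs 96 samples and the fine table needs 240, so the finer history costs more probe memory even though it covers a much shorter period.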

Reproducing the Fault While Monitoring the Network

Although the RMON View in LANsentry Manager can show error rates and help you to identify the location of the problem, it may not provide enough data to solve the problem. To determine the cause of the problem, reproduce it while you monitor the network by using these applications:

• LANsentry Manager Top-N Graph - To locate a quiet node to use for reproducing the fault

• LANsentry Manager Packet Capture - To capture packets from the hub to which the quiet node is attached

• LANsentry Manager Packet Decode - To analyze the packets to assess network file server traffic and delays

• Address Tracker - To find the location of the problem nodes

EXAMPLE: Using LANsentry Manager, you find a hub on the network with a higher than normal error rate. However, the error rate does not seem high enough to cause login delays of 60 seconds or more.

LANsentry Manager Top-N Graph

Using the Top-N graph in the LANsentry Manager main window, locate a quiet node that has been showing the same problem. Choose a quiet node so that you do not receive excessive traffic when you try to isolate the problem.

EXAMPLE: You see that the node Monolith, which has the same Network File System (NFS) mounts as the other nodes on the network, is quiet. You decide to use this node for reproducing the fault. See Network File System (NFS) Protocol for more information about NFS.

LANsentry Manager Packet Capture

Using the LANsentry Manager Packet Capture application, capture packets from the network using predefined patterns and start-and-stop conditions.


Follow these steps:

1. Set up a capture buffer on a probe that is connected to the same hub as the quiet node. Until you know more about the problem, set a very general filter.

EXAMPLE: You select a MAC-layer filter and set a conversation filter to capture all packets to and from Monolith.

2. Telnet into and log out from the quiet node. Then reset the capture buffer. Repeat this procedure until you see the problem reflected in your captured data. To keep the buffer information clean, reset the buffer each time that you repeat the procedure.

3. When you see the delay, note the rough value of the packet count on the LANsentry Manager packet buffer.

By noting the packet count at which you think the delay has occurred, you can narrow the problem to within about 20 packets in the buffer. If you have used an extremely quiet node, you may even identify the exact packet.

LANsentry Manager Packet Decode

The LANsentry Manager Packet Decode application decodes all major protocols and displays the packet contents at three levels of detail: summary information, header information, and actual packet content.

Follow these steps:

1. Open the buffer in the Packet Decode application and locate the number of the packet at which the delay occurred.

2. Select the packet and launch a MAC-layer conversation filter. In the filter display, look for a gap in the conversation (that is, where the node sent a request and then resent it at approximately the same rate as the delay you experienced when recreating the problem).

3. Repeat the test to determine if the result concentrates on one node or if it appears on other nodes.

EXAMPLE: On the quiet node that you selected, the delay is obvious. You see an NFS request going out to a node and a repeat of the request 30 seconds later. During that time, the node did not respond. You now know that the delay occurred because nodes were not seeing responses for NFS requests. When you repeat the test on other nodes, you find that the delay is happening with more than one destination node.
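The "gap in the conversation" that you look for in step 2 can also be found programmatically if you export the captured packets. The record layout below (time, source, destination, request identifier) is hypothetical, as are the function name and the sample data; it is not the export format of any tool named here.

```python
def find_retry_gaps(packets, min_gap=1.0):
    """Find requests that were resent after a wait of at least min_gap seconds.

    packets -- list of (time_seconds, source, destination, request_id) tuples
               in capture order; this record layout is hypothetical
    Returns a list of (request_id, destination, gap_seconds).
    """
    last_seen = {}
    gaps = []
    for when, src, dst, req in packets:
        key = (src, dst, req)   # the same request, sent in the same direction
        if key in last_seen and when - last_seen[key] >= min_gap:
            gaps.append((req, dst, when - last_seen[key]))
        last_seen[key] = when
    return gaps

capture = [
    (0.00, "monolith", "server1", "req-1"),
    (0.01, "server1", "monolith", "req-1"),    # answered promptly
    (0.50, "monolith", "server2", "req-2"),    # no reply follows
    (30.50, "monolith", "server2", "req-2"),   # same request resent
]
```

Running `find_retry_gaps(capture)` reports the request to server2 that was resent 30 seconds later, the same pattern as in the EXAMPLE above.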


Address Tracker

Use Address Tracker, which polls managed devices, to determine the hubs to which the problem nodes are attached. If the problem end stations are located on unmanaged devices, then you can at least narrow the problem to those unmanaged devices.

EXAMPLE: Although your network does not have managed hubs that Transcend® NCS management software can poll, it does have managed switches. When it polls the switches, Address Tracker displays the switch ports on which addresses were last seen. This information indicates the hub (but not the hub port) on which the device is located.

LANsentry Manager Packet Decode

After you know the location of the hub that has the problem node, monitor the problem from the hub using LANsentry Manager Packet Decode.

Follow these steps:

1. To capture packets from one of the nodes on the hub, set up another capture buffer and repeat the exercise that is described in LANsentry Manager Packet Capture. Because a delay may occur on a different node, use two capture buffers without stopping the first one.

Note the rough packet count where the delay appears.

2. Display a conversation filter of the packet where the delay appears and look for the gap in the conversation.

EXAMPLE: You hope that the nodes are on the same hub. You find that all the nodes are on HUB3. This result indicates that FCS Errors may be causing the timeouts. However, because the errors occur at a low rate, you decide to verify this diagnosis. You monitor the problem from the hub, logging in and out many times, and the delay eventually occurs. This time, the delay shows that the node's reply had an FCS Error, even though the node received the request. The switch would not have transmitted this packet, causing a timeout on the NFS protocol. The retry time is presumably 30 seconds. During this test, you see the problem occurring on another node.

Correcting the Fault


Without a managed hub, you may find it very difficult to discover network file server timeout errors. To find the problematic node, you must either systematically isolate nodes by monitoring each node for a prolonged period or temporarily insert a managed hub.

EXAMPLE: You notice that the captured error packet failed FCS because it was corrupted by a regular pattern during transmission. A possible reason for this occurrence is a Jabbering node. This explanation makes sense because FCS/Jabber frames increased linearly when you were monitoring the live network.

Network File Server Timeouts Reference

This section explains terms that are relevant to network file server timeouts and provides additional conceptual and problem analysis detail.

Jabbering

Jabbering occurs when a node transmits illegal-length packets and is possibly not operating within carrier specifications. In effect, another node has written bad data over a valid packet. This bad data is often interpreted as a repeated sequence of data.
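As a rough illustration of what "illegal length" means, the sketch below sorts frames by size and FCS result. It assumes the standard untagged Ethernet limits of 64 and 1518 bytes (including the FCS) and uses category names modeled on RMON statistics counters. Neither the limits nor the names come from the text above, and the function is hypothetical.

```python
MIN_FRAME = 64    # bytes, including FCS (assumed standard Ethernet minimum)
MAX_FRAME = 1518  # bytes, untagged, including FCS (assumed standard maximum)


def classify_frame(length: int, fcs_ok: bool) -> str:
    """Classify a frame by its length and whether its FCS check passed."""
    if length > MAX_FRAME:
        # Too long with a bad FCS is the pattern counted as a jabber.
        return "oversize" if fcs_ok else "jabber"
    if length < MIN_FRAME:
        return "undersize" if fcs_ok else "fragment"
    return "ok" if fcs_ok else "fcs_error"


print(classify_frame(1600, fcs_ok=False))
print(classify_frame(512, fcs_ok=False))
```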

Network File System (NFS) Protocol

A distributed file system protocol developed by Sun Microsystems that allows a computer system to access files over a network as if they were on its local disks. This protocol has been incorporated into products by more than 200 companies. It is now a de facto Internet standard.

NFS is one protocol in the NFS suite of protocols, which includes NFS, RPC, XDR (External Data Representation), and others. These protocols are part of a larger architecture that Sun Microsystems refers to as Open Network Computing (ONC). ONC is a distributed applications architecture designed by Sun and currently controlled by a consortium led by Sun.

Troubleshooting TCP/IP - Detailed Steps

This article shows how to troubleshoot TCP/IP connectivity between computers on a Windows network. If you haven't already done so, disable XP's Internet Connection Firewall on all local area network connections and remove all firewall programs on the network. Improperly configured firewalls are the most common cause of TCP/IP problems.

Open a Command Prompt Window

For many of these steps, you'll be typing at the command prompt. To open a command prompt window in Windows 2000 or XP, click Start | Run, type cmd in the box, and click OK. To open a command prompt window in Windows 95, 98, or Me, click Start | Run, type command in the box, and click OK. Type one command per line and press Enter after each one to execute it. To close the command prompt window, use the exit command.

Determine the TCP/IP Settings

Determine the TCP/IP settings of each computer on the local area network. In XP, open the Network Connections folder, right-click the LAN connection, and click Status | Support | Details. For example, here are the Status and Details views for the LAN connection on an Internet Connection Sharing host.

In Windows 95/98/Me, click Start | Run, type winipcfg in the box, and click OK. Select the LAN adapter from the menu and click More Info. Here's the winipcfg view for an ICS client running Windows Me.

You can also see the TCP/IP settings from the command prompt. This is especially convenient if a computer has more than one network adapter. Use the ipconfig /all command, which is available in all versions except Windows 95. The output from this command can be long, so it's best to write it to a file. Specify the file name in the command this way:

ipconfig /all >ipconfig.txt

Description of TCP/IP Settings

Here are the TCP/IP settings that are used in network troubleshooting:

• IP Address - Unique address assigned to a network adapter. A computer with multiple network adapters has an IP address for each one, and each one must be in a different subnet.

• Subnet Mask - Used in conjunction with the IP address to determine which subnet an adapter belongs to. At the simplest level, communication is only possible between two network adapters when they're in the same subnet.

• Default Gateway - IP address of a computer or router, on one of this computer's local area networks, that knows how to communicate with subnets not present on this computer. For an Internet connection, the default gateway is a router belonging to your Internet service provider, and all access to sites on the Internet goes through it. For an ICS client, the default gateway is the ICS host. If you use a hardware router, it serves as the default gateway.

• DHCP Server - If an adapter is configured to obtain an IP address automatically, this is the address of the server that provides it. It could be your ISP, an ICS host, or a hardware router.

• DNS Servers - IP address of one or more Domain Name Server computers. DNS servers translate Internet names to their IP addresses (like 63.146.109.227).
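The way the subnet mask combines with the IP address can be checked with a few lines of code. This is a minimal sketch using Python's standard ipaddress module. The function name and the sample addresses are made up for illustration, and both adapters are assumed to use the same mask.

```python
import ipaddress


def same_subnet(ip_a: str, ip_b: str, mask: str) -> bool:
    """Return True when both addresses fall in the same subnet for this mask."""
    # strict=False lets us pass a host address and get its containing network.
    net_a = ipaddress.ip_network(f"{ip_a}/{mask}", strict=False)
    net_b = ipaddress.ip_network(f"{ip_b}/{mask}", strict=False)
    return net_a == net_b


# Hypothetical addresses: the first pair shares a subnet, the second does not.
print(same_subnet("192.168.0.1", "192.168.0.57", "255.255.255.0"))
print(same_subnet("192.168.0.1", "169.254.10.20", "255.255.255.0"))
```

If the function reports that two adapters are in different subnets, they need a gateway between them to communicate.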

Subnets

See our article on subnets for a brief description of how they work. For more details, see this Microsoft Knowledge Base article.

If two computers are supposed to be on the same subnet but aren't, something is wrong with the network hardware or software configuration. This is most likely to happen when one of them receives an IP address of 169.254.x.x, which indicates that:

• It's configured to obtain an IP address automatically.
• It couldn't find a DHCP server on the network to make the assignment.
• Windows assigned it an Automatic Private IP Address.

See our article on Specific Networking Problems and Their Solutions for more information.
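When reviewing a list of collected addresses, the automatically assigned ones can be flagged programmatically. The sketch below assumes the Automatic Private IP Address range is 169.254.0.0/16. The function name and the sample addresses are hypothetical.

```python
import ipaddress

# Range assumed for Automatic Private IP Addressing.
APIPA_RANGE = ipaddress.ip_network("169.254.0.0/16")


def is_apipa(ip: str) -> bool:
    """Return True when the address was most likely self-assigned by Windows."""
    return ipaddress.ip_address(ip) in APIPA_RANGE


for address in ["192.168.0.57", "169.254.10.20"]:
    status = "no DHCP server was found" if is_apipa(address) else "looks normally assigned"
    print(address, "-", status)
```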

Pinging

The ping command is the basic tool for testing TCP/IP connectivity. It sends a special packet (called ICMP Echo) to a particular IP address and looks for a reply. If everything is working right, the reply comes back. If not, the ping times out in a few seconds. By default, the ping command repeats the process four times. Here's an example of an ICS client computer pinging a Windows XP Home Edition ICS host, using the host's IP address and its computer name.

When ping fails, you'll see one of these error messages:

• Request timed out - The IP address is valid, but there's no reply from it. If the IP address is on a local area network, the most likely cause is a firewall program blocking the ping.

• Unknown host <name> or Ping request could not find host <name> - The computer name doesn't exist on the local area network. Make sure that NetBIOS over TCP/IP is enabled.

• Destination host unreachable - The IP address isn't on a local area network, and the default gateway can't access it. Either there's no default gateway, its address is wrong, or it isn't functioning.
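The error messages above can be matched against saved ping output to suggest a likely cause. This is only a sketch: the function name and table are hypothetical, the message wording is taken from the list above, and other Windows versions or languages may phrase the messages differently.

```python
# Message fragments from the list above, paired with the suggested cause.
PING_DIAGNOSES = [
    ("request timed out", "No reply: on a LAN, suspect a firewall blocking the ping."),
    ("unknown host", "Name not found: check that NetBIOS over TCP/IP is enabled."),
    ("could not find host", "Name not found: check that NetBIOS over TCP/IP is enabled."),
    ("destination host unreachable", "No route: check the default gateway setting."),
]


def diagnose_ping(output: str) -> str:
    """Return a hint for the first known error message found in ping output."""
    text = output.lower()
    for fragment, hint in PING_DIAGNOSES:
        if fragment in text:
            return hint
    return "No known error message found."


print(diagnose_ping("Ping request could not find host win98."))
```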

Pinging the Local Area Network

Here is a series of ping commands to use in finding where a problem occurs on a local area network. Run them in the order shown, and don't go on to the next command until all of the previous commands work properly. In this example:

• The computer being tested is named Winxp, with IP address 192.168.1.101.
• There's another computer on the network named Win98, with IP address 192.168.1.123.

Substitute the appropriate IP addresses and computer names for your network.

Command               Target                          What Ping Failure Indicates
ping 127.0.0.1        Loopback address                Corrupted TCP/IP installation
ping localhost        Loopback name                   Corrupted TCP/IP installation
ping 192.168.1.101    This computer's IP address      Corrupted TCP/IP installation
ping winxp            This computer's name            Corrupted TCP/IP installation
ping 192.168.1.123    Another computer's IP address   Bad hardware or NIC driver
ping win98            Another computer's name         NetBIOS name resolution failure

To fix a corrupted TCP/IP installation on Windows XP, follow the steps in this Microsoft Knowledge Base article. For Windows 95/98/Me, un-install the TCP/IP protocol in Control Panel | Network, reboot, and re-install it. If that doesn't fix it, use this procedure on Windows 95 or 98.
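The rule of running the local area network pings in order and stopping at the first failure can be automated. The sketch below is one possible approach, not a tool from the article. The targets are placeholders to replace with your own addresses and names, the helper names are hypothetical, and relying on the ping exit code is an assumption, because some systems report success even when the reply is an error message.

```python
import platform
import subprocess
from typing import Callable, List, Optional, Tuple

# (target, what a failure indicates), in the order the pings should be run.
# Placeholder targets: substitute your own addresses and computer names.
LAN_STEPS: List[Tuple[str, str]] = [
    ("127.0.0.1", "Corrupted TCP/IP installation"),
    ("localhost", "Corrupted TCP/IP installation"),
    ("192.168.1.101", "Corrupted TCP/IP installation"),
    ("winxp", "Corrupted TCP/IP installation"),
    ("192.168.1.123", "Bad hardware or NIC driver"),
    ("win98", "NetBIOS name resolution failure"),
]


def system_ping(target: str) -> bool:
    """Send one echo request using the operating system's ping command."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    result = subprocess.run(
        ["ping", count_flag, "1", target],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def first_failure(
    steps: List[Tuple[str, str]],
    ping: Callable[[str], bool] = system_ping,
) -> Optional[Tuple[str, str]]:
    """Run the pings in order and return the first failing step, or None."""
    for target, meaning in steps:
        if not ping(target):
            return target, meaning
    return None
```

Calling first_failure(LAN_STEPS) on the computer being tested returns the first target that did not answer together with the likely cause, or None when every ping succeeds. The ping argument makes it possible to substitute a different check for testing.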

Pinging the Internet

You can also use ping to find a problem with Internet access. Run these commands in the order shown, and don't go on to the next command until all of the previous commands work properly. Use the Default Gateway and DNS Server addresses that you got from the winipcfg or ipconfig /all command.

Command                  Target                What Ping Failure Indicates
ping w.x.y.z             Default Gateway       Default Gateway down
ping w.x.y.z             DNS Server            DNS Server down
ping w.x.y.z             Web site IP address   Internet service provider or web site down
ping www.something.com   Web site name         DNS Server down or web site down

REFERENCE

WWW.PRACTICALLYNETWORK.COM

WWW.COMPUTERHOPE.COM

WWW.WIKIPEDIA.ORG

THANK YOU

Table 5 Network Data and the OSI Model Layers

Layer               Data Collected                        Transcend NCS Tool Used
Application,        Protocol information and other        • LANsentry Manager
Presentation,       Remote Monitoring (RMON) and          • Traffix Manager (for more detail)
Session, Transport  RMON2 data
Network             Routing information                   • Status Watch
                                                          • LANsentry Manager (for more detail)
                                                          • Traffix Manager (for more detail)
Data Link           Traffic counts and other packet       • Status Watch
                    breakdowns                            • LANsentry Manager (for more detail)
Physical            Error counts                          • Status Watch

Figure 1 OSI Reference Model and Network Troubleshooting

Troubleshooting Strategy

How do you know when you are having a network problem? The answer to this question depends on your site's network configuration and on your network's normal behavior. See Knowing Your Network for more information.

If you notice changes on your network, ask the following questions:

• Is the change expected or unusual?
• Has this event ever occurred before?
• Does the change involve a device or network path for which you already have a backup solution in place?
• Does the change interfere with vital network operations?
• Does the change affect one or many devices or network paths?

After you have an idea of how the change is affecting your network, you can categorize it as critical or noncritical. Both of these categories need resolution (except for changes that are one-time occurrences); the difference between the categories is the time that you have to fix the problem.

By using a strategy for network troubleshooting, you can approach a problem methodically and resolve it with minimal disruption to network users. It is also important to have an accurate and detailed map of your current network environment. Beyond that, a good approach to problem resolution is:

• Recognizing Symptoms
• Understanding the Problem
• Identifying and Testing the Cause of the Problem
• Solving the Problem

Recognizing Symptoms

The first step to resolving any problem is to identify and interpret the symptoms. You may discover network problems in several ways. Users may complain that the network seems slow or that they cannot connect to a server. You may pass your network management station and notice that a node icon is red. Your beeper may go off and display the message "WAN connection down".

User Comments

Although you can often solve networking problems before users notice a change in their environment, you invariably get feedback from your users about how the network is running, such as:

• They cannot print.
• They cannot access the application server.
• It takes them much longer to copy files across the network than it usually does.
• They cannot log on to a remote server.
• When they send e-mail to another site, they get a routing error message.
• Their system freezes whenever they try to Telnet.

Network Management Software Alerts

Network management software, as described in Your Network Troubleshooting Toolbox, can alert you to areas of your network that need attention. For example:

• The application displays red (Warning) icons.
• Your weekly Top-N utilization report (which indicates the 10 ports with the highest utilization rates) shows that one port is experiencing much higher utilization levels than normal.
• You receive an e-mail message from your network management station that the threshold for broadcast and multicast packets has been exceeded.

These signs usually provide additional information about the problem, allowing you to focus on the right area.
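The Top-N report and the "much higher than normal" check mentioned above can be sketched in a few lines. This is an illustration only, not how the management software works internally. The function names, the sample figures, and the factor of 2 used to decide what counts as much higher are all assumptions.

```python
from typing import Dict, List, Tuple


def top_n_ports(utilization: Dict[str, float], n: int = 10) -> List[Tuple[str, float]]:
    """Return the n ports with the highest utilization, busiest first."""
    ranked = sorted(utilization.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]


def above_baseline(
    current: Dict[str, float], baseline: Dict[str, float], factor: float = 2.0
) -> List[str]:
    """Return ports whose utilization exceeds factor times their baseline."""
    return [
        port
        for port, value in current.items()
        if port in baseline and value > factor * baseline[port]
    ]


# Hypothetical utilization percentages per port.
baseline = {"port1": 10.0, "port2": 20.0, "port3": 25.0}
current = {"port1": 12.0, "port2": 85.0, "port3": 30.0}
print(top_n_ports(current, n=2))
print(above_baseline(current, baseline))
```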

Analyzing Symptoms

When a symptom occurs, ask yourself these types of questions to narrow the location of the problem and to get more data for analysis:

• To what degree is the network not acting normally (for example, does it now take one minute to perform a task that normally takes five seconds)?
• On what subnetwork is the user located?
• Is the user trying to reach a server, end station, or printer on the same subnetwork or on a different subnetwork?
• Are many users complaining that the network is operating slowly or that a specific network application is operating slowly?
• Are many users reporting network logon failures?
• Are the problems intermittent? For example, some files may print with no problems, while other printing attempts generate error messages, make users lose their connections, and cause systems to freeze.

Understanding the Problem

Networks are designed to move data from a transmitting device to a receiving device. When communication becomes problematic, you must determine why data are not traveling as expected and then find a solution. The two most common causes for data not moving reliably from source to destination are:

• The physical connection breaks (that is, a cable is unplugged or broken).
• A network device is not working properly and cannot send or receive some or all data.

Network management software can easily locate and report a physical connection break (layer 1 problem). It is more difficult to determine why a network device is not working as expected, which is often related to a layer 2 or a layer 3 problem.

To determine why a network device is not working properly, look first for:

• Valid service - Is the device configured properly for the type of service it is supposed to provide? For example, has Quality of Service (QoS), which is the definition of the transmission parameters, been established?

• Restricted access - Is an end station supposed to be able to connect with a specific device, or is that connection restricted? For example, is a firewall set up that prevents that device from accessing certain network resources?

• Correct configuration - Is there a misconfiguration of IP address, subnet mask, gateway, or broadcast address? Network problems are commonly caused by misconfiguration of newly connected or configured devices. See Manager-to-Agent Communication for more information.

Identifying and Testing the Cause of the Problem

After you develop a theory about the cause of the problem, test your theory. The test must conclusively prove or disprove your theory.

Two general rules of troubleshooting are:

• If you cannot reproduce a problem, then no problem exists unless it happens again on its own.

• If the problem is intermittent and you cannot replicate it, you can configure your network management software to catch the event in progress.

For example, with LANsentry Manager, you can set alarms and automatic packet capture filters to monitor your network and inform you when the problem occurs again. See Configuring Transcend NCS for more information.

Although network management tools can provide a great deal of information about problems and their general location, you may still need to swap equipment or replace components of your network until you locate the exact trouble spot.

After you test your theory, either fix the problem as described in Solving the Problem or develop another theory.

Sample Problem Analysis

This section illustrates the analysis phase of a typical troubleshooting incident.

On your network, a user cannot access the mail server. You need to establish two areas of information:

• What you know - In this case, the user's workstation cannot communicate with the mail server.

• What you do not know and need to test:

  • Can the workstation communicate with the network at all, or is the problem limited to communication with the server? Test by sending a Ping or by connecting to other devices.

  • Is the workstation the only device that is unable to communicate with the server, or do other workstations have the same problem? Test connectivity at other workstations.

  • If other workstations cannot communicate with the server, can they communicate with other network devices? Again, test the connectivity.

The analysis process follows these steps:

1. Can the workstation communicate with any other device on the subnetwork?

• If no, then go to step 2.
• If yes, determine if only the server is unreachable.

  • If only the server cannot be reached, this suggests a server problem. Confirm by doing step 2.

  • If other devices cannot be reached, this suggests a connectivity problem in the network. Confirm by doing step 3.

2. Can other workstations communicate with the server?

• If no, then most likely it is a server problem. Go to step 3.
• If yes, then the problem is that the workstation is not communicating with the subnetwork. (This situation can be caused by workstation issues or a network issue with that specific station.)

3. Can other workstations communicate with other network devices?

• If no, then the problem is likely a network problem.
• If yes, the problem is likely a server problem.
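The three questions above can be condensed into a small decision function. This is a simplified sketch of the procedure, assuming each question has already been answered by a connectivity test. The function and parameter names are hypothetical, and the way the answer to step 1 refines the workstation result is an interpretation, not something the steps state outright.

```python
def locate_problem(
    workstation_reaches_other_devices: bool,
    others_reach_server: bool,
    others_reach_other_devices: bool,
) -> str:
    """Suggest where the fault lies when one workstation cannot reach a server."""
    if others_reach_server:
        # Step 2 "yes": the server works for everyone else.
        if workstation_reaches_other_devices:
            return "workstation: check how it is configured to reach that server"
        return "workstation: it is not communicating with the subnetwork"
    # Step 2 "no": continue with step 3.
    if others_reach_other_devices:
        return "server"
    return "network"


print(locate_problem(True, False, True))
```

The step 1 answer only matters when other workstations can reach the server. In every other case the outcome is decided by steps 2 and 3.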

When you determine whether the problem is with the server, subnetwork, or workstation, you can further analyze the problem as follows:

• For a problem with the server - Examine whether the server is running, if it is properly connected to the network, and if it is configured appropriately.

• For a problem with the subnetwork - Examine any device on the path between the users and the server.

• For a problem with the workstation - Examine whether the workstation can access other network resources and if it is configured to communicate with that particular server.

Equipment for Testing

To help identify and test the cause of problems, have available:

• A laptop computer that is loaded with a terminal emulator, TCP/IP stack, TFTP server, CD-ROM drive (to read the online documentation), and some key network management applications, such as LANsentry® Manager. With the laptop computer, you can plug into any subnetwork to gather and analyze data about the segment.

• A spare managed hub to swap for any hub that does not have management. Swapping in a managed hub allows you to quickly spot which port is generating the errors.

• A single port probe to insert in the network if you are having a problem where you do not have management capability.

• Console cables for each type of connector, labeled and stored in a secure place.

Solving the Problem

Many device or network problems are straightforward to resolve, but others yield misleading symptoms. If one solution does not work, continue with another.

A solution often involves:

• Upgrading software or hardware (for example, upgrading to a new version of agent software or installing Gigabit Ethernet devices)

• Balancing your network load by analyzing:

  • What users communicate with which servers
  • What the user traffic levels are in different segments

  Based on these findings, you can decide how to redistribute network traffic.

• Adding segments to your LAN (for example, adding a new switch where utilization is continually high)

• Replacing faulty equipment (for example, replacing a module that has port problems or replacing a network card that has a faulty jabber protection mechanism)

To help solve problems, have available:

• Spare hardware equipment (such as modules and power supplies), especially for your critical devices

• A recent backup of your device configurations to reload if flash memory gets corrupted (which can sometimes happen due to a power outage)

Your Network Troubleshooting Toolbox

A robust network troubleshooting toolbox consists of items (such as network management applications, hardware devices, and other software) to recognize, diagnose, and solve networking problems. It contains:

• Transcend Applications
• Network Management Platforms
• 3Com SmartAgent Embedded Software
• Other Commonly Used Tools

Transcend management software is optimized for managing 3Com devices and their attached networks. However, some applications, such as LANsentry Manager, can manage any vendor's networking equipment that complies with the Remote Monitoring (RMON) Management Information Base (MIB).

This section describes these Transcend applications, which you can use to troubleshoot your network:

• Transcend Central
• Status Watch
• Address Tracker
• LANsentry Manager
• Traffix Manager
• Device View

Transcend Central

Start with Transcend Central, which is an asset management and device grouping application, to understand what your network consists of and to control the Transcend NCS network management troubleshooting tools. Transcend Central is available as both a native Windows application and a Java application that you can access using a Web browser.

Using Transcend Central for troubleshooting, you can:

• Display an inventory of device, module, and port information.
• Group devices to make your troubleshooting tasks easier. By managing a collection of devices, you can simultaneously perform the same tasks on each device in a group and locate physical or logical problems on your network.
• Launch Transcend NCS applications, including some of your primary Transcend NCS troubleshooting tools:

  • Status Watch, which includes Web Reporter (from the Java version)
  • Address Tracker
  • LANsentry Manager
  • Traffix Manager
  • Device View

Status Watch

The Status Watch applications manage 3Com devices and their attached networks. Status Watch applications primarily poll for MIB-II data. This is a performance monitoring application that allows you to monitor the operational status of your network devices and quickly identify any problems that require your attention. It works in conjunction with Web Reporter.

Web Reporter

Web Reporter is a data-reporting application that runs in a World Wide Web (WWW) browser. It generates reports from data that Status Watch collects, allowing you to compare network statistics against a baseline.

Address Tracker

Address Tracker is an address collection and discovery application that:

• Polls managed devices for all MAC addresses
• Polls managed devices and routers for IP addresses to perform MAC-to-IP address translation
• Uses Device View to disable troublesome ports

LANsentry Manager

LANsentry Manager is a set of integrated applications that displays and explores the real-time and historical data that RMON-compliant devices (probes) on the network capture. LANsentry Manager uses SNMP polling to gather RMON and RMON2 data from the probes.

Use LANsentry Manager to:

• Monitor current performance of network segments
• See trends over time
• Spot signs of current problems
• Configure alarms to monitor for specific events
• Capture packets and display their contents

LANsentry Manager works with any device (from 3Com or other vendors) that supports the RMON MIB or the RMON2 MIB.

Traffix Manager

Traffix Manager is a performance-monitoring application that provides information about layer 2 (RMON) and layer 3 conversations between nodes. It helps you to assess traffic patterns on your network. Traffix Manager:

• Monitors all the stations that the RMON2-compliant probes encounter on your network
• Captures and stores RMON and RMON2 data for your network's protocols and applications
• Displays traffic between stations in user-defined views of the network
• Graphs current or historical data on the devices selected
• Delivers reports for user-specified stations and time periods as PostScript to your printer or as HTML to your Web server
• Launches LANsentry Manager tools for in-depth analysis of a station or a conversation between stations

You can use Traffix Manager to:

• Know your network - Understand overall flow patterns and interactions between systems, and determine how your network is really being used at the application level.

• Optimize your network - Gain an insight into traffic and application usage trends to help you optimize the use and placement of current network resources and make wise decisions about capacity planning and network growth.

Traffix Manager works with any device (from 3Com or other vendors) that supports the RMON2 MIB.

Device View

The Device View application is a device configuration tool. When you troubleshoot your network, you can use Device View to determine or change a device's configuration. You can also use Device View to look at a device's statistics and to set alarms.

Device View manages only 3Com devices.

Network Management Platforms

As part of your troubleshooting toolbox, your network management platform is the first place to go to view the overall health of your network. With the platform, you can understand the logical configuration of your network and configure views of your network to understand how devices work together and the role that they play in the users' work. The network management platform that supports your Transcend software installation can provide valuable troubleshooting tools. Transcend runs on several platforms within the NT and UNIX environments.

The platform discovers the devices Transcend imports that information from the platform to populate the core database Unless you are rediscovering the user must manually update the platform

Using this device database a map displays the graphical representation of your network Each device on your network appears as a symbol (icon) on the map You can configure views of your network to show devices on the same subnetworks or floors

You can monitor network performance and diagnose network performance and connectivity problems. You can also:

• Take a snapshot of your network in its normal state. The snapshot records the state of your network at a particular instant. If you later have network performance problems, you can compare the current state of your network to the snapshot.
• Quickly determine the connectivity status of a device by noting the color of its map symbol. Red usually means that communication with a device has ceased.
• Diagnose connectivity problems by determining whether two devices can communicate. If they can communicate, then examine the route between the devices, the number of packets that were sent and lost, and the roundtrip time between the two devices.
• Manage MIB information (for example, collecting and storing MIB data for trend analysis and graphing) using MIB queries. Transcend compiles MIBs and allows you to navigate up and down the MIB Tree to retrieve MIB objects from devices. You can set thresholds for MIB data and generate events when a threshold is exceeded.
• Configure the software to act on certain events. The Event Categories window informs you of any unexpected events (which arrive in the form of traps).
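The snapshot comparison described in the list above can be illustrated outside any management platform. The following is a minimal sketch in Python, not part of Transcend or any platform API. It assumes a snapshot is simply a dictionary that maps a device name to a status string; the function name `diff_snapshot` and the device names are hypothetical.

```python
def diff_snapshot(baseline, current):
    """Compare two {device_name: status} snapshots and return what changed.

    Devices that appear in only one snapshot are reported with the
    placeholder status "absent" on the other side.
    """
    changes = {}
    for device in sorted(set(baseline) | set(current)):
        before = baseline.get(device, "absent")
        after = current.get(device, "absent")
        if before != after:
            changes[device] = (before, after)
    return changes


# Hypothetical snapshots: one taken in the normal state, one taken during a problem.
normal_state = {"switch-1": "up", "router-1": "up", "hub-3": "up"}
problem_state = {"switch-1": "up", "router-1": "down", "probe-2": "up"}

for device, (before, after) in diff_snapshot(normal_state, problem_state).items():
    print(f"{device}: {before} -> {after}")
```

Only the devices whose status differs are reported, which keeps attention on what changed since the network was last known to be healthy.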

For more information, see the documentation that is shipped with your software.

3Com SmartAgent Embedded Software

Traditional Simple Network Management Protocol (SNMP) management places the burden of collecting network management information on the management station. In this traditional model, software agents collect information about throughput, record errors or packet overflows, and measure performance based on established thresholds. Through a polling process, agents pass this information to a centralized network management station whenever they receive an SNMP query. Management applications then make the data useful and alert the user if there are problems on the device.

As a useful companion to traditional network management methods, 3Com's SmartAgent® technology places management intelligence into the software agent that runs within a 3Com device. This scalable solution reduces the amount of computational load on the management station and helps minimize management-related network traffic.

SmartAgent software, which uses the RMON MIB, is self-monitoring, collecting and analyzing its own statistical, analytical, and diagnostic data. In this way, you can conduct network management by exception - that is, you are notified only if a problem occurs. Management by exception is unlike traditional SNMP management, in which the management software collects all data from the device through polling.

SmartAgent software works autonomously and reports to the network management station whenever an exceptional network event occurs. The software can also take direct action without involving the management station. Devices that contain SmartAgent software may be able to:

• Perform broadcast throttling to minimize the flow of broadcast traffic on your network
• Monitor the ratio of good frames to bad frames
• Switch a resilient link pair to the standby path if the primary path corrupts frames
• Report if traffic on vital segments drops below minimum usage levels
• Disable a port for five seconds to clear problems and then automatically reconnect it

The Transcend NCS applications LANsentry Manager and Traffix Manager make the RMON data that the SmartAgent software collects more usable by summarizing and correlating important information.

Other Commonly Used Tools

These commonly used tools can also help you troubleshoot your network:

• Network software, such as Ping, Telnet, and FTP and TFTP. You can use these applications to troubleshoot, configure, and upgrade your system.
• Network monitoring devices, such as Analyzers and Probes
• Tools, such as Cable Testers, for working on physical problems

Ping

Packet Internet Groper (Ping) allows you to quickly verify the connectivity of your network devices. Ping attempts to transmit a packet from one device to a station on the network and listens for the response to ensure that it was correctly received. You can validate connections on the parts of your network by pinging different devices.

• A successful response indicates that a valid network path exists between your station and the remote host and that the remote host is active.
• Slower response times than normal can indicate that the path is congested or obstructed.
• A failed response indicates that a connection is broken somewhere; use the message to help locate the problem. See Tips on Interpreting Ping Messages.

Strategies for Using Ping

Follow these strategies for using Ping:

• Ping devices when your network is operating normally so that you have a performance baseline for comparison. See Identifying Your Network's Normal Behavior for more information.
• Ping by IP address when:
  • You want to test devices on different subnetworks. This method allows you to Ping your network segments in an organized way, rather than having to remember all the hostnames and locations.
  • Your Domain Name System (DNS) server is down and your system cannot look up host names properly. You can Ping with IP addresses even if you cannot access hostname information.
• Ping by hostname when you want to identify DNS server problems.
• To troubleshoot problems that involve large packet sizes, Ping the remote host repeatedly, increasing the packet size each time.
• To determine if a link is erratic, perform a continuous Ping (using ping -s on UNIX), which indicates the time that it takes the device to respond to each Ping.
• To determine a route taken to a destination, use the trace route function (tracert).
• Consider creating a Ping script that periodically sends a Ping to all necessary networking devices. If a Ping failure message is received, the script can perform some action to notify you of the problem, such as paging you.
• Use the Ping functions of your network management platform. For example, in your HP OpenView map, select a device and click the right mouse button to gain access to ping functions.
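The Ping script suggested in the list above could look something like the following. This is a minimal sketch in Python, not a tool from the original guide. It assumes that the system `ping` command is on the path and that it exits with status 0 when a reply arrives, which is not true of every platform's ping in every failure case. The function names, the default interval, and the notification hook are all hypothetical; replace `notify` with whatever paging or mail mechanism you use.

```python
import platform
import subprocess
import time


def ping_once(host, timeout_s=5):
    """Send one echo request with the system ping command; True if it reports success."""
    # The count flag differs by platform: -n on Windows, -c on most UNIX systems.
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    try:
        result = subprocess.run(
            ["ping", count_flag, "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, OSError):
        # No answer within the timeout, or the ping command could not be run.
        return False
    return result.returncode == 0


def check_hosts(hosts, ping=ping_once, notify=print):
    """Ping every host once and call notify() for each host that fails."""
    failed = [host for host in hosts if not ping(host)]
    for host in failed:
        notify(f"Ping failed: {host}")
    return failed


def monitor(hosts, interval_s=60, rounds=1, ping=ping_once, notify=print, sleep=time.sleep):
    """Repeat check_hosts() a fixed number of times, waiting interval_s between rounds."""
    for round_number in range(rounds):
        check_hosts(hosts, ping=ping, notify=notify)
        if round_number < rounds - 1:
            sleep(interval_s)
```

A call such as `monitor(["192.0.2.1", "192.0.2.2"], interval_s=300, rounds=12)` would check two (example) addresses every five minutes for an hour. The `ping` and `notify` parameters are injectable so that the logic can be exercised without sending real traffic.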

Tips on Interpreting Ping Messages

Use the following ping failure messages to troubleshoot problems:

No reply from <destination>

Indicates that the destination routes are available but that there is a problem with the destination itself.

<destination> is unreachable

Indicates that your system does not know how to get to the destination. This message means either that routing information to a different subnetwork is unavailable or that a device on the same subnetwork is down.

ICMP host unreachable from gateway

Indicates that your system can transmit to the target address using a gateway but that the gateway cannot forward the packet properly, because either a device is misconfigured or the gateway is not operating.
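A monitoring script can apply the same interpretations automatically. The sketch below, in Python, matches on the three message forms listed above. The exact wording that a given ping implementation prints varies by operating system, so the substrings are assumptions to adjust for your environment, and the function name is hypothetical.

```python
def interpret_ping_message(message):
    """Map a ping failure message to the likely cause described above."""
    text = message.lower()
    # Check the gateway message first because it is the most specific form.
    if "host unreachable from gateway" in text:
        return "gateway cannot forward: a device is misconfigured or the gateway is not operating"
    if "no reply from" in text:
        return "routes are available, but the destination itself has a problem"
    if "is unreachable" in text:
        return "no known route: routing information is unavailable or a device on the same subnetwork is down"
    return "unrecognized message"


print(interpret_ping_message("No reply from 192.0.2.10"))
```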

Telnet

Telnet, which is a login and terminal emulation program for Transmission Control Protocol/Internet Protocol (TCP/IP) networks, is a common way to communicate with an individual device. You log in to the device (a remote host) and use that remote device as if it were a local terminal.

If you have established an out-of-band Telnet connection with a device, you can use Telnet to communicate with that device even if the network is unavailable. This feature makes Telnet one of the most frequently used network troubleshooting tools. Usually, all device statistics and configuration capabilities are accessible by using Telnet to connect to the device's console. For more information about setting up an out-of-band connection, see Using Telnet, Serial Line, and Modem Connections.

You can invoke the Telnet application on your local system and set up a link to a Telnet process that is running on a remote host. You can then run a program that is located on a remote host as if you were working at the remote system.

FTP and TFTP

Most network devices support either the File Transfer Protocol (FTP) or the Trivial File Transfer Protocol (TFTP) for downloading updates of system software. Updating system software is often the solution to networking problems that are related to agent problems. Also, new software features may help correct a networking problem.

FTP provides flexibility and security for file transfer by:

• Accepting many file formats, such as ASCII and binary
• Using data compression
• Providing Read and Write access so that you can display, create, and delete files and directories
• Providing password protection

TFTP is a simple version of FTP that does not list directories or require passwords. TFTP only transfers files to and from a remote server.

Analyzers

An analyzer, which is often called a Sniffer, is a network device that collects network data on the segment to which it is attached, a process called packet capturing. Software on the device analyzes this data, which is a process referred to as protocol analysis. Most analyzers can interpret different types of protocol traffic, such as TCP/IP, AppleTalk, and Banyan VINES traffic.

You usually use analyzers for reactive troubleshooting - when you see a problem somewhere on your network, you attach an analyzer to capture and interpret the data from that area. Analyzers are particularly helpful for identifying intermittent problems. For example, if your network backbone has experienced moments of instability that prevent users from logging on to the network, you can attach an analyzer to the backbone to capture the intermittent problems when they happen again.

Probes

Like an analyzer, a probe is a network device that collects network data. Depending on its type, a probe can collect data from multiple segments simultaneously. It stores the collected data and transfers the data to an analysis site when requested. Unlike an analyzer, a probe does not interpret data.

A probe can be either a stand-alone device or an agent in a network device. The Transcend Enterprise Monitor 500 series and the SuperStack® II Monitor series are stand-alone RMON probes. LANsentry Manager and Traffix Manager use data from probes that comply with the RMON MIB or the RMON2 MIB.

You can use a probe daily to determine the health of your network. The Transcend NCS applications can interpret and report this data, alerting you to possible problems so that you can proactively manage your network. For example, an RMON2 probe can help you to analyze traffic patterns on your network. Use this data to make decisions about reconfiguring devices and end stations as needed.

Cable Testers

Cable testers examine the electrical characteristics of the wiring. They are most commonly used to ensure that building wiring and cables meet Category 5, 4, and 3 standards. For example, network technologies such as Fast Ethernet require the cabling to meet Category 5 requirements. Testers are also used to find defective and broken wiring in a building.

Knowing Your Network

You can better troubleshoot the problems on your network by:

• Knowing Your Network's Configuration
• Identifying Your Network's Normal Behavior

Knowing Your Network's Configuration

Part of understanding your network is knowing its physical and logical configuration. You should know:

• Which devices are on your network
• How the devices are configured
• Which devices are attached to the backbone
• Which devices connect your network to the outside world (WAN)

To keep track of your network's configuration, gather the following information:

• Site Network Map
• Logical Connections
• Device Configuration Information
• Other Important Data About Your Network

This data, when kept up-to-date, is extremely helpful for locating information when you experience network or device problems.

Site Network Map

A network map helps you to:

• Know exactly where each device is physically located
• Easily identify the users and applications that are affected by a problem
• Systematically search each part of your network for problems

You can create a network map using any drawing or flow chart application. Store your network map online. In addition, make sure that you always have a current version on paper in case you cannot access the online version. Figure 8 shows an example of a network map of 3Com devices.

Figure 8 Example of a Site Network Map

Consider including the following information on your network map:

• Location of important devices and workgroups (by floor, building, or area)
• Location of the network backbone, data center, and wiring closets, as appropriate for your network
• Location of your network management stations
• Location and type of remote connections
• IP subnetwork addresses for all managed switches and hubs
• Other subnetwork addresses, such as Novell IPX and AppleTalk, if appropriate for your network
• Type of media (by actual name, such as 10BASE-T, or by grouping, such as Ethernet), which you can show with callouts, colors, line weights, or line styles
• Virtual workgroups, which you can show with colors or shaded areas
• Redundant links, which you can indicate with gray or dashed lines
• Types of network applications that are used in different areas of your network
• Types of end stations that are connected to the switches and hubs

Logical Connections


With the advent of virtual LANs (VLANs), you need to know how your devices are connected logically as well as physically. For example, if you have connected two devices through the same physical switch, you might assume that they can communicate with each other. However, the devices can be in separate VLANs that restrict their communication.

Knowing the setup of your VLANs can help you to quickly narrow the scope of a problem to a VLAN instead of to a network connection.

The Transcend NCS application Enterprise VLAN Manager allows you to view the logical makeup of your network. Depending on the complexity of your network and VLAN configurations, you can use colors to show the VLANs graphically on your network map.

Device Configuration Information

Maintain online and paper copies of device configuration information. Make sure that all online data is stored with your site's regular data backup. If your site does not have a backup system, copy the information onto a backup disc (CD, Zip disk, and the like) and store it offsite.

Follow these guidelines for saving configuration information:

• Because the easiest way to recover a device's configuration is to use FTP or TFTP, save the configuration settings of each device that supports this method of uploading.
• For other devices, Telnet in and save the session (which contains configuration details) to a file. If you cannot print the configuration of a device, then create a quick rebuild guide that explains the quickest way to configure the device from a fresh install.
• For devices that store information to diskette, store this data as part of your site's regular backup.
• For routers and other important devices with text configuration files, store this data online in a revision control system. Keep the most recent version on paper. Keep previous versions.
• For PCs, keep a recovery disk for each type of PC. For any device that you use as a server, store all startup scripts and copies of registries.

Other Important Data About Your Network

For a complete picture of your network, have the following information available:

• All passwords - Store passwords in a safe place. Keep previous passwords in case you restore a device to a previous software version and need to use the old password that was valid for that version.
• Device inventory - The inventory allows you to see the device type, IP address, ports, MAC addresses, and attached devices at a glance. Software tools such as Transcend Central can help you keep track of the 3Com devices on your network. Using Transcend Central, you can group devices by type and location and have this information on hand for troubleshooting.
• MAC address-to-port number list - If your hubs or switches are not managed, you must keep a list of the MAC addresses that correlate to the ports on your hubs and switches. Generate and keep a paper copy of this list, which is crucial for deciphering captured packets, using Address Tracker.
• Log book - Document your interactions, no matter how trivial, with each device that is critical to your network's operation (that is, routers, remote access devices, and security servers). For example, document that you noticed a fan making noise one morning. Your note may help you to identify why a device is over temperature a week later (because the fan stopped working).
• Change control - Maintain a change control system for all critical systems. Permanently store change control records.
• Contact details - Store, online and on paper, the details of all support contracts, support numbers, engineer details, and telephone and fax numbers.
• LANsentry Reporter - Use LANsentry Reporter to generate reports from the database.

Identifying Your Network's Normal Behavior

By monitoring your network over a long period, you begin to understand its normal behavior. You begin to see a pattern in the traffic flow, such as which servers are typically accessed, when peak usage times occur, and so on. If you are familiar with your network when it is fully operational, you can be more effective at troubleshooting problems that arise.

Baselining Your Network

You can use a baseline analysis, which is an important indicator of overall network health, to identify problems. A baseline can serve as a useful reference of network traffic during normal operation, which you can then compare to captured network traffic while you troubleshoot network problems. A baseline analysis speeds the process of isolating network problems.

By running tests on a healthy network, you compile normal data to compare against the results that you get when your network is in trouble. For example, Ping each node to discover how long it typically takes you to receive a response from devices on your network.
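As an illustration of comparing current results against a Ping baseline, here is a minimal sketch in Python. It assumes that you have already collected round-trip times (in milliseconds) for a node while the network was healthy. The three-standard-deviation tolerance, the function names, and the sample values are assumptions chosen for the example, not figures from this guide.

```python
from statistics import mean, stdev


def build_baseline(samples_ms):
    """Summarize round-trip times collected while the network was healthy."""
    # stdev() needs at least two samples; collect many more in practice.
    return {"mean": mean(samples_ms), "stdev": stdev(samples_ms)}


def is_abnormal(rtt_ms, baseline, tolerance=3.0):
    """Flag a round-trip time that is far above the healthy-network baseline."""
    return rtt_ms > baseline["mean"] + tolerance * baseline["stdev"]


# Hypothetical round-trip times recorded for one node during normal operation.
healthy_rtts_ms = [10, 12, 11, 9, 13]
baseline = build_baseline(healthy_rtts_ms)

for rtt in (12, 30):
    status = "abnormal" if is_abnormal(rtt, baseline) else "normal"
    print(f"{rtt} ms is {status}")
```

A single slow reply does not prove a fault, so in practice you would compare several consecutive measurements before acting.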

Applications such as Status Watch, Address Tracker, LANsentry Manager, and Traffix Manager allow you to collect days and weeks of data and set a baseline for comparison. Through the reporting mechanisms in the following list, you can continuously assess the data from your network and ensure that its performance is optimal:

• Web Reporter generates daily or weekly reports from data collected by Status Watch.
• Traffix Manager generates weekly reports from collected data and calculates the baselines for you. Set up Utilization History and Error History reports with data resolution set to Weekly.
• LANsentry Manager History View generates daily utilization graphs, which are sampled every 30 minutes for each day over one week. Use these graphs to calculate your network baselines manually.

Identifying Background Noise

Know your network's background noise so that you can recognize real data flow. For example, one evening after everyone is gone, no backups are running, and most nodes are on, analyze the traffic on your network using the Traffix Manager application. The traffic that you see is mostly broadcast and multicast packets. Any errors that you see are the result of faulty devices (trace). This traffic is the background noise of your network - traffic that occurs for little value. If background noise is high, redesign your network.


FDDI Connectivity

Use these sections to identify and correct connectivity errors on an FDDI ring:

• FDDI Connectivity Overview
• Monitoring FDDI Connections
• Making Your FDDI Connections More Resilient

Fiber Distributed Data Interface (FDDI), which is a self-correcting technology, automatically corrects ring faults to maintain connectivity throughout most of the network. However, you should monitor your FDDI connections for wrapped rings and other problems with ring connectivity.

Understanding the Problem

As shown in Figure 9, in a thru FDDI LAN, no stations on the trunk ring have a Configuration State (SMTConfigurationState) of Wrap or Isolated. However, users who complain about network performance may have lost connectivity to other stations on the network because the FDDI network is wrapped or segmented.

Figure 9 Thru Ring

Wrapped ring

By monitoring the Peer Wrap Condition, you can see when the Configuration State changes. In a wrapped ring (Figure 10), two stations on the LAN are in a wrapped Configuration State. This condition may or may not affect the connectivity of certain stations. Although operational, your network may have a cabling problem or a problem with a link.

Figure 10 Wrapped LAN

Segmented ring

In a segmented ring (Figure 11), more than two stations are wrapped on the trunk ring. Although this mode of operation is a valid FDDI LAN configuration, your LAN is probably experiencing a degraded or degrading condition.

Figure 11 Segmented Ring
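The thru, wrapped, and segmented cases described so far differ only in how many trunk ring stations report a wrapped Configuration State. The following Python sketch captures that rule. It assumes that you have already read each station's Configuration State as a string such as "Thru", "Wrap", or "Isolated"; how those values are obtained is outside the sketch, the function name is hypothetical, and treating a single wrapped station the same as two is an assumption.

```python
def classify_ring(config_states):
    """Classify a trunk ring from the Configuration State of each station.

    No wrapped stations         -> "thru"
    One or two wrapped stations -> "wrapped" (one is treated like two: an assumption)
    More than two               -> "segmented"
    """
    # Any state whose name begins with "wrap" is counted as wrapped.
    wrapped = sum(1 for state in config_states if state.lower().startswith("wrap"))
    if wrapped == 0:
        return "thru"
    if wrapped <= 2:
        return "wrapped"
    return "segmented"


print(classify_ring(["Thru", "Wrap", "Thru", "Wrap"]))
```

Stations that report Isolated are not counted here; they are discussed separately under Isolated station.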

When a network connection has excessively high link errors, Station Management (SMT) shuts down the connection and tries to reconnect. A dual-attachment trunk ring station with an A or B connection that is shut down is one of the wrap points in the network. See Making Your FDDI Connections More Resilient for information about keeping a dual-attachment station connection from wrapping.

Isolated station

Sometimes a network wraps a particular station out of the ring. Stations on either side of a problem station can be wrapped. This effectively isolates the station or links that have problems, as shown in Figure 12.

Figure 12 Wrapped Ring with Isolated Station

If a ring was already wrapped when a network wraps a station out of the ring, then a segmented ring results, as shown in Figure 13.

Figure 13 Segmented Ring with Isolated Stations

Twisted ring

In a twisted ring, an A port is connected to an A port and a B port is connected to a B port, instead of the normal A-to-B connections. A twisted ring, which always has two twist points (stations), can exist in either a Thru or Wrap state. You can monitor the Twisted Ring Condition and Undesired Connection Attempt Event for evidence of twisted ring and other connection problems.

Identifying the Problem

To identify the problem, follow this process:

1. At the FDDI LAN level, verify that your network is operating.

If the network is operating, the FDDI ring may be segmented, and therefore an FDDI station or an Ethernet station on an Ethernet link may have lost connectivity to other nodes on the network.

2. Determine if a ring is in a Thru, Wrap, or Segmented state.

If the FDDI ring is segmented or wrapped, look for a problem with a link somewhere in the network or for a nonfunctioning node on your trunk ring. If the ring is operating and is not segmented, or if it is segmented but you still have connectivity to the stations in question, move to a more specific level in your network.

See Monitoring FDDI Connections for more information.

3. Determine if the poorly performing station is an Ethernet or FDDI station.

If the problem is an FDDI station, find out if it is congested (that is, if the station is so busy that it cannot accept all the network traffic that is directed to it) by determining its Bandwidth Utilization. Also determine if the station has a high frame error rate by looking at the FDDI Ring Errors.

If the problem is an Ethernet station, look for congestion by examining Ethernet Packet Loss and Bandwidth Utilization.

Solving the Problem

Identify the station that is causing the disconnection and take the appropriate steps:

• If the disconnection is caused by a wrapped ring, then fix the hardware or cabling problem at that station.
• If the station is congested, you have a device problem rather than a network problem. For example, if the congested station is a file server and every other machine on the network is retrieving and saving files using that server, consider upgrading your server or adding additional servers to the network. A variety of devices from different vendors may be communicating on an FDDI or Ethernet network; some are faster and more capable, and some are slower and more prone to congestion.
• If the station is an Ethernet station that is attached to an Ethernet segment, reevaluate the setup of your Ethernet network and make some changes to improve its performance.

You can also make FDDI connections more resilient by implementing dual homing or installing an Optical Bypass Unit (OBU) where FDDI connections are prone to fail. See Making Your FDDI Connections More Resilient for more information.

Monitoring FDDI Connections

Monitor your FDDI devices for Warning or Critical alerts in the FDDI Status tool.

Status Watch

Use Status Watch to identify these FDDI connectivity errors:

• Peer Wrap Condition
• Twisted Ring Condition
• Undesired Connection Attempt Event

Follow these steps:

1. In the Device area, select the device that is located where you suspect an FDDI ring connectivity problem.

2. Monitor the FDDI Status tool for the currently selected device.

Here are some pointers for monitoring:

• If the Peer Wrap Configuration State variable is Isolated, the device is not connected to the FDDI trunk ring. If you intend the device to remain isolated, this indication is not a serious condition. However, if the device is supposed to be connected on a trunk ring, a serious problem may exist. The device is no longer transmitting packets to the larger trunk ring.
• If the Peer Wrap flag (SMTPeerWrapFlag) is set, the device is one of the wrap points. The cause of the wrapped ring is somewhere in the portion of the network between the two stations that report the peer wrap condition.

Making Your FDDI Connections More Resilient

When devices are removed from an FDDI ring, there is a break in the fiber path that causes the ring to wrap until the ring is made whole again. To prevent the break in the FDDI connection, you can implement dual homing or install an Optical Bypass Unit (OBU).

Implementing Dual Homing

When the operation of a dual attachment node is critical to your network, dual homing adds reliability by providing a backup connection if the primary link fails. Because a dual attachment station (DAS) has two attachments to the FDDI ring (A-to-M and B-to-M), you can use one of them as a standby link if the active link fails. Using dual homing, only one of the two attachments is active at a time. In this sense, a DAS acts as if it is a single attachment station (SAS) by using its A port as the standby link.

Through SMT, a DAS can be dual homed to the same concentrator or, more commonly, to two concentrators. This arrangement provides a more stable trunk ring of concentrators. If one concentrator fails, the DAS enables the standby link to another concentrator to become the active link. See Figure 14.

If the station is a dual path or dual path/dual MAC station, you can configure the dual-homed station in one of two ways:

• With both links active
• With one link active and one connection withheld as a backup, only becoming active when one link fails

Figure 14 Dual Homing Configuration

Installing an Optical Bypass Unit

You can insert an Optical Bypass Unit (OBU) into the FDDI ring as if it were a node and then plug your device into it. To use an OBU, your device needs an optical bypass interface. This interface lets the bypass know whether your device is still on the ring or not. See Figure 15.

If your device is removed or if it fails, the bypass unit diverts the optical path away from your device, keeping the ring whole. You can use a bypass on devices that are prone to failure or are likely to be removed often, such as diagnostic equipment.

Figure 15 Optical Bypass Unit Configuration

FDDI Connectivity Reference

This section explains terms that are relevant to FDDI connectivity and provides additional conceptual and problem analysis detail.

Peer Wrap Condition

A Peer Wrap (wrapped ring) condition occurs when a dual-attachment station detects a fault (often a lost connection) and reconfigures the network by wrapping the dual trunk rings to form a single ring. Normally, the two stations that are adjacent to the fault wrap to maintain full connectivity. However, if a second fault occurs before the first is repaired, the network partitions itself into two or more rings, and stations lose connectivity.

When a station reports a Peer Wrap condition, locate and repair the problem that caused the station to wrap the rings. Potential causes include:

• Faulty FDDI port hardware
• Faulty cables or connectors
• Unplugged connectors
• Powered-down stations

You can expect to find the cause of the problem between the two stations that report the Peer Wrap condition.

Twisted Ring Condition

A Twisted Ring condition occurs when certain undesirable connection types exist. See Table 7 for more information. Although similar to the Undesired Connection Attempt, the Twisted Ring condition provides specific Station Management (SMT) and port information for diagnosis.

Undesired Connection Attempt Event

An Undesired Connection Attempt event occurs when a port tries to connect to another port of a type that may result in an undesirable network topology. Whether the connection attempt is successful depends on the current setting of the station's connection policies.

Table 7 lists connections that the FDDI standard defines as undesirable. The managed devices may or may not permit these connections, depending on their FDDI station configurations.

Table 7 Undesirable Connection Types

Connection Type    Reason That the Connection Is Undesirable
A-A                Twisted primary and secondary rings
A-S                A wrapped ring
B-B                Twisted primary and secondary rings
B-S                A wrapped ring
S-A                A wrapped ring
S-B                A wrapped ring
M-M                A tree of rings topology (illegal connection)

Table 8 lists FDDI connections that create valid topologies.

Table 8 Valid Connection Types

Connection Type    Reason That the Connection Is Valid
A-B                A normal trunk peer connection
A-M                A tree connection with possible redundancy. In a single MAC node, Port B has precedence (by default) for connecting to a Port M
B-A                A normal trunk ring peer connection
B-M                A tree connection with possible redundancy. In a single MAC node, Port B has precedence (by default) for connecting to a Port M
S-S                A single ring of two slave stations
S-M                A normal tree connection
M-A                A tree connection that provides possible redundancy
M-B                A tree connection that provides possible redundancy
M-S                A normal tree connection
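When you document or audit FDDI cabling, the contents of Tables 7 and 8 can be encoded as a simple lookup. The Python sketch below does this. The dictionary and function names are hypothetical, and the descriptions are abbreviated from the tables; whether a particular device actually permits an undesirable connection still depends on its connection policies.

```python
# Port-type pairs from Table 7 (undesirable connections).
UNDESIRABLE = {
    ("A", "A"): "Twisted primary and secondary rings",
    ("A", "S"): "A wrapped ring",
    ("B", "B"): "Twisted primary and secondary rings",
    ("B", "S"): "A wrapped ring",
    ("S", "A"): "A wrapped ring",
    ("S", "B"): "A wrapped ring",
    ("M", "M"): "A tree of rings topology (illegal connection)",
}

# Port-type pairs from Table 8 (valid connections).
VALID = {
    ("A", "B"): "A normal trunk peer connection",
    ("A", "M"): "A tree connection with possible redundancy",
    ("B", "A"): "A normal trunk ring peer connection",
    ("B", "M"): "A tree connection with possible redundancy",
    ("S", "S"): "A single ring of two slave stations",
    ("S", "M"): "A normal tree connection",
    ("M", "A"): "A tree connection that provides possible redundancy",
    ("M", "B"): "A tree connection that provides possible redundancy",
    ("M", "S"): "A normal tree connection",
}


def check_connection(local_port, remote_port):
    """Return ("valid" or "undesirable", reason) for a pair of FDDI port types."""
    key = (local_port.upper(), remote_port.upper())
    if key in UNDESIRABLE:
        return "undesirable", UNDESIRABLE[key]
    if key in VALID:
        return "valid", VALID[key]
    raise ValueError(f"Unknown port types: {local_port}-{remote_port}")


print(check_connection("A", "A"))
print(check_connection("B", "M"))
```

Together the two tables cover all sixteen combinations of the port types A, B, S, and M, so the `ValueError` branch is reached only for a port type outside that set.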


Disabling the Offending Interface

Because broadcast storms can ultimately cause your whole network to become unavailable, take action immediately to disable the offending interface. You can enable the interface again after you have corrected the problem.

Address Tracker

Use Address Tracker to locate the interface that is causing the broadcast storm. Use Device View to disable the port.

Follow these steps:

1. In the Find Address window, enter the address of the interface that seems to be receiving the broadcast traffic.

You can copy the MAC or IP address from the Status Watch report and paste it into Address Tracker's Enter the Address You Want to Find field.

2. Click Find Now.

Search displays the device name.

3. Use Transcend Central to launch Device View and disable the port.

Disabling the port stops the broadcast storm before it interferes with all vital network traffic. You can re-enable this interface using Device View or the device's console later.

Correcting Spanning Tree Misconfigurations

Spanning Tree does not cause broadcast storms, but a loop in your Spanning Tree topology can create data that looks like a storm. A loop can occur in your topology if:

• Someone disables Spanning Tree on a port
• You set up your Spanning Tree configuration incorrectly


Device View

Use Device View to disable any Spanning Tree port that has a repeater attached to it and to correct Spanning Tree misconfigurations.

To correct Spanning Tree misconfigurations, use Device View to disable Spanning Tree Protocol (STP) for a port on a SuperStack® II Switch 1000, Switch 3000, Switch 3000 10/100, Switch 9000SX, Desktop Switch, LinkBuilder® FMS II Bridge/Management Module, or CoreBuilder 6000.

To disable the STP port state for a port on a SuperStack II switch:

1. Select a port and click the right mouse button.

2. From the shortcut menu, select Configure.

3. In the Port section, click the STP tab.

4. From the STP Port State list box, select Disabled.

5. Click Apply.

To disable the STP port state for a port on a LinkBuilder FMS II Bridge/Management Module:

1. Double-click the module.

2. From the shortcut menu, select Configure Bridge.

3. In the Port section, click the STP tab.

Broadcast Storms Reference

This section explains terms that are relevant to broadcast storms and provides additional conceptual and problem analysis detail.

Broadcast Packets

Broadcast packets, which are a normal part of network operation, are transmitted by a device to a broadcast address. For example, IP networks use broadcasts to resolve network addresses using Address Resolution Protocol (ARP). IPX networks use a large number of broadcast packets to operate most effectively.


Problems arise when broadcast packets endlessly propagate throughout the network, which increases the traffic volume on your network and the CPU time that each host spends processing and discarding unwanted broadcast packets.

Multicast Packets

Multicast packets, which are a normal part of network operation, are transmitted by a device to a multicast group address. Hosts that want to receive the packets indicate that they want to be members of the multicast group, and then multicast packets are distributed to that group. For example, multicast packets support the Spanning Tree Protocol. Multicast applications and underlying multicast protocols control multimedia traffic and shield hosts from processing unnecessary broadcast traffic. However, multicast traffic can also cause storms that saturate your network.

Duplicate Addresses

Use these sections to identify and correct problems caused by duplicate MAC and IP addresses:

• Duplicate Addresses Overview
• Finding Duplicate MAC Addresses
• Finding Duplicate IP Addresses

See Duplicate Addresses Reference for additional conceptual and problem analysis detail.

Networks sometimes generate duplicate MAC and IP addresses. Because duplicate addresses can cause problems with packet delivery, resolve them as soon as possible.

Understanding the Problem

Duplicate MAC addresses are caused by data link layer problems with Fiber Distributed Data Interface (FDDI) media and the passing of tokens on the FDDI ring. Duplicate IP addresses are caused by network layer problems. See these sections for more information about causes of duplicate addresses:

• Duplicate MAC Addresses
• Duplicate IP Addresses

Identifying the Problem

Identify duplicate MAC and IP addresses by following the instructions in these sections:

• Finding Duplicate MAC Addresses
• Finding Duplicate IP Addresses

Solving the Problem

Identify the cause of the duplicate address (such as user error or a hardware problem) and fix the problem, if possible.

Finding Duplicate MAC Addresses

To find out if duplicate MAC addresses are occurring, monitor your network using Status Watch.

Status Watch

The Status Watch FDDI Status tool identifies duplicate FDDI MAC addresses, and Status Watch reports when two or more MACs on the same ring have the same MAC address (a Duplicate Address condition).

Follow these steps:

1. In the Status Watch Summary View window, determine if any FDDI Status conditions are reported. If there are, double-click the table cell value to display the Device List window.

Another approach is to examine only the devices that you know reside on your FDDI ring. In the Status Watch main window, red device icons indicate that a threshold has been exceeded.

2. Select a device.

• If you selected the device from the Device List window, the real-time report for that device appears in the Status Watch main window.

• If you selected the device from the main window, also select the FDDI Status tool to view the real-time report.

3. Determine if a Duplicate Address condition caused the FDDI Status tool to trigger a Critical or Warning status for that device.

In Status Watch, you can specify the status severity level to apply to a Duplicate Address condition.

Finding Duplicate IP Addresses


To find out if duplicate IP addresses are occurring, monitor your network using these applications:

• Address Tracker - To find duplicate IP addresses on 3Com devices and their attached networks

• LANsentry® Manager - To find duplicate IP addresses that are collected by probes gathering RMON2 SmartAgent® data from the Enterprise Communications Analysis Module (ECAM) downloaded on your network devices

Address Tracker

Use Address Tracker to determine when and where duplicate IP addresses occur.

Follow these steps:

1. From the Find Address menu, select Find Duplicate IP Addresses.

2. Click Find Now to start your search.

LANsentry Manager

Use the Duplicates table in LANsentry Manager to compile a list of all stations with duplicate IP addresses. This table is available only on probes that have downloaded RMON2 (ECAM) SmartAgent software.

Follow these steps:

1. From the LANsentry Manager Address Map menu, select Duplicates. Address Map data is displayed as a table.

2. To export the contents of the table, click Export to launch the Data Export dialog box.

Duplicate Addresses Reference

This section explains terms that are relevant to duplicate addresses and provides additional conceptual and problem analysis detail.

Duplicate MAC Addresses

Each device on your network has a unique MAC address. This address identifies a single device on the network, allowing packets to be delivered to correct destinations.

Packets are delivered to their destinations by means of the MAC-address-to-IP-address translation that the Address Resolution Protocol (ARP) provides. Therefore, if MAC addresses are duplicated on the network, ARP caches of routing devices contain erroneous destinations. In FDDI, devices monitor network traffic, searching for their own MAC address in each packet to determine whether to decode the packet. If MAC addresses are not unique, two stations cannot be distinguished from each other.

Duplicate MAC addresses can occur for the following reasons:

• Someone has manually configured a MAC address for a device instead of using the address that the vendor supplied or allowing it to be assigned dynamically, and this address is also assigned to a different device.

• In rare circumstances, loops in a bridged network can cause a MAC hardware problem or an address learning problem that creates a duplicate MAC address entry in the bridging address table.

• On DECnet Phase 4 networks, MAC addresses are set from the DECnet address. A duplicate NET address can cause a duplicate MAC address.

Duplicate IP Addresses

Because IP addresses are critical for transmission of packets on TCP/IP networks, resolve duplicate IP addresses immediately.

Duplicate IP addresses can occur when someone has configured an IP address that is identical to an IP address that is assigned to a different device. Address assignments, although possible for you to configure manually, are usually made using one of these protocols:

• Dynamic Host Configuration Protocol (DHCP) - Allows your network to dynamically assign IP addresses to nodes. With this protocol, a DHCP server temporarily assigns an IP address to a node, or you can statically configure addresses as needed.

• BOOTstrap protocol (BootP) - Allows you to statically assign IP addresses to nodes. This protocol is more efficient than RARP.

• Reverse ARP (RARP) - Allows you to statically assign IP addresses to nodes. However, because this protocol relies on the MAC address to identify the node, you cannot use it on networks that dynamically assign hardware addresses.
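Whatever the assignment method, a duplicate IP address shows up as one IP address being claimed by more than one MAC address. The sketch below illustrates that condition only; it assumes you already have a list of (IP, MAC) observations from some source, such as an exported address table. The function name `find_duplicate_ips` and the input format are hypothetical and do not reflect how Address Tracker or LANsentry Manager work internally.

```python
from collections import defaultdict


def find_duplicate_ips(observations):
    """Return {ip: [mac, ...]} for every IP seen with more than one MAC.

    observations: iterable of (ip_address, mac_address) string pairs.
    MAC addresses are compared case-insensitively.
    """
    macs_by_ip = defaultdict(set)
    for ip, mac in observations:
        macs_by_ip[ip].add(mac.lower())
    return {
        ip: sorted(macs)
        for ip, macs in macs_by_ip.items()
        if len(macs) > 1
    }


if __name__ == "__main__":
    seen = [
        ("192.0.2.10", "00:A0:24:11:22:33"),
        ("192.0.2.11", "00:a0:24:44:55:66"),
        ("192.0.2.10", "00:a0:24:77:88:99"),  # second claimant for .10
    ]
    print(find_duplicate_ips(seen))
```

The addresses in the usage example are made up for illustration.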


Ethernet Packet Loss

Use these sections to identify and correct Ethernet packet loss:

• Ethernet Packet Loss Overview
• Searching for Packet Loss

See Ethernet Packet Loss Reference for additional conceptual and problem analysis detail.

If your Ethernet network shows signs of congestion, it may be experiencing packet loss. When your network is congested, utilization is usually high, packets are discarded because buffers are full, and collision rates are up. Problems related to Collisions are often at the heart of packet loss.

Understanding the Problem

Collisions are normal in Ethernet networks. In many cases, Collision rates of 50 percent do not cause a large decrease in throughput. The Collision rate helps mark the upper limit on your network (the maximum percentage of collisions that your network can bear), which is usually around 70 percent. If Collisions increase above this upper limit, your network can become unreliable.

When the Collision rates increase, so do Excessive Collisions, which causes a delay in transmitting data. An increase in Collisions also indicates that network utilization and network errors, such as FCS Errors, are probably increasing.

The real packet problems to watch for, however, are undetected collisions that show up as Late Collisions.

Identifying the Problem

To confirm that your network's problem is related to packet loss, verify that frames are being dropped on your network by examining this packet loss data:

• Alignment Errors
• Collisions
• Excessive Collisions
• FCS Errors
• CRC Errors
• Late Collisions
• Receive Discards
• Too Long Errors
• Too Short Errors
• Transmit Discards

The process of identifying the problem is discussed in Searching for Packet Loss.

Solving the Problem

If you notice that packet loss data is consistently high, then your network is too congested. In this case, segment your network with the appropriate network device (such as a switch or router). If Collision data shows increases but your network's utilization is the same, then your network may have a physical problem, such as cabling that is too long. Other problems that packet loss data can indicate include:

• Faulty connectors or improper cabling
• Excessive numbers of repeaters between network devices
• Defective Ethernet transceivers or controllers

Possible solutions to these problems are explained in the procedures in Searching for Packet Loss.
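The two rules of thumb in this section (consistently high loss points to congestion; rising collisions at unchanged utilization point to a physical problem) can be sketched as a small triage function. This is only an illustration of the reasoning: the function name, the boolean inputs, and the wording of the results are assumptions, and deciding what counts as "consistently high" or "rising" is left to you and your network's baseline.

```python
def triage_packet_loss(loss_consistently_high, collisions_rising, utilization_rising):
    """Map the observations described in this section to a likely cause.

    All three arguments are booleans that you determine from your own
    monitoring data; no thresholds are built in.
    """
    if loss_consistently_high:
        # Consistently high packet loss data: the network is too congested.
        return "congested: segment the network with a switch or router"
    if collisions_rising and not utilization_rising:
        # Collisions up while utilization is unchanged: suspect the plant.
        return ("possible physical problem: check cable length, connectors, "
                "repeater count, and transceivers or controllers")
    return "no conclusion: keep monitoring"


if __name__ == "__main__":
    print(triage_packet_loss(False, True, False))
```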

Searching for Packet Loss

When you look for packet loss, use the following applications:

• Status Watch - For Ethernet and MIB-II data collection using SNMP polling
• LANsentry Manager Network Statistics Graph - For RMON data collection using an RMON probe
• Device View - On a per-device basis, you can evaluate statistics for any port on the device

Status Watch

Status Watch monitors:

• Alignment Errors
• Excessive Collisions
• FCS Errors
• Receive Discards
• Transmit Discards

Follow these steps:

1. Determine if the thresholds for the Alignment Errors tool and FCS Errors tool are being exceeded.


Table 16 identifies the problems that this data can indicate and your possible actions. For information about problems related to a nonstandard Ethernet implementation, see Nonstandard Ethernet Problems.

Table 16. Alignment Errors, FCS Errors, and CRC Errors Data

Possible Problem: Faulty cabling
Possible Action: Examine the cable and cable connections for breaks or damage.

Possible Problem: Network noise
Possible Action: Look for improper cabling, faulty cable, faulty network equipment, or cables that are too close to equipment that emits electromagnetic interference (lamps, for example).

Possible Problem: Faulty transceiver
Possible Action: Use an analyzer to identify the problematic transceiver. If necessary, replace the transceiver, network adapter, or station.

Possible Problem: Fault at the transmitting end station
Possible Action:
1. Locate the source of the errors by looking at the module and port statistics.
2. Verify the correct operation of the transceiver or adapter card of the device that is connected to the problem port.
3. If the card appears to be operating correctly, examine the cable and cable connections for breaks or damage.

Possible Problem: Station powering up or down
Possible Action: None required.
Early implementations of Ethernet transceivers generate a significant amount of in-band noise when powering up; they frequently cause Alignment Errors and FCS Errors in an otherwise stable network.
When powering up, some software drivers for Ethernet controllers also initiate Time Domain Reflectometry (TDR) tests to test the Ethernet media. Network monitors report TDR tests as Alignment Errors and FCS Errors.

Possible Problem: Faulty adapter
Possible Action: Replace the adapter.

2. Determine if the Excessive Collisions tool threshold is being exceeded. Table 17 identifies the problems that this data can indicate and your possible actions.

Table 17. Collisions and Excessive Collisions Data

Possible Problem: Busy network
Possible Action: Use a bridge, router, or switch to reconfigure your network into segments with fewer stations.

Possible Problem: Faulty device (adapter, switch, hub, and the like) that does not listen before broadcasting. This problem increases the incidence of all types of collisions.
Possible Action: Isolate each adapter to see if the problem stops.

Possible Problem: Network loop
Possible Action: Ensure that no redundant connections to the same station have both connections active simultaneously.


3. Determine if the Receive Discards and Transmit Discards tools' thresholds are being exceeded.

If these errors are high in conjunction with the data that you learned in steps 1 and 2, then your network is overloaded. Segment your network.

LANsentry Manager Network Statistics Graph

Use the LANsentry Manager Network Statistics graph to view data for:

• Collisions
• Late Collisions
• Bandwidth Utilization
• CRC Errors
• Too Long Errors
• Too Short Errors

Follow these steps:

1. Display a Network Statistics graph for the local Ethernet segment on which users have reported poor performance.

This graph shows the most recent trend in Collision rates. If you have set up a History sample, you can also look at the historical trend. If a number of segments are connected by repeaters, examine the graph for each Ethernet segment.

2. Analyze Utilization and Collision rates to determine whether collisions are caused by an overloaded segment or a faulty component.

• If Utilization rates are high - The collisions are probably caused by an overloaded segment. If you have added nodes or new applications to your network, consider reconfiguring the cabling system using bridges and routers to filter out remote collisions and to keep local traffic on one segment. This action should level the network load.

• If Utilization rates are stable and appear normal - The collisions are probably caused by faulty components. In this case, do the following:

    • If the network consists of repeaters - Compare the Network Statistics graphs for each segment connected to the repeater. Because repeaters repeat traffic across all connected segments (which makes many segments seem like one network), you should see similar levels of traffic on all segments. One segment that shows dissimilar levels of traffic and collisions may indicate faulty hardware. In this case, monitor several collisions to track the source station that is transmitting too soon after collisions, and repair the station. Packets that are transmitted too soon after collisions are unlikely to be valid. See Table 17 for more information about Collisions.

    • On other networks - Determine the segment cable length.

3. Examine the CRC Errors and Late Collisions, which often indicate cabling or component problems.

Table 16 identifies the problems that CRC Errors can indicate and your possible actions. Table 18 identifies the problems that Late Collisions data can indicate and your possible actions.

Table 18. Late Collisions Data

Possible Problem: Cabling problems
• Segment too long
• Failing cable
• Segment not grounded properly (noise)
• Improper termination
• Taps too close (10BASE-5 and 10BASE-2 only)
• Noisy cable
Possible Action: Correct the cabling problem by doing one or more of the following:
• Reduce the segment length
• Replace the cable
• Ground the cable
• Terminate the cable correctly
• Check the taps
• Check for cables too close to equipment that emits electromagnetic interference

Possible Problem: Component problems
• Deaf or partially deaf node
• Failing repeater, transceiver, or controller cards
Possible Action: Correct the component problem by doing one of the following:
• Trace the failing component and replace it
• Replace the NIC or the transceiver

4. Trace Too Short Errors and Too Long Errors to the sender.

These errors often indicate faulty routers or LAN drivers and transceiver problems. Table 19 identifies the problems that this data can indicate and your possible actions.

Possible Problem: A transceiver on your network is adding bits to the packets that are transmitted by the attached station.
Possible Action:
1. Use a network analyzer to identify the problematic transceiver.
2. If necessary, replace the transceiver, network adapter, or station.

Possible Problem: The jabber protection mechanism on a transceiver has failed; it can no longer protect the network from the jabbering produced by the attached station.
Possible Action: Replace the network card.

Possible Problem: Excessive noise on the cable.
Note: Some 10/100 Mbps cards that autodetect the network speed may connect to the network at the wrong speed, causing excessive noise.
Possible Action: Check for improper cabling, faulty cable, faulty network equipment, or cables too close to noisy electronic equipment (lamps, for example).
If your network card autodetects the network speed and you have ruled out other problems, manually configure the network speed.

Possible Problem: Faulty routers (two different network types are connected and the router is not enforcing proper frame size restrictions)
Possible Action: Notify the manufacturer.

Possible Problem: Faulty LAN driver
Possible Action: Replace the driver.

Possible Problem: A normal condition on a LinkSwitch® 1000, LinkSwitch® 3000, or CoreBuilder 5000 FastModule.
If you use maximum-sized (1518-octet) Ethernet frames, the device's VLT-enabled ports add a frame tag of 4 bytes, resulting in a misleading Too Long Frame error.
Possible Action: These frames are passed successfully but will create the Too Long Frame error message.
If you want to eliminate the error message, reduce your Ethernet packet frames by 4 bytes.

Device View

Device View allows you to display a variety of port and device-level statistics relevant to Ethernet packet loss. Table 20 describes these statistics and their use in troubleshooting.

Table 20. Activity and Error Statistics in Device View

Statistics Group: Activity
Description: Displays the total network activity and errors on the selected port.
Use in Troubleshooting: This data shows readable packets, broadcast packets, Collisions, total errors, and runts, which cause Too Short Errors. You can interpret this data in the following ways:
• Runts are often caused by Collisions; however, if the values increase at specific times of the day, it may indicate that you need to change the network topology to manage the traffic more efficiently (for example, with switches or routers).
• Runts can also be caused by a badly terminated coax cable.
• Large numbers of runts not associated with high levels of collisions can indicate a transmission problem (examine the cable).
• Particularly high numbers of Collisions compared to the total number of readable packets can point to a hardware problem (a bad adapter) or to a data loop.
• A high proportion of Broadcast Packets (>10%) on a heavily utilized network (>50% of available bandwidth) can point to an incorrectly configured bridge or router on the network.

Statistics Group: Errors
Description: Displays the number of frames with errors on the selected port.
Use in Troubleshooting: The significance of errors depends on accompanying errors and prevailing network conditions. See the following error data for more information:
• Alignment Errors: Table 16
• FCS Errors: Table 16
• Too Long Errors: Table 19
• Too Short Errors or runts: Table 19
• Late Collisions: Table 18
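The broadcast rule of thumb in Table 20 (more than 10 percent broadcast packets on a segment running above 50 percent of available bandwidth) can be written as a quick check. This sketch assumes the broadcast share is measured against readable packets, which the table does not state explicitly; the function name and parameters are illustrative only.

```python
def broadcast_share_suspicious(broadcast_packets, readable_packets, utilization_pct):
    """Apply the Table 20 rule of thumb to one port's Activity counters.

    Assumption: the broadcast proportion is broadcast / readable packets.
    Returns True when the share exceeds 10% and utilization exceeds 50%.
    """
    if readable_packets <= 0:
        return False  # no traffic observed, nothing to judge
    share_pct = 100.0 * broadcast_packets / readable_packets
    return share_pct > 10.0 and utilization_pct > 50.0


if __name__ == "__main__":
    # 1,500 broadcasts out of 10,000 readable packets at 62% utilization
    print(broadcast_share_suspicious(1500, 10000, 62.0))
```

A True result is only a prompt to look at bridge and router configuration, not a diagnosis.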

To display Activity and Errors statistics for a device or port, follow these steps:

1. Select the required port or device.

2. From the shortcut menu, select Activity or Errors.

The statistics available depend on the type of port or device selected. See Table 20 for troubleshooting information.

Ethernet Packet Loss Reference

This section explains terms that are relevant to Ethernet packet loss and provides additional conceptual and problem analysis detail.

Alignment Errors

An Alignment Error indicates a received frame in which both of the following are true:

• The number of bits received is an uneven byte count (that is, not an integral multiple of 8).

• The frame has a Frame Check Sequence (FCS) error.

Alignment Errors often result from MAC layer packet formation problems, cabling problems that cause corrupted or lost data, and packets that pass through more than two cascaded multiport transceivers. See FCS Errors for more information about interpreting Alignment Errors.


Collisions

Collisions indicate that two devices detect that the network is idle and try to send packets at exactly the same time (within one round-trip delay). Because only one device can transmit at a time, both devices must stop sending and attempt to retransmit. Collisions are detected by the transmitting stations.

The retransmission algorithm helps to ensure that the packets do not retransmit at the same time. However, if the two devices retry at nearly the same time, packets can collide again; the process repeats until either the packets finally pass onto the network without collisions or 16 consecutive collisions occur and the packets are discarded.
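The retransmission algorithm referred to above is, in standard Ethernet, truncated binary exponential backoff: after the nth collision a station waits a random number of slot times between 0 and 2^min(n, 10) - 1, and gives up after 16 attempts. The following is a minimal simulation sketch of that scheme, not code from any device. The names `backoff_slots` and `send_with_backoff` are illustrative, and the slot time shown assumes 10 Mbps Ethernet (512 bit times).

```python
import random

MAX_ATTEMPTS = 16     # after 16 consecutive collisions the packet is discarded
BACKOFF_LIMIT = 10    # the backoff exponent stops growing after 10 collisions
SLOT_TIME_US = 51.2   # 512 bit times at 10 Mbps (assumed link speed)


def backoff_slots(collision_count, rng=random):
    """Pick the random wait, in slot times, after the nth consecutive collision."""
    if not 1 <= collision_count <= MAX_ATTEMPTS:
        raise ValueError("collision_count must be between 1 and 16")
    exponent = min(collision_count, BACKOFF_LIMIT)
    return rng.randrange(2 ** exponent)  # uniform in 0 .. 2**exponent - 1


def send_with_backoff(collides, rng=random):
    """Try to send one packet.

    collides: callable returning True when the current attempt collides.
    Returns (delivered, total_backoff_microseconds).
    """
    waited_us = 0.0
    for attempt in range(1, MAX_ATTEMPTS + 1):
        if not collides():
            return True, waited_us
        if attempt < MAX_ATTEMPTS:
            waited_us += backoff_slots(attempt, rng) * SLOT_TIME_US
    # 16 consecutive collisions: counted as an excessive collision, packet dropped
    return False, waited_us
```

Because the wait is random, two stations that collided are unlikely to pick the same delay again, and the widening range makes repeat collisions progressively less likely as load increases.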

CRC Errors

A Cyclic Redundancy Check (CRC) Error is an RMON statistic that combines FCS Errors and Alignment Errors. These errors indicate that packets were received with:

• A bad FCS and an integral number of octets (FCS Errors)
• A bad FCS and a non-integral number of octets (Alignment Errors)

CRC Errors can cause an end station to freeze. If a large number of CRC Errors are attributed to a single station on the network, replace the station's network interface board. Typically, a CRC Error rate of more than 1 percent of network traffic is considered excessive.

Excessive Collisions

Excessive Collisions indicate that 16 consecutive collisions have occurred, usually a sign that the network is becoming congested. For each excessive collision count (that is, after 16 consecutive collisions), a packet is dropped. If you know the normal rate of excessive collisions, then you can determine when the rate of packet loss is affecting your network's performance. See Knowing Your Network's Configuration for more information.

FCS Errors

Frame Check Sequence (FCS) Errors, a type of CRC Error, indicate that frames received by an interface are an integral number of octets long but do not pass the FCS check. The FCS is a mathematical way to ensure that all the frame's bits are correct without having the system examine each bit and compare it to the original. Packets with Alignment Errors also generate FCS Errors.

Both Alignment Errors and FCS Errors can be caused by equipment powering up or down or by interference (noise) on unshielded twisted-pair (10BASE-T) segments. In a network that complies with the Ethernet standard, FCS or Alignment Errors indicate bit errors during a transmission or reception. A very low rate is acceptable. Although Ethernet allows a 1 in 10^8 bit error rate, typical Ethernet performance is 1 in 10^12 or better.
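To make the FCS check concrete: the Ethernet FCS is a 32-bit CRC, and Python's `zlib.crc32` uses the same CRC-32 polynomial. The sketch below appends a checksum to a byte string and verifies it on "receipt". It is a simplified illustration; the helper names are hypothetical, and storing the checksum in little-endian byte order is an assumption of this example rather than a statement about bit ordering on the wire.

```python
import zlib


def append_fcs(frame: bytes) -> bytes:
    """Return the frame with a 4-byte CRC-32 trailer appended."""
    fcs = zlib.crc32(frame) & 0xFFFFFFFF
    return frame + fcs.to_bytes(4, "little")


def fcs_ok(frame_with_fcs: bytes) -> bool:
    """Recompute the CRC-32 over the body and compare it with the trailer."""
    if len(frame_with_fcs) < 4:
        return False
    body, trailer = frame_with_fcs[:-4], frame_with_fcs[-4:]
    return (zlib.crc32(body) & 0xFFFFFFFF) == int.from_bytes(trailer, "little")


if __name__ == "__main__":
    good = append_fcs(bytes(60))          # 60 zero bytes plus 4-byte FCS
    damaged = bytearray(good)
    damaged[10] ^= 0x01                   # flip a single bit "in transit"
    print(fcs_ok(good), fcs_ok(bytes(damaged)))
```

The receiver never compares the frame against the original; it only recomputes the checksum, which is why a single corrupted bit anywhere in the frame is enough to register an FCS Error.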

Late Collisions

Late Collisions indicate that two devices have transmitted at the same time, but cabling errors (most commonly, excessive network segment length or repeaters between devices) prevent either transmitting device from detecting a collision. Neither device detects a collision because the time to propagate the signal from one end of the network to the other is longer than the time to put the entire packet on the network. As a result, neither of the devices that cause the late collision senses the other's transmission until the entire packet is on the network.

Although late collisions occur for small packets, the transmitter cannot detect them. As a result, a network suffering measurable Late Collisions for large packets is losing small packets as well.
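The timing argument can be checked with simple arithmetic. A sender can only notice a collision while it is still transmitting, so the worst-case round-trip delay of the segment must not exceed the time it takes to transmit the frame. The sketch below assumes 10 Mbps Ethernet by default; the function names and the example delay are illustrative.

```python
def transmit_time_us(frame_octets, rate_mbps=10.0):
    """Time to place a frame of the given size on the wire, in microseconds."""
    return frame_octets * 8 / rate_mbps


def collision_goes_undetected(frame_octets, round_trip_delay_us, rate_mbps=10.0):
    """True when the sender has finished transmitting before a collision
    at the far end of the network could propagate back to it."""
    return round_trip_delay_us > transmit_time_us(frame_octets, rate_mbps)


if __name__ == "__main__":
    # Hypothetical over-length network with a 70-microsecond round trip.
    for size in (64, 1518):
        print(size, transmit_time_us(size), collision_goes_undetected(size, 70.0))
```

On such a network a minimum-size 64-octet frame is gone from the sender before the collision can return, while a 1518-octet frame is still being sent and the collision is reported late. This is why measurable Late Collisions on large packets imply silent loss of small ones.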

Nonstandard Ethernet Problems

Table 21 lists the symptoms that typically occur if a system violates the Ethernet standard.

Table 21. Symptoms of Common Ethernet Network Problems

Symptoms: FCS Errors and Alignment Errors increase significantly.
Problem: Network cabling is too long.
Notes: If you use a promiscuous network monitor, the number of Late Collisions reported by stations should correlate with the FCS and Alignment Errors reported by the monitor.

Symptoms: FCS and Alignment Errors increase proportionally with interference (sometimes referred to as noise hits).
Problem: Network segment is noisy.
Notes: Typically observed on a 10BASE-T network segment in a noisy environment. If you use multiple promiscuous monitors, the FCS and Alignment Errors among the monitors will not correlate.
If the monitor can track runts, also called Too Short Errors, the number of runt packets should be significantly higher than normal.

Symptoms: FCS and Alignment Errors are much higher than normal.
Problem: Networks do not conform to the access scheme of Carrier Sense Multiple Access with Collision Detect (CSMA/CD).
Notes: Occurs when some implementations of Ethernet in the segment are not entirely compatible with IEEE 802.3 repeaters.
Collision fragments linger on the network long enough to collide with retry packets at the minimum interpacket gap (IPG). The IPG is smaller on one side of the repeated network, causing a lost packet.
Ethernet controllers cannot receive packets that are separated by 4.7 µs or less. Some controllers cannot sustain receptions of packets separated by as much as 9.6 µs. If runt packets are received one after another and are followed by a collision fragment, Ethernet controllers that cannot sustain reception will lose packets.

Receive Discards

Receive Discards indicate that received packets could not be delivered to a high-layer protocol because of congestion or packet errors.

Too Long Errors

A Too Long Error indicates that a packet is longer than 1518 octets (including FCS octets) but otherwise well formed. Too Long Errors are often caused by a bad transceiver, a malfunction of the jabber protection mechanism on a transceiver, or excessive noise on the cable.

Too Short Errors

A Too Short Error, also called a runt, indicates that a packet is fewer than 64 octets long (including FCS octets) but otherwise well formed.
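The two length limits can be expressed as a short check. The sketch below classifies a frame by its total length including the 4 FCS octets; the function name and return strings are illustrative. The last line of the usage example reproduces the situation noted in Table 19, where a 4-byte tag added to a maximum-sized frame makes it appear too long.

```python
MIN_FRAME_OCTETS = 64    # shorter frames are runts (Too Short Errors)
MAX_FRAME_OCTETS = 1518  # longer frames are Too Long Errors


def classify_frame_length(octets):
    """Classify a frame by total length in octets, FCS included."""
    if octets < MIN_FRAME_OCTETS:
        return "too short"
    if octets > MAX_FRAME_OCTETS:
        return "too long"
    return "ok"


if __name__ == "__main__":
    print(classify_frame_length(63))        # runt
    print(classify_frame_length(1518))      # largest legal frame
    print(classify_frame_length(1518 + 4))  # maximum frame plus a 4-byte tag
```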

Transmit Discards

Transmit Discards indicate that packets were not transmitted because of network congestion.

FDDI Ring Errors

Use these sections to identify and correct FDDI ring errors:

• FDDI Ring Errors Overview
• Identifying Ring Errors

Fiber Distributed Data Interface (FDDI) often corrects its own problems. However, because FDDI cannot correct all errors (especially those related to hardware problems), you should monitor FDDI errors.

Understanding the Problem

FDDI ring errors that you should monitor include:

• Elasticity Buffer Error Condition
• Frame Error Condition
• Frames Not Copied Condition
• Link Error Condition

Identifying the Problem

First, determine the type of FDDI ring errors and where they are occurring. Similar to the way you identify other FDDI problems, identify the upstream and downstream neighbors of the devices that you are monitoring.

Several types of network errors can cause FDDI performance problems. For example, problems with cables or physical connections may result in a link or frame error. Elasticity buffer (EB) errors can also lead to link and frame errors.

FDDI deals with port-related errors as follows:

• The variable PORTlerAlarm is the link error rate (LER) value at which a link connection generates an alarm. When the LER is greater than the alarm setting, Station Management (SMT) sends a Status Report Frame (SRF) to notify you that there is a problem with a port.

The PORTlerAlarm threshold is set lower than the PORTlerCutoff threshold so that you are notified of a problem before the port is actually removed from the ring.

• When link errors reach the threshold defined by the variable PORTlerCutoff, SMT breaks the connection, disabling the PHY that detected the problem. A Link Error Condition is also generated.

FDDI deals with MAC-related errors as follows:

• When MAC frame errors reach a certain threshold, a Frame Error Condition is generated. Because the actual error can be further upstream than the immediate connection, the connection remains intact.

• For a large network, the worst case MACFrameErrorRatio is less than 0.1 percent. However, during network configuration, frame error ratios can reach 50 percent for short periods. When you detect a sustained frame error ratio of more than 0.1 percent, a problem exists between the station that is reporting the condition and the nearest upstream MAC.
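Because frame error ratios can spike briefly during reconfiguration, the useful signal is a ratio that stays above 0.1 percent. The sketch below shows one way to separate a sustained problem from a short burst. It is a simplification: it computes a plain errors-to-frames percentage from hypothetical per-interval counters rather than the exact MACFrameErrorRatio calculation, and the choice of three consecutive intervals is an arbitrary assumption.

```python
def frame_error_ratio_pct(error_frames, total_frames):
    """Percentage of frames in one polling interval that contained errors."""
    if total_frames <= 0:
        return 0.0
    return 100.0 * error_frames / total_frames


def sustained_frame_errors(samples, threshold_pct=0.1, min_consecutive=3):
    """Return True if the ratio exceeds the threshold for several intervals in a row.

    samples: iterable of (error_frames, total_frames), one pair per interval.
    min_consecutive: assumed number of intervals that counts as "sustained".
    """
    run = 0
    for error_frames, total_frames in samples:
        if frame_error_ratio_pct(error_frames, total_frames) > threshold_pct:
            run += 1
            if run >= min_consecutive:
                return True
        else:
            run = 0  # a clean interval ends the streak
    return False


if __name__ == "__main__":
    burst = [(5000, 10000), (2, 10000), (1, 10000), (0, 10000)]
    steady = [(30, 10000), (25, 10000), (40, 10000), (35, 10000)]
    print(sustained_frame_errors(burst), sustained_frame_errors(steady))
```

In the usage example, the first series has one 50 percent spike followed by clean intervals and is not flagged, while the second stays at roughly 0.3 percent and is.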

Solving the Problem

To solve problems related to FDDI errors, fix the hardware, cabling, or congestion problem.

Identifying Ring Errors


Use Status Watch to monitor your FDDI devices for Warning or Critical alerts.

Status Watch

Use Status Watch to identify FDDI ring errors.

Follow these steps:

1. Monitor the FDDI Status tool for the currently selected device.

2. Determine whether Status Watch is reporting Elasticity Buffer Errors or a high percentage of Frame Errors, Frames Not Copied, or Link Error Rates for the currently selected device.

FDDI Ring Errors Reference

This section provides additional conceptual and problem analysis detail.

Elasticity Buffer Error Condition

The Elasticity Buffer Error condition occurs when a port's elasticity buffer overflows or underflows. This condition usually indicates that a port's hardware is not operating within the tolerances that the FDDI standard specifies. Look for the problem in the hardware of either the port that is reporting the condition or of the immediately adjacent port.

Frame Error Condition

The Frame Error condition occurs when the percentage of frames that contain errors exceeds a preset threshold. When a device is an uplink to FDDI (that is, a device is transmitting onto FDDI), this type of condition indicates that the ring is saturated. The ring is out of buffer space, and packets are being dropped from the device's backbone port.

The problem indicated by the frame errors is usually located between the MAC that reports the condition and its upstream neighbor. Because many physical connections can lie along this path, the MACFrameErrorRatio variable identifies only the two MACs between which the problem is occurring.

Frames Not Copied Condition

The Frames Not Copied condition occurs when the percentage of frames that are dropped because of insufficient buffer space exceeds a preset threshold. This condition indicates that the station is congested and is unable to process frames as quickly as they arrive. To help eliminate congestion:

• Add more capacity to the station
• Reconfigure your network so that end stations that communicate heavily with one another are on the same bridge or switch
• Filter out certain traffic

Link Error Condition

The Link Error condition occurs when a port detects link errors at a rate that exceeds a preset threshold. When the Link Error threshold is exceeded, the station removes itself from the ring and tries to reinsert itself on the ring. This action creates a MAC Neighbor Change Event (which also occurs if a ring wraps).

Link errors may indicate an FDDI PHY hardware problem (such as a faulty transmitter) or a faulty cable or connector. Look for the problem in the portion of the network between the port that is reporting the condition and the first upstream transmitter.

MAC Neighbor Change Event

The MAC Neighbor Change event occurs when a MAC's upstream or downstream neighbor changes.

This event indicates either:

• A network reconfiguration
• Another station that is leaving or joining the ring

Network File Server Timeouts

Use these sections to identify and correct timeouts on network file servers:

• Network File Server Timeout Overview
• Looking for Obvious Errors
• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

See Network File Server Timeouts Reference for additional conceptual and problem analysis detail.

A network file server can time out if your network gets congested or if your server is having problems. Users might have problems transferring data to or from the server, or copying files to or from the server. To help you understand the troubleshooting process for this type of problem, an EXAMPLE throughout this section follows the symptoms, analysis, and resolution of a typical file server timeout problem.


Understanding the Problem

When users log in, their stations make network file server calls either to determine quotas (if this feature has been enabled) or to mount user home directories. Network file server timeout messages, even when spread across multiple nodes, indicate a problem either with the network or with a server.

EXAMPLE: UNIX users notice that it takes a long time (over 30 seconds in some cases) to log in to any machine. Some machines report network file server timeout messages, but the messages have no obvious pattern and are infrequent. You begin to get a sense of the problem.

Identifying the Problem

First, rule out the obvious causes. Ask these questions:

• Can you access the network file server with Telnet?
• Have any alarms been triggered?
• Are there any new errors?

The process of identifying the problem is developed in Looking for Obvious Errors.

Solving the Problem

To determine the cause, reproduce the fault while you monitor the network. After you know the cause, you can fix the problem.

The solutions to the network file server timeout are identified in these sections:

• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

Looking for Obvious Errors

To look for obvious errors, use these applications:

• Ping and Telnet - To check connectivity to the network file server nodes
• LANsentry Manager Alarms View - To search for triggered alarms
• LANsentry Manager Statistics View - To look for errors
• LANsentry Manager History View - To identify trends


Ping and Telnet

Determine whether you can contact network file server nodes using Ping and Telnet. If the response is extremely slow, a problem may exist with the connections to the nodes. No delay indicates that the connections are normal, implying that the delay is occurring elsewhere. In this case, use LANsentry® Manager tools to determine whether packets are being lost or ignored.

LANsentry Manager Alarms View

Using the LANsentry Manager Alarms View, you can determine whether any configured alarms have been triggered.

Search the Alarms View to see whether any MAC events have been logged.

EXAMPLE: MAC events have not been logged for the network to which the UNIX users are attached.

Even though no alarms have occurred, errors may exist. For example, a low rate of background errors may exist just below the alarm threshold. Because RMON alarms are based on maximum and minimum values, they may miss constant, periodic, or low levels of errors.
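The following Python sketch illustrates why a threshold alarm can stay silent while errors accumulate. The sample values and the threshold of 10 errors per sample are invented for the illustration and do not come from LANsentry Manager.

```python
def alarm_triggered(samples, threshold):
    """A threshold alarm fires only if a single sample exceeds the threshold."""
    return any(count > threshold for count in samples)


def total_errors(samples):
    """Total number of errors seen across all samples."""
    return sum(samples)


# Hypothetical error counts per sampling interval.
steady_background = [9] * 60      # nine errors in every one of 60 samples
single_burst = [0] * 59 + [11]    # quiet, then one burst of eleven errors
```

With a threshold of 10, the single burst raises an alarm although it accounts for only 11 errors, while the steady background never raises one although it adds up to 540 errors. This is why the Statistics View and History View are worth checking even when the Alarms View is empty.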

LANsentry Manager Statistics View

Using the LANsentry Manager Statistics View, you can display a multisegment graph of utilization and error statistics.

Follow these steps:

1. Set up a graph that shows utilization and errors on all your major segments.

2. Determine whether any segments are particularly busy or error prone.

EXAMPLE: You notice that one segment of the UNIX network, HUB3, is reporting Too Long Errors and FCS Errors roughly every second sample. While the amount is not higher than normal, it is currently higher than on any other segment.

LANsentry Manager History View

Using the LANsentry Manager History View, you can display a rolling history table to determine whether the errors that you are seeing are new. For example, if you have a history table that runs for 30-minute samples over two days, you can compare the most recent sample to a previous sample, looking for new errors. If your probe has the resources, use a much finer resolution sample stored for a shorter time (every 30 seconds for 2 hours) to more easily spot recent errors.

EXAMPLE: The history table shows that no error rates remained constant throughout the day. However, the errors that did occur were on the device HUB3 and were Too Long Errors and FCS Errors.
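As a rough illustration of the two history configurations mentioned above, the Python sketch below works out how many samples each one stores and compares two samples for newly appearing error types. The function names and the dictionary layout are assumptions made for the example; they are not part of LANsentry Manager.

```python
from datetime import timedelta


def sample_count(interval: timedelta, span: timedelta) -> int:
    """Number of samples a history table holds for a given interval and time span."""
    return span // interval


def new_errors(previous: dict, recent: dict) -> list:
    """Error types that appear in the recent sample but were absent from the previous one."""
    return sorted(name for name, count in recent.items()
                  if count > 0 and previous.get(name, 0) == 0)


coarse = sample_count(timedelta(minutes=30), timedelta(days=2))   # 30-minute samples over two days
fine = sample_count(timedelta(seconds=30), timedelta(hours=2))    # 30-second samples over two hours
```

The coarse table holds 96 samples and the fine one 240, so the finer table costs more probe memory per hour covered but pins a new error down to a 30-second window.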

Reproducing the Fault While Monitoring the Network

Although the RMON View in LANsentry Manager can show error rates and help you to identify the location of the problem, it may not provide enough data to solve the problem. To determine the cause of the problem, reproduce it while you monitor the network by using these applications:

• LANsentry Manager Top-N Graph - To locate a quiet node to use for reproducing the fault
• LANsentry Manager Packet Capture - To capture packets from the hub to which the quiet node is attached
• LANsentry Manager Packet Decode - To analyze the packets to assess network file server traffic and delays
• Address Tracker - To find the location of the problem nodes

EXAMPLE: Using LANsentry Manager, you find a hub on the network with a higher than normal error rate. However, the error rate does not seem high enough to cause login delays of 60 seconds or more.

LANsentry Manager Top-N Graph

Using the Top-N graph in the LANsentry Manager main window, locate a quiet node that has been showing the same problem. Choose a quiet node so that you do not receive excessive traffic when you try to isolate the problem.

EXAMPLE: You see that the node Monolith, which has the same Network File System (NFS) mounts as the other nodes on the network, is quiet. You decide to use this node for reproducing the fault. See Network File System (NFS) Protocol for more information about NFS.

LANsentry Manager Packet Capture

Using the LANsentry Manager Packet Capture application, capture packets from the network using predefined patterns and start-and-stop conditions.


Follow these steps:

1. Set up a capture buffer on a probe that is connected to the same hub as the quiet node. Until you know more about the problem, set a very general filter.

EXAMPLE: You select a MAC-layer filter and set a conversation filter to capture all packets to and from Monolith.

2. Telnet into and log out from the quiet node. Then reset the capture buffer. Repeat this procedure until you see the problem reflected in your captured data. To keep the buffer information clean, reset the buffer each time that you repeat the procedure.

3. When you see the delay, note the rough value of the packet count on the LANsentry Manager packet buffer.

By noting the packet count at which you think the delay has occurred, you can narrow the problem to within about 20 packets in the buffer. If you have used an extremely quiet node, you may even identify the exact packet.

LANsentry Manager Packet Decode

The LANsentry Manager Packet Decode application decodes all major protocols and displays the packet contents at three levels of detail: summary information, header information, and actual packet content.

Follow these steps:

1. Open the buffer in the Packet Decode application and locate the number of the packet at which the delay occurred.

2. Select the packet and launch a MAC-layer conversation filter. In the filter display, look for a gap in the conversation (that is, where the node sent a request and then resent it after an interval that approximately matches the delay you experienced when recreating the problem).

3. Repeat the test to determine whether the result concentrates on one node or appears on other nodes.

EXAMPLE: On the quiet node that you selected, the delay is obvious. You see an NFS request going out to a node and a repeat of the request 30 seconds later. During that time, the node did not respond. You now know that the delay occurred because nodes were not seeing responses for NFS requests. When you repeat the test on other nodes, you find that the delay is happening with more than one destination node.
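If you export the captured conversation to another tool, the search for a gap followed by a repeated request can be automated. The Python sketch below is one possible approach, not the method Packet Decode uses: the Packet record, its request_id field (for example, an RPC transaction ID), and the one-second minimum gap are all hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Packet(NamedTuple):
    time: float       # seconds since the start of the capture
    src: str          # sending node
    dst: str          # receiving node
    request_id: int   # hypothetical identifier shared by a request and its retry


def find_retries(packets: List[Packet], min_gap: float = 1.0) -> List[Tuple[Packet, Packet]]:
    """Pair each request with a later identical request sent at least min_gap seconds after it."""
    last_seen = {}
    retries = []
    for pkt in sorted(packets, key=lambda p: p.time):
        key = (pkt.src, pkt.dst, pkt.request_id)
        earlier = last_seen.get(key)
        if earlier is not None and pkt.time - earlier.time >= min_gap:
            retries.append((earlier, pkt))
        last_seen[key] = pkt
    return retries
```

Each returned pair marks a request that went unanswered long enough to be resent; the time difference between the two packets is the delay the user experienced.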


Address Tracker

Use Address Tracker, which polls managed devices, to determine the hubs to which the problem nodes are attached. If the problem end stations are located on unmanaged devices, you can at least narrow the problem to those unmanaged devices.

EXAMPLE: Although your network does not have managed hubs that Transcend® NCS management software can poll, it does have managed switches. When it polls the switches, Address Tracker displays the switch ports on which addresses were last seen. This information indicates the hub (but not the hub port) on which the device is located.

LANsentry Manager Packet Decode

After you know the location of the hub that has the problem node, monitor the problem from the hub using LANsentry Manager Packet Decode.

Follow these steps:

1. To capture packets from one of the nodes on the hub, set up another capture buffer and repeat the exercise that is described in LANsentry Manager Packet Capture. Because a delay may occur on a different node, use two capture buffers without stopping the first one.

Note the rough packet count where the delay appears.

2. Display a conversation filter of the packet where the delay appears and look for the gap in the conversation.

EXAMPLE: You hope that the nodes are on the same hub. You find that all the nodes are on HUB3. This result indicates that FCS Errors may be causing the timeouts. However, because the errors occur at a low rate, you decide to verify this diagnosis. You monitor the problem from the hub, logging in and out many times, and the delay eventually occurs. This time, the capture shows that the node's reply had an FCS Error even though the node received the request. The switch would not have transmitted this packet, causing a timeout on the NFS protocol. The retry time is presumably 30 seconds. During this test, you see the problem occurring on another node.

Correcting the Fault


Without a managed hub, you may find it very difficult to track down the source of network file server timeout errors. To find the problematic node, you must either systematically isolate nodes by monitoring each node for a prolonged period or temporarily insert a managed hub.

EXAMPLE: You notice that the captured error packet failed FCS because it was corrupted by a regular pattern during transmission. A possible reason for this occurrence is a Jabbering node. This explanation makes sense because FCS/Jabber frames increased linearly when you were monitoring the live network.
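To see why a packet overwritten by a regular pattern fails its frame check, consider the simplified Python sketch below. It uses the CRC-32 from the standard zlib module as a stand-in for the Ethernet frame check sequence. The byte order, the helper names, and the sample payload are assumptions made for the illustration, and real frames carry headers that are omitted here.

```python
import zlib


def append_fcs(payload: bytes) -> bytes:
    """Append a 4-byte CRC-32 to the payload (simplified stand-in for an FCS)."""
    return payload + zlib.crc32(payload).to_bytes(4, "little")


def fcs_ok(frame: bytes) -> bool:
    """Recompute the CRC-32 over the payload and compare it with the stored value."""
    payload, stored = frame[:-4], frame[-4:]
    return zlib.crc32(payload).to_bytes(4, "little") == stored


def overwrite(frame: bytes, start: int, pattern: bytes) -> bytes:
    """Overwrite the payload from 'start' onward with a repeating pattern, as a jabbering node might."""
    damaged = bytearray(frame)
    for i in range(start, len(frame) - 4):   # leave the stored check bytes untouched
        damaged[i] = pattern[(i - start) % len(pattern)]
    return bytes(damaged)


frame = append_fcs(b"NFS reply payload " * 4)
damaged = overwrite(frame, 10, b"\xaa\x55")
```

The intact frame passes the check, while the overwritten one does not, so a receiving switch discards it and the requesting node never sees the reply.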

Network File Server Timeouts Reference

This section explains terms that are relevant to network file server timeouts and provides additional conceptual and problem analysis detail.

Jabbering

Jabbering occurs when a node transmits illegal-length packets and is possibly not operating within carrier specifications. In effect, another node has written bad data over a valid packet. This bad data is often interpreted as a repeated sequence of data.

Network File System (NFS) Protocol

A distributed file system protocol developed by Sun Microsystems that allows a computer system to access files over a network as if they were on its local disks. This protocol has been incorporated into products by more than 200 companies. It is now a de facto Internet standard.

NFS is one protocol in the NFS suite of protocols, which includes NFS, RPC, XDR (External Data Representation), and others. These protocols are part of a larger architecture that Sun Microsystems refers to as Open Network Computing (ONC). ONC is a distributed applications architecture designed by Sun and currently controlled by a consortium led by Sun.

Troubleshooting TCP/IP - Detailed Steps

This article shows how to troubleshoot TCP/IP connectivity between computers on a Windows network. If you haven't already done so, disable XP's Internet Connection Firewall on all local area network connections and remove all firewall programs on the network. Improperly configured firewalls are the most common cause of TCP/IP problems.


Open a Command Prompt Window

For many of these steps, you'll be typing at the command prompt. To open a command prompt window in Windows 2000 or XP, click Start | Run, type cmd in the box, and click OK. To open a command prompt window in Windows 95, 98, or Me, click Start | Run, type command in the box, and click OK. Type one command per line and press Enter after each one to execute it. To close the command prompt window, use the exit command.

Determine the TCP/IP Settings

Determine the TCP/IP settings of each computer on the local area network. In XP, open the Network Connections folder, right-click the LAN connection, and click Status | Support | Details. For example, here are the Status and Details views for the LAN connection on an Internet Connection Sharing host.

In Windows 95/98/Me, click Start | Run, type winipcfg in the box, and click OK. Select the LAN adapter from the menu and click More Info. Here's the winipcfg view for an ICS client running Windows Me.

You can also see the TCP/IP settings from the command prompt. This is especially convenient if a computer has more than one network adapter. Use the ipconfig /all command, which is available in all versions except Windows 95. The output from this command can be long, so it's best to write it to a file. Specify the file name in the command this way:

ipconfig /all >ipconfig.txt

Description of TCP/IP Settings

Here are the TCP/IP settings that are used in network troubleshooting:

• IP Address - Unique address assigned to a network adapter. A computer with multiple network adapters has an IP address for each one, and each one must be in a different subnet.

• Subnet Mask - Used in conjunction with the IP address to determine which subnet an adapter belongs to. At the simplest level, communication is only possible between two network adapters when they're in the same subnet.

• Default Gateway - IP address of a computer or router, on one of this computer's local area networks, that knows how to communicate with subnets not present on this computer. For an Internet connection, the default gateway is a router belonging to your Internet service provider, and all access to sites on the Internet goes through it. For an ICS client, the default gateway is the ICS host. If you use a hardware router, it serves as the default gateway.

• DHCP Server - If an adapter is configured to obtain an IP address automatically, this is the address of the server that provides it. It could be your ISP, an ICS host, or a hardware router.

• DNS Servers - IP address of one or more Domain Name Server computers. DNS servers translate Internet names to their IP addresses (like 63.146.109.227).
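The way an IP address and subnet mask together decide whether two adapters share a subnet can be checked with Python's standard ipaddress module. This is a minimal sketch: the function name is made up for the example, and the addresses and masks used with it are illustrative.

```python
import ipaddress


def same_subnet(ip_a: str, ip_b: str, mask: str) -> bool:
    """True when both addresses fall in the same subnet under the given mask."""
    # strict=False lets ip_network accept a host address and derive its network.
    net_a = ipaddress.ip_network(f"{ip_a}/{mask}", strict=False)
    net_b = ipaddress.ip_network(f"{ip_b}/{mask}", strict=False)
    return net_a == net_b
```

For example, same_subnet("192.168.1.101", "192.168.1.123", "255.255.255.0") is true, while changing the second address to 192.168.2.123 makes it false unless the mask is widened to 255.255.0.0.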

Subnets

See our article on subnets for a brief description of how they work. For more details, see this Microsoft Knowledge Base article.

If two computers are supposed to be on the same subnet but aren't, something is wrong with the network hardware or software configuration. This is most likely to happen when one of them receives an IP address of 169.254.x.x, which indicates that:

• It's configured to obtain an IP address automatically.
• It couldn't find a DHCP server on the network to make the assignment.
• Windows assigned it an Automatic Private IP Address.

See our article on Specific Networking Problems and Their Solutions for more information.
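A quick way to recognize an Automatic Private IP Address in a script is to test whether the address falls in the 169.254.0.0/16 range. The Python sketch below uses the standard ipaddress module; the function name is an assumption made for the example.

```python
import ipaddress

# The range from which Windows assigns Automatic Private IP Addresses.
APIPA_RANGE = ipaddress.ip_network("169.254.0.0/16")


def is_apipa(address: str) -> bool:
    """True when the address is in the 169.254.x.x automatic private range."""
    return ipaddress.ip_address(address) in APIPA_RANGE
```

An adapter for which is_apipa returns true did not get its address from a DHCP server, so check the cabling, the DHCP server, and any firewall between them before looking elsewhere.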

Pinging

The ping command is the basic tool for testing TCP/IP connectivity. It sends a special packet (called ICMP Echo) to a particular IP address and looks for a reply. If everything is working right, the reply comes back. If not, the ping times out in a few seconds. By default, the ping command repeats the process four times. Here's an example of an ICS client computer pinging a Windows XP Home Edition ICS host, using the host's IP address and its computer name.

When ping fails, you'll see one of these error messages:

• Request timed out - The IP address is valid, but there's no reply from it. If the IP address is on a local area network, the most likely cause is a firewall program blocking the ping.

• Unknown host <name> or Ping request could not find host <name> - The computer name doesn't exist on the local area network. Make sure that NetBIOS over TCP/IP is enabled.

• Destination host unreachable - The IP address isn't on a local area network, and the default gateway can't access it. Either there's no default gateway, its address is wrong, or it isn't functioning.
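If you collect ping output from several computers, the error messages listed above can be mapped to their likely causes automatically. The Python sketch below is only an illustration: the lookup table restates the explanations from this section, the function name is hypothetical, and the exact wording of ping messages can vary between Windows versions.

```python
# Message fragments (lower case) mapped to the likely cause described in this section.
PING_FAILURES = {
    "request timed out": "Address is valid but did not reply; on a LAN, suspect a firewall.",
    "unknown host": "Computer name not found; check that NetBIOS over TCP/IP is enabled.",
    "could not find host": "Computer name not found; check that NetBIOS over TCP/IP is enabled.",
    "destination host unreachable": "No usable default gateway for this address.",
}


def explain_ping_failure(output: str) -> str:
    """Return the likely cause for the first known failure message found in ping output."""
    text = output.lower()
    for fragment, cause in PING_FAILURES.items():
        if fragment in text:
            return cause
    return "No known failure message found."
```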


Pinging the Local Area Network

Here is a series of ping commands to use in finding where a problem occurs on a local area network. Run them in the order shown, and don't go on to the next command until all of the previous commands work properly. In this example:

• The computer being tested is named Winxp, with IP address 192.168.1.101.
• There's another computer on the network, named Win98, with IP address 192.168.1.123.

Substitute the appropriate IP addresses and computer names for your network.

Command              Target                         What Ping Failure Indicates
ping 127.0.0.1       Loopback address               Corrupted TCP/IP installation
ping localhost       Loopback name                  Corrupted TCP/IP installation
ping 192.168.1.101   This computer's IP address     Corrupted TCP/IP installation
ping winxp           This computer's name           Corrupted TCP/IP installation
ping 192.168.1.123   Another computer's IP address  Bad hardware or NIC driver
ping win98           Another computer's name        NetBIOS name resolution failure
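The run-in-order, stop-at-first-failure procedure for the local area network pings can be scripted. The Python sketch below is a hedged illustration, not a replacement for running the commands by hand: the function names are made up, the targets are the example addresses and names from this section, and on some Windows versions ping may return a success code even when the reply is "Destination host unreachable", so treat the result as a hint.

```python
import platform
import subprocess
from typing import Callable, List, Optional, Tuple


def ping_once(target: str) -> bool:
    """Send one echo request with the system ping command; True if it reports success."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    result = subprocess.run(["ping", count_flag, "1", target],
                            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0


# Targets in the order given in this section, with what a failure indicates.
LAN_CHECKS: List[Tuple[str, str]] = [
    ("127.0.0.1", "Corrupted TCP/IP installation"),
    ("localhost", "Corrupted TCP/IP installation"),
    ("192.168.1.101", "Corrupted TCP/IP installation"),
    ("winxp", "Corrupted TCP/IP installation"),
    ("192.168.1.123", "Bad hardware or NIC driver"),
    ("win98", "NetBIOS name resolution failure"),
]


def first_failure(checks: List[Tuple[str, str]],
                  ping: Callable[[str], bool] = ping_once) -> Optional[Tuple[str, str]]:
    """Run the checks in order and return the first failing target with its meaning, or None."""
    for target, meaning in checks:
        if not ping(target):
            return target, meaning
    return None
```

Calling first_failure(LAN_CHECKS) on the computer being tested returns None when every ping succeeds. The ping argument exists so that the ordering logic can be tried out with a stand-in function before it is pointed at a live network.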

To fix a corrupted TCP/IP installation on Windows XP, follow the steps in this Microsoft Knowledge Base article. For Windows 95/98/Me, uninstall the TCP/IP protocol in Control Panel | Network, reboot, and reinstall it. If that doesn't fix it, use this procedure on Windows 95 or 98.

Pinging the Internet

You can also use ping to find a problem with Internet access. Run these commands in the order shown, and don't go on to the next command until all of the previous commands work properly. Use the Default Gateway and DNS Server addresses that you got from the winipcfg or ipconfig /all command.


Command                 Target               What Ping Failure Indicates
ping w.x.y.z            Default Gateway      Default Gateway down
ping w.x.y.z            DNS Server           DNS Server down
ping w.x.y.z            Web site IP address  Internet service provider or web site down
ping www.something.com  Web site name        DNS Server down or web site down

REFERENCE

WWW.PRACTICALLYNETWORK.COM

WWW.COMPUTERHOPE.COM

WWW.WIKIPEDIA.ORG


THANK YOU


Page 8: 11591337-Basic-Network-Troubleshooting

Troubleshooting Strategy

How do you know when you are having a network problem? The answer to this question depends on your site's network configuration and on your network's normal behavior. See Knowing Your Network for more information.

If you notice changes on your network, ask the following questions:

• Is the change expected or unusual?
• Has this event ever occurred before?
• Does the change involve a device or network path for which you already have a backup solution in place?
• Does the change interfere with vital network operations?
• Does the change affect one or many devices or network paths?


After you have an idea of how the change is affecting your network, you can categorize it as critical or noncritical. Both of these categories need resolution (except for changes that are one-time occurrences); the difference between the categories is the time that you have to fix the problem.

By using a strategy for network troubleshooting, you can approach a problem methodically and resolve it with minimal disruption to network users. It is also important to have an accurate and detailed map of your current network environment. Beyond that, a good approach to problem resolution is:

• Recognizing Symptoms
• Understanding the Problem
• Identifying and Testing the Cause of the Problem
• Solving the Problem

Recognizing Symptoms

The first step to resolving any problem is to identify and interpret the symptoms. You may discover network problems in several ways. Users may complain that the network seems slow or that they cannot connect to a server. You may pass your network management station and notice that a node icon is red. Your beeper may go off and display the message "WAN connection down."

User Comments

Although you can often solve networking problems before users notice a change in their environment, you invariably get feedback from your users about how the network is running, such as:

• They cannot print.
• They cannot access the application server.
• It takes them much longer to copy files across the network than it usually does.
• They cannot log on to a remote server.
• When they send e-mail to another site, they get a routing error message.
• Their system freezes whenever they try to Telnet.

Network Management Software Alerts

Network management software, as described in Your Network Troubleshooting Toolbox, can alert you to areas of your network that need attention. For example:

• The application displays red (Warning) icons.
• Your weekly Top-N utilization report (which indicates the 10 ports with the highest utilization rates) shows that one port is experiencing much higher utilization levels than normal.
• You receive an e-mail message from your network management station that the threshold for broadcast and multicast packets has been exceeded.

These signs usually provide additional information about the problem, allowing you to focus on the right area.
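The idea behind a Top-N utilization report can be sketched in Python as follows. This is not how the management software builds its report; the function name, the port names, and the percentages are invented for the illustration.

```python
import heapq


def top_n_ports(utilization: dict, n: int = 10) -> list:
    """Return the n (port, utilization percent) pairs with the highest utilization, highest first."""
    return heapq.nlargest(n, utilization.items(), key=lambda item: item[1])


# Hypothetical average utilization per port, in percent.
weekly = {"port1": 12.0, "port2": 87.5, "port3": 40.0, "port4": 3.2}
busiest = top_n_ports(weekly, n=2)
```

Comparing this week's list with the list from a normal week shows at a glance whether a port such as port2 is busier than usual or has always been near the top.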

Analyzing Symptoms

When a symptom occurs, ask yourself these types of questions to narrow the location of the problem and to get more data for analysis:

• To what degree is the network not acting normally (for example, does it now take one minute to perform a task that normally takes five seconds)?
• On what subnetwork is the user located?
• Is the user trying to reach a server, end station, or printer on the same subnetwork or on a different subnetwork?
• Are many users complaining that the network is operating slowly or that a specific network application is operating slowly?
• Are many users reporting network logon failures?
• Are the problems intermittent? For example, some files may print with no problems, while other printing attempts generate error messages, make users lose their connections, and cause systems to freeze.

Understanding the Problem

Networks are designed to move data from a transmitting device to a receiving device. When communication becomes problematic, you must determine why data are not traveling as expected and then find a solution. The two most common causes for data not moving reliably from source to destination are:

• The physical connection breaks (that is, a cable is unplugged or broken).
• A network device is not working properly and cannot send or receive some or all data.

Network management software can easily locate and report a physical connection break (a layer 1 problem). It is more difficult to determine why a network device is not working as expected, which is often related to a layer 2 or a layer 3 problem.

To determine why a network device is not working properly look first for

63 10

• Valid service - Is the device configured properly for the type of service it is supposed to provide? For example, has Quality of Service (QoS), which is the definition of the transmission parameters, been established?
• Restricted access - Is an end station supposed to be able to connect with a specific device, or is that connection restricted? For example, is a firewall set up that prevents that device from accessing certain network resources?
• Correct configuration - Is there a misconfiguration of IP address, subnet mask, gateway, or broadcast address? Network problems are commonly caused by misconfiguration of newly connected or configured devices. See Manager-to-Agent Communication for more information.
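The "correct configuration" check can be partly automated. The following is a minimal sketch, not part of any Transcend tool, that uses Python's standard ipaddress module to test whether an IPv4 address, subnet mask, gateway, and broadcast address are consistent with one another. The function name check_ip_config and the wording of the messages are illustrative assumptions.

```python
import ipaddress


def check_ip_config(address: str, netmask: str, gateway: str, broadcast: str) -> list:
    """Return a list of problems found in a host's IPv4 settings (empty if none)."""
    problems = []
    # strict=False accepts a host address and derives the subnetwork it belongs to.
    network = ipaddress.ip_network(f"{address}/{netmask}", strict=False)
    host = ipaddress.ip_address(address)
    gw = ipaddress.ip_address(gateway)

    if host in (network.network_address, network.broadcast_address):
        problems.append("host address is the network or broadcast address")
    if gw not in network:
        problems.append("gateway is not on the host's subnetwork")
    if gw == host:
        problems.append("gateway address equals the host address")
    if ipaddress.ip_address(broadcast) != network.broadcast_address:
        problems.append(f"broadcast address should be {network.broadcast_address}")
    return problems
```

For example, check_ip_config("192.168.1.10", "255.255.255.0", "192.168.2.1", "192.168.1.255") reports that the gateway is not on the host's subnetwork. The sketch assumes an ordinary subnetwork with separate network and broadcast addresses; point-to-point (/31) and single-host (/32) masks would need special handling.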

Identifying and Testing the Cause of the Problem

After you develop a theory about the cause of the problem, test your theory. The test must conclusively prove or disprove your theory.

Two general rules of troubleshooting are:

• If you cannot reproduce a problem, then no problem exists unless it happens again on its own.
• If the problem is intermittent and you cannot replicate it, you can configure your network management software to catch the event in progress.

For example, with LANsentry Manager, you can set alarms and automatic packet capture filters to monitor your network and inform you when the problem occurs again. See Configuring Transcend NCS for more information.

Although network management tools can provide a great deal of information about problems and their general location, you may still need to swap equipment or replace components of your network until you locate the exact trouble spot.

After you test your theory, either fix the problem as described in Solving the Problem or develop another theory.

Sample Problem Analysis

This section illustrates the analysis phase of a typical troubleshooting incident.

On your network, a user cannot access the mail server. You need to establish two areas of information:

• What you know - In this case, the user's workstation cannot communicate with the mail server.
• What you do not know and need to test:
  • Can the workstation communicate with the network at all, or is the problem limited to communication with the server? Test by sending a Ping or by connecting to other devices.
  • Is the workstation the only device that is unable to communicate with the server, or do other workstations have the same problem? Test connectivity at other workstations.
  • If other workstations cannot communicate with the server, can they communicate with other network devices? Again, test the connectivity.

The analysis process follows these steps:

1. Can the workstation communicate with any other device on the subnetwork?
   • If no, then go to step 2.
   • If yes, determine if only the server is unreachable.
     • If only the server cannot be reached, this suggests a server problem. Confirm by doing step 2.
     • If other devices cannot be reached, this suggests a connectivity problem in the network. Confirm by doing step 3.

2. Can other workstations communicate with the server?
   • If no, then most likely it is a server problem. Go to step 3.
   • If yes, then the problem is that the workstation is not communicating with the subnetwork. (This situation can be caused by workstation issues or a network issue with that specific station.)

3. Can other workstations communicate with other network devices?
   • If no, then the problem is likely a network problem.
   • If yes, the problem is likely a server problem.
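The steps above can be condensed into a small decision function. This is an illustrative sketch only; the name locate_fault and its return values are assumptions. Step 1 is treated as the entry check that tells you which test to run first, so the final classification depends on the answers to steps 2 and 3.

```python
def locate_fault(others_reach_server: bool, others_reach_devices: bool) -> str:
    """Classify a "cannot reach the server" report as a workstation, network, or server fault."""
    if others_reach_server:
        # Step 2, "yes": the server answers other stations, so suspect the
        # reporting workstation or its own connection to the subnetwork.
        return "workstation"
    if not others_reach_devices:
        # Step 3, "no": other stations cannot reach other devices either.
        return "network"
    # Step 3, "yes": the network carries traffic, but nobody reaches the server.
    return "server"
```

For example, locate_fault(False, True) returns "server": other workstations cannot reach the server but can reach other devices.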

When you determine whether the problem is with the server, subnetwork, or workstation, you can further analyze the problem as follows:

• For a problem with the server - Examine whether the server is running, if it is properly connected to the network, and if it is configured appropriately.
• For a problem with the subnetwork - Examine any device on the path between the users and the server.
• For a problem with the workstation - Examine whether the workstation can access other network resources and if it is configured to communicate with that particular server.

Equipment for Testing

To help identify and test the cause of problems, have available:

• A laptop computer that is loaded with a terminal emulator, TCP/IP stack, TFTP server, CD-ROM drive (to read the online documentation), and some key network management applications, such as LANsentry® Manager. With the laptop computer, you can plug into any subnetwork to gather and analyze data about the segment.
• A spare managed hub to swap for any hub that does not have management. Swapping in a managed hub allows you to quickly spot which port is generating the errors.
• A single port probe to insert in the network if you are having a problem where you do not have management capability.
• Console cables for each type of connector, labeled and stored in a secure place.

Solving the Problem

Many device or network problems are straightforward to resolve, but others yield misleading symptoms. If one solution does not work, continue with another.

A solution often involves:

• Upgrading software or hardware (for example, upgrading to a new version of agent software or installing Gigabit Ethernet devices)
• Balancing your network load by analyzing:
  • What users communicate with which servers
  • What the user traffic levels are in different segments
  Based on these findings, you can decide how to redistribute network traffic.
• Adding segments to your LAN (for example, adding a new switch where utilization is continually high)
• Replacing faulty equipment (for example, replacing a module that has port problems or replacing a network card that has a faulty jabber protection mechanism)

To help solve problems, have available:

• Spare hardware equipment (such as modules and power supplies), especially for your critical devices
• A recent backup of your device configurations to reload if flash memory gets corrupted (which can sometimes happen due to a power outage)

Your Network Troubleshooting Toolbox

A robust network troubleshooting toolbox consists of items (such as network management applications, hardware devices, and other software) to recognize, diagnose, and solve networking problems. It contains:

• Transcend Applications
• Network Management Platforms
• 3Com SmartAgent Embedded Software
• Other Commonly Used Tools

Transcend management software is optimized for managing 3Com devices and their attached networks. However, some applications, such as LANsentry Manager, can manage any vendor's networking equipment that complies with the Remote Monitoring (RMON) Management Information Base (MIB).

This section describes these Transcend applications, which you can use to troubleshoot your network:

• Transcend Central
• Status Watch
• Address Tracker
• LANsentry Manager
• Traffix Manager
• Device View

Transcend Central

Start with Transcend Central, which is an asset management and device grouping application, to understand what your network consists of and to control the Transcend NCS network management troubleshooting tools. Transcend Central is available as both a native Windows application and a Java application that you can access using a Web browser.

Using Transcend Central for troubleshooting, you can:

• Display an inventory of device, module, and port information.
• Group devices to make your troubleshooting tasks easier. By managing a collection of devices, you can simultaneously perform the same tasks on each device in a group and locate physical or logical problems on your network.
• Launch Transcend NCS applications, including some of your primary Transcend NCS troubleshooting tools:
  • Status Watch, including Web Reporter (from the Java version)
  • Address Tracker
  • LANsentry Manager
  • Traffix Manager
  • Device View

Status Watch

The Status Watch applications manage 3Com devices and their attached networks. Status Watch applications primarily poll for MIB-II data. This is a performance monitoring application that allows you to monitor the operational status of your network devices and quickly identify any problems that require your attention. It works in conjunction with Web Reporter.

Web Reporter

Web Reporter is a data-reporting application that runs in a World Wide Web (WWW) browser. It generates reports from data that Status Watch collects, allowing you to compare network statistics against a baseline.

Address Tracker

Address Tracker is an address collection and discovery application that:

• Polls managed devices for all MAC addresses
• Polls managed devices and routers for IP addresses to perform MAC-to-IP address translation
• Uses Device View to disable troublesome ports

LANsentry Manager

LANsentry Manager is a set of integrated applications that displays and explores the real-time and historical data that RMON-compliant devices (probes) on the network capture. LANsentry Manager uses SNMP polling to gather RMON and RMON2 data from the probes.

Use LANsentry Manager to:

• Monitor current performance of network segments
• See trends over time
• Spot signs of current problems
• Configure alarms to monitor for specific events
• Capture packets and display their contents

LANsentry Manager works with any device (from 3Com or other vendors) that supports the RMON MIB or the RMON2 MIB.

Traffix Manager

Traffix Manager is a performance-monitoring application that provides information about layer 2 (RMON) and layer 3 conversations between nodes. It helps you to assess traffic patterns on your network. Traffix Manager:

• Monitors all the stations that the RMON2-compliant probes encounter on your network
• Captures and stores RMON and RMON2 data for your network's protocols and applications
• Displays traffic between stations in user-defined views of the network
• Graphs current or historical data on the devices selected
• Delivers reports for user-specified stations and time periods as PostScript to your printer or as HTML to your Web server
• Launches LANsentry Manager tools for in-depth analysis of a station or a conversation between stations

You can use Traffix Manager to:

• Know your network - Understand overall flow patterns and interactions between systems, and determine how your network is really being used at the application level.
• Optimize your network - Gain an insight into traffic and application usage trends to help you optimize the use and placement of current network resources and make wise decisions about capacity planning and network growth.

Traffix Manager works with any device (from 3Com or other vendors) that supports the RMON2 MIB.

Device View

The Device View application is a device configuration tool. When you troubleshoot your network, you can use Device View to determine or change a device's configuration. You can also use Device View to look at a device's statistics and to set alarms.

Device View manages only 3Com devices.

Network Management Platforms

As part of your troubleshooting toolbox, your network management platform is the first place to go to view the overall health of your network. With the platform, you can understand the logical configuration of your network and configure views of your network to understand how devices work together and the role that they play in the users' work. The network management platform that supports your Transcend software installation can provide valuable troubleshooting tools. Transcend runs on several platforms within the NT and UNIX environments.

The platform discovers the devices. Transcend imports that information from the platform to populate the core database. Unless you are rediscovering, the user must manually update the platform.

Using this device database, a map displays the graphical representation of your network. Each device on your network appears as a symbol (icon) on the map. You can configure views of your network to show devices on the same subnetworks or floors.

You can monitor network performance and diagnose network performance and connectivity problems. You can also:

• Take a snapshot of your network in its normal state. The snapshot records the state of your network at a particular instant. If you later have network performance problems, you can compare the current state of your network to the snapshot.
• Quickly determine the connectivity status of a device by noting the color of its map symbol. Red usually means that communication with a device has ceased.
• Diagnose connectivity problems by determining whether two devices can communicate. If they can communicate, then examine the route between the devices, the number of packets that were sent and lost, and the roundtrip time between the two devices.
• Manage MIB information (for example, collecting and storing MIB data for trend analysis and graphing) using MIB queries. Transcend compiles MIBs and allows you to navigate up and down the MIB tree to retrieve MIB objects from devices. You can set thresholds for MIB data and generate events when a threshold is exceeded.
• Configure the software to act on certain events. The Event Categories window informs you of any unexpected events (which arrive in the form of traps).

For more information, see the documentation that is shipped with your software.

3Com SmartAgent Embedded Software

Traditional Simple Network Management Protocol (SNMP) management places the burden of collecting network management information on the management station. In this traditional model, software agents collect information about throughput, record errors or packet overflows, and measure performance based on established thresholds. Through a polling process, agents pass this information to a centralized network management station whenever they receive an SNMP query. Management applications then make the data useful and alert the user if there are problems on the device.

As a useful companion to traditional network management methods, 3Com's SmartAgent® technology places management intelligence into the software agent that runs within a 3Com device. This scalable solution reduces the amount of computational load on the management station and helps minimize management-related network traffic.

SmartAgent software, which uses the RMON MIB, is self-monitoring, collecting and analyzing its own statistical, analytical, and diagnostic data. In this way, you can conduct network management by exception - that is, you are notified only if a problem occurs. Management by exception is unlike traditional SNMP management, in which the management software collects all data from the device through polling.

SmartAgent software works autonomously and reports to the network management station whenever an exceptional network event occurs. The software can also take direct action without involving the management station. Devices that contain SmartAgent software may be able to:

• Perform broadcast throttling to minimize the flow of broadcast traffic on your network
• Monitor the ratio of good frames to bad frames
• Switch a resilient link pair to the standby path if the primary path corrupts frames
• Report if traffic on vital segments drops below minimum usage levels
• Disable a port for five seconds to clear problems and then automatically reconnect it

The Transcend NCS applications LANsentry Manager and Traffix Manager make the RMON data that the SmartAgent software collects more usable by summarizing and correlating important information.

Other Commonly Used Tools

These commonly used tools can also help you troubleshoot your network:

• Network software, such as Ping, Telnet, and FTP and TFTP. You can use these applications to troubleshoot, configure, and upgrade your system.
• Network monitoring devices, such as Analyzers and Probes
• Tools, such as Cable Testers, for working on physical problems

Ping

Packet Internet Groper (Ping) allows you to quickly verify the connectivity of your network devices. Ping attempts to transmit a packet from one device to a station on the network and listens for the response to ensure that it was correctly received. You can validate connections on the parts of your network by pinging different devices:

• A successful response indicates that a valid network path exists between your station and the remote host and that the remote host is active.
• Slower response times than normal can indicate that the path is congested or obstructed.
• A failed response indicates that a connection is broken somewhere; use the message to help locate the problem. See Tips on Interpreting Ping Messages.

Strategies for Using Ping

Follow these strategies for using Ping:

• Ping devices when your network is operating normally so that you have a performance baseline for comparison. See Identifying Your Network's Normal Behavior for more information.
• Ping by IP address when:
  • You want to test devices on different subnetworks. This method allows you to Ping your network segments in an organized way, rather than having to remember all the hostnames and locations.
  • Your Domain Name System (DNS) server is down and your system cannot look up host names properly. You can Ping with IP addresses even if you cannot access hostname information.
• Ping by hostname when you want to identify DNS server problems.
• To troubleshoot problems that involve large packet sizes, Ping the remote host repeatedly, increasing the packet size each time.
• To determine if a link is erratic, perform a continuous Ping (using ping -s on UNIX), which indicates the time that it takes the device to respond to each Ping.
• To determine a route taken to a destination, use the trace route function (tracert).
• Consider creating a Ping script that periodically sends a Ping to all necessary networking devices. If a Ping failure message is received, the script can perform some action to notify you of the problem, such as paging you.
• Use the Ping functions of your network management platform. For example, in your HP OpenView map, select a device and click the right mouse button to gain access to ping functions.
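The Ping script suggested above could look something like the following sketch. It is an assumption-laden illustration rather than a tested monitoring tool: the names ping_once and sweep are hypothetical, the count flag is assumed to be -n on Windows and -c on Linux and macOS, and the notification action is a plain print that you would replace with your own paging or e-mail mechanism.

```python
import platform
import subprocess


def ping_once(host: str, timeout_s: int = 2) -> bool:
    """Send one echo request with the system ping command; True means it reported success."""
    # Assumption: Windows uses "-n" for the packet count, Linux and macOS use "-c".
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    try:
        result = subprocess.run(
            ["ping", count_flag, "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0


def sweep(hosts, ping=ping_once, notify=print):
    """Ping every host once and call notify() for each host that does not answer."""
    failed = [host for host in hosts if not ping(host)]
    for host in failed:
        notify(f"Ping failed for {host}")
    return failed
```

To run the sweep periodically, call sweep() from a scheduler or from a loop that sleeps between passes. Passing the ping and notify functions as parameters keeps the sweep logic testable without network access. Exit codes differ between ping implementations, so verify that a zero exit code really means "reply received" on your system.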

Tips on Interpreting Ping Messages

Use the following ping failure messages to troubleshoot problems:

No reply from <destination>

Indicates that the destination routes are available but that there is a problem with the destination itself.

<destination> is unreachable

Indicates that your system does not know how to get to the destination. This message means either that routing information to a different subnetwork is unavailable or that a device on the same subnetwork is down.

ICMP host unreachable from gateway

Indicates that your system can transmit to the target address using a gateway but that the gateway cannot forward the packet properly, because either a device is misconfigured or the gateway is not operating.
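If you collect Ping output in a script, the three messages above can be mapped to their likely causes automatically. The sketch below is illustrative: the function name interpret_ping_failure is hypothetical, and the exact wording of failure messages varies between Ping implementations, so the substrings it matches are assumptions based on the messages listed in this section.

```python
def interpret_ping_failure(message: str) -> str:
    """Map a ping failure message to the likely location of the problem."""
    text = message.lower()
    # Check the gateway message first, because it also contains "unreachable".
    if "unreachable from gateway" in text:
        return "gateway cannot forward: a device is misconfigured or the gateway is down"
    if "unreachable" in text:
        return "no route: routing information is unavailable or a local device is down"
    if "no reply from" in text:
        return "route is available: the problem is at the destination itself"
    return "unrecognized message"
```

For example, interpret_ping_failure("No reply from 10.0.0.5") points to the destination itself, because the routes to it are available.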

Telnet

Telnet, which is a login and terminal emulation program for Transmission Control Protocol/Internet Protocol (TCP/IP) networks, is a common way to communicate with an individual device. You log in to the device (a remote host) and use that remote device as if it were a local terminal.

If you have established an out-of-band Telnet connection with a device, you can use Telnet to communicate with that device even if the network is unavailable. This feature makes Telnet one of the most frequently used network troubleshooting tools. Usually, all device statistics and configuration capabilities are accessible by using Telnet to connect to the device's console. For more information about setting up an out-of-band connection, see Using Telnet, Serial Line, and Modem Connections.

You can invoke the Telnet application on your local system and set up a link to a Telnet process that is running on a remote host. You can then run a program that is located on a remote host as if you were working at the remote system.

FTP and TFTP

Most network devices support either the File Transfer Protocol (FTP) or the Trivial File Transfer Protocol (TFTP) for downloading updates of system software. Updating system software is often the solution to networking problems that are related to agent problems. Also, new software features may help correct a networking problem.

FTP provides flexibility and security for file transfer by:

• Accepting many file formats, such as ASCII and binary
• Using data compression
• Providing Read and Write access so that you can display, create, and delete files and directories
• Providing password protection

TFTP is a simple version of FTP that does not list directories or require passwords. TFTP only transfers files to and from a remote server.

Analyzers

An analyzer, which is often called a Sniffer, is a network device that collects network data on the segment to which it is attached, a process called packet capturing. Software on the device analyzes this data, which is a process referred to as protocol analysis. Most analyzers can interpret different types of protocol traffic, such as TCP/IP, AppleTalk, and Banyan VINES traffic.

You usually use analyzers for reactive troubleshooting - when you see a problem somewhere on your network, you attach an analyzer to capture and interpret the data from that area. Analyzers are particularly helpful for identifying intermittent problems. For example, if your network backbone has experienced moments of instability that prevent users from logging on to the network, you can attach an analyzer to the backbone to capture the intermittent problems when they happen again.

Probes

Like an analyzer, a probe is a network device that collects network data. Depending on its type, a probe can collect data from multiple segments simultaneously. It stores the collected data and transfers the data to an analysis site when requested. Unlike analyzers, probes do not interpret data.

A probe can be either a stand-alone device or an agent in a network device. The Transcend Enterprise Monitor 500 series and the SuperStack® II Monitor series are stand-alone RMON probes. LANsentry Manager and Traffix Manager use data from probes that comply with the RMON MIB or the RMON2 MIB.

You can use a probe daily to determine the health of your network. The Transcend NCS applications can interpret and report this data, alerting you to possible problems so that you can proactively manage your network. For example, an RMON2 probe can help you to analyze traffic patterns on your network. Use this data to make decisions about reconfiguring devices and end stations as needed.

Cable Testers

Cable testers examine the electrical characteristics of the wiring. They are most commonly used to ensure that building wiring and cables meet Category 5, 4, and 3 standards. For example, network technologies such as Fast Ethernet require the cabling to meet Category 5 requirements. Testers are also used to find defective and broken wiring in a building.

Knowing Your Network

You can better troubleshoot the problems on your network by:

• Knowing Your Network's Configuration
• Identifying Your Network's Normal Behavior

Knowing Your Network's Configuration

Part of understanding your network is knowing its physical and logical configuration. You should know:

• Which devices are on your network
• How the devices are configured
• Which devices are attached to the backbone
• Which devices connect your network to the outside world (WAN)

To keep track of your network's configuration, gather the following information:

• Site Network Map
• Logical Connections
• Device Configuration Information
• Other Important Data About Your Network

This data, when kept up-to-date, is extremely helpful for locating information when you experience network or device problems.

Site Network Map

A network map helps you to:

• Know exactly where each device is physically located
• Easily identify the users and applications that are affected by a problem
• Systematically search each part of your network for problems

You can create a network map using any drawing or flow chart application. Store your network map online. In addition, make sure that you always have a current version on paper in case you cannot access the online version. Figure 8 shows an example of a network map of 3Com devices.

Figure 8 Example of a Site Network Map

Consider including the following information on your network map:

• Location of important devices and workgroups (by floor, building, or area)
• Location of the network backbone, data center, and wiring closets, as appropriate for your network
• Location of your network management stations
• Location and type of remote connections
• IP subnetwork addresses for all managed switches and hubs
• Other subnetwork addresses, such as Novell IPX and AppleTalk, if appropriate for your network
• Type of media (by actual name, such as 10BASE-T, or by grouping, such as Ethernet), which you can show with callouts, colors, line weights, or line styles
• Virtual workgroups, which you can show with colors or shaded areas
• Redundant links, which you can indicate with gray or dashed lines
• Types of network applications that are used in different areas of your network
• Types of end stations that are connected to the switches and hubs

Logical Connections

With the advent of virtual LANs (VLANs), you need to know how your devices are connected logically as well as physically. For example, if you have connected two devices through the same physical switch, you might assume that they can communicate with each other. However, the devices can be in separate VLANs that restrict their communication.

Knowing the setup of your VLANs can help you to quickly narrow the scope of a problem to a VLAN instead of to a network connection.

The Transcend NCS application Enterprise VLAN Manager allows you to view the logical makeup of your network. Depending on the complexity of your network and VLAN configurations, you can use colors to show the VLANs graphically on your network map.

Device Configuration Information

Maintain online and paper copies of device configuration information. Make sure that all online data is stored with your site's regular data backup. If your site does not have a backup system, copy the information onto a backup disc (CD, Zip disk, and the like) and store it offsite.

Follow these guidelines for saving configuration information:

• Because the easiest way to recover a device's configuration is to use FTP or TFTP, save the configuration settings of each device that supports this method of uploading.
• For other devices, Telnet in and save the session (which contains configuration details) to a file. If you cannot print the configuration of a device, then create a quick rebuild guide that explains the quickest way to configure the device from a fresh install.
• For devices that store information to diskette, store this data as part of your site's regular backup.
• For routers and other important devices with text configuration files, store this data online in a revision control system. Keep the most recent version on paper. Keep previous versions.
• For PCs, keep a recovery disk for each type of PC. For any device that you use as a server, store all startup scripts and copies of registries.

Other Important Data About Your Network

For a complete picture of your network, have the following information available:

• All passwords - Store passwords in a safe place. Keep previous passwords in case you restore a device to a previous software version and need to use the old password that was valid for that version.
• Device inventory - The inventory allows you to see the device type, IP address, ports, MAC addresses, and attached devices at a glance. Software tools, such as Transcend Central, can help you keep track of the 3Com devices on your network. Using Transcend Central, you can group devices by type and location and have this information on hand for troubleshooting.
• MAC address-to-port number list - If your hubs or switches are not managed, you must keep a list of the MAC addresses that correlate to the ports on your hubs and switches. Generate and keep a paper copy of this list, which is crucial for deciphering captured packets, using Address Tracker.
• Log book - Document your interactions, no matter how trivial, with each device that is critical to your network's operation (that is, routers, remote access devices, security servers). For example, document that you noticed a fan making noise one morning. Your note may help you to identify why a device is over temperature a week later (because the fan stopped working).
• Change control - Maintain a change control system for all critical systems. Permanently store change control records.
• Contact details - Store, online and on paper, the details of all support contracts, support numbers, engineer details, and telephone and fax numbers.
• LANsentry Reporter - Use LANsentry Reporter to generate reports from the database.

Identifying Your Network's Normal Behavior

By monitoring your network over a long period, you begin to understand its normal behavior. You begin to see a pattern in the traffic flow, such as which servers are typically accessed, when peak usage times occur, and so on. If you are familiar with your network when it is fully operational, you can be more effective at troubleshooting problems that arise.

Baselining Your Network

You can use a baseline analysis, which is an important indicator of overall network health, to identify problems. A baseline can serve as a useful reference of network traffic during normal operation, which you can then compare to captured network traffic while you troubleshoot network problems. A baseline analysis speeds the process of isolating network problems.

By running tests on a healthy network, you compile normal data to compare against the results that you get when your network is in trouble. For example, Ping each node to discover how long it typically takes you to receive a response from devices on your network.
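As a rough illustration of comparing current results with a baseline, the sketch below stores the mean and standard deviation of Ping response times per host and flags hosts whose current response time is well above normal. The function names, the data layout, and the threshold of three standard deviations are assumptions chosen for the example; they do not describe how the Transcend applications calculate baselines.

```python
from statistics import mean, stdev


def build_baseline(samples_ms):
    """Map each host to the (mean, standard deviation) of its healthy response times."""
    # Each host needs at least two samples, because stdev() requires two data points.
    return {host: (mean(values), stdev(values)) for host, values in samples_ms.items()}


def slow_hosts(baseline, current_ms, k=3.0):
    """Return hosts whose current response time exceeds mean + k * stdev of the baseline."""
    flagged = []
    for host, rtt in current_ms.items():
        if host not in baseline:
            continue  # No baseline was recorded for this host, so nothing to compare.
        avg, spread = baseline[host]
        if rtt > avg + k * spread:
            flagged.append(host)
    return flagged
```

For example, with a baseline built from {"srv": [10.0, 12.0, 11.0]}, a current response time of 13 ms is within the normal range, while 40 ms is flagged.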

Applications such as Status Watch, Address Tracker, LANsentry Manager, and Traffix Manager allow you to collect days and weeks of data and set a baseline for comparison. Through the reporting mechanisms in the following list, you can continuously assess the data from your network and ensure that its performance is optimal:

• Web Reporter generates daily or weekly reports from data collected by Status Watch.
• Traffix Manager generates weekly reports from collected data and calculates the baselines for you. Set up Utilization History and Error History reports with data resolution set to Weekly.
• LANsentry Manager History View generates daily utilization graphs, which are sampled every 30 minutes for each day over one week. Use these graphs to calculate your network baselines manually.

Identifying Background Noise

Know your network's background noise so that you can recognize real data flow. For example, one evening when everyone is gone, no backups are running, and most nodes are on, analyze the traffic on your network using the Traffix Manager application. The traffic that you see is mostly broadcast and multicast packets. Any errors that you see are the result of faulty devices (trace). This traffic is the background noise of your network - traffic that occurs for little value. If background noise is high, redesign your network.

FDDI Connectivity

Use these sections to identify and correct connectivity errors on an FDDI ring:

• FDDI Connectivity Overview
• Monitoring FDDI Connections
• Making Your FDDI Connections More Resilient

Fiber Distributed Data Interface (FDDI), which is a self-correcting technology, automatically corrects ring faults to maintain connectivity throughout most of the network. However, you should monitor your FDDI connections for wrapped rings and other problems with ring connectivity.

Understanding the Problem

As shown in Figure 9, in a thru FDDI LAN, no stations on the trunk ring have a Configuration State (SMTConfigurationState) of Wrap or Isolated. However, users who complain about network performance may have lost connectivity to other stations on the network because the FDDI network is wrapped or segmented.

Figure 9 Thru Ring

Wrapped ring

By monitoring the Peer Wrap Condition, you can see when the Configuration State changes. In a wrapped ring (Figure 10), two stations on the LAN are in a wrapped Configuration State. This condition may or may not affect the connectivity of certain stations. Although operational, your network may have a cabling problem or a problem with a link.

Figure 10 Wrapped LAN

Segmented ring

In a segmented ring (Figure 11), more than two stations are wrapped on the trunk ring. Although this mode of operation is a valid FDDI LAN configuration, your LAN is probably experiencing a degraded or degrading condition.

Figure 11 Segmented Ring

When a network connection has excessively high link errors, Station Management (SMT) shuts down the connection and tries to reconnect. A dual-attachment trunk ring station with an A or B connection that is shut down is one of the wrap points in the network. See Making Your FDDI Connections More Resilient for information about keeping a dual-attachment station connection from wrapping.

Isolated station

Sometimes a network wraps a particular station out of the ring. Stations on either side of a problem station can be wrapped. This effectively isolates the station or links that have problems, as shown in Figure 12.

Figure 12 Wrapped Ring with Isolated Station

If a ring was already wrapped when a network wraps a station out of the ring, a segmented ring results, as shown in Figure 13.

Figure 13 Segmented Ring with Isolated Stations

Twisted ring

In a twisted ring, an A port is connected to an A port and a B port is connected to a B port, instead of the normal A-to-B connections. A twisted ring, which always has two twist points (stations), can exist in either a Thru or Wrap state. You can monitor the Twisted Ring Condition and Undesired Connection Attempt Event for evidence of twisted ring and other connection problems.

Identifying the Problem

To identify the problem, follow this process:

1. At the FDDI LAN level, verify that your network is operating.

If the network is operating, the FDDI ring may be segmented, and therefore an FDDI station or an Ethernet station on an Ethernet link may have lost connectivity to other nodes on the network.

2. Determine if a ring is in a Thru, Wrap, or Segmented state.

If the FDDI ring is segmented or wrapped, look for a problem with a link somewhere in the network or for a nonfunctioning node on your trunk ring. If the ring is operating and is not segmented, or if it is segmented but you still have connectivity to the stations in question, move to a more specific level in your network.

See Monitoring FDDI Connections for more information.

3. Determine if the poorly performing station is an Ethernet or FDDI station.

If the problem is an FDDI station, find out if it is congested (that is, if the station is so busy that it cannot accept all the network traffic that is directed to it) by determining its Bandwidth Utilization. Also determine if the station has a high frame error rate by looking at the FDDI Ring Errors.

If the problem is an Ethernet station, look for congestion by examining Ethernet Packet Loss and Bandwidth Utilization.
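Step 2 of the process above can be sketched as a small classifier over the Configuration State that each trunk ring station reports. This is a minimal illustration only: the function name ring_state and the string values are hypothetical, and treating a single reported wrap as "Wrapped" and a ring with only Isolated stations as "Indeterminate" are assumptions, not rules from the manual.

```python
def ring_state(config_states):
    """Classify a trunk ring from per-station Configuration States.

    config_states: iterable of strings such as "Thru", "Wrap", "Isolated"
    (hypothetical encoding of SMTConfigurationState values).
    """
    states = [s.strip().lower() for s in config_states]
    wrapped = states.count("wrap")
    isolated = states.count("isolated")

    if wrapped == 0 and isolated == 0:
        return "Thru"            # no station is in Wrap or Isolated
    if wrapped > 2:
        return "Segmented"       # more than two stations are wrapped
    if wrapped >= 1:
        return "Wrapped"         # normally exactly two wrap points
    return "Indeterminate"       # assumption: only Isolated stations seen
```

For example, ring_state(["Thru", "Wrap", "Wrap", "Thru"]) reports a wrapped ring, which points to a fault between the two wrap points.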

Solving the Problem

Identify the station that is causing the disconnection and take the appropriate steps:

• If the disconnection is caused by a wrapped ring, fix the hardware or cabling problem at that station.

• If the station is congested, you have a device problem rather than a network problem. For example, if the congested station is a file server and every other machine on the network is retrieving and saving files using that server, consider upgrading your server or adding servers to the network. A variety of devices from different vendors may be communicating on an FDDI or Ethernet network; some are faster and more capable, and some are slower and more prone to congestion.

• If the station is an Ethernet station that is attached to an Ethernet segment, reevaluate the setup of your Ethernet network and make changes to improve its performance.

You can also make FDDI connections more resilient by implementing dual homing or installing an Optical Bypass Unit (OBU) where FDDI connections are prone to fail. See Making Your FDDI Connections More Resilient for more information.

Monitoring FDDI Connections

Monitor your FDDI devices for Warning or Critical alerts in the FDDI Status tool.

Status Watch

Use Status Watch to identify these FDDI connectivity errors:

• Peer Wrap Condition
• Twisted Ring Condition
• Undesired Connection Attempt Event

Follow these steps:

1. In the Device area, select the device that is located where you suspect an FDDI ring connectivity problem.

2. Monitor the FDDI Status tool for the currently selected device.

Here are some pointers for monitoring:

• If the Peer Wrap Configuration State variable is Isolated, the device is not connected to the FDDI trunk ring. If you intend the device to remain isolated, this indication is not a serious condition. However, if the device is supposed to be connected on a trunk ring, a serious problem may exist: the device is no longer transmitting packets to the larger trunk ring.

• If the Peer Wrap flag (SMTPeerWrapFlag) is set, the device is one of the wrap points. The cause of the wrapped ring is somewhere in the portion of the network between the two stations that report the peer wrap condition.

Making Your FDDI Connections More Resilient

When devices are removed from an FDDI ring, there is a break in the fiber path that causes the ring to wrap until the ring is made whole again. To prevent the break in the FDDI connection, you can implement dual homing or install an Optical Bypass Unit (OBU).

Implementing Dual Homing

When the operation of a dual attachment node is critical to your network, dual homing adds reliability by providing a backup connection if the primary link fails. Because a dual attachment station (DAS) has two attachments to the FDDI ring (A-to-M and B-to-M), you can use one of them as a standby link if the active link fails. Using dual homing, only one of the two attachments is active at a time. In this sense, a DAS acts as if it is a single attachment station (SAS) by using its A port as the standby link.

Through SMT, a DAS can be dual homed to the same concentrator or, more commonly, to two concentrators. This arrangement provides a more stable trunk ring of concentrators. If one concentrator fails, the DAS enables the standby link to another concentrator to become the active link. See Figure 14.

If the station is a dual path or dual path/dual MAC station, you can configure the dual-homed station in one of two ways:

• With both links active
• With one link active and one connection withheld as a backup, only becoming active when one link fails

Figure 14 Dual Homing Configuration

Installing an Optical Bypass Unit

You can insert an Optical Bypass Unit (OBU) into the FDDI ring as if it were a node and then plug your device into it. To use an OBU, your device needs an optical bypass interface. This interface lets the bypass know whether your device is still on the ring or not. See Figure 15.

If your device is removed or if it fails, the bypass unit diverts the optical path away from your device, keeping the ring whole. You can use a bypass on devices that are prone to failure or are likely to be removed often, such as diagnostic equipment.

Figure 15 Optical Bypass Unit Configuration

FDDI Connectivity Reference

This section explains terms that are relevant to FDDI connectivity and provides additional conceptual and problem analysis detail

Peer Wrap Condition

A Peer Wrap (wrapped ring) condition occurs when a dual-attachment station detects a fault (often a lost connection) and reconfigures the network by wrapping the dual trunk rings to form a single ring. Normally, the two stations that are adjacent to the fault wrap to maintain full connectivity. However, if a second fault occurs before the first is repaired, the network partitions itself into two or more rings, and stations lose connectivity.

When a station reports a Peer Wrap condition, locate and repair the problem that caused the station to wrap the rings. Potential causes include:

• Faulty FDDI port hardware
• Faulty cables or connectors
• Unplugged connectors
• Powered-down stations

You can expect to find the cause of the problem between the two stations that report the Peer Wrap condition.

Twisted Ring Condition

A Twisted Ring condition occurs when certain undesirable connection types exist. See Table 7 for more information. Although similar to the Undesired Connection Attempt, the Twisted Ring condition provides specific Station Management (SMT) and port information for diagnosis.

Undesired Connection Attempt Event

An Undesired Connection Attempt event occurs when a port tries to connect to another port of a type that may result in an undesirable network topology. Whether the connection attempt is successful depends on the current setting of the station's connection policies.

Table 7 lists connections that the FDDI standard defines as undesirable. The managed devices may or may not permit these connections, depending on their FDDI station configurations.

Table 7 Undesirable Connection Types

Connection Type    Reason That the Connection Is Undesirable
A-A                Twisted primary and secondary rings
A-S                A wrapped ring
B-B                Twisted primary and secondary rings
B-S                A wrapped ring
S-A                A wrapped ring
S-B                A wrapped ring
M-M                A tree of rings topology (illegal connection)

Table 8 lists FDDI connections that create valid topologies

Table 8 Valid Connection Types

Connection Type    Reason That the Connection Is Valid
A-B                A normal trunk peer connection
A-M                A tree connection with possible redundancy. In a single MAC node, Port B has precedence (by default) for connecting to a Port M
B-A                A normal trunk ring peer connection
B-M                A tree connection with possible redundancy. In a single MAC node, Port B has precedence (by default) for connecting to a Port M
S-S                A single ring of two slave stations
S-M                A normal tree connection
M-A                A tree connection that provides possible redundancy
M-B                A tree connection that provides possible redundancy
M-S                A normal tree connection
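Tables 7 and 8 together cover all 16 pairings of the four FDDI port types (A, B, S, M), so they can be encoded as a simple lookup. The sketch below is only an illustration of that lookup; the names UNDESIRABLE, VALID, and check_connection are hypothetical and do not correspond to any Transcend or SMT interface.

```python
# Reasons copied from Table 7 (undesirable connection types).
UNDESIRABLE = {
    ("A", "A"): "Twisted primary and secondary rings",
    ("B", "B"): "Twisted primary and secondary rings",
    ("A", "S"): "A wrapped ring",
    ("B", "S"): "A wrapped ring",
    ("S", "A"): "A wrapped ring",
    ("S", "B"): "A wrapped ring",
    ("M", "M"): "A tree of rings topology (illegal connection)",
}

# Port pairings listed in Table 8 (valid connection types).
VALID = {
    ("A", "B"), ("B", "A"), ("A", "M"), ("B", "M"), ("S", "S"),
    ("S", "M"), ("M", "A"), ("M", "B"), ("M", "S"),
}


def check_connection(local_port, remote_port):
    """Return ("valid", None) or ("undesirable", reason) for a port pairing."""
    pair = (local_port.upper(), remote_port.upper())
    if pair in UNDESIRABLE:
        return "undesirable", UNDESIRABLE[pair]
    if pair in VALID:
        return "valid", None
    raise ValueError(f"unknown port types: {pair}")
```

Whether a device actually refuses an undesirable pairing still depends on its connection policies, as noted above; the lookup only tells you how the tables classify it.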

Disabling the Offending Interface

Because broadcast storms can ultimately cause your whole network to become unavailable, take action immediately to disable the offending interface. You can enable the interface again after you have corrected the problem.

Address Tracker

Use Address Tracker to locate the interface that is causing the broadcast storm. Use Device View to disable the port.

Follow these steps:

1. In the Find Address window, enter the address of the interface that seems to be receiving the broadcast traffic.

You can copy the MAC or IP address from the Status Watch report and paste it into Address Tracker's Enter the Address You Want to Find field.

2. Click Find Now.

Search displays the device name.

3. Use Transcend Central to launch Device View and disable the port.

Disabling the port stops the broadcast storm before it interferes with all vital network traffic. You can re-enable this interface later using Device View or the device's console.

Correcting Spanning Tree Misconfigurations

Spanning Tree does not cause broadcast storms, but a loop in your Spanning Tree topology can create data that looks like a storm. A loop can occur in your topology if:

• Someone disables Spanning Tree on a port
• You set up your Spanning Tree configuration incorrectly

Device View

Use Device View to disable any Spanning Tree port that has a repeater attached to it and to correct Spanning Tree misconfigurations.

To correct Spanning Tree misconfigurations, use Device View to disable Spanning Tree Protocol (STP) for a port on a SuperStack® II Switch 1000, Switch 3000, Switch 3000 10/100, Switch 9000SX, Desktop Switch, LinkBuilder® FMS II Bridge/Management Module, or CoreBuilder 6000.

To disable the STP port state for a port on a SuperStack II switch:

1. Select a port and click the right mouse button.

2. From the shortcut menu, select Configure.

3. In the Port section, click the STP tab.

4. From the STP Port State list box, select Disabled.

5. Click Apply.

To disable the STP port state for a port on a LinkBuilder FMS II Bridge/Management Module:

1. Double-click the module.

2. From the shortcut menu, select Configure Bridge.

3. In the Port section, click the STP tab.

Broadcast Storms Reference

This section explains terms that are relevant to broadcast storms and provides additional conceptual and problem analysis detail

Broadcast Packets

Broadcast packets, which are a normal part of network operation, are transmitted by a device to a broadcast address. For example, IP networks use broadcasts to resolve network addresses using Address Resolution Protocol (ARP). IPX networks use a large number of broadcast packets to operate most effectively.

Problems arise when broadcast packets endlessly propagate throughout the network, which increases the traffic volume on your network and the CPU time that each host spends processing and discarding unwanted broadcast packets.

Multicast Packets

Multicast packets, which are a normal part of network operation, are transmitted by a device to a multicast group address. Hosts that want to receive the packets indicate that they want to be members of the multicast group, and then multicast packets are distributed to that group. For example, multicast packets support the Spanning Tree Protocol. Multicast applications and underlying multicast protocols control multimedia traffic and shield hosts from processing unnecessary broadcast traffic. However, multicast traffic can also cause storms that saturate your network.

Duplicate Addresses

Use these sections to identify and correct problems caused by duplicate MAC and IP addresses:

• Duplicate Addresses Overview
• Finding Duplicate MAC Addresses
• Finding Duplicate IP Addresses

See Duplicate Addresses Reference for additional conceptual and problem analysis detail.

Networks sometimes generate duplicate MAC and IP addresses. Because duplicate addresses can cause problems with packet delivery, resolve them as soon as possible.

Understanding the Problem

Duplicate MAC addresses are caused by data link layer problems with Fiber Distributed Data Interface (FDDI) media and the passing of tokens on the FDDI ring. Duplicate IP addresses are caused by network layer problems. See these sections for more information about causes of duplicate addresses:

• Duplicate MAC Addresses
• Duplicate IP Addresses

Identifying the Problem

Identify duplicate MAC and IP addresses by following the instructions in these sections:

• Finding Duplicate MAC Addresses
• Finding Duplicate IP Addresses

Solving the Problem

Identify the cause of the duplicate address (such as user error or a hardware problem) and fix the problem if possible.

Finding Duplicate MAC Addresses

To find out if duplicate MAC addresses are occurring, monitor your network using Status Watch.

Status Watch

The Status Watch FDDI Status tool identifies duplicate FDDI MAC addresses, and Status Watch reports when two or more MACs on the same ring have the same MAC address (a Duplicate Address condition).

Follow these steps:

1. In the Status Watch Summary View window, determine if any FDDI Status conditions are reported. If there are, double-click the table cell value to display the Device List window.

Another approach is to examine only the devices that you know reside on your FDDI ring. In the Status Watch main window, red device icons indicate that a threshold has been exceeded.

2. Select a device.

• If you selected the device from the Device List window, the real-time report for that device appears in the Status Watch main window.

• If you selected the device from the main window, also select the FDDI Status tool to view the real-time report.

3. Determine if a Duplicate Address condition caused the FDDI Status tool to trigger a Critical or Warning status for that device.

In Status Watch, you can specify the status severity level to apply to a Duplicate Address condition.

Finding Duplicate IP Addresses

To find out if duplicate IP addresses are occurring, monitor your network using these applications:

• Address Tracker - To find duplicate IP addresses on 3Com devices and their attached networks

• LANsentry® Manager - To find duplicate IP addresses that are collected by probes gathering RMON2 SmartAgent® data from the Enterprise Communications Analysis Module (ECAM) downloaded on your network devices

Address Tracker

Use Address Tracker to determine when and where duplicate IP addresses occur.

Follow these steps:

1. From the Find Address menu, select Find Duplicate IP Addresses.

2. Click Find Now to start your search.

LANsentry Manager

Use the Duplicates table in LANsentry Manager to compile a list of all stations with duplicate IP addresses. This table is available only on probes that have downloaded RMON2 (ECAM) SmartAgent software.

Follow these steps:

1. From the LANsentry Manager Address Map menu, select Duplicates. Address Map data is displayed as a table.

2. To export the contents of the table, click Export to launch the Data Export dialog box.

Duplicate Addresses Reference

This section explains terms that are relevant to duplicate addresses and provides additional conceptual and problem analysis detail

Duplicate MAC Addresses

Each device on your network has a unique MAC address. This address identifies a single device on the network, allowing packets to be delivered to correct destinations.

Packets are delivered to their destinations by means of the MAC-address-to-IP-address translation that the Address Resolution Protocol (ARP) provides. Therefore, if MAC addresses are duplicated on the network, ARP caches of routing devices contain erroneous destinations. In FDDI, devices monitor network traffic, searching for their own MAC address in each packet to determine whether to decode the packet. If MAC addresses are not unique, two stations cannot be distinguished from each other.

Duplicate MAC addresses can occur for the following reasons:

• Someone has manually configured a MAC address for a device, instead of using the address that the vendor supplied or allowing it to be assigned dynamically, and this address is also assigned to a different device.

• In rare circumstances, loops in a bridged network can cause a MAC hardware problem or an address learning problem that creates a duplicate MAC address entry in the bridging address table.

• On DECnet Phase 4 networks, MAC addresses are set from the DECnet address. A duplicate NET address can cause a duplicate MAC address.

Duplicate IP Addresses

Because IP addresses are critical for transmission of packets on TCP/IP networks, resolve duplicate IP addresses immediately.

Duplicate IP addresses can occur when someone has configured an IP address that is identical to an IP address that is assigned to a different device. Address assignments, although possible for you to configure manually, are usually made using one of these protocols:

• Dynamic Host Configuration Protocol (DHCP) - Allows your network to dynamically assign IP addresses to nodes. With this protocol, a DHCP server temporarily assigns an IP address to a node, or you can statically configure addresses as needed.

• Bootstrap Protocol (BootP) - Allows you to statically assign IP addresses to nodes. This protocol is more efficient than RARP.

• Reverse ARP (RARP) - Allows you to statically assign IP addresses to nodes. However, because this protocol relies on the MAC address to identify the node, you cannot use it on networks that dynamically assign hardware addresses.
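Whatever tool collects the data, a duplicate IP address shows up as one IP address claimed by more than one MAC address. The following minimal sketch illustrates that check on a list of (IP, MAC) observations, for example rows exported from an address table. The function name find_duplicate_ips and the input format are assumptions for illustration and are not part of Address Tracker or LANsentry Manager.

```python
from collections import defaultdict


def find_duplicate_ips(observations):
    """Return {ip: sorted list of MACs} for every IP seen with more than one MAC.

    observations: iterable of (ip, mac) string pairs (hypothetical export format).
    """
    macs_by_ip = defaultdict(set)
    for ip, mac in observations:
        # Normalize case so the same MAC written two ways is not counted twice.
        macs_by_ip[ip].add(mac.lower())
    return {
        ip: sorted(macs)
        for ip, macs in macs_by_ip.items()
        if len(macs) > 1
    }
```

Swapping the roles of the two fields (grouping IPs by MAC) gives the corresponding check for duplicate MAC addresses, although a router or multihomed host can legitimately hold several IP addresses on one MAC.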

Ethernet Packet Loss

Use these sections to identify and correct Ethernet packet loss:

• Ethernet Packet Loss Overview
• Searching for Packet Loss

See Ethernet Packet Loss Reference for additional conceptual and problem analysis detail.

If your Ethernet network shows signs of congestion, it may be experiencing packet loss. When your network is congested, utilization is usually high, packets are discarded because buffers are full, and collision rates are up. Problems related to Collisions are often at the heart of packet loss.

Understanding the Problem

Collisions are normal in Ethernet networks. In many cases, Collision rates of 50 percent do not cause a large decrease in throughput. The Collision rate helps mark the upper limit on your network (the maximum percentage of collisions that your network can bear), which is usually around 70 percent. If Collisions increase above this upper limit, your network can become unreliable.

When the Collision rates increase, so do Excessive Collisions, which cause a delay in transmitting data. An increase in Collisions also indicates that network utilization and network errors, such as FCS Errors, are probably increasing.

The real packet problems to watch for, however, are undetected collisions that show up as Late Collisions.

Identifying the Problem

To identify that your network's problem is related to packet loss, verify that frames are being dropped on your network by examining this packet loss data:

• Alignment Errors
• Collisions
• Excessive Collisions
• FCS Errors
• CRC Errors
• Late Collisions
• Receive Discards
• Too Long Errors
• Too Short Errors
• Transmit Discards

The process of identifying the problem is discussed in Searching for Packet Loss.

Solving the Problem

If you notice that packet loss data is consistently high, then your network is too congested. In this case, segment your network with the appropriate network device (such as a switch or router). If Collision data shows increases but your network's utilization is the same, then your network may have a physical problem, such as cabling that is too long. Other problems that packet loss data can indicate include:

• Faulty connectors or improper cabling
• Excessive numbers of repeaters between network devices
• Defective Ethernet transceivers or controllers

Possible solutions to these problems are explained in the procedures in Searching for Packet Loss.
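The decision described in this section can be written down as a short rule of thumb. The sketch below is illustrative only: the function name diagnose_packet_loss and its three boolean inputs are hypothetical, and deciding what counts as "consistently high" or "rising" is left to your own baselines.

```python
def diagnose_packet_loss(loss_consistently_high, collisions_rising, utilization_rising):
    """Map the observations in this section to a suggested next step.

    All three arguments are booleans that you derive from your own
    baseline comparisons (assumed inputs, not values any tool reports).
    """
    if loss_consistently_high:
        # Consistently high packet loss data means the network is too congested.
        return "congested: segment the network with a switch or router"
    if collisions_rising and not utilization_rising:
        # Collisions up while utilization is unchanged suggests a physical fault.
        return "physical problem: check cable length, connectors, repeaters, transceivers"
    return "no clear indication: keep monitoring"
```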

Searching for Packet Loss

When you look for packet loss, use the following applications:

• Status Watch - For Ethernet and MIB-II data collection using SNMP polling
• LANsentry Manager Network Statistics Graph - For RMON data collection using an RMON probe
• Device View - On a per-device basis, you can evaluate statistics for any port on the device

Status Watch

Status Watch monitors:

• Alignment Errors
• Excessive Collisions
• FCS Errors
• Receive Discards
• Transmit Discards

Follow these steps:

1. Determine if the thresholds for the Alignment Errors tool and FCS Errors tool are being exceeded.

Table 16 identifies the problems that this data can indicate and your possible actions. For information about problems related to a nonstandard Ethernet implementation, see Nonstandard Ethernet Problems.

Table 16 Alignment Errors, FCS Errors, and CRC Errors Data

Possible Problem: Faulty cabling
Possible Action: Examine the cable and cable connections for breaks or damage.

Possible Problem: Network noise
Possible Action: Look for improper cabling, faulty cable, faulty network equipment, or cables that are too close to equipment that emits electromagnetic interference (lamps, for example).

Possible Problem: Faulty transceiver
Possible Action: Use an analyzer to identify the problematic transceiver. If necessary, replace the transceiver, network adapter, or station.

Possible Problem: Fault at the transmitting end station
Possible Action:
1. Locate the source of the errors by looking at the module and port statistics.
2. Verify the correct operation of the transceiver or adapter card of the device that is connected to the problem port.
3. If the card appears to be operating correctly, examine the cable and cable connections for breaks or damage.

Possible Problem: Station powering up or down
Possible Action: None required.
Early implementations of Ethernet transceivers generate a significant amount of in-band noise when powering up; they frequently cause Alignment Errors and FCS Errors in an otherwise stable network.
When powering up, some software drivers for Ethernet controllers also initiate Time Domain Reflectometry (TDR) tests to test the Ethernet media. Network monitors report TDR tests as Alignment Errors and FCS Errors.

Possible Problem: Faulty adapter
Possible Action: Replace the adapter.

2. Determine if the Excessive Collisions tool threshold is being exceeded. Table 17 identifies the problems that this data can indicate and your possible actions.

Table 17 Collisions and Excessive Collisions Data

Possible Problem: Busy network
Possible Action: Use a bridge, router, or switch to reconfigure your network into segments with fewer stations.

Possible Problem: Faulty device (adapter, switch, hub, and the like) that does not listen before broadcasting. This problem increases the incidence of all types of collisions.
Possible Action: Isolate each adapter to see if the problem stops.

Possible Problem: Network loop
Possible Action: Ensure that no redundant connections to the same station have both connections active simultaneously.

3. Determine if the Receive Discards and Transmit Discards tools' thresholds are being exceeded.

If these errors are high in conjunction with the data that you learned in steps 1 and 2, then your network is overloaded. Segment your network.

LANsentry Manager Network Statistics Graph

Use the LANsentry Manager Network Statistics graph to view data for:

• Collisions
• Late Collisions
• Bandwidth Utilization
• CRC Errors
• Too Long Errors
• Too Short Errors

Follow these steps:

1. Display a Network Statistics graph for the local Ethernet segment on which users have reported poor performance.

This graph shows the most recent trend in Collision rates. If you have set up a History sample, you can also look at the historical trend. If a number of segments are connected by repeaters, examine the graph for each Ethernet segment.

2. Analyze Utilization and Collision rates to determine whether collisions are caused by an overloaded segment or a faulty component.

• If Utilization rates are high - The collisions are probably caused by an overloaded segment. If you have added nodes or new applications to your network, consider reconfiguring the cabling system using bridges and routers to filter out remote collisions and to keep local traffic on one segment. This action should level the network load.

• If Utilization rates are stable and appear normal - The collisions are probably caused by faulty components. In this case, do the following:

• If the network consists of repeaters - Compare the Network Statistics graphs for each segment connected to the repeater. Because repeaters repeat traffic across all connected segments (which makes many segments seem like one network), you should see similar levels of traffic on all segments. One segment that shows dissimilar levels of traffic and collisions may indicate faulty hardware. In this case, monitor several collisions to track the source station that is transmitting too soon after collisions, and repair the station. Packets that are transmitted too soon after collisions are unlikely to be valid. See Table 17 for more information about Collisions.

• On other networks - Determine the segment cable length.

3. Examine the CRC Errors and Late Collisions, which often indicate cabling or component problems.

Table 16 identifies the problems that CRC Errors can indicate and your possible actions. Table 18 identifies the problems that Late Collisions data can indicate and your possible actions.

Table 18 Late Collisions Data

Possible Problem: Cabling problems
• Segment too long
• Failing cable
• Segment not grounded properly (noise)
• Improper termination
• Taps too close (10BASE-5 and 10BASE-2 only)
• Noisy cable
Possible Action: Correct the cabling problem by doing one or more of the following:
• Reduce the segment length
• Replace the cable
• Ground the cable
• Terminate the cable correctly
• Check the taps
• Check for cables too close to equipment that emits electromagnetic interference

Possible Problem: Component problems
• Deaf or partially deaf node
• Failing repeater, transceiver, or controller cards
Possible Action: Correct the component problem by doing one of the following:
• Trace the failing component and replace it
• Replace the NIC or the transceiver

4. Trace Too Short Errors and Too Long Errors to the sender.

These errors often indicate faulty routers or LAN drivers and transceiver problems. Table 19 identifies the problems that this data can indicate and your possible actions.

Possible Problem: A transceiver on your network is adding bits to the packets that are transmitted by the attached station.
Possible Action:
1. Use a network analyzer to identify the problematic transceiver.
2. If necessary, replace the transceiver, network adapter, or station.

Possible Problem: The jabber protection mechanism on a transceiver has failed; it can no longer protect the network from the jabbering produced by the attached station.
Possible Action: Replace the network card.

Possible Problem: Excessive noise on the cable.
Note: Some 10/100 Mbps cards that autodetect the network speed may connect to the network at the wrong speed, causing excessive noise.
Possible Action: Check for improper cabling, faulty cable, faulty network equipment, or cables too close to noisy electronic equipment (lamps, for example).
If your network card autodetects the network speed and you have ruled out other problems, manually configure the network speed.

Possible Problem: Faulty routers (two different network types are connected and the router is not enforcing proper frame size restrictions).
Possible Action: Notify the manufacturer.

Possible Problem: Faulty LAN driver
Possible Action: Replace the driver.

Possible Problem: A normal condition on a LinkSwitch® 1000, LinkSwitch® 3000, or CoreBuilder 5000 FastModule.
If you use maximum-sized 1518-byte Ethernet frames, the device's VLT-enabled ports add a frame tag of 4 bytes, resulting in a misleading Too Long Frame error.
Possible Action: These frames are passed successfully but will create the Too Long Frame error message.
If you want to eliminate the error message, reduce your Ethernet packet frames by 4 bytes.
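The arithmetic behind the misleading Too Long Frame error can be sketched as follows. The 1518-byte maximum and the 4-byte tag come from the text above; the names MAX_ETHERNET_FRAME, VLT_TAG_BYTES, and is_reported_too_long are hypothetical and only illustrate the calculation.

```python
MAX_ETHERNET_FRAME = 1518  # maximum frame size in bytes, as stated above
VLT_TAG_BYTES = 4          # frame tag added by a VLT-enabled port


def is_reported_too_long(frame_bytes, vlt_tagged):
    """Return True if the frame would be counted as a Too Long Frame."""
    size_on_link = frame_bytes + (VLT_TAG_BYTES if vlt_tagged else 0)
    return size_on_link > MAX_ETHERNET_FRAME
```

A maximum-sized frame becomes 1518 + 4 = 1522 bytes once tagged and is counted as too long, whereas a frame reduced by 4 bytes (1514) stays within the limit after tagging.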

Device View

Device View allows you to display a variety of port and device-level statistics relevant to Ethernet packet loss. Table 20 describes these statistics and their use in troubleshooting.

Table 20 Activity and Error Statistics in Device View

Statistics Group: Activity
Description: Displays the total network activity and errors on the selected port.
Use in Troubleshooting: This data shows readable packets, broadcast packets, Collisions, total errors, and runts, which cause Too Short Errors. You can interpret this data in the following ways:
• Runts are often caused by Collisions; however, if the values increase at specific times of the day, you may need to change the network topology to manage the traffic more efficiently (for example, with switches or routers).
• Runts can also be caused by a badly terminated coax cable.
• Large numbers of runts not associated with high levels of collisions can indicate a transmission problem (examine the cable).
• Particularly high numbers of Collisions compared to the total number of readable packets can point to a hardware problem (a bad adapter) or to a data loop.
• A high proportion of Broadcast Packets (>10%) on a heavily utilized network (>50% of available bandwidth) can point to an incorrectly configured bridge or router on the network.

Statistics Group: Errors
Description: Displays the number of frames with errors on the selected port.
Use in Troubleshooting: The significance of errors depends on accompanying errors and prevailing network conditions. See the following error data for more information:
• Alignment Errors: Table 16
• FCS Errors: Table 16
• Too Long Errors: Table 19
• Too Short Errors or runts: Table 19
• Late Collisions: Table 18

To display Activity and Errors statistics for a device or port, follow these steps:

1. Select the required port or device.

2. From the shortcut menu, select Activity or Errors.

The statistics available depend on the type of port or device selected. See Table 20 for troubleshooting information.
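The broadcast heuristic in Table 20 can be applied to counters that you read from the Activity statistics. The sketch below is an illustration only; the function name suspect_misconfigured_bridge and its parameters are hypothetical, and the 10 percent and 50 percent thresholds are the ones quoted in the table.

```python
def suspect_misconfigured_bridge(broadcast_packets, readable_packets, utilization_percent):
    """Apply the Table 20 rule: >10% broadcasts on a link that is >50% utilized.

    broadcast_packets and readable_packets are counts over the same interval;
    utilization_percent is the share of available bandwidth in use.
    """
    if readable_packets <= 0:
        return False  # nothing to judge from
    broadcast_share = broadcast_packets / readable_packets
    return broadcast_share > 0.10 and utilization_percent > 50
```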

Ethernet Packet Loss Reference

This section explains terms that are relevant to Ethernet packet loss and provides additional conceptual and problem analysis detail.

Alignment Errors

An Alignment Error indicates a received frame for which both of the following are true:

• The number of bits received is not a whole number of bytes (that is, not an integral multiple of 8).

• The frame has a Frame Check Sequence (FCS) error.

Alignment Errors often result from MAC layer packet formation problems, cabling problems that cause corrupted or lost data, and packets that pass through more than two cascaded multiport transceivers. See FCS Errors for more information about interpreting Alignment Errors.


Collisions

Collisions indicate that two devices detect that the network is idle and try to send packets at exactly the same time (within one round-trip delay). Because only one device can transmit at a time, both devices must stop sending and attempt to retransmit. Collisions are detected by the transmitting stations.

The retransmission algorithm helps to ensure that the packets are not retransmitted at the same time. However, if the two devices retry at nearly the same time, the packets can collide again. The process repeats until either the packets finally pass onto the network without collisions, or 16 consecutive collisions occur and the packets are discarded.
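The retry behavior described above can be sketched in a few lines of Python. This is only an illustration and assumes the retransmission algorithm is the truncated binary exponential backoff used by standard CSMA/CD Ethernet; the 51.2 microsecond slot time applies to 10 Mbps Ethernet, and the `channel_is_clear` callable is a hypothetical stand-in for a real transmit attempt.

```python
import random

SLOT_TIME_US = 51.2   # slot time for 10 Mbps Ethernet, in microseconds (assumed)
MAX_ATTEMPTS = 16     # after 16 consecutive collisions the packet is discarded
BACKOFF_LIMIT = 10    # the backoff range stops doubling after 10 collisions


def backoff_delay_us(collision_count, rng=random):
    """Return a random wait in microseconds after the n-th consecutive collision."""
    if not 1 <= collision_count < MAX_ATTEMPTS:
        raise ValueError("collision_count must be between 1 and 15")
    exponent = min(collision_count, BACKOFF_LIMIT)
    slots = rng.randrange(2 ** exponent)  # pick 0 .. 2^k - 1 slot times
    return slots * SLOT_TIME_US


def send_with_retries(channel_is_clear, rng=random):
    """Try to send one packet; return (delivered, total_wait_us).

    channel_is_clear is a hypothetical callable that returns True when
    an attempt goes through without a collision.
    """
    total_wait = 0.0
    for attempt in range(1, MAX_ATTEMPTS + 1):
        if channel_is_clear():
            return True, total_wait
        if attempt == MAX_ATTEMPTS:
            break  # 16th consecutive collision: the packet is discarded
        total_wait += backoff_delay_us(attempt, rng)
    return False, total_wait
```

Because each station picks its wait at random from a range that doubles after every collision, two stations that collided once are unlikely to keep choosing the same retry time.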

CRC Errors

A Cyclic Redundancy Check (CRC) Error is an RMON statistic that combines FCS Errors and Alignment Errors. These errors indicate that packets were received with either:

• A bad FCS and an integral number of octets (FCS Errors)
• A bad FCS and a non-integral number of octets (Alignment Errors)

CRC Errors can cause an end station to freeze. If a large number of CRC Errors are attributed to a single station on the network, replace the station's network interface board. Typically, a CRC Error rate of more than 1 percent of network traffic is considered excessive.
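As a rough illustration of the 1 percent guideline, the following Python sketch combines the two counters into a CRC error rate and compares it with the threshold. The function names and the idea of feeding in raw counter values are assumptions made for the example; they are not part of any management tool described here.

```python
CRC_ERROR_THRESHOLD = 0.01  # "more than 1 percent of network traffic"


def crc_error_rate(fcs_errors, alignment_errors, total_packets):
    """CRC errors (FCS Errors plus Alignment Errors) as a fraction of all packets."""
    if total_packets <= 0:
        return 0.0
    return (fcs_errors + alignment_errors) / total_packets


def is_excessive(fcs_errors, alignment_errors, total_packets,
                 threshold=CRC_ERROR_THRESHOLD):
    """Return True when the CRC error rate is above the threshold."""
    return crc_error_rate(fcs_errors, alignment_errors, total_packets) > threshold
```

For example, 8 FCS Errors and 4 Alignment Errors in 1,000 packets give a rate of 1.2 percent, which this check would report as excessive.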

Excessive Collisions

Excessive Collisions indicate that 16 consecutive collisions have occurred, usually a sign that the network is becoming congested. For each excessive collision count (that is, after 16 consecutive collisions), a packet is dropped. If you know the normal rate of excessive collisions, then you can determine when the rate of packet loss is affecting your network's performance. See Knowing Your Network's Configuration for more information.

FCS Errors

Frame Check Sequence (FCS) Errors, a type of CRC Error, indicate that frames received by an interface are an integral number of octets long but do not pass the FCS check. The FCS is a mathematical way to ensure that all the frame's bits are correct without having the system examine each bit and compare it to the original. Packets with Alignment Errors also generate FCS Errors.

Both Alignment Errors and FCS Errors can be caused by equipment powering up or down or by interference (noise) on unshielded twisted-pair (10BASE-T) segments. In a network that complies with the Ethernet standard, FCS or Alignment Errors indicate bit errors during a transmission or reception. A very low rate is acceptable. Although Ethernet allows a bit error rate of 1 in 10^8, typical Ethernet performance is 1 in 10^12 or better.
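To get a feel for what these bit error rates mean per frame, the short Python sketch below estimates the chance that a frame contains at least one bit error. It assumes bit errors are independent and uses a maximum-size 1518-octet frame; both are simplifications chosen for illustration.

```python
import math


def frame_error_probability(bit_error_rate, frame_octets=1518):
    """Probability that at least one bit of the frame is in error.

    Assumes independent bit errors: 1 - (1 - BER) ** bits, computed with
    log1p/expm1 so that very small rates do not lose precision.
    """
    bits = frame_octets * 8
    return -math.expm1(bits * math.log1p(-bit_error_rate))


allowed = frame_error_probability(1e-8)    # rate the standard allows
typical = frame_error_probability(1e-12)   # typical performance
```

Under these assumptions, the allowed rate corresponds to roughly one damaged maximum-size frame in 8,000, while the typical rate corresponds to roughly one in 80 million.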

Late Collisions

Late Collisions indicate that two devices have transmitted at the same time, but cabling errors (most commonly, excessive network segment length or repeaters between devices) prevent either transmitting device from detecting a collision. Neither device detects a collision because the time to propagate the signal from one end of the network to the other is longer than the time to put the entire packet on the network. As a result, neither of the devices that cause the late collision senses the other's transmission until the entire packet is on the network.

Although late collisions occur for small packets, the transmitter cannot detect them. As a result, a network suffering measurable Late Collisions for large packets is losing small packets as well.

Nonstandard Ethernet Problems

Table 21 lists the symptoms that typically occur if a system violates the Ethernet standard.

Table 21. Symptoms of Common Ethernet Network Problems

Symptoms: FCS Errors and Alignment Errors increase significantly.
Problem: Network cabling is too long.
Notes: If you use a promiscuous network monitor, the number of Late Collisions reported by stations should correlate with the FCS and Alignment Errors reported by the monitor.

Symptoms: FCS and Alignment Errors increase proportionally with interference (sometimes referred to as noise hits).
Problem: Network segment is noisy.
Notes: Typically observed on a 10BASE-T network segment in a noisy environment. If you use multiple promiscuous monitors, the FCS and Alignment Errors among the monitors will not correlate. If the monitor can track runts (also called Too Short Errors), the number of runt packets should be significantly higher than normal.

Symptoms: FCS and Alignment Errors are much higher than normal.
Problem: Networks do not conform to the access scheme of Carrier Sense Multiple Access with Collision Detect (CSMA/CD).
Notes: Occurs when some implementations of Ethernet in the segment are not entirely compatible with IEEE 802.3 repeaters. Collision fragments linger on the network long enough to collide with retry packets at the minimum interpacket gap (IPG). The IPG is smaller on one side of the repeated network, causing a lost packet. Ethernet controllers cannot receive packets that are separated by 4.7 µs or less. Some controllers cannot sustain reception of packets separated by as much as 9.6 µs. If runt packets are received one after another and are followed by a collision fragment, Ethernet controllers that cannot sustain reception will lose packets.

Receive Discards

Receive Discards indicate that received packets could not be delivered to a higher-layer protocol because of congestion or packet errors.

Too Long Errors

A Too Long Error indicates that a packet is longer than 1518 octets (including FCS octets) but otherwise well formed. Too Long Errors are often caused by a bad transceiver, a malfunction of the jabber protection mechanism on a transceiver, or excessive noise on the cable.

Too Short Errors

A Too Short Error, also called a runt, indicates that a packet is fewer than 64 octets long (including FCS octets) but otherwise well formed.
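The length and FCS rules given in this reference section can be collected into one small classifier. The Python sketch below is only an illustration of those definitions; the function name, the returned labels, and the idea of passing in a bit count and an FCS result are assumptions made for the example.

```python
MIN_FRAME_OCTETS = 64     # shorter well-formed frames are runts
MAX_FRAME_OCTETS = 1518   # longer well-formed frames are Too Long Errors


def classify_frame(bit_count, fcs_ok):
    """Classify a received frame from its length in bits and its FCS result."""
    whole_octets, extra_bits = divmod(bit_count, 8)
    if not fcs_ok:
        # A bad FCS plus a non-integral octet count is an Alignment Error;
        # a bad FCS with a whole number of octets is an FCS Error.
        return "Alignment Error" if extra_bits else "FCS Error"
    # From here on the frame passed the FCS check ("otherwise well formed").
    if whole_octets > MAX_FRAME_OCTETS:
        return "Too Long Error"
    if whole_octets < MIN_FRAME_OCTETS:
        return "Too Short Error"
    return "OK"
```

For instance, a 60-octet frame with a good FCS is reported as a Too Short Error, while a frame of 1000 octets plus 3 stray bits with a bad FCS is reported as an Alignment Error.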

Transmit Discards

Transmit Discards indicate that packets were not transmitted because of network congestion.

FDDI Ring Errors

Use these sections to identify and correct FDDI ring errors:

• FDDI Ring Errors Overview
• Identifying Ring Errors

Fiber Distributed Data Interface (FDDI) often corrects its own problems. However, because FDDI cannot correct all errors (especially those related to hardware problems), you should monitor FDDI errors.

Understanding the Problem

FDDI ring errors that you should monitor include:

• Elasticity Buffer Error Condition
• Frame Error Condition
• Frames Not Copied Condition
• Link Error Condition

Identifying the Problem

First, determine the type of FDDI ring errors and where they are occurring. Similar to the way you identify other FDDI problems, identify the upstream and downstream neighbors of the devices that you are monitoring.

Several types of network errors can cause FDDI performance problems. For example, problems with cables or physical connections may result in a link or frame error. Elasticity buffer (EB) errors can also lead to link and frame errors.

FDDI deals with port-related errors as follows:

• The variable PORTlerAlarm is the link error rate (LER) value at which a link connection generates an alarm. When the LER is greater than the alarm setting, Station Management (SMT) sends a Status Report Frame (SRF) to notify you that there is a problem with a port.

The PORTlerAlarm threshold is set lower than the PORTlerCutoff threshold so that you are notified of a problem before the port is actually removed from the ring.

• When link errors reach the threshold defined by the variable PORTlerCutoff, SMT breaks the connection, disabling the PHY that detected the problem. A Link Error Condition is also generated.

FDDI deals with MAC-related errors as follows:

• When MAC frame errors reach a certain threshold, a Frame Error Condition is generated. Because the actual error can be further upstream than the immediate connection, the connection remains intact.

• For a large network, the worst case MACFrameErrorRatio is less than 0.1 percent. However, during network configuration, frame error ratios can reach 50 percent for short periods. When you detect a sustained frame error ratio of more than 0.1 percent, a problem exists between the station that is reporting the condition and the nearest upstream MAC.
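The distinction between a brief spike and a sustained frame error ratio can be expressed as a simple check over recent samples. The Python sketch below is illustrative only: the function name and the choice of three consecutive samples as the meaning of "sustained" are assumptions, and the samples would have to come from whatever polling you already do.

```python
FRAME_ERROR_LIMIT = 0.001  # 0.1 percent, expressed as a fraction


def sustained_frame_error(ratios, limit=FRAME_ERROR_LIMIT, min_samples=3):
    """Return True if the most recent min_samples ratios all exceed the limit.

    ratios is a list of frame error ratios (fractions), oldest first.
    A single high sample, such as a spike during reconfiguration, is ignored.
    """
    if len(ratios) < min_samples:
        return False
    return all(ratio > limit for ratio in ratios[-min_samples:])
```

A short burst such as `[0.0002, 0.5, 0.0003, 0.0002]` is not flagged, whereas `[0.002, 0.003, 0.0015]` is.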

Solving the Problem

To solve problems related to FDDI errors, fix the hardware, cabling, or congestion problem.

Identifying Ring Errors


Use Status Watch to monitor your FDDI devices for Warning or Critical alerts.

Status Watch

Use Status Watch to identify FDDI ring errors.

Follow these steps:

1. Monitor the FDDI Status tool for the currently selected device.

2. Determine whether Status Watch is reporting Elasticity Buffer Errors or a high percentage of Frame Errors, Frames Not Copied, or Link Error Rates for the currently selected device.

FDDI Ring Errors Reference

This section provides additional conceptual and problem analysis detail.

Elasticity Buffer Error Condition

The Elasticity Buffer Error condition occurs when a port's elasticity buffer overflows or underflows. This condition usually indicates that a port's hardware is not operating within the tolerances that the FDDI standard specifies. Look for the problem in the hardware of either the port that is reporting the condition or the immediately adjacent port.

Frame Error Condition

The Frame Error condition occurs when the percentage of frames that contain errors exceeds a preset threshold. When a device is an uplink to FDDI (that is, a device is transmitting onto FDDI), this type of condition indicates that the ring is saturated. The ring is out of buffer space, and packets are being dropped from the device's backbone port.

The problem indicated by the frame errors is usually located between the MAC that reports the condition and its upstream neighbor. Because many physical connections can lie along this path, the MACFrameErrorRatio variable identifies only the two MACs between which the problem is occurring.

Frames Not Copied Condition

The Frames Not Copied condition occurs when the percentage of frames that are dropped because of insufficient buffer space exceeds a preset threshold. This condition indicates that the station is congested and is unable to process frames as quickly as they arrive. To help eliminate congestion:

• Add more capacity to the station.
• Reconfigure your network so that end stations that communicate heavily with one another are on the same bridge or switch.
• Filter out certain traffic.

Link Error Condition

The Link Error condition occurs when a port detects link errors at a rate that exceeds a preset threshold. When the Link Error threshold is exceeded, the station removes itself from the ring and tries to reinsert itself on the ring. This action creates a MAC Neighbor Change Event (which also occurs if a ring wraps).

Link errors may indicate an FDDI PHY hardware problem (such as a faulty transmitter) or a faulty cable or connector. Look for the problem in the portion of the network between the port that is reporting the condition and the first upstream transmitter.

MAC Neighbor Change Event

The MAC Neighbor Change event occurs when a MAC's upstream or downstream neighbor changes.

This event indicates either:

• A network reconfiguration
• Another station that is leaving or joining the ring

Network File Server Timeouts

Use these sections to identify and correct timeouts on network file servers:

• Network File Server Timeout Overview
• Looking for Obvious Errors
• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

See Network File Server Timeouts Reference for additional conceptual and problem analysis detail.

A network file server can time out if your network gets congested or if your server is having problems. Users might have problems transferring data or copying files to or from the server. To help you understand the troubleshooting process for this type of problem, an EXAMPLE throughout this section follows the symptoms, analysis, and resolution of a typical file server timeout problem.

Understanding the Problem

When users log in, their stations make network file server calls either to determine quotas (if this feature has been enabled) or to mount user home directories. The network file server timeout messages, even when spread across multiple nodes, indicate a problem either with the network or with a server.

EXAMPLE: UNIX users notice that it takes a long time (over 30 seconds in some cases) to log in to any machine. Some machines report network file server timeout messages, but the messages have no obvious pattern and are infrequent. You begin to get a sense of the problem.

Identifying the Problem

First, rule out the obvious causes. Ask these questions:

• Can you access the network file server with Telnet?
• Have any alarms been triggered?
• Are there any new errors?

The process of identifying the problem is developed in Looking for Obvious Errors.

Solving the Problem

To determine the cause, reproduce the fault while you monitor the network. After you know the cause, you can fix the problem.

The solutions to the network file server timeout are identified in these sections:

• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

Looking for Obvious Errors

To look for obvious errors, use these applications:

• Ping and Telnet - To check connectivity to the network file server nodes
• LANsentry Manager Alarms View - To search for triggered alarms
• LANsentry Manager Statistics View - To look for errors
• LANsentry Manager History View - To identify trends

Ping and Telnet

Determine whether you can contact network file server nodes using Ping and Telnet. If the response is extremely slow, then a problem may exist with the connections to the nodes. No delay indicates that the connections are normal, implying that the delay is occurring elsewhere. In this case, use LANsentry® Manager tools to determine whether packets are being lost or ignored.

LANsentry Manager Alarms View

Using the LANsentry Manager Alarms View, you can determine if any configured alarms have been triggered.

Search the Alarms View to see if any MAC events have been logged.

EXAMPLE: MAC events have not been logged for the network to which the UNIX users are attached.

Even though no alarms have occurred, errors may exist. For example, a low rate of background errors may exist just below the alarm threshold. Because RMON alarms are based on maximum and minimum values, they may miss constant, periodic, or low amounts of errors.

LANsentry Manager Statistics View

Using the LANsentry Manager Statistics View, you can display a multisegment graph of utilization and error statistics.

Follow these steps:

1. Set up a graph that shows utilization and errors on all your major segments.

2. Determine whether any segments are particularly busy or error prone.

EXAMPLE: You notice that one segment of the UNIX network, HUB3, is reporting Too Long Errors and FCS Errors roughly every second sample. While the amount is not higher than normal, it is currently higher than on any other segment.

LANsentry Manager History View

Using the LANsentry Manager History View, you can display a rolling history table to determine if the errors that you are seeing are new. For example, if you have a history table that runs for 30-minute samples over two days, you can compare the most recent sample to a previous sample, looking for new errors. If your probe has the resources, use a much finer resolution sample stored for a shorter time (every 30 seconds for 2 hours) to more easily spot recent errors.

EXAMPLE: You see that the history table shows that no error rates remained constant throughout the day. However, errors that did occur were on the device HUB3 and were Too Long Errors and FCS Errors.

Reproducing the Fault While Monitoring the Network

Although the RMON View in LANsentry Manager can show error rates and help you to identify the location of the problem, it may not provide enough data to solve the problem. To determine the cause of the problem, reproduce it while you monitor the network by using these applications:

• LANsentry Manager Top-N Graph - To locate a quiet node to use for reproducing the fault

• LANsentry Manager Packet Capture - To capture packets from the hub to which the quiet node is attached

• LANsentry Manager Packet Decode - To analyze the packets to assess network file server traffic and delays

• Address Tracker - To find the location of the problem nodes

EXAMPLE: Using LANsentry Manager, you find a hub on the network with a higher than normal error rate. However, the error rate does not seem high enough to cause login delays of 60 seconds or more.

LANsentry Manager Top-N Graph

Using the Top-N graph in the LANsentry Manager main window, locate a quiet node that has been showing the same problem. Choose a quiet node so that you do not receive excessive traffic when you try to isolate the problem.

EXAMPLE: You see that the node Monolith, which has the same Network File System (NFS) mounts as the other nodes on the network, is quiet. You decide to use this node for reproducing the fault. See Network File System (NFS) Protocol for more information about NFS.

LANsentry Manager Packet Capture

Using the LANsentry Manager Packet Capture application, capture packets from the network using predefined patterns and start-and-stop conditions.

Follow these steps:

1. Set up a capture buffer on a probe that is connected to the same hub as the quiet node. Until you know more about the problem, set a very general filter.

EXAMPLE: You select a MAC-layer filter and set a conversation filter to capture all packets to and from Monolith.

2. Telnet into and log out from the quiet node. Then reset the capture buffer. Repeat this procedure until you see the problem reflected in your captured data. To keep the buffer information clean, reset the buffer each time that you repeat the procedure.

3. When you see the delay, note the rough value of the packet count on the LANsentry Manager packet buffer.

By noting the packet count at which you think the delay has occurred, you can narrow the problem to within about 20 packets in the buffer. If you have used an extremely quiet node, you may even identify the exact packet.

LANsentry Manager Packet Decode

The LANsentry Manager Packet Decode application decodes all major protocols and displays the packet contents at three levels of detail: summary information, header information, and actual packet content.

Follow these steps:

1. Open the buffer in the Packet Decode application and locate the number of the packet at which the delay occurred.

2. Select the packet and launch a MAC-layer conversation filter. In the filter display, look for a gap in the conversation (that is, where the node sent a request and then resent it after an interval approximately equal to the delay you experienced when recreating the problem).

3. Repeat the test to determine if the result concentrates on one node or if it appears on other nodes.

EXAMPLE: On the quiet node that you selected, the delay is obvious. You see an NFS request going out to a node and a repeat of the request 30 seconds later. During that time, the node did not respond. You now know that the delay occurred because nodes were not seeing responses for NFS requests. When you repeat the test on other nodes, you find that the delay is happening with more than one destination node.
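If the captured conversation can be exported as a list of timestamped packets, the search for a gap can be automated. The Python sketch below is a simplified illustration, not a feature of LANsentry Manager: the tuple layout `(timestamp, kind, request_id)` and the 10-second minimum gap are assumptions chosen for the example.

```python
def find_retry_gaps(packets, min_gap_seconds=10.0):
    """Find requests that were resent after a long wait with no reply.

    packets is a list of (timestamp_seconds, kind, request_id) tuples,
    where kind is "request" or "reply" (a hypothetical export format).
    Returns a list of (request_id, first_sent, resent_at) tuples.
    """
    pending = {}  # request_id -> time of the last unanswered request
    gaps = []
    for timestamp, kind, request_id in sorted(packets):
        if kind == "request":
            if request_id in pending:
                waited = timestamp - pending[request_id]
                if waited >= min_gap_seconds:
                    gaps.append((request_id, pending[request_id], timestamp))
            pending[request_id] = timestamp
        elif kind == "reply":
            pending.pop(request_id, None)  # answered, so no longer pending
    return gaps
```

A request that is answered within milliseconds is ignored, while one that is repeated 30 seconds later without an intervening reply is reported together with both timestamps.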


Address Tracker

Use Address Tracker, which polls managed devices, to determine the hubs to which the problem nodes are attached. If the problem end stations are located on unmanaged devices, then you can at least narrow the problem to those unmanaged devices.

EXAMPLE: Although your network does not have managed hubs that Transcend® NCS management software can poll, it does have managed switches. When it polls the switches, Address Tracker displays the switch ports on which addresses were last seen. This information indicates the hub (but not the hub port) on which the device is located.

LANsentry Manager Packet Decode

After you know the location of the hub that has the problem node, monitor the problem from the hub using LANsentry Manager Packet Decode.

Follow these steps:

1. To capture packets from one of the nodes on the hub, set up another capture buffer and repeat the exercise that is described in LANsentry Manager Packet Capture. Because a delay may occur on a different node, use two capture buffers without stopping the first one.

Note the rough packet count where the delay appears.

2. Display a conversation filter of the packet where the delay appears and look for the gap in the conversation.

EXAMPLE: You hope that the nodes are on the same hub. You find that all the nodes are on HUB3. This result indicates that FCS Errors may be causing the timeouts. However, because the errors occur at a low rate, you decide to verify this diagnosis. You monitor the problem from the hub, logging in and out many times, and the delay eventually occurs. This time, the capture shows that the node's reply had an FCS Error, even though the node received the request. The switch would not have transmitted this packet, causing a timeout on the NFS protocol. The retry time is presumably 30 seconds. During this test, you see the problem occurring on another node.

Correcting the Fault


Without a managed hub, you may find it very difficult to discover network file server timeout errors. To find the problematic node, you must either systematically isolate nodes by monitoring each node for a prolonged period or temporarily insert a managed hub.

EXAMPLE: You notice that the captured error packet failed FCS because it was corrupted by a regular pattern during transmission. A possible reason for this occurrence is a Jabbering node. This explanation makes sense because FCS/Jabber frames increased linearly when you were monitoring the live network.

Network File Server Timeouts Reference

This section explains terms that are relevant to network file server timeouts and provides additional conceptual and problem analysis detail.

Jabbering

A condition in which a node transmits illegal-length packets and is possibly not operating within carrier specifications. In effect, another node has written bad data over a valid packet. This bad data is often interpreted as a repeated sequence of data.

Network File System (NFS) Protocol

A distributed file system protocol developed by Sun Microsystems that allows a computer system to access files over a network as if they were on its local disks. This protocol has been incorporated into products by more than 200 companies. It is now a de facto Internet standard.

NFS is one protocol in the NFS suite of protocols, which includes NFS, RPC, XDR (External Data Representation), and others. These protocols are part of a larger architecture that Sun Microsystems refers to as Open Network Computing (ONC). ONC is a distributed applications architecture designed by Sun and currently controlled by a consortium led by Sun.

Troubleshooting TCP/IP - Detailed Steps

This article shows how to troubleshoot TCP/IP connectivity between computers on a Windows network. If you haven't already done so, disable XP's Internet Connection Firewall on all local area network connections and remove all firewall programs on the network. Improperly configured firewalls are the most common cause of TCP/IP problems.

Open a Command Prompt Window

For many of these steps, you'll be typing at the command prompt. To open a command prompt window in Windows 2000 or XP, click Start | Run, type cmd in the box, and click OK. To open a command prompt window in Windows 95, 98, or Me, click Start | Run, type command in the box, and click OK. Type one command per line, and press Enter after each one to execute it. To close the command prompt window, use the exit command.

Determine the TCP/IP Settings

Determine the TCP/IP settings of each computer on the local area network. In XP, open the Network Connections folder, right click the LAN connection, and click Status | Support | Details. For example, here are the Status and Details views for the LAN connection on an Internet Connection Sharing host.

In Windows 95/98/Me, click Start | Run, type winipcfg in the box, and click OK. Select the LAN adapter from the menu and click More Info. Here's the winipcfg view for an ICS client running Windows Me.

You can also see the TCP/IP settings from the command prompt. This is especially convenient if a computer has more than one network adapter. Use the ipconfig /all command, which is available in all versions except Windows 95. The output from this command can be long, so it's best to write it to a file. Specify the file name in the command this way:

ipconfig /all >ipconfig.txt

Description of TCP/IP Settings

Here are the TCP/IP settings that are used in network troubleshooting:

• IP Address - Unique address assigned to a network adapter. A computer with multiple network adapters has an IP address for each one, and each one must be in a different subnet.

• Subnet Mask - Used in conjunction with the IP address to determine which subnet an adapter belongs to. At the simplest level, communication is only possible between two network adapters when they're in the same subnet.

• Default Gateway - IP address of a computer or router on one of this computer's local area networks that knows how to communicate with subnets not present on this computer. For an Internet connection, the default gateway is a router belonging to your Internet service provider, and all access to sites on the Internet goes through it. For an ICS client, the default gateway is the ICS host. If you use a hardware router, it serves as the default gateway.

• DHCP Server - If an adapter is configured to obtain an IP address automatically, this is the address of the server that provides it. It could be your ISP, an ICS host, or a hardware router.

• DNS Servers - IP address of one or more Domain Name Server computers. DNS servers translate Internet names to their IP addresses (like 63.146.109.227).
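How the IP address and subnet mask together decide whether two adapters are in the same subnet can be shown with Python's standard ipaddress module. This is a minimal sketch; the helper name and the 255.255.255.0 mask used in the usage note are assumptions made for illustration.

```python
import ipaddress


def same_subnet(ip_a, ip_b, subnet_mask):
    """Return True if both addresses fall in the same subnet for the given mask."""
    # strict=False lets ip_network accept a host address and mask it down
    # to the network address.
    net_a = ipaddress.ip_network(f"{ip_a}/{subnet_mask}", strict=False)
    net_b = ipaddress.ip_network(f"{ip_b}/{subnet_mask}", strict=False)
    return net_a == net_b
```

With a mask of 255.255.255.0, `same_subnet("192.168.1.101", "192.168.1.123", "255.255.255.0")` is true, while comparing 192.168.1.101 with 192.168.2.5 under the same mask is false.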

Subnets

See our article on subnets for a brief description of how they work. For more details, see this Microsoft Knowledge Base article.

If two computers are supposed to be on the same subnet but aren't, something is wrong with the network hardware or software configuration. This is most likely to happen when one of them receives an IP address of 169.254.x.x, which indicates that:

• It's configured to obtain an IP address automatically.
• It couldn't find a DHCP server on the network to make the assignment.
• Windows assigned it an Automatic Private IP Address.

See our article on Specific Networking Problems and Their Solutions for more information.
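Spotting an automatically assigned 169.254.x.x address is easy to script. The Python sketch below uses the standard ipaddress module; the function name is a hypothetical helper chosen for this example.

```python
import ipaddress

# The 169.254.x.x range used for Automatic Private IP Addressing.
APIPA_NETWORK = ipaddress.ip_network("169.254.0.0/16")


def is_automatic_private_address(address):
    """Return True if the address is in the 169.254.x.x range."""
    return ipaddress.ip_address(address) in APIPA_NETWORK
```

An adapter reporting 169.254.10.20 would be flagged, which suggests checking the DHCP server or the path to it, while 192.168.1.101 would not.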

Pinging

The ping command is the basic tool for testing TCP/IP connectivity. It sends a special packet (called ICMP Echo) to a particular IP address and looks for a reply. If everything is working right, the reply comes back. If not, the ping times out in a few seconds. By default, the ping command repeats the process four times. Here's an example of an ICS client computer pinging a Windows XP Home Edition ICS host, using the host's IP address and its computer name.

When ping fails, you'll see one of these error messages:

• Request timed out - The IP address is valid, but there's no reply from it. If the IP address is on a local area network, the most likely cause is a firewall program blocking the ping.

• Unknown host <name> or Ping request could not find host <name> - The computer name doesn't exist on the local area network. Make sure that NetBIOS over TCP/IP is enabled.

• Destination host unreachable - The IP address isn't on a local area network, and the default gateway can't access it. Either there's no default gateway, its address is wrong, or it isn't functioning.

Pinging the Local Area Network

Here is a series of ping commands to use in finding where a problem occurs on a local area network. Run them in the order shown, and don't go on to the next command until all of the previous commands work properly. In this example:

• The computer being tested is named Winxp, with IP address 192.168.1.101.
• There's another computer on the network named Win98, with IP address 192.168.1.123.

Substitute the appropriate IP addresses and computer names for your network.

Command               Target                          What Ping Failure Indicates
ping 127.0.0.1        Loopback address                Corrupted TCP/IP installation
ping localhost        Loopback name                   Corrupted TCP/IP installation
ping 192.168.1.101    This computer's IP address      Corrupted TCP/IP installation
ping winxp            This computer's name            Corrupted TCP/IP installation
ping 192.168.1.123    Another computer's IP address   Bad hardware or NIC driver
ping win98            Another computer's name         NetBIOS name resolution failure
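The rule of running the commands in order and stopping at the first failure can be sketched as a short Python script. This is an illustration under stated assumptions: it calls the operating system's own ping command through subprocess, treats a zero exit code as success (some ping versions return zero even for an unreachable reply, so check the output if a result looks wrong), and uses the example addresses and names from this section.

```python
import platform
import subprocess


def ping(target, count=1):
    """Return True if the system ping command reports success for target."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    result = subprocess.run(
        ["ping", count_flag, str(count), target],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


# Targets in the order shown above, with what a failure indicates.
LAN_STEPS = [
    ("127.0.0.1", "Corrupted TCP/IP installation"),
    ("localhost", "Corrupted TCP/IP installation"),
    ("192.168.1.101", "Corrupted TCP/IP installation"),
    ("winxp", "Corrupted TCP/IP installation"),
    ("192.168.1.123", "Bad hardware or NIC driver"),
    ("win98", "NetBIOS name resolution failure"),
]


def first_failure(steps, ping_fn=ping):
    """Ping each target in order; return (target, meaning) for the first failure."""
    for target, meaning in steps:
        if not ping_fn(target):
            return target, meaning
    return None  # every ping succeeded
```

Calling `first_failure(LAN_STEPS)` runs the real pings; passing a different `ping_fn` makes it possible to try the logic without touching the network.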

To fix a corrupted TCP/IP installation on Windows XP, follow the steps in this Microsoft Knowledge Base article. For Windows 95/98/Me, uninstall the TCP/IP protocol in Control Panel | Network, reboot, and reinstall it. If that doesn't fix it, use this procedure on Windows 95 or 98.

Pinging the Internet

You can also use ping to find a problem with Internet access. Run these commands in the order shown, and don't go on to the next command until all of the previous commands work properly. Use the Default Gateway and DNS Server addresses that you got from the winipcfg or ipconfig /all command.

Command                  Target                What Ping Failure Indicates
ping w.x.y.z             Default Gateway       Default Gateway down
ping w.x.y.z             DNS Server            DNS Server down
ping w.x.y.z             Web site IP address   Internet service provider or web site down
ping www.something.com   Web site name         DNS Server down or web site down

REFERENCES

www.practicallynetworked.com

www.computerhope.com

www.wikipedia.org


  • Basic Network Troubleshooting
  • Basic network troubleshooting
    • Issue
    • Cause
    • Solution
      • Verify connections LEDs
      • Adapter resources
      • Adapter functionality
      • Protocol
      • Firewall
      • Additional time
      • Additional troubleshooting
        • Network Troubleshooting Overview
          • About Connectivity Problems
          • About Performance Problems
          • Solving Connectivity and Performance Problems
          • Recognizing Symptoms
            • User Comments
            • Network Management Software Alerts
            • Analyzing Symptoms
              • Understanding the Problem
              • Identifying and Testing the Cause of the Problem
                • Sample Problem Analysis
                • Equipment for Testing
                  • Solving the Problem
                    • Your Network Troubleshooting Toolbox
                      • Transcend Central
                      • Status Watch
                        • Web Reporter
                          • Address Tracker
                          • LANsentry Manager
                          • Traffix Manager
                          • Device View
                          • Ping
                            • Strategies for Using Ping
                            • Tips on Interpreting Ping Messages
                              • Telnet
                              • FTP and TFTP
                              • Analyzers
                              • Probes
                              • Cable Testers
                              • Knowing Your Networks Configuration
                                • Site Network Map
                                • Device Configuration Information
                                • Other Important Data About Your Network
                                  • Identifying Your Networks Normal Behavior
                                    • Baselining Your Network
                                    • Identifying Background Noise
                                        • FDDI Connectivity
                                          • Understanding the Problem
                                          • Identifying the Problem
                                          • Solving the Problem
                                          • Status Watch
                                          • Implementing Dual Homing
                                          • Installing an Optical Bypass Unit
                                          • Peer Wrap Condition
                                          • Twisted Ring Condition
                                          • Undesired Connection Attempt Event
                                          • Address Tracker
                                          • Device View
                                          • Broadcast Packets
                                          • Multicast Packets
                                            • Duplicate Addresses
                                              • Understanding the Problem
                                              • Identifying the Problem
                                              • Solving the Problem
                                              • Status Watch
                                              • Address Tracker
                                              • LANsentry Manager
                                              • Duplicate MAC Addresses
                                              • Duplicate IP Addresses
                                                • Ethernet Packet Loss
                                                  • Understanding the Problem
                                                  • Identifying the Problem
                                                  • Solving the Problem
                                                  • Status Watch
                                                  • LANsentry Manager Network Statistics Graph
                                                  • Device View
                                                  • Alignment Errors
                                                  • Collisions
                                                  • CRC Errors
                                                  • Excessive Collisions
                                                  • FCS Errors
                                                  • Late Collisions
                                                  • Nonstandard Ethernet Problems
                                                  • Receive Discards
                                                  • Too Long Errors
                                                  • Too Short Errors
                                                  • Transmit Discards
                                                    • FDDI Ring Errors
                                                      • Understanding the Problem
                                                      • Identifying the Problem
                                                      • Solving the Problem
                                                      • Status Watch
                                                      • Elasticity Buffer Error Condition
                                                      • Frame Error Condition
                                                      • Frames Not Copied Condition
                                                      • Link Error Condition
                                                      • MAC Neighbor Change Event
                                                        • Network File Server Timeouts
                                                          • Understanding the Problem
                                                          • Identifying the Problem
                                                          • Solving the Problem
                                                          • Ping and Telnet
                                                          • LANsentry Manager Alarms View
                                                          • LANsentry Manager Statistics View
                                                          • LANsentry Manager History View
                                                          • LANsentry Manager Top-N Graph
                                                          • LANsentry Manager Packet Capture
                                                          • LANsentry Manager Packet Decode
                                                          • Address Tracker
                                                          • LANsentry Manager Packet Decode
                                                          • Jabbering
                                                          • Network File System (NFS) Protocol
                                                          • Troubleshooting TCP/IP - Detailed Steps
                                                          • Open a Command Prompt Window
                                                          • Determine the TCP/IP Settings
                                                          • Description of TCP/IP Settings
                                                          • Subnets
                                                          • Pinging
                                                          • Pinging the Local Area Network
                                                          • Pinging the Internet

After you have an idea of how the change is affecting your network, you can categorize it as critical or noncritical. Both of these categories need resolution (except for changes that are one-time occurrences); the difference between the categories is the time that you have to fix the problem.

By using a strategy for network troubleshooting, you can approach a problem methodically and resolve it with minimal disruption to network users. It is also important to have an accurate and detailed map of your current network environment. Beyond that, a good approach to problem resolution is:

• Recognizing Symptoms
• Understanding the Problem
• Identifying and Testing the Cause of the Problem
• Solving the Problem

Recognizing Symptoms

The first step to resolving any problem is to identify and interpret the symptoms. You may discover network problems in several ways. Users may complain that the network seems slow or that they cannot connect to a server. You may pass your network management station and notice that a node icon is red. Your beeper may go off and display the message "WAN connection down".

User Comments

Although you can often solve networking problems before users notice a change in their environment, you invariably get feedback from your users about how the network is running, such as:

• They cannot print.
• They cannot access the application server.
• It takes them much longer to copy files across the network than it usually does.
• They cannot log on to a remote server.
• When they send e-mail to another site, they get a routing error message.
• Their system freezes whenever they try to Telnet.

Network Management Software Alerts

Network management software, as described in Your Network Troubleshooting Toolbox, can alert you to areas of your network that need attention. For example:

• The application displays red (Warning) icons.
• Your weekly Top-N utilization report (which indicates the 10 ports with the highest utilization rates) shows that one port is experiencing much higher utilization levels than normal.
• You receive an e-mail message from your network management station that the threshold for broadcast and multicast packets has been exceeded.

These signs usually provide additional information about the problem, allowing you to focus on the right area.
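
The Top-N report and the comparison against normal utilization can be approximated in a few lines. The following Python sketch is only an illustration, not the behavior of any Transcend application; the port names, the sample figures, and the factor of 2 used to flag a port are assumptions.

```python
import heapq

def top_n_ports(utilization, n=10):
    """Return the n (port, percent) pairs with the highest utilization."""
    return heapq.nlargest(n, utilization.items(), key=lambda item: item[1])

def ports_above_baseline(utilization, baseline, factor=2.0):
    """Return the ports whose utilization exceeds factor times their baseline."""
    return sorted(
        port for port, value in utilization.items()
        if port in baseline and value > factor * baseline[port]
    )

# Hypothetical sample data: percent utilization per port.
baseline = {"port1": 10.0, "port2": 12.0, "port3": 8.0}
current = {"port1": 11.0, "port2": 55.0, "port3": 9.0}

print(top_n_ports(current, n=2))
print(ports_above_baseline(current, baseline))
```

A port that appears in both results is a reasonable place to start looking.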

Analyzing Symptoms

When a symptom occurs, ask yourself these types of questions to narrow the location of the problem and to get more data for analysis:

• To what degree is the network not acting normally (for example, does it now take one minute to perform a task that normally takes five seconds)?
• On what subnetwork is the user located?
• Is the user trying to reach a server, end station, or printer on the same subnetwork or on a different subnetwork?
• Are many users complaining that the network is operating slowly or that a specific network application is operating slowly?
• Are many users reporting network logon failures?
• Are the problems intermittent? For example, some files may print with no problems, while other printing attempts generate error messages, make users lose their connections, and cause systems to freeze.

Understanding the Problem

Networks are designed to move data from a transmitting device to a receiving device. When communication becomes problematic, you must determine why data are not traveling as expected and then find a solution. The two most common causes for data not moving reliably from source to destination are:

• The physical connection breaks (that is, a cable is unplugged or broken).
• A network device is not working properly and cannot send or receive some or all data.

Network management software can easily locate and report a physical connection break (layer 1 problem). It is more difficult to determine why a network device is not working as expected, which is often related to a layer 2 or a layer 3 problem.

To determine why a network device is not working properly, look first for:

• Valid service - Is the device configured properly for the type of service it is supposed to provide? For example, has Quality of Service (QoS), which is the definition of the transmission parameters, been established?
• Restricted access - Is an end station supposed to be able to connect with a specific device, or is that connection restricted? For example, is a firewall set up that prevents that device from accessing certain network resources?
• Correct configuration - Is there a misconfiguration of IP address, subnet mask, gateway, or broadcast address? Network problems are commonly caused by misconfiguration of newly connected or configured devices. See Manager-to-Agent Communication for more information.
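
The configuration check in the last item can be partly automated. The following Python sketch uses the standard ipaddress module to test whether an IPv4 address, subnet mask, gateway, and broadcast address are consistent with one another. The function name and the sample addresses are assumptions for illustration, and the sketch ignores special cases such as /31 and /32 masks.

```python
import ipaddress

def check_ip_config(address, netmask, gateway, broadcast):
    """Return a list of configuration problems (empty if none are found)."""
    problems = []
    iface = ipaddress.ip_interface(f"{address}/{netmask}")
    network = iface.network
    # A host must not use the network address or the broadcast address.
    if iface.ip in (network.network_address, network.broadcast_address):
        problems.append("host address is the network or broadcast address")
    # The default gateway must be reachable directly, so it must be on the same subnetwork.
    if ipaddress.ip_address(gateway) not in network:
        problems.append("gateway is not on the host's subnetwork")
    # The broadcast address follows from the address and the mask.
    if ipaddress.ip_address(broadcast) != network.broadcast_address:
        problems.append(f"broadcast address should be {network.broadcast_address}")
    return problems

# Hypothetical settings for a newly connected device.
print(check_ip_config("192.168.1.10", "255.255.255.0", "192.168.2.1", "192.168.1.0"))
```

An empty list means only that the four values agree with each other; it does not prove that they match the rest of your network.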

Identifying and Testing the Cause of the Problem

After you develop a theory about the cause of the problem, test your theory. The test must conclusively prove or disprove your theory.

Two general rules of troubleshooting are:

• If you cannot reproduce a problem, then no problem exists unless it happens again on its own.
• If the problem is intermittent and you cannot replicate it, you can configure your network management software to catch the event in progress.

For example, with LANsentry Manager, you can set alarms and automatic packet capture filters to monitor your network and inform you when the problem occurs again. See Configuring Transcend NCS for more information.

Although network management tools can provide a great deal of information about problems and their general location, you may still need to swap equipment or replace components of your network until you locate the exact trouble spot.

After you test your theory, either fix the problem as described in Solving the Problem or develop another theory.

Sample Problem Analysis

This section illustrates the analysis phase of a typical troubleshooting incident.

On your network, a user cannot access the mail server. You need to establish two areas of information:

• What you know - In this case, the user's workstation cannot communicate with the mail server.
• What you do not know and need to test -
  • Can the workstation communicate with the network at all, or is the problem limited to communication with the server? Test by sending a Ping or by connecting to other devices.
  • Is the workstation the only device that is unable to communicate with the server, or do other workstations have the same problem? Test connectivity at other workstations.
  • If other workstations cannot communicate with the server, can they communicate with other network devices? Again, test the connectivity.

The analysis process follows these steps:

1. Can the workstation communicate with any other device on the subnetwork?

• If no, then go to step 2.
• If yes, determine if only the server is unreachable.
  • If only the server cannot be reached, this suggests a server problem. Confirm by doing step 2.
  • If other devices cannot be reached, this suggests a connectivity problem in the network. Confirm by doing step 3.

2. Can other workstations communicate with the server?

• If no, then most likely it is a server problem. Go to step 3.
• If yes, then the problem is that the workstation is not communicating with the subnetwork. (This situation can be caused by workstation issues or a network issue with that specific station.)

3. Can other workstations communicate with other network devices?

• If no, then the problem is likely a network problem.
• If yes, the problem is likely a server problem.
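
The outcome of steps 2 and 3 can be written as a small decision function. The Python sketch below is a simplification of the steps above, not a complete diagnostic: it assumes that each test gives a clear yes or no answer, and it treats step 1 only as a guide to which check to run first. The function and parameter names are made up for this example.

```python
def locate_fault(others_reach_server, others_reach_devices):
    """Classify the likely fault location from two yes/no connectivity tests."""
    if others_reach_server:
        # Step 2, "yes": the server works for other users, so look at this workstation
        # and its own connection to the subnetwork.
        return "workstation"
    if others_reach_devices:
        # Step 3, "yes": the network carries other traffic, but nobody reaches the server.
        return "server"
    # Step 3, "no": other workstations cannot reach other devices either.
    return "network"

# Other workstations cannot reach the mail server but can reach other devices.
print(locate_fault(others_reach_server=False, others_reach_devices=True))
```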

When you determine whether the problem is with the server, subnetwork, or workstation, you can further analyze the problem as follows:

• For a problem with the server - Examine whether the server is running, if it is properly connected to the network, and if it is configured appropriately.
• For a problem with the subnetwork - Examine any device on the path between the users and the server.
• For a problem with the workstation - Examine whether the workstation can access other network resources and if it is configured to communicate with that particular server.


Equipment for Testing

To help identify and test the cause of problems, have available:

• A laptop computer that is loaded with a terminal emulator, TCP/IP stack, TFTP server, CD-ROM drive (to read the online documentation), and some key network management applications, such as LANsentry® Manager. With the laptop computer, you can plug into any subnetwork to gather and analyze data about the segment.
• A spare managed hub to swap for any hub that does not have management. Swapping in a managed hub allows you to quickly spot which port is generating the errors.
• A single port probe to insert in the network if you are having a problem where you do not have management capability.
• Console cables for each type of connector, labeled and stored in a secure place.

Solving the Problem

Many device or network problems are straightforward to resolve, but others yield misleading symptoms. If one solution does not work, continue with another.

A solution often involves:

• Upgrading software or hardware (for example, upgrading to a new version of agent software or installing Gigabit Ethernet devices).
• Balancing your network load by analyzing:
  • What users communicate with which servers
  • What the user traffic levels are in different segments
  Based on these findings, you can decide how to redistribute network traffic.
• Adding segments to your LAN (for example, adding a new switch where utilization is continually high).
• Replacing faulty equipment (for example, replacing a module that has port problems or replacing a network card that has a faulty jabber protection mechanism).

To help solve problems, have available:

• Spare hardware equipment (such as modules and power supplies), especially for your critical devices.
• A recent backup of your device configurations to reload if flash memory gets corrupted (which can sometimes happen due to a power outage).


Your Network Troubleshooting Toolbox

A robust network troubleshooting toolbox consists of items (such as network management applications, hardware devices, and other software) to recognize, diagnose, and solve networking problems. It contains:

• Transcend Applications
• Network Management Platforms
• 3Com SmartAgent Embedded Software
• Other Commonly Used Tools

Transcend management software is optimized for managing 3Com devices and their attached networks. However, some applications, such as LANsentry Manager, can manage any vendor's networking equipment that complies with the Remote Monitoring (RMON) Management Information Base (MIB).

This section describes these Transcend applications, which you can use to troubleshoot your network:

• Transcend Central
• Status Watch
• Address Tracker
• LANsentry Manager
• Traffix Manager
• Device View

Transcend Central

Start with Transcend Central, which is an asset management and device grouping application, to understand what your network consists of and to control the Transcend NCS network management troubleshooting tools. Transcend Central is available as both a native Windows application and a Java application that you can access using a Web browser.

Using Transcend Central for troubleshooting, you can:

• Display an inventory of device, module, and port information.
• Group devices to make your troubleshooting tasks easier. By managing a collection of devices, you can simultaneously perform the same tasks on each device in a group and locate physical or logical problems on your network.
• Launch Transcend NCS applications, including some of your primary Transcend NCS troubleshooting tools:
  • Status Watch, which includes Web Reporter (from the Java version)
  • Address Tracker
  • LANsentry Manager
  • Traffix Manager
  • Device View

Status Watch

The Status Watch applications manage 3Com devices and their attached networks, and they primarily poll for MIB-II data. Status Watch is a performance-monitoring application that allows you to monitor the operational status of your network devices and quickly identify any problems that require your attention. It works in conjunction with Web Reporter.

Web Reporter

Web Reporter is a data-reporting application that runs in a World Wide Web (WWW) browser. It generates reports from data that Status Watch collects, allowing you to compare network statistics against a baseline.

Address Tracker

Address Tracker is an address collection and discovery application that:

• Polls managed devices for all MAC addresses
• Polls managed devices and routers for IP addresses to perform MAC-to-IP address translation
• Uses Device View to disable troublesome ports

LANsentry Manager

LANsentry Manager is a set of integrated applications that displays and explores the real-time and historical data that RMON-compliant devices (probes) on the network capture. LANsentry Manager uses SNMP polling to gather RMON and RMON2 data from the probes.

Use LANsentry Manager to:

• Monitor current performance of network segments
• See trends over time
• Spot signs of current problems
• Configure alarms to monitor for specific events
• Capture packets and display their contents

LANsentry Manager works with any device (from 3Com or other vendors) that supports the RMON MIB or the RMON2 MIB.


Traffix Manager

Traffix Manager is a performance-monitoring application that provides information about layer 2 (RMON) and layer 3 conversations between nodes. It helps you to assess traffic patterns on your network. Traffix Manager:

• Monitors all the stations that the RMON2-compliant probes encounter on your network
• Captures and stores RMON and RMON2 data for your network's protocols and applications
• Displays traffic between stations in user-defined views of the network
• Graphs current or historical data on the devices selected
• Delivers reports for user-specified stations and time periods as PostScript to your printer or as HTML to your Web server
• Launches LANsentry Manager tools for in-depth analysis of a station or a conversation between stations

You can use Traffix Manager to:

• Know your network - Understand overall flow patterns and interactions between systems, and determine how your network is really being used at the application level.
• Optimize your network - Gain an insight into traffic and application usage trends to help you optimize the use and placement of current network resources and make wise decisions about capacity planning and network growth.

Traffix Manager works with any device (from 3Com or other vendors) that supports the RMON2 MIB.

Device View

The Device View application is a device configuration tool. When you troubleshoot your network, you can use Device View to determine or change a device's configuration. You can also use Device View to look at a device's statistics and to set alarms.

Device View manages only 3Com devices.

Network Management Platforms

As part of your troubleshooting toolbox, your network management platform is the first place to go to view the overall health of your network. With the platform, you can understand the logical configuration of your network and configure views of your network to understand how devices work together and the role that they play in the users' work. The network management platform that supports your Transcend software installation can provide valuable troubleshooting tools. Transcend runs on several platforms within the NT and UNIX environments.

The platform discovers the devices. Transcend imports that information from the platform to populate the core database. Unless you are rediscovering, the user must manually update the platform.

Using this device database, a map displays the graphical representation of your network. Each device on your network appears as a symbol (icon) on the map. You can configure views of your network to show devices on the same subnetworks or floors.

You can monitor network performance and diagnose network performance and connectivity problems. You can also:

• Take a snapshot of your network in its normal state. The snapshot records the state of your network at a particular instant. If you later have network performance problems, you can compare the current state of your network to the snapshot.
• Quickly determine the connectivity status of a device by noting the color of its map symbol. Red usually means that communication with a device has ceased.
• Diagnose connectivity problems by determining whether two devices can communicate. If they can communicate, then examine the route between the devices, the number of packets that were sent and lost, and the roundtrip time between the two devices.
• Manage MIB information (for example, collecting and storing MIB data for trend analysis and graphing) using MIB queries. Transcend compiles MIBs and allows you to navigate up and down the MIB Tree to retrieve MIB objects from devices. You can set thresholds for MIB data and generate events when a threshold is exceeded.
• Configure the software to act on certain events. The Event Categories window informs you of any unexpected events (which arrive in the form of traps).
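
The snapshot comparison in the first item amounts to a difference between two recorded states. The Python sketch below shows the idea with plain dictionaries; it does not represent how any management platform stores its snapshots, and the device names and status values are invented for the example.

```python
def compare_to_snapshot(snapshot, current):
    """Return {device: (old_status, new_status)} for every device whose status changed."""
    changes = {}
    for device in sorted(set(snapshot) | set(current)):
        old = snapshot.get(device, "absent")
        new = current.get(device, "absent")
        if old != new:
            changes[device] = (old, new)
    return changes

# Hypothetical states: one taken when the network was healthy, one taken now.
normal_state = {"switch-a": "up", "router-1": "up", "hub-3": "up"}
current_state = {"switch-a": "up", "router-1": "down", "probe-2": "up"}

for device, (old, new) in compare_to_snapshot(normal_state, current_state).items():
    print(f"{device}: {old} -> {new}")
```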

For more information, see the documentation that is shipped with your software.

3Com SmartAgent Embedded Software

Traditional Simple Network Management Protocol (SNMP) management places the burden of collecting network management information on the management station. In this traditional model, software agents collect information about throughput, record errors or packet overflows, and measure performance based on established thresholds. Through a polling process, agents pass this information to a centralized network management station whenever they receive an SNMP query. Management applications then make the data useful and alert the user if there are problems on the device.

As a useful companion to traditional network management methods, 3Com's SmartAgent® technology places management intelligence into the software agent that runs within a 3Com device. This scalable solution reduces the amount of computational load on the management station and helps minimize management-related network traffic.

SmartAgent software, which uses the RMON MIB, is self-monitoring, collecting and analyzing its own statistical, analytical, and diagnostic data. In this way, you can conduct network management by exception - that is, you are notified only if a problem occurs. Management by exception is unlike traditional SNMP management, in which the management software collects all data from the device through polling.

SmartAgent software works autonomously and reports to the network management station whenever an exceptional network event occurs. The software can also take direct action without involving the management station. Devices that contain SmartAgent software may be able to:

• Perform broadcast throttling to minimize the flow of broadcast traffic on your network
• Monitor the ratio of good frames to bad frames
• Switch a resilient link pair to the standby path if the primary path corrupts frames
• Report if traffic on vital segments drops below minimum usage levels
• Disable a port for five seconds to clear problems and then automatically reconnect it
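
To illustrate management by exception, the following Python sketch keeps frame counters locally and calls a notification function only when the share of bad frames crosses a threshold. It is a conceptual model, not SmartAgent code; the class name, the 1 percent threshold, and the notify callback are assumptions.

```python
class FrameMonitor:
    """Track good and bad frame counts and report only when a threshold is crossed."""

    def __init__(self, max_bad_ratio=0.01, notify=print):
        self.max_bad_ratio = max_bad_ratio
        self.notify = notify  # called only for exceptional conditions
        self.good = 0
        self.bad = 0

    def record(self, good, bad):
        """Add new counts; return True if a notification was sent."""
        self.good += good
        self.bad += bad
        total = self.good + self.bad
        if total and self.bad / total > self.max_bad_ratio:
            self.notify(f"bad frame ratio {self.bad / total:.1%} exceeds limit")
            return True
        return False

monitor = FrameMonitor()
monitor.record(good=1000, bad=0)   # normal traffic: nothing is reported
monitor.record(good=0, bad=50)     # the ratio is now above 1 percent: one report
```

With polling, the management station would fetch both counters on every cycle. Here the station hears from the device only in the second case.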

The Transcend NCS applications LANsentry Manager and Traffix Manager make the RMON data that the SmartAgent software collects more usable by summarizing and correlating important information.

Other Commonly Used Tools

These commonly used tools can also help you troubleshoot your network:

• Network software, such as Ping, Telnet, and FTP and TFTP. You can use these applications to troubleshoot, configure, and upgrade your system.
• Network monitoring devices, such as Analyzers and Probes
• Tools, such as Cable Testers, for working on physical problems

Ping

Packet Internet Groper (Ping) allows you to quickly verify the connectivity of your network devices. Ping attempts to transmit a packet from one device to a station on the network and listens for the response to ensure that it was correctly received. You can validate connections on the parts of your network by pinging different devices:

• A successful response indicates that a valid network path exists between your station and the remote host and that the remote host is active.
• Slower response times than normal can indicate that the path is congested or obstructed.
• A failed response indicates that a connection is broken somewhere; use the message to help locate the problem. See Tips on Interpreting Ping Messages.

Strategies for Using Ping

Follow these strategies for using Ping:

• Ping devices when your network is operating normally so that you have a performance baseline for comparison. See Identifying Your Network's Normal Behavior for more information.
• Ping by IP address when:
  • You want to test devices on different subnetworks. This method allows you to Ping your network segments in an organized way, rather than having to remember all the hostnames and locations.
  • Your Domain Name System (DNS) server is down and your system cannot look up host names properly. You can Ping with IP addresses even if you cannot access hostname information.
• Ping by hostname when you want to identify DNS server problems.
• To troubleshoot problems that involve large packet sizes, Ping the remote host repeatedly, increasing the packet size each time.
• To determine if a link is erratic, perform a continuous Ping (using ping -s on UNIX), which indicates the time that it takes the device to respond to each Ping.
• To determine a route taken to a destination, use the trace route function (tracert).
• Consider creating a Ping script that periodically sends a Ping to all necessary networking devices. If a Ping failure message is received, the script can perform some action to notify you of the problem, such as paging you.
• Use the Ping functions of your network management platform. For example, in your HP OpenView map, select a device and click the right mouse button to gain access to ping functions.
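
A Ping script of the kind suggested above might look like the following Python sketch. It assumes that a ping command is on the path, that it accepts -c (or -n on Windows) to set the packet count, and that it exits with status 0 on success; exit codes differ between platforms, so check the behavior on your own system. The host list and the notify function stand in for your own devices and your own paging or e-mail mechanism.

```python
import platform
import subprocess

def ping_once(host, timeout_s=2):
    """Send one echo request with the system ping command; True if it exits with 0."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    try:
        result = subprocess.run(
            ["ping", count_flag, "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, OSError):
        # No answer in time, or the ping command could not be started.
        return False
    return result.returncode == 0

def check_hosts(hosts, ping=ping_once, notify=print):
    """Ping every host once, notify about each failure, and return the failed hosts."""
    failed = [host for host in hosts if not ping(host)]
    for host in failed:
        notify(f"Ping failed: {host}")
    return failed
```

To run the check periodically, call check_hosts with your device list from a scheduler such as cron, or from a loop that sleeps between rounds. Using IP addresses in the list keeps the check independent of DNS.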

Tips on Interpreting Ping Messages

Use the following ping failure messages to troubleshoot problems:

No reply from <destination>

Indicates that the destination routes are available but that there is a problem with the destination itself.

<destination> is unreachable

Indicates that your system does not know how to get to the destination. This message means either that routing information to a different subnetwork is unavailable or that a device on the same subnetwork is down.

ICMP host unreachable from gateway

Indicates that your system can transmit to the target address using a gateway but that the gateway cannot forward the packet properly, because either a device is misconfigured or the gateway is not operating.
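
If a script collects these messages, it can map each one to the area to examine first. The Python sketch below matches the three messages listed above by substring; the exact wording of ping output varies between systems, so treat the match strings and the returned labels as assumptions to adjust.

```python
def interpret_ping_failure(message):
    """Map a ping failure message to the area to examine first."""
    text = message.lower()
    if "icmp host unreachable from gateway" in text:
        return "gateway"      # the gateway cannot forward the packet
    if "no reply from" in text:
        return "destination"  # routes are available; the destination has a problem
    if "is unreachable" in text:
        return "routing"      # no route, or a device on the same subnetwork is down
    return "unknown"

print(interpret_ping_failure("No reply from 10.1.2.3"))
```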

Telnet

Telnet, which is a login and terminal emulation program for Transmission Control Protocol/Internet Protocol (TCP/IP) networks, is a common way to communicate with an individual device. You log in to the device (a remote host) and use that remote device as if it were a local terminal.

If you have established an out-of-band Telnet connection with a device, you can use Telnet to communicate with that device even if the network is unavailable. This feature makes Telnet one of the most frequently used network troubleshooting tools. Usually, all device statistics and configuration capabilities are accessible by using Telnet to connect to the device's console. For more information about setting up an out-of-band connection, see Using Telnet, Serial Line, and Modem Connections.

You can invoke the Telnet application on your local system and set up a link to a Telnet process that is running on a remote host. You can then run a program that is located on a remote host as if you were working at the remote system.

FTP and TFTP

Most network devices support either the File Transfer Protocol (FTP) or the Trivial File Transfer Protocol (TFTP) for downloading updates of system software. Updating system software is often the solution to networking problems that are related to agent problems. Also, new software features may help correct a networking problem.

FTP provides flexibility and security for file transfer by:

• Accepting many file formats, such as ASCII and binary
• Using data compression
• Providing Read and Write access so that you can display, create, and delete files and directories
• Providing password protection

TFTP is a simple version of FTP that does not list directories or require passwords. TFTP only transfers files to and from a remote server.

Analyzers

An analyzer, which is often called a Sniffer, is a network device that collects network data on the segment to which it is attached, a process called packet capturing. Software on the device analyzes this data, which is a process referred to as protocol analysis. Most analyzers can interpret different types of protocol traffic, such as TCP/IP, AppleTalk, and Banyan VINES traffic.

You usually use analyzers for reactive troubleshooting - when you see a problem somewhere on your network, you attach an analyzer to capture and interpret the data from that area. Analyzers are particularly helpful for identifying intermittent problems. For example, if your network backbone has experienced moments of instability that prevent users from logging on to the network, you can attach an analyzer to the backbone to capture the intermittent problems when they happen again.
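
As a small taste of protocol analysis, the following Python sketch decodes the 14-byte header of a captured Ethernet II frame and names the protocol from its EtherType field. It is far simpler than a real analyzer: it assumes that the frame is already available as bytes, it does not handle VLAN tags or 802.3 length fields, and the lookup table lists only three well-known EtherType values.

```python
import struct

# A few well-known EtherType values; a real analyzer knows many more.
ETHERTYPES = {0x0800: "IPv4", 0x0806: "ARP", 0x809B: "AppleTalk"}

def format_mac(raw):
    """Format six raw bytes as a colon-separated MAC address."""
    return ":".join(f"{byte:02x}" for byte in raw)

def decode_ethernet_header(frame):
    """Decode destination, source, and protocol from an Ethernet II frame."""
    if len(frame) < 14:
        raise ValueError("frame is shorter than an Ethernet header")
    # Network byte order: 6-byte destination, 6-byte source, 2-byte EtherType.
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    return {
        "dst": format_mac(dst),
        "src": format_mac(src),
        "protocol": ETHERTYPES.get(ethertype, hex(ethertype)),
    }

# Hypothetical captured frame: a broadcast ARP request with a zeroed payload.
sample = bytes.fromhex("ffffffffffff" "001122334455" "0806") + bytes(28)
print(decode_ethernet_header(sample))
```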

Probes

Like an analyzer, a probe is a network device that collects network data. Depending on its type, a probe can collect data from multiple segments simultaneously. It stores the collected data and transfers the data to an analysis site when requested. Unlike an analyzer, a probe does not interpret data.

A probe can be either a stand-alone device or an agent in a network device. The Transcend Enterprise Monitor 500 series and the SuperStack® II Monitor series are stand-alone RMON probes. LANsentry Manager and Traffix Manager use data from probes that comply with the RMON MIB or the RMON2 MIB.

You can use a probe daily to determine the health of your network. The Transcend NCS applications can interpret and report this data, alerting you to possible problems so that you can proactively manage your network. For example, an RMON2 probe can help you to analyze traffic patterns on your network. Use this data to make decisions about reconfiguring devices and end stations as needed.

Cable Testers

Cable testers examine the electrical characteristics of the wiring. They are most commonly used to ensure that building wiring and cables meet Category 5, 4, and 3 standards. For example, network technologies such as Fast Ethernet require the cabling to meet Category 5 requirements. Testers are also used to find defective and broken wiring in a building.

Knowing Your Network

You can better troubleshoot the problems on your network by:

• Knowing Your Network's Configuration
• Identifying Your Network's Normal Behavior

Knowing Your Network's Configuration

Part of understanding your network is knowing its physical and logical configuration. You should know:

• Which devices are on your network
• How the devices are configured
• Which devices are attached to the backbone
• Which devices connect your network to the outside world (WAN)

To keep track of your network's configuration, gather the following information:

• Site Network Map
• Logical Connections
• Device Configuration Information
• Other Important Data About Your Network

This data, when kept up-to-date, is extremely helpful for locating information when you experience network or device problems.

Site Network Map

A network map helps you to:

• Know exactly where each device is physically located
• Easily identify the users and applications that are affected by a problem
• Systematically search each part of your network for problems

You can create a network map using any drawing or flow chart application. Store your network map online. In addition, make sure that you always have a current version on paper in case you cannot access the online version. Figure 8 shows an example of a network map of 3Com devices.

Figure 8 Example of a Site Network Map

Consider including the following information on your network map:

• Location of important devices and workgroups (by floor, building, or area)
• Location of the network backbone, data center, and wiring closets, as appropriate for your network
• Location of your network management stations
• Location and type of remote connections
• IP subnetwork addresses for all managed switches and hubs
• Other subnetwork addresses, such as Novell IPX and AppleTalk, if appropriate for your network
• Type of media (by actual name, such as 10BASE-T, or by grouping, such as Ethernet), which you can show with callouts, colors, line weights, or line styles
• Virtual workgroups, which you can show with colors or shaded areas
• Redundant links, which you can indicate with gray or dashed lines
• Types of network applications that are used in different areas of your network
• Types of end stations that are connected to the switches and hubs

Logical Connections

With the advent of virtual LANs (VLANs), you need to know how your devices are connected logically as well as physically. For example, if you have connected two devices through the same physical switch, you might assume that they can communicate with each other. However, the devices can be in separate VLANs that restrict their communication.

Knowing the setup of your VLANs can help you to quickly narrow the scope of a problem to a VLAN instead of to a network connection.

The Transcend NCS application Enterprise VLAN Manager allows you to view the logical makeup of your network. Depending on the complexity of your network and VLAN configurations, you can use colors to show the VLANs graphically on your network map.

Device Configuration Information

Maintain online and paper copies of device configuration information. Make sure that all online data is stored with your site's regular data backup. If your site does not have a backup system, copy the information onto a backup disc (CD, Zip disk, and the like) and store it offsite.

Follow these guidelines for saving configuration information:

• Because the easiest way to recover a device's configuration is to use FTP or TFTP, save the configuration settings of each device that supports this method of uploading.

• For other devices, Telnet in and save the session (which contains configuration details) to a file. If you cannot print the configuration of a device, then create a quick rebuild guide that explains the quickest way to configure the device from a fresh install.

• For devices that store information to diskette, store this data as part of your site's regular backup.

• For routers and other important devices with text configuration files, store this data online in a revision control system. Keep the most recent version on paper. Keep previous versions.

• For PCs, keep a recovery disk for each type of PC. For any device that you use as a server, store all startup scripts and copies of registries.

Other Important Data About Your Network

For a complete picture of your network, have the following information available:

• All passwords - Store passwords in a safe place. Keep previous passwords in case you restore a device to a previous software version and need to use the old password that was valid for that version.

• Device inventory - The inventory allows you to see the device type, IP address, ports, MAC addresses, and attached devices at a glance. Software tools such as Transcend Central can help you keep track of the 3Com devices on your network. Using Transcend Central, you can group devices by type and location and have this information on hand for troubleshooting.

• MAC address-to-port number list - If your hubs or switches are not managed, you must keep a list of the MAC addresses that correlate to the ports on your hubs and switches. Generate and keep a paper copy of this list, which is crucial for deciphering captured packets, using Address Tracker.

• Log book - Document your interactions, no matter how trivial, with each device that is critical to your network's operation (that is, routers, remote access devices, and security servers). For example, document that you noticed a fan making noise one morning. Your note may help you to identify why a device is over temperature a week later (because the fan stopped working).

• Change control - Maintain a change control system for all critical systems. Permanently store change control records.

• Contact details - Store, online and on paper, the details of all support contracts, support numbers, engineer details, and telephone and fax numbers.

• LANsentry Reporter - Use LANsentry Reporter to generate reports from the database.

Identifying Your Network's Normal Behavior

By monitoring your network over a long period, you begin to understand its normal behavior. You begin to see a pattern in the traffic flow, such as which servers are typically accessed, when peak usage times occur, and so on. If you are familiar with your network when it is fully operational, you can be more effective at troubleshooting problems that arise.

Baselining Your Network

You can use a baseline analysis, which is an important indicator of overall network health, to identify problems. A baseline can serve as a useful reference of network traffic during normal operation, which you can then compare to captured network traffic while you troubleshoot network problems. A baseline analysis speeds the process of isolating network problems.

By running tests on a healthy network, you compile normal data to compare against the results that you get when your network is in trouble. For example, Ping each node to discover how long it typically takes you to receive a response from devices on your network.
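As a rough illustration of comparing a new measurement against a Ping baseline, the following Python sketch summarizes response times recorded on a healthy network and flags a later response that is unusually slow. The function names, the sample values, and the three-standard-deviation tolerance are assumptions made for this example; they do not come from any Transcend application.

```python
from statistics import mean, stdev


def build_baseline(rtt_samples_ms):
    """Summarize round-trip times (ms) collected while the network was healthy."""
    return {"mean": mean(rtt_samples_ms), "stdev": stdev(rtt_samples_ms)}


def deviates_from_baseline(baseline, rtt_ms, tolerance=3.0):
    """Return True if rtt_ms is more than `tolerance` standard deviations
    above the baseline mean (the tolerance is an assumed rule of thumb)."""
    return rtt_ms > baseline["mean"] + tolerance * baseline["stdev"]


# Hypothetical Ping results (ms) for one node, gathered on a healthy day.
healthy_samples = [1.9, 2.1, 2.0, 2.2, 1.8]
baseline = build_baseline(healthy_samples)

print(deviates_from_baseline(baseline, 2.3))  # slightly slower response
print(deviates_from_baseline(baseline, 9.5))  # much slower response
```

In practice you would keep one baseline per node, because a server across a WAN link normally answers more slowly than a device on the local segment.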

Applications such as Status Watch, Address Tracker, LANsentry Manager, and Traffix Manager allow you to collect days and weeks of data and set a baseline for comparison. Through the reporting mechanisms in the following list, you can continuously assess the data from your network and ensure that its performance is optimal:

• Web Reporter generates daily or weekly reports from data collected by Status Watch.

• Traffix Manager generates weekly reports from collected data and calculates the baselines for you. Set up Utilization History and Error History reports with data resolution set to Weekly.

• LANsentry Manager History View generates daily utilization graphs, which are sampled every 30 minutes for each day over one week. Use these graphs to calculate your network baselines manually.

Identifying Background Noise

Know your network's background noise so that you can recognize real data flow. For example, one evening after everyone is gone, when no backups are running and most nodes are on, analyze the traffic on your network using the Traffix Manager application. The traffic that you see is mostly broadcast and multicast packets. Any errors that you see are the result of faulty devices (trace). This traffic is the background noise of your network: traffic that occurs for little value. If background noise is high, redesign your network.

FDDI Connectivity

Use these sections to identify and correct connectivity errors on an FDDI ring:

• FDDI Connectivity Overview
• Monitoring FDDI Connections
• Making Your FDDI Connections More Resilient

Fiber Distributed Data Interface (FDDI), which is a self-correcting technology, automatically corrects ring faults to maintain connectivity throughout most of the network. However, you should monitor your FDDI connections for wrapped rings and other problems with ring connectivity.

Understanding the Problem

As shown in Figure 9, in a thru FDDI LAN, no stations on the trunk ring have a Configuration State (SMTConfigurationState) of Wrap or Isolated. However, users who complain about network performance may have lost connectivity to other stations on the network because the FDDI network is wrapped or segmented.

Figure 9 Thru Ring

Wrapped ring

By monitoring the Peer Wrap Condition, you can see when the Configuration State changes. In a wrapped ring (Figure 10), two stations on the LAN are in a wrapped Configuration State. This condition may or may not affect the connectivity of certain stations. Although operational, your network may have a cabling problem or a problem with a link.

Figure 10 Wrapped LAN

Segmented ring

In a segmented ring (Figure 11), more than two stations are wrapped on the trunk ring. Although this mode of operation is a valid FDDI LAN configuration, your LAN is probably experiencing a degraded or degrading condition.

Figure 11 Segmented Ring
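
The three ring states described so far differ in how many trunk ring stations report a wrapped Configuration State. The following Python sketch shows that rule under simplified assumptions: the labels "Thru", "Wrap", and "Isolated" and the function name classify_ring are hypothetical and stand in for whatever values your management tool reports for each station.

```python
def classify_ring(config_states):
    """Classify a trunk ring from the Configuration State of each station.

    config_states: list of simplified labels, one per trunk ring station.
    The labels are assumptions for this sketch, not actual SMT values.
    """
    wrapped = sum(1 for state in config_states if state == "Wrap")
    isolated = sum(1 for state in config_states if state == "Isolated")

    if wrapped == 0 and isolated == 0:
        return "Thru"       # no station is wrapped or isolated
    if wrapped == 2:
        return "Wrapped"    # exactly two wrap points
    if wrapped > 2:
        return "Segmented"  # more than two stations are wrapped
    return "Unknown"        # combination not covered by the descriptions above


print(classify_ring(["Thru", "Thru", "Thru", "Thru"]))
print(classify_ring(["Wrap", "Thru", "Thru", "Wrap"]))
print(classify_ring(["Wrap", "Wrap", "Wrap", "Wrap"]))
```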

When a network connection has excessively high link errors, Station Management (SMT) shuts down the connection and tries to reconnect. A dual-attachment trunk ring station with an A or B connection that is shut down is one of the wrap points in the network. See Making Your FDDI Connections More Resilient for information about keeping a dual-attachment station connection from wrapping.

Isolated station

Sometimes a network wraps a particular station out of the ring. Stations on either side of a problem station can be wrapped. This effectively isolates the station or links that have problems, as shown in Figure 12.

Figure 12 Wrapped Ring with Isolated Station

If a ring was already wrapped when a network wraps a station out of the ring, then a segmented ring results, as shown in Figure 13.

Figure 13 Segmented Ring with Isolated Stations

Twisted ring

In a twisted ring, an A port is connected to an A port and a B port is connected to a B port, instead of the normal A-to-B connections. A twisted ring, which always has two twist points (stations), can exist in either a Thru or Wrap state. You can monitor the Twisted Ring Condition and Undesired Connection Attempt Event for evidence of twisted ring and other connection problems.

Identifying the Problem

To identify the problem, follow this process:

1. At the FDDI LAN level, verify that your network is operating.

If the network is operating, the FDDI ring may be segmented, and therefore an FDDI station or an Ethernet station on an Ethernet link may have lost connectivity to other nodes on the network.

2. Determine if a ring is in a Thru, Wrap, or Segmented state.

If the FDDI ring is segmented or wrapped, look for a problem with a link somewhere in the network or for a nonfunctioning node on your trunk ring. If the ring is operating and is not segmented, or if it is segmented but you still have connectivity to the stations in question, move to a more specific level in your network.

See Monitoring FDDI Connections for more information.

3. Determine if the poorly performing station is an Ethernet or FDDI station.

If the problem is an FDDI station, find out if it is congested (that is, if the station is so busy that it cannot accept all the network traffic that is directed to it) by determining its Bandwidth Utilization. Also determine if the station has a high frame error rate by looking at the FDDI Ring Errors.

If the problem is an Ethernet station, look for congestion by examining Ethernet Packet Loss and Bandwidth Utilization.

Solving the Problem

Identify the station that is causing the disconnection and take the appropriate steps:

• If the disconnection is caused by a wrapped ring, then fix the hardware or cabling problem at that station.

• If the station is congested, you have a device problem rather than a network problem. For example, if the congested station is a file server and every other machine on the network is retrieving and saving files using that server, consider upgrading your server or adding servers to the network. A variety of devices from different vendors may be communicating on an FDDI or Ethernet network; some are faster and more capable, and some are slower and more prone to congestion.

• If the station is an Ethernet station that is attached to an Ethernet segment, reevaluate the setup of your Ethernet network and make some changes to improve its performance.

You can also make FDDI connections more resilient by implementing dual homing or installing an Optical Bypass Unit (OBU) where FDDI connections are prone to fail. See Making Your FDDI Connections More Resilient for more information.

Monitoring FDDI Connections

Monitor your FDDI devices for Warning or Critical alerts in the FDDI Status tool.

Status Watch

Use Status Watch to identify these FDDI connectivity errors:

• Peer Wrap Condition
• Twisted Ring Condition
• Undesired Connection Attempt Event

Follow these steps:

1. In the Device area, select the device that is located where you suspect an FDDI ring connectivity problem.

2. Monitor the FDDI Status tool for the currently selected device.

Here are some pointers for monitoring:

• If the Peer Wrap Configuration State variable is Isolated, the device is not connected to the FDDI trunk ring. If you intend the device to remain isolated, this indication is not a serious condition. However, if the device is supposed to be connected on a trunk ring, a serious problem may exist: the device is no longer transmitting packets to the larger trunk ring.

• If the Peer Wrap flag (SMTPeerWrapFlag) is set, the device is one of the wrap points. The cause of the wrapped ring is somewhere in the portion of the network between the two stations that report the peer wrap condition.

Making Your FDDI Connections More Resilient

When devices are removed from an FDDI ring, there is a break in the fiber path that causes the ring to wrap until the ring is made whole again. To prevent the break in the FDDI connection, you can implement dual homing or install an Optical Bypass Unit (OBU).

Implementing Dual Homing

When the operation of a dual attachment node is critical to your network, dual homing adds reliability by providing a backup connection if the primary link fails. Because a dual attachment station (DAS) has two attachments to the FDDI ring (A-to-M and B-to-M), you can use one of them as a standby link if the active link fails. Using dual homing, only one of the two attachments is active at a time. In this sense, a DAS acts as if it is a single attachment station (SAS) by using its A port as the standby link.

Through SMT, a DAS can be dual homed to the same concentrator or, more commonly, to two concentrators. This arrangement provides a more stable trunk ring of concentrators. If one concentrator fails, the DAS enables the standby link to another concentrator to become the active link. See Figure 14.

If the station is a dual path or dual path/dual MAC station, you can configure the dual-homed station in one of two ways:

• With both links active
• With one link active and one connection withheld as a backup, only becoming active when one link fails

Figure 14 Dual Homing Configuration

Installing an Optical Bypass Unit

You can insert an Optical Bypass Unit (OBU) into the FDDI ring as if it were a node and then plug your device into it. To use an OBU, your device needs an optical bypass interface. This interface lets the bypass know whether your device is still on the ring or not. See Figure 15.

If your device is removed, or if it fails, the bypass unit diverts the optical path away from your device, keeping the ring whole. You can use a bypass on devices that are prone to failure or are likely to be removed often, such as diagnostic equipment.

Figure 15 Optical Bypass Unit Configuration

FDDI Connectivity Reference

This section explains terms that are relevant to FDDI connectivity and provides additional conceptual and problem analysis detail.

Peer Wrap Condition

A Peer Wrap (wrapped ring) condition occurs when a dual-attachment station detects a fault (often a lost connection) and reconfigures the network by wrapping the dual trunk rings to form a single ring. Normally, the two stations that are adjacent to the fault wrap to maintain full connectivity. However, if a second fault occurs before the first is repaired, the network partitions itself into two or more rings and stations lose connectivity.

When a station reports a Peer Wrap condition, locate and repair the problem that caused the station to wrap the rings. Potential causes include:

• Faulty FDDI port hardware
• Faulty cables or connectors
• Unplugged connectors
• Powered-down stations

You can expect to find the cause of the problem between the two stations that report the Peer Wrap condition.

Twisted Ring Condition

A Twisted Ring condition occurs when certain undesirable connection types exist. See Table 7 for more information. Although similar to the Undesired Connection Attempt, the Twisted Ring condition provides specific Station Management (SMT) and port information for diagnosis.

Undesired Connection Attempt Event

An Undesired Connection Attempt event occurs when a port tries to connect to another port of a type that may result in an undesirable network topology. Whether the connection attempt is successful depends on the current setting of the station's connection policies.

Table 7 lists connections that the FDDI standard defines as undesirable. The managed devices may or may not permit these connections, depending on their FDDI station configurations.

Table 7 Undesirable Connection Types

Connection Type Reason That the Connection Is Undesirable

A-A Twisted primary and secondary rings

A-S A wrapped ring

B-B Twisted primary and secondary rings

B-S A wrapped ring

S-A A wrapped ring

S-B A wrapped ring

M-M A tree of rings topology (illegal connection)

Table 8 lists FDDI connections that create valid topologies.

Table 8 Valid Connection Types

Connection Type Reason That the Connection Is Valid

A-B A normal trunk ring peer connection

A-M A tree connection with possible redundancy. In a single MAC node, Port B has precedence (by default) for connecting to a Port M.

B-A A normal trunk ring peer connection

B-M A tree connection with possible redundancy. In a single MAC node, Port B has precedence (by default) for connecting to a Port M.

S-S A single ring of two slave stations

S-M A normal tree connection

M-A A tree connection that provides possible redundancy

M-B A tree connection that provides possible redundancy

M-S A normal tree connection
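
Because Tables 7 and 8 together cover every pairing of the four port types (A, B, S, and M), they can be expressed as a simple lookup. The following Python sketch does this; the names UNDESIRABLE, VALID, and check_connection are hypothetical and are used only for illustration.

```python
# Connection types from Table 7 (undesirable) and Table 8 (valid).
UNDESIRABLE = {"A-A", "A-S", "B-B", "B-S", "S-A", "S-B", "M-M"}
VALID = {"A-B", "A-M", "B-A", "B-M", "S-S", "S-M", "M-A", "M-B", "M-S"}


def check_connection(local_port, remote_port):
    """Return 'valid' or 'undesirable' for a pair of FDDI port types."""
    pair = f"{local_port.upper()}-{remote_port.upper()}"
    if pair in VALID:
        return "valid"
    if pair in UNDESIRABLE:
        return "undesirable"
    raise ValueError(f"unknown port types: {pair}")


print(check_connection("A", "B"))  # normal trunk ring peer connection
print(check_connection("A", "A"))  # twisted primary and secondary rings
```

Whether a device actually permits an undesirable connection still depends on the station's connection policies, which this lookup does not model.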

Disabling the Offending Interface

Because broadcast storms can ultimately cause your whole network to become unavailable, take action immediately to disable the offending interface. You can enable the interface again after you have corrected the problem.

Address Tracker

Use Address Tracker to locate the interface that is causing the broadcast storm. Use Device View to disable the port.

Follow these steps:

1. In the Find Address window, enter the address of the interface that seems to be receiving the broadcast traffic.

You can copy the MAC or IP address from the Status Watch report and paste it into Address Tracker's Enter the Address You Want to Find field.

2. Click Find Now.

Search displays the device name.

3. Use Transcend Central to launch Device View and disable the port.

Disabling the port stops the broadcast storm before it interferes with all vital network traffic. You can re-enable this interface later using Device View or the device's console.

Correcting Spanning Tree Misconfigurations

Spanning Tree does not cause broadcast storms, but a loop in your Spanning Tree topology can create data that looks like a storm. A loop can occur in your topology if:

• Someone disables Spanning Tree on a port
• You set up your Spanning Tree configuration incorrectly

Device View

Use Device View to disable any Spanning Tree port that has a repeater attached to it and to correct Spanning Tree misconfigurations.

To correct Spanning Tree misconfigurations, use Device View to disable Spanning Tree Protocol (STP) for a port on a SuperStack® II Switch 1000, Switch 3000, Switch 3000 10/100, Switch 9000SX, Desktop Switch, LinkBuilder® FMS II Bridge/Management Module, or CoreBuilder 6000.

To disable the STP port state for a port on a SuperStack II switch:

1. Select a port and click the right mouse button.

2. From the shortcut menu, select Configure.

3. In the Port section, click the STP tab.

4. From the STP Port State list box, select Disabled.

5. Click Apply.

To disable the STP port state for a port on a LinkBuilder FMS II Bridge/Management Module:

1. Double-click the module.

2. From the shortcut menu, select Configure Bridge.

3. In the Port section, click the STP tab.

Broadcast Storms Reference

This section explains terms that are relevant to broadcast storms and provides additional conceptual and problem analysis detail.

Broadcast Packets

Broadcast packets, which are a normal part of network operation, are transmitted by a device to a broadcast address. For example, IP networks use broadcasts to resolve network addresses using Address Resolution Protocol (ARP). IPX networks use a large number of broadcast packets to operate most effectively.

Problems arise when broadcast packets endlessly propagate throughout the network, which increases the traffic volume on your network and the CPU time that each host spends processing and discarding unwanted broadcast packets.

Multicast Packets

Multicast packets, which are a normal part of network operation, are transmitted by a device to a multicast group address. Hosts that want to receive the packets indicate that they want to be members of the multicast group, and then multicast packets are distributed to that group. For example, multicast packets support the Spanning Tree Protocol. Multicast applications and underlying multicast protocols control multimedia traffic and shield hosts from processing unnecessary broadcast traffic. However, multicast traffic can also cause storms that saturate your network.

Duplicate Addresses

Use these sections to identify and correct problems caused by duplicate MAC and IP addresses:

• Duplicate Addresses Overview
• Finding Duplicate MAC Addresses
• Finding Duplicate IP Addresses

See Duplicate Addresses Reference for additional conceptual and problem analysis detail.

Networks sometimes generate duplicate MAC and IP addresses. Because duplicate addresses can cause problems with packet delivery, resolve them as soon as possible.

Understanding the Problem

Duplicate MAC addresses are caused by data link layer problems with Fiber Distributed Data Interface (FDDI) media and the passing of tokens on the FDDI ring. Duplicate IP addresses are caused by network layer problems. See these sections for more information about causes of duplicate addresses:

• Duplicate MAC Addresses
• Duplicate IP Addresses

Identifying the Problem

Identify duplicate MAC and IP addresses by following the instructions in these sections:

• Finding Duplicate MAC Addresses
• Finding Duplicate IP Addresses

Solving the Problem

Identify the cause of the duplicate address (such as user error or a hardware problem) and fix the problem if possible.

Finding Duplicate MAC Addresses

To find out if duplicate MAC addresses are occurring, monitor your network using Status Watch.

Status Watch

The Status Watch FDDI Status tool identifies duplicate FDDI MAC addresses, and Status Watch reports when two or more MACs on the same ring have the same MAC address (a Duplicate Address condition).

Follow these steps:

1. In the Status Watch Summary View window, determine if any FDDI Status conditions are reported. If there are, double-click the table cell value to display the Device List window.

Another approach is to examine only the devices that you know reside on your FDDI ring. In the Status Watch main window, red device icons indicate that a threshold has been exceeded.

2. Select a device.

• If you selected the device from the Device List window, the real-time report for that device appears in the Status Watch main window.

• If you selected the device from the main window, also select the FDDI Status tool to view the real-time report.

3. Determine if a Duplicate Address condition caused the FDDI Status tool to trigger a Critical or Warning status for that device.

In Status Watch, you can specify the status severity level to apply to a Duplicate Address condition.

Finding Duplicate IP Addresses

To find out if duplicate IP addresses are occurring, monitor your network using these applications:

• Address Tracker - To find duplicate IP addresses on 3Com devices and their attached networks

• LANsentry® Manager - To find duplicate IP addresses that are collected by probes gathering RMON2 SmartAgent® data from the Enterprise Communications Analysis Module (ECAM) downloaded on your network devices

Address Tracker

Use Address Tracker to determine when and where duplicate IP addresses occur.

Follow these steps:

1. From the Find Address menu, select Find Duplicate IP Addresses.

2. Click Find Now to start your search.
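
If you keep a device inventory or can export an address table, the same kind of check can be sketched outside of these applications. The Python example below is an illustration only: it assumes the inventory is a list of (IP address, MAC address) pairs, and the function name find_duplicate_ips and the sample addresses are hypothetical.

```python
from collections import defaultdict


def find_duplicate_ips(inventory):
    """Return {ip: [macs]} for every IP address claimed by more than one MAC.

    inventory: iterable of (ip_address, mac_address) string pairs.
    """
    macs_by_ip = defaultdict(set)
    for ip, mac in inventory:
        macs_by_ip[ip].add(mac.lower())  # normalize case before comparing
    return {
        ip: sorted(macs)
        for ip, macs in macs_by_ip.items()
        if len(macs) > 1
    }


# Hypothetical inventory entries for illustration.
inventory = [
    ("192.168.1.10", "00:A0:24:11:22:33"),
    ("192.168.1.11", "00:A0:24:44:55:66"),
    ("192.168.1.10", "00:A0:24:77:88:99"),  # second device using .10
]
print(find_duplicate_ips(inventory))
```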

LANsentry Manager

Use the Duplicates table in LANsentry Manager to compile a list of all stations with duplicate IP addresses. This table is available only on probes that have downloaded RMON2 (ECAM) SmartAgent software.

Follow these steps:

1. From the LANsentry Manager Address Map menu, select Duplicates. Address Map data is displayed as a table.

2. To export the contents of the table, click Export to launch the Data Export dialog box.

Duplicate Addresses Reference

This section explains terms that are relevant to duplicate addresses and provides additional conceptual and problem analysis detail.

Duplicate MAC Addresses

Each device on your network has a unique MAC address. This address identifies a single device on the network, allowing packets to be delivered to correct destinations.

Packets are delivered to their destinations by means of MAC-address-to-IP-address translation that the Address Resolution Protocol (ARP) provides. Therefore, if MAC addresses are duplicated on the network, ARP caches of routing devices contain erroneous destinations. In FDDI, devices monitor network traffic, searching for their own MAC address in each packet to determine whether to decode the packet. If MAC addresses are not unique, two stations cannot be distinguished from each other.

Duplicate MAC addresses can occur for the following reasons:

• Someone has manually configured a MAC address for a device, instead of using the address that the vendor supplied or allowing it to be assigned dynamically, and this address is also assigned to a different device.

• In rare circumstances, loops in a bridged network can cause a MAC hardware problem or an address learning problem that creates a duplicate MAC address entry in the bridging address table.

• On DECnet Phase 4 networks, MAC addresses are set from the DECnet address. A duplicate NET address can cause a duplicate MAC address.

Duplicate IP Addresses

Because IP addresses are critical for transmission of packets on TCP/IP networks, resolve duplicate IP addresses immediately.

Duplicate IP addresses can occur when someone has configured an IP address that is identical to an IP address that is assigned to a different device. Address assignments, although possible for you to configure manually, are usually made using one of these protocols:

• Dynamic Host Configuration Protocol (DHCP) - Allows your network to dynamically assign IP addresses to nodes. With this protocol, a DHCP server temporarily assigns an IP address to a node, or you can statically configure addresses as needed.

• Bootstrap Protocol (BootP) - Allows you to statically assign IP addresses to nodes. This protocol is more efficient than RARP.

• Reverse ARP (RARP) - Allows you to statically assign IP addresses to nodes. However, because this protocol relies on the MAC address to identify the node, you cannot use it on networks that dynamically assign hardware addresses.

Ethernet Packet Loss

Use these sections to identify and correct Ethernet packet loss:

• Ethernet Packet Loss Overview
• Searching for Packet Loss

See Ethernet Packet Loss Reference for additional conceptual and problem analysis detail.

If your Ethernet network shows signs of congestion, it may be experiencing packet loss. When your network is congested, utilization is usually high, packets are discarded because buffers are full, and collision rates are up. Problems related to Collisions are often at the heart of packet loss.

Understanding the Problem

Collisions are normal in Ethernet networks. In many cases, Collision rates of 50 percent do not cause a large decrease in throughput. The Collision rate helps mark the upper limit on your network (the maximum percentage of collisions that your network can bear), which is usually around 70 percent. If Collisions increase above this upper limit, your network can become unreliable.
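
The figures above can be turned into a rough check. The following Python sketch treats 50 percent as a level that is often tolerable and 70 percent as the usual upper limit; the function name, the category labels, and the idea of using 50 percent as a separate tier are assumptions for this example, and the right limit for your network may differ.

```python
def assess_collision_rate(rate_percent, tolerable=50.0, upper_limit=70.0):
    """Place a Collision rate (as a percentage) into a rough category.

    The default thresholds follow the approximate figures in the text;
    the category labels are hypothetical.
    """
    if not 0.0 <= rate_percent <= 100.0:
        raise ValueError("rate_percent must be between 0 and 100")
    if rate_percent > upper_limit:
        return "above upper limit: network can become unreliable"
    if rate_percent > tolerable:
        return "approaching upper limit: watch closely"
    return "often tolerable: little effect on throughput in many cases"


for rate in (35.0, 60.0, 82.5):
    print(rate, "->", assess_collision_rate(rate))
```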

When the Collision rates increase, so do Excessive Collisions, which causes a delay in transmitting data. An increase in Collisions also indicates that network utilization and network errors, such as FCS Errors, are probably increasing.

The real packet problems to watch for, however, are undetected collisions that show up as Late Collisions.

Identifying the Problem

To identify that your network's problem is related to packet loss, verify that frames are being dropped on your network by examining this packet loss data:

• Alignment Errors
• Collisions
• Excessive Collisions
• FCS Errors
• CRC Errors
• Late Collisions
• Receive Discards
• Too Long Errors
• Too Short Errors
• Transmit Discards

The process of identifying the problem is discussed in Searching for Packet Loss.

Solving the Problem

If you notice that packet loss data is consistently high, then your network is too congested. In this case, segment your network with the appropriate network device (such as a switch or router). If Collision data shows increases but your network's utilization is the same, then your network may have a physical problem, such as cabling that is too long. Other problems that packet loss data can indicate include:

• Faulty connectors or improper cabling
• Excessive numbers of repeaters between network devices
• Defective Ethernet transceivers or controllers

Possible solutions to these problems are explained in the procedures in Searching for Packet Loss.
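The decision described above can be sketched in a few lines of Python. This is only an illustration: the function name, the three boolean inputs, and the returned strings are assumptions made for the example, and the choice to test the physical-layer case first is a design decision, not something the text prescribes.

```python
def diagnose_packet_loss(loss_consistently_high: bool,
                         collisions_rising: bool,
                         utilization_rising: bool) -> str:
    """Map the observations from the text to a suggested next step."""
    # Collisions climbing while utilization stays flat points at the
    # physical layer (for example, cabling that is too long).
    if collisions_rising and not utilization_rising:
        return "check physical layer"
    # Consistently high packet loss data points at congestion.
    if loss_consistently_high:
        return "segment network"
    return "keep monitoring"


print(diagnose_packet_loss(True, True, True))
print(diagnose_packet_loss(False, True, False))
```

How you decide that a counter is "rising" or "consistently high" depends on your own baseline; the sketch leaves that judgment to the caller.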

Searching for Packet Loss

When you look for packet loss, use the following applications:

• Status Watch - For Ethernet and MIB-II data collection using SNMP polling
• LANsentry Manager Network Statistics Graph - For RMON data collection using an RMON probe
• Device View - On a per-device basis, you can evaluate statistics for any port on the device

Status Watch

Status Watch monitors:

• Alignment Errors
• Excessive Collisions
• FCS Errors
• Receive Discards
• Transmit Discards

Follow these steps:

1. Determine if the thresholds for the Alignment Errors tool and FCS Errors tool are being exceeded.

Table 16 identifies the problems that this data can indicate and your possible actions. For information about problems related to a nonstandard Ethernet implementation, see Nonstandard Ethernet Problems.

Table 16 Alignment Errors, FCS Errors, and CRC Errors Data

Possible Problem: Faulty cabling
Possible Action: Examine the cable and cable connections for breaks or damage.

Possible Problem: Network noise
Possible Action: Look for improper cabling, faulty cable, faulty network equipment, or cables that are too close to equipment that emits electromagnetic interference (lamps, for example).

Possible Problem: Faulty transceiver
Possible Action: Use an analyzer to identify the problematic transceiver. If necessary, replace the transceiver, network adapter, or station.

Possible Problem: Fault at the transmitting end station
Possible Action:
1. Locate the source of the errors by looking at the module and port statistics.
2. Verify the correct operation of the transceiver or adapter card of the device that is connected to the problem port.
3. If the card appears to be operating correctly, examine the cable and cable connections for breaks or damage.

Possible Problem: Station powering up or down
Possible Action: None required.
Notes: Early implementations of Ethernet transceivers generate a significant amount of in-band noise when powering up; they frequently cause Alignment Errors and FCS Errors in an otherwise stable network. When powering up, some software drivers for Ethernet controllers also initiate Time Domain Reflectometry (TDR) tests to test the Ethernet media. Network monitors report TDR tests as Alignment Errors and FCS Errors.

Possible Problem: Faulty adapter
Possible Action: Replace the adapter.

2. Determine if the Excessive Collisions tool threshold is being exceeded. Table 17 identifies the problems that this data can indicate and your possible actions.

Table 17 Collisions and Excessive Collisions Data

Possible Problem: Busy network
Possible Action: Use a bridge, router, or switch to reconfigure your network into segments with fewer stations.

Possible Problem: Faulty device (adapter, switch, hub, and the like) that does not listen before broadcasting. This problem increases the incidence of all types of collisions.
Possible Action: Isolate each adapter to see if the problem stops.

Possible Problem: Network loop
Possible Action: Ensure that no redundant connections to the same station have both connections active simultaneously.

3. Determine if the thresholds for the Receive Discards and Transmit Discards tools are being exceeded.

If these errors are high in conjunction with the data that you learned in steps 1 and 2, then your network is overloaded. Segment your network.

LANsentry Manager Network Statistics Graph

Use the LANsentry Manager Network Statistics graph to view data for:

• Collisions
• Late Collisions
• Bandwidth Utilization
• CRC Errors
• Too Long Errors
• Too Short Errors

Follow these steps:

1. Display a Network Statistics graph for the local Ethernet segment on which users have reported poor performance.

This graph shows the most recent trend in Collision rates. If you have set up a History sample, you can also look at the historical trend. If a number of segments are connected by repeaters, examine the graph for each Ethernet segment.

2. Analyze Utilization and Collision rates to determine whether collisions are caused by an overloaded segment or a faulty component.

• If Utilization rates are high - The collisions are probably caused by an overloaded segment. If you have added nodes or new applications to your network, consider reconfiguring the cabling system, using bridges and routers to filter out remote collisions and to keep local traffic on one segment. This action should level the network load.

• If Utilization rates are stable and appear normal - The collisions are probably caused by faulty components. In this case, do the following:

  • If the network consists of repeaters - Compare the Network Statistics graphs for each segment connected to the repeater. Because repeaters repeat traffic across all connected segments (which makes many segments seem like one network), you should see similar levels of traffic on all segments. One segment that shows dissimilar levels of traffic and collisions may indicate faulty hardware. In this case, monitor several collisions to track the source station that is transmitting too soon after collisions, and repair the station. Packets that are transmitted too soon after collisions are unlikely to be valid. See Table 17 for more information about Collisions.

  • On other networks - Determine the segment cable length.

3. Examine the CRC Errors and Late Collisions, which often indicate cabling or component problems.

Table 16 identifies the problems that CRC Errors can indicate and your possible actions. Table 18 identifies the problems that Late Collisions data can indicate and your possible actions.

Table 18 Late Collisions Data

Possible Problem: Cabling problems
• Segment too long
• Failing cable
• Segment not grounded properly (noise)
• Improper termination
• Taps too close (10BASE-5 and 10BASE-2 only)
• Noisy cable
Possible Action: Correct the cabling problem by doing one or more of the following:
• Reduce the segment length
• Replace the cable
• Ground the cable
• Terminate the cable correctly
• Check the taps
• Check for cables too close to equipment that emits electromagnetic interference

Possible Problem: Component problems
• Deaf or partially deaf node
• Failing repeater, transceiver, or controller cards
Possible Action: Correct the component problem by doing one of the following:
• Trace the failing component and replace it
• Replace the NIC or the transceiver

4. Trace Too Short Errors and Too Long Errors to the sender.

These errors often indicate faulty routers or LAN drivers and transceiver problems. Table 19 identifies the problems that this data can indicate and your possible actions.

Table 19

Possible Problem: A transceiver on your network is adding bits to the packets that are transmitted by the attached station.
Possible Action:
1. Use a network analyzer to identify the problematic transceiver.
2. If necessary, replace the transceiver, network adapter, or station.

Possible Problem: The jabber protection mechanism on a transceiver has failed; it can no longer protect the network from the jabbering produced by the attached station.
Possible Action: Replace the network card.

Possible Problem: Excessive noise on the cable
Note: Some 10/100 Mbps cards that autodetect the network speed may connect to the network at the wrong speed, causing excessive noise.
Possible Action: Check for improper cabling, faulty cable, faulty network equipment, or cables too close to noisy electronic equipment (lamps, for example). If your network card autodetects the network speed and you have ruled out other problems, manually configure the network speed.

Possible Problem: Faulty routers (two different network types are connected, and the router is not enforcing proper frame size restrictions)
Possible Action: Notify the manufacturer.

Possible Problem: Faulty LAN driver
Possible Action: Replace the driver.

Possible Problem: A normal condition on a LinkSwitch® 1000, LinkSwitch® 3000, or CoreBuilder 5000 FastModule
Possible Action: If you use maximum-sized (1518-byte) Ethernet frames, the device's VLT-enabled ports add a frame tag of 4 bytes, resulting in a misleading Too Long Frame error. These frames are passed successfully but still produce the Too Long Frame error message. If you want to eliminate the error message, reduce your Ethernet packet frames by 4 bytes.

Device View

Device View allows you to display a variety of port and device-level statistics relevant to Ethernet packet loss. Table 20 describes these statistics and their use in troubleshooting.

Table 20 Activity and Error Statistics in Device View

Statistics Group: Activity
Description: Displays the total network activity and errors on the selected port.
Use in Troubleshooting: This data shows readable packets, broadcast packets, Collisions, total errors, and runts, which cause Too Short Errors. You can interpret this data in the following ways:
• Runts are often caused by Collisions; however, if the values increase at specific times of the day, you may need to change the network topology to manage the traffic more efficiently (for example, with switches or routers).
• Runts can also be caused by a badly terminated coax cable.
• Large numbers of runts not associated with high levels of collisions can indicate a transmission problem (examine the cable).
• Particularly high numbers of Collisions compared to the total number of readable packets can point to a hardware problem (a bad adapter) or to a data loop.
• A high proportion of Broadcast Packets (>10%) on a heavily utilized network (>50% of available bandwidth) can point to an incorrectly configured bridge or router on the network.

Statistics Group: Errors
Description: Displays the number of frames with errors on the selected port.
Use in Troubleshooting: The significance of errors depends on accompanying errors and prevailing network conditions. See the following error data for more information:
• Alignment Errors: Table 16
• FCS Errors: Table 16
• Too Long Errors: Table 19
• Too Short Errors, or runts: Table 19
• Late Collisions: Table 18

To display Activity and Errors statistics for a device or port, follow these steps:

1. Select the required port or device.

2. From the shortcut menu, select Activity or Errors.

The statistics available depend on the type of port or device selected. See Table 20 for troubleshooting information.
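The one Activity rule in Table 20 that carries explicit numbers (broadcast share above 10 percent on a segment running above 50 percent utilization) can be written as a small check. This is a sketch only: the function name is hypothetical, and it assumes that the broadcast proportion is measured against readable packets, which the table does not state outright.

```python
def bridge_or_router_suspect(broadcast_packets: int,
                             readable_packets: int,
                             utilization_pct: float) -> bool:
    """Apply the >10% broadcast / >50% utilization rule of thumb."""
    if readable_packets <= 0:
        return False  # nothing measured, nothing to conclude
    broadcast_share_pct = 100.0 * broadcast_packets / readable_packets
    return broadcast_share_pct > 10.0 and utilization_pct > 50.0


# 150 broadcasts out of 1000 readable packets on a 60%-utilized segment
print(bridge_or_router_suspect(150, 1000, 60.0))
```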

Ethernet Packet Loss Reference

This section explains terms that are relevant to Ethernet packet loss and provides additional conceptual and problem analysis detail.

Alignment Errors

An Alignment Error indicates a received frame for which both of the following are true:

• The number of bits received is an uneven byte count (that is, not an integral multiple of 8).
• The frame has a Frame Check Sequence (FCS) error.

Alignment Errors often result from MAC layer packet formation problems, cabling problems that cause corrupted or lost data, and packets that pass through more than two cascaded multiport transceivers. See FCS Errors for more information about interpreting Alignment Errors.

Collisions

Collisions indicate that two devices detect that the network is idle and try to send packets at exactly the same time (within one round-trip delay). Because only one device can transmit at a time, both devices must stop sending and attempt to retransmit. Collisions are detected by the transmitting stations.

The retransmission algorithm helps to ensure that the packets do not retransmit at the same time. However, if the two devices retry at nearly the same time, packets can collide again; the process repeats until either the packets finally pass onto the network without collisions, or 16 consecutive collisions occur and the packets are discarded.
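The text does not name the retransmission algorithm. Standard Ethernet (IEEE 802.3) uses truncated binary exponential backoff, and the sketch below illustrates that scheme under that assumption. The function name and the injectable random source are choices made for the example; the limit of 16 comes from the text, and the cap of 10 on the exponent comes from the standard.

```python
import random

ATTEMPT_LIMIT = 16  # after the 16th consecutive collision the packet is discarded
BACKOFF_LIMIT = 10  # the exponent stops growing after the 10th collision


def backoff_slots(collision_count, rng=random):
    """Return how many slot times to wait after the nth consecutive collision.

    Returns None when the packet must be discarded instead of retried.
    """
    if collision_count < 1:
        raise ValueError("collision_count starts at 1")
    if collision_count >= ATTEMPT_LIMIT:
        return None
    k = min(collision_count, BACKOFF_LIMIT)
    # Pick uniformly from 0 .. 2**k - 1 so two colliding stations are
    # unlikely to choose the same delay again.
    return rng.randrange(2 ** k)


for n in (1, 2, 3, 10, 15, 16):
    print(n, backoff_slots(n))
```

Because the delay range doubles with each collision, repeated collisions between the same two stations become progressively less likely, which is why reaching 16 in a row is treated as a sign of congestion.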

CRC Errors

A Cyclic Redundancy Check (CRC) Error is an RMON statistic that combines FCS Errors and Alignment Errors. These errors indicate that packets were received with:

• A bad FCS and an integral number of octets (FCS Errors)
• A bad FCS and a non-integral number of octets (Alignment Errors)

CRC Errors can cause an end station to freeze. If a large number of CRC Errors are attributed to a single station on the network, replace the station's network interface board. Typically, a CRC Error rate of more than 1 percent of network traffic is considered excessive.
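As a rough worked example of the 1 percent guideline, the following sketch adds the two component counters and compares the result with total traffic. The function names are hypothetical, and it assumes that "network traffic" is counted in packets over the same sampling interval as the error counters.

```python
def crc_error_rate_pct(fcs_errors: int, alignment_errors: int,
                       total_packets: int) -> float:
    """CRC Errors (FCS + Alignment) as a percentage of packets seen."""
    if total_packets <= 0:
        return 0.0
    return 100.0 * (fcs_errors + alignment_errors) / total_packets


def crc_errors_excessive(fcs_errors: int, alignment_errors: int,
                         total_packets: int, limit_pct: float = 1.0) -> bool:
    """True when the rate is above the 1 percent guideline from the text."""
    return crc_error_rate_pct(fcs_errors, alignment_errors, total_packets) > limit_pct


print(crc_error_rate_pct(15, 5, 1000))
print(crc_errors_excessive(15, 5, 1000))
```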

Excessive Collisions

Excessive Collisions indicate that 16 consecutive collisions have occurred, usually a sign that the network is becoming congested. For each excessive collision count (that is, after 16 consecutive collisions), a packet is dropped. If you know the normal rate of excessive collisions, then you can determine when the rate of packet loss is affecting your network's performance. See Knowing Your Network's Configuration for more information.

FCS Errors

Frame Check Sequence (FCS) Errors, a type of CRC, indicate that frames received by an interface are an integral number of octets long but do not pass the FCS check. The FCS is a mathematical way to ensure that all the frame's bits are correct without having the system examine each bit and compare it to the original. Packets with Alignment Errors also generate FCS Errors.

Both Alignment Errors and FCS Errors can be caused by equipment powering up or down or by interference (noise) on unshielded twisted-pair (10BASE-T) segments. In a network that complies with the Ethernet standard, FCS or Alignment Errors indicate bit errors during a transmission or reception. A very low rate is acceptable. Although Ethernet allows a 1 in 10^8 bit error rate, typical Ethernet performance is 1 in 10^12 or better.
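To make the FCS check concrete, here is a minimal Python sketch. It relies on the fact that the Ethernet FCS is a CRC-32 using the same polynomial as the one in Python's zlib module; treating the trailing four bytes of a captured frame as that value in little-endian order is an assumption about how the capture presents the frame. The helper names are invented for the example.

```python
import zlib


def append_fcs(frame_without_fcs: bytes) -> bytes:
    """Return the frame with a 4-byte CRC-32 appended, as a sender would."""
    fcs = zlib.crc32(frame_without_fcs) & 0xFFFFFFFF
    return frame_without_fcs + fcs.to_bytes(4, "little")


def fcs_ok(frame: bytes) -> bool:
    """Recompute the CRC-32 over the body and compare with the trailing FCS."""
    if len(frame) < 4:
        return False
    body, fcs = frame[:-4], frame[-4:]
    return (zlib.crc32(body) & 0xFFFFFFFF) == int.from_bytes(fcs, "little")


good = append_fcs(bytes(60))
# Flip a single bit to imitate a noise hit on the cable.
bad = bytes([good[0] ^ 0x01]) + good[1:]
print(fcs_ok(good), fcs_ok(bad))
```

The receiver never compares the frame with the original; it only recomputes the checksum, which is the point the paragraph above makes.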

Late Collisions

Late Collisions indicate that two devices have transmitted at the same time, but cabling errors (most commonly, excessive network segment length or repeaters between devices) prevent either transmitting device from detecting a collision. Neither device detects a collision because the time to propagate the signal from one end of the network to the other is longer than the time to put the entire packet on the network. As a result, neither of the devices that cause the late collision senses the other's transmission until the entire packet is on the network.

Although late collisions occur for small packets, the transmitter cannot detect them. As a result, a network suffering measurable Late Collisions for large packets is losing small packets as well.
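The timing argument can be illustrated with a little arithmetic. The sketch below assumes the standard minimum frame of 64 octets (512 bits) and a default bit rate of 10 Mbps; the one-way delay values are invented inputs, not measurements from the text. If the round trip across the segment takes longer than it takes to send a minimum-sized frame, a collision can arrive after the sender has finished.

```python
MIN_FRAME_BITS = 64 * 8  # 64 octets, the shortest legal frame


def round_trip_bit_times(one_way_delay_us: float,
                         bit_rate_mbps: float = 10.0) -> float:
    """Convert a one-way propagation delay into round-trip bit times."""
    # At N Mbps, N bits are transmitted every microsecond.
    return 2 * one_way_delay_us * bit_rate_mbps


def late_collisions_possible(one_way_delay_us: float,
                             bit_rate_mbps: float = 10.0) -> bool:
    """True if a minimum-sized frame can be fully sent before a collision returns."""
    return round_trip_bit_times(one_way_delay_us, bit_rate_mbps) > MIN_FRAME_BITS


print(late_collisions_possible(20.0))  # hypothetical segment within limits
print(late_collisions_possible(30.0))  # hypothetical over-long segment
```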

Nonstandard Ethernet Problems

Table 21 lists the symptoms that typically occur if a system violates the Ethernet standard.

Table 21 Symptoms of Common Ethernet Network Problems

Symptoms: FCS Errors and Alignment Errors increase significantly.
Problem: Network cabling is too long.
Notes: If you use a promiscuous network monitor, the number of Late Collisions reported by stations should correlate with the FCS and Alignment Errors reported by the monitor.

Symptoms: FCS and Alignment Errors increase proportionally with interference (sometimes referred to as noise hits).
Problem: Network segment is noisy.
Notes: Typically observed on a 10BASE-T network segment in a noisy environment. If you use multiple promiscuous monitors, the FCS and Alignment Errors among the monitors will not correlate. If the monitor can track runts, also called Too Short Errors, the number of runt packets should be significantly higher than normal.

Symptoms: FCS and Alignment Errors are much higher than normal.
Problem: Networks do not conform to the access scheme of Carrier Sense Multiple Access with Collision Detect (CSMA/CD).
Notes: Occurs when some implementations of Ethernet in the segment are not entirely compatible with IEEE 802.3 repeaters. Collision fragments linger on the network long enough to collide with retry packets at the minimum interpacket gap (IPG). The IPG is smaller on one side of the repeated network, causing a lost packet. Ethernet controllers cannot receive packets that are separated by 4.7 µs or less. Some controllers cannot sustain receptions of packets separated by as much as 9.6 µs. If runt packets are received one after another and are followed by a collision fragment, Ethernet controllers that cannot sustain reception will lose packets.

Receive Discards

Receive Discards indicate that received packets could not be delivered to a higher-layer protocol because of congestion or packet errors.

Too Long Errors

A Too Long Error indicates that a packet is longer than 1518 octets (including FCS octets) but otherwise well formed. Too Long Errors are often caused by a bad transceiver, a malfunction of the jabber protection mechanism on a transceiver, or excessive noise on the cable.

Too Short Errors

A Too Short Error, also called a runt, indicates that a packet is fewer than 64 octets long (including FCS octets) but otherwise well formed.
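The error definitions in this reference can be collected into one small classifier. The sketch below is an illustration only: the function name and return strings are made up, "otherwise well formed" is read as "the FCS is valid", and frames with a bad FCS are classified without regard to their length, a case the text does not describe.

```python
MIN_OCTETS = 64    # shorter well-formed frames are Too Short Errors (runts)
MAX_OCTETS = 1518  # longer well-formed frames are Too Long Errors


def classify_frame(bit_count: int, fcs_valid: bool) -> str:
    """Name the error category for a received frame, or "OK"."""
    whole_octets = bit_count % 8 == 0
    if not fcs_valid:
        # Bad FCS with whole octets is an FCS Error; with leftover bits
        # it is an Alignment Error.
        return "FCS Error" if whole_octets else "Alignment Error"
    octets = bit_count // 8
    if octets > MAX_OCTETS:
        return "Too Long Error"
    if octets < MIN_OCTETS:
        return "Too Short Error"
    return "OK"


print(classify_frame(1519 * 8, True))
print(classify_frame(100 * 8 + 3, False))
```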

Transmit Discards

Transmit Discards indicate that packets were not transmitted because of network congestion.

FDDI Ring Errors

Use these sections to identify and correct FDDI ring errors:

• FDDI Ring Errors Overview
• Identifying Ring Errors

Fiber Distributed Data Interface (FDDI) often corrects its own problems. However, because FDDI cannot correct all errors (especially those related to hardware problems), you should monitor FDDI errors.

Understanding the Problem

FDDI ring errors that you should monitor include:

• Elasticity Buffer Error Condition
• Frame Error Condition
• Frames Not Copied Condition
• Link Error Condition

Identifying the Problem

First, determine the type of FDDI ring errors and where they are occurring. As you do when you identify other FDDI problems, identify the upstream and downstream neighbors of the devices that you are monitoring.

Several types of network errors can cause FDDI performance problems. For example, problems with cables or physical connections may result in a link or frame error. Elasticity buffer (EB) errors can also lead to link and frame errors.

FDDI deals with port-related errors as follows:

• The variable PORTlerAlarm is the link error rate (LER) value at which a link connection generates an alarm. When the LER is greater than the alarm setting, Station Management (SMT) sends a Status Report Frame (SRF) to notify you that there is a problem with a port.

The PORTlerAlarm threshold is set lower than the PORTlerCutoff threshold so that you are notified of a problem before the port is actually removed from the ring.

• When link errors reach the threshold defined by the variable PORTlerCutoff, SMT breaks the connection, disabling the PHY that detected the problem. A Link Error Condition is also generated.
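The two-threshold behavior can be sketched as follows. This is a simplified model, not SMT itself: the function name and return strings are hypothetical, the link error rate is treated as a plain errors-per-bit number, and the threshold values in the usage lines are illustrative rather than taken from the text.

```python
def port_action(ler: float, ler_alarm: float, ler_cutoff: float) -> str:
    """Return what the text says happens at a given link error rate."""
    if not ler_alarm < ler_cutoff:
        # The alarm must fire before the cutoff, as the text requires.
        raise ValueError("alarm threshold must be lower than cutoff threshold")
    if ler >= ler_cutoff:
        return "connection broken, Link Error Condition"
    if ler > ler_alarm:
        return "alarm, Status Report Frame sent"
    return "ok"


# Illustrative thresholds: alarm at 1e-8, cutoff at 1e-7 errors per bit.
for rate in (1e-9, 5e-8, 1e-7):
    print(rate, port_action(rate, 1e-8, 1e-7))
```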

FDDI deals with MAC-related errors as follows

• When MAC frame errors reach a certain threshold, a Frame Error Condition is generated. Because the actual error can be further upstream than the immediate connection, the connection remains intact.

• For a large network, the worst-case MACFrameErrorRatio is less than 0.1 percent. However, during network configuration, frame error ratios can reach 50 percent for short periods. When you detect a sustained frame error ratio of more than 0.1 percent, a problem exists between the station that is reporting the condition and the nearest upstream MAC.
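Because short bursts during configuration are expected, a check on the frame error ratio has to distinguish a spike from a sustained problem. The sketch below is one way to do that, taking the threshold as 0.1 percent; the function name and the choice of three consecutive samples as the meaning of "sustained" are assumptions for the example.

```python
def sustained_frame_errors(ratios_pct, limit_pct=0.1, min_samples=3):
    """True if the ratio stays above the limit for min_samples samples in a row."""
    run = 0
    for ratio in ratios_pct:
        run = run + 1 if ratio > limit_pct else 0
        if run >= min_samples:
            return True
    return False


# A brief 50 percent burst during reconfiguration is not flagged...
print(sustained_frame_errors([50.0, 0.05, 0.02, 0.03]))
# ...but a ratio that stays above the limit is.
print(sustained_frame_errors([0.2, 0.3, 0.25, 0.4]))
```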

Solving the Problem

To solve problems related to FDDI errors, fix the hardware, cabling, or congestion problem.

Identifying Ring Errors

Use Status Watch to monitor your FDDI devices for Warning or Critical alerts.

Status Watch

Use Status Watch to identify FDDI ring errors.

Follow these steps:

1. Monitor the FDDI Status tool for the currently selected device.

2. Determine whether Status Watch is reporting Elasticity Buffer Errors or a high percentage of Frame Errors, Frames Not Copied, or Link Error Rates for the currently selected device.

FDDI Ring Errors Reference

This section provides additional conceptual and problem analysis detail

Elasticity Buffer Error Condition

The Elasticity Buffer Error condition occurs when a port's elasticity buffer overflows or underflows. This condition usually indicates that a port's hardware is not operating within the tolerances that the FDDI standard specifies. Look for the problem in the hardware of either the port that is reporting the condition or the immediately adjacent port.

Frame Error Condition

The Frame Error condition occurs when the percentage of frames that contain errors exceeds a preset threshold. When a device is an uplink to FDDI (that is, the device is transmitting onto FDDI), this type of condition indicates that the ring is saturated. The ring is out of buffer space, and packets are being dropped from the device's backbone port.

The problem indicated by the frame errors is usually located between the MAC that reports the condition and its upstream neighbor. Because many physical connections can lie along this path, the MACFrameErrorRatio variable identifies only the two MACs between which the problem is occurring.

Frames Not Copied Condition

The Frames Not Copied condition occurs when the percentage of frames that are dropped because of insufficient buffer space exceeds a preset threshold. This condition indicates that the station is congested and is unable to process frames as quickly as they arrive. To help eliminate congestion:

• Add more capacity to the station
• Reconfigure your network so that end stations that communicate heavily with one another are on the same bridge or switch
• Filter out certain traffic

Link Error Condition

The Link Error condition occurs when a port detects link errors at a rate that exceeds a preset threshold. When the Link Error threshold is exceeded, the station removes itself from the ring and tries to reinsert itself on the ring. This action creates a MAC Neighbor Change Event (which also occurs if a ring wraps).

Link errors may indicate an FDDI PHY hardware problem (such as a faulty transmitter) or a faulty cable or connector. Look for the problem in the portion of the network between the port that is reporting the condition and the first upstream transmitter.

MAC Neighbor Change Event

The MAC Neighbor Change event occurs when a MAC's upstream or downstream neighbor changes.

This event indicates either:

• A network reconfiguration
• Another station that is leaving or joining the ring

Network File Server Timeouts

Use these sections to identify and correct timeouts on network file servers:

• Network File Server Timeout Overview
• Looking for Obvious Errors
• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

See Network File Server Timeouts Reference for additional conceptual and problem analysis detail.

A network file server can time out if your network gets congested or if your server is having problems. Users might have problems downloading data from or to the server, or copying files from or to the server. To help you understand the troubleshooting process for this type of problem, an EXAMPLE throughout this section follows the symptoms, analysis, and resolution of a typical file server timeout problem.

Understanding the Problem

When users log in, their stations make network file server calls either to determine quotas (if this feature has been enabled) or to mount user home directories. Network file server timeout messages, even when spread across multiple nodes, indicate a problem either with the network or with a server.

EXAMPLE: UNIX users notice that it takes a long time - over 30 seconds in some cases - to log in to any machine. Some machines report network file server timeout messages, but the messages have no obvious pattern and are infrequent. You begin to get a sense of the problem.

Identifying the Problem

First, rule out the obvious causes. Ask these questions:

• Can you access the network file server with Telnet?
• Have any alarms been triggered?
• Are there any new errors?

The process of identifying the problem is developed in Looking for Obvious Errors.

Solving the Problem

To determine the cause, reproduce the fault while you monitor the network. After you know the cause, you can fix the problem.

The solutions to the network file server timeout are identified in these sections:

• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

Looking for Obvious Errors

To look for obvious errors, use these applications:

• Ping and Telnet - To check connectivity to the network file server nodes
• LANsentry Manager Alarms View - To search for triggered alarms
• LANsentry Manager Statistics View - To look for errors
• LANsentry Manager History View - To identify trends

Ping and Telnet

Determine whether you can contact network file server nodes using Ping and Telnet. If the response is extremely slow, then a problem may exist with the connections to the nodes. No delay indicates that the connections are normal, implying that the delay is occurring elsewhere. In this case, use LANsentry® Manager tools to determine whether packets are being lost or ignored.
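If you prefer to script this first check, the following Python sketch times a TCP connection to the Telnet port, which approximates what you observe when you start a Telnet session by hand. It is not a replacement for Ping (which uses ICMP). The host name, the one-second "slow" threshold, and the function names are all assumptions for the example.

```python
import socket
import time


def connect_time(host, port=23, timeout=5.0, connect=socket.create_connection):
    """Return the seconds needed to open a TCP connection, or None on failure."""
    start = time.monotonic()
    try:
        conn = connect((host, port), timeout)
    except OSError:
        return None  # refused, unreachable, or timed out
    conn.close()
    return time.monotonic() - start


def classify(seconds, slow_after=1.0):
    """Turn a measured connect time into the three cases in the text."""
    if seconds is None:
        return "unreachable"
    return "slow" if seconds > slow_after else "normal"


if __name__ == "__main__":
    # "fileserver.example" is a placeholder host name.
    print(classify(connect_time("fileserver.example")))
```

A "normal" result here matches the case described above: the connection itself is fine, so the delay must be occurring elsewhere.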

LANsentry Manager Alarms View

Using the LANsentry Manager Alarms View, you can determine if any configured alarms have been triggered.

Search the Alarms View to see if any MAC events have been logged.

EXAMPLE: MAC events have not been logged for the network to which the UNIX users are attached.

Even though no alarms have occurred, errors may exist. For example, a low rate of background errors may exist just below the alarm threshold. Because RMON alarms are based on maximum and minimum values, they may miss constant, periodic, or low rates of errors.

LANsentry Manager Statistics View

Using the LANsentry Manager Statistics View, you can display a multisegment graph of utilization and error statistics.

Follow these steps:

1. Set up a graph that shows utilization and errors on all your major segments.

2. Determine whether any segments are particularly busy or error prone.

EXAMPLE: You notice that one segment of the UNIX network, HUB3, is reporting Too Long Errors and FCS Errors roughly every second sample. While the amount is not higher than normal, it is currently higher than on any other segment.

LANsentry Manager History View

Using the LANsentry Manager History View, you can display a rolling history table to determine if the errors that you are seeing are new. For example, if you have a history table that runs for 30-minute samples over two days, you can compare the most recent sample to a previous sample, looking for new errors. If your probe has the resources, use a much finer resolution sample stored for a shorter time (every 30 seconds for 2 hours) to more easily spot recent errors.
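Comparing two history samples amounts to asking which error counters are non-zero now but were zero before. The sketch below shows that comparison on plain dictionaries; the function name and the sample data are invented, and real history tables would first have to be exported from the probe.

```python
def new_errors(previous: dict, latest: dict) -> dict:
    """Return counters that are non-zero in latest but were zero or absent before."""
    return {
        name: count
        for name, count in latest.items()
        if count > 0 and previous.get(name, 0) == 0
    }


earlier_sample = {"FCS Errors": 0, "Too Long Errors": 2}
recent_sample = {"FCS Errors": 4, "Too Long Errors": 3}
print(new_errors(earlier_sample, recent_sample))
```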

EXAMPLE: You see that the history table shows that no error rates remained constant throughout the day. However, the errors that did occur were on the device HUB3 and were Too Long Errors and FCS Errors.

Reproducing the Fault While Monitoring the Network

Although the RMON View in LANsentry Manager can show error rates and help you to identify the location of the problem, it may not provide enough data to solve the problem. To determine the cause of the problem, reproduce it while you monitor the network by using these applications:

• LANsentry Manager Top-N Graph - To locate a quiet node to use for reproducing the fault
• LANsentry Manager Packet Capture - To capture packets from the hub to which the quiet node is attached
• LANsentry Manager Packet Decode - To analyze the packets to assess network file server traffic and delays
• Address Tracker - To find the location of the problem nodes

EXAMPLE: Using LANsentry Manager, you find a hub on the network with a higher than normal error rate. However, the error rate does not seem high enough to cause login delays of 60 seconds or more.

LANsentry Manager Top-N Graph

Using the Top-N graph in the LANsentry Manager main window, locate a quiet node that has been showing the same problem. Choose a quiet node so that you do not receive excessive traffic when you try to isolate the problem.

EXAMPLE: You see that the node Monolith, which has the same Network File System (NFS) mounts as the other nodes on the network, is quiet. You decide to use this node for reproducing the fault. See Network File System (NFS) Protocol for more information about NFS.

LANsentry Manager Packet Capture

Using the LANsentry Manager Packet Capture application, capture packets from the network using predefined patterns and start-and-stop conditions.

Follow these steps:

1. Set up a capture buffer on a probe that is connected to the same hub as the quiet node. Until you know more about the problem, set a very general filter.

EXAMPLE: You select a MAC-layer filter and set a conversation filter to capture all packets to and from Monolith.

2. Telnet into and log out from the quiet node. Then reset the capture buffer. Repeat this procedure until you see the problem reflected in your captured data. To keep the buffer information clean, reset the buffer each time that you repeat the procedure.

3. When you see the delay, note the rough value of the packet count on the LANsentry Manager packet buffer.

By noting the packet count at which you think the delay has occurred, you can narrow the problem to within about 20 packets in the buffer. If you have used an extremely quiet node, you may even identify the exact packet.

LANsentry Manager Packet Decode

The LANsentry Manager Packet Decode application decodes all major protocols and displays the packet contents at three levels of detail: summary information, header information, and actual packet content.

Follow these steps:

1. Open the buffer in the Packet Decode application and locate the number of the packet at which the delay occurred.

2. Select the packet and launch a MAC-layer conversation filter. In the filter display, look for a gap in the conversation (that is, a place where the node sent a request and then resent it after an interval that matches the delay you experienced when recreating the problem).

3. Repeat the test to determine if the result concentrates on one node or if it appears on other nodes.

EXAMPLE: On the quiet node that you selected, the delay is obvious. You see an NFS request going out to a node and a repeat of the request 30 seconds later. During that time, the node did not respond. You now know that the delay occurred because nodes were not seeing responses for NFS requests. When you repeat the test on other nodes, you find that the delay is happening with more than one destination node.
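The "gap in the conversation" search can also be done programmatically once the decoded packets have been exported. The following sketch works on a plain list of tuples; the tuple layout, the function name, and the 10-second minimum gap are assumptions for illustration, and the identifier stands in for whatever request identifier your decoder shows.

```python
def find_retry_gaps(packets, min_gap_s=10.0):
    """Find requests that were resent with no reply in between.

    packets: iterable of (timestamp_seconds, kind, request_id) tuples in
    capture order, where kind is "request" or "reply".
    Returns a list of (request_id, first_sent, resent_at) tuples.
    """
    pending = {}  # request_id -> time of the still-unanswered request
    gaps = []
    for timestamp, kind, request_id in packets:
        if kind == "reply":
            pending.pop(request_id, None)
        elif kind == "request":
            first_sent = pending.get(request_id)
            if first_sent is not None and timestamp - first_sent >= min_gap_s:
                gaps.append((request_id, first_sent, timestamp))
            pending[request_id] = timestamp
    return gaps


capture = [
    (0.0, "request", 1), (0.1, "reply", 1),
    (1.0, "request", 2), (31.0, "request", 2), (31.2, "reply", 2),
]
print(find_retry_gaps(capture))
```

In the sample data, request 2 is resent 30 seconds after it was first sent, which mirrors the delay in the EXAMPLE above.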

Address Tracker

Use Address Tracker, which polls managed devices, to determine the hubs to which the problem nodes are attached. If the problem end stations are located on unmanaged devices, then you can at least narrow the problem to those unmanaged devices.

EXAMPLE: Although your network does not have managed hubs that Transcend® NCS management software can poll, it does have managed switches. When it polls the switches, Address Tracker displays the switch ports on which addresses were last seen. This information indicates the hub (but not the hub port) on which the device is located.

LANsentry Manager Packet Decode

After you know the location of the hub that has the problem node, monitor the problem from the hub using LANsentry Manager Packet Decode.

Follow these steps:

1. To capture packets from one of the nodes on the hub, set up another capture buffer and repeat the exercise that is described in LANsentry Manager Packet Capture. Because a delay may occur on a different node, use two capture buffers without stopping the first one.

Note the rough packet count where the delay appears.

2. Display a conversation filter of the packet where the delay appears and look for the gap in the conversation.

EXAMPLE: You hope that the nodes are on the same hub. You find that all the nodes are on HUB3. This result indicates that FCS Errors may be causing the timeouts. However, because the errors occur at a low rate, you decide to verify this diagnosis. You monitor the problem from the hub, logging in and out many times, and the delay eventually occurs. This time, the delay shows that the node's reply had an FCS Error, even though the node received the request. The switch would not have transmitted this packet, causing a timeout on the NFS protocol. The retry time is presumably 30 seconds. During this test, you see the problem occurring on another node.

Correcting the Fault

63 57

Without a managed hub, you may find it very difficult to discover network file server timeout errors. To find the problematic node, you must either systematically isolate nodes by monitoring each node for a prolonged period or temporarily insert a managed hub.

EXAMPLE: You notice that the captured error packet failed FCS because it was corrupted by a regular pattern during transmission. A possible reason for this occurrence is a Jabbering node. This explanation makes sense because FCS/Jabber frames increased linearly when you were monitoring the live network.
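To illustrate how a frame check sequence (FCS) exposes the kind of corruption described in the example above, here is a minimal Python sketch. It is not how LANsentry Manager works internally. The function names `append_fcs` and `fcs_ok` are hypothetical, and the byte layout is simplified: Ethernet uses a CRC-32, which `zlib.crc32` also computes, but real hardware handles bit ordering and framing details that are left out here.

```python
import zlib


def append_fcs(payload: bytes) -> bytes:
    """Return the payload followed by a 4-byte CRC-32 trailer (simplified FCS)."""
    fcs = zlib.crc32(payload) & 0xFFFFFFFF
    return payload + fcs.to_bytes(4, "little")


def fcs_ok(frame: bytes) -> bool:
    """Recompute the CRC-32 over the payload and compare it with the trailer."""
    if len(frame) < 4:
        return False
    payload, trailer = frame[:-4], frame[-4:]
    return (zlib.crc32(payload) & 0xFFFFFFFF) == int.from_bytes(trailer, "little")


frame = append_fcs(b"NFS reply data")   # made-up payload for illustration
corrupted = bytearray(frame)
corrupted[3] ^= 0xFF                    # overwrite one byte, as a jabbering node might

print(fcs_ok(frame))             # True
print(fcs_ok(bytes(corrupted)))  # False
```

A receiver that sees the second result discards the frame, which is why the sender in the example eventually times out and retries.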

Network File Server Timeouts Reference

This section explains terms that are relevant to network file server timeouts and provides additional conceptual and problem analysis detail.

Jabbering

A condition in which a node transmits illegal-length packets and is possibly not operating within carrier specifications. In effect, another node has written bad data over a valid packet. This bad data is often interpreted as a repeated sequence of data.

Network File System (NFS) Protocol

A distributed file system protocol developed by Sun Microsystems that allows a computer system to access files over a network as if they were on its local disks. This protocol has been incorporated into products by more than 200 companies. It is now a de facto Internet standard.

NFS is one protocol in the NFS suite of protocols, which includes NFS, RPC, XDR (External Data Representation), and others. These protocols are part of a larger architecture that Sun Microsystems refers to as Open Network Computing (ONC). ONC is a distributed applications architecture designed by Sun and currently controlled by a consortium led by Sun.

Troubleshooting TCP/IP - Detailed Steps

This article shows how to troubleshoot TCP/IP connectivity between computers on a Windows network. If you haven't already done so, disable XP's Internet Connection Firewall on all local area network connections and remove all firewall programs on the network. Improperly configured firewalls are the most common cause of TCP/IP problems.

Open a Command Prompt Window

For many of these steps, you'll be typing at the command prompt. To open a command prompt window in Windows 2000 or XP, click Start | Run, type cmd in the box, and click OK. To open a command prompt window in Windows 95, 98, or Me, click Start | Run, type command in the box, and click OK. Type one command per line and press Enter after each one to execute it. To close the command prompt window, use the exit command.

Determine the TCP/IP Settings

Determine the TCP/IP settings of each computer on the local area network. In XP, open the Network Connections folder, right-click the LAN connection, and click Status | Support | Details. For example, here are the Status and Details views for the LAN connection on an Internet Connection Sharing host.

In Windows 95/98/Me, click Start | Run, type winipcfg in the box, and click OK. Select the LAN adapter from the menu and click More Info. Here's the winipcfg view for an ICS client running Windows Me.

You can also see the TCP/IP settings from the command prompt. This is especially convenient if a computer has more than one network adapter. Use the ipconfig /all command, which is available in all versions except Windows 95. The output from this command can be long, so it's best to write it to a file. Specify the file name in the command this way:

ipconfig /all >ipconfig.txt

Description of TCP/IP Settings

Here are the TCP/IP settings that are used in network troubleshooting:

• IP Address - Unique address assigned to a network adapter. A computer with multiple network adapters has an IP address for each one, and each one must be in a different subnet.
• Subnet Mask - Used in conjunction with the IP address to determine which subnet an adapter belongs to. At the simplest level, communication is only possible between two network adapters when they're in the same subnet.
• Default Gateway - IP address of a computer or router on one of this computer's local area networks that knows how to communicate with subnets not present on this computer. For an Internet connection, the default gateway is a router belonging to your Internet service provider, and all access to sites on the Internet goes through it. For an ICS client, the default gateway is the ICS host. If you use a hardware router, it serves as the default gateway.
• DHCP Server - If an adapter is configured to obtain an IP address automatically, this is the address of the server that provides it. It could be your ISP, an ICS host, or a hardware router.
• DNS Servers - IP address of one or more Domain Name Server computers. DNS servers translate Internet names to their IP addresses (like 63.146.109.227).
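The relationship between IP address and subnet mask described above can be checked with a few lines of Python. This is only a sketch using the standard `ipaddress` module; the function name `same_subnet` and the sample addresses are illustrative assumptions, not values taken from a real network.

```python
import ipaddress


def same_subnet(ip_a: str, ip_b: str, mask: str) -> bool:
    """Return True if both addresses fall in the same subnet under the given mask."""
    # strict=False lets us pass a host address and have the host bits masked off.
    net_a = ipaddress.ip_network(f"{ip_a}/{mask}", strict=False)
    net_b = ipaddress.ip_network(f"{ip_b}/{mask}", strict=False)
    return net_a == net_b


print(same_subnet("192.168.1.101", "192.168.1.123", "255.255.255.0"))  # True
print(same_subnet("192.168.1.101", "192.168.2.5", "255.255.255.0"))    # False
```

When the result is False, the two adapters can only communicate through a default gateway that knows both subnets.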

Subnets

See our article on subnets for a brief description of how they work. For more details, see this Microsoft Knowledge Base article.

If two computers are supposed to be on the same subnet but aren't, something is wrong with the network hardware or software configuration. This is most likely to happen when one of them receives an IP address of 169.254.x.x, which indicates that:

• It's configured to obtain an IP address automatically.
• It couldn't find a DHCP server on the network to make the assignment.
• Windows assigned it an Automatic Private IP Address.
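As a small illustration of the 169.254.x.x check, the sketch below tests whether an address lies in the 169.254.0.0/16 range that Windows uses for Automatic Private IP Addressing. The function name `is_apipa` is hypothetical and the sample addresses are made up.

```python
import ipaddress

# The automatic private (link-local) IPv4 range.
APIPA_RANGE = ipaddress.ip_network("169.254.0.0/16")


def is_apipa(ip: str) -> bool:
    """Return True if the address was most likely self-assigned after DHCP failed."""
    return ipaddress.ip_address(ip) in APIPA_RANGE


print(is_apipa("169.254.17.4"))   # True
print(is_apipa("192.168.1.101"))  # False
```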

See our article on Specific Networking Problems and Their Solutions for more information.

Pinging

The ping command is the basic tool for testing TCP/IP connectivity. It sends a special packet (called ICMP Echo) to a particular IP address and looks for a reply. If everything is working right, the reply comes back. If not, the ping times out in a few seconds. By default, the ping command repeats the process four times. Here's an example of an ICS client computer pinging a Windows XP Home Edition ICS host, using the host's IP address and its computer name.

When ping fails, you'll see one of these error messages:

• Request timed out - The IP address is valid, but there's no reply from it. If the IP address is on a local area network, the most likely cause is a firewall program blocking the ping.
• Unknown host <name> or Ping request could not find host <name> - The computer name doesn't exist on the local area network. Make sure that NetBIOS over TCP/IP is enabled.
• Destination host unreachable - The IP address isn't on a local area network, and the default gateway can't access it. Either there's no default gateway, its address is wrong, or it isn't functioning.
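The error messages above can be mapped to their likely causes mechanically. The following Python sketch does this by simple substring matching on text you have already captured from ping. The function name `classify_ping_output` is hypothetical, and the exact wording of ping output varies between Windows versions, so treat the match strings as assumptions based on the messages listed above.

```python
def classify_ping_output(output: str) -> str:
    """Map captured ping output to the likely cause described in the list above."""
    text = output.lower()
    if "request timed out" in text:
        return "No reply: on a LAN, suspect a firewall program blocking the ping"
    if "unknown host" in text or "could not find host" in text:
        return "Name not found: make sure NetBIOS over TCP/IP is enabled"
    # Checked before "reply from" because the unreachable message can itself
    # be reported as a reply from a router or from the local machine.
    if "destination host unreachable" in text:
        return "Unreachable: default gateway is missing, wrong, or not functioning"
    if "reply from" in text:
        return "Reply received"
    return "Unrecognized output"


print(classify_ping_output("Request timed out."))
print(classify_ping_output("Ping request could not find host win98."))
```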

Pinging the Local Area Network

Here is a series of ping commands to use in finding where a problem occurs on a local area network. Run them in the order shown, and don't go on to the next command until all of the previous commands work properly. In this example:

• The computer being tested is named Winxp, with IP address 192.168.1.101.
• There's another computer on the network named Win98, with IP address 192.168.1.123.

Substitute the appropriate IP addresses and computer names for your network.

| Command | Target | What Ping Failure Indicates |
|---|---|---|
| ping 127.0.0.1 | Loopback address | Corrupted TCP/IP installation |
| ping localhost | Loopback name | Corrupted TCP/IP installation |
| ping 192.168.1.101 | This computer's IP address | Corrupted TCP/IP installation |
| ping winxp | This computer's name | Corrupted TCP/IP installation |
| ping 192.168.1.123 | Another computer's IP address | Bad hardware or NIC driver |
| ping win98 | Another computer's name | NetBIOS name resolution failure |

To fix a corrupted TCP/IP installation on Windows XP, follow the steps in this Microsoft Knowledge Base article. For Windows 95/98/Me, un-install the TCP/IP protocol in Control Panel | Network, reboot, and re-install it. If that doesn't fix it, use this procedure on Windows 95 or 98.
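The rule "run the commands in order and stop at the first failure" can be sketched as a small lookup in Python. This is an illustration only: the name `first_failure` is hypothetical, the targets are the example names and addresses for the Winxp and Win98 computers (substitute your own), and the ping results are supplied by you rather than measured by the code.

```python
# Ordered checks: (ping target, what a failure at this step indicates).
LAN_CHECKS = [
    ("127.0.0.1", "Corrupted TCP/IP installation"),
    ("localhost", "Corrupted TCP/IP installation"),
    ("192.168.1.101", "Corrupted TCP/IP installation"),
    ("winxp", "Corrupted TCP/IP installation"),
    ("192.168.1.123", "Bad hardware or NIC driver"),
    ("win98", "NetBIOS name resolution failure"),
]


def first_failure(results):
    """Return (target, meaning) for the first failed ping, or None if all passed.

    `results` maps each target to True (ping worked) or False (ping failed).
    A target that is missing from `results` is treated as a failure.
    """
    for target, meaning in LAN_CHECKS:
        if not results.get(target, False):
            return target, meaning
    return None


observed = {"127.0.0.1": True, "localhost": True, "192.168.1.101": True,
            "winxp": True, "192.168.1.123": True, "win98": False}
print(first_failure(observed))  # ('win98', 'NetBIOS name resolution failure')
```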

Pinging the Internet

You can also use ping to find a problem with Internet access. Run these commands in the order shown, and don't go on to the next command until all of the previous commands work properly. Use the Default Gateway and DNS Server addresses that you got from the winipcfg or ipconfig /all command.

| Command | Target | What Ping Failure Indicates |
|---|---|---|
| ping w.x.y.z | Default Gateway | Default Gateway down |
| ping w.x.y.z | DNS Server | DNS Server down |
| ping w.x.y.z | Web site IP address | Internet service provider or web site down |
| ping www.something.com | Web site name | DNS Server down or web site down |

REFERENCE

WWW.PRACTICALLYNETWORK.COM

WWW.COMPUTERHOPE.COM

WWW.WIKIPEDIA.ORG

THANK YOU

  • Basic Network Troubleshooting
  • Basic network troubleshooting
    • Issue
    • Cause
    • Solution
      • Verify connections LEDs
      • Adapter resources
      • Adapter functionality
      • Protocol
      • Firewall
      • Additional time
      • Additional troubleshooting
        • Network Troubleshooting Overview
          • About Connectivity Problems
          • About Performance Problems
          • Solving Connectivity and Performance Problems
          • Recognizing Symptoms
            • User Comments
            • Network Management Software Alerts
            • Analyzing Symptoms
              • Understanding the Problem
              • Identifying and Testing the Cause of the Problem
                • Sample Problem Analysis
                • Equipment for Testing
                  • Solving the Problem
                    • Your Network Troubleshooting Toolbox
                      • Transcend Central
                      • Status Watch
                        • Web Reporter
                          • Address Tracker
                          • LANsentry Manager
                          • Traffix Manager
                          • Device View
                          • Ping
                            • Strategies for Using Ping
                            • Tips on Interpreting Ping Messages
                              • Telnet
                              • FTP and TFTP
                              • Analyzers
                              • Probes
                              • Cable Testers
                              • Knowing Your Network's Configuration
                                • Site Network Map
                                • Device Configuration Information
                                • Other Important Data About Your Network
                                  • Identifying Your Network's Normal Behavior
                                    • Baselining Your Network
                                    • Identifying Background Noise
                                        • FDDI Connectivity
                                          • Understanding the Problem
                                          • Identifying the Problem
                                          • Solving the Problem
                                          • Status Watch
                                          • Implementing Dual Homing
                                          • Installing an Optical Bypass Unit
                                          • Peer Wrap Condition
                                          • Twisted Ring Condition
                                          • Undesired Connection Attempt Event
                                          • Address Tracker
                                          • Device View
                                          • Broadcast Packets
                                          • Multicast Packets
                                            • Duplicate Addresses
                                              • Understanding the Problem
                                              • Identifying the Problem
                                              • Solving the Problem
                                              • Status Watch
                                              • Address Tracker
                                              • LANsentry Manager
                                              • Duplicate MAC Addresses
                                              • Duplicate IP Addresses
                                                • Ethernet Packet Loss
                                                  • Understanding the Problem
                                                  • Identifying the Problem
                                                  • Solving the Problem
                                                  • Status Watch
                                                  • LANsentry Manager Network Statistics Graph
                                                  • Device View
                                                  • Alignment Errors
                                                  • Collisions
                                                  • CRC Errors
                                                  • Excessive Collisions
                                                  • FCS Errors
                                                  • Late Collisions
                                                  • Nonstandard Ethernet Problems
                                                  • Receive Discards
                                                  • Too Long Errors
                                                  • Too Short Errors
                                                  • Transmit Discards
                                                    • FDDI Ring Errors
                                                      • Understanding the Problem
                                                      • Identifying the Problem
                                                      • Solving the Problem
                                                      • Status Watch
                                                      • Elasticity Buffer Error Condition
                                                      • Frame Error Condition
                                                      • Frames Not Copied Condition
                                                      • Link Error Condition
                                                      • MAC Neighbor Change Event
                                                        • Network File Server Timeouts
                                                          • Understanding the Problem
                                                          • Identifying the Problem
                                                          • Solving the Problem
                                                          • Ping and Telnet
                                                          • LANsentry Manager Alarms View
                                                          • LANsentry Manager Statistics View
                                                          • LANsentry Manager History View
                                                          • LANsentry Manager Top-N Graph
                                                          • LANsentry Manager Packet Capture
                                                          • LANsentry Manager Packet Decode
                                                          • Address Tracker
                                                          • LANsentry Manager Packet Decode
                                                          • Jabbering
                                                          • Network File System (NFS) Protocol
                                                          • Troubleshooting TCP/IP - Detailed Steps
                                                          • Open a Command Prompt Window
                                                          • Determine the TCP/IP Settings
                                                          • Description of TCP/IP Settings
                                                          • Subnets
                                                          • Pinging
                                                          • Pinging the Local Area Network
                                                          • Pinging the Internet
Page 10: 11591337-Basic-Network-Troubleshooting

• The application displays red (Warning) icons.
• Your weekly Top-N utilization report (which indicates the 10 ports with the highest utilization rates) shows that one port is experiencing much higher utilization levels than normal.
• You receive an e-mail message from your network management station that the threshold for broadcast and multicast packets has been exceeded.

These signs usually provide additional information about the problem, allowing you to focus on the right area.
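As a rough illustration of what a Top-N utilization report computes, the Python sketch below ranks ports by utilization and flags ports that are far above their usual level. It is not the Transcend report itself: the function names, the port labels, the sample percentages, and the "twice the baseline" rule are all assumptions chosen for the example.

```python
def top_n_ports(utilization, n=10):
    """Return the n (port, utilization) pairs with the highest utilization."""
    ranked = sorted(utilization.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]


def above_baseline(utilization, baseline, factor=2.0):
    """Return ports whose utilization exceeds `factor` times their baseline value."""
    return [port for port, value in utilization.items()
            if port in baseline and value > factor * baseline[port]]


# Made-up utilization percentages for three ports.
this_week = {"port1": 12.0, "port2": 85.0, "port3": 9.5}
normal = {"port1": 10.0, "port2": 20.0, "port3": 10.0}

print(top_n_ports(this_week, n=2))        # [('port2', 85.0), ('port1', 12.0)]
print(above_baseline(this_week, normal))  # ['port2']
```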

Analyzing Symptoms

When a symptom occurs, ask yourself these types of questions to narrow the location of the problem and to get more data for analysis:

• To what degree is the network not acting normally (for example, does it now take one minute to perform a task that normally takes five seconds)?
• On what subnetwork is the user located?
• Is the user trying to reach a server, end station, or printer on the same subnetwork or on a different subnetwork?
• Are many users complaining that the network is operating slowly or that a specific network application is operating slowly?
• Are many users reporting network logon failures?
• Are the problems intermittent? For example, some files may print with no problems, while other printing attempts generate error messages, make users lose their connections, and cause systems to freeze.

Understanding the Problem

Networks are designed to move data from a transmitting device to a receiving device. When communication becomes problematic, you must determine why data are not traveling as expected and then find a solution. The two most common causes for data not moving reliably from source to destination are:

• The physical connection breaks (that is, a cable is unplugged or broken).
• A network device is not working properly and cannot send or receive some or all data.

Network management software can easily locate and report a physical connection break (layer 1 problem). It is more difficult to determine why a network device is not working as expected, which is often related to a layer 2 or a layer 3 problem.

To determine why a network device is not working properly, look first for:

• Valid service - Is the device configured properly for the type of service it is supposed to provide? For example, has Quality of Service (QoS), which is the definition of the transmission parameters, been established?
• Restricted access - Is an end station supposed to be able to connect with a specific device, or is that connection restricted? For example, is a firewall set up that prevents that device from accessing certain network resources?
• Correct configuration - Is there a misconfiguration of IP address, subnet mask, gateway, or broadcast address? Network problems are commonly caused by misconfiguration of newly connected or configured devices. See Manager-to-Agent Communication for more information.

Identifying and Testing the Cause of the Problem

After you develop a theory about the cause of the problem, test your theory. The test must conclusively prove or disprove your theory.

Two general rules of troubleshooting are:

• If you cannot reproduce a problem, then no problem exists unless it happens again on its own.
• If the problem is intermittent and you cannot replicate it, you can configure your network management software to catch the event in progress.

For example, with LANsentry Manager, you can set alarms and automatic packet capture filters to monitor your network and inform you when the problem occurs again. See Configuring Transcend NCS for more information.

Although network management tools can provide a great deal of information about problems and their general location, you may still need to swap equipment or replace components of your network until you locate the exact trouble spot.

After you test your theory, either fix the problem as described in Solving the Problem or develop another theory.

Sample Problem Analysis

This section illustrates the analysis phase of a typical troubleshooting incident.

On your network, a user cannot access the mail server. You need to establish two areas of information:

• What you know - In this case, the user's workstation cannot communicate with the mail server.
• What you do not know and need to test:
  • Can the workstation communicate with the network at all, or is the problem limited to communication with the server? Test by sending a Ping or by connecting to other devices.
  • Is the workstation the only device that is unable to communicate with the server, or do other workstations have the same problem? Test connectivity at other workstations.
  • If other workstations cannot communicate with the server, can they communicate with other network devices? Again, test the connectivity.

The analysis process follows these steps:

1. Can the workstation communicate with any other device on the subnetwork?

• If no, then go to step 2.
• If yes, determine if only the server is unreachable.
  • If only the server cannot be reached, this suggests a server problem. Confirm by doing step 2.
  • If other devices cannot be reached, this suggests a connectivity problem in the network. Confirm by doing step 3.

2. Can other workstations communicate with the server?

• If no, then most likely it is a server problem. Go to step 3.
• If yes, then the problem is that the workstation is not communicating with the subnetwork. (This situation can be caused by workstation issues or a network issue with that specific station.)

3. Can other workstations communicate with other network devices?

• If no, then the problem is likely a network problem.
• If yes, the problem is likely a server problem.
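The three steps above can be condensed into a small decision function. The Python sketch below is a simplification, not part of the original procedure: the name `locate_problem` is hypothetical, and it uses only the answers to steps 2 and 3, because step 1 mainly decides which of those two checks to run first.

```python
def locate_problem(others_reach_server: bool, others_reach_devices: bool) -> str:
    """Return 'workstation', 'server', or 'network' from the answers to steps 2 and 3.

    others_reach_server  -- step 2: can other workstations reach the server?
    others_reach_devices -- step 3: can other workstations reach other devices?
    """
    if others_reach_server:
        # Step 2 answered yes: the fault is specific to this workstation
        # (the workstation itself or its own connection to the subnetwork).
        return "workstation"
    if others_reach_devices:
        # Step 3 answered yes: the rest of the network works, only the server fails.
        return "server"
    # Nobody can reach the server or anything else.
    return "network"


print(locate_problem(others_reach_server=True, others_reach_devices=True))    # workstation
print(locate_problem(others_reach_server=False, others_reach_devices=True))   # server
print(locate_problem(others_reach_server=False, others_reach_devices=False))  # network
```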

When you determine whether the problem is with the server, subnetwork, or workstation, you can further analyze the problem as follows:

• For a problem with the server - Examine whether the server is running, if it is properly connected to the network, and if it is configured appropriately.
• For a problem with the subnetwork - Examine any device on the path between the users and the server.
• For a problem with the workstation - Examine whether the workstation can access other network resources and if it is configured to communicate with that particular server.

Equipment for Testing

To help identify and test the cause of problems, have available:

• A laptop computer that is loaded with a terminal emulator, TCP/IP stack, TFTP server, CD-ROM drive (to read the online documentation), and some key network management applications, such as LANsentry® Manager. With the laptop computer, you can plug into any subnetwork to gather and analyze data about the segment.
• A spare managed hub to swap for any hub that does not have management. Swapping in a managed hub allows you to quickly spot which port is generating the errors.
• A single port probe to insert in the network if you are having a problem where you do not have management capability.
• Console cables for each type of connector, labeled and stored in a secure place.

Solving the Problem

Many device or network problems are straightforward to resolve, but others yield misleading symptoms. If one solution does not work, continue with another.

A solution often involves:

• Upgrading software or hardware (for example, upgrading to a new version of agent software or installing Gigabit Ethernet devices)
• Balancing your network load by analyzing:
  • What users communicate with which servers
  • What the user traffic levels are in different segments
  Based on these findings, you can decide how to redistribute network traffic.
• Adding segments to your LAN (for example, adding a new switch where utilization is continually high)
• Replacing faulty equipment (for example, replacing a module that has port problems or replacing a network card that has a faulty jabber protection mechanism)

To help solve problems, have available:

• Spare hardware equipment (such as modules and power supplies), especially for your critical devices
• A recent backup of your device configurations to reload if flash memory gets corrupted (which can sometimes happen due to a power outage)

Your Network Troubleshooting Toolbox

A robust network troubleshooting toolbox consists of items (such as network management applications, hardware devices, and other software) to recognize, diagnose, and solve networking problems. It contains:

• Transcend Applications
• Network Management Platforms
• 3Com SmartAgent Embedded Software
• Other Commonly Used Tools

Transcend management software is optimized for managing 3Com devices and their attached networks. However, some applications, such as LANsentry Manager, can manage any vendor's networking equipment that complies with the Remote Monitoring (RMON) Management Information Base (MIB).

This section describes these Transcend applications, which you can use to troubleshoot your network:

• Transcend Central
• Status Watch
• Address Tracker
• LANsentry Manager
• Traffix Manager
• Device View

Transcend Central

Start with Transcend Central, which is an asset management and device grouping application, to understand what your network consists of and to control the Transcend NCS network management troubleshooting tools. Transcend Central is available as both a native Windows application and a Java application that you can access using a Web browser.

Using Transcend Central for troubleshooting, you can:

• Display an inventory of device, module, and port information.
• Group devices to make your troubleshooting tasks easier. By managing a collection of devices, you can simultaneously perform the same tasks on each device in a group and locate physical or logical problems on your network.
• Launch Transcend NCS applications, including some of your primary Transcend NCS troubleshooting tools:
  • Status Watch, which includes Web Reporter (from the Java version)
  • Address Tracker
  • LANsentry Manager
  • Traffix Manager
  • Device View

Status Watch

The Status Watch applications manage 3Com devices and their attached networks. Status Watch applications primarily poll for MIB-II data. This is a performance monitoring application that allows you to monitor the operational status of your network devices and quickly identify any problems that require your attention. It works in conjunction with Web Reporter.

Web Reporter

Web Reporter is a data-reporting application that runs in a World Wide Web (WWW) browser. It generates reports from data that Status Watch collects, allowing you to compare network statistics against a baseline.

Address Tracker

Address Tracker is an address collection and discovery application that:

• Polls managed devices for all MAC addresses
• Polls managed devices and routers for IP addresses to perform MAC-to-IP address translation
• Uses Device View to disable troublesome ports

LANsentry Manager

LANsentry Manager is a set of integrated applications that displays and explores the real-time and historical data that RMON-compliant devices (probes) on the network capture. LANsentry Manager uses SNMP polling to gather RMON and RMON2 data from the probes.

Use LANsentry Manager to:

• Monitor current performance of network segments
• See trends over time
• Spot signs of current problems
• Configure alarms to monitor for specific events
• Capture packets and display their contents

LANsentry Manager works with any device (from 3Com or other vendors) that supports the RMON MIB or the RMON2 MIB.

Traffix Manager

Traffix Manager is a performance-monitoring application that provides information about layer 2 (RMON) and layer 3 conversations between nodes. It helps you to assess traffic patterns on your network. Traffix Manager:

• Monitors all the stations that the RMON2-compliant probes encounter on your network
• Captures and stores RMON and RMON2 data for your network's protocols and applications
• Displays traffic between stations in user-defined views of the network
• Graphs current or historical data on the devices selected
• Delivers reports for user-specified stations and time periods as PostScript to your printer or as HTML to your Web server
• Launches LANsentry Manager tools for in-depth analysis of a station or a conversation between stations

You can use Traffix Manager to:

• Know your network - Understand overall flow patterns and interactions between systems, and determine how your network is really being used at the application level.
• Optimize your network - Gain an insight into traffic and application usage trends to help you optimize the use and placement of current network resources and make wise decisions about capacity planning and network growth.

Traffix Manager works with any device (from 3Com or other vendors) that supports the RMON2 MIB.

Device View

The Device View application is a device configuration tool. When you troubleshoot your network, you can use Device View to determine or change a device's configuration. You can also use Device View to look at a device's statistics and to set alarms.

Device View manages only 3Com devices.

Network Management Platforms

As part of your troubleshooting toolbox, your network management platform is the first place to go to view the overall health of your network. With the platform, you can understand the logical configuration of your network and configure views of your network to understand how devices work together and the role that they play in the users' work. The network management platform that supports your Transcend software installation can provide valuable troubleshooting tools. Transcend runs on several platforms within the NT and UNIX environments.

The platform discovers the devices. Transcend imports that information from the platform to populate the core database. Unless you are rediscovering, the user must manually update the platform.

Using this device database, a map displays the graphical representation of your network. Each device on your network appears as a symbol (icon) on the map. You can configure views of your network to show devices on the same subnetworks or floors.

You can monitor network performance and diagnose network performance and connectivity problems. You can also:

• Take a snapshot of your network in its normal state. The snapshot records the state of your network at a particular instant. If you later have network performance problems, you can compare the current state of your network to the snapshot.
• Quickly determine the connectivity status of a device by noting the color of its map symbol. Red usually means that communication with a device has ceased.
• Diagnose connectivity problems by determining whether two devices can communicate. If they can communicate, then examine the route between the devices, the number of packets that were sent and lost, and the roundtrip time between the two devices.
• Manage MIB information (for example, collecting and storing MIB data for trend analysis and graphing) using MIB queries. Transcend compiles MIBs and allows you to navigate up and down the MIB Tree to retrieve MIB objects from devices. You can set thresholds for MIB data and generate events when a threshold is exceeded.
• Configure the software to act on certain events. The Event Categories window informs you of any unexpected events (which arrive in the form of traps).

For more information, see the documentation that is shipped with your software.

3Com SmartAgent Embedded Software

Traditional Simple Network Management Protocol (SNMP) management places the burden of collecting network management information on the management station. In this traditional model, software agents collect information about throughput, record errors or packet overflows, and measure performance based on established thresholds. Through a polling process, agents pass this information to a centralized network management station whenever they receive an SNMP query. Management applications then make the data useful and alert the user if there are problems on the device.

As a useful companion to traditional network management methods, 3Com's SmartAgent® technology places management intelligence into the software agent that runs within a 3Com device. This scalable solution reduces the amount of computational load on the management station and helps minimize management-related network traffic.

SmartAgent software, which uses the RMON MIB, is self-monitoring, collecting and analyzing its own statistical, analytical, and diagnostic data. In this way, you can conduct network management by exception - that is, you are notified only if a problem occurs. Management by exception is unlike traditional SNMP management, in which the management software collects all data from the device through polling.
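The difference between polling and management by exception can be illustrated with a small sketch. This is not SmartAgent code or an SNMP implementation; the Agent class, its error_threshold field, and the poll function are hypothetical names chosen only to show the idea that an agent evaluates its own counters and speaks up only when a threshold is crossed.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    """Hypothetical device agent that tracks its own error counter."""
    name: str
    error_threshold: int
    errors: int = 0

    def record_errors(self, count):
        """Accumulate errors locally and return an event message only
        at the moment the counter crosses the threshold."""
        before = self.errors
        self.errors += count
        if before <= self.error_threshold < self.errors:
            return (f"{self.name}: error count {self.errors} "
                    f"exceeded threshold {self.error_threshold}")
        return None


def poll(agents):
    """Traditional model: the station asks every agent for its data
    on every cycle, whether or not anything is wrong."""
    return {agent.name: agent.errors for agent in agents}
```

With polling, the station transfers every counter on every cycle. With the exception model, record_errors returns None most of the time, so nothing has to be sent until a problem actually occurs.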

SmartAgent software works autonomously and reports to the network management station whenever an exceptional network event occurs. The software can also take direct action without involving the management station. Devices that contain SmartAgent software may be able to:

• Perform broadcast throttling to minimize the flow of broadcast traffic on your network

• Monitor the ratio of good frames to bad frames

• Switch a resilient link pair to the standby path if the primary path corrupts frames

• Report if traffic on vital segments drops below minimum usage levels

• Disable a port for five seconds to clear problems and then automatically reconnect it

The Transcend NCS applications LANsentry Manager and Traffix Manager make the RMON data that the SmartAgent software collects more usable by summarizing and correlating important information.

Other Commonly Used Tools

These commonly used tools can also help you troubleshoot your network:

• Network software, such as Ping, Telnet, and FTP and TFTP. You can use these applications to troubleshoot, configure, and upgrade your system.

• Network monitoring devices, such as Analyzers and Probes

• Tools, such as Cable Testers, for working on physical problems

Ping

Packet Internet Groper (Ping) allows you to quickly verify the connectivity of your network devices. Ping attempts to transmit a packet from one device to a station on the network and listens for the response to ensure that it was correctly received. You can validate connections on the parts of your network by pinging different devices:

• A successful response indicates that a valid network path exists between your station and the remote host and that the remote host is active.

• Slower response times than normal can indicate that the path is congested or obstructed.

• A failed response indicates that a connection is broken somewhere; use the message to help locate the problem. See Tips on Interpreting Ping Messages.

Strategies for Using Ping

Follow these strategies for using Ping:

• Ping devices when your network is operating normally so that you have a performance baseline for comparison. See Identifying Your Network's Normal Behavior for more information.

• Ping by IP address when:

  • You want to test devices on different subnetworks. This method allows you to Ping your network segments in an organized way, rather than having to remember all the hostnames and locations.

  • Your Domain Name System (DNS) server is down and your system cannot look up host names properly. You can Ping with IP addresses even if you cannot access hostname information.

• Ping by hostname when you want to identify DNS server problems.

• To troubleshoot problems that involve large packet sizes, Ping the remote host repeatedly, increasing the packet size each time.

• To determine if a link is erratic, perform a continuous Ping (using ping -s on UNIX), which indicates the time that it takes the device to respond to each Ping.

• To determine a route taken to a destination, use the trace route function (tracert).

• Consider creating a Ping script that periodically sends a Ping to all necessary networking devices. If a Ping failure message is received, the script can perform some action to notify you of the problem, such as paging you.

• Use the Ping functions of your network management platform. For example, in your HP OpenView map, select a device and click the right mouse button to gain access to ping functions.
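The Ping script suggested in the strategies above might look something like the following sketch. It assumes that a ping command is on the system path, that it accepts -c (or -n on Windows) for the packet count, and that an exit status of 0 means a reply was received; exit-status behavior differs between platforms, so treat that as an assumption to verify. The function names ping_once and check_devices, and the notify callback that stands in for paging, are illustrative only.

```python
import platform
import subprocess


def ping_once(host, timeout_s=2):
    """Send one echo request with the system ping command.
    Returns True if the command exits with status 0."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    try:
        result = subprocess.run(
            ["ping", count_flag, "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, OSError):
        # No answer in time, or the ping command could not be started.
        return False
    return result.returncode == 0


def check_devices(hosts, ping=ping_once, notify=print):
    """Ping every host once and call notify for each one that fails.
    Returns the list of hosts that did not answer."""
    failed = [host for host in hosts if not ping(host)]
    for host in failed:
        notify(f"Ping failed for {host}")
    return failed
```

To run the check periodically, a scheduler such as cron could call check_devices with your device list. Passing the ping function as a parameter keeps the notification logic testable without touching the network.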

Tips on Interpreting Ping Messages

Use the following ping failure messages to troubleshoot problems:

No reply from <destination>

Indicates that the destination routes are available but that there is a problem with the destination itself.

<destination> is unreachable

Indicates that your system does not know how to get to the destination. This message means either that routing information to a different subnetwork is unavailable or that a device on the same subnetwork is down.

ICMP host unreachable from gateway

Indicates that your system can transmit to the target address using a gateway but that the gateway cannot forward the packet properly because either a device is misconfigured or the gateway is not operating.
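The three messages above can be turned into a simple lookup, for example in a script that logs ping failures. The sketch below matches on the wording given in this section; real ping implementations word their messages differently, so the patterns and the interpret function name are assumptions that would need adjusting for your system.

```python
# (pattern, diagnosis) pairs based on the messages described above.
# The gateway message is checked first because it is the most specific.
DIAGNOSES = [
    ("icmp host unreachable from gateway",
     "The gateway cannot forward the packet: a device is misconfigured "
     "or the gateway is not operating."),
    ("no reply from",
     "Routes to the destination are available, but there is a problem "
     "with the destination itself."),
    ("is unreachable",
     "No known path: routing information to another subnetwork is "
     "unavailable, or a device on the same subnetwork is down."),
]


def interpret(message):
    """Return a short diagnosis for a ping failure message."""
    text = message.lower()
    for pattern, diagnosis in DIAGNOSES:
        if pattern in text:
            return diagnosis
    return "Unrecognized ping message."
```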

Telnet

Telnet, which is a login and terminal emulation program for Transmission Control Protocol/Internet Protocol (TCP/IP) networks, is a common way to communicate with an individual device. You log in to the device (a remote host) and use that remote device as if it were a local terminal.

If you have established an out-of-band Telnet connection with a device, you can use Telnet to communicate with that device even if the network is unavailable. This feature makes Telnet one of the most frequently used network troubleshooting tools. Usually, all device statistics and configuration capabilities are accessible by using Telnet to connect to the device's console. For more information about setting up an out-of-band connection, see Using Telnet, Serial Line, and Modem Connections.

You can invoke the Telnet application on your local system and set up a link to a Telnet process that is running on a remote host. You can then run a program that is located on a remote host as if you were working at the remote system.

FTP and TFTP

Most network devices support either the File Transfer Protocol (FTP) or the Trivial File Transfer Protocol (TFTP) for downloading updates of system software. Updating system software is often the solution to networking problems that are related to agent problems. Also, new software features may help correct a networking problem.

FTP provides flexibility and security for file transfer by:

• Accepting many file formats, such as ASCII and binary

• Using data compression

• Providing Read and Write access so that you can display, create, and delete files and directories

• Providing password protection

TFTP is a simple version of FTP that does not list directories or require passwords. TFTP only transfers files to and from a remote server.

Analyzers

An analyzer, which is often called a Sniffer, is a network device that collects network data on the segment to which it is attached, a process called packet capturing. Software on the device analyzes this data, which is a process referred to as protocol analysis. Most analyzers can interpret different types of protocol traffic, such as TCP/IP, AppleTalk, and Banyan VINES traffic.

You usually use analyzers for reactive troubleshooting - when you see a problem somewhere on your network, you attach an analyzer to capture and interpret the data from that area. Analyzers are particularly helpful for identifying intermittent problems. For example, if your network backbone has experienced moments of instability that prevent users from logging on to the network, you can attach an analyzer to the backbone to capture the intermittent problems when they happen again.

Probes

Like an analyzer, a probe is a network device that collects network data. Depending on its type, a probe can collect data from multiple segments simultaneously. It stores the collected data and transfers the data to an analysis site when requested. Unlike an analyzer, a probe does not interpret data.

A probe can be either a stand-alone device or an agent in a network device. The Transcend Enterprise Monitor 500 series and the SuperStack® II Monitor series are stand-alone RMON probes. LANsentry Manager and Traffix Manager use data from probes that comply with the RMON MIB or the RMON2 MIB.

You can use a probe daily to determine the health of your network. The Transcend NCS applications can interpret and report this data, alerting you to possible problems so that you can proactively manage your network. For example, an RMON2 probe can help you to analyze traffic patterns on your network. Use this data to make decisions about reconfiguring devices and end stations as needed.

Cable Testers

Cable testers examine the electrical characteristics of the wiring. They are most commonly used to ensure that building wiring and cables meet Category 5, 4, and 3 standards. For example, network technologies such as Fast Ethernet require the cabling to meet Category 5 requirements. Testers are also used to find defective and broken wiring in a building.


Knowing Your Network

You can better troubleshoot the problems on your network by:

• Knowing Your Network's Configuration

• Identifying Your Network's Normal Behavior

Knowing Your Network's Configuration

Part of understanding your network is knowing its physical and logical configuration. You should know:

• Which devices are on your network

• How the devices are configured

• Which devices are attached to the backbone

• Which devices connect your network to the outside world (WAN)

To keep track of your network's configuration, gather the following information:

• Site Network Map

• Logical Connections

• Device Configuration Information

• Other Important Data About Your Network

This data, when kept up-to-date, is extremely helpful for locating information when you experience network or device problems.

Site Network Map

A network map helps you to:

• Know exactly where each device is physically located

• Easily identify the users and applications that are affected by a problem

• Systematically search each part of your network for problems

You can create a network map using any drawing or flow chart application. Store your network map online. In addition, make sure that you always have a current version on paper in case you cannot access the online version. Figure 8 shows an example of a network map of 3Com devices.

Figure 8 Example of a Site Network Map


Consider including the following information on your network map:

• Location of important devices and workgroups (by floor, building, or area)

• Location of the network backbone, data center, and wiring closets, as appropriate for your network

• Location of your network management stations

• Location and type of remote connections

• IP subnetwork addresses for all managed switches and hubs

• Other subnetwork addresses, such as Novell IPX and AppleTalk, if appropriate for your network

• Type of media (by actual name, such as 10BASE-T, or by grouping, such as Ethernet), which you can show with callouts, colors, line weights, or line styles

• Virtual workgroups, which you can show with colors or shaded areas

• Redundant links, which you can indicate with gray or dashed lines

• Types of network applications that are used in different areas of your network

• Types of end stations that are connected to the switches and hubs

Logical Connections

With the advent of virtual LANs (VLANs), you need to know how your devices are connected logically as well as physically. For example, if you have connected two devices through the same physical switch, you can assume that they can communicate with each other. However, the devices can be in separate VLANs that restrict their communication.

Knowing the setup of your VLANs can help you to quickly narrow the scope of a problem to a VLAN instead of to a network connection.

The Transcend NCS application Enterprise VLAN Manager allows you to view the logical makeup of your network. Depending on the complexity of your network and VLAN configurations, you can use colors to show the VLANs graphically on your network map.

Device Configuration Information

Maintain online and paper copies of device configuration information. Make sure that all online data is stored with your site's regular data backup. If your site does not have a backup system, copy the information onto a backup disc (CD, Zip disk, and the like) and store it offsite.

Follow these guidelines for saving configuration information:

• Because the easiest way to recover a device's configuration is to use FTP or TFTP, save the configuration settings of each device that supports this method of uploading.

• For other devices, Telnet in and save the session (which contains configuration details) to a file. If you cannot print the configuration of a device, then create a quick rebuild guide that explains the quickest way to configure the device from a fresh install.

• For devices that store information to diskette, store this data as part of your site's regular backup.

• For routers and other important devices with text configuration files, store this data online in a revision control system. Keep the most recent version on paper. Keep previous versions.

• For PCs, keep a recovery disk for each type of PC. For any device that you use as a server, store all startup scripts and copies of registries.

Other Important Data About Your Network

For a complete picture of your network, have the following information available:

• All passwords - Store passwords in a safe place. Keep previous passwords in case you restore a device to a previous software version and need to use the old password that was valid for that version.

• Device inventory - The inventory allows you to see the device type, IP address, ports, MAC addresses, and attached devices at a glance. Software tools such as Transcend Central can help you keep track of the 3Com devices on your network. Using Transcend Central, you can group devices by type and location and have this information on hand for troubleshooting.

• MAC address-to-port number list - If your hubs or switches are not managed, you must keep a list of the MAC addresses that correlate to the ports on your hubs and switches. Generate and keep a paper copy of this list, which is crucial for deciphering captured packets, using Address Tracker.

• Log book - Document your interactions, no matter how trivial, with each device that is critical to your network's operation (that is, routers, remote access devices, and security servers). For example, document that you noticed a fan making noise one morning. Your note may help you to identify why a device is over temperature a week later (because the fan stopped working).

• Change control - Maintain a change control system for all critical systems. Permanently store change control records.

• Contact details - Store, online and on paper, the details of all support contracts, support numbers, engineer details, and telephone and fax numbers.

• LANsentry Reporter - Use LANsentry Reporter to generate reports from the database.

Identifying Your Network's Normal Behavior

By monitoring your network over a long period, you begin to understand its normal behavior. You begin to see a pattern in the traffic flow, such as which servers are typically accessed, when peak usage times occur, and so on. If you are familiar with your network when it is fully operational, you can be more effective at troubleshooting problems that arise.

Baselining Your Network

You can use a baseline analysis, which is an important indicator of overall network health, to identify problems. A baseline can serve as a useful reference of network traffic during normal operation, which you can then compare to captured network traffic while you troubleshoot network problems. A baseline analysis speeds the process of isolating network problems.

By running tests on a healthy network, you compile normal data to compare against the results that you get when your network is in trouble. For example, Ping each node to discover how long it typically takes you to receive a response from devices on your network.
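As a rough illustration of the Ping baseline idea, the sketch below summarizes round-trip times collected on a healthy network and flags later measurements that are well outside that range. The function names and the rule of flagging anything more than three standard deviations above the mean are assumptions made for this example, not a method prescribed by this guide.

```python
from statistics import mean, stdev


def build_baseline(rtts_ms):
    """Summarize round-trip times (in milliseconds) that were measured
    while the network was healthy. Needs at least two samples."""
    return {"mean": mean(rtts_ms), "stdev": stdev(rtts_ms)}


def is_slower_than_normal(rtt_ms, baseline, tolerance=3.0):
    """Flag a response time that exceeds the baseline mean by more
    than `tolerance` standard deviations."""
    limit = baseline["mean"] + tolerance * baseline["stdev"]
    return rtt_ms > limit
```

For example, a node that normally answers in 9 to 13 ms would not be flagged at 12 ms but would be flagged at 40 ms.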

Applications such as Status Watch, Address Tracker, LANsentry Manager, and Traffix Manager allow you to collect days and weeks of data and set a baseline for comparison. Through the reporting mechanisms in the following list, you can continuously assess the data from your network and ensure that its performance is optimal:

• Web Reporter generates daily or weekly reports from data collected by Status Watch.

• Traffix Manager generates weekly reports from collected data and calculates the baselines for you. Set up Utilization History and Error History reports with data resolution set to Weekly.

• LANsentry Manager History View generates daily utilization graphs, which are sampled every 30 minutes, for each day over one week. Use these graphs to calculate your network baselines manually.
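One way to calculate a baseline manually from a week of 30-minute utilization samples is to average the same time slot across all days. The sketch below assumes that you have copied the graph values into one list per day (48 values for a full day); the slot_baselines function and the input layout are hypothetical and are not an export format of LANsentry Manager.

```python
def slot_baselines(daily_samples):
    """Average utilization per time slot across several days.

    daily_samples: list of days, each a list of utilization
    percentages in time order (48 entries for 30-minute sampling).
    Returns one average per slot.
    """
    if not daily_samples:
        raise ValueError("need at least one day of samples")
    slots = len(daily_samples[0])
    if any(len(day) != slots for day in daily_samples):
        raise ValueError("every day must have the same number of samples")
    days = len(daily_samples)
    return [sum(day[i] for day in daily_samples) / days
            for i in range(slots)]
```

Comparing today's value for a slot with the average for that slot shows whether the current load is unusual for that time of day.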

Identifying Background Noise

Know your network's background noise so that you can recognize real data flow. For example, one evening after everyone is gone, no backups are running, and most nodes are on, analyze the traffic on your network using the Traffix Manager application. The traffic that you see is mostly broadcast and multicast packets. Any errors that you see are the result of faulty devices (trace). This traffic is the background noise of your network - traffic that occurs for little value. If background noise is high, redesign your network.


FDDI Connectivity

Use these sections to identify and correct connectivity errors on an FDDI ring:

• FDDI Connectivity Overview

• Monitoring FDDI Connections

• Making Your FDDI Connections More Resilient

Fiber Distributed Data Interface (FDDI), which is a self-correcting technology, automatically corrects ring faults to maintain connectivity throughout most of the network. However, you should monitor your FDDI connections for wrapped rings and other problems with ring connectivity.

Understanding the Problem

As shown in Figure 9, in a thru FDDI LAN, no stations on the trunk ring have a Configuration State (SMTConfigurationState) of Wrap or Isolated. However, users who complain about network performance may have lost connectivity to other stations on the network because the FDDI network is wrapped or segmented.


Figure 9 Thru Ring

Wrapped ring

By monitoring the Peer Wrap Condition, you can see when the Configuration State changes. In a wrapped ring (Figure 10), two stations on the LAN are in a wrapped Configuration State. This condition may or may not affect the connectivity of certain stations. Although operational, your network may have a cabling problem or a problem with a link.

Figure 10 Wrapped LAN

Segmented ring

In a segmented ring (Figure 11), more than two stations are wrapped on the trunk ring. Although this mode of operation is a valid FDDI LAN configuration, your LAN is probably experiencing a degraded or degrading condition.

Figure 11 Segmented Ring
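The Thru, Wrapped, and Segmented states described above differ only in how many trunk ring stations report a wrapped Configuration State, which a short sketch can make concrete. The ring_state function and the plain state strings are assumptions for illustration; the text does not say how to treat a ring with exactly one wrapped station, so the sketch groups that case with Wrapped, and it ignores Isolated stations.

```python
def ring_state(config_states):
    """Classify a trunk ring from the Configuration State reported by
    each station, given as strings such as "Thru" or "Wrap".

    0 wrapped stations   -> "Thru"
    1 or 2 wrapped       -> "Wrapped" (exactly 1 is an assumption)
    more than 2 wrapped  -> "Segmented"
    """
    wrapped = sum(1 for state in config_states
                  if state.strip().lower().startswith("wrap"))
    if wrapped == 0:
        return "Thru"
    if wrapped <= 2:
        return "Wrapped"
    return "Segmented"
```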

When a network connection has excessively high link errors, Station Management (SMT) shuts down the connection and tries to reconnect again. A dual-attachment trunk ring station with an A or B connection that is shut down is one of the wrap points in the network. See Making Your FDDI Connections More Resilient for information about keeping a dual-attachment station connection from wrapping.


Isolated station

Sometimes a network wraps a particular station out of the ring. Stations on either side of a problem station can be wrapped. This effectively isolates the station or links that have problems, as shown in Figure 12.

Figure 12 Wrapped Ring with Isolated Station

If a ring was already wrapped when a network wraps a station out of the ring, then a segmented ring results, as shown in Figure 13.

Figure 13 Segmented Ring with Isolated Stations

Twisted ring

In a twisted ring, an A port is connected to an A port and a B port is connected to a B port instead of the normal A-to-B connections. A twisted ring, which always has two twist points (stations), can exist in either a Thru or Wrap state. You can monitor the Twisted Ring Condition and Undesired Connection Attempt Event for evidence of twisted ring and other connection problems.

Identifying the Problem

To identify the problem, follow this process:

1. At the FDDI LAN level, verify that your network is operating.

If the network is operating, the FDDI ring may be segmented, and therefore an FDDI station or an Ethernet station on an Ethernet link may have lost connectivity to other nodes on the network.

2. Determine if a ring is in a Thru, Wrap, or Segmented state.

If the FDDI ring is segmented or wrapped, look for a problem with a link somewhere in the network or for a nonfunctioning node on your trunk ring. If the ring is operating and is not segmented, or if it is segmented but you still have connectivity to the stations in question, move to a more specific level in your network.

See Monitoring FDDI Connections for more information.

3. Determine if the poorly performing station is an Ethernet or FDDI station.

If the problem is an FDDI station, find out if it is congested (that is, if the station is so busy that it cannot accept all the network traffic that is directed to it) by determining its Bandwidth Utilization. Also determine if the station has a high frame error rate by looking at the FDDI Ring Errors.

If the problem is an Ethernet station, look for congestion by examining Ethernet Packet Loss and Bandwidth Utilization.

Solving the Problem

Identify the station that is causing the disconnection and take the appropriate steps:

• If the disconnection is caused by a wrapped ring, then fix the hardware or cabling problem at that station.

• If the station is congested, you have a device problem rather than a network problem. For example, if the congested station is a file server and every other machine on the network is retrieving and saving files using that server, consider upgrading your server or adding additional servers to the network. A variety of devices from different vendors may be communicating on an FDDI or Ethernet network; some are faster and more capable, and some are slower and more prone to congestion.

• If the station is an Ethernet station that is attached to an Ethernet segment, reevaluate the setup of your Ethernet network and make some changes to improve its performance.

You can also make FDDI connections more resilient by implementing dual homing or installing an Optical Bypass Unit (OBU) where FDDI connections are prone to fail. See Making Your FDDI Connections More Resilient for more information.


Monitoring FDDI Connections

Monitor your FDDI devices for Warning or Critical alerts in the FDDI Status tool.

Status Watch

Use Status Watch to identify these FDDI connectivity errors:

• Peer Wrap Condition

• Twisted Ring Condition

• Undesired Connection Attempt Event

Follow these steps:

1. In the Device area, select the device that is located where you suspect an FDDI ring connectivity problem.

2. Monitor the FDDI Status tool for the currently selected device.

Here are some pointers for monitoring:

• If the Peer Wrap Configuration State variable is Isolated, the device is not connected to the FDDI trunk ring. If you intend the device to remain isolated, this indication is not a serious condition. However, if the device is supposed to be connected on a trunk ring, a serious problem may exist. The device is no longer transmitting packets to the larger trunk ring.

• If the Peer Wrap flag (SMTPeerWrapFlag) is set, the device is one of the wrap points. The cause of the wrapped ring is somewhere in the portion of the network between the two stations that report the peer wrap condition.

Making Your FDDI Connections More Resilient

When devices are removed from an FDDI ring, there is a break in the fiber path that causes the ring to wrap until the ring is made whole again. To prevent the break in the FDDI connection, you can implement dual homing or install an Optical Bypass Unit (OBU).

Implementing Dual Homing

When the operation of a dual attachment node is critical to your network, dual homing adds reliability by providing a backup connection if the primary link fails. Because a dual attachment station (DAS) has two attachments to the FDDI ring (A-to-M and B-to-M), you can use one of them as a standby link if the active link fails. Using dual homing, only one of the two attachments is active at a time. In this sense, a DAS acts as if it is a single attachment station (SAS) by using its A port as the standby link.

Through SMT, a DAS can be dual homed to the same concentrator or, more commonly, to two concentrators. This arrangement provides a more stable trunk ring of concentrators. If one concentrator fails, the DAS enables the standby link to another concentrator to become the active link. See Figure 14.

If the station is a dual path or dual path/dual MAC station, you can configure the dual-homed station in one of two ways:

• With both links active

• With one link active and one connection withheld as a backup, only becoming active when one link fails

Figure 14 Dual Homing Configuration
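The active and standby behavior of dual homing can be modeled in a few lines. The sketch below is only a conceptual model, not SMT code: the DualHomedStation class and its link_up dictionary are invented for this example. It follows the description above, in which the A port serves as the standby link, so the B-to-M attachment is used whenever it is up.

```python
class DualHomedStation:
    """Toy model of a dual-homed DAS with one active attachment."""

    def __init__(self):
        # True means the attachment to its concentrator (M port) is up.
        self.link_up = {"A": True, "B": True}

    def active_port(self):
        """Return the port that carries traffic: B while it is up,
        otherwise the standby A port, or None if both links are down."""
        if self.link_up["B"]:
            return "B"
        if self.link_up["A"]:
            return "A"
        return None
```

If the B link (or the concentrator behind it) fails, active_port switches to "A", and it returns to "B" once that link is restored.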

Installing an Optical Bypass Unit

You can insert an Optical Bypass Unit (OBU) into the FDDI ring as if it were a node and then plug your device into it. To use an OBU, your device needs an optical bypass interface. This interface lets the bypass know whether your device is still on the ring or not. See Figure 15.

If your device is removed or if it fails, the bypass unit diverts the optical path away from your device, keeping the ring whole. You can use a bypass on devices that are prone to failure or are likely to be removed often, such as diagnostic equipment.


Figure 15 Optical Bypass Unit Configuration

FDDI Connectivity Reference

This section explains terms that are relevant to FDDI connectivity and provides additional conceptual and problem analysis detail.

Peer Wrap Condition

A Peer Wrap (wrapped ring) condition occurs when a dual-attachment station detects a fault (often a lost connection) and reconfigures the network by wrapping the dual trunk rings to form a single ring. Normally, the two stations that are adjacent to the fault wrap to maintain full connectivity. However, if a second fault occurs before the first is repaired, the network partitions itself into two or more rings, and stations lose connectivity.

When a station reports a Peer Wrap condition, locate and repair the problem that caused the station to wrap the rings. Potential causes include:

• Faulty FDDI port hardware

• Faulty cables or connectors

• Unplugged connectors

• Powered-down stations

You can expect to find the cause of the problem between the two stations that report the Peer Wrap condition.


Twisted Ring Condition

A Twisted Ring condition occurs when certain undesirable connection types exist. See Table 7 for more information. Although similar to the Undesired Connection Attempt, the Twisted Ring condition provides specific Station Management (SMT) and port information for diagnosis.

Undesired Connection Attempt Event

An Undesired Connection Attempt event occurs when a port tries to connect to another port of a type that may result in an undesirable network topology. Whether the connection attempt is successful depends on the current setting of the station's connection policies.

Table 7 lists connections that the FDDI standard defines as undesirable. The managed devices may or may not permit these connections, depending on their FDDI station configurations.

Table 7 Undesirable Connection Types

Connection Type    Reason That the Connection Is Undesirable
A-A                Twisted primary and secondary rings
A-S                A wrapped ring
B-B                Twisted primary and secondary rings
B-S                A wrapped ring
S-A                A wrapped ring
S-B                A wrapped ring
M-M                A tree of rings topology (illegal connection)

Table 8 lists FDDI connections that create valid topologies.

Table 8 Valid Connection Types

Connection Type    Reason That the Connection Is Valid
A-B                A normal trunk peer connection
A-M                A tree connection with possible redundancy. In a single MAC node, Port B has precedence (by default) for connecting to a Port M.
B-A                A normal trunk ring peer connection
B-M                A tree connection with possible redundancy. In a single MAC node, Port B has precedence (by default) for connecting to a Port M.
S-S                A single ring of two slave stations
S-M                A normal tree connection
M-A                A tree connection that provides possible redundancy
M-B                A tree connection that provides possible redundancy
M-S                A normal tree connection
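Tables 7 and 8 together cover all sixteen pairings of the A, B, S, and M port types, so they can be encoded as a lookup, for example to check a cabling plan before installation. The sketch below copies the table entries in shortened form; the check_connection function is a hypothetical helper, and whether a device actually permits an undesirable connection still depends on its connection policies.

```python
UNDESIRABLE = {
    "A-A": "Twisted primary and secondary rings",
    "A-S": "A wrapped ring",
    "B-B": "Twisted primary and secondary rings",
    "B-S": "A wrapped ring",
    "S-A": "A wrapped ring",
    "S-B": "A wrapped ring",
    "M-M": "A tree of rings topology (illegal connection)",
}

VALID = {
    "A-B": "A normal trunk peer connection",
    "A-M": "A tree connection with possible redundancy",
    "B-A": "A normal trunk ring peer connection",
    "B-M": "A tree connection with possible redundancy",
    "S-S": "A single ring of two slave stations",
    "S-M": "A normal tree connection",
    "M-A": "A tree connection that provides possible redundancy",
    "M-B": "A tree connection that provides possible redundancy",
    "M-S": "A normal tree connection",
}


def check_connection(local_port, remote_port):
    """Return ("valid" or "undesirable", reason) for a port pairing."""
    key = f"{local_port.strip().upper()}-{remote_port.strip().upper()}"
    if key in UNDESIRABLE:
        return ("undesirable", UNDESIRABLE[key])
    if key in VALID:
        return ("valid", VALID[key])
    raise ValueError(f"unknown FDDI port types: {key}")
```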


Disabling the Offending Interface

Because broadcast storms can ultimately cause your whole network to become unavailable, take action immediately to disable the offending interface. You can enable the interface again after you have corrected the problem.

Address Tracker

Use Address Tracker to locate the interface that is causing the broadcast storm. Use Device View to disable the port.

Follow these steps:

1. In the Find Address window, enter the address of the interface that seems to be receiving the broadcast traffic.

You can copy the MAC or IP address from the Status Watch report and paste it into Address Tracker's Enter the Address You Want to Find field.

2. Click Find Now.

Search displays the device name.

3. Use Transcend Central to launch Device View and disable the port.

Disabling the port stops the broadcast storm before it interferes with all vital network traffic. You can re-enable this interface using Device View or the device's console later.

Correcting Spanning Tree Misconfigurations

Spanning Tree does not cause broadcast storms, but a loop in your Spanning Tree topology can create data that looks like a storm. A loop can occur in your topology if:

• Someone disables Spanning Tree on a port

• You set up your Spanning Tree configuration incorrectly


Device View

Use Device View to disable any Spanning Tree port that has a repeater attached to it and to correct Spanning Tree misconfigurations.

To correct Spanning Tree misconfigurations, use Device View to disable Spanning Tree Protocol (STP) for a port on a SuperStack® II Switch 1000, Switch 3000, Switch 3000 10/100, Switch 9000SX, Desktop Switch, LinkBuilder® FMS II Bridge/Management Module, or CoreBuilder 6000.

To disable the STP port state for a port on a SuperStack II switch:

1. Select a port and click the right mouse button.

2. From the shortcut menu, select Configure.

3. In the Port section, click the STP tab.

4. From the STP Port State list box, select Disabled.

5. Click Apply.

To disable the STP port state for a port on a LinkBuilder FMS II Bridge/Management Module:

1. Double-click the module.

2. From the shortcut menu, select Configure Bridge.

3. In the Port section, click the STP tab.

Broadcast Storms Reference

This section explains terms that are relevant to broadcast storms and provides additional conceptual and problem analysis detail.

Broadcast Packets

Broadcast packets, which are a normal part of network operation, are transmitted by a device to a broadcast address. For example, IP networks use broadcasts to resolve network addresses using Address Resolution Protocol (ARP). IPX networks use a large number of broadcast packets to operate most effectively.

Problems arise when broadcast packets endlessly propagate throughout the network, which increases the traffic volume on your network and the CPU time that each host spends processing and discarding unwanted broadcast packets.

Multicast Packets

Multicast packets, which are a normal part of network operation, are transmitted by a device to a multicast group address. Hosts that want to receive the packets indicate that they want to be members of the multicast group, and then multicast packets are distributed to that group. For example, multicast packets support the Spanning Tree Protocol. Multicast applications and underlying multicast protocols control multimedia traffic and shield hosts from processing unnecessary broadcast traffic. However, multicast traffic can also cause storms that saturate your network.

Duplicate Addresses

Use these sections to identify and correct problems caused by duplicate MAC and IP addresses:

• Duplicate Addresses Overview
• Finding Duplicate MAC Addresses
• Finding Duplicate IP Addresses

See Duplicate Addresses Reference for additional conceptual and problem analysis detail.

Networks sometimes generate duplicate MAC and IP addresses. Because duplicate addresses can cause problems with packet delivery, resolve them as soon as possible.

Understanding the Problem

Duplicate MAC addresses are caused by data link layer problems with Fiber Distributed Data Interface (FDDI) media and the passing of tokens on the FDDI ring. Duplicate IP addresses are caused by network layer problems. See these sections for more information about causes of duplicate addresses:

• Duplicate MAC Addresses
• Duplicate IP Addresses

Identifying the Problem

Identify duplicate MAC and IP addresses by following the instructions in these sections:

• Finding Duplicate MAC Addresses
• Finding Duplicate IP Addresses

Solving the Problem

Identify the cause of the duplicate address (such as user error or a hardware problem) and fix the problem, if possible.

Finding Duplicate MAC Addresses

To find out if duplicate MAC addresses are occurring, monitor your network using Status Watch.

Status Watch

The Status Watch FDDI Status tool identifies duplicate FDDI MAC addresses. Status Watch reports when two or more MACs on the same ring have the same MAC address (a Duplicate Address condition).

Follow these steps:

1. In the Status Watch Summary View window, determine if any FDDI Status conditions are reported. If there are, double-click the table cell value to display the Device List window.

Another approach is to examine only the devices that you know reside on your FDDI ring. In the Status Watch main window, red device icons indicate that a threshold has been exceeded.

2. Select a device.

• If you selected the device from the Device List window, the real-time report for that device appears in the Status Watch main window.

• If you selected the device from the main window, also select the FDDI Status tool to view the real-time report.

3. Determine if a Duplicate Address condition caused the FDDI Status tool to trigger a Critical or Warning status for that device.

In Status Watch, you can specify the status severity level to apply to a Duplicate Address condition.

Finding Duplicate IP Addresses

To find out if duplicate IP addresses are occurring, monitor your network using these applications:

• Address Tracker - To find duplicate IP addresses on 3Com devices and their attached networks

• LANsentry® Manager - To find duplicate IP addresses that are collected by probes gathering RMON2 SmartAgent® data from the Enterprise Communications Analysis Module (ECAM) downloaded on your network devices

Address Tracker

Use Address Tracker to determine when and where duplicate IP addresses occur.

Follow these steps:

1. From the Find Address menu, select Find Duplicate IP Addresses.

2. Click Find Now to start your search.

LANsentry Manager

Use the Duplicates table in LANsentry Manager to compile a list of all stations with duplicate IP addresses. This table is available only on probes that have downloaded RMON2 (ECAM) SmartAgent software.

Follow these steps:

1. From the LANsentry Manager Address Map menu, select Duplicates. Address Map data is displayed as a table.

2. To export the contents of the table, click Export to launch the Data Export dialog box.

Duplicate Addresses Reference

This section explains terms that are relevant to duplicate addresses and provides additional conceptual and problem analysis detail.

Duplicate MAC Addresses

Each device on your network has a unique MAC address. This address identifies a single device on the network, allowing packets to be delivered to correct destinations.

Packets are delivered to their destinations by means of the MAC-address-to-IP-address translation that the Address Resolution Protocol (ARP) provides. Therefore, if MAC addresses are duplicated on the network, ARP caches of routing devices contain erroneous destinations. In FDDI, devices monitor network traffic, searching for their own MAC address in each packet to determine whether to decode the packet. If MAC addresses are not unique, two stations cannot be distinguished from each other.

Duplicate MAC addresses can occur for the following reasons:

• Someone has manually configured a MAC address for a device, instead of using the address that the vendor supplied or allowing it to be assigned dynamically, and this address is also assigned to a different device.

• In rare circumstances, loops in a bridged network can cause a MAC hardware problem or an address learning problem that creates a duplicate MAC address entry in the bridging address table.

• On DECnet Phase 4 networks, MAC addresses are set from the DECnet address. A duplicate NET address can cause a duplicate MAC address.

Duplicate IP Addresses

Because IP addresses are critical for transmission of packets on TCP/IP networks, resolve duplicate IP addresses immediately.

Duplicate IP addresses can occur when someone has configured an IP address that is identical to an IP address that is assigned to a different device. Address assignments, although possible for you to configure manually, are usually made using one of these protocols:

• Dynamic Host Configuration Protocol (DHCP) - Allows your network to dynamically assign IP addresses to nodes. With this protocol, a DHCP server temporarily assigns an IP address to a node, or you can statically configure addresses as needed.

• BOOTstrap protocol (BootP) - Allows you to statically assign IP addresses to nodes. This protocol is more efficient than RARP.

• Reverse ARP (RARP) - Allows you to statically assign IP addresses to nodes. However, because this protocol relies on the MAC address to identify the node, you cannot use it on networks that dynamically assign hardware addresses.
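The idea of a duplicate IP address can be illustrated with a short Python sketch. It assumes you already have a list of (IP address, MAC address) pairs, for example copied from an ARP cache. The function name `find_duplicate_ips` and the sample addresses are hypothetical; this is not how Address Tracker or LANsentry Manager work internally.

```python
from collections import defaultdict


def find_duplicate_ips(bindings):
    """Return {ip: sorted MACs} for every IP claimed by more than one MAC.

    `bindings` is an iterable of (ip, mac) string pairs. MAC addresses are
    compared case-insensitively so that the same adapter is not counted twice.
    """
    macs_by_ip = defaultdict(set)
    for ip, mac in bindings:
        macs_by_ip[ip].add(mac.lower())
    return {ip: sorted(macs) for ip, macs in macs_by_ip.items() if len(macs) > 1}


# Hypothetical observations; the addresses are documentation-only examples.
observed = [
    ("192.0.2.10", "00:A0:24:11:22:33"),
    ("192.0.2.11", "00:a0:24:44:55:66"),
    ("192.0.2.10", "00:a0:24:77:88:99"),  # second adapter claiming 192.0.2.10
    ("192.0.2.11", "00:A0:24:44:55:66"),  # same adapter seen again, not a duplicate
]
duplicates = find_duplicate_ips(observed)
print(duplicates)
```

Swapping the roles of the two fields (grouping IP addresses by MAC address) gives a similar first-pass check for duplicate MAC addresses.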

Ethernet Packet Loss

Use these sections to identify and correct Ethernet packet loss:

• Ethernet Packet Loss Overview
• Searching for Packet Loss

See Ethernet Packet Loss Reference for additional conceptual and problem analysis detail.

If your Ethernet network shows signs of congestion, it may be experiencing packet loss. When your network is congested, utilization is usually high, packets are discarded because buffers are full, and collision rates are up. Problems related to Collisions are often at the heart of packet loss.

Understanding the Problem

Collisions are normal in Ethernet networks. In many cases, Collision rates of 50 percent do not cause a large decrease in throughput. The Collision rate helps mark the upper limit on your network (the maximum percentage of collisions that your network can bear), which is usually around 70 percent. If Collisions increase above this upper limit, your network can become unreliable.

When the Collision rates increase, so do Excessive Collisions, which causes a delay in transmitting data. An increase in Collisions also indicates that network utilization and network errors, such as FCS Errors, are probably increasing.

The real packet problems to watch for, however, are undetected collisions that show up as Late Collisions.

Identifying the Problem

To confirm that your network's problem is related to packet loss, verify that frames are being dropped on your network by examining this packet loss data:

• Alignment Errors
• Collisions
• Excessive Collisions
• FCS Errors
• CRC Errors
• Late Collisions
• Receive Discards
• Too Long Errors
• Too Short Errors
• Transmit Discards

The process of identifying the problem is discussed in Searching for Packet Loss.

Solving the Problem

If you notice that packet loss data is consistently high, then your network is too congested. In this case, segment your network with the appropriate network device (such as a switch or router). If Collision data shows increases but your network's utilization is the same, then your network may have a physical problem, such as cabling that is too long. Other problems that packet loss data can indicate include:

• Faulty connectors or improper cabling
• Excessive numbers of repeaters between network devices
• Defective Ethernet transceivers or controllers

Possible solutions to these problems are explained in the procedures in Searching for Packet Loss.
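The reasoning above can be sketched as a small Python decision function. It compares two samples of collision counts and utilization and suggests where to look first. The function name, the dictionary keys, and the 5-percentage-point tolerance used to decide that utilization is "the same" are all assumptions made for illustration; they do not come from any of the tools described here.

```python
def triage_packet_loss(prev, curr, util_tolerance=0.05):
    """Suggest a first line of investigation from two samples.

    Each sample is a dict with 'collisions' (count per interval) and
    'utilization' (fraction of bandwidth, 0.0 to 1.0). `util_tolerance` is a
    hypothetical margin within which utilization counts as unchanged.
    """
    if curr["collisions"] <= prev["collisions"]:
        return "no-change"        # collisions are not increasing
    util_change = curr["utilization"] - prev["utilization"]
    if abs(util_change) <= util_tolerance:
        return "physical"         # collisions up, utilization the same
    if util_change > 0:
        return "congestion"       # collisions and utilization both up
    return "inconclusive"         # collisions up while utilization fell


# Hypothetical samples: collisions quadruple while utilization barely moves.
before = {"collisions": 100, "utilization": 0.40}
after = {"collisions": 400, "utilization": 0.42}
print(triage_packet_loss(before, after))
```

A "physical" result points toward cabling, connectors, repeaters, or transceivers; a "congestion" result points toward segmenting the network.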

Searching for Packet Loss

When you look for packet loss, use the following applications:

• Status Watch - For Ethernet and MIB-II data collection using SNMP polling
• LANsentry Manager Network Statistics Graph - For RMON data collection using an RMON probe
• Device View - On a per-device basis, you can evaluate statistics for any port on the device

Status Watch

Status Watch monitors:

• Alignment Errors
• Excessive Collisions
• FCS Errors
• Receive Discards
• Transmit Discards

Follow these steps:

1. Determine if the thresholds for the Alignment Errors tool and FCS Errors tool are being exceeded.

Table 16 identifies the problems that this data can indicate and your possible actions. For information about problems related to a nonstandard Ethernet implementation, see Nonstandard Ethernet Problems.

Table 16 Alignment Errors, FCS Errors, and CRC Errors Data

Possible Problem: Faulty cabling
Possible Action: Examine the cable and cable connections for breaks or damage.

Possible Problem: Network noise
Possible Action: Look for improper cabling, faulty cable, faulty network equipment, or cables that are too close to equipment that emits electromagnetic interference (lamps, for example).

Possible Problem: Faulty transceiver
Possible Action: Use an analyzer to identify the problematic transceiver. If necessary, replace the transceiver, network adapter, or station.

Possible Problem: Fault at the transmitting end station
Possible Action:
1. Locate the source of the errors by looking at the module and port statistics.
2. Verify the correct operation of the transceiver or adapter card of the device that is connected to the problem port.
3. If the card appears to be operating correctly, examine the cable and cable connections for breaks or damage.

Possible Problem: Station powering up or down
Possible Action: None required.

Early implementations of Ethernet transceivers generate a significant amount of in-band noise when powering up; they frequently cause Alignment Errors and FCS Errors in an otherwise stable network.

When powering up, some software drivers for Ethernet controllers also initiate Time Domain Reflectometry (TDR) tests to test the Ethernet media. Network monitors report TDR tests as Alignment Errors and FCS Errors.

Possible Problem: Faulty adapter
Possible Action: Replace the adapter.

2. Determine if the Excessive Collisions tool threshold is being exceeded. Table 17 identifies the problems that this data can indicate and your possible actions.

Table 17 Collisions and Excessive Collisions Data

Possible Problem: Busy network
Possible Action: Use a bridge, router, or switch to reconfigure your network into segments with fewer stations.

Possible Problem: Faulty device (adapter, switch, hub, and the like) that does not listen before broadcasting. This problem increases the incidence of all types of collisions.
Possible Action: Isolate each adapter to see if the problem stops.

Possible Problem: Network loop
Possible Action: Ensure that no redundant connections to the same station have both connections active simultaneously.

3. Determine if the Receive Discards and Transmit Discards tools' thresholds are being exceeded.

If these errors are high in conjunction with the data that you learned in steps 1 and 2, then your network is overloaded. Segment your network.

LANsentry Manager Network Statistics Graph

Use the LANsentry Manager Network Statistics graph to view data for:

• Collisions
• Late Collisions
• Bandwidth Utilization
• CRC Errors
• Too Long Errors
• Too Short Errors

Follow these steps:

1. Display a Network Statistics graph for the local Ethernet segment on which users have reported poor performance.

This graph shows the most recent trend in Collision rates. If you have set up a History sample, you can also look at the historical trend. If a number of segments are connected by repeaters, examine the graph for each Ethernet segment.

2. Analyze Utilization and Collision rates to determine whether collisions are caused by an overloaded segment or a faulty component.

• If Utilization rates are high - The collisions are probably caused by an overloaded segment. If you have added nodes or new applications to your network, consider reconfiguring the cabling system using bridges and routers to filter out remote collisions and to keep local traffic on one segment. This action should level the network load.

• If Utilization rates are stable and appear normal - The collisions are probably caused by faulty components. In this case, do the following:

  • If the network consists of repeaters - Compare the Network Statistics graphs for each segment connected to the repeater. Because repeaters repeat traffic across all connected segments (which makes many segments seem like one network), you should see similar levels of traffic on all segments. One segment that shows dissimilar levels of traffic and collisions may indicate faulty hardware. In this case, monitor several collisions to track the source station that is transmitting too soon after collisions, and repair the station. Packets that are transmitted too soon after collisions are unlikely to be valid. See Table 17 for more information about Collisions.

  • On other networks - Determine the segment cable length.

3. Examine the CRC Errors and Late Collisions, which often indicate cabling or component problems.

Table 16 identifies the problems that CRC Errors can indicate and your possible actions. Table 18 identifies the problems that Late Collisions data can indicate and your possible actions.

Table 18 Late Collisions Data

Possible Problem: Cabling problems
• Segment too long
• Failing cable
• Segment not grounded properly (noise)
• Improper termination
• Taps too close (10BASE-5 and 10BASE-2 only)
• Noisy cable

Possible Action: Correct the cabling problem by doing one or more of the following:
• Reduce the segment length
• Replace the cable
• Ground the cable
• Terminate the cable correctly
• Check the taps
• Check for cables too close to equipment that emits electromagnetic interference

Possible Problem: Component problems
• Deaf or partially deaf node
• Failing repeater, transceiver, or controller cards

Possible Action: Correct the component problem by doing one of the following:
• Trace the failing component and replace it
• Replace the NIC or the transceiver

4. Trace Too Short Errors and Too Long Errors to the sender.

These errors often indicate faulty routers or LAN drivers and transceiver problems. Table 19 identifies the problems that this data can indicate and your possible actions.

Possible Problem: A transceiver on your network is adding bits to the packets that are transmitted by the attached station.
Possible Action:
1. Use a network analyzer to identify the problematic transceiver.
2. If necessary, replace the transceiver, network adapter, or station.

Possible Problem: The jabber protection mechanism on a transceiver has failed; it can no longer protect the network from the jabbering produced by the attached station.
Possible Action: Replace the network card.

Possible Problem: Excessive noise on the cable
Note: Some 10/100 Mbps cards that autodetect the network speed may connect to the network at the wrong speed, causing excessive noise.
Possible Action: Check for improper cabling, faulty cable, faulty network equipment, or cables too close to noisy electronic equipment (lamps, for example). If your network card autodetects the network speed and you have ruled out other problems, manually configure the network speed.

Possible Problem: Faulty routers (two different network types are connected and the router is not enforcing proper frame size restrictions)
Possible Action: Notify the manufacturer.

Possible Problem: Faulty LAN driver
Possible Action: Replace the driver.

Possible Problem: A normal condition on a LinkSwitch® 1000, LinkSwitch® 3000, or CoreBuilder 5000 FastModule
Possible Action: If you use maximum-sized 1518-byte Ethernet frames, the device's VLT-enabled ports add a frame tag of 4 bytes, resulting in a misleading Too Long Frame error. These frames are passed successfully but create the Too Long Frame error message. If you want to eliminate the error message, reduce your Ethernet frame size by 4 bytes.

Device View

Device View allows you to display a variety of port and device-level statistics relevant to Ethernet packet loss. Table 20 describes these statistics and their use in troubleshooting.

Table 20 Activity and Error Statistics in Device View

Statistics Group: Activity
Description: Displays the total network activity and errors on the selected port.
Use in Troubleshooting: This data shows readable packets, broadcast packets, Collisions, total errors, and runts, which cause Too Short Errors. You can interpret this data in the following ways:

• The presence of runts can often be caused by Collisions; however, if the values increase at specific times of the day, it may indicate that you need to change the network topology to manage the traffic more efficiently (for example, with switches or routers).

• Runts can also be caused by a badly terminated coax cable.

• Large numbers of runts not associated with high levels of collisions can indicate a transmission problem (examine the cable).

• Particularly high numbers of Collisions compared to the total number of readable packets can point to a hardware problem (a bad adapter) or to a data loop.

• A high proportion of Broadcast Packets (>10%) on a heavily utilized network (>50% of available bandwidth) can point to an incorrectly configured bridge or router on the network.

Statistics Group: Errors
Description: Displays the number of frames with errors on the selected port.
Use in Troubleshooting: The significance of errors depends on accompanying errors and prevailing network conditions. See the following error data for more information:

• Alignment Errors - Table 16
• FCS Errors - Table 16
• Too Long Errors - Table 19
• Too Short Errors or runts - Table 19
• Late Collisions - Table 18

To display Activity and Errors statistics for a device or port, follow these steps:

1. Select the required port or device.

2. From the shortcut menu, select Activity or Errors.

The statistics available depend on the type of port or device selected. See Table 20 for troubleshooting information.

Ethernet Packet Loss Reference

This section explains terms that are relevant to Ethernet packet loss and provides additional conceptual and problem analysis detail.

Alignment Errors

An Alignment Error indicates a received frame in which both of the following are true:

• The number of bits received is an uneven byte count (that is, not an integral multiple of 8).

• The frame has a Frame Check Sequence (FCS) error.

Alignment Errors often result from MAC layer packet formation problems, cabling problems that cause corrupted or lost data, and packets that pass through more than two cascaded multiport transceivers. See FCS Errors for more information about interpreting Alignment Errors.

Collisions

Collisions indicate that two devices detect that the network is idle and try to send packets at exactly the same time (within one round-trip delay). Because only one device can transmit at a time, both devices must stop sending and attempt to retransmit. Collisions are detected by the transmitting stations.

The retransmission algorithm helps to ensure that the packets do not retransmit at the same time. However, if the two devices retry at nearly the same time, packets can collide again; the process repeats until either the packets finally pass onto the network without collisions, or 16 consecutive collisions occur and the packets are discarded.
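The retransmission behavior described above can be illustrated with a minimal Python sketch of truncated binary exponential backoff, the scheme that standard Ethernet uses: after the nth collision a station waits a random number of slot times between 0 and 2^k - 1, where k is the smaller of n and 10, and it gives up after 16 attempts. The function names and the `collides` callback are hypothetical; the sketch models only the waiting logic, not a real transmitter.

```python
import random

MAX_ATTEMPTS = 16    # the frame is discarded after 16 consecutive collisions
BACKOFF_LIMIT = 10   # the random range stops doubling after the 10th collision


def backoff_slots(collision_count, rng=random):
    """Pick how many slot times to wait after the given consecutive collision."""
    if not 1 <= collision_count < MAX_ATTEMPTS:
        raise ValueError("backoff applies to collisions 1 through 15")
    k = min(collision_count, BACKOFF_LIMIT)
    return rng.randrange(2 ** k)          # uniform integer in 0 .. 2**k - 1


def send_with_backoff(collides, rng=random):
    """Try to send one frame; `collides(attempt)` reports whether it collided.

    Returns (delivered, total_slots_waited).
    """
    waited = 0
    for attempt in range(1, MAX_ATTEMPTS + 1):
        if not collides(attempt):
            return True, waited           # frame made it onto the network
        if attempt == MAX_ATTEMPTS:
            break                         # 16th collision: excessive collisions
        waited += backoff_slots(attempt, rng)
    return False, waited


# Hypothetical scenario: the first two attempts collide, the third succeeds.
delivered, slots = send_with_backoff(lambda attempt: attempt < 3)
print(delivered, slots)
```

Because each station draws its wait independently, two stations that collided are unlikely to pick the same delay repeatedly, which is why most frames get through long before the sixteenth attempt.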

CRC Errors

A Cyclic Redundancy Check (CRC) Error is an RMON statistic that combines FCS Errors and Alignment Errors. These errors indicate that packets were received with:

• A bad FCS and an integral number of octets (FCS Errors)
• A bad FCS and a non-integral number of octets (Alignment Errors)

CRC Errors can cause an end station to freeze. If a large number of CRC Errors are attributed to a single station on the network, replace the station's network interface board. Typically, a CRC Error rate of more than 1 percent of network traffic is considered excessive.
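As a worked example of the 1 percent guideline, the following Python sketch adds FCS Errors and Alignment Errors to form the combined CRC Error count and compares it with total traffic. The function names and the sample counters are hypothetical; in practice the counts would come from your monitoring tool.

```python
def crc_error_rate(fcs_errors, alignment_errors, total_packets):
    """Return combined CRC errors as a fraction of total packets received."""
    if total_packets <= 0:
        return 0.0                       # no traffic, so no meaningful rate
    return (fcs_errors + alignment_errors) / total_packets


def is_excessive(rate, threshold=0.01):
    """Apply the guideline that more than 1 percent is considered excessive."""
    return rate > threshold


# Hypothetical counters for one sampling interval.
rate = crc_error_rate(fcs_errors=120, alignment_errors=45, total_packets=10_000)
print(f"{rate:.2%}", is_excessive(rate))
```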

Excessive Collisions

Excessive Collisions indicate that 16 consecutive collisions have occurred, usually a sign that the network is becoming congested. For each excessive collision count (or after 16 consecutive collisions), a packet is dropped. If you know the normal rate of excessive collisions, then you can determine when the rate of packet loss is affecting your network's performance. See Knowing Your Network's Configuration for more information.

FCS Errors

Frame Check Sequence (FCS) Errors, a type of CRC error, indicate that frames received by an interface are an integral number of octets long but do not pass the FCS check. The FCS is a mathematical way to ensure that all the frame's bits are correct without having the system examine each bit and compare it to the original. Packets with Alignment Errors also generate FCS Errors.

Both Alignment Errors and FCS Errors can be caused by equipment powering up or down or by interference (noise) on unshielded twisted-pair (10BASE-T) segments. In a network that complies with the Ethernet standard, FCS or Alignment Errors indicate bit errors during a transmission or reception. A very low rate is acceptable. Although Ethernet allows a bit error rate of 1 in 10^8, typical Ethernet performance is 1 in 10^12 or better.

Late Collisions

Late Collisions indicate that two devices have transmitted at the same time, but cabling errors (most commonly, excessive network segment length or repeaters between devices) prevent either transmitting device from detecting a collision. Neither device detects a collision because the time to propagate the signal from one end of the network to the other is longer than the time to put the entire packet on the network. As a result, neither of the devices that cause the late collision senses the other's transmission until the entire packet is on the network.

Although late collisions occur for small packets, the transmitter cannot detect them. As a result, a network suffering measurable Late Collisions for large packets is losing small packets as well.

Nonstandard Ethernet Problems

Table 21 lists the symptoms that typically occur if a system violates the Ethernet standard.

Table 21 Symptoms of Common Ethernet Network Problems

Symptoms: FCS Errors and Alignment Errors increase significantly.
Problem: Network cabling is too long.
Notes: If you use a promiscuous network monitor, the number of Late Collisions reported by stations should correlate with the FCS and Alignment Errors reported by the monitor.

Symptoms: FCS and Alignment Errors increase proportionally with interference (sometimes referred to as noise hits).
Problem: Network segment is noisy.
Notes: Typically observed on a 10BASE-T network segment in a noisy environment. If you use multiple promiscuous monitors, the FCS and Alignment Errors among the monitors will not correlate.

If the monitor can track runts, also called Too Short Errors, the number of runt packets should be significantly higher than normal.

Symptoms: FCS and Alignment Errors are much higher than normal.
Problem: Networks do not conform to the access scheme of Carrier Sense Multiple Access with Collision Detect (CSMA/CD).
Notes: Occurs when some implementations of Ethernet in the segment are not entirely compatible with IEEE 802.3 repeaters.

Collision fragments linger on the network long enough to collide with retry packets at the minimum interpacket gap (IPG). The IPG is smaller on one side of the repeated network, causing a lost packet.

Ethernet controllers cannot receive packets that are separated by 4.7 µs or less. Some controllers cannot sustain receptions of packets separated by as much as 9.6 µs. If runt packets are received one after another and are followed by a collision fragment, Ethernet controllers that cannot sustain reception will lose packets.

Receive Discards

Receive Discards indicate that received packets could not be delivered to a high-layer protocol because of congestion or packet errors.

Too Long Errors

A Too Long Error indicates that a packet is longer than 1518 octets (including FCS octets) but otherwise well formed. Too Long Errors are often caused by a bad transceiver, a malfunction of the jabber protection mechanism on a transceiver, or excessive noise on the cable.

Too Short Errors

A Too Short Error, also called a runt, indicates that a packet is fewer than 64 octets long (including FCS octets) but otherwise well formed.
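The error definitions in this reference section can be summarized in a short Python sketch that labels a received frame from two facts: its length in bits and whether its FCS check passed. The function name `classify_frame` is hypothetical, and treating a frame with a good FCS and a few stray trailing bits as acceptable is an assumption of this sketch rather than something stated in the definitions above.

```python
MIN_FRAME_OCTETS = 64      # shorter well-formed frames are runts
MAX_FRAME_OCTETS = 1518    # longer well-formed frames are Too Long


def classify_frame(bit_length, fcs_ok):
    """Label a received frame using the definitions in this section."""
    whole_octets, extra_bits = divmod(bit_length, 8)
    if not fcs_ok:
        # Bad FCS: a non-integral octet count makes it an Alignment Error,
        # an integral octet count makes it an FCS Error.
        return "Alignment Error" if extra_bits else "FCS Error"
    # Good FCS: only the length can be wrong (assumption: stray bits ignored).
    if whole_octets > MAX_FRAME_OCTETS:
        return "Too Long Error"
    if whole_octets < MIN_FRAME_OCTETS:
        return "Too Short Error"
    return "OK"


# Hypothetical frames: (length in bits, FCS check passed?)
samples = [(64 * 8, True), (1519 * 8, True), (60 * 8, True),
           (100 * 8 + 3, False), (100 * 8, False)]
labels = [classify_frame(bits, ok) for bits, ok in samples]
print(labels)
```

Counting both bad-FCS labels together gives the combined figure that RMON reports as CRC Errors.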

Transmit Discards

Transmit Discards indicate that packets were not transmitted because of network congestion.

FDDI Ring Errors

Use these sections to identify and correct FDDI ring errors:

• FDDI Ring Errors Overview
• Identifying Ring Errors

Fiber Distributed Data Interface (FDDI) often corrects its own problems. However, because FDDI cannot correct all errors (especially those related to hardware problems), you should monitor FDDI errors.

Understanding the Problem

FDDI ring errors that you should monitor include:

• Elasticity Buffer Error Condition
• Frame Error Condition
• Frames Not Copied Condition
• Link Error Condition

Identifying the Problem

First, determine the type of FDDI ring errors and where they are occurring. Similar to the way you identify other FDDI problems, identify the upstream and downstream neighbors of the devices that you are monitoring.

Several types of network errors can cause FDDI performance problems. For example, problems with cables or physical connections may result in a link or frame error. Elasticity buffer (EB) errors can also lead to link and frame errors.

FDDI deals with port-related errors as follows:

• The variable PORTlerAlarm is the link error rate (LER) value at which a link connection generates an alarm. When the LER is greater than the alarm setting, Station Management (SMT) sends a Status Report Frame (SRF) to notify you that there is a problem with a port.

The PORTlerAlarm threshold is set lower than the PORTlerCutoff threshold so that you are notified of a problem before the port is actually removed from the ring.

• When link errors reach the threshold defined by the variable PORTlerCutoff, SMT breaks the connection, disabling the PHY that detected the problem. A Link Error Condition is also generated.

FDDI deals with MAC-related errors as follows:

• When MAC frame errors reach a certain threshold, a Frame Error Condition is generated. Because the actual error can be further upstream than the immediate connection, the connection remains intact.

• For a large network, the worst case MACFrameErrorRatio is less than 0.1 percent. However, during network configuration, frame error ratios can reach 50 percent for short periods. When you detect a sustained frame error ratio of more than 0.1 percent, a problem exists between the station that is reporting the condition and the nearest upstream MAC.
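The distinction between a short spike and a sustained frame error ratio can be sketched in Python. The function below reports a problem only when the most recent samples all exceed the threshold, so that a brief spike during network configuration is ignored. The default threshold of 0.001 assumes the guideline figure is 0.1 percent, and the function name and the choice of three consecutive samples are hypothetical values picked for illustration.

```python
def sustained_frame_errors(ratios, threshold=0.001, min_samples=3):
    """Return True if the last `min_samples` ratios all exceed `threshold`.

    `ratios` is a chronological list of frame error ratios (fractions, so
    0.001 means 0.1 percent). `min_samples` is an assumed definition of
    "sustained"; tune it to your polling interval.
    """
    if len(ratios) < min_samples:
        return False                      # not enough history to judge
    return all(r > threshold for r in ratios[-min_samples:])


# Hypothetical histories, oldest sample first.
spike_then_quiet = [0.5, 0.0004, 0.0003, 0.0002]   # configuration spike only
steady_problem = [0.0002, 0.002, 0.003, 0.0025]    # stays above 0.1 percent
print(sustained_frame_errors(spike_then_quiet), sustained_frame_errors(steady_problem))
```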

Solving the Problem

To solve problems related to FDDI errors, fix the hardware, cabling, or congestion problem.

Identifying Ring Errors

Use Status Watch to monitor your FDDI devices for Warning or Critical alerts.

Status Watch

Use Status Watch to identify FDDI ring errors.

Follow these steps:

1. Monitor the FDDI Status tool for the currently selected device.

2. Determine whether Status Watch is reporting Elasticity Buffer Errors or a high percentage of Frame Errors, Frames Not Copied, or Link Error Rates for the currently selected device.

FDDI Ring Errors Reference

This section provides additional conceptual and problem analysis detail.

Elasticity Buffer Error Condition

The Elasticity Buffer Error condition occurs when a port's elasticity buffer overflows or underflows. This condition usually indicates that a port's hardware is not operating within the tolerances that the FDDI standard specifies. Look for the problem in the hardware of either the port that is reporting the condition or of the immediately adjacent port.

Frame Error Condition

The Frame Error condition occurs when the percentage of frames that contain errors exceeds a preset threshold. When a device is an uplink to FDDI (that is, a device is transmitting onto FDDI), this type of condition indicates that the ring is saturated. The ring is out of buffer space, and packets are being dropped from the device's backbone port.

The problem indicated by the frame errors is usually located between the MAC that reports the condition and its upstream neighbor. Because many physical connections can lie along this path, the MACFrameErrorRatio variable identifies only the two MACs between which the problem is occurring.

Frames Not Copied Condition

The Frames Not Copied condition occurs when the percentage of frames that are dropped because of insufficient buffer space exceeds a preset threshold. This condition indicates that the station is congested and is unable to process frames as quickly as they arrive. To help eliminate congestion:

• Add more capacity to the station
• Reconfigure your network so that end stations that communicate heavily with one another are on the same bridge or switch
• Filter out certain traffic

Link Error Condition

The Link Error condition occurs when a port detects link errors at a rate that exceeds a preset threshold. When the Link Error threshold is exceeded, the station removes itself from the ring and tries to reinsert itself on the ring. This action creates a MAC Neighbor Change Event (which also occurs if a ring wraps).

Link errors may indicate an FDDI PHY hardware problem (such as a faulty transmitter) or a faulty cable or connector. Look for the problem in the portion of the network between the port that is reporting the condition and the first upstream transmitter.

MAC Neighbor Change Event

The MAC Neighbor Change event occurs when a MAC's upstream or downstream neighbor changes.

This event indicates either:

• A network reconfiguration
• Another station that is leaving or joining the ring

Network File Server Timeouts

Use these sections to identify and correct timeouts on network file servers:

• Network File Server Timeout Overview
• Looking for Obvious Errors
• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

See Network File Server Timeouts Reference for additional conceptual and problem analysis detail.

A network file server can time out if your network gets congested or if your server is having problems. Users might have problems downloading data from or to the server, or copying files from or to the server. To help you to understand the troubleshooting process for this type of problem, an EXAMPLE throughout this section follows the symptoms, analysis, and resolution of a typical file server timeout problem.

Understanding the Problem

When users log in, their stations make network file server calls either to determine quotas (if this feature has been enabled) or to mount user home directories. The network file server timeout messages, even when spread across multiple nodes, indicate a problem either with the network or with a server.

EXAMPLE: UNIX users notice that it takes a long time - over 30 seconds in some cases - to log in to any machine. Some machines report network file server timeout messages, but the messages have no obvious pattern and are infrequent. You begin to get a sense of the problem.

Identifying the Problem

First, rule out the obvious causes. Ask these questions:

• Can you access the network file server with Telnet?
• Have any alarms been triggered?
• Are there any new errors?

The process of identifying the problem is developed in Looking for Obvious Errors.

Solving the Problem

To determine the cause, reproduce the fault while you monitor the network. After you know the cause, you can fix the problem.

The solutions to the network file server timeout are identified in these sections:

• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

Looking for Obvious Errors

To look for obvious errors, use these applications:

• Ping and Telnet - To check connectivity to the network file server nodes
• LANsentry Manager Alarms View - To search for triggered alarms
• LANsentry Manager Statistics View - To look for errors
• LANsentry Manager History View - To identify trends


Ping and Telnet

Determine whether you can contact network file server nodes using Ping and Telnet. If the response is extremely slow, then a problem may exist with the connections to the nodes. No delay indicates that the connections are normal, implying that the delay is occurring elsewhere. In this case, use LANsentry® Manager tools to determine whether packets are being lost or ignored.
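As a rough illustration of this connectivity check, the following Python sketch times a TCP connection attempt to a server port. It is not part of the original procedure, which uses the Ping and Telnet commands directly. The function name connect_time and the 5-second default timeout are assumptions chosen for the example.

```python
import socket
import time


def connect_time(host, port, timeout=5.0):
    """Return the seconds taken to open a TCP connection, or None on failure.

    A slow result suggests a problem on the path to the node; a fast result
    suggests that the delay is occurring elsewhere.
    """
    start = time.monotonic()
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return None
```

For example, connect_time("fileserver", 23) would time a connection to the Telnet port of a host named fileserver (a hypothetical name).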

LANsentry Manager Alarms View

Using the LANsentry Manager Alarms View, you can determine if any configured alarms have been triggered.

Search the Alarms View to see if any MAC events have been logged.

EXAMPLE: MAC events have not been logged for the network to which the UNIX users are attached.

Even though no alarms have occurred, errors may exist. For example, a low rate of background errors may exist just below the alarm threshold. Because RMON alarms are based on maximum and minimum threshold values, they may miss constant, periodic, or low levels of errors.
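The following Python sketch illustrates how a threshold alarm can stay silent while errors accumulate. It is a deliberately simplified model (a single rising threshold, no falling threshold or hysteresis), and the function name and sample values are hypothetical.

```python
def triggered_alarms(samples, rising_threshold):
    """Return the indexes of samples whose error count reaches the threshold."""
    return [i for i, errors in enumerate(samples) if errors >= rising_threshold]


# Hypothetical error counts per sampling interval: always just under 10.
background = [8, 9, 8, 9, 8, 9]

# No alarm fires, yet the segment has accumulated many errors.
alarms = triggered_alarms(background, rising_threshold=10)
total_errors = sum(background)
```

This is why the Statistics View and History View are still worth checking when the Alarms View is empty.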

LANsentry Manager Statistics View

Using the LANsentry Manager Statistics View, you can display a multisegment graph of utilization and error statistics.

Follow these steps:

1. Set up a graph that shows utilization and errors on all your major segments.

2. Determine whether any segments are particularly busy or error prone.

EXAMPLE: You notice that one segment of the UNIX network, HUB3, is reporting Too Long Errors and FCS Errors roughly every second sample. While the amount is not higher than normal, it is currently higher than on any other segment.

LANsentry Manager History View

Using the LANsentry Manager History View, you can display a rolling history table to determine if the errors that you are seeing are new. For example, if you have a history table that runs for 30-minute samples over two days, you can compare the most recent sample to a previous sample, looking for new errors. If your probe has the resources, use a much finer resolution sample stored for a shorter time (every 30 seconds for 2 hours) to more easily spot recent errors.
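The comparison of a recent history sample with an earlier one can be sketched in Python as follows. The dictionary layout and counter names are assumptions for illustration; they are not the format used by LANsentry Manager.

```python
def new_error_types(previous, recent):
    """Return the error counters that are nonzero now but were zero before."""
    return sorted(
        name
        for name, count in recent.items()
        if count > 0 and previous.get(name, 0) == 0
    )


# Hypothetical samples keyed by error counter name.
earlier_sample = {"fcs_errors": 0, "too_long_errors": 0, "collisions": 3}
latest_sample = {"fcs_errors": 2, "too_long_errors": 1, "collisions": 4}

new_errors = new_error_types(earlier_sample, latest_sample)
```

Counters that were already nonzero in the earlier sample (collisions here) are treated as background and are not reported as new.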

EXAMPLE: You see that the history table shows that no error rates remained constant throughout the day. However, errors that did occur were on the device HUB3 and were Too Long Errors and FCS Errors.

Reproducing the Fault While Monitoring the Network

Although the RMON View in LANsentry Manager can show error rates and help you to identify the location of the problem, it may not provide enough data to solve the problem. To determine the cause of the problem, reproduce it while you monitor the network by using these applications:

• LANsentry Manager Top-N Graph - To locate a quiet node to use for reproducing the fault

• LANsentry Manager Packet Capture - To capture packets from the hub to which the quiet node is attached

• LANsentry Manager Packet Decode - To analyze the packets to assess network file server traffic and delays

• Address Tracker - To find the location of the problem nodes

EXAMPLE: Using LANsentry Manager, you find a hub on the network with a higher than normal error rate. However, the error rate does not seem high enough to cause login delays of 60 seconds or more.

LANsentry Manager Top-N Graph

Using the Top-N graph in the LANsentry Manager main window, locate a quiet node that has been showing the same problem. Choose a quiet node so that you do not receive excessive traffic when you try to isolate the problem.

EXAMPLE: You see that the node Monolith, which has the same Network File System (NFS) mounts as the other nodes on the network, is quiet. You decide to use this node for reproducing the fault. See Network File System (NFS) Protocol for more information about NFS.

LANsentry Manager Packet Capture

Using the LANsentry Manager Packet Capture application, capture packets from the network using predefined patterns and start-and-stop conditions.

Follow these steps:

1. Set up a capture buffer on a probe that is connected to the same hub as the quiet node. Until you know more about the problem, set a very general filter.

EXAMPLE: You select a MAC-layer filter and set a conversation filter to capture all packets to and from Monolith.

2. Telnet into and log out from the quiet node. Then reset the capture buffer. Repeat this procedure until you see the problem reflected in your captured data. To keep the buffer information clean, reset the buffer each time that you repeat the procedure.

3. When you see the delay, note the rough value of the packet count on the LANsentry Manager packet buffer.

By noting the packet count at which you think the delay has occurred, you can narrow the problem to within about 20 packets in the buffer. If you have used an extremely quiet node, you may even identify the exact packet.

LANsentry Manager Packet Decode

The LANsentry Manager Packet Decode application decodes all major protocols and displays the packet contents at three levels of detail: summary information, header information, and actual packet content.

Follow these steps:

1. Open the buffer in the Packet Decode application and locate the number of the packet at which the delay occurred.

2. Select the packet and launch a MAC-layer conversation filter. In the filter display, look for a gap in the conversation (that is, where the node sent a request and then resent it at approximately the same interval as the delay you experienced when recreating the problem).

3. Repeat the test to determine if the result concentrates on one node or if it appears on other nodes.

EXAMPLE: On the quiet node that you selected, the delay is obvious. You see an NFS request going out to a node and a repeat of the request 30 seconds later. During that time, the node did not respond. You now know that the delay occurred because nodes were not seeing responses for NFS requests. When you repeat the test on other nodes, you find that the delay is happening with more than one destination node.
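The search for a gap in the conversation can be sketched in Python. This is a minimal illustration that assumes the captured packets have already been reduced to tuples of (timestamp, source, destination, request identifier); that tuple layout, the function name, and the 5-second minimum gap are all assumptions, not the Packet Decode data format.

```python
def find_retry_gaps(packets, min_gap=5.0):
    """Find requests that were sent again after at least min_gap seconds.

    packets: iterable of (timestamp_seconds, source, destination, request_id).
    Returns a list of (conversation_key, first_time, retry_time, gap).
    """
    last_seen = {}
    gaps = []
    for timestamp, source, destination, request_id in sorted(packets):
        key = (source, destination, request_id)
        if key in last_seen:
            gap = timestamp - last_seen[key]
            if gap >= min_gap:
                gaps.append((key, last_seen[key], timestamp, gap))
        # Remember the most recent transmission of this request.
        last_seen[key] = timestamp
    return gaps


# Hypothetical capture: the second request is repeated 30 seconds later.
capture = [
    (0.0, "monolith", "server1", 1),
    (0.1, "server1", "monolith", 1),
    (1.0, "monolith", "server2", 2),
    (31.0, "monolith", "server2", 2),
    (31.1, "server2", "monolith", 2),
]
retries = find_retry_gaps(capture)
```

A gap that matches the delay you observed while logging in (30 seconds in the example) points to a request that received no reply.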


Address Tracker

Use Address Tracker, which polls managed devices, to determine the hubs to which the problem nodes are attached. If the problem end stations are located on unmanaged devices, then you can at least narrow the problem to those unmanaged devices.

EXAMPLE: Although your network does not have managed hubs that Transcend® NCS management software can poll, it does have managed switches. When it polls the switches, Address Tracker displays the switch ports on which addresses were last seen. This information indicates the hub (but not the hub port) on which the device is located.

LANsentry Manager Packet Decode

After you know the location of the hub that has the problem node, monitor the problem from the hub using LANsentry Manager Packet Decode.

Follow these steps:

1. To capture packets from one of the nodes on the hub, set up another capture buffer and repeat the exercise that is described in LANsentry Manager Packet Capture. Because a delay may occur on a different node, use two capture buffers without stopping the first one.

Note the rough packet count where the delay appears.

2. Display a conversation filter of the packet where the delay appears and look for the gap in the conversation.

EXAMPLE: You hope that the nodes are on the same hub. You find that all the nodes are on HUB3. This result indicates that FCS Errors may be causing the timeouts. However, because the errors occur at a low rate, you decide to verify this diagnosis. You monitor the problem from the hub, logging in and out many times, and the delay eventually occurs. This time the delay shows that the node's reply had an FCS Error even though the node received the request. The switch would not have transmitted this packet, causing a timeout on the NFS protocol. The retry time is presumably 30 seconds. During this test, you see the problem occurring on another node.

Correcting the Fault


Without a managed hub, you may find it very difficult to discover network file server timeout errors. To find the problematic node, you must either systematically isolate nodes by monitoring each node for a prolonged period or temporarily insert a managed hub.

EXAMPLE: You notice that the captured error packet failed FCS because it was corrupted by a regular pattern during transmission. A possible reason for this occurrence is a Jabbering node. This explanation makes sense because FCS/Jabber frames increased linearly when you were monitoring the live network.

Network File Server Timeouts Reference

This section explains terms that are relevant to network file server timeouts and provides additional conceptual and problem analysis detail.

Jabbering

Jabbering occurs when a node transmits illegal-length packets and is possibly not operating within carrier specifications. In effect, another node has written bad data over a valid packet. This bad data is often interpreted as a repeated sequence of data.

Network File System (NFS) Protocol

A distributed file system protocol developed by Sun Microsystems that allows a computer system to access files over a network as if they were on its local disks. This protocol has been incorporated into products by more than 200 companies. It is now a de facto Internet standard.

NFS is one protocol in the NFS suite of protocols, which includes NFS, RPC, XDR (External Data Representation), and others. These protocols are part of a larger architecture that Sun Microsystems refers to as Open Network Computing (ONC). ONC is a distributed applications architecture designed by Sun and currently controlled by a consortium led by Sun.

Troubleshooting TCP/IP - Detailed Steps

This article shows how to troubleshoot TCP/IP connectivity between computers on a Windows network. If you haven't already done so, disable XP's Internet Connection Firewall on all local area network connections and remove all firewall programs on the network. Improperly configured firewalls are the most common cause of TCP/IP problems.


Open a Command Prompt Window

For many of these steps, you'll be typing at the command prompt. To open a command prompt window in Windows 2000 or XP, click Start | Run, type cmd in the box, and click OK. To open a command prompt window in Windows 95, 98, or Me, click Start | Run, type command in the box, and click OK. Type one command per line and press Enter after each one to execute it. To close the command prompt window, use the exit command.

Determine the TCP/IP Settings

Determine the TCP/IP settings of each computer on the local area network. In XP, open the Network Connections folder, right click the LAN connection, and click Status | Support | Details. For example, here are the Status and Details views for the LAN connection on an Internet Connection Sharing host.

In Windows 95/98/Me, click Start | Run, type winipcfg in the box, and click OK. Select the LAN adapter from the menu and click More Info. Here's the winipcfg view for an ICS client running Windows Me.

You can also see the TCP/IP settings from the command prompt. This is especially convenient if a computer has more than one network adapter. Use the ipconfig /all command, which is available in all versions except Windows 95. The output from this command can be long, so it's best to write it to a file. Specify the file name in the command this way:

ipconfig /all >ipconfig.txt
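Once the settings are saved to a file, they can be pulled out with a short script. The following Python sketch assumes that each setting appears on its own line as a label padded with dots, then a colon, then the value; the sample text is hypothetical, and real output varies by Windows version and language.

```python
def parse_ipconfig(text):
    """Collect 'label . . . : value' lines from ipconfig-style output.

    If a label appears more than once (for example, with several adapters),
    only the first occurrence is kept.
    """
    settings = {}
    for line in text.splitlines():
        label, separator, value = line.partition(" : ")
        if not separator:
            continue  # Not a 'label : value' line.
        name = label.replace(".", "").strip()
        if name and name not in settings:
            settings[name] = value.strip()
    return settings


# Hypothetical excerpt of saved output.
sample = (
    "        IP Address. . . . . . . . . . . . : 192.168.1.101\n"
    "        Subnet Mask . . . . . . . . . . . : 255.255.255.0\n"
    "        Default Gateway . . . . . . . . . : 192.168.1.1\n"
)
settings = parse_ipconfig(sample)
```

To use it on the saved file, read the file contents into a string and pass that string to parse_ipconfig.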

Description of TCP/IP Settings

Here are the TCP/IP settings that are used in network troubleshooting:

• IP Address - Unique address assigned to a network adapter. A computer with multiple network adapters has an IP address for each one, and each one must be in a different subnet.

• Subnet Mask - Used in conjunction with the IP address to determine which subnet an adapter belongs to. At the simplest level, communication is only possible between two network adapters when they're in the same subnet.

• Default Gateway - IP address of a computer or router, on one of this computer's local area networks, that knows how to communicate with subnets not present on this computer. For an Internet connection, the default gateway is a router belonging to your Internet service provider, and all access to sites on the Internet goes through it. For an ICS client, the default gateway is the ICS host. If you use a hardware router, it serves as the default gateway.

• DHCP Server - If an adapter is configured to obtain an IP address automatically, this is the address of the server that provides it. It could be your ISP, an ICS host, or a hardware router.

• DNS Servers - IP address of one or more Domain Name Server computers. DNS servers translate Internet names to their IP addresses (like 63.146.109.227).
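To make the role of the subnet mask concrete, here is a small Python sketch that uses the standard ipaddress module to test whether two adapters are in the same subnet. The function name and the example addresses are illustrative assumptions.

```python
import ipaddress


def same_subnet(ip_a, ip_b, netmask):
    """Return True if ip_b falls inside the subnet defined by ip_a and netmask."""
    # strict=False lets us pass a host address instead of a network address.
    network = ipaddress.ip_network(f"{ip_a}/{netmask}", strict=False)
    return ipaddress.ip_address(ip_b) in network


# With a mask of 255.255.255.0, only the last number may differ.
local_pair = same_subnet("192.168.1.101", "192.168.1.123", "255.255.255.0")
remote_pair = same_subnet("192.168.1.101", "192.168.2.5", "255.255.255.0")
```

When the check is false, traffic between the two addresses has to go through the default gateway.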

Subnets

See our article on subnets for a brief description of how they work. For more details, see this Microsoft Knowledge Base article.

If two computers are supposed to be on the same subnet but aren't, something is wrong with the network hardware or software configuration. This is most likely to happen when one of them receives an IP address of 169.254.x.x, which indicates that:

• It's configured to obtain an IP address automatically.
• It couldn't find a DHCP server on the network to make the assignment.
• Windows assigned it an Automatic Private IP Address.
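A quick way to spot such an address in a script is to test it against the 169.254.0.0/16 range. This Python sketch uses the standard ipaddress module; the function name is an assumption for illustration.

```python
import ipaddress

# The range from which Windows takes Automatic Private IP Addresses.
APIPA_RANGE = ipaddress.ip_network("169.254.0.0/16")


def is_apipa(address):
    """Return True if the address is in the 169.254.x.x range."""
    return ipaddress.ip_address(address) in APIPA_RANGE
```

An adapter for which is_apipa returns True did not get its address from a DHCP server.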

See our article on Specific Networking Problems and Their Solutions for more information.

Pinging

The ping command is the basic tool for testing TCP/IP connectivity. It sends a special packet (called ICMP Echo) to a particular IP address and looks for a reply. If everything is working right, the reply comes back. If not, the ping times out in a few seconds. By default, the ping command repeats the process four times. Here's an example of an ICS client computer pinging a Windows XP Home Edition ICS host, using the host's IP address and its computer name.

When ping fails, you'll see one of these error messages:

• Request timed out - The IP address is valid, but there's no reply from it. If the IP address is on a local area network, the most likely cause is a firewall program blocking the ping.

• Unknown host <name> or Ping request could not find host <name> - The computer name doesn't exist on the local area network. Make sure that NetBIOS over TCP/IP is enabled.

• Destination host unreachable - The IP address isn't on a local area network, and the default gateway can't access it. Either there's no default gateway, its address is wrong, or it isn't functioning.
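The three messages can be mapped to their likely causes with a simple text match. The following Python sketch is illustrative only: it assumes English-language ping output containing the phrases listed above, and the function name and return strings are made up for the example.

```python
def classify_ping_output(output):
    """Map ping output text to the likely cause described above."""
    text = output.lower()
    if "request timed out" in text:
        return "no reply: address is valid, a firewall may be blocking ping"
    if "unknown host" in text or "could not find host" in text:
        return "name not found: check NetBIOS over TCP/IP"
    # Checked before 'reply from' because the unreachable message
    # can itself arrive as a reply from a gateway.
    if "destination host unreachable" in text:
        return "unreachable: check the default gateway"
    if "reply from" in text:
        return "ok"
    return "unrecognized output"
```

The function only inspects text, so it can be applied to output captured from a command prompt or saved to a file.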


Pinging the Local Area Network

Here is a series of ping commands to use in finding where a problem occurs on a local area network. Run them in the order shown, and don't go on to the next command until all of the previous commands work properly. In this example:

• The computer being tested is named Winxp, with IP address 192.168.1.101.
• There's another computer on the network named Win98, with IP address 192.168.1.123.

Substitute the appropriate IP addresses and computer names for your network.

Command              Target                          What Ping Failure Indicates
ping 127.0.0.1       Loopback address                Corrupted TCP/IP installation
ping localhost       Loopback name                   Corrupted TCP/IP installation
ping 192.168.1.101   This computer's IP address      Corrupted TCP/IP installation
ping winxp           This computer's name            Corrupted TCP/IP installation
ping 192.168.1.123   Another computer's IP address   Bad hardware or NIC driver
ping win98           Another computer's name         NetBIOS name resolution failure

To fix a corrupted TCP/IP installation on Windows XP, follow the steps in this Microsoft Knowledge Base article. For Windows 95/98/Me, un-install the TCP/IP protocol in Control Panel | Network, reboot, and re-install it. If that doesn't fix it, use this procedure on Windows 95 or 98.
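The ordered sequence of pings, stopping at the first failure, can be sketched in Python. The dotted addresses 192.168.1.101 and 192.168.1.123 are my reading of the example addresses in this section, and the names CHECKS, system_ping, and first_failure are hypothetical. The sketch relies on the exit status of the ping command, which is only an approximation: on some Windows versions ping can exit with status 0 even when the reply is "Destination host unreachable".

```python
import platform
import subprocess

# (target, what a failure indicates), in the order they should be run.
CHECKS = [
    ("127.0.0.1", "Corrupted TCP/IP installation"),
    ("localhost", "Corrupted TCP/IP installation"),
    ("192.168.1.101", "Corrupted TCP/IP installation"),
    ("winxp", "Corrupted TCP/IP installation"),
    ("192.168.1.123", "Bad hardware or NIC driver"),
    ("win98", "NetBIOS name resolution failure"),
]


def system_ping(target):
    """Send one echo request with the system ping command."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    result = subprocess.run(
        ["ping", count_flag, "1", target],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def first_failure(checks, ping=system_ping):
    """Run the checks in order; return the first failing (target, meaning)."""
    for target, meaning in checks:
        if not ping(target):
            return target, meaning
    return None  # Every ping succeeded.
```

Passing the ping function as a parameter makes it possible to try the logic with a stand-in function before running it against a live network.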

Pinging the Internet

You can also use ping to find a problem with Internet access. Run these commands in the order shown, and don't go on to the next command until all of the previous commands work properly. Use the Default Gateway and DNS Server addresses that you got from the winipcfg or ipconfig /all command.


Command                  Target                What Ping Failure Indicates
ping w.x.y.z             Default Gateway       Default Gateway down
ping w.x.y.z             DNS Server            DNS Server down
ping w.x.y.z             Web site IP address   Internet service provider or web site down
ping www.something.com   Web site name         DNS Server down or web site down

REFERENCE

WWW.PRACTICALLYNETWORK.COM

WWW.COMPUTERHOPE.COM

WWW.WIKIPEDIA.ORG



  • Basic Network Troubleshooting
  • Basic network troubleshooting
    • Issue
    • Cause
    • Solution
      • Verify connections LEDs
      • Adapter resources
      • Adapter functionality
      • Protocol
      • Firewall
      • Additional time
      • Additional troubleshooting
        • Network Troubleshooting Overview
          • About Connectivity Problems
          • About Performance Problems
          • Solving Connectivity and Performance Problems
          • Recognizing Symptoms
            • User Comments
            • Network Management Software Alerts
            • Analyzing Symptoms
              • Understanding the Problem
              • Identifying and Testing the Cause of the Problem
                • Sample Problem Analysis
                • Equipment for Testing
                  • Solving the Problem
                    • Your Network Troubleshooting Toolbox
                      • Transcend Central
                      • Status Watch
                        • Web Reporter
                          • Address Tracker
                          • LANsentry Manager
                          • Traffix Manager
                          • Device View
                          • Ping
                            • Strategies for Using Ping
                            • Tips on Interpreting Ping Messages
                              • Telnet
                              • FTP and TFTP
                              • Analyzers
                              • Probes
                              • Cable Testers
                              • Knowing Your Networks Configuration
                                • Site Network Map
                                • Device Configuration Information
                                • Other Important Data About Your Network
                                  • Identifying Your Networks Normal Behavior
                                    • Baselining Your Network
                                    • Identifying Background Noise
                                    • Identifying Background Noise
                                        • FDDI Connectivity
                                          • Understanding the Problem
                                          • Identifying the Problem
                                          • Solving the Problem
                                          • Status Watch
                                          • Implementing Dual Homing
                                          • Installing an Optical Bypass Unit
                                          • Peer Wrap Condition
                                          • Twisted Ring Condition
                                          • Undesired Connection Attempt Event
                                          • Address Tracker
                                          • Device View
                                          • Broadcast Packets
                                          • Multicast Packets
                                            • Duplicate Addresses
                                              • Understanding the Problem
                                              • Identifying the Problem
                                              • Solving the Problem
                                              • Status Watch
                                              • Address Tracker
                                              • LANsentry Manager
                                              • Duplicate MAC Addresses
                                              • Duplicate IP Addresses
                                                • Ethernet Packet Loss
                                                  • Understanding the Problem
                                                  • Identifying the Problem
                                                  • Solving the Problem
                                                  • Status Watch
                                                  • LANsentry Manager Network Statistics Graph
                                                  • Device View
                                                  • Alignment Errors
                                                  • Collisions
                                                  • CRC Errors
                                                  • Excessive Collisions
                                                  • FCS Errors
                                                  • Late Collisions
                                                  • Nonstandard Ethernet Problems
                                                  • Receive Discards
                                                  • Too Long Errors
                                                  • Too Short Errors
                                                  • Transmit Discards
                                                    • FDDI Ring Errors
                                                      • Understanding the Problem
                                                      • Identifying the Problem
                                                      • Solving the Problem
                                                      • Status Watch
                                                      • Elasticity Buffer Error Condition
                                                      • Frame Error Condition
                                                      • Frames Not Copied Condition
                                                      • Link Error Condition
                                                      • MAC Neighbor Change Event
                                                        • Network File Server Timeouts
                                                          • Understanding the Problem
                                                          • Identifying the Problem
                                                          • Solving the Problem
                                                          • Ping and Telnet
                                                          • LANsentry Manager Alarms View
                                                          • LANsentry Manager Statistics View
                                                          • LANsentry Manager History View
                                                          • LANsentry Manager Top-N Graph
                                                          • LANsentry Manager Packet Capture
                                                          • LANsentry Manager Packet Decode
                                                          • Address Tracker
                                                          • LANsentry Manager Packet Decode
                                                          • Jabbering
                                                          • Network File System (NFS) Protocol
                                                          • Troubleshooting TCPIP - Detailed Steps
                                                          • Open a Command Prompt Window
                                                          • Determine the TCPIP Settings
                                                          • Description of TCPIP Settings
                                                          • Subnets
                                                          • Pinging
                                                          • Pinging the Local Area Network
                                                          • Pinging the Internet
Page 11: 11591337-Basic-Network-Troubleshooting

• Valid service - Is the device configured properly for the type of service it is supposed to provide? For example, has Quality of Service (QoS), which is the definition of the transmission parameters, been established?

• Restricted access - Is an end station supposed to be able to connect with a specific device, or is that connection restricted? For example, is a firewall set up that prevents that device from accessing certain network resources?

• Correct configuration - Is there a misconfiguration of IP address, subnet mask, gateway, or broadcast address? Network problems are commonly caused by misconfiguration of newly connected or configured devices. See Manager-to-Agent Communication for more information.

Identifying and Testing the Cause of the Problem

After you develop a theory about the cause of the problem, test your theory. The test must conclusively prove or disprove your theory.

Two general rules of troubleshooting are:

• If you cannot reproduce a problem, then no problem exists unless it happens again on its own.

• If the problem is intermittent and you cannot replicate it, you can configure your network management software to catch the event in progress.

For example, with LANsentry Manager, you can set alarms and automatic packet capture filters to monitor your network and inform you when the problem occurs again. See Configuring Transcend NCS for more information.

Although network management tools can provide a great deal of information about problems and their general location, you may still need to swap equipment or replace components of your network until you locate the exact trouble spot.

After you test your theory, either fix the problem as described in Solving the Problem or develop another theory.

Sample Problem Analysis

This section illustrates the analysis phase of a typical troubleshooting incident.

On your network, a user cannot access the mail server. You need to establish two areas of information:

• What you know - In this case, the user's workstation cannot communicate with the mail server.

• What you do not know and need to test -

  • Can the workstation communicate with the network at all, or is the problem limited to communication with the server? Test by sending a Ping or by connecting to other devices.

  • Is the workstation the only device that is unable to communicate with the server, or do other workstations have the same problem? Test connectivity at other workstations.

  • If other workstations cannot communicate with the server, can they communicate with other network devices? Again, test the connectivity.

The analysis process follows these steps:

1. Can the workstation communicate with any other device on the subnetwork?

• If no, then go to step 2.
• If yes, determine if only the server is unreachable.

  • If only the server cannot be reached, this suggests a server problem. Confirm by doing step 2.

  • If other devices cannot be reached, this suggests a connectivity problem in the network. Confirm by doing step 3.

2. Can other workstations communicate with the server?

• If no, then most likely it is a server problem. Go to step 3.
• If yes, then the problem is that the workstation is not communicating with the subnetwork. (This situation can be caused by workstation issues or a network issue with that specific station.)

3. Can other workstations communicate with other network devices?

• If no, then the problem is likely a network problem.
• If yes, the problem is likely a server problem.
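The three steps can be written down as a small decision function. The Python sketch below is a literal transcription of the steps as worded above, not an improved procedure; the function name, parameter names, and return strings are assumptions made for the example.

```python
def diagnose(ws_reaches_any_device, only_server_unreachable,
             others_reach_server, others_reach_devices):
    """Follow steps 1 to 3 and return 'workstation', 'server', or 'network'."""
    # Step 1: decide which confirmation step to run next.
    if not ws_reaches_any_device or only_server_unreachable:
        next_step = 2
    else:
        next_step = 3  # Other devices are unreachable too.

    # Step 2: can other workstations communicate with the server?
    if next_step == 2:
        if others_reach_server:
            return "workstation"
        next_step = 3  # Most likely a server problem; confirm in step 3.

    # Step 3: can other workstations communicate with other devices?
    return "server" if others_reach_devices else "network"
```

For example, if the workstation can reach other devices, only the server is unreachable, other workstations also cannot reach the server, and they can reach everything else, the result is "server".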

When you determine whether the problem is with the server, subnetwork, or workstation, you can further analyze the problem as follows:

• For a problem with the server - Examine whether the server is running, if it is properly connected to the network, and if it is configured appropriately.

• For a problem with the subnetwork - Examine any device on the path between the users and the server.

• For a problem with the workstation - Examine whether the workstation can access other network resources and if it is configured to communicate with that particular server.


Equipment for Testing

To help identify and test the cause of problems, have available:

• A laptop computer that is loaded with a terminal emulator, TCP/IP stack, TFTP server, CD-ROM drive (to read the online documentation), and some key network management applications, such as LANsentry® Manager. With the laptop computer, you can plug into any subnetwork to gather and analyze data about the segment.

• A spare managed hub to swap for any hub that does not have management. Swapping in a managed hub allows you to quickly spot which port is generating the errors.

• A single port probe to insert in the network if you are having a problem where you do not have management capability.

• Console cables for each type of connector, labeled and stored in a secure place.

Solving the Problem

Many device or network problems are straightforward to resolve, but others yield misleading symptoms. If one solution does not work, continue with another.

A solution often involves:

• Upgrading software or hardware (for example, upgrading to a new version of agent software or installing Gigabit Ethernet devices)

• Balancing your network load by analyzing:
  • What users communicate with which servers
  • What the user traffic levels are in different segments

  Based on these findings, you can decide how to redistribute network traffic.

• Adding segments to your LAN (for example, adding a new switch where utilization is continually high)

• Replacing faulty equipment (for example, replacing a module that has port problems or replacing a network card that has a faulty jabber protection mechanism)

To help solve problems, have available:

• Spare hardware equipment (such as modules and power supplies), especially for your critical devices

• A recent backup of your device configurations to reload if flash memory gets corrupted (which can sometimes happen due to a power outage)


Your Network Troubleshooting Toolbox

A robust network troubleshooting toolbox consists of items (such as network management applications, hardware devices, and other software) to recognize, diagnose, and solve networking problems. It contains:

• Transcend Applications
• Network Management Platforms
• 3Com SmartAgent Embedded Software
• Other Commonly Used Tools

Transcend management software is optimized for managing 3Com devices and their attached networks. However, some applications, such as LANsentry Manager, can manage any vendor's networking equipment that complies with the Remote Monitoring (RMON) Management Information Base (MIB).

This section describes these Transcend applications, which you can use to troubleshoot your network:

• Transcend Central
• Status Watch
• Address Tracker
• LANsentry Manager
• Traffix Manager
• Device View

Transcend Central

Start with Transcend Central, which is an asset management and device grouping application, to understand what your network consists of and to control the Transcend NCS network management troubleshooting tools. Transcend Central is available as both a native Windows application and a Java application that you can access using a Web browser.

Using Transcend Central for troubleshooting, you can:

• Display an inventory of device, module, and port information.
• Group devices to make your troubleshooting tasks easier. By managing a collection of devices, you can simultaneously perform the same tasks on each device in a group and locate physical or logical problems on your network.
• Launch Transcend NCS applications, including some of your primary Transcend NCS troubleshooting tools:
  • Status Watch (includes Web Reporter, from the Java version)
  • Address Tracker
  • LANsentry Manager
  • Traffix Manager
  • Device View

Status Watch

The Status Watch applications manage 3Com devices and their attached networks. Status Watch applications primarily poll for MIB-II data. Status Watch is a performance monitoring application that allows you to monitor the operational status of your network devices and quickly identify any problems that require your attention. It works in conjunction with Web Reporter.

Web Reporter

Web Reporter is a data-reporting application that runs in a World Wide Web (WWW) browser. It generates reports from data that Status Watch collects, allowing you to compare network statistics against a baseline.

Address Tracker

Address Tracker is an address collection and discovery application that:

• Polls managed devices for all MAC addresses
• Polls managed devices and routers for IP addresses to perform MAC-to-IP address translation
• Uses Device View to disable troublesome ports

LANsentry Manager

LANsentry Manager is a set of integrated applications that displays and explores the real-time and historical data that RMON-compliant devices (probes) on the network capture. LANsentry Manager uses SNMP polling to gather RMON and RMON2 data from the probes.

Use LANsentry Manager to:

• Monitor current performance of network segments
• See trends over time
• Spot signs of current problems
• Configure alarms to monitor for specific events
• Capture packets and display their contents

LANsentry Manager works with any device (from 3Com or other vendors) that supports the RMON MIB or the RMON2 MIB.

Traffix Manager

Traffix Manager is a performance-monitoring application that provides information about layer 2 (RMON) and layer 3 conversations between nodes. It helps you to assess traffic patterns on your network. Traffix Manager:

• Monitors all the stations that the RMON2-compliant probes encounter on your network
• Captures and stores RMON and RMON2 data for your network's protocols and applications
• Displays traffic between stations in user-defined views of the network
• Graphs current or historical data on the devices selected
• Delivers reports for user-specified stations and time periods as PostScript to your printer or as HTML to your Web server
• Launches LANsentry Manager tools for in-depth analysis of a station or a conversation between stations

You can use Traffix Manager to:

• Know your network - Understand overall flow patterns and interactions between systems, and determine how your network is really being used at the application level.
• Optimize your network - Gain an insight into traffic and application usage trends to help you optimize the use and placement of current network resources and make wise decisions about capacity planning and network growth.

Traffix Manager works with any device (from 3Com or other vendors) that supports the RMON2 MIB.

Device View

The Device View application is a device configuration tool. When you troubleshoot your network, you can use Device View to determine or change a device's configuration. You can also use Device View to look at a device's statistics and to set alarms.

Device View manages only 3Com devices.

Network Management Platforms

As part of your troubleshooting toolbox, your network management platform is the first place to go to view the overall health of your network. With the platform, you can understand the logical configuration of your network and configure views of your network to understand how devices work together and the role that they play in the users' work. The network management platform that supports your Transcend software installation can provide valuable troubleshooting tools. Transcend runs on several platforms within the NT and UNIX environments.

The platform discovers the devices. Transcend imports that information from the platform to populate the core database. Unless you are rediscovering, the user must manually update the platform.

Using this device database, a map displays the graphical representation of your network. Each device on your network appears as a symbol (icon) on the map. You can configure views of your network to show devices on the same subnetworks or floors.

You can monitor network performance and diagnose network performance and connectivity problems. You can also:

• Take a snapshot of your network in its normal state. The snapshot records the state of your network at a particular instant. If you later have network performance problems, you can compare the current state of your network to the snapshot.
• Quickly determine the connectivity status of a device by noting the color of its map symbol. Red usually means that communication with a device has ceased.
• Diagnose connectivity problems by determining whether two devices can communicate. If they can communicate, then examine the route between the devices, the number of packets that were sent and lost, and the roundtrip time between the two devices.
• Manage MIB information (for example, collecting and storing MIB data for trend analysis and graphing) using MIB queries. Transcend compiles MIBs and allows you to navigate up and down the MIB Tree to retrieve MIB objects from devices. You can set thresholds for MIB data and generate events when a threshold is exceeded.
• Configure the software to act on certain events. The Event Categories window informs you of any unexpected events (which arrive in the form of traps).

For more information, see the documentation that is shipped with your software.

3Com SmartAgent Embedded Software

Traditional Simple Network Management Protocol (SNMP) management places the burden of collecting network management information on the management station. In this traditional model, software agents collect information about throughput, record errors or packet overflows, and measure performance based on established thresholds. Through a polling process, agents pass this information to a centralized network management station whenever they receive an SNMP query. Management applications then make the data useful and alert the user if there are problems on the device.

As a useful companion to traditional network management methods, 3Com's SmartAgent® technology places management intelligence into the software agent that runs within a 3Com device. This scalable solution reduces the amount of computational load on the management station and helps minimize management-related network traffic.

SmartAgent software, which uses the RMON MIB, is self-monitoring, collecting and analyzing its own statistical, analytical, and diagnostic data. In this way, you can conduct network management by exception - that is, you are notified only if a problem occurs. Management by exception is unlike traditional SNMP management, in which the management software collects all data from the device through polling.

SmartAgent software works autonomously and reports to the network management station whenever an exceptional network event occurs. The software can also take direct action without involving the management station. Devices that contain SmartAgent software may be able to:

• Perform broadcast throttling to minimize the flow of broadcast traffic on your network
• Monitor the ratio of good frames to bad frames
• Switch a resilient link pair to the standby path if the primary path corrupts frames
• Report if traffic on vital segments drops below minimum usage levels
• Disable a port for five seconds to clear problems and then automatically reconnect it
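The idea of management by exception can be illustrated with a minimal Python sketch. It is not SmartAgent code: the function name `exception_report`, its parameters, and the 1% threshold are assumptions chosen for illustration. The agent-side check stays silent during normal operation and produces a report only when the ratio of bad frames becomes exceptional.

```python
def exception_report(port, good_frames, bad_frames, max_bad_ratio=0.01):
    """Return a report string only when the bad-frame ratio is exceptional.

    max_bad_ratio is an illustrative threshold, not a 3Com default.
    """
    total = good_frames + bad_frames
    if total == 0:
        return None  # no traffic seen, nothing to judge
    ratio = bad_frames / total
    if ratio <= max_bad_ratio:
        return None  # normal operation: nothing is sent to the station
    return f"port {port}: {ratio:.1%} of frames are bad"
```

With polling, the management station would fetch both counters on every cycle; here the station hears from the device only when the function returns a report.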

The Transcend NCS applications LANsentry Manager and Traffix Manager make the RMON data that the SmartAgent software collects more usable by summarizing and correlating important information.

Other Commonly Used Tools

These commonly used tools can also help you troubleshoot your network:

• Network software, such as Ping, Telnet, and FTP and TFTP. You can use these applications to troubleshoot, configure, and upgrade your system.
• Network monitoring devices, such as Analyzers and Probes
• Tools such as Cable Testers for working on physical problems

Ping

Packet Internet Groper (Ping) allows you to quickly verify the connectivity of your network devices. Ping attempts to transmit a packet from one device to a station on the network and listens for the response to ensure that it was correctly received. You can validate connections on the parts of your network by pinging different devices:

• A successful response indicates that a valid network path exists between your station and the remote host and that the remote host is active.
• Slower response times than normal can indicate that the path is congested or obstructed.
• A failed response indicates that a connection is broken somewhere; use the message to help locate the problem. See Tips on Interpreting Ping Messages.

Strategies for Using Ping

Follow these strategies for using Ping:

• Ping devices when your network is operating normally so that you have a performance baseline for comparison. See Identifying Your Network's Normal Behavior for more information.
• Ping by IP address when:
    • You want to test devices on different subnetworks. This method allows you to Ping your network segments in an organized way, rather than having to remember all the hostnames and locations.
    • Your Domain Name System (DNS) server is down and your system cannot look up host names properly. You can Ping with IP addresses even if you cannot access hostname information.
• Ping by hostname when you want to identify DNS server problems.
• To troubleshoot problems that involve large packet sizes, Ping the remote host repeatedly, increasing the packet size each time.
• To determine if a link is erratic, perform a continuous Ping (using ping -s on UNIX), which indicates the time that it takes the device to respond to each Ping.
• To determine a route taken to a destination, use the trace route function (tracert).
• Consider creating a Ping script that periodically sends a Ping to all necessary networking devices. If a Ping failure message is received, the script can perform some action to notify you of the problem, such as paging you.
• Use the Ping functions of your network management platform. For example, in your HP OpenView map, select a device and click the right mouse button to gain access to ping functions.
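The Ping script suggested in the list above could look something like the following Python sketch. It is only an illustration under stated assumptions: the function names are hypothetical, the script relies on the operating system's own ping command (`-c 1` on most UNIX-like systems, `-n 1` on Windows; other systems may need different options), and the notification is left as a function that you supply.

```python
import platform
import subprocess
import time


def ping_once(host, timeout_s=5):
    """Send a single echo request with the system ping command; True on a reply."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    try:
        result = subprocess.run(
            ["ping", count_flag, "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False  # no answer in time, or ping is not available
    return result.returncode == 0


def sweep(hosts, ping=ping_once):
    """Return the hosts that did not answer one Ping."""
    return [host for host in hosts if not ping(host)]


def watch(hosts, notify, interval_s=300, rounds=None, ping=ping_once):
    """Ping every host each interval and call notify(host) for each failure.

    rounds=None repeats indefinitely; a number limits the sweeps.
    """
    done = 0
    while rounds is None or done < rounds:
        for host in sweep(hosts, ping):
            notify(host)
        done += 1
        if rounds is None or done < rounds:
            time.sleep(interval_s)
```

For example, `watch(["10.0.0.1", "10.0.0.2"], notify=print)` would print the address of each device that fails to answer every five minutes; the addresses are placeholders, and `notify` could equally send mail or a page.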

Tips on Interpreting Ping Messages

Use the following ping failure messages to troubleshoot problems:

No reply from <destination>

Indicates that the destination routes are available but that there is a problem with the destination itself.

<destination> is unreachable

Indicates that your system does not know how to get to the destination. This message means either that routing information to a different subnetwork is unavailable or that a device on the same subnetwork is down.

ICMP host unreachable from gateway

Indicates that your system can transmit to the target address using a gateway but that the gateway cannot forward the packet properly because either a device is misconfigured or the gateway is not operating.
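If you collect Ping output in a script, the three messages above can be mapped to their likely causes automatically. The following Python sketch assumes the message wording shown in this section; the exact text printed by ping differs between operating systems, so the match strings (and the names `PING_HINTS` and `interpret`) are illustrative only.

```python
# Ordered list of (text to look for, likely cause), taken from the messages above.
PING_HINTS = [
    ("icmp host unreachable",
     "The gateway cannot forward the packet: a device is misconfigured "
     "or the gateway is not operating."),
    ("no reply from",
     "Routes to the destination are available; the problem is with the "
     "destination itself."),
    ("is unreachable",
     "Routing information to another subnetwork is unavailable, or a "
     "device on the same subnetwork is down."),
]


def interpret(message):
    """Return the likely cause for a ping failure message, or None if unknown."""
    text = message.lower()
    for needle, hint in PING_HINTS:
        if needle in text:
            return hint
    return None
```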

Telnet

Telnet, which is a login and terminal emulation program for Transmission Control Protocol/Internet Protocol (TCP/IP) networks, is a common way to communicate with an individual device. You log in to the device (a remote host) and use that remote device as if it were a local terminal.

If you have established an out-of-band Telnet connection with a device, you can use Telnet to communicate with that device even if the network is unavailable. This feature makes Telnet one of the most frequently used network troubleshooting tools. Usually, all device statistics and configuration capabilities are accessible by using Telnet to connect to the device's console. For more information about setting up an out-of-band connection, see Using Telnet, Serial Line, and Modem Connections.

You can invoke the Telnet application on your local system and set up a link to a Telnet process that is running on a remote host. You can then run a program that is located on a remote host as if you were working at the remote system.

FTP and TFTP

Most network devices support either the File Transfer Protocol (FTP) or the Trivial File Transfer Protocol (TFTP) for downloading updates of system software. Updating system software is often the solution to networking problems that are related to agent problems. Also, new software features may help correct a networking problem.

FTP provides flexibility and security for file transfer by:

• Accepting many file formats, such as ASCII and binary
• Using data compression
• Providing Read and Write access so that you can display, create, and delete files and directories
• Providing password protection

TFTP is a simple version of FTP that does not list directories or require passwords. TFTP only transfers files to and from a remote server.

Analyzers

An analyzer, which is often called a Sniffer, is a network device that collects network data on the segment to which it is attached, a process called packet capturing. Software on the device analyzes this data, which is a process referred to as protocol analysis. Most analyzers can interpret different types of protocol traffic, such as TCP/IP, AppleTalk, and Banyan VINES traffic.

You usually use analyzers for reactive troubleshooting - when you see a problem somewhere on your network, you attach an analyzer to capture and interpret the data from that area. Analyzers are particularly helpful for identifying intermittent problems. For example, if your network backbone has experienced moments of instability that prevent users from logging on to the network, you can attach an analyzer to the backbone to capture the intermittent problems when they happen again.

Probes

Like an analyzer, a probe is a network device that collects network data. Depending on its type, a probe can collect data from multiple segments simultaneously. It stores the collected data and transfers the data to an analysis site when requested. Unlike an analyzer, a probe does not interpret data.

A probe can be either a stand-alone device or an agent in a network device. The Transcend Enterprise Monitor 500 series and the SuperStack® II Monitor series are stand-alone RMON probes. LANsentry Manager and Traffix Manager use data from probes that comply with the RMON MIB or the RMON2 MIB.

You can use a probe daily to determine the health of your network. The Transcend NCS applications can interpret and report this data, alerting you to possible problems so that you can proactively manage your network. For example, an RMON2 probe can help you to analyze traffic patterns on your network. Use this data to make decisions about reconfiguring devices and end stations as needed.

Cable Testers

Cable testers examine the electrical characteristics of the wiring. They are most commonly used to ensure that building wiring and cables meet Category 5, 4, and 3 standards. For example, network technologies such as Fast Ethernet require the cabling to meet Category 5 requirements. Testers are also used to find defective and broken wiring in a building.


Knowing Your Network

You can better troubleshoot the problems on your network by:

• Knowing Your Network's Configuration
• Identifying Your Network's Normal Behavior

Knowing Your Network's Configuration

Part of understanding your network is knowing its physical and logical configuration. You should know:

• Which devices are on your network
• How the devices are configured
• Which devices are attached to the backbone
• Which devices connect your network to the outside world (WAN)

To keep track of your network's configuration, gather the following information:

• Site Network Map
• Logical Connections
• Device Configuration Information
• Other Important Data About Your Network

This data, when kept up-to-date, is extremely helpful for locating information when you experience network or device problems.

Site Network Map

A network map helps you to:

• Know exactly where each device is physically located
• Easily identify the users and applications that are affected by a problem
• Systematically search each part of your network for problems

You can create a network map using any drawing or flow chart application. Store your network map online. In addition, make sure that you always have a current version on paper in case you cannot access the online version. Figure 8 shows an example of a network map of 3Com devices.

Figure 8 Example of a Site Network Map


Consider including the following information on your network map:

• Location of important devices and workgroups (by floor, building, or area)
• Location of the network backbone, data center, and wiring closets, as appropriate for your network
• Location of your network management stations
• Location and type of remote connections
• IP subnetwork addresses for all managed switches and hubs
• Other subnetwork addresses, such as Novell IPX and AppleTalk, if appropriate for your network
• Type of media (by actual name, such as 10BASE-T, or by grouping, such as Ethernet), which you can show with callouts, colors, line weights, or line styles
• Virtual workgroups, which you can show with colors or shaded areas
• Redundant links, which you can indicate with gray or dashed lines
• Types of network applications that are used in different areas of your network
• Types of end stations that are connected to the switches and hubs

Logical Connections

With the advent of virtual LANs (VLANs), you need to know how your devices are connected logically as well as physically. For example, if you have connected two devices through the same physical switch, you might assume that they can communicate with each other. However, the devices can be in separate VLANs that restrict their communication.

Knowing the setup of your VLANs can help you to quickly narrow the scope of a problem to a VLAN instead of to a network connection.

The Transcend NCS application Enterprise VLAN Manager allows you to view the logical makeup of your network. Depending on the complexity of your network and VLAN configurations, you can use colors to show the VLANs graphically on your network map.

Device Configuration Information

Maintain online and paper copies of device configuration information. Make sure that all online data is stored with your site's regular data backup. If your site does not have a backup system, copy the information onto a backup disc (CD, Zip disk, and the like) and store it offsite.

Follow these guidelines for saving configuration information:

• Because the easiest way to recover a device's configuration is to use FTP or TFTP, save the configuration settings of each device that supports this method of uploading.
• For other devices, Telnet in and save the session (which contains configuration details) to a file. If you cannot print the configuration of a device, then create a quick rebuild guide that explains the quickest way to configure the device from a fresh install.
• For devices that store information to diskette, store this data as part of your site's regular backup.
• For routers and other important devices with text configuration files, store this data online in a revision control system. Keep the most recent version on paper. Keep previous versions.
• For PCs, keep a recovery disk for each type of PC. For any device that you use as a server, store all startup scripts and copies of registries.

Other Important Data About Your Network

For a complete picture of your network, have the following information available:

• All passwords - Store passwords in a safe place. Keep previous passwords in case you restore a device to a previous software version and need to use the old password that was valid for that version.
• Device inventory - The inventory allows you to see the device type, IP address, ports, MAC addresses, and attached devices at a glance. Software tools such as Transcend Central can help you keep track of the 3Com devices on your network. Using Transcend Central, you can group devices by type and location and have this information on hand for troubleshooting.
• MAC address-to-port number list - If your hubs or switches are not managed, you must keep a list of the MAC addresses that correlate to the ports on your hubs and switches. Generate and keep a paper copy of this list, which is crucial for deciphering captured packets, using Address Tracker.
• Log book - Document your interactions, no matter how trivial, with each device that is critical to your network's operation (that is, routers, remote access devices, and security servers). For example, document that you noticed a fan making noise one morning. Your note may help you to identify why a device is over temperature a week later (because the fan stopped working).
• Change control - Maintain a change control system for all critical systems. Permanently store change control records.
• Contact details - Store, online and on paper, the details of all support contracts, support numbers, engineer details, and telephone and fax numbers.
• LANsentry Reporter - Use LANsentry Reporter to generate reports from the database.

Identifying Your Network's Normal Behavior

By monitoring your network over a long period, you begin to understand its normal behavior. You begin to see a pattern in the traffic flow, such as which servers are typically accessed, when peak usage times occur, and so on. If you are familiar with your network when it is fully operational, you can be more effective at troubleshooting problems that arise.

Baselining Your Network

You can use a baseline analysis, which is an important indicator of overall network health, to identify problems. A baseline can serve as a useful reference of network traffic during normal operation, which you can then compare to captured network traffic while you troubleshoot network problems. A baseline analysis speeds the process of isolating network problems.

By running tests on a healthy network, you compile normal data to compare against the results that you get when your network is in trouble. For example, Ping each node to discover how long it typically takes you to receive a response from devices on your network.
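As a minimal illustration of comparing Ping results against a baseline, the Python sketch below summarizes response times recorded on a healthy network and flags a later measurement that falls well outside them. The function names and the rule of "mean plus three standard deviations" are assumptions made for this example, not a method prescribed by the Transcend tools.

```python
import statistics


def baseline(samples_ms):
    """Summarize healthy-network response times as (mean, standard deviation)."""
    return statistics.mean(samples_ms), statistics.stdev(samples_ms)


def is_abnormal(rtt_ms, mean_ms, stdev_ms, k=3.0):
    """True when a response time exceeds the baseline mean by more than k deviations."""
    return rtt_ms > mean_ms + k * stdev_ms
```

For example, with healthy samples of 10, 12, 11, 9, and 13 ms, a later response of 15 ms is still within the normal range, while 20 ms is flagged.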

Applications such as Status Watch, Address Tracker, LANsentry Manager, and Traffix Manager allow you to collect days and weeks of data and set a baseline for comparison. Through the reporting mechanisms in the following list, you can continuously assess the data from your network and ensure that its performance is optimal:

• Web Reporter generates daily or weekly reports from data collected by Status Watch.
• Traffix Manager generates weekly reports from collected data and calculates the baselines for you. Set up Utilization History and Error History reports with data resolution set to Weekly.
• LANsentry Manager History View generates daily utilization graphs, which are sampled every 30 minutes for each day over one week. Use these graphs to calculate your network baselines manually.
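One way to calculate a baseline manually from a week of History View graphs is to average the readings taken at the same time of day. The Python sketch below assumes that you have copied the utilization values into one list per day (48 half-hour samples per day in the case described above); the function name `slot_baseline` and the data layout are hypothetical.

```python
def slot_baseline(days):
    """Average utilization per time slot across several days.

    days is a list of daily sample lists, all of the same length
    (for example, 7 lists of 48 half-hour utilization percentages).
    """
    if len({len(day) for day in days}) != 1:
        raise ValueError("every day must have the same number of samples")
    return [sum(slot) / len(slot) for slot in zip(*days)]
```

The result has one value per time slot, which you can compare with the same slot on a day when users report problems.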

Identifying Background Noise

Know your network's background noise so that you can recognize real data flow. For example, one evening after everyone is gone, when no backups are running and most nodes are on, analyze the traffic on your network using the Traffix Manager application. The traffic that you see is mostly broadcast and multicast packets. Any errors that you see are the result of faulty devices (trace). This traffic is the background noise of your network - traffic that occurs for little value. If background noise is high, redesign your network.


FDDI Connectivity

Use these sections to identify and correct connectivity errors on an FDDI ring:

• FDDI Connectivity Overview
• Monitoring FDDI Connections
• Making Your FDDI Connections More Resilient

Fiber Distributed Data Interface (FDDI), which is a self-correcting technology, automatically corrects ring faults to maintain connectivity throughout most of the network. However, you should monitor your FDDI connections for wrapped rings and other problems with ring connectivity.

Understanding the Problem

As shown in Figure 9, in a thru FDDI LAN, no stations on the trunk ring have a Configuration State (SMTConfigurationState) of Wrap or Isolated. However, users who complain about network performance may have lost connectivity to other stations on the network because the FDDI network is wrapped or segmented.

Figure 9 Thru Ring

Wrapped ring

By monitoring the Peer Wrap Condition, you can see when the Configuration State changes. In a wrapped ring (Figure 10), two stations on the LAN are in a wrapped Configuration State. This condition may or may not affect the connectivity of certain stations. Although operational, your network may have a cabling problem or a problem with a link.

Figure 10 Wrapped LAN

Segmented ring

In a segmented ring (Figure 11), more than two stations are wrapped on the trunk ring. Although this mode of operation is a valid FDDI LAN configuration, your LAN is probably experiencing a degraded or degrading condition.

Figure 11 Segmented Ring

When a network connection has excessively high link errors, Station Management (SMT) shuts down the connection and tries to reconnect. A dual-attachment trunk ring station with an A or B connection that is shut down is one of the wrap points in the network. See Making Your FDDI Connections More Resilient for information about keeping a dual-attachment station connection from wrapping.
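The distinction between a thru, wrapped, and segmented ring comes down to how many trunk ring stations report a wrapped Configuration State. The Python sketch below illustrates that rule. It assumes that you have already read the Configuration State of each trunk ring station (for example, through your management application) as a text value; the state strings and the function name `ring_state` are illustrative, and a single reported wrap is treated as a wrapped ring on the assumption that the other wrap point is not being monitored.

```python
def ring_state(config_states):
    """Classify an FDDI trunk ring from its stations' Configuration States.

    config_states: one text value per trunk ring station, for example
    "thru", "wrap_a", "wrap_b", or "isolated" (illustrative spellings).
    """
    states = [state.lower() for state in config_states]
    wrapped = sum(1 for state in states if "wrap" in state)
    if wrapped > 2:
        return "Segmented"   # more than two wrap points on the trunk ring
    if wrapped > 0:
        return "Wrapped"     # normally the two stations next to the fault
    if "isolated" in states:
        return "Isolated"    # no wrap seen, but a station is off the ring
    return "Thru"
```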


Isolated station

Sometimes a network wraps a particular station out of the ring. Stations on either side of a problem station can be wrapped. This effectively isolates the station or links that have problems, as shown in Figure 12.

Figure 12 Wrapped Ring with Isolated Station

If a ring was already wrapped when a network wraps a station out of the ring, then a segmented ring results, as shown in Figure 13.

Figure 13 Segmented Ring with Isolated Stations

Twisted ring

In a twisted ring, an A port is connected to an A port and a B port is connected to a B port, instead of the normal A-to-B connections. A twisted ring, which always has two twist points (stations), can exist in either a Thru or Wrap state. You can monitor the Twisted Ring Condition and Undesired Connection Attempt Event for evidence of twisted ring and other connection problems.

Identifying the Problem

To identify the problem, follow this process:

1. At the FDDI LAN level, verify that your network is operating.

If the network is operating, the FDDI ring may be segmented, and therefore an FDDI station or an Ethernet station on an Ethernet link may have lost connectivity to other nodes on the network.

2. Determine if a ring is in a Thru, Wrap, or Segmented state.

If the FDDI ring is segmented or wrapped, look for a problem with a link somewhere in the network or for a nonfunctioning node on your trunk ring. If the ring is operating and is not segmented, or if it is segmented but you still have connectivity to the stations in question, move to a more specific level in your network.

See Monitoring FDDI Connections for more information.

3. Determine if the poorly performing station is an Ethernet or FDDI station.

If the problem is an FDDI station, find out if it is congested (that is, if the station is so busy that it cannot accept all the network traffic that is directed to it) by determining its Bandwidth Utilization. Also determine if the station has a high frame error rate by looking at the FDDI Ring Errors.

If the problem is an Ethernet station, look for congestion by examining Ethernet Packet Loss and Bandwidth Utilization.

Solving the Problem

Identify the station that is causing the disconnection and take the appropriate steps:

• If the disconnection is caused by a wrapped ring, then fix the hardware or cabling problem at that station.
• If the station is congested, you have a device problem rather than a network problem. For example, if the congested station is a file server and every other machine on the network is retrieving and saving files using that server, consider upgrading your server or adding additional servers to the network. A variety of devices from different vendors may be communicating on an FDDI or Ethernet network; some are faster and more capable, and some are slower and more prone to congestion.
• If the station is an Ethernet station that is attached to an Ethernet segment, reevaluate the setup of your Ethernet network and make some changes to improve its performance.

You can also make FDDI connections more resilient by implementing dual homing or installing an Optical Bypass Unit (OBU) where FDDI connections are prone to fail. See Making Your FDDI Connections More Resilient for more information.

Monitoring FDDI Connections

Monitor your FDDI devices for Warning or Critical alerts in the FDDI Status tool.

Status Watch

Use Status Watch to identify these FDDI connectivity errors:

• Peer Wrap Condition
• Twisted Ring Condition
• Undesired Connection Attempt Event

Follow these steps:

1. In the Device area, select the device that is located where you suspect an FDDI ring connectivity problem.

2. Monitor the FDDI Status tool for the currently selected device.

Here are some pointers for monitoring:

• If the Peer Wrap Configuration State variable is Isolated, the device is not connected to the FDDI trunk ring. If you intend the device to remain isolated, this indication is not a serious condition. However, if the device is supposed to be connected on a trunk ring, a serious problem may exist. The device is no longer transmitting packets to the larger trunk ring.
• If the Peer Wrap flag (SMTPeerWrapFlag) is set, the device is one of the wrap points. The cause of the wrapped ring is somewhere in the portion of the network between the two stations that report the peer wrap condition.

Making Your FDDI Connections More Resilient

When devices are removed from an FDDI ring, there is a break in the fiber path that causes the ring to wrap until the ring is made whole again. To prevent the break in the FDDI connection, you can implement dual homing or install an Optical Bypass Unit (OBU).

Implementing Dual Homing

When the operation of a dual attachment node is critical to your network, dual homing adds reliability by providing a backup connection if the primary link fails. Because a dual attachment station (DAS) has two attachments to the FDDI ring (A-to-M and B-to-M), you can use one of them as a standby link if the active link fails. Using dual homing, only one of the two attachments is active at a time. In this sense, a DAS acts as if it is a single attachment station (SAS) by using its A port as the standby link.

Through SMT, a DAS can be dual homed to the same concentrator or, more commonly, to two concentrators. This arrangement provides a more stable trunk ring of concentrators. If one concentrator fails, the DAS enables the standby link to another concentrator to become the active link. See Figure 14.

If the station is a dual path or dual path/dual MAC station, you can configure the dual-homed station in one of two ways:

• With both links active
• With one link active and one connection withheld as a backup, only becoming active when one link fails

Figure 14 Dual Homing Configuration
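The standby behavior of a dual-homed station can be summarized in a few lines. The following Python sketch models only the selection rule described above, with the B port as the active link and the A port as the standby; in a real station this decision is made by SMT, and the function name `active_link` is hypothetical.

```python
def active_link(b_link_up, a_link_up):
    """Return the port a dual-homed DAS uses: "B", "A" (standby), or None."""
    if b_link_up:
        return "B"   # primary attachment is healthy
    if a_link_up:
        return "A"   # primary failed, standby takes over
    return None      # both attachments are down
```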

Installing an Optical Bypass Unit

You can insert an Optical Bypass Unit (OBU) into the FDDI ring as if it were a node and then plug your device into it. To use an OBU, your device needs an optical bypass interface. This interface lets the bypass know whether your device is still on the ring or not. See Figure 15.

If your device is removed or if it fails, the bypass unit diverts the optical path away from your device, keeping the ring whole. You can use a bypass on devices that are prone to failure or are likely to be removed often, such as diagnostic equipment.

Figure 15 Optical Bypass Unit Configuration

FDDI Connectivity Reference

This section explains terms that are relevant to FDDI connectivity and provides additional conceptual and problem analysis detail.

Peer Wrap Condition

A Peer Wrap (wrapped ring) condition occurs when a dual-attachment station detects a fault (often a lost connection) and reconfigures the network by wrapping the dual trunk rings to form a single ring. Normally, the two stations that are adjacent to the fault wrap to maintain full connectivity. However, if a second fault occurs before the first is repaired, the network partitions itself into two or more rings and stations lose connectivity.
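The effect of one fault versus two can be sketched in a few lines of Python. This is an illustrative model only, not part of any FDDI tool: the function name, the station names, and the idea of numbering links by position are assumptions, and each fault is assumed to break both trunk rings between two adjacent stations.

```python
def ring_segments(stations, broken_links):
    """Group stations into the rings that remain after link faults.

    stations: station names in ring order (the last connects back to the first).
    broken_links: indices i, meaning the link between stations[i] and
    stations[(i + 1) % n] has failed (hypothetical numbering scheme).
    """
    n = len(stations)
    faults = sorted({i % n for i in broken_links}) if n else []
    if len(faults) <= 1:
        # No fault, or a single fault: the adjacent stations wrap and
        # every station stays on one ring.
        return [list(stations)]
    segments = []
    for k, start in enumerate(faults):
        end = faults[(k + 1) % len(faults)]
        # Collect the stations after fault `start`, up to and including
        # the station just before the next fault.
        segment = []
        i = (start + 1) % n
        while True:
            segment.append(stations[i])
            if i == end:
                break
            i = (i + 1) % n
        segments.append(segment)
    return segments


ring = ["A", "B", "C", "D", "E"]
print(ring_segments(ring, {1}))     # one wrapped ring, full connectivity
print(ring_segments(ring, {1, 3}))  # two separate rings
```

With one fault, the result is a single group containing every station. With two faults, the stations split into two groups that can no longer reach each other, which is why the first fault should be repaired promptly.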

When a station reports a Peer Wrap condition, locate and repair the problem that caused the station to wrap the rings. Potential causes include:

• Faulty FDDI port hardware
• Faulty cables or connectors
• Unplugged connectors
• Powered-down stations

You can expect to find the cause of the problem between the two stations that report the Peer Wrap condition.

Twisted Ring Condition

A Twisted Ring condition occurs when certain undesirable connection types exist. See Table 7 for more information. Although similar to the Undesired Connection Attempt, the Twisted Ring condition provides specific Station Management (SMT) and port information for diagnosis.

Undesired Connection Attempt Event

An Undesired Connection Attempt event occurs when a port tries to connect to another port of a type that may result in an undesirable network topology. Whether the connection attempt is successful depends on the current setting of the station's connection policies.

Table 7 lists connections that the FDDI standard defines as undesirable. The managed devices may or may not permit these connections, depending on their FDDI station configurations.

Table 7 Undesirable Connection Types

Connection Type   Reason That the Connection Is Undesirable
A-A               Twisted primary and secondary rings
A-S               A wrapped ring
B-B               Twisted primary and secondary rings
B-S               A wrapped ring
S-A               A wrapped ring
S-B               A wrapped ring
M-M               A tree of rings topology (illegal connection)

Table 8 lists FDDI connections that create valid topologies.

Table 8 Valid Connection Types

Connection Type   Reason That the Connection Is Valid
A-B               A normal trunk peer connection
A-M               A tree connection with possible redundancy. In a single MAC node, Port B has precedence (by default) for connecting to a Port M
B-A               A normal trunk ring peer connection
B-M               A tree connection with possible redundancy. In a single MAC node, Port B has precedence (by default) for connecting to a Port M
S-S               A single ring of two slave stations
S-M               A normal tree connection
M-A               A tree connection that provides possible redundancy
M-B               A tree connection that provides possible redundancy
M-S               A normal tree connection
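Tables 7 and 8 together cover every pairing of the four port types, so they can be turned into a simple lookup. The following Python sketch is only an illustration of that idea; the function name and return format are assumptions, and whether a device actually permits an undesirable connection still depends on its connection policies.

```python
# Contents of Table 7 (undesirable) and Table 8 (valid), keyed by
# "local port type-remote port type".
UNDESIRABLE = {
    "A-A": "Twisted primary and secondary rings",
    "A-S": "A wrapped ring",
    "B-B": "Twisted primary and secondary rings",
    "B-S": "A wrapped ring",
    "S-A": "A wrapped ring",
    "S-B": "A wrapped ring",
    "M-M": "A tree of rings topology (illegal connection)",
}
VALID = {
    "A-B": "A normal trunk peer connection",
    "A-M": "A tree connection with possible redundancy",
    "B-A": "A normal trunk ring peer connection",
    "B-M": "A tree connection with possible redundancy",
    "S-S": "A single ring of two slave stations",
    "S-M": "A normal tree connection",
    "M-A": "A tree connection that provides possible redundancy",
    "M-B": "A tree connection that provides possible redundancy",
    "M-S": "A normal tree connection",
}


def classify_connection(local_port, remote_port):
    """Return ("valid" or "undesirable", reason) for a pair of port types."""
    key = f"{local_port.upper()}-{remote_port.upper()}"
    if key in UNDESIRABLE:
        return ("undesirable", UNDESIRABLE[key])
    if key in VALID:
        return ("valid", VALID[key])
    raise ValueError(f"unknown port types: {key}")


print(classify_connection("A", "A"))
print(classify_connection("B", "M"))
```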

Disabling the Offending Interface

Because broadcast storms can ultimately cause your whole network to become unavailable, take action immediately to disable the offending interface. You can enable the interface again after you have corrected the problem.

Address Tracker

Use Address Tracker to locate the interface that is causing the broadcast storm. Use Device View to disable the port.

Follow these steps:

1. In the Find Address window, enter the address of the interface that seems to be receiving the broadcast traffic.

You can copy the MAC or IP address from the Status Watch report and paste it into Address Tracker's Enter the Address You Want to Find field.

2. Click Find Now.

Search displays the device name.

3. Use Transcend Central to launch Device View and disable the port.

Disabling the port stops the broadcast storm before it interferes with all vital network traffic. You can re-enable this interface later using Device View or the device's console.

Correcting Spanning Tree Misconfigurations

Spanning Tree does not cause broadcast storms, but a loop in your Spanning Tree topology can create data that looks like a storm. A loop can occur in your topology if:

• Someone disables Spanning Tree on a port
• You set up your Spanning Tree configuration incorrectly

Device View

Use Device View to disable any Spanning Tree port that has a repeater attached to it and to correct Spanning Tree misconfigurations.

To correct Spanning Tree misconfigurations, use Device View to disable Spanning Tree Protocol (STP) for a port on a SuperStack® II Switch 1000, Switch 3000, Switch 3000 10/100, Switch 9000SX, Desktop Switch, LinkBuilder® FMS II Bridge/Management Module, or CoreBuilder 6000.

To disable the STP port state for a port on a SuperStack II switch:

1. Select a port and click the right mouse button.

2. From the shortcut menu, select Configure.

3. In the Port section, click the STP tab.

4. From the STP Port State list box, select Disabled.

5. Click Apply.

To disable the STP port state for a port on a LinkBuilder FMS II Bridge/Management Module:

1. Double-click the module.

2. From the shortcut menu, select Configure Bridge.

3. In the Port section, click the STP tab.

Broadcast Storms Reference

This section explains terms that are relevant to broadcast storms and provides additional conceptual and problem analysis detail.

Broadcast Packets

Broadcast packets, which are a normal part of network operation, are transmitted by a device to a broadcast address. For example, IP networks use broadcasts to resolve network addresses using Address Resolution Protocol (ARP). IPX networks use a large number of broadcast packets to operate most effectively.

Problems arise when broadcast packets endlessly propagate throughout the network, which increases the traffic volume on your network and the CPU time that each host spends processing and discarding unwanted broadcast packets.

Multicast Packets

Multicast packets, which are a normal part of network operation, are transmitted by a device to a multicast group address. Hosts that want to receive the packets indicate that they want to be members of the multicast group, and then multicast packets are distributed to that group. For example, multicast packets support the Spanning Tree Protocol. Multicast applications and underlying multicast protocols control multimedia traffic and shield hosts from processing unnecessary broadcast traffic. However, multicast traffic can also cause storms that saturate your network.

Duplicate Addresses

Use these sections to identify and correct problems caused by duplicate MAC and IP addresses:

• Duplicate Addresses Overview
• Finding Duplicate MAC Addresses
• Finding Duplicate IP Addresses

See Duplicate Addresses Reference for additional conceptual and problem analysis detail.

Networks sometimes generate duplicate MAC and IP addresses. Because duplicate addresses can cause problems with packet delivery, resolve them as soon as possible.

Understanding the Problem

Duplicate MAC addresses are caused by data link layer problems with Fiber Distributed Data Interface (FDDI) media and the passing of tokens on the FDDI ring. Duplicate IP addresses are caused by network layer problems. See these sections for more information about causes of duplicate addresses:

• Duplicate MAC Addresses
• Duplicate IP Addresses

Identifying the Problem

Identify duplicate MAC and IP addresses by following the instructions in these sections:

• Finding Duplicate MAC Addresses
• Finding Duplicate IP Addresses

Solving the Problem

Identify the cause of the duplicate address (such as user error or a hardware problem) and fix the problem if possible.

Finding Duplicate MAC Addresses

To find out if duplicate MAC addresses are occurring, monitor your network using Status Watch.

Status Watch

The Status Watch FDDI Status tool identifies duplicate FDDI MAC addresses. Status Watch reports a Duplicate Address condition when two or more MACs on the same ring have the same MAC address.

Follow these steps:

1. In the Status Watch Summary View window, determine if any FDDI Status conditions are reported. If there are, double-click the table cell value to display the Device List window.

Another approach is to examine only the devices that you know reside on your FDDI ring. In the Status Watch main window, red device icons indicate that a threshold has been exceeded.

2. Select a device.

• If you selected the device from the Device List window, the real-time report for that device appears in the Status Watch main window.

• If you selected the device from the main window, also select the FDDI Status tool to view the real-time report.

3. Determine if a Duplicate Address condition caused the FDDI Status tool to trigger a Critical or Warning status for that device.

In Status Watch, you can specify the status severity level to apply to a Duplicate Address condition.

Finding Duplicate IP Addresses

To find out if duplicate IP addresses are occurring, monitor your network using these applications:

• Address Tracker - To find duplicate IP addresses on 3Com devices and their attached networks

• LANsentry® Manager - To find duplicate IP addresses that are collected by probes gathering RMON2 SmartAgent® data from the Enterprise Communications Analysis Module (ECAM) downloaded on your network devices

Address Tracker

Use Address Tracker to determine when and where duplicate IP addresses occur.

Follow these steps:

1. From the Find Address menu, select Find Duplicate IP Addresses.

2. Click Find Now to start your search.

LANsentry Manager

Use the Duplicates table in LANsentry Manager to compile a list of all stations with duplicate IP addresses. This table is available only on probes that have downloaded RMON2 (ECAM) SmartAgent software.

Follow these steps:

1. From the LANsentry Manager Address Map menu, select Duplicates. Address Map data is displayed as a table.

2. To export the contents of the table, click Export to launch the Data Export dialog box.

Duplicate Addresses Reference

This section explains terms that are relevant to duplicate addresses and provides additional conceptual and problem analysis detail.

Duplicate MAC Addresses

Each device on your network has a unique MAC address. This address identifies a single device on the network, allowing packets to be delivered to correct destinations.

Packets are delivered to their destinations by means of MAC-address-to-IP-address translation that the Address Resolution Protocol (ARP) provides. Therefore, if MAC addresses are duplicated on the network, ARP caches of routing devices contain erroneous destinations. In FDDI, devices monitor network traffic, searching for their own MAC address in each packet to determine whether to decode the packet. If MAC addresses are not unique, two stations cannot be distinguished from each other.

Duplicate MAC addresses can occur for the following reasons:

• Someone has manually configured a MAC address for a device instead of using the address that the vendor supplied or allowing it to be assigned dynamically, and this address is also assigned to a different device.

• In rare circumstances, loops in a bridged network can cause a MAC hardware problem or an address learning problem that creates a duplicate MAC address entry in the bridging address table.

• On DECnet Phase 4 networks, MAC addresses are set from the DECnet address. A duplicate NET address can cause a duplicate MAC address.

Duplicate IP Addresses

Because IP addresses are critical for transmission of packets on TCP/IP networks, resolve duplicate IP addresses immediately.

Duplicate IP addresses can occur when someone has configured an IP address that is identical to an IP address that is assigned to a different device. Address assignments, although possible for you to configure manually, are usually made using one of these protocols:

• Dynamic Host Configuration Protocol (DHCP) - Allows your network to dynamically assign IP addresses to nodes. With this protocol, a DHCP server temporarily assigns an IP address to a node, or you can statically configure addresses as needed.

• BOOTstrap protocol (BootP) - Allows you to statically assign IP addresses to nodes. This protocol is more efficient than RARP.

• Reverse ARP (RARP) - Allows you to statically assign IP addresses to nodes. However, because this protocol relies on the MAC address to identify the node, you cannot use it on networks that dynamically assign hardware addresses.
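If you can export a list of IP-address-to-MAC-address pairs (for example, from an address table), a duplicate IP address shows up as one IP address claimed by more than one MAC address. The Python sketch below illustrates that check. The function name and the sample addresses are made up for illustration, and the sketch does not represent how Address Tracker or LANsentry Manager work internally.

```python
from collections import defaultdict


def find_duplicate_ips(entries):
    """Return {ip: [macs]} for every IP address seen with more than one MAC.

    entries: iterable of (ip_address, mac_address) string pairs.
    """
    macs_by_ip = defaultdict(set)
    for ip, mac in entries:
        # Normalize case so the same MAC written two ways counts once.
        macs_by_ip[ip].add(mac.lower())
    return {
        ip: sorted(macs)
        for ip, macs in macs_by_ip.items()
        if len(macs) > 1
    }


# Hypothetical sample data.
table = [
    ("192.0.2.10", "00:A0:24:11:22:33"),
    ("192.0.2.11", "00:a0:24:44:55:66"),
    ("192.0.2.10", "00:a0:24:77:88:99"),  # second device claiming .10
    ("192.0.2.11", "00:A0:24:44:55:66"),  # same device, different case
]
print(find_duplicate_ips(table))
```

Swapping the two fields of each pair gives the equivalent check for one MAC address that appears with several IP addresses.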

Ethernet Packet Loss

Use these sections to identify and correct Ethernet packet loss:

• Ethernet Packet Loss Overview
• Searching for Packet Loss

See Ethernet Packet Loss Reference for additional conceptual and problem analysis detail.

If your Ethernet network shows signs of congestion, it may be experiencing packet loss. When your network is congested, utilization is usually high, packets are discarded because buffers are full, and collision rates are up. Problems related to Collisions are often at the heart of packet loss.

Understanding the Problem

Collisions are normal in Ethernet networks. In many cases, Collision rates of 50 percent do not cause a large decrease in throughput. The Collision rate helps mark the upper limit on your network (the maximum percentage of collisions that your network can bear), which is usually around 70 percent. If Collisions increase above this upper limit, your network can become unreliable.

When the Collision rates increase, so do Excessive Collisions, which causes a delay in transmitting data. An increase in Collisions also indicates that network utilization and network errors, such as FCS Errors, are probably increasing.

The real packet problems to watch for, however, are undetected collisions that show up as Late Collisions.

Identifying the Problem

To identify that your network's problem is related to packet loss, verify that frames are being dropped on your network by examining this packet loss data:

• Alignment Errors
• Collisions
• Excessive Collisions
• FCS Errors
• CRC Errors
• Late Collisions
• Receive Discards
• Too Long Errors
• Too Short Errors
• Transmit Discards

The process of identifying the problem is discussed in Searching for Packet Loss.

Solving the Problem

If you notice that packet loss data is consistently high, then your network is too congested. In this case, segment your network with the appropriate network device (such as a switch or router). If Collision data shows increases but your network's utilization is the same, then your network may have a physical problem, such as cabling that is too long. Other problems that packet loss data can indicate include:

• Faulty connectors or improper cabling
• Excessive numbers of repeaters between network devices
• Defective Ethernet transceivers or controllers

Possible solutions to these problems are explained in the procedures in Searching for Packet Loss.

Searching for Packet Loss

When you look for packet loss, use the following applications:

• Status Watch - For Ethernet and MIB-II data collection using SNMP polling
• LANsentry Manager Network Statistics Graph - For RMON data collection using an RMON probe
• Device View - On a per-device basis, you can evaluate statistics for any port on the device

Status Watch

Status Watch monitors:

• Alignment Errors
• Excessive Collisions
• FCS Errors
• Receive Discards
• Transmit Discards

Follow these steps:

1. Determine if the thresholds for the Alignment Errors tool and FCS Errors tool are being exceeded.

Table 16 identifies the problems that this data can indicate and your possible actions. For information about problems related to a nonstandard Ethernet implementation, see Nonstandard Ethernet Problems.

Table 16 Alignment Errors, FCS Errors, and CRC Errors Data

Possible Problem: Faulty cabling
Possible Action: Examine the cable and cable connections for breaks or damage.

Possible Problem: Network noise
Possible Action: Look for improper cabling, faulty cable, faulty network equipment, or cables that are too close to equipment that emits electromagnetic interference (lamps, for example).

Possible Problem: Faulty transceiver
Possible Action: Use an analyzer to identify the problematic transceiver. If necessary, replace the transceiver, network adapter, or station.

Possible Problem: Fault at the transmitting end station
Possible Action:
1. Locate the source of the errors by looking at the module and port statistics.
2. Verify the correct operation of the transceiver or adapter card of the device that is connected to the problem port.
3. If the card appears to be operating correctly, examine the cable and cable connections for breaks or damage.

Possible Problem: Station powering up or down
Possible Action: None required. Early implementations of Ethernet transceivers generate a significant amount of in-band noise when powering up; they frequently cause Alignment Errors and FCS Errors in an otherwise stable network. When powering up, some software drivers for Ethernet controllers also initiate Time Domain Reflectometry (TDR) tests to test the Ethernet media. Network monitors report TDR tests as Alignment Errors and FCS Errors.

Possible Problem: Faulty adapter
Possible Action: Replace the adapter.

2. Determine if the Excessive Collisions tool threshold is being exceeded. Table 17 identifies the problems that this data can indicate and your possible actions.

Table 17 Collisions and Excessive Collisions Data

Possible Problem: Busy network
Possible Action: Use a bridge, router, or switch to reconfigure your network into segments with fewer stations.

Possible Problem: Faulty device (adapter, switch, hub, and the like) that does not listen before broadcasting. This problem increases the incidence of all types of collisions.
Possible Action: Isolate each adapter to see if the problem stops.

Possible Problem: Network loop
Possible Action: Ensure that no redundant connections to the same station have both connections active simultaneously.

3. Determine if the Receive Discards and Transmit Discards tools' thresholds are being exceeded.

If these errors are high in conjunction with the data that you learned in steps 1 and 2, then your network is overloaded. Segment your network.

LANsentry Manager Network Statistics Graph

Use the LANsentry Manager Network Statistics graph to view data for:

• Collisions
• Late Collisions
• Bandwidth Utilization
• CRC Errors
• Too Long Errors
• Too Short Errors

Follow these steps:

1. Display a Network Statistics graph for the local Ethernet segment on which users have reported poor performance.

This graph shows the most recent trend in Collision rates. If you have set up a History sample, you can also look at the historical trend. If a number of segments are connected by repeaters, examine the graph for each Ethernet segment.

2. Analyze Utilization and Collision rates to determine whether collisions are caused by an overloaded segment or a faulty component.

• If Utilization rates are high - The collisions are probably caused by an overloaded segment. If you have added nodes or new applications to your network, consider reconfiguring the cabling system using bridges and routers to filter out remote collisions and to keep local traffic on one segment. This action should level the network load.

• If Utilization rates are stable and appear normal - The collisions are probably caused by faulty components. In this case, do the following:

  • If the network consists of repeaters - Compare the Network Statistics graphs for each segment connected to the repeater. Because repeaters repeat traffic across all connected segments (which makes many segments seem like one network), you should see similar levels of traffic on all segments. One segment that shows dissimilar levels of traffic and collisions may indicate faulty hardware. In this case, monitor several collisions to track the source station that is transmitting too soon after collisions, and repair the station. Packets that are transmitted too soon after collisions are unlikely to be valid. See Table 17 for more information about Collisions.

  • On other networks - Determine the segment cable length.
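The reasoning in step 2 can be written as a small decision function. The Python sketch below is illustrative only: the function name is hypothetical, and both threshold defaults are assumptions that you should replace with values that reflect your own network's normal behavior.

```python
def diagnose_collisions(utilization_pct, collision_pct,
                        high_utilization_pct=50.0, high_collision_pct=50.0):
    """Suggest a likely cause of collisions from two graph readings.

    The two threshold defaults are placeholder assumptions, not values
    taken from LANsentry Manager.
    """
    if collision_pct < high_collision_pct:
        return "collision rate within the assumed normal range"
    if utilization_pct >= high_utilization_pct:
        # High collisions together with high utilization.
        return "overloaded segment: consider segmenting with bridges or routers"
    # High collisions while utilization looks normal.
    return "faulty component suspected: compare segments and check cable length"


print(diagnose_collisions(utilization_pct=75, collision_pct=60))
print(diagnose_collisions(utilization_pct=20, collision_pct=60))
```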

3. Examine the CRC Errors and Late Collisions, which often indicate cabling or component problems.

Table 16 identifies the problems that CRC Errors can indicate and your possible actions. Table 18 identifies the problems that Late Collisions data can indicate and your possible actions.

Table 18 Late Collisions Data

Possible Problem: Cabling problems
• Segment too long
• Failing cable
• Segment not grounded properly (noise)
• Improper termination
• Taps too close (10BASE-5 and 10BASE-2 only)
• Noisy cable
Possible Action: Correct the cabling problem by doing one or more of the following:
• Reduce the segment length
• Replace the cable
• Ground the cable
• Terminate the cable correctly
• Check the taps
• Check for cables too close to equipment that emits electromagnetic interference

Possible Problem: Component problems
• Deaf or partially deaf node
• Failing repeater, transceiver, or controller cards
Possible Action: Correct the component problem by doing one of the following:
• Trace the failing component and replace it
• Replace the NIC or the transceiver

4. Trace Too Short Errors and Too Long Errors to the sender.

These errors often indicate faulty routers or LAN drivers and transceiver problems. Table 19 identifies the problems that this data can indicate and your possible actions.

Possible Problem: A transceiver on your network is adding bits to the packets that are transmitted by the attached station.
Possible Action:
1. Use a network analyzer to identify the problematic transceiver.
2. If necessary, replace the transceiver, network adapter, or station.

Possible Problem: The jabber protection mechanism on a transceiver has failed; it can no longer protect the network from the jabbering produced by the attached station.
Possible Action: Replace the network card.

Possible Problem: Excessive noise on the cable. Note: Some 10/100 Mbps cards that autodetect the network speed may connect to the network at the wrong speed, causing excessive noise.
Possible Action: Check for improper cabling, faulty cable, faulty network equipment, or cables too close to noisy electronic equipment (lamps, for example). If your network card autodetects the network speed and you have ruled out other problems, manually configure the network speed.

Possible Problem: Faulty routers (two different network types are connected and the router is not enforcing proper frame size restrictions)
Possible Action: Notify the manufacturer.

Possible Problem: Faulty LAN driver
Possible Action: Replace the driver.

Possible Problem: A normal condition on a LinkSwitch® 1000, LinkSwitch® 3000, or CoreBuilder 5000 FastModule. If you use maximum-sized (1518-byte) Ethernet frames, the device's VLT-enabled ports add a frame tag of 4 bytes, resulting in a misleading Too Long Frame error.
Possible Action: These frames are passed successfully but will create the Too Long Frame error message. If you want to eliminate the error message, reduce your Ethernet packet frames by 4 bytes.

Device View

Device View allows you to display a variety of port and device-level statistics relevant to Ethernet packet loss. Table 20 describes these statistics and their use in troubleshooting.

Table 20 Activity and Error Statistics in Device View

Statistics Group: Activity
Description: Displays the total network activity and errors on the selected port.
Use in Troubleshooting: This data shows readable packets, broadcast packets, Collisions, total errors, and runts, which cause Too Short Errors. You can interpret this data in the following ways:
• The presence of runts can often be caused by Collisions; however, if the values increase at specific times of the day, it may indicate you need to change the network topology to manage the traffic more efficiently (for example, with switches or routers).
• Runts can also be caused by a badly terminated coax cable.
• Large numbers of runts not associated with high levels of collisions can indicate a transmission problem (examine the cable).
• Particularly high numbers of Collisions compared to the total number of readable packets can point to a hardware problem (a bad adapter) or to a data loop.
• A high proportion of Broadcast Packets (>10%) on a heavily utilized network (>50% of available bandwidth) can point to an incorrectly configured bridge or router on the network.

Statistics Group: Errors
Description: Displays the number of frames with errors on the selected port.
Use in Troubleshooting: The significance of errors depends on accompanying errors and prevailing network conditions. See the following error data for more information:
• Alignment Errors: Table 16
• FCS Errors: Table 16
• Too Long Errors: Table 19
• Too Short Errors or runts: Table 19
• Late Collisions: Table 18

To display Activity and Errors statistics for a device or port, follow these steps:

1. Select the required port or device.

2. From the shortcut menu, select Activity or Errors.

The statistics available depend on the type of port or device selected. See Table 20 for troubleshooting information.

Ethernet Packet Loss Reference

This section explains terms that are relevant to Ethernet packet loss and provides additional conceptual and problem analysis detail.

Alignment Errors

An Alignment Error indicates a received frame for which both of the following are true:

• The number of bits received is not a whole number of bytes (that is, not an integral multiple of 8).

• The frame has a Frame Check Sequence (FCS) error.

Alignment Errors often result from MAC layer packet formation problems, cabling problems that cause corrupted or lost data, and packets that pass through more than two cascaded multiport transceivers. See FCS Errors for more information about interpreting Alignment Errors.

Collisions

Collisions indicate that two devices detect that the network is idle and try to send packets at exactly the same time (within one round-trip delay). Because only one device can transmit at a time, both devices must stop sending and attempt to retransmit. Collisions are detected by the transmitting stations.

The retransmission algorithm helps to ensure that the packets do not retransmit at the same time. However, if the two devices retry at nearly the same time, packets can collide again; the process repeats until either the packets finally pass onto the network without collisions or 16 consecutive collisions occur and the packets are discarded.
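The retransmission algorithm referred to above is, in standard Ethernet, a truncated binary exponential backoff: after each collision a station waits a random number of slot times, and the range it draws from doubles with each attempt up to a cap. The Python sketch below illustrates the idea; the function and constant names are illustrative, and a real controller implements this in hardware.

```python
import random

MAX_ATTEMPTS = 16    # the frame is discarded after 16 consecutive collisions
BACKOFF_LIMIT = 10   # the random range stops doubling after 10 collisions


def backoff_slots(collision_count, rng=random):
    """Return how many slot times to wait after the Nth consecutive collision.

    Returns None when the attempt limit is reached and the frame is dropped.
    """
    if collision_count >= MAX_ATTEMPTS:
        return None
    k = min(collision_count, BACKOFF_LIMIT)
    # Pick uniformly from 0 .. 2**k - 1 slot times.
    return rng.randrange(2 ** k)


for attempt in (1, 2, 3, 10, 15, 16):
    print(attempt, backoff_slots(attempt))
```

Because each station draws its own random wait, two stations that have just collided are unlikely to pick the same delay, and the widening range makes repeated collisions progressively less likely.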

CRC Errors

A Cyclic Redundancy Check (CRC) Error is an RMON statistic that combines FCS Errors and Alignment Errors. These errors indicate that packets were received with:

• A bad FCS and an integral number of octets (FCS Errors)
• A bad FCS and a non-integral number of octets (Alignment Errors)

CRC Errors can cause an end station to freeze. If a large number of CRC Errors are attributed to a single station on the network, replace the station's network interface board. Typically, a CRC Error rate of more than 1 percent of network traffic is considered excessive.

Excessive Collisions

Excessive Collisions indicate that 16 consecutive collisions have occurred, usually a sign that the network is becoming congested. For each excessive collision count (or after 16 consecutive collisions), a packet is dropped. If you know the normal rate of excessive collisions, then you can determine when the rate of packet loss is affecting your network's performance. See Knowing Your Network's Configuration for more information.

FCS Errors

Frame Check Sequence (FCS) Errors, a type of CRC, indicate that frames received by an interface are an integral number of octets long but do not pass the FCS check. The FCS is a mathematical way to ensure that all the frame's bits are correct without having the system examine each bit and compare it to the original. Packets with Alignment Errors also generate FCS Errors.
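As an illustration of how an FCS check works, the Python sketch below appends a 32-bit CRC to a frame and verifies it on receipt. It relies on the standard library's zlib.crc32, which uses the same CRC-32 polynomial as Ethernet; the helper names and the sample payload are illustrative, and the sketch ignores preamble and other physical-layer details.

```python
import struct
import zlib


def append_fcs(frame):
    """Return the frame bytes followed by a 4-byte CRC-32 check value."""
    fcs = zlib.crc32(frame) & 0xFFFFFFFF
    # The check value is appended least significant byte first.
    return frame + struct.pack("<I", fcs)


def fcs_ok(frame_with_fcs):
    """Recompute the CRC over the frame body and compare it to the trailer."""
    if len(frame_with_fcs) < 4:
        return False
    body, trailer = frame_with_fcs[:-4], frame_with_fcs[-4:]
    return struct.pack("<I", zlib.crc32(body) & 0xFFFFFFFF) == trailer


sent = append_fcs(b"example frame contents")
print(fcs_ok(sent))            # the intact frame passes the check

damaged = bytearray(sent)
damaged[3] ^= 0x01             # flip a single bit, as line noise might
print(fcs_ok(bytes(damaged)))  # the damaged frame fails the check
```

A frame that fails this check with a whole number of octets would be counted as an FCS Error; one that fails it with leftover bits would be counted as an Alignment Error.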

Both Alignment Errors and FCS Errors can be caused by equipment powering up or down or by interference (noise) on unshielded twisted-pair (10BASE-T) segments. In a network that complies with the Ethernet standard, FCS or Alignment Errors indicate bit errors during a transmission or reception. A very low rate is acceptable. Although Ethernet allows a 1 in 10^8 bit error rate, typical Ethernet performance is 1 in 10^12 or better.

Late Collisions

Late Collisions indicate that two devices have transmitted at the same time, but cabling errors (most commonly, excessive network segment length or repeaters between devices) prevent either transmitting device from detecting a collision. Neither device detects a collision because the time to propagate the signal from one end of the network to the other is longer than the time to put the entire packet on the network. As a result, neither of the devices that cause the late collision senses the other's transmission until the entire packet is on the network.

Although late collisions occur for small packets, the transmitter cannot detect them. As a result, a network suffering measurable Late Collisions for large packets is losing small packets as well.
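A short calculation shows why small packets are the ones that go unnoticed. The Python sketch below compares the time needed to transmit a frame with the time a collision signal needs to come back to the sender. The function names are illustrative, the round-trip delay of 60 microseconds is an assumed value for an over-long network, and the comparison uses round-trip delay as the practical bound.

```python
def frame_time_us(frame_octets, rate_mbps=10.0):
    """Time in microseconds to put a frame of the given size on the wire."""
    # 1 Mbps is 1 bit per microsecond, so bits / Mbps gives microseconds.
    return frame_octets * 8 / rate_mbps


def collision_detectable(round_trip_us, frame_octets, rate_mbps=10.0):
    """True if the sender is still transmitting when the collision returns."""
    return round_trip_us <= frame_time_us(frame_octets, rate_mbps)


# A minimum-size 64-octet frame occupies the wire for 51.2 microseconds
# at 10 Mbps.
print(frame_time_us(64))

# Assumed round-trip delay on a network that is too long.
round_trip = 60.0
print(collision_detectable(round_trip, 64))    # small frame: sender has finished
print(collision_detectable(round_trip, 1518))  # large frame: sender still sending
```

On this assumed network the sender of a 1518-octet frame is still transmitting when the collision arrives and can report a Late Collision, while the sender of a 64-octet frame has already finished and never learns that its packet was lost.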

Nonstandard Ethernet Problems

Table 21 lists the symptoms that typically occur if a system violates the Ethernet standard.

Table 21 Symptoms of Common Ethernet Network Problems

Symptoms: FCS Errors and Alignment Errors increase significantly.
Problem: Network cabling is too long.
Notes: If you use a promiscuous network monitor, the number of Late Collisions reported by stations should correlate with the FCS and Alignment Errors reported by the monitor.

Symptoms: FCS and Alignment Errors increase proportionally with interference (sometimes referred to as noise hits).
Problem: Network segment is noisy.
Notes: Typically observed on a 10BASE-T network segment in a noisy environment. If you use multiple promiscuous monitors, the FCS and Alignment Errors among the monitors will not correlate. If the monitor can track runts (also called Too Short Errors), the number of runt packets should be significantly higher than normal.

Symptoms: FCS and Alignment Errors are much higher than normal.
Problem: Networks do not conform to the access scheme of Carrier Sense Multiple Access with Collision Detect (CSMA/CD).
Notes: Occurs when some implementations of Ethernet in the segment are not entirely compatible with IEEE 802.3 repeaters. Collision fragments linger on the network long enough to collide with retry packets at the minimum interpacket gap (IPG). The IPG is smaller on one side of the repeated network, causing a lost packet. Ethernet controllers cannot receive packets that are separated by 4.7 µs or less. Some controllers cannot sustain receptions of packets separated by as much as 9.6 µs. If runt packets are received one after another and are followed by a collision fragment, Ethernet controllers that cannot sustain reception will lose packets.

Receive Discards

Receive Discards indicate that received packets could not be delivered to a high-layer protocol because of congestion or packet errors.

Too Long Errors

A Too Long Error indicates that a packet is longer than 1518 octets (including FCS octets) but otherwise well formed. Too Long Errors are often caused by a bad transceiver, a malfunction of the jabber protection mechanism on a transceiver, or excessive noise on the cable.

Too Short Errors

A Too Short Error, also called a runt, indicates that a packet is fewer than 64 octets long (including FCS octets) but otherwise well formed.
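The definitions of Alignment Errors, FCS Errors, Too Long Errors, and Too Short Errors in this reference can be summarized as a small classifier. The Python sketch below is a simplified illustration: the function name is hypothetical, frames whose FCS is good but which carry a few extra bits are treated by assumption as well formed, and oversized or undersized frames with a bad FCS are simply reported by their FCS or alignment status.

```python
MIN_OCTETS = 64     # shortest well-formed frame, including FCS octets
MAX_OCTETS = 1518   # longest well-formed frame, including FCS octets


def classify_frame(bit_count, fcs_good):
    """Name the error category for a received frame, or "OK" if none applies."""
    whole_octets, extra_bits = divmod(bit_count, 8)
    if not fcs_good:
        # Bad FCS: the length decides between the two CRC error types.
        return "Alignment Error" if extra_bits else "FCS Error"
    # Good FCS: only the length can be wrong.
    if whole_octets > MAX_OCTETS:
        return "Too Long Error"
    if whole_octets < MIN_OCTETS:
        return "Too Short Error"
    return "OK"


print(classify_frame(64 * 8, True))        # minimum-size frame
print(classify_frame(63 * 8, True))        # runt
print(classify_frame(1519 * 8, True))      # oversized
print(classify_frame(100 * 8, False))      # whole octets, bad FCS
print(classify_frame(100 * 8 + 3, False))  # extra bits, bad FCS
```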

Transmit Discards

Transmit Discards indicate that packets were not transmitted because of network congestion.

FDDI Ring Errors

Use these sections to identify and correct FDDI ring errors:

• FDDI Ring Errors Overview
• Identifying Ring Errors

Fiber Distributed Data Interface (FDDI) often corrects its own problems. However, because FDDI cannot correct all errors (especially those related to hardware problems), you should monitor FDDI errors.

Understanding the Problem

FDDI ring errors that you should monitor include:

• Elasticity Buffer Error Condition
• Frame Error Condition
• Frames Not Copied Condition
• Link Error Condition

Identifying the Problem

First, determine the type of FDDI ring errors and where they are occurring. Similar to the way you identify other FDDI problems, identify the upstream and downstream neighbors of the devices that you are monitoring.

Several types of network errors can cause FDDI performance problems. For example, problems with cables or physical connections may result in a link or frame error. Elasticity buffer (EB) errors can also lead to link and frame errors.

FDDI deals with port-related errors as follows

bull The variable PORTlerAlarm is the link error rate (LER) value at which a link connection generates an alarm When the LER is greater than the alarm setting Station Management (SMT) sends a Status Report Frame (SRF) to notify you that there is a problem with a port

The PORTlerAlarm threshold is set lower than the PORTlerCutoff threshold so that you are notified of a problem before the port is actually removed from the ring

bull When link errors reach the threshold defined by the variable PORTLERCutoff SMT breaks the connection disabling the PHY that detected the problem A Link Error Condition is also generated
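The relationship between the two thresholds can be sketched in Python. This is a simplified model, not how SMT is implemented: the function name is hypothetical, and the thresholds are assumed to be plain error rates (for example, errors per bit), whereas real FDDI equipment may encode them differently.

```python
def port_ler_state(ler: float, alarm: float, cutoff: float) -> str:
    """Return 'ok', 'alarm', or 'cutoff' for a measured link error rate.

    Assumes all three values are plain error rates and that the alarm
    threshold is set lower than the cutoff threshold, as the text describes.
    """
    if alarm >= cutoff:
        raise ValueError("alarm threshold must be lower than cutoff threshold")
    if ler >= cutoff:
        return "cutoff"  # SMT would break the connection
    if ler > alarm:
        return "alarm"   # SMT would send a Status Report Frame
    return "ok"
```

With an alarm threshold of 1e-8 and a cutoff of 1e-7 (illustrative values), a measured rate of 5e-8 raises the alarm while the port stays in the ring.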

FDDI deals with MAC-related errors as follows:

• When MAC frame errors reach a certain threshold, a Frame Error Condition is generated. Because the actual error can be further upstream than the immediate connection, the connection remains intact.

• For a large network, the worst-case MACFrameErrorRatio is less than 0.1 percent. However, during network configuration, frame error ratios can reach 50 percent for short periods. When you detect a sustained frame error ratio of more than 0.1 percent, a problem exists between the station that is reporting the condition and the nearest upstream MAC.
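The difference between a short spike and a sustained frame error ratio can be illustrated with a short Python sketch. It reads the threshold as 0.1 percent; the function name, the sample format, and the choice of three consecutive samples as "sustained" are assumptions made for illustration, not values from any FDDI tool.

```python
def sustained_frame_error(samples, threshold_percent=0.1, min_consecutive=3):
    """Return True if the frame error ratio stays above the threshold.

    samples: iterable of (error_frames, total_frames) per polling interval.
    min_consecutive: how many intervals in a row count as "sustained"
    (an assumed value; tune it to your polling period).
    """
    run = 0
    for errors, total in samples:
        ratio = (100.0 * errors / total) if total else 0.0
        run = run + 1 if ratio > threshold_percent else 0
        if run >= min_consecutive:
            return True
    return False
```

A single interval at 50 percent during reconfiguration does not trip this check, but three intervals in a row at 0.2 percent do.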

Solving the Problem

To solve problems related to FDDI errors, fix the hardware, cabling, or congestion problem.

Identifying Ring Errors

Use Status Watch to monitor your FDDI devices for Warning or Critical alerts.

Status Watch

Use Status Watch to identify FDDI ring errors.

Follow these steps:

1. Monitor the FDDI Status tool for the currently selected device.

2. Determine whether Status Watch is reporting Elasticity Buffer Errors or a high percentage of Frame Errors, Frames Not Copied, or Link Error Rates for the currently selected device.

FDDI Ring Errors Reference

This section provides additional conceptual and problem analysis detail.

Elasticity Buffer Error Condition

The Elasticity Buffer Error condition occurs when a port's elasticity buffer overflows or underflows. This condition usually indicates that a port's hardware is not operating within the tolerances that the FDDI standard specifies. Look for the problem in the hardware of either the port that is reporting the condition or the immediately adjacent port.

Frame Error Condition

The Frame Error condition occurs when the percentage of frames that contain errors exceeds a preset threshold. When a device is an uplink to FDDI (that is, the device is transmitting onto FDDI), this type of condition indicates that the ring is saturated. The ring is out of buffer space, and packets are being dropped from the device's backbone port.

The problem indicated by the frame errors is usually located between the MAC that reports the condition and its upstream neighbor. Because many physical connections can lie along this path, the MACFrameErrorRatio variable identifies only the two MACs between which the problem is occurring.

Frames Not Copied Condition

The Frames Not Copied condition occurs when the percentage of frames that are dropped because of insufficient buffer space exceeds a preset threshold. This condition indicates that the station is congested and is unable to process frames as quickly as they arrive. To help eliminate congestion:

• Add more capacity to the station.
• Reconfigure your network so that end stations that communicate heavily with one another are on the same bridge or switch.
• Filter out certain traffic.

Link Error Condition

The Link Error condition occurs when a port detects link errors at a rate that exceeds a preset threshold. When the Link Error threshold is exceeded, the station removes itself from the ring and tries to reinsert itself on the ring. This action creates a MAC Neighbor Change Event (which also occurs if a ring wraps).

Link errors may indicate an FDDI PHY hardware problem (such as a faulty transmitter) or a faulty cable or connector. Look for the problem in the portion of the network between the port that is reporting the condition and the first upstream transmitter.

MAC Neighbor Change Event

The MAC Neighbor Change event occurs when a MAC's upstream or downstream neighbor changes.

This event indicates either:

• A network reconfiguration
• Another station that is leaving or joining the ring

Network File Server Timeouts

Use these sections to identify and correct timeouts on network file servers:

• Network File Server Timeout Overview
• Looking for Obvious Errors
• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

See Network File Server Timeouts Reference for additional conceptual and problem analysis detail.

A network file server can time out if your network gets congested or if your server is having problems. Users might have problems transferring data to or from the server or copying files to or from it. To help you understand the troubleshooting process for this type of problem, an EXAMPLE throughout this section follows the symptoms, analysis, and resolution of a typical file server timeout problem.

Understanding the Problem

When users log in, their stations make network file server calls either to determine quotas (if this feature has been enabled) or to mount user home directories. Network file server timeout messages, even when spread across multiple nodes, indicate a problem either with the network or with a server.

EXAMPLE: UNIX users notice that it takes a long time - over 30 seconds in some cases - to log in to any machine. Some machines report network file server timeout messages, but the messages have no obvious pattern and are infrequent. You begin to get a sense of the problem.

Identifying the Problem

First, rule out the obvious causes. Ask these questions:

• Can you access the network file server with Telnet?
• Have any alarms been triggered?
• Are there any new errors?

The process of identifying the problem is developed in Looking for Obvious Errors.

Solving the Problem

To determine the cause, reproduce the fault while you monitor the network. After you know the cause, you can fix the problem.

The solutions to the network file server timeout are identified in these sections:

• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

Looking for Obvious Errors

To look for obvious errors, use these applications:

• Ping and Telnet - To test connectivity to the network file server nodes
• LANsentry Manager Alarms View - To search for triggered alarms
• LANsentry Manager Statistics View - To look for errors
• LANsentry Manager History View - To identify trends

Ping and Telnet

Determine whether you can contact network file server nodes using Ping and Telnet. If the response is extremely slow, then a problem may exist with the connections to the nodes. No delay indicates that the connections are normal, implying that the delay is occurring elsewhere. In this case, use LANsentry® Manager tools to determine whether packets are being lost or ignored.
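As a rough scripted stand-in for the manual Telnet check, the following Python sketch times a TCP connection to a server port and labels the result. It is a minimal sketch: the function names, the one-second "slow" threshold, and the host name in the usage note are all hypothetical, and a TCP connect time is only an approximation of the responsiveness you would observe interactively.

```python
import socket
import time


def tcp_connect_time(host, port, timeout=5.0):
    """Return seconds taken to open a TCP connection, or None on failure."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:  # includes timeouts and refused connections
        return None


def classify_response(elapsed, slow_after=1.0):
    """Label a connect time; slow_after is an assumed threshold in seconds."""
    if elapsed is None:
        return "unreachable"
    return "slow" if elapsed > slow_after else "normal"
```

For example, classify_response(tcp_connect_time("fileserver.example", 23)) would check the Telnet port on a hypothetical server; a "normal" result suggests that the delay is occurring elsewhere.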

LANsentry Manager Alarms View

Using the LANsentry Manager Alarms View, you can determine if any configured alarms have been triggered.

Search the Alarms View to see if any MAC events have been logged.

EXAMPLE: MAC events have not been logged for the network to which the UNIX users are attached.

Even though no alarms have occurred, errors may exist. For example, a lower rate of background errors may exist just below the alarm threshold. Because they are based on maximum and minimum values, RMON alarms may miss constant, periodic, or low amounts of errors.

LANsentry Manager Statistics View

Using the LANsentry Manager Statistics View, you can display a multisegment graph of utilization and error statistics.

Follow these steps:

1. Set up a graph that shows utilization and errors on all your major segments.

2. Determine whether any segments are particularly busy or error prone.

EXAMPLE: You notice that one segment of the UNIX network, HUB3, is reporting Too Long Errors and FCS Errors roughly every second sample. While the amount is not higher than normal, it is currently higher than on any other segment.

LANsentry Manager History View

Using the LANsentry Manager History View, you can display a rolling history table to determine if the errors that you are seeing are new. For example, if you have a history table that runs for 30-minute samples over two days, you can compare the most recent sample to a previous sample, looking for new errors. If your probe has the resources, use a much finer resolution sample stored for a shorter time (every 30 seconds for 2 hours) to more easily spot recent errors.

EXAMPLE: You see that the history table shows that no error rates remained constant throughout the day. However, errors that did occur were on the device HUB3 and were Too Long Errors and FCS Errors.

Reproducing the Fault While Monitoring the Network

Although the RMON View in LANsentry Manager can show error rates and help you to identify the location of the problem, it may not provide enough data to solve the problem. To determine the cause of the problem, reproduce it while you monitor the network by using these applications:

• LANsentry Manager Top-N Graph - To locate a quiet node to use for reproducing the fault

• LANsentry Manager Packet Capture - To capture packets from the hub to which the quiet node is attached

• LANsentry Manager Packet Decode - To analyze the packets to assess network file server traffic and delays

• Address Tracker - To find the location of the problem nodes

EXAMPLE: Using LANsentry Manager, you find a hub on the network with a higher than normal error rate. However, the error rate does not seem high enough to cause login delays of 60 seconds or more.

LANsentry Manager Top-N Graph

Using the Top-N graph in the LANsentry Manager main window, locate a quiet node that has been showing the same problem. Choose a quiet node so that you do not receive excessive traffic when you try to isolate the problem.

EXAMPLE: You see that the node Monolith, which has the same Network File System (NFS) mounts as the other nodes on the network, is quiet. You decide to use this node for reproducing the fault. See Network File System (NFS) Protocol for more information about NFS.

LANsentry Manager Packet Capture

Using the LANsentry Manager Packet Capture application, capture packets from the network using predefined patterns and start-and-stop conditions.

Follow these steps:

1. Set up a capture buffer on a probe that is connected to the same hub as the quiet node. Until you know more about the problem, set a very general filter.

   EXAMPLE: You select a MAC-layer filter and set a conversation filter to capture all packets to and from Monolith.

2. Telnet into and log out from the quiet node. Then reset the capture buffer. Repeat this procedure until you see the problem reflected in your captured data. To keep the buffer information clean, reset the buffer each time that you repeat the procedure.

3. When you see the delay, note the rough value of the packet count on the LANsentry Manager packet buffer.

By noting the packet count at which you think the delay has occurred, you can narrow the problem to within about 20 packets in the buffer. If you have used an extremely quiet node, you may even identify the exact packet.

LANsentry Manager Packet Decode

The LANsentry Manager Packet Decode application decodes all major protocols and displays the packet contents at three levels of detail: summary information, header information, and actual packet content.

Follow these steps:

1. Open the buffer in the Packet Decode application and locate the number of the packet at which the delay occurred.

2. Select the packet and launch a MAC-layer conversation filter. In the filter display, look for a gap in the conversation (that is, where the node sent a request and then resent it after an interval that approximately matches the delay you experienced when recreating the problem).

3. Repeat the test to determine if the result concentrates on one node or if it appears on other nodes.

EXAMPLE: On the quiet node that you selected, the delay is obvious. You see an NFS request going out to a node and a repeat of the request 30 seconds later. During that time, the node did not respond. You now know that the delay occurred because nodes were not seeing responses for NFS requests. When you repeat the test on other nodes, you find that the delay is happening with more than one destination node.
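Looking for a gap in a conversation amounts to scanning consecutive packet timestamps for a long silence. The Python sketch below shows the idea on a hand-written list of packets; the function name, the tuple layout, and the 10-second minimum gap are hypothetical and do not reflect any LANsentry Manager export format.

```python
def find_gaps(packets, min_gap=10.0):
    """Return (start, end, seconds) for each silence of at least min_gap.

    packets: list of (timestamp_seconds, description), sorted by time.
    """
    gaps = []
    for (t0, _), (t1, _) in zip(packets, packets[1:]):
        if t1 - t0 >= min_gap:
            gaps.append((t0, t1, t1 - t0))
    return gaps


# Illustrative conversation: a request that is retried 30 seconds later.
conversation = [
    (0.0, "NFS request"),
    (0.2, "NFS reply"),
    (0.5, "NFS request"),
    (30.5, "NFS request (retry)"),
    (30.6, "NFS reply"),
]
```

Calling find_gaps(conversation) reports one gap, between the unanswered request at 0.5 seconds and its retry at 30.5 seconds.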

Address Tracker

Use Address Tracker, which polls managed devices, to determine the hubs to which the problem nodes are attached. If the problem end stations are located on unmanaged devices, then you can at least narrow the problem to those unmanaged devices.

EXAMPLE: Although your network does not have managed hubs that Transcend® NCS management software can poll, it does have managed switches. When it polls the switches, Address Tracker displays the switch ports on which addresses were last seen. This information indicates the hub (but not the hub port) on which the device is located.

LANsentry Manager Packet Decode

After you know the location of the hub that has the problem node, monitor the problem from the hub using LANsentry Manager Packet Decode.

Follow these steps:

1. To capture packets from one of the nodes on the hub, set up another capture buffer and repeat the exercise that is described in LANsentry Manager Packet Capture. Because a delay may occur on a different node, use two capture buffers without stopping the first one.

   Note the rough packet count where the delay appears.

2. Display a conversation filter of the packet where the delay appears and look for the gap in the conversation.

EXAMPLE: You hope that the nodes are on the same hub. You find that all the nodes are on HUB3. This result indicates that FCS Errors may be causing the timeouts. However, because the errors occur at a low rate, you decide to verify this diagnosis. You monitor the problem from the hub, logging in and out many times, and the delay eventually occurs. This time, the delay shows that the node's reply had an FCS Error even though the node received the request. The switch would not have transmitted this packet, causing a timeout on the NFS protocol. The retry time is presumably 30 seconds. During this test, you see the problem occurring on another node.

Correcting the Fault

Without a managed hub, you may find it very difficult to track down the source of network file server timeout errors. To find the problematic node, you must either systematically isolate nodes by monitoring each node for a prolonged period or temporarily insert a managed hub.

EXAMPLE: You notice that the captured error packet failed FCS because it was corrupted by a regular pattern during transmission. A possible reason for this occurrence is a jabbering node. This explanation makes sense because FCS/Jabber frames increased linearly when you were monitoring the live network.

Network File Server Timeouts Reference

This section explains terms that are relevant to network file server timeouts and provides additional conceptual and problem analysis detail.

Jabbering

Jabbering occurs when a node transmits illegal-length packets and is possibly not operating within carrier specifications. In effect, another node has written bad data over a valid packet. This bad data is often interpreted as a repeated sequence of data.

Network File System (NFS) Protocol

NFS is a distributed file system protocol developed by Sun Microsystems that allows a computer system to access files over a network as if they were on its local disks. This protocol has been incorporated into products by more than 200 companies. It is now a de facto Internet standard.

NFS is one protocol in the NFS suite of protocols, which includes NFS, RPC, XDR (External Data Representation), and others. These protocols are part of a larger architecture that Sun Microsystems refers to as Open Network Computing (ONC). ONC is a distributed applications architecture designed by Sun and currently controlled by a consortium led by Sun.

Troubleshooting TCP/IP - Detailed Steps

This article shows how to troubleshoot TCP/IP connectivity between computers on a Windows network. If you haven't already done so, disable XP's Internet Connection Firewall on all local area network connections, and remove all firewall programs on the network. Improperly configured firewalls are the most common cause of TCP/IP problems.

Open a Command Prompt Window

For many of these steps, you'll be typing at the command prompt. To open a command prompt window in Windows 2000 or XP, click Start | Run, type cmd in the box, and click OK. To open a command prompt window in Windows 95, 98, or Me, click Start | Run, type command in the box, and click OK. Type one command per line, and press Enter after each one to execute it. To close the command prompt window, use the exit command.

Determine the TCP/IP Settings

Determine the TCP/IP settings of each computer on the local area network. In XP, open the Network Connections folder, right click the LAN connection, and click Status | Support | Details. For example, here are the Status and Details views for the LAN connection on an Internet Connection Sharing host.

In Windows 95/98/Me, click Start | Run, type winipcfg in the box, and click OK. Select the LAN adapter from the menu, and click More Info. Here's the winipcfg view for an ICS client running Windows Me.

You can also see the TCP/IP settings from the command prompt. This is especially convenient if a computer has more than one network adapter. Use the ipconfig /all command, which is available in all versions except Windows 95. The output from this command can be long, so it's best to write it to a file. Specify the file name in the command this way:

ipconfig /all >ipconfig.txt

Description of TCP/IP Settings

Here are the TCP/IP settings that are used in network troubleshooting:

• IP Address - Unique address assigned to a network adapter. A computer with multiple network adapters has an IP address for each one, and each one must be in a different subnet.

• Subnet Mask - Used in conjunction with the IP address to determine which subnet an adapter belongs to. At the simplest level, communication is only possible between two network adapters when they're in the same subnet.

• Default Gateway - IP address of a computer or router, on one of this computer's local area networks, that knows how to communicate with subnets not present on this computer. For an Internet connection, the default gateway is a router belonging to your Internet service provider, and all access to sites on the Internet goes through it. For an ICS client, the default gateway is the ICS host. If you use a hardware router, it serves as the default gateway.

• DHCP Server - If an adapter is configured to obtain an IP address automatically, this is the address of the server that provides it. It could be your ISP, an ICS host, or a hardware router.

• DNS Servers - IP address of one or more Domain Name Server computers. DNS servers translate Internet names to their IP addresses (like 63.146.109.227).
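How the subnet mask decides whether two adapters share a subnet can be shown with Python's standard ipaddress module. This is a minimal sketch: the function name is hypothetical, and the addresses and mask in the usage note are illustrative values, not settings read from any computer.

```python
import ipaddress


def same_subnet(ip_a: str, ip_b: str, mask: str) -> bool:
    """Return True if both addresses fall in the same subnet for this mask."""
    # strict=False lets us pass a host address rather than a network address.
    net_a = ipaddress.ip_network(f"{ip_a}/{mask}", strict=False)
    net_b = ipaddress.ip_network(f"{ip_b}/{mask}", strict=False)
    return net_a == net_b
```

For example, same_subnet("192.168.1.101", "192.168.1.123", "255.255.255.0") is True, while changing the second address to 192.168.2.123 makes it False.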

Subnets

See our article on subnets for a brief description of how they work. For more details, see this Microsoft Knowledge Base article.

If two computers are supposed to be on the same subnet but aren't, something is wrong with the network hardware or software configuration. This is most likely to happen when one of them receives an IP address of 169.254.x.x, which indicates that:

• It's configured to obtain an IP address automatically.
• It couldn't find a DHCP server on the network to make the assignment.
• Windows assigned it an Automatic Private IP Address.
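A quick way to spot such an address in a script is to test whether it falls in the 169.254.0.0/16 range. The sketch below uses Python's standard ipaddress module; the function name is hypothetical.

```python
import ipaddress

# The Automatic Private IP Address range described above.
APIPA_RANGE = ipaddress.ip_network("169.254.0.0/16")


def is_apipa(address: str) -> bool:
    """Return True if the address is in the 169.254.x.x range."""
    return ipaddress.ip_address(address) in APIPA_RANGE
```

An adapter for which is_apipa returns True most likely did not reach a DHCP server.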

See our article on Specific Networking Problems and Their Solutions for more information.

Pinging

The ping command is the basic tool for testing TCP/IP connectivity. It sends a special packet (called ICMP Echo) to a particular IP address and looks for a reply. If everything is working right, the reply comes back. If not, the ping times out in a few seconds. By default, the ping command repeats the process four times. Here's an example of an ICS client computer pinging a Windows XP Home Edition ICS host, using the host's IP address and its computer name.

When ping fails, you'll see one of these error messages:

• Request timed out - The IP address is valid, but there's no reply from it. If the IP address is on a local area network, the most likely cause is a firewall program blocking the ping.

• Unknown host <name> or Ping request could not find host <name> - The computer name doesn't exist on the local area network. Make sure that NetBIOS over TCP/IP is enabled.

• Destination host unreachable - The IP address isn't on a local area network, and the default gateway can't access it. Either there's no default gateway, its address is wrong, or it isn't functioning.

Pinging the Local Area Network

Here is a series of ping commands to use in finding where a problem occurs on a local area network. Run them in the order shown, and don't go on to the next command until all of the previous commands work properly. In this example:

• The computer being tested is named Winxp, with IP address 192.168.1.101.
• There's another computer on the network, named Win98, with IP address 192.168.1.123.

Substitute the appropriate IP addresses and computer names for your network.

Command             Target                         What Ping Failure Indicates
ping 127.0.0.1      Loopback address               Corrupted TCP/IP installation
ping localhost      Loopback name                  Corrupted TCP/IP installation
ping 192.168.1.101  This computer's IP address     Corrupted TCP/IP installation
ping winxp          This computer's name           Corrupted TCP/IP installation
ping 192.168.1.123  Another computer's IP address  Bad hardware or NIC driver
ping win98          Another computer's name        NetBIOS name resolution failure

To fix a corrupted TCP/IP installation on Windows XP, follow the steps in this Microsoft Knowledge Base article. For Windows 95/98/Me, uninstall the TCP/IP protocol in Control Panel | Network, reboot, and reinstall it. If that doesn't fix it, use this procedure on Windows 95 or 98.
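The run-in-order, stop-at-the-first-failure procedure for the local area network pings can be sketched in Python. The targets below are the example names and addresses from this section; the function names are hypothetical, and the exit-code check is an approximation, since some ping implementations return success even when the reply is an error message.

```python
import platform
import subprocess

# (target, what a failure indicates), in the order they should be run.
LAN_CHECKS = [
    ("127.0.0.1", "Corrupted TCP/IP installation"),
    ("localhost", "Corrupted TCP/IP installation"),
    ("192.168.1.101", "Corrupted TCP/IP installation"),
    ("winxp", "Corrupted TCP/IP installation"),
    ("192.168.1.123", "Bad hardware or NIC driver"),
    ("win98", "NetBIOS name resolution failure"),
]


def ping_once(target, count=4):
    """Run the system ping command; True if it exits with status 0."""
    flag = "-n" if platform.system() == "Windows" else "-c"
    result = subprocess.run(
        ["ping", flag, str(count), target],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def first_failure(checks, ping=ping_once):
    """Return (target, meaning) for the first failing ping, or None."""
    for target, meaning in checks:
        if not ping(target):
            return target, meaning
    return None
```

Passing the ping function as a parameter makes it possible to try the logic with a stand-in before running real pings.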

Pinging the Internet

You can also use ping to find a problem with Internet access. Run these commands in the order shown, and don't go on to the next command until all of the previous commands work properly. Use the Default Gateway and DNS Server addresses that you got from the winipcfg or ipconfig /all command.

Command                 Target               What Ping Failure Indicates
ping w.x.y.z            Default Gateway      Default Gateway down
ping w.x.y.z            DNS Server           DNS Server down
ping w.x.y.z            Web site IP address  Internet service provider or web site down
ping www.something.com  Web site name        DNS Server down or web site down

REFERENCES

WWW.PRACTICALLYNETWORK.COM

WWW.COMPUTERHOPE.COM

WWW.WIKIPEDIA.ORG



• Can the workstation communicate with the network at all, or is the problem limited to communication with the server? Test by sending a Ping or by connecting to other devices.

• Is the workstation the only device that is unable to communicate with the server, or do other workstations have the same problem? Test connectivity at other workstations.

• If other workstations cannot communicate with the server, can they communicate with other network devices? Again, test the connectivity.

The analysis process follows these steps:

1. Can the workstation communicate with any other device on the subnetwork?

   • If no, then go to step 2.
   • If yes, determine if only the server is unreachable.
     • If only the server cannot be reached, this suggests a server problem. Confirm by doing step 2.
     • If other devices cannot be reached, this suggests a connectivity problem in the network. Confirm by doing step 3.

2. Can other workstations communicate with the server?

   • If no, then most likely it is a server problem. Go to step 3.
   • If yes, then the problem is that the workstation is not communicating with the subnetwork. (This situation can be caused by workstation issues or a network issue with that specific station.)

3. Can other workstations communicate with other network devices?

   • If no, then the problem is likely a network problem.
   • If yes, the problem is likely a server problem.
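The three steps can be read as a small decision procedure. The Python sketch below is one possible reading of them, not a definitive algorithm: the function and parameter names are hypothetical, and each argument is the yes/no answer to a connectivity test that you have already carried out.

```python
def locate_fault(ws_reaches_subnet, only_server_unreachable,
                 others_reach_server, others_reach_devices):
    """Return 'workstation', 'server', or 'network' (likely fault area)."""
    # Step 3: can other workstations reach other network devices?
    step3 = "server" if others_reach_devices else "network"

    # Step 1: the workstation reaches some devices, but more than just
    # the server is unreachable, so confirm with step 3.
    if ws_reaches_subnet and not only_server_unreachable:
        return step3

    # Step 2: can other workstations reach the server?
    if others_reach_server:
        return "workstation"
    return step3
```

For example, if the workstation reaches nothing but other workstations reach the server, the result is "workstation"; if nobody reaches the server but other devices are reachable, the result is "server".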

When you determine whether the problem is with the server subnetwork or workstation you can further analyze the problem as follows

bull For a problem with the server - Examine whether the server is running if it is properly connected to the network and if it is configured appropriately

bull For a problem with the subnetwork - Examine any device on the path between the users and the server

bull For a problem with the workstation - Examine whether the workstation can access other network resources and if it is configured to communicate with that particular server

Equipment for Testing

To help identify and test the cause of problems, have available:

• A laptop computer that is loaded with a terminal emulator, TCP/IP stack, TFTP server, CD-ROM drive (to read the online documentation), and some key network management applications, such as LANsentry® Manager. With the laptop computer, you can plug into any subnetwork to gather and analyze data about the segment.

• A spare managed hub to swap for any hub that does not have management. Swapping in a managed hub allows you to quickly spot which port is generating the errors.

• A single port probe to insert in the network if you are having a problem where you do not have management capability.

• Console cables for each type of connector, labeled and stored in a secure place.

Solving the Problem

Many device or network problems are straightforward to resolve, but others yield misleading symptoms. If one solution does not work, continue with another.

A solution often involves:

• Upgrading software or hardware (for example, upgrading to a new version of agent software or installing Gigabit Ethernet devices)

• Balancing your network load by analyzing:
    • What users communicate with which servers
    • What the user traffic levels are in different segments

    Based on these findings, you can decide how to redistribute network traffic.

• Adding segments to your LAN (for example, adding a new switch where utilization is continually high)

• Replacing faulty equipment (for example, replacing a module that has port problems or replacing a network card that has a faulty jabber protection mechanism)

To help solve problems, have available:

• Spare hardware equipment (such as modules and power supplies), especially for your critical devices

• A recent backup of your device configurations to reload if flash memory gets corrupted (which can sometimes happen due to a power outage)

Your Network Troubleshooting Toolbox

A robust network troubleshooting toolbox consists of items (such as network management applications, hardware devices, and other software) to recognize, diagnose, and solve networking problems. It contains:

• Transcend Applications
• Network Management Platforms
• 3Com SmartAgent Embedded Software
• Other Commonly Used Tools

Transcend management software is optimized for managing 3Com devices and their attached networks. However, some applications, such as LANsentry Manager, can manage any vendor's networking equipment that complies with the Remote Monitoring (RMON) Management Information Base (MIB).

This section describes these Transcend applications, which you can use to troubleshoot your network:

• Transcend Central
• Status Watch
• Address Tracker
• LANsentry Manager
• Traffix Manager
• Device View

Transcend Central

Start with Transcend Central, which is an asset management and device grouping application, to understand what your network consists of and to control the Transcend NCS network management troubleshooting tools. Transcend Central is available as both a native Windows application and a Java application that you can access using a Web browser.

Using Transcend Central for troubleshooting, you can:

• Display an inventory of device, module, and port information.

• Group devices to make your troubleshooting tasks easier. By managing a collection of devices, you can simultaneously perform the same tasks on each device in a group and locate physical or logical problems on your network.

• Launch Transcend NCS applications, including some of your primary Transcend NCS troubleshooting tools:
    • Status Watch, which includes Web Reporter (from the Java version)
    • Address Tracker
    • LANsentry Manager
    • Traffix Manager
    • Device View

Status Watch

The Status Watch applications manage 3Com devices and their attached networks. Status Watch applications primarily poll for MIB-II data. Status Watch is a performance monitoring application that allows you to monitor the operational status of your network devices and quickly identify any problems that require your attention. It works in conjunction with Web Reporter.

Web Reporter

Web Reporter is a data-reporting application that runs in a World Wide Web (WWW) browser. It generates reports from data that Status Watch collects, allowing you to compare network statistics against a baseline.

Address Tracker

Address Tracker is an address collection and discovery application that:

• Polls managed devices for all MAC addresses
• Polls managed devices and routers for IP addresses to perform MAC-to-IP address translation
• Uses Device View to disable troublesome ports

LANsentry Manager

LANsentry Manager is a set of integrated applications that displays and explores the real-time and historical data that RMON-compliant devices (probes) on the network capture. LANsentry Manager uses SNMP polling to gather RMON and RMON2 data from the probes.

Use LANsentry Manager to:

• Monitor current performance of network segments
• See trends over time
• Spot signs of current problems
• Configure alarms to monitor for specific events
• Capture packets and display their contents

LANsentry Manager works with any device (from 3Com or other vendors) that supports the RMON MIB or the RMON2 MIB.

Traffix Manager

Traffix Manager is a performance-monitoring application that provides information about layer 2 (RMON) and layer 3 conversations between nodes. It helps you to assess traffic patterns on your network. Traffix Manager:

• Monitors all the stations that the RMON2-compliant probes encounter on your network

• Captures and stores RMON and RMON2 data for your network's protocols and applications

• Displays traffic between stations in user-defined views of the network

• Graphs current or historical data on the devices selected

• Delivers reports for user-specified stations and time periods as PostScript to your printer or as HTML to your Web server

• Launches LANsentry Manager tools for in-depth analysis of a station or a conversation between stations

You can use Traffix Manager to

• Know your network - Understand overall flow patterns and interactions between systems, and determine how your network is really being used at the application level.

• Optimize your network - Gain an insight into traffic and application usage trends to help you optimize the use and placement of current network resources and make wise decisions about capacity planning and network growth.

Traffix Manager works with any device (from 3Com or other vendors) that supports the RMON2 MIB.

Device View

The Device View application is a device configuration tool. When you troubleshoot your network, you can use Device View to determine or change a device's configuration. You can also use Device View to look at a device's statistics and to set alarms.

Device View manages only 3Com devices.

Network Management Platforms

As part of your troubleshooting toolbox, your network management platform is the first place to go to view the overall health of your network. With the platform, you can understand the logical configuration of your network and configure views of your network to understand how devices work together and the role that they play in the users' work. The network management platform that supports your Transcend software installation can provide valuable troubleshooting tools. Transcend runs on several platforms within the NT and UNIX environments.

The platform discovers the devices. Transcend imports that information from the platform to populate the core database. Unless you are rediscovering, the user must manually update the platform.

Using this device database, a map displays the graphical representation of your network. Each device on your network appears as a symbol (icon) on the map. You can configure views of your network to show devices on the same subnetworks or floors.

You can monitor network performance and diagnose network performance and connectivity problems. You can also:

• Take a snapshot of your network in its normal state. The snapshot records the state of your network at a particular instant. If you later have network performance problems, you can compare the current state of your network to the snapshot.

• Quickly determine the connectivity status of a device by noting the color of its map symbol. Red usually means that communication with a device has ceased.

• Diagnose connectivity problems by determining whether two devices can communicate. If they can communicate, then examine the route between the devices, the number of packets that were sent and lost, and the roundtrip time between the two devices.

• Manage MIB information (for example, collecting and storing MIB data for trend analysis and graphing) using MIB queries. Transcend compiles MIBs and allows you to navigate up and down the MIB tree to retrieve MIB objects from devices. You can set thresholds for MIB data and generate events when a threshold is exceeded.

• Configure the software to act on certain events. The Event Categories window informs you of any unexpected events (which arrive in the form of traps).

For more information, see the documentation that is shipped with your software.

3Com SmartAgent Embedded Software

Traditional Simple Network Management Protocol (SNMP) management places the burden of collecting network management information on the management station. In this traditional model, software agents collect information about throughput, record errors or packet overflows, and measure performance based on established thresholds. Through a polling process, agents pass this information to a centralized network management station whenever they receive an SNMP query. Management applications then make the data useful and alert the user if there are problems on the device.

As a useful companion to traditional network management methods, 3Com's SmartAgent® technology places management intelligence into the software agent that runs within a 3Com device. This scalable solution reduces the amount of computational load on the management station and helps minimize management-related network traffic.

SmartAgent software, which uses the RMON MIB, is self-monitoring, collecting and analyzing its own statistical, analytical, and diagnostic data. In this way, you can conduct network management by exception - that is, you are notified only if a problem occurs. Management by exception is unlike traditional SNMP management, in which the management software collects all data from the device through polling.

SmartAgent software works autonomously and reports to the network management station whenever an exceptional network event occurs. The software can also take direct action without involving the management station. Devices that contain SmartAgent software may be able to:

• Perform broadcast throttling to minimize the flow of broadcast traffic on your network
• Monitor the ratio of good frames to bad frames
• Switch a resilient link pair to the standby path if the primary path corrupts frames
• Report if traffic on vital segments drops below minimum usage levels
• Disable a port for five seconds to clear problems and then automatically reconnect it

The Transcend NCS applications LANsentry Manager and Traffix Manager make the RMON data that the SmartAgent software collects more usable by summarizing and correlating important information.

Other Commonly Used Tools

These commonly used tools can also help you troubleshoot your network:

• Network software, such as Ping, Telnet, and FTP and TFTP. You can use these applications to troubleshoot, configure, and upgrade your system.
• Network monitoring devices, such as Analyzers and Probes
• Tools, such as Cable Testers, for working on physical problems

Ping

Packet Internet Groper (Ping) allows you to quickly verify the connectivity of your network devices. Ping attempts to transmit a packet from one device to a station on the network and listens for the response to ensure that it was correctly received. You can validate connections on the parts of your network by pinging different devices:

• A successful response indicates that a valid network path exists between your station and the remote host and that the remote host is active.

• Slower response times than normal can indicate that the path is congested or obstructed.

• A failed response indicates that a connection is broken somewhere; use the message to help locate the problem. See Tips on Interpreting Ping Messages.

Strategies for Using Ping

Follow these strategies for using Ping:

• Ping devices when your network is operating normally so that you have a performance baseline for comparison. See Identifying Your Network's Normal Behavior for more information.

• Ping by IP address when:
    • You want to test devices on different subnetworks. This method allows you to Ping your network segments in an organized way, rather than having to remember all the hostnames and locations.
    • Your Domain Name System (DNS) server is down and your system cannot look up host names properly. You can Ping with IP addresses even if you cannot access hostname information.

• Ping by hostname when you want to identify DNS server problems.

• To troubleshoot problems that involve large packet sizes, Ping the remote host repeatedly, increasing the packet size each time.

• To determine if a link is erratic, perform a continuous Ping (for example, ping -s on UNIX systems such as Solaris, or ping -t on Windows), which indicates the time that it takes the device to respond to each Ping.

• To determine a route taken to a destination, use the trace route function (tracert on Windows, traceroute on UNIX).

• Consider creating a Ping script that periodically sends a Ping to all necessary networking devices. If a Ping failure message is received, the script can perform some action to notify you of the problem, such as paging you.

• Use the Ping functions of your network management platform. For example, in your HP OpenView map, select a device and click the right mouse button to gain access to ping functions.
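The Ping script suggested in the list above might look like the following minimal sketch. It is not part of the original guide and rests on assumptions: it shells out to the operating system's ping command, it assumes that command accepts -n (Windows) or -c (most UNIX-like systems) for the packet count, and the function names, host names, and the notify callback are hypothetical stand-ins for whatever paging or alerting mechanism you use.

```python
import platform
import subprocess


def ping_once(host, timeout_s=5):
    """Return True if the system ping command reports a reply from host."""
    # Assumption: Windows ping takes "-n <count>"; most UNIX-like pings take "-c <count>".
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    try:
        result = subprocess.run(
            ["ping", count_flag, "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, OSError):
        # No answer within the timeout, or the ping command is not available.
        return False
    return result.returncode == 0


def check_devices(hosts, ping=ping_once, notify=print):
    """Ping every host once; call notify for each failure and return the failed hosts."""
    failed = [host for host in hosts if not ping(host)]
    for host in failed:
        notify(f"Ping failed: {host}")
    return failed
```

To run the check periodically, call `check_devices` from a scheduler such as cron or from a loop that sleeps between rounds. Exit codes of ping vary between platforms, so verify how your system reports an unreachable host before relying on the result.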

Tips on Interpreting Ping Messages

Use the following ping failure messages to troubleshoot problems:

No reply from <destination>

Indicates that the destination routes are available but that there is a problem with the destination itself.

<destination> is unreachable

Indicates that your system does not know how to get to the destination. This message means either that routing information to a different subnetwork is unavailable or that a device on the same subnetwork is down.

ICMP host unreachable from gateway

Indicates that your system can transmit to the target address using a gateway, but that the gateway cannot forward the packet properly because either a device is misconfigured or the gateway is not operating.
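If a script collects ping output, the three messages above can be mapped to a first place to look. The sketch below is illustrative only: the function name is hypothetical, the matching is a simple substring test, and the exact wording of failure messages differs between ping implementations, so the patterns would need adjusting for your system.

```python
def interpret_ping_failure(message):
    """Map a ping failure message to the likely problem area described above."""
    text = message.lower()
    # Check the gateway message first, because it also contains "unreachable".
    if "icmp host unreachable" in text:
        return "gateway cannot forward: check for a misconfigured device or a gateway that is down"
    if "no reply" in text:
        return "route is available: check the destination itself"
    if "unreachable" in text:
        return "no known path: check routing information or a down device on the same subnetwork"
    return "unrecognized message"
```

The addresses used to try it out (for example `interpret_ping_failure("No reply from 192.0.2.10")`) are placeholders from the documentation address range.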

Telnet

Telnet, which is a login and terminal emulation program for Transmission Control Protocol/Internet Protocol (TCP/IP) networks, is a common way to communicate with an individual device. You log in to the device (a remote host) and use that remote device as if it were a local terminal.

If you have established an out-of-band Telnet connection with a device, you can use Telnet to communicate with that device even if the network is unavailable. This feature makes Telnet one of the most frequently used network troubleshooting tools. Usually, all device statistics and configuration capabilities are accessible by using Telnet to connect to the device's console. For more information about setting up an out-of-band connection, see Using Telnet, Serial Line, and Modem Connections.

You can invoke the Telnet application on your local system and set up a link to a Telnet process that is running on a remote host. You can then run a program that is located on a remote host as if you were working at the remote system.

FTP and TFTP

Most network devices support either the File Transfer Protocol (FTP) or the Trivial File Transfer Protocol (TFTP) for downloading updates of system software. Updating system software is often the solution to networking problems that are related to agent problems. Also, new software features may help correct a networking problem.

FTP provides flexibility and security for file transfer by:

• Accepting many file formats, such as ASCII and binary
• Using data compression
• Providing Read and Write access so that you can display, create, and delete files and directories
• Providing password protection

TFTP is a simple version of FTP that does not list directories or require passwords. TFTP only transfers files to and from a remote server.

Analyzers

An analyzer, which is often called a Sniffer, is a network device that collects network data on the segment to which it is attached, a process called packet capturing. Software on the device analyzes this data, which is a process referred to as protocol analysis. Most analyzers can interpret different types of protocol traffic, such as TCP/IP, AppleTalk, and Banyan VINES traffic.

You usually use analyzers for reactive troubleshooting - when you see a problem somewhere on your network, you attach an analyzer to capture and interpret the data from that area. Analyzers are particularly helpful for identifying intermittent problems. For example, if your network backbone has experienced moments of instability that prevent users from logging on to the network, you can attach an analyzer to the backbone to capture the intermittent problems when they happen again.

Probes

Like an analyzer, a probe is a network device that collects network data. Depending on its type, a probe can collect data from multiple segments simultaneously. It stores the collected data and transfers the data to an analysis site when requested. Unlike an analyzer, a probe does not interpret data.

A probe can be either a stand-alone device or an agent in a network device. The Transcend Enterprise Monitor 500 series and the SuperStack® II Monitor series are stand-alone RMON probes. LANsentry Manager and Traffix Manager use data from probes that comply with the RMON MIB or the RMON2 MIB.

You can use a probe daily to determine the health of your network. The Transcend NCS applications can interpret and report this data, alerting you to possible problems so that you can proactively manage your network. For example, an RMON2 probe can help you to analyze traffic patterns on your network. Use this data to make decisions about reconfiguring devices and end stations as needed.

Cable Testers

Cable testers examine the electrical characteristics of the wiring. They are most commonly used to ensure that building wiring and cables meet Category 5, 4, and 3 standards. For example, network technologies such as Fast Ethernet require the cabling to meet Category 5 requirements. Testers are also used to find defective and broken wiring in a building.

Knowing Your Network

You can better troubleshoot the problems on your network by:

• Knowing Your Network's Configuration
• Identifying Your Network's Normal Behavior

Knowing Your Network's Configuration

Part of understanding your network is knowing its physical and logical configuration. You should know:

• Which devices are on your network
• How the devices are configured
• Which devices are attached to the backbone
• Which devices connect your network to the outside world (WAN)

To keep track of your network's configuration, gather the following information:

• Site Network Map
• Logical Connections
• Device Configuration Information
• Other Important Data About Your Network

This data, when kept up-to-date, is extremely helpful for locating information when you experience network or device problems.

Site Network Map

A network map helps you to:

• Know exactly where each device is physically located
• Easily identify the users and applications that are affected by a problem
• Systematically search each part of your network for problems

You can create a network map using any drawing or flow chart application. Store your network map online. In addition, make sure that you always have a current version on paper in case you cannot access the online version. Figure 8 shows an example of a network map of 3Com devices.

Figure 8 Example of a Site Network Map

Consider including the following information on your network map:

• Location of important devices and workgroups (by floor, building, or area)
• Location of the network backbone, data center, and wiring closets, as appropriate for your network
• Location of your network management stations
• Location and type of remote connections
• IP subnetwork addresses for all managed switches and hubs
• Other subnetwork addresses, such as Novell IPX and AppleTalk, if appropriate for your network
• Type of media (by actual name, such as 10BASE-T, or by grouping, such as Ethernet), which you can show with callouts, colors, line weights, or line styles
• Virtual workgroups, which you can show with colors or shaded areas
• Redundant links, which you can indicate with gray or dashed lines
• Types of network applications that are used in different areas of your network
• Types of end stations that are connected to the switches and hubs

Logical Connections

With the advent of virtual LANs (VLANs), you need to know how your devices are connected logically as well as physically. For example, if you have connected two devices through the same physical switch, you might assume that they can communicate with each other. However, the devices can be in separate VLANs that restrict their communication.

Knowing the setup of your VLANs can help you to quickly narrow the scope of a problem to a VLAN instead of to a network connection.
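As a rough illustration of checking logical connectivity from recorded VLAN assignments, the sketch below looks up which VLANs two devices have in common. The function name, device names, and VLAN numbers are hypothetical, and the check deliberately ignores routing between VLANs, which can still allow communication.

```python
def shared_vlans(vlan_membership, device_a, device_b):
    """Return the set of VLAN IDs that both devices belong to (empty if none)."""
    # vlan_membership maps a device name to the set of VLAN IDs assigned to it.
    return vlan_membership.get(device_a, set()) & vlan_membership.get(device_b, set())
```

For example, with `{"pc-1": {10}, "pc-2": {20}, "srv-1": {10, 20}}`, the call for pc-1 and pc-2 returns an empty set, which suggests looking at the VLAN configuration before suspecting the physical connection.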

The Transcend NCS application Enterprise VLAN Manager allows you to view the logical makeup of your network. Depending on the complexity of your network and VLAN configurations, you can use colors to show the VLANs graphically on your network map.

Device Configuration Information

Maintain online and paper copies of device configuration information. Make sure that all online data is stored with your site's regular data backup. If your site does not have a backup system, copy the information onto a backup disc (CD, Zip disk, and the like) and store it offsite.

Follow these guidelines for saving configuration information:

• Because the easiest way to recover a device's configuration is to use FTP or TFTP, save the configuration settings of each device that supports this method of uploading.

• For other devices, Telnet in and save the session (which contains configuration details) to a file. If you cannot print the configuration of a device, then create a quick rebuild guide that explains the quickest way to configure the device from a fresh install.

• For devices that store information to diskette, store this data as part of your site's regular backup.

• For routers and other important devices with text configuration files, store this data online in a revision control system. Keep the most recent version on paper. Keep previous versions.

• For PCs, keep a recovery disk for each type of PC. For any device that you use as a server, store all startup scripts and copies of registries.

Other Important Data About Your Network

For a complete picture of your network, have the following information available:

• All passwords - Store passwords in a safe place. Keep previous passwords in case you restore a device to a previous software version and need to use the old password that was valid for that version.

• Device inventory - The inventory allows you to see the device type, IP address, ports, MAC addresses, and attached devices at a glance. Software tools such as Transcend Central can help you keep track of the 3Com devices on your network. Using Transcend Central, you can group devices by type and location and have this information on hand for troubleshooting.

• MAC address-to-port number list - If your hubs or switches are not managed, you must keep a list of the MAC addresses that correlate to the ports on your hubs and switches. Generate and keep a paper copy of this list, which is crucial for deciphering captured packets, using Address Tracker.

• Log book - Document your interactions, no matter how trivial, with each device that is critical to your network's operation (that is, routers, remote access devices, security servers). For example, document that you noticed a fan making noise one morning. Your note may help you to identify why a device is over temperature a week later (because the fan stopped working).

• Change control - Maintain a change control system for all critical systems. Permanently store change control records.

• Contact details - Store, online and on paper, the details of all support contracts, support numbers, engineer details, and telephone and fax numbers.

• LANsentry Reporter - Use LANsentry Reporter to generate reports from the database.

Identifying Your Network's Normal Behavior

By monitoring your network over a long period, you begin to understand its normal behavior. You begin to see a pattern in the traffic flow, such as which servers are typically accessed, when peak usage times occur, and so on. If you are familiar with your network when it is fully operational, you can be more effective at troubleshooting problems that arise.

Baselining Your Network

You can use a baseline analysis, which is an important indicator of overall network health, to identify problems. A baseline can serve as a useful reference of network traffic during normal operation, which you can then compare to captured network traffic while you troubleshoot network problems. A baseline analysis speeds the process of isolating network problems.

By running tests on a healthy network, you compile normal data to compare against the results that you get when your network is in trouble. For example, Ping each node to discover how long it typically takes you to receive a response from devices on your network.
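A simple way to turn healthy-network Ping times into a baseline is to record their mean and spread, then flag later measurements that fall well outside that range. The sketch below is one possible approach, not the method used by the Transcend applications; the function names are hypothetical, and the cutoff of three standard deviations above the mean is an assumption that you would tune for your own network.

```python
from statistics import mean, stdev


def build_baseline(samples_ms):
    """Summarize round-trip times (in ms) collected while the network was healthy."""
    # stdev needs at least two samples.
    return {"mean": mean(samples_ms), "stdev": stdev(samples_ms)}


def is_abnormal(rtt_ms, baseline, k=3.0):
    """Return True if rtt_ms is more than k standard deviations above the baseline mean."""
    # k=3.0 is an assumed cutoff, not a value from the original guide.
    return rtt_ms > baseline["mean"] + k * baseline["stdev"]
```

Keep a separate baseline per device, because a normal response time for a remote router can be an alarming one for a server on the local segment.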

Applications such as Status Watch, Address Tracker, LANsentry Manager, and Traffix Manager allow you to collect days and weeks of data and set a baseline for comparison. Through the reporting mechanisms in the following list, you can continuously assess the data from your network and ensure that its performance is optimal:

• Web Reporter generates daily or weekly reports from data collected by Status Watch.

• Traffix Manager generates weekly reports from collected data and calculates the baselines for you. Set up Utilization History and Error History reports with data resolution set to Weekly.

• LANsentry Manager History View generates daily utilization graphs, which are sampled every 30 minutes for each day over one week. Use these graphs to calculate your network baselines manually.

Identifying Background Noise

Know your network's background noise so that you can recognize real data flow. For example, one evening after everyone is gone, when no backups are running and most nodes are on, analyze the traffic on your network using the Traffix Manager application. The traffic that you see is mostly broadcast and multicast packets. Any errors that you see are the result of faulty devices (trace them). This traffic is the background noise of your network - traffic that occurs for little value. If background noise is high, redesign your network.

FDDI Connectivity

Use these sections to identify and correct connectivity errors on an FDDI ring:

• FDDI Connectivity Overview
• Monitoring FDDI Connections
• Making Your FDDI Connections More Resilient

Fiber Distributed Data Interface (FDDI), which is a self-correcting technology, automatically corrects ring faults to maintain connectivity throughout most of the network. However, you should monitor your FDDI connections for wrapped rings and other problems with ring connectivity.

Understanding the Problem

As shown in Figure 9, in a Thru FDDI LAN, no stations on the trunk ring have a Configuration State (SMTConfigurationState) of Wrap or Isolated. However, users who complain about network performance may have lost connectivity to other stations on the network because the FDDI network is wrapped or segmented.

Figure 9 Thru Ring

Wrapped ring

By monitoring the Peer Wrap Condition, you can see when the Configuration State changes. In a wrapped ring (Figure 10), two stations on the LAN are in a wrapped Configuration State. This condition may or may not affect the connectivity of certain stations. Although operational, your network may have a cabling problem or a problem with a link.

Figure 10 Wrapped LAN

Segmented ring

In a segmented ring (Figure 11), more than two stations are wrapped on the trunk ring. Although this mode of operation is a valid FDDI LAN configuration, your LAN is probably experiencing a degraded or degrading condition.

Figure 11 Segmented Ring
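The Thru, wrapped, and segmented conditions described above differ in how many trunk-ring stations report a Wrap configuration state, so they can be told apart by counting. The sketch below is only an illustration: the function name and state strings are hypothetical, the input is assumed to be the configuration states already gathered from the trunk-ring stations, Isolated stations are not considered, and a single reported wrap is treated as a wrapped ring on the assumption that its peer was not polled.

```python
def classify_ring(config_states):
    """Classify an FDDI trunk ring from the configuration states of its stations."""
    wrapped = sum(1 for state in config_states if state == "Wrap")
    if wrapped == 0:
        return "Thru"
    if wrapped <= 2:
        # Two wrap points bound a single fault; one may simply be unreported (assumption).
        return "Wrapped"
    # More than two wrapped stations means the ring has split into segments.
    return "Segmented"
```

For example, `classify_ring(["Wrap", "Thru", "Wrap", "Thru"])` returns "Wrapped".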

When a network connection has excessively high link errors, Station Management (SMT) shuts down the connection and tries to reconnect. A dual-attachment trunk ring station with an A or B connection that is shut down is one of the wrap points in the network. See Making Your FDDI Connections More Resilient for information about keeping a dual-attachment station connection from wrapping.

Isolated station

Sometimes a network wraps a particular station out of the ring. Stations on either side of a problem station can be wrapped. This effectively isolates the station or links that have problems, as shown in Figure 12.

Figure 12 Wrapped Ring with Isolated Station

If a ring was already wrapped when a network wraps a station out of the ring, then a segmented ring results, as shown in Figure 13.

Figure 13 Segmented Ring with Isolated Stations

Twisted ring

In a twisted ring, an A port is connected to an A port and a B port is connected to a B port, instead of the normal A-to-B connections. A twisted ring, which always has two twist points (stations), can exist in either a Thru or Wrap state. You can monitor the Twisted Ring Condition and Undesired Connection Attempt Event for evidence of twisted ring and other connection problems.
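If you keep a record of which port type connects to which on each trunk-ring link, the twist condition described above reduces to looking for A-to-A or B-to-B pairs. The sketch below assumes a hypothetical record format of (station, local port, peer station, peer port) tuples; it is an illustration, not a feature of Status Watch.

```python
def find_twists(links):
    """Return the links whose two ends use the same trunk port type (A-to-A or B-to-B)."""
    # Each link is a tuple: (station, local_port, peer_station, peer_port).
    return [
        link for link in links
        if link[1] == link[3] and link[1] in ("A", "B")
    ]
```

A normal A-to-B link is not reported, while a link such as `("sw-2", "A", "sw-3", "A")` is returned as a twist point.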

Identifying the Problem

To identify the problem, follow this process:

1. At the FDDI LAN level, verify that your network is operating.

If the network is operating, the FDDI ring may be segmented, and therefore an FDDI station or an Ethernet station on an Ethernet link may have lost connectivity to other nodes on the network.

2. Determine if a ring is in a Thru, Wrap, or Segmented state.

If the FDDI ring is segmented or wrapped, look for a problem with a link somewhere in the network or for a nonfunctioning node on your trunk ring. If the ring is operating and is not segmented, or if it is segmented but you still have connectivity to the stations in question, move to a more specific level in your network.

See Monitoring FDDI Connections for more information.

3. Determine if the poorly performing station is an Ethernet or FDDI station.

If the problem is an FDDI station, find out if it is congested (that is, if the station is so busy that it cannot accept all the network traffic that is directed to it) by determining its Bandwidth Utilization. Also determine if the station has a high frame error rate by looking at the FDDI Ring Errors.

If the problem is an Ethernet station, look for congestion by examining Ethernet Packet Loss and Bandwidth Utilization.

Solving the Problem

Identify the station that is causing the disconnection and take the appropriate steps:

• If the disconnection is caused by a wrapped ring, then fix the hardware or cabling problem at that station.

• If the station is congested, you have a device problem rather than a network problem. For example, if the congested station is a file server and every other machine on the network is retrieving and saving files using that server, consider upgrading your server or adding additional servers to the network. A variety of devices from different vendors may be communicating on an FDDI or Ethernet network; some are faster and more capable, and some are slower and more prone to congestion.

• If the station is an Ethernet station that is attached to an Ethernet segment, reevaluate the setup of your Ethernet network and make some changes to improve its performance.

You can also make FDDI connections more resilient by implementing dual homing or installing an Optical Bypass Unit (OBU) where FDDI connections are prone to fail. See Making Your FDDI Connections More Resilient for more information.

Monitoring FDDI Connections

Monitor your FDDI devices for Warning or Critical alerts in the FDDI Status tool.

Status Watch

Use Status Watch to identify these FDDI connectivity errors:

• Peer Wrap Condition
• Twisted Ring Condition
• Undesired Connection Attempt Event

Follow these steps:

1. In the Device area, select the device that is located where you suspect an FDDI ring connectivity problem.

2. Monitor the FDDI Status tool for the currently selected device.

Here are some pointers for monitoring:

• If the Peer Wrap Configuration State variable is Isolated, the device is not connected to the FDDI trunk ring. If you intend the device to remain isolated, this indication is not a serious condition. However, if the device is supposed to be connected on a trunk ring, a serious problem may exist: the device is no longer transmitting packets to the larger trunk ring.

• If the Peer Wrap flag (SMTPeerWrapFlag) is set, the device is one of the wrap points. The cause of the wrapped ring is somewhere in the portion of the network between the two stations that report the peer wrap condition.

Making Your FDDI Connections More Resilient

When devices are removed from an FDDI ring, there is a break in the fiber path that causes the ring to wrap until the ring is made whole again. To prevent the break in the FDDI connection, you can implement dual homing or install an Optical Bypass Unit (OBU).

Implementing Dual Homing

When the operation of a dual attachment node is critical to your network, dual homing adds reliability by providing a backup connection if the primary link fails. Because a dual attachment station (DAS) has two attachments to the FDDI ring (A-to-M and B-to-M), you can use one of them as a standby link if the active link fails. Using dual homing, only one of the two attachments is active at a time. In this sense, a DAS acts as if it is a single attachment station (SAS) by using its A port as the standby link.

Through SMT, a DAS can be dual homed to the same concentrator or, more commonly, to two concentrators. This arrangement provides a more stable trunk ring of concentrators. If one concentrator fails, the DAS enables the standby link to another concentrator to become the active link. See Figure 14.

If the station is a dual path or dual path/dual MAC station, you can configure the dual-homed station in one of two ways:

• With both links active
• With one link active and one connection withheld as a backup, only becoming active when one link fails

Figure 14. Dual Homing Configuration

Installing an Optical Bypass Unit

You can insert an Optical Bypass Unit (OBU) into the FDDI ring as if it were a node and then plug your device into it. To use an OBU, your device needs an optical bypass interface. This interface lets the bypass know whether your device is still on the ring or not. See Figure 15.

If your device is removed or if it fails, the bypass unit diverts the optical path away from your device, keeping the ring whole. You can use a bypass on devices that are prone to failure or are likely to be removed often, such as diagnostic equipment.

Figure 15. Optical Bypass Unit Configuration

FDDI Connectivity Reference

This section explains terms that are relevant to FDDI connectivity and provides additional conceptual and problem analysis detail.

Peer Wrap Condition

A Peer Wrap (wrapped ring) condition occurs when a dual-attachment station detects a fault (often a lost connection) and reconfigures the network by wrapping the dual trunk rings to form a single ring. Normally, the two stations that are adjacent to the fault wrap to maintain full connectivity. However, if a second fault occurs before the first is repaired, the network partitions itself into two or more rings, and stations lose connectivity.

When a station reports a Peer Wrap condition, locate and repair the problem that caused the station to wrap the rings. Potential causes include:

• Faulty FDDI port hardware
• Faulty cables or connectors
• Unplugged connectors
• Powered-down stations

You can expect to find the cause of the problem between the two stations that report the Peer Wrap condition.

Twisted Ring Condition

A Twisted Ring condition occurs when certain undesirable connection types exist. See Table 7 for more information. Although similar to the Undesired Connection Attempt, the Twisted Ring condition provides specific Station Management (SMT) and port information for diagnosis.

Undesired Connection Attempt Event

An Undesired Connection Attempt event occurs when a port tries to connect to another port of a type that may result in an undesirable network topology. Whether the connection attempt is successful depends on the current setting of the station's connection policies.

Table 7 lists connections that the FDDI standard defines as undesirable. The managed devices may or may not permit these connections, depending on their FDDI station configurations.

Table 7. Undesirable Connection Types

Connection Type    Reason That the Connection Is Undesirable
A-A                Twisted primary and secondary rings
A-S                A wrapped ring
B-B                Twisted primary and secondary rings
B-S                A wrapped ring
S-A                A wrapped ring
S-B                A wrapped ring
M-M                A tree of rings topology (illegal connection)

Table 8 lists FDDI connections that create valid topologies.

Table 8. Valid Connection Types

Connection Type    Reason That the Connection Is Valid
A-B                A normal trunk peer connection
A-M                A tree connection with possible redundancy. In a single MAC node, Port B has precedence (by default) for connecting to a Port M.
B-A                A normal trunk ring peer connection
B-M                A tree connection with possible redundancy. In a single MAC node, Port B has precedence (by default) for connecting to a Port M.
S-S                A single ring of two slave stations
S-M                A normal tree connection
M-A                A tree connection that provides possible redundancy
M-B                A tree connection that provides possible redundancy
M-S                A normal tree connection
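
The two tables can be expressed as a small lookup, which may be handy when checking a planned cabling layout. This is only a sketch in Python: the function name `classify_connection` and the data layout are illustrative assumptions and are not part of any management application described here. The reasons are copied from Tables 7 and 8.

```python
# Port types: A and B (dual-attachment ports), M (concentrator master port),
# S (single-attachment slave port). Reasons are taken from Tables 7 and 8.
UNDESIRABLE = {
    ("A", "A"): "Twisted primary and secondary rings",
    ("A", "S"): "A wrapped ring",
    ("B", "B"): "Twisted primary and secondary rings",
    ("B", "S"): "A wrapped ring",
    ("S", "A"): "A wrapped ring",
    ("S", "B"): "A wrapped ring",
    ("M", "M"): "A tree of rings topology (illegal connection)",
}

VALID = {
    ("A", "B"): "A normal trunk peer connection",
    ("A", "M"): "A tree connection with possible redundancy",
    ("B", "A"): "A normal trunk ring peer connection",
    ("B", "M"): "A tree connection with possible redundancy",
    ("S", "S"): "A single ring of two slave stations",
    ("S", "M"): "A normal tree connection",
    ("M", "A"): "A tree connection that provides possible redundancy",
    ("M", "B"): "A tree connection that provides possible redundancy",
    ("M", "S"): "A normal tree connection",
}


def classify_connection(local_port, remote_port):
    """Return ("undesirable" | "valid", reason) for a pair of FDDI port types."""
    pair = (local_port.upper(), remote_port.upper())
    if pair in UNDESIRABLE:
        return "undesirable", UNDESIRABLE[pair]
    if pair in VALID:
        return "valid", VALID[pair]
    raise ValueError(f"unknown port types: {local_port!r}, {remote_port!r}")


if __name__ == "__main__":
    print(classify_connection("A", "A"))
    print(classify_connection("b", "m"))
```

Together the two tables cover all 16 ordered pairs of the four port types, so any pair of valid port letters is classified one way or the other.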

Disabling the Offending Interface

Because broadcast storms can ultimately cause your whole network to become unavailable, take action immediately to disable the offending interface. You can enable the interface again after you have corrected the problem.

Address Tracker

Use Address Tracker to locate the interface that is causing the broadcast storm. Use Device View to disable the port.

Follow these steps:

1. In the Find Address window, enter the address of the interface that seems to be receiving the broadcast traffic.

You can copy the MAC or IP address from the Status Watch report and paste it into Address Tracker's Enter the Address You Want to Find field.

2. Click Find Now.

Search displays the device name.

3. Use Transcend Central to launch Device View and disable the port.

Disabling the port stops the broadcast storm before it interferes with all vital network traffic. You can re-enable this interface later using Device View or the device's console.

Correcting Spanning Tree Misconfigurations

Spanning Tree does not cause broadcast storms, but a loop in your Spanning Tree topology can create data that looks like a storm. A loop can occur in your topology if:

• Someone disables Spanning Tree on a port
• You set up your Spanning Tree configuration incorrectly

Device View

Use Device View to disable any Spanning Tree port that has a repeater attached to it and to correct Spanning Tree misconfigurations.

To correct Spanning Tree misconfigurations, use Device View to disable Spanning Tree Protocol (STP) for a port on a SuperStack® II Switch 1000, Switch 3000, Switch 3000 10/100, Switch 9000SX, Desktop Switch, LinkBuilder® FMS II Bridge/Management Module, or CoreBuilder 6000.

To disable the STP port state for a port on a SuperStack II switch:

1. Select a port and click the right mouse button.

2. From the shortcut menu, select Configure.

3. In the Port section, click the STP tab.

4. From the STP Port State list box, select Disabled.

5. Click Apply.

To disable the STP port state for a port on a LinkBuilder FMS II Bridge/Management Module:

1. Double-click the module.

2. From the shortcut menu, select Configure Bridge.

3. In the Port section, click the STP tab.

Broadcast Storms Reference

This section explains terms that are relevant to broadcast storms and provides additional conceptual and problem analysis detail.

Broadcast Packets

Broadcast packets, which are a normal part of network operation, are transmitted by a device to a broadcast address. For example, IP networks use broadcasts to resolve network addresses using Address Resolution Protocol (ARP). IPX networks use a large number of broadcast packets to operate most effectively.

Problems arise when broadcast packets endlessly propagate throughout the network, which increases the traffic volume on your network and the CPU time that each host spends processing and discarding unwanted broadcast packets.

Multicast Packets

Multicast packets, which are a normal part of network operation, are transmitted by a device to a multicast group address. Hosts that want to receive the packets indicate that they want to be members of the multicast group, and then multicast packets are distributed to that group. For example, multicast packets support the Spanning Tree Protocol. Multicast applications and underlying multicast protocols control multimedia traffic and shield hosts from processing unnecessary broadcast traffic. However, multicast traffic can also cause storms that saturate your network.

Duplicate Addresses

Use these sections to identify and correct problems caused by duplicate MAC and IP addresses:

• Duplicate Addresses Overview
• Finding Duplicate MAC Addresses
• Finding Duplicate IP Addresses

See Duplicate Addresses Reference for additional conceptual and problem analysis detail.

Networks sometimes generate duplicate MAC and IP addresses. Because duplicate addresses can cause problems with packet delivery, resolve them as soon as possible.

Understanding the Problem

Duplicate MAC addresses are caused by data link layer problems with Fiber Distributed Data Interface (FDDI) media and the passing of tokens on the FDDI ring. Duplicate IP addresses are caused by network layer problems. See these sections for more information about causes of duplicate addresses:

• Duplicate MAC Addresses
• Duplicate IP Addresses

Identifying the Problem

Identify duplicate MAC and IP addresses by following the instructions in these sections:

• Finding Duplicate MAC Addresses
• Finding Duplicate IP Addresses

Solving the Problem

Identify the cause of the duplicate address (such as user error or a hardware problem) and fix the problem if possible.

Finding Duplicate MAC Addresses

To find out if duplicate MAC addresses are occurring, monitor your network using Status Watch.

Status Watch

The Status Watch FDDI Status tool identifies duplicate FDDI MAC addresses, and Status Watch reports when two or more MACs on the same ring have the same MAC address (a Duplicate Address condition).

Follow these steps:

1. In the Status Watch Summary View window, determine if any FDDI Status conditions are reported. If there are, double-click the table cell value to display the Device List window.

Another approach is to examine only the devices that you know reside on your FDDI ring. In the Status Watch main window, red device icons indicate that a threshold has been exceeded.

2. Select a device.

• If you selected the device from the Device List window, the real-time report for that device appears in the Status Watch main window.

• If you selected the device from the main window, also select the FDDI Status tool to view the real-time report.

3. Determine if a Duplicate Address condition caused the FDDI Status tool to trigger a Critical or Warning status for that device.

In Status Watch, you can specify the status severity level to apply to a Duplicate Address condition.

Finding Duplicate IP Addresses

To find out if duplicate IP addresses are occurring, monitor your network using these applications:

• Address Tracker - To find duplicate IP addresses on 3Com devices and their attached networks

• LANsentry® Manager - To find duplicate IP addresses that are collected by probes gathering RMON2 SmartAgent® data from the Enterprise Communications Analysis Module (ECAM) downloaded on your network devices

Address Tracker

Use Address Tracker to determine when and where duplicate IP addresses occur.

Follow these steps:

1. From the Find Address menu, select Find Duplicate IP Addresses.

2. Click Find Now to start your search.

LANsentry Manager

Use the Duplicates table in LANsentry Manager to compile a list of all stations with duplicate IP addresses. This table is available only on probes that have downloaded RMON2 (ECAM) SmartAgent software.

Follow these steps:

1. From the LANsentry Manager Address Map menu, select Duplicates. Address Map data is displayed as a table.

2. To export the contents of the table, click Export to launch the Data Export dialog box.

Duplicate Addresses Reference

This section explains terms that are relevant to duplicate addresses and provides additional conceptual and problem analysis detail.

Duplicate MAC Addresses

Each device on your network has a unique MAC address. This address identifies a single device on the network, allowing packets to be delivered to the correct destinations.

Packets are delivered to their destinations by means of the MAC-address-to-IP-address translation that the Address Resolution Protocol (ARP) provides. Therefore, if MAC addresses are duplicated on the network, the ARP caches of routing devices contain erroneous destinations. In FDDI, devices monitor network traffic, searching for their own MAC address in each packet to determine whether to decode the packet. If MAC addresses are not unique, two stations cannot be distinguished from each other.

Duplicate MAC addresses can occur for the following reasons:

• Someone has manually configured a MAC address for a device, instead of using the address that the vendor supplied or allowing it to be assigned dynamically, and this address is also assigned to a different device.

• In rare circumstances, loops in a bridged network can cause a MAC hardware problem or an address learning problem that creates a duplicate MAC address entry in the bridging address table.

• On DECnet Phase 4 networks, MAC addresses are set from the DECnet address. A duplicate NET address can cause a duplicate MAC address.

Duplicate IP Addresses

Because IP addresses are critical for transmission of packets on TCP/IP networks, resolve duplicate IP addresses immediately.

Duplicate IP addresses can occur when someone has configured an IP address that is identical to an IP address that is assigned to a different device. Address assignments, although possible for you to configure manually, are usually made using one of these protocols:

• Dynamic Host Configuration Protocol (DHCP) - Allows your network to dynamically assign IP addresses to nodes. With this protocol, a DHCP server temporarily assigns an IP address to a node, or you can statically configure addresses as needed.

• BOOTstrap protocol (BootP) - Allows you to statically assign IP addresses to nodes. This protocol is more efficient than RARP.

• Reverse ARP (RARP) - Allows you to statically assign IP addresses to nodes. However, because this protocol relies on the MAC address to identify the node, you cannot use it on networks that dynamically assign hardware addresses.
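
The underlying check is simple to sketch outside the management tools: given IP-to-MAC observations (for example, exported from an ARP cache or an address map), report any IP address that is claimed by more than one MAC address. The Python below is a minimal illustration under that assumption. The function name `find_duplicate_ips` and the sample addresses are hypothetical, and this is not how Address Tracker or LANsentry Manager work internally.

```python
from collections import defaultdict


def find_duplicate_ips(observations):
    """Return {ip: sorted MAC list} for every IP seen with more than one MAC.

    observations: iterable of (ip_address, mac_address) string pairs.
    """
    macs_by_ip = defaultdict(set)
    for ip, mac in observations:
        # Normalize case so the same MAC written two ways is not counted twice.
        macs_by_ip[ip].add(mac.lower())
    return {ip: sorted(macs) for ip, macs in macs_by_ip.items() if len(macs) > 1}


if __name__ == "__main__":
    # Hypothetical sample data for illustration only.
    sample = [
        ("192.0.2.10", "00:A0:24:11:22:33"),
        ("192.0.2.11", "00:a0:24:44:55:66"),
        ("192.0.2.10", "00:a0:24:77:88:99"),  # second station claiming .10
        ("192.0.2.11", "00:A0:24:44:55:66"),  # same station seen again
    ]
    for ip, macs in find_duplicate_ips(sample).items():
        print(ip, "is claimed by", ", ".join(macs))
```

Swapping the two fields of each pair gives the corresponding check for a MAC address that appears with more than one IP address.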

Ethernet Packet Loss

Use these sections to identify and correct Ethernet packet loss:

• Ethernet Packet Loss Overview
• Searching for Packet Loss

See Ethernet Packet Loss Reference for additional conceptual and problem analysis detail.

If your Ethernet network shows signs of congestion, it may be experiencing packet loss. When your network is congested, utilization is usually high, packets are discarded because buffers are full, and collision rates are up. Problems related to Collisions are often at the heart of packet loss.

Understanding the Problem

Collisions are normal in Ethernet networks. In many cases, Collision rates of 50 percent do not cause a large decrease in throughput. The Collision rate helps mark the upper limit on your network (the maximum percentage of collisions that your network can bear), which is usually around 70 percent. If Collisions increase above this upper limit, your network can become unreliable.
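
As a rough illustration of these thresholds, the Python sketch below computes a collision rate and compares it with the 50 and 70 percent figures given above. The text does not define how the rate is calculated, so expressing it as collisions per frame sent is an assumption, and the function names and band labels are hypothetical.

```python
def collision_rate_percent(collisions, frames_sent):
    """Collisions as a percentage of frames sent (assumed definition)."""
    if frames_sent <= 0:
        raise ValueError("frames_sent must be positive")
    return 100.0 * collisions / frames_sent


def assess_collision_rate(rate_percent, tolerable=50.0, upper_limit=70.0):
    """Map a collision rate onto the bands described in the text."""
    if rate_percent > upper_limit:
        return "unreliable"          # above the usual upper limit
    if rate_percent > tolerable:
        return "approaching limit"   # between the two figures
    return "tolerable"               # usually no large throughput decrease


if __name__ == "__main__":
    rate = collision_rate_percent(collisions=300, frames_sent=1000)
    print(rate, assess_collision_rate(rate))
```

Both thresholds are parameters because the upper limit is described only as "usually around 70 percent"; use the baseline for your own network where you know it.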

When the Collision rates increase, so do Excessive Collisions, which causes a delay in transmitting data. An increase in Collisions also indicates that network utilization and network errors, such as FCS Errors, are probably increasing.

The real packet problems to watch for, however, are undetected collisions that show up as Late Collisions.

Identifying the Problem

To identify that your network's problem is related to packet loss, verify that frames are being dropped on your network by examining this packet loss data:

• Alignment Errors
• Collisions
• Excessive Collisions
• FCS Errors
• CRC Errors
• Late Collisions
• Receive Discards
• Too Long Errors
• Too Short Errors
• Transmit Discards

The process of identifying the problem is discussed in Searching for Packet Loss.

Solving the Problem

If you notice that packet loss data is consistently high, your network is too congested. In this case, segment your network with the appropriate network device (such as a switch or router). If Collision data shows increases but your network's utilization is the same, your network may have a physical problem, such as cabling that is too long. Other problems that packet loss data can indicate include:

• Faulty connectors or improper cabling
• Excessive numbers of repeaters between network devices
• Defective Ethernet transceivers or controllers

Possible solutions to these problems are explained in the procedures in Searching for Packet Loss.

Searching for Packet Loss

When you look for packet loss, use the following applications:

• Status Watch - For Ethernet and MIB-II data collection using SNMP polling
• LANsentry Manager Network Statistics Graph - For RMON data collection using an RMON probe
• Device View - On a per-device basis, you can evaluate statistics for any port on the device

Status Watch

Status Watch monitors:

• Alignment Errors
• Excessive Collisions
• FCS Errors
• Receive Discards
• Transmit Discards

Follow these steps:

1. Determine if the thresholds for the Alignment Errors tool and FCS Errors tool are being exceeded.

Table 16 identifies the problems that this data can indicate and your possible actions. For information about problems related to a nonstandard Ethernet implementation, see Nonstandard Ethernet Problems.

Table 16. Alignment Errors, FCS Errors, and CRC Errors Data

Possible Problem: Faulty cabling
Possible Action: Examine the cable and cable connections for breaks or damage.

Possible Problem: Network noise
Possible Action: Look for improper cabling, faulty cable, faulty network equipment, or cables that are too close to equipment that emits electromagnetic interference (lamps, for example).

Possible Problem: Faulty transceiver
Possible Action: Use an analyzer to identify the problematic transceiver. If necessary, replace the transceiver, network adapter, or station.

Possible Problem: Fault at the transmitting end station
Possible Action:

1. Locate the source of the errors by looking at the module and port statistics.

2. Verify the correct operation of the transceiver or adapter card of the device that is connected to the problem port.

3. If the card appears to be operating correctly, examine the cable and cable connections for breaks or damage.

Possible Problem: Station powering up or down
Possible Action: None required.

Early implementations of Ethernet transceivers generate a significant amount of in-band noise when powering up; they frequently cause Alignment Errors and FCS Errors in an otherwise stable network.

When powering up, some software drivers for Ethernet controllers also initiate Time Domain Reflectometry (TDR) tests to test the Ethernet media. Network monitors report TDR tests as Alignment Errors and FCS Errors.

Possible Problem: Faulty adapter
Possible Action: Replace the adapter.

2. Determine if the Excessive Collisions tool threshold is being exceeded. Table 17 identifies the problems that this data can indicate and your possible actions.

Table 17. Collisions and Excessive Collisions Data

Possible Problem: Busy network
Possible Action: Use a bridge, router, or switch to reconfigure your network into segments with fewer stations.

Possible Problem: Faulty device (adapter, switch, hub, and the like) that does not listen before broadcasting. This problem increases the incidence of all types of collisions.
Possible Action: Isolate each adapter to see if the problem stops.

Possible Problem: Network loop
Possible Action: Ensure that no redundant connections to the same station have both connections active simultaneously.

3. Determine if the Receive Discards and Transmit Discards tools' thresholds are being exceeded.

If these errors are high in conjunction with the data that you learned in steps 1 and 2, your network is overloaded. Segment your network.

LANsentry Manager Network Statistics Graph

Use the LANsentry Manager Network Statistics graph to view data for:

• Collisions
• Late Collisions
• Bandwidth Utilization
• CRC Errors
• Too Long Errors
• Too Short Errors

Follow these steps:

1. Display a Network Statistics graph for the local Ethernet segment on which users have reported poor performance.

This graph shows the most recent trend in Collision rates. If you have set up a History sample, you can also look at the historical trend. If a number of segments are connected by repeaters, examine the graph for each Ethernet segment.

2. Analyze Utilization and Collision rates to determine whether collisions are caused by an overloaded segment or a faulty component.

• If Utilization rates are high - The collisions are probably caused by an overloaded segment. If you have added nodes or new applications to your network, consider reconfiguring the cabling system using bridges and routers to filter out remote collisions and to keep local traffic on one segment. This action should level the network load.

• If Utilization rates are stable and appear normal - The collisions are probably caused by faulty components. In this case, do the following:

  • If the network consists of repeaters - Compare the Network Statistics graphs for each segment connected to the repeater. Because repeaters repeat traffic across all connected segments (which makes many segments seem like one network), you should see similar levels of traffic on all segments. One segment that shows dissimilar levels of traffic and collisions may indicate faulty hardware. In this case, monitor several collisions to track the source station that is transmitting too soon after collisions, and repair the station. Packets that are transmitted too soon after collisions are unlikely to be valid. See Table 17 for more information about Collisions.

  • On other networks - Determine the segment cable length.

3. Examine the CRC Errors and Late Collisions, which often indicate cabling or component problems.

Table 16 identifies the problems that CRC Errors can indicate and your possible actions. Table 18 identifies the problems that Late Collisions data can indicate and your possible actions.

Table 18. Late Collisions Data

Possible Problem: Cabling problems

• Segment too long
• Failing cable
• Segment not grounded properly (noise)
• Improper termination
• Taps too close (10BASE-5 and 10BASE-2 only)
• Noisy cable

Possible Action: Correct the cabling problem by doing one or more of the following:

• Reduce the segment length
• Replace the cable
• Ground the cable
• Terminate the cable correctly
• Check the taps
• Check for cables too close to equipment that emits electromagnetic interference

Possible Problem: Component problems

• Deaf or partially deaf node
• Failing repeater, transceiver, or controller cards

Possible Action: Correct the component problem by doing one of the following:

• Trace the failing component and replace it
• Replace the NIC or the transceiver

4. Trace Too Short Errors and Too Long Errors to the sender.

These errors often indicate faulty routers or LAN drivers and transceiver problems. Table 19 identifies the problems that this data can indicate and your possible actions.

Possible Problem: A transceiver on your network is adding bits to the packets that are transmitted by the attached station.
Possible Action:

1. Use a network analyzer to identify the problematic transceiver.

2. If necessary, replace the transceiver, network adapter, or station.

Possible Problem: The jabber protection mechanism on a transceiver has failed; it can no longer protect the network from the jabbering produced by the attached station.
Possible Action: Replace the network card.

Possible Problem: Excessive noise on the cable.

Note: Some 10/100 Mbps cards that autodetect the network speed may connect to the network at the wrong speed, causing excessive noise.

Possible Action: Check for improper cabling, faulty cable, faulty network equipment, or cables too close to noisy electronic equipment (lamps, for example).

If your network card autodetects the network speed and you have ruled out other problems, manually configure the network speed.

Possible Problem: Faulty routers (two different network types are connected, and the router is not enforcing proper frame size restrictions)
Possible Action: Notify the manufacturer.

Possible Problem: Faulty LAN driver
Possible Action: Replace the driver.

Possible Problem: A normal condition on a LinkSwitch® 1000, LinkSwitch® 3000, or CoreBuilder 5000 FastModule

If you use maximum-sized 1518-byte Ethernet frames, the device's VLT-enabled ports add a frame tag of 4 bytes, resulting in a misleading Too Long Frame error.

Possible Action: These frames are passed successfully but will create the Too Long Frame error message.

If you want to eliminate the error message, reduce your Ethernet packet frames by 4 bytes.

Device View

Device View allows you to display a variety of port and device-level statistics relevant to Ethernet packet loss. Table 20 describes these statistics and their use in troubleshooting.

Table 20. Activity and Error Statistics in Device View

Statistics Group: Activity
Description: Displays the total network activity and errors on the selected port.
Use in Troubleshooting: This data shows readable packets, broadcast packets, Collisions, total errors, and runts, which cause Too Short Errors. You can interpret this data in the following ways:

• Runts are often caused by Collisions; however, if the values increase at specific times of the day, you may need to change the network topology to manage the traffic more efficiently (for example, with switches or routers).

• Runts can also be caused by a badly terminated coax cable.

• Large numbers of runts not associated with high levels of collisions can indicate a transmission problem (examine the cable).

• Particularly high numbers of Collisions compared to the total number of readable packets can point to a hardware problem (a bad adapter) or to a data loop.

• A high proportion of Broadcast Packets (>10%) on a heavily utilized network (>50% of available bandwidth) can point to an incorrectly configured bridge or router on the network.

Statistics Group: Errors
Description: Displays the number of frames with errors on the selected port.
Use in Troubleshooting: The significance of errors depends on accompanying errors and prevailing network conditions. See the following error data for more information:

• Alignment Errors: Table 16
• FCS Errors: Table 16
• Too Long Errors: Table 19
• Too Short Errors or runts: Table 19
• Late Collisions: Table 18

To display Activity and Errors statistics for a device or port, follow these steps:

1. Select the required port or device.

2. From the shortcut menu, select Activity or Errors.

The statistics available depend on the type of port or device selected. See Table 20 for troubleshooting information.

Ethernet Packet Loss Reference

This section explains terms that are relevant to Ethernet packet loss and provides additional conceptual and problem analysis detail.

Alignment Errors

An Alignment Error indicates a received frame for which both of the following are true:

• The number of bits received is an uneven byte count (that is, not an integral multiple of 8).

• The frame has a Frame Check Sequence (FCS) error.

Alignment Errors often result from MAC layer packet formation problems, cabling problems that cause corrupted or lost data, and packets that pass through more than two cascaded multiport transceivers. See FCS Errors for more information about interpreting Alignment Errors.

Collisions

Collisions indicate that two devices detect that the network is idle and try to send packets at exactly the same time (within one round-trip delay). Because only one device can transmit at a time, both devices must stop sending and attempt to retransmit. Collisions are detected by the transmitting stations.

The retransmission algorithm helps to ensure that the packets do not retransmit at the same time. However, if the two devices retry at nearly the same time, packets can collide again; the process repeats until either the packets finally pass onto the network without collisions, or 16 consecutive collisions occur and the packets are discarded.
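
The retransmission algorithm is not spelled out here. In standard CSMA/CD Ethernet it is truncated binary exponential backoff: after the nth consecutive collision, a station waits a random number of slot times between 0 and 2^k - 1, where k is the smaller of n and 10, and gives up at the 16th collision. The Python sketch below illustrates that rule. The constant and function names are chosen for the example and are not taken from this guide.

```python
import random

MAX_ATTEMPTS = 16    # frame is discarded at the 16th consecutive collision
BACKOFF_LIMIT = 10   # the random range stops doubling after 10 collisions


def backoff_slots(collision_count, rng=random):
    """Slot times to wait after the given consecutive collision.

    Returns None when the frame must be discarded (an Excessive Collision).
    """
    if not 1 <= collision_count <= MAX_ATTEMPTS:
        raise ValueError("collision_count must be between 1 and 16")
    if collision_count == MAX_ATTEMPTS:
        return None
    k = min(collision_count, BACKOFF_LIMIT)
    return rng.randrange(2 ** k)  # uniform integer in 0 .. 2**k - 1


if __name__ == "__main__":
    for n in (1, 2, 3, 10, 15, 16):
        print(n, backoff_slots(n))
```

Because each station draws its delay independently, two stations that have just collided are unlikely to retry in the same slot, and the doubling range makes a repeat collision less likely with each attempt.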

CRC Errors

A Cyclic Redundancy Check (CRC) Error is an RMON statistic that combines FCS Errors and Alignment Errors. These errors indicate that packets were received with:

• A bad FCS and an integral number of octets (FCS Errors)
• A bad FCS and a non-integral number of octets (Alignment Errors)

CRC Errors can cause an end station to freeze. If a large number of CRC Errors are attributed to a single station on the network, replace the station's network interface board. Typically, a CRC Error rate of more than 1 percent of network traffic is considered excessive.
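
The 1 percent guideline can be checked with simple arithmetic once you have the counters. The Python sketch below assumes that "network traffic" means the total number of packets received during the same sampling interval. The function names are illustrative.

```python
def crc_error_rate_percent(fcs_errors, alignment_errors, total_packets):
    """CRC Errors (FCS Errors + Alignment Errors) as a percentage of packets."""
    if total_packets <= 0:
        raise ValueError("total_packets must be positive")
    return 100.0 * (fcs_errors + alignment_errors) / total_packets


def crc_rate_is_excessive(rate_percent, threshold_percent=1.0):
    """Apply the 'more than 1 percent' guideline from the text."""
    return rate_percent > threshold_percent


if __name__ == "__main__":
    rate = crc_error_rate_percent(fcs_errors=15, alignment_errors=5,
                                  total_packets=1000)
    print(rate, crc_rate_is_excessive(rate))
```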

Excessive Collisions

Excessive Collisions indicate that 16 consecutive collisions have occurred, usually a sign that the network is becoming congested. For each excessive collision count (or after 16 consecutive collisions), a packet is dropped. If you know the normal rate of excessive collisions, you can determine when the rate of packet loss is affecting your network's performance. See Knowing Your Network's Configuration for more information.

FCS Errors

Frame Check Sequence (FCS) Errors, a type of CRC, indicate that frames received by an interface are an integral number of octets long but do not pass the FCS check. The FCS is a mathematical way to ensure that all the frame's bits are correct without having the system examine each bit and compare it to the original. Packets with Alignment Errors also generate FCS Errors.

Both Alignment Errors and FCS Errors can be caused by equipment powering up or down or by interference (noise) on unshielded twisted-pair (10BASE-T) segments. In a network that complies with the Ethernet standard, FCS or Alignment Errors indicate bit errors during a transmission or reception. A very low rate is acceptable. Although Ethernet allows a 1 in 10^8 bit error rate, typical Ethernet performance is 1 in 10^12 or better.
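
To put these bit error rates in frame terms, the Python sketch below estimates the probability that a frame contains at least one bit error, reading the rates above as 1 in 10^8 and 1 in 10^12. It assumes independent bit errors, which is a simplification because real errors tend to arrive in bursts. The function name is illustrative.

```python
import math


def frame_error_probability(bit_error_rate, frame_octets=1518):
    """Probability that at least one bit in the frame is corrupted.

    Computes 1 - (1 - p)**bits using log1p/expm1 so that very small
    error rates do not lose precision.
    """
    bits = 8 * frame_octets
    return -math.expm1(bits * math.log1p(-bit_error_rate))


if __name__ == "__main__":
    for ber in (1e-8, 1e-12):
        p = frame_error_probability(ber)
        print(f"BER {ber:g}: about 1 bad frame in {1 / p:,.0f} maximum-sized frames")
```

For a maximum-sized frame of 1518 octets (12,144 bits), a bit error rate of 1 in 10^8 works out to roughly one damaged frame in every 8,000, which is why even the permitted worst case shows up as only a very low rate of FCS Errors.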

Late Collisions

Late Collisions indicate that two devices have transmitted at the same time, but cabling errors (most commonly, excessive network segment length or repeaters between devices) prevent either transmitting device from detecting a collision. Neither device detects a collision because the time to propagate the signal from one end of the network to the other is longer than the time to put the entire packet on the network. As a result, neither of the devices that cause the late collision senses the other's transmission until the entire packet is on the network.

Although late collisions occur for small packets, the transmitter cannot detect them. As a result, a network suffering measurable Late Collisions for large packets is losing small packets as well.
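
The timing argument can be made concrete with a short calculation. The Python sketch below assumes 10 Mbps Ethernet and the standard slot time of 512 bit times (51.2 µs, the time to send a minimum 64-octet frame). Neither figure is stated in this section, and the function names and the 60 µs example delay are hypothetical.

```python
SLOT_TIME_BITS = 512  # standard Ethernet slot time (64 octets), an assumption here


def transmit_time_us(frame_octets, mbps=10.0):
    """Time to put the whole frame on the wire, in microseconds."""
    return 8 * frame_octets / mbps  # bits divided by bits-per-microsecond


def collision_outcome(frame_octets, round_trip_delay_us, mbps=10.0):
    """Classify what the sender sees if a collision occurs at worst-case delay."""
    slot_us = SLOT_TIME_BITS / mbps
    if round_trip_delay_us <= slot_us:
        return "normal collision"      # seen within the slot time
    if round_trip_delay_us < transmit_time_us(frame_octets, mbps):
        return "late collision"        # seen, but after the slot time
    return "undetected collision"      # frame already fully sent


if __name__ == "__main__":
    delay = 60.0  # hypothetical round-trip delay on an over-long segment
    for size in (64, 1518):
        print(size, "octets:", collision_outcome(size, delay))
```

With a 60 µs round-trip delay, a 64-octet frame is fully sent after 51.2 µs, so its collision goes unnoticed and the frame is silently lost, while a 1518-octet frame is still being sent and the station records a Late Collision.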

Nonstandard Ethernet Problems

Table 21 lists the symptoms that typically occur if a system violates the Ethernet standard.

Table 21 Symptoms of Common Ethernet Network Problems

Symptoms: FCS Errors and Alignment Errors increase significantly.
Problem: Network cabling is too long.
Notes: If you use a promiscuous network monitor, the number of Late Collisions reported by stations should correlate with the FCS and Alignment Errors reported by the monitor.

Symptoms: FCS and Alignment Errors increase proportionally with interference (sometimes referred to as noise hits).
Problem: Network segment is noisy.
Notes: Typically observed on a 10BASE-T network segment in a noisy environment. If you use multiple promiscuous monitors, the FCS and Alignment Errors among the monitors will not correlate. If the monitor can track runts (also called Too Short Errors), the number of runt packets should be significantly higher than normal.

Symptoms: FCS and Alignment Errors are much higher than normal.
Problem: Networks do not conform to the access scheme of Carrier Sense Multiple Access with Collision Detect (CSMA/CD).
Notes: Occurs when some implementations of Ethernet in the segment are not entirely compatible with IEEE 802.3 repeaters. Collision fragments linger on the network long enough to collide with retry packets at the minimum interpacket gap (IPG). The IPG is smaller on one side of the repeated network, causing a lost packet. Ethernet controllers cannot receive packets that are separated by 4.7 µs or less. Some controllers cannot sustain receptions of packets separated by as much as 9.6 µs. If runt packets are received one after another and are followed by a collision fragment, Ethernet controllers that cannot sustain reception will lose packets.

Receive Discards

Receive Discards indicate that received packets could not be delivered to a high-layer protocol because of congestion or packet errors.

Too Long Errors

A Too Long Error indicates that a packet is longer than 1518 octets (including FCS octets) but otherwise well formed. Too Long Errors are often caused by a bad transceiver, a malfunction of the jabber protection mechanism on a transceiver, or excessive noise on the cable.

Too Short Errors

A Too Short Error, also called a runt, indicates that a packet is fewer than 64 octets long (including FCS octets) but otherwise well formed.
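The two length limits can be expressed as a small check. This is a minimal sketch that uses the 64-octet and 1518-octet limits from the text and assumes the length already includes the FCS octets; the function name and return labels are hypothetical.

```python
MIN_FRAME_OCTETS = 64     # shorter frames are runts (Too Short Errors)
MAX_FRAME_OCTETS = 1518   # longer frames are Too Long Errors


def classify_frame_length(octets: int) -> str:
    """Classify an otherwise well-formed frame by its length, FCS included."""
    if octets < MIN_FRAME_OCTETS:
        return "too_short"
    if octets > MAX_FRAME_OCTETS:
        return "too_long"
    return "ok"


for length in (60, 64, 1518, 1600):
    print(length, classify_frame_length(length))
```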

Transmit Discards

Transmit Discards indicate that packets were not transmitted because of network congestion.

FDDI Ring Errors

Use these sections to identify and correct FDDI ring errors:

• FDDI Ring Errors Overview
• Identifying Ring Errors

Fiber Distributed Data Interface (FDDI) often corrects its own problems. However, because FDDI cannot correct all errors (especially those related to hardware problems), you should monitor FDDI errors.

Understanding the Problem

FDDI ring errors that you should monitor include:

• Elasticity Buffer Error Condition
• Frame Error Condition
• Frames Not Copied Condition
• Link Error Condition

Identifying the Problem

First, determine the type of FDDI ring errors and where they are occurring. Similar to the way you identify other FDDI problems, identify the upstream and downstream neighbors of the devices that you are monitoring.

Several types of network errors can cause FDDI performance problems. For example, problems with cables or physical connections may result in a link or frame error. Elasticity buffer (EB) errors can also lead to link and frame errors.

FDDI deals with port-related errors as follows:

• The variable PORTlerAlarm is the link error rate (LER) value at which a link connection generates an alarm. When the LER is greater than the alarm setting, Station Management (SMT) sends a Status Report Frame (SRF) to notify you that there is a problem with a port.

The PORTlerAlarm threshold is set lower than the PORTlerCutoff threshold so that you are notified of a problem before the port is actually removed from the ring.

• When link errors reach the threshold defined by the variable PORTLERCutoff, SMT breaks the connection, disabling the PHY that detected the problem. A Link Error Condition is also generated.
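The two-threshold behavior described above can be sketched as a simple decision function. This is only an illustration: it assumes the link error rate and both thresholds are plain fractions where a larger value is worse, which may not match how SMT actually encodes these variables, and the function name and return labels are hypothetical.

```python
def port_ler_action(ler: float, alarm: float, cutoff: float) -> str:
    """Decide what happens to a port for a given link error rate.

    alarm and cutoff stand in for the PORTlerAlarm and PORTlerCutoff
    settings; the alarm threshold must be the lower of the two.
    """
    if not alarm < cutoff:
        raise ValueError("alarm threshold must be below the cutoff threshold")
    if ler >= cutoff:
        return "cutoff"   # connection broken, Link Error Condition generated
    if ler > alarm:
        return "alarm"    # Status Report Frame sent, port stays in the ring
    return "ok"


print(port_ler_action(5e-8, alarm=1e-8, cutoff=1e-7))
```

Keeping the alarm threshold below the cutoff is what gives you a warning window before the port is removed from the ring.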

FDDI deals with MAC-related errors as follows:

• When MAC frame errors reach a certain threshold, a Frame Error Condition is generated. Because the actual error can be further upstream than the immediate connection, the connection remains intact.

• For a large network, the worst case MACFrameErrorRatio is less than 0.1 percent. However, during network configuration, frame error ratios can reach 50 percent for short periods. When you detect a sustained frame error ratio of more than 0.1 percent, a problem exists between the station that is reporting the condition and the nearest upstream MAC.
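One way to separate a sustained frame error ratio from a short spike during reconfiguration is to require several consecutive samples above the 0.1 percent limit. The sketch below assumes you already collect (error frames, total frames) pairs per polling interval; the function name and the choice of three consecutive samples are assumptions, not values from the FDDI standard.

```python
FRAME_ERROR_RATIO_LIMIT = 0.001  # 0.1 percent, from the text


def sustained_frame_errors(samples, min_consecutive=3,
                           limit=FRAME_ERROR_RATIO_LIMIT) -> bool:
    """Return True if the ratio exceeds the limit in enough samples in a row.

    samples is an iterable of (error_frames, total_frames) per interval.
    """
    run = 0
    for error_frames, total_frames in samples:
        ratio = error_frames / total_frames if total_frames else 0.0
        run = run + 1 if ratio > limit else 0
        if run >= min_consecutive:
            return True
    return False


# A single 50 percent spike is ignored; three samples at 0.2 percent are not.
print(sustained_frame_errors([(500, 1000), (0, 1000), (1, 10000)]))
print(sustained_frame_errors([(2, 1000), (2, 1000), (2, 1000)]))
```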

Solving the Problem

To solve problems related to FDDI errors, fix the hardware, cabling, or congestion problem.

Identifying Ring Errors

Use Status Watch to monitor your FDDI devices for Warning or Critical alerts.

Status Watch

Use Status Watch to identify FDDI ring errors.

Follow these steps:

1. Monitor the FDDI Status tool for the currently selected device.

2. Determine whether Status Watch is reporting Elasticity Buffer Errors or a high percentage of Frame Errors, Frames Not Copied, or Link Error Rates for the currently selected device.

FDDI Ring Errors Reference

This section provides additional conceptual and problem analysis detail.

Elasticity Buffer Error Condition

The Elasticity Buffer Error condition occurs when a port's elasticity buffer overflows or underflows. This condition usually indicates that a port's hardware is not operating within the tolerances that the FDDI standard specifies. Look for the problem in the hardware of either the port that is reporting the condition or of the immediately adjacent port.

Frame Error Condition

The Frame Error condition occurs when the percentage of frames that contain errors exceeds a preset threshold. When a device is an uplink to FDDI (that is, a device is transmitting onto FDDI), this type of condition indicates that the ring is saturated. The ring is out of buffer space, and packets are being dropped from the device's backbone port.

The problem indicated by the frame errors is usually located between the MAC that reports the condition and its upstream neighbor. Because many physical connections can lie along this path, the MACFrameErrorRatio variable identifies only the two MACs between which the problem is occurring.

Frames Not Copied Condition

The Frames Not Copied condition occurs when the percentage of frames that are dropped because of insufficient buffer space exceeds a preset threshold. This condition indicates that the station is congested and is unable to process frames as quickly as they arrive. To help eliminate congestion:

• Add more capacity to the station
• Reconfigure your network so that end stations that communicate heavily with one another are on the same bridge or switch
• Filter out certain traffic

Link Error Condition

The Link Error condition occurs when a port detects link errors at a rate that exceeds a preset threshold. When the Link Error threshold is exceeded, the station removes itself from the ring and tries to reinsert itself on the ring. This action creates a MAC Neighbor Change Event (which also occurs if a ring wraps).

Link errors may indicate an FDDI PHY hardware problem (such as a faulty transmitter) or a faulty cable or connector. Look for the problem in the portion of the network between the port that is reporting the condition and the first upstream transmitter.

MAC Neighbor Change Event

The MAC Neighbor Change event occurs when a MAC's upstream or downstream neighbor changes.

This event indicates either:

• A network reconfiguration
• Another station that is leaving or joining the ring

Network File Server Timeouts

Use these sections to identify and correct timeouts on network file servers:

• Network File Server Timeout Overview
• Looking for Obvious Errors
• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

See Network File Server Timeouts Reference for additional conceptual and problem analysis detail.

A network file server can time out if your network gets congested or if your server is having problems. Users might have problems downloading data from or to the server or copying files from or to the server. To help you to understand the troubleshooting process for this type of problem, an EXAMPLE throughout this section follows the symptoms, analysis, and resolution of a typical file server timeout problem.

Understanding the Problem

When users log in, their stations make network file server calls either to determine quotas (if this feature has been enabled) or to mount user home directories. The network file server timeout messages, even when spread across multiple nodes, indicate a problem either with the network or with a server.

EXAMPLE: UNIX users notice that it takes a long time - over 30 seconds in some cases - to log in to any machine. Some machines report network file server timeout messages, but the messages have no obvious pattern and are infrequent. You begin to get a sense of the problem.

Identifying the Problem

First, rule out the obvious causes. Ask these questions:

• Can you access the network file server with Telnet?
• Have any alarms been triggered?
• Are there any new errors?

The process of identifying the problem is developed in Looking for Obvious Errors.

Solving the Problem

To determine the cause, reproduce the fault while you monitor the network. After you know the cause, you can fix the problem.

The solutions to the network file server timeout are identified in these sections:

• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

Looking for Obvious Errors

To look for obvious errors, use these applications:

• Ping and Telnet - To check for connectivity to the network file server nodes
• LANsentry Manager Alarms View - To search for triggered alarms
• LANsentry Manager Statistics View - To look for errors
• LANsentry Manager History View - To identify trends

Ping and Telnet

Determine whether you can contact network file server nodes using Ping and Telnet. If the response is extremely slow, then a problem may exist with the connections to the nodes. No delay indicates that the connections are normal, implying that the delay is occurring elsewhere. In this case, use LANsentry® Manager tools to determine whether packets are being lost or ignored.

LANsentry Manager Alarms View

Using the LANsentry Manager Alarms View, you can determine if any configured alarms have been triggered.

Search the Alarms View to see if any MAC events have been logged.

EXAMPLE: MAC events have not been logged for the network to which the UNIX users are attached.

Even though no alarms have occurred, errors may exist. For example, a lower rate of background errors may exist just below the alarm threshold. Because they are based on maximum and minimum values, RMON alarms may miss constant, periodic, or low amounts of errors.

LANsentry Manager Statistics View

Using the LANsentry Manager Statistics View, you can display a multisegment graph of utilization and error statistics.

Follow these steps:

1. Set up a graph that shows utilization and errors on all your major segments.

2. Determine whether any segments are particularly busy or error prone.

EXAMPLE: You notice that one segment of the UNIX network, HUB3, is reporting Too Long Errors and FCS Errors roughly every second sample. While the amount is not higher than normal, it is currently higher than any other segment.

LANsentry Manager History View

Using the LANsentry Manager History View, you can display a rolling history table to determine if the errors that you are seeing are new. For example, if you have a history table that runs for 30-minute samples over two days, you can compare the most recent sample to a previous sample, looking for new errors. If your probe has the resources, use a much finer resolution sample stored for a shorter time (every 30 seconds for 2 hours) to more easily spot recent errors.
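To get a feel for what each history configuration costs the probe, the sketch below counts the samples each one stores and shows one simple way to compare two samples for new errors. The function names and the dictionary layout of a sample are assumptions for illustration, not part of LANsentry Manager.

```python
from datetime import timedelta


def sample_count(window: timedelta, interval: timedelta) -> int:
    """Number of whole samples that fit in the history window."""
    return window // interval


def new_error_types(previous: dict, latest: dict) -> list:
    """Error counters that are nonzero now but were zero (or absent) before."""
    return sorted(name for name, count in latest.items()
                  if count > 0 and previous.get(name, 0) == 0)


coarse = sample_count(timedelta(days=2), timedelta(minutes=30))
fine = sample_count(timedelta(hours=2), timedelta(seconds=30))
print(coarse, fine)
```

The finer table holds 240 samples against 96 for the two-day table, which is why it depends on the probe having spare resources.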

EXAMPLE: The history table shows that no error rates remained constant throughout the day. However, the errors that did occur were on the device HUB3 and were Too Long Errors and FCS Errors.

Reproducing the Fault While Monitoring the Network

Although the RMON View in LANsentry Manager can show error rates and help you to identify the location of the problem, it may not provide enough data to solve the problem. To determine the cause of the problem, reproduce it while you monitor the network by using these applications:

• LANsentry Manager Top-N Graph - To locate a quiet node to use for reproducing the fault
• LANsentry Manager Packet Capture - To capture packets from the hub to which the quiet node is attached
• LANsentry Manager Packet Decode - To analyze the packets to assess network file server traffic and delays
• Address Tracker - To find the location of the problem nodes

EXAMPLE: Using LANsentry Manager, you find a hub on the network with a higher than normal error rate. However, the error rate does not seem high enough to cause login delays of 60 seconds or more.

LANsentry Manager Top-N Graph

Using the Top-N graph in the LANsentry Manager main window, locate a quiet node that has been showing the same problem. Choose a quiet node so that you do not receive excessive traffic when you try to isolate the problem.

EXAMPLE: You see that the node Monolith, which has the same Network File System (NFS) mounts as the other nodes on the network, is quiet. You decide to use this node for reproducing the fault. See Network File System (NFS) Protocol for more information about NFS.

LANsentry Manager Packet Capture

Using the LANsentry Manager Packet Capture application, capture packets from the network using predefined patterns and start-and-stop conditions.

Follow these steps:

1. Set up a capture buffer on a probe that is connected to the same hub as the quiet node. Until you know more about the problem, set a very general filter.

EXAMPLE: You select a MAC-layer filter and set a conversation filter to capture all packets to and from Monolith.

2. Telnet into and log out from the quiet node. Then reset the capture buffer. Repeat this procedure until you see the problem reflected in your captured data. To keep the buffer information clean, reset the buffer each time that you repeat the procedure.

3. When you see the delay, note the rough value of the packet count on the LANsentry Manager packet buffer.

By noting the packet count at which you think the delay has occurred, you can narrow the problem to within about 20 packets in the buffer. If you have used an extremely quiet node, you may even identify the exact packet.

LANsentry Manager Packet Decode

The LANsentry Manager Packet Decode application decodes all major protocols and displays the packet contents at three levels of detail: summary information, header information, and actual packet content.

Follow these steps:

1. Open the buffer in the Packet Decode application, and locate the number of the packet at which the delay occurred.

2. Select the packet and launch a MAC-layer conversation filter. In the filter display, look for a gap in the conversation (that is, where the node sent a request and then resent it at approximately the same rate as the delay you experienced when recreating the problem).

3. Repeat the test to determine if the result concentrates on one node or if it appears on other nodes.

EXAMPLE: On the quiet node that you selected, the delay is obvious. You see an NFS request going out to a node and a repeat of the request 30 seconds later. During that time, the node did not respond. You now know that the delay occurred because nodes were not seeing responses for NFS requests. When you repeat the test on other nodes, you find that the delay is happening with more than one destination node.
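If you export a capture to a list of timestamps and request identifiers, the same gap hunt can be automated. The sketch below is not a LANsentry Manager feature: it assumes each packet has been reduced to a (timestamp in seconds, request key) pair, where the request key is whatever identifies a repeated request in your export, and the function name is hypothetical.

```python
def find_retry_gaps(packets, min_gap_s: float) -> list:
    """Find requests that were resent after a long silence.

    packets is an iterable of (timestamp_s, request_key) in capture order.
    Returns (request_key, first_seen_s, resent_s, gap_s) tuples.
    """
    last_seen = {}
    gaps = []
    for timestamp, key in packets:
        if key in last_seen:
            gap = timestamp - last_seen[key]
            if gap >= min_gap_s:
                gaps.append((key, last_seen[key], timestamp, gap))
        last_seen[key] = timestamp
    return gaps


capture = [(0.5, "lookup-1"), (1.0, "read-7"), (1.5, "lookup-2"), (31.0, "read-7")]
for key, first, resent, gap in find_retry_gaps(capture, min_gap_s=10.0):
    print(f"{key}: sent at {first}s, resent at {resent}s, gap {gap}s")
```

A gap that matches the login delay you observed, such as the 30 seconds in the example, points at the request that never received a reply.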

Address Tracker

Use Address Tracker, which polls managed devices, to determine the hubs to which the problem nodes are attached. If the problem end stations are located on unmanaged devices, then you can at least narrow the problem to those unmanaged devices.

EXAMPLE: Although your network does not have managed hubs that Transcend® NCS management software can poll, it does have managed switches. When it polls the switches, Address Tracker displays the switch ports on which addresses were last seen. This information indicates the hub (but not the hub port) on which the device is located.

LANsentry Manager Packet Decode

After you know the location of the hub that has the problem node, monitor the problem from the hub using LANsentry Manager Packet Decode.

Follow these steps:

1. To capture packets from one of the nodes on the hub, set up another capture buffer and repeat the exercise that is described in LANsentry Manager Packet Capture. Because a delay may occur on a different node, use two capture buffers without stopping the first one.

Note the rough packet count where the delay appears.

2. Display a conversation filter of the packet where the delay appears, and look for the gap in the conversation.

EXAMPLE: You hope that the nodes are on the same hub. You find that all the nodes are on HUB3. This result indicates that FCS Errors may be causing the timeouts. However, because the errors occur at a low rate, you decide to verify this diagnosis. You monitor the problem from the hub, logging in and out many times, and the delay eventually occurs. This time, the delay shows that the node's reply had an FCS Error, even though the node received the request. The switch would not have transmitted this packet, causing a timeout on the NFS protocol. The retry time is presumably 30 seconds. During this test, you see the problem occurring on another node.

Correcting the Fault

Without a managed hub, you may find it very difficult to discover network file server timeout errors. To find the problematic node, you must either systematically isolate nodes by monitoring each node for a prolonged period or temporarily insert a managed hub.

EXAMPLE: You notice that the captured error packet failed FCS because it was corrupted by a regular pattern during transmission. A possible reason for this occurrence is a Jabbering node. This explanation makes sense because FCS/Jabber frames increased linearly when you were monitoring the live network.

Network File Server Timeouts Reference

This section explains terms that are relevant to network file server timeouts and provides additional conceptual and problem analysis detail.

Jabbering

Jabbering occurs when a node transmits illegal-length packets and is possibly not operating within carrier specifications. In effect, another node has written bad data over a valid packet. This bad data is often interpreted as a repeated sequence of data.

Network File System (NFS) Protocol

A distributed file system protocol developed by Sun Microsystems that allows a computer system to access files over a network as if they were on its local disks. This protocol has been incorporated into products by more than 200 companies. It is now a de facto Internet standard.

NFS is one protocol in the NFS suite of protocols, which includes NFS, RPC, XDR (External Data Representation), and others. These protocols are part of a larger architecture that Sun Microsystems refers to as Open Network Computing (ONC). ONC is a distributed applications architecture designed by Sun and currently controlled by a consortium led by Sun.

Troubleshooting TCP/IP - Detailed Steps

This article shows how to troubleshoot TCP/IP connectivity between computers on a Windows network. If you haven't already done so, disable XP's Internet Connection Firewall on all local area network connections, and remove all firewall programs on the network. Improperly configured firewalls are the most common cause of TCP/IP problems.

Open a Command Prompt Window

For many of these steps, you'll be typing at the command prompt. To open a command prompt window in Windows 2000 or XP, click Start | Run, type cmd in the box, and click OK. To open a command prompt window in Windows 95, 98, or Me, click Start | Run, type command in the box, and click OK. Type one command per line, and press Enter after each one to execute it. To close the command prompt window, use the exit command.

Determine the TCP/IP Settings

Determine the TCP/IP settings of each computer on the local area network. In XP, open the Network Connections folder, right click the LAN connection, and click Status | Support | Details. For example, here are the Status and Details views for the LAN connection on an Internet Connection Sharing host.

In Windows 95/98/Me, click Start | Run, type winipcfg in the box, and click OK. Select the LAN adapter from the menu, and click More Info. Here's the winipcfg view for an ICS client running Windows Me.

You can also see the TCP/IP settings from the command prompt. This is especially convenient if a computer has more than one network adapter. Use the ipconfig /all command, which is available in all versions except Windows 95. The output from this command can be long, so it's best to write it to a file. Specify the file name in the command this way:

ipconfig /all >ipconfig.txt

Description of TCP/IP Settings

Here are the TCP/IP settings that are used in network troubleshooting:

• IP Address - Unique address assigned to a network adapter. A computer with multiple network adapters has an IP address for each one, and each one must be in a different subnet.
• Subnet Mask - Used in conjunction with the IP address to determine which subnet an adapter belongs to. At the simplest level, communication is only possible between two network adapters when they're in the same subnet.
• Default Gateway - IP address of a computer or router, on one of this computer's local area networks, that knows how to communicate with subnets not present on this computer. For an Internet connection, the default gateway is a router belonging to your Internet service provider, and all access to sites on the Internet goes through it. For an ICS client, the default gateway is the ICS host. If you use a hardware router, it serves as the default gateway.
• DHCP Server - If an adapter is configured to obtain an IP address automatically, this is the address of the server that provides it. It could be your ISP, an ICS host, or a hardware router.
• DNS Servers - IP address of one or more Domain Name Server computers. DNS servers translate Internet names to their IP addresses (like 63.146.109.227).
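The relationship between IP address and subnet mask can be checked with a few lines of code. This is a minimal sketch using Python's standard ipaddress module; the function name is hypothetical, and the 255.255.255.0 mask and the 192.168.2.5 address are illustrative values chosen to go with the example addresses used later in this article.

```python
import ipaddress


def same_subnet(ip_a: str, ip_b: str, subnet_mask: str) -> bool:
    """True if both addresses fall in the same subnet under the given mask."""
    # strict=False lets us pass a host address and get its enclosing network.
    net_a = ipaddress.ip_network(f"{ip_a}/{subnet_mask}", strict=False)
    net_b = ipaddress.ip_network(f"{ip_b}/{subnet_mask}", strict=False)
    return net_a == net_b


print(same_subnet("192.168.1.101", "192.168.1.123", "255.255.255.0"))
print(same_subnet("192.168.1.101", "192.168.2.5", "255.255.255.0"))
```

Two adapters for which this check fails can only reach each other through a gateway.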

Subnets

See our article on subnets for a brief description of how they work. For more details, see this Microsoft Knowledge Base article.

If two computers are supposed to be on the same subnet but aren't, something is wrong with the network hardware or software configuration. This is most likely to happen when one of them receives an IP address of 169.254.x.x, which indicates that:

• It's configured to obtain an IP address automatically
• It couldn't find a DHCP server on the network to make the assignment
• Windows assigned it an Automatic Private IP Address
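When you are reviewing the settings of several computers, a quick test for the 169.254.x.x range can flag the ones that failed to reach a DHCP server. The sketch below uses Python's standard ipaddress module; the function name is hypothetical.

```python
import ipaddress

# The Automatic Private IP Address range described above.
APIPA_NETWORK = ipaddress.ip_network("169.254.0.0/16")


def is_automatic_private_address(ip: str) -> bool:
    """True if the address is in the 169.254.x.x range."""
    return ipaddress.ip_address(ip) in APIPA_NETWORK


for address in ("169.254.10.20", "192.168.1.101"):
    print(address, is_automatic_private_address(address))
```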

See our article on Specific Networking Problems and Their Solutions for more information

Pinging

The ping command is the basic tool for testing TCP/IP connectivity. It sends a special packet (called ICMP Echo) to a particular IP address and looks for a reply. If everything is working right, the reply comes back. If not, the ping times out in a few seconds. By default, the ping command repeats the process four times. Here's an example of an ICS client computer pinging a Windows XP Home Edition ICS host, using the host's IP address and its computer name.

When ping fails, you'll see one of these error messages:

• Request timed out - The IP address is valid, but there's no reply from it. If the IP address is on a local area network, the most likely cause is a firewall program blocking the ping.
• Unknown host <name> or Ping request could not find host <name> - The computer name doesn't exist on the local area network. Make sure that NetBIOS over TCP/IP is enabled.
• Destination host unreachable - The IP address isn't on a local area network, and the default gateway can't access it. Either there's no default gateway, its address is wrong, or it isn't functioning.

Pinging the Local Area Network

Here is a series of ping commands to use in finding where a problem occurs on a local area network. Run them in the order shown, and don't go on to the next command until all of the previous commands work properly. In this example:

• The computer being tested is named Winxp, with IP address 192.168.1.101.
• There's another computer on the network named Win98, with IP address 192.168.1.123.

Substitute the appropriate IP addresses and computer names for your network.

Command              Target                          What Ping Failure Indicates
ping 127.0.0.1       Loopback address                Corrupted TCP/IP installation
ping localhost       Loopback name                   Corrupted TCP/IP installation
ping 192.168.1.101   This computer's IP address      Corrupted TCP/IP installation
ping winxp           This computer's name            Corrupted TCP/IP installation
ping 192.168.1.123   Another computer's IP address   Bad hardware or NIC driver
ping win98           Another computer's name         NetBIOS name resolution failure
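The run-in-order, stop-at-first-failure procedure can be scripted. The sketch below is one possible approach, not part of the original article: it calls the system ping command once per target and reports the first step that fails, using the example addresses and names from the table. The function names are hypothetical, and relying on the exit code is an assumption that is worth verifying on your system, because some ping versions can exit with 0 even when the reply is "Destination host unreachable".

```python
import platform
import subprocess

# Targets in the order given in the table, with what a failure indicates.
LAN_STEPS = [
    ("127.0.0.1", "Corrupted TCP/IP installation"),
    ("localhost", "Corrupted TCP/IP installation"),
    ("192.168.1.101", "Corrupted TCP/IP installation"),
    ("winxp", "Corrupted TCP/IP installation"),
    ("192.168.1.123", "Bad hardware or NIC driver"),
    ("win98", "NetBIOS name resolution failure"),
]


def ping_once(target: str) -> bool:
    """Send one echo request with the system ping; True on exit code 0."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    result = subprocess.run(["ping", count_flag, "1", target],
                            stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    return result.returncode == 0


def first_failure(steps, ping=ping_once):
    """Run the steps in order; return (target, meaning) for the first failure."""
    for target, meaning in steps:
        if not ping(target):
            return target, meaning
    return None


# Example: result = first_failure(LAN_STEPS)
```

Passing the ping function as a parameter makes it possible to rehearse the sequence with a stand-in before running it against a live network.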

To fix a corrupted TCP/IP installation on Windows XP, follow the steps in this Microsoft Knowledge Base article. For Windows 95/98/Me, un-install the TCP/IP protocol in Control Panel | Network, reboot, and re-install it. If that doesn't fix it, use this procedure on Windows 95 or 98.

Pinging the Internet

You can also use ping to find a problem with Internet access. Run these commands in the order shown, and don't go on to the next command until all of the previous commands work properly. Use the Default Gateway and DNS Server addresses that you got from the winipcfg or ipconfig /all command.

Command                  Target                What Ping Failure Indicates
ping w.x.y.z             Default Gateway       Default Gateway down
ping w.x.y.z             DNS Server            DNS Server down
ping w.x.y.z             Web site IP address   Internet service provider or web site down
ping www.something.com   Web site name         DNS Server down or web site down

REFERENCE

WWW.PRACTICALLYNETWORK.COM

WWW.COMPUTERHOPE.COM

WWW.WIKIPEDIA.ORG

THANK YOU

63 63

  • Basic Network Troubleshooting
  • Basic network troubleshooting
    • Issue
    • Cause
    • Solution
      • Verify connections LEDs
      • Adapter resources
      • Adapter functionality
      • Protocol
      • Firewall
      • Additional time
      • Additional troubleshooting
        • Network Troubleshooting Overview
          • About Connectivity Problems
          • About Performance Problems
          • Solving Connectivity and Performance Problems
          • Recognizing Symptoms
            • User Comments
            • Network Management Software Alerts
            • Analyzing Symptoms
              • Understanding the Problem
              • Identifying and Testing the Cause of the Problem
                • Sample Problem Analysis
                • Equipment for Testing
                  • Solving the Problem
                    • Your Network Troubleshooting Toolbox
                      • Transcend Central
                      • Status Watch
                        • Web Reporter
                          • Address Tracker
                          • LANsentry Manager
                          • Traffix Manager
                          • Device View
                          • Ping
                            • Strategies for Using Ping
                            • Tips on Interpreting Ping Messages
                              • Telnet
                              • FTP and TFTP
                              • Analyzers
                              • Probes
                              • Cable Testers
                              • Knowing Your Networks Configuration
                                • Site Network Map
                                • Device Configuration Information
                                • Other Important Data About Your Network
                                  • Identifying Your Networks Normal Behavior
                                    • Baselining Your Network
                                    • Identifying Background Noise
                                    • Identifying Background Noise
                                        • FDDI Connectivity
                                          • Understanding the Problem
                                          • Identifying the Problem
                                          • Solving the Problem
                                          • Status Watch
                                          • Implementing Dual Homing
                                          • Installing an Optical Bypass Unit
                                          • Peer Wrap Condition
                                          • Twisted Ring Condition
                                          • Undesired Connection Attempt Event
                                          • Address Tracker
                                          • Device View
                                          • Broadcast Packets
                                          • Multicast Packets
                                            • Duplicate Addresses
                                              • Understanding the Problem
                                              • Identifying the Problem
                                              • Solving the Problem
                                              • Status Watch
                                              • Address Tracker
                                              • LANsentry Manager
                                              • Duplicate MAC Addresses
                                              • Duplicate IP Addresses
                                                • Ethernet Packet Loss
                                                  • Understanding the Problem
                                                  • Identifying the Problem
                                                  • Solving the Problem
                                                  • Status Watch
                                                  • LANsentry Manager Network Statistics Graph
                                                  • Device View
                                                  • Alignment Errors
                                                  • Collisions
                                                  • CRC Errors
                                                  • Excessive Collisions
                                                  • FCS Errors
                                                  • Late Collisions
                                                  • Nonstandard Ethernet Problems
                                                  • Receive Discards
                                                  • Too Long Errors
                                                  • Too Short Errors
                                                  • Transmit Discards
                                                    • FDDI Ring Errors
                                                      • Understanding the Problem
                                                      • Identifying the Problem
                                                      • Solving the Problem
                                                      • Status Watch
                                                      • Elasticity Buffer Error Condition
                                                      • Frame Error Condition
                                                      • Frames Not Copied Condition
                                                      • Link Error Condition
                                                      • MAC Neighbor Change Event
                                                        • Network File Server Timeouts
                                                          • Understanding the Problem
                                                          • Identifying the Problem
                                                          • Solving the Problem
                                                          • Ping and Telnet
                                                          • LANsentry Manager Alarms View
                                                          • LANsentry Manager Statistics View
                                                          • LANsentry Manager History View
                                                          • LANsentry Manager Top-N Graph
                                                          • LANsentry Manager Packet Capture
                                                          • LANsentry Manager Packet Decode
                                                          • Address Tracker
                                                          • LANsentry Manager Packet Decode
                                                          • Jabbering
                                                          • Network File System (NFS) Protocol
                                                          • Troubleshooting TCP/IP - Detailed Steps
                                                          • Open a Command Prompt Window
                                                          • Determine the TCP/IP Settings
                                                          • Description of TCP/IP Settings
                                                          • Subnets
                                                          • Pinging
                                                          • Pinging the Local Area Network
                                                          • Pinging the Internet
Equipment for Testing

To help identify and test the cause of problems, have available:

• A laptop computer that is loaded with a terminal emulator, TCP/IP stack, TFTP server, CD-ROM drive (to read the online documentation), and some key network management applications, such as LANsentry® Manager. With the laptop computer, you can plug into any subnetwork to gather and analyze data about the segment.

• A spare managed hub to swap for any hub that does not have management. Swapping in a managed hub allows you to quickly spot which port is generating the errors.

• A single-port probe to insert in the network if you are having a problem where you do not have management capability.

• Console cables for each type of connector, labeled and stored in a secure place.

Solving the Problem

Many device or network problems are straightforward to resolve, but others yield misleading symptoms. If one solution does not work, continue with another.

A solution often involves:

• Upgrading software or hardware (for example, upgrading to a new version of agent software or installing Gigabit Ethernet devices)

• Balancing your network load by analyzing:
  • What users communicate with which servers
  • What the user traffic levels are in different segments

  Based on these findings, you can decide how to redistribute network traffic.

• Adding segments to your LAN (for example, adding a new switch where utilization is continually high)

• Replacing faulty equipment (for example, replacing a module that has port problems or replacing a network card that has a faulty jabber protection mechanism)

To help solve problems, have available:

• Spare hardware equipment (such as modules and power supplies), especially for your critical devices

• A recent backup of your device configurations to reload if flash memory gets corrupted (which can sometimes happen due to a power outage)

Your Network Troubleshooting Toolbox

A robust network troubleshooting toolbox consists of items (such as network management applications, hardware devices, and other software) to recognize, diagnose, and solve networking problems. It contains:

• Transcend Applications
• Network Management Platforms
• 3Com SmartAgent Embedded Software
• Other Commonly Used Tools

Transcend management software is optimized for managing 3Com devices and their attached networks. However, some applications, such as LANsentry Manager, can manage any vendor's networking equipment that complies with the Remote Monitoring (RMON) Management Information Base (MIB).

This section describes these Transcend applications, which you can use to troubleshoot your network:

• Transcend Central
• Status Watch
• Address Tracker
• LANsentry Manager
• Traffix Manager
• Device View

Transcend Central

Start with Transcend Central, which is an asset management and device grouping application, to understand what your network consists of and to control the Transcend NCS network management troubleshooting tools. Transcend Central is available as both a native Windows application and a Java application that you can access using a Web browser.

Using Transcend Central for troubleshooting, you can:

• Display an inventory of device, module, and port information

• Group devices to make your troubleshooting tasks easier. By managing a collection of devices, you can simultaneously perform the same tasks on each device in a group and locate physical or logical problems on your network.

• Launch Transcend NCS applications, including some of your primary Transcend NCS troubleshooting tools:
  • Status Watch, which includes Web Reporter (from the Java version)
  • Address Tracker
  • LANsentry Manager
  • Traffix Manager
  • Device View

Status Watch

The Status Watch applications manage 3Com devices and their attached networks. Status Watch applications primarily poll for MIB-II data. Status Watch is a performance-monitoring application that allows you to monitor the operational status of your network devices and quickly identify any problems that require your attention. It works in conjunction with Web Reporter.

Web Reporter

Web Reporter is a data-reporting application that runs in a World Wide Web (WWW) browser. It generates reports from data that Status Watch collects, allowing you to compare network statistics against a baseline.

Address Tracker

Address Tracker is an address collection and discovery application that:

• Polls managed devices for all MAC addresses
• Polls managed devices and routers for IP addresses to perform MAC-to-IP address translation
• Uses Device View to disable troublesome ports

LANsentry Manager

LANsentry Manager is a set of integrated applications that displays and explores the real-time and historical data that RMON-compliant devices (probes) on the network capture. LANsentry Manager uses SNMP polling to gather RMON and RMON2 data from the probes.

Use LANsentry Manager to:

• Monitor current performance of network segments
• See trends over time
• Spot signs of current problems
• Configure alarms to monitor for specific events
• Capture packets and display their contents

LANsentry Manager works with any device (from 3Com or other vendors) that supports the RMON MIB or the RMON2 MIB.

Traffix Manager

Traffix Manager is a performance-monitoring application that provides information about layer 2 (RMON) and layer 3 conversations between nodes. It helps you to assess traffic patterns on your network. Traffix Manager:

• Monitors all the stations that the RMON2-compliant probes encounter on your network
• Captures and stores RMON and RMON2 data for your network's protocols and applications
• Displays traffic between stations in user-defined views of the network
• Graphs current or historical data on the devices selected
• Delivers reports for user-specified stations and time periods as PostScript to your printer or as HTML to your Web server
• Launches LANsentry Manager tools for in-depth analysis of a station or a conversation between stations

You can use Traffix Manager to:

• Know your network - Understand overall flow patterns and interactions between systems, and determine how your network is really being used at the application level.

• Optimize your network - Gain an insight into traffic and application usage trends to help you optimize the use and placement of current network resources and make wise decisions about capacity planning and network growth.

Traffix Manager works with any device (from 3Com or other vendors) that supports the RMON2 MIB.

Device View

The Device View application is a device configuration tool. When you troubleshoot your network, you can use Device View to determine or change a device's configuration. You can also use Device View to look at a device's statistics and to set alarms.

Device View manages only 3Com devices.

Network Management Platforms

As part of your troubleshooting toolbox, your network management platform is the first place to go to view the overall health of your network. With the platform, you can understand the logical configuration of your network and configure views of your network to understand how devices work together and the role that they play in the users' work. The network management platform that supports your Transcend software installation can provide valuable troubleshooting tools. Transcend runs on several platforms within the NT and UNIX environments.

The platform discovers the devices. Transcend imports that information from the platform to populate the core database. Unless you are rediscovering, the user must manually update the platform.

Using this device database, a map displays the graphical representation of your network. Each device on your network appears as a symbol (icon) on the map. You can configure views of your network to show devices on the same subnetworks or floors.

You can monitor network performance and diagnose network performance and connectivity problems. You can also:

• Take a snapshot of your network in its normal state. The snapshot records the state of your network at a particular instant. If you later have network performance problems, you can compare the current state of your network to the snapshot.

• Quickly determine the connectivity status of a device by noting the color of its map symbol. Red usually means that communication with a device has ceased.

• Diagnose connectivity problems by determining whether two devices can communicate. If they can communicate, then examine the route between the devices, the number of packets that were sent and lost, and the roundtrip time between the two devices.

• Manage MIB information (for example, collecting and storing MIB data for trend analysis and graphing) using MIB queries. Transcend compiles MIBs and allows you to navigate up and down the MIB Tree to retrieve MIB objects from devices. You can set thresholds for MIB data and generate events when a threshold is exceeded.

• Configure the software to act on certain events. The Event Categories window informs you of any unexpected events (which arrive in the form of traps).

For more information, see the documentation that is shipped with your software.

3Com SmartAgent Embedded Software

Traditional Simple Network Management Protocol (SNMP) management places the burden of collecting network management information on the management station. In this traditional model, software agents collect information about throughput, record errors or packet overflows, and measure performance based on established thresholds. Through a polling process, agents pass this information to a centralized network management station whenever they receive an SNMP query. Management applications then make the data useful and alert the user if there are problems on the device.

As a useful companion to traditional network management methods, 3Com's SmartAgent® technology places management intelligence into the software agent that runs within a 3Com device. This scalable solution reduces the amount of computational load on the management station and helps minimize management-related network traffic.

SmartAgent software, which uses the RMON MIB, is self-monitoring, collecting and analyzing its own statistical, analytical, and diagnostic data. In this way, you can conduct network management by exception - that is, you are notified only if a problem occurs. Management by exception is unlike traditional SNMP management, in which the management software collects all data from the device through polling.

SmartAgent software works autonomously and reports to the network management station whenever an exceptional network event occurs. The software can also take direct action without involving the management station. Devices that contain SmartAgent software may be able to:

• Perform broadcast throttling to minimize the flow of broadcast traffic on your network
• Monitor the ratio of good frames to bad frames
• Switch a resilient link pair to the standby path if the primary path corrupts frames
• Report if traffic on vital segments drops below minimum usage levels
• Disable a port for five seconds to clear problems and then automatically reconnect it
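The idea of management by exception can be illustrated with a minimal Python sketch. It is not SmartAgent code: the FrameSample record, the exception_events function, and the 1% default threshold are all hypothetical names and values chosen for illustration. The point is only that the evaluation happens next to the data, and that nothing is reported unless a threshold is crossed.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class FrameSample:
    """One sampling interval on a port (field names are illustrative)."""
    port: str
    good_frames: int
    bad_frames: int


def exception_events(samples: Iterable[FrameSample],
                     max_bad_ratio: float = 0.01) -> List[str]:
    """Return an event only for samples whose bad-frame ratio exceeds the threshold."""
    events = []
    for sample in samples:
        total = sample.good_frames + sample.bad_frames
        if total == 0:
            continue  # idle interval: nothing to evaluate
        ratio = sample.bad_frames / total
        if ratio > max_bad_ratio:
            events.append(
                f"{sample.port}: bad-frame ratio {ratio:.1%} "
                f"exceeds {max_bad_ratio:.1%}"
            )
    return events
```

With traditional polling, every sample would travel to the management station. Here a healthy port produces no event at all, which is what reduces management traffic.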

The Transcend NCS applications LANsentry Manager and Traffix Manager make RMON data that the SmartAgent software collects more usable by summarizing and correlating important information.

Other Commonly Used Tools

These commonly used tools can also help you troubleshoot your network:

• Network software, such as Ping, Telnet, and FTP and TFTP. You can use these applications to troubleshoot, configure, and upgrade your system.
• Network monitoring devices, such as Analyzers and Probes
• Tools, such as Cable Testers, for working on physical problems

Ping

Packet Internet Groper (Ping) allows you to quickly verify the connectivity of your network devices. Ping attempts to transmit a packet from one device to a station on the network and listens for the response to ensure that it was correctly received. You can validate connections on the parts of your network by pinging different devices:

• A successful response indicates that a valid network path exists between your station and the remote host and that the remote host is active.

• Slower response times than normal can indicate that the path is congested or obstructed.

• A failed response indicates that a connection is broken somewhere; use the message to help locate the problem. See Tips on Interpreting Ping Messages.

Strategies for Using Ping

Follow these strategies for using Ping:

• Ping devices when your network is operating normally so that you have a performance baseline for comparison. See Identifying Your Network's Normal Behavior for more information.

• Ping by IP address when:
  • You want to test devices on different subnetworks. This method allows you to Ping your network segments in an organized way, rather than having to remember all the hostnames and locations.
  • Your Domain Name System (DNS) server is down and your system cannot look up host names properly. You can Ping with IP addresses even if you cannot access hostname information.

• Ping by hostname when you want to identify DNS server problems.

• To troubleshoot problems that involve large packet sizes, Ping the remote host repeatedly, increasing the packet size each time.

• To determine if a link is erratic, perform a continuous Ping (using ping -s on UNIX), which indicates the time that it takes the device to respond to each Ping.

• To determine a route taken to a destination, use the trace route function (tracert on Windows or traceroute on UNIX).

• Consider creating a Ping script that periodically sends a Ping to all necessary networking devices. If a Ping failure message is received, the script can perform some action to notify you of the problem, such as paging you.

• Use the Ping functions of your network management platform. For example, in your HP OpenView map, select a device and click the right mouse button to gain access to ping functions.
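The Ping script suggested in the list above might look like the following Python sketch. It is one possible approach, not part of any 3Com tool: the function names system_ping, failed_hosts, and notify are hypothetical, the notification is only a printed line, and the sketch assumes that the operating system provides a ping command that accepts -c (UNIX) or -n (Windows) for the packet count.

```python
import platform
import subprocess
from typing import Callable, Iterable, List


def system_ping(host: str) -> bool:
    """Send one echo request with the operating system's ping command."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    result = subprocess.run(
        ["ping", count_flag, "1", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    # Assumption: exit status 0 means a reply arrived. Some ping
    # implementations also return 0 for certain error replies, so
    # treat the result as a hint rather than proof.
    return result.returncode == 0


def failed_hosts(hosts: Iterable[str],
                 ping: Callable[[str], bool] = system_ping) -> List[str]:
    """Ping every host once and return the ones that did not answer."""
    return [host for host in hosts if not ping(host)]


def notify(down: List[str]) -> None:
    """Placeholder notification; replace with e-mail, paging, and so on."""
    for host in down:
        print(f"ALERT: no Ping response from {host}")
```

To use the sketch, call notify(failed_hosts([...])) with your own device addresses and run it periodically from a scheduler such as cron. Passing the ping function as a parameter makes it possible to test the logic without sending any packets.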

Tips on Interpreting Ping Messages

Use the following ping failure messages to troubleshoot problems:

No reply from <destination>

Indicates that the destination routes are available but that there is a problem with the destination itself.

<destination> is unreachable

Indicates that your system does not know how to get to the destination. This message means either that routing information to a different subnetwork is unavailable or that a device on the same subnetwork is down.

ICMP host unreachable from gateway

Indicates that your system can transmit to the target address using a gateway but that the gateway cannot forward the packet properly because either a device is misconfigured or the gateway is not operating.
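A Ping script can apply the same interpretations automatically. The following Python sketch maps the three messages above to short hints. The function name interpret_ping_failure is hypothetical, and the exact wording that a ping command prints varies between operating systems, so the substrings matched here are assumptions taken from the messages as listed in this section.

```python
def interpret_ping_failure(message: str) -> str:
    """Map a ping failure message to the likely location of the fault."""
    text = message.lower()
    # Check the gateway message first because it also contains "unreachable".
    if "icmp host unreachable from gateway" in text:
        return "gateway cannot forward: misconfigured device or gateway down"
    if "no reply from" in text:
        return "routes are available: problem is at the destination itself"
    if "is unreachable" in text:
        return "no known path: routing information missing or local device down"
    return "unrecognized message"
```

For example, interpret_ping_failure("No reply from 192.0.2.7") points at the destination, whereas a message that ends in "is unreachable" points at routing or at a device on the local subnetwork.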

Telnet

Telnet, which is a login and terminal emulation program for Transmission Control Protocol/Internet Protocol (TCP/IP) networks, is a common way to communicate with an individual device. You log in to the device (a remote host) and use that remote device as if it were a local terminal.

If you have established an out-of-band Telnet connection with a device, you can use Telnet to communicate with that device even if the network is unavailable. This feature makes Telnet one of the most frequently used network troubleshooting tools. Usually, all device statistics and configuration capabilities are accessible by using Telnet to connect to the device's console. For more information about setting up an out-of-band connection, see Using Telnet, Serial Line, and Modem Connections.

You can invoke the Telnet application on your local system and set up a link to a Telnet process that is running on a remote host. You can then run a program that is located on a remote host as if you were working at the remote system.

FTP and TFTP

Most network devices support either the File Transfer Protocol (FTP) or the Trivial File Transfer Protocol (TFTP) for downloading updates of system software. Updating system software is often the solution to networking problems that are related to agent problems. Also, new software features may help correct a networking problem.

FTP provides flexibility and security for file transfer by:

• Accepting many file formats, such as ASCII and binary
• Using data compression
• Providing Read and Write access so that you can display, create, and delete files and directories
• Providing password protection

TFTP is a simple version of FTP that does not list directories or require passwords. TFTP only transfers files to and from a remote server.

Analyzers

An analyzer, which is often called a Sniffer, is a network device that collects network data on the segment to which it is attached, a process called packet capturing. Software on the device analyzes this data, which is a process referred to as protocol analysis. Most analyzers can interpret different types of protocol traffic, such as TCP/IP, AppleTalk, and Banyan VINES traffic.

You usually use analyzers for reactive troubleshooting - when you see a problem somewhere on your network, you attach an analyzer to capture and interpret the data from that area. Analyzers are particularly helpful for identifying intermittent problems. For example, if your network backbone has experienced moments of instability that prevent users from logging on to the network, you can attach an analyzer to the backbone to capture the intermittent problems when they happen again.

Probes

Like an analyzer, a probe is a network device that collects network data. Depending on its type, a probe can collect data from multiple segments simultaneously. It stores the collected data and transfers the data to an analysis site when requested. Unlike an analyzer, a probe does not interpret data.

A probe can be either a stand-alone device or an agent in a network device. The Transcend Enterprise Monitor 500 series and the SuperStack® II Monitor series are stand-alone RMON probes. LANsentry Manager and Traffix Manager use data from probes that comply with the RMON MIB or the RMON2 MIB.

You can use a probe daily to determine the health of your network. The Transcend NCS applications can interpret and report this data, alerting you to possible problems so that you can proactively manage your network. For example, an RMON2 probe can help you to analyze traffic patterns on your network. Use this data to make decisions about reconfiguring devices and end stations as needed.

Cable Testers

Cable testers examine the electrical characteristics of the wiring. They are most commonly used to ensure that building wiring and cables meet Category 5, 4, and 3 standards. For example, network technologies such as Fast Ethernet require the cabling to meet Category 5 requirements. Testers are also used to find defective and broken wiring in a building.

Knowing Your Network

You can better troubleshoot the problems on your network by:

• Knowing Your Network's Configuration
• Identifying Your Network's Normal Behavior

Knowing Your Network's Configuration

Part of understanding your network is knowing its physical and logical configuration. You should know:

• Which devices are on your network
• How the devices are configured
• Which devices are attached to the backbone
• Which devices connect your network to the outside world (WAN)

To keep track of your network's configuration, gather the following information:

• Site Network Map
• Logical Connections
• Device Configuration Information
• Other Important Data About Your Network

This data, when kept up-to-date, is extremely helpful for locating information when you experience network or device problems.

Site Network Map

A network map helps you to:

• Know exactly where each device is physically located
• Easily identify the users and applications that are affected by a problem
• Systematically search each part of your network for problems

You can create a network map using any drawing or flow chart application. Store your network map online. In addition, make sure that you always have a current version on paper in case you cannot access the online version. Figure 8 shows an example of a network map of 3Com devices.

Figure 8 Example of a Site Network Map

Consider including the following information on your network map:

• Location of important devices and workgroups (by floor, building, or area)
• Location of the network backbone, data center, and wiring closets, as appropriate for your network
• Location of your network management stations
• Location and type of remote connections
• IP subnetwork addresses for all managed switches and hubs
• Other subnetwork addresses, such as Novell IPX and AppleTalk, if appropriate for your network
• Type of media (by actual name, such as 10BASE-T, or by grouping, such as Ethernet), which you can show with callouts, colors, line weights, or line styles
• Virtual workgroups, which you can show with colors or shaded areas
• Redundant links, which you can indicate with gray or dashed lines
• Types of network applications that are used in different areas of your network
• Types of end stations that are connected to the switches and hubs

Logical Connections

With the advent of virtual LANs (VLANs), you need to know how your devices are connected logically as well as physically. For example, if you have connected two devices through the same physical switch, you can assume that they can communicate with each other. However, the devices can be in separate VLANs that restrict their communication.

Knowing the setup of your VLANs can help you to quickly narrow the scope of a problem to a VLAN instead of to a network connection.
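If you record VLAN membership alongside your network map, a quick lookup can show whether two devices on the same switch share a VLAN. The following Python sketch assumes a hypothetical record format (a dictionary that maps each device name to a set of VLAN IDs) and a hypothetical function name, share_vlan. It does not query any real switch, and it ignores any routing that may exist between VLANs.

```python
from typing import Dict, Set


def share_vlan(vlans: Dict[str, Set[int]], device_a: str, device_b: str) -> bool:
    """Return True if the two devices have at least one VLAN in common."""
    return bool(vlans[device_a] & vlans[device_b])


# Illustrative records only; the names and VLAN IDs are made up.
records = {
    "pc-accounts": {10},
    "pc-lab": {20},
    "file-server": {10, 20},
}
```

In this example, share_vlan(records, "pc-accounts", "pc-lab") is False even if both PCs are plugged into the same switch, which narrows the problem to the VLAN setup rather than to the cabling.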

The Transcend NCS application Enterprise VLAN Manager allows you to view the logical makeup of your network. Depending on the complexity of your network and VLAN configurations, you can use colors to show the VLANs graphically on your network map.

Device Configuration Information

Maintain online and paper copies of device configuration information. Make sure that all online data is stored with your site's regular data backup. If your site does not have a backup system, copy the information onto a backup disc (CD, Zip disk, and the like) and store it offsite.

Follow these guidelines for saving configuration information:

• Because the easiest way to recover a device's configuration is to use FTP or TFTP, save the configuration settings of each device that supports this method of uploading.

• For other devices, Telnet in and save the session (which contains configuration details) to a file. If you cannot print the configuration of a device, then create a quick rebuild guide that explains the quickest way to configure the device from a fresh install.

• For devices that store information to diskette, store this data as part of your site's regular backup.

• For routers and other important devices with text configuration files, store this data online in a revision control system. Keep the most recent version on paper. Keep previous versions.

• For PCs, keep a recovery disk for each type of PC. For any device that you use as a server, store all startup scripts and copies of registries.

Other Important Data About Your Network

For a complete picture of your network, have the following information available:

• All passwords - Store passwords in a safe place. Keep previous passwords in case you restore a device to a previous software version and need to use the old password that was valid for that version.

• Device inventory - The inventory allows you to see the device type, IP address, ports, MAC addresses, and attached devices at a glance. Software tools such as Transcend Central can help you keep track of the 3Com devices on your network. Using Transcend Central, you can group devices by type and location and have this information on hand for troubleshooting.

• MAC address-to-port number list - If your hubs or switches are not managed, you must keep a list of the MAC addresses that correlate to the ports on your hubs and switches. Generate and keep a paper copy of this list, which is crucial for deciphering captured packets, using Address Tracker.

• Log book - Document your interactions, no matter how trivial, with each device that is critical to your network's operation (that is, routers, remote access devices, and security servers). For example, document that you noticed a fan making noise one morning. Your note may help you to identify why a device is over temperature a week later (because the fan stopped working).

• Change control - Maintain a change control system for all critical systems. Permanently store change control records.

• Contact details - Store, online and on paper, the details of all support contracts, support numbers, engineer details, and telephone and fax numbers.

• LANsentry Reporter - Use LANsentry Reporter to generate reports from the database.

Identifying Your Network's Normal Behavior

By monitoring your network over a long period, you begin to understand its normal behavior. You begin to see a pattern in the traffic flow, such as which servers are typically accessed, when peak usage times occur, and so on. If you are familiar with your network when it is fully operational, you can be more effective at troubleshooting problems that arise.

Baselining Your Network

You can use a baseline analysis, which is an important indicator of overall network health, to identify problems. A baseline can serve as a useful reference of network traffic during normal operation, which you can then compare to captured network traffic while you troubleshoot network problems. A baseline analysis speeds the process of isolating network problems.

By running tests on a healthy network, you compile normal data to compare against the results that you get when your network is in trouble. For example, Ping each node to discover how long it typically takes you to receive a response from devices on your network.
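As a rough illustration of comparing a measurement against a baseline, the following Python sketch summarizes Ping round-trip times that were collected on a healthy network and then flags later measurements that fall well outside that range. The function names and the default tolerance of three standard deviations are assumptions made for this example, not values from the Transcend applications. At least two baseline samples are required.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def build_baseline(rtts_ms: Sequence[float]) -> Tuple[float, float]:
    """Summarize healthy-network round-trip times as (average, spread)."""
    return mean(rtts_ms), stdev(rtts_ms)


def is_abnormal(rtt_ms: float,
                baseline: Tuple[float, float],
                tolerance: float = 3.0) -> bool:
    """Flag a response time that is far slower than the baseline."""
    average, spread = baseline
    return rtt_ms > average + tolerance * spread


# Round-trip times in milliseconds; the numbers are made up.
healthy = build_baseline([10, 12, 11, 9, 13])
```

With these sample values, a 12 ms response is within the normal range, while a 40 ms response is flagged for investigation.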

Applications such as Status Watch, Address Tracker, LANsentry Manager, and Traffix Manager allow you to collect days and weeks of data and set a baseline for comparison. Through the reporting mechanisms in the following list, you can continuously assess the data from your network and ensure that its performance is optimal:

• Web Reporter generates daily or weekly reports from data collected by Status Watch.

• Traffix Manager generates weekly reports from collected data and calculates the baselines for you. Set up Utilization History and Error History reports with data resolution set to Weekly.

• LANsentry Manager History View generates daily utilization graphs, which are sampled every 30 minutes for each day over one week. Use these graphs to calculate your network baselines manually.

Identifying Background Noise

Know your network's background noise so that you can recognize real data flow. For example, one evening after everyone is gone, when no backups are running and most nodes are on, analyze the traffic on your network using the Traffix Manager application. The traffic that you see is mostly broadcast and multicast packets. Any errors that you see are the result of faulty devices (trace). This traffic is the background noise of your network - traffic that occurs for little value. If background noise is high, redesign your network.

FDDI Connectivity

Use these sections to identify and correct connectivity errors on an FDDI ring:

• FDDI Connectivity Overview
• Monitoring FDDI Connections
• Making Your FDDI Connections More Resilient

Fiber Distributed Data Interface (FDDI), which is a self-correcting technology, automatically corrects ring faults to maintain connectivity throughout most of the network. However, you should monitor your FDDI connections for wrapped rings and other problems with ring connectivity.

Understanding the Problem

As shown in Figure 9, in a thru FDDI LAN, no stations on the trunk ring have a Configuration State (SMTConfigurationState) of Wrap or Isolated. However, users who complain about network performance may have lost connectivity to other stations on the network because the FDDI network is wrapped or segmented.

Figure 9 Thru Ring

Wrapped ring

By monitoring the Peer Wrap Condition, you can see when the Configuration State changes. In a wrapped ring (Figure 10), two stations on the LAN are in a wrapped Configuration State. This condition may or may not affect the connectivity of certain stations. Although operational, your network may have a cabling problem or a problem with a link.

Figure 10 Wrapped LAN

Segmented ring

In a segmented ring (Figure 11), more than two stations are wrapped on the trunk ring. Although this mode of operation is a valid FDDI LAN configuration, your LAN is probably experiencing a degraded or degrading condition.

Figure 11 Segmented Ring
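The thru, wrapped, and segmented descriptions above amount to counting the trunk ring stations whose Configuration State is Wrap. The following Python sketch expresses that rule. The function name classify_ring and the plain-text state values are assumptions for illustration; a real station reports its state through SMT, and the case of exactly one wrapped station is not described in this section, so the sketch labels it Indeterminate.

```python
from typing import Iterable


def classify_ring(config_states: Iterable[str]) -> str:
    """Classify a trunk ring from the Configuration State of each station."""
    states = [state.lower() for state in config_states]
    wrapped = sum("wrap" in state for state in states)
    isolated = sum("isolated" in state for state in states)
    if wrapped == 0 and isolated == 0:
        return "Thru"          # no station is in Wrap or Isolated
    if wrapped == 2:
        return "Wrapped"       # two wrap points: ring is still operational
    if wrapped > 2:
        return "Segmented"     # more than two wrap points
    return "Indeterminate"     # combination not covered by the text
```

For example, a ring in which two of four stations report Wrap is classified as Wrapped, which suggests a cabling or link problem between those two stations.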

When a network connection has excessively high link errors, Station Management (SMT) shuts down the connection and tries to reconnect. A dual-attachment trunk ring station with an A or B connection that is shut down is one of the wrap points in the network. See Making Your FDDI Connections More Resilient for information about keeping a dual-attachment station connection from wrapping.

Isolated station

Sometimes a network wraps a particular station out of the ring. Stations on either side of a problem station can be wrapped. This effectively isolates the station or links that have problems, as shown in Figure 12.

Figure 12 Wrapped Ring with Isolated Station

If a ring was already wrapped when a network wraps a station out of the ring, then a segmented ring results, as shown in Figure 13.

Figure 13 Segmented Ring with Isolated Stations

Twisted ring

In a twisted ring, an A port is connected to an A port and a B port is connected to a B port, instead of the normal A-to-B connections. A twisted ring, which always has two twist points (stations), can exist in either a Thru or Wrap state. You can monitor the Twisted Ring Condition and Undesired Connection Attempt Event for evidence of twisted ring and other connection problems.

Identifying the Problem

To identify the problem, follow this process:

1. At the FDDI LAN level, verify that your network is operating.

If the network is operating, the FDDI ring may be segmented, and therefore an FDDI station or an Ethernet station on an Ethernet link may have lost connectivity to other nodes on the network.

2. Determine if a ring is in a Thru, Wrap, or Segmented state.

If the FDDI ring is segmented or wrapped, look for a problem with a link somewhere in the network or for a nonfunctioning node on your trunk ring. If the ring is operating and is not segmented, or if it is segmented but you still have connectivity to the stations in question, move to a more specific level in your network.

See Monitoring FDDI Connections for more information.

3. Determine if the poorly performing station is an Ethernet or FDDI station.

If the problem is an FDDI station, find out if it is congested (that is, if the station is so busy that it cannot accept all the network traffic that is directed to it) by determining its Bandwidth Utilization. Also determine if the station has a high frame error rate by looking at the FDDI Ring Errors.

If the problem is an Ethernet station, look for congestion by examining Ethernet Packet Loss and Bandwidth Utilization.

Solving the Problem

Identify the station that is causing the disconnection and take the appropriate steps:

• If the disconnection is caused by a wrapped ring, then fix the hardware or cabling problem at that station.

• If the station is congested, you have a device problem rather than a network problem. For example, if the congested station is a file server and every other machine on the network is retrieving and saving files using that server, consider upgrading your server or adding servers to the network. A variety of devices from different vendors may be communicating on an FDDI or Ethernet network; some are faster and more capable, and some are slower and more prone to congestion.

• If the station is an Ethernet station that is attached to an Ethernet segment, reevaluate the setup of your Ethernet network and make some changes to improve its performance.

You can also make FDDI connections more resilient by implementing dual homing or installing an Optical Bypass Unit (OBU) where FDDI connections are prone to fail. See Making Your FDDI Connections More Resilient for more information.

Monitoring FDDI Connections

Monitor your FDDI devices for Warning or Critical alerts in the FDDI Status tool.

Status Watch

Use Status Watch to identify these FDDI connectivity errors:

• Peer Wrap Condition
• Twisted Ring Condition
• Undesired Connection Attempt Event

Follow these steps:

1. In the Device area, select the device that is located where you suspect an FDDI ring connectivity problem.

2. Monitor the FDDI Status tool for the currently selected device.

Here are some pointers for monitoring:

• If the Peer Wrap Configuration State variable is Isolated, the device is not connected to the FDDI trunk ring. If you intend the device to remain isolated, this indication is not a serious condition. However, if the device is supposed to be connected on a trunk ring, a serious problem may exist: the device is no longer transmitting packets to the larger trunk ring.

• If the Peer Wrap flag (SMTPeerWrapFlag) is set, the device is one of the wrap points. The cause of the wrapped ring is somewhere in the portion of the network between the two stations that report the peer wrap condition.

Making Your FDDI Connections More Resilient

When devices are removed from an FDDI ring, there is a break in the fiber path that causes the ring to wrap until the ring is made whole again. To prevent the break in the FDDI connection, you can implement dual homing or install an Optical Bypass Unit (OBU).

Implementing Dual Homing

When the operation of a dual attachment node is critical to your network, dual homing adds reliability by providing a backup connection if the primary link fails. Because a dual attachment station (DAS) has two attachments to the FDDI ring (A-to-M and B-to-M), you can use one of them as a standby link if the active link fails. Using dual homing, only one of the two attachments is active at a time. In this sense, a DAS acts as if it is a single attachment station (SAS) by using its A port as the standby link.

Through SMT, a DAS can be dual homed to the same concentrator or, more commonly, to two concentrators. This arrangement provides a more stable trunk ring of concentrators. If one concentrator fails, the DAS enables the standby link to another concentrator to become the active link. See Figure 14.

If the station is a dual path or dual path/dual MAC station, you can configure the dual-homed station in one of two ways:

• With both links active
• With one link active and one connection withheld as a backup, only becoming active when one link fails

Figure 14 Dual Homing Configuration

Installing an Optical Bypass Unit

You can insert an Optical Bypass Unit (OBU) into the FDDI ring as if it were a node and then plug your device into it. To use an OBU, your device needs an optical bypass interface. This interface lets the bypass know whether your device is still on the ring. See Figure 15.

If your device is removed or if it fails, the bypass unit diverts the optical path away from your device, keeping the ring whole. You can use a bypass on devices that are prone to failure or are likely to be removed often, such as diagnostic equipment.

Figure 15 Optical Bypass Unit Configuration

FDDI Connectivity Reference

This section explains terms that are relevant to FDDI connectivity and provides additional conceptual and problem analysis detail.

Peer Wrap Condition

A Peer Wrap (wrapped ring) condition occurs when a dual-attachment station detects a fault (often a lost connection) and reconfigures the network by wrapping the dual trunk rings to form a single ring. Normally, the two stations that are adjacent to the fault wrap to maintain full connectivity. However, if a second fault occurs before the first is repaired, the network partitions itself into two or more rings and stations lose connectivity.

When a station reports a Peer Wrap condition, locate and repair the problem that caused the station to wrap the rings. Potential causes include:

• Faulty FDDI port hardware
• Faulty cables or connectors
• Unplugged connectors
• Powered-down stations

You can expect to find the cause of the problem between the two stations that report the Peer Wrap condition.

Twisted Ring Condition

A Twisted Ring condition occurs when certain undesirable connection types exist. See Table 7 for more information. Although similar to the Undesired Connection Attempt, the Twisted Ring condition provides specific Station Management (SMT) and port information for diagnosis.

Undesired Connection Attempt Event

An Undesired Connection Attempt event occurs when a port tries to connect to another port of a type that may result in an undesirable network topology. Whether the connection attempt is successful depends on the current setting of the station's connection policies.

Table 7 lists connections that the FDDI standard defines as undesirable. The managed devices may or may not permit these connections, depending on their FDDI station configurations.

Table 7 Undesirable Connection Types

Connection Type    Reason That the Connection Is Undesirable
A-A                Twisted primary and secondary rings
A-S                A wrapped ring
B-B                Twisted primary and secondary rings
B-S                A wrapped ring
S-A                A wrapped ring
S-B                A wrapped ring
M-M                A tree of rings topology (illegal connection)

Table 8 lists FDDI connections that create valid topologies.

Table 8 Valid Connection Types

Connection Type    Reason That the Connection Is Valid
A-B                A normal trunk peer connection
A-M                A tree connection with possible redundancy. In a single MAC node, Port B has precedence (by default) for connecting to a Port M
B-A                A normal trunk ring peer connection
B-M                A tree connection with possible redundancy. In a single MAC node, Port B has precedence (by default) for connecting to a Port M
S-S                A single ring of two slave stations
S-M                A normal tree connection
M-A                A tree connection that provides possible redundancy
M-B                A tree connection that provides possible redundancy
M-S                A normal tree connection
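Tables 7 and 8 together cover all sixteen combinations of the A, B, S, and M port types, so they can be treated as a lookup. The Python sketch below encodes both tables. The dictionary and function names are illustrative assumptions, and the sketch does not model whether a station's connection policy would actually accept the connection.

```python
# Port-type pairs from Table 7 (undesirable) and Table 8 (valid).
UNDESIRABLE = {
    ("A", "A"): "Twisted primary and secondary rings",
    ("B", "B"): "Twisted primary and secondary rings",
    ("A", "S"): "A wrapped ring",
    ("B", "S"): "A wrapped ring",
    ("S", "A"): "A wrapped ring",
    ("S", "B"): "A wrapped ring",
    ("M", "M"): "A tree of rings topology (illegal connection)",
}

VALID = {
    ("A", "B"): "A normal trunk peer connection",
    ("B", "A"): "A normal trunk ring peer connection",
    ("A", "M"): "A tree connection with possible redundancy",
    ("B", "M"): "A tree connection with possible redundancy",
    ("S", "S"): "A single ring of two slave stations",
    ("S", "M"): "A normal tree connection",
    ("M", "A"): "A tree connection that provides possible redundancy",
    ("M", "B"): "A tree connection that provides possible redundancy",
    ("M", "S"): "A normal tree connection",
}


def check_connection(local_port, remote_port):
    """Return ("valid" or "undesirable", reason) for a port-type pair."""
    pair = (local_port.upper(), remote_port.upper())
    if pair in UNDESIRABLE:
        return ("undesirable", UNDESIRABLE[pair])
    if pair in VALID:
        return ("valid", VALID[pair])
    raise ValueError(f"unknown port types: {pair}")


print(check_connection("A", "A"))
```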

Disabling the Offending Interface

Because broadcast storms can ultimately cause your whole network to become unavailable, take action immediately to disable the offending interface. You can enable the interface again after you have corrected the problem.

Address Tracker

Use Address Tracker to locate the interface that is causing the broadcast storm. Use Device View to disable the port.

Follow these steps:

1. In the Find Address window, enter the address of the interface that seems to be receiving the broadcast traffic.

You can copy the MAC or IP address from the Status Watch report and paste it into Address Tracker's Enter the Address You Want to Find field.

2. Click Find Now.

Search displays the device name.

3. Use Transcend Central to launch Device View and disable the port.

Disabling the port stops the broadcast storm before it interferes with all vital network traffic. You can re-enable this interface later using Device View or the device's console.

Correcting Spanning Tree Misconfigurations

Spanning Tree does not cause broadcast storms, but a loop in your Spanning Tree topology can create data that looks like a storm. A loop can occur in your topology if:

• Someone disables Spanning Tree on a port
• You set up your Spanning Tree configuration incorrectly

Device View

Use Device View to disable any Spanning Tree port that has a repeater attached to it and to correct Spanning Tree misconfigurations.

To correct Spanning Tree misconfigurations, use Device View to disable Spanning Tree Protocol (STP) for a port on a SuperStack® II Switch 1000, Switch 3000, Switch 3000 10/100, Switch 9000SX, Desktop Switch, LinkBuilder® FMS II Bridge/Management Module, or CoreBuilder 6000.

To disable the STP port state for a port on a SuperStack II switch:

1. Select a port and click the right mouse button.

2. From the shortcut menu, select Configure.

3. In the Port section, click the STP tab.

4. From the STP Port State list box, select Disabled.

5. Click Apply.

To disable the STP port state for a port on a LinkBuilder FMS II Bridge/Management Module:

1. Double-click the module.

2. From the shortcut menu, select Configure Bridge.

3. In the Port section, click the STP tab.

Broadcast Storms Reference

This section explains terms that are relevant to broadcast storms and provides additional conceptual and problem analysis detail.

Broadcast Packets

Broadcast packets, which are a normal part of network operation, are transmitted by a device to a broadcast address. For example, IP networks use broadcasts to resolve network addresses using Address Resolution Protocol (ARP). IPX networks use a large number of broadcast packets to operate most effectively.

Problems arise when broadcast packets endlessly propagate throughout the network, which increases the traffic volume on your network and the CPU time that each host spends processing and discarding unwanted broadcast packets.

Multicast Packets

Multicast packets, which are a normal part of network operation, are transmitted by a device to a multicast group address. Hosts that want to receive the packets indicate that they want to be members of the multicast group, and then multicast packets are distributed to that group. For example, multicast packets support the Spanning Tree Protocol. Multicast applications and underlying multicast protocols control multimedia traffic and shield hosts from processing unnecessary broadcast traffic. However, multicast traffic can also cause storms that saturate your network.
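When reading a packet capture, broadcast and multicast frames can be told apart by their destination MAC address. The following Python sketch assumes addresses written in the canonical Ethernet-style form, where the broadcast address is all ones and the group (multicast) bit is the low-order bit of the first octet. The function name is an illustrative assumption.

```python
def classify_mac(mac):
    """Classify a destination MAC address as broadcast, multicast, or unicast.

    mac is a string such as "01:80:c2:00:00:00" or "01-80-C2-00-00-00".
    """
    parts = mac.replace("-", ":").split(":")
    if len(parts) != 6:
        raise ValueError(f"not a 6-octet MAC address: {mac!r}")
    octets = bytes(int(part, 16) for part in parts)

    if octets == b"\xff" * 6:
        return "broadcast"
    if octets[0] & 0x01:          # group bit set
        return "multicast"
    return "unicast"


# 01:80:c2:00:00:00 is the group address used by Spanning Tree BPDUs.
print(classify_mac("01:80:c2:00:00:00"))
```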

Duplicate Addresses

Use these sections to identify and correct problems caused by duplicate MAC and IP addresses:

• Duplicate Addresses Overview
• Finding Duplicate MAC Addresses
• Finding Duplicate IP Addresses

See Duplicate Addresses Reference for additional conceptual and problem analysis detail.

Networks sometimes generate duplicate MAC and IP addresses. Because duplicate addresses can cause problems with packet delivery, resolve them as soon as possible.

Understanding the Problem

Duplicate MAC addresses are caused by data link layer problems with Fiber Distributed Data Interface (FDDI) media and the passing of tokens on the FDDI ring. Duplicate IP addresses are caused by network layer problems. See these sections for more information about causes of duplicate addresses:

• Duplicate MAC Addresses
• Duplicate IP Addresses

Identifying the Problem

Identify duplicate MAC and IP addresses by following the instructions in these sections:

• Finding Duplicate MAC Addresses
• Finding Duplicate IP Addresses

Solving the Problem

Identify the cause of the duplicate address (such as user error or a hardware problem) and fix the problem if possible.

Finding Duplicate MAC Addresses

To find out if duplicate MAC addresses are occurring, monitor your network using Status Watch.

Status Watch

The Status Watch FDDI Status tool identifies duplicate FDDI MAC addresses, and Status Watch reports when two or more MACs on the same ring have the same MAC address (a Duplicate Address condition).

Follow these steps:

1. In the Status Watch Summary View window, determine if any FDDI Status conditions are reported. If there are, double-click the table cell value to display the Device List window.

Another approach is to examine only the devices that you know reside on your FDDI ring. In the Status Watch main window, red device icons indicate that a threshold has been exceeded.

2. Select a device.

• If you selected the device from the Device List window, the real-time report for that device appears in the Status Watch main window.

• If you selected the device from the main window, also select the FDDI Status tool to view the real-time report.

3. Determine if a Duplicate Address condition caused the FDDI Status tool to trigger a Critical or Warning status for that device.

In Status Watch, you can specify the status severity level to apply to a Duplicate Address condition.

Finding Duplicate IP Addresses

To find out if duplicate IP addresses are occurring, monitor your network using these applications:

• Address Tracker - To find duplicate IP addresses on 3Com devices and their attached networks

• LANsentry® Manager - To find duplicate IP addresses that are collected by probes gathering RMON2 SmartAgent® data from the Enterprise Communications Analysis Module (ECAM) downloaded on your network devices

Address Tracker

Use Address Tracker to determine when and where duplicate IP addresses occur.

Follow these steps:

1. From the Find Address menu, select Find Duplicate IP Addresses.

2. Click Find Now to start your search.

LANsentry Manager

Use the Duplicates table in LANsentry Manager to compile a list of all stations with duplicate IP addresses. This table is available only on probes that have downloaded RMON2 (ECAM) SmartAgent software.

Follow these steps:

1. From the LANsentry Manager Address Map menu, select Duplicates. Address Map data is displayed as a table.

2. To export the contents of the table, click Export to launch the Data Export dialog box.

Duplicate Addresses Reference

This section explains terms that are relevant to duplicate addresses and provides additional conceptual and problem analysis detail.

Duplicate MAC Addresses

Each device on your network has a unique MAC address. This address identifies a single device on the network, allowing packets to be delivered to correct destinations.

Packets are delivered to their destinations by means of MAC-address-to-IP-address translation that the Address Resolution Protocol (ARP) provides. Therefore, if MAC addresses are duplicated on the network, ARP caches of routing devices contain erroneous destinations. In FDDI, devices monitor network traffic, searching for their own MAC address in each packet to determine whether to decode the packet. If MAC addresses are not unique, two stations cannot be distinguished from each other.

Duplicate MAC addresses can occur for the following reasons:

• Someone has manually configured a MAC address for a device instead of using the address that the vendor supplied or allowing it to be assigned dynamically, and this address is also assigned to a different device.

• In rare circumstances, loops in a bridged network can cause a MAC hardware problem or an address learning problem that creates a duplicate MAC address entry in the bridging address table.

• On DECnet Phase 4 networks, MAC addresses are set from the DECnet address. A duplicate NET address can cause a duplicate MAC address.

Duplicate IP Addresses

Because IP addresses are critical for transmission of packets on TCP/IP networks, resolve duplicate IP addresses immediately.

Duplicate IP addresses can occur when someone has configured an IP address that is identical to an IP address that is assigned to a different device. Address assignments, although possible for you to configure manually, are usually made using one of these protocols:

• Dynamic Host Configuration Protocol (DHCP) - Allows your network to dynamically assign IP addresses to nodes. With this protocol, a DHCP server temporarily assigns an IP address to a node, or you can statically configure addresses as needed.

• BOOTstrap protocol (BootP) - Allows you to statically assign IP addresses to nodes. This protocol is more efficient than RARP.

• Reverse ARP (RARP) - Allows you to statically assign IP addresses to nodes. However, because this protocol relies on the MAC address to identify the node, you cannot use it on networks that dynamically assign hardware addresses.
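Outside the management applications, a duplicate IP address usually shows up as one IP address being claimed by more than one MAC address. The Python sketch below groups observed (IP, MAC) pairs, for example entries exported from ARP caches, and reports the conflicts. The function name and the input format are illustrative assumptions, not the method that Address Tracker or LANsentry Manager uses.

```python
from collections import defaultdict


def find_duplicate_ips(observations):
    """Return {ip: [macs]} for every IP address seen with more than one MAC.

    observations is an iterable of (ip, mac) string pairs.
    """
    macs_by_ip = defaultdict(set)
    for ip, mac in observations:
        macs_by_ip[ip].add(mac.lower())   # normalize case before comparing
    return {
        ip: sorted(macs)
        for ip, macs in macs_by_ip.items()
        if len(macs) > 1
    }


sample = [
    ("10.0.0.5", "00:a0:24:00:00:01"),
    ("10.0.0.5", "00:A0:24:00:00:02"),   # same IP, different MAC
    ("10.0.0.6", "00:a0:24:00:00:03"),
]
print(find_duplicate_ips(sample))
```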

Ethernet Packet Loss

Use these sections to identify and correct Ethernet packet loss:

• Ethernet Packet Loss Overview
• Searching for Packet Loss

See Ethernet Packet Loss Reference for additional conceptual and problem analysis detail.

If your Ethernet network shows signs of congestion, it may be experiencing packet loss. When your network is congested, utilization is usually high, packets are discarded because buffers are full, and collision rates are up. Problems related to Collisions are often at the heart of packet loss.

Understanding the Problem

Collisions are normal in Ethernet networks. In many cases, Collision rates of 50 percent do not cause a large decrease in throughput. The Collision rate helps mark the upper limit on your network (the maximum percentage of collisions that your network can bear), which is usually around 70 percent. If Collisions increase above this upper limit, your network can become unreliable.

When the Collision rates increase, so do Excessive Collisions, which causes a delay in transmitting data. An increase in Collisions also indicates that network utilization and network errors, such as FCS Errors, are probably increasing.

The real packet problems to watch for, however, are undetected collisions that show up as Late Collisions.
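To compare a measured Collision rate against the upper limit of around 70 percent, the rate first has to be computed from counters. The Python sketch below assumes the rate is collisions divided by frames over the same sampling interval. That definition, like the function names, is an assumption, because this section does not define how the rate is calculated.

```python
def collision_rate_percent(collisions, frames):
    """Collision rate as a percentage of frames (assumed definition)."""
    if frames <= 0:
        raise ValueError("frames must be positive")
    return 100.0 * collisions / frames


def exceeds_upper_limit(rate_percent, upper_limit=70.0):
    """True if the rate is above the network's upper limit (about 70 percent)."""
    return rate_percent > upper_limit


rate = collision_rate_percent(collisions=350, frames=1000)
print(rate, exceeds_upper_limit(rate))
```

The upper limit is passed as a parameter because the text describes it as a value that varies by network.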

Identifying the Problem

To confirm that your network's problem is related to packet loss, verify that frames are being dropped on your network by examining this packet loss data:

• Alignment Errors
• Collisions
• Excessive Collisions
• FCS Errors
• CRC Errors
• Late Collisions
• Receive Discards
• Too Long Errors
• Too Short Errors
• Transmit Discards

The process of identifying the problem is discussed in Searching for Packet Loss.

Solving the Problem

If you notice that packet loss data is consistently high, then your network is too congested. In this case, segment your network with the appropriate network device (such as a switch or router). If Collision data shows increases but your network's utilization is the same, then your network may have a physical problem, such as cabling that is too long. Other problems that packet loss data can indicate include:

• Faulty connectors or improper cabling
• Excessive numbers of repeaters between network devices
• Defective Ethernet transceivers or controllers

Possible solutions to these problems are explained in the procedures in Searching for Packet Loss.

Searching for Packet Loss

When you look for packet loss, use the following applications:

• Status Watch - For Ethernet and MIB-II data collection using SNMP polling
• LANsentry Manager Network Statistics Graph - For RMON data collection using an RMON probe
• Device View - On a per-device basis, you can evaluate statistics for any port on the device

Status Watch

Status Watch monitors:

• Alignment Errors
• Excessive Collisions
• FCS Errors
• Receive Discards
• Transmit Discards

Follow these steps:

1. Determine if the thresholds for the Alignment Errors tool and FCS Errors tool are being exceeded.

Table 16 identifies the problems that this data can indicate and your possible actions. For information about problems related to a nonstandard Ethernet implementation, see Nonstandard Ethernet Problems.

Table 16 Alignment Errors, FCS Errors, and CRC Errors Data

Possible Problem: Faulty cabling
Possible Action: Examine the cable and cable connections for breaks or damage.

Possible Problem: Network noise
Possible Action: Look for improper cabling, faulty cable, faulty network equipment, or cables that are too close to equipment that emits electromagnetic interference (lamps, for example).

Possible Problem: Faulty transceiver
Possible Action: Use an analyzer to identify the problematic transceiver. If necessary, replace the transceiver, network adapter, or station.

Possible Problem: Fault at the transmitting end station
Possible Action:
1. Locate the source of the errors by looking at the module and port statistics.
2. Verify the correct operation of the transceiver or adapter card of the device that is connected to the problem port.
3. If the card appears to be operating correctly, examine the cable and cable connections for breaks or damage.

Possible Problem: Station powering up or down
Possible Action: None required. Early implementations of Ethernet transceivers generate a significant amount of in-band noise when powering up; they frequently cause Alignment Errors and FCS Errors in an otherwise stable network. When powering up, some software drivers for Ethernet controllers also initiate Time Domain Reflectometry (TDR) tests to test the Ethernet media. Network monitors report TDR tests as Alignment Errors and FCS Errors.

Possible Problem: Faulty adapter
Possible Action: Replace the adapter.

2. Determine if the Excessive Collisions tool threshold is being exceeded. Table 17 identifies the problems that this data can indicate and your possible actions.

Table 17 Collisions and Excessive Collisions Data

Possible Problem: Busy network
Possible Action: Use a bridge, router, or switch to reconfigure your network into segments with fewer stations.

Possible Problem: Faulty device (adapter, switch, hub, and the like) that does not listen before broadcasting. This problem increases the incidence of all types of collisions.
Possible Action: Isolate each adapter to see if the problem stops.

Possible Problem: Network loop
Possible Action: Ensure that no redundant connections to the same station have both connections active simultaneously.

3. Determine if the thresholds for the Receive Discards and Transmit Discards tools are being exceeded.

If these errors are high in conjunction with the data that you examined in steps 1 and 2, then your network is overloaded. Segment your network.

LANsentry Manager Network Statistics Graph

Use the LANsentry Manager Network Statistics graph to view data for:

• Collisions
• Late Collisions
• Bandwidth Utilization
• CRC Errors
• Too Long Errors
• Too Short Errors

Follow these steps:

1. Display a Network Statistics graph for the local Ethernet segment on which users have reported poor performance.

This graph shows the most recent trend in Collision rates. If you have set up a History sample, you can also look at the historical trend. If a number of segments are connected by repeaters, examine the graph for each Ethernet segment.

2. Analyze Utilization and Collision rates to determine whether collisions are caused by an overloaded segment or a faulty component.

• If Utilization rates are high - The collisions are probably caused by an overloaded segment. If you have added nodes or new applications to your network, consider reconfiguring the cabling system using bridges and routers to filter out remote collisions and to keep local traffic on one segment. This action should level the network load.

• If Utilization rates are stable and appear normal - The collisions are probably caused by faulty components. In this case, do the following:

  • If the network consists of repeaters - Compare the Network Statistics graphs for each segment connected to the repeater. Because repeaters repeat traffic across all connected segments (which makes many segments seem like one network), you should see similar levels of traffic on all segments. One segment that shows dissimilar levels of traffic and collisions may indicate faulty hardware. In this case, monitor several collisions to track the source station that is transmitting too soon after collisions, and repair the station. Packets that are transmitted too soon after collisions are unlikely to be valid. See Table 17 for more information about Collisions.

  • On other networks - Determine the segment cable length.

3. Examine the CRC Errors and Late Collisions, which often indicate cabling or component problems.

Table 16 identifies the problems that CRC Errors can indicate and your possible actions. Table 18 identifies the problems that Late Collisions data can indicate and your possible actions.

Table 18 Late Collisions Data

Possible Problem: Cabling problems
• Segment too long
• Failing cable
• Segment not grounded properly (noise)
• Improper termination
• Taps too close (10BASE-5 and 10BASE-2 only)
• Noisy cable
Possible Action: Correct the cabling problem by doing one or more of the following:
• Reduce the segment length
• Replace the cable
• Ground the cable
• Terminate the cable correctly
• Check the taps
• Check for cables too close to equipment that emits electromagnetic interference

Possible Problem: Component problems
• Deaf or partially deaf node
• Failing repeater, transceiver, or controller cards
Possible Action: Correct the component problem by doing one of the following:
• Trace the failing component and replace it
• Replace the NIC or the transceiver

4. Trace Too Short Errors and Too Long Errors to the sender.

These errors often indicate faulty routers or LAN drivers and transceiver problems. Table 19 identifies the problems that this data can indicate and your possible actions.

Possible Problem: A transceiver on your network is adding bits to the packets that are transmitted by the attached station.
Possible Action:
1. Use a network analyzer to identify the problematic transceiver.
2. If necessary, replace the transceiver, network adapter, or station.

Possible Problem: The jabber protection mechanism on a transceiver has failed; it can no longer protect the network from the jabbering produced by the attached station.
Possible Action: Replace the network card.

Possible Problem: Excessive noise on the cable.
Note: Some 10/100 Mbps cards that autodetect the network speed may connect to the network at the wrong speed, causing excessive noise.
Possible Action: Check for improper cabling, faulty cable, faulty network equipment, or cables too close to noisy electronic equipment (lamps, for example). If your network card autodetects the network speed and you have ruled out other problems, manually configure the network speed.

Possible Problem: Faulty routers (two different network types are connected, and the router is not enforcing proper frame size restrictions).
Possible Action: Notify the manufacturer.

Possible Problem: Faulty LAN driver
Possible Action: Replace the driver.

Possible Problem: A normal condition on a LinkSwitch® 1000, LinkSwitch® 3000, or CoreBuilder 5000 FastModule. If you use maximum-sized 1518-byte Ethernet frames, the device's VLT-enabled ports add a frame tag of 4 bytes, resulting in a misleading Too Long Frame error.
Possible Action: These frames are passed successfully but create the Too Long Frame error message. If you want to eliminate the error message, reduce your Ethernet packet frames by 4 bytes.
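The misleading Too Long Frame error comes down to simple arithmetic against Ethernet's frame size limits. The Python sketch below uses the standard minimum of 64 bytes and maximum of 1518 bytes. The 64-byte minimum and all names are assumptions added for illustration, and the sketch does not reproduce how any particular device counts these errors.

```python
MIN_FRAME_BYTES = 64      # standard Ethernet minimum (assumed; not stated above)
MAX_FRAME_BYTES = 1518    # maximum-sized Ethernet frame
VLT_TAG_BYTES = 4         # tag added by VLT-enabled ports


def classify_frame_length(length_bytes):
    """Return "too short", "too long", or "ok" for a frame length in bytes."""
    if length_bytes < MIN_FRAME_BYTES:
        return "too short"
    if length_bytes > MAX_FRAME_BYTES:
        return "too long"
    return "ok"


# A maximum-sized frame plus the 4-byte tag is counted as too long.
print(classify_frame_length(MAX_FRAME_BYTES + VLT_TAG_BYTES))

# Reducing the frame by 4 bytes before tagging keeps it within the limit.
print(classify_frame_length(MAX_FRAME_BYTES - VLT_TAG_BYTES + VLT_TAG_BYTES))
```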

Device View

Device View allows you to display a variety of port and device-level statistics relevant to Ethernet packet loss. Table 20 describes these statistics and their use in troubleshooting.

Table 20 Activity and Error Statistics in Device View

Statistics Group: Activity
Description: Displays the total network activity and errors on the selected port.
Use in Troubleshooting: This data shows readable packets, broadcast packets, Collisions, total errors, and runts, which cause Too Short Errors. You can interpret this data in the following ways:
• Runts are often caused by Collisions; however, if the values increase at specific times of the day, you may need to change the network topology to manage the traffic more efficiently (for example, with switches or routers).
• Runts can also be caused by a badly terminated coax cable.
• Large numbers of runts not associated with high levels of collisions can indicate a transmission problem (examine the cable).
• Particularly high numbers of Collisions compared to the total number of readable packets can point to a hardware problem (a bad adapter) or to a data loop.
• A high proportion of Broadcast Packets (>10%) on a heavily utilized network (>50% of available bandwidth) can point to an incorrectly configured bridge or router on the network.

Statistics Group: Errors
Description: Displays the number of frames with errors on the selected port.
Use in Troubleshooting: The significance of errors depends on accompanying errors and prevailing network conditions. See the following error data for more information:
• Alignment Errors: Table 16
• FCS Errors: Table 16
• Too Long Errors: Table 19
• Too Short Errors or runts: Table 19
• Late Collisions: Table 18

To display Activity and Errors statistics for a device or port, follow these steps:

1. Select the required port or device.

2. From the shortcut menu, select Activity or Errors.

The statistics available depend on the type of port or device selected. See Table 20 for troubleshooting information.
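The broadcast guideline in the Activity row of Table 20 can be checked from raw counters. The Python sketch below reads the thresholds as more than 10 percent broadcast packets on a network that is using more than 50 percent of available bandwidth. That reading, the function name, and the parameter names are assumptions for illustration.

```python
def broadcast_share_suspicious(broadcast_packets, total_packets,
                               utilization_percent):
    """True if broadcasts exceed 10% of packets while utilization exceeds 50%."""
    if total_packets <= 0:
        return False                       # nothing observed, nothing to flag
    share = 100.0 * broadcast_packets / total_packets
    return share > 10.0 and utilization_percent > 50.0


print(broadcast_share_suspicious(1500, 10000, utilization_percent=62.0))
```

A True result points toward checking bridge and router configuration, as the table suggests. It is a hint, not proof of a misconfiguration.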

Ethernet Packet Loss Reference

This section explains terms that are relevant to Ethernet packet loss and provides additional conceptual and problem analysis detail.

Alignment Errors

An Alignment Error indicates a received frame in which both of the following are true:

• The number of bits received is an uneven byte count (that is, not an integral multiple of 8).

• The frame has a Frame Check Sequence (FCS) error.

Alignment Errors often result from MAC layer packet formation problems, cabling problems that cause corrupted or lost data, and packets that pass through more than two cascaded multiport transceivers. See FCS Errors for more information about interpreting Alignment Errors.

Collisions

Collisions indicate that two devices detect that the network is idle and try to send packets at exactly the same time (within one round-trip delay). Because only one device can transmit at a time, both devices must stop sending and attempt to retransmit. Collisions are detected by the transmitting stations.

The retransmission algorithm helps to ensure that the packets do not retransmit at the same time. However, if the two devices retry at nearly the same time, packets can collide again; the process repeats until either the packets finally pass onto the network without collisions, or 16 consecutive collisions occur and the packets are discarded.
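The retransmission algorithm in standard Ethernet is truncated binary exponential backoff: after each collision, a station waits a random number of slot times drawn from a range that doubles with every attempt, up to a cap. The Python sketch below illustrates the idea. The callback used to simulate collisions and the function names are hypothetical, and real adapters implement this in hardware.

```python
import random

MAX_ATTEMPTS = 16     # frame is discarded after 16 consecutive collisions
BACKOFF_LIMIT = 10    # the random range stops doubling after 10 collisions


def backoff_slots(collision_count, rng=random):
    """Pick a random delay, in slot times, after the given collision count."""
    if not 1 <= collision_count < MAX_ATTEMPTS:
        raise ValueError("collision_count must be between 1 and 15")
    exponent = min(collision_count, BACKOFF_LIMIT)
    return rng.randrange(2 ** exponent)    # 0 .. 2**exponent - 1


def transmit(collides, rng=random):
    """Try to send one frame.

    collides(attempt) is a hypothetical callback that returns True if the
    given attempt (1-based) collides. Returns the attempt number that
    succeeded, or None if the frame was discarded (an excessive collision).
    """
    for attempt in range(1, MAX_ATTEMPTS + 1):
        if not collides(attempt):
            return attempt
        if attempt < MAX_ATTEMPTS:
            backoff_slots(attempt, rng)    # a real station would wait here
    return None


print(transmit(lambda attempt: attempt < 3))   # succeeds on the third attempt
```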

CRC Errors

A Cyclic Redundancy Check (CRC) Error is an RMON statistic that combines FCS Errors and Alignment Errors. These errors indicate that packets were received with:

• A bad FCS and an integral number of octets (FCS Errors)
• A bad FCS and a non-integral number of octets (Alignment Errors)

CRC Errors can cause an end station to freeze. If a large number of CRC Errors are attributed to a single station on the network, replace the station's network interface board. Typically, a CRC Error rate of more than 1 percent of network traffic is considered excessive.
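The relationship among FCS Errors, Alignment Errors, and the combined CRC Error count can be summarized in a few lines. The Python sketch below classifies one received frame from its bit count and FCS result, then computes a CRC Error rate to compare with the 1 percent guideline. The function names are illustrative, and treating "network traffic" as the total frame count is an assumption.

```python
def classify_frame_error(bit_count, fcs_ok):
    """Return None for a good frame, else "fcs" or "alignment"."""
    if fcs_ok:
        return None
    # A bad FCS with a whole number of octets is an FCS Error;
    # a bad FCS with leftover bits is an Alignment Error.
    return "fcs" if bit_count % 8 == 0 else "alignment"


def crc_error_rate_percent(fcs_errors, alignment_errors, total_frames):
    """CRC Errors (FCS plus Alignment) as a percentage of total frames."""
    if total_frames <= 0:
        raise ValueError("total_frames must be positive")
    return 100.0 * (fcs_errors + alignment_errors) / total_frames


rate = crc_error_rate_percent(fcs_errors=3, alignment_errors=2, total_frames=1000)
print(rate, rate > 1.0)
```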

Excessive Collisions

Excessive Collisions indicate that 16 consecutive collisions have occurred, usually a sign that the network is becoming congested. For each excessive collision count (that is, after 16 consecutive collisions), a packet is dropped. If you know the normal rate of excessive collisions, then you can determine when the rate of packet loss is affecting your network's performance. See Knowing Your Network's Configuration for more information.

FCS Errors

Frame Check Sequence (FCS) Errors, a type of CRC, indicate that frames received by an interface are an integral number of octets long but do not pass the FCS check. The FCS is a mathematical way to ensure that all the frame's bits are correct without having the system examine each bit and compare it to the original. Packets with Alignment Errors also generate FCS Errors.

Both Alignment Errors and FCS Errors can be caused by equipment powering up or down or by interference (noise) on unshielded twisted-pair (10BASE-T) segments. In a network that complies with the Ethernet standard, FCS or Alignment Errors indicate bit errors during a transmission or reception. A very low rate is acceptable. Although Ethernet allows a 1 in 10^8 bit error rate, typical Ethernet performance is 1 in 10^12 or better.
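To put those bit error rates in perspective, the short Python sketch below estimates the chance that a single frame contains at least one bit error. It assumes independent bit errors and a maximum-size 1518-octet frame; both are simplifying assumptions, and the function name is hypothetical.

```python
import math


def frame_error_probability(bit_error_rate, frame_octets=1518):
    """Probability that a frame of the given size has at least one bit error,
    assuming each bit fails independently with probability bit_error_rate."""
    bits = frame_octets * 8
    # 1 - (1 - p)**bits, computed in a numerically stable way for tiny p.
    return -math.expm1(bits * math.log1p(-bit_error_rate))


worst_allowed = frame_error_probability(1e-8)    # the 1 in 10^8 limit
typical = frame_error_probability(1e-12)         # the 1 in 10^12 typical case
```

Under these assumptions the allowed limit corresponds to roughly one bad full-size frame in several thousand, while typical performance is about four orders of magnitude better.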

Late Collisions

Late Collisions indicate that two devices have transmitted at the same time, but cabling errors (most commonly, excessive network segment length or repeaters between devices) prevent either transmitting device from detecting a collision. Neither device detects a collision because the time to propagate the signal from one end of the network to the other is longer than the time to put the entire packet on the network. As a result, neither of the devices that cause the late collision senses the other's transmission until the entire packet is on the network.

Although late collisions occur for small packets, the transmitter cannot detect them. As a result, a network suffering measurable Late Collisions for large packets is losing small packets as well.
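The timing argument above can be checked with simple arithmetic. The Python sketch below is an illustration only: it assumes a 10 Mbps segment and compares a packet's transmit time with the round-trip delay of the cable plant, which you would have to measure or estimate yourself. The function names are hypothetical.

```python
def transmit_time_us(packet_octets, bit_rate_mbps=10.0):
    """Time in microseconds needed to put the whole packet on the wire."""
    return packet_octets * 8 / bit_rate_mbps


def collision_detectable(packet_octets, round_trip_delay_us, bit_rate_mbps=10.0):
    """True when the sender is still transmitting at the moment the
    collision signal can return to it."""
    return transmit_time_us(packet_octets, bit_rate_mbps) >= round_trip_delay_us


# Hypothetical overlong segment with a 60 microsecond round-trip delay.
small_seen = collision_detectable(64, 60.0)     # minimum-size packet
large_seen = collision_detectable(1518, 60.0)   # maximum-size packet
```

With these assumed numbers the large packet is still on the wire when the collision arrives, so it is counted as a Late Collision, while the small packet has already left the transmitter and is lost without being counted.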

Nonstandard Ethernet Problems

Table 21 lists the symptoms that typically occur if a system violates the Ethernet standard.

Table 21 Symptoms of Common Ethernet Network Problems

Symptoms: FCS Errors and Alignment Errors increase significantly.
Problem: Network cabling is too long.
Notes: If you use a promiscuous network monitor, the number of Late Collisions reported by stations should correlate with the FCS and Alignment Errors reported by the monitor.

Symptoms: FCS and Alignment Errors increase proportionally with interference (sometimes referred to as noise hits).
Problem: Network segment is noisy.
Notes: Typically observed on a 10BASE-T network segment in a noisy environment. If you use multiple promiscuous monitors, the FCS and Alignment Errors among the monitors will not correlate. If the monitor can track runts, also called Too Short Errors, the number of runt packets should be significantly higher than normal.

Symptoms: FCS and Alignment Errors are much higher than normal.
Problem: Networks do not conform to the access scheme of Carrier Sense Multiple Access with Collision Detect (CSMA/CD).
Notes: Occurs when some implementations of Ethernet in the segment are not entirely compatible with IEEE 802.3 repeaters. Collision fragments linger on the network long enough to collide with retry packets at the minimum interpacket gap (IPG). The IPG is smaller on one side of the repeated network, causing a lost packet. Ethernet controllers cannot receive packets that are separated by 4.7 µs or less. Some controllers cannot sustain receptions of packets separated by as much as 9.6 µs. If runt packets are received one after another and are followed by a collision fragment, Ethernet controllers that cannot sustain reception will lose packets.

Receive Discards

Receive Discards indicate that received packets could not be delivered to a high-layer protocol because of congestion or packet errors.

Too Long Errors

A Too Long Error indicates that a packet is longer than 1518 octets (including FCS octets) but otherwise well formed. Too Long Errors are often caused by a bad transceiver, a malfunction of the jabber protection mechanism on a transceiver, or excessive noise on the cable.

Too Short Errors

A Too Short Error, also called a runt, indicates that a packet is fewer than 64 octets long (including FCS octets) but otherwise well formed.
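The two length limits above translate directly into a small check. This Python sketch is only an illustration of the 64-octet and 1518-octet boundaries; the function name is hypothetical, and it assumes the length already includes the FCS octets and that the packet is otherwise well formed.

```python
MIN_OCTETS = 64      # shorter packets are runts (Too Short Errors)
MAX_OCTETS = 1518    # longer packets are Too Long Errors


def classify_length(octets):
    """Classify a well-formed packet by its length, FCS octets included."""
    if octets < MIN_OCTETS:
        return "Too Short Error"
    if octets > MAX_OCTETS:
        return "Too Long Error"
    return "OK"
```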

Transmit Discards

Transmit Discards indicate that packets were not transmitted because of network congestion.

FDDI Ring Errors

Use these sections to identify and correct FDDI ring errors:

• FDDI Ring Errors Overview
• Identifying Ring Errors

Fiber Distributed Data Interface (FDDI) often corrects its own problems. However, because FDDI cannot correct all errors (especially those related to hardware problems), you should monitor FDDI errors.

Understanding the Problem

FDDI ring errors that you should monitor include:

• Elasticity Buffer Error Condition
• Frame Error Condition
• Frames Not Copied Condition
• Link Error Condition

Identifying the Problem

First, determine the type of FDDI ring errors and where they are occurring. Similar to the way you identify other FDDI problems, identify the upstream and downstream neighbors of the devices that you are monitoring.

Several types of network errors can cause FDDI performance problems. For example, problems with cables or physical connections may result in a link or frame error. Elasticity buffer (EB) errors can also lead to link and frame errors.

FDDI deals with port-related errors as follows:

• The variable PORTlerAlarm is the link error rate (LER) value at which a link connection generates an alarm. When the LER is greater than the alarm setting, Station Management (SMT) sends a Status Report Frame (SRF) to notify you that there is a problem with a port.

  The PORTlerAlarm threshold is set lower than the PORTlerCutoff threshold so that you are notified of a problem before the port is actually removed from the ring.

• When link errors reach the threshold defined by the variable PORTLERCutoff, SMT breaks the connection, disabling the PHY that detected the problem. A Link Error Condition is also generated.
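The two-threshold behavior described above can be sketched as follows. This is a simplified Python illustration, not SMT itself: it treats the link error rate as a plain fraction of errored bits, and the default threshold values are hypothetical placeholders rather than values taken from this guide.

```python
def port_ler_action(link_error_rate, ler_alarm=1e-8, ler_cutoff=1e-7):
    """Return the action for a port given its link error rate.

    ler_alarm and ler_cutoff are assumed example thresholds; the alarm
    threshold must be lower than the cutoff threshold.
    """
    if ler_alarm >= ler_cutoff:
        raise ValueError("alarm threshold must be lower than cutoff threshold")
    if link_error_rate >= ler_cutoff:
        return "cutoff"   # connection is broken; Link Error Condition
    if link_error_rate > ler_alarm:
        return "alarm"    # a Status Report Frame warns of the problem
    return "ok"
```

Keeping the alarm threshold below the cutoff threshold leaves a band in which the port is still on the ring but has already been flagged.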

FDDI deals with MAC-related errors as follows:

• When MAC frame errors reach a certain threshold, a Frame Error Condition is generated. Because the actual error can be further upstream than the immediate connection, the connection remains intact.

• For a large network, the worst case MACFrameErrorRatio is less than 0.1 percent. However, during network configuration, frame error ratios can reach 50 percent for short periods. When you detect a sustained frame error ratio of more than 0.1 percent, a problem exists between the station that is reporting the condition and the nearest upstream MAC.
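Because short spikes are expected during reconfiguration, a check for a sustained ratio is more useful than a single reading. The Python sketch below is one possible way to express that, assuming you poll errored and total frame counts at regular intervals; the function name and the choice of three consecutive samples are hypothetical.

```python
def sustained_frame_error(samples, threshold=0.001, min_consecutive=3):
    """Return True when the frame error ratio stays above the threshold
    (0.1 percent by default) for min_consecutive polling intervals in a row.

    samples is an iterable of (errored_frames, total_frames) pairs.
    """
    run = 0
    for errored, total in samples:
        ratio = errored / total if total else 0.0
        run = run + 1 if ratio > threshold else 0
        if run >= min_consecutive:
            return True
    return False
```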

Solving the Problem

To solve problems related to FDDI errors, fix the hardware, cabling, or congestion problem.

Identifying Ring Errors


Use Status Watch to monitor your FDDI devices for Warning or Critical alerts.

Status Watch

Use Status Watch to identify FDDI ring errors.

Follow these steps:

1. Monitor the FDDI Status tool for the currently selected device.

2. Determine whether Status Watch is reporting Elasticity Buffer Errors or a high percentage of Frame Errors, Frames Not Copied, or Link Error Rates for the currently selected device.

FDDI Ring Errors Reference

This section provides additional conceptual and problem analysis detail.

Elasticity Buffer Error Condition

The Elasticity Buffer Error condition occurs when a port's elasticity buffer overflows or underflows. This condition usually indicates that a port's hardware is not operating within the tolerances that the FDDI standard specifies. Look for the problem in the hardware of either the port that is reporting the condition or of the immediately adjacent port.

Frame Error Condition

The Frame Error condition occurs when the percentage of frames that contain errors exceeds a preset threshold. In the situation when a device is an uplink to FDDI (that is, a device is transmitting onto FDDI), this type of condition indicates that the ring is saturated. The ring is out of buffer space, and packets are being dropped from the device's backbone port.

The problem indicated by the frame errors is usually located between the MAC that reports the condition and its upstream neighbor. Because many physical connections can lie along this path, the MACFrameErrorRatio variable identifies only the two MACs between which the problem is occurring.

Frames Not Copied Condition

The Frames Not Copied condition occurs when the percentage of frames that are dropped because of insufficient buffer space exceeds a preset threshold. This condition indicates that the station is congested and is unable to process frames as quickly as they arrive. To help eliminate congestion:

• Add more capacity to the station.
• Reconfigure your network so that end stations that communicate heavily with one another are on the same bridge or switch.
• Filter out certain traffic.

Link Error Condition

The Link Error condition occurs when a port detects link errors at a rate that exceeds a preset threshold. When the Link Error threshold is exceeded, the station removes itself from the ring and tries to reinsert itself on the ring. This action creates a MAC Neighbor Change Event (which also occurs if a ring wraps).

Link errors may indicate an FDDI PHY hardware problem (such as a faulty transmitter) or a faulty cable or connector. Look for the problem in the portion of the network between the port that is reporting the condition and the first upstream transmitter.

MAC Neighbor Change Event

The MAC Neighbor Change event occurs when a MAC's upstream or downstream neighbor changes.

This event indicates either:

• A network reconfiguration
• Another station that is leaving or joining the ring

Network File Server Timeouts

Use these sections to identify and correct timeouts on network file servers:

• Network File Server Timeout Overview
• Looking for Obvious Errors
• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

See Network File Server Timeouts Reference for additional conceptual and problem analysis detail.

A network file server can time out if your network gets congested or if your server is having problems. Users might have problems downloading data from or to the server or copying files from or to the server. To help you to understand the troubleshooting process for this type of problem, an EXAMPLE throughout this section follows the symptoms, analysis, and resolution of a typical file server timeout problem.


Understanding the Problem

When users log in, their stations make network file server calls either to determine quotas (if this feature has been enabled) or to mount user home directories. The network file server timeout messages, even when spread across multiple nodes, indicate a problem either with the network or with a server.

EXAMPLE: UNIX users notice that it takes a long time - over 30 seconds in some cases - to log in to any machine. Some machines report network file server timeout messages, but the messages have no obvious pattern and are infrequent. You begin to get a sense of the problem.

Identifying the Problem

First, rule out the obvious causes. Ask these questions:

• Can you access the network file server with Telnet?
• Have any alarms been triggered?
• Are there any new errors?

The process of identifying the problem is developed in Looking for Obvious Errors.

Solving the Problem

To determine the cause, reproduce the fault while you monitor the network. After you know the cause, you can fix the problem.

The solutions to the network file server timeout are identified in these sections:

• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

Looking for Obvious Errors

To look for obvious errors, use these applications:

• Ping and Telnet - To check connectivity to the network file server nodes
• LANsentry Manager Alarms View - To search for triggered alarms
• LANsentry Manager Statistics View - To look for errors
• LANsentry Manager History View - To identify trends

Ping and Telnet

Determine whether you can contact network file server nodes using Ping and Telnet. If the response is extremely slow, then a problem may exist with the connections to the nodes. No delay indicates that the connections are normal, implying that the delay is occurring elsewhere. In this case, use LANsentry® Manager tools to determine whether packets are being lost or ignored.
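If you want to script the Telnet-style reachability check, a plain TCP connection attempt with a timer gives a comparable signal. The Python sketch below is only an illustration: the function name is hypothetical, port 23 is assumed because that is the usual Telnet port, and the host name in the usage note is a placeholder for one of your own file server nodes.

```python
import socket
import time


def tcp_connect_time(host, port=23, timeout=5.0):
    """Return the seconds taken to open a TCP connection to host:port,
    or None if the connection fails or times out."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        return None
```

For example, `tcp_connect_time("fileserver1")` returning a value of several seconds suggests a slow path to that node, while `None` suggests the node or service is unreachable.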

LANsentry Manager Alarms View

Using the LANsentry Manager Alarms View, you can determine if any configured alarms have been triggered.

Search the Alarms View to see if any MAC events have been logged.

EXAMPLE: MAC events have not been logged for the network on which the UNIX users are attached.

Even though no alarms have occurred, errors may exist. For example, a lower rate of background errors may exist just below the alarm threshold. Based on maximum and minimum values, RMON errors may miss constant, periodic, or low amounts of errors.

LANsentry Manager Statistics View

Using the LANsentry Manager Statistics View, you can display a multisegment graph of utilization and error statistics.

Follow these steps:

1. Set up a graph that shows utilization and errors on all your major segments.

2. Determine whether any segments are particularly busy or error prone.

EXAMPLE: You notice that one segment of the UNIX network, HUB3, is reporting Too Long Errors and FCS Errors roughly every second sample. While the amount is not higher than normal, it is currently higher than any other segment.

LANsentry Manager History View

Using the LANsentry Manager History View, you can display a rolling history table to determine if the errors that you are seeing are new. For example, if you have a history table that runs for 30-minute samples over two days, you can compare the most recent sample to a previous sample, looking for new errors. If your probe has the resources, use a much finer resolution sample stored for a shorter time (every 30 seconds for 2 hours) to more easily spot recent errors.

EXAMPLE: You see that the history table shows that no error rates remained constant throughout the day. However, errors that did occur were on the device HUB3 and were Too Long Errors and FCS Errors.

Reproducing the Fault While Monitoring the Network

Although the RMON View in LANsentry Manager can show error rates and help you to identify the location of the problem, it may not provide enough data to solve the problem. To determine the cause of the problem, reproduce it while you monitor the network by using these applications:

• LANsentry Manager Top-N Graph - To locate a quiet node to use for reproducing the fault
• LANsentry Manager Packet Capture - To capture packets from the hub to which the quiet node is attached
• LANsentry Manager Packet Decode - To analyze the packets to assess network file server traffic and delays
• Address Tracker - To find the location of the problem nodes

EXAMPLE: Using LANsentry Manager, you find a hub on the network with a higher than normal error rate. However, the error rate does not seem high enough to cause login delays of 60 seconds or more.

LANsentry Manager Top-N Graph

Using the Top-N graph in the LANsentry Manager main window, locate a quiet node that has been showing the same problem. Choose a quiet node so that you do not receive excessive traffic when you try to isolate the problem.

EXAMPLE: You see that the node Monolith, which has the same Network File System (NFS) mounts as the other nodes on the network, is quiet. You decide to use this node for reproducing the fault. See Network File System (NFS) Protocol for more information about NFS.

LANsentry Manager Packet Capture

Using the LANsentry Manager Packet Capture application, capture packets from the network using predefined patterns and start-and-stop conditions.


Follow these steps:

1. Set up a capture buffer on a probe that is connected to the same hub as the quiet node. Until you know more about the problem, set a very general filter.

EXAMPLE: You select a MAC-layer filter and set a conversation filter to capture all packets to and from Monolith.

2. Telnet into and log out from the quiet node. Then reset the capture buffer. Repeat this procedure until you see the problem reflected in your captured data. To keep the buffer information clean, reset the buffer each time that you repeat the procedure.

3. When you see the delay, note the rough value of the packet count on the LANsentry Manager packet buffer.

By noting the packet count at which you think the delay has occurred, you can narrow the problem to within about 20 packets in the buffer. If you have used an extremely quiet node, you may even identify the exact packet.

LANsentry Manager Packet Decode

The LANsentry Manager Packet Decode application decodes all major protocols and displays the packet contents at three levels of detail: summary information, header information, and actual packet content.

Follow these steps:

1. Open the buffer in the Packet Decode application and locate the number of the packet at which the delay occurred.

2. Select the packet and launch a MAC-layer conversation filter. In the filter display, look for a gap in the conversation (that is, where the node sent a request and then resent it at approximately the same rate as the delay you experienced when recreating the problem).

3. Repeat the test to determine if the result concentrates on one node or if it appears on other nodes.

EXAMPLE: On the quiet node that you selected, the delay is obvious. You see an NFS request going out to a node and a repeat of the request 30 seconds later. During that time, the node did not respond. You now know that the delay occurred because nodes were not seeing responses for NFS requests. When you repeat the test on other nodes, you find that the delay is happening with more than one destination node.
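If you export packet timestamps from a capture, the gap search described in step 2 can be automated. The Python sketch below is an illustration under that assumption: it takes a list of timestamps in seconds for one conversation and reports where consecutive packets are separated by at least a chosen interval. The function name and the 10-second default are hypothetical.

```python
def find_gaps(timestamps, min_gap=10.0):
    """Return (packet_index, gap_seconds) for every packet that arrives at
    least min_gap seconds after the previous packet in the conversation."""
    gaps = []
    for i in range(1, len(timestamps)):
        delta = timestamps[i] - timestamps[i - 1]
        if delta >= min_gap:
            gaps.append((i, delta))
    return gaps
```

A gap that matches the login delay you observed, followed by a repeat of the same request, is the pattern to look for.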


Address Tracker

Use Address Tracker, which polls managed devices, to determine the hubs to which the problem nodes are attached. If the problem end stations are located on unmanaged devices, then you can at least narrow the problem to those unmanaged devices.

EXAMPLE: Although your network does not have managed hubs that Transcend® NCS management software can poll, it does have managed switches. When it polls the switches, Address Tracker displays the switch ports on which addresses were last seen. This information indicates the hub (but not the hub port) on which the device is located.

LANsentry Manager Packet Decode

After you know the location of the hub that has the problem node, monitor the problem from the hub using LANsentry Manager Packet Decode.

Follow these steps:

1. To capture packets from one of the nodes on the hub, set up another capture buffer and repeat the exercise that is described in LANsentry Manager Packet Capture. Because a delay may occur on a different node, use two capture buffers without stopping the first one.

Note the rough packet count where the delay appears.

2. Display a conversation filter of the packet where the delay appears and look for the gap in the conversation.

EXAMPLE: You hope that the nodes are on the same hub. You find that all the nodes are on HUB3. This result indicates that FCS Errors may be causing the timeouts. However, because the errors occur at a low rate, you decide to verify this diagnosis. You monitor the problem from the hub, logging in and out many times, and the delay eventually occurs. This time, the delay shows that the node's reply had an FCS Error, even though the node received the request. The switch would not have transmitted this packet, causing a timeout on the NFS protocol. The retry time is presumably 30 seconds. During this test, you see the problem occurring on another node.

Correcting the Fault


Without a managed hub, you may find it very difficult to discover network file server timeout errors. To find the problematic node, you must either systematically isolate nodes by monitoring each node for a prolonged period or temporarily insert a managed hub.

EXAMPLE: You notice that the captured error packet failed FCS because it was corrupted by a regular pattern during transmission. A possible reason for this occurrence is a Jabbering node. This explanation makes sense because FCS/Jabber frames increased linearly when you were monitoring the live network.

Network File Server Timeouts Reference

This section explains terms that are relevant to network file server timeouts and provides additional conceptual and problem analysis detail.

Jabbering

A condition in which a node transmits illegal length packets and is possibly not operating within carrier specifications. In effect, another node has written bad data over a valid packet. This bad data is often interpreted as a repeated sequence of data.

Network File System (NFS) Protocol

A distributed file system protocol developed by Sun Microsystems that allows a computer system to access files over a network as if they were on its local disks. This protocol has been incorporated into products by more than 200 companies. It is now a de facto Internet standard.

NFS is one protocol in the NFS suite of protocols, which includes NFS, RPC, XDR (External Data Representation), and others. These protocols are part of a larger architecture that Sun Microsystems refers to as Open Network Computing (ONC). ONC is a distributed applications architecture designed by Sun and currently controlled by a consortium led by Sun.

Troubleshooting TCP/IP - Detailed Steps

This article shows how to troubleshoot TCP/IP connectivity between computers on a Windows network. If you haven't already done so, disable XP's Internet Connection Firewall on all local area network connections and remove all firewall programs on the network. Improperly configured firewalls are the most common cause of TCP/IP problems.


Open a Command Prompt Window

For many of these steps, you'll be typing at the command prompt. To open a command prompt window in Windows 2000 or XP, click Start | Run, type cmd in the box, and click OK. To open a command prompt window in Windows 95, 98, or Me, click Start | Run, type command in the box, and click OK. Type one command per line, and press Enter after each one to execute it. To close the command prompt window, use the exit command.

Determine the TCP/IP Settings

Determine the TCP/IP settings of each computer on the local area network. In XP, open the Network Connections folder, right click the LAN connection, and click Status | Support | Details. For example, here are the Status and Details views for the LAN connection on an Internet Connection Sharing host:

In Windows 95/98/Me, click Start | Run, type winipcfg in the box, and click OK. Select the LAN adapter from the menu, and click More Info. Here's the winipcfg view for an ICS client running Windows Me:

You can also see the TCP/IP settings from the command prompt. This is especially convenient if a computer has more than one network adapter. Use the ipconfig /all command, which is available in all versions except Windows 95. The output from this command can be long, so it's best to write it to a file. Specify the file name in the command this way:

ipconfig /all >ipconfig.txt

Description of TCP/IP Settings

Here are the TCP/IP settings that are used in network troubleshooting:

• IP Address - Unique address assigned to a network adapter. A computer with multiple network adapters has an IP address for each one, and each one must be in a different subnet.

• Subnet Mask - Used in conjunction with the IP address to determine which subnet an adapter belongs to. At the simplest level, communication is only possible between two network adapters when they're in the same subnet.

• Default Gateway - IP address of a computer or router, on one of this computer's local area networks, that knows how to communicate with subnets not present on this computer. For an Internet connection, the default gateway is a router belonging to your Internet service provider, and all access to sites on the Internet goes through it. For an ICS client, the default gateway is the ICS host. If you use a hardware router, it serves as the default gateway.

• DHCP Server - If an adapter is configured to obtain an IP address automatically, this is the address of the server that provides it. It could be your ISP, an ICS host, or a hardware router.

• DNS Servers - IP address of one or more Domain Name Server computers. DNS servers translate Internet names to their IP addresses (like 63.146.109.227).
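To see how the IP address and subnet mask together decide whether two adapters are in the same subnet, you can check it with a few lines of Python. This sketch uses the standard `ipaddress` module; the function name is hypothetical, and the addresses and the 255.255.255.0 mask in the usage note are example values, so substitute the ones reported on your own computers.

```python
import ipaddress


def same_subnet(addr_a, addr_b, netmask):
    """True when both addresses fall in the same subnet under the given mask."""
    net_a = ipaddress.ip_interface(f"{addr_a}/{netmask}").network
    net_b = ipaddress.ip_interface(f"{addr_b}/{netmask}").network
    return net_a == net_b
```

For example, `same_subnet("192.168.1.101", "192.168.1.123", "255.255.255.0")` compares two addresses that share the 192.168.1 prefix, while changing one of them to 192.168.2.123 places it in a different subnet under the same mask.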

Subnets

See our article on subnets for a brief description of how they work. For more details, see this Microsoft Knowledge Base article.

If two computers are supposed to be on the same subnet but aren't, something is wrong with the network hardware or software configuration. This is most likely to happen when one of them receives an IP address of 169.254.x.x, which indicates that:

• It's configured to obtain an IP address automatically.
• It couldn't find a DHCP server on the network to make the assignment.
• Windows assigned it an Automatic Private IP Address.

See our article on Specific Networking Problems and Their Solutions for more information.
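When you are checking many computers, the 169.254.x.x test is easy to script. The Python sketch below uses the standard `ipaddress` module to test whether an address falls in the 169.254.0.0/16 range; the function name is hypothetical.

```python
import ipaddress

# The Automatic Private IP Address range, 169.254.x.x.
APIPA_RANGE = ipaddress.ip_network("169.254.0.0/16")


def is_automatic_private(address):
    """True when the address is in the 169.254.x.x range."""
    return ipaddress.ip_address(address) in APIPA_RANGE
```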

Pinging

The ping command is the basic tool for testing TCP/IP connectivity. It sends a special packet (called ICMP Echo) to a particular IP address and looks for a reply. If everything is working right, the reply comes back. If not, the ping times out in a few seconds. By default, the ping command repeats the process four times. Here's an example of an ICS client computer pinging a Windows XP Home Edition ICS host, using the host's IP address and its computer name:

When ping fails, you'll see one of these error messages:

• Request timed out - The IP address is valid, but there's no reply from it. If the IP address is on a local area network, the most likely cause is a firewall program blocking the ping.

• Unknown host <name> or Ping request could not find host <name> - The computer name doesn't exist on the local area network. Make sure that NetBIOS over TCP/IP is enabled.

• Destination host unreachable - The IP address isn't on a local area network, and the default gateway can't access it. Either there's no default gateway, its address is wrong, or it isn't functioning.


Pinging the Local Area Network

Here is a series of ping commands to use in finding where a problem occurs on a local area network. Run them in the order shown, and don't go on to the next command until all of the previous commands work properly. In this example:

• The computer being tested is named Winxp, with IP address 192.168.1.101.
• There's another computer on the network named Win98, with IP address 192.168.1.123.

Substitute the appropriate IP addresses and computer names for your network.

Command               Target                          What Ping Failure Indicates
ping 127.0.0.1        Loopback address                Corrupted TCP/IP installation
ping localhost        Loopback name                   Corrupted TCP/IP installation
ping 192.168.1.101    This computer's IP address      Corrupted TCP/IP installation
ping winxp            This computer's name            Corrupted TCP/IP installation
ping 192.168.1.123    Another computer's IP address   Bad hardware or NIC driver
ping win98            Another computer's name         NetBIOS name resolution failure

To fix a corrupted TCP/IP installation on Windows XP, follow the steps in this Microsoft Knowledge Base article. For Windows 95/98/Me, un-install the TCP/IP protocol in Control Panel | Network, reboot, and re-install it. If that doesn't fix it, use this procedure on Windows 95 or 98.
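The ordered ping sequence for the local area network can be automated so that it stops at the first failure and reports the likely cause. The Python sketch below is one way to do that, using the example names and addresses from this article; the function names are hypothetical. It assumes the system `ping` command is on the path and treats a nonzero exit code as a failure, which is a simplification: on some Windows versions ping can exit with 0 even when it prints Destination host unreachable, so read the output as well.

```python
import platform
import subprocess

# (target, what a failure indicates), in the order the commands should be run.
LAN_STEPS = [
    ("127.0.0.1", "Corrupted TCP/IP installation"),
    ("localhost", "Corrupted TCP/IP installation"),
    ("192.168.1.101", "Corrupted TCP/IP installation"),
    ("winxp", "Corrupted TCP/IP installation"),
    ("192.168.1.123", "Bad hardware or NIC driver"),
    ("win98", "NetBIOS name resolution failure"),
]


def ping_ok(target, count=4):
    """Run the system ping command and report whether it exited with 0."""
    flag = "-n" if platform.system() == "Windows" else "-c"
    result = subprocess.run(
        ["ping", flag, str(count), target],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def first_failure(steps, check=ping_ok):
    """Return (target, meaning) for the first step that fails, else None."""
    for target, meaning in steps:
        if not check(target):
            return target, meaning
    return None
```

Calling `first_failure(LAN_STEPS)` runs the pings in order. The `check` parameter exists so that the sequence logic can be exercised without touching the network.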

Pinging the Internet

You can also use ping to find a problem with Internet access. Run these commands in the order shown, and don't go on to the next command until all of the previous commands work properly. Use the Default Gateway and DNS Server addresses that you got from the winipcfg or ipconfig /all command.


Command                   Target                 What Ping Failure Indicates
ping w.x.y.z              Default Gateway        Default Gateway down
ping w.x.y.z              DNS Server             DNS Server down
ping w.x.y.z              Web site IP address    Internet service provider or web site down
ping www.something.com    Web site name          DNS Server down or web site down

REFERENCE

WWW.PRACTICALLYNETWORK.COM

WWW.COMPUTERHOPE.COM

WWW.WIKIPEDIA.ORG


THANK YOU


  • Basic Network Troubleshooting
  • Basic network troubleshooting
    • Issue
    • Cause
    • Solution
      • Verify connections LEDs
      • Adapter resources
      • Adapter functionality
      • Protocol
      • Firewall
      • Additional time
      • Additional troubleshooting
        • Network Troubleshooting Overview
          • About Connectivity Problems
          • About Performance Problems
          • Solving Connectivity and Performance Problems
          • Recognizing Symptoms
            • User Comments
            • Network Management Software Alerts
            • Analyzing Symptoms
              • Understanding the Problem
              • Identifying and Testing the Cause of the Problem
                • Sample Problem Analysis
                • Equipment for Testing
                  • Solving the Problem
                    • Your Network Troubleshooting Toolbox
                      • Transcend Central
                      • Status Watch
                        • Web Reporter
                          • Address Tracker
                          • LANsentry Manager
                          • Traffix Manager
                          • Device View
                          • Ping
                            • Strategies for Using Ping
                            • Tips on Interpreting Ping Messages
                              • Telnet
                              • FTP and TFTP
                              • Analyzers
                              • Probes
                              • Cable Testers
                              • Knowing Your Networks Configuration
                                • Site Network Map
                                • Device Configuration Information
                                • Other Important Data About Your Network
                                  • Identifying Your Networks Normal Behavior
                                    • Baselining Your Network
                                    • Identifying Background Noise
                                    • Identifying Background Noise
                                        • FDDI Connectivity
                                          • Understanding the Problem
                                          • Identifying the Problem
                                          • Solving the Problem
                                          • Status Watch
                                          • Implementing Dual Homing
                                          • Installing an Optical Bypass Unit
                                          • Peer Wrap Condition
                                          • Twisted Ring Condition
                                          • Undesired Connection Attempt Event
                                          • Address Tracker
                                          • Device View
                                          • Broadcast Packets
                                          • Multicast Packets
                                            • Duplicate Addresses
                                              • Understanding the Problem
                                              • Identifying the Problem
                                              • Solving the Problem
                                              • Status Watch
                                              • Address Tracker
                                              • LANsentry Manager
                                              • Duplicate MAC Addresses
                                              • Duplicate IP Addresses
                                                • Ethernet Packet Loss
                                                  • Understanding the Problem
                                                  • Identifying the Problem
                                                  • Solving the Problem
                                                  • Status Watch
                                                  • LANsentry Manager Network Statistics Graph
                                                  • Device View
                                                  • Alignment Errors
                                                  • Collisions
                                                  • CRC Errors
                                                  • Excessive Collisions
                                                  • FCS Errors
                                                  • Late Collisions
                                                  • Nonstandard Ethernet Problems
                                                  • Receive Discards
                                                  • Too Long Errors
                                                  • Too Short Errors
                                                  • Transmit Discards
                                                    • FDDI Ring Errors
                                                      • Understanding the Problem
                                                      • Identifying the Problem
                                                      • Solving the Problem
                                                      • Status Watch
                                                      • Elasticity Buffer Error Condition
                                                      • Frame Error Condition
                                                      • Frames Not Copied Condition
                                                      • Link Error Condition
                                                      • MAC Neighbor Change Event
                                                        • Network File Server Timeouts
                                                          • Understanding the Problem
                                                          • Identifying the Problem
                                                          • Solving the Problem
                                                          • Ping and Telnet
                                                          • LANsentry Manager Alarms View
                                                          • LANsentry Manager Statistics View
                                                          • LANsentry Manager History View
                                                          • LANsentry Manager Top-N Graph
                                                          • LANsentry Manager Packet Capture
                                                          • LANsentry Manager Packet Decode
                                                          • Address Tracker
                                                          • LANsentry Manager Packet Decode
                                                          • Jabbering
                                                          • Network File System (NFS) Protocol
                                                          • Troubleshooting TCP/IP - Detailed Steps
                                                          • Open a Command Prompt Window
                                                          • Determine the TCP/IP Settings
                                                          • Description of TCP/IP Settings
                                                          • Subnets
                                                          • Pinging
                                                          • Pinging the Local Area Network
                                                          • Pinging the Internet

Your Network Troubleshooting Toolbox

A robust network troubleshooting toolbox consists of items (such as network management applications, hardware devices, and other software) to recognize, diagnose, and solve networking problems. It contains:

• Transcend Applications
• Network Management Platforms
• 3Com SmartAgent Embedded Software
• Other Commonly Used Tools

Transcend management software is optimized for managing 3Com devices and their attached networks. However, some applications, such as LANsentry Manager, can manage any vendor's networking equipment that complies with the Remote Monitoring (RMON) Management Information Base (MIB).

This section describes these Transcend applications, which you can use to troubleshoot your network:

• Transcend Central
• Status Watch
• Address Tracker
• LANsentry Manager
• Traffix Manager
• Device View

Transcend Central

Start with Transcend Central, which is an asset management and device grouping application, to understand what your network consists of and to control the Transcend NCS network management troubleshooting tools. Transcend Central is available as both a native Windows application and a Java application that you can access using a Web browser.

Using Transcend Central for troubleshooting, you can:

• Display an inventory of device, module, and port information
• Group devices to make your troubleshooting tasks easier. By managing a collection of devices, you can simultaneously perform the same tasks on each device in a group and locate physical or logical problems on your network.
• Launch Transcend NCS applications, including some of your primary Transcend NCS troubleshooting tools:
  • Status Watch, which includes Web Reporter (from the Java version)
  • Address Tracker
  • LANsentry Manager
  • Traffix Manager
  • Device View

Status Watch

The Status Watch applications manage 3Com devices and their attached networks. Status Watch applications primarily poll for MIB-II data. This is a performance monitoring application that allows you to monitor the operational status of your network devices and quickly identify any problems that require your attention. It works in conjunction with Web Reporter.

Web Reporter

Web Reporter is a data-reporting application that runs in a World Wide Web (WWW) browser. It generates reports from data that Status Watch collects, allowing you to compare network statistics against a baseline.

Address Tracker

Address Tracker is an address collection and discovery application that:

• Polls managed devices for all MAC addresses
• Polls managed devices and routers for IP addresses to perform MAC-to-IP address translation
• Uses Device View to disable troublesome ports

LANsentry Manager

LANsentry Manager is a set of integrated applications that displays and explores the real-time and historical data that RMON-compliant devices (probes) on the network capture. LANsentry Manager uses SNMP polling to gather RMON and RMON2 data from the probes.

Use LANsentry Manager to:

• Monitor current performance of network segments
• See trends over time
• Spot signs of current problems
• Configure alarms to monitor for specific events
• Capture packets and display their contents

LANsentry Manager works with any device (from 3Com or other vendors) that supports the RMON MIB or the RMON2 MIB.

Traffix Manager

Traffix Manager is a performance-monitoring application that provides information about layer 2 (RMON) and layer 3 conversations between nodes. It helps you to assess traffic patterns on your network. Traffix Manager:

• Monitors all the stations that the RMON2-compliant probes encounter on your network
• Captures and stores RMON and RMON2 data for your network's protocols and applications
• Displays traffic between stations in user-defined views of the network
• Graphs current or historical data on the devices selected
• Delivers reports for user-specified stations and time periods as PostScript to your printer or as HTML to your Web server
• Launches LANsentry Manager tools for in-depth analysis of a station or a conversation between stations

You can use Traffix Manager to:

• Know your network - Understand overall flow patterns and interactions between systems, and determine how your network is really being used at the application level.
• Optimize your network - Gain an insight into traffic and application usage trends to help you optimize the use and placement of current network resources and make wise decisions about capacity planning and network growth.

Traffix Manager works with any device (from 3Com or other vendors) that supports the RMON2 MIB.

Device View

The Device View application is a device configuration tool. When you troubleshoot your network, you can use Device View to determine or change a device's configuration. You can also use Device View to look at a device's statistics and to set alarms.

Device View manages only 3Com devices.

Network Management Platforms

As part of your troubleshooting toolbox, your network management platform is the first place to go to view the overall health of your network. With the platform, you can understand the logical configuration of your network and configure views of your network to understand how devices work together and the role that they play in the users' work. The network management platform that supports your Transcend software installation can provide valuable troubleshooting tools. Transcend runs on several platforms within the NT and UNIX environments.

The platform discovers the devices. Transcend imports that information from the platform to populate the core database. Unless you are rediscovering, the user must manually update the platform.

Using this device database, a map displays the graphical representation of your network. Each device on your network appears as a symbol (icon) on the map. You can configure views of your network to show devices on the same subnetworks or floors.

You can monitor network performance and diagnose network performance and connectivity problems. You can also:

• Take a snapshot of your network in its normal state. The snapshot records the state of your network at a particular instant. If you later have network performance problems, you can compare the current state of your network to the snapshot.
• Quickly determine the connectivity status of a device by noting the color of its map symbol. Red usually means that communication with a device has ceased.
• Diagnose connectivity problems by determining whether two devices can communicate. If they can communicate, then examine the route between the devices, the number of packets that were sent and lost, and the roundtrip time between the two devices.
• Manage MIB information (for example, collecting and storing MIB data for trend analysis and graphing) using MIB queries. Transcend compiles MIBs and allows you to navigate up and down the MIB Tree to retrieve MIB objects from devices. You can set thresholds for MIB data and generate events when a threshold is exceeded.
• Configure the software to act on certain events. The Event Categories window informs you of any unexpected events (which arrive in the form of traps).
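
The snapshot comparison described in the list above can be illustrated outside any particular platform. The following Python sketch is only an illustration under stated assumptions: it assumes a snapshot is a plain mapping of device name to status string, and the function name diff_snapshot and the device names are hypothetical, not part of Transcend or any management platform.

```python
def diff_snapshot(baseline, current):
    """Compare two {device: status} snapshots and return what changed.

    Devices missing from one snapshot are reported with the status "absent".
    """
    changes = {}
    for device in sorted(set(baseline) | set(current)):
        before = baseline.get(device, "absent")
        after = current.get(device, "absent")
        if before != after:
            changes[device] = (before, after)
    return changes


# Hypothetical snapshots: one taken in the normal state, one taken during a problem.
baseline = {"switch-1": "up", "hub-2": "up", "router-1": "up"}
current = {"switch-1": "up", "hub-2": "down", "probe-9": "up"}

for device, (before, after) in diff_snapshot(baseline, current).items():
    print(f"{device}: {before} -> {after}")
```

Only the devices whose state differs are listed, which mirrors the idea of comparing the current network against its normal-state snapshot rather than reviewing every device.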

For more information, see the documentation that is shipped with your software.

3Com SmartAgent Embedded Software

Traditional Simple Network Management Protocol (SNMP) management places the burden of collecting network management information on the management station. In this traditional model, software agents collect information about throughput, record errors or packet overflows, and measure performance based on established thresholds. Through a polling process, agents pass this information to a centralized network management station whenever they receive an SNMP query. Management applications then make the data useful and alert the user if there are problems on the device.

As a useful companion to traditional network management methods, 3Com's SmartAgent® technology places management intelligence into the software agent that runs within a 3Com device. This scalable solution reduces the amount of computational load on the management station and helps minimize management-related network traffic.

SmartAgent software, which uses the RMON MIB, is self-monitoring, collecting and analyzing its own statistical, analytical, and diagnostic data. In this way, you can conduct network management by exception - that is, you are notified only if a problem occurs. Management by exception is unlike traditional SNMP management, in which the management software collects all data from the device through polling.

SmartAgent software works autonomously and reports to the network management station whenever an exceptional network event occurs. The software can also take direct action without involving the management station. Devices that contain SmartAgent software may be able to:

• Perform broadcast throttling to minimize the flow of broadcast traffic on your network
• Monitor the ratio of good frames to bad frames
• Switch a resilient link pair to the standby path if the primary path corrupts frames
• Report if traffic on vital segments drops below minimum usage levels
• Disable a port for five seconds to clear problems and then automatically reconnect it

The Transcend NCS applications LANsentry Manager and Traffix Manager make RMON data that the SmartAgent software collects more usable by summarizing and correlating important information.

Other Commonly Used Tools

These commonly used tools can also help you troubleshoot your network:

• Network software, such as Ping, Telnet, and FTP and TFTP. You can use these applications to troubleshoot, configure, and upgrade your system.
• Network monitoring devices, such as Analyzers and Probes
• Tools, such as Cable Testers, for working on physical problems

Ping

Packet Internet Groper (Ping) allows you to quickly verify the connectivity of your network devices. Ping attempts to transmit a packet from one device to a station on the network and listens for the response to ensure that it was correctly received. You can validate connections on the parts of your network by pinging different devices:

• A successful response indicates that a valid network path exists between your station and the remote host and that the remote host is active.
• Slower response times than normal can indicate that the path is congested or obstructed.
• A failed response indicates that a connection is broken somewhere; use the message to help locate the problem. See Tips on Interpreting Ping Messages.

Strategies for Using Ping

Follow these strategies for using Ping:

• Ping devices when your network is operating normally so that you have a performance baseline for comparison. See Identifying Your Network's Normal Behavior for more information.
• Ping by IP address when:
  • You want to test devices on different subnetworks. This method allows you to Ping your network segments in an organized way, rather than having to remember all the hostnames and locations.
  • Your Domain Name System (DNS) server is down and your system cannot look up host names properly. You can Ping with IP addresses even if you cannot access hostname information.
• Ping by hostname when you want to identify DNS server problems.
• To troubleshoot problems that involve large packet sizes, Ping the remote host repeatedly, increasing the packet size each time.
• To determine if a link is erratic, perform a continuous Ping (using ping -s on UNIX), which indicates the time that it takes the device to respond to each Ping.
• To determine a route taken to a destination, use the trace route function (tracert).
• Consider creating a Ping script that periodically sends a Ping to all necessary networking devices. If a Ping failure message is received, the script can perform some action to notify you of the problem, such as paging you.
• Use the Ping functions of your network management platform. For example, in your HP OpenView map, select a device and click the right mouse button to gain access to ping functions.
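
The Ping script suggested in the list above might look something like the following Python sketch. It is a minimal illustration, not a tested monitoring tool: it assumes the operating system's own ping command is on the path, that -c (UNIX-like systems) or -n (Windows) sets the packet count, and that a zero exit code means a reply was received (on some systems ping can exit with zero even for an unreachable reply, so treat the result with care). The function names ping_once, sweep, and notify are hypothetical.

```python
import platform
import subprocess


def ping_once(host, timeout_s=5):
    """Send one echo request with the system ping command; True if it exits with 0."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    try:
        result = subprocess.run(
            ["ping", count_flag, "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, OSError):
        # No answer within the timeout, or the ping command could not be started.
        return False
    return result.returncode == 0


def sweep(hosts, ping=ping_once):
    """Ping every host once and return the hosts that did not answer."""
    return [host for host in hosts if not ping(host)]


def notify(failed_hosts):
    """Placeholder action: replace with e-mail, paging, or a similar alert."""
    for host in failed_hosts:
        print(f"ALERT: no ping reply from {host}")
```

To run the check periodically, call notify(sweep(devices)) with your own list of device addresses from a scheduler such as cron, or from a loop that sleeps between sweeps. Passing the ping function as a parameter keeps the sweep logic testable without sending any traffic.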

Tips on Interpreting Ping Messages

Use the following ping failure messages to troubleshoot problems:

No reply from <destination>

Indicates that the destination routes are available but that there is a problem with the destination itself.

<destination> is unreachable

Indicates that your system does not know how to get to the destination. This message means either that routing information to a different subnetwork is unavailable or that a device on the same subnetwork is down.

ICMP host unreachable from gateway

Indicates that your system can transmit to the target address using a gateway but that the gateway cannot forward the packet properly because either a device is misconfigured or the gateway is not operating.
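
If a script collects ping output, the three messages above can be mapped to their likely causes automatically. The sketch below is an illustration only: the patterns match the message wording used in this section, whereas real ping implementations word their messages differently, so the patterns are assumptions that you would adjust for your platform. The names PING_HINTS and interpret are hypothetical.

```python
import re

# Each pattern corresponds to one of the three message forms described above.
PING_HINTS = [
    (re.compile(r"^ICMP host unreachable from gateway", re.IGNORECASE),
     "Gateway reached but cannot forward: check for a misconfigured device or a failed gateway."),
    (re.compile(r"^no reply from ", re.IGNORECASE),
     "Routes are available: check the destination itself."),
    (re.compile(r" is unreachable$", re.IGNORECASE),
     "No known path: check routing to the other subnetwork, or look for a down device on the same subnetwork."),
]


def interpret(message):
    """Return a troubleshooting hint for a ping failure message."""
    text = message.strip()
    for pattern, hint in PING_HINTS:
        if pattern.search(text):
            return hint
    return "Unrecognized message"


print(interpret("No reply from 10.0.0.5"))
```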

Telnet

Telnet, which is a login and terminal emulation program for Transmission Control Protocol/Internet Protocol (TCP/IP) networks, is a common way to communicate with an individual device. You log in to the device (a remote host) and use that remote device as if it were a local terminal.

If you have established an out-of-band Telnet connection with a device, you can use Telnet to communicate with that device even if the network is unavailable. This feature makes Telnet one of the most frequently used network troubleshooting tools. Usually, all device statistics and configuration capabilities are accessible by using Telnet to connect to the device's console. For more information about setting up an out-of-band connection, see Using Telnet, Serial Line, and Modem Connections.

You can invoke the Telnet application on your local system and set up a link to a Telnet process that is running on a remote host. You can then run a program that is located on a remote host as if you were working at the remote system.

FTP and TFTP

Most network devices support either the File Transfer Protocol (FTP) or the Trivial File Transfer Protocol (TFTP) for downloading updates of system software. Updating system software is often the solution to networking problems that are related to agent problems. Also, new software features may help correct a networking problem.

FTP provides flexibility and security for file transfer by:

• Accepting many file formats, such as ASCII and binary
• Using data compression
• Providing Read and Write access so that you can display, create, and delete files and directories
• Providing password protection

TFTP is a simple version of FTP that does not list directories or require passwords. TFTP only transfers files to and from a remote server.

Analyzers

An analyzer, which is often called a Sniffer, is a network device that collects network data on the segment to which it is attached, a process called packet capturing. Software on the device analyzes this data, which is a process referred to as protocol analysis. Most analyzers can interpret different types of protocol traffic, such as TCP/IP, AppleTalk, and Banyan VINES traffic.

You usually use analyzers for reactive troubleshooting - when you see a problem somewhere on your network, you attach an analyzer to capture and interpret the data from that area. Analyzers are particularly helpful for identifying intermittent problems. For example, if your network backbone has experienced moments of instability that prevent users from logging on to the network, you can attach an analyzer to the backbone to capture the intermittent problems when they happen again.

Probes

Like an analyzer, a probe is a network device that collects network data. Depending on its type, a probe can collect data from multiple segments simultaneously. It stores the collected data and transfers the data to an analysis site when requested. Unlike an analyzer, a probe does not interpret data.

A probe can be either a stand-alone device or an agent in a network device. The Transcend Enterprise Monitor 500 series and the SuperStack® II Monitor series are stand-alone RMON probes. LANsentry Manager and Traffix Manager use data from probes that comply with the RMON MIB or the RMON2 MIB.

You can use a probe daily to determine the health of your network. The Transcend NCS applications can interpret and report this data, alerting you to possible problems so that you can proactively manage your network. For example, an RMON2 probe can help you to analyze traffic patterns on your network. Use this data to make decisions about reconfiguring devices and end stations as needed.

Cable Testers

Cable testers examine the electrical characteristics of the wiring. They are most commonly used to ensure that building wiring and cables meet Category 5, 4, and 3 standards. For example, network technologies such as Fast Ethernet require the cabling to meet Category 5 requirements. Testers are also used to find defective and broken wiring in a building.

Knowing Your Network

You can better troubleshoot the problems on your network by:

• Knowing Your Network's Configuration
• Identifying Your Network's Normal Behavior

Knowing Your Network's Configuration

Part of understanding your network is knowing its physical and logical configuration. You should know:

• Which devices are on your network
• How the devices are configured
• Which devices are attached to the backbone
• Which devices connect your network to the outside world (WAN)

To keep track of your network's configuration, gather the following information:

• Site Network Map
• Logical Connections
• Device Configuration Information
• Other Important Data About Your Network

This data, when kept up-to-date, is extremely helpful for locating information when you experience network or device problems.

Site Network Map

A network map helps you to:

• Know exactly where each device is physically located
• Easily identify the users and applications that are affected by a problem
• Systematically search each part of your network for problems

You can create a network map using any drawing or flow chart application. Store your network map online. In addition, make sure that you always have a current version on paper in case you cannot access the online version. Figure 8 shows an example of a network map of 3Com devices.

Figure 8 Example of a Site Network Map

Consider including the following information on your network map:

• Location of important devices and workgroups (by floor, building, or area)
• Location of the network backbone, data center, and wiring closets, as appropriate for your network
• Location of your network management stations
• Location and type of remote connections
• IP subnetwork addresses for all managed switches and hubs
• Other subnetwork addresses, such as Novell IPX and AppleTalk, if appropriate for your network
• Type of media (by actual name, such as 10BASE-T, or by grouping, such as Ethernet), which you can show with callouts, colors, line weights, or line styles
• Virtual workgroups, which you can show with colors or shaded areas
• Redundant links, which you can indicate with gray or dashed lines
• Types of network applications that are used in different areas of your network
• Types of end stations that are connected to the switches and hubs

Logical Connections

With the advent of virtual LANs (VLANs), you need to know how your devices are connected logically as well as physically. For example, if you have connected two devices through the same physical switch, you might assume that they can communicate with each other. However, the devices can be in separate VLANs that restrict their communication.

Knowing the setup of your VLANs can help you to quickly narrow the scope of a problem to a VLAN instead of to a network connection.

The Transcend NCS application Enterprise VLAN Manager allows you to view the logical makeup of your network. Depending on the complexity of your network and VLAN configurations, you can use colors to show the VLANs graphically on your network map.

Device Configuration Information

Maintain online and paper copies of device configuration information. Make sure that all online data is stored with your site's regular data backup. If your site does not have a backup system, copy the information onto a backup disc (CD, Zip disk, and the like) and store it offsite.

Follow these guidelines for saving configuration information:

• Because the easiest way to recover a device's configuration is to use FTP or TFTP, save the configuration settings of each device that supports this method of uploading.
• For other devices, Telnet in and save the session (which contains configuration details) to a file. If you cannot print the configuration of a device, then create a quick rebuild guide that explains the quickest way to configure the device from a fresh install.
• For devices that store information to diskette, store this data as part of your site's regular backup.
• For routers and other important devices with text configuration files, store this data online in a revision control system. Keep the most recent version on paper. Keep previous versions.
• For PCs, keep a recovery disk for each type of PC. For any device that you use as a server, store all startup scripts and copies of registries.

Other Important Data About Your Network

For a complete picture of your network, have the following information available:

• All passwords - Store passwords in a safe place. Keep previous passwords in case you restore a device to a previous software version and need to use the old password that was valid for that version.
• Device inventory - The inventory allows you to see the device type, IP address, ports, MAC addresses, and attached devices at a glance. Software tools, such as Transcend Central, can help you keep track of the 3Com devices on your network. Using Transcend Central, you can group devices by type and location and have this information on hand for troubleshooting.
• MAC address-to-port number list - If your hubs or switches are not managed, you must keep a list of the MAC addresses that correlate to the ports on your hubs and switches. Generate and keep a paper copy of this list, which is crucial for deciphering captured packets, using Address Tracker.
• Log book - Document your interactions, no matter how trivial, with each device that is critical to your network's operation (that is, routers, remote access devices, security servers). For example, document that you noticed a fan making noise one morning. Your note may help you to identify why a device is over temperature a week later (because the fan stopped working).
• Change control - Maintain a change control system for all critical systems. Permanently store change control records.
• Contact details - Store, online and on paper, the details of all support contracts, support numbers, engineer details, and telephone and fax numbers.
• LANsentry Reporter - Use LANsentry Reporter to generate reports from the database.

Identifying Your Network's Normal Behavior

By monitoring your network over a long period, you begin to understand its normal behavior. You begin to see a pattern in the traffic flow, such as which servers are typically accessed, when peak usage times occur, and so on. If you are familiar with your network when it is fully operational, you can be more effective at troubleshooting problems that arise.

Baselining Your Network

You can use a baseline analysis, which is an important indicator of overall network health, to identify problems. A baseline can serve as a useful reference of network traffic during normal operation, which you can then compare to captured network traffic while you troubleshoot network problems. A baseline analysis speeds the process of isolating network problems.

By running tests on a healthy network, you compile normal data to compare against the results that you get when your network is in trouble. For example, Ping each node to discover how long it typically takes you to receive a response from devices on your network.
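
As a rough illustration of comparing a measurement against a Ping baseline, the Python sketch below summarizes round-trip times recorded on a healthy network and flags a later measurement that is far above them. The rule used here (more than three standard deviations above the mean) is an assumption chosen for the example, not a method prescribed by this guide, and the sample values and the names build_baseline and is_abnormal are made up.

```python
from statistics import mean, stdev


def build_baseline(samples_ms):
    """Summarize round-trip times (ms) measured while the network is healthy.

    Needs at least two samples, because a standard deviation is computed.
    """
    return {"mean": mean(samples_ms), "stdev": stdev(samples_ms)}


def is_abnormal(rtt_ms, baseline, tolerance=3.0):
    """True if rtt_ms is more than `tolerance` standard deviations above the mean."""
    return rtt_ms > baseline["mean"] + tolerance * baseline["stdev"]


# Made-up response times for one node, gathered during normal operation.
healthy = [2.1, 1.9, 2.3, 2.0, 2.2]
baseline = build_baseline(healthy)

print(is_abnormal(2.4, baseline))
print(is_abnormal(9.5, baseline))
```

Keeping one baseline per node, and rebuilding it when the network changes, lets slow responses stand out without having to remember typical values for each device.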

Applications such as Status Watch, Address Tracker, LANsentry Manager, and Traffix Manager allow you to collect days and weeks of data and set a baseline for comparison. Through the reporting mechanisms in the following list, you can continuously assess the data from your network and ensure that its performance is optimal:

• Web Reporter generates daily or weekly reports from data collected by Status Watch.
• Traffix Manager generates weekly reports from collected data and calculates the baselines for you. Set up Utilization History and Error History reports with data resolution set to Weekly.
• LANsentry Manager History View generates daily utilization graphs, which are sampled every 30 minutes for each day over one week. Use these graphs to calculate your network baselines manually.

Identifying Background Noise

Know your network's background noise so that you can recognize real data flow. For example, one evening after everyone is gone, no backups are running, and most nodes are on, analyze the traffic on your network using the Traffix Manager application. The traffic that you see is mostly broadcast and multicast packets. Any errors that you see are the result of faulty devices (trace). This traffic is the background noise of your network - traffic that occurs for little value. If background noise is high, redesign your network.
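
If you have a list of destination MAC addresses from a packet capture, the share of broadcast and multicast traffic can be estimated with a few lines of Python. This sketch assumes the capture has already been exported as a list of address strings; the function names are hypothetical. It relies on two properties of Ethernet addressing: the broadcast address is all ones (ff:ff:ff:ff:ff:ff), and a group (multicast) address has the least-significant bit of its first octet set.

```python
def classify_destination(mac):
    """Return 'broadcast', 'multicast', or 'unicast' for a destination MAC address."""
    octets = mac.lower().replace("-", ":").split(":")
    if octets == ["ff"] * 6:
        return "broadcast"
    # The least-significant bit of the first octet marks a group (multicast) address.
    if int(octets[0], 16) & 1:
        return "multicast"
    return "unicast"


def noise_share(destinations):
    """Fraction of frames addressed to broadcast or multicast destinations."""
    if not destinations:
        return 0.0
    noisy = sum(1 for mac in destinations if classify_destination(mac) != "unicast")
    return noisy / len(destinations)
```

Comparing this fraction for a quiet evening capture against one taken during working hours gives a simple view of how much of the traffic is background noise.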

FDDI Connectivity

Use these sections to identify and correct connectivity errors on an FDDI ring:

• FDDI Connectivity Overview
• Monitoring FDDI Connections
• Making Your FDDI Connections More Resilient

Fiber Distributed Data Interface (FDDI), which is a self-correcting technology, automatically corrects ring faults to maintain connectivity throughout most of the network. However, you should monitor your FDDI connections for wrapped rings and other problems with ring connectivity.

Understanding the Problem

As shown in Figure 9, in a thru FDDI LAN, no stations on the trunk ring have a Configuration State (SMTConfigurationState) of Wrap or Isolated. However, users who complain about network performance may have lost connectivity to other stations on the network because the FDDI network is wrapped or segmented.

Figure 9 Thru Ring

Wrapped ring

By monitoring the Peer Wrap Condition, you can see when the Configuration State changes. In a wrapped ring (Figure 10), two stations on the LAN are in a wrapped Configuration State. This condition may or may not affect the connectivity of certain stations. Although operational, your network may have a cabling problem or a problem with a link.

Figure 10 Wrapped LAN

Segmented ring

In a segmented ring (Figure 11), more than two stations are wrapped on the trunk ring. Although this mode of operation is a valid FDDI LAN configuration, your LAN is probably experiencing a degraded or degrading condition.

Figure 11 Segmented Ring

When a network connection has excessively high link errors, Station Management (SMT) shuts down the connection and tries to reconnect. A dual-attachment trunk ring station with an A or B connection that is shut down is one of the wrap points in the network. See Making Your FDDI Connections More Resilient for information about keeping a dual-attachment station connection from wrapping.

Isolated station

Sometimes a network wraps a particular station out of the ring. Stations on either side of a problem station can be wrapped. This effectively isolates the station or links that have problems, as shown in Figure 12.

Figure 12 Wrapped Ring with Isolated Station

If a ring was already wrapped when a network wraps a station out of the ring, then a segmented ring results, as shown in Figure 13.

Figure 13 Segmented Ring with Isolated Stations

Twisted ring

In a twisted ring, an A port is connected to an A port and a B port is connected to a B port, instead of the normal A-to-B connections. A twisted ring, which always has two twist points (stations), can exist in either a Thru or Wrap state. You can monitor the Twisted Ring Condition and Undesired Connection Attempt Event for evidence of twisted ring and other connection problems.

Identifying the Problem

To identify the problem, follow this process:

1. At the FDDI LAN level, verify that your network is operating.


If the network is operating, the FDDI ring may be segmented, and therefore an FDDI station or an Ethernet station on an Ethernet link may have lost connectivity to other nodes on the network.

2. Determine if a ring is in a Thru, Wrap, or Segmented state.

If the FDDI ring is segmented or wrapped, look for a problem with a link somewhere in the network or for a nonfunctioning node on your trunk ring. If the ring is operating and is not segmented, or if it is segmented but you still have connectivity to the stations in question, move to a more specific level in your network.

See Monitoring FDDI Connections for more information.

3. Determine if the poorly performing station is an Ethernet or FDDI station.

If the problem is an FDDI station, find out if it is congested (that is, if the station is so busy that it cannot accept all the network traffic that is directed to it) by determining its Bandwidth Utilization. Also determine if the station has a high frame error rate by looking at the FDDI Ring Errors.

If the problem is an Ethernet station, look for congestion by examining Ethernet Packet Loss and Bandwidth Utilization.

Solving the Problem

Identify the station that is causing the disconnection and take the appropriate steps:

• If the disconnection is caused by a wrapped ring, then fix the hardware or cabling problem at that station.

• If the station is congested, you have a device problem rather than a network problem. For example, if the congested station is a file server and every other machine on the network is retrieving and saving files using that server, consider upgrading your server or adding additional servers to the network. A variety of devices from different vendors may be communicating on an FDDI or Ethernet network; some are faster and more capable, and some are slower and more prone to congestion.

• If the station is an Ethernet station that is attached to an Ethernet segment, reevaluate the setup of your Ethernet network and make some changes to improve its performance.

You can also make FDDI connections more resilient by implementing dual homing or installing an Optical Bypass Unit (OBU) where FDDI connections are prone to fail. See Making Your FDDI Connections More Resilient for more information.


Monitoring FDDI Connections

Monitor your FDDI devices for Warning or Critical alerts in the FDDI Status tool.

Status Watch

Use Status Watch to identify these FDDI connectivity errors:

• Peer Wrap Condition
• Twisted Ring Condition
• Undesired Connection Attempt Event

Follow these steps:

1. In the Device area, select the device that is located where you suspect an FDDI ring connectivity problem.

2. Monitor the FDDI Status tool for the currently selected device.

Here are some pointers for monitoring:

• If the Peer Wrap Configuration State variable is Isolated, the device is not connected to the FDDI trunk ring. If you intend the device to remain isolated, this indication is not a serious condition. However, if the device is supposed to be connected on a trunk ring, a serious problem may exist: the device is no longer transmitting packets to the larger trunk ring.

• If the Peer Wrap flag (SMTPeerWrapFlag) is set, the device is one of the wrap points. The cause of the wrapped ring is somewhere in the portion of the network between the two stations that report the peer wrap condition.

Making Your FDDI Connections More Resilient

When devices are removed from an FDDI ring, there is a break in the fiber path that causes the ring to wrap until the ring is made whole again. To prevent the break in the FDDI connection, you can implement dual homing or install an Optical Bypass Unit (OBU).

Implementing Dual Homing

When the operation of a dual attachment node is critical to your network, dual homing adds reliability by providing a backup connection if the primary link fails. Because a dual attachment station (DAS) has two attachments to the FDDI ring (A-to-M and B-to-M), you can use one of them as a standby link if the active link fails. Using dual homing, only one of the two attachments is active at a time. In this sense, a DAS acts as if it is a single attachment station (SAS) by using its A port as the standby link.

Through SMT, a DAS can be dual homed to the same concentrator or, more commonly, to two concentrators. This arrangement provides a more stable trunk ring of concentrators. If one concentrator fails, the DAS enables the standby link to another concentrator to become the active link. See Figure 14.

If the station is a dual path or dual path/dual MAC station, you can configure the dual-homed station in one of two ways:

• With both links active
• With one link active and one connection withheld as a backup, only becoming active when one link fails

Figure 14 Dual Homing Configuration

Installing an Optical Bypass Unit

You can insert an Optical Bypass Unit (OBU) into the FDDI ring as if it were a node and then plug your device into it. To use an OBU, your device needs an optical bypass interface. This interface lets the bypass know whether your device is still on the ring or not. See Figure 15.

If your device is removed or if it fails, the bypass unit diverts the optical path away from your device, keeping the ring whole. You can use a bypass on devices that are prone to failure or are likely to be removed often, such as diagnostic equipment.


Figure 15 Optical Bypass Unit Configuration

FDDI Connectivity Reference

This section explains terms that are relevant to FDDI connectivity and provides additional conceptual and problem analysis detail.

Peer Wrap Condition

A Peer Wrap (wrapped ring) condition occurs when a dual-attachment station detects a fault (often a lost connection) and reconfigures the network by wrapping the dual trunk rings to form a single ring. Normally, the two stations that are adjacent to the fault wrap to maintain full connectivity. However, if a second fault occurs before the first is repaired, the network partitions itself into two or more rings and stations lose connectivity.

When a station reports a Peer Wrap condition, locate and repair the problem that caused the station to wrap the rings. Potential causes include:

• Faulty FDDI port hardware
• Faulty cables or connectors
• Unplugged connectors
• Powered-down stations

You can expect to find the cause of the problem between the two stations that report the Peer Wrap condition.


Twisted Ring Condition

A Twisted Ring condition occurs when certain undesirable connection types exist. See Table 7 for more information. Although similar to the Undesired Connection Attempt, the Twisted Ring condition provides specific Station Management (SMT) and port information for diagnosis.

Undesired Connection Attempt Event

An Undesired Connection Attempt event occurs when a port tries to connect to another port of a type that may result in an undesirable network topology. Whether the connection attempt is successful depends on the current setting of the station's connection policies.

Table 7 lists connections that the FDDI standard defines as undesirable. The managed devices may or may not permit these connections, depending on their FDDI station configurations.

Table 7 Undesirable Connection Types

Connection Type   Reason That the Connection Is Undesirable
A-A               Twisted primary and secondary rings
A-S               A wrapped ring
B-B               Twisted primary and secondary rings
B-S               A wrapped ring
S-A               A wrapped ring
S-B               A wrapped ring
M-M               A tree of rings topology (illegal connection)

Table 8 lists FDDI connections that create valid topologies.

Table 8 Valid Connection Types

Connection Type   Reason That the Connection Is Valid
A-B               A normal trunk ring peer connection
A-M               A tree connection with possible redundancy. In a single MAC node, Port B has precedence (by default) for connecting to a Port M
B-A               A normal trunk ring peer connection
B-M               A tree connection with possible redundancy. In a single MAC node, Port B has precedence (by default) for connecting to a Port M
S-S               A single ring of two slave stations
S-M               A normal tree connection
M-A               A tree connection that provides possible redundancy
M-B               A tree connection that provides possible redundancy
M-S               A normal tree connection
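Tables 7 and 8 together cover all sixteen pairings of the A, B, S, and M port types, so they can be expressed as a simple lookup. The following Python sketch is only an illustration of that lookup. The function and dictionary names are hypothetical, and whether a real station actually permits an undesirable connection depends on its connection policies, which this sketch does not model.

```python
# Reasons copied from Table 7 (undesirable) and Table 8 (valid).
UNDESIRABLE = {
    ("A", "A"): "Twisted primary and secondary rings",
    ("A", "S"): "A wrapped ring",
    ("B", "B"): "Twisted primary and secondary rings",
    ("B", "S"): "A wrapped ring",
    ("S", "A"): "A wrapped ring",
    ("S", "B"): "A wrapped ring",
    ("M", "M"): "A tree of rings topology (illegal connection)",
}

VALID = {
    ("A", "B"): "A normal trunk ring peer connection",
    ("A", "M"): "A tree connection with possible redundancy",
    ("B", "A"): "A normal trunk ring peer connection",
    ("B", "M"): "A tree connection with possible redundancy",
    ("S", "S"): "A single ring of two slave stations",
    ("S", "M"): "A normal tree connection",
    ("M", "A"): "A tree connection that provides possible redundancy",
    ("M", "B"): "A tree connection that provides possible redundancy",
    ("M", "S"): "A normal tree connection",
}


def check_connection(local_port, remote_port):
    """Return ("undesirable" or "valid", reason) for a pair of port types."""
    key = (local_port.upper(), remote_port.upper())
    if key in UNDESIRABLE:
        return ("undesirable", UNDESIRABLE[key])
    if key in VALID:
        return ("valid", VALID[key])
    raise ValueError(f"unknown port types: {local_port}-{remote_port}")


if __name__ == "__main__":
    print(check_connection("A", "A"))
    print(check_connection("B", "M"))
```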


Disabling the Offending Interface

Because broadcast storms can ultimately cause your whole network to become unavailable, take action immediately to disable the offending interface. You can enable the interface again after you have corrected the problem.

Address Tracker

Use Address Tracker to locate the interface that is causing the broadcast storm. Use Device View to disable the port.

Follow these steps:

1. In the Find Address window, enter the address of the interface that seems to be receiving the broadcast traffic.

You can copy the MAC or IP address from the Status Watch report and paste it into Address Tracker's Enter the Address You Want to Find field.

2. Click Find Now.

Search displays the device name.

3. Use Transcend Central to launch Device View and disable the port.

Disabling the port stops the broadcast storm before it interferes with all vital network traffic. You can re-enable this interface later using Device View or the device's console.

Correcting Spanning Tree Misconfigurations

Spanning Tree does not cause broadcast storms, but a loop in your Spanning Tree topology can create data that looks like a storm. A loop can occur in your topology if:

• Someone disables Spanning Tree on a port
• You set up your Spanning Tree configuration incorrectly


Device View

Use Device View to disable any Spanning Tree port that has a repeater attached to it and to correct Spanning Tree misconfigurations.

To correct Spanning Tree misconfigurations, use Device View to disable Spanning Tree Protocol (STP) for a port on a SuperStack® II Switch 1000, Switch 3000, Switch 3000 10/100, Switch 9000SX, Desktop Switch, LinkBuilder® FMS II Bridge/Management Module, or CoreBuilder 6000.

To disable the STP port state for a port on a SuperStack II switch:

1. Select a port and click the right mouse button.

2. From the shortcut menu, select Configure.

3. In the Port section, click the STP tab.

4. From the STP Port State list box, select Disabled.

5. Click Apply.

To disable the STP port state for a port on a LinkBuilder FMS II Bridge/Management Module:

1. Double-click the module.

2. From the shortcut menu, select Configure Bridge.

3. In the Port section, click the STP tab.

Broadcast Storms Reference

This section explains terms that are relevant to broadcast storms and provides additional conceptual and problem analysis detail.

Broadcast Packets

Broadcast packets, which are a normal part of network operation, are transmitted by a device to a broadcast address. For example, IP networks use broadcasts to resolve network addresses using Address Resolution Protocol (ARP). IPX networks use a large number of broadcast packets to operate most effectively.


Problems arise when broadcast packets endlessly propagate throughout the network, which increases the traffic volume on your network and the CPU time that each host spends processing and discarding unwanted broadcast packets.

Multicast Packets

Multicast packets, which are a normal part of network operation, are transmitted by a device to a multicast group address. Hosts that want to receive the packets indicate that they want to be members of the multicast group, and then multicast packets are distributed to that group. For example, multicast packets support the Spanning Tree Protocol. Multicast applications and underlying multicast protocols control multimedia traffic and shield hosts from processing unnecessary broadcast traffic. However, multicast traffic can also cause storms that saturate your network.

Duplicate Addresses

Use these sections to identify and correct problems caused by duplicate MAC and IP addresses:

• Duplicate Addresses Overview
• Finding Duplicate MAC Addresses
• Finding Duplicate IP Addresses

See Duplicate Addresses Reference for additional conceptual and problem analysis detail.

Networks sometimes generate duplicate MAC and IP addresses. Because duplicate addresses can cause problems with packet delivery, resolve them as soon as possible.

Understanding the Problem

Duplicate MAC addresses are caused by data link layer problems with Fiber Distributed Data Interface (FDDI) media and the passing of tokens on the FDDI ring. Duplicate IP addresses are caused by network layer problems. See these sections for more information about causes of duplicate addresses:

• Duplicate MAC Addresses
• Duplicate IP Addresses

Identifying the Problem

Identify duplicate MAC and IP addresses by following the instructions in these sections:

• Finding Duplicate MAC Addresses
• Finding Duplicate IP Addresses

Solving the Problem

Identify the cause of the duplicate address (such as user error or a hardware problem) and fix the problem if possible.

Finding Duplicate MAC Addresses

To find out if duplicate MAC addresses are occurring, monitor your network using Status Watch.

Status Watch

The Status Watch FDDI Status tool identifies duplicate FDDI MAC addresses, and Status Watch reports when two or more MACs on the same ring have the same MAC address (a Duplicate Address condition).

Follow these steps:

1. In the Status Watch Summary View window, determine if any FDDI Status conditions are reported. If there are, double-click the table cell value to display the Device List window.

Another approach is to examine only the devices that you know reside on your FDDI ring. In the Status Watch main window, red device icons indicate that a threshold has been exceeded.

2. Select a device.

• If you selected the device from the Device List window, the real-time report for that device appears in the Status Watch main window.

• If you selected the device from the main window, also select the FDDI Status tool to view the real-time report.

3. Determine if a Duplicate Address condition caused the FDDI Status tool to trigger a Critical or Warning status for that device.

In Status Watch, you can specify the status severity level to apply to a Duplicate Address condition.

Finding Duplicate IP Addresses


To find out if duplicate IP addresses are occurring, monitor your network using these applications:

• Address Tracker - To find duplicate IP addresses on 3Com devices and their attached networks

• LANsentry® Manager - To find duplicate IP addresses that are collected by probes gathering RMON2 SmartAgent® data from the Enterprise Communications Analysis Module (ECAM) downloaded on your network devices

Address Tracker

Use Address Tracker to determine when and where duplicate IP addresses occur.

Follow these steps:

1. From the Find Address menu, select Find Duplicate IP Addresses.

2. Click Find Now to start your search.

LANsentry Manager

Use the Duplicates table in LANsentry Manager to compile a list of all stations with duplicate IP addresses. This table is available only on probes that have downloaded RMON2 (ECAM) SmartAgent software.

Follow these steps:

1. From the LANsentry Manager Address Map menu, select Duplicates. Address Map data is displayed as a table.

2. To export the contents of the table, click Export to launch the Data Export dialog box.

Duplicate Addresses Reference

This section explains terms that are relevant to duplicate addresses and provides additional conceptual and problem analysis detail.


Duplicate MAC Addresses

Each device on your network has a unique MAC address. This address identifies a single device on the network, allowing packets to be delivered to correct destinations.

Packets are delivered to their destinations by means of MAC-address-to-IP-address translation that the Address Resolution Protocol (ARP) provides. Therefore, if MAC addresses are duplicated on the network, ARP caches of routing devices contain erroneous destinations. In FDDI, devices monitor network traffic, searching for their own MAC address in each packet to determine whether to decode the packet. If MAC addresses are not unique, two stations cannot be distinguished from each other.

Duplicate MAC addresses can occur for the following reasons:

• Someone has manually configured a MAC address for a device instead of using the address that the vendor supplied or allowing it to be assigned dynamically, and this address is also assigned to a different device.

• In rare circumstances, loops in a bridged network can cause a MAC hardware problem or an address learning problem that creates a duplicate MAC address entry in the bridging address table.

• On DECnet Phase 4 networks, MAC addresses are set from the DECnet address. A duplicate NET address can cause a duplicate MAC address.

Duplicate IP Addresses

Because IP addresses are critical for transmission of packets on TCP/IP networks, resolve duplicate IP addresses immediately.

Duplicate IP addresses can occur when someone has configured an IP address that is identical to an IP address that is assigned to a different device. Address assignments, although possible for you to configure manually, are usually made using one of these protocols:

• Dynamic Host Configuration Protocol (DHCP) - Allows your network to dynamically assign IP addresses to nodes. With this protocol, a DHCP server temporarily assigns an IP address to a node, or you can statically configure addresses as needed.

• BOOTstrap protocol (BootP) - Allows you to statically assign IP addresses to nodes. This protocol is more efficient than RARP.

• Reverse ARP (RARP) - Allows you to statically assign IP addresses to nodes. However, because this protocol relies on the MAC address to identify the node, you cannot use it on networks that dynamically assign hardware addresses.
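The idea of a duplicate IP address can be illustrated by grouping address mappings by IP address and reporting any IP address that maps to more than one MAC address. The following Python sketch is a generic illustration under stated assumptions: it does not reproduce how Address Tracker or LANsentry Manager work, the function name is hypothetical, and the input is an assumed list of (IP address, MAC address) pairs such as might be exported from an ARP cache. The sample addresses come from ranges reserved for documentation.

```python
from collections import defaultdict


def find_duplicate_ips(entries):
    """Return {ip: sorted MAC list} for every IP seen with more than one MAC.

    entries: iterable of (ip_address, mac_address) string pairs (assumed format).
    """
    macs_by_ip = defaultdict(set)
    for ip, mac in entries:
        macs_by_ip[ip].add(mac.lower())  # normalize case so one MAC is not counted twice
    return {ip: sorted(macs) for ip, macs in macs_by_ip.items() if len(macs) > 1}


if __name__ == "__main__":
    sample = [
        ("192.0.2.10", "00:00:5E:00:53:01"),
        ("192.0.2.11", "00:00:5E:00:53:03"),
        ("192.0.2.10", "00:00:5E:00:53:02"),  # same IP, different MAC
    ]
    for ip, macs in find_duplicate_ips(sample).items():
        print(ip, macs)
```

Swapping the roles of the two fields in the same grouping would report MAC addresses that appear with more than one IP address, which is only a hint of a duplicate MAC address because one interface can legitimately hold several IP addresses.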


Ethernet Packet Loss

Use these sections to identify and correct Ethernet packet loss:

• Ethernet Packet Loss Overview
• Searching for Packet Loss

See Ethernet Packet Loss Reference for additional conceptual and problem analysis detail.

If your Ethernet network shows signs of congestion, it may be experiencing packet loss. When your network is congested, utilization is usually high, packets are discarded because buffers are full, and collision rates are up. Problems related to Collisions are often at the heart of packet loss.

Understanding the Problem

Collisions are normal in Ethernet networks. In many cases, Collision rates of 50 percent do not cause a large decrease in throughput. The Collision rate helps mark the upper limit on your network (the maximum percentage of collisions that your network can bear), which is usually around 70 percent. If Collisions increase above this upper limit, your network can become unreliable.

When the Collision rates increase, so do Excessive Collisions, which causes a delay in transmitting data. An increase in Collisions also indicates that network utilization and network errors, such as FCS Errors, are probably increasing.

The real packet problems to watch for, however, are undetected collisions that show up as Late Collisions.

Identifying the Problem

To identify that your network's problem is related to packet loss, verify that frames are being dropped on your network by examining this packet loss data:

• Alignment Errors
• Collisions
• Excessive Collisions
• FCS Errors
• CRC Errors
• Late Collisions
• Receive Discards
• Too Long Errors
• Too Short Errors
• Transmit Discards


The process of identifying the problem is discussed in Searching for Packet Loss.

Solving the Problem

If you notice that packet loss data is consistently high, then your network is too congested. In this case, segment your network with the appropriate network device (such as a switch or router). If Collision data shows increases but your network's utilization is the same, then your network may have a physical problem, such as cabling that is too long. Other problems that packet loss data can indicate include:

• Faulty connectors or improper cabling
• Excessive numbers of repeaters between network devices
• Defective Ethernet transceivers or controllers

Possible solutions to these problems are explained in the procedures in Searching for Packet Loss.

Searching for Packet Loss

When you look for packet loss, use the following applications:

• Status Watch - For Ethernet and MIB-II data collection using SNMP polling
• LANsentry Manager Network Statistics Graph - For RMON data collection using an RMON probe
• Device View - On a per-device basis, you can evaluate statistics for any port on the device

Status Watch

Status Watch monitors:

• Alignment Errors
• Excessive Collisions
• FCS Errors
• Receive Discards
• Transmit Discards

Follow these steps:

1. Determine if the thresholds for the Alignment Errors tool and FCS Errors tool are being exceeded.


Table 16 identifies the problems that this data can indicate and your possible actions. For information about problems related to a nonstandard Ethernet implementation, see Nonstandard Ethernet Problems.

Table 16 Alignment Errors, FCS Errors, and CRC Errors Data

Possible Problem: Faulty cabling
Possible Action: Examine the cable and cable connections for breaks or damage.

Possible Problem: Network noise
Possible Action: Look for improper cabling, faulty cable, faulty network equipment, or cables that are too close to equipment that emits electromagnetic interference (lamps, for example).

Possible Problem: Faulty transceiver
Possible Action: Use an analyzer to identify the problematic transceiver. If necessary, replace the transceiver, network adapter, or station.

Possible Problem: Fault at the transmitting end station
Possible Action:
1. Locate the source of the errors by looking at the module and port statistics.
2. Verify the correct operation of the transceiver or adapter card of the device that is connected to the problem port.
3. If the card appears to be operating correctly, examine the cable and cable connections for breaks or damage.

Possible Problem: Station powering up or down
Possible Action: None required.
Early implementations of Ethernet transceivers generate a significant amount of in-band noise when powering up; they frequently cause Alignment Errors and FCS Errors in an otherwise stable network.
When powering up, some software drivers for Ethernet controllers also initiate Time Domain Reflectometry (TDR) tests to test the Ethernet media. Network monitors report TDR tests as Alignment Errors and FCS Errors.

Possible Problem: Faulty adapter
Possible Action: Replace the adapter.

2. Determine if the Excessive Collisions tool threshold is being exceeded. Table 17 identifies the problems that this data can indicate and your possible actions.

Table 17 Collisions and Excessive Collisions Data

Possible Problem: Busy network
Possible Action: Use a bridge, router, or switch to reconfigure your network into segments with fewer stations.

Possible Problem: Faulty device (adapter, switch, hub, and the like) that does not listen before broadcasting. This problem increases the incidence of all types of collisions.
Possible Action: Isolate each adapter to see if the problem stops.

Possible Problem: Network loop
Possible Action: Ensure that no redundant connections to the same station have both connections active simultaneously.


3. Determine if the Receive Discards and Transmit Discards tools' thresholds are being exceeded.

If these errors are high in conjunction with the data that you learned in steps 1 and 2, then your network is overloaded. Segment your network.

LANsentry Manager Network Statistics Graph

Use the LANsentry Manager Network Statistics graph to view data for:

• Collisions
• Late Collisions
• Bandwidth Utilization
• CRC Errors
• Too Long Errors
• Too Short Errors

Follow these steps:

1. Display a Network Statistics graph for the local Ethernet segment on which users have reported poor performance.

This graph shows the most recent trend in Collision rates. If you have set up a History sample, you can also look at the historical trend. If a number of segments are connected by repeaters, examine the graph for each Ethernet segment.

2. Analyze Utilization and Collision rates to determine whether collisions are caused by an overloaded segment or a faulty component.

• If Utilization rates are high - The collisions are probably caused by an overloaded segment. If you have added nodes or new applications to your network, consider reconfiguring the cabling system using bridges and routers to filter out remote collisions and to keep local traffic on one segment. This action should level the network load.

• If Utilization rates are stable and appear normal - The collisions are probably caused by faulty components. In this case, do the following:

• If the network consists of repeaters - Compare the Network Statistics graphs for each segment connected to the repeater. Because repeaters repeat traffic across all connected segments (which makes many segments seem like one network), you should see similar levels of traffic on all segments. One segment that shows dissimilar levels of traffic and collisions may indicate faulty hardware. In this case, monitor several collisions to track the source station that is transmitting too soon after collisions, and repair the station. Packets that are transmitted too soon after collisions are unlikely to be valid. See Table 17 for more information about Collisions.

• On other networks - Determine the segment cable length.

3. Examine the CRC Errors and Late Collisions, which often indicate cabling or component problems.

Table 16 identifies the problems that CRC Errors can indicate and your possible actions. Table 18 identifies the problems that Late Collisions data can indicate and your possible actions.

Table 18 Late Collisions Data

Possible Problem: Cabling problems
• Segment too long
• Failing cable
• Segment not grounded properly (noise)
• Improper termination
• Taps too close (10BASE-5 and 10BASE-2 only)
• Noisy cable
Possible Action: Correct the cabling problem by doing one or more of the following:
• Reduce the segment length
• Replace the cable
• Ground the cable
• Terminate the cable correctly
• Check the taps
• Check for cables too close to equipment that emits electromagnetic interference

Possible Problem: Component problems
• Deaf or partially deaf node
• Failing repeater, transceiver, or controller cards
Possible Action: Correct the component problem by doing one of the following:
• Trace the failing component and replace it
• Replace the NIC or the transceiver

4. Trace Too Short Errors and Too Long Errors to the sender.

These errors often indicate faulty routers or LAN drivers and transceiver problems. Table 19 identifies the problems that this data can indicate and your possible actions.

Table 19

Possible Problem: A transceiver on your network is adding bits to the packets that are transmitted by the attached station.
Possible Action:
1. Use a network analyzer to identify the problematic transceiver.
2. If necessary, replace the transceiver, network adapter, or station.

Possible Problem: The jabber protection mechanism on a transceiver has failed; it can no longer protect the network from the jabbering produced by the attached station.
Possible Action: Replace the network card.

Possible Problem: Excessive noise on the cable.
Note: Some 10/100 Mbps cards that autodetect the network speed may connect to the network at the wrong speed, causing excessive noise.
Possible Action: Check for improper cabling, faulty cable, faulty network equipment, or cables too close to noisy electronic equipment (lamps, for example).
If your network card autodetects the network speed and you have ruled out other problems, manually configure the network speed.

Possible Problem: Faulty routers (two different network types are connected and the router is not enforcing proper frame size restrictions)
Possible Action: Notify the manufacturer.

Possible Problem: Faulty LAN driver
Possible Action: Replace the driver.

Possible Problem: A normal condition on a LinkSwitch® 1000, LinkSwitch® 3000, or CoreBuilder 5000 FastModule.
If you use maximum-sized 1518-byte Ethernet frames, the device's VLT-enabled ports add a frame tag of 4 bytes, resulting in a misleading Too Long Frame error.
These frames are passed successfully but will create the Too Long Frame error message.
Possible Action: If you want to eliminate the error message, reduce your Ethernet packet frames by 4 bytes.
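The arithmetic behind the misleading Too Long Frame error in the last case can be sketched as follows. This Python snippet is illustrative only. The constant and function names are hypothetical, and it assumes that the error is counted whenever the tagged frame exceeds the 1518-byte maximum quoted above.

```python
MAX_ETHERNET_FRAME_BYTES = 1518  # maximum-sized Ethernet frame quoted in the text
VLT_TAG_BYTES = 4                # frame tag added by a VLT-enabled port


def tagged_length(frame_bytes):
    """Length of a frame after a VLT-enabled port adds its 4-byte tag."""
    return frame_bytes + VLT_TAG_BYTES


def reports_too_long(frame_bytes):
    """True if the tagged frame exceeds the maximum (assumed counting rule)."""
    return tagged_length(frame_bytes) > MAX_ETHERNET_FRAME_BYTES


if __name__ == "__main__":
    for size in (1518, 1514):
        print(size, tagged_length(size), reports_too_long(size))
```

Under this assumption, a 1518-byte frame becomes 1522 bytes and is counted, while a frame reduced by 4 bytes to 1514 bytes stays at the limit after tagging and is not.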

Device View

Device View allows you to display a variety of port and device-level statistics relevant to Ethernet packet loss. Table 20 describes these statistics and their use in troubleshooting.

Table 20 Activity and Error Statistics in Device View

Statistics Group: Activity
Description: Displays the total network activity and errors on the selected port.
Use in Troubleshooting: This data shows readable packets, broadcast packets, Collisions, total errors, and runts, which cause Too Short Errors. You can interpret this data in the following ways:
• The presence of runts can often be caused by Collisions; however, if the values increase at specific times of the day, it may indicate you need to change the network topology to manage the traffic more efficiently (for example, with switches or routers).
• Runts can also be caused by a badly terminated coax cable.
• Large numbers of runts not associated with high levels of collisions can indicate a transmission problem (examine the cable).
• Particularly high numbers of Collisions compared to the total number of readable packets can point to a hardware problem (a bad adapter) or to a data loop.
• A high proportion of Broadcast Packets (>10%) on a heavily utilized network (>50% of available bandwidth) can point to an incorrectly configured bridge or router on the network.

Statistics Group: Errors
Description: Displays the number of frames with errors on the selected port.
Use in Troubleshooting: The significance of errors depends on accompanying errors and prevailing network conditions. See the following error data for more information:
• Alignment Errors: Table 16
• FCS Errors: Table 16
• Too Long Errors: Table 19
• Too Short Errors or runts: Table 19
• Late Collisions: Table 18

To display Activity and Errors statistics for a device or port, follow these steps:

1. Select the required port or device.

2. From the shortcut menu, select Activity or Errors.

The statistics available depend on the type of port or device selected. See Table 20 for troubleshooting information.

Ethernet Packet Loss Reference

This section explains terms that are relevant to Ethernet packet loss and provides additional conceptual and problem analysis detail.

Alignment Errors

An Alignment Error indicates a received frame in which both of the following are true:

• The number of bits received is an uneven byte count (that is, not an integral multiple of 8).

• The frame has a Frame Check Sequence (FCS) error.

Alignment Errors often result from MAC layer packet formation problems, cabling problems that cause corrupted or lost data, and packets that pass through more than two cascaded multiport transceivers. See FCS Errors for more information about interpreting Alignment Errors.


Collisions

Collisions indicate that two devices detect that the network is idle and try to send packets at exactly the same time (within one round-trip delay). Because only one device can transmit at a time, both devices must stop sending and attempt to retransmit. Collisions are detected by the transmitting stations.

The retransmission algorithm helps to ensure that the packets do not retransmit at the same time. However, if the two devices retry at nearly the same time, packets can collide again; the process repeats until either the packets finally pass onto the network without collisions or 16 consecutive collisions occur and the packets are discarded.
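The text does not name the retransmission algorithm. Standard half-duplex Ethernet (IEEE 802.3 CSMA/CD) uses truncated binary exponential backoff, and the following Python sketch assumes that algorithm. After the nth consecutive collision, a station waits a random number of slot times between 0 and 2^min(n, 10) - 1, and it gives up after 16 consecutive collisions. The function name is hypothetical.

```python
import random

ATTEMPT_LIMIT = 16  # frame is discarded after 16 consecutive collisions
BACKOFF_LIMIT = 10  # the backoff exponent stops growing after 10 collisions


def backoff_slots(collision_count, rng=random):
    """Return the number of slot times to wait after the given consecutive
    collision count, or None if the frame should be discarded."""
    if collision_count >= ATTEMPT_LIMIT:
        return None  # counted as an excessive collision; the frame is dropped
    exponent = min(collision_count, BACKOFF_LIMIT)
    return rng.randrange(2 ** exponent)  # uniform in 0 .. 2**exponent - 1


if __name__ == "__main__":
    rng = random.Random(1)  # seeded only to make the demonstration repeatable
    for n in (1, 2, 3, 10, 15, 16):
        print(n, backoff_slots(n, rng))
```

Because each station draws its delay independently, two stations that collided are unlikely to pick the same delay repeatedly, which is why repeated collisions on a healthy network usually resolve well before the limit.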

CRC Errors

A Cyclic Redundancy Check (CRC) Error is an RMON statistic that combines FCS Errors and Alignment Errors. These errors indicate that packets were received with:

• A bad FCS and an integral number of octets (FCS Errors)
• A bad FCS and a non-integral number of octets (Alignment Errors)

CRC Errors can cause an end station to freeze. If a large number of CRC Errors are attributed to a single station on the network, replace the station's network interface board. Typically, a CRC Error rate of more than 1 percent of network traffic is considered excessive.
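The 1 percent guideline can be checked from two counters. The following Python sketch assumes that "network traffic" means the total number of packets received over the same interval as the CRC Error count. That interpretation and the function names are assumptions, not something the text specifies.

```python
def crc_error_rate_percent(crc_errors, total_packets):
    """CRC Errors as a percentage of all packets in the same interval."""
    if total_packets <= 0:
        return 0.0  # no traffic observed, so no meaningful rate
    return 100.0 * crc_errors / total_packets


def is_excessive(crc_errors, total_packets, threshold_percent=1.0):
    """True if the CRC Error rate is above the threshold (1 percent by default)."""
    return crc_error_rate_percent(crc_errors, total_packets) > threshold_percent


if __name__ == "__main__":
    print(crc_error_rate_percent(15, 1000), is_excessive(15, 1000))
    print(crc_error_rate_percent(2, 1000), is_excessive(2, 1000))
```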

Excessive Collisions

Excessive Collisions indicate that 16 consecutive collisions have occurred, usually a sign that the network is becoming congested. For each excessive collision count (or after 16 consecutive collisions), a packet is dropped. If you know the normal rate of excessive collisions, then you can determine when the rate of packet loss is affecting your network's performance. See Knowing Your Network's Configuration for more information.

FCS Errors

Frame Check Sequence (FCS) Errors, a type of CRC error, indicate that frames received by an interface are an integral number of octets long but do not pass the FCS check. The FCS is a mathematical way to ensure that all the frame's bits are correct without having the system examine each bit and compare it to the original. Packets with Alignment Errors also generate FCS Errors.
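As a rough illustration of the FCS check, the sketch below recomputes a checksum over a frame and compares it with the trailing four octets. It assumes the Ethernet FCS is the CRC-32 that Python's `zlib.crc32` computes, stored least significant byte first, which is how software tools commonly reproduce it. The helper names are hypothetical.

```python
import struct
import zlib


def append_fcs(frame_without_fcs: bytes) -> bytes:
    """Return the frame with a 4-octet CRC-32 appended (assumed byte order)."""
    fcs = zlib.crc32(frame_without_fcs) & 0xFFFFFFFF
    return frame_without_fcs + struct.pack("<I", fcs)


def fcs_is_valid(frame: bytes) -> bool:
    """Recompute the CRC-32 over the body and compare it with the trailer."""
    if len(frame) < 4:
        return False
    body, received = frame[:-4], frame[-4:]
    expected = struct.pack("<I", zlib.crc32(body) & 0xFFFFFFFF)
    return received == expected


if __name__ == "__main__":
    good = append_fcs(bytes(60))              # 60 octets of payload plus FCS
    damaged = bytes([good[0] ^ 0x01]) + good[1:]  # flip a single bit
    print(fcs_is_valid(good), fcs_is_valid(damaged))
```

A single flipped bit anywhere in the frame changes the recomputed value, so the receiver can reject the frame without comparing it to the original.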

Both Alignment Errors and FCS Errors can be caused by equipment powering up or down, or by interference (noise) on unshielded twisted-pair (10BASE-T) segments. In a network that complies with the Ethernet standard, FCS or Alignment Errors indicate bit errors during a transmission or reception. A very low rate is acceptable. Although Ethernet allows a 1 in 10^8 bit error rate, typical Ethernet performance is 1 in 10^12 or better.
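To put the two bit error rates in perspective, the following sketch converts a bit error rate into the chance that a single frame is damaged. It assumes bit errors are independent, which is a simplification, and the function name is hypothetical.

```python
import math


def frame_error_probability(bit_error_rate: float, frame_octets: int) -> float:
    """Probability that at least one bit in the frame is wrong.

    Assumes independent bit errors: 1 - (1 - BER) ** bits, computed with
    log1p/expm1 so that very small rates do not lose precision.
    """
    bits = frame_octets * 8
    return -math.expm1(bits * math.log1p(-bit_error_rate))


if __name__ == "__main__":
    for ber in (1e-8, 1e-12):
        p = frame_error_probability(ber, 1518)
        print(f"BER {ber:g}: about 1 damaged frame in {1 / p:,.0f}")
```

Under these assumptions, a maximum-size frame at the allowed 1 in 10^8 rate is damaged roughly once in 8,000 frames, while at 1 in 10^12 it is roughly once in 80 million.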

Late Collisions

Late Collisions indicate that two devices have transmitted at the same time, but cabling errors (most commonly, excessive network segment length or repeaters between devices) prevent either transmitting device from detecting a collision. Neither device detects a collision because the time to propagate the signal from one end of the network to the other is longer than the time to put the entire packet on the network. As a result, neither of the devices that cause the late collision senses the other's transmission until the entire packet is on the network.

Although late collisions occur for small packets, the transmitter cannot detect them. As a result, a network suffering measurable Late Collisions for large packets is losing small packets as well.
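The timing argument can be checked with simple arithmetic. The sketch below assumes 10 Mbps Ethernet and models detection as requiring the full round-trip delay to fit inside the time the sender spends transmitting. The function names and the 60-microsecond delay are hypothetical values chosen for illustration.

```python
def transmit_time_us(frame_octets: int, mbps: float = 10.0) -> float:
    """Microseconds needed to put the whole frame on the wire."""
    return frame_octets * 8 / mbps


def collision_goes_undetected(frame_octets: int, round_trip_delay_us: float,
                              mbps: float = 10.0) -> bool:
    """True when the sender has finished before a collision could reach it."""
    return round_trip_delay_us > transmit_time_us(frame_octets, mbps)


if __name__ == "__main__":
    delay_us = 60.0  # hypothetical round-trip delay on an over-long segment
    for octets in (64, 1518):
        missed = collision_goes_undetected(octets, delay_us)
        print(octets, "octets:", "lost silently" if missed else "late collision seen")
```

With these numbers a 64-octet packet is already fully sent when the collision arrives, so it is lost silently, while a 1518-octet packet is still being sent and the station counts a Late Collision.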

Nonstandard Ethernet Problems

Table 21 lists the symptoms that typically occur if a system violates the Ethernet standard.

Table 21. Symptoms of Common Ethernet Network Problems

Symptoms: FCS Errors and Alignment Errors increase significantly.
Problem: Network cabling is too long.
Notes: If you use a promiscuous network monitor, the number of Late Collisions reported by stations should correlate with the FCS and Alignment Errors reported by the monitor.

Symptoms: FCS and Alignment Errors increase proportionally with interference (sometimes referred to as noise hits).
Problem: Network segment is noisy.
Notes: Typically observed on a 10BASE-T network segment in a noisy environment. If you use multiple promiscuous monitors, the FCS and Alignment Errors among the monitors will not correlate. If the monitor can track runts (also called Too Short Errors), the number of runt packets should be significantly higher than normal.

Symptoms: FCS and Alignment Errors are much higher than normal.
Problem: Networks do not conform to the access scheme of Carrier Sense Multiple Access with Collision Detect (CSMA/CD).
Notes: Occurs when some implementations of Ethernet in the segment are not entirely compatible with IEEE 802.3 repeaters. Collision fragments linger on the network long enough to collide with retry packets at the minimum interpacket gap (IPG). The IPG is smaller on one side of the repeated network, causing a lost packet. Ethernet controllers cannot receive packets that are separated by 4.7 µs or less. Some controllers cannot sustain receptions of packets separated by as much as 9.6 µs. If runt packets are received one after another and are followed by a collision fragment, Ethernet controllers that cannot sustain reception will lose packets.

Receive Discards

Receive Discards indicate that received packets could not be delivered to a high-layer protocol because of congestion or packet errors.

Too Long Errors

A Too Long Error indicates that a packet is longer than 1518 octets (including FCS octets) but otherwise well formed. Too Long Errors are often caused by a bad transceiver, a malfunction of the jabber protection mechanism on a transceiver, or excessive noise on the cable.

Too Short Errors

A Too Short Error, also called a runt, indicates that a packet is fewer than 64 octets long (including FCS octets) but otherwise well formed.
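The length and FCS definitions given in this reference can be combined into one small classifier. This is a sketch that uses only the four definitions in the text (FCS Error, Alignment Error, Too Long Error, Too Short Error). Frames that have both a bad FCS and an illegal length fall outside those definitions and are labeled "Other" here. The function name is hypothetical.

```python
MIN_OCTETS = 64     # shortest legal frame, including FCS octets
MAX_OCTETS = 1518   # longest legal frame, including FCS octets


def classify_frame(bit_length: int, fcs_ok: bool) -> str:
    """Name the error counter that a received frame would fall under."""
    octets, extra_bits = divmod(bit_length, 8)
    if fcs_ok and extra_bits == 0:
        # Well formed apart from (possibly) its length.
        if octets > MAX_OCTETS:
            return "Too Long Error"
        if octets < MIN_OCTETS:
            return "Too Short Error"
        return "OK"
    if not fcs_ok and MIN_OCTETS <= octets <= MAX_OCTETS:
        # Bad FCS: a whole number of octets is an FCS Error,
        # leftover bits make it an Alignment Error.
        return "Alignment Error" if extra_bits else "FCS Error"
    return "Other"  # not covered by the four definitions in this section


if __name__ == "__main__":
    print(classify_frame(63 * 8, True))
    print(classify_frame(100 * 8 + 3, False))
```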

Transmit Discards

Transmit Discards indicate that packets were not transmitted because of network congestion.

FDDI Ring Errors

Use these sections to identify and correct FDDI ring errors:

• FDDI Ring Errors Overview
• Identifying Ring Errors

Fiber Distributed Data Interface (FDDI) often corrects its own problems. However, because FDDI cannot correct all errors (especially those related to hardware problems), you should monitor FDDI errors.

Understanding the Problem

FDDI ring errors that you should monitor include:

• Elasticity Buffer Error Condition
• Frame Error Condition
• Frames Not Copied Condition
• Link Error Condition

Identifying the Problem

First, determine the type of FDDI ring errors and where they are occurring. Similar to the way you identify other FDDI problems, identify the upstream and downstream neighbors of the devices that you are monitoring.

Several types of network errors can cause FDDI performance problems. For example, problems with cables or physical connections may result in a link or frame error. Elasticity buffer (EB) errors can also lead to link and frame errors.

FDDI deals with port-related errors as follows:

• The variable PORTlerAlarm is the link error rate (LER) value at which a link connection generates an alarm. When the LER is greater than the alarm setting, Station Management (SMT) sends a Status Report Frame (SRF) to notify you that there is a problem with a port.

The PORTlerAlarm threshold is set lower than the PORTlerCutoff threshold so that you are notified of a problem before the port is actually removed from the ring.

• When link errors reach the threshold defined by the variable PORTlerCutoff, SMT breaks the connection, disabling the PHY that detected the problem. A Link Error Condition is also generated.

FDDI deals with MAC-related errors as follows:

• When MAC frame errors reach a certain threshold, a Frame Error Condition is generated. Because the actual error can be further upstream than the immediate connection, the connection remains intact.

• For a large network, the worst case MACFrameErrorRatio is less than 0.1 percent. However, during network configuration, frame error ratios can reach 50 percent for short periods. When you detect a sustained frame error ratio of more than 0.1 percent, a problem exists between the station that is reporting the condition and the nearest upstream MAC.
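The "sustained" test in the last item can be sketched as a check over successive polling samples. The sketch below is an illustration only: it treats the frame error ratio as error frames divided by total frames, which is a simplification of how SMT derives MACFrameErrorRatio, and the three-sample window and function names are hypothetical choices.

```python
FRAME_ERROR_THRESHOLD = 0.001  # 0.1 percent, the sustained level named above


def frame_error_ratio(error_frames: int, total_frames: int) -> float:
    """Simplified ratio of frames with errors to all frames in one sample."""
    if total_frames <= 0:
        return 0.0
    return error_frames / total_frames


def sustained_problem(samples, threshold=FRAME_ERROR_THRESHOLD,
                      min_consecutive=3) -> bool:
    """True when `min_consecutive` samples in a row exceed the threshold.

    `samples` is an iterable of (error_frames, total_frames) pairs, one per
    polling interval. Short spikes, such as those seen during network
    configuration, reset the run and are ignored.
    """
    run = 0
    for errors, total in samples:
        if frame_error_ratio(errors, total) > threshold:
            run += 1
            if run >= min_consecutive:
                return True
        else:
            run = 0
    return False
```

For example, `sustained_problem([(500, 1000), (0, 1000), (0, 1000)])` ignores a single 50 percent spike, while three consecutive samples of `(5, 1000)` would be flagged.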

Solving the Problem

To solve problems related to FDDI errors, fix the hardware, cabling, or congestion problem.

Identifying Ring Errors


Use Status Watch to monitor your FDDI devices for Warning or Critical alerts.

Status Watch

Use Status Watch to identify FDDI ring errors.

Follow these steps:

1. Monitor the FDDI Status tool for the currently selected device.

2. Determine whether Status Watch is reporting Elasticity Buffer Errors or a high percentage of Frame Errors, Frames Not Copied, or Link Error Rates for the currently selected device.

FDDI Ring Errors Reference

This section provides additional conceptual and problem analysis detail.

Elasticity Buffer Error Condition

The Elasticity Buffer Error condition occurs when a port's elasticity buffer overflows or underflows. This condition usually indicates that a port's hardware is not operating within the tolerances that the FDDI standard specifies. Look for the problem in the hardware of either the port that is reporting the condition or of the immediately adjacent port.

Frame Error Condition

The Frame Error condition occurs when the percentage of frames that contain errors exceeds a preset threshold. In the situation when a device is an uplink to FDDI (that is, a device is transmitting onto FDDI), this type of condition indicates that the ring is saturated. The ring is out of buffer space, and packets are being dropped from the device's backbone port.

The problem indicated by the frame errors is usually located between the MAC that reports the condition and its upstream neighbor. Because many physical connections can lie along this path, the MACFrameErrorRatio variable identifies only the two MACs between which the problem is occurring.

Frames Not Copied Condition

The Frames Not Copied condition occurs when the percentage of frames that are dropped because of insufficient buffer space exceeds a preset threshold. This condition indicates that the station is congested and is unable to process frames as quickly as they arrive. To help eliminate congestion:

• Add more capacity to the station
• Reconfigure your network so that end stations that communicate heavily with one another are on the same bridge or switch
• Filter out certain traffic

Link Error Condition

The Link Error condition occurs when a port detects link errors at a rate that exceeds a preset threshold. When the Link Error threshold is exceeded, the station removes itself from the ring and tries to reinsert itself on the ring. This action creates a MAC Neighbor Change Event (which also occurs if a ring wraps).

Link errors may indicate an FDDI PHY hardware problem (such as a faulty transmitter) or a faulty cable or connector. Look for the problem in the portion of the network between the port that is reporting the condition and the first upstream transmitter.

MAC Neighbor Change Event

The MAC Neighbor Change event occurs when a MAC's upstream or downstream neighbor changes.

This event indicates either:

• A network reconfiguration
• Another station that is leaving or joining the ring

Network File Server Timeouts

Use these sections to identify and correct timeouts on network file servers:

• Network File Server Timeout Overview
• Looking for Obvious Errors
• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

See Network File Server Timeouts Reference for additional conceptual and problem analysis detail.

A network file server can time out if your network gets congested or if your server is having problems. Users might have problems downloading data from or to the server, or copying files from or to the server. To help you to understand the troubleshooting process for this type of problem, an EXAMPLE throughout this section follows the symptoms, analysis, and resolution of a typical file server timeout problem.


Understanding the Problem

When users log in, their stations make network file server calls either to determine quotas (if this feature has been enabled) or to mount user home directories. The network file server timeout messages, even when spread across multiple nodes, indicate a problem either with the network or with a server.

EXAMPLE: UNIX users notice that it takes a long time (over 30 seconds in some cases) to log in to any machine. Some machines report network file server timeout messages, but the messages have no obvious pattern and are infrequent. You begin to get a sense of the problem.

Identifying the Problem

First, rule out the obvious causes. Ask these questions:

• Can you access the network file server with Telnet?
• Have any alarms been triggered?
• Are there any new errors?

The process of identifying the problem is developed in Looking for Obvious Errors.

Solving the Problem

To determine the cause, reproduce the fault while you monitor the network. After you know the cause, you can fix the problem.

The solutions to the network file server timeout are identified in these sections:

• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

Looking for Obvious Errors

To look for obvious errors, use these applications:

• Ping and Telnet - To check connectivity to the network file server nodes
• LANsentry Manager Alarms View - To search for triggered alarms
• LANsentry Manager Statistics View - To look for errors
• LANsentry Manager History View - To identify trends


Ping and Telnet

Determine whether you can contact network file server nodes using Ping and Telnet. If the response is extremely slow, then a problem may exist with the connections to the nodes. No delay indicates that the connections are normal, implying that the delay is occurring elsewhere. In this case, use LANsentry® Manager tools to determine whether packets are being lost or ignored.

LANsentry Manager Alarms View

Using the LANsentry Manager Alarms View, you can determine if any configured alarms have been triggered.

Search the Alarms View to see if any MAC events have been logged.

EXAMPLE: MAC events have not been logged for the network on which the UNIX users are attached.

Even though no alarms have occurred, errors may exist. For example, a lower rate of background errors may exist just below the alarm threshold. Because they are based on maximum and minimum values, RMON alarms may miss constant, periodic, or low amounts of errors.

LANsentry Manager Statistics View

Using the LANsentry Manager Statistics View, you can display a multisegment graph of utilization and error statistics.

Follow these steps:

1. Set up a graph that shows utilization and errors on all your major segments.

2. Determine whether any segments are particularly busy or error prone.

EXAMPLE: You notice that one segment of the UNIX network, HUB3, is reporting Too Long Errors and FCS Errors roughly every second sample. While the amount is not higher than normal, it is currently higher than any other segment.

LANsentry Manager History View

Using the LANsentry Manager History View, you can display a rolling history table to determine if the errors that you are seeing are new. For example, if you have a history table that runs for 30-minute samples over two days, you can compare the most recent sample to a previous sample, looking for new errors. If your probe has the resources, use a much finer resolution sample stored for a shorter time (every 30 seconds for 2 hours) to more easily spot recent errors.

EXAMPLE: You see that the history table shows that no error rates remained constant throughout the day. However, errors that did occur were on the device HUB3 and were Too Long Errors and FCS Errors.
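The trade-off between the two history configurations comes down to how many samples the probe has to store. A small calculation, with a hypothetical helper name, makes the comparison concrete:

```python
from datetime import timedelta


def buckets_needed(interval: timedelta, span: timedelta) -> int:
    """Number of history samples needed to cover `span` at `interval`."""
    return -(-span // interval)  # ceiling division on timedeltas


if __name__ == "__main__":
    coarse = buckets_needed(timedelta(minutes=30), timedelta(days=2))
    fine = buckets_needed(timedelta(seconds=30), timedelta(hours=2))
    print("30-minute samples over two days:", coarse)
    print("30-second samples over two hours:", fine)
```

The finer table covers a much shorter period yet needs more storage on the probe (240 samples against 96), which is why the text qualifies it with "if your probe has the resources".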

Reproducing the Fault While Monitoring the Network

Although the RMON View in LANsentry Manager can show error rates and help you to identify the location of the problem, it may not provide enough data to solve the problem. To determine the cause of the problem, reproduce it while you monitor the network by using these applications:

• LANsentry Manager Top-N Graph - To locate a quiet node to use for reproducing the fault

• LANsentry Manager Packet Capture - To capture packets from the hub to which the quiet node is attached

• LANsentry Manager Packet Decode - To analyze the packets to assess network file server traffic and delays

• Address Tracker - To find the location of the problem nodes

EXAMPLE: Using LANsentry Manager, you find a hub on the network with a higher than normal error rate. However, the error rate does not seem high enough to cause login delays of 60 seconds or more.

LANsentry Manager Top-N Graph

Using the Top-N graph in the LANsentry Manager main window, locate a quiet node that has been showing the same problem. Choose a quiet node so that you do not receive excessive traffic when you try to isolate the problem.

EXAMPLE: You see that the node Monolith, which has the same Network File System (NFS) mounts as the other nodes on the network, is quiet. You decide to use this node for reproducing the fault. See Network File System (NFS) Protocol for more information about NFS.

LANsentry Manager Packet Capture

Using the LANsentry Manager Packet Capture application, capture packets from the network using predefined patterns and start-and-stop conditions.

Follow these steps:

1. Set up a capture buffer on a probe that is connected to the same hub as the quiet node. Until you know more about the problem, set a very general filter.

EXAMPLE: You select a MAC-layer filter and set a conversation filter to capture all packets to and from Monolith.

2. Telnet into and log out from the quiet node. Then reset the capture buffer. Repeat this procedure until you see the problem reflected in your captured data. To keep the buffer information clean, reset the buffer each time that you repeat the procedure.

3. When you see the delay, note the rough value of the packet count on the LANsentry Manager packet buffer.

By noting the packet count at which you think the delay has occurred, you can narrow the problem to within about 20 packets in the buffer. If you have used an extremely quiet node, you may even identify the exact packet.

LANsentry Manager Packet Decode

The LANsentry Manager Packet Decode application decodes all major protocols and displays the packet contents at three levels of detail: summary information, header information, and actual packet content.

Follow these steps:

1. Open the buffer in the Packet Decode application and locate the number of the packet at which the delay occurred.

2. Select the packet and launch a MAC-layer conversation filter. In the filter display, look for a gap in the conversation (that is, where the node sent a request and then resent it at approximately the same rate as the delay you experienced when recreating the problem).

3. Repeat the test to determine if the result concentrates on one node or if it appears on other nodes.

EXAMPLE: On the quiet node that you selected, the delay is obvious. You see an NFS request going out to a node and a repeat of the request 30 seconds later. During that time, the node did not respond. You now know that the delay occurred because nodes were not seeing responses for NFS requests. When you repeat the test on other nodes, you find that the delay is happening with more than one destination node.
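Looking for "a gap in the conversation" can also be done programmatically once the captured packets are exported. The sketch below is not a LANsentry Manager feature. It assumes a hypothetical export in which each packet has a number, a timestamp in seconds, source and destination names, and a one-line summary, and it reports requests that were repeated after a long silence.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    number: int        # position in the capture buffer
    timestamp: float   # seconds since the capture started
    source: str
    destination: str
    summary: str       # decoded one-line description of the request


def find_retry_gaps(packets, min_gap_seconds=10.0):
    """Return (first, retry, gap) for identical requests resent after a gap.

    Two packets count as the same request when source, destination, and
    summary all match. That matching rule is an assumption of this sketch.
    """
    last_seen = {}
    gaps = []
    for pkt in packets:
        key = (pkt.source, pkt.destination, pkt.summary)
        earlier = last_seen.get(key)
        if earlier is not None:
            gap = pkt.timestamp - earlier.timestamp
            if gap >= min_gap_seconds:
                gaps.append((earlier, pkt, gap))
        last_seen[key] = pkt
    return gaps


if __name__ == "__main__":
    capture = [
        Packet(1, 0.0, "monolith", "server-a", "NFS LOOKUP"),
        Packet(2, 5.0, "monolith", "server-b", "NFS GETATTR"),
        Packet(3, 30.1, "monolith", "server-a", "NFS LOOKUP"),
    ]
    for first, retry, gap in find_retry_gaps(capture):
        print(f"packet {first.number} resent as packet {retry.number} after {gap:.1f} s")
```

The host names other than Monolith are made up. In a real capture, the packet numbers reported here are the ones to open in the decoder.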


Address Tracker

Use Address Tracker, which polls managed devices, to determine the hubs to which the problem nodes are attached. If the problem end stations are located on unmanaged devices, then you can at least narrow the problem to those unmanaged devices.

EXAMPLE: Although your network does not have managed hubs that Transcend® NCS management software can poll, it does have managed switches. When it polls the switches, Address Tracker displays the switch ports on which addresses were last seen. This information indicates the hub (but not the hub port) on which the device is located.

LANsentry Manager Packet Decode

After you know the location of the hub that has the problem node, monitor the problem from the hub using LANsentry Manager Packet Decode.

Follow these steps:

1. To capture packets from one of the nodes on the hub, set up another capture buffer and repeat the exercise that is described in LANsentry Manager Packet Capture. Because a delay may occur on a different node, use two capture buffers without stopping the first one.

Note the rough packet count where the delay appears.

2. Display a conversation filter of the packet where the delay appears, and look for the gap in the conversation.

EXAMPLE: You hope that the nodes are on the same hub. You find that all the nodes are on HUB3. This result indicates that FCS Errors may be causing the timeouts. However, because the errors occur at a low rate, you decide to verify this diagnosis. You monitor the problem from the hub, logging in and out many times, and the delay eventually occurs. This time, the delay shows that the node's reply had an FCS Error, even though the node received the request. The switch would not have transmitted this packet, causing a timeout on the NFS protocol. The retry time is presumably 30 seconds. During this test, you see the problem occurring on another node.

Correcting the Fault


Without a managed hub, you may find it very difficult to discover network file server timeout errors. To find the problematic node, you must either systematically isolate nodes by monitoring each node for a prolonged period, or temporarily insert a managed hub.

EXAMPLE: You notice that the captured error packet failed FCS because it was corrupted by a regular pattern during transmission. A possible reason for this occurrence is a Jabbering node. This explanation makes sense because FCS/Jabber frames increased linearly when you were monitoring the live network.

Network File Server Timeouts Reference

This section explains terms that are relevant to network file server timeouts and provides additional conceptual and problem analysis detail.

Jabbering

Jabbering occurs when a node transmits illegal-length packets and is possibly not operating within carrier specifications. In effect, another node has written bad data over a valid packet. This bad data is often interpreted as a repeated sequence of data.

Network File System (NFS) Protocol

A distributed file system protocol, developed by Sun Microsystems, that allows a computer system to access files over a network as if they were on its local disks. This protocol has been incorporated into products by more than 200 companies. It is now a de facto Internet standard.

NFS is one protocol in the NFS suite of protocols, which includes NFS, RPC, XDR (External Data Representation), and others. These protocols are part of a larger architecture that Sun Microsystems refers to as Open Network Computing (ONC). ONC is a distributed applications architecture designed by Sun and currently controlled by a consortium led by Sun.

Troubleshooting TCP/IP - Detailed Steps

This article shows how to troubleshoot TCP/IP connectivity between computers on a Windows network. If you haven't already done so, disable XP's Internet Connection Firewall on all local area network connections, and remove all firewall programs on the network. Improperly configured firewalls are the most common cause of TCP/IP problems.


Open a Command Prompt Window

For many of these steps, you'll be typing at the command prompt. To open a command prompt window in Windows 2000 or XP, click Start | Run, type cmd in the box, and click OK. To open a command prompt window in Windows 95, 98, or Me, click Start | Run, type command in the box, and click OK. Type one command per line, and press Enter after each one to execute it. To close the command prompt window, use the exit command.

Determine the TCP/IP Settings

Determine the TCP/IP settings of each computer on the local area network. In XP, open the Network Connections folder, right-click the LAN connection, and click Status | Support | Details.

[Figure: Status and Details views for the LAN connection on an Internet Connection Sharing host]

In Windows 95/98/Me, click Start | Run, type winipcfg in the box, and click OK. Select the LAN adapter from the menu and click More Info.

[Figure: winipcfg view for an ICS client running Windows Me]

You can also see the TCP/IP settings from the command prompt. This is especially convenient if a computer has more than one network adapter. Use the ipconfig /all command, which is available in all versions except Windows 95. The output from this command can be long, so it's best to write it to a file. Specify the file name in the command this way:

ipconfig /all >ipconfig.txt

Description of TCP/IP Settings

Here are the TCP/IP settings that are used in network troubleshooting:

• IP Address - Unique address assigned to a network adapter. A computer with multiple network adapters has an IP address for each one, and each one must be in a different subnet.

• Subnet Mask - Used in conjunction with the IP address to determine which subnet an adapter belongs to. At the simplest level, communication is only possible between two network adapters when they're in the same subnet.

• Default Gateway - IP address of a computer or router, on one of this computer's local area networks, that knows how to communicate with subnets not present on this computer. For an Internet connection, the default gateway is a router belonging to your Internet service provider, and all access to sites on the Internet goes through it. For an ICS client, the default gateway is the ICS host. If you use a hardware router, it serves as the default gateway.

• DHCP Server - If an adapter is configured to obtain an IP address automatically, this is the address of the server that provides it. It could be your ISP, an ICS host, or a hardware router.

• DNS Servers - IP address of one or more Domain Name Server computers. DNS servers translate Internet names to their IP addresses (like 63.146.109.227).
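The subnet test described in the Subnet Mask entry can be reproduced with Python's standard `ipaddress` module. This is a minimal sketch. The function name is hypothetical, and the addresses in the usage lines are sample values rather than settings from any particular network.

```python
import ipaddress


def same_subnet(ip_a: str, ip_b: str, subnet_mask: str) -> bool:
    """True when both addresses fall in the same subnet under the given mask."""
    # ip_interface accepts "address/netmask" and exposes the containing network.
    net_a = ipaddress.ip_interface(f"{ip_a}/{subnet_mask}").network
    net_b = ipaddress.ip_interface(f"{ip_b}/{subnet_mask}").network
    return net_a == net_b


if __name__ == "__main__":
    print(same_subnet("192.168.1.101", "192.168.1.123", "255.255.255.0"))
    print(same_subnet("192.168.1.101", "192.168.0.1", "255.255.255.0"))
```

The mask decides how many leading bits of the two addresses must match. With 255.255.255.0 the first three octets have to be identical, while 255.255.0.0 only requires the first two.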

Subnets

See our article on subnets for a brief description of how they work. For more details, see this Microsoft Knowledge Base article.

If two computers are supposed to be on the same subnet but aren't, something is wrong with the network hardware or software configuration. This is most likely to happen when one of them receives an IP address of 169.254.x.x, which indicates that:

• It's configured to obtain an IP address automatically
• It couldn't find a DHCP server on the network to make the assignment
• Windows assigned it an Automatic Private IP Address
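A quick way to spot an automatically assigned private address in a list of settings is to test whether it falls in the 169.254.0.0/16 block. The sketch below uses Python's standard `ipaddress` module, and the function name is hypothetical.

```python
import ipaddress

# The 169.254.x.x range that Windows uses for Automatic Private IP Addressing.
APIPA_NETWORK = ipaddress.ip_network("169.254.0.0/16")


def is_automatic_private_address(ip: str) -> bool:
    """True when the address is in the 169.254.x.x range."""
    return ipaddress.ip_address(ip) in APIPA_NETWORK


if __name__ == "__main__":
    for address in ("169.254.17.42", "192.168.1.101"):
        print(address, is_automatic_private_address(address))
```

An adapter that reports such an address was most likely unable to reach a DHCP server, so the next step is to check the cable, the DHCP server, and any firewall between them.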

See our article on Specific Networking Problems and Their Solutions for more information.

Pinging

The ping command is the basic tool for testing TCP/IP connectivity. It sends a special packet (called ICMP Echo) to a particular IP address and looks for a reply. If everything is working right, the reply comes back. If not, the ping times out in a few seconds. By default, the ping command repeats the process four times.

[Figure: an ICS client computer pinging a Windows XP Home Edition ICS host using the host's IP address and its computer name]

When ping fails, you'll see one of these error messages:

• Request timed out - The IP address is valid, but there's no reply from it. If the IP address is on a local area network, the most likely cause is a firewall program blocking the ping.

• Unknown host <name> or Ping request could not find host <name> - The computer name doesn't exist on the local area network. Make sure that NetBIOS over TCP/IP is enabled.

• Destination host unreachable - The IP address isn't on a local area network, and the default gateway can't access it. Either there's no default gateway, its address is wrong, or it isn't functioning.


Pinging the Local Area Network

Here is a series of ping commands to use in finding where a problem occurs on a local area network. Run them in the order shown, and don't go on to the next command until all of the previous commands work properly. In this example:

• The computer being tested is named Winxp, with IP address 192.168.1.101
• There's another computer on the network, named Win98, with IP address 192.168.1.123

Substitute the appropriate IP addresses and computer names for your network.

Command              Target                          What Ping Failure Indicates
ping 127.0.0.1       Loopback address                Corrupted TCP/IP installation
ping localhost       Loopback name                   Corrupted TCP/IP installation
ping 192.168.1.101   This computer's IP address      Corrupted TCP/IP installation
ping winxp           This computer's name            Corrupted TCP/IP installation
ping 192.168.1.123   Another computer's IP address   Bad hardware or NIC driver
ping win98           Another computer's name         NetBIOS name resolution failure

To fix a corrupted TCP/IP installation on Windows XP, follow the steps in this Microsoft Knowledge Base article. For Windows 95/98/Me, uninstall the TCP/IP protocol in Control Panel | Network, reboot, and reinstall it. If that doesn't fix it, use this procedure on Windows 95 or 98.
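The "run in order and stop at the first failure" procedure lends itself to a small script. The sketch below is one possible way to automate it. It shells out to the system ping command, using `-n` on Windows and `-c` elsewhere to send a single echo request. The target list mirrors the example names and addresses used in this section, so substitute your own. The helper names are hypothetical.

```python
import platform
import subprocess

# (target, what a failure indicates), in the order the section recommends.
LAN_CHECKS = [
    ("127.0.0.1", "Corrupted TCP/IP installation"),
    ("localhost", "Corrupted TCP/IP installation"),
    ("192.168.1.101", "Corrupted TCP/IP installation"),
    ("winxp", "Corrupted TCP/IP installation"),
    ("192.168.1.123", "Bad hardware or NIC driver"),
    ("win98", "NetBIOS name resolution failure"),
]


def ping_once(target: str, timeout_seconds: float = 5.0) -> bool:
    """Send one echo request and report whether ping exited successfully.

    Caveat: the exit code is only a rough signal. Some ping versions return
    success even when the reply is "Destination host unreachable".
    """
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    try:
        result = subprocess.run(
            ["ping", count_flag, "1", target],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_seconds,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def first_failure(checks, ping=ping_once):
    """Run the checks in order and return the first (target, meaning) that fails."""
    for target, meaning in checks:
        if not ping(target):
            return target, meaning
    return None


if __name__ == "__main__":
    failure = first_failure(LAN_CHECKS)
    print("All pings succeeded" if failure is None else f"{failure[0]}: {failure[1]}")
```

Passing the ping function as a parameter keeps the ordering logic separate from the system call, which also makes it easy to try the script against recorded results.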

Pinging the Internet

You can also use ping to find a problem with Internet access. Run these commands in the order shown, and don't go on to the next command until all of the previous commands work properly. Use the Default Gateway and DNS Server addresses that you got from the winipcfg or ipconfig /all command.

Command                  Target                What Ping Failure Indicates
ping w.x.y.z             Default Gateway       Default Gateway down
ping w.x.y.z             DNS Server            DNS Server down
ping w.x.y.z             Web site IP address   Internet service provider or web site down
ping www.something.com   Web site name         DNS Server down or web site down

REFERENCE

WWW.PRACTICALLYNETWORK.COM

WWW.COMPUTERHOPE.COM

WWW.WIKIPEDIA.ORG


THANK YOU


                                                          • LANsentry Manager History View
                                                          • LANsentry Manager Top-N Graph
                                                          • LANsentry Manager Packet Capture
                                                          • LANsentry Manager Packet Decode
                                                          • Address Tracker
                                                          • LANsentry Manager Packet Decode
                                                          • Jabbering
                                                          • Network File System (NFS) Protocol
                                                          • Troubleshooting TCP/IP - Detailed Steps
                                                          • Open a Command Prompt Window
                                                          • Determine the TCP/IP Settings
                                                          • Description of TCP/IP Settings
                                                          • Subnets
                                                          • Pinging
                                                          • Pinging the Local Area Network
                                                          • Pinging the Internet
• Device View

Status Watch

The Status Watch applications manage 3Com devices and their attached networks. Status Watch applications primarily poll for MIB-II data. This is a performance-monitoring application that allows you to monitor the operational status of your network devices and quickly identify any problems that require your attention. It works in conjunction with Web Reporter.

Web Reporter

Web Reporter is a data-reporting application that runs in a World Wide Web (WWW) browser. It generates reports from data that Status Watch collects, allowing you to compare network statistics against a baseline.

Address Tracker

Address Tracker is an address collection and discovery application that:

• Polls managed devices for all MAC addresses
• Polls managed devices and routers for IP addresses to perform MAC-to-IP address translation
• Uses Device View to disable troublesome ports

LANsentry Manager

LANsentry Manager is a set of integrated applications that displays and explores the real-time and historical data that RMON-compliant devices (probes) on the network capture. LANsentry Manager uses SNMP polling to gather RMON and RMON2 data from the probes.

Use LANsentry Manager to:

• Monitor current performance of network segments
• See trends over time
• Spot signs of current problems
• Configure alarms to monitor for specific events
• Capture packets and display their contents

LANsentry Manager works with any device (from 3Com or other vendors) that supports the RMON MIB or the RMON2 MIB.

Traffix Manager

Traffix Manager is a performance-monitoring application that provides information about layer 2 (RMON) and layer 3 conversations between nodes. It helps you to assess traffic patterns on your network. Traffix Manager:

• Monitors all the stations that the RMON2-compliant probes encounter on your network
• Captures and stores RMON and RMON2 data for your network's protocols and applications
• Displays traffic between stations in user-defined views of the network
• Graphs current or historical data on the devices selected
• Delivers reports for user-specified stations and time periods as PostScript to your printer or as HTML to your Web server
• Launches LANsentry Manager tools for in-depth analysis of a station or a conversation between stations

You can use Traffix Manager to:

• Know your network - Understand overall flow patterns and interactions between systems, and determine how your network is really being used at the application level.
• Optimize your network - Gain an insight into traffic and application usage trends to help you optimize the use and placement of current network resources and make wise decisions about capacity planning and network growth.

Traffix Manager works with any device (from 3Com or other vendors) that supports the RMON2 MIB.

Device View

The Device View application is a device configuration tool. When you troubleshoot your network, you can use Device View to determine or change a device's configuration. You can also use Device View to look at a device's statistics and to set alarms.

Device View manages only 3Com devices.

Network Management Platforms

As part of your troubleshooting toolbox, your network management platform is the first place to go to view the overall health of your network. With the platform, you can understand the logical configuration of your network and configure views of your network to understand how devices work together and the role that they play in the users' work. The network management platform that supports your Transcend software installation can provide valuable troubleshooting tools. Transcend runs on several platforms within the NT and UNIX environments.

The platform discovers the devices. Transcend imports that information from the platform to populate the core database. Unless you are rediscovering, the user must manually update the platform.

Using this device database, a map displays the graphical representation of your network. Each device on your network appears as a symbol (icon) on the map. You can configure views of your network to show devices on the same subnetworks or floors.

You can monitor network performance and diagnose network performance and connectivity problems. You can also:

• Take a snapshot of your network in its normal state. The snapshot records the state of your network at a particular instant. If you later have network performance problems, you can compare the current state of your network to the snapshot.
• Quickly determine the connectivity status of a device by noting the color of its map symbol. Red usually means that communication with a device has ceased.
• Diagnose connectivity problems by determining whether two devices can communicate. If they can communicate, then examine the route between the devices, the number of packets that were sent and lost, and the roundtrip time between the two devices.
• Manage MIB information (for example, collecting and storing MIB data for trend analysis and graphing) using MIB queries. Transcend compiles MIBs and allows you to navigate up and down the MIB Tree to retrieve MIB objects from devices. You can set thresholds for MIB data and generate events when a threshold is exceeded.
• Configure the software to act on certain events. The Event Categories window informs you of any unexpected events (which arrive in the form of traps).

For more information, see the documentation that is shipped with your software.

3Com SmartAgent Embedded Software

Traditional Simple Network Management Protocol (SNMP) management places the burden of collecting network management information on the management station. In this traditional model, software agents collect information about throughput, record errors or packet overflows, and measure performance based on established thresholds. Through a polling process, agents pass this information to a centralized network management station whenever they receive an SNMP query. Management applications then make the data useful and alert the user if there are problems on the device.

As a useful companion to traditional network management methods, 3Com's SmartAgent® technology places management intelligence into the software agent that runs within a 3Com device. This scalable solution reduces the amount of computational load on the management station and helps minimize management-related network traffic.

SmartAgent software, which uses the RMON MIB, is self-monitoring, collecting and analyzing its own statistical, analytical, and diagnostic data. In this way, you can conduct network management by exception - that is, you are notified only if a problem occurs. Management by exception is unlike traditional SNMP management, in which the management software collects all data from the device through polling.

SmartAgent software works autonomously and reports to the network management station whenever an exceptional network event occurs. The software can also take direct action without involving the management station. Devices that contain SmartAgent software may be able to:

• Perform broadcast throttling to minimize the flow of broadcast traffic on your network
• Monitor the ratio of good frames to bad frames
• Switch a resilient link pair to the standby path if the primary path corrupts frames
• Report if traffic on vital segments drops below minimum usage levels
• Disable a port for five seconds to clear problems and then automatically reconnect it

The Transcend NCS applications LANsentry Manager and Traffix Manager make the RMON data that the SmartAgent software collects more usable by summarizing and correlating important information.

Other Commonly Used Tools

These commonly used tools can also help you troubleshoot your network:

• Network software, such as Ping, Telnet, FTP, and TFTP. You can use these applications to troubleshoot, configure, and upgrade your system.
• Network monitoring devices, such as Analyzers and Probes
• Tools, such as Cable Testers, for working on physical problems

Ping

Packet Internet Groper (Ping) allows you to quickly verify the connectivity of your network devices. Ping attempts to transmit a packet from one device to a station on the network and listens for the response to ensure that it was correctly received. You can validate connections on the parts of your network by pinging different devices:

• A successful response indicates that a valid network path exists between your station and the remote host and that the remote host is active.
• Slower response times than normal can indicate that the path is congested or obstructed.
• A failed response indicates that a connection is broken somewhere. Use the message to help locate the problem. See Tips on Interpreting Ping Messages.

Strategies for Using Ping

Follow these strategies for using Ping:

• Ping devices when your network is operating normally so that you have a performance baseline for comparison. See Identifying Your Network's Normal Behavior for more information.
• Ping by IP address when:
  • You want to test devices on different subnetworks. This method allows you to Ping your network segments in an organized way, rather than having to remember all the hostnames and locations.
  • Your Domain Name System (DNS) server is down and your system cannot look up host names properly. You can Ping with IP addresses even if you cannot access hostname information.
• Ping by hostname when you want to identify DNS server problems.
• To troubleshoot problems that involve large packet sizes, Ping the remote host repeatedly, increasing the packet size each time.
• To determine if a link is erratic, perform a continuous Ping (using ping -s on UNIX), which indicates the time that it takes the device to respond to each Ping.
• To determine a route taken to a destination, use the trace route function (tracert).
• Consider creating a Ping script that periodically sends a Ping to all necessary networking devices. If a Ping failure message is received, the script can perform some action to notify you of the problem, such as paging you.
• Use the Ping functions of your network management platform. For example, in your HP OpenView map, select a device and click the right mouse button to gain access to ping functions.
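
The periodic Ping script suggested in the list above could be sketched as follows. This is a minimal illustration in Python, not part of the original guide: the function names (ping_once, find_unreachable, watch), the five-minute default interval, and the notify callback are all assumptions. It relies on the system ping command and its exit code; on some systems ping can exit successfully even when a router answers "unreachable", so treat the result as a first indication only.

```python
import platform
import subprocess
import time


def ping_once(host, timeout_s=2):
    """Return True if a single echo request to host appears to succeed."""
    # The count flag differs by platform: -n on Windows, -c elsewhere.
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    try:
        result = subprocess.run(
            ["ping", count_flag, "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, OSError):
        # No answer in time, or the ping command is not available.
        return False
    return result.returncode == 0


def find_unreachable(hosts, ping=ping_once):
    """Ping every host once and return the ones that did not answer."""
    return [host for host in hosts if not ping(host)]


def watch(hosts, notify, interval_s=300, rounds=None,
          ping=ping_once, sleep=time.sleep):
    """Sweep the host list repeatedly and call notify(failed_hosts) on failures.

    rounds=None repeats until interrupted; a number limits the sweeps.
    """
    done = 0
    while rounds is None or done < rounds:
        failed = find_unreachable(hosts, ping)
        if failed:
            notify(failed)
        done += 1
        if rounds is None or done < rounds:
            sleep(interval_s)
```

For example, watch(["192.0.2.1", "192.0.2.2"], print) would print the list of unanswered addresses every five minutes; replacing print with a function that sends a page or an e-mail gives the notification behavior described above. The ping and sleep parameters exist only so that the loop can be exercised without touching the network.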

Tips on Interpreting Ping Messages

Use the following ping failure messages to troubleshoot problems:

No reply from <destination>

Indicates that the destination routes are available but that there is a problem with the destination itself.

<destination> is unreachable

Indicates that your system does not know how to get to the destination. This message means either that routing information to a different subnetwork is unavailable or that a device on the same subnetwork is down.

ICMP host unreachable from gateway

Indicates that your system can transmit to the target address using a gateway but that the gateway cannot forward the packet properly because either a device is misconfigured or the gateway is not operating.
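
The three failure messages above can be folded into a small lookup for use in a script. The sketch below is illustrative only: the function name diagnose_ping_failure is hypothetical, and the exact wording of failure messages varies between ping implementations, so the string matching assumes the wording used in this guide.

```python
def diagnose_ping_failure(message):
    """Map a ping failure message to the likely cause described in this guide."""
    text = message.strip().lower()
    # Check the gateway message first, because it also contains "unreachable".
    if "icmp host unreachable from gateway" in text:
        return ("gateway cannot forward the packet: "
                "a device is misconfigured or the gateway is not operating")
    if text.startswith("no reply from"):
        return "routes are available, but the destination itself has a problem"
    if text.endswith("is unreachable"):
        return ("routing information to another subnetwork is unavailable, "
                "or a device on the same subnetwork is down")
    return "unrecognized message"
```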

Telnet

Telnet, which is a login and terminal emulation program for Transmission Control Protocol/Internet Protocol (TCP/IP) networks, is a common way to communicate with an individual device. You log in to the device (a remote host) and use that remote device as if it were a local terminal.

If you have established an out-of-band Telnet connection with a device, you can use Telnet to communicate with that device even if the network is unavailable. This feature makes Telnet one of the most frequently used network troubleshooting tools. Usually, all device statistics and configuration capabilities are accessible by using Telnet to connect to the device's console. For more information about setting up an out-of-band connection, see Using Telnet, Serial Line, and Modem Connections.

You can invoke the Telnet application on your local system and set up a link to a Telnet process that is running on a remote host. You can then run a program that is located on a remote host as if you were working at the remote system.

FTP and TFTP

Most network devices support either the File Transfer Protocol (FTP) or the Trivial File Transfer Protocol (TFTP) for downloading updates of system software. Updating system software is often the solution to networking problems that are related to agent problems. Also, new software features may help correct a networking problem.

FTP provides flexibility and security for file transfer by:

• Accepting many file formats, such as ASCII and binary
• Using data compression
• Providing Read and Write access so that you can display, create, and delete files and directories
• Providing password protection

TFTP is a simple version of FTP that does not list directories or require passwords. TFTP only transfers files to and from a remote server.

Analyzers

An analyzer, which is often called a Sniffer, is a network device that collects network data on the segment to which it is attached, a process called packet capturing. Software on the device analyzes this data, which is a process referred to as protocol analysis. Most analyzers can interpret different types of protocol traffic, such as TCP/IP, AppleTalk, and Banyan VINES traffic.

You usually use analyzers for reactive troubleshooting - when you see a problem somewhere on your network, you attach an analyzer to capture and interpret the data from that area. Analyzers are particularly helpful for identifying intermittent problems. For example, if your network backbone has experienced moments of instability that prevent users from logging on to the network, you can attach an analyzer to the backbone to capture the intermittent problems when they happen again.

Probes

Like an analyzer, a probe is a network device that collects network data. Depending on its type, a probe can collect data from multiple segments simultaneously. It stores the collected data and transfers the data to an analysis site when requested. Unlike an analyzer, a probe does not interpret data.

A probe can be either a stand-alone device or an agent in a network device. The Transcend Enterprise Monitor 500 series and the SuperStack® II Monitor series are stand-alone RMON probes. LANsentry Manager and Traffix Manager use data from probes that comply with the RMON MIB or the RMON2 MIB.

You can use a probe daily to determine the health of your network. The Transcend NCS applications can interpret and report this data, alerting you to possible problems so that you can proactively manage your network. For example, an RMON2 probe can help you to analyze traffic patterns on your network. Use this data to make decisions about reconfiguring devices and end stations as needed.

Cable Testers

Cable testers examine the electrical characteristics of the wiring. They are most commonly used to ensure that building wiring and cables meet Category 5, 4, and 3 standards. For example, network technologies such as Fast Ethernet require the cabling to meet Category 5 requirements. Testers are also used to find defective and broken wiring in a building.

Knowing Your Network

You can better troubleshoot the problems on your network by:

• Knowing Your Network's Configuration
• Identifying Your Network's Normal Behavior

Knowing Your Network's Configuration

Part of understanding your network is knowing its physical and logical configuration. You should know:

• Which devices are on your network
• How the devices are configured
• Which devices are attached to the backbone
• Which devices connect your network to the outside world (WAN)

To keep track of your network's configuration, gather the following information:

• Site Network Map
• Logical Connections
• Device Configuration Information
• Other Important Data About Your Network

This data, when kept up-to-date, is extremely helpful for locating information when you experience network or device problems.

Site Network Map

A network map helps you to:

• Know exactly where each device is physically located
• Easily identify the users and applications that are affected by a problem
• Systematically search each part of your network for problems

You can create a network map using any drawing or flow chart application. Store your network map online. In addition, make sure that you always have a current version on paper in case you cannot access the online version. Figure 8 shows an example of a network map of 3Com devices.

Figure 8 Example of a Site Network Map

Consider including the following information on your network map:

• Location of important devices and workgroups (by floor, building, or area)
• Location of the network backbone, data center, and wiring closets, as appropriate for your network
• Location of your network management stations
• Location and type of remote connections
• IP subnetwork addresses for all managed switches and hubs
• Other subnetwork addresses, such as Novell IPX and AppleTalk, if appropriate for your network
• Type of media (by actual name, such as 10BASE-T, or by grouping, such as Ethernet), which you can show with callouts, colors, line weights, or line styles
• Virtual workgroups, which you can show with colors or shaded areas
• Redundant links, which you can indicate with gray or dashed lines
• Types of network applications that are used in different areas of your network
• Types of end stations that are connected to the switches and hubs

Logical Connections

With the advent of virtual LANs (VLANs), you need to know how your devices are connected logically as well as physically. For example, if you have connected two devices through the same physical switch, you can assume that they can communicate with each other. However, the devices can be in separate VLANs that restrict their communication.

Knowing the setup of your VLANs can help you to quickly narrow the scope of a problem to a VLAN instead of to a network connection.

The Transcend NCS application Enterprise VLAN Manager allows you to view the logical makeup of your network. Depending on the complexity of your network and VLAN configurations, you can use colors to show the VLANs graphically on your network map.
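
One way to use VLAN knowledge when narrowing a problem is to group the affected devices by VLAN. The following Python sketch assumes you already keep a device-to-VLAN table; the function name, device names, and VLAN numbers are hypothetical and are not taken from Enterprise VLAN Manager.

```python
from collections import defaultdict


def group_by_vlan(vlan_of, affected):
    """Group affected device names by the VLAN each one is assigned to.

    vlan_of maps a device name to its VLAN; devices missing from the
    table are collected under the key "unknown".
    """
    groups = defaultdict(list)
    for device in affected:
        groups[vlan_of.get(device, "unknown")].append(device)
    return dict(groups)
```

If every affected device lands in the same group, the VLAN configuration is a reasonable first place to look; if the devices are spread across several VLANs, a shared physical link or device is the more likely suspect.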

Device Configuration Information

Maintain online and paper copies of device configuration information. Make sure that all online data is stored with your site's regular data backup. If your site does not have a backup system, copy the information onto a backup disc (CD, Zip disk, and the like) and store it offsite.

Follow these guidelines for saving configuration information:

• Because the easiest way to recover a device's configuration is to use FTP or TFTP, save the configuration settings of each device that supports this method of uploading.
• For other devices, Telnet in and save the session (which contains configuration details) to a file. If you cannot print the configuration of a device, then create a quick rebuild guide that explains the quickest way to configure the device from a fresh install.
• For devices that store information to diskette, store this data as part of your site's regular backup.
• For routers and other important devices with text configuration files, store this data online in a revision control system. Keep the most recent version on paper. Keep previous versions.
• For PCs, keep a recovery disk for each type of PC. For any device that you use as a server, store all startup scripts and copies of registries.

Other Important Data About Your Network

For a complete picture of your network, have the following information available:

• All passwords - Store passwords in a safe place. Keep previous passwords in case you restore a device to a previous software version and need to use the old password that was valid for that version.
• Device inventory - The inventory allows you to see the device type, IP address, ports, MAC addresses, and attached devices at a glance. Software tools such as Transcend Central can help you keep track of the 3Com devices on your network. Using Transcend Central, you can group devices by type and location and have this information on hand for troubleshooting.
• MAC address-to-port number list - If your hubs or switches are not managed, you must keep a list of the MAC addresses that correlate to the ports on your hubs and switches. Generate and keep a paper copy of this list, which is crucial for deciphering captured packets using Address Tracker.
• Log book - Document your interactions, no matter how trivial, with each device that is critical to your network's operation (that is, routers, remote access devices, security servers). For example, document that you noticed a fan making noise one morning. Your note may help you to identify why a device is over temperature a week later (because the fan stopped working).
• Change control - Maintain a change control system for all critical systems. Permanently store change control records.
• Contact details - Store, online and on paper, the details of all support contracts, support numbers, engineer details, and telephone and fax numbers.
• LANsentry Reporter - Use LANsentry Reporter to generate reports from the database.
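
A MAC address-to-port number list such as the one described above is easy to keep as a small text file and query when you see an address in a packet capture. The Python sketch below assumes a CSV layout with the columns device, port, and mac; that layout, the function names, and the sample values are illustrative assumptions rather than a format that any 3Com tool produces.

```python
import csv
import io


def normalize_mac(mac):
    """Reduce a MAC address to 12 lowercase hex digits for comparison."""
    digits = "".join(ch for ch in mac.lower() if ch in "0123456789abcdef")
    if len(digits) != 12:
        raise ValueError(f"not a MAC address: {mac!r}")
    return digits


def load_port_list(csv_text):
    """Build {mac: (device, port)} from CSV rows of device,port,mac."""
    table = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        table[normalize_mac(row["mac"])] = (row["device"], int(row["port"]))
    return table


def locate(table, mac):
    """Return (device, port) for a MAC seen in a capture, or None if unlisted."""
    return table.get(normalize_mac(mac))
```

Normalizing the address first means that the lookup works regardless of whether the capture tool prints colons, hyphens, or dots as separators.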

Identifying Your Network's Normal Behavior

By monitoring your network over a long period, you begin to understand its normal behavior. You begin to see a pattern in the traffic flow, such as which servers are typically accessed, when peak usage times occur, and so on. If you are familiar with your network when it is fully operational, you can be more effective at troubleshooting problems that arise.

Baselining Your Network

You can use a baseline analysis, which is an important indicator of overall network health, to identify problems. A baseline can serve as a useful reference of network traffic during normal operation, which you can then compare to captured network traffic while you troubleshoot network problems. A baseline analysis speeds the process of isolating network problems.

By running tests on a healthy network, you compile normal data to compare against the results that you get when your network is in trouble. For example, Ping each node to discover how long it typically takes you to receive a response from devices on your network.

Applications such as Status Watch, Address Tracker, LANsentry Manager, and Traffix Manager allow you to collect days and weeks of data and set a baseline for comparison. Through the reporting mechanisms in the following list, you can continuously assess the data from your network and ensure that its performance is optimal:

• Web Reporter generates daily or weekly reports from data collected by Status Watch.
• Traffix Manager generates weekly reports from collected data and calculates the baselines for you. Set up Utilization History and Error History reports with data resolution set to Weekly.
• LANsentry Manager History View generates daily utilization graphs, which are sampled every 30 minutes for each day over one week. Use these graphs to calculate your network baselines manually.
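
If you calculate a baseline manually, for example from the Ping response times mentioned earlier in this section, a simple comparison can be sketched as follows. This Python example is an illustration only: the function names are hypothetical, and the rule of flagging a reading more than three standard deviations above the healthy mean is an assumed threshold, not one prescribed by this guide.

```python
from statistics import mean, stdev


def build_baseline(samples_ms):
    """Summarize round-trip times collected while the network was healthy."""
    if len(samples_ms) < 2:
        raise ValueError("need at least two samples")
    return {"mean": mean(samples_ms), "stdev": stdev(samples_ms)}


def is_abnormal(rtt_ms, baseline, tolerance=3.0):
    """Flag a reading more than `tolerance` standard deviations above the mean."""
    limit = baseline["mean"] + tolerance * baseline["stdev"]
    return rtt_ms > limit
```

Collect the healthy samples at several times of day so that normal peak-hour delays are part of the baseline rather than reported as problems.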

Identifying Background Noise

Know your network's background noise so that you can recognize real data flow. For example, one evening after everyone is gone, no backups are running, and most nodes are on, analyze the traffic on your network using the Traffix Manager application. The traffic that you see is mostly broadcast and multicast packets. Any errors that you see are the result of faulty devices (trace). This traffic is the background noise of your network - traffic that occurs for little value. If background noise is high, redesign your network.

FDDI Connectivity

Use these sections to identify and correct connectivity errors on an FDDI ring:

• FDDI Connectivity Overview
• Monitoring FDDI Connections
• Making Your FDDI Connections More Resilient

Fiber Distributed Data Interface (FDDI), which is a self-correcting technology, automatically corrects ring faults to maintain connectivity throughout most of the network. However, you should monitor your FDDI connections for wrapped rings and other problems with ring connectivity.

Understanding the Problem

As shown in Figure 9, in a thru FDDI LAN, no stations on the trunk ring have a Configuration State (SMTConfigurationState) of Wrap or Isolated. However, users who complain about network performance may have lost connectivity to other stations on the network because the FDDI network is wrapped or segmented.

Figure 9 Thru Ring

Wrapped ring

By monitoring the Peer Wrap Condition, you can see when the Configuration State changes. In a wrapped ring (Figure 10), two stations on the LAN are in a wrapped Configuration State. This condition may or may not affect the connectivity of certain stations. Although operational, your network may have a cabling problem or a problem with a link.

Figure 10 Wrapped LAN

Segmented ring

In a segmented ring (Figure 11), more than two stations are wrapped on the trunk ring. Although this mode of operation is a valid FDDI LAN configuration, your LAN is probably experiencing a degraded or degrading condition.

Figure 11 Segmented Ring
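
The thru, wrapped, and segmented definitions above reduce to counting the stations that report a wrapped Configuration State. The Python sketch below illustrates that rule; the function name and the simplified state labels "Thru", "Wrap", and "Isolated" are assumptions made for the example, and real SMT agents may report more detailed values. Cases that the definitions above do not cover are reported as "Indeterminate".

```python
def classify_ring(station_states):
    """Classify a trunk ring from each station's Configuration State.

    station_states maps a station name to "Thru", "Wrap", or "Isolated".
    """
    states = list(station_states.values())
    wrapped = states.count("Wrap")
    if states and all(state == "Thru" for state in states):
        return "Thru"        # no station is in a Wrap or Isolated state
    if wrapped == 2:
        return "Wrapped"     # exactly two wrap points on the ring
    if wrapped > 2:
        return "Segmented"   # more than two stations are wrapped
    return "Indeterminate"   # for example, an isolated station with no wrap points
```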

When a network connection has excessively high link errors, Station Management (SMT) shuts down the connection and tries to reconnect. A dual-attachment trunk ring station with an A or B connection that is shut down is one of the wrap points in the network. See Making Your FDDI Connections More Resilient for information about keeping a dual-attachment station connection from wrapping.

Isolated station

Sometimes a network wraps a particular station out of the ring. Stations on either side of a problem station can be wrapped. This effectively isolates the station or links that have problems, as shown in Figure 12.

Figure 12 Wrapped Ring with Isolated Station

If a ring was already wrapped when a network wraps a station out of the ring, then a segmented ring results, as shown in Figure 13.

Figure 13 Segmented Ring with Isolated Stations

Twisted ring

In a twisted ring, an A port is connected to an A port and a B port is connected to a B port, instead of the normal A-to-B connections. A twisted ring, which always has two twist points (stations), can exist in either a Thru or Wrap state. You can monitor the Twisted Ring Condition and Undesired Connection Attempt Event for evidence of twisted ring and other connection problems.

Identifying the Problem

To identify the problem, follow this process:

1. At the FDDI LAN level, verify that your network is operating.

If the network is operating, the FDDI ring may be segmented, and therefore an FDDI station or an Ethernet station on an Ethernet link may have lost connectivity to other nodes on the network.

2. Determine if a ring is in a Thru, Wrap, or Segmented state.

If the FDDI ring is segmented or wrapped, look for a problem with a link somewhere in the network or for a nonfunctioning node on your trunk ring. If the ring is operating and is not segmented, or if it is segmented but you still have connectivity to the stations in question, move to a more specific level in your network.

See Monitoring FDDI Connections for more information.

3. Determine if the poorly performing station is an Ethernet or FDDI station.

If the problem is an FDDI station, find out if it is congested (that is, if the station is so busy that it cannot accept all the network traffic that is directed to it) by determining its Bandwidth Utilization. Also determine if the station has a high frame error rate by looking at the FDDI Ring Errors.

If the problem is an Ethernet station, look for congestion by examining Ethernet Packet Loss and Bandwidth Utilization.

Solving the Problem

Identify the station that is causing the disconnection and take the appropriate steps:

• If the disconnection is caused by a wrapped ring, then fix the hardware or cabling problem at that station.
• If the station is congested, you have a device problem rather than a network problem. For example, if the congested station is a file server and every other machine on the network is retrieving and saving files using that server, consider upgrading your server or adding servers to the network. A variety of devices from different vendors may be communicating on an FDDI or Ethernet network; some are faster and more capable, and some are slower and more prone to congestion.
• If the station is an Ethernet station that is attached to an Ethernet segment, reevaluate the setup of your Ethernet network and make some changes to improve its performance.

You can also make FDDI connections more resilient by implementing dual homing or installing an Optical Bypass Unit (OBU) where FDDI connections are prone to fail. See Making Your FDDI Connections More Resilient for more information.

63 29

Monitoring FDDI Connections

Monitor your FDDI devices for Warning or Critical alerts in the FDDI Status tool.

Status Watch

Use Status Watch to identify these FDDI connectivity errors:

• Peer Wrap Condition
• Twisted Ring Condition
• Undesired Connection Attempt Event

Follow these steps:

1. In the Device area, select the device that is located where you suspect an FDDI ring connectivity problem.

2. Monitor the FDDI Status tool for the currently selected device.

Here are some pointers for monitoring:

• If the Peer Wrap Configuration State variable is Isolated, the device is not connected to the FDDI trunk ring. If you intend the device to remain isolated, this indication is not a serious condition. However, if the device is supposed to be connected on a trunk ring, a serious problem may exist: the device is no longer transmitting packets to the larger trunk ring.

• If the Peer Wrap flag (SMTPeerWrapFlag) is set, the device is one of the wrap points. The cause of the wrapped ring is somewhere in the portion of the network between the two stations that report the peer wrap condition.

Making Your FDDI Connections More Resilient

When devices are removed from an FDDI ring, there is a break in the fiber path that causes the ring to wrap until the ring is made whole again. To prevent the break in the FDDI connection, you can implement dual homing or install an Optical Bypass Unit (OBU).

Implementing Dual Homing

When the operation of a dual attachment node is critical to your network, dual homing adds reliability by providing a backup connection if the primary link fails. Because a dual attachment station (DAS) has two attachments to the FDDI ring (A-to-M and B-to-M), you can use one of them as a standby link if the active link fails. With dual homing, only one of the two attachments is active at a time. In this sense, a DAS acts as if it is a single attachment station (SAS) by using its A port as the standby link.

Through SMT, a DAS can be dual homed to the same concentrator or, more commonly, to two concentrators. This arrangement provides a more stable trunk ring of concentrators. If one concentrator fails, the DAS enables the standby link to another concentrator to become the active link. See Figure 14.

If the station is a dual path or dual path/dual MAC station, you can configure the dual-homed station in one of two ways:

• With both links active
• With one link active and one connection withheld as a backup, only becoming active when one link fails

Figure 14 Dual Homing Configuration

Installing an Optical Bypass Unit

You can insert an Optical Bypass Unit (OBU) into the FDDI ring as if it were a node and then plug your device into it. To use an OBU, your device needs an optical bypass interface. This interface lets the bypass know whether your device is still on the ring. See Figure 15.

If your device is removed or if it fails, the bypass unit diverts the optical path away from your device, keeping the ring whole. You can use a bypass on devices that are prone to failure or are likely to be removed often, such as diagnostic equipment.

Figure 15 Optical Bypass Unit Configuration

FDDI Connectivity Reference

This section explains terms that are relevant to FDDI connectivity and provides additional conceptual and problem analysis detail.

Peer Wrap Condition

A Peer Wrap (wrapped ring) condition occurs when a dual-attachment station detects a fault (often a lost connection) and reconfigures the network by wrapping the dual trunk rings to form a single ring. Normally, the two stations that are adjacent to the fault wrap to maintain full connectivity. However, if a second fault occurs before the first is repaired, the network partitions itself into two or more rings and stations lose connectivity.

When a station reports a Peer Wrap condition, locate and repair the problem that caused the station to wrap the rings. Potential causes include:

• Faulty FDDI port hardware
• Faulty cables or connectors
• Unplugged connectors
• Powered-down stations

You can expect to find the cause of the problem between the two stations that report the Peer Wrap condition.

Twisted Ring Condition

A Twisted Ring condition occurs when certain undesirable connection types exist. See Table 7 for more information. Although similar to the Undesired Connection Attempt, the Twisted Ring condition provides specific Station Management (SMT) and port information for diagnosis.

Undesired Connection Attempt Event

An Undesired Connection Attempt event occurs when a port tries to connect to another port of a type that may result in an undesirable network topology. Whether the connection attempt is successful depends on the current setting of the station's connection policies.

Table 7 lists connections that the FDDI standard defines as undesirable. The managed devices may or may not permit these connections, depending on their FDDI station configurations.

Table 7. Undesirable Connection Types

| Connection Type | Reason That the Connection Is Undesirable |
|---|---|
| A-A | Twisted primary and secondary rings |
| A-S | A wrapped ring |
| B-B | Twisted primary and secondary rings |
| B-S | A wrapped ring |
| S-A | A wrapped ring |
| S-B | A wrapped ring |
| M-M | A tree of rings topology (illegal connection) |

Table 8 lists FDDI connections that create valid topologies.

Table 8. Valid Connection Types

| Connection Type | Reason That the Connection Is Valid |
|---|---|
| A-B | A normal trunk peer connection |
| A-M | A tree connection with possible redundancy. In a single MAC node, Port B has precedence (by default) for connecting to a Port M |
| B-A | A normal trunk ring peer connection |
| B-M | A tree connection with possible redundancy. In a single MAC node, Port B has precedence (by default) for connecting to a Port M |
| S-S | A single ring of two slave stations |
| S-M | A normal tree connection |
| M-A | A tree connection that provides possible redundancy |
| M-B | A tree connection that provides possible redundancy |
| M-S | A normal tree connection |
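
The connection tables above can be read as a lookup keyed on the pair of port types. The following Python sketch encodes them that way. The function name `classify_connection` and the dictionary layout are illustrative assumptions, not part of any management tool described in this guide.

```python
# Lookup built from the undesirable and valid FDDI connection tables.
# Keys are (local port type, remote port type).
UNDESIRABLE = {
    ("A", "A"): "Twisted primary and secondary rings",
    ("A", "S"): "A wrapped ring",
    ("B", "B"): "Twisted primary and secondary rings",
    ("B", "S"): "A wrapped ring",
    ("S", "A"): "A wrapped ring",
    ("S", "B"): "A wrapped ring",
    ("M", "M"): "A tree of rings topology (illegal connection)",
}

VALID = {
    ("A", "B"): "A normal trunk peer connection",
    ("A", "M"): "A tree connection with possible redundancy",
    ("B", "A"): "A normal trunk ring peer connection",
    ("B", "M"): "A tree connection with possible redundancy",
    ("S", "S"): "A single ring of two slave stations",
    ("S", "M"): "A normal tree connection",
    ("M", "A"): "A tree connection that provides possible redundancy",
    ("M", "B"): "A tree connection that provides possible redundancy",
    ("M", "S"): "A normal tree connection",
}


def classify_connection(local_port, remote_port):
    """Return ("undesirable" | "valid", reason) for a pair of FDDI port types."""
    pair = (local_port.upper(), remote_port.upper())
    if pair in UNDESIRABLE:
        return "undesirable", UNDESIRABLE[pair]
    if pair in VALID:
        return "valid", VALID[pair]
    raise ValueError(f"unknown port types: {pair}")


if __name__ == "__main__":
    print(classify_connection("A", "A"))
    print(classify_connection("S", "M"))
```

Whether a managed device actually permits an undesirable pair still depends on its connection policy, so a result of "undesirable" here only flags the pair for inspection.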

Disabling the Offending Interface

Because broadcast storms can ultimately cause your whole network to become unavailable, take action immediately to disable the offending interface. You can enable the interface again after you have corrected the problem.

Address Tracker

Use Address Tracker to locate the interface that is causing the broadcast storm. Use Device View to disable the port.

Follow these steps:

1. In the Find Address window, enter the address of the interface that seems to be receiving the broadcast traffic.

You can copy the MAC or IP address from the Status Watch report and paste it into Address Tracker's Enter the Address You Want to Find field.

2. Click Find Now.

Search displays the device name.

3. Use Transcend Central to launch Device View and disable the port.

Disabling the port stops the broadcast storm before it interferes with all vital network traffic. You can re-enable this interface later using Device View or the device's console.

Correcting Spanning Tree Misconfigurations

Spanning Tree does not cause broadcast storms, but a loop in your Spanning Tree topology can create data that looks like a storm. A loop can occur in your topology if:

• Someone disables Spanning Tree on a port
• You set up your Spanning Tree configuration incorrectly

Device View

Use Device View to disable any Spanning Tree port that has a repeater attached to it and to correct Spanning Tree misconfigurations.

To correct Spanning Tree misconfigurations, use Device View to disable Spanning Tree Protocol (STP) for a port on a SuperStack® II Switch 1000, Switch 3000, Switch 3000 10/100, Switch 9000SX, Desktop Switch, LinkBuilder® FMS II Bridge/Management Module, or CoreBuilder 6000.

To disable the STP port state for a port on a SuperStack II switch:

1. Select a port and click the right mouse button.

2. From the shortcut menu, select Configure.

3. In the Port section, click the STP tab.

4. From the STP Port State list box, select Disabled.

5. Click Apply.

To disable the STP port state for a port on a LinkBuilder FMS II Bridge/Management Module:

1. Double-click the module.

2. From the shortcut menu, select Configure Bridge.

3. In the Port section, click the STP tab.

Broadcast Storms Reference

This section explains terms that are relevant to broadcast storms and provides additional conceptual and problem analysis detail.

Broadcast Packets

Broadcast packets, which are a normal part of network operation, are transmitted by a device to a broadcast address. For example, IP networks use broadcasts to resolve network addresses using Address Resolution Protocol (ARP). IPX networks use a large number of broadcast packets to operate most effectively.

Problems arise when broadcast packets endlessly propagate throughout the network, which increases the traffic volume on your network and the CPU time that each host spends processing and discarding unwanted broadcast packets.

Multicast Packets

Multicast packets, which are a normal part of network operation, are transmitted by a device to a multicast group address. Hosts that want to receive the packets indicate that they want to be members of the multicast group, and then multicast packets are distributed to that group. For example, multicast packets support the Spanning Tree Protocol. Multicast applications and underlying multicast protocols control multimedia traffic and shield hosts from processing unnecessary broadcast traffic. However, multicast traffic can also cause storms that saturate your network.

Duplicate Addresses

Use these sections to identify and correct problems caused by duplicate MAC and IP addresses:

• Duplicate Addresses Overview
• Finding Duplicate MAC Addresses
• Finding Duplicate IP Addresses

See Duplicate Addresses Reference for additional conceptual and problem analysis detail.

Networks sometimes generate duplicate MAC and IP addresses. Because duplicate addresses can cause problems with packet delivery, resolve them as soon as possible.

Understanding the Problem

Duplicate MAC addresses are caused by data link layer problems with Fiber Distributed Data Interface (FDDI) media and the passing of tokens on the FDDI ring. Duplicate IP addresses are caused by network layer problems. See these sections for more information about causes of duplicate addresses:

• Duplicate MAC Addresses
• Duplicate IP Addresses

Identifying the Problem

Identify duplicate MAC and IP addresses by following the instructions in these sections:

• Finding Duplicate MAC Addresses
• Finding Duplicate IP Addresses

Solving the Problem

Identify the cause of the duplicate address (such as user error or a hardware problem) and fix the problem if possible.

Finding Duplicate MAC Addresses

To find out if duplicate MAC addresses are occurring, monitor your network using Status Watch.

Status Watch

The Status Watch FDDI Status tool identifies duplicate FDDI MAC addresses, and Status Watch reports when two or more MACs on the same ring have the same MAC address (a Duplicate Address condition).

Follow these steps:

1. In the Status Watch Summary View window, determine if any FDDI Status conditions are reported. If there are, double-click the table cell value to display the Device List window.

Another approach is to examine only the devices that you know reside on your FDDI ring. In the Status Watch main window, red device icons indicate that a threshold has been exceeded.

2. Select a device.

• If you selected the device from the Device List window, the real-time report for that device appears in the Status Watch main window.

• If you selected the device from the main window, also select the FDDI Status tool to view the real-time report.

3. Determine if a Duplicate Address condition caused the FDDI Status tool to trigger a Critical or Warning status for that device.

In Status Watch, you can specify the status severity level to apply to a Duplicate Address condition.

Finding Duplicate IP Addresses

To find out if duplicate IP addresses are occurring, monitor your network using these applications:

• Address Tracker - To find duplicate IP addresses on 3Com devices and their attached networks

• LANsentry® Manager - To find duplicate IP addresses that are collected by probes gathering RMON2 SmartAgent® data from the Enterprise Communications Analysis Module (ECAM) downloaded on your network devices

Address Tracker

Use Address Tracker to determine when and where duplicate IP addresses occur.

Follow these steps:

1. From the Find Address menu, select Find Duplicate IP Addresses.

2. Click Find Now to start your search.

LANsentry Manager

Use the Duplicates table in LANsentry Manager to compile a list of all stations with duplicate IP addresses. This table is available only on probes that have downloaded RMON2 (ECAM) SmartAgent software.

Follow these steps:

1. From the LANsentry Manager Address Map menu, select Duplicates. Address Map data is displayed as a table.

2. To export the contents of the table, click Export to launch the Data Export dialog box.

Duplicate Addresses Reference

This section explains terms that are relevant to duplicate addresses and provides additional conceptual and problem analysis detail.

Duplicate MAC Addresses

Each device on your network has a unique MAC address. This address identifies a single device on the network, allowing packets to be delivered to correct destinations.

Packets are delivered to their destinations by means of the MAC-address-to-IP-address translation that the Address Resolution Protocol (ARP) provides. Therefore, if MAC addresses are duplicated on the network, ARP caches of routing devices contain erroneous destinations. In FDDI, devices monitor network traffic, searching for their own MAC address in each packet to determine whether to decode the packet. If MAC addresses are not unique, two stations cannot be distinguished from each other.

Duplicate MAC addresses can occur for the following reasons:

• Someone has manually configured a MAC address for a device instead of using the address that the vendor supplied or allowing it to be assigned dynamically, and this address is also assigned to a different device.

• In rare circumstances, loops in a bridged network can cause a MAC hardware problem or an address learning problem that creates a duplicate MAC address entry in the bridging address table.

• On DECnet Phase 4 networks, MAC addresses are set from the DECnet address. A duplicate NET address can cause a duplicate MAC address.

Duplicate IP Addresses

Because IP addresses are critical for transmission of packets on TCP/IP networks, resolve duplicate IP addresses immediately.

Duplicate IP addresses can occur when someone has configured an IP address that is identical to an IP address that is assigned to a different device. Address assignments, although possible for you to configure manually, are usually made using one of these protocols:

• Dynamic Host Configuration Protocol (DHCP) - Allows your network to dynamically assign IP addresses to nodes. With this protocol, a DHCP server temporarily assigns an IP address to a node, or you can statically configure addresses as needed.

• Bootstrap Protocol (BootP) - Allows you to statically assign IP addresses to nodes. This protocol is more efficient than RARP.

• Reverse ARP (RARP) - Allows you to statically assign IP addresses to nodes. However, because this protocol relies on the MAC address to identify the node, you cannot use it on networks that dynamically assign hardware addresses.
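
A duplicate IP address shows up as one IP address claimed by more than one MAC address. The following Python sketch applies that check to a list of (IP, MAC) pairs, such as rows exported from an address table. The function name `find_duplicate_ips` and the sample addresses are hypothetical, and the sketch does not reproduce how Address Tracker or LANsentry Manager work internally.

```python
from collections import defaultdict


def find_duplicate_ips(bindings):
    """Return {ip: sorted list of MACs} for every IP claimed by more than one MAC.

    bindings is an iterable of (ip_address, mac_address) string pairs.
    """
    macs_by_ip = defaultdict(set)
    for ip, mac in bindings:
        # Normalize case so the same MAC written two ways is not counted twice.
        macs_by_ip[ip].add(mac.lower())
    return {ip: sorted(macs) for ip, macs in macs_by_ip.items() if len(macs) > 1}


if __name__ == "__main__":
    # Sample data only; the addresses are made up for illustration.
    sample = [
        ("192.0.2.10", "00:00:5E:00:53:01"),
        ("192.0.2.11", "00:00:5E:00:53:02"),
        ("192.0.2.10", "00:00:5E:00:53:03"),  # second device claiming .10
        ("192.0.2.11", "00:00:5e:00:53:02"),  # same device, different case
    ]
    for ip, macs in find_duplicate_ips(sample).items():
        print(ip, "is claimed by", ", ".join(macs))
```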

Ethernet Packet Loss

Use these sections to identify and correct Ethernet packet loss:

• Ethernet Packet Loss Overview
• Searching for Packet Loss

See Ethernet Packet Loss Reference for additional conceptual and problem analysis detail.

If your Ethernet network shows signs of congestion, it may be experiencing packet loss. When your network is congested, utilization is usually high, packets are discarded because buffers are full, and collision rates are up. Problems related to Collisions are often at the heart of packet loss.

Understanding the Problem

Collisions are normal in Ethernet networks. In many cases, Collision rates of 50 percent do not cause a large decrease in throughput. The Collision rate helps mark the upper limit on your network (the maximum percentage of collisions that your network can bear), which is usually around 70 percent. If Collisions increase above this upper limit, your network can become unreliable.

When the Collision rates increase, so do Excessive Collisions, which cause a delay in transmitting data. An increase in Collisions also indicates that network utilization and network errors, such as FCS Errors, are probably increasing.

The real packet problems to watch for, however, are undetected collisions that show up as Late Collisions.

Identifying the Problem

To confirm that your network's problem is related to packet loss, verify that frames are being dropped on your network by examining this packet loss data:

• Alignment Errors
• Collisions
• Excessive Collisions
• FCS Errors
• CRC Errors
• Late Collisions
• Receive Discards
• Too Long Errors
• Too Short Errors
• Transmit Discards

The process of identifying the problem is discussed in Searching for Packet Loss.

Solving the Problem

If you notice that packet loss data is consistently high, your network is too congested. In this case, segment your network with the appropriate network device (such as a switch or router). If Collision data shows increases but your network's utilization is the same, your network may have a physical problem, such as cabling that is too long. Other problems that packet loss data can indicate include:

• Faulty connectors or improper cabling
• Excessive numbers of repeaters between network devices
• Defective Ethernet transceivers or controllers

Possible solutions to these problems are explained in the procedures in Searching for Packet Loss.

Searching for Packet Loss

When you look for packet loss, use the following applications:

• Status Watch - For Ethernet and MIB-II data collection using SNMP polling
• LANsentry Manager Network Statistics Graph - For RMON data collection using an RMON probe
• Device View - On a per-device basis, you can evaluate statistics for any port on the device

Status Watch

Status Watch monitors:

• Alignment Errors
• Excessive Collisions
• FCS Errors
• Receive Discards
• Transmit Discards

Follow these steps:

1. Determine if the thresholds for the Alignment Errors tool and FCS Errors tool are being exceeded.

Table 16 identifies the problems that this data can indicate and your possible actions. For information about problems related to a nonstandard Ethernet implementation, see Nonstandard Ethernet Problems.

Table 16. Alignment Errors, FCS Errors, and CRC Errors Data

| Possible Problem | Possible Action |
|---|---|
| Faulty cabling | Examine the cable and cable connections for breaks or damage. |
| Network noise | Look for improper cabling, faulty cable, faulty network equipment, or cables that are too close to equipment that emits electromagnetic interference (lamps, for example). |
| Faulty transceiver | Use an analyzer to identify the problematic transceiver. If necessary, replace the transceiver, network adapter, or station. |
| Fault at the transmitting end station | (1) Locate the source of the errors by looking at the module and port statistics. (2) Verify the correct operation of the transceiver or adapter card of the device that is connected to the problem port. (3) If the card appears to be operating correctly, examine the cable and cable connections for breaks or damage. |
| Station powering up or down | None required. Early implementations of Ethernet transceivers generate a significant amount of in-band noise when powering up; they frequently cause Alignment Errors and FCS Errors in an otherwise stable network. When powering up, some software drivers for Ethernet controllers also initiate Time Domain Reflectometry (TDR) tests to test the Ethernet media. Network monitors report TDR tests as Alignment Errors and FCS Errors. |
| Faulty adapter | Replace the adapter. |

2. Determine if the Excessive Collisions tool threshold is being exceeded. Table 17 identifies the problems that this data can indicate and your possible actions.

Table 17. Collisions and Excessive Collisions Data

| Possible Problem | Possible Action |
|---|---|
| Busy network | Use a bridge, router, or switch to reconfigure your network into segments with fewer stations. |
| Faulty device (adapter, switch, hub, and the like) that does not listen before broadcasting. This problem increases the incidence of all types of collisions. | Isolate each adapter to see if the problem stops. |
| Network loop | Ensure that no redundant connections to the same station have both connections active simultaneously. |

3. Determine if the Receive Discards and Transmit Discards tools' thresholds are being exceeded.

If these errors are high in conjunction with the data that you learned in steps 1 and 2, your network is overloaded. Segment your network.

LANsentry Manager Network Statistics Graph

Use the LANsentry Manager Network Statistics graph to view data for:

• Collisions
• Late Collisions
• Bandwidth Utilization
• CRC Errors
• Too Long Errors
• Too Short Errors

Follow these steps:

1. Display a Network Statistics graph for the local Ethernet segment on which users have reported poor performance.

This graph shows the most recent trend in Collision rates. If you have set up a History sample, you can also look at the historical trend. If a number of segments are connected by repeaters, examine the graph for each Ethernet segment.

2. Analyze Utilization and Collision rates to determine whether collisions are caused by an overloaded segment or a faulty component.

• If Utilization rates are high - The collisions are probably caused by an overloaded segment. If you have added nodes or new applications to your network, consider reconfiguring the cabling system, using bridges and routers to filter out remote collisions and to keep local traffic on one segment. This action should level the network load.

• If Utilization rates are stable and appear normal - The collisions are probably caused by faulty components. In this case, do the following:

    • If the network consists of repeaters - Compare the Network Statistics graphs for each segment connected to the repeater. Because repeaters repeat traffic across all connected segments (which makes many segments seem like one network), you should see similar levels of traffic on all segments. One segment that shows dissimilar levels of traffic and collisions may indicate faulty hardware. In this case, monitor several collisions to track the source station that is transmitting too soon after collisions, and repair the station. Packets that are transmitted too soon after collisions are unlikely to be valid. See Table 17 for more information about Collisions.

    • On other networks - Determine the segment cable length.

3. Examine the CRC Errors and Late Collisions, which often indicate cabling or component problems.

Table 16 identifies the problems that CRC Errors can indicate and your possible actions. Table 18 identifies the problems that Late Collisions data can indicate and your possible actions.

Table 18. Late Collisions Data

| Possible Problem | Possible Action |
|---|---|
| Cabling problems: segment too long; failing cable; segment not grounded properly (noise); improper termination; taps too close (10BASE-5 and 10BASE-2 only); noisy cable | Correct the cabling problem by doing one or more of the following: reduce the segment length; replace the cable; ground the cable; terminate the cable correctly; check the taps; check for cables too close to equipment that emits electromagnetic interference. |
| Component problems: deaf or partially deaf node; failing repeater, transceiver, or controller cards | Correct the component problem by doing one of the following: trace the failing component and replace it; replace the NIC or the transceiver. |

4. Trace Too Short Errors and Too Long Errors to the sender.

These errors often indicate faulty routers or LAN drivers and transceiver problems. Table 19 identifies the problems that this data can indicate and your possible actions.

Table 19

| Possible Problem | Possible Action |
|---|---|
| A transceiver on your network is adding bits to the packets that are transmitted by the attached station. | (1) Use a network analyzer to identify the problematic transceiver. (2) If necessary, replace the transceiver, network adapter, or station. |
| The jabber protection mechanism on a transceiver has failed; it can no longer protect the network from the jabbering produced by the attached station. | Replace the network card. |
| Excessive noise on the cable. Note: Some 10/100 Mbps cards that autodetect the network speed may connect to the network at the wrong speed, causing excessive noise. | Check for improper cabling, faulty cable, faulty network equipment, or cables too close to noisy electronic equipment (lamps, for example). If your network card autodetects the network speed and you have ruled out other problems, manually configure the network speed. |
| Faulty routers (two different network types are connected and the router is not enforcing proper frame size restrictions) | Notify the manufacturer. |
| Faulty LAN driver | Replace the driver. |
| A normal condition on a LinkSwitch® 1000, LinkSwitch® 3000, or CoreBuilder 5000 FastModule. If you use maximum-sized 1518-byte Ethernet frames, the device's VLT-enabled ports add a frame tag of 4 bytes, resulting in a misleading Too Long Frame error. These frames are passed successfully but create the Too Long Frame error message. | If you want to eliminate the error message, reduce your Ethernet packet frames by 4 bytes. |
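
The arithmetic behind the misleading Too Long Frame error can be checked with a short Python sketch. The 1518-octet maximum and the 4-byte tag come from the text above. The 64-octet minimum is the standard Ethernet minimum frame size and is not stated in this guide. The function names are illustrative.

```python
MIN_FRAME_OCTETS = 64    # standard Ethernet minimum; shorter frames are runts
MAX_FRAME_OCTETS = 1518  # maximum frame size, including FCS octets
VLT_TAG_OCTETS = 4       # tag added by a VLT-enabled port


def classify_length(octets):
    """Classify a frame by length alone: too_short, ok, or too_long."""
    if octets < MIN_FRAME_OCTETS:
        return "too_short"
    if octets > MAX_FRAME_OCTETS:
        return "too_long"
    return "ok"


def length_on_tagged_port(frame_octets):
    """Length of the frame after a VLT-enabled port adds its tag."""
    return frame_octets + VLT_TAG_OCTETS


if __name__ == "__main__":
    for size in (1518, 1514):
        tagged = length_on_tagged_port(size)
        print(size, "->", tagged, classify_length(tagged))
```

A maximum-sized frame grows to 1522 octets once tagged and is counted as too long, while a frame that is 4 bytes smaller stays within the limit.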

Device View

Device View allows you to display a variety of port and device-level statistics relevant to Ethernet packet loss. Table 20 describes these statistics and their use in troubleshooting.

Table 20. Activity and Error Statistics in Device View

Statistics group: Activity
Description: Displays the total network activity and errors on the selected port.
Use in troubleshooting: This data shows readable packets, broadcast packets, Collisions, total errors, and runts, which cause Too Short Errors. You can interpret this data in the following ways:

• Runts are often caused by Collisions. However, if the values increase at specific times of the day, you may need to change the network topology to manage the traffic more efficiently (for example, with switches or routers).

• Runts can also be caused by a badly terminated coax cable.

• Large numbers of runts not associated with high levels of collisions can indicate a transmission problem (examine the cable).

• Particularly high numbers of Collisions compared to the total number of readable packets can point to a hardware problem (a bad adapter) or to a data loop.

• A high proportion of Broadcast Packets (>10%) on a heavily utilized network (>50% of available bandwidth) can point to an incorrectly configured bridge or router on the network.

Statistics group: Errors
Description: Displays the number of frames with errors on the selected port.
Use in troubleshooting: The significance of errors depends on accompanying errors and prevailing network conditions. See the following error data for more information:

• Alignment Errors: Table 16
• FCS Errors: Table 16
• Too Long Errors: Table 19
• Too Short Errors or runts: Table 19
• Late Collisions: Table 18

To display Activity and Errors statistics for a device or port, follow these steps:

1. Select the required port or device.

2. From the shortcut menu, select Activity or Errors.

The statistics available depend on the type of port or device selected. See Table 20 for troubleshooting information.
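
The broadcast rule of thumb from Table 20 (more than 10 percent broadcast packets on a segment running above 50 percent utilization) can be written as a simple check. The following Python sketch assumes that the broadcast proportion is measured against readable packets and that utilization is given as a fraction between 0 and 1. Both choices, and the function name, are assumptions made for illustration.

```python
def broadcast_ratio_suspicious(broadcast_packets, readable_packets, utilization,
                               max_broadcast_share=0.10, busy_utilization=0.50):
    """Return True when broadcasts exceed the share limit on a busy segment.

    utilization is a fraction of available bandwidth (0.0 to 1.0).
    """
    if readable_packets <= 0:
        return False  # no traffic observed, nothing to judge
    share = broadcast_packets / readable_packets
    return share > max_broadcast_share and utilization > busy_utilization


if __name__ == "__main__":
    # 1,500 broadcasts out of 10,000 packets at 60 percent utilization.
    print(broadcast_ratio_suspicious(1500, 10000, 0.60))
```

A True result only suggests where to look next, such as the configuration of bridges and routers on the segment.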

Ethernet Packet Loss Reference

This section explains terms that are relevant to Ethernet packet loss and provides additional conceptual and problem analysis detail.

Alignment Errors

An Alignment Error indicates a received frame in which both of the following are true:

• The number of bits received is an uneven byte count (that is, not an integral multiple of 8).

• The frame has a Frame Check Sequence (FCS) error.

Alignment Errors often result from MAC layer packet formation problems, cabling problems that cause corrupted or lost data, and packets that pass through more than two cascaded multiport transceivers. See FCS Errors for more information about interpreting Alignment Errors.

Collisions

Collisions indicate that two devices detect that the network is idle and try to send packets at exactly the same time (within one round-trip delay). Because only one device can transmit at a time, both devices must stop sending and attempt to retransmit. Collisions are detected by the transmitting stations.

The retransmission algorithm helps to ensure that the packets do not retransmit at the same time. However, if the two devices retry at nearly the same time, packets can collide again. The process repeats until either the packets finally pass onto the network without collisions or 16 consecutive collisions occur and the packets are discarded.
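
The retransmission behavior described above can be sketched in Python. This guide does not spell out the algorithm, so the sketch uses the truncated binary exponential backoff of standard CSMA/CD: after the nth consecutive collision, the station waits a random number of slot times between 0 and 2^min(n, 10) - 1. The function names and the `attempt_collides` callback are illustrative assumptions.

```python
import random

MAX_ATTEMPTS = 16   # the frame is discarded after 16 consecutive collisions
BACKOFF_LIMIT = 10  # the backoff range stops doubling after the 10th collision


def backoff_slots(collision_count, rng=random):
    """Slot times to wait after the given consecutive collision (1 to 15)."""
    if not 1 <= collision_count < MAX_ATTEMPTS:
        raise ValueError("collision_count must be between 1 and 15")
    exponent = min(collision_count, BACKOFF_LIMIT)
    return rng.randrange(2 ** exponent)  # uniform in 0 .. 2**exponent - 1


def transmit(attempt_collides, rng=random):
    """Try to send one frame.

    attempt_collides(attempt_number) returns True if that attempt collides.
    Returns (outcome, attempts_used, total_slots_waited).
    """
    waited = 0
    for attempt in range(1, MAX_ATTEMPTS + 1):
        if not attempt_collides(attempt):
            return "sent", attempt, waited
        if attempt < MAX_ATTEMPTS:
            waited += backoff_slots(attempt, rng)
    return "discarded", MAX_ATTEMPTS, waited


if __name__ == "__main__":
    # First two attempts collide, third succeeds.
    print(transmit(lambda attempt: attempt < 3))
    # Every attempt collides: the frame is dropped (an excessive collision).
    print(transmit(lambda attempt: True))
```

Because the random wait range doubles with each collision, two stations that collided once are increasingly unlikely to pick the same retry time again.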

CRC Errors

A Cyclic Redundancy Check (CRC) Error is an RMON statistic that combines FCS Errors and Alignment Errors. These errors indicate that packets were received with either of the following:

• A bad FCS and an integral number of octets (FCS Errors)
• A bad FCS and a non-integral number of octets (Alignment Errors)

CRC Errors can cause an end station to freeze. If a large number of CRC Errors are attributed to a single station on the network, replace the station's network interface board. Typically, a CRC Error rate of more than 1 percent of network traffic is considered excessive.
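
The relationship between the three counters can be made concrete with a short Python sketch. It classifies a received frame from its bit count and FCS result, following the definitions above, and applies the 1 percent guideline. Measuring the rate per frame rather than per octet is an assumption, and the function names are illustrative.

```python
def classify_frame(bits_received, fcs_ok):
    """Classify one received frame as ok, fcs_error, or alignment_error."""
    if fcs_ok:
        return "ok"
    if bits_received % 8 == 0:
        return "fcs_error"        # bad FCS, integral number of octets
    return "alignment_error"      # bad FCS, non-integral number of octets


def crc_errors_excessive(total_frames, fcs_errors, alignment_errors,
                         threshold=0.01):
    """True when FCS plus Alignment Errors exceed the share of frames allowed."""
    if total_frames <= 0:
        return False
    crc_errors = fcs_errors + alignment_errors  # the combined RMON statistic
    return crc_errors / total_frames > threshold


if __name__ == "__main__":
    print(classify_frame(12144, fcs_ok=False))
    print(classify_frame(12147, fcs_ok=False))
    print(crc_errors_excessive(100000, 900, 300))
```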

Excessive Collisions

Excessive Collisions indicate that 16 consecutive collisions have occurred, usually a sign that the network is becoming congested. For each excessive collision count (that is, after 16 consecutive collisions), a packet is dropped. If you know the normal rate of excessive collisions, you can determine when the rate of packet loss is affecting your network's performance. See Knowing Your Network's Configuration for more information.

FCS Errors

Frame Check Sequence (FCS) Errors, a type of CRC error, indicate that frames received by an interface are an integral number of octets long but do not pass the FCS check. The FCS is a mathematical way to ensure that all the frame's bits are correct without having the system examine each bit and compare it to the original. Packets with Alignment Errors also generate FCS Errors.

Both Alignment Errors and FCS Errors can be caused by equipment powering up or down or by interference (noise) on unshielded twisted-pair (10BASE-T) segments. In a network that complies with the Ethernet standard, FCS or Alignment Errors indicate bit errors during a transmission or reception. A very low rate is acceptable. Although Ethernet allows a 1 in 10^8 bit error rate, typical Ethernet performance is 1 in 10^12 or better.
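
To see what these bit error rates mean per frame, the following Python sketch converts a bit error rate into the probability that a frame contains at least one bad bit. It assumes independent bit errors, which is a simplification, and the function name is illustrative.

```python
import math


def frame_error_probability(frame_octets, bit_error_rate):
    """Probability that at least one bit in the frame is corrupted.

    Computes 1 - (1 - BER) ** bits using log1p/expm1 to stay accurate
    for very small error rates.
    """
    bits = frame_octets * 8
    return -math.expm1(bits * math.log1p(-bit_error_rate))


if __name__ == "__main__":
    for ber in (1e-8, 1e-12):
        p = frame_error_probability(1518, ber)
        print(f"BER {ber:g}: about 1 bad frame in {1 / p:,.0f} maximum-sized frames")
```

Under this assumption, a maximum-sized 1518-octet frame has roughly a 0.012 percent chance of being corrupted at the allowed limit of 1 in 10^8.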

Late Collisions

Late Collisions indicate that two devices have transmitted at the same time, but cabling errors (most commonly, excessive network segment length or repeaters between devices) prevent either transmitting device from detecting a collision. Neither device detects a collision because the time to propagate the signal from one end of the network to the other is longer than the time to put the entire packet on the network. As a result, neither of the devices that cause the late collision senses the other's transmission until the entire packet is on the network.

Although late collisions occur for small packets, the transmitter cannot detect them. As a result, a network suffering measurable Late Collisions for large packets is losing small packets as well.
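
The timing argument can be illustrated with a rough Python sketch that compares the round-trip signal delay with the time needed to put a frame on the wire. The round trip is used because a collision signal must travel back to the sender before the sender finishes transmitting. The propagation speed of 2 x 10^8 m/s and the 10 Mbps bit rate are assumed values, repeater delays are ignored, and the function names are illustrative.

```python
def transmit_time_s(frame_octets, bit_rate_bps):
    """Time to place the whole frame on the network, in seconds."""
    return frame_octets * 8 / bit_rate_bps


def round_trip_s(length_m, propagation_mps):
    """Time for a signal to cross the network and come back, in seconds."""
    return 2 * length_m / propagation_mps


def collision_detectable(frame_octets, length_m,
                         bit_rate_bps=10_000_000, propagation_mps=2.0e8):
    """True if the sender is still transmitting when a collision could return."""
    return (round_trip_s(length_m, propagation_mps)
            < transmit_time_s(frame_octets, bit_rate_bps))


if __name__ == "__main__":
    for length_m in (2500, 6000):
        for octets in (64, 1518):
            ok = collision_detectable(octets, length_m)
            print(f"{length_m} m, {octets}-octet frame: detectable={ok}")
```

On the overlong 6000 m example, a collision involving a 1518-octet frame is still seen by the sender (as a late collision), but a 64-octet frame is fully sent before any collision signal can return, which matches the point above about small packets being lost silently.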

Nonstandard Ethernet Problems

Table 21 lists the symptoms that typically occur if a system violates the Ethernet standard.

Table 21 Symptoms of Common Ethernet Network Problems

Symptoms: FCS Errors and Alignment Errors increase significantly.
Problem: Network cabling is too long.
Notes: If you use a promiscuous network monitor, the number of Late Collisions reported by stations should correlate with the FCS and Alignment Errors reported by the monitor.

Symptoms: FCS and Alignment Errors increase proportionally with interference (sometimes referred to as noise hits).
Problem: Network segment is noisy.
Notes: Typically observed on a 10BASE-T network segment in a noisy environment. If you use multiple promiscuous monitors, the FCS and Alignment Errors among the monitors will not correlate.

Symptoms: If the monitor can track runts (also called Too Short Errors), the number of runt packets should be significantly higher than normal. FCS and Alignment Errors are much higher than normal.
Problem: Networks do not conform to the access scheme of Carrier Sense Multiple Access with Collision Detect (CSMA/CD).
Notes: Occurs when some implementations of Ethernet in the segment are not entirely compatible with IEEE 802.3 repeaters. Collision fragments linger on the network long enough to collide with retry packets at the minimum interpacket gap (IPG). The IPG is smaller on one side of the repeated network, causing a lost packet. Ethernet controllers cannot receive packets that are separated by 4.7 µs or less. Some controllers cannot sustain receptions of packets separated by as much as 9.6 µs. If runt packets are received one after another and are followed by a collision fragment, Ethernet controllers that cannot sustain reception will lose packets.

Receive Discards

Receive Discards indicate that received packets could not be delivered to a high-layer protocol because of congestion or packet errors.

Too Long Errors

A Too Long Error indicates that a packet is longer than 1518 octets (including FCS octets) but otherwise well formed. Too Long Errors are often caused by a bad transceiver, a malfunction of the jabber protection mechanism on a transceiver, or excessive noise on the cable.

Too Short Errors

A Too Short Error, also called a runt, indicates that a packet is fewer than 64 octets long (including FCS octets) but otherwise well formed.
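The two length limits above can be expressed as a small check. This Python sketch is only an illustration of the 64-octet and 1518-octet boundaries described in the text; the function and constant names are hypothetical, and it looks at length alone, not at whether the frame is otherwise well formed.

```python
MIN_FRAME_OCTETS = 64    # fewer octets than this is a runt (Too Short Error)
MAX_FRAME_OCTETS = 1518  # more octets than this is a Too Long Error


def classify_frame_length(octets: int) -> str:
    """Classify a frame by its length in octets, FCS octets included."""
    if octets < MIN_FRAME_OCTETS:
        return "too short"
    if octets > MAX_FRAME_OCTETS:
        return "too long"
    return "ok"


for length in (60, 64, 1518, 1600):
    print(length, classify_frame_length(length))
```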

Transmit Discards

Transmit Discards indicate that packets were not transmitted because of network congestion.

FDDI Ring Errors

Use these sections to identify and correct FDDI ring errors:

• FDDI Ring Errors Overview
• Identifying Ring Errors

Fiber Distributed Data Interface (FDDI) often corrects its own problems. However, because FDDI cannot correct all errors (especially those related to hardware problems), you should monitor FDDI errors.

Understanding the Problem

FDDI ring errors that you should monitor include:

• Elasticity Buffer Error Condition
• Frame Error Condition
• Frames Not Copied Condition
• Link Error Condition

Identifying the Problem

First, determine the type of FDDI ring errors and where they are occurring. Similar to the way you identify other FDDI problems, identify the upstream and downstream neighbors of the devices that you are monitoring.

Several types of network errors can cause FDDI performance problems. For example, problems with cables or physical connections may result in a link or frame error. Elasticity buffer (EB) errors can also lead to link and frame errors.

FDDI deals with port-related errors as follows:

• The variable PORTlerAlarm is the link error rate (LER) value at which a link connection generates an alarm. When the LER is greater than the alarm setting, Station Management (SMT) sends a Status Report Frame (SRF) to notify you that there is a problem with a port.

The PORTlerAlarm threshold is set lower than the PORTlerCutoff threshold so that you are notified of a problem before the port is actually removed from the ring.

• When link errors reach the threshold defined by the variable PORTlerCutoff, SMT breaks the connection, disabling the PHY that detected the problem. A Link Error Condition is also generated.

FDDI deals with MAC-related errors as follows:

• When MAC frame errors reach a certain threshold, a Frame Error Condition is generated. Because the actual error can be further upstream than the immediate connection, the connection remains intact.

• For a large network, the worst-case MACFrameErrorRatio is less than 0.1 percent. However, during network configuration, frame error ratios can reach 50 percent for short periods. When you detect a sustained frame error ratio of more than 0.1 percent, a problem exists between the station that is reporting the condition and the nearest upstream MAC.
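The distinction between a short spike and a sustained frame error ratio can be sketched in Python as follows. This is a minimal illustration, not how SMT or any management tool computes the condition: the function name is hypothetical, the input is assumed to be a series of ratios already expressed in percent, and "sustained" is assumed to mean three consecutive samples above the threshold, a number the guide does not specify.

```python
from typing import Iterable


def sustained_frame_errors(
    ratios_percent: Iterable[float],
    threshold_percent: float = 0.1,
    min_consecutive: int = 3,  # assumption: the source does not define "sustained"
) -> bool:
    """Return True if the ratio stays above the threshold for
    min_consecutive samples in a row."""
    run = 0
    for ratio in ratios_percent:
        run = run + 1 if ratio > threshold_percent else 0
        if run >= min_consecutive:
            return True
    return False


# A brief spike during reconfiguration is ignored; a persistent excess is flagged.
print(sustained_frame_errors([0.05, 50.0, 0.02, 0.03]))
print(sustained_frame_errors([0.2, 0.3, 0.15, 0.05]))
```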

Solving the Problem

To solve problems related to FDDI errors, fix the hardware, cabling, or congestion problem.

Identifying Ring Errors

Use Status Watch to monitor your FDDI devices for Warning or Critical alerts.

Status Watch

Use Status Watch to identify FDDI ring errors.

Follow these steps:

1. Monitor the FDDI Status tool for the currently selected device.

2. Determine whether Status Watch is reporting Elasticity Buffer Errors or a high percentage of Frame Errors, Frames Not Copied, or Link Error Rates for the currently selected device.

FDDI Ring Errors Reference

This section provides additional conceptual and problem analysis detail.

Elasticity Buffer Error Condition

The Elasticity Buffer Error condition occurs when a port's elasticity buffer overflows or underflows. This condition usually indicates that a port's hardware is not operating within the tolerances that the FDDI standard specifies. Look for the problem in the hardware of either the port that is reporting the condition or of the immediately adjacent port.

Frame Error Condition

The Frame Error condition occurs when the percentage of frames that contain errors exceeds a preset threshold. When a device is an uplink to FDDI (that is, a device is transmitting onto FDDI), this type of condition indicates that the ring is saturated. The ring is out of buffer space, and packets are being dropped from the device's backbone port.

The problem indicated by the frame errors is usually located between the MAC that reports the condition and its upstream neighbor. Because many physical connections can lie along this path, the MACFrameErrorRatio variable identifies only the two MACs between which the problem is occurring.

Frames Not Copied Condition

The Frames Not Copied condition occurs when the percentage of frames that are dropped because of insufficient buffer space exceeds a preset threshold. This condition indicates that the station is congested and is unable to process frames as quickly as they arrive. To help eliminate congestion:

• Add more capacity to the station
• Reconfigure your network so that end stations that communicate heavily with one another are on the same bridge or switch
• Filter out certain traffic

Link Error Condition

The Link Error condition occurs when a port detects link errors at a rate that exceeds a preset threshold. When the Link Error threshold is exceeded, the station removes itself from the ring and tries to reinsert itself on the ring. This action creates a MAC Neighbor Change Event (which also occurs if a ring wraps).

Link errors may indicate an FDDI PHY hardware problem (such as a faulty transmitter) or a faulty cable or connector. Look for the problem in the portion of the network between the port that is reporting the condition and the first upstream transmitter.

MAC Neighbor Change Event

The MAC Neighbor Change event occurs when a MAC's upstream or downstream neighbor changes.

This event indicates either:

• A network reconfiguration
• Another station that is leaving or joining the ring

Network File Server Timeouts

Use these sections to identify and correct timeouts on network file servers:

• Network File Server Timeout Overview
• Looking for Obvious Errors
• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

See Network File Server Timeouts Reference for additional conceptual and problem analysis detail.

A network file server can time out if your network gets congested or if your server is having problems. Users might have problems downloading data from or to the server or copying files from or to the server. To help you to understand the troubleshooting process for this type of problem, an EXAMPLE throughout this section follows the symptoms, analysis, and resolution of a typical file server timeout problem.

Understanding the Problem

When users log in, their stations make network file server calls either to determine quotas (if this feature has been enabled) or to mount user home directories. The network file server timeout messages, even when spread across multiple nodes, indicate a problem either with the network or with a server.

EXAMPLE: UNIX users notice that it takes a long time - over 30 seconds in some cases - to log in to any machine. Some machines report network file server timeout messages, but the messages have no obvious pattern and are infrequent. You begin to get a sense of the problem.

Identifying the Problem

First, rule out the obvious causes. Ask these questions:

• Can you access the network file server with Telnet?
• Have any alarms been triggered?
• Are there any new errors?

The process of identifying the problem is developed in Looking for Obvious Errors.

Solving the Problem

To determine the cause, reproduce the fault while you monitor the network. After you know the cause, you can fix the problem.

The solutions to the network file server timeout are identified in these sections:

• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

Looking for Obvious Errors

To look for obvious errors, use these applications:

• Ping and Telnet - To test for connectivity to the network file server nodes
• LANsentry Manager Alarms View - To search for triggered alarms
• LANsentry Manager Statistics View - To look for errors
• LANsentry Manager History View - To identify trends

Ping and Telnet

Determine whether you can contact network file server nodes using Ping and Telnet. If the response is extremely slow, a problem may exist with the connections to the nodes. No delay indicates that the connections are normal, implying that the delay is occurring elsewhere. In this case, use LANsentry® Manager tools to determine whether packets are being lost or ignored.
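If you want to put a number on "extremely slow", one option is to time how long a TCP connection to the server's Telnet port takes to open. The Python sketch below does this with the standard socket module. The function name, the 5-second timeout, and the choice of port 23 are illustrative assumptions, and the check measures only connection setup, not login time.

```python
import socket
import time
from typing import Optional


def tcp_connect_time(host: str, port: int = 23, timeout: float = 5.0) -> Optional[float]:
    """Return the seconds taken to open a TCP connection to host:port,
    or None if the connection fails or times out."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return None


# Example use (the host name is hypothetical):
#   elapsed = tcp_connect_time("fileserver.example")
#   print("no connection" if elapsed is None else f"connected in {elapsed:.3f} s")
```

A result of None means the connection could not be opened at all, which points to a connectivity problem rather than a slow server.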

LANsentry Manager Alarms View

Using the LANsentry Manager Alarms View, you can determine if any configured alarms have been triggered.

Search the Alarms View to see if any MAC events have been logged.

EXAMPLE: MAC events have not been logged for the network to which the UNIX users are attached.

Even though no alarms have occurred, errors may exist. For example, a lower rate of background errors may exist just below the alarm threshold. Because they are based on maximum and minimum values, RMON alarms may miss constant, periodic, or low amounts of errors.
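The following Python sketch shows how a steady error rate just below a threshold can go unnoticed while adding up to far more errors than a single burst that does trigger an alarm. It is a deliberately simplified model: real RMON alarms use rising and falling thresholds over sampled intervals, and the function name and sample values here are made up for illustration.

```python
from typing import Sequence


def alarm_triggered(samples: Sequence[int], rising_threshold: int) -> bool:
    """Simplified alarm: fires if any single sample reaches the threshold."""
    return any(count >= rising_threshold for count in samples)


THRESHOLD = 10                 # hypothetical errors-per-sample alarm threshold
steady = [9] * 60              # constant background errors, just under the threshold
burst = [0] * 59 + [12]        # one short burst that crosses the threshold

print("steady:", alarm_triggered(steady, THRESHOLD), "total errors:", sum(steady))
print("burst: ", alarm_triggered(burst, THRESHOLD), "total errors:", sum(burst))
```

In this model the steady series never raises an alarm even though it accumulates many more errors, which is why the Statistics and History views are still worth checking when the Alarms View is empty.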

LANsentry Manager Statistics View

Using the LANsentry Manager Statistics View, you can display a multisegment graph of utilization and error statistics.

Follow these steps:

1. Set up a graph that shows utilization and errors on all your major segments.

2. Determine whether any segments are particularly busy or error prone.

EXAMPLE: You notice that one segment of the UNIX network, HUB3, is reporting Too Long Errors and FCS Errors roughly every second sample. While the amount is not higher than normal, it is currently higher than any other segment.

LANsentry Manager History View

Using the LANsentry Manager History View, you can display a rolling history table to determine if the errors that you are seeing are new. For example, if you have a history table that runs for 30-minute samples over two days, you can compare the most recent sample to a previous sample, looking for new errors. If your probe has the resources, use a much finer resolution sample stored for a shorter time (every 30 seconds for 2 hours) to more easily spot recent errors.

EXAMPLE: You see that the history table shows that no error rates remained constant throughout the day. However, errors that did occur were on the device HUB3 and were Too Long Errors and FCS Errors.

Reproducing the Fault While Monitoring the Network

Although the RMON View in LANsentry Manager can show error rates and help you to identify the location of the problem, it may not provide enough data to solve the problem. To determine the cause of the problem, reproduce it while you monitor the network by using these applications:

• LANsentry Manager Top-N Graph - To locate a quiet node to use for reproducing the fault
• LANsentry Manager Packet Capture - To capture packets from the hub to which the quiet node is attached
• LANsentry Manager Packet Decode - To analyze the packets to assess network file server traffic and delays
• Address Tracker - To find the location of the problem nodes

EXAMPLE: Using LANsentry Manager, you find a hub on the network with a higher than normal error rate. However, the error rate does not seem high enough to cause login delays of 60 seconds or more.

LANsentry Manager Top-N Graph

Using the Top-N graph in the LANsentry Manager main window, locate a quiet node that has been showing the same problem. Choose a quiet node so that you do not receive excessive traffic when you try to isolate the problem.

EXAMPLE: You see that the node Monolith, which has the same Network File System (NFS) mounts as the other nodes on the network, is quiet. You decide to use this node for reproducing the fault. See Network File System (NFS) Protocol for more information about NFS.

LANsentry Manager Packet Capture

Using the LANsentry Manager Packet Capture application, capture packets from the network using predefined patterns and start-and-stop conditions.

Follow these steps:

1. Set up a capture buffer on a probe that is connected to the same hub as the quiet node. Until you know more about the problem, set a very general filter.

EXAMPLE: You select a MAC-layer filter and set a conversation filter to capture all packets to and from Monolith.

2. Telnet into and log out from the quiet node. Then reset the capture buffer. Repeat this procedure until you see the problem reflected in your captured data. To keep the buffer information clean, reset the buffer each time that you repeat the procedure.

3. When you see the delay, note the rough value of the packet count on the LANsentry Manager packet buffer.

By noting the packet count at which you think the delay has occurred, you can narrow the problem to within about 20 packets in the buffer. If you have used an extremely quiet node, you may even identify the exact packet.

LANsentry Manager Packet Decode

The LANsentry Manager Packet Decode application decodes all major protocols and displays the packet contents at three levels of detail: summary information, header information, and actual packet content.

Follow these steps:

1. Open the buffer in the Packet Decode application and locate the number of the packet at which the delay occurred.

2. Select the packet and launch a MAC-layer conversation filter. In the filter display, look for a gap in the conversation (that is, where the node sent a request and then resent it at approximately the same rate as the delay you experienced when recreating the problem).

3. Repeat the test to determine if the result concentrates on one node or if it appears on other nodes.

EXAMPLE: On the quiet node that you selected, the delay is obvious. You see an NFS request going out to a node and a repeat of the request 30 seconds later. During that time, the node did not respond. You now know that the delay occurred because nodes were not seeing responses for NFS requests. When you repeat the test on other nodes, you find that the delay is happening with more than one destination node.
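The "gap in the conversation" described above can also be found programmatically if you export the conversation as a list of timestamped packets. The Python sketch below assumes a hypothetical, simplified record format of (timestamp in seconds, "request" or "reply", request identifier); it is not the LANsentry Manager export format, and the 1-second minimum gap is an arbitrary illustrative value.

```python
from typing import Dict, Iterable, List, Tuple

Packet = Tuple[float, str, str]  # (timestamp_seconds, "request" | "reply", request_id)


def find_retry_gaps(packets: Iterable[Packet], min_gap: float = 1.0) -> List[Tuple[str, float]]:
    """Return (request_id, gap_seconds) for each request that was resent
    at least min_gap seconds later without a reply in between."""
    pending: Dict[str, float] = {}  # request_id -> time of the unanswered request
    gaps: List[Tuple[str, float]] = []
    for timestamp, kind, request_id in packets:
        if kind == "request":
            if request_id in pending and timestamp - pending[request_id] >= min_gap:
                gaps.append((request_id, timestamp - pending[request_id]))
            pending[request_id] = timestamp
        elif kind == "reply":
            pending.pop(request_id, None)
    return gaps


# Made-up capture: request "b" gets no reply and is resent 30 seconds later.
capture = [
    (0.00, "request", "a"),
    (0.01, "reply", "a"),
    (1.00, "request", "b"),
    (31.00, "request", "b"),
    (31.02, "reply", "b"),
]
print(find_retry_gaps(capture))
```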

Address Tracker

Use Address Tracker, which polls managed devices, to determine the hubs to which the problem nodes are attached. If the problem end stations are located on unmanaged devices, you can at least narrow the problem to those unmanaged devices.

EXAMPLE: Although your network does not have managed hubs that Transcend® NCS management software can poll, it does have managed switches. When it polls the switches, Address Tracker displays the switch ports on which addresses were last seen. This information indicates the hub (but not the hub port) on which the device is located.

LANsentry Manager Packet Decode

After you know the location of the hub that has the problem node, monitor the problem from the hub using LANsentry Manager Packet Decode.

Follow these steps:

1. To capture packets from one of the nodes on the hub, set up another capture buffer and repeat the exercise that is described in LANsentry Manager Packet Capture. Because a delay may occur on a different node, use two capture buffers without stopping the first one.

Note the rough packet count where the delay appears.

2. Display a conversation filter of the packet where the delay appears and look for the gap in the conversation.

EXAMPLE: You hope that the nodes are on the same hub. You find that all the nodes are on HUB3. This result indicates that FCS Errors may be causing the timeouts. However, because the errors occur at a low rate, you decide to verify this diagnosis. You monitor the problem from the hub, logging in and out many times, and the delay eventually occurs. This time, the delay shows that the node's reply had an FCS Error, even though the node received the request. The switch would not have transmitted this packet, causing a timeout on the NFS protocol. The retry time is presumably 30 seconds. During this test, you see the problem occurring on another node.

Correcting the Fault

Without a managed hub, you may find it very difficult to discover network file server timeout errors. To find the problematic node, you must either systematically isolate nodes by monitoring each node for a prolonged period or temporarily insert a managed hub.

EXAMPLE: You notice that the captured error packet failed FCS because it was corrupted by a regular pattern during transmission. A possible reason for this occurrence is a Jabbering node. This explanation makes sense because FCS/Jabber frames increased linearly when you were monitoring the live network.

Network File Server Timeouts Reference

This section explains terms that are relevant to network file server timeouts and provides additional conceptual and problem analysis detail.

Jabbering

Jabbering occurs when a node transmits illegal-length packets and is possibly not operating within carrier specifications. In effect, another node has written bad data over a valid packet. This bad data is often interpreted as a repeated sequence of data.

Network File System (NFS) Protocol

A distributed file system protocol developed by Sun Microsystems that allows a computer system to access files over a network as if they were on its local disks. This protocol has been incorporated into products by more than 200 companies. It is now a de facto Internet standard.

NFS is one protocol in the NFS suite of protocols, which includes NFS, RPC, XDR (External Data Representation), and others. These protocols are part of a larger architecture that Sun Microsystems refers to as Open Network Computing (ONC). ONC is a distributed applications architecture designed by Sun and currently controlled by a consortium led by Sun.

Troubleshooting TCP/IP - Detailed Steps

This article shows how to troubleshoot TCP/IP connectivity between computers on a Windows network. If you haven't already done so, disable XP's Internet Connection Firewall on all local area network connections and remove all firewall programs on the network. Improperly configured firewalls are the most common cause of TCP/IP problems.

Open a Command Prompt Window

For many of these steps, you'll be typing at the command prompt. To open a command prompt window in Windows 2000 or XP, click Start | Run, type cmd in the box, and click OK. To open a command prompt window in Windows 95, 98, or Me, click Start | Run, type command in the box, and click OK. Type one command per line, and press Enter after each one to execute it. To close the command prompt window, use the exit command.

Determine the TCP/IP Settings

Determine the TCP/IP settings of each computer on the local area network. In XP, open the Network Connections folder, right click the LAN connection, and click Status | Support | Details. For example, here are the Status and Details views for the LAN connection on an Internet Connection Sharing host.

In Windows 95/98/Me, click Start | Run, type winipcfg in the box, and click OK. Select the LAN adapter from the menu and click More Info. Here's the winipcfg view for an ICS client running Windows Me.

You can also see the TCP/IP settings from the command prompt. This is especially convenient if a computer has more than one network adapter. Use the ipconfig /all command, which is available in all versions except Windows 95. The output from this command can be long, so it's best to write it to a file. Specify the file name in the command this way:

ipconfig /all >ipconfig.txt

Description of TCP/IP Settings

Here are the TCP/IP settings that are used in network troubleshooting:

• IP Address - Unique address assigned to a network adapter. A computer with multiple network adapters has an IP address for each one, and each one must be in a different subnet.

• Subnet Mask - Used in conjunction with the IP address to determine which subnet an adapter belongs to. At the simplest level, communication is only possible between two network adapters when they're in the same subnet.

• Default Gateway - IP address of a computer or router, on one of this computer's local area networks, that knows how to communicate with subnets not present on this computer. For an Internet connection, the default gateway is a router belonging to your Internet service provider, and all access to sites on the Internet goes through it. For an ICS client, the default gateway is the ICS host. If you use a hardware router, it serves as the default gateway.

• DHCP Server - If an adapter is configured to obtain an IP address automatically, this is the address of the server that provides it. It could be your ISP, an ICS host, or a hardware router.

• DNS Servers - IP address of one or more Domain Name Server computers. DNS servers translate Internet names to their IP addresses (like 63.146.109.227).
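How the subnet mask and IP address combine can be checked with a few lines of Python using the standard ipaddress module. This is only a sketch to make the "same subnet" idea concrete: the function name is hypothetical, the addresses are example values, and it assumes both adapters use the same mask.

```python
import ipaddress


def same_subnet(ip_a: str, ip_b: str, mask: str) -> bool:
    """Return True if both addresses fall in the same subnet under the given mask."""
    net_a = ipaddress.ip_network(f"{ip_a}/{mask}", strict=False)
    net_b = ipaddress.ip_network(f"{ip_b}/{mask}", strict=False)
    return net_a == net_b


# Example values only; substitute the settings reported on your own computers.
print(same_subnet("192.168.1.101", "192.168.1.123", "255.255.255.0"))
print(same_subnet("192.168.1.101", "192.168.2.5", "255.255.255.0"))
```

With a mask of 255.255.255.0, only the last number of the address may differ between two adapters on the same subnet, so the second pair is reported as being on different subnets.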

Subnets

See our article on subnets for a brief description of how they work. For more details, see this Microsoft Knowledge Base article.

If two computers are supposed to be on the same subnet but aren't, something is wrong with the network hardware or software configuration. This is most likely to happen when one of them receives an IP address of 169.254.x.x, which indicates that:

• It's configured to obtain an IP address automatically
• It couldn't find a DHCP server on the network to make the assignment
• Windows assigned it an Automatic Private IP Address
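Spotting an Automatic Private IP Address in a list of settings is easy to automate. The Python sketch below tests whether an address falls in the 169.254.x.x range described above; the function name is hypothetical and the sample addresses are illustrative.

```python
import ipaddress

# The 169.254.x.x range that Windows uses for Automatic Private IP Addresses.
APIPA_NETWORK = ipaddress.ip_network("169.254.0.0/16")


def is_automatic_private_address(ip: str) -> bool:
    """Return True if the address is in the 169.254.x.x range."""
    return ipaddress.ip_address(ip) in APIPA_NETWORK


for address in ("169.254.10.20", "192.168.1.101"):
    print(address, is_automatic_private_address(address))
```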

See our article on Specific Networking Problems and Their Solutions for more information.

Pinging

The ping command is the basic tool for testing TCP/IP connectivity. It sends a special packet (called ICMP Echo) to a particular IP address and looks for a reply. If everything is working right, the reply comes back. If not, the ping times out in a few seconds. By default, the ping command repeats the process four times. Here's an example of an ICS client computer pinging a Windows XP Home Edition ICS host, using the host's IP address and its computer name.

When ping fails, you'll see one of these error messages:

• Request timed out - The IP address is valid, but there's no reply from it. If the IP address is on a local area network, the most likely cause is a firewall program blocking the ping.

• Unknown host <name> or Ping request could not find host <name> - The computer name doesn't exist on the local area network. Make sure that NetBIOS over TCP/IP is enabled.

• Destination host unreachable - The IP address isn't on a local area network, and the default gateway can't access it. Either there's no default gateway, its address is wrong, or it isn't functioning.
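If you save ping output to a file, the three messages above can be matched against it automatically. This Python sketch maps each message to the likely cause given in the list; the function and dictionary names are hypothetical, and the matching is a plain case-insensitive substring search, so it assumes the English wording shown above.

```python
from typing import Optional

# Message fragments from the list above, mapped to the likely cause it gives.
PING_FAILURE_HINTS = {
    "request timed out": "Address is valid but did not reply; on a LAN, suspect a firewall blocking the ping.",
    "unknown host": "Name not found on the LAN; check that NetBIOS over TCP/IP is enabled.",
    "could not find host": "Name not found on the LAN; check that NetBIOS over TCP/IP is enabled.",
    "destination host unreachable": "Address is off the LAN and the default gateway cannot reach it; check the gateway.",
}


def explain_ping_failure(output: str) -> Optional[str]:
    """Return a hint for the first known failure message found, or None."""
    text = output.lower()
    for marker, hint in PING_FAILURE_HINTS.items():
        if marker in text:
            return hint
    return None


print(explain_ping_failure("Request timed out."))
print(explain_ping_failure("Ping request could not find host win98."))
```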

Pinging the Local Area Network

Here is a series of ping commands to use in finding where a problem occurs on a local area network. Run them in the order shown, and don't go on to the next command until all of the previous commands work properly. In this example:

• The computer being tested is named Winxp, with IP address 192.168.1.101
• There's another computer on the network, named Win98, with IP address 192.168.1.123

Substitute the appropriate IP addresses and computer names for your network.

Command               Target                          What Ping Failure Indicates
ping 127.0.0.1        Loopback address                Corrupted TCP/IP installation
ping localhost        Loopback name                   Corrupted TCP/IP installation
ping 192.168.1.101    This computer's IP address      Corrupted TCP/IP installation
ping winxp            This computer's name            Corrupted TCP/IP installation
ping 192.168.1.123    Another computer's IP address   Bad hardware or NIC driver
ping win98            Another computer's name         NetBIOS name resolution failure
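The "run in order and stop at the first failure" procedure can be scripted. The Python sketch below is one possible way to do it, using the example addresses and names from the table above. The function names are hypothetical, the ping exit code is treated only as a hint because its meaning varies between platforms, and the ping function is passed in as a parameter so that the logic can be tried without touching the network.

```python
import platform
import subprocess
from typing import Callable, List, Optional, Tuple

# (target, what a failure indicates), in the order given in the table above.
LAN_STEPS: List[Tuple[str, str]] = [
    ("127.0.0.1", "Corrupted TCP/IP installation"),
    ("localhost", "Corrupted TCP/IP installation"),
    ("192.168.1.101", "Corrupted TCP/IP installation"),
    ("winxp", "Corrupted TCP/IP installation"),
    ("192.168.1.123", "Bad hardware or NIC driver"),
    ("win98", "NetBIOS name resolution failure"),
]


def ping_once(target: str) -> bool:
    """Send one echo request with the system ping command; True on exit code 0."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    result = subprocess.run(
        ["ping", count_flag, "1", target],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def first_failure(
    steps: List[Tuple[str, str]],
    ping: Callable[[str], bool] = ping_once,
) -> Optional[Tuple[str, str]]:
    """Run the steps in order and return the first failing (target, meaning),
    or None if every ping succeeds."""
    for target, meaning in steps:
        if not ping(target):
            return target, meaning
    return None


# Example use on a real network:
#   print(first_failure(LAN_STEPS) or "All pings succeeded")
```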

To fix a corrupted TCP/IP installation on Windows XP, follow the steps in this Microsoft Knowledge Base article. For Windows 95/98/Me, un-install the TCP/IP protocol in Control Panel | Network, reboot, and re-install it. If that doesn't fix it, use this procedure on Windows 95 or 98.

Pinging the Internet

You can also use ping to find a problem with Internet access. Run these commands in the order shown, and don't go on to the next command until all of the previous commands work properly. Use the Default Gateway and DNS Server addresses that you got from the winipcfg or ipconfig /all command.

Command                  Target                What Ping Failure Indicates
ping w.x.y.z             Default Gateway       Default Gateway down
ping w.x.y.z             DNS Server            DNS Server down
ping w.x.y.z             Web site IP address   Internet service provider or web site down
ping www.something.com   Web site name         DNS Server down or web site down

REFERENCE

WWW.PRACTICALLYNETWORK.COM

WWW.COMPUTERHOPE.COM

WWW.WIKIPEDIA.ORG

THANK YOU


Traffix Manager

Traffix Manager is a performance-monitoring application that provides information about layer 2 (RMON) and layer 3 conversations between nodes. It helps you to assess traffic patterns on your network. Traffix Manager:

• Monitors all the stations that the RMON2-compliant probes encounter on your network
• Captures and stores RMON and RMON2 data for your network's protocols and applications
• Displays traffic between stations in user-defined views of the network
• Graphs current or historical data on the devices selected
• Delivers reports for user-specified stations and time periods as PostScript to your printer or as HTML to your Web server
• Launches LANsentry Manager tools for in-depth analysis of a station or a conversation between stations

You can use Traffix Manager to:

• Know your network - Understand overall flow patterns and interactions between systems, and determine how your network is really being used at the application level.
• Optimize your network - Gain an insight into traffic and application usage trends to help you optimize the use and placement of current network resources and make wise decisions about capacity planning and network growth.

Traffix Manager works with any device (from 3Com or other vendors) that supports the RMON2 MIB.

Device View

The Device View application is a device configuration tool. When you troubleshoot your network, you can use Device View to determine or change a device's configuration. You can also use Device View to look at a device's statistics and to set alarms.

Device View manages only 3Com devices.

Network Management Platforms

As part of your troubleshooting toolbox, your network management platform is the first place to go to view the overall health of your network. With the platform, you can understand the logical configuration of your network and configure views of your network to understand how devices work together and the role that they play in the users' work. The network management platform that supports your Transcend software installation can provide valuable troubleshooting tools. Transcend runs on several platforms within the NT and UNIX environments.

The platform discovers the devices. Transcend imports that information from the platform to populate the core database. Unless you are rediscovering, the user must manually update the platform.

Using this device database, a map displays the graphical representation of your network. Each device on your network appears as a symbol (icon) on the map. You can configure views of your network to show devices on the same subnetworks or floors.

You can monitor network performance and diagnose network performance and connectivity problems. You can also:

• Take a snapshot of your network in its normal state. The snapshot records the state of your network at a particular instant. If you later have network performance problems, you can compare the current state of your network to the snapshot.
• Quickly determine the connectivity status of a device by noting the color of its map symbol. Red usually means that communication with a device has ceased.
• Diagnose connectivity problems by determining whether two devices can communicate. If they can communicate, then examine the route between the devices, the number of packets that were sent and lost, and the roundtrip time between the two devices.
• Manage MIB information (for example, collecting and storing MIB data for trend analysis and graphing) using MIB queries. Transcend compiles MIBs and allows you to navigate up and down the MIB tree to retrieve MIB objects from devices. You can set thresholds for MIB data and generate events when a threshold is exceeded.
• Configure the software to act on certain events. The Event Categories window informs you of any unexpected events (which arrive in the form of traps).

For more information, see the documentation that is shipped with your software.

3Com SmartAgent Embedded Software

Traditional Simple Network Management Protocol (SNMP) management places the burden of collecting network management information on the management station. In this traditional model, software agents collect information about throughput, record errors or packet overflows, and measure performance based on established thresholds. Through a polling process, agents pass this information to a centralized network management station whenever they receive an SNMP query. Management applications then make the data useful and alert the user if there are problems on the device.

As a useful companion to traditional network management methods, 3Com's SmartAgent® technology places management intelligence into the software agent that runs within a 3Com device. This scalable solution reduces the amount of computational load on the management station and helps minimize management-related network traffic.

SmartAgent software, which uses the RMON MIB, is self-monitoring, collecting and analyzing its own statistical, analytical, and diagnostic data. In this way, you can conduct network management by exception - that is, you are notified only if a problem occurs. Management by exception is unlike traditional SNMP management, in which the management software collects all data from the device through polling.
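
The contrast between polling and management by exception can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not SmartAgent code: the class name ThresholdAlarm, the rising and falling threshold values, and the sample data are all hypothetical. The agent checks every sample locally and produces an event only when a threshold is crossed, so the management station hears nothing while values stay normal.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class ThresholdAlarm:
    """Hypothetical agent-side alarm with a rising and a falling threshold."""
    rising: float
    falling: float
    armed: bool = True  # True while waiting for the next rising crossing

    def sample(self, value: float) -> Optional[str]:
        """Return an event name only when a threshold is crossed."""
        if self.armed and value >= self.rising:
            self.armed = False
            return "rising"
        if not self.armed and value <= self.falling:
            self.armed = True
            return "falling"
        return None


def exceptions_only(alarm: ThresholdAlarm,
                    samples: Iterable[float]) -> List[Tuple[int, str]]:
    """Collect (sample index, event) pairs; normal samples produce nothing."""
    events = []
    for index, value in enumerate(samples):
        event = alarm.sample(value)
        if event is not None:
            events.append((index, event))
    return events
```

Using two thresholds instead of one keeps a value that hovers near the limit from generating a stream of repeated events.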

SmartAgent software works autonomously and reports to the network management station whenever an exceptional network event occurs. The software can also take direct action without involving the management station. Devices that contain SmartAgent software may be able to:

• Perform broadcast throttling to minimize the flow of broadcast traffic on your network
• Monitor the ratio of good frames to bad frames
• Switch a resilient link pair to the standby path if the primary path corrupts frames
• Report if traffic on vital segments drops below minimum usage levels
• Disable a port for five seconds to clear problems and then automatically reconnect it

The Transcend NCS applications LANsentry Manager and Traffix Manager make the RMON data that the SmartAgent software collects more usable by summarizing and correlating important information.

Other Commonly Used Tools

These commonly used tools can also help you troubleshoot your network:

• Network software, such as Ping, Telnet, and FTP and TFTP. You can use these applications to troubleshoot, configure, and upgrade your system.
• Network monitoring devices, such as Analyzers and Probes
• Tools, such as Cable Testers, for working on physical problems

Ping

Packet Internet Groper (Ping) allows you to quickly verify the connectivity of your network devices. Ping attempts to transmit a packet from one device to a station on the network and listens for the response to ensure that it was correctly received. You can validate connections on the parts of your network by pinging different devices:

• A successful response indicates that a valid network path exists between your station and the remote host and that the remote host is active.
• Slower response times than normal can indicate that the path is congested or obstructed.
• A failed response indicates that a connection is broken somewhere. Use the message to help locate the problem. See Tips on Interpreting Ping Messages.

Strategies for Using Ping

Follow these strategies for using Ping:

• Ping devices when your network is operating normally so that you have a performance baseline for comparison. See Identifying Your Network's Normal Behavior for more information.
• Ping by IP address when:
    • You want to test devices on different subnetworks. This method allows you to Ping your network segments in an organized way, rather than having to remember all the hostnames and locations.
    • Your Domain Name System (DNS) server is down and your system cannot look up host names properly. You can Ping with IP addresses even if you cannot access hostname information.
• Ping by hostname when you want to identify DNS server problems.
• To troubleshoot problems that involve large packet sizes, Ping the remote host repeatedly, increasing the packet size each time.
• To determine if a link is erratic, perform a continuous Ping (using ping -s on UNIX), which indicates the time that it takes the device to respond to each Ping.
• To determine a route taken to a destination, use the trace route function (tracert).
• Consider creating a Ping script that periodically sends a Ping to all necessary networking devices. If a Ping failure message is received, the script can perform some action to notify you of the problem, such as paging you.
• Use the Ping functions of your network management platform. For example, in your HP OpenView map, select a device and click the right mouse button to gain access to ping functions.
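
The Ping script suggested in the list above might look like the following minimal Python sketch. It rests on several assumptions: the system ping command accepts -c (UNIX-like systems) or -n (Windows) for the packet count, an exit status of 0 means a reply was received (some implementations also return 0 for certain error replies), and the function names and the notify hook are hypothetical. Replace notify with whatever paging or mail mechanism your site uses, and run the sweep periodically from a scheduler.

```python
import platform
import subprocess
from typing import Callable, Iterable, List


def ping_command(host: str, count: int = 1) -> List[str]:
    """Build the ping command line; the count flag differs by platform."""
    flag = "-n" if platform.system() == "Windows" else "-c"
    return ["ping", flag, str(count), host]


def is_reachable(host: str) -> bool:
    """Assumption: exit status 0 from the system ping means a reply arrived."""
    try:
        result = subprocess.run(ping_command(host),
                                stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL,
                                timeout=10)
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0


def sweep(hosts: Iterable[str],
          probe: Callable[[str], bool] = is_reachable,
          notify: Callable[[str], None] = print) -> List[str]:
    """Ping every host once and call notify for each one that fails."""
    failed = [host for host in hosts if not probe(host)]
    for host in failed:
        notify(f"ping failed: {host}")
    return failed
```

Passing the probe and notify functions as parameters keeps the sweep logic testable without touching the network.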

Tips on Interpreting Ping Messages

Use the following ping failure messages to troubleshoot problems:

No reply from <destination>

Indicates that the destination routes are available but that there is a problem with the destination itself.

<destination> is unreachable

Indicates that your system does not know how to get to the destination. This message means either that routing information to a different subnetwork is unavailable or that a device on the same subnetwork is down.

ICMP host unreachable from gateway

Indicates that your system can transmit to the target address using a gateway but that the gateway cannot forward the packet properly, because either a device is misconfigured or the gateway is not operating.
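
A script can apply the same interpretation automatically. The sketch below maps the three message forms described above to the likely cause. The function name is hypothetical, and the matching assumes that your ping implementation prints messages in exactly these forms; the wording varies between operating systems, so adjust the patterns to what your system actually prints.

```python
def interpret_ping_failure(message: str) -> str:
    """Map a ping failure message to the likely cause described in this guide."""
    text = message.strip().lower()
    if "icmp host unreachable" in text:
        # The gateway was reached but could not forward the packet.
        return "gateway cannot forward: misconfigured device or gateway down"
    if text.endswith("is unreachable"):
        # The local system has no way to get to the destination.
        return "no route to subnetwork, or a device on the local subnetwork is down"
    if text.startswith("no reply from"):
        # Routes are available; the destination itself is at fault.
        return "routes available, problem with the destination itself"
    return "unrecognized message"
```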

Telnet

Telnet, which is a login and terminal emulation program for Transmission Control Protocol/Internet Protocol (TCP/IP) networks, is a common way to communicate with an individual device. You log in to the device (a remote host) and use that remote device as if it were a local terminal.

If you have established an out-of-band Telnet connection with a device, you can use Telnet to communicate with that device even if the network is unavailable. This feature makes Telnet one of the most frequently used network troubleshooting tools. Usually, all device statistics and configuration capabilities are accessible by using Telnet to connect to the device's console. For more information about setting up an out-of-band connection, see Using Telnet, Serial Line, and Modem Connections.

You can invoke the Telnet application on your local system and set up a link to a Telnet process that is running on a remote host. You can then run a program that is located on a remote host as if you were working at the remote system.

FTP and TFTP

Most network devices support either the File Transfer Protocol (FTP) or the Trivial File Transfer Protocol (TFTP) for downloading updates of system software. Updating system software is often the solution to networking problems that are related to agent problems. Also, new software features may help correct a networking problem.

FTP provides flexibility and security for file transfer by:

• Accepting many file formats, such as ASCII and binary
• Using data compression
• Providing Read and Write access so that you can display, create, and delete files and directories
• Providing password protection

TFTP is a simple version of FTP that does not list directories or require passwords. TFTP only transfers files to and from a remote server.
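
To see how little TFTP carries, consider the read request that starts a download. The following Python sketch builds that packet as laid out in the TFTP specification (RFC 1350): a two-byte opcode of 1, the file name, a zero byte, the transfer mode, and a final zero byte. There is no field for a user name or password. The function name and the file name in the usage note are hypothetical.

```python
import struct

TFTP_OPCODE_RRQ = 1  # read request opcode defined in RFC 1350


def build_read_request(filename: str, mode: str = "octet") -> bytes:
    """Build a TFTP read request: opcode, file name, 0, mode, 0."""
    return (struct.pack("!H", TFTP_OPCODE_RRQ)
            + filename.encode("ascii") + b"\x00"
            + mode.encode("ascii") + b"\x00")
```

For example, build_read_request("config.txt") yields the bytes 00 01, then "config.txt", a zero byte, "octet", and a zero byte. A client would send this datagram to UDP port 69 on the server.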

Analyzers

An analyzer, which is often called a Sniffer, is a network device that collects network data on the segment to which it is attached, a process called packet capturing. Software on the device analyzes this data, which is a process referred to as protocol analysis. Most analyzers can interpret different types of protocol traffic, such as TCP/IP, AppleTalk, and Banyan VINES traffic.

You usually use analyzers for reactive troubleshooting - when you see a problem somewhere on your network, you attach an analyzer to capture and interpret the data from that area. Analyzers are particularly helpful for identifying intermittent problems. For example, if your network backbone has experienced moments of instability that prevent users from logging on to the network, you can attach an analyzer to the backbone to capture the intermittent problems when they happen again.

Probes

Like an analyzer, a probe is a network device that collects network data. Depending on its type, a probe can collect data from multiple segments simultaneously. It stores the collected data and transfers the data to an analysis site when requested. Unlike an analyzer, a probe does not interpret data.

A probe can be either a stand-alone device or an agent in a network device. The Transcend Enterprise Monitor 500 series and the SuperStack® II Monitor series are stand-alone RMON probes. LANsentry Manager and Traffix Manager use data from probes that comply with the RMON MIB or the RMON2 MIB.

You can use a probe daily to determine the health of your network. The Transcend NCS applications can interpret and report this data, alerting you to possible problems so that you can proactively manage your network. For example, an RMON2 probe can help you to analyze traffic patterns on your network. Use this data to make decisions about reconfiguring devices and end stations as needed.

Cable Testers

Cable testers examine the electrical characteristics of the wiring. They are most commonly used to ensure that building wiring and cables meet Category 5, 4, and 3 standards. For example, network technologies such as Fast Ethernet require the cabling to meet Category 5 requirements. Testers are also used to find defective and broken wiring in a building.

Knowing Your Network

You can better troubleshoot the problems on your network by:

• Knowing Your Network's Configuration
• Identifying Your Network's Normal Behavior

Knowing Your Network's Configuration

Part of understanding your network is knowing its physical and logical configuration. You should know:

• Which devices are on your network
• How the devices are configured
• Which devices are attached to the backbone
• Which devices connect your network to the outside world (WAN)

To keep track of your network's configuration, gather the following information:

• Site Network Map
• Logical Connections
• Device Configuration Information
• Other Important Data About Your Network

This data, when kept up-to-date, is extremely helpful for locating information when you experience network or device problems.

Site Network Map

A network map helps you to:

• Know exactly where each device is physically located
• Easily identify the users and applications that are affected by a problem
• Systematically search each part of your network for problems

You can create a network map using any drawing or flow chart application. Store your network map online. In addition, make sure that you always have a current version on paper in case you cannot access the online version. Figure 8 shows an example of a network map of 3Com devices.

Figure 8 Example of a Site Network Map

Consider including the following information on your network map:

• Location of important devices and workgroups (by floor, building, or area)
• Location of the network backbone, data center, and wiring closets, as appropriate for your network
• Location of your network management stations
• Location and type of remote connections
• IP subnetwork addresses for all managed switches and hubs
• Other subnetwork addresses, such as Novell IPX and AppleTalk, if appropriate for your network
• Type of media (by actual name, such as 10BASE-T, or by grouping, such as Ethernet), which you can show with callouts, colors, line weights, or line styles
• Virtual workgroups, which you can show with colors or shaded areas
• Redundant links, which you can indicate with gray or dashed lines
• Types of network applications that are used in different areas of your network
• Types of end stations that are connected to the switches and hubs

Logical Connections

With the advent of virtual LANs (VLANs), you need to know how your devices are connected logically as well as physically. For example, if you have connected two devices through the same physical switch, you can assume that they can communicate with each other. However, the devices can be in separate VLANs that restrict their communication.

Knowing the setup of your VLANs can help you to quickly narrow the scope of a problem to a VLAN instead of to a network connection.

The Transcend NCS application Enterprise VLAN Manager allows you to view the logical makeup of your network. Depending on the complexity of your network and VLAN configurations, you can use colors to show the VLANs graphically on your network map.

Device Configuration Information

Maintain online and paper copies of device configuration information. Make sure that all online data is stored with your site's regular data backup. If your site does not have a backup system, copy the information onto a backup disc (CD, Zip disk, and the like) and store it offsite.

Follow these guidelines for saving configuration information:

• Because the easiest way to recover a device's configuration is to use FTP or TFTP, save the configuration settings of each device that supports this method of uploading.
• For other devices, Telnet in and save the session (which contains configuration details) to a file. If you cannot print the configuration of a device, then create a quick rebuild guide that explains the quickest way to configure the device from a fresh install.
• For devices that store information to diskette, store this data as part of your site's regular backup.
• For routers and other important devices with text configuration files, store this data online in a revision control system. Keep the most recent version on paper. Keep previous versions.
• For PCs, keep a recovery disk for each type of PC. For any device that you use as a server, store all startup scripts and copies of registries.

Other Important Data About Your Network

For a complete picture of your network, have the following information available:

• All passwords - Store passwords in a safe place. Keep previous passwords in case you restore a device to a previous software version and need to use the old password that was valid for that version.
• Device inventory - The inventory allows you to see the device type, IP address, ports, MAC addresses, and attached devices at a glance. Software tools, such as Transcend Central, can help you keep track of the 3Com devices on your network. Using Transcend Central, you can group devices by type and location and have this information on hand for troubleshooting.
• MAC address-to-port number list - If your hubs or switches are not managed, you must keep a list of the MAC addresses that correlate to the ports on your hubs and switches. Generate and keep a paper copy of this list, which is crucial for deciphering captured packets, using Address Tracker.
• Log book - Document your interactions, no matter how trivial, with each device that is critical to your network's operation (that is, routers, remote access devices, security servers). For example, document that you noticed a fan making noise one morning. Your note may help you to identify why a device is over temperature a week later (because the fan stopped working).
• Change control - Maintain a change control system for all critical systems. Permanently store change control records.
• Contact details - Store, online and on paper, the details of all support contracts, support numbers, engineer details, and telephone and fax numbers.
• LANsentry Reporter - Use LANsentry Reporter to generate reports from the database.

Identifying Your Network's Normal Behavior

By monitoring your network over a long period, you begin to understand its normal behavior. You begin to see a pattern in the traffic flow, such as which servers are typically accessed, when peak usage times occur, and so on. If you are familiar with your network when it is fully operational, you can be more effective at troubleshooting problems that arise.

Baselining Your Network

You can use a baseline analysis, which is an important indicator of overall network health, to identify problems. A baseline can serve as a useful reference of network traffic during normal operation, which you can then compare to captured network traffic while you troubleshoot network problems. A baseline analysis speeds the process of isolating network problems.

By running tests on a healthy network, you compile normal data to compare against the results that you get when your network is in trouble. For example, Ping each node to discover how long it typically takes you to receive a response from devices on your network.
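
As a rough sketch of the Ping example above, the following Python code turns round-trip times collected on a healthy network into a per-device baseline and flags a later measurement that falls well outside it. The function names, the device name, the sample values, and the choice of "mean plus three standard deviations" as the limit are all illustrative assumptions; pick a rule that suits your own network.

```python
from statistics import mean, stdev
from typing import Dict, List, Tuple


def build_baseline(samples_ms: Dict[str, List[float]]) -> Dict[str, Tuple[float, float]]:
    """Return {device: (mean, standard deviation)} from healthy-network samples.

    Each device needs at least two samples for the standard deviation.
    """
    return {device: (mean(times), stdev(times))
            for device, times in samples_ms.items()}


def deviates(baseline: Dict[str, Tuple[float, float]],
             device: str, rtt_ms: float, k: float = 3.0) -> bool:
    """True if rtt_ms is more than k standard deviations above the mean."""
    average, spread = baseline[device]
    return rtt_ms > average + k * spread
```

For instance, with healthy samples of 10, 12, 11, 9, and 13 ms for one router, a later reading of 40 ms is flagged, while a reading of 14 ms is not.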

Applications such as Status Watch, Address Tracker, LANsentry Manager, and Traffix Manager allow you to collect days and weeks of data and set a baseline for comparison. Through the reporting mechanisms in the following list, you can continuously assess the data from your network and ensure that its performance is optimal:

• Web Reporter generates daily or weekly reports from data collected by Status Watch.
• Traffix Manager generates weekly reports from collected data and calculates the baselines for you. Set up Utilization History and Error History reports with data resolution set to Weekly.
• LANsentry Manager History View generates daily utilization graphs, which are sampled every 30 minutes for each day over one week. Use these graphs to calculate your network baselines manually.

Identifying Background Noise

Know your network's background noise so that you can recognize real data flow. For example, one evening after everyone is gone, when no backups are running and most nodes are on, analyze the traffic on your network using the Traffix Manager application. The traffic that you see is mostly broadcast and multicast packets. Any errors that you see are the result of faulty devices (trace). This traffic is the background noise of your network - traffic that occurs for little value. If background noise is high, redesign your network.
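
If you export the destination MAC addresses from such a capture, a short script can estimate how much of the traffic is broadcast or multicast. The sketch below assumes Ethernet addresses written as colon-separated hexadecimal in canonical form, where the least significant bit of the first octet marks a group (multicast or broadcast) address. The function names and sample addresses are hypothetical, and this is not how Traffix Manager itself computes its figures.

```python
from typing import Iterable


def is_group_address(mac: str) -> bool:
    """True for multicast and broadcast destinations (group bit set)."""
    first_octet = int(mac.split(":")[0], 16)
    return bool(first_octet & 0x01)


def group_traffic_ratio(destination_macs: Iterable[str]) -> float:
    """Fraction of frames addressed to a multicast or broadcast destination."""
    macs = list(destination_macs)
    if not macs:
        return 0.0
    return sum(is_group_address(mac) for mac in macs) / len(macs)
```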


FDDI Connectivity

Use these sections to identify and correct connectivity errors on an FDDI ring:

• FDDI Connectivity Overview
• Monitoring FDDI Connections
• Making Your FDDI Connections More Resilient

Fiber Distributed Data Interface (FDDI), which is a self-correcting technology, automatically corrects ring faults to maintain connectivity throughout most of the network. However, you should monitor your FDDI connections for wrapped rings and other problems with ring connectivity.

Understanding the Problem

As shown in Figure 9, in a thru FDDI LAN, no stations on the trunk ring have a Configuration State (SMTConfigurationState) of Wrap or Isolated. However, users who complain about network performance may have lost connectivity to other stations on the network because the FDDI network is wrapped or segmented.

Figure 9 Thru Ring

Wrapped ring

By monitoring the Peer Wrap Condition, you can see when the Configuration State changes. In a wrapped ring (Figure 10), two stations on the LAN are in a wrapped Configuration State. This condition may or may not affect the connectivity of certain stations. Although operational, your network may have a cabling problem or a problem with a link.

Figure 10 Wrapped LAN

Segmented ring

In a segmented ring (Figure 11), more than two stations are wrapped on the trunk ring. Although this mode of operation is a valid FDDI LAN configuration, your LAN is probably experiencing a degraded or degrading condition.

Figure 11 Segmented Ring

When a network connection has excessively high link errors, Station Management (SMT) shuts down the connection and tries to reconnect. A dual-attachment trunk ring station with an A or B connection that is shut down is one of the wrap points in the network. See Making Your FDDI Connections More Resilient for information about keeping a dual-attachment station connection from wrapping.

Isolated station

Sometimes a network wraps a particular station out of the ring. Stations on either side of a problem station can be wrapped. This effectively isolates the station or links that have problems, as shown in Figure 12.

Figure 12 Wrapped Ring with Isolated Station

If a ring was already wrapped when a network wraps a station out of the ring, then a segmented ring results, as shown in Figure 13.

Figure 13 Segmented Ring with Isolated Stations

Twisted ring

In a twisted ring, an A port is connected to an A port and a B port is connected to a B port, instead of the normal A-to-B connections. A twisted ring, which always has two twist points (stations), can exist in either a Thru or Wrap state. You can monitor the Twisted Ring Condition and Undesired Connection Attempt Event for evidence of twisted ring and other connection problems.

Identifying the Problem

To identify the problem, follow this process:

1. At the FDDI LAN level, verify that your network is operating.

If the network is operating, the FDDI ring may be segmented, and therefore an FDDI station or an Ethernet station on an Ethernet link may have lost connectivity to other nodes on the network.

2. Determine if a ring is in a Thru, Wrap, or Segmented state.

If the FDDI ring is segmented or wrapped, look for a problem with a link somewhere in the network or for a nonfunctioning node on your trunk ring. If the ring is operating and is not segmented, or if it is segmented but you still have connectivity to the stations in question, move to a more specific level in your network.

See Monitoring FDDI Connections for more information.

3. Determine if the poorly performing station is an Ethernet or FDDI station.

If the problem is an FDDI station, find out if it is congested (that is, if the station is so busy that it cannot accept all the network traffic that is directed to it) by determining its Bandwidth Utilization. Also determine if the station has a high frame error rate by looking at the FDDI Ring Errors.

If the problem is an Ethernet station, look for congestion by examining Ethernet Packet Loss and Bandwidth Utilization.
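
Step 2 of this process can be sketched in code. The following Python function applies the definitions given earlier in this section: no stations in a Wrap or Isolated state means Thru, two wrapped stations means Wrapped, and more than two means Segmented. The function name and the input format (a mapping from station name to its Configuration State, gathered however your tools allow) are assumptions. The text does not cover a single wrapped station, or isolated stations on an otherwise unwrapped ring, so the sketch reports those cases as indeterminate.

```python
from typing import Dict


def classify_ring(config_states: Dict[str, str]) -> str:
    """Classify a trunk ring from each station's Configuration State."""
    states = [state.strip().lower() for state in config_states.values()]
    wrapped = states.count("wrap")
    isolated = states.count("isolated")
    if wrapped == 0 and isolated == 0:
        return "thru"
    if wrapped == 2:
        return "wrapped"
    if wrapped > 2:
        return "segmented"
    # Cases the text does not define: one wrapped station, or isolated
    # stations on a ring that reports no wrap points.
    return "indeterminate"
```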

Solving the Problem

Identify the station that is causing the disconnection and take the appropriate steps:

• If the disconnection is caused by a wrapped ring, then fix the hardware or cabling problem at that station.
• If the station is congested, you have a device problem rather than a network problem. For example, if the congested station is a file server and every other machine on the network is retrieving and saving files using that server, consider upgrading your server or adding additional servers to the network. A variety of devices from different vendors may be communicating on an FDDI or Ethernet network; some are faster and more capable, and some are slower and more prone to congestion.
• If the station is an Ethernet station that is attached to an Ethernet segment, reevaluate the setup of your Ethernet network and make some changes to improve its performance.

You can also make FDDI connections more resilient by implementing dual homing or installing an Optical Bypass Unit (OBU) where FDDI connections are prone to fail. See Making Your FDDI Connections More Resilient for more information.

Monitoring FDDI Connections

Monitor your FDDI devices for Warning or Critical alerts in the FDDI Status tool.

Status Watch

Use Status Watch to identify these FDDI connectivity errors:

• Peer Wrap Condition
• Twisted Ring Condition
• Undesired Connection Attempt Event

Follow these steps:

1. In the Device area, select the device that is located where you suspect an FDDI ring connectivity problem.

2. Monitor the FDDI Status tool for the currently selected device.

Here are some pointers for monitoring:

• If the Peer Wrap Configuration State variable is Isolated, the device is not connected to the FDDI trunk ring. If you intend the device to remain isolated, this indication is not a serious condition. However, if the device is supposed to be connected on a trunk ring, a serious problem may exist. The device is no longer transmitting packets to the larger trunk ring.
• If the Peer Wrap flag (SMTPeerWrapFlag) is set, the device is one of the wrap points. The cause of the wrapped ring is somewhere in the portion of the network between the two stations that report the peer wrap condition.

Making Your FDDI Connections More Resilient

When devices are removed from an FDDI ring, there is a break in the fiber path that causes the ring to wrap until the ring is made whole again. To prevent the break in the FDDI connection, you can implement dual homing or install an Optical Bypass Unit (OBU).

Implementing Dual Homing

When the operation of a dual attachment node is critical to your network, dual homing adds reliability by providing a backup connection if the primary link fails. Because a dual attachment station (DAS) has two attachments to the FDDI ring (A-to-M and B-to-M), you can use one of them as a standby link if the active link fails. Using dual homing, only one of the two attachments is active at a time. In this sense, a DAS acts as if it is a single attachment station (SAS) by using its A port as the standby link.

Through SMT, a DAS can be dual homed to the same concentrator or, more commonly, to two concentrators. This arrangement provides a more stable trunk ring of concentrators. If one concentrator fails, the DAS enables the standby link to another concentrator to become the active link. See Figure 14.
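
The failover rule described above is simple enough to express as a small function. This Python sketch assumes, following the text, that the B-to-M attachment is the active link and the A-to-M attachment is the standby. The function name and the Boolean link-status inputs are hypothetical, and real stations make this decision inside SMT rather than in user code.

```python
from typing import Optional


def select_active_link(b_link_up: bool, a_link_up: bool) -> Optional[str]:
    """Return the port that carries traffic for a dual-homed DAS.

    Only one attachment is active at a time: B is preferred, and A takes
    over only when B is down. None means that the station has no connection.
    """
    if b_link_up:
        return "B"
    if a_link_up:
        return "A"
    return None
```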

If the station is a dual path or dual path/dual MAC station, you can configure the dual-homed station in one of two ways:

• With both links active
• With one link active and one connection withheld as a backup, only becoming active when one link fails

Figure 14 Dual Homing Configuration

Installing an Optical Bypass Unit

You can insert an Optical Bypass Unit (OBU) into the FDDI ring as if it were a node and then plug your device into it. To use an OBU, your device needs an optical bypass interface. This interface lets the bypass know whether your device is still on the ring or not. See Figure 15.

If your device is removed or if it fails, the bypass unit diverts the optical path away from your device, keeping the ring whole. You can use a bypass on devices that are prone to failure or are likely to be removed often, such as diagnostic equipment.

Figure 15 Optical Bypass Unit Configuration

FDDI Connectivity Reference

This section explains terms that are relevant to FDDI connectivity and provides additional conceptual and problem analysis detail

Peer Wrap Condition

A Peer Wrap (wrapped ring) condition occurs when a dual-attachment station detects a fault (often a lost connection) and reconfigures the network by wrapping the dual trunk rings to form a single ring. Normally, the two stations that are adjacent to the fault wrap to maintain full connectivity. However, if a second fault occurs before the first is repaired, the network partitions itself into two or more rings and stations lose connectivity.

When a station reports a Peer Wrap condition, locate and repair the problem that caused the station to wrap the rings. Potential causes include:

• Faulty FDDI port hardware
• Faulty cables or connectors
• Unplugged connectors
• Powered-down stations

You can expect to find the cause of the problem between the two stations that report the Peer Wrap condition.

Twisted Ring Condition

A Twisted Ring condition occurs when certain undesirable connection types exist. See Table 7 for more information. Although similar to the Undesired Connection Attempt, the Twisted Ring condition provides specific Station Management (SMT) and port information for diagnosis.

Undesired Connection Attempt Event

An Undesired Connection Attempt event occurs when a port tries to connect to another port of a type that may result in an undesirable network topology. Whether the connection attempt is successful depends on the current setting of the station's connection policies.

Table 7 lists connections that the FDDI standard defines as undesirable. The managed devices may or may not permit these connections, depending on their FDDI station configurations.

Table 7 Undesirable Connection Types

Connection Type    Reason That the Connection Is Undesirable
A-A                Twisted primary and secondary rings
A-S                A wrapped ring
B-B                Twisted primary and secondary rings
B-S                A wrapped ring
S-A                A wrapped ring
S-B                A wrapped ring
M-M                A tree of rings topology (illegal connection)

Table 8 lists FDDI connections that create valid topologies

Table 8 Valid Connection Types

Connection Type    Reason That the Connection Is Valid
A-B                A normal trunk ring peer connection
A-M                A tree connection with possible redundancy. In a single MAC node, Port B has precedence (by default) for connecting to a Port M
B-A                A normal trunk ring peer connection
B-M                A tree connection with possible redundancy. In a single MAC node, Port B has precedence (by default) for connecting to a Port M
S-S                A single ring of two slave stations
S-M                A normal tree connection
M-A                A tree connection that provides possible redundancy
M-B                A tree connection that provides possible redundancy
M-S                A normal tree connection
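
Tables 7 and 8 together cover every pairing of the four FDDI port types, so they can be read as a simple lookup. The following is a minimal sketch of that lookup, not part of any management tool: the function name classify_connection and the return format are assumptions made for illustration, and the reason strings are copied from Table 7.

```python
# Undesirable pairings from Table 7, keyed by (local port type, remote port type).
UNDESIRABLE = {
    ("A", "A"): "Twisted primary and secondary rings",
    ("A", "S"): "A wrapped ring",
    ("B", "B"): "Twisted primary and secondary rings",
    ("B", "S"): "A wrapped ring",
    ("S", "A"): "A wrapped ring",
    ("S", "B"): "A wrapped ring",
    ("M", "M"): "A tree of rings topology (illegal connection)",
}

PORT_TYPES = ("A", "B", "S", "M")


def classify_connection(local, remote):
    """Return ("undesirable", reason) or ("valid", None) for a port pairing.

    Hypothetical helper: every pairing not listed in Table 7 is treated as
    valid, which matches the nine pairings listed in Table 8.
    """
    if local not in PORT_TYPES or remote not in PORT_TYPES:
        raise ValueError("port types must be one of A, B, S, M")
    reason = UNDESIRABLE.get((local, remote))
    if reason is not None:
        return ("undesirable", reason)
    return ("valid", None)


# Example: list every pairing that Table 8 would accept.
valid_pairs = [
    f"{a}-{b}"
    for a in PORT_TYPES
    for b in PORT_TYPES
    if classify_connection(a, b)[0] == "valid"
]
```

Whether a device actually refuses an undesirable pairing still depends on its connection policies, so a result of "undesirable" here only flags a connection worth checking.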

Disabling the Offending Interface

Because broadcast storms can ultimately cause your whole network to become unavailable, take action immediately to disable the offending interface. You can enable the interface again after you have corrected the problem.

Address Tracker

Use Address Tracker to locate the interface that is causing the broadcast storm. Use Device View to disable the port.

Follow these steps:

1. In the Find Address window, enter the address of the interface that seems to be receiving the broadcast traffic.

You can copy the MAC or IP address from the Status Watch report and paste it into Address Tracker's Enter the Address You Want to Find field.

2. Click Find Now.

Search displays the device name.

3. Use Transcend Central to launch Device View and disable the port.

Disabling the port stops the broadcast storm before it interferes with all vital network traffic. You can re-enable this interface later using Device View or the device's console.

Correcting Spanning Tree Misconfigurations

Spanning Tree does not cause broadcast storms, but a loop in your Spanning Tree topology can create data that looks like a storm. A loop can occur in your topology if:

• Someone disables Spanning Tree on a port
• You set up your Spanning Tree configuration incorrectly

Device View

Use Device View to disable any Spanning Tree port that has a repeater attached to it and to correct Spanning Tree misconfigurations.

To correct Spanning Tree misconfigurations, use Device View to disable Spanning Tree Protocol (STP) for a port on a SuperStack® II Switch 1000, Switch 3000, Switch 3000 10/100, Switch 9000SX, Desktop Switch, LinkBuilder® FMS II Bridge/Management Module, or CoreBuilder 6000.

To disable the STP port state for a port on a SuperStack II switch:

1. Select a port and click the right mouse button.

2. From the shortcut menu, select Configure.

3. In the Port section, click the STP tab.

4. From the STP Port State list box, select Disabled.

5. Click Apply.

To disable the STP port state for a port on a LinkBuilder FMS II Bridge/Management Module:

1. Double-click the module.

2. From the shortcut menu, select Configure Bridge.

3. In the Port section, click the STP tab.

Broadcast Storms Reference

This section explains terms that are relevant to broadcast storms and provides additional conceptual and problem analysis detail

Broadcast Packets

Broadcast packets, which are a normal part of network operation, are transmitted by a device to a broadcast address. For example, IP networks use broadcasts to resolve network addresses using Address Resolution Protocol (ARP). IPX networks use a large number of broadcast packets to operate most effectively.

Problems arise when broadcast packets endlessly propagate throughout the network, which increases the traffic volume on your network and the CPU time that each host spends processing and discarding unwanted broadcast packets.

Multicast Packets

Multicast packets, which are a normal part of network operation, are transmitted by a device to a multicast group address. Hosts that want to receive the packets indicate that they want to be members of the multicast group, and then multicast packets are distributed to that group. For example, multicast packets support the Spanning Tree Protocol. Multicast applications and underlying multicast protocols control multimedia traffic and shield hosts from processing unnecessary broadcast traffic. However, multicast traffic can also cause storms that saturate your network.

Duplicate Addresses

Use these sections to identify and correct problems caused by duplicate MAC and IP addresses:

• Duplicate Addresses Overview
• Finding Duplicate MAC Addresses
• Finding Duplicate IP Addresses

See Duplicate Addresses Reference for additional conceptual and problem analysis detail.

Networks sometimes generate duplicate MAC and IP addresses. Because duplicate addresses can cause problems with packet delivery, resolve them as soon as possible.

Understanding the Problem

Duplicate MAC addresses are caused by data link layer problems with Fiber Distributed Data Interface (FDDI) media and the passing of tokens on the FDDI ring. Duplicate IP addresses are caused by network layer problems. See these sections for more information about causes of duplicate addresses:

• Duplicate MAC Addresses
• Duplicate IP Addresses

Identifying the Problem

Identify duplicate MAC and IP addresses by following the instructions in these sections:

• Finding Duplicate MAC Addresses
• Finding Duplicate IP Addresses

Solving the Problem

Identify the cause of the duplicate address (such as user error or a hardware problem) and fix the problem if possible

Finding Duplicate MAC Addresses

To find out if duplicate MAC addresses are occurring, monitor your network using Status Watch.

Status Watch

The Status Watch FDDI Status tool identifies duplicate FDDI MAC addresses. Status Watch reports when two or more MACs on the same ring have the same MAC address (a Duplicate Address condition).

Follow these steps:

1. In the Status Watch Summary View window, determine if any FDDI Status conditions are reported. If there are, double-click the table cell value to display the Device List window.

Another approach is to examine only the devices that you know reside on your FDDI ring. In the Status Watch main window, red device icons indicate that a threshold has been exceeded.

2. Select a device.

• If you selected the device from the Device List window, the real-time report for that device appears in the Status Watch main window.

• If you selected the device from the main window, also select the FDDI Status tool to view the real-time report.

3. Determine if a Duplicate Address condition caused the FDDI Status tool to trigger a Critical or Warning status for that device.

In Status Watch, you can specify the status severity level to apply to a Duplicate Address condition.

Finding Duplicate IP Addresses

To find out if duplicate IP addresses are occurring, monitor your network using these applications:

• Address Tracker - To find duplicate IP addresses on 3Com devices and their attached networks

• LANsentry® Manager - To find duplicate IP addresses that are collected by probes gathering RMON2 SmartAgent® data from the Enterprise Communications Analysis Module (ECAM) downloaded on your network devices

Address Tracker

Use Address Tracker to determine when and where duplicate IP addresses occur.

Follow these steps:

1. From the Find Address menu, select Find Duplicate IP Addresses.

2. Click Find Now to start your search.

LANsentry Manager

Use the Duplicates table in LANsentry Manager to compile a list of all stations with duplicate IP addresses. This table is available only on probes that have downloaded RMON2 (ECAM) SmartAgent software.

Follow these steps:

1. From the LANsentry Manager Address Map menu, select Duplicates. Address Map data is displayed as a table.

2. To export the contents of the table, click Export to launch the Data Export dialog box.

Duplicate Addresses Reference

This section explains terms that are relevant to duplicate addresses and provides additional conceptual and problem analysis detail

Duplicate MAC Addresses

Each device on your network has a unique MAC address. This address identifies a single device on the network, allowing packets to be delivered to correct destinations.

Packets are delivered to their destinations by means of the MAC-address-to-IP-address translation that the Address Resolution Protocol (ARP) provides. Therefore, if MAC addresses are duplicated on the network, ARP caches of routing devices contain erroneous destinations. In FDDI, devices monitor network traffic, searching for their own MAC address in each packet to determine whether to decode the packet. If MAC addresses are not unique, two stations cannot be distinguished from each other.

Duplicate MAC addresses can occur for the following reasons:

• Someone has manually configured a MAC address for a device instead of using the address that the vendor supplied or allowing it to be assigned dynamically, and this address is also assigned to a different device.

• In rare circumstances, loops in a bridged network can cause a MAC hardware problem or an address learning problem that creates a duplicate MAC address entry in the bridging address table.

• On DECnet Phase 4 networks, MAC addresses are set from the DECnet address. A duplicate NET address can cause a duplicate MAC address.
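
The DECnet case above can be made concrete. The sketch below assumes the commonly documented DECnet Phase IV mapping, which is not spelled out in this guide: the 16-bit node address (area in the upper 6 bits, node in the lower 10) is appended in low-byte-first order to the fixed prefix AA-00-04-00. The function name decnet_phase4_mac is made up for illustration.

```python
def decnet_phase4_mac(area, node):
    """Derive the MAC address a DECnet Phase IV station uses for area.node.

    Assumes the usual Phase IV mapping: prefix AA-00-04-00 followed by the
    16-bit address (area * 1024 + node), least significant byte first.
    """
    if not (1 <= area <= 63 and 1 <= node <= 1023):
        raise ValueError("area must be 1-63 and node must be 1-1023")
    address = (area << 10) | node
    low_byte = address & 0xFF
    high_byte = address >> 8
    return "AA-00-04-00-{:02X}-{:02X}".format(low_byte, high_byte)


# Two stations configured with the same DECnet address end up with the
# same MAC address, which is the duplicate condition described above.
station_one = decnet_phase4_mac(1, 1)
station_two = decnet_phase4_mac(1, 1)
```

Because the MAC address is computed from the DECnet address, correcting the duplicated DECnet address also removes the duplicate MAC address.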

Duplicate IP Addresses

Because IP addresses are critical for transmission of packets on TCP/IP networks, resolve duplicate IP addresses immediately.

Duplicate IP addresses can occur when someone has configured an IP address that is identical to an IP address that is assigned to a different device. Address assignments, although possible for you to configure manually, are usually made using one of these protocols:

• Dynamic Host Configuration Protocol (DHCP) - Allows your network to dynamically assign IP addresses to nodes. With this protocol, a DHCP server temporarily assigns an IP address to a node, or you can statically configure addresses as needed.

• BOOTstrap protocol (BootP) - Allows you to statically assign IP addresses to nodes. This protocol is more efficient than RARP.

• Reverse ARP (RARP) - Allows you to statically assign IP addresses to nodes. However, because this protocol relies on the MAC address to identify the node, you cannot use it on networks that dynamically assign hardware addresses.
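
If you can export IP-to-MAC observations (for example, from ARP caches or an address table), the duplicate check itself is simple: an IP address seen with more than one MAC address is a candidate duplicate. The following is a minimal sketch of that idea only; the function name, the input format, and the sample addresses are assumptions, and it does not reproduce how Address Tracker or LANsentry Manager work internally.

```python
from collections import defaultdict


def find_duplicate_ips(observations):
    """Return {ip: [mac, ...]} for every IP address seen with 2+ MAC addresses.

    observations is an iterable of (ip, mac) string pairs (assumed format).
    MAC addresses are lowercased so the same adapter is not counted twice.
    """
    macs_by_ip = defaultdict(set)
    for ip, mac in observations:
        macs_by_ip[ip].add(mac.lower())
    return {
        ip: sorted(macs)
        for ip, macs in macs_by_ip.items()
        if len(macs) > 1
    }


# Hypothetical sample data: 10.0.0.5 is claimed by two different adapters.
sample = [
    ("10.0.0.5", "00:A0:24:11:22:33"),
    ("10.0.0.5", "00:a0:24:44:55:66"),
    ("10.0.0.9", "00:a0:24:77:88:99"),
    ("10.0.0.9", "00:A0:24:77:88:99"),
]
duplicates = find_duplicate_ips(sample)
```

A hit from a check like this is only a starting point: a device whose adapter was recently replaced can also appear with two MAC addresses until old entries age out.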

Ethernet Packet Loss

Use these sections to identify and correct Ethernet packet loss:

• Ethernet Packet Loss Overview
• Searching for Packet Loss

See Ethernet Packet Loss Reference for additional conceptual and problem analysis detail.

If your Ethernet network shows signs of congestion, it may be experiencing packet loss. When your network is congested, utilization is usually high, packets are discarded because buffers are full, and collision rates are up. Problems related to Collisions are often at the heart of packet loss.

Understanding the Problem

Collisions are normal in Ethernet networks. In many cases, Collision rates of 50 percent do not cause a large decrease in throughput. The Collision rate helps mark the upper limit on your network (the maximum percentage of collisions that your network can bear), which is usually around 70 percent. If Collisions increase above this upper limit, your network can become unreliable.

When the Collision rates increase, so do Excessive Collisions, which causes a delay in transmitting data. An increase in Collisions also indicates that network utilization and network errors, such as FCS Errors, are probably increasing.

The real packet problems to watch for, however, are undetected collisions that show up as Late Collisions.

Identifying the Problem

To identify that your network's problem is related to packet loss, verify that frames are being dropped on your network by examining this packet loss data:

• Alignment Errors
• Collisions
• Excessive Collisions
• FCS Errors
• CRC Errors
• Late Collisions
• Receive Discards
• Too Long Errors
• Too Short Errors
• Transmit Discards

The process of identifying the problem is discussed in Searching for Packet Loss.

Solving the Problem

If you notice that packet loss data is consistently high, then your network is too congested. In this case, segment your network with the appropriate network device (such as a switch or router). If Collision data shows increases but your network's utilization is the same, then your network may have a physical problem, such as cabling that is too long. Other problems that packet loss data can indicate include:

• Faulty connectors or improper cabling
• Excessive numbers of repeaters between network devices
• Defective Ethernet transceivers or controllers

Possible solutions to these problems are explained in the procedures in Searching for Packet Loss.

Searching for Packet Loss

When you look for packet loss, use the following applications:

• Status Watch - For Ethernet and MIB-II data collection using SNMP polling
• LANsentry Manager Network Statistics Graph - For RMON data collection using an RMON probe
• Device View - On a per-device basis, you can evaluate statistics for any port on the device

Status Watch

Status Watch monitors:

• Alignment Errors
• Excessive Collisions
• FCS Errors
• Receive Discards
• Transmit Discards

Follow these steps:

1. Determine if the thresholds for the Alignment Errors tool and FCS Errors tool are being exceeded.

Table 16 identifies the problems that this data can indicate and your possible actions. For information about problems related to a nonstandard Ethernet implementation, see Nonstandard Ethernet Problems.

Table 16 Alignment Errors, FCS Errors, and CRC Errors Data

Possible Problem: Faulty cabling
Possible Action: Examine the cable and cable connections for breaks or damage.

Possible Problem: Network noise
Possible Action: Look for improper cabling, faulty cable, faulty network equipment, or cables that are too close to equipment that emits electromagnetic interference (lamps, for example).

Possible Problem: Faulty transceiver
Possible Action: Use an analyzer to identify the problematic transceiver. If necessary, replace the transceiver, network adapter, or station.

Possible Problem: Fault at the transmitting end station
Possible Action:
1. Locate the source of the errors by looking at the module and port statistics.
2. Verify the correct operation of the transceiver or adapter card of the device that is connected to the problem port.
3. If the card appears to be operating correctly, examine the cable and cable connections for breaks or damage.

Possible Problem: Station powering up or down
Possible Action: None required. Early implementations of Ethernet transceivers generate a significant amount of in-band noise when powering up; they frequently cause Alignment Errors and FCS Errors in an otherwise stable network. When powering up, some software drivers for Ethernet controllers also initiate Time Domain Reflectometry (TDR) tests to test the Ethernet media. Network monitors report TDR tests as Alignment Errors and FCS Errors.

Possible Problem: Faulty adapter
Possible Action: Replace the adapter.

2. Determine if the Excessive Collisions tool threshold is being exceeded. Table 17 identifies the problems that this data can indicate and your possible actions.

Table 17 Collisions and Excessive Collisions Data

Possible Problem: Busy network
Possible Action: Use a bridge, router, or switch to reconfigure your network into segments with fewer stations.

Possible Problem: Faulty device (adapter, switch, hub, and the like) that does not listen before broadcasting. This problem increases the incidence of all types of collisions.
Possible Action: Isolate each adapter to see if the problem stops.

Possible Problem: Network loop
Possible Action: Ensure that no redundant connections to the same station have both connections active simultaneously.

3. Determine if the Receive Discards and Transmit Discards tools' thresholds are being exceeded.

If these errors are high in conjunction with the data that you learned in steps 1 and 2, then your network is overloaded. Segment your network.

LANsentry Manager Network Statistics Graph

Use the LANsentry Manager Network Statistics graph to view data for:

• Collisions
• Late Collisions
• Bandwidth Utilization
• CRC Errors
• Too Long Errors
• Too Short Errors

Follow these steps:

1. Display a Network Statistics graph for the local Ethernet segment on which users have reported poor performance.

This graph shows the most recent trend in Collision rates. If you have set up a History sample, you can also look at the historical trend. If a number of segments are connected by repeaters, examine the graph for each Ethernet segment.

2. Analyze Utilization and Collision rates to determine whether collisions are caused by an overloaded segment or a faulty component.

• If Utilization rates are high - The collisions are probably caused by an overloaded segment. If you have added nodes or new applications to your network, consider reconfiguring the cabling system, using bridges and routers to filter out remote collisions and to keep local traffic on one segment. This action should level the network load.

• If Utilization rates are stable and appear normal - The collisions are probably caused by faulty components. In this case, do the following:

  • If the network consists of repeaters - Compare the Network Statistics graphs for each segment connected to the repeater. Because repeaters repeat traffic across all connected segments (which makes many segments seem like one network), you should see similar levels of traffic on all segments. One segment that shows dissimilar levels of traffic and collisions may indicate faulty hardware. In this case, monitor several collisions to track the source station that is transmitting too soon after collisions, and repair the station. Packets that are transmitted too soon after collisions are unlikely to be valid. See Table 17 for more information about Collisions.

  • On other networks - Determine the segment cable length.

3. Examine the CRC Errors and Late Collisions, which often indicate cabling or component problems.

Table 16 identifies the problems that CRC Errors can indicate and your possible actions. Table 18 identifies the problems that Late Collisions data can indicate and your possible actions.

Table 18 Late Collisions Data

Possible Problem: Cabling problems
• Segment too long
• Failing cable
• Segment not grounded properly (noise)
• Improper termination
• Taps too close (10BASE-5 and 10BASE-2 only)
• Noisy cable
Possible Action: Correct the cabling problem by doing one or more of the following:
• Reduce the segment length
• Replace the cable
• Ground the cable
• Terminate the cable correctly
• Check the taps
• Check for cables too close to equipment that emits electromagnetic interference

Possible Problem: Component problems
• Deaf or partially deaf node
• Failing repeater, transceiver, or controller cards
Possible Action: Correct the component problem by doing one of the following:
• Trace the failing component and replace it
• Replace the NIC or the transceiver

4. Trace Too Short Errors and Too Long Errors to the sender.

These errors often indicate faulty routers or LAN drivers and transceiver problems. Table 19 identifies the problems that this data can indicate and your possible actions.

Possible Problem: A transceiver on your network is adding bits to the packets that are transmitted by the attached station.
Possible Action:
1. Use a network analyzer to identify the problematic transceiver.
2. If necessary, replace the transceiver, network adapter, or station.

Possible Problem: The jabber protection mechanism on a transceiver has failed; it can no longer protect the network from the jabbering produced by the attached station.
Possible Action: Replace the network card.

Possible Problem: Excessive noise on the cable. Note: Some 10/100 Mbps cards that autodetect the network speed may connect to the network at the wrong speed, causing excessive noise.
Possible Action: Check for improper cabling, faulty cable, faulty network equipment, or cables too close to noisy electronic equipment (lamps, for example). If your network card autodetects the network speed and you have ruled out other problems, manually configure the network speed.

Possible Problem: Faulty routers (two different network types are connected and the router is not enforcing proper frame size restrictions)
Possible Action: Notify the manufacturer.

Possible Problem: Faulty LAN driver
Possible Action: Replace the driver.

Possible Problem: A normal condition on a LinkSwitch® 1000, LinkSwitch® 3000, or CoreBuilder 5000 FastModule
Possible Action: If you use maximum-sized 1518-byte Ethernet frames, the device's VLT-enabled ports add a frame tag of 4 bytes, resulting in a misleading Too Long Frame error. These frames are passed successfully but still create the Too Long Frame error message. If you want to eliminate the error message, reduce your Ethernet packet frames by 4 bytes.
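
The decision in step 2 of the LANsentry Manager procedure above (overloaded segment versus faulty component) can be sketched as a small function. This is only an illustration of the reasoning, not a feature of LANsentry Manager: the function name, the collisions_rising flag, and the 50 percent cut-off for "high" utilization are assumptions (this guide elsewhere treats more than 50 percent of available bandwidth as heavily utilized), so tune them to your own baseline.

```python
def triage_collisions(utilization_pct, collisions_rising, high_utilization_pct=50.0):
    """Suggest a likely cause for rising Collision rates on a segment.

    utilization_pct      -- observed bandwidth utilization, 0-100
    collisions_rising    -- True if the Collision rate is trending upward
    high_utilization_pct -- assumed cut-off for "high" utilization
    """
    if not 0.0 <= utilization_pct <= 100.0:
        raise ValueError("utilization_pct must be between 0 and 100")
    if not collisions_rising:
        return "no action"
    if utilization_pct > high_utilization_pct:
        # High utilization: the segment itself is probably overloaded.
        return "overloaded segment"
    # Utilization looks normal: suspect hardware or cabling instead.
    return "faulty component"


busy = triage_collisions(72.0, collisions_rising=True)
quiet = triage_collisions(18.0, collisions_rising=True)
```

An "overloaded segment" result points toward segmenting the network with bridges, routers, or switches; a "faulty component" result points toward the per-segment comparison and cable length checks described in step 2.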

Device View

Device View allows you to display a variety of port and device-level statistics relevant to Ethernet packet loss. Table 20 describes these statistics and their use in troubleshooting.

Table 20 Activity and Error Statistics in Device View

Statistics Group: Activity
Description: Displays the total network activity and errors on the selected port.
Use in Troubleshooting: This data shows readable packets, broadcast packets, Collisions, total errors, and runts, which cause Too Short Errors. You can interpret this data in the following ways:
• Runts are often caused by Collisions; however, if the values increase at specific times of the day, you may need to change the network topology to manage the traffic more efficiently (for example, with switches or routers).
• Runts can also be caused by a badly terminated coax cable.
• Large numbers of runts not associated with high levels of collisions can indicate a transmission problem (examine the cable).
• Particularly high numbers of Collisions compared to the total number of readable packets can point to a hardware problem (a bad adapter) or to a data loop.
• A high proportion of Broadcast Packets (>10%) on a heavily utilized network (>50% of available bandwidth) can point to an incorrectly configured bridge or router on the network.

Statistics Group: Errors
Description: Displays the number of frames with errors on the selected port.
Use in Troubleshooting: The significance of errors depends on accompanying errors and prevailing network conditions. See the following error data for more information:
• Alignment Errors: Table 16
• FCS Errors: Table 16
• Too Long Errors: Table 19
• Too Short Errors or runts: Table 19
• Late Collisions: Table 18

To display Activity and Errors statistics for a device or port, follow these steps:

1. Select the required port or device.

2. From the shortcut menu, select Activity or Errors.

The statistics available depend on the type of port or device selected. See Table 20 for troubleshooting information.

Ethernet Packet Loss Reference

This section explains terms that are relevant to Ethernet packet loss and provides additional conceptual and problem analysis detail

Alignment Errors

An Alignment Error indicates a received frame in which both of the following are true:

• The number of bits received is an uneven byte count (that is, not an integral multiple of 8).

• The frame has a Frame Check Sequence (FCS) error.

Alignment Errors often result from MAC layer packet formation problems, cabling problems that cause corrupted or lost data, and packets that pass through more than two cascaded multiport transceivers. See FCS Errors for more information about interpreting Alignment Errors.

Collisions

Collisions indicate that two devices detect that the network is idle and try to send packets at exactly the same time (within one round-trip delay). Because only one device can transmit at a time, both devices must stop sending and attempt to retransmit. Collisions are detected by the transmitting stations.

The retransmission algorithm helps to ensure that the packets do not retransmit at the same time. However, if the two devices retry at nearly the same time, packets can collide again; the process repeats until either the packets finally pass onto the network without collisions or 16 consecutive collisions occur and the packets are discarded.
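
The retransmission behavior described above can be sketched in a few lines. This guide does not name the algorithm; the sketch assumes the truncated binary exponential backoff that IEEE 802.3 specifies, in which a station waits a random number of slot times between 0 and 2^k - 1 after its nth collision, with k = min(n, 10). The function names and the attempt_collides callback are hypothetical and exist only for illustration.

```python
import random

MAX_ATTEMPTS = 16    # the frame is discarded after 16 consecutive collisions
BACKOFF_LIMIT = 10   # assumed 802.3 cap on the backoff exponent


def backoff_slots(collision_count, rng=random):
    """Pick how many slot times to wait after the given consecutive collision."""
    if not 1 <= collision_count < MAX_ATTEMPTS:
        raise ValueError("collision_count must be between 1 and 15")
    exponent = min(collision_count, BACKOFF_LIMIT)
    return rng.randrange(2 ** exponent)


def send_with_retries(attempt_collides, rng=random):
    """Simulate one frame; return (delivered, list of waits in slot times).

    attempt_collides(n) is a caller-supplied function that returns True if
    the n-th transmission attempt (1-based) ends in a collision.
    """
    waits = []
    for attempt in range(1, MAX_ATTEMPTS + 1):
        if not attempt_collides(attempt):
            return True, waits
        if attempt < MAX_ATTEMPTS:
            waits.append(backoff_slots(attempt, rng))
    # Sixteen collisions in a row: this counts as one Excessive Collision.
    return False, waits


# A frame that collides twice and then gets through.
delivered, waits = send_with_retries(lambda n: n < 3)
# A frame that collides on every attempt and is dropped.
dropped, dropped_waits = send_with_retries(lambda n: True)
```

Because each station draws its wait independently from a range that doubles after every collision, two stations that collided once are increasingly unlikely to pick the same retry time again.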

CRC Errors

A Cyclic Redundancy Check (CRC) Error is an RMON statistic that combines FCS Errors and Alignment Errors. These errors indicate that packets were received with:

• A bad FCS and an integral number of octets (FCS Errors)
• A bad FCS and a non-integral number of octets (Alignment Errors)

CRC Errors can cause an end station to freeze. If a large number of CRC Errors are attributed to a single station on the network, replace the station's network interface board. Typically, a CRC Error rate of more than 1 percent of network traffic is considered excessive.

Excessive Collisions

Excessive Collisions indicate that 16 consecutive collisions have occurred, usually a sign that the network is becoming congested. For each excessive collision count (that is, after 16 consecutive collisions), a packet is dropped. If you know the normal rate of excessive collisions, then you can determine when the rate of packet loss is affecting your network's performance. See Knowing Your Network's Configuration for more information.

FCS Errors

Frame Check Sequence (FCS) Errors, a type of CRC Error, indicate that frames received by an interface are an integral number of octets long but do not pass the FCS check. The FCS is a mathematical way to ensure that all the frame's bits are correct without having the system examine each bit and compare it to the original. Packets with Alignment Errors also generate FCS Errors.

Both Alignment Errors and FCS Errors can be caused by equipment powering up or down or by interference (noise) on unshielded twisted-pair (10BASE-T) segments. In a network that complies with the Ethernet standard, FCS or Alignment Errors indicate bit errors during a transmission or reception. A very low rate is acceptable. Although Ethernet allows a 1 in 10^8 bit error rate, typical Ethernet performance is 1 in 10^12 or better.

Late Collisions

Late Collisions indicate that two devices have transmitted at the same time, but cabling errors (most commonly, excessive network segment length or repeaters between devices) prevent either transmitting device from detecting a collision. Neither device detects a collision because the time to propagate the signal from one end of the network to the other is longer than the time to put the entire packet on the network. As a result, neither of the devices that cause the late collision senses the other's transmission until the entire packet is on the network.

Although late collisions occur for small packets, the transmitter cannot detect them. As a result, a network suffering measurable Late Collisions for large packets is losing small packets as well.
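
The timing argument above can be checked with simple arithmetic: a station can notice a collision only while it is still transmitting. The sketch below assumes 10 Mbps Ethernet (so one bit takes 0.1 microseconds) and compares the frame's transmit time with the round-trip propagation delay; the function names and the 60-microsecond example delay are illustrative assumptions, not measurements.

```python
def transmit_time_us(frame_octets, mbps=10.0):
    """Time needed to put a whole frame on the wire, in microseconds."""
    return frame_octets * 8 / mbps


def collision_detectable(frame_octets, round_trip_us, mbps=10.0):
    """True if the sender is still transmitting when the collision returns."""
    return transmit_time_us(frame_octets, mbps) >= round_trip_us


# Hypothetical over-length segment with a 60 microsecond round-trip delay.
ROUND_TRIP_US = 60.0
small_frame_seen = collision_detectable(64, ROUND_TRIP_US)    # 51.2 us on the wire
large_frame_seen = collision_detectable(1518, ROUND_TRIP_US)  # 1214.4 us on the wire
```

On this example segment the 1518-octet frame is still being sent when the collision arrives, so it is counted as a Late Collision, while the 64-octet frame has already finished and its loss goes unnoticed by the sender. That is why measurable Late Collisions on large packets imply silent loss of small ones.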

Nonstandard Ethernet Problems

Table 21 lists the symptoms that typically occur if a system violates the Ethernet standard.

Table 21 Symptoms of Common Ethernet Network Problems

Symptoms: FCS Errors and Alignment Errors increase significantly.
Problem: Network cabling is too long.
Notes: If you use a promiscuous network monitor, the number of Late Collisions reported by stations should correlate with the FCS and Alignment Errors reported by the monitor.

Symptoms: FCS and Alignment Errors increase proportionally with interference (sometimes referred to as noise hits).
Problem: Network segment is noisy.
Notes: Typically observed on a 10BASE-T network segment in a noisy environment. If you use multiple promiscuous monitors, the FCS and Alignment Errors among the monitors will not correlate. If the monitor can track runts, also called Too Short Errors, the number of runt packets should be significantly higher than normal.

Symptoms: FCS and Alignment Errors are much higher than normal.
Problem: Networks do not conform to the access scheme of Carrier Sense Multiple Access with Collision Detect (CSMA/CD).
Notes: Occurs when some implementations of Ethernet in the segment are not entirely compatible with IEEE 802.3 repeaters. Collision fragments linger on the network long enough to collide with retry packets at the minimum interpacket gap (IPG). The IPG is smaller on one side of the repeated network, causing a lost packet. Ethernet controllers cannot receive packets that are separated by 4.7 µs or less. Some controllers cannot sustain receptions of packets separated by as much as 9.6 µs. If runt packets are received one after another and are followed by a collision fragment, Ethernet controllers that cannot sustain reception will lose packets.

Receive Discards

Receive Discards indicate that received packets could not be delivered to a high-layer protocol because of congestion or packet errors.

Too Long Errors

A Too Long Error indicates that a packet is longer than 1518 octets (including FCS octets) but otherwise well formed. Too Long Errors are often caused by a bad transceiver, a malfunction of the jabber protection mechanism on a transceiver, or excessive noise on the cable.

Too Short Errors

A Too Short Error, also called a runt, indicates that a packet is fewer than 64 octets long (including FCS octets) but otherwise well formed.
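
The definitions of Alignment Errors, FCS Errors, Too Long Errors, and Too Short Errors in this reference fit together into one decision procedure. The following is a minimal sketch of that procedure under stated assumptions: the function names are hypothetical, a frame is described only by its bit length and whether its FCS check passed, and a frame with a good FCS is sized by its whole octets.

```python
MIN_OCTETS = 64     # shorter well-formed frames are runts
MAX_OCTETS = 1518   # longer well-formed frames are Too Long


def classify_frame(bit_length, fcs_ok):
    """Classify a received frame using the definitions in this reference."""
    if bit_length < 0:
        raise ValueError("bit_length must not be negative")
    whole_octets = bit_length % 8 == 0
    if not fcs_ok:
        # Bad FCS: integral octet count is an FCS Error, otherwise Alignment.
        return "FCS Error" if whole_octets else "Alignment Error"
    octets = bit_length // 8
    if octets > MAX_OCTETS:
        return "Too Long Error"
    if octets < MIN_OCTETS:
        return "Too Short Error"
    return "OK"


def is_crc_error(bit_length, fcs_ok):
    """RMON-style CRC Error: either an FCS Error or an Alignment Error."""
    return classify_frame(bit_length, fcs_ok) in ("FCS Error", "Alignment Error")


# A maximum-sized frame that picked up a 4-byte tag is reported as Too Long
# even though nothing is wrong with it.
tagged = classify_frame((1518 + 4) * 8, fcs_ok=True)
```

The tagged-frame example mirrors the VLT case in Table 19: the frame is passed successfully, but its length alone puts it in the Too Long category.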

Transmit Discards

Transmit Discards indicate that packets were not transmitted because of network congestion

FDDI Ring Errors

Use these sections to identify and correct FDDI ring errors:

• FDDI Ring Errors Overview
• Identifying Ring Errors

Fiber Distributed Data Interface (FDDI) often corrects its own problems. However, because FDDI cannot correct all errors (especially those related to hardware problems), you should monitor FDDI errors.

Understanding the Problem

FDDI ring errors that you should monitor include:

• Elasticity Buffer Error Condition
• Frame Error Condition
• Frames Not Copied Condition
• Link Error Condition

Identifying the Problem

First, determine the type of FDDI ring errors and where they are occurring. Similar to the way you identify other FDDI problems, identify the upstream and downstream neighbors of the devices that you are monitoring.

Several types of network errors can cause FDDI performance problems. For example, problems with cables or physical connections may result in a link or frame error. Elasticity buffer (EB) errors can also lead to link and frame errors.

FDDI deals with port-related errors as follows:

• The variable PORTlerAlarm is the link error rate (LER) value at which a link connection generates an alarm. When the LER is greater than the alarm setting, Station Management (SMT) sends a Status Report Frame (SRF) to notify you that there is a problem with a port.

The PORTlerAlarm threshold is set lower than the PORTlerCutoff threshold so that you are notified of a problem before the port is actually removed from the ring.

• When link errors reach the threshold defined by the variable PORTlerCutoff, SMT breaks the connection, disabling the PHY that detected the problem. A Link Error Condition is also generated.
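The relationship between the two thresholds can be sketched as follows. This is a hedged illustration, not SMT's actual implementation: the function name, the returned labels, and the use of plain error-rate numbers (rather than whatever encoding a real station uses) are all assumptions.

```python
def port_action(ler: float, alarm: float, cutoff: float) -> str:
    """Decide what happens to a port for a given link error rate.

    `alarm` and `cutoff` stand in for the PORTlerAlarm and PORTlerCutoff
    settings, expressed here as plain error rates (an assumption).
    """
    if alarm >= cutoff:
        # The alarm must fire before the port is removed from the ring.
        raise ValueError("alarm threshold must be lower than cutoff threshold")
    if ler >= cutoff:
        return "cutoff"  # connection broken, Link Error Condition generated
    if ler > alarm:
        return "alarm"   # Status Report Frame sent, port stays in the ring
    return "ok"
```

For example, with an alarm setting of 1e-8 and a cutoff of 1e-7, a rate of 5e-8 raises an alarm while the port remains connected.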

FDDI deals with MAC-related errors as follows:

• When MAC frame errors reach a certain threshold, a Frame Error Condition is generated. Because the actual error can be further upstream than the immediate connection, the connection remains intact.

• For a large network, the worst-case MACFrameErrorRatio is less than 0.1 percent. However, during network configuration, frame error ratios can reach 50 percent for short periods. When you detect a sustained frame error ratio of more than 0.1 percent, a problem exists between the station that is reporting the condition and the nearest upstream MAC.
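Because short spikes are expected during configuration, the check that matters is whether the ratio stays high. A minimal sketch, assuming the ratio is sampled periodically as a percentage and that "sustained" means a run of consecutive samples (both the function name and the default of three samples are hypothetical choices):

```python
def sustained_frame_error(ratios_percent, threshold_percent=0.1, min_consecutive=3):
    """Return True if the ratio exceeds the threshold for several samples in a row."""
    run = 0
    for ratio in ratios_percent:
        # Count consecutive samples above the threshold; reset on a good sample.
        run = run + 1 if ratio > threshold_percent else 0
        if run >= min_consecutive:
            return True
    return False
```

A single 50 percent sample followed by normal values is ignored, while three samples in a row above 0.1 percent are reported.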

Solving the Problem

To solve problems related to FDDI errors, fix the hardware, cabling, or congestion problem.

Identifying Ring Errors


Use Status Watch to monitor your FDDI devices for Warning or Critical alerts.

Status Watch

Use Status Watch to identify FDDI ring errors.

Follow these steps:

1. Monitor the FDDI Status tool for the currently selected device.

2. Determine whether Status Watch is reporting Elasticity Buffer Errors or a high percentage of Frame Errors, Frames Not Copied, or Link Error Rates for the currently selected device.

FDDI Ring Errors Reference

This section provides additional conceptual and problem analysis detail.

Elasticity Buffer Error Condition

The Elasticity Buffer Error condition occurs when a port's elasticity buffer overflows or underflows. This condition usually indicates that a port's hardware is not operating within the tolerances that the FDDI standard specifies. Look for the problem in the hardware of either the port that is reporting the condition or of the immediately adjacent port.

Frame Error Condition

The Frame Error condition occurs when the percentage of frames that contain errors exceeds a preset threshold. In the situation when a device is an uplink to FDDI (that is, a device is transmitting onto FDDI), this type of condition indicates that the ring is saturated. The ring is out of buffer space, and packets are being dropped from the device's backbone port.

The problem indicated by the frame errors is usually located between the MAC that reports the condition and its upstream neighbor. Because many physical connections can lie along this path, the MACFrameErrorRatio variable identifies only the two MACs between which the problem is occurring.

Frames Not Copied Condition

The Frames Not Copied condition occurs when the percentage of frames that are dropped because of insufficient buffer space exceeds a preset threshold. This condition indicates that the station is congested and is unable to process frames as quickly as they arrive. To help eliminate congestion:

• Add more capacity to the station
• Reconfigure your network so that end stations that communicate heavily with one another are on the same bridge or switch
• Filter out certain traffic

Link Error Condition

The Link Error condition occurs when a port detects link errors at a rate that exceeds a preset threshold. When the Link Error threshold is exceeded, the station removes itself from the ring and tries to reinsert itself on the ring. This action creates a MAC Neighbor Change Event (which also occurs if a ring wraps).

Link errors may indicate an FDDI PHY hardware problem (such as a faulty transmitter) or a faulty cable or connector. Look for the problem in the portion of the network between the port that is reporting the condition and the first upstream transmitter.

MAC Neighbor Change Event

The MAC Neighbor Change event occurs when a MAC's upstream or downstream neighbor changes.

This event indicates either:

• A network reconfiguration
• Another station that is leaving or joining the ring

Network File Server Timeouts

Use these sections to identify and correct timeouts on network file servers:

• Network File Server Timeout Overview
• Looking for Obvious Errors
• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

See Network File Server Timeouts Reference for additional conceptual and problem analysis detail.

A network file server can time out if your network gets congested or if your server is having problems. Users might have problems downloading data from or to the server, or copying files from or to the server. To help you to understand the troubleshooting process for this type of problem, an EXAMPLE throughout this section follows the symptoms, analysis, and resolution of a typical file server timeout problem.


Understanding the Problem

When users log in, their stations make network file server calls either to determine quotas (if this feature has been enabled) or to mount user home directories. The network file server timeout messages, even when spread across multiple nodes, indicate a problem either with the network or with a server.

EXAMPLE: UNIX users notice that it takes a long time - over 30 seconds in some cases - to log in to any machine. Some machines report network file server timeout messages, but the messages have no obvious pattern and are infrequent. You begin to get a sense of the problem.

Identifying the Problem

First, rule out the obvious causes. Ask these questions:

• Can you access the network file server with Telnet?
• Have any alarms been triggered?
• Are there any new errors?

The process of identifying the problem is developed in Looking for Obvious Errors.

Solving the Problem

To determine the cause, reproduce the fault while you monitor the network. After you know the cause, you can fix the problem.

The solutions to the network file server timeout are identified in these sections:

• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

Looking for Obvious Errors

To look for obvious errors, use these applications:

• Ping and Telnet - To check connectivity to the network file server nodes
• LANsentry Manager Alarms View - To search for triggered alarms
• LANsentry Manager Statistics View - To look for errors
• LANsentry Manager History View - To identify trends


Ping and Telnet

Determine whether you can contact network file server nodes using Ping and Telnet. If the response is extremely slow, then a problem may exist with the connections to the nodes. No delay indicates that the connections are normal, implying that the delay is occurring elsewhere. In this case, use LANsentry® Manager tools to determine whether packets are being lost or ignored.
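The same check can be scripted by timing how long a TCP connection to the Telnet port takes. This is a rough sketch rather than a replacement for Ping: the function names, the default port 23, and the one-second "slow" cutoff are all assumptions chosen for illustration.

```python
import socket
import time


def measure_connect_time(host, port=23, timeout=5.0):
    """Return the seconds needed to open a TCP connection, or None on failure."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:  # refused, unreachable, or timed out
        return None


def describe_delay(seconds, slow_after=1.0):
    """Turn a measured connect time into a coarse label (hypothetical cutoff)."""
    if seconds is None:
        return "no connection"
    return "slow" if seconds > slow_after else "normal"
```

A "normal" result for every file server node suggests that the connections themselves are fine and that the delay lies elsewhere.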

LANsentry Manager Alarms View

Using the LANsentry Manager Alarms View, you can determine if any configured alarms have been triggered.

Search the Alarms View to see if any MAC events have been logged.

EXAMPLE: MAC events have not been logged for the network to which the UNIX users are attached.

Even though no alarms have occurred, errors may exist. For example, a lower rate of background errors may exist just below the alarm threshold. Because RMON alarms are based on maximum and minimum values, they may miss constant, periodic, or low amounts of errors.

LANsentry Manager Statistics View

Using the LANsentry Manager Statistics View, you can display a multisegment graph of utilization and error statistics.

Follow these steps:

1. Set up a graph that shows utilization and errors on all your major segments.

2. Determine whether any segments are particularly busy or error prone.

EXAMPLE: You notice that one segment of the UNIX network, HUB3, is reporting Too Long Errors and FCS Errors roughly every second sample. While the amount is not higher than normal, it is currently higher than any other segment.

LANsentry Manager History View

Using the LANsentry Manager History View, you can display a rolling history table to determine if the errors that you are seeing are new. For example, if you have a history table that runs for 30-minute samples over two days, you can compare the most recent sample to a previous sample, looking for new errors. If your probe has the resources, use a much finer resolution sample stored for a shorter time (every 30 seconds for 2 hours) to more easily spot recent errors.
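Comparing two history samples amounts to finding the counters that grew. The sketch below assumes each sample is available as a dictionary of error counts keyed by error name. That data layout and the function name are hypothetical, not something LANsentry Manager exports in this form.

```python
def new_errors(previous: dict, latest: dict) -> dict:
    """Return the error counters that are higher in the latest sample."""
    return {
        name: count - previous.get(name, 0)
        for name, count in latest.items()
        if count > previous.get(name, 0)  # unchanged or lower counters are ignored
    }
```

An error type that is absent from the earlier sample is treated as having a count of zero there, so it shows up as new.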

EXAMPLE: You see that the history table shows that no error rates remained constant throughout the day. However, errors that did occur were on the device HUB3 and were Too Long Errors and FCS Errors.

Reproducing the Fault While Monitoring the Network

Although the RMON View in LANsentry Manager can show error rates and help you to identify the location of the problem, it may not provide enough data to solve the problem. To determine the cause of the problem, reproduce it while you monitor the network by using these applications:

• LANsentry Manager Top-N Graph - To locate a quiet node to use for reproducing the fault

• LANsentry Manager Packet Capture - To capture packets from the hub to which the quiet node is attached

• LANsentry Manager Packet Decode - To analyze the packets to assess network file server traffic and delays

• Address Tracker - To find the location of the problem nodes

EXAMPLE: Using LANsentry Manager, you find a hub on the network with a higher than normal error rate. However, the error rate does not seem high enough to cause login delays of 60 seconds or more.

LANsentry Manager Top-N Graph

Using the Top-N graph in the LANsentry Manager main window, locate a quiet node that has been showing the same problem. Choose a quiet node so that you do not receive excessive traffic when you try to isolate the problem.
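Choosing the quiet node is a matter of taking the least busy of the affected nodes. A minimal sketch, assuming you have per-node packet counts and a list of nodes that show the problem (the data layout, the node names other than Monolith, and the function name are hypothetical):

```python
def quietest_node(traffic: dict, affected: list):
    """Return the affected node with the lowest traffic count, or None."""
    candidates = {node: traffic[node] for node in affected if node in traffic}
    if not candidates:
        return None
    return min(candidates, key=candidates.get)
```

For instance, `quietest_node({"Monolith": 40, "Obelisk": 900}, ["Monolith", "Obelisk"])` picks `"Monolith"`.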

EXAMPLE: You see that the node Monolith, which has the same Network File System (NFS) mounts as the other nodes on the network, is quiet. You decide to use this node for reproducing the fault. See Network File System (NFS) Protocol for more information about NFS.

LANsentry Manager Packet Capture

Using the LANsentry Manager Packet Capture application, capture packets from the network using predefined patterns and start-and-stop conditions.


Follow these steps:

1. Set up a capture buffer on a probe that is connected to the same hub as the quiet node. Until you know more about the problem, set a very general filter.

EXAMPLE: You select a MAC-layer filter and set a conversation filter to capture all packets to and from Monolith.

2. Telnet into and log out from the quiet node. Then reset the capture buffer. Repeat this procedure until you see the problem reflected in your captured data. To keep the buffer information clean, reset the buffer each time that you repeat the procedure.

3. When you see the delay, note the rough value of the packet count on the LANsentry Manager packet buffer.

By noting the packet count at which you think the delay has occurred, you can narrow the problem to within about 20 packets in the buffer. If you have used an extremely quiet node, you may even identify the exact packet.

LANsentry Manager Packet Decode

The LANsentry Manager Packet Decode application decodes all major protocols and displays the packet contents at three levels of detail: summary information, header information, and actual packet content.

Follow these steps:

1. Open the buffer in the Packet Decode application and locate the number of the packet at which the delay occurred.

2. Select the packet and launch a MAC-layer conversation filter. In the filter display, look for a gap in the conversation (that is, where the node sent a request and then resent it at approximately the same rate as the delay you experienced when recreating the problem).

3. Repeat the test to determine if the result concentrates on one node or if it appears on other nodes.
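Looking for the gap in step 2 can be sketched as a search for requests that were sent again after a long pause. The packet representation used here, a list of (timestamp in seconds, request identifier) pairs, is an assumption for illustration, as are the function name and the five-second default.

```python
def find_retry_gaps(packets, min_gap=5.0):
    """Return (request_id, gap_seconds) for requests resent after a long pause."""
    last_seen = {}
    gaps = []
    for timestamp, request_id in packets:
        if request_id in last_seen:
            gap = timestamp - last_seen[request_id]
            if gap >= min_gap:
                gaps.append((request_id, gap))
        last_seen[request_id] = timestamp  # remember the latest transmission
    return gaps
```

A request first seen at 0.0 seconds and seen again at 30.0 seconds is reported with a gap of 30.0, which would match a 30-second login delay.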

EXAMPLE: On the quiet node that you selected, the delay is obvious. You see an NFS request going out to a node and a repeat of the request 30 seconds later. During that time, the node did not respond. You now know that the delay occurred because nodes were not seeing responses for NFS requests. When you repeat the test on other nodes, you find that the delay is happening with more than one destination node.


Address Tracker

Use Address Tracker, which polls managed devices, to determine the hubs to which the problem nodes are attached. If the problem end stations are located on unmanaged devices, then you can at least narrow the problem to those unmanaged devices.

EXAMPLE: Although your network does not have managed hubs that Transcend® NCS management software can poll, it does have managed switches. When it polls the switches, Address Tracker displays the switch ports on which addresses were last seen. This information indicates the hub (but not the hub port) on which the device is located.

LANsentry Manager Packet Decode

After you know the location of the hub that has the problem node, monitor the problem from the hub using LANsentry Manager Packet Decode.

Follow these steps:

1. To capture packets from one of the nodes on the hub, set up another capture buffer and repeat the exercise that is described in LANsentry Manager Packet Capture. Because a delay may occur on a different node, use two capture buffers without stopping the first one.

Note the rough packet count where the delay appears.

2. Display a conversation filter of the packet where the delay appears and look for the gap in the conversation.

EXAMPLE: You hope that the nodes are on the same hub. You find that all the nodes are on HUB3. This result indicates that FCS Errors may be causing the timeouts. However, because the errors occur at a low rate, you decide to verify this diagnosis. You monitor the problem from the hub, logging in and out many times, and the delay eventually occurs. This time, the delay shows that the node's reply had an FCS Error, even though the node received the request. The switch would not have transmitted this packet, causing a timeout on the NFS protocol. The retry time is presumably 30 seconds. During this test, you see the problem occurring on another node.

Correcting the Fault


Without a managed hub, you may find it very difficult to discover network file server timeout errors. To find the problematic node, you must either systematically isolate nodes by monitoring each node for a prolonged period, or temporarily insert a managed hub.

EXAMPLE: You notice that the captured error packet failed FCS because it was corrupted by a regular pattern during transmission. A possible reason for this occurrence is a Jabbering node. This explanation makes sense because FCS/Jabber frames increased linearly when you were monitoring the live network.

Network File Server Timeouts Reference

This section explains terms that are relevant to network file server timeouts and provides additional conceptual and problem analysis detail.

Jabbering

Jabbering occurs when a node transmits illegal-length packets and is possibly not operating within carrier specifications. In effect, another node has written bad data over a valid packet. This bad data is often interpreted as a repeated sequence of data.

Network File System (NFS) Protocol

A distributed file system protocol developed by Sun Microsystems that allows a computer system to access files over a network as if they were on its local disks. This protocol has been incorporated into products by more than 200 companies. It is now a de facto Internet standard.

NFS is one protocol in the NFS suite of protocols, which includes NFS, RPC, XDR (External Data Representation), and others. These protocols are part of a larger architecture that Sun Microsystems refers to as Open Network Computing (ONC). ONC is a distributed applications architecture designed by Sun and currently controlled by a consortium led by Sun.

Troubleshooting TCP/IP - Detailed Steps

This article shows how to troubleshoot TCP/IP connectivity between computers on a Windows network. If you haven't already done so, disable XP's Internet Connection Firewall on all local area network connections and remove all firewall programs on the network. Improperly configured firewalls are the most common cause of TCP/IP problems.


Open a Command Prompt Window

For many of these steps, you'll be typing at the command prompt. To open a command prompt window in Windows 2000 or XP, click Start | Run, type cmd in the box, and click OK. To open a command prompt window in Windows 95, 98, or Me, click Start | Run, type command in the box, and click OK. Type one command per line, and press Enter after each one to execute it. To close the command prompt window, use the exit command.

Determine the TCP/IP Settings

Determine the TCP/IP settings of each computer on the local area network. In XP, open the Network Connections folder, right-click the LAN connection, and click Status | Support | Details. For example, here are the Status and Details views for the LAN connection on an Internet Connection Sharing host.

In Windows 95/98/Me, click Start | Run, type winipcfg in the box, and click OK. Select the LAN adapter from the menu and click More Info. Here's the winipcfg view for an ICS client running Windows Me.

You can also see the TCP/IP settings from the command prompt. This is especially convenient if a computer has more than one network adapter. Use the ipconfig /all command, which is available in all versions except Windows 95. The output from this command can be long, so it's best to write it to a file. Specify the file name in the command this way:

ipconfig /all >ipconfig.txt
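Once the output is saved to a file, the settings can be pulled out of it with a few lines of code. The sketch below is a simplified parser written under assumptions: it expects lines of the form `Name . . . : value`, keeps only the last value seen for each name (so it does not separate multiple adapters), and skips continuation lines. The function name is hypothetical.

```python
def parse_ipconfig(text: str) -> dict:
    """Extract 'name : value' pairs from saved ipconfig /all output."""
    settings = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" : ")
        if not sep:
            continue  # headings, blank lines, and continuation lines
        # Drop the dotted leader that pads each setting name.
        settings[key.rstrip(". ").strip()] = value.strip()
    return settings
```

With the file written by the command above, `parse_ipconfig(open("ipconfig.txt").read())` would return a dictionary with keys such as `"Subnet Mask"` and `"Default Gateway"`, provided the output uses that layout.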

Description of TCP/IP Settings

Here are the TCP/IP settings that are used in network troubleshooting:

• IP Address - Unique address assigned to a network adapter. A computer with multiple network adapters has an IP address for each one, and each one must be in a different subnet.

• Subnet Mask - Used in conjunction with the IP address to determine which subnet an adapter belongs to. At the simplest level, communication is only possible between two network adapters when they're in the same subnet.

• Default Gateway - IP address of a computer or router, on one of this computer's local area networks, that knows how to communicate with subnets not present on this computer. For an Internet connection, the default gateway is a router belonging to your Internet service provider, and all access to sites on the Internet goes through it. For an ICS client, the default gateway is the ICS host. If you use a hardware router, it serves as the default gateway.

• DHCP Server - If an adapter is configured to obtain an IP address automatically, this is the address of the server that provides it. It could be your ISP, an ICS host, or a hardware router.

• DNS Servers - IP address of one or more Domain Name Server computers. DNS servers translate Internet names to their IP addresses (like 63.146.109.227).
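How the IP address and subnet mask combine can be shown with Python's standard `ipaddress` module. This is a small sketch: the function name is an assumption, and it covers only the simple case of two adapters that share one mask.

```python
import ipaddress


def same_subnet(ip_a: str, ip_b: str, mask: str) -> bool:
    """Return True if both addresses fall in the same subnet under the mask."""
    net_a = ipaddress.ip_interface(f"{ip_a}/{mask}").network
    net_b = ipaddress.ip_interface(f"{ip_b}/{mask}").network
    return net_a == net_b
```

With a mask of 255.255.255.0, the addresses 192.168.1.101 and 192.168.1.123 are in the same subnet, while 192.168.1.101 and 192.168.2.5 are not.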

Subnets

See our article on subnets for a brief description of how they work. For more details, see this Microsoft Knowledge Base article.

If two computers are supposed to be on the same subnet but aren't, something is wrong with the network hardware or software configuration. This is most likely to happen when one of them receives an IP address of 169.254.x.x, which indicates that:

• It's configured to obtain an IP address automatically
• It couldn't find a DHCP server on the network to make the assignment
• Windows assigned it an Automatic Private IP Address
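Spotting an Automatic Private IP Address is a simple range test. A minimal sketch using the standard `ipaddress` module (the function name is hypothetical):

```python
import ipaddress

# The 169.254.x.x block used for Automatic Private IP Addressing.
APIPA_NETWORK = ipaddress.ip_network("169.254.0.0/16")


def is_automatic_private(ip: str) -> bool:
    """Return True if the address lies in the 169.254.x.x range."""
    return ipaddress.ip_address(ip) in APIPA_NETWORK
```

An adapter that reports such an address most likely failed to reach a DHCP server, as the list above describes.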

See our article on Specific Networking Problems and Their Solutions for more information.

Pinging

The ping command is the basic tool for testing TCP/IP connectivity. It sends a special packet (called ICMP Echo) to a particular IP address and looks for a reply. If everything is working right, the reply comes back. If not, the ping times out in a few seconds. By default, the ping command repeats the process four times. Here's an example of an ICS client computer pinging a Windows XP Home Edition ICS host, using the host's IP address and its computer name.

When ping fails, you'll see one of these error messages:

• Request timed out - The IP address is valid, but there's no reply from it. If the IP address is on a local area network, the most likely cause is a firewall program blocking the ping.

• Unknown host <name> or Ping request could not find host <name> - The computer name doesn't exist on the local area network. Make sure that NetBIOS over TCP/IP is enabled.

• Destination host unreachable - The IP address isn't on a local area network, and the default gateway can't access it. Either there's no default gateway, its address is wrong, or it isn't functioning.
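The three messages above can be mapped to their likely causes in code, which is handy when checking saved ping output from several machines. This is a sketch that matches on message text only; the function name and the wording of the hints are assumptions, and real ping output varies by Windows version and language.

```python
PING_HINTS = [
    ("request timed out", "No reply: on a LAN, suspect a firewall blocking the ping"),
    ("unknown host", "Name not found: check that NetBIOS over TCP/IP is enabled"),
    ("could not find host", "Name not found: check that NetBIOS over TCP/IP is enabled"),
    ("destination host unreachable", "No route: check the default gateway"),
]


def explain_ping_failure(output: str):
    """Return a hint for the first known error message in the output, or None."""
    lowered = output.lower()
    for fragment, hint in PING_HINTS:
        if fragment in lowered:
            return hint
    return None
```

Output that contains none of the known messages returns `None`, which includes the case where the ping succeeded.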


Pinging the Local Area Network

Here is a series of ping commands to use in finding where a problem occurs on a local area network. Run them in the order shown, and don't go on to the next command until all of the previous commands work properly. In this example:

• The computer being tested is named Winxp, with IP address 192.168.1.101
• There's another computer on the network, named Win98, with IP address 192.168.1.123

Substitute the appropriate IP addresses and computer names for your network.

Command               Target                          What Ping Failure Indicates
ping 127.0.0.1        Loopback address                Corrupted TCP/IP installation
ping localhost        Loopback name                   Corrupted TCP/IP installation
ping 192.168.1.101    This computer's IP address      Corrupted TCP/IP installation
ping winxp            This computer's name            Corrupted TCP/IP installation
ping 192.168.1.123    Another computer's IP address   Bad hardware or NIC driver
ping win98            Another computer's name         NetBIOS name resolution failure
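The "run in order and stop at the first failure" rule can be sketched as a short script. The addresses and names come from the example above. The function names are assumptions, and `system_ping` relies on the exit status of the operating system's ping command, which may not flag every failure on every platform.

```python
import subprocess
import sys

LAN_CHECKS = [
    ("127.0.0.1", "Corrupted TCP/IP installation"),
    ("localhost", "Corrupted TCP/IP installation"),
    ("192.168.1.101", "Corrupted TCP/IP installation"),
    ("winxp", "Corrupted TCP/IP installation"),
    ("192.168.1.123", "Bad hardware or NIC driver"),
    ("win98", "NetBIOS name resolution failure"),
]


def system_ping(target: str) -> bool:
    """Send one echo request with the OS ping command; True if it exits with 0."""
    count_flag = "-n" if sys.platform.startswith("win") else "-c"
    result = subprocess.run(["ping", count_flag, "1", target], capture_output=True)
    return result.returncode == 0


def first_failure(checks, ping=system_ping):
    """Run the checks in order and return the first (target, meaning) that fails."""
    for target, meaning in checks:
        if not ping(target):
            return target, meaning
    return None
```

Calling `first_failure(LAN_CHECKS)` returns `None` when every ping succeeds. Otherwise it returns the first failing target together with the table's explanation. The `ping` parameter accepts any function, which makes it easy to substitute a different test.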

To fix a corrupted TCP/IP installation on Windows XP, follow the steps in this Microsoft Knowledge Base article. For Windows 95/98/Me, uninstall the TCP/IP protocol in Control Panel | Network, reboot, and reinstall it. If that doesn't fix it, use this procedure on Windows 95 or 98.

Pinging the Internet

You can also use ping to find a problem with Internet access. Run these commands in the order shown, and don't go on to the next command until all of the previous commands work properly. Use the Default Gateway and DNS Server addresses that you got from the winipcfg or ipconfig /all command.


Command                  Target                What Ping Failure Indicates
ping w.x.y.z             Default Gateway       Default Gateway down
ping w.x.y.z             DNS Server            DNS Server down
ping w.x.y.z             Web site IP address   Internet service provider or web site down
ping www.something.com   Web site name         DNS Server down or web site down

REFERENCE

WWW.PRACTICALLYNETWORK.COM

WWW.COMPUTERHOPE.COM

WWW.WIKIPEDIA.ORG


THANK YOU


Page 17: 11591337-Basic-Network-Troubleshooting

that supports your Transcend software installation can provide valuable troubleshooting tools. Transcend runs on several platforms within the NT and UNIX environments.

The platform discovers the devices. Transcend imports that information from the platform to populate the core database. Unless you are rediscovering, the user must manually update the platform.

Using this device database, a map displays the graphical representation of your network. Each device on your network appears as a symbol (icon) on the map. You can configure views of your network to show devices on the same subnetworks or floors.

You can monitor network performance and diagnose network performance and connectivity problems. You can also:

• Take a snapshot of your network in its normal state. The snapshot records the state of your network at a particular instant. If you later have network performance problems, you can compare the current state of your network to the snapshot.

• Quickly determine the connectivity status of a device by noting the color of its map symbol. Red usually means that communication with a device has ceased.

• Diagnose connectivity problems by determining whether two devices can communicate. If they can communicate, then examine the route between the devices, the number of packets that were sent and lost, and the roundtrip time between the two devices.

• Manage MIB information (for example, collecting and storing MIB data for trend analysis and graphing) using MIB queries. Transcend compiles MIBs and allows you to navigate up and down the MIB Tree to retrieve MIB objects from devices. You can set thresholds for MIB data and generate events when a threshold is exceeded.

• Configure the software to act on certain events. The Event Categories window informs you of any unexpected events (which arrive in the form of traps).

For more information, see the documentation that is shipped with your software.

3Com SmartAgent Embedded Software

Traditional Simple Network Management Protocol (SNMP) management places the burden of collecting network management information on the management station. In this traditional model, software agents collect information about throughput, record errors or packet overflows, and measure performance based on established thresholds. Through a polling process, agents pass this information to a centralized network management station whenever they receive an SNMP query. Management applications then make the data useful and alert the user if there are problems on the device.

As a useful companion to traditional network management methods, 3Com's SmartAgent® technology places management intelligence into the software agent that runs within a 3Com device. This scalable solution reduces the amount of computational load on the management station and helps minimize management-related network traffic.

SmartAgent software, which uses the RMON MIB, is self-monitoring, collecting and analyzing its own statistical, analytical, and diagnostic data. In this way, you can conduct network management by exception - that is, you are notified only if a problem occurs. Management by exception is unlike traditional SNMP management, in which the management software collects all data from the device through polling.
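The idea of management by exception can be illustrated with a small sketch. It is not SmartAgent code: the `Threshold` class, the metric names, and the limits are hypothetical, chosen only to show an agent that checks its own statistics and reports nothing unless a threshold is crossed.

```python
from dataclasses import dataclass


@dataclass
class Threshold:
    """A hypothetical limit that the agent checks locally."""
    metric: str
    limit: float
    direction: str  # "above" or "below"

    def crossed(self, value):
        if self.direction == "above":
            return value > self.limit
        return value < self.limit


def evaluate(sample, thresholds):
    """Return event messages only for metrics that crossed a threshold.

    An empty list means there is nothing to report, so no traffic
    needs to be sent to the management station.
    """
    events = []
    for threshold in thresholds:
        value = sample.get(threshold.metric)
        if value is not None and threshold.crossed(value):
            events.append(
                f"{threshold.metric}={value} is {threshold.direction} {threshold.limit}"
            )
    return events
```

With a polling model, every sample would travel to the management station; here only the returned events would.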

SmartAgent software works autonomously and reports to the network management station whenever an exceptional network event occurs. The software can also take direct action without involving the management station. Devices that contain SmartAgent software may be able to:

• Perform broadcast throttling to minimize the flow of broadcast traffic on your network

• Monitor the ratio of good frames to bad frames

• Switch a resilient link pair to the standby path if the primary path corrupts frames

• Report if traffic on vital segments drops below minimum usage levels

• Disable a port for five seconds to clear problems and then automatically reconnect it

The Transcend NCS applications LANsentry Manager and Traffix Manager make the RMON data that the SmartAgent software collects more usable by summarizing and correlating important information.

Other Commonly Used Tools

These commonly used tools can also help you troubleshoot your network:

• Network software, such as Ping, Telnet, and FTP and TFTP. You can use these applications to troubleshoot, configure, and upgrade your system.

• Network monitoring devices, such as Analyzers and Probes

• Tools, such as Cable Testers, for working on physical problems

Ping

Packet Internet Groper (Ping) allows you to quickly verify the connectivity of your network devices. Ping attempts to transmit a packet from one device to a station on the network and listens for the response to ensure that it was correctly received. You can validate connections on the parts of your network by pinging different devices.

• A successful response indicates that a valid network path exists between your station and the remote host and that the remote host is active.

• Slower response times than normal can indicate that the path is congested or obstructed.

• A failed response indicates that a connection is broken somewhere; use the message to help locate the problem. See Tips on Interpreting Ping Messages.

Strategies for Using Ping

Follow these strategies for using Ping:

• Ping devices when your network is operating normally so that you have a performance baseline for comparison. See Identifying Your Network's Normal Behavior for more information.

• Ping by IP address when:

  • You want to test devices on different subnetworks. This method allows you to Ping your network segments in an organized way, rather than having to remember all the hostnames and locations.

  • Your Domain Name System (DNS) server is down and your system cannot look up host names properly. You can Ping with IP addresses even if you cannot access hostname information.

• Ping by hostname when you want to identify DNS server problems.

• To troubleshoot problems that involve large packet sizes, Ping the remote host repeatedly, increasing the packet size each time.

• To determine if a link is erratic, perform a continuous Ping (using ping -s on UNIX), which indicates the time that it takes the device to respond to each Ping.

• To determine a route taken to a destination, use the trace route function (tracert).

• Consider creating a Ping script that periodically sends a Ping to all necessary networking devices. If a Ping failure message is received, the script can perform some action to notify you of the problem, such as paging you.

• Use the Ping functions of your network management platform. For example, in your HP OpenView map, select a device and click the right mouse button to gain access to ping functions.
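A periodic Ping script of the kind suggested in the list above might look like the following minimal sketch. It assumes that the operating system's `ping` command is on the path and accepts `-c 1` (UNIX-like systems) or `-n 1` (Windows) to send a single request. The function names, the five-minute interval, and the `notify` action (which only prints a message rather than paging anyone) are illustrative assumptions.

```python
import platform
import subprocess
import time


def ping_once(host, timeout_s=5):
    """Send one echo request with the system ping command; True on a reply."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    try:
        result = subprocess.run(
            ["ping", count_flag, "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0


def find_unreachable(hosts, ping=ping_once):
    """Return the hosts that did not answer, in the order given."""
    return [host for host in hosts if not ping(host)]


def notify(host):
    """Placeholder action; replace with paging or e-mail as appropriate."""
    print(f"ALERT: no ping response from {host}")


def monitor(hosts, interval_s=300, rounds=None, ping=ping_once,
            alert=notify, sleep=time.sleep):
    """Ping every host each interval; rounds=None repeats indefinitely."""
    completed = 0
    while rounds is None or completed < rounds:
        for host in find_unreachable(hosts, ping):
            alert(host)
        completed += 1
        if rounds is None or completed < rounds:
            sleep(interval_s)
```

Calling `monitor(["192.168.1.1", "192.168.1.2"])` would check both (hypothetical) addresses every five minutes. The `ping`, `alert`, and `sleep` parameters exist so that the loop can be exercised without touching the network.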

Tips on Interpreting Ping Messages

Use the following ping failure messages to troubleshoot problems:

No reply from <destination>

Indicates that the destination routes are available but that there is a problem with the destination itself.

<destination> is unreachable

Indicates that your system does not know how to get to the destination. This message means either that routing information to a different subnetwork is unavailable or that a device on the same subnetwork is down.

ICMP host unreachable from gateway

Indicates that your system can transmit to the target address using a gateway but that the gateway cannot forward the packet properly because either a device is misconfigured or the gateway is not operating.
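The three messages above can be turned into a small lookup, for example inside a Ping script. This is a sketch only: the exact wording that ping prints differs between operating systems, so the marker strings below are assumptions taken from the messages as listed here, and `interpret_ping_failure` is a hypothetical helper name.

```python
# Marker text (lowercase) paired with the interpretation given above.
# The gateway message is checked first because it is the most specific.
FAILURE_HINTS = [
    ("icmp host unreachable from gateway",
     "The gateway cannot forward the packet: a device is misconfigured "
     "or the gateway is not operating."),
    ("no reply from",
     "Routes to the destination are available; the problem is with the "
     "destination itself."),
    ("is unreachable",
     "No known route to the destination: routing information is "
     "unavailable or a device on the same subnetwork is down."),
]


def interpret_ping_failure(message):
    """Return a troubleshooting hint for a ping failure message, or None."""
    text = message.lower()
    for marker, hint in FAILURE_HINTS:
        if marker in text:
            return hint
    return None
```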

Telnet

Telnet, which is a login and terminal emulation program for Transmission Control Protocol/Internet Protocol (TCP/IP) networks, is a common way to communicate with an individual device. You log in to the device (a remote host) and use that remote device as if it were a local terminal.

If you have established an out-of-band Telnet connection with a device, you can use Telnet to communicate with that device even if the network is unavailable. This feature makes Telnet one of the most frequently used network troubleshooting tools. Usually, all device statistics and configuration capabilities are accessible by using Telnet to connect to the device's console. For more information about setting up an out-of-band connection, see Using Telnet, Serial Line, and Modem Connections.

You can invoke the Telnet application on your local system and set up a link to a Telnet process that is running on a remote host. You can then run a program that is located on a remote host as if you were working at the remote system.

FTP and TFTP

Most network devices support either the File Transfer Protocol (FTP) or the Trivial File Transfer Protocol (TFTP) for downloading updates of system software. Updating system software is often the solution to networking problems that are related to agent problems. Also, new software features may help correct a networking problem.

FTP provides flexibility and security for file transfer by:

• Accepting many file formats, such as ASCII and binary
• Using data compression
• Providing Read and Write access so that you can display, create, and delete files and directories
• Providing password protection

TFTP is a simple version of FTP that does not list directories or require passwords. TFTP only transfers files to and from a remote server.

Analyzers

An analyzer, which is often called a Sniffer, is a network device that collects network data on the segment to which it is attached, a process called packet capturing. Software on the device analyzes this data, which is a process referred to as protocol analysis. Most analyzers can interpret different types of protocol traffic, such as TCP/IP, AppleTalk, and Banyan VINES traffic.

You usually use analyzers for reactive troubleshooting - when you see a problem somewhere on your network, you attach an analyzer to capture and interpret the data from that area. Analyzers are particularly helpful for identifying intermittent problems. For example, if your network backbone has experienced moments of instability that prevent users from logging on to the network, you can attach an analyzer to the backbone to capture the intermittent problems when they happen again.

Probes

Like an analyzer, a probe is a network device that collects network data. Depending on its type, a probe can collect data from multiple segments simultaneously. It stores the collected data and transfers the data to an analysis site when requested. Unlike an analyzer, a probe does not interpret data.

A probe can be either a stand-alone device or an agent in a network device. The Transcend Enterprise Monitor 500 series and the SuperStack® II Monitor series are stand-alone RMON probes. LANsentry Manager and Traffix Manager use data from probes that comply with the RMON MIB or the RMON2 MIB.

You can use a probe daily to determine the health of your network. The Transcend NCS applications can interpret and report this data, alerting you to possible problems so that you can proactively manage your network. For example, an RMON2 probe can help you to analyze traffic patterns on your network. Use this data to make decisions about reconfiguring devices and end stations as needed.

Cable Testers

Cable testers examine the electrical characteristics of the wiring. They are most commonly used to ensure that building wiring and cables meet Category 5, 4, and 3 standards. For example, network technologies such as Fast Ethernet require the cabling to meet Category 5 requirements. Testers are also used to find defective and broken wiring in a building.

Knowing Your Network

You can better troubleshoot the problems on your network by:

• Knowing Your Network's Configuration
• Identifying Your Network's Normal Behavior

Knowing Your Network's Configuration

Part of understanding your network is knowing its physical and logical configuration. You should know:

• Which devices are on your network
• How the devices are configured
• Which devices are attached to the backbone
• Which devices connect your network to the outside world (WAN)

To keep track of your network's configuration, gather the following information:

• Site Network Map
• Logical Connections
• Device Configuration Information
• Other Important Data About Your Network

This data, when kept up-to-date, is extremely helpful for locating information when you experience network or device problems.

Site Network Map

A network map helps you to:

• Know exactly where each device is physically located
• Easily identify the users and applications that are affected by a problem
• Systematically search each part of your network for problems

You can create a network map using any drawing or flow chart application. Store your network map online. In addition, make sure that you always have a current version on paper in case you cannot access the online version. Figure 8 shows an example of a network map of 3Com devices.

Figure 8 Example of a Site Network Map


Consider including the following information on your network map:

• Location of important devices and workgroups (by floor, building, or area)
• Location of the network backbone, data center, and wiring closets, as appropriate for your network
• Location of your network management stations
• Location and type of remote connections
• IP subnetwork addresses for all managed switches and hubs
• Other subnetwork addresses, such as Novell IPX and AppleTalk, if appropriate for your network
• Type of media (by actual name, such as 10BASE-T, or by grouping, such as Ethernet), which you can show with callouts, colors, line weights, or line styles
• Virtual workgroups, which you can show with colors or shaded areas
• Redundant links, which you can indicate with gray or dashed lines
• Types of network applications that are used in different areas of your network
• Types of end stations that are connected to the switches and hubs

Logical Connections

With the advent of virtual LANs (VLANs), you need to know how your devices are connected logically as well as physically. For example, if you have connected two devices through the same physical switch, you can assume that they can communicate with each other. However, the devices can be in separate VLANs that restrict their communication.

Knowing the setup of your VLANs can help you to quickly narrow the scope of a problem to a VLAN instead of to a network connection.

The Transcend NCS application Enterprise VLAN Manager allows you to view the logical makeup of your network. Depending on the complexity of your network and VLAN configurations, you can use colors to show the VLANs graphically on your network map.

Device Configuration Information

Maintain online and paper copies of device configuration information. Make sure that all online data is stored with your site's regular data backup. If your site does not have a backup system, copy the information onto a backup disc (CD, Zip disk, and the like) and store it offsite.

Follow these guidelines for saving configuration information:

• Because the easiest way to recover a device's configuration is to use FTP or TFTP, save the configuration settings of each device that supports this method of uploading.

• For other devices, Telnet in and save the session (which contains configuration details) to a file. If you cannot print the configuration of a device, then create a quick rebuild guide that explains the quickest way to configure the device from a fresh install.

• For devices that store information to diskette, store this data as part of your site's regular backup.

• For routers and other important devices with text configuration files, store this data online in a revision control system. Keep the most recent version on paper. Keep previous versions.

• For PCs, keep a recovery disk for each type of PC. For any device that you use as a server, store all startup scripts and copies of registries.

Other Important Data About Your Network

For a complete picture of your network, have the following information available:

• All passwords - Store passwords in a safe place. Keep previous passwords in case you restore a device to a previous software version and need to use the old password that was valid for that version.

• Device inventory - The inventory allows you to see the device type, IP address, ports, MAC addresses, and attached devices at a glance. Software tools such as Transcend Central can help you keep track of the 3Com devices on your network. Using Transcend Central, you can group devices by type and location and have this information on hand for troubleshooting.

• MAC address-to-port number list - If your hubs or switches are not managed, you must keep a list of the MAC addresses that correlate to the ports on your hubs and switches. Generate and keep a paper copy of this list, which is crucial for deciphering captured packets using Address Tracker.

• Log book - Document your interactions, no matter how trivial, with each device that is critical to your network's operation (that is, routers, remote access devices, security servers). For example, document that you noticed a fan making noise one morning. Your note may help you to identify why a device is over temperature a week later (because the fan stopped working).

• Change control - Maintain a change control system for all critical systems. Permanently store change control records.

• Contact details - Store, online and on paper, the details of all support contracts, support numbers, engineer details, and telephone and fax numbers.

• LANsentry Reporter - Use LANsentry Reporter to generate reports from the database.
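If you keep the device inventory and the MAC address-to-port number list in a file of your own rather than in a management tool, a simple structure is enough. The following sketch is one possible layout, not a Transcend format; the `Device` fields, the `find_port` helper, and the sample addresses are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    """One inventory entry for a hub or switch."""
    name: str
    device_type: str
    ip_address: str
    # Maps a port number to the MAC addresses seen on that port.
    port_macs: dict = field(default_factory=dict)


def normalize_mac(mac):
    """Compare MAC addresses regardless of case or separator style."""
    return mac.replace("-", ":").lower()


def find_port(inventory, mac):
    """Return (device name, port) for a MAC address, or None if unknown."""
    wanted = normalize_mac(mac)
    for device in inventory:
        for port, macs in device.port_macs.items():
            if wanted in (normalize_mac(m) for m in macs):
                return device.name, port
    return None
```

For example, with an inventory holding a hub named `hub-1` whose port 3 lists `00:A0:24:11:22:33`, a lookup of `00-a0-24-11-22-33` taken from a captured packet would return `("hub-1", 3)`.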

Identifying Your Network's Normal Behavior

By monitoring your network over a long period, you begin to understand its normal behavior. You begin to see a pattern in the traffic flow, such as which servers are typically accessed, when peak usage times occur, and so on. If you are familiar with your network when it is fully operational, you can be more effective at troubleshooting problems that arise.

Baselining Your Network

You can use a baseline analysis, which is an important indicator of overall network health, to identify problems. A baseline can serve as a useful reference of network traffic during normal operation, which you can then compare to captured network traffic while you troubleshoot network problems. A baseline analysis speeds the process of isolating network problems.

By running tests on a healthy network, you compile normal data to compare against the results that you get when your network is in trouble. For example, Ping each node to discover how long it typically takes you to receive a response from devices on your network.
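As a rough illustration of the Ping example, the sketch below builds a baseline from response times recorded on a healthy network and flags a later measurement that is well above it. The choice of mean plus three standard deviations as the cutoff is an assumption made for this sketch, not a rule from this guide, and the function names are hypothetical.

```python
import statistics


def build_baseline(rtt_ms_samples):
    """Summarize healthy-network response times as (mean, standard deviation).

    Needs at least two samples.
    """
    return statistics.mean(rtt_ms_samples), statistics.stdev(rtt_ms_samples)


def is_abnormal(rtt_ms, baseline, tolerance=3.0):
    """True if a response time is more than `tolerance` deviations above the mean."""
    mean, spread = baseline
    return rtt_ms > mean + tolerance * spread
```

With samples of 10, 12, 11, 9, and 13 ms, the baseline mean is 11 ms; a later 20 ms response would be flagged, while a 12 ms response would not.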

Applications such as Status Watch, Address Tracker, LANsentry Manager, and Traffix Manager allow you to collect days and weeks of data and set a baseline for comparison. Through the reporting mechanisms in the following list, you can continuously assess the data from your network and ensure that its performance is optimal:

• Web Reporter generates daily or weekly reports from data collected by Status Watch.

• Traffix Manager generates weekly reports from collected data and calculates the baselines for you. Set up Utilization History and Error History reports with data resolution set to Weekly.

• LANsentry Manager History View generates daily utilization graphs, which are sampled every 30 minutes for each day over one week. Use these graphs to calculate your network baselines manually.

Identifying Background Noise

Know your network's background noise so that you can recognize real data flow. For example, one evening, after everyone is gone, no backups are running, and most nodes are on, analyze the traffic on your network using the Traffix Manager application. The traffic that you see is mostly broadcast and multicast packets. Any errors that you see are the result of faulty devices (trace). This traffic is the background noise of your network - traffic that occurs for little value. If background noise is high, redesign your network.

FDDI Connectivity

Use these sections to identify and correct connectivity errors on an FDDI ring:

• FDDI Connectivity Overview
• Monitoring FDDI Connections
• Making Your FDDI Connections More Resilient

Fiber Distributed Data Interface (FDDI), which is a self-correcting technology, automatically corrects ring faults to maintain connectivity throughout most of the network. However, you should monitor your FDDI connections for wrapped rings and other problems with ring connectivity.

Understanding the Problem

As shown in Figure 9, in a thru FDDI LAN, no stations on the trunk ring have a Configuration State (SMTConfigurationState) of Wrap or Isolated. However, users who complain about network performance may have lost connectivity to other stations on the network because the FDDI network is wrapped or segmented.

Figure 9 Thru Ring

Wrapped ring

By monitoring the Peer Wrap Condition, you can see when the Configuration State changes. In a wrapped ring (Figure 10), two stations on the LAN are in a wrapped Configuration State. This condition may or may not affect the connectivity of certain stations. Although operational, your network may have a cabling problem or a problem with a link.

Figure 10 Wrapped LAN

Segmented ring

In a segmented ring (Figure 11), more than two stations are wrapped on the trunk ring. Although this mode of operation is a valid FDDI LAN configuration, your LAN is probably experiencing a degraded or degrading condition.

Figure 11 Segmented Ring

When a network connection has excessively high link errors, Station Management (SMT) shuts down the connection and tries to reconnect. A dual-attachment trunk ring station with an A or B connection that is shut down is one of the wrap points in the network. See Making Your FDDI Connections More Resilient for information about keeping a dual-attachment station connection from wrapping.

Isolated station

Sometimes a network wraps a particular station out of the ring. Stations on either side of a problem station can be wrapped. This effectively isolates the station or links that have problems, as shown in Figure 12.

Figure 12 Wrapped Ring with Isolated Station

If a ring was already wrapped when a network wraps a station out of the ring, then a segmented ring results, as shown in Figure 13.

Figure 13 Segmented Ring with Isolated Stations

Twisted ring

In a twisted ring, an A port is connected to an A port and a B port is connected to a B port, instead of the normal A-to-B connections. A twisted ring, which always has two twist points (stations), can exist in either a Thru or Wrap state. You can monitor the Twisted Ring Condition and Undesired Connection Attempt Event for evidence of twisted ring and other connection problems.

Identifying the Problem

To identify the problem, follow this process:

1. At the FDDI LAN level, verify that your network is operating.

If the network is operating, the FDDI ring may be segmented, and therefore an FDDI station or an Ethernet station on an Ethernet link may have lost connectivity to other nodes on the network.

2. Determine if a ring is in a Thru, Wrap, or Segmented state.

If the FDDI ring is segmented or wrapped, look for a problem with a link somewhere in the network or for a nonfunctioning node on your trunk ring. If the ring is operating and is not segmented, or if it is segmented but you still have connectivity to the stations in question, move to a more specific level in your network.

See Monitoring FDDI Connections for more information.

3. Determine if the poorly performing station is an Ethernet or FDDI station.

If the problem is an FDDI station, find out if it is congested (that is, if the station is so busy that it cannot accept all the network traffic that is directed to it) by determining its Bandwidth Utilization. Also determine if the station has a high frame error rate by looking at the FDDI Ring Errors.

If the problem is an Ethernet station, look for congestion by examining Ethernet Packet Loss and Bandwidth Utilization.
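Step 2 of the process above can be sketched as a small function that takes the Configuration State reported by each trunk ring station and applies the definitions from Understanding the Problem: no station in Wrap or Isolated is a thru ring, two wrapped stations is a wrapped ring, and more than two is a segmented ring. How you collect the states (for example, by reading SMTConfigurationState from each device) is outside this sketch, and the function name, state strings, and "Indeterminate" result are assumptions.

```python
def ring_state(station_states):
    """Classify an FDDI trunk ring from per-station Configuration States.

    station_states maps a station name to "Thru", "Wrap", or "Isolated".
    """
    wrapped = [name for name, state in station_states.items() if state == "Wrap"]
    isolated = [name for name, state in station_states.items() if state == "Isolated"]

    if not wrapped and not isolated:
        return "Thru"
    if len(wrapped) == 2:
        return "Wrapped"
    if len(wrapped) > 2:
        return "Segmented"
    # For example, only one station reports Wrap, or a station is
    # Isolated while no station reports Wrap; gather more data.
    return "Indeterminate"
```

In the wrapped case, the fault is expected to lie between the two stations that report Wrap, so those two names are the natural place to start looking.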

Solving the Problem

Identify the station that is causing the disconnection and take the appropriate steps:

• If the disconnection is caused by a wrapped ring, then fix the hardware or cabling problem at that station.

• If the station is congested, you have a device problem rather than a network problem. For example, if the congested station is a file server and every other machine on the network is retrieving and saving files using that server, consider upgrading your server or adding additional servers to the network. A variety of devices from different vendors may be communicating on an FDDI or Ethernet network; some are faster and more capable, and some are slower and more prone to congestion.

• If the station is an Ethernet station that is attached to an Ethernet segment, reevaluate the setup of your Ethernet network and make some changes to improve its performance.

You can also make FDDI connections more resilient by implementing dual homing or installing an Optical Bypass Unit (OBU) where FDDI connections are prone to fail. See Making Your FDDI Connections More Resilient for more information.

Monitoring FDDI Connections

Monitor your FDDI devices for Warning or Critical alerts in the FDDI Status tool.

Status Watch

Use Status Watch to identify these FDDI connectivity errors:

• Peer Wrap Condition
• Twisted Ring Condition
• Undesired Connection Attempt Event

Follow these steps:

1. In the Device area, select the device that is located where you suspect an FDDI ring connectivity problem.

2. Monitor the FDDI Status tool for the currently selected device.

Here are some pointers for monitoring:

• If the Peer Wrap Configuration State variable is Isolated, the device is not connected to the FDDI trunk ring. If you intend the device to remain isolated, this indication is not a serious condition. However, if the device is supposed to be connected on a trunk ring, a serious problem may exist. The device is no longer transmitting packets to the larger trunk ring.

• If the Peer Wrap flag (SMTPeerWrapFlag) is set, the device is one of the wrap points. The cause of the wrapped ring is somewhere in the portion of the network between the two stations that report the peer wrap condition.

Making Your FDDI Connections More Resilient

When devices are removed from an FDDI ring, there is a break in the fiber path that causes the ring to wrap until the ring is made whole again. To prevent the break in the FDDI connection, you can implement dual homing or install an Optical Bypass Unit (OBU).

Implementing Dual Homing

When the operation of a dual attachment node is critical to your network, dual homing adds reliability by providing a backup connection if the primary link fails. Because a dual attachment station (DAS) has two attachments to the FDDI ring (A-to-M and B-to-M), you can use one of them as a standby link if the active link fails. Using dual homing, only one of the two attachments is active at a time. In this sense, a DAS acts as if it is a single attachment station (SAS) by using its A port as the standby link.

Through SMT, a DAS can be dual homed to the same concentrator or, more commonly, to two concentrators. This arrangement provides a more stable trunk ring of concentrators. If one concentrator fails, the DAS enables the standby link to another concentrator to become the active link. See Figure 14.

If the station is a dual path or dual path/dual MAC station, you can configure the dual-homed station in one of two ways:

• With both links active
• With one link active and one connection withheld as a backup, only becoming active when one link fails

Figure 14 Dual Homing Configuration

Installing an Optical Bypass Unit

You can insert an Optical Bypass Unit (OBU) into the FDDI ring as if it were a node and then plug your device into it. To use an OBU, your device needs an optical bypass interface. This interface lets the bypass know whether your device is still on the ring or not. See Figure 15.

If your device is removed or if it fails, the bypass unit diverts the optical path away from your device, keeping the ring whole. You can use a bypass on devices that are prone to failure or are likely to be removed often, such as diagnostic equipment.

Figure 15 Optical Bypass Unit Configuration

FDDI Connectivity Reference

This section explains terms that are relevant to FDDI connectivity and provides additional conceptual and problem analysis detail.

Peer Wrap Condition

A Peer Wrap (wrapped ring) condition occurs when a dual-attachment station detects a fault (often a lost connection) and reconfigures the network by wrapping the dual trunk rings to form a single ring. Normally, the two stations that are adjacent to the fault wrap to maintain full connectivity. However, if a second fault occurs before the first is repaired, the network partitions itself into two or more rings, and stations lose connectivity.

When a station reports a Peer Wrap condition, locate and repair the problem that caused the station to wrap the rings. Potential causes include:

• Faulty FDDI port hardware
• Faulty cables or connectors
• Unplugged connectors
• Powered-down stations

You can expect to find the cause of the problem between the two stations that report the Peer Wrap condition.

Twisted Ring Condition

A Twisted Ring condition occurs when certain undesirable connection types exist. See Table 7 for more information. Although similar to the Undesired Connection Attempt, the Twisted Ring condition provides specific Station Management (SMT) and port information for diagnosis.

Undesired Connection Attempt Event

An Undesired Connection Attempt event occurs when a port tries to connect to another port of a type that may result in an undesirable network topology. Whether the connection attempt is successful depends on the current setting of the station's connection policies.

Table 7 lists connections that the FDDI standard defines as undesirable. The managed devices may or may not permit these connections, depending on their FDDI station configurations.

Table 7 Undesirable Connection Types

Connection Type    Reason That the Connection Is Undesirable
A-A                Twisted primary and secondary rings
A-S                A wrapped ring
B-B                Twisted primary and secondary rings
B-S                A wrapped ring
S-A                A wrapped ring
S-B                A wrapped ring
M-M                A tree of rings topology (illegal connection)

Table 8 lists FDDI connections that create valid topologies.

Table 8 Valid Connection Types

Connection Type    Reason That the Connection Is Valid
A-B                A normal trunk peer connection
A-M                A tree connection with possible redundancy. In a single MAC node, Port B has precedence (by default) for connecting to a Port M.
B-A                A normal trunk ring peer connection
B-M                A tree connection with possible redundancy. In a single MAC node, Port B has precedence (by default) for connecting to a Port M.
S-S                A single ring of two slave stations
S-M                A normal tree connection
M-A                A tree connection that provides possible redundancy
M-B                A tree connection that provides possible redundancy
M-S                A normal tree connection
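Tables 7 and 8 together cover every pairing of the A, B, S, and M port types, so they can be encoded as a lookup for checking a planned cabling change. The sketch below copies the reasons from the tables in shortened form; the `classify_connection` name and its return format are assumptions made for illustration.

```python
# Connection types from Table 7 (undesirable) and Table 8 (valid).
UNDESIRABLE = {
    "A-A": "Twisted primary and secondary rings",
    "A-S": "A wrapped ring",
    "B-B": "Twisted primary and secondary rings",
    "B-S": "A wrapped ring",
    "S-A": "A wrapped ring",
    "S-B": "A wrapped ring",
    "M-M": "A tree of rings topology (illegal connection)",
}

VALID = {
    "A-B": "A normal trunk peer connection",
    "A-M": "A tree connection with possible redundancy",
    "B-A": "A normal trunk ring peer connection",
    "B-M": "A tree connection with possible redundancy",
    "S-S": "A single ring of two slave stations",
    "S-M": "A normal tree connection",
    "M-A": "A tree connection that provides possible redundancy",
    "M-B": "A tree connection that provides possible redundancy",
    "M-S": "A normal tree connection",
}


def classify_connection(local_port, remote_port):
    """Return ("valid" or "undesirable", reason) for two FDDI port types."""
    key = f"{local_port.upper()}-{remote_port.upper()}"
    if key in VALID:
        return "valid", VALID[key]
    if key in UNDESIRABLE:
        return "undesirable", UNDESIRABLE[key]
    raise ValueError(f"unknown FDDI port types: {key}")
```

Remember that an undesirable result does not mean that the connection will be refused; as noted above, that depends on the station's connection policies.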


Disabling the Offending Interface

Because broadcast storms can ultimately cause your whole network to become unavailable, take action immediately to disable the offending interface. You can enable the interface again after you have corrected the problem.

Address Tracker

Use Address Tracker to locate the interface that is causing the broadcast storm. Use Device View to disable the port.

Follow these steps:

1. In the Find Address window, enter the address of the interface that seems to be receiving the broadcast traffic.

You can copy the MAC or IP address from the Status Watch report and paste it into Address Tracker's Enter the Address You Want to Find field.

2. Click Find Now.

Search displays the device name.

3. Use Transcend Central to launch Device View and disable the port.

Disabling the port stops the broadcast storm before it interferes with all vital network traffic. You can re-enable this interface using Device View or the device's console later.

Correcting Spanning Tree Misconfigurations

Spanning Tree does not cause broadcast storms, but a loop in your Spanning Tree topology can create data that looks like a storm. A loop can occur in your topology if:

• Someone disables Spanning Tree on a port
• You set up your Spanning Tree configuration incorrectly

Device View

Use Device View to disable any Spanning Tree port that has a repeater attached to it and to correct Spanning Tree misconfigurations.

To correct Spanning Tree misconfigurations, use Device View to disable Spanning Tree Protocol (STP) for a port on a SuperStack® II Switch 1000, Switch 3000, Switch 3000 10/100, Switch 9000SX, Desktop Switch, LinkBuilder® FMS II Bridge/Management Module, or CoreBuilder 6000.

To disable the STP port state for a port on a SuperStack II switch:

1. Select a port and click the right mouse button.

2. From the shortcut menu, select Configure.

3. In the Port section, click the STP tab.

4. From the STP Port State list box, select Disabled.

5. Click Apply.

To disable the STP port state for a port on a LinkBuilder FMS II Bridge/Management Module:

1. Double-click the module.

2. From the shortcut menu, select Configure Bridge.

3. In the Port section, click the STP tab.

Broadcast Storms Reference

This section explains terms that are relevant to broadcast storms and provides additional conceptual and problem analysis detail.

Broadcast Packets

Broadcast packets, which are a normal part of network operation, are transmitted by a device to a broadcast address. For example, IP networks use broadcasts to resolve network addresses using Address Resolution Protocol (ARP). IPX networks use a large number of broadcast packets to operate most effectively.


Problems arise when broadcast packets endlessly propagate throughout the network, which increases the traffic volume on your network and the CPU time that each host spends processing and discarding unwanted broadcast packets.

Multicast Packets

Multicast packets, which are a normal part of network operation, are transmitted by a device to a multicast group address. Hosts that want to receive the packets indicate that they want to be members of the multicast group, and then multicast packets are distributed to that group. For example, multicast packets support the Spanning Tree Protocol. Multicast applications and underlying multicast protocols control multimedia traffic and shield hosts from processing unnecessary broadcast traffic. However, multicast traffic can also cause storms that saturate your network.

Duplicate Addresses

Use these sections to identify and correct problems caused by duplicate MAC and IP addresses:

• Duplicate Addresses Overview
• Finding Duplicate MAC Addresses
• Finding Duplicate IP Addresses

See Duplicate Addresses Reference for additional conceptual and problem analysis detail.

Networks sometimes generate duplicate MAC and IP addresses. Because duplicate addresses can cause problems with packet delivery, resolve them as soon as possible.

Understanding the Problem

Duplicate MAC addresses are caused by data link layer problems with Fiber Distributed Data Interface (FDDI) media and the passing of tokens on the FDDI ring. Duplicate IP addresses are caused by network layer problems. See these sections for more information about causes of duplicate addresses:

• Duplicate MAC Addresses
• Duplicate IP Addresses

Identifying the Problem

Identify duplicate MAC and IP addresses by following the instructions in these sections:

• Finding Duplicate MAC Addresses
• Finding Duplicate IP Addresses

Solving the Problem

Identify the cause of the duplicate address (such as user error or a hardware problem) and fix the problem if possible.

Finding Duplicate MAC Addresses

To find out if duplicate MAC addresses are occurring, monitor your network using Status Watch.

Status Watch

The Status Watch FDDI Status tool identifies duplicate FDDI MAC addresses, and Status Watch reports when two or more MACs on the same ring have the same MAC address (a Duplicate Address condition).

Follow these steps:

1. In the Status Watch Summary View window, determine if any FDDI Status conditions are reported. If there are, double-click the table cell value to display the Device List window.

Another approach is to examine only the devices that you know reside on your FDDI ring. In the Status Watch main window, red device icons indicate that a threshold has been exceeded.

2. Select a device.

• If you selected the device from the Device List window, the real-time report for that device appears in the Status Watch main window.

• If you selected the device from the main window, also select the FDDI Status tool to view the real-time report.

3. Determine if a Duplicate Address condition caused the FDDI Status tool to trigger a Critical or Warning status for that device.

In Status Watch, you can specify the status severity level to apply to a Duplicate Address condition.

Finding Duplicate IP Addresses


To find out if duplicate IP addresses are occurring, monitor your network using these applications:

• Address Tracker - To find duplicate IP addresses on 3Com devices and their attached networks

• LANsentry® Manager - To find duplicate IP addresses that are collected by probes gathering RMON2 SmartAgent® data from the Enterprise Communications Analysis Module (ECAM) downloaded on your network devices

Address Tracker

Use Address Tracker to determine when and where duplicate IP addresses occur.

Follow these steps:

1. From the Find Address menu, select Find Duplicate IP Addresses.

2. Click Find Now to start your search.

LANsentry Manager

Use the Duplicates table in LANsentry Manager to compile a list of all stations with duplicate IP addresses. This table is available only on probes that have downloaded RMON2 (ECAM) SmartAgent software.

Follow these steps:

1. From the LANsentry Manager Address Map menu, select Duplicates. Address Map data is displayed as a table.

2. To export the contents of the table, click Export to launch the Data Export dialog box.

Duplicate Addresses Reference

This section explains terms that are relevant to duplicate addresses and provides additional conceptual and problem analysis detail.


Duplicate MAC Addresses

Each device on your network has a unique MAC address. This address identifies a single device on the network, allowing packets to be delivered to correct destinations.

Packets are delivered to their destinations by means of MAC-address-to-IP-address translation that the Address Resolution Protocol (ARP) provides. Therefore, if MAC addresses are duplicated on the network, ARP caches of routing devices contain erroneous destinations. In FDDI, devices monitor network traffic, searching for their own MAC address in each packet to determine whether to decode the packet. If MAC addresses are not unique, two stations cannot be distinguished from each other.

Duplicate MAC addresses can occur for the following reasons:

• Someone has manually configured a MAC address for a device instead of using the address that the vendor supplied or allowing it to be assigned dynamically, and this address is also assigned to a different device.

• In rare circumstances, loops in a bridged network can cause a MAC hardware problem or an address learning problem that creates a duplicate MAC address entry in the bridging address table.

• On DECnet Phase 4 networks, MAC addresses are set from the DECnet address. A duplicate NET address can cause a duplicate MAC address.

Duplicate IP Addresses

Because IP addresses are critical for transmission of packets on TCP/IP networks, resolve duplicate IP addresses immediately.

Duplicate IP addresses can occur when someone has configured an IP address that is identical to an IP address that is assigned to a different device. Address assignments, although possible for you to configure manually, are usually made using one of these protocols:

• Dynamic Host Configuration Protocol (DHCP) - Allows your network to dynamically assign IP addresses to nodes. With this protocol, a DHCP server temporarily assigns an IP address to a node, or you can statically configure addresses as needed.

• BOOTstrap protocol (BootP) - Allows you to statically assign IP addresses to nodes. This protocol is more efficient than RARP.

• Reverse ARP (RARP) - Allows you to statically assign IP addresses to nodes. However, because this protocol relies on the MAC address to identify the node, you cannot use it on networks that dynamically assign hardware addresses.
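
If you have no management application at hand, the same check can be approximated from any list of IP-to-MAC pairs, such as rows copied from an ARP cache. The following is a minimal sketch only; the function name and the input format are assumptions made for illustration and are not part of Address Tracker or LANsentry Manager.

```python
from collections import defaultdict

def find_duplicate_ips(entries):
    """Return {ip: sorted MACs} for every IP claimed by more than one MAC.

    `entries` is an iterable of (ip, mac) pairs (hypothetical input format).
    """
    macs_by_ip = defaultdict(set)
    for ip, mac in entries:
        # Normalize case so the same adapter is not counted twice.
        macs_by_ip[ip].add(mac.lower())
    return {ip: sorted(macs) for ip, macs in macs_by_ip.items() if len(macs) > 1}

# Made-up sample data: two adapters claim 10.0.0.5.
sample = [
    ("10.0.0.5", "00:A0:24:11:22:33"),
    ("10.0.0.5", "00:a0:24:44:55:66"),
    ("10.0.0.9", "00:a0:24:77:88:99"),
]
print(find_duplicate_ips(sample))
```

Swapping the pair order (MAC first, IP second) turns the same sketch into a check for one MAC address that appears under several IP addresses.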


Ethernet Packet Loss

Use these sections to identify and correct Ethernet packet loss:

• Ethernet Packet Loss Overview
• Searching for Packet Loss

See Ethernet Packet Loss Reference for additional conceptual and problem analysis detail.

If your Ethernet network shows signs of congestion, it may be experiencing packet loss. When your network is congested, utilization is usually high, packets are discarded because buffers are full, and collision rates are up. Problems related to Collisions are often at the heart of packet loss.

Understanding the Problem

Collisions are normal in Ethernet networks. In many cases, Collision rates of 50 percent do not cause a large decrease in throughput. The Collision rate helps mark the upper limit on your network (the maximum percentage of collisions that your network can bear), which is usually around 70 percent. If Collisions increase above this upper limit, your network can become unreliable.

When the Collision rates increase, so do Excessive Collisions, which causes a delay in transmitting data. An increase in Collisions also indicates that network utilization and network errors, such as FCS Errors, are probably increasing.

The real packet problems to watch for, however, are undetected collisions that show up as Late Collisions.
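
The two figures quoted above (50 percent and roughly 70 percent) can be turned into a simple rule of thumb. This sketch assumes that the Collision rate is computed as collisions per 100 transmitted packets, which the text does not state, and the "watch" band between 50 and 70 percent is an illustrative choice rather than a documented threshold.

```python
def collision_rate_percent(collisions, packets):
    """Collisions per 100 packets (assumed definition of the Collision rate)."""
    if packets <= 0:
        return 0.0
    return 100.0 * collisions / packets

def assess_collisions(rate_percent, upper_limit=70.0):
    """Map a Collision rate to a coarse status using the figures in the text."""
    if rate_percent > upper_limit:
        return "unreliable"   # above the upper limit the network can bear
    if rate_percent >= 50.0:
        return "watch"        # often tolerable, but approaching the limit
    return "normal"

print(assess_collisions(collision_rate_percent(120, 1000)))
```

Set `upper_limit` to the value that you have measured for your own network, because the text describes 70 percent only as a usual figure.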

Identifying the Problem

To identify that your network's problem is related to packet loss, verify that frames are being dropped on your network by examining this packet loss data:

• Alignment Errors
• Collisions
• Excessive Collisions
• FCS Errors
• CRC Errors
• Late Collisions
• Receive Discards
• Too Long Errors
• Too Short Errors
• Transmit Discards

The process of identifying the problem is discussed in Searching for Packet Loss.

Solving the Problem

If you notice that packet loss data is consistently high, then your network is too congested. In this case, segment your network with the appropriate network device (such as a switch or router). If Collision data shows increases but your network's utilization is the same, then your network may have a physical problem, such as cabling that is too long. Other problems that packet loss data can indicate include:

• Faulty connectors or improper cabling
• Excessive numbers of repeaters between network devices
• Defective Ethernet transceivers or controllers

Possible solutions to these problems are explained in the procedures in Searching for Packet Loss.

Searching for Packet Loss

When you look for packet loss, use the following applications:

• Status Watch - For Ethernet and MIB-II data collection using SNMP polling
• LANsentry Manager Network Statistics Graph - For RMON data collection using an RMON probe
• Device View - On a per-device basis, you can evaluate statistics for any port on the device

Status Watch

Status Watch monitors:

• Alignment Errors
• Excessive Collisions
• FCS Errors
• Receive Discards
• Transmit Discards

Follow these steps:

1. Determine if the thresholds for the Alignment Errors tool and FCS Errors tool are being exceeded.


Table 16 identifies the problems that this data can indicate and your possible actions. For information about problems related to a nonstandard Ethernet implementation, see Nonstandard Ethernet Problems.

Table 16 Alignment Errors, FCS Errors, and CRC Errors Data

Possible Problem: Faulty cabling
Possible Action: Examine the cable and cable connections for breaks or damage.

Possible Problem: Network noise
Possible Action: Look for improper cabling, faulty cable, faulty network equipment, or cables that are too close to equipment that emits electromagnetic interference (lamps, for example).

Possible Problem: Faulty transceiver
Possible Action: Use an analyzer to identify the problematic transceiver. If necessary, replace the transceiver, network adapter, or station.

Possible Problem: Fault at the transmitting end station
Possible Action:
1. Locate the source of the errors by looking at the module and port statistics.
2. Verify the correct operation of the transceiver or adapter card of the device that is connected to the problem port.
3. If the card appears to be operating correctly, examine the cable and cable connections for breaks or damage.

Possible Problem: Station powering up or down
Possible Action: None required.
Early implementations of Ethernet transceivers generate a significant amount of in-band noise when powering up; they frequently cause Alignment Errors and FCS Errors in an otherwise stable network.
When powering up, some software drivers for Ethernet controllers also initiate Time Domain Reflectometry (TDR) tests to test the Ethernet media. Network monitors report TDR tests as Alignment Errors and FCS Errors.

Possible Problem: Faulty adapter
Possible Action: Replace the adapter.

2. Determine if the Excessive Collisions tool threshold is being exceeded. Table 17 identifies the problems that this data can indicate and your possible actions.

Table 17 Collisions and Excessive Collisions Data

Possible Problem: Busy network
Possible Action: Use a bridge, router, or switch to reconfigure your network into segments with fewer stations.

Possible Problem: Faulty device (adapter, switch, hub, and the like) that does not listen before broadcasting. This problem increases the incidence of all types of collisions.
Possible Action: Isolate each adapter to see if the problem stops.

Possible Problem: Network loop
Possible Action: Ensure that no redundant connections to the same station have both connections active simultaneously.


3. Determine if the thresholds for the Receive Discards and Transmit Discards tools are being exceeded.

If these errors are high in conjunction with the data that you learned in steps 1 and 2, then your network is overloaded. Segment your network.
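
The three Status Watch steps can be sketched as a small decision routine. The function below is illustrative only: it takes the set of tool names whose thresholds are currently exceeded (a hypothetical input that you would fill in by hand from Status Watch), and it reads "in conjunction with" as requiring that both step 1 and step 2 also reported problems.

```python
def triage_packet_loss(exceeded):
    """Return follow-up actions for the Status Watch tools in `exceeded`."""
    step1 = bool({"Alignment Errors", "FCS Errors"} & exceeded)
    step2 = "Excessive Collisions" in exceeded
    step3 = bool({"Receive Discards", "Transmit Discards"} & exceeded)

    findings = []
    if step1:
        findings.append("Table 16: check cabling, noise, transceivers, and adapters")
    if step2:
        findings.append("Table 17: busy network, faulty device, or network loop")
    if step1 and step2 and step3:
        # Assumed reading: discards plus both earlier symptoms mean overload.
        findings.append("Network is overloaded: segment the network")
    return findings

for line in triage_packet_loss({"FCS Errors", "Excessive Collisions", "Receive Discards"}):
    print(line)
```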

LANsentry Manager Network Statistics Graph

Use the LANsentry Manager Network Statistics graph to view data for:

• Collisions
• Late Collisions
• Bandwidth Utilization
• CRC Errors
• Too Long Errors
• Too Short Errors

Follow these steps:

1. Display a Network Statistics graph for the local Ethernet segment on which users have reported poor performance.

This graph shows the most recent trend in Collision rates. If you have set up a History sample, you can also look at the historical trend. If a number of segments are connected by repeaters, examine the graph for each Ethernet segment.

2. Analyze Utilization and Collision rates to determine whether collisions are caused by an overloaded segment or a faulty component.

• If Utilization rates are high - The collisions are probably caused by an overloaded segment. If you have added nodes or new applications to your network, consider reconfiguring the cabling system using bridges and routers to filter out remote collisions and to keep local traffic on one segment. This action should level the network load.

• If Utilization rates are stable and appear normal - The collisions are probably caused by faulty components. In this case, do the following:

   • If the network consists of repeaters - Compare the Network Statistics graphs for each segment connected to the repeater. Because repeaters repeat traffic across all connected segments (which makes many segments seem like one network), you should see similar levels of traffic on all segments. One segment that shows dissimilar levels of traffic and collisions may indicate faulty hardware. In this case, monitor several collisions to track the source station that is transmitting too soon after collisions, and repair the station. Packets that are transmitted too soon after collisions are unlikely to be valid. See Table 17 for more information about Collisions.

   • On other networks - Determine the segment cable length.
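
The comparison of repeater-connected segments can be sketched as an outlier check. This is a rough illustration only: the per-segment collision counts are assumed to be read by hand from the Network Statistics graphs, and both the use of the median and the 50 percent tolerance are arbitrary choices, not values from LANsentry Manager.

```python
from statistics import median

def dissimilar_segments(collisions_by_segment, tolerance=0.5):
    """Return segments whose collision count deviates from the median
    by more than `tolerance` (expressed as a fraction of the median)."""
    if not collisions_by_segment:
        return []
    mid = median(collisions_by_segment.values())
    if mid == 0:
        # Every non-zero segment stands out when the typical level is zero.
        return sorted(s for s, c in collisions_by_segment.items() if c > 0)
    return sorted(
        s for s, c in collisions_by_segment.items()
        if abs(c - mid) / mid > tolerance
    )

# Made-up counts for three segments on one repeater.
print(dissimilar_segments({"A": 100, "B": 110, "C": 400}))  # ['C']
```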

3. Examine the CRC Errors and Late Collisions, which often indicate cabling or component problems.

Table 16 identifies the problems that CRC Errors can indicate and your possible actions. Table 18 identifies the problems that Late Collisions data can indicate and your possible actions.

Table 18 Late Collisions Data

Possible Problem: Cabling problems
• Segment too long
• Failing cable
• Segment not grounded properly (noise)
• Improper termination
• Taps too close (10BASE-5 and 10BASE-2 only)
• Noisy cable
Possible Action: Correct the cabling problem by doing one or more of the following:
• Reduce the segment length
• Replace the cable
• Ground the cable
• Terminate the cable correctly
• Check the taps
• Check for cables too close to equipment that emits electromagnetic interference

Possible Problem: Component problems
• Deaf or partially deaf node
• Failing repeater, transceiver, or controller cards
Possible Action: Correct the component problem by doing one of the following:
• Trace the failing component and replace it
• Replace the NIC or the transceiver

4. Trace Too Short Errors and Too Long Errors to the sender.

These errors often indicate faulty routers or LAN drivers and transceiver problems. Table 19 identifies the problems that this data can indicate and your possible actions.

Possible Problem: A transceiver on your network is adding bits to the packets that are transmitted by the attached station.
Possible Action:
1. Use a network analyzer to identify the problematic transceiver.
2. If necessary, replace the transceiver, network adapter, or station.

Possible Problem: The jabber protection mechanism on a transceiver has failed; it can no longer protect the network from the jabbering produced by the attached station.
Possible Action: Replace the network card.

Possible Problem: Excessive noise on the cable.
Note: Some 10/100 Mbps cards that autodetect the network speed may connect to the network at the wrong speed, causing excessive noise.
Possible Action: Check for improper cabling, faulty cable, faulty network equipment, or cables too close to noisy electronic equipment (lamps, for example).
If your network card autodetects the network speed and you have ruled out other problems, manually configure the network speed.

Possible Problem: Faulty routers (two different network types are connected and the router is not enforcing proper frame size restrictions)
Possible Action: Notify the manufacturer.

Possible Problem: Faulty LAN driver
Possible Action: Replace the driver.

Possible Problem: A normal condition on a LinkSwitch® 1000, LinkSwitch® 3000, or CoreBuilder 5000 FastModule
Possible Action: If you use maximum-sized 1518-byte Ethernet frames, the device's VLT-enabled ports add a frame tag of 4 bytes, resulting in a misleading Too Long Frame error.
These frames are passed successfully but create the Too Long Frame error message.
If you want to eliminate the error message, reduce your Ethernet packet frames by 4 bytes.

Device View

Device View allows you to display a variety of port and device-level statistics relevant to Ethernet packet loss. Table 20 describes these statistics and their use in troubleshooting.

Table 20 Activity and Error Statistics in Device View

Statistics Group: Activity
Description: Displays the total network activity and errors on the selected port.
Use in Troubleshooting: This data shows readable packets, broadcast packets, Collisions, total errors, and runts (which cause Too Short Errors). You can interpret this data in the following ways:
• Runts are often caused by Collisions; however, if the values increase at specific times of the day, you may need to change the network topology to manage the traffic more efficiently (for example, with switches or routers).
• Runts can also be caused by a badly terminated coax cable.
• Large numbers of runts not associated with high levels of collisions can indicate a transmission problem (examine the cable).
• Particularly high numbers of Collisions compared to the total number of readable packets can point to a hardware problem (a bad adapter) or to a data loop.
• A high proportion of Broadcast Packets (>10%) on a heavily utilized network (>50% of available bandwidth) can point to an incorrectly configured bridge or router on the network.

Statistics Group: Errors
Description: Displays the number of frames with errors on the selected port.
Use in Troubleshooting: The significance of errors depends on accompanying errors and prevailing network conditions. See the following error data for more information:
• Alignment Errors: Table 16
• FCS Errors: Table 16
• Too Long Errors: Table 19
• Too Short Errors or runts: Table 19
• Late Collisions: Table 18
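
The broadcast rule in the Activity row (more than 10 percent broadcast packets on a network that uses more than 50 percent of available bandwidth) lends itself to a quick check. The sketch below assumes that you read the packet counters and the utilization figure from Device View by hand; the function and parameter names are hypothetical.

```python
def broadcast_warning(broadcast_packets, total_packets, utilization_percent):
    """True when broadcasts exceed 10% of packets on a network above 50% utilization."""
    if total_packets <= 0:
        return False
    broadcast_share = 100.0 * broadcast_packets / total_packets
    return broadcast_share > 10.0 and utilization_percent > 50.0

# Made-up counters: 15% broadcasts at 62% utilization.
print(broadcast_warning(1500, 10000, 62.0))  # True
```

A True result points toward an incorrectly configured bridge or router, as the table describes; it does not by itself prove a broadcast storm.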

To display Activity and Errors statistics for a device or port, follow these steps:

1. Select the required port or device.

2. From the shortcut menu, select Activity or Errors.

The statistics available depend on the type of port or device selected. See Table 20 for troubleshooting information.

Ethernet Packet Loss Reference

This section explains terms that are relevant to Ethernet packet loss and provides additional conceptual and problem analysis detail.

Alignment Errors

An Alignment Error indicates a received frame in which both of the following are true:

• The number of bits received is an uneven byte count (that is, not an integral multiple of 8).

• The frame has a Frame Check Sequence (FCS) error.

Alignment Errors often result from MAC layer packet formation problems, cabling problems that cause corrupted or lost data, and packets that pass through more than two cascaded multiport transceivers. See FCS Errors for more information about interpreting Alignment Errors.


Collisions

Collisions indicate that two devices detect that the network is idle and try to send packets at exactly the same time (within one round-trip delay). Because only one device can transmit at a time, both devices must stop sending and attempt to retransmit. Collisions are detected by the transmitting stations.

The retransmission algorithm helps to ensure that the packets do not retransmit at the same time. However, if the two devices retry at nearly the same time, packets can collide again; the process repeats until either the packets finally pass onto the network without collisions or 16 consecutive collisions occur and the packets are discarded.
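
The text does not name the retransmission algorithm. Standard Ethernet uses truncated binary exponential backoff, and the following minimal sketch shows how the random wait grows with each consecutive collision. The constant and function names are chosen for illustration, and the slot time shown applies to 10 Mbps Ethernet.

```python
import random

SLOT_TIME_US = 51.2   # 512 bit times at 10 Mbps
MAX_ATTEMPTS = 16     # the packet is discarded after 16 consecutive collisions
BACKOFF_LIMIT = 10    # the random range stops doubling after 10 collisions

def backoff_slots(collision_count, rng=random):
    """Slot times to wait after the nth consecutive collision (1-based).

    Returns None when the attempt limit is reached and the packet is dropped.
    """
    if collision_count >= MAX_ATTEMPTS:
        return None
    k = min(collision_count, BACKOFF_LIMIT)
    return rng.randrange(2 ** k)   # uniform in 0 .. 2**k - 1

def backoff_delay_us(collision_count, rng=random):
    """Same wait expressed in microseconds, or None if the packet is dropped."""
    slots = backoff_slots(collision_count, rng)
    return None if slots is None else slots * SLOT_TIME_US
```

Because each station draws its wait independently, two stations that collided once are unlikely to pick the same slot again, which is why repeated collisions usually signal a busy or faulty segment.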

CRC Errors

A Cyclic Redundancy Check (CRC) Error is an RMON statistic that combines FCS Errors and Alignment Errors. These errors indicate that packets were received with:

• A bad FCS and an integral number of octets (FCS Errors)
• A bad FCS and a non-integral number of octets (Alignment Errors)

CRC Errors can cause an end station to freeze. If a large number of CRC Errors are attributed to a single station on the network, replace the station's network interface board. Typically, a CRC Error rate of more than 1 percent of network traffic is considered excessive.

Excessive Collisions

Excessive Collisions indicate that 16 consecutive collisions have occurred, usually a sign that the network is becoming congested. For each excessive collision count (or after 16 consecutive collisions), a packet is dropped. If you know the normal rate of excessive collisions, then you can determine when the rate of packet loss is affecting your network's performance. See Knowing Your Network's Configuration for more information.

FCS Errors

Frame Check Sequence (FCS) Errors, a type of CRC Error, indicate that frames received by an interface are an integral number of octets long but do not pass the FCS check. The FCS is a mathematical way to ensure that all of the frame's bits are correct without having the system examine each bit and compare it to the original. Packets with Alignment Errors also generate FCS Errors.

Both Alignment Errors and FCS Errors can be caused by equipment powering up or down or by interference (noise) on unshielded twisted-pair (10BASE-T) segments. In a network that complies with the Ethernet standard, FCS or Alignment Errors indicate bit errors during a transmission or reception. A very low rate is acceptable. Although Ethernet allows a 1 in 10^8 bit error rate, typical Ethernet performance is 1 in 10^12 or better.
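
To make the FCS check concrete, the following sketch appends a CRC-32 checksum to a payload and shows that a single flipped bit makes the check fail. It uses Python's zlib.crc32, which implements the same CRC-32 polynomial that Ethernet uses; the helper names and the little-endian byte order are assumptions made for this illustration and do not model bit ordering on the wire.

```python
import zlib

def append_fcs(payload: bytes) -> bytes:
    """Return the payload followed by its 4-byte CRC-32 checksum."""
    return payload + zlib.crc32(payload).to_bytes(4, "little")

def fcs_ok(frame: bytes) -> bool:
    """Recompute the checksum and compare it with the trailing 4 bytes."""
    if len(frame) < 4:
        return False
    payload, fcs = frame[:-4], frame[-4:]
    return zlib.crc32(payload).to_bytes(4, "little") == fcs

frame = append_fcs(b"example payload")
corrupted = bytes([frame[0] ^ 0x01]) + frame[1:]   # flip one bit
print(fcs_ok(frame), fcs_ok(corrupted))            # True False
```

The receiver never sees the original data; it only recomputes the checksum over what arrived, which is why a mismatch reveals corruption without a bit-by-bit comparison.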

Late Collisions

Late Collisions indicate that two devices have transmitted at the same time, but cabling errors (most commonly, excessive network segment length or repeaters between devices) prevent either transmitting device from detecting a collision. Neither device detects a collision because the time to propagate the signal from one end of the network to the other is longer than the time to put the entire packet on the network. As a result, neither of the devices that cause the late collision senses the other's transmission until the entire packet is on the network.

Although late collisions occur for small packets, the transmitter cannot detect them. As a result, a network suffering measurable Late Collisions for large packets is losing small packets as well.
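
The timing argument can be checked with simple arithmetic. A minimum-sized frame of 64 octets is 512 bits, which takes 51.2 µs to transmit at 10 Mbps. If the round-trip delay of the segment exceeds that time, a station can finish sending a small packet before the collision reaches it. The sketch below assumes that you already have an estimate of the round-trip delay; the function names are illustrative.

```python
MIN_FRAME_BITS = 64 * 8   # minimum Ethernet frame: 64 octets including FCS

def min_frame_time_us(bitrate_mbps=10.0):
    """Time to put a minimum-sized frame on the wire, in microseconds."""
    return MIN_FRAME_BITS / bitrate_mbps   # bits / (Mbit/s) gives microseconds

def late_collisions_possible(round_trip_delay_us, bitrate_mbps=10.0):
    """True if a collision can arrive after a minimum-sized frame is fully sent."""
    return round_trip_delay_us > min_frame_time_us(bitrate_mbps)

print(min_frame_time_us())             # 51.2
print(late_collisions_possible(60.0))  # True: the segment is too long
```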

Nonstandard Ethernet Problems

Table 21 lists the symptoms that typically occur if a system violates the Ethernet standard.

Table 21 Symptoms of Common Ethernet Network Problems

Symptoms: FCS Errors and Alignment Errors increase significantly.
Problem: Network cabling is too long.
Notes: If you use a promiscuous network monitor, the number of Late Collisions reported by stations should correlate with the FCS and Alignment Errors reported by the monitor.

Symptoms: FCS and Alignment Errors increase proportionally with interference (sometimes referred to as noise hits).
Problem: Network segment is noisy.
Notes: Typically observed on a 10BASE-T network segment in a noisy environment. If you use multiple promiscuous monitors, the FCS and Alignment Errors among the monitors will not correlate.
If the monitor can track runts, also called Too Short Errors, the number of runt packets should be significantly higher than normal.

Symptoms: FCS and Alignment Errors are much higher than normal.
Problem: Networks do not conform to the access scheme of Carrier Sense Multiple Access with Collision Detect (CSMA/CD).
Notes: Occurs when some implementations of Ethernet in the segment are not entirely compatible with IEEE 802.3 repeaters.
Collision fragments linger on the network long enough to collide with retry packets at the minimum interpacket gap (IPG). The IPG is smaller on one side of the repeated network, causing a lost packet.
Ethernet controllers cannot receive packets that are separated by 4.7 µs or less. Some controllers cannot sustain receptions of packets separated by as much as 9.6 µs. If runt packets are received one after another and are followed by a collision fragment, Ethernet controllers that cannot sustain reception will lose packets.

Receive Discards

Receive Discards indicate that received packets could not be delivered to a high-layer protocol because of congestion or packet errors.

Too Long Errors

A Too Long Error indicates that a packet is longer than 1518 octets (including FCS octets) but otherwise well formed. Too Long Errors are often caused by a bad transceiver, a malfunction of the jabber protection mechanism on a transceiver, or excessive noise on the cable.

Too Short Errors

A Too Short Error, also called a runt, indicates that a packet is fewer than 64 octets long (including FCS octets) but otherwise well formed.
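
The definitions in this reference section can be combined into one classification routine. This is a minimal sketch under stated assumptions: the received bit count and the outcome of the FCS check are supplied by the caller, "otherwise well formed" is read as "passes the FCS check", and the function name is hypothetical.

```python
def classify_frame(bit_count, fcs_ok):
    """Classify a received frame by the error categories defined in this section."""
    octets, extra_bits = divmod(bit_count, 8)
    if not fcs_ok:
        # A bad FCS with a non-integral octet count is an Alignment Error;
        # with an integral octet count it is an FCS Error.
        return "Alignment Error" if extra_bits else "FCS Error"
    if octets > 1518:
        return "Too Long Error"
    if octets < 64:
        return "Too Short Error"
    return "OK"

print(classify_frame(1522 * 8, True))    # Too Long Error (1518 octets plus a 4-byte tag)
print(classify_frame(100 * 8 + 3, False))  # Alignment Error
```

The RMON CRC Errors statistic corresponds to the first branch as a whole, because it counts FCS Errors and Alignment Errors together.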

Transmit Discards

Transmit Discards indicate that packets were not transmitted because of network congestion.

FDDI Ring Errors

Use these sections to identify and correct FDDI ring errors:

• FDDI Ring Errors Overview
• Identifying Ring Errors

Fiber Distributed Data Interface (FDDI) often corrects its own problems. However, because FDDI cannot correct all errors (especially those related to hardware problems), you should monitor FDDI errors.

Understanding the Problem

FDDI ring errors that you should monitor include:

• Elasticity Buffer Error Condition
• Frame Error Condition
• Frames Not Copied Condition
• Link Error Condition

Identifying the Problem

First, determine the type of FDDI ring errors and where they are occurring. Similar to the way you identify other FDDI problems, identify the upstream and downstream neighbors of the devices that you are monitoring.

Several types of network errors can cause FDDI performance problems. For example, problems with cables or physical connections may result in a link or frame error. Elasticity buffer (EB) errors can also lead to link and frame errors.

FDDI deals with port-related errors as follows:

• The variable PORTlerAlarm is the link error rate (LER) value at which a link connection generates an alarm. When the LER is greater than the alarm setting, Station Management (SMT) sends a Status Report Frame (SRF) to notify you that there is a problem with a port.

The PORTlerAlarm threshold is set lower than the PORTlerCutoff threshold so that you are notified of a problem before the port is actually removed from the ring.

• When link errors reach the threshold defined by the variable PORTlerCutoff, SMT breaks the connection, disabling the PHY that detected the problem. A Link Error Condition is also generated.
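
The relationship between the two port thresholds can be sketched as follows. This is an illustration of the logic described above, not of how SMT stores or encodes these variables: the link error rate and both thresholds are passed in as plain error rates, and the sample values are hypothetical.

```python
def port_action(ler, ler_alarm, ler_cutoff):
    """Return "ok", "alarm", or "cutoff" for a measured link error rate.

    The alarm threshold must be lower than the cutoff threshold so that
    a warning is raised before the port is removed from the ring.
    """
    if ler_alarm >= ler_cutoff:
        raise ValueError("alarm threshold must be lower than cutoff threshold")
    if ler >= ler_cutoff:
        return "cutoff"   # SMT breaks the connection; Link Error Condition
    if ler > ler_alarm:
        return "alarm"    # SMT sends a Status Report Frame
    return "ok"

# Hypothetical thresholds: alarm at 1e-8, cutoff at 1e-7.
print(port_action(5e-8, ler_alarm=1e-8, ler_cutoff=1e-7))  # alarm
```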

FDDI deals with MAC-related errors as follows:

• When MAC frame errors reach a certain threshold, a Frame Error Condition is generated. Because the actual error can be further upstream than the immediate connection, the connection remains intact.

• For a large network, the worst case MACFrameErrorRatio is less than 0.1 percent. However, during network configuration, frame error ratios can reach 50 percent for short periods. When you detect a sustained frame error ratio of more than 0.1 percent, a problem exists between the station that is reporting the condition and the nearest upstream MAC.
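
Because short spikes during network configuration are expected, only a sustained ratio above 0.1 percent should be treated as a fault. One way to express "sustained" is to require several consecutive samples above the limit, as in this sketch. The sample count of three is an arbitrary assumption, and the ratios are assumed to be percentages that you have already collected.

```python
def sustained_frame_errors(ratios_percent, limit=0.1, min_samples=3):
    """True if the most recent `min_samples` frame error ratios all exceed `limit`."""
    recent = ratios_percent[-min_samples:]
    return len(recent) == min_samples and all(r > limit for r in recent)

# A brief 50 percent spike during reconfiguration, followed by recovery.
print(sustained_frame_errors([50.0, 0.05, 0.02, 0.01]))  # False
# The ratio stays above 0.1 percent for three samples in a row.
print(sustained_frame_errors([0.05, 0.2, 0.3, 0.4]))      # True
```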

Solving the Problem

To solve problems related to FDDI errors, fix the hardware, cabling, or congestion problem.

Identifying Ring Errors


Use Status Watch to monitor your FDDI devices for Warning or Critical alerts.

Status Watch

Use Status Watch to identify FDDI ring errors.

Follow these steps:

1. Monitor the FDDI Status tool for the currently selected device.

2. Determine whether Status Watch is reporting Elasticity Buffer Errors or a high percentage of Frame Errors, Frames Not Copied, or Link Error Rates for the currently selected device.

FDDI Ring Errors Reference

This section provides additional conceptual and problem analysis detail.

Elasticity Buffer Error Condition

The Elasticity Buffer Error condition occurs when a port's elasticity buffer overflows or underflows. This condition usually indicates that a port's hardware is not operating within the tolerances that the FDDI standard specifies. Look for the problem in the hardware of either the port that is reporting the condition or of the immediately adjacent port.

Frame Error Condition

The Frame Error condition occurs when the percentage of frames that contain errors exceeds a preset threshold. When a device is an uplink to FDDI (that is, a device is transmitting onto FDDI), this type of condition indicates that the ring is saturated. The ring is out of buffer space, and packets are being dropped from the device's backbone port.

The problem indicated by the frame errors is usually located between the MAC that reports the condition and its upstream neighbor. Because many physical connections can lie along this path, the MACFrameErrorRatio variable identifies only the two MACs between which the problem is occurring.

Frames Not Copied Condition

The Frames Not Copied condition occurs when the percentage of frames that are dropped because of insufficient buffer space exceeds a preset threshold. This condition indicates that the station is congested and is unable to process frames as quickly as they arrive. To help eliminate congestion:

• Add more capacity to the station
• Reconfigure your network so that end stations that communicate heavily with one another are on the same bridge or switch
• Filter out certain traffic

Link Error Condition

The Link Error condition occurs when a port detects link errors at a rate that exceeds a preset threshold. When the Link Error threshold is exceeded, the station removes itself from the ring and tries to reinsert itself on the ring. This action creates a MAC Neighbor Change Event (which also occurs if a ring wraps).

Link errors may indicate an FDDI PHY hardware problem (such as a faulty transmitter) or a faulty cable or connector. Look for the problem in the portion of the network between the port that is reporting the condition and the first upstream transmitter.

MAC Neighbor Change Event

The MAC Neighbor Change event occurs when a MAC's upstream or downstream neighbor changes.

This event indicates either:

• A network reconfiguration
• Another station that is leaving or joining the ring

Network File Server Timeouts

Use these sections to identify and correct timeouts on network file servers:

• Network File Server Timeout Overview
• Looking for Obvious Errors
• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

See Network File Server Timeouts Reference for additional conceptual and problem analysis detail.

A network file server can time out if your network gets congested or if your server is having problems. Users might have problems downloading data from or to the server or copying files from or to the server. To help you to understand the troubleshooting process for this type of problem, an EXAMPLE throughout this section follows the symptoms, analysis, and resolution of a typical file server timeout problem.


Understanding the Problem

When users log in, their stations make network file server calls, either to determine quotas (if this feature has been enabled) or to mount user home directories. The network file server timeout messages, even when spread across multiple nodes, indicate a problem either with the network or with a server.

EXAMPLE: UNIX users notice that it takes a long time - over 30 seconds in some cases - to log in to any machine. Some machines report network file server timeout messages, but the messages have no obvious pattern and are infrequent. You begin to get a sense of the problem.

Identifying the Problem

First, rule out the obvious causes. Ask these questions:

• Can you access the network file server with Telnet?
• Have any alarms been triggered?
• Are there any new errors?

The process of identifying the problem is developed in Looking for Obvious Errors.

Solving the Problem

To determine the cause, reproduce the fault while you monitor the network. After you know the cause, you can fix the problem.

The solutions to the network file server timeout are identified in these sections:

• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

Looking for Obvious Errors

To look for obvious errors, use these applications:

• Ping and Telnet - To check connectivity to the network file server nodes
• LANsentry Manager Alarms View - To search for triggered alarms
• LANsentry Manager Statistics View - To look for errors
• LANsentry Manager History View - To identify trends


Ping and Telnet

Determine whether you can contact network file server nodes using Ping and Telnet. If the response is extremely slow, then a problem may exist with the connections to the nodes. No delay indicates that the connections are normal, implying that the delay is occurring elsewhere. In this case, use LANsentry® Manager tools to determine whether packets are being lost or ignored.
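As a rough illustration of the connectivity check described above, the following sketch times how long it takes to open a TCP connection to a node's Telnet port. It only measures connection setup and does not log in. The function name, the default port 23, and the timeout are assumptions made for this example.

```python
import socket
import time


def tcp_connect_time(host, port=23, timeout=5.0):
    """Return the seconds taken to open a TCP connection, or None on failure.

    Port 23 is the conventional Telnet port; pass another port if the
    node listens elsewhere.
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        # Refused, unreachable, or timed out: treat all of these as "no contact".
        return None
```

A result of a few milliseconds on a local network suggests that the path to the node is normal. A value close to the timeout, or `None`, suggests that the connection itself is worth investigating.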

LANsentry Manager Alarms View

Using the LANsentry Manager Alarms View, you can determine if any configured alarms have been triggered.

Search the Alarms View to see if any MAC events have been logged.

EXAMPLE: MAC events have not been logged for the network on which the UNIX users are attached.

Even though no alarms have occurred, errors may exist. For example, a lower rate of background errors may exist just below the alarm threshold. Because RMON alarms are based on maximum and minimum values, they may miss constant, periodic, or low amounts of errors.
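The point about errors hiding below an alarm threshold can be shown with a small sketch. It assumes a list of per-sample error counts and a single rising threshold; the function names and sample values are hypothetical, and real RMON alarms also use falling thresholds and sampling options that are not modeled here.

```python
def triggered_samples(samples, rising_threshold):
    """Indexes of the samples whose error count reaches the rising threshold."""
    return [i for i, value in enumerate(samples) if value >= rising_threshold]


def steady_background_errors(samples, rising_threshold):
    """True when every sample has errors, yet none is high enough to raise an alarm."""
    return bool(samples) and all(0 < value < rising_threshold for value in samples)
```

With `samples = [4, 5, 4, 5, 4]` and a threshold of 10, no alarm is triggered, but `steady_background_errors` returns `True`, which is the situation that the Statistics View and History View help to expose.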

LANsentry Manager Statistics View

Using the LANsentry Manager Statistics View, you can display a multisegment graph of utilization and error statistics.

Follow these steps:

1. Set up a graph that shows utilization and errors on all your major segments.

2. Determine whether any segments are particularly busy or error prone.

EXAMPLE: You notice that one segment of the UNIX network, HUB3, is reporting Too Long Errors and FCS Errors roughly every second sample. While the amount is not higher than normal, it is currently higher than any other segment.

LANsentry Manager History View

Using the LANsentry Manager History View, you can display a rolling history table to determine if the errors that you are seeing are new. For example, if you have a history table that runs for 30-minute samples over two days, you can compare the most recent sample to a previous sample, looking for new errors. If your probe has the resources, use a much finer resolution sample stored for a shorter time (every 30 seconds for 2 hours) to more easily spot recent errors.

EXAMPLE: You see that the history table shows that no error rates remained constant throughout the day. However, errors that did occur were on the device HUB3 and were Too Long Errors and FCS Errors.

Reproducing the Fault While Monitoring the Network

Although the RMON View in LANsentry Manager can show error rates and help you to identify the location of the problem, it may not provide enough data to solve the problem. To determine the cause of the problem, reproduce it while you monitor the network by using these applications:

• LANsentry Manager Top-N Graph - To locate a quiet node to use for reproducing the fault
• LANsentry Manager Packet Capture - To capture packets from the hub to which the quiet node is attached
• LANsentry Manager Packet Decode - To analyze the packets to assess network file server traffic and delays
• Address Tracker - To find the location of the problem nodes

EXAMPLE: Using LANsentry Manager, you find a hub on the network with a higher than normal error rate. However, the error rate does not seem high enough to cause login delays of 60 seconds or more.

LANsentry Manager Top-N Graph

Using the Top-N graph in the LANsentry Manager main window, locate a quiet node that has been showing the same problem. Choose a quiet node so that you do not receive excessive traffic when you try to isolate the problem.

EXAMPLE: You see that the node Monolith, which has the same Network File System (NFS) mounts as the other nodes on the network, is quiet. You decide to use this node for reproducing the fault. See Network File System (NFS) Protocol for more information about NFS.

LANsentry Manager Packet Capture

Using the LANsentry Manager Packet Capture application, capture packets from the network using predefined patterns and start-and-stop conditions.

Follow these steps:

1. Set up a capture buffer on a probe that is connected to the same hub as the quiet node. Until you know more about the problem, set a very general filter.

EXAMPLE: You select a MAC-layer filter and set a conversation filter to capture all packets to and from Monolith.

2. Telnet into and log out from the quiet node. Then reset the capture buffer. Repeat this procedure until you see the problem reflected in your captured data. To keep the buffer information clean, reset the buffer each time that you repeat the procedure.

3. When you see the delay, note the rough value of the packet count on the LANsentry Manager packet buffer.

By noting the packet count at which you think the delay has occurred, you can narrow the problem to within about 20 packets in the buffer. If you have used an extremely quiet node, you may even identify the exact packet.

LANsentry Manager Packet Decode

The LANsentry Manager Packet Decode application decodes all major protocols and displays the packet contents at three levels of detail: summary information, header information, and actual packet content.

Follow these steps:

1. Open the buffer in the Packet Decode application and locate the number of the packet at which the delay occurred.

2. Select the packet and launch a MAC-layer conversation filter. In the filter display, look for a gap in the conversation (that is, where the node sent a request and then resent it at approximately the same rate as the delay you experienced when recreating the problem).

3. Repeat the test to determine if the result concentrates on one node or if it appears on other nodes.

EXAMPLE: On the quiet node that you selected, the delay is obvious. You see an NFS request going out to a node and a repeat of the request 30 seconds later. During that time, the node did not respond. You now know that the delay occurred because nodes were not seeing responses for NFS requests. When you repeat the test on other nodes, you find that the delay is happening with more than one destination node.
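Looking for a gap in a conversation can be sketched in code. The example below assumes the captured packets have already been exported as simple records; the `Packet` fields, the `request_id` value (for instance an RPC transaction ID), and the 10-second minimum gap are all hypothetical choices for illustration and do not describe the LANsentry Manager data format.

```python
from typing import NamedTuple


class Packet(NamedTuple):
    number: int       # position of the packet in the capture buffer
    time: float       # seconds since the capture started
    src: str          # sending node
    dst: str          # receiving node
    request_id: str   # hypothetical identifier shared by a request and its resend


def find_retransmit_gaps(packets, min_gap=10.0):
    """Return (first_packet, resend_packet, gap_seconds) for each request
    that the same sender repeated to the same destination after a long pause."""
    last_seen = {}
    gaps = []
    for packet in packets:
        key = (packet.src, packet.dst, packet.request_id)
        previous = last_seen.get(key)
        if previous is not None:
            gap = packet.time - previous.time
            if gap >= min_gap:
                gaps.append((previous.number, packet.number, gap))
        last_seen[key] = packet
    return gaps
```

A request that is answered normally never repeats, so it produces no entry. A request that is resent after roughly the login delay you observed shows up with the two packet numbers to inspect in the decode.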


Address Tracker

Use Address Tracker, which polls managed devices, to determine the hubs to which the problem nodes are attached. If the problem end stations are located on unmanaged devices, then you can at least narrow the problem to those unmanaged devices.

EXAMPLE: Although your network does not have managed hubs that Transcend® NCS management software can poll, it does have managed switches. When it polls the switches, Address Tracker displays the switch ports on which addresses were last seen. This information indicates the hub (but not the hub port) on which the device is located.

LANsentry Manager Packet Decode

After you know the location of the hub that has the problem node, monitor the problem from the hub using LANsentry Manager Packet Decode.

Follow these steps:

1. To capture packets from one of the nodes on the hub, set up another capture buffer and repeat the exercise that is described in LANsentry Manager Packet Capture. Because a delay may occur on a different node, use two capture buffers without stopping the first one.

Note the rough packet count where the delay appears.

2. Display a conversation filter of the packet where the delay appears and look for the gap in the conversation.

EXAMPLE: You hope that the nodes are on the same hub. You find that all the nodes are on HUB3. This result indicates that FCS Errors may be causing the timeouts. However, because the errors occur at a low rate, you decide to verify this diagnosis. You monitor the problem from the hub, logging in and out many times, and the delay eventually occurs. This time, the delay shows that the node's reply had an FCS Error, even though the node received the request. The switch would not have transmitted this packet, causing a timeout on the NFS protocol. The retry time is presumably 30 seconds. During this test, you see the problem occurring on another node.

Correcting the Fault


Without a managed hub, you may find it very difficult to discover network file server timeout errors. To find the problematic node, you must either systematically isolate nodes by monitoring each node for a prolonged period, or temporarily insert a managed hub.

EXAMPLE: You notice that the captured error packet failed FCS because it was corrupted by a regular pattern during transmission. A possible reason for this occurrence is a Jabbering node. This explanation makes sense because FCS/Jabber frames increased linearly when you were monitoring the live network.

Network File Server Timeouts Reference

This section explains terms that are relevant to network file server timeouts and provides additional conceptual and problem analysis detail.

Jabbering

Jabbering occurs when a node transmits illegal-length packets and is possibly not operating within carrier specifications. In effect, another node has written bad data over a valid packet. This bad data is often interpreted as a repeated sequence of data.

Network File System (NFS) Protocol

A distributed file system protocol developed by Sun Microsystems that allows a computer system to access files over a network as if they were on its local disks. This protocol has been incorporated into products by more than 200 companies. It is now a de facto Internet standard.

NFS is one protocol in the NFS suite of protocols, which includes NFS, RPC, XDR (External Data Representation), and others. These protocols are part of a larger architecture that Sun Microsystems refers to as Open Network Computing (ONC). ONC is a distributed applications architecture designed by Sun and currently controlled by a consortium led by Sun.

Troubleshooting TCP/IP - Detailed Steps

This article shows how to troubleshoot TCP/IP connectivity between computers on a Windows network. If you haven't already done so, disable XP's Internet Connection Firewall on all local area network connections and remove all firewall programs on the network. Improperly configured firewalls are the most common cause of TCP/IP problems.


Open a Command Prompt Window

For many of these steps, you'll be typing at the command prompt. To open a command prompt window in Windows 2000 or XP, click Start | Run, type cmd in the box, and click OK. To open a command prompt window in Windows 95, 98, or Me, click Start | Run, type command in the box, and click OK. Type one command per line, and press Enter after each one to execute it. To close the command prompt window, use the exit command.

Determine the TCP/IP Settings

Determine the TCP/IP settings of each computer on the local area network. In XP, open the Network Connections folder, right click the LAN connection, and click Status | Support | Details. For example, here are the Status and Details views for the LAN connection on an Internet Connection Sharing host.

In Windows 95/98/Me, click Start | Run, type winipcfg in the box, and click OK. Select the LAN adapter from the menu, and click More Info. Here's the winipcfg view for an ICS client running Windows Me.

You can also see the TCP/IP settings from the command prompt. This is especially convenient if a computer has more than one network adapter. Use the ipconfig /all command, which is available in all versions except Windows 95. The output from this command can be long, so it's best to write it to a file. Specify the file name in the command this way:

ipconfig /all >ipconfig.txt

Description of TCP/IP Settings

Here are the TCP/IP settings that are used in network troubleshooting:

• IP Address - Unique address assigned to a network adapter. A computer with multiple network adapters has an IP address for each one, and each one must be in a different subnet.

• Subnet Mask - Used in conjunction with the IP address to determine which subnet an adapter belongs to. At the simplest level, communication is only possible between two network adapters when they're in the same subnet.

• Default Gateway - IP address of a computer or router, on one of this computer's local area networks, that knows how to communicate with subnets not present on this computer. For an Internet connection, the default gateway is a router belonging to your Internet service provider, and all access to sites on the Internet goes through it. For an ICS client, the default gateway is the ICS host. If you use a hardware router, it serves as the default gateway.

• DHCP Server - If an adapter is configured to obtain an IP address automatically, this is the address of the server that provides it. It could be your ISP, an ICS host, or a hardware router.

• DNS Servers - IP address of one or more Domain Name Server computers. DNS servers translate Internet names to their IP addresses (like 63.146.109.227).
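To make the role of the subnet mask concrete, here is a minimal sketch that checks whether two adapters fall in the same subnet, using Python's standard `ipaddress` module. The function name and the addresses in the usage note are illustrative only, and the sketch assumes both adapters use the same mask.

```python
import ipaddress


def same_subnet(ip_a: str, ip_b: str, mask: str) -> bool:
    """True when both addresses belong to the same subnet under the given mask."""
    network_a = ipaddress.ip_interface(f"{ip_a}/{mask}").network
    network_b = ipaddress.ip_interface(f"{ip_b}/{mask}").network
    return network_a == network_b
```

For example, `same_subnet("192.168.1.101", "192.168.1.123", "255.255.255.0")` is true, while replacing the second address with `192.168.2.5` makes it false, so those two adapters would need a gateway to communicate.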

Subnets

See our article on subnets for a brief description of how they work. For more details, see this Microsoft Knowledge Base article.

If two computers are supposed to be on the same subnet but aren't, something is wrong with the network hardware or software configuration. This is most likely to happen when one of them receives an IP address of 169.254.x.x, which indicates that:

• It's configured to obtain an IP address automatically
• It couldn't find a DHCP server on the network to make the assignment
• Windows assigned it an Automatic Private IP Address
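A quick way to spot an Automatic Private IP Address in a list of collected settings is to test for the 169.254.0.0/16 range. The helper name below is hypothetical; the check itself uses only the standard `ipaddress` module.

```python
import ipaddress

# Range that Windows uses for Automatic Private IP Addressing.
AUTOMATIC_PRIVATE_RANGE = ipaddress.ip_network("169.254.0.0/16")


def is_automatic_private(ip: str) -> bool:
    """True when the address is in the 169.254.x.x range."""
    return ipaddress.ip_address(ip) in AUTOMATIC_PRIVATE_RANGE
```

An adapter for which this returns true most likely failed to reach a DHCP server, so the next thing to check is the path between that computer and the DHCP server.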

See our article on Specific Networking Problems and Their Solutions for more information.

Pinging

The ping command is the basic tool for testing TCP/IP connectivity. It sends a special packet (called ICMP Echo) to a particular IP address and looks for a reply. If everything is working right, the reply comes back. If not, the ping times out in a few seconds. By default, the ping command repeats the process four times. Here's an example of an ICS client computer pinging a Windows XP Home Edition ICS host, using the host's IP address and its computer name.

When ping fails, you'll see one of these error messages:

• Request timed out - The IP address is valid, but there's no reply from it. If the IP address is on a local area network, the most likely cause is a firewall program blocking the ping.

• Unknown host <name> or Ping request could not find host <name> - The computer name doesn't exist on the local area network. Make sure that NetBIOS over TCP/IP is enabled.

• Destination host unreachable - The IP address isn't on a local area network, and the default gateway can't access it. Either there's no default gateway, its address is wrong, or it isn't functioning.


Pinging the Local Area Network

Here is a series of ping commands to use in finding where a problem occurs on a local area network. Run them in the order shown, and don't go on to the next command until all of the previous commands work properly. In this example:

• The computer being tested is named Winxp, with IP address 192.168.1.101
• There's another computer on the network named Win98, with IP address 192.168.1.123

Substitute the appropriate IP addresses and computer names for your network.

Command               Target                          What Ping Failure Indicates
ping 127.0.0.1        Loopback address                Corrupted TCP/IP installation
ping localhost        Loopback name                   Corrupted TCP/IP installation
ping 192.168.1.101    This computer's IP address      Corrupted TCP/IP installation
ping winxp            This computer's name            Corrupted TCP/IP installation
ping 192.168.1.123    Another computer's IP address   Bad hardware or NIC driver
ping win98            Another computer's name         NetBIOS name resolution failure

To fix a corrupted TCP/IP installation on Windows XP, follow the steps in this Microsoft Knowledge Base article. For Windows 95/98/Me, un-install the TCP/IP protocol in Control Panel | Network, reboot, and re-install it. If that doesn't fix it, use this procedure on Windows 95 or 98.
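The ordered ping checks for the local area network can be sketched as a small script that stops at the first failing step and reports what that failure suggests. This is only an illustration: the function names are hypothetical, the targets are the example addresses and names used in this section, and the sketch relies on the exit code of the system ping command, which on some Windows versions can be 0 even when the reply is "Destination host unreachable".

```python
import platform
import subprocess

# (target, what a failure at this step indicates), in the order to run them.
LAN_STEPS = [
    ("127.0.0.1", "Corrupted TCP/IP installation"),
    ("localhost", "Corrupted TCP/IP installation"),
    ("192.168.1.101", "Corrupted TCP/IP installation"),
    ("winxp", "Corrupted TCP/IP installation"),
    ("192.168.1.123", "Bad hardware or NIC driver"),
    ("win98", "NetBIOS name resolution failure"),
]


def ping(target: str) -> bool:
    """Send one echo request with the system ping command; True if it exits with 0."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    result = subprocess.run(
        ["ping", count_flag, "1", target],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def first_failure(steps, ping_fn=ping):
    """Run the steps in order and return (target, meaning) for the first
    failing step, or None when every ping succeeds."""
    for target, meaning in steps:
        if not ping_fn(target):
            return target, meaning
    return None
```

Calling `first_failure(LAN_STEPS)` mirrors the rule of not moving on until all previous commands work. The `ping_fn` parameter exists so that the ordering logic can be exercised without touching the network.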

Pinging the Internet

You can also use ping to find a problem with Internet access. Run these commands in the order shown, and don't go on to the next command until all of the previous commands work properly. Use the Default Gateway and DNS Server addresses that you got from the winipcfg or ipconfig /all command.


Command                  Target                What Ping Failure Indicates
ping w.x.y.z             Default Gateway       Default Gateway down
ping w.x.y.z             DNS Server            DNS Server down
ping w.x.y.z             Web site IP address   Internet service provider or web site down
ping www.something.com   Web site name         DNS Server down or web site down

REFERENCE

WWW.PRACTICALLYNETWORK.COM

WWW.COMPUTERHOPE.COM

WWW.WIKIPEDIA.ORG


THANK YOU


  • Basic Network Troubleshooting
  • Basic network troubleshooting
    • Issue
    • Cause
    • Solution
      • Verify connections LEDs
      • Adapter resources
      • Adapter functionality
      • Protocol
      • Firewall
      • Additional time
      • Additional troubleshooting
        • Network Troubleshooting Overview
          • About Connectivity Problems
          • About Performance Problems
          • Solving Connectivity and Performance Problems
          • Recognizing Symptoms
            • User Comments
            • Network Management Software Alerts
            • Analyzing Symptoms
              • Understanding the Problem
              • Identifying and Testing the Cause of the Problem
                • Sample Problem Analysis
                • Equipment for Testing
                  • Solving the Problem
                    • Your Network Troubleshooting Toolbox
                      • Transcend Central
                      • Status Watch
                        • Web Reporter
                          • Address Tracker
                          • LANsentry Manager
                          • Traffix Manager
                          • Device View
                          • Ping
                            • Strategies for Using Ping
                            • Tips on Interpreting Ping Messages
                              • Telnet
                              • FTP and TFTP
                              • Analyzers
                              • Probes
                              • Cable Testers
                              • Knowing Your Networks Configuration
                                • Site Network Map
                                • Device Configuration Information
                                • Other Important Data About Your Network
                                  • Identifying Your Networks Normal Behavior
                                    • Baselining Your Network
                                    • Identifying Background Noise
                                        • FDDI Connectivity
                                          • Understanding the Problem
                                          • Identifying the Problem
                                          • Solving the Problem
                                          • Status Watch
                                          • Implementing Dual Homing
                                          • Installing an Optical Bypass Unit
                                          • Peer Wrap Condition
                                          • Twisted Ring Condition
                                          • Undesired Connection Attempt Event
                                          • Address Tracker
                                          • Device View
                                          • Broadcast Packets
                                          • Multicast Packets
                                            • Duplicate Addresses
                                              • Understanding the Problem
                                              • Identifying the Problem
                                              • Solving the Problem
                                              • Status Watch
                                              • Address Tracker
                                              • LANsentry Manager
                                              • Duplicate MAC Addresses
                                              • Duplicate IP Addresses
                                                • Ethernet Packet Loss
                                                  • Understanding the Problem
                                                  • Identifying the Problem
                                                  • Solving the Problem
                                                  • Status Watch
                                                  • LANsentry Manager Network Statistics Graph
                                                  • Device View
                                                  • Alignment Errors
                                                  • Collisions
                                                  • CRC Errors
                                                  • Excessive Collisions
                                                  • FCS Errors
                                                  • Late Collisions
                                                  • Nonstandard Ethernet Problems
                                                  • Receive Discards
                                                  • Too Long Errors
                                                  • Too Short Errors
                                                  • Transmit Discards
                                                    • FDDI Ring Errors
                                                      • Understanding the Problem
                                                      • Identifying the Problem
                                                      • Solving the Problem
                                                      • Status Watch
                                                      • Elasticity Buffer Error Condition
                                                      • Frame Error Condition
                                                      • Frames Not Copied Condition
                                                      • Link Error Condition
                                                      • MAC Neighbor Change Event
                                                        • Network File Server Timeouts
                                                          • Understanding the Problem
                                                          • Identifying the Problem
                                                          • Solving the Problem
                                                          • Ping and Telnet
                                                          • LANsentry Manager Alarms View
                                                          • LANsentry Manager Statistics View
                                                          • LANsentry Manager History View
                                                          • LANsentry Manager Top-N Graph
                                                          • LANsentry Manager Packet Capture
                                                          • LANsentry Manager Packet Decode
                                                          • Address Tracker
                                                          • LANsentry Manager Packet Decode
                                                          • Jabbering
                                                          • Network File System (NFS) Protocol
                                                          • Troubleshooting TCPIP - Detailed Steps
                                                          • Open a Command Prompt Window
                                                          • Determine the TCPIP Settings
                                                          • Description of TCPIP Settings
                                                          • Subnets
                                                          • Pinging
                                                          • Pinging the Local Area Network
                                                          • Pinging the Internet

As a useful companion to traditional network management methods, 3Com's SmartAgent® technology places management intelligence into the software agent that runs within a 3Com device. This scalable solution reduces the amount of computational load on the management station and helps minimize management-related network traffic.

SmartAgent software, which uses the RMON MIB, is self-monitoring, collecting and analyzing its own statistical, analytical, and diagnostic data. In this way, you can conduct network management by exception - that is, you are notified only if a problem occurs. Management by exception is unlike traditional SNMP management, in which the management software collects all data from the device through polling.

SmartAgent software works autonomously and reports to the network management station whenever an exceptional network event occurs. The software can also take direct action without involving the management station. Devices that contain SmartAgent software may be able to:

• Perform broadcast throttling to minimize the flow of broadcast traffic on your network
• Monitor the ratio of good frames to bad frames
• Switch a resilient link pair to the standby path if the primary path corrupts frames
• Report if traffic on vital segments drops below minimum usage levels
• Disable a port for five seconds to clear problems and then automatically reconnect it

The Transcend NCS applications LANsentry Manager and Traffix Manager make the RMON data that the SmartAgent software collects more usable by summarizing and correlating important information.

Other Commonly Used Tools

These commonly used tools can also help you troubleshoot your network:

• Network software, such as Ping, Telnet, and FTP and TFTP. You can use these applications to troubleshoot, configure, and upgrade your system.
• Network monitoring devices, such as Analyzers and Probes
• Tools, such as Cable Testers, for working on physical problems

Ping

Packet Internet Groper (Ping) allows you to quickly verify the connectivity of your network devices. Ping attempts to transmit a packet from one device to a station on the network and listens for the response to ensure that it was correctly received. You can validate connections on the parts of your network by pinging different devices:

• A successful response indicates that a valid network path exists between your station and the remote host and that the remote host is active.

• Slower response times than normal can indicate that the path is congested or obstructed.

• A failed response indicates that a connection is broken somewhere; use the message to help locate the problem. See Tips on Interpreting Ping Messages.

Strategies for Using Ping

Follow these strategies for using Ping:

• Ping devices when your network is operating normally so that you have a performance baseline for comparison. See Identifying Your Network's Normal Behavior for more information.

• Ping by IP address when:
  • You want to test devices on different subnetworks. This method allows you to Ping your network segments in an organized way, rather than having to remember all the hostnames and locations.
  • Your Domain Name System (DNS) server is down and your system cannot look up host names properly. You can Ping with IP addresses even if you cannot access hostname information.

• Ping by hostname when you want to identify DNS server problems.

• To troubleshoot problems that involve large packet sizes, Ping the remote host repeatedly, increasing the packet size each time.

• To determine if a link is erratic, perform a continuous Ping (using ping -s on UNIX), which indicates the time that it takes the device to respond to each Ping.

• To determine a route taken to a destination, use the trace route function (tracert).

• Consider creating a Ping script that periodically sends a Ping to all necessary networking devices. If a Ping failure message is received, the script can perform some action to notify you of the problem, such as paging you.

• Use the Ping functions of your network management platform. For example, in your HP OpenView map, select a device and click the right mouse button to gain access to ping functions.
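The periodic Ping script suggested above might look like the following minimal sketch. All names here are hypothetical, the notification step is left as a callback that you supply (for example, a function that sends a page or an e-mail), and the sketch relies on the exit code of the system ping command, which is a simplification.

```python
import platform
import subprocess
import time


def ping_once(host: str) -> bool:
    """Send one echo request with the system ping command; True if it exits with 0."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    result = subprocess.run(
        ["ping", count_flag, "1", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def sweep(devices, notify, ping_fn=ping_once):
    """Ping every device once, call notify(device) for each failure,
    and return the list of devices that failed."""
    failed = [device for device in devices if not ping_fn(device)]
    for device in failed:
        notify(device)
    return failed


def monitor(devices, notify, interval_seconds=300, rounds=None, ping_fn=ping_once):
    """Repeat the sweep every interval_seconds; rounds=None runs until interrupted."""
    completed = 0
    while rounds is None or completed < rounds:
        sweep(devices, notify, ping_fn)
        completed += 1
        if rounds is None or completed < rounds:
            time.sleep(interval_seconds)
```

A call such as `monitor(["192.168.1.1", "192.168.1.2"], notify=print)` would print the address of any device that stops answering. The device list and the five-minute default interval are placeholders to adapt to your own network.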

Tips on Interpreting Ping Messages

Use the following ping failure messages to troubleshoot problems:

No reply from <destination>

Indicates that the destination routes are available but that there is a problem with the destination itself.

<destination> is unreachable

Indicates that your system does not know how to get to the destination. This message means either that routing information to a different subnetwork is unavailable or that a device on the same subnetwork is down.

ICMP host unreachable from gateway

Indicates that your system can transmit to the target address using a gateway, but that the gateway cannot forward the packet properly because either a device is misconfigured or the gateway is not operating.

Telnet

Telnet, which is a login and terminal emulation program for Transmission Control Protocol/Internet Protocol (TCP/IP) networks, is a common way to communicate with an individual device. You log in to the device (a remote host) and use that remote device as if it were a local terminal.

If you have established an out-of-band Telnet connection with a device, you can use Telnet to communicate with that device even if the network is unavailable. This feature makes Telnet one of the most frequently used network troubleshooting tools. Usually, all device statistics and configuration capabilities are accessible by using Telnet to connect to the device's console. For more information about setting up an out-of-band connection, see Using Telnet, Serial Line, and Modem Connections.

You can invoke the Telnet application on your local system and set up a link to a Telnet process that is running on a remote host. You can then run a program that is located on a remote host as if you were working at the remote system.

FTP and TFTP

Most network devices support either the File Transfer Protocol (FTP) or the Trivial File Transfer Protocol (TFTP) for downloading updates of system software. Updating system software is often the solution to networking problems that are related to agent problems. Also, new software features may help correct a networking problem.

FTP provides flexibility and security for file transfer by:

• Accepting many file formats, such as ASCII and binary
• Using data compression
• Providing Read and Write access so that you can display, create, and delete files and directories
• Providing password protection

TFTP is a simple version of FTP that does not list directories or require passwords. TFTP only transfers files to and from a remote server.
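To illustrate how little TFTP carries compared with FTP, the sketch below builds a TFTP read request as laid out in RFC 1350: an opcode, a file name, and a transfer mode, with no user name or password field. The function name and the file name are hypothetical, and the sketch only builds the packet; it does not send it.

```python
import struct

TFTP_OPCODE_RRQ = 1  # read request opcode defined in RFC 1350


def build_tftp_read_request(filename: str, mode: str = "octet") -> bytes:
    """Return the bytes of a TFTP read request (RRQ).

    Layout: 2-byte opcode, file name, NUL, transfer mode, NUL.
    There is no field for credentials or for a directory listing.
    """
    return (
        struct.pack("!H", TFTP_OPCODE_RRQ)
        + filename.encode("ascii") + b"\x00"
        + mode.encode("ascii") + b"\x00"
    )


if __name__ == "__main__":
    # "switch-image.bin" is a made-up file name used only for illustration.
    print(build_tftp_read_request("switch-image.bin"))
```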

Analyzers

An analyzer, which is often called a Sniffer, is a network device that collects network data on the segment to which it is attached, a process called packet capturing. Software on the device analyzes this data, which is a process referred to as protocol analysis. Most analyzers can interpret different types of protocol traffic, such as TCP/IP, AppleTalk, and Banyan VINES traffic.

You usually use analyzers for reactive troubleshooting: when you see a problem somewhere on your network, you attach an analyzer to capture and interpret the data from that area. Analyzers are particularly helpful for identifying intermittent problems. For example, if your network backbone has experienced moments of instability that prevent users from logging on to the network, you can attach an analyzer to the backbone to capture the intermittent problems when they happen again.
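As a rough illustration of what protocol analysis means at the lowest level, the sketch below decodes the 14-byte header of a captured Ethernet II frame and names the protocol it carries. It is not how any particular analyzer is implemented; the function names and the sample frame are made up for the example, and only a few well-known EtherType values are listed.

```python
import struct

# A few registered EtherType values; anything else is reported in hex.
ETHERTYPES = {
    0x0800: "IPv4",
    0x0806: "ARP",
    0x809B: "AppleTalk",
    0x0BAD: "Banyan VINES",
}


def format_mac(raw: bytes) -> str:
    """Render six raw bytes as a colon-separated MAC address."""
    return ":".join(f"{octet:02x}" for octet in raw)


def decode_ethernet_header(frame: bytes) -> dict:
    """Decode destination, source, and protocol from an Ethernet II frame."""
    if len(frame) < 14:
        raise ValueError("frame is shorter than an Ethernet II header")
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    return {
        "dst": format_mac(dst),
        "src": format_mac(src),
        "protocol": ETHERTYPES.get(ethertype, hex(ethertype)),
    }


if __name__ == "__main__":
    # Hypothetical captured frame: broadcast destination, ARP payload type.
    sample = bytes.fromhex("ffffffffffff" "00a024123456" "0806") + b"\x00" * 28
    print(decode_ethernet_header(sample))
```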

Probes

Like an analyzer, a probe is a network device that collects network data. Depending on its type, a probe can collect data from multiple segments simultaneously. It stores the collected data and transfers the data to an analysis site when requested. Unlike an analyzer, a probe does not interpret data.

A probe can be either a stand-alone device or an agent in a network device. The Transcend Enterprise Monitor 500 series and the SuperStack® II Monitor series are stand-alone RMON probes. LANsentry Manager and Traffix Manager use data from probes that comply with the RMON MIB or the RMON2 MIB.

You can use a probe daily to determine the health of your network. The Transcend NCS applications can interpret and report this data, alerting you to possible problems so that you can proactively manage your network. For example, an RMON2 probe can help you to analyze traffic patterns on your network. Use this data to make decisions about reconfiguring devices and end stations as needed.

Cable Testers

Cable testers examine the electrical characteristics of the wiring. They are most commonly used to ensure that building wiring and cables meet Category 5, 4, and 3 standards. For example, network technologies such as Fast Ethernet require the cabling to meet Category 5 requirements. Testers are also used to find defective and broken wiring in a building.

Knowing Your Network

You can better troubleshoot the problems on your network by:

• Knowing Your Network's Configuration
• Identifying Your Network's Normal Behavior

Knowing Your Network's Configuration

Part of understanding your network is knowing its physical and logical configuration. You should know:

• Which devices are on your network
• How the devices are configured
• Which devices are attached to the backbone
• Which devices connect your network to the outside world (WAN)

To keep track of your network's configuration, gather the following information:

• Site Network Map
• Logical Connections
• Device Configuration Information
• Other Important Data About Your Network

This data, when kept up-to-date, is extremely helpful for locating information when you experience network or device problems.

Site Network Map

A network map helps you to:

• Know exactly where each device is physically located
• Easily identify the users and applications that are affected by a problem
• Systematically search each part of your network for problems

You can create a network map using any drawing or flow chart application. Store your network map online. In addition, make sure that you always have a current version on paper in case you cannot access the online version. Figure 8 shows an example of a network map of 3Com devices.

Figure 8 Example of a Site Network Map

Consider including the following information on your network map:

• Location of important devices and workgroups (by floor, building, or area)
• Location of the network backbone, data center, and wiring closets, as appropriate for your network
• Location of your network management stations
• Location and type of remote connections
• IP subnetwork addresses for all managed switches and hubs
• Other subnetwork addresses, such as Novell IPX and AppleTalk, if appropriate for your network
• Type of media (by actual name, such as 10BASE-T, or by grouping, such as Ethernet), which you can show with callouts, colors, line weights, or line styles
• Virtual workgroups, which you can show with colors or shaded areas
• Redundant links, which you can indicate with gray or dashed lines
• Types of network applications that are used in different areas of your network
• Types of end stations that are connected to the switches and hubs

Logical Connections

With the advent of virtual LANs (VLANs), you need to know how your devices are connected logically as well as physically. For example, if you have connected two devices through the same physical switch, you can assume that they can communicate with each other. However, the devices can be in separate VLANs that restrict their communication.

Knowing the setup of your VLANs can help you to quickly narrow the scope of a problem to a VLAN instead of to a network connection.

The Transcend NCS application Enterprise VLAN Manager allows you to view the logical makeup of your network. Depending on the complexity of your network and VLAN configurations, you can use colors to show the VLANs graphically on your network map.

Device Configuration Information

Maintain online and paper copies of device configuration information. Make sure that all online data is stored with your site's regular data backup. If your site does not have a backup system, copy the information onto a backup disc (CD, Zip disk, and the like) and store it offsite.

Follow these guidelines for saving configuration information:

• Because the easiest way to recover a device's configuration is to use FTP or TFTP, save the configuration settings of each device that supports this method of uploading.

• For other devices, Telnet in and save the session (which contains configuration details) to a file. If you cannot print the configuration of a device, then create a quick rebuild guide that explains the quickest way to configure the device from a fresh install.

• For devices that store information to diskette, store this data as part of your site's regular backup.

• For routers and other important devices with text configuration files, store this data online in a revision control system. Keep the most recent version on paper. Keep previous versions.

• For PCs, keep a recovery disk for each type of PC. For any device that you use as a server, store all startup scripts and copies of registries.

Other Important Data About Your Network

For a complete picture of your network, have the following information available:

• All passwords - Store passwords in a safe place. Keep previous passwords in case you restore a device to a previous software version and need to use the old password that was valid for that version.

• Device inventory - The inventory allows you to see the device type, IP address, ports, MAC addresses, and attached devices at a glance. Software tools such as Transcend Central can help you keep track of the 3Com devices on your network. Using Transcend Central, you can group devices by type and location and have this information on hand for troubleshooting.

• MAC address-to-port number list - If your hubs or switches are not managed, you must keep a list of the MAC addresses that correlate to the ports on your hubs and switches. Generate and keep a paper copy of this list, which is crucial for deciphering captured packets, using Address Tracker.

• Log book - Document your interactions, no matter how trivial, with each device that is critical to your network's operation (that is, routers, remote access devices, and security servers). For example, document that you noticed a fan making noise one morning. Your note may help you to identify why a device is over temperature a week later (because the fan stopped working).

• Change control - Maintain a change control system for all critical systems. Permanently store change control records.

• Contact details - Store, online and on paper, the details of all support contracts, support numbers, engineer details, and telephone and fax numbers.

• LANsentry Reporter - Use LANsentry Reporter to generate reports from the database.

Identifying Your Network's Normal Behavior

By monitoring your network over a long period, you begin to understand its normal behavior. You begin to see a pattern in the traffic flow, such as which servers are typically accessed, when peak usage times occur, and so on. If you are familiar with your network when it is fully operational, you can be more effective at troubleshooting problems that arise.

Baselining Your Network

You can use a baseline analysis, which is an important indicator of overall network health, to identify problems. A baseline can serve as a useful reference of network traffic during normal operation, which you can then compare to captured network traffic while you troubleshoot network problems. A baseline analysis speeds the process of isolating network problems.

By running tests on a healthy network, you compile normal data to compare against the results that you get when your network is in trouble. For example, ping each node to discover how long it typically takes you to receive a response from devices on your network.

Applications such as Status Watch, Address Tracker, LANsentry Manager, and Traffix Manager allow you to collect days and weeks of data and set a baseline for comparison. Through the reporting mechanisms in the following list, you can continuously assess the data from your network and ensure that its performance is optimal:

• Web Reporter generates daily or weekly reports from data collected by Status Watch.

• Traffix Manager generates weekly reports from collected data and calculates the baselines for you. Set up Utilization History and Error History reports with data resolution set to Weekly.

• LANsentry Manager History View generates daily utilization graphs, which are sampled every 30 minutes for each day over one week. Use these graphs to calculate your network baselines manually.
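If you calculate a baseline manually, for example from ping response times collected on a healthy network, the comparison can be as simple as the sketch below. The three-standard-deviation tolerance and the sample values are assumptions chosen for illustration; they are not figures from this guide or from the applications named above.

```python
from statistics import mean, stdev


def build_baseline(rtt_samples_ms):
    """Summarize response times (in milliseconds) from a healthy network."""
    return {"mean": mean(rtt_samples_ms), "stdev": stdev(rtt_samples_ms)}


def is_abnormal(rtt_ms, baseline, tolerance=3.0):
    """Flag a response time well above the baseline.

    Assumption: "abnormal" means more than `tolerance` standard
    deviations above the baseline mean. Tune this for your network.
    """
    return rtt_ms > baseline["mean"] + tolerance * baseline["stdev"]


if __name__ == "__main__":
    # Made-up round-trip times gathered while the network was healthy.
    baseline = build_baseline([2.1, 2.4, 1.9, 2.2, 2.0])
    for sample in (2.3, 15.0):
        print(sample, "abnormal" if is_abnormal(sample, baseline) else "normal")
```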

Identifying Background Noise

Know your network's background noise so that you can recognize real data flow. For example, one evening after everyone is gone, no backups are running, and most nodes are on, analyze the traffic on your network using the Traffix Manager application. The traffic that you see is mostly broadcast and multicast packets. Any errors that you see are the result of faulty devices (trace). This traffic is the background noise of your network: traffic that occurs for little value. If background noise is high, redesign your network.
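One way to put a number on background noise is to measure what share of captured frames are addressed to broadcast or multicast destinations. The sketch below assumes you already have the destination MAC addresses from a capture, written in the usual colon-separated form; the function names and sample addresses are hypothetical, and the check relies on the standard rule that a group address has the low-order bit of its first octet set.

```python
def classify_destination(mac: str) -> str:
    """Classify a destination MAC address as broadcast, multicast, or unicast."""
    octets = bytes(int(part, 16) for part in mac.split(":"))
    if octets == b"\xff" * 6:
        return "broadcast"
    if octets[0] & 0x01:  # group bit set means a multicast address
        return "multicast"
    return "unicast"


def background_noise_ratio(dest_macs) -> float:
    """Return the fraction of frames sent to broadcast or multicast addresses."""
    if not dest_macs:
        return 0.0
    noisy = sum(1 for mac in dest_macs if classify_destination(mac) != "unicast")
    return noisy / len(dest_macs)


if __name__ == "__main__":
    # Made-up capture: one broadcast, one Spanning Tree multicast, two unicast.
    capture = [
        "ff:ff:ff:ff:ff:ff",
        "01:80:c2:00:00:00",
        "00:a0:24:12:34:56",
        "00:a0:24:65:43:21",
    ]
    print(background_noise_ratio(capture))
```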


FDDI Connectivity

Use these sections to identify and correct connectivity errors on an FDDI ring:

• FDDI Connectivity Overview
• Monitoring FDDI Connections
• Making Your FDDI Connections More Resilient

Fiber Distributed Data Interface (FDDI), which is a self-correcting technology, automatically corrects ring faults to maintain connectivity throughout most of the network. However, you should monitor your FDDI connections for wrapped rings and other problems with ring connectivity.

Understanding the Problem

As shown in Figure 9, in a thru FDDI LAN, no stations on the trunk ring have a Configuration State (SMTConfigurationState) of Wrap or Isolated. However, users who complain about network performance may have lost connectivity to other stations on the network because the FDDI network is wrapped or segmented.


Figure 9 Thru Ring

Wrapped ring

By monitoring the Peer Wrap Condition, you can see when the Configuration State changes. In a wrapped ring (Figure 10), two stations on the LAN are in a wrapped Configuration State. This condition may or may not affect the connectivity of certain stations. Although operational, your network may have a cabling problem or a problem with a link.

Figure 10 Wrapped LAN

Segmented ring

In a segmented ring (Figure 11), more than two stations are wrapped on the trunk ring. Although this mode of operation is a valid FDDI LAN configuration, your LAN is probably experiencing a degraded or degrading condition.

Figure 11 Segmented Ring

When a network connection has excessively high link errors, Station Management (SMT) shuts down the connection and tries to reconnect. A dual-attachment trunk ring station with an A or B connection that is shut down is one of the wrap points in the network. See Making Your FDDI Connections More Resilient for information about keeping a dual-attachment station connection from wrapping.


Isolated station

Sometimes a network wraps a particular station out of the ring. Stations on either side of a problem station can be wrapped. This effectively isolates the station or links that have problems, as shown in Figure 12.

Figure 12 Wrapped Ring with Isolated Station

If a ring was already wrapped when a network wraps a station out of the ring, then a segmented ring results, as shown in Figure 13.

Figure 13 Segmented Ring with Isolated Stations

Twisted ring

In a twisted ring, an A port is connected to an A port and a B port is connected to a B port, instead of the normal A-to-B connections. A twisted ring, which always has two twist points (stations), can exist in either a Thru or Wrap state. You can monitor the Twisted Ring Condition and Undesired Connection Attempt Event for evidence of twisted ring and other connection problems.

Identifying the Problem

To identify the problem, follow this process:

1. At the FDDI LAN level, verify that your network is operating.

If the network is operating, the FDDI ring may be segmented, and therefore an FDDI station or an Ethernet station on an Ethernet link may have lost connectivity to other nodes on the network.

2. Determine if a ring is in a Thru, Wrap, or Segmented state.

If the FDDI ring is segmented or wrapped, look for a problem with a link somewhere in the network or for a nonfunctioning node on your trunk ring. If the ring is operating and is not segmented, or if it is segmented but you still have connectivity to the stations in question, move to a more specific level in your network.

See Monitoring FDDI Connections for more information.

3. Determine if the poorly performing station is an Ethernet or FDDI station.

If the problem is an FDDI station, find out if it is congested (that is, if the station is so busy that it cannot accept all the network traffic that is directed to it) by determining its Bandwidth Utilization. Also determine if the station has a high frame error rate by looking at the FDDI Ring Errors.

If the problem is an Ethernet station, look for congestion by examining Ethernet Packet Loss and Bandwidth Utilization.
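Step 2 of the process above can be expressed as a count of wrapped stations, following the descriptions earlier in this section: no wrapped or isolated stations on the trunk ring means Thru, two wrapped stations mean a wrapped ring, and more than two mean a segmented ring. The sketch below assumes you have already collected the Configuration State of each trunk ring station as a plain string; the function name and the "Indeterminate" result for other combinations are assumptions made for the example.

```python
def classify_ring(config_states):
    """Classify an FDDI trunk ring from its stations' Configuration States.

    `config_states` is a list of strings such as "Thru", "Wrap", or
    "Isolated", one per trunk ring station (an assumed input format).
    """
    wrapped = config_states.count("Wrap")
    isolated = config_states.count("Isolated")
    if wrapped == 0 and isolated == 0:
        return "Thru"
    if wrapped == 2:
        return "Wrapped"
    if wrapped > 2:
        return "Segmented"
    # Combinations that this section does not describe.
    return "Indeterminate"


if __name__ == "__main__":
    print(classify_ring(["Thru", "Thru", "Thru", "Thru"]))
    print(classify_ring(["Wrap", "Thru", "Wrap", "Thru"]))
    print(classify_ring(["Wrap", "Wrap", "Wrap", "Wrap"]))
```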

Solving the Problem

Identify the station that is causing the disconnection and take the appropriate steps:

• If the disconnection is caused by a wrapped ring, then fix the hardware or cabling problem at that station.

• If the station is congested, you have a device problem rather than a network problem. For example, if the congested station is a file server and every other machine on the network is retrieving and saving files using that server, consider upgrading your server or adding additional servers to the network. A variety of devices from different vendors may be communicating on an FDDI or Ethernet network; some are faster and more capable, and some are slower and more prone to congestion.

• If the station is an Ethernet station that is attached to an Ethernet segment, reevaluate the setup of your Ethernet network and make some changes to improve its performance.

You can also make FDDI connections more resilient by implementing dual homing or installing an Optical Bypass Unit (OBU) where FDDI connections are prone to fail. See Making Your FDDI Connections More Resilient for more information.


Monitoring FDDI Connections

Monitor your FDDI devices for Warning or Critical alerts in the FDDI Status tool.

Status Watch

Use Status Watch to identify these FDDI connectivity errors:

• Peer Wrap Condition
• Twisted Ring Condition
• Undesired Connection Attempt Event

Follow these steps:

1. In the Device area, select the device that is located where you suspect an FDDI ring connectivity problem.

2. Monitor the FDDI Status tool for the currently selected device.

Here are some pointers for monitoring:

• If the Peer Wrap Configuration State variable is Isolated, the device is not connected to the FDDI trunk ring. If you intend the device to remain isolated, this indication is not a serious condition. However, if the device is supposed to be connected on a trunk ring, a serious problem may exist. The device is no longer transmitting packets to the larger trunk ring.

• If the Peer Wrap flag (SMTPeerWrapFlag) is set, the device is one of the wrap points. The cause of the wrapped ring is somewhere in the portion of the network between the two stations that report the peer wrap condition.

Making Your FDDI Connections More Resilient

When devices are removed from an FDDI ring, there is a break in the fiber path that causes the ring to wrap until the ring is made whole again. To prevent the break in the FDDI connection, you can implement dual homing or install an Optical Bypass Unit (OBU).

Implementing Dual Homing

When the operation of a dual attachment node is critical to your network, dual homing adds reliability by providing a backup connection if the primary link fails. Because a dual attachment station (DAS) has two attachments to the FDDI ring (A-to-M and B-to-M), you can use one of them as a standby link if the active link fails. Using dual homing, only one of the two attachments is active at a time. In this sense, a DAS acts as if it is a single attachment station (SAS) by using its A port as the standby link.

Through SMT, a DAS can be dual homed to the same concentrator or, more commonly, to two concentrators. This arrangement provides a more stable trunk ring of concentrators. If one concentrator fails, the DAS enables the standby link to another concentrator to become the active link. See Figure 14.

If the station is a dual path or dual path/dual MAC station, you can configure the dual-homed station in one of two ways:

• With both links active
• With one link active and one connection withheld as a backup, only becoming active when one link fails

Figure 14 Dual Homing Configuration

Installing an Optical Bypass Unit

You can insert an Optical Bypass Unit (OBU) into the FDDI ring as if it were a node and then plug your device into it. To use an OBU, your device needs an optical bypass interface. This interface lets the bypass know whether your device is still on the ring or not. See Figure 15.

If your device is removed or if it fails, the bypass unit diverts the optical path away from your device, keeping the ring whole. You can use a bypass on devices that are prone to failure or are likely to be removed often, such as diagnostic equipment.


Figure 15 Optical Bypass Unit Configuration

FDDI Connectivity Reference

This section explains terms that are relevant to FDDI connectivity and provides additional conceptual and problem analysis detail.

Peer Wrap Condition

A Peer Wrap (wrapped ring) condition occurs when a dual-attachment station detects a fault (often a lost connection) and reconfigures the network by wrapping the dual trunk rings to form a single ring. Normally, the two stations that are adjacent to the fault wrap to maintain full connectivity. However, if a second fault occurs before the first is repaired, the network partitions itself into two or more rings and stations lose connectivity.

When a station reports a Peer Wrap condition, locate and repair the problem that caused the station to wrap the rings. Potential causes include:

• Faulty FDDI port hardware
• Faulty cables or connectors
• Unplugged connectors
• Powered-down stations

You can expect to find the cause of the problem between the two stations that report the Peer Wrap condition.


Twisted Ring Condition

A Twisted Ring condition occurs when certain undesirable connection types exist. See Table 7 for more information. Although similar to the Undesired Connection Attempt, the Twisted Ring condition provides specific Station Management (SMT) and port information for diagnosis.

Undesired Connection Attempt Event

An Undesired Connection Attempt event occurs when a port tries to connect to another port of a type that may result in an undesirable network topology. Whether the connection attempt is successful depends on the current setting of the station's connection policies.

Table 7 lists connections that the FDDI standard defines as undesirable. The managed devices may or may not permit these connections, depending on their FDDI station configurations.

Table 7 Undesirable Connection Types

Connection Type   Reason That the Connection Is Undesirable
A-A               Twisted primary and secondary rings
A-S               A wrapped ring
B-B               Twisted primary and secondary rings
B-S               A wrapped ring
S-A               A wrapped ring
S-B               A wrapped ring
M-M               A tree of rings topology (illegal connection)

Table 8 lists FDDI connections that create valid topologies.

Table 8 Valid Connection Types

Connection Type   Reason That the Connection Is Valid
A-B               A normal trunk peer connection
A-M               A tree connection with possible redundancy. In a single MAC node, Port B has precedence (by default) for connecting to a Port M.
B-A               A normal trunk ring peer connection
B-M               A tree connection with possible redundancy. In a single MAC node, Port B has precedence (by default) for connecting to a Port M.
S-S               A single ring of two slave stations
S-M               A normal tree connection
M-A               A tree connection that provides possible redundancy
M-B               A tree connection that provides possible redundancy
M-S               A normal tree connection
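Tables 7 and 8 together cover every pairing of the A, B, S, and M port types, so they can be encoded as a simple lookup. The sketch below is only a convenience for checking a planned connection against the tables; the function name is hypothetical, and whether a device actually permits an undesirable connection still depends on its connection policies.

```python
# Reasons copied from Table 7 (undesirable) and Table 8 (valid).
UNDESIRABLE = {
    "A-A": "Twisted primary and secondary rings",
    "A-S": "A wrapped ring",
    "B-B": "Twisted primary and secondary rings",
    "B-S": "A wrapped ring",
    "S-A": "A wrapped ring",
    "S-B": "A wrapped ring",
    "M-M": "A tree of rings topology (illegal connection)",
}
VALID = {
    "A-B": "A normal trunk peer connection",
    "A-M": "A tree connection with possible redundancy",
    "B-A": "A normal trunk ring peer connection",
    "B-M": "A tree connection with possible redundancy",
    "S-S": "A single ring of two slave stations",
    "S-M": "A normal tree connection",
    "M-A": "A tree connection that provides possible redundancy",
    "M-B": "A tree connection that provides possible redundancy",
    "M-S": "A normal tree connection",
}


def check_connection(local_port: str, remote_port: str):
    """Return ("valid" or "undesirable", reason) for a pair of port types."""
    key = f"{local_port}-{remote_port}".upper()
    if key in UNDESIRABLE:
        return ("undesirable", UNDESIRABLE[key])
    if key in VALID:
        return ("valid", VALID[key])
    raise ValueError(f"unknown port types: {key}")


if __name__ == "__main__":
    print(check_connection("A", "A"))
    print(check_connection("S", "M"))
```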


Disabling the Offending Interface

Because broadcast storms can ultimately cause your whole network to become unavailable, take action immediately to disable the offending interface. You can enable the interface again after you have corrected the problem.

Address Tracker

Use Address Tracker to locate the interface that is causing the broadcast storm. Use Device View to disable the port.

Follow these steps:

1. In the Find Address window, enter the address of the interface that seems to be receiving the broadcast traffic.

You can copy the MAC or IP address from the Status Watch report and paste it into Address Tracker's Enter the Address You Want to Find field.

2. Click Find Now.

Search displays the device name.

3. Use Transcend Central to launch Device View and disable the port.

Disabling the port stops the broadcast storm before it interferes with all vital network traffic. You can re-enable this interface using Device View or the device's console later.

Correcting Spanning Tree Misconfigurations

Spanning Tree does not cause broadcast storms, but a loop in your Spanning Tree topology can create data that looks like a storm. A loop can occur in your topology if:

• Someone disables Spanning Tree on a port
• You set up your Spanning Tree configuration incorrectly
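A loop in a bridged topology is a cycle in the graph formed by your switches and the links between them. If you keep a list of active links (for example, from your network map), the sketch below reports the first link that closes a cycle. It is a generic union-find check written for illustration, not a description of how the Spanning Tree Protocol itself works, and the device names are made up.

```python
def find_loop_link(links):
    """Return the first link that closes a loop, or None if there is no loop.

    `links` is a list of (device_a, device_b) pairs describing active,
    forwarding connections between bridges or switches.
    """
    parent = {}

    def find(node):
        # Follow parent pointers to the representative of the node's group.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # shorten the path as we go
            node = parent[node]
        return node

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            return (a, b)  # both ends are already connected: this link is a loop
        parent[root_a] = root_b
    return None


if __name__ == "__main__":
    # Hypothetical topology in which the third link creates a loop.
    print(find_loop_link([("sw1", "sw2"), ("sw2", "sw3"), ("sw3", "sw1")]))
```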


Device View

Use Device View to disable any Spanning Tree port that has a repeater attached to it and to correct Spanning Tree misconfigurations.

To correct Spanning Tree misconfigurations, use Device View to disable Spanning Tree Protocol (STP) for a port on a SuperStack® II Switch 1000, Switch 3000, Switch 3000 10/100, Switch 9000SX, Desktop Switch, LinkBuilder® FMS II Bridge/Management Module, or CoreBuilder 6000.

To disable the STP port state for a port on a SuperStack II switch:

1. Select a port and click the right mouse button.

2. From the shortcut menu, select Configure.

3. In the Port section, click the STP tab.

4. From the STP Port State list box, select Disabled.

5. Click Apply.

To disable the STP port state for a port on a LinkBuilder FMS II Bridge/Management Module:

1. Double-click the module.

2. From the shortcut menu, select Configure Bridge.

3. In the Port section, click the STP tab.

Broadcast Storms Reference

This section explains terms that are relevant to broadcast storms and provides additional conceptual and problem analysis detail.

Broadcast Packets

Broadcast packets, which are a normal part of network operation, are transmitted by a device to a broadcast address. For example, IP networks use broadcasts to resolve network addresses using Address Resolution Protocol (ARP). IPX networks use a large number of broadcast packets to operate most effectively.

Problems arise when broadcast packets endlessly propagate throughout the network, which increases the traffic volume on your network and the CPU time that each host spends processing and discarding unwanted broadcast packets.

Multicast Packets

Multicast packets, which are a normal part of network operation, are transmitted by a device to a multicast group address. Hosts that want to receive the packets indicate that they want to be members of the multicast group, and then multicast packets are distributed to that group. For example, multicast packets support the Spanning Tree Protocol. Multicast applications and underlying multicast protocols control multimedia traffic and shield hosts from processing unnecessary broadcast traffic. However, multicast traffic can also cause storms that saturate your network.

Duplicate Addresses

Use these sections to identify and correct problems caused by duplicate MAC and IP addresses:

• Duplicate Addresses Overview
• Finding Duplicate MAC Addresses
• Finding Duplicate IP Addresses

See Duplicate Addresses Reference for additional conceptual and problem analysis detail.

Networks sometimes generate duplicate MAC and IP addresses. Because duplicate addresses can cause problems with packet delivery, resolve them as soon as possible.

Understanding the Problem

Duplicate MAC addresses are caused by data link layer problems with Fiber Distributed Data Interface (FDDI) media and the passing of tokens on the FDDI ring. Duplicate IP addresses are caused by network layer problems. See these sections for more information about causes of duplicate addresses:

• Duplicate MAC Addresses
• Duplicate IP Addresses

Identifying the Problem

Identify duplicate MAC and IP addresses by following the instructions in these sections:

• Finding Duplicate MAC Addresses
• Finding Duplicate IP Addresses

Solving the Problem

Identify the cause of the duplicate address (such as user error or a hardware problem) and fix the problem if possible.

Finding Duplicate MAC Addresses

To find out if duplicate MAC addresses are occurring, monitor your network using Status Watch.

Status Watch

The Status Watch FDDI Status tool identifies duplicate FDDI MAC addresses, and Status Watch reports when two or more MACs on the same ring have the same MAC address (a Duplicate Address condition).

Follow these steps:

1. In the Status Watch Summary View window, determine if any FDDI Status conditions are reported. If there are, double-click the table cell value to display the Device List window.

Another approach is to examine only the devices that you know reside on your FDDI ring. In the Status Watch main window, red device icons indicate that a threshold has been exceeded.

2. Select a device.

• If you selected the device from the Device List window, the real-time report for that device appears in the Status Watch main window.

• If you selected the device from the main window, also select the FDDI Status tool to view the real-time report.

3. Determine if a Duplicate Address condition caused the FDDI Status tool to trigger a Critical or Warning status for that device.

In Status Watch, you can specify the status severity level to apply to a Duplicate Address condition.

Finding Duplicate IP Addresses

To find out if duplicate IP addresses are occurring, monitor your network using these applications:

• Address Tracker - To find duplicate IP addresses on 3Com devices and their attached networks

• LANsentry Manager® - To find duplicate IP addresses that are collected by probes gathering RMON2 SmartAgent® data from the Enterprise Communications Analysis Module (ECAM) downloaded on your network devices

Address Tracker

Use Address Tracker to determine when and where duplicate IP addresses occur.

Follow these steps:

1. From the Find Address menu, select Find Duplicate IP Addresses.

2. Click Find Now to start your search.

LANsentry Manager

Use the Duplicates table in LANsentry Manager to compile a list of all stations with duplicate IP addresses. This table is available only on probes that have downloaded RMON2 (ECAM) SmartAgent software.

Follow these steps:

1. From the LANsentry Manager Address Map menu, select Duplicates. Address Map data is displayed as a table.

2. To export the contents of the table, click Export to launch the Data Export dialog box.

Duplicate Addresses Reference

This section explains terms that are relevant to duplicate addresses and provides additional conceptual and problem analysis detail.

Duplicate MAC Addresses

Each device on your network has a unique MAC address. This address identifies a single device on the network, allowing packets to be delivered to correct destinations.

Packets are delivered to their destinations by means of MAC-address-to-IP-address translation that the Address Resolution Protocol (ARP) provides. Therefore, if MAC addresses are duplicated on the network, ARP caches of routing devices contain erroneous destinations. In FDDI, devices monitor network traffic, searching for their own MAC address in each packet to determine whether to decode the packet. If MAC addresses are not unique, two stations cannot be distinguished from each other.

Duplicate MAC addresses can occur for the following reasons:

• Someone has manually configured a MAC address for a device instead of using the address that the vendor supplied or allowing it to be assigned dynamically, and this address is also assigned to a different device.

• In rare circumstances, loops in a bridged network can cause a MAC hardware problem or an address learning problem that creates a duplicate MAC address entry in the bridging address table.

• On DECnet Phase 4 networks, MAC addresses are set from the DECnet address. A duplicate NET address can cause a duplicate MAC address.

Duplicate IP Addresses

Because IP addresses are critical for transmission of packets on TCP/IP networks, resolve duplicate IP addresses immediately.

Duplicate IP addresses can occur when someone has configured an IP address that is identical to an IP address that is assigned to a different device. Address assignments, although possible for you to configure manually, are usually made using one of these protocols:

• Dynamic Host Configuration Protocol (DHCP) - Allows your network to dynamically assign IP addresses to nodes. With this protocol, a DHCP server temporarily assigns an IP address to a node, or you can statically configure addresses as needed.

• BOOTstrap protocol (BootP) - Allows you to statically assign IP addresses to nodes. This protocol is more efficient than RARP.

• Reverse ARP (RARP) - Allows you to statically assign IP addresses to nodes. However, because this protocol relies on the MAC address to identify the node, you cannot use it on networks that dynamically assign hardware addresses.
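If you have an exported address list, such as IP-to-MAC pairs copied from ARP caches or from your device inventory, a duplicate IP address shows up as one IP address paired with more than one MAC address. The following is a minimal sketch of that check; the input format, function name, and addresses are assumptions for illustration and are not tied to the export format of any application named in this guide.

```python
from collections import defaultdict


def find_duplicate_ips(address_map):
    """Return IP addresses that are claimed by more than one MAC address.

    `address_map` is an iterable of (ip_address, mac_address) pairs.
    The result maps each duplicated IP address to its sorted MAC addresses.
    """
    macs_by_ip = defaultdict(set)
    for ip, mac in address_map:
        macs_by_ip[ip].add(mac.lower())  # normalize case before comparing
    return {ip: sorted(macs) for ip, macs in macs_by_ip.items() if len(macs) > 1}


if __name__ == "__main__":
    # Made-up entries: 10.1.1.20 is claimed by two different adapters.
    entries = [
        ("10.1.1.10", "00:a0:24:12:34:56"),
        ("10.1.1.20", "00:a0:24:aa:bb:cc"),
        ("10.1.1.20", "00:A0:24:DD:EE:FF"),
        ("10.1.1.10", "00:A0:24:12:34:56"),
    ]
    print(find_duplicate_ips(entries))
```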


Ethernet Packet Loss

Use these sections to identify and correct Ethernet packet loss:

• Ethernet Packet Loss Overview
• Searching for Packet Loss

See Ethernet Packet Loss Reference for additional conceptual and problem analysis detail.

If your Ethernet network shows signs of congestion, it may be experiencing packet loss. When your network is congested, utilization is usually high, packets are discarded because buffers are full, and collision rates are up. Problems related to Collisions are often at the heart of packet loss.

Understanding the Problem

Collisions are normal in Ethernet networks. In many cases, Collision rates of 50 percent do not cause a large decrease in throughput. The Collision rate helps mark the upper limit on your network (the maximum percentage of collisions that your network can bear), which is usually around 70 percent. If Collisions increase above this upper limit, your network can become unreliable.

When the Collision rates increase, so do Excessive Collisions, which cause a delay in transmitting data. An increase in Collisions also indicates that network utilization and network errors, such as FCS Errors, are probably increasing.

The real packet problems to watch for, however, are undetected collisions that show up as Late Collisions.
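The 50 percent and 70 percent figures above can be turned into a quick check. This is only a sketch: the text does not define how the Collision rate is calculated, so the formula here (collisions as a percentage of transmitted packets) is an assumption, and the function names are hypothetical.

```python
def collision_rate(collisions: int, packets_sent: int) -> float:
    """Collisions as a percentage of transmitted packets (assumed definition)."""
    if packets_sent <= 0:
        raise ValueError("packets_sent must be positive")
    return 100.0 * collisions / packets_sent

def assess(rate_pct: float, upper_limit_pct: float = 70.0) -> str:
    """Compare a collision rate with the figures quoted in the text."""
    if rate_pct > upper_limit_pct:
        return "above upper limit: network can become unreliable"
    if rate_pct >= 50.0:
        return "high but often tolerable: watch Excessive and Late Collisions"
    return "normal"

print(assess(collision_rate(120, 1000)))
```

Treat `upper_limit_pct` as a starting point; the text notes that the limit is "usually around" 70 percent, so the right value depends on your own network.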

Identifying the Problem

To confirm that your network's problem is related to packet loss, verify that frames are being dropped on your network by examining this packet loss data:

• Alignment Errors
• Collisions
• Excessive Collisions
• FCS Errors
• CRC Errors
• Late Collisions
• Receive Discards
• Too Long Errors
• Too Short Errors
• Transmit Discards

The process of identifying the problem is discussed in Searching for Packet Loss.

Solving the Problem

If you notice that packet loss data is consistently high, then your network is too congested. In this case, segment your network with the appropriate network device (such as a switch or router). If Collision data shows increases but your network's utilization is the same, then your network may have a physical problem, such as cabling that is too long. Other problems that packet loss data can indicate include:

• Faulty connectors or improper cabling
• Excessive numbers of repeaters between network devices
• Defective Ethernet transceivers or controllers

Possible solutions to these problems are explained in the procedures in Searching for Packet Loss.

Searching for Packet Loss

When you look for packet loss, use the following applications:

• Status Watch - For Ethernet and MIB-II data collection using SNMP polling

• LANsentry Manager Network Statistics Graph - For RMON data collection using an RMON probe

• Device View - On a per-device basis, you can evaluate statistics for any port on the device

Status Watch

Status Watch monitors:

• Alignment Errors
• Excessive Collisions
• FCS Errors
• Receive Discards
• Transmit Discards

Follow these steps:

1. Determine if the thresholds for the Alignment Errors tool and FCS Errors tool are being exceeded.

Table 16 identifies the problems that this data can indicate and your possible actions. For information about problems related to a nonstandard Ethernet implementation, see Nonstandard Ethernet Problems.

Table 16. Alignment Errors, FCS Errors, and CRC Errors Data

Possible Problem: Faulty cabling
Possible Action: Examine the cable and cable connections for breaks or damage.

Possible Problem: Network noise
Possible Action: Look for improper cabling, faulty cable, faulty network equipment, or cables that are too close to equipment that emits electromagnetic interference (lamps, for example).

Possible Problem: Faulty transceiver
Possible Action: Use an analyzer to identify the problematic transceiver. If necessary, replace the transceiver, network adapter, or station.

Possible Problem: Fault at the transmitting end station
Possible Action:
1. Locate the source of the errors by looking at the module and port statistics.
2. Verify the correct operation of the transceiver or adapter card of the device that is connected to the problem port.
3. If the card appears to be operating correctly, examine the cable and cable connections for breaks or damage.

Possible Problem: Station powering up or down
Possible Action: None required.
Early implementations of Ethernet transceivers generate a significant amount of in-band noise when powering up; they frequently cause Alignment Errors and FCS Errors in an otherwise stable network.
When powering up, some software drivers for Ethernet controllers also initiate Time Domain Reflectometry (TDR) tests to test the Ethernet media. Network monitors report TDR tests as Alignment Errors and FCS Errors.

Possible Problem: Faulty adapter
Possible Action: Replace the adapter.

2. Determine if the Excessive Collisions tool threshold is being exceeded. Table 17 identifies the problems that this data can indicate and your possible actions.

Table 17. Collisions and Excessive Collisions Data

Possible Problem: Busy network
Possible Action: Use a bridge, router, or switch to reconfigure your network into segments with fewer stations.

Possible Problem: Faulty device (adapter, switch, hub, and the like) that does not listen before broadcasting. This problem increases the incidence of all types of collisions.
Possible Action: Isolate each adapter to see if the problem stops.

Possible Problem: Network loop
Possible Action: Ensure that no redundant connections to the same station have both connections active simultaneously.


3. Determine if the Receive Discards and Transmit Discards tools' thresholds are being exceeded.

If these errors are high in conjunction with the data that you learned in steps 1 and 2, then your network is overloaded. Segment your network.

LANsentry Manager Network Statistics Graph

Use the LANsentry Manager Network Statistics graph to view data for:

• Collisions
• Late Collisions
• Bandwidth Utilization
• CRC Errors
• Too Long Errors
• Too Short Errors

Follow these steps:

1. Display a Network Statistics graph for the local Ethernet segment on which users have reported poor performance.

This graph shows the most recent trend in Collision rates. If you have set up a History sample, you can also look at the historical trend. If a number of segments are connected by repeaters, examine the graph for each Ethernet segment.

2. Analyze Utilization and Collision rates to determine whether collisions are caused by an overloaded segment or a faulty component.

• If Utilization rates are high - The collisions are probably caused by an overloaded segment. If you have added nodes or new applications to your network, consider reconfiguring the cabling system, using bridges and routers to filter out remote collisions and to keep local traffic on one segment. This action should level the network load.

• If Utilization rates are stable and appear normal - The collisions are probably caused by faulty components. In this case, do the following:

  • If the network consists of repeaters - Compare the Network Statistics graphs for each segment connected to the repeater. Because repeaters repeat traffic across all connected segments (which makes many segments seem like one network), you should see similar levels of traffic on all segments. One segment that shows dissimilar levels of traffic and collisions may indicate faulty hardware. In this case, monitor several collisions to track the source station that is transmitting too soon after collisions, and repair the station. Packets that are transmitted too soon after collisions are unlikely to be valid. See Table 17 for more information about Collisions.

  • On other networks - Determine the segment cable length.

3. Examine the CRC Errors and Late Collisions, which often indicate cabling or component problems.

Table 16 identifies the problems that CRC Errors can indicate and your possible actions. Table 18 identifies the problems that Late Collisions data can indicate and your possible actions.

Table 18. Late Collisions Data

Possible Problem: Cabling problems
• Segment too long
• Failing cable
• Segment not grounded properly (noise)
• Improper termination
• Taps too close (10BASE-5 and 10BASE-2 only)
• Noisy cable
Possible Action: Correct the cabling problem by doing one or more of the following:
• Reduce the segment length
• Replace the cable
• Ground the cable
• Terminate the cable correctly
• Check the taps
• Check for cables too close to equipment that emits electromagnetic interference

Possible Problem: Component problems
• Deaf or partially deaf node
• Failing repeater, transceiver, or controller cards
Possible Action: Correct the component problem by doing one of the following:
• Trace the failing component and replace it
• Replace the NIC or the transceiver

4. Trace Too Short Errors and Too Long Errors to the sender.

These errors often indicate faulty routers or LAN drivers and transceiver problems. Table 19 identifies the problems that this data can indicate and your possible actions.

Table 19

Possible Problem: A transceiver on your network is adding bits to the packets that are transmitted by the attached station.
Possible Action:
1. Use a network analyzer to identify the problematic transceiver.
2. If necessary, replace the transceiver, network adapter, or station.

Possible Problem: The jabber protection mechanism on a transceiver has failed; it can no longer protect the network from the jabbering produced by the attached station.
Possible Action: Replace the network card.

Possible Problem: Excessive noise on the cable.
Note: Some 10/100 Mbps cards that autodetect the network speed may connect to the network at the wrong speed, causing excessive noise.
Possible Action: Check for improper cabling, faulty cable, faulty network equipment, or cables too close to noisy electronic equipment (lamps, for example).
If your network card autodetects the network speed and you have ruled out other problems, manually configure the network speed.

Possible Problem: Faulty routers (two different network types are connected and the router is not enforcing proper frame size restrictions)
Possible Action: Notify the manufacturer.

Possible Problem: Faulty LAN driver
Possible Action: Replace the driver.

Possible Problem: A normal condition on a LinkSwitch® 1000, LinkSwitch® 3000, or CoreBuilder 5000 FastModule.
If you use maximum-sized (1518-byte) Ethernet frames, the device's VLT-enabled ports add a frame tag of 4 bytes, resulting in a misleading Too Long Frame error.
Possible Action: These frames are passed successfully but will create the Too Long Frame error message.
If you want to eliminate the error message, reduce your Ethernet packet frames by 4 bytes.
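The repeater comparison described in step 2 above (similar traffic is expected on every segment attached to a repeater, so a dissimilar segment is suspect) can be sketched as a simple outlier check. The segment names, the sample counts, and the 50 percent tolerance are hypothetical choices for illustration, not values from LANsentry Manager.

```python
from statistics import median

def dissimilar_segments(collisions_by_segment, tolerance=0.5):
    """Return segments whose collision count differs from the median by more
    than `tolerance` (a fraction of the median; 0.5 is an arbitrary choice)."""
    if not collisions_by_segment:
        return []
    mid = median(collisions_by_segment.values())
    if mid == 0:
        # Most segments are collision free, so any collisions stand out
        return sorted(s for s, c in collisions_by_segment.items() if c > 0)
    return sorted(
        seg for seg, count in collisions_by_segment.items()
        if abs(count - mid) / mid > tolerance
    )

# Hypothetical collision counts read from the graph for each segment
samples = {"seg-a": 410, "seg-b": 395, "seg-c": 1320, "seg-d": 402}
print(dissimilar_segments(samples))
```

The median is used rather than the mean so that one badly behaved segment does not pull the baseline toward itself.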

Device View

Device View allows you to display a variety of port and device-level statistics relevant to Ethernet packet loss. Table 20 describes these statistics and their use in troubleshooting.

Table 20. Activity and Error Statistics in Device View

Statistics Group: Activity
Description: Displays the total network activity and errors on the selected port.
Use in Troubleshooting: This data shows readable packets, broadcast packets, Collisions, total errors, and runts, which cause Too Short Errors. You can interpret this data in the following ways:

• Runts are often caused by Collisions; however, if the values increase at specific times of the day, you may need to change the network topology to manage the traffic more efficiently (for example, with switches or routers).

• Runts can also be caused by a badly terminated coax cable.

• Large numbers of runts not associated with high levels of collisions can indicate a transmission problem (examine the cable).

• Particularly high numbers of Collisions compared to the total number of readable packets can point to a hardware problem (a bad adapter) or to a data loop.

• A high proportion of Broadcast Packets (>10%) on a heavily utilized network (>50% of available bandwidth) can point to an incorrectly configured bridge or router on the network.

Statistics Group: Errors
Description: Displays the number of frames with errors on the selected port.
Use in Troubleshooting: The significance of errors depends on accompanying errors and prevailing network conditions. See the following error data for more information:

• Alignment Errors (Table 16)
• FCS Errors (Table 16)
• Too Long Errors (Table 19)
• Too Short Errors, or runts (Table 19)
• Late Collisions (Table 18)

To display Activity and Errors statistics for a device or port, follow these steps:

1. Select the required port or device.

2. From the shortcut menu, select Activity or Errors.

The statistics available depend on the type of port or device selected. See Table 20 for troubleshooting information.

Ethernet Packet Loss Reference

This section explains terms that are relevant to Ethernet packet loss and provides additional conceptual and problem analysis detail.

Alignment Errors

An Alignment Error indicates a received frame in which both of the following are true:

• The number of bits received is an uneven byte count (that is, not an integral multiple of 8).

• The frame has a Frame Check Sequence (FCS) error.

Alignment Errors often result from MAC layer packet formation problems, cabling problems that cause corrupted or lost data, and packets that pass through more than two cascaded multiport transceivers. See FCS Errors for more information about interpreting Alignment Errors.


Collisions

Collisions indicate that two devices detect that the network is idle and try to send packets at exactly the same time (within one round-trip delay). Because only one device can transmit at a time, both devices must stop sending and attempt to retransmit. Collisions are detected by the transmitting stations.

The retransmission algorithm helps to ensure that the packets are not retransmitted at the same time. However, if the two devices retry at nearly the same time, packets can collide again; the process repeats until either the packets finally pass onto the network without collisions, or 16 consecutive collisions occur and the packets are discarded.
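The text does not name the retransmission algorithm. The sketch below assumes it is the truncated binary exponential backoff that IEEE 802.3 specifies for CSMA/CD, in which a station waits a random number of slot times after each collision and gives up after 16 attempts. The function names and the `collides` callback are hypothetical and exist only to make the example self-contained.

```python
import random

ATTEMPT_LIMIT = 16  # after 16 consecutive collisions the frame is discarded
BACKOFF_LIMIT = 10  # the backoff window stops doubling after 10 collisions

def backoff_slots(collision_count: int, rng=random) -> int:
    """Pick how many slot times to wait after the n-th consecutive collision."""
    if not 1 <= collision_count < ATTEMPT_LIMIT:
        raise ValueError("no backoff is defined for this collision count")
    k = min(collision_count, BACKOFF_LIMIT)
    return rng.randrange(2 ** k)  # uniform in 0 .. 2**k - 1

def transmit(collides, rng=random):
    """Try to send one frame. `collides(attempt)` says whether that attempt
    collided. Returns total slot times waited, or None if the frame is dropped."""
    waited = 0
    for attempt in range(1, ATTEMPT_LIMIT + 1):
        if not collides(attempt):
            return waited
        if attempt < ATTEMPT_LIMIT:
            waited += backoff_slots(attempt, rng)
    return None  # 16 consecutive collisions: counted as an Excessive Collision

# A frame that collides on its first three attempts and then gets through
print(transmit(lambda attempt: attempt <= 3))
```

Because each station draws its own random delay from a window that doubles after every collision, two stations that collided once are increasingly unlikely to retry in the same slot.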

CRC Errors

A Cyclic Redundancy Check (CRC) Error is an RMON statistic that combines FCS Errors and Alignment Errors. These errors indicate that packets were received with:

• A bad FCS and an integral number of octets (FCS Errors)
• A bad FCS and a non-integral number of octets (Alignment Errors)

CRC Errors can cause an end station to freeze. If a large number of CRC Errors are attributed to a single station on the network, replace the station's network interface board. Typically, a CRC Error rate of more than 1 percent of network traffic is considered excessive.

Excessive Collisions

Excessive Collisions indicate that 16 consecutive collisions have occurred, usually a sign that the network is becoming congested. For each excessive collision count (that is, after 16 consecutive collisions), a packet is dropped. If you know the normal rate of excessive collisions, then you can determine when the rate of packet loss is affecting your network's performance. See Knowing Your Network's Configuration for more information.

FCS Errors

Frame Check Sequence (FCS) Errors, a type of CRC Error, indicate that frames received by an interface are an integral number of octets long but do not pass the FCS check. The FCS is a mathematical way to ensure that all of the frame's bits are correct without having the system examine each bit and compare it to the original. Packets with Alignment Errors also generate FCS Errors.

Both Alignment Errors and FCS Errors can be caused by equipment powering up or down or by interference (noise) on unshielded twisted-pair (10BASE-T) segments. In a network that complies with the Ethernet standard, FCS or Alignment Errors indicate bit errors during a transmission or reception. A very low rate is acceptable. Although Ethernet allows a 1 in 10^8 bit error rate, typical Ethernet performance is 1 in 10^12 or better.
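To relate these bit error rates to frame errors, the following sketch estimates how often a maximum-sized frame would be damaged. It assumes bit errors are independent of one another, which real noise bursts often are not, so treat the result as a rough guide only; the function name is hypothetical.

```python
import math

def frame_error_probability(bit_error_rate: float, frame_octets: int) -> float:
    """Probability that at least one bit of a frame is corrupted,
    assuming independent bit errors."""
    bits = 8 * frame_octets
    # 1 - (1 - ber)**bits, computed so that it stays accurate for tiny rates
    return -math.expm1(bits * math.log1p(-bit_error_rate))

for ber in (1e-8, 1e-12):
    p = frame_error_probability(ber, 1518)
    print(f"bit error rate {ber:g}: about 1 frame in {1 / p:,.0f} damaged")
```

At the allowed rate of 1 in 10^8, this works out to roughly one damaged 1518-octet frame in every eight thousand, which is why a very low but nonzero FCS Error count is acceptable.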

Late Collisions

Late Collisions indicate that two devices have transmitted at the same time, but cabling errors (most commonly, excessive network segment length or repeaters between devices) prevent either transmitting device from detecting a collision. Neither device detects a collision because the time to propagate the signal from one end of the network to the other is longer than the time to put the entire packet on the network. As a result, neither of the devices that cause the late collision senses the other's transmission until the entire packet is on the network.

Although late collisions occur for small packets, the transmitter cannot detect them. As a result, a network suffering measurable Late Collisions for large packets is losing small packets as well.
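The timing argument can be made concrete by comparing how long a frame occupies the wire with how long a collision takes to come back to the sender. This is a sketch under stated assumptions: a 10 Mbps network, and a round-trip delay that you supply yourself (the 60 microsecond figure below is an invented example of an over-long segment, not a measured value).

```python
def transmit_time_us(frame_octets: int, mbps: float = 10.0) -> float:
    """Time to place a frame on the wire, in microseconds (bits / Mbps)."""
    return 8 * frame_octets / mbps

def collision_detectable(frame_octets: int, round_trip_us: float,
                         mbps: float = 10.0) -> bool:
    """True if the sender is still transmitting when news of a collision
    at the far end of the segment could arrive back."""
    return transmit_time_us(frame_octets, mbps) >= round_trip_us

round_trip_us = 60.0  # hypothetical delay on a segment that is too long
for octets in (64, 1518):
    print(octets, transmit_time_us(octets), collision_detectable(octets, round_trip_us))
```

With these numbers, a 64-octet frame is fully transmitted before the collision can be sensed, while a 1518-octet frame is still on the wire and is reported as a Late Collision. That matches the point above: when large packets show Late Collisions, small packets are being lost silently.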

Nonstandard Ethernet Problems

Table 21 lists the symptoms that typically occur if a system violates the Ethernet standard.

Table 21. Symptoms of Common Ethernet Network Problems

Symptoms: FCS Errors and Alignment Errors increase significantly.
Problem: Network cabling is too long.
Notes: If you use a promiscuous network monitor, the number of Late Collisions reported by stations should correlate with the FCS and Alignment Errors reported by the monitor.

Symptoms: FCS and Alignment Errors increase proportionally with interference (sometimes referred to as noise hits).
Problem: Network segment is noisy.
Notes: Typically observed on a 10BASE-T network segment in a noisy environment. If you use multiple promiscuous monitors, the FCS and Alignment Errors among the monitors will not correlate.
If the monitor can track runts, also called Too Short Errors, the number of runt packets should be significantly higher than normal.

Symptoms: FCS and Alignment Errors are much higher than normal.
Problem: Networks do not conform to the access scheme of Carrier Sense Multiple Access with Collision Detect (CSMA/CD).
Notes: Occurs when some implementations of Ethernet in the segment are not entirely compatible with IEEE 802.3 repeaters.
Collision fragments linger on the network long enough to collide with retry packets at the minimum interpacket gap (IPG). The IPG is smaller on one side of the repeated network, causing a lost packet.
Ethernet controllers cannot receive packets that are separated by 4.7 µs or less. Some controllers cannot sustain reception of packets separated by as much as 9.6 µs. If runt packets are received one after another and are followed by a collision fragment, Ethernet controllers that cannot sustain reception will lose packets.

Receive Discards

Receive Discards indicate that received packets could not be delivered to a high-layer protocol because of congestion or packet errors.

Too Long Errors

A Too Long Error indicates that a packet is longer than 1518 octets (including FCS octets) but otherwise well formed. Too Long Errors are often caused by a bad transceiver, a malfunction of the jabber protection mechanism on a transceiver, or excessive noise on the cable.

Too Short Errors

A Too Short Error, also called a runt, indicates that a packet is fewer than 64 octets long (including FCS octets) but otherwise well formed.
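The definitions in this reference fit together as a small decision procedure. The sketch below is one way to express them, assuming you know the number of bits received and whether the FCS check passed; the function name and the choice to report a single category per frame are illustrative assumptions, not how any particular probe counts errors.

```python
MIN_OCTETS = 64    # shorter well-formed frames are Too Short Errors (runts)
MAX_OCTETS = 1518  # longer well-formed frames are Too Long Errors

def classify_frame(bit_count: int, fcs_ok: bool) -> str:
    """Map a received frame to the error category defined in this reference."""
    whole_octets, extra_bits = divmod(bit_count, 8)
    if not fcs_ok:
        # A bad FCS is an Alignment Error when the bit count is not a
        # multiple of 8, and a plain FCS Error otherwise
        return "Alignment Error" if extra_bits else "FCS Error"
    if whole_octets > MAX_OCTETS:
        return "Too Long Error"
    if whole_octets < MIN_OCTETS:
        return "Too Short Error"
    return "OK"

def crc_error_count(categories) -> int:
    """CRC Errors combine FCS Errors and Alignment Errors."""
    return sum(c in ("FCS Error", "Alignment Error") for c in categories)

frames = [(8 * 64, True), (8 * 100 + 3, False), (8 * 100, False), (8 * 1522, True)]
results = [classify_frame(bits, ok) for bits, ok in frames]
print(results, crc_error_count(results))
```

The last sample frame is 1522 octets, the size of a maximum-sized frame with a 4-byte tag added, which is why the tagged frames mentioned in Table 19 are reported as too long even though they are passed successfully.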

Transmit Discards

Transmit Discards indicate that packets were not transmitted because of network congestion.

FDDI Ring Errors

Use these sections to identify and correct FDDI ring errors:

• FDDI Ring Errors Overview
• Identifying Ring Errors

Fiber Distributed Data Interface (FDDI) often corrects its own problems. However, because FDDI cannot correct all errors (especially those related to hardware problems), you should monitor FDDI errors.

Understanding the Problem

FDDI ring errors that you should monitor include:

• Elasticity Buffer Error Condition
• Frame Error Condition
• Frames Not Copied Condition
• Link Error Condition

Identifying the Problem

First, determine the type of FDDI ring errors and where they are occurring. Similar to the way you identify other FDDI problems, identify the upstream and downstream neighbors of the devices that you are monitoring.

Several types of network errors can cause FDDI performance problems. For example, problems with cables or physical connections may result in a link or frame error. Elasticity buffer (EB) errors can also lead to link and frame errors.

FDDI deals with port-related errors as follows:

• The variable PORTlerAlarm is the link error rate (LER) value at which a link connection generates an alarm. When the LER is greater than the alarm setting, Station Management (SMT) sends a Status Report Frame (SRF) to notify you that there is a problem with a port.

The PORTlerAlarm threshold is set lower than the PORTlerCutoff threshold so that you are notified of a problem before the port is actually removed from the ring.

• When link errors reach the threshold defined by the variable PORTlerCutoff, SMT breaks the connection, disabling the PHY that detected the problem. A Link Error Condition is also generated.

FDDI deals with MAC-related errors as follows:

• When MAC frame errors reach a certain threshold, a Frame Error Condition is generated. Because the actual error can be further upstream than the immediate connection, the connection remains intact.

• For a large network, the worst case MACFrameErrorRatio is less than 0.1 percent. However, during network configuration, frame error ratios can reach 50 percent for short periods. When you detect a sustained frame error ratio of more than 0.1 percent, a problem exists between the station that is reporting the condition and the nearest upstream MAC.
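The two threshold schemes above can be sketched as follows. Everything here is illustrative: the function names are hypothetical, link error rates are treated as plain numbers where larger means more errors (a device may encode them differently), and "sustained" is interpreted as three consecutive samples, which is an assumption rather than a value from the FDDI standard.

```python
def port_status(ler: float, alarm: float, cutoff: float) -> str:
    """Compare a link error rate with the alarm and cutoff settings.
    The alarm threshold is expected to be lower than the cutoff threshold."""
    if alarm >= cutoff:
        raise ValueError("alarm threshold should be below the cutoff threshold")
    if ler >= cutoff:
        return "cutoff"  # SMT breaks the connection; Link Error Condition
    if ler > alarm:
        return "alarm"   # SMT sends a Status Report Frame
    return "ok"

def sustained_frame_errors(ratios_pct, limit_pct=0.1, min_samples=3):
    """True if the last `min_samples` frame error ratios all exceed the limit,
    so that a short spike during reconfiguration is ignored."""
    recent = ratios_pct[-min_samples:]
    return len(recent) == min_samples and all(r > limit_pct for r in recent)

print(port_status(5e-8, alarm=1e-8, cutoff=1e-7))
print(sustained_frame_errors([0.02, 50.0, 0.03, 0.05]))  # one brief spike
print(sustained_frame_errors([0.05, 0.2, 0.3, 0.25]))    # stays above 0.1 percent
```

Requiring several consecutive samples reflects the distinction the text draws between brief ratios of up to 50 percent during configuration and a sustained ratio above 0.1 percent.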

Solving the Problem

To solve problems related to FDDI errors, fix the hardware, cabling, or congestion problem.

Identifying Ring Errors

Use Status Watch to monitor your FDDI devices for Warning or Critical alerts.

Status Watch

Use Status Watch to identify FDDI ring errors.

Follow these steps:

1. Monitor the FDDI Status tool for the currently selected device.

2. Determine whether Status Watch is reporting Elasticity Buffer Errors or a high percentage of Frame Errors, Frames Not Copied, or Link Error Rates for the currently selected device.

FDDI Ring Errors Reference

This section provides additional conceptual and problem analysis detail.

Elasticity Buffer Error Condition

The Elasticity Buffer Error condition occurs when a port's elasticity buffer overflows or underflows. This condition usually indicates that a port's hardware is not operating within the tolerances that the FDDI standard specifies. Look for the problem in the hardware of either the port that is reporting the condition or of the immediately adjacent port.

Frame Error Condition

The Frame Error condition occurs when the percentage of frames that contain errors exceeds a preset threshold. When a device is an uplink to FDDI (that is, a device is transmitting onto FDDI), this type of condition indicates that the ring is saturated. The ring is out of buffer space, and packets are being dropped from the device's backbone port.

The problem indicated by the frame errors is usually located between the MAC that reports the condition and its upstream neighbor. Because many physical connections can lie along this path, the MACFrameErrorRatio variable identifies only the two MACs between which the problem is occurring.

Frames Not Copied Condition

The Frames Not Copied condition occurs when the percentage of frames that are dropped because of insufficient buffer space exceeds a preset threshold. This condition indicates that the station is congested and is unable to process frames as quickly as they arrive. To help eliminate congestion:

• Add more capacity to the station.

• Reconfigure your network so that end stations that communicate heavily with one another are on the same bridge or switch.

• Filter out certain traffic.

Link Error Condition

The Link Error condition occurs when a port detects link errors at a rate that exceeds a preset threshold. When the Link Error threshold is exceeded, the station removes itself from the ring and tries to reinsert itself on the ring. This action creates a MAC Neighbor Change Event (which also occurs if a ring wraps).

Link errors may indicate an FDDI PHY hardware problem (such as a faulty transmitter) or a faulty cable or connector. Look for the problem in the portion of the network between the port that is reporting the condition and the first upstream transmitter.

MAC Neighbor Change Event

The MAC Neighbor Change event occurs when a MAC's upstream or downstream neighbor changes.

This event indicates either:

• A network reconfiguration
• Another station that is leaving or joining the ring

Network File Server Timeouts

Use these sections to identify and correct timeouts on network file servers:

• Network File Server Timeout Overview
• Looking for Obvious Errors
• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

See Network File Server Timeouts Reference for additional conceptual and problem analysis detail.

A network file server can time out if your network gets congested or if your server is having problems. Users might have problems downloading data from or to the server, or copying files from or to the server. To help you to understand the troubleshooting process for this type of problem, an EXAMPLE throughout this section follows the symptoms, analysis, and resolution of a typical file server timeout problem.


Understanding the Problem

When users log in, their stations make network file server calls, either to determine quotas (if this feature has been enabled) or to mount user home directories. The network file server timeout messages, even when spread across multiple nodes, indicate a problem either with the network or with a server.

EXAMPLE: UNIX users notice that it takes a long time - over 30 seconds in some cases - to log in to any machine. Some machines report network file server timeout messages, but the messages have no obvious pattern and are infrequent. You begin to get a sense of the problem.

Identifying the Problem

First, rule out the obvious causes. Ask these questions:

• Can you access the network file server with Telnet?
• Have any alarms been triggered?
• Are there any new errors?

The process of identifying the problem is developed in Looking for Obvious Errors.

Solving the Problem

To determine the cause, reproduce the fault while you monitor the network. After you know the cause, you can fix the problem.

The solutions to the network file server timeout are identified in these sections:

• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

Looking for Obvious Errors

To look for obvious errors, use these applications:

• Ping and Telnet - To check connectivity to the network file server nodes

• LANsentry Manager Alarms View - To search for triggered alarms

• LANsentry Manager Statistics View - To look for errors

• LANsentry Manager History View - To identify trends


Ping and Telnet

Determine whether you can contact network file server nodes using Ping and Telnet. If the response is extremely slow, then a problem may exist with the connections to the nodes. No delay indicates that the connections are normal, implying that the delay is occurring elsewhere. In this case, use LANsentry® Manager tools to determine whether packets are being lost or ignored.

LANsentry Manager Alarms View

Using the LANsentry Manager Alarms View, you can determine if any configured alarms have been triggered.

Search the Alarms View to see if any MAC events have been logged.

EXAMPLE: MAC events have not been logged for the network to which the UNIX users are attached.

Even though no alarms have occurred, errors may exist. For example, a low rate of background errors may exist just below the alarm threshold. Because they are based on maximum and minimum values, RMON alarms may miss constant, periodic, or low amounts of errors.

LANsentry Manager Statistics View

Using the LANsentry Manager Statistics View, you can display a multisegment graph of utilization and error statistics.

Follow these steps:

1. Set up a graph that shows utilization and errors on all your major segments.

2. Determine whether any segments are particularly busy or error prone.

EXAMPLE: You notice that one segment of the UNIX network, HUB3, is reporting Too Long Errors and FCS Errors roughly every second sample. While the amount is not higher than normal, it is currently higher than on any other segment.

LANsentry Manager History View

Using the LANsentry Manager History View, you can display a rolling history table to determine if the errors that you are seeing are new. For example, if you have a history table that runs for 30-minute samples over two days, you can compare the most recent sample to a previous sample, looking for new errors. If your probe has the resources, use a much finer resolution sample stored for a shorter time (every 30 seconds for 2 hours) to more easily spot recent errors.

EXAMPLE: You see that the history table shows that no error rates remained constant throughout the day. However, errors that did occur were on the device HUB3 and were Too Long Errors and FCS Errors.

Reproducing the Fault While Monitoring the Network

Although the RMON View in LANsentry Manager can show error rates and help you to identify the location of the problem, it may not provide enough data to solve the problem. To determine the cause of the problem, reproduce it while you monitor the network by using these applications:

• LANsentry Manager Top-N Graph - To locate a quiet node to use for reproducing the fault

• LANsentry Manager Packet Capture - To capture packets from the hub to which the quiet node is attached

• LANsentry Manager Packet Decode - To analyze the packets to assess network file server traffic and delays

• Address Tracker - To find the location of the problem nodes

EXAMPLE: Using LANsentry Manager, you find a hub on the network with a higher than normal error rate. However, the error rate does not seem high enough to cause login delays of 60 seconds or more.

LANsentry Manager Top-N Graph

Using the Top-N graph in the LANsentry Manager main window, locate a quiet node that has been showing the same problem. Choose a quiet node so that you do not receive excessive traffic when you try to isolate the problem.

EXAMPLE: You see that the node Monolith, which has the same Network File System (NFS) mounts as the other nodes on the network, is quiet. You decide to use this node for reproducing the fault. See Network File System (NFS) Protocol for more information about NFS.

LANsentry Manager Packet Capture

Using the LANsentry Manager Packet Capture application, capture packets from the network using predefined patterns and start-and-stop conditions.

Follow these steps:

1. Set up a capture buffer on a probe that is connected to the same hub as the quiet node. Until you know more about the problem, set a very general filter.

EXAMPLE: You select a MAC-layer filter and set a conversation filter to capture all packets to and from Monolith.

2. Telnet into and log out from the quiet node. Then reset the capture buffer. Repeat this procedure until you see the problem reflected in your captured data. To keep the buffer information clean, reset the buffer each time that you repeat the procedure.

3. When you see the delay, note the rough value of the packet count on the LANsentry Manager packet buffer.

By noting the packet count at which you think the delay has occurred, you can narrow the problem to within about 20 packets in the buffer. If you have used an extremely quiet node, you may even identify the exact packet.

LANsentry Manager Packet Decode

The LANsentry Manager Packet Decode application decodes all major protocols and displays the packet contents at three levels of detail: summary information, header information, and actual packet content.

Follow these steps:

1. Open the buffer in the Packet Decode application and locate the number of the packet at which the delay occurred.

2. Select the packet and launch a MAC-layer conversation filter. In the filter display, look for a gap in the conversation (that is, where the node sent a request and then resent it at approximately the same interval as the delay you experienced when recreating the problem).

3. Repeat the test to determine if the result concentrates on one node or if it appears on other nodes.

EXAMPLE: On the quiet node that you selected, the delay is obvious. You see an NFS request going out to a node and a repeat of the request 30 seconds later. During that time, the node did not respond. You now know that the delay occurred because nodes were not seeing responses for NFS requests. When you repeat the test on other nodes, you find that the delay is happening with more than one destination node.
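The gap described in step 2 (a request that is resent after a long silence with no reply in between) can also be found mechanically if you export the conversation as a list of timestamped packets. The sketch below assumes a simplified, hypothetical record format of (timestamp in seconds, "request" or "reply", request identifier); the identifier stands in for whatever field lets you match a retry to its original request, and none of this reflects the Packet Decode application's actual export format.

```python
def find_retry_gaps(packets, min_gap=5.0):
    """Yield (request_id, first_sent, resent_at) for requests that were resent
    at least `min_gap` seconds later without a reply in between."""
    pending = {}  # request id -> time the unanswered request was last sent
    for ts, kind, request_id in packets:
        if kind == "reply":
            pending.pop(request_id, None)
        elif kind == "request":
            first = pending.setdefault(request_id, ts)
            if ts - first >= min_gap:
                yield request_id, first, ts
                pending[request_id] = ts  # measure any further retry from here

# Hypothetical capture from the quiet node
capture = [
    (0.0, "request", 101), (0.25, "reply", 101),
    (0.5, "request", 102),    # no reply follows
    (30.5, "request", 102),   # the same request, resent 30 seconds later
    (30.75, "reply", 102),
]
print(list(find_retry_gaps(capture)))
```

For this sample the sketch reports request 102, first sent at 0.5 seconds and resent at 30.5 seconds, which corresponds to the 30-second gap in the EXAMPLE.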


Address Tracker

Use Address Tracker, which polls managed devices, to determine the hubs to which the problem nodes are attached. If the problem end stations are located on unmanaged devices, then you can at least narrow the problem to those unmanaged devices.

EXAMPLE: Although your network does not have managed hubs that Transcend® NCS management software can poll, it does have managed switches. When it polls the switches, Address Tracker displays the switch ports on which addresses were last seen. This information indicates the hub (but not the hub port) on which the device is located.

LANsentry Manager Packet Decode

After you know the location of the hub that has the problem node, monitor the problem from the hub using LANsentry Manager Packet Decode.

Follow these steps:

1. To capture packets from one of the nodes on the hub, set up another capture buffer and repeat the exercise that is described in LANsentry Manager Packet Capture. Because a delay may occur on a different node, use two capture buffers without stopping the first one.

Note the rough packet count where the delay appears.

2. Display a conversation filter of the packet where the delay appears and look for the gap in the conversation.

EXAMPLE: You hope that the nodes are on the same hub. You find that all the nodes are on HUB3. This result indicates that FCS Errors may be causing the timeouts. However, because the errors occur at a low rate, you decide to verify this diagnosis. You monitor the problem from the hub, logging in and out many times, and the delay eventually occurs. This time, the delay shows that the node's reply had an FCS Error, even though the node received the request. The switch would not have transmitted this packet, causing a timeout on the NFS protocol. The retry time is presumably 30 seconds. During this test, you see the problem occurring on another node.

Correcting the Fault


Without a managed hub, you may find it very difficult to discover network file server timeout errors. To find the problematic node, you must either systematically isolate nodes by monitoring each node for a prolonged period, or temporarily insert a managed hub.

EXAMPLE: You notice that the captured error packet failed FCS because it was corrupted by a regular pattern during transmission. A possible reason for this occurrence is a Jabbering node. This explanation makes sense because FCS/Jabber frames increased linearly when you were monitoring the live network.

Network File Server Timeouts Reference

This section explains terms that are relevant to network file server timeouts and provides additional conceptual and problem analysis detail.

Jabbering

Jabbering occurs when a node transmits illegal-length packets and is possibly not operating within carrier specifications. In effect, another node has written bad data over a valid packet. This bad data is often interpreted as a repeated sequence of data.

Network File System (NFS) Protocol

A distributed file system protocol, developed by Sun Microsystems, that allows a computer system to access files over a network as if they were on its local disks. This protocol has been incorporated into products by more than 200 companies. It is now a de facto Internet standard.

NFS is one protocol in the NFS suite of protocols, which includes NFS, RPC, XDR (External Data Representation), and others. These protocols are part of a larger architecture that Sun Microsystems refers to as Open Network Computing (ONC). ONC is a distributed applications architecture designed by Sun and currently controlled by a consortium led by Sun.

Troubleshooting TCP/IP - Detailed Steps

This article shows how to troubleshoot TCP/IP connectivity between computers on a Windows network. If you haven't already done so, disable XP's Internet Connection Firewall on all local area network connections and remove all firewall programs on the network. Improperly configured firewalls are the most common cause of TCP/IP problems.


Open a Command Prompt Window

For many of these steps, you'll be typing at the command prompt. To open a command prompt window in Windows 2000 or XP, click Start | Run, type cmd in the box, and click OK. To open a command prompt window in Windows 95, 98, or Me, click Start | Run, type command in the box, and click OK. Type one command per line, and press Enter after each one to execute it. To close the command prompt window, use the exit command.

Determine the TCP/IP Settings

Determine the TCP/IP settings of each computer on the local area network. In XP, open the Network Connections folder, right click the LAN connection, and click Status | Support | Details. For example, here are the Status and Details views for the LAN connection on an Internet Connection Sharing host.

In Windows 95/98/Me, click Start | Run, type winipcfg in the box, and click OK. Select the LAN adapter from the menu and click More Info. Here's the winipcfg view for an ICS client running Windows Me.

You can also see the TCP/IP settings from the command prompt. This is especially convenient if a computer has more than one network adapter. Use the ipconfig /all command, which is available in all versions except Windows 95. The output from this command can be long, so it's best to write it to a file. Specify the file name in the command this way:

ipconfig /all >ipconfig.txt
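Once the output is saved, the settings can be pulled out of the file with a short script. The following Python sketch assumes the "Label . . . : value" line layout that ipconfig prints on Windows XP; the sample text and the gateway address 192.168.1.1 are illustrative assumptions, and on a computer with several adapters this simple version keeps only the last value seen for each label.

```python
import re

# Matches lines such as "   IP Address. . . . . . . : 192.168.1.101".
SETTING_LINE = re.compile(r"\s*([^.:]+?)[ .]*:\s*(\S.*)$")


def parse_ipconfig(text):
    """Return a dict of setting label -> value from ipconfig-style text."""
    settings = {}
    for line in text.splitlines():
        match = SETTING_LINE.match(line)
        if match:
            settings[match.group(1).strip()] = match.group(2).strip()
    return settings


# Illustrative sample; in practice, read the text from ipconfig.txt instead.
SAMPLE = """
Ethernet adapter Local Area Connection:

        IP Address. . . . . . . . . . . . : 192.168.1.101
        Subnet Mask . . . . . . . . . . . : 255.255.255.0
        Default Gateway . . . . . . . . . : 192.168.1.1
"""

for label, value in parse_ipconfig(SAMPLE).items():
    print(label, "=", value)
```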

Description of TCP/IP Settings

Here are the TCP/IP settings that are used in network troubleshooting:

• IP Address - Unique address assigned to a network adapter. A computer with multiple network adapters has an IP address for each one, and each one must be in a different subnet.

• Subnet Mask - Used in conjunction with the IP address to determine which subnet an adapter belongs to. At the simplest level, communication is only possible between two network adapters when they're in the same subnet.

• Default Gateway - IP address of a computer or router, on one of this computer's local area networks, that knows how to communicate with subnets not present on this computer. For an Internet connection, the default gateway is a router belonging to your Internet service provider, and all access to sites on the Internet goes through it. For an ICS client, the default gateway is the ICS host. If you use a hardware router, it serves as the default gateway.

• DHCP Server - If an adapter is configured to obtain an IP address automatically, this is the address of the server that provides it. It could be your ISP, an ICS host, or a hardware router.

• DNS Servers - IP address of one or more Domain Name Server computers. DNS servers translate Internet names to their IP addresses (like 63.146.109.227).
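To check whether two adapters are in the same subnet, apply the subnet mask to both IP addresses and compare the resulting network addresses. The following Python sketch uses the standard ipaddress module; the function name same_subnet is chosen for this example only.

```python
import ipaddress


def same_subnet(ip_a, ip_b, subnet_mask):
    """Return True when both addresses fall in the same subnet under subnet_mask."""
    # strict=False lets ip_network accept a host address and mask off the host bits.
    network_a = ipaddress.ip_network(f"{ip_a}/{subnet_mask}", strict=False)
    network_b = ipaddress.ip_network(f"{ip_b}/{subnet_mask}", strict=False)
    return network_a == network_b


print(same_subnet("192.168.1.101", "192.168.1.123", "255.255.255.0"))
print(same_subnet("192.168.1.101", "192.168.2.5", "255.255.255.0"))
```

The first call compares two addresses that share the 192.168.1.0 network; the second compares addresses on different networks, so direct communication would need a gateway.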

Subnets

See our article on subnets for a brief description of how they work. For more details, see this Microsoft Knowledge Base article.

If two computers are supposed to be on the same subnet but aren't, something is wrong with the network hardware or software configuration. This is most likely to happen when one of them receives an IP address of 169.254.x.x, which indicates that:

• It's configured to obtain an IP address automatically.

• It couldn't find a DHCP server on the network to make the assignment.

• Windows assigned it an Automatic Private IP Address.

See our article on Specific Networking Problems and Their Solutions for more information.
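A quick way to spot an automatically assigned private address is to test whether it falls in the 169.254.0.0/16 range. This Python sketch uses the standard ipaddress module; the function name is_apipa is an assumption made for the example.

```python
import ipaddress

# Range that Windows uses for Automatic Private IP Addressing.
APIPA_RANGE = ipaddress.ip_network("169.254.0.0/16")


def is_apipa(address):
    """Return True when address is in the 169.254.x.x range."""
    return ipaddress.ip_address(address) in APIPA_RANGE


for candidate in ("169.254.10.20", "192.168.1.101"):
    print(candidate, "APIPA" if is_apipa(candidate) else "not APIPA")
```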

Pinging

The ping command is the basic tool for testing TCP/IP connectivity. It sends a special packet (called ICMP Echo) to a particular IP address and looks for a reply. If everything is working right, the reply comes back. If not, the ping times out in a few seconds. By default, the ping command repeats the process four times. Here's an example of an ICS client computer pinging a Windows XP Home Edition ICS host, using the host's IP address and its computer name.

When ping fails, you'll see one of these error messages:

• Request timed out - The IP address is valid, but there's no reply from it. If the IP address is on a local area network, the most likely cause is a firewall program blocking the ping.

• Unknown host <name> or Ping request could not find host <name> - The computer name doesn't exist on the local area network. Make sure that NetBIOS over TCP/IP is enabled.

• Destination host unreachable - The IP address isn't on a local area network, and the default gateway can't access it. Either there's no default gateway, its address is wrong, or it isn't functioning.
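The messages above can be matched in saved ping output to suggest where to look first. This is a minimal Python sketch under the assumption that the output contains the English message text exactly as listed; the function and table names are invented for the example.

```python
# (message fragment, likely cause) pairs taken from the list above.
PING_FAILURES = [
    ("request timed out",
     "IP address is valid but did not reply; on a LAN, suspect a firewall blocking the ping."),
    ("unknown host",
     "Computer name not found; check that NetBIOS over TCP/IP is enabled."),
    ("could not find host",
     "Computer name not found; check that NetBIOS over TCP/IP is enabled."),
    ("destination host unreachable",
     "Address is off the LAN and the default gateway cannot reach it; check the gateway setting."),
]


def diagnose_ping_output(output):
    """Return the likely cause for the first known failure message, or None."""
    text = output.lower()
    for fragment, cause in PING_FAILURES:
        if fragment in text:
            return cause
    return None


print(diagnose_ping_output("Request timed out."))
```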


Pinging the Local Area Network

Here is a series of ping commands to use in finding where a problem occurs on a local area network. Run them in the order shown, and don't go on to the next command until all of the previous commands work properly. In this example:

• The computer being tested is named Winxp, with IP address 192.168.1.101.

• There's another computer on the network named Win98, with IP address 192.168.1.123.

Substitute the appropriate IP addresses and computer names for your network.

Command              Target                          What Ping Failure Indicates
ping 127.0.0.1       Loopback address                Corrupted TCP/IP installation
ping localhost       Loopback name                   Corrupted TCP/IP installation
ping 192.168.1.101   This computer's IP address      Corrupted TCP/IP installation
ping winxp           This computer's name            Corrupted TCP/IP installation
ping 192.168.1.123   Another computer's IP address   Bad hardware or NIC driver
ping win98           Another computer's name         NetBIOS name resolution failure

To fix a corrupted TCP/IP installation on Windows XP, follow the steps in this Microsoft Knowledge Base article. For Windows 95/98/Me, un-install the TCP/IP protocol in Control Panel | Network, reboot, and re-install it. If that doesn't fix it, use this procedure on Windows 95 or 98.
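The ordered ping sequence for the local area network can be automated so that it stops at the first failing step. The following Python sketch is one possible approach, not part of the original procedure: the target list mirrors the example names and addresses, system_ping calls the operating system's ping command (-n on Windows, -c elsewhere), and first_failure is a name invented here. Note that the exit code of ping is only a rough success signal, so read the actual output before drawing conclusions.

```python
import platform
import subprocess

# Targets in the order they should be tried, with what a failure suggests.
LAN_CHECKS = [
    ("127.0.0.1", "Corrupted TCP/IP installation"),
    ("localhost", "Corrupted TCP/IP installation"),
    ("192.168.1.101", "Corrupted TCP/IP installation"),
    ("winxp", "Corrupted TCP/IP installation"),
    ("192.168.1.123", "Bad hardware or NIC driver"),
    ("win98", "NetBIOS name resolution failure"),
]


def system_ping(target):
    """Send one ping with the operating system's ping command; True on exit code 0."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    result = subprocess.run(
        ["ping", count_flag, "1", target],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def first_failure(checks, ping=system_ping):
    """Run the checks in order and return (target, meaning) for the first failure."""
    for target, meaning in checks:
        if not ping(target):
            return target, meaning
    return None
```

Calling first_failure(LAN_CHECKS) on the computer being tested returns None when every step succeeds, or the first failing target together with its likely cause.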

Pinging the Internet

You can also use ping to find a problem with Internet access. Run these commands in the order shown, and don't go on to the next command until all of the previous commands work properly. Use the Default Gateway and DNS Server addresses that you got from the winipcfg or ipconfig /all command.


Command                  Target                What Ping Failure Indicates
ping w.x.y.z             Default Gateway       Default Gateway down
ping w.x.y.z             DNS Server            DNS Server down
ping w.x.y.z             Web site IP address   Internet service provider or web site down
ping www.something.com   Web site name         DNS Server down or web site down

REFERENCE

WWW.PRACTICALLYNETWORK.COM

WWW.COMPUTERHOPE.COM

WWW.WIKIPEDIA.ORG


THANK YOU



• A successful response indicates that a valid network path exists between your station and the remote host, and that the remote host is active.

• Slower response times than normal can indicate that the path is congested or obstructed.

• A failed response indicates that a connection is broken somewhere; use the message to help locate the problem. See Tips on Interpreting Ping Messages.

Strategies for Using Ping

Follow these strategies for using Ping:

• Ping devices when your network is operating normally so that you have a performance baseline for comparison. See Identifying Your Network's Normal Behavior for more information.

• Ping by IP address when:

  • You want to test devices on different subnetworks. This method allows you to Ping your network segments in an organized way, rather than having to remember all the hostnames and locations.

  • Your Domain Name System (DNS) server is down and your system cannot look up host names properly. You can Ping with IP addresses even if you cannot access hostname information.

• Ping by hostname when you want to identify DNS server problems.

• To troubleshoot problems that involve large packet sizes, Ping the remote host repeatedly, increasing the packet size each time.

• To determine if a link is erratic, perform a continuous Ping (using ping -s on UNIX), which indicates the time that it takes the device to respond to each Ping.

• To determine a route taken to a destination, use the trace route function (tracert).

• Consider creating a Ping script that periodically sends a Ping to all necessary networking devices. If a Ping failure message is received, the script can perform some action to notify you of the problem, such as paging you.

• Use the Ping functions of your network management platform. For example, in your HP OpenView map, select a device and click the right mouse button to gain access to ping functions.
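The periodic Ping script suggested in the list above could take the following shape. This Python sketch keeps the ping and the notification as functions passed in by the caller, because how you ping a device and how you are paged are site-specific; the names monitor, ping, and notify are assumptions made for the example.

```python
import time


def monitor(devices, ping, notify, rounds=1, interval_s=0.0):
    """Ping every device once per round and call notify(device) for each failure.

    ping:   function taking a device name or address, returning True on a reply.
    notify: function called with the device that failed (for example, to page you).
    Returns the list of failures in the order they were detected.
    """
    failures = []
    for round_number in range(rounds):
        for device in devices:
            if not ping(device):
                notify(device)
                failures.append(device)
        if round_number < rounds - 1:
            time.sleep(interval_s)  # wait before the next sweep
    return failures


# Demonstration with a stand-in ping that treats one device as down.
alerts = []
monitor(["router", "switch1", "server"],
        ping=lambda device: device != "switch1",
        notify=alerts.append,
        rounds=2)
print(alerts)
```

In real use, pass a ping function that calls your system's ping command and a notify function that sends the page or e-mail, and set interval_s to the polling period that you want.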

Tips on Interpreting Ping Messages

Use the following ping failure messages to troubleshoot problems:

No reply from <destination>

Indicates that the destination routes are available, but that there is a problem with the destination itself.


<destination> is unreachable

Indicates that your system does not know how to get to the destination. This message means either that routing information to a different subnetwork is unavailable, or that a device on the same subnetwork is down.

ICMP host unreachable from gateway

Indicates that your system can transmit to the target address using a gateway, but that the gateway cannot forward the packet properly because either a device is misconfigured or the gateway is not operating.

Telnet

Telnet, which is a login and terminal emulation program for Transmission Control Protocol/Internet Protocol (TCP/IP) networks, is a common way to communicate with an individual device. You log in to the device (a remote host) and use that remote device as if it were a local terminal.

If you have established an out-of-band Telnet connection with a device, you can use Telnet to communicate with that device even if the network is unavailable. This feature makes Telnet one of the most frequently used network troubleshooting tools. Usually, all device statistics and configuration capabilities are accessible by using Telnet to connect to the device's console. For more information about setting up an out-of-band connection, see Using Telnet, Serial Line, and Modem Connections.

You can invoke the Telnet application on your local system and set up a link to a Telnet process that is running on a remote host. You can then run a program that is located on a remote host as if you were working at the remote system.

FTP and TFTP

Most network devices support either the File Transfer Protocol (FTP) or the Trivial File Transfer Protocol (TFTP) for downloading updates of system software. Updating system software is often the solution to networking problems that are related to agent problems. Also, new software features may help correct a networking problem.

FTP provides flexibility and security for file transfer by:

• Accepting many file formats, such as ASCII and binary

• Using data compression

• Providing Read and Write access so that you can display, create, and delete files and directories

• Providing password protection


TFTP is a simple version of FTP that does not list directories or require passwords. TFTP only transfers files to and from a remote server.

Analyzers

An analyzer, which is often called a Sniffer, is a network device that collects network data on the segment to which it is attached, a process called packet capturing. Software on the device analyzes this data, which is a process referred to as protocol analysis. Most analyzers can interpret different types of protocol traffic, such as TCP/IP, AppleTalk, and Banyan VINES traffic.

You usually use analyzers for reactive troubleshooting: when you see a problem somewhere on your network, you attach an analyzer to capture and interpret the data from that area. Analyzers are particularly helpful for identifying intermittent problems. For example, if your network backbone has experienced moments of instability that prevent users from logging on to the network, you can attach an analyzer to the backbone to capture the intermittent problems when they happen again.

Probes

Like an analyzer, a probe is a network device that collects network data. Depending on its type, a probe can collect data from multiple segments simultaneously. It stores the collected data and transfers the data to an analysis site when requested. Unlike an analyzer, a probe does not interpret data.

A probe can be either a stand-alone device or an agent in a network device. The Transcend Enterprise Monitor 500 series and the SuperStack® II Monitor series are stand-alone RMON probes. LANsentry Manager and Traffix Manager use data from probes that comply with the RMON MIB or the RMON2 MIB.

You can use a probe daily to determine the health of your network. The Transcend NCS applications can interpret and report this data, alerting you to possible problems so that you can proactively manage your network. For example, an RMON2 probe can help you to analyze traffic patterns on your network. Use this data to make decisions about reconfiguring devices and end stations as needed.

Cable Testers

Cable testers examine the electrical characteristics of the wiring. They are most commonly used to ensure that building wiring and cables meet Category 5, 4, and 3 standards. For example, network technologies such as Fast Ethernet require the cabling to meet Category 5 requirements. Testers are also used to find defective and broken wiring in a building.


Knowing Your Network

You can better troubleshoot the problems on your network by:

• Knowing Your Network's Configuration

• Identifying Your Network's Normal Behavior

Knowing Your Network's Configuration

Part of understanding your network is knowing its physical and logical configuration. You should know:

• Which devices are on your network

• How the devices are configured

• Which devices are attached to the backbone

• Which devices connect your network to the outside world (WAN)

To keep track of your network's configuration, gather the following information:

• Site Network Map

• Logical Connections

• Device Configuration Information

• Other Important Data About Your Network

This data, when kept up-to-date, is extremely helpful for locating information when you experience network or device problems.

Site Network Map

A network map helps you to:

• Know exactly where each device is physically located

• Easily identify the users and applications that are affected by a problem

• Systematically search each part of your network for problems

You can create a network map using any drawing or flow chart application. Store your network map online. In addition, make sure that you always have a current version on paper in case you cannot access the online version. Figure 8 shows an example of a network map of 3Com devices.

Figure 8. Example of a Site Network Map


Consider including the following information on your network map:

• Location of important devices and workgroups (by floor, building, or area)

• Location of the network backbone, data center, and wiring closets, as appropriate for your network

• Location of your network management stations

• Location and type of remote connections

• IP subnetwork addresses for all managed switches and hubs

• Other subnetwork addresses, such as Novell IPX and AppleTalk, if appropriate for your network

• Type of media (by actual name, such as 10BASE-T, or by grouping, such as Ethernet), which you can show with callouts, colors, line weights, or line styles

• Virtual workgroups, which you can show with colors or shaded areas

• Redundant links, which you can indicate with gray or dashed lines

• Types of network applications that are used in different areas of your network

• Types of end stations that are connected to the switches and hubs

Logical Connections

With the advent of virtual LANs (VLANs), you need to know how your devices are connected logically as well as physically. For example, if you have connected two devices through the same physical switch, you can assume that they can communicate with each other. However, the devices can be in separate VLANs that restrict their communication.

Knowing the setup of your VLANs can help you to quickly narrow the scope of a problem to a VLAN instead of to a network connection.

The Transcend NCS application Enterprise VLAN Manager allows you to view the logical makeup of your network. Depending on the complexity of your network and VLAN configurations, you can use colors to show the VLANs graphically on your network map.
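If you keep your VLAN assignments in a simple table, you can check whether two devices share a VLAN before suspecting the physical connection. The following Python sketch is illustrative only: the device names and VLAN names are hypothetical, and it ignores any routing that may be configured between VLANs.

```python
# Hypothetical record of which VLAN each device belongs to.
VLAN_OF_DEVICE = {
    "pc-accounts-1": "accounts",
    "pc-accounts-2": "accounts",
    "pc-lab-1": "engineering",
}


def share_vlan(vlan_of_device, device_a, device_b):
    """Return True when both devices are recorded in the same VLAN."""
    return vlan_of_device[device_a] == vlan_of_device[device_b]


# Both devices may sit on the same physical switch yet be logically separated.
print(share_vlan(VLAN_OF_DEVICE, "pc-accounts-1", "pc-accounts-2"))
print(share_vlan(VLAN_OF_DEVICE, "pc-accounts-1", "pc-lab-1"))
```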

Device Configuration Information

Maintain online and paper copies of device configuration information. Make sure that all online data is stored with your site's regular data backup. If your site does not have a backup system, copy the information onto a backup disc (CD, Zip disk, and the like) and store it offsite.

Follow these guidelines for saving configuration information:

• Because the easiest way to recover a device's configuration is to use FTP or TFTP, save the configuration settings of each device that supports this method of uploading.

• For other devices, Telnet in and save the session (which contains configuration details) to a file. If you cannot print the configuration of a device, then create a quick rebuild guide that explains the quickest way to configure the device from a fresh install.

• For devices that store information to diskette, store this data as part of your site's regular backup.

• For routers and other important devices with text configuration files, store this data online in a revision control system. Keep the most recent version on paper. Keep previous versions.

• For PCs, keep a recovery disk for each type of PC. For any device that you use as a server, store all startup scripts and copies of registries.

Other Important Data About Your Network

For a complete picture of your network, have the following information available:

• All passwords - Store passwords in a safe place. Keep previous passwords in case you restore a device to a previous software version and need to use the old password that was valid for that version.

• Device inventory - The inventory allows you to see the device type, IP address, ports, MAC addresses, and attached devices at a glance. Software tools such as Transcend Central can help you keep track of the 3Com devices on your network. Using Transcend Central, you can group devices by type and location and have this information on hand for troubleshooting.

• MAC address-to-port number list - If your hubs or switches are not managed, you must keep a list of the MAC addresses that correlate to the ports on your hubs and switches. Generate and keep a paper copy of this list, which is crucial for deciphering captured packets using Address Tracker.

• Log book - Document your interactions, no matter how trivial, with each device that is critical to your network's operation (that is, routers, remote access devices, and security servers). For example, document that you noticed a fan making noise one morning. Your note may help you to identify why a device is over temperature a week later (because the fan stopped working).

• Change control - Maintain a change control system for all critical systems. Permanently store change control records.

• Contact details - Store, online and on paper, the details of all support contracts, support numbers, engineer details, and telephone and fax numbers.

• LANsentry Reporter - Use LANsentry Reporter to generate reports from the database.
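The MAC address-to-port number list in particular is easy to keep in a form that a script can search. The following Python sketch assumes a hand-maintained table; the MAC addresses, hub name, and port numbers are hypothetical. Addresses are normalized first so that the same address written with dashes, colons, or dots still matches.

```python
import string


def normalize_mac(mac):
    """Return the MAC address as lowercase hex pairs separated by colons."""
    digits = "".join(ch for ch in mac if ch in string.hexdigits).lower()
    if len(digits) != 12:
        raise ValueError(f"not a MAC address: {mac!r}")
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))


# Hypothetical hand-maintained list: MAC address -> (hub, port).
PORT_TABLE = {
    normalize_mac("00-A0-24-12-34-56"): ("HUB3", 7),
    normalize_mac("00:a0:24:ab:cd:ef"): ("HUB3", 8),
}


def locate(mac):
    """Return the (hub, port) recorded for a MAC address, or None if unknown."""
    return PORT_TABLE.get(normalize_mac(mac))


print(locate("00a0.2412.3456"))
```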

Identifying Your Network's Normal Behavior

By monitoring your network over a long period, you begin to understand its normal behavior. You begin to see a pattern in the traffic flow, such as which servers are typically accessed, when peak usage times occur, and so on. If you are familiar with your network when it is fully operational, you can be more effective at troubleshooting problems that arise.

Baselining Your Network

You can use a baseline analysis, which is an important indicator of overall network health, to identify problems. A baseline can serve as a useful reference of network traffic during normal operation, which you can then compare to captured network traffic while you troubleshoot network problems. A baseline analysis speeds the process of isolating network problems.

By running tests on a healthy network, you compile normal data to compare against the results that you get when your network is in trouble. For example, Ping each node to discover how long it typically takes you to receive a response from devices on your network.
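A Ping baseline can be as simple as the average response time and its spread for each node. The following Python sketch shows one way to compare a new measurement against such a baseline; the sample times are invented, and the rule of flagging anything more than three standard deviations above the mean is an assumption for illustration, not a recommendation from any particular tool.

```python
import statistics


def build_baseline(samples_ms):
    """Return (mean, standard deviation) of response times collected on a healthy network."""
    return statistics.mean(samples_ms), statistics.stdev(samples_ms)


def is_abnormal(response_ms, baseline, tolerance=3.0):
    """Return True when response_ms exceeds the mean by more than tolerance deviations."""
    mean, deviation = baseline
    return response_ms > mean + tolerance * deviation


# Invented response times (in milliseconds) for one node during normal operation.
baseline = build_baseline([10, 12, 11, 9, 13])

print(is_abnormal(12, baseline))
print(is_abnormal(40, baseline))
```

Keep one baseline per node, and rebuild it after significant network changes so that the comparison stays meaningful.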

Applications such as Status Watch, Address Tracker, LANsentry Manager, and Traffix Manager allow you to collect days and weeks of data and set a baseline for comparison. Through the reporting mechanisms in the following list, you can continuously assess the data from your network and ensure that its performance is optimal:

• Web Reporter generates daily or weekly reports from data collected by Status Watch.

• Traffix Manager generates weekly reports from collected data and calculates the baselines for you. Set up Utilization History and Error History reports with data resolution set to Weekly.

• LANsentry Manager History View generates daily utilization graphs, which are sampled every 30 minutes for each day over one week. Use these graphs to calculate your network baselines manually.

Identifying Background Noise

Know your network's background noise so that you can recognize real data flow. For example, one evening after everyone is gone, when no backups are running and most nodes are on, analyze the traffic on your network using the Traffix Manager application. The traffic that you see is mostly broadcast and multicast packets. Any errors that you see are the result of faulty devices (trace). This traffic is the background noise of your network - traffic that occurs for little value. If background noise is high, redesign your network.

FDDI Connectivity

Use these sections to identify and correct connectivity errors on an FDDI ring:

• FDDI Connectivity Overview
• Monitoring FDDI Connections
• Making Your FDDI Connections More Resilient

Fiber Distributed Data Interface (FDDI), which is a self-correcting technology, automatically corrects ring faults to maintain connectivity throughout most of the network. However, you should monitor your FDDI connections for wrapped rings and other problems with ring connectivity.

Understanding the Problem

As shown in Figure 9, in a thru FDDI LAN, no stations on the trunk ring have a Configuration State (SMTConfigurationState) of Wrap or Isolated. However, users who complain about network performance may have lost connectivity to other stations on the network because the FDDI network is wrapped or segmented.

Figure 9 Thru Ring

Wrapped ring

By monitoring the Peer Wrap Condition, you can see when the Configuration State changes. In a wrapped ring (Figure 10), two stations on the LAN are in a wrapped Configuration State. This condition may or may not affect the connectivity of certain stations. Although operational, your network may have a cabling problem or a problem with a link.

Figure 10 Wrapped LAN

Segmented ring

In a segmented ring (Figure 11), more than two stations are wrapped on the trunk ring. Although this mode of operation is a valid FDDI LAN configuration, your LAN is probably experiencing a degraded or degrading condition.

Figure 11 Segmented Ring
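The distinction between thru, wrapped, and segmented rings can be expressed as a small classification over the Configuration State that each trunk ring station reports. The following Python sketch assumes that you have already collected those states (for example, by polling SMTConfigurationState); the function name, the station names, and the "Indeterminate" result for combinations that the text does not describe are assumptions made for illustration.

```python
def ring_state(config_states):
    """Classify a trunk ring from the Configuration State of each station.

    config_states maps a station name to "Thru", "Wrap", or "Isolated".
    Rules taken from the text:
      - no station in Wrap or Isolated  -> thru ring
      - exactly two stations in Wrap    -> wrapped ring
      - more than two stations in Wrap  -> segmented ring
    Any other combination is reported as "Indeterminate" (an assumption).
    """
    states = list(config_states.values())
    wrapped = states.count("Wrap")
    isolated = states.count("Isolated")
    if wrapped == 0 and isolated == 0:
        return "Thru"
    if wrapped == 2:
        return "Wrapped"
    if wrapped > 2:
        return "Segmented"
    return "Indeterminate"

# Hypothetical five-station trunk ring with a fault between stations B and C.
example = {"A": "Thru", "B": "Wrap", "C": "Wrap", "D": "Thru", "E": "Thru"}
print(ring_state(example))
```

A result of "Wrapped" or "Segmented" tells you only that a fault exists; the wrap points themselves indicate where to start looking.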

When a network connection has excessively high link errors, Station Management (SMT) shuts down the connection and tries to reconnect. A dual-attachment trunk ring station with an A or B connection that is shut down is one of the wrap points in the network. See Making Your FDDI Connections More Resilient for information about keeping a dual-attachment station connection from wrapping.

Isolated station

Sometimes a network wraps a particular station out of the ring. Stations on either side of a problem station can be wrapped. This effectively isolates the station or links that have problems, as shown in Figure 12.

Figure 12 Wrapped Ring with Isolated Station

If a ring was already wrapped when a network wraps a station out of the ring, then a segmented ring results, as shown in Figure 13.

Figure 13 Segmented Ring with Isolated Stations

Twisted ring

In a twisted ring, an A port is connected to an A port and a B port is connected to a B port, instead of the normal A-to-B connections. A twisted ring, which always has two twist points (stations), can exist in either a Thru or Wrap state. You can monitor the Twisted Ring Condition and Undesired Connection Attempt Event for evidence of twisted ring and other connection problems.

Identifying the Problem

To identify the problem, follow this process:

1. At the FDDI LAN level, verify that your network is operating.

If the network is operating, the FDDI ring may be segmented, and therefore an FDDI station or an Ethernet station on an Ethernet link may have lost connectivity to other nodes on the network.

2. Determine if a ring is in a Thru, Wrap, or Segmented state.

If the FDDI ring is segmented or wrapped, look for a problem with a link somewhere in the network or for a nonfunctioning node on your trunk ring. If the ring is operating and is not segmented, or if it is segmented but you still have connectivity to the stations in question, move to a more specific level in your network.

See Monitoring FDDI Connections for more information.

3. Determine if the poorly performing station is an Ethernet or FDDI station.

If the problem is an FDDI station, find out if it is congested (that is, if the station is so busy that it cannot accept all the network traffic that is directed to it) by determining its Bandwidth Utilization. Also determine if the station has a high frame error rate by looking at the FDDI Ring Errors.

If the problem is an Ethernet station, look for congestion by examining Ethernet Packet Loss and Bandwidth Utilization.

Solving the Problem

Identify the station that is causing the disconnection and take the appropriate steps:

• If the disconnection is caused by a wrapped ring, then fix the hardware or cabling problem at that station.

• If the station is congested, you have a device problem rather than a network problem. For example, if the congested station is a file server and every other machine on the network is retrieving and saving files using that server, consider upgrading your server or adding servers to the network. A variety of devices from different vendors may be communicating on an FDDI or Ethernet network; some are faster and more capable, and some are slower and more prone to congestion.

• If the station is an Ethernet station that is attached to an Ethernet segment, reevaluate the setup of your Ethernet network and make some changes to improve its performance.

You can also make FDDI connections more resilient by implementing dual homing or installing an Optical Bypass Unit (OBU) where FDDI connections are prone to fail. See Making Your FDDI Connections More Resilient for more information.

Monitoring FDDI Connections

Monitor your FDDI devices for Warning or Critical alerts in the FDDI Status tool.

Status Watch

Use Status Watch to identify these FDDI connectivity errors:

• Peer Wrap Condition
• Twisted Ring Condition
• Undesired Connection Attempt Event

Follow these steps:

1. In the Device area, select the device that is located where you suspect an FDDI ring connectivity problem.

2. Monitor the FDDI Status tool for the currently selected device.

Here are some pointers for monitoring:

• If the Peer Wrap Configuration State variable is Isolated, the device is not connected to the FDDI trunk ring. If you intend the device to remain isolated, this indication is not a serious condition. However, if the device is supposed to be connected on a trunk ring, a serious problem may exist: the device is no longer transmitting packets to the larger trunk ring.

• If the Peer Wrap flag (SMTPeerWrapFlag) is set, the device is one of the wrap points. The cause of the wrapped ring is somewhere in the portion of the network between the two stations that report the peer wrap condition.

Making Your FDDI Connections More Resilient

When devices are removed from an FDDI ring, there is a break in the fiber path that causes the ring to wrap until the ring is made whole again. To prevent the break in the FDDI connection, you can implement dual homing or install an Optical Bypass Unit (OBU).

Implementing Dual Homing

When the operation of a dual attachment node is critical to your network, dual homing adds reliability by providing a backup connection if the primary link fails. Because a dual attachment station (DAS) has two attachments to the FDDI ring (A-to-M and B-to-M), you can use one of them as a standby link if the active link fails. Using dual homing, only one of the two attachments is active at a time. In this sense, a DAS acts as if it is a single attachment station (SAS) by using its A port as the standby link.

Through SMT, a DAS can be dual homed to the same concentrator or, more commonly, to two concentrators. This arrangement provides a more stable trunk ring of concentrators. If one concentrator fails, the DAS enables the standby link to another concentrator to become the active link. See Figure 14.

If the station is a dual path or dual path/dual MAC station, you can configure the dual-homed station in one of two ways:

• With both links active
• With one link active and one connection withheld as a backup, only becoming active when one link fails

Figure 14 Dual Homing Configuration

Installing an Optical Bypass Unit

You can insert an Optical Bypass Unit (OBU) into the FDDI ring as if it were a node and then plug your device into it. To use an OBU, your device needs an optical bypass interface. This interface lets the bypass know whether your device is still on the ring. See Figure 15.

If your device is removed or if it fails, the bypass unit diverts the optical path away from your device, keeping the ring whole. You can use a bypass on devices that are prone to failure or are likely to be removed often, such as diagnostic equipment.

Figure 15 Optical Bypass Unit Configuration

FDDI Connectivity Reference

This section explains terms that are relevant to FDDI connectivity and provides additional conceptual and problem analysis detail.

Peer Wrap Condition

A Peer Wrap (wrapped ring) condition occurs when a dual-attachment station detects a fault (often a lost connection) and reconfigures the network by wrapping the dual trunk rings to form a single ring. Normally, the two stations that are adjacent to the fault wrap to maintain full connectivity. However, if a second fault occurs before the first is repaired, the network partitions itself into two or more rings, and stations lose connectivity.

When a station reports a Peer Wrap condition, locate and repair the problem that caused the station to wrap the rings. Potential causes include:

• Faulty FDDI port hardware
• Faulty cables or connectors
• Unplugged connectors
• Powered-down stations

You can expect to find the cause of the problem between the two stations that report the Peer Wrap condition.

Twisted Ring Condition

A Twisted Ring condition occurs when certain undesirable connection types exist. See Table 7 for more information. Although similar to the Undesired Connection Attempt, the Twisted Ring condition provides specific Station Management (SMT) and port information for diagnosis.

Undesired Connection Attempt Event

An Undesired Connection Attempt event occurs when a port tries to connect to another port of a type that may result in an undesirable network topology. Whether the connection attempt is successful depends on the current setting of the station's connection policies.

Table 7 lists connections that the FDDI standard defines as undesirable. The managed devices may or may not permit these connections, depending on their FDDI station configurations.

Table 7 Undesirable Connection Types

Connection Type   Reason That the Connection Is Undesirable
A-A               Twisted primary and secondary rings
A-S               A wrapped ring
B-B               Twisted primary and secondary rings
B-S               A wrapped ring
S-A               A wrapped ring
S-B               A wrapped ring
M-M               A tree of rings topology (illegal connection)

Table 8 lists FDDI connections that create valid topologies.

Table 8 Valid Connection Types

Connection Type   Reason That the Connection Is Valid
A-B               A normal trunk peer connection
A-M               A tree connection with possible redundancy. In a single MAC node, Port B has precedence (by default) for connecting to a Port M.
B-A               A normal trunk ring peer connection
B-M               A tree connection with possible redundancy. In a single MAC node, Port B has precedence (by default) for connecting to a Port M.
S-S               A single ring of two slave stations
S-M               A normal tree connection
M-A               A tree connection that provides possible redundancy
M-B               A tree connection that provides possible redundancy
M-S               A normal tree connection
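Tables 7 and 8 together cover all 16 pairings of the four FDDI port types (A, B, S, and M), so they lend themselves to a simple lookup. The Python sketch below copies the table entries into two dictionaries; the function name `classify_connection` is hypothetical and exists only for this example.

```python
# Entries copied from Table 7 (undesirable connection types).
UNDESIRABLE = {
    ("A", "A"): "Twisted primary and secondary rings",
    ("A", "S"): "A wrapped ring",
    ("B", "B"): "Twisted primary and secondary rings",
    ("B", "S"): "A wrapped ring",
    ("S", "A"): "A wrapped ring",
    ("S", "B"): "A wrapped ring",
    ("M", "M"): "A tree of rings topology (illegal connection)",
}

# Entries copied from Table 8 (valid connection types).
VALID = {
    ("A", "B"): "A normal trunk peer connection",
    ("A", "M"): "A tree connection with possible redundancy",
    ("B", "A"): "A normal trunk ring peer connection",
    ("B", "M"): "A tree connection with possible redundancy",
    ("S", "S"): "A single ring of two slave stations",
    ("S", "M"): "A normal tree connection",
    ("M", "A"): "A tree connection that provides possible redundancy",
    ("M", "B"): "A tree connection that provides possible redundancy",
    ("M", "S"): "A normal tree connection",
}

def classify_connection(local_port, remote_port):
    """Return ("valid" or "undesirable", reason) for a pair of port types."""
    pair = (local_port.upper(), remote_port.upper())
    if pair in UNDESIRABLE:
        return ("undesirable", UNDESIRABLE[pair])
    if pair in VALID:
        return ("valid", VALID[pair])
    raise ValueError(f"unknown port type in {pair}")

print(classify_connection("A", "A"))
```

Remember that an undesirable pairing is not necessarily rejected: whether the connection succeeds depends on the connection policies of the stations involved.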

Disabling the Offending Interface

Because broadcast storms can ultimately cause your whole network to become unavailable, take action immediately to disable the offending interface. You can enable the interface again after you have corrected the problem.

Address Tracker

Use Address Tracker to locate the interface that is causing the broadcast storm. Use Device View to disable the port.

Follow these steps:

1. In the Find Address window, enter the address of the interface that seems to be receiving the broadcast traffic.

You can copy the MAC or IP address from the Status Watch report and paste it into Address Tracker's Enter the Address You Want to Find field.

2. Click Find Now.

Search displays the device name.

3. Use Transcend Central to launch Device View and disable the port.

Disabling the port stops the broadcast storm before it interferes with all vital network traffic. You can re-enable this interface later using Device View or the device's console.

Correcting Spanning Tree Misconfigurations

Spanning Tree does not cause broadcast storms, but a loop in your Spanning Tree topology can create data that looks like a storm. A loop can occur in your topology if:

• Someone disables Spanning Tree on a port
• You set up your Spanning Tree configuration incorrectly

Device View

Use Device View to disable any Spanning Tree port that has a repeater attached to it and to correct Spanning Tree misconfigurations.

To correct Spanning Tree misconfigurations, use Device View to disable Spanning Tree Protocol (STP) for a port on a SuperStack® II Switch 1000, Switch 3000, Switch 3000 10/100, Switch 9000SX, Desktop Switch, LinkBuilder® FMS II Bridge/Management Module, or CoreBuilder 6000.

To disable the STP port state for a port on a SuperStack II switch:

1. Select a port and click the right mouse button.

2. From the shortcut menu, select Configure.

3. In the Port section, click the STP tab.

4. From the STP Port State list box, select Disabled.

5. Click Apply.

To disable the STP port state for a port on a LinkBuilder FMS II Bridge/Management Module:

1. Double-click the module.

2. From the shortcut menu, select Configure Bridge.

3. In the Port section, click the STP tab.

Broadcast Storms Reference

This section explains terms that are relevant to broadcast storms and provides additional conceptual and problem analysis detail.

Broadcast Packets

Broadcast packets, which are a normal part of network operation, are transmitted by a device to a broadcast address. For example, IP networks use broadcasts to resolve network addresses using Address Resolution Protocol (ARP). IPX networks use a large number of broadcast packets to operate most effectively.

Problems arise when broadcast packets endlessly propagate throughout the network, which increases the traffic volume on your network and the CPU time that each host spends processing and discarding unwanted broadcast packets.

Multicast Packets

Multicast packets, which are a normal part of network operation, are transmitted by a device to a multicast group address. Hosts that want to receive the packets indicate that they want to be members of the multicast group, and then multicast packets are distributed to that group. For example, multicast packets support the Spanning Tree Protocol. Multicast applications and underlying multicast protocols control multimedia traffic and shield hosts from processing unnecessary broadcast traffic. However, multicast traffic can also cause storms that saturate your network.

Duplicate Addresses

Use these sections to identify and correct problems caused by duplicate MAC and IP addresses:

• Duplicate Addresses Overview
• Finding Duplicate MAC Addresses
• Finding Duplicate IP Addresses

See Duplicate Addresses Reference for additional conceptual and problem analysis detail.

Networks sometimes generate duplicate MAC and IP addresses. Because duplicate addresses can cause problems with packet delivery, resolve them as soon as possible.

Understanding the Problem

Duplicate MAC addresses are caused by data link layer problems with Fiber Distributed Data Interface (FDDI) media and the passing of tokens on the FDDI ring. Duplicate IP addresses are caused by network layer problems. See these sections for more information about causes of duplicate addresses:

• Duplicate MAC Addresses
• Duplicate IP Addresses

Identifying the Problem

Identify duplicate MAC and IP addresses by following the instructions in these sections:

• Finding Duplicate MAC Addresses

• Finding Duplicate IP Addresses

Solving the Problem

Identify the cause of the duplicate address (such as user error or a hardware problem) and fix the problem if possible.

Finding Duplicate MAC Addresses

To find out if duplicate MAC addresses are occurring, monitor your network using Status Watch.

Status Watch

The Status Watch FDDI Status tool identifies duplicate FDDI MAC addresses, and Status Watch reports when two or more MACs on the same ring have the same MAC address (a Duplicate Address condition).

Follow these steps:

1. In the Status Watch Summary View window, determine if any FDDI Status conditions are reported. If there are, double-click the table cell value to display the Device List window.

Another approach is to examine only the devices that you know reside on your FDDI ring. In the Status Watch main window, red device icons indicate that a threshold has been exceeded.

2. Select a device.

• If you selected the device from the Device List window, the real-time report for that device appears in the Status Watch main window.

• If you selected the device from the main window, also select the FDDI Status tool to view the real-time report.

3. Determine if a Duplicate Address condition caused the FDDI Status tool to trigger a Critical or Warning status for that device.

In Status Watch, you can specify the status severity level to apply to a Duplicate Address condition.

Finding Duplicate IP Addresses

To find out if duplicate IP addresses are occurring, monitor your network using these applications:

• Address Tracker - To find duplicate IP addresses on 3Com devices and their attached networks

• LANsentry Manager® - To find duplicate IP addresses that are collected by probes gathering RMON2 SmartAgent® data from the Enterprise Communications Analysis Module (ECAM) downloaded on your network devices

Address Tracker

Use Address Tracker to determine when and where duplicate IP addresses occur.

Follow these steps:

1. From the Find Address menu, select Find Duplicate IP Addresses.

2. Click Find Now to start your search.
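If you want to cross-check the result outside Address Tracker, the underlying idea is simple: an IP address that has been observed with more than one MAC address is a duplicate candidate. The Python sketch below illustrates that check on a list of (IP address, MAC address) observations. It does not reproduce how Address Tracker works internally; the function name and the sample addresses are assumptions, and the list would in practice come from ARP caches or bridge tables that you export yourself.

```python
from collections import defaultdict

def find_duplicate_ips(observations):
    """Return the IP addresses that were seen with more than one MAC address.

    observations is an iterable of (ip, mac) pairs.
    The result maps each suspect IP to a sorted list of the MACs seen for it.
    """
    macs_by_ip = defaultdict(set)
    for ip, mac in observations:
        macs_by_ip[ip].add(mac.lower())  # normalize case before comparing
    return {ip: sorted(macs) for ip, macs in macs_by_ip.items() if len(macs) > 1}

# Hypothetical observations (addresses from the documentation range 192.0.2.0/24).
seen = [
    ("192.0.2.10", "00:A0:24:11:22:33"),
    ("192.0.2.11", "00:a0:24:44:55:66"),
    ("192.0.2.10", "00:a0:24:77:88:99"),
    ("192.0.2.11", "00:A0:24:44:55:66"),
]
duplicates = find_duplicate_ips(seen)
print(duplicates)
```

A hit is only a candidate: a device whose network adapter was recently replaced can also appear with two MAC addresses for a short time.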

LANsentry Manager

Use the Duplicates table in LANsentry Manager to compile a list of all stations with duplicate IP addresses. This table is available only on probes that have downloaded RMON2 (ECAM) SmartAgent software.

Follow these steps:

1. From the LANsentry Manager Address Map menu, select Duplicates. Address Map data is displayed as a table.

2. To export the contents of the table, click Export to launch the Data Export dialog box.

Duplicate Addresses Reference

This section explains terms that are relevant to duplicate addresses and provides additional conceptual and problem analysis detail.

Duplicate MAC Addresses

Each device on your network has a unique MAC address. This address identifies a single device on the network, allowing packets to be delivered to correct destinations.

Packets are delivered to their destinations by means of the MAC-address-to-IP-address translation that the Address Resolution Protocol (ARP) provides. Therefore, if MAC addresses are duplicated on the network, ARP caches of routing devices contain erroneous destinations. In FDDI, devices monitor network traffic, searching for their own MAC address in each packet to determine whether to decode the packet. If MAC addresses are not unique, two stations cannot be distinguished from each other.

Duplicate MAC addresses can occur for the following reasons:

• Someone has manually configured a MAC address for a device instead of using the address that the vendor supplied or allowing it to be assigned dynamically, and this address is also assigned to a different device.

• In rare circumstances, loops in a bridged network can cause a MAC hardware problem or an address learning problem that creates a duplicate MAC address entry in the bridging address table.

• On DECnet Phase 4 networks, MAC addresses are set from the DECnet address. A duplicate NET address can cause a duplicate MAC address.

Duplicate IP Addresses

Because IP addresses are critical for transmission of packets on TCP/IP networks, resolve duplicate IP addresses immediately.

Duplicate IP addresses can occur when someone has configured an IP address that is identical to an IP address that is assigned to a different device. Address assignments, although possible for you to configure manually, are usually made using one of these protocols:

• Dynamic Host Configuration Protocol (DHCP) - Allows your network to dynamically assign IP addresses to nodes. With this protocol, a DHCP server temporarily assigns an IP address to a node, or you can statically configure addresses as needed.

• BOOTstrap protocol (BootP) - Allows you to statically assign IP addresses to nodes. This protocol is more efficient than RARP.

• Reverse ARP (RARP) - Allows you to statically assign IP addresses to nodes. However, because this protocol relies on the MAC address to identify the node, you cannot use it on networks that dynamically assign hardware addresses.

Ethernet Packet Loss

Use these sections to identify and correct Ethernet packet loss:

• Ethernet Packet Loss Overview
• Searching for Packet Loss

See Ethernet Packet Loss Reference for additional conceptual and problem analysis detail.

If your Ethernet network shows signs of congestion, it may be experiencing packet loss. When your network is congested, utilization is usually high, packets are discarded because buffers are full, and collision rates are up. Problems related to Collisions are often at the heart of packet loss.

Understanding the Problem

Collisions are normal in Ethernet networks. In many cases, Collision rates of 50 percent do not cause a large decrease in throughput. The Collision rate helps mark the upper limit on your network (the maximum percentage of collisions that your network can bear), which is usually around 70 percent. If Collisions increase above this upper limit, your network can become unreliable.

When the Collision rates increase, so do Excessive Collisions, which causes a delay in transmitting data. An increase in Collisions also indicates that network utilization and network errors, such as FCS Errors, are probably increasing.

The real packet problems to watch for, however, are undetected collisions that show up as Late Collisions.
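To make the two percentages concrete, the following Python sketch computes a collision rate from raw counters and compares it with the 50 percent and 70 percent figures given above. Note the assumptions: the text does not define exactly how the Collision rate is calculated, so the sketch takes it as collisions divided by frames sent; the wording of the three result strings is also invented for the example.

```python
def collision_rate(collisions, frames_sent):
    """Collisions as a percentage of frames sent (assumed definition)."""
    if frames_sent == 0:
        return 0.0
    return 100.0 * collisions / frames_sent

def assess(rate_pct, tolerable_pct=50.0, upper_limit_pct=70.0):
    """Compare a collision rate with the two rule-of-thumb figures from the text."""
    if rate_pct > upper_limit_pct:
        return "above upper limit: network may be unreliable"
    if rate_pct > tolerable_pct:
        return "approaching upper limit"
    return "within normal range"

# Hypothetical counters read from one port over a sampling interval.
rate = collision_rate(collisions=350, frames_sent=1000)
print(rate, assess(rate))
```

Because the upper limit is described as "usually around 70 percent", treat both thresholds as parameters to tune against your own baseline rather than as fixed values.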

Identifying the Problem

To identify that your network's problem is related to packet loss, verify that frames are being dropped on your network by examining this packet loss data:

• Alignment Errors
• Collisions
• Excessive Collisions
• FCS Errors
• CRC Errors
• Late Collisions
• Receive Discards
• Too Long Errors
• Too Short Errors
• Transmit Discards

The process of identifying the problem is discussed in Searching for Packet Loss.

Solving the Problem

If you notice that packet loss data is consistently high, then your network is too congested. In this case, segment your network with the appropriate network device (such as a switch or router). If Collision data shows increases but your network's utilization is the same, then your network may have a physical problem, such as cabling that is too long. Other problems that packet loss data can indicate include:

• Faulty connectors or improper cabling
• Excessive numbers of repeaters between network devices
• Defective Ethernet transceivers or controllers

Possible solutions to these problems are explained in the procedures in Searching for Packet Loss.
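The reasoning in this section reduces to two rules of thumb, which the Python sketch below encodes as a small decision function. It is only a summary aid: the function name is hypothetical, and the three yes/no inputs are judgments that you make from your own statistics and baselines, not values that any application reports directly.

```python
def triage_packet_loss(loss_consistently_high, collisions_rising, utilization_rising):
    """Apply the two rules of thumb from the text to yes/no observations."""
    if loss_consistently_high:
        # Consistently high packet loss data points to congestion.
        return "congested: segment the network with a switch or router"
    if collisions_rising and not utilization_rising:
        # More collisions at unchanged utilization points to the physical layer.
        return ("possible physical problem: check cable length, connectors, "
                "repeater count, and transceivers or controllers")
    return "no conclusion from these rules"

print(triage_packet_loss(False, True, False))
```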

Searching for Packet Loss

When you look for packet loss, use the following applications:

• Status Watch - For Ethernet and MIB-II data collection using SNMP polling
• LANsentry Manager Network Statistics Graph - For RMON data collection using an RMON probe
• Device View - On a per-device basis, you can evaluate statistics for any port on the device

Status Watch

Status Watch monitors:

• Alignment Errors
• Excessive Collisions
• FCS Errors
• Receive Discards
• Transmit Discards

Follow these steps:

1. Determine if the thresholds for the Alignment Errors tool and FCS Errors tool are being exceeded.

Table 16 identifies the problems that this data can indicate and your possible actions. For information about problems related to a nonstandard Ethernet implementation, see Nonstandard Ethernet Problems.

Table 16 Alignment Errors, FCS Errors, and CRC Errors Data

Possible Problem: Faulty cabling
Possible Action: Examine the cable and cable connections for breaks or damage.

Possible Problem: Network noise
Possible Action: Look for improper cabling, faulty cable, faulty network equipment, or cables that are too close to equipment that emits electromagnetic interference (lamps, for example).

Possible Problem: Faulty transceiver
Possible Action: Use an analyzer to identify the problematic transceiver. If necessary, replace the transceiver, network adapter, or station.

Possible Problem: Fault at the transmitting end station
Possible Action:
1. Locate the source of the errors by looking at the module and port statistics.
2. Verify the correct operation of the transceiver or adapter card of the device that is connected to the problem port.
3. If the card appears to be operating correctly, examine the cable and cable connections for breaks or damage.

Possible Problem: Station powering up or down
Possible Action: None required.
Early implementations of Ethernet transceivers generate a significant amount of in-band noise when powering up; they frequently cause Alignment Errors and FCS Errors in an otherwise stable network.
When powering up, some software drivers for Ethernet controllers also initiate Time Domain Reflectometry (TDR) tests to test the Ethernet media. Network monitors report TDR tests as Alignment Errors and FCS Errors.

Possible Problem: Faulty adapter
Possible Action: Replace the adapter.

2. Determine if the Excessive Collisions tool threshold is being exceeded. Table 17 identifies the problems that this data can indicate and your possible actions.

Table 17 Collisions and Excessive Collisions Data

Possible Problem: Busy network
Possible Action: Use a bridge, router, or switch to reconfigure your network into segments with fewer stations.

Possible Problem: Faulty device (adapter, switch, hub, and the like) that does not listen before broadcasting. This problem increases the incidence of all types of collisions.
Possible Action: Isolate each adapter to see if the problem stops.

Possible Problem: Network loop
Possible Action: Ensure that no redundant connections to the same station have both connections active simultaneously.

3. Determine if the Receive Discards and Transmit Discards tools' thresholds are being exceeded.

If these errors are high in conjunction with the data that you learned in steps 1 and 2, then your network is overloaded. Segment your network.

LANsentry Manager Network Statistics Graph

Use the LANsentry Manager Network Statistics graph to view data for:

• Collisions
• Late Collisions
• Bandwidth Utilization
• CRC Errors
• Too Long Errors
• Too Short Errors

Follow these steps:

1. Display a Network Statistics graph for the local Ethernet segment on which users have reported poor performance.

This graph shows the most recent trend in Collision rates. If you have set up a History sample, you can also look at the historical trend. If a number of segments are connected by repeaters, examine the graph for each Ethernet segment.

2. Analyze Utilization and Collision rates to determine whether collisions are caused by an overloaded segment or a faulty component.

• If Utilization rates are high - The collisions are probably caused by an overloaded segment. If you have added nodes or new applications to your network, consider reconfiguring the cabling system using bridges and routers to filter out remote collisions and to keep local traffic on one segment. This action should level the network load.

• If Utilization rates are stable and appear normal - The collisions are probably caused by faulty components. In this case, do the following:

  • If the network consists of repeaters - Compare the Network Statistics graphs for each segment connected to the repeater. Because repeaters repeat traffic across all connected segments (which makes many segments seem like one network), you should see similar levels of traffic on all segments. One segment that shows dissimilar levels of traffic and collisions may indicate faulty hardware. In this case, monitor several collisions to track the source station that is transmitting too soon after collisions, and repair the station. Packets that are transmitted too soon after collisions are unlikely to be valid. See Table 17 for more information about Collisions.

  • On other networks - Determine the segment cable length.

3. Examine the CRC Errors and Late Collisions, which often indicate cabling or component problems.

Table 16 identifies the problems that CRC Errors can indicate and your possible actions. Table 18 identifies the problems that Late Collisions data can indicate and your possible actions.

Table 18 Late Collisions Data

Possible Problem: Cabling problems
• Segment too long
• Failing cable
• Segment not grounded properly (noise)
• Improper termination
• Taps too close (10BASE-5 and 10BASE-2 only)
• Noisy cable
Possible Action: Correct the cabling problem by doing one or more of the following:
• Reduce the segment length
• Replace the cable
• Ground the cable
• Terminate the cable correctly
• Check the taps
• Check for cables too close to equipment that emits electromagnetic interference

Possible Problem: Component problems
• Deaf or partially deaf node
• Failing repeater, transceiver, or controller cards
Possible Action: Correct the component problem by doing one of the following:
• Trace the failing component and replace it
• Replace the NIC or the transceiver

4. Trace Too Short Errors and Too Long Errors to the sender.

These errors often indicate faulty routers or LAN drivers and transceiver problems. Table 19 identifies the problems that this data can indicate and your possible actions.

Possible Problem: A transceiver on your network is adding bits to the packets that are transmitted by the attached station.
Possible Action:
1. Use a network analyzer to identify the problematic transceiver.
2. If necessary, replace the transceiver, network adapter, or station.

Possible Problem: The jabber protection mechanism on a transceiver has failed; it can no longer protect the network from the jabbering produced by the attached station.
Possible Action: Replace the network card.

Possible Problem: Excessive noise on the cable.
Note: Some 10/100 Mbps cards that autodetect the network speed may connect to the network at the wrong speed, causing excessive noise.
Possible Action: Check for improper cabling, faulty cable, faulty network equipment, or cables too close to noisy electronic equipment (lamps, for example).
If your network card autodetects the network speed and you have ruled out other problems, manually configure the network speed.

Possible Problem: Faulty routers (two different network types are connected and the router is not enforcing proper frame size restrictions).
Possible Action: Notify the manufacturer.

Possible Problem: Faulty LAN driver.
Possible Action: Replace the driver.

Possible Problem: A normal condition on a LinkSwitch® 1000, LinkSwitch® 3000, or CoreBuilder 5000 FastModule.
If you use maximum-sized 1518-byte Ethernet frames, the device's VLT-enabled ports add a frame tag of 4 bytes, resulting in a misleading Too Long Frame error.
Possible Action: These frames are passed successfully but will create the Too Long Frame error message.
If you want to eliminate the error message, reduce your Ethernet packet frames by 4 bytes.

Device View

Device View allows you to display a variety of port and device-level statistics relevant to Ethernet packet loss. Table 20 describes these statistics and their use in troubleshooting.

Table 20. Activity and Error Statistics in Device View

Statistics Group: Activity
Description: Displays the total network activity and errors on the selected port.
Use in Troubleshooting: This data shows readable packets, broadcast packets, Collisions, total errors, and runts (which cause Too Short Errors). You can interpret this data in the following ways:
• Runts are often caused by Collisions. However, if the values increase at specific times of the day, you may need to change the network topology to manage the traffic more efficiently (for example, with switches or routers).
• Runts can also be caused by a badly terminated coax cable.
• Large numbers of runts not associated with high levels of collisions can indicate a transmission problem (examine the cable).
• Particularly high numbers of Collisions compared to the total number of readable packets can point to a hardware problem (a bad adapter) or to a data loop.
• A high proportion of Broadcast Packets (>10%) on a heavily utilized network (>50% of available bandwidth) can point to an incorrectly configured bridge or router on the network.

Statistics Group: Errors
Description: Displays the number of frames with errors on the selected port.
Use in Troubleshooting: The significance of errors depends on accompanying errors and prevailing network conditions. See the following error data for more information:
• Alignment Errors: Table 16
• FCS Errors: Table 16
• Too Long Errors: Table 19
• Too Short Errors, or runts: Table 19
• Late Collisions: Table 18
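The interpretation rules for the Activity group can be sketched as a small function. This is only an illustration: the 10% broadcast and 50% utilization figures come from the table, while the function name, the counter names, and the other cut-off values are assumptions chosen for the example.

```python
def interpret_activity(readable, broadcast, collisions, runts, utilization):
    """Return troubleshooting hints for one port's Activity counters.

    utilization is the fraction of available bandwidth in use (0.0 to 1.0).
    The 10% broadcast and 50% utilization figures come from Table 20; the
    remaining cut-offs are illustrative assumptions, not documented values.
    """
    hints = []
    if readable == 0:
        return hints  # nothing to compare against
    if broadcast / readable > 0.10 and utilization > 0.50:
        hints.append("check for an incorrectly configured bridge or router")
    if collisions / readable > 0.50:  # assumed meaning of "particularly high"
        hints.append("check for a bad adapter or a data loop")
    # Assumed cut-offs: many runts (over 5%) with few collisions (under 1%).
    if runts / readable > 0.05 and collisions / readable < 0.01:
        hints.append("examine the cable for a transmission problem")
    return hints
```

Calling `interpret_activity(1000, 200, 5, 0, 0.6)` flags only the bridge or router hint, because broadcasts are 20% of readable packets on a segment running at 60% utilization.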

To display Activity and Errors statistics for a device or port, follow these steps:

1. Select the required port or device.

2. From the shortcut menu, select Activity or Errors.

The statistics available depend on the type of port or device selected. See Table 20 for troubleshooting information.

Ethernet Packet Loss Reference

This section explains terms that are relevant to Ethernet packet loss and provides additional conceptual and problem analysis detail.

Alignment Errors

An Alignment Error indicates a received frame for which both of the following are true:

• The number of bits received is not a whole number of bytes (that is, not an integral multiple of 8).
• The frame has a Frame Check Sequence (FCS) error.

Alignment Errors often result from MAC-layer packet formation problems, cabling problems that cause corrupted or lost data, and packets that pass through more than two cascaded multiport transceivers. See FCS Errors for more information about interpreting Alignment Errors.


Collisions

Collisions indicate that two devices detect that the network is idle and try to send packets at the same time (within one round-trip delay). Because only one device can transmit at a time, both devices must stop sending and attempt to retransmit. Collisions are detected by the transmitting stations.

The retransmission algorithm helps to ensure that the packets are not retransmitted at the same time. However, if the two devices retry at nearly the same time, the packets can collide again. The process repeats until either the packets pass onto the network without collisions, or 16 consecutive collisions occur and the packets are discarded.
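The text does not name the retransmission algorithm. Standard Ethernet uses truncated binary exponential backoff, which the following minimal sketch illustrates. The function and constant names are chosen for this example only.

```python
import random

MAX_ATTEMPTS = 16    # the frame is discarded after 16 consecutive collisions
BACKOFF_LIMIT = 10   # the random range stops doubling after 10 collisions


def backoff_slots(collision_count, rng=random):
    """Pick how many slot times to wait after the nth consecutive collision.

    Returns None when the frame should be discarded (excessive collisions).
    """
    if collision_count >= MAX_ATTEMPTS:
        return None
    k = min(collision_count, BACKOFF_LIMIT)
    # Uniform choice in 0 .. 2**k - 1, so the range doubles with each collision.
    return rng.randrange(2 ** k)
```

Because each station draws its delay independently, two stations that collided are unlikely to pick the same delay again, and the widening range makes repeated collisions progressively less likely.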

CRC Errors

A Cyclic Redundancy Check (CRC) Error is an RMON statistic that combines FCS Errors and Alignment Errors. These errors indicate that packets were received with:

• A bad FCS and an integral number of octets (FCS Errors)
• A bad FCS and a non-integral number of octets (Alignment Errors)

CRC Errors can cause an end station to freeze. If a large number of CRC Errors are attributed to a single station on the network, replace the station's network interface board. Typically, a CRC Error rate of more than 1 percent of network traffic is considered excessive.
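As a worked illustration of the 1 percent guideline, the sketch below combines the two counters into a CRC Error rate. The function names are hypothetical, and the counts are assumed to come from whatever monitoring tool you use.

```python
def crc_error_rate(fcs_errors, alignment_errors, total_packets):
    """CRC Errors (FCS Errors plus Alignment Errors) as a fraction of traffic."""
    if total_packets == 0:
        return 0.0
    return (fcs_errors + alignment_errors) / total_packets


def crc_rate_is_excessive(fcs_errors, alignment_errors, total_packets,
                          threshold=0.01):
    """True when the CRC Error rate exceeds the 1 percent guideline."""
    return crc_error_rate(fcs_errors, alignment_errors, total_packets) > threshold
```

For example, 15 FCS Errors and 5 Alignment Errors in 1000 packets give a rate of 2 percent, which the guideline treats as excessive.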

Excessive Collisions

Excessive Collisions indicate that 16 consecutive collisions have occurred, usually a sign that the network is becoming congested. For each excessive collision count (that is, after 16 consecutive collisions), a packet is dropped. If you know the normal rate of excessive collisions, you can determine when the rate of packet loss is affecting your network's performance. See Knowing Your Network's Configuration for more information.

FCS Errors

Frame Check Sequence (FCS) Errors, a type of CRC Error, indicate that frames received by an interface are an integral number of octets long but do not pass the FCS check. The FCS is a mathematical way to ensure that all of the frame's bits are correct without having the system examine each bit and compare it to the original. Packets with Alignment Errors also generate FCS Errors.

Both Alignment Errors and FCS Errors can be caused by equipment powering up or down, or by interference (noise) on unshielded twisted-pair (10BASE-T) segments. In a network that complies with the Ethernet standard, FCS or Alignment Errors indicate bit errors during a transmission or reception. A very low rate is acceptable. Although Ethernet allows a bit error rate of 1 in 10^8, typical Ethernet performance is 1 in 10^12 or better.
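To put the two bit error rates in perspective, the sketch below estimates the chance that a single maximum-sized frame contains at least one bit error. It assumes independent bit errors, which is a simplification, and the function name is chosen for this example.

```python
import math


def frame_error_probability(bit_error_rate, frame_octets=1518):
    """Probability that a frame of the given size has at least one bit error.

    Assumes independent bit errors: 1 - (1 - BER) ** bits, computed with
    log1p/expm1 so that very small error rates do not lose precision.
    """
    bits = frame_octets * 8
    return -math.expm1(bits * math.log1p(-bit_error_rate))
```

Under that assumption, a bit error rate of 1 in 10^8 corrupts roughly 1 in 8,000 maximum-sized frames, while 1 in 10^12 corrupts roughly 1 in 80 million.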

Late Collisions

Late Collisions indicate that two devices have transmitted at the same time, but cabling errors (most commonly, excessive network segment length or too many repeaters between devices) prevent either transmitting device from detecting a collision. Neither device detects a collision because the time to propagate the signal from one end of the network to the other is longer than the time to put the entire packet on the network. As a result, neither of the devices that cause the late collision senses the other's transmission until the entire packet is on the network.

Late collisions also occur for small packets, but the transmitter cannot detect them. As a result, a network suffering measurable Late Collisions for large packets is losing small packets as well.
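The timing argument can be made concrete with a short calculation. The sketch below compares a segment's round-trip delay with the time needed to transmit a minimum-sized (64-octet) frame. The function names and the sample delay values are assumptions for illustration.

```python
MIN_FRAME_BITS = 64 * 8  # minimum Ethernet frame: 64 octets = 512 bits


def min_frame_time(bit_rate_bps):
    """Seconds needed to put a minimum-sized frame on the wire."""
    return MIN_FRAME_BITS / bit_rate_bps


def late_collisions_possible(round_trip_delay_s, bit_rate_bps=10_000_000):
    """True when a collision can arrive after a minimum frame has been sent."""
    return round_trip_delay_s > min_frame_time(bit_rate_bps)
```

At 10 Mbps a minimum frame takes 51.2 microseconds to transmit, so a segment whose round-trip delay exceeds that figure (because it is too long or has too many repeaters) can produce collisions that the sender of a small packet never sees.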

Nonstandard Ethernet Problems

Table 21 lists the symptoms that typically occur if a system violates the Ethernet standard.

Table 21. Symptoms of Common Ethernet Network Problems

Symptoms: FCS Errors and Alignment Errors increase significantly.
Problem: Network cabling is too long.
Notes: If you use a promiscuous network monitor, the number of Late Collisions reported by stations should correlate with the FCS and Alignment Errors reported by the monitor.

Symptoms: FCS and Alignment Errors increase proportionally with interference (sometimes referred to as noise hits).
Problem: Network segment is noisy.
Notes: Typically observed on a 10BASE-T network segment in a noisy environment. If you use multiple promiscuous monitors, the FCS and Alignment Errors among the monitors will not correlate.

Symptoms: If the monitor can track runts (also called Too Short Errors), the number of runt packets should be significantly higher than normal. FCS and Alignment Errors are much higher than normal.
Problem: Networks do not conform to the access scheme of Carrier Sense Multiple Access with Collision Detect (CSMA/CD).
Notes: Occurs when some implementations of Ethernet in the segment are not entirely compatible with IEEE 802.3 repeaters. Collision fragments linger on the network long enough to collide with retry packets at the minimum interpacket gap (IPG). The IPG is smaller on one side of the repeated network, causing a lost packet. Ethernet controllers cannot receive packets that are separated by 4.7 µs or less. Some controllers cannot sustain receptions of packets separated by as much as 9.6 µs. If runt packets are received one after another and are followed by a collision fragment, Ethernet controllers that cannot sustain reception will lose packets.

Receive Discards

Receive Discards indicate that received packets could not be delivered to a higher-layer protocol because of congestion or packet errors.

Too Long Errors

A Too Long Error indicates that a packet is longer than 1518 octets (including FCS octets) but otherwise well formed. Too Long Errors are often caused by a bad transceiver, a malfunction of the jabber protection mechanism on a transceiver, or excessive noise on the cable.

Too Short Errors

A Too Short Error, also called a runt, indicates that a packet is fewer than 64 octets long (including FCS octets) but otherwise well formed.
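The two length limits can be expressed as a simple check. In this sketch the optional `tag_octets` parameter is an assumption added to model ports that append a frame tag (such as the 4-byte tag mentioned earlier), so that tagged maximum-sized frames are not misreported.

```python
MIN_FRAME_OCTETS = 64    # shorter frames are runts (Too Short Errors)
MAX_FRAME_OCTETS = 1518  # longer frames are Too Long Errors


def classify_frame_length(octets, tag_octets=0):
    """Classify a well-formed frame by its length, FCS octets included."""
    if octets < MIN_FRAME_OCTETS:
        return "too short"
    if octets > MAX_FRAME_OCTETS + tag_octets:
        return "too long"
    return "ok"
```

A 1522-octet frame is reported as too long by default, but passes when the check is told to allow a 4-octet tag with `classify_frame_length(1522, tag_octets=4)`.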

Transmit Discards

Transmit Discards indicate that packets were not transmitted because of network congestion.

FDDI Ring Errors

Use these sections to identify and correct FDDI ring errors:

• FDDI Ring Errors Overview
• Identifying Ring Errors

Fiber Distributed Data Interface (FDDI) often corrects its own problems. However, because FDDI cannot correct all errors (especially those related to hardware problems), you should monitor FDDI errors.

Understanding the Problem

FDDI ring errors that you should monitor include:

• Elasticity Buffer Error Condition
• Frame Error Condition
• Frames Not Copied Condition
• Link Error Condition

Identifying the Problem

First, determine the type of FDDI ring errors and where they are occurring. As with other FDDI problems, identify the upstream and downstream neighbors of the devices that you are monitoring.

Several types of network errors can cause FDDI performance problems. For example, problems with cables or physical connections may result in a link or frame error. Elasticity buffer (EB) errors can also lead to link and frame errors.

FDDI deals with port-related errors as follows:

• The variable PORTlerAlarm is the link error rate (LER) value at which a link connection generates an alarm. When the LER is greater than the alarm setting, Station Management (SMT) sends a Status Report Frame (SRF) to notify you that there is a problem with a port.

The PORTlerAlarm threshold is set lower than the PORTlerCutoff threshold so that you are notified of a problem before the port is actually removed from the ring.

• When link errors reach the threshold defined by the variable PORTlerCutoff, SMT breaks the connection, disabling the PHY that detected the problem. A Link Error Condition is also generated.

FDDI deals with MAC-related errors as follows:

• When MAC frame errors reach a certain threshold, a Frame Error Condition is generated. Because the actual error can be further upstream than the immediate connection, the connection remains intact.

• For a large network, the worst-case MACFrameErrorRatio is less than 0.1 percent. However, during network configuration, frame error ratios can reach 50 percent for short periods. When you detect a sustained frame error ratio of more than 0.1 percent, a problem exists between the station that is reporting the condition and the nearest upstream MAC.
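The alarm and cutoff behavior described above can be sketched as a two-threshold check. This is an illustration only: the function name and the threshold values in the usage note are hypothetical, and error rates are written here as plain fractions rather than in whatever encoding a particular device uses.

```python
def link_error_action(ler, ler_alarm, ler_cutoff):
    """Describe the response to a measured link error rate (LER).

    ler_alarm and ler_cutoff stand in for the PORTlerAlarm and PORTlerCutoff
    settings; the alarm threshold must not exceed the cutoff threshold.
    """
    if ler_alarm > ler_cutoff:
        raise ValueError("alarm threshold must not exceed cutoff threshold")
    if ler >= ler_cutoff:
        return "cutoff"  # the connection is broken and the PHY is disabled
    if ler > ler_alarm:
        return "alarm"   # a Status Report Frame is sent
    return "ok"
```

With an assumed alarm threshold of 1e-8 and an assumed cutoff of 1e-7, a measured rate of 5e-8 raises the alarm while leaving the port on the ring, which is the early warning the text describes.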

Solving the Problem

To solve problems related to FDDI errors, fix the hardware, cabling, or congestion problem.

Identifying Ring Errors


Use Status Watch to monitor your FDDI devices for Warning or Critical alerts.

Status Watch

Use Status Watch to identify FDDI ring errors.

Follow these steps:

1. Monitor the FDDI Status tool for the currently selected device.

2. Determine whether Status Watch is reporting Elasticity Buffer Errors or a high percentage of Frame Errors, Frames Not Copied, or Link Error Rates for the currently selected device.

FDDI Ring Errors Reference

This section provides additional conceptual and problem analysis detail

Elasticity Buffer Error Condition

The Elasticity Buffer Error condition occurs when a port's elasticity buffer overflows or underflows. This condition usually indicates that a port's hardware is not operating within the tolerances that the FDDI standard specifies. Look for the problem in the hardware of either the port that is reporting the condition or the immediately adjacent port.

Frame Error Condition

The Frame Error condition occurs when the percentage of frames that contain errors exceeds a preset threshold. When a device is an uplink to FDDI (that is, a device is transmitting onto FDDI), this type of condition indicates that the ring is saturated. The ring is out of buffer space, and packets are being dropped from the device's backbone port.

The problem indicated by the frame errors is usually located between the MAC that reports the condition and its upstream neighbor. Because many physical connections can lie along this path, the MACFrameErrorRatio variable identifies only the two MACs between which the problem is occurring.

Frames Not Copied Condition

The Frames Not Copied condition occurs when the percentage of frames that are dropped because of insufficient buffer space exceeds a preset threshold. This condition indicates that the station is congested and is unable to process frames as quickly as they arrive. To help eliminate congestion:

• Add more capacity to the station
• Reconfigure your network so that end stations that communicate heavily with one another are on the same bridge or switch
• Filter out certain traffic

Link Error Condition

The Link Error condition occurs when a port detects link errors at a rate that exceeds a preset threshold. When the Link Error threshold is exceeded, the station removes itself from the ring and tries to reinsert itself on the ring. This action creates a MAC Neighbor Change Event (which also occurs if a ring wraps).

Link errors may indicate an FDDI PHY hardware problem (such as a faulty transmitter) or a faulty cable or connector. Look for the problem in the portion of the network between the port that is reporting the condition and the first upstream transmitter.

MAC Neighbor Change Event

The MAC Neighbor Change event occurs when a MAC's upstream or downstream neighbor changes.

This event indicates either:

• A network reconfiguration
• Another station that is leaving or joining the ring

Network File Server Timeouts

Use these sections to identify and correct timeouts on network file servers:

• Network File Server Timeout Overview
• Looking for Obvious Errors
• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

See Network File Server Timeouts Reference for additional conceptual and problem analysis detail.

A network file server can time out if your network gets congested or if your server is having problems. Users might have problems downloading data from or to the server, or copying files from or to the server. To help you understand the troubleshooting process for this type of problem, an EXAMPLE throughout this section follows the symptoms, analysis, and resolution of a typical file server timeout problem.


Understanding the Problem

When users log in, their stations make network file server calls either to determine quotas (if this feature has been enabled) or to mount user home directories. Network file server timeout messages, even when spread across multiple nodes, indicate a problem either with the network or with a server.

EXAMPLE: UNIX users notice that it takes a long time (over 30 seconds in some cases) to log in to any machine. Some machines report network file server timeout messages, but the messages have no obvious pattern and are infrequent. You begin to get a sense of the problem.

Identifying the Problem

First, rule out the obvious causes. Ask these questions:

• Can you access the network file server with Telnet?
• Have any alarms been triggered?
• Are there any new errors?

The process of identifying the problem is developed in Looking for Obvious Errors.

Solving the Problem

To determine the cause, reproduce the fault while you monitor the network. After you know the cause, you can fix the problem.

The solutions to the network file server timeout are identified in these sections:

• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

Looking for Obvious Errors

To look for obvious errors, use these applications:

• Ping and Telnet - To check connectivity to the network file server nodes
• LANsentry Manager Alarms View - To search for triggered alarms
• LANsentry Manager Statistics View - To look for errors
• LANsentry Manager History View - To identify trends


Ping and Telnet

Determine whether you can contact network file server nodes using Ping and Telnet. If the response is extremely slow, a problem may exist with the connections to the nodes. No delay indicates that the connections are normal, implying that the delay is occurring elsewhere. In this case, use LANsentry® Manager tools to determine whether packets are being lost or ignored.
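If you want to script this check rather than run Telnet by hand, one option is to time a TCP connection to the server's Telnet port. The sketch below is an assumption-laden stand-in for the manual test: the function name is hypothetical, port 23 is the standard Telnet port, and a successful connection shows only that the node is reachable, not that its file service is healthy.

```python
import socket
import time


def tcp_connect_time(host, port=23, timeout=5.0):
    """Return the seconds taken to open a TCP connection, or None on failure."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        return None  # refused, unreachable, or timed out
```

A result of `None`, or a connect time far above what other nodes report, points at the connection to that node. A consistently small value suggests the delay lies elsewhere.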

LANsentry Manager Alarms View

Using the LANsentry Manager Alarms View, you can determine whether any configured alarms have been triggered.

Search the Alarms View to see if any MAC events have been logged.

EXAMPLE: MAC events have not been logged for the network to which the UNIX users are attached.

Even though no alarms have occurred, errors may exist. For example, a low rate of background errors may exist just below the alarm threshold. Because RMON alarms are based on maximum and minimum values, they may miss constant, periodic, or low amounts of errors.

LANsentry Manager Statistics View

Using the LANsentry Manager Statistics View, you can display a multisegment graph of utilization and error statistics.

Follow these steps:

1. Set up a graph that shows utilization and errors on all your major segments.

2. Determine whether any segments are particularly busy or error prone.

EXAMPLE: You notice that one segment of the UNIX network, HUB3, is reporting Too Long Errors and FCS Errors roughly every second sample. While the amount is not higher than normal, it is currently higher than on any other segment.

LANsentry Manager History View

Using the LANsentry Manager History View, you can display a rolling history table to determine whether the errors that you are seeing are new. For example, if you have a history table that runs for 30-minute samples over two days, you can compare the most recent sample to a previous sample, looking for new errors. If your probe has the resources, use a much finer resolution sample stored for a shorter time (every 30 seconds for 2 hours) to spot recent errors more easily.
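The comparison of a recent sample against an earlier one can be sketched as follows. The data layout is an assumption for illustration: each history sample is represented as a dictionary that maps an error name to its count, which is not necessarily how any particular tool exports its history table.

```python
def new_error_types(baseline_sample, recent_sample):
    """Return the error names that appear in the recent sample only.

    Each sample is a dict mapping an error name to its count for one
    history interval (an assumed layout, not a tool-specific format).
    """
    return sorted(
        name
        for name, count in recent_sample.items()
        if count > 0 and baseline_sample.get(name, 0) == 0
    )
```

Errors that were already present in the baseline are ignored, so the result lists only what has changed since the earlier interval.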

EXAMPLE: You see that the history table shows that no error rates remained constant throughout the day. However, the errors that did occur were on the device HUB3 and were Too Long Errors and FCS Errors.

Reproducing the Fault While Monitoring the Network

Although the RMON View in LANsentry Manager can show error rates and help you to identify the location of the problem, it may not provide enough data to solve the problem. To determine the cause of the problem, reproduce it while you monitor the network by using these applications:

• LANsentry Manager Top-N Graph - To locate a quiet node to use for reproducing the fault
• LANsentry Manager Packet Capture - To capture packets from the hub to which the quiet node is attached
• LANsentry Manager Packet Decode - To analyze the packets to assess network file server traffic and delays
• Address Tracker - To find the location of the problem nodes

EXAMPLE: Using LANsentry Manager, you find a hub on the network with a higher than normal error rate. However, the error rate does not seem high enough to cause login delays of 60 seconds or more.

LANsentry Manager Top-N Graph

Using the Top-N graph in the LANsentry Manager main window, locate a quiet node that has been showing the same problem. Choose a quiet node so that you do not receive excessive traffic when you try to isolate the problem.

EXAMPLE: You see that the node Monolith, which has the same Network File System (NFS) mounts as the other nodes on the network, is quiet. You decide to use this node for reproducing the fault. See Network File System (NFS) Protocol for more information about NFS.

LANsentry Manager Packet Capture

Using the LANsentry Manager Packet Capture application, capture packets from the network using predefined patterns and start-and-stop conditions.

Follow these steps:

1. Set up a capture buffer on a probe that is connected to the same hub as the quiet node. Until you know more about the problem, set a very general filter.

EXAMPLE: You select a MAC-layer filter and set a conversation filter to capture all packets to and from Monolith.

2. Telnet into and log out from the quiet node. Then reset the capture buffer. Repeat this procedure until you see the problem reflected in your captured data. To keep the buffer information clean, reset the buffer each time that you repeat the procedure.

3. When you see the delay, note the rough value of the packet count on the LANsentry Manager packet buffer.

By noting the packet count at which you think the delay has occurred, you can narrow the problem to within about 20 packets in the buffer. If you have used an extremely quiet node, you may even identify the exact packet.

LANsentry Manager Packet Decode

The LANsentry Manager Packet Decode application decodes all major protocols and displays the packet contents at three levels of detail: summary information, header information, and actual packet content.

Follow these steps:

1. Open the buffer in the Packet Decode application and locate the number of the packet at which the delay occurred.

2. Select the packet and launch a MAC-layer conversation filter. In the filter display, look for a gap in the conversation (that is, where the node sent a request and then resent it after an interval that roughly matches the delay you experienced when recreating the problem).

3. Repeat the test to determine whether the result concentrates on one node or appears on other nodes.

EXAMPLE: On the quiet node that you selected, the delay is obvious. You see an NFS request going out to a node and a repeat of the request 30 seconds later. During that time, the node did not respond. You now know that the delay occurred because nodes were not seeing responses for NFS requests. When you repeat the test on other nodes, you find that the delay is happening with more than one destination node.
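Looking for a gap in a conversation amounts to finding a request that was sent again after a long silence. The sketch below assumes the captured requests have been exported as `(timestamp, request_id)` pairs, a hypothetical format chosen for the example rather than the output of any particular decoder.

```python
def find_retry_gaps(requests, min_gap=1.0):
    """Find requests that were resent after at least min_gap seconds.

    requests is a list of (timestamp_seconds, request_id) pairs in capture
    order. Returns (request_id, first_time, retry_time) tuples.
    """
    last_seen = {}
    gaps = []
    for timestamp, request_id in requests:
        previous = last_seen.get(request_id)
        if previous is not None and timestamp - previous >= min_gap:
            gaps.append((request_id, previous, timestamp))
        last_seen[request_id] = timestamp
    return gaps
```

In the scenario from the EXAMPLE, a request first seen at 0.0 seconds and repeated at 30.0 seconds would be reported as a single 30-second gap.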


Address Tracker

Use Address Tracker, which polls managed devices, to determine the hubs to which the problem nodes are attached. If the problem end stations are located on unmanaged devices, you can at least narrow the problem to those unmanaged devices.

EXAMPLE: Although your network does not have managed hubs that Transcend® NCS management software can poll, it does have managed switches. When it polls the switches, Address Tracker displays the switch ports on which addresses were last seen. This information indicates the hub (but not the hub port) on which the device is located.

LANsentry Manager Packet Decode

After you know the location of the hub that has the problem node, monitor the problem from the hub using LANsentry Manager Packet Decode.

Follow these steps:

1. To capture packets from one of the nodes on the hub, set up another capture buffer and repeat the exercise that is described in LANsentry Manager Packet Capture. Because a delay may occur on a different node, use two capture buffers without stopping the first one.

Note the rough packet count where the delay appears.

2. Display a conversation filter of the packet where the delay appears and look for the gap in the conversation.

EXAMPLE: You hope that the nodes are on the same hub. You find that all the nodes are on HUB3. This result indicates that FCS Errors may be causing the timeouts. However, because the errors occur at a low rate, you decide to verify this diagnosis. You monitor the problem from the hub, logging in and out many times, and the delay eventually occurs. This time, the capture shows that the node's reply had an FCS Error, even though the node received the request. The switch would not have transmitted this packet, causing a timeout on the NFS protocol. The retry time is presumably 30 seconds. During this test, you see the problem occurring on another node.

Correcting the Fault


Without a managed hub, you may find it very difficult to discover network file server timeout errors. To find the problematic node, you must either systematically isolate nodes by monitoring each node for a prolonged period or temporarily insert a managed hub.

EXAMPLE: You notice that the captured error packet failed FCS because it was corrupted by a regular pattern during transmission. A possible reason for this occurrence is a Jabbering node. This explanation makes sense because FCS/Jabber frames increased linearly when you were monitoring the live network.

Network File Server Timeouts Reference

This section explains terms that are relevant to network file server timeouts and provides additional conceptual and problem analysis detail

Jabbering

Jabbering occurs when a node transmits illegal-length packets and is possibly not operating within carrier specifications. In effect, another node has written bad data over a valid packet. This bad data is often interpreted as a repeated sequence of data.

Network File System (NFS) Protocol

A distributed file system protocol developed by Sun Microsystems that allows a computer system to access files over a network as if they were on its local disks. This protocol has been incorporated into products by more than 200 companies. It is now a de facto Internet standard.

NFS is one protocol in the NFS suite of protocols, which includes NFS, RPC, XDR (External Data Representation), and others. These protocols are part of a larger architecture that Sun Microsystems refers to as Open Network Computing (ONC). ONC is a distributed applications architecture designed by Sun and currently controlled by a consortium led by Sun.

Troubleshooting TCP/IP - Detailed Steps

This article shows how to troubleshoot TCP/IP connectivity between computers on a Windows network. If you haven't already done so, disable XP's Internet Connection Firewall on all local area network connections and remove all firewall programs on the network. Improperly configured firewalls are the most common cause of TCP/IP problems.


Open a Command Prompt Window

For many of these steps, you'll be typing at the command prompt. To open a command prompt window in Windows 2000 or XP, click Start | Run, type cmd in the box, and click OK. To open a command prompt window in Windows 95, 98, or Me, click Start | Run, type command in the box, and click OK. Type one command per line and press Enter after each one to execute it. To close the command prompt window, use the exit command.

Determine the TCP/IP Settings

Determine the TCP/IP settings of each computer on the local area network. In XP, open the Network Connections folder, right-click the LAN connection, and click Status | Support | Details. For example, here are the Status and Details views for the LAN connection on an Internet Connection Sharing host.

In Windows 95/98/Me, click Start | Run, type winipcfg in the box, and click OK. Select the LAN adapter from the menu and click More Info. Here's the winipcfg view for an ICS client running Windows Me.

You can also see the TCP/IP settings from the command prompt. This is especially convenient if a computer has more than one network adapter. Use the ipconfig /all command, which is available in all versions except Windows 95. The output from this command can be long, so it's best to write it to a file. Specify the file name in the command this way:

ipconfig /all >ipconfig.txt

Description of TCP/IP Settings

Here are the TCP/IP settings that are used in network troubleshooting:

• IP Address - Unique address assigned to a network adapter. A computer with multiple network adapters has an IP address for each one, and each one must be in a different subnet.

• Subnet Mask - Used in conjunction with the IP address to determine which subnet an adapter belongs to. At the simplest level, communication is only possible between two network adapters when they're in the same subnet.

• Default Gateway - IP address of a computer or router, on one of this computer's local area networks, that knows how to communicate with subnets not present on this computer. For an Internet connection, the default gateway is a router belonging to your Internet service provider, and all access to sites on the Internet goes through it. For an ICS client, the default gateway is the ICS host. If you use a hardware router, it serves as the default gateway.

• DHCP Server - If an adapter is configured to obtain an IP address automatically, this is the address of the server that provides it. It could be your ISP, an ICS host, or a hardware router.

• DNS Servers - IP address of one or more Domain Name Server computers. DNS servers translate Internet names to their IP addresses (like 63.146.109.227).
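The way the subnet mask and IP address combine can be shown with Python's standard `ipaddress` module. This is a minimal sketch: the function name is chosen for the example, and the addresses in the usage note are illustrative private addresses.

```python
import ipaddress


def same_subnet(ip_a, ip_b, subnet_mask):
    """True when both addresses fall in the same subnet under the given mask."""
    # strict=False lets a host address stand in for its network.
    net_a = ipaddress.ip_network(f"{ip_a}/{subnet_mask}", strict=False)
    net_b = ipaddress.ip_network(f"{ip_b}/{subnet_mask}", strict=False)
    return net_a == net_b
```

With a mask of 255.255.255.0, the addresses 192.168.1.101 and 192.168.1.123 share a subnet, while 192.168.1.101 and 192.168.2.5 do not and would need a gateway to communicate.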

Subnets

See our article on subnets for a brief description of how they work. For more details, see this Microsoft Knowledge Base article.

If two computers are supposed to be on the same subnet but aren't, something is wrong with the network hardware or software configuration. This is most likely to happen when one of them receives an IP address of 169.254.x.x, which indicates that:

• It's configured to obtain an IP address automatically
• It couldn't find a DHCP server on the network to make the assignment
• Windows assigned it an Automatic Private IP Address
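A quick way to spot an Automatic Private IP Address in a list of settings is to test whether the address falls in the 169.254.0.0/16 range. The helper name below is hypothetical.

```python
import ipaddress

AUTOMATIC_PRIVATE_RANGE = ipaddress.ip_network("169.254.0.0/16")


def is_automatic_private_address(ip):
    """True when the address is in the 169.254.x.x range."""
    return ipaddress.ip_address(ip) in AUTOMATIC_PRIVATE_RANGE
```

If this returns `True` for an adapter that should be getting its address from a DHCP server, check the connection between that computer and the server.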

See our article on Specific Networking Problems and Their Solutions for more information

Pinging

The ping command is the basic tool for testing TCP/IP connectivity. It sends a special packet (called ICMP Echo) to a particular IP address and looks for a reply. If everything is working right, the reply comes back. If not, the ping times out in a few seconds. By default, the ping command repeats the process four times. Here's an example of an ICS client computer pinging a Windows XP Home Edition ICS host, using the host's IP address and its computer name.

When ping fails, you'll see one of these error messages:

• Request timed out - The IP address is valid, but there's no reply from it. If the IP address is on a local area network, the most likely cause is a firewall program blocking the ping.

• Unknown host <name> or Ping request could not find host <name> - The computer name doesn't exist on the local area network. Make sure that NetBIOS over TCP/IP is enabled.

• Destination host unreachable - The IP address isn't on a local area network, and the default gateway can't access it. Either there's no default gateway, its address is wrong, or it isn't functioning.
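If you save ping output to a file, the three messages above can be matched automatically. The sketch below does a simple case-insensitive text search. The function name and the wording of the explanations are chosen for this example, and the exact message text can vary between Windows versions.

```python
PING_FAILURES = [
    ("request timed out",
     "address is valid but not replying; check for a firewall"),
    ("unknown host",
     "name not found; check NetBIOS over TCP/IP"),
    ("could not find host",
     "name not found; check NetBIOS over TCP/IP"),
    ("destination host unreachable",
     "no route; check the default gateway"),
]


def explain_ping_failure(output):
    """Return a hint for the first known failure message in ping output."""
    text = output.lower()
    for marker, explanation in PING_FAILURES:
        if marker in text:
            return explanation
    return None  # no known failure message found
```

A return value of `None` means that none of the listed messages appeared, which is what you would expect from a successful ping.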


Pinging the Local Area Network

Here is a series of ping commands to use in finding where a problem occurs on a local area network. Run them in the order shown, and don't go on to the next command until all of the previous commands work properly. In this example:

• The computer being tested is named Winxp, with IP address 192.168.1.101
• There's another computer on the network named Win98, with IP address 192.168.1.123

Substitute the appropriate IP addresses and computer names for your network.

Command               Target                          What Ping Failure Indicates
ping 127.0.0.1        Loopback address                Corrupted TCP/IP installation
ping localhost        Loopback name                   Corrupted TCP/IP installation
ping 192.168.1.101    This computer's IP address      Corrupted TCP/IP installation
ping winxp            This computer's name            Corrupted TCP/IP installation
ping 192.168.1.123    Another computer's IP address   Bad hardware or NIC driver
ping win98            Another computer's name         NetBIOS name resolution failure
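The run-in-order, stop-at-first-failure procedure can be sketched as a short script. The names and addresses are the example values from this section, and the `ping` helper is an assumption for illustration: it relies on the system `ping` command and its exit status, which on some Windows versions can report success even when the reply is "Destination host unreachable", so treat a pass as provisional.

```python
import platform
import subprocess

# Example targets from this section; substitute your own values.
LAN_STEPS = [
    ("127.0.0.1", "Corrupted TCP/IP installation"),
    ("localhost", "Corrupted TCP/IP installation"),
    ("192.168.1.101", "Corrupted TCP/IP installation"),
    ("winxp", "Corrupted TCP/IP installation"),
    ("192.168.1.123", "Bad hardware or NIC driver"),
    ("win98", "NetBIOS name resolution failure"),
]


def ping(target, count=1):
    """Run the system ping command once; True when it exits successfully."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    result = subprocess.run(
        ["ping", count_flag, str(count), target],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def first_failure(steps, ping_fn=ping):
    """Ping each target in order and return the first (target, meaning) that fails."""
    for target, meaning in steps:
        if not ping_fn(target):
            return target, meaning
    return None  # every step passed
```

Calling `first_failure(LAN_STEPS)` returns the first target that does not answer together with its likely cause, or `None` when all six steps pass. The `ping_fn` parameter exists so the logic can be tried without touching the network.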

To fix a corrupted TCP/IP installation on Windows XP, follow the steps in this Microsoft Knowledge Base article. For Windows 95/98/Me, un-install the TCP/IP protocol in Control Panel | Network, reboot, and re-install it. If that doesn't fix it, use this procedure on Windows 95 or 98.

Pinging the Internet

You can also use ping to find a problem with Internet access. Run these commands in the order shown, and don't go on to the next command until all of the previous commands work properly. Use the Default Gateway and DNS Server addresses that you got from the winipcfg or ipconfig /all command.

Command                  Target                What Ping Failure Indicates

ping w.x.y.z             Default Gateway       Default Gateway down
ping w.x.y.z             DNS Server            DNS Server down
ping w.x.y.z             Web site IP address   Internet service provider or web site down
ping www.something.com   Web site name         DNS Server down or web site down

References

www.practicallynetworked.com

www.computerhope.com

www.wikipedia.org

THANK YOU

<destination> is unreachable

Indicates that your system does not know how to get to the destination. This message means either that routing information to a different subnetwork is unavailable or that a device on the same subnetwork is down.

ICMP host unreachable from gateway

Indicates that your system can transmit to the target address using a gateway, but that the gateway cannot forward the packet properly because either a device is misconfigured or the gateway is not operating.
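
The ping messages described in this document can be matched mechanically against captured ping output. The sketch below is illustrative only: the function name interpret_ping and the short diagnosis strings are hypothetical, the matching is a plain case-insensitive substring test, and real ping implementations may word these messages differently.

```python
# (substring to look for, short diagnosis) pairs, checked in order.
# The more specific messages come before the generic "is unreachable".
PING_MESSAGES = [
    ("request timed out",
     "Address is valid but did not reply; on a LAN, suspect a firewall"),
    ("unknown host",
     "Name does not exist on the LAN; check name resolution"),
    ("could not find host",
     "Name does not exist on the LAN; check name resolution"),
    ("destination host unreachable",
     "Address is off the LAN and the default gateway cannot reach it"),
    ("icmp host unreachable from gateway",
     "Gateway cannot forward the packet; device misconfigured or gateway down"),
    ("is unreachable",
     "No route to the destination, or a device on the same subnetwork is down"),
]

def interpret_ping(output):
    """Return a short diagnosis for the first known message found, else None."""
    text = output.lower()
    for needle, diagnosis in PING_MESSAGES:
        if needle in text:
            return diagnosis
    return None
```

For example, interpret_ping("Request timed out.") returns the firewall hint, while output that contains none of the listed messages returns None.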

Telnet

Telnet, which is a login and terminal emulation program for Transmission Control Protocol/Internet Protocol (TCP/IP) networks, is a common way to communicate with an individual device. You log in to the device (a remote host) and use that remote device as if it were a local terminal.

If you have established an out-of-band Telnet connection with a device, you can use Telnet to communicate with that device even if the network is unavailable. This feature makes Telnet one of the most frequently used network troubleshooting tools. Usually, all device statistics and configuration capabilities are accessible by using Telnet to connect to the device's console. For more information about setting up an out-of-band connection, see Using Telnet, Serial Line, and Modem Connections.

You can invoke the Telnet application on your local system and set up a link to a Telnet process that is running on a remote host. You can then run a program that is located on a remote host as if you were working at the remote system.

FTP and TFTP

Most network devices support either the File Transfer Protocol (FTP) or the Trivial File Transfer Protocol (TFTP) for downloading updates of system software. Updating system software is often the solution to networking problems that are related to agent problems. Also, new software features may help correct a networking problem.

FTP provides flexibility and security for file transfer by:

• Accepting many file formats, such as ASCII and binary
• Using data compression
• Providing Read and Write access so that you can display, create, and delete files and directories
• Providing password protection

TFTP is a simple version of FTP that does not list directories or require passwords. TFTP only transfers files to and from a remote server.

Analyzers

An analyzer, which is often called a Sniffer, is a network device that collects network data on the segment to which it is attached, a process called packet capturing. Software on the device analyzes this data, which is a process referred to as protocol analysis. Most analyzers can interpret different types of protocol traffic, such as TCP/IP, AppleTalk, and Banyan VINES traffic.

You usually use analyzers for reactive troubleshooting - when you see a problem somewhere on your network, you attach an analyzer to capture and interpret the data from that area. Analyzers are particularly helpful for identifying intermittent problems. For example, if your network backbone has experienced moments of instability that prevent users from logging on to the network, you can attach an analyzer to the backbone to capture the intermittent problems when they happen again.

Probes

Like an analyzer, a probe is a network device that collects network data. Depending on its type, a probe can collect data from multiple segments simultaneously. It stores the collected data and transfers the data to an analysis site when requested. Unlike an analyzer, a probe does not interpret data.

A probe can be either a stand-alone device or an agent in a network device. The Transcend Enterprise Monitor 500 series and the SuperStack® II Monitor series are stand-alone RMON probes. LANsentry Manager and Traffix Manager use data from probes that comply with the RMON MIB or the RMON2 MIB.

You can use a probe daily to determine the health of your network. The Transcend NCS applications can interpret and report this data, alerting you to possible problems so that you can proactively manage your network. For example, an RMON2 probe can help you to analyze traffic patterns on your network. Use this data to make decisions about reconfiguring devices and end stations as needed.

Cable Testers

Cable testers examine the electrical characteristics of the wiring. They are most commonly used to ensure that building wiring and cables meet Category 5, 4, and 3 standards. For example, network technologies such as Fast Ethernet require the cabling to meet Category 5 requirements. Testers are also used to find defective and broken wiring in a building.

Knowing Your Network

You can better troubleshoot the problems on your network by:

• Knowing Your Network's Configuration
• Identifying Your Network's Normal Behavior

Knowing Your Network's Configuration

Part of understanding your network is knowing its physical and logical configuration. You should know:

• Which devices are on your network
• How the devices are configured
• Which devices are attached to the backbone
• Which devices connect your network to the outside world (WAN)

To keep track of your network's configuration, gather the following information:

• Site Network Map
• Logical Connections
• Device Configuration Information
• Other Important Data About Your Network

This data, when kept up-to-date, is extremely helpful for locating information when you experience network or device problems.

Site Network Map

A network map helps you to:

• Know exactly where each device is physically located
• Easily identify the users and applications that are affected by a problem
• Systematically search each part of your network for problems

You can create a network map using any drawing or flow chart application. Store your network map online. In addition, make sure that you always have a current version on paper in case you cannot access the online version. Figure 8 shows an example of a network map of 3Com devices.

Figure 8 Example of a Site Network Map

Consider including the following information on your network map:

• Location of important devices and workgroups (by floor, building, or area)
• Location of the network backbone, data center, and wiring closets, as appropriate for your network
• Location of your network management stations
• Location and type of remote connections
• IP subnetwork addresses for all managed switches and hubs
• Other subnetwork addresses, such as Novell IPX and AppleTalk, if appropriate for your network
• Type of media (by actual name, such as 10BASE-T, or by grouping, such as Ethernet), which you can show with callouts, colors, line weights, or line styles
• Virtual workgroups, which you can show with colors or shaded areas
• Redundant links, which you can indicate with gray or dashed lines
• Types of network applications that are used in different areas of your network
• Types of end stations that are connected to the switches and hubs

Logical Connections

With the advent of virtual LANs (VLANs), you need to know how your devices are connected logically as well as physically. For example, if you have connected two devices through the same physical switch, you can assume that they can communicate with each other. However, the devices can be in separate VLANs that restrict their communication.

Knowing the setup of your VLANs can help you to quickly narrow the scope of a problem to a VLAN instead of to a network connection.

The Transcend NCS application Enterprise VLAN Manager allows you to view the logical makeup of your network. Depending on the complexity of your network and VLAN configurations, you can use colors to show the VLANs graphically on your network map.

Device Configuration Information

Maintain online and paper copies of device configuration information. Make sure that all online data is stored with your site's regular data backup. If your site does not have a backup system, copy the information onto a backup disc (CD, Zip disk, and the like) and store it offsite.

Follow these guidelines for saving configuration information:

• Because the easiest way to recover a device's configuration is to use FTP or TFTP, save the configuration settings of each device that supports this method of uploading.

• For other devices, Telnet in and save the session (which contains configuration details) to a file. If you cannot print the configuration of a device, then create a quick rebuild guide that explains the quickest way to configure the device from a fresh install.

• For devices that store information to diskette, store this data as part of your site's regular backup.

• For routers and other important devices with text configuration files, store this data online in a revision control system. Keep the most recent version on paper. Keep previous versions.

• For PCs, keep a recovery disk for each type of PC. For any device that you use as a server, store all startup scripts and copies of registries.

Other Important Data About Your Network

For a complete picture of your network, have the following information available:

• All passwords - Store passwords in a safe place. Keep previous passwords in case you restore a device to a previous software version and need to use the old password that was valid for that version.

• Device inventory - The inventory allows you to see the device type, IP address, ports, MAC addresses, and attached devices at a glance. Software tools, such as Transcend Central, can help you keep track of the 3Com devices on your network. Using Transcend Central, you can group devices by type and location and have this information on hand for troubleshooting.

• MAC address-to-port number list - If your hubs or switches are not managed, you must keep a list of the MAC addresses that correlate to the ports on your hubs and switches. Generate and keep a paper copy of this list, which is crucial for deciphering captured packets, using Address Tracker.

• Log book - Document your interactions, no matter how trivial, with each device that is critical to your network's operation (that is, routers, remote access devices, security servers). For example, document that you noticed a fan making noise one morning. Your note may help you to identify why a device is over temperature a week later (because the fan stopped working).

• Change control - Maintain a change control system for all critical systems. Permanently store change control records.

• Contact details - Store, online and on paper, the details of all support contracts, support numbers, engineer details, and telephone and fax numbers.

• LANsentry Reporter - Use LANsentry Reporter to generate reports from the database.

Identifying Your Network's Normal Behavior

By monitoring your network over a long period, you begin to understand its normal behavior. You begin to see a pattern in the traffic flow, such as which servers are typically accessed, when peak usage times occur, and so on. If you are familiar with your network when it is fully operational, you can be more effective at troubleshooting problems that arise.

Baselining Your Network

You can use a baseline analysis, which is an important indicator of overall network health, to identify problems. A baseline can serve as a useful reference of network traffic during normal operation, which you can then compare to captured network traffic while you troubleshoot network problems. A baseline analysis speeds the process of isolating network problems.

By running tests on a healthy network, you compile normal data to compare against the results that you get when your network is in trouble. For example, Ping each node to discover how long it typically takes you to receive a response from devices on your network.
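
One simple way to turn the ping example above into a baseline is to record response times while the network is healthy and compare later measurements against them. The sketch below assumes you have already collected round-trip times in milliseconds; the function names and the "mean plus three standard deviations" rule are illustrative choices, not something this document prescribes.

```python
from statistics import mean, stdev

def build_baseline(samples_ms):
    """Summarize round-trip times (ms) collected on a healthy network.

    Needs at least two samples, because stdev is undefined for one value.
    """
    return {"mean": mean(samples_ms), "stdev": stdev(samples_ms)}

def is_abnormal(rtt_ms, baseline, k=3.0):
    """Flag a measurement that is more than k standard deviations above the mean.

    The factor k is an assumed tuning knob; pick a value that suits your network.
    """
    return rtt_ms > baseline["mean"] + k * baseline["stdev"]

# Example: response times gathered from one node during normal operation.
healthy = [10, 12, 11, 9, 13]
baseline = build_baseline(healthy)
```

Keeping one baseline per node lets you tell a slow link apart from a node that has always responded slowly.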

Applications such as Status Watch, Address Tracker, LANsentry Manager, and Traffix Manager allow you to collect days and weeks of data and set a baseline for comparison. Through the reporting mechanisms in the following list, you can continuously assess the data from your network and ensure that its performance is optimal:

• Web Reporter generates daily or weekly reports from data collected by Status Watch.

• Traffix Manager generates weekly reports from collected data and calculates the baselines for you. Set up Utilization History and Error History reports with data resolution set to Weekly.

• LANsentry Manager History View generates daily utilization graphs, which are sampled every 30 minutes for each day over one week. Use these graphs to calculate your network baselines manually.

Identifying Background Noise

Know your network's background noise so that you can recognize real data flow. For example, one evening after everyone is gone, no backups are running, and most nodes are on, analyze the traffic on your network using the Traffix Manager application. The traffic that you see is mostly broadcast and multicast packets. Any errors that you see are the result of faulty devices (trace). This traffic is the background noise of your network - traffic that occurs for little value. If background noise is high, redesign your network.
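
If you can export packet counts from an off-hours capture, the background noise level can be expressed as a rate. The following sketch is an illustration only: the dictionary keys and the function name noise_rate are hypothetical, and the document does not state what rate counts as "high", so the code reports the figure without judging it.

```python
def noise_rate(counts, seconds):
    """Return broadcast plus multicast packets per second for one capture.

    counts  -- packet totals by type, e.g. {"unicast": 120, "broadcast": 300}
    seconds -- length of the capture interval in seconds (must be positive)
    """
    if seconds <= 0:
        raise ValueError("capture interval must be positive")
    noise = counts.get("broadcast", 0) + counts.get("multicast", 0)
    return noise / seconds

# Example: a 60-second capture taken after hours.
evening = {"unicast": 120, "broadcast": 300, "multicast": 300}
rate = noise_rate(evening, 60)
```

Recording this figure on several quiet evenings gives a reference value to compare against when the network is busy.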

FDDI Connectivity

Use these sections to identify and correct connectivity errors on an FDDI ring:

• FDDI Connectivity Overview
• Monitoring FDDI Connections
• Making Your FDDI Connections More Resilient

Fiber Distributed Data Interface (FDDI), which is a self-correcting technology, automatically corrects ring faults to maintain connectivity throughout most of the network. However, you should monitor your FDDI connections for wrapped rings and other problems with ring connectivity.

Understanding the Problem

As shown in Figure 9, in a Thru FDDI LAN, no stations on the trunk ring have a Configuration State (SMTConfigurationState) of Wrap or Isolated. However, users who complain about network performance may have lost connectivity to other stations on the network because the FDDI network is wrapped or segmented.

Figure 9 Thru Ring

Wrapped ring

By monitoring the Peer Wrap Condition, you can see when the Configuration State changes. In a wrapped ring (Figure 10), two stations on the LAN are in a wrapped Configuration State. This condition may or may not affect the connectivity of certain stations. Although operational, your network may have a cabling problem or a problem with a link.

Figure 10 Wrapped LAN

Segmented ring

In a segmented ring (Figure 11), more than two stations are wrapped on the trunk ring. Although this mode of operation is a valid FDDI LAN configuration, your LAN is probably experiencing a degraded or degrading condition.

Figure 11 Segmented Ring

When a network connection has excessively high link errors, Station Management (SMT) shuts down the connection and tries to reconnect. A dual-attachment trunk ring station with an A or B connection that is shut down is one of the wrap points in the network. See Making Your FDDI Connections More Resilient for information about keeping a dual-attachment station connection from wrapping.

Isolated station

Sometimes a network wraps a particular station out of the ring. Stations on either side of a problem station can be wrapped. This effectively isolates the station or links that have problems, as shown in Figure 12.

Figure 12 Wrapped Ring with Isolated Station

If a ring was already wrapped when a network wraps a station out of the ring, then a segmented ring results, as shown in Figure 13.

Figure 13 Segmented Ring with Isolated Stations

Twisted ring

In a twisted ring, an A port is connected to an A port and a B port is connected to a B port instead of the normal A-to-B connections. A twisted ring, which always has two twist points (stations), can exist in either a Thru or Wrap state. You can monitor the Twisted Ring Condition and Undesired Connection Attempt Event for evidence of twisted ring and other connection problems.

Identifying the Problem

To identify the problem, follow this process:

1 At the FDDI LAN level, verify that your network is operating.

If the network is operating, the FDDI ring may be segmented, and therefore an FDDI station or an Ethernet station on an Ethernet link may have lost connectivity to other nodes on the network.

2 Determine if a ring is in a Thru, Wrap, or Segmented state.

If the FDDI ring is segmented or wrapped, look for a problem with a link somewhere in the network or for a nonfunctioning node on your trunk ring. If the ring is operating and is not segmented, or if it is segmented but you still have connectivity to the stations in question, move to a more specific level in your network.

See Monitoring FDDI Connections for more information.

3 Determine if the poorly performing station is an Ethernet or FDDI station.

If the problem is an FDDI station, find out if it is congested (that is, if the station is so busy that it cannot accept all the network traffic that is directed to it) by determining its Bandwidth Utilization. Also determine if the station has a high frame error rate by looking at the FDDI Ring Errors.

If the problem is an Ethernet station, look for congestion by examining Ethernet Packet Loss and Bandwidth Utilization.
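
Step 2 of the process above can be expressed as a small rule based on how many trunk ring stations report a wrapped Configuration State. This is a sketch under stated assumptions: the input is a hypothetical dictionary mapping station names to the strings "Thru", "Wrap", or "Isolated" (how you collect those values depends on your management tools), and treating a single reported wrap point the same as two is an assumption, since this document only describes the two-station case.

```python
def ring_state(config_states):
    """Classify an FDDI trunk ring from per-station Configuration States.

    config_states -- dict of station name -> "Thru", "Wrap", or "Isolated"
    Returns (state, wrapped_stations, isolated_stations).
    """
    wrapped = sorted(n for n, s in config_states.items() if s == "Wrap")
    isolated = sorted(n for n, s in config_states.items() if s == "Isolated")

    if not wrapped:
        state = "Thru"        # no wrap points on the trunk ring
    elif len(wrapped) <= 2:
        state = "Wrapped"     # two wrap points (one is treated the same, by assumption)
    else:
        state = "Segmented"   # more than two stations wrapped
    return state, wrapped, isolated
```

The returned list of wrapped stations is useful on its own, because the fault is expected to lie between the stations that report the wrap.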

Solving the Problem

Identify the station that is causing the disconnection and take the appropriate steps:

• If the disconnection is caused by a wrapped ring, then fix the hardware or cabling problem at that station.

• If the station is congested, you have a device problem rather than a network problem. For example, if the congested station is a file server and every other machine on the network is retrieving and saving files using that server, consider upgrading your server or adding additional servers to the network. A variety of devices from different vendors may be communicating on an FDDI or Ethernet network; some are faster and more capable, and some are slower and more prone to congestion.

• If the station is an Ethernet station that is attached to an Ethernet segment, reevaluate the setup of your Ethernet network and make some changes to improve its performance.

You can also make FDDI connections more resilient by implementing dual homing or installing an Optical Bypass Unit (OBU) where FDDI connections are prone to fail. See Making Your FDDI Connections More Resilient for more information.

Monitoring FDDI Connections

Monitor your FDDI devices for Warning or Critical alerts in the FDDI Status tool.

Status Watch

Use Status Watch to identify these FDDI connectivity errors:

• Peer Wrap Condition
• Twisted Ring Condition
• Undesired Connection Attempt Event

Follow these steps:

1 In the Device area, select the device that is located where you suspect an FDDI ring connectivity problem.

2 Monitor the FDDI Status tool for the currently selected device.

Here are some pointers for monitoring:

• If the Peer Wrap Configuration State variable is Isolated, the device is not connected to the FDDI trunk ring. If you intend the device to remain isolated, this indication is not a serious condition. However, if the device is supposed to be connected on a trunk ring, a serious problem may exist. The device is no longer transmitting packets to the larger trunk ring.

• If the Peer Wrap flag (SMTPeerWrapFlag) is set, the device is one of the wrap points. The cause of the wrapped ring is somewhere in the portion of the network between the two stations that report the peer wrap condition.

Making Your FDDI Connections More Resilient

When devices are removed from an FDDI ring, there is a break in the fiber path that causes the ring to wrap until the ring is made whole again. To prevent the break in the FDDI connection, you can implement dual homing or install an Optical Bypass Unit (OBU).

Implementing Dual Homing

When the operation of a dual attachment node is critical to your network, dual homing adds reliability by providing a backup connection if the primary link fails. Because a dual attachment station (DAS) has two attachments to the FDDI ring (A-to-M and B-to-M), you can use one of them as a standby link if the active link fails. Using dual homing, only one of the two attachments is active at a time. In this sense, a DAS acts as if it is a single attachment station (SAS) by using its A port as the standby link.

Through SMT, a DAS can be dual homed to the same concentrator or, more commonly, to two concentrators. This arrangement provides a more stable trunk ring of concentrators. If one concentrator fails, the DAS enables the standby link to another concentrator to become the active link. See Figure 14.

If the station is a dual path or dual path/dual MAC station, you can configure the dual-homed station in one of two ways:

• With both links active
• With one link active and one connection withheld as a backup, only becoming active when one link fails

Figure 14 Dual Homing Configuration

Installing an Optical Bypass Unit

You can insert an Optical Bypass Unit (OBU) into the FDDI ring as if it were a node and then plug your device into it. To use an OBU, your device needs an optical bypass interface. This interface lets the bypass know whether your device is still on the ring or not. See Figure 15.

If your device is removed or if it fails, the bypass unit diverts the optical path away from your device, keeping the ring whole. You can use a bypass on devices that are prone to failure or are likely to be removed often, such as diagnostic equipment.

Figure 15 Optical Bypass Unit Configuration

FDDI Connectivity Reference

This section explains terms that are relevant to FDDI connectivity and provides additional conceptual and problem analysis detail.

Peer Wrap Condition

A Peer Wrap (wrapped ring) condition occurs when a dual-attachment station detects a fault (often a lost connection) and reconfigures the network by wrapping the dual trunk rings to form a single ring. Normally, the two stations that are adjacent to the fault wrap to maintain full connectivity. However, if a second fault occurs before the first is repaired, the network partitions itself into two or more rings, and stations lose connectivity.

When a station reports a Peer Wrap condition, locate and repair the problem that caused the station to wrap the rings. Potential causes include:

• Faulty FDDI port hardware
• Faulty cables or connectors
• Unplugged connectors
• Powered-down stations

You can expect to find the cause of the problem between the two stations that report the Peer Wrap condition.

Twisted Ring Condition

A Twisted Ring condition occurs when certain undesirable connection types exist. See Table 7 for more information. Although similar to the Undesired Connection Attempt, the Twisted Ring condition provides specific Station Management (SMT) and port information for diagnosis.

Undesired Connection Attempt Event

An Undesired Connection Attempt event occurs when a port tries to connect to another port of a type that may result in an undesirable network topology. Whether the connection attempt is successful depends on the current setting of the station's connection policies.

Table 7 lists connections that the FDDI standard defines as undesirable. The managed devices may or may not permit these connections, depending on their FDDI station configurations.

Table 7 Undesirable Connection Types

Connection Type   Reason That the Connection Is Undesirable

A-A               Twisted primary and secondary rings
A-S               A wrapped ring
B-B               Twisted primary and secondary rings
B-S               A wrapped ring
S-A               A wrapped ring
S-B               A wrapped ring
M-M               A tree of rings topology (illegal connection)

Table 8 lists FDDI connections that create valid topologies.

Table 8. Valid Connection Types

Connection Type    Reason That the Connection Is Valid

A-B A normal trunk peer connection

A-M A tree connection with possible redundancy. In a single MAC node, Port B has precedence (by default) for connecting to a Port M.

B-A A normal trunk ring peer connection

B-M A tree connection with possible redundancy. In a single MAC node, Port B has precedence (by default) for connecting to a Port M.

S-S A single ring of two slave stations

S-M A normal tree connection

M-A A tree connection that provides possible redundancy

M-B A tree connection that provides possible redundancy

M-S A normal tree connection
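
Tables 7 and 8 together cover all 16 pairings of the A, B, S, and M port types, so they can be expressed as a simple lookup. The following Python sketch is illustrative only: the function name classify_connection and the dictionary names are hypothetical, and the reason strings are copied from the two tables rather than from any management API.

```python
# Connection types copied from Table 7 (undesirable) and Table 8 (valid).
# Keys are (local port type, remote port type).
UNDESIRABLE = {
    ("A", "A"): "Twisted primary and secondary rings",
    ("A", "S"): "A wrapped ring",
    ("B", "B"): "Twisted primary and secondary rings",
    ("B", "S"): "A wrapped ring",
    ("S", "A"): "A wrapped ring",
    ("S", "B"): "A wrapped ring",
    ("M", "M"): "A tree of rings topology (illegal connection)",
}

VALID = {
    ("A", "B"): "A normal trunk peer connection",
    ("A", "M"): "A tree connection with possible redundancy",
    ("B", "A"): "A normal trunk ring peer connection",
    ("B", "M"): "A tree connection with possible redundancy",
    ("S", "S"): "A single ring of two slave stations",
    ("S", "M"): "A normal tree connection",
    ("M", "A"): "A tree connection that provides possible redundancy",
    ("M", "B"): "A tree connection that provides possible redundancy",
    ("M", "S"): "A normal tree connection",
}


def classify_connection(local_port, remote_port):
    """Return ("valid" | "undesirable", reason) for a pair of FDDI port types."""
    key = (local_port.upper(), remote_port.upper())
    if key in UNDESIRABLE:
        return ("undesirable", UNDESIRABLE[key])
    if key in VALID:
        return ("valid", VALID[key])
    raise ValueError(f"unknown FDDI port types: {local_port}-{remote_port}")


if __name__ == "__main__":
    for pair in [("A", "B"), ("A", "A"), ("M", "M")]:
        print("-".join(pair), classify_connection(*pair))
```

Whether a device actually permits an undesirable connection still depends on its connection policies, so a lookup like this only labels a pairing; it does not predict whether the connection attempt succeeds.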


Disabling the Offending Interface

Because broadcast storms can ultimately cause your whole network to become unavailable, take action immediately to disable the offending interface. You can enable the interface again after you have corrected the problem.

Address Tracker

Use Address Tracker to locate the interface that is causing the broadcast storm. Use Device View to disable the port.

Follow these steps:

1. In the Find Address window, enter the address of the interface that seems to be receiving the broadcast traffic.

You can copy the MAC or IP address from the Status Watch report and paste it into Address Tracker's Enter the Address You Want to Find field.

2. Click Find Now.

Search displays the device name.

3. Use Transcend Central to launch Device View and disable the port.

Disabling the port stops the broadcast storm before it interferes with all vital network traffic. You can re-enable this interface later using Device View or the device's console.

Correcting Spanning Tree Misconfigurations

Spanning Tree does not cause broadcast storms, but a loop in your Spanning Tree topology can create data that looks like a storm. A loop can occur in your topology if:

• Someone disables Spanning Tree on a port
• You set up your Spanning Tree configuration incorrectly


Device View

Use Device View to disable any Spanning Tree port that has a repeater attached to it and to correct Spanning Tree misconfigurations.

To correct Spanning Tree misconfigurations, use Device View to disable Spanning Tree Protocol (STP) for a port on a SuperStack® II Switch 1000, Switch 3000, Switch 3000 10/100, Switch 9000SX, Desktop Switch, LinkBuilder® FMS II Bridge/Management Module, or CoreBuilder 6000.

To disable the STP port state for a port on a SuperStack II switch:

1. Select a port and click the right mouse button.

2. From the shortcut menu, select Configure.

3. In the Port section, click the STP tab.

4. From the STP Port State list box, select Disabled.

5. Click Apply.

To disable the STP port state for a port on a LinkBuilder FMS II Bridge/Management Module:

1. Double-click the module.

2. From the shortcut menu, select Configure Bridge.

3. In the Port section, click the STP tab.

Broadcast Storms Reference

This section explains terms that are relevant to broadcast storms and provides additional conceptual and problem analysis detail.

Broadcast Packets

Broadcast packets, which are a normal part of network operation, are transmitted by a device to a broadcast address. For example, IP networks use broadcasts to resolve network addresses using Address Resolution Protocol (ARP). IPX networks use a large number of broadcast packets to operate most effectively.


Problems arise when broadcast packets endlessly propagate throughout the network, which increases the traffic volume on your network and the CPU time that each host spends processing and discarding unwanted broadcast packets.

Multicast Packets

Multicast packets, which are a normal part of network operation, are transmitted by a device to a multicast group address. Hosts that want to receive the packets indicate that they want to be members of the multicast group, and then multicast packets are distributed to that group. For example, multicast packets support the Spanning Tree Protocol. Multicast applications and underlying multicast protocols control multimedia traffic and shield hosts from processing unnecessary broadcast traffic. However, multicast traffic can also cause storms that saturate your network.

Duplicate Addresses

Use these sections to identify and correct problems caused by duplicate MAC and IP addresses:

• Duplicate Addresses Overview
• Finding Duplicate MAC Addresses
• Finding Duplicate IP Addresses

See Duplicate Addresses Reference for additional conceptual and problem analysis detail.

Networks sometimes generate duplicate MAC and IP addresses. Because duplicate addresses can cause problems with packet delivery, resolve them as soon as possible.

Understanding the Problem

Duplicate MAC addresses are caused by data link layer problems with Fiber Distributed Data Interface (FDDI) media and the passing of tokens on the FDDI ring. Duplicate IP addresses are caused by network layer problems. See these sections for more information about causes of duplicate addresses:

• Duplicate MAC Addresses
• Duplicate IP Addresses

Identifying the Problem

Identify duplicate MAC and IP addresses by following the instructions in these sections:

• Finding Duplicate MAC Addresses
• Finding Duplicate IP Addresses

Solving the Problem

Identify the cause of the duplicate address (such as user error or a hardware problem) and fix the problem if possible.

Finding Duplicate MAC Addresses

To find out if duplicate MAC addresses are occurring, monitor your network using Status Watch.

Status Watch

The Status Watch FDDI Status tool identifies duplicate FDDI MAC addresses, and Status Watch reports when two or more MACs on the same ring have the same MAC address (a Duplicate Address condition).

Follow these steps:

1. In the Status Watch Summary View window, determine if any FDDI Status conditions are reported. If there are, double-click the table cell value to display the Device List window.

Another approach is to examine only the devices that you know reside on your FDDI ring. In the Status Watch main window, red device icons indicate that a threshold has been exceeded.

2. Select a device.

• If you selected the device from the Device List window, the real-time report for that device appears in the Status Watch main window.

• If you selected the device from the main window, also select the FDDI Status tool to view the real-time report.

3. Determine if a Duplicate Address condition caused the FDDI Status tool to trigger a Critical or Warning status for that device.

In Status Watch, you can specify the status severity level to apply to a Duplicate Address condition.

Finding Duplicate IP Addresses


To find out if duplicate IP addresses are occurring, monitor your network using these applications:

• Address Tracker - To find duplicate IP addresses on 3Com devices and their attached networks

• LANsentry® Manager - To find duplicate IP addresses that are collected by probes gathering RMON2 SmartAgent® data from the Enterprise Communications Analysis Module (ECAM) downloaded on your network devices

Address Tracker

Use Address Tracker to determine when and where duplicate IP addresses occur.

Follow these steps:

1. From the Find Address menu, select Find Duplicate IP Addresses.

2. Click Find Now to start your search.

LANsentry Manager

Use the Duplicates table in LANsentry Manager to compile a list of all stations with duplicate IP addresses. This table is available only on probes that have downloaded RMON2 (ECAM) SmartAgent software.

Follow these steps:

1. From the LANsentry Manager Address Map menu, select Duplicates. Address Map data is displayed as a table.

2. To export the contents of the table, click Export to launch the Data Export dialog box.

Duplicate Addresses Reference

This section explains terms that are relevant to duplicate addresses and provides additional conceptual and problem analysis detail.


Duplicate MAC Addresses

Each device on your network has a unique MAC address. This address identifies a single device on the network, allowing packets to be delivered to correct destinations.

Packets are delivered to their destinations by means of MAC-address-to-IP-address translation that the Address Resolution Protocol (ARP) provides. Therefore, if MAC addresses are duplicated on the network, ARP caches of routing devices contain erroneous destinations. In FDDI, devices monitor network traffic, searching for their own MAC address in each packet to determine whether to decode the packet. If MAC addresses are not unique, two stations cannot be distinguished from each other.

Duplicate MAC addresses can occur for the following reasons:

• Someone has manually configured a MAC address for a device, instead of using the address that the vendor supplied or allowing it to be assigned dynamically, and this address is also assigned to a different device.

• In rare circumstances, loops in a bridged network can cause a MAC hardware problem or an address learning problem that creates a duplicate MAC address entry in the bridging address table.

• On DECnet Phase 4 networks, MAC addresses are set from the DECnet address. A duplicate NET address can cause a duplicate MAC address.

Duplicate IP Addresses

Because IP addresses are critical for transmission of packets on TCP/IP networks, resolve duplicate IP addresses immediately.

Duplicate IP addresses can occur when someone has configured an IP address that is identical to an IP address that is assigned to a different device. Address assignments, although possible for you to configure manually, are usually made using one of these protocols:

• Dynamic Host Configuration Protocol (DHCP) - Allows your network to dynamically assign IP addresses to nodes. With this protocol, a DHCP server temporarily assigns an IP address to a node, or you can statically configure addresses as needed.

• BOOTstrap protocol (BootP) - Allows you to statically assign IP addresses to nodes. This protocol is more efficient than RARP.

• Reverse ARP (RARP) - Allows you to statically assign IP addresses to nodes. However, because this protocol relies on the MAC address to identify the node, you cannot use it on networks that dynamically assign hardware addresses.
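
Outside the tools described in this guide, the idea of a duplicate IP address can be sketched as a grouping problem: one IP address that is claimed by more than one MAC address. The Python sketch below is illustrative only. The function name find_duplicate_ips and the sample (IP, MAC) pairs are hypothetical; in practice the pairs might come from an exported ARP cache or address table.

```python
from collections import defaultdict


def find_duplicate_ips(entries):
    """Return {ip: sorted MAC list} for every IP claimed by more than one MAC.

    entries is an iterable of (ip_address, mac_address) string pairs.
    """
    macs_by_ip = defaultdict(set)
    for ip, mac in entries:
        # Normalize case so the same MAC written two ways counts once.
        macs_by_ip[ip].add(mac.lower())
    return {ip: sorted(macs) for ip, macs in macs_by_ip.items() if len(macs) > 1}


if __name__ == "__main__":
    # Hypothetical sample data using documentation-range addresses.
    sample = [
        ("192.0.2.10", "00:00:5E:00:53:01"),
        ("192.0.2.11", "00:00:5E:00:53:02"),
        ("192.0.2.10", "00:00:5E:00:53:03"),  # second device claiming .10
    ]
    for ip, macs in find_duplicate_ips(sample).items():
        print(ip, "is claimed by", ", ".join(macs))
```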


Ethernet Packet Loss

Use these sections to identify and correct Ethernet packet loss:

• Ethernet Packet Loss Overview
• Searching for Packet Loss

See Ethernet Packet Loss Reference for additional conceptual and problem analysis detail.

If your Ethernet network shows signs of congestion, it may be experiencing packet loss. When your network is congested, utilization is usually high, packets are discarded because buffers are full, and collision rates are up. Problems related to Collisions are often at the heart of packet loss.

Understanding the Problem

Collisions are normal in Ethernet networks. In many cases, Collision rates of 50 percent do not cause a large decrease in throughput. The Collision rate helps mark the upper limit on your network (the maximum percentage of collisions that your network can bear), which is usually around 70 percent. If Collisions increase above this upper limit, your network can become unreliable.

When the Collision rates increase, so do Excessive Collisions, which causes a delay in transmitting data. An increase in Collisions also indicates that network utilization and network errors, such as FCS Errors, are probably increasing.

The real packet problems to watch for, however, are undetected collisions that show up as Late Collisions.

Identifying the Problem

To identify that your network's problem is related to packet loss, verify that frames are being dropped on your network by examining this packet loss data:

• Alignment Errors
• Collisions
• Excessive Collisions
• FCS Errors
• CRC Errors
• Late Collisions
• Receive Discards
• Too Long Errors
• Too Short Errors
• Transmit Discards


The process of identifying the problem is discussed in Searching for Packet Loss.

Solving the Problem

If you notice that packet loss data is consistently high, then your network is too congested. In this case, segment your network with the appropriate network device (such as a switch or router). If Collision data shows increases but your network's utilization is the same, then your network may have a physical problem, such as cabling that is too long. Other problems that packet loss data can indicate include:

• Faulty connectors or improper cabling
• Excessive numbers of repeaters between network devices
• Defective Ethernet transceivers or controllers

Possible solutions to these problems are explained in the procedures in Searching for Packet Loss.

Searching for Packet Loss

When you look for packet loss, use the following applications:

• Status Watch - For Ethernet and MIB-II data collection using SNMP polling
• LANsentry Manager Network Statistics Graph - For RMON data collection using an RMON probe
• Device View - On a per-device basis, you can evaluate statistics for any port on the device

Status Watch

Status Watch monitors:

• Alignment Errors
• Excessive Collisions
• FCS Errors
• Receive Discards
• Transmit Discards

Follow these steps:

1. Determine if the thresholds for the Alignment Errors tool and FCS Errors tool are being exceeded.


Table 16 identifies the problems that this data can indicate and your possible actions. For information about problems related to a nonstandard Ethernet implementation, see Nonstandard Ethernet Problems.

Table 16. Alignment Errors, FCS Errors, and CRC Errors Data

Possible Problem: Faulty cabling
Possible Action: Examine the cable and cable connections for breaks or damage.

Possible Problem: Network noise
Possible Action: Look for improper cabling, faulty cable, faulty network equipment, or cables that are too close to equipment that emits electromagnetic interference (lamps, for example).

Possible Problem: Faulty transceiver
Possible Action: Use an analyzer to identify the problematic transceiver. If necessary, replace the transceiver, network adapter, or station.

Possible Problem: Fault at the transmitting end station
Possible Action:

1. Locate the source of the errors by looking at the module and port statistics.

2. Verify the correct operation of the transceiver or adapter card of the device that is connected to the problem port.

3. If the card appears to be operating correctly, examine the cable and cable connections for breaks or damage.

Possible Problem: Station powering up or down
Possible Action: None required.

Early implementations of Ethernet transceivers generate a significant amount of in-band noise when powering up; they frequently cause Alignment Errors and FCS Errors in an otherwise stable network.

When powering up, some software drivers for Ethernet controllers also initiate Time Domain Reflectometry (TDR) tests to test the Ethernet media. Network monitors report TDR tests as Alignment Errors and FCS Errors.

Possible Problem: Faulty adapter
Possible Action: Replace the adapter.

2. Determine if the Excessive Collisions tool threshold is being exceeded. Table 17 identifies the problems that this data can indicate and your possible actions.

Table 17. Collisions and Excessive Collisions Data

Possible Problem: Busy network
Possible Action: Use a bridge, router, or switch to reconfigure your network into segments with fewer stations.

Possible Problem: Faulty device (adapter, switch, hub, and the like) that does not listen before broadcasting. This problem increases the incidence of all types of collisions.
Possible Action: Isolate each adapter to see if the problem stops.

Possible Problem: Network loop
Possible Action: Ensure that no redundant connections to the same station have both connections active simultaneously.


3. Determine if the Receive Discards and Transmit Discards tools' thresholds are being exceeded.

If these errors are high in conjunction with the data that you learned in steps 1 and 2, then your network is overloaded. Segment your network.

LANsentry Manager Network Statistics Graph

Use the LANsentry Manager Network Statistics graph to view data for:

• Collisions
• Late Collisions
• Bandwidth Utilization
• CRC Errors
• Too Long Errors
• Too Short Errors

Follow these steps:

1. Display a Network Statistics graph for the local Ethernet segment on which users have reported poor performance.

This graph shows the most recent trend in Collision rates. If you have set up a History sample, you can also look at the historical trend. If a number of segments are connected by repeaters, examine the graph for each Ethernet segment.

2. Analyze Utilization and Collision rates to determine whether collisions are caused by an overloaded segment or a faulty component.

• If Utilization rates are high - The collisions are probably caused by an overloaded segment. If you have added nodes or new applications to your network, consider reconfiguring the cabling system using bridges and routers to filter out remote collisions and to keep local traffic on one segment. This action should level the network load.

• If Utilization rates are stable and appear normal - The collisions are probably caused by faulty components. In this case, do the following:

• If the network consists of repeaters - Compare the Network Statistics graphs for each segment connected to the repeater. Because repeaters repeat traffic across all connected segments (which makes many segments seem like one network), you should see similar levels of traffic on all segments. One segment that shows dissimilar levels of traffic and collisions may indicate faulty hardware. In this case, monitor several collisions to track the source station that is transmitting too soon after collisions, and repair the station. Packets that are transmitted too soon after collisions are unlikely to be valid. See Table 17 for more information about Collisions.

• On other networks - Determine the segment cable length.
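
The decision in step 2 can be sketched as a comparison of current utilization against what is normal for the segment. The Python sketch below is only an illustration under stated assumptions: the function name diagnose_collisions, the idea of a recorded baseline, and the 10-point tolerance are all hypothetical and are not values from this guide. It also assumes that you have already observed an elevated Collision rate.

```python
def diagnose_collisions(utilization_pct, baseline_utilization_pct, tolerance_pct=10.0):
    """Suggest a likely cause for elevated collisions on one Ethernet segment.

    utilization_pct          -- current bandwidth utilization, 0 to 100
    baseline_utilization_pct -- utilization you consider normal for the segment
    tolerance_pct            -- assumed margin before utilization counts as high
    """
    for value in (utilization_pct, baseline_utilization_pct):
        if not 0.0 <= value <= 100.0:
            raise ValueError("utilization must be between 0 and 100 percent")
    if utilization_pct > baseline_utilization_pct + tolerance_pct:
        # High utilization: collisions probably come from an overloaded segment.
        return "overloaded segment"
    # Utilization stable and normal: collisions probably come from faulty components.
    return "faulty component"


if __name__ == "__main__":
    print(diagnose_collisions(70.0, 30.0))
    print(diagnose_collisions(32.0, 30.0))
```

A result of "faulty component" is only a starting point; the repeater comparison and cable-length checks described above are still needed to locate the fault.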

3. Examine the CRC Errors and Late Collisions, which often indicate cabling or component problems.

Table 16 identifies the problems that CRC Errors can indicate and your possible actions. Table 18 identifies the problems that Late Collisions data can indicate and your possible actions.

Table 18. Late Collisions Data

Possible Problem: Cabling problems

• Segment too long
• Failing cable
• Segment not grounded properly (noise)
• Improper termination
• Taps too close (10BASE-5 and 10BASE-2 only)
• Noisy cable

Possible Action: Correct the cabling problem by doing one or more of the following:

• Reduce the segment length
• Replace the cable
• Ground the cable
• Terminate the cable correctly
• Check the taps
• Check for cables too close to equipment that emits electromagnetic interference

Possible Problem: Component problems

• Deaf or partially deaf node
• Failing repeater, transceiver, or controller cards

Possible Action: Correct the component problem by doing one of the following:

• Trace the failing component and replace it
• Replace the NIC or the transceiver

4. Trace Too Short Errors and Too Long Errors to the sender.

These errors often indicate faulty routers or LAN drivers and transceiver problems. Table 19 identifies the problems that this data can indicate and your possible actions.

Possible Problem: A transceiver on your network is adding bits to the packets that are transmitted by the attached station.
Possible Action:

1. Use a network analyzer to identify the problematic transceiver.

2. If necessary, replace the transceiver, network adapter, or station.

Possible Problem: The jabber protection mechanism on a transceiver has failed; it can no longer protect the network from the jabbering produced by the attached station.
Possible Action: Replace the network card.

Possible Problem: Excessive noise on the cable

Note: Some 10/100 Mbps cards that autodetect the network speed may connect to the network at the wrong speed, causing excessive noise.

Possible Action: Check for improper cabling, faulty cable, faulty network equipment, or cables too close to noisy electronic equipment (lamps, for example).

If your network card autodetects the network speed and you have ruled out other problems, manually configure the network speed.

Possible Problem: Faulty routers (two different network types are connected and the router is not enforcing proper frame size restrictions)
Possible Action: Notify the manufacturer.

Possible Problem: Faulty LAN driver
Possible Action: Replace the driver.

Possible Problem: A normal condition on a LinkSwitch® 1000, LinkSwitch® 3000, or CoreBuilder 5000 FastModule

If you use maximum-sized (1518-octet) Ethernet frames, the device's VLT-enabled ports add a frame tag of 4 bytes, resulting in a misleading Too Long Frame error.

Possible Action: These frames are passed successfully but will create the Too Long Frame error message.

If you want to eliminate the error message, reduce your Ethernet packet frames by 4 bytes.

Device View

Device View allows you to display a variety of port and device-level statistics relevant to Ethernet packet loss. Table 20 describes these statistics and their use in troubleshooting.

Table 20. Activity and Error Statistics in Device View

Statistics Group: Activity
Description: Displays the total network activity and errors on the selected port.
Use in Troubleshooting: This data shows readable packets, broadcast packets, Collisions, total errors, and runts, which cause Too Short Errors. You can interpret this data in the following ways:

• Runts are often caused by Collisions; however, if the values increase at specific times of the day, you may need to change the network topology to manage the traffic more efficiently (for example, with switches or routers).

• Runts can also be caused by a badly terminated coax cable.

• Large numbers of runts not associated with high levels of collisions can indicate a transmission problem (examine the cable).

• Particularly high numbers of Collisions compared to the total number of readable packets can point to a hardware problem (a bad adapter) or to a data loop.

• A high proportion of Broadcast Packets (>10%) on a heavily utilized network (>50% of available bandwidth) can point to an incorrectly configured bridge or router on the network.

Statistics Group: Errors
Description: Displays the number of frames with errors on the selected port.
Use in Troubleshooting: The significance of errors depends on accompanying errors and prevailing network conditions. See the following error data for more information:

• Alignment Errors: Table 16
• FCS Errors: Table 16
• Too Long Errors: Table 19
• Too Short Errors or runts: Table 19
• Late Collisions: Table 18

To display Activity and Errors statistics for a device or port, follow these steps:

1. Select the required port or device.

2. From the shortcut menu, select Activity or Errors.

The statistics available depend on the type of port or device selected. See Table 20 for troubleshooting information.

Ethernet Packet Loss Reference

This section explains terms that are relevant to Ethernet packet loss and provides additional conceptual and problem analysis detail.

Alignment Errors

An Alignment Error indicates a received frame in which both of the following are true:

• The number of bits received is an uneven byte count (that is, not an integral multiple of 8).

• The frame has a Frame Check Sequence (FCS) error.

Alignment Errors often result from MAC layer packet formation problems, cabling problems that cause corrupted or lost data, and packets that pass through more than two cascaded multiport transceivers. See FCS Errors for more information about interpreting Alignment Errors.


Collisions

Collisions indicate that two devices detect that the network is idle and try to send packets at exactly the same time (within one round-trip delay). Because only one device can transmit at a time, both devices must stop sending and attempt to retransmit. Collisions are detected by the transmitting stations.

The retransmission algorithm helps to ensure that the packets do not retransmit at the same time. However, if the two devices retry at nearly the same time, packets can collide again; the process repeats until either the packets finally pass onto the network without collisions, or 16 consecutive collisions occur and the packets are discarded.
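
The text does not name the retransmission algorithm; standard Ethernet uses truncated binary exponential backoff, which is what the Python sketch below models. It is a simplified illustration, not a description of any particular adapter: the function names and the collides callback are hypothetical, and the 51.2-microsecond slot time assumes 10 Mbps Ethernet (512 bit times).

```python
import random

SLOT_TIME_US = 51.2   # 512 bit times at 10 Mbps (assumed link speed)
MAX_ATTEMPTS = 16     # the frame is discarded after 16 consecutive collisions
BACKOFF_LIMIT = 10    # the random range stops doubling after 10 collisions


def backoff_slots(collision_count, rng=random):
    """Pick how many slot times to wait after the given consecutive collision."""
    if not 1 <= collision_count < MAX_ATTEMPTS:
        raise ValueError("collision_count must be between 1 and 15")
    k = min(collision_count, BACKOFF_LIMIT)
    return rng.randrange(2 ** k)  # uniform integer in 0 .. 2**k - 1


def transmit(collides, rng=random):
    """Simulate sending one frame.

    collides(attempt) is a hypothetical callback that returns True when
    transmission attempt number `attempt` (1-based) ends in a collision.
    Returns (delivered, attempts_made, total_backoff_microseconds).
    """
    waited_us = 0.0
    for attempt in range(1, MAX_ATTEMPTS + 1):
        if not collides(attempt):
            return True, attempt, waited_us
        if attempt < MAX_ATTEMPTS:
            waited_us += backoff_slots(attempt, rng) * SLOT_TIME_US
    # 16 consecutive collisions: counted as an Excessive Collision, frame dropped.
    return False, MAX_ATTEMPTS, waited_us


if __name__ == "__main__":
    print(transmit(lambda attempt: attempt < 3))  # collides twice, then succeeds
    print(transmit(lambda attempt: True))         # always collides, frame dropped
```

Because each station draws its wait time independently from a range that doubles with every collision, two stations that collided become less and less likely to retry in the same slot.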

CRC Errors

A Cyclic Redundancy Check (CRC) Error is an RMON statistic that combines FCS Errors and Alignment Errors. These errors indicate that packets were received with:

• A bad FCS and an integral number of octets (FCS Errors)
• A bad FCS and a non-integral number of octets (Alignment Errors)

CRC Errors can cause an end station to freeze. If a large number of CRC Errors are attributed to a single station on the network, replace the station's network interface board. Typically, a CRC Error rate of more than 1 percent of network traffic is considered excessive.
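
The 1 percent guideline can be checked with simple arithmetic on counter values. In the Python sketch below, the function names and the sample counts are hypothetical, and the sketch assumes that "percent of network traffic" means errored packets as a share of all packets received in the same interval.

```python
def crc_error_rate_pct(fcs_errors, alignment_errors, total_packets):
    """CRC Errors (FCS Errors plus Alignment Errors) as a percentage of packets."""
    if total_packets <= 0:
        return 0.0
    return 100.0 * (fcs_errors + alignment_errors) / total_packets


def crc_errors_excessive(fcs_errors, alignment_errors, total_packets, limit_pct=1.0):
    """True when the CRC Error rate is above the 1 percent guideline."""
    return crc_error_rate_pct(fcs_errors, alignment_errors, total_packets) > limit_pct


if __name__ == "__main__":
    # Hypothetical counters sampled over one polling interval.
    print(crc_error_rate_pct(120, 30, 10_000))    # 1.5
    print(crc_errors_excessive(120, 30, 10_000))  # True
```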

Excessive Collisions

Excessive Collisions indicate that 16 consecutive collisions have occurred, usually a sign that the network is becoming congested. For each excessive collision count (or after 16 consecutive collisions), a packet is dropped. If you know the normal rate of excessive collisions, then you can determine when the rate of packet loss is affecting your network's performance. See Knowing Your Network's Configuration for more information.

FCS Errors

Frame Check Sequence (FCS) Errors, a type of CRC Error, indicate that frames received by an interface are an integral number of octets long but do not pass the FCS check. The FCS is a mathematical way to ensure that all the frame's bits are correct without having the system examine each bit and compare it to the original. Packets with Alignment Errors also generate FCS Errors.

Both Alignment Errors and FCS Errors can be caused by equipment powering up or down or by interference (noise) on unshielded twisted-pair (10BASE-T) segments. In a network that complies with the Ethernet standard, FCS or Alignment Errors indicate bit errors during a transmission or reception. A very low rate is acceptable. Although Ethernet allows a 1 in 10^8 bit error rate, typical Ethernet performance is 1 in 10^12 or better.

Late Collisions

Late Collisions indicate that two devices have transmitted at the same time, but cabling errors (most commonly, excessive network segment length or repeaters between devices) prevent either transmitting device from detecting a collision. Neither device detects a collision because the time to propagate the signal from one end of the network to the other is longer than the time to put the entire packet on the network. As a result, neither of the devices that cause the late collision senses the other's transmission until the entire packet is on the network.

Although late collisions occur for small packets, the transmitter cannot detect them. As a result, a network suffering measurable Late Collisions for large packets is losing small packets as well.
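
The timing argument above can be made concrete by comparing the worst-case round-trip delay with the time a station spends transmitting a frame. The Python sketch below is an illustration under stated assumptions: 10 Mbps Ethernet, a collision window of 512 bit times (the time to send a minimum-size 64-octet frame), and hypothetical function names and delay values.

```python
MIN_FRAME_OCTETS = 64  # minimum Ethernet frame size, including FCS


def transmit_time_us(frame_octets, mbps=10.0):
    """Time in microseconds to place a whole frame on the wire."""
    return frame_octets * 8 / mbps


def classify_collision(round_trip_us, frame_octets, mbps=10.0):
    """Describe how a sender sees a collision at the worst-case round-trip delay."""
    window_us = transmit_time_us(MIN_FRAME_OCTETS, mbps)  # 51.2 us at 10 Mbps
    if round_trip_us <= window_us:
        return "normal collision"  # detected inside the collision window
    if round_trip_us <= transmit_time_us(frame_octets, mbps):
        return "late collision"    # detected, but after the window has closed
    return "undetected"            # sender finished before the collision returned


if __name__ == "__main__":
    # Hypothetical over-length segment with a 70 microsecond round trip.
    print(classify_collision(70.0, 1518))  # large frame: late collision
    print(classify_collision(70.0, 64))    # small frame: undetected
    print(classify_collision(40.0, 64))    # compliant segment: normal collision
```

On the same over-length segment, the large frame is still being transmitted when the collision arrives, so it is counted as a Late Collision, while the small frame has already been sent in full and is lost silently.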

Nonstandard Ethernet Problems

Table 21 lists the symptoms that typically occur if a system violates the Ethernet standard.

Table 21. Symptoms of Common Ethernet Network Problems

Symptoms: FCS Errors and Alignment Errors increase significantly.
Problem: Network cabling is too long.
Notes: If you use a promiscuous network monitor, the number of Late Collisions reported by stations should correlate with the FCS and Alignment Errors reported by the monitor.

Symptoms: FCS and Alignment Errors increase proportionally with interference (sometimes referred to as noise hits).
Problem: Network segment is noisy.
Notes: Typically observed on a 10BASE-T network segment in a noisy environment. If you use multiple promiscuous monitors, the FCS and Alignment Errors among the monitors will not correlate.

If the monitor can track runts, also called Too Short Errors, the number of runt packets should be significantly higher than normal.

Symptoms: FCS and Alignment Errors are much higher than normal.
Problem: Networks do not conform to the access scheme of Carrier Sense Multiple Access with Collision Detect (CSMA/CD).
Notes: Occurs when some implementations of Ethernet in the segment are not entirely compatible with IEEE 802.3 repeaters.

Collision fragments linger on the network long enough to collide with retry packets at the minimum interpacket gap (IPG). The IPG is smaller on one side of the repeated network, causing a lost packet.

Ethernet controllers cannot receive packets that are separated by 4.7 µs or less. Some controllers cannot sustain receptions of packets separated by as much as 9.6 µs. If runt packets are received one after another and are followed by a collision fragment, Ethernet controllers that cannot sustain reception will lose packets.

Receive Discards

Receive Discards indicate that received packets could not be delivered to a high-layer protocol because of congestion or packet errors.

Too Long Errors

A Too Long Error indicates that a packet is longer than 1518 octets (including FCS octets) but otherwise well formed. Too Long Errors are often caused by a bad transceiver, a malfunction of the jabber protection mechanism on a transceiver, or excessive noise on the cable.

Too Short Errors

A Too Short Error, also called a runt, indicates that a packet is fewer than 64 octets long (including FCS octets) but otherwise well formed.
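
The two length limits can be expressed as a small check. The Python sketch below uses the 64-octet and 1518-octet limits stated above; the function name classify_frame_length is hypothetical, and the sketch assumes the frame is otherwise well formed (valid FCS).

```python
MIN_FRAME_OCTETS = 64    # shorter frames are Too Short Errors (runts)
MAX_FRAME_OCTETS = 1518  # longer frames are Too Long Errors


def classify_frame_length(octets):
    """Classify an otherwise well-formed frame by its length, FCS included."""
    if octets < MIN_FRAME_OCTETS:
        return "too short"
    if octets > MAX_FRAME_OCTETS:
        return "too long"
    return "ok"


if __name__ == "__main__":
    for length in (60, 64, 1518, 1522):
        print(length, classify_frame_length(length))
```

The 1522-octet case corresponds to a maximum-sized frame that has had a 4-byte tag added to it, which is why such frames can be reported as Too Long Errors even though they are passed successfully.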

Transmit Discards

Transmit Discards indicate that packets were not transmitted because of network congestion.

FDDI Ring Errors

Use these sections to identify and correct FDDI ring errors:

• FDDI Ring Errors Overview
• Identifying Ring Errors

Fiber Distributed Data Interface (FDDI) often corrects its own problems. However, because FDDI cannot correct all errors (especially those related to hardware problems), you should monitor FDDI errors.

Understanding the Problem

FDDI ring errors that you should monitor include:

• Elasticity Buffer Error Condition
• Frame Error Condition
• Frames Not Copied Condition
• Link Error Condition

Identifying the Problem

First, determine the type of FDDI ring errors and where they are occurring. Similar to the way you identify other FDDI problems, identify the upstream and downstream neighbors of the devices that you are monitoring.

Several types of network errors can cause FDDI performance problems. For example, problems with cables or physical connections may result in a link or frame error. Elasticity buffer (EB) errors can also lead to link and frame errors.

FDDI deals with port-related errors as follows:

• The variable PORTlerAlarm is the link error rate (LER) value at which a link connection generates an alarm. When the LER is greater than the alarm setting, Station Management (SMT) sends a Status Report Frame (SRF) to notify you that there is a problem with a port.

The PORTlerAlarm threshold is set lower than the PORTlerCutoff threshold so that you are notified of a problem before the port is actually removed from the ring.

• When link errors reach the threshold defined by the variable PORTlerCutoff, SMT breaks the connection, disabling the PHY that detected the problem. A Link Error Condition is also generated.

FDDI deals with MAC-related errors as follows:

• When MAC frame errors reach a certain threshold, a Frame Error Condition is generated. Because the actual error can be further upstream than the immediate connection, the connection remains intact.

• For a large network, the worst-case MACFrameErrorRatio is less than 0.1 percent. However, during network configuration, frame error ratios can reach 50 percent for short periods. When you detect a sustained frame error ratio of more than 0.1 percent, a problem exists between the station that is reporting the condition and the nearest upstream MAC.
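
The two-threshold behavior for ports and the sustained-ratio rule for MACs can be sketched as follows. This Python sketch is a simplified model, not SMT itself: the function names are hypothetical, the LER is treated as a plain error rate, the default alarm and cutoff values are illustrative assumptions, and "sustained" is modeled as every sample in a window exceeding 0.1 percent.

```python
def port_action(ler, alarm=1e-8, cutoff=1e-7):
    """Return "ok", "alarm", or "cutoff" for a measured link error rate.

    alarm and cutoff stand in for PORTlerAlarm and PORTlerCutoff expressed as
    error rates; the default values here are assumptions for illustration.
    """
    if alarm >= cutoff:
        raise ValueError("the alarm threshold must be lower than the cutoff")
    if ler >= cutoff:
        return "cutoff"  # SMT breaks the connection; Link Error Condition
    if ler > alarm:
        return "alarm"   # SMT sends a Status Report Frame
    return "ok"


def sustained_frame_errors(ratios_pct, limit_pct=0.1):
    """True when every sampled frame error ratio exceeds the limit.

    Short spikes, such as those seen during network configuration, are
    ignored because a single in-limit sample makes the result False.
    """
    return bool(ratios_pct) and all(r > limit_pct for r in ratios_pct)


if __name__ == "__main__":
    print(port_action(5e-8))                         # alarm
    print(port_action(2e-7))                         # cutoff
    print(sustained_frame_errors([0.2, 0.3, 0.15]))  # True
    print(sustained_frame_errors([50.0, 0.05]))      # False
```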

Solving the Problem

To solve problems related to FDDI errors, fix the hardware, cabling, or congestion problem.

Identifying Ring Errors


Use Status Watch to monitor your FDDI devices for Warning or Critical alerts.

Status Watch

Use Status Watch to identify FDDI ring errors.

Follow these steps:

1. Monitor the FDDI Status tool for the currently selected device.

2. Determine whether Status Watch is reporting Elasticity Buffer Errors or a high percentage of Frame Errors, Frames Not Copied, or Link Error Rates for the currently selected device.

FDDI Ring Errors Reference

This section provides additional conceptual and problem analysis detail.

Elasticity Buffer Error Condition

The Elasticity Buffer Error condition occurs when a port's elasticity buffer overflows or underflows. This condition usually indicates that a port's hardware is not operating within the tolerances that the FDDI standard specifies. Look for the problem in the hardware of either the port that is reporting the condition or the immediately adjacent port.

Frame Error Condition

The Frame Error condition occurs when the percentage of frames that contain errors exceeds a preset threshold. When a device is an uplink to FDDI (that is, the device is transmitting onto FDDI), this type of condition indicates that the ring is saturated. The ring is out of buffer space, and packets are being dropped from the device's backbone port.

The problem indicated by the frame errors is usually located between the MAC that reports the condition and its upstream neighbor. Because many physical connections can lie along this path, the MACFrameErrorRatio variable identifies only the two MACs between which the problem is occurring.

Frames Not Copied Condition

The Frames Not Copied condition occurs when the percentage of frames that are dropped because of insufficient buffer space exceeds a preset threshold. This condition indicates that the station is congested and is unable to process frames as quickly as they arrive. To help eliminate congestion:

• Add more capacity to the station
• Reconfigure your network so that end stations that communicate heavily with one another are on the same bridge or switch
• Filter out certain traffic

Link Error Condition

The Link Error condition occurs when a port detects link errors at a rate that exceeds a preset threshold. When the Link Error threshold is exceeded, the station removes itself from the ring and tries to reinsert itself on the ring. This action creates a MAC Neighbor Change Event (which also occurs if a ring wraps).

Link errors may indicate an FDDI PHY hardware problem (such as a faulty transmitter) or a faulty cable or connector. Look for the problem in the portion of the network between the port that is reporting the condition and the first upstream transmitter.

MAC Neighbor Change Event

The MAC Neighbor Change event occurs when a MAC's upstream or downstream neighbor changes.

This event indicates either:

• A network reconfiguration
• Another station that is leaving or joining the ring

Network File Server Timeouts

Use these sections to identify and correct timeouts on network file servers:

• Network File Server Timeout Overview
• Looking for Obvious Errors
• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

See Network File Server Timeouts Reference for additional conceptual and problem analysis detail.

A network file server can time out if your network gets congested or if your server is having problems. Users might have problems downloading data from or to the server, or copying files from or to the server. To help you to understand the troubleshooting process for this type of problem, an EXAMPLE throughout this section follows the symptoms, analysis, and resolution of a typical file server timeout problem.


Understanding the Problem

When users log in, their stations make network file server calls, either to determine quotas (if this feature has been enabled) or to mount user home directories. The network file server timeout messages, even when spread across multiple nodes, indicate a problem either with the network or with a server.

EXAMPLE: UNIX users notice that it takes a long time - over 30 seconds in some cases - to log in to any machine. Some machines report network file server timeout messages, but the messages have no obvious pattern and are infrequent. You begin to get a sense of the problem.

Identifying the Problem

First, rule out the obvious causes. Ask these questions:

• Can you access the network file server with Telnet?
• Have any alarms been triggered?
• Are there any new errors?

The process of identifying the problem is developed in Looking for Obvious Errors.

Solving the Problem

To determine the cause, reproduce the fault while you monitor the network. After you know the cause, you can fix the problem.

The solutions to the network file server timeout are identified in these sections:

• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

Looking for Obvious Errors

To look for obvious errors, use these applications:

• Ping and Telnet - To check connectivity to the network file server nodes
• LANsentry Manager Alarms View - To search for triggered alarms
• LANsentry Manager Statistics View - To look for errors
• LANsentry Manager History View - To identify trends


Ping and Telnet

Determine whether you can contact network file server nodes using Ping and Telnet. If the response is extremely slow, then a problem may exist with the connections to the nodes. No delay indicates that the connections are normal, implying that the delay is occurring elsewhere. In this case, use LANsentry® Manager tools to determine whether packets are being lost or ignored.

LANsentry Manager Alarms View

Using the LANsentry Manager Alarms View, you can determine if any configured alarms have been triggered.

Search the Alarms View to see if any MAC events have been logged.

EXAMPLE: MAC events have not been logged for the network to which the UNIX users are attached.

Even though no alarms have occurred, errors may exist. For example, a lower rate of background errors may exist just below the alarm threshold. Because they are based on maximum and minimum values, RMON alarms may miss constant, periodic, or low amounts of errors.

LANsentry Manager Statistics View

Using the LANsentry Manager Statistics View, you can display a multisegment graph of utilization and error statistics.

Follow these steps:

1. Set up a graph that shows utilization and errors on all your major segments.

2. Determine whether any segments are particularly busy or error prone.

EXAMPLE: You notice that one segment of the UNIX network, HUB3, is reporting Too Long Errors and FCS Errors roughly every second sample. While the amount is not higher than normal, it is currently higher than on any other segment.

LANsentry Manager History View

Using the LANsentry Manager History View, you can display a rolling history table to determine if the errors that you are seeing are new. For example, if you have a history table that runs for 30-minute samples over two days, you can compare the most recent sample to a previous sample, looking for new errors. If your probe has the resources, use a much finer resolution sample stored for a shorter time (every 30 seconds for 2 hours) to more easily spot recent errors.

EXAMPLE: You see that the history table shows that no error rates remained constant throughout the day. However, errors that did occur were on the device HUB3 and were Too Long Errors and FCS Errors.

Reproducing the Fault While Monitoring the Network

Although the RMON View in LANsentry Manager can show error rates and help you to identify the location of the problem, it may not provide enough data to solve the problem. To determine the cause of the problem, reproduce it while you monitor the network by using these applications:

• LANsentry Manager Top-N Graph - To locate a quiet node to use for reproducing the fault

• LANsentry Manager Packet Capture - To capture packets from the hub to which the quiet node is attached

• LANsentry Manager Packet Decode - To analyze the packets to assess network file server traffic and delays

• Address Tracker - To find the location of the problem nodes

EXAMPLE: Using LANsentry Manager, you find a hub on the network with a higher than normal error rate. However, the error rate does not seem high enough to cause login delays of 60 seconds or more.

LANsentry Manager Top-N Graph

Using the Top-N graph in the LANsentry Manager main window, locate a quiet node that has been showing the same problem. Choose a quiet node so that you do not receive excessive traffic when you try to isolate the problem.

EXAMPLE: You see that the node Monolith, which has the same Network File System (NFS) mounts as the other nodes on the network, is quiet. You decide to use this node for reproducing the fault. See Network File System (NFS) Protocol for more information about NFS.

LANsentry Manager Packet Capture

Using the LANsentry Manager Packet Capture application, capture packets from the network using predefined patterns and start-and-stop conditions.

Follow these steps:

1. Set up a capture buffer on a probe that is connected to the same hub as the quiet node. Until you know more about the problem, set a very general filter.

EXAMPLE: You select a MAC-layer filter and set a conversation filter to capture all packets to and from Monolith.

2. Telnet into and log out from the quiet node. Then reset the capture buffer. Repeat this procedure until you see the problem reflected in your captured data. To keep the buffer information clean, reset the buffer each time that you repeat the procedure.

3. When you see the delay, note the rough value of the packet count on the LANsentry Manager packet buffer.

By noting the packet count at which you think the delay has occurred, you can narrow the problem to within about 20 packets in the buffer. If you have used an extremely quiet node, you may even identify the exact packet.

LANsentry Manager Packet Decode

The LANsentry Manager Packet Decode application decodes all major protocols and displays the packet contents at three levels of detail: summary information, header information, and actual packet content.

Follow these steps:

1. Open the buffer in the Packet Decode application and locate the number of the packet at which the delay occurred.

2. Select the packet and launch a MAC-layer conversation filter. In the filter display, look for a gap in the conversation (that is, where the node sent a request and then resent it after an interval that roughly matches the delay you experienced when recreating the problem).

3. Repeat the test to determine if the result concentrates on one node or if it appears on other nodes.

EXAMPLE: On the quiet node that you selected, the delay is obvious. You see an NFS request going out to a node and a repeat of the request 30 seconds later. During that time, the node did not respond. You now know that the delay occurred because nodes were not seeing responses for NFS requests. When you repeat the test on other nodes, you find that the delay is happening with more than one destination node.
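The search for a gap in the conversation can be sketched in Python. This is an illustrative sketch only: it assumes the captured requests have already been exported as (timestamp, request identifier) pairs, which is a hypothetical format and not something LANsentry Manager is known to produce.

```python
def find_retransmit_gaps(requests, min_gap=10.0):
    """Find requests that were resent after a long silence.

    requests is a list of (timestamp_seconds, request_id) pairs in
    capture order, all sent by the node under test. min_gap is a
    hypothetical cutoff for what counts as a suspicious delay.
    Returns a list of (request_id, first_sent, resent_at) tuples.
    """
    last_seen = {}
    gaps = []
    for timestamp, request_id in requests:
        previous = last_seen.get(request_id)
        if previous is not None and timestamp - previous >= min_gap:
            gaps.append((request_id, previous, timestamp))
        last_seen[request_id] = timestamp
    return gaps


if __name__ == "__main__":
    capture = [(0.0, "lookup-1"), (0.1, "getattr-2"),
               (30.2, "getattr-2"), (30.3, "read-3")]
    print(find_retransmit_gaps(capture))
```

In the sample data, the second request is repeated about 30 seconds after it was first sent, which is the pattern described in the EXAMPLE above.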


Address Tracker

Use Address Tracker, which polls managed devices, to determine the hubs to which the problem nodes are attached. If the problem end stations are located on unmanaged devices, then you can at least narrow the problem to those unmanaged devices.

EXAMPLE: Although your network does not have managed hubs that Transcend® NCS management software can poll, it does have managed switches. When it polls the switches, Address Tracker displays the switch ports on which addresses were last seen. This information indicates the hub (but not the hub port) on which the device is located.

LANsentry Manager Packet Decode

After you know the location of the hub that has the problem node, monitor the problem from the hub using LANsentry Manager Packet Decode.

Follow these steps:

1. To capture packets from one of the nodes on the hub, set up another capture buffer and repeat the exercise that is described in LANsentry Manager Packet Capture. Because a delay may occur on a different node, use two capture buffers without stopping the first one.

Note the rough packet count where the delay appears.

2. Display a conversation filter of the packet where the delay appears and look for the gap in the conversation.

EXAMPLE: You hope that the nodes are on the same hub. You find that all the nodes are on HUB3. This result indicates that FCS Errors may be causing the timeouts. However, because the errors occur at a low rate, you decide to verify this diagnosis. You monitor the problem from the hub, logging in and out many times, and the delay eventually occurs. This time, the delay shows that the node's reply had an FCS Error even though the node received the request. The switch would not have transmitted this packet, causing a timeout on the NFS protocol. The retry time is presumably 30 seconds. During this test, you see the problem occurring on another node.

Correcting the Fault


Without a managed hub, you may find it very difficult to discover network file server timeout errors. To find the problematic node, you must either systematically isolate nodes by monitoring each node for a prolonged period or temporarily insert a managed hub.

EXAMPLE: You notice that the captured error packet failed FCS because it was corrupted by a regular pattern during transmission. A possible reason for this occurrence is a Jabbering node. This explanation makes sense because FCSJabber frames increased linearly when you were monitoring the live network.

Network File Server Timeouts Reference

This section explains terms that are relevant to network file server timeouts and provides additional conceptual and problem analysis detail.

Jabbering

Jabbering occurs when a node transmits illegal-length packets and is possibly not operating within carrier specifications. In effect, another node has written bad data over a valid packet. This bad data is often interpreted as a repeated sequence of data.

Network File System (NFS) Protocol

A distributed file system protocol developed by Sun Microsystems that allows a computer system to access files over a network as if they were on its local disks. This protocol has been incorporated into products by more than 200 companies. It is now a de facto Internet standard.

NFS is one protocol in the NFS suite of protocols, which includes NFS, RPC, XDR (External Data Representation), and others. These protocols are part of a larger architecture that Sun Microsystems refers to as Open Network Computing (ONC). ONC is a distributed applications architecture designed by Sun and currently controlled by a consortium led by Sun.

Troubleshooting TCP/IP - Detailed Steps

This article shows how to troubleshoot TCP/IP connectivity between computers on a Windows network. If you haven't already done so, disable XP's Internet Connection Firewall on all local area network connections and remove all firewall programs on the network. Improperly configured firewalls are the most common cause of TCP/IP problems.


Open a Command Prompt Window

For many of these steps, you'll be typing at the command prompt. To open a command prompt window in Windows 2000 or XP, click Start | Run, type cmd in the box, and click OK. To open a command prompt window in Windows 95, 98, or Me, click Start | Run, type command in the box, and click OK. Type one command per line and press Enter after each one to execute it. To close the command prompt window, use the exit command.

Determine the TCP/IP Settings

Determine the TCP/IP settings of each computer on the local area network. In XP, open the Network Connections folder, right click the LAN connection, and click Status | Support | Details. For example, here are the Status and Details views for the LAN connection on an Internet Connection Sharing host.

In Windows 95/98/Me, click Start | Run, type winipcfg in the box, and click OK. Select the LAN adapter from the menu and click More Info. Here's the winipcfg view for an ICS client running Windows Me.

You can also see the TCP/IP settings from the command prompt. This is especially convenient if a computer has more than one network adapter. Use the ipconfig /all command, which is available in all versions except Windows 95. The output from this command can be long, so it's best to write it to a file. Specify the file name in the command this way:

ipconfig /all >ipconfig.txt

Description of TCP/IP Settings

Here are the TCP/IP settings that are used in network troubleshooting:

• IP Address - Unique address assigned to a network adapter. A computer with multiple network adapters has an IP address for each one, and each one must be in a different subnet.

• Subnet Mask - Used in conjunction with the IP address to determine which subnet an adapter belongs to. At the simplest level, communication is only possible between two network adapters when they're in the same subnet.

• Default Gateway - IP address of a computer or router, on one of this computer's local area networks, that knows how to communicate with subnets not present on this computer. For an Internet connection, the default gateway is a router belonging to your Internet service provider, and all access to sites on the Internet goes through it. For an ICS client, the default gateway is the ICS host. If you use a hardware router, it serves as the default gateway.

• DHCP Server - If an adapter is configured to obtain an IP address automatically, this is the address of the server that provides it. It could be your ISP, an ICS host, or a hardware router.

• DNS Servers - IP address of one or more Domain Name Server computers. DNS servers translate Internet names to their IP addresses (like 63.146.109.227).
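How the IP address and subnet mask together decide subnet membership can be illustrated with Python's standard ipaddress module. This is a minimal sketch; the function name and the sample addresses are chosen for illustration only.

```python
import ipaddress


def same_subnet(ip_a, ip_b, mask):
    """Return True if both addresses fall in the same subnet under mask."""
    # ip_interface accepts "address/netmask" and exposes the containing network.
    net_a = ipaddress.ip_interface(f"{ip_a}/{mask}").network
    net_b = ipaddress.ip_interface(f"{ip_b}/{mask}").network
    return net_a == net_b


if __name__ == "__main__":
    print(same_subnet("192.168.1.101", "192.168.1.123", "255.255.255.0"))
    print(same_subnet("192.168.1.101", "192.168.2.5", "255.255.255.0"))
```

Two adapters whose addresses yield the same network under the mask can talk directly; otherwise traffic has to go through the default gateway.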

Subnets

See our article on subnets for a brief description of how they work. For more details, see this Microsoft Knowledge Base article.

If two computers are supposed to be on the same subnet but aren't, something is wrong with the network hardware or software configuration. This is most likely to happen when one of them receives an IP address of 169.254.x.x, which indicates that:

• It's configured to obtain an IP address automatically
• It couldn't find a DHCP server on the network to make the assignment
• Windows assigned it an Automatic Private IP Address
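Checking whether an address falls in the 169.254.x.x range can be sketched with Python's standard ipaddress module. The function name is an assumption for this example.

```python
import ipaddress

# 169.254.0.0/16 is the range used for Automatic Private IP Addresses.
APIPA_RANGE = ipaddress.ip_network("169.254.0.0/16")


def is_apipa(address):
    """Return True if address is an Automatic Private IP Address."""
    return ipaddress.ip_address(address) in APIPA_RANGE


if __name__ == "__main__":
    for addr in ("169.254.10.20", "192.168.1.101"):
        print(addr, is_apipa(addr))
```

An address in this range on a computer that should be getting its settings from DHCP suggests that the DHCP request went unanswered.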

See our article on Specific Networking Problems and Their Solutions for more information.

Pinging

The ping command is the basic tool for testing TCP/IP connectivity. It sends a special packet (called ICMP Echo) to a particular IP address and looks for a reply. If everything is working right, the reply comes back. If not, the ping times out in a few seconds. By default, the ping command repeats the process four times. Here's an example of an ICS client computer pinging a Windows XP Home Edition ICS host, using the host's IP address and its computer name.

When ping fails, you'll see one of these error messages:

• Request timed out - The IP address is valid, but there's no reply from it. If the IP address is on a local area network, the most likely cause is a firewall program blocking the ping.

• Unknown host <name> or Ping request could not find host <name> - The computer name doesn't exist on the local area network. Make sure that NetBIOS over TCP/IP is enabled.

• Destination host unreachable - The IP address isn't on a local area network, and the default gateway can't access it. Either there's no default gateway, its address is wrong, or it isn't functioning.
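The mapping from error message to likely cause can be captured in a small lookup. This sketch simply matches the message text listed above against captured ping output; the function name and the wording of the returned hints are assumptions for the example, and real ping output can vary by Windows version.

```python
def explain_ping_failure(output):
    """Return a hint for a failed ping, or None if no known message is found."""
    text = output.lower()
    if "destination host unreachable" in text:
        return "Check the default gateway: missing, wrong address, or not functioning."
    if "unknown host" in text or "could not find host" in text:
        return "Name not found: make sure NetBIOS over TCP/IP is enabled."
    if "request timed out" in text:
        return "No reply: on a LAN, a firewall is the most likely cause."
    return None


if __name__ == "__main__":
    print(explain_ping_failure("Request timed out."))
    print(explain_ping_failure("Ping request could not find host win98."))
```

The hints restate the causes from the list above; they are a starting point for investigation, not a diagnosis.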


Pinging the Local Area Network

Here is a series of ping commands to use in finding where a problem occurs on a local area network. Run them in the order shown, and don't go on to the next command until all of the previous commands work properly. In this example:

• The computer being tested is named Winxp, with IP address 192.168.1.101
• There's another computer on the network named Win98, with IP address 192.168.1.123

Substitute the appropriate IP addresses and computer names for your network.

Command              Target                          What Ping Failure Indicates
ping 127.0.0.1       Loopback address                Corrupted TCP/IP installation
ping localhost       Loopback name                   Corrupted TCP/IP installation
ping 192.168.1.101   This computer's IP address      Corrupted TCP/IP installation
ping winxp           This computer's name            Corrupted TCP/IP installation
ping 192.168.1.123   Another computer's IP address   Bad hardware or NIC driver
ping win98           Another computer's name         NetBIOS name resolution failure

To fix a corrupted TCP/IP installation on Windows XP, follow the steps in this Microsoft Knowledge Base article. For Windows 95/98/Me, uninstall the TCP/IP protocol in Control Panel | Network, reboot, and reinstall it. If that doesn't fix it, use this procedure on Windows 95 or 98.
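The ordered ping sequence for the local area network can be automated with a short Python sketch. The step list mirrors the example names and addresses used in this section; the function names are assumptions, and the exit-code check is a simplification, since some Windows versions return success even when the reply is an error message.

```python
import platform
import subprocess

# (target, what a failure at this step indicates), in the order to run them.
STEPS = [
    ("127.0.0.1", "Corrupted TCP/IP installation"),
    ("localhost", "Corrupted TCP/IP installation"),
    ("192.168.1.101", "Corrupted TCP/IP installation"),
    ("winxp", "Corrupted TCP/IP installation"),
    ("192.168.1.123", "Bad hardware or NIC driver"),
    ("win98", "NetBIOS name resolution failure"),
]


def system_ping(target):
    """Send one echo request with the system ping command."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    result = subprocess.run(["ping", count_flag, "1", target],
                            capture_output=True)
    return result.returncode == 0


def first_failure(steps, ping=system_ping):
    """Run the steps in order and stop at the first one that fails."""
    for target, meaning in steps:
        if not ping(target):
            return target, meaning
    return None


if __name__ == "__main__":
    print(first_failure(STEPS) or "All pings succeeded")
```

Stopping at the first failure follows the advice above: a later step is only meaningful once every earlier step works. Passing the ping function as a parameter also lets the logic be exercised without touching the network.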

Pinging the Internet

You can also use ping to find a problem with Internet access. Run these commands in the order shown, and don't go on to the next command until all of the previous commands work properly. Use the Default Gateway and DNS Server addresses that you got from the winipcfg or ipconfig /all command.


Command                  Target                What Ping Failure Indicates
ping w.x.y.z             Default Gateway       Default Gateway down
ping w.x.y.z             DNS Server            DNS Server down
ping w.x.y.z             Web site IP address   Internet service provider or web site down
ping www.something.com   Web site name         DNS Server down or web site down

REFERENCE

www.practicallynetwork.com

www.computerhope.com

www.wikipedia.org


THANK YOU


TFTP is a simple version of FTP that does not list directories or require passwords. TFTP only transfers files to and from a remote server.

Analyzers

An analyzer, which is often called a Sniffer, is a network device that collects network data on the segment to which it is attached, a process called packet capturing. Software on the device analyzes this data, which is a process referred to as protocol analysis. Most analyzers can interpret different types of protocol traffic, such as TCP/IP, AppleTalk, and Banyan VINES traffic.

You usually use analyzers for reactive troubleshooting - when you see a problem somewhere on your network, you attach an analyzer to capture and interpret the data from that area. Analyzers are particularly helpful for identifying intermittent problems. For example, if your network backbone has experienced moments of instability that prevent users from logging on to the network, you can attach an analyzer to the backbone to capture the intermittent problems when they happen again.

Probes

Like an analyzer, a probe is a network device that collects network data. Depending on its type, a probe can collect data from multiple segments simultaneously. It stores the collected data and transfers the data to an analysis site when requested. Unlike an analyzer, a probe does not interpret data.

A probe can be either a stand-alone device or an agent in a network device. The Transcend Enterprise Monitor 500 series and the SuperStack® II Monitor series are stand-alone RMON probes. LANsentry Manager and Traffix Manager use data from probes that comply with the RMON MIB or the RMON2 MIB.

You can use a probe daily to determine the health of your network. The Transcend NCS applications can interpret and report this data, alerting you to possible problems so that you can proactively manage your network. For example, an RMON2 probe can help you to analyze traffic patterns on your network. Use this data to make decisions about reconfiguring devices and end stations as needed.

Cable Testers

Cable testers examine the electrical characteristics of the wiring. They are most commonly used to ensure that building wiring and cables meet Category 5, 4, and 3 standards. For example, network technologies such as Fast Ethernet require the cabling to meet Category 5 requirements. Testers are also used to find defective and broken wiring in a building.


Knowing Your Network

You can better troubleshoot the problems on your network by:

• Knowing Your Network's Configuration
• Identifying Your Network's Normal Behavior

Knowing Your Network's Configuration

Part of understanding your network is knowing its physical and logical configuration. You should know:

• Which devices are on your network
• How the devices are configured
• Which devices are attached to the backbone
• Which devices connect your network to the outside world (WAN)

To keep track of your network's configuration, gather the following information:

• Site Network Map
• Logical Connections
• Device Configuration Information
• Other Important Data About Your Network

This data, when kept up-to-date, is extremely helpful for locating information when you experience network or device problems.

Site Network Map

A network map helps you to:

• Know exactly where each device is physically located
• Easily identify the users and applications that are affected by a problem
• Systematically search each part of your network for problems

You can create a network map using any drawing or flow chart application. Store your network map online. In addition, make sure that you always have a current version on paper in case you cannot access the online version. Figure 8 shows an example of a network map of 3Com devices.

Figure 8. Example of a Site Network Map


Consider including the following information on your network map:

• Location of important devices and workgroups (by floor, building, or area)
• Location of the network backbone, data center, and wiring closets, as appropriate for your network
• Location of your network management stations
• Location and type of remote connections
• IP subnetwork addresses for all managed switches and hubs
• Other subnetwork addresses, such as Novell IPX and AppleTalk, if appropriate for your network
• Type of media (by actual name, such as 10BASE-T, or by grouping, such as Ethernet), which you can show with callouts, colors, line weights, or line styles
• Virtual workgroups, which you can show with colors or shaded areas
• Redundant links, which you can indicate with gray or dashed lines
• Types of network applications that are used in different areas of your network
• Types of end stations that are connected to the switches and hubs

Logical Connections

With the advent of virtual LANs (VLANs), you need to know how your devices are connected logically as well as physically. For example, if you have connected two devices through the same physical switch, you can assume that they can communicate with each other. However, the devices can be in separate VLANs that restrict their communication.

Knowing the setup of your VLANs can help you to quickly narrow the scope of a problem to a VLAN instead of to a network connection.

The Transcend NCS application Enterprise VLAN Manager allows you to view the logical makeup of your network. Depending on the complexity of your network and VLAN configurations, you can use colors to show the VLANs graphically on your network map.

Device Configuration Information

Maintain online and paper copies of device configuration information. Make sure that all online data is stored with your site's regular data backup. If your site does not have a backup system, copy the information onto a backup disc (CD, Zip disk, and the like) and store it offsite.

Follow these guidelines for saving configuration information:

• Because the easiest way to recover a device's configuration is to use FTP or TFTP, save the configuration settings of each device that supports this method of uploading.

• For other devices, Telnet in and save the session (which contains configuration details) to a file. If you cannot print the configuration of a device, then create a quick rebuild guide that explains the quickest way to configure the device from a fresh install.

• For devices that store information to diskette, store this data as part of your site's regular backup.

• For routers and other important devices with text configuration files, store this data online in a revision control system. Keep the most recent version on paper. Keep previous versions.

• For PCs, keep a recovery disk for each type of PC. For any device that you use as a server, store all startup scripts and copies of registries.

Other Important Data About Your Network

For a complete picture of your network, have the following information available:

• All passwords - Store passwords in a safe place. Keep previous passwords in case you restore a device to a previous software version and need to use the old password that was valid for that version.

• Device inventory - The inventory allows you to see the device type, IP address, ports, MAC addresses, and attached devices at a glance. Software tools such as Transcend Central can help you keep track of the 3Com devices on your network. Using Transcend Central, you can group devices by type and location and have this information on hand for troubleshooting.

• MAC address-to-port number list - If your hubs or switches are not managed, you must keep a list of the MAC addresses that correlate to the ports on your hubs and switches. Generate and keep a paper copy of this list, which is crucial for deciphering captured packets using Address Tracker.

• Log book - Document your interactions, no matter how trivial, with each device that is critical to your network's operation (that is, routers, remote access devices, security servers). For example, document that you noticed a fan making noise one morning. Your note may help you to identify why a device is over temperature a week later (because the fan stopped working).

• Change control - Maintain a change control system for all critical systems. Permanently store change control records.

• Contact details - Store, online and on paper, the details of all support contracts, support numbers, engineer details, and telephone and fax numbers.

• LANsentry Reporter - Use LANsentry Reporter to generate reports from the database.

Identifying Your Network's Normal Behavior

By monitoring your network over a long period, you begin to understand its normal behavior. You begin to see a pattern in the traffic flow, such as which servers are typically accessed, when peak usage times occur, and so on. If you are familiar with your network when it is fully operational, you can be more effective at troubleshooting problems that arise.

Baselining Your Network

You can use a baseline analysis, which is an important indicator of overall network health, to identify problems. A baseline can serve as a useful reference of network traffic during normal operation, which you can then compare to captured network traffic while you troubleshoot network problems. A baseline analysis speeds the process of isolating network problems.

By running tests on a healthy network, you compile normal data to compare against the results that you get when your network is in trouble. For example, ping each node to discover how long it typically takes you to receive a response from devices on your network.
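The ping comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the node names, the sample round-trip times, and the three-standard-deviation tolerance are hypothetical and do not come from any 3Com tool. Each node needs at least two samples for the standard deviation to be defined.

```python
from statistics import mean, stdev

def build_baseline(samples_ms):
    """Summarize ping round-trip times (in ms) collected per node on a healthy network."""
    return {
        node: {"mean": mean(rtts), "stdev": stdev(rtts)}
        for node, rtts in samples_ms.items()
    }

def deviates(baseline, node, rtt_ms, tolerance=3.0):
    """Return True if rtt_ms exceeds the node's baseline mean by more than
    `tolerance` standard deviations (the tolerance is an assumed value)."""
    entry = baseline[node]
    return rtt_ms > entry["mean"] + tolerance * entry["stdev"]

# Hypothetical measurements taken while the network is healthy.
healthy = {
    "fileserver": [1.0, 1.2, 0.9, 1.1],
    "router": [0.4, 0.5, 0.6, 0.5],
}
baseline = build_baseline(healthy)
```

While troubleshooting, a new measurement such as `deviates(baseline, "fileserver", 5.0)` can then be checked against the stored baseline instead of being judged by eye.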

Applications such as Status Watch, Address Tracker, LANsentry Manager, and Traffix Manager allow you to collect days and weeks of data and set a baseline for comparison. Through the reporting mechanisms in the following list, you can continuously assess the data from your network and ensure that its performance is optimal:

• Web Reporter generates daily or weekly reports from data collected by Status Watch.

• Traffix Manager generates weekly reports from collected data and calculates the baselines for you. Set up Utilization History and Error History reports with data resolution set to Weekly.

• LANsentry Manager History View generates daily utilization graphs, which are sampled every 30 minutes for each day over one week. Use these graphs to calculate your network baselines manually.

Identifying Background Noise

Know your network's background noise so that you can recognize real data flow. For example, one evening, after everyone is gone, no backups are running, and most nodes are on, analyze the traffic on your network using the Traffix Manager application. The traffic that you see is mostly broadcast and multicast packets. Any errors that you see are the result of faulty devices (trace). This traffic is the background noise of your network - traffic that occurs for little value. If background noise is high, redesign your network.
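As a rough illustration of measuring background noise, the following Python sketch computes the share of captured frames that are addressed to a broadcast or multicast destination. It assumes you have already exported the destination MAC addresses from a capture as colon-separated strings; the sample addresses are hypothetical. The check relies on the Ethernet convention that group (multicast and broadcast) addresses have the least-significant bit of the first octet set.

```python
def is_broadcast(mac):
    """True for the all-ones Ethernet broadcast address."""
    return mac.lower() == "ff:ff:ff:ff:ff:ff"

def is_group_address(mac):
    """True for multicast and broadcast destinations: the least-significant
    bit of the first octet is 1."""
    return int(mac.split(":")[0], 16) & 1 == 1

def background_share(dest_macs):
    """Fraction of frames whose destination is broadcast or multicast."""
    if not dest_macs:
        return 0.0
    noisy = sum(1 for mac in dest_macs if is_group_address(mac))
    return noisy / len(dest_macs)

# Hypothetical destinations from a quiet-hours capture.
sample = [
    "ff:ff:ff:ff:ff:ff",  # broadcast
    "01:80:c2:00:00:00",  # Spanning Tree bridge group address (multicast)
    "00:a0:24:12:34:56",  # unicast
    "00:a0:24:ab:cd:ef",  # unicast
]
```

Comparing this share during quiet hours with the share during a normal working day gives a simple picture of how much of the traffic is background noise.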

FDDI Connectivity

Use these sections to identify and correct connectivity errors on an FDDI ring:

• FDDI Connectivity Overview
• Monitoring FDDI Connections
• Making Your FDDI Connections More Resilient

Fiber Distributed Data Interface (FDDI), which is a self-correcting technology, automatically corrects ring faults to maintain connectivity throughout most of the network. However, you should monitor your FDDI connections for wrapped rings and other problems with ring connectivity.

Understanding the Problem

As shown in Figure 9, in a thru FDDI LAN, no stations on the trunk ring have a Configuration State (SMTConfigurationState) of Wrap or Isolated. However, users who complain about network performance may have lost connectivity to other stations on the network because the FDDI network is wrapped or segmented.

Figure 9 Thru Ring

Wrapped ring

By monitoring the Peer Wrap Condition, you can see when the Configuration State changes. In a wrapped ring (Figure 10), two stations on the LAN are in a wrapped Configuration State. This condition may or may not affect the connectivity of certain stations. Although operational, your network may have a cabling problem or a problem with a link.

Figure 10 Wrapped LAN

Segmented ring

In a segmented ring (Figure 11), more than two stations are wrapped on the trunk ring. Although this mode of operation is a valid FDDI LAN configuration, your LAN is probably experiencing a degraded or degrading condition.

Figure 11 Segmented Ring

When a network connection has excessively high link errors, Station Management (SMT) shuts down the connection and tries to reconnect. A dual-attachment trunk ring station with an A or B connection that is shut down is one of the wrap points in the network. See Making Your FDDI Connections More Resilient for information about keeping a dual-attachment station connection from wrapping.

Isolated station

Sometimes a network wraps a particular station out of the ring. Stations on either side of a problem station can be wrapped. This effectively isolates the station or links that have problems, as shown in Figure 12.

Figure 12 Wrapped Ring with Isolated Station

If a ring was already wrapped when a network wraps a station out of the ring, then a segmented ring results, as shown in Figure 13.

Figure 13 Segmented Ring with Isolated Stations

Twisted ring

In a twisted ring, an A port is connected to an A port and a B port is connected to a B port, instead of the normal A-to-B connections. A twisted ring, which always has two twist points (stations), can exist in either a Thru or Wrap state. You can monitor the Twisted Ring Condition and Undesired Connection Attempt Event for evidence of twisted ring and other connection problems.

Identifying the Problem

To identify the problem, follow this process:

1. At the FDDI LAN level, verify that your network is operating.

If the network is operating, the FDDI ring may be segmented, and therefore an FDDI station or an Ethernet station on an Ethernet link may have lost connectivity to other nodes on the network.

2. Determine if a ring is in a Thru, Wrap, or Segmented state.

If the FDDI ring is segmented or wrapped, look for a problem with a link somewhere in the network or for a nonfunctioning node on your trunk ring. If the ring is operating and is not segmented, or if it is segmented but you still have connectivity to the stations in question, move to a more specific level in your network.

See Monitoring FDDI Connections for more information.

3. Determine if the poorly performing station is an Ethernet or FDDI station.

If the problem is an FDDI station, find out if it is congested (that is, if the station is so busy that it cannot accept all the network traffic that is directed to it) by determining its Bandwidth Utilization. Also determine if the station has a high frame error rate by looking at the FDDI Ring Errors.

If the problem is an Ethernet station, look for congestion by examining Ethernet Packet Loss and Bandwidth Utilization.
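The ring-state check in step 2 can be sketched as follows, using the definitions given earlier: no wrapped or isolated stations in a thru ring, two wrapped stations in a wrapped ring, and more than two in a segmented ring. The station names and the input dictionary are hypothetical, and the "Indeterminate" result for any other combination is an assumption of this sketch, not something the management tools report.

```python
def classify_ring(config_states):
    """Classify a trunk ring from each station's Configuration State.

    config_states maps a station name to a state string such as
    "Thru", "Wrap", or "Isolated" (names taken from the text above).
    """
    states = [state.lower() for state in config_states.values()]
    wrapped = states.count("wrap")
    isolated = states.count("isolated")

    if wrapped == 0 and isolated == 0:
        return "Thru"
    if wrapped == 2:
        return "Wrapped"
    if wrapped > 2:
        return "Segmented"
    # Assumed fallback: for example, only one wrap point has reported so far.
    return "Indeterminate"

# Hypothetical trunk ring with a fault between stations B and C.
ring = {"A": "Thru", "B": "Wrap", "C": "Wrap", "D": "Thru"}
```

For the sample above, `classify_ring(ring)` reports a wrapped ring, which points you to the link or node between the two wrapped stations.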

Solving the Problem

Identify the station that is causing the disconnection and take the appropriate steps:

• If the disconnection is caused by a wrapped ring, then fix the hardware or cabling problem at that station.

• If the station is congested, you have a device problem rather than a network problem. For example, if the congested station is a file server and every other machine on the network is retrieving and saving files using that server, consider upgrading your server or adding additional servers to the network. A variety of devices from different vendors may be communicating on an FDDI or Ethernet network; some are faster and more capable, and some are slower and more prone to congestion.

• If the station is an Ethernet station that is attached to an Ethernet segment, reevaluate the setup of your Ethernet network and make some changes to improve its performance.

You can also make FDDI connections more resilient by implementing dual homing or installing an Optical Bypass Unit (OBU) where FDDI connections are prone to fail. See Making Your FDDI Connections More Resilient for more information.

Monitoring FDDI Connections

Monitor your FDDI devices for Warning or Critical alerts in the FDDI Status tool.

Status Watch

Use Status Watch to identify these FDDI connectivity errors:

• Peer Wrap Condition
• Twisted Ring Condition
• Undesired Connection Attempt Event

Follow these steps:

1. In the Device area, select the device that is located where you suspect an FDDI ring connectivity problem.

2. Monitor the FDDI Status tool for the currently selected device.

Here are some pointers for monitoring:

• If the Peer Wrap Configuration State variable is Isolated, the device is not connected to the FDDI trunk ring. If you intend the device to remain isolated, this indication is not a serious condition. However, if the device is supposed to be connected on a trunk ring, a serious problem may exist. The device is no longer transmitting packets to the larger trunk ring.

• If the Peer Wrap flag (SMTPeerWrapFlag) is set, the device is one of the wrap points. The cause of the wrapped ring is somewhere in the portion of the network between the two stations that report the peer wrap condition.

Making Your FDDI Connections More Resilient

When devices are removed from an FDDI ring, there is a break in the fiber path that causes the ring to wrap until the ring is made whole again. To prevent the break in the FDDI connection, you can implement dual homing or install an Optical Bypass Unit (OBU).

Implementing Dual Homing

When the operation of a dual attachment node is critical to your network, dual homing adds reliability by providing a backup connection if the primary link fails. Because a dual attachment station (DAS) has two attachments to the FDDI ring (A-to-M and B-to-M), you can use one of them as a standby link if the active link fails. Using dual homing, only one of the two attachments is active at a time. In this sense, a DAS acts as if it is a single attachment station (SAS) by using its A port as the standby link.

Through SMT, a DAS can be dual homed to the same concentrator or, more commonly, to two concentrators. This arrangement provides a more stable trunk ring of concentrators. If one concentrator fails, the DAS enables the standby link to another concentrator to become the active link. See Figure 14.

If the station is a dual path or dual path/dual MAC station, you can configure the dual-homed station in one of two ways:

• With both links active
• With one link active and one connection withheld as a backup, only becoming active when one link fails

Figure 14 Dual Homing Configuration

Installing an Optical Bypass Unit

You can insert an Optical Bypass Unit (OBU) into the FDDI ring as if it were a node and then plug your device into it. To use an OBU, your device needs an optical bypass interface. This interface lets the bypass know whether your device is still on the ring or not. See Figure 15.

If your device is removed or if it fails, the bypass unit diverts the optical path away from your device, keeping the ring whole. You can use a bypass on devices that are prone to failure or are likely to be removed often, such as diagnostic equipment.

Figure 15 Optical Bypass Unit Configuration

FDDI Connectivity Reference

This section explains terms that are relevant to FDDI connectivity and provides additional conceptual and problem analysis detail.

Peer Wrap Condition

A Peer Wrap (wrapped ring) condition occurs when a dual-attachment station detects a fault (often a lost connection) and reconfigures the network by wrapping the dual trunk rings to form a single ring. Normally, the two stations that are adjacent to the fault wrap to maintain full connectivity. However, if a second fault occurs before the first is repaired, the network partitions itself into two or more rings, and stations lose connectivity.

When a station reports a Peer Wrap condition, locate and repair the problem that caused the station to wrap the rings. Potential causes include:

• Faulty FDDI port hardware
• Faulty cables or connectors
• Unplugged connectors
• Powered-down stations

You can expect to find the cause of the problem between the two stations that report the Peer Wrap condition.

Twisted Ring Condition

A Twisted Ring condition occurs when certain undesirable connection types exist. See Table 7 for more information. Although similar to the Undesired Connection Attempt, the Twisted Ring condition provides specific Station Management (SMT) and port information for diagnosis.

Undesired Connection Attempt Event

An Undesired Connection Attempt event occurs when a port tries to connect to another port of a type that may result in an undesirable network topology. Whether the connection attempt is successful depends on the current setting of the station's connection policies.

Table 7 lists connections that the FDDI standard defines as undesirable. The managed devices may or may not permit these connections, depending on their FDDI station configurations.

Table 7 Undesirable Connection Types

Connection Type    Reason That the Connection Is Undesirable
A-A                Twisted primary and secondary rings
A-S                A wrapped ring
B-B                Twisted primary and secondary rings
B-S                A wrapped ring
S-A                A wrapped ring
S-B                A wrapped ring
M-M                A tree of rings topology (illegal connection)

Table 8 lists FDDI connections that create valid topologies.

Table 8 Valid Connection Types

Connection Type    Reason That the Connection Is Valid
A-B                A normal trunk peer connection
A-M                A tree connection with possible redundancy. In a single MAC node, Port B has precedence (by default) for connecting to a Port M.
B-A                A normal trunk ring peer connection
B-M                A tree connection with possible redundancy. In a single MAC node, Port B has precedence (by default) for connecting to a Port M.
S-S                A single ring of two slave stations
S-M                A normal tree connection
M-A                A tree connection that provides possible redundancy
M-B                A tree connection that provides possible redundancy
M-S                A normal tree connection
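Tables 7 and 8 together cover all sixteen pairings of the A, B, S, and M port types, so they can be expressed as a simple lookup. The Python sketch below is only an illustration: the function name and return format are hypothetical, and the reason strings are abbreviated from the tables. It does not model a station's connection policies, which decide whether an undesirable connection is actually rejected.

```python
# Reasons abbreviated from Table 7 (undesirable) and Table 8 (valid).
UNDESIRABLE = {
    ("A", "A"): "Twisted primary and secondary rings",
    ("A", "S"): "A wrapped ring",
    ("B", "B"): "Twisted primary and secondary rings",
    ("B", "S"): "A wrapped ring",
    ("S", "A"): "A wrapped ring",
    ("S", "B"): "A wrapped ring",
    ("M", "M"): "A tree of rings topology (illegal connection)",
}

VALID = {
    ("A", "B"): "A normal trunk peer connection",
    ("A", "M"): "A tree connection with possible redundancy",
    ("B", "A"): "A normal trunk ring peer connection",
    ("B", "M"): "A tree connection with possible redundancy",
    ("S", "S"): "A single ring of two slave stations",
    ("S", "M"): "A normal tree connection",
    ("M", "A"): "A tree connection that provides possible redundancy",
    ("M", "B"): "A tree connection that provides possible redundancy",
    ("M", "S"): "A normal tree connection",
}

def check_connection(local_port, remote_port):
    """Return ("valid" or "undesirable", reason) for a pair of FDDI port types."""
    key = (local_port.upper(), remote_port.upper())
    if key in VALID:
        return ("valid", VALID[key])
    if key in UNDESIRABLE:
        return ("undesirable", UNDESIRABLE[key])
    raise ValueError(f"unknown port types: {local_port}-{remote_port}")
```

For example, `check_connection("a", "a")` reports an undesirable connection with the reason "Twisted primary and secondary rings", which matches the twisted ring case described earlier.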

Disabling the Offending Interface

Because broadcast storms can ultimately cause your whole network to become unavailable, take action immediately to disable the offending interface. You can enable the interface again after you have corrected the problem.

Address Tracker

Use Address Tracker to locate the interface that is causing the broadcast storm. Use Device View to disable the port.

Follow these steps:

1. In the Find Address window, enter the address of the interface that seems to be receiving the broadcast traffic.

You can copy the MAC or IP address from the Status Watch report and paste it into Address Tracker's Enter the Address You Want to Find field.

2. Click Find Now.

Search displays the device name.

3. Use Transcend Central to launch Device View and disable the port.

Disabling the port stops the broadcast storm before it interferes with all vital network traffic. You can re-enable this interface using Device View or the device's console later.

Correcting Spanning Tree Misconfigurations

Spanning Tree does not cause broadcast storms, but a loop in your Spanning Tree topology can create data that looks like a storm. A loop can occur in your topology if:

• Someone disables Spanning Tree on a port
• You set up your Spanning Tree configuration incorrectly

Device View

Use Device View to disable any Spanning Tree port that has a repeater attached to it and to correct Spanning Tree misconfigurations.

To correct Spanning Tree misconfigurations, use Device View to disable Spanning Tree Protocol (STP) for a port on a SuperStack® II Switch 1000, Switch 3000, Switch 3000 10/100, Switch 9000SX, Desktop Switch, LinkBuilder® FMS II Bridge/Management Module, or CoreBuilder 6000.

To disable the STP port state for a port on a SuperStack II switch:

1. Select a port and click the right mouse button.

2. From the shortcut menu, select Configure.

3. In the Port section, click the STP tab.

4. From the STP Port State list box, select Disabled.

5. Click Apply.

To disable the STP port state for a port on a LinkBuilder FMS II Bridge/Management Module:

1. Double-click the module.

2. From the shortcut menu, select Configure Bridge.

3. In the Port section, click the STP tab.

Broadcast Storms Reference

This section explains terms that are relevant to broadcast storms and provides additional conceptual and problem analysis detail.

Broadcast Packets

Broadcast packets, which are a normal part of network operation, are transmitted by a device to a broadcast address. For example, IP networks use broadcasts to resolve network addresses using Address Resolution Protocol (ARP). IPX networks use a large number of broadcast packets to operate most effectively.

Problems arise when broadcast packets endlessly propagate throughout the network, which increases the traffic volume on your network and the CPU time that each host spends processing and discarding unwanted broadcast packets.

Multicast Packets

Multicast packets, which are a normal part of network operation, are transmitted by a device to a multicast group address. Hosts that want to receive the packets indicate that they want to be members of the multicast group, and then multicast packets are distributed to that group. For example, multicast packets support the Spanning Tree Protocol. Multicast applications and underlying multicast protocols control multimedia traffic and shield hosts from processing unnecessary broadcast traffic. However, multicast traffic can also cause storms that saturate your network.

Duplicate Addresses

Use these sections to identify and correct problems caused by duplicate MAC and IP addresses:

• Duplicate Addresses Overview
• Finding Duplicate MAC Addresses
• Finding Duplicate IP Addresses

See Duplicate Addresses Reference for additional conceptual and problem analysis detail.

Networks sometimes generate duplicate MAC and IP addresses. Because duplicate addresses can cause problems with packet delivery, resolve them as soon as possible.

Understanding the Problem

Duplicate MAC addresses are caused by data link layer problems with Fiber Distributed Data Interface (FDDI) media and the passing of tokens on the FDDI ring. Duplicate IP addresses are caused by network layer problems. See these sections for more information about causes of duplicate addresses:

• Duplicate MAC Addresses
• Duplicate IP Addresses

Identifying the Problem

Identify duplicate MAC and IP addresses by following the instructions in these sections:

• Finding Duplicate MAC Addresses
• Finding Duplicate IP Addresses

Solving the Problem

Identify the cause of the duplicate address (such as user error or a hardware problem) and fix the problem if possible.

Finding Duplicate MAC Addresses

To find out if duplicate MAC addresses are occurring, monitor your network using Status Watch.

Status Watch

The Status Watch FDDI Status tool identifies duplicate FDDI MAC addresses, and Status Watch reports when two or more MACs on the same ring have the same MAC address (a Duplicate Address condition).

Follow these steps:

1. In the Status Watch Summary View window, determine if any FDDI Status conditions are reported. If there are, double-click the table cell value to display the Device List window.

Another approach is to examine only the devices that you know reside on your FDDI ring. In the Status Watch main window, red device icons indicate that a threshold has been exceeded.

2. Select a device.

• If you selected the device from the Device List window, the real-time report for that device appears in the Status Watch main window.

• If you selected the device from the main window, also select the FDDI Status tool to view the real-time report.

3. Determine if a Duplicate Address condition caused the FDDI Status tool to trigger a Critical or Warning status for that device.

In Status Watch, you can specify the status severity level to apply to a Duplicate Address condition.

Finding Duplicate IP Addresses

To find out if duplicate IP addresses are occurring, monitor your network using these applications:

• Address Tracker - To find duplicate IP addresses on 3Com devices and their attached networks

• LANsentry® Manager - To find duplicate IP addresses that are collected by probes gathering RMON2 SmartAgent® data from the Enterprise Communications Analysis Module (ECAM) downloaded on your network devices

Address Tracker

Use Address Tracker to determine when and where duplicate IP addresses occur.

Follow these steps:

1. From the Find Address menu, select Find Duplicate IP Addresses.

2. Click Find Now to start your search.

LANsentry Manager

Use the Duplicates table in LANsentry Manager to compile a list of all stations with duplicate IP addresses. This table is available only on probes that have downloaded RMON2 (ECAM) SmartAgent software.

Follow these steps:

1. From the LANsentry Manager Address Map menu, select Duplicates. Address Map data is displayed as a table.

2. To export the contents of the table, click Export to launch the Data Export dialog box.

Duplicate Addresses Reference

This section explains terms that are relevant to duplicate addresses and provides additional conceptual and problem analysis detail.

Duplicate MAC Addresses

Each device on your network has a unique MAC address. This address identifies a single device on the network, allowing packets to be delivered to correct destinations.

Packets are delivered to their destinations by means of MAC-address-to-IP address translation that the Address Resolution Protocol (ARP) provides. Therefore, if MAC addresses are duplicated on the network, ARP caches of routing devices contain erroneous destinations. In FDDI, devices monitor network traffic, searching for their own MAC address in each packet to determine whether to decode the packet. If MAC addresses are not unique, two stations cannot be distinguished from each other.

Duplicate MAC addresses can occur for the following reasons:

• Someone has manually configured a MAC address for a device, instead of using the address that the vendor supplied or allowing it to be assigned dynamically, and this address is also assigned to a different device.

• In rare circumstances, loops in a bridged network can cause a MAC hardware problem or an address learning problem that creates a duplicate MAC address entry in the bridging address table.

• On DECnet Phase 4 networks, MAC addresses are set from the DECnet address. A duplicate NET address can cause a duplicate MAC address.

Duplicate IP Addresses

Because IP addresses are critical for transmission of packets on TCP/IP networks, resolve duplicate IP addresses immediately.

Duplicate IP addresses can occur when someone has configured an IP address that is identical to an IP address that is assigned to a different device. Address assignments, although possible for you to configure manually, are usually made using one of these protocols:

• Dynamic Host Configuration Protocol (DHCP) - Allows your network to dynamically assign IP addresses to nodes. With this protocol, a DHCP server temporarily assigns an IP address to a node, or you can statically configure addresses as needed.

• BOOTstrap protocol (BootP) - Allows you to statically assign IP addresses to nodes. This protocol is more efficient than RARP.

• Reverse ARP (RARP) - Allows you to statically assign IP addresses to nodes. However, because this protocol relies on the MAC address to identify the node, you cannot use it on networks that dynamically assign hardware addresses.
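To make the idea of a duplicate IP address concrete, the following Python sketch scans a list of observed IP-to-MAC pairings (for example, entries copied from ARP caches or a device inventory) and reports any IP address that appears with more than one MAC address. This is a generic illustration, not the method that Address Tracker or LANsentry Manager uses; the function name and the sample addresses are hypothetical.

```python
from collections import defaultdict

def find_duplicate_ips(observations):
    """Return {ip: [mac, ...]} for every IP address seen with more than one MAC.

    observations is an iterable of (ip_address, mac_address) string pairs.
    """
    macs_by_ip = defaultdict(set)
    for ip, mac in observations:
        macs_by_ip[ip].add(mac.lower())  # normalize case before comparing
    return {
        ip: sorted(macs)
        for ip, macs in macs_by_ip.items()
        if len(macs) > 1
    }

# Hypothetical observations; 10.0.0.5 is claimed by two different adapters.
seen = [
    ("10.0.0.5", "00:A0:24:12:34:56"),
    ("10.0.0.6", "00:a0:24:ab:cd:ef"),
    ("10.0.0.5", "00:a0:24:99:88:77"),
    ("10.0.0.6", "00:A0:24:AB:CD:EF"),
]
```

Swapping the roles of the two fields in the same function gives a check for one MAC address that appears with several IP addresses, which may or may not be a problem depending on how the device is configured.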

Ethernet Packet Loss

Use these sections to identify and correct Ethernet packet loss:

• Ethernet Packet Loss Overview
• Searching for Packet Loss

See Ethernet Packet Loss Reference for additional conceptual and problem analysis detail.

If your Ethernet network shows signs of congestion, it may be experiencing packet loss. When your network is congested, utilization is usually high, packets are discarded because buffers are full, and collision rates are up. Problems related to Collisions are often at the heart of packet loss.

Understanding the Problem

Collisions are normal in Ethernet networks. In many cases, Collision rates of 50 percent do not cause a large decrease in throughput. The Collision rate helps mark the upper limit on your network (the maximum percentage of collisions that your network can bear), which is usually around 70 percent. If Collisions increase above this upper limit, your network can become unreliable.

When the Collision rates increase, so do Excessive Collisions, which causes a delay in transmitting data. An increase in Collisions also indicates that network utilization and network errors, such as FCS Errors, are probably increasing.

The real packet problems to watch for, however, are undetected collisions that show up as Late Collisions.
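The thresholds mentioned above can be turned into a small check. The text does not define how the Collision rate is calculated, so this Python sketch assumes it is the number of collisions divided by the number of frames transmitted, expressed as a percentage; that formula, the function names, and the "watch" label for the band between 50 and 70 percent are assumptions made for illustration only.

```python
def collision_rate(collisions, frames_transmitted):
    """Collision rate as a percentage (assumed definition: collisions per frame sent)."""
    if frames_transmitted <= 0:
        raise ValueError("frames_transmitted must be positive")
    return 100.0 * collisions / frames_transmitted

def assess(rate_percent, upper_limit=70.0):
    """Map a collision rate onto the ranges described in the text.

    Rates up to about 50 percent often do not reduce throughput much;
    above the upper limit (usually around 70 percent) the network can
    become unreliable. The middle band is labeled "watch" as an assumption.
    """
    if rate_percent > upper_limit:
        return "unreliable"
    if rate_percent >= 50.0:
        return "watch"
    return "normal"
```

For instance, 350 collisions against 1000 transmitted frames gives `collision_rate(350, 1000)` of 35.0, which `assess` places in the normal range. Adjust `upper_limit` to the value that you observe for your own network.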

Identifying the Problem

To identify that your network's problem is related to packet loss, verify that frames are being dropped on your network by examining this packet loss data:

• Alignment Errors
• Collisions
• Excessive Collisions
• FCS Errors
• CRC Errors
• Late Collisions
• Receive Discards
• Too Long Errors
• Too Short Errors
• Transmit Discards

The process of identifying the problem is discussed in Searching for Packet Loss.

Solving the Problem

If you notice that packet loss data is consistently high, then your network is too congested. In this case, segment your network with the appropriate network device (such as a switch or router). If Collision data shows increases but your network's utilization is the same, then your network may have a physical problem, such as cabling that is too long. Other problems that packet loss data can indicate include:

• Faulty connectors or improper cabling
• Excessive numbers of repeaters between network devices
• Defective Ethernet transceivers or controllers

Possible solutions to these problems are explained in the procedures in Searching for Packet Loss.
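The reasoning in this section can be summarized as a small decision function. The Python sketch below is a simplified illustration: the three boolean inputs are hypothetical summaries that you would derive yourself from your monitoring data, and the returned strings paraphrase the guidance above rather than quoting any tool.

```python
def diagnose(loss_consistently_high, collisions_rising, utilization_rising):
    """Suggest a direction for troubleshooting from three observed trends."""
    if collisions_rising and not utilization_rising:
        # Collisions climb while utilization stays flat: suspect the physical layer.
        return "possible physical problem (for example, cabling that is too long)"
    if loss_consistently_high:
        # Sustained packet loss points to congestion.
        return "congestion: segment the network with a switch or router"
    return "no action indicated by these trends"
```

The physical-layer check is placed first because rising collisions without rising utilization cannot be explained by load alone, whereas consistently high loss is the expected signature of an overloaded segment.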

Searching for Packet Loss

When you look for packet loss, use the following applications:

• Status Watch - For Ethernet and MIB-II data collection using SNMP polling
• LANsentry Manager Network Statistics Graph - For RMON data collection using an RMON probe
• Device View - On a per-device basis, you can evaluate statistics for any port on the device

Status Watch

Status Watch monitors:

• Alignment Errors
• Excessive Collisions
• FCS Errors
• Receive Discards
• Transmit Discards

Follow these steps:

1. Determine if the thresholds for the Alignment Errors tool and FCS Errors tool are being exceeded.

Table 16 identifies the problems that this data can indicate and your possible actions. For information about problems related to a nonstandard Ethernet implementation, see Nonstandard Ethernet Problems.

Table 16 Alignment Errors, FCS Errors, and CRC Errors Data

Possible Problem: Faulty cabling
Possible Action: Examine the cable and cable connections for breaks or damage.

Possible Problem: Network noise
Possible Action: Look for improper cabling, faulty cable, faulty network equipment, or cables that are too close to equipment that emits electromagnetic interference (lamps, for example).

Possible Problem: Faulty transceiver
Possible Action: Use an analyzer to identify the problematic transceiver. If necessary, replace the transceiver, network adapter, or station.

Possible Problem: Fault at the transmitting end station
Possible Action:
1. Locate the source of the errors by looking at the module and port statistics.
2. Verify the correct operation of the transceiver or adapter card of the device that is connected to the problem port.
3. If the card appears to be operating correctly, examine the cable and cable connections for breaks or damage.

Possible Problem: Station powering up or down
Possible Action: None required. Early implementations of Ethernet transceivers generate a significant amount of in-band noise when powering up; they frequently cause Alignment Errors and FCS Errors in an otherwise stable network. When powering up, some software drivers for Ethernet controllers also initiate Time Domain Reflectometry (TDR) tests to test the Ethernet media. Network monitors report TDR tests as Alignment Errors and FCS Errors.

Possible Problem: Faulty adapter
Possible Action: Replace the adapter.

2. Determine if the Excessive Collisions tool threshold is being exceeded. Table 17 identifies the problems that this data can indicate and your possible actions.

Table 17. Collisions and Excessive Collisions Data

Possible Problem: Busy network
Possible Action: Use a bridge, router, or switch to reconfigure your network into segments with fewer stations.

Possible Problem: Faulty device (adapter, switch, hub, and the like) that does not listen before broadcasting. This problem increases the incidence of all types of collisions.
Possible Action: Isolate each adapter to see if the problem stops.

Possible Problem: Network loop
Possible Action: Ensure that no redundant connections to the same station have both connections active simultaneously.

3. Determine if the thresholds for the Receive Discards and Transmit Discards tools are being exceeded.

If these errors are high in conjunction with the data that you learned in steps 1 and 2, then your network is overloaded. Segment your network.

LANsentry Manager Network Statistics Graph

Use the LANsentry Manager Network Statistics graph to view data for:

• Collisions
• Late Collisions
• Bandwidth Utilization
• CRC Errors
• Too Long Errors
• Too Short Errors

Follow these steps:

1. Display a Network Statistics graph for the local Ethernet segment on which users have reported poor performance.

This graph shows the most recent trend in Collision rates. If you have set up a History sample, you can also look at the historical trend. If a number of segments are connected by repeaters, examine the graph for each Ethernet segment.

2. Analyze Utilization and Collision rates to determine whether collisions are caused by an overloaded segment or a faulty component.

• If Utilization rates are high - The collisions are probably caused by an overloaded segment. If you have added nodes or new applications to your network, consider reconfiguring the cabling system, using bridges and routers to filter out remote collisions and to keep local traffic on one segment. This action should level the network load.

• If Utilization rates are stable and appear normal - The collisions are probably caused by faulty components. In this case, do the following:

  • If the network consists of repeaters - Compare the Network Statistics graphs for each segment connected to the repeater. Because repeaters repeat traffic across all connected segments (which makes many segments seem like one network), you should see similar levels of traffic on all segments. One segment that shows dissimilar levels of traffic and collisions may indicate faulty hardware. In this case, monitor several collisions to track the source station that is transmitting too soon after collisions, and repair the station. Packets that are transmitted too soon after collisions are unlikely to be valid. See Table 17 for more information about Collisions.

  • On other networks - Determine the segment cable length.

3. Examine the CRC Errors and Late Collisions, which often indicate cabling or component problems.

Table 16 identifies the problems that CRC Errors can indicate and your possible actions. Table 18 identifies the problems that Late Collisions data can indicate and your possible actions.

Table 18. Late Collisions Data

Possible Problem: Cabling problems
• Segment too long
• Failing cable
• Segment not grounded properly (noise)
• Improper termination
• Taps too close (10BASE-5 and 10BASE-2 only)
• Noisy cable
Possible Action: Correct the cabling problem by doing one or more of the following:
• Reduce the segment length
• Replace the cable
• Ground the cable
• Terminate the cable correctly
• Check the taps
• Check for cables too close to equipment that emits electromagnetic interference

Possible Problem: Component problems
• Deaf or partially deaf node
• Failing repeater, transceiver, or controller cards
Possible Action: Correct the component problem by doing one of the following:
• Trace the failing component and replace it
• Replace the NIC or the transceiver

4. Trace Too Short Errors and Too Long Errors to the sender.

These errors often indicate faulty routers or LAN drivers and transceiver problems. Table 19 identifies the problems that this data can indicate and your possible actions.

Possible Problem: A transceiver on your network is adding bits to the packets that are transmitted by the attached station.
Possible Action:
1. Use a network analyzer to identify the problematic transceiver.
2. If necessary, replace the transceiver, network adapter, or station.

Possible Problem: The jabber protection mechanism on a transceiver has failed; it can no longer protect the network from the jabbering produced by the attached station.
Possible Action: Replace the network card.

Possible Problem: Excessive noise on the cable.
Note: Some 10/100 Mbps cards that autodetect the network speed may connect to the network at the wrong speed, causing excessive noise.
Possible Action: Check for improper cabling, faulty cable, faulty network equipment, or cables too close to noisy electronic equipment (lamps, for example).
If your network card autodetects the network speed and you have ruled out other problems, manually configure the network speed.

Possible Problem: Faulty routers (two different network types are connected and the router is not enforcing proper frame size restrictions).
Possible Action: Notify the manufacturer.

Possible Problem: Faulty LAN driver
Possible Action: Replace the driver.

Possible Problem: A normal condition on a LinkSwitch® 1000, LinkSwitch® 3000, or CoreBuilder 5000 FastModule.
If you use maximum-sized (1518-byte) Ethernet frames, the device's VLT-enabled ports add a frame tag of 4 bytes, resulting in a misleading Too Long Frame error.
These frames are passed successfully but will create the Too Long Frame error message.
Possible Action: If you want to eliminate the error message, reduce your Ethernet packet frames by 4 bytes.

Device View

Device View allows you to display a variety of port and device-level statistics relevant to Ethernet packet loss. Table 20 describes these statistics and their use in troubleshooting.

Table 20. Activity and Error Statistics in Device View

Statistics Group: Activity
Description: Displays the total network activity and errors on the selected port.
Use in Troubleshooting: This data shows readable packets, broadcast packets, Collisions, total errors, and runts, which cause Too Short Errors. You can interpret this data in the following ways:

• The presence of runts can often be caused by Collisions; however, if the values increase at specific times of the day, it may indicate you need to change the network topology to manage the traffic more efficiently (for example, with switches or routers).

• Runts can also be caused by a badly terminated coax cable.

• Large numbers of runts not associated with high levels of collisions can indicate a transmission problem (examine the cable).

• Particularly high numbers of Collisions compared to the total number of readable packets can point to a hardware problem (a bad adapter) or to a data loop.

• A high proportion of Broadcast Packets (>10%) on a heavily utilized network (>50% of available bandwidth) can point to an incorrectly configured bridge or router on the network.

Statistics Group: Errors
Description: Displays the number of frames with errors on the selected port.
Use in Troubleshooting: The significance of errors depends on accompanying errors and prevailing network conditions. See the following error data for more information:

• Alignment Errors - Table 16
• FCS Errors - Table 16
• Too Long Errors - Table 19
• Too Short Errors or runts - Table 19
• Late Collisions - Table 18

To display Activity and Errors statistics for a device or port, follow these steps:

1. Select the required port or device.

2. From the shortcut menu, select Activity or Errors.

The statistics available depend on the type of port or device selected. See Table 20 for troubleshooting information.

Ethernet Packet Loss Reference

This section explains terms that are relevant to Ethernet packet loss and provides additional conceptual and problem analysis detail.

Alignment Errors

An Alignment Error indicates a received frame in which both of the following are true:

• The number of bits received is an uneven byte count (that is, not an integral multiple of 8).

• The frame has a Frame Check Sequence (FCS) error.

Alignment Errors often result from MAC layer packet formation problems, cabling problems that cause corrupted or lost data, and packets that pass through more than two cascaded multiport transceivers. See FCS Errors for more information about interpreting Alignment Errors.

Collisions

Collisions indicate that two devices detect that the network is idle and try to send packets at exactly the same time (within one round-trip delay). Because only one device can transmit at a time, both devices must stop sending and attempt to retransmit. Collisions are detected by the transmitting stations.

The retransmission algorithm helps to ensure that the packets do not retransmit at the same time. However, if the two devices retry at nearly the same time, packets can collide again; the process repeats until either the packets finally pass onto the network without collisions, or 16 consecutive collisions occur and the packets are discarded.
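The text does not name the retransmission algorithm. On standard Ethernet it is the truncated binary exponential backoff of IEEE 802.3, which the following Python sketch illustrates. The function and constant names are illustrative, and the delay is expressed in slot times rather than seconds.

```python
import random

MAX_ATTEMPTS = 16   # the frame is discarded after 16 consecutive collisions
BACKOFF_LIMIT = 10  # the random range stops doubling after the 10th collision


def backoff_slots(collision_count, rng=random):
    """Return how many slot times to wait after the nth consecutive collision.

    Returns None when the 16th collision occurs, meaning the frame is
    dropped and counted as an excessive collision.
    """
    if not 1 <= collision_count <= MAX_ATTEMPTS:
        raise ValueError("collision_count must be between 1 and 16")
    if collision_count == MAX_ATTEMPTS:
        return None
    k = min(collision_count, BACKOFF_LIMIT)
    # Pick a uniformly random integer in the range 0 .. 2**k - 1.
    return rng.randrange(2 ** k)
```

Because each station draws its own random delay, two stations that collided are unlikely to retry in the same slot, and the widening range makes a repeat collision less likely on each attempt.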

CRC Errors

A Cyclic Redundancy Check (CRC) Error is an RMON statistic that combines FCS Errors and Alignment Errors. These errors indicate that packets were received with:

• A bad FCS and an integral number of octets (FCS Errors)
• A bad FCS and a non-integral number of octets (Alignment Errors)

CRC Errors can cause an end station to freeze. If a large number of CRC Errors are attributed to a single station on the network, replace the station's network interface board. Typically, a CRC Error rate of more than 1 percent of network traffic is considered excessive.

Excessive Collisions

Excessive Collisions indicate that 16 consecutive collisions have occurred, usually a sign that the network is becoming congested. For each excessive collision count (that is, after 16 consecutive collisions), a packet is dropped. If you know the normal rate of excessive collisions, then you can determine when the rate of packet loss is affecting your network's performance. See Knowing Your Network's Configuration for more information.

FCS Errors

Frame Check Sequence (FCS) Errors, a type of CRC error, indicate that frames received by an interface are an integral number of octets long but do not pass the FCS check. The FCS is a mathematical way to ensure that all the frame's bits are correct without having the system examine each bit and compare it to the original. Packets with Alignment Errors also generate FCS Errors.

Both Alignment Errors and FCS Errors can be caused by equipment powering up or down or by interference (noise) on unshielded twisted-pair (10BASE-T) segments. In a network that complies with the Ethernet standard, FCS or Alignment Errors indicate bit errors during a transmission or reception. A very low rate is acceptable. Although Ethernet allows a 1 in 10^8 bit error rate, typical Ethernet performance is 1 in 10^12 or better.
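As a rough illustration of how an FCS check works, the following Python sketch appends a CRC-32 to a frame and verifies it on receipt. It assumes that `zlib.crc32` matches the Ethernet CRC-32 and that the four FCS octets are stored least significant byte first, as they commonly appear in software captures. The helper names are hypothetical.

```python
import struct
import zlib


def append_fcs(frame_without_fcs: bytes) -> bytes:
    """Return the frame with a 4-octet CRC-32 trailer appended."""
    fcs = zlib.crc32(frame_without_fcs) & 0xFFFFFFFF
    return frame_without_fcs + struct.pack("<I", fcs)


def fcs_ok(frame_with_fcs: bytes) -> bool:
    """Recompute the CRC-32 over the body and compare it with the trailer."""
    if len(frame_with_fcs) < 4:
        return False
    body, trailer = frame_with_fcs[:-4], frame_with_fcs[-4:]
    return struct.pack("<I", zlib.crc32(body) & 0xFFFFFFFF) == trailer
```

A receiver that gets a frame with even a single flipped bit computes a different CRC-32 than the one in the trailer, so `fcs_ok` returns False and the frame would be counted as an FCS Error.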

Late Collisions

Late Collisions indicate that two devices have transmitted at the same time, but cabling errors (most commonly, excessive network segment length or repeaters between devices) prevent either transmitting device from detecting a collision. Neither device detects a collision because the time to propagate the signal from one end of the network to the other is longer than the time to put the entire packet on the network. As a result, neither of the devices that cause the late collision senses the other's transmission until the entire packet is on the network.

Although late collisions occur for small packets, the transmitter cannot detect them. As a result, a network suffering measurable Late Collisions for large packets is losing small packets as well.
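The timing argument can be made concrete with a small calculation. The sketch below compares the signal round-trip time on a segment with the time needed to transmit a frame. A transmitter can only detect a collision while it is still sending, so the round trip is the relevant delay. The propagation speed of 2.0e8 m/s and the default 10 Mbps bit rate are assumptions, and the sketch ignores repeater and transceiver delays, which matter a great deal on real networks.

```python
def round_trip_seconds(segment_m, propagation_m_per_s=2.0e8):
    """Time for a signal to reach the far end and come back (assumed speed)."""
    return 2 * segment_m / propagation_m_per_s


def frame_time_seconds(frame_octets, bit_rate=10_000_000):
    """Time needed to put the whole frame on the wire."""
    return frame_octets * 8 / bit_rate


def collision_can_go_undetected(segment_m, frame_octets=64,
                                bit_rate=10_000_000,
                                propagation_m_per_s=2.0e8):
    """True if the sender may finish before a collision signal returns."""
    return (round_trip_seconds(segment_m, propagation_m_per_s)
            > frame_time_seconds(frame_octets, bit_rate))
```

Under these assumptions a 64-octet frame takes 51.2 microseconds to send at 10 Mbps, so small frames are the first to be affected as the path between two stations grows.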

Nonstandard Ethernet Problems

Table 21 lists the symptoms that typically occur if a system violates the Ethernet standard.

Table 21. Symptoms of Common Ethernet Network Problems

Symptoms: FCS Errors and Alignment Errors increase significantly.
Problem: Network cabling is too long.
Notes: If you use a promiscuous network monitor, the number of Late Collisions reported by stations should correlate with the FCS and Alignment Errors reported by the monitor.

Symptoms: FCS and Alignment Errors increase proportionally with interference (sometimes referred to as noise hits).
Problem: Network segment is noisy.
Notes: Typically observed on a 10BASE-T network segment in a noisy environment. If you use multiple promiscuous monitors, the FCS and Alignment Errors among the monitors will not correlate.
If the monitor can track runts, also called Too Short Errors, the number of runt packets should be significantly higher than normal.

Symptoms: FCS and Alignment Errors are much higher than normal.
Problem: Networks do not conform to the access scheme of Carrier Sense Multiple Access with Collision Detect (CSMA/CD).
Notes: Occurs when some implementations of Ethernet in the segment are not entirely compatible with IEEE 802.3 repeaters.
Collision fragments linger on the network long enough to collide with retry packets at the minimum interpacket gap (IPG). The IPG is smaller on one side of the repeated network, causing a lost packet.
Ethernet controllers cannot receive packets that are separated by 4.7 µs or less. Some controllers cannot sustain receptions of packets separated by as much as 9.6 µs. If runt packets are received one after another and are followed by a collision fragment, Ethernet controllers that cannot sustain reception will lose packets.

Receive Discards

Receive Discards indicate that received packets could not be delivered to a higher-layer protocol because of congestion or packet errors.

Too Long Errors

A Too Long Error indicates that a packet is longer than 1518 octets (including FCS octets) but otherwise well formed. Too Long Errors are often caused by a bad transceiver, a malfunction of the jabber protection mechanism on a transceiver, or excessive noise on the cable.

Too Short Errors

A Too Short Error, also called a runt, indicates that a packet is fewer than 64 octets long (including FCS octets) but otherwise well formed.
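The error definitions in this reference can be summarized as a small classifier. This Python sketch is an illustration only: the function name and return labels are assumptions, and frames that have both a bad FCS and an illegal length (which some monitors count separately) are simply reported by their FCS or alignment status.

```python
MIN_OCTETS = 64    # minimum legal frame size, including FCS octets
MAX_OCTETS = 1518  # maximum legal frame size, including FCS octets


def classify_frame(bit_count, fcs_ok):
    """Label a received frame using the definitions in this section."""
    whole_octets = bit_count % 8 == 0
    if not fcs_ok:
        # Bad FCS: integral octet count is an FCS Error, otherwise Alignment.
        return "FCS Error" if whole_octets else "Alignment Error"
    octets = bit_count // 8
    if octets > MAX_OCTETS:
        return "Too Long Error"
    if octets < MIN_OCTETS:
        return "Too Short Error"
    return "OK"


def is_crc_error(label):
    """RMON's CRC Errors statistic combines FCS and Alignment Errors."""
    return label in ("FCS Error", "Alignment Error")
```

For instance, a frame of 1519 octets with a good FCS is labeled a Too Long Error, while a frame of 100 octets plus three stray bits with a bad FCS is labeled an Alignment Error.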

Transmit Discards

Transmit Discards indicate that packets were not transmitted because of network congestion.

FDDI Ring Errors

Use these sections to identify and correct FDDI ring errors:

• FDDI Ring Errors Overview
• Identifying Ring Errors

Fiber Distributed Data Interface (FDDI) often corrects its own problems. However, because FDDI cannot correct all errors (especially those related to hardware problems), you should monitor FDDI errors.

Understanding the Problem

FDDI ring errors that you should monitor include:

• Elasticity Buffer Error Condition
• Frame Error Condition
• Frames Not Copied Condition
• Link Error Condition

Identifying the Problem

First, determine the type of FDDI ring errors and where they are occurring. Similar to the way you identify other FDDI problems, identify the upstream and downstream neighbors of the devices that you are monitoring.

Several types of network errors can cause FDDI performance problems. For example, problems with cables or physical connections may result in a link or frame error. Elasticity buffer (EB) errors can also lead to link and frame errors.

FDDI deals with port-related errors as follows:

• The variable PORTlerAlarm is the link error rate (LER) value at which a link connection generates an alarm. When the LER is greater than the alarm setting, Station Management (SMT) sends a Status Report Frame (SRF) to notify you that there is a problem with a port.

The PORTlerAlarm threshold is set lower than the PORTlerCutoff threshold so that you are notified of a problem before the port is actually removed from the ring.

• When link errors reach the threshold defined by the variable PORTlerCutoff, SMT breaks the connection, disabling the PHY that detected the problem. A Link Error Condition is also generated.

FDDI deals with MAC-related errors as follows:

• When MAC frame errors reach a certain threshold, a Frame Error Condition is generated. Because the actual error can be further upstream than the immediate connection, the connection remains intact.

• For a large network, the worst case MACFrameErrorRatio is less than 0.1 percent. However, during network configuration, frame error ratios can reach 50 percent for short periods. When you detect a sustained frame error ratio of more than 0.1 percent, a problem exists between the station that is reporting the condition and the nearest upstream MAC.
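The two-threshold handling of port errors described above can be sketched in Python. This is a simplified model, not the SMT implementation: it assumes the link error rate and both thresholds are given as plain error ratios (real FDDI management variables may encode them differently), and the function name and return labels are hypothetical.

```python
def port_action(ler, ler_alarm, ler_cutoff):
    """Decide what happens to a port for a measured link error rate.

    ler, ler_alarm, ler_cutoff: error ratios (assumed plain floats).
    The alarm threshold must be lower than the cutoff threshold so that
    a warning is raised before the port is removed from the ring.
    """
    if ler_alarm >= ler_cutoff:
        raise ValueError("alarm threshold must be lower than cutoff threshold")
    if ler >= ler_cutoff:
        return "cutoff"  # connection is broken; Link Error Condition generated
    if ler > ler_alarm:
        return "alarm"   # a Status Report Frame reports the problem port
    return "ok"
```

With an alarm threshold of 1e-8 and a cutoff threshold of 1e-7, a measured rate of 5e-8 raises the alarm but leaves the port on the ring.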

Solving the Problem

To solve problems related to FDDI errors, fix the hardware, cabling, or congestion problem.

Identifying Ring Errors

Use Status Watch to monitor your FDDI devices for Warning or Critical alerts.

Status Watch

Use Status Watch to identify FDDI ring errors.

Follow these steps:

1. Monitor the FDDI Status tool for the currently selected device.

2. Determine whether Status Watch is reporting Elasticity Buffer Errors or a high percentage of Frame Errors, Frames Not Copied, or Link Error Rates for the currently selected device.

FDDI Ring Errors Reference

This section provides additional conceptual and problem analysis detail.

Elasticity Buffer Error Condition

The Elasticity Buffer Error condition occurs when a port's elasticity buffer overflows or underflows. This condition usually indicates that a port's hardware is not operating within the tolerances that the FDDI standard specifies. Look for the problem in the hardware of either the port that is reporting the condition or of the immediately adjacent port.

Frame Error Condition

The Frame Error condition occurs when the percentage of frames that contain errors exceeds a preset threshold. When a device is an uplink to FDDI (that is, a device is transmitting onto FDDI), this type of condition indicates that the ring is saturated. The ring is out of buffer space, and packets are being dropped from the device's backbone port.

The problem indicated by the frame errors is usually located between the MAC that reports the condition and its upstream neighbor. Because many physical connections can lie along this path, the MACFrameErrorRatio variable identifies only the two MACs between which the problem is occurring.

Frames Not Copied Condition

The Frames Not Copied condition occurs when the percentage of frames that are dropped because of insufficient buffer space exceeds a preset threshold. This condition indicates that the station is congested and is unable to process frames as quickly as they arrive. To help eliminate congestion:

• Add more capacity to the station
• Reconfigure your network so that end stations that communicate heavily with one another are on the same bridge or switch
• Filter out certain traffic

Link Error Condition

The Link Error condition occurs when a port detects link errors at a rate that exceeds a preset threshold. When the Link Error threshold is exceeded, the station removes itself from the ring and tries to reinsert itself on the ring. This action creates a MAC Neighbor Change Event (which also occurs if a ring wraps).

Link errors may indicate an FDDI PHY hardware problem (such as a faulty transmitter) or a faulty cable or connector. Look for the problem in the portion of the network between the port that is reporting the condition and the first upstream transmitter.

MAC Neighbor Change Event

The MAC Neighbor Change event occurs when a MAC's upstream or downstream neighbor changes.

This event indicates either:

• A network reconfiguration
• Another station that is leaving or joining the ring

Network File Server Timeouts

Use these sections to identify and correct timeouts on network file servers:

• Network File Server Timeout Overview
• Looking for Obvious Errors
• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

See Network File Server Timeouts Reference for additional conceptual and problem analysis detail.

A network file server can time out if your network gets congested or if your server is having problems. Users might have problems downloading data from or to the server or copying files from or to the server. To help you to understand the troubleshooting process for this type of problem, an EXAMPLE throughout this section follows the symptoms, analysis, and resolution of a typical file server timeout problem.

Understanding the Problem

When users log in, their stations make network file server calls either to determine quotas (if this feature has been enabled) or to mount user home directories. The network file server timeout messages, even when spread across multiple nodes, indicate a problem either with the network or with a server.

EXAMPLE: UNIX users notice that it takes a long time - over 30 seconds in some cases - to log in to any machine. Some machines report network file server timeout messages, but the messages have no obvious pattern and are infrequent. You begin to get a sense of the problem.

Identifying the Problem

First, rule out the obvious causes. Ask these questions:

• Can you access the network file server with Telnet?
• Have any alarms been triggered?
• Are there any new errors?

The process of identifying the problem is developed in Looking for Obvious Errors.

Solving the Problem

To determine the cause, reproduce the fault while you monitor the network. After you know the cause, you can fix the problem.

The solutions to the network file server timeout are identified in these sections:

• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

Looking for Obvious Errors

To look for obvious errors, use these applications:

• Ping and Telnet - To check connectivity to the network file server nodes
• LANsentry Manager Alarms View - To search for triggered alarms
• LANsentry Manager Statistics View - To look for errors
• LANsentry Manager History View - To identify trends

Ping and Telnet

Determine whether you can contact network file server nodes using Ping and Telnet. If the response is extremely slow, then a problem may exist with the connections to the nodes. No delay indicates that the connections are normal, implying that the delay is occurring elsewhere. In this case, use LANsentry® Manager tools to determine whether packets are being lost or ignored.
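A connectivity check of this kind can also be scripted. The Python sketch below times a TCP connection to a server port as a rough stand-in for a Telnet test (a true Ping uses ICMP, which usually needs elevated privileges). The function name, the default Telnet port 23, and the 5 second timeout are assumptions chosen for illustration.

```python
import socket
import time


def tcp_connect_time(host, port=23, timeout=5.0):
    """Return the seconds taken to open a TCP connection, or None on failure."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return None
```

A result of a few milliseconds suggests that the path to the node is healthy, while a result close to the timeout, or None, suggests that the connection itself deserves a closer look.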

LANsentry Manager Alarms View

Using the LANsentry Manager Alarms View, you can determine if any configured alarms have been triggered.

Search the Alarms View to see if any MAC events have been logged.

EXAMPLE: MAC events have not been logged for the network to which the UNIX users are attached.

Even though no alarms have occurred, errors may exist. For example, a low rate of background errors may exist just below the alarm threshold. Because RMON alarms are based on maximum and minimum values, they may miss constant, periodic, or low amounts of errors.

LANsentry Manager Statistics View

Using the LANsentry Manager Statistics View, you can display a multisegment graph of utilization and error statistics.

Follow these steps:

1. Set up a graph that shows utilization and errors on all your major segments.

2. Determine whether any segments are particularly busy or error prone.

EXAMPLE: You notice that one segment of the UNIX network, HUB3, is reporting Too Long Errors and FCS Errors roughly every second sample. While the amount is not higher than normal, it is currently higher than on any other segment.

LANsentry Manager History View

Using the LANsentry Manager History View, you can display a rolling history table to determine if the errors that you are seeing are new. For example, if you have a history table that runs for 30-minute samples over two days, you can compare the most recent sample to a previous sample, looking for new errors. If your probe has the resources, use a much finer resolution sample stored for a shorter time (every 30 seconds for 2 hours) to more easily spot recent errors.

EXAMPLE: You see that the history table shows that no error rates remained constant throughout the day. However, errors that did occur were on the device HUB3 and were Too Long Errors and FCS Errors.

Reproducing the Fault While Monitoring the Network

Although the RMON View in LANsentry Manager can show error rates and help you to identify the location of the problem, it may not provide enough data to solve the problem. To determine the cause of the problem, reproduce it while you monitor the network by using these applications:

• LANsentry Manager Top-N Graph - To locate a quiet node to use for reproducing the fault

• LANsentry Manager Packet Capture - To capture packets from the hub to which the quiet node is attached

• LANsentry Manager Packet Decode - To analyze the packets to assess network file server traffic and delays

• Address Tracker - To find the location of the problem nodes

EXAMPLE: Using LANsentry Manager, you find a hub on the network with a higher than normal error rate. However, the error rate does not seem high enough to cause login delays of 60 seconds or more.

LANsentry Manager Top-N Graph

Using the Top-N graph in the LANsentry Manager main window, locate a quiet node that has been showing the same problem. Choose a quiet node so that you do not receive excessive traffic when you try to isolate the problem.

EXAMPLE: You see that the node Monolith, which has the same Network File System (NFS) mounts as the other nodes on the network, is quiet. You decide to use this node for reproducing the fault. See Network File System (NFS) Protocol for more information about NFS.

LANsentry Manager Packet Capture

Using the LANsentry Manager Packet Capture application, capture packets from the network using predefined patterns and start-and-stop conditions.

Follow these steps:

1. Set up a capture buffer on a probe that is connected to the same hub as the quiet node. Until you know more about the problem, set a very general filter.

EXAMPLE: You select a MAC-layer filter and set a conversation filter to capture all packets to and from Monolith.

2. Telnet into and log out from the quiet node. Then reset the capture buffer. Repeat this procedure until you see the problem reflected in your captured data. To keep the buffer information clean, reset the buffer each time that you repeat the procedure.

3. When you see the delay, note the rough value of the packet count on the LANsentry Manager packet buffer.

By noting the packet count at which you think the delay has occurred, you can narrow the problem to within about 20 packets in the buffer. If you have used an extremely quiet node, you may even identify the exact packet.

LANsentry Manager Packet Decode

The LANsentry Manager Packet Decode application decodes all major protocols and displays the packet contents at three levels of detail: summary information, header information, and actual packet content.

Follow these steps:

1. Open the buffer in the Packet Decode application and locate the number of the packet at which the delay occurred.

2. Select the packet and launch a MAC-layer conversation filter. In the filter display, look for a gap in the conversation (that is, where the node sent a request and then resent it at approximately the same rate as the delay you experienced when recreating the problem).

3. Repeat the test to determine if the result concentrates on one node or if it appears on other nodes.

EXAMPLE: On the quiet node that you selected, the delay is obvious. You see an NFS request going out to a node and a repeat of the request 30 seconds later. During that time, the node did not respond. You now know that the delay occurred because nodes were not seeing responses for NFS requests. When you repeat the test on other nodes, you find that the delay is happening with more than one destination node.
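Looking for a gap in a conversation amounts to finding a request that was resent, without an intervening reply, after a long pause. The Python sketch below shows the idea on a simplified packet list. The tuple format `(timestamp, kind, request_id)`, the function name, and the 5 second minimum gap are all hypothetical; a real capture would first have to be decoded into this form.

```python
def find_retry_gaps(packets, min_gap=5.0):
    """Find requests that were resent after an unanswered pause.

    packets: iterable of (timestamp_seconds, kind, request_id), where kind
    is "request" or "reply". Returns (request_id, first_sent, resent_at)
    tuples for every unanswered request resent at least min_gap later.
    """
    last_request = {}  # request_id -> time the request was last sent
    answered = set()   # request_ids that have received a reply
    gaps = []
    for timestamp, kind, request_id in sorted(packets, key=lambda p: p[0]):
        if kind == "reply":
            answered.add(request_id)
        elif kind == "request":
            previous = last_request.get(request_id)
            if previous is not None and request_id not in answered:
                if timestamp - previous >= min_gap:
                    gaps.append((request_id, previous, timestamp))
            last_request[request_id] = timestamp
    return gaps
```

Applied to a trace like the one in the EXAMPLE, a request sent at 1.0 seconds and resent at 31.0 seconds with no reply in between would be reported as a 30 second gap.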


Address Tracker

Use Address Tracker, which polls managed devices, to determine the hubs to which the problem nodes are attached. If the problem end stations are located on unmanaged devices, then you can at least narrow the problem to those unmanaged devices.

EXAMPLE: Although your network does not have managed hubs that Transcend® NCS management software can poll, it does have managed switches. When it polls the switches, Address Tracker displays the switch ports on which addresses were last seen. This information indicates the hub (but not the hub port) on which the device is located.

LANsentry Manager Packet Decode

After you know the location of the hub that has the problem node, monitor the problem from the hub using LANsentry Manager Packet Decode.

Follow these steps:

1. To capture packets from one of the nodes on the hub, set up another capture buffer and repeat the exercise that is described in LANsentry Manager Packet Capture. Because a delay may occur on a different node, use two capture buffers without stopping the first one.

Note the rough packet count where the delay appears.

2. Display a conversation filter of the packet where the delay appears and look for the gap in the conversation.

EXAMPLE: You hope that the nodes are on the same hub. You find that all the nodes are on HUB3. This result indicates that FCS Errors may be causing the timeouts. However, because the errors occur at a low rate, you decide to verify this diagnosis. You monitor the problem from the hub, logging in and out many times, and the delay eventually occurs. This time, the delay shows that the node's reply had an FCS Error even though the node received the request. The switch would not have transmitted this packet, causing a timeout on the NFS protocol. The retry time is presumably 30 seconds. During this test, you see the problem occurring on another node.

Correcting the Fault

Without a managed hub, you may find it very difficult to discover network file server timeout errors. To find the problematic node, you must either systematically isolate nodes by monitoring each node for a prolonged period or temporarily insert a managed hub.

EXAMPLE: You notice that the captured error packet failed FCS because it was corrupted by a regular pattern during transmission. A possible reason for this occurrence is a Jabbering node. This explanation makes sense because FCS/Jabber frames increased linearly when you were monitoring the live network.

Network File Server Timeouts Reference

This section explains terms that are relevant to network file server timeouts and provides additional conceptual and problem analysis detail.

Jabbering

A condition in which a node transmits illegal length packets and is possibly not operating within carrier specifications. In effect, another node has written bad data over a valid packet. This bad data is often interpreted as a repeated sequence of data.

Network File System (NFS) Protocol

A distributed file system protocol developed by Sun Microsystems that allows a computer system to access files over a network as if they were on its local disks. This protocol has been incorporated into products by more than 200 companies. It is now a de facto Internet standard.

NFS is one protocol in the NFS suite of protocols, which includes NFS, RPC, XDR (External Data Representation), and others. These protocols are part of a larger architecture that Sun Microsystems refers to as Open Network Computing (ONC). ONC is a distributed applications architecture designed by Sun and currently controlled by a consortium led by Sun.

Troubleshooting TCP/IP - Detailed Steps

This article shows how to troubleshoot TCP/IP connectivity between computers on a Windows network. If you haven't already done so, disable XP's Internet Connection Firewall on all local area network connections and remove all firewall programs on the network. Improperly configured firewalls are the most common cause of TCP/IP problems.

Open a Command Prompt Window

For many of these steps yoursquoll be typing at the command prompt To open a command prompt window in Windows 2000 or XP click Start | Run type cmd in the box and click OK To open a command prompt window in Windows 95 98 or Me click Start | Run type command in the box and click OK Type one command per line and press Enter after each one to execute it To close the command prompt window use the exit command

Determine the TCP/IP Settings

Determine the TCP/IP settings of each computer on the local area network. In XP, open the Network Connections folder, right click the LAN connection, and click Status | Support | Details. For example, here are the Status and Details views for the LAN connection on an Internet Connection Sharing host.

In Windows 95/98/Me, click Start | Run, type winipcfg in the box, and click OK. Select the LAN adapter from the menu, and click More Info. Here's the winipcfg view for an ICS client running Windows Me.

You can also see the TCP/IP settings from the command prompt. This is especially convenient if a computer has more than one network adapter. Use the ipconfig /all command, which is available in all versions except Windows 95. The output from this command can be long, so it's best to write it to a file. Specify the file name in the command this way:

ipconfig /all >ipconfig.txt

Description of TCP/IP Settings

Here are the TCP/IP settings that are used in network troubleshooting:

• IP Address - Unique address assigned to a network adapter. A computer with multiple network adapters has an IP address for each one, and each one must be in a different subnet.

• Subnet Mask - Used in conjunction with the IP address to determine which subnet an adapter belongs to. At the simplest level, communication is only possible between two network adapters when they're in the same subnet.

• Default Gateway - IP address of a computer or router, on one of this computer's local area networks, that knows how to communicate with subnets not present on this computer. For an Internet connection, the default gateway is a router belonging to your Internet service provider, and all access to sites on the Internet goes through it. For an ICS client, the default gateway is the ICS host. If you use a hardware router, it serves as the default gateway.

• DHCP Server - If an adapter is configured to obtain an IP address automatically, this is the address of the server that provides it. It could be your ISP, an ICS host, or a hardware router.

• DNS Servers - IP address of one or more Domain Name Server computers. DNS servers translate Internet names to their IP addresses (like 63.146.109.227).
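The relationship between IP address and subnet mask described above can be checked with a few lines of Python. This is a minimal sketch using the standard `ipaddress` module; the function name `same_subnet` and the example addresses are illustrative assumptions, not part of the original article.

```python
import ipaddress


def same_subnet(ip_a: str, ip_b: str, mask: str) -> bool:
    """Return True when both addresses fall in the same subnet for the given mask."""
    # strict=False lets us pass a host address; the host bits are masked off.
    net_a = ipaddress.ip_network(f"{ip_a}/{mask}", strict=False)
    net_b = ipaddress.ip_network(f"{ip_b}/{mask}", strict=False)
    return net_a == net_b


# Two adapters that share the first three octets under a 255.255.255.0 mask.
print(same_subnet("192.168.1.101", "192.168.1.123", "255.255.255.0"))
# An adapter on a different subnet under the same mask.
print(same_subnet("192.168.1.101", "192.168.0.1", "255.255.255.0"))
```

If the function returns False for two computers that are supposed to talk directly, compare their IP address and subnet mask settings before looking at hardware.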

Subnets

See our article on subnets for a brief description of how they work. For more details, see this Microsoft Knowledge Base article.

If two computers are supposed to be on the same subnet but aren't, something is wrong with the network hardware or software configuration. This is most likely to happen when one of them receives an IP address of 169.254.x.x, which indicates that:

• It's configured to obtain an IP address automatically.
• It couldn't find a DHCP server on the network to make the assignment.
• Windows assigned it an Automatic Private IP Address.
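A quick way to spot this condition in a list of collected addresses is to test each one against the 169.254.0.0/16 range. The sketch below assumes Python's standard `ipaddress` module; the helper name `is_apipa` is hypothetical.

```python
import ipaddress

# Range that Windows uses for Automatic Private IP Addressing (169.254.x.x).
APIPA_RANGE = ipaddress.ip_network("169.254.0.0/16")


def is_apipa(address: str) -> bool:
    """Return True when the address is an Automatic Private IP Address."""
    return ipaddress.ip_address(address) in APIPA_RANGE


for addr in ["192.168.1.101", "169.254.10.20"]:
    status = "no DHCP server was reached" if is_apipa(addr) else "looks normally assigned"
    print(addr, "-", status)
```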

See our article on Specific Networking Problems and Their Solutions for more information.

Pinging

The ping command is the basic tool for testing TCP/IP connectivity. It sends a special packet (called ICMP Echo) to a particular IP address and looks for a reply. If everything is working right, the reply comes back. If not, the ping times out in a few seconds. By default, the ping command repeats the process four times. Here's an example of an ICS client computer pinging a Windows XP Home Edition ICS host, using the host's IP address and its computer name.

When ping fails, you'll see one of these error messages:

• Request timed out - The IP address is valid, but there's no reply from it. If the IP address is on a local area network, the most likely cause is a firewall program blocking the ping.

• Unknown host <name> or Ping request could not find host <name> - The computer name doesn't exist on the local area network. Make sure that NetBIOS over TCP/IP is enabled.

• Destination host unreachable - The IP address isn't on a local area network, and the default gateway can't access it. Either there's no default gateway, its address is wrong, or it isn't functioning.
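The three messages above can be turned into a small lookup for use in a script that saves ping output to a file. This is only a sketch: the function name `explain_ping_failure` is hypothetical, and the matching relies on the English message text quoted in this article, which may differ on other Windows versions or locales.

```python
def explain_ping_failure(output: str) -> str:
    """Map the text printed by ping to the likely cause listed in the article."""
    text = output.lower()
    if "request timed out" in text:
        return "No reply from a valid address; on a LAN, suspect a firewall blocking ping."
    if "unknown host" in text or "could not find host" in text:
        return "Name not found on the LAN; check that NetBIOS over TCP/IP is enabled."
    if "destination host unreachable" in text:
        return "Address is off the LAN and the default gateway is missing, wrong, or down."
    return "No known failure message found."


print(explain_ping_failure("Request timed out."))
print(explain_ping_failure("Ping request could not find host win98."))
```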


Pinging the Local Area Network

Here is a series of ping commands to use in finding where a problem occurs on a local area network. Run them in the order shown, and don't go on to the next command until all of the previous commands work properly. In this example:

• The computer being tested is named Winxp, with IP address 192.168.1.101.
• There's another computer on the network, named Win98, with IP address 192.168.1.123.

Substitute the appropriate IP addresses and computer names for your network.

Command              Target                          What Ping Failure Indicates
ping 127.0.0.1       Loopback address                Corrupted TCP/IP installation
ping localhost       Loopback name                   Corrupted TCP/IP installation
ping 192.168.1.101   This computer's IP address      Corrupted TCP/IP installation
ping winxp           This computer's name            Corrupted TCP/IP installation
ping 192.168.1.123   Another computer's IP address   Bad hardware or NIC driver
ping win98           Another computer's name         NetBIOS name resolution failure

To fix a corrupted TCP/IP installation on Windows XP, follow the steps in this Microsoft Knowledge Base article. For Windows 95/98/Me, un-install the TCP/IP protocol in Control Panel | Network, reboot, and re-install it. If that doesn't fix it, use this procedure on Windows 95 or 98.
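The run-in-order, stop-at-first-failure procedure for the local area network can be sketched as a short script. Everything named here (`LAN_CHECKS`, `system_ping`, `first_failure`) is hypothetical, and the addresses and names are the example values from this section. One caveat: the script treats a zero exit code from ping as success, which is an assumption; on some Windows versions ping can exit with zero even when the reply is "Destination host unreachable", so check the printed output as well.

```python
import platform
import subprocess
from typing import Callable, List, Optional, Tuple

# (target, what a failure indicates), in the order the article says to run them.
LAN_CHECKS: List[Tuple[str, str]] = [
    ("127.0.0.1", "Corrupted TCP/IP installation"),
    ("localhost", "Corrupted TCP/IP installation"),
    ("192.168.1.101", "Corrupted TCP/IP installation"),
    ("winxp", "Corrupted TCP/IP installation"),
    ("192.168.1.123", "Bad hardware or NIC driver"),
    ("win98", "NetBIOS name resolution failure"),
]


def system_ping(target: str) -> bool:
    """Send one echo request with the system ping command; True on exit code 0."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    result = subprocess.run(
        ["ping", count_flag, "1", target],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def first_failure(
    checks: List[Tuple[str, str]],
    ping: Callable[[str], bool] = system_ping,
) -> Optional[Tuple[str, str]]:
    """Run the checks in order and return the first (target, meaning) that fails."""
    for target, meaning in checks:
        if not ping(target):
            return target, meaning
    return None  # every ping succeeded
```

Calling `first_failure(LAN_CHECKS)` returns None when every step passes, or the first failing target together with the diagnosis from the table. The `ping` parameter accepts a stand-in function, which makes the logic testable without a network.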

Pinging the Internet

You can also use ping to find a problem with Internet access. Run these commands in the order shown, and don't go on to the next command until all of the previous commands work properly. Use the Default Gateway and DNS Server addresses that you got from the winipcfg or ipconfig /all command.


Command                  Target                What Ping Failure Indicates
ping w.x.y.z             Default Gateway       Default Gateway down
ping w.x.y.z             DNS Server            DNS Server down
ping w.x.y.z             Web site IP address   Internet service provider or web site down
ping www.something.com   Web site name         DNS Server down or web site down

REFERENCE

WWW.PRACTICALLYNETWORK.COM

WWW.COMPUTERHOPE.COM

WWW.WIKIPEDIA.ORG



Knowing Your Network

You can better troubleshoot the problems on your network by:

• Knowing Your Network's Configuration
• Identifying Your Network's Normal Behavior

Knowing Your Network's Configuration

Part of understanding your network is knowing its physical and logical configuration. You should know:

• Which devices are on your network
• How the devices are configured
• Which devices are attached to the backbone
• Which devices connect your network to the outside world (WAN)

To keep track of your network's configuration, gather the following information:

• Site Network Map
• Logical Connections
• Device Configuration Information
• Other Important Data About Your Network

This data, when kept up-to-date, is extremely helpful for locating information when you experience network or device problems.

Site Network Map

A network map helps you to:

• Know exactly where each device is physically located
• Easily identify the users and applications that are affected by a problem
• Systematically search each part of your network for problems

You can create a network map using any drawing or flow chart application. Store your network map online. In addition, make sure that you always have a current version on paper in case you cannot access the online version. Figure 8 shows an example of a network map of 3Com devices.

Figure 8. Example of a Site Network Map

Consider including the following information on your network map:

• Location of important devices and workgroups (by floor, building, or area)
• Location of the network backbone, data center, and wiring closets, as appropriate for your network
• Location of your network management stations
• Location and type of remote connections
• IP subnetwork addresses for all managed switches and hubs
• Other subnetwork addresses, such as Novell IPX and AppleTalk, if appropriate for your network
• Type of media (by actual name, such as 10BASE-T, or by grouping, such as Ethernet), which you can show with callouts, colors, line weights, or line styles
• Virtual workgroups, which you can show with colors or shaded areas
• Redundant links, which you can indicate with gray or dashed lines
• Types of network applications that are used in different areas of your network
• Types of end stations that are connected to the switches and hubs

Logical Connections

With the advent of virtual LANs (VLANs), you need to know how your devices are connected logically as well as physically. For example, if you have connected two devices through the same physical switch, you can assume that they can communicate with each other. However, the devices can be in separate VLANs that restrict their communication.

Knowing the setup of your VLANs can help you to quickly narrow the scope of a problem to a VLAN instead of to a network connection.

The Transcend NCS application Enterprise VLAN Manager allows you to view the logical makeup of your network. Depending on the complexity of your network and VLAN configurations, you can use colors to show the VLANs graphically on your network map.

Device Configuration Information

Maintain online and paper copies of device configuration information. Make sure that all online data is stored with your site's regular data backup. If your site does not have a backup system, copy the information onto a backup disc (CD, Zip disk, and the like) and store it offsite.

Follow these guidelines for saving configuration information:

• Because the easiest way to recover a device's configuration is to use FTP or TFTP, save the configuration settings of each device that supports this method of uploading.

• For other devices, Telnet in and save the session (which contains configuration details) to a file. If you cannot print the configuration of a device, then create a quick rebuild guide that explains the quickest way to configure the device from a fresh install.

• For devices that store information to diskette, store this data as part of your site's regular backup.

• For routers and other important devices with text configuration files, store this data online in a revision control system. Keep the most recent version on paper. Keep previous versions.

• For PCs, keep a recovery disk for each type of PC. For any device that you use as a server, store all startup scripts and copies of registries.

Other Important Data About Your Network

For a complete picture of your network, have the following information available:

• All passwords - Store passwords in a safe place. Keep previous passwords in case you restore a device to a previous software version and need to use the old password that was valid for that version.

• Device inventory - The inventory allows you to see the device type, IP address, ports, MAC addresses, and attached devices at a glance. Software tools such as Transcend Central can help you keep track of the 3Com devices on your network. Using Transcend Central, you can group devices by type and location and have this information on hand for troubleshooting.

• MAC address-to-port number list - If your hubs or switches are not managed, you must keep a list of the MAC addresses that correlate to the ports on your hubs and switches. Generate and keep a paper copy of this list, which is crucial for deciphering captured packets, using Address Tracker.

• Log book - Document your interactions, no matter how trivial, with each device that is critical to your network's operation (that is, routers, remote access devices, and security servers). For example, document that you noticed a fan making noise one morning. Your note may help you to identify why a device is over temperature a week later (because the fan stopped working).

• Change control - Maintain a change control system for all critical systems. Permanently store change control records.

• Contact details - Store, online and on paper, the details of all support contracts, support numbers, engineer details, and telephone and fax numbers.

• LANsentry Reporter - Use LANsentry Reporter to generate reports from the database.

Identifying Your Network's Normal Behavior

By monitoring your network over a long period, you begin to understand its normal behavior. You begin to see a pattern in the traffic flow, such as which servers are typically accessed, when peak usage times occur, and so on. If you are familiar with your network when it is fully operational, you can be more effective at troubleshooting problems that arise.

Baselining Your Network

You can use a baseline analysis, which is an important indicator of overall network health, to identify problems. A baseline can serve as a useful reference of network traffic during normal operation, which you can then compare to captured network traffic while you troubleshoot network problems. A baseline analysis speeds the process of isolating network problems.

By running tests on a healthy network, you compile normal data to compare against the results that you get when your network is in trouble. For example, ping each node to discover how long it typically takes to receive a response from devices on your network.
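One simple way to turn ping results from a healthy network into a baseline is to record the mean and spread of the response times, then flag later measurements that fall well outside that range. The sketch below is illustrative only: the function names, the sample values, and the three-standard-deviation threshold are assumptions, not part of the 3Com tools described here.

```python
import statistics
from typing import List, Tuple


def build_baseline(samples_ms: List[float]) -> Tuple[float, float]:
    """Return (mean, standard deviation) of response times taken on a healthy network."""
    return statistics.mean(samples_ms), statistics.stdev(samples_ms)


def is_abnormal(value_ms: float, baseline: Tuple[float, float], tolerance: float = 3.0) -> bool:
    """Flag a response time more than `tolerance` standard deviations above the mean."""
    mean, stdev = baseline
    return value_ms > mean + tolerance * stdev


# Hypothetical response times (in milliseconds) collected while the network was healthy.
healthy = [10.0, 12.0, 11.0, 13.0, 12.0]
baseline = build_baseline(healthy)
print(is_abnormal(40.0, baseline))  # a much slower reply than usual
print(is_abnormal(12.0, baseline))  # a typical reply
```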

Applications such as Status Watch, Address Tracker, LANsentry Manager, and Traffix Manager allow you to collect days and weeks of data and set a baseline for comparison. Through the reporting mechanisms in the following list, you can continuously assess the data from your network and ensure that its performance is optimal:

• Web Reporter generates daily or weekly reports from data collected by Status Watch.

• Traffix Manager generates weekly reports from collected data and calculates the baselines for you. Set up Utilization History and Error History reports with data resolution set to Weekly.

• LANsentry Manager History View generates daily utilization graphs, which are sampled every 30 minutes for each day over one week. Use these graphs to calculate your network baselines manually.

Identifying Background Noise

Know your network's background noise so that you can recognize real data flow. For example, one evening after everyone is gone, no backups are running, and most nodes are on, analyze the traffic on your network using the Traffix Manager application. The traffic that you see is mostly broadcast and multicast packets. Any errors that you see are the result of faulty devices (trace). This traffic is the background noise of your network - traffic that occurs for little value. If background noise is high, redesign your network.


FDDI Connectivity

Use these sections to identify and correct connectivity errors on an FDDI ring:

• FDDI Connectivity Overview
• Monitoring FDDI Connections
• Making Your FDDI Connections More Resilient

Fiber Distributed Data Interface (FDDI), which is a self-correcting technology, automatically corrects ring faults to maintain connectivity throughout most of the network. However, you should monitor your FDDI connections for wrapped rings and other problems with ring connectivity.

Understanding the Problem

As shown in Figure 9, in a thru FDDI LAN, no stations on the trunk ring have a Configuration State (SMTConfigurationState) of Wrap or Isolated. However, users who complain about network performance may have lost connectivity to other stations on the network because the FDDI network is wrapped or segmented.

Figure 9. Thru Ring

Wrapped ring

By monitoring the Peer Wrap Condition, you can see when the Configuration State changes. In a wrapped ring (Figure 10), two stations on the LAN are in a wrapped Configuration State. This condition may or may not affect the connectivity of certain stations. Although operational, your network may have a cabling problem or a problem with a link.

Figure 10. Wrapped LAN

Segmented ring

In a segmented ring (Figure 11), more than two stations are wrapped on the trunk ring. Although this mode of operation is a valid FDDI LAN configuration, your LAN is probably experiencing a degraded or degrading condition.

Figure 11. Segmented Ring
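The thru, wrapped, and segmented descriptions above amount to a rule based on how many trunk-ring stations report a wrapped Configuration State. The sketch below encodes that rule; the function name `ring_state` is hypothetical, and treating a single wrapped station the same as two is an assumption, because the text only describes the cases of none, two, and more than two.

```python
def ring_state(wrapped_station_count: int) -> str:
    """Classify an FDDI trunk ring from the number of stations in a wrapped state."""
    if wrapped_station_count < 0:
        raise ValueError("station count cannot be negative")
    if wrapped_station_count == 0:
        return "Thru"        # no station reports Wrap
    if wrapped_station_count <= 2:
        return "Wrapped"     # assumption: one wrapped station is treated like two
    return "Segmented"       # more than two stations are wrapped


for count in (0, 2, 4):
    print(count, "wrapped station(s):", ring_state(count))
```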

When a network connection has excessively high link errors, Station Management (SMT) shuts down the connection and tries to reconnect again. A dual-attachment trunk ring station with an A or B connection that is shut down is one of the wrap points in the network. See Making Your FDDI Connections More Resilient for information about keeping a dual-attachment station connection from wrapping.

Isolated station

Sometimes a network wraps a particular station out of the ring. Stations on either side of a problem station can be wrapped. This effectively isolates the station or links that have problems, as shown in Figure 12.

Figure 12. Wrapped Ring with Isolated Station

If a ring was already wrapped when a network wraps a station out of the ring, then a segmented ring results, as shown in Figure 13.

Figure 13. Segmented Ring with Isolated Stations

Twisted ring

In a twisted ring, an A port is connected to an A port and a B port is connected to a B port, instead of the normal A-to-B connections. A twisted ring, which always has two twist points (stations), can exist in either a Thru or Wrap state. You can monitor the Twisted Ring Condition and Undesired Connection Attempt Event for evidence of twisted ring and other connection problems.

Identifying the Problem

To identify the problem, follow this process:

1. At the FDDI LAN level, verify that your network is operating.

If the network is operating, the FDDI ring may be segmented, and therefore an FDDI station or an Ethernet station on an Ethernet link may have lost connectivity to other nodes on the network.

2. Determine if a ring is in a Thru, Wrap, or Segmented state.

If the FDDI ring is segmented or wrapped, look for a problem with a link somewhere in the network or for a nonfunctioning node on your trunk ring. If the ring is operating and is not segmented, or if it is segmented but you still have connectivity to the stations in question, move to a more specific level in your network.

See Monitoring FDDI Connections for more information.

3. Determine if the poorly performing station is an Ethernet or FDDI station.

If the problem is an FDDI station, find out if it is congested (that is, if the station is so busy that it cannot accept all the network traffic that is directed to it) by determining its Bandwidth Utilization. Also determine if the station has a high frame error rate by looking at the FDDI Ring Errors.

If the problem is an Ethernet station, look for congestion by examining Ethernet Packet Loss and Bandwidth Utilization.

Solving the Problem

Identify the station that is causing the disconnection and take the appropriate steps:

• If the disconnection is caused by a wrapped ring, then fix the hardware or cabling problem at that station.

• If the station is congested, you have a device problem rather than a network problem. For example, if the congested station is a file server and every other machine on the network is retrieving and saving files using that server, consider upgrading your server or adding additional servers to the network. A variety of devices from different vendors may be communicating on an FDDI or Ethernet network; some are faster and more capable, and some are slower and more prone to congestion.

• If the station is an Ethernet station that is attached to an Ethernet segment, reevaluate the setup of your Ethernet network and make some changes to improve its performance.

You can also make FDDI connections more resilient by implementing dual homing or installing an Optical Bypass Unit (OBU) where FDDI connections are prone to fail. See Making Your FDDI Connections More Resilient for more information.

Monitoring FDDI Connections

Monitor your FDDI devices for Warning or Critical alerts in the FDDI Status tool.

Status Watch

Use Status Watch to identify these FDDI connectivity errors:

• Peer Wrap Condition
• Twisted Ring Condition
• Undesired Connection Attempt Event

Follow these steps:

1. In the Device area, select the device that is located where you suspect an FDDI ring connectivity problem.

2. Monitor the FDDI Status tool for the currently selected device.

Here are some pointers for monitoring:

• If the Peer Wrap Configuration State variable is Isolated, the device is not connected to the FDDI trunk ring. If you intend the device to remain isolated, this indication is not a serious condition. However, if the device is supposed to be connected on a trunk ring, a serious problem may exist. The device is no longer transmitting packets to the larger trunk ring.

• If the Peer Wrap flag (SMTPeerWrapFlag) is set, the device is one of the wrap points. The cause of the wrapped ring is somewhere in the portion of the network between the two stations that report the peer wrap condition.

Making Your FDDI Connections More Resilient

When devices are removed from an FDDI ring, there is a break in the fiber path that causes the ring to wrap until the ring is made whole again. To prevent the break in the FDDI connection, you can implement dual homing or install an Optical Bypass Unit (OBU).

Implementing Dual Homing

When the operation of a dual attachment node is critical to your network, dual homing adds reliability by providing a backup connection if the primary link fails. Because a dual attachment station (DAS) has two attachments to the FDDI ring (A-to-M and B-to-M), you can use one of them as a standby link if the active link fails. Using dual homing, only one of the two attachments is active at a time. In this sense, a DAS acts as if it is a single attachment station (SAS) by using its A port as the standby link.

Through SMT, a DAS can be dual homed to the same concentrator or, more commonly, to two concentrators. This arrangement provides a more stable trunk ring of concentrators. If one concentrator fails, the DAS enables the standby link to another concentrator to become the active link. See Figure 14.

If the station is a dual path or dual path/dual MAC station, you can configure the dual-homed station in one of two ways:

• With both links active
• With one link active and one connection withheld as a backup, only becoming active when one link fails

Figure 14. Dual Homing Configuration

Installing an Optical Bypass Unit

You can insert an Optical Bypass Unit (OBU) into the FDDI ring as if it were a node and then plug your device into it. To use an OBU, your device needs an optical bypass interface. This interface lets the bypass know whether your device is still on the ring or not. See Figure 15.

If your device is removed or if it fails, the bypass unit diverts the optical path away from your device, keeping the ring whole. You can use a bypass on devices that are prone to failure or are likely to be removed often, such as diagnostic equipment.

Figure 15. Optical Bypass Unit Configuration

FDDI Connectivity Reference

This section explains terms that are relevant to FDDI connectivity and provides additional conceptual and problem analysis detail.

Peer Wrap Condition

A Peer Wrap (wrapped ring) condition occurs when a dual-attachment station detects a fault (often a lost connection) and reconfigures the network by wrapping the dual trunk rings to form a single ring. Normally, the two stations that are adjacent to the fault wrap to maintain full connectivity. However, if a second fault occurs before the first is repaired, the network partitions itself into two or more rings, and stations lose connectivity.

When a station reports a Peer Wrap condition, locate and repair the problem that caused the station to wrap the rings. Potential causes include:

• Faulty FDDI port hardware
• Faulty cables or connectors
• Unplugged connectors
• Powered-down stations

You can expect to find the cause of the problem between the two stations that report the Peer Wrap condition.

Twisted Ring Condition

A Twisted Ring condition occurs when certain undesirable connection types exist See Table 7 for more information Although similar to the Undesired Connection Attempt the Twisted Ring condition provides specific Station Management (SMT) and port information for diagnosis

Undesired Connection Attempt Event

An Undesired Connection Attempt event occurs when a port tries to connect to another port of a type that may result in an undesirable network topology. Whether the connection attempt is successful depends on the current setting of the station's connection policies.

Table 7 lists connections that the FDDI standard defines as undesirable. The managed devices may or may not permit these connections, depending on their FDDI station configurations.

Table 7 Undesirable Connection Types

Connection Type   Reason That the Connection Is Undesirable
A-A               Twisted primary and secondary rings
A-S               A wrapped ring
B-B               Twisted primary and secondary rings
B-S               A wrapped ring
S-A               A wrapped ring
S-B               A wrapped ring
M-M               A tree of rings topology (illegal connection)

Table 8 lists FDDI connections that create valid topologies.

Table 8 Valid Connection Types

Connection Type   Reason That the Connection Is Valid
A-B               A normal trunk peer connection
A-M               A tree connection with possible redundancy. In a single MAC node, Port B has precedence (by default) for connecting to a Port M.
B-A               A normal trunk ring peer connection
B-M               A tree connection with possible redundancy. In a single MAC node, Port B has precedence (by default) for connecting to a Port M.
S-S               A single ring of two slave stations
S-M               A normal tree connection
M-A               A tree connection that provides possible redundancy
M-B               A tree connection that provides possible redundancy
M-S               A normal tree connection
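
Tables 7 and 8 together cover all 16 pairings of port types A, B, S, and M, so they can be expressed as a simple lookup. The Python sketch below is illustrative only; the names `UNDESIRABLE`, `VALID`, and `classify_connection` are hypothetical and do not come from any management application, and the reason strings are shortened from the tables.

```python
# Reasons abbreviated from Table 7 (undesirable) and Table 8 (valid).
UNDESIRABLE = {
    "A-A": "Twisted primary and secondary rings",
    "A-S": "A wrapped ring",
    "B-B": "Twisted primary and secondary rings",
    "B-S": "A wrapped ring",
    "S-A": "A wrapped ring",
    "S-B": "A wrapped ring",
    "M-M": "A tree of rings topology (illegal connection)",
}
VALID = {
    "A-B": "A normal trunk peer connection",
    "A-M": "A tree connection with possible redundancy",
    "B-A": "A normal trunk ring peer connection",
    "B-M": "A tree connection with possible redundancy",
    "S-S": "A single ring of two slave stations",
    "S-M": "A normal tree connection",
    "M-A": "A tree connection that provides possible redundancy",
    "M-B": "A tree connection that provides possible redundancy",
    "M-S": "A normal tree connection",
}


def classify_connection(local_port, remote_port):
    """Return ("valid" or "undesirable", reason) for a pair of port types."""
    key = f"{local_port.upper()}-{remote_port.upper()}"
    if key in UNDESIRABLE:
        return "undesirable", UNDESIRABLE[key]
    if key in VALID:
        return "valid", VALID[key]
    raise ValueError(f"unknown port types: {key}")
```

For example, `classify_connection("A", "A")` reports a twisted ring, while `classify_connection("s", "m")` reports a normal tree connection.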

Disabling the Offending Interface

Because broadcast storms can ultimately cause your whole network to become unavailable, take action immediately to disable the offending interface. You can enable the interface again after you have corrected the problem.

Address Tracker

Use Address Tracker to locate the interface that is causing the broadcast storm. Use Device View to disable the port.

Follow these steps:

1. In the Find Address window, enter the address of the interface that seems to be receiving the broadcast traffic.

You can copy the MAC or IP address from the Status Watch report and paste it into Address Tracker's Enter the Address You Want to Find field.

2. Click Find Now.

Search displays the device name.

3. Use Transcend Central to launch Device View and disable the port.

Disabling the port stops the broadcast storm before it interferes with all vital network traffic. You can re-enable this interface later using Device View or the device's console.

Correcting Spanning Tree Misconfigurations

Spanning Tree does not cause broadcast storms, but a loop in your Spanning Tree topology can create data that looks like a storm. A loop can occur in your topology if:

• Someone disables Spanning Tree on a port
• You set up your Spanning Tree configuration incorrectly

Device View

Use Device View to disable any Spanning Tree port that has a repeater attached to it and to correct Spanning Tree misconfigurations.

To correct Spanning Tree misconfigurations, use Device View to disable Spanning Tree Protocol (STP) for a port on a SuperStack® II Switch 1000, Switch 3000, Switch 3000 10/100, Switch 9000SX, Desktop Switch, LinkBuilder® FMS II Bridge/Management Module, or CoreBuilder 6000.

To disable the STP port state for a port on a SuperStack II switch:

1. Select a port and click the right mouse button.

2. From the shortcut menu, select Configure.

3. In the Port section, click the STP tab.

4. From the STP Port State list box, select Disabled.

5. Click Apply.

To disable the STP port state for a port on a LinkBuilder FMS II Bridge/Management Module:

1. Double-click the module.

2. From the shortcut menu, select Configure Bridge.

3. In the Port section, click the STP tab.

Broadcast Storms Reference

This section explains terms that are relevant to broadcast storms and provides additional conceptual and problem analysis detail.

Broadcast Packets

Broadcast packets, which are a normal part of network operation, are transmitted by a device to a broadcast address. For example, IP networks use broadcasts to resolve network addresses using Address Resolution Protocol (ARP). IPX networks use a large number of broadcast packets to operate most effectively.

Problems arise when broadcast packets endlessly propagate throughout the network, which increases the traffic volume on your network and the CPU time that each host spends processing and discarding unwanted broadcast packets.

Multicast Packets

Multicast packets, which are a normal part of network operation, are transmitted by a device to a multicast group address. Hosts that want to receive the packets indicate that they want to be members of the multicast group, and then multicast packets are distributed to that group. For example, multicast packets support the Spanning Tree Protocol. Multicast applications and underlying multicast protocols control multimedia traffic and shield hosts from processing unnecessary broadcast traffic. However, multicast traffic can also cause storms that saturate your network.

Duplicate Addresses

Use these sections to identify and correct problems caused by duplicate MAC and IP addresses:

• Duplicate Addresses Overview
• Finding Duplicate MAC Addresses
• Finding Duplicate IP Addresses

See Duplicate Addresses Reference for additional conceptual and problem analysis detail.

Networks sometimes generate duplicate MAC and IP addresses. Because duplicate addresses can cause problems with packet delivery, resolve them as soon as possible.

Understanding the Problem

Duplicate MAC addresses are caused by data link layer problems with Fiber Distributed Data Interface (FDDI) media and the passing of tokens on the FDDI ring. Duplicate IP addresses are caused by network layer problems. See these sections for more information about causes of duplicate addresses:

• Duplicate MAC Addresses
• Duplicate IP Addresses

Identifying the Problem

Identify duplicate MAC and IP addresses by following the instructions in these sections:

• Finding Duplicate MAC Addresses
• Finding Duplicate IP Addresses

Solving the Problem

Identify the cause of the duplicate address (such as user error or a hardware problem) and fix the problem if possible.

Finding Duplicate MAC Addresses

To find out if duplicate MAC addresses are occurring, monitor your network using Status Watch.

Status Watch

The Status Watch FDDI Status tool identifies duplicate FDDI MAC addresses, and Status Watch reports when two or more MACs on the same ring have the same MAC address (a Duplicate Address condition).

Follow these steps:

1. In the Status Watch Summary View window, determine if any FDDI Status conditions are reported. If there are, double-click the table cell value to display the Device List window.

Another approach is to examine only the devices that you know reside on your FDDI ring. In the Status Watch main window, red device icons indicate that a threshold has been exceeded.

2. Select a device.

• If you selected the device from the Device List window, the real-time report for that device appears in the Status Watch main window.

• If you selected the device from the main window, also select the FDDI Status tool to view the real-time report.

3. Determine if a Duplicate Address condition caused the FDDI Status tool to trigger a Critical or Warning status for that device.

In Status Watch, you can specify the status severity level to apply to a Duplicate Address condition.

Finding Duplicate IP Addresses

To find out if duplicate IP addresses are occurring, monitor your network using these applications:

• Address Tracker - To find duplicate IP addresses on 3Com devices and their attached networks

• LANsentry® Manager - To find duplicate IP addresses that are collected by probes gathering RMON2 SmartAgent® data from the Enterprise Communications Analysis Module (ECAM) downloaded on your network devices

Address Tracker

Use Address Tracker to determine when and where duplicate IP addresses occur.

Follow these steps:

1. From the Find Address menu, select Find Duplicate IP Addresses.

2. Click Find Now to start your search.

LANsentry Manager

Use the Duplicates table in LANsentry Manager to compile a list of all stations with duplicate IP addresses. This table is available only on probes that have downloaded RMON2 (ECAM) SmartAgent software.

Follow these steps:

1. From the LANsentry Manager Address Map menu, select Duplicates. Address Map data is displayed as a table.

2. To export the contents of the table, click Export to launch the Data Export dialog box.

Duplicate Addresses Reference

This section explains terms that are relevant to duplicate addresses and provides additional conceptual and problem analysis detail.

Duplicate MAC Addresses

Each device on your network has a unique MAC address. This address identifies a single device on the network, allowing packets to be delivered to correct destinations.

Packets are delivered to their destinations by means of the IP-address-to-MAC-address translation that the Address Resolution Protocol (ARP) provides. Therefore, if MAC addresses are duplicated on the network, ARP caches of routing devices contain erroneous destinations. In FDDI, devices monitor network traffic, searching for their own MAC address in each packet to determine whether to decode the packet. If MAC addresses are not unique, two stations cannot be distinguished from each other.

Duplicate MAC addresses can occur for the following reasons:

• Someone has manually configured a MAC address for a device instead of using the address that the vendor supplied or allowing it to be assigned dynamically, and this address is also assigned to a different device.

• In rare circumstances, loops in a bridged network can cause a MAC hardware problem or an address learning problem that creates a duplicate MAC address entry in the bridging address table.

• On DECnet Phase 4 networks, MAC addresses are set from the DECnet address. A duplicate NET address can cause a duplicate MAC address.

Duplicate IP Addresses

Because IP addresses are critical for transmission of packets on TCP/IP networks, resolve duplicate IP addresses immediately.

Duplicate IP addresses can occur when someone has configured an IP address that is identical to an IP address that is assigned to a different device. Address assignments, although possible for you to configure manually, are usually made using one of these protocols:

• Dynamic Host Configuration Protocol (DHCP) - Allows your network to dynamically assign IP addresses to nodes. With this protocol, a DHCP server temporarily assigns an IP address to a node, or you can statically configure addresses as needed.

• BOOTstrap protocol (BootP) - Allows you to statically assign IP addresses to nodes. This protocol is more efficient than RARP.

• Reverse ARP (RARP) - Allows you to statically assign IP addresses to nodes. However, because this protocol relies on the MAC address to identify the node, you cannot use it on networks that dynamically assign hardware addresses.
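
A duplicate IP address shows up as one IP address that is claimed by more than one MAC address. The Python sketch below illustrates that check on a list of observed address pairs. It is a hedged example, not how Address Tracker or LANsentry Manager works internally; the function name `find_duplicate_ips` and the input format are assumptions.

```python
from collections import defaultdict


def find_duplicate_ips(observations):
    """Return {ip: sorted MACs} for every IP seen with more than one MAC.

    observations is an iterable of (ip_address, mac_address) string pairs,
    for example collected from ARP caches (a hypothetical input format).
    """
    macs_by_ip = defaultdict(set)
    for ip, mac in observations:
        # Normalize case so the same MAC is not counted twice.
        macs_by_ip[ip].add(mac.lower())
    return {
        ip: sorted(macs)
        for ip, macs in macs_by_ip.items()
        if len(macs) > 1
    }
```

Swapping the roles of the two fields gives the equivalent check for a MAC address that appears with several IP addresses.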

Ethernet Packet Loss

Use these sections to identify and correct Ethernet packet loss:

• Ethernet Packet Loss Overview
• Searching for Packet Loss

See Ethernet Packet Loss Reference for additional conceptual and problem analysis detail.

If your Ethernet network shows signs of congestion, it may be experiencing packet loss. When your network is congested, utilization is usually high, packets are discarded because buffers are full, and collision rates are up. Problems related to Collisions are often at the heart of packet loss.

Understanding the Problem

Collisions are normal in Ethernet networks. In many cases, Collision rates of 50 percent do not cause a large decrease in throughput. The Collision rate helps mark the upper limit on your network (the maximum percentage of collisions that your network can bear), which is usually around 70 percent. If Collisions increase above this upper limit, your network can become unreliable.

When the Collision rates increase, so do Excessive Collisions, which delay the transmission of data. An increase in Collisions also indicates that network utilization and network errors, such as FCS Errors, are probably increasing.

The real packet problems to watch for, however, are undetected collisions that show up as Late Collisions.

Identifying the Problem

To confirm that your network's problem is related to packet loss, verify that frames are being dropped on your network by examining this packet loss data:

• Alignment Errors
• Collisions
• Excessive Collisions
• FCS Errors
• CRC Errors
• Late Collisions
• Receive Discards
• Too Long Errors
• Too Short Errors
• Transmit Discards

The process of identifying the problem is discussed in Searching for Packet Loss.

Solving the Problem

If you notice that packet loss data is consistently high, then your network is too congested. In this case, segment your network with the appropriate network device (such as a switch or router). If Collision data shows increases but your network's utilization is the same, then your network may have a physical problem, such as cabling that is too long. Other problems that packet loss data can indicate include:

• Faulty connectors or improper cabling
• Excessive numbers of repeaters between network devices
• Defective Ethernet transceivers or controllers

Possible solutions to these problems are explained in the procedures in Searching for Packet Loss.

Searching for Packet Loss

When you look for packet loss, use the following applications:

• Status Watch - For Ethernet and MIB-II data collection using SNMP polling
• LANsentry Manager Network Statistics Graph - For RMON data collection using an RMON probe
• Device View - On a per-device basis, you can evaluate statistics for any port on the device

Status Watch

Status Watch monitors:

• Alignment Errors
• Excessive Collisions
• FCS Errors
• Receive Discards
• Transmit Discards

Follow these steps:

1. Determine if the thresholds for the Alignment Errors tool and FCS Errors tool are being exceeded.

Table 16 identifies the problems that this data can indicate and your possible actions. For information about problems related to a nonstandard Ethernet implementation, see Nonstandard Ethernet Problems.

Table 16 Alignment Errors, FCS Errors, and CRC Errors Data

Possible Problem: Faulty cabling
Possible Action: Examine the cable and cable connections for breaks or damage.

Possible Problem: Network noise
Possible Action: Look for improper cabling, faulty cable, faulty network equipment, or cables that are too close to equipment that emits electromagnetic interference (lamps, for example).

Possible Problem: Faulty transceiver
Possible Action: Use an analyzer to identify the problematic transceiver. If necessary, replace the transceiver, network adapter, or station.

Possible Problem: Fault at the transmitting end station
Possible Action:
1. Locate the source of the errors by looking at the module and port statistics.
2. Verify the correct operation of the transceiver or adapter card of the device that is connected to the problem port.
3. If the card appears to be operating correctly, examine the cable and cable connections for breaks or damage.

Possible Problem: Station powering up or down
Possible Action: None required.
Early implementations of Ethernet transceivers generate a significant amount of in-band noise when powering up; they frequently cause Alignment Errors and FCS Errors in an otherwise stable network.
When powering up, some software drivers for Ethernet controllers also initiate Time Domain Reflectometry (TDR) tests to test the Ethernet media. Network monitors report TDR tests as Alignment Errors and FCS Errors.

Possible Problem: Faulty adapter
Possible Action: Replace the adapter.

2. Determine if the Excessive Collisions tool threshold is being exceeded. Table 17 identifies the problems that this data can indicate and your possible actions.

Table 17 Collisions and Excessive Collisions Data

Possible Problem: Busy network
Possible Action: Use a bridge, router, or switch to reconfigure your network into segments with fewer stations.

Possible Problem: Faulty device (adapter, switch, hub, and the like) that does not listen before broadcasting. This problem increases the incidence of all types of collisions.
Possible Action: Isolate each adapter to see if the problem stops.

Possible Problem: Network loop
Possible Action: Ensure that no redundant connections to the same station have both connections active simultaneously.

3. Determine if the Receive Discards and Transmit Discards tools' thresholds are being exceeded.

If these errors are high in conjunction with the data that you learned in steps 1 and 2, then your network is overloaded. Segment your network.

LANsentry Manager Network Statistics Graph

Use the LANsentry Manager Network Statistics graph to view data for:

• Collisions
• Late Collisions
• Bandwidth Utilization
• CRC Errors
• Too Long Errors
• Too Short Errors

Follow these steps:

1. Display a Network Statistics graph for the local Ethernet segment on which users have reported poor performance.

This graph shows the most recent trend in Collision rates. If you have set up a History sample, you can also look at the historical trend. If a number of segments are connected by repeaters, examine the graph for each Ethernet segment.

2. Analyze Utilization and Collision rates to determine whether collisions are caused by an overloaded segment or a faulty component.

• If Utilization rates are high - The collisions are probably caused by an overloaded segment. If you have added nodes or new applications to your network, consider reconfiguring the cabling system using bridges and routers to filter out remote collisions and to keep local traffic on one segment. This action should level the network load.

• If Utilization rates are stable and appear normal - The collisions are probably caused by faulty components. In this case, do the following:

    • If the network consists of repeaters - Compare the Network Statistics graphs for each segment connected to the repeater. Because repeaters repeat traffic across all connected segments (which makes many segments seem like one network), you should see similar levels of traffic on all segments. One segment that shows dissimilar levels of traffic and collisions may indicate faulty hardware. In this case, monitor several collisions to track the source station that is transmitting too soon after collisions, and repair the station. Packets that are transmitted too soon after collisions are unlikely to be valid. See Table 17 for more information about Collisions.

    • On other networks - Determine the segment cable length.

3. Examine the CRC Errors and Late Collisions, which often indicate cabling or component problems.

Table 16 identifies the problems that CRC Errors can indicate and your possible actions. Table 18 identifies the problems that Late Collisions data can indicate and your possible actions.

Table 18 Late Collisions Data

Possible Problem: Cabling problems
• Segment too long
• Failing cable
• Segment not grounded properly (noise)
• Improper termination
• Taps too close (10BASE-5 and 10BASE-2 only)
• Noisy cable
Possible Action: Correct the cabling problem by doing one or more of the following:
• Reduce the segment length
• Replace the cable
• Ground the cable
• Terminate the cable correctly
• Check the taps
• Check for cables too close to equipment that emits electromagnetic interference

Possible Problem: Component problems
• Deaf or partially deaf node
• Failing repeater, transceiver, or controller cards
Possible Action: Correct the component problem by doing one of the following:
• Trace the failing component and replace it
• Replace the NIC or the transceiver

4. Trace Too Short Errors and Too Long Errors to the sender.

These errors often indicate faulty routers or LAN drivers and transceiver problems. Table 19 identifies the problems that this data can indicate and your possible actions.

Possible Problem: A transceiver on your network is adding bits to the packets that are transmitted by the attached station.
Possible Action:
1. Use a network analyzer to identify the problematic transceiver.
2. If necessary, replace the transceiver, network adapter, or station.

Possible Problem: The jabber protection mechanism on a transceiver has failed; it can no longer protect the network from the jabbering produced by the attached station.
Possible Action: Replace the network card.

Possible Problem: Excessive noise on the cable.
Note: Some 10/100 Mbps cards that autodetect the network speed may connect to the network at the wrong speed, causing excessive noise.
Possible Action: Check for improper cabling, faulty cable, faulty network equipment, or cables too close to noisy electronic equipment (lamps, for example).
If your network card autodetects the network speed and you have ruled out other problems, manually configure the network speed.

Possible Problem: Faulty routers (two different network types are connected and the router is not enforcing proper frame size restrictions)
Possible Action: Notify the manufacturer.

Possible Problem: Faulty LAN driver
Possible Action: Replace the driver.

Possible Problem: A normal condition on a LinkSwitch® 1000, LinkSwitch® 3000, or CoreBuilder 5000 FastModule.
If you use maximum-sized 1518-byte Ethernet frames, the device's VLT-enabled ports add a frame tag of 4 bytes, resulting in a misleading Too Long Frame error.
Possible Action: These frames are passed successfully but will create the Too Long Frame error message.
If you want to eliminate the error message, reduce your Ethernet packet frames by 4 bytes.

Device View

Device View allows you to display a variety of port and device-level statistics relevant to Ethernet packet loss. Table 20 describes these statistics and their use in troubleshooting.

Table 20 Activity and Error Statistics in Device View

Statistics Group: Activity
Description: Displays the total network activity and errors on the selected port.
Use in Troubleshooting: This data shows readable packets, broadcast packets, Collisions, total errors, and runts, which cause Too Short Errors. You can interpret this data in the following ways:
• Runts are often caused by Collisions; however, if the values increase at specific times of the day, you may need to change the network topology to manage the traffic more efficiently (for example, with switches or routers).
• Runts can also be caused by a badly terminated coax cable.
• Large numbers of runts not associated with high levels of collisions can indicate a transmission problem (examine the cable).
• Particularly high numbers of Collisions compared to the total number of readable packets can point to a hardware problem (a bad adapter) or to a data loop.
• A high proportion of Broadcast Packets (>10%) on a heavily utilized network (>50% of available bandwidth) can point to an incorrectly configured bridge or router on the network.

Statistics Group: Errors
Description: Displays the number of frames with errors on the selected port.
Use in Troubleshooting: The significance of errors depends on accompanying errors and prevailing network conditions. See the following error data for more information:
• Alignment Errors: Table 16
• FCS Errors: Table 16
• Too Long Errors: Table 19
• Too Short Errors or runts: Table 19
• Late Collisions: Table 18

To display Activity and Errors statistics for a device or port, follow these steps:

1. Select the required port or device.

2. From the shortcut menu, select Activity or Errors.

The statistics available depend on the type of port or device selected. See Table 20 for troubleshooting information.

Ethernet Packet Loss Reference

This section explains terms that are relevant to Ethernet packet loss and provides additional conceptual and problem analysis detail.

Alignment Errors

An Alignment Error indicates a received frame for which both of the following are true:

• The number of bits received is an uneven byte count (that is, not an integral multiple of 8).

• The frame has a Frame Check Sequence (FCS) error.

Alignment Errors often result from MAC layer packet formation problems, cabling problems that cause corrupted or lost data, and packets that pass through more than two cascaded multiport transceivers. See FCS Errors for more information about interpreting Alignment Errors.

Collisions

Collisions indicate that two devices detect that the network is idle and try to send packets at exactly the same time (within one round-trip delay). Because only one device can transmit at a time, both devices must stop sending and attempt to retransmit. Collisions are detected by the transmitting stations.

The retransmission algorithm helps to ensure that the packets do not retransmit at the same time. However, if the two devices retry at nearly the same time, packets can collide again; the process repeats until either the packets finally pass onto the network without collisions or 16 consecutive collisions occur and the packets are discarded.
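
The text above does not name the retransmission algorithm. Standard Ethernet (IEEE 802.3) uses truncated binary exponential backoff, and the Python sketch below illustrates that scheme together with the 16-attempt limit. The function names and the `collides` callback are hypothetical, chosen only for this example.

```python
import random

MAX_ATTEMPTS = 16   # the frame is discarded after 16 consecutive collisions
BACKOFF_LIMIT = 10  # the backoff exponent stops growing after 10 collisions


def backoff_slots(collision_count, rng=random):
    """Pick a random wait, in slot times, after the given collision number."""
    if not 1 <= collision_count < MAX_ATTEMPTS:
        raise ValueError("collision_count must be between 1 and 15")
    exponent = min(collision_count, BACKOFF_LIMIT)
    # Wait between 0 and 2**exponent - 1 slot times, chosen uniformly.
    return rng.randrange(2 ** exponent)


def send_frame(collides, rng=random):
    """Try to send one frame; return (delivered, total_slots_waited).

    collides(attempt) is a caller-supplied function that returns True when
    the given attempt (numbered from 1) ends in a collision.
    """
    waited = 0
    for attempt in range(1, MAX_ATTEMPTS + 1):
        if not collides(attempt):
            return True, waited
        if attempt < MAX_ATTEMPTS:
            waited += backoff_slots(attempt, rng)
    # Sixteen consecutive collisions: an Excessive Collision, frame dropped.
    return False, waited
```

Because each station draws its own random wait from a range that doubles after every collision, two stations that collided once are progressively less likely to retry at the same moment.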

CRC Errors

A Cyclic Redundancy Check (CRC) Error is an RMON statistic that combines FCS Errors and Alignment Errors. These errors indicate that packets were received with:

• A bad FCS and an integral number of octets (FCS Errors)
• A bad FCS and a non-integral number of octets (Alignment Errors)

CRC Errors can cause an end station to freeze. If a large number of CRC Errors are attributed to a single station on the network, replace the station's network interface board. Typically, a CRC Error rate of more than 1 percent of network traffic is considered excessive.
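
The relationship between the three counters can be shown in a few lines of Python. This is a minimal sketch, assuming that "network traffic" is measured in frames and that the counters come from a source such as an RMON probe; the function names are hypothetical.

```python
def classify_bad_fcs_frame(bit_count):
    """Name the error for a frame that has already failed the FCS check."""
    # A whole number of octets means a plain FCS Error; otherwise the
    # frame is also misaligned and counts as an Alignment Error.
    return "FCS Error" if bit_count % 8 == 0 else "Alignment Error"


def crc_error_rate(fcs_errors, alignment_errors, total_frames):
    """Return CRC Errors (FCS + Alignment) as a fraction of all frames."""
    if total_frames <= 0:
        raise ValueError("total_frames must be positive")
    return (fcs_errors + alignment_errors) / total_frames


def crc_rate_is_excessive(fcs_errors, alignment_errors, total_frames):
    """Apply the 1 percent rule of thumb from the text above."""
    return crc_error_rate(fcs_errors, alignment_errors, total_frames) > 0.01
```

For example, 120 FCS Errors and 30 Alignment Errors in 10,000 frames give a CRC Error rate of 1.5 percent, which the rule of thumb treats as excessive.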

Excessive Collisions

Excessive Collisions indicate that 16 consecutive collisions have occurred, usually a sign that the network is becoming congested. For each excessive collision count (or after 16 consecutive collisions), a packet is dropped. If you know the normal rate of excessive collisions, then you can determine when the rate of packet loss is affecting your network's performance. See Knowing Your Network's Configuration for more information.

FCS Errors

Frame Check Sequence (FCS) Errors, a type of CRC Error, indicate that frames received by an interface are an integral number of octets long but do not pass the FCS check. The FCS is a mathematical way to ensure that all the frame's bits are correct without having the system examine each bit and compare it to the original. Packets with Alignment Errors also generate FCS Errors.

Both Alignment Errors and FCS Errors can be caused by equipment powering up or down or by interference (noise) on unshielded twisted-pair (10BASE-T) segments. In a network that complies with the Ethernet standard, FCS or Alignment Errors indicate bit errors during a transmission or reception. A very low rate is acceptable. Although Ethernet allows a 1 in 10^8 bit error rate, typical Ethernet performance is 1 in 10^12 or better.

Late Collisions

Late Collisions indicate that two devices have transmitted at the same time, but cabling errors (most commonly, excessive network segment length or repeaters between devices) prevent either transmitting device from detecting a collision. Neither device detects a collision because the time to propagate the signal from one end of the network to the other is longer than the time to put the entire packet on the network. As a result, neither of the devices that cause the late collision senses the other's transmission until the entire packet is on the network.

Although late collisions occur for small packets, the transmitter cannot detect them. As a result, a network suffering measurable Late Collisions for large packets is losing small packets as well.
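
The timing argument can be checked with simple arithmetic. The Python sketch below assumes a 10 Mbps network and compares the time needed to put a frame on the wire with the delay before a colliding signal reaches the sender. The function names and the example delay of 60 microseconds are illustrative assumptions, not measured values.

```python
def transmit_time_s(frame_octets, bit_rate_bps=10_000_000):
    """Return the time, in seconds, needed to put a frame on the network."""
    return frame_octets * 8 / bit_rate_bps


def collision_detectable(frame_octets, delay_s, bit_rate_bps=10_000_000):
    """True if the sender is still transmitting when the collision arrives.

    delay_s is the time before the colliding signal reaches the sender.
    """
    return transmit_time_s(frame_octets, bit_rate_bps) > delay_s
```

A minimum-sized 64-octet frame takes 51.2 microseconds to send at 10 Mbps, so with an overlong segment that delays the colliding signal by 60 microseconds, `collision_detectable(64, 60e-6)` is false while `collision_detectable(1518, 60e-6)` is true. This is why small packets are lost silently on a network where large packets report Late Collisions.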

Nonstandard Ethernet Problems

Table 21 lists the symptoms that typically occur if a system violates the Ethernet standard.

Table 21 Symptoms of Common Ethernet Network Problems

Symptoms: FCS Errors and Alignment Errors increase significantly.
Problem: Network cabling is too long.
Notes: If you use a promiscuous network monitor, the number of Late Collisions reported by stations should correlate with the FCS and Alignment Errors reported by the monitor.

Symptoms: FCS and Alignment Errors increase proportionally with interference (sometimes referred to as noise hits).
Problem: Network segment is noisy.
Notes: Typically observed on a 10BASE-T network segment in a noisy environment. If you use multiple promiscuous monitors, the FCS and Alignment Errors among the monitors will not correlate.
If the monitor can track runts, also called Too Short Errors, the number of runt packets should be significantly higher than normal.

Symptoms: FCS and Alignment Errors are much higher than normal.
Problem: Networks do not conform to the access scheme of Carrier Sense Multiple Access with Collision Detect (CSMA/CD).
Notes: Occurs when some implementations of Ethernet in the segment are not entirely compatible with IEEE 802.3 repeaters.
Collision fragments linger on the network long enough to collide with retry packets at the minimum interpacket gap (IPG). The IPG is smaller on one side of the repeated network, causing a lost packet.
Ethernet controllers cannot receive packets that are separated by 4.7 µs or less. Some controllers cannot sustain receptions of packets separated by as much as 9.6 µs. If runt packets are received one after another and are followed by a collision fragment, Ethernet controllers that cannot sustain reception will lose packets.

Receive Discards

Receive Discards indicate that received packets could not be delivered to a high-layer protocol because of congestion or packet errors.

Too Long Errors

A Too Long Error indicates that a packet is longer than 1518 octets (including FCS octets) but otherwise well formed. Too Long Errors are often caused by a bad transceiver, a malfunction of the jabber protection mechanism on a transceiver, or excessive noise on the cable.

Too Short Errors

A Too Short Error, also called a runt, indicates that a packet is fewer than 64 octets long (including FCS octets) but otherwise well formed.
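
The two length limits can be captured in a short Python check. This is only a sketch of the definitions above; the function name `classify_frame_length` is hypothetical, and the lengths are assumed to include the FCS octets.

```python
MIN_FRAME_OCTETS = 64    # shorter frames are runts (Too Short Errors)
MAX_FRAME_OCTETS = 1518  # longer frames are Too Long Errors


def classify_frame_length(octets):
    """Classify an otherwise well-formed frame by its length in octets."""
    if octets < MIN_FRAME_OCTETS:
        return "Too Short Error"
    if octets > MAX_FRAME_OCTETS:
        return "Too Long Error"
    return "OK"
```

A maximum-sized frame that picks up a 4-byte tag becomes 1522 octets long, so `classify_frame_length(1522)` returns a Too Long Error even though the frame itself is sound.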

Transmit Discards

Transmit Discards indicate that packets were not transmitted because of network congestion.

FDDI Ring Errors

Use these sections to identify and correct FDDI ring errors:

• FDDI Ring Errors Overview
• Identifying Ring Errors

Fiber Distributed Data Interface (FDDI) often corrects its own problems. However, because FDDI cannot correct all errors (especially those related to hardware problems), you should monitor FDDI errors.

Understanding the Problem

FDDI ring errors that you should monitor include:

• Elasticity Buffer Error Condition
• Frame Error Condition
• Frames Not Copied Condition
• Link Error Condition

Identifying the Problem

First, determine the type of FDDI ring errors and where they are occurring. Similar to the way you identify other FDDI problems, identify the upstream and downstream neighbors of the devices that you are monitoring.

Several types of network errors can cause FDDI performance problems. For example, problems with cables or physical connections may result in a link or frame error. Elasticity buffer (EB) errors can also lead to link and frame errors.

FDDI deals with port-related errors as follows:

• The variable PORTlerAlarm is the link error rate (LER) value at which a link connection generates an alarm. When the LER is greater than the alarm setting, Station Management (SMT) sends a Status Report Frame (SRF) to notify you that there is a problem with a port.

The PORTlerAlarm threshold is set lower than the PORTlerCutoff threshold so that you are notified of a problem before the port is actually removed from the ring.

• When link errors reach the threshold defined by the variable PORTlerCutoff, SMT breaks the connection, disabling the PHY that detected the problem. A Link Error Condition is also generated.

FDDI deals with MAC-related errors as follows:

• When MAC frame errors reach a certain threshold, a Frame Error Condition is generated. Because the actual error can be further upstream than the immediate connection, the connection remains intact.

• For a large network, the worst-case MACFrameErrorRatio is less than 0.1 percent. However, during network configuration, frame error ratios can reach 50 percent for short periods. When you detect a sustained frame error ratio of more than 0.1 percent, a problem exists between the station that is reporting the condition and the nearest upstream MAC.
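
The threshold logic described above can be sketched in Python. This example treats the link error rate and both thresholds as plain error rates, and it treats "sustained" as meaning that every sample in a monitoring window exceeds the limit. Both are simplifying assumptions, and the function names are hypothetical rather than part of SMT or any management tool.

```python
def port_ler_state(ler, ler_alarm, ler_cutoff):
    """Return "ok", "alarm", or "cutoff" for a port's link error rate."""
    if ler_alarm >= ler_cutoff:
        raise ValueError("the alarm threshold must be below the cutoff")
    if ler >= ler_cutoff:
        return "cutoff"  # SMT breaks the connection
    if ler > ler_alarm:
        return "alarm"   # SMT sends a Status Report Frame
    return "ok"


def frame_error_condition(ratios, limit=0.001):
    """True if every sampled frame error ratio exceeds the limit.

    ratios holds fractions (0.001 is 0.1 percent). Requiring all samples
    to exceed the limit ignores short spikes during ring configuration.
    """
    return len(ratios) > 0 and all(ratio > limit for ratio in ratios)
```

With an alarm threshold of 1e-8 and a cutoff of 1e-7, a link error rate of 5e-8 raises an alarm but leaves the port on the ring.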

Solving the Problem

To solve problems related to FDDI errors, fix the hardware, cabling, or congestion problem.

Identifying Ring Errors

Use Status Watch to monitor your FDDI devices for Warning or Critical alerts.

Status Watch

Use Status Watch to identify FDDI ring errors.

Follow these steps:

1. Monitor the FDDI Status tool for the currently selected device.

2. Determine whether Status Watch is reporting Elasticity Buffer Errors or a high percentage of Frame Errors, Frames Not Copied, or Link Error Rates for the currently selected device.

FDDI Ring Errors Reference

This section provides additional conceptual and problem analysis detail.

Elasticity Buffer Error Condition

The Elasticity Buffer Error condition occurs when a port's elasticity buffer overflows or underflows. This condition usually indicates that a port's hardware is not operating within the tolerances that the FDDI standard specifies. Look for the problem in the hardware of either the port that is reporting the condition or of the immediately adjacent port.

Frame Error Condition

The Frame Error condition occurs when the percentage of frames that contain errors exceeds a preset threshold. When a device is an uplink to FDDI (that is, a device is transmitting onto FDDI), this type of condition indicates that the ring is saturated. The ring is out of buffer space, and packets are being dropped from the device's backbone port.

The problem indicated by the frame errors is usually located between the MAC that reports the condition and its upstream neighbor. Because many physical connections can lie along this path, the MACFrameErrorRatio variable identifies only the two MACs between which the problem is occurring.

Frames Not Copied Condition

The Frames Not Copied condition occurs when the percentage of frames that are dropped because of insufficient buffer space exceeds a preset threshold. This condition indicates that the station is congested and is unable to process frames as quickly as they arrive. To help eliminate congestion:

• Add more capacity to the station.
• Reconfigure your network so that end stations that communicate heavily with one another are on the same bridge or switch.
• Filter out certain traffic.

Link Error Condition

The Link Error condition occurs when a port detects link errors at a rate that exceeds a preset threshold. When the Link Error threshold is exceeded, the station removes itself from the ring and tries to reinsert itself on the ring. This action creates a MAC Neighbor Change Event (which also occurs if a ring wraps).

Link errors may indicate an FDDI PHY hardware problem (such as a faulty transmitter) or a faulty cable or connector. Look for the problem in the portion of the network between the port that is reporting the condition and the first upstream transmitter.

MAC Neighbor Change Event

The MAC Neighbor Change event occurs when a MAC's upstream or downstream neighbor changes.

This event indicates either:

• A network reconfiguration
• Another station that is leaving or joining the ring

Network File Server Timeouts

Use these sections to identify and correct timeouts on network file servers:

• Network File Server Timeout Overview
• Looking for Obvious Errors
• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

See Network File Server Timeouts Reference for additional conceptual and problem analysis detail.

A network file server can time out if your network gets congested or if your server is having problems. Users might have problems downloading data from or to the server or copying files from or to the server. To help you understand the troubleshooting process for this type of problem, an EXAMPLE throughout this section follows the symptoms, analysis, and resolution of a typical file server timeout problem.

Understanding the Problem

When users log in, their stations make network file server calls either to determine quotas (if this feature has been enabled) or to mount user home directories. The network file server timeout messages, even when spread across multiple nodes, indicate a problem either with the network or with a server.

EXAMPLE: UNIX users notice that it takes a long time (over 30 seconds in some cases) to log in to any machine. Some machines report network file server timeout messages, but the messages have no obvious pattern and are infrequent. You begin to get a sense of the problem.

Identifying the Problem

First, rule out the obvious causes. Ask these questions:

• Can you access the network file server with Telnet?
• Have any alarms been triggered?
• Are there any new errors?

The process of identifying the problem is developed in Looking for Obvious Errors.

Solving the Problem

To determine the cause, reproduce the fault while you monitor the network. After you know the cause, you can fix the problem.

The solutions to the network file server timeout are identified in these sections:

• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

Looking for Obvious Errors

To look for obvious errors, use these applications:

• Ping and Telnet - To check connectivity to the network file server nodes
• LANsentry Manager Alarms View - To search for triggered alarms
• LANsentry Manager Statistics View - To look for errors
• LANsentry Manager History View - To identify trends

Ping and Telnet

Determine whether you can contact network file server nodes using Ping and Telnet. If the response is extremely slow, then a problem may exist with the connections to the nodes. No delay indicates that the connections are normal, implying that the delay is occurring elsewhere. In this case, use LANsentry® Manager tools to determine whether packets are being lost or ignored.
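As a rough illustration of the "is the response slow or normal" check, the following Python sketch times a TCP connection to a node. It is a minimal stand-in for a manual Telnet attempt, not a replacement for Ping (which uses ICMP rather than TCP). The function name is hypothetical, and port 23 is assumed because that is the standard Telnet port.

```python
import socket
import time


def probe_tcp(host, port=23, timeout=5.0):
    """Return the TCP connect time in seconds, or None if the attempt fails.

    Assumption: the node accepts TCP connections on the given port
    (23 is the standard Telnet port).
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return None
```

A result of None, or a connect time that is far higher than usual for the network, points at the connection to that node; a quick connect suggests the delay is occurring elsewhere.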

LANsentry Manager Alarms View

Using the LANsentry Manager Alarms View, you can determine if any configured alarms have been triggered.

Search the Alarms View to see if any MAC events have been logged.

EXAMPLE: MAC events have not been logged for the network to which the UNIX users are attached.

Even though no alarms have occurred, errors may exist. For example, a lower rate of background errors may exist just below the alarm threshold. Because RMON alarms are based on maximum and minimum values, they may miss constant, periodic, or low levels of errors.
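The following Python sketch shows why a threshold alarm can stay silent while errors accumulate. The sample values and the threshold are made up for illustration, and the alarm model (fire when any single sample exceeds the threshold) is a simplification of how RMON alarms are configured.

```python
def rising_alarm_fired(samples, threshold):
    """Return True if any single sample exceeds the alarm threshold."""
    return any(count > threshold for count in samples)


# Hypothetical data: 4 errors in every sample, for 60 samples in a row.
background = [4] * 60
alarm_threshold = 5

print(rising_alarm_fired(background, alarm_threshold))  # no alarm
print(sum(background))                                  # errors still add up
```

No individual sample crosses the threshold, so nothing is logged in the Alarms View, yet the steady background rate adds up to a significant number of errors over the period.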

LANsentry Manager Statistics View

Using the LANsentry Manager Statistics View, you can display a multisegment graph of utilization and error statistics.

Follow these steps:

1. Set up a graph that shows utilization and errors on all your major segments.

2. Determine whether any segments are particularly busy or error prone.

EXAMPLE: You notice that one segment of the UNIX network, HUB3, is reporting Too Long Errors and FCS Errors roughly every second sample. While the amount is not higher than normal, it is currently higher than on any other segment.

LANsentry Manager History View

Using the LANsentry Manager History View, you can display a rolling history table to determine if the errors that you are seeing are new. For example, if you have a history table that runs for 30-minute samples over two days, you can compare the most recent sample to a previous sample, looking for new errors. If your probe has the resources, use a much finer resolution sample stored for a shorter time (every 30 seconds for 2 hours) to more easily spot recent errors.

EXAMPLE: You see that the history table shows that no error rates remained constant throughout the day. However, errors that did occur were on the device HUB3 and were Too Long Errors and FCS Errors.
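Comparing the most recent history sample with an earlier one amounts to a simple difference of error counters. The Python sketch below assumes each sample is a dictionary of counter names and values; the counter names and numbers are hypothetical and do not reflect LANsentry Manager's actual data format.

```python
def new_errors(previous, latest):
    """Return the counters that increased between two history samples."""
    return {
        name: latest[name] - previous.get(name, 0)
        for name in latest
        if latest[name] > previous.get(name, 0)
    }


# Hypothetical samples for one segment.
earlier = {"fcs_errors": 2, "too_long_errors": 0, "collisions": 40}
recent = {"fcs_errors": 7, "too_long_errors": 3, "collisions": 40}
print(new_errors(earlier, recent))
```

Counters that did not change are left out, so only the error types that are new or growing remain.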

Reproducing the Fault While Monitoring the Network

Although the RMON View in LANsentry Manager can show error rates and help you to identify the location of the problem, it may not provide enough data to solve the problem. To determine the cause of the problem, reproduce it while you monitor the network by using these applications:

• LANsentry Manager Top-N Graph - To locate a quiet node to use for reproducing the fault

• LANsentry Manager Packet Capture - To capture packets from the hub to which the quiet node is attached

• LANsentry Manager Packet Decode - To analyze the packets to assess network file server traffic and delays

• Address Tracker - To find the location of the problem nodes

EXAMPLE: Using LANsentry Manager, you find a hub on the network with a higher than normal error rate. However, the error rate does not seem high enough to cause login delays of 60 seconds or more.

LANsentry Manager Top-N Graph

Using the Top-N graph in the LANsentry Manager main window, locate a quiet node that has been showing the same problem. Choose a quiet node so that you do not receive excessive traffic when you try to isolate the problem.

EXAMPLE: You see that the node Monolith, which has the same Network File System (NFS) mounts as the other nodes on the network, is quiet. You decide to use this node for reproducing the fault. See Network File System (NFS) Protocol for more information about NFS.

LANsentry Manager Packet Capture

Using the LANsentry Manager Packet Capture application, capture packets from the network using predefined patterns and start-and-stop conditions.

Follow these steps:

1. Set up a capture buffer on a probe that is connected to the same hub as the quiet node. Until you know more about the problem, set a very general filter.

EXAMPLE: You select a MAC-layer filter and set a conversation filter to capture all packets to and from Monolith.

2. Telnet into and log out from the quiet node. Then reset the capture buffer. Repeat this procedure until you see the problem reflected in your captured data. To keep the buffer information clean, reset the buffer each time that you repeat the procedure.

3. When you see the delay, note the rough value of the packet count on the LANsentry Manager packet buffer.

By noting the packet count at which you think the delay has occurred, you can narrow the problem to within about 20 packets in the buffer. If you have used an extremely quiet node, you may even identify the exact packet.

LANsentry Manager Packet Decode

The LANsentry Manager Packet Decode application decodes all major protocols and displays the packet contents at three levels of detail: summary information, header information, and actual packet content.

Follow these steps:

1. Open the buffer in the Packet Decode application and locate the number of the packet at which the delay occurred.

2. Select the packet and launch a MAC-layer conversation filter. In the filter display, look for a gap in the conversation (that is, where the node sent a request and then resent it at approximately the same rate as the delay you experienced when recreating the problem).

3. Repeat the test to determine if the result concentrates on one node or if it appears on other nodes.

EXAMPLE: On the quiet node that you selected, the delay is obvious. You see an NFS request going out to a node and a repeat of the request 30 seconds later. During that time, the node did not respond. You now know that the delay occurred because nodes were not seeing responses for NFS requests. When you repeat the test on other nodes, you find that the delay is happening with more than one destination node.
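Looking for a gap in a conversation can be expressed as a small search over the decoded packets: find a request that is sent again before any reply arrives, and measure the time between the two copies. The Python sketch below assumes each packet has already been reduced to a (timestamp, direction, request ID) tuple. That tuple format and the function name are hypothetical and are not the Packet Decode export format.

```python
def find_retry_gaps(packets, min_gap=1.0):
    """Find requests that were resent without an intervening reply.

    packets: iterable of (timestamp_seconds, direction, request_id), where
    direction is 'request' or 'reply'. Returns a list of (request_id, gap).
    """
    pending = {}  # request_id -> time the unanswered request was last sent
    gaps = []
    for timestamp, direction, request_id in packets:
        if direction == "reply":
            pending.pop(request_id, None)
        elif request_id in pending:
            gap = timestamp - pending[request_id]
            if gap >= min_gap:
                gaps.append((request_id, gap))
            pending[request_id] = timestamp
        else:
            pending[request_id] = timestamp
    return gaps


# Hypothetical capture: request 2 is repeated 30 seconds later.
capture = [
    (0.0, "request", 1), (0.01, "reply", 1),
    (1.0, "request", 2), (31.0, "request", 2), (31.02, "reply", 2),
]
print(find_retry_gaps(capture))
```

A gap that matches the login delay seen while reproducing the fault, as in the 30-second repeat in the example, identifies the request that went unanswered.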


Address Tracker

Use Address Tracker, which polls managed devices, to determine the hubs to which the problem nodes are attached. If the problem end stations are located on unmanaged devices, then you can at least narrow the problem to those unmanaged devices.

EXAMPLE: Although your network does not have managed hubs that Transcend® NCS management software can poll, it does have managed switches. When it polls the switches, Address Tracker displays the switch ports on which addresses were last seen. This information indicates the hub (but not the hub port) on which the device is located.

LANsentry Manager Packet Decode

After you know the location of the hub that has the problem node, monitor the problem from the hub using LANsentry Manager Packet Decode.

Follow these steps:

1. To capture packets from one of the nodes on the hub, set up another capture buffer and repeat the exercise that is described in LANsentry Manager Packet Capture. Because a delay may occur on a different node, use two capture buffers without stopping the first one.

Note the rough packet count where the delay appears.

2. Display a conversation filter of the packet where the delay appears and look for the gap in the conversation.

EXAMPLE: You hope that the nodes are on the same hub. You find that all the nodes are on HUB3. This result indicates that FCS Errors may be causing the timeouts. However, because the errors occur at a low rate, you decide to verify this diagnosis. You monitor the problem from the hub, logging in and out many times, and the delay eventually occurs. This time, the delay shows that the node's reply had an FCS Error, even though the node received the request. The switch would not have transmitted this packet, causing a timeout on the NFS protocol. The retry time is presumably 30 seconds. During this test, you see the problem occurring on another node.

Correcting the Fault

Without a managed hub, you may find it very difficult to discover network file server timeout errors. To find the problematic node, you must either systematically isolate nodes by monitoring each node for a prolonged period or temporarily insert a managed hub.

EXAMPLE: You notice that the captured error packet failed FCS because it was corrupted by a regular pattern during transmission. A possible reason for this occurrence is a Jabbering node. This explanation makes sense because FCSJabber frames increased linearly when you were monitoring the live network.

Network File Server Timeouts Reference

This section explains terms that are relevant to network file server timeouts and provides additional conceptual and problem analysis detail.

Jabbering

A condition in which a node transmits illegal-length packets and is possibly not operating within carrier specifications. In effect, another node has written bad data over a valid packet. This bad data is often interpreted as a repeated sequence of data.

Network File System (NFS) Protocol

A distributed file system protocol developed by Sun Microsystems that allows a computer system to access files over a network as if they were on its local disks. This protocol has been incorporated into products by more than 200 companies. It is now a de facto Internet standard.

NFS is one protocol in the NFS suite of protocols, which includes NFS, RPC, XDR (External Data Representation), and others. These protocols are part of a larger architecture that Sun Microsystems refers to as Open Network Computing (ONC). ONC is a distributed applications architecture designed by Sun and currently controlled by a consortium led by Sun.

Troubleshooting TCP/IP - Detailed Steps

This article shows how to troubleshoot TCP/IP connectivity between computers on a Windows network. If you haven't already done so, disable XP's Internet Connection Firewall on all local area network connections, and remove all firewall programs on the network. Improperly configured firewalls are the most common cause of TCP/IP problems.

Open a Command Prompt Window

For many of these steps, you'll be typing at the command prompt. To open a command prompt window in Windows 2000 or XP, click Start | Run, type cmd in the box, and click OK. To open a command prompt window in Windows 95, 98, or Me, click Start | Run, type command in the box, and click OK. Type one command per line, and press Enter after each one to execute it. To close the command prompt window, use the exit command.

Determine the TCP/IP Settings

Determine the TCP/IP settings of each computer on the local area network. In XP, open the Network Connections folder, right click the LAN connection, and click Status | Support | Details. For example, here are the Status and Details views for the LAN connection on an Internet Connection Sharing host.

In Windows 95/98/Me, click Start | Run, type winipcfg in the box, and click OK. Select the LAN adapter from the menu, and click More Info. Here's the winipcfg view for an ICS client running Windows Me.

You can also see the TCP/IP settings from the command prompt. This is especially convenient if a computer has more than one network adapter. Use the ipconfig /all command, which is available in all versions except Windows 95. The output from this command can be long, so it's best to write it to a file. Specify the file name in the command this way:

ipconfig /all >ipconfig.txt

Description of TCP/IP Settings

Here are the TCP/IP settings that are used in network troubleshooting:

• IP Address - Unique address assigned to a network adapter. A computer with multiple network adapters has an IP address for each one, and each one must be in a different subnet.

• Subnet Mask - Used in conjunction with the IP address to determine which subnet an adapter belongs to. At the simplest level, communication is only possible between two network adapters when they're in the same subnet.

• Default Gateway - IP address of a computer or router, on one of this computer's local area networks, that knows how to communicate with subnets not present on this computer. For an Internet connection, the default gateway is a router belonging to your Internet service provider, and all access to sites on the Internet goes through it. For an ICS client, the default gateway is the ICS host. If you use a hardware router, it serves as the default gateway.

• DHCP Server - If an adapter is configured to obtain an IP address automatically, this is the address of the server that provides it. It could be your ISP, an ICS host, or a hardware router.

• DNS Servers - IP address of one or more Domain Name Server computers. DNS servers translate Internet names to their IP addresses (like 63.146.109.227).
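The way the subnet mask combines with an IP address can be checked with Python's standard ipaddress module. This is a minimal sketch: the function name and the sample addresses and mask are illustrative assumptions, not values taken from a real network.

```python
import ipaddress


def same_subnet(ip_a, ip_b, netmask):
    """Return True if both addresses fall in the same subnet for this mask."""
    # strict=False lets us pass a host address and have the network derived.
    net_a = ipaddress.ip_network(f"{ip_a}/{netmask}", strict=False)
    net_b = ipaddress.ip_network(f"{ip_b}/{netmask}", strict=False)
    return net_a == net_b


# Hypothetical adapters with a 255.255.255.0 mask.
print(same_subnet("192.168.1.101", "192.168.1.123", "255.255.255.0"))
print(same_subnet("192.168.1.101", "192.168.2.5", "255.255.255.0"))
```

The first pair shares a subnet and can communicate directly. The second pair does not, so traffic between them would have to go through the default gateway.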

Subnets

See our article on subnets for a brief description of how they work. For more details, see this Microsoft Knowledge Base article.

If two computers are supposed to be on the same subnet but aren't, something is wrong with the network hardware or software configuration. This is most likely to happen when one of them receives an IP address of 169.254.x.x, which indicates that:

• It's configured to obtain an IP address automatically.
• It couldn't find a DHCP server on the network to make the assignment.
• Windows assigned it an Automatic Private IP Address.
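An address in the 169.254.x.x range can be recognized programmatically. The Python sketch below uses the standard ipaddress module and tests membership in 169.254.0.0/16; the function name and the sample addresses are hypothetical.

```python
import ipaddress

# The Automatic Private IP Address range described above.
APIPA_RANGE = ipaddress.ip_network("169.254.0.0/16")


def is_automatic_private(address):
    """Return True if the address is in the 169.254.x.x range."""
    return ipaddress.ip_address(address) in APIPA_RANGE


print(is_automatic_private("169.254.10.20"))
print(is_automatic_private("192.168.1.101"))
```

An adapter whose address matches this range should be checked for a missing or unreachable DHCP server before any other troubleshooting.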

See our article on Specific Networking Problems and Their Solutions for more information.

Pinging

The ping command is the basic tool for testing TCP/IP connectivity. It sends a special packet (called ICMP Echo) to a particular IP address and looks for a reply. If everything is working right, the reply comes back. If not, the ping times out in a few seconds. By default, the ping command repeats the process four times. Here's an example of an ICS client computer pinging a Windows XP Home Edition ICS host, using the host's IP address and its computer name.

When ping fails, you'll see one of these error messages:

• Request timed out - The IP address is valid, but there's no reply from it. If the IP address is on a local area network, the most likely cause is a firewall program blocking the ping.

• Unknown host <name> or Ping request could not find host <name> - The computer name doesn't exist on the local area network. Make sure that NetBIOS over TCP/IP is enabled.

• Destination host unreachable - The IP address isn't on a local area network, and the default gateway can't access it. Either there's no default gateway, its address is wrong, or it isn't functioning.
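The three error messages above can be turned into a small lookup that suggests what to check next. In this Python sketch, the matched phrases come from the messages listed above, while the function name and the wording of the hints are assumptions. Real ping output varies by Windows version and language, so treat this as illustrative only.

```python
# (phrase to look for, what it most likely means), checked in order.
PING_HINTS = [
    ("request timed out",
     "Address is valid but not replying; check for a firewall blocking ping."),
    ("unknown host",
     "Name not found; make sure NetBIOS over TCP/IP is enabled."),
    ("could not find host",
     "Name not found; make sure NetBIOS over TCP/IP is enabled."),
    ("destination host unreachable",
     "No route; check that the default gateway exists, is correct, and works."),
]


def interpret_ping_output(output):
    """Return a hint for the first known error phrase, or None if none match."""
    text = output.lower()
    for phrase, hint in PING_HINTS:
        if phrase in text:
            return hint
    return None


print(interpret_ping_output("Request timed out."))
```

Output that contains none of the phrases returns None, which covers both a successful ping and any message this sketch does not know about.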


Pinging the Local Area Network

Here is a series of ping commands to use in finding where a problem occurs on a local area network. Run them in the order shown, and don't go on to the next command until all of the previous commands work properly. In this example:

• The computer being tested is named Winxp, with IP address 192.168.1.101.
• There's another computer on the network, named Win98, with IP address 192.168.1.123.

Substitute the appropriate IP addresses and computer names for your network.

Command               Target                          What Ping Failure Indicates
ping 127.0.0.1        Loopback address                Corrupted TCP/IP installation
ping localhost        Loopback name                   Corrupted TCP/IP installation
ping 192.168.1.101    This computer's IP address      Corrupted TCP/IP installation
ping winxp            This computer's name            Corrupted TCP/IP installation
ping 192.168.1.123    Another computer's IP address   Bad hardware or NIC driver
ping win98            Another computer's name         NetBIOS name resolution failure
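The rule "run the commands in order and stop at the first one that fails" can be sketched in Python as follows. The targets and meanings mirror the example table above and should be replaced with your own addresses and names. The helper names are hypothetical. The system_ping function relies on the ping command's exit code, which is an assumption: some Windows versions report success even when the reply is "Destination host unreachable".

```python
import platform
import subprocess

# (target, what a failure indicates), in the order they should be tried.
LAN_CHECKS = [
    ("127.0.0.1", "Corrupted TCP/IP installation"),
    ("localhost", "Corrupted TCP/IP installation"),
    ("192.168.1.101", "Corrupted TCP/IP installation"),
    ("winxp", "Corrupted TCP/IP installation"),
    ("192.168.1.123", "Bad hardware or NIC driver"),
    ("win98", "NetBIOS name resolution failure"),
]


def system_ping(target):
    """Send one echo request with the operating system's ping command."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    result = subprocess.run(
        ["ping", count_flag, "1", target],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def first_failure(checks, ping=system_ping):
    """Return (target, meaning) for the first failing check, or None."""
    for target, meaning in checks:
        if not ping(target):
            return target, meaning
    return None
```

Calling first_failure(LAN_CHECKS) on the computer being tested returns the first step that fails, so later steps are not interpreted until the earlier ones work. The ping argument can be replaced with any function that returns True or False, which makes the logic easy to try without touching the network.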

To fix a corrupted TCP/IP installation on Windows XP, follow the steps in this Microsoft Knowledge Base article. For Windows 95/98/Me, un-install the TCP/IP protocol in Control Panel | Network, reboot, and re-install it. If that doesn't fix it, use this procedure on Windows 95 or 98.

Pinging the Internet

You can also use ping to find a problem with Internet access. Run these commands in the order shown, and don't go on to the next command until all of the previous commands work properly. Use the Default Gateway and DNS Server addresses that you got from the winipcfg or ipconfig /all command.

Command                  Target                What Ping Failure Indicates
ping w.x.y.z             Default Gateway       Default Gateway down
ping w.x.y.z             DNS Server            DNS Server down
ping w.x.y.z             Web site IP address   Internet service provider or web site down
ping www.something.com   Web site name         DNS Server down or web site down

REFERENCE

WWW.PRACTICALLYNETWORK.COM

WWW.COMPUTERHOPE.COM

WWW.WIKIPEDIA.ORG


Page 23: 11591337-Basic-Network-Troubleshooting

Consider including the following information on your network map:

• Location of important devices and workgroups (by floor, building, or area)
• Location of the network backbone, data center, and wiring closets, as appropriate for your network
• Location of your network management stations
• Location and type of remote connections
• IP subnetwork addresses for all managed switches and hubs
• Other subnetwork addresses, such as Novell IPX and AppleTalk, if appropriate for your network
• Type of media (by actual name, such as 10BASE-T, or by grouping, such as Ethernet), which you can show with callouts, colors, line weights, or line styles
• Virtual workgroups, which you can show with colors or shaded areas
• Redundant links, which you can indicate with gray or dashed lines
• Types of network applications that are used in different areas of your network
• Types of end stations that are connected to the switches and hubs

Logical Connections

With the advent of virtual LANs (VLANs), you need to know how your devices are connected logically as well as physically. For example, if you have connected two devices through the same physical switch, you might assume that they can communicate with each other. However, the devices can be in separate VLANs that restrict their communication.

Knowing the setup of your VLANs can help you to quickly narrow the scope of a problem to a VLAN instead of to a network connection.

The Transcend NCS application Enterprise VLAN Manager allows you to view the logical makeup of your network. Depending on the complexity of your network and VLAN configurations, you can use colors to show the VLANs graphically on your network map.

Device Configuration Information

Maintain online and paper copies of device configuration information. Make sure that all online data is stored with your site's regular data backup. If your site does not have a backup system, copy the information onto a backup disc (CD, Zip disk, and the like) and store it offsite.

Follow these guidelines for saving configuration information:

• Because the easiest way to recover a device's configuration is to use FTP or TFTP, save the configuration settings of each device that supports this method of uploading.

• For other devices, Telnet in and save the session (which contains configuration details) to a file. If you cannot print the configuration of a device, then create a quick rebuild guide that explains the quickest way to configure the device from a fresh install.

• For devices that store information to diskette, store this data as part of your site's regular backup.

• For routers and other important devices with text configuration files, store this data online in a revision control system. Keep the most recent version on paper. Keep previous versions.

• For PCs, keep a recovery disk for each type of PC. For any device that you use as a server, store all startup scripts and copies of registries.

Other Important Data About Your Network

For a complete picture of your network, have the following information available:

• All passwords - Store passwords in a safe place. Keep previous passwords in case you restore a device to a previous software version and need to use the old password that was valid for that version.

• Device inventory - The inventory allows you to see the device type, IP address, ports, MAC addresses, and attached devices at a glance. Software tools such as Transcend Central can help you keep track of the 3Com devices on your network. Using Transcend Central, you can group devices by type and location and have this information on hand for troubleshooting.

• MAC address-to-port number list - If your hubs or switches are not managed, you must keep a list of the MAC addresses that correlate to the ports on your hubs and switches. Generate and keep a paper copy of this list, which is crucial for deciphering captured packets, using Address Tracker.

• Log book - Document your interactions, no matter how trivial, with each device that is critical to your network's operation (that is, routers, remote access devices, security servers). For example, document that you noticed a fan making noise one morning. Your note may help you to identify why a device is over temperature a week later (because the fan stopped working).

• Change control - Maintain a change control system for all critical systems. Permanently store change control records.

• Contact details - Store, online and on paper, the details of all support contracts, support numbers, engineer details, and telephone and fax numbers.

• LANsentry Reporter - Use LANsentry Reporter to generate reports from the database.

Identifying Your Network's Normal Behavior

By monitoring your network over a long period, you begin to understand its normal behavior. You begin to see a pattern in the traffic flow, such as which servers are typically accessed, when peak usage times occur, and so on. If you are familiar with your network when it is fully operational, you can be more effective at troubleshooting problems that arise.

Baselining Your Network

You can use a baseline analysis, which is an important indicator of overall network health, to identify problems. A baseline can serve as a useful reference of network traffic during normal operation, which you can then compare to captured network traffic while you troubleshoot network problems. A baseline analysis speeds the process of isolating network problems.

By running tests on a healthy network, you compile normal data to compare against the results that you get when your network is in trouble. For example, ping each node to discover how long it typically takes to receive a response from devices on your network.
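The comparison against a baseline can be sketched in a few lines of Python. This is only an illustration: the function names, the sample response times, and the rule of flagging anything more than three standard deviations above the mean are assumptions, not part of any 3Com tool.

```python
from statistics import mean, stdev

def build_baseline(samples_ms):
    """Summarize response times (in ms) collected while the network is healthy."""
    return {"mean": mean(samples_ms), "stdev": stdev(samples_ms)}

def is_abnormal(baseline, rtt_ms, tolerance=3.0):
    """Return True if a response time is more than `tolerance` standard
    deviations above the baseline mean (an assumed rule of thumb)."""
    return rtt_ms > baseline["mean"] + tolerance * baseline["stdev"]

# Hypothetical ping response times for one node on a healthy network.
healthy = [1.9, 2.1, 2.0, 2.2, 1.8]
baseline = build_baseline(healthy)

print(is_abnormal(baseline, 2.3))   # close to the usual response time
print(is_abnormal(baseline, 9.5))   # far slower than the baseline
```

In practice you would keep one baseline per node and refresh it whenever the network changes significantly.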

Applications such as Status Watch, Address Tracker, LANsentry Manager, and Traffix Manager allow you to collect days and weeks of data and set a baseline for comparison. Through the reporting mechanisms in the following list, you can continuously assess the data from your network and ensure that its performance is optimal:

• Web Reporter generates daily or weekly reports from data collected by Status Watch.

• Traffix Manager generates weekly reports from collected data and calculates the baselines for you. Set up Utilization History and Error History reports with data resolution set to Weekly.

• LANsentry Manager History View generates daily utilization graphs, which are sampled every 30 minutes for each day over one week. Use these graphs to calculate your network baselines manually.

Identifying Background Noise

Know your network's background noise so that you can recognize real data flow. For example, one evening after everyone is gone, no backups are running, and most nodes are on, analyze the traffic on your network using the Traffix Manager application. The traffic that you see is mostly broadcast and multicast packets. Any errors that you see are the result of faulty devices (trace). This traffic is the background noise of your network - traffic that occurs for little value. If background noise is high, redesign your network.

FDDI Connectivity

Use these sections to identify and correct connectivity errors on an FDDI ring:

• FDDI Connectivity Overview
• Monitoring FDDI Connections
• Making Your FDDI Connections More Resilient

Fiber Distributed Data Interface (FDDI), which is a self-correcting technology, automatically corrects ring faults to maintain connectivity throughout most of the network. However, you should monitor your FDDI connections for wrapped rings and other problems with ring connectivity.

Understanding the Problem

As shown in Figure 9, in a thru FDDI LAN, no stations on the trunk ring have a Configuration State (SMTConfigurationState) of Wrap or Isolated. However, users who complain about network performance may have lost connectivity to other stations on the network because the FDDI network is wrapped or segmented.

Figure 9 Thru Ring

Wrapped ring

By monitoring the Peer Wrap Condition, you can see when the Configuration State changes. In a wrapped ring (Figure 10), two stations on the LAN are in a wrapped Configuration State. This condition may or may not affect the connectivity of certain stations. Although operational, your network may have a cabling problem or a problem with a link.

Figure 10 Wrapped LAN

Segmented ring

In a segmented ring (Figure 11), more than two stations are wrapped on the trunk ring. Although this mode of operation is a valid FDDI LAN configuration, your LAN is probably experiencing a degraded or degrading condition.

Figure 11 Segmented Ring
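The distinction between thru, wrapped, and segmented rings can be summarized as a small Python sketch. The function name and the list-of-strings input are hypothetical, and the "Indeterminate" result for cases the text does not describe (for example, exactly one wrapped station) is an assumption.

```python
def classify_ring(config_states):
    """Classify a trunk ring from the Configuration State of each station.

    `config_states` is a list of strings such as "Thru", "Wrap", or
    "Isolated" (one entry per trunk ring station).
    """
    wrapped = sum(1 for state in config_states if state == "Wrap")
    isolated = sum(1 for state in config_states if state == "Isolated")

    if wrapped == 0 and isolated == 0:
        return "Thru"            # no station is in Wrap or Isolated state
    if wrapped == 2:
        return "Wrapped"         # two stations are wrapped
    if wrapped > 2:
        return "Segmented"       # more than two stations are wrapped
    return "Indeterminate"       # case not covered by the description above

print(classify_ring(["Thru", "Thru", "Thru"]))
print(classify_ring(["Wrap", "Thru", "Wrap"]))
print(classify_ring(["Wrap", "Wrap", "Wrap", "Wrap"]))
```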

When a network connection has excessively high link errors, Station Management (SMT) shuts down the connection and tries to reconnect. A dual-attachment trunk ring station with an A or B connection that is shut down is one of the wrap points in the network. See Making Your FDDI Connections More Resilient for information about keeping a dual-attachment station connection from wrapping.

Isolated station

Sometimes a network wraps a particular station out of the ring. Stations on either side of a problem station can be wrapped. This effectively isolates the station or links that have problems, as shown in Figure 12.

Figure 12 Wrapped Ring with Isolated Station

If a ring was already wrapped when a network wraps a station out of the ring, then a segmented ring results, as shown in Figure 13.

Figure 13 Segmented Ring with Isolated Stations

Twisted ring

In a twisted ring, an A port is connected to an A port and a B port is connected to a B port, instead of the normal A-to-B connections. A twisted ring, which always has two twist points (stations), can exist in either a Thru or Wrap state. You can monitor the Twisted Ring Condition and Undesired Connection Attempt Event for evidence of twisted ring and other connection problems.

Identifying the Problem

To identify the problem, follow this process:

1. At the FDDI LAN level, verify that your network is operating.

If the network is operating, the FDDI ring may be segmented, and therefore an FDDI station or an Ethernet station on an Ethernet link may have lost connectivity to other nodes on the network.

2. Determine if a ring is in a Thru, Wrap, or Segmented state.

If the FDDI ring is segmented or wrapped, look for a problem with a link somewhere in the network or for a nonfunctioning node on your trunk ring. If the ring is operating and is not segmented, or if it is segmented but you still have connectivity to the stations in question, move to a more specific level in your network.

See Monitoring FDDI Connections for more information.

3. Determine if the poorly performing station is an Ethernet or FDDI station.

If the problem is an FDDI station, find out if it is congested (that is, if the station is so busy that it cannot accept all the network traffic that is directed to it) by determining its Bandwidth Utilization. Also determine if the station has a high frame error rate by looking at the FDDI Ring Errors.

If the problem is an Ethernet station, look for congestion by examining Ethernet Packet Loss and Bandwidth Utilization.

Solving the Problem

Identify the station that is causing the disconnection and take the appropriate steps:

• If the disconnection is caused by a wrapped ring, then fix the hardware or cabling problem at that station.

• If the station is congested, you have a device problem rather than a network problem. For example, if the congested station is a file server and every other machine on the network is retrieving and saving files using that server, consider upgrading your server or adding additional servers to the network. A variety of devices from different vendors may be communicating on an FDDI or Ethernet network; some are faster and more capable, and some are slower and more prone to congestion.

• If the station is an Ethernet station that is attached to an Ethernet segment, reevaluate the setup of your Ethernet network and make some changes to improve its performance.

You can also make FDDI connections more resilient by implementing dual homing or installing an Optical Bypass Unit (OBU) where FDDI connections are prone to fail. See Making Your FDDI Connections More Resilient for more information.

Monitoring FDDI Connections

Monitor your FDDI devices for Warning or Critical alerts in the FDDI Status tool.

Status Watch

Use Status Watch to identify these FDDI connectivity errors:

• Peer Wrap Condition
• Twisted Ring Condition
• Undesired Connection Attempt Event

Follow these steps:

1. In the Device area, select the device that is located where you suspect an FDDI ring connectivity problem.

2. Monitor the FDDI Status tool for the currently selected device.

Here are some pointers for monitoring:

• If the Peer Wrap Configuration State variable is Isolated, the device is not connected to the FDDI trunk ring. If you intend the device to remain isolated, this indication is not a serious condition. However, if the device is supposed to be connected on a trunk ring, a serious problem may exist. The device is no longer transmitting packets to the larger trunk ring.

• If the Peer Wrap flag (SMTPeerWrapFlag) is set, the device is one of the wrap points. The cause of the wrapped ring is somewhere in the portion of the network between the two stations that report the peer wrap condition.

Making Your FDDI Connections More Resilient

When devices are removed from an FDDI ring, there is a break in the fiber path that causes the ring to wrap until the ring is made whole again. To prevent the break in the FDDI connection, you can implement dual homing or install an Optical Bypass Unit (OBU).

Implementing Dual Homing

When the operation of a dual attachment node is critical to your network, dual homing adds reliability by providing a backup connection if the primary link fails. Because a dual attachment station (DAS) has two attachments to the FDDI ring (A-to-M and B-to-M), you can use one of them as a standby link if the active link fails. Using dual homing, only one of the two attachments is active at a time. In this sense, a DAS acts as if it is a single attachment station (SAS) by using its A port as the standby link.

Through SMT, a DAS can be dual homed to the same concentrator or, more commonly, to two concentrators. This arrangement provides a more stable trunk ring of concentrators. If one concentrator fails, the DAS enables the standby link to another concentrator to become the active link. See Figure 14.

If the station is a dual path or dual path/dual MAC station, you can configure the dual-homed station in one of two ways:

• With both links active
• With one link active and one connection withheld as a backup, only becoming active when one link fails

Figure 14 Dual Homing Configuration
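The active/standby behavior of a dual-homed station can be modeled with a short Python sketch. The class and method names are hypothetical, and the model assumes the configuration described above in which the B port carries the active link and the A port is the standby; it illustrates the failover idea only, not SMT itself.

```python
class DualHomedStation:
    """Toy model of a dual-homed DAS with one active and one standby link."""

    # Ports in order of preference: B is used first, A is the standby.
    PREFERENCE = ("B", "A")

    def __init__(self):
        # True means the link on that port is currently healthy.
        self.link_ok = {"A": True, "B": True}

    def fail(self, port):
        """Mark the link on `port` as failed."""
        self.link_ok[port] = False

    def active_port(self):
        """Return the port carrying traffic, or None if both links are down."""
        for port in self.PREFERENCE:
            if self.link_ok[port]:
                return port
        return None

station = DualHomedStation()
print(station.active_port())   # B while the primary link is healthy
station.fail("B")
print(station.active_port())   # A takes over as the active link
```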

Installing an Optical Bypass Unit

You can insert an Optical Bypass Unit (OBU) into the FDDI ring as if it were a node and then plug your device into it. To use an OBU, your device needs an optical bypass interface. This interface lets the bypass know whether your device is still on the ring. See Figure 15.

If your device is removed or if it fails, the bypass unit diverts the optical path away from your device, keeping the ring whole. You can use a bypass on devices that are prone to failure or are likely to be removed often, such as diagnostic equipment.

Figure 15 Optical Bypass Unit Configuration

FDDI Connectivity Reference

This section explains terms that are relevant to FDDI connectivity and provides additional conceptual and problem analysis detail.

Peer Wrap Condition

A Peer Wrap (wrapped ring) condition occurs when a dual-attachment station detects a fault (often a lost connection) and reconfigures the network by wrapping the dual trunk rings to form a single ring. Normally, the two stations that are adjacent to the fault wrap to maintain full connectivity. However, if a second fault occurs before the first is repaired, the network partitions itself into two or more rings and stations lose connectivity.

When a station reports a Peer Wrap condition, locate and repair the problem that caused the station to wrap the rings. Potential causes include:

• Faulty FDDI port hardware
• Faulty cables or connectors
• Unplugged connectors
• Powered-down stations

You can expect to find the cause of the problem between the two stations that report the Peer Wrap condition.

Twisted Ring Condition

A Twisted Ring condition occurs when certain undesirable connection types exist. See Table 7 for more information. Although similar to the Undesired Connection Attempt, the Twisted Ring condition provides specific Station Management (SMT) and port information for diagnosis.

Undesired Connection Attempt Event

An Undesired Connection Attempt event occurs when a port tries to connect to another port of a type that may result in an undesirable network topology. Whether the connection attempt is successful depends on the current setting of the station's connection policies.

Table 7 lists connections that the FDDI standard defines as undesirable. The managed devices may or may not permit these connections, depending on their FDDI station configurations.

Table 7 Undesirable Connection Types

Connection Type    Reason That the Connection Is Undesirable
A-A                Twisted primary and secondary rings
A-S                A wrapped ring
B-B                Twisted primary and secondary rings
B-S                A wrapped ring
S-A                A wrapped ring
S-B                A wrapped ring
M-M                A tree of rings topology (illegal connection)

Table 8 lists FDDI connections that create valid topologies.

Table 8 Valid Connection Types

Connection Type    Reason That the Connection Is Valid
A-B                A normal trunk ring peer connection
A-M                A tree connection with possible redundancy. In a single MAC node, Port B has precedence (by default) for connecting to a Port M
B-A                A normal trunk ring peer connection
B-M                A tree connection with possible redundancy. In a single MAC node, Port B has precedence (by default) for connecting to a Port M
S-S                A single ring of two slave stations
S-M                A normal tree connection
M-A                A tree connection that provides possible redundancy
M-B                A tree connection that provides possible redundancy
M-S                A normal tree connection
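Tables 7 and 8 together cover all 16 combinations of the A, B, S, and M port types, so they can be encoded as a simple lookup. The following Python sketch is illustrative only; the dictionary and function names are hypothetical, and the reason strings are shortened from the tables.

```python
# Connection types from Table 7 (undesirable) and Table 8 (valid).
UNDESIRABLE = {
    "A-A": "Twisted primary and secondary rings",
    "A-S": "A wrapped ring",
    "B-B": "Twisted primary and secondary rings",
    "B-S": "A wrapped ring",
    "S-A": "A wrapped ring",
    "S-B": "A wrapped ring",
    "M-M": "A tree of rings topology (illegal connection)",
}
VALID = {
    "A-B": "A normal trunk ring peer connection",
    "A-M": "A tree connection with possible redundancy",
    "B-A": "A normal trunk ring peer connection",
    "B-M": "A tree connection with possible redundancy",
    "S-S": "A single ring of two slave stations",
    "S-M": "A normal tree connection",
    "M-A": "A tree connection that provides possible redundancy",
    "M-B": "A tree connection that provides possible redundancy",
    "M-S": "A normal tree connection",
}

def check_connection(local_port, remote_port):
    """Return ("valid" or "undesirable", reason) for a pair of port types."""
    key = f"{local_port}-{remote_port}"
    if key in UNDESIRABLE:
        return ("undesirable", UNDESIRABLE[key])
    if key in VALID:
        return ("valid", VALID[key])
    raise ValueError(f"unknown port types: {key}")

print(check_connection("A", "A"))
print(check_connection("B", "M"))
```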

Disabling the Offending Interface

Because broadcast storms can ultimately cause your whole network to become unavailable, take action immediately to disable the offending interface. You can enable the interface again after you have corrected the problem.

Address Tracker

Use Address Tracker to locate the interface that is causing the broadcast storm. Use Device View to disable the port.

Follow these steps:

1. In the Find Address window, enter the address of the interface that seems to be receiving the broadcast traffic.

You can copy the MAC or IP address from the Status Watch report and paste it into Address Tracker's Enter the Address You Want to Find field.

2. Click Find Now.

Search displays the device name.

3. Use Transcend Central to launch Device View and disable the port.

Disabling the port stops the broadcast storm before it interferes with all vital network traffic. You can re-enable this interface later using Device View or the device's console.

Correcting Spanning Tree Misconfigurations

Spanning Tree does not cause broadcast storms, but a loop in your Spanning Tree topology can create data that looks like a storm. A loop can occur in your topology if:

• Someone disables Spanning Tree on a port
• You set up your Spanning Tree configuration incorrectly

Device View

Use Device View to disable any Spanning Tree port that has a repeater attached to it and to correct Spanning Tree misconfigurations.

To correct Spanning Tree misconfigurations, use Device View to disable Spanning Tree Protocol (STP) for a port on a SuperStack® II Switch 1000, Switch 3000, Switch 3000 10/100, Switch 9000SX, Desktop Switch, LinkBuilder® FMS II Bridge/Management Module, or CoreBuilder 6000.

To disable the STP port state for a port on a SuperStack II switch:

1. Select a port and click the right mouse button.

2. From the shortcut menu, select Configure.

3. In the Port section, click the STP tab.

4. From the STP Port State list box, select Disabled.

5. Click Apply.

To disable the STP port state for a port on a LinkBuilder FMS II Bridge/Management Module:

1. Double-click the module.

2. From the shortcut menu, select Configure Bridge.

3. In the Port section, click the STP tab.

Broadcast Storms Reference

This section explains terms that are relevant to broadcast storms and provides additional conceptual and problem analysis detail.

Broadcast Packets

Broadcast packets, which are a normal part of network operation, are transmitted by a device to a broadcast address. For example, IP networks use broadcasts to resolve network addresses using Address Resolution Protocol (ARP). IPX networks use a large number of broadcast packets to operate most effectively.

Problems arise when broadcast packets endlessly propagate throughout the network, which increases the traffic volume on your network and the CPU time that each host spends processing and discarding unwanted broadcast packets.

Multicast Packets

Multicast packets, which are a normal part of network operation, are transmitted by a device to a multicast group address. Hosts that want to receive the packets indicate that they want to be members of the multicast group, and then multicast packets are distributed to that group. For example, multicast packets support the Spanning Tree Protocol. Multicast applications and underlying multicast protocols control multimedia traffic and shield hosts from processing unnecessary broadcast traffic. However, multicast traffic can also cause storms that saturate your network.

Duplicate Addresses

Use these sections to identify and correct problems caused by duplicate MAC and IP addresses:

• Duplicate Addresses Overview
• Finding Duplicate MAC Addresses
• Finding Duplicate IP Addresses

See Duplicate Addresses Reference for additional conceptual and problem analysis detail.

Networks sometimes generate duplicate MAC and IP addresses. Because duplicate addresses can cause problems with packet delivery, resolve them as soon as possible.

Understanding the Problem

Duplicate MAC addresses are caused by data link layer problems with Fiber Distributed Data Interface (FDDI) media and the passing of tokens on the FDDI ring. Duplicate IP addresses are caused by network layer problems. See these sections for more information about causes of duplicate addresses:

• Duplicate MAC Addresses
• Duplicate IP Addresses

Identifying the Problem

Identify duplicate MAC and IP addresses by following the instructions in these sections:

• Finding Duplicate MAC Addresses
• Finding Duplicate IP Addresses

Solving the Problem

Identify the cause of the duplicate address (such as user error or a hardware problem) and fix the problem if possible.

Finding Duplicate MAC Addresses

To find out if duplicate MAC addresses are occurring, monitor your network using Status Watch.

Status Watch

The Status Watch FDDI Status tool identifies duplicate FDDI MAC addresses, and Status Watch reports when two or more MACs on the same ring have the same MAC address (a Duplicate Address condition).

Follow these steps:

1. In the Status Watch Summary View window, determine if any FDDI Status conditions are reported. If there are, double-click the table cell value to display the Device List window.

Another approach is to examine only the devices that you know reside on your FDDI ring. In the Status Watch main window, red device icons indicate that a threshold has been exceeded.

2. Select a device.

• If you selected the device from the Device List window, the real-time report for that device appears in the Status Watch main window.

• If you selected the device from the main window, also select the FDDI Status tool to view the real-time report.

3. Determine if a Duplicate Address condition caused the FDDI Status tool to trigger a Critical or Warning status for that device.

In Status Watch, you can specify the status severity level to apply to a Duplicate Address condition.

Finding Duplicate IP Addresses

To find out if duplicate IP addresses are occurring, monitor your network using these applications:

• Address Tracker - To find duplicate IP addresses on 3Com devices and their attached networks

• LANsentry Manager® - To find duplicate IP addresses that are collected by probes gathering RMON2 SmartAgent® data from the Enterprise Communications Analysis Module (ECAM) downloaded on your network devices

Address Tracker

Use Address Tracker to determine when and where duplicate IP addresses occur.

Follow these steps:

1. From the Find Address menu, select Find Duplicate IP Addresses.

2. Click Find Now to start your search.

LANsentry Manager

Use the Duplicates table in LANsentry Manager to compile a list of all stations with duplicate IP addresses. This table is available only on probes that have downloaded RMON2 (ECAM) SmartAgent software.

Follow these steps:

1. From the LANsentry Manager Address Map menu, select Duplicates. Address Map data is displayed as a table.

2. To export the contents of the table, click Export to launch the Data Export dialog box.

Duplicate Addresses Reference

This section explains terms that are relevant to duplicate addresses and provides additional conceptual and problem analysis detail.

Duplicate MAC Addresses

Each device on your network has a unique MAC address. This address identifies a single device on the network, allowing packets to be delivered to correct destinations.

Packets are delivered to their destinations by means of the MAC-address-to-IP-address translation that the Address Resolution Protocol (ARP) provides. Therefore, if MAC addresses are duplicated on the network, ARP caches of routing devices contain erroneous destinations. In FDDI, devices monitor network traffic, searching for their own MAC address in each packet to determine whether to decode the packet. If MAC addresses are not unique, two stations cannot be distinguished from each other.

Duplicate MAC addresses can occur for the following reasons:

• Someone has manually configured a MAC address for a device, instead of using the address that the vendor supplied or allowing it to be assigned dynamically, and this address is also assigned to a different device.

• In rare circumstances, loops in a bridged network can cause a MAC hardware problem or an address learning problem that creates a duplicate MAC address entry in the bridging address table.

• On DECnet Phase 4 networks, MAC addresses are set from the DECnet address. A duplicate DECnet address can cause a duplicate MAC address.

Duplicate IP Addresses

Because IP addresses are critical for transmission of packets on TCP/IP networks, resolve duplicate IP addresses immediately.

Duplicate IP addresses can occur when someone has configured an IP address that is identical to an IP address that is assigned to a different device. Address assignments, although possible for you to configure manually, are usually made using one of these protocols:

• Dynamic Host Configuration Protocol (DHCP) - Allows your network to dynamically assign IP addresses to nodes. With this protocol, a DHCP server temporarily assigns an IP address to a node, or you can statically configure addresses as needed.

• Bootstrap Protocol (BootP) - Allows you to statically assign IP addresses to nodes. This protocol is more efficient than RARP.

• Reverse ARP (RARP) - Allows you to statically assign IP addresses to nodes. However, because this protocol relies on the MAC address to identify the node, you cannot use it on networks that dynamically assign hardware addresses.
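The idea behind a duplicate IP address search can be shown with a short Python sketch: an IP address that is seen with more than one MAC address is a candidate duplicate. The function name and the sample addresses are hypothetical, and the input is assumed to be a list of (IP address, MAC address) pairs gathered by some other means, such as ARP tables; this is not how Address Tracker or LANsentry Manager is implemented.

```python
from collections import defaultdict

def find_duplicate_ips(observations):
    """Return {ip: [macs]} for every IP address seen with more than one MAC.

    `observations` is an iterable of (ip_address, mac_address) string pairs.
    """
    macs_by_ip = defaultdict(set)
    for ip, mac in observations:
        macs_by_ip[ip].add(mac.lower())   # normalize case before comparing
    return {ip: sorted(macs) for ip, macs in macs_by_ip.items() if len(macs) > 1}

# Hypothetical observations; 192.0.2.10 is claimed by two different adapters.
seen = [
    ("192.0.2.10", "00:A0:24:11:22:33"),
    ("192.0.2.11", "00:a0:24:44:55:66"),
    ("192.0.2.10", "00:a0:24:77:88:99"),
    ("192.0.2.11", "00:A0:24:44:55:66"),
]
print(find_duplicate_ips(seen))
```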

Ethernet Packet Loss

Use these sections to identify and correct Ethernet packet loss:

• Ethernet Packet Loss Overview
• Searching for Packet Loss

See Ethernet Packet Loss Reference for additional conceptual and problem analysis detail.

If your Ethernet network shows signs of congestion, it may be experiencing packet loss. When your network is congested, utilization is usually high, packets are discarded because buffers are full, and collision rates are up. Problems related to Collisions are often at the heart of packet loss.

Understanding the Problem

Collisions are normal in Ethernet networks. In many cases, Collision rates of 50 percent do not cause a large decrease in throughput. The Collision rate helps mark the upper limit on your network (the maximum percentage of collisions that your network can bear), which is usually around 70 percent. If Collisions increase above this upper limit, your network can become unreliable.

When the Collision rates increase, so do Excessive Collisions, which causes a delay in transmitting data. An increase in Collisions also indicates that network utilization and network errors, such as FCS Errors, are probably increasing.

The real packet problems to watch for, however, are undetected collisions that show up as Late Collisions.
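As a rough illustration of the 70 percent upper limit, the following Python sketch computes a Collision rate from two counters and compares it with the limit. The text does not define how the rate is calculated, so the formula here (collisions as a percentage of frames sent) is an assumption, as are the function names and sample counts.

```python
UPPER_LIMIT_PERCENT = 70.0   # approximate upper limit cited above

def collision_rate(collisions, frames_sent):
    """Collisions as a percentage of frames sent (assumed definition)."""
    if frames_sent <= 0:
        raise ValueError("frames_sent must be positive")
    return 100.0 * collisions / frames_sent

def exceeds_upper_limit(rate_percent, limit=UPPER_LIMIT_PERCENT):
    """True if the rate is above the limit where the network can become unreliable."""
    return rate_percent > limit

rate = collision_rate(collisions=500, frames_sent=1000)
print(rate, exceeds_upper_limit(rate))
rate = collision_rate(collisions=800, frames_sent=1000)
print(rate, exceeds_upper_limit(rate))
```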

Identifying the Problem

To identify that your network's problem is related to packet loss, verify that frames are being dropped on your network by examining this packet loss data:

• Alignment Errors
• Collisions
• Excessive Collisions
• FCS Errors
• CRC Errors
• Late Collisions
• Receive Discards
• Too Long Errors
• Too Short Errors
• Transmit Discards

The process of identifying the problem is discussed in Searching for Packet Loss.

Solving the Problem

If you notice that packet loss data is consistently high, then your network is too congested. In this case, segment your network with the appropriate network device (such as a switch or router). If Collision data shows increases but your network's utilization is the same, then your network may have a physical problem, such as cabling that is too long. Other problems that packet loss data can indicate include:

• Faulty connectors or improper cabling
• Excessive numbers of repeaters between network devices
• Defective Ethernet transceivers or controllers

Possible solutions to these problems are explained in the procedures in Searching for Packet Loss.
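The reasoning in this section can be condensed into a small decision function. The Python sketch below is a simplification under stated assumptions: the three boolean inputs and the function name are hypothetical, and a real diagnosis would use the counters and procedures described in Searching for Packet Loss.

```python
def diagnose(packet_loss_high, collisions_rising, utilization_rising):
    """Suggest a likely cause from three coarse observations."""
    if collisions_rising and not utilization_rising:
        # More collisions without more traffic points to the physical layer.
        return "possible physical problem (for example, cabling that is too long)"
    if packet_loss_high:
        # Consistently high packet loss data indicates congestion.
        return "network too congested: segment it with a switch or router"
    return "no action indicated"

print(diagnose(packet_loss_high=True, collisions_rising=True, utilization_rising=True))
print(diagnose(packet_loss_high=False, collisions_rising=True, utilization_rising=False))
```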

Searching for Packet Loss

When you look for packet loss, use the following applications:

• Status Watch - For Ethernet and MIB-II data collection using SNMP polling
• LANsentry Manager Network Statistics Graph - For RMON data collection using an RMON probe
• Device View - On a per-device basis, you can evaluate statistics for any port on the device

Status Watch

Status Watch monitors:

• Alignment Errors
• Excessive Collisions
• FCS Errors
• Receive Discards
• Transmit Discards

Follow these steps:

1. Determine if the thresholds for the Alignment Errors tool and FCS Errors tool are being exceeded.

Table 16 identifies the problems that this data can indicate and your possible actions. For information about problems related to a nonstandard Ethernet implementation, see Nonstandard Ethernet Problems.

Table 16 Alignment Errors, FCS Errors, and CRC Errors Data

Possible Problem: Faulty cabling
Possible Action: Examine the cable and cable connections for breaks or damage.

Possible Problem: Network noise
Possible Action: Look for improper cabling, faulty cable, faulty network equipment, or cables that are too close to equipment that emits electromagnetic interference (lamps, for example).

Possible Problem: Faulty transceiver
Possible Action: Use an analyzer to identify the problematic transceiver. If necessary, replace the transceiver, network adapter, or station.

Possible Problem: Fault at the transmitting end station
Possible Action:
1. Locate the source of the errors by looking at the module and port statistics.
2. Verify the correct operation of the transceiver or adapter card of the device that is connected to the problem port.
3. If the card appears to be operating correctly, examine the cable and cable connections for breaks or damage.

Possible Problem: Station powering up or down
Possible Action: None required. Early implementations of Ethernet transceivers generate a significant amount of in-band noise when powering up; they frequently cause Alignment Errors and FCS Errors in an otherwise stable network. When powering up, some software drivers for Ethernet controllers also initiate Time Domain Reflectometry (TDR) tests to test the Ethernet media. Network monitors report TDR tests as Alignment Errors and FCS Errors.

Possible Problem: Faulty adapter
Possible Action: Replace the adapter.

2. Determine if the Excessive Collisions tool threshold is being exceeded. Table 17 identifies the problems that this data can indicate and your possible actions.

Table 17 Collisions and Excessive Collisions Data

Possible Problem: Busy network
Possible Action: Use a bridge, router, or switch to reconfigure your network into segments with fewer stations.

Possible Problem: Faulty device (adapter, switch, hub, and the like) that does not listen before broadcasting. This problem increases the incidence of all types of collisions.
Possible Action: Isolate each adapter to see if the problem stops.

Possible Problem: Network loop
Possible Action: Ensure that no redundant connections to the same station have both connections active simultaneously.

3. Determine if the Receive Discards and Transmit Discards tools' thresholds are being exceeded.

If these errors are high in conjunction with the data that you learned in steps 1 and 2, then your network is overloaded. Segment your network.

LANsentry Manager Network Statistics Graph

Use the LANsentry Manager Network Statistics graph to view data for

bull Collisions bull Late Collisions bull Bandwidth Utilization bull CRC Errors bull Too Long Errors bull Too Short Errors

Follow these steps:

1. Display a Network Statistics graph for the local Ethernet segment on which users have reported poor performance.

This graph shows the most recent trend in Collision rates. If you have set up a History sample, you can also look at the historical trend. If a number of segments are connected by repeaters, examine the graph for each Ethernet segment.

2. Analyze Utilization and Collision rates to determine whether collisions are caused by an overloaded segment or a faulty component.

• If Utilization rates are high - The collisions are probably caused by an overloaded segment. If you have added nodes or new applications to your network, consider reconfiguring the cabling system, using bridges and routers to filter out remote collisions and to keep local traffic on one segment. This action should level the network load.

• If Utilization rates are stable and appear normal - The collisions are probably caused by faulty components. In this case, do the following:

• If the network consists of repeaters - Compare the Network Statistics graphs for each segment connected to the repeater. Because repeaters repeat traffic across all connected segments (which makes many segments seem like one network), you should see similar levels of traffic on all segments. One segment that shows dissimilar levels of traffic and collisions may indicate faulty hardware. In this case, monitor several collisions to track the source station that is transmitting too soon after collisions, and repair the station. Packets that are transmitted too soon after collisions are unlikely to be valid. See Table 17 for more information about Collisions.

• On other networks - Determine the segment cable length.
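As a rough illustration of step 2, the following Python sketch applies the same decision rule to one sample of segment statistics. The function name and both threshold defaults are assumptions made for this example; this guide does not give numeric thresholds, so substitute the baseline values that are normal for your own network.

```python
def likely_collision_cause(utilization_pct, collision_pct,
                           high_utilization_pct=40.0, high_collision_pct=5.0):
    """Suggest where to look first when collisions are reported.

    Both thresholds are hypothetical defaults, not values from this guide.
    """
    if collision_pct < high_collision_pct:
        return "collision rate looks normal"
    if utilization_pct >= high_utilization_pct:
        # High collisions together with high utilization: too much traffic.
        return "overloaded segment"
    # High collisions while utilization is normal: suspect hardware.
    return "faulty component"


if __name__ == "__main__":
    print(likely_collision_cause(utilization_pct=65.0, collision_pct=9.0))
    print(likely_collision_cause(utilization_pct=12.0, collision_pct=9.0))
```

Run the check against several samples rather than one, because a single busy interval can look like an overloaded segment.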

3. Examine the CRC Errors and Late Collisions, which often indicate cabling or component problems.

Table 16 identifies the problems that CRC Errors can indicate and your possible actions. Table 18 identifies the problems that Late Collisions data can indicate and your possible actions.

Table 18 Late Collisions Data

Possible Problem: Cabling problems

• Segment too long
• Failing cable
• Segment not grounded properly (noise)
• Improper termination
• Taps too close (10BASE-5 and 10BASE-2 only)
• Noisy cable

Possible Action: Correct the cabling problem by doing one or more of the following:

• Reduce the segment length
• Replace the cable
• Ground the cable
• Terminate the cable correctly
• Check the taps
• Check for cables too close to equipment that emits electromagnetic interference

Possible Problem: Component problems

• Deaf or partially deaf node
• Failing repeater, transceiver, or controller cards

Possible Action: Correct the component problem by doing one of the following:

• Trace the failing component and replace it
• Replace the NIC or the transceiver

4. Trace Too Short Errors and Too Long Errors to the sender.

These errors often indicate faulty routers or LAN drivers and transceiver problems. Table 19 identifies the problems that this data can indicate and your possible actions.

Possible Problem: A transceiver on your network is adding bits to the packets that are transmitted by the attached station.

Possible Action:

1. Use a network analyzer to identify the problematic transceiver.
2. If necessary, replace the transceiver, network adapter, or station.

Possible Problem: The jabber protection mechanism on a transceiver has failed; it can no longer protect the network from the jabbering produced by the attached station.

Possible Action: Replace the network card.

Possible Problem: Excessive noise on the cable.

Note: Some 10/100 Mbps cards that autodetect the network speed may connect to the network at the wrong speed, causing excessive noise.

Possible Action: Check for improper cabling, faulty cable, faulty network equipment, or cables too close to noisy electronic equipment (lamps, for example).

If your network card autodetects the network speed and you have ruled out other problems, manually configure the network speed.

Possible Problem: Faulty routers (two different network types are connected, and the router is not enforcing proper frame size restrictions).

Possible Action: Notify the manufacturer.

Possible Problem: Faulty LAN driver.

Possible Action: Replace the driver.

Possible Problem: A normal condition on a LinkSwitch® 1000, LinkSwitch® 3000, or CoreBuilder 5000 FastModule. If you use maximum-sized (1518-byte) Ethernet frames, the device's VLT-enabled ports add a frame tag of 4 bytes, resulting in a misleading Too Long Frame error. These frames are passed successfully but will create the Too Long Frame error message.

Possible Action: If you want to eliminate the error message, reduce your Ethernet packet frames by 4 bytes.

Device View

Device View allows you to display a variety of port and device-level statistics relevant to Ethernet packet loss. Table 20 describes these statistics and their use in troubleshooting.

Table 20 Activity and Error Statistics in Device View

Statistics Group: Activity

Description: Displays the total network activity and errors on the selected port.

Use in Troubleshooting: This data shows readable packets, broadcast packets, Collisions, total errors, and runts, which cause Too Short Errors. You can interpret this data in the following ways:

• The presence of runts can often be caused by Collisions; however, if the values increase at specific times of the day, it may indicate you need to change the network topology to manage the traffic more efficiently (for example, with switches or routers).

• Runts can also be caused by a badly terminated coax cable.

• Large numbers of runts not associated with high levels of collisions can indicate a transmission problem (examine the cable).

• Particularly high numbers of Collisions compared to the total number of readable packets can point to a hardware problem (a bad adapter) or to a data loop.

• A high proportion of Broadcast Packets (>10%) on a heavily utilized network (>50% of available bandwidth) can point to an incorrectly configured bridge or router on the network.

Statistics Group: Errors

Description: Displays the number of frames with errors on the selected port.

Use in Troubleshooting: The significance of errors depends on accompanying errors and prevailing network conditions. See the following error data for more information:

• Alignment Errors: Table 16
• FCS Errors: Table 16
• Too Long Errors: Table 19
• Too Short Errors, or runts: Table 19
• Late Collisions: Table 18

To display Activity and Errors statistics for a device or port, follow these steps:

1. Select the required port or device.

2. From the shortcut menu, select Activity or Errors.

The statistics available depend on the type of port or device selected. See Table 20 for troubleshooting information.

Ethernet Packet Loss Reference

This section explains terms that are relevant to Ethernet packet loss and provides additional conceptual and problem analysis detail.

Alignment Errors

An Alignment Error indicates a received frame in which both of the following are true:

• The number of bits received is an uneven byte count (that is, not an integral multiple of 8).

• The frame has a Frame Check Sequence (FCS) error.

Alignment Errors often result from MAC layer packet formation problems, cabling problems that cause corrupted or lost data, and packets that pass through more than two cascaded multiport transceivers. See FCS Errors for more information about interpreting Alignment Errors.


Collisions

Collisions indicate that two devices detect that the network is idle and try to send packets at exactly the same time (within one round-trip delay). Because only one device can transmit at a time, both devices must stop sending and attempt to retransmit. Collisions are detected by the transmitting stations.

The retransmission algorithm helps to ensure that the packets do not retransmit at the same time. However, if the two devices retry at nearly the same time, packets can collide again; the process repeats until either the packets finally pass onto the network without collisions or 16 consecutive collisions occur and the packets are discarded.
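The retransmission behavior described above can be sketched in Python. This is a simplified model, not an adapter implementation: it assumes the truncated binary exponential backoff that Ethernet stations use (the random wait range doubles after each collision and stops growing after the tenth), and the function names and the `collides` callback are hypothetical names introduced for the example.

```python
import random

MAX_ATTEMPTS = 16    # the frame is discarded after 16 consecutive collisions
BACKOFF_LIMIT = 10   # the random range stops doubling after 10 collisions


def backoff_slots(collisions, rng=random):
    """Pick how many slot times to wait after the given number of collisions."""
    if not 1 <= collisions < MAX_ATTEMPTS:
        raise ValueError("collisions must be between 1 and 15")
    return rng.randrange(2 ** min(collisions, BACKOFF_LIMIT))


def send_frame(collides, rng=random):
    """Try to send one frame.

    collides(attempt) is a caller-supplied function that reports whether
    attempt number `attempt` (1-based) ends in a collision.
    Returns (delivered, attempts_used, slot_times_waited).
    """
    waited = 0
    for attempt in range(1, MAX_ATTEMPTS + 1):
        if not collides(attempt):
            return True, attempt, waited
        if attempt < MAX_ATTEMPTS:
            waited += backoff_slots(attempt, rng)
    # Sixteen collisions in a row: the frame is dropped (an excessive collision).
    return False, MAX_ATTEMPTS, waited


if __name__ == "__main__":
    # The first two attempts collide; the third gets through.
    print(send_frame(lambda attempt: attempt < 3))
```

Because each station picks its wait independently, two stations that collided once are unlikely to keep choosing the same delay, which is why repeated collisions usually point to congestion rather than bad luck.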

CRC Errors

A Cyclic Redundancy Check (CRC) Error is an RMON statistic that combines FCS Errors and Alignment Errors. These errors indicate that packets were received with:

• A bad FCS and an integral number of octets (FCS Errors)
• A bad FCS and a non-integral number of octets (Alignment Errors)

CRC Errors can cause an end station to freeze. If a large number of CRC Errors are attributed to a single station on the network, replace the station's network interface board. Typically, a CRC Error rate of more than 1 percent of network traffic is considered excessive.
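The 1 percent guideline and the single-station check above are simple arithmetic. The Python sketch below shows one way to apply them to counters that you have already collected; the function names and the sample station names are hypothetical.

```python
def crc_error_rate_pct(crc_errors, total_packets):
    """Return CRC errors as a percentage of all packets seen."""
    if total_packets <= 0:
        return 0.0
    return 100.0 * crc_errors / total_packets


def worst_station(crc_errors_by_station):
    """Return (station, count) for the station with the most CRC errors."""
    if not crc_errors_by_station:
        return None
    return max(crc_errors_by_station.items(), key=lambda item: item[1])


if __name__ == "__main__":
    rate = crc_error_rate_pct(crc_errors=150, total_packets=10_000)
    print(f"CRC error rate: {rate:.2f}%  excessive: {rate > 1.0}")
    print(worst_station({"station-a": 3, "station-b": 140, "station-c": 7}))
```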

Excessive Collisions

Excessive Collisions indicate that 16 consecutive collisions have occurred, usually a sign that the network is becoming congested. For each excessive collision count (or after 16 consecutive collisions), a packet is dropped. If you know the normal rate of excessive collisions, then you can determine when the rate of packet loss is affecting your network's performance. See Knowing Your Network's Configuration for more information.

FCS Errors

Frame Check Sequence (FCS) Errors, a type of CRC, indicate that frames received by an interface are an integral number of octets long but do not pass the FCS check. The FCS is a mathematical way to ensure that all the frame's bits are correct without having the system examine each bit and compare it to the original. Packets with Alignment Errors also generate FCS Errors.
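To make the FCS check concrete, the Python sketch below appends a 32-bit checksum to a frame and verifies it on receipt. It assumes that the CRC-32 computed by Python's standard zlib module matches the Ethernet FCS calculation and that the four FCS bytes are appended least significant byte first; the function names are hypothetical, and real adapters do this in hardware.

```python
import zlib


def append_fcs(frame_without_fcs: bytes) -> bytes:
    """Return the frame with a 4-byte CRC-32 appended (sender side)."""
    fcs = zlib.crc32(frame_without_fcs)
    return frame_without_fcs + fcs.to_bytes(4, "little")


def fcs_ok(frame: bytes) -> bool:
    """Recompute the CRC-32 over the frame body and compare (receiver side)."""
    if len(frame) < 4:
        return False
    body, received = frame[:-4], frame[-4:]
    return zlib.crc32(body).to_bytes(4, "little") == received


if __name__ == "__main__":
    good = append_fcs(b"example payload" * 4)
    # Flip one bit in the first byte to imitate noise on the cable.
    damaged = bytes([good[0] ^ 0x01]) + good[1:]
    print(fcs_ok(good), fcs_ok(damaged))
```

The receiver never sees the original frame; it only learns that the bits it received no longer match the checksum that travelled with them.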

Both Alignment Errors and FCS Errors can be caused by equipment powering up or down or by interference (noise) on unshielded twisted-pair (10BASE-T) segments. In a network that complies with the Ethernet standard, FCS or Alignment Errors indicate bit errors during a transmission or reception. A very low rate is acceptable. Although Ethernet allows a 1 in 10^8 bit error rate, typical Ethernet performance is 1 in 10^12 or better.
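The bit error rates above can be translated into an expected share of damaged frames. The Python sketch below assumes that bit errors are independent and that a frame with any bad bit fails the FCS check; the function name is hypothetical.

```python
import math


def frame_error_probability(bit_error_rate, frame_bytes=1518):
    """Probability that a frame of the given size has at least one bad bit.

    Computes 1 - (1 - BER) ** bits with log1p/expm1 so that very small
    error rates are not lost to floating-point rounding.
    """
    bits = frame_bytes * 8
    return -math.expm1(bits * math.log1p(-bit_error_rate))


if __name__ == "__main__":
    for ber in (1e-8, 1e-12):
        p = frame_error_probability(ber)
        print(f"BER {ber:g}: about 1 maximum-sized frame in {1 / p:,.0f} is damaged")
```

At the allowed limit of 1 in 10^8, roughly one maximum-sized frame in 8,000 would be damaged; at 1 in 10^12 the figure is closer to one in 80 million.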

Late Collisions

Late Collisions indicate that two devices have transmitted at the same time, but cabling errors (most commonly, excessive network segment length or repeaters between devices) prevent either transmitting device from detecting a collision. Neither device detects a collision because the time to propagate the signal from one end of the network to the other is longer than the time to put the entire packet on the network. As a result, neither of the devices that cause the late collision senses the other's transmission until the entire packet is on the network.

Although late collisions occur for small packets, the transmitter cannot detect them. As a result, a network suffering measurable Late Collisions for large packets is losing small packets as well.
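The timing argument above can be checked with a few lines of Python. The sketch assumes a 10 Mbps segment and uses a made-up round-trip delay of 60 microseconds for an over-length network; the function names are hypothetical.

```python
def transmit_time_us(frame_bytes, mbps=10.0):
    """Time needed to put a whole frame on the wire, in microseconds."""
    return frame_bytes * 8 / mbps


def collision_seen_by_sender(round_trip_us, frame_bytes, mbps=10.0):
    """True if the collision signal returns while the sender is still sending."""
    return round_trip_us < transmit_time_us(frame_bytes, mbps)


if __name__ == "__main__":
    round_trip_us = 60.0  # hypothetical delay on a segment that is too long
    for size in (64, 1518):
        seen = collision_seen_by_sender(round_trip_us, size)
        print(f"{size}-byte frame takes {transmit_time_us(size):.1f} us, "
              f"collision seen by sender: {seen}")
```

With these numbers the 64-byte frame is already fully sent before the collision could be noticed, while the 1518-byte frame is still being sent and records a late collision. This matches the point that small packets are lost silently.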

Nonstandard Ethernet Problems

Table 21 lists the symptoms that typically occur if a system violates the Ethernet standard.

Table 21 Symptoms of Common Ethernet Network Problems

Symptoms: FCS Errors and Alignment Errors increase significantly.

Problem: Network cabling is too long.

Notes: If you use a promiscuous network monitor, the number of Late Collisions reported by stations should correlate with the FCS and Alignment Errors reported by the monitor.

Symptoms: FCS and Alignment Errors increase proportionally with interference (sometimes referred to as noise hits).

Problem: Network segment is noisy.

Notes: Typically observed on a 10BASE-T network segment in a noisy environment. If you use multiple promiscuous monitors, the FCS and Alignment Errors among the monitors will not correlate.

If the monitor can track runts, also called Too Short Errors, the number of runt packets should be significantly higher than normal.

Symptoms: FCS and Alignment Errors are much higher than normal.

Problem: Networks do not conform to the access scheme of Carrier Sense Multiple Access with Collision Detect (CSMA/CD).

Notes: Occurs when some implementations of Ethernet in the segment are not entirely compatible with IEEE 802.3 repeaters.

Collision fragments linger on the network long enough to collide with retry packets at the minimum interpacket gap (IPG). The IPG is smaller on one side of the repeated network, causing a lost packet.

Ethernet controllers cannot receive packets that are separated by 4.7 µs or less. Some controllers cannot sustain receptions of packets separated by as much as 9.6 µs. If runt packets are received one after another and are followed by a collision fragment, Ethernet controllers that cannot sustain reception will lose packets.

Receive Discards

Receive Discards indicate that received packets could not be delivered to a higher-layer protocol because of congestion or packet errors.

Too Long Errors

A Too Long Error indicates that a packet is longer than 1518 octets (including FCS octets) but otherwise well formed. Too Long Errors are often caused by a bad transceiver, a malfunction of the jabber protection mechanism on a transceiver, or excessive noise on the cable.

Too Short Errors

A Too Short Error, also called a runt, indicates that a packet is fewer than 64 octets long (including FCS octets) but otherwise well formed.
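The two length limits above can be expressed as a small Python check. The function name is hypothetical, and the sketch looks only at length; it does not test whether the frame is otherwise well formed.

```python
MIN_FRAME_OCTETS = 64     # shorter frames are runts (Too Short Errors)
MAX_FRAME_OCTETS = 1518   # longer frames are Too Long Errors


def classify_frame_length(octets):
    """Classify a frame by its length in octets, FCS included."""
    if octets < MIN_FRAME_OCTETS:
        return "too short"
    if octets > MAX_FRAME_OCTETS:
        return "too long"
    return "ok"


if __name__ == "__main__":
    # 1522 is a maximum-sized frame with a 4-byte tag added by a switch port.
    for size in (60, 64, 1518, 1522):
        print(size, classify_frame_length(size))
```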

Transmit Discards

Transmit Discards indicate that packets were not transmitted because of network congestion.

FDDI Ring Errors

Use these sections to identify and correct FDDI ring errors:

• FDDI Ring Errors Overview
• Identifying Ring Errors

Fiber Distributed Data Interface (FDDI) often corrects its own problems. However, because FDDI cannot correct all errors (especially those related to hardware problems), you should monitor FDDI errors.

Understanding the Problem

FDDI ring errors that you should monitor include:

• Elasticity Buffer Error Condition
• Frame Error Condition
• Frames Not Copied Condition
• Link Error Condition

Identifying the Problem

First, determine the type of FDDI ring errors and where they are occurring. Similar to the way you identify other FDDI problems, identify the upstream and downstream neighbors of the devices that you are monitoring.

Several types of network errors can cause FDDI performance problems. For example, problems with cables or physical connections may result in a link or frame error. Elasticity buffer (EB) errors can also lead to link and frame errors.

FDDI deals with port-related errors as follows:

• The variable PORTlerAlarm is the link error rate (LER) value at which a link connection generates an alarm. When the LER is greater than the alarm setting, Station Management (SMT) sends a Status Report Frame (SRF) to notify you that there is a problem with a port.

The PORTlerAlarm threshold is set lower than the PORTlerCutoff threshold so that you are notified of a problem before the port is actually removed from the ring.

• When link errors reach the threshold defined by the variable PORTlerCutoff, SMT breaks the connection, disabling the PHY that detected the problem. A Link Error Condition is also generated.
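The relationship between the two port thresholds can be sketched in Python. This is an illustration only: it treats the link error rate and both thresholds as plain error rates, whereas SMT implementations typically store them as base-10 exponents, and the function name and sample values are hypothetical.

```python
def port_ler_action(ler, alarm, cutoff):
    """Decide what SMT would do for a port with the given link error rate.

    All three arguments are error rates (for example 1e-8). The alarm
    threshold must be lower than the cutoff threshold so that a warning
    arrives before the port is removed from the ring.
    """
    if alarm >= cutoff:
        raise ValueError("alarm threshold must be lower than cutoff threshold")
    if ler >= cutoff:
        return "cutoff"   # connection broken, Link Error Condition generated
    if ler > alarm:
        return "alarm"    # Status Report Frame sent
    return "ok"


if __name__ == "__main__":
    for ler in (1e-9, 5e-8, 1e-7):
        print(ler, port_ler_action(ler, alarm=1e-8, cutoff=1e-7))
```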

FDDI deals with MAC-related errors as follows:

• When MAC frame errors reach a certain threshold, a Frame Error Condition is generated. Because the actual error can be further upstream than the immediate connection, the connection remains intact.

• For a large network, the worst case MACFrameErrorRatio is less than 0.1 percent. However, during network configuration, frame error ratios can reach 50 percent for short periods. When you detect a sustained frame error ratio of more than 0.1 percent, a problem exists between the station that is reporting the condition and the nearest upstream MAC.

Solving the Problem

To solve problems related to FDDI errors, fix the hardware, cabling, or congestion problem.

Identifying Ring Errors


Use Status Watch to monitor your FDDI devices for Warning or Critical alerts.

Status Watch

Use Status Watch to identify FDDI ring errors.

Follow these steps:

1. Monitor the FDDI Status tool for the currently selected device.

2. Determine whether Status Watch is reporting Elasticity Buffer Errors or a high percentage of Frame Errors, Frames Not Copied, or Link Error Rates for the currently selected device.

FDDI Ring Errors Reference

This section provides additional conceptual and problem analysis detail.

Elasticity Buffer Error Condition

The Elasticity Buffer Error condition occurs when a port's elasticity buffer overflows or underflows. This condition usually indicates that a port's hardware is not operating within the tolerances that the FDDI standard specifies. Look for the problem in the hardware of either the port that is reporting the condition or of the immediately adjacent port.

Frame Error Condition

The Frame Error condition occurs when the percentage of frames that contain errors exceeds a preset threshold. When a device is an uplink to FDDI (that is, a device is transmitting onto FDDI), this type of condition indicates that the ring is saturated. The ring is out of buffer space, and packets are being dropped from the device's backbone port.

The problem indicated by the frame errors is usually located between the MAC that reports the condition and its upstream neighbor. Because many physical connections can lie along this path, the MACFrameErrorRatio variable identifies only the two MACs between which the problem is occurring.

Frames Not Copied Condition

The Frames Not Copied condition occurs when the percentage of frames that are dropped because of insufficient buffer space exceeds a preset threshold. This condition indicates that the station is congested and is unable to process frames as quickly as they arrive. To help eliminate congestion:

• Add more capacity to the station
• Reconfigure your network so that end stations that communicate heavily with one another are on the same bridge or switch
• Filter out certain traffic

Link Error Condition

The Link Error condition occurs when a port detects link errors at a rate that exceeds a preset threshold. When the Link Error threshold is exceeded, the station removes itself from the ring and tries to reinsert itself on the ring. This action creates a MAC Neighbor Change Event (which also occurs if a ring wraps).

Link errors may indicate an FDDI PHY hardware problem (such as a faulty transmitter) or a faulty cable or connector. Look for the problem in the portion of the network between the port that is reporting the condition and the first upstream transmitter.

MAC Neighbor Change Event

The MAC Neighbor Change event occurs when a MAC's upstream or downstream neighbor changes.

This event indicates either:

• A network reconfiguration
• Another station that is leaving or joining the ring

Network File Server Timeouts

Use these sections to identify and correct timeouts on network file servers:

• Network File Server Timeout Overview
• Looking for Obvious Errors
• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

See Network File Server Timeouts Reference for additional conceptual and problem analysis detail.

A network file server can time out if your network gets congested or if your server is having problems. Users might have problems downloading data from or to the server or copying files from or to the server. To help you to understand the troubleshooting process for this type of problem, an EXAMPLE throughout this section follows the symptoms, analysis, and resolution of a typical file server timeout problem.


Understanding the Problem

When users log in, their stations make network file server calls either to determine quotas (if this feature has been enabled) or to mount user home directories. The network file server timeout messages, even when spread across multiple nodes, indicate a problem either with the network or with a server.

EXAMPLE: UNIX users notice that it takes a long time - over 30 seconds in some cases - to log in to any machine. Some machines report network file server timeout messages, but the messages have no obvious pattern and are infrequent. You begin to get a sense of the problem.

Identifying the Problem

First, rule out the obvious causes. Ask these questions:

• Can you access the network file server with Telnet?
• Have any alarms been triggered?
• Are there any new errors?

The process of identifying the problem is developed in Looking for Obvious Errors.

Solving the Problem

To determine the cause, reproduce the fault while you monitor the network. After you know the cause, you can fix the problem.

The solutions to the network file server timeout are identified in these sections:

• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

Looking for Obvious Errors

To look for obvious errors, use these applications:

• Ping and Telnet - To check connectivity to the network file server nodes
• LANsentry Manager Alarms View - To search for triggered alarms
• LANsentry Manager Statistics View - To look for errors
• LANsentry Manager History View - To identify trends


Ping and Telnet

Determine whether you can contact network file server nodes using Ping and Telnet. If the response is extremely slow, then a problem may exist with the connections to the nodes. No delay indicates that the connections are normal, implying that the delay is occurring elsewhere. In this case, use LANsentry® Manager tools to determine whether packets are being lost or ignored.
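If you want to repeat the Telnet check from a script, the Python sketch below times how long a TCP connection to a server takes. It is only a stand-in for the manual test: the function name and the host name in the usage lines are hypothetical, port 23 is the standard Telnet port, and a server that does not run Telnet will be reported as unreachable even when the network is healthy.

```python
import socket
import time


def time_tcp_connect(host, port=23, timeout=5.0):
    """Return the seconds taken to open a TCP connection, or None on failure."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        # Covers timeouts, refused connections, and name lookup failures.
        return None


if __name__ == "__main__":
    elapsed = time_tcp_connect("fileserver.example", port=23)
    if elapsed is None:
        print("no connection: check the path to the server")
    else:
        print(f"connected in {elapsed:.3f} s")
```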

LANsentry Manager Alarms View

Using the LANsentry Manager Alarms View, you can determine if any configured alarms have been triggered.

Search the Alarms View to see if any MAC events have been logged.

EXAMPLE: MAC events have not been logged for the network on which the UNIX users are attached.

Even though no alarms have occurred, errors may exist. For example, a lower rate of background errors may exist just below the alarm threshold. Based on maximum and minimum values, RMON errors may miss constant, periodic, or low amounts of errors.

LANsentry Manager Statistics View

Using the LANsentry Manager Statistics View, you can display a multisegment graph of utilization and error statistics.

Follow these steps:

1. Set up a graph that shows utilization and errors on all your major segments.

2. Determine whether any segments are particularly busy or error prone.

EXAMPLE: You notice that one segment of the UNIX network, HUB3, is reporting Too Long Errors and FCS Errors roughly every second sample. While the amount is not higher than normal, it is currently higher than any other segment.

LANsentry Manager History View

Using the LANsentry Manager History View, you can display a rolling history table to determine if the errors that you are seeing are new. For example, if you have a history table that runs for 30-minute samples over two days, you can compare the most recent sample to a previous sample, looking for new errors. If your probe has the resources, use a much finer resolution sample stored for a shorter time (every 30 seconds for 2 hours) to more easily spot recent errors.

EXAMPLE: You see that the history table shows that no error rates remained constant throughout the day. However, errors that did occur were on the device HUB3 and were Too Long Errors and FCS Errors.
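Comparing the latest history sample with earlier ones is easy to automate once the samples are exported. The Python sketch below assumes each sample is a dictionary of error counts keyed by error name; that layout, the function name, and the sample numbers are all hypothetical and are not a LANsentry Manager export format.

```python
def new_errors(earlier_samples, latest_sample):
    """Return errors whose latest count exceeds every earlier sample.

    The result maps each error name to (highest_earlier_count, latest_count).
    """
    found = {}
    for name, count in latest_sample.items():
        baseline = max((s.get(name, 0) for s in earlier_samples), default=0)
        if count > baseline:
            found[name] = (baseline, count)
    return found


if __name__ == "__main__":
    history = [
        {"FCS Errors": 0, "Too Long Errors": 0, "Collisions": 120},
        {"FCS Errors": 1, "Too Long Errors": 0, "Collisions": 140},
    ]
    latest = {"FCS Errors": 6, "Too Long Errors": 4, "Collisions": 130}
    print(new_errors(history, latest))
```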

Reproducing the Fault While Monitoring the Network

Although the RMON View in LANsentry Manager can show error rates and help you to identify the location of the problem, it may not provide enough data to solve the problem. To determine the cause of the problem, reproduce it while you monitor the network by using these applications:

• LANsentry Manager Top-N Graph - To locate a quiet node to use for reproducing the fault
• LANsentry Manager Packet Capture - To capture packets from the hub to which the quiet node is attached
• LANsentry Manager Packet Decode - To analyze the packets to assess network file server traffic and delays
• Address Tracker - To find the location of the problem nodes

EXAMPLE: Using LANsentry Manager, you find a hub on the network with a higher than normal error rate. However, the error rate does not seem high enough to cause login delays of 60 seconds or more.

LANsentry Manager Top-N Graph

Using the Top-N graph in the LANsentry Manager main window, locate a quiet node that has been showing the same problem. Choose a quiet node so that you do not receive excessive traffic when you try to isolate the problem.

EXAMPLE: You see that the node Monolith, which has the same Network File System (NFS) mounts as the other nodes on the network, is quiet. You decide to use this node for reproducing the fault. See Network File System (NFS) Protocol for more information about NFS.

LANsentry Manager Packet Capture

Using the LANsentry Manager Packet Capture application, capture packets from the network using predefined patterns and start-and-stop conditions.


Follow these steps:

1. Set up a capture buffer on a probe that is connected to the same hub as the quiet node. Until you know more about the problem, set a very general filter.

EXAMPLE: You select a MAC-layer filter and set a conversation filter to capture all packets to and from Monolith.

2. Telnet into and log out from the quiet node. Then reset the capture buffer. Repeat this procedure until you see the problem reflected in your captured data. To keep the buffer information clean, reset the buffer each time that you repeat the procedure.

3. When you see the delay, note the rough value of the packet count on the LANsentry Manager packet buffer.

By noting the packet count at which you think the delay has occurred, you can narrow the problem to within about 20 packets in the buffer. If you have used an extremely quiet node, you may even identify the exact packet.

LANsentry Manager Packet Decode

The LANsentry Manager Packet Decode application decodes all major protocols and displays the packet contents at three levels of detail: summary information, header information, and actual packet content.

Follow these steps:

1. Open the buffer in the Packet Decode application and locate the number of the packet at which the delay occurred.

2. Select the packet and launch a MAC-layer conversation filter. In the filter display, look for a gap in the conversation (that is, where the node sent a request and then resent it at approximately the same rate as the delay you experienced when recreating the problem).

3. Repeat the test to determine if the result concentrates on one node or if it appears on other nodes.

EXAMPLE: On the quiet node that you selected, the delay is obvious. You see an NFS request going out to a node and a repeat of the request 30 seconds later. During that time, the node did not respond. You now know that the delay occurred because nodes were not seeing responses for NFS requests. When you repeat the test on other nodes, you find that the delay is happening with more than one destination node.
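The gap that step 2 asks you to look for can also be found programmatically in an exported capture. The Python sketch below assumes each packet is a tuple of (timestamp in seconds, source, destination, request ID) and treats a repeated request with the same ID as a retransmission; the tuple layout, the function name, and the node names other than Monolith are hypothetical.

```python
def find_retransmit_gaps(packets, min_gap_s=1.0):
    """Return ((src, dst, request_id), gap_seconds) for each resent request.

    A request counts as resent when the same source sends the same request
    ID to the same destination again at least min_gap_s seconds later.
    """
    last_sent = {}
    gaps = []
    for timestamp, src, dst, request_id in packets:
        key = (src, dst, request_id)
        if key in last_sent:
            gap = timestamp - last_sent[key]
            if gap >= min_gap_s:
                gaps.append((key, gap))
        last_sent[key] = timestamp
    return gaps


if __name__ == "__main__":
    capture = [
        (0.00, "monolith", "nfs-server", 7),   # request
        (30.00, "monolith", "nfs-server", 7),  # same request, sent again
        (30.02, "nfs-server", "monolith", 7),  # reply finally arrives
    ]
    for key, gap in find_retransmit_gaps(capture):
        print(key, f"resent after {gap:.1f} s")
```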


Address Tracker

Use Address Tracker, which polls managed devices, to determine the hubs to which the problem nodes are attached. If the problem end stations are located on unmanaged devices, then you can at least narrow the problem to those unmanaged devices.

EXAMPLE: Although your network does not have managed hubs that Transcend® NCS management software can poll, it does have managed switches. When it polls the switches, Address Tracker displays the switch ports on which addresses were last seen. This information indicates the hub (but not the hub port) on which the device is located.

LANsentry Manager Packet Decode

After you know the location of the hub that has the problem node, monitor the problem from the hub using LANsentry Manager Packet Decode.

Follow these steps:

1. To capture packets from one of the nodes on the hub, set up another capture buffer and repeat the exercise that is described in LANsentry Manager Packet Capture. Because a delay may occur on a different node, use two capture buffers without stopping the first one.

Note the rough packet count where the delay appears.

2. Display a conversation filter of the packet where the delay appears, and look for the gap in the conversation.

EXAMPLE: You hope that the nodes are on the same hub. You find that all the nodes are on HUB3. This result indicates that FCS Errors may be causing the timeouts. However, because the errors occur at a low rate, you decide to verify this diagnosis. You monitor the problem from the hub, logging in and out many times, and the delay eventually occurs. This time, the delay shows that the node's reply had an FCS Error, even though the node received the request. The switch would not have transmitted this packet, causing a timeout on the NFS protocol. The retry time is presumably 30 seconds. During this test, you see the problem occurring on another node.

Correcting the Fault


Without a managed hub, you may find it very difficult to discover network file server timeout errors. To find the problematic node, you must either systematically isolate nodes by monitoring each node for a prolonged period or temporarily insert a managed hub.

EXAMPLE: You notice that the captured error packet failed FCS because it was corrupted by a regular pattern during transmission. A possible reason for this occurrence is a Jabbering node. This explanation makes sense because FCS/Jabber frames increased linearly when you were monitoring the live network.

Network File Server Timeouts Reference

This section explains terms that are relevant to network file server timeouts and provides additional conceptual and problem analysis detail.

Jabbering

A condition in which a node transmits illegal-length packets and is possibly not operating within carrier specifications. In effect, another node has written bad data over a valid packet. This bad data is often interpreted as a repeated sequence of data.

Network File System (NFS) Protocol

A distributed file system protocol developed by Sun Microsystems that allows a computer system to access files over a network as if they were on its local disks. This protocol has been incorporated into products by more than 200 companies. It is now a de facto Internet standard.

NFS is one protocol in the NFS suite of protocols, which includes NFS, RPC, XDR (External Data Representation), and others. These protocols are part of a larger architecture that Sun Microsystems refers to as Open Network Computing (ONC). ONC is a distributed applications architecture designed by Sun and currently controlled by a consortium led by Sun.

Troubleshooting TCP/IP - Detailed Steps

This article shows how to troubleshoot TCP/IP connectivity between computers on a Windows network. If you haven't already done so, disable XP's Internet Connection Firewall on all local area network connections, and remove all firewall programs on the network. Improperly configured firewalls are the most common cause of TCP/IP problems.


Open a Command Prompt Window

For many of these steps, you'll be typing at the command prompt. To open a command prompt window in Windows 2000 or XP, click Start | Run, type cmd in the box, and click OK. To open a command prompt window in Windows 95, 98, or Me, click Start | Run, type command in the box, and click OK. Type one command per line, and press Enter after each one to execute it. To close the command prompt window, use the exit command.

Determine the TCP/IP Settings

Determine the TCP/IP settings of each computer on the local area network. In XP, open the Network Connections folder, right click the LAN connection, and click Status | Support | Details. For example, here are the Status and Details views for the LAN connection on an Internet Connection Sharing host.

In Windows 95/98/Me, click Start | Run, type winipcfg in the box, and click OK. Select the LAN adapter from the menu, and click More Info. Here's the winipcfg view for an ICS client running Windows Me.

You can also see the TCP/IP settings from the command prompt. This is especially convenient if a computer has more than one network adapter. Use the ipconfig /all command, which is available in all versions except Windows 95. The output from this command can be long, so it's best to write it to a file. Specify the file name in the command this way:

ipconfig /all >ipconfig.txt

Description of TCP/IP Settings

Here are the TCP/IP settings that are used in network troubleshooting:

• IP Address - Unique address assigned to a network adapter. A computer with multiple network adapters has an IP address for each one, and each one must be in a different subnet.

• Subnet Mask - Used in conjunction with the IP address to determine which subnet an adapter belongs to. At the simplest level, communication is only possible between two network adapters when they're in the same subnet.

• Default Gateway - IP address of a computer or router, on one of this computer's local area networks, that knows how to communicate with subnets not present on this computer. For an Internet connection, the default gateway is a router belonging to your Internet service provider, and all access to sites on the Internet goes through it. For an ICS client, the default gateway is the ICS host. If you use a hardware router, it serves as the default gateway.

• DHCP Server - If an adapter is configured to obtain an IP address automatically, this is the address of the server that provides it. It could be your ISP, an ICS host, or a hardware router.

• DNS Servers - IP address of one or more Domain Name Server computers. DNS servers translate Internet names to their IP addresses (like 63.146.109.227).
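To see how the IP address and subnet mask combine, the Python sketch below uses the standard ipaddress module to test whether two adapters fall in the same subnet. The function name is hypothetical, and the 255.255.255.0 mask in the usage lines is an assumed value chosen for illustration.

```python
import ipaddress


def same_subnet(ip_a, ip_b, subnet_mask):
    """Return True if both addresses belong to the same subnet."""
    # strict=False lets ip_network accept a host address and mask it down
    # to the network address.
    net_a = ipaddress.ip_network(f"{ip_a}/{subnet_mask}", strict=False)
    net_b = ipaddress.ip_network(f"{ip_b}/{subnet_mask}", strict=False)
    return net_a == net_b


if __name__ == "__main__":
    mask = "255.255.255.0"  # assumed mask for this example
    print(same_subnet("192.168.1.101", "192.168.1.123", mask))
    print(same_subnet("192.168.1.101", "192.168.0.1", mask))
```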

Subnets

See our article on subnets for a brief description of how they work. For more details, see this Microsoft Knowledge Base article.

If two computers are supposed to be on the same subnet but aren't, something is wrong with the network hardware or software configuration. This is most likely to happen when one of them receives an IP address of 169.254.x.x, which indicates that:

• It's configured to obtain an IP address automatically.
• It couldn't find a DHCP server on the network to make the assignment.
• Windows assigned it an Automatic Private IP Address.
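A script that collects addresses from several computers can flag this condition automatically. The Python sketch below checks whether an address falls in the 169.254.0.0/16 range used for Automatic Private IP Addressing; the function name is hypothetical.

```python
import ipaddress

APIPA_RANGE = ipaddress.ip_network("169.254.0.0/16")


def is_apipa(address):
    """Return True if the address is an Automatic Private IP Address."""
    return ipaddress.ip_address(address) in APIPA_RANGE


if __name__ == "__main__":
    for addr in ("169.254.10.20", "192.168.1.101"):
        print(addr, "APIPA" if is_apipa(addr) else "assigned normally")
```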

See our article on Specific Networking Problems and Their Solutions for more information.

Pinging

The ping command is the basic tool for testing TCP/IP connectivity. It sends a special packet (called ICMP Echo) to a particular IP address and looks for a reply. If everything is working right, the reply comes back. If not, the ping times out in a few seconds. By default, the ping command repeats the process four times. Here's an example of an ICS client computer pinging a Windows XP Home Edition ICS host, using the host's IP address and its computer name.
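Separately from that example, the ICMP Echo packet itself can be sketched in Python to show what ping puts on the wire. This is an illustration only: it builds the bytes of an Echo Request (type 8, code 0) with the standard Internet checksum, but does not send them, because raw sockets normally need administrator rights. The function names and the default payload are assumptions.

```python
import struct


def internet_checksum(data):
    """16-bit one's-complement checksum used by ICMP."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


def build_echo_request(identifier, sequence, payload=b"ping"):
    """Build an ICMP Echo Request: type 8, code 0, checksum, id, sequence."""
    # The checksum is computed with the checksum field set to zero.
    header = struct.pack("!BBHHH", 8, 0, 0, identifier, sequence)
    checksum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, checksum, identifier, sequence) + payload


if __name__ == "__main__":
    print(build_echo_request(identifier=1, sequence=1).hex())
```

A receiver validates the packet by running the same checksum over all of it, which yields zero when nothing was corrupted in transit.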

When ping fails, you'll see one of these error messages:

• Request timed out - The IP address is valid, but there's no reply from it. If the IP address is on a local area network, the most likely cause is a firewall program blocking the ping.

• Unknown host <name> or Ping request could not find host <name> - The computer name doesn't exist on the local area network. Make sure that NetBIOS over TCP/IP is enabled.

• Destination host unreachable - The IP address isn't on a local area network, and the default gateway can't access it. Either there's no default gateway, its address is wrong, or it isn't functioning.

Pinging the Local Area Network

Here is a series of ping commands to use in finding where a problem occurs on a local area network. Run them in the order shown, and don't go on to the next command until all of the previous commands work properly. In this example:

• The computer being tested is named Winxp, with IP address 192.168.1.101.
• There's another computer on the network named Win98, with IP address 192.168.1.123.

Substitute the appropriate IP addresses and computer names for your network.

Command              Target                         What Ping Failure Indicates
ping 127.0.0.1       Loopback address               Corrupted TCP/IP installation
ping localhost       Loopback name                  Corrupted TCP/IP installation
ping 192.168.1.101   This computer's IP address     Corrupted TCP/IP installation
ping winxp           This computer's name           Corrupted TCP/IP installation
ping 192.168.1.123   Another computer's IP address  Bad hardware or NIC driver
ping win98           Another computer's name        NetBIOS name resolution failure
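The run-in-order, stop-at-first-failure procedure in the table above can be sketched in Python. The names `LAN_STEPS`, `system_ping`, and `first_failure` are assumptions made for this illustration, and the targets are the example values, so substitute your own. Treat the exit status of the system ping command as an approximation: some versions report success even when the reply is an error message.

```python
import subprocess
import sys

# (target, what a failure suggests), in the order the table lists them.
# The addresses and names are the example values; substitute your own.
LAN_STEPS = [
    ("127.0.0.1", "Corrupted TCP/IP installation"),
    ("localhost", "Corrupted TCP/IP installation"),
    ("192.168.1.101", "Corrupted TCP/IP installation"),
    ("winxp", "Corrupted TCP/IP installation"),
    ("192.168.1.123", "Bad hardware or NIC driver"),
    ("win98", "NetBIOS name resolution failure"),
]


def system_ping(target):
    """Run the operating system's ping once; True if it exits with status 0."""
    count_flag = "-n" if sys.platform.startswith("win") else "-c"
    result = subprocess.run(
        ["ping", count_flag, "1", target],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def first_failure(steps, ping=system_ping):
    """Return (target, diagnosis) for the first failing step, or None."""
    for target, diagnosis in steps:
        if not ping(target):
            return target, diagnosis
    return None


if __name__ == "__main__":
    print(first_failure(LAN_STEPS))
```

Passing the ping function as a parameter keeps the ordering logic separate from the network call, so the sequence can be checked without touching the network.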

To fix a corrupted TCP/IP installation on Windows XP, follow the steps in this Microsoft Knowledge Base article. For Windows 95/98/Me, un-install the TCP/IP protocol in Control Panel | Network, reboot, and re-install it. If that doesn't fix it, use this procedure on Windows 95 or 98.

Pinging the Internet

You can also use ping to find a problem with Internet access. Run these commands in the order shown, and don't go on to the next command until all of the previous commands work properly. Use the Default Gateway and DNS Server addresses that you got from the winipcfg or ipconfig /all command.

Command                  Target               What Ping Failure Indicates
ping w.x.y.z             Default Gateway      Default Gateway down
ping w.x.y.z             DNS Server           DNS Server down
ping w.x.y.z             Web site IP address  Internet service provider or web site down
ping www.something.com   Web site name        DNS Server down or web site down

References

www.practicallynetwork.com

www.computerhope.com

www.wikipedia.org



With the advent of virtual LANs (VLANs), you need to know how your devices are connected logically as well as physically. For example, if you have connected two devices through the same physical switch, you might assume that they can communicate with each other. However, the devices can be in separate VLANs that restrict their communication.

Knowing the setup of your VLANs can help you to quickly narrow the scope of a problem to a VLAN instead of to a network connection.

The Transcend NCS application Enterprise VLAN Manager allows you to view the logical makeup of your network. Depending on the complexity of your network and VLAN configurations, you can use colors to show the VLANs graphically on your network map.

Device Configuration Information

Maintain online and paper copies of device configuration information. Make sure that all online data is stored with your site's regular data backup. If your site does not have a backup system, copy the information onto a backup disc (CD, Zip disk, and the like) and store it offsite.

Follow these guidelines for saving configuration information:

• Because the easiest way to recover a device's configuration is to use FTP or TFTP, save the configuration settings of each device that supports this method of uploading.

• For other devices, Telnet in and save the session (which contains configuration details) to a file. If you cannot print the configuration of a device, then create a quick rebuild guide that explains the quickest way to configure the device from a fresh install.

• For devices that store information to diskette, store this data as part of your site's regular backup.

• For routers and other important devices with text configuration files, store this data online in a revision control system. Keep the most recent version on paper. Keep previous versions.

• For PCs, keep a recovery disk for each type of PC. For any device that you use as a server, store all startup scripts and copies of registries.

Other Important Data About Your Network

For a complete picture of your network, have the following information available:

• All passwords - Store passwords in a safe place. Keep previous passwords in case you restore a device to a previous software version and need to use the old password that was valid for that version.

• Device inventory - The inventory allows you to see the device type, IP address, ports, MAC addresses, and attached devices at a glance. Software tools such as Transcend Central can help you keep track of the 3Com devices on your network. Using Transcend Central, you can group devices by type and location and have this information on hand for troubleshooting.

• MAC address-to-port number list - If your hubs or switches are not managed, you must keep a list of the MAC addresses that correlate to the ports on your hubs and switches. Generate and keep a paper copy of this list, which is crucial for deciphering captured packets, using Address Tracker.

• Log book - Document your interactions, no matter how trivial, with each device that is critical to your network's operation (that is, routers, remote access devices, security servers). For example, document that you noticed a fan making noise one morning. Your note may help you to identify why a device is over temperature a week later (because the fan stopped working).

• Change control - Maintain a change control system for all critical systems. Permanently store change control records.

• Contact details - Store, online and on paper, the details of all support contracts, support numbers, engineer details, and telephone and fax numbers.

• LANsentry Reporter - Use LANsentry Reporter to generate reports from the database.

Identifying Your Network's Normal Behavior

By monitoring your network over a long period, you begin to understand its normal behavior. You begin to see a pattern in the traffic flow, such as which servers are typically accessed, when peak usage times occur, and so on. If you are familiar with your network when it is fully operational, you can be more effective at troubleshooting problems that arise.

Baselining Your Network

You can use a baseline analysis, which is an important indicator of overall network health, to identify problems. A baseline can serve as a useful reference of network traffic during normal operation, which you can then compare to captured network traffic while you troubleshoot network problems. A baseline analysis speeds the process of isolating network problems.

By running tests on a healthy network, you compile normal data to compare against the results that you get when your network is in trouble. For example, Ping each node to discover how long it typically takes you to receive a response from devices on your network.
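As a rough illustration of comparing a reading against a baseline, the Python sketch below summarizes response times collected on a healthy network and flags later readings that stand out. The function names, the sample values, and the three-standard-deviation threshold are assumptions for illustration, not part of any tool named here.

```python
import statistics


def build_baseline(samples_ms):
    """Summarize response times (ms) collected while the network is healthy."""
    return {
        "mean": statistics.mean(samples_ms),
        "stdev": statistics.stdev(samples_ms),
    }


def is_unusual(baseline, observed_ms, tolerance=3.0):
    """Flag a reading more than `tolerance` standard deviations above the mean."""
    return observed_ms > baseline["mean"] + tolerance * baseline["stdev"]


if __name__ == "__main__":
    healthy = [10, 12, 11, 13, 9]  # made-up response times in milliseconds
    baseline = build_baseline(healthy)
    print(baseline, is_unusual(baseline, 40))
```

The more samples you collect, and the more times of day they cover, the more trustworthy the comparison becomes.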

Applications such as Status Watch, Address Tracker, LANsentry Manager, and Traffix Manager allow you to collect days and weeks of data and set a baseline for comparison. Through the reporting mechanisms in the following list, you can continuously assess the data from your network and ensure that its performance is optimal:

• Web Reporter generates daily or weekly reports from data collected by Status Watch.

• Traffix Manager generates weekly reports from collected data and calculates the baselines for you. Set up Utilization History and Error History reports with data resolution set to Weekly.

• LANsentry Manager History View generates daily utilization graphs, which are sampled every 30 minutes, for each day over one week. Use these graphs to calculate your network baselines manually.

Identifying Background Noise

Know your network's background noise so that you can recognize real data flow. For example, one evening after everyone is gone, no backups are running, and most nodes are on, analyze the traffic on your network using the Traffix Manager application. The traffic that you see is mostly broadcast and multicast packets. Any errors that you see are the result of faulty devices (trace). This traffic is the background noise of your network - traffic that occurs for little value. If background noise is high, redesign your network.
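If you have a list of destination MAC addresses from a capture, the share of broadcast and multicast traffic can be estimated with a short Python sketch. It assumes Ethernet-style address notation, where the all-ones address is broadcast and a set low-order bit in the first byte marks a group (multicast) address. The function names are assumptions for illustration.

```python
def classify_destination(mac):
    """Classify a destination MAC address as broadcast, multicast, or unicast."""
    octets = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(octets) != 6:
        raise ValueError(f"not a 6-byte MAC address: {mac!r}")
    if octets == b"\xff" * 6:
        return "broadcast"
    # The least significant bit of the first octet marks a group address.
    if octets[0] & 0x01:
        return "multicast"
    return "unicast"


def background_share(destinations):
    """Fraction of frames addressed to broadcast or multicast destinations."""
    if not destinations:
        return 0.0
    noisy = sum(1 for mac in destinations if classify_destination(mac) != "unicast")
    return noisy / len(destinations)


if __name__ == "__main__":
    capture = ["ff:ff:ff:ff:ff:ff", "01:80:c2:00:00:00", "00-0c-29-aa-bb-cc"]
    print(background_share(capture))
```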


FDDI Connectivity

Use these sections to identify and correct connectivity errors on an FDDI ring:

• FDDI Connectivity Overview
• Monitoring FDDI Connections
• Making Your FDDI Connections More Resilient

Fiber Distributed Data Interface (FDDI), which is a self-correcting technology, automatically corrects ring faults to maintain connectivity throughout most of the network. However, you should monitor your FDDI connections for wrapped rings and other problems with ring connectivity.

Understanding the Problem

As shown in Figure 9, in a thru FDDI LAN, no stations on the trunk ring have a Configuration State (SMTConfigurationState) of Wrap or Isolated. However, users who complain about network performance may have lost connectivity to other stations on the network because the FDDI network is wrapped or segmented.

Figure 9 Thru Ring

Wrapped ring

By monitoring the Peer Wrap Condition, you can see when the Configuration State changes. In a wrapped ring (Figure 10), two stations on the LAN are in a wrapped Configuration State. This condition may or may not affect the connectivity of certain stations. Although operational, your network may have a cabling problem or a problem with a link.

Figure 10 Wrapped LAN

Segmented ring

In a segmented ring (Figure 11), more than two stations are wrapped on the trunk ring. Although this mode of operation is a valid FDDI LAN configuration, your LAN is probably experiencing a degraded or degrading condition.

Figure 11 Segmented Ring
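The three ring states described so far can be summarized as a rule based on how many trunk ring stations report a wrapped Configuration State. The Python sketch below is an illustration only. The function name is an assumption, and so is treating a single reported wrap as a wrapped ring, since the text describes wrapped rings as having two wrap points.

```python
def ring_state(wrapped_stations):
    """Classify a trunk ring from the number of stations reporting Wrap."""
    if wrapped_stations < 0:
        raise ValueError("station count cannot be negative")
    if wrapped_stations == 0:
        return "thru"       # no station is wrapped
    if wrapped_stations <= 2:
        return "wrapped"    # assumption: one report is treated like two
    return "segmented"      # more than two stations are wrapped


if __name__ == "__main__":
    for count in (0, 2, 4):
        print(count, ring_state(count))
```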

When a network connection has excessively high link errors, Station Management (SMT) shuts down the connection and tries to reconnect. A dual-attachment trunk ring station with an A or B connection that is shut down is one of the wrap points in the network. See Making Your FDDI Connections More Resilient for information about keeping a dual-attachment station connection from wrapping.

Isolated station

Sometimes a network wraps a particular station out of the ring. Stations on either side of a problem station can be wrapped. This effectively isolates the station or links that have problems, as shown in Figure 12.

Figure 12 Wrapped Ring with Isolated Station

If a ring was already wrapped when a network wraps a station out of the ring, then a segmented ring results, as shown in Figure 13.

Figure 13 Segmented Ring with Isolated Stations

Twisted ring

In a twisted ring, an A port is connected to an A port and a B port is connected to a B port instead of the normal A-to-B connections. A twisted ring, which always has two twist points (stations), can exist in either a Thru or Wrap state. You can monitor the Twisted Ring Condition and Undesired Connection Attempt Event for evidence of twisted ring and other connection problems.

Identifying the Problem

To identify the problem, follow this process:

1. At the FDDI LAN level, verify that your network is operating.

If the network is operating, the FDDI ring may be segmented, and therefore an FDDI station, or an Ethernet station on an Ethernet link, may have lost connectivity to other nodes on the network.

2. Determine if a ring is in a Thru, Wrap, or Segmented state.

If the FDDI ring is segmented or wrapped, look for a problem with a link somewhere in the network or for a nonfunctioning node on your trunk ring. If the ring is operating and is not segmented, or if it is segmented but you still have connectivity to the stations in question, move to a more specific level in your network.

See Monitoring FDDI Connections for more information.

3. Determine if the poorly performing station is an Ethernet or FDDI station.

If the problem is an FDDI station, find out if it is congested (that is, if the station is so busy that it cannot accept all the network traffic that is directed to it) by determining its Bandwidth Utilization. Also determine if the station has a high frame error rate by looking at the FDDI Ring Errors.

If the problem is an Ethernet station, look for congestion by examining Ethernet Packet Loss and Bandwidth Utilization.

Solving the Problem

Identify the station that is causing the disconnection and take the appropriate steps:

• If the disconnection is caused by a wrapped ring, then fix the hardware or cabling problem at that station.

• If the station is congested, you have a device problem rather than a network problem. For example, if the congested station is a file server and every other machine on the network is retrieving and saving files using that server, consider upgrading your server or adding servers to the network. A variety of devices from different vendors may be communicating on an FDDI or Ethernet network; some are faster and more capable, and some are slower and more prone to congestion.

• If the station is an Ethernet station that is attached to an Ethernet segment, reevaluate the setup of your Ethernet network and make some changes to improve its performance.

You can also make FDDI connections more resilient by implementing dual homing or installing an Optical Bypass Unit (OBU) where FDDI connections are prone to fail. See Making Your FDDI Connections More Resilient for more information.

Monitoring FDDI Connections

Monitor your FDDI devices for Warning or Critical alerts in the FDDI Status tool.

Status Watch

Use Status Watch to identify these FDDI connectivity errors:

• Peer Wrap Condition
• Twisted Ring Condition
• Undesired Connection Attempt Event

Follow these steps:

1. In the Device area, select the device that is located where you suspect an FDDI ring connectivity problem.

2. Monitor the FDDI Status tool for the currently selected device.

Here are some pointers for monitoring:

• If the Peer Wrap Configuration State variable is Isolated, the device is not connected to the FDDI trunk ring. If you intend the device to remain isolated, this indication is not a serious condition. However, if the device is supposed to be connected on a trunk ring, a serious problem may exist: the device is no longer transmitting packets to the larger trunk ring.

• If the Peer Wrap flag (SMTPeerWrapFlag) is set, the device is one of the wrap points. The cause of the wrapped ring is somewhere in the portion of the network between the two stations that report the peer wrap condition.

Making Your FDDI Connections More Resilient

When devices are removed from an FDDI ring, there is a break in the fiber path that causes the ring to wrap until the ring is made whole again. To prevent the break in the FDDI connection, you can implement dual homing or install an Optical Bypass Unit (OBU).

Implementing Dual Homing

When the operation of a dual attachment node is critical to your network, dual homing adds reliability by providing a backup connection if the primary link fails. Because a dual attachment station (DAS) has two attachments to the FDDI ring (A-to-M and B-to-M), you can use one of them as a standby link if the active link fails. Using dual homing, only one of the two attachments is active at a time. In this sense, a DAS acts as if it is a single attachment station (SAS) by using its A port as the standby link.

Through SMT, a DAS can be dual homed to the same concentrator or, more commonly, to two concentrators. This arrangement provides a more stable trunk ring of concentrators. If one concentrator fails, the DAS enables the standby link to another concentrator to become the active link. See Figure 14.

If the station is a dual path or dual path/dual MAC station, you can configure the dual-homed station in one of two ways:

• With both links active
• With one link active and one connection withheld as a backup, only becoming active when one link fails

Figure 14 Dual Homing Configuration

Installing an Optical Bypass Unit

You can insert an Optical Bypass Unit (OBU) into the FDDI ring as if it were a node and then plug your device into it. To use an OBU, your device needs an optical bypass interface. This interface lets the bypass know whether your device is still on the ring or not. See Figure 15.

If your device is removed, or if it fails, the bypass unit diverts the optical path away from your device, keeping the ring whole. You can use a bypass on devices that are prone to failure or are likely to be removed often, such as diagnostic equipment.

Figure 15 Optical Bypass Unit Configuration

FDDI Connectivity Reference

This section explains terms that are relevant to FDDI connectivity and provides additional conceptual and problem analysis detail.

Peer Wrap Condition

A Peer Wrap (wrapped ring) condition occurs when a dual-attachment station detects a fault (often a lost connection) and reconfigures the network by wrapping the dual trunk rings to form a single ring. Normally, the two stations that are adjacent to the fault wrap to maintain full connectivity. However, if a second fault occurs before the first is repaired, the network partitions itself into two or more rings, and stations lose connectivity.

When a station reports a Peer Wrap condition, locate and repair the problem that caused the station to wrap the rings. Potential causes include:

• Faulty FDDI port hardware
• Faulty cables or connectors
• Unplugged connectors
• Powered-down stations

You can expect to find the cause of the problem between the two stations that report the Peer Wrap condition.

Twisted Ring Condition

A Twisted Ring condition occurs when certain undesirable connection types exist. See Table 7 for more information. Although similar to the Undesired Connection Attempt, the Twisted Ring condition provides specific Station Management (SMT) and port information for diagnosis.

Undesired Connection Attempt Event

An Undesired Connection Attempt event occurs when a port tries to connect to another port of a type that may result in an undesirable network topology. Whether the connection attempt is successful depends on the current setting of the station's connection policies.

Table 7 lists connections that the FDDI standard defines as undesirable. The managed devices may or may not permit these connections, depending on their FDDI station configurations.

Table 7 Undesirable Connection Types

Connection Type   Reason That the Connection Is Undesirable
A-A               Twisted primary and secondary rings
A-S               A wrapped ring
B-B               Twisted primary and secondary rings
B-S               A wrapped ring
S-A               A wrapped ring
S-B               A wrapped ring
M-M               A tree of rings topology (illegal connection)

Table 8 lists FDDI connections that create valid topologies.

Table 8 Valid Connection Types

Connection Type   Reason That the Connection Is Valid
A-B               A normal trunk peer connection
A-M               A tree connection with possible redundancy. In a single MAC node, Port B has precedence (by default) for connecting to a Port M.
B-A               A normal trunk ring peer connection
B-M               A tree connection with possible redundancy. In a single MAC node, Port B has precedence (by default) for connecting to a Port M.
S-S               A single ring of two slave stations
S-M               A normal tree connection
M-A               A tree connection that provides possible redundancy
M-B               A tree connection that provides possible redundancy
M-S               A normal tree connection
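Tables 7 and 8 together cover every pairing of the four port types (A, B, S, and M), so they can be encoded as a simple lookup. The Python sketch below does this. The names `UNDESIRABLE`, `VALID`, and `check_connection` are assumptions for illustration, and the reasons are copied from Table 7.

```python
# Port pairs from Table 7, with the reason each is undesirable.
UNDESIRABLE = {
    ("A", "A"): "Twisted primary and secondary rings",
    ("A", "S"): "A wrapped ring",
    ("B", "B"): "Twisted primary and secondary rings",
    ("B", "S"): "A wrapped ring",
    ("S", "A"): "A wrapped ring",
    ("S", "B"): "A wrapped ring",
    ("M", "M"): "A tree of rings topology (illegal connection)",
}

# Port pairs from Table 8.
VALID = {
    ("A", "B"), ("A", "M"), ("B", "A"), ("B", "M"), ("S", "S"),
    ("S", "M"), ("M", "A"), ("M", "B"), ("M", "S"),
}


def check_connection(local_port, remote_port):
    """Return ("valid", None) or ("undesirable", reason) for a port pair."""
    pair = (local_port.upper(), remote_port.upper())
    if pair in UNDESIRABLE:
        return "undesirable", UNDESIRABLE[pair]
    if pair in VALID:
        return "valid", None
    raise ValueError(f"unknown port types: {pair}")


if __name__ == "__main__":
    print(check_connection("A", "A"))
    print(check_connection("A", "B"))
```

Whether a device actually refuses an undesirable pairing still depends on its connection policies, so treat the lookup as a diagnostic aid rather than a prediction of device behavior.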


Disabling the Offending Interface

Because broadcast storms can ultimately cause your whole network to become unavailable, take action immediately to disable the offending interface. You can enable the interface again after you have corrected the problem.

Address Tracker

Use Address Tracker to locate the interface that is causing the broadcast storm. Use Device View to disable the port.

Follow these steps:

1. In the Find Address window, enter the address of the interface that seems to be receiving the broadcast traffic.

You can copy the MAC or IP address from the Status Watch report and paste it into Address Tracker's Enter the Address You Want to Find field.

2. Click Find Now.

Search displays the device name.

3. Use Transcend Central to launch Device View and disable the port.

Disabling the port stops the broadcast storm before it interferes with all vital network traffic. You can re-enable this interface using Device View or the device's console later.

Correcting Spanning Tree Misconfigurations

Spanning Tree does not cause broadcast storms, but a loop in your Spanning Tree topology can create data that looks like a storm. A loop can occur in your topology if:

• Someone disables Spanning Tree on a port
• You set up your Spanning Tree configuration incorrectly

Device View

Use Device View to disable any Spanning Tree port that has a repeater attached to it and to correct Spanning Tree misconfigurations.

To correct Spanning Tree misconfigurations, use Device View to disable Spanning Tree Protocol (STP) for a port on a SuperStack® II Switch 1000, Switch 3000, Switch 3000 10/100, Switch 9000SX, Desktop Switch, LinkBuilder® FMS II Bridge/Management Module, or CoreBuilder 6000.

To disable the STP port state for a port on a SuperStack II switch:

1. Select a port and click the right mouse button.

2. From the shortcut menu, select Configure.

3. In the Port section, click the STP tab.

4. From the STP Port State list box, select Disabled.

5. Click Apply.

To disable the STP port state for a port on a LinkBuilder FMS II Bridge/Management Module:

1. Double-click the module.

2. From the shortcut menu, select Configure Bridge.

3. In the Port section, click the STP tab.

Broadcast Storms Reference

This section explains terms that are relevant to broadcast storms and provides additional conceptual and problem analysis detail.

Broadcast Packets

Broadcast packets, which are a normal part of network operation, are transmitted by a device to a broadcast address. For example, IP networks use broadcasts to resolve network addresses using Address Resolution Protocol (ARP). IPX networks use a large number of broadcast packets to operate most effectively.

Problems arise when broadcast packets endlessly propagate throughout the network, which increases the traffic volume on your network and the CPU time that each host spends processing and discarding unwanted broadcast packets.

Multicast Packets

Multicast packets, which are a normal part of network operation, are transmitted by a device to a multicast group address. Hosts that want to receive the packets indicate that they want to be members of the multicast group, and then multicast packets are distributed to that group. For example, multicast packets support the Spanning Tree Protocol. Multicast applications and underlying multicast protocols control multimedia traffic and shield hosts from processing unnecessary broadcast traffic. However, multicast traffic can also cause storms that saturate your network.

Duplicate Addresses

Use these sections to identify and correct problems caused by duplicate MAC and IP addresses:

• Duplicate Addresses Overview
• Finding Duplicate MAC Addresses
• Finding Duplicate IP Addresses

See Duplicate Addresses Reference for additional conceptual and problem analysis detail.

Networks sometimes generate duplicate MAC and IP addresses. Because duplicate addresses can cause problems with packet delivery, resolve them as soon as possible.

Understanding the Problem

Duplicate MAC addresses are caused by data link layer problems with Fiber Distributed Data Interface (FDDI) media and the passing of tokens on the FDDI ring. Duplicate IP addresses are caused by network layer problems. See these sections for more information about causes of duplicate addresses:

• Duplicate MAC Addresses
• Duplicate IP Addresses

Identifying the Problem

Identify duplicate MAC and IP addresses by following the instructions in these sections:

• Finding Duplicate MAC Addresses
• Finding Duplicate IP Addresses

Solving the Problem

Identify the cause of the duplicate address (such as user error or a hardware problem) and fix the problem if possible.

Finding Duplicate MAC Addresses

To find out if duplicate MAC addresses are occurring, monitor your network using Status Watch.

Status Watch

The Status Watch FDDI Status tool identifies duplicate FDDI MAC addresses. Status Watch reports when two or more MACs on the same ring have the same MAC address (a Duplicate Address condition).

Follow these steps:

1. In the Status Watch Summary View window, determine if any FDDI Status conditions are reported. If there are, double-click the table cell value to display the Device List window.

Another approach is to examine only the devices that you know reside on your FDDI ring. In the Status Watch main window, red device icons indicate that a threshold has been exceeded.

2. Select a device.

• If you selected the device from the Device List window, the real-time report for that device appears in the Status Watch main window.

• If you selected the device from the main window, also select the FDDI Status tool to view the real-time report.

3. Determine if a Duplicate Address condition caused the FDDI Status tool to trigger a Critical or Warning status for that device.

In Status Watch, you can specify the status severity level to apply to a Duplicate Address condition.

Finding Duplicate IP Addresses

To find out if duplicate IP addresses are occurring, monitor your network using these applications:

• Address Tracker - To find duplicate IP addresses on 3Com devices and their attached networks.

• LANsentry® Manager - To find duplicate IP addresses that are collected by probes gathering RMON2 SmartAgent® data from the Enterprise Communications Analysis Module (ECAM) downloaded on your network devices.

Address Tracker

Use Address Tracker to determine when and where duplicate IP addresses occur.

Follow these steps:

1. From the Find Address menu, select Find Duplicate IP Addresses.

2. Click Find Now to start your search.

LANsentry Manager

Use the Duplicates table in LANsentry Manager to compile a list of all stations with duplicate IP addresses. This table is available only on probes that have downloaded RMON2 (ECAM) SmartAgent software.

Follow these steps:

1. From the LANsentry Manager Address Map menu, select Duplicates. Address Map data is displayed as a table.

2. To export the contents of the table, click Export to launch the Data Export dialog box.

Duplicate Addresses Reference

This section explains terms that are relevant to duplicate addresses and provides additional conceptual and problem analysis detail.

Duplicate MAC Addresses

Each device on your network has a unique MAC address. This address identifies a single device on the network, allowing packets to be delivered to correct destinations.

Packets are delivered to their destinations by means of the MAC-address-to-IP-address translation that the Address Resolution Protocol (ARP) provides. Therefore, if MAC addresses are duplicated on the network, ARP caches of routing devices contain erroneous destinations. In FDDI, devices monitor network traffic, searching for their own MAC address in each packet to determine whether to decode the packet. If MAC addresses are not unique, two stations cannot be distinguished from each other.

Duplicate MAC addresses can occur for the following reasons:

• Someone has manually configured a MAC address for a device instead of using the address that the vendor supplied or allowing it to be assigned dynamically, and this address is also assigned to a different device.

• In rare circumstances, loops in a bridged network can cause a MAC hardware problem or an address learning problem that creates a duplicate MAC address entry in the bridging address table.

• On DECnet Phase 4 networks, MAC addresses are set from the DECnet address. A duplicate NET address can cause a duplicate MAC address.

Duplicate IP Addresses

Because IP addresses are critical for transmission of packets on TCP/IP networks, resolve duplicate IP addresses immediately.

Duplicate IP addresses can occur when someone has configured an IP address that is identical to an IP address that is assigned to a different device. Address assignments, although possible for you to configure manually, are usually made using one of these protocols:

• Dynamic Host Configuration Protocol (DHCP) - Allows your network to dynamically assign IP addresses to nodes. With this protocol, a DHCP server temporarily assigns an IP address to a node, or you can statically configure addresses as needed.

• BOOTstrap protocol (BootP) - Allows you to statically assign IP addresses to nodes. This protocol is more efficient than RARP.

• Reverse ARP (RARP) - Allows you to statically assign IP addresses to nodes. However, because this protocol relies on the MAC address to identify the node, you cannot use it on networks that dynamically assign hardware addresses.
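
The idea behind a duplicate IP address search can be illustrated with a short Python sketch. It is not how Address Tracker or LANsentry Manager work internally; it only assumes that you already have a list of (IP address, MAC address) pairs, for example exported from an ARP table. The function name and the sample addresses are hypothetical.

```python
from collections import defaultdict


def find_duplicate_ips(observations):
    """Return {ip: sorted MAC list} for every IP claimed by more than one MAC.

    `observations` is an iterable of (ip, mac) string pairs. MAC addresses
    are lowercased so that the same adapter is not counted twice.
    """
    macs_by_ip = defaultdict(set)
    for ip, mac in observations:
        macs_by_ip[ip].add(mac.lower())
    return {ip: sorted(macs) for ip, macs in macs_by_ip.items() if len(macs) > 1}


# Hypothetical sample data: 192.0.2.10 is claimed by two different adapters.
observations = [
    ("192.0.2.10", "00:A0:24:11:22:33"),
    ("192.0.2.11", "00:a0:24:44:55:66"),
    ("192.0.2.10", "00:a0:24:77:88:99"),
    ("192.0.2.11", "00:A0:24:44:55:66"),
]
duplicates = find_duplicate_ips(observations)
```

An IP address that maps to more than one MAC address is only a candidate; confirm it on the devices themselves before changing any configuration.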

Ethernet Packet Loss

Use these sections to identify and correct Ethernet packet loss:

• Ethernet Packet Loss Overview
• Searching for Packet Loss

See Ethernet Packet Loss Reference for additional conceptual and problem analysis detail.

If your Ethernet network shows signs of congestion, it may be experiencing packet loss. When your network is congested, utilization is usually high, packets are discarded because buffers are full, and collision rates are up. Problems related to Collisions are often at the heart of packet loss.

Understanding the Problem

Collisions are normal in Ethernet networks. In many cases, Collision rates of 50 percent do not cause a large decrease in throughput. The Collision rate helps mark the upper limit on your network (the maximum percentage of collisions that your network can bear), which is usually around 70 percent. If Collisions increase above this upper limit, your network can become unreliable.

When the Collision rates increase, so do Excessive Collisions, which causes a delay in transmitting data. An increase in Collisions also indicates that network utilization and network errors, such as FCS Errors, are probably increasing.

The real packet problems to watch for, however, are undetected collisions that show up as Late Collisions.

Identifying the Problem

To confirm that your network's problem is related to packet loss, verify that frames are being dropped on your network by examining this packet loss data:

• Alignment Errors
• Collisions
• Excessive Collisions
• FCS Errors
• CRC Errors
• Late Collisions
• Receive Discards
• Too Long Errors
• Too Short Errors
• Transmit Discards

The process of identifying the problem is discussed in Searching for Packet Loss.

Solving the Problem

If you notice that packet loss data is consistently high, then your network is too congested. In this case, segment your network with the appropriate network device (such as a switch or router). If Collision data shows increases but your network's utilization is the same, then your network may have a physical problem, such as cabling that is too long. Other problems that packet loss data can indicate include:

• Faulty connectors or improper cabling
• Excessive numbers of repeaters between network devices
• Defective Ethernet transceivers or controllers

Possible solutions to these problems are explained in the procedures in Searching for Packet Loss.

Searching for Packet Loss

When you look for packet loss, use the following applications:

• Status Watch - For Ethernet and MIB-II data collection using SNMP polling
• LANsentry Manager Network Statistics Graph - For RMON data collection using an RMON probe
• Device View - On a per-device basis, you can evaluate statistics for any port on the device

Status Watch

Status Watch monitors:

• Alignment Errors
• Excessive Collisions
• FCS Errors
• Receive Discards
• Transmit Discards

Follow these steps:

1. Determine if the thresholds for the Alignment Errors tool and FCS Errors tool are being exceeded.

Table 16 identifies the problems that this data can indicate and your possible actions. For information about problems related to a nonstandard Ethernet implementation, see Nonstandard Ethernet Problems.

Table 16. Alignment Errors, FCS Errors, and CRC Errors Data

Possible Problem: Faulty cabling
Possible Action: Examine the cable and cable connections for breaks or damage.

Possible Problem: Network noise
Possible Action: Look for improper cabling, faulty cable, faulty network equipment, or cables that are too close to equipment that emits electromagnetic interference (lamps, for example).

Possible Problem: Faulty transceiver
Possible Action: Use an analyzer to identify the problematic transceiver. If necessary, replace the transceiver, network adapter, or station.

Possible Problem: Fault at the transmitting end station
Possible Action:
1. Locate the source of the errors by looking at the module and port statistics.
2. Verify the correct operation of the transceiver or adapter card of the device that is connected to the problem port.
3. If the card appears to be operating correctly, examine the cable and cable connections for breaks or damage.

Possible Problem: Station powering up or down
Possible Action: None required. Early implementations of Ethernet transceivers generate a significant amount of in-band noise when powering up; they frequently cause Alignment Errors and FCS Errors in an otherwise stable network. When powering up, some software drivers for Ethernet controllers also initiate Time Domain Reflectometry (TDR) tests to test the Ethernet media. Network monitors report TDR tests as Alignment Errors and FCS Errors.

Possible Problem: Faulty adapter
Possible Action: Replace the adapter.

2. Determine if the Excessive Collisions tool threshold is being exceeded. Table 17 identifies the problems that this data can indicate and your possible actions.

Table 17. Collisions and Excessive Collisions Data

Possible Problem: Busy network
Possible Action: Use a bridge, router, or switch to reconfigure your network into segments with fewer stations.

Possible Problem: Faulty device (adapter, switch, hub, and the like) that does not listen before broadcasting. This problem increases the incidence of all types of collisions.
Possible Action: Isolate each adapter to see if the problem stops.

Possible Problem: Network loop
Possible Action: Ensure that no redundant connections to the same station have both connections active simultaneously.

3. Determine if the Receive Discards and Transmit Discards tools' thresholds are being exceeded.

If these errors are high in conjunction with the data that you learned in steps 1 and 2, then your network is overloaded. Segment your network.

LANsentry Manager Network Statistics Graph

Use the LANsentry Manager Network Statistics graph to view data for:

• Collisions
• Late Collisions
• Bandwidth Utilization
• CRC Errors
• Too Long Errors
• Too Short Errors

Follow these steps:

1. Display a Network Statistics graph for the local Ethernet segment on which users have reported poor performance.

This graph shows the most recent trend in Collision rates. If you have set up a History sample, you can also look at the historical trend. If a number of segments are connected by repeaters, examine the graph for each Ethernet segment.

2. Analyze Utilization and Collision rates to determine whether collisions are caused by an overloaded segment or a faulty component.

• If Utilization rates are high - The collisions are probably caused by an overloaded segment. If you have added nodes or new applications to your network, consider reconfiguring the cabling system using bridges and routers to filter out remote collisions and to keep local traffic on one segment. This action should level the network load.

• If Utilization rates are stable and appear normal - The collisions are probably caused by faulty components. In this case, do the following:

    • If the network consists of repeaters - Compare the Network Statistics graphs for each segment connected to the repeater. Because repeaters repeat traffic across all connected segments (which makes many segments seem like one network), you should see similar levels of traffic on all segments. One segment that shows dissimilar levels of traffic and collisions may indicate faulty hardware. In this case, monitor several collisions to track the source station that is transmitting too soon after collisions, and repair the station. Packets that are transmitted too soon after collisions are unlikely to be valid. See Table 17 for more information about Collisions.

    • On other networks - Determine the segment cable length.

3. Examine the CRC Errors and Late Collisions, which often indicate cabling or component problems.

Table 16 identifies the problems that CRC Errors can indicate and your possible actions. Table 18 identifies the problems that Late Collisions data can indicate and your possible actions.

Table 18. Late Collisions Data

Possible Problem: Cabling problems
• Segment too long
• Failing cable
• Segment not grounded properly (noise)
• Improper termination
• Taps too close (10BASE-5 and 10BASE-2 only)
• Noisy cable
Possible Action: Correct the cabling problem by doing one or more of the following:
• Reduce the segment length
• Replace the cable
• Ground the cable
• Terminate the cable correctly
• Check the taps
• Check for cables too close to equipment that emits electromagnetic interference

Possible Problem: Component problems
• Deaf or partially deaf node
• Failing repeater, transceiver, or controller cards
Possible Action: Correct the component problem by doing one of the following:
• Trace the failing component and replace it
• Replace the NIC or the transceiver

4. Trace Too Short Errors and Too Long Errors to the sender.

These errors often indicate faulty routers or LAN drivers and transceiver problems. Table 19 identifies the problems that this data can indicate and your possible actions.

Possible Problem: A transceiver on your network is adding bits to the packets that are transmitted by the attached station.
Possible Action:
1. Use a network analyzer to identify the problematic transceiver.
2. If necessary, replace the transceiver, network adapter, or station.

Possible Problem: The jabber protection mechanism on a transceiver has failed; it can no longer protect the network from the jabbering produced by the attached station.
Possible Action: Replace the network card.

Possible Problem: Excessive noise on the cable. Note: Some 10/100 Mbps cards that autodetect the network speed may connect to the network at the wrong speed, causing excessive noise.
Possible Action: Check for improper cabling, faulty cable, faulty network equipment, or cables too close to noisy electronic equipment (lamps, for example). If your network card autodetects the network speed and you have ruled out other problems, manually configure the network speed.

Possible Problem: Faulty routers (two different network types are connected and the router is not enforcing proper frame size restrictions)
Possible Action: Notify the manufacturer.

Possible Problem: Faulty LAN driver
Possible Action: Replace the driver.

Possible Problem: A normal condition on a LinkSwitch® 1000, LinkSwitch® 3000, or CoreBuilder 5000 FastModule
Possible Action: If you use maximum-sized 1518-byte Ethernet frames, the device's VLT-enabled ports add a frame tag of 4 bytes, resulting in a misleading Too Long Frame error. These frames are passed successfully but will create the Too Long Frame error message. If you want to eliminate the error message, reduce your Ethernet packet frames by 4 bytes.

Device View

Device View allows you to display a variety of port and device-level statistics relevant to Ethernet packet loss. Table 20 describes these statistics and their use in troubleshooting.

Table 20. Activity and Error Statistics in Device View

Statistics Group: Activity
Description: Displays the total network activity and errors on the selected port.
Use in Troubleshooting: This data shows readable packets, broadcast packets, Collisions, total errors, and runts, which cause Too Short Errors. You can interpret this data in the following ways:
• The presence of runts can often be caused by Collisions; however, if the values increase at specific times of the day, it may indicate that you need to change the network topology to manage the traffic more efficiently (for example, with switches or routers).
• Runts can also be caused by a badly terminated coax cable.
• Large numbers of runts not associated with high levels of collisions can indicate a transmission problem (examine the cable).
• Particularly high numbers of Collisions compared to the total number of readable packets can point to a hardware problem (a bad adapter) or to a data loop.
• A high proportion of Broadcast Packets (>10%) on a heavily utilized network (>50% of available bandwidth) can point to an incorrectly configured bridge or router on the network.

Statistics Group: Errors
Description: Displays the number of frames with errors on the selected port.
Use in Troubleshooting: The significance of errors depends on accompanying errors and prevailing network conditions. See the following error data for more information:
• Alignment Errors: Table 16
• FCS Errors: Table 16
• Too Long Errors: Table 19
• Too Short Errors or runts: Table 19
• Late Collisions: Table 18

To display Activity and Errors statistics for a device or port, follow these steps:

1. Select the required port or device.

2. From the shortcut menu, select Activity or Errors.

The statistics available depend on the type of port or device selected. See Table 20 for troubleshooting information.

Ethernet Packet Loss Reference

This section explains terms that are relevant to Ethernet packet loss and provides additional conceptual and problem analysis detail.

Alignment Errors

An Alignment Error indicates a received frame for which both of the following are true:

• The number of bits received is an uneven byte count (that is, not an integral multiple of 8).

• The frame has a Frame Check Sequence (FCS) error.

Alignment Errors often result from MAC layer packet formation problems, cabling problems that cause corrupted or lost data, and packets that pass through more than two cascaded multiport transceivers. See FCS Errors for more information about interpreting Alignment Errors.

Collisions

Collisions indicate that two devices detect that the network is idle and try to send packets at exactly the same time (within one round-trip delay). Because only one device can transmit at a time, both devices must stop sending and attempt to retransmit. Collisions are detected by the transmitting stations.

The retransmission algorithm helps to ensure that the packets do not retransmit at the same time. However, if the two devices retry at nearly the same time, packets can collide again; the process repeats until either the packets finally pass onto the network without collisions, or 16 consecutive collisions occur and the packets are discarded.
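
The text does not name the retransmission algorithm. In standard Ethernet (IEEE 802.3) it is truncated binary exponential backoff, which the following Python sketch illustrates. The function names and the `collides` callback are hypothetical; the wait is expressed in slot times, and the cap of 10 on the exponent comes from the standard rather than from this document.

```python
import random

MAX_ATTEMPTS = 16    # from the text: discard after 16 consecutive collisions
BACKOFF_LIMIT = 10   # IEEE 802.3: the random range stops doubling after 10 collisions


def backoff_slots(collision_count, rng=random):
    """Random wait (in slot times) after the nth consecutive collision."""
    exponent = min(collision_count, BACKOFF_LIMIT)
    return rng.randrange(2 ** exponent)   # uniform in 0 .. 2**exponent - 1


def send_with_backoff(collides, rng=random):
    """Try to send one frame.

    `collides(attempt)` is a hypothetical callback that returns True when
    the given attempt (1-based) ends in a collision.
    Returns (delivered, list of backoff waits in slot times).
    """
    waits = []
    for attempt in range(1, MAX_ATTEMPTS + 1):
        if not collides(attempt):
            return True, waits
        if attempt < MAX_ATTEMPTS:
            waits.append(backoff_slots(attempt, rng))
    return False, waits   # excessive collision: the frame is discarded
```

Because each station draws its wait independently from a range that doubles after every collision, two stations that collided once become progressively less likely to pick the same slot again.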

CRC Errors

A Cyclic Redundancy Check (CRC) Error is an RMON statistic that combines FCS Errors and Alignment Errors. These errors indicate that packets were received with:

• A bad FCS and an integral number of octets (FCS Errors)
• A bad FCS and a non-integral number of octets (Alignment Errors)

CRC Errors can cause an end station to freeze. If a large number of CRC Errors are attributed to a single station on the network, replace the station's network interface board. Typically, a CRC Error rate of more than 1 percent of network traffic is considered excessive.

Excessive Collisions

Excessive Collisions indicate that 16 consecutive collisions have occurred, usually a sign that the network is becoming congested. For each excessive collision count (or after 16 consecutive collisions), a packet is dropped. If you know the normal rate of excessive collisions, then you can determine when the rate of packet loss is affecting your network's performance. See Knowing Your Network's Configuration for more information.

FCS Errors

Frame Check Sequence (FCS) Errors, a type of CRC Error, indicate that frames received by an interface are an integral number of octets long but do not pass the FCS check. The FCS is a mathematical way to ensure that all the frame's bits are correct without having the system examine each bit and compare it to the original. Packets with Alignment Errors also generate FCS Errors.

Both Alignment Errors and FCS Errors can be caused by equipment powering up or down or by interference (noise) on unshielded twisted-pair (10BASE-T) segments. In a network that complies with the Ethernet standard, FCS or Alignment Errors indicate bit errors during a transmission or reception. A very low rate is acceptable. Although Ethernet allows a 1 in 10^8 bit error rate, typical Ethernet performance is 1 in 10^12 or better.

Late Collisions

Late Collisions indicate that two devices have transmitted at the same time, but cabling errors (most commonly, excessive network segment length or repeaters between devices) prevent either transmitting device from detecting a collision. Neither device detects a collision because the time to propagate the signal from one end of the network to the other is longer than the time to put the entire packet on the network. As a result, neither of the devices that cause the late collision senses the other's transmission until the entire packet is on the network.

Although late collisions occur for small packets, the transmitter cannot detect them. As a result, a network suffering measurable Late Collisions for large packets is losing small packets as well.

Nonstandard Ethernet Problems

Table 21 lists the symptoms that typically occur if a system violates the Ethernet standard.

Table 21. Symptoms of Common Ethernet Network Problems

Symptoms: FCS Errors and Alignment Errors increase significantly.
Problem: Network cabling is too long.
Notes: If you use a promiscuous network monitor, the number of Late Collisions reported by stations should correlate with the FCS and Alignment Errors reported by the monitor.

Symptoms: FCS and Alignment Errors increase proportionally with interference (sometimes referred to as noise hits).
Problem: Network segment is noisy.
Notes: Typically observed on a 10BASE-T network segment in a noisy environment. If you use multiple promiscuous monitors, the FCS and Alignment Errors among the monitors will not correlate.

If the monitor can track runts, also called Too Short Errors, the number of runt packets should be significantly higher than normal.

Symptoms: FCS and Alignment Errors are much higher than normal.
Problem: Networks do not conform to the access scheme of Carrier Sense Multiple Access with Collision Detect (CSMA/CD).
Notes: Occurs when some implementations of Ethernet in the segment are not entirely compatible with IEEE 802.3 repeaters. Collision fragments linger on the network long enough to collide with retry packets at the minimum interpacket gap (IPG). The IPG is smaller on one side of the repeated network, causing a lost packet. Ethernet controllers cannot receive packets that are separated by 4.7 µs or less. Some controllers cannot sustain receptions of packets separated by as much as 9.6 µs. If runt packets are received one after another and are followed by a collision fragment, Ethernet controllers that cannot sustain reception will lose packets.

Receive Discards

Receive Discards indicate that received packets could not be delivered to a high-layer protocol because of congestion or packet errors.

Too Long Errors

A Too Long Error indicates that a packet is longer than 1518 octets (including FCS octets) but otherwise well formed. Too Long Errors are often caused by a bad transceiver, a malfunction of the jabber protection mechanism on a transceiver, or excessive noise on the cable.

Too Short Errors

A Too Short Error, also called a runt, indicates that a packet is fewer than 64 octets long (including FCS octets) but otherwise well formed.
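
The frame-level definitions in this reference (Alignment, FCS, CRC, Too Long, and Too Short Errors) can be summarized in a small Python sketch. It is only an illustration of the definitions as stated here, not the logic of any particular probe or adapter; the function names and labels are hypothetical, and a frame with a good FCS but a few trailing extra bits is simply judged by its whole-octet length, which is an assumption.

```python
MIN_FRAME_OCTETS = 64     # shorter, otherwise well-formed frames are runts
MAX_FRAME_OCTETS = 1518   # longer, otherwise well-formed frames are Too Long


def classify_frame(bit_count, fcs_ok):
    """Label a received frame using the definitions in this reference."""
    whole_octets, extra_bits = divmod(bit_count, 8)
    if not fcs_ok:
        # Bad FCS: non-integral octet count -> Alignment Error,
        # integral octet count -> FCS Error.
        return "alignment_error" if extra_bits else "fcs_error"
    if whole_octets < MIN_FRAME_OCTETS:
        return "too_short"
    if whole_octets > MAX_FRAME_OCTETS:
        return "too_long"
    return "ok"


def is_crc_error(label):
    """The RMON CRC Errors statistic combines FCS and Alignment Errors."""
    return label in ("fcs_error", "alignment_error")
```

For example, `classify_frame(64 * 8, True)` yields `"ok"`, while the same frame with a failed FCS check and three extra bits is an Alignment Error and therefore also counts toward CRC Errors.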

Transmit Discards

Transmit Discards indicate that packets were not transmitted because of network congestion.

FDDI Ring Errors

Use these sections to identify and correct FDDI ring errors:

• FDDI Ring Errors Overview
• Identifying Ring Errors

Fiber Distributed Data Interface (FDDI) often corrects its own problems. However, because FDDI cannot correct all errors (especially those related to hardware problems), you should monitor FDDI errors.

Understanding the Problem

FDDI ring errors that you should monitor include:

• Elasticity Buffer Error Condition
• Frame Error Condition
• Frames Not Copied Condition
• Link Error Condition

Identifying the Problem

First, determine the type of FDDI ring errors and where they are occurring. Similar to the way you identify other FDDI problems, identify the upstream and downstream neighbors of the devices that you are monitoring.

Several types of network errors can cause FDDI performance problems. For example, problems with cables or physical connections may result in a link or frame error. Elasticity buffer (EB) errors can also lead to link and frame errors.

FDDI deals with port-related errors as follows:

• The variable PORTlerAlarm is the link error rate (LER) value at which a link connection generates an alarm. When the LER is greater than the alarm setting, Station Management (SMT) sends a Status Report Frame (SRF) to notify you that there is a problem with a port.

The PORTlerAlarm threshold is set lower than the PORTlerCutoff threshold so that you are notified of a problem before the port is actually removed from the ring.

• When link errors reach the threshold defined by the variable PORTlerCutoff, SMT breaks the connection, disabling the PHY that detected the problem. A Link Error Condition is also generated.

FDDI deals with MAC-related errors as follows:

• When MAC frame errors reach a certain threshold, a Frame Error Condition is generated. Because the actual error can be further upstream than the immediate connection, the connection remains intact.

• For a large network, the worst case MACFrameErrorRatio is less than 0.1 percent. However, during network configuration, frame error ratios can reach 50 percent for short periods. When you detect a sustained frame error ratio of more than 0.1 percent, a problem exists between the station that is reporting the condition and the nearest upstream MAC.
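
The two-level port threshold and the sustained frame error check described above can be sketched in Python as follows. This is a simplified illustration, not SMT's actual implementation: the function names are hypothetical, error rates are treated as plain fractions, and "sustained" is assumed to mean that every sample in the observation window exceeds the limit.

```python
def port_action(ler, ler_alarm, ler_cutoff):
    """Return 'ok', 'alarm', or 'cutoff' for a measured link error rate.

    ler_alarm plays the role of PORTlerAlarm and ler_cutoff the role of
    PORTlerCutoff; the alarm threshold must be the lower of the two.
    """
    if ler_alarm >= ler_cutoff:
        raise ValueError("alarm threshold must be lower than cutoff threshold")
    if ler >= ler_cutoff:
        return "cutoff"   # SMT breaks the connection (Link Error Condition)
    if ler > ler_alarm:
        return "alarm"    # SMT sends a Status Report Frame
    return "ok"


def sustained_frame_errors(ratios, limit=0.001):
    """True if every sampled frame error ratio exceeds the limit.

    The default limit of 0.001 corresponds to 0.1 percent.
    """
    return bool(ratios) and all(ratio > limit for ratio in ratios)
```

With hypothetical thresholds of 1e-8 (alarm) and 1e-7 (cutoff), a measured rate of 5e-8 raises an alarm while the port stays on the ring. A brief spike to 50 percent during configuration followed by normal samples does not count as sustained.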

Solving the Problem

To solve problems related to FDDI errors, fix the hardware, cabling, or congestion problem.

Identifying Ring Errors

Use Status Watch to monitor your FDDI devices for Warning or Critical alerts.

Status Watch

Use Status Watch to identify FDDI ring errors.

Follow these steps:

1. Monitor the FDDI Status tool for the currently selected device.

2. Determine whether Status Watch is reporting Elasticity Buffer Errors or a high percentage of Frame Errors, Frames Not Copied, or Link Error Rates for the currently selected device.

FDDI Ring Errors Reference

This section provides additional conceptual and problem analysis detail.

Elasticity Buffer Error Condition

The Elasticity Buffer Error condition occurs when a port's elasticity buffer overflows or underflows. This condition usually indicates that a port's hardware is not operating within the tolerances that the FDDI standard specifies. Look for the problem in the hardware of either the port that is reporting the condition or of the immediately adjacent port.

Frame Error Condition

The Frame Error condition occurs when the percentage of frames that contain errors exceeds a preset threshold. When a device is an uplink to FDDI (that is, a device is transmitting onto FDDI), this type of condition indicates that the ring is saturated. The ring is out of buffer space, and packets are being dropped from the device's backbone port.

The problem indicated by the frame errors is usually located between the MAC that reports the condition and its upstream neighbor. Because many physical connections can lie along this path, the MACFrameErrorRatio variable identifies only the two MACs between which the problem is occurring.

Frames Not Copied Condition

The Frames Not Copied condition occurs when the percentage of frames that are dropped because of insufficient buffer space exceeds a preset threshold. This condition indicates that the station is congested and is unable to process frames as quickly as they arrive. To help eliminate congestion:

• Add more capacity to the station
• Reconfigure your network so that end stations that communicate heavily with one another are on the same bridge or switch
• Filter out certain traffic

Link Error Condition

The Link Error condition occurs when a port detects link errors at a rate that exceeds a preset threshold. When the Link Error threshold is exceeded, the station removes itself from the ring and tries to reinsert itself on the ring. This action creates a MAC Neighbor Change Event (which also occurs if a ring wraps).

Link errors may indicate an FDDI PHY hardware problem (such as a faulty transmitter) or a faulty cable or connector. Look for the problem in the portion of the network between the port that is reporting the condition and the first upstream transmitter.

MAC Neighbor Change Event

The MAC Neighbor Change event occurs when a MAC's upstream or downstream neighbor changes.

This event indicates either:

• A network reconfiguration
• Another station that is leaving or joining the ring

Network File Server Timeouts

Use these sections to identify and correct timeouts on network file servers:

• Network File Server Timeout Overview
• Looking for Obvious Errors
• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

See Network File Server Timeouts Reference for additional conceptual and problem analysis detail.

A network file server can time out if your network gets congested or if your server is having problems. Users might have problems downloading data from or to the server or copying files from or to the server. To help you to understand the troubleshooting process for this type of problem, an EXAMPLE throughout this section follows the symptoms, analysis, and resolution of a typical file server timeout problem.

Understanding the Problem

When users log in, their stations make network file server calls either to determine quotas (if this feature has been enabled) or to mount user home directories. The network file server timeout messages, even when spread across multiple nodes, indicate a problem either with the network or with a server.

EXAMPLE: UNIX users notice that it takes a long time - over 30 seconds in some cases - to log in to any machine. Some machines report network file server timeout messages, but the messages have no obvious pattern and are infrequent. You begin to get a sense of the problem.

Identifying the Problem

First, rule out the obvious causes. Ask these questions:

• Can you access the network file server with Telnet?
• Have any alarms been triggered?
• Are there any new errors?

The process of identifying the problem is developed in Looking for Obvious Errors.

Solving the Problem

To determine the cause, reproduce the fault while you monitor the network. After you know the cause, you can fix the problem.

The solutions to the network file server timeout are identified in these sections:

• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

Looking for Obvious Errors

To look for obvious errors, use these applications:

• Ping and Telnet - To check for connectivity to the network file server nodes
• LANsentry Manager Alarms View - To search for triggered alarms
• LANsentry Manager Statistics View - To look for errors
• LANsentry Manager History View - To identify trends

Ping and Telnet

Determine whether you can contact network file server nodes using Ping and Telnet. If the response is extremely slow, then a problem may exist with the connections to the nodes. No delay indicates that the connections are normal, implying that the delay is occurring elsewhere. In this case, use LANsentry® Manager tools to determine whether packets are being lost or ignored.
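As a rough illustration of this connectivity check, the following Python sketch times a TCP connection to a server and labels the result. The function names and the 2-second threshold are assumptions made for illustration and are not part of the original procedure; choose a port that your file server actually listens on (port 2049 is commonly used by NFS).

```python
import socket
import time

def tcp_connect_time(host, port, timeout=5.0):
    """Return the seconds taken to open a TCP connection, or None on failure."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        return None

def classify(elapsed, slow_threshold=2.0):
    """Label a measurement; the threshold is an illustrative assumption."""
    if elapsed is None:
        return "unreachable"
    if elapsed > slow_threshold:
        return "slow"
    return "normal"
```

For example, classify(tcp_connect_time("fileserver.example", 2049)) would label a hypothetical server named fileserver.example. A "normal" result suggests that the delay is occurring elsewhere.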

LANsentry Manager Alarms View

Using the LANsentry Manager Alarms View, you can determine if any configured alarms have been triggered.

Search the Alarms View to see if any MAC events have been logged.

EXAMPLE: MAC events have not been logged for the network to which the UNIX users are attached.

Even though no alarms have occurred, errors may exist. For example, a low rate of background errors may exist just below the alarm threshold. Because RMON alarms are based on maximum and minimum values, they may miss constant, periodic, or low amounts of errors.
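The following Python sketch illustrates why a threshold-based alarm can stay silent while errors accumulate. The sample values and the threshold of 10 are hypothetical, and the functions are not part of LANsentry Manager.

```python
def alarms_triggered(samples, threshold):
    """Return the indexes of samples whose error count reaches the threshold."""
    return [i for i, errors in enumerate(samples) if errors >= threshold]

def total_errors(samples):
    """Return the cumulative error count across all samples."""
    return sum(samples)

# Hypothetical per-interval error counts: a steady, low level of errors.
background = [4, 5, 4, 5, 4, 5, 4, 5]

print(alarms_triggered(background, threshold=10))  # [] (no alarm fires)
print(total_errors(background))                    # 36 errors still occurred
```

No single sample reaches the threshold, so no alarm is raised, yet the cumulative count shows a persistent problem.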

LANsentry Manager Statistics View

Using the LANsentry Manager Statistics View, you can display a multisegment graph of utilization and error statistics.

Follow these steps:

1. Set up a graph that shows utilization and errors on all your major segments.

2. Determine whether any segments are particularly busy or error prone.

EXAMPLE: You notice that one segment of the UNIX network, HUB3, is reporting Too Long Errors and FCS Errors roughly every second sample. While the amount is not higher than normal, it is currently higher than any other segment.

LANsentry Manager History View

Using the LANsentry Manager History View, you can display a rolling history table to determine if the errors that you are seeing are new. For example, if you have a history table that runs for 30-minute samples over two days, you can compare the most recent sample to a previous sample, looking for new errors. If your probe has the resources, use a much finer resolution sample stored for a shorter time (every 30 seconds for 2 hours) to more easily spot recent errors.

EXAMPLE: You see that the history table shows that no error rates remained constant throughout the day. However, errors that did occur were on the device HUB3 and were Too Long Errors and FCS Errors.
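Comparing a recent history sample with an earlier one can be sketched in Python as follows. The dictionary layout and the counts are hypothetical and do not reflect how the History View stores its data.

```python
def increased_errors(previous, recent):
    """Return {error_type: (old, new)} for counts that rose between two samples."""
    changes = {}
    for error_type, new_count in recent.items():
        old_count = previous.get(error_type, 0)
        if new_count > old_count:
            changes[error_type] = (old_count, new_count)
    return changes

# Hypothetical error counts from two history samples for one segment.
previous_sample = {"FCS Errors": 2, "Too Long Errors": 1}
recent_sample = {"FCS Errors": 2, "Too Long Errors": 1, "Collisions": 3}

print(increased_errors(previous_sample, recent_sample))  # {'Collisions': (0, 3)}
```

An empty result means that no error type increased between the two samples.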

Reproducing the Fault While Monitoring the Network

Although the RMON View in LANsentry Manager can show error rates and help you to identify the location of the problem, it may not provide enough data to solve the problem. To determine the cause of the problem, reproduce it while you monitor the network by using these applications:

• LANsentry Manager Top-N Graph - To locate a quiet node to use for reproducing the fault

• LANsentry Manager Packet Capture - To capture packets from the hub to which the quiet node is attached

• LANsentry Manager Packet Decode - To analyze the packets to assess network file server traffic and delays

• Address Tracker - To find the location of the problem nodes

EXAMPLE: Using LANsentry Manager, you find a hub on the network with a higher than normal error rate. However, the error rate does not seem high enough to cause login delays of 60 seconds or more.

LANsentry Manager Top-N Graph

Using the Top-N graph in the LANsentry Manager main window, locate a quiet node that has been showing the same problem. Choose a quiet node so that you do not receive excessive traffic when you try to isolate the problem.

EXAMPLE: You see that the node Monolith, which has the same Network File System (NFS) mounts as the other nodes on the network, is quiet. You decide to use this node for reproducing the fault. See Network File System (NFS) Protocol for more information about NFS.

LANsentry Manager Packet Capture

Using the LANsentry Manager Packet Capture application, capture packets from the network using predefined patterns and start-and-stop conditions.

Follow these steps:

1. Set up a capture buffer on a probe that is connected to the same hub as the quiet node. Until you know more about the problem, set a very general filter.

EXAMPLE: You select a MAC-layer filter and set a conversation filter to capture all packets to and from Monolith.

2. Telnet into and log out from the quiet node. Then reset the capture buffer. Repeat this procedure until you see the problem reflected in your captured data. To keep the buffer information clean, reset the buffer each time that you repeat the procedure.

3. When you see the delay, note the rough value of the packet count on the LANsentry Manager packet buffer.

By noting the packet count at which you think the delay has occurred, you can narrow the problem to within about 20 packets in the buffer. If you have used an extremely quiet node, you may even identify the exact packet.

LANsentry Manager Packet Decode

The LANsentry Manager Packet Decode application decodes all major protocols and displays the packet contents at three levels of detail: summary information, header information, and actual packet content.

Follow these steps:

1. Open the buffer in the Packet Decode application and locate the number of the packet at which the delay occurred.

2. Select the packet and launch a MAC-layer conversation filter. In the filter display, look for a gap in the conversation (that is, where the node sent a request and then resent it at approximately the same rate as the delay you experienced when recreating the problem).

3. Repeat the test to determine if the result concentrates on one node or if it appears on other nodes.

EXAMPLE: On the quiet node that you selected, the delay is obvious. You see an NFS request going out to a node and a repeat of the request 30 seconds later. During that time, the node did not respond. You now know that the delay occurred because nodes were not seeing responses for NFS requests. When you repeat the test on other nodes, you find that the delay is happening with more than one destination node.
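The search for a gap in the conversation can be sketched in Python as follows. The packet tuples, the request identifiers, and the 1-second minimum gap are hypothetical; a real capture would first have to be exported and parsed into this form.

```python
def find_retransmit_gaps(packets, min_gap=1.0):
    """Find requests that were resent without a reply in between.

    packets: list of (timestamp_seconds, direction, request_id) in capture
    order, where direction is "request" or "reply".
    Returns a list of (request_id, gap_seconds).
    """
    last_sent = {}
    gaps = []
    for timestamp, direction, request_id in packets:
        if direction == "request":
            if request_id in last_sent:
                gap = timestamp - last_sent[request_id]
                if gap >= min_gap:
                    gaps.append((request_id, gap))
            last_sent[request_id] = timestamp
        elif direction == "reply":
            # A reply answers the outstanding request, so stop tracking it.
            last_sent.pop(request_id, None)
    return gaps

# Hypothetical capture: request 2 is resent 30 seconds after the first attempt.
capture = [
    (0.0, "request", 1),
    (0.01, "reply", 1),
    (0.5, "request", 2),
    (30.5, "request", 2),
    (30.52, "reply", 2),
]
print(find_retransmit_gaps(capture))  # [(2, 30.0)]
```

A gap that matches the delay you experienced points to the request that went unanswered.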

Address Tracker

Use Address Tracker, which polls managed devices, to determine the hubs to which the problem nodes are attached. If the problem end stations are located on unmanaged devices, then you can at least narrow the problem to those unmanaged devices.

EXAMPLE: Although your network does not have managed hubs that Transcend® NCS management software can poll, it does have managed switches. When it polls the switches, Address Tracker displays the switch ports on which addresses were last seen. This information indicates the hub (but not the hub port) on which the device is located.

LANsentry Manager Packet Decode

After you know the location of the hub that has the problem node, monitor the problem from the hub using LANsentry Manager Packet Decode.

Follow these steps:

1. To capture packets from one of the nodes on the hub, set up another capture buffer and repeat the exercise that is described in LANsentry Manager Packet Capture. Because a delay may occur on a different node, use two capture buffers without stopping the first one.

Note the rough packet count where the delay appears.

2. Display a conversation filter of the packet where the delay appears and look for the gap in the conversation.

EXAMPLE: You hope that the nodes are on the same hub. You find that all the nodes are on HUB3. This result indicates that FCS Errors may be causing the timeouts. However, because the errors occur at a low rate, you decide to verify this diagnosis. You monitor the problem from the hub, logging in and out many times, and the delay eventually occurs. This time, the delay shows that the node's reply had an FCS Error even though the node received the request. The switch would not have transmitted this packet, causing a timeout on the NFS protocol. The retry time is presumably 30 seconds. During this test, you see the problem occurring on another node.

Correcting the Fault

Without a managed hub, you may find it very difficult to discover network file server timeout errors. To find the problematic node, you must either systematically isolate nodes by monitoring each node for a prolonged period or temporarily insert a managed hub.

EXAMPLE: You notice that the captured error packet failed FCS because it was corrupted by a regular pattern during transmission. A possible reason for this occurrence is a Jabbering node. This explanation makes sense because FCS/Jabber frames increased linearly when you were monitoring the live network.

Network File Server Timeouts Reference

This section explains terms that are relevant to network file server timeouts and provides additional conceptual and problem analysis detail.

Jabbering

A condition in which a node transmits illegal-length packets and is possibly not operating within carrier specifications. In effect, another node has written bad data over a valid packet. This bad data is often interpreted as a repeated sequence of data.
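The following Python sketch illustrates how a frame check detects data that has been overwritten in transit. It uses zlib.crc32 as a stand-in for the frame check sequence (the Ethernet FCS is also a 32-bit CRC); the payload and the overwriting pattern are hypothetical, and real frames are checked by the network hardware, not by application code.

```python
import zlib

def fcs(payload: bytes) -> int:
    """CRC-32 of the payload, standing in for a frame check sequence."""
    return zlib.crc32(payload)

def frame_is_valid(payload: bytes, received_fcs: int) -> bool:
    """Recompute the check value and compare it with the one that was sent."""
    return fcs(payload) == received_fcs

# Hypothetical frame contents and the check value computed by the sender.
original = b"NFS reply data " * 4
sent_fcs = fcs(original)

# Simulate another node overwriting 4 bytes with a repeated pattern.
corrupted = original[:10] + b"\xaa" * 4 + original[14:]

print(frame_is_valid(original, sent_fcs))   # True
print(frame_is_valid(corrupted, sent_fcs))  # False
```

A receiver that finds a mismatch counts the frame as an FCS error and discards it, which is why the requesting node never sees the reply.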

Network File System (NFS) Protocol

A distributed file system protocol, developed by Sun Microsystems, that allows a computer system to access files over a network as if they were on its local disks. This protocol has been incorporated into products by more than 200 companies. It is now a de facto Internet standard.

NFS is one protocol in the NFS suite of protocols, which includes NFS, RPC, XDR (External Data Representation), and others. These protocols are part of a larger architecture that Sun Microsystems refers to as Open Network Computing (ONC). ONC is a distributed applications architecture designed by Sun and currently controlled by a consortium led by Sun.

Troubleshooting TCP/IP - Detailed Steps

This article shows how to troubleshoot TCP/IP connectivity between computers on a Windows network. If you haven't already done so, disable XP's Internet Connection Firewall on all local area network connections and remove all firewall programs on the network. Improperly configured firewalls are the most common cause of TCP/IP problems.

Open a Command Prompt Window

For many of these steps, you'll be typing at the command prompt. To open a command prompt window in Windows 2000 or XP, click Start | Run, type cmd in the box, and click OK. To open a command prompt window in Windows 95, 98, or Me, click Start | Run, type command in the box, and click OK. Type one command per line and press Enter after each one to execute it. To close the command prompt window, use the exit command.

Determine the TCP/IP Settings

Determine the TCP/IP settings of each computer on the local area network. In XP, open the Network Connections folder, right-click the LAN connection, and click Status | Support | Details. For example, here are the Status and Details views for the LAN connection on an Internet Connection Sharing host.

In Windows 95/98/Me, click Start | Run, type winipcfg in the box, and click OK. Select the LAN adapter from the menu and click More Info. Here's the winipcfg view for an ICS client running Windows Me.

You can also see the TCP/IP settings from the command prompt. This is especially convenient if a computer has more than one network adapter. Use the ipconfig /all command, which is available in all versions except Windows 95. The output from this command can be long, so it's best to write it to a file. Specify the file name in the command this way:

ipconfig /all >ipconfig.txt

Description of TCP/IP Settings

Here are the TCP/IP settings that are used in network troubleshooting:

• IP Address - Unique address assigned to a network adapter. A computer with multiple network adapters has an IP address for each one, and each one must be in a different subnet.

• Subnet Mask - Used in conjunction with the IP address to determine which subnet an adapter belongs to. At the simplest level, communication is only possible between two network adapters when they're in the same subnet.

• Default Gateway - IP address of a computer or router, on one of this computer's local area networks, that knows how to communicate with subnets not present on this computer. For an Internet connection, the default gateway is a router belonging to your Internet service provider, and all access to sites on the Internet goes through it. For an ICS client, the default gateway is the ICS host. If you use a hardware router, it serves as the default gateway.

• DHCP Server - If an adapter is configured to obtain an IP address automatically, this is the address of the server that provides it. It could be your ISP, an ICS host, or a hardware router.

• DNS Servers - IP address of one or more Domain Name Server computers. DNS servers translate Internet names to their IP addresses (like 63.146.109.227).
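The relationship between an IP address, a subnet mask, and a subnet can be checked with Python's standard ipaddress module. In this sketch, the function name and the sample addresses are illustrative assumptions.

```python
import ipaddress

def same_subnet(ip_a, ip_b, mask):
    """Return True if both addresses fall in the same subnet under the mask."""
    net_a = ipaddress.ip_network(f"{ip_a}/{mask}", strict=False)
    net_b = ipaddress.ip_network(f"{ip_b}/{mask}", strict=False)
    return net_a == net_b

print(same_subnet("192.168.1.101", "192.168.1.123", "255.255.255.0"))  # True
print(same_subnet("192.168.1.101", "192.168.2.123", "255.255.255.0"))  # False
```

Two adapters whose addresses give False here need a gateway between them to communicate.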

Subnets

See our article on subnets for a brief description of how they work. For more details, see this Microsoft Knowledge Base article.

If two computers are supposed to be on the same subnet but aren't, something is wrong with the network hardware or software configuration. This is most likely to happen when one of them receives an IP address of 169.254.x.x, which indicates that:

• It's configured to obtain an IP address automatically
• It couldn't find a DHCP server on the network to make the assignment
• Windows assigned it an Automatic Private IP Address
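A quick way to recognize such an address in a script is the is_link_local property of Python's ipaddress module, which is true for IPv4 addresses in the 169.254.0.0/16 range. The function name and sample addresses are illustrative.

```python
import ipaddress

def is_apipa(address):
    """Return True for IPv4 addresses in 169.254.0.0/16 (link-local)."""
    ip = ipaddress.ip_address(address)
    return ip.version == 4 and ip.is_link_local

print(is_apipa("169.254.10.20"))  # True
print(is_apipa("192.168.1.101"))  # False
```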

See our article on Specific Networking Problems and Their Solutions for more information.

Pinging

The ping command is the basic tool for testing TCP/IP connectivity. It sends a special packet (called ICMP Echo) to a particular IP address and looks for a reply. If everything is working right, the reply comes back. If not, the ping times out in a few seconds. By default, the ping command repeats the process four times. Here's an example of an ICS client computer pinging a Windows XP Home Edition ICS host, using the host's IP address and its computer name.

When ping fails, you'll see one of these error messages:

• Request timed out - The IP address is valid, but there's no reply from it. If the IP address is on a local area network, the most likely cause is a firewall program blocking the ping.

• Unknown host <name> or Ping request could not find host <name> - The computer name doesn't exist on the local area network. Make sure that NetBIOS over TCP/IP is enabled.

• Destination host unreachable - The IP address isn't on a local area network, and the default gateway can't access it. Either there's no default gateway, its address is wrong, or it isn't functioning.
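The error messages above can be matched in saved ping output with a short Python sketch. The function and dictionary names are hypothetical, and the exact wording that ping prints may vary between Windows versions, so treat the phrases as assumptions.

```python
# Phrases from the list above, mapped to the likely cause given in the text.
PING_HINTS = {
    "request timed out": "No reply; on a LAN, a firewall is the most likely cause.",
    "unknown host": "Name not found; check that NetBIOS over TCP/IP is enabled.",
    "could not find host": "Name not found; check that NetBIOS over TCP/IP is enabled.",
    "destination host unreachable": "Default gateway is missing, wrong, or not functioning.",
}

def explain_ping_output(output):
    """Return the hints whose phrase appears in the given ping output."""
    text = output.lower()
    return [hint for phrase, hint in PING_HINTS.items() if phrase in text]

print(explain_ping_output("Request timed out."))
```

An empty list means that none of the known failure phrases appeared in the output.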

Pinging the Local Area Network

Here is a series of ping commands to use in finding where a problem occurs on a local area network. Run them in the order shown, and don't go on to the next command until all of the previous commands work properly. In this example:

• The computer being tested is named Winxp, with IP address 192.168.1.101
• There's another computer on the network named Win98, with IP address 192.168.1.123

Substitute the appropriate IP addresses and computer names for your network.

Command              Target                          What Ping Failure Indicates
ping 127.0.0.1       Loopback address                Corrupted TCP/IP installation
ping localhost       Loopback name                   Corrupted TCP/IP installation
ping 192.168.1.101   This computer's IP address      Corrupted TCP/IP installation
ping winxp           This computer's name            Corrupted TCP/IP installation
ping 192.168.1.123   Another computer's IP address   Bad hardware or NIC driver
ping win98           Another computer's name         NetBIOS name resolution failure
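The ordered checks in this table can be sketched as a small Python script that stops at the first failing ping. The function names are hypothetical, the addresses assume the example computers at 192.168.1.101 and 192.168.1.123, and treating exit code 0 as success is a simplification, because ping exit codes vary by platform.

```python
import platform
import subprocess

def system_ping(target):
    """Send one ping with the operating system's ping command."""
    # Windows uses -n for the count; most other systems use -c.
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    result = subprocess.run(
        ["ping", count_flag, "1", target],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0

def first_failure(checks, ping=system_ping):
    """Run the checks in order; return the first (target, meaning) that fails."""
    for target, meaning in checks:
        if not ping(target):
            return target, meaning
    return None

# Targets and meanings taken from the table; substitute your own addresses.
LAN_CHECKS = [
    ("127.0.0.1", "Corrupted TCP/IP installation"),
    ("localhost", "Corrupted TCP/IP installation"),
    ("192.168.1.101", "Corrupted TCP/IP installation"),
    ("winxp", "Corrupted TCP/IP installation"),
    ("192.168.1.123", "Bad hardware or NIC driver"),
    ("win98", "NetBIOS name resolution failure"),
]
```

Calling first_failure(LAN_CHECKS) returns None when every ping succeeds. The ping parameter accepts a substitute function, which makes the logic testable without a network.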

To fix a corrupted TCP/IP installation on Windows XP, follow the steps in this Microsoft Knowledge Base article. For Windows 95/98/Me, un-install the TCP/IP protocol in Control Panel | Network, reboot, and re-install it. If that doesn't fix it, use this procedure on Windows 95 or 98.

Pinging the Internet

You can also use ping to find a problem with Internet access. Run these commands in the order shown, and don't go on to the next command until all of the previous commands work properly. Use the Default Gateway and DNS Server addresses that you got from the winipcfg or ipconfig /all command.

Command                  Target                What Ping Failure Indicates
ping w.x.y.z             Default Gateway       Default Gateway down
ping w.x.y.z             DNS Server            DNS Server down
ping w.x.y.z             Web site IP address   Internet service provider or web site down
ping www.something.com   Web site name         DNS Server down or web site down

REFERENCE

WWW.PRACTICALLYNETWORK.COM

WWW.COMPUTERHOPE.COM

WWW.WIKIPEDIA.ORG

THANK YOU

  • Basic Network Troubleshooting
  • Basic network troubleshooting
    • Issue
    • Cause
    • Solution
      • Verify connections LEDs
      • Adapter resources
      • Adapter functionality
      • Protocol
      • Firewall
      • Additional time
      • Additional troubleshooting
        • Network Troubleshooting Overview
          • About Connectivity Problems
          • About Performance Problems
          • Solving Connectivity and Performance Problems
          • Recognizing Symptoms
            • User Comments
            • Network Management Software Alerts
            • Analyzing Symptoms
              • Understanding the Problem
              • Identifying and Testing the Cause of the Problem
                • Sample Problem Analysis
                • Equipment for Testing
                  • Solving the Problem
                    • Your Network Troubleshooting Toolbox
                      • Transcend Central
                      • Status Watch
                        • Web Reporter
                          • Address Tracker
                          • LANsentry Manager
                          • Traffix Manager
                          • Device View
                          • Ping
                            • Strategies for Using Ping
                            • Tips on Interpreting Ping Messages
                              • Telnet
                              • FTP and TFTP
                              • Analyzers
                              • Probes
                              • Cable Testers
                              • Knowing Your Networks Configuration
                                • Site Network Map
                                • Device Configuration Information
                                • Other Important Data About Your Network
                                  • Identifying Your Networks Normal Behavior
                                    • Baselining Your Network
                                    • Identifying Background Noise
                                        • FDDI Connectivity
                                          • Understanding the Problem
                                          • Identifying the Problem
                                          • Solving the Problem
                                          • Status Watch
                                          • Implementing Dual Homing
                                          • Installing an Optical Bypass Unit
                                          • Peer Wrap Condition
                                          • Twisted Ring Condition
                                          • Undesired Connection Attempt Event
                                          • Address Tracker
                                          • Device View
                                          • Broadcast Packets
                                          • Multicast Packets
                                            • Duplicate Addresses
                                              • Understanding the Problem
                                              • Identifying the Problem
                                              • Solving the Problem
                                              • Status Watch
                                              • Address Tracker
                                              • LANsentry Manager
                                              • Duplicate MAC Addresses
                                              • Duplicate IP Addresses
                                                • Ethernet Packet Loss
                                                  • Understanding the Problem
                                                  • Identifying the Problem
                                                  • Solving the Problem
                                                  • Status Watch
                                                  • LANsentry Manager Network Statistics Graph
                                                  • Device View
                                                  • Alignment Errors
                                                  • Collisions
                                                  • CRC Errors
                                                  • Excessive Collisions
                                                  • FCS Errors
                                                  • Late Collisions
                                                  • Nonstandard Ethernet Problems
                                                  • Receive Discards
                                                  • Too Long Errors
                                                  • Too Short Errors
                                                  • Transmit Discards
                                                    • FDDI Ring Errors
                                                      • Understanding the Problem
                                                      • Identifying the Problem
                                                      • Solving the Problem
                                                      • Status Watch
                                                      • Elasticity Buffer Error Condition
                                                      • Frame Error Condition
                                                      • Frames Not Copied Condition
                                                      • Link Error Condition
                                                      • MAC Neighbor Change Event
                                                        • Network File Server Timeouts
                                                          • Understanding the Problem
                                                          • Identifying the Problem
                                                          • Solving the Problem
                                                          • Ping and Telnet
                                                          • LANsentry Manager Alarms View
                                                          • LANsentry Manager Statistics View
                                                          • LANsentry Manager History View
                                                          • LANsentry Manager Top-N Graph
                                                          • LANsentry Manager Packet Capture
                                                          • LANsentry Manager Packet Decode
                                                          • Address Tracker
                                                          • LANsentry Manager Packet Decode
                                                          • Jabbering
                                                          • Network File System (NFS) Protocol
                                                          • Troubleshooting TCPIP - Detailed Steps
                                                          • Open a Command Prompt Window
                                                          • Determine the TCPIP Settings
                                                          • Description of TCPIP Settings
                                                          • Subnets
                                                          • Pinging
                                                          • Pinging the Local Area Network
                                                          • Pinging the Internet
devices on your network. Using Transcend Central, you can group devices by type and location and have this information on hand for troubleshooting.

• MAC address-to-port number list - If your hubs or switches are not managed, you must keep a list of the MAC addresses that correlate to the ports on your hubs and switches. Generate and keep a paper copy of this list, which is crucial for deciphering captured packets using Address Tracker.

• Log book - Document your interactions, no matter how trivial, with each device that is critical to your network's operation (that is, routers, remote access devices, and security servers). For example, document that you noticed a fan making noise one morning. Your note may help you to identify why a device is over temperature a week later (because the fan stopped working).

• Change control - Maintain a change control system for all critical systems. Permanently store change control records.

• Contact details - Store, online and on paper, the details of all support contracts, support numbers, engineer details, and telephone and fax numbers.

• LANsentry Reporter - Use LANsentry Reporter to generate reports from the database.

Identifying Your Network's Normal Behavior

By monitoring your network over a long period, you begin to understand its normal behavior. You begin to see a pattern in the traffic flow, such as which servers are typically accessed, when peak usage times occur, and so on. If you are familiar with your network when it is fully operational, you can be more effective at troubleshooting problems that arise.

Baselining Your Network

You can use a baseline analysis, which is an important indicator of overall network health, to identify problems. A baseline can serve as a useful reference of network traffic during normal operation, which you can then compare to captured network traffic while you troubleshoot network problems. A baseline analysis speeds the process of isolating network problems.

By running tests on a healthy network, you compile normal data to compare against the results that you get when your network is in trouble. For example, Ping each node to discover how long it typically takes you to receive a response from devices on your network.
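A baseline comparison of this kind can be sketched in Python as follows. The response times are hypothetical, and flagging anything more than three standard deviations above the mean is an illustrative rule, not one given in the original text.

```python
import statistics

def build_baseline(samples_ms):
    """Return (mean, standard deviation) of response times from a healthy network."""
    return statistics.mean(samples_ms), statistics.stdev(samples_ms)

def is_slow(rtt_ms, baseline, tolerance=3.0):
    """Return True if a response time is well above the baseline."""
    mean, stdev = baseline
    return rtt_ms - mean > tolerance * stdev

# Hypothetical ping response times (milliseconds) collected during normal operation.
healthy = [10, 12, 11, 9, 13]
baseline = build_baseline(healthy)

print(is_slow(12, baseline))  # False
print(is_slow(40, baseline))  # True
```

Collect the healthy samples over days or weeks so that the baseline reflects typical load rather than a single quiet moment.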

Applications such as Status Watch, Address Tracker, LANsentry Manager, and Traffix Manager allow you to collect days and weeks of data and set a baseline for comparison. Through the reporting mechanisms in the following list, you can continuously assess the data from your network and ensure that its performance is optimal:

• Web Reporter generates daily or weekly reports from data collected by Status Watch.

• Traffix Manager generates weekly reports from collected data and calculates the baselines for you. Set up Utilization History and Error History reports with data resolution set to Weekly.

• LANsentry Manager History View generates daily utilization graphs, which are sampled every 30 minutes for each day over one week. Use these graphs to calculate your network baselines manually.

Identifying Background Noise

Know your network's background noise so that you can recognize real data flow. For example, one evening after everyone is gone, when no backups are running and most nodes are on, analyze the traffic on your network using the Traffix Manager application. The traffic that you see is mostly broadcast and multicast packets. Any errors that you see are the result of faulty devices (trace). This traffic is the background noise of your network - traffic that occurs for little value. If background noise is high, redesign your network.

FDDI Connectivity

Use these sections to identify and correct connectivity errors on an FDDI ring:

• FDDI Connectivity Overview
• Monitoring FDDI Connections
• Making Your FDDI Connections More Resilient

Fiber Distributed Data Interface (FDDI), which is a self-correcting technology, automatically corrects ring faults to maintain connectivity throughout most of the network. However, you should monitor your FDDI connections for wrapped rings and other problems with ring connectivity.

Understanding the Problem

As shown in Figure 9, in a thru FDDI LAN, no stations on the trunk ring have a Configuration State (SMTConfigurationState) of Wrap or Isolated. However, users who complain about network performance may have lost connectivity to other stations on the network because the FDDI network is wrapped or segmented.

Figure 9. Thru Ring

Wrapped ring

By monitoring the Peer Wrap Condition, you can see when the Configuration State changes. In a wrapped ring (Figure 10), two stations on the LAN are in a wrapped Configuration State. This condition may or may not affect the connectivity of certain stations. Although operational, your network may have a cabling problem or a problem with a link.

Figure 10. Wrapped LAN

Segmented ring

In a segmented ring (Figure 11), more than two stations are wrapped on the trunk ring. Although this mode of operation is a valid FDDI LAN configuration, your LAN is probably experiencing a degraded or degrading condition.

Figure 11. Segmented Ring
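The ring states described so far can be summarized in a short Python sketch that counts wrapped stations. Representing each station's Configuration State as the string "Thru", "Wrap", or "Isolated" is an assumption made for illustration, as is the "Unclassified" result for cases that the text does not cover.

```python
def classify_ring(config_states):
    """Classify a trunk ring from the Configuration State of each station."""
    wrapped = sum(1 for state in config_states if state == "Wrap")
    if wrapped == 0 and "Isolated" not in config_states:
        return "Thru"
    if wrapped == 2:
        return "Wrapped"
    if wrapped > 2:
        return "Segmented"
    # Combinations that the descriptions above do not cover.
    return "Unclassified"

print(classify_ring(["Thru", "Thru", "Thru", "Thru"]))  # Thru
print(classify_ring(["Wrap", "Thru", "Wrap", "Thru"]))  # Wrapped
print(classify_ring(["Wrap", "Wrap", "Wrap", "Wrap"]))  # Segmented
```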

When a network connection has excessively high link errors, Station Management (SMT) shuts down the connection and tries to reconnect. A dual-attachment trunk ring station with an A or B connection that is shut down is one of the wrap points in the network. See Making Your FDDI Connections More Resilient for information about keeping a dual-attachment station connection from wrapping.

Isolated station

Sometimes a network wraps a particular station out of the ring. Stations on either side of a problem station can be wrapped. This effectively isolates the station or links that have problems, as shown in Figure 12.

Figure 12. Wrapped Ring with Isolated Station

If a ring was already wrapped when a network wraps a station out of the ring, then a segmented ring results, as shown in Figure 13.

Figure 13. Segmented Ring with Isolated Stations

Twisted ring

In a twisted ring, an A port is connected to an A port and a B port is connected to a B port, instead of the normal A-to-B connections. A twisted ring, which always has two twist points (stations), can exist in either a Thru or Wrap state. You can monitor the Twisted Ring Condition and Undesired Connection Attempt Event for evidence of twisted ring and other connection problems.

Identifying the Problem

To identify the problem, follow this process:

1. At the FDDI LAN level, verify that your network is operating.

If the network is operating, the FDDI ring may be segmented, and therefore an FDDI station or an Ethernet station on an Ethernet link may have lost connectivity to other nodes on the network.

2. Determine if a ring is in a Thru, Wrap, or Segmented state.

If the FDDI ring is segmented or wrapped, look for a problem with a link somewhere in the network or for a nonfunctioning node on your trunk ring. If the ring is operating and is not segmented, or if it is segmented but you still have connectivity to the stations in question, move to a more specific level in your network.

See Monitoring FDDI Connections for more information.

3. Determine if the poorly performing station is an Ethernet or FDDI station.

If the problem is an FDDI station, find out if it is congested (that is, if the station is so busy that it cannot accept all the network traffic that is directed to it) by determining its Bandwidth Utilization. Also determine if the station has a high frame error rate by looking at the FDDI Ring Errors.

If the problem is an Ethernet station, look for congestion by examining Ethernet Packet Loss and Bandwidth Utilization.

Solving the Problem

Identify the station that is causing the disconnection and take the appropriate steps:

• If the disconnection is caused by a wrapped ring, then fix the hardware or cabling problem at that station.

• If the station is congested, you have a device problem rather than a network problem. For example, if the congested station is a file server and every other machine on the network is retrieving and saving files using that server, consider upgrading your server or adding servers to the network. A variety of devices from different vendors may be communicating on an FDDI or Ethernet network; some are faster and more capable, and some are slower and more prone to congestion.

• If the station is an Ethernet station that is attached to an Ethernet segment, reevaluate the setup of your Ethernet network and make changes to improve its performance.

You can also make FDDI connections more resilient by implementing dual homing or installing an Optical Bypass Unit (OBU) where FDDI connections are prone to fail. See Making Your FDDI Connections More Resilient for more information.

Monitoring FDDI Connections

Monitor your FDDI devices for Warning or Critical alerts in the FDDI Status tool.

Status Watch

Use Status Watch to identify these FDDI connectivity errors:

• Peer Wrap Condition
• Twisted Ring Condition
• Undesired Connection Attempt Event

Follow these steps:

1. In the Device area, select the device that is located where you suspect an FDDI ring connectivity problem.

2. Monitor the FDDI Status tool for the currently selected device.

Here are some pointers for monitoring:

• If the Peer Wrap Configuration State variable is Isolated, the device is not connected to the FDDI trunk ring. If you intend the device to remain isolated, this indication is not a serious condition. However, if the device is supposed to be connected on a trunk ring, a serious problem may exist: the device is no longer transmitting packets to the larger trunk ring.

• If the Peer Wrap flag (SMTPeerWrapFlag) is set, the device is one of the wrap points. The cause of the wrapped ring is somewhere in the portion of the network between the two stations that report the peer wrap condition.

Making Your FDDI Connections More Resilient

When devices are removed from an FDDI ring, there is a break in the fiber path that causes the ring to wrap until the ring is made whole again. To prevent the break in the FDDI connection, you can implement dual homing or install an Optical Bypass Unit (OBU).

Implementing Dual Homing

When the operation of a dual attachment node is critical to your network, dual homing adds reliability by providing a backup connection if the primary link fails. Because a dual attachment station (DAS) has two attachments to the FDDI ring (A-to-M and B-to-M), you can use one of them as a standby link if the active link fails. Using dual homing, only one of the two attachments is active at a time. In this sense, a DAS acts as if it is a single attachment station (SAS) by using its A port as the standby link.

Through SMT, a DAS can be dual homed to the same concentrator or, more commonly, to two concentrators. This arrangement provides a more stable trunk ring of concentrators. If one concentrator fails, the DAS enables the standby link to another concentrator to become the active link. See Figure 14.

If the station is a dual path or dual path/dual MAC station, you can configure the dual-homed station in one of two ways:

• With both links active
• With one link active and one connection withheld as a backup, only becoming active when one link fails

Figure 14 Dual Homing Configuration

Installing an Optical Bypass Unit

You can insert an Optical Bypass Unit (OBU) into the FDDI ring as if it were a node and then plug your device into it. To use an OBU, your device needs an optical bypass interface. This interface lets the bypass know whether your device is still on the ring or not. See Figure 15.

If your device is removed or if it fails, the bypass unit diverts the optical path away from your device, keeping the ring whole. You can use a bypass on devices that are prone to failure or are likely to be removed often, such as diagnostic equipment.

Figure 15 Optical Bypass Unit Configuration

FDDI Connectivity Reference

This section explains terms that are relevant to FDDI connectivity and provides additional conceptual and problem analysis detail.

Peer Wrap Condition

A Peer Wrap (wrapped ring) condition occurs when a dual-attachment station detects a fault (often a lost connection) and reconfigures the network by wrapping the dual trunk rings to form a single ring. Normally, the two stations that are adjacent to the fault wrap to maintain full connectivity. However, if a second fault occurs before the first is repaired, the network partitions itself into two or more rings and stations lose connectivity.

When a station reports a Peer Wrap condition, locate and repair the problem that caused the station to wrap the rings. Potential causes include:

• Faulty FDDI port hardware
• Faulty cables or connectors
• Unplugged connectors
• Powered-down stations

You can expect to find the cause of the problem between the two stations that report the Peer Wrap condition.
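The wrap and partition behavior described above can be illustrated with a small Python sketch. It is a simplified model, not part of any management tool: the function name `ring_segments`, the station names, and the idea of numbering links by the index of the station that precedes them are all assumptions made for illustration, and only link faults (not failed stations) are modeled.

```python
def ring_segments(stations, broken_links):
    """Group the stations of a dual ring into connected segments.

    stations: station names in ring order. Link i joins stations[i]
    and stations[(i + 1) % n] (hypothetical numbering).
    broken_links: indices of failed links.
    Returns (state, segments), where state is "Thru", "Wrap" or "Segmented".
    """
    n = len(stations)
    faults = sorted(set(broken_links))
    if not faults:
        return "Thru", [list(stations)]

    segments, current = [], []
    # Start with the station just after the first fault and walk the ring,
    # closing a segment whenever the link after a station is broken.
    start = (faults[0] + 1) % n
    for step in range(n):
        idx = (start + step) % n
        current.append(stations[idx])
        if idx in faults:
            segments.append(current)
            current = []
    if current:
        segments.append(current)

    # One fault leaves a single wrapped ring; more faults partition it.
    state = "Wrap" if len(segments) == 1 else "Segmented"
    return state, segments
```

In this model the first and last stations of each returned segment are the wrap points, so with a single fault they are the two stations adjacent to the fault, which is where the text says to look for the cause.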

Twisted Ring Condition

A Twisted Ring condition occurs when certain undesirable connection types exist. See Table 7 for more information. Although similar to the Undesired Connection Attempt, the Twisted Ring condition provides specific Station Management (SMT) and port information for diagnosis.

Undesired Connection Attempt Event

An Undesired Connection Attempt event occurs when a port tries to connect to another port of a type that may result in an undesirable network topology. Whether the connection attempt is successful depends on the current setting of the stations' connection policies.

Table 7 lists connections that the FDDI standard defines as undesirable. The managed devices may or may not permit these connections, depending on their FDDI station configurations.

Table 7 Undesirable Connection Types

Connection Type   Reason That the Connection Is Undesirable
A-A               Twisted primary and secondary rings
A-S               A wrapped ring
B-B               Twisted primary and secondary rings
B-S               A wrapped ring
S-A               A wrapped ring
S-B               A wrapped ring
M-M               A tree of rings topology (illegal connection)

Table 8 lists FDDI connections that create valid topologies.

Table 8 Valid Connection Types

Connection Type   Reason That the Connection Is Valid
A-B               A normal trunk peer connection
A-M               A tree connection with possible redundancy. In a single MAC node, Port B has precedence (by default) for connecting to a Port M
B-A               A normal trunk ring peer connection
B-M               A tree connection with possible redundancy. In a single MAC node, Port B has precedence (by default) for connecting to a Port M
S-S               A single ring of two slave stations
S-M               A normal tree connection
M-A               A tree connection that provides possible redundancy
M-B               A tree connection that provides possible redundancy
M-S               A normal tree connection
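Tables 7 and 8 together cover all 16 combinations of the four port types (A, B, S, M), so a lookup of the seven undesirable pairs is enough to classify any connection. The following Python sketch shows one way to encode that; the function name `check_connection` and its return format are illustrative assumptions, not part of any FDDI management API.

```python
# Undesirable pairs and reasons, transcribed from Table 7.
UNDESIRABLE = {
    ("A", "A"): "twisted primary and secondary rings",
    ("B", "B"): "twisted primary and secondary rings",
    ("A", "S"): "wrapped ring",
    ("B", "S"): "wrapped ring",
    ("S", "A"): "wrapped ring",
    ("S", "B"): "wrapped ring",
    ("M", "M"): "tree of rings topology (illegal connection)",
}

PORT_TYPES = {"A", "B", "S", "M"}


def check_connection(local_port, remote_port):
    """Classify a port pairing as ("valid", None) or ("undesirable", reason)."""
    pair = (local_port.upper(), remote_port.upper())
    for port in pair:
        if port not in PORT_TYPES:
            raise ValueError(f"unknown FDDI port type: {port!r}")
    reason = UNDESIRABLE.get(pair)
    if reason is None:
        # Every pair not listed in Table 7 appears in Table 8.
        return "valid", None
    return "undesirable", reason
```

Whether a device actually rejects an undesirable pairing still depends on its connection policy, as noted above; the sketch only reports the classification.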

Disabling the Offending Interface

Because broadcast storms can ultimately cause your whole network to become unavailable, take action immediately to disable the offending interface. You can enable the interface again after you have corrected the problem.

Address Tracker

Use Address Tracker to locate the interface that is causing the broadcast storm. Use Device View to disable the port.

Follow these steps:

1. In the Find Address window, enter the address of the interface that seems to be receiving the broadcast traffic.

You can copy the MAC or IP address from the Status Watch report and paste it into Address Tracker's Enter the Address You Want to Find field.

2. Click Find Now.

Search displays the device name.

3. Use Transcend Central to launch Device View and disable the port.

Disabling the port stops the broadcast storm before it interferes with all vital network traffic. You can re-enable this interface later using Device View or the device's console.

Correcting Spanning Tree Misconfigurations

Spanning Tree does not cause broadcast storms, but a loop in your Spanning Tree topology can create data that looks like a storm. A loop can occur in your topology if:

• Someone disables Spanning Tree on a port
• You set up your Spanning Tree configuration incorrectly

Device View

Use Device View to disable any Spanning Tree port that has a repeater attached to it and to correct Spanning Tree misconfigurations.

To correct Spanning Tree misconfigurations, use Device View to disable Spanning Tree Protocol (STP) for a port on a SuperStack® II Switch 1000, Switch 3000, Switch 3000 10/100, Switch 9000SX, Desktop Switch, LinkBuilder® FMS II Bridge/Management Module, or CoreBuilder 6000.

To disable the STP port state for a port on a SuperStack II switch:

1. Select a port and click the right mouse button.

2. From the shortcut menu, select Configure.

3. In the Port section, click the STP tab.

4. From the STP Port State list box, select Disabled.

5. Click Apply.

To disable the STP port state for a port on a LinkBuilder FMS II Bridge/Management Module:

1. Double-click the module.

2. From the shortcut menu, select Configure Bridge.

3. In the Port section, click the STP tab.

Broadcast Storms Reference

This section explains terms that are relevant to broadcast storms and provides additional conceptual and problem analysis detail.

Broadcast Packets

Broadcast packets, which are a normal part of network operation, are transmitted by a device to a broadcast address. For example, IP networks use broadcasts to resolve network addresses using Address Resolution Protocol (ARP). IPX networks use a large number of broadcast packets to operate most effectively.

Problems arise when broadcast packets endlessly propagate throughout the network, which increases the traffic volume on your network and the CPU time that each host spends processing and discarding unwanted broadcast packets.

Multicast Packets

Multicast packets, which are a normal part of network operation, are transmitted by a device to a multicast group address. Hosts that want to receive the packets indicate that they want to be members of the multicast group, and then multicast packets are distributed to that group. For example, multicast packets support the Spanning Tree Protocol. Multicast applications and underlying multicast protocols control multimedia traffic and shield hosts from processing unnecessary broadcast traffic. However, multicast traffic can also cause storms that saturate your network.

Duplicate Addresses

Use these sections to identify and correct problems caused by duplicate MAC and IP addresses:

• Duplicate Addresses Overview
• Finding Duplicate MAC Addresses
• Finding Duplicate IP Addresses

See Duplicate Addresses Reference for additional conceptual and problem analysis detail.

Networks sometimes generate duplicate MAC and IP addresses. Because duplicate addresses can cause problems with packet delivery, resolve them as soon as possible.

Understanding the Problem

Duplicate MAC addresses are caused by data link layer problems with Fiber Distributed Data Interface (FDDI) media and the passing of tokens on the FDDI ring. Duplicate IP addresses are caused by network layer problems. See these sections for more information about causes of duplicate addresses:

• Duplicate MAC Addresses
• Duplicate IP Addresses

Identifying the Problem

Identify duplicate MAC and IP addresses by following the instructions in these sections:

• Finding Duplicate MAC Addresses
• Finding Duplicate IP Addresses

Solving the Problem

Identify the cause of the duplicate address (such as user error or a hardware problem) and fix the problem if possible.

Finding Duplicate MAC Addresses

To find out if duplicate MAC addresses are occurring, monitor your network using Status Watch.

Status Watch

The Status Watch FDDI Status tool identifies duplicate FDDI MAC addresses, and Status Watch reports when two or more MACs on the same ring have the same MAC address (a Duplicate Address condition).

Follow these steps:

1. In the Status Watch Summary View window, determine if any FDDI Status conditions are reported. If there are, double-click the table cell value to display the Device List window.

Another approach is to examine only the devices that you know reside on your FDDI ring. In the Status Watch main window, red device icons indicate that a threshold has been exceeded.

2. Select a device.

• If you selected the device from the Device List window, the real-time report for that device appears in the Status Watch main window.

• If you selected the device from the main window, also select the FDDI Status tool to view the real-time report.

3. Determine if a Duplicate Address condition caused the FDDI Status tool to trigger a Critical or Warning status for that device.

In Status Watch, you can specify the status severity level to apply to a Duplicate Address condition.

Finding Duplicate IP Addresses

To find out if duplicate IP addresses are occurring, monitor your network using these applications:

• Address Tracker - To find duplicate IP addresses on 3Com devices and their attached networks

• LANsentry® Manager - To find duplicate IP addresses that are collected by probes gathering RMON2 SmartAgent® data from the Enterprise Communications Analysis Module (ECAM) downloaded on your network devices

Address Tracker

Use Address Tracker to determine when and where duplicate IP addresses occur.

Follow these steps:

1. From the Find Address menu, select Find Duplicate IP Addresses.

2. Click Find Now to start your search.

LANsentry Manager

Use the Duplicates table in LANsentry Manager to compile a list of all stations with duplicate IP addresses. This table is available only on probes that have downloaded RMON2 (ECAM) SmartAgent software.

Follow these steps:

1. From the LANsentry Manager Address Map menu, select Duplicates. Address Map data is displayed as a table.

2. To export the contents of the table, click Export to launch the Data Export dialog box.

Duplicate Addresses Reference

This section explains terms that are relevant to duplicate addresses and provides additional conceptual and problem analysis detail.

Duplicate MAC Addresses

Each device on your network has a unique MAC address. This address identifies a single device on the network, allowing packets to be delivered to the correct destinations.

Packets are delivered to their destinations by means of the MAC-address-to-IP-address translation that the Address Resolution Protocol (ARP) provides. Therefore, if MAC addresses are duplicated on the network, the ARP caches of routing devices contain erroneous destinations. In FDDI, devices monitor network traffic, searching for their own MAC address in each packet to determine whether to decode the packet. If MAC addresses are not unique, two stations cannot be distinguished from each other.

Duplicate MAC addresses can occur for the following reasons:

• Someone has manually configured a MAC address for a device instead of using the address that the vendor supplied or allowing it to be assigned dynamically, and this address is also assigned to a different device.

• In rare circumstances, loops in a bridged network can cause a MAC hardware problem or an address learning problem that creates a duplicate MAC address entry in the bridging address table.

• On DECnet Phase 4 networks, MAC addresses are set from the DECnet address. A duplicate NET address can cause a duplicate MAC address.

Duplicate IP Addresses

Because IP addresses are critical for transmission of packets on TCP/IP networks, resolve duplicate IP addresses immediately.

Duplicate IP addresses can occur when someone has configured an IP address that is identical to an IP address that is assigned to a different device. Address assignments, although possible for you to configure manually, are usually made using one of these protocols:

• Dynamic Host Configuration Protocol (DHCP) - Allows your network to dynamically assign IP addresses to nodes. With this protocol, a DHCP server temporarily assigns an IP address to a node, or you can statically configure addresses as needed.

• BOOTstrap protocol (BootP) - Allows you to statically assign IP addresses to nodes. This protocol is more efficient than RARP.

• Reverse ARP (RARP) - Allows you to statically assign IP addresses to nodes. However, because this protocol relies on the MAC address to identify the node, you cannot use it on networks that dynamically assign hardware addresses.
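Independently of the tools described in this chapter, the basic idea behind a duplicate IP check can be sketched in a few lines of Python: collect (IP address, MAC address) observations, for example from ARP caches, and report any IP address that has been seen with more than one MAC address. The function name `find_duplicate_ips` and the input format are assumptions made for illustration; this is not how Address Tracker or LANsentry Manager are implemented.

```python
from collections import defaultdict


def find_duplicate_ips(observations):
    """Return {ip: [mac, ...]} for every IP seen with more than one MAC.

    observations: iterable of (ip_address, mac_address) string pairs,
    a hypothetical input format (for example, rows from ARP caches).
    """
    macs_by_ip = defaultdict(set)
    for ip, mac in observations:
        # Normalize case so "00:AA:..." and "00:aa:..." count as one MAC.
        macs_by_ip[ip].add(mac.lower())
    return {
        ip: sorted(macs)
        for ip, macs in macs_by_ip.items()
        if len(macs) > 1
    }
```

Swapping the roles of the two fields gives the corresponding check for one MAC address that appears with several IP addresses, although that situation can also be legitimate (for example, a router interface with several addresses).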

Ethernet Packet Loss

Use these sections to identify and correct Ethernet packet loss:

• Ethernet Packet Loss Overview
• Searching for Packet Loss

See Ethernet Packet Loss Reference for additional conceptual and problem analysis detail.

If your Ethernet network shows signs of congestion, it may be experiencing packet loss. When your network is congested, utilization is usually high, packets are discarded because buffers are full, and collision rates are up. Problems related to Collisions are often at the heart of packet loss.

Understanding the Problem

Collisions are normal in Ethernet networks. In many cases, Collision rates of 50 percent do not cause a large decrease in throughput. The Collision rate helps mark the upper limit on your network (the maximum percentage of collisions that your network can bear), which is usually around 70 percent. If Collisions increase above this upper limit, your network can become unreliable.

When the Collision rates increase, so do Excessive Collisions, which causes a delay in transmitting data. An increase in Collisions also indicates that network utilization and network errors, such as FCS Errors, are probably increasing.

The real packet problems to watch for, however, are undetected collisions that show up as Late Collisions.

Identifying the Problem

To identify that your network's problem is related to packet loss, verify that frames are being dropped on your network by examining this packet loss data:

• Alignment Errors
• Collisions
• Excessive Collisions
• FCS Errors
• CRC Errors
• Late Collisions
• Receive Discards
• Too Long Errors
• Too Short Errors
• Transmit Discards

The process of identifying the problem is discussed in Searching for Packet Loss.

Solving the Problem

If you notice that packet loss data is consistently high, then your network is too congested. In this case, segment your network with the appropriate network device (such as a switch or router). If Collision data shows increases but your network's utilization is the same, then your network may have a physical problem, such as cabling that is too long. Other problems that packet loss data can indicate include:

• Faulty connectors or improper cabling
• Excessive numbers of repeaters between network devices
• Defective Ethernet transceivers or controllers

Possible solutions to these problems are explained in the procedures in Searching for Packet Loss.
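The rule of thumb in this section (rising collisions with rising utilization suggest congestion, rising collisions with flat utilization suggest a physical problem) can be written as a short Python sketch. The function name `diagnose_collisions` and the 5-point `tolerance` used to decide what counts as a change are assumptions chosen for illustration, not values from this guide.

```python
def diagnose_collisions(util_pct, coll_pct, tolerance=5.0):
    """Compare the first and last samples of two trends.

    util_pct, coll_pct: lists of utilization and collision percentages,
    oldest sample first. tolerance is a hypothetical threshold, in
    percentage points, below which a change is treated as "the same".
    """
    util_delta = util_pct[-1] - util_pct[0]
    coll_delta = coll_pct[-1] - coll_pct[0]
    if coll_delta <= tolerance:
        return "collisions stable"
    if util_delta > tolerance:
        return "likely congestion: consider segmenting the network"
    return "possible physical problem: check cabling and components"
```

A real monitoring setup would look at longer histories and at the other counters listed above; comparing only the first and last samples keeps the sketch short.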

Searching for Packet Loss

When you look for packet loss, use the following applications:

• Status Watch - For Ethernet and MIB-II data collection using SNMP polling
• LANsentry Manager Network Statistics Graph - For RMON data collection using an RMON probe
• Device View - On a per-device basis, you can evaluate statistics for any port on the device

Status Watch

Status Watch monitors:

• Alignment Errors
• Excessive Collisions
• FCS Errors
• Receive Discards
• Transmit Discards

Follow these steps:

1. Determine if the thresholds for the Alignment Errors tool and FCS Errors tool are being exceeded.

Table 16 identifies the problems that this data can indicate and your possible actions. For information about problems related to a nonstandard Ethernet implementation, see Nonstandard Ethernet Problems.

Table 16 Alignment Errors, FCS Errors, and CRC Errors Data

Possible Problem: Faulty cabling
Possible Action: Examine the cable and cable connections for breaks or damage.

Possible Problem: Network noise
Possible Action: Look for improper cabling, faulty cable, faulty network equipment, or cables that are too close to equipment that emits electromagnetic interference (lamps, for example).

Possible Problem: Faulty transceiver
Possible Action: Use an analyzer to identify the problematic transceiver. If necessary, replace the transceiver, network adapter, or station.

Possible Problem: Fault at the transmitting end station
Possible Action:
1. Locate the source of the errors by looking at the module and port statistics.
2. Verify the correct operation of the transceiver or adapter card of the device that is connected to the problem port.
3. If the card appears to be operating correctly, examine the cable and cable connections for breaks or damage.

Possible Problem: Station powering up or down
Possible Action: None required.
Early implementations of Ethernet transceivers generate a significant amount of in-band noise when powering up; they frequently cause Alignment Errors and FCS Errors in an otherwise stable network.
When powering up, some software drivers for Ethernet controllers also initiate Time Domain Reflectometry (TDR) tests to test the Ethernet media. Network monitors report TDR tests as Alignment Errors and FCS Errors.

Possible Problem: Faulty adapter
Possible Action: Replace the adapter.

2. Determine if the Excessive Collisions tool threshold is being exceeded. Table 17 identifies the problems that this data can indicate and your possible actions.

Table 17 Collisions and Excessive Collisions Data

Possible Problem: Busy network
Possible Action: Use a bridge, router, or switch to reconfigure your network into segments with fewer stations.

Possible Problem: Faulty device (adapter, switch, hub, and the like) that does not listen before broadcasting. This problem increases the incidence of all types of collisions.
Possible Action: Isolate each adapter to see if the problem stops.

Possible Problem: Network loop
Possible Action: Ensure that no redundant connections to the same station have both connections active simultaneously.

3. Determine if the Receive Discards and Transmit Discards tools' thresholds are being exceeded.

If these errors are high in conjunction with the data that you learned in steps 1 and 2, then your network is overloaded. Segment your network.

LANsentry Manager Network Statistics Graph

Use the LANsentry Manager Network Statistics graph to view data for:

• Collisions
• Late Collisions
• Bandwidth Utilization
• CRC Errors
• Too Long Errors
• Too Short Errors

Follow these steps:

1. Display a Network Statistics graph for the local Ethernet segment on which users have reported poor performance.

This graph shows the most recent trend in Collision rates. If you have set up a History sample, you can also look at the historical trend. If a number of segments are connected by repeaters, examine the graph for each Ethernet segment.

2. Analyze Utilization and Collision rates to determine whether collisions are caused by an overloaded segment or a faulty component.

• If Utilization rates are high - The collisions are probably caused by an overloaded segment. If you have added nodes or new applications to your network, consider reconfiguring the cabling system using bridges and routers to filter out remote collisions and to keep local traffic on one segment. This action should level the network load.

• If Utilization rates are stable and appear normal - The collisions are probably caused by faulty components. In this case, do the following:

• If the network consists of repeaters - Compare the Network Statistics graphs for each segment connected to the repeater. Because repeaters repeat traffic across all connected segments (which makes many segments seem like one network), you should see similar levels of traffic on all segments. One segment that shows dissimilar levels of traffic and collisions may indicate faulty hardware. In this case, monitor several collisions to track the source station that is transmitting too soon after collisions, and repair the station. Packets that are transmitted too soon after collisions are unlikely to be valid. See Table 17 for more information about Collisions.

• On other networks - Determine the segment cable length.

3. Examine the CRC Errors and Late Collisions, which often indicate cabling or component problems.

Table 16 identifies the problems that CRC Errors can indicate and your possible actions. Table 18 identifies the problems that Late Collisions data can indicate and your possible actions.

Table 18 Late Collisions Data

Possible Problem: Cabling problems
• Segment too long
• Failing cable
• Segment not grounded properly (noise)
• Improper termination
• Taps too close (10BASE-5 and 10BASE-2 only)
• Noisy cable
Possible Action: Correct the cabling problem by doing one or more of the following:
• Reduce the segment length
• Replace the cable
• Ground the cable
• Terminate the cable correctly
• Check the taps
• Check for cables too close to equipment that emits electromagnetic interference

Possible Problem: Component problems
• Deaf or partially deaf node
• Failing repeater, transceiver, or controller cards
Possible Action: Correct the component problem by doing one of the following:
• Trace the failing component and replace it
• Replace the NIC or the transceiver

4. Trace Too Short Errors and Too Long Errors to the sender.

These errors often indicate faulty routers or LAN drivers and transceiver problems. Table 19 identifies the problems that this data can indicate and your possible actions.

Possible Problem: A transceiver on your network is adding bits to the packets that are transmitted by the attached station.
Possible Action:
1. Use a network analyzer to identify the problematic transceiver.
2. If necessary, replace the transceiver, network adapter, or station.

Possible Problem: The jabber protection mechanism on a transceiver has failed; it can no longer protect the network from the jabbering produced by the attached station.
Possible Action: Replace the network card.

Possible Problem: Excessive noise on the cable.
Note: Some 10/100 Mbps cards that autodetect the network speed may connect to the network at the wrong speed, causing excessive noise.
Possible Action: Check for improper cabling, faulty cable, faulty network equipment, or cables too close to noisy electronic equipment (lamps, for example).
If your network card autodetects the network speed and you have ruled out other problems, manually configure the network speed.

Possible Problem: Faulty routers (two different network types are connected and the router is not enforcing proper frame size restrictions)
Possible Action: Notify the manufacturer.

Possible Problem: Faulty LAN driver
Possible Action: Replace the driver.

Possible Problem: A normal condition on a LinkSwitch® 1000, LinkSwitch® 3000, or CoreBuilder 5000 FastModule
Possible Action: If you use maximum-sized 1518-byte Ethernet frames, the device's VLT-enabled ports add a frame tag of 4 bytes, resulting in a misleading Too Long Frame error.
These frames are passed successfully but still create the Too Long Frame error message.
If you want to eliminate the error message, reduce your Ethernet packet frames by 4 bytes.
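The frame-size arithmetic behind Too Short and Too Long errors can be sketched in Python as follows. The 1518-byte maximum and the 4-byte tag come from the table above; the 64-byte minimum is the standard Ethernet minimum frame size and is not stated in this guide. The function name `classify_frame` and its return strings are illustrative assumptions.

```python
MIN_FRAME_BYTES = 64     # standard Ethernet minimum (not stated in this guide)
MAX_FRAME_BYTES = 1518   # maximum-sized Ethernet frame, as in the table above
TAG_BYTES = 4            # frame tag added by VLT-enabled ports


def classify_frame(length_bytes, tagged_port=False):
    """Classify a frame length the way a Too Short/Too Long counter would."""
    if length_bytes < MIN_FRAME_BYTES:
        return "too short"
    if length_bytes > MAX_FRAME_BYTES:
        if tagged_port and length_bytes <= MAX_FRAME_BYTES + TAG_BYTES:
            # Counted as too long, but explained by the 4-byte tag alone.
            return "too long (tag overhead, frame still passes)"
        return "too long"
    return "ok"
```

For example, a maximum-sized frame that picks up the 4-byte tag becomes 1522 bytes, which is why reducing the frame size by 4 bytes removes the error message.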

Device View

Device View allows you to display a variety of port and device-level statistics relevant to Ethernet packet loss. Table 20 describes these statistics and their use in troubleshooting.

Table 20 Activity and Error Statistics in Device View

Statistics Group: Activity
Description: Displays the total network activity and errors on the selected port.
Use in Troubleshooting: This data shows readable packets, broadcast packets, Collisions, total errors, and runts, which cause Too Short Errors. You can interpret this data in the following ways:

• The presence of runts can often be caused by Collisions; however, if the values increase at specific times of the day, it may indicate that you need to change the network topology to manage the traffic more efficiently (for example, with switches or routers).

• Runts can also be caused by a badly terminated coax cable.

• Large numbers of runts not associated with high levels of collisions can indicate a transmission problem (examine the cable).

• Particularly high numbers of Collisions compared to the total number of readable packets can point to a hardware problem (a bad adapter) or to a data loop.

• A high proportion of Broadcast Packets (>10%) on a heavily utilized network (>50% of available bandwidth) can point to an incorrectly configured bridge or router on the network.

Statistics Group: Errors
Description: Displays the number of frames with errors on the selected port.
Use in Troubleshooting: The significance of errors depends on accompanying errors and prevailing network conditions. See the following error data for more information:

• Alignment Errors: see Table 16
• FCS Errors: see Table 16
• Too Long Errors: see Table 19
• Too Short Errors or runts: see Table 19
• Late Collisions: see Table 18

To display Activity and Errors statistics for a device or port, follow these steps:

1. Select the required port or device.

2. From the shortcut menu, select Activity or Errors.

The statistics available depend on the type of port or device selected. See Table 20 for troubleshooting information.
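As a rough illustration of how the broadcast guideline in Table 20 might be applied to exported counters, the Python sketch below flags a port when broadcast packets exceed 10 percent of readable packets while utilization exceeds 50 percent of available bandwidth. The function name and parameters are hypothetical; Device View itself does not expose such a function.

```python
def broadcast_ratio_suspicious(broadcast_packets, readable_packets,
                               utilization_pct,
                               ratio_limit=0.10, utilization_limit=50.0):
    """Return True when both Table 20 broadcast conditions are met.

    The default limits mirror the >10% and >50% figures in Table 20;
    all names here are illustrative assumptions.
    """
    if readable_packets <= 0:
        return False  # nothing to compare against
    ratio = broadcast_packets / readable_packets
    return ratio > ratio_limit and utilization_pct > utilization_limit
```

A True result is only a prompt to inspect bridge and router configuration, in line with the table's wording that the condition "can point to" a misconfiguration.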

Ethernet Packet Loss Reference

This section explains terms that are relevant to Ethernet packet loss and provides additional conceptual and problem analysis detail.

Alignment Errors

An Alignment Error indicates a received frame in which both of the following are true:

• The number of bits received is an uneven byte count (that is, not an integral multiple of 8).

• The frame has a Frame Check Sequence (FCS) error.

Alignment Errors often result from MAC layer packet formation problems, cabling problems that cause corrupted or lost data, and packets that pass through more than two cascaded multiport transceivers. See FCS Errors for more information about interpreting Alignment Errors.

Collisions

Collisions indicate that two devices detect that the network is idle and try to send packets at exactly the same time (within one round-trip delay). Because only one device can transmit at a time, both devices must stop sending and attempt to retransmit. Collisions are detected by the transmitting stations.

The retransmission algorithm helps to ensure that the packets do not retransmit at the same time. However, if the two devices retry at nearly the same time, packets can collide again; the process repeats until either the packets finally pass onto the network without collisions or 16 consecutive collisions occur and the packets are discarded.
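The text does not name the retransmission algorithm. In standard CSMA/CD Ethernet it is truncated binary exponential backoff: after the nth consecutive collision, a station waits a random number of slot times between 0 and 2^k - 1, where k is the smaller of n and 10, and gives up after 16 attempts. The Python sketch below illustrates that rule; the function name `backoff_slots` and its use of `None` to signal a discarded frame are assumptions made for illustration.

```python
import random

MAX_ATTEMPTS = 16    # the frame is discarded after 16 consecutive collisions
BACKOFF_LIMIT = 10   # the exponent stops growing after the 10th collision


def backoff_slots(collision_count, rng=random):
    """Return how many slot times to wait after the given collision.

    collision_count: number of consecutive collisions so far (1-based).
    rng: any object with a randrange() method, so tests can pass a
    seeded random.Random instance.
    Returns None when the frame should be discarded.
    """
    if collision_count >= MAX_ATTEMPTS:
        return None
    k = min(collision_count, BACKOFF_LIMIT)
    # Uniform choice in the range 0 .. 2**k - 1 slot times.
    return rng.randrange(2 ** k)
```

Because the waiting range doubles with each collision, two stations that collided once are increasingly unlikely to pick the same delay again, which is the behavior the paragraph above describes.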

CRC Errors

A Cyclic Redundancy Check (CRC) Error is an RMON statistic that combines FCS Errors and Alignment Errors. These errors indicate that packets were received with:

• A bad FCS and an integral number of octets (FCS Errors)
• A bad FCS and a non-integral number of octets (Alignment Errors)

CRC Errors can cause an end station to freeze. If a large number of CRC Errors are attributed to a single station on the network, replace the station's network interface board. Typically, a CRC Error rate of more than 1 percent of network traffic is considered excessive.
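The two CRC Error categories and the 1 percent guideline can be expressed as a minimal Python sketch. The function names and the idea of passing in a bit count and an FCS result are assumptions for illustration; real RMON probes report these as counters rather than per-frame values.

```python
def classify_crc_error(bits_received, fcs_ok):
    """Return None for a good frame, otherwise the error category."""
    if fcs_ok:
        return None
    # A bad FCS with a whole number of octets is an FCS Error;
    # a bad FCS with leftover bits is an Alignment Error.
    return "FCS error" if bits_received % 8 == 0 else "Alignment error"


def crc_error_rate_excessive(crc_errors, total_packets, threshold=0.01):
    """Apply the 'more than 1 percent of traffic' guideline."""
    if total_packets <= 0:
        return False
    return crc_errors / total_packets > threshold
```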

Excessive Collisions

Excessive Collisions indicate that 16 consecutive collisions have occurred, usually a sign that the network is becoming congested. For each excessive collision count (that is, after 16 consecutive collisions), a packet is dropped. If you know the normal rate of excessive collisions, then you can determine when the rate of packet loss is affecting your network's performance. See Knowing Your Network's Configuration for more information.

FCS Errors

Frame Check Sequence (FCS) Errors, a type of CRC Error, indicate that frames received by an interface are an integral number of octets long but do not pass the FCS check. The FCS is a mathematical way to ensure that all the frame's bits are correct without having the system examine each bit and compare it to the original. Packets with Alignment Errors also generate FCS Errors.

Both Alignment Errors and FCS Errors can be caused by equipment powering up or down or by interference (noise) on unshielded twisted-pair (10BASE-T) segments. In a network that complies with the Ethernet standard, FCS or Alignment Errors indicate bit errors during a transmission or reception. A very low rate is acceptable. Although Ethernet allows a 1 in 10^8 bit error rate, typical Ethernet performance is 1 in 10^12 or better.
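To relate these bit error rates to frame errors, the following Python sketch estimates the probability that a frame of a given size contains at least one bit error. It assumes bit errors are independent, which is a simplification (noise often arrives in bursts), and the function name `frame_error_probability` is hypothetical.

```python
import math


def frame_error_probability(bit_error_rate, frame_bytes=1518):
    """Probability that a frame has at least one bit error.

    Assumes independent bit errors (an illustrative simplification).
    Computes 1 - (1 - BER) ** bits using log1p/expm1 so that very
    small error rates do not lose precision.
    """
    bits = frame_bytes * 8
    return -math.expm1(bits * math.log1p(-bit_error_rate))
```

Under this assumption, a maximum-sized 1518-byte frame (12,144 bits) at a bit error rate of 1 in 10^8 is damaged with a probability of roughly 1.2 in 10,000, while at 1 in 10^12 the figure drops to roughly 1.2 in 100 million.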

Late Collisions

Late Collisions indicate that two devices have transmitted at the same time, but cabling errors (most commonly, excessive network segment length or repeaters between devices) prevent either transmitting device from detecting a collision. Neither device detects a collision because the time to propagate the signal from one end of the network to the other is longer than the time to put the entire packet on the network. As a result, neither of the devices that cause the late collision senses the other's transmission until the entire packet is on the network.

Although late collisions occur for small packets, the transmitter cannot detect them. As a result, a network suffering measurable Late Collisions for large packets is losing small packets as well.
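The timing argument can be made concrete with a small calculation. This sketch assumes a 10 Mbps segment and treats the round-trip delay as a single measured number; both function names are illustrative.

```python
def transmit_time_us(frame_octets, mbps=10.0):
    """Time, in microseconds, needed to put a whole frame on the wire."""
    return frame_octets * 8 / mbps


def collision_detectable(frame_octets, round_trip_delay_us, mbps=10.0):
    """True if the sender is still transmitting when the collision returns."""
    return round_trip_delay_us <= transmit_time_us(frame_octets, mbps)


# On an over-long segment with a 60 microsecond round trip, a minimum-size
# frame is already fully sent before the collision arrives, but a
# maximum-size frame is still being sent and so the collision is seen (late).
small_seen = collision_detectable(64, 60.0)
large_seen = collision_detectable(1518, 60.0)
```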

Nonstandard Ethernet Problems

Table 21 lists the symptoms that typically occur if a system violates the Ethernet standard.

Table 21 Symptoms of Common Ethernet Network Problems

Symptoms: FCS Errors and Alignment Errors increase significantly.
Problem: Network cabling is too long.
Notes: If you use a promiscuous network monitor, the number of Late Collisions reported by stations should correlate with the FCS and Alignment Errors reported by the monitor.

Symptoms: FCS and Alignment Errors increase proportionally with interference (sometimes referred to as noise hits).
Problem: Network segment is noisy.
Notes: Typically observed on a 10BASE-T network segment in a noisy environment. If you use multiple promiscuous monitors, the FCS and Alignment Errors among the monitors will not correlate. If the monitor can track runts, also called Too Short Errors, the number of runt packets should be significantly higher than normal.

Symptoms: FCS and Alignment Errors are much higher than normal.
Problem: Networks do not conform to the access scheme of Carrier Sense Multiple Access with Collision Detect (CSMA/CD).
Notes: Occurs when some implementations of Ethernet in the segment are not entirely compatible with IEEE 802.3 repeaters. Collision fragments linger on the network long enough to collide with retry packets at the minimum interpacket gap (IPG). The IPG is smaller on one side of the repeated network, causing a lost packet. Ethernet controllers cannot receive packets that are separated by 4.7 µs or less. Some controllers cannot sustain receptions of packets separated by as much as 9.6 µs. If runt packets are received one after another and are followed by a collision fragment, Ethernet controllers that cannot sustain reception will lose packets.

Receive Discards

Receive Discards indicate that received packets could not be delivered to a high-layer protocol because of congestion or packet errors.

Too Long Errors

A Too Long Error indicates that a packet is longer than 1518 octets (including FCS octets) but otherwise well formed. Too Long Errors are often caused by a bad transceiver, a malfunction of the jabber protection mechanism on a transceiver, or excessive noise on the cable.

Too Short Errors

A Too Short Error, also called a runt, indicates that a packet is fewer than 64 octets long (including FCS octets) but otherwise well formed.

Transmit Discards

Transmit Discards indicate that packets were not transmitted because of network congestion.
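The two length limits given for Too Long and Too Short Errors can be expressed as a simple check. This is only a sketch for otherwise well-formed packets; the function name and return strings are illustrative.

```python
MIN_FRAME_OCTETS = 64     # shorter packets are runts (Too Short Errors)
MAX_FRAME_OCTETS = 1518   # longer packets are Too Long Errors


def classify_length(octets):
    """Classify a well-formed packet by its length, FCS octets included."""
    if octets < MIN_FRAME_OCTETS:
        return "Too Short Error"
    if octets > MAX_FRAME_OCTETS:
        return "Too Long Error"
    return "OK"
```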

FDDI Ring Errors

Use these sections to identify and correct FDDI ring errors:

• FDDI Ring Errors Overview
• Identifying Ring Errors

Fiber Distributed Data Interface (FDDI) often corrects its own problems. However, because FDDI cannot correct all errors (especially those related to hardware problems), you should monitor FDDI errors.

Understanding the Problem

FDDI ring errors that you should monitor include:

• Elasticity Buffer Error Condition
• Frame Error Condition
• Frames Not Copied Condition
• Link Error Condition

Identifying the Problem

First, determine the type of FDDI ring errors and where they are occurring. Similar to the way you identify other FDDI problems, identify the upstream and downstream neighbors of the devices that you are monitoring.

Several types of network errors can cause FDDI performance problems. For example, problems with cables or physical connections may result in a link or frame error. Elasticity buffer (EB) errors can also lead to link and frame errors.

FDDI deals with port-related errors as follows:

• The variable PORTlerAlarm is the link error rate (LER) value at which a link connection generates an alarm. When the LER is greater than the alarm setting, Station Management (SMT) sends a Status Report Frame (SRF) to notify you that there is a problem with a port.

The PORTlerAlarm threshold is set lower than the PORTlerCutoff threshold so that you are notified of a problem before the port is actually removed from the ring.

• When link errors reach the threshold defined by the variable PORTlerCutoff, SMT breaks the connection, disabling the PHY that detected the problem. A Link Error Condition is also generated.
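The two-threshold behavior described above can be sketched as follows. The error rates are treated here as plain fractions, and the default threshold values are illustrative assumptions rather than values taken from any device; the function name is hypothetical.

```python
def port_action(link_error_rate, alarm_rate=1e-8, cutoff_rate=1e-7):
    """Decide what happens to a port at a given link error rate.

    alarm_rate and cutoff_rate are example values only. The alarm threshold
    must be the lower of the two so that the warning comes before the cutoff.
    """
    if alarm_rate >= cutoff_rate:
        raise ValueError("alarm threshold must be lower than cutoff threshold")
    if link_error_rate >= cutoff_rate:
        return "cutoff"   # connection is broken, Link Error Condition raised
    if link_error_rate > alarm_rate:
        return "alarm"    # a Status Report Frame is sent, port stays in ring
    return "ok"
```

Keeping the alarm below the cutoff gives an operator a window in which the port is degraded but still connected.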

FDDI deals with MAC-related errors as follows:

• When MAC frame errors reach a certain threshold, a Frame Error Condition is generated. Because the actual error can be further upstream than the immediate connection, the connection remains intact.

• For a large network, the worst case MACFrameErrorRatio is less than 0.1 percent. However, during network configuration, frame error ratios can reach 50 percent for short periods. When you detect a sustained frame error ratio of more than 0.1 percent, a problem exists between the station that is reporting the condition and the nearest upstream MAC.
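The distinction between a brief spike and a sustained ratio can be sketched as below. The ratio formula (errored plus lost frames over received plus lost frames, computed on counter deltas) is an assumption about how the ratio is derived, and the threshold, sample count, and function names are illustrative.

```python
def frame_error_ratio(delta_error, delta_lost, delta_frames):
    """Fraction of frames in one polling interval that were errored or lost.

    Assumed formula: (error + lost) / (frames + lost), using counter deltas.
    """
    denominator = delta_frames + delta_lost
    if denominator == 0:
        return 0.0
    return (delta_error + delta_lost) / denominator


def sustained_problem(ratios, threshold=0.001, min_samples=3):
    """True only if the most recent min_samples ratios all exceed threshold.

    A single high sample, such as one taken during ring configuration,
    does not count as sustained.
    """
    recent = ratios[-min_samples:]
    return len(recent) == min_samples and all(r > threshold for r in recent)
```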

Solving the Problem

To solve problems related to FDDI errors, fix the hardware, cabling, or congestion problem.

Identifying Ring Errors


Use Status Watch to monitor your FDDI devices for Warning or Critical alerts.

Status Watch

Use Status Watch to identify FDDI ring errors.

Follow these steps:

1. Monitor the FDDI Status tool for the currently selected device.

2. Determine whether Status Watch is reporting Elasticity Buffer Errors or a high percentage of Frame Errors, Frames Not Copied, or Link Error Rates for the currently selected device.

FDDI Ring Errors Reference

This section provides additional conceptual and problem analysis detail.

Elasticity Buffer Error Condition

The Elasticity Buffer Error condition occurs when a port's elasticity buffer overflows or underflows. This condition usually indicates that a port's hardware is not operating within the tolerances that the FDDI standard specifies. Look for the problem in the hardware of either the port that is reporting the condition or of the immediately adjacent port.

Frame Error Condition

The Frame Error condition occurs when the percentage of frames that contain errors exceeds a preset threshold. In the situation when a device is an uplink to FDDI (that is, a device is transmitting onto FDDI), this type of condition indicates that the ring is saturated. The ring is out of buffer space, and packets are being dropped from the device's backbone port.

The problem indicated by the frame errors is usually located between the MAC that reports the condition and its upstream neighbor. Because many physical connections can lie along this path, the MACFrameErrorRatio variable identifies only the two MACs between which the problem is occurring.

Frames Not Copied Condition

The Frames Not Copied condition occurs when the percentage of frames that are dropped because of insufficient buffer space exceeds a preset threshold. This condition indicates that the station is congested and is unable to process frames as quickly as they arrive. To help eliminate congestion:

• Add more capacity to the station
• Reconfigure your network so that end stations that communicate heavily with one another are on the same bridge or switch
• Filter out certain traffic

Link Error Condition

The Link Error condition occurs when a port detects link errors at a rate that exceeds a preset threshold. When the Link Error threshold is exceeded, the station removes itself from the ring and tries to reinsert itself on the ring. This action creates a MAC Neighbor Change Event (which also occurs if a ring wraps).

Link errors may indicate an FDDI PHY hardware problem (such as a faulty transmitter) or a faulty cable or connector. Look for the problem in the portion of the network between the port that is reporting the condition and the first upstream transmitter.

MAC Neighbor Change Event

The MAC Neighbor Change event occurs when a MAC's upstream or downstream neighbor changes.

This event indicates either:

• A network reconfiguration
• Another station that is leaving or joining the ring

Network File Server Timeouts

Use these sections to identify and correct timeouts on network file servers:

• Network File Server Timeout Overview
• Looking for Obvious Errors
• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

See Network File Server Timeouts Reference for additional conceptual and problem analysis detail.

A network file server can time out if your network gets congested or if your server is having problems. Users might have problems downloading data from or to the server, or copying files from or to the server. To help you to understand the troubleshooting process for this type of problem, an EXAMPLE throughout this section follows the symptoms, analysis, and resolution of a typical file server timeout problem.


Understanding the Problem

When users log in, their stations make network file server calls either to determine quotas (if this feature has been enabled) or to mount user home directories. The network file server timeout messages, even when spread across multiple nodes, indicate a problem either with the network or with a server.

EXAMPLE: UNIX users notice that it takes a long time - over 30 seconds in some cases - to log in to any machine. Some machines report network file server timeout messages, but the messages have no obvious pattern and are infrequent. You begin to get a sense of the problem.

Identifying the Problem

First, rule out the obvious causes. Ask these questions:

• Can you access the network file server with Telnet?
• Have any alarms been triggered?
• Are there any new errors?

The process of identifying the problem is developed in Looking for Obvious Errors.

Solving the Problem

To determine the cause, reproduce the fault while you monitor the network. After you know the cause, you can fix the problem.

The solutions to the network file server timeout are identified in these sections:

• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

Looking for Obvious Errors

To look for obvious errors, use these applications:

• Ping and Telnet - To check connectivity to the network file server nodes
• LANsentry Manager Alarms View - To search for triggered alarms
• LANsentry Manager Statistics View - To look for errors
• LANsentry Manager History View - To identify trends


Ping and Telnet

Determine whether you can contact network file server nodes using Ping and Telnet. If the response is extremely slow, then a problem may exist with the connections to the nodes. No delay indicates that the connections are normal, implying that the delay is occurring elsewhere. In this case, use LANsentry® Manager tools to determine whether packets are being lost or ignored.
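One way to put a number on "extremely slow" is to time how long a TCP connection to the server takes to open. The sketch below does that with the Python standard library; the default port 23 (Telnet) and the function name are assumptions for illustration, and the host name in the comment is hypothetical.

```python
import socket
import time


def connect_time(host, port=23, timeout=5.0):
    """Return the seconds taken to open a TCP connection, or None on failure.

    Port 23 is the usual Telnet port; use whichever service the server runs.
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return None


# Example use (hypothetical host name):
#     elapsed = connect_time("fileserver.example.com")
```

A result of None, or a time close to the timeout, suggests a connection problem; a consistently small value suggests that the delay lies elsewhere.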

LANsentry Manager Alarms View

Using the LANsentry Manager Alarms View, you can determine if any configured alarms have been triggered.

Search the Alarms View to see if any MAC events have been logged.

EXAMPLE: MAC events have not been logged for the network on which the UNIX users are attached.

Even though no alarms have occurred, errors may exist. For example, a lower rate of background errors may exist just below the alarm threshold. Based on maximum and minimum values, RMON errors may miss constant, periodic, or low amounts of errors.

LANsentry Manager Statistics View

Using the LANsentry Manager Statistics View, you can display a multisegment graph of utilization and error statistics.

Follow these steps:

1. Set up a graph that shows utilization and errors on all your major segments.

2. Determine whether any segments are particularly busy or error prone.

EXAMPLE: You notice that one segment of the UNIX network, HUB3, is reporting Too Long Errors and FCS Errors roughly every second sample. While the amount is not higher than normal, it is currently higher than any other segment.

LANsentry Manager History View

Using the LANsentry Manager History View, you can display a rolling history table to determine if the errors that you are seeing are new. For example, if you have a history table that runs for 30-minute samples over two days, you can compare the most recent sample to a previous sample, looking for new errors. If your probe has the resources, use a much finer resolution sample stored for a shorter time (every 30 seconds for 2 hours) to more easily spot recent errors.

EXAMPLE: You see that the history table shows that no error rates remained constant throughout the day. However, errors that did occur were on the device HUB3 and were Too Long Errors and FCS Errors.
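The history comparison can be sketched as follows: one helper works out how many samples each history setting stores, and another lists the error types that appear in the latest sample but not in an earlier one. The dictionary layout and function names are illustrative assumptions, not a LANsentry Manager format.

```python
def samples_stored(interval_seconds, duration_seconds):
    """Number of samples a rolling history keeps for a given interval."""
    return duration_seconds // interval_seconds


def new_error_types(previous, latest):
    """Error names that are non-zero now but were zero or absent before.

    Each sample is a dict that maps an error name to its count.
    """
    return sorted(
        name for name, count in latest.items()
        if count > 0 and previous.get(name, 0) == 0
    )


coarse = samples_stored(30 * 60, 2 * 24 * 3600)   # 30-minute samples, 2 days
fine = samples_stored(30, 2 * 3600)               # 30-second samples, 2 hours
```

The finer history holds more samples over a much shorter window, which is why it makes a recent change easier to spot if the probe can afford it.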

Reproducing the Fault While Monitoring the Network

Although the RMON View in LANsentry Manager can show error rates and help you to identify the location of the problem, it may not provide enough data to solve the problem. To determine the cause of the problem, reproduce it while you monitor the network by using these applications:

• LANsentry Manager Top-N Graph - To locate a quiet node to use for reproducing the fault
• LANsentry Manager Packet Capture - To capture packets from the hub to which the quiet node is attached
• LANsentry Manager Packet Decode - To analyze the packets to assess network file server traffic and delays
• Address Tracker - To find the location of the problem nodes

EXAMPLE: Using LANsentry Manager, you find a hub on the network with a higher than normal error rate. However, the error rate does not seem high enough to cause login delays of 60 seconds or more.

LANsentry Manager Top-N Graph

Using the Top-N graph in the LANsentry Manager main window, locate a quiet node that has been showing the same problem. Choose a quiet node so that you do not receive excessive traffic when you try to isolate the problem.

EXAMPLE: You see that the node Monolith, which has the same Network File System (NFS) mounts as the other nodes on the network, is quiet. You decide to use this node for reproducing the fault. See Network File System (NFS) Protocol for more information about NFS.

LANsentry Manager Packet Capture

Using the LANsentry Manager Packet Capture application, capture packets from the network using predefined patterns and start-and-stop conditions.


Follow these steps:

1. Set up a capture buffer on a probe that is connected to the same hub as the quiet node. Until you know more about the problem, set a very general filter.

EXAMPLE: You select a MAC-layer filter and set a conversation filter to capture all packets to and from Monolith.

2. Telnet into and log out from the quiet node. Then reset the capture buffer. Repeat this procedure until you see the problem reflected in your captured data. To keep the buffer information clean, reset the buffer each time that you repeat the procedure.

3. When you see the delay, note the rough value of the packet count on the LANsentry Manager packet buffer.

By noting the packet count at which you think the delay has occurred, you can narrow the problem to within about 20 packets in the buffer. If you have used an extremely quiet node, you may even identify the exact packet.

LANsentry Manager Packet Decode

The LANsentry Manager Packet Decode application decodes all major protocols and displays the packet contents at three levels of detail: summary information, header information, and actual packet content.

Follow these steps:

1. Open the buffer in the Packet Decode application and locate the number of the packet at which the delay occurred.

2. Select the packet and launch a MAC-layer conversation filter. In the filter display, look for a gap in the conversation (that is, where the node sent a request and then resent it at approximately the same rate as the delay you experienced when recreating the problem).

3. Repeat the test to determine if the result concentrates on one node or if it appears on other nodes.

EXAMPLE: On the quiet node that you selected, the delay is obvious. You see an NFS request going out to a node and a repeat of the request 30 seconds later. During that time, the node did not respond. You now know that the delay occurred because nodes were not seeing responses for NFS requests. When you repeat the test on other nodes, you find that the delay is happening with more than one destination node.
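Looking for a gap in a conversation can also be done programmatically once the packets are exported. The sketch below scans a simplified packet list for requests that were sent again without an intervening reply. The tuple layout (timestamp, kind, request identifier) is a hypothetical export format chosen for illustration, not the output of the Packet Decode application.

```python
def find_retry_gaps(packets, min_gap=1.0):
    """Find requests that were repeated before any reply arrived.

    packets is a list of (timestamp_seconds, kind, request_id) tuples in
    capture order, where kind is "request" or "reply".
    Returns a list of (request_id, first_sent, resent_at) tuples.
    """
    pending = {}   # request_id -> time of the latest unanswered request
    gaps = []
    for timestamp, kind, request_id in packets:
        if kind == "reply":
            pending.pop(request_id, None)
        elif kind == "request":
            first_sent = pending.get(request_id)
            if first_sent is not None and timestamp - first_sent >= min_gap:
                gaps.append((request_id, first_sent, timestamp))
            pending[request_id] = timestamp
    return gaps


capture = [
    (0.00, "request", 1), (0.01, "reply", 1),
    (1.00, "request", 2),                      # never answered
    (31.00, "request", 2), (31.02, "reply", 2),
]
gaps = find_retry_gaps(capture)
```

In this made-up capture, request 2 is resent 30 seconds after it was first sent, which matches the kind of gap described in the EXAMPLE.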


Address Tracker

Use Address Tracker, which polls managed devices, to determine the hubs to which the problem nodes are attached. If the problem end stations are located on unmanaged devices, then you can at least narrow the problem to those unmanaged devices.

EXAMPLE: Although your network does not have managed hubs that Transcend® NCS management software can poll, it does have managed switches. When it polls the switches, Address Tracker displays the switch ports on which addresses were last seen. This information indicates the hub (but not the hub port) on which the device is located.

LANsentry Manager Packet Decode

After you know the location of the hub that has the problem node, monitor the problem from the hub using LANsentry Manager Packet Decode.

Follow these steps:

1. To capture packets from one of the nodes on the hub, set up another capture buffer and repeat the exercise that is described in LANsentry Manager Packet Capture. Because a delay may occur on a different node, use two capture buffers without stopping the first one.

Note the rough packet count where the delay appears.

2. Display a conversation filter of the packet where the delay appears and look for the gap in the conversation.

EXAMPLE: You hope that the nodes are on the same hub. You find that all the nodes are on HUB3. This result indicates that FCS Errors may be causing the timeouts. However, because the errors occur at a low rate, you decide to verify this diagnosis. You monitor the problem from the hub, logging in and out many times, and the delay eventually occurs. This time, the delay shows that the node's reply had an FCS Error even though the node received the request. The switch would not have transmitted this packet, causing a timeout on the NFS protocol. The retry time is presumably 30 seconds. During this test, you see the problem occurring on another node.

Correcting the Fault


Without a managed hub, you may find it very difficult to discover network file server timeout errors. To find the problematic node, you must either systematically isolate nodes by monitoring each node for a prolonged period or temporarily insert a managed hub.

EXAMPLE: You notice that the captured error packet failed FCS because it was corrupted by a regular pattern during transmission. A possible reason for this occurrence is a Jabbering node. This explanation makes sense because FCS/Jabber frames increased linearly when you were monitoring the live network.

Network File Server Timeouts Reference

This section explains terms that are relevant to network file server timeouts and provides additional conceptual and problem analysis detail.

Jabbering

Jabbering occurs when a node transmits illegal-length packets and is possibly not operating within carrier specifications. In effect, another node has written bad data over a valid packet. This bad data is often interpreted as a repeated sequence of data.

Network File System (NFS) Protocol

A distributed file system protocol developed by Sun Microsystems that allows a computer system to access files over a network as if they were on its local disks. This protocol has been incorporated into products by more than 200 companies. It is now a de facto Internet standard.

NFS is one protocol in the NFS suite of protocols, which includes NFS, RPC, XDR (External Data Representation), and others. These protocols are part of a larger architecture that Sun Microsystems refers to as Open Network Computing (ONC). ONC is a distributed applications architecture designed by Sun and currently controlled by a consortium led by Sun.

Troubleshooting TCP/IP - Detailed Steps

This article shows how to troubleshoot TCP/IP connectivity between computers on a Windows network. If you haven't already done so, disable XP's Internet Connection Firewall on all local area network connections, and remove all firewall programs on the network. Improperly configured firewalls are the most common cause of TCP/IP problems.


Open a Command Prompt Window

For many of these steps, you'll be typing at the command prompt. To open a command prompt window in Windows 2000 or XP, click Start | Run, type cmd in the box, and click OK. To open a command prompt window in Windows 95, 98, or Me, click Start | Run, type command in the box, and click OK. Type one command per line, and press Enter after each one to execute it. To close the command prompt window, use the exit command.

Determine the TCP/IP Settings

Determine the TCP/IP settings of each computer on the local area network. In XP, open the Network Connections folder, right click the LAN connection, and click Status | Support | Details. For example, here are the Status and Details views for the LAN connection on an Internet Connection Sharing host.

In Windows 95/98/Me, click Start | Run, type winipcfg in the box, and click OK. Select the LAN adapter from the menu, and click More Info. Here's the winipcfg view for an ICS client running Windows Me.

You can also see the TCP/IP settings from the command prompt. This is especially convenient if a computer has more than one network adapter. Use the ipconfig /all command, which is available in all versions except Windows 95. The output from this command can be long, so it's best to write it to a file. Specify the file name in the command this way:

ipconfig /all >ipconfig.txt

Description of TCP/IP Settings

Here are the TCP/IP settings that are used in network troubleshooting:

• IP Address - Unique address assigned to a network adapter. A computer with multiple network adapters has an IP address for each one, and each one must be in a different subnet.

• Subnet Mask - Used in conjunction with the IP address to determine which subnet an adapter belongs to. At the simplest level, communication is only possible between two network adapters when they're in the same subnet.

• Default Gateway - IP address of a computer or router, on one of this computer's local area networks, that knows how to communicate with subnets not present on this computer. For an Internet connection, the default gateway is a router belonging to your Internet service provider, and all access to sites on the Internet goes through it. For an ICS client, the default gateway is the ICS host. If you use a hardware router, it serves as the default gateway.

• DHCP Server - If an adapter is configured to obtain an IP address automatically, this is the address of the server that provides it. It could be your ISP, an ICS host, or a hardware router.

• DNS Servers - IP address of one or more Domain Name Server computers. DNS servers translate Internet names to their IP addresses (like 63.146.109.227).
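To see how the IP address and subnet mask together decide whether two adapters share a subnet, here is a small sketch using Python's standard ipaddress module. The function name is illustrative, and the addresses and mask in the example are sample values, not settings read from any computer.

```python
import ipaddress


def same_subnet(ip_a, ip_b, subnet_mask):
    """True if ip_b falls inside the subnet defined by ip_a and subnet_mask."""
    # strict=False lets us pass a host address rather than a network address.
    network = ipaddress.ip_network(f"{ip_a}/{subnet_mask}", strict=False)
    return ipaddress.ip_address(ip_b) in network


# Sample values only.
on_lan = same_subnet("192.168.1.101", "192.168.1.123", "255.255.255.0")
off_lan = same_subnet("192.168.1.101", "192.168.2.5", "255.255.255.0")
```

With a mask of 255.255.255.0, only the last number of the address may differ between two adapters on the same subnet, so the first pair matches and the second does not.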

Subnets

See our article on subnets for a brief description of how they work. For more details, see this Microsoft Knowledge Base article.

If two computers are supposed to be on the same subnet but aren't, something is wrong with the network hardware or software configuration. This is most likely to happen when one of them receives an IP address of 169.254.x.x, which indicates that:

• It's configured to obtain an IP address automatically
• It couldn't find a DHCP server on the network to make the assignment
• Windows assigned it an Automatic Private IP Address

See our article on Specific Networking Problems and Their Solutions for more information.
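A quick way to recognize an Automatic Private IP Address in a script is to test whether the address falls in the 169.254 link-local range. This sketch uses Python's standard ipaddress module; the function name and sample addresses are illustrative.

```python
import ipaddress


def is_automatic_private_address(ip):
    """True if the address is in the 169.254.x.x link-local range."""
    return ipaddress.ip_address(ip).is_link_local


# Sample addresses only.
self_assigned = is_automatic_private_address("169.254.10.20")
dhcp_assigned = is_automatic_private_address("192.168.1.101")
```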

Pinging

The ping command is the basic tool for testing TCP/IP connectivity. It sends a special packet (called ICMP Echo) to a particular IP address and looks for a reply. If everything is working right, the reply comes back. If not, the ping times out in a few seconds. By default, the ping command repeats the process four times. Here's an example of an ICS client computer pinging a Windows XP Home Edition ICS host, using the host's IP address and its computer name.

When ping fails, you'll see one of these error messages:

• Request timed out - The IP address is valid, but there's no reply from it. If the IP address is on a local area network, the most likely cause is a firewall program blocking the ping.

• Unknown host <name> or Ping request could not find host <name> - The computer name doesn't exist on the local area network. Make sure that NetBIOS over TCP/IP is enabled.

• Destination host unreachable - The IP address isn't on a local area network, and the default gateway can't access it. Either there's no default gateway, its address is wrong, or it isn't functioning.


Pinging the Local Area Network

Here is a series of ping commands to use in finding where a problem occurs on a local area network. Run them in the order shown, and don't go on to the next command until all of the previous commands work properly. In this example:

• The computer being tested is named Winxp, with IP address 192.168.1.101
• There's another computer on the network named Win98, with IP address 192.168.1.123

Substitute the appropriate IP addresses and computer names for your network.

Command              Target                        What Ping Failure Indicates
ping 127.0.0.1       Loopback address              Corrupted TCP/IP installation
ping localhost       Loopback name                 Corrupted TCP/IP installation
ping 192.168.1.101   This computer's IP address    Corrupted TCP/IP installation
ping winxp           This computer's name          Corrupted TCP/IP installation
ping 192.168.1.123   Another computer's IP address Bad hardware or NIC driver
ping win98           Another computer's name       NetBIOS name resolution failure

To fix a corrupted TCP/IP installation on Windows XP, follow the steps in this Microsoft Knowledge Base article. For Windows 95/98/Me, un-install the TCP/IP protocol in Control Panel | Network, reboot, and re-install it. If that doesn't fix it, use this procedure on Windows 95 or 98.
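The ordered ping sequence above lends itself to a small script that stops at the first failure and reports what that failure suggests. The sketch below is one way to do it; the addresses and names are the example values from this section, the function names are illustrative, and the exit status of the system ping command is an imperfect signal (on some systems it can report success even when the reply is an error message), so treat the result as a hint.

```python
import platform
import subprocess

# Example targets from this section; substitute values for your own network.
LAN_CHECKS = [
    ("127.0.0.1", "Corrupted TCP/IP installation"),
    ("localhost", "Corrupted TCP/IP installation"),
    ("192.168.1.101", "Corrupted TCP/IP installation"),
    ("winxp", "Corrupted TCP/IP installation"),
    ("192.168.1.123", "Bad hardware or NIC driver"),
    ("win98", "NetBIOS name resolution failure"),
]


def ping(target, count=1):
    """Run the system ping command; True if it exits with status 0."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    result = subprocess.run(
        ["ping", count_flag, str(count), target],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def first_failure(checks, ping_fn=ping):
    """Run the checks in order; return (target, meaning) for the first failure.

    Returns None if every ping succeeds. ping_fn can be replaced for testing.
    """
    for target, meaning in checks:
        if not ping_fn(target):
            return target, meaning
    return None
```

Stopping at the first failure mirrors the advice not to go on to the next command until all of the previous commands work.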

Pinging the Internet

You can also use ping to find a problem with Internet access. Run these commands in the order shown, and don't go on to the next command until all of the previous commands work properly. Use the Default Gateway and DNS Server addresses that you got from the winipcfg or ipconfig /all command.

Command                  Target                What Ping Failure Indicates
ping w.x.y.z             Default Gateway       Default Gateway down
ping w.x.y.z             DNS Server            DNS Server down
ping w.x.y.z             Web site IP address   Internet service provider or web site down
ping www.something.com   Web site name         DNS Server down or web site down

REFERENCE

WWW.PRACTICALLYNETWORK.COM

WWW.COMPUTERHOPE.COM

WWW.WIKIPEDIA.ORG


THANK YOU


                                                          • Open a Command Prompt Window
                                                          • Determine the TCPIP Settings
                                                          • Description of TCPIP Settings
                                                          • Subnets
                                                          • Pinging
                                                          • Pinging the Local Area Network
                                                          • Pinging the Internet

• Traffix Manager generates weekly reports from collected data and calculates the baselines for you. Set up Utilization History and Error History reports with data resolution set to Weekly.

• LANsentry Manager History View generates daily utilization graphs, which are sampled every 30 minutes, for each day over one week. Use these graphs to calculate your network baselines manually.
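As a rough illustration of the manual calculation, the sketch below averages utilization samples by 30-minute slot across the days of a week. The function name, the sample layout, and the readings are assumptions made for this example; they are not output from LANsentry Manager.

```python
from collections import defaultdict
from statistics import mean


def weekly_baseline(samples):
    """Average utilization per time-of-day slot across the days sampled.

    samples: iterable of (day, slot, utilization_percent), where slot is the
    index of the 30-minute interval within the day (0 to 47).
    """
    by_slot = defaultdict(list)
    for _day, slot, utilization in samples:
        by_slot[slot].append(utilization)
    return {slot: mean(values) for slot, values in sorted(by_slot.items())}


# Hypothetical readings: slot 18 covers 09:00-09:30, slot 19 covers 09:30-10:00.
samples = [
    ("Mon", 18, 20.0), ("Tue", 18, 30.0), ("Wed", 18, 40.0),
    ("Mon", 19, 10.0), ("Tue", 19, 20.0),
]
baseline = weekly_baseline(samples)
```

Comparing a new reading against the baseline for the same slot, rather than against a single weekly average, keeps normal daily peaks from looking like problems.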

Identifying Background Noise

Know your network's background noise so that you can recognize real data flow. For example, one evening after everyone is gone, when no backups are running and most nodes are on, analyze the traffic on your network using the Traffix Manager application. The traffic that you see is mostly broadcast and multicast packets. Any errors that you see are the result of faulty devices (trace). This traffic is the background noise of your network: traffic that occurs for little value. If background noise is high, redesign your network.

FDDI Connectivity

Use these sections to identify and correct connectivity errors on an FDDI ring:

• FDDI Connectivity Overview
• Monitoring FDDI Connections
• Making Your FDDI Connections More Resilient

Fiber Distributed Data Interface (FDDI), which is a self-correcting technology, automatically corrects ring faults to maintain connectivity throughout most of the network. However, you should monitor your FDDI connections for wrapped rings and other problems with ring connectivity.

Understanding the Problem

As shown in Figure 9, in a Thru FDDI LAN, no stations on the trunk ring have a Configuration State (SMTConfigurationState) of Wrap or Isolated. However, users who complain about network performance may have lost connectivity to other stations on the network because the FDDI network is wrapped or segmented.


Figure 9 Thru Ring

Wrapped ring

By monitoring the Peer Wrap Condition, you can see when the Configuration State changes. In a wrapped ring (Figure 10), two stations on the LAN are in a wrapped Configuration State. This condition may or may not affect the connectivity of certain stations. Although operational, your network may have a cabling problem or a problem with a link.

Figure 10 Wrapped LAN

Segmented ring

In a segmented ring (Figure 11), more than two stations are wrapped on the trunk ring. Although this mode of operation is a valid FDDI LAN configuration, your LAN is probably experiencing a degraded or degrading condition.

Figure 11 Segmented Ring

When a network connection has excessively high link errors, Station Management (SMT) shuts down the connection and tries to reconnect. A dual-attachment trunk ring station with an A or B connection that is shut down is one of the wrap points in the network. See Making Your FDDI Connections More Resilient for information about keeping a dual-attachment station connection from wrapping.


Isolated station

Sometimes a network wraps a particular station out of the ring. Stations on either side of a problem station can be wrapped. This effectively isolates the station or links that have problems, as shown in Figure 12.

Figure 12 Wrapped Ring with Isolated Station

If a ring was already wrapped when a network wraps a station out of the ring, then a segmented ring results, as shown in Figure 13.

Figure 13 Segmented Ring with Isolated Stations

Twisted ring

In a twisted ring, an A port is connected to an A port and a B port is connected to a B port, instead of the normal A-to-B connections. A twisted ring, which always has two twist points (stations), can exist in either a Thru or Wrap state. You can monitor the Twisted Ring Condition and Undesired Connection Attempt Event for evidence of twisted ring and other connection problems.

Identifying the Problem

To identify the problem, follow this process:

1. At the FDDI LAN level, verify that your network is operating.

If the network is operating, the FDDI ring may be segmented, and therefore an FDDI station or an Ethernet station on an Ethernet link may have lost connectivity to other nodes on the network.

2. Determine if a ring is in a Thru, Wrap, or Segmented state.

If the FDDI ring is segmented or wrapped, look for a problem with a link somewhere in the network or for a nonfunctioning node on your trunk ring. If the ring is operating and is not segmented, or if it is segmented but you still have connectivity to the stations in question, move to a more specific level in your network.

See Monitoring FDDI Connections for more information.

3. Determine if the poorly performing station is an Ethernet or FDDI station.

If the problem is an FDDI station, find out if it is congested (that is, if the station is so busy that it cannot accept all the network traffic that is directed to it) by determining its Bandwidth Utilization. Also determine if the station has a high frame error rate by looking at the FDDI Ring Errors.

If the problem is an Ethernet station, look for congestion by examining Ethernet Packet Loss and Bandwidth Utilization.
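The ring-state check in step 2 can be sketched as a count of wrapped stations, following the descriptions earlier in this section: no wrapped stations in a Thru ring, two in a wrapped ring, and more than two in a segmented ring. The function name and the plain-text state values are assumptions for illustration (real SMT agents report more detailed Configuration State values), and treating a single wrapped station as a wrapped ring is also an assumption.

```python
def ring_state(config_states):
    """Classify a trunk ring from per-station Configuration State values.

    config_states: iterable of strings such as "Thru", "Wrap", or "Isolated"
    (hypothetical labels; one entry per trunk ring station).
    """
    wrapped = sum(1 for state in config_states if state.lower().startswith("wrap"))
    if wrapped == 0:
        return "Thru"
    if wrapped <= 2:
        # Assumption: a single reported wrap point is handled like a wrapped ring.
        return "Wrapped"
    return "Segmented"


print(ring_state(["Thru", "Wrap", "Wrap", "Thru"]))
```

A result other than Thru suggests looking for a faulty link or a nonfunctioning node between the stations that report the wrap.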

Solving the Problem

Identify the station that is causing the disconnection and take the appropriate steps:

• If the disconnection is caused by a wrapped ring, then fix the hardware or cabling problem at that station.

• If the station is congested, you have a device problem rather than a network problem. For example, if the congested station is a file server and every other machine on the network is retrieving and saving files using that server, consider upgrading your server or adding additional servers to the network. A variety of devices from different vendors may be communicating on an FDDI or Ethernet network; some are faster and more capable, and some are slower and more prone to congestion.

• If the station is an Ethernet station that is attached to an Ethernet segment, reevaluate the setup of your Ethernet network and make some changes to improve its performance.

You can also make FDDI connections more resilient by implementing dual homing or installing an Optical Bypass Unit (OBU) where FDDI connections are prone to fail. See Making Your FDDI Connections More Resilient for more information.


Monitoring FDDI Connections

Monitor your FDDI devices for Warning or Critical alerts in the FDDI Status tool.

Status Watch

Use Status Watch to identify these FDDI connectivity errors:

• Peer Wrap Condition
• Twisted Ring Condition
• Undesired Connection Attempt Event

Follow these steps:

1. In the Device area, select the device that is located where you suspect an FDDI ring connectivity problem.

2. Monitor the FDDI Status tool for the currently selected device.

Here are some pointers for monitoring:

• If the Peer Wrap Configuration State variable is Isolated, the device is not connected to the FDDI trunk ring. If you intend the device to remain isolated, this indication is not a serious condition. However, if the device is supposed to be connected on a trunk ring, a serious problem may exist. The device is no longer transmitting packets to the larger trunk ring.

• If the Peer Wrap flag (SMTPeerWrapFlag) is set, the device is one of the wrap points. The cause of the wrapped ring is somewhere in the portion of the network between the two stations that report the peer wrap condition.

Making Your FDDI Connections More Resilient

When devices are removed from an FDDI ring, there is a break in the fiber path that causes the ring to wrap until the ring is made whole again. To prevent the break in the FDDI connection, you can implement dual homing or install an Optical Bypass Unit (OBU).

Implementing Dual Homing

When the operation of a dual attachment node is critical to your network, dual homing adds reliability by providing a backup connection if the primary link fails. Because a dual attachment station (DAS) has two attachments to the FDDI ring (A-to-M and B-to-M), you can use one of them as a standby link if the active link fails. Using dual homing, only one of the two attachments is active at a time. In this sense, a DAS acts as if it is a single attachment station (SAS) by using its A port as the standby link.

Through SMT, a DAS can be dual homed to the same concentrator or, more commonly, to two concentrators. This arrangement provides a more stable trunk ring of concentrators. If one concentrator fails, the DAS enables the standby link to another concentrator to become the active link. See Figure 14.

If the station is a dual path or dual path/dual MAC station, you can configure the dual-homed station in one of two ways:

• With both links active
• With one link active and one connection withheld as a backup, only becoming active when one link fails

Figure 14 Dual Homing Configuration

Installing an Optical Bypass Unit

You can insert an Optical Bypass Unit (OBU) into the FDDI ring as if it were a node and then plug your device into it. To use an OBU, your device needs an optical bypass interface. This interface lets the bypass know whether your device is still on the ring or not. See Figure 15.

If your device is removed or if it fails, the bypass unit diverts the optical path away from your device, keeping the ring whole. You can use a bypass on devices that are prone to failure or are likely to be removed often, such as diagnostic equipment.


Figure 15 Optical Bypass Unit Configuration

FDDI Connectivity Reference

This section explains terms that are relevant to FDDI connectivity and provides additional conceptual and problem analysis detail.

Peer Wrap Condition

A Peer Wrap (wrapped ring) condition occurs when a dual-attachment station detects a fault (often a lost connection) and reconfigures the network by wrapping the dual trunk rings to form a single ring. Normally, the two stations that are adjacent to the fault wrap to maintain full connectivity. However, if a second fault occurs before the first is repaired, the network partitions itself into two or more rings and stations lose connectivity.

When a station reports a Peer Wrap condition, locate and repair the problem that caused the station to wrap the rings. Potential causes include:

• Faulty FDDI port hardware
• Faulty cables or connectors
• Unplugged connectors
• Powered-down stations

You can expect to find the cause of the problem between the two stations that report the Peer Wrap condition.


Twisted Ring Condition

A Twisted Ring condition occurs when certain undesirable connection types exist. See Table 7 for more information. Although similar to the Undesired Connection Attempt, the Twisted Ring condition provides specific Station Management (SMT) and port information for diagnosis.

Undesired Connection Attempt Event

An Undesired Connection Attempt event occurs when a port tries to connect to another port of a type that may result in an undesirable network topology. Whether the connection attempt is successful depends on the current setting of the station's connection policies.

Table 7 lists connections that the FDDI standard defines as undesirable. The managed devices may or may not permit these connections, depending on their FDDI station configurations.

Table 7 Undesirable Connection Types

Connection Type    Reason That the Connection Is Undesirable
A-A                Twisted primary and secondary rings
A-S                A wrapped ring
B-B                Twisted primary and secondary rings
B-S                A wrapped ring
S-A                A wrapped ring
S-B                A wrapped ring
M-M                A tree of rings topology (illegal connection)

Table 8 lists FDDI connections that create valid topologies.

Table 8 Valid Connection Types

Connection Type    Reason That the Connection Is Valid
A-B                A normal trunk peer connection
A-M                A tree connection with possible redundancy. In a single MAC node, Port B has precedence (by default) for connecting to a Port M.
B-A                A normal trunk ring peer connection
B-M                A tree connection with possible redundancy. In a single MAC node, Port B has precedence (by default) for connecting to a Port M.
S-S                A single ring of two slave stations
S-M                A normal tree connection
M-A                A tree connection that provides possible redundancy
M-B                A tree connection that provides possible redundancy
M-S                A normal tree connection
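Tables 7 and 8 together cover all 16 pairings of the A, B, S, and M port types, so a lookup of the undesirable pairs is enough to classify any connection. The following is a minimal sketch; the names `UNDESIRABLE` and `check_connection` are hypothetical and do not come from any 3Com tool.

```python
PORT_TYPES = {"A", "B", "S", "M"}

# Undesirable pairings and reasons, transcribed from Table 7.
UNDESIRABLE = {
    ("A", "A"): "Twisted primary and secondary rings",
    ("A", "S"): "A wrapped ring",
    ("B", "B"): "Twisted primary and secondary rings",
    ("B", "S"): "A wrapped ring",
    ("S", "A"): "A wrapped ring",
    ("S", "B"): "A wrapped ring",
    ("M", "M"): "A tree of rings topology (illegal connection)",
}


def check_connection(local_port, remote_port):
    """Return ("undesirable", reason) or ("valid", None) for a port pairing."""
    pair = (local_port.upper(), remote_port.upper())
    for port in pair:
        if port not in PORT_TYPES:
            raise ValueError(f"unknown FDDI port type: {port!r}")
    reason = UNDESIRABLE.get(pair)
    if reason is None:
        return ("valid", None)
    return ("undesirable", reason)


print(check_connection("A", "A"))
print(check_connection("B", "M"))
```

Whether a device actually rejects an undesirable pairing still depends on its connection policy settings, which this sketch does not model.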


Disabling the Offending Interface

Because broadcast storms can ultimately cause your whole network to become unavailable, take action immediately to disable the offending interface. You can enable the interface again after you have corrected the problem.

Address Tracker

Use Address Tracker to locate the interface that is causing the broadcast storm. Use Device View to disable the port.

Follow these steps:

1. In the Find Address window, enter the address of the interface that seems to be receiving the broadcast traffic.

You can copy the MAC or IP address from the Status Watch report and paste it into Address Tracker's Enter the Address You Want to Find field.

2. Click Find Now.

Search displays the device name.

3. Use Transcend Central to launch Device View and disable the port.

Disabling the port stops the broadcast storm before it interferes with all vital network traffic. You can re-enable this interface later using Device View or the device's console.

Correcting Spanning Tree Misconfigurations

Spanning Tree does not cause broadcast storms, but a loop in your Spanning Tree topology can create data that looks like a storm. A loop can occur in your topology if:

• Someone disables Spanning Tree on a port
• You set up your Spanning Tree configuration incorrectly


Device View

Use Device View to disable any Spanning Tree port that has a repeater attached to it and to correct Spanning Tree misconfigurations.

To correct Spanning Tree misconfigurations, use Device View to disable Spanning Tree Protocol (STP) for a port on a SuperStack® II Switch 1000, Switch 3000, Switch 3000 10/100, Switch 9000SX, Desktop Switch, LinkBuilder® FMS II Bridge/Management Module, or CoreBuilder 6000.

To disable the STP port state for a port on a SuperStack II switch:

1. Select a port and click the right mouse button.

2. From the shortcut menu, select Configure.

3. In the Port section, click the STP tab.

4. From the STP Port State list box, select Disabled.

5. Click Apply.

To disable the STP port state for a port on a LinkBuilder FMS II Bridge/Management Module:

1. Double-click the module.

2. From the shortcut menu, select Configure Bridge.

3. In the Port section, click the STP tab.

Broadcast Storms Reference

This section explains terms that are relevant to broadcast storms and provides additional conceptual and problem analysis detail.

Broadcast Packets

Broadcast packets, which are a normal part of network operation, are transmitted by a device to a broadcast address. For example, IP networks use broadcasts to resolve network addresses using Address Resolution Protocol (ARP). IPX networks use a large number of broadcast packets to operate most effectively.


Problems arise when broadcast packets endlessly propagate throughout the network, which increases the traffic volume on your network and the CPU time that each host spends processing and discarding unwanted broadcast packets.

Multicast Packets

Multicast packets, which are a normal part of network operation, are transmitted by a device to a multicast group address. Hosts that want to receive the packets indicate that they want to be members of the multicast group, and then multicast packets are distributed to that group. For example, multicast packets support the Spanning Tree Protocol. Multicast applications and underlying multicast protocols control multimedia traffic and shield hosts from processing unnecessary broadcast traffic. However, multicast traffic can also cause storms that saturate your network.
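To make the distinction concrete, the sketch below classifies a destination MAC address written in canonical (Ethernet) notation: the all-ones address is the broadcast address, and any other address whose first octet has its least significant bit set is a group (multicast) address. The function name and the unicast sample address are assumptions for illustration; 01:80:C2:00:00:00 is the bridge group address used by the Spanning Tree Protocol.

```python
def classify_destination(mac):
    """Return "broadcast", "multicast", or "unicast" for a destination MAC."""
    octets = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(octets) != 6:
        raise ValueError(f"expected a 6-octet MAC address, got {mac!r}")
    if octets == b"\xff" * 6:
        return "broadcast"
    if octets[0] & 0x01:  # group bit: least significant bit of the first octet
        return "multicast"
    return "unicast"


for address in ("ff:ff:ff:ff:ff:ff", "01:80:c2:00:00:00", "00:a0:24:12:34:56"):
    print(address, classify_destination(address))
```

Counting captured frames by these three classes gives a quick view of how much of the traffic is broadcast or multicast rather than directed data.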

Duplicate Addresses

Use these sections to identify and correct problems caused by duplicate MAC and IP addresses:

• Duplicate Addresses Overview
• Finding Duplicate MAC Addresses
• Finding Duplicate IP Addresses

See Duplicate Addresses Reference for additional conceptual and problem analysis detail.

Networks sometimes generate duplicate MAC and IP addresses. Because duplicate addresses can cause problems with packet delivery, resolve them as soon as possible.

Understanding the Problem

Duplicate MAC addresses are caused by data link layer problems with Fiber Distributed Data Interface (FDDI) media and the passing of tokens on the FDDI ring. Duplicate IP addresses are caused by network layer problems. See these sections for more information about causes of duplicate addresses:

• Duplicate MAC Addresses
• Duplicate IP Addresses

Identifying the Problem

Identify duplicate MAC and IP addresses by following the instructions in these sections:

• Finding Duplicate MAC Addresses
• Finding Duplicate IP Addresses

Solving the Problem

Identify the cause of the duplicate address (such as user error or a hardware problem) and fix the problem if possible.

Finding Duplicate MAC Addresses

To find out if duplicate MAC addresses are occurring, monitor your network using Status Watch.

Status Watch

The Status Watch FDDI Status tool identifies duplicate FDDI MAC addresses, and Status Watch reports when two or more MACs on the same ring have the same MAC address (a Duplicate Address condition).

Follow these steps

1. In the Status Watch Summary View window, determine if any FDDI Status conditions are reported. If there are, double-click the table cell value to display the Device List window.

Another approach is to examine only the devices that you know reside on your FDDI ring. In the Status Watch main window, red device icons indicate that a threshold has been exceeded.

2. Select a device.

• If you selected the device from the Device List window, the real-time report for that device appears in the Status Watch main window.

• If you selected the device from the main window, also select the FDDI Status tool to view the real-time report.

3. Determine if a Duplicate Address condition caused the FDDI Status tool to trigger a Critical or Warning status for that device.

In Status Watch, you can specify the status severity level to apply to a Duplicate Address condition.

Finding Duplicate IP Addresses


To find out if duplicate IP addresses are occurring, monitor your network using these applications:

• Address Tracker - To find duplicate IP addresses on 3Com devices and their attached networks

• LANsentry® Manager - To find duplicate IP addresses that are collected by probes gathering RMON2 SmartAgent® data from the Enterprise Communications Analysis Module (ECAM) downloaded on your network devices

Address Tracker

Use Address Tracker to determine when and where duplicate IP addresses occur.

Follow these steps:

1. From the Find Address menu, select Find Duplicate IP Addresses.

2. Click Find Now to start your search.

LANsentry Manager

Use the Duplicates table in LANsentry Manager to compile a list of all stations with duplicate IP addresses. This table is available only on probes that have downloaded RMON2 (ECAM) SmartAgent software.

Follow these steps:

1. From the LANsentry Manager Address Map menu, select Duplicates. Address Map data is displayed as a table.

2. To export the contents of the table, click Export to launch the Data Export dialog box.
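The idea behind a duplicates report can be sketched independently of any particular tool: group the observed IP-to-MAC bindings by IP address and flag every IP address that has been seen with more than one MAC address. The function name and the sample bindings are assumptions for illustration and do not reflect the LANsentry Manager export format.

```python
from collections import defaultdict


def find_duplicate_ips(bindings):
    """Return {ip: sorted MAC list} for IP addresses seen with more than one MAC.

    bindings: iterable of (ip_address, mac_address) pairs, for example taken
    from ARP tables or an exported address map.
    """
    macs_by_ip = defaultdict(set)
    for ip, mac in bindings:
        macs_by_ip[ip].add(mac.lower())
    return {ip: sorted(macs) for ip, macs in macs_by_ip.items() if len(macs) > 1}


# Hypothetical observations using documentation address ranges.
observed = [
    ("192.0.2.10", "00:00:5E:00:53:01"),
    ("192.0.2.11", "00:00:5E:00:53:03"),
    ("192.0.2.10", "00:00:5E:00:53:02"),
    ("192.0.2.11", "00:00:5e:00:53:03"),
]
duplicates = find_duplicate_ips(observed)
print(duplicates)
```

Swapping the roles of the two fields gives the corresponding check for one MAC address that appears with several IP addresses.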

Duplicate Addresses Reference

This section explains terms that are relevant to duplicate addresses and provides additional conceptual and problem analysis detail.


Duplicate MAC Addresses

Each device on your network has a unique MAC address. This address identifies a single device on the network, allowing packets to be delivered to correct destinations.

Packets are delivered to their destinations by means of the MAC-address-to-IP-address translation that the Address Resolution Protocol (ARP) provides. Therefore, if MAC addresses are duplicated on the network, ARP caches of routing devices contain erroneous destinations. In FDDI, devices monitor network traffic, searching for their own MAC address in each packet to determine whether to decode the packet. If MAC addresses are not unique, two stations cannot be distinguished from each other.

Duplicate MAC addresses can occur for the following reasons:

• Someone has manually configured a MAC address for a device instead of using the address that the vendor supplied or allowing it to be assigned dynamically, and this address is also assigned to a different device.

• In rare circumstances, loops in a bridged network can cause a MAC hardware problem or an address learning problem that creates a duplicate MAC address entry in the bridging address table.

• On DECnet Phase 4 networks, MAC addresses are set from the DECnet address. A duplicate NET address can cause a duplicate MAC address.

Duplicate IP Addresses

Because IP addresses are critical for transmission of packets on TCP/IP networks, resolve duplicate IP addresses immediately.

Duplicate IP addresses can occur when someone has configured an IP address that is identical to an IP address that is assigned to a different device. Address assignments, although possible for you to configure manually, are usually made using one of these protocols:

• Dynamic Host Configuration Protocol (DHCP) - Allows your network to dynamically assign IP addresses to nodes. With this protocol, a DHCP server temporarily assigns an IP address to a node, or you can statically configure addresses as needed.

• BOOTstrap protocol (BootP) - Allows you to statically assign IP addresses to nodes. This protocol is more efficient than RARP.

• Reverse ARP (RARP) - Allows you to statically assign IP addresses to nodes. However, because this protocol relies on the MAC address to identify the node, you cannot use it on networks that dynamically assign hardware addresses.


Ethernet Packet Loss

Use these sections to identify and correct Ethernet packet loss:

• Ethernet Packet Loss Overview
• Searching for Packet Loss

See Ethernet Packet Loss Reference for additional conceptual and problem analysis detail.

If your Ethernet network shows signs of congestion, it may be experiencing packet loss. When your network is congested, utilization is usually high, packets are discarded because buffers are full, and collision rates are up. Problems related to Collisions are often at the heart of packet loss.

Understanding the Problem

Collisions are normal in Ethernet networks. In many cases, Collision rates of 50 percent do not cause a large decrease in throughput. The Collision rate helps mark the upper limit on your network (the maximum percentage of collisions that your network can bear), which is usually around 70 percent. If Collisions increase above this upper limit, your network can become unreliable.

When the Collision rates increase, so do Excessive Collisions, which causes a delay in transmitting data. An increase in Collisions also indicates that network utilization and network errors, such as FCS Errors, are probably increasing.

The real packet problems to watch for, however, are undetected collisions that show up as Late Collisions.
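As a rough sketch of how the upper limit might be applied to counter data, the example below computes a collision rate and compares it with a configurable limit that defaults to the 70 percent figure mentioned above. Defining the rate as collisions per frame sent is an assumption, as are the function names; use whatever definition your monitoring tool reports.

```python
def collision_rate(collisions, frames_sent):
    """Collisions as a percentage of frames sent (assumed definition)."""
    if frames_sent <= 0:
        raise ValueError("frames_sent must be positive")
    return 100.0 * collisions / frames_sent


def assess_collisions(rate_percent, upper_limit=70.0):
    """Compare a collision rate with the network's upper limit."""
    if rate_percent > upper_limit:
        return "above upper limit"
    return "within upper limit"


rate = collision_rate(collisions=50, frames_sent=100)
print(rate, assess_collisions(rate))
```

Because the limit varies from network to network, it is a parameter rather than a constant.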

Identifying the Problem

To identify that your network's problem is related to packet loss, verify that frames are being dropped on your network by examining this packet loss data:

• Alignment Errors
• Collisions
• Excessive Collisions
• FCS Errors
• CRC Errors
• Late Collisions
• Receive Discards
• Too Long Errors
• Too Short Errors
• Transmit Discards

The process of identifying the problem is discussed in Searching for Packet Loss.

Solving the Problem

If you notice that packet loss data is consistently high, then your network is too congested. In this case, segment your network with the appropriate network device (such as a switch or router). If Collision data shows increases but your network's utilization is the same, then your network may have a physical problem, such as cabling that is too long. Other problems that packet loss data can indicate include:

• Faulty connectors or improper cabling
• Excessive numbers of repeaters between network devices
• Defective Ethernet transceivers or controllers

Possible solutions to these problems are explained in the procedures in Searching for Packet Loss.
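The two main cases described in this section can be written as a small decision function. This is only a sketch of the reasoning: the function name, the boolean inputs, and the order in which the cases are checked are assumptions, and how you decide that a counter is high or rising depends on your own baselines.

```python
def diagnose_packet_loss(loss_consistently_high, collisions_rising, utilization_rising):
    """Suggest a first direction for troubleshooting Ethernet packet loss."""
    if collisions_rising and not utilization_rising:
        # Collisions up while utilization is unchanged points at the physical layer.
        return "possible physical problem (for example, cabling that is too long)"
    if loss_consistently_high:
        return "congestion: segment the network with a switch or router"
    return "no action indicated"


print(diagnose_packet_loss(True, True, True))
print(diagnose_packet_loss(False, True, False))
```

The result is a starting point only; the procedures in Searching for Packet Loss narrow the cause further.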

Searching for Packet Loss

When you look for packet loss, use the following applications:

• Status Watch - For Ethernet and MIB-II data collection using SNMP polling
• LANsentry Manager Network Statistics Graph - For RMON data collection using an RMON probe
• Device View - On a per-device basis, you can evaluate statistics for any port on the device

Status Watch

Status Watch monitors:

• Alignment Errors
• Excessive Collisions
• FCS Errors
• Receive Discards
• Transmit Discards

Follow these steps:

1. Determine if the thresholds for the Alignment Errors tool and FCS Errors tool are being exceeded.

Table 16 identifies the problems that this data can indicate and your possible actions. For information about problems related to a nonstandard Ethernet implementation, see Nonstandard Ethernet Problems.

Table 16 Alignment Errors, FCS Errors, and CRC Errors Data

Possible Problem: Faulty cabling
Possible Action: Examine the cable and cable connections for breaks or damage.

Possible Problem: Network noise
Possible Action: Look for improper cabling, faulty cable, faulty network equipment, or cables that are too close to equipment that emits electromagnetic interference (lamps, for example).

Possible Problem: Faulty transceiver
Possible Action: Use an analyzer to identify the problematic transceiver. If necessary, replace the transceiver, network adapter, or station.

Possible Problem: Fault at the transmitting end station
Possible Action:
1. Locate the source of the errors by looking at the module and port statistics.
2. Verify the correct operation of the transceiver or adapter card of the device that is connected to the problem port.
3. If the card appears to be operating correctly, examine the cable and cable connections for breaks or damage.

Possible Problem: Station powering up or down
Possible Action: None required. Early implementations of Ethernet transceivers generate a significant amount of in-band noise when powering up; they frequently cause Alignment Errors and FCS Errors in an otherwise stable network. When powering up, some software drivers for Ethernet controllers also initiate Time Domain Reflectometry (TDR) tests to test the Ethernet media. Network monitors report TDR tests as Alignment Errors and FCS Errors.

Possible Problem: Faulty adapter
Possible Action: Replace the adapter.

2. Determine if the Excessive Collisions tool threshold is being exceeded. Table 17 identifies the problems that this data can indicate and your possible actions.

Table 17 Collisions and Excessive Collisions Data

Possible Problem: Busy network
Possible Action: Use a bridge, router, or switch to reconfigure your network into segments with fewer stations.

Possible Problem: Faulty device (adapter, switch, hub, and the like) that does not listen before broadcasting. This problem increases the incidence of all types of collisions.
Possible Action: Isolate each adapter to see if the problem stops.

Possible Problem: Network loop
Possible Action: Ensure that no redundant connections to the same station have both connections active simultaneously.


3. Determine if the thresholds for the Receive Discards and Transmit Discards tools are being exceeded.

If these errors are high in conjunction with the data that you learned in steps 1 and 2, then your network is overloaded. Segment your network.

LANsentry Manager Network Statistics Graph

Use the LANsentry Manager Network Statistics graph to view data for:

• Collisions
• Late Collisions
• Bandwidth Utilization
• CRC Errors
• Too Long Errors
• Too Short Errors

Follow these steps

1. Display a Network Statistics graph for the local Ethernet segment on which users have reported poor performance.

This graph shows the most recent trend in Collision rates. If you have set up a History sample, you can also look at the historical trend. If a number of segments are connected by repeaters, examine the graph for each Ethernet segment.

2. Analyze Utilization and Collision rates to determine whether collisions are caused by an overloaded segment or a faulty component.

• If Utilization rates are high - The collisions are probably caused by an overloaded segment. If you have added nodes or new applications to your network, consider reconfiguring the cabling system using bridges and routers to filter out remote collisions and to keep local traffic on one segment. This action should level the network load.

• If Utilization rates are stable and appear normal - The collisions are probably caused by faulty components. In this case, do the following:

  • If the network consists of repeaters - Compare the Network Statistics graphs for each segment connected to the repeater. Because repeaters repeat traffic across all connected segments (which makes many segments seem like one network), you should see similar levels of traffic on all segments. One segment that shows dissimilar levels of traffic and collisions may indicate faulty hardware. In this case, monitor several collisions to track the source station that is transmitting too soon after collisions, and repair the station. Packets that are transmitted too soon after collisions are unlikely to be valid. See Table 17 for more information about Collisions.

  • On other networks - Determine the segment cable length.

3 Examine the CRC Errors and Late Collisions which often indicate cabling or component problems

Table 16 identifies the problems that CRC Errors can indicate and your possible actions Table 18 identifies the problems that Late Collisions data can indicate and your possible actions

Table 18 Late Collisions Data

Possible Problem: Cabling problems

• Segment too long
• Failing cable
• Segment not grounded properly (noise)
• Improper termination
• Taps too close (10BASE-5 and 10BASE-2 only)
• Noisy cable

Possible Action: Correct the cabling problem by doing one or more of the following:

• Reduce the segment length
• Replace the cable
• Ground the cable
• Terminate the cable correctly
• Check the taps
• Check for cables too close to equipment that emits electromagnetic interference

Possible Problem: Component problems

• Deaf or partially deaf node
• Failing repeater, transceiver, or controller cards

Possible Action: Correct the component problem by doing one of the following:

• Trace the failing component and replace it
• Replace the NIC or the transceiver

4. Trace Too Short Errors and Too Long Errors to the sender.

These errors often indicate faulty routers or LAN drivers and transceiver problems. Table 19 identifies the problems that this data can indicate and your possible actions.

Possible Problem: A transceiver on your network is adding bits to the packets that are transmitted by the attached station.
Possible Action:
1. Use a network analyzer to identify the problematic transceiver.
2. If necessary, replace the transceiver, network adapter, or station.

Possible Problem: The jabber protection mechanism on a transceiver has failed; it can no longer protect the network from the jabbering produced by the attached station.
Possible Action: Replace the network card.

Possible Problem: Excessive noise on the cable.
Note: Some 10/100 Mbps cards that autodetect the network speed may connect to the network at the wrong speed, causing excessive noise.
Possible Action: Check for improper cabling, faulty cable, faulty network equipment, or cables too close to noisy electronic equipment (lamps, for example). If your network card autodetects the network speed and you have ruled out other problems, manually configure the network speed.

Possible Problem: Faulty routers (two different network types are connected and the router is not enforcing proper frame size restrictions).
Possible Action: Notify the manufacturer.

Possible Problem: Faulty LAN driver.
Possible Action: Replace the driver.

Possible Problem: A normal condition on a LinkSwitch® 1000, LinkSwitch® 3000, or CoreBuilder 5000 FastModule. If you use maximum-sized 1518-byte Ethernet frames, the device's VLT-enabled ports add a frame tag of 4 bytes, resulting in a misleading Too Long Frame error. These frames are passed successfully but will create the Too Long Frame error message.
Possible Action: If you want to eliminate the error message, reduce your Ethernet packet frames by 4 bytes.

Device View

Device View allows you to display a variety of port and device-level statistics relevant to Ethernet packet loss. Table 20 describes these statistics and their use in troubleshooting.

Table 20 Activity and Error Statistics in Device View

Statistics Group: Activity
Description: Displays the total network activity and errors on the selected port.
Use in Troubleshooting: This data shows readable packets, broadcast packets, Collisions, total errors, and runts, which cause Too Short Errors. You can interpret this data in the following ways:

• The presence of runts can often be caused by Collisions; however, if the values increase at specific times of the day, it may indicate you need to change the network topology to manage the traffic more efficiently (for example, with switches or routers).
• Runts can also be caused by a badly terminated coax cable.
• Large numbers of runts not associated with high levels of collisions can indicate a transmission problem (examine the cable).
• Particularly high numbers of Collisions compared to the total number of readable packets can point to a hardware problem (a bad adapter) or to a data loop.
• A high proportion of Broadcast Packets (>10%) on a heavily utilized network (>50% of available bandwidth) can point to an incorrectly configured bridge or router on the network.

Statistics Group: Errors
Description: Displays the number of frames with errors on the selected port.
Use in Troubleshooting: The significance of errors depends on accompanying errors and prevailing network conditions. See the following error data for more information:

• Alignment Errors: Table 16
• FCS Errors: Table 16
• Too Long Errors: Table 19
• Too Short Errors, or runts: Table 19
• Late Collisions: Table 18

To display Activity and Errors statistics for a device or port, follow these steps:

1. Select the required port or device.

2. From the shortcut menu, select Activity or Errors.

The statistics available depend on the type of port or device selected. See Table 20 for troubleshooting information.

Ethernet Packet Loss Reference

This section explains terms that are relevant to Ethernet packet loss and provides additional conceptual and problem analysis detail.

Alignment Errors

An Alignment Error indicates a received frame in which both of the following are true:

• The number of bits received is an uneven byte count (that is, not an integral multiple of 8).

• The frame has a Frame Check Sequence (FCS) error.

Alignment Errors often result from MAC layer packet formation problems, cabling problems that cause corrupted or lost data, and packets that pass through more than two cascaded multiport transceivers. See FCS Errors for more information about interpreting Alignment Errors.


Collisions

Collisions indicate that two devices detect that the network is idle and try to send packets at exactly the same time (within one round-trip delay). Because only one device can transmit at a time, both devices must stop sending and attempt to retransmit. Collisions are detected by the transmitting stations.

The retransmission algorithm helps to ensure that the packets do not retransmit at the same time. However, if the two devices retry at nearly the same time, packets can collide again; the process repeats until either the packets finally pass onto the network without collisions, or 16 consecutive collisions occur and the packets are discarded.
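The retransmission algorithm can be sketched in a few lines. The text above does not name it; the sketch below assumes the truncated binary exponential backoff that IEEE 802.3 specifies for CSMA/CD, in which a station waits a random number of slot times after each collision and gives up after the 16th. The function names and the `collides` callback are hypothetical and exist only for illustration.

```python
import random

MAX_ATTEMPTS = 16   # after 16 consecutive collisions the frame is discarded
BACKOFF_CAP = 10    # the random range stops doubling after the 10th collision


def backoff_slots(collision_count, rng=random):
    """Slot times to wait after the nth consecutive collision (n = 1..15)."""
    if not 1 <= collision_count < MAX_ATTEMPTS:
        raise ValueError("the frame is discarded at the 16th collision")
    exponent = min(collision_count, BACKOFF_CAP)
    return rng.randrange(2 ** exponent)  # uniform in 0 .. 2**exponent - 1


def try_to_send(collides, rng=random):
    """Simulate sending one frame.

    `collides(attempt)` is a hypothetical callback that returns True when
    the given attempt (1-based) collides. Returns the total number of slot
    times spent backing off, or None if the frame was discarded.
    """
    waited = 0
    for attempt in range(1, MAX_ATTEMPTS + 1):
        if not collides(attempt):
            return waited
        if attempt == MAX_ATTEMPTS:
            return None  # counted as an excessive collision
        waited += backoff_slots(attempt, rng)
    return None
```

Because each station draws its own random delay, two stations that collided once are unlikely to pick the same delay again, and the widening range makes repeated collisions progressively less likely.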

CRC Errors

A Cyclic Redundancy Check (CRC) Error is an RMON statistic that combines FCS Errors and Alignment Errors. These errors indicate that packets were received with:

• A bad FCS and an integral number of octets (FCS Errors)
• A bad FCS and a non-integral number of octets (Alignment Errors)

CRC Errors can cause an end station to freeze. If a large number of CRC Errors are attributed to a single station on the network, replace the station's network interface board. Typically, a CRC Error rate of more than 1 percent of network traffic is considered excessive.
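As a rough illustration of the 1 percent rule of thumb, the following sketch computes a CRC error rate from two counters and finds the station that accounts for the largest share of errors. The counter names and the per-station dictionary are assumptions made for the example; they are not LANsentry Manager fields.

```python
EXCESSIVE_CRC_RATE = 0.01  # "more than 1 percent of network traffic"


def crc_error_rate(crc_errors, total_packets):
    """Fraction of received packets that had CRC errors (0.0 with no traffic)."""
    if total_packets <= 0:
        return 0.0
    return crc_errors / total_packets


def is_excessive(crc_errors, total_packets, threshold=EXCESSIVE_CRC_RATE):
    """True when the CRC error rate is above the threshold."""
    return crc_error_rate(crc_errors, total_packets) > threshold


def worst_station(errors_by_station):
    """Return (station, share of all CRC errors), or None if there are none.

    `errors_by_station` is a hypothetical {station_name: crc_error_count} map.
    """
    total = sum(errors_by_station.values())
    if total == 0:
        return None
    station = max(errors_by_station, key=errors_by_station.get)
    return station, errors_by_station[station] / total
```

A station that holds most of the errors on a segment is the first candidate for a replacement network interface board.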

Excessive Collisions

Excessive Collisions indicate that 16 consecutive collisions have occurred, usually a sign that the network is becoming congested. For each excessive collision count (or after 16 consecutive collisions), a packet is dropped. If you know the normal rate of excessive collisions, then you can determine when the rate of packet loss is affecting your network's performance. See Knowing Your Network's Configuration for more information.

FCS Errors

Frame Check Sequence (FCS) Errors, a type of CRC, indicate that frames received by an interface are an integral number of octets long but do not pass the FCS check. The FCS is a mathematical way to ensure that all the frame's bits are correct without having the system examine each bit and compare it to the original. Packets with Alignment Errors also generate FCS Errors.

Both Alignment Errors and FCS Errors can be caused by equipment powering up or down or by interference (noise) on unshielded twisted-pair (10BASE-T) segments. In a network that complies with the Ethernet standard, FCS or Alignment Errors indicate bit errors during a transmission or reception. A very low rate is acceptable. Although Ethernet allows a 1 in 10^8 bit error rate, typical Ethernet performance is 1 in 10^12 or better.
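To relate a bit error rate to the FCS errors you would expect to see, the sketch below converts it into the probability that a single frame contains at least one bad bit. It assumes independent bit errors and a 1518-octet frame (preamble excluded); both are simplifications chosen for the example.

```python
import math


def frame_error_probability(bit_error_rate, frame_octets=1518):
    """Probability that a frame of the given size has at least one bit error.

    Computes 1 - (1 - p)**n with log1p/expm1 so that very small bit error
    rates do not lose precision.
    """
    bits = frame_octets * 8
    return -math.expm1(bits * math.log1p(-bit_error_rate))
```

Under these assumptions, a bit error rate of 1 in 10^8 corrupts roughly one maximum-size frame in 8,000, while 1 in 10^12 corrupts roughly one in 80 million.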

Late Collisions

Late Collisions indicate that two devices have transmitted at the same time, but cabling errors (most commonly, excessive network segment length or repeaters between devices) prevent either transmitting device from detecting a collision. Neither device detects a collision because the time to propagate the signal from one end of the network to the other is longer than the time to put the entire packet on the network. As a result, neither of the devices that cause the late collision senses the other's transmission until the entire packet is on the network.

Although late collisions occur for small packets, the transmitter cannot detect them. As a result, a network suffering measurable Late Collisions for large packets is losing small packets as well.

Nonstandard Ethernet Problems

Table 21 lists the symptoms that typically occur if a system violates the Ethernet standard.

Table 21 Symptoms of Common Ethernet Network Problems

Symptoms: FCS Errors and Alignment Errors increase significantly.
Problem: Network cabling is too long.
Notes: If you use a promiscuous network monitor, the number of Late Collisions reported by stations should correlate with the FCS and Alignment Errors reported by the monitor.

Symptoms: FCS and Alignment Errors increase proportionally with interference (sometimes referred to as noise hits).
Problem: Network segment is noisy.
Notes: Typically observed on a 10BASE-T network segment in a noisy environment. If you use multiple promiscuous monitors, the FCS and Alignment Errors among the monitors will not correlate. If the monitor can track runts, also called Too Short Errors, the number of runt packets should be significantly higher than normal.

Symptoms: FCS and Alignment Errors are much higher than normal.
Problem: Networks do not conform to the access scheme of Carrier Sense Multiple Access with Collision Detect (CSMA/CD).
Notes: Occurs when some implementations of Ethernet in the segment are not entirely compatible with IEEE 802.3 repeaters. Collision fragments linger on the network long enough to collide with retry packets at the minimum interpacket gap (IPG). The IPG is smaller on one side of the repeated network, causing a lost packet. Ethernet controllers cannot receive packets that are separated by 4.7 µs or less. Some controllers cannot sustain receptions of packets separated by as much as 9.6 µs. If runt packets are received one after another and are followed by a collision fragment, Ethernet controllers that cannot sustain reception will lose packets.

Receive Discards

Receive Discards indicate that received packets could not be delivered to a higher-layer protocol because of congestion or packet errors.

Too Long Errors

A Too Long Error indicates that a packet is longer than 1518 octets (including FCS octets) but otherwise well formed. Too Long Errors are often caused by a bad transceiver, a malfunction of the jabber protection mechanism on a transceiver, or excessive noise on the cable.

Too Short Errors

A Too Short Error, also called a runt, indicates that a packet is fewer than 64 octets long (including FCS octets) but otherwise well formed.
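The two length limits can be expressed as a small classifier. This is only a sketch of the definitions above; the `tagged` flag is an assumption added to show why the 4-byte frame tag mentioned in Table 19 produces a misleading Too Long error on an otherwise valid maximum-size frame.

```python
MIN_FRAME_OCTETS = 64     # shorter frames are runts (Too Short Errors)
MAX_FRAME_OCTETS = 1518   # longer frames are Too Long Errors
FRAME_TAG_OCTETS = 4      # tag size given in Table 19 for VLT-enabled ports


def classify_frame_length(octets, tagged=False):
    """Classify a well-formed frame by its length, FCS octets included.

    With tagged=True the upper limit is raised by the 4-byte tag, which is
    what a tag-aware counter would need to do to avoid the false error.
    """
    limit = MAX_FRAME_OCTETS + (FRAME_TAG_OCTETS if tagged else 0)
    if octets < MIN_FRAME_OCTETS:
        return "too short"
    if octets > limit:
        return "too long"
    return "ok"
```

For example, a 1518-octet frame that picks up a 4-byte tag becomes 1522 octets: `classify_frame_length(1522)` reports it as too long, while `classify_frame_length(1522, tagged=True)` accepts it.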

Transmit Discards

Transmit Discards indicate that packets were not transmitted because of network congestion.

FDDI Ring Errors

Use these sections to identify and correct FDDI ring errors:

• FDDI Ring Errors Overview
• Identifying Ring Errors

Fiber Distributed Data Interface (FDDI) often corrects its own problems. However, because FDDI cannot correct all errors (especially those related to hardware problems), you should monitor FDDI errors.

Understanding the Problem

FDDI ring errors that you should monitor include:

• Elasticity Buffer Error Condition
• Frame Error Condition
• Frames Not Copied Condition
• Link Error Condition

Identifying the Problem

First, determine the type of FDDI ring errors and where they are occurring. Similar to the way you identify other FDDI problems, identify the upstream and downstream neighbors of the devices that you are monitoring.

Several types of network errors can cause FDDI performance problems. For example, problems with cables or physical connections may result in a link or frame error. Elasticity buffer (EB) errors can also lead to link and frame errors.

FDDI deals with port-related errors as follows:

• The variable PORTlerAlarm is the link error rate (LER) value at which a link connection generates an alarm. When the LER is greater than the alarm setting, Station Management (SMT) sends a Status Report Frame (SRF) to notify you that there is a problem with a port.

The PORTlerAlarm threshold is set lower than the PORTlerCutoff threshold so that you are notified of a problem before the port is actually removed from the ring.

• When link errors reach the threshold defined by the variable PORTlerCutoff, SMT breaks the connection, disabling the PHY that detected the problem. A Link Error Condition is also generated.

FDDI deals with MAC-related errors as follows:

• When MAC frame errors reach a certain threshold, a Frame Error Condition is generated. Because the actual error can be further upstream than the immediate connection, the connection remains intact.

• For a large network, the worst-case MACFrameErrorRatio is less than 0.1 percent. However, during network configuration, frame error ratios can reach 50 percent for short periods. When you detect a sustained frame error ratio of more than 0.1 percent, a problem exists between the station that is reporting the condition and the nearest upstream MAC.
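The two-threshold behavior for ports and the "sustained" test for MAC frame errors can be sketched as follows. Error rates are treated as plain fractions, and the idea that "sustained" means several consecutive samples above the limit is an assumption made for this example; the text does not define it, and real SMT implementations encode these variables differently.

```python
def port_action(ler, ler_alarm, ler_cutoff):
    """Decide what happens to a port for a measured link error rate (LER).

    All three values are error rates as fractions. The alarm threshold must
    be the lower one so that the alarm fires before the port is cut off.
    """
    if ler_alarm >= ler_cutoff:
        raise ValueError("the alarm threshold must be below the cutoff")
    if ler >= ler_cutoff:
        return "cutoff"   # connection broken, Link Error Condition raised
    if ler > ler_alarm:
        return "alarm"    # status report sent, port stays on the ring
    return "ok"


def sustained_frame_errors(ratios, threshold=0.001, min_samples=3):
    """True if the frame error ratio stays above 0.1 percent.

    `ratios` is a sequence of successive samples; `min_samples` is a
    hypothetical definition of "sustained" (consecutive samples over the
    threshold), so short spikes during configuration are ignored.
    """
    run = 0
    for ratio in ratios:
        run = run + 1 if ratio > threshold else 0
        if run >= min_samples:
            return True
    return False
```

A brief spike such as `[0.5, 0.0005, 0.0004]` is ignored, while `[0.002, 0.003, 0.002]` is flagged as a sustained problem.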

Solving the Problem

To solve problems related to FDDI errors, fix the hardware, cabling, or congestion problem.

Identifying Ring Errors


Use Status Watch to monitor your FDDI devices for Warning or Critical alerts.

Status Watch

Use Status Watch to identify FDDI ring errors.

Follow these steps:

1. Monitor the FDDI Status tool for the currently selected device.

2. Determine whether Status Watch is reporting Elasticity Buffer Errors or a high percentage of Frame Errors, Frames Not Copied, or Link Error Rates for the currently selected device.

FDDI Ring Errors Reference

This section provides additional conceptual and problem analysis detail.

Elasticity Buffer Error Condition

The Elasticity Buffer Error condition occurs when a port's elasticity buffer overflows or underflows. This condition usually indicates that a port's hardware is not operating within the tolerances that the FDDI standard specifies. Look for the problem in the hardware of either the port that is reporting the condition or of the immediately adjacent port.

Frame Error Condition

The Frame Error condition occurs when the percentage of frames that contain errors exceeds a preset threshold. In the situation when a device is an uplink to FDDI (that is, a device is transmitting onto FDDI), this type of condition indicates that the ring is saturated. The ring is out of buffer space, and packets are being dropped from the device's backbone port.

The problem indicated by the frame errors is usually located between the MAC that reports the condition and its upstream neighbor. Because many physical connections can lie along this path, the MACFrameErrorRatio variable identifies only the two MACs between which the problem is occurring.

Frames Not Copied Condition

The Frames Not Copied condition occurs when the percentage of frames that are dropped because of insufficient buffer space exceeds a preset threshold. This condition indicates that the station is congested and is unable to process frames as quickly as they arrive. To help eliminate congestion:

• Add more capacity to the station
• Reconfigure your network so that end stations that communicate heavily with one another are on the same bridge or switch
• Filter out certain traffic

Link Error Condition

The Link Error condition occurs when a port detects link errors at a rate that exceeds a preset threshold. When the Link Error threshold is exceeded, the station removes itself from the ring and tries to reinsert itself on the ring. This action creates a MAC Neighbor Change Event (which also occurs if a ring wraps).

Link errors may indicate an FDDI PHY hardware problem (such as a faulty transmitter) or a faulty cable or connector. Look for the problem in the portion of the network between the port that is reporting the condition and the first upstream transmitter.

MAC Neighbor Change Event

The MAC Neighbor Change event occurs when a MAC's upstream or downstream neighbor changes.

This event indicates either:

• A network reconfiguration
• Another station that is leaving or joining the ring

Network File Server Timeouts

Use these sections to identify and correct timeouts on network file servers:

• Network File Server Timeout Overview
• Looking for Obvious Errors
• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

See Network File Server Timeouts Reference for additional conceptual and problem analysis detail.

A network file server can time out if your network gets congested or if your server is having problems. Users might have problems downloading data from or to the server, or copying files from or to the server. To help you to understand the troubleshooting process for this type of problem, an EXAMPLE throughout this section follows the symptoms, analysis, and resolution of a typical file server timeout problem.


Understanding the Problem

When users log in, their stations make network file server calls either to determine quotas (if this feature has been enabled) or to mount user home directories. The network file server timeout messages, even when spread across multiple nodes, indicate a problem either with the network or with a server.

EXAMPLE: UNIX users notice that it takes a long time - over 30 seconds in some cases - to log in to any machine. Some machines report network file server timeout messages, but the messages have no obvious pattern and are infrequent. You begin to get a sense of the problem.

Identifying the Problem

First, rule out the obvious causes. Ask these questions:

• Can you access the network file server with Telnet?
• Have any alarms been triggered?
• Are there any new errors?

The process of identifying the problem is developed in Looking for Obvious Errors.

Solving the Problem

To determine the cause, reproduce the fault while you monitor the network. After you know the cause, you can fix the problem.

The solutions to the network file server timeout are identified in these sections:

• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

Looking for Obvious Errors

To look for obvious errors, use these applications:

• Ping and Telnet - To check connectivity to the network file server nodes
• LANsentry Manager Alarms View - To search for triggered alarms
• LANsentry Manager Statistics View - To look for errors
• LANsentry Manager History View - To identify trends


Ping and Telnet

Determine whether you can contact network file server nodes using Ping and Telnet. If the response is extremely slow, then a problem may exist with the connections to the nodes. No delay indicates that the connections are normal, implying that the delay is occurring elsewhere. In this case, use LANsentry® Manager tools to determine whether packets are being lost or ignored.
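A scripted version of the Telnet check can time how long a TCP connection to the server takes. The sketch below is one way to do this with the Python standard library; the host name in the usage note, the default port 23 (Telnet), and the one-second "slow" threshold are assumptions for illustration.

```python
import socket
import time


def tcp_connect_time(host, port=23, timeout=5.0):
    """Seconds taken to open a TCP connection, or None if it failed."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:  # refused, timed out, unreachable, or name not found
        return None


def describe(elapsed, slow_after=1.0):
    """Turn a connect time into a rough verdict.

    `slow_after` is a hypothetical threshold in seconds; pick a value that
    matches what is normal on your own network.
    """
    if elapsed is None:
        return "unreachable"
    return "slow" if elapsed > slow_after else "normal"
```

For example, `describe(tcp_connect_time("fileserver"))` (with a server name of your own) returns "normal" when the connection itself is healthy, which points the investigation at lost or ignored packets instead.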

LANsentry Manager Alarms View

Using the LANsentry Manager Alarms View, you can determine if any configured alarms have been triggered.

Search the Alarms View to see if any MAC events have been logged.

EXAMPLE: MAC events have not been logged for the network on which the UNIX users are attached.

Even though no alarms have occurred, errors may exist. For example, a lower rate of background errors may exist just below the alarm threshold. Because they are based on maximum and minimum values, RMON alarms may miss constant, periodic, or low amounts of errors.

LANsentry Manager Statistics View

Using the LANsentry Manager Statistics View, you can display a multisegment graph of utilization and error statistics.

Follow these steps:

1. Set up a graph that shows utilization and errors on all your major segments.

2. Determine whether any segments are particularly busy or error prone.

EXAMPLE: You notice that one segment of the UNIX network, HUB3, is reporting Too Long Errors and FCS Errors roughly every second sample. While the amount is not higher than normal, it is currently higher than on any other segment.

LANsentry Manager History View

Using the LANsentry Manager History View, you can display a rolling history table to determine if the errors that you are seeing are new. For example, if you have a history table that runs for 30-minute samples over two days, you can compare the most recent sample to a previous sample, looking for new errors. If your probe has the resources, use a much finer resolution sample stored for a shorter time (every 30 seconds for 2 hours) to more easily spot recent errors.
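The trade-off between the two sampling schemes is mostly a matter of how many buckets the probe has to store. The short sketch below works out the bucket counts for the two examples in the text and shows one way to compare two samples for new errors; the counter names are hypothetical.

```python
def samples_needed(interval_seconds, span_seconds):
    """Number of history buckets needed to cover a time span."""
    return span_seconds // interval_seconds


def new_error_types(previous, latest):
    """Error counters that were zero in the previous sample but not now.

    Both arguments are hypothetical {counter_name: count} dictionaries for
    a single history sample.
    """
    return sorted(
        name for name, count in latest.items()
        if count > 0 and previous.get(name, 0) == 0
    )


coarse = samples_needed(30 * 60, 2 * 24 * 60 * 60)  # 30-minute samples, 2 days
fine = samples_needed(30, 2 * 60 * 60)              # 30-second samples, 2 hours
```

Here `coarse` is 96 buckets and `fine` is 240, so the finer history costs the probe about two and a half times the storage while covering a much shorter window.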

EXAMPLE: You see that the history table shows that no error rates remained constant throughout the day. However, errors that did occur were on the device HUB3 and were Too Long Errors and FCS Errors.

Reproducing the Fault While Monitoring the Network

Although the RMON View in LANsentry Manager can show error rates and help you to identify the location of the problem, it may not provide enough data to solve the problem. To determine the cause of the problem, reproduce it while you monitor the network by using these applications:

• LANsentry Manager Top-N Graph - To locate a quiet node to use for reproducing the fault

• LANsentry Manager Packet Capture - To capture packets from the hub to which the quiet node is attached

• LANsentry Manager Packet Decode - To analyze the packets to assess network file server traffic and delays

• Address Tracker - To find the location of the problem nodes

EXAMPLE: Using LANsentry Manager, you find a hub on the network with a higher than normal error rate. However, the error rate does not seem high enough to cause login delays of 60 seconds or more.

LANsentry Manager Top-N Graph

Using the Top-N graph in the LANsentry Manager main window, locate a quiet node that has been showing the same problem. Choose a quiet node so that you do not receive excessive traffic when you try to isolate the problem.

EXAMPLE: You see that the node Monolith, which has the same Network File System (NFS) mounts as the other nodes on the network, is quiet. You decide to use this node for reproducing the fault. See Network File System (NFS) Protocol for more information about NFS.

LANsentry Manager Packet Capture

Using the LANsentry Manager Packet Capture application, capture packets from the network using predefined patterns and start-and-stop conditions.

Follow these steps:

1. Set up a capture buffer on a probe that is connected to the same hub as the quiet node. Until you know more about the problem, set a very general filter.

EXAMPLE: You select a MAC-layer filter and set a conversation filter to capture all packets to and from Monolith.

2. Telnet into and log out from the quiet node. Then reset the capture buffer. Repeat this procedure until you see the problem reflected in your captured data. To keep the buffer information clean, reset the buffer each time that you repeat the procedure.

3. When you see the delay, note the rough value of the packet count on the LANsentry Manager packet buffer.

By noting the packet count at which you think the delay has occurred, you can narrow the problem to within about 20 packets in the buffer. If you have used an extremely quiet node, you may even identify the exact packet.

LANsentry Manager Packet Decode

The LANsentry Manager Packet Decode application decodes all major protocols and displays the packet contents at three levels of detail: summary information, header information, and actual packet content.

Follow these steps:

1. Open the buffer in the Packet Decode application and locate the number of the packet at which the delay occurred.

2. Select the packet and launch a MAC-layer conversation filter. In the filter display, look for a gap in the conversation (that is, where the node sent a request and then resent it after approximately the same interval as the delay you experienced when recreating the problem).

3. Repeat the test to determine if the result concentrates on one node or if it appears on other nodes.

EXAMPLE: On the quiet node that you selected, the delay is obvious. You see an NFS request going out to a node and a repeat of the request 30 seconds later. During that time, the node did not respond. You now know that the delay occurred because nodes were not seeing responses for NFS requests. When you repeat the test on other nodes, you find that the delay is happening with more than one destination node.
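Looking for the gap by eye works for a short capture, but the same search can be sketched in code. The example below assumes that the decoded packets have already been reduced to `(timestamp, kind, xid)` tuples, where `xid` is the RPC transaction ID that a retransmitted request reuses; that tuple format and the function name are inventions for this illustration, not a LANsentry Manager export format.

```python
def find_retransmissions(packets, min_gap=1.0):
    """Find requests that were resent before any reply arrived.

    `packets` is an iterable of (timestamp_seconds, kind, xid) tuples with
    kind equal to "request" or "reply". Returns a list of
    (xid, time_of_first_request, gap_seconds) for every resend that came at
    least `min_gap` seconds after the unanswered request.
    """
    pending = {}  # xid -> time of the most recent unanswered request
    gaps = []
    for timestamp, kind, xid in sorted(packets):
        if kind == "reply":
            pending.pop(xid, None)
        elif kind == "request":
            if xid in pending and timestamp - pending[xid] >= min_gap:
                gaps.append((xid, pending[xid], timestamp - pending[xid]))
            pending[xid] = timestamp
    return gaps
```

Run against a capture like the one in the EXAMPLE, a request at 1.0 s that is repeated at 31.0 s with no reply in between is reported with a 30-second gap.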


Address Tracker

Use Address Tracker, which polls managed devices, to determine the hubs to which the problem nodes are attached. If the problem end stations are located on unmanaged devices, then you can at least narrow the problem to those unmanaged devices.

EXAMPLE: Although your network does not have managed hubs that Transcend® NCS management software can poll, it does have managed switches. When it polls the switches, Address Tracker displays the switch ports on which addresses were last seen. This information indicates the hub (but not the hub port) on which the device is located.

LANsentry Manager Packet Decode

After you know the location of the hub that has the problem node, monitor the problem from the hub using LANsentry Manager Packet Decode.

Follow these steps:

1. To capture packets from one of the nodes on the hub, set up another capture buffer and repeat the exercise that is described in LANsentry Manager Packet Capture. Because a delay may occur on a different node, use two capture buffers without stopping the first one.

Note the rough packet count where the delay appears.

2. Display a conversation filter of the packet where the delay appears and look for the gap in the conversation.

EXAMPLE: You hope that the nodes are on the same hub. You find that all the nodes are on HUB3. This result indicates that FCS Errors may be causing the timeouts. However, because the errors occur at a low rate, you decide to verify this diagnosis. You monitor the problem from the hub, logging in and out many times, and the delay eventually occurs. This time, the delay shows that the node's reply had an FCS Error even though the node received the request. The switch would not have transmitted this packet, causing a timeout on the NFS protocol. The retry time is presumably 30 seconds. During this test, you see the problem occurring on another node.

Correcting the Fault


Without a managed hub, you may find it very difficult to discover network file server timeout errors. To find the problematic node, you must either systematically isolate nodes by monitoring each node for a prolonged period or temporarily insert a managed hub.

EXAMPLE: You notice that the captured error packet failed FCS because it was corrupted by a regular pattern during transmission. A possible reason for this occurrence is a Jabbering node. This explanation makes sense because FCS/Jabber frames increased linearly when you were monitoring the live network.

Network File Server Timeouts Reference

This section explains terms that are relevant to network file server timeouts and provides additional conceptual and problem analysis detail.

Jabbering

A condition in which a node transmits illegal-length packets and is possibly not operating within carrier specifications. In effect, another node has written bad data over a valid packet. This bad data is often interpreted as a repeated sequence of data.

Network File System (NFS) Protocol

A distributed file system protocol developed by Sun Microsystems that allows a computer system to access files over a network as if they were on its local disks. This protocol has been incorporated into products by more than 200 companies. It is now a de facto Internet standard.

NFS is one protocol in the NFS suite of protocols, which includes NFS, RPC, XDR (External Data Representation), and others. These protocols are part of a larger architecture that Sun Microsystems refers to as Open Network Computing (ONC). ONC is a distributed applications architecture designed by Sun and currently controlled by a consortium led by Sun.

Troubleshooting TCP/IP - Detailed Steps

This article shows how to troubleshoot TCP/IP connectivity between computers on a Windows network. If you haven't already done so, disable XP's Internet Connection Firewall on all local area network connections and remove all firewall programs on the network. Improperly configured firewalls are the most common cause of TCP/IP problems.


Open a Command Prompt Window

For many of these steps, you'll be typing at the command prompt. To open a command prompt window in Windows 2000 or XP, click Start | Run, type cmd in the box, and click OK. To open a command prompt window in Windows 95, 98, or Me, click Start | Run, type command in the box, and click OK. Type one command per line, and press Enter after each one to execute it. To close the command prompt window, use the exit command.

Determine the TCP/IP Settings

Determine the TCP/IP settings of each computer on the local area network. In XP, open the Network Connections folder, right click the LAN connection, and click Status | Support | Details. For example, here are the Status and Details views for the LAN connection on an Internet Connection Sharing host.

In Windows 95/98/Me, click Start | Run, type winipcfg in the box, and click OK. Select the LAN adapter from the menu, and click More Info. Here's the winipcfg view for an ICS client running Windows Me.

You can also see the TCP/IP settings from the command prompt. This is especially convenient if a computer has more than one network adapter. Use the ipconfig /all command, which is available in all versions except Windows 95. The output from this command can be long, so it's best to write it to a file. Specify the file name in the command this way:

ipconfig /all >ipconfig.txt
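Once the output is in a file, it can also be read by a script. The following sketch parses the dot-leader layout that ipconfig /all typically prints into a dictionary per adapter. The sample text is illustrative and not captured from a real machine, the exact labels vary between Windows versions, and the function names are hypothetical.

```python
import re
import subprocess

# A setting line is indented and uses dot leaders before the colon,
# for example: "        IP Address. . . . . . . . . . . . : 192.168.0.1"
SETTING = re.compile(r"^\s+(.+?)[\s.]*:\s*(.*)$")


def parse_ipconfig(text):
    """Return {section_name: {setting: value}} from ipconfig /all output."""
    adapters = {}
    current = adapters.setdefault("(global)", {})
    for line in text.splitlines():
        stripped = line.rstrip()
        # Unindented lines ending in a colon start a new adapter section.
        if stripped and not stripped[0].isspace() and stripped.endswith(":"):
            current = adapters.setdefault(stripped[:-1], {})
            continue
        match = SETTING.match(stripped)
        if match:
            current[match.group(1).strip()] = match.group(2).strip()
    return adapters


def save_ipconfig(path="ipconfig.txt"):
    """Run ipconfig /all (Windows only), save the output, and parse it."""
    result = subprocess.run(["ipconfig", "/all"], capture_output=True, text=True)
    with open(path, "w") as handle:
        handle.write(result.stdout)
    return parse_ipconfig(result.stdout)


SAMPLE = """\
Windows IP Configuration

        Host Name . . . . . . . . . . . . : winxp

Ethernet adapter Local Area Connection:

        IP Address. . . . . . . . . . . . : 192.168.0.1
        Subnet Mask . . . . . . . . . . . : 255.255.255.0
        Default Gateway . . . . . . . . . :
"""
```

Calling `parse_ipconfig(SAMPLE)` groups the settings under "(global)" and "Ethernet adapter Local Area Connection", which makes it easy to compare the same setting across several computers.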

Description of TCP/IP Settings

Here are the TCP/IP settings that are used in network troubleshooting:

• IP Address - Unique address assigned to a network adapter. A computer with multiple network adapters has an IP address for each one, and each one must be in a different subnet.

• Subnet Mask - Used in conjunction with the IP address to determine which subnet an adapter belongs to. At the simplest level, communication is only possible between two network adapters when they're in the same subnet.

• Default Gateway - IP address of a computer or router, on one of this computer's local area networks, that knows how to communicate with subnets not present on this computer. For an Internet connection, the default gateway is a router belonging to your Internet service provider, and all access to sites on the Internet goes through it. For an ICS client, the default gateway is the ICS host. If you use a hardware router, it serves as the default gateway.

• DHCP Server - If an adapter is configured to obtain an IP address automatically, this is the address of the server that provides it. It could be your ISP, an ICS host, or a hardware router.

• DNS Servers - IP address of one or more Domain Name Server computers. DNS servers translate Internet names to their IP addresses (like 63.146.109.227).
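The relationship between IP address, subnet mask, and default gateway can be checked with Python's standard ipaddress module. This is a minimal sketch; the addresses in the usage note are examples, and the function names are made up for illustration.

```python
import ipaddress


def same_subnet(ip_a, ip_b, mask):
    """True if both addresses fall in the same subnet under the given mask."""
    net_a = ipaddress.ip_interface(f"{ip_a}/{mask}").network
    net_b = ipaddress.ip_interface(f"{ip_b}/{mask}").network
    return net_a == net_b


def needs_gateway(local_ip, mask, destination):
    """True if traffic to the destination must go through the default gateway."""
    return not same_subnet(local_ip, destination, mask)
```

For example, 192.168.0.1 and 192.168.0.20 share a subnet under the mask 255.255.255.0, while 192.168.0.1 and 192.168.1.20 do not, so traffic between the second pair has to pass through a gateway.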

Subnets

See our article on subnets for a brief description of how they work. For more details, see this Microsoft Knowledge Base article.

If two computers are supposed to be on the same subnet but aren't, something is wrong with the network hardware or software configuration. This is most likely to happen when one of them receives an IP address of 169.254.x.x, which indicates that:

• It's configured to obtain an IP address automatically
• It couldn't find a DHCP server on the network to make the assignment
• Windows assigned it an Automatic Private IP Address
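A quick way to spot this condition in a list of addresses is to test for the 169.254.0.0/16 range. The sketch below uses the standard ipaddress module; the function name is hypothetical.

```python
import ipaddress

APIPA_RANGE = ipaddress.ip_network("169.254.0.0/16")


def is_apipa(address):
    """True if the address is an Automatic Private IP Address (169.254.x.x)."""
    return ipaddress.ip_address(address) in APIPA_RANGE
```

An adapter for which `is_apipa` returns True was set to obtain its address automatically and did not get an answer from a DHCP server.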

See our article on Specific Networking Problems and Their Solutions for more information.

Pinging

The ping command is the basic tool for testing TCPIP connectivity It sends a special packet (called ICMP Echo) to a particular IP address and looks for a reply If everything is working right the reply comes back If not the ping times out in a few seconds By default the ping command repeats the process four times Herersquos an example of an ICS client computer pinging a Windows XP Home Edition ICS host using the hostrsquos IP address and its computer name

When ping fails, you'll see one of these error messages:

• Request timed out - The IP address is valid, but there's no reply from it. If the IP address is on a local area network, the most likely cause is a firewall program blocking the ping.

• Unknown host <name> or Ping request could not find host <name> - The computer name doesn't exist on the local area network. Make sure that NetBIOS over TCP/IP is enabled.

• Destination host unreachable - The IP address isn't on a local area network, and the default gateway can't access it. Either there's no default gateway, its address is wrong, or it isn't functioning.
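The messages above can be matched mechanically when you save ping output to a file or capture it in a script. This is only a sketch: the function name explain_ping_failure is hypothetical, and it assumes the output contains the message text exactly as listed above, which may differ between operating systems and versions.

```python
def explain_ping_failure(output: str) -> str:
    """Map captured ping output to the likely cause described in the text."""
    text = output.lower()
    if "request timed out" in text:
        return "Valid IP address but no reply; on a LAN, suspect a firewall."
    if "unknown host" in text or "could not find host" in text:
        return "Name not found on the LAN; check NetBIOS over TCP/IP."
    if "destination host unreachable" in text:
        return "Address is off the LAN and the default gateway cannot reach it."
    return "No known failure message found."

if __name__ == "__main__":
    print(explain_ping_failure("Request timed out."))
    print(explain_ping_failure("Ping request could not find host win98."))
```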

Pinging the Local Area Network

Here is a series of ping commands to use in finding where a problem occurs on a local area network. Run them in the order shown, and don't go on to the next command until all of the previous commands work properly. In this example:

• The computer being tested is named Winxp, with IP address 192.168.1.101.
• There's another computer on the network named Win98, with IP address 192.168.1.123.

Substitute the appropriate IP addresses and computer names for your network.

Command | Target | What Ping Failure Indicates
ping 127.0.0.1 | Loopback address | Corrupted TCP/IP installation
ping localhost | Loopback name | Corrupted TCP/IP installation
ping 192.168.1.101 | This computer's IP address | Corrupted TCP/IP installation
ping winxp | This computer's name | Corrupted TCP/IP installation
ping 192.168.1.123 | Another computer's IP address | Bad hardware or NIC driver
ping win98 | Another computer's name | NetBIOS name resolution failure

To fix a corrupted TCP/IP installation on Windows XP, follow the steps in this Microsoft Knowledge Base article. For Windows 95/98/Me, un-install the TCP/IP protocol in Control Panel | Network, reboot, and re-install it. If that doesn't fix it, use this procedure on Windows 95 or 98.
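The "run in order, stop at the first failure" procedure for the local area network can be sketched as a small script. Everything named here is an assumption for illustration: LAN_STEPS mirrors the example table, system_ping is a hypothetical wrapper around the operating system's ping command, and relying on its exit code is a simplification, since some ping implementations report success even when the reply is an error message.

```python
import platform
import subprocess

# (target, what a failure indicates), in the order the text says to run them.
LAN_STEPS = [
    ("127.0.0.1", "Corrupted TCP/IP installation"),
    ("localhost", "Corrupted TCP/IP installation"),
    ("192.168.1.101", "Corrupted TCP/IP installation"),
    ("winxp", "Corrupted TCP/IP installation"),
    ("192.168.1.123", "Bad hardware or NIC driver"),
    ("win98", "NetBIOS name resolution failure"),
]

def system_ping(target: str) -> bool:
    """Send one echo request with the OS ping command; True on exit code 0."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    result = subprocess.run(["ping", count_flag, "1", target], capture_output=True)
    return result.returncode == 0

def first_failure(steps, ping=system_ping):
    """Run the steps in order and return the first (target, meaning) that fails."""
    for target, meaning in steps:
        if not ping(target):
            return target, meaning
    return None  # every ping succeeded
```

Calling first_failure(LAN_STEPS) stops at the first target that does not answer, which matches the advice not to move on until all previous commands work.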

Pinging the Internet

You can also use ping to find a problem with Internet access. Run these commands in the order shown, and don't go on to the next command until all of the previous commands work properly. Use the Default Gateway and DNS Server addresses that you got from the winipcfg or ipconfig /all command.

Command | Target | What Ping Failure Indicates
ping w.x.y.z | Default Gateway | Default Gateway down
ping w.x.y.z | DNS Server | DNS Server down
ping w.x.y.z | Web site IP address | Internet service provider or web site down
ping www.something.com | Web site name | DNS Server down or web site down

REFERENCE

www.practicallynetwork.com

www.computerhope.com

www.wikipedia.org

THANK YOU

Figure 9 Thru Ring

Wrapped ring

By monitoring the Peer Wrap Condition, you can see when the Configuration State changes. In a wrapped ring (Figure 10), two stations on the LAN are in a wrapped Configuration State. This condition may or may not affect the connectivity of certain stations. Although operational, your network may have a cabling problem or a problem with a link.

Figure 10 Wrapped LAN

Segmented ring

In a segmented ring (Figure 11), more than two stations are wrapped on the trunk ring. Although this mode of operation is a valid FDDI LAN configuration, your LAN is probably experiencing a degraded or degrading condition.

Figure 11 Segmented Ring

When a network connection has excessively high link errors, Station Management (SMT) shuts down the connection and tries to reconnect. A dual-attachment trunk ring station with an A or B connection that is shut down is one of the wrap points in the network. See Making Your FDDI Connections More Resilient for information about keeping a dual-attachment station connection from wrapping.

Isolated station

Sometimes a network wraps a particular station out of the ring. Stations on either side of a problem station can be wrapped. This effectively isolates the station or links that have problems, as shown in Figure 12.

Figure 12 Wrapped Ring with Isolated Station

If a ring was already wrapped when a network wraps a station out of the ring, then a segmented ring results, as shown in Figure 13.

Figure 13 Segmented Ring with Isolated Stations

Twisted ring

In a twisted ring, an A port is connected to an A port and a B port is connected to a B port, instead of the normal A-to-B connections. A twisted ring, which always has two twist points (stations), can exist in either a Thru or Wrap state. You can monitor the Twisted Ring Condition and Undesired Connection Attempt Event for evidence of twisted ring and other connection problems.

Identifying the Problem

To identify the problem, follow this process:

1. At the FDDI LAN level, verify that your network is operating.

If the network is operating, the FDDI ring may be segmented, and therefore an FDDI station or an Ethernet station on an Ethernet link may have lost connectivity to other nodes on the network.

2. Determine if a ring is in a Thru, Wrap, or Segmented state.

If the FDDI ring is segmented or wrapped, look for a problem with a link somewhere in the network or for a nonfunctioning node on your trunk ring. If the ring is operating and is not segmented, or if it is segmented but you still have connectivity to the stations in question, move to a more specific level in your network.

See Monitoring FDDI Connections for more information.

3. Determine if the poorly performing station is an Ethernet or FDDI station.

If the problem is an FDDI station, find out if it is congested (that is, if the station is so busy that it cannot accept all the network traffic that is directed to it) by determining its Bandwidth Utilization. Also determine if the station has a high frame error rate by looking at the FDDI Ring Errors.

If the problem is an Ethernet station, look for congestion by examining Ethernet Packet Loss and Bandwidth Utilization.
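Step 2 of the process above can be pictured as a simple classification by the number of stations reporting a wrapped Configuration State. The sketch below is an illustration only: the function name ring_state is hypothetical, treating zero wrapped stations as Thru is an assumption, and so is counting a single reported wrap as a Wrap state (the text describes a wrapped ring as having two wrapped stations and a segmented ring as having more than two).

```python
def ring_state(wrapped_stations: int) -> str:
    """Classify an FDDI trunk ring from the count of wrapped stations."""
    if wrapped_stations < 0:
        raise ValueError("wrapped_stations cannot be negative")
    if wrapped_stations == 0:
        return "Thru"        # assumption: no wrap points means a whole dual ring
    if wrapped_stations <= 2:
        return "Wrap"        # two wrap points on either side of one fault
    return "Segmented"       # more than two stations wrapped on the trunk ring

if __name__ == "__main__":
    for count in (0, 2, 4):
        print(count, ring_state(count))
```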

Solving the Problem

Identify the station that is causing the disconnection and take the appropriate steps:

• If the disconnection is caused by a wrapped ring, then fix the hardware or cabling problem at that station.

• If the station is congested, you have a device problem rather than a network problem. For example, if the congested station is a file server and every other machine on the network is retrieving and saving files using that server, consider upgrading your server or adding servers to the network. A variety of devices from different vendors may be communicating on an FDDI or Ethernet network; some are faster and more capable, and some are slower and more prone to congestion.

• If the station is an Ethernet station that is attached to an Ethernet segment, reevaluate the setup of your Ethernet network and make some changes to improve its performance.

You can also make FDDI connections more resilient by implementing dual homing or installing an Optical Bypass Unit (OBU) where FDDI connections are prone to fail. See Making Your FDDI Connections More Resilient for more information.

Monitoring FDDI Connections

Monitor your FDDI devices for Warning or Critical alerts in the FDDI Status tool.

Status Watch

Use Status Watch to identify these FDDI connectivity errors:

• Peer Wrap Condition
• Twisted Ring Condition
• Undesired Connection Attempt Event

Follow these steps:

1. In the Device area, select the device that is located where you suspect an FDDI ring connectivity problem.

2. Monitor the FDDI Status tool for the currently selected device.

Here are some pointers for monitoring:

• If the Peer Wrap Configuration State variable is Isolated, the device is not connected to the FDDI trunk ring. If you intend the device to remain isolated, this indication is not a serious condition. However, if the device is supposed to be connected on a trunk ring, a serious problem may exist: the device is no longer transmitting packets to the larger trunk ring.

• If the Peer Wrap flag (SMTPeerWrapFlag) is set, the device is one of the wrap points. The cause of the wrapped ring is somewhere in the portion of the network between the two stations that report the peer wrap condition.

Making Your FDDI Connections More Resilient

When devices are removed from an FDDI ring, there is a break in the fiber path that causes the ring to wrap until the ring is made whole again. To prevent the break in the FDDI connection, you can implement dual homing or install an Optical Bypass Unit (OBU).

Implementing Dual Homing

When the operation of a dual attachment node is critical to your network, dual homing adds reliability by providing a backup connection if the primary link fails. Because a dual attachment station (DAS) has two attachments to the FDDI ring (A-to-M and B-to-M), you can use one of them as a standby link if the active link fails. Using dual homing, only one of the two attachments is active at a time. In this sense, a DAS acts as if it is a single attachment station (SAS) by using its A port as the standby link.

Through SMT, a DAS can be dual homed to the same concentrator or, more commonly, to two concentrators. This arrangement provides a more stable trunk ring of concentrators. If one concentrator fails, the DAS enables the standby link to another concentrator to become the active link. See Figure 14.

If the station is a dual path or dual path/dual MAC station, you can configure the dual-homed station in one of two ways:

• With both links active
• With one link active and one connection withheld as a backup, only becoming active when one link fails

Figure 14 Dual Homing Configuration

Installing an Optical Bypass Unit

You can insert an Optical Bypass Unit (OBU) into the FDDI ring as if it were a node and then plug your device into it. To use an OBU, your device needs an optical bypass interface. This interface lets the bypass know whether your device is still on the ring or not. See Figure 15.

If your device is removed or if it fails, the bypass unit diverts the optical path away from your device, keeping the ring whole. You can use a bypass on devices that are prone to failure or are likely to be removed often, such as diagnostic equipment.

Figure 15 Optical Bypass Unit Configuration

FDDI Connectivity Reference

This section explains terms that are relevant to FDDI connectivity and provides additional conceptual and problem analysis detail.

Peer Wrap Condition

A Peer Wrap (wrapped ring) condition occurs when a dual-attachment station detects a fault (often a lost connection) and reconfigures the network by wrapping the dual trunk rings to form a single ring. Normally, the two stations that are adjacent to the fault wrap to maintain full connectivity. However, if a second fault occurs before the first is repaired, the network partitions itself into two or more rings and stations lose connectivity.

When a station reports a Peer Wrap condition, locate and repair the problem that caused the station to wrap the rings. Potential causes include:

• Faulty FDDI port hardware
• Faulty cables or connectors
• Unplugged connectors
• Powered-down stations

You can expect to find the cause of the problem between the two stations that report the Peer Wrap condition.

Twisted Ring Condition

A Twisted Ring condition occurs when certain undesirable connection types exist. See Table 7 for more information. Although similar to the Undesired Connection Attempt, the Twisted Ring condition provides specific Station Management (SMT) and port information for diagnosis.

Undesired Connection Attempt Event

An Undesired Connection Attempt event occurs when a port tries to connect to another port of a type that may result in an undesirable network topology. Whether the connection attempt is successful depends on the current setting of the station's connection policies.

Table 7 lists connections that the FDDI standard defines as undesirable. The managed devices may or may not permit these connections, depending on their FDDI station configurations.

Table 7 Undesirable Connection Types

Connection Type | Reason That the Connection Is Undesirable
A-A | Twisted primary and secondary rings
A-S | A wrapped ring
B-B | Twisted primary and secondary rings
B-S | A wrapped ring
S-A | A wrapped ring
S-B | A wrapped ring
M-M | A tree of rings topology (illegal connection)

Table 8 lists FDDI connections that create valid topologies.

Table 8 Valid Connection Types

Connection Type | Reason That the Connection Is Valid
A-B | A normal trunk peer connection
A-M | A tree connection with possible redundancy. In a single MAC node, Port B has precedence (by default) for connecting to a Port M.
B-A | A normal trunk ring peer connection
B-M | A tree connection with possible redundancy. In a single MAC node, Port B has precedence (by default) for connecting to a Port M.
S-S | A single ring of two slave stations
S-M | A normal tree connection
M-A | A tree connection that provides possible redundancy
M-B | A tree connection that provides possible redundancy
M-S | A normal tree connection
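Tables 7 and 8 together cover every pairing of the four port types (A, B, M, S), so they can be expressed as a lookup. The following is a sketch for illustration: the names UNDESIRABLE, VALID, and classify_connection are hypothetical, and the reasons are copied from the two tables rather than taken from any management tool.

```python
# Reasons copied from Table 7 (undesirable) and Table 8 (valid).
UNDESIRABLE = {
    ("A", "A"): "Twisted primary and secondary rings",
    ("A", "S"): "A wrapped ring",
    ("B", "B"): "Twisted primary and secondary rings",
    ("B", "S"): "A wrapped ring",
    ("S", "A"): "A wrapped ring",
    ("S", "B"): "A wrapped ring",
    ("M", "M"): "A tree of rings topology (illegal connection)",
}

VALID = {
    ("A", "B"): "A normal trunk peer connection",
    ("A", "M"): "A tree connection with possible redundancy",
    ("B", "A"): "A normal trunk ring peer connection",
    ("B", "M"): "A tree connection with possible redundancy",
    ("S", "S"): "A single ring of two slave stations",
    ("S", "M"): "A normal tree connection",
    ("M", "A"): "A tree connection that provides possible redundancy",
    ("M", "B"): "A tree connection that provides possible redundancy",
    ("M", "S"): "A normal tree connection",
}

def classify_connection(local_port: str, remote_port: str):
    """Return (is_valid, reason) for a connection between two FDDI port types."""
    pair = (local_port.upper(), remote_port.upper())
    if pair in VALID:
        return True, VALID[pair]
    if pair in UNDESIRABLE:
        return False, UNDESIRABLE[pair]
    raise ValueError(f"unknown port types: {pair}")

if __name__ == "__main__":
    print(classify_connection("A", "B"))
    print(classify_connection("A", "A"))
```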

Disabling the Offending Interface

Because broadcast storms can ultimately cause your whole network to become unavailable, take action immediately to disable the offending interface. You can enable the interface again after you have corrected the problem.

Address Tracker

Use Address Tracker to locate the interface that is causing the broadcast storm. Use Device View to disable the port.

Follow these steps:

1. In the Find Address window, enter the address of the interface that seems to be receiving the broadcast traffic.

You can copy the MAC or IP address from the Status Watch report and paste it into Address Tracker's Enter the Address You Want to Find field.

2. Click Find Now.

Search displays the device name.

3. Use Transcend Central to launch Device View and disable the port.

Disabling the port stops the broadcast storm before it interferes with all vital network traffic. You can re-enable this interface later using Device View or the device's console.

Correcting Spanning Tree Misconfigurations

Spanning Tree does not cause broadcast storms, but a loop in your Spanning Tree topology can create data that looks like a storm. A loop can occur in your topology if:

• Someone disables Spanning Tree on a port
• You set up your Spanning Tree configuration incorrectly

Device View

Use Device View to disable any Spanning Tree port that has a repeater attached to it and to correct Spanning Tree misconfigurations.

To correct Spanning Tree misconfigurations, use Device View to disable Spanning Tree Protocol (STP) for a port on a SuperStack® II Switch 1000, Switch 3000, Switch 3000 10/100, Switch 9000SX, Desktop Switch, LinkBuilder® FMS II Bridge/Management Module, or CoreBuilder 6000.

To disable the STP port state for a port on a SuperStack II switch:

1. Select a port and click the right mouse button.

2. From the shortcut menu, select Configure.

3. In the Port section, click the STP tab.

4. From the STP Port State list box, select Disabled.

5. Click Apply.

To disable the STP port state for a port on a LinkBuilder FMS II Bridge/Management Module:

1. Double-click the module.

2. From the shortcut menu, select Configure Bridge.

3. In the Port section, click the STP tab.

Broadcast Storms Reference

This section explains terms that are relevant to broadcast storms and provides additional conceptual and problem analysis detail.

Broadcast Packets

Broadcast packets, which are a normal part of network operation, are transmitted by a device to a broadcast address. For example, IP networks use broadcasts to resolve network addresses using Address Resolution Protocol (ARP). IPX networks use a large number of broadcast packets to operate most effectively.

Problems arise when broadcast packets endlessly propagate throughout the network, which increases the traffic volume on your network and the CPU time that each host spends processing and discarding unwanted broadcast packets.

Multicast Packets

Multicast packets, which are a normal part of network operation, are transmitted by a device to a multicast group address. Hosts that want to receive the packets indicate that they want to be members of the multicast group, and then multicast packets are distributed to that group. For example, multicast packets support the Spanning Tree Protocol. Multicast applications and underlying multicast protocols control multimedia traffic and shield hosts from processing unnecessary broadcast traffic. However, multicast traffic can also cause storms that saturate your network.

Duplicate Addresses

Use these sections to identify and correct problems caused by duplicate MAC and IP addresses:

• Duplicate Addresses Overview
• Finding Duplicate MAC Addresses
• Finding Duplicate IP Addresses

See Duplicate Addresses Reference for additional conceptual and problem analysis detail.

Networks sometimes generate duplicate MAC and IP addresses. Because duplicate addresses can cause problems with packet delivery, resolve them as soon as possible.

Understanding the Problem

Duplicate MAC addresses are caused by data link layer problems with Fiber Distributed Data Interface (FDDI) media and the passing of tokens on the FDDI ring. Duplicate IP addresses are caused by network layer problems. See these sections for more information about causes of duplicate addresses:

• Duplicate MAC Addresses
• Duplicate IP Addresses

Identifying the Problem

Identify duplicate MAC and IP addresses by following the instructions in these sections:

• Finding Duplicate MAC Addresses
• Finding Duplicate IP Addresses

Solving the Problem

Identify the cause of the duplicate address (such as user error or a hardware problem) and fix the problem if possible.

Finding Duplicate MAC Addresses

To find out if duplicate MAC addresses are occurring, monitor your network using Status Watch.

Status Watch

The Status Watch FDDI Status tool identifies duplicate FDDI MAC addresses, and Status Watch reports when two or more MACs on the same ring have the same MAC address (a Duplicate Address condition).

Follow these steps:

1. In the Status Watch Summary View window, determine if any FDDI Status conditions are reported. If there are, double-click the table cell value to display the Device List window.

Another approach is to examine only the devices that you know reside on your FDDI ring. In the Status Watch main window, red device icons indicate that a threshold has been exceeded.

2. Select a device.

• If you selected the device from the Device List window, the real-time report for that device appears in the Status Watch main window.

• If you selected the device from the main window, also select the FDDI Status tool to view the real-time report.

3. Determine if a Duplicate Address condition caused the FDDI Status tool to trigger a Critical or Warning status for that device.

In Status Watch, you can specify the status severity level to apply to a Duplicate Address condition.

Finding Duplicate IP Addresses

To find out if duplicate IP addresses are occurring, monitor your network using these applications:

• Address Tracker - To find duplicate IP addresses on 3Com devices and their attached networks

• LANsentry® Manager - To find duplicate IP addresses that are collected by probes gathering RMON2 SmartAgent® data from the Enterprise Communications Analysis Module (ECAM) downloaded on your network devices

Address Tracker

Use Address Tracker to determine when and where duplicate IP addresses occur.

Follow these steps:

1. From the Find Address menu, select Find Duplicate IP Addresses.

2. Click Find Now to start your search.

LANsentry Manager

Use the Duplicates table in LANsentry Manager to compile a list of all stations with duplicate IP addresses. This table is available only on probes that have downloaded RMON2 (ECAM) SmartAgent software.

Follow these steps:

1. From the LANsentry Manager Address Map menu, select Duplicates. Address Map data is displayed as a table.

2. To export the contents of the table, click Export to launch the Data Export dialog box.

Duplicate Addresses Reference

This section explains terms that are relevant to duplicate addresses and provides additional conceptual and problem analysis detail.

Duplicate MAC Addresses

Each device on your network has a unique MAC address. This address identifies a single device on the network, allowing packets to be delivered to correct destinations.

Packets are delivered to their destinations by means of the IP-address-to-MAC-address translation that the Address Resolution Protocol (ARP) provides. Therefore, if MAC addresses are duplicated on the network, ARP caches of routing devices contain erroneous destinations. In FDDI, devices monitor network traffic, searching for their own MAC address in each packet to determine whether to decode the packet. If MAC addresses are not unique, two stations cannot be distinguished from each other.

Duplicate MAC addresses can occur for the following reasons:

• Someone has manually configured a MAC address for a device instead of using the address that the vendor supplied or allowing it to be assigned dynamically, and this address is also assigned to a different device.

• In rare circumstances, loops in a bridged network can cause a MAC hardware problem or an address learning problem that creates a duplicate MAC address entry in the bridging address table.

• On DECnet Phase 4 networks, MAC addresses are set from the DECnet address. A duplicate NET address can cause a duplicate MAC address.

Duplicate IP Addresses

Because IP addresses are critical for transmission of packets on TCP/IP networks, resolve duplicate IP addresses immediately.

Duplicate IP addresses can occur when someone has configured an IP address that is identical to an IP address that is assigned to a different device. Address assignments, although possible for you to configure manually, are usually made using one of these protocols:

• Dynamic Host Configuration Protocol (DHCP) - Allows your network to dynamically assign IP addresses to nodes. With this protocol, a DHCP server temporarily assigns an IP address to a node, or you can statically configure addresses as needed.

• BOOTstrap protocol (BootP) - Allows you to statically assign IP addresses to nodes. This protocol is more efficient than RARP.

• Reverse ARP (RARP) - Allows you to statically assign IP addresses to nodes. However, because this protocol relies on the MAC address to identify the node, you cannot use it on networks that dynamically assign hardware addresses.
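Independently of the tools named in this document, the idea behind a duplicate IP address search can be sketched in a few lines: collect (IP address, MAC address) pairs and report any IP address seen with more than one MAC address. The function name find_duplicate_ips and the sample data are hypothetical; this is not how Address Tracker or LANsentry Manager is implemented.

```python
from collections import defaultdict

def find_duplicate_ips(observations):
    """Return {ip: sorted list of MACs} for every IP seen with more than one MAC.

    `observations` is an iterable of (ip_address, mac_address) string pairs,
    for example rows exported from an address table.
    """
    macs_by_ip = defaultdict(set)
    for ip, mac in observations:
        macs_by_ip[ip].add(mac.lower())  # normalize case before comparing
    return {ip: sorted(macs) for ip, macs in macs_by_ip.items() if len(macs) > 1}

if __name__ == "__main__":
    sample = [
        ("192.168.1.101", "00:A0:24:11:22:33"),
        ("192.168.1.123", "00:a0:24:44:55:66"),
        ("192.168.1.101", "00:a0:24:77:88:99"),  # second device claiming .101
    ]
    print(find_duplicate_ips(sample))
```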

Ethernet Packet Loss

Use these sections to identify and correct Ethernet packet loss:

• Ethernet Packet Loss Overview
• Searching for Packet Loss

See Ethernet Packet Loss Reference for additional conceptual and problem analysis detail.

If your Ethernet network shows signs of congestion, it may be experiencing packet loss. When your network is congested, utilization is usually high, packets are discarded because buffers are full, and collision rates are up. Problems related to Collisions are often at the heart of packet loss.

Understanding the Problem

Collisions are normal in Ethernet networks. In many cases, Collision rates of 50 percent do not cause a large decrease in throughput. The Collision rate helps mark the upper limit on your network (the maximum percentage of collisions that your network can bear), which is usually around 70 percent. If Collisions increase above this upper limit, your network can become unreliable.

When the Collision rates increase, so do Excessive Collisions, which causes a delay in transmitting data. An increase in Collisions also indicates that network utilization and network errors, such as FCS Errors, are probably increasing.

The real packet problems to watch for, however, are undetected collisions that show up as Late Collisions.
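The 50 percent and 70 percent figures above can be turned into a rough check. This sketch rests on assumptions: the text does not define how the Collision rate is computed, so the example takes it as collisions per transmitted frame expressed as a percentage, and the function names and category labels are hypothetical.

```python
def collision_rate(collisions: int, frames_sent: int) -> float:
    """Collisions as a percentage of transmitted frames (assumed definition)."""
    if frames_sent <= 0:
        return 0.0  # no traffic, so no meaningful rate
    return 100.0 * collisions / frames_sent

def assess_collisions(rate_percent: float) -> str:
    """Label a rate using the approximate limits quoted in the text."""
    if rate_percent <= 50:
        return "normal"      # rates up to 50 percent often cost little throughput
    if rate_percent <= 70:
        return "elevated"    # approaching the usual upper limit
    return "unreliable"      # above the roughly 70 percent upper limit

if __name__ == "__main__":
    rate = collision_rate(collisions=750, frames_sent=1000)
    print(rate, assess_collisions(rate))
```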

Identifying the Problem

To identify that your network's problem is related to packet loss, verify that frames are being dropped on your network by examining this packet loss data:

• Alignment Errors
• Collisions
• Excessive Collisions
• FCS Errors
• CRC Errors
• Late Collisions
• Receive Discards
• Too Long Errors
• Too Short Errors
• Transmit Discards

The process of identifying the problem is discussed in Searching for Packet Loss

Solving the Problem

If you notice that packet loss data is consistently high, then your network is too congested. In this case, segment your network with the appropriate network device (such as a switch or router). If Collision data shows increases but your network's utilization is the same, then your network may have a physical problem, such as cabling that is too long. Other problems that packet loss data can indicate include:

• Faulty connectors or improper cabling
• Excessive numbers of repeaters between network devices
• Defective Ethernet transceivers or controllers

Possible solutions to these problems are explained in the procedures in Searching for Packet Loss.

Searching for Packet Loss

When you look for packet loss, use the following applications:

• Status Watch - For Ethernet and MIB-II data collection using SNMP polling
• LANsentry Manager Network Statistics Graph - For RMON data collection using an RMON probe
• Device View - On a per-device basis, you can evaluate statistics for any port on the device

Status Watch

Status Watch monitors:

• Alignment Errors
• Excessive Collisions
• FCS Errors
• Receive Discards
• Transmit Discards

Follow these steps:

1. Determine if the thresholds for the Alignment Errors tool and FCS Errors tool are being exceeded.

Table 16 identifies the problems that this data can indicate and your possible actions. For information about problems related to a nonstandard Ethernet implementation, see Nonstandard Ethernet Problems.

Table 16. Alignment Errors, FCS Errors, and CRC Errors Data

Possible Problem: Faulty cabling
Possible Action: Examine the cable and cable connections for breaks or damage.

Possible Problem: Network noise
Possible Action: Look for improper cabling, faulty cable, faulty network equipment, or cables that are too close to equipment that emits electromagnetic interference (lamps, for example).

Possible Problem: Faulty transceiver
Possible Action: Use an analyzer to identify the problematic transceiver. If necessary, replace the transceiver, network adapter, or station.

Possible Problem: Fault at the transmitting end station
Possible Action:
1. Locate the source of the errors by looking at the module and port statistics.
2. Verify the correct operation of the transceiver or adapter card of the device that is connected to the problem port.
3. If the card appears to be operating correctly, examine the cable and cable connections for breaks or damage.

Possible Problem: Station powering up or down
Possible Action: None required.
Early implementations of Ethernet transceivers generate a significant amount of in-band noise when powering up; they frequently cause Alignment Errors and FCS Errors in an otherwise stable network.
When powering up, some software drivers for Ethernet controllers also initiate Time Domain Reflectometry (TDR) tests to test the Ethernet media. Network monitors report TDR tests as Alignment Errors and FCS Errors.

Possible Problem: Faulty adapter
Possible Action: Replace the adapter.

2. Determine if the Excessive Collisions tool threshold is being exceeded. Table 17 identifies the problems that this data can indicate and your possible actions.

Table 17. Collisions and Excessive Collisions Data

Possible Problem: Busy network
Possible Action: Use a bridge, router, or switch to reconfigure your network into segments with fewer stations.

Possible Problem: Faulty device (adapter, switch, hub, and the like) that does not listen before broadcasting. This problem increases the incidence of all types of collisions.
Possible Action: Isolate each adapter to see if the problem stops.

Possible Problem: Network loop
Possible Action: Ensure that no redundant connections to the same station have both connections active simultaneously.

3. Determine if the Receive Discards and Transmit Discards tools' thresholds are being exceeded.

If these errors are high in conjunction with the data that you learned in steps 1 and 2, then your network is overloaded. Segment your network.

LANsentry Manager Network Statistics Graph

Use the LANsentry Manager Network Statistics graph to view data for:

• Collisions
• Late Collisions
• Bandwidth Utilization
• CRC Errors
• Too Long Errors
• Too Short Errors

Follow these steps:

1. Display a Network Statistics graph for the local Ethernet segment on which users have reported poor performance.

This graph shows the most recent trend in Collision rates. If you have set up a History sample, you can also look at the historical trend. If a number of segments are connected by repeaters, examine the graph for each Ethernet segment.

2. Analyze Utilization and Collision rates to determine whether collisions are caused by an overloaded segment or a faulty component.

• If Utilization rates are high - The collisions are probably caused by an overloaded segment. If you have added nodes or new applications to your network, consider reconfiguring the cabling system, using bridges and routers to filter out remote collisions and to keep local traffic on one segment. This action should level the network load.

• If Utilization rates are stable and appear normal - The collisions are probably caused by faulty components. In this case, do the following:

  • If the network consists of repeaters - Compare the Network Statistics graphs for each segment connected to the repeater. Because repeaters repeat traffic across all connected segments (which makes many segments seem like one network), you should see similar levels of traffic on all segments. One segment that shows dissimilar levels of traffic and collisions may indicate faulty hardware. In this case, monitor several collisions to track the source station that is transmitting too soon after collisions, and repair the station. Packets that are transmitted too soon after collisions are unlikely to be valid. See Table 17 for more information about Collisions.

  • On other networks - Determine the segment cable length.

3. Examine the CRC Errors and Late Collisions, which often indicate cabling or component problems.

Table 16 identifies the problems that CRC Errors can indicate and your possible actions. Table 18 identifies the problems that Late Collisions data can indicate and your possible actions.

Table 18. Late Collisions Data

Possible Problem: Cabling problems
• Segment too long
• Failing cable
• Segment not grounded properly (noise)
• Improper termination
• Taps too close (10BASE-5 and 10BASE-2 only)
• Noisy cable
Possible Action: Correct the cabling problem by doing one or more of the following:
• Reduce the segment length
• Replace the cable
• Ground the cable
• Terminate the cable correctly
• Check the taps
• Check for cables too close to equipment that emits electromagnetic interference

Possible Problem: Component problems
• Deaf or partially deaf node
• Failing repeater, transceiver, or controller cards
Possible Action: Correct the component problem by doing one of the following:
• Trace the failing component and replace it
• Replace the NIC or the transceiver

4. Trace Too Short Errors and Too Long Errors to the sender.

These errors often indicate faulty routers or LAN drivers and transceiver problems. Table 19 identifies the problems that this data can indicate and your possible actions.

Possible Problem: A transceiver on your network is adding bits to the packets that are transmitted by the attached station.
Possible Action:
1. Use a network analyzer to identify the problematic transceiver.
2. If necessary, replace the transceiver, network adapter, or station.

Possible Problem: The jabber protection mechanism on a transceiver has failed; it can no longer protect the network from the jabbering produced by the attached station.
Possible Action: Replace the network card.

Possible Problem: Excessive noise on the cable.
Note: Some 10/100 Mbps cards that autodetect the network speed may connect to the network at the wrong speed, causing excessive noise.
Possible Action: Check for improper cabling, faulty cable, faulty network equipment, or cables too close to noisy electronic equipment (lamps, for example).
If your network card autodetects the network speed and you have ruled out other problems, manually configure the network speed.

Possible Problem: Faulty routers (two different network types are connected and the router is not enforcing proper frame size restrictions).
Possible Action: Notify the manufacturer.

Possible Problem: Faulty LAN driver
Possible Action: Replace the driver.

Possible Problem: A normal condition on a LinkSwitch® 1000, LinkSwitch® 3000, or CoreBuilder 5000 FastModule.
If you use maximum-sized 1518-byte Ethernet frames, the device's VLT-enabled ports add a frame tag of 4 bytes, resulting in a misleading Too Long Frame error.
These frames are passed successfully but will create the Too Long Frame error message.
Possible Action: If you want to eliminate the error message, reduce your Ethernet packet frames by 4 bytes.
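The reasoning in step 2 above (collisions that rise together with utilization suggest an overloaded segment; collisions that rise while utilization stays the same suggest a faulty component or cabling) can be sketched in a few lines of Python. This is only an illustration: the `Sample` type, the `diagnose` function, and the 5-point tolerance are hypothetical choices, not values from LANsentry Manager.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One reading from a statistics graph (hypothetical structure)."""
    utilization_pct: float
    collision_pct: float


def diagnose(baseline: Sample, current: Sample, tolerance_pct: float = 5.0) -> str:
    """Compare a current reading with a known-good baseline.

    tolerance_pct is an assumed noise margin, not a vendor value.
    """
    collisions_up = current.collision_pct - baseline.collision_pct > tolerance_pct
    utilization_up = current.utilization_pct - baseline.utilization_pct > tolerance_pct
    if not collisions_up:
        return "no significant change in collisions"
    if utilization_up:
        return "overloaded segment: consider segmenting the network"
    return "faulty component or cabling: compare segments and check cable length"


if __name__ == "__main__":
    print(diagnose(Sample(20.0, 5.0), Sample(60.0, 30.0)))
    print(diagnose(Sample(20.0, 5.0), Sample(21.0, 30.0)))
```

The baseline matters more than any fixed threshold here, which is why the sketch compares two samples rather than testing a single reading.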

Device View

Device View allows you to display a variety of port and device-level statistics relevant to Ethernet packet loss. Table 20 describes these statistics and their use in troubleshooting.

Table 20. Activity and Error Statistics in Device View

Statistics Group: Activity
Description: Displays the total network activity and errors on the selected port.
Use in Troubleshooting: This data shows readable packets, broadcast packets, Collisions, total errors, and runts (which cause Too Short Errors). You can interpret this data in the following ways:
• The presence of runts can often be caused by Collisions; however, if the values increase at specific times of the day, it may indicate that you need to change the network topology to manage the traffic more efficiently (for example, with switches or routers).
• Runts can also be caused by a badly terminated coax cable.
• Large numbers of runts not associated with high levels of collisions can indicate a transmission problem (examine the cable).
• Particularly high numbers of Collisions compared to the total number of readable packets can point to a hardware problem (a bad adapter) or to a data loop.
• A high proportion of Broadcast Packets (>10%) on a heavily utilized network (>50% of available bandwidth) can point to an incorrectly configured bridge or router on the network.

Statistics Group: Errors
Description: Displays the number of frames with errors on the selected port.
Use in Troubleshooting: The significance of errors depends on accompanying errors and prevailing network conditions. See the following error data for more information:
• Alignment Errors: Table 16
• FCS Errors: Table 16
• Too Long Errors: Table 19
• Too Short Errors, or runts: Table 19
• Late Collisions: Table 18

To display Activity and Errors statistics for a device or port, follow these steps:

1. Select the required port or device.

2. From the shortcut menu, select Activity or Errors.

The statistics available depend on the type of port or device selected. See Table 20 for troubleshooting information.

Ethernet Packet Loss Reference

This section explains terms that are relevant to Ethernet packet loss and provides additional conceptual and problem analysis detail.

Alignment Errors

An Alignment Error indicates a received frame for which both of the following are true:

• The number of bits received is an uneven byte count (that is, not an integral multiple of 8).

• The frame has a Frame Check Sequence (FCS) error.

Alignment Errors often result from MAC layer packet formation problems, cabling problems that cause corrupted or lost data, and packets that pass through more than two cascaded multiport transceivers. See FCS Errors for more information about interpreting Alignment Errors.

Collisions

Collisions indicate that two devices detect that the network is idle and try to send packets at exactly the same time (within one round-trip delay). Because only one device can transmit at a time, both devices must stop sending and attempt to retransmit. Collisions are detected by the transmitting stations.

The retransmission algorithm helps to ensure that the packets do not retransmit at the same time. However, if the two devices retry at nearly the same time, packets can collide again; the process repeats until either the packets finally pass onto the network without collisions, or 16 consecutive collisions occur and the packets are discarded.
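The retransmission algorithm is not named above; in IEEE 802.3 it is the truncated binary exponential backoff. The following Python sketch shows the idea under that assumption: after the nth collision a station waits a random number of slot times between 0 and 2^min(n, 10) - 1, and gives up after 16 attempts. The function names and the `collides` callback are hypothetical and exist only for illustration.

```python
import random

MAX_ATTEMPTS = 16    # frame is discarded after 16 consecutive collisions
BACKOFF_LIMIT = 10   # the random range stops doubling after 10 collisions
SLOT_TIME_US = 51.2  # 512 bit times at 10 Mbps


def backoff_slots(collision_count, rng=random):
    """Return the number of slot times to wait after the nth collision."""
    if not 1 <= collision_count < MAX_ATTEMPTS:
        raise ValueError("collision_count must be between 1 and 15")
    k = min(collision_count, BACKOFF_LIMIT)
    return rng.randrange(2 ** k)  # uniform in 0 .. 2**k - 1


def transmit(collides, rng=random):
    """Try to send one frame.

    collides is a hypothetical callback that returns True when the
    current attempt ends in a collision.
    Returns (delivered, attempts_used, total_slots_waited).
    """
    waited = 0
    for attempt in range(1, MAX_ATTEMPTS + 1):
        if not collides():
            return True, attempt, waited
        if attempt < MAX_ATTEMPTS:
            waited += backoff_slots(attempt, rng)
    # 16 consecutive collisions: counted as one Excessive Collision
    return False, MAX_ATTEMPTS, waited


if __name__ == "__main__":
    delivered, attempts, slots = transmit(lambda: random.random() < 0.5)
    print(delivered, attempts, slots * SLOT_TIME_US, "microseconds waited")
```

Because the waiting range doubles with each collision, two stations that collided once are increasingly unlikely to pick the same retry slot again.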

CRC Errors

A Cyclic Redundancy Check (CRC) Error is an RMON statistic that combines FCS Errors and Alignment Errors. These errors indicate that packets were received with:

• A bad FCS and an integral number of octets (FCS Errors)
• A bad FCS and a non-integral number of octets (Alignment Errors)

CRC Errors can cause an end station to freeze. If a large number of CRC Errors are attributed to a single station on the network, replace the station's network interface board. Typically, a CRC Error rate of more than 1 percent of network traffic is considered excessive.

Excessive Collisions

Excessive Collisions indicate that 16 consecutive collisions have occurred, usually a sign that the network is becoming congested. For each excessive collision count (or after 16 consecutive collisions), a packet is dropped. If you know the normal rate of excessive collisions, then you can determine when the rate of packet loss is affecting your network's performance. See Knowing Your Network's Configuration for more information.

FCS Errors

Frame Check Sequence (FCS) Errors, a type of CRC, indicate that frames received by an interface are an integral number of octets long but do not pass the FCS check. The FCS is a mathematical way to ensure that all the frame's bits are correct without having the system examine each bit and compare it to the original. Packets with Alignment Errors also generate FCS Errors.

Both Alignment Errors and FCS Errors can be caused by equipment powering up or down or by interference (noise) on unshielded twisted-pair (10BASE-T) segments. In a network that complies with the Ethernet standard, FCS or Alignment Errors indicate bit errors during a transmission or reception. A very low rate is acceptable. Although Ethernet allows a 1 in 10^8 bit error rate, typical Ethernet performance is 1 in 10^12 or better.
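To put the two bit error rates in perspective, the short calculation below converts a bit error rate into the chance that a maximum-sized 1518-octet frame contains at least one bad bit. It assumes independent bit errors, which is a simplification (real noise tends to arrive in bursts), and the function name is hypothetical.

```python
import math

MAX_FRAME_BITS = 1518 * 8  # 12144 bits in a maximum-sized frame


def frame_error_probability(bit_error_rate, frame_bits=MAX_FRAME_BITS):
    """Probability that a frame has at least one bit error.

    Computes 1 - (1 - ber) ** bits in a numerically stable way,
    assuming each bit fails independently.
    """
    return -math.expm1(frame_bits * math.log1p(-bit_error_rate))


if __name__ == "__main__":
    for ber in (1e-8, 1e-12):
        p = frame_error_probability(ber)
        print(f"BER {ber:g}: about 1 bad frame in {1 / p:,.0f}")
```

Under these assumptions, the allowed rate of 1 in 10^8 corresponds to roughly one damaged maximum-sized frame in every eight thousand, while the typical rate of 1 in 10^12 corresponds to roughly one in eighty million.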

Late Collisions

Late Collisions indicate that two devices have transmitted at the same time, but cabling errors (most commonly, excessive network segment length or repeaters between devices) prevent either transmitting device from detecting a collision. Neither device detects a collision because the time to propagate the signal from one end of the network to the other is longer than the time to put the entire packet on the network. As a result, neither of the devices that cause the late collision senses the other's transmission until the entire packet is on the network.

Although late collisions occur for small packets, the transmitter cannot detect them. As a result, a network suffering measurable Late Collisions for large packets is losing small packets as well.
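The timing argument above can be made concrete with a small worst-case sketch. It assumes 10 Mbps Ethernet, where the 512-bit slot time is 51.2 microseconds, and treats the round-trip delay of the segment as a single measured number. The function names and the three outcome labels are hypothetical.

```python
SLOT_TIME_BITS = 512  # collision window defined by the Ethernet standard


def transmit_time_us(frame_octets, mbps=10.0):
    """Time needed to put a whole frame on the wire, in microseconds."""
    return frame_octets * 8 / mbps


def collision_outcome(frame_octets, round_trip_delay_us, mbps=10.0):
    """Worst-case outcome of a collision for one frame size.

    round_trip_delay_us is the assumed end-to-end round-trip delay
    of the (possibly over-long) segment.
    """
    slot_us = SLOT_TIME_BITS / mbps
    if round_trip_delay_us <= slot_us:
        return "normal collision"  # network is within specification
    if transmit_time_us(frame_octets, mbps) > round_trip_delay_us:
        return "late collision"  # sender is still transmitting when it sees it
    return "undetected by transmitter"  # frame already fully sent


if __name__ == "__main__":
    for size in (64, 1518):
        print(size, collision_outcome(size, round_trip_delay_us=60.0))
```

With a 60-microsecond round trip, the 64-octet frame is gone before the collision comes back, while the 1518-octet frame is still being sent and registers a Late Collision, which matches the observation that measurable Late Collisions on large packets imply silent loss of small ones.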

Nonstandard Ethernet Problems

Table 21 lists the symptoms that typically occur if a system violates the Ethernet standard.

Table 21. Symptoms of Common Ethernet Network Problems

Symptoms: FCS Errors and Alignment Errors increase significantly.
Problem: Network cabling is too long.
Notes: If you use a promiscuous network monitor, the number of Late Collisions reported by stations should correlate with the FCS and Alignment Errors reported by the monitor.

Symptoms: FCS and Alignment Errors increase proportionally with interference (sometimes referred to as noise hits).
Problem: Network segment is noisy.
Notes: Typically observed on a 10BASE-T network segment in a noisy environment. If you use multiple promiscuous monitors, the FCS and Alignment Errors among the monitors will not correlate.
If the monitor can track runts (also called Too Short Errors), the number of runt packets should be significantly higher than normal.

Symptoms: FCS and Alignment Errors are much higher than normal.
Problem: Networks do not conform to the access scheme of Carrier Sense Multiple Access with Collision Detect (CSMA/CD).
Notes: Occurs when some implementations of Ethernet in the segment are not entirely compatible with IEEE 802.3 repeaters.
Collision fragments linger on the network long enough to collide with retry packets at the minimum interpacket gap (IPG). The IPG is smaller on one side of the repeated network, causing a lost packet.
Ethernet controllers cannot receive packets that are separated by 4.7 µs or less. Some controllers cannot sustain receptions of packets separated by as much as 9.6 µs. If runt packets are received one after another and are followed by a collision fragment, Ethernet controllers that cannot sustain reception will lose packets.

Receive Discards

Receive Discards indicate that received packets could not be delivered to a high-layer protocol because of congestion or packet errors.

Too Long Errors

A Too Long Error indicates that a packet is longer than 1518 octets (including FCS octets) but otherwise well formed. Too Long Errors are often caused by a bad transceiver, a malfunction of the jabber protection mechanism on a transceiver, or excessive noise on the cable.

Too Short Errors

A Too Short Error, also called a runt, indicates that a packet is fewer than 64 octets long (including FCS octets) but otherwise well formed.
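The definitions of Alignment Errors, FCS Errors, Too Long Errors, and Too Short Errors in this reference can be summarized as a small classifier. This is a minimal sketch that uses only the rules stated in this section; the function names are hypothetical. It is deliberately simplified: a frame that is both the wrong size and fails the FCS check is reported here only by its FCS or alignment status, whereas RMON probes may count such frames separately.

```python
MIN_FRAME_OCTETS = 64    # including FCS octets
MAX_FRAME_OCTETS = 1518  # including FCS octets


def classify_frame(bit_count, fcs_ok):
    """Classify a received frame by length and FCS result."""
    octets, extra_bits = divmod(bit_count, 8)
    if not fcs_ok:
        # Bad FCS: non-integral octet count means Alignment Error
        return "alignment_error" if extra_bits else "fcs_error"
    if octets > MAX_FRAME_OCTETS:
        return "too_long"
    if octets < MIN_FRAME_OCTETS:
        return "too_short"
    return "ok"


def crc_error_rate_pct(frames):
    """CRC Errors (FCS plus Alignment Errors) as a percentage of frames.

    frames is an iterable of (bit_count, fcs_ok) pairs.
    """
    results = [classify_frame(bits, ok) for bits, ok in frames]
    if not results:
        return 0.0
    crc = sum(r in ("fcs_error", "alignment_error") for r in results)
    return 100.0 * crc / len(results)


if __name__ == "__main__":
    print(classify_frame(1522 * 8, True))   # tagged maximum-sized frame
    print(classify_frame(100 * 8 + 3, False))
```

The first usage line reproduces the VLT case from Table 19: a 1518-octet frame plus a 4-byte tag is 1522 octets, so it is reported as too long even though it is otherwise well formed.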

Transmit Discards

Transmit Discards indicate that packets were not transmitted because of network congestion.

FDDI Ring Errors

Use these sections to identify and correct FDDI ring errors:

• FDDI Ring Errors Overview
• Identifying Ring Errors

Fiber Distributed Data Interface (FDDI) often corrects its own problems. However, because FDDI cannot correct all errors (especially those related to hardware problems), you should monitor FDDI errors.

Understanding the Problem

FDDI ring errors that you should monitor include:

• Elasticity Buffer Error Condition
• Frame Error Condition
• Frames Not Copied Condition
• Link Error Condition

Identifying the Problem

First, determine the type of FDDI ring errors and where they are occurring. Similar to the way you identify other FDDI problems, identify the upstream and downstream neighbors of the devices that you are monitoring.

Several types of network errors can cause FDDI performance problems. For example, problems with cables or physical connections may result in a link or frame error. Elasticity buffer (EB) errors can also lead to link and frame errors.

FDDI deals with port-related errors as follows:

• The variable PORTlerAlarm is the link error rate (LER) value at which a link connection generates an alarm. When the LER is greater than the alarm setting, Station Management (SMT) sends a Status Report Frame (SRF) to notify you that there is a problem with a port.

The PORTlerAlarm threshold is set lower than the PORTlerCutoff threshold so that you are notified of a problem before the port is actually removed from the ring.

• When link errors reach the threshold defined by the variable PORTLERCutoff, SMT breaks the connection, disabling the PHY that detected the problem. A Link Error Condition is also generated.

FDDI deals with MAC-related errors as follows:

• When MAC frame errors reach a certain threshold, a Frame Error Condition is generated. Because the actual error can be further upstream than the immediate connection, the connection remains intact.

• For a large network, the worst case MACFrameErrorRatio is less than 0.1 percent. However, during network configuration, frame error ratios can reach 50 percent for short periods. When you detect a sustained frame error ratio of more than 0.1 percent, a problem exists between the station that is reporting the condition and the nearest upstream MAC.
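The distinction between a short spike during configuration and a sustained frame error ratio can be sketched as follows. This is an illustration only: the ratio is computed here as a plain percentage of error frames, which may differ from how a given SMT implementation scales MACFrameErrorRatio, and the requirement of three consecutive samples is an assumed definition of "sustained".

```python
def frame_error_ratio_pct(error_frames, total_frames):
    """Error frames as a percentage of all frames in one polling interval."""
    if total_frames <= 0:
        return 0.0
    return 100.0 * error_frames / total_frames


def sustained_frame_errors(ratios_pct, threshold_pct=0.1, min_consecutive=3):
    """Return True if the ratio stays above the threshold for
    min_consecutive polling intervals in a row.

    min_consecutive is an assumption; the source does not define
    how long a ratio must persist to count as sustained.
    """
    run = 0
    for ratio in ratios_pct:
        run = run + 1 if ratio > threshold_pct else 0
        if run >= min_consecutive:
            return True
    return False


if __name__ == "__main__":
    spike = [50.0, 0.05, 0.02, 0.03]   # brief spike during reconfiguration
    steady = [0.2, 0.3, 0.15, 0.4]     # persistently above 0.1 percent
    print(sustained_frame_errors(spike), sustained_frame_errors(steady))
```

Only the second series would point to a problem between the reporting station and its nearest upstream MAC.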

Solving the Problem

To solve problems related to FDDI errors, fix the hardware, cabling, or congestion problem.

Identifying Ring Errors

Use Status Watch to monitor your FDDI devices for Warning or Critical alerts.

Status Watch

Use Status Watch to identify FDDI ring errors.

Follow these steps:

1. Monitor the FDDI Status tool for the currently selected device.

2. Determine whether Status Watch is reporting Elasticity Buffer Errors or a high percentage of Frame Errors, Frames Not Copied, or Link Error Rates for the currently selected device.

FDDI Ring Errors Reference

This section provides additional conceptual and problem analysis detail.

Elasticity Buffer Error Condition

The Elasticity Buffer Error condition occurs when a port's elasticity buffer overflows or underflows. This condition usually indicates that a port's hardware is not operating within the tolerances that the FDDI standard specifies. Look for the problem in the hardware of either the port that is reporting the condition or of the immediately adjacent port.

Frame Error Condition

The Frame Error condition occurs when the percentage of frames that contain errors exceeds a preset threshold. When a device is an uplink to FDDI (that is, a device is transmitting onto FDDI), this type of condition indicates that the ring is saturated. The ring is out of buffer space, and packets are being dropped from the device's backbone port.

The problem indicated by the frame errors is usually located between the MAC that reports the condition and its upstream neighbor. Because many physical connections can lie along this path, the MACFrameErrorRatio variable identifies only the two MACs between which the problem is occurring.

Frames Not Copied Condition

The Frames Not Copied condition occurs when the percentage of frames that are dropped because of insufficient buffer space exceeds a preset threshold. This condition indicates that the station is congested and is unable to process frames as quickly as they arrive. To help eliminate congestion:

• Add more capacity to the station
• Reconfigure your network so that end stations that communicate heavily with one another are on the same bridge or switch
• Filter out certain traffic

Link Error Condition

The Link Error condition occurs when a port detects link errors at a rate that exceeds a preset threshold. When the Link Error threshold is exceeded, the station removes itself from the ring and tries to reinsert itself on the ring. This action creates a MAC Neighbor Change Event (which also occurs if a ring wraps).

Link errors may indicate an FDDI PHY hardware problem (such as a faulty transmitter) or a faulty cable or connector. Look for the problem in the portion of the network between the port that is reporting the condition and the first upstream transmitter.

MAC Neighbor Change Event

The MAC Neighbor Change event occurs when a MAC's upstream or downstream neighbor changes.

This event indicates either:

• A network reconfiguration
• Another station that is leaving or joining the ring

Network File Server Timeouts

Use these sections to identify and correct timeouts on network file servers:

• Network File Server Timeout Overview
• Looking for Obvious Errors
• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

See Network File Server Timeouts Reference for additional conceptual and problem analysis detail.

A network file server can time out if your network gets congested or if your server is having problems. Users might have problems downloading data from or to the server, or copying files from or to the server. To help you understand the troubleshooting process for this type of problem, an EXAMPLE throughout this section follows the symptoms, analysis, and resolution of a typical file server timeout problem.

Understanding the Problem

When users log in, their stations make network file server calls either to determine quotas (if this feature has been enabled) or to mount user home directories. The network file server timeout messages, even when spread across multiple nodes, indicate a problem either with the network or with a server.

EXAMPLE: UNIX users notice that it takes a long time - over 30 seconds in some cases - to log in to any machine. Some machines report network file server timeout messages, but the messages have no obvious pattern and are infrequent. You begin to get a sense of the problem.

Identifying the Problem

First, rule out the obvious causes. Ask these questions:

• Can you access the network file server with Telnet?
• Have any alarms been triggered?
• Are there any new errors?

The process of identifying the problem is developed in Looking for Obvious Errors.

Solving the Problem

To determine the cause, reproduce the fault while you monitor the network. After you know the cause, you can fix the problem.

The solutions to the network file server timeout are identified in these sections:

• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

Looking for Obvious Errors

To look for obvious errors, use these applications:

• Ping and Telnet - To check connectivity to the network file server nodes
• LANsentry Manager Alarms View - To search for triggered alarms
• LANsentry Manager Statistics View - To look for errors
• LANsentry Manager History View - To identify trends

Ping and Telnet

Determine whether you can contact network file server nodes using Ping and Telnet. If the response is extremely slow, then a problem may exist with the connections to the nodes. No delay indicates that the connections are normal, implying that the delay is occurring elsewhere. In this case, use LANsentry® Manager tools to determine whether packets are being lost or ignored.
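If you want to time the connection rather than judge it by eye, a short script can stand in for the manual Telnet check. The sketch below measures how long a TCP connection to the Telnet port takes. It is an illustration only: the host name in the usage example is a placeholder, the one-second "slow" threshold is an assumption, and a node that does not run a Telnet service will be reported as unreachable even if the network path is fine.

```python
import socket
import time


def connect_time(host, port=23, timeout=5.0):
    """Return the seconds taken to open a TCP connection, or None on failure."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        return None


def describe(elapsed, slow_threshold=1.0):
    """Turn a measurement into a rough verdict (threshold is an assumption)."""
    if elapsed is None:
        return "unreachable"
    return "slow" if elapsed > slow_threshold else "normal"


if __name__ == "__main__":
    # "fileserver.example" is a placeholder, not a real node name
    print(describe(connect_time("fileserver.example")))
```

A "normal" verdict corresponds to the no-delay case above, which suggests looking elsewhere for lost or ignored packets.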

LANsentry Manager Alarms View

Using the LANsentry Manager Alarms View, you can determine if any configured alarms have been triggered.

Search the Alarms View to see if any MAC events have been logged.

EXAMPLE: MAC events have not been logged for the network to which the UNIX users are attached.

Even though no alarms have occurred, errors may exist. For example, a lower rate of background errors may exist just below the alarm threshold. Based on maximum and minimum values, RMON errors may miss constant, periodic, or low amounts of errors.

LANsentry Manager Statistics View

Using the LANsentry Manager Statistics View, you can display a multisegment graph of utilization and error statistics.

Follow these steps:

1. Set up a graph that shows utilization and errors on all your major segments.

2. Determine whether any segments are particularly busy or error prone.

EXAMPLE: You notice that one segment of the UNIX network, HUB3, is reporting Too Long Errors and FCS Errors roughly every second sample. While the amount is not higher than normal, it is currently higher than any other segment.

LANsentry Manager History View

Using the LANsentry Manager History View, you can display a rolling history table to determine if the errors that you are seeing are new. For example, if you have a history table that runs for 30-minute samples over two days, you can compare the most recent sample to a previous sample, looking for new errors. If your probe has the resources, use a much finer resolution sample stored for a shorter time (every 30 seconds for 2 hours) to more easily spot recent errors.
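Comparing the most recent sample with a previous one amounts to a simple difference of error counters. The sketch below assumes each history sample has been exported as a mapping from error name to count; that export format and the function name are hypothetical, not a LANsentry Manager feature.

```python
def new_errors(previous, latest):
    """Return the error types that appear in the latest sample but were
    absent (zero or missing) in the previous sample."""
    return {
        name: count
        for name, count in latest.items()
        if count > 0 and previous.get(name, 0) == 0
    }


if __name__ == "__main__":
    earlier = {"FCS Errors": 0, "Too Long Errors": 0, "Collisions": 120}
    recent = {"FCS Errors": 4, "Too Long Errors": 2, "Collisions": 131}
    print(new_errors(earlier, recent))
```

Counters that were already nonzero, such as Collisions in the usage example, are left out because they are not new, only changed.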

EXAMPLE: You see that the history table shows that no error rates remained constant throughout the day. However, errors that did occur were on the device HUB3 and were Too Long Errors and FCS Errors.

Reproducing the Fault While Monitoring the Network

Although the RMON View in LANsentry Manager can show error rates and help you to identify the location of the problem, it may not provide enough data to solve the problem. To determine the cause of the problem, reproduce it while you monitor the network by using these applications:

• LANsentry Manager Top-N Graph - To locate a quiet node to use for reproducing the fault

• LANsentry Manager Packet Capture - To capture packets from the hub to which the quiet node is attached

• LANsentry Manager Packet Decode - To analyze the packets to assess network file server traffic and delays

• Address Tracker - To find the location of the problem nodes

EXAMPLE: Using LANsentry Manager, you find a hub on the network with a higher than normal error rate. However, the error rate does not seem high enough to cause login delays of 60 seconds or more.

LANsentry Manager Top-N Graph

Using the Top-N graph in the LANsentry Manager main window, locate a quiet node that has been showing the same problem. Choose a quiet node so that you do not receive excessive traffic when you try to isolate the problem.

EXAMPLE: You see that the node Monolith, which has the same Network File System (NFS) mounts as the other nodes on the network, is quiet. You decide to use this node for reproducing the fault. See Network File System (NFS) Protocol for more information about NFS.

LANsentry Manager Packet Capture

Using the LANsentry Manager Packet Capture application, capture packets from the network using predefined patterns and start-and-stop conditions.

Follow these steps:

1. Set up a capture buffer on a probe that is connected to the same hub as the quiet node. Until you know more about the problem, set a very general filter.

EXAMPLE: You select a MAC-layer filter and set a conversation filter to capture all packets to and from Monolith.

2. Telnet into and log out from the quiet node. Then reset the capture buffer. Repeat this procedure until you see the problem reflected in your captured data. To keep the buffer information clean, reset the buffer each time that you repeat the procedure.

3. When you see the delay, note the rough value of the packet count on the LANsentry Manager packet buffer.

By noting the packet count at which you think the delay has occurred, you can narrow the problem to within about 20 packets in the buffer. If you have used an extremely quiet node, you may even identify the exact packet.

LANsentry Manager Packet Decode

The LANsentry Manager Packet Decode application decodes all major protocols and displays the packet contents at three levels of detail: summary information, header information, and actual packet content.

Follow these steps:

1. Open the buffer in the Packet Decode application and locate the number of the packet at which the delay occurred.

2. Select the packet and launch a MAC-layer conversation filter. In the filter display, look for a gap in the conversation (that is, where the node sent a request and then resent it at approximately the same rate as the delay you experienced when recreating the problem).

3. Repeat the test to determine if the result concentrates on one node or if it appears on other nodes.

EXAMPLE: On the quiet node that you selected, the delay is obvious. You see an NFS request going out to a node and a repeat of the request 30 seconds later. During that time, the node did not respond. You now know that the delay occurred because nodes were not seeing responses for NFS requests. When you repeat the test on other nodes, you find that the delay is happening with more than one destination node.
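Looking for a gap in the conversation, as in step 2 above, can also be done on exported capture data. The following sketch scans a list of decoded packets for a request that is repeated to the same destination after a long silence. The `Packet` record, its fields, and the one-second minimum gap are hypothetical; a real capture would need to be converted into this shape first.

```python
from typing import NamedTuple


class Packet(NamedTuple):
    """Minimal decoded packet record (hypothetical export format)."""
    time_s: float     # capture timestamp in seconds
    src: str          # source node
    dst: str          # destination node
    request_id: str   # identifier that is reused when a request is resent


def find_retransmission_gaps(packets, min_gap_s=1.0):
    """Return (packet, gap_seconds) for every request that is resent
    at least min_gap_s after its previous transmission."""
    last_seen = {}
    gaps = []
    for packet in sorted(packets, key=lambda p: p.time_s):
        key = (packet.src, packet.dst, packet.request_id)
        if key in last_seen:
            gap = packet.time_s - last_seen[key]
            if gap >= min_gap_s:
                gaps.append((packet, gap))
        last_seen[key] = packet.time_s
    return gaps


if __name__ == "__main__":
    capture = [
        Packet(0.00, "monolith", "server1", "nfs-1"),
        Packet(0.02, "monolith", "server2", "nfs-2"),
        Packet(30.00, "monolith", "server1", "nfs-1"),  # resent after 30 s
    ]
    for packet, gap in find_retransmission_gaps(capture):
        print(f"{packet.src} -> {packet.dst}: resent after {gap:.1f} s")
```

A gap close to the delay that users experience, such as the 30 seconds in the EXAMPLE, is the signature to look for.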

Address Tracker

Use Address Tracker, which polls managed devices, to determine the hubs to which the problem nodes are attached. If the problem end stations are located on unmanaged devices, then you can at least narrow the problem to those unmanaged devices.

EXAMPLE: Although your network does not have managed hubs that Transcend® NCS management software can poll, it does have managed switches. When it polls the switches, Address Tracker displays the switch ports on which addresses were last seen. This information indicates the hub (but not the hub port) on which the device is located.

LANsentry Manager Packet Decode

After you know the location of the hub that has the problem node, monitor the problem from the hub using LANsentry Manager Packet Decode.

Follow these steps:

1. To capture packets from one of the nodes on the hub, set up another capture buffer and repeat the exercise that is described in LANsentry Manager Packet Capture. Because a delay may occur on a different node, use two capture buffers without stopping the first one.

Note the rough packet count where the delay appears.

2. Display a conversation filter of the packet where the delay appears and look for the gap in the conversation.

EXAMPLE: You hope that the nodes are on the same hub. You find that all the nodes are on HUB3. This result indicates that FCS Errors may be causing the timeouts. However, because the errors occur at a low rate, you decide to verify this diagnosis. You monitor the problem from the hub, logging in and out many times, and the delay eventually occurs. This time, the delay shows that the node's reply had an FCS Error even though the node received the request. The switch would not have transmitted this packet, causing a timeout on the NFS protocol. The retry time is presumably 30 seconds. During this test, you see the problem occurring on another node.

Correcting the Fault

Without a managed hub, you may find it very difficult to discover network file server timeout errors. To find the problematic node, you must either systematically isolate nodes by monitoring each node for a prolonged period or temporarily insert a managed hub.

EXAMPLE: You notice that the captured error packet failed FCS because it was corrupted by a regular pattern during transmission. A possible reason for this occurrence is a Jabbering node. This explanation makes sense because FCS/Jabber frames increased linearly when you were monitoring the live network.

Network File Server Timeouts Reference

This section explains terms that are relevant to network file server timeouts and provides additional conceptual and problem analysis detail.

Jabbering

Jabbering occurs when a node transmits illegal-length packets and is possibly not operating within carrier specifications. In effect, another node has written bad data over a valid packet. This bad data is often interpreted as a repeated sequence of data.

Network File System (NFS) Protocol

A distributed file system protocol developed by Sun Microsystems that allows a computer system to access files over a network as if they were on its local disks. This protocol has been incorporated into products by more than 200 companies. It is now a de facto Internet standard.

NFS is one protocol in the NFS suite of protocols, which includes NFS, RPC, XDR (External Data Representation), and others. These protocols are part of a larger architecture that Sun Microsystems refers to as Open Network Computing (ONC). ONC is a distributed applications architecture designed by Sun and currently controlled by a consortium led by Sun.

Troubleshooting TCP/IP - Detailed Steps

This article shows how to troubleshoot TCP/IP connectivity between computers on a Windows network. If you haven't already done so, disable XP's Internet Connection Firewall on all local area network connections and remove all firewall programs on the network. Improperly configured firewalls are the most common cause of TCP/IP problems.


Open a Command Prompt Window

For many of these steps, you'll be typing at the command prompt. To open a command prompt window in Windows 2000 or XP, click Start | Run, type cmd in the box, and click OK. To open a command prompt window in Windows 95, 98, or Me, click Start | Run, type command in the box, and click OK. Type one command per line, and press Enter after each one to execute it. To close the command prompt window, use the exit command.

Determine the TCP/IP Settings

Determine the TCP/IP settings of each computer on the local area network. In XP, open the Network Connections folder, right-click the LAN connection, and click Status | Support | Details. For example, here are the Status and Details views for the LAN connection on an Internet Connection Sharing host.

In Windows 95/98/Me, click Start | Run, type winipcfg in the box, and click OK. Select the LAN adapter from the menu and click More Info. Here's the winipcfg view for an ICS client running Windows Me.

You can also see the TCP/IP settings from the command prompt. This is especially convenient if a computer has more than one network adapter. Use the ipconfig /all command, which is available in all versions except Windows 95. The output from this command can be long, so it's best to write it to a file. Specify the file name in the command this way:

ipconfig /all >ipconfig.txt
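Once the settings are saved to a file, they can be read back with a short script. The Python sketch below is an illustration only: it assumes the saved text uses the usual "Label . . . : value" layout, the sample text is made up, and the function name parse_ipconfig is hypothetical. It keeps the first value seen for each label, so it is best suited to a computer with a single adapter.

```python
from typing import Dict


def parse_ipconfig(text: str) -> Dict[str, str]:
    """Collect 'Label . . . : value' lines into a dictionary."""
    settings: Dict[str, str] = {}
    for line in text.splitlines():
        if " : " not in line:
            continue  # headings, blank lines, and empty values
        label, value = line.split(" : ", 1)
        key = label.strip().rstrip(". ").strip()  # drop the dot leaders
        value = value.strip()
        if key and value:
            settings.setdefault(key, value)  # keep the first occurrence
    return settings


# Made-up sample in the assumed layout, not output from a real computer.
sample = """
Ethernet adapter Local Area Connection:

        IP Address. . . . . . . . . . . . : 192.168.0.101
        Subnet Mask . . . . . . . . . . . : 255.255.255.0
        Default Gateway . . . . . . . . . : 192.168.0.1
"""
for name, value in parse_ipconfig(sample).items():
    print(f"{name}: {value}")
```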

Description of TCP/IP Settings

Here are the TCP/IP settings that are used in network troubleshooting:

• IP Address – Unique address assigned to a network adapter. A computer with multiple network adapters has an IP address for each one, and each one must be in a different subnet.

• Subnet Mask – Used in conjunction with the IP address to determine which subnet an adapter belongs to. At the simplest level, communication is only possible between two network adapters when they're in the same subnet.

• Default Gateway – IP address of a computer or router, on one of this computer's local area networks, that knows how to communicate with subnets not present on this computer. For an Internet connection, the default gateway is a router belonging to your Internet service provider, and all access to sites on the Internet goes through it. For an ICS client, the default gateway is the ICS host. If you use a hardware router, it serves as the default gateway.

• DHCP Server – If an adapter is configured to obtain an IP address automatically, this is the address of the server that provides it. It could be your ISP, an ICS host, or a hardware router.

• DNS Servers – IP address of one or more Domain Name Server computers. DNS servers translate Internet names to their IP addresses (like 63.146.109.227).
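The way the subnet mask and IP address combine can be checked with Python's standard ipaddress module. This is a minimal sketch: the function name same_subnet and the sample addresses are illustrative assumptions, not values taken from a real network.

```python
import ipaddress


def same_subnet(ip_a: str, ip_b: str, mask: str) -> bool:
    """Return True when both addresses fall in the same subnet for the given mask."""
    # strict=False lets ip_network() accept a host address and mask it down
    # to the network address.
    net_a = ipaddress.ip_network(f"{ip_a}/{mask}", strict=False)
    net_b = ipaddress.ip_network(f"{ip_b}/{mask}", strict=False)
    return net_a == net_b


# Illustrative addresses only.
print(same_subnet("192.168.0.1", "192.168.0.101", "255.255.255.0"))
print(same_subnet("192.168.0.1", "192.168.1.101", "255.255.255.0"))
```

Two adapters for which this check fails cannot talk to each other directly and need a gateway between them.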

Subnets

See our article on subnets for a brief description of how they work. For more details, see this Microsoft Knowledge Base article.

If two computers are supposed to be on the same subnet but aren't, something is wrong with the network hardware or software configuration. This is most likely to happen when one of them receives an IP address of 169.254.x.x, which indicates that:

• It's configured to obtain an IP address automatically.
• It couldn't find a DHCP server on the network to make the assignment.
• Windows assigned it an Automatic Private IP Address.

See our article on Specific Networking Problems and Their Solutions for more information.
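A quick way to spot an automatically assigned private address is to test whether it falls in the 169.254.0.0/16 block. The Python sketch below uses the standard ipaddress module; the function name and sample addresses are illustrative assumptions.

```python
import ipaddress

# The address block that Windows uses for Automatic Private IP Addressing.
APIPA_BLOCK = ipaddress.ip_network("169.254.0.0/16")


def is_automatic_private(ip: str) -> bool:
    """Return True when the address looks like a 169.254.x.x self-assigned address."""
    return ipaddress.ip_address(ip) in APIPA_BLOCK


for address in ("169.254.12.34", "192.168.0.5"):
    status = "no DHCP server was found" if is_automatic_private(address) else "looks normal"
    print(f"{address}: {status}")
```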

Pinging

The ping command is the basic tool for testing TCP/IP connectivity. It sends a special packet (called ICMP Echo) to a particular IP address and looks for a reply. If everything is working right, the reply comes back. If not, the ping times out in a few seconds. By default, the ping command repeats the process four times. Here's an example of an ICS client computer pinging a Windows XP Home Edition ICS host, using the host's IP address and its computer name.

When ping fails, you'll see one of these error messages:

• Request timed out – The IP address is valid, but there's no reply from it. If the IP address is on a local area network, the most likely cause is a firewall program blocking the ping.

• Unknown host <name> or Ping request could not find host <name> – The computer name doesn't exist on the local area network. Make sure that NetBIOS over TCP/IP is enabled.

• Destination host unreachable – The IP address isn't on a local area network, and the default gateway can't access it. Either there's no default gateway, its address is wrong, or it isn't functioning.
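The three messages above can be mapped to their likely causes with a small lookup. The Python sketch below is an illustration: the function name is hypothetical, and it assumes the ping output has already been captured as text.

```python
def explain_ping_failure(output: str) -> str:
    """Map a ping error message to the likely cause described above."""
    text = output.lower()
    if "request timed out" in text:
        return "Address is valid but did not reply; on a LAN, suspect a firewall."
    if "unknown host" in text or "could not find host" in text:
        return "Name not found on the LAN; check that NetBIOS over TCP/IP is enabled."
    if "destination host unreachable" in text:
        return "Address is off the LAN and the default gateway is missing, wrong, or down."
    return "No known failure message found."


print(explain_ping_failure("Request timed out."))
print(explain_ping_failure("Ping request could not find host win98."))
```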


Pinging the Local Area Network

Here is a series of ping commands to use in finding where a problem occurs on a local area network. Run them in the order shown, and don't go on to the next command until all of the previous commands work properly. In this example:

• The computer being tested is named Winxp, with IP address 192.168.1.101.
• There's another computer on the network, named Win98, with IP address 192.168.1.123.

Substitute the appropriate IP addresses and computer names for your network.

Command              Target                          What Ping Failure Indicates
ping 127.0.0.1       Loopback address                Corrupted TCP/IP installation
ping localhost       Loopback name                   Corrupted TCP/IP installation
ping 192.168.1.101   This computer's IP address      Corrupted TCP/IP installation
ping winxp           This computer's name            Corrupted TCP/IP installation
ping 192.168.1.123   Another computer's IP address   Bad hardware or NIC driver
ping win98           Another computer's name         NetBIOS name resolution failure

To fix a corrupted TCP/IP installation on Windows XP, follow the steps in this Microsoft Knowledge Base article. For Windows 95/98/Me, un-install the TCP/IP protocol in Control Panel | Network, reboot, and re-install it. If that doesn't fix it, use this procedure on Windows 95 or 98.
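The ordered ping sequence for the local area network can be sketched as a short Python script that stops at the first failing target and reports what that failure indicates. This is an illustration under stated assumptions: the function names are hypothetical, the targets are the example names and addresses (substitute your own), and system_ping relies on the exit code of the operating system's ping command, which on some systems can report success even when the reply is an error message.

```python
import platform
import subprocess
from typing import Callable, List, Optional, Tuple

Check = Tuple[str, str]  # (ping target, what a failure indicates)


def system_ping(target: str) -> bool:
    """Send one echo request using the operating system's ping command."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    result = subprocess.run(
        ["ping", count_flag, "1", target],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def first_failure(checks: List[Check],
                  ping: Callable[[str], bool] = system_ping) -> Optional[Check]:
    """Run the checks in order and return the first one that fails, or None."""
    for target, meaning in checks:
        if not ping(target):
            return target, meaning
    return None


# Example targets only; replace them with your own names and addresses.
LAN_CHECKS: List[Check] = [
    ("127.0.0.1", "Corrupted TCP/IP installation"),
    ("localhost", "Corrupted TCP/IP installation"),
    ("192.168.1.101", "Corrupted TCP/IP installation"),
    ("winxp", "Corrupted TCP/IP installation"),
    ("192.168.1.123", "Bad hardware or NIC driver"),
    ("win98", "NetBIOS name resolution failure"),
]


def fake_ping(target: str) -> bool:
    """Stand-in for a real ping: pretends the other computer never answers."""
    return target not in {"192.168.1.123", "win98"}


print(first_failure(LAN_CHECKS, ping=fake_ping))
```

Passing the ping function as a parameter keeps the ordering logic testable without touching the network; call first_failure(LAN_CHECKS) to use the real ping command.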

Pinging the Internet

You can also use ping to find a problem with Internet access. Run these commands in the order shown, and don't go on to the next command until all of the previous commands work properly. Use the Default Gateway and DNS Server addresses that you got from the winipcfg or ipconfig /all command.

Command                  Target                What Ping Failure Indicates
ping w.x.y.z             Default Gateway       Default Gateway down
ping w.x.y.z             DNS Server            DNS Server down
ping w.x.y.z             Web site IP address   Internet service provider or web site down
ping www.something.com   Web site name         DNS Server down or web site down

REFERENCE

WWW.PRACTICALLYNETWORK.COM

WWW.COMPUTERHOPE.COM

WWW.WIKIPEDIA.ORG



Isolated station

Sometimes a network wraps a particular station out of the ring. Stations on either side of a problem station can be wrapped. This effectively isolates the station or links that have problems, as shown in Figure 12.

Figure 12. Wrapped Ring with Isolated Station

If a ring was already wrapped when a network wraps a station out of the ring, then a segmented ring results, as shown in Figure 13.

Figure 13. Segmented Ring with Isolated Stations

Twisted ring

In a twisted ring, an A port is connected to an A port and a B port is connected to a B port, instead of the normal A-to-B connections. A twisted ring, which always has two twist points (stations), can exist in either a Thru or Wrap state. You can monitor the Twisted Ring Condition and Undesired Connection Attempt Event for evidence of twisted ring and other connection problems.

Identifying the Problem

To identify the problem, follow this process:

1. At the FDDI LAN level, verify that your network is operating.

If the network is operating, the FDDI ring may be segmented, and therefore an FDDI station or an Ethernet station on an Ethernet link may have lost connectivity to other nodes on the network.

2. Determine if a ring is in a Thru, Wrap, or Segmented state.

If the FDDI ring is segmented or wrapped, look for a problem with a link somewhere in the network or for a nonfunctioning node on your trunk ring. If the ring is operating and is not segmented, or if it is segmented but you still have connectivity to the stations in question, move to a more specific level in your network.

See Monitoring FDDI Connections for more information.

3. Determine if the poorly performing station is an Ethernet or FDDI station.

If the problem is an FDDI station, find out if it is congested (that is, if the station is so busy that it cannot accept all the network traffic that is directed to it) by determining its Bandwidth Utilization. Also determine if the station has a high frame error rate by looking at the FDDI Ring Errors.

If the problem is an Ethernet station, look for congestion by examining Ethernet Packet Loss and Bandwidth Utilization.

Solving the Problem

Identify the station that is causing the disconnection and take the appropriate steps:

• If the disconnection is caused by a wrapped ring, then fix the hardware or cabling problem at that station.

• If the station is congested, you have a device problem rather than a network problem. For example, if the congested station is a file server and every other machine on the network is retrieving and saving files using that server, consider upgrading your server or adding additional servers to the network. A variety of devices from different vendors may be communicating on an FDDI or Ethernet network; some are faster and more capable, and some are slower and more prone to congestion.

• If the station is an Ethernet station that is attached to an Ethernet segment, reevaluate the setup of your Ethernet network and make some changes to improve its performance.

You can also make FDDI connections more resilient by implementing dual homing or installing an Optical Bypass Unit (OBU) where FDDI connections are prone to fail. See Making Your FDDI Connections More Resilient for more information.


Monitoring FDDI Connections

Monitor your FDDI devices for Warning or Critical alerts in the FDDI Status tool.

Status Watch

Use Status Watch to identify these FDDI connectivity errors:

• Peer Wrap Condition
• Twisted Ring Condition
• Undesired Connection Attempt Event

Follow these steps:

1. In the Device area, select the device that is located where you suspect an FDDI ring connectivity problem.

2. Monitor the FDDI Status tool for the currently selected device.

Here are some pointers for monitoring:

• If the Peer Wrap Configuration State variable is Isolated, the device is not connected to the FDDI trunk ring. If you intend the device to remain isolated, this indication is not a serious condition. However, if the device is supposed to be connected on a trunk ring, a serious problem may exist. The device is no longer transmitting packets to the larger trunk ring.

• If the Peer Wrap flag (SMTPeerWrapFlag) is set, the device is one of the wrap points. The cause of the wrapped ring is somewhere in the portion of the network between the two stations that report the peer wrap condition.

Making Your FDDI Connections More Resilient

When devices are removed from an FDDI ring, there is a break in the fiber path that causes the ring to wrap until the ring is made whole again. To prevent the break in the FDDI connection, you can implement dual homing or install an Optical Bypass Unit (OBU).

Implementing Dual Homing

When the operation of a dual attachment node is critical to your network, dual homing adds reliability by providing a backup connection if the primary link fails. Because a dual attachment station (DAS) has two attachments to the FDDI ring (A-to-M and B-to-M), you can use one of them as a standby link if the active link fails. Using dual homing, only one of the two attachments is active at a time. In this sense, a DAS acts as if it is a single attachment station (SAS) by using its A port as the standby link.

Through SMT, a DAS can be dual homed to the same concentrator or, more commonly, to two concentrators. This arrangement provides a more stable trunk ring of concentrators. If one concentrator fails, the DAS enables the standby link to another concentrator to become the active link. See Figure 14.

If the station is a dual path or dual path/dual MAC station, you can configure the dual-homed station in one of two ways:

• With both links active
• With one link active and one connection withheld as a backup, only becoming active when one link fails

Figure 14. Dual Homing Configuration

Installing an Optical Bypass Unit

You can insert an Optical Bypass Unit (OBU) into the FDDI ring as if it were a node and then plug your device into it. To use an OBU, your device needs an optical bypass interface. This interface lets the bypass know whether your device is still on the ring or not. See Figure 15.

If your device is removed, or if it fails, the bypass unit diverts the optical path away from your device, keeping the ring whole. You can use a bypass on devices that are prone to failure or are likely to be removed often, such as diagnostic equipment.

Figure 15. Optical Bypass Unit Configuration

FDDI Connectivity Reference

This section explains terms that are relevant to FDDI connectivity and provides additional conceptual and problem analysis detail.

Peer Wrap Condition

A Peer Wrap (wrapped ring) condition occurs when a dual-attachment station detects a fault (often a lost connection) and reconfigures the network by wrapping the dual trunk rings to form a single ring. Normally, the two stations that are adjacent to the fault wrap to maintain full connectivity. However, if a second fault occurs before the first is repaired, the network partitions itself into two or more rings, and stations lose connectivity.

When a station reports a Peer Wrap condition, locate and repair the problem that caused the station to wrap the rings. Potential causes include:

• Faulty FDDI port hardware
• Faulty cables or connectors
• Unplugged connectors
• Powered-down stations

You can expect to find the cause of the problem between the two stations that report the Peer Wrap condition.


Twisted Ring Condition

A Twisted Ring condition occurs when certain undesirable connection types exist. See Table 7 for more information. Although similar to the Undesired Connection Attempt, the Twisted Ring condition provides specific Station Management (SMT) and port information for diagnosis.

Undesired Connection Attempt Event

An Undesired Connection Attempt event occurs when a port tries to connect to another port of a type that may result in an undesirable network topology. Whether the connection attempt is successful depends on the current setting of the station's connection policies.

Table 7 lists connections that the FDDI standard defines as undesirable. The managed devices may or may not permit these connections, depending on their FDDI station configurations.

Table 7. Undesirable Connection Types

Connection Type   Reason That the Connection Is Undesirable
A-A               Twisted primary and secondary rings
A-S               A wrapped ring
B-B               Twisted primary and secondary rings
B-S               A wrapped ring
S-A               A wrapped ring
S-B               A wrapped ring
M-M               A tree of rings topology (illegal connection)

Table 8 lists FDDI connections that create valid topologies.

Table 8. Valid Connection Types

Connection Type   Reason That the Connection Is Valid
A-B               A normal trunk peer connection
A-M               A tree connection with possible redundancy. In a single MAC node, Port B has precedence (by default) for connecting to a Port M
B-A               A normal trunk ring peer connection
B-M               A tree connection with possible redundancy. In a single MAC node, Port B has precedence (by default) for connecting to a Port M
S-S               A single ring of two slave stations
S-M               A normal tree connection
M-A               A tree connection that provides possible redundancy
M-B               A tree connection that provides possible redundancy
M-S               A normal tree connection
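Tables 7 and 8 together cover every pairing of the four FDDI port types, so they can be expressed as a simple lookup. The Python sketch below copies the table entries; the function name classify_connection is a hypothetical helper for illustration and is not part of any management tool.

```python
from typing import Tuple

# Entries copied from Table 7 (undesirable) and Table 8 (valid).
UNDESIRABLE = {
    "A-A": "Twisted primary and secondary rings",
    "A-S": "A wrapped ring",
    "B-B": "Twisted primary and secondary rings",
    "B-S": "A wrapped ring",
    "S-A": "A wrapped ring",
    "S-B": "A wrapped ring",
    "M-M": "A tree of rings topology (illegal connection)",
}
VALID = {
    "A-B": "A normal trunk peer connection",
    "A-M": "A tree connection with possible redundancy",
    "B-A": "A normal trunk ring peer connection",
    "B-M": "A tree connection with possible redundancy",
    "S-S": "A single ring of two slave stations",
    "S-M": "A normal tree connection",
    "M-A": "A tree connection that provides possible redundancy",
    "M-B": "A tree connection that provides possible redundancy",
    "M-S": "A normal tree connection",
}


def classify_connection(local_port: str, remote_port: str) -> Tuple[str, str]:
    """Return ('valid' or 'undesirable', reason) for a pair of port types."""
    key = f"{local_port.upper()}-{remote_port.upper()}"
    if key in UNDESIRABLE:
        return "undesirable", UNDESIRABLE[key]
    if key in VALID:
        return "valid", VALID[key]
    raise ValueError(f"unknown port types: {key}")


print(classify_connection("A", "A"))
print(classify_connection("S", "M"))
```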


Disabling the Offending Interface

Because broadcast storms can ultimately cause your whole network to become unavailable, take action immediately to disable the offending interface. You can enable the interface again after you have corrected the problem.

Address Tracker

Use Address Tracker to locate the interface that is causing the broadcast storm. Use Device View to disable the port.

Follow these steps:

1. In the Find Address window, enter the address of the interface that seems to be receiving the broadcast traffic.

You can copy the MAC or IP address from the Status Watch report and paste it into Address Tracker's Enter the Address You Want to Find field.

2. Click Find Now.

Search displays the device name.

3. Use Transcend Central to launch Device View and disable the port.

Disabling the port stops the broadcast storm before it interferes with all vital network traffic. You can re-enable this interface using Device View or the device's console later.

Correcting Spanning Tree Misconfigurations

Spanning Tree does not cause broadcast storms, but a loop in your Spanning Tree topology can create data that looks like a storm. A loop can occur in your topology if:

• Someone disables Spanning Tree on a port
• You set up your Spanning Tree configuration incorrectly


Device View

Use Device View to disable any Spanning Tree port that has a repeater attached to it and to correct Spanning Tree misconfigurations.

To correct Spanning Tree misconfigurations, use Device View to disable Spanning Tree Protocol (STP) for a port on a SuperStack® II Switch 1000, Switch 3000, Switch 3000 10/100, Switch 9000SX, Desktop Switch, LinkBuilder® FMS II Bridge/Management Module, or CoreBuilder 6000.

To disable the STP port state for a port on a SuperStack II switch:

1. Select a port and click the right mouse button.

2. From the shortcut menu, select Configure.

3. In the Port section, click the STP tab.

4. From the STP Port State list box, select Disabled.

5. Click Apply.

To disable the STP port state for a port on a LinkBuilder FMS II Bridge/Management Module:

1. Double-click the module.

2. From the shortcut menu, select Configure Bridge.

3. In the Port section, click the STP tab.

Broadcast Storms Reference

This section explains terms that are relevant to broadcast storms and provides additional conceptual and problem analysis detail.

Broadcast Packets

Broadcast packets, which are a normal part of network operation, are transmitted by a device to a broadcast address. For example, IP networks use broadcasts to resolve network addresses using Address Resolution Protocol (ARP). IPX networks use a large number of broadcast packets to operate most effectively.

Problems arise when broadcast packets endlessly propagate throughout the network, which increases the traffic volume on your network and the CPU time that each host spends processing and discarding unwanted broadcast packets.

Multicast Packets

Multicast packets, which are a normal part of network operation, are transmitted by a device to a multicast group address. Hosts that want to receive the packets indicate that they want to be members of the multicast group, and then multicast packets are distributed to that group. For example, multicast packets support the Spanning Tree Protocol. Multicast applications and underlying multicast protocols control multimedia traffic and shield hosts from processing unnecessary broadcast traffic. However, multicast traffic can also cause storms that saturate your network.

Duplicate Addresses

Use these sections to identify and correct problems caused by duplicate MAC and IP addresses:

• Duplicate Addresses Overview
• Finding Duplicate MAC Addresses
• Finding Duplicate IP Addresses

See Duplicate Addresses Reference for additional conceptual and problem analysis detail.

Networks sometimes generate duplicate MAC and IP addresses. Because duplicate addresses can cause problems with packet delivery, resolve them as soon as possible.

Understanding the Problem

Duplicate MAC addresses are caused by data link layer problems with Fiber Distributed Data Interface (FDDI) media and the passing of tokens on the FDDI ring. Duplicate IP addresses are caused by network layer problems. See these sections for more information about causes of duplicate addresses:

• Duplicate MAC Addresses
• Duplicate IP Addresses

Identifying the Problem

Identify duplicate MAC and IP addresses by following the instructions in these sections:

• Finding Duplicate MAC Addresses
• Finding Duplicate IP Addresses

Solving the Problem

Identify the cause of the duplicate address (such as user error or a hardware problem) and fix the problem if possible.

Finding Duplicate MAC Addresses

To find out if duplicate MAC addresses are occurring, monitor your network using Status Watch.

Status Watch

The Status Watch FDDI Status tool identifies duplicate FDDI MAC addresses, and Status Watch reports when two or more MACs on the same ring have the same MAC address (a Duplicate Address condition).

Follow these steps:

1. In the Status Watch Summary View window, determine if any FDDI Status conditions are reported. If there are, double-click the table cell value to display the Device List window.

Another approach is to examine only the devices that you know reside on your FDDI ring. In the Status Watch main window, red device icons indicate that a threshold has been exceeded.

2. Select a device.

• If you selected the device from the Device List window, the real-time report for that device appears in the Status Watch main window.

• If you selected the device from the main window, also select the FDDI Status tool to view the real-time report.

3. Determine if a Duplicate Address condition caused the FDDI Status tool to trigger a Critical or Warning status for that device.

In Status Watch, you can specify the status severity level to apply to a Duplicate Address condition.

Finding Duplicate IP Addresses


To find out if duplicate IP addresses are occurring monitor your network using these applications

bull Address Tracker - To find duplicate IP addresses on 3Com devices and their attached networks

bull LANsentry Manager reg - To find duplicate IP addresses that are collected by probes gathering RMON2 SmartAgentreg data from the Enterprise Communications Analysis Module (ECAM) downloaded on your network devices

Address Tracker

Use Address Tracker to determine when and where duplicate IP addresses occur

Follow these steps

1 From the Find Address menu select Find Duplicate IP Addresses

2 Click Find Now to start your search

LANsentry Manager

Use the Duplicates table in LANsentry Manager to compile a list of all stations with duplicate IP addresses This table is available only on probes that have downloaded RMON2 (ECAM) SmartAgent software

Follow these steps

1 From the LANsentry Manager Address Map menu select Duplicates Address Map data is displayed as a table

2 To export the contents of the table click Export to launch the Data Export dialog box

Duplicate Addresses Reference

This section explains terms that are relevant to duplicate addresses and provides additional conceptual and problem analysis detail.


Duplicate MAC Addresses

Each device on your network has a unique MAC address. This address identifies a single device on the network, allowing packets to be delivered to the correct destinations.

Packets are delivered to their destinations by means of the MAC-address-to-IP-address translation that the Address Resolution Protocol (ARP) provides. Therefore, if MAC addresses are duplicated on the network, the ARP caches of routing devices contain erroneous destinations. In FDDI, devices monitor network traffic, searching for their own MAC address in each packet to determine whether to decode the packet. If MAC addresses are not unique, two stations cannot be distinguished from each other.

Duplicate MAC addresses can occur for the following reasons:

• Someone has manually configured a MAC address for a device, instead of using the address that the vendor supplied or allowing it to be assigned dynamically, and this address is also assigned to a different device.

• In rare circumstances, loops in a bridged network can cause a MAC hardware problem or an address learning problem that creates a duplicate MAC address entry in the bridging address table.

• On DECnet Phase 4 networks, MAC addresses are set from the DECnet address. A duplicate NET address can cause a duplicate MAC address.

Duplicate IP Addresses

Because IP addresses are critical for transmission of packets on TCP/IP networks, resolve duplicate IP addresses immediately.

Duplicate IP addresses can occur when someone has configured an IP address that is identical to an IP address that is assigned to a different device. Address assignments, although possible for you to configure manually, are usually made using one of these protocols:

• Dynamic Host Configuration Protocol (DHCP) - Allows your network to dynamically assign IP addresses to nodes. With this protocol, a DHCP server temporarily assigns an IP address to a node, or you can statically configure addresses as needed.

• BOOTstrap protocol (BootP) - Allows you to statically assign IP addresses to nodes. This protocol is more efficient than RARP.

• Reverse ARP (RARP) - Allows you to statically assign IP addresses to nodes. However, because this protocol relies on the MAC address to identify the node, you cannot use it on networks that dynamically assign hardware addresses.
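As a rough illustration of what a duplicate IP check looks for, the following Python sketch flags any IP address that is seen with more than one MAC address in a list of observed (IP, MAC) pairs. The function name and sample addresses are hypothetical, and the sketch assumes the pairs have already been collected by some other means (for example, from ARP caches); it does not describe how Address Tracker or LANsentry Manager work internally.

```python
from collections import defaultdict


def find_duplicate_ips(observations):
    """Return {ip: sorted MAC list} for every IP claimed by more than one MAC.

    `observations` is an iterable of (ip, mac) string pairs (hypothetical input).
    """
    macs_by_ip = defaultdict(set)
    for ip, mac in observations:
        # Normalize case so the same MAC written two ways is not counted twice.
        macs_by_ip[ip].add(mac.lower())
    return {ip: sorted(macs) for ip, macs in macs_by_ip.items() if len(macs) > 1}


if __name__ == "__main__":
    sample = [
        ("10.0.0.5", "00:A0:24:11:22:33"),
        ("10.0.0.6", "00:a0:24:44:55:66"),
        ("10.0.0.5", "00:a0:24:77:88:99"),  # second device claiming 10.0.0.5
    ]
    for ip, macs in find_duplicate_ips(sample).items():
        print(ip, "is claimed by", ", ".join(macs))
```

A MAC address that appears with several IP addresses is not flagged here, because that can be legitimate (for example, a multihomed host or a router).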


Ethernet Packet Loss

Use these sections to identify and correct Ethernet packet loss:

• Ethernet Packet Loss Overview
• Searching for Packet Loss

See Ethernet Packet Loss Reference for additional conceptual and problem analysis detail.

If your Ethernet network shows signs of congestion, it may be experiencing packet loss. When your network is congested, utilization is usually high, packets are discarded because buffers are full, and collision rates are up. Problems related to Collisions are often at the heart of packet loss.

Understanding the Problem

Collisions are normal in Ethernet networks. In many cases, Collision rates of 50 percent do not cause a large decrease in throughput. The Collision rate helps mark the upper limit on your network (the maximum percentage of collisions that your network can bear), which is usually around 70 percent. If Collisions increase above this upper limit, your network can become unreliable.

When the Collision rates increase, so do Excessive Collisions, which cause a delay in transmitting data. An increase in Collisions also indicates that network utilization and network errors, such as FCS Errors, are probably increasing.

The real packet problems to watch for, however, are undetected collisions that show up as Late Collisions.
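The thresholds above can be turned into a quick check. The sketch below assumes that the Collision rate is expressed as collisions per 100 transmitted frames, which is an assumption for illustration (the text does not define the denominator). The 70 percent upper limit is the approximate figure quoted above; the function names are hypothetical.

```python
def collision_rate_percent(collisions, frames_sent):
    """Collisions as a percentage of frames sent (assumed definition)."""
    if frames_sent == 0:
        return 0.0
    return 100.0 * collisions / frames_sent


def assess_collision_rate(rate_percent, upper_limit=70.0):
    """Compare a collision rate with the network's upper limit (about 70 percent)."""
    if rate_percent > upper_limit:
        return "above upper limit"
    return "within limit"


if __name__ == "__main__":
    rate = collision_rate_percent(collisions=750, frames_sent=1000)
    print(f"{rate:.1f}% collisions: {assess_collision_rate(rate)}")
```

In practice the upper limit should come from your own network's baseline rather than from the default used here.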

Identifying the Problem

To identify that your network's problem is related to packet loss, verify that frames are being dropped on your network by examining this packet loss data:

• Alignment Errors
• Collisions
• Excessive Collisions
• FCS Errors
• CRC Errors
• Late Collisions
• Receive Discards
• Too Long Errors
• Too Short Errors
• Transmit Discards


The process of identifying the problem is discussed in Searching for Packet Loss.

Solving the Problem

If you notice that packet loss data is consistently high, then your network is too congested. In this case, segment your network with the appropriate network device (such as a switch or router). If Collision data shows increases but your network's utilization is the same, then your network may have a physical problem, such as cabling that is too long. Other problems that packet loss data can indicate include:

• Faulty connectors or improper cabling
• Excessive numbers of repeaters between network devices
• Defective Ethernet transceivers or controllers

Possible solutions to these problems are explained in the procedures in Searching for Packet Loss.

Searching for Packet Loss

When you look for packet loss, use the following applications:

• Status Watch - For Ethernet and MIB-II data collection using SNMP polling
• LANsentry Manager Network Statistics Graph - For RMON data collection using an RMON probe
• Device View - On a per-device basis, you can evaluate statistics for any port on the device

Status Watch

Status Watch monitors:

• Alignment Errors
• Excessive Collisions
• FCS Errors
• Receive Discards
• Transmit Discards

Follow these steps:

1. Determine if the thresholds for the Alignment Errors tool and FCS Errors tool are being exceeded.


Table 16 identifies the problems that this data can indicate and your possible actions. For information about problems related to a nonstandard Ethernet implementation, see Nonstandard Ethernet Problems.

Table 16 Alignment Errors, FCS Errors, and CRC Errors Data

Possible Problem: Faulty cabling
Possible Action: Examine the cable and cable connections for breaks or damage.

Possible Problem: Network noise
Possible Action: Look for improper cabling, faulty cable, faulty network equipment, or cables that are too close to equipment that emits electromagnetic interference (lamps, for example).

Possible Problem: Faulty transceiver
Possible Action: Use an analyzer to identify the problematic transceiver. If necessary, replace the transceiver, network adapter, or station.

Possible Problem: Fault at the transmitting end station
Possible Action:
1. Locate the source of the errors by looking at the module and port statistics.
2. Verify the correct operation of the transceiver or adapter card of the device that is connected to the problem port.
3. If the card appears to be operating correctly, examine the cable and cable connections for breaks or damage.

Possible Problem: Station powering up or down
Possible Action: None required. Early implementations of Ethernet transceivers generate a significant amount of in-band noise when powering up; they frequently cause Alignment Errors and FCS Errors in an otherwise stable network. When powering up, some software drivers for Ethernet controllers also initiate Time Domain Reflectometry (TDR) tests to test the Ethernet media. Network monitors report TDR tests as Alignment Errors and FCS Errors.

Possible Problem: Faulty adapter
Possible Action: Replace the adapter.

2. Determine if the Excessive Collisions tool threshold is being exceeded. Table 17 identifies the problems that this data can indicate and your possible actions.

Table 17 Collisions and Excessive Collisions Data

Possible Problem: Busy network
Possible Action: Use a bridge, router, or switch to reconfigure your network into segments with fewer stations.

Possible Problem: Faulty device (adapter, switch, hub, and the like) that does not listen before broadcasting. This problem increases the incidence of all types of collisions.
Possible Action: Isolate each adapter to see if the problem stops.

Possible Problem: Network loop
Possible Action: Ensure that no redundant connections to the same station have both connections active simultaneously.


3. Determine if the thresholds for the Receive Discards and Transmit Discards tools are being exceeded.

If these errors are high in conjunction with the data that you learned in steps 1 and 2, then your network is overloaded. Segment your network.

LANsentry Manager Network Statistics Graph

Use the LANsentry Manager Network Statistics graph to view data for:

• Collisions
• Late Collisions
• Bandwidth Utilization
• CRC Errors
• Too Long Errors
• Too Short Errors

Follow these steps:

1. Display a Network Statistics graph for the local Ethernet segment on which users have reported poor performance.

This graph shows the most recent trend in Collision rates. If you have set up a History sample, you can also look at the historical trend. If a number of segments are connected by repeaters, examine the graph for each Ethernet segment.

2. Analyze Utilization and Collision rates to determine whether collisions are caused by an overloaded segment or a faulty component.

• If Utilization rates are high - The collisions are probably caused by an overloaded segment. If you have added nodes or new applications to your network, consider reconfiguring the cabling system, using bridges and routers to filter out remote collisions and to keep local traffic on one segment. This action should level the network load.

• If Utilization rates are stable and appear normal - The collisions are probably caused by faulty components. In this case, do the following:

• If the network consists of repeaters - Compare the Network Statistics graphs for each segment connected to the repeater. Because repeaters repeat traffic across all connected segments (which makes many segments seem like one network), you should see similar levels of traffic on all segments. One segment that shows dissimilar levels of traffic and collisions may indicate faulty hardware. In this case, monitor several collisions to track the source station that is transmitting too soon after collisions, and repair the station. Packets that are transmitted too soon after collisions are unlikely to be valid. See Table 17 for more information about Collisions.

• On other networks - Determine the segment cable length.

3. Examine the CRC Errors and Late Collisions, which often indicate cabling or component problems.

Table 16 identifies the problems that CRC Errors can indicate and your possible actions. Table 18 identifies the problems that Late Collisions data can indicate and your possible actions.

Table 18 Late Collisions Data

Possible Problem: Cabling problems
• Segment too long
• Failing cable
• Segment not grounded properly (noise)
• Improper termination
• Taps too close (10BASE-5 and 10BASE-2 only)
• Noisy cable
Possible Action: Correct the cabling problem by doing one or more of the following:
• Reduce the segment length
• Replace the cable
• Ground the cable
• Terminate the cable correctly
• Check the taps
• Check for cables too close to equipment that emits electromagnetic interference

Possible Problem: Component problems
• Deaf or partially deaf node
• Failing repeater, transceiver, or controller cards
Possible Action: Correct the component problem by doing one of the following:
• Trace the failing component and replace it
• Replace the NIC or the transceiver

4. Trace Too Short Errors and Too Long Errors to the sender.

These errors often indicate faulty routers or LAN drivers and transceiver problems. Table 19 identifies the problems that this data can indicate and your possible actions.

Table 19

Possible Problem: A transceiver on your network is adding bits to the packets that are transmitted by the attached station.
Possible Action:
1. Use a network analyzer to identify the problematic transceiver.
2. If necessary, replace the transceiver, network adapter, or station.

Possible Problem: The jabber protection mechanism on a transceiver has failed; it can no longer protect the network from the jabbering produced by the attached station.
Possible Action: Replace the network card.

Possible Problem: Excessive noise on the cable. Note: Some 10/100 Mbps cards that autodetect the network speed may connect to the network at the wrong speed, causing excessive noise.
Possible Action: Check for improper cabling, faulty cable, faulty network equipment, or cables too close to noisy electronic equipment (lamps, for example). If your network card autodetects the network speed and you have ruled out other problems, manually configure the network speed.

Possible Problem: Faulty routers (two different network types are connected and the router is not enforcing proper frame size restrictions)
Possible Action: Notify the manufacturer.

Possible Problem: Faulty LAN driver
Possible Action: Replace the driver.

Possible Problem: A normal condition on a LinkSwitch® 1000, LinkSwitch® 3000, or CoreBuilder 5000 FastModule
Possible Action: If you use maximum-sized 1518-byte Ethernet frames, the device's VLT-enabled ports add a frame tag of 4 bytes, resulting in a misleading Too Long Frame error. These frames are passed successfully but will create the Too Long Frame error message. If you want to eliminate the error message, reduce your Ethernet packet frames by 4 bytes.

Device View

Device View allows you to display a variety of port and device-level statistics relevant to Ethernet packet loss. Table 20 describes these statistics and their use in troubleshooting.

Table 20 Activity and Error Statistics in Device View

Statistics Group: Activity
Description: Displays the total network activity and errors on the selected port.
Use in Troubleshooting: This data shows readable packets, broadcast packets, Collisions, total errors, and runts, which cause Too Short Errors. You can interpret this data in the following ways:

• Runts can often be caused by Collisions; however, if the values increase at specific times of the day, it may indicate that you need to change the network topology to manage the traffic more efficiently (for example, with switches or routers).
• Runts can also be caused by a badly terminated coax cable.
• Large numbers of runts not associated with high levels of collisions can indicate a transmission problem (examine the cable).
• Particularly high numbers of Collisions compared to the total number of readable packets can point to a hardware problem (a bad adapter) or to a data loop.
• A high proportion of Broadcast Packets (>10%) on a heavily utilized network (>50% of available bandwidth) can point to an incorrectly configured bridge or router on the network.

Statistics Group: Errors
Description: Displays the number of frames with errors on the selected port.
Use in Troubleshooting: The significance of errors depends on accompanying errors and prevailing network conditions. See the following error data for more information:

• Alignment Errors: Table 16
• FCS Errors: Table 16
• Too Long Errors: Table 19
• Too Short Errors or runts: Table 19
• Late Collisions: Table 18

To display Activity and Errors statistics for a device or port, follow these steps:

1. Select the required port or device.

2. From the shortcut menu, select Activity or Errors.

The statistics available depend on the type of port or device selected. See Table 20 for troubleshooting information.
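The broadcast guideline in Table 20 (more than 10 percent broadcast packets on a network running above 50 percent utilization) can be expressed as a small check. This is only a sketch: the function and parameter names are hypothetical, and the counters are assumed to have been read from the Activity statistics by hand or by some export that is not described here.

```python
def broadcast_share_suspicious(broadcast_packets, readable_packets, utilization_percent):
    """Return True when broadcasts exceed 10% of packets on a >50% utilized network."""
    if readable_packets <= 0:
        return False
    broadcast_percent = 100.0 * broadcast_packets / readable_packets
    return broadcast_percent > 10.0 and utilization_percent > 50.0


if __name__ == "__main__":
    # 1,500 broadcasts out of 10,000 packets at 62% utilization
    if broadcast_share_suspicious(1500, 10000, 62.0):
        print("Check bridge and router configuration")
```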

Ethernet Packet Loss Reference

This section explains terms that are relevant to Ethernet packet loss and provides additional conceptual and problem analysis detail.

Alignment Errors

An Alignment Error indicates a received frame in which both of the following are true:

• The number of bits received is an uneven byte count (that is, not an integral multiple of 8).

• The frame has a Frame Check Sequence (FCS) error.

Alignment Errors often result from MAC layer packet formation problems, cabling problems that cause corrupted or lost data, and packets that pass through more than two cascaded multiport transceivers. See FCS Errors for more information about interpreting Alignment Errors.


Collisions

Collisions indicate that two devices detect that the network is idle and try to send packets at exactly the same time (within one round-trip delay). Because only one device can transmit at a time, both devices must stop sending and attempt to retransmit. Collisions are detected by the transmitting stations.

The retransmission algorithm helps to ensure that the packets do not retransmit at the same time. However, if the two devices retry at nearly the same time, packets can collide again; the process repeats until either the packets finally pass onto the network without collisions, or 16 consecutive collisions occur and the packets are discarded.
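The text does not name the retransmission algorithm. The sketch below assumes it is the truncated binary exponential backoff defined for CSMA/CD in IEEE 802.3: after the nth collision, a station waits a random number of slot times between 0 and 2^min(n, 10) - 1, and gives up after the 16th collision. The constant and function names are illustrative, and the slot time shown applies to 10 Mbps Ethernet.

```python
import random

SLOT_TIME_US = 51.2   # 512 bit times at 10 Mbps
MAX_ATTEMPTS = 16     # the frame is discarded after 16 consecutive collisions
BACKOFF_LIMIT = 10    # the exponent stops growing after the 10th collision


def backoff_slots(collision_count, rng=random):
    """Slot times to wait after the given collision, or None if the frame is dropped."""
    if collision_count >= MAX_ATTEMPTS:
        return None  # counted as an Excessive Collision
    exponent = min(collision_count, BACKOFF_LIMIT)
    return rng.randrange(2 ** exponent)


if __name__ == "__main__":
    for n in (1, 2, 3, 10, 15, 16):
        slots = backoff_slots(n)
        if slots is None:
            print(f"collision {n}: frame discarded")
        else:
            print(f"collision {n}: wait {slots} slots ({slots * SLOT_TIME_US:.1f} microseconds)")
```

Because each station draws its own random delay, two stations that collided are unlikely to pick the same slot again, and the widening range makes a repeat collision less likely on each retry.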

CRC Errors

A Cyclic Redundancy Check (CRC) Error is an RMON statistic that combines FCS Errors and Alignment Errors. These errors indicate that packets were received with:

• A bad FCS and an integral number of octets (FCS Errors)
• A bad FCS and a non-integral number of octets (Alignment Errors)

CRC Errors can cause an end station to freeze. If a large number of CRC Errors are attributed to a single station on the network, replace the station's network interface board. Typically, a CRC Error rate of more than 1 percent of network traffic is considered excessive.
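The distinction between the two kinds of CRC Error, and the 1 percent guideline, can be sketched as follows. The function names are hypothetical, and the bit counts and FCS results are assumed to come from a monitor or probe; the code only restates the definitions given above.

```python
def classify_crc_error(bits_received, fcs_ok):
    """Return None for a good frame, else 'FCS Error' or 'Alignment Error'."""
    if fcs_ok:
        return None
    if bits_received % 8 != 0:
        return "Alignment Error"  # bad FCS and a non-integral number of octets
    return "FCS Error"            # bad FCS and an integral number of octets


def crc_rate_excessive(crc_errors, total_packets, limit_percent=1.0):
    """True when CRC Errors exceed the given percentage of traffic (1% by default)."""
    if total_packets <= 0:
        return False
    return 100.0 * crc_errors / total_packets > limit_percent


if __name__ == "__main__":
    print(classify_crc_error(512, fcs_ok=False))   # FCS Error
    print(classify_crc_error(515, fcs_ok=False))   # Alignment Error
    print(crc_rate_excessive(25, 1000))            # True (2.5%)
```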

Excessive Collisions

Excessive Collisions indicate that 16 consecutive collisions have occurred, usually a sign that the network is becoming congested. For each excessive collision count (or after 16 consecutive collisions), a packet is dropped. If you know the normal rate of excessive collisions, then you can determine when the rate of packet loss is affecting your network's performance. See Knowing Your Network's Configuration for more information.

FCS Errors

Frame Check Sequence (FCS) Errors, a type of CRC Error, indicate that frames received by an interface are an integral number of octets long but do not pass the FCS check. The FCS is a mathematical way to ensure that all the frame's bits are correct without having the system examine each bit and compare it to the original. Packets with Alignment Errors also generate FCS Errors.

Both Alignment Errors and FCS Errors can be caused by equipment powering up or down or by interference (noise) on unshielded twisted-pair (10BASE-T) segments. In a network that complies with the Ethernet standard, FCS or Alignment Errors indicate bit errors during a transmission or reception. A very low rate is acceptable. Although Ethernet allows a 1 in 10^8 bit error rate, typical Ethernet performance is 1 in 10^12 or better.
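To make the FCS check concrete, the sketch below appends and verifies a 32-bit CRC using Python's zlib.crc32, which uses the same CRC-32 polynomial as the Ethernet FCS. Storing the value least significant byte first is an assumption that matches how the FCS usually appears in captured frames. The helper names are illustrative, and the code is a software model rather than a description of what an adapter does in hardware.

```python
import struct
import zlib


def append_fcs(frame):
    """Return the frame with a 4-byte CRC-32 appended (least significant byte first)."""
    return frame + struct.pack("<I", zlib.crc32(frame))


def fcs_ok(frame_with_fcs):
    """Recompute the CRC over the frame body and compare it with the trailing 4 bytes."""
    if len(frame_with_fcs) < 4:
        return False
    body, fcs = frame_with_fcs[:-4], frame_with_fcs[-4:]
    return struct.pack("<I", zlib.crc32(body)) == fcs


if __name__ == "__main__":
    good = append_fcs(bytes(60))            # 60 octets of zeros + 4-octet FCS
    bad = bytes([good[0] ^ 0x01]) + good[1:]  # flip one bit, as noise might
    print(fcs_ok(good), fcs_ok(bad))
```

The receiver never compares the frame with the original; it only recomputes the checksum, which is why a single flipped bit anywhere in the frame is enough to produce an FCS Error.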

Late Collisions

Late Collisions indicate that two devices have transmitted at the same time, but cabling errors (most commonly, excessive network segment length or repeaters between devices) prevent either transmitting device from detecting a collision. Neither device detects a collision because the time to propagate the signal from one end of the network to the other is longer than the time to put the entire packet on the network. As a result, neither of the devices that cause the late collision senses the other's transmission until the entire packet is on the network.

Although late collisions occur for small packets, the transmitter cannot detect them. As a result, a network suffering measurable Late Collisions for large packets is losing small packets as well.
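The timing argument can be checked with simple arithmetic. The sketch below assumes 10 Mbps Ethernet and compares the time needed to transmit a frame with the round-trip propagation delay of the segment; a sender can only notice a collision while it is still transmitting. Using the round-trip delay (rather than the one-way delay mentioned above) is an assumption based on how collision detection works, and the delay values are invented for illustration.

```python
def transmit_time_us(frame_octets, mbps=10):
    """Time in microseconds to put a frame of the given size on the wire."""
    return frame_octets * 8 / mbps


def collision_detectable(frame_octets, round_trip_delay_us, mbps=10):
    """True if the sender is still transmitting when a collision signal can return."""
    return transmit_time_us(frame_octets, mbps) >= round_trip_delay_us


if __name__ == "__main__":
    over_long_segment_delay = 60.0  # hypothetical round trip on an over-length segment
    for size in (64, 1518):
        seen = collision_detectable(size, over_long_segment_delay)
        print(f"{size}-octet frame: collision {'detected' if seen else 'missed'}")
```

With these example numbers, a minimum-size 64-octet frame (51.2 microseconds on the wire) is finished before the collision can be sensed, while a 1518-octet frame is still being sent and reports a Late Collision. This is the reason measurable Late Collisions on large packets imply silent loss of small ones.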

Nonstandard Ethernet Problems

Table 21 lists the symptoms that typically occur if a system violates the Ethernet standard.

Table 21 Symptoms of Common Ethernet Network Problems

Symptoms: FCS Errors and Alignment Errors increase significantly.
Problem: Network cabling is too long.
Notes: If you use a promiscuous network monitor, the number of Late Collisions reported by stations should correlate with the FCS and Alignment Errors reported by the monitor.

Symptoms: FCS and Alignment Errors increase proportionally with interference (sometimes referred to as noise hits).
Problem: Network segment is noisy.
Notes: Typically observed on a 10BASE-T network segment in a noisy environment. If you use multiple promiscuous monitors, the FCS and Alignment Errors among the monitors will not correlate. If the monitor can track runts, also called Too Short Errors, the number of runt packets should be significantly higher than normal.

Symptoms: FCS and Alignment Errors are much higher than normal.
Problem: Networks do not conform to the access scheme of Carrier Sense Multiple Access with Collision Detect (CSMA/CD).
Notes: Occurs when some implementations of Ethernet in the segment are not entirely compatible with IEEE 802.3 repeaters. Collision fragments linger on the network long enough to collide with retry packets at the minimum interpacket gap (IPG). The IPG is smaller on one side of the repeated network, causing a lost packet. Ethernet controllers cannot receive packets that are separated by 4.7 µs or less. Some controllers cannot sustain receptions of packets separated by as much as 9.6 µs. If runt packets are received one after another and are followed by a collision fragment, Ethernet controllers that cannot sustain reception will lose packets.

Receive Discards

Receive Discards indicate that received packets could not be delivered to a higher-layer protocol because of congestion or packet errors.

Too Long Errors

A Too Long Error indicates that a packet is longer than 1518 octets (including FCS octets) but otherwise well formed. Too Long Errors are often caused by a bad transceiver, a malfunction of the jabber protection mechanism on a transceiver, or excessive noise on the cable.

Too Short Errors

A Too Short Error, also called a runt, indicates that a packet is fewer than 64 octets long (including FCS octets) but otherwise well formed.

Transmit Discards

Transmit Discards indicate that packets were not transmitted because of network congestion.
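The two length limits can be illustrated with a short classifier. The function name is hypothetical; the 64-octet and 1518-octet bounds are the ones stated in the definitions, and both include the FCS octets.

```python
MIN_FRAME_OCTETS = 64     # shorter frames are runts (Too Short Errors)
MAX_FRAME_OCTETS = 1518   # longer frames are Too Long Errors


def classify_frame_length(octets):
    """Classify an otherwise well-formed frame by its length, FCS included."""
    if octets < MIN_FRAME_OCTETS:
        return "Too Short Error"
    if octets > MAX_FRAME_OCTETS:
        return "Too Long Error"
    return "OK"


if __name__ == "__main__":
    for size in (60, 64, 1518, 1518 + 4):
        print(size, classify_frame_length(size))
```

The last case shows why a 4-byte frame tag added to a maximum-sized frame is reported as Too Long even though the frame is passed successfully: 1518 + 4 = 1522 octets, which exceeds the limit.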

FDDI Ring Errors

Use these sections to identify and correct FDDI ring errors:

• FDDI Ring Errors Overview
• Identifying Ring Errors

Fiber Distributed Data Interface (FDDI) often corrects its own problems. However, because FDDI cannot correct all errors (especially those related to hardware problems), you should monitor FDDI errors.

Understanding the Problem

FDDI ring errors that you should monitor include:

• Elasticity Buffer Error Condition
• Frame Error Condition
• Frames Not Copied Condition
• Link Error Condition

Identifying the Problem

First, determine the type of FDDI ring errors and where they are occurring. Similar to the way you identify other FDDI problems, identify the upstream and downstream neighbors of the devices that you are monitoring.

Several types of network errors can cause FDDI performance problems. For example, problems with cables or physical connections may result in a link or frame error. Elasticity buffer (EB) errors can also lead to link and frame errors.

FDDI deals with port-related errors as follows:

• The variable PORTlerAlarm is the link error rate (LER) value at which a link connection generates an alarm. When the LER is greater than the alarm setting, Station Management (SMT) sends a Status Report Frame (SRF) to notify you that there is a problem with a port.

The PORTlerAlarm threshold is set lower than the PORTlerCutoff threshold so that you are notified of a problem before the port is actually removed from the ring.

• When link errors reach the threshold defined by the variable PORTlerCutoff, SMT breaks the connection, disabling the PHY that detected the problem. A Link Error Condition is also generated.

FDDI deals with MAC-related errors as follows:

• When MAC frame errors reach a certain threshold, a Frame Error Condition is generated. Because the actual error can be further upstream than the immediate connection, the connection remains intact.

• For a large network, the worst case MACFrameErrorRatio is less than 0.1 percent. However, during network configuration, frame error ratios can reach 50 percent for short periods. When you detect a sustained frame error ratio of more than 0.1 percent, a problem exists between the station that is reporting the condition and the nearest upstream MAC.
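The two-threshold behavior for ports and the sustained-ratio rule for MACs can be modeled roughly as below. This is a simplified sketch: it treats the link error rate as a plain number where larger means more errors, whereas real FDDI stations encode LER values differently, and it interprets "sustained" as every sample in a window exceeding the limit. Both are assumptions, and the function names are hypothetical.

```python
def port_ler_action(ler, alarm, cutoff):
    """Return 'ok', 'alarm' (SRF sent), or 'cutoff' (connection broken)."""
    if alarm >= cutoff:
        raise ValueError("alarm threshold must be lower than cutoff threshold")
    if ler >= cutoff:
        return "cutoff"
    if ler > alarm:
        return "alarm"
    return "ok"


def frame_errors_sustained(ratios_percent, limit_percent=0.1):
    """True if every sampled frame error ratio exceeds the limit (0.1% by default)."""
    return bool(ratios_percent) and all(r > limit_percent for r in ratios_percent)


if __name__ == "__main__":
    print(port_ler_action(ler=7, alarm=5, cutoff=10))       # alarm
    print(frame_errors_sustained([0.2, 0.3, 0.15]))         # True
    print(frame_errors_sustained([50.0, 0.05, 0.02]))       # False: a short spike only
```

The second example reflects the point made above: a brief spike to 50 percent during reconfiguration is not by itself a sign of a fault.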

Solving the Problem

To solve problems related to FDDI errors, fix the hardware, cabling, or congestion problem.

Identifying Ring Errors


Use Status Watch to monitor your FDDI devices for Warning or Critical alerts.

Status Watch

Use Status Watch to identify FDDI ring errors.

Follow these steps:

1. Monitor the FDDI Status tool for the currently selected device.

2. Determine whether Status Watch is reporting Elasticity Buffer Errors or a high percentage of Frame Errors, Frames Not Copied, or Link Error Rates for the currently selected device.

FDDI Ring Errors Reference

This section provides additional conceptual and problem analysis detail.

Elasticity Buffer Error Condition

The Elasticity Buffer Error condition occurs when a port's elasticity buffer overflows or underflows. This condition usually indicates that a port's hardware is not operating within the tolerances that the FDDI standard specifies. Look for the problem in the hardware of either the port that is reporting the condition or of the immediately adjacent port.

Frame Error Condition

The Frame Error condition occurs when the percentage of frames that contain errors exceeds a preset threshold. When a device is an uplink to FDDI (that is, a device is transmitting onto FDDI), this type of condition indicates that the ring is saturated. The ring is out of buffer space and packets are being dropped from the device's backbone port.

The problem indicated by the frame errors is usually located between the MAC that reports the condition and its upstream neighbor. Because many physical connections can lie along this path, the MACFrameErrorRatio variable identifies only the two MACs between which the problem is occurring.

Frames Not Copied Condition

The Frames Not Copied condition occurs when the percentage of frames that are dropped because of insufficient buffer space exceeds a preset threshold. This condition indicates that the station is congested and is unable to process frames as quickly as they arrive. To help eliminate congestion:

• Add more capacity to the station
• Reconfigure your network so that end stations that communicate heavily with one another are on the same bridge or switch
• Filter out certain traffic

Link Error Condition

The Link Error condition occurs when a port detects link errors at a rate that exceeds a preset threshold. When the Link Error threshold is exceeded, the station removes itself from the ring and tries to reinsert itself on the ring. This action creates a MAC Neighbor Change Event (which also occurs if a ring wraps).

Link errors may indicate an FDDI PHY hardware problem (such as a faulty transmitter) or a faulty cable or connector. Look for the problem in the portion of the network between the port that is reporting the condition and the first upstream transmitter.

MAC Neighbor Change Event

The MAC Neighbor Change event occurs when a MAC's upstream or downstream neighbor changes.

This event indicates either:

• A network reconfiguration
• Another station that is leaving or joining the ring

Network File Server Timeouts

Use these sections to identify and correct timeouts on network file servers:

• Network File Server Timeout Overview
• Looking for Obvious Errors
• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

See Network File Server Timeouts Reference for additional conceptual and problem analysis detail.

A network file server can time out if your network gets congested or if your server is having problems. Users might have problems downloading data from or to the server, or copying files from or to the server. To help you understand the troubleshooting process for this type of problem, an EXAMPLE throughout this section follows the symptoms, analysis, and resolution of a typical file server timeout problem.


Understanding the Problem

When users log in, their stations make network file server calls, either to determine quotas (if this feature has been enabled) or to mount user home directories. The network file server timeout messages, even when spread across multiple nodes, indicate a problem either with the network or with a server.

EXAMPLE: UNIX users notice that it takes a long time - over 30 seconds in some cases - to log in to any machine. Some machines report network file server timeout messages, but the messages have no obvious pattern and are infrequent. You begin to get a sense of the problem.

Identifying the Problem

First, rule out the obvious causes. Ask these questions:

• Can you access the network file server with Telnet?
• Have any alarms been triggered?
• Are there any new errors?

The process of identifying the problem is developed in Looking for Obvious Errors.

Solving the Problem

To determine the cause, reproduce the fault while you monitor the network. After you know the cause, you can fix the problem.

The solutions to the network file server timeout are identified in these sections:

• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

Looking for Obvious Errors

To look for obvious errors, use these applications:

• Ping and Telnet - To check connectivity to the network file server nodes
• LANsentry Manager Alarms View - To search for triggered alarms
• LANsentry Manager Statistics View - To look for errors
• LANsentry Manager History View - To identify trends


Ping and Telnet

Determine whether you can contact network file server nodes using Ping and Telnet. If the response is extremely slow, then a problem may exist with the connections to the nodes. No delay indicates that the connections are normal, implying that the delay is occurring elsewhere. In this case, use LANsentry® Manager tools to determine whether packets are being lost or ignored.

LANsentry Manager Alarms View

Using the LANsentry Manager Alarms View, you can determine if any configured alarms have been triggered.

Search the Alarms View to see if any MAC events have been logged.

EXAMPLE: MAC events have not been logged for the network to which the UNIX users are attached.

Even though no alarms have occurred, errors may exist. For example, a lower rate of background errors may exist just below the alarm threshold. Because they are based on maximum and minimum values, RMON alarms may miss constant, periodic, or low amounts of errors.

LANsentry Manager Statistics View

Using the LANsentry Manager Statistics View, you can display a multisegment graph of utilization and error statistics.

Follow these steps:

1. Set up a graph that shows utilization and errors on all your major segments.

2. Determine whether any segments are particularly busy or error prone.

EXAMPLE: You notice that one segment of the UNIX network, HUB3, is reporting Too Long Errors and FCS Errors roughly every second sample. While the amount is not higher than normal, it is currently higher than on any other segment.

LANsentry Manager History View

Using the LANsentry Manager History View, you can display a rolling history table to determine if the errors that you are seeing are new. For example, if you have a history table that runs for 30-minute samples over two days, you can compare the most recent sample to a previous sample, looking for new errors. If your probe has the resources, use a much finer resolution sample stored for a shorter time (every 30 seconds for 2 hours) to more easily spot recent errors.

EXAMPLE: You see that the history table shows that no error rates remained constant throughout the day. However, errors that did occur were on the device HUB3 and were Too Long Errors and FCS Errors.
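Comparing a recent history sample with an earlier one amounts to looking for error counters that were zero before and are nonzero now. The sketch below assumes each sample has been exported as a dictionary of counter names to counts; that export format and the function name are hypothetical and are not a LANsentry Manager feature.

```python
def new_errors(previous_sample, recent_sample):
    """Return counters that are nonzero now but were zero or absent before."""
    return {
        name: count
        for name, count in recent_sample.items()
        if count > 0 and previous_sample.get(name, 0) == 0
    }


if __name__ == "__main__":
    earlier = {"FCS Errors": 0, "Collisions": 12}
    latest = {"FCS Errors": 3, "Too Long Errors": 2, "Collisions": 15}
    print(new_errors(earlier, latest))
```

Counters that were already nonzero, such as Collisions in this example, are left out because they are not new, although a sharp rise in them may still deserve attention.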

Reproducing the Fault While Monitoring the Network

Although the RMON View in LANsentry Manager can show error rates and help you to identify the location of the problem, it may not provide enough data to solve the problem. To determine the cause of the problem, reproduce it while you monitor the network by using these applications:

• LANsentry Manager Top-N Graph - To locate a quiet node to use for reproducing the fault

• LANsentry Manager Packet Capture - To capture packets from the hub to which the quiet node is attached

• LANsentry Manager Packet Decode - To analyze the packets to assess network file server traffic and delays

• Address Tracker - To find the location of the problem nodes

EXAMPLE: Using LANsentry Manager, you find a hub on the network with a higher than normal error rate. However, the error rate does not seem high enough to cause login delays of 60 seconds or more.

LANsentry Manager Top-N Graph

Using the Top-N graph in the LANsentry Manager main window, locate a quiet node that has been showing the same problem. Choose a quiet node so that you do not receive excessive traffic when you try to isolate the problem.

EXAMPLE: You see that the node Monolith, which has the same Network File System (NFS) mounts as the other nodes on the network, is quiet. You decide to use this node for reproducing the fault. See Network File System (NFS) Protocol for more information about NFS.

LANsentry Manager Packet Capture

Using the LANsentry Manager Packet Capture application, capture packets from the network using predefined patterns and start-and-stop conditions.

Follow these steps:

1. Set up a capture buffer on a probe that is connected to the same hub as the quiet node. Until you know more about the problem, set a very general filter.

EXAMPLE: You select a MAC-layer filter and set a conversation filter to capture all packets to and from Monolith.

2. Telnet into and log out from the quiet node. Then reset the capture buffer. Repeat this procedure until you see the problem reflected in your captured data. To keep the buffer information clean, reset the buffer each time that you repeat the procedure.

3. When you see the delay, note the rough value of the packet count on the LANsentry Manager packet buffer.

By noting the packet count at which you think the delay has occurred, you can narrow the problem to within about 20 packets in the buffer. If you have used an extremely quiet node, you may even identify the exact packet.

LANsentry Manager Packet Decode

The LANsentry Manager Packet Decode application decodes all major protocols and displays the packet contents at three levels of detail: summary information, header information, and actual packet content.

Follow these steps:

1. Open the buffer in the Packet Decode application and locate the number of the packet at which the delay occurred.

2. Select the packet and launch a MAC-layer conversation filter. In the filter display, look for a gap in the conversation (that is, where the node sent a request and then resent it at approximately the same rate as the delay you experienced when recreating the problem).

3. Repeat the test to determine if the result concentrates on one node or if it appears on other nodes.

EXAMPLE: On the quiet node that you selected, the delay is obvious. You see an NFS request going out to a node and a repeat of the request 30 seconds later. During that time, the node did not respond. You now know that the delay occurred because nodes were not seeing responses for NFS requests. When you repeat the test on other nodes, you find that the delay is happening with more than one destination node.
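Looking for a gap in a conversation amounts to finding a request that was sent again after a long silence. The following Python sketch illustrates the idea on a simplified capture. The record layout (timestamp in seconds, request label) and the sample values are assumptions made for illustration; they are not the Packet Decode buffer format.

```python
from typing import List, Tuple


def find_retransmit_gaps(packets: List[Tuple[float, str]], min_gap: float) -> List[Tuple[str, float, float]]:
    """Return (request, first_seen, resent_at) for requests repeated after at least min_gap seconds.

    packets must be ordered by timestamp.
    """
    last_seen = {}
    gaps = []
    for timestamp, request in packets:
        if request in last_seen and timestamp - last_seen[request] >= min_gap:
            gaps.append((request, last_seen[request], timestamp))
        last_seen[request] = timestamp
    return gaps


# Hypothetical conversation: one NFS request is repeated 30 seconds later
capture = [
    (0.00, "NFS lookup /home"),
    (0.01, "NFS read block 7"),
    (30.01, "NFS read block 7"),
    (30.02, "NFS getattr"),
]
print(find_retransmit_gaps(capture, min_gap=25.0))
```

Setting min_gap slightly below the delay that users report keeps ordinary quick retries out of the result.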


Address Tracker

Use Address Tracker, which polls managed devices, to determine the hubs to which the problem nodes are attached. If the problem end stations are located on unmanaged devices, then you can at least narrow the problem to those unmanaged devices.

EXAMPLE: Although your network does not have managed hubs that Transcend® NCS management software can poll, it does have managed switches. When it polls the switches, Address Tracker displays the switch ports on which addresses were last seen. This information indicates the hub (but not the hub port) on which the device is located.

LANsentry Manager Packet Decode

After you know the location of the hub that has the problem node, monitor the problem from the hub using LANsentry Manager Packet Decode.

Follow these steps:

1. To capture packets from one of the nodes on the hub, set up another capture buffer and repeat the exercise that is described in LANsentry Manager Packet Capture. Because a delay may occur on a different node, use two capture buffers without stopping the first one.

Note the rough packet count where the delay appears.

2. Display a conversation filter of the packet where the delay appears and look for the gap in the conversation.

EXAMPLE: You hope that the nodes are on the same hub. You find that all the nodes are on HUB3. This result indicates that FCS Errors may be causing the timeouts. However, because the errors occur at a low rate, you decide to verify this diagnosis. You monitor the problem from the hub, logging in and out many times, and the delay eventually occurs. This time the delay shows that the node's reply had an FCS Error, even though the node received the request. The switch would not have transmitted this packet, causing a timeout on the NFS protocol. The retry time is presumably 30 seconds. During this test, you see the problem occurring on another node.

Correcting the Fault


Without a managed hub, you may find it very difficult to discover network file server timeout errors. To find the problematic node, you must either systematically isolate nodes by monitoring each node for a prolonged period or temporarily insert a managed hub.

EXAMPLE: You notice that the captured error packet failed FCS because it was corrupted by a regular pattern during transmission. A possible reason for this occurrence is a Jabbering node. This explanation makes sense because FCS/Jabber frames increased linearly when you were monitoring the live network.
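To see why a packet overwritten in transit fails its frame check sequence, consider this minimal Python sketch. Ethernet's FCS is a CRC-32, and zlib.crc32 uses the same generator polynomial. The sketch ignores on-the-wire bit and byte ordering, and the payload bytes are hypothetical.

```python
import zlib


def fcs(payload: bytes) -> int:
    """Compute a CRC-32 check value over the payload."""
    return zlib.crc32(payload) & 0xFFFFFFFF


def frame_ok(payload: bytes, received_fcs: int) -> bool:
    """A receiver recomputes the check value and compares it with the one sent."""
    return fcs(payload) == received_fcs


sent_payload = b"NFS reply: file handle data"
sent_fcs = fcs(sent_payload)

# Simulate a jabbering node overwriting part of the packet with a repeated pattern
corrupted_payload = sent_payload[:10] + b"\xaa\xaa" + sent_payload[12:]

print(frame_ok(sent_payload, sent_fcs))
print(frame_ok(corrupted_payload, sent_fcs))
```

A frame that fails this comparison is counted as an FCS Error and discarded, which is why the requesting node in the example never sees the reply.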

Network File Server Timeouts Reference

This section explains terms that are relevant to network file server timeouts and provides additional conceptual and problem analysis detail.

Jabbering

Jabbering occurs when a node transmits illegal-length packets and is possibly not operating within carrier specifications. In effect, another node has written bad data over a valid packet. This bad data is often interpreted as a repeated sequence of data.

Network File System (NFS) Protocol

A distributed file system protocol developed by Sun Microsystems that allows a computer system to access files over a network as if they were on its local disks. This protocol has been incorporated into products by more than 200 companies. It is now a de facto Internet standard.

NFS is one protocol in the NFS suite of protocols, which includes NFS, RPC, XDR (External Data Representation), and others. These protocols are part of a larger architecture that Sun Microsystems refers to as Open Network Computing (ONC). ONC is a distributed applications architecture designed by Sun and currently controlled by a consortium led by Sun.

Troubleshooting TCP/IP - Detailed Steps

This article shows how to troubleshoot TCP/IP connectivity between computers on a Windows network. If you haven't already done so, disable XP's Internet Connection Firewall on all local area network connections and remove all firewall programs on the network. Improperly configured firewalls are the most common cause of TCP/IP problems.


Open a Command Prompt Window

For many of these steps, you'll be typing at the command prompt. To open a command prompt window in Windows 2000 or XP, click Start | Run, type cmd in the box, and click OK. To open a command prompt window in Windows 95, 98, or Me, click Start | Run, type command in the box, and click OK. Type one command per line, and press Enter after each one to execute it. To close the command prompt window, use the exit command.

Determine the TCP/IP Settings

Determine the TCP/IP settings of each computer on the local area network. In XP, open the Network Connections folder, right click the LAN connection, and click Status | Support | Details. For example, here are the Status and Details views for the LAN connection on an Internet Connection Sharing host.

In Windows 95/98/Me, click Start | Run, type winipcfg in the box, and click OK. Select the LAN adapter from the menu, and click More Info. Here's the winipcfg view for an ICS client running Windows Me.

You can also see the TCP/IP settings from the command prompt. This is especially convenient if a computer has more than one network adapter. Use the ipconfig /all command, which is available in all versions except Windows 95. The output from this command can be long, so it's best to write it to a file. Specify the file name in the command this way:

ipconfig /all >ipconfig.txt

Description of TCP/IP Settings

Here are the TCP/IP settings that are used in network troubleshooting:

• IP Address – Unique address assigned to a network adapter. A computer with multiple network adapters has an IP address for each one, and each one must be in a different subnet.

• Subnet Mask – Used in conjunction with the IP address to determine which subnet an adapter belongs to. At the simplest level, communication is only possible between two network adapters when they're in the same subnet.

• Default Gateway – IP address of a computer or router, on one of this computer's local area networks, that knows how to communicate with subnets not present on this computer. For an Internet connection, the default gateway is a router belonging to your Internet service provider, and all access to sites on the Internet goes through it. For an ICS client, the default gateway is the ICS host. If you use a hardware router, it serves as the default gateway.

• DHCP Server – If an adapter is configured to obtain an IP address automatically, this is the address of the server that provides it. It could be your ISP, an ICS host, or a hardware router.

• DNS Servers – IP address of one or more Domain Name Server computers. DNS servers translate Internet names to their IP addresses (like 63.146.109.227).
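How the subnet mask decides whether two adapters can talk directly can be checked with Python's standard ipaddress module. This is a minimal sketch; the addresses and the 255.255.255.0 mask are example values chosen for illustration, not settings taken from any particular network.

```python
import ipaddress


def same_subnet(ip_a: str, ip_b: str, subnet_mask: str) -> bool:
    """Return True when both addresses fall in the same subnet under the given mask."""
    # strict=False lets us pass a host address and have the network derived from it
    network_a = ipaddress.ip_network(f"{ip_a}/{subnet_mask}", strict=False)
    network_b = ipaddress.ip_network(f"{ip_b}/{subnet_mask}", strict=False)
    return network_a == network_b


# Example values (assumed): two LAN hosts, then a host on a different subnet
print(same_subnet("192.168.1.101", "192.168.1.123", "255.255.255.0"))
print(same_subnet("192.168.1.101", "192.168.0.1", "255.255.255.0"))
```

When the function returns False for two computers that should share a LAN, compare their IP address and subnet mask settings before suspecting hardware.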

Subnets

See our article on subnets for a brief description of how they work. For more details, see this Microsoft Knowledge Base article.

If two computers are supposed to be on the same subnet but aren't, something is wrong with the network hardware or software configuration. This is most likely to happen when one of them receives an IP address of 169.254.x.x, which indicates that:

• It's configured to obtain an IP address automatically.
• It couldn't find a DHCP server on the network to make the assignment.
• Windows assigned it an Automatic Private IP Address.
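A quick way to recognize an Automatic Private IP Address in a script is to test for the 169.254.0.0/16 link-local range. The sketch below uses the standard ipaddress module; the sample addresses are illustrative only.

```python
import ipaddress


def is_automatic_private_address(ip: str) -> bool:
    """Return True for IPv4 addresses in the 169.254.0.0/16 link-local range."""
    address = ipaddress.ip_address(ip)
    return address.version == 4 and address.is_link_local


for candidate in ("169.254.10.20", "192.168.1.101"):
    if is_automatic_private_address(candidate):
        print(candidate, "- no DHCP server answered; Windows assigned this address itself")
    else:
        print(candidate, "- not an Automatic Private IP Address")
```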

See our article on Specific Networking Problems and Their Solutions for more information.

Pinging

The ping command is the basic tool for testing TCP/IP connectivity. It sends a special packet (called ICMP Echo) to a particular IP address and looks for a reply. If everything is working right, the reply comes back. If not, the ping times out in a few seconds. By default, the ping command repeats the process four times. Here's an example of an ICS client computer pinging a Windows XP Home Edition ICS host, using the host's IP address and its computer name.

When ping fails, you'll see one of these error messages:

• Request timed out – The IP address is valid, but there's no reply from it. If the IP address is on a local area network, the most likely cause is a firewall program blocking the ping.

• Unknown host <name> or Ping request could not find host <name> – The computer name doesn't exist on the local area network. Make sure that NetBIOS over TCP/IP is enabled.

• Destination host unreachable – The IP address isn't on a local area network, and the default gateway can't access it. Either there's no default gateway, its address is wrong, or it isn't functioning.
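The three error messages above can be turned into a small lookup that suggests the likely cause for a piece of ping output. This Python sketch is only an illustration: it matches on the message text quoted in this article, and the exact wording printed by ping may differ between Windows versions.

```python
def diagnose_ping(output: str) -> str:
    """Map ping output text to the likely cause described in this article."""
    text = output.lower()
    if "request timed out" in text:
        return "Address is valid but not replying; check for a firewall blocking the ping"
    if "unknown host" in text or "could not find host" in text:
        return "Name not found on the LAN; make sure NetBIOS over TCP/IP is enabled"
    if "destination host unreachable" in text:
        return "Address is off the LAN and the default gateway cannot reach it"
    return "No known error message found"


print(diagnose_ping("Request timed out."))
print(diagnose_ping("Ping request could not find host win98."))
```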


Pinging the Local Area Network

Here is a series of ping commands to use in finding where a problem occurs on a local area network. Run them in the order shown, and don't go on to the next command until all of the previous commands work properly. In this example:

• The computer being tested is named Winxp, with IP address 192.168.1.101.
• There's another computer on the network, named Win98, with IP address 192.168.1.123.

Substitute the appropriate IP addresses and computer names for your network.

Command              Target                          What Ping Failure Indicates
ping 127.0.0.1       Loopback address                Corrupted TCP/IP installation
ping localhost       Loopback name                   Corrupted TCP/IP installation
ping 192.168.1.101   This computer's IP address      Corrupted TCP/IP installation
ping winxp           This computer's name            Corrupted TCP/IP installation
ping 192.168.1.123   Another computer's IP address   Bad hardware or NIC driver
ping win98           Another computer's name         NetBIOS name resolution failure

To fix a corrupted TCP/IP installation on Windows XP, follow the steps in this Microsoft Knowledge Base article. For Windows 95/98/Me, un-install the TCP/IP protocol in Control Panel | Network, reboot, and re-install it. If that doesn't fix it, use this procedure on Windows 95 or 98.
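The rule of running the LAN ping commands in order and stopping at the first failure can be sketched in Python. The targets below are this section's example names and addresses (192.168.1.101 and 192.168.1.123 are assumed readings of the example). The windows_ping helper assumes the Windows ping command, where -n sets the number of echo requests, and treats its exit code as a rough success signal only.

```python
import subprocess
from typing import Callable, List, Optional, Tuple

# (target, what a failure indicates), in the order the commands should be run
LAN_CHECKS: List[Tuple[str, str]] = [
    ("127.0.0.1", "Corrupted TCP/IP installation"),
    ("localhost", "Corrupted TCP/IP installation"),
    ("192.168.1.101", "Corrupted TCP/IP installation"),
    ("winxp", "Corrupted TCP/IP installation"),
    ("192.168.1.123", "Bad hardware or NIC driver"),
    ("win98", "NetBIOS name resolution failure"),
]


def windows_ping(target: str) -> bool:
    """Send one echo request with the Windows ping command (assumption: -n is the count)."""
    result = subprocess.run(
        ["ping", "-n", "1", target],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def first_failure(
    checks: List[Tuple[str, str]], ping: Callable[[str], bool]
) -> Optional[Tuple[str, str]]:
    """Run the checks in order and return the first (target, meaning) that fails, or None."""
    for target, meaning in checks:
        if not ping(target):
            return target, meaning
    return None


# Simulated run: pretend every target answers except the other computer's name
reachable = {"127.0.0.1", "localhost", "192.168.1.101", "winxp", "192.168.1.123"}
print(first_failure(LAN_CHECKS, lambda target: target in reachable))
```

Passing windows_ping instead of the simulated function runs the real commands. Because the loop stops at the first failure, later targets are never tested until the earlier ones work, which matches the procedure described in this section.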

Pinging the Internet

You can also use ping to find a problem with Internet access. Run these commands in the order shown, and don't go on to the next command until all of the previous commands work properly. Use the Default Gateway and DNS Server addresses that you got from the winipcfg or ipconfig /all command.


Command                  Target                What Ping Failure Indicates
ping w.x.y.z             Default Gateway       Default Gateway down
ping w.x.y.z             DNS Server            DNS Server down
ping w.x.y.z             Web site IP address   Internet service provider or web site down
ping www.something.com   Web site name         DNS Server down or web site down

REFERENCE

www.practicallynetwork.com

www.computerhope.com

www.wikipedia.org


  • Basic Network Troubleshooting
  • Basic network troubleshooting
    • Issue
    • Cause
    • Solution
      • Verify connections LEDs
      • Adapter resources
      • Adapter functionality
      • Protocol
      • Firewall
      • Additional time
      • Additional troubleshooting
        • Network Troubleshooting Overview
          • About Connectivity Problems
          • About Performance Problems
          • Solving Connectivity and Performance Problems
          • Recognizing Symptoms
            • User Comments
            • Network Management Software Alerts
            • Analyzing Symptoms
              • Understanding the Problem
              • Identifying and Testing the Cause of the Problem
                • Sample Problem Analysis
                • Equipment for Testing
                  • Solving the Problem
                    • Your Network Troubleshooting Toolbox
                      • Transcend Central
                      • Status Watch
                        • Web Reporter
                          • Address Tracker
                          • LANsentry Manager
                          • Traffix Manager
                          • Device View
                          • Ping
                            • Strategies for Using Ping
                            • Tips on Interpreting Ping Messages
                              • Telnet
                              • FTP and TFTP
                              • Analyzers
                              • Probes
                              • Cable Testers
                              • Knowing Your Networks Configuration
                                • Site Network Map
                                • Device Configuration Information
                                • Other Important Data About Your Network
                                  • Identifying Your Networks Normal Behavior
                                    • Baselining Your Network
                                    • Identifying Background Noise
                                        • FDDI Connectivity
                                          • Understanding the Problem
                                          • Identifying the Problem
                                          • Solving the Problem
                                          • Status Watch
                                          • Implementing Dual Homing
                                          • Installing an Optical Bypass Unit
                                          • Peer Wrap Condition
                                          • Twisted Ring Condition
                                          • Undesired Connection Attempt Event
                                          • Address Tracker
                                          • Device View
                                          • Broadcast Packets
                                          • Multicast Packets
                                            • Duplicate Addresses
                                              • Understanding the Problem
                                              • Identifying the Problem
                                              • Solving the Problem
                                              • Status Watch
                                              • Address Tracker
                                              • LANsentry Manager
                                              • Duplicate MAC Addresses
                                              • Duplicate IP Addresses
                                                • Ethernet Packet Loss
                                                  • Understanding the Problem
                                                  • Identifying the Problem
                                                  • Solving the Problem
                                                  • Status Watch
                                                  • LANsentry Manager Network Statistics Graph
                                                  • Device View
                                                  • Alignment Errors
                                                  • Collisions
                                                  • CRC Errors
                                                  • Excessive Collisions
                                                  • FCS Errors
                                                  • Late Collisions
                                                  • Nonstandard Ethernet Problems
                                                  • Receive Discards
                                                  • Too Long Errors
                                                  • Too Short Errors
                                                  • Transmit Discards
                                                    • FDDI Ring Errors
                                                      • Understanding the Problem
                                                      • Identifying the Problem
                                                      • Solving the Problem
                                                      • Status Watch
                                                      • Elasticity Buffer Error Condition
                                                      • Frame Error Condition
                                                      • Frames Not Copied Condition
                                                      • Link Error Condition
                                                      • MAC Neighbor Change Event
                                                        • Network File Server Timeouts
                                                          • Understanding the Problem
                                                          • Identifying the Problem
                                                          • Solving the Problem
                                                          • Ping and Telnet
                                                          • LANsentry Manager Alarms View
                                                          • LANsentry Manager Statistics View
                                                          • LANsentry Manager History View
                                                          • LANsentry Manager Top-N Graph
                                                          • LANsentry Manager Packet Capture
                                                          • LANsentry Manager Packet Decode
                                                          • Address Tracker
                                                          • LANsentry Manager Packet Decode
                                                          • Jabbering
                                                          • Network File System (NFS) Protocol
                                                          • Troubleshooting TCP/IP - Detailed Steps
                                                          • Open a Command Prompt Window
                                                          • Determine the TCP/IP Settings
                                                          • Description of TCP/IP Settings
                                                          • Subnets
                                                          • Pinging
                                                          • Pinging the Local Area Network
                                                          • Pinging the Internet
Page 29: 11591337-Basic-Network-Troubleshooting

If the network is operating, the FDDI ring may be segmented, and therefore an FDDI station or an Ethernet station on an Ethernet link may have lost connectivity to other nodes on the network.

2. Determine if a ring is in a Thru, Wrap, or Segmented state.

If the FDDI ring is segmented or wrapped, look for a problem with a link somewhere in the network or for a nonfunctioning node on your trunk ring. If the ring is operating and is not segmented, or if it is segmented but you still have connectivity to the stations in question, move to a more specific level in your network.

See Monitoring FDDI Connections for more information.

3. Determine if the poorly performing station is an Ethernet or FDDI station.

If the problem is an FDDI station, find out if it is congested (that is, if the station is so busy that it cannot accept all the network traffic that is directed to it) by determining its Bandwidth Utilization. Also determine if the station has a high frame error rate by looking at the FDDI Ring Errors.

If the problem is an Ethernet station, look for congestion by examining Ethernet Packet Loss and Bandwidth Utilization.

Solving the Problem

Identify the station that is causing the disconnection and take the appropriate steps:

• If the disconnection is caused by a wrapped ring, then fix the hardware or cabling problem at that station.

• If the station is congested, you have a device problem rather than a network problem. For example, if the congested station is a file server and every other machine on the network is retrieving and saving files using that server, consider upgrading your server or adding additional servers to the network. A variety of devices from different vendors may be communicating on an FDDI or Ethernet network; some are faster and more capable, and some are slower and more prone to congestion.

• If the station is an Ethernet station that is attached to an Ethernet segment, reevaluate the setup of your Ethernet network and make some changes to improve its performance.

You can also make FDDI connections more resilient by implementing dual homing or installing an Optical Bypass Unit (OBU) where FDDI connections are prone to fail. See Making Your FDDI Connections More Resilient for more information.


Monitoring FDDI Connections

Monitor your FDDI devices for Warning or Critical alerts in the FDDI Status tool.

Status Watch

Use Status Watch to identify these FDDI connectivity errors:

• Peer Wrap Condition
• Twisted Ring Condition
• Undesired Connection Attempt Event

Follow these steps:

1. In the Device area, select the device that is located where you suspect an FDDI ring connectivity problem.

2. Monitor the FDDI Status tool for the currently selected device.

Here are some pointers for monitoring:

• If the Peer Wrap Configuration State variable is Isolated, the device is not connected to the FDDI trunk ring. If you intend the device to remain isolated, this indication is not a serious condition. However, if the device is supposed to be connected on a trunk ring, a serious problem may exist. The device is no longer transmitting packets to the larger trunk ring.

• If the Peer Wrap flag (SMTPeerWrapFlag) is set, the device is one of the wrap points. The cause of the wrapped ring is somewhere in the portion of the network between the two stations that report the peer wrap condition.

Making Your FDDI Connections More Resilient

When devices are removed from an FDDI ring, there is a break in the fiber path that causes the ring to wrap until the ring is made whole again. To prevent the break in the FDDI connection, you can implement dual homing or install an Optical Bypass Unit (OBU).

Implementing Dual Homing

When the operation of a dual attachment node is critical to your network, dual homing adds reliability by providing a backup connection if the primary link fails. Because a dual attachment station (DAS) has two attachments to the FDDI ring (A-to-M and B-to-M), you can use one of them as a standby link if the active link fails. Using dual homing, only one of the two attachments is active at a time. In this sense, a DAS acts as if it is a single attachment station (SAS) by using its A port as the standby link.

Through SMT, a DAS can be dual homed to the same concentrator or, more commonly, to two concentrators. This arrangement provides a more stable trunk ring of concentrators. If one concentrator fails, the DAS enables the standby link to another concentrator to become the active link. See Figure 14.

If the station is a dual path or dual path/dual MAC station, you can configure the dual-homed station in one of two ways:

• With both links active
• With one link active and one connection withheld as a backup, only becoming active when one link fails

Figure 14 Dual Homing Configuration

Installing an Optical Bypass Unit

You can insert an Optical Bypass Unit (OBU) into the FDDI ring as if it were a node and then plug your device into it. To use an OBU, your device needs an optical bypass interface. This interface lets the bypass know whether your device is still on the ring or not. See Figure 15.

If your device is removed or if it fails, the bypass unit diverts the optical path away from your device, keeping the ring whole. You can use a bypass on devices that are prone to failure or are likely to be removed often, such as diagnostic equipment.


Figure 15 Optical Bypass Unit Configuration

FDDI Connectivity Reference

This section explains terms that are relevant to FDDI connectivity and provides additional conceptual and problem analysis detail.

Peer Wrap Condition

A Peer Wrap (wrapped ring) condition occurs when a dual-attachment station detects a fault (often a lost connection) and reconfigures the network by wrapping the dual trunk rings to form a single ring. Normally, the two stations that are adjacent to the fault wrap to maintain full connectivity. However, if a second fault occurs before the first is repaired, the network partitions itself into two or more rings, and stations lose connectivity.
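The difference between one fault (the ring wraps and everyone stays connected) and two faults (the ring partitions) can be illustrated with a small Python model. This is a simplified sketch under stated assumptions: stations are numbered 0 to n-1 around the ring, link i joins station i to station (i + 1) mod n, and a fault breaks both fibers of that link. It does not model SMT behavior.

```python
from typing import List, Set


def ring_segments(num_stations: int, broken_links: Set[int]) -> List[List[int]]:
    """Return the groups of stations that can still reach each other after wrapping."""
    if not broken_links:
        return [list(range(num_stations))]
    # Start just after one of the faults and walk once around the ring
    start = (min(broken_links) + 1) % num_stations
    segments, current = [], []
    for offset in range(num_stations):
        station = (start + offset) % num_stations
        current.append(station)
        if station in broken_links:  # the link leaving this station is broken
            segments.append(current)
            current = []
    return segments


print(ring_segments(6, {2}))      # one fault: the ring wraps, one group remains
print(ring_segments(6, {2, 4}))   # two faults: the ring partitions
```

With a single fault, every station still appears in one group, which corresponds to the wrapped ring. Each additional fault adds another isolated group.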

When a station reports a Peer Wrap condition, locate and repair the problem that caused the station to wrap the rings. Potential causes include:

• Faulty FDDI port hardware
• Faulty cables or connectors
• Unplugged connectors
• Powered-down stations

You can expect to find the cause of the problem between the two stations that report the Peer Wrap condition.


Twisted Ring Condition

A Twisted Ring condition occurs when certain undesirable connection types exist. See Table 7 for more information. Although similar to the Undesired Connection Attempt, the Twisted Ring condition provides specific Station Management (SMT) and port information for diagnosis.

Undesired Connection Attempt Event

An Undesired Connection Attempt event occurs when a port tries to connect to another port of a type that may result in an undesirable network topology. Whether the connection attempt is successful depends on the current setting of the station's connection policies.

Table 7 lists connections that the FDDI standard defines as undesirable. The managed devices may or may not permit these connections, depending on their FDDI station configurations.

Table 7 Undesirable Connection Types

Connection Type   Reason That the Connection Is Undesirable
A-A               Twisted primary and secondary rings
A-S               A wrapped ring
B-B               Twisted primary and secondary rings
B-S               A wrapped ring
S-A               A wrapped ring
S-B               A wrapped ring
M-M               A tree of rings topology (illegal connection)

Table 8 lists FDDI connections that create valid topologies.

Table 8 Valid Connection Types

Connection Type   Reason That the Connection Is Valid
A-B               A normal trunk peer connection
A-M               A tree connection with possible redundancy. In a single MAC node, Port B has precedence (by default) for connecting to a Port M.
B-A               A normal trunk ring peer connection
B-M               A tree connection with possible redundancy. In a single MAC node, Port B has precedence (by default) for connecting to a Port M.
S-S               A single ring of two slave stations
S-M               A normal tree connection
M-A               A tree connection that provides possible redundancy
M-B               A tree connection that provides possible redundancy
M-S               A normal tree connection
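Tables 7 and 8 together cover every pairing of the four port types, so they can be expressed as a simple lookup. The Python sketch below copies the reasons from the two tables; the function name and return format are illustrative choices, not part of any management tool.

```python
from typing import Tuple

UNDESIRABLE = {
    "A-A": "Twisted primary and secondary rings",
    "A-S": "A wrapped ring",
    "B-B": "Twisted primary and secondary rings",
    "B-S": "A wrapped ring",
    "S-A": "A wrapped ring",
    "S-B": "A wrapped ring",
    "M-M": "A tree of rings topology (illegal connection)",
}

VALID = {
    "A-B": "A normal trunk peer connection",
    "A-M": "A tree connection with possible redundancy",
    "B-A": "A normal trunk ring peer connection",
    "B-M": "A tree connection with possible redundancy",
    "S-S": "A single ring of two slave stations",
    "S-M": "A normal tree connection",
    "M-A": "A tree connection that provides possible redundancy",
    "M-B": "A tree connection that provides possible redundancy",
    "M-S": "A normal tree connection",
}


def classify_connection(local_port: str, remote_port: str) -> Tuple[str, str]:
    """Return ("valid" or "undesirable", reason) for a pair of FDDI port types."""
    key = f"{local_port.upper()}-{remote_port.upper()}"
    if key in UNDESIRABLE:
        return "undesirable", UNDESIRABLE[key]
    if key in VALID:
        return "valid", VALID[key]
    raise ValueError(f"Unknown port types: {key}")


print(classify_connection("A", "A"))
print(classify_connection("S", "M"))
```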


Disabling the Offending Interface

Because broadcast storms can ultimately cause your whole network to become unavailable, take action immediately to disable the offending interface. You can enable the interface again after you have corrected the problem.

Address Tracker

Use Address Tracker to locate the interface that is causing the broadcast storm. Use Device View to disable the port.

Follow these steps:

1. In the Find Address window, enter the address of the interface that seems to be receiving the broadcast traffic.

You can copy the MAC or IP address from the Status Watch report and paste it into Address Tracker's Enter the Address You Want to Find field.

2. Click Find Now.

Search displays the device name.

3. Use Transcend Central to launch Device View and disable the port.

Disabling the port stops the broadcast storm before it interferes with all vital network traffic. You can re-enable this interface using Device View or the device's console later.

Correcting Spanning Tree Misconfigurations

Spanning Tree does not cause broadcast storms, but a loop in your Spanning Tree topology can create data that looks like a storm. A loop can occur in your topology if:

• Someone disables Spanning Tree on a port
• You set up your Spanning Tree configuration incorrectly


Device View

Use Device View to disable any Spanning Tree port that has a repeater attached to it and to correct Spanning Tree misconfigurations.

To correct Spanning Tree misconfigurations, use Device View to disable Spanning Tree Protocol (STP) for a port on a SuperStack® II Switch 1000, Switch 3000, Switch 3000 10/100, Switch 9000SX, Desktop Switch, LinkBuilder® FMS II Bridge/Management Module, or CoreBuilder 6000.

To disable the STP port state for a port on a SuperStack II switch:

1. Select a port and click the right mouse button.

2. From the shortcut menu, select Configure.

3. In the Port section, click the STP tab.

4. From the STP Port State list box, select Disabled.

5. Click Apply.

To disable the STP port state for a port on a LinkBuilder FMS II Bridge/Management Module:

1. Double-click the module.

2. From the shortcut menu, select Configure Bridge.

3. In the Port section, click the STP tab.

Broadcast Storms Reference

This section explains terms that are relevant to broadcast storms and provides additional conceptual and problem analysis detail.

Broadcast Packets

Broadcast packets, which are a normal part of network operation, are transmitted by a device to a broadcast address. For example, IP networks use broadcasts to resolve network addresses using Address Resolution Protocol (ARP). IPX networks use a large number of broadcast packets to operate most effectively.


Problems arise when broadcast packets endlessly propagate throughout the network, which increases the traffic volume on your network and the CPU time that each host spends processing and discarding unwanted broadcast packets.

Multicast Packets

Multicast packets, which are a normal part of network operation, are transmitted by a device to a multicast group address. Hosts that want to receive the packets indicate that they want to be members of the multicast group, and then multicast packets are distributed to that group. For example, multicast packets support the Spanning Tree Protocol. Multicast applications and underlying multicast protocols control multimedia traffic and shield hosts from processing unnecessary broadcast traffic. However, multicast traffic can also cause storms that saturate your network.

Duplicate Addresses

Use these sections to identify and correct problems caused by duplicate MAC and IP addresses:

• Duplicate Addresses Overview
• Finding Duplicate MAC Addresses
• Finding Duplicate IP Addresses

See Duplicate Addresses Reference for additional conceptual and problem analysis detail.

Networks sometimes generate duplicate MAC and IP addresses. Because duplicate addresses can cause problems with packet delivery, resolve them as soon as possible.

Understanding the Problem

Duplicate MAC addresses are caused by data link layer problems with Fiber Distributed Data Interface (FDDI) media and the passing of tokens on the FDDI ring. Duplicate IP addresses are caused by network layer problems. See these sections for more information about causes of duplicate addresses:

• Duplicate MAC Addresses
• Duplicate IP Addresses

Identifying the Problem

Identify duplicate MAC and IP addresses by following the instructions in these sections:

• Finding Duplicate MAC Addresses
• Finding Duplicate IP Addresses

Solving the Problem

Identify the cause of the duplicate address (such as user error or a hardware problem) and fix the problem if possible

Finding Duplicate MAC Addresses

To find out if duplicate MAC addresses are occurring, monitor your network using Status Watch.

Status Watch

The Status Watch FDDI Status tool identifies duplicate FDDI MAC addresses. Status Watch reports when two or more MACs on the same ring have the same MAC address (a Duplicate Address condition).

Follow these steps:

1. In the Status Watch Summary View window, determine if any FDDI Status conditions are reported. If there are, double-click the table cell value to display the Device List window.

Another approach is to examine only the devices that you know reside on your FDDI ring. In the Status Watch main window, red device icons indicate that a threshold has been exceeded.

2. Select a device.

• If you selected the device from the Device List window, the real-time report for that device appears in the Status Watch main window.

• If you selected the device from the main window, also select the FDDI Status tool to view the real-time report.

3. Determine if a Duplicate Address condition caused the FDDI Status tool to trigger a Critical or Warning status for that device.

In Status Watch, you can specify the status severity level to apply to a Duplicate Address condition.

Finding Duplicate IP Addresses


To find out if duplicate IP addresses are occurring, monitor your network using these applications:

• Address Tracker - To find duplicate IP addresses on 3Com devices and their attached networks

• LANsentry® Manager - To find duplicate IP addresses that are collected by probes gathering RMON2 SmartAgent® data from the Enterprise Communications Analysis Module (ECAM) downloaded on your network devices

Address Tracker

Use Address Tracker to determine when and where duplicate IP addresses occur.

Follow these steps:

1. From the Find Address menu, select Find Duplicate IP Addresses.

2. Click Find Now to start your search.

LANsentry Manager

Use the Duplicates table in LANsentry Manager to compile a list of all stations with duplicate IP addresses. This table is available only on probes that have downloaded RMON2 (ECAM) SmartAgent software.

Follow these steps:

1. From the LANsentry Manager Address Map menu, select Duplicates. Address Map data is displayed as a table.

2. To export the contents of the table, click Export to launch the Data Export dialog box.

Duplicate Addresses Reference

This section explains terms that are relevant to duplicate addresses and provides additional conceptual and problem analysis detail


Duplicate MAC Addresses

Each device on your network has a unique MAC address. This address identifies a single device on the network, allowing packets to be delivered to correct destinations.

Packets are delivered to their destinations by means of the MAC-address-to-IP-address translation that the Address Resolution Protocol (ARP) provides. Therefore, if MAC addresses are duplicated on the network, ARP caches of routing devices contain erroneous destinations. In FDDI, devices monitor network traffic, searching for their own MAC address in each packet to determine whether to decode the packet. If MAC addresses are not unique, two stations cannot be distinguished from each other.

Duplicate MAC addresses can occur for the following reasons:

• Someone has manually configured a MAC address for a device instead of using the address that the vendor supplied or allowing it to be assigned dynamically, and this address is also assigned to a different device.

• In rare circumstances, loops in a bridged network can cause a MAC hardware problem or an address learning problem that creates a duplicate MAC address entry in the bridging address table.

• On DECnet Phase 4 networks, MAC addresses are set from the DECnet address. A duplicate NET address can cause a duplicate MAC address.

Duplicate IP Addresses

Because IP addresses are critical for transmission of packets on TCP/IP networks, resolve duplicate IP addresses immediately.

Duplicate IP addresses can occur when someone has configured an IP address that is identical to an IP address that is assigned to a different device. Address assignments, although possible for you to configure manually, are usually made using one of these protocols:

• Dynamic Host Configuration Protocol (DHCP) - Allows your network to dynamically assign IP addresses to nodes. With this protocol, a DHCP server temporarily assigns an IP address to a node, or you can statically configure addresses as needed.

• BOOTstrap protocol (BootP) - Allows you to statically assign IP addresses to nodes. This protocol is more efficient than RARP.

• Reverse ARP (RARP) - Allows you to statically assign IP addresses to nodes. However, because this protocol relies on the MAC address to identify the node, you cannot use it on networks that dynamically assign hardware addresses.
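
The idea of spotting a duplicate IP address can be illustrated outside the management tools with a minimal Python sketch. It assumes you already have a list of (IP address, MAC address) observations, for example copied from ARP caches. The function name and the sample data are hypothetical and are not part of Address Tracker or LANsentry Manager.

```python
from collections import defaultdict


def find_duplicate_ips(entries):
    """Return {ip: sorted MACs} for every IP address claimed by more than one MAC."""
    macs_by_ip = defaultdict(set)
    for ip, mac in entries:
        # Normalize case so the same MAC written two ways is not counted twice.
        macs_by_ip[ip].add(mac.lower())
    return {ip: sorted(macs) for ip, macs in macs_by_ip.items() if len(macs) > 1}


# Hypothetical observations (documentation-range addresses).
observed = [
    ("192.0.2.10", "00:A0:24:11:22:33"),
    ("192.0.2.11", "00:a0:24:44:55:66"),
    ("192.0.2.10", "00:a0:24:77:88:99"),
]
print(find_duplicate_ips(observed))
```

Swapping the roles of the two fields in the same function would report one MAC address seen with several IP addresses instead.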


Ethernet Packet Loss

Use these sections to identify and correct Ethernet packet loss:

• Ethernet Packet Loss Overview
• Searching for Packet Loss

See Ethernet Packet Loss Reference for additional conceptual and problem analysis detail.

If your Ethernet network shows signs of congestion, it may be experiencing packet loss. When your network is congested, utilization is usually high, packets are discarded because buffers are full, and collision rates are up. Problems related to Collisions are often at the heart of packet loss.

Understanding the Problem

Collisions are normal in Ethernet networks. In many cases, Collision rates of 50 percent do not cause a large decrease in throughput. The Collision rate helps mark the upper limit on your network (the maximum percentage of collisions that your network can bear), which is usually around 70 percent. If Collisions increase above this upper limit, your network can become unreliable.

When Collision rates increase, so do Excessive Collisions, which causes a delay in transmitting data. An increase in Collisions also indicates that network utilization and network errors, such as FCS Errors, are probably increasing.

The real packet problems to watch for, however, are undetected collisions that show up as Late Collisions.
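
As a rough illustration of the thresholds above, the following Python sketch compares a Collision rate against the approximate 70 percent upper limit. The rate formula (collisions per packet, as a percentage) is an assumption, because the text does not define how the rate is computed, and the function names are hypothetical.

```python
UPPER_LIMIT_PERCENT = 70.0  # approximate upper limit quoted in the text


def collision_rate_percent(collisions, packets):
    """Assumed definition: collisions per packet, expressed as a percentage."""
    if packets <= 0:
        raise ValueError("packets must be positive")
    return 100.0 * collisions / packets


def assess_collision_rate(rate_percent, upper_limit=UPPER_LIMIT_PERCENT):
    """Return a coarse verdict for a Collision rate."""
    if rate_percent > upper_limit:
        return "unreliable"
    return "within limit"


print(assess_collision_rate(collision_rate_percent(50, 100)))
print(assess_collision_rate(collision_rate_percent(75, 100)))
```

Because the upper limit differs from network to network, the limit is a parameter rather than a fixed constant.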

Identifying the Problem

To identify that your network's problem is related to packet loss, verify that frames are being dropped on your network by examining this packet loss data:

• Alignment Errors
• Collisions
• Excessive Collisions
• FCS Errors
• CRC Errors
• Late Collisions
• Receive Discards
• Too Long Errors
• Too Short Errors
• Transmit Discards

The process of identifying the problem is discussed in Searching for Packet Loss.

Solving the Problem

If you notice that packet loss data is consistently high, then your network is too congested. In this case, segment your network with the appropriate network device (such as a switch or router). If Collision data shows increases but your network's utilization is the same, then your network may have a physical problem, such as cabling that is too long. Other problems that packet loss data can indicate include:

• Faulty connectors or improper cabling
• Excessive numbers of repeaters between network devices
• Defective Ethernet transceivers or controllers

Possible solutions to these problems are explained in the procedures in Searching for Packet Loss.
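
The reasoning in this section (Collisions rising with utilization suggests congestion, while Collisions rising with steady utilization suggests a physical problem) can be sketched as a small decision function. This is only an illustration with hypothetical names; deciding whether a counter is actually rising is left to the reader's monitoring data.

```python
def diagnose(collisions_rising, utilization_rising):
    """Map the two trends discussed in the text to a first guess at the cause."""
    if collisions_rising and not utilization_rising:
        return "possible physical problem (for example, cabling that is too long)"
    if collisions_rising and utilization_rising:
        return "likely congestion; consider segmenting the network"
    return "no collision trend to act on"


print(diagnose(collisions_rising=True, utilization_rising=False))
print(diagnose(collisions_rising=True, utilization_rising=True))
```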

Searching for Packet Loss

When you look for packet loss, use the following applications:

• Status Watch - For Ethernet and MIB-II data collection using SNMP polling
• LANsentry Manager Network Statistics Graph - For RMON data collection using an RMON probe
• Device View - On a per-device basis, you can evaluate statistics for any port on the device

Status Watch

Status Watch monitors:

• Alignment Errors
• Excessive Collisions
• FCS Errors
• Receive Discards
• Transmit Discards

Follow these steps:

1. Determine if the thresholds for the Alignment Errors tool and FCS Errors tool are being exceeded.

Table 16 identifies the problems that this data can indicate and your possible actions. For information about problems related to a nonstandard Ethernet implementation, see Nonstandard Ethernet Problems.

Table 16. Alignment Errors, FCS Errors, and CRC Errors Data

Possible Problem: Faulty cabling
Possible Action: Examine the cable and cable connections for breaks or damage.

Possible Problem: Network noise
Possible Action: Look for improper cabling, faulty cable, faulty network equipment, or cables that are too close to equipment that emits electromagnetic interference (lamps, for example).

Possible Problem: Faulty transceiver
Possible Action: Use an analyzer to identify the problematic transceiver. If necessary, replace the transceiver, network adapter, or station.

Possible Problem: Fault at the transmitting end station
Possible Action:
1. Locate the source of the errors by looking at the module and port statistics.
2. Verify the correct operation of the transceiver or adapter card of the device that is connected to the problem port.
3. If the card appears to be operating correctly, examine the cable and cable connections for breaks or damage.

Possible Problem: Station powering up or down
Possible Action: None required. Early implementations of Ethernet transceivers generate a significant amount of in-band noise when powering up; they frequently cause Alignment Errors and FCS Errors in an otherwise stable network. When powering up, some software drivers for Ethernet controllers also initiate Time Domain Reflectometry (TDR) tests to test the Ethernet media. Network monitors report TDR tests as Alignment Errors and FCS Errors.

Possible Problem: Faulty adapter
Possible Action: Replace the adapter.

2. Determine if the Excessive Collisions tool threshold is being exceeded. Table 17 identifies the problems that this data can indicate and your possible actions.

Table 17. Collisions and Excessive Collisions Data

Possible Problem: Busy network
Possible Action: Use a bridge, router, or switch to reconfigure your network into segments with fewer stations.

Possible Problem: Faulty device (adapter, switch, hub, and the like) that does not listen before broadcasting. This problem increases the incidence of all types of collisions.
Possible Action: Isolate each adapter to see if the problem stops.

Possible Problem: Network loop
Possible Action: Ensure that no redundant connections to the same station have both connections active simultaneously.

3. Determine if the Receive Discards and Transmit Discards tools' thresholds are being exceeded.

If these errors are high in conjunction with the data that you learned in steps 1 and 2, then your network is overloaded. Segment your network.

LANsentry Manager Network Statistics Graph

Use the LANsentry Manager Network Statistics graph to view data for:

• Collisions
• Late Collisions
• Bandwidth Utilization
• CRC Errors
• Too Long Errors
• Too Short Errors

Follow these steps:

1. Display a Network Statistics graph for the local Ethernet segment on which users have reported poor performance.

This graph shows the most recent trend in Collision rates. If you have set up a History sample, you can also look at the historical trend. If a number of segments are connected by repeaters, examine the graph for each Ethernet segment.

2. Analyze Utilization and Collision rates to determine whether collisions are caused by an overloaded segment or a faulty component.

• If Utilization rates are high - The collisions are probably caused by an overloaded segment. If you have added nodes or new applications to your network, consider reconfiguring the cabling system using bridges and routers to filter out remote collisions and to keep local traffic on one segment. This action should level the network load.

• If Utilization rates are stable and appear normal - The collisions are probably caused by faulty components. In this case, do the following:

• If the network consists of repeaters - Compare the Network Statistics graphs for each segment connected to the repeater. Because repeaters repeat traffic across all connected segments (which makes many segments seem like one network), you should see similar levels of traffic on all segments. One segment that shows dissimilar levels of traffic and collisions may indicate faulty hardware. In this case, monitor several collisions to track the source station that is transmitting too soon after collisions, and repair the station. Packets that are transmitted too soon after collisions are unlikely to be valid. See Table 17 for more information about Collisions.

• On other networks - Determine the segment cable length.

3. Examine the CRC Errors and Late Collisions, which often indicate cabling or component problems.

Table 16 identifies the problems that CRC Errors can indicate and your possible actions. Table 18 identifies the problems that Late Collisions data can indicate and your possible actions.

Table 18. Late Collisions Data

Possible Problem: Cabling problems
• Segment too long
• Failing cable
• Segment not grounded properly (noise)
• Improper termination
• Taps too close (10BASE-5 and 10BASE-2 only)
• Noisy cable
Possible Action: Correct the cabling problem by doing one or more of the following:
• Reduce the segment length
• Replace the cable
• Ground the cable
• Terminate the cable correctly
• Check the taps
• Check for cables too close to equipment that emits electromagnetic interference

Possible Problem: Component problems
• Deaf or partially deaf node
• Failing repeater, transceiver, or controller cards
Possible Action: Correct the component problem by doing one of the following:
• Trace the failing component and replace it
• Replace the NIC or the transceiver

4. Trace Too Short Errors and Too Long Errors to the sender.

These errors often indicate faulty routers or LAN drivers and transceiver problems. Table 19 identifies the problems that this data can indicate and your possible actions.

Possible Problem: A transceiver on your network is adding bits to the packets that are transmitted by the attached station.
Possible Action:
1. Use a network analyzer to identify the problematic transceiver.
2. If necessary, replace the transceiver, network adapter, or station.

Possible Problem: The jabber protection mechanism on a transceiver has failed; it can no longer protect the network from the jabbering produced by the attached station.
Possible Action: Replace the network card.

Possible Problem: Excessive noise on the cable. Note: Some 10/100 Mbps cards that autodetect the network speed may connect to the network at the wrong speed, causing excessive noise.
Possible Action: Check for improper cabling, faulty cable, faulty network equipment, or cables too close to noisy electronic equipment (lamps, for example). If your network card autodetects the network speed and you have ruled out other problems, manually configure the network speed.

Possible Problem: Faulty routers (two different network types are connected and the router is not enforcing proper frame size restrictions)
Possible Action: Notify the manufacturer.

Possible Problem: Faulty LAN driver
Possible Action: Replace the driver.

Possible Problem: A normal condition on a LinkSwitch® 1000, LinkSwitch® 3000, or CoreBuilder 5000 FastModule. If you use maximum-sized (1518-byte) Ethernet frames, the device's VLT-enabled ports add a frame tag of 4 bytes, resulting in a misleading Too Long Frame error.
Possible Action: These frames are passed successfully but will create the Too Long Frame error message. If you want to eliminate the error message, reduce your Ethernet packet frames by 4 bytes.
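
The arithmetic behind the misleading Too Long Frame error on VLT-enabled ports can be checked with a short Python sketch. The constants come from the text (1518-byte maximum frame, 4-byte tag); the function names are hypothetical.

```python
MAX_ETHERNET_FRAME = 1518  # bytes, maximum frame size given in the text
VLT_TAG = 4                # bytes added by a VLT-enabled port


def tagged_length(frame_len):
    """Length of a frame after the VLT tag is added."""
    return frame_len + VLT_TAG


def triggers_too_long(frame_len):
    """True if the tagged frame exceeds the normal Ethernet maximum."""
    return tagged_length(frame_len) > MAX_ETHERNET_FRAME


print(tagged_length(1518), triggers_too_long(1518))
print(tagged_length(1514), triggers_too_long(1514))
```

A maximum-sized frame grows to 1522 bytes once tagged, which is why reducing the frame size by 4 bytes removes the error message.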

Device View

Device View allows you to display a variety of port and device-level statistics relevant to Ethernet packet loss. Table 20 describes these statistics and their use in troubleshooting.

Table 20. Activity and Error Statistics in Device View

Statistics Group: Activity
Description: Displays the total network activity and errors on the selected port.
Use in Troubleshooting: This data shows readable packets, broadcast packets, Collisions, total errors, and runts (which cause Too Short Errors). You can interpret this data in the following ways:

• The presence of runts can often be caused by Collisions; however, if the values increase at specific times of the day, it may indicate that you need to change the network topology to manage the traffic more efficiently (for example, with switches or routers).

• Runts can also be caused by a badly terminated coax cable.

• Large numbers of runts not associated with high levels of collisions can indicate a transmission problem (examine the cable).

• Particularly high numbers of Collisions compared to the total number of readable packets can point to a hardware problem (a bad adapter) or to a data loop.

• A high proportion of Broadcast Packets (>10%) on a heavily utilized network (>50% of available bandwidth) can point to an incorrectly configured bridge or router on the network.

Statistics Group: Errors
Description: Displays the number of frames with errors on the selected port.
Use in Troubleshooting: The significance of errors depends on accompanying errors and prevailing network conditions. See the following error data for more information:

• Alignment Errors: Table 16
• FCS Errors: Table 16
• Too Long Errors: Table 19
• Too Short Errors, or runts: Table 19
• Late Collisions: Table 18

To display Activity and Errors statistics for a device or port, follow these steps:

1. Select the required port or device.

2. From the shortcut menu, select Activity or Errors.

The statistics available depend on the type of port or device selected. See Table 20 for troubleshooting information.
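
One of the Table 20 rules of thumb, a Broadcast Packets share above 10 percent on a network running above 50 percent utilization, can be expressed as a short check. This Python sketch assumes you have read the packet counts and the utilization figure from the Activity statistics yourself; the function name is hypothetical.

```python
def broadcast_warning(broadcast_packets, total_packets, utilization_percent):
    """True when broadcasts exceed 10% of packets on a network above 50% utilization."""
    if total_packets <= 0:
        raise ValueError("total_packets must be positive")
    broadcast_share = 100.0 * broadcast_packets / total_packets
    return broadcast_share > 10.0 and utilization_percent > 50.0


print(broadcast_warning(1500, 10000, 65.0))  # 15% broadcasts on a busy network
print(broadcast_warning(1500, 10000, 20.0))  # same share on a lightly used network
```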

Ethernet Packet Loss Reference

This section explains terms that are relevant to Ethernet packet loss and provides additional conceptual and problem analysis detail

Alignment Errors

An Alignment Error indicates a received frame in which both of the following are true:

• The number of bits received is an uneven byte count (that is, not an integral multiple of 8).

• The frame has a Frame Check Sequence (FCS) error.

Alignment Errors often result from MAC layer packet formation problems, cabling problems that cause corrupted or lost data, and packets that pass through more than two cascaded multiport transceivers. See FCS Errors for more information about interpreting Alignment Errors.


Collisions

Collisions indicate that two devices detect that the network is idle and try to send packets at exactly the same time (within one round-trip delay). Because only one device can transmit at a time, both devices must stop sending and attempt to retransmit. Collisions are detected by the transmitting stations.

The retransmission algorithm helps to ensure that the packets do not retransmit at the same time. However, if the two devices retry at nearly the same time, packets can collide again; the process repeats until either the packets finally pass onto the network without collisions, or 16 consecutive collisions occur and the packets are discarded.
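
The retransmission behavior described above can be sketched in Python. The text only states that a packet is discarded after 16 consecutive collisions; the random wait shown here follows the truncated binary exponential backoff of standard IEEE 802.3 Ethernet (wait a random number of slot times from 0 to 2^k - 1, where k is the collision count capped at 10), which is brought in as background rather than taken from this document. The function names and the `attempt_collides` callback are hypothetical.

```python
import random

ATTEMPT_LIMIT = 16  # the packet is discarded after 16 consecutive collisions
BACKOFF_LIMIT = 10  # the backoff exponent stops growing after 10 collisions


def backoff_slots(collision_count, rng=random):
    """Random number of slot times to wait after the given collision."""
    if not 1 <= collision_count < ATTEMPT_LIMIT:
        raise ValueError("collision_count must be between 1 and 15")
    k = min(collision_count, BACKOFF_LIMIT)
    return rng.randrange(2 ** k)


def transmit(attempt_collides, rng=random):
    """Try to send one packet; return (delivered, collisions_seen)."""
    for attempt in range(1, ATTEMPT_LIMIT + 1):
        if not attempt_collides():
            return True, attempt - 1
        if attempt == ATTEMPT_LIMIT:
            break
        backoff_slots(attempt, rng)  # a real station would wait this many slot times
    return False, ATTEMPT_LIMIT


outcomes = iter([True, True, False])    # two collisions, then success
print(transmit(lambda: next(outcomes)))
print(transmit(lambda: True))           # every attempt collides: packet discarded
```

The second call corresponds to one Excessive Collisions count: 16 collisions in a row and a dropped packet.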

CRC Errors

A Cyclic Redundancy Check (CRC) Error is an RMON statistic that combines FCS Errors and Alignment Errors. These errors indicate that packets were received with:

• A bad FCS and an integral number of octets (FCS Errors)
• A bad FCS and a non-integral number of octets (Alignment Errors)

CRC Errors can cause an end station to freeze. If a large number of CRC Errors are attributed to a single station on the network, replace the station's network interface board. Typically, a CRC Error rate of more than 1 percent of network traffic is considered excessive.
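
The two cases that make up the CRC Errors statistic, and the 1 percent rule of thumb, can be written out as a small Python sketch. It assumes the frame has already failed the FCS check and that "1 percent of network traffic" is measured in packets; both the function names and that interpretation are assumptions.

```python
def classify_bad_fcs_frame(bit_count):
    """Classify a frame that failed the FCS check by whether it ends on an octet boundary."""
    return "FCS Error" if bit_count % 8 == 0 else "Alignment Error"


def crc_error_rate_excessive(crc_errors, total_packets, limit_percent=1.0):
    """True if CRC Errors exceed the given percentage of all packets."""
    if total_packets <= 0:
        raise ValueError("total_packets must be positive")
    return 100.0 * crc_errors / total_packets > limit_percent


print(classify_bad_fcs_frame(512))  # integral number of octets
print(classify_bad_fcs_frame(515))  # non-integral number of octets
print(crc_error_rate_excessive(crc_errors=20, total_packets=1000))
```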

Excessive Collisions

Excessive Collisions indicate that 16 consecutive collisions have occurred, usually a sign that the network is becoming congested. For each excessive collision count (that is, after 16 consecutive collisions), a packet is dropped. If you know the normal rate of excessive collisions, then you can determine when the rate of packet loss is affecting your network's performance. See Knowing Your Network's Configuration for more information.

FCS Errors

Frame Check Sequence (FCS) Errors, a type of CRC Error, indicate that frames received by an interface are an integral number of octets long but do not pass the FCS check. The FCS is a mathematical way to ensure that all the frame's bits are correct without having the system examine each bit and compare it to the original. Packets with Alignment Errors also generate FCS Errors.

Both Alignment Errors and FCS Errors can be caused by equipment powering up or down or by interference (noise) on unshielded twisted-pair (10BASE-T) segments. In a network that complies with the Ethernet standard, FCS or Alignment Errors indicate bit errors during a transmission or reception. A very low rate is acceptable. Although Ethernet allows a 1 in 10^8 bit error rate, typical Ethernet performance is 1 in 10^12 or better.
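
How an FCS detects a corrupted frame can be demonstrated with Python's `zlib.crc32`, which computes the same CRC-32 that Ethernet uses for its FCS. This is a simplified sketch: the frame contents are made up, and the helper names and byte layout are assumptions rather than a faithful model of an Ethernet controller.

```python
import zlib


def append_fcs(frame: bytes) -> bytes:
    """Append a 4-byte CRC-32 check value to the frame."""
    return frame + zlib.crc32(frame).to_bytes(4, "little")


def fcs_ok(frame_with_fcs: bytes) -> bool:
    """Recompute the CRC over the body and compare it with the received value."""
    body, fcs = frame_with_fcs[:-4], frame_with_fcs[-4:]
    return zlib.crc32(body).to_bytes(4, "little") == fcs


frame = append_fcs(b"example payload")
corrupted = bytes([frame[0] ^ 0x01]) + frame[1:]  # flip a single bit

print(fcs_ok(frame))
print(fcs_ok(corrupted))
```

The receiver never compares the frame with the original; it only recomputes the check value, which is the point the text makes about the FCS.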

Late Collisions

Late Collisions indicate that two devices have transmitted at the same time, but cabling errors (most commonly, excessive network segment length or repeaters between devices) prevent either transmitting device from detecting a collision. Neither device detects a collision because the time to propagate the signal from one end of the network to the other is longer than the time to put the entire packet on the network. As a result, neither of the devices that cause the late collision senses the other's transmission until the entire packet is on the network.

Although late collisions occur for small packets, the transmitter cannot detect them. As a result, a network suffering measurable Late Collisions for large packets is losing small packets as well.
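
The timing argument can be made concrete with a simplified Python model measured in bit times. It assumes a sender can notice a collision only while it is still transmitting, and it uses the 64-octet minimum frame size (512 bit times) as the boundary after which a collision counts as late. The 600-bit-time round trip is a hypothetical value for an over-long network, and the function names are illustrative.

```python
MIN_FRAME_BITS = 64 * 8  # minimum frame: 64 octets, or 512 bit times on the wire


def transmit_bit_times(frame_octets):
    """Time needed to put the whole frame on the network, in bit times."""
    return frame_octets * 8


def sender_sees_collision(frame_octets, round_trip_bit_times):
    """The sender can detect a collision only if it is still transmitting when it arrives."""
    return transmit_bit_times(frame_octets) >= round_trip_bit_times


def is_late(round_trip_bit_times):
    """A collision arriving after a minimum-size frame has been sent is a late collision."""
    return round_trip_bit_times > MIN_FRAME_BITS


round_trip = 600  # hypothetical round-trip delay on a segment that is too long
print(is_late(round_trip))
print(sender_sees_collision(1518, round_trip))  # large packet: collision is noticed, late
print(sender_sees_collision(64, round_trip))    # small packet: collision goes unnoticed
```

This mirrors the last paragraph above: the large packet reports a Late Collision, while the small packet is lost without the transmitter knowing.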

Nonstandard Ethernet Problems

Table 21 lists the symptoms that typically occur if a system violates the Ethernet standard.

Table 21. Symptoms of Common Ethernet Network Problems

Symptoms: FCS Errors and Alignment Errors increase significantly.
Problem: Network cabling is too long.
Notes: If you use a promiscuous network monitor, the number of Late Collisions reported by stations should correlate with the FCS and Alignment Errors reported by the monitor.

Symptoms: FCS and Alignment Errors increase proportionally with interference (sometimes referred to as noise hits).
Problem: Network segment is noisy.
Notes: Typically observed on a 10BASE-T network segment in a noisy environment. If you use multiple promiscuous monitors, the FCS and Alignment Errors among the monitors will not correlate. If the monitor can track runts, also called Too Short Errors, the number of runt packets should be significantly higher than normal.

Symptoms: FCS and Alignment Errors are much higher than normal.
Problem: Networks do not conform to the access scheme of Carrier Sense Multiple Access with Collision Detect (CSMA/CD).
Notes: Occurs when some implementations of Ethernet in the segment are not entirely compatible with IEEE 802.3 repeaters. Collision fragments linger on the network long enough to collide with retry packets at the minimum interpacket gap (IPG). The IPG is smaller on one side of the repeated network, causing a lost packet. Ethernet controllers cannot receive packets that are separated by 4.7 µs or less. Some controllers cannot sustain reception of packets separated by as much as 9.6 µs. If runt packets are received one after another and are followed by a collision fragment, Ethernet controllers that cannot sustain reception will lose packets.

Receive Discards

Receive Discards indicate that received packets could not be delivered to a high-layer protocol because of congestion or packet errors

Too Long Errors

A Too Long Error indicates that a packet is longer than 1518 octets (including FCS octets) but otherwise well formed. Too Long Errors are often caused by a bad transceiver, a malfunction of the jabber protection mechanism on a transceiver, or excessive noise on the cable.

Too Short Errors

A Too Short Error, also called a runt, indicates that a packet is fewer than 64 octets long (including FCS octets) but otherwise well formed.
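
The two length limits given for Too Long Errors and Too Short Errors can be combined into one check. This Python sketch only looks at the length in octets (including the FCS) and assumes the packet is otherwise well formed; the function name is hypothetical.

```python
MIN_OCTETS = 64    # shorter packets are runts (Too Short Errors)
MAX_OCTETS = 1518  # longer packets are Too Long Errors


def classify_length(octets):
    """Classify a well-formed packet by its length in octets, FCS included."""
    if octets < MIN_OCTETS:
        return "Too Short Error"
    if octets > MAX_OCTETS:
        return "Too Long Error"
    return "valid length"


for length in (60, 64, 1518, 1522):
    print(length, classify_length(length))
```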

Transmit Discards

Transmit Discards indicate that packets were not transmitted because of network congestion

FDDI Ring Errors

Use these sections to identify and correct FDDI ring errors:

• FDDI Ring Errors Overview
• Identifying Ring Errors

Fiber Distributed Data Interface (FDDI) often corrects its own problems. However, because FDDI cannot correct all errors (especially those related to hardware problems), you should monitor FDDI errors.

Understanding the Problem

FDDI ring errors that you should monitor include:

• Elasticity Buffer Error Condition
• Frame Error Condition
• Frames Not Copied Condition
• Link Error Condition

Identifying the Problem

First, determine the type of FDDI ring errors and where they are occurring. Similar to the way you identify other FDDI problems, identify the upstream and downstream neighbors of the devices that you are monitoring.

Several types of network errors can cause FDDI performance problems. For example, problems with cables or physical connections may result in a link or frame error. Elasticity buffer (EB) errors can also lead to link and frame errors.

FDDI deals with port-related errors as follows:

• The variable PORTlerAlarm is the link error rate (LER) value at which a link connection generates an alarm. When the LER is greater than the alarm setting, Station Management (SMT) sends a Status Report Frame (SRF) to notify you that there is a problem with a port.

The PORTlerAlarm threshold is set lower than the PORTlerCutoff threshold so that you are notified of a problem before the port is actually removed from the ring.

• When link errors reach the threshold defined by the variable PORTlerCutoff, SMT breaks the connection, disabling the PHY that detected the problem. A Link Error Condition is also generated.

FDDI deals with MAC-related errors as follows:

• When MAC frame errors reach a certain threshold, a Frame Error Condition is generated. Because the actual error can be further upstream than the immediate connection, the connection remains intact.

• For a large network, the worst-case MACFrameErrorRatio is less than 0.1 percent. However, during network configuration, frame error ratios can reach 50 percent for short periods. When you detect a sustained frame error ratio of more than 0.1 percent, a problem exists between the station that is reporting the condition and the nearest upstream MAC.
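
The two-level port handling and the sustained frame error rule can be sketched as follows. This Python example is a simplification: it treats the link error rate as a plain number where larger means worse, which is an assumption about representation, and the function names and sample values are hypothetical rather than actual SMT behavior.

```python
def port_action(ler, alarm, cutoff):
    """Return what happens to a port for a given link error rate (larger = worse)."""
    if not alarm < cutoff:
        raise ValueError("the alarm threshold must be lower than the cutoff threshold")
    if ler >= cutoff:
        return "cutoff: connection broken, Link Error Condition generated"
    if ler > alarm:
        return "alarm: Status Report Frame sent"
    return "ok"


def sustained_frame_error_problem(ratios_percent, limit_percent=0.1):
    """True only if every sample in the window exceeds the limit (a sustained problem)."""
    return bool(ratios_percent) and all(r > limit_percent for r in ratios_percent)


print(port_action(ler=5e-8, alarm=1e-8, cutoff=1e-7))
print(sustained_frame_error_problem([0.2, 0.3, 0.15]))
print(sustained_frame_error_problem([50.0, 0.05, 0.02]))  # brief spike during configuration
```

Requiring every sample to exceed the limit is one simple way to ignore the short spikes that the text says can occur during network configuration.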

Solving the Problem

To solve problems related to FDDI errors, fix the hardware, cabling, or congestion problem.

Identifying Ring Errors


Use Status Watch to monitor your FDDI devices for Warning or Critical alerts.

Status Watch

Use Status Watch to identify FDDI ring errors.

Follow these steps:

1. Monitor the FDDI Status tool for the currently selected device.

2. Determine whether Status Watch is reporting Elasticity Buffer Errors or a high percentage of Frame Errors, Frames Not Copied, or Link Error Rates for the currently selected device.

FDDI Ring Errors Reference

This section provides additional conceptual and problem analysis detail

Elasticity Buffer Error Condition

The Elasticity Buffer Error condition occurs when a port's elasticity buffer overflows or underflows. This condition usually indicates that a port's hardware is not operating within the tolerances that the FDDI standard specifies. Look for the problem in the hardware of either the port that is reporting the condition or of the immediately adjacent port.

Frame Error Condition

The Frame Error condition occurs when the percentage of frames that contain errors exceeds a preset threshold. When a device is an uplink to FDDI (that is, a device is transmitting onto FDDI), this type of condition indicates that the ring is saturated. The ring is out of buffer space, and packets are being dropped from the device's backbone port.

The problem indicated by the frame errors is usually located between the MAC that reports the condition and its upstream neighbor. Because many physical connections can lie along this path, the MACFrameErrorRatio variable identifies only the two MACs between which the problem is occurring.

Frames Not Copied Condition

The Frames Not Copied condition occurs when the percentage of frames that are dropped because of insufficient buffer space exceeds a preset threshold. This condition indicates that the station is congested and is unable to process frames as quickly as they arrive. To help eliminate congestion:

• Add more capacity to the station
• Reconfigure your network so that end stations that communicate heavily with one another are on the same bridge or switch
• Filter out certain traffic

Link Error Condition

The Link Error condition occurs when a port detects link errors at a rate that exceeds a preset threshold. When the Link Error threshold is exceeded, the station removes itself from the ring and tries to reinsert itself on the ring. This action creates a MAC Neighbor Change Event (which also occurs if a ring wraps).

Link errors may indicate an FDDI PHY hardware problem (such as a faulty transmitter) or a faulty cable or connector. Look for the problem in the portion of the network between the port that is reporting the condition and the first upstream transmitter.

MAC Neighbor Change Event

The MAC Neighbor Change event occurs when a MAC's upstream or downstream neighbor changes.

This event indicates either:

• A network reconfiguration
• Another station that is leaving or joining the ring

Network File Server Timeouts

Use these sections to identify and correct timeouts on network file servers:

• Network File Server Timeout Overview
• Looking for Obvious Errors
• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

See Network File Server Timeouts Reference for additional conceptual and problem analysis detail.

A network file server can time out if your network gets congested or if your server is having problems. Users might have problems downloading data from or to the server or copying files from or to the server. To help you to understand the troubleshooting process for this type of problem, an EXAMPLE throughout this section follows the symptoms, analysis, and resolution of a typical file server timeout problem.


Understanding the Problem

When users log in, their stations make network file server calls either to determine quotas (if this feature has been enabled) or to mount user home directories. The network file server timeout messages, even when spread across multiple nodes, indicate a problem either with the network or with a server.

EXAMPLE: UNIX users notice that it takes a long time - over 30 seconds in some cases - to log in to any machine. Some machines report network file server timeout messages, but the messages have no obvious pattern and are infrequent. You begin to get a sense of the problem.

Identifying the Problem

First, rule out the obvious causes. Ask these questions:

• Can you access the network file server with Telnet?
• Have any alarms been triggered?
• Are there any new errors?

The process of identifying the problem is developed in Looking for Obvious Errors.

Solving the Problem

To determine the cause, reproduce the fault while you monitor the network. After you know the cause, you can fix the problem.

The solutions to the network file server timeout are identified in these sections:

• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

Looking for Obvious Errors

To look for obvious errors, use these applications:

• Ping and Telnet - To check connectivity to the network file server nodes
• LANsentry Manager Alarms View - To search for triggered alarms
• LANsentry Manager Statistics View - To look for errors
• LANsentry Manager History View - To identify trends

63 53

Ping and Telnet

Use Ping and Telnet to determine whether you can contact the network file server nodes. If the response is extremely slow, a problem may exist with the connections to the nodes. No delay indicates that the connections are normal, implying that the delay is occurring elsewhere. In this case, use LANsentry® Manager tools to determine whether packets are being lost or ignored.
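
As a rough illustration of the connectivity check described above, the following Python sketch times how long it takes to open a TCP connection to a server. The function names, the default Telnet port (23), and the one-second "slow" threshold are assumptions chosen for this example; they do not come from this guide or from LANsentry Manager.

```python
import socket
import time


def tcp_connect_time(host, port=23, timeout=5.0):
    """Return the seconds needed to open a TCP connection, or None on failure.

    Port 23 (Telnet) is only an illustrative default.
    """
    start = time.monotonic()
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return None


def describe_connect_time(seconds, slow_threshold=1.0):
    """Turn a measurement into a short hint (the threshold is an assumption)."""
    if seconds is None:
        return "no connection: check cabling, addressing, or the server"
    if seconds > slow_threshold:
        return "slow connection: suspect the path to the node"
    return "normal connection: look for the delay elsewhere"
```

A typical call would be `describe_connect_time(tcp_connect_time("fileserver"))`, where `fileserver` is a placeholder host name.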

LANsentry Manager Alarms View

Using the LANsentry Manager Alarms View, you can determine whether any configured alarms have been triggered.

Search the Alarms View to see if any MAC events have been logged.

EXAMPLE: MAC events have not been logged for the network to which the UNIX users are attached.

Even though no alarms have occurred, errors may exist. For example, a low rate of background errors may exist just below the alarm threshold. Because they are based on maximum and minimum values, RMON alarms may miss constant, periodic, or low levels of errors.

LANsentry Manager Statistics View

Using the LANsentry Manager Statistics View, you can display a multisegment graph of utilization and error statistics.

Follow these steps:

1. Set up a graph that shows utilization and errors on all your major segments.

2. Determine whether any segments are particularly busy or error prone.

EXAMPLE: You notice that one segment of the UNIX network, HUB3, is reporting Too Long Errors and FCS Errors roughly every second sample. While the amount is not higher than normal, it is currently higher than on any other segment.

LANsentry Manager History View

Using the LANsentry Manager History View, you can display a rolling history table to determine whether the errors that you are seeing are new. For example, if you have a history table that runs for 30-minute samples over two days, you can compare the most recent sample to a previous sample, looking for new errors. If your probe has the resources, use a much finer resolution sample stored for a shorter time (every 30 seconds for 2 hours) to more easily spot recent errors.

EXAMPLE: You see that the history table shows that no error rates remained constant throughout the day. However, errors that did occur were on the device HUB3 and were Too Long Errors and FCS Errors.
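
The comparison of a recent sample against earlier samples can be sketched in Python as follows. This is only a minimal illustration: the data layout (one dictionary of error counts per sample) and the function name are assumptions, and LANsentry Manager's own history tables are not accessed this way.

```python
def new_error_types(history, latest):
    """Return error types present in the latest sample but in no earlier one.

    history: list of dicts mapping an error name to its count per sample.
    latest:  dict of the same shape for the most recent sample.
    """
    seen = set()
    for sample in history:
        # Remember every error type that has occurred before.
        seen.update(name for name, count in sample.items() if count > 0)
    return sorted(
        name for name, count in latest.items() if count > 0 and name not in seen
    )
```

For instance, if earlier samples contain only FCS Errors and the latest sample also contains Too Long Errors, the function reports Too Long Errors as new.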

Reproducing the Fault While Monitoring the Network

Although the RMON View in LANsentry Manager can show error rates and help you identify the location of the problem, it may not provide enough data to solve the problem. To determine the cause of the problem, reproduce it while you monitor the network by using these applications:

• LANsentry Manager Top-N Graph - To locate a quiet node to use for reproducing the fault
• LANsentry Manager Packet Capture - To capture packets from the hub to which the quiet node is attached
• LANsentry Manager Packet Decode - To analyze the packets to assess network file server traffic and delays
• Address Tracker - To find the location of the problem nodes

EXAMPLE: Using LANsentry Manager, you find a hub on the network with a higher than normal error rate. However, the error rate does not seem high enough to cause login delays of 60 seconds or more.

LANsentry Manager Top-N Graph

Using the Top-N graph in the LANsentry Manager main window, locate a quiet node that has been showing the same problem. Choose a quiet node so that you do not receive excessive traffic when you try to isolate the problem.

EXAMPLE: You see that the node Monolith, which has the same Network File System (NFS) mounts as the other nodes on the network, is quiet. You decide to use this node for reproducing the fault. See Network File System (NFS) Protocol for more information about NFS.

LANsentry Manager Packet Capture

Using the LANsentry Manager Packet Capture application, capture packets from the network using predefined patterns and start-and-stop conditions.

Follow these steps:

1. Set up a capture buffer on a probe that is connected to the same hub as the quiet node. Until you know more about the problem, set a very general filter.

EXAMPLE: You select a MAC-layer filter and set a conversation filter to capture all packets to and from Monolith.

2. Telnet into and log out from the quiet node. Then reset the capture buffer. Repeat this procedure until you see the problem reflected in your captured data. To keep the buffer information clean, reset the buffer each time that you repeat the procedure.

3. When you see the delay, note the rough value of the packet count on the LANsentry Manager packet buffer.

By noting the packet count at which you think the delay has occurred, you can narrow the problem to within about 20 packets in the buffer. If you have used an extremely quiet node, you may even identify the exact packet.

LANsentry Manager Packet Decode

The LANsentry Manager Packet Decode application decodes all major protocols and displays the packet contents at three levels of detail: summary information, header information, and actual packet content.

Follow these steps:

1. Open the buffer in the Packet Decode application and locate the number of the packet at which the delay occurred.

2. Select the packet and launch a MAC-layer conversation filter. In the filter display, look for a gap in the conversation (that is, where the node sent a request and then resent it after an interval approximately equal to the delay you experienced when recreating the problem).

3. Repeat the test to determine whether the result concentrates on one node or appears on other nodes.

EXAMPLE: On the quiet node that you selected, the delay is obvious. You see an NFS request going out to a node and a repeat of the request 30 seconds later. During that time, the node did not respond. You now know that the delay occurred because nodes were not seeing responses to NFS requests. When you repeat the test on other nodes, you find that the delay is happening with more than one destination node.
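
The search for a gap in a conversation can be sketched in Python. The sketch below assumes a simplified, hypothetical record format of (timestamp in seconds, direction, request ID) tuples; a real capture would first have to be decoded into this form, and the five-second minimum gap is an arbitrary illustrative value.

```python
def find_retry_gaps(packets, min_gap=5.0):
    """Find requests that were resent with no reply seen in between.

    packets: time-ordered list of (timestamp, direction, request_id) tuples,
             where direction is "request" or "reply".
    Returns a list of (request_id, gap_in_seconds) for each long retry gap.
    """
    pending = {}  # request_id -> time the unanswered request was last sent
    gaps = []
    for timestamp, direction, request_id in packets:
        if direction == "request":
            if request_id in pending:
                gap = timestamp - pending[request_id]
                if gap >= min_gap:
                    gaps.append((request_id, gap))
            pending[request_id] = timestamp
        elif direction == "reply":
            # A reply clears the outstanding request.
            pending.pop(request_id, None)
    return gaps
```

A request that is answered promptly produces no entry, while a request that is repeated 30 seconds later with no reply in between is reported with a gap of 30 seconds, matching the pattern in the example above.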

Address Tracker

Use Address Tracker, which polls managed devices, to determine the hubs to which the problem nodes are attached. If the problem end stations are located on unmanaged devices, you can at least narrow the problem to those unmanaged devices.

EXAMPLE: Although your network does not have managed hubs that Transcend® NCS management software can poll, it does have managed switches. When it polls the switches, Address Tracker displays the switch ports on which addresses were last seen. This information indicates the hub (but not the hub port) on which the device is located.

LANsentry Manager Packet Decode

After you know the location of the hub that has the problem node, monitor the problem from the hub using LANsentry Manager Packet Decode.

Follow these steps:

1. To capture packets from one of the nodes on the hub, set up another capture buffer and repeat the exercise that is described in LANsentry Manager Packet Capture. Because a delay may occur on a different node, use two capture buffers without stopping the first one.

Note the rough packet count where the delay appears.

2. Display a conversation filter of the packet where the delay appears and look for the gap in the conversation.

EXAMPLE: You hope that the nodes are on the same hub. You find that all the nodes are on HUB3. This result indicates that FCS Errors may be causing the timeouts. However, because the errors occur at a low rate, you decide to verify this diagnosis. You monitor the problem from the hub, logging in and out many times, and the delay eventually occurs. This time, the delay shows that the node's reply had an FCS Error, even though the node received the request. The switch would not have transmitted this packet, causing a timeout on the NFS protocol. The retry time is presumably 30 seconds. During this test, you see the problem occurring on another node.

Correcting the Fault

Without a managed hub, you may find it very difficult to discover the source of network file server timeout errors. To find the problematic node, you must either systematically isolate nodes by monitoring each node for a prolonged period or temporarily insert a managed hub.

EXAMPLE: You notice that the captured error packet failed FCS because it was corrupted by a regular pattern during transmission. A possible reason for this occurrence is a jabbering node. This explanation makes sense because FCS/Jabber frames increased linearly when you were monitoring the live network.

Network File Server Timeouts Reference

This section explains terms that are relevant to network file server timeouts and provides additional conceptual and problem analysis detail.

Jabbering

Jabbering occurs when a node transmits illegal-length packets and is possibly not operating within carrier specifications. In effect, another node has written bad data over a valid packet. This bad data is often interpreted as a repeated sequence of data.

Network File System (NFS) Protocol

A distributed file system protocol developed by Sun Microsystems that allows a computer system to access files over a network as if they were on its local disks. This protocol has been incorporated into products by more than 200 companies. It is now a de facto Internet standard.

NFS is one protocol in the NFS suite of protocols, which includes NFS, RPC, XDR (External Data Representation), and others. These protocols are part of a larger architecture that Sun Microsystems refers to as Open Network Computing (ONC). ONC is a distributed applications architecture designed by Sun and currently controlled by a consortium led by Sun.

Troubleshooting TCP/IP - Detailed Steps

This article shows how to troubleshoot TCP/IP connectivity between computers on a Windows network. If you haven't already done so, disable XP's Internet Connection Firewall on all local area network connections and remove all firewall programs on the network. Improperly configured firewalls are the most common cause of TCP/IP problems.

Open a Command Prompt Window

For many of these steps, you'll be typing at the command prompt. To open a command prompt window in Windows 2000 or XP, click Start | Run, type cmd in the box, and click OK. To open a command prompt window in Windows 95, 98, or Me, click Start | Run, type command in the box, and click OK. Type one command per line and press Enter after each one to execute it. To close the command prompt window, use the exit command.

Determine the TCP/IP Settings

Determine the TCP/IP settings of each computer on the local area network. In XP, open the Network Connections folder, right-click the LAN connection, and click Status | Support | Details. For example, here are the Status and Details views for the LAN connection on an Internet Connection Sharing host.

In Windows 95/98/Me, click Start | Run, type winipcfg in the box, and click OK. Select the LAN adapter from the menu and click More Info. Here's the winipcfg view for an ICS client running Windows Me.

You can also see the TCP/IP settings from the command prompt. This is especially convenient if a computer has more than one network adapter. Use the ipconfig /all command, which is available in all versions except Windows 95. The output from this command can be long, so it's best to write it to a file. Specify the file name in the command this way:

ipconfig /all >ipconfig.txt

Description of TCP/IP Settings

Here are the TCP/IP settings that are used in network troubleshooting:

• IP Address - Unique address assigned to a network adapter. A computer with multiple network adapters has an IP address for each one, and each one must be in a different subnet.

• Subnet Mask - Used in conjunction with the IP address to determine which subnet an adapter belongs to. At the simplest level, communication is only possible between two network adapters when they're in the same subnet.

• Default Gateway - IP address of a computer or router on one of this computer's local area networks that knows how to communicate with subnets not present on this computer. For an Internet connection, the default gateway is a router belonging to your Internet service provider, and all access to sites on the Internet goes through it. For an ICS client, the default gateway is the ICS host. If you use a hardware router, it serves as the default gateway.

• DHCP Server - If an adapter is configured to obtain an IP address automatically, this is the address of the server that provides it. It could be your ISP, an ICS host, or a hardware router.

• DNS Servers - IP address of one or more Domain Name Server computers. DNS servers translate Internet names to their IP addresses (like 63.146.109.227).
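
To make the role of the subnet mask concrete, here is a small Python sketch that checks whether two adapters fall in the same subnet. It uses the standard `ipaddress` module; the function name and the sample addresses in the usage note are illustrative assumptions.

```python
import ipaddress


def same_subnet(ip_a, ip_b, netmask):
    """Return True if both addresses fall in the same subnet for this mask."""
    # strict=False lets us pass a host address and have the network derived.
    net_a = ipaddress.ip_network(f"{ip_a}/{netmask}", strict=False)
    net_b = ipaddress.ip_network(f"{ip_b}/{netmask}", strict=False)
    return net_a == net_b
```

With a mask of 255.255.255.0, `same_subnet("192.168.1.101", "192.168.1.123", "255.255.255.0")` is true, while replacing the second address with 192.168.2.5 makes it false.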

Subnets

See our article on subnets for a brief description of how they work. For more details, see this Microsoft Knowledge Base article.

If two computers are supposed to be on the same subnet but aren't, something is wrong with the network hardware or software configuration. This is most likely to happen when one of them receives an IP address of 169.254.x.x, which indicates that:

• It's configured to obtain an IP address automatically.
• It couldn't find a DHCP server on the network to make the assignment.
• Windows assigned it an Automatic Private IP Address.

See our article on Specific Networking Problems and Their Solutions for more information.
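
An automatically assigned private address is easy to recognize in code. The following Python sketch, with an assumed function name, tests whether an address lies in the 169.254.0.0/16 range described above.

```python
import ipaddress

# The Automatic Private IP Address range discussed in this section.
APIPA_NETWORK = ipaddress.ip_network("169.254.0.0/16")


def is_apipa(address):
    """Return True if the address is in the 169.254.x.x range."""
    return ipaddress.ip_address(address) in APIPA_NETWORK
```

A result of true for a computer that is supposed to receive its address from a DHCP server suggests that it could not reach that server.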

Pinging

The ping command is the basic tool for testing TCP/IP connectivity. It sends a special packet (called ICMP Echo) to a particular IP address and looks for a reply. If everything is working right, the reply comes back. If not, the ping times out in a few seconds. By default, the ping command repeats the process four times. Here's an example of an ICS client computer pinging a Windows XP Home Edition ICS host, using the host's IP address and its computer name.

When ping fails, you'll see one of these error messages:

• Request timed out - The IP address is valid, but there's no reply from it. If the IP address is on a local area network, the most likely cause is a firewall program blocking the ping.

• Unknown host <name> or Ping request could not find host <name> - The computer name doesn't exist on the local area network. Make sure that NetBIOS over TCP/IP is enabled.

• Destination host unreachable - The IP address isn't on a local area network, and the default gateway can't access it. Either there's no default gateway, its address is wrong, or it isn't functioning.

Pinging the Local Area Network

Here is a series of ping commands to use in finding where a problem occurs on a local area network. Run them in the order shown, and don't go on to the next command until all of the previous commands work properly. In this example:

• The computer being tested is named Winxp, with IP address 192.168.1.101.
• There's another computer on the network named Win98, with IP address 192.168.1.123.

Substitute the appropriate IP addresses and computer names for your network.

Command               Target                          What Ping Failure Indicates
ping 127.0.0.1        Loopback address                Corrupted TCP/IP installation
ping localhost        Loopback name                   Corrupted TCP/IP installation
ping 192.168.1.101    This computer's IP address      Corrupted TCP/IP installation
ping winxp            This computer's name            Corrupted TCP/IP installation
ping 192.168.1.123    Another computer's IP address   Bad hardware or NIC driver
ping win98            Another computer's name         NetBIOS name resolution failure

To fix a corrupted TCP/IP installation on Windows XP, follow the steps in this Microsoft Knowledge Base article. For Windows 95/98/Me, uninstall the TCP/IP protocol in Control Panel | Network, reboot, and reinstall it. If that doesn't fix it, use this procedure on Windows 95 or 98.
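
The ordered ping sequence in this section, where you stop at the first command that fails, can be sketched in Python. The helper names are assumptions made for this illustration. The sketch calls the system ping command once per target and treats a non-zero exit code as failure; exit codes can differ between operating systems and ping versions, so treat the result as a hint, not a diagnosis.

```python
import platform
import subprocess


def ping_once(target):
    """Send one echo request with the system ping; True if it exits with 0."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    try:
        result = subprocess.run(
            ["ping", count_flag, "1", target],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except OSError:
        # The ping program itself could not be started.
        return False
    return result.returncode == 0


def first_failure(steps, probe=ping_once):
    """Run the steps in order and return the first (target, meaning) that fails.

    steps: list of (target, what_a_failure_indicates) pairs, in test order.
    Returns None when every target answers.
    """
    for target, meaning in steps:
        if not probe(target):
            return target, meaning
    return None
```

You could pass the rows of the table above as `steps`, for example `[("127.0.0.1", "Corrupted TCP/IP installation"), ("192.168.1.123", "Bad hardware or NIC driver")]`, substituting the addresses and names for your own network.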

Pinging the Internet

You can also use ping to find a problem with Internet access. Run these commands in the order shown, and don't go on to the next command until all of the previous commands work properly. Use the Default Gateway and DNS Server addresses that you got from the winipcfg or ipconfig /all command.

Command                  Target                What Ping Failure Indicates
ping w.x.y.z             Default Gateway       Default Gateway down
ping w.x.y.z             DNS Server            DNS Server down
ping w.x.y.z             Web site IP address   Internet service provider or web site down
ping www.something.com   Web site name         DNS Server down or web site down

REFERENCE

WWW.PRACTICALLYNETWORK.COM

WWW.COMPUTERHOPE.COM

WWW.WIKIPEDIA.ORG

Monitoring FDDI Connections

Monitor your FDDI devices for Warning or Critical alerts in the FDDI Status tool.

Status Watch

Use Status Watch to identify these FDDI connectivity errors:

• Peer Wrap Condition
• Twisted Ring Condition
• Undesired Connection Attempt Event

Follow these steps:

1. In the Device area, select the device that is located where you suspect an FDDI ring connectivity problem.

2. Monitor the FDDI Status tool for the currently selected device.

Here are some pointers for monitoring:

• If the Peer Wrap Configuration State variable is Isolated, the device is not connected to the FDDI trunk ring. If you intend the device to remain isolated, this indication is not a serious condition. However, if the device is supposed to be connected to a trunk ring, a serious problem may exist: the device is no longer transmitting packets to the larger trunk ring.

• If the Peer Wrap flag (SMTPeerWrapFlag) is set, the device is one of the wrap points. The cause of the wrapped ring is somewhere in the portion of the network between the two stations that report the peer wrap condition.

Making Your FDDI Connections More Resilient

When devices are removed from an FDDI ring, there is a break in the fiber path that causes the ring to wrap until the ring is made whole again. To prevent the break in the FDDI connection, you can implement dual homing or install an Optical Bypass Unit (OBU).

Implementing Dual Homing

When the operation of a dual attachment node is critical to your network, dual homing adds reliability by providing a backup connection if the primary link fails. Because a dual attachment station (DAS) has two attachments to the FDDI ring (A-to-M and B-to-M), you can use one of them as a standby link if the active link fails. With dual homing, only one of the two attachments is active at a time. In this sense, a DAS acts as if it is a single attachment station (SAS) by using its A port as the standby link.

Through SMT, a DAS can be dual homed to the same concentrator or, more commonly, to two concentrators. This arrangement provides a more stable trunk ring of concentrators. If one concentrator fails, the DAS enables the standby link to another concentrator to become the active link. See Figure 14.

If the station is a dual path or dual path/dual MAC station, you can configure the dual-homed station in one of two ways:

• With both links active
• With one link active and one connection withheld as a backup, only becoming active when one link fails

Figure 14 Dual Homing Configuration

Installing an Optical Bypass Unit

You can insert an Optical Bypass Unit (OBU) into the FDDI ring as if it were a node and then plug your device into it. To use an OBU, your device needs an optical bypass interface. This interface lets the bypass know whether your device is still on the ring. See Figure 15.

If your device is removed or if it fails, the bypass unit diverts the optical path away from your device, keeping the ring whole. You can use a bypass on devices that are prone to failure or are likely to be removed often, such as diagnostic equipment.

Figure 15 Optical Bypass Unit Configuration

FDDI Connectivity Reference

This section explains terms that are relevant to FDDI connectivity and provides additional conceptual and problem analysis detail.

Peer Wrap Condition

A Peer Wrap (wrapped ring) condition occurs when a dual-attachment station detects a fault (often a lost connection) and reconfigures the network by wrapping the dual trunk rings to form a single ring. Normally, the two stations that are adjacent to the fault wrap to maintain full connectivity. However, if a second fault occurs before the first is repaired, the network partitions itself into two or more rings, and stations lose connectivity.

When a station reports a Peer Wrap condition, locate and repair the problem that caused the station to wrap the rings. Potential causes include:

• Faulty FDDI port hardware
• Faulty cables or connectors
• Unplugged connectors
• Powered-down stations

You can expect to find the cause of the problem between the two stations that report the Peer Wrap condition.

Twisted Ring Condition

A Twisted Ring condition occurs when certain undesirable connection types exist. See Table 7 for more information. Although similar to the Undesired Connection Attempt, the Twisted Ring condition provides specific Station Management (SMT) and port information for diagnosis.

Undesired Connection Attempt Event

An Undesired Connection Attempt event occurs when a port tries to connect to another port of a type that may result in an undesirable network topology. Whether the connection attempt is successful depends on the current setting of the station's connection policies.

Table 7 lists connections that the FDDI standard defines as undesirable. The managed devices may or may not permit these connections, depending on their FDDI station configurations.

Table 7 Undesirable Connection Types

Connection Type   Reason That the Connection Is Undesirable
A-A               Twisted primary and secondary rings
A-S               A wrapped ring
B-B               Twisted primary and secondary rings
B-S               A wrapped ring
S-A               A wrapped ring
S-B               A wrapped ring
M-M               A tree of rings topology (illegal connection)

Table 8 lists FDDI connections that create valid topologies.

Table 8 Valid Connection Types

Connection Type   Reason That the Connection Is Valid
A-B               A normal trunk peer connection
A-M               A tree connection with possible redundancy. In a single MAC node, Port B has precedence (by default) for connecting to a Port M
B-A               A normal trunk ring peer connection
B-M               A tree connection with possible redundancy. In a single MAC node, Port B has precedence (by default) for connecting to a Port M
S-S               A single ring of two slave stations
S-M               A normal tree connection
M-A               A tree connection that provides possible redundancy
M-B               A tree connection that provides possible redundancy
M-S               A normal tree connection
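
The connection types in Table 7 and Table 8 lend themselves to a simple lookup. The Python sketch below encodes both tables as dictionaries and classifies a pair of port types; the function and variable names are assumptions for illustration, and the reason strings are shortened from the tables.

```python
# Connection types listed in Table 7 (undesirable) and Table 8 (valid).
UNDESIRABLE = {
    "A-A": "Twisted primary and secondary rings",
    "A-S": "A wrapped ring",
    "B-B": "Twisted primary and secondary rings",
    "B-S": "A wrapped ring",
    "S-A": "A wrapped ring",
    "S-B": "A wrapped ring",
    "M-M": "A tree of rings topology (illegal connection)",
}

VALID = {
    "A-B": "A normal trunk peer connection",
    "A-M": "A tree connection with possible redundancy",
    "B-A": "A normal trunk ring peer connection",
    "B-M": "A tree connection with possible redundancy",
    "S-S": "A single ring of two slave stations",
    "S-M": "A normal tree connection",
    "M-A": "A tree connection that provides possible redundancy",
    "M-B": "A tree connection that provides possible redundancy",
    "M-S": "A normal tree connection",
}


def classify_connection(local_port, remote_port):
    """Return ("undesirable" or "valid", reason) for a pair of port types."""
    key = f"{local_port}-{remote_port}".upper()
    if key in UNDESIRABLE:
        return "undesirable", UNDESIRABLE[key]
    if key in VALID:
        return "valid", VALID[key]
    raise ValueError(f"unknown FDDI port types: {key}")
```

Together the two tables cover all sixteen combinations of the A, B, S, and M port types, so any other input is rejected as an unknown port type.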

Disabling the Offending Interface

Because broadcast storms can ultimately cause your whole network to become unavailable, take action immediately to disable the offending interface. You can enable the interface again after you have corrected the problem.

Address Tracker

Use Address Tracker to locate the interface that is causing the broadcast storm. Use Device View to disable the port.

Follow these steps:

1. In the Find Address window, enter the address of the interface that seems to be receiving the broadcast traffic.

You can copy the MAC or IP address from the Status Watch report and paste it into Address Tracker's Enter the Address You Want to Find field.

2. Click Find Now.

Search displays the device name.

3. Use Transcend Central to launch Device View and disable the port.

Disabling the port stops the broadcast storm before it interferes with all vital network traffic. You can re-enable this interface later using Device View or the device's console.

Correcting Spanning Tree Misconfigurations

Spanning Tree does not cause broadcast storms, but a loop in your Spanning Tree topology can create data that looks like a storm. A loop can occur in your topology if:

• Someone disables Spanning Tree on a port
• You set up your Spanning Tree configuration incorrectly

Device View

Use Device View to disable any Spanning Tree port that has a repeater attached to it and to correct Spanning Tree misconfigurations.

To correct Spanning Tree misconfigurations, use Device View to disable Spanning Tree Protocol (STP) for a port on a SuperStack® II Switch 1000, Switch 3000, Switch 3000 10/100, Switch 9000SX, Desktop Switch, LinkBuilder® FMS II Bridge/Management Module, or CoreBuilder 6000.

To disable the STP port state for a port on a SuperStack II switch:

1. Select a port and click the right mouse button.

2. From the shortcut menu, select Configure.

3. In the Port section, click the STP tab.

4. From the STP Port State list box, select Disabled.

5. Click Apply.

To disable the STP port state for a port on a LinkBuilder FMS II Bridge/Management Module:

1. Double-click the module.

2. From the shortcut menu, select Configure Bridge.

3. In the Port section, click the STP tab.

Broadcast Storms Reference

This section explains terms that are relevant to broadcast storms and provides additional conceptual and problem analysis detail.

Broadcast Packets

Broadcast packets, which are a normal part of network operation, are transmitted by a device to a broadcast address. For example, IP networks use broadcasts to resolve network addresses using Address Resolution Protocol (ARP). IPX networks use a large number of broadcast packets to operate most effectively.


Problems arise when broadcast packets endlessly propagate throughout the network, which increases the traffic volume on your network and the CPU time that each host spends processing and discarding unwanted broadcast packets.

Multicast Packets

Multicast packets, which are a normal part of network operation, are transmitted by a device to a multicast group address. Hosts that want to receive the packets indicate that they want to be members of the multicast group, and then multicast packets are distributed to that group. For example, multicast packets support the Spanning Tree Protocol. Multicast applications and underlying multicast protocols control multimedia traffic and shield hosts from processing unnecessary broadcast traffic. However, multicast traffic can also cause storms that saturate your network.
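As an illustration of the distinction between broadcast and multicast destinations, the following minimal Python sketch classifies a destination MAC address written in canonical (Ethernet-style) notation. The function name is hypothetical and is not part of any tool described in this guide; the sketch relies only on two well-known conventions: the all-ones address is the broadcast address, and the least significant bit of the first octet marks a group (multicast) address.

```python
def classify_destination(mac: str) -> str:
    """Classify a destination MAC address as broadcast, multicast, or unicast.

    Hypothetical helper for illustration; accepts ':' or '-' separators.
    """
    octets = bytes(int(part, 16) for part in mac.replace("-", ":").split(":"))
    if len(octets) != 6:
        raise ValueError(f"expected 6 octets, got {len(octets)}")
    if octets == b"\xff" * 6:
        return "broadcast"
    # The least significant bit of the first octet is the group bit.
    if octets[0] & 0x01:
        return "multicast"
    return "unicast"


if __name__ == "__main__":
    # 01:80:C2:00:00:00 is the bridge group address used by Spanning Tree.
    for address in ("ff:ff:ff:ff:ff:ff", "01:80:c2:00:00:00", "00:a0:24:12:34:56"):
        print(address, classify_destination(address))
```

Counting how many captured frames fall into each class gives a rough picture of whether broadcast or multicast traffic dominates a segment.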

Duplicate Addresses

Use these sections to identify and correct problems caused by duplicate MAC and IP addresses:

• Duplicate Addresses Overview
• Finding Duplicate MAC Addresses
• Finding Duplicate IP Addresses

See Duplicate Addresses Reference for additional conceptual and problem analysis detail.

Networks sometimes generate duplicate MAC and IP addresses. Because duplicate addresses can cause problems with packet delivery, resolve them as soon as possible.

Understanding the Problem

Duplicate MAC addresses are caused by data link layer problems with Fiber Distributed Data Interface (FDDI) media and the passing of tokens on the FDDI ring. Duplicate IP addresses are caused by network layer problems. See these sections for more information about causes of duplicate addresses:

• Duplicate MAC Addresses
• Duplicate IP Addresses

Identifying the Problem

Identify duplicate MAC and IP addresses by following the instructions in these sections:

• Finding Duplicate MAC Addresses
• Finding Duplicate IP Addresses

Solving the Problem

Identify the cause of the duplicate address (such as user error or a hardware problem) and fix the problem if possible.

Finding Duplicate MAC Addresses

To find out if duplicate MAC addresses are occurring, monitor your network using Status Watch.

Status Watch

The Status Watch FDDI Status tool identifies duplicate FDDI MAC addresses, and Status Watch reports when two or more MACs on the same ring have the same MAC address (a Duplicate Address condition).

Follow these steps:

1. In the Status Watch Summary View window, determine if any FDDI Status conditions are reported. If there are, double-click the table cell value to display the Device List window.

Another approach is to examine only the devices that you know reside on your FDDI ring. In the Status Watch main window, red device icons indicate that a threshold has been exceeded.

2. Select a device.

• If you selected the device from the Device List window, the real-time report for that device appears in the Status Watch main window.

• If you selected the device from the main window, also select the FDDI Status tool to view the real-time report.

3. Determine if a Duplicate Address condition caused the FDDI Status tool to trigger a Critical or Warning status for that device.

In Status Watch, you can specify the status severity level to apply to a Duplicate Address condition.

Finding Duplicate IP Addresses


To find out if duplicate IP addresses are occurring, monitor your network using these applications:

• Address Tracker - To find duplicate IP addresses on 3Com devices and their attached networks

• LANsentry® Manager - To find duplicate IP addresses that are collected by probes gathering RMON2 SmartAgent® data from the Enterprise Communications Analysis Module (ECAM) downloaded on your network devices

Address Tracker

Use Address Tracker to determine when and where duplicate IP addresses occur.

Follow these steps:

1. From the Find Address menu, select Find Duplicate IP Addresses.

2. Click Find Now to start your search.

LANsentry Manager

Use the Duplicates table in LANsentry Manager to compile a list of all stations with duplicate IP addresses. This table is available only on probes that have downloaded RMON2 (ECAM) SmartAgent software.

Follow these steps:

1. From the LANsentry Manager Address Map menu, select Duplicates. Address Map data is displayed as a table.

2. To export the contents of the table, click Export to launch the Data Export dialog box.

Duplicate Addresses Reference

This section explains terms that are relevant to duplicate addresses and provides additional conceptual and problem analysis detail.


Duplicate MAC Addresses

Each device on your network has a unique MAC address. This address identifies a single device on the network, allowing packets to be delivered to correct destinations.

Packets are delivered to their destinations by means of the MAC-address-to-IP-address translation that the Address Resolution Protocol (ARP) provides. Therefore, if MAC addresses are duplicated on the network, ARP caches of routing devices contain erroneous destinations. In FDDI, devices monitor network traffic, searching for their own MAC address in each packet to determine whether to decode the packet. If MAC addresses are not unique, two stations cannot be distinguished from each other.

Duplicate MAC addresses can occur for the following reasons:

• Someone has manually configured a MAC address for a device instead of using the address that the vendor supplied or allowing it to be assigned dynamically, and this address is also assigned to a different device.

• In rare circumstances, loops in a bridged network can cause a MAC hardware problem or an address learning problem that creates a duplicate MAC address entry in the bridging address table.

• On DECnet Phase 4 networks, MAC addresses are set from the DECnet address. A duplicate NET address can cause a duplicate MAC address.
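To see why a duplicate DECnet address produces a duplicate MAC address, it helps to look at how the MAC address is derived. The sketch below follows the commonly documented DECnet Phase IV convention (the fixed prefix AA-00-04-00 followed by the 16-bit value area × 1024 + node, low-order octet first). The function name is hypothetical and the code is an illustration, not part of any tool described here.

```python
def decnet_phase4_mac(area: int, node: int) -> str:
    """Derive the MAC address that a DECnet Phase IV node assigns itself.

    Hypothetical helper: area is 1-63, node is 1-1023.
    """
    if not (1 <= area <= 63 and 1 <= node <= 1023):
        raise ValueError("area must be 1-63 and node must be 1-1023")
    value = (area << 10) | node          # 6-bit area, 10-bit node
    low, high = value & 0xFF, value >> 8  # stored low-order octet first
    return f"AA-00-04-00-{low:02X}-{high:02X}"


if __name__ == "__main__":
    # Two stations configured with the same DECnet address end up
    # with the same MAC address.
    print(decnet_phase4_mac(1, 1))
    print(decnet_phase4_mac(5, 3))
```

Because the MAC address is a pure function of the DECnet address, any two stations configured with the same area and node numbers collide at the data link layer as well.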

Duplicate IP Addresses

Because IP addresses are critical for transmission of packets on TCP/IP networks, resolve duplicate IP addresses immediately.

Duplicate IP addresses can occur when someone has configured an IP address that is identical to an IP address that is assigned to a different device. Address assignments, although possible for you to configure manually, are usually made using one of these protocols:

• Dynamic Host Configuration Protocol (DHCP) - Allows your network to dynamically assign IP addresses to nodes. With this protocol, a DHCP server temporarily assigns an IP address to a node, or you can statically configure addresses as needed.

• BOOTstrap protocol (BootP) - Allows you to statically assign IP addresses to nodes. This protocol is more efficient than RARP.

• Reverse ARP (RARP) - Allows you to statically assign IP addresses to nodes. However, because this protocol relies on the MAC address to identify the node, you cannot use it on networks that dynamically assign hardware addresses.
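The idea behind a duplicate IP address search can be sketched in a few lines of Python: collect (IP address, MAC address) pairs from any source, then report every IP address that is claimed by more than one MAC address. The function name and the sample data are hypothetical, and this is not a description of how Address Tracker or LANsentry Manager work internally.

```python
from collections import defaultdict


def find_duplicate_ips(observations):
    """Return {ip: [macs]} for every IP address seen with more than one MAC.

    observations is an iterable of (ip, mac) string pairs, for example
    rows exported from an ARP cache (an assumption of this sketch).
    """
    macs_by_ip = defaultdict(set)
    for ip, mac in observations:
        macs_by_ip[ip].add(mac.lower())
    return {ip: sorted(macs) for ip, macs in macs_by_ip.items() if len(macs) > 1}


if __name__ == "__main__":
    sample = [  # made-up data for illustration
        ("192.0.2.10", "00:A0:24:00:00:01"),
        ("192.0.2.11", "00:a0:24:00:00:02"),
        ("192.0.2.10", "00:a0:24:00:00:03"),
    ]
    for ip, macs in find_duplicate_ips(sample).items():
        print(ip, "is claimed by", ", ".join(macs))
```

MAC addresses are lowercased before comparison so that the same adapter reported in different letter cases is not mistaken for two stations.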


Ethernet Packet Loss

Use these sections to identify and correct Ethernet packet loss:

• Ethernet Packet Loss Overview
• Searching for Packet Loss

See Ethernet Packet Loss Reference for additional conceptual and problem analysis detail.

If your Ethernet network shows signs of congestion, it may be experiencing packet loss. When your network is congested, utilization is usually high, packets are discarded because buffers are full, and collision rates are up. Problems related to Collisions are often at the heart of packet loss.

Understanding the Problem

Collisions are normal in Ethernet networks. In many cases, Collision rates of 50 percent do not cause a large decrease in throughput. The Collision rate helps mark the upper limit on your network (the maximum percentage of collisions that your network can bear), which is usually around 70 percent. If Collisions increase above this upper limit, your network can become unreliable.

When the Collision rates increase, so do Excessive Collisions, which causes a delay in transmitting data. An increase in Collisions also indicates that network utilization and network errors, such as FCS Errors, are probably increasing.

The real packet problems to watch for, however, are undetected collisions that show up as Late Collisions.

Identifying the Problem

To identify that your network's problem is related to packet loss, verify that frames are being dropped on your network by examining this packet loss data:

• Alignment Errors
• Collisions
• Excessive Collisions
• FCS Errors
• CRC Errors
• Late Collisions
• Receive Discards
• Too Long Errors
• Too Short Errors
• Transmit Discards


The process of identifying the problem is discussed in Searching for Packet Loss.

Solving the Problem

If you notice that packet loss data is consistently high, then your network is too congested. In this case, segment your network with the appropriate network device (such as a switch or router). If Collision data shows increases but your network's utilization is the same, then your network may have a physical problem, such as cabling that is too long. Other problems that packet loss data can indicate include:

• Faulty connectors or improper cabling
• Excessive numbers of repeaters between network devices
• Defective Ethernet transceivers or controllers

Possible solutions to these problems are explained in the procedures in Searching for Packet Loss.
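The rule of thumb above (rising collisions with rising utilization suggest congestion, rising collisions with flat utilization suggest a physical problem) can be written as a small decision function. This is only a sketch: the function name, the return strings, and the 5 percent tolerance used to decide whether a value has "increased" are assumptions made for illustration, not values taken from this guide.

```python
def diagnose_collisions(util_before, util_after, coll_before, coll_after,
                        tolerance=0.05):
    """Suggest a likely cause for a change in collision counts.

    util_* are utilization fractions (0.0-1.0) and coll_* are collision
    counts for two comparable sampling periods. tolerance is a
    hypothetical margin below which a change is treated as noise.
    """
    collisions_up = coll_after > coll_before * (1 + tolerance)
    utilization_up = util_after > util_before * (1 + tolerance)
    if not collisions_up:
        return "no significant collision increase"
    if utilization_up:
        return "congestion"   # consider segmenting the network
    return "physical"         # check cabling, connectors, repeaters, transceivers


if __name__ == "__main__":
    print(diagnose_collisions(0.30, 0.60, 100, 400))
    print(diagnose_collisions(0.30, 0.30, 100, 400))
```

In practice the two sampling periods should cover comparable times of day, because normal load varies.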

Searching for Packet Loss

When you look for packet loss, use the following applications:

• Status Watch - For Ethernet and MIB-II data collection using SNMP polling
• LANsentry Manager Network Statistics Graph - For RMON data collection using an RMON probe
• Device View - On a per-device basis, you can evaluate statistics for any port on the device

Status Watch

Status Watch monitors:

• Alignment Errors
• Excessive Collisions
• FCS Errors
• Receive Discards
• Transmit Discards

Follow these steps:

1. Determine if the thresholds for the Alignment Errors tool and FCS Errors tool are being exceeded.


Table 16 identifies the problems that this data can indicate and your possible actions. For information about problems related to a nonstandard Ethernet implementation, see Nonstandard Ethernet Problems.

Table 16 Alignment Errors, FCS Errors, and CRC Errors Data

Possible Problem: Faulty cabling
Possible Action: Examine the cable and cable connections for breaks or damage.

Possible Problem: Network noise
Possible Action: Look for improper cabling, faulty cable, faulty network equipment, or cables that are too close to equipment that emits electromagnetic interference (lamps, for example).

Possible Problem: Faulty transceiver
Possible Action: Use an analyzer to identify the problematic transceiver. If necessary, replace the transceiver, network adapter, or station.

Possible Problem: Fault at the transmitting end station
Possible Action:

1. Locate the source of the errors by looking at the module and port statistics.

2. Verify the correct operation of the transceiver or adapter card of the device that is connected to the problem port.

3. If the card appears to be operating correctly, examine the cable and cable connections for breaks or damage.

Possible Problem: Station powering up or down
Possible Action: None required.

Early implementations of Ethernet transceivers generate a significant amount of in-band noise when powering up; they frequently cause Alignment Errors and FCS Errors in an otherwise stable network.

When powering up, some software drivers for Ethernet controllers also initiate Time Domain Reflectometry (TDR) tests to test the Ethernet media. Network monitors report TDR tests as Alignment Errors and FCS Errors.

Possible Problem: Faulty adapter
Possible Action: Replace the adapter.

2. Determine if the Excessive Collisions tool threshold is being exceeded. Table 17 identifies the problems that this data can indicate and your possible actions.

Table 17 Collisions and Excessive Collisions Data

Possible Problem: Busy network
Possible Action: Use a bridge, router, or switch to reconfigure your network into segments with fewer stations.

Possible Problem: Faulty device (adapter, switch, hub, and the like) that does not listen before broadcasting. This problem increases the incidence of all types of collisions.
Possible Action: Isolate each adapter to see if the problem stops.

Possible Problem: Network loop
Possible Action: Ensure that no redundant connections to the same station have both connections active simultaneously.


3. Determine if the Receive Discards and Transmit Discards tools' thresholds are being exceeded.

If these errors are high in conjunction with the data that you learned in steps 1 and 2, then your network is overloaded. Segment your network.

LANsentry Manager Network Statistics Graph

Use the LANsentry Manager Network Statistics graph to view data for:

• Collisions
• Late Collisions
• Bandwidth Utilization
• CRC Errors
• Too Long Errors
• Too Short Errors

Follow these steps:

1. Display a Network Statistics graph for the local Ethernet segment on which users have reported poor performance.

This graph shows the most recent trend in Collision rates. If you have set up a History sample, you can also look at the historical trend. If a number of segments are connected by repeaters, examine the graph for each Ethernet segment.

2. Analyze Utilization and Collision rates to determine whether collisions are caused by an overloaded segment or a faulty component.

• If Utilization rates are high - The collisions are probably caused by an overloaded segment. If you have added nodes or new applications to your network, consider reconfiguring the cabling system using bridges and routers to filter out remote collisions and to keep local traffic on one segment. This action should level the network load.

• If Utilization rates are stable and appear normal - The collisions are probably caused by faulty components. In this case, do the following:

    • If the network consists of repeaters - Compare the Network Statistics graphs for each segment connected to the repeater. Because repeaters repeat traffic across all connected segments (which makes many segments seem like one network), you should see similar levels of traffic on all segments. One segment that shows dissimilar levels of traffic and collisions may indicate faulty hardware. In this case, monitor several collisions to track the source station that is transmitting too soon after collisions, and repair the station. Packets that are transmitted too soon after collisions are unlikely to be valid. See Table 17 for more information about Collisions.

    • On other networks - Determine the segment cable length.

3. Examine the CRC Errors and Late Collisions, which often indicate cabling or component problems.

Table 16 identifies the problems that CRC Errors can indicate and your possible actions. Table 18 identifies the problems that Late Collisions data can indicate and your possible actions.

Table 18 Late Collisions Data

Possible Problem: Cabling problems

• Segment too long
• Failing cable
• Segment not grounded properly (noise)
• Improper termination
• Taps too close (10BASE-5 and 10BASE-2 only)
• Noisy cable

Possible Action: Correct the cabling problem by doing one or more of the following:

• Reduce the segment length
• Replace the cable
• Ground the cable
• Terminate the cable correctly
• Check the taps
• Check for cables too close to equipment that emits electromagnetic interference

Possible Problem: Component problems

• Deaf or partially deaf node
• Failing repeater, transceiver, or controller cards

Possible Action: Correct the component problem by doing one of the following:

• Trace the failing component and replace it
• Replace the NIC or the transceiver

4. Trace Too Short Errors and Too Long Errors to the sender.

These errors often indicate faulty routers or LAN drivers and transceiver problems. Table 19 identifies the problems that this data can indicate and your possible actions.

Possible Problem: A transceiver on your network is adding bits to the packets that are transmitted by the attached station.
Possible Action:

1. Use a network analyzer to identify the problematic transceiver.

2. If necessary, replace the transceiver, network adapter, or station.

Possible Problem: The jabber protection mechanism on a transceiver has failed; it can no longer protect the network from the jabbering produced by the attached station.
Possible Action: Replace the network card.

Possible Problem: Excessive noise on the cable

Note: Some 10/100 Mbps cards that autodetect the network speed may connect to the network at the wrong speed, causing excessive noise.

Possible Action: Check for improper cabling, faulty cable, faulty network equipment, or cables too close to noisy electronic equipment (lamps, for example).

If your network card autodetects the network speed and you have ruled out other problems, manually configure the network speed.

Possible Problem: Faulty routers (two different network types are connected and the router is not enforcing proper frame size restrictions)
Possible Action: Notify the manufacturer.

Possible Problem: Faulty LAN driver
Possible Action: Replace the driver.

Possible Problem: A normal condition on a LinkSwitch® 1000, LinkSwitch® 3000, or CoreBuilder 5000 FastModule

If you use maximum-sized (1518-byte) Ethernet frames, the device's VLT-enabled ports add a frame tag of 4 bytes, resulting in a misleading Too Long Frame error.

These frames are passed successfully but will create the Too Long Frame error message.

Possible Action: If you want to eliminate the error message, reduce your Ethernet packet frames by 4 bytes.

Device View

Device View allows you to display a variety of port and device-level statistics relevant to Ethernet packet loss. Table 20 describes these statistics and their use in troubleshooting.

Table 20 Activity and Error Statistics in Device View

Statistics Group: Activity
Description: Displays the total network activity and errors on the selected port.
Use in Troubleshooting: This data shows readable packets, broadcast packets, Collisions, total errors, and runts, which cause Too Short Errors. You can interpret this data in the following ways:

• The presence of runts can often be caused by Collisions; however, if the values increase at specific times of the day, it may indicate that you need to change the network topology to manage the traffic more efficiently (for example, with switches or routers).

• Runts can also be caused by a badly terminated coax cable.

• Large numbers of runts not associated with high levels of collisions can indicate a transmission problem (examine the cable).

• Particularly high numbers of Collisions compared to the total number of readable packets can point to a hardware problem (a bad adapter) or to a data loop.

• A high proportion of Broadcast Packets (>10%) on a heavily utilized network (>50% of available bandwidth) can point to an incorrectly configured bridge or router on the network.

Statistics Group: Errors
Description: Displays the number of frames with errors on the selected port.
Use in Troubleshooting: The significance of errors depends on accompanying errors and prevailing network conditions. See the following error data for more information:

• Alignment Errors: Table 16
• FCS Errors: Table 16
• Too Long Errors: Table 19
• Too Short Errors or runts: Table 19
• Late Collisions: Table 18

To display Activity and Errors statistics for a device or port, follow these steps:

1. Select the required port or device.

2. From the shortcut menu, select Activity or Errors.

The statistics available depend on the type of port or device selected. See Table 20 for troubleshooting information.

Ethernet Packet Loss Reference

This section explains terms that are relevant to Ethernet packet loss and provides additional conceptual and problem analysis detail.

Alignment Errors

An Alignment Error indicates a received frame for which both of the following are true:

• The number of bits received is an uneven byte count (that is, not an integral multiple of 8).

• The frame has a Frame Check Sequence (FCS) error.

Alignment Errors often result from MAC layer packet formation problems, cabling problems that cause corrupted or lost data, and packets that pass through more than two cascaded multiport transceivers. See FCS Errors for more information about interpreting Alignment Errors.


Collisions

Collisions indicate that two devices detect that the network is idle and try to send packets at exactly the same time (within one round-trip delay). Because only one device can transmit at a time, both devices must stop sending and attempt to retransmit. Collisions are detected by the transmitting stations.

The retransmission algorithm helps to ensure that the packets do not retransmit at the same time. However, if the two devices retry at nearly the same time, packets can collide again; the process repeats until either the packets finally pass onto the network without collisions or 16 consecutive collisions occur and the packets are discarded.
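The retransmission algorithm referred to above is, in standard CSMA/CD Ethernet, truncated binary exponential backoff: after the nth consecutive collision a station waits a random number of slot times chosen from 0 to 2^k - 1, where k is the smaller of n and 10. The following minimal Python sketch illustrates that rule; the function and constant names are hypothetical.

```python
import random

ATTEMPT_LIMIT = 16   # the frame is discarded after 16 consecutive collisions
BACKOFF_LIMIT = 10   # the random range stops doubling after 10 collisions


def max_backoff_slots(collision_count: int) -> int:
    """Largest number of slot times a station may wait after a collision."""
    return 2 ** min(collision_count, BACKOFF_LIMIT) - 1


def backoff_slots(collision_count: int, rng=random) -> int:
    """Pick the random wait, in slot times, after the given collision."""
    if not 1 <= collision_count < ATTEMPT_LIMIT:
        raise ValueError("frame is discarded after 16 consecutive collisions")
    return rng.randrange(max_backoff_slots(collision_count) + 1)


if __name__ == "__main__":
    for n in (1, 2, 3, 10, 15):
        print(f"collision {n}: wait 0..{max_backoff_slots(n)} slots, "
              f"picked {backoff_slots(n)}")
```

Because each station draws its own random wait, two stations that have just collided are unlikely to pick the same delay twice in a row, and the widening range makes repeated collisions progressively less likely on a busy segment.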

CRC Errors

A Cyclic Redundancy Check (CRC) Error is an RMON statistic that combines FCS Errors and Alignment Errors. These errors indicate that packets were received with:

• A bad FCS and an integral number of octets (FCS Errors)
• A bad FCS and a non-integral number of octets (Alignment Errors)

CRC Errors can cause an end station to freeze. If a large number of CRC Errors are attributed to a single station on the network, replace the station's network interface board. Typically, a CRC Error rate of more than 1 percent of network traffic is considered excessive.

Excessive Collisions

Excessive Collisions indicate that 16 consecutive collisions have occurred, usually a sign that the network is becoming congested. For each excessive collision count (or after 16 consecutive collisions), a packet is dropped. If you know the normal rate of excessive collisions, then you can determine when the rate of packet loss is affecting your network's performance. See Knowing Your Network's Configuration for more information.

FCS Errors

Frame Check Sequence (FCS) Errors, a type of CRC Error, indicate that frames received by an interface are an integral number of octets long but do not pass the FCS check. The FCS is a mathematical way to ensure that all the frame's bits are correct without having the system examine each bit and compare it to the original. Packets with Alignment Errors also generate FCS Errors.

Both Alignment Errors and FCS Errors can be caused by equipment powering up or down or by interference (noise) on unshielded twisted-pair (10BASE-T) segments. In a network that complies with the Ethernet standard, FCS or Alignment Errors indicate bit errors during a transmission or reception. A very low rate is acceptable. Although Ethernet allows a bit error rate of 1 in 10^8, typical Ethernet performance is 1 in 10^12 or better.
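To put the bit error rates above in perspective, the short Python sketch below converts a bit error rate into the probability that a single frame contains at least one bit error, assuming independent bit errors (a simplifying assumption, as is the hypothetical function name). For a maximum-sized 1518-octet frame, the worst-case rate of 1 in 10^8 works out to roughly 1.2 damaged frames in every 10,000, far below the 1 percent CRC Error rate that is considered excessive.

```python
import math


def frame_error_probability(bit_error_rate: float, frame_octets: int = 1518) -> float:
    """Probability that a frame of the given size has at least one bit error.

    Computes 1 - (1 - BER)^bits in a numerically stable way, assuming
    bit errors are independent.
    """
    bits = frame_octets * 8
    return -math.expm1(bits * math.log1p(-bit_error_rate))


if __name__ == "__main__":
    for ber in (1e-8, 1e-12):
        p = frame_error_probability(ber)
        print(f"BER {ber:g}: about {p:.3e} of maximum-sized frames damaged")
```

A measured FCS or Alignment Error rate well above these figures therefore points to a fault rather than to the normal background error rate of the medium.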

Late Collisions

Late Collisions indicate that two devices have transmitted at the same time, but cabling errors (most commonly, excessive network segment length or repeaters between devices) prevent either transmitting device from detecting a collision. Neither device detects a collision because the time to propagate the signal from one end of the network to the other is longer than the time to put the entire packet on the network. As a result, neither of the devices that cause the late collision senses the other's transmission until the entire packet is on the network.

Although late collisions occur for small packets, the transmitter cannot detect them. As a result, a network suffering measurable Late Collisions for large packets is losing small packets as well.
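The timing argument above can be made concrete with a little arithmetic. In standard 10 Mbps Ethernet the slot time is 512 bit times (51.2 microseconds), which is also the time needed to send a minimum-sized 64-octet frame. The sketch below compares a delay with the time a frame spends being transmitted; it uses the round-trip delay for the comparison, which is an assumption of this sketch (the text above speaks of end-to-end propagation time), and it ignores the preamble. All names are hypothetical.

```python
SLOT_TIME_BITS = 512  # slot time in bit times for 10 Mbps Ethernet


def slot_time_us(bit_rate_mbps: float = 10.0) -> float:
    """Slot time in microseconds at the given bit rate."""
    return SLOT_TIME_BITS / bit_rate_mbps


def transmit_time_us(frame_octets: int, bit_rate_mbps: float = 10.0) -> float:
    """Time needed to put the whole frame on the wire, in microseconds."""
    return frame_octets * 8 / bit_rate_mbps


def can_miss_collision(round_trip_delay_us: float, frame_octets: int,
                       bit_rate_mbps: float = 10.0) -> bool:
    """True if the sender finishes transmitting before it could sense a collision."""
    return round_trip_delay_us > transmit_time_us(frame_octets, bit_rate_mbps)


if __name__ == "__main__":
    delay = 60.0  # microseconds; a made-up value for an overlong segment
    for size in (64, 1518):
        print(size, "octets:", "collision can go undetected"
              if can_miss_collision(delay, size) else "collision is seen (late)")
```

With a 60 microsecond delay, a 64-octet frame is completely sent before the collision can be sensed, while a 1518-octet frame is still being sent and records a Late Collision. This is why measurable Late Collisions on large packets imply silent loss of small ones.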

Nonstandard Ethernet Problems

Table 21 lists the symptoms that typically occur if a system violates the Ethernet standard.

Table 21 Symptoms of Common Ethernet Network Problems

Symptoms: FCS Errors and Alignment Errors increase significantly.
Problem: Network cabling is too long.
Notes: If you use a promiscuous network monitor, the number of Late Collisions reported by stations should correlate with the FCS and Alignment Errors reported by the monitor.

Symptoms: FCS and Alignment Errors increase proportionally with interference (sometimes referred to as noise hits).
Problem: Network segment is noisy.
Notes: Typically observed on a 10BASE-T network segment in a noisy environment. If you use multiple promiscuous monitors, the FCS and Alignment Errors among the monitors will not correlate.

If the monitor can track runts, also called Too Short Errors, the number of runt packets should be significantly higher than normal.

Symptoms: FCS and Alignment Errors are much higher than normal.
Problem: Networks do not conform to the access scheme of Carrier Sense Multiple Access with Collision Detect (CSMA/CD).
Notes: Occurs when some implementations of Ethernet in the segment are not entirely compatible with IEEE 802.3 repeaters.

Collision fragments linger on the network long enough to collide with retry packets at the minimum interpacket gap (IPG). The IPG is smaller on one side of the repeated network, causing a lost packet.

Ethernet controllers cannot receive packets that are separated by 4.7 µs or less. Some controllers cannot sustain receptions of packets separated by as much as 9.6 µs. If runt packets are received one after another and are followed by a collision fragment, Ethernet controllers that cannot sustain reception will lose packets.

Receive Discards

Receive Discards indicate that received packets could not be delivered to a higher-layer protocol because of congestion or packet errors.

Too Long Errors

A Too Long Error indicates that a packet is longer than 1518 octets (including FCS octets) but otherwise well formed. Too Long Errors are often caused by a bad transceiver, a malfunction of the jabber protection mechanism on a transceiver, or excessive noise on the cable.

Too Short Errors

A Too Short Error, also called a runt, indicates that a packet is fewer than 64 octets long (including FCS octets) but otherwise well formed.
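The error definitions in this reference section can be summarized as a small classifier. The Python sketch below sorts a received frame into the categories described above using the frame length, the count of leftover bits beyond whole octets, and the FCS check. It assumes the frame is supplied as bytes with the 4-octet FCS at the end, stored low-order octet first as most capture tools present it; the function names are hypothetical, and real interfaces perform these checks in hardware.

```python
import zlib

MIN_FRAME_OCTETS = 64    # including FCS octets
MAX_FRAME_OCTETS = 1518  # including FCS octets


def fcs_ok(frame: bytes) -> bool:
    """Check the trailing 4-octet FCS (CRC-32) against the rest of the frame."""
    if len(frame) < 4:
        return False
    body, fcs = frame[:-4], frame[-4:]
    return zlib.crc32(body) == int.from_bytes(fcs, "little")


def classify_frame(frame: bytes, extra_bits: int = 0) -> str:
    """Classify a frame; extra_bits is the number of bits (0-7) left over
    after the last whole octet."""
    if not fcs_ok(frame):
        # A bad FCS with a non-integral octet count is an Alignment Error.
        return "Alignment Error" if extra_bits else "FCS Error"
    if len(frame) > MAX_FRAME_OCTETS:
        return "Too Long Error"
    if len(frame) < MIN_FRAME_OCTETS:
        return "Too Short Error"
    return "OK"


def with_fcs(body: bytes) -> bytes:
    """Append a valid FCS to a frame body (helper for building examples)."""
    return body + zlib.crc32(body).to_bytes(4, "little")


if __name__ == "__main__":
    good = with_fcs(bytes(60))
    damaged = bytes([good[0] ^ 0x01]) + good[1:]
    print(classify_frame(good), classify_frame(damaged),
          classify_frame(damaged, extra_bits=3), classify_frame(with_fcs(bytes(20))))
```

An RMON probe that reports CRC Errors would count both the "FCS Error" and "Alignment Error" results together.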

Transmit Discards

Transmit Discards indicate that packets were not transmitted because of network congestion.

FDDI Ring Errors

Use these sections to identify and correct FDDI ring errors:

• FDDI Ring Errors Overview
• Identifying Ring Errors

Fiber Distributed Data Interface (FDDI) often corrects its own problems. However, because FDDI cannot correct all errors (especially those related to hardware problems), you should monitor FDDI errors.

Understanding the Problem

FDDI ring errors that you should monitor include:

• Elasticity Buffer Error Condition
• Frame Error Condition
• Frames Not Copied Condition
• Link Error Condition

Identifying the Problem

First, determine the type of FDDI ring errors and where they are occurring. As you do when you identify other FDDI problems, identify the upstream and downstream neighbors of the devices that you are monitoring.

Several types of network errors can cause FDDI performance problems. For example, problems with cables or physical connections may result in a link or frame error. Elasticity buffer (EB) errors can also lead to link and frame errors.

FDDI deals with port-related errors as follows:

• The variable PORTlerAlarm is the link error rate (LER) value at which a link connection generates an alarm. When the LER is greater than the alarm setting, Station Management (SMT) sends a Status Report Frame (SRF) to notify you that there is a problem with a port.

The PORTlerAlarm threshold is set lower than the PORTlerCutoff threshold so that you are notified of a problem before the port is actually removed from the ring.

• When link errors reach the threshold defined by the variable PORTlerCutoff, SMT breaks the connection, disabling the PHY that detected the problem. A Link Error Condition is also generated.

FDDI deals with MAC-related errors as follows:

• When MAC frame errors reach a certain threshold, a Frame Error Condition is generated. Because the actual error can be further upstream than the immediate connection, the connection remains intact.

• For a large network, the worst case MACFrameErrorRatio is less than 0.1 percent. However, during network configuration, frame error ratios can reach 50 percent for short periods. When you detect a sustained frame error ratio of more than 0.1 percent, a problem exists between the station that is reporting the condition and the nearest upstream MAC.
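The port-related and MAC-related thresholds described above amount to two simple comparisons, sketched here in Python. The function names and the example threshold values are hypothetical, and error rates are written as plain fractions (errors per bit or per frame); SMT implementations commonly store the LER thresholds as exponents of ten instead, a detail this sketch leaves out.

```python
def port_ler_action(ler: float, ler_alarm: float, ler_cutoff: float) -> str:
    """Return what SMT does for a measured link error rate (LER).

    ler_alarm must be the lower error rate so that the alarm is raised
    before the connection is cut off.
    """
    if ler_alarm >= ler_cutoff:
        raise ValueError("alarm threshold must be lower than cutoff threshold")
    if ler >= ler_cutoff:
        return "cutoff"  # SMT breaks the connection; Link Error Condition
    if ler > ler_alarm:
        return "alarm"   # SMT sends a Status Report Frame
    return "ok"


def frame_error_problem(error_frames: int, total_frames: int,
                        threshold: float = 0.001) -> bool:
    """True if the frame error ratio exceeds 0.1 percent (the default)."""
    if total_frames <= 0:
        return False
    return error_frames / total_frames > threshold


if __name__ == "__main__":
    # Example thresholds chosen for illustration only.
    for measured in (1e-9, 5e-8, 1e-7):
        print(measured, port_ler_action(measured, ler_alarm=1e-8, ler_cutoff=1e-7))
    print(frame_error_problem(2, 1000))
```

Only a sustained frame error ratio matters, so a monitoring script would apply the second check to several consecutive samples rather than to a single one.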

Solving the Problem

To solve problems related to FDDI errors, fix the hardware, cabling, or congestion problem.

Identifying Ring Errors


Use Status Watch to monitor your FDDI devices for Warning or Critical alerts.

Status Watch

Use Status Watch to identify FDDI ring errors.

Follow these steps:

1. Monitor the FDDI Status tool for the currently selected device.

2. Determine whether Status Watch is reporting Elasticity Buffer Errors or a high percentage of Frame Errors, Frames Not Copied, or Link Error Rates for the currently selected device.

FDDI Ring Errors Reference

This section provides additional conceptual and problem analysis detail.

Elasticity Buffer Error Condition

The Elasticity Buffer Error condition occurs when a port's elasticity buffer overflows or underflows. This condition usually indicates that a port's hardware is not operating within the tolerances that the FDDI standard specifies. Look for the problem in the hardware of either the port that is reporting the condition or of the immediately adjacent port.

Frame Error Condition

The Frame Error condition occurs when the percentage of frames that contain errors exceeds a preset threshold. When a device is an uplink to FDDI (that is, a device is transmitting onto FDDI), this type of condition indicates that the ring is saturated. The ring is out of buffer space, and packets are being dropped from the device's backbone port.

The problem indicated by the frame errors is usually located between the MAC that reports the condition and its upstream neighbor. Because many physical connections can lie along this path, the MACFrameErrorRatio variable identifies only the two MACs between which the problem is occurring.

Frames Not Copied Condition

The Frames Not Copied condition occurs when the percentage of frames that are dropped because of insufficient buffer space exceeds a preset threshold. This condition indicates that the station is congested and is unable to process frames as quickly as they arrive. To help eliminate congestion:


• Add more capacity to the station.
• Reconfigure your network so that end stations that communicate heavily with one another are on the same bridge or switch.
• Filter out certain traffic.

Link Error Condition

The Link Error condition occurs when a port detects link errors at a rate that exceeds a preset threshold. When the Link Error threshold is exceeded, the station removes itself from the ring and tries to reinsert itself on the ring. This action creates a MAC Neighbor Change Event (which also occurs if a ring wraps).

Link errors may indicate an FDDI PHY hardware problem (such as a faulty transmitter) or a faulty cable or connector. Look for the problem in the portion of the network between the port that is reporting the condition and the first upstream transmitter.

MAC Neighbor Change Event

The MAC Neighbor Change event occurs when a MAC's upstream or downstream neighbor changes.

This event indicates either:

• A network reconfiguration
• Another station that is leaving or joining the ring

Network File Server Timeouts

Use these sections to identify and correct timeouts on network file servers:

• Network File Server Timeout Overview
• Looking for Obvious Errors
• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

See Network File Server Timeouts Reference for additional conceptual and problem analysis detail.

A network file server can time out if your network gets congested or if your server is having problems. Users might have problems downloading data from or to the server or copying files from or to the server. To help you understand the troubleshooting process for this type of problem, an EXAMPLE throughout this section follows the symptoms, analysis, and resolution of a typical file server timeout problem.

63 52

Understanding the Problem

When users log in, their stations make network file server calls either to determine quotas (if this feature has been enabled) or to mount user home directories. The network file server timeout messages, even when spread across multiple nodes, indicate a problem either with the network or with a server.

EXAMPLE: UNIX users notice that it takes a long time (over 30 seconds in some cases) to log in to any machine. Some machines report network file server timeout messages, but the messages have no obvious pattern and are infrequent. You begin to get a sense of the problem.

Identifying the Problem

First, rule out the obvious causes. Ask these questions:

• Can you access the network file server with Telnet?
• Have any alarms been triggered?
• Are there any new errors?

The process of identifying the problem is developed in Looking for Obvious Errors.

Solving the Problem

To determine the cause, reproduce the fault while you monitor the network. After you know the cause, you can fix the problem.

The solutions to the network file server timeout are identified in these sections:

• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

Looking for Obvious Errors

To look for obvious errors, use these applications:

• Ping and Telnet - To check connectivity to the network file server nodes
• LANsentry Manager Alarms View - To search for triggered alarms
• LANsentry Manager Statistics View - To look for errors
• LANsentry Manager History View - To identify trends

Ping and Telnet

Determine whether you can contact network file server nodes using Ping and Telnet. If the response is extremely slow, then a problem may exist with the connections to the nodes. No delay indicates that the connections are normal, implying that the delay is occurring elsewhere. In this case, use LANsentry® Manager tools to determine whether packets are being lost or ignored.

LANsentry Manager Alarms View

Using the LANsentry Manager Alarms View, you can determine if any configured alarms have been triggered.

Search the Alarms View to see if any MAC events have been logged.

EXAMPLE: MAC events have not been logged for the network to which the UNIX users are attached.

Even though no alarms have occurred, errors may exist. For example, a lower rate of background errors may exist just below the alarm threshold. Because they are based on maximum and minimum values, RMON alarms may miss constant, periodic, or low amounts of errors.

LANsentry Manager Statistics View

Using the LANsentry Manager Statistics View, you can display a multisegment graph of utilization and error statistics.

Follow these steps:

1. Set up a graph that shows utilization and errors on all your major segments.

2. Determine whether any segments are particularly busy or error prone.

EXAMPLE: You notice that one segment of the UNIX network, HUB3, is reporting Too Long Errors and FCS Errors roughly every second sample. While the amount is not higher than normal, it is currently higher than on any other segment.

LANsentry Manager History View

Using the LANsentry Manager History View, you can display a rolling history table to determine if the errors that you are seeing are new. For example, if you have a history table that runs for 30-minute samples over two days, you can compare the most recent sample to a previous sample, looking for new errors. If your probe has the resources, use a much finer resolution sample stored for a shorter time (every 30 seconds for 2 hours) to more easily spot recent errors.

EXAMPLE: You see that the history table shows that no error rates remained constant throughout the day. However, errors that did occur were on the device HUB3 and were Too Long Errors and FCS Errors.

Reproducing the Fault While Monitoring the Network

Although the RMON View in LANsentry Manager can show error rates and help you to identify the location of the problem, it may not provide enough data to solve the problem. To determine the cause of the problem, reproduce it while you monitor the network by using these applications:

• LANsentry Manager Top-N Graph - To locate a quiet node to use for reproducing the fault
• LANsentry Manager Packet Capture - To capture packets from the hub to which the quiet node is attached
• LANsentry Manager Packet Decode - To analyze the packets to assess network file server traffic and delays
• Address Tracker - To find the location of the problem nodes

EXAMPLE: Using LANsentry Manager, you find a hub on the network with a higher than normal error rate. However, the error rate does not seem high enough to cause login delays of 60 seconds or more.

LANsentry Manager Top-N Graph

Using the Top-N graph in the LANsentry Manager main window, locate a quiet node that has been showing the same problem. Choose a quiet node so that you do not receive excessive traffic when you try to isolate the problem.

EXAMPLE: You see that the node Monolith, which has the same Network File System (NFS) mounts as the other nodes on the network, is quiet. You decide to use this node for reproducing the fault. See Network File System (NFS) Protocol for more information about NFS.

LANsentry Manager Packet Capture

Using the LANsentry Manager Packet Capture application, capture packets from the network using predefined patterns and start-and-stop conditions.

Follow these steps:

1. Set up a capture buffer on a probe that is connected to the same hub as the quiet node. Until you know more about the problem, set a very general filter.

EXAMPLE: You select a MAC-layer filter and set a conversation filter to capture all packets to and from Monolith.

2. Telnet into and log out from the quiet node. Then reset the capture buffer. Repeat this procedure until you see the problem reflected in your captured data. To keep the buffer information clean, reset the buffer each time that you repeat the procedure.

3. When you see the delay, note the rough value of the packet count on the LANsentry Manager packet buffer.

By noting the packet count at which you think the delay has occurred, you can narrow the problem to within about 20 packets in the buffer. If you have used an extremely quiet node, you may even identify the exact packet.

LANsentry Manager Packet Decode

The LANsentry Manager Packet Decode application decodes all major protocols and displays the packet contents at three levels of detail: summary information, header information, and actual packet content.

Follow these steps:

1. Open the buffer in the Packet Decode application and locate the number of the packet at which the delay occurred.

2. Select the packet and launch a MAC-layer conversation filter. In the filter display, look for a gap in the conversation (that is, where the node sent a request and then resent it at approximately the same rate as the delay you experienced when recreating the problem).

3. Repeat the test to determine if the result concentrates on one node or if it appears on other nodes.

EXAMPLE: On the quiet node that you selected, the delay is obvious. You see an NFS request going out to a node and a repeat of the request 30 seconds later. During that time, the node did not respond. You now know that the delay occurred because nodes were not seeing responses for NFS requests. When you repeat the test on other nodes, you find that the delay is happening with more than one destination node.
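
Looking for a gap in a filtered conversation amounts to comparing the timestamps of consecutive packets. The sketch below illustrates the idea on an already exported packet list; the `Packet` record, its fields, and the node names are hypothetical and do not reflect any LANsentry Manager export format.

```python
from typing import List, NamedTuple, Tuple


class Packet(NamedTuple):
    number: int   # position of the packet in the capture buffer
    time: float   # capture time in seconds
    src: str      # sending node (hypothetical names below)
    dst: str      # receiving node


def find_gaps(packets: List[Packet], min_gap: float) -> List[Tuple[int, int, float]]:
    """Return (earlier packet, later packet, silence in seconds) for every
    pair of consecutive packets separated by at least min_gap seconds."""
    gaps = []
    for prev, cur in zip(packets, packets[1:]):
        silence = cur.time - prev.time
        if silence >= min_gap:
            gaps.append((prev.number, cur.number, silence))
    return gaps


# One conversation: a request that goes unanswered and is resent 30 s later.
conversation = [
    Packet(1, 0.0, "monolith", "server"),
    Packet(2, 0.5, "monolith", "server"),   # request
    Packet(3, 30.5, "monolith", "server"),  # same request, resent
    Packet(4, 31.0, "server", "monolith"),  # reply finally arrives
]
gaps = find_gaps(conversation, min_gap=10.0)
```

Here `gaps` contains the single pair of packets 2 and 3, which brackets the 30-second silence described in the example.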


Address Tracker

Use Address Tracker, which polls managed devices, to determine the hubs to which the problem nodes are attached. If the problem end stations are located on unmanaged devices, then you can at least narrow the problem to those unmanaged devices.

EXAMPLE: Although your network does not have managed hubs that Transcend® NCS management software can poll, it does have managed switches. When it polls the switches, Address Tracker displays the switch ports on which addresses were last seen. This information indicates the hub (but not the hub port) on which the device is located.

LANsentry Manager Packet Decode

After you know the location of the hub that has the problem node, monitor the problem from the hub using LANsentry Manager Packet Decode.

Follow these steps:

1. To capture packets from one of the nodes on the hub, set up another capture buffer and repeat the exercise that is described in LANsentry Manager Packet Capture. Because a delay may occur on a different node, use two capture buffers without stopping the first one.

Note the rough packet count where the delay appears.

2. Display a conversation filter of the packet where the delay appears and look for the gap in the conversation.

EXAMPLE: You hope that the nodes are on the same hub. You find that all the nodes are on HUB3. This result indicates that FCS Errors may be causing the timeouts. However, because the errors occur at a low rate, you decide to verify this diagnosis. You monitor the problem from the hub, logging in and out many times, and the delay eventually occurs. This time, the delay shows that the node's reply had an FCS Error even though the node received the request. The switch would not have transmitted this packet, causing a timeout on the NFS protocol. The retry time is presumably 30 seconds. During this test, you see the problem occurring on another node.

Correcting the Fault

Without a managed hub, you may find it very difficult to discover network file server timeout errors. To find the problematic node, you must either systematically isolate nodes by monitoring each node for a prolonged period or temporarily insert a managed hub.

EXAMPLE: You notice that the captured error packet failed FCS because it was corrupted by a regular pattern during transmission. A possible reason for this occurrence is a Jabbering node. This explanation makes sense because FCS/Jabber frames increased linearly when you were monitoring the live network.

Network File Server Timeouts Reference

This section explains terms that are relevant to network file server timeouts and provides additional conceptual and problem analysis detail.

Jabbering

Jabbering occurs when a node transmits illegal-length packets and is possibly not operating within carrier specifications. In effect, another node has written bad data over a valid packet. This bad data is often interpreted as a repeated sequence of data.

Network File System (NFS) Protocol

A distributed file system protocol developed by Sun Microsystems that allows a computer system to access files over a network as if they were on its local disks. This protocol has been incorporated into products by more than 200 companies. It is now a de facto Internet standard.

NFS is one protocol in the NFS suite of protocols, which includes NFS, RPC, XDR (External Data Representation), and others. These protocols are part of a larger architecture that Sun Microsystems refers to as Open Network Computing (ONC). ONC is a distributed applications architecture designed by Sun and currently controlled by a consortium led by Sun.

Troubleshooting TCP/IP - Detailed Steps

This article shows how to troubleshoot TCP/IP connectivity between computers on a Windows network. If you haven't already done so, disable XP's Internet Connection Firewall on all local area network connections and remove all firewall programs on the network. Improperly configured firewalls are the most common cause of TCP/IP problems.

Open a Command Prompt Window

For many of these steps, you'll be typing at the command prompt. To open a command prompt window in Windows 2000 or XP, click Start | Run, type cmd in the box, and click OK. To open a command prompt window in Windows 95, 98, or Me, click Start | Run, type command in the box, and click OK. Type one command per line and press Enter after each one to execute it. To close the command prompt window, use the exit command.

Determine the TCP/IP Settings

Determine the TCP/IP settings of each computer on the local area network. In XP, open the Network Connections folder, right click the LAN connection, and click Status | Support | Details. For example, here are the Status and Details views for the LAN connection on an Internet Connection Sharing host.

In Windows 95/98/Me, click Start | Run, type winipcfg in the box, and click OK. Select the LAN adapter from the menu and click More Info. Here's the winipcfg view for an ICS client running Windows Me.

You can also see the TCP/IP settings from the command prompt. This is especially convenient if a computer has more than one network adapter. Use the ipconfig /all command, which is available in all versions except Windows 95. The output from this command can be long, so it's best to write it to a file. Specify the file name in the command this way:

ipconfig /all >ipconfig.txt

Description of TCP/IP Settings

Here are the TCP/IP settings that are used in network troubleshooting:

• IP Address - Unique address assigned to a network adapter. A computer with multiple network adapters has an IP address for each one, and each one must be in a different subnet.

• Subnet Mask - Used in conjunction with the IP address to determine which subnet an adapter belongs to. At the simplest level, communication is only possible between two network adapters when they're in the same subnet.

• Default Gateway - IP address of a computer or router, on one of this computer's local area networks, that knows how to communicate with subnets not present on this computer. For an Internet connection, the default gateway is a router belonging to your Internet service provider, and all access to sites on the Internet goes through it. For an ICS client, the default gateway is the ICS host. If you use a hardware router, it serves as the default gateway.

• DHCP Server - If an adapter is configured to obtain an IP address automatically, this is the address of the server that provides it. It could be your ISP, an ICS host, or a hardware router.

• DNS Servers - IP address of one or more Domain Name Server computers. DNS servers translate Internet names to their IP addresses (like 63.146.109.227).
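
How the subnet mask decides whether two adapters share a subnet can be illustrated with Python's standard `ipaddress` module. This is a minimal sketch: the helper name `same_subnet` is made up for illustration, and the addresses and the 255.255.255.0 mask are sample values, not settings taken from any particular network.

```python
import ipaddress


def same_subnet(ip_a: str, ip_b: str, mask: str) -> bool:
    """True when both addresses fall in the same subnet under the given mask."""
    # strict=False lets us pass a host address; the host bits are masked off.
    net_a = ipaddress.ip_network(f"{ip_a}/{mask}", strict=False)
    net_b = ipaddress.ip_network(f"{ip_b}/{mask}", strict=False)
    return net_a == net_b


# Sample values only: two hosts on 192.168.1.0 and one on 192.168.2.0.
print(same_subnet("192.168.1.101", "192.168.1.123", "255.255.255.0"))
print(same_subnet("192.168.1.101", "192.168.2.5", "255.255.255.0"))
```

The first call compares two adapters whose first three octets match under the mask; the second compares adapters that differ in the third octet, so they could only communicate through a gateway.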

Subnets

See our article on subnets for a brief description of how they work. For more details, see this Microsoft Knowledge Base article.

If two computers are supposed to be on the same subnet but aren't, something is wrong with the network hardware or software configuration. This is most likely to happen when one of them receives an IP address of 169.254.x.x, which indicates that:

• It's configured to obtain an IP address automatically.
• It couldn't find a DHCP server on the network to make the assignment.
• Windows assigned it an Automatic Private IP Address.

See our article on Specific Networking Problems and Their Solutions for more information.
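
A quick way to recognize an Automatic Private IP Address in a script is to test whether the address lies in the 169.254.0.0/16 link-local range. The sketch below uses Python's standard `ipaddress` module; the function name `is_apipa` and the sample addresses are illustrative assumptions.

```python
import ipaddress

# The Automatic Private IP Address range (IPv4 link-local).
APIPA_RANGE = ipaddress.ip_network("169.254.0.0/16")


def is_apipa(address: str) -> bool:
    """True when the address was most likely self-assigned because
    no DHCP server answered."""
    return ipaddress.ip_address(address) in APIPA_RANGE


print(is_apipa("169.254.10.20"))   # a self-assigned address
print(is_apipa("192.168.1.101"))   # an ordinary private LAN address
```

If a computer reports an address in this range, check its cabling and the DHCP server before looking at anything else.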

Pinging

The ping command is the basic tool for testing TCP/IP connectivity. It sends a special packet (called ICMP Echo) to a particular IP address and looks for a reply. If everything is working right, the reply comes back. If not, the ping times out in a few seconds. By default, the ping command repeats the process four times. Here's an example of an ICS client computer pinging a Windows XP Home Edition ICS host, using the host's IP address and its computer name.

When ping fails, you'll see one of these error messages:

• Request timed out - The IP address is valid, but there's no reply from it. If the IP address is on a local area network, the most likely cause is a firewall program blocking the ping.

• Unknown host <name> or Ping request could not find host <name> - The computer name doesn't exist on the local area network. Make sure that NetBIOS over TCP/IP is enabled.

• Destination host unreachable - The IP address isn't on a local area network, and the default gateway can't access it. Either there's no default gateway, its address is wrong, or it isn't functioning.

Pinging the Local Area Network

Here is a series of ping commands to use in finding where a problem occurs on a local area network. Run them in the order shown, and don't go on to the next command until all of the previous commands work properly. In this example:

• The computer being tested is named Winxp, with IP address 192.168.1.101.
• There's another computer on the network named Win98, with IP address 192.168.1.123.

Substitute the appropriate IP addresses and computer names for your network.

Command               Target                           What Ping Failure Indicates
ping 127.0.0.1        Loopback address                 Corrupted TCP/IP installation
ping localhost        Loopback name                    Corrupted TCP/IP installation
ping 192.168.1.101    This computer's IP address       Corrupted TCP/IP installation
ping winxp            This computer's name             Corrupted TCP/IP installation
ping 192.168.1.123    Another computer's IP address    Bad hardware or NIC driver
ping win98            Another computer's name          NetBIOS name resolution failure

To fix a corrupted TCP/IP installation on Windows XP, follow the steps in this Microsoft Knowledge Base article. For Windows 95/98/Me, un-install the TCP/IP protocol in Control Panel | Network, reboot, and re-install it. If that doesn't fix it, use this procedure on Windows 95 or 98.
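
The ordered ping sequence above can be automated with a short script that stops at the first failure and reports the likely cause. This is a sketch under stated assumptions: the targets are the example names and addresses from this section, the helper names are invented, and the script treats a zero exit code from the system ping command as success, which is only an approximation (some ping implementations return zero even when the reply is "Destination host unreachable").

```python
import platform
import subprocess

# (target, what a failure indicates) in the order the checks should run.
LAN_CHECKS = [
    ("127.0.0.1", "Corrupted TCP/IP installation"),
    ("localhost", "Corrupted TCP/IP installation"),
    ("192.168.1.101", "Corrupted TCP/IP installation"),
    ("winxp", "Corrupted TCP/IP installation"),
    ("192.168.1.123", "Bad hardware or NIC driver"),
    ("win98", "NetBIOS name resolution failure"),
]


def ping(target: str) -> bool:
    """Send one echo request with the system ping command."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    result = subprocess.run(
        ["ping", count_flag, "1", target],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def first_failure(checks, ping_fn=ping):
    """Run the checks in order; return (target, meaning) for the first
    failed ping, or None when every ping succeeds."""
    for target, meaning in checks:
        if not ping_fn(target):
            return target, meaning
    return None
```

Calling `first_failure(LAN_CHECKS)` runs the real pings. Passing a different `ping_fn` makes it possible to try the logic without touching the network.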

Pinging the Internet

You can also use ping to find a problem with Internet access. Run these commands in the order shown, and don't go on to the next command until all of the previous commands work properly. Use the Default Gateway and DNS Server addresses that you got from the winipcfg or ipconfig /all command.

Command                   Target                 What Ping Failure Indicates
ping w.x.y.z              Default Gateway        Default Gateway down
ping w.x.y.z              DNS Server             DNS Server down
ping w.x.y.z              Web site IP address    Internet service provider or web site down
ping www.something.com    Web site name          DNS Server down or web site down

REFERENCE

WWW.PRACTICALLYNETWORK.COM

WWW.COMPUTERHOPE.COM

WWW.WIKIPEDIA.ORG

THANK YOU


them as a standby link if the active link fails. Using dual homing, only one of the two attachments is active at a time. In this sense, a DAS acts as if it is a single attachment station (SAS) by using its A port as the standby link.

Through SMT, a DAS can be dual homed to the same concentrator or, more commonly, to two concentrators. This arrangement provides a more stable trunk ring of concentrators. If one concentrator fails, the DAS enables the standby link to another concentrator to become the active link. See Figure 14.

If the station is a dual path or dual path/dual MAC station, you can configure the dual-homed station in one of two ways:

• With both links active
• With one link active and one connection withheld as a backup, only becoming active when one link fails

Figure 14 Dual Homing Configuration

Installing an Optical Bypass Unit

You can insert an Optical Bypass Unit (OBU) into the FDDI ring as if it were a node and then plug your device into it. To use an OBU, your device needs an optical bypass interface. This interface lets the bypass know whether your device is still on the ring or not. See Figure 15.

If your device is removed or if it fails, the bypass unit diverts the optical path away from your device, keeping the ring whole. You can use a bypass on devices that are prone to failure or are likely to be removed often, such as diagnostic equipment.

Figure 15 Optical Bypass Unit Configuration

FDDI Connectivity Reference

This section explains terms that are relevant to FDDI connectivity and provides additional conceptual and problem analysis detail.

Peer Wrap Condition

A Peer Wrap (wrapped ring) condition occurs when a dual-attachment station detects a fault (often a lost connection) and reconfigures the network by wrapping the dual trunk rings to form a single ring. Normally, the two stations that are adjacent to the fault wrap to maintain full connectivity. However, if a second fault occurs before the first is repaired, the network partitions itself into two or more rings, and stations lose connectivity.

When a station reports a Peer Wrap condition, locate and repair the problem that caused the station to wrap the rings. Potential causes include:

• Faulty FDDI port hardware
• Faulty cables or connectors
• Unplugged connectors
• Powered-down stations

You can expect to find the cause of the problem between the two stations that report the Peer Wrap condition.

Twisted Ring Condition

A Twisted Ring condition occurs when certain undesirable connection types exist. See Table 7 for more information. Although similar to the Undesired Connection Attempt, the Twisted Ring condition provides specific Station Management (SMT) and port information for diagnosis.

Undesired Connection Attempt Event

An Undesired Connection Attempt event occurs when a port tries to connect to another port of a type that may result in an undesirable network topology. Whether the connection attempt is successful depends on the current setting of the station's connection policies.

Table 7 lists connections that the FDDI standard defines as undesirable. The managed devices may or may not permit these connections, depending on their FDDI station configurations.

Table 7 Undesirable Connection Types

Connection Type    Reason That the Connection Is Undesirable
A-A                Twisted primary and secondary rings
A-S                A wrapped ring
B-B                Twisted primary and secondary rings
B-S                A wrapped ring
S-A                A wrapped ring
S-B                A wrapped ring
M-M                A tree of rings topology (illegal connection)

Table 8 lists FDDI connections that create valid topologies.

Table 8 Valid Connection Types

Connection Type    Reason That the Connection Is Valid
A-B                A normal trunk peer connection
A-M                A tree connection with possible redundancy. In a single MAC node, Port B has precedence (by default) for connecting to a Port M.
B-A                A normal trunk ring peer connection
B-M                A tree connection with possible redundancy. In a single MAC node, Port B has precedence (by default) for connecting to a Port M.
S-S                A single ring of two slave stations
S-M                A normal tree connection
M-A                A tree connection that provides possible redundancy
M-B                A tree connection that provides possible redundancy
M-S                A normal tree connection
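
Tables 7 and 8 together cover every pairing of the four FDDI port types (A, B, S, and M), so they can be encoded as a simple lookup. The sketch below is one way to do that in Python; the function name `classify_connection` and the shortened reason strings are illustrative choices, and whether a device actually permits an undesirable connection still depends on its connection policies.

```python
# Connection types from Table 7 (undesirable) and Table 8 (valid).
UNDESIRABLE = {
    ("A", "A"): "Twisted primary and secondary rings",
    ("A", "S"): "A wrapped ring",
    ("B", "B"): "Twisted primary and secondary rings",
    ("B", "S"): "A wrapped ring",
    ("S", "A"): "A wrapped ring",
    ("S", "B"): "A wrapped ring",
    ("M", "M"): "A tree of rings topology (illegal connection)",
}

VALID = {
    ("A", "B"): "A normal trunk peer connection",
    ("A", "M"): "A tree connection with possible redundancy",
    ("B", "A"): "A normal trunk ring peer connection",
    ("B", "M"): "A tree connection with possible redundancy",
    ("S", "S"): "A single ring of two slave stations",
    ("S", "M"): "A normal tree connection",
    ("M", "A"): "A tree connection that provides possible redundancy",
    ("M", "B"): "A tree connection that provides possible redundancy",
    ("M", "S"): "A normal tree connection",
}


def classify_connection(local_port: str, remote_port: str):
    """Return ("undesirable" or "valid", reason) for a pair of port types."""
    key = (local_port.upper(), remote_port.upper())
    if key in UNDESIRABLE:
        return "undesirable", UNDESIRABLE[key]
    if key in VALID:
        return "valid", VALID[key]
    raise ValueError(f"unknown FDDI port types: {local_port}-{remote_port}")
```

For example, `classify_connection("A", "A")` reports the twisted-ring case, while `classify_connection("M", "S")` reports a normal tree connection.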

Disabling the Offending Interface

Because broadcast storms can ultimately cause your whole network to become unavailable, take action immediately to disable the offending interface. You can enable the interface again after you have corrected the problem.

Address Tracker

Use Address Tracker to locate the interface that is causing the broadcast storm. Use Device View to disable the port.

Follow these steps:

1. In the Find Address window, enter the address of the interface that seems to be receiving the broadcast traffic.

You can copy the MAC or IP address from the Status Watch report and paste it into Address Tracker's Enter the Address You Want to Find field.

2. Click Find Now.

Search displays the device name.

3. Use Transcend Central to launch Device View and disable the port.

Disabling the port stops the broadcast storm before it interferes with all vital network traffic. You can re-enable this interface using Device View or the device's console later.

Correcting Spanning Tree Misconfigurations

Spanning Tree does not cause broadcast storms, but a loop in your Spanning Tree topology can create data that looks like a storm. A loop can occur in your topology if:

• Someone disables Spanning Tree on a port
• You set up your Spanning Tree configuration incorrectly

Device View

Use Device View to disable any Spanning Tree port that has a repeater attached to it and to correct Spanning Tree misconfigurations.

To correct Spanning Tree misconfigurations, use Device View to disable Spanning Tree Protocol (STP) for a port on a SuperStack® II Switch 1000, Switch 3000, Switch 3000 10/100, Switch 9000SX, Desktop Switch, LinkBuilder® FMS II Bridge/Management Module, or CoreBuilder 6000.

To disable the STP port state for a port on a SuperStack II switch:

1. Select a port and click the right mouse button.

2. From the shortcut menu, select Configure.

3. In the Port section, click the STP tab.

4. From the STP Port State list box, select Disabled.

5. Click Apply.

To disable the STP port state for a port on a LinkBuilder FMS II Bridge/Management Module:

1. Double-click the module.

2. From the shortcut menu, select Configure Bridge.

3. In the Port section, click the STP tab.

Broadcast Storms Reference

This section explains terms that are relevant to broadcast storms and provides additional conceptual and problem analysis detail.

Broadcast Packets

Broadcast packets which are a normal part of network operation are transmitted by a device to a broadcast address For example IP networks use broadcasts to resolve network addresses using Address Resolution Protocol (ARP) IPX networks use a large number of broadcast packets to operate most effectively


Problems arise when broadcast packets endlessly propagate throughout the network, which increases the traffic volume on your network and the CPU time that each host spends processing and discarding unwanted broadcast packets.

Multicast Packets

Multicast packets, which are a normal part of network operation, are transmitted by a device to a multicast group address. Hosts that want to receive the packets indicate that they want to be members of the multicast group, and then multicast packets are distributed to that group. For example, multicast packets support the Spanning Tree Protocol. Multicast applications and underlying multicast protocols control multimedia traffic and shield hosts from processing unnecessary broadcast traffic. However, multicast traffic can also cause storms that saturate your network.

Duplicate Addresses

Use these sections to identify and correct problems caused by duplicate MAC and IP addresses:

• Duplicate Addresses Overview
• Finding Duplicate MAC Addresses
• Finding Duplicate IP Addresses

See Duplicate Addresses Reference for additional conceptual and problem analysis detail.

Networks sometimes generate duplicate MAC and IP addresses. Because duplicate addresses can cause problems with packet delivery, resolve them as soon as possible.

Understanding the Problem

Duplicate MAC addresses are caused by data link layer problems with Fiber Distributed Data Interface (FDDI) media and the passing of tokens on the FDDI ring. Duplicate IP addresses are caused by network layer problems. See these sections for more information about causes of duplicate addresses:

• Duplicate MAC Addresses
• Duplicate IP Addresses

Identifying the Problem

Identify duplicate MAC and IP addresses by following the instructions in these sections:

• Finding Duplicate MAC Addresses
• Finding Duplicate IP Addresses

Solving the Problem

Identify the cause of the duplicate address (such as user error or a hardware problem) and fix the problem, if possible.

Finding Duplicate MAC Addresses

To find out if duplicate MAC addresses are occurring, monitor your network using Status Watch.

Status Watch

The Status Watch FDDI Status tool identifies duplicate FDDI MAC addresses, and Status Watch reports when two or more MACs on the same ring have the same MAC address (a Duplicate Address condition).

Follow these steps:

1. In the Status Watch Summary View window, determine if any FDDI Status conditions are reported. If there are, double-click the table cell value to display the Device List window.

Another approach is to examine only the devices that you know reside on your FDDI ring. In the Status Watch main window, red device icons indicate that a threshold has been exceeded.

2. Select a device.

• If you selected the device from the Device List window, the real-time report for that device appears in the Status Watch main window.

• If you selected the device from the main window, also select the FDDI Status tool to view the real-time report.

3. Determine if a Duplicate Address condition caused the FDDI Status tool to trigger a Critical or Warning status for that device.

In Status Watch, you can specify the status severity level to apply to a Duplicate Address condition.

Finding Duplicate IP Addresses


To find out if duplicate IP addresses are occurring, monitor your network using these applications:

• Address Tracker - To find duplicate IP addresses on 3Com devices and their attached networks

• LANsentry® Manager - To find duplicate IP addresses that are collected by probes gathering RMON2 SmartAgent® data from the Enterprise Communications Analysis Module (ECAM) downloaded on your network devices

Address Tracker

Use Address Tracker to determine when and where duplicate IP addresses occur.

Follow these steps:

1. From the Find Address menu, select Find Duplicate IP Addresses.

2. Click Find Now to start your search.

LANsentry Manager

Use the Duplicates table in LANsentry Manager to compile a list of all stations with duplicate IP addresses. This table is available only on probes that have downloaded RMON2 (ECAM) SmartAgent software.

Follow these steps:

1. From the LANsentry Manager Address Map menu, select Duplicates. Address Map data is displayed as a table.

2. To export the contents of the table, click Export to launch the Data Export dialog box.

Duplicate Addresses Reference

This section explains terms that are relevant to duplicate addresses and provides additional conceptual and problem analysis detail.


Duplicate MAC Addresses

Each device on your network has a unique MAC address. This address identifies a single device on the network, allowing packets to be delivered to correct destinations.

Packets are delivered to their destinations by means of the MAC-address-to-IP-address translation that the Address Resolution Protocol (ARP) provides. Therefore, if MAC addresses are duplicated on the network, ARP caches of routing devices contain erroneous destinations. In FDDI, devices monitor network traffic, searching for their own MAC address in each packet to determine whether to decode the packet. If MAC addresses are not unique, two stations cannot be distinguished from each other.

Duplicate MAC addresses can occur for the following reasons:

• Someone has manually configured a MAC address for a device instead of using the address that the vendor supplied or allowing it to be assigned dynamically, and this address is also assigned to a different device.

• In rare circumstances, loops in a bridged network can cause a MAC hardware problem or an address learning problem that creates a duplicate MAC address entry in the bridging address table.

• On DECnet Phase 4 networks, MAC addresses are set from the DECnet address. A duplicate NET address can cause a duplicate MAC address.

Duplicate IP Addresses

Because IP addresses are critical for transmission of packets on TCP/IP networks, resolve duplicate IP addresses immediately.

Duplicate IP addresses can occur when someone has configured an IP address that is identical to an IP address that is assigned to a different device. Address assignments, although possible for you to configure manually, are usually made using one of these protocols:

• Dynamic Host Configuration Protocol (DHCP) - Allows your network to dynamically assign IP addresses to nodes. With this protocol, a DHCP server temporarily assigns an IP address to a node, or you can statically configure addresses as needed.

• BOOTstrap protocol (BootP) - Allows you to statically assign IP addresses to nodes. This protocol is more efficient than RARP.

• Reverse ARP (RARP) - Allows you to statically assign IP addresses to nodes. However, because this protocol relies on the MAC address to identify the node, you cannot use it on networks that dynamically assign hardware addresses.
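The duplicate-IP condition described above can be illustrated with a minimal Python sketch. It assumes you have exported address observations as (IP address, MAC address) pairs; the function name, the input format, and the sample addresses are hypothetical and are not part of Address Tracker or LANsentry Manager.

```python
from collections import defaultdict


def find_duplicate_ips(observations):
    """Return {ip: sorted MACs} for every IP address seen with more than one MAC.

    `observations` is an iterable of (ip, mac) pairs. This input format is an
    assumption made for illustration only.
    """
    macs_by_ip = defaultdict(set)
    for ip, mac in observations:
        # Normalize case so the same MAC written two ways is not counted twice.
        macs_by_ip[ip].add(mac.lower())
    return {ip: sorted(macs) for ip, macs in macs_by_ip.items() if len(macs) > 1}


# Hypothetical sample data: two different adapters claim 10.0.0.5.
sample = [
    ("10.0.0.5", "00:A0:24:11:22:33"),
    ("10.0.0.5", "00:a0:24:44:55:66"),
    ("10.0.0.9", "00:a0:24:77:88:99"),
]
duplicates = find_duplicate_ips(sample)
```

An IP address that maps to more than one MAC address is only a candidate duplicate; a replaced adapter can produce the same pattern for a short time.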


Ethernet Packet Loss

Use these sections to identify and correct Ethernet packet loss:

• Ethernet Packet Loss Overview
• Searching for Packet Loss

See Ethernet Packet Loss Reference for additional conceptual and problem analysis detail.

If your Ethernet network shows signs of congestion, it may be experiencing packet loss. When your network is congested, utilization is usually high, packets are discarded because buffers are full, and collision rates are up. Problems related to Collisions are often at the heart of packet loss.

Understanding the Problem

Collisions are normal in Ethernet networks. In many cases, Collision rates of 50 percent do not cause a large decrease in throughput. The Collision rate helps mark the upper limit on your network (the maximum percentage of collisions that your network can bear), which is usually around 70 percent. If Collisions increase above this upper limit, your network can become unreliable.
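As a rough illustration of comparing a measured Collision rate with the upper limit, the following Python sketch assumes that the rate is computed as collisions divided by packets over the same sampling interval. That definition, the function name, and the default limit of 70 percent (taken from the figure quoted above) are assumptions; your management station may compute the rate differently.

```python
def classify_collision_rate(collisions, packets, upper_limit_pct=70.0):
    """Return (rate_pct, status) for one sampling interval.

    The rate is collisions as a percentage of packets, which is an assumed
    definition used only for this sketch.
    """
    if packets <= 0:
        raise ValueError("packets must be positive")
    rate_pct = 100.0 * collisions / packets
    status = "unreliable" if rate_pct > upper_limit_pct else "within limit"
    return rate_pct, status
```

For example, `classify_collision_rate(50, 100)` reports a 50 percent rate that is still within the limit, while a rate above 70 percent is flagged.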

When the Collision rates increase, so do Excessive Collisions, which causes a delay in transmitting data. An increase in Collisions also indicates that network utilization and network errors, such as FCS Errors, are probably increasing.

The real packet problems to watch for, however, are undetected collisions that show up as Late Collisions.

Identifying the Problem

To identify that your network's problem is related to packet loss, verify that frames are being dropped on your network by examining this packet loss data:

• Alignment Errors
• Collisions
• Excessive Collisions
• FCS Errors
• CRC Errors
• Late Collisions
• Receive Discards
• Too Long Errors
• Too Short Errors
• Transmit Discards


The process of identifying the problem is discussed in Searching for Packet Loss.

Solving the Problem

If you notice that packet loss data is consistently high, then your network is too congested. In this case, segment your network with the appropriate network device (such as a switch or router). If Collision data shows increases but your network's utilization is the same, then your network may have a physical problem, such as cabling that is too long. Other problems that packet loss data can indicate include:

• Faulty connectors or improper cabling
• Excessive numbers of repeaters between network devices
• Defective Ethernet transceivers or controllers

Possible solutions to these problems are explained in the procedures in Searching for Packet Loss.

Searching for Packet Loss

When you look for packet loss, use the following applications:

• Status Watch - For Ethernet and MIB-II data collection using SNMP polling
• LANsentry Manager Network Statistics Graph - For RMON data collection using an RMON probe
• Device View - On a per-device basis, you can evaluate statistics for any port on the device

Status Watch

Status Watch monitors:

• Alignment Errors
• Excessive Collisions
• FCS Errors
• Receive Discards
• Transmit Discards

Follow these steps:

1. Determine if the thresholds for the Alignment Errors tool and FCS Errors tool are being exceeded.


Table 16 identifies the problems that this data can indicate and your possible actions. For information about problems related to a nonstandard Ethernet implementation, see Nonstandard Ethernet Problems.

Table 16. Alignment Errors, FCS Errors, and CRC Errors Data

Possible Problem: Faulty cabling
Possible Action: Examine the cable and cable connections for breaks or damage.

Possible Problem: Network noise
Possible Action: Look for improper cabling, faulty cable, faulty network equipment, or cables that are too close to equipment that emits electromagnetic interference (lamps, for example).

Possible Problem: Faulty transceiver
Possible Action: Use an analyzer to identify the problematic transceiver. If necessary, replace the transceiver, network adapter, or station.

Possible Problem: Fault at the transmitting end station
Possible Action:
1. Locate the source of the errors by looking at the module and port statistics.
2. Verify the correct operation of the transceiver or adapter card of the device that is connected to the problem port.
3. If the card appears to be operating correctly, examine the cable and cable connections for breaks or damage.

Possible Problem: Station powering up or down
Possible Action: None required. Early implementations of Ethernet transceivers generate a significant amount of in-band noise when powering up; they frequently cause Alignment Errors and FCS Errors in an otherwise stable network. When powering up, some software drivers for Ethernet controllers also initiate Time Domain Reflectometry (TDR) tests to test the Ethernet media. Network monitors report TDR tests as Alignment Errors and FCS Errors.

Possible Problem: Faulty adapter
Possible Action: Replace the adapter.

2. Determine if the Excessive Collisions tool threshold is being exceeded. Table 17 identifies the problems that this data can indicate and your possible actions.

Table 17. Collisions and Excessive Collisions Data

Possible Problem: Busy network
Possible Action: Use a bridge, router, or switch to reconfigure your network into segments with fewer stations.

Possible Problem: Faulty device (adapter, switch, hub, and the like) that does not listen before broadcasting. This problem increases the incidence of all types of collisions.
Possible Action: Isolate each adapter to see if the problem stops.

Possible Problem: Network loop
Possible Action: Ensure that no redundant connections to the same station have both connections active simultaneously.


3. Determine if the thresholds for the Receive Discards and Transmit Discards tools are being exceeded.

If these errors are high in conjunction with the data that you examined in steps 1 and 2, then your network is overloaded. Segment your network.

LANsentry Manager Network Statistics Graph

Use the LANsentry Manager Network Statistics graph to view data for:

• Collisions
• Late Collisions
• Bandwidth Utilization
• CRC Errors
• Too Long Errors
• Too Short Errors

Follow these steps

1. Display a Network Statistics graph for the local Ethernet segment on which users have reported poor performance.

This graph shows the most recent trend in Collision rates. If you have set up a History sample, you can also look at the historical trend. If a number of segments are connected by repeaters, examine the graph for each Ethernet segment.

2. Analyze Utilization and Collision rates to determine whether collisions are caused by an overloaded segment or a faulty component.

• If Utilization rates are high - The collisions are probably caused by an overloaded segment. If you have added nodes or new applications to your network, consider reconfiguring the cabling system using bridges and routers to filter out remote collisions and to keep local traffic on one segment. This action should level the network load.

• If Utilization rates are stable and appear normal - The collisions are probably caused by faulty components. In this case, do the following:

• If the network consists of repeaters - Compare the Network Statistics graphs for each segment connected to the repeater. Because repeaters repeat traffic across all connected segments (which makes many segments seem like one network), you should see similar levels of traffic on all segments. One segment that shows dissimilar levels of traffic and collisions may indicate faulty hardware. In this case, monitor several collisions to track the source station that is transmitting too soon after collisions, and repair the station. Packets that are transmitted too soon after collisions are unlikely to be valid. See Table 17 for more information about Collisions.

• On other networks - Determine the segment cable length.

3. Examine the CRC Errors and Late Collisions, which often indicate cabling or component problems.

Table 16 identifies the problems that CRC Errors can indicate and your possible actions. Table 18 identifies the problems that Late Collisions data can indicate and your possible actions.

Table 18. Late Collisions Data

Possible Problem: Cabling problems
• Segment too long
• Failing cable
• Segment not grounded properly (noise)
• Improper termination
• Taps too close (10BASE-5 and 10BASE-2 only)
• Noisy cable
Possible Action: Correct the cabling problem by doing one or more of the following:
• Reduce the segment length
• Replace the cable
• Ground the cable
• Terminate the cable correctly
• Check the taps
• Check for cables too close to equipment that emits electromagnetic interference

Possible Problem: Component problems
• Deaf or partially deaf node
• Failing repeater, transceiver, or controller cards
Possible Action: Correct the component problem by doing one of the following:
• Trace the failing component and replace it
• Replace the NIC or the transceiver

4. Trace Too Short Errors and Too Long Errors to the sender.

These errors often indicate faulty routers or LAN drivers and transceiver problems. Table 19 identifies the problems that this data can indicate and your possible actions.

Table 19

Possible Problem: A transceiver on your network is adding bits to the packets that are transmitted by the attached station.
Possible Action:
1. Use a network analyzer to identify the problematic transceiver.
2. If necessary, replace the transceiver, network adapter, or station.

Possible Problem: The jabber protection mechanism on a transceiver has failed; it can no longer protect the network from the jabbering produced by the attached station.
Possible Action: Replace the network card.

Possible Problem: Excessive noise on the cable. Note: Some 10/100 Mbps cards that autodetect the network speed may connect to the network at the wrong speed, causing excessive noise.
Possible Action: Check for improper cabling, faulty cable, faulty network equipment, or cables too close to noisy electronic equipment (lamps, for example). If your network card autodetects the network speed and you have ruled out other problems, manually configure the network speed.

Possible Problem: Faulty routers (two different network types are connected and the router is not enforcing proper frame size restrictions)
Possible Action: Notify the manufacturer.

Possible Problem: Faulty LAN driver
Possible Action: Replace the driver.

Possible Problem: A normal condition on a LinkSwitch® 1000, LinkSwitch® 3000, or CoreBuilder 5000 FastModule. If you use maximum-sized (1518-byte) Ethernet frames, the device's VLT-enabled ports add a frame tag of 4 bytes, resulting in a misleading Too Long Frame error.
Possible Action: These frames are passed successfully but will create the Too Long Frame error message. If you want to eliminate the error message, reduce your Ethernet packet frames by 4 bytes.
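Step 2 of this procedure suggests comparing the graphs for each segment connected to a repeater and looking for one segment with dissimilar levels of collisions. A minimal Python sketch of that comparison follows. It assumes you have collected one collision count per segment for the same interval; the function name, the segment labels, and the factor of 2 over the median are hypothetical choices, not values from LANsentry Manager.

```python
from statistics import median


def outlier_segments(collisions_by_segment, factor=2.0):
    """Return the segments whose collision count exceeds `factor` times the median.

    `collisions_by_segment` maps a segment label to its collision count for
    one sampling interval. Both the labels and `factor` are assumptions.
    """
    typical = median(collisions_by_segment.values())
    return sorted(
        name
        for name, count in collisions_by_segment.items()
        if count > factor * typical
    )


# Hypothetical counts for three segments attached to one repeater.
suspects = outlier_segments({"segment-a": 100, "segment-b": 110, "segment-c": 400})
```

A segment returned by this check is only a starting point for tracing the station that transmits too soon after collisions.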

Device View

Device View allows you to display a variety of port and device-level statistics relevant to Ethernet packet loss. Table 20 describes these statistics and their use in troubleshooting.

Table 20. Activity and Error Statistics in Device View

Statistics Group: Activity
Description: Displays the total network activity and errors on the selected port.
Use in Troubleshooting: This data shows readable packets, broadcast packets, Collisions, total errors, and runts, which cause Too Short Errors. You can interpret this data in the following ways:
• The presence of runts can often be caused by Collisions; however, if the values increase at specific times of the day, it may indicate that you need to change the network topology to manage the traffic more efficiently (for example, with switches or routers).
• Runts can also be caused by a badly terminated coax cable.
• Large numbers of runts not associated with high levels of collisions can indicate a transmission problem (examine the cable).
• Particularly high numbers of Collisions compared to the total number of readable packets can point to a hardware problem (a bad adapter) or to a data loop.
• A high proportion of Broadcast Packets (>10%) on a heavily utilized network (>50% of available bandwidth) can point to an incorrectly configured bridge or router on the network.

Statistics Group: Errors
Description: Displays the number of frames with errors on the selected port.
Use in Troubleshooting: The significance of errors depends on accompanying errors and prevailing network conditions. See the following error data for more information:
• Alignment Errors: Table 16
• FCS Errors: Table 16
• Too Long Errors: Table 19
• Too Short Errors or runts: Table 19
• Late Collisions: Table 18
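The broadcast guideline in Table 20 (more than 10 percent broadcast packets on a network using more than 50 percent of available bandwidth) can be expressed as a small check. This Python sketch assumes the broadcast proportion is measured against readable packets on the same port; that denominator and the function name are assumptions for illustration.

```python
def broadcast_warning(broadcast_packets, readable_packets, utilization_pct):
    """Return True when broadcasts exceed 10% of packets on a link above 50% utilization."""
    if readable_packets <= 0:
        return False
    broadcast_pct = 100.0 * broadcast_packets / readable_packets
    return broadcast_pct > 10.0 and utilization_pct > 50.0
```

A True result suggests looking for an incorrectly configured bridge or router; it does not identify the device.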

To display Activity and Errors statistics for a device or port, follow these steps:

1. Select the required port or device.

2. From the shortcut menu, select Activity or Errors.

The statistics available depend on the type of port or device selected. See Table 20 for troubleshooting information.

Ethernet Packet Loss Reference

This section explains terms that are relevant to Ethernet packet loss and provides additional conceptual and problem analysis detail.

Alignment Errors

An Alignment Error indicates a received frame in which both of the following are true:

• The number of bits received is an uneven byte count (that is, not an integral multiple of 8).

• The frame has a Frame Check Sequence (FCS) error.

Alignment Errors often result from MAC layer packet formation problems, cabling problems that cause corrupted or lost data, and packets that pass through more than two cascaded multiport transceivers. See FCS Errors for more information about interpreting Alignment Errors.


Collisions

Collisions indicate that two devices detect that the network is idle and try to send packets at exactly the same time (within one round-trip delay). Because only one device can transmit at a time, both devices must stop sending and attempt to retransmit. Collisions are detected by the transmitting stations.

The retransmission algorithm helps to ensure that the packets do not retransmit at the same time. However, if the two devices retry at nearly the same time, packets can collide again; the process repeats until either the packets finally pass onto the network without collisions or 16 consecutive collisions occur and the packets are discarded.
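The retransmission behavior described above can be sketched in Python. The text does not spell out the algorithm, so this sketch uses the truncated binary exponential backoff that standard Ethernet (IEEE 802.3) defines: after the n-th collision a station waits a random number of slot times between 0 and 2^min(n, 10) - 1, and gives up after 16 attempts. The function names and the `channel_is_busy` callback are hypothetical.

```python
import random

SLOT_TIME_US = 51.2   # one slot time (512 bit times) at 10 Mbps
MAX_ATTEMPTS = 16     # the frame is discarded after 16 consecutive collisions
BACKOFF_LIMIT = 10    # the random range stops growing after the 10th collision


def backoff_slots(collision_count, rng=random):
    """Pick how many slot times to wait after the n-th consecutive collision."""
    if not 1 <= collision_count < MAX_ATTEMPTS:
        raise ValueError("collision_count must be between 1 and 15")
    exponent = min(collision_count, BACKOFF_LIMIT)
    return rng.randrange(2 ** exponent)


def transmit(channel_is_busy, rng=random):
    """Try to send one frame and return (sent, total_slots_waited).

    `channel_is_busy(attempt)` is a hypothetical callback that returns True
    when the given attempt ends in a collision.
    """
    waited = 0
    for attempt in range(1, MAX_ATTEMPTS + 1):
        if not channel_is_busy(attempt):
            return True, waited
        if attempt == MAX_ATTEMPTS:
            break  # 16th collision: an Excessive Collision, the frame is dropped
        waited += backoff_slots(attempt, rng)
    return False, waited
```

Multiplying the returned slot count by `SLOT_TIME_US` gives the delay in microseconds on a 10 Mbps segment, which is why rising Collisions translate directly into transmission delay.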

CRC Errors

A Cyclic Redundancy Check (CRC) Error is an RMON statistic that combines FCS Errors and Alignment Errors. These errors indicate that packets were received with:

• A bad FCS and an integral number of octets (FCS Errors)
• A bad FCS and a non-integral number of octets (Alignment Errors)

CRC Errors can cause an end station to freeze. If a large number of CRC Errors are attributed to a single station on the network, replace the station's network interface board. Typically, a CRC Error rate of more than 1 percent of network traffic is considered excessive.
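The two cases that make up the CRC Errors statistic, and the 1 percent guideline, can be illustrated with a short Python sketch. It assumes each received frame is summarized as a (bit count, FCS passed) pair; that representation and the function names are hypothetical.

```python
def classify_crc_error(bit_count, fcs_passed):
    """Return 'ok', 'fcs_error', or 'alignment_error' for one received frame."""
    if fcs_passed:
        return "ok"
    # Bad FCS: an integral number of octets is an FCS Error,
    # a non-integral number of octets is an Alignment Error.
    return "fcs_error" if bit_count % 8 == 0 else "alignment_error"


def crc_error_rate_pct(frames):
    """Return CRC Errors (FCS plus Alignment) as a percentage of all frames."""
    frames = list(frames)
    if not frames:
        return 0.0
    errors = sum(
        1 for bit_count, fcs_passed in frames
        if classify_crc_error(bit_count, fcs_passed) != "ok"
    )
    return 100.0 * errors / len(frames)
```

A result above 1.0 from `crc_error_rate_pct` corresponds to the rate that the text describes as excessive.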

Excessive Collisions

Excessive Collisions indicate that 16 consecutive collisions have occurred, usually a sign that the network is becoming congested. For each excessive collision count (or after 16 consecutive collisions), a packet is dropped. If you know the normal rate of excessive collisions, then you can determine when the rate of packet loss is affecting your network's performance. See Knowing Your Network's Configuration for more information.

FCS Errors

Frame Check Sequence (FCS) Errors, a type of CRC Error, indicate that frames received by an interface are an integral number of octets long but do not pass the FCS check. The FCS is a mathematical way to ensure that all the frame's bits are correct without having the system examine each bit and compare it to the original. Packets with Alignment Errors also generate FCS Errors.
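To show how a frame check works in principle, the following Python sketch appends a 32-bit CRC to a block of bytes and verifies it on receipt. It uses `zlib.crc32` from the Python standard library, which uses the same CRC-32 polynomial as the Ethernet FCS. The function names are hypothetical, and the byte layout is simplified, so treat this as an illustration of the idea and not as a wire-accurate Ethernet implementation.

```python
import zlib


def append_fcs(frame: bytes) -> bytes:
    """Return the frame followed by its 4-byte CRC-32 check value."""
    fcs = zlib.crc32(frame) & 0xFFFFFFFF
    return frame + fcs.to_bytes(4, "little")


def fcs_ok(frame_with_fcs: bytes) -> bool:
    """Recompute the CRC over the body and compare it with the trailing 4 bytes."""
    if len(frame_with_fcs) < 4:
        return False
    body, fcs = frame_with_fcs[:-4], frame_with_fcs[-4:]
    return (zlib.crc32(body) & 0xFFFFFFFF) == int.from_bytes(fcs, "little")


good = append_fcs(b"\x00" * 60)                 # 60 octets of data + 4 octets of FCS
corrupted = bytes([good[0] ^ 0x01]) + good[1:]  # flip a single bit in transit
```

The receiver never compares the frame with the original; it only recomputes the check value, which is why a single corrupted bit is enough to count an FCS Error.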

Both Alignment Errors and FCS Errors can be caused by equipment powering up or down or by interference (noise) on unshielded twisted-pair (10BASE-T) segments. In a network that complies with the Ethernet standard, FCS or Alignment Errors indicate bit errors during a transmission or reception. A very low rate is acceptable. Although Ethernet allows a 1 in 10^8 bit error rate, typical Ethernet performance is 1 in 10^12 or better.
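To relate a bit error rate to the frame errors you would expect to see, the following Python sketch computes the probability that at least one bit in a frame is corrupted, assuming independent bit errors (a simplifying assumption). The function name is hypothetical.

```python
import math


def frame_error_probability(bit_error_rate, frame_octets):
    """Return P(at least one bit error) = 1 - (1 - BER) ** bits for one frame."""
    bits = frame_octets * 8
    # log1p and expm1 keep precision when the bit error rate is very small.
    return -math.expm1(bits * math.log1p(-bit_error_rate))


# A maximum-sized 1518-octet frame at the allowed and at the typical bit error rate.
allowed = frame_error_probability(1e-8, 1518)
typical = frame_error_probability(1e-12, 1518)
```

Under this assumption, a bit error rate of 1 in 10^8 corrupts roughly one maximum-sized frame in every 8,000, while 1 in 10^12 corrupts roughly one in every 80 million.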

Late Collisions

Late Collisions indicate that two devices have transmitted at the same time, but cabling errors (most commonly, excessive network segment length or repeaters between devices) prevent either transmitting device from detecting a collision. Neither device detects a collision because the time to propagate the signal from one end of the network to the other is longer than the time to put the entire packet on the network. As a result, neither of the devices that cause the late collision senses the other's transmission until the entire packet is on the network.

Although late collisions occur for small packets, the transmitter cannot detect them. As a result, a network suffering measurable Late Collisions for large packets is losing small packets as well.
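The timing argument above can be made concrete with a little arithmetic in Python. This sketch assumes a 10 Mbps segment and compares the time needed to put a frame on the wire with the round-trip delay of the segment; using the round-trip delay (rather than the one-way delay mentioned above) and the example delay of 60 microseconds are assumptions for illustration.

```python
def transmit_time_us(frame_octets, bit_rate_mbps=10.0):
    """Return the time in microseconds needed to put the whole frame on the wire."""
    return frame_octets * 8 / bit_rate_mbps


def collision_detectable(frame_octets, round_trip_delay_us, bit_rate_mbps=10.0):
    """Return True if the sender is still transmitting when the collision reaches it."""
    return transmit_time_us(frame_octets, bit_rate_mbps) >= round_trip_delay_us


# On a segment that is too long (hypothetical 60 microsecond round trip):
small_seen = collision_detectable(64, 60.0)     # minimum-sized frame
large_seen = collision_detectable(1518, 60.0)   # maximum-sized frame
```

A minimum-sized 64-octet frame takes 51.2 microseconds to send at 10 Mbps, so on this over-long segment its collision goes unnoticed by the sender, while the collision of a 1518-octet frame is detected and counted. This matches the observation that measurable Late Collisions for large packets imply silent loss of small packets.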

Nonstandard Ethernet Problems

Table 21 lists the symptoms that typically occur if a system violates the Ethernet standard.

Table 21. Symptoms of Common Ethernet Network Problems

Symptoms: FCS Errors and Alignment Errors increase significantly.
Problem: Network cabling is too long.
Notes: If you use a promiscuous network monitor, the number of Late Collisions reported by stations should correlate with the FCS and Alignment Errors reported by the monitor.

Symptoms: FCS and Alignment Errors increase proportionally with interference (sometimes referred to as noise hits).
Problem: Network segment is noisy.
Notes: Typically observed on a 10BASE-T network segment in a noisy environment. If you use multiple promiscuous monitors, the FCS and Alignment Errors among the monitors will not correlate. If the monitor can track runts, also called Too Short Errors, the number of runt packets should be significantly higher than normal.

Symptoms: FCS and Alignment Errors are much higher than normal.
Problem: Networks do not conform to the access scheme of Carrier Sense Multiple Access with Collision Detect (CSMA/CD).
Notes: Occurs when some implementations of Ethernet in the segment are not entirely compatible with IEEE 802.3 repeaters. Collision fragments linger on the network long enough to collide with retry packets at the minimum interpacket gap (IPG). The IPG is smaller on one side of the repeated network, causing a lost packet. Ethernet controllers cannot receive packets that are separated by 4.7 µs or less. Some controllers cannot sustain receptions of packets separated by as much as 9.6 µs. If runt packets are received one after another and are followed by a collision fragment, Ethernet controllers that cannot sustain reception will lose packets.

Receive Discards

Receive Discards indicate that received packets could not be delivered to a high-layer protocol because of congestion or packet errors.

Too Long Errors

A Too Long Error indicates that a packet is longer than 1518 octets (including FCS octets) but otherwise well formed. Too Long Errors are often caused by a bad transceiver, a malfunction of the jabber protection mechanism on a transceiver, or excessive noise on the cable.

Too Short Errors

A Too Short Error, also called a runt, indicates that a packet is fewer than 64 octets long (including FCS octets) but otherwise well formed.
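The length limits in the two definitions above reduce to a simple range check. This Python sketch is illustrative only; the function and constant names are hypothetical, and the lengths include the FCS octets as in the definitions.

```python
MIN_FRAME_OCTETS = 64     # shorter frames are runts (Too Short Errors)
MAX_FRAME_OCTETS = 1518   # longer frames are Too Long Errors


def classify_frame_length(octets):
    """Return 'too_short', 'too_long', or 'ok' for a frame length that includes the FCS."""
    if octets < MIN_FRAME_OCTETS:
        return "too_short"
    if octets > MAX_FRAME_OCTETS:
        return "too_long"
    return "ok"
```

For example, a maximum-sized 1518-octet frame that gains a 4-byte tag becomes 1522 octets and is classified as too long even though it is passed successfully.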

Transmit Discards

Transmit Discards indicate that packets were not transmitted because of network congestion.

FDDI Ring Errors

Use these sections to identify and correct FDDI ring errors:

• FDDI Ring Errors Overview
• Identifying Ring Errors

Fiber Distributed Data Interface (FDDI) often corrects its own problems. However, because FDDI cannot correct all errors (especially those related to hardware problems), you should monitor FDDI errors.

Understanding the Problem

FDDI ring errors that you should monitor include:

• Elasticity Buffer Error Condition
• Frame Error Condition
• Frames Not Copied Condition
• Link Error Condition

Identifying the Problem

First, determine the type of FDDI ring errors and where they are occurring. Similar to the way you identify other FDDI problems, identify the upstream and downstream neighbors of the devices that you are monitoring.

Several types of network errors can cause FDDI performance problems. For example, problems with cables or physical connections may result in a link or frame error. Elasticity buffer (EB) errors can also lead to link and frame errors.

FDDI deals with port-related errors as follows:

• The variable PORTLerAlarm is the link error rate (LER) value at which a link connection generates an alarm. When the LER is greater than the alarm setting, Station Management (SMT) sends a Status Report Frame (SRF) to notify you that there is a problem with a port.

The PORTLerAlarm threshold is set lower than the PORTLerCutoff threshold so that you are notified of a problem before the port is actually removed from the ring.

• When link errors reach the threshold defined by the variable PORTLerCutoff, SMT breaks the connection, disabling the PHY that detected the problem. A Link Error Condition is also generated.
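The relationship between the alarm and cutoff thresholds can be sketched in Python. The error rates are written here as plain fractions, and the default values of 1e-8 for the alarm and 1e-7 for the cutoff are hypothetical placeholders; check your device's actual PORTLerAlarm and PORTLerCutoff settings. The function name is also an assumption.

```python
def ler_action(link_error_rate, alarm=1e-8, cutoff=1e-7):
    """Return 'ok', 'alarm', or 'cutoff' for a measured link error rate.

    `alarm` and `cutoff` are illustrative values only. The alarm threshold
    must be lower than the cutoff so that a warning precedes removal.
    """
    if alarm >= cutoff:
        raise ValueError("the alarm threshold must be lower than the cutoff threshold")
    if link_error_rate >= cutoff:
        return "cutoff"   # SMT breaks the connection and disables the PHY
    if link_error_rate > alarm:
        return "alarm"    # SMT sends a Status Report Frame
    return "ok"
```

Because the alarm threshold sits below the cutoff, a link that is slowly degrading passes through the "alarm" state before the port is removed from the ring.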

FDDI deals with MAC-related errors as follows

• When MAC frame errors reach a certain threshold, a Frame Error Condition is generated. Because the actual error can be further upstream than the immediate connection, the connection remains intact.

• For a large network, the worst case MACFrameErrorRatio is less than 0.1 percent. However, during network configuration, frame error ratios can reach 50 percent for short periods. When you detect a sustained frame error ratio of more than 0.1 percent, a problem exists between the station that is reporting the condition and the nearest upstream MAC.
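Because short spikes during network configuration are expected, a check on the frame error ratio should look for sustained values. This Python sketch assumes you sample the ratio (as a percentage) at regular intervals; the function name and the choice of three consecutive samples as "sustained" are hypothetical.

```python
def sustained_frame_error(ratios_pct, threshold_pct=0.1, min_samples=3):
    """Return True if the most recent `min_samples` ratios all exceed the threshold."""
    recent = list(ratios_pct)[-min_samples:]
    return len(recent) == min_samples and all(r > threshold_pct for r in recent)
```

With this check, a brief 50 percent spike followed by normal readings is ignored, while several consecutive readings above 0.1 percent point to a problem between the reporting station and its nearest upstream MAC.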

Solving the Problem

To solve problems related to FDDI errors, fix the hardware, cabling, or congestion problem.

Identifying Ring Errors


Use Status Watch to monitor your FDDI devices for Warning or Critical alerts.

Status Watch

Use Status Watch to identify FDDI ring errors.

Follow these steps:

1. Monitor the FDDI Status tool for the currently selected device.

2. Determine whether Status Watch is reporting Elasticity Buffer Errors or a high percentage of Frame Errors, Frames Not Copied, or Link Error Rates for the currently selected device.

FDDI Ring Errors Reference

This section provides additional conceptual and problem analysis detail.

Elasticity Buffer Error Condition

The Elasticity Buffer Error condition occurs when a port's elasticity buffer overflows or underflows. This condition usually indicates that a port's hardware is not operating within the tolerances that the FDDI standard specifies. Look for the problem in the hardware of either the port that is reporting the condition or of the immediately adjacent port.

Frame Error Condition

The Frame Error condition occurs when the percentage of frames that contain errors exceeds a preset threshold. When a device is an uplink to FDDI (that is, a device is transmitting onto FDDI), this type of condition indicates that the ring is saturated. The ring is out of buffer space, and packets are being dropped from the device's backbone port.

The problem indicated by the frame errors is usually located between the MAC that reports the condition and its upstream neighbor. Because many physical connections can lie along this path, the MACFrameErrorRatio variable identifies only the two MACs between which the problem is occurring.

Frames Not Copied Condition

The Frames Not Copied condition occurs when the percentage of frames that are dropped because of insufficient buffer space exceeds a preset threshold. This condition indicates that the station is congested and is unable to process frames as quickly as they arrive. To help eliminate congestion:

• Add more capacity to the station
• Reconfigure your network so that end stations that communicate heavily with one another are on the same bridge or switch
• Filter out certain traffic

Link Error Condition

The Link Error condition occurs when a port detects link errors at a rate that exceeds a preset threshold. When the Link Error threshold is exceeded, the station removes itself from the ring and tries to reinsert itself on the ring. This action creates a MAC Neighbor Change Event (which also occurs if a ring wraps).

Link errors may indicate an FDDI PHY hardware problem (such as a faulty transmitter) or a faulty cable or connector. Look for the problem in the portion of the network between the port that is reporting the condition and the first upstream transmitter.

MAC Neighbor Change Event

The MAC Neighbor Change event occurs when a MAC's upstream or downstream neighbor changes.

This event indicates either:

• A network reconfiguration
• Another station that is leaving or joining the ring

Network File Server Timeouts

Use these sections to identify and correct timeouts on network file servers:

• Network File Server Timeout Overview
• Looking for Obvious Errors
• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

See Network File Server Timeouts Reference for additional conceptual and problem analysis detail.

A network file server can time out if your network gets congested or if your server is having problems. Users might have problems downloading data from or to the server or copying files from or to the server. To help you to understand the troubleshooting process for this type of problem, an EXAMPLE throughout this section follows the symptoms, analysis, and resolution of a typical file server timeout problem.


Understanding the Problem

When users log in, their stations make network file server calls either to determine quotas (if this feature has been enabled) or to mount user home directories. The network file server timeout messages, even when spread across multiple nodes, indicate a problem either with the network or with a server.

EXAMPLE: UNIX users notice that it takes a long time - over 30 seconds in some cases - to log in to any machine. Some machines report network file server timeout messages, but the messages have no obvious pattern and are infrequent. You begin to get a sense of the problem.

Identifying the Problem

First, rule out the obvious causes. Ask these questions:

• Can you access the network file server with Telnet?
• Have any alarms been triggered?
• Are there any new errors?

The process of identifying the problem is developed in Looking for Obvious Errors.

Solving the Problem

To determine the cause, reproduce the fault while you monitor the network. After you know the cause, you can fix the problem.

The solutions to the network file server timeout are identified in these sections:

• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

Looking for Obvious Errors

To look for obvious errors, use these applications:

• Ping and Telnet - To check for connectivity to the network file server nodes
• LANsentry Manager Alarms View - To search for triggered alarms
• LANsentry Manager Statistics View - To look for errors
• LANsentry Manager History View - To identify trends

Ping and Telnet

Determine whether you can contact network file server nodes using Ping and Telnet. If the response is extremely slow, then a problem may exist with the connections to the nodes. No delay indicates that the connections are normal, implying that the delay is occurring elsewhere. In this case, use LANsentry® Manager tools to determine whether packets are being lost or ignored.

LANsentry Manager Alarms View

Using the LANsentry Manager Alarms View, you can determine if any configured alarms have been triggered.

Search the Alarms View to see if any MAC events have been logged.

EXAMPLE: MAC events have not been logged for the network to which the UNIX users are attached.

Even though no alarms have occurred, errors may exist. For example, a low rate of background errors may exist just below the alarm threshold. Because RMON alarms are based on maximum and minimum values, they may miss constant, periodic, or low amounts of errors.
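The gap between an alarm threshold and the actual error count can be illustrated with a small sketch. The per-sample counts and the threshold of 10 errors per sample are made-up values, and the function names are hypothetical. The sketch only shows that a steady trickle of errors can stay below a rising threshold while adding up to more errors than a single burst that does trigger the alarm.

```python
def rising_alarm_fired(samples, threshold):
    """Return True if any single sample reaches the rising threshold."""
    return any(count >= threshold for count in samples)


def total_errors(samples):
    """Sum the errors across all samples, whether or not an alarm fired."""
    return sum(samples)


# Hypothetical per-interval error counts (not taken from a real probe).
steady_trickle = [4] * 60         # 4 errors in every sample
single_burst = [0] * 59 + [25]    # one burst in the last sample
ALARM_THRESHOLD = 10              # assumed threshold, errors per sample

print(rising_alarm_fired(steady_trickle, ALARM_THRESHOLD), total_errors(steady_trickle))
print(rising_alarm_fired(single_burst, ALARM_THRESHOLD), total_errors(single_burst))
```

In this sketch the steady trickle never fires the alarm even though it produces far more errors in total, which is why the Statistics and History views are still worth checking when the Alarms View is empty.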

LANsentry Manager Statistics View

Using the LANsentry Manager Statistics View, you can display a multisegment graph of utilization and error statistics.

Follow these steps:

1. Set up a graph that shows utilization and errors on all your major segments.

2. Determine whether any segments are particularly busy or error prone.

EXAMPLE: You notice that one segment of the UNIX network, HUB3, is reporting Too Long Errors and FCS Errors roughly every second sample. While the amount is not higher than normal, it is currently higher than on any other segment.

LANsentry Manager History View

Using the LANsentry Manager History View, you can display a rolling history table to determine if the errors that you are seeing are new. For example, if you have a history table that runs for 30-minute samples over two days, you can compare the most recent sample to a previous sample, looking for new errors. If your probe has the resources, use a much finer resolution sample stored for a shorter time (every 30 seconds for 2 hours) to more easily spot recent errors.

EXAMPLE: You see that the history table shows that no error rates remained constant throughout the day. However, errors that did occur were on the device HUB3 and were Too Long Errors and FCS Errors.

Reproducing the Fault While Monitoring the Network

Although the RMON View in LANsentry Manager can show error rates and help you to identify the location of the problem, it may not provide enough data to solve the problem. To determine the cause of the problem, reproduce it while you monitor the network by using these applications:

• LANsentry Manager Top-N Graph - To locate a quiet node to use for reproducing the fault
• LANsentry Manager Packet Capture - To capture packets from the hub to which the quiet node is attached
• LANsentry Manager Packet Decode - To analyze the packets to assess network file server traffic and delays
• Address Tracker - To find the location of the problem nodes

EXAMPLE: Using LANsentry Manager, you find a hub on the network with a higher than normal error rate. However, the error rate does not seem high enough to cause login delays of 60 seconds or more.

LANsentry Manager Top-N Graph

Using the Top-N graph in the LANsentry Manager main window, locate a quiet node that has been showing the same problem. Choose a quiet node so that you do not receive excessive traffic when you try to isolate the problem.

EXAMPLE: You see that the node Monolith, which has the same Network File System (NFS) mounts as the other nodes on the network, is quiet. You decide to use this node for reproducing the fault. See Network File System (NFS) Protocol for more information about NFS.

LANsentry Manager Packet Capture

Using the LANsentry Manager Packet Capture application, capture packets from the network using predefined patterns and start-and-stop conditions.

Follow these steps:

1. Set up a capture buffer on a probe that is connected to the same hub as the quiet node. Until you know more about the problem, set a very general filter.

EXAMPLE: You select a MAC-layer filter and set a conversation filter to capture all packets to and from Monolith.

2. Telnet into and log out from the quiet node. Then reset the capture buffer. Repeat this procedure until you see the problem reflected in your captured data. To keep the buffer information clean, reset the buffer each time that you repeat the procedure.

3. When you see the delay, note the rough value of the packet count on the LANsentry Manager packet buffer.

By noting the packet count at which you think the delay has occurred, you can narrow the problem to within about 20 packets in the buffer. If you have used an extremely quiet node, you may even identify the exact packet.

LANsentry Manager Packet Decode

The LANsentry Manager Packet Decode application decodes all major protocols and displays the packet contents at three levels of detail: summary information, header information, and actual packet content.

Follow these steps:

1. Open the buffer in the Packet Decode application and locate the number of the packet at which the delay occurred.

2. Select the packet and launch a MAC-layer conversation filter. In the filter display, look for a gap in the conversation (that is, where the node sent a request and then resent it at approximately the same rate as the delay you experienced when recreating the problem).

3. Repeat the test to determine if the result concentrates on one node or if it appears on other nodes.

EXAMPLE: On the quiet node that you selected, the delay is obvious. You see an NFS request going out to a node and a repeat of the request 30 seconds later. During that time, the node did not respond. You now know that the delay occurred because nodes were not seeing responses for NFS requests. When you repeat the test on other nodes, you find that the delay is happening with more than one destination node.
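Looking for a gap in a conversation amounts to finding a request that is sent again after a long silence. The sketch below illustrates that idea on a list of (timestamp, request identifier) pairs. The capture data, the request labels, and the function name are all hypothetical, and the sketch does not read any LANsentry Manager buffer format.

```python
def find_retry_gaps(packets, min_gap_seconds):
    """Return (request_id, first_sent, resent_at) for every request that was
    sent again at least min_gap_seconds after it was first seen."""
    first_seen = {}
    gaps = []
    for timestamp, request_id in packets:
        if request_id not in first_seen:
            first_seen[request_id] = timestamp
        elif timestamp - first_seen[request_id] >= min_gap_seconds:
            gaps.append((request_id, first_seen[request_id], timestamp))
    return gaps


# Hypothetical capture: timestamps in seconds, request labels invented.
capture = [
    (0.00, "lookup-1"),
    (0.02, "getattr-2"),
    (30.02, "getattr-2"),   # same request repeated after a long silence
    (30.05, "read-3"),
]

print(find_retry_gaps(capture, min_gap_seconds=25))
```

The 25-second cutoff is an assumed value chosen to sit just under the roughly 30-second retry described in the example; a different protocol or retry timer would need a different cutoff.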

Address Tracker

Use Address Tracker, which polls managed devices, to determine the hubs to which the problem nodes are attached. If the problem end stations are located on unmanaged devices, then you can at least narrow the problem to those unmanaged devices.

EXAMPLE: Although your network does not have managed hubs that Transcend® NCS management software can poll, it does have managed switches. When it polls the switches, Address Tracker displays the switch ports on which addresses were last seen. This information indicates the hub (but not the hub port) on which the device is located.

LANsentry Manager Packet Decode

After you know the location of the hub that has the problem node, monitor the problem from the hub using LANsentry Manager Packet Decode.

Follow these steps:

1. To capture packets from one of the nodes on the hub, set up another capture buffer and repeat the exercise that is described in LANsentry Manager Packet Capture. Because a delay may occur on a different node, use two capture buffers without stopping the first one.

Note the rough packet count where the delay appears.

2. Display a conversation filter of the packet where the delay appears and look for the gap in the conversation.

EXAMPLE: You hope that the nodes are on the same hub. You find that all the nodes are on HUB3. This result indicates that FCS Errors may be causing the timeouts. However, because the errors occur at a low rate, you decide to verify this diagnosis. You monitor the problem from the hub, logging in and out many times, and the delay eventually occurs. This time, the delay shows that the node's reply had an FCS Error even though the node received the request. The switch would not have transmitted this packet, causing a timeout on the NFS protocol. The retry time is presumably 30 seconds. During this test, you see the problem occurring on another node.

Correcting the Fault

Without a managed hub, you may find it very difficult to discover network file server timeout errors. To find the problematic node, you must either systematically isolate nodes by monitoring each node for a prolonged period or temporarily insert a managed hub.

EXAMPLE: You notice that the captured error packet failed FCS because it was corrupted by a regular pattern during transmission. A possible reason for this occurrence is a Jabbering node. This explanation makes sense because FCS/Jabber frames increased linearly when you were monitoring the live network.

Network File Server Timeouts Reference

This section explains terms that are relevant to network file server timeouts and provides additional conceptual and problem analysis detail.

Jabbering

Jabbering occurs when a node transmits illegal-length packets and is possibly not operating within carrier specifications. In effect, another node has written bad data over a valid packet. This bad data is often interpreted as a repeated sequence of data.

Network File System (NFS) Protocol

A distributed file system protocol developed by Sun Microsystems that allows a computer system to access files over a network as if they were on its local disks. This protocol has been incorporated into products by more than 200 companies. It is now a de facto Internet standard.

NFS is one protocol in the NFS suite of protocols, which includes NFS, RPC, XDR (External Data Representation), and others. These protocols are part of a larger architecture that Sun Microsystems refers to as Open Network Computing (ONC). ONC is a distributed applications architecture designed by Sun and currently controlled by a consortium led by Sun.

Troubleshooting TCP/IP - Detailed Steps

This article shows how to troubleshoot TCP/IP connectivity between computers on a Windows network. If you haven't already done so, disable XP's Internet Connection Firewall on all local area network connections and remove all firewall programs on the network. Improperly configured firewalls are the most common cause of TCP/IP problems.

Open a Command Prompt Window

For many of these steps, you'll be typing at the command prompt. To open a command prompt window in Windows 2000 or XP, click Start | Run, type cmd in the box, and click OK. To open a command prompt window in Windows 95, 98, or Me, click Start | Run, type command in the box, and click OK. Type one command per line, and press Enter after each one to execute it. To close the command prompt window, use the exit command.

Determine the TCP/IP Settings

Determine the TCP/IP settings of each computer on the local area network. In XP, open the Network Connections folder, right click the LAN connection, and click Status | Support | Details. For example, here are the Status and Details views for the LAN connection on an Internet Connection Sharing host.

In Windows 95/98/Me, click Start | Run, type winipcfg in the box, and click OK. Select the LAN adapter from the menu and click More Info. Here's the winipcfg view for an ICS client running Windows Me.

You can also see the TCP/IP settings from the command prompt. This is especially convenient if a computer has more than one network adapter. Use the ipconfig /all command, which is available in all versions except Windows 95. The output from this command can be long, so it's best to write it to a file. Specify the file name in the command this way:

ipconfig /all >ipconfig.txt

Description of TCP/IP Settings

Here are the TCP/IP settings that are used in network troubleshooting:

• IP Address - Unique address assigned to a network adapter. A computer with multiple network adapters has an IP address for each one, and each one must be in a different subnet.

• Subnet Mask - Used in conjunction with the IP address to determine which subnet an adapter belongs to. At the simplest level, communication is only possible between two network adapters when they're in the same subnet.

• Default Gateway - IP address of a computer or router, on one of this computer's local area networks, that knows how to communicate with subnets not present on this computer. For an Internet connection, the default gateway is a router belonging to your Internet service provider, and all access to sites on the Internet goes through it. For an ICS client, the default gateway is the ICS host. If you use a hardware router, it serves as the default gateway.

• DHCP Server - If an adapter is configured to obtain an IP address automatically, this is the address of the server that provides it. It could be your ISP, an ICS host, or a hardware router.

• DNS Servers - IP address of one or more Domain Name Server computers. DNS servers translate Internet names to their IP addresses (like 63.146.109.227).
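The role of the subnet mask can be sketched with Python's standard ipaddress module: two adapters are in the same subnet when applying the mask to each IP address gives the same network. The function name is hypothetical, and the mask 255.255.255.0 and the sample addresses are assumptions chosen for illustration.

```python
import ipaddress


def same_subnet(ip_a, ip_b, subnet_mask):
    """Return True if both addresses fall in the same subnet under the mask."""
    # strict=False lets ip_network accept a host address and mask off the host bits.
    net_a = ipaddress.ip_network(f"{ip_a}/{subnet_mask}", strict=False)
    net_b = ipaddress.ip_network(f"{ip_b}/{subnet_mask}", strict=False)
    return net_a == net_b


# Assumed example values; substitute the settings reported for your adapters.
print(same_subnet("192.168.1.101", "192.168.1.123", "255.255.255.0"))
print(same_subnet("192.168.1.101", "192.168.0.1", "255.255.255.0"))
```

The first pair shares the 192.168.1.0 network and the second pair does not, so only the first pair could communicate directly without a gateway.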

Subnets

See our article on subnets for a brief description of how they work. For more details, see this Microsoft Knowledge Base article.

If two computers are supposed to be on the same subnet but aren't, something is wrong with the network hardware or software configuration. This is most likely to happen when one of them receives an IP address of 169.254.x.x, which indicates that:

• It's configured to obtain an IP address automatically.
• It couldn't find a DHCP server on the network to make the assignment.
• Windows assigned it an Automatic Private IP Address.

See our article on Specific Networking Problems and Their Solutions for more information.
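A quick way to recognize an Automatic Private IP Address is to test whether the address falls in the 169.254.0.0/16 range. The sketch below does this with Python's standard ipaddress module; the function name and the sample addresses are hypothetical.

```python
import ipaddress

# The 169.254.x.x range used for Automatic Private IP Addresses.
AUTOMATIC_PRIVATE_RANGE = ipaddress.ip_network("169.254.0.0/16")


def is_automatic_private_address(address):
    """Return True if the address is in the 169.254.x.x range."""
    return ipaddress.ip_address(address) in AUTOMATIC_PRIVATE_RANGE


# Sample addresses chosen for illustration only.
print(is_automatic_private_address("169.254.10.20"))
print(is_automatic_private_address("192.168.1.101"))
```

An adapter that reports an address in this range most likely never reached a DHCP server, so the DHCP server, cabling, and firewall settings are the first things to check.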

Pinging

The ping command is the basic tool for testing TCP/IP connectivity. It sends a special packet (called ICMP Echo) to a particular IP address and looks for a reply. If everything is working right, the reply comes back. If not, the ping times out in a few seconds. By default, the ping command repeats the process four times. Here's an example of an ICS client computer pinging a Windows XP Home Edition ICS host, using the host's IP address and its computer name.

When ping fails, you'll see one of these error messages:

• Request timed out - The IP address is valid, but there's no reply from it. If the IP address is on a local area network, the most likely cause is a firewall program blocking the ping.

• Unknown host <name> or Ping request could not find host <name> - The computer name doesn't exist on the local area network. Make sure that NetBIOS over TCP/IP is enabled.

• Destination host unreachable - The IP address isn't on a local area network, and the default gateway can't access it. Either there's no default gateway, its address is wrong, or it isn't functioning.

Pinging the Local Area Network

Here is a series of ping commands to use in finding where a problem occurs on a local area network. Run them in the order shown, and don't go on to the next command until all of the previous commands work properly. In this example:

• The computer being tested is named Winxp, with IP address 192.168.1.101.
• There's another computer on the network, named Win98, with IP address 192.168.1.123.

Substitute the appropriate IP addresses and computer names for your network.

Command               Target                          What Ping Failure Indicates
ping 127.0.0.1        Loopback address                Corrupted TCP/IP installation
ping localhost        Loopback name                   Corrupted TCP/IP installation
ping 192.168.1.101    This computer's IP address      Corrupted TCP/IP installation
ping winxp            This computer's name            Corrupted TCP/IP installation
ping 192.168.1.123    Another computer's IP address   Bad hardware or NIC driver
ping win98            Another computer's name         NetBIOS name resolution failure
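The ordered sequence in the table can be sketched as a small script that runs each check in turn and stops at the first failure. The targets are the example names and addresses from this section and must be replaced with your own. The helper names are hypothetical, and the script assumes the operating system's ping command is on the path. Note that the exit status of ping is only a rough success signal, so read the printed ping output when in doubt.

```python
import platform
import subprocess

# Ordered checks: (target, what a failed ping indicates).
LAN_CHECKS = [
    ("127.0.0.1", "Corrupted TCP/IP installation"),
    ("localhost", "Corrupted TCP/IP installation"),
    ("192.168.1.101", "Corrupted TCP/IP installation"),
    ("winxp", "Corrupted TCP/IP installation"),
    ("192.168.1.123", "Bad hardware or NIC driver"),
    ("win98", "NetBIOS name resolution failure"),
]


def system_ping(target):
    """Send one echo request using the operating system's ping command."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    result = subprocess.run(
        ["ping", count_flag, "1", target],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def first_failure(checks, ping=system_ping):
    """Run the checks in order and return the first (target, meaning) that
    fails, or None if every ping succeeds."""
    for target, meaning in checks:
        if not ping(target):
            return target, meaning
    return None
```

Calling first_failure(LAN_CHECKS) on the computer being tested returns the first target that did not answer together with the likely cause. Passing a different ping function makes the logic easy to try out without touching the network.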

To fix a corrupted TCP/IP installation on Windows XP, follow the steps in this Microsoft Knowledge Base article. For Windows 95/98/Me, un-install the TCP/IP protocol in Control Panel | Network, reboot, and re-install it. If that doesn't fix it, use this procedure on Windows 95 or 98.

Pinging the Internet

You can also use ping to find a problem with Internet access. Run these commands in the order shown, and don't go on to the next command until all of the previous commands work properly. Use the Default Gateway and DNS Server addresses that you got from the winipcfg or ipconfig /all command.

Command                  Target                What Ping Failure Indicates
ping w.x.y.z             Default Gateway       Default Gateway down
ping w.x.y.z             DNS Server            DNS Server down
ping w.x.y.z             Web site IP address   Internet service provider or web site down
ping www.something.com   Web site name         DNS Server down or web site down

REFERENCE

WWW.PRACTICALLYNETWORK.COM

WWW.COMPUTERHOPE.COM

WWW.WIKIPEDIA.ORG

Figure 15 Optical Bypass Unit Configuration

FDDI Connectivity Reference

This section explains terms that are relevant to FDDI connectivity and provides additional conceptual and problem analysis detail.

Peer Wrap Condition

A Peer Wrap (wrapped ring) condition occurs when a dual-attachment station detects a fault (often a lost connection) and reconfigures the network by wrapping the dual trunk rings to form a single ring. Normally, the two stations that are adjacent to the fault wrap to maintain full connectivity. However, if a second fault occurs before the first is repaired, the network partitions itself into two or more rings, and stations lose connectivity.

When a station reports a Peer Wrap condition, locate and repair the problem that caused the station to wrap the rings. Potential causes include:

• Faulty FDDI port hardware
• Faulty cables or connectors
• Unplugged connectors
• Powered-down stations

You can expect to find the cause of the problem between the two stations that report the Peer Wrap condition.

Twisted Ring Condition

A Twisted Ring condition occurs when certain undesirable connection types exist. See Table 7 for more information. Although similar to the Undesired Connection Attempt, the Twisted Ring condition provides specific Station Management (SMT) and port information for diagnosis.

Undesired Connection Attempt Event

An Undesired Connection Attempt event occurs when a port tries to connect to another port of a type that may result in an undesirable network topology. Whether the connection attempt is successful depends on the current setting of the station's connection policies.

Table 7 lists connections that the FDDI standard defines as undesirable. The managed devices may or may not permit these connections, depending on their FDDI station configurations.

Table 7 Undesirable Connection Types

Connection Type   Reason That the Connection Is Undesirable
A-A               Twisted primary and secondary rings
A-S               A wrapped ring
B-B               Twisted primary and secondary rings
B-S               A wrapped ring
S-A               A wrapped ring
S-B               A wrapped ring
M-M               A tree of rings topology (illegal connection)

Table 8 lists FDDI connections that create valid topologies.

Table 8 Valid Connection Types

Connection Type   Reason That the Connection Is Valid
A-B               A normal trunk peer connection
A-M               A tree connection with possible redundancy. In a single MAC node, Port B has precedence (by default) for connecting to a Port M.
B-A               A normal trunk ring peer connection
B-M               A tree connection with possible redundancy. In a single MAC node, Port B has precedence (by default) for connecting to a Port M.
S-S               A single ring of two slave stations
S-M               A normal tree connection
M-A               A tree connection that provides possible redundancy
M-B               A tree connection that provides possible redundancy
M-S               A normal tree connection
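Tables 7 and 8 together cover every pairing of the four FDDI port types (A, B, S, and M), so they can be expressed as a simple lookup. The sketch below is only an illustration of the tables; the function and dictionary names are hypothetical, and it says nothing about whether a particular device's connection policies actually permit or reject a connection.

```python
# Connection types from Table 7 (undesirable) and Table 8 (valid).
UNDESIRABLE = {
    "A-A": "Twisted primary and secondary rings",
    "A-S": "A wrapped ring",
    "B-B": "Twisted primary and secondary rings",
    "B-S": "A wrapped ring",
    "S-A": "A wrapped ring",
    "S-B": "A wrapped ring",
    "M-M": "A tree of rings topology (illegal connection)",
}

VALID = {
    "A-B": "A normal trunk peer connection",
    "A-M": "A tree connection with possible redundancy",
    "B-A": "A normal trunk ring peer connection",
    "B-M": "A tree connection with possible redundancy",
    "S-S": "A single ring of two slave stations",
    "S-M": "A normal tree connection",
    "M-A": "A tree connection that provides possible redundancy",
    "M-B": "A tree connection that provides possible redundancy",
    "M-S": "A normal tree connection",
}


def classify_connection(local_port, remote_port):
    """Return ("undesirable" or "valid", reason) for a pair of port types."""
    key = f"{local_port}-{remote_port}".upper()
    if key in UNDESIRABLE:
        return "undesirable", UNDESIRABLE[key]
    if key in VALID:
        return "valid", VALID[key]
    raise ValueError(f"Unknown FDDI port types: {key}")


print(classify_connection("A", "A"))
print(classify_connection("S", "M"))
```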

Disabling the Offending Interface

Because broadcast storms can ultimately cause your whole network to become unavailable, take action immediately to disable the offending interface. You can enable the interface again after you have corrected the problem.

Address Tracker

Use Address Tracker to locate the interface that is causing the broadcast storm. Use Device View to disable the port.

Follow these steps:

1. In the Find Address window, enter the address of the interface that seems to be receiving the broadcast traffic.

You can copy the MAC or IP address from the Status Watch report and paste it into Address Tracker's Enter the Address You Want to Find field.

2. Click Find Now.

Search displays the device name.

3. Use Transcend Central to launch Device View and disable the port.

Disabling the port stops the broadcast storm before it interferes with all vital network traffic. You can re-enable this interface using Device View or the device's console later.

Correcting Spanning Tree Misconfigurations

Spanning Tree does not cause broadcast storms, but a loop in your Spanning Tree topology can create data that looks like a storm. A loop can occur in your topology if:

• Someone disables Spanning Tree on a port
• You set up your Spanning Tree configuration incorrectly

Device View

Use Device View to disable any Spanning Tree port that has a repeater attached to it and to correct Spanning Tree misconfigurations.

To correct Spanning Tree misconfigurations, use Device View to disable Spanning Tree Protocol (STP) for a port on a SuperStack® II Switch 1000, Switch 3000, Switch 3000 10/100, Switch 9000SX, Desktop Switch, LinkBuilder® FMS II Bridge/Management Module, or CoreBuilder 6000.

To disable the STP port state for a port on a SuperStack II switch:

1. Select a port and click the right mouse button.

2. From the shortcut menu, select Configure.

3. In the Port section, click the STP tab.

4. From the STP Port State list box, select Disabled.

5. Click Apply.

To disable the STP port state for a port on a LinkBuilder FMS II Bridge/Management Module:

1. Double-click the module.

2. From the shortcut menu, select Configure Bridge.

3. In the Port section, click the STP tab.

Broadcast Storms Reference

This section explains terms that are relevant to broadcast storms and provides additional conceptual and problem analysis detail.

Broadcast Packets

Broadcast packets, which are a normal part of network operation, are transmitted by a device to a broadcast address. For example, IP networks use broadcasts to resolve network addresses using Address Resolution Protocol (ARP). IPX networks use a large number of broadcast packets to operate most effectively.

Problems arise when broadcast packets endlessly propagate throughout the network, which increases the traffic volume on your network and the CPU time that each host spends processing and discarding unwanted broadcast packets.

Multicast Packets

Multicast packets, which are a normal part of network operation, are transmitted by a device to a multicast group address. Hosts that want to receive the packets indicate that they want to be members of the multicast group, and then multicast packets are distributed to that group. For example, multicast packets support the Spanning Tree Protocol. Multicast applications and underlying multicast protocols control multimedia traffic and shield hosts from processing unnecessary broadcast traffic. However, multicast traffic can also cause storms that saturate your network.

Duplicate Addresses

Use these sections to identify and correct problems caused by duplicate MAC and IP addresses:

• Duplicate Addresses Overview
• Finding Duplicate MAC Addresses
• Finding Duplicate IP Addresses

See Duplicate Addresses Reference for additional conceptual and problem analysis detail.

Networks sometimes generate duplicate MAC and IP addresses. Because duplicate addresses can cause problems with packet delivery, resolve them as soon as possible.

Understanding the Problem

Duplicate MAC addresses are caused by data link layer problems with Fiber Distributed Data Interface (FDDI) media and the passing of tokens on the FDDI ring Duplicate IP addresses are caused by network layer problems See these sections for more information about causes of duplicate addresses

bull Duplicate MAC Addresses bull Duplicate IP Addresses

Identifying the Problem

Identify duplicate MAC and IP addresses by following the instructions in these sections:

• Finding Duplicate MAC Addresses
• Finding Duplicate IP Addresses

Solving the Problem

Identify the cause of the duplicate address (such as user error or a hardware problem) and fix the problem if possible.

Finding Duplicate MAC Addresses

To find out if duplicate MAC addresses are occurring, monitor your network using Status Watch.

Status Watch

The Status Watch FDDI Status tool identifies duplicate FDDI MAC addresses, and Status Watch reports when two or more MACs on the same ring have the same MAC address (a Duplicate Address condition).

Follow these steps:

1. In the Status Watch Summary View window, determine if any FDDI Status conditions are reported. If there are, double-click the table cell value to display the Device List window.

Another approach is to examine only the devices that you know reside on your FDDI ring. In the Status Watch main window, red device icons indicate that a threshold has been exceeded.

2. Select a device.

• If you selected the device from the Device List window, the real-time report for that device appears in the Status Watch main window.

• If you selected the device from the main window, also select the FDDI Status tool to view the real-time report.

3. Determine if a Duplicate Address condition caused the FDDI Status tool to trigger a Critical or Warning status for that device.

In Status Watch, you can specify the status severity level to apply to a Duplicate Address condition.

Finding Duplicate IP Addresses


To find out if duplicate IP addresses are occurring, monitor your network using these applications:

• Address Tracker - To find duplicate IP addresses on 3Com devices and their attached networks

• LANsentry® Manager - To find duplicate IP addresses that are collected by probes gathering RMON2 SmartAgent® data from the Enterprise Communications Analysis Module (ECAM) downloaded on your network devices

Address Tracker

Use Address Tracker to determine when and where duplicate IP addresses occur.

Follow these steps:

1. From the Find Address menu, select Find Duplicate IP Addresses.

2. Click Find Now to start your search.

LANsentry Manager

Use the Duplicates table in LANsentry Manager to compile a list of all stations with duplicate IP addresses. This table is available only on probes that have downloaded RMON2 (ECAM) SmartAgent software.

Follow these steps:

1. From the LANsentry Manager Address Map menu, select Duplicates. Address Map data is displayed as a table.

2. To export the contents of the table, click Export to launch the Data Export dialog box.

Duplicate Addresses Reference

This section explains terms that are relevant to duplicate addresses and provides additional conceptual and problem analysis detail.


Duplicate MAC Addresses

Each device on your network has a unique MAC address. This address identifies a single device on the network, allowing packets to be delivered to correct destinations.

Packets are delivered to their destinations by means of the MAC-address-to-IP-address translation that the Address Resolution Protocol (ARP) provides. Therefore, if MAC addresses are duplicated on the network, ARP caches of routing devices contain erroneous destinations. In FDDI, devices monitor network traffic, searching for their own MAC address in each packet to determine whether to decode the packet. If MAC addresses are not unique, two stations cannot be distinguished from each other.

Duplicate MAC addresses can occur for the following reasons:

• Someone has manually configured a MAC address for a device instead of using the address that the vendor supplied or allowing it to be assigned dynamically, and this address is also assigned to a different device.

• In rare circumstances, loops in a bridged network can cause a MAC hardware problem or an address learning problem that creates a duplicate MAC address entry in the bridging address table.

• On DECnet Phase 4 networks, MAC addresses are set from the DECnet address. A duplicate NET address can cause a duplicate MAC address.

Duplicate IP Addresses

Because IP addresses are critical for transmission of packets on TCP/IP networks, resolve duplicate IP addresses immediately.

Duplicate IP addresses can occur when someone has configured an IP address that is identical to an IP address that is assigned to a different device. Address assignments, although possible for you to configure manually, are usually made using one of these protocols:

• Dynamic Host Configuration Protocol (DHCP) - Allows your network to dynamically assign IP addresses to nodes. With this protocol, a DHCP server temporarily assigns an IP address to a node, or you can statically configure addresses as needed.

• Bootstrap Protocol (BootP) - Allows you to statically assign IP addresses to nodes. This protocol is more efficient than RARP.

• Reverse ARP (RARP) - Allows you to statically assign IP addresses to nodes. However, because this protocol relies on the MAC address to identify the node, you cannot use it on networks that dynamically assign hardware addresses.
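The idea behind a duplicate IP address search can be illustrated with a minimal Python sketch. It is not how Address Tracker or LANsentry Manager work internally; it simply assumes that you already have a list of (IP address, MAC address) pairs, for example collected from ARP caches, and reports every IP address that is claimed by more than one MAC address. The function name and the sample addresses are hypothetical.

```python
from collections import defaultdict


def find_duplicate_ips(observations):
    """Return {ip: sorted MACs} for every IP address seen with more than one MAC."""
    macs_by_ip = defaultdict(set)
    for ip, mac in observations:
        # Normalize case so the same MAC written two ways is not counted twice.
        macs_by_ip[ip].add(mac.lower())
    return {ip: sorted(macs) for ip, macs in macs_by_ip.items() if len(macs) > 1}


# Hypothetical sample data: (IP address, MAC address) pairs.
observations = [
    ("192.0.2.10", "00:A0:24:11:22:33"),
    ("192.0.2.11", "00:a0:24:44:55:66"),
    ("192.0.2.10", "00:a0:24:77:88:99"),
]
print(find_duplicate_ips(observations))
```

An IP address that appears with two different MAC addresses is only a candidate duplicate; a device whose adapter was replaced can produce the same pattern.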


Ethernet Packet Loss

Use these sections to identify and correct Ethernet packet loss:

• Ethernet Packet Loss Overview
• Searching for Packet Loss

See Ethernet Packet Loss Reference for additional conceptual and problem analysis detail.

If your Ethernet network shows signs of congestion, it may be experiencing packet loss. When your network is congested, utilization is usually high, packets are discarded because buffers are full, and collision rates are up. Problems related to Collisions are often at the heart of packet loss.

Understanding the Problem

Collisions are normal in Ethernet networks. In many cases, Collision rates of 50 percent do not cause a large decrease in throughput. The Collision rate helps mark the upper limit on your network (the maximum percentage of collisions that your network can bear), which is usually around 70 percent. If Collisions increase above this upper limit, your network can become unreliable.

When the Collision rates increase, so do Excessive Collisions, which causes a delay in transmitting data. An increase in Collisions also indicates that network utilization and network errors, such as FCS Errors, are probably increasing.
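The thresholds above can be turned into a rough check. The following Python sketch is only an illustration: the text does not define how the Collision rate is calculated, so the sketch assumes it is the number of collisions expressed as a percentage of transmitted packets, and the function names and category labels are hypothetical.

```python
def collision_rate_percent(collisions, packets):
    """Collisions as a percentage of transmitted packets (assumed definition)."""
    if packets <= 0:
        raise ValueError("packets must be positive")
    return 100.0 * collisions / packets


def assess_collision_rate(rate_percent, upper_limit=70.0):
    """Map a collision rate to a rough category using the figures in the text."""
    if rate_percent > upper_limit:
        return "above upper limit"  # the network can become unreliable
    if rate_percent >= 50.0:
        return "elevated"           # often tolerable, but worth watching
    return "normal"


rate = collision_rate_percent(collisions=300, packets=1000)
print(rate, assess_collision_rate(rate))
```

The upper limit is a parameter because the text describes 70 percent as a usual value, not a fixed one.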

The real packet problems to watch for, however, are undetected collisions that show up as Late Collisions.

Identifying the Problem

To identify that your network's problem is related to packet loss, verify that frames are being dropped on your network by examining this packet loss data:

• Alignment Errors
• Collisions
• Excessive Collisions
• FCS Errors
• CRC Errors
• Late Collisions
• Receive Discards
• Too Long Errors
• Too Short Errors
• Transmit Discards

The process of identifying the problem is discussed in Searching for Packet Loss.

Solving the Problem

If you notice that packet loss data is consistently high, then your network is too congested. In this case, segment your network with the appropriate network device (such as a switch or router). If Collision data shows increases but your network's utilization is the same, then your network may have a physical problem, such as cabling that is too long. Other problems that packet loss data can indicate include:

• Faulty connectors or improper cabling
• Excessive numbers of repeaters between network devices
• Defective Ethernet transceivers or controllers

Possible solutions to these problems are explained in the procedures in Searching for Packet Loss.

Searching for Packet Loss

When you look for packet loss, use the following applications:

• Status Watch - For Ethernet and MIB-II data collection using SNMP polling
• LANsentry Manager Network Statistics Graph - For RMON data collection using an RMON probe
• Device View - On a per-device basis, you can evaluate statistics for any port on the device

Status Watch

Status Watch monitors:

• Alignment Errors
• Excessive Collisions
• FCS Errors
• Receive Discards
• Transmit Discards

Follow these steps:

1. Determine if the thresholds for the Alignment Errors tool and FCS Errors tool are being exceeded.

Table 16 identifies the problems that this data can indicate and your possible actions. For information about problems related to a nonstandard Ethernet implementation, see Nonstandard Ethernet Problems.

Table 16 Alignment Errors, FCS Errors, and CRC Errors Data

Possible Problem: Faulty cabling
Possible Action: Examine the cable and cable connections for breaks or damage.

Possible Problem: Network noise
Possible Action: Look for improper cabling, faulty cable, faulty network equipment, or cables that are too close to equipment that emits electromagnetic interference (lamps, for example).

Possible Problem: Faulty transceiver
Possible Action: Use an analyzer to identify the problematic transceiver. If necessary, replace the transceiver, network adapter, or station.

Possible Problem: Fault at the transmitting end station
Possible Action:
1. Locate the source of the errors by looking at the module and port statistics.
2. Verify the correct operation of the transceiver or adapter card of the device that is connected to the problem port.
3. If the card appears to be operating correctly, examine the cable and cable connections for breaks or damage.

Possible Problem: Station powering up or down
Possible Action: None required.
Early implementations of Ethernet transceivers generate a significant amount of in-band noise when powering up; they frequently cause Alignment Errors and FCS Errors in an otherwise stable network.
When powering up, some software drivers for Ethernet controllers also initiate Time Domain Reflectometry (TDR) tests to test the Ethernet media. Network monitors report TDR tests as Alignment Errors and FCS Errors.

Possible Problem: Faulty adapter
Possible Action: Replace the adapter.

2. Determine if the Excessive Collisions tool threshold is being exceeded. Table 17 identifies the problems that this data can indicate and your possible actions.

Table 17 Collisions and Excessive Collisions Data

Possible Problem: Busy network
Possible Action: Use a bridge, router, or switch to reconfigure your network into segments with fewer stations.

Possible Problem: Faulty device (adapter, switch, hub, and the like) that does not listen before broadcasting. This problem increases the incidence of all types of collisions.
Possible Action: Isolate each adapter to see if the problem stops.

Possible Problem: Network loop
Possible Action: Ensure that no redundant connections to the same station have both connections active simultaneously.

3. Determine if the thresholds for the Receive Discards and Transmit Discards tools are being exceeded.

If these errors are high in conjunction with the data that you learned in steps 1 and 2, then your network is overloaded. Segment your network.

LANsentry Manager Network Statistics Graph

Use the LANsentry Manager Network Statistics graph to view data for:

• Collisions
• Late Collisions
• Bandwidth Utilization
• CRC Errors
• Too Long Errors
• Too Short Errors

Follow these steps:

1. Display a Network Statistics graph for the local Ethernet segment on which users have reported poor performance.

This graph shows the most recent trend in Collision rates. If you have set up a History sample, you can also look at the historical trend. If a number of segments are connected by repeaters, examine the graph for each Ethernet segment.

2. Analyze Utilization and Collision rates to determine whether collisions are caused by an overloaded segment or a faulty component.

• If Utilization rates are high - The collisions are probably caused by an overloaded segment. If you have added nodes or new applications to your network, consider reconfiguring the cabling system using bridges and routers to filter out remote collisions and to keep local traffic on one segment. This action should level the network load.

• If Utilization rates are stable and appear normal - The collisions are probably caused by faulty components. In this case, do the following:

  • If the network consists of repeaters - Compare the Network Statistics graphs for each segment connected to the repeater. Because repeaters repeat traffic across all connected segments (which makes many segments seem like one network), you should see similar levels of traffic on all segments. One segment that shows dissimilar levels of traffic and collisions may indicate faulty hardware. In this case, monitor several collisions to track the source station that is transmitting too soon after collisions, and repair the station. Packets that are transmitted too soon after collisions are unlikely to be valid. See Table 17 for more information about Collisions.

  • On other networks - Determine the segment cable length.

3. Examine the CRC Errors and Late Collisions, which often indicate cabling or component problems.

Table 16 identifies the problems that CRC Errors can indicate and your possible actions. Table 18 identifies the problems that Late Collisions data can indicate and your possible actions.

Table 18 Late Collisions Data

Possible Problem: Cabling problems
• Segment too long
• Failing cable
• Segment not grounded properly (noise)
• Improper termination
• Taps too close (10BASE-5 and 10BASE-2 only)
• Noisy cable
Possible Action: Correct the cabling problem by doing one or more of the following:
• Reduce the segment length
• Replace the cable
• Ground the cable
• Terminate the cable correctly
• Check the taps
• Check for cables too close to equipment that emits electromagnetic interference

Possible Problem: Component problems
• Deaf or partially deaf node
• Failing repeater, transceiver, or controller cards
Possible Action: Correct the component problem by doing one of the following:
• Trace the failing component and replace it
• Replace the NIC or the transceiver

4. Trace Too Short Errors and Too Long Errors to the sender.

These errors often indicate faulty routers or LAN drivers and transceiver problems. Table 19 identifies the problems that this data can indicate and your possible actions.

Possible Problem: A transceiver on your network is adding bits to the packets that are transmitted by the attached station.
Possible Action:
1. Use a network analyzer to identify the problematic transceiver.
2. If necessary, replace the transceiver, network adapter, or station.

Possible Problem: The jabber protection mechanism on a transceiver has failed; it can no longer protect the network from the jabbering produced by the attached station.
Possible Action: Replace the network card.

Possible Problem: Excessive noise on the cable
Note: Some 10/100 Mbps cards that autodetect the network speed may connect to the network at the wrong speed, causing excessive noise.
Possible Action: Check for improper cabling, faulty cable, faulty network equipment, or cables too close to noisy electronic equipment (lamps, for example).
If your network card autodetects the network speed and you have ruled out other problems, manually configure the network speed.

Possible Problem: Faulty routers (two different network types are connected and the router is not enforcing proper frame size restrictions)
Possible Action: Notify the manufacturer.

Possible Problem: Faulty LAN driver
Possible Action: Replace the driver.

Possible Problem: A normal condition on a LinkSwitch® 1000, LinkSwitch® 3000, or CoreBuilder 5000 FastModule
If you use maximum-sized (1518-byte) Ethernet frames, the device's VLT-enabled ports add a frame tag of 4 bytes, resulting in a misleading Too Long Frame error.
Possible Action: These frames are passed successfully but will create the Too Long Frame error message.
If you want to eliminate the error message, reduce your Ethernet packet frames by 4 bytes.

Device View

Device View allows you to display a variety of port and device-level statistics relevant to Ethernet packet loss. Table 20 describes these statistics and their use in troubleshooting.

Table 20 Activity and Error Statistics in Device View

Statistics Group: Activity
Description: Displays the total network activity and errors on the selected port.
Use in Troubleshooting: This data shows readable packets, broadcast packets, Collisions, total errors, and runts, which cause Too Short Errors. You can interpret this data in the following ways:
• The presence of runts can often be caused by Collisions; however, if the values increase at specific times of the day, it may indicate that you need to change the network topology to manage the traffic more efficiently (for example, with switches or routers).
• Runts can also be caused by a badly terminated coax cable.
• Large numbers of runts not associated with high levels of collisions can indicate a transmission problem (examine the cable).
• Particularly high numbers of Collisions compared to the total number of readable packets can point to a hardware problem (a bad adapter) or to a data loop.
• A high proportion of Broadcast Packets (>10%) on a heavily utilized network (>50% of available bandwidth) can point to an incorrectly configured bridge or router on the network.

Statistics Group: Errors
Description: Displays the number of frames with errors on the selected port.
Use in Troubleshooting: The significance of errors depends on accompanying errors and prevailing network conditions. See the following error data for more information:
• Alignment Errors: Table 16
• FCS Errors: Table 16
• Too Long Errors: Table 19
• Too Short Errors, or runts: Table 19
• Late Collisions: Table 18

To display Activity and Errors statistics for a device or port, follow these steps:

1. Select the required port or device.

2. From the shortcut menu, select Activity or Errors.

The statistics available depend on the type of port or device selected. See Table 20 for troubleshooting information.
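The broadcast guideline in Table 20 can be written as a simple check. This Python sketch is an illustration only; it reads the thresholds in the table as more than 10 percent broadcast packets on a network that is using more than 50 percent of its available bandwidth. The function name and counter names are hypothetical and do not correspond to Device View fields.

```python
def broadcast_warning(broadcast_packets, total_packets, utilization_percent):
    """Return True when broadcasts exceed 10% of packets on a segment over 50% utilized."""
    if total_packets <= 0:
        return False  # no traffic counted, nothing to evaluate
    broadcast_percent = 100.0 * broadcast_packets / total_packets
    return broadcast_percent > 10.0 and utilization_percent > 50.0


# Hypothetical counters read from a port's Activity statistics.
print(broadcast_warning(broadcast_packets=150, total_packets=1000, utilization_percent=60.0))
```

A True result is a prompt to inspect bridge and router configuration, not proof of a misconfiguration.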

Ethernet Packet Loss Reference

This section explains terms that are relevant to Ethernet packet loss and provides additional conceptual and problem analysis detail.

Alignment Errors

An Alignment Error indicates a received frame in which both of the following are true:

• The number of bits received is an uneven byte count (that is, not an integral multiple of 8).

• The frame has a Frame Check Sequence (FCS) error.

Alignment Errors often result from MAC layer packet formation problems, cabling problems that cause corrupted or lost data, and packets that pass through more than two cascaded multiport transceivers. See FCS Errors for more information about interpreting Alignment Errors.


Collisions

Collisions indicate that two devices detect that the network is idle and try to send packets at exactly the same time (within one round-trip delay). Because only one device can transmit at a time, both devices must stop sending and attempt to retransmit. Collisions are detected by the transmitting stations.

The retransmission algorithm helps to ensure that the packets do not retransmit at the same time. However, if the two devices retry at nearly the same time, packets can collide again; the process repeats until either the packets finally pass onto the network without collisions, or 16 consecutive collisions occur and the packets are discarded.
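The text does not name the retransmission algorithm. Standard Ethernet uses truncated binary exponential backoff, and the following Python sketch assumes that this is the algorithm meant: after each collision a station waits a random number of slot times, and the range of possible waits doubles with each consecutive collision (up to the 10th) until the frame is dropped after 16 failed attempts. The function names and the `collides` callback are hypothetical and exist only for illustration.

```python
import random

MAX_ATTEMPTS = 16   # after 16 consecutive collisions the frame is discarded
BACKOFF_LIMIT = 10  # the backoff range stops doubling after the 10th collision


def backoff_slots(collision_count, rng=random):
    """Pick a random wait, in slot times, after the given consecutive collision."""
    if not 1 <= collision_count < MAX_ATTEMPTS:
        raise ValueError("collision_count must be between 1 and 15")
    exponent = min(collision_count, BACKOFF_LIMIT)
    return rng.randrange(2 ** exponent)  # uniform in 0 .. 2**exponent - 1


def send(collides, rng=random):
    """Try to send one frame; collides(attempt) says whether that attempt collided.

    Returns (delivered, total_slots_waited).
    """
    waited = 0
    for attempt in range(1, MAX_ATTEMPTS + 1):
        if not collides(attempt):
            return True, waited
        if attempt < MAX_ATTEMPTS:
            waited += backoff_slots(attempt, rng)
    return False, waited  # counted as an excessive collision; the frame is dropped


# A frame that collides on its first two attempts and then gets through.
print(send(lambda attempt: attempt < 3))
```

Because each station picks its wait independently, two stations that collided are unlikely to pick the same wait repeatedly, which is why 16 consecutive collisions usually points to congestion or a fault.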

CRC Errors

A Cyclic Redundancy Check (CRC) Error is an RMON statistic that combines FCS Errors and Alignment Errors. These errors indicate that packets were received with:

• A bad FCS and an integral number of octets (FCS Errors)
• A bad FCS and a non-integral number of octets (Alignment Errors)

CRC Errors can cause an end station to freeze. If a large number of CRC Errors are attributed to a single station on the network, replace the station's network interface board. Typically, a CRC Error rate of more than 1 percent of network traffic is considered excessive.

Excessive Collisions

Excessive Collisions indicate that 16 consecutive collisions have occurred, usually a sign that the network is becoming congested. For each excessive collision count (that is, after 16 consecutive collisions), a packet is dropped. If you know the normal rate of excessive collisions, then you can determine when the rate of packet loss is affecting your network's performance. See Knowing Your Network's Configuration for more information.

FCS Errors

Frame Check Sequence (FCS) Errors, a type of CRC Error, indicate that frames received by an interface are an integral number of octets long but do not pass the FCS check. The FCS is a mathematical way to ensure that all the frame's bits are correct without having the system examine each bit and compare it to the original. Packets with Alignment Errors also generate FCS Errors.

Both Alignment Errors and FCS Errors can be caused by equipment powering up or down or by interference (noise) on unshielded twisted-pair (10BASE-T) segments. In a network that complies with the Ethernet standard, FCS or Alignment Errors indicate bit errors during a transmission or reception. A very low rate is acceptable. Although Ethernet allows a 1 in 10^8 bit error rate, typical Ethernet performance is 1 in 10^12 or better.
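To relate a bit error rate to the frame errors that you would actually see, the following Python sketch reads the rates above as 1 in 10^8 and 1 in 10^12 and assumes that bit errors are independent of each other, which real noise often is not. Under that assumption, a frame of n bits is damaged with probability 1 - (1 - BER)^n. The function name is hypothetical.

```python
import math


def frame_error_probability(bit_error_rate, frame_octets=1518):
    """Probability that at least one bit of a frame is corrupted (independent bit errors)."""
    bits = frame_octets * 8
    # expm1/log1p keep precision when the bit error rate is very small.
    return -math.expm1(bits * math.log1p(-bit_error_rate))


for ber in (1e-8, 1e-12):
    p = frame_error_probability(ber)
    print(f"BER {ber:g}: about 1 damaged frame in {1 / p:,.0f} maximum-sized frames")
```

At the allowed limit of 1 in 10^8, roughly one maximum-sized frame in eight thousand would fail the FCS check, far below the 1 percent CRC Error rate that this reference describes as excessive.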

Late Collisions

Late Collisions indicate that two devices have transmitted at the same time, but cabling errors (most commonly, excessive network segment length or repeaters between devices) prevent either transmitting device from detecting a collision. Neither device detects a collision because the time to propagate the signal from one end of the network to the other is longer than the time to put the entire packet on the network. As a result, neither of the devices that cause the late collision senses the other's transmission until the entire packet is on the network.

Although late collisions occur for small packets, the transmitter cannot detect them. As a result, a network suffering measurable Late Collisions for large packets is losing small packets as well.
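The timing argument above can be sketched in Python. The model is a simplification with stated assumptions: a transmitter can notice a collision only while it is still sending, a collision noticed after the first 512 bit times (one minimum-sized, 64-octet frame) counts as late, and the round-trip delay values used here are hypothetical rather than measured.

```python
SLOT_TIME_BITS = 512  # bit times needed to send a minimum-sized (64-octet) frame


def transmit_time_us(frame_octets, mbps=10.0):
    """Time to put a whole frame on the wire, in microseconds."""
    return frame_octets * 8 / mbps


def classify_collision(frame_octets, round_trip_delay_us, mbps=10.0):
    """Classify how the transmitter experiences a collision on this path."""
    slot_time_us = SLOT_TIME_BITS / mbps
    if round_trip_delay_us <= slot_time_us:
        return "collision"        # noticed in time; normal retransmission follows
    if round_trip_delay_us < transmit_time_us(frame_octets, mbps):
        return "late collision"   # noticed, but only after the slot time has passed
    return "undetected"           # frame already sent; the loss goes unnoticed


# Hypothetical over-long segment with a 60 microsecond round-trip delay at 10 Mbps.
for octets in (64, 1518):
    print(octets, classify_collision(octets, round_trip_delay_us=60.0))
```

On the over-long path, the large frame reports a late collision while the small frame is lost silently, which is the relationship the paragraph above describes.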

Nonstandard Ethernet Problems

Table 21 lists the symptoms that typically occur if a system violates the Ethernet standard.

Table 21 Symptoms of Common Ethernet Network Problems

Symptoms: FCS Errors and Alignment Errors increase significantly.
Problem: Network cabling is too long.
Notes: If you use a promiscuous network monitor, the number of Late Collisions reported by stations should correlate with the FCS and Alignment Errors reported by the monitor.

Symptoms: FCS and Alignment Errors increase proportionally with interference (sometimes referred to as noise hits).
Problem: Network segment is noisy.
Notes: Typically observed on a 10BASE-T network segment in a noisy environment. If you use multiple promiscuous monitors, the FCS and Alignment Errors among the monitors will not correlate.
If the monitor can track runts, also called Too Short Errors, the number of runt packets should be significantly higher than normal.

Symptoms: FCS and Alignment Errors are much higher than normal.
Problem: Networks do not conform to the access scheme of Carrier Sense Multiple Access with Collision Detect (CSMA/CD).
Notes: Occurs when some implementations of Ethernet in the segment are not entirely compatible with IEEE 802.3 repeaters.
Collision fragments linger on the network long enough to collide with retry packets at the minimum interpacket gap (IPG). The IPG is smaller on one side of the repeated network, causing a lost packet.
Ethernet controllers cannot receive packets that are separated by 4.7 µs or less. Some controllers cannot sustain receptions of packets separated by as much as 9.6 µs. If runt packets are received one after another and are followed by a collision fragment, Ethernet controllers that cannot sustain reception will lose packets.

Receive Discards

Receive Discards indicate that received packets could not be delivered to a high-layer protocol because of congestion or packet errors.

Too Long Errors

A Too Long Error indicates that a packet is longer than 1518 octets (including FCS octets) but otherwise well formed. Too Long Errors are often caused by a bad transceiver, a malfunction of the jabber protection mechanism on a transceiver, or excessive noise on the cable.

Too Short Errors

A Too Short Error, also called a runt, indicates that a packet is fewer than 64 octets long (including FCS octets) but otherwise well formed.
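The frame error definitions in this reference section can be combined into a small classifier. The following Python sketch is a reading of those definitions, not the logic of any monitoring tool: it assumes that the number of received bits and the result of the FCS check are already known, and it treats any frame with a bad FCS as an FCS Error or Alignment Error regardless of length, a case the definitions do not cover. The function name is hypothetical.

```python
MIN_OCTETS = 64    # shortest valid frame, including FCS octets
MAX_OCTETS = 1518  # longest valid frame, including FCS octets


def classify_frame(bit_count, fcs_ok):
    """Name the error category for a received frame, or "OK" if it is valid."""
    whole_octets, extra_bits = divmod(bit_count, 8)
    if not fcs_ok:
        # Both categories below are counted together as CRC Errors by RMON.
        return "Alignment Error" if extra_bits else "FCS Error"
    if whole_octets > MAX_OCTETS:
        return "Too Long Error"
    if whole_octets < MIN_OCTETS:
        return "Too Short Error"  # a runt
    return "OK"


for bits, fcs_ok in [(64 * 8, True), (63 * 8, True), (1522 * 8, True), (100 * 8 + 3, False)]:
    print(bits, fcs_ok, classify_frame(bits, fcs_ok))
```

The 1522-octet case corresponds to a maximum-sized frame carrying a 4-byte tag, which Table 19 describes as a normal but misleading Too Long condition on some devices.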

Transmit Discards

Transmit Discards indicate that packets were not transmitted because of network congestion.

FDDI Ring Errors

Use these sections to identify and correct FDDI ring errors:

• FDDI Ring Errors Overview
• Identifying Ring Errors

Fiber Distributed Data Interface (FDDI) often corrects its own problems. However, because FDDI cannot correct all errors (especially those related to hardware problems), you should monitor FDDI errors.

Understanding the Problem

FDDI ring errors that you should monitor include:

• Elasticity Buffer Error Condition
• Frame Error Condition
• Frames Not Copied Condition
• Link Error Condition

Identifying the Problem

First, determine the type of FDDI ring errors and where they are occurring. Similar to the way you identify other FDDI problems, identify the upstream and downstream neighbors of the devices that you are monitoring.

Several types of network errors can cause FDDI performance problems. For example, problems with cables or physical connections may result in a link or frame error. Elasticity buffer (EB) errors can also lead to link and frame errors.

FDDI deals with port-related errors as follows:

• The variable PORTlerAlarm is the link error rate (LER) value at which a link connection generates an alarm. When the LER is greater than the alarm setting, Station Management (SMT) sends a Status Report Frame (SRF) to notify you that there is a problem with a port.

The PORTlerAlarm threshold is set lower than the PORTlerCutoff threshold so that you are notified of a problem before the port is actually removed from the ring.

• When link errors reach the threshold defined by the variable PORTlerCutoff, SMT breaks the connection, disabling the PHY that detected the problem. A Link Error Condition is also generated.
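The two-threshold behavior described above can be sketched as follows. This Python example is only an illustration of the alarm-before-cutoff idea: it represents the link error rate and both thresholds as plain error rates (errors per bit), the sample threshold values are hypothetical, and the function is not part of any SMT implementation.

```python
def port_ler_action(ler, alarm, cutoff):
    """Decide what happens to a port at the given link error rate.

    alarm and cutoff stand in for the PORTlerAlarm and PORTlerCutoff settings.
    """
    if not alarm < cutoff:
        raise ValueError("the alarm threshold must be lower than the cutoff threshold")
    if ler >= cutoff:
        return "cutoff"  # the connection is broken and a Link Error Condition is generated
    if ler > alarm:
        return "alarm"   # a Status Report Frame is sent, but the port stays in the ring
    return "ok"


# Hypothetical settings: alarm at 1 error in 10^8 bits, cutoff at 1 in 10^7.
for ler in (1e-10, 5e-8, 2e-7):
    print(ler, port_ler_action(ler, alarm=1e-8, cutoff=1e-7))
```

Keeping the alarm threshold below the cutoff threshold leaves a window in which you are warned while the port is still operating.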

FDDI deals with MAC-related errors as follows

• When MAC frame errors reach a certain threshold, a Frame Error Condition is generated. Because the actual error can be further upstream than the immediate connection, the connection remains intact.

• For a large network, the worst case MACFrameErrorRatio is less than 0.1 percent. However, during network configuration, frame error ratios can reach 50 percent for short periods. When you detect a sustained frame error ratio of more than 0.1 percent, a problem exists between the station that is reporting the condition and the nearest upstream MAC.

Solving the Problem

To solve problems related to FDDI errors, fix the hardware, cabling, or congestion problem.

Identifying Ring Errors


Use Status Watch to monitor your FDDI devices for Warning or Critical alerts.

Status Watch

Use Status Watch to identify FDDI ring errors.

Follow these steps:

1. Monitor the FDDI Status tool for the currently selected device.

2. Determine whether Status Watch is reporting Elasticity Buffer Errors or a high percentage of Frame Errors, Frames Not Copied, or Link Error Rates for the currently selected device.

FDDI Ring Errors Reference

This section provides additional conceptual and problem analysis detail.

Elasticity Buffer Error Condition

The Elasticity Buffer Error condition occurs when a port's elasticity buffer overflows or underflows. This condition usually indicates that a port's hardware is not operating within the tolerances that the FDDI standard specifies. Look for the problem in the hardware of either the port that is reporting the condition or of the immediately adjacent port.

Frame Error Condition

The Frame Error condition occurs when the percentage of frames that contain errors exceeds a preset threshold. When a device is an uplink to FDDI (that is, a device is transmitting onto FDDI), this type of condition indicates that the ring is saturated. The ring is out of buffer space, and packets are being dropped from the device's backbone port.

The problem indicated by the frame errors is usually located between the MAC that reports the condition and its upstream neighbor. Because many physical connections can lie along this path, the MACFrameErrorRatio variable identifies only the two MACs between which the problem is occurring.

Frames Not Copied Condition

The Frames Not Copied condition occurs when the percentage of frames that are dropped because of insufficient buffer space exceeds a preset threshold. This condition indicates that the station is congested and is unable to process frames as quickly as they arrive. To help eliminate congestion:

• Add more capacity to the station
• Reconfigure your network so that end stations that communicate heavily with one another are on the same bridge or switch
• Filter out certain traffic

Link Error Condition

The Link Error condition occurs when a port detects link errors at a rate that exceeds a preset threshold. When the Link Error threshold is exceeded, the station removes itself from the ring and tries to reinsert itself on the ring. This action creates a MAC Neighbor Change Event (which also occurs if a ring wraps).

Link errors may indicate an FDDI PHY hardware problem (such as a faulty transmitter) or a faulty cable or connector. Look for the problem in the portion of the network between the port that is reporting the condition and the first upstream transmitter.

MAC Neighbor Change Event

The MAC Neighbor Change event occurs when a MAC's upstream or downstream neighbor changes.

This event indicates either:

• A network reconfiguration
• Another station that is leaving or joining the ring

Network File Server Timeouts

Use these sections to identify and correct timeouts on network file servers:

• Network File Server Timeout Overview
• Looking for Obvious Errors
• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

See Network File Server Timeouts Reference for additional conceptual and problem analysis detail.

A network file server can time out if your network gets congested or if your server is having problems. Users might have problems downloading data from or to the server, or copying files from or to the server. To help you understand the troubleshooting process for this type of problem, an EXAMPLE throughout this section follows the symptoms, analysis, and resolution of a typical file server timeout problem.


Understanding the Problem

When users log in, their stations make network file server calls either to determine quotas (if this feature has been enabled) or to mount user home directories. The network file server timeout messages, even when spread across multiple nodes, indicate a problem either with the network or with a server.

EXAMPLE: UNIX users notice that it takes a long time (over 30 seconds in some cases) to log in to any machine. Some machines report network file server timeout messages, but the messages have no obvious pattern and are infrequent. You begin to get a sense of the problem.

Identifying the Problem

First, rule out the obvious causes. Ask these questions:

• Can you access the network file server with Telnet?
• Have any alarms been triggered?
• Are there any new errors?

The process of identifying the problem is developed in Looking for Obvious Errors.

Solving the Problem

To determine the cause, reproduce the fault while you monitor the network. After you know the cause, you can fix the problem.

The solutions to the network file server timeout are identified in these sections:

• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

Looking for Obvious Errors

To look for obvious errors, use these applications:

• Ping and Telnet - To check connectivity to the network file server nodes
• LANsentry Manager Alarms View - To search for triggered alarms
• LANsentry Manager Statistics View - To look for errors
• LANsentry Manager History View - To identify trends


Ping and Telnet

Determine whether you can contact network file server nodes using Ping and Telnet. If the response is extremely slow, then a problem may exist with the connections to the nodes. No delay indicates that the connections are normal, implying that the delay is occurring elsewhere. In this case, use LANsentry® Manager tools to determine whether packets are being lost or ignored.
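If you want to script the same check rather than run Telnet by hand, a minimal Python sketch follows. It only measures how long a TCP connection to the server takes to open, which is a rough stand-in for a manual Telnet test; the host name in the usage comment is hypothetical, and port 23 (Telnet) is an assumption that you should replace with a port your file server actually listens on.

```python
import socket
import time


def connect_time(host, port=23, timeout=5.0):
    """Return the seconds taken to open a TCP connection, or None if it fails."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return None


# Example use (hypothetical host name):
#   elapsed = connect_time("fileserver.example.com")
#   print("unreachable" if elapsed is None else f"connected in {elapsed:.3f} s")
```

A quick connection suggests that the path to the server is healthy and that the delay lies elsewhere, as the paragraph above notes.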

LANsentry Manager Alarms View

Using the LANsentry Manager Alarms View, you can determine if any configured alarms have been triggered.

Search the Alarms View to see if any MAC events have been logged.

EXAMPLE: MAC events have not been logged for the network to which the UNIX users are attached.

Even though no alarms have occurred, errors may exist. For example, a low rate of background errors may exist just below the alarm threshold. Because RMON alarms are based on maximum and minimum values, they may miss constant, periodic, or low amounts of errors.
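
The point about thresholds can be illustrated with a minimal Python sketch. It is not LANsentry Manager's logic; the function names, the threshold of 10, the baseline of 1.0, and the sample counts are all assumptions chosen for illustration. It shows how a steady error rate can stay under a rising threshold while still being well above the segment's usual level.

```python
from statistics import mean

def alarm_fired(samples, rising_threshold):
    """Return True if any sample reaches the rising threshold."""
    return any(s >= rising_threshold for s in samples)

def elevated_vs_baseline(samples, baseline_mean, factor=2.0):
    """Return True if the average error count exceeds the baseline by `factor`."""
    return mean(samples) > baseline_mean * factor

# Hypothetical error counts per sampling interval on one segment.
errors = [4, 5, 4, 5, 4, 5]

print(alarm_fired(errors, rising_threshold=10))         # threshold never reached
print(elevated_vs_baseline(errors, baseline_mean=1.0))  # yet well above baseline
```

Comparing against a baseline, as the Statistics and History views allow, catches the constant low-level errors that a threshold alarm alone would not report.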

LANsentry Manager Statistics View

Using the LANsentry Manager Statistics View, you can display a multisegment graph of utilization and error statistics.

Follow these steps:

1. Set up a graph that shows utilization and errors on all your major segments.

2. Determine whether any segments are particularly busy or error prone.

EXAMPLE: You notice that one segment of the UNIX network, HUB3, is reporting Too Long Errors and FCS Errors roughly every second sample. While the amount is not higher than normal, it is currently higher than on any other segment.

LANsentry Manager History View

Using the LANsentry Manager History View, you can display a rolling history table to determine if the errors that you are seeing are new. For example, if you have a history table that runs for 30-minute samples over two days, you can compare the most recent sample to a previous sample, looking for new errors. If your probe has the resources, use a much finer resolution sample stored for a shorter time (every 30 seconds for 2 hours) to more easily spot recent errors.

EXAMPLE: You see that the history table shows that no error rates remained constant throughout the day. However, errors that did occur were on the device HUB3 and were Too Long Errors and FCS Errors.

Reproducing the Fault While Monitoring the Network

Although the RMON View in LANsentry Manager can show error rates and help you to identify the location of the problem, it may not provide enough data to solve the problem. To determine the cause of the problem, reproduce it while you monitor the network by using these applications:

• LANsentry Manager Top-N Graph - To locate a quiet node to use for reproducing the fault

• LANsentry Manager Packet Capture - To capture packets from the hub to which the quiet node is attached

• LANsentry Manager Packet Decode - To analyze the packets to assess network file server traffic and delays

• Address Tracker - To find the location of the problem nodes

EXAMPLE: Using LANsentry Manager, you find a hub on the network with a higher than normal error rate. However, the error rate does not seem high enough to cause login delays of 60 seconds or more.

LANsentry Manager Top-N Graph

Using the Top-N graph in the LANsentry Manager main window, locate a quiet node that has been showing the same problem. Choose a quiet node so that you do not receive excessive traffic when you try to isolate the problem.

EXAMPLE: You see that the node Monolith, which has the same Network File System (NFS) mounts as the other nodes on the network, is quiet. You decide to use this node for reproducing the fault. See Network File System (NFS) Protocol for more information about NFS.

LANsentry Manager Packet Capture

Using the LANsentry Manager Packet Capture application, capture packets from the network using predefined patterns and start-and-stop conditions.

Follow these steps:

1. Set up a capture buffer on a probe that is connected to the same hub as the quiet node. Until you know more about the problem, set a very general filter.

EXAMPLE: You select a MAC-layer filter and set a conversation filter to capture all packets to and from Monolith.

2. Telnet into and log out from the quiet node. Then reset the capture buffer. Repeat this procedure until you see the problem reflected in your captured data. To keep the buffer information clean, reset the buffer each time that you repeat the procedure.

3. When you see the delay, note the rough value of the packet count on the LANsentry Manager packet buffer.

By noting the packet count at which you think the delay has occurred, you can narrow the problem to within about 20 packets in the buffer. If you have used an extremely quiet node, you may even identify the exact packet.

LANsentry Manager Packet Decode

The LANsentry Manager Packet Decode application decodes all major protocols and displays the packet contents at three levels of detail: summary information, header information, and actual packet content.

Follow these steps:

1. Open the buffer in the Packet Decode application and locate the number of the packet at which the delay occurred.

2. Select the packet and launch a MAC-layer conversation filter. In the filter display, look for a gap in the conversation (that is, where the node sent a request and then resent it at approximately the same rate as the delay you experienced when recreating the problem).

3. Repeat the test to determine if the result concentrates on one node or if it appears on other nodes.

EXAMPLE: On the quiet node that you selected, the delay is obvious. You see an NFS request going out to a node and a repeat of the request 30 seconds later. During that time, the node did not respond. You now know that the delay occurred because nodes were not seeing responses for NFS requests. When you repeat the test on other nodes, you find that the delay is happening with more than one destination node.
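
The "gap in the conversation" check can be sketched in Python. This is only an illustration of the idea, not how Packet Decode works: the tuple layout, the `find_retry_gaps` name, the request identifiers, the node name `server1`, and the 5-second minimum gap are all assumptions.

```python
def find_retry_gaps(packets, min_gap=5.0):
    """Find requests that were resent after at least `min_gap` seconds.

    `packets` is an iterable of (timestamp_seconds, source, destination,
    request_id) tuples in capture order. Returns a list of
    (source, destination, request_id, gap_seconds) tuples.
    """
    last_seen = {}
    gaps = []
    for ts, src, dst, req_id in packets:
        key = (src, dst, req_id)
        if key in last_seen:
            gap = ts - last_seen[key]
            if gap >= min_gap:
                gaps.append((src, dst, req_id, gap))
        last_seen[key] = ts
    return gaps

# Hypothetical capture: request 101 gets no reply and is resent 30 s later.
capture = [
    (0.0, "monolith", "server1", 101),
    (30.0, "monolith", "server1", 101),
    (31.0, "monolith", "server1", 102),
]

print(find_retry_gaps(capture))
```

A repeated request with a gap close to the login delay you observed points to a lost or ignored reply rather than a slow server.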


Address Tracker

Use Address Tracker, which polls managed devices, to determine the hubs to which the problem nodes are attached. If the problem end stations are located on unmanaged devices, then you can at least narrow the problem to those unmanaged devices.

EXAMPLE: Although your network does not have managed hubs that Transcend® NCS management software can poll, it does have managed switches. When it polls the switches, Address Tracker displays the switch ports on which addresses were last seen. This information indicates the hub (but not the hub port) on which the device is located.

LANsentry Manager Packet Decode

After you know the location of the hub that has the problem node, monitor the problem from the hub using LANsentry Manager Packet Decode.

Follow these steps:

1. To capture packets from one of the nodes on the hub, set up another capture buffer and repeat the exercise that is described in LANsentry Manager Packet Capture. Because a delay may occur on a different node, use two capture buffers without stopping the first one.

Note the rough packet count where the delay appears.

2. Display a conversation filter of the packet where the delay appears and look for the gap in the conversation.

EXAMPLE: You hope that the nodes are on the same hub. You find that all the nodes are on HUB3. This result indicates that FCS Errors may be causing the timeouts. However, because the errors occur at a low rate, you decide to verify this diagnosis. You monitor the problem from the hub, logging in and out many times, and the delay eventually occurs. This time, the delay shows that the node's reply had an FCS Error even though the node received the request. The switch would not have transmitted this packet, causing a timeout on the NFS protocol. The retry time is presumably 30 seconds. During this test, you see the problem occurring on another node.

Correcting the Fault


Without a managed hub, you may find it very difficult to discover network file server timeout errors. To find the problematic node, you must either systematically isolate nodes by monitoring each node for a prolonged period or temporarily insert a managed hub.

EXAMPLE: You notice that the captured error packet failed FCS because it was corrupted by a regular pattern during transmission. A possible reason for this occurrence is a Jabbering node. This explanation makes sense because FCS/Jabber frames increased linearly when you were monitoring the live network.
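
As a rough illustration of why a frame overwritten by another transmitter fails its frame check sequence, the following Python sketch recomputes a CRC-32 over a payload before and after corruption. It uses `zlib.crc32`, which implements the same CRC-32 polynomial as the Ethernet FCS, but it ignores on-the-wire bit ordering and frame layout, so treat it as a simplified model. The payload and the overwrite pattern are made up.

```python
import zlib

def fcs(payload: bytes) -> int:
    """CRC-32 of the payload (same polynomial as the Ethernet FCS)."""
    return zlib.crc32(payload) & 0xFFFFFFFF

def frame_is_intact(payload: bytes, received_fcs: int) -> bool:
    """Return True if the recomputed checksum matches the one that was sent."""
    return fcs(payload) == received_fcs

original = b"NFS reply payload"
sent_fcs = fcs(original)

# Simulate a second node writing a repeating pattern over part of the frame.
corrupted = original[:4] + b"\xaa\x55\xaa\x55" + original[8:]

print(frame_is_intact(original, sent_fcs))   # checksum matches
print(frame_is_intact(corrupted, sent_fcs))  # checksum mismatch, frame dropped
```

A receiver or switch that sees the mismatch discards the frame, which is why the NFS client in the example never receives its reply.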

Network File Server Timeouts Reference

This section explains terms that are relevant to network file server timeouts and provides additional conceptual and problem analysis detail.

Jabbering

Jabbering occurs when a node transmits illegal-length packets and is possibly not operating within carrier specifications. In effect, another node has written bad data over a valid packet. This bad data is often interpreted as a repeated sequence of data.

Network File System (NFS) Protocol

A distributed file system protocol developed by Sun Microsystems that allows a computer system to access files over a network as if they were on its local disks. This protocol has been incorporated into products by more than 200 companies. It is now a de facto Internet standard.

NFS is one protocol in the NFS suite of protocols, which includes NFS, RPC, XDR (External Data Representation), and others. These protocols are part of a larger architecture that Sun Microsystems refers to as Open Network Computing (ONC). ONC is a distributed applications architecture designed by Sun and currently controlled by a consortium led by Sun.

Troubleshooting TCP/IP - Detailed Steps

This article shows how to troubleshoot TCP/IP connectivity between computers on a Windows network. If you haven't already done so, disable XP's Internet Connection Firewall on all local area network connections and remove all firewall programs on the network. Improperly configured firewalls are the most common cause of TCP/IP problems.


Open a Command Prompt Window

For many of these steps, you'll be typing at the command prompt. To open a command prompt window in Windows 2000 or XP, click Start | Run, type cmd in the box, and click OK. To open a command prompt window in Windows 95, 98, or Me, click Start | Run, type command in the box, and click OK. Type one command per line and press Enter after each one to execute it. To close the command prompt window, use the exit command.

Determine the TCP/IP Settings

Determine the TCP/IP settings of each computer on the local area network. In XP, open the Network Connections folder, right-click the LAN connection, and click Status | Support | Details. For example, here are the Status and Details views for the LAN connection on an Internet Connection Sharing host.

In Windows 95/98/Me, click Start | Run, type winipcfg in the box, and click OK. Select the LAN adapter from the menu and click More Info. Here's the winipcfg view for an ICS client running Windows Me.

You can also see the TCP/IP settings from the command prompt. This is especially convenient if a computer has more than one network adapter. Use the ipconfig /all command, which is available in all versions except Windows 95. The output from this command can be long, so it's best to write it to a file. Specify the file name in the command this way:

ipconfig /all >ipconfig.txt

Description of TCP/IP Settings

Here are the TCP/IP settings that are used in network troubleshooting:

• IP Address - Unique address assigned to a network adapter. A computer with multiple network adapters has an IP address for each one, and each one must be in a different subnet.

• Subnet Mask - Used in conjunction with the IP address to determine which subnet an adapter belongs to. At the simplest level, communication is only possible between two network adapters when they're in the same subnet.

• Default Gateway - IP address of a computer or router, on one of this computer's local area networks, that knows how to communicate with subnets not present on this computer. For an Internet connection, the default gateway is a router belonging to your Internet service provider, and all access to sites on the Internet goes through it. For an ICS client, the default gateway is the ICS host. If you use a hardware router, it serves as the default gateway.

• DHCP Server - If an adapter is configured to obtain an IP address automatically, this is the address of the server that provides it. It could be your ISP, an ICS host, or a hardware router.

• DNS Servers - IP address of one or more Domain Name Server computers. DNS servers translate Internet names to their IP addresses (like 63.146.109.227).
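
To make the relationship between IP address and subnet mask concrete, here is a small Python sketch using the standard `ipaddress` module. The addresses and the 255.255.255.0 mask are illustrative assumptions, and the `same_subnet` helper is a made-up name; substitute the values reported for your own adapters.

```python
import ipaddress

def same_subnet(ip_a: str, ip_b: str, mask: str) -> bool:
    """Return True if both addresses fall in the same subnet under `mask`."""
    net_a = ipaddress.ip_interface(f"{ip_a}/{mask}").network
    net_b = ipaddress.ip_interface(f"{ip_b}/{mask}").network
    return net_a == net_b

# Two adapters on the same LAN, then one on a different subnet.
print(same_subnet("192.168.1.101", "192.168.1.123", "255.255.255.0"))
print(same_subnet("192.168.1.101", "192.168.2.5", "255.255.255.0"))
```

When the function returns False for two computers that should talk directly, traffic between them has to go through the default gateway.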

Subnets

See our article on subnets for a brief description of how they work. For more details, see this Microsoft Knowledge Base article.

If two computers are supposed to be on the same subnet but aren't, something is wrong with the network hardware or software configuration. This is most likely to happen when one of them receives an IP address of 169.254.x.x, which indicates that:

• It's configured to obtain an IP address automatically.

• It couldn't find a DHCP server on the network to make the assignment.

• Windows assigned it an Automatic Private IP Address.
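
A quick way to spot an Automatic Private IP Address in a list of collected settings is to test for the 169.254.0.0/16 range. The sketch below uses Python's standard `ipaddress` module; the helper name `is_apipa` and the sample addresses are assumptions for illustration.

```python
import ipaddress

APIPA_RANGE = ipaddress.ip_network("169.254.0.0/16")

def is_apipa(address: str) -> bool:
    """Return True if the address is in the 169.254.x.x automatic range."""
    return ipaddress.ip_address(address) in APIPA_RANGE

for addr in ["192.168.1.101", "169.254.10.20"]:
    print(addr, is_apipa(addr))
```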

See our article on Specific Networking Problems and Their Solutions for more information.

Pinging

The ping command is the basic tool for testing TCP/IP connectivity. It sends a special packet (called ICMP Echo) to a particular IP address and looks for a reply. If everything is working right, the reply comes back. If not, the ping times out in a few seconds. By default, the ping command repeats the process four times. Here's an example of an ICS client computer pinging a Windows XP Home Edition ICS host, using the host's IP address and its computer name.

When ping fails, you'll see one of these error messages:

• Request timed out - The IP address is valid, but there's no reply from it. If the IP address is on a local area network, the most likely cause is a firewall program blocking the ping.

• Unknown host <name> or Ping request could not find host <name> - The computer name doesn't exist on the local area network. Make sure that NetBIOS over TCP/IP is enabled.

• Destination host unreachable - The IP address isn't on a local area network, and the default gateway can't access it. Either there's no default gateway, its address is wrong, or it isn't functioning.


Pinging the Local Area Network

Here is a series of ping commands to use in finding where a problem occurs on a local area network. Run them in the order shown, and don't go on to the next command until all of the previous commands work properly. In this example:

• The computer being tested is named Winxp, with IP address 192.168.1.101.

• There's another computer on the network named Win98, with IP address 192.168.1.123.

Substitute the appropriate IP addresses and computer names for your network.

Command              Target                          What Ping Failure Indicates
ping 127.0.0.1       Loopback address                Corrupted TCP/IP installation
ping localhost       Loopback name                   Corrupted TCP/IP installation
ping 192.168.1.101   This computer's IP address      Corrupted TCP/IP installation
ping winxp           This computer's name            Corrupted TCP/IP installation
ping 192.168.1.123   Another computer's IP address   Bad hardware or NIC driver
ping win98           Another computer's name         NetBIOS name resolution failure

To fix a corrupted TCP/IP installation on Windows XP, follow the steps in this Microsoft Knowledge Base article. For Windows 95/98/Me, uninstall the TCP/IP protocol in Control Panel | Network, reboot, and reinstall it. If that doesn't fix it, use this procedure on Windows 95 or 98.
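
The ordered ping sequence for the local area network can be sketched as a small Python routine that stops at the first failing target and reports what that failure suggests. This is an illustrative sketch only: the `ping` argument is a hypothetical callable that you would supply (for example, a wrapper around the system ping command), and the addresses and names are the example values used in this section.

```python
from typing import Callable, Optional, Tuple

# (target, what a failure indicates), in the order the commands should be run.
LAN_CHECKS = [
    ("127.0.0.1", "Corrupted TCP/IP installation"),
    ("localhost", "Corrupted TCP/IP installation"),
    ("192.168.1.101", "Corrupted TCP/IP installation"),
    ("winxp", "Corrupted TCP/IP installation"),
    ("192.168.1.123", "Bad hardware or NIC driver"),
    ("win98", "NetBIOS name resolution failure"),
]

def first_failure(
    ping: Callable[[str], bool], checks=LAN_CHECKS
) -> Optional[Tuple[str, str]]:
    """Run the checks in order; return the first failing (target, meaning)."""
    for target, meaning in checks:
        if not ping(target):
            return target, meaning
    return None  # every ping succeeded

# Stand-in for a real ping: everything answers except the name "win98".
def fake_ping(target: str) -> bool:
    return target != "win98"

print(first_failure(fake_ping))
```

Stopping at the first failure mirrors the advice not to move on to the next command until all previous commands work.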

Pinging the Internet

You can also use ping to find a problem with Internet access. Run these commands in the order shown, and don't go on to the next command until all of the previous commands work properly. Use the Default Gateway and DNS Server addresses that you got from the winipcfg or ipconfig /all command.


Command                  Target                What Ping Failure Indicates
ping w.x.y.z             Default Gateway       Default Gateway down
ping w.x.y.z             DNS Server            DNS Server down
ping w.x.y.z             Web site IP address   Internet service provider or web site down
ping www.something.com   Web site name         DNS Server down or web site down

REFERENCE

WWW.PRACTICALLYNETWORK.COM

WWW.COMPUTERHOPE.COM

WWW.WIKIPEDIA.ORG


THANK YOU


  • Basic Network Troubleshooting
  • Basic network troubleshooting
    • Issue
    • Cause
    • Solution
      • Verify connections LEDs
      • Adapter resources
      • Adapter functionality
      • Protocol
      • Firewall
      • Additional time
      • Additional troubleshooting
        • Network Troubleshooting Overview
          • About Connectivity Problems
          • About Performance Problems
          • Solving Connectivity and Performance Problems
          • Recognizing Symptoms
            • User Comments
            • Network Management Software Alerts
            • Analyzing Symptoms
              • Understanding the Problem
              • Identifying and Testing the Cause of the Problem
                • Sample Problem Analysis
                • Equipment for Testing
                  • Solving the Problem
                    • Your Network Troubleshooting Toolbox
                      • Transcend Central
                      • Status Watch
                        • Web Reporter
                          • Address Tracker
                          • LANsentry Manager
                          • Traffix Manager
                          • Device View
                          • Ping
                            • Strategies for Using Ping
                            • Tips on Interpreting Ping Messages
                              • Telnet
                              • FTP and TFTP
                              • Analyzers
                              • Probes
                              • Cable Testers
                              • Knowing Your Networks Configuration
                                • Site Network Map
                                • Device Configuration Information
                                • Other Important Data About Your Network
                                  • Identifying Your Networks Normal Behavior
                                    • Baselining Your Network
                                    • Identifying Background Noise
                                    • Identifying Background Noise
                                        • FDDI Connectivity
                                          • Understanding the Problem
                                          • Identifying the Problem
                                          • Solving the Problem
                                          • Status Watch
                                          • Implementing Dual Homing
                                          • Installing an Optical Bypass Unit
                                          • Peer Wrap Condition
                                          • Twisted Ring Condition
                                          • Undesired Connection Attempt Event
                                          • Address Tracker
                                          • Device View
                                          • Broadcast Packets
                                          • Multicast Packets
                                            • Duplicate Addresses
                                              • Understanding the Problem
                                              • Identifying the Problem
                                              • Solving the Problem
                                              • Status Watch
                                              • Address Tracker
                                              • LANsentry Manager
                                              • Duplicate MAC Addresses
                                              • Duplicate IP Addresses
                                                • Ethernet Packet Loss
                                                  • Understanding the Problem
                                                  • Identifying the Problem
                                                  • Solving the Problem
                                                  • Status Watch
                                                  • LANsentry Manager Network Statistics Graph
                                                  • Device View
                                                  • Alignment Errors
                                                  • Collisions
                                                  • CRC Errors
                                                  • Excessive Collisions
                                                  • FCS Errors
                                                  • Late Collisions
                                                  • Nonstandard Ethernet Problems
                                                  • Receive Discards
                                                  • Too Long Errors
                                                  • Too Short Errors
                                                  • Transmit Discards
                                                    • FDDI Ring Errors
                                                      • Understanding the Problem
                                                      • Identifying the Problem
                                                      • Solving the Problem
                                                      • Status Watch
                                                      • Elasticity Buffer Error Condition
                                                      • Frame Error Condition
                                                      • Frames Not Copied Condition
                                                      • Link Error Condition
                                                      • MAC Neighbor Change Event
                                                        • Network File Server Timeouts
                                                          • Understanding the Problem
                                                          • Identifying the Problem
                                                          • Solving the Problem
                                                          • Ping and Telnet
                                                          • LANsentry Manager Alarms View
                                                          • LANsentry Manager Statistics View
                                                          • LANsentry Manager History View
                                                          • LANsentry Manager Top-N Graph
                                                          • LANsentry Manager Packet Capture
                                                          • LANsentry Manager Packet Decode
                                                          • Address Tracker
                                                          • LANsentry Manager Packet Decode
                                                          • Jabbering
                                                          • Network File System (NFS) Protocol
                                                          • Troubleshooting TCPIP - Detailed Steps
                                                          • Open a Command Prompt Window
                                                          • Determine the TCPIP Settings
                                                          • Description of TCPIP Settings
                                                          • Subnets
                                                          • Pinging
                                                          • Pinging the Local Area Network
                                                          • Pinging the Internet

Twisted Ring Condition

A Twisted Ring condition occurs when certain undesirable connection types exist. See Table 7 for more information. Although similar to the Undesired Connection Attempt, the Twisted Ring condition provides specific Station Management (SMT) and port information for diagnosis.

Undesired Connection Attempt Event

An Undesired Connection Attempt event occurs when a port tries to connect to another port of a type that may result in an undesirable network topology. Whether the connection attempt is successful depends on the current setting of the station's connection policies.

Table 7 lists connections that the FDDI standard defines as undesirable. The managed devices may or may not permit these connections, depending on their FDDI station configurations.

Table 7 Undesirable Connection Types

Connection Type   Reason That the Connection Is Undesirable
A-A               Twisted primary and secondary rings
A-S               A wrapped ring
B-B               Twisted primary and secondary rings
B-S               A wrapped ring
S-A               A wrapped ring
S-B               A wrapped ring
M-M               A tree of rings topology (illegal connection)

Table 8 lists FDDI connections that create valid topologies.

Table 8 Valid Connection Types

Connection Type   Reason That the Connection Is Valid
A-B               A normal trunk peer connection
A-M               A tree connection with possible redundancy. In a single MAC node, Port B has precedence (by default) for connecting to a Port M.
B-A               A normal trunk ring peer connection
B-M               A tree connection with possible redundancy. In a single MAC node, Port B has precedence (by default) for connecting to a Port M.
S-S               A single ring of two slave stations
S-M               A normal tree connection
M-A               A tree connection that provides possible redundancy
M-B               A tree connection that provides possible redundancy
M-S               A normal tree connection
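
Tables 7 and 8 together cover every pairing of the four FDDI port types (A, B, S, M), so they can be encoded as a simple lookup. The Python sketch below does this; the `classify` function and the dictionary names are illustrative and are not part of any management tool.

```python
UNDESIRABLE = {
    ("A", "A"): "Twisted primary and secondary rings",
    ("A", "S"): "A wrapped ring",
    ("B", "B"): "Twisted primary and secondary rings",
    ("B", "S"): "A wrapped ring",
    ("S", "A"): "A wrapped ring",
    ("S", "B"): "A wrapped ring",
    ("M", "M"): "A tree of rings topology (illegal connection)",
}

VALID = {
    ("A", "B"): "A normal trunk peer connection",
    ("A", "M"): "A tree connection with possible redundancy",
    ("B", "A"): "A normal trunk ring peer connection",
    ("B", "M"): "A tree connection with possible redundancy",
    ("S", "S"): "A single ring of two slave stations",
    ("S", "M"): "A normal tree connection",
    ("M", "A"): "A tree connection that provides possible redundancy",
    ("M", "B"): "A tree connection that provides possible redundancy",
    ("M", "S"): "A normal tree connection",
}

def classify(local_port: str, remote_port: str):
    """Return ("undesirable" or "valid", reason) for a port-type pairing."""
    pair = (local_port.upper(), remote_port.upper())
    if pair in UNDESIRABLE:
        return "undesirable", UNDESIRABLE[pair]
    if pair in VALID:
        return "valid", VALID[pair]
    raise ValueError(f"unknown port types: {pair}")

print(classify("A", "A"))
print(classify("S", "M"))
```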


Disabling the Offending Interface

Because broadcast storms can ultimately cause your whole network to become unavailable, take action immediately to disable the offending interface. You can enable the interface again after you have corrected the problem.

Address Tracker

Use Address Tracker to locate the interface that is causing the broadcast storm. Use Device View to disable the port.

Follow these steps:

1. In the Find Address window, enter the address of the interface that seems to be receiving the broadcast traffic.

You can copy the MAC or IP address from the Status Watch report and paste it into Address Tracker's Enter the Address You Want to Find field.

2. Click Find Now.

Search displays the device name.

3. Use Transcend Central to launch Device View and disable the port.

Disabling the port stops the broadcast storm before it interferes with all vital network traffic. You can re-enable this interface using Device View or the device's console later.

Correcting Spanning Tree Misconfigurations

Spanning Tree does not cause broadcast storms, but a loop in your Spanning Tree topology can create data that looks like a storm. A loop can occur in your topology if:

• Someone disables Spanning Tree on a port.

• You set up your Spanning Tree configuration incorrectly.


Device View

Use Device View to disable any Spanning Tree port that has a repeater attached to it and to correct Spanning Tree misconfigurations.

To correct Spanning Tree misconfigurations, use Device View to disable Spanning Tree Protocol (STP) for a port on a SuperStack® II Switch 1000, Switch 3000, Switch 3000 10/100, Switch 9000SX, Desktop Switch, LinkBuilder® FMS II Bridge/Management Module, or CoreBuilder 6000.

To disable the STP port state for a port on a SuperStack II switch:

1. Select a port and click the right mouse button.

2. From the shortcut menu, select Configure.

3. In the Port section, click the STP tab.

4. From the STP Port State list box, select Disabled.

5. Click Apply.

To disable the STP port state for a port on a LinkBuilder FMS II Bridge/Management Module:

1. Double-click the module.

2. From the shortcut menu, select Configure Bridge.

3. In the Port section, click the STP tab.

Broadcast Storms Reference

This section explains terms that are relevant to broadcast storms and provides additional conceptual and problem analysis detail.

Broadcast Packets

Broadcast packets, which are a normal part of network operation, are transmitted by a device to a broadcast address. For example, IP networks use broadcasts to resolve network addresses using Address Resolution Protocol (ARP). IPX networks use a large number of broadcast packets to operate most effectively.

Problems arise when broadcast packets endlessly propagate throughout the network, which increases the traffic volume on your network and the CPU time that each host spends processing and discarding unwanted broadcast packets.

Multicast Packets

Multicast packets, which are a normal part of network operation, are transmitted by a device to a multicast group address. Hosts that want to receive the packets indicate that they want to be members of the multicast group, and then multicast packets are distributed to that group. For example, multicast packets support the Spanning Tree Protocol. Multicast applications and underlying multicast protocols control multimedia traffic and shield hosts from processing unnecessary broadcast traffic. However, multicast traffic can also cause storms that saturate your network.

Duplicate Addresses

Use these sections to identify and correct problems caused by duplicate MAC and IP addresses:

• Duplicate Addresses Overview

• Finding Duplicate MAC Addresses

• Finding Duplicate IP Addresses

See Duplicate Addresses Reference for additional conceptual and problem analysis detail.

Networks sometimes generate duplicate MAC and IP addresses. Because duplicate addresses can cause problems with packet delivery, resolve them as soon as possible.

Understanding the Problem

Duplicate MAC addresses are caused by data link layer problems with Fiber Distributed Data Interface (FDDI) media and the passing of tokens on the FDDI ring. Duplicate IP addresses are caused by network layer problems. See these sections for more information about causes of duplicate addresses:

• Duplicate MAC Addresses

• Duplicate IP Addresses

Identifying the Problem

Identify duplicate MAC and IP addresses by following the instructions in these sections:

• Finding Duplicate MAC Addresses

• Finding Duplicate IP Addresses

Solving the Problem

Identify the cause of the duplicate address (such as user error or a hardware problem) and fix the problem if possible.

Finding Duplicate MAC Addresses

To find out if duplicate MAC addresses are occurring, monitor your network using Status Watch.

Status Watch

The Status Watch FDDI Status tool identifies duplicate FDDI MAC addresses, and Status Watch reports when two or more MACs on the same ring have the same MAC address (a Duplicate Address condition).

Follow these steps:

1. In the Status Watch Summary View window, determine if any FDDI Status conditions are reported. If there are, double-click the table cell value to display the Device List window.

Another approach is to examine only the devices that you know reside on your FDDI ring. In the Status Watch main window, red device icons indicate that a threshold has been exceeded.

2. Select a device.

• If you selected the device from the Device List window, the real-time report for that device appears in the Status Watch main window.

• If you selected the device from the main window, also select the FDDI Status tool to view the real-time report.

3. Determine if a Duplicate Address condition caused the FDDI Status tool to trigger a Critical or Warning status for that device.

In Status Watch, you can specify the status severity level to apply to a Duplicate Address condition.

Finding Duplicate IP Addresses


To find out if duplicate IP addresses are occurring, monitor your network using these applications:

• Address Tracker - To find duplicate IP addresses on 3Com devices and their attached networks

• LANsentry® Manager - To find duplicate IP addresses that are collected by probes gathering RMON2 SmartAgent® data from the Enterprise Communications Analysis Module (ECAM) downloaded on your network devices

Address Tracker

Use Address Tracker to determine when and where duplicate IP addresses occur.

Follow these steps:

1. From the Find Address menu, select Find Duplicate IP Addresses.

2. Click Find Now to start your search.

LANsentry Manager

Use the Duplicates table in LANsentry Manager to compile a list of all stations with duplicate IP addresses. This table is available only on probes that have downloaded RMON2 (ECAM) SmartAgent software.

Follow these steps:

1. From the LANsentry Manager Address Map menu, select Duplicates. Address Map data is displayed as a table.

2. To export the contents of the table, click Export to launch the Data Export dialog box.
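
If you export address map data, the same duplicate check can be reproduced offline. The Python sketch below is not the LANsentry Manager algorithm; it assumes a hypothetical list of (IP address, MAC address) pairs and simply reports any IP address seen with more than one MAC address.

```python
from collections import defaultdict

def find_duplicate_ips(address_map):
    """Return {ip: sorted MAC list} for every IP seen with more than one MAC.

    `address_map` is an iterable of (ip, mac) string pairs.
    """
    macs_by_ip = defaultdict(set)
    for ip, mac in address_map:
        macs_by_ip[ip].add(mac.lower())
    return {ip: sorted(macs) for ip, macs in macs_by_ip.items() if len(macs) > 1}

# Hypothetical observations; two stations claim 192.168.1.123.
observed = [
    ("192.168.1.101", "00:A0:24:11:22:33"),
    ("192.168.1.123", "00:A0:24:44:55:66"),
    ("192.168.1.123", "00:A0:24:77:88:99"),
    ("192.168.1.101", "00:a0:24:11:22:33"),
]

print(find_duplicate_ips(observed))
```

MAC addresses are lowercased before comparison so that the same adapter reported in different letter cases is not counted twice.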

Duplicate Addresses Reference

This section explains terms that are relevant to duplicate addresses and provides additional conceptual and problem analysis detail.

63 38

Duplicate MAC Addresses

Each device on your network has a unique MAC address. This address identifies a single device on the network, allowing packets to be delivered to correct destinations.

Packets are delivered to their destinations by means of MAC-address-to-IP-address translation that the Address Resolution Protocol (ARP) provides. Therefore, if MAC addresses are duplicated on the network, ARP caches of routing devices contain erroneous destinations. In FDDI, devices monitor network traffic, searching for their own MAC address in each packet to determine whether to decode the packet. If MAC addresses are not unique, two stations cannot be distinguished from each other.

Duplicate MAC addresses can occur for the following reasons:

• Someone has manually configured a MAC address for a device, instead of using the address that the vendor supplied or allowing it to be assigned dynamically, and this address is also assigned to a different device.

• In rare circumstances, loops in a bridged network can cause a MAC hardware problem or an address learning problem that creates a duplicate MAC address entry in the bridging address table.

• On DECnet Phase 4 networks, MAC addresses are set from the DECnet address. A duplicate NET address can cause a duplicate MAC address.

Duplicate IP Addresses

Because IP addresses are critical for transmission of packets on TCP/IP networks, resolve duplicate IP addresses immediately.

Duplicate IP addresses can occur when someone has configured an IP address that is identical to an IP address that is assigned to a different device. Address assignments, although possible for you to configure manually, are usually made using one of these protocols:

• Dynamic Host Configuration Protocol (DHCP) - Allows your network to dynamically assign IP addresses to nodes. With this protocol, a DHCP server temporarily assigns an IP address to a node, or you can statically configure addresses as needed.

• BOOTstrap protocol (BootP) - Allows you to statically assign IP addresses to nodes. This protocol is more efficient than RARP.

• Reverse ARP (RARP) - Allows you to statically assign IP addresses to nodes. However, because this protocol relies on the MAC address to identify the node, you cannot use it on networks that dynamically assign hardware addresses.
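
To make the idea of a duplicate IP address concrete, the following minimal Python sketch flags any IP address that is claimed by more than one MAC address. It is not how Address Tracker or LANsentry Manager work internally; the function name and the input format (a list of IP/MAC pairs, for example copied from ARP caches) are assumptions made for illustration.

```python
from collections import defaultdict


def find_duplicate_ips(observations):
    """Return IP addresses that were seen with more than one MAC address.

    observations is an iterable of (ip, mac) pairs (hypothetical input,
    for example entries collected from ARP caches).
    """
    macs_by_ip = defaultdict(set)
    for ip, mac in observations:
        # Normalise case so that 00:A0:... and 00:a0:... compare equal.
        macs_by_ip[ip].add(mac.lower())
    # Keep only the addresses claimed by two or more different stations.
    return {ip: sorted(macs) for ip, macs in macs_by_ip.items() if len(macs) > 1}
```

An empty result means that no IP address in the sample was claimed by two stations. It does not prove that none exists elsewhere on the network.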

Ethernet Packet Loss

Use these sections to identify and correct Ethernet packet loss:

• Ethernet Packet Loss Overview
• Searching for Packet Loss

See Ethernet Packet Loss Reference for additional conceptual and problem analysis detail.

If your Ethernet network shows signs of congestion, it may be experiencing packet loss. When your network is congested, utilization is usually high, packets are discarded because buffers are full, and collision rates are up. Problems related to Collisions are often at the heart of packet loss.

Understanding the Problem

Collisions are normal in Ethernet networks. In many cases, Collision rates of 50 percent do not cause a large decrease in throughput. The Collision rate helps mark the upper limit on your network (the maximum percentage of collisions that your network can bear), which is usually around 70 percent. If Collisions increase above this upper limit, your network can become unreliable.

When the Collision rates increase, so do Excessive Collisions, which causes a delay in transmitting data. An increase in Collisions also indicates that network utilization and network errors, such as FCS Errors, are probably increasing.

The real packet problems to watch for, however, are undetected collisions that show up as Late Collisions.

Identifying the Problem

To identify that your network's problem is related to packet loss, verify that frames are being dropped on your network by examining this packet loss data:

• Alignment Errors
• Collisions
• Excessive Collisions
• FCS Errors
• CRC Errors
• Late Collisions
• Receive Discards
• Too Long Errors
• Too Short Errors
• Transmit Discards

The process of identifying the problem is discussed in Searching for Packet Loss.

Solving the Problem

If you notice that packet loss data is consistently high, then your network is too congested. In this case, segment your network with the appropriate network device (such as a switch or router). If Collision data shows increases but your network's utilization is the same, then your network may have a physical problem, such as cabling that is too long. Other problems that packet loss data can indicate include:

• Faulty connectors or improper cabling
• Excessive numbers of repeaters between network devices
• Defective Ethernet transceivers or controllers

Possible solutions to these problems are explained in the procedures in Searching for Packet Loss.
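
The rule of thumb above (rising collisions with unchanged utilization point to a physical problem, rising collisions with rising utilization point to congestion) can be sketched as a small Python function. This is only an illustration: the function name, the sample format, and the 5-point tolerance used to decide that utilization is "the same" are assumptions, not values from this guide.

```python
def triage_packet_loss(samples, utilization_tolerance=5.0):
    """Suggest a likely cause from chronological (utilization_percent, collisions) samples.

    utilization_tolerance is an assumed margin (in percentage points)
    within which utilization is treated as unchanged.
    """
    if len(samples) < 2:
        return "not enough data"
    first_util, first_coll = samples[0]
    last_util, last_coll = samples[-1]
    collisions_rising = last_coll > first_coll
    utilization_flat = abs(last_util - first_util) <= utilization_tolerance
    if collisions_rising and utilization_flat:
        return "possible physical problem (for example, cabling that is too long)"
    if collisions_rising:
        return "possible congestion (consider segmenting the network)"
    return "no collision trend detected"
```

The sketch compares only the first and last samples, so it is best treated as a prompt for further investigation rather than a diagnosis.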

Searching for Packet Loss

When you look for packet loss, use the following applications:

• Status Watch - For Ethernet and MIB-II data collection using SNMP polling
• LANsentry Manager Network Statistics Graph - For RMON data collection using an RMON probe
• Device View - On a per-device basis, you can evaluate statistics for any port on the device

Status Watch

Status Watch monitors:

• Alignment Errors
• Excessive Collisions
• FCS Errors
• Receive Discards
• Transmit Discards

Follow these steps:

1. Determine if the thresholds for the Alignment Errors tool and FCS Errors tool are being exceeded.

Table 16 identifies the problems that this data can indicate and your possible actions. For information about problems related to a nonstandard Ethernet implementation, see Nonstandard Ethernet Problems.

Table 16. Alignment Errors, FCS Errors, and CRC Errors Data

Possible Problem: Faulty cabling
Possible Action: Examine the cable and cable connections for breaks or damage.

Possible Problem: Network noise
Possible Action: Look for improper cabling, faulty cable, faulty network equipment, or cables that are too close to equipment that emits electromagnetic interference (lamps, for example).

Possible Problem: Faulty transceiver
Possible Action: Use an analyzer to identify the problematic transceiver. If necessary, replace the transceiver, network adapter, or station.

Possible Problem: Fault at the transmitting end station
Possible Action:
1. Locate the source of the errors by looking at the module and port statistics.
2. Verify the correct operation of the transceiver or adapter card of the device that is connected to the problem port.
3. If the card appears to be operating correctly, examine the cable and cable connections for breaks or damage.

Possible Problem: Station powering up or down
Possible Action: None required.
Early implementations of Ethernet transceivers generate a significant amount of in-band noise when powering up; they frequently cause Alignment Errors and FCS Errors in an otherwise stable network.
When powering up, some software drivers for Ethernet controllers also initiate Time Domain Reflectometry (TDR) tests to test the Ethernet media. Network monitors report TDR tests as Alignment Errors and FCS Errors.

Possible Problem: Faulty adapter
Possible Action: Replace the adapter.

2. Determine if the Excessive Collisions tool threshold is being exceeded. Table 17 identifies the problems that this data can indicate and your possible actions.

Table 17. Collisions and Excessive Collisions Data

Possible Problem: Busy network
Possible Action: Use a bridge, router, or switch to reconfigure your network into segments with fewer stations.

Possible Problem: Faulty device (adapter, switch, hub, and the like) that does not listen before broadcasting. This problem increases the incidence of all types of collisions.
Possible Action: Isolate each adapter to see if the problem stops.

Possible Problem: Network loop
Possible Action: Ensure that no redundant connections to the same station have both connections active simultaneously.

3. Determine if the thresholds for the Receive Discards and Transmit Discards tools are being exceeded.

If these errors are high in conjunction with the data that you learned in steps 1 and 2, then your network is overloaded. Segment your network.

LANsentry Manager Network Statistics Graph

Use the LANsentry Manager Network Statistics graph to view data for:

• Collisions
• Late Collisions
• Bandwidth Utilization
• CRC Errors
• Too Long Errors
• Too Short Errors

Follow these steps:

1. Display a Network Statistics graph for the local Ethernet segment on which users have reported poor performance.

This graph shows the most recent trend in Collision rates. If you have set up a History sample, you can also look at the historical trend. If a number of segments are connected by repeaters, examine the graph for each Ethernet segment.

2. Analyze Utilization and Collision rates to determine whether collisions are caused by an overloaded segment or a faulty component.

• If Utilization rates are high - The collisions are probably caused by an overloaded segment. If you have added nodes or new applications to your network, consider reconfiguring the cabling system, using bridges and routers to filter out remote collisions and to keep local traffic on one segment. This action should level the network load.

• If Utilization rates are stable and appear normal - The collisions are probably caused by faulty components. In this case, do the following:

  • If the network consists of repeaters - Compare the Network Statistics graphs for each segment connected to the repeater. Because repeaters repeat traffic across all connected segments (which makes many segments seem like one network), you should see similar levels of traffic on all segments. One segment that shows dissimilar levels of traffic and collisions may indicate faulty hardware. In this case, monitor several collisions to track the source station that is transmitting too soon after collisions, and repair the station. Packets that are transmitted too soon after collisions are unlikely to be valid. See Table 17 for more information about Collisions.

  • On other networks - Determine the segment cable length.

3. Examine the CRC Errors and Late Collisions, which often indicate cabling or component problems.

Table 16 identifies the problems that CRC Errors can indicate and your possible actions. Table 18 identifies the problems that Late Collisions data can indicate and your possible actions.

Table 18. Late Collisions Data

Possible Problem: Cabling problems
• Segment too long
• Failing cable
• Segment not grounded properly (noise)
• Improper termination
• Taps too close (10BASE-5 and 10BASE-2 only)
• Noisy cable
Possible Action: Correct the cabling problem by doing one or more of the following:
• Reduce the segment length
• Replace the cable
• Ground the cable
• Terminate the cable correctly
• Check the taps
• Check for cables too close to equipment that emits electromagnetic interference

Possible Problem: Component problems
• Deaf or partially deaf node
• Failing repeater, transceiver, or controller cards
Possible Action: Correct the component problem by doing one of the following:
• Trace the failing component and replace it
• Replace the NIC or the transceiver

4. Trace Too Short Errors and Too Long Errors to the sender.

These errors often indicate faulty routers or LAN drivers and transceiver problems. Table 19 identifies the problems that this data can indicate and your possible actions.

Possible Problem: A transceiver on your network is adding bits to the packets that are transmitted by the attached station.
Possible Action:
1. Use a network analyzer to identify the problematic transceiver.
2. If necessary, replace the transceiver, network adapter, or station.

Possible Problem: The jabber protection mechanism on a transceiver has failed; it can no longer protect the network from the jabbering produced by the attached station.
Possible Action: Replace the network card.

Possible Problem: Excessive noise on the cable.
Note: Some 10/100 Mbps cards that autodetect the network speed may connect to the network at the wrong speed, causing excessive noise.
Possible Action: Check for improper cabling, faulty cable, faulty network equipment, or cables too close to noisy electronic equipment (lamps, for example).
If your network card autodetects the network speed and you have ruled out other problems, manually configure the network speed.

Possible Problem: Faulty routers (two different network types are connected and the router is not enforcing proper frame size restrictions).
Possible Action: Notify the manufacturer.

Possible Problem: Faulty LAN driver
Possible Action: Replace the driver.

Possible Problem: A normal condition on a LinkSwitch® 1000, LinkSwitch® 3000, or CoreBuilder 5000 FastModule.
If you use maximum-sized 1518-byte Ethernet frames, the device's VLT-enabled ports add a frame tag of 4 bytes, resulting in a misleading Too Long Frame error.
Possible Action: These frames are passed successfully but will create the Too Long Frame error message.
If you want to eliminate the error message, reduce your Ethernet packet frames by 4 bytes.

Device View

Device View allows you to display a variety of port and device-level statistics relevant to Ethernet packet loss. Table 20 describes these statistics and their use in troubleshooting.

Table 20. Activity and Error Statistics in Device View

Statistics Group: Activity
Description: Displays the total network activity and errors on the selected port.
Use in Troubleshooting: This data shows readable packets, broadcast packets, Collisions, total errors, and runts (which cause Too Short Errors). You can interpret this data in the following ways:

• Runts can often be caused by Collisions; however, if the values increase at specific times of the day, it may indicate that you need to change the network topology to manage the traffic more efficiently (for example, with switches or routers).

• Runts can also be caused by a badly terminated coax cable.

• Large numbers of runts not associated with high levels of collisions can indicate a transmission problem (examine the cable).

• Particularly high numbers of Collisions compared to the total number of readable packets can point to a hardware problem (a bad adapter) or to a data loop.

• A high proportion of Broadcast Packets (>10%) on a heavily utilized network (>50% of available bandwidth) can point to an incorrectly configured bridge or router on the network.

Statistics Group: Errors
Description: Displays the number of frames with errors on the selected port.
Use in Troubleshooting: The significance of errors depends on accompanying errors and prevailing network conditions. See the following error data for more information:

• Alignment Errors: Table 16
• FCS Errors: Table 16
• Too Long Errors: Table 19
• Too Short Errors or runts: Table 19
• Late Collisions: Table 18

To display Activity and Errors statistics for a device or port, follow these steps:

1. Select the required port or device.

2. From the shortcut menu, select Activity or Errors.

The statistics available depend on the type of port or device selected. See Table 20 for troubleshooting information.

Ethernet Packet Loss Reference

This section explains terms that are relevant to Ethernet packet loss and provides additional conceptual and problem analysis detail.

Alignment Errors

An Alignment Error indicates a received frame for which both of the following are true:

• The number of bits received is an uneven byte count (that is, not an integral multiple of 8).

• The frame has a Frame Check Sequence (FCS) error.

Alignment Errors often result from MAC layer packet formation problems, cabling problems that cause corrupted or lost data, and packets that pass through more than two cascaded multiport transceivers. See FCS Errors for more information about interpreting Alignment Errors.

Collisions

Collisions indicate that two devices detect that the network is idle and try to send packets at exactly the same time (within one round-trip delay). Because only one device can transmit at a time, both devices must stop sending and attempt to retransmit. Collisions are detected by the transmitting stations.

The retransmission algorithm helps to ensure that the packets do not retransmit at the same time. However, if the two devices retry at nearly the same time, packets can collide again; the process repeats until either the packets finally pass onto the network without collisions, or 16 consecutive collisions occur and the packets are discarded.
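
The retransmission behaviour can be sketched in Python as follows. The text above gives only the 16-collision limit; the random wait shown here is the truncated binary exponential backoff defined for CSMA/CD in IEEE 802.3, in which the range of the random delay doubles after each collision up to the tenth. The function names and the callback used to simulate collisions are hypothetical.

```python
import random

MAX_ATTEMPTS = 16    # the frame is discarded after 16 consecutive collisions
BACKOFF_LIMIT = 10   # the exponent stops growing after the 10th collision


def backoff_slots(collision_count, rng=random):
    """Pick how many slot times to wait after the nth consecutive collision."""
    if not 1 <= collision_count < MAX_ATTEMPTS:
        raise ValueError("collision_count must be between 1 and 15")
    exponent = min(collision_count, BACKOFF_LIMIT)
    return rng.randrange(2 ** exponent)  # uniform in 0 .. 2**exponent - 1


def send_frame(collides_on_attempt, rng=random):
    """Simulate sending one frame.

    collides_on_attempt(n) is a hypothetical callback that returns True
    if transmission attempt n (1-based) ends in a collision.
    Returns (delivered, total_slots_waited).
    """
    waited = 0
    for attempt in range(1, MAX_ATTEMPTS + 1):
        if not collides_on_attempt(attempt):
            return True, waited
        if attempt == MAX_ATTEMPTS:
            break  # 16th consecutive collision: count an Excessive Collision
        waited += backoff_slots(attempt, rng)
    return False, waited
```

Because each station picks its delay independently, two stations that collided are unlikely to pick the same delay repeatedly, which is why 16 collisions in a row usually signals a congested or faulty segment.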

CRC Errors

A Cyclic Redundancy Check (CRC) Error is an RMON statistic that combines FCS Errors and Alignment Errors. These errors indicate that packets were received with:

• A bad FCS and an integral number of octets (FCS Errors)
• A bad FCS and a non-integral number of octets (Alignment Errors)

CRC Errors can cause an end station to freeze. If a large number of CRC Errors are attributed to a single station on the network, replace the station's network interface board. Typically, a CRC Error rate of more than 1 percent of network traffic is considered excessive.
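
As a rough illustration of the definitions above, this Python sketch classifies a frame that has already failed its FCS check and compares the combined CRC Error rate with the 1 percent guideline. The function names are hypothetical, and measuring "network traffic" as a frame count is an assumption.

```python
def classify_bad_fcs_frame(bit_count):
    """Classify a frame that has already failed its FCS check."""
    # An integral number of octets means a plain FCS Error;
    # leftover bits mean an Alignment Error.
    return "FCS Error" if bit_count % 8 == 0 else "Alignment Error"


def crc_error_rate(fcs_errors, alignment_errors, total_frames):
    """Return CRC Errors (FCS plus Alignment) as a percentage of frames seen."""
    if total_frames <= 0:
        raise ValueError("total_frames must be positive")
    return 100.0 * (fcs_errors + alignment_errors) / total_frames


def is_excessive(rate_percent, limit_percent=1.0):
    """Apply the 1 percent guideline (limit_percent can be tuned per network)."""
    return rate_percent > limit_percent
```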

Excessive Collisions

Excessive Collisions indicate that 16 consecutive collisions have occurred, usually a sign that the network is becoming congested. For each excessive collision count (or after 16 consecutive collisions), a packet is dropped. If you know the normal rate of excessive collisions, then you can determine when the rate of packet loss is affecting your network's performance. See Knowing Your Network's Configuration for more information.

FCS Errors

Frame Check Sequence (FCS) Errors, a type of CRC Error, indicate that frames received by an interface are an integral number of octets long but do not pass the FCS check. The FCS is a mathematical way to ensure that all the frame's bits are correct without having the system examine each bit and compare it to the original. Packets with Alignment Errors also generate FCS Errors.

Both Alignment Errors and FCS Errors can be caused by equipment powering up or down or by interference (noise) on unshielded twisted-pair (10BASE-T) segments. In a network that complies with the Ethernet standard, FCS or Alignment Errors indicate bit errors during a transmission or reception. A very low rate is acceptable. Although Ethernet allows a 1 in 10^8 bit error rate, typical Ethernet performance is 1 in 10^12 or better.
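
The FCS check can be illustrated with a short Python sketch. Ethernet's FCS is a 32-bit CRC, and Python's zlib.crc32 computes a CRC-32 with the same polynomial, so it is used here as a stand-in. Storing the checksum in little-endian byte order is an assumption about how a capture presents the trailer, and the function names are hypothetical.

```python
import struct
import zlib


def append_fcs(frame_without_fcs: bytes) -> bytes:
    """Return the frame with a 4-byte CRC-32 checksum appended."""
    fcs = zlib.crc32(frame_without_fcs) & 0xFFFFFFFF
    return frame_without_fcs + struct.pack("<I", fcs)


def fcs_ok(frame_with_fcs: bytes) -> bool:
    """Recompute the checksum over the frame body and compare it to the trailer."""
    if len(frame_with_fcs) < 4:
        return False  # too short to even hold a checksum
    body, trailer = frame_with_fcs[:-4], frame_with_fcs[-4:]
    return struct.pack("<I", zlib.crc32(body) & 0xFFFFFFFF) == trailer
```

A single flipped bit anywhere in the body changes the recomputed checksum, so the receiver can detect the damage without holding a copy of the original frame.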

Late Collisions

Late Collisions indicate that two devices have transmitted at the same time, but cabling errors (most commonly, excessive network segment length or repeaters between devices) prevent either transmitting device from detecting a collision. Neither device detects a collision because the time to propagate the signal from one end of the network to the other is longer than the time to put the entire packet on the network. As a result, neither of the devices that cause the late collision senses the other's transmission until the entire packet is on the network.

Although late collisions occur for small packets, the transmitter cannot detect them. As a result, a network suffering measurable Late Collisions for large packets is losing small packets as well.
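
The timing argument can be checked with simple arithmetic. The Python sketch below assumes a 10 Mbps network and treats a collision as detectable only if the sender is still transmitting when the colliding signal has had time to travel back to it (a worst-case round trip). The function names and the delay values in the usage note are illustrative assumptions.

```python
def transmit_time_us(frame_bytes, mbps=10.0):
    """Time needed to put a whole frame on the wire, in microseconds."""
    return frame_bytes * 8 / mbps


def collision_detectable(frame_bytes, round_trip_us, mbps=10.0):
    """True if the sender is still transmitting when a worst-case collision returns."""
    return transmit_time_us(frame_bytes, mbps) >= round_trip_us
```

With an over-long segment whose round-trip delay is 60 microseconds, collision_detectable(1518, 60.0) is True but collision_detectable(64, 60.0) is False: the large frame is still on the wire when the collision arrives, while the 64-byte frame has already been sent and is lost without the sender noticing.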

Nonstandard Ethernet Problems

Table 21 lists the symptoms that typically occur if a system violates the Ethernet standard.

Table 21. Symptoms of Common Ethernet Network Problems

Symptoms: FCS Errors and Alignment Errors increase significantly.
Problem: Network cabling is too long.
Notes: If you use a promiscuous network monitor, the number of Late Collisions reported by stations should correlate with the FCS and Alignment Errors reported by the monitor.

Symptoms: FCS and Alignment Errors increase proportionally with interference (sometimes referred to as noise hits).
Problem: Network segment is noisy.
Notes: Typically observed on a 10BASE-T network segment in a noisy environment. If you use multiple promiscuous monitors, the FCS and Alignment Errors among the monitors will not correlate.
If the monitor can track runts (also called Too Short Errors), the number of runt packets should be significantly higher than normal.

Symptoms: FCS and Alignment Errors are much higher than normal.
Problem: Networks do not conform to the access scheme of Carrier Sense Multiple Access with Collision Detect (CSMA/CD).
Notes: Occurs when some implementations of Ethernet in the segment are not entirely compatible with IEEE 802.3 repeaters.
Collision fragments linger on the network long enough to collide with retry packets at the minimum interpacket gap (IPG). The IPG is smaller on one side of the repeated network, causing a lost packet.
Ethernet controllers cannot receive packets that are separated by 4.7 µs or less. Some controllers cannot sustain receptions of packets separated by as much as 9.6 µs. If runt packets are received one after another and are followed by a collision fragment, Ethernet controllers that cannot sustain reception will lose packets.

Receive Discards

Receive Discards indicate that received packets could not be delivered to a high-layer protocol because of congestion or packet errors.

Too Long Errors

A Too Long Error indicates that a packet is longer than 1518 octets (including FCS octets) but otherwise well formed. Too Long Errors are often caused by a bad transceiver, a malfunction of the jabber protection mechanism on a transceiver, or excessive noise on the cable.

Too Short Errors

A Too Short Error, also called a runt, indicates that a packet is fewer than 64 octets long (including FCS octets) but otherwise well formed.
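
The two length limits can be expressed as a small Python sketch. The 64-octet and 1518-octet limits come from the definitions above; the function name and the optional tag_octets parameter (for ports that add a frame tag, such as the 4-byte tag described for Too Long Errors in Table 19) are hypothetical.

```python
MIN_FRAME_OCTETS = 64     # including FCS octets
MAX_FRAME_OCTETS = 1518   # including FCS octets


def classify_length(octets, tag_octets=0):
    """Classify an otherwise well-formed frame by its length in octets.

    tag_octets is the size of any extra tag that the port adds to the frame.
    """
    if octets < MIN_FRAME_OCTETS:
        return "Too Short Error"
    if octets > MAX_FRAME_OCTETS:
        # A maximum-sized frame plus a tag is reported as too long
        # even though the original frame was legal.
        if octets - tag_octets <= MAX_FRAME_OCTETS:
            return "Too Long Error (tag only)"
        return "Too Long Error"
    return "ok"
```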

Transmit Discards

Transmit Discards indicate that packets were not transmitted because of network congestion.

FDDI Ring Errors

Use these sections to identify and correct FDDI ring errors:

• FDDI Ring Errors Overview
• Identifying Ring Errors

Fiber Distributed Data Interface (FDDI) often corrects its own problems. However, because FDDI cannot correct all errors (especially those related to hardware problems), you should monitor FDDI errors.

Understanding the Problem

FDDI ring errors that you should monitor include:

• Elasticity Buffer Error Condition
• Frame Error Condition
• Frames Not Copied Condition
• Link Error Condition

Identifying the Problem

First, determine the type of FDDI ring errors and where they are occurring. Similar to the way you identify other FDDI problems, identify the upstream and downstream neighbors of the devices that you are monitoring.

Several types of network errors can cause FDDI performance problems. For example, problems with cables or physical connections may result in a link or frame error. Elasticity buffer (EB) errors can also lead to link and frame errors.

FDDI deals with port-related errors as follows:

• The variable PORTlerAlarm is the link error rate (LER) value at which a link connection generates an alarm. When the LER is greater than the alarm setting, Station Management (SMT) sends a Status Report Frame (SRF) to notify you that there is a problem with a port.

The PORTlerAlarm threshold is set lower than the PORTlerCutoff threshold so that you are notified of a problem before the port is actually removed from the ring.

• When link errors reach the threshold defined by the variable PORTlerCutoff, SMT breaks the connection, disabling the PHY that detected the problem. A Link Error Condition is also generated.
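
The relationship between the two thresholds can be sketched in Python. This is an illustration of the logic only: the function name is hypothetical, the link error rate is expressed as errors per bit, and the default threshold values are assumed examples rather than settings taken from this guide.

```python
def port_action(link_error_rate, ler_alarm=1e-8, ler_cutoff=1e-7):
    """Return what SMT would do for a measured link error rate (errors per bit).

    ler_alarm and ler_cutoff are illustrative values; the only requirement
    taken from the text is that the alarm threshold is the lower of the two.
    """
    if ler_alarm >= ler_cutoff:
        raise ValueError("the alarm threshold must be lower than the cutoff threshold")
    if link_error_rate >= ler_cutoff:
        return "cutoff"  # connection broken; Link Error Condition generated
    if link_error_rate > ler_alarm:
        return "alarm"   # Status Report Frame (SRF) sent as an early warning
    return "ok"
```

Keeping the alarm threshold below the cutoff leaves a band of error rates in which the port is still on the ring but an SRF has already been sent.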

FDDI deals with MAC-related errors as follows:

• When MAC frame errors reach a certain threshold, a Frame Error Condition is generated. Because the actual error can be further upstream than the immediate connection, the connection remains intact.

• For a large network, the worst-case MACFrameErrorRatio is less than 0.1 percent. However, during network configuration, frame error ratios can reach 50 percent for short periods. When you detect a sustained frame error ratio of more than 0.1 percent, a problem exists between the station that is reporting the condition and the nearest upstream MAC.
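
Because short spikes during reconfiguration are expected, it is the sustained ratio that matters. The Python sketch below flags a problem only when the frame error ratio stays above the limit for several samples in a row. It assumes the threshold is 0.1 percent; the function names, the sample format, and the choice of three consecutive samples are also assumptions.

```python
def frame_error_ratio_percent(error_frames, total_frames):
    """Frame errors as a percentage of all frames in one sample."""
    if total_frames <= 0:
        return 0.0
    return 100.0 * error_frames / total_frames


def sustained_frame_errors(samples, limit_percent=0.1, min_consecutive=3):
    """True if the ratio exceeds limit_percent for min_consecutive samples in a row.

    samples is a chronological list of (error_frames, total_frames) pairs.
    """
    run = 0
    for error_frames, total_frames in samples:
        if frame_error_ratio_percent(error_frames, total_frames) > limit_percent:
            run += 1
            if run >= min_consecutive:
                return True
        else:
            run = 0  # a clean sample ends the streak
    return False
```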

Solving the Problem

To solve problems related to FDDI errors, fix the hardware, cabling, or congestion problem.

Identifying Ring Errors

Use Status Watch to monitor your FDDI devices for Warning or Critical alerts.

Status Watch

Use Status Watch to identify FDDI ring errors.

Follow these steps:

1. Monitor the FDDI Status tool for the currently selected device.

2. Determine whether Status Watch is reporting Elasticity Buffer Errors or a high percentage of Frame Errors, Frames Not Copied, or Link Error Rates for the currently selected device.

FDDI Ring Errors Reference

This section provides additional conceptual and problem analysis detail.

Elasticity Buffer Error Condition

The Elasticity Buffer Error condition occurs when a port's elasticity buffer overflows or underflows. This condition usually indicates that a port's hardware is not operating within the tolerances that the FDDI standard specifies. Look for the problem in the hardware of either the port that is reporting the condition or of the immediately adjacent port.

Frame Error Condition

The Frame Error condition occurs when the percentage of frames that contain errors exceeds a preset threshold. When a device is an uplink to FDDI (that is, a device is transmitting onto FDDI), this type of condition indicates that the ring is saturated. The ring is out of buffer space, and packets are being dropped from the device's backbone port.

The problem indicated by the frame errors is usually located between the MAC that reports the condition and its upstream neighbor. Because many physical connections can lie along this path, the MACFrameErrorRatio variable identifies only the two MACs between which the problem is occurring.

Frames Not Copied Condition

The Frames Not Copied condition occurs when the percentage of frames that are dropped because of insufficient buffer space exceeds a preset threshold. This condition indicates that the station is congested and is unable to process frames as quickly as they arrive. To help eliminate congestion:

• Add more capacity to the station
• Reconfigure your network so that end stations that communicate heavily with one another are on the same bridge or switch
• Filter out certain traffic

Link Error Condition

The Link Error condition occurs when a port detects link errors at a rate that exceeds a preset threshold. When the Link Error threshold is exceeded, the station removes itself from the ring and tries to reinsert itself on the ring. This action creates a MAC Neighbor Change Event (which also occurs if a ring wraps).

Link errors may indicate an FDDI PHY hardware problem (such as a faulty transmitter) or a faulty cable or connector. Look for the problem in the portion of the network between the port that is reporting the condition and the first upstream transmitter.

MAC Neighbor Change Event

The MAC Neighbor Change event occurs when a MAC's upstream or downstream neighbor changes.

This event indicates either:

• A network reconfiguration
• Another station that is leaving or joining the ring

Network File Server Timeouts

Use these sections to identify and correct timeouts on network file servers:

• Network File Server Timeout Overview
• Looking for Obvious Errors
• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

See Network File Server Timeouts Reference for additional conceptual and problem analysis detail.

A network file server can time out if your network gets congested or if your server is having problems. Users might have problems downloading data from or to the server or copying files from or to the server. To help you to understand the troubleshooting process for this type of problem, an EXAMPLE throughout this section follows the symptoms, analysis, and resolution of a typical file server timeout problem.

Understanding the Problem

When users log in, their stations make network file server calls either to determine quotas (if this feature has been enabled) or to mount user home directories. The network file server timeout messages, even when spread across multiple nodes, indicate a problem either with the network or with a server.

EXAMPLE: UNIX users notice that it takes a long time - over 30 seconds in some cases - to log in to any machine. Some machines report network file server timeout messages, but the messages have no obvious pattern and are infrequent. You begin to get a sense of the problem.

Identifying the Problem

First, rule out the obvious causes. Ask these questions:

• Can you access the network file server with Telnet?
• Have any alarms been triggered?
• Are there any new errors?

The process of identifying the problem is developed in Looking for Obvious Errors.

Solving the Problem

To determine the cause, reproduce the fault while you monitor the network. After you know the cause, you can fix the problem.

The solutions to the network file server timeout are identified in these sections:

• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

Looking for Obvious Errors

To look for obvious errors, use these applications:

• Ping and Telnet - To check connectivity to the network file server nodes
• LANsentry Manager Alarms View - To search for triggered alarms
• LANsentry Manager Statistics View - To look for errors
• LANsentry Manager History View - To identify trends

Ping and Telnet

Determine whether you can contact network file server nodes using Ping and Telnet. If the response is extremely slow, then a problem may exist with the connections to the nodes. No delay indicates that the connections are normal, implying that the delay is occurring elsewhere. In this case, use LANsentry® Manager tools to determine whether packets are being lost or ignored.

LANsentry Manager Alarms View

Using the LANsentry Manager Alarms View, you can determine if any configured alarms have been triggered.

Search the Alarms View to see if any MAC events have been logged.

EXAMPLE: MAC events have not been logged for the network to which the UNIX users are attached.

Even though no alarms have occurred, errors may exist. For example, a low rate of background errors may exist just below the alarm threshold. Because they are based on maximum and minimum values, RMON alarms may miss constant, periodic, or low levels of errors.

LANsentry Manager Statistics View

Using the LANsentry Manager Statistics View, you can display a multisegment graph of utilization and error statistics.

Follow these steps:

1. Set up a graph that shows utilization and errors on all your major segments.

2. Determine whether any segments are particularly busy or error prone.

EXAMPLE: You notice that one segment of the UNIX network, HUB3, is reporting Too Long Errors and FCS Errors roughly every second sample. While the amount is not higher than normal, it is currently higher than on any other segment.

LANsentry Manager History View

Using the LANsentry Manager History View, you can display a rolling history table to determine if the errors that you are seeing are new. For example, if you have a history table that runs for 30-minute samples over two days, you can compare the most recent sample to a previous sample, looking for new errors. If your probe has the resources, use a much finer resolution sample stored for a shorter time (every 30 seconds for 2 hours) to more easily spot recent errors.
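
Comparing two history samples amounts to a simple difference of counters. The Python sketch below assumes that each sample has been exported as a mapping from an error name to the count recorded in that interval; the function name and the data layout are hypothetical, not a LANsentry Manager export format.

```python
def new_errors(previous_sample, latest_sample):
    """Return the error counters that appear or grow in the latest sample.

    Each sample maps an error name to the count recorded in that interval.
    """
    changes = {}
    for name, count in latest_sample.items():
        before = previous_sample.get(name, 0)
        if count > before:
            changes[name] = count - before
    return changes
```

Counters that are unchanged or lower than before are left out, so the result lists only the errors that are new or increasing.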

EXAMPLE: You see that the history table shows that no error rates remained constant throughout the day. However, errors that did occur were on the device HUB3 and were Too Long Errors and FCS Errors.

Reproducing the Fault While Monitoring the Network

Although the RMON View in LANsentry Manager can show error rates and help you to identify the location of the problem, it may not provide enough data to solve the problem. To determine the cause of the problem, reproduce it while you monitor the network by using these applications:

• LANsentry Manager Top-N Graph - To locate a quiet node to use for reproducing the fault

• LANsentry Manager Packet Capture - To capture packets from the hub to which the quiet node is attached

• LANsentry Manager Packet Decode - To analyze the packets to assess network file server traffic and delays

• Address Tracker - To find the location of the problem nodes

EXAMPLE: Using LANsentry Manager, you find a hub on the network with a higher than normal error rate. However, the error rate does not seem high enough to cause login delays of 60 seconds or more.

LANsentry Manager Top-N Graph

Using the Top-N graph in the LANsentry Manager main window locate a quiet node that has been showing the same problem Choose a quiet node so that you do not receive excessive traffic when you try to isolate the problem

EXAMPLE You see that the node Monolith which has the same Network File System (NFS) mounts as the other nodes on the network is quiet You decide to use this node for reproducing the fault See Network File System (NFS) Protocol for more information about NFS

LANsentry Manager Packet Capture

Using the LANsentry Manager Packet Capture application, capture packets from the network using predefined patterns and start-and-stop conditions.

Follow these steps:

1. Set up a capture buffer on a probe that is connected to the same hub as the quiet node. Until you know more about the problem, set a very general filter.

EXAMPLE: You select a MAC-layer filter and set a conversation filter to capture all packets to and from Monolith.

2. Telnet into and log out from the quiet node. Then reset the capture buffer. Repeat this procedure until you see the problem reflected in your captured data. To keep the buffer information clean, reset the buffer each time that you repeat the procedure.

3. When you see the delay, note the rough value of the packet count on the LANsentry Manager packet buffer.

By noting the packet count at which you think the delay has occurred, you can narrow the problem to within about 20 packets in the buffer. If you have used an extremely quiet node, you may even identify the exact packet.

LANsentry Manager Packet Decode

The LANsentry Manager Packet Decode application decodes all major protocols and displays the packet contents at three levels of detail: summary information, header information, and actual packet content.

Follow these steps:

1. Open the buffer in the Packet Decode application and locate the number of the packet at which the delay occurred.

2. Select the packet and launch a MAC-layer conversation filter. In the filter display, look for a gap in the conversation (that is, where the node sent a request and then resent it at approximately the same rate as the delay you experienced when recreating the problem).

3. Repeat the test to determine if the result concentrates on one node or if it appears on other nodes.

EXAMPLE: On the quiet node that you selected, the delay is obvious. You see an NFS request going out to a node and a repeat of the request 30 seconds later. During that time, the node did not respond. You now know that the delay occurred because nodes were not seeing responses for NFS requests. When you repeat the test on other nodes, you find that the delay is happening with more than one destination node.
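If the conversation is exported as a list of timestamped packets, the gap described in step 2 can also be found programmatically. The sketch below is a rough illustration only: the tuple layout (timestamp in seconds, "request" or "reply", request identifier) is an assumed export format, not something Packet Decode is known to produce.

```python
def find_retransmissions(packets, min_gap=1.0):
    """Return (request_id, gap_seconds) for every request that was sent
    again before any reply to it was seen."""
    pending = {}  # request_id -> time of the last unanswered transmission
    gaps = []
    for timestamp, kind, request_id in packets:
        if kind == "reply":
            pending.pop(request_id, None)
        elif request_id in pending:
            gap = timestamp - pending[request_id]
            if gap >= min_gap:
                gaps.append((request_id, gap))
            pending[request_id] = timestamp
        else:
            pending[request_id] = timestamp
    return gaps

# Hypothetical capture: request 102 goes unanswered and is resent 30 s later.
capture = [
    (0.00, "request", 101),
    (0.01, "reply", 101),
    (0.50, "request", 102),
    (30.50, "request", 102),
    (30.52, "reply", 102),
]

print(find_retransmissions(capture))
```

A gap that matches the delay users report (about 30 seconds in the example above) points at the retry timer rather than at slow processing on the server.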

Address Tracker

Use Address Tracker, which polls managed devices, to determine the hubs to which the problem nodes are attached. If the problem end stations are located on unmanaged devices, then you can at least narrow the problem to those unmanaged devices.

EXAMPLE: Although your network does not have managed hubs that Transcend® NCS management software can poll, it does have managed switches. When it polls the switches, Address Tracker displays the switch ports on which addresses were last seen. This information indicates the hub (but not the hub port) on which the device is located.

LANsentry Manager Packet Decode

After you know the location of the hub that has the problem node, monitor the problem from the hub using LANsentry Manager Packet Decode.

Follow these steps:

1. To capture packets from one of the nodes on the hub, set up another capture buffer and repeat the exercise that is described in LANsentry Manager Packet Capture. Because a delay may occur on a different node, use two capture buffers without stopping the first one.

Note the rough packet count where the delay appears.

2. Display a conversation filter of the packet where the delay appears and look for the gap in the conversation.

EXAMPLE: You hope that the nodes are on the same hub. You find that all the nodes are on HUB3. This result indicates that FCS Errors may be causing the timeouts. However, because the errors occur at a low rate, you decide to verify this diagnosis. You monitor the problem from the hub, logging in and out many times, and the delay eventually occurs. This time, the delay shows that the node's reply had an FCS Error, even though the node received the request. The switch would not have transmitted this packet, causing a timeout on the NFS protocol. The retry time is presumably 30 seconds. During this test, you see the problem occurring on another node.

Correcting the Fault

Without a managed hub, you may find it very difficult to discover network file server timeout errors. To find the problematic node, you must either systematically isolate nodes by monitoring each node for a prolonged period or temporarily insert a managed hub.

EXAMPLE: You notice that the captured error packet failed FCS because it was corrupted by a regular pattern during transmission. A possible reason for this occurrence is a Jabbering node. This explanation makes sense because FCS/Jabber frames increased linearly when you were monitoring the live network.

Network File Server Timeouts Reference

This section explains terms that are relevant to network file server timeouts and provides additional conceptual and problem analysis detail.

Jabbering

A condition in which a node transmits illegal-length packets and is possibly not operating within carrier specifications. In effect, another node has written bad data over a valid packet. This bad data is often interpreted as a repeated sequence of data.

Network File System (NFS) Protocol

A distributed file system protocol developed by Sun Microsystems that allows a computer system to access files over a network as if they were on its local disks. This protocol has been incorporated into products by more than 200 companies. It is now a de facto Internet standard.

NFS is one protocol in the NFS suite of protocols, which includes NFS, RPC, XDR (External Data Representation), and others. These protocols are part of a larger architecture that Sun Microsystems refers to as Open Network Computing (ONC). ONC is a distributed applications architecture designed by Sun and currently controlled by a consortium led by Sun.

Troubleshooting TCP/IP - Detailed Steps

This article shows how to troubleshoot TCP/IP connectivity between computers on a Windows network. If you haven't already done so, disable XP's Internet Connection Firewall on all local area network connections and remove all firewall programs on the network. Improperly configured firewalls are the most common cause of TCP/IP problems.

Open a Command Prompt Window

For many of these steps, you'll be typing at the command prompt. To open a command prompt window in Windows 2000 or XP, click Start | Run, type cmd in the box, and click OK. To open a command prompt window in Windows 95, 98, or Me, click Start | Run, type command in the box, and click OK. Type one command per line, and press Enter after each one to execute it. To close the command prompt window, use the exit command.

Determine the TCP/IP Settings

Determine the TCP/IP settings of each computer on the local area network. In XP, open the Network Connections folder, right click the LAN connection, and click Status | Support | Details. For example, here are the Status and Details views for the LAN connection on an Internet Connection Sharing host.

In Windows 95/98/Me, click Start | Run, type winipcfg in the box, and click OK. Select the LAN adapter from the menu and click More Info. Here's the winipcfg view for an ICS client running Windows Me.

You can also see the TCP/IP settings from the command prompt. This is especially convenient if a computer has more than one network adapter. Use the ipconfig /all command, which is available in all versions except Windows 95. The output from this command can be long, so it's best to write it to a file. Specify the file name in the command this way:

ipconfig /all >ipconfig.txt
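Once the output is in a file, the settings can be pulled out with a short script. The following Python sketch assumes the usual "Name . . . : value" layout of ipconfig output; the sample text is invented for illustration, and in practice you would read ipconfig.txt instead. For simplicity it keeps only the first value seen for each setting, so it does not separate multiple adapters.

```python
import re

def parse_ipconfig(text):
    """Collect 'Name . . . : value' lines into a dictionary."""
    settings = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        value = value.strip()
        if not sep or not value:
            continue  # blank lines, headings, and continuation lines
        name = re.sub(r"[ .]+$", "", key.strip())  # drop the dot leaders
        settings.setdefault(name, value)
    return settings

# Invented sample in the style of ipconfig output.
sample = """
Windows IP Configuration

Ethernet adapter Local Area Connection:

        IP Address. . . . . . . . . . . . : 192.168.0.101
        Subnet Mask . . . . . . . . . . . : 255.255.255.0
        Default Gateway . . . . . . . . . : 192.168.0.1
"""

settings = parse_ipconfig(sample)
print(settings["IP Address"], settings["Default Gateway"])
```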

Description of TCP/IP Settings

Here are the TCP/IP settings that are used in network troubleshooting:

• IP Address - Unique address assigned to a network adapter. A computer with multiple network adapters has an IP address for each one, and each one must be in a different subnet.

• Subnet Mask - Used in conjunction with the IP address to determine which subnet an adapter belongs to. At the simplest level, communication is only possible between two network adapters when they're in the same subnet.

• Default Gateway - IP address of a computer or router, on one of this computer's local area networks, that knows how to communicate with subnets not present on this computer. For an Internet connection, the default gateway is a router belonging to your Internet service provider, and all access to sites on the Internet goes through it. For an ICS client, the default gateway is the ICS host. If you use a hardware router, it serves as the default gateway.

• DHCP Server - If an adapter is configured to obtain an IP address automatically, this is the address of the server that provides it. It could be your ISP, an ICS host, or a hardware router.

• DNS Servers - IP address of one or more Domain Name Server computers. DNS servers translate Internet names to their IP addresses (like 63.146.109.227).
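To see how the IP address and subnet mask work together, the following Python sketch uses the standard ipaddress module to test whether two adapters fall in the same subnet. The addresses are illustrative values, and the helper name same_subnet is made up for this example.

```python
import ipaddress

def same_subnet(ip_a, ip_b, mask):
    """Return True if both addresses belong to the same subnet under mask."""
    net_a = ipaddress.ip_interface(f"{ip_a}/{mask}").network
    net_b = ipaddress.ip_interface(f"{ip_b}/{mask}").network
    return net_a == net_b

print(same_subnet("192.168.1.101", "192.168.1.123", "255.255.255.0"))
print(same_subnet("192.168.1.101", "192.168.2.5", "255.255.255.0"))
```

With a mask of 255.255.255.0, only the first three numbers of each address decide the subnet, which is why the second pair does not match.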

Subnets

See our article on subnets for a brief description of how they work. For more details, see this Microsoft Knowledge Base article.

If two computers are supposed to be on the same subnet but aren't, something is wrong with the network hardware or software configuration. This is most likely to happen when one of them receives an IP address of 169.254.x.x, which indicates that:

• It's configured to obtain an IP address automatically.
• It couldn't find a DHCP server on the network to make the assignment.
• Windows assigned it an Automatic Private IP Address.

See our article on Specific Networking Problems and Their Solutions for more information.
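A quick way to spot this condition in a list of addresses is to test each one against the 169.254.0.0/16 range. This is a minimal Python sketch; the function name is_apipa is an illustrative choice.

```python
import ipaddress

APIPA_RANGE = ipaddress.ip_network("169.254.0.0/16")

def is_apipa(address):
    """Return True if the address is an Automatic Private IP Address."""
    return ipaddress.ip_address(address) in APIPA_RANGE

print(is_apipa("169.254.12.7"))
print(is_apipa("192.168.1.101"))
```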

Pinging

The ping command is the basic tool for testing TCP/IP connectivity. It sends a special packet (called ICMP Echo) to a particular IP address and looks for a reply. If everything is working right, the reply comes back. If not, the ping times out in a few seconds. By default, the ping command repeats the process four times. Here's an example of an ICS client computer pinging a Windows XP Home Edition ICS host, using the host's IP address and its computer name.

When ping fails, you'll see one of these error messages:

• Request timed out - The IP address is valid, but there's no reply from it. If the IP address is on a local area network, the most likely cause is a firewall program blocking the ping.

• Unknown host <name> or Ping request could not find host <name> - The computer name doesn't exist on the local area network. Make sure that NetBIOS over TCP/IP is enabled.

• Destination host unreachable - The IP address isn't on a local area network, and the default gateway can't access it. Either there's no default gateway, its address is wrong, or it isn't functioning.
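The messages above can be matched against saved ping output with a small lookup. This Python sketch only looks for the message text listed in this section; exact wording can differ between Windows versions, so treat the patterns as assumptions to adjust.

```python
PING_FAILURES = {
    "request timed out": "No reply; on a LAN, a firewall is the most likely cause",
    "unknown host": "Name not found; check NetBIOS over TCP/IP",
    "could not find host": "Name not found; check NetBIOS over TCP/IP",
    "destination host unreachable": "No working default gateway for this address",
}

def diagnose(ping_output):
    """Return a hint for the first known failure message, or None."""
    text = ping_output.lower()
    for marker, hint in PING_FAILURES.items():
        if marker in text:
            return hint
    return None

print(diagnose("Request timed out."))
print(diagnose("Ping request could not find host win98."))
```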

Pinging the Local Area Network

Here is a series of ping commands to use in finding where a problem occurs on a local area network. Run them in the order shown, and don't go on to the next command until all of the previous commands work properly. In this example:

• The computer being tested is named Winxp, with IP address 192.168.1.101.
• There's another computer on the network, named Win98, with IP address 192.168.1.123.

Substitute the appropriate IP addresses and computer names for your network.

Command              Target                          What Ping Failure Indicates
ping 127.0.0.1       Loopback address                Corrupted TCP/IP installation
ping localhost       Loopback name                   Corrupted TCP/IP installation
ping 192.168.1.101   This computer's IP address      Corrupted TCP/IP installation
ping winxp           This computer's name            Corrupted TCP/IP installation
ping 192.168.1.123   Another computer's IP address   Bad hardware or NIC driver
ping win98           Another computer's name         NetBIOS name resolution failure

To fix a corrupted TCP/IP installation on Windows XP, follow the steps in this Microsoft Knowledge Base article. For Windows 95/98/Me, un-install the TCP/IP protocol in Control Panel | Network, reboot, and re-install it. If that doesn't fix it, use this procedure on Windows 95 or 98.
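The ordered ping sequence for the local area network can be automated along these lines. This Python sketch is an illustration under assumptions: the targets are the example names and addresses from this section, system_ping relies on the exit code of the operating system's ping command (which is not always a reliable success indicator on Windows), and the usage example substitutes a stand-in function so that it runs without a network.

```python
import platform
import subprocess

# (target, what a failure indicates), in the order the checks should run.
LAN_CHECKS = [
    ("127.0.0.1", "Corrupted TCP/IP installation"),
    ("localhost", "Corrupted TCP/IP installation"),
    ("192.168.1.101", "Corrupted TCP/IP installation"),
    ("winxp", "Corrupted TCP/IP installation"),
    ("192.168.1.123", "Bad hardware or NIC driver"),
    ("win98", "NetBIOS name resolution failure"),
]

def system_ping(target):
    """Send one echo request with the operating system's ping command."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    result = subprocess.run(["ping", count_flag, "1", target],
                            capture_output=True)
    return result.returncode == 0

def first_failure(checks, ping=system_ping):
    """Run the checks in order and stop at the first one that fails."""
    for target, meaning in checks:
        if not ping(target):
            return target, meaning
    return None

# Stand-in ping for demonstration: everything answers except the name win98.
print(first_failure(LAN_CHECKS, ping=lambda target: target != "win98"))
```

Stopping at the first failure mirrors the rule of not moving on until all previous commands work. To test a real network, call first_failure(LAN_CHECKS) with your own names and addresses.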

Pinging the Internet

You can also use ping to find a problem with Internet access. Run these commands in the order shown, and don't go on to the next command until all of the previous commands work properly. Use the Default Gateway and DNS Server addresses that you got from the winipcfg or ipconfig /all command.

Command                  Target                What Ping Failure Indicates
ping w.x.y.z             Default Gateway       Default Gateway down
ping w.x.y.z             DNS Server            DNS Server down
ping w.x.y.z             Web site IP address   Internet service provider or web site down
ping www.something.com   Web site name         DNS Server down or web site down

REFERENCE

www.practicallynetwork.com

www.computerhope.com

www.wikipedia.org

Disabling the Offending Interface

Because broadcast storms can ultimately cause your whole network to become unavailable, take action immediately to disable the offending interface. You can enable the interface again after you have corrected the problem.

Address Tracker

Use Address Tracker to locate the interface that is causing the broadcast storm. Use Device View to disable the port.

Follow these steps:

1. In the Find Address window, enter the address of the interface that seems to be receiving the broadcast traffic.

You can copy the MAC or IP address from the Status Watch report and paste it into Address Tracker's Enter the Address You Want to Find field.

2. Click Find Now.

Search displays the device name.

3. Use Transcend Central to launch Device View and disable the port.

Disabling the port stops the broadcast storm before it interferes with all vital network traffic. You can re-enable this interface using Device View or the device's console later.

Correcting Spanning Tree Misconfigurations

Spanning Tree does not cause broadcast storms, but a loop in your Spanning Tree topology can create data that looks like a storm. A loop can occur in your topology if:

• Someone disables Spanning Tree on a port
• You set up your Spanning Tree configuration incorrectly

Device View

Use Device View to disable any Spanning Tree port that has a repeater attached to it and to correct Spanning Tree misconfigurations.

To correct Spanning Tree misconfigurations, use Device View to disable Spanning Tree Protocol (STP) for a port on a SuperStack® II Switch 1000, Switch 3000, Switch 3000 10/100, Switch 9000SX, Desktop Switch, LinkBuilder® FMS II Bridge/Management Module, or CoreBuilder 6000.

To disable the STP port state for a port on a SuperStack II switch:

1. Select a port and click the right mouse button.

2. From the shortcut menu, select Configure.

3. In the Port section, click the STP tab.

4. From the STP Port State list box, select Disabled.

5. Click Apply.

To disable the STP port state for a port on a LinkBuilder FMS II Bridge/Management Module:

1. Double-click the module.

2. From the shortcut menu, select Configure Bridge.

3. In the Port section, click the STP tab.

Broadcast Storms Reference

This section explains terms that are relevant to broadcast storms and provides additional conceptual and problem analysis detail.

Broadcast Packets

Broadcast packets, which are a normal part of network operation, are transmitted by a device to a broadcast address. For example, IP networks use broadcasts to resolve network addresses using Address Resolution Protocol (ARP). IPX networks use a large number of broadcast packets to operate most effectively.

Problems arise when broadcast packets endlessly propagate throughout the network, which increases the traffic volume on your network and the CPU time that each host spends processing and discarding unwanted broadcast packets.

Multicast Packets

Multicast packets, which are a normal part of network operation, are transmitted by a device to a multicast group address. Hosts that want to receive the packets indicate that they want to be members of the multicast group, and then multicast packets are distributed to that group. For example, multicast packets support the Spanning Tree Protocol. Multicast applications and underlying multicast protocols control multimedia traffic and shield hosts from processing unnecessary broadcast traffic. However, multicast traffic can also cause storms that saturate your network.

Duplicate Addresses

Use these sections to identify and correct problems caused by duplicate MAC and IP addresses:

• Duplicate Addresses Overview
• Finding Duplicate MAC Addresses
• Finding Duplicate IP Addresses

See Duplicate Addresses Reference for additional conceptual and problem analysis detail.

Networks sometimes generate duplicate MAC and IP addresses. Because duplicate addresses can cause problems with packet delivery, resolve them as soon as possible.

Understanding the Problem

Duplicate MAC addresses are caused by data link layer problems with Fiber Distributed Data Interface (FDDI) media and the passing of tokens on the FDDI ring. Duplicate IP addresses are caused by network layer problems. See these sections for more information about causes of duplicate addresses:

• Duplicate MAC Addresses
• Duplicate IP Addresses

Identifying the Problem

Identify duplicate MAC and IP addresses by following the instructions in these sections:

• Finding Duplicate MAC Addresses
• Finding Duplicate IP Addresses

Solving the Problem

Identify the cause of the duplicate address (such as user error or a hardware problem) and fix the problem, if possible.

Finding Duplicate MAC Addresses

To find out if duplicate MAC addresses are occurring, monitor your network using Status Watch.

Status Watch

The Status Watch FDDI Status tool identifies duplicate FDDI MAC addresses, and Status Watch reports when two or more MACs on the same ring have the same MAC address (a Duplicate Address condition).

Follow these steps:

1. In the Status Watch Summary View window, determine if any FDDI Status conditions are reported. If there are, double-click the table cell value to display the Device List window.

Another approach is to examine only the devices that you know reside on your FDDI ring. In the Status Watch main window, red device icons indicate that a threshold has been exceeded.

2. Select a device.

• If you selected the device from the Device List window, the real-time report for that device appears in the Status Watch main window.

• If you selected the device from the main window, also select the FDDI Status tool to view the real-time report.

3. Determine if a Duplicate Address condition caused the FDDI Status tool to trigger a Critical or Warning status for that device.

In Status Watch, you can specify the status severity level to apply to a Duplicate Address condition.

Finding Duplicate IP Addresses

To find out if duplicate IP addresses are occurring, monitor your network using these applications:

• Address Tracker - To find duplicate IP addresses on 3Com devices and their attached networks

• LANsentry Manager® - To find duplicate IP addresses that are collected by probes gathering RMON2 SmartAgent® data from the Enterprise Communications Analysis Module (ECAM) downloaded on your network devices

Address Tracker

Use Address Tracker to determine when and where duplicate IP addresses occur.

Follow these steps:

1. From the Find Address menu, select Find Duplicate IP Addresses.

2. Click Find Now to start your search.

LANsentry Manager

Use the Duplicates table in LANsentry Manager to compile a list of all stations with duplicate IP addresses. This table is available only on probes that have downloaded RMON2 (ECAM) SmartAgent software.

Follow these steps:

1. From the LANsentry Manager Address Map menu, select Duplicates. Address Map data is displayed as a table.

2. To export the contents of the table, click Export to launch the Data Export dialog box.

Duplicate Addresses Reference

This section explains terms that are relevant to duplicate addresses and provides additional conceptual and problem analysis detail.

Duplicate MAC Addresses

Each device on your network has a unique MAC address. This address identifies a single device on the network, allowing packets to be delivered to correct destinations.

Packets are delivered to their destinations by means of MAC-address-to-IP-address translation that the Address Resolution Protocol (ARP) provides. Therefore, if MAC addresses are duplicated on the network, ARP caches of routing devices contain erroneous destinations. In FDDI, devices monitor network traffic, searching for their own MAC address in each packet to determine whether to decode the packet. If MAC addresses are not unique, two stations cannot be distinguished from each other.

Duplicate MAC addresses can occur for the following reasons:

• Someone has manually configured a MAC address for a device instead of using the address that the vendor supplied or allowing it to be assigned dynamically, and this address is also assigned to a different device.

• In rare circumstances, loops in a bridged network can cause a MAC hardware problem or an address learning problem that creates a duplicate MAC address entry in the bridging address table.

• On DECnet Phase 4 networks, MAC addresses are set from the DECnet address. A duplicate NET address can cause a duplicate MAC address.

Duplicate IP Addresses

Because IP addresses are critical for transmission of packets on TCP/IP networks, resolve duplicate IP addresses immediately.

Duplicate IP addresses can occur when someone has configured an IP address that is identical to an IP address that is assigned to a different device. Address assignments, although possible for you to configure manually, are usually made using one of these protocols:

• Dynamic Host Configuration Protocol (DHCP) - Allows your network to dynamically assign IP addresses to nodes. With this protocol, a DHCP server temporarily assigns an IP address to a node, or you can statically configure addresses as needed.

• BOOTstrap protocol (BootP) - Allows you to statically assign IP addresses to nodes. This protocol is more efficient than RARP.

• Reverse ARP (RARP) - Allows you to statically assign IP addresses to nodes. However, because this protocol relies on the MAC address to identify the node, you cannot use it on networks that dynamically assign hardware addresses.
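Outside the management applications, the same idea can be applied to any list of IP-to-MAC pairs, such as entries copied from ARP caches. The Python sketch below flags an IP address that has been seen with more than one MAC address; the input format and the sample addresses are assumptions made for illustration.

```python
from collections import defaultdict

def duplicate_ips(observations):
    """Return {ip: [macs]} for every IP address seen with more than one MAC."""
    macs_by_ip = defaultdict(set)
    for ip, mac in observations:
        macs_by_ip[ip].add(mac.lower())
    return {ip: sorted(macs)
            for ip, macs in macs_by_ip.items() if len(macs) > 1}

# Hypothetical observations gathered from several devices.
observations = [
    ("192.168.1.101", "00-A0-24-11-22-33"),
    ("192.168.1.123", "00-A0-24-44-55-66"),
    ("192.168.1.101", "00-60-08-77-88-99"),
]

print(duplicate_ips(observations))
```

An address that legitimately moved to new hardware will also show up here, so treat each hit as something to investigate rather than as proof of a conflict.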

Ethernet Packet Loss

Use these sections to identify and correct Ethernet packet loss:

• Ethernet Packet Loss Overview
• Searching for Packet Loss

See Ethernet Packet Loss Reference for additional conceptual and problem analysis detail.

If your Ethernet network shows signs of congestion, it may be experiencing packet loss. When your network is congested, utilization is usually high, packets are discarded because buffers are full, and collision rates are up. Problems related to Collisions are often at the heart of packet loss.

Understanding the Problem

Collisions are normal in Ethernet networks. In many cases, Collision rates of 50 percent do not cause a large decrease in throughput. The Collision rate helps mark the upper limit on your network (the maximum percentage of collisions that your network can bear), which is usually around 70 percent. If Collisions increase above this upper limit, your network can become unreliable.

When the Collision rates increase, so do Excessive Collisions, which causes a delay in transmitting data. An increase in Collisions also indicates that network utilization and network errors, such as FCS Errors, are probably increasing.

The real packet problems to watch for, however, are undetected collisions that show up as Late Collisions.
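The 50 percent and 70 percent figures above can be turned into a simple check. In this Python sketch, the collision rate is assumed to be collisions as a percentage of frames sent, which is one common reading and not a definition given in this section; the labels are illustrative.

```python
def collision_rate(collisions, frames_sent):
    """Collisions as a percentage of frames sent (assumed definition)."""
    if frames_sent <= 0:
        raise ValueError("frames_sent must be positive")
    return 100.0 * collisions / frames_sent

def assess(rate, upper_limit=70.0):
    """Classify a collision rate against the figures quoted above."""
    if rate > upper_limit:
        return "above upper limit"
    if rate > 50.0:
        return "approaching upper limit"
    return "typical"

print(assess(collision_rate(30, 100)))
print(assess(collision_rate(60, 100)))
print(assess(collision_rate(75, 100)))
```

Because the upper limit is described as "usually around 70 percent", it is left as a parameter so that you can set it from your own network baseline.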

Identifying the Problem

To identify that your network's problem is related to packet loss, verify that frames are being dropped on your network by examining this packet loss data:

• Alignment Errors
• Collisions
• Excessive Collisions
• FCS Errors
• CRC Errors
• Late Collisions
• Receive Discards
• Too Long Errors
• Too Short Errors
• Transmit Discards

The process of identifying the problem is discussed in Searching for Packet Loss.

Solving the Problem

If you notice that packet loss data is consistently high, then your network is too congested. In this case, segment your network with the appropriate network device (such as a switch or router). If Collision data shows increases but your network's utilization is the same, then your network may have a physical problem, such as cabling that is too long. Other problems that packet loss data can indicate include:

• Faulty connectors or improper cabling
• Excessive numbers of repeaters between network devices
• Defective Ethernet transceivers or controllers

Possible solutions to these problems are explained in the procedures in Searching for Packet Loss.

Searching for Packet Loss

When you look for packet loss, use the following applications:

• Status Watch - For Ethernet and MIB-II data collection using SNMP polling
• LANsentry Manager Network Statistics Graph - For RMON data collection using an RMON probe
• Device View - On a per-device basis, you can evaluate statistics for any port on the device

Status Watch

Status Watch monitors:

• Alignment Errors
• Excessive Collisions
• FCS Errors
• Receive Discards
• Transmit Discards

Follow these steps:

1. Determine if the thresholds for the Alignment Errors tool and FCS Errors tool are being exceeded.

Table 16 identifies the problems that this data can indicate and your possible actions. For information about problems related to a nonstandard Ethernet implementation, see Nonstandard Ethernet Problems.

Table 16. Alignment Errors, FCS Errors, and CRC Errors Data

Possible Problem: Faulty cabling
Possible Action: Examine the cable and cable connections for breaks or damage.

Possible Problem: Network noise
Possible Action: Look for improper cabling, faulty cable, faulty network equipment, or cables that are too close to equipment that emits electromagnetic interference (lamps, for example).

Possible Problem: Faulty transceiver
Possible Action: Use an analyzer to identify the problematic transceiver. If necessary, replace the transceiver, network adapter, or station.

Possible Problem: Fault at the transmitting end station
Possible Action:
1. Locate the source of the errors by looking at the module and port statistics.
2. Verify the correct operation of the transceiver or adapter card of the device that is connected to the problem port.
3. If the card appears to be operating correctly, examine the cable and cable connections for breaks or damage.

Possible Problem: Station powering up or down
Possible Action: None required. Early implementations of Ethernet transceivers generate a significant amount of in-band noise when powering up; they frequently cause Alignment Errors and FCS Errors in an otherwise stable network. When powering up, some software drivers for Ethernet controllers also initiate Time Domain Reflectometry (TDR) tests to test the Ethernet media. Network monitors report TDR tests as Alignment Errors and FCS Errors.

Possible Problem: Faulty adapter
Possible Action: Replace the adapter.

2. Determine if the Excessive Collisions tool threshold is being exceeded. Table 17 identifies the problems that this data can indicate and your possible actions.

Table 17. Collisions and Excessive Collisions Data

Possible Problem: Busy network
Possible Action: Use a bridge, router, or switch to reconfigure your network into segments with fewer stations.

Possible Problem: Faulty device (adapter, switch, hub, and the like) that does not listen before broadcasting. This problem increases the incidence of all types of collisions.
Possible Action: Isolate each adapter to see if the problem stops.

Possible Problem: Network loop
Possible Action: Ensure that no redundant connections to the same station have both connections active simultaneously.

3. Determine if the Receive Discards and Transmit Discards tools' thresholds are being exceeded.

If these errors are high in conjunction with the data that you learned in steps 1 and 2, then your network is overloaded. Segment your network.

LANsentry Manager Network Statistics Graph

Use the LANsentry Manager Network Statistics graph to view data for:

• Collisions
• Late Collisions
• Bandwidth Utilization
• CRC Errors
• Too Long Errors
• Too Short Errors

Follow these steps:

1. Display a Network Statistics graph for the local Ethernet segment on which users have reported poor performance.

This graph shows the most recent trend in Collision rates. If you have set up a History sample, you can also look at the historical trend. If a number of segments are connected by repeaters, examine the graph for each Ethernet segment.

2. Analyze Utilization and Collision rates to determine whether collisions are caused by an overloaded segment or a faulty component.

• If Utilization rates are high - The collisions are probably caused by an overloaded segment. If you have added nodes or new applications to your network, consider reconfiguring the cabling system, using bridges and routers to filter out remote collisions and to keep local traffic on one segment. This action should level the network load.

• If Utilization rates are stable and appear normal - The collisions are probably caused by faulty components. In this case, do the following:

  • If the network consists of repeaters - Compare the Network Statistics graphs for each segment connected to the repeater. Because repeaters repeat traffic across all connected segments (which makes many segments seem like one network), you should see similar levels of traffic on all segments. One segment that shows dissimilar levels of traffic and collisions may indicate faulty hardware. In this case, monitor several collisions to track the source station that is transmitting too soon after collisions, and repair the station. Packets that are transmitted too soon after collisions are unlikely to be valid. See Table 17 for more information about Collisions.

  • On other networks - Determine the segment cable length.
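The decision in step 2 can be sketched in Python. This is only an illustration: the function name, the `baseline_collision_pct` argument, and the 50 percent cut-off for "high" utilization are assumptions, not values read from LANsentry Manager.

```python
def diagnose_collisions(utilization_pct, collision_pct,
                        baseline_collision_pct, high_utilization_pct=50.0):
    """Rough triage of a rising Collision rate (all thresholds are assumed)."""
    if collision_pct <= baseline_collision_pct:
        # Collisions are at or below what this segment normally shows.
        return "normal"
    if utilization_pct >= high_utilization_pct:
        # Collisions rose together with load: the segment is probably overloaded.
        return "overloaded segment"
    # Collisions rose while load stayed normal: look at hardware and cabling.
    return "suspect faulty component or cabling"


print(diagnose_collisions(utilization_pct=80, collision_pct=40,
                          baseline_collision_pct=10))
```

The baseline matters more than any fixed number: the text above compares collision trends against what is normal for the segment, so the sketch takes that baseline as an input instead of hard-coding it.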

3. Examine the CRC Errors and Late Collisions, which often indicate cabling or component problems.

Table 16 identifies the problems that CRC Errors can indicate and your possible actions. Table 18 identifies the problems that Late Collisions data can indicate and your possible actions.

Table 18. Late Collisions Data

Possible Problem: Cabling problems
• Segment too long
• Failing cable
• Segment not grounded properly (noise)
• Improper termination
• Taps too close (10BASE-5 and 10BASE-2 only)
• Noisy cable
Possible Action: Correct the cabling problem by doing one or more of the following:
• Reduce the segment length
• Replace the cable
• Ground the cable
• Terminate the cable correctly
• Check the taps
• Check for cables too close to equipment that emits electromagnetic interference

Possible Problem: Component problems
• Deaf or partially deaf node
• Failing repeater, transceiver, or controller cards
Possible Action: Correct the component problem by doing one of the following:
• Trace the failing component and replace it
• Replace the NIC or the transceiver

4. Trace Too Short Errors and Too Long Errors to the sender.

These errors often indicate faulty routers or LAN drivers and transceiver problems. Table 19 identifies the problems that this data can indicate and your possible actions.

Possible Problem: A transceiver on your network is adding bits to the packets that are transmitted by the attached station.
Possible Action:
1. Use a network analyzer to identify the problematic transceiver.
2. If necessary, replace the transceiver, network adapter, or station.

Possible Problem: The jabber protection mechanism on a transceiver has failed; it can no longer protect the network from the jabbering produced by the attached station.
Possible Action: Replace the network card.

Possible Problem: Excessive noise on the cable.
Note: Some 10/100 Mbps cards that autodetect the network speed may connect to the network at the wrong speed, causing excessive noise.
Possible Action: Check for improper cabling, faulty cable, faulty network equipment, or cables too close to noisy electronic equipment (lamps, for example). If your network card autodetects the network speed and you have ruled out other problems, manually configure the network speed.

Possible Problem: Faulty routers (two different network types are connected, and the router is not enforcing proper frame size restrictions)
Possible Action: Notify the manufacturer.

Possible Problem: Faulty LAN driver
Possible Action: Replace the driver.

Possible Problem: A normal condition on a LinkSwitch® 1000, LinkSwitch® 3000, or CoreBuilder 5000 FastModule. If you use maximum-sized (1518-byte) Ethernet frames, the device's VLT-enabled ports add a frame tag of 4 bytes, resulting in a misleading Too Long Frame error.
Possible Action: These frames are passed successfully but will create the Too Long Frame error message. If you want to eliminate the error message, reduce your Ethernet packet frames by 4 bytes.

Device View

Device View allows you to display a variety of port and device-level statistics relevant to Ethernet packet loss. Table 20 describes these statistics and their use in troubleshooting.

Table 20. Activity and Error Statistics in Device View

Statistics Group: Activity
Description: Displays the total network activity and errors on the selected port.
Use in Troubleshooting: This data shows readable packets, broadcast packets, Collisions, total errors, and runts, which cause Too Short Errors. You can interpret this data in the following ways:
• The presence of runts can often be caused by Collisions; however, if the values increase at specific times of the day, it may indicate that you need to change the network topology to manage the traffic more efficiently (for example, with switches or routers).
• Runts can also be caused by a badly terminated coax cable.
• Large numbers of runts not associated with high levels of collisions can indicate a transmission problem (examine the cable).
• Particularly high numbers of Collisions compared to the total number of readable packets can point to a hardware problem (a bad adapter) or to a data loop.
• A high proportion of Broadcast Packets (>10%) on a heavily utilized network (>50% of available bandwidth) can point to an incorrectly configured bridge or router on the network.

Statistics Group: Errors
Description: Displays the number of frames with errors on the selected port.
Use in Troubleshooting: The significance of errors depends on accompanying errors and prevailing network conditions. See the following error data for more information:
• Alignment Errors: Table 16
• FCS Errors: Table 16
• Too Long Errors: Table 19
• Too Short Errors, or runts: Table 19
• Late Collisions: Table 18

To display Activity and Errors statistics for a device or port, follow these steps:

1. Select the required port or device.

2. From the shortcut menu, select Activity or Errors.

The statistics available depend on the type of port or device selected. See Table 20 for troubleshooting information.

Ethernet Packet Loss Reference

This section explains terms that are relevant to Ethernet packet loss and provides additional conceptual and problem analysis detail.

Alignment Errors

An Alignment Error indicates a received frame in which both of the following are true:

• The number of bits received is an uneven byte count (that is, not an integral multiple of 8).

• The frame has a Frame Check Sequence (FCS) error.

Alignment Errors often result from MAC layer packet formation problems, cabling problems that cause corrupted or lost data, and packets that pass through more than two cascaded multiport transceivers. See FCS Errors for more information about interpreting Alignment Errors.


Collisions

Collisions indicate that two devices detect that the network is idle and try to send packets at exactly the same time (within one round-trip delay). Because only one device can transmit at a time, both devices must stop sending and attempt to retransmit. Collisions are detected by the transmitting stations.

The retransmission algorithm helps to ensure that the packets do not retransmit at the same time. However, if the two devices retry at nearly the same time, packets can collide again; the process repeats until either the packets finally pass onto the network without collisions, or 16 consecutive collisions occur and the packets are discarded.
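The retransmission algorithm that IEEE 802.3 specifies is truncated binary exponential backoff: after the nth consecutive collision, a station waits a random number of slot times between 0 and 2^k - 1, where k = min(n, 10), and it gives up after 16 attempts. The following Python sketch models that behavior. The function names and the `collides` callback are hypothetical and exist only for illustration.

```python
import random

MAX_ATTEMPTS = 16    # the frame is discarded after 16 consecutive collisions
BACKOFF_LIMIT = 10   # the exponent stops growing after the 10th collision


def backoff_slots(collision_count, rng=random):
    """Slot times to wait after the given number of consecutive collisions."""
    if not 1 <= collision_count < MAX_ATTEMPTS:
        raise ValueError("collision_count must be between 1 and 15")
    k = min(collision_count, BACKOFF_LIMIT)
    return rng.randrange(2 ** k)  # uniform over 0 .. 2**k - 1


def send_frame(collides, rng=random):
    """Try to send one frame; collides(attempt) reports whether it collided.

    Returns (delivered, attempts_used, total_slots_waited).
    """
    waited = 0
    for attempt in range(1, MAX_ATTEMPTS + 1):
        if not collides(attempt):
            return True, attempt, waited
        if attempt < MAX_ATTEMPTS:
            waited += backoff_slots(attempt, rng)
    return False, MAX_ATTEMPTS, waited  # counted as an Excessive Collision


# A frame that collides on its first two attempts and then gets through.
print(send_frame(lambda attempt: attempt < 3))
```

Because each station draws its own random delay, two stations that collided once are unlikely to pick the same slot again, which is why repeated collisions usually point to congestion or faulty hardware.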

CRC Errors

A Cyclic Redundancy Check (CRC) Error is an RMON statistic that combines FCS Errors and Alignment Errors. These errors indicate that packets were received with:

• A bad FCS and an integral number of octets (FCS Errors)
• A bad FCS and a non-integral number of octets (Alignment Errors)

CRC Errors can cause an end station to freeze. If a large number of CRC Errors are attributed to a single station on the network, replace the station's network interface board. Typically, a CRC Error rate of more than 1 percent of network traffic is considered excessive.
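The two cases can be told apart in a few lines of Python. This is a sketch under stated assumptions: the frame is supplied as bytes running from the destination address through the 4-octet FCS (no preamble), and `trailing_bits` is a hypothetical count of leftover bits received after the last whole octet. It relies on the fact that the Ethernet FCS uses the same CRC-32 as `zlib.crc32`, appended least significant byte first.

```python
import struct
import zlib


def fcs_ok(frame: bytes) -> bool:
    """Return True if the last four octets match the CRC-32 of the rest."""
    if len(frame) < 4:
        return False
    body, fcs = frame[:-4], frame[-4:]
    return struct.pack("<I", zlib.crc32(body)) == fcs


def classify_frame(frame: bytes, trailing_bits: int = 0) -> str:
    """Classify a received frame as ok, an FCS Error, or an Alignment Error."""
    if fcs_ok(frame):
        return "ok"
    # Bad FCS plus a non-integral number of octets is an Alignment Error.
    return "alignment error" if trailing_bits % 8 else "fcs error"


def crc_rate_excessive(crc_errors: int, total_frames: int) -> bool:
    """Apply the 1 percent rule of thumb quoted above."""
    return total_frames > 0 and crc_errors / total_frames > 0.01


body = bytes(60)                                   # 60 octets of zero payload
good = body + struct.pack("<I", zlib.crc32(body))  # append a valid FCS
bad = bytes([good[0] ^ 0x01]) + good[1:]           # flip one bit
print(classify_frame(good), classify_frame(bad), classify_frame(bad, 3))
```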

Excessive Collisions

Excessive Collisions indicate that 16 consecutive collisions have occurred, usually a sign that the network is becoming congested. For each excessive collision count (or after 16 consecutive collisions), a packet is dropped. If you know the normal rate of excessive collisions, then you can determine when the rate of packet loss is affecting your network's performance. See Knowing Your Network's Configuration for more information.

FCS Errors

Frame Check Sequence (FCS) Errors, a type of CRC, indicate that frames received by an interface are an integral number of octets long but do not pass the FCS check. The FCS is a mathematical way to ensure that all the frame's bits are correct without having the system examine each bit and compare it to the original. Packets with Alignment Errors also generate FCS Errors.

Both Alignment Errors and FCS Errors can be caused by equipment powering up or down or by interference (noise) on unshielded twisted-pair (10BASE-T) segments. In a network that complies with the Ethernet standard, FCS or Alignment Errors indicate bit errors during a transmission or reception. A very low rate is acceptable. Although Ethernet allows a 1 in 10^8 bit error rate, typical Ethernet performance is 1 in 10^12 or better.
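To relate a bit error rate to the frame errors that a monitor counts, a short calculation helps. The sketch below assumes independent bit errors, which is a simplification, and the function name is hypothetical.

```python
import math


def frame_error_probability(bit_error_rate, frame_octets=1518):
    """Probability that a frame of the given size has at least one bit error."""
    bits = frame_octets * 8
    # 1 - (1 - p)**bits, written with log1p/expm1 to stay accurate for tiny p.
    return -math.expm1(bits * math.log1p(-bit_error_rate))


for ber in (1e-8, 1e-12):
    print(f"BER {ber:g}: frame error probability {frame_error_probability(ber):.3g}")
```

At the allowed limit of 1 in 10^8, roughly one maximum-sized frame in 8,000 would be damaged; at 1 in 10^12 the figure is roughly one in 80 million, which is why sustained FCS Errors on a healthy segment deserve attention.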

Late Collisions

Late Collisions indicate that two devices have transmitted at the same time, but cabling errors (most commonly, excessive network segment length or repeaters between devices) prevent either transmitting device from detecting a collision. Neither device detects a collision because the time to propagate the signal from one end of the network to the other is longer than the time to put the entire packet on the network. As a result, neither of the devices that cause the late collision senses the other's transmission until the entire packet is on the network.

Although late collisions occur for small packets, the transmitter cannot detect them. As a result, a network suffering measurable Late Collisions for large packets is losing small packets as well.
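The timing argument can be made concrete with a little arithmetic. The minimum Ethernet frame is 64 octets (512 bits), so at 10 Mbps a station finishes sending it in 51.2 microseconds. The Python sketch below assumes, as a simplified model, that a collision must return to the sender within that window to be detected normally; the function names are hypothetical.

```python
MIN_FRAME_BITS = 64 * 8  # minimum frame of 64 octets = 512 bits


def min_frame_time_us(bit_rate_mbps=10.0):
    """Time to transmit a minimum-sized frame, in microseconds."""
    return MIN_FRAME_BITS / bit_rate_mbps  # bits / (Mbit/s) gives microseconds


def late_collisions_possible(round_trip_delay_us, bit_rate_mbps=10.0):
    """True if a collision can arrive after a minimum frame is fully sent."""
    return round_trip_delay_us > min_frame_time_us(bit_rate_mbps)


print(min_frame_time_us())             # 51.2
print(late_collisions_possible(60.0))  # True: the segment is too long or too deep
print(late_collisions_possible(40.0))  # False
```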

Nonstandard Ethernet Problems

Table 21 lists the symptoms that typically occur if a system violates the Ethernet standard.

Table 21. Symptoms of Common Ethernet Network Problems

Symptoms: FCS Errors and Alignment Errors increase significantly.
Problem: Network cabling is too long.
Notes: If you use a promiscuous network monitor, the number of Late Collisions reported by stations should correlate with the FCS and Alignment Errors reported by the monitor.

Symptoms: FCS and Alignment Errors increase proportionally with interference (sometimes referred to as noise hits).
Problem: Network segment is noisy.
Notes: Typically observed on a 10BASE-T network segment in a noisy environment. If you use multiple promiscuous monitors, the FCS and Alignment Errors among the monitors will not correlate.

If the monitor can track runts, also called Too Short Errors, the number of runt packets should be significantly higher than normal.

Symptoms: FCS and Alignment Errors are much higher than normal.
Problem: Networks do not conform to the access scheme of Carrier Sense Multiple Access with Collision Detect (CSMA/CD).
Notes: Occurs when some implementations of Ethernet in the segment are not entirely compatible with IEEE 802.3 repeaters.

Collision fragments linger on the network long enough to collide with retry packets at the minimum interpacket gap (IPG). The IPG is smaller on one side of the repeated network, causing a lost packet.

Ethernet controllers cannot receive packets that are separated by 4.7 µs or less. Some controllers cannot sustain receptions of packets separated by as much as 9.6 µs. If runt packets are received one after another and are followed by a collision fragment, Ethernet controllers that cannot sustain reception will lose packets.

Receive Discards

Receive Discards indicate that received packets could not be delivered to a high-layer protocol because of congestion or packet errors.

Too Long Errors

A Too Long Error indicates that a packet is longer than 1518 octets (including FCS octets) but otherwise well formed. Too Long Errors are often caused by a bad transceiver, a malfunction of the jabber protection mechanism on a transceiver, or excessive noise on the cable.

Too Short Errors

A Too Short Error, also called a runt, indicates that a packet is fewer than 64 octets long (including FCS octets) but otherwise well formed.
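The two size limits translate directly into a length check. The following Python sketch is illustrative only; the function name and return strings are assumptions.

```python
MIN_FRAME_OCTETS = 64    # shorter frames are runts (Too Short Errors)
MAX_FRAME_OCTETS = 1518  # longer frames are Too Long Errors


def classify_length(frame_octets):
    """Classify an otherwise well-formed frame by its length, FCS included."""
    if frame_octets < MIN_FRAME_OCTETS:
        return "too short (runt)"
    if frame_octets > MAX_FRAME_OCTETS:
        return "too long"
    return "ok"


print(classify_length(63))        # too short (runt)
print(classify_length(1518))      # ok
print(classify_length(1518 + 4))  # too long
```

The last call reproduces the tagged-frame case described under Table 19: a maximum-sized frame that gains a 4-byte tag is counted as too long even though it is delivered successfully.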

Transmit Discards

Transmit Discards indicate that packets were not transmitted because of network congestion.

FDDI Ring Errors

Use these sections to identify and correct FDDI ring errors:

• FDDI Ring Errors Overview
• Identifying Ring Errors

Fiber Distributed Data Interface (FDDI) often corrects its own problems. However, because FDDI cannot correct all errors (especially those related to hardware problems), you should monitor FDDI errors.

Understanding the Problem

FDDI ring errors that you should monitor include:

• Elasticity Buffer Error Condition
• Frame Error Condition
• Frames Not Copied Condition
• Link Error Condition

Identifying the Problem

First, determine the type of FDDI ring errors and where they are occurring. Similar to the way you identify other FDDI problems, identify the upstream and downstream neighbors of the devices that you are monitoring.

Several types of network errors can cause FDDI performance problems. For example, problems with cables or physical connections may result in a link or frame error. Elasticity buffer (EB) errors can also lead to link and frame errors.

FDDI deals with port-related errors as follows:

• The variable PORTlerAlarm is the link error rate (LER) value at which a link connection generates an alarm. When the LER is greater than the alarm setting, Station Management (SMT) sends a Status Report Frame (SRF) to notify you that there is a problem with a port.

The PORTlerAlarm threshold is set lower than the PORTlerCutoff threshold so that you are notified of a problem before the port is actually removed from the ring.

• When link errors reach the threshold defined by the variable PORTlerCutoff, SMT breaks the connection, disabling the PHY that detected the problem. A Link Error Condition is also generated.
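The relationship between the two port thresholds can be sketched in Python. This is a simplified model, not SMT itself: the function name is hypothetical, rates are expressed as plain error rates, and the default values of 1e-8 (alarm) and 1e-7 (cutoff) are assumed placeholders that you should replace with the settings on your own devices.

```python
def port_action(link_error_rate, ler_alarm=1e-8, ler_cutoff=1e-7):
    """Return what happens to a port at the given link error rate."""
    if ler_alarm >= ler_cutoff:
        raise ValueError("the alarm threshold must be lower than the cutoff")
    if link_error_rate >= ler_cutoff:
        # The connection is broken and a Link Error Condition is generated.
        return "cutoff"
    if link_error_rate > ler_alarm:
        # A Status Report Frame warns you while the port stays on the ring.
        return "alarm"
    return "ok"


for rate in (1e-9, 5e-8, 2e-7):
    print(rate, port_action(rate))
```

Keeping the alarm threshold below the cutoff is what gives you the early warning described above.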

FDDI deals with MAC-related errors as follows

• When MAC frame errors reach a certain threshold, a Frame Error Condition is generated. Because the actual error can be further upstream than the immediate connection, the connection remains intact.

• For a large network, the worst case MACFrameErrorRatio is less than 0.1 percent. However, during network configuration, frame error ratios can reach 50 percent for short periods. When you detect a sustained frame error ratio of more than 0.1 percent, a problem exists between the station that is reporting the condition and the nearest upstream MAC.

Solving the Problem

To solve problems related to FDDI errors, fix the hardware, cabling, or congestion problem.

Identifying Ring Errors

Use Status Watch to monitor your FDDI devices for Warning or Critical alerts.

Status Watch

Use Status Watch to identify FDDI ring errors.

Follow these steps:

1. Monitor the FDDI Status tool for the currently selected device.

2. Determine whether Status Watch is reporting Elasticity Buffer Errors or a high percentage of Frame Errors, Frames Not Copied, or Link Error Rates for the currently selected device.

FDDI Ring Errors Reference

This section provides additional conceptual and problem analysis detail.

Elasticity Buffer Error Condition

The Elasticity Buffer Error condition occurs when a port's elasticity buffer overflows or underflows. This condition usually indicates that a port's hardware is not operating within the tolerances that the FDDI standard specifies. Look for the problem in the hardware of either the port that is reporting the condition or of the immediately adjacent port.

Frame Error Condition

The Frame Error condition occurs when the percentage of frames that contain errors exceeds a preset threshold. In the situation when a device is an uplink to FDDI (that is, a device is transmitting onto FDDI), this type of condition indicates that the ring is saturated. The ring is out of buffer space, and packets are being dropped from the device's backbone port.

The problem indicated by the frame errors is usually located between the MAC that reports the condition and its upstream neighbor. Because many physical connections can lie along this path, the MACFrameErrorRatio variable identifies only the two MACs between which the problem is occurring.

Frames Not Copied Condition

The Frames Not Copied condition occurs when the percentage of frames that are dropped because of insufficient buffer space exceeds a preset threshold. This condition indicates that the station is congested and is unable to process frames as quickly as they arrive. To help eliminate congestion:

• Add more capacity to the station
• Reconfigure your network so that end stations that communicate heavily with one another are on the same bridge or switch
• Filter out certain traffic

Link Error Condition

The Link Error condition occurs when a port detects link errors at a rate that exceeds a preset threshold. When the Link Error threshold is exceeded, the station removes itself from the ring and tries to reinsert itself on the ring. This action creates a MAC Neighbor Change Event (which also occurs if a ring wraps).

Link errors may indicate an FDDI PHY hardware problem (such as a faulty transmitter) or a faulty cable or connector. Look for the problem in the portion of the network between the port that is reporting the condition and the first upstream transmitter.

MAC Neighbor Change Event

The MAC Neighbor Change event occurs when a MAC's upstream or downstream neighbor changes.

This event indicates either:

• A network reconfiguration
• Another station that is leaving or joining the ring

Network File Server Timeouts

Use these sections to identify and correct timeouts on network file servers:

• Network File Server Timeout Overview
• Looking for Obvious Errors
• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

See Network File Server Timeouts Reference for additional conceptual and problem analysis detail.

A network file server can time out if your network gets congested or if your server is having problems. Users might have problems downloading data from or to the server, or copying files from or to the server. To help you to understand the troubleshooting process for this type of problem, an EXAMPLE throughout this section follows the symptoms, analysis, and resolution of a typical file server timeout problem.


Understanding the Problem

When users log in, their stations make network file server calls either to determine quotas (if this feature has been enabled) or to mount user home directories. The network file server timeout messages, even when spread across multiple nodes, indicate a problem either with the network or with a server.

EXAMPLE: UNIX users notice that it takes a long time - over 30 seconds in some cases - to log in to any machine. Some machines report network file server timeout messages, but the messages have no obvious pattern and are infrequent. You begin to get a sense of the problem.

Identifying the Problem

First, rule out the obvious causes. Ask these questions:

• Can you access the network file server with Telnet?
• Have any alarms been triggered?
• Are there any new errors?

The process of identifying the problem is developed in Looking for Obvious Errors.

Solving the Problem

To determine the cause, reproduce the fault while you monitor the network. After you know the cause, you can fix the problem.

The solutions to the network file server timeout are identified in these sections:

• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

Looking for Obvious Errors

To look for obvious errors, use these applications:

• Ping and Telnet - To check for connectivity to the network file server nodes
• LANsentry Manager Alarms View - To search for triggered alarms
• LANsentry Manager Statistics View - To look for errors
• LANsentry Manager History View - To identify trends

Ping and Telnet

Determine whether you can contact network file server nodes using Ping and Telnet. If the response is extremely slow, then a problem may exist with the connections to the nodes. No delay indicates that the connections are normal, implying that the delay is occurring elsewhere. In this case, use LANsentry® Manager tools to determine whether packets are being lost or ignored.
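If you want a repeatable measurement instead of an impression, you can time a TCP connection to the Telnet port from a script. The Python sketch below is a stand-in for the manual check, not a replacement for Ping (ICMP needs raw sockets and elevated privileges). The function name and the 5-second timeout are assumptions, and `fileserver.example` is a placeholder host name.

```python
import socket
import time


def connect_time(host, port=23, timeout=5.0):
    """Seconds taken to open a TCP connection, or None if it fails."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return None


if __name__ == "__main__":
    elapsed = connect_time("fileserver.example")
    print("no response" if elapsed is None else f"connected in {elapsed:.3f} s")
```

Running the check several times from different nodes helps separate a slow server from a slow path to it.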

LANsentry Manager Alarms View

Using the LANsentry Manager Alarms View, you can determine if any configured alarms have been triggered.

Search the Alarms View to see if any MAC events have been logged.

EXAMPLE: MAC events have not been logged for the network on which the UNIX users are attached.

Even though no alarms have occurred, errors may exist. For example, a lower rate of background errors may exist just below the alarm threshold. Based on maximum and minimum values, RMON errors may miss constant, periodic, or low amounts of errors.

LANsentry Manager Statistics View

Using the LANsentry Manager Statistics View, you can display a multisegment graph of utilization and error statistics.

Follow these steps:

1. Set up a graph that shows utilization and errors on all your major segments.

2. Determine whether any segments are particularly busy or error prone.

EXAMPLE: You notice that one segment of the UNIX network, HUB3, is reporting Too Long Errors and FCS Errors roughly every second sample. While the amount is not higher than normal, it is currently higher than any other segment.

LANsentry Manager History View

Using the LANsentry Manager History View, you can display a rolling history table to determine if the errors that you are seeing are new. For example, if you have a history table that runs for 30-minute samples over two days, you can compare the most recent sample to a previous sample, looking for new errors. If your probe has the resources, use a much finer resolution sample stored for a shorter time (every 30 seconds for 2 hours) to more easily spot recent errors.
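The comparison of two history samples is easy to express in code. The Python sketch below assumes that you have exported each sample as a mapping from error name to count; the export format and the function name are assumptions, not LANsentry Manager features.

```python
def new_errors(previous, latest):
    """Return the error counters that grew between two history samples."""
    return {
        name: count - previous.get(name, 0)
        for name, count in latest.items()
        if count > previous.get(name, 0)
    }


earlier = {"FCS Errors": 2, "Too Long Errors": 0}
recent = {"FCS Errors": 2, "Too Long Errors": 3}
print(new_errors(earlier, recent))  # {'Too Long Errors': 3}
```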

EXAMPLE: You see that the history table shows that no error rates remained constant throughout the day. However, errors that did occur were on the device HUB3 and were Too Long Errors and FCS Errors.

Reproducing the Fault While Monitoring the Network

Although the RMON View in LANsentry Manager can show error rates and help you to identify the location of the problem, it may not provide enough data to solve the problem. To determine the cause of the problem, reproduce it while you monitor the network by using these applications:

• LANsentry Manager Top-N Graph - To locate a quiet node to use for reproducing the fault

• LANsentry Manager Packet Capture - To capture packets from the hub to which the quiet node is attached

• LANsentry Manager Packet Decode - To analyze the packets to assess network file server traffic and delays

• Address Tracker - To find the location of the problem nodes

EXAMPLE: Using LANsentry Manager, you find a hub on the network with a higher than normal error rate. However, the error rate does not seem high enough to cause login delays of 60 seconds or more.

LANsentry Manager Top-N Graph

Using the Top-N graph in the LANsentry Manager main window, locate a quiet node that has been showing the same problem. Choose a quiet node so that you do not receive excessive traffic when you try to isolate the problem.

EXAMPLE: You see that the node Monolith, which has the same Network File System (NFS) mounts as the other nodes on the network, is quiet. You decide to use this node for reproducing the fault. See Network File System (NFS) Protocol for more information about NFS.

LANsentry Manager Packet Capture

Using the LANsentry Manager Packet Capture application, capture packets from the network using predefined patterns and start-and-stop conditions.

Follow these steps:

1. Set up a capture buffer on a probe that is connected to the same hub as the quiet node. Until you know more about the problem, set a very general filter.

EXAMPLE: You select a MAC-layer filter and set a conversation filter to capture all packets to and from Monolith.

2. Telnet into and log out from the quiet node. Then reset the capture buffer. Repeat this procedure until you see the problem reflected in your captured data. To keep the buffer information clean, reset the buffer each time that you repeat the procedure.

3. When you see the delay, note the rough value of the packet count on the LANsentry Manager packet buffer.

By noting the packet count at which you think the delay has occurred, you can narrow the problem to within about 20 packets in the buffer. If you have used an extremely quiet node, you may even identify the exact packet.

LANsentry Manager Packet Decode

The LANsentry Manager Packet Decode application decodes all major protocols and displays the packet contents at three levels of detail: summary information, header information, and actual packet content.

Follow these steps:

1. Open the buffer in the Packet Decode application and locate the number of the packet at which the delay occurred.

2. Select the packet and launch a MAC-layer conversation filter. In the filter display, look for a gap in the conversation (that is, where the node sent a request and then resent it at approximately the same rate as the delay you experienced when recreating the problem).

3. Repeat the test to determine if the result concentrates on one node or if it appears on other nodes.

EXAMPLE: On the quiet node that you selected, the delay is obvious. You see an NFS request going out to a node and a repeat of the request 30 seconds later. During that time, the node did not respond. You now know that the delay occurred because nodes were not seeing responses for NFS requests. When you repeat the test on other nodes, you find that the delay is happening with more than one destination node.
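If you export the capture timestamps for one conversation, looking for the gap can be automated. The Python sketch below assumes a list of packet times in seconds, sorted in capture order; the function name and the 10-second default threshold are hypothetical choices for illustration.

```python
def find_gaps(timestamps, min_gap_s=10.0):
    """Return (index_of_packet_before_gap, gap_seconds) for each long silence."""
    gaps = []
    for index, (earlier, later) in enumerate(zip(timestamps, timestamps[1:])):
        gap = later - earlier
        if gap >= min_gap_s:
            gaps.append((index, gap))
    return gaps


# A request at 0.5 s that is repeated 30 seconds later shows up as one gap.
print(find_gaps([0.0, 0.5, 30.5, 31.0]))  # [(1, 30.0)]
```

The index points at the last packet seen before the silence, which is normally the unanswered request.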


Address Tracker

Use Address Tracker, which polls managed devices, to determine the hubs to which the problem nodes are attached. If the problem end stations are located on unmanaged devices, then you can at least narrow the problem to those unmanaged devices.

EXAMPLE: Although your network does not have managed hubs that Transcend® NCS management software can poll, it does have managed switches. When it polls the switches, Address Tracker displays the switch ports on which addresses were last seen. This information indicates the hub (but not the hub port) on which the device is located.

LANsentry Manager Packet Decode

After you know the location of the hub that has the problem node, monitor the problem from the hub using LANsentry Manager Packet Decode.

Follow these steps:

1. To capture packets from one of the nodes on the hub, set up another capture buffer and repeat the exercise that is described in LANsentry Manager Packet Capture. Because a delay may occur on a different node, use two capture buffers without stopping the first one.

Note the rough packet count where the delay appears.

2. Display a conversation filter of the packet where the delay appears and look for the gap in the conversation.

EXAMPLE: You hope that the nodes are on the same hub. You find that all the nodes are on HUB3. This result indicates that FCS Errors may be causing the timeouts. However, because the errors occur at a low rate, you decide to verify this diagnosis. You monitor the problem from the hub, logging in and out many times, and the delay eventually occurs. This time, the delay shows that the node's reply had an FCS Error even though the node received the request. The switch would not have transmitted this packet, causing a timeout on the NFS protocol. The retry time is presumably 30 seconds. During this test, you see the problem occurring on another node.

Correcting the Fault

Without a managed hub, you may find it very difficult to discover network file server timeout errors. To find the problematic node, you must either systematically isolate nodes by monitoring each node for a prolonged period or temporarily insert a managed hub.

EXAMPLE: You notice that the captured error packet failed FCS because it was corrupted by a regular pattern during transmission. A possible reason for this occurrence is a Jabbering node. This explanation makes sense because FCS/Jabber frames increased linearly when you were monitoring the live network.

Network File Server Timeouts Reference

This section explains terms that are relevant to network file server timeouts and provides additional conceptual and problem analysis detail.

Jabbering

Jabbering occurs when a node transmits illegal-length packets and is possibly not operating within carrier specifications. In effect, another node has written bad data over a valid packet. This bad data is often interpreted as a repeated sequence of data.

Network File System (NFS) Protocol

A distributed file system protocol developed by Sun Microsystems that allows a computer system to access files over a network as if they were on its local disks. This protocol has been incorporated into products by more than 200 companies. It is now a de facto Internet standard.

NFS is one protocol in the NFS suite of protocols, which includes NFS, RPC, XDR (External Data Representation), and others. These protocols are part of a larger architecture that Sun Microsystems refers to as Open Network Computing (ONC). ONC is a distributed applications architecture designed by Sun and currently controlled by a consortium led by Sun.

Troubleshooting TCP/IP - Detailed Steps

This article shows how to troubleshoot TCP/IP connectivity between computers on a Windows network. If you haven't already done so, disable XP's Internet Connection Firewall on all local area network connections and remove all firewall programs on the network. Improperly configured firewalls are the most common cause of TCP/IP problems.

Open a Command Prompt Window

For many of these steps, you'll be typing at the command prompt. To open a command prompt window in Windows 2000 or XP, click Start | Run, type cmd in the box, and click OK. To open a command prompt window in Windows 95, 98, or Me, click Start | Run, type command in the box, and click OK. Type one command per line and press Enter after each one to execute it. To close the command prompt window, use the exit command.

Determine the TCP/IP Settings

Determine the TCP/IP settings of each computer on the local area network. In XP, open the Network Connections folder, right-click the LAN connection, and click Status | Support | Details. For example, here are the Status and Details views for the LAN connection on an Internet Connection Sharing host.

In Windows 95/98/Me, click Start | Run, type winipcfg in the box, and click OK. Select the LAN adapter from the menu and click More Info. Here's the winipcfg view for an ICS client running Windows Me.

You can also see the TCP/IP settings from the command prompt. This is especially convenient if a computer has more than one network adapter. Use the ipconfig /all command, which is available in all versions except Windows 95. The output from this command can be long, so it's best to write it to a file. Specify the file name in the command this way:

ipconfig /all >ipconfig.txt

Description of TCP/IP Settings

Here are the TCP/IP settings that are used in network troubleshooting:

• IP Address - Unique address assigned to a network adapter. A computer with multiple network adapters has an IP address for each one, and each one must be in a different subnet.

• Subnet Mask - Used in conjunction with the IP address to determine which subnet an adapter belongs to. At the simplest level, communication is only possible between two network adapters when they're in the same subnet.

• Default Gateway - IP address of a computer or router on one of this computer's local area networks that knows how to communicate with subnets not present on this computer. For an Internet connection, the default gateway is a router belonging to your Internet service provider, and all access to sites on the Internet goes through it. For an ICS client, the default gateway is the ICS host. If you use a hardware router, it serves as the default gateway.

• DHCP Server - If an adapter is configured to obtain an IP address automatically, this is the address of the server that provides it. It could be your ISP, an ICS host, or a hardware router.

• DNS Servers - IP address of one or more Domain Name Server computers. DNS servers translate Internet names to their IP addresses (like 63.146.109.227).
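The subnet mask rule described above can be illustrated with a short Python sketch. It uses the standard ipaddress module; the function name same_subnet and the sample mask 255.255.255.0 are assumptions chosen for illustration and do not come from the original article.

```python
import ipaddress


def same_subnet(ip_a: str, ip_b: str, subnet_mask: str) -> bool:
    """Return True if both addresses fall in the same subnet for the given mask."""
    # strict=False derives the network from a host address instead of rejecting it
    net_a = ipaddress.ip_network(f"{ip_a}/{subnet_mask}", strict=False)
    net_b = ipaddress.ip_network(f"{ip_b}/{subnet_mask}", strict=False)
    return net_a == net_b


if __name__ == "__main__":
    # Two adapters on the same subnet (mask is a hypothetical example value)
    print(same_subnet("192.168.1.101", "192.168.1.123", "255.255.255.0"))
    # Adapters on different subnets under the same mask
    print(same_subnet("192.168.1.101", "192.168.2.123", "255.255.255.0"))
```

If the function returns False for two computers that are supposed to share a LAN, compare their IP Address and Subnet Mask values from the ipconfig output.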

Subnets

See our article on subnets for a brief description of how they work. For more details, see this Microsoft Knowledge Base article.

If two computers are supposed to be on the same subnet but aren't, something is wrong with the network hardware or software configuration. This is most likely to happen when one of them receives an IP address of 169.254.x.x, which indicates that:

• It's configured to obtain an IP address automatically.
• It couldn't find a DHCP server on the network to make the assignment.
• Windows assigned it an Automatic Private IP Address.
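As a rough sketch, an address in the 169.254.x.x range can be spotted programmatically. The example below assumes Python's standard ipaddress module; the names APIPA_NETWORK and is_apipa are hypothetical and chosen only for this illustration.

```python
import ipaddress

# The 169.254.x.x range that Windows uses for Automatic Private IP Addressing
APIPA_NETWORK = ipaddress.ip_network("169.254.0.0/16")


def is_apipa(address: str) -> bool:
    """Return True if the address lies in the 169.254.x.x range."""
    return ipaddress.ip_address(address) in APIPA_NETWORK


if __name__ == "__main__":
    for candidate in ("169.254.10.20", "192.168.1.101"):
        print(candidate, is_apipa(candidate))
```

A True result for a computer that should be getting its address from a DHCP server suggests checking the connection to that server.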

See our article on Specific Networking Problems and Their Solutions for more information.

Pinging

The ping command is the basic tool for testing TCP/IP connectivity. It sends a special packet (called ICMP Echo) to a particular IP address and looks for a reply. If everything is working right, the reply comes back. If not, the ping times out in a few seconds. By default, the ping command repeats the process four times. Here's an example of an ICS client computer pinging a Windows XP Home Edition ICS host, using the host's IP address and its computer name.

When ping fails, you'll see one of these error messages:

• Request timed out - The IP address is valid, but there's no reply from it. If the IP address is on a local area network, the most likely cause is a firewall program blocking the ping.

• Unknown host <name> or Ping request could not find host <name> - The computer name doesn't exist on the local area network. Make sure that NetBIOS over TCP/IP is enabled.

• Destination host unreachable - The IP address isn't on a local area network, and the default gateway can't access it. Either there's no default gateway, its address is wrong, or it isn't functioning.
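The error messages above can be turned into a small lookup, sketched here in Python. The message strings are the ones quoted in the list; matching them as lower-cased substrings, and the names PING_FAILURES and explain_ping_failure, are assumptions made for this illustration.

```python
# Failure messages quoted in the list above, mapped to the cause the text gives.
# Matching on lower-cased substrings is an assumption of this sketch.
PING_FAILURES = {
    "request timed out": "Address is valid but did not reply; on a LAN, suspect a firewall blocking ping",
    "unknown host": "Computer name not found; check that NetBIOS over TCP/IP is enabled",
    "could not find host": "Computer name not found; check that NetBIOS over TCP/IP is enabled",
    "destination host unreachable": "Address is off the LAN and the default gateway cannot reach it",
}


def explain_ping_failure(output: str) -> str:
    """Return the likely cause for the first known failure message found in ping output."""
    text = output.lower()
    for message, cause in PING_FAILURES.items():
        if message in text:
            return cause
    return "No known failure message found"


if __name__ == "__main__":
    print(explain_ping_failure("Request timed out."))
```

The function only inspects text that you pass in, so it can be applied to output saved to a file as well as to live output.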

Pinging the Local Area Network

Here is a series of ping commands to use in finding where a problem occurs on a local area network. Run them in the order shown, and don't go on to the next command until all of the previous commands work properly. In this example:

• The computer being tested is named Winxp, with IP address 192.168.1.101.
• There's another computer on the network named Win98, with IP address 192.168.1.123.

Substitute the appropriate IP addresses and computer names for your network.

Command              Target                          What Ping Failure Indicates
ping 127.0.0.1       Loopback address                Corrupted TCP/IP installation
ping localhost       Loopback name                   Corrupted TCP/IP installation
ping 192.168.1.101   This computer's IP address      Corrupted TCP/IP installation
ping winxp           This computer's name            Corrupted TCP/IP installation
ping 192.168.1.123   Another computer's IP address   Bad hardware or NIC driver
ping win98           Another computer's name         NetBIOS name resolution failure

To fix a corrupted TCP/IP installation on Windows XP, follow the steps in this Microsoft Knowledge Base article. For Windows 95/98/Me, un-install the TCP/IP protocol in Control Panel | Network, reboot, and re-install it. If that doesn't fix it, use this procedure on Windows 95 or 98.
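The ordered LAN checks in this section can be sketched as a small Python script that stops at the first failing ping. This is a minimal sketch under stated assumptions: the names LAN_CHECKS, ping_once, and first_failure are hypothetical, the addresses and names are the example values from this section, and treating a zero exit code from the system ping command as success is an approximation (some ping implementations return zero even when the reply is an error message).

```python
import platform
import subprocess
from typing import Callable, Optional, Sequence, Tuple

# (target, what a failure indicates), in the order given in this section
LAN_CHECKS = [
    ("127.0.0.1", "Corrupted TCP/IP installation"),
    ("localhost", "Corrupted TCP/IP installation"),
    ("192.168.1.101", "Corrupted TCP/IP installation"),
    ("winxp", "Corrupted TCP/IP installation"),
    ("192.168.1.123", "Bad hardware or NIC driver"),
    ("win98", "NetBIOS name resolution failure"),
]


def ping_once(target: str) -> bool:
    """Run the system ping command; assume exit code 0 means replies came back."""
    # Windows ping uses -n for the echo count; most other systems use -c
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    result = subprocess.run(
        ["ping", count_flag, "4", target],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def first_failure(
    checks: Sequence[Tuple[str, str]],
    ping: Callable[[str], bool] = ping_once,
) -> Optional[Tuple[str, str]]:
    """Run the checks in order and return the first (target, meaning) that fails."""
    for target, meaning in checks:
        if not ping(target):
            return target, meaning
    return None
```

Calling first_failure(LAN_CHECKS) after substituting your own addresses and names returns None when every ping succeeds. The ping argument exists so that the ordering logic can be exercised without touching the network.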

Pinging the Internet

You can also use ping to find a problem with Internet access. Run these commands in the order shown, and don't go on to the next command until all of the previous commands work properly. Use the Default Gateway and DNS Server addresses that you got from the winipcfg or ipconfig /all command.

Command                  Target                What Ping Failure Indicates
ping w.x.y.z             Default Gateway       Default Gateway down
ping w.x.y.z             DNS Server            DNS Server down
ping w.x.y.z             Web site IP address   Internet service provider or web site down
ping www.something.com   Web site name         DNS Server down or web site down

REFERENCE

WWW.PRACTICALLYNETWORK.COM

WWW.COMPUTERHOPE.COM

WWW.WIKIPEDIA.ORG

THANK YOU

  • Basic Network Troubleshooting
  • Basic network troubleshooting
    • Issue
    • Cause
    • Solution
      • Verify connections LEDs
      • Adapter resources
      • Adapter functionality
      • Protocol
      • Firewall
      • Additional time
      • Additional troubleshooting
        • Network Troubleshooting Overview
          • About Connectivity Problems
          • About Performance Problems
          • Solving Connectivity and Performance Problems
          • Recognizing Symptoms
            • User Comments
            • Network Management Software Alerts
            • Analyzing Symptoms
              • Understanding the Problem
              • Identifying and Testing the Cause of the Problem
                • Sample Problem Analysis
                • Equipment for Testing
                  • Solving the Problem
                    • Your Network Troubleshooting Toolbox
                      • Transcend Central
                      • Status Watch
                        • Web Reporter
                          • Address Tracker
                          • LANsentry Manager
                          • Traffix Manager
                          • Device View
                          • Ping
                            • Strategies for Using Ping
                            • Tips on Interpreting Ping Messages
                              • Telnet
                              • FTP and TFTP
                              • Analyzers
                              • Probes
                              • Cable Testers
                              • Knowing Your Networks Configuration
                                • Site Network Map
                                • Device Configuration Information
                                • Other Important Data About Your Network
                                  • Identifying Your Networks Normal Behavior
                                    • Baselining Your Network
                                    • Identifying Background Noise
                                    • Identifying Background Noise
                                        • FDDI Connectivity
                                          • Understanding the Problem
                                          • Identifying the Problem
                                          • Solving the Problem
                                          • Status Watch
                                          • Implementing Dual Homing
                                          • Installing an Optical Bypass Unit
                                          • Peer Wrap Condition
                                          • Twisted Ring Condition
                                          • Undesired Connection Attempt Event
                                          • Address Tracker
                                          • Device View
                                          • Broadcast Packets
                                          • Multicast Packets
                                            • Duplicate Addresses
                                              • Understanding the Problem
                                              • Identifying the Problem
                                              • Solving the Problem
                                              • Status Watch
                                              • Address Tracker
                                              • LANsentry Manager
                                              • Duplicate MAC Addresses
                                              • Duplicate IP Addresses
                                                • Ethernet Packet Loss
                                                  • Understanding the Problem
                                                  • Identifying the Problem
                                                  • Solving the Problem
                                                  • Status Watch
                                                  • LANsentry Manager Network Statistics Graph
                                                  • Device View
                                                  • Alignment Errors
                                                  • Collisions
                                                  • CRC Errors
                                                  • Excessive Collisions
                                                  • FCS Errors
                                                  • Late Collisions
                                                  • Nonstandard Ethernet Problems
                                                  • Receive Discards
                                                  • Too Long Errors
                                                  • Too Short Errors
                                                  • Transmit Discards
                                                    • FDDI Ring Errors
                                                      • Understanding the Problem
                                                      • Identifying the Problem
                                                      • Solving the Problem
                                                      • Status Watch
                                                      • Elasticity Buffer Error Condition
                                                      • Frame Error Condition
                                                      • Frames Not Copied Condition
                                                      • Link Error Condition
                                                      • MAC Neighbor Change Event
                                                        • Network File Server Timeouts
                                                          • Understanding the Problem
                                                          • Identifying the Problem
                                                          • Solving the Problem
                                                          • Ping and Telnet
                                                          • LANsentry Manager Alarms View
                                                          • LANsentry Manager Statistics View
                                                          • LANsentry Manager History View
                                                          • LANsentry Manager Top-N Graph
                                                          • LANsentry Manager Packet Capture
                                                          • LANsentry Manager Packet Decode
                                                          • Address Tracker
                                                          • LANsentry Manager Packet Decode
                                                          • Jabbering
                                                          • Network File System (NFS) Protocol
                                                          • Troubleshooting TCPIP - Detailed Steps
                                                          • Open a Command Prompt Window
                                                          • Determine the TCPIP Settings
                                                          • Description of TCPIP Settings
                                                          • Subnets
                                                          • Pinging
                                                          • Pinging the Local Area Network
                                                          • Pinging the Internet
Device View

Use Device View to disable any Spanning Tree port that has a repeater attached to it and to correct Spanning Tree misconfigurations.

To correct Spanning Tree misconfigurations, use Device View to disable Spanning Tree Protocol (STP) for a port on a SuperStack® II Switch 1000, Switch 3000, Switch 3000 10/100, Switch 9000SX, Desktop Switch, LinkBuilder® FMS II Bridge/Management Module, or CoreBuilder 6000.

To disable the STP port state for a port on a SuperStack II switch:

1. Select a port and click the right mouse button.

2. From the shortcut menu, select Configure.

3. In the Port section, click the STP tab.

4. From the STP Port State list box, select Disabled.

5. Click Apply.

To disable the STP port state for a port on a LinkBuilder FMS II Bridge/Management Module:

1. Double-click the module.

2. From the shortcut menu, select Configure Bridge.

3. In the Port section, click the STP tab.

Broadcast Storms Reference

This section explains terms that are relevant to broadcast storms and provides additional conceptual and problem analysis detail.

Broadcast Packets

Broadcast packets, which are a normal part of network operation, are transmitted by a device to a broadcast address. For example, IP networks use broadcasts to resolve network addresses using Address Resolution Protocol (ARP). IPX networks use a large number of broadcast packets to operate most effectively.

Problems arise when broadcast packets endlessly propagate throughout the network, which increases the traffic volume on your network and the CPU time that each host spends processing and discarding unwanted broadcast packets.

Multicast Packets

Multicast packets, which are a normal part of network operation, are transmitted by a device to a multicast group address. Hosts that want to receive the packets indicate that they want to be members of the multicast group, and then multicast packets are distributed to that group. For example, multicast packets support the Spanning Tree Protocol. Multicast applications and underlying multicast protocols control multimedia traffic and shield hosts from processing unnecessary broadcast traffic. However, multicast traffic can also cause storms that saturate your network.

Duplicate Addresses

Use these sections to identify and correct problems caused by duplicate MAC and IP addresses:

• Duplicate Addresses Overview
• Finding Duplicate MAC Addresses
• Finding Duplicate IP Addresses

See Duplicate Addresses Reference for additional conceptual and problem analysis detail.

Networks sometimes generate duplicate MAC and IP addresses. Because duplicate addresses can cause problems with packet delivery, resolve them as soon as possible.

Understanding the Problem

Duplicate MAC addresses are caused by data link layer problems with Fiber Distributed Data Interface (FDDI) media and the passing of tokens on the FDDI ring. Duplicate IP addresses are caused by network layer problems. See these sections for more information about causes of duplicate addresses:

• Duplicate MAC Addresses
• Duplicate IP Addresses

Identifying the Problem

Identify duplicate MAC and IP addresses by following the instructions in these sections:

• Finding Duplicate MAC Addresses
• Finding Duplicate IP Addresses

Solving the Problem

Identify the cause of the duplicate address (such as user error or a hardware problem) and fix the problem if possible.

Finding Duplicate MAC Addresses

To find out if duplicate MAC addresses are occurring, monitor your network using Status Watch.

Status Watch

The Status Watch FDDI Status tool identifies duplicate FDDI MAC addresses, and Status Watch reports when two or more MACs on the same ring have the same MAC address (a Duplicate Address condition).

Follow these steps:

1. In the Status Watch Summary View window, determine if any FDDI Status conditions are reported. If there are, double-click the table cell value to display the Device List window.

Another approach is to examine only the devices that you know reside on your FDDI ring. In the Status Watch main window, red device icons indicate that a threshold has been exceeded.

2. Select a device.

• If you selected the device from the Device List window, the real-time report for that device appears in the Status Watch main window.

• If you selected the device from the main window, also select the FDDI Status tool to view the real-time report.

3. Determine if a Duplicate Address condition caused the FDDI Status tool to trigger a Critical or Warning status for that device.

In Status Watch, you can specify the status severity level to apply to a Duplicate Address condition.

Finding Duplicate IP Addresses

To find out if duplicate IP addresses are occurring, monitor your network using these applications:

• Address Tracker - To find duplicate IP addresses on 3Com devices and their attached networks

• LANsentry® Manager - To find duplicate IP addresses that are collected by probes gathering RMON2 SmartAgent® data from the Enterprise Communications Analysis Module (ECAM) downloaded on your network devices

Address Tracker

Use Address Tracker to determine when and where duplicate IP addresses occur.

Follow these steps:

1. From the Find Address menu, select Find Duplicate IP Addresses.

2. Click Find Now to start your search.

LANsentry Manager

Use the Duplicates table in LANsentry Manager to compile a list of all stations with duplicate IP addresses. This table is available only on probes that have downloaded RMON2 (ECAM) SmartAgent software.

Follow these steps:

1. From the LANsentry Manager Address Map menu, select Duplicates. Address Map data is displayed as a table.

2. To export the contents of the table, click Export to launch the Data Export dialog box.

Duplicate Addresses Reference

This section explains terms that are relevant to duplicate addresses and provides additional conceptual and problem analysis detail.

Duplicate MAC Addresses

Each device on your network has a unique MAC address. This address identifies a single device on the network, allowing packets to be delivered to correct destinations.

Packets are delivered to their destinations by means of MAC-address-to-IP-address translation that the Address Resolution Protocol (ARP) provides. Therefore, if MAC addresses are duplicated on the network, ARP caches of routing devices contain erroneous destinations. In FDDI, devices monitor network traffic, searching for their own MAC address in each packet to determine whether to decode the packet. If MAC addresses are not unique, two stations cannot be distinguished from each other.

Duplicate MAC addresses can occur for the following reasons:

• Someone has manually configured a MAC address for a device instead of using the address that the vendor supplied or allowing it to be assigned dynamically, and this address is also assigned to a different device.

• In rare circumstances, loops in a bridged network can cause a MAC hardware problem or an address learning problem that creates a duplicate MAC address entry in the bridging address table.

• On DECnet Phase 4 networks, MAC addresses are set from the DECnet address. A duplicate NET address can cause a duplicate MAC address.

Duplicate IP Addresses

Because IP addresses are critical for transmission of packets on TCP/IP networks, resolve duplicate IP addresses immediately.

Duplicate IP addresses can occur when someone has configured an IP address that is identical to an IP address that is assigned to a different device. Address assignments, although possible for you to configure manually, are usually made using one of these protocols:

• Dynamic Host Configuration Protocol (DHCP) - Allows your network to dynamically assign IP addresses to nodes. With this protocol, a DHCP server temporarily assigns an IP address to a node, or you can statically configure addresses as needed.

• BOOTstrap protocol (BootP) - Allows you to statically assign IP addresses to nodes. This protocol is more efficient than RARP.

• Reverse ARP (RARP) - Allows you to statically assign IP addresses to nodes. However, because this protocol relies on the MAC address to identify the node, you cannot use it on networks that dynamically assign hardware addresses.
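The idea of a duplicate IP address (one IP address claimed by more than one device) can be sketched in Python. This is not how Address Tracker or LANsentry Manager work internally; it assumes you already have a list of (IP address, MAC address) pairs from some source, and the function name find_duplicate_ips and the sample addresses are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def find_duplicate_ips(entries: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Return each IP address that is associated with more than one MAC address."""
    macs_by_ip: Dict[str, Set[str]] = defaultdict(set)
    for ip, mac in entries:
        # Normalize case so the same MAC written two ways is not counted twice
        macs_by_ip[ip].add(mac.lower())
    return {ip: macs for ip, macs in macs_by_ip.items() if len(macs) > 1}


if __name__ == "__main__":
    # Hypothetical sample data: the first and third entries share an IP address
    sample = [
        ("192.168.1.101", "00:11:22:33:44:55"),
        ("192.168.1.123", "00:11:22:33:44:66"),
        ("192.168.1.101", "00:11:22:33:44:77"),
    ]
    print(find_duplicate_ips(sample))
```

Each key in the result is an address to investigate; the associated MAC addresses identify the devices that need to be reconfigured.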

Ethernet Packet Loss

Use these sections to identify and correct Ethernet packet loss:

• Ethernet Packet Loss Overview
• Searching for Packet Loss

See Ethernet Packet Loss Reference for additional conceptual and problem analysis detail.

If your Ethernet network shows signs of congestion, it may be experiencing packet loss. When your network is congested, utilization is usually high, packets are discarded because buffers are full, and collision rates are up. Problems related to Collisions are often at the heart of packet loss.

Understanding the Problem

Collisions are normal in Ethernet networks. In many cases, Collision rates of 50 percent do not cause a large decrease in throughput. The Collision rate helps mark the upper limit on your network (the maximum percentage of collisions that your network can bear), which is usually around 70 percent. If Collisions increase above this upper limit, your network can become unreliable.

When the Collision rates increase, so do Excessive Collisions, which causes a delay in transmitting data. An increase in Collisions also indicates that network utilization and network errors, such as FCS Errors, are probably increasing.

The real packet problems to watch for, however, are undetected collisions that show up as Late Collisions.
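The upper-limit rule described above can be written as a short Python check. This is a sketch under stated assumptions: it takes the Collision rate as a percentage that you have already read from a monitoring tool, uses the figure of around 70 percent from the text as a default that you can override, and the function name collision_rate_exceeds_limit is hypothetical.

```python
def collision_rate_exceeds_limit(collision_percent: float, upper_limit: float = 70.0) -> bool:
    """Return True if the collision rate is above the network's upper limit.

    The default of 70 percent is the approximate figure given in the text;
    the right limit for a particular network may differ.
    """
    if not 0.0 <= collision_percent <= 100.0:
        raise ValueError("collision_percent must be between 0 and 100")
    return collision_percent > upper_limit


if __name__ == "__main__":
    for rate in (50.0, 70.0, 85.0):
        print(rate, collision_rate_exceeds_limit(rate))
```

A rate of 50 percent stays under the default limit, which matches the statement that such rates often do not cause a large decrease in throughput.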

Identifying the Problem

To identify that your network's problem is related to packet loss, verify that frames are being dropped on your network by examining this packet loss data:

• Alignment Errors
• Collisions
• Excessive Collisions
• FCS Errors
• CRC Errors
• Late Collisions
• Receive Discards
• Too Long Errors
• Too Short Errors
• Transmit Discards

The process of identifying the problem is discussed in Searching for Packet Loss.

Solving the Problem

If you notice that packet loss data is consistently high, then your network is too congested. In this case, segment your network with the appropriate network device (such as a switch or router). If Collision data shows increases but your network's utilization is the same, then your network may have a physical problem, such as cabling that is too long. Other problems that packet loss data can indicate include:

• Faulty connectors or improper cabling
• Excessive numbers of repeaters between network devices
• Defective Ethernet transceivers or controllers

Possible solutions to these problems are explained in the procedures in Searching for Packet Loss.

Searching for Packet Loss

When you look for packet loss, use the following applications:

• Status Watch - For Ethernet and MIB-II data collection using SNMP polling
• LANsentry Manager Network Statistics Graph - For RMON data collection using an RMON probe
• Device View - On a per-device basis, you can evaluate statistics for any port on the device

Status Watch

Status Watch monitors:

• Alignment Errors
• Excessive Collisions
• FCS Errors
• Receive Discards
• Transmit Discards

Follow these steps:

1. Determine if the thresholds for the Alignment Errors tool and FCS Errors tool are being exceeded.

Table 16 identifies the problems that this data can indicate and your possible actions. For information about problems related to a nonstandard Ethernet implementation, see Nonstandard Ethernet Problems.

Table 16: Alignment Errors, FCS Errors, and CRC Errors Data

Possible Problem: Faulty cabling
Possible Action: Examine the cable and cable connections for breaks or damage.

Possible Problem: Network noise
Possible Action: Look for improper cabling, faulty cable, faulty network equipment, or cables that are too close to equipment that emits electromagnetic interference (lamps, for example).

Possible Problem: Faulty transceiver
Possible Action: Use an analyzer to identify the problematic transceiver. If necessary, replace the transceiver, network adapter, or station.

Possible Problem: Fault at the transmitting end station
Possible Action:
1. Locate the source of the errors by looking at the module and port statistics.
2. Verify the correct operation of the transceiver or adapter card of the device that is connected to the problem port.
3. If the card appears to be operating correctly, examine the cable and cable connections for breaks or damage.

Possible Problem: Station powering up or down
Possible Action: None required.
Early implementations of Ethernet transceivers generate a significant amount of in-band noise when powering up; they frequently cause Alignment Errors and FCS Errors in an otherwise stable network.
When powering up, some software drivers for Ethernet controllers also initiate Time Domain Reflectometry (TDR) tests to test the Ethernet media. Network monitors report TDR tests as Alignment Errors and FCS Errors.

Possible Problem: Faulty adapter
Possible Action: Replace the adapter.

2. Determine if the Excessive Collisions tool threshold is being exceeded. Table 17 identifies the problems that this data can indicate and your possible actions.

Table 17: Collisions and Excessive Collisions Data

Possible Problem: Busy network
Possible Action: Use a bridge, router, or switch to reconfigure your network into segments with fewer stations.

Possible Problem: Faulty device (adapter, switch, hub, and the like) that does not listen before broadcasting. This problem increases the incidence of all types of collisions.
Possible Action: Isolate each adapter to see if the problem stops.

Possible Problem: Network loop
Possible Action: Ensure that no redundant connections to the same station have both connections active simultaneously.

3. Determine if the Receive Discards and Transmit Discards tool thresholds are being exceeded.

If these errors are high in conjunction with the data that you learned in steps 1 and 2, then your network is overloaded. Segment your network.

LANsentry Manager Network Statistics Graph

Use the LANsentry Manager Network Statistics graph to view data for:

• Collisions
• Late Collisions
• Bandwidth Utilization
• CRC Errors
• Too Long Errors
• Too Short Errors

Follow these steps:

1. Display a Network Statistics graph for the local Ethernet segment on which users have reported poor performance.

This graph shows the most recent trend in Collision rates. If you have set up a History sample, you can also look at the historical trend. If a number of segments are connected by repeaters, examine the graph for each Ethernet segment.

2. Analyze Utilization and Collision rates to determine whether collisions are caused by an overloaded segment or a faulty component.

• If Utilization rates are high - The collisions are probably caused by an overloaded segment. If you have added nodes or new applications to your network, consider reconfiguring the cabling system using bridges and routers to filter out remote collisions and to keep local traffic on one segment. This action should level the network load.

• If Utilization rates are stable and appear normal - The collisions are probably caused by faulty components. In this case, do the following:

• If the network consists of repeaters - Compare the Network Statistics graphs for each segment connected to the repeater. Because repeaters repeat traffic across all connected segments (which makes many segments seem like one network), you should see similar levels of traffic on all segments. One segment that shows dissimilar levels of traffic and collisions may indicate faulty hardware. In this case, monitor several collisions to track the source station that is transmitting too soon after collisions, and repair the station. Packets that are transmitted too soon after collisions are unlikely to be valid. See Table 17 for more information about Collisions.

• On other networks - Determine the segment cable length.

3. Examine the CRC Errors and Late Collisions, which often indicate cabling or component problems.

Table 16 identifies the problems that CRC Errors can indicate and your possible actions. Table 18 identifies the problems that Late Collisions data can indicate and your possible actions.

Table 18: Late Collisions Data

Possible Problem: Cabling problems
• Segment too long
• Failing cable
• Segment not grounded properly (noise)
• Improper termination
• Taps too close (10BASE-5 and 10BASE-2 only)
• Noisy cable
Possible Action: Correct the cabling problem by doing one or more of the following:
• Reduce the segment length
• Replace the cable
• Ground the cable
• Terminate the cable correctly
• Check the taps
• Check for cables too close to equipment that emits electromagnetic interference

Possible Problem: Component problems
• Deaf or partially deaf node
• Failing repeater, transceiver, or controller cards
Possible Action: Correct the component problem by doing one of the following:
• Trace the failing component and replace it
• Replace the NIC or the transceiver

4. Trace Too Short Errors and Too Long Errors to the sender.

These errors often indicate faulty routers or LAN drivers and transceiver problems. Table 19 identifies the problems that this data can indicate and your possible actions.

Possible Problem: A transceiver on your network is adding bits to the packets that are transmitted by the attached station.

Possible Action:

1. Use a network analyzer to identify the problematic transceiver.

2. If necessary, replace the transceiver, network adapter, or station.

Possible Problem: The jabber protection mechanism on a transceiver has failed; it can no longer protect the network from the jabbering produced by the attached station.

Possible Action: Replace the network card.

Possible Problem: Excessive noise on the cable.

Note: Some 10/100 Mbps cards that autodetect the network speed may connect to the network at the wrong speed, causing excessive noise.

Possible Action: Check for improper cabling, faulty cable, faulty network equipment, or cables too close to noisy electronic equipment (lamps, for example).

If your network card autodetects the network speed and you have ruled out other problems, manually configure the network speed.

Possible Problem: Faulty routers (two different network types are connected and the router is not enforcing proper frame size restrictions).

Possible Action: Notify the manufacturer.

Possible Problem: Faulty LAN driver.

Possible Action: Replace the driver.

Possible Problem: A normal condition on a LinkSwitch® 1000, LinkSwitch® 3000, or CoreBuilder 5000 FastModule.

If you use maximum-sized (1518-byte) Ethernet frames, the device's VLT-enabled ports add a frame tag of 4 bytes, resulting in a misleading Too Long Frame error.

These frames are passed successfully but will create the Too Long Frame error message.

Possible Action: If you want to eliminate the error message, reduce your Ethernet packet frames by 4 bytes.

Device View

Device View allows you to display a variety of port and device-level statistics relevant to Ethernet packet loss. Table 20 describes these statistics and their use in troubleshooting.

Table 20: Activity and Error Statistics in Device View

Statistics Group: Activity

Description: Displays the total network activity and errors on the selected port.

Use in Troubleshooting: This data shows readable packets, broadcast packets, Collisions, total errors, and runts (which cause Too Short Errors). You can interpret this data in the following ways:

• The presence of runts can often be caused by Collisions; however, if the values increase at specific times of the day, it may indicate that you need to change the network topology to manage the traffic more efficiently (for example, with switches or routers).

• Runts can also be caused by a badly terminated coax cable.

• Large numbers of runts not associated with high levels of collisions can indicate a transmission problem (examine the cable).

• Particularly high numbers of Collisions compared to the total number of readable packets can point to a hardware problem (a bad adapter) or to a data loop.

• A high proportion of Broadcast Packets (>10%) on a heavily utilized network (>50% of available bandwidth) can point to an incorrectly configured bridge or router on the network.

Statistics Group: Errors

Description: Displays the number of frames with errors on the selected port.

Use in Troubleshooting: The significance of errors depends on accompanying errors and prevailing network conditions. See the following error data for more information:

• Alignment Errors: Table 16
• FCS Errors: Table 16
• Too Long Errors: Table 19
• Too Short Errors, or runts: Table 19
• Late Collisions: Table 18

To display Activity and Errors statistics for a device or port, follow these steps:

1. Select the required port or device.

2. From the shortcut menu, select Activity or Errors.

The statistics available depend on the type of port or device selected. See Table 20 for troubleshooting information.
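The Activity heuristics described for Device View can be sketched as a small check. This is only an illustration: the function name, its parameters, and the collision and runt cutoffs are assumptions made for the example. Only the 10% broadcast and 50% utilization figures come from the text.

```python
def activity_hints(readable, broadcast, collisions, runts, utilization_pct):
    """Return troubleshooting hints for one port's Activity counters.

    All counters cover the same sampling period. The 0.5 and 0.01
    cutoffs below are assumed values chosen for illustration.
    """
    hints = []
    if readable <= 0:
        return hints  # nothing to compare against

    # Broadcasts above 10% of traffic on a segment above 50% utilization.
    broadcast_pct = 100.0 * broadcast / readable
    if broadcast_pct > 10 and utilization_pct > 50:
        hints.append("check for an incorrectly configured bridge or router")

    # Very many collisions relative to readable packets (assumed cutoff).
    if collisions > 0.5 * readable:
        hints.append("check for a bad adapter or a data loop")

    # Many runts without matching collisions (assumed cutoffs).
    if runts > 0.01 * readable and collisions < 0.01 * readable:
        hints.append("examine the cable")

    return hints


print(activity_hints(readable=1000, broadcast=200, collisions=5,
                     runts=0, utilization_pct=60))
```

In practice the cutoffs should be tuned against the normal baseline for your own network.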

Ethernet Packet Loss Reference

This section explains terms that are relevant to Ethernet packet loss and provides additional conceptual and problem analysis detail.

Alignment Errors

An Alignment Error indicates a received frame in which both of the following are true:

• The number of bits received is an uneven byte count (that is, not an integral multiple of 8).

• The frame has a Frame Check Sequence (FCS) error.

Alignment Errors often result from MAC layer packet formation problems, cabling problems that cause corrupted or lost data, and packets that pass through more than two cascaded multiport transceivers. See FCS Errors for more information about interpreting Alignment Errors.


Collisions

Collisions indicate that two devices detect that the network is idle and try to send packets at exactly the same time (within one round-trip delay). Because only one device can transmit at a time, both devices must stop sending and attempt to retransmit. Collisions are detected by the transmitting stations.

The retransmission algorithm helps to ensure that the packets do not retransmit at the same time. However, if the two devices retry at nearly the same time, packets can collide again; the process repeats until either the packets finally pass onto the network without collisions, or 16 consecutive collisions occur and the packets are discarded.
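The retransmission algorithm referred to above is, in standard Ethernet, truncated binary exponential backoff. The following is a minimal sketch of how a station might pick its retry delay. The function and constant names are illustrative, and the cap of 10 on the exponent is taken from the usual IEEE 802.3 behavior rather than from this document.

```python
import random

MAX_ATTEMPTS = 16    # the frame is discarded after 16 consecutive collisions
BACKOFF_LIMIT = 10   # the random range stops doubling after 10 collisions


def backoff_slots(collision_count, rng=random):
    """Return the number of slot times to wait after the nth collision.

    Returns None when the station should give up and discard the frame.
    """
    if collision_count < 1:
        raise ValueError("collision_count starts at 1")
    if collision_count >= MAX_ATTEMPTS:
        return None
    exponent = min(collision_count, BACKOFF_LIMIT)
    # Wait a random whole number of slot times in [0, 2**exponent - 1].
    return rng.randrange(0, 2 ** exponent)


for n in (1, 2, 3, 16):
    print(n, backoff_slots(n))
```

Because each station draws its delay independently, the chance that two stations pick the same slot falls as the range doubles, which is why repeated collisions between the same pair become progressively less likely.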

CRC Errors

A Cyclic Redundancy Check (CRC) Error is an RMON statistic that combines FCS Errors and Alignment Errors. These errors indicate that packets were received with:

• A bad FCS and an integral number of octets (FCS Errors)
• A bad FCS and a non-integral number of octets (Alignment Errors)

CRC Errors can cause an end station to freeze. If a large number of CRC Errors are attributed to a single station on the network, replace the station's network interface board. Typically, a CRC Error rate of more than 1 percent of network traffic is considered excessive.

Excessive Collisions

Excessive Collisions indicate that 16 consecutive collisions have occurred, usually a sign that the network is becoming congested. For each excessive collision count (or after 16 consecutive collisions), a packet is dropped. If you know the normal rate of excessive collisions, then you can determine when the rate of packet loss is affecting your network's performance. See Knowing Your Network's Configuration for more information.

FCS Errors

Frame Check Sequence (FCS) Errors, a type of CRC Error, indicate that frames received by an interface are an integral number of octets long but do not pass the FCS check. The FCS is a mathematical way to ensure that all the frame's bits are correct without having the system examine each bit and compare it to the original. Packets with Alignment Errors also generate FCS Errors.
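The idea behind the FCS check can be illustrated with Python's `zlib.crc32`, which uses the same CRC-32 polynomial as Ethernet. This is a sketch of the principle only: the helper names are made up for the example, and the details of bit ordering on the wire are glossed over.

```python
import zlib


def append_fcs(frame: bytes) -> bytes:
    """Return the frame with a 4-byte CRC-32 checksum appended."""
    fcs = zlib.crc32(frame) & 0xFFFFFFFF
    return frame + fcs.to_bytes(4, "little")


def fcs_ok(frame_with_fcs: bytes) -> bool:
    """Recompute the checksum and compare it with the trailing 4 bytes."""
    if len(frame_with_fcs) < 4:
        return False
    body, received = frame_with_fcs[:-4], frame_with_fcs[-4:]
    return (zlib.crc32(body) & 0xFFFFFFFF) == int.from_bytes(received, "little")


good = append_fcs(b"example payload")
# Flip a single bit in the first byte to simulate noise on the cable.
corrupted = bytes([good[0] ^ 0x01]) + good[1:]

print(fcs_ok(good), fcs_ok(corrupted))
```

The receiver never sees the original frame; it only recomputes the checksum over what arrived, which is why a single flipped bit is enough to produce an FCS Error.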

Both Alignment Errors and FCS Errors can be caused by equipment powering up or down or by interference (noise) on unshielded twisted-pair (10BASE-T) segments. In a network that complies with the Ethernet standard, FCS or Alignment Errors indicate bit errors during a transmission or reception. A very low rate is acceptable. Although Ethernet allows a 1 in 10^8 bit error rate, typical Ethernet performance is 1 in 10^12 or better.

Late Collisions

Late Collisions indicate that two devices have transmitted at the same time, but cabling errors (most commonly, excessive network segment length or repeaters between devices) prevent either transmitting device from detecting a collision. Neither device detects a collision because the time to propagate the signal from one end of the network to the other is longer than the time to put the entire packet on the network. As a result, neither of the devices that cause the late collision senses the other's transmission until the entire packet is on the network.

Although late collisions occur for small packets, the transmitter cannot detect them. As a result, a network suffering measurable Late Collisions for large packets is losing small packets as well.
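The timing argument above can be made concrete with a little arithmetic. The sketch below assumes 10 Mbps Ethernet and ignores the preamble; the function names and the 60-microsecond round-trip figure are illustrative assumptions, not values from this document.

```python
def frame_time_us(frame_octets, mbps=10.0):
    """Time to put a frame on the wire, in microseconds."""
    # bits divided by megabits-per-second gives microseconds
    return frame_octets * 8 / mbps


def collision_detectable(frame_octets, round_trip_us, mbps=10.0):
    """A sender can detect a collision only while it is still transmitting."""
    return round_trip_us < frame_time_us(frame_octets, mbps)


# Assumed round-trip delay for a segment that is too long.
ROUND_TRIP_US = 60.0

for size in (64, 1518):
    print(size, frame_time_us(size), collision_detectable(size, ROUND_TRIP_US))
```

With these numbers a minimum-size 64-octet frame is fully transmitted before the collision can reach the sender, so the loss goes unnoticed, while a 1518-octet frame is still on the wire and the sender reports a Late Collision.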

Nonstandard Ethernet Problems

Table 21 lists the symptoms that typically occur if a system violates the Ethernet standard.

Table 21: Symptoms of Common Ethernet Network Problems

Symptoms: FCS Errors and Alignment Errors increase significantly.

Problem: Network cabling is too long.

Notes: If you use a promiscuous network monitor, the number of Late Collisions reported by stations should correlate with the FCS and Alignment Errors reported by the monitor.

Symptoms: FCS and Alignment Errors increase proportionally with interference (sometimes referred to as noise hits).

Problem: Network segment is noisy.

Notes: Typically observed on a 10BASE-T network segment in a noisy environment. If you use multiple promiscuous monitors, the FCS and Alignment Errors among the monitors will not correlate.

If the monitor can track runts, also called Too Short Errors, the number of runt packets should be significantly higher than normal.

Symptoms: FCS and Alignment Errors are much higher than normal.

Problem: Networks do not conform to the access scheme of Carrier Sense Multiple Access with Collision Detect (CSMA/CD).

Notes: Occurs when some implementations of Ethernet in the segment are not entirely compatible with IEEE 802.3 repeaters.

Collision fragments linger on the network long enough to collide with retry packets at the minimum interpacket gap (IPG). The IPG is smaller on one side of the repeated network, causing a lost packet.

Ethernet controllers cannot receive packets that are separated by 4.7 µs or less. Some controllers cannot sustain receptions of packets separated by as much as 9.6 µs. If runt packets are received one after another and are followed by a collision fragment, Ethernet controllers that cannot sustain reception will lose packets.

Receive Discards

Receive Discards indicate that received packets could not be delivered to a high-layer protocol because of congestion or packet errors.

Too Long Errors

A Too Long Error indicates that a packet is longer than 1518 octets (including FCS octets) but otherwise well formed. Too Long Errors are often caused by a bad transceiver, a malfunction of the jabber protection mechanism on a transceiver, or excessive noise on the cable.

Too Short Errors

A Too Short Error, also called a runt, indicates that a packet is fewer than 64 octets long (including FCS octets) but otherwise well formed.
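The error definitions in this reference section can be summarized as a small classifier. This is a sketch for illustration only: the function name and return strings are made up, and the handling of a frame with a valid FCS but a non-integral octet count is an assumption, since the text does not define that case.

```python
MIN_FRAME_OCTETS = 64     # including FCS octets
MAX_FRAME_OCTETS = 1518   # including FCS octets


def classify_frame(bit_count, fcs_valid):
    """Map a received frame to the error category defined in the text."""
    whole_octets = bit_count % 8 == 0
    octets = bit_count // 8

    if not fcs_valid:
        # Bad FCS: the octet alignment decides between the two CRC errors.
        return "FCS Error" if whole_octets else "Alignment Error"

    # Valid FCS: only the length can be wrong (otherwise well formed).
    # Assumption: stray trailing bits are ignored when the FCS is valid.
    if octets < MIN_FRAME_OCTETS:
        return "Too Short Error"
    if octets > MAX_FRAME_OCTETS:
        return "Too Long Error"
    return "OK"


print(classify_frame(64 * 8, True))
print(classify_frame(63 * 8, True))
print(classify_frame(1519 * 8, True))
print(classify_frame(100 * 8 + 3, False))
```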

Transmit Discards

Transmit Discards indicate that packets were not transmitted because of network congestion.

FDDI Ring Errors

Use these sections to identify and correct FDDI ring errors:

• FDDI Ring Errors Overview
• Identifying Ring Errors

Fiber Distributed Data Interface (FDDI) often corrects its own problems. However, because FDDI cannot correct all errors (especially those related to hardware problems), you should monitor FDDI errors.

Understanding the Problem

FDDI ring errors that you should monitor include:

• Elasticity Buffer Error Condition
• Frame Error Condition
• Frames Not Copied Condition
• Link Error Condition

Identifying the Problem

First, determine the type of FDDI ring errors and where they are occurring. Similar to the way you identify other FDDI problems, identify the upstream and downstream neighbors of the devices that you are monitoring.

Several types of network errors can cause FDDI performance problems. For example, problems with cables or physical connections may result in a link or frame error. Elasticity buffer (EB) errors can also lead to link and frame errors.

FDDI deals with port-related errors as follows:

• The variable PORTlerAlarm is the link error rate (LER) value at which a link connection generates an alarm. When the LER is greater than the alarm setting, Station Management (SMT) sends a Status Report Frame (SRF) to notify you that there is a problem with a port.

The PORTlerAlarm threshold is set lower than the PORTlerCutoff threshold so that you are notified of a problem before the port is actually removed from the ring.

• When link errors reach the threshold defined by the variable PORTlerCutoff, SMT breaks the connection, disabling the PHY that detected the problem. A Link Error Condition is also generated.

FDDI deals with MAC-related errors as follows:

• When MAC frame errors reach a certain threshold, a Frame Error Condition is generated. Because the actual error can be further upstream than the immediate connection, the connection remains intact.

• For a large network, the worst case MACFrameErrorRatio is less than 0.1 percent. However, during network configuration, frame error ratios can reach 50 percent for short periods. When you detect a sustained frame error ratio of more than 0.1 percent, a problem exists between the station that is reporting the condition and the nearest upstream MAC.
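The difference between a short spike and a sustained frame error ratio can be sketched as follows. The 0.1 percent threshold is from the text; the function name, the sample format, and the choice of three consecutive polling intervals as the meaning of "sustained" are assumptions made for this example.

```python
def sustained_frame_errors(samples, threshold_pct=0.1, min_consecutive=3):
    """Return True if the frame error ratio stays above the threshold.

    samples is a list of (error_frames, total_frames) tuples, one per
    polling interval. min_consecutive is an assumed definition of
    "sustained" and should be tuned to your polling rate.
    """
    run = 0
    for errors, total in samples:
        ratio_pct = 100.0 * errors / total if total else 0.0
        run = run + 1 if ratio_pct > threshold_pct else 0
        if run >= min_consecutive:
            return True
    return False


# A brief spike during reconfiguration, then a clean ring.
spike = [(50, 100), (0, 1000), (0, 1000)]
# A steady 0.2 percent error ratio.
steady = [(2, 1000), (2, 1000), (2, 1000)]

print(sustained_frame_errors(spike), sustained_frame_errors(steady))
```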

Solving the Problem

To solve problems related to FDDI errors, fix the hardware, cabling, or congestion problem.

Identifying Ring Errors


Use Status Watch to monitor your FDDI devices for Warning or Critical alerts.

Status Watch

Use Status Watch to identify FDDI ring errors.

Follow these steps:

1. Monitor the FDDI Status tool for the currently selected device.

2. Determine whether Status Watch is reporting Elasticity Buffer Errors or a high percentage of Frame Errors, Frames Not Copied, or Link Error Rates for the currently selected device.

FDDI Ring Errors Reference

This section provides additional conceptual and problem analysis detail.

Elasticity Buffer Error Condition

The Elasticity Buffer Error condition occurs when a port's elasticity buffer overflows or underflows. This condition usually indicates that a port's hardware is not operating within the tolerances that the FDDI standard specifies. Look for the problem in the hardware of either the port that is reporting the condition or of the immediately adjacent port.

Frame Error Condition

The Frame Error condition occurs when the percentage of frames that contain errors exceeds a preset threshold. In the situation when a device is an uplink to FDDI (that is, a device is transmitting onto FDDI), this type of condition indicates that the ring is saturated. The ring is out of buffer space and packets are being dropped from the device's backbone port.

The problem indicated by the frame errors is usually located between the MAC that reports the condition and its upstream neighbor. Because many physical connections can lie along this path, the MACFrameErrorRatio variable identifies only the two MACs between which the problem is occurring.

Frames Not Copied Condition

The Frames Not Copied condition occurs when the percentage of frames that are dropped because of insufficient buffer space exceeds a preset threshold. This condition indicates that the station is congested and is unable to process frames as quickly as they arrive. To help eliminate congestion:

• Add more capacity to the station
• Reconfigure your network so that end stations that communicate heavily with one another are on the same bridge or switch
• Filter out certain traffic

Link Error Condition

The Link Error condition occurs when a port detects link errors at a rate that exceeds a preset threshold. When the Link Error threshold is exceeded, the station removes itself from the ring and tries to reinsert itself on the ring. This action creates a MAC Neighbor Change Event (which also occurs if a ring wraps).

Link errors may indicate an FDDI PHY hardware problem (such as a faulty transmitter) or a faulty cable or connector. Look for the problem in the portion of the network between the port that is reporting the condition and the first upstream transmitter.

MAC Neighbor Change Event

The MAC Neighbor Change event occurs when a MAC's upstream or downstream neighbor changes.

This event indicates either:

• A network reconfiguration
• Another station that is leaving or joining the ring

Network File Server Timeouts

Use these sections to identify and correct timeouts on network file servers:

• Network File Server Timeout Overview
• Looking for Obvious Errors
• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

See Network File Server Timeouts Reference for additional conceptual and problem analysis detail.

A network file server can time out if your network gets congested or if your server is having problems. Users might have problems downloading data from or to the server, or copying files from or to the server. To help you to understand the troubleshooting process for this type of problem, an EXAMPLE throughout this section follows the symptoms, analysis, and resolution of a typical file server timeout problem.


Understanding the Problem

When users log in, their stations make network file server calls either to determine quotas (if this feature has been enabled) or to mount user home directories. The network file server timeout messages, even when spread across multiple nodes, indicate a problem either with the network or with a server.

EXAMPLE: UNIX users notice that it takes a long time - over 30 seconds in some cases - to log in to any machine. Some machines report network file server timeout messages, but the messages have no obvious pattern and are infrequent. You begin to get a sense of the problem.

Identifying the Problem

First, rule out the obvious causes. Ask these questions:

• Can you access the network file server with Telnet?
• Have any alarms been triggered?
• Are there any new errors?

The process of identifying the problem is developed in Looking for Obvious Errors.

Solving the Problem

To determine the cause, reproduce the fault while you monitor the network. After you know the cause, you can fix the problem.

The solutions to the network file server timeout are identified in these sections:

• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

Looking for Obvious Errors

To look for obvious errors, use these applications:

• Ping and Telnet - To check connectivity to the network file server nodes
• LANsentry Manager Alarms View - To search for triggered alarms
• LANsentry Manager Statistics View - To look for errors
• LANsentry Manager History View - To identify trends


Ping and Telnet

Determine whether you can contact network file server nodes using Ping and Telnet. If the response is extremely slow, then a problem may exist with the connections to the nodes. No delay indicates that the connections are normal, implying that the delay is occurring elsewhere. In this case, use LANsentry® Manager tools to determine whether packets are being lost or ignored.

LANsentry Manager Alarms View

Using the LANsentry Manager Alarms View, you can determine if any configured alarms have been triggered.

Search the Alarms View to see if any MAC events have been logged.

EXAMPLE: MAC events have not been logged for the network to which the UNIX users are attached.

Even though no alarms have occurred, errors may exist. For example, a lower rate of background errors may exist just below the alarm threshold. Because they are based on maximum and minimum values, RMON alarms may miss constant, periodic, or low amounts of errors.

LANsentry Manager Statistics View

Using the LANsentry Manager Statistics View, you can display a multisegment graph of utilization and error statistics.

Follow these steps:

1. Set up a graph that shows utilization and errors on all your major segments.

2. Determine whether any segments are particularly busy or error prone.

EXAMPLE: You notice that one segment of the UNIX network, HUB3, is reporting Too Long Errors and FCS Errors roughly every second sample. While the amount is not higher than normal, it is currently higher than on any other segment.

LANsentry Manager History View

Using the LANsentry Manager History View, you can display a rolling history table to determine if the errors that you are seeing are new. For example, if you have a history table that runs for 30-minute samples over two days, you can compare the most recent sample to a previous sample, looking for new errors. If your probe has the resources, use a much finer resolution sample stored for a shorter time (every 30 seconds for 2 hours) to more easily spot recent errors.

EXAMPLE: You see that the history table shows that no error rates remained constant throughout the day. However, errors that did occur were on the device HUB3 and were Too Long Errors and FCS Errors.

Reproducing the Fault While Monitoring the Network

Although the RMON View in LANsentry Manager can show error rates and help you to identify the location of the problem, it may not provide enough data to solve the problem. To determine the cause of the problem, reproduce it while you monitor the network by using these applications:

• LANsentry Manager Top-N Graph - To locate a quiet node to use for reproducing the fault

• LANsentry Manager Packet Capture - To capture packets from the hub to which the quiet node is attached

• LANsentry Manager Packet Decode - To analyze the packets to assess network file server traffic and delays

• Address Tracker - To find the location of the problem nodes

EXAMPLE: Using LANsentry Manager, you find a hub on the network with a higher than normal error rate. However, the error rate does not seem high enough to cause login delays of 60 seconds or more.

LANsentry Manager Top-N Graph

Using the Top-N graph in the LANsentry Manager main window, locate a quiet node that has been showing the same problem. Choose a quiet node so that you do not receive excessive traffic when you try to isolate the problem.

EXAMPLE: You see that the node Monolith, which has the same Network File System (NFS) mounts as the other nodes on the network, is quiet. You decide to use this node for reproducing the fault. See Network File System (NFS) Protocol for more information about NFS.

LANsentry Manager Packet Capture

Using the LANsentry Manager Packet Capture application, capture packets from the network using predefined patterns and start-and-stop conditions.


Follow these steps:

1. Set up a capture buffer on a probe that is connected to the same hub as the quiet node. Until you know more about the problem, set a very general filter.

EXAMPLE: You select a MAC-layer filter and set a conversation filter to capture all packets to and from Monolith.

2. Telnet into and log out from the quiet node. Then reset the capture buffer. Repeat this procedure until you see the problem reflected in your captured data. To keep the buffer information clean, reset the buffer each time that you repeat the procedure.

3. When you see the delay, note the rough value of the packet count on the LANsentry Manager packet buffer.

By noting the packet count at which you think the delay has occurred, you can narrow the problem to within about 20 packets in the buffer. If you have used an extremely quiet node, you may even identify the exact packet.

LANsentry Manager Packet Decode

The LANsentry Manager Packet Decode application decodes all major protocols and displays the packet contents at three levels of detail: summary information, header information, and actual packet content.

Follow these steps:

1. Open the buffer in the Packet Decode application and locate the number of the packet at which the delay occurred.

2. Select the packet and launch a MAC-layer conversation filter. In the filter display, look for a gap in the conversation (that is, where the node sent a request and then resent it at approximately the same rate as the delay you experienced when recreating the problem).

3. Repeat the test to determine if the result concentrates on one node or if it appears on other nodes.

EXAMPLE: On the quiet node that you selected, the delay is obvious. You see an NFS request going out to a node and a repeat of the request 30 seconds later. During that time, the node did not respond. You now know that the delay occurred because nodes were not seeing responses for NFS requests. When you repeat the test on other nodes, you find that the delay is happening with more than one destination node.
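Looking for a gap in a conversation, as described above, amounts to finding a request that is repeated after a long silence. The sketch below works on a simplified list of (timestamp, description) pairs exported from any capture tool. The data format, the function name, and the 10-second cutoff are assumptions for illustration and are not part of LANsentry Manager.

```python
def find_retry_gaps(packets, min_gap_s=10.0):
    """Find packets that repeat an earlier identical packet after a long gap.

    packets is a list of (timestamp_seconds, description) tuples in
    capture order. Returns (first_index, retry_index, gap_seconds) tuples.
    """
    gaps = []
    last_seen = {}  # description -> (index, timestamp) of latest occurrence
    for index, (timestamp, description) in enumerate(packets):
        if description in last_seen:
            prev_index, prev_timestamp = last_seen[description]
            gap = timestamp - prev_timestamp
            if gap >= min_gap_s:
                gaps.append((prev_index, index, gap))
        last_seen[description] = (index, timestamp)
    return gaps


# Hypothetical capture: request A gets no reply and is resent 30 s later.
capture = [
    (0.0, "request A"),
    (0.2, "request B"),
    (0.3, "reply B"),
    (30.0, "request A"),
]

print(find_retry_gaps(capture))
```

The packet indexes in the result correspond to the packet counts you would note in the capture buffer, which narrows the search in the same way as the manual procedure.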


Address Tracker

Use Address Tracker, which polls managed devices, to determine the hubs to which the problem nodes are attached. If the problem end stations are located on unmanaged devices, then you can at least narrow the problem to those unmanaged devices.

EXAMPLE: Although your network does not have managed hubs that Transcend® NCS management software can poll, it does have managed switches. When it polls the switches, Address Tracker displays the switch ports on which addresses were last seen. This information indicates the hub (but not the hub port) on which the device is located.

LANsentry Manager Packet Decode

After you know the location of the hub that has the problem node, monitor the problem from the hub using LANsentry Manager Packet Decode.

Follow these steps:

1. To capture packets from one of the nodes on the hub, set up another capture buffer and repeat the exercise that is described in LANsentry Manager Packet Capture. Because a delay may occur on a different node, use two capture buffers without stopping the first one.

Note the rough packet count where the delay appears.

2. Display a conversation filter of the packet where the delay appears and look for the gap in the conversation.

EXAMPLE: You hope that the nodes are on the same hub. You find that all the nodes are on HUB3. This result indicates that FCS Errors may be causing the timeouts. However, because the errors occur at a low rate, you decide to verify this diagnosis. You monitor the problem from the hub, logging in and out many times, and the delay eventually occurs. This time the delay shows that the node's reply had an FCS Error, even though the node received the request. The switch would not have transmitted this packet, causing a timeout on the NFS protocol. The retry time is presumably 30 seconds. During this test, you see the problem occurring on another node.

Correcting the Fault


Without a managed hub, you may find it very difficult to discover network file server timeout errors. To find the problematic node, you must either systematically isolate nodes by monitoring each node for a prolonged period or temporarily insert a managed hub.

EXAMPLE: You notice that the captured error packet failed FCS because it was corrupted by a regular pattern during transmission. A possible reason for this occurrence is a Jabbering node. This explanation makes sense because FCS/Jabber frames increased linearly when you were monitoring the live network.

Network File Server Timeouts Reference

This section explains terms that are relevant to network file server timeouts and provides additional conceptual and problem analysis detail.

Jabbering

Jabbering occurs when a node transmits illegal-length packets and is possibly not operating within carrier specifications. In effect, another node has written bad data over a valid packet. This bad data is often interpreted as a repeated sequence of data.

Network File System (NFS) Protocol

A distributed file system protocol, developed by Sun Microsystems, that allows a computer system to access files over a network as if they were on its local disks. This protocol has been incorporated into products by more than 200 companies. It is now a de facto Internet standard.

NFS is one protocol in the NFS suite of protocols, which includes NFS, RPC, XDR (External Data Representation), and others. These protocols are part of a larger architecture that Sun Microsystems refers to as Open Network Computing (ONC). ONC is a distributed applications architecture designed by Sun and currently controlled by a consortium led by Sun.

Troubleshooting TCP/IP - Detailed Steps

This article shows how to troubleshoot TCP/IP connectivity between computers on a Windows network. If you haven't already done so, disable XP's Internet Connection Firewall on all local area network connections and remove all firewall programs on the network. Improperly configured firewalls are the most common cause of TCP/IP problems.


Open a Command Prompt Window

For many of these steps, you'll be typing at the command prompt. To open a command prompt window in Windows 2000 or XP, click Start | Run, type cmd in the box, and click OK. To open a command prompt window in Windows 95, 98, or Me, click Start | Run, type command in the box, and click OK. Type one command per line and press Enter after each one to execute it. To close the command prompt window, use the exit command.

Determine the TCP/IP Settings

Determine the TCP/IP settings of each computer on the local area network. In XP, open the Network Connections folder, right click the LAN connection, and click Status | Support | Details. For example, here are the Status and Details views for the LAN connection on an Internet Connection Sharing host.

In Windows 95/98/Me, click Start | Run, type winipcfg in the box, and click OK. Select the LAN adapter from the menu and click More Info. Here's the winipcfg view for an ICS client running Windows Me.

You can also see the TCP/IP settings from the command prompt. This is especially convenient if a computer has more than one network adapter. Use the ipconfig /all command, which is available in all versions except Windows 95. The output from this command can be long, so it's best to write it to a file. Specify the file name in the command this way:

ipconfig /all >ipconfig.txt

Description of TCP/IP Settings

Here are the TCP/IP settings that are used in network troubleshooting:

• IP Address - Unique address assigned to a network adapter. A computer with multiple network adapters has an IP address for each one, and each one must be in a different subnet.

• Subnet Mask - Used in conjunction with the IP address to determine which subnet an adapter belongs to. At the simplest level, communication is only possible between two network adapters when they're in the same subnet.

• Default Gateway - IP address of a computer or router, on one of this computer's local area networks, that knows how to communicate with subnets not present on this computer. For an Internet connection, the default gateway is a router belonging to your Internet service provider, and all access to sites on the Internet goes through it. For an ICS client, the default gateway is the ICS host. If you use a hardware router, it serves as the default gateway.

• DHCP Server - If an adapter is configured to obtain an IP address automatically, this is the address of the server that provides it. It could be your ISP, an ICS host, or a hardware router.

• DNS Servers - IP address of one or more Domain Name Server computers. DNS servers translate Internet names to their IP addresses (like 63.146.109.227).
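How the subnet mask decides whether two adapters are in the same subnet can be shown with Python's standard `ipaddress` module. This is a small sketch: the function name is made up for the example, and the 255.255.255.0 mask and the addresses are assumed sample values.

```python
import ipaddress


def same_subnet(ip_a, ip_b, mask):
    """Return True if both addresses fall in the same subnet for this mask."""
    # strict=False lets us pass a host address and derive its network.
    net_a = ipaddress.ip_network(f"{ip_a}/{mask}", strict=False)
    net_b = ipaddress.ip_network(f"{ip_b}/{mask}", strict=False)
    return net_a == net_b


# Assumed sample values for illustration.
print(same_subnet("192.168.1.101", "192.168.1.123", "255.255.255.0"))
print(same_subnet("192.168.1.101", "192.168.2.5", "255.255.255.0"))
```

Two adapters that fail this check cannot talk to each other directly and must go through a gateway.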

Subnets

See our article on subnets for a brief description of how they work. For more details, see this Microsoft Knowledge Base article.

If two computers are supposed to be on the same subnet but aren't, something is wrong with the network hardware or software configuration. This is most likely to happen when one of them receives an IP address of 169.254.x.x, which indicates that:

• It's configured to obtain an IP address automatically.
• It couldn't find a DHCP server on the network to make the assignment.
• Windows assigned it an Automatic Private IP Address.

See our article on Specific Networking Problems and Their Solutions for more information.
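An address in the 169.254.x.x range can be recognized programmatically. The following is a minimal sketch using Python's standard `ipaddress` module; the function name is made up for the example.

```python
import ipaddress

# The Automatic Private IP Address range described above.
APIPA_RANGE = ipaddress.ip_network("169.254.0.0/16")


def is_apipa(address):
    """Return True if the address was self-assigned (169.254.x.x)."""
    return ipaddress.ip_address(address) in APIPA_RANGE


print(is_apipa("169.254.10.20"))
print(is_apipa("192.168.1.101"))
```

A True result means the adapter could not reach a DHCP server, so the next step is to check the cable, the DHCP server, or any firewall that sits between them.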

Pinging

The ping command is the basic tool for testing TCP/IP connectivity. It sends a special packet (called ICMP Echo) to a particular IP address and looks for a reply. If everything is working right, the reply comes back. If not, the ping times out in a few seconds. By default, the ping command repeats the process four times. Here's an example of an ICS client computer pinging a Windows XP Home Edition ICS host, using the host's IP address and its computer name.

When ping fails, you'll see one of these error messages:

• Request timed out - The IP address is valid, but there's no reply from it. If the IP address is on a local area network, the most likely cause is a firewall program blocking the ping.

• Unknown host <name> or Ping request could not find host <name> - The computer name doesn't exist on the local area network. Make sure that NetBIOS over TCP/IP is enabled.

• Destination host unreachable - The IP address isn't on a local area network, and the default gateway can't access it. Either there's no default gateway, its address is wrong, or it isn't functioning.


Pinging the Local Area Network

Here is a series of ping commands to use in finding where a problem occurs on a local area network. Run them in the order shown, and don't go on to the next command until all of the previous commands work properly. In this example:

• The computer being tested is named Winxp, with IP address 192.168.1.101.
• There's another computer on the network named Win98, with IP address 192.168.1.123.

Substitute the appropriate IP addresses and computer names for your network.

Command               Target                          What Ping Failure Indicates
ping 127.0.0.1        Loopback address                Corrupted TCP/IP installation
ping localhost        Loopback name                   Corrupted TCP/IP installation
ping 192.168.1.101    This computer's IP address      Corrupted TCP/IP installation
ping winxp            This computer's name            Corrupted TCP/IP installation
ping 192.168.1.123    Another computer's IP address   Bad hardware or NIC driver
ping win98            Another computer's name         NetBIOS name resolution failure
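The "run them in order and stop at the first failure" procedure can be sketched in Python as follows. The names `LAN_CHECKS`, `first_failure`, and `system_ping` are hypothetical, the targets are the example addresses from this section, and `system_ping` assumes a `ping` program is on the path. Note that a zero exit code is only an approximation of success: some ping implementations also return zero when a router answers "Destination host unreachable".

```python
import platform
import subprocess
from typing import Callable, Iterable, Optional, Tuple

# (target, what a failure indicates), in the order given in the table above.
LAN_CHECKS = [
    ("127.0.0.1", "Corrupted TCP/IP installation"),
    ("localhost", "Corrupted TCP/IP installation"),
    ("192.168.1.101", "Corrupted TCP/IP installation"),
    ("winxp", "Corrupted TCP/IP installation"),
    ("192.168.1.123", "Bad hardware or NIC driver"),
    ("win98", "NetBIOS name resolution failure"),
]


def system_ping(target: str) -> bool:
    """Send one echo request with the system ping command; True if it exits with 0."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    result = subprocess.run(
        ["ping", count_flag, "1", target],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def first_failure(
    checks: Iterable[Tuple[str, str]],
    ping: Callable[[str], bool],
) -> Optional[Tuple[str, str]]:
    """Run the checks in order and return the first failing (target, meaning), or None."""
    for target, meaning in checks:
        if not ping(target):
            return target, meaning
    return None
```

Calling `first_failure(LAN_CHECKS, system_ping)` walks the table top to bottom. Passing the ping function as a parameter keeps the ordering logic separate from the actual network call, so the same function could be reused with a different list of targets.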

To fix a corrupted TCP/IP installation on Windows XP, follow the steps in this Microsoft Knowledge Base article. For Windows 95/98/Me, un-install the TCP/IP protocol in Control Panel | Network, reboot, and re-install it. If that doesn't fix it, use this procedure on Windows 95 or 98.

Pinging the Internet

You can also use ping to find a problem with Internet access. Run these commands in the order shown, and don't go on to the next command until all of the previous commands work properly. Use the Default Gateway and DNS Server addresses that you got from the winipcfg or ipconfig /all command.


Command                  Target                What Ping Failure Indicates
ping w.x.y.z             Default Gateway       Default Gateway down
ping w.x.y.z             DNS Server            DNS Server down
ping w.x.y.z             Web site IP address   Internet service provider or web site down
ping www.something.com   Web site name         DNS Server down or web site down

REFERENCE

WWW.PRACTICALLYNETWORK.COM

WWW.COMPUTERHOPE.COM

WWW.WIKIPEDIA.ORG


THANK YOU



Problems arise when broadcast packets endlessly propagate throughout the network, which increases the traffic volume on your network and the CPU time that each host spends processing and discarding unwanted broadcast packets.

Multicast Packets

Multicast packets, which are a normal part of network operation, are transmitted by a device to a multicast group address. Hosts that want to receive the packets indicate that they want to be members of the multicast group, and then multicast packets are distributed to that group. For example, multicast packets support the Spanning Tree Protocol. Multicast applications and underlying multicast protocols control multimedia traffic and shield hosts from processing unnecessary broadcast traffic. However, multicast traffic can also cause storms that saturate your network.

Duplicate Addresses

Use these sections to identify and correct problems caused by duplicate MAC and IP addresses:

• Duplicate Addresses Overview
• Finding Duplicate MAC Addresses
• Finding Duplicate IP Addresses

See Duplicate Addresses Reference for additional conceptual and problem analysis detail.

Networks sometimes generate duplicate MAC and IP addresses. Because duplicate addresses can cause problems with packet delivery, resolve them as soon as possible.

Understanding the Problem

Duplicate MAC addresses are caused by data link layer problems with Fiber Distributed Data Interface (FDDI) media and the passing of tokens on the FDDI ring. Duplicate IP addresses are caused by network layer problems. See these sections for more information about causes of duplicate addresses:

• Duplicate MAC Addresses
• Duplicate IP Addresses

Identifying the Problem

Identify duplicate MAC and IP addresses by following the instructions in these sections:

• Finding Duplicate MAC Addresses


• Finding Duplicate IP Addresses

Solving the Problem

Identify the cause of the duplicate address (such as user error or a hardware problem) and fix the problem if possible.

Finding Duplicate MAC Addresses

To find out if duplicate MAC addresses are occurring, monitor your network using Status Watch.

Status Watch

The Status Watch FDDI Status tool identifies duplicate FDDI MAC addresses, and Status Watch reports when two or more MACs on the same ring have the same MAC address (a Duplicate Address condition).

Follow these steps:

1. In the Status Watch Summary View window, determine if any FDDI Status conditions are reported. If there are, double-click the table cell value to display the Device List window.

Another approach is to examine only the devices that you know reside on your FDDI ring. In the Status Watch main window, red device icons indicate that a threshold has been exceeded.

2. Select a device.

• If you selected the device from the Device List window, the real-time report for that device appears in the Status Watch main window.

• If you selected the device from the main window, also select the FDDI Status tool to view the real-time report.

3. Determine if a Duplicate Address condition caused the FDDI Status tool to trigger a Critical or Warning status for that device.

In Status Watch, you can specify the status severity level to apply to a Duplicate Address condition.

Finding Duplicate IP Addresses


To find out if duplicate IP addresses are occurring, monitor your network using these applications:

• Address Tracker - To find duplicate IP addresses on 3Com devices and their attached networks

• LANsentry Manager® - To find duplicate IP addresses that are collected by probes gathering RMON2 SmartAgent® data from the Enterprise Communications Analysis Module (ECAM) downloaded on your network devices

Address Tracker

Use Address Tracker to determine when and where duplicate IP addresses occur.

Follow these steps:

1. From the Find Address menu, select Find Duplicate IP Addresses.

2. Click Find Now to start your search.

LANsentry Manager

Use the Duplicates table in LANsentry Manager to compile a list of all stations with duplicate IP addresses. This table is available only on probes that have downloaded RMON2 (ECAM) SmartAgent software.

Follow these steps:

1. From the LANsentry Manager Address Map menu, select Duplicates. Address Map data is displayed as a table.

2. To export the contents of the table, click Export to launch the Data Export dialog box.

Duplicate Addresses Reference

This section explains terms that are relevant to duplicate addresses and provides additional conceptual and problem analysis detail.


Duplicate MAC Addresses

Each device on your network has a unique MAC address. This address identifies a single device on the network, allowing packets to be delivered to correct destinations.

Packets are delivered to their destinations by means of the IP-address-to-MAC-address translation that the Address Resolution Protocol (ARP) provides. Therefore, if MAC addresses are duplicated on the network, ARP caches of routing devices contain erroneous destinations. In FDDI, devices monitor network traffic, searching for their own MAC address in each packet to determine whether to decode the packet. If MAC addresses are not unique, two stations cannot be distinguished from each other.

Duplicate MAC addresses can occur for the following reasons:

• Someone has manually configured a MAC address for a device, instead of using the address that the vendor supplied or allowing it to be assigned dynamically, and this address is also assigned to a different device.

• In rare circumstances, loops in a bridged network can cause a MAC hardware problem or an address learning problem that creates a duplicate MAC address entry in the bridging address table.

• On DECnet Phase 4 networks, MAC addresses are set from the DECnet address. A duplicate NET address can cause a duplicate MAC address.

Duplicate IP Addresses

Because IP addresses are critical for transmission of packets on TCP/IP networks, resolve duplicate IP addresses immediately.

Duplicate IP addresses can occur when someone has configured an IP address that is identical to an IP address that is assigned to a different device. Address assignments, although possible for you to configure manually, are usually made using one of these protocols:

• Dynamic Host Configuration Protocol (DHCP) - Allows your network to dynamically assign IP addresses to nodes. With this protocol, a DHCP server temporarily assigns an IP address to a node, or you can statically configure addresses as needed.

• BOOTstrap protocol (BootP) - Allows you to statically assign IP addresses to nodes. This protocol is more efficient than RARP.

• Reverse ARP (RARP) - Allows you to statically assign IP addresses to nodes. However, because this protocol relies on the MAC address to identify the node, you cannot use it on networks that dynamically assign hardware addresses.
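If you do not have the tools described above, the idea behind a duplicate IP address search can be sketched in a few lines of Python. This is not how Address Tracker or LANsentry Manager work internally; it is a minimal illustration that assumes you have already collected (IP address, MAC address) pairs, for example from ARP caches. The function name `find_duplicate_ips` and the sample data format are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def find_duplicate_ips(observations: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Return IP addresses that were seen with more than one MAC address.

    observations is an iterable of (ip_address, mac_address) pairs.
    """
    macs_by_ip = defaultdict(set)
    for ip, mac in observations:
        # Normalize case so "00:AA:..." and "00:aa:..." count as the same adapter.
        macs_by_ip[ip].add(mac.lower())
    return {ip: sorted(macs) for ip, macs in macs_by_ip.items() if len(macs) > 1}
```

An IP address that maps to two different MAC addresses is a candidate duplicate; the result lists the MAC addresses involved so that you can trace the devices.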


Ethernet Packet Loss

Use these sections to identify and correct Ethernet packet loss:

• Ethernet Packet Loss Overview
• Searching for Packet Loss

See Ethernet Packet Loss Reference for additional conceptual and problem analysis detail.

If your Ethernet network shows signs of congestion, it may be experiencing packet loss. When your network is congested, utilization is usually high, packets are discarded because buffers are full, and collision rates are up. Problems related to Collisions are often at the heart of packet loss.

Understanding the Problem

Collisions are normal in Ethernet networks. In many cases, Collision rates of 50 percent do not cause a large decrease in throughput. The Collision rate helps mark the upper limit on your network (the maximum percentage of collisions that your network can bear), which is usually around 70 percent. If Collisions increase above this upper limit, your network can become unreliable.

When the Collision rates increase, so do Excessive Collisions, which causes a delay in transmitting data. An increase in Collisions also indicates that network utilization and network errors, such as FCS Errors, are probably increasing.

The real packet problems to watch for, however, are undetected collisions that show up as Late Collisions.

Identifying the Problem

To identify that your network's problem is related to packet loss, verify that frames are being dropped on your network by examining this packet loss data:

• Alignment Errors
• Collisions
• Excessive Collisions
• FCS Errors
• CRC Errors
• Late Collisions
• Receive Discards
• Too Long Errors
• Too Short Errors
• Transmit Discards


The process of identifying the problem is discussed in Searching for Packet Loss.

Solving the Problem

If you notice that packet loss data is consistently high, then your network is too congested. In this case, segment your network with the appropriate network device (such as a switch or router). If Collision data shows increases but your network's utilization is the same, then your network may have a physical problem, such as cabling that is too long. Other problems that packet loss data can indicate include:

• Faulty connectors or improper cabling
• Excessive numbers of repeaters between network devices
• Defective Ethernet transceivers or controllers

Possible solutions to these problems are explained in the procedures in Searching for Packet Loss.

Searching for Packet Loss

When you look for packet loss, use the following applications:

• Status Watch - For Ethernet and MIB-II data collection using SNMP polling
• LANsentry Manager Network Statistics Graph - For RMON data collection using an RMON probe
• Device View - On a per-device basis, you can evaluate statistics for any port on the device

Status Watch

Status Watch monitors:

• Alignment Errors
• Excessive Collisions
• FCS Errors
• Receive Discards
• Transmit Discards

Follow these steps:

1. Determine if the thresholds for the Alignment Errors tool and FCS Errors tool are being exceeded.


Table 16 identifies the problems that this data can indicate and your possible actions. For information about problems related to a nonstandard Ethernet implementation, see Nonstandard Ethernet Problems.

Table 16. Alignment Errors, FCS Errors, and CRC Errors Data

Possible Problem: Faulty cabling
Possible Action: Examine the cable and cable connections for breaks or damage.

Possible Problem: Network noise
Possible Action: Look for improper cabling, faulty cable, faulty network equipment, or cables that are too close to equipment that emits electromagnetic interference (lamps, for example).

Possible Problem: Faulty transceiver
Possible Action: Use an analyzer to identify the problematic transceiver. If necessary, replace the transceiver, network adapter, or station.

Possible Problem: Fault at the transmitting end station
Possible Action:
1. Locate the source of the errors by looking at the module and port statistics.
2. Verify the correct operation of the transceiver or adapter card of the device that is connected to the problem port.
3. If the card appears to be operating correctly, examine the cable and cable connections for breaks or damage.

Possible Problem: Station powering up or down
Possible Action: None required.
Early implementations of Ethernet transceivers generate a significant amount of in-band noise when powering up; they frequently cause Alignment Errors and FCS Errors in an otherwise stable network.
When powering up, some software drivers for Ethernet controllers also initiate Time Domain Reflectometry (TDR) tests to test the Ethernet media. Network monitors report TDR tests as Alignment Errors and FCS Errors.

Possible Problem: Faulty adapter
Possible Action: Replace the adapter.

2. Determine if the Excessive Collisions tool threshold is being exceeded. Table 17 identifies the problems that this data can indicate and your possible actions.

Table 17. Collisions and Excessive Collisions Data

Possible Problem: Busy network
Possible Action: Use a bridge, router, or switch to reconfigure your network into segments with fewer stations.

Possible Problem: Faulty device (adapter, switch, hub, and the like) that does not listen before broadcasting. This problem increases the incidence of all types of collisions.
Possible Action: Isolate each adapter to see if the problem stops.

Possible Problem: Network loop
Possible Action: Ensure that no redundant connections to the same station have both connections active simultaneously.


3. Determine if the Receive Discards and Transmit Discards tools' thresholds are being exceeded.

If these errors are high in conjunction with the data that you learned in steps 1 and 2, then your network is overloaded. Segment your network.

LANsentry Manager Network Statistics Graph

Use the LANsentry Manager Network Statistics graph to view data for:

• Collisions
• Late Collisions
• Bandwidth Utilization
• CRC Errors
• Too Long Errors
• Too Short Errors

Follow these steps:

1. Display a Network Statistics graph for the local Ethernet segment on which users have reported poor performance.

This graph shows the most recent trend in Collision rates. If you have set up a History sample, you can also look at the historical trend. If a number of segments are connected by repeaters, examine the graph for each Ethernet segment.

2. Analyze Utilization and Collision rates to determine whether collisions are caused by an overloaded segment or a faulty component.

• If Utilization rates are high - The collisions are probably caused by an overloaded segment. If you have added nodes or new applications to your network, consider reconfiguring the cabling system using bridges and routers to filter out remote collisions and to keep local traffic on one segment. This action should level the network load.

• If Utilization rates are stable and appear normal - The collisions are probably caused by faulty components. In this case, do the following:

  • If the network consists of repeaters - Compare the Network Statistics graphs for each segment connected to the repeater. Because repeaters repeat traffic across all connected segments (which makes many segments seem like one network), you should see similar levels of traffic on all segments. One segment that shows dissimilar levels of traffic and collisions may indicate faulty hardware. In this case, monitor several collisions to track the source station that is transmitting too soon after collisions, and repair the station. Packets that are transmitted too soon after collisions are unlikely to be valid. See Table 17 for more information about Collisions.

  • On other networks - Determine the segment cable length.
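The decision in step 2 can be written down as a small Python function. This is only a sketch of the reasoning above: the function name `diagnose_collisions` is hypothetical, and the 10-point margin used to decide that utilization is "high" relative to your baseline is an assumption, not a value from this guide.

```python
def diagnose_collisions(
    utilization_pct: float,
    baseline_utilization_pct: float,
    margin_pct: float = 10.0,  # assumed margin; tune it to your own baseline
) -> str:
    """Suggest a likely cause for rising collisions on a segment.

    Utilization well above the baseline points to an overloaded segment;
    stable, normal utilization points to a faulty component.
    """
    if utilization_pct > baseline_utilization_pct + margin_pct:
        return "overloaded segment"
    return "faulty component"
```

For example, with a baseline of 30 percent, a reading of 65 percent suggests an overloaded segment, while a reading of 32 percent with rising collisions suggests looking for a faulty component.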

3. Examine the CRC Errors and Late Collisions, which often indicate cabling or component problems.

Table 16 identifies the problems that CRC Errors can indicate and your possible actions. Table 18 identifies the problems that Late Collisions data can indicate and your possible actions.

Table 18. Late Collisions Data

Possible Problem: Cabling problems
• Segment too long
• Failing cable
• Segment not grounded properly (noise)
• Improper termination
• Taps too close (10BASE-5 and 10BASE-2 only)
• Noisy cable
Possible Action: Correct the cabling problem by doing one or more of the following:
• Reduce the segment length
• Replace the cable
• Ground the cable
• Terminate the cable correctly
• Check the taps
• Check for cables too close to equipment that emits electromagnetic interference

Possible Problem: Component problems
• Deaf or partially deaf node
• Failing repeater, transceiver, or controller cards
Possible Action: Correct the component problem by doing one of the following:
• Trace the failing component and replace it
• Replace the NIC or the transceiver

4. Trace Too Short Errors and Too Long Errors to the sender.

These errors often indicate faulty routers or LAN drivers and transceiver problems. Table 19 identifies the problems that this data can indicate and your possible actions.

Table 19

Possible Problem: A transceiver on your network is adding bits to the packets that are transmitted by the attached station.
Possible Action:
1. Use a network analyzer to identify the problematic transceiver.
2. If necessary, replace the transceiver, network adapter, or station.

Possible Problem: The jabber protection mechanism on a transceiver has failed; it can no longer protect the network from the jabbering produced by the attached station.
Possible Action: Replace the network card.

Possible Problem: Excessive noise on the cable.
Note: Some 10/100 Mbps cards that autodetect the network speed may connect to the network at the wrong speed, causing excessive noise.
Possible Action: Check for improper cabling, faulty cable, faulty network equipment, or cables too close to noisy electronic equipment (lamps, for example).
If your network card autodetects the network speed and you have ruled out other problems, manually configure the network speed.

Possible Problem: Faulty routers (two different network types are connected, and the router is not enforcing proper frame size restrictions)
Possible Action: Notify the manufacturer.

Possible Problem: Faulty LAN driver
Possible Action: Replace the driver.

Possible Problem: A normal condition on a LinkSwitch® 1000, LinkSwitch® 3000, or CoreBuilder 5000 FastModule.
If you use maximum-sized 1518-byte Ethernet frames, the device's VLT-enabled ports add a frame tag of 4 bytes, resulting in a misleading Too Long Frame error.
Possible Action: These frames are passed successfully but will create the Too Long Frame error message.
If you want to eliminate the error message, reduce your Ethernet packet frames by 4 bytes.

Device View

Device View allows you to display a variety of port and device-level statistics relevant to Ethernet packet loss. Table 20 describes these statistics and their use in troubleshooting.

Table 20. Activity and Error Statistics in Device View

Statistics Group: Activity
Description: Displays the total network activity and errors on the selected port.
Use in Troubleshooting: This data shows readable packets, broadcast packets, Collisions, total errors, and runts, which cause Too Short Errors. You can interpret this data in the following ways:
• The presence of runts can often be caused by Collisions; however, if the values increase at specific times of the day, it may indicate you need to change the network topology to manage the traffic more efficiently (for example, with switches or routers).
• Runts can also be caused by a badly terminated coax cable.
• Large numbers of runts not associated with high levels of collisions can indicate a transmission problem (examine the cable).
• Particularly high numbers of Collisions compared to the total number of readable packets can point to a hardware problem (a bad adapter) or to a data loop.
• A high proportion of Broadcast Packets (>10%) on a heavily utilized network (>50% of available bandwidth) can point to an incorrectly configured bridge or router on the network.

Statistics Group: Errors
Description: Displays the number of frames with errors on the selected port.
Use in Troubleshooting: The significance of errors depends on accompanying errors and prevailing network conditions. See the following error data for more information:
• Alignment Errors: Table 16
• FCS Errors: Table 16
• Too Long Errors: Table 19
• Too Short Errors or runts: Table 19
• Late Collisions: Table 18

To display Activity and Errors statistics for a device or port, follow these steps:

1. Select the required port or device.

2. From the shortcut menu, select Activity or Errors.

The statistics available depend on the type of port or device selected. See Table 20 for troubleshooting information.

Ethernet Packet Loss Reference

This section explains terms that are relevant to Ethernet packet loss and provides additional conceptual and problem analysis detail.

Alignment Errors

An Alignment Error indicates a received frame in which both of the following are true:

• The number of bits received is an uneven byte count (that is, not an integral multiple of 8).

• The frame has a Frame Check Sequence (FCS) error.

Alignment Errors often result from MAC layer packet formation problems, cabling problems that cause corrupted or lost data, and packets that pass through more than two cascaded multiport transceivers. See FCS Errors for more information about interpreting Alignment Errors.


Collisions

Collisions indicate that two devices detect that the network is idle and try to send packets at exactly the same time (within one round-trip delay). Because only one device can transmit at a time, both devices must stop sending and attempt to retransmit. Collisions are detected by the transmitting stations.

The retransmission algorithm helps to ensure that the packets do not retransmit at the same time. However, if the two devices retry at nearly the same time, packets can collide again; the process repeats until either the packets finally pass onto the network without collisions, or 16 consecutive collisions occur and the packets are discarded.
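The text above does not name the retransmission algorithm. On standard Ethernet it is the truncated binary exponential backoff defined in IEEE 802.3, and the following Python sketch illustrates it under that assumption. The names `backoff_slots`, `MAX_ATTEMPTS`, and `BACKOFF_LIMIT` are chosen for this example; the 51.2 µs slot time applies to 10 Mbps Ethernet.

```python
import random
from typing import Optional

SLOT_TIME_US = 51.2   # one slot time (512 bit times) at 10 Mbps
MAX_ATTEMPTS = 16     # after the 16th consecutive collision the frame is discarded
BACKOFF_LIMIT = 10    # the random range stops doubling after 10 collisions


def backoff_slots(collision_count: int, rng: random.Random) -> Optional[int]:
    """Return how many slot times to wait after the given consecutive collision.

    Returns None when the frame must be discarded (an Excessive Collision).
    """
    if collision_count < 1:
        raise ValueError("collision_count starts at 1")
    if collision_count >= MAX_ATTEMPTS:
        return None
    exponent = min(collision_count, BACKOFF_LIMIT)
    # Pick a random whole number of slots in the range 0 .. 2**exponent - 1.
    return rng.randrange(2 ** exponent)
```

Because each station picks its delay at random from a range that doubles with every collision, two colliding stations are increasingly unlikely to retry at the same moment. Multiplying the returned value by `SLOT_TIME_US` gives the wait in microseconds.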

CRC Errors

A Cyclic Redundancy Check (CRC) Error is an RMON statistic that combines FCS Errors and Alignment Errors. These errors indicate that packets were received with:

• A bad FCS and an integral number of octets (FCS Errors)
• A bad FCS and a non-integral number of octets (Alignment Errors)

CRC Errors can cause an end station to freeze. If a large number of CRC Errors are attributed to a single station on the network, replace the station's network interface board. Typically, a CRC Error rate of more than 1 percent of network traffic is considered excessive.
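The definitions above can be illustrated with a short Python sketch that classifies received frames and computes the CRC Error rate. It assumes you already have, for each frame, its length in bits and whether the FCS check passed; the function names and the data format are hypothetical and do not correspond to any RMON tool's interface.

```python
from typing import Iterable, Tuple

EXCESSIVE_CRC_RATE_PCT = 1.0  # "more than 1 percent of network traffic"


def classify_frame(bit_length: int, fcs_ok: bool) -> str:
    """Classify one received frame as 'ok', 'fcs_error', or 'alignment_error'."""
    if fcs_ok:
        return "ok"
    # Bad FCS with a whole number of octets is an FCS Error; otherwise an Alignment Error.
    return "fcs_error" if bit_length % 8 == 0 else "alignment_error"


def crc_error_rate_pct(frames: Iterable[Tuple[int, bool]]) -> float:
    """Return CRC Errors (FCS Errors plus Alignment Errors) as a percentage of frames."""
    frame_list = list(frames)
    if not frame_list:
        return 0.0
    bad = sum(1 for bits, ok in frame_list if classify_frame(bits, ok) != "ok")
    return 100.0 * bad / len(frame_list)
```

A result above `EXCESSIVE_CRC_RATE_PCT` for a single station would match the "excessive" condition described above.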

Excessive Collisions

Excessive Collisions indicate that 16 consecutive collisions have occurred, usually a sign that the network is becoming congested. For each excessive collision count (or after 16 consecutive collisions), a packet is dropped. If you know the normal rate of excessive collisions, then you can determine when the rate of packet loss is affecting your network's performance. See Knowing Your Network's Configuration for more information.

FCS Errors

Frame Check Sequence (FCS) Errors, a type of CRC, indicate that frames received by an interface are an integral number of octets long but do not pass the FCS check. The FCS is a mathematical way to ensure that all the frame's bits are correct without having the system examine each bit and compare it to the original. Packets with Alignment Errors also generate FCS Errors.

Both Alignment Errors and FCS Errors can be caused by equipment powering up or down or by interference (noise) on unshielded twisted-pair (10BASE-T) segments. In a network that complies with the Ethernet standard, FCS or Alignment Errors indicate bit errors during a transmission or reception. A very low rate is acceptable. Although Ethernet allows a 1 in 10^8 bit error rate, typical Ethernet performance is 1 in 10^12 or better.
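To get a feel for what these bit error rates mean per frame, the following Python sketch converts a bit error rate into the probability that a frame contains at least one bad bit. It assumes bit errors are independent and uses a maximum-sized 1518-byte frame by default; the function name `frame_error_probability` is made up for this example.

```python
import math


def frame_error_probability(bit_error_rate: float, frame_bytes: int = 1518) -> float:
    """Probability that a frame of frame_bytes has at least one bit error.

    Assumes independent bit errors: P = 1 - (1 - BER) ** bits.
    log1p and expm1 keep the result accurate for very small error rates.
    """
    bits = frame_bytes * 8
    return -math.expm1(bits * math.log1p(-bit_error_rate))
```

Under these assumptions, a 1 in 10^8 bit error rate corresponds to roughly one errored maximum-sized frame in every 8,000, while 1 in 10^12 corresponds to roughly one in 80 million.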

Late Collisions

Late Collisions indicate that two devices have transmitted at the same time, but cabling errors (most commonly, excessive network segment length or repeaters between devices) prevent either transmitting device from detecting a collision. Neither device detects a collision because the time to propagate the signal from one end of the network to the other is longer than the time to put the entire packet on the network. As a result, neither of the devices that cause the late collision senses the other's transmission until the entire packet is on the network.

Although late collisions occur for small packets, the transmitter cannot detect them. As a result, a network suffering measurable Late Collisions for large packets is losing small packets as well.

Nonstandard Ethernet Problems

Table 21 lists the symptoms that typically occur if a system violates the Ethernet standard.

Table 21. Symptoms of Common Ethernet Network Problems

Symptoms: FCS Errors and Alignment Errors increase significantly.
Problem: Network cabling is too long.
Notes: If you use a promiscuous network monitor, the number of Late Collisions reported by stations should correlate with the FCS and Alignment Errors reported by the monitor.

Symptoms: FCS and Alignment Errors increase proportionally with interference (sometimes referred to as noise hits).
Problem: Network segment is noisy.
Notes: Typically observed on a 10BASE-T network segment in a noisy environment. If you use multiple promiscuous monitors, the FCS and Alignment Errors among the monitors will not correlate. If the monitor can track runts (also called Too Short Errors), the number of runt packets should be significantly higher than normal.

Symptoms: FCS and Alignment Errors are much higher than normal.
Problem: Networks do not conform to the access scheme of Carrier Sense Multiple Access with Collision Detect (CSMA/CD).
Notes: Occurs when some implementations of Ethernet in the segment are not entirely compatible with IEEE 802.3 repeaters. Collision fragments linger on the network long enough to collide with retry packets at the minimum interpacket gap (IPG). The IPG is smaller on one side of the repeated network, causing a lost packet. Ethernet controllers cannot receive packets that are separated by 4.7 µs or less. Some controllers cannot sustain receptions of packets separated by as much as 9.6 µs. If runt packets are received one after another and are followed by a collision fragment, Ethernet controllers that cannot sustain reception will lose packets.

Receive Discards

Receive Discards indicate that received packets could not be delivered to a high-layer protocol because of congestion or packet errors.

Too Long Errors

A Too Long Error indicates that a packet is longer than 1518 octets (including FCS octets) but otherwise well formed. Too Long Errors are often caused by a bad transceiver, a malfunction of the jabber protection mechanism on a transceiver, or excessive noise on the cable.

Too Short Errors

A Too Short Error, also called a runt, indicates that a packet is fewer than 64 octets long (including FCS octets) but otherwise well formed.
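The two length limits above can be expressed as a small check. This is a minimal Python sketch, assuming the frame length already includes the FCS octets. The constant and function names are illustrative and do not come from any monitoring product.

```python
MIN_FRAME_OCTETS = 64     # shorter frames are runts (Too Short Errors)
MAX_FRAME_OCTETS = 1518   # longer frames are Too Long Errors


def classify_frame_length(octets: int) -> str:
    """Classify an otherwise well-formed frame by its length in octets."""
    if octets < MIN_FRAME_OCTETS:
        return "too short"
    if octets > MAX_FRAME_OCTETS:
        return "too long"
    return "ok"
```

For example, `classify_frame_length(60)` reports a runt, while 64 and 1518 octets are both within limits.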

Transmit Discards

Transmit Discards indicate that packets were not transmitted because of network congestion.

FDDI Ring Errors

Use these sections to identify and correct FDDI ring errors:

• FDDI Ring Errors Overview
• Identifying Ring Errors

Fiber Distributed Data Interface (FDDI) often corrects its own problems. However, because FDDI cannot correct all errors (especially those related to hardware problems), you should monitor FDDI errors.

Understanding the Problem

FDDI ring errors that you should monitor include:

• Elasticity Buffer Error Condition
• Frame Error Condition
• Frames Not Copied Condition
• Link Error Condition

Identifying the Problem

First, determine the type of FDDI ring errors and where they are occurring. Similar to the way you identify other FDDI problems, identify the upstream and downstream neighbors of the devices that you are monitoring.

Several types of network errors can cause FDDI performance problems. For example, problems with cables or physical connections may result in a link or frame error. Elasticity buffer (EB) errors can also lead to link and frame errors.

FDDI deals with port-related errors as follows:

• The variable PORTlerAlarm is the link error rate (LER) value at which a link connection generates an alarm. When the LER is greater than the alarm setting, Station Management (SMT) sends a Status Report Frame (SRF) to notify you that there is a problem with a port.

  The PORTlerAlarm threshold is set lower than the PORTlerCutoff threshold so that you are notified of a problem before the port is actually removed from the ring.

• When link errors reach the threshold defined by the variable PORTlerCutoff, SMT breaks the connection, disabling the PHY that detected the problem. A Link Error Condition is also generated.
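The two-threshold behavior described above can be sketched as follows. This is an illustrative Python model only, not SMT code: link error rates are treated as plain fractions, the function name is hypothetical, and the example threshold values in the usage note are assumptions.

```python
def port_action(ler: float, alarm: float, cutoff: float) -> str:
    """Return what happens to a port at a given link error rate (LER).

    alarm  - LER above which a Status Report Frame is sent (PORTlerAlarm)
    cutoff - LER at which the connection is broken (PORTlerCutoff)
    """
    if alarm >= cutoff:
        # The alarm must fire before the port is removed from the ring.
        raise ValueError("alarm threshold must be lower than cutoff threshold")
    if ler >= cutoff:
        return "cutoff"   # connection broken, Link Error Condition generated
    if ler > alarm:
        return "alarm"    # SRF sent, port stays in the ring
    return "ok"
```

With an assumed alarm of 1e-8 and cutoff of 1e-7, a port running at an LER of 5e-8 raises an alarm but stays connected, which gives you time to investigate before the port is removed.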

FDDI deals with MAC-related errors as follows:

• When MAC frame errors reach a certain threshold, a Frame Error Condition is generated. Because the actual error can be further upstream than the immediate connection, the connection remains intact.

• For a large network, the worst-case MACFrameErrorRatio is less than 0.1 percent. However, during network configuration, frame error ratios can reach 50 percent for short periods. When you detect a sustained frame error ratio of more than 0.1 percent, a problem exists between the station that is reporting the condition and the nearest upstream MAC.
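Because short spikes are expected during ring configuration, it helps to distinguish a sustained ratio from a brief one. The Python sketch below shows one way to do that, assuming you have a list of periodic frame error ratio samples in percent and a 0.1 percent threshold. The function name and the choice of three consecutive samples as "sustained" are assumptions for illustration.

```python
from typing import Iterable


def has_sustained_errors(ratios_percent: Iterable[float],
                         threshold_percent: float = 0.1,
                         min_consecutive: int = 3) -> bool:
    """True if the ratio stays above the threshold for several samples in a row."""
    run = 0
    for ratio in ratios_percent:
        # Extend the current run, or reset it when a sample is within limits.
        run = run + 1 if ratio > threshold_percent else 0
        if run >= min_consecutive:
            return True
    return False
```

A single 50 percent sample followed by normal readings is not flagged, while three consecutive samples slightly above the threshold are.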

Solving the Problem

To solve problems related to FDDI errors, fix the hardware, cabling, or congestion problem.

Identifying Ring Errors

Use Status Watch to monitor your FDDI devices for Warning or Critical alerts.

Status Watch

Use Status Watch to identify FDDI ring errors.

Follow these steps:

1. Monitor the FDDI Status tool for the currently selected device.

2. Determine whether Status Watch is reporting Elasticity Buffer Errors or a high percentage of Frame Errors, Frames Not Copied, or Link Error Rates for the currently selected device.

FDDI Ring Errors Reference

This section provides additional conceptual and problem analysis detail.

Elasticity Buffer Error Condition

The Elasticity Buffer Error condition occurs when a port's elasticity buffer overflows or underflows. This condition usually indicates that a port's hardware is not operating within the tolerances that the FDDI standard specifies. Look for the problem in the hardware of either the port that is reporting the condition or the immediately adjacent port.

Frame Error Condition

The Frame Error condition occurs when the percentage of frames that contain errors exceeds a preset threshold. When a device is an uplink to FDDI (that is, a device is transmitting onto FDDI), this type of condition indicates that the ring is saturated. The ring is out of buffer space, and packets are being dropped from the device's backbone port.

The problem indicated by the frame errors is usually located between the MAC that reports the condition and its upstream neighbor. Because many physical connections can lie along this path, the MACFrameErrorRatio variable identifies only the two MACs between which the problem is occurring.

Frames Not Copied Condition

The Frames Not Copied condition occurs when the percentage of frames that are dropped because of insufficient buffer space exceeds a preset threshold. This condition indicates that the station is congested and is unable to process frames as quickly as they arrive. To help eliminate congestion:

• Add more capacity to the station
• Reconfigure your network so that end stations that communicate heavily with one another are on the same bridge or switch
• Filter out certain traffic

Link Error Condition

The Link Error condition occurs when a port detects link errors at a rate that exceeds a preset threshold. When the Link Error threshold is exceeded, the station removes itself from the ring and tries to reinsert itself on the ring. This action creates a MAC Neighbor Change Event (which also occurs if a ring wraps).

Link errors may indicate an FDDI PHY hardware problem (such as a faulty transmitter) or a faulty cable or connector. Look for the problem in the portion of the network between the port that is reporting the condition and the first upstream transmitter.

MAC Neighbor Change Event

The MAC Neighbor Change event occurs when a MAC's upstream or downstream neighbor changes.

This event indicates either:

• A network reconfiguration
• Another station that is leaving or joining the ring

Network File Server Timeouts

Use these sections to identify and correct timeouts on network file servers:

• Network File Server Timeout Overview
• Looking for Obvious Errors
• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

See Network File Server Timeouts Reference for additional conceptual and problem analysis detail.

A network file server can time out if your network gets congested or if your server is having problems. Users might have problems transferring data to or from the server or copying files to or from the server. To help you understand the troubleshooting process for this type of problem, an EXAMPLE throughout this section follows the symptoms, analysis, and resolution of a typical file server timeout problem.

Understanding the Problem

When users log in, their stations make network file server calls either to determine quotas (if this feature has been enabled) or to mount user home directories. The network file server timeout messages, even when spread across multiple nodes, indicate a problem either with the network or with a server.

EXAMPLE: UNIX users notice that it takes a long time - over 30 seconds in some cases - to log in to any machine. Some machines report network file server timeout messages, but the messages have no obvious pattern and are infrequent. You begin to get a sense of the problem.

Identifying the Problem

First, rule out the obvious causes. Ask these questions:

• Can you access the network file server with Telnet?
• Have any alarms been triggered?
• Are there any new errors?

The process of identifying the problem is developed in Looking for Obvious Errors.

Solving the Problem

To determine the cause, reproduce the fault while you monitor the network. After you know the cause, you can fix the problem.

The solutions to the network file server timeout are identified in these sections:

• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

Looking for Obvious Errors

To look for obvious errors, use these applications:

• Ping and Telnet - To determine connectivity to the network file server nodes
• LANsentry Manager Alarms View - To search for triggered alarms
• LANsentry Manager Statistics View - To look for errors
• LANsentry Manager History View - To identify trends

Ping and Telnet

Determine whether you can contact network file server nodes using Ping and Telnet. If the response is extremely slow, then a problem may exist with the connections to the nodes. No delay indicates that the connections are normal, implying that the delay is occurring elsewhere. In this case, use LANsentry® Manager tools to determine whether packets are being lost or ignored.
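If you want to script a rough version of the Telnet check, the Python sketch below times how long a TCP connection to a node takes. It is an approximation under stated assumptions: it only measures the TCP handshake to the Telnet port (23 by default), it is not an ICMP ping, and the function name is illustrative.

```python
import socket
import time
from typing import Optional


def connect_time(host: str, port: int = 23, timeout: float = 5.0) -> Optional[float]:
    """Return seconds taken to open a TCP connection, or None if it fails."""
    start = time.monotonic()
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return None
```

A result of None, or a connect time close to the timeout, suggests a problem on the path to the node. A consistently short connect time suggests the delay lies elsewhere.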

LANsentry Manager Alarms View

Using the LANsentry Manager Alarms View, you can determine if any configured alarms have been triggered.

Search the Alarms View to see if any MAC events have been logged.

EXAMPLE: MAC events have not been logged for the network to which the UNIX users are attached.

Even though no alarms have occurred, errors may exist. For example, a low rate of background errors may exist just below the alarm threshold. Because they are based on maximum and minimum values, RMON alarms may miss constant, periodic, or low amounts of errors.

LANsentry Manager Statistics View

Using the LANsentry Manager Statistics View, you can display a multisegment graph of utilization and error statistics.

Follow these steps:

1. Set up a graph that shows utilization and errors on all your major segments.

2. Determine whether any segments are particularly busy or error prone.

EXAMPLE: You notice that one segment of the UNIX network, HUB3, is reporting Too Long Errors and FCS Errors roughly every second sample. While the amount is not higher than normal, it is currently higher than on any other segment.

LANsentry Manager History View

Using the LANsentry Manager History View, you can display a rolling history table to determine if the errors that you are seeing are new. For example, if you have a history table that runs for 30-minute samples over two days, you can compare the most recent sample to a previous sample, looking for new errors. If your probe has the resources, use a much finer resolution sample stored for a shorter time (every 30 seconds for 2 hours) to more easily spot recent errors.

EXAMPLE: The history table shows that no error rates remained constant throughout the day. However, the errors that did occur were on the device HUB3 and were Too Long Errors and FCS Errors.
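The comparison of a recent history sample against an earlier one can be sketched in a few lines of Python. The sketch assumes each sample is a mapping from error type to the count seen in that interval. The data layout and function name are hypothetical and are not taken from LANsentry Manager.

```python
from typing import Dict


def new_errors(previous: Dict[str, int], latest: Dict[str, int]) -> Dict[str, int]:
    """Return error types whose count rose since the earlier sample, with the increase."""
    increases = {}
    for name, count in latest.items():
        baseline = previous.get(name, 0)  # error types not seen before count as 0
        if count > baseline:
            increases[name] = count - baseline
    return increases
```

Applied to two samples from the same segment, this highlights error types that have appeared or grown, which is the pattern you look for in the history table.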

Reproducing the Fault While Monitoring the Network

Although the RMON View in LANsentry Manager can show error rates and help you to identify the location of the problem, it may not provide enough data to solve the problem. To determine the cause of the problem, reproduce it while you monitor the network by using these applications:

• LANsentry Manager Top-N Graph - To locate a quiet node to use for reproducing the fault

• LANsentry Manager Packet Capture - To capture packets from the hub to which the quiet node is attached

• LANsentry Manager Packet Decode - To analyze the packets to assess network file server traffic and delays

• Address Tracker - To find the location of the problem nodes

EXAMPLE: Using LANsentry Manager, you find a hub on the network with a higher than normal error rate. However, the error rate does not seem high enough to cause login delays of 60 seconds or more.

LANsentry Manager Top-N Graph

Using the Top-N graph in the LANsentry Manager main window, locate a quiet node that has been showing the same problem. Choose a quiet node so that you do not receive excessive traffic when you try to isolate the problem.

EXAMPLE: You see that the node Monolith, which has the same Network File System (NFS) mounts as the other nodes on the network, is quiet. You decide to use this node for reproducing the fault. See Network File System (NFS) Protocol for more information about NFS.

LANsentry Manager Packet Capture

Using the LANsentry Manager Packet Capture application, capture packets from the network using predefined patterns and start-and-stop conditions.

Follow these steps:

1. Set up a capture buffer on a probe that is connected to the same hub as the quiet node. Until you know more about the problem, set a very general filter.

EXAMPLE: You select a MAC-layer filter and set a conversation filter to capture all packets to and from Monolith.

2. Telnet into and log out from the quiet node. Then reset the capture buffer. Repeat this procedure until you see the problem reflected in your captured data. To keep the buffer information clean, reset the buffer each time that you repeat the procedure.

3. When you see the delay, note the rough value of the packet count on the LANsentry Manager packet buffer.

By noting the packet count at which you think the delay has occurred, you can narrow the problem to within about 20 packets in the buffer. If you have used an extremely quiet node, you may even identify the exact packet.

LANsentry Manager Packet Decode

The LANsentry Manager Packet Decode application decodes all major protocols and displays the packet contents at three levels of detail: summary information, header information, and actual packet content.

Follow these steps:

1. Open the buffer in the Packet Decode application and locate the number of the packet at which the delay occurred.

2. Select the packet and launch a MAC-layer conversation filter. In the filter display, look for a gap in the conversation (that is, where the node sent a request and then resent it at approximately the same rate as the delay you experienced when recreating the problem).

3. Repeat the test to determine if the result concentrates on one node or if it appears on other nodes.

EXAMPLE: On the quiet node that you selected, the delay is obvious. You see an NFS request going out to a node and a repeat of the request 30 seconds later. During that time, the node did not respond. You now know that the delay occurred because nodes were not seeing responses for NFS requests. When you repeat the test on other nodes, you find that the delay is happening with more than one destination node.
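Looking for a gap in a conversation amounts to finding a request that the same node sent again after a long pause. The Python sketch below illustrates the idea on a simplified packet list. The Packet fields (including request_id) are hypothetical stand-ins for whatever your decoder exposes, and real captures need protocol-specific matching.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Packet(NamedTuple):
    time: float       # seconds since the start of the capture
    src: str          # sending node
    dst: str          # receiving node
    request_id: str   # identifier that is repeated when a request is resent


def find_resent_requests(packets: Iterable[Packet],
                         min_gap: float = 1.0) -> List[Tuple[str, float]]:
    """Return (request_id, gap_seconds) for requests resent after a long pause."""
    last_seen = {}
    gaps = []
    for p in packets:
        key = (p.src, p.dst, p.request_id)
        if key in last_seen:
            gap = p.time - last_seen[key]
            if gap >= min_gap:
                gaps.append((p.request_id, gap))
        last_seen[key] = p.time
    return gaps
```

In the scenario described above, a request sent at time 0 and repeated 30 seconds later would be reported with a gap of 30 seconds, matching the delay that users experience at login.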


Address Tracker

Use Address Tracker, which polls managed devices, to determine the hubs to which the problem nodes are attached. If the problem end stations are located on unmanaged devices, then you can at least narrow the problem to those unmanaged devices.

EXAMPLE: Although your network does not have managed hubs that Transcend® NCS management software can poll, it does have managed switches. When it polls the switches, Address Tracker displays the switch ports on which addresses were last seen. This information indicates the hub (but not the hub port) on which the device is located.

LANsentry Manager Packet Decode

After you know the location of the hub that has the problem node, monitor the problem from the hub using LANsentry Manager Packet Decode.

Follow these steps:

1. To capture packets from one of the nodes on the hub, set up another capture buffer and repeat the exercise that is described in LANsentry Manager Packet Capture. Because a delay may occur on a different node, use two capture buffers without stopping the first one.

Note the rough packet count where the delay appears.

2. Display a conversation filter of the packet where the delay appears and look for the gap in the conversation.

EXAMPLE: You hope that the nodes are on the same hub. You find that all the nodes are on HUB3. This result indicates that FCS Errors may be causing the timeouts. However, because the errors occur at a low rate, you decide to verify this diagnosis. You monitor the problem from the hub, logging in and out many times, and the delay eventually occurs. This time, the delay shows that the node's reply had an FCS Error even though the node received the request. The switch would not have transmitted this packet, causing a timeout on the NFS protocol. The retry time is presumably 30 seconds. During this test, you see the problem occurring on another node.

Correcting the Fault

Without a managed hub, you may find it very difficult to discover network file server timeout errors. To find the problematic node, you must either systematically isolate nodes by monitoring each node for a prolonged period or temporarily insert a managed hub.

EXAMPLE: You notice that the captured error packet failed FCS because it was corrupted by a regular pattern during transmission. A possible reason for this occurrence is a Jabbering node. This explanation makes sense because FCS/Jabber frames increased linearly when you were monitoring the live network.

Network File Server Timeouts Reference

This section explains terms that are relevant to network file server timeouts and provides additional conceptual and problem analysis detail.

Jabbering

Jabbering occurs when a node transmits illegal-length packets and is possibly not operating within carrier specifications. In effect, another node has written bad data over a valid packet. This bad data is often interpreted as a repeated sequence of data.

Network File System (NFS) Protocol

A distributed file system protocol developed by Sun Microsystems that allows a computer system to access files over a network as if they were on its local disks. This protocol has been incorporated into products by more than 200 companies. It is now a de facto Internet standard.

NFS is one protocol in the NFS suite of protocols, which includes NFS, RPC, XDR (External Data Representation), and others. These protocols are part of a larger architecture that Sun Microsystems refers to as Open Network Computing (ONC). ONC is a distributed applications architecture designed by Sun and currently controlled by a consortium led by Sun.

Troubleshooting TCP/IP - Detailed Steps

This article shows how to troubleshoot TCP/IP connectivity between computers on a Windows network. If you haven't already done so, disable XP's Internet Connection Firewall on all local area network connections and remove all firewall programs on the network. Improperly configured firewalls are the most common cause of TCP/IP problems.

Open a Command Prompt Window

For many of these steps, you'll be typing at the command prompt. To open a command prompt window in Windows 2000 or XP, click Start | Run, type cmd in the box, and click OK. To open a command prompt window in Windows 95, 98, or Me, click Start | Run, type command in the box, and click OK. Type one command per line, and press Enter after each one to execute it. To close the command prompt window, use the exit command.

Determine the TCP/IP Settings

Determine the TCP/IP settings of each computer on the local area network. In XP, open the Network Connections folder, right click the LAN connection, and click Status | Support | Details. For example, here are the Status and Details views for the LAN connection on an Internet Connection Sharing host.

In Windows 95/98/Me, click Start | Run, type winipcfg in the box, and click OK. Select the LAN adapter from the menu and click More Info. Here's the winipcfg view for an ICS client running Windows Me.

You can also see the TCP/IP settings from the command prompt. This is especially convenient if a computer has more than one network adapter. Use the ipconfig /all command, which is available in all versions except Windows 95. The output from this command can be long, so it's best to write it to a file. Specify the file name in the command this way:

ipconfig /all >ipconfig.txt

Description of TCP/IP Settings

Here are the TCP/IP settings that are used in network troubleshooting:

• IP Address - Unique address assigned to a network adapter. A computer with multiple network adapters has an IP address for each one, and each one must be in a different subnet.

• Subnet Mask - Used in conjunction with the IP address to determine which subnet an adapter belongs to. At the simplest level, communication is only possible between two network adapters when they're in the same subnet.

• Default Gateway - IP address of a computer or router, on one of this computer's local area networks, that knows how to communicate with subnets not present on this computer. For an Internet connection, the default gateway is a router belonging to your Internet service provider, and all access to sites on the Internet goes through it. For an ICS client, the default gateway is the ICS host. If you use a hardware router, it serves as the default gateway.

• DHCP Server - If an adapter is configured to obtain an IP address automatically, this is the address of the server that provides it. It could be your ISP, an ICS host, or a hardware router.

• DNS Servers - IP address of one or more Domain Name Server computers. DNS servers translate Internet names to their IP addresses (like 63.146.109.227).
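To see how the IP address and subnet mask together decide whether two adapters share a subnet, here is a minimal Python sketch using the standard ipaddress module. The function name is illustrative, and the addresses in the usage note are example values only.

```python
import ipaddress


def same_subnet(ip_a: str, ip_b: str, mask: str) -> bool:
    """True if both IPv4 addresses fall in the same subnet for the given mask."""
    # strict=False lets us pass a host address and get its enclosing network.
    net_a = ipaddress.ip_network(f"{ip_a}/{mask}", strict=False)
    net_b = ipaddress.ip_network(f"{ip_b}/{mask}", strict=False)
    return net_a == net_b
```

For example, 192.168.1.101 and 192.168.1.123 share a subnet under the mask 255.255.255.0, while 192.168.1.101 and 192.168.2.5 do not. Under the wider mask 255.255.0.0, the second pair would share a subnet.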

Subnets

See our article on subnets for a brief description of how they work. For more details, see this Microsoft Knowledge Base article.

If two computers are supposed to be on the same subnet but aren't, something is wrong with the network hardware or software configuration. This is most likely to happen when one of them receives an IP address of 169.254.x.x, which indicates that:

• It's configured to obtain an IP address automatically
• It couldn't find a DHCP server on the network to make the assignment
• Windows assigned it an Automatic Private IP Address
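A quick way to spot an automatically assigned private address in a script is to test whether it falls in the 169.254.0.0/16 range. This Python sketch uses the standard ipaddress module. The function name is illustrative.

```python
import ipaddress

# Range used for Automatic Private IP Addressing (169.254.x.x).
APIPA_NETWORK = ipaddress.ip_network("169.254.0.0/16")


def is_apipa(address: str) -> bool:
    """True if the IPv4 address is in the 169.254.x.x automatic range."""
    return ipaddress.ip_address(address) in APIPA_NETWORK
```

If `is_apipa` returns True for a computer that should have received its address from a DHCP server, check the DHCP server and the cabling between the two.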

See our article on Specific Networking Problems and Their Solutions for more information.

Pinging

The ping command is the basic tool for testing TCP/IP connectivity. It sends a special packet (called ICMP Echo) to a particular IP address and looks for a reply. If everything is working right, the reply comes back. If not, the ping times out in a few seconds. By default, the ping command repeats the process four times. Here's an example of an ICS client computer pinging a Windows XP Home Edition ICS host, using the host's IP address and its computer name.

When ping fails, you'll see one of these error messages:

• Request timed out - The IP address is valid, but there's no reply from it. If the IP address is on a local area network, the most likely cause is a firewall program blocking the ping.

• Unknown host <name> or Ping request could not find host <name> - The computer name doesn't exist on the local area network. Make sure that NetBIOS over TCP/IP is enabled.

• Destination host unreachable - The IP address isn't on a local area network, and the default gateway can't access it. Either there's no default gateway, its address is wrong, or it isn't functioning.
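The three failure messages can be mapped to their likely causes in a small lookup. The Python sketch below does a case-insensitive substring match on captured ping output. The function name and the exact wording of the hints are illustrative, and ping output varies between Windows versions and languages.

```python
from typing import Optional

# (text to look for in lowercase ping output, likely cause)
PING_HINTS = [
    ("request timed out",
     "Address is valid but did not reply; check for a firewall blocking ping."),
    ("could not find host",
     "Name not found on the LAN; check that NetBIOS over TCP/IP is enabled."),
    ("unknown host",
     "Name not found on the LAN; check that NetBIOS over TCP/IP is enabled."),
    ("destination host unreachable",
     "No working route; check the default gateway setting and device."),
]


def explain_ping_output(output: str) -> Optional[str]:
    """Return a hint for the first known failure message found, else None."""
    lowered = output.lower()
    for needle, hint in PING_HINTS:
        if needle in lowered:
            return hint
    return None
```

A return value of None means none of the known failure messages was found, which usually means the ping succeeded or failed in a way not covered here.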


Pinging the Local Area Network

Here is a series of ping commands to use in finding where a problem occurs on a local area network. Run them in the order shown, and don't go on to the next command until all of the previous commands work properly. In this example:

• The computer being tested is named Winxp, with IP address 192.168.1.101
• There's another computer on the network, named Win98, with IP address 192.168.1.123

Substitute the appropriate IP addresses and computer names for your network.

Command               Target                          What Ping Failure Indicates
ping 127.0.0.1        Loopback address                Corrupted TCP/IP installation
ping localhost        Loopback name                   Corrupted TCP/IP installation
ping 192.168.1.101    This computer's IP address      Corrupted TCP/IP installation
ping winxp            This computer's name            Corrupted TCP/IP installation
ping 192.168.1.123    Another computer's IP address   Bad hardware or NIC driver
ping win98            Another computer's name         NetBIOS name resolution failure

To fix a corrupted TCP/IP installation on Windows XP, follow the steps in this Microsoft Knowledge Base article. For Windows 95/98/Me, un-install the TCP/IP protocol in Control Panel | Network, reboot, and re-install it. If that doesn't fix it, use this procedure on Windows 95 or 98.
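The "run in order, stop at the first failure" procedure can be automated. The Python sketch below is one possible approach, with several assumptions: the addresses and names are the example values from this section, the helper names are illustrative, and the exit status of the ping command is treated as a rough success signal only (on some Windows versions ping can exit with status 0 even when it prints "Destination host unreachable").

```python
import platform
import subprocess
from typing import Callable, List, Optional, Tuple

# (target, what a failure at this step indicates); example values only.
LAN_STEPS: List[Tuple[str, str]] = [
    ("127.0.0.1", "Corrupted TCP/IP installation"),
    ("localhost", "Corrupted TCP/IP installation"),
    ("192.168.1.101", "Corrupted TCP/IP installation"),
    ("winxp", "Corrupted TCP/IP installation"),
    ("192.168.1.123", "Bad hardware or NIC driver"),
    ("win98", "NetBIOS name resolution failure"),
]


def ping(target: str, count: int = 1) -> bool:
    """Run the system ping command once; True if it exits with status 0."""
    flag = "-n" if platform.system() == "Windows" else "-c"
    result = subprocess.run(["ping", flag, str(count), target],
                            stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    return result.returncode == 0


def first_failure(steps: List[Tuple[str, str]],
                  ping_fn: Callable[[str], bool] = ping) -> Optional[Tuple[str, str]]:
    """Ping each target in order; return the first failing step, or None."""
    for target, meaning in steps:
        if not ping_fn(target):
            return target, meaning
    return None
```

Calling `first_failure(LAN_STEPS)` returns the first target that did not answer together with the likely cause, or None if every step succeeded. Passing a different `ping_fn` makes it possible to try the logic without touching the network.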

Pinging the Internet

You can also use ping to find a problem with Internet access. Run these commands in the order shown, and don't go on to the next command until all of the previous commands work properly. Use the Default Gateway and DNS Server addresses that you got from the winipcfg or ipconfig /all command.

Command                  Target                What Ping Failure Indicates
ping w.x.y.z             Default Gateway       Default Gateway down
ping w.x.y.z             DNS Server            DNS Server down
ping w.x.y.z             Web site IP address   Internet service provider or web site down
ping www.something.com   Web site name         DNS Server down or web site down

Reference:

www.practicallynetworked.com

www.computerhope.com

www.wikipedia.org


  • Basic Network Troubleshooting
  • Basic network troubleshooting
    • Issue
    • Cause
    • Solution
      • Verify connections LEDs
      • Adapter resources
      • Adapter functionality
      • Protocol
      • Firewall
      • Additional time
      • Additional troubleshooting
        • Network Troubleshooting Overview
          • About Connectivity Problems
          • About Performance Problems
          • Solving Connectivity and Performance Problems
          • Recognizing Symptoms
            • User Comments
            • Network Management Software Alerts
            • Analyzing Symptoms
              • Understanding the Problem
              • Identifying and Testing the Cause of the Problem
                • Sample Problem Analysis
                • Equipment for Testing
                  • Solving the Problem
                    • Your Network Troubleshooting Toolbox
                      • Transcend Central
                      • Status Watch
                        • Web Reporter
                          • Address Tracker
                          • LANsentry Manager
                          • Traffix Manager
                          • Device View
                          • Ping
                            • Strategies for Using Ping
                            • Tips on Interpreting Ping Messages
                              • Telnet
                              • FTP and TFTP
                              • Analyzers
                              • Probes
                              • Cable Testers
                              • Knowing Your Networks Configuration
                                • Site Network Map
                                • Device Configuration Information
                                • Other Important Data About Your Network
                                  • Identifying Your Networks Normal Behavior
                                    • Baselining Your Network
                                    • Identifying Background Noise
                                    • Identifying Background Noise
                                        • FDDI Connectivity
                                          • Understanding the Problem
                                          • Identifying the Problem
                                          • Solving the Problem
                                          • Status Watch
                                          • Implementing Dual Homing
                                          • Installing an Optical Bypass Unit
                                          • Peer Wrap Condition
                                          • Twisted Ring Condition
                                          • Undesired Connection Attempt Event
                                          • Address Tracker
                                          • Device View
                                          • Broadcast Packets
                                          • Multicast Packets
                                            • Duplicate Addresses
                                              • Understanding the Problem
                                              • Identifying the Problem
                                              • Solving the Problem
                                              • Status Watch
                                              • Address Tracker
                                              • LANsentry Manager
                                              • Duplicate MAC Addresses
                                              • Duplicate IP Addresses
                                                • Ethernet Packet Loss
                                                  • Understanding the Problem
                                                  • Identifying the Problem
                                                  • Solving the Problem
                                                  • Status Watch
                                                  • LANsentry Manager Network Statistics Graph
                                                  • Device View
                                                  • Alignment Errors
                                                  • Collisions
                                                  • CRC Errors
                                                  • Excessive Collisions
                                                  • FCS Errors
                                                  • Late Collisions
                                                  • Nonstandard Ethernet Problems
                                                  • Receive Discards
                                                  • Too Long Errors
                                                  • Too Short Errors
                                                  • Transmit Discards
                                                    • FDDI Ring Errors
                                                      • Understanding the Problem
                                                      • Identifying the Problem
                                                      • Solving the Problem
                                                      • Status Watch
                                                      • Elasticity Buffer Error Condition
                                                      • Frame Error Condition
                                                      • Frames Not Copied Condition
                                                      • Link Error Condition
                                                      • MAC Neighbor Change Event
                                                        • Network File Server Timeouts
                                                          • Understanding the Problem
                                                          • Identifying the Problem
                                                          • Solving the Problem
                                                          • Ping and Telnet
                                                          • LANsentry Manager Alarms View
                                                          • LANsentry Manager Statistics View
                                                          • LANsentry Manager History View
                                                          • LANsentry Manager Top-N Graph
                                                          • LANsentry Manager Packet Capture
                                                          • LANsentry Manager Packet Decode
                                                          • Address Tracker
                                                          • LANsentry Manager Packet Decode
                                                          • Jabbering
                                                          • Network File System (NFS) Protocol
                                                          • Troubleshooting TCPIP - Detailed Steps
                                                          • Open a Command Prompt Window
                                                          • Determine the TCPIP Settings
                                                          • Description of TCPIP Settings
                                                          • Subnets
                                                          • Pinging
                                                          • Pinging the Local Area Network
                                                          • Pinging the Internet

• Finding Duplicate IP Addresses

Solving the Problem

Identify the cause of the duplicate address (such as user error or a hardware problem) and fix the problem, if possible.

Finding Duplicate MAC Addresses

To find out if duplicate MAC addresses are occurring, monitor your network using Status Watch.

Status Watch

The Status Watch FDDI Status tool identifies duplicate FDDI MAC addresses, and Status Watch reports when two or more MACs on the same ring have the same MAC address (a Duplicate Address condition).

Follow these steps:

1. In the Status Watch Summary View window, determine if any FDDI Status conditions are reported. If there are, double-click the table cell value to display the Device List window.

Another approach is to examine only the devices that you know reside on your FDDI ring. In the Status Watch main window, red device icons indicate that a threshold has been exceeded.

2. Select a device.

• If you selected the device from the Device List window, the real-time report for that device appears in the Status Watch main window.

• If you selected the device from the main window, also select the FDDI Status tool to view the real-time report.

3. Determine if a Duplicate Address condition caused the FDDI Status tool to trigger a Critical or Warning status for that device.

In Status Watch, you can specify the status severity level to apply to a Duplicate Address condition.

Finding Duplicate IP Addresses


To find out if duplicate IP addresses are occurring, monitor your network using these applications:

• Address Tracker - To find duplicate IP addresses on 3Com devices and their attached networks

• LANsentry Manager® - To find duplicate IP addresses that are collected by probes gathering RMON2 SmartAgent® data from the Enterprise Communications Analysis Module (ECAM) downloaded on your network devices

Address Tracker

Use Address Tracker to determine when and where duplicate IP addresses occur.

Follow these steps:

1. From the Find Address menu, select Find Duplicate IP Addresses.

2. Click Find Now to start your search.

LANsentry Manager

Use the Duplicates table in LANsentry Manager to compile a list of all stations with duplicate IP addresses. This table is available only on probes that have downloaded RMON2 (ECAM) SmartAgent software.

Follow these steps:

1. From the LANsentry Manager Address Map menu, select Duplicates. Address Map data is displayed as a table.

2. To export the contents of the table, click Export to launch the Data Export dialog box.

Duplicate Addresses Reference

This section explains terms that are relevant to duplicate addresses and provides additional conceptual and problem analysis detail.


Duplicate MAC Addresses

Each device on your network has a unique MAC address. This address identifies a single device on the network, allowing packets to be delivered to correct destinations.

Packets are delivered to their destinations by means of the IP-address-to-MAC-address translation that the Address Resolution Protocol (ARP) provides. Therefore, if MAC addresses are duplicated on the network, ARP caches of routing devices contain erroneous destinations. In FDDI, devices monitor network traffic, searching for their own MAC address in each packet to determine whether to decode the packet. If MAC addresses are not unique, two stations cannot be distinguished from each other.

Duplicate MAC addresses can occur for the following reasons:

• Someone has manually configured a MAC address for a device, instead of using the address that the vendor supplied or allowing it to be assigned dynamically, and this address is also assigned to a different device.

• In rare circumstances, loops in a bridged network can cause a MAC hardware problem or an address learning problem that creates a duplicate MAC address entry in the bridging address table.

• On DECnet Phase 4 networks, MAC addresses are set from the DECnet address. A duplicate NET address can cause a duplicate MAC address.

Duplicate IP Addresses

Because IP addresses are critical for transmission of packets on TCP/IP networks, resolve duplicate IP addresses immediately.

Duplicate IP addresses can occur when someone has configured an IP address that is identical to an IP address that is assigned to a different device. Address assignments, although possible for you to configure manually, are usually made using one of these protocols:

• Dynamic Host Configuration Protocol (DHCP) - Allows your network to dynamically assign IP addresses to nodes. With this protocol, a DHCP server temporarily assigns an IP address to a node, or you can statically configure addresses as needed.

• BOOTstrap protocol (BootP) - Allows you to statically assign IP addresses to nodes. This protocol is more efficient than RARP.

• Reverse ARP (RARP) - Allows you to statically assign IP addresses to nodes. However, because this protocol relies on the MAC address to identify the node, you cannot use it on networks that dynamically assign hardware addresses.
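If you do not have Address Tracker or LANsentry Manager at hand, the same idea can be sketched in a few lines of Python. The example below is only an illustration: it assumes you have already collected (IP address, MAC address) pairs from some source such as ARP caches, and the function name and sample addresses are hypothetical. An IP address that appears with more than one MAC address is a candidate duplicate.

```python
from collections import defaultdict


def find_duplicate_ips(observations):
    """Return {ip: sorted MAC list} for every IP seen with more than one MAC.

    `observations` is an iterable of (ip, mac) string pairs. How the pairs
    are collected is outside this sketch.
    """
    macs_by_ip = defaultdict(set)
    for ip, mac in observations:
        # Normalize case so the same MAC written two ways counts once.
        macs_by_ip[ip].add(mac.lower())
    return {ip: sorted(macs) for ip, macs in macs_by_ip.items() if len(macs) > 1}


# Hypothetical sample data, for illustration only.
sample = [
    ("192.0.2.10", "00:A0:24:11:22:33"),
    ("192.0.2.11", "00:a0:24:44:55:66"),
    ("192.0.2.10", "00:a0:24:77:88:99"),
    ("192.0.2.11", "00:A0:24:44:55:66"),
]
duplicates = find_duplicate_ips(sample)
print(duplicates)
```

Here only 192.0.2.10 is reported, because 192.0.2.11 was seen twice with the same MAC address.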


Ethernet Packet Loss

Use these sections to identify and correct Ethernet packet loss:

• Ethernet Packet Loss Overview
• Searching for Packet Loss

See Ethernet Packet Loss Reference for additional conceptual and problem analysis detail.

If your Ethernet network shows signs of congestion, it may be experiencing packet loss. When your network is congested, utilization is usually high, packets are discarded because buffers are full, and collision rates are up. Problems related to Collisions are often at the heart of packet loss.

Understanding the Problem

Collisions are normal in Ethernet networks. In many cases, Collision rates of 50 percent do not cause a large decrease in throughput. The Collision rate helps mark the upper limit on your network (the maximum percentage of collisions that your network can bear), which is usually around 70 percent. If Collisions increase above this upper limit, your network can become unreliable.
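The 50 percent and 70 percent figures above can be turned into a rough check, sketched below in Python. This is an assumption-laden illustration: the text does not define how the Collision rate is computed, so the sketch assumes it is collisions as a percentage of frames sent, and the function names and the middle "approaching upper limit" band are hypothetical.

```python
def collision_rate_percent(collisions, frames_sent):
    """Collisions as a percentage of frames sent (an assumed definition)."""
    if frames_sent <= 0:
        raise ValueError("frames_sent must be positive")
    return 100.0 * collisions / frames_sent


def classify_collision_rate(rate_percent, tolerable=50.0, upper_limit=70.0):
    """Map a collision rate to one of three rough bands.

    The defaults come from the figures quoted in the text; your own
    network's upper limit may differ.
    """
    if rate_percent <= tolerable:
        return "usually tolerable"
    if rate_percent <= upper_limit:
        return "approaching upper limit"
    return "above upper limit"


rate = collision_rate_percent(collisions=50, frames_sent=200)
print(rate, classify_collision_rate(rate))
```

Because the upper limit varies from network to network, treat the defaults as starting points and adjust them once you know your own baseline.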

When the Collision rates increase, so do Excessive Collisions, which causes a delay in transmitting data. An increase in Collisions also indicates that network utilization and network errors, such as FCS Errors, are probably increasing.

The real packet problems to watch for, however, are undetected collisions that show up as Late Collisions.

Identifying the Problem

To identify that your network's problem is related to packet loss, verify that frames are being dropped on your network by examining this packet loss data:

• Alignment Errors
• Collisions
• Excessive Collisions
• FCS Errors
• CRC Errors
• Late Collisions
• Receive Discards
• Too Long Errors
• Too Short Errors
• Transmit Discards


The process of identifying the problem is discussed in Searching for Packet Loss.

Solving the Problem

If you notice that packet loss data is consistently high, then your network is too congested. In this case, segment your network with the appropriate network device (such as a switch or router). If Collision data shows increases but your network's utilization is the same, then your network may have a physical problem, such as cabling that is too long. Other problems that packet loss data can indicate include:

• Faulty connectors or improper cabling
• Excessive numbers of repeaters between network devices
• Defective Ethernet transceivers or controllers

Possible solutions to these problems are explained in the procedures in Searching for Packet Loss.

Searching for Packet Loss

When you look for packet loss, use the following applications:

• Status Watch - For Ethernet and MIB-II data collection using SNMP polling
• LANsentry Manager Network Statistics Graph - For RMON data collection using an RMON probe
• Device View - On a per-device basis, you can evaluate statistics for any port on the device

Status Watch

Status Watch monitors:

• Alignment Errors
• Excessive Collisions
• FCS Errors
• Receive Discards
• Transmit Discards

Follow these steps:

1. Determine if the thresholds for the Alignment Errors tool and FCS Errors tool are being exceeded.


Table 16 identifies the problems that this data can indicate and your possible actions. For information about problems related to a nonstandard Ethernet implementation, see Nonstandard Ethernet Problems.

Table 16 Alignment Errors, FCS Errors, and CRC Errors Data

Possible Problem: Faulty cabling
Possible Action: Examine the cable and cable connections for breaks or damage.

Possible Problem: Network noise
Possible Action: Look for improper cabling, faulty cable, faulty network equipment, or cables that are too close to equipment that emits electromagnetic interference (lamps, for example).

Possible Problem: Faulty transceiver
Possible Action: Use an analyzer to identify the problematic transceiver. If necessary, replace the transceiver, network adapter, or station.

Possible Problem: Fault at the transmitting end station
Possible Action:
1. Locate the source of the errors by looking at the module and port statistics.
2. Verify the correct operation of the transceiver or adapter card of the device that is connected to the problem port.
3. If the card appears to be operating correctly, examine the cable and cable connections for breaks or damage.

Possible Problem: Station powering up or down
Possible Action: None required.
Early implementations of Ethernet transceivers generate a significant amount of in-band noise when powering up; they frequently cause Alignment Errors and FCS Errors in an otherwise stable network.
When powering up, some software drivers for Ethernet controllers also initiate Time Domain Reflectometry (TDR) tests to test the Ethernet media. Network monitors report TDR tests as Alignment Errors and FCS Errors.

Possible Problem: Faulty adapter
Possible Action: Replace the adapter.

2. Determine if the Excessive Collisions tool threshold is being exceeded. Table 17 identifies the problems that this data can indicate and your possible actions.

Table 17 Collisions and Excessive Collisions Data

Possible Problem: Busy network
Possible Action: Use a bridge, router, or switch to reconfigure your network into segments with fewer stations.

Possible Problem: Faulty device (adapter, switch, hub, and the like) that does not listen before broadcasting. This problem increases the incidence of all types of collisions.
Possible Action: Isolate each adapter to see if the problem stops.

Possible Problem: Network loop
Possible Action: Ensure that no redundant connections to the same station have both connections active simultaneously.


3. Determine if the Receive Discards and Transmit Discards tools' thresholds are being exceeded.

If these errors are high in conjunction with the data that you learned in steps 1 and 2, then your network is overloaded. Segment your network.

LANsentry Manager Network Statistics Graph

Use the LANsentry Manager Network Statistics graph to view data for:

• Collisions
• Late Collisions
• Bandwidth Utilization
• CRC Errors
• Too Long Errors
• Too Short Errors

Follow these steps:

1. Display a Network Statistics graph for the local Ethernet segment on which users have reported poor performance.

This graph shows the most recent trend in Collision rates. If you have set up a History sample, you can also look at the historical trend. If a number of segments are connected by repeaters, examine the graph for each Ethernet segment.

2. Analyze Utilization and Collision rates to determine whether collisions are caused by an overloaded segment or a faulty component.

• If Utilization rates are high - The collisions are probably caused by an overloaded segment. If you have added nodes or new applications to your network, consider reconfiguring the cabling system using bridges and routers to filter out remote collisions and to keep local traffic on one segment. This action should level the network load.

• If Utilization rates are stable and appear normal - The collisions are probably caused by faulty components. In this case, do the following:

  • If the network consists of repeaters - Compare the Network Statistics graphs for each segment connected to the repeater. Because repeaters repeat traffic across all connected segments (which makes many segments seem like one network), you should see similar levels of traffic on all segments. One segment that shows dissimilar levels of traffic and collisions may indicate faulty hardware. In this case, monitor several collisions to track the source station that is transmitting too soon after collisions, and repair the station. Packets that are transmitted too soon after collisions are unlikely to be valid. See Table 17 for more information about Collisions.

  • On other networks - Determine the segment cable length.
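The decision in step 2 can be summarized as a small Python sketch. It is a simplification under stated assumptions: the function name is hypothetical, and the 50 percent default for "high" utilization is borrowed from the heavily-utilized figure used elsewhere in this document rather than defined by this procedure.

```python
def diagnose_collisions(utilization_percent, uses_repeaters, high_utilization=50.0):
    """Suggest the next troubleshooting step for a segment with many collisions.

    high_utilization is an assumed cut-off, not a value this procedure defines.
    """
    if utilization_percent >= high_utilization:
        # High utilization: the segment itself is probably overloaded.
        return "overloaded segment: split the load with bridges or routers"
    if uses_repeaters:
        # Normal utilization on a repeated network: look for one odd segment.
        return "faulty component: compare graphs for each segment on the repeater"
    return "faulty component: determine the segment cable length"


print(diagnose_collisions(80.0, uses_repeaters=False))
print(diagnose_collisions(20.0, uses_repeaters=True))
print(diagnose_collisions(20.0, uses_repeaters=False))
```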

3. Examine the CRC Errors and Late Collisions, which often indicate cabling or component problems.

Table 16 identifies the problems that CRC Errors can indicate and your possible actions. Table 18 identifies the problems that Late Collisions data can indicate and your possible actions.

Table 18 Late Collisions Data

Possible Problem: Cabling problems
• Segment too long
• Failing cable
• Segment not grounded properly (noise)
• Improper termination
• Taps too close (10BASE-5 and 10BASE-2 only)
• Noisy cable
Possible Action: Correct the cabling problem by doing one or more of the following:
• Reduce the segment length
• Replace the cable
• Ground the cable
• Terminate the cable correctly
• Check the taps
• Check for cables too close to equipment that emits electromagnetic interference

Possible Problem: Component problems
• Deaf or partially deaf node
• Failing repeater, transceiver, or controller cards
Possible Action: Correct the component problem by doing one of the following:
• Trace the failing component and replace it
• Replace the NIC or the transceiver

4. Trace Too Short Errors and Too Long Errors to the sender.

These errors often indicate faulty routers or LAN drivers and transceiver problems. Table 19 identifies the problems that this data can indicate and your possible actions.

Table 19

Possible Problem: A transceiver on your network is adding bits to the packets that are transmitted by the attached station.
Possible Action:
1. Use a network analyzer to identify the problematic transceiver.
2. If necessary, replace the transceiver, network adapter, or station.

Possible Problem: The jabber protection mechanism on a transceiver has failed; it can no longer protect the network from the jabbering produced by the attached station.
Possible Action: Replace the network card.

Possible Problem: Excessive noise on the cable
Note: Some 10/100 Mbps cards that autodetect the network speed may connect to the network at the wrong speed, causing excessive noise.
Possible Action: Check for improper cabling, faulty cable, faulty network equipment, or cables too close to noisy electronic equipment (lamps, for example).
If your network card autodetects the network speed and you have ruled out other problems, manually configure the network speed.

Possible Problem: Faulty routers (two different network types are connected and the router is not enforcing proper frame size restrictions)
Possible Action: Notify the manufacturer.

Possible Problem: Faulty LAN driver
Possible Action: Replace the driver.

Possible Problem: A normal condition on a LinkSwitch® 1000, LinkSwitch® 3000, or CoreBuilder 5000 FastModule
Possible Action: If you use maximum-sized 1518-byte Ethernet frames, the device's VLT-enabled ports add a frame tag of 4 bytes, resulting in a misleading Too Long Frame error.
These frames are passed successfully but will create the Too Long Frame error message.
If you want to eliminate the error message, reduce your Ethernet packet frames by 4 bytes.

Device View

Device View allows you to display a variety of port and device-level statistics relevant to Ethernet packet loss. Table 20 describes these statistics and their use in troubleshooting.

Table 20 Activity and Error Statistics in Device View

Statistics Group: Activity
Description: Displays the total network activity and errors on the selected port.
Use in Troubleshooting: This data shows readable packets, broadcast packets, Collisions, total errors, and runts, which cause Too Short Errors. You can interpret this data in the following ways:
• The presence of runts can often be caused by Collisions; however, if the values increase at specific times of the day, it may indicate you need to change the network topology to manage the traffic more efficiently (for example, with switches or routers).
• Runts can also be caused by a badly terminated coax cable.
• Large numbers of runts not associated with high levels of collisions can indicate a transmission problem (examine the cable).
• Particularly high numbers of Collisions compared to the total number of readable packets can point to a hardware problem (a bad adapter) or to a data loop.
• A high proportion of Broadcast Packets (>10%) on a heavily utilized network (>50% of available bandwidth) can point to an incorrectly configured bridge or router on the network.

Statistics Group: Errors
Description: Displays the number of frames with errors on the selected port.
Use in Troubleshooting: The significance of errors depends on accompanying errors and prevailing network conditions. See the following error data for more information:
• Alignment Errors: Table 16
• FCS Errors: Table 16
• Too Long Errors: Table 19
• Too Short Errors, or runts: Table 19
• Late Collisions: Table 18

To display Activity and Errors statistics for a device or port, follow these steps:

1. Select the required port or device.

2. From the shortcut menu, select Activity or Errors.

The statistics available depend on the type of port or device selected. See Table 20 for troubleshooting information.

Ethernet Packet Loss Reference

This section explains terms that are relevant to Ethernet packet loss and provides additional conceptual and problem analysis detail.

Alignment Errors

An Alignment Error indicates a received frame in which both are true:

• The number of bits received is an uneven byte count (that is, not an integral multiple of 8)
• The frame has a Frame Check Sequence (FCS) error

Alignment Errors often result from MAC layer packet formation problems, cabling problems that cause corrupted or lost data, and packets that pass through more than two cascaded multiport transceivers. See FCS Errors for more information about interpreting Alignment Errors.


Collisions

Collisions indicate that two devices detect that the network is idle and try to send packets at exactly the same time (within one round-trip delay). Because only one device can transmit at a time, both devices must stop sending and attempt to retransmit. Collisions are detected by the transmitting stations.

The retransmission algorithm helps to ensure that the packets do not retransmit at the same time. However, if the two devices retry at nearly the same time, packets can collide again; the process repeats until either the packets finally pass onto the network without collisions, or 16 consecutive collisions occur and the packets are discarded.
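The retransmission algorithm is not spelled out above. The Python sketch below assumes it is the truncated binary exponential backoff that IEEE 802.3 specifies: after the n-th consecutive collision a station waits a random number of slot times between 0 and 2^k - 1, where k is the smaller of n and 10. The function names and the `collides` callback are hypothetical and exist only to make the 16-attempt limit visible.

```python
import random

MAX_ATTEMPTS = 16   # the frame is discarded after 16 consecutive collisions
BACKOFF_LIMIT = 10  # the backoff range stops doubling after the 10th collision


def max_backoff_slots(collision_count):
    """Largest delay, in slot times, allowed after the n-th collision."""
    return 2 ** min(collision_count, BACKOFF_LIMIT) - 1


def backoff_slots(collision_count, rng=random):
    """Pick a random delay, in slot times, after the n-th consecutive collision."""
    if not 1 <= collision_count <= MAX_ATTEMPTS:
        raise ValueError("collision_count must be between 1 and 16")
    return rng.randint(0, max_backoff_slots(collision_count))


def transmit(collides, rng=random):
    """Simulate sending one frame.

    `collides(attempt)` is a hypothetical callback that reports whether the
    given attempt (1-based) hit a collision. Returns ("sent", attempts) or
    ("discarded", 16).
    """
    for attempt in range(1, MAX_ATTEMPTS + 1):
        if not collides(attempt):
            return ("sent", attempt)
        if attempt < MAX_ATTEMPTS:
            # A real adapter would wait this many slot times before retrying.
            backoff_slots(attempt, rng)
    return ("discarded", MAX_ATTEMPTS)


print(transmit(lambda attempt: attempt < 3))  # two collisions, then success
print(transmit(lambda attempt: True))         # every attempt collides
```

The second call models the Excessive Collisions case: every attempt collides, so the frame is dropped after the 16th try.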

CRC Errors

A Cyclic Redundancy Check (CRC) Error is an RMON statistic that combines FCS Errors and Alignment Errors. These errors indicate that packets were received with:

• A bad FCS and an integral number of octets (FCS Errors)
• A bad FCS and a non-integral number of octets (Alignment Errors)

CRC Errors can cause an end station to freeze. If a large number of CRC Errors are attributed to a single station on the network, replace the station's network interface board. Typically, a CRC Error rate of more than 1 percent of network traffic is considered excessive.
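As a worked example of the 1 percent guideline, the Python sketch below adds FCS Errors and Alignment Errors into a CRC Error count and expresses it as a percentage of packets. The function names and sample counts are hypothetical, and the sketch assumes "network traffic" means the total number of packets seen in the same sampling interval.

```python
def crc_error_percent(fcs_errors, alignment_errors, total_packets):
    """CRC Errors (FCS Errors plus Alignment Errors) as a percentage of packets."""
    if total_packets <= 0:
        raise ValueError("total_packets must be positive")
    return 100.0 * (fcs_errors + alignment_errors) / total_packets


def is_excessive(percent, threshold=1.0):
    """True when the CRC Error rate is above the 1 percent guideline."""
    return percent > threshold


# Hypothetical counters from one sampling interval.
percent = crc_error_percent(fcs_errors=120, alignment_errors=30, total_packets=10000)
print(percent, is_excessive(percent))
```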

Excessive Collisions

Excessive Collisions indicate that 16 consecutive collisions have occurred, usually a sign that the network is becoming congested. For each excessive collision count (or after 16 consecutive collisions), a packet is dropped. If you know the normal rate of excessive collisions, then you can determine when the rate of packet loss is affecting your network's performance. See Knowing Your Network's Configuration for more information.

FCS Errors

Frame Check Sequence (FCS) Errors, a type of CRC, indicate that frames received by an interface are an integral number of octets long but do not pass the FCS check. The FCS is a mathematical way to ensure that all the frame's bits are correct without having the system examine each bit and compare it to the original. Packets with Alignment Errors also generate FCS Errors.

Both Alignment Errors and FCS Errors can be caused by equipment powering up or down or by interference (noise) on unshielded twisted-pair (10BASE-T) segments. In a network that complies with the Ethernet standard, FCS or Alignment Errors indicate bit errors during a transmission or reception. A very low rate is acceptable. Although Ethernet allows a 1 in 10^8 bit error rate, typical Ethernet performance is 1 in 10^12 or better.
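To see what these bit error rates mean per frame, the Python sketch below estimates the chance that a maximum-sized 1518-octet frame contains at least one bit error. It assumes bit errors are independent, which real noise often is not, so treat the result as an order-of-magnitude guide. The function name is hypothetical.

```python
import math


def frame_error_probability(bit_error_rate, frame_octets=1518):
    """Probability that a frame has at least one bit error.

    Assumes independent bit errors: p = 1 - (1 - BER) ** bits.
    log1p/expm1 keep the result accurate for very small error rates.
    """
    bits = frame_octets * 8
    return -math.expm1(bits * math.log1p(-bit_error_rate))


allowed = frame_error_probability(1e-8)   # the rate that Ethernet allows
typical = frame_error_probability(1e-12)  # typical performance
print(f"allowed: about 1 frame in {1 / allowed:,.0f}")
print(f"typical: about 1 frame in {1 / typical:,.0f}")
```

Under this assumption, the allowed rate corresponds to roughly one damaged maximum-sized frame in every 8,000, while the typical rate corresponds to roughly one in 80 million.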

Late Collisions

Late Collisions indicate that two devices have transmitted at the same time, but cabling errors (most commonly, excessive network segment length or repeaters between devices) prevent either transmitting device from detecting a collision. Neither device detects a collision because the time to propagate the signal from one end of the network to the other is longer than the time to put the entire packet on the network. As a result, neither of the devices that cause the late collision senses the other's transmission until the entire packet is on the network.

Although late collisions occur for small packets, the transmitter cannot detect them. As a result, a network suffering measurable Late Collisions for large packets is losing small packets as well.

Nonstandard Ethernet Problems

Table 21 lists the symptoms that typically occur if a system violates the Ethernet standard.

Table 21 Symptoms of Common Ethernet Network Problems

Symptoms: FCS Errors and Alignment Errors increase significantly.
Problem: Network cabling is too long.
Notes: If you use a promiscuous network monitor, the number of Late Collisions reported by stations should correlate with the FCS and Alignment Errors reported by the monitor.

Symptoms: FCS and Alignment Errors increase proportionally with interference (sometimes referred to as noise hits).
Problem: Network segment is noisy.
Notes: Typically observed on a 10BASE-T network segment in a noisy environment. If you use multiple promiscuous monitors, the FCS and Alignment Errors among the monitors will not correlate.

Symptoms: If the monitor can track runts, also called Too Short Errors, the number of runt packets should be significantly higher than normal. FCS and Alignment Errors are much higher than normal.
Problem: Networks do not conform to the access scheme of Carrier Sense Multiple Access with Collision Detect (CSMA/CD).
Notes: Occurs when some implementations of Ethernet in the segment are not entirely compatible with IEEE 802.3 repeaters.
Collision fragments linger on the network long enough to collide with retry packets at the minimum interpacket gap (IPG). The IPG is smaller on one side of the repeated network, causing a lost packet.
Ethernet controllers cannot receive packets that are separated by 4.7 µs or less. Some controllers cannot sustain receptions of packets separated by as much as 9.6 µs. If runt packets are received one after another and are followed by a collision fragment, Ethernet controllers that cannot sustain reception will lose packets.

Receive Discards

Receive Discards indicate that received packets could not be delivered to a high-layer protocol because of congestion or packet errors.

Too Long Errors

A Too Long Error indicates that a packet is longer than 1518 octets (including FCS octets) but otherwise well formed. Too Long Errors are often caused by a bad transceiver, a malfunction of the jabber protection mechanism on a transceiver, or excessive noise on the cable.

Too Short Errors

A Too Short Error, also called a runt, indicates that a packet is fewer than 64 octets long (including FCS octets) but otherwise well formed.
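The size and FCS definitions in this reference section can be combined into one classifier, sketched below in Python. It is a simplification: the function name is hypothetical, frames that fail the FCS check are reported as FCS or Alignment Errors regardless of their length, and a frame that passes the FCS check is sized by its whole octets.

```python
MIN_FRAME_OCTETS = 64    # shorter well-formed frames are Too Short Errors (runts)
MAX_FRAME_OCTETS = 1518  # longer well-formed frames are Too Long Errors


def classify_frame(bit_count, fcs_ok):
    """Classify a received frame using the definitions in this section."""
    whole_octets, extra_bits = divmod(bit_count, 8)
    if not fcs_ok:
        # Bad FCS: a non-integral octet count makes it an Alignment Error.
        return "alignment error" if extra_bits else "fcs error"
    if whole_octets > MAX_FRAME_OCTETS:
        return "too long error"
    if whole_octets < MIN_FRAME_OCTETS:
        return "too short error"
    return "ok"


print(classify_frame(64 * 8, fcs_ok=True))
print(classify_frame(1519 * 8, fcs_ok=True))
print(classify_frame(100 * 8 + 3, fcs_ok=False))
```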

Transmit Discards

Transmit Discards indicate that packets were not transmitted because of network congestion.

FDDI Ring Errors

Use these sections to identify and correct FDDI ring errors:

• FDDI Ring Errors Overview
• Identifying Ring Errors

Fiber Distributed Data Interface (FDDI) often corrects its own problems. However, because FDDI cannot correct all errors (especially those related to hardware problems), you should monitor FDDI errors.

Understanding the Problem

FDDI ring errors that you should monitor include:

• Elasticity Buffer Error Condition
• Frame Error Condition
• Frames Not Copied Condition
• Link Error Condition

Identifying the Problem

First, determine the type of FDDI ring errors and where they are occurring. Similar to the way you identify other FDDI problems, identify the upstream and downstream neighbors of the devices that you are monitoring.

Several types of network errors can cause FDDI performance problems. For example, problems with cables or physical connections may result in a link or frame error. Elasticity buffer (EB) errors can also lead to link and frame errors.

FDDI deals with port-related errors as follows:

• The variable PORTlerAlarm is the link error rate (LER) value at which a link connection generates an alarm. When the LER is greater than the alarm setting, Station Management (SMT) sends a Status Report Frame (SRF) to notify you that there is a problem with a port.

The PORTlerAlarm threshold is set lower than the PORTlerCutoff threshold so that you are notified of a problem before the port is actually removed from the ring.

• When link errors reach the threshold defined by the variable PORTlerCutoff, SMT breaks the connection, disabling the PHY that detected the problem. A Link Error Condition is also generated.
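The two-threshold behavior can be pictured with a short Python sketch. It is only a model of the description above, not of any SMT implementation: the function name is hypothetical, the thresholds are treated as plain error rates (errors per bit), and the sample values are invented. Real devices may encode these variables differently (for example, as exponents), so check your device documentation before comparing numbers.

```python
def ler_action(ler, alarm, cutoff):
    """Return what the text says happens at a given link error rate.

    ler, alarm and cutoff are error rates in the same unit. The alarm
    threshold must be lower than the cutoff threshold.
    """
    if not alarm < cutoff:
        raise ValueError("alarm threshold must be lower than cutoff threshold")
    if ler >= cutoff:
        return "cutoff"  # SMT breaks the connection; Link Error Condition
    if ler > alarm:
        return "alarm"   # SMT sends a Status Report Frame
    return "ok"


# Hypothetical thresholds, for illustration only.
ALARM, CUTOFF = 1e-8, 1e-7
for rate in (1e-9, 5e-8, 1e-7):
    print(rate, ler_action(rate, ALARM, CUTOFF))
```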

FDDI deals with MAC-related errors as follows:

• When MAC frame errors reach a certain threshold, a Frame Error Condition is generated. Because the actual error can be further upstream than the immediate connection, the connection remains intact.

• For a large network, the worst case MACFrameErrorRatio is less than 0.1 percent. However, during network configuration, frame error ratios can reach 50 percent for short periods. When you detect a sustained frame error ratio of more than 0.1 percent, a problem exists between the station that is reporting the condition and the nearest upstream MAC.
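The difference between a short burst during network configuration and a sustained problem can be sketched in Python as follows. The text does not say how long "sustained" is, so the three-consecutive-samples rule, the function name, and the sample values are all assumptions made for illustration.

```python
def sustained_frame_errors(samples_percent, threshold=0.1, min_consecutive=3):
    """True if the frame error ratio stays above the threshold long enough.

    samples_percent is a sequence of periodic frame error ratio readings,
    in percent. min_consecutive is an assumed definition of "sustained".
    """
    run = 0
    for sample in samples_percent:
        run = run + 1 if sample > threshold else 0
        if run >= min_consecutive:
            return True
    return False


# Hypothetical readings: a brief 50 percent spike during reconfiguration...
print(sustained_frame_errors([0.02, 50.0, 0.03, 0.05]))
# ...versus a ratio that stays above 0.1 percent.
print(sustained_frame_errors([0.05, 0.2, 0.3, 0.25]))
```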

Solving the Problem

To solve problems related to FDDI errors, fix the hardware, cabling, or congestion problem.

Identifying Ring Errors


Use Status Watch to monitor your FDDI devices for Warning or Critical alerts.

Status Watch

Use Status Watch to identify FDDI ring errors.

Follow these steps:

1. Monitor the FDDI Status tool for the currently selected device.

2. Determine whether Status Watch is reporting Elasticity Buffer Errors or a high percentage of Frame Errors, Frames Not Copied, or Link Error Rates for the currently selected device.

FDDI Ring Errors Reference

This section provides additional conceptual and problem analysis detail.

Elasticity Buffer Error Condition

The Elasticity Buffer Error condition occurs when a port's elasticity buffer overflows or underflows. This condition usually indicates that a port's hardware is not operating within the tolerances that the FDDI standard specifies. Look for the problem in the hardware of either the port that is reporting the condition or of the immediately adjacent port.

Frame Error Condition

The Frame Error condition occurs when the percentage of frames that contain errors exceeds a preset threshold. In the situation when a device is an uplink to FDDI (that is, a device is transmitting onto FDDI), this type of condition indicates that the ring is saturated. The ring is out of buffer space, and packets are being dropped from the device's backbone port.

The problem indicated by the frame errors is usually located between the MAC that reports the condition and its upstream neighbor. Because many physical connections can lie along this path, the MACFrameErrorRatio variable identifies only the two MACs between which the problem is occurring.

Frames Not Copied Condition

The Frames Not Copied condition occurs when the percentage of frames that are dropped because of insufficient buffer space exceeds a preset threshold. This condition indicates that the station is congested and is unable to process frames as quickly as they arrive. To help eliminate congestion:

• Add more capacity to the station
• Reconfigure your network so that end stations that communicate heavily with one another are on the same bridge or switch
• Filter out certain traffic

Link Error Condition

The Link Error condition occurs when a port detects link errors at a rate that exceeds a preset threshold. When the Link Error threshold is exceeded, the station removes itself from the ring and tries to reinsert itself on the ring. This action creates a MAC Neighbor Change Event (which also occurs if a ring wraps).

Link errors may indicate an FDDI PHY hardware problem (such as a faulty transmitter) or a faulty cable or connector. Look for the problem in the portion of the network between the port that is reporting the condition and the first upstream transmitter.

MAC Neighbor Change Event

The MAC Neighbor Change event occurs when a MAC's upstream or downstream neighbor changes.

This event indicates either:

• A network reconfiguration
• Another station that is leaving or joining the ring

Network File Server Timeouts

Use these sections to identify and correct timeouts on network file servers:

• Network File Server Timeout Overview
• Looking for Obvious Errors
• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

See Network File Server Timeouts Reference for additional conceptual and problem analysis detail.

A network file server can time out if your network gets congested or if your server is having problems. Users might have problems downloading data from or to the server or copying files from or to the server. To help you to understand the troubleshooting process for this type of problem, an EXAMPLE throughout this section follows the symptoms, analysis, and resolution of a typical file server timeout problem.


Understanding the Problem

When users log in, their stations make network file server calls either to determine quotas (if this feature has been enabled) or to mount user home directories. The network file server timeout messages, even when spread across multiple nodes, indicate a problem either with the network or with a server.

EXAMPLE: UNIX users notice that it takes a long time (over 30 seconds in some cases) to log in to any machine. Some machines report network file server timeout messages, but the messages have no obvious pattern and are infrequent. You begin to get a sense of the problem.

Identifying the Problem

First, rule out the obvious causes. Ask these questions:

• Can you access the network file server with Telnet?
• Have any alarms been triggered?
• Are there any new errors?

The process of identifying the problem is developed in Looking for Obvious Errors.

Solving the Problem

To determine the cause, reproduce the fault while you monitor the network. After you know the cause, you can fix the problem.

The solutions to the network file server timeout are identified in these sections:

• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

Looking for Obvious Errors

To look for obvious errors, use these applications:

• Ping and Telnet - To check connectivity to the network file server nodes
• LANsentry Manager Alarms View - To search for triggered alarms
• LANsentry Manager Statistics View - To look for errors
• LANsentry Manager History View - To identify trends


Ping and Telnet

Determine whether you can contact network file server nodes using Ping and Telnet. If the response is extremely slow, then a problem may exist with the connections to the nodes. No delay indicates that the connections are normal, implying that the delay is occurring elsewhere. In this case, use LANsentry® Manager tools to determine whether packets are being lost or ignored.

LANsentry Manager Alarms View

Using the LANsentry Manager Alarms View, you can determine if any configured alarms have been triggered.

Search the Alarms View to see if any MAC events have been logged.

EXAMPLE: MAC events have not been logged for the network to which the UNIX users are attached.

Even though no alarms have occurred, errors may exist. For example, a lower rate of background errors may exist just below the alarm threshold. Based on maximum and minimum values, RMON errors may miss constant, periodic, or low amounts of errors.

LANsentry Manager Statistics View

Using the LANsentry Manager Statistics View, you can display a multisegment graph of utilization and error statistics.

Follow these steps:

1. Set up a graph that shows utilization and errors on all your major segments.

2. Determine whether any segments are particularly busy or error prone.

EXAMPLE: You notice that one segment of the UNIX network, HUB3, is reporting Too Long Errors and FCS Errors roughly every second sample. While the amount is not higher than normal, it is currently higher than on any other segment.

LANsentry Manager History View

Using the LANsentry Manager History View, you can display a rolling history table to determine if the errors that you are seeing are new. For example, if you have a history table that runs for 30-minute samples over two days, you can compare the most recent sample to a previous sample, looking for new errors. If your probe has the resources, use a much finer resolution sample stored for a shorter time (every 30 seconds for 2 hours) to more easily spot recent errors.

EXAMPLE: You see that the history table shows that no error rates remained constant throughout the day. However, errors that did occur were on the device HUB3 and were Too Long Errors and FCS Errors.
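
Comparing the most recent history sample with an earlier one can be sketched in a few lines of Python. This is only an illustration: it assumes each sample has been exported as a dictionary of error counters, and the counter names, the sample values, and the function name new_errors are hypothetical rather than part of LANsentry Manager.

```python
def new_errors(previous_sample, latest_sample):
    """Return counters that are nonzero now but were zero (or absent) before."""
    return {
        name: count
        for name, count in latest_sample.items()
        if count > 0 and previous_sample.get(name, 0) == 0
    }

# Hypothetical 30-minute samples for one segment.
earlier = {"FCS Errors": 0, "Too Long Errors": 0, "Collisions": 12}
latest = {"FCS Errors": 3, "Too Long Errors": 1, "Collisions": 15}
print(new_errors(earlier, latest))
```

Counters that were already nonzero in the earlier sample (Collisions here) are treated as background and are not reported as new.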

Reproducing the Fault While Monitoring the Network

Although the RMON View in LANsentry Manager can show error rates and help you to identify the location of the problem, it may not provide enough data to solve the problem. To determine the cause of the problem, reproduce it while you monitor the network by using these applications:

• LANsentry Manager Top-N Graph - To locate a quiet node to use for reproducing the fault
• LANsentry Manager Packet Capture - To capture packets from the hub to which the quiet node is attached
• LANsentry Manager Packet Decode - To analyze the packets to assess network file server traffic and delays
• Address Tracker - To find the location of the problem nodes

EXAMPLE: Using LANsentry Manager, you find a hub on the network with a higher than normal error rate. However, the error rate does not seem high enough to cause login delays of 60 seconds or more.

LANsentry Manager Top-N Graph

Using the Top-N graph in the LANsentry Manager main window, locate a quiet node that has been showing the same problem. Choose a quiet node so that you do not receive excessive traffic when you try to isolate the problem.

EXAMPLE: You see that the node Monolith, which has the same Network File System (NFS) mounts as the other nodes on the network, is quiet. You decide to use this node for reproducing the fault. See Network File System (NFS) Protocol for more information about NFS.

LANsentry Manager Packet Capture

Using the LANsentry Manager Packet Capture application, capture packets from the network using predefined patterns and start-and-stop conditions.


Follow these steps:

1. Set up a capture buffer on a probe that is connected to the same hub as the quiet node. Until you know more about the problem, set a very general filter.

EXAMPLE: You select a MAC-layer filter and set a conversation filter to capture all packets to and from Monolith.

2. Telnet into and log out from the quiet node. Then reset the capture buffer. Repeat this procedure until you see the problem reflected in your captured data. To keep the buffer information clean, reset the buffer each time that you repeat the procedure.

3. When you see the delay, note the rough value of the packet count on the LANsentry Manager packet buffer.

By noting the packet count at which you think the delay has occurred, you can narrow the problem to within about 20 packets in the buffer. If you have used an extremely quiet node, you may even identify the exact packet.

LANsentry Manager Packet Decode

The LANsentry Manager Packet Decode application decodes all major protocols and displays the packet contents at three levels of detail: summary information, header information, and actual packet content.

Follow these steps:

1. Open the buffer in the Packet Decode application and locate the number of the packet at which the delay occurred.

2. Select the packet and launch a MAC-layer conversation filter. In the filter display, look for a gap in the conversation (that is, where the node sent a request and then resent it at approximately the same rate as the delay you experienced when recreating the problem).

3. Repeat the test to determine if the result concentrates on one node or if it appears on other nodes.

EXAMPLE: On the quiet node that you selected, the delay is obvious. You see an NFS request going out to a node and a repeat of the request 30 seconds later. During that time, the node did not respond. You now know that the delay occurred because nodes were not seeing responses for NFS requests. When you repeat the test on other nodes, you find that the delay is happening with more than one destination node.
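
The search for a gap in a conversation can also be sketched in code. The Python sketch below assumes the capture has been exported as a list of (timestamp in seconds, source, destination, request identifier) tuples. That record layout, the node name server1, the 10-second minimum gap, and the function name find_retry_gaps are all assumptions made for illustration; they are not part of LANsentry Manager.

```python
from collections import defaultdict

def find_retry_gaps(packets, min_gap=10.0):
    """Return (src, dst, request_id, gap_seconds) for requests that were resent.

    packets: iterable of (timestamp, src, dst, request_id) tuples.
    This layout is an assumption made for illustration only.
    """
    seen = defaultdict(list)
    for timestamp, src, dst, request_id in packets:
        seen[(src, dst, request_id)].append(timestamp)

    gaps = []
    for (src, dst, request_id), times in seen.items():
        times.sort()
        # Compare each transmission of a request with the next one.
        for earlier, later in zip(times, times[1:]):
            if later - earlier >= min_gap:
                gaps.append((src, dst, request_id, later - earlier))
    return gaps

# Hypothetical capture: request 101 is sent, gets no reply, and is resent.
capture = [
    (0.0, "monolith", "server1", 101),
    (0.2, "monolith", "server1", 102),
    (30.0, "monolith", "server1", 101),
]
gaps = find_retry_gaps(capture)
print(gaps)
```

A resend after roughly the same interval as the login delay points to requests whose replies never arrived.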


Address Tracker

Use Address Tracker, which polls managed devices, to determine the hubs to which the problem nodes are attached. If the problem end stations are located on unmanaged devices, then you can at least narrow the problem to those unmanaged devices.

EXAMPLE: Although your network does not have managed hubs that Transcend® NCS management software can poll, it does have managed switches. When it polls the switches, Address Tracker displays the switch ports on which addresses were last seen. This information indicates the hub (but not the hub port) on which the device is located.

LANsentry Manager Packet Decode

After you know the location of the hub that has the problem node, monitor the problem from the hub using LANsentry Manager Packet Decode.

Follow these steps:

1. To capture packets from one of the nodes on the hub, set up another capture buffer and repeat the exercise that is described in LANsentry Manager Packet Capture. Because a delay may occur on a different node, use two capture buffers without stopping the first one.

Note the rough packet count where the delay appears.

2. Display a conversation filter of the packet where the delay appears and look for the gap in the conversation.

EXAMPLE: You hope that the nodes are on the same hub. You find that all the nodes are on HUB3. This result indicates that FCS Errors may be causing the timeouts. However, because the errors occur at a low rate, you decide to verify this diagnosis. You monitor the problem from the hub, logging in and out many times, and the delay eventually occurs. This time, the delay shows that the node's reply had an FCS Error even though the node received the request. The switch would not have transmitted this packet, causing a timeout on the NFS protocol. The retry time is presumably 30 seconds. During this test, you see the problem occurring on another node.

Correcting the Fault


Without a managed hub, you may find it very difficult to discover network file server timeout errors. To find the problematic node, you must either systematically isolate nodes by monitoring each node for a prolonged period or temporarily insert a managed hub.

EXAMPLE: You notice that the captured error packet failed FCS because it was corrupted by a regular pattern during transmission. A possible reason for this occurrence is a Jabbering node. This explanation makes sense because FCS/Jabber frames increased linearly when you were monitoring the live network.

Network File Server Timeouts Reference

This section explains terms that are relevant to network file server timeouts and provides additional conceptual and problem analysis detail.

Jabbering

A condition in which a node transmits illegal-length packets and is possibly not operating within carrier specifications. In effect, another node has written bad data over a valid packet. This bad data is often interpreted as a repeated sequence of data.

Network File System (NFS) Protocol

A distributed file system protocol, developed by Sun Microsystems, that allows a computer system to access files over a network as if they were on its local disks. This protocol has been incorporated into products by more than 200 companies. It is now a de facto Internet standard.

NFS is one protocol in the NFS suite of protocols, which includes NFS, RPC, XDR (External Data Representation), and others. These protocols are part of a larger architecture that Sun Microsystems refers to as Open Network Computing (ONC). ONC is a distributed applications architecture designed by Sun and currently controlled by a consortium led by Sun.

Troubleshooting TCP/IP - Detailed Steps

This article shows how to troubleshoot TCP/IP connectivity between computers on a Windows network. If you haven't already done so, disable XP's Internet Connection Firewall on all local area network connections and remove all firewall programs on the network. Improperly configured firewalls are the most common cause of TCP/IP problems.


Open a Command Prompt Window

For many of these steps, you'll be typing at the command prompt. To open a command prompt window in Windows 2000 or XP, click Start | Run, type cmd in the box, and click OK. To open a command prompt window in Windows 95, 98, or Me, click Start | Run, type command in the box, and click OK. Type one command per line and press Enter after each one to execute it. To close the command prompt window, use the exit command.

Determine the TCP/IP Settings

Determine the TCP/IP settings of each computer on the local area network. In XP, open the Network Connections folder, right click the LAN connection, and click Status | Support | Details. For example, here are the Status and Details views for the LAN connection on an Internet Connection Sharing host.

In Windows 95/98/Me, click Start | Run, type winipcfg in the box, and click OK. Select the LAN adapter from the menu and click More Info. Here's the winipcfg view for an ICS client running Windows Me.

You can also see the TCP/IP settings from the command prompt. This is especially convenient if a computer has more than one network adapter. Use the ipconfig /all command, which is available in all versions except Windows 95. The output from this command can be long, so it's best to write it to a file. Specify the file name in the command this way:

ipconfig /all >ipconfig.txt

Description of TCP/IP Settings

Here are the TCP/IP settings that are used in network troubleshooting:

• IP Address - Unique address assigned to a network adapter. A computer with multiple network adapters has an IP address for each one, and each one must be in a different subnet.

• Subnet Mask - Used in conjunction with the IP address to determine which subnet an adapter belongs to. At the simplest level, communication is only possible between two network adapters when they're in the same subnet.

• Default Gateway - IP address of a computer or router, on one of this computer's local area networks, that knows how to communicate with subnets not present on this computer. For an Internet connection, the default gateway is a router belonging to your Internet service provider, and all access to sites on the Internet goes through it. For an ICS client, the default gateway is the ICS host. If you use a hardware router, it serves as the default gateway.

• DHCP Server - If an adapter is configured to obtain an IP address automatically, this is the address of the server that provides it. It could be your ISP, an ICS host, or a hardware router.

• DNS Servers - IP address of one or more Domain Name Server computers. DNS servers translate Internet names to their IP addresses (like 63.146.109.227).
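
How the subnet mask and IP address together decide subnet membership can be illustrated with a short Python sketch that uses the standard ipaddress module. The addresses and the function name same_subnet are made up for illustration, and the sketch assumes both adapters use the same subnet mask.

```python
import ipaddress

def same_subnet(ip_a, ip_b, subnet_mask):
    """Return True if both addresses fall in the same subnet under the mask."""
    # strict=False lets us pass a host address and get its enclosing network.
    net_a = ipaddress.ip_network(f"{ip_a}/{subnet_mask}", strict=False)
    net_b = ipaddress.ip_network(f"{ip_b}/{subnet_mask}", strict=False)
    return net_a == net_b

# Illustrative addresses only.
print(same_subnet("192.168.0.10", "192.168.0.20", "255.255.255.0"))
print(same_subnet("192.168.0.10", "192.168.1.20", "255.255.255.0"))
```

With a mask of 255.255.255.0, the first three numbers of the two addresses must match for the adapters to be in the same subnet.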

Subnets

See our article on subnets for a brief description of how they work. For more details, see this Microsoft Knowledge Base article.

If two computers are supposed to be on the same subnet but aren't, something is wrong with the network hardware or software configuration. This is most likely to happen when one of them receives an IP address of 169.254.x.x, which indicates that:

• It's configured to obtain an IP address automatically
• It couldn't find a DHCP server on the network to make the assignment
• Windows assigned it an Automatic Private IP Address
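
A quick way to spot an Automatic Private IP Address in a list of collected settings is to test whether the address falls in the 169.254.0.0/16 range. The following Python sketch uses the standard ipaddress module; the function name and the sample addresses are illustrative.

```python
import ipaddress

# The Automatic Private IP Address range (169.254.x.x).
APIPA_RANGE = ipaddress.ip_network("169.254.0.0/16")

def is_automatic_private(ip):
    """Return True if the address was most likely self-assigned by Windows."""
    return ipaddress.ip_address(ip) in APIPA_RANGE

for address in ["169.254.12.7", "192.168.0.10"]:
    print(address, is_automatic_private(address))
```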

See our article on Specific Networking Problems and Their Solutions for more information.

Pinging

The ping command is the basic tool for testing TCP/IP connectivity. It sends a special packet (called ICMP Echo) to a particular IP address and looks for a reply. If everything is working right, the reply comes back. If not, the ping times out in a few seconds. By default, the ping command repeats the process four times. Here's an example of an ICS client computer pinging a Windows XP Home Edition ICS host, using the host's IP address and its computer name.

When ping fails, you'll see one of these error messages:

• Request timed out - The IP address is valid, but there's no reply from it. If the IP address is on a local area network, the most likely cause is a firewall program blocking the ping.

• Unknown host <name> or Ping request could not find host <name> - The computer name doesn't exist on the local area network. Make sure that NetBIOS over TCP/IP is enabled.

• Destination host unreachable - The IP address isn't on a local area network, and the default gateway can't access it. Either there's no default gateway, its address is wrong, or it isn't functioning.
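
If you save ping output to a file, the three messages above can be matched automatically. The Python sketch below is a minimal illustration: it looks only for the message text listed above, the function name diagnose_ping is hypothetical, and the exact wording of ping output may differ between Windows versions.

```python
def diagnose_ping(output):
    """Map saved ping output to the likely cause described in this section."""
    text = output.lower()
    if "request timed out" in text:
        return "No reply; on a LAN, most likely a firewall blocking the ping"
    if "unknown host" in text or "could not find host" in text:
        return "Name not found; check that NetBIOS over TCP/IP is enabled"
    if "destination host unreachable" in text:
        return "Default gateway missing, wrong, or not functioning"
    return None  # No known failure message found.

print(diagnose_ping("Request timed out."))
print(diagnose_ping("Ping request could not find host win98."))
```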


Pinging the Local Area Network

Here is a series of ping commands to use in finding where a problem occurs on a local area network. Run them in the order shown, and don't go on to the next command until all of the previous commands work properly. In this example:

• The computer being tested is named Winxp, with IP address 192.168.1.101
• There's another computer on the network named Win98, with IP address 192.168.1.123

Substitute the appropriate IP addresses and computer names for your network.

Command              Target                          What Ping Failure Indicates
ping 127.0.0.1       Loopback address                Corrupted TCP/IP installation
ping localhost       Loopback name                   Corrupted TCP/IP installation
ping 192.168.1.101   This computer's IP address      Corrupted TCP/IP installation
ping winxp           This computer's name            Corrupted TCP/IP installation
ping 192.168.1.123   Another computer's IP address   Bad hardware or NIC driver
ping win98           Another computer's name         NetBIOS name resolution failure

To fix a corrupted TCP/IP installation on Windows XP, follow the steps in this Microsoft Knowledge Base article. For Windows 95/98/Me, uninstall the TCP/IP protocol in Control Panel | Network, reboot, and reinstall it. If that doesn't fix it, use this procedure on Windows 95 or 98.
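
The rule of running the local area network pings in order and stopping at the first failure can be sketched as a small Python script. This is an illustration under stated assumptions: the helper names (lan_checks, system_ping, first_failure) are hypothetical, the addresses and computer names are example values, and system_ping relies on the exit code of the operating system's ping command, which on some Windows versions can report success even when the reply is "Destination host unreachable".

```python
import platform
import subprocess

def lan_checks(own_ip, own_name, other_ip, other_name):
    """Ordered checks: (ping target, what a failure indicates)."""
    return [
        ("127.0.0.1", "Corrupted TCP/IP installation"),
        ("localhost", "Corrupted TCP/IP installation"),
        (own_ip, "Corrupted TCP/IP installation"),
        (own_name, "Corrupted TCP/IP installation"),
        (other_ip, "Bad hardware or NIC driver"),
        (other_name, "NetBIOS name resolution failure"),
    ]

def system_ping(target):
    """Send one echo request with the operating system's ping command."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    result = subprocess.run(
        ["ping", count_flag, "1", target],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0

def first_failure(checks, ping=system_ping):
    """Run the checks in order and stop at the first one that fails."""
    for target, meaning in checks:
        if not ping(target):
            return target, meaning
    return None  # Every ping succeeded.

checks = lan_checks("192.168.1.101", "winxp", "192.168.1.123", "win98")
# Stand-in for a real ping so the example is repeatable: only "win98" fails.
result = first_failure(checks, ping=lambda target: target != "win98")
print(result)
```

To test a real network, call first_failure(checks) without the stand-in so that system_ping is used.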

Pinging the Internet

You can also use ping to find a problem with Internet access. Run these commands in the order shown, and don't go on to the next command until all of the previous commands work properly. Use the Default Gateway and DNS Server addresses that you got from the winipcfg or ipconfig /all command.


Command                  Target                What Ping Failure Indicates
ping w.x.y.z             Default Gateway       Default Gateway down
ping w.x.y.z             DNS Server            DNS Server down
ping w.x.y.z             Web site IP address   Internet service provider or web site down
ping www.something.com   Web site name         DNS Server down or web site down

REFERENCE

WWW.PRACTICALLYNETWORK.COM

WWW.COMPUTERHOPE.COM

WWW.WIKIPEDIA.ORG




To find out if duplicate IP addresses are occurring, monitor your network using these applications:

• Address Tracker - To find duplicate IP addresses on 3Com devices and their attached networks
• LANsentry® Manager - To find duplicate IP addresses that are collected by probes gathering RMON2 SmartAgent® data from the Enterprise Communications Analysis Module (ECAM) downloaded on your network devices

Address Tracker

Use Address Tracker to determine when and where duplicate IP addresses occur.

Follow these steps:

1. From the Find Address menu, select Find Duplicate IP Addresses.

2. Click Find Now to start your search.

LANsentry Manager

Use the Duplicates table in LANsentry Manager to compile a list of all stations with duplicate IP addresses. This table is available only on probes that have downloaded RMON2 (ECAM) SmartAgent software.

Follow these steps:

1. From the LANsentry Manager Address Map menu, select Duplicates. Address Map data is displayed as a table.

2. To export the contents of the table, click Export to launch the Data Export dialog box.

Duplicate Addresses Reference

This section explains terms that are relevant to duplicate addresses and provides additional conceptual and problem analysis detail.


Duplicate MAC Addresses

Each device on your network has a unique MAC address. This address identifies a single device on the network, allowing packets to be delivered to correct destinations.

Packets are delivered to their destinations by means of the translation between IP addresses and MAC addresses that the Address Resolution Protocol (ARP) provides. Therefore, if MAC addresses are duplicated on the network, ARP caches of routing devices contain erroneous destinations. In FDDI, devices monitor network traffic, searching for their own MAC address in each packet to determine whether to decode the packet. If MAC addresses are not unique, two stations cannot be distinguished from each other.

Duplicate MAC addresses can occur for the following reasons:

• Someone has manually configured a MAC address for a device, instead of using the address that the vendor supplied or allowing it to be assigned dynamically, and this address is also assigned to a different device.

• In rare circumstances, loops in a bridged network can cause a MAC hardware problem or an address learning problem that creates a duplicate MAC address entry in the bridging address table.

• On DECnet Phase 4 networks, MAC addresses are set from the DECnet address. A duplicate NET address can cause a duplicate MAC address.

Duplicate IP Addresses

Because IP addresses are critical for transmission of packets on TCP/IP networks, resolve duplicate IP addresses immediately.

Duplicate IP addresses can occur when someone has configured an IP address that is identical to an IP address that is assigned to a different device. Address assignments, although possible for you to configure manually, are usually made using one of these protocols:

• Dynamic Host Configuration Protocol (DHCP) - Allows your network to dynamically assign IP addresses to nodes. With this protocol, a DHCP server temporarily assigns an IP address to a node, or you can statically configure addresses as needed.

• BOOTstrap protocol (BootP) - Allows you to statically assign IP addresses to nodes. This protocol is more efficient than RARP.

• Reverse ARP (RARP) - Allows you to statically assign IP addresses to nodes. However, because this protocol relies on the MAC address to identify the node, you cannot use it on networks that dynamically assign hardware addresses.
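
The idea behind a duplicate IP address search can be sketched in Python: an IP address that is seen with more than one MAC address is a candidate duplicate. The sketch assumes that you have already collected (IP address, MAC address) pairs, for example from ARP caches; the sample data and the function name find_duplicate_ips are hypothetical and do not reflect how Address Tracker or LANsentry Manager work internally.

```python
from collections import defaultdict

def find_duplicate_ips(observations):
    """Return {ip: [macs]} for every IP address seen with more than one MAC."""
    macs_by_ip = defaultdict(set)
    for ip, mac in observations:
        # Normalize case so the same MAC is not counted twice.
        macs_by_ip[ip].add(mac.lower())
    return {
        ip: sorted(macs)
        for ip, macs in macs_by_ip.items()
        if len(macs) > 1
    }

# Hypothetical observations.
seen = [
    ("10.0.0.5", "00:A0:24:11:22:33"),
    ("10.0.0.6", "00:a0:24:44:55:66"),
    ("10.0.0.5", "00:a0:24:77:88:99"),
    ("10.0.0.6", "00:A0:24:44:55:66"),
]
print(find_duplicate_ips(seen))
```

Here only 10.0.0.5 is reported, because 10.0.0.6 was seen twice with the same MAC address written in different letter case.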


Ethernet Packet Loss

Use these sections to identify and correct Ethernet packet loss:

• Ethernet Packet Loss Overview
• Searching for Packet Loss

See Ethernet Packet Loss Reference for additional conceptual and problem analysis detail.

If your Ethernet network shows signs of congestion, it may be experiencing packet loss. When your network is congested, utilization is usually high, packets are discarded because buffers are full, and collision rates are up. Problems related to Collisions are often at the heart of packet loss.

Understanding the Problem

Collisions are normal in Ethernet networks. In many cases, Collision rates of 50 percent do not cause a large decrease in throughput. The Collision rate helps mark the upper limit on your network (the maximum percentage of collisions that your network can bear), which is usually around 70 percent. If Collisions increase above this upper limit, your network can become unreliable.

When the Collision rates increase, so do Excessive Collisions, which causes a delay in transmitting data. An increase in Collisions also indicates that network utilization and network errors, such as FCS Errors, are probably increasing.

The real packet problems to watch for, however, are undetected collisions that show up as Late Collisions.
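
As a rough illustration of checking a Collision rate against the upper limit, the Python sketch below computes the rate as collisions per 100 frames sent. That definition is an assumption made for this example (the text does not define how the rate is calculated), and the function names and the default limit of 70 percent are illustrative values taken from the figures quoted above.

```python
def collision_rate(collisions, frames_sent):
    """Collision rate as a percentage of frames sent (assumed definition)."""
    if frames_sent <= 0:
        raise ValueError("frames_sent must be positive")
    return 100.0 * collisions / frames_sent

def exceeds_upper_limit(collisions, frames_sent, upper_limit=70.0):
    """Return True if the rate is above the network's upper limit."""
    return collision_rate(collisions, frames_sent) > upper_limit

# Hypothetical counter readings.
print(collision_rate(500, 1000))       # 50 percent: often still tolerable
print(exceeds_upper_limit(500, 1000))
print(exceeds_upper_limit(800, 1000))
```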

Identifying the Problem

To confirm that your network's problem is related to packet loss, verify that frames are being dropped on your network by examining this packet loss data:

• Alignment Errors
• Collisions
• Excessive Collisions
• FCS Errors
• CRC Errors
• Late Collisions
• Receive Discards
• Too Long Errors
• Too Short Errors
• Transmit Discards


The process of identifying the problem is discussed in Searching for Packet Loss.

Solving the Problem

If you notice that packet loss data is consistently high, then your network is too congested. In this case, segment your network with the appropriate network device (such as a switch or router). If Collision data shows increases but your network's utilization is the same, then your network may have a physical problem, such as cabling that is too long. Other problems that packet loss data can indicate include:

• Faulty connectors or improper cabling
• Excessive numbers of repeaters between network devices
• Defective Ethernet transceivers or controllers

Possible solutions to these problems are explained in the procedures in Searching for Packet Loss.

Searching for Packet Loss

When you look for packet loss, use the following applications:

• Status Watch - For Ethernet and MIB-II data collection using SNMP polling
• LANsentry Manager Network Statistics Graph - For RMON data collection using an RMON probe
• Device View - On a per-device basis, you can evaluate statistics for any port on the device

Status Watch

Status Watch monitors:

• Alignment Errors
• Excessive Collisions
• FCS Errors
• Receive Discards
• Transmit Discards

Follow these steps:

1. Determine if the thresholds for the Alignment Errors tool and FCS Errors tool are being exceeded.


Table 16 identifies the problems that this data can indicate and your possible actions. For information about problems related to a nonstandard Ethernet implementation, see Nonstandard Ethernet Problems.

Table 16. Alignment Errors, FCS Errors, and CRC Errors Data

Possible Problem: Faulty cabling
Possible Action: Examine the cable and cable connections for breaks or damage.

Possible Problem: Network noise
Possible Action: Look for improper cabling, faulty cable, faulty network equipment, or cables that are too close to equipment that emits electromagnetic interference (lamps, for example).

Possible Problem: Faulty transceiver
Possible Action: Use an analyzer to identify the problematic transceiver. If necessary, replace the transceiver, network adapter, or station.

Possible Problem: Fault at the transmitting end station
Possible Action:
1. Locate the source of the errors by looking at the module and port statistics.
2. Verify the correct operation of the transceiver or adapter card of the device that is connected to the problem port.
3. If the card appears to be operating correctly, examine the cable and cable connections for breaks or damage.

Possible Problem: Station powering up or down
Possible Action: None required.
Early implementations of Ethernet transceivers generate a significant amount of in-band noise when powering up; they frequently cause Alignment Errors and FCS Errors in an otherwise stable network.
When powering up, some software drivers for Ethernet controllers also initiate Time Domain Reflectometry (TDR) tests to test the Ethernet media. Network monitors report TDR tests as Alignment Errors and FCS Errors.

Possible Problem: Faulty adapter
Possible Action: Replace the adapter.

2. Determine if the Excessive Collisions tool threshold is being exceeded. Table 17 identifies the problems that this data can indicate and your possible actions.

Table 17. Collisions and Excessive Collisions Data

Possible Problem: Busy network
Possible Action: Use a bridge, router, or switch to reconfigure your network into segments with fewer stations.

Possible Problem: Faulty device (adapter, switch, hub, and the like) that does not listen before broadcasting. This problem increases the incidence of all types of collisions.
Possible Action: Isolate each adapter to see if the problem stops.

Possible Problem: Network loop
Possible Action: Ensure that no redundant connections to the same station have both connections active simultaneously.


3. Determine if the Receive Discards and Transmit Discards tools' thresholds are being exceeded.

If these errors are high in conjunction with the data that you learned in steps 1 and 2, then your network is overloaded. Segment your network.
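The three Status Watch steps can be sketched as a small triage routine. This is an illustration only: the `PortCounters` fields, the threshold values, and the `triage` function are hypothetical names chosen here, not part of Status Watch, and the sketch assumes that "in conjunction with steps 1 and 2" means at least one of those earlier checks also exceeded its threshold.

```python
from dataclasses import dataclass


@dataclass
class PortCounters:
    """Error counts for one polling interval (hypothetical field names)."""
    alignment_errors: int
    fcs_errors: int
    excessive_collisions: int
    receive_discards: int
    transmit_discards: int


# Illustrative thresholds only; real values depend on your network's baseline.
THRESHOLDS = PortCounters(10, 10, 5, 20, 20)


def triage(sample, limits=THRESHOLDS):
    """Return the follow-up actions suggested by steps 1 to 3."""
    findings = []
    # Step 1: Alignment Errors and FCS Errors point to Table 16.
    if (sample.alignment_errors > limits.alignment_errors
            or sample.fcs_errors > limits.fcs_errors):
        findings.append("step 1: check cabling, noise, transceivers, adapters (Table 16)")
    # Step 2: Excessive Collisions point to Table 17.
    if sample.excessive_collisions > limits.excessive_collisions:
        findings.append("step 2: check for a busy network, faulty device, or loop (Table 17)")
    # Step 3: discards combined with earlier findings suggest overload.
    discards_high = (sample.receive_discards > limits.receive_discards
                     or sample.transmit_discards > limits.transmit_discards)
    if discards_high and findings:
        findings.append("step 3: network is probably overloaded; segment it")
    return findings
```

Calling `triage(PortCounters(50, 0, 9, 40, 0))` would report all three steps, while a sample below every threshold returns an empty list.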

LANsentry Manager Network Statistics Graph

Use the LANsentry Manager Network Statistics graph to view data for:

• Collisions
• Late Collisions
• Bandwidth Utilization
• CRC Errors
• Too Long Errors
• Too Short Errors

Follow these steps:

1. Display a Network Statistics graph for the local Ethernet segment on which users have reported poor performance.

This graph shows the most recent trend in Collision rates. If you have set up a History sample, you can also look at the historical trend. If a number of segments are connected by repeaters, examine the graph for each Ethernet segment.

2. Analyze Utilization and Collision rates to determine whether collisions are caused by an overloaded segment or a faulty component.

• If Utilization rates are high - The collisions are probably caused by an overloaded segment. If you have added nodes or new applications to your network, consider reconfiguring the cabling system, using bridges and routers to filter out remote collisions and to keep local traffic on one segment. This action should level the network load.

• If Utilization rates are stable and appear normal - The collisions are probably caused by faulty components. In this case, do the following:

  • If the network consists of repeaters - Compare the Network Statistics graphs for each segment connected to the repeater. Because repeaters repeat traffic across all connected segments (which makes many segments seem like one network), you should see similar levels of traffic on all segments. One segment that shows dissimilar levels of traffic and collisions may indicate faulty hardware. In this case, monitor several collisions to track the source station that is transmitting too soon after collisions, and repair the station. Packets that are transmitted too soon after collisions are unlikely to be valid. See Table 17 for more information about Collisions.

  • On other networks - Determine the segment cable length.

3. Examine the CRC Errors and Late Collisions, which often indicate cabling or component problems.

Table 16 identifies the problems that CRC Errors can indicate and your possible actions. Table 18 identifies the problems that Late Collisions data can indicate and your possible actions.

Table 18. Late Collisions Data

Possible Problem: Cabling problems
• Segment too long
• Failing cable
• Segment not grounded properly (noise)
• Improper termination
• Taps too close (10BASE-5 and 10BASE-2 only)
• Noisy cable
Possible Action: Correct the cabling problem by doing one or more of the following:
• Reduce the segment length
• Replace the cable
• Ground the cable
• Terminate the cable correctly
• Check the taps
• Check for cables too close to equipment that emits electromagnetic interference

Possible Problem: Component problems
• Deaf or partially deaf node
• Failing repeater, transceiver, or controller cards
Possible Action: Correct the component problem by doing one of the following:
• Trace the failing component and replace it
• Replace the NIC or the transceiver

4. Trace Too Short Errors and Too Long Errors to the sender.

These errors often indicate faulty routers or LAN drivers and transceiver problems. Table 19 identifies the problems that this data can indicate and your possible actions.

Table 19

Possible Problem: A transceiver on your network is adding bits to the packets that are transmitted by the attached station.
Possible Action:
1. Use a network analyzer to identify the problematic transceiver.
2. If necessary, replace the transceiver, network adapter, or station.

Possible Problem: The jabber protection mechanism on a transceiver has failed; it can no longer protect the network from the jabbering produced by the attached station.
Possible Action: Replace the network card.

Possible Problem: Excessive noise on the cable.
Note: Some 10/100 Mbps cards that autodetect the network speed may connect to the network at the wrong speed, causing excessive noise.
Possible Action: Check for improper cabling, faulty cable, faulty network equipment, or cables too close to noisy electronic equipment (lamps, for example).
If your network card autodetects the network speed and you have ruled out other problems, manually configure the network speed.

Possible Problem: Faulty routers (two different network types are connected and the router is not enforcing proper frame size restrictions)
Possible Action: Notify the manufacturer.

Possible Problem: Faulty LAN driver
Possible Action: Replace the driver.

Possible Problem: A normal condition on a LinkSwitch® 1000, LinkSwitch® 3000, or CoreBuilder 5000 FastModule.
If you use maximum-sized 1518-byte Ethernet frames, the device's VLT-enabled ports add a frame tag of 4 bytes, resulting in a misleading Too Long Frame error.
Possible Action: These frames are passed successfully but will create the Too Long Frame error message.
If you want to eliminate the error message, reduce your Ethernet packet frames by 4 bytes.

Device View

Device View allows you to display a variety of port and device-level statistics relevant to Ethernet packet loss. Table 20 describes these statistics and their use in troubleshooting.

Table 20. Activity and Error Statistics in Device View

Statistics Group: Activity
Description: Displays the total network activity and errors on the selected port.
Use in Troubleshooting: This data shows readable packets, broadcast packets, Collisions, total errors, and runts, which cause Too Short Errors. You can interpret this data in the following ways:
• The presence of runts can often be caused by Collisions; however, if the values increase at specific times of the day, it may indicate you need to change the network topology to manage the traffic more efficiently (for example, with switches or routers).
• Runts can also be caused by a badly terminated coax cable.
• Large numbers of runts not associated with high levels of collisions can indicate a transmission problem (examine the cable).
• Particularly high numbers of Collisions compared to the total number of readable packets can point to a hardware problem (a bad adapter) or to a data loop.
• A high proportion of Broadcast Packets (>10%) on a heavily utilized network (>50% of available bandwidth) can point to an incorrectly configured bridge or router on the network.

Statistics Group: Errors
Description: Displays the number of frames with errors on the selected port.
Use in Troubleshooting: The significance of errors depends on accompanying errors and prevailing network conditions. See the following error data for more information:
• Alignment Errors: Table 16
• FCS Errors: Table 16
• Too Long Errors: Table 19
• Too Short Errors or runts: Table 19
• Late Collisions: Table 18

To display Activity and Errors statistics for a device or port, follow these steps:

1. Select the required port or device.

2. From the shortcut menu, select Activity or Errors.

The statistics available depend on the type of port or device selected. See Table 20 for troubleshooting information.

Ethernet Packet Loss Reference

This section explains terms that are relevant to Ethernet packet loss and provides additional conceptual and problem analysis detail.

Alignment Errors

An Alignment Error indicates a received frame in which both are true:

• The number of bits received is an uneven byte count (that is, not an integral multiple of 8).
• The frame has a Frame Check Sequence (FCS) error.

Alignment Errors often result from MAC layer packet formation problems, cabling problems that cause corrupted or lost data, and packets that pass through more than two cascaded multiport transceivers. See FCS Errors for more information about interpreting Alignment Errors.


Collisions

Collisions indicate that two devices detect that the network is idle and try to send packets at exactly the same time (within one round-trip delay). Because only one device can transmit at a time, both devices must stop sending and attempt to retransmit. Collisions are detected by the transmitting stations.

The retransmission algorithm helps to ensure that the packets do not retransmit at the same time. However, if the two devices retry at nearly the same time, packets can collide again; the process repeats until either the packets finally pass onto the network without collisions or 16 consecutive collisions occur and the packets are discarded.
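The retransmission algorithm referred to above is the truncated binary exponential backoff used by CSMA/CD Ethernet. The following is a minimal sketch of the idea, not a driver implementation; the function name `backoff_slots` and the injectable `rng` parameter are choices made here for illustration.

```python
import random

SLOT_TIME_BITS = 512   # one slot time, in bit times
MAX_ATTEMPTS = 16      # the frame is discarded after 16 consecutive collisions
BACKOFF_LIMIT = 10     # the random range stops growing after the 10th collision


def backoff_slots(collision_count, rng=random):
    """Return how many slot times to wait after the Nth consecutive collision.

    Returns None when the frame should be discarded (an Excessive Collision).
    """
    if collision_count < 1:
        raise ValueError("collision_count starts at 1")
    if collision_count >= MAX_ATTEMPTS:
        return None
    exponent = min(collision_count, BACKOFF_LIMIT)
    # Pick uniformly from 0 .. 2**exponent - 1 slot times.
    return rng.randrange(2 ** exponent)
```

Because each station draws its own random delay, two stations that have just collided are unlikely to retry at the same moment, and the widening range makes a repeat collision less likely on each attempt.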

CRC Errors

A Cyclic Redundancy Check (CRC) Error is an RMON statistic that combines FCS Errors and Alignment Errors. These errors indicate that packets were received with:

• A bad FCS and an integral number of octets (FCS Errors)
• A bad FCS and a non-integral number of octets (Alignment Errors)

CRC Errors can cause an end station to freeze. If a large number of CRC Errors are attributed to a single station on the network, replace the station's network interface board. Typically, a CRC Error rate of more than 1 percent of network traffic is considered excessive.

Excessive Collisions

Excessive Collisions indicate that 16 consecutive collisions have occurred, usually a sign that the network is becoming congested. For each excessive collision count (or after 16 consecutive collisions), a packet is dropped. If you know the normal rate of excessive collisions, then you can determine when the rate of packet loss is affecting your network's performance. See Knowing Your Network's Configuration for more information.

FCS Errors

Frame Check Sequence (FCS) Errors, a type of CRC, indicate that frames received by an interface are an integral number of octets long but do not pass the FCS check. The FCS is a mathematical way to ensure that all the frame's bits are correct without having the system examine each bit and compare it to the original. Packets with Alignment Errors also generate FCS Errors.
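To make the FCS check concrete, the sketch below computes a 32-bit CRC over a frame and compares it with the trailing four octets. It assumes that Python's `zlib.crc32` uses the same CRC-32 polynomial as Ethernet and that the FCS is stored least significant byte first, as capture tools commonly present it; the helper names `append_fcs` and `fcs_ok` are hypothetical.

```python
import struct
import zlib


def append_fcs(frame_without_fcs: bytes) -> bytes:
    """Return the frame with a 4-octet CRC-32 appended (assumed byte order)."""
    fcs = zlib.crc32(frame_without_fcs) & 0xFFFFFFFF
    return frame_without_fcs + struct.pack("<I", fcs)


def fcs_ok(frame_with_fcs: bytes) -> bool:
    """Recompute the CRC over the body and compare it with the received FCS."""
    body, received = frame_with_fcs[:-4], frame_with_fcs[-4:]
    expected = struct.pack("<I", zlib.crc32(body) & 0xFFFFFFFF)
    return expected == received
```

A receiver that finds `fcs_ok` false on a whole number of octets would count an FCS Error; the same failure on a frame whose bit count is not a multiple of 8 would be counted as an Alignment Error.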

Both Alignment Errors and FCS Errors can be caused by equipment powering up or down or by interference (noise) on unshielded twisted-pair (10BASE-T) segments. In a network that complies with the Ethernet standard, FCS or Alignment Errors indicate bit errors during a transmission or reception. A very low rate is acceptable. Although Ethernet allows a 1 in 10^8 bit error rate, typical Ethernet performance is 1 in 10^12 or better.

Late Collisions

Late Collisions indicate that two devices have transmitted at the same time, but cabling errors (most commonly, excessive network segment length or repeaters between devices) prevent either transmitting device from detecting a collision. Neither device detects a collision because the time to propagate the signal from one end of the network to the other is longer than the time to put the entire packet on the network. As a result, neither of the devices that cause the late collision senses the other's transmission until the entire packet is on the network.

Although late collisions occur for small packets, the transmitter cannot detect them. As a result, a network suffering measurable Late Collisions for large packets is losing small packets as well.
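The timing argument can be sketched numerically. A collision counts as late when it is detected after the first 512 bit times (64 octets) of a frame have been sent. The round-trip estimate below is deliberately rough: the propagation speed of 2.0e8 m/s is an assumed typical value for copper cable, repeater and transceiver delays are ignored, and both function names are hypothetical.

```python
SLOT_TIME_BITS = 512  # 64 octets, the minimum Ethernet frame size


def is_late_collision(bits_sent_when_detected: int) -> bool:
    """A collision seen after the slot time has elapsed is a late collision."""
    return bits_sent_when_detected > SLOT_TIME_BITS


def round_trip_bit_times(segment_length_m: float,
                         bit_rate_bps: float = 10_000_000,
                         propagation_m_per_s: float = 2.0e8) -> float:
    """Estimate the cable round-trip delay in bit times (cable only)."""
    one_way_seconds = segment_length_m / propagation_m_per_s
    return 2 * one_way_seconds * bit_rate_bps
```

If the estimated round trip, plus the equipment delays this sketch leaves out, approaches 512 bit times, a minimum-size frame can be fully transmitted before its sender hears the collision, which is the situation the paragraph above describes.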

Nonstandard Ethernet Problems

Table 21 lists the symptoms that typically occur if a system violates the Ethernet standard.

Table 21. Symptoms of Common Ethernet Network Problems

Symptoms: FCS Errors and Alignment Errors increase significantly.
Problem: Network cabling is too long.
Notes: If you use a promiscuous network monitor, the number of Late Collisions reported by stations should correlate with the FCS and Alignment Errors reported by the monitor.

Symptoms: FCS and Alignment Errors increase proportionally with interference (sometimes referred to as noise hits).
Problem: Network segment is noisy.
Notes: Typically observed on a 10BASE-T network segment in a noisy environment. If you use multiple promiscuous monitors, the FCS and Alignment Errors among the monitors will not correlate.
If the monitor can track runts, also called Too Short Errors, the number of runt packets should be significantly higher than normal.

Symptoms: FCS and Alignment Errors are much higher than normal.
Problem: Networks do not conform to the access scheme of Carrier Sense Multiple Access with Collision Detect (CSMA/CD).
Notes: Occurs when some implementations of Ethernet in the segment are not entirely compatible with IEEE 802.3 repeaters.
Collision fragments linger on the network long enough to collide with retry packets at the minimum interpacket gap (IPG). The IPG is smaller on one side of the repeated network, causing a lost packet.
Ethernet controllers cannot receive packets that are separated by 4.7 µs or less. Some controllers cannot sustain receptions of packets separated by as much as 9.6 µs. If runt packets are received one after another and are followed by a collision fragment, Ethernet controllers that cannot sustain reception will lose packets.

Receive Discards

Receive Discards indicate that received packets could not be delivered to a higher-layer protocol because of congestion or packet errors.

Too Long Errors

A Too Long Error indicates that a packet is longer than 1518 octets (including FCS octets) but otherwise well formed. Too Long Errors are often caused by a bad transceiver, a malfunction of the jabber protection mechanism on a transceiver, or excessive noise on the cable.

Too Short Errors

A Too Short Error, also called a runt, indicates that a packet is fewer than 64 octets long (including FCS octets) but otherwise well formed.
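The two size limits can be expressed as a small check. This is only a sketch of the definitions above; `classify_length` is a hypothetical helper, and it assumes the length it receives already includes the four FCS octets.

```python
MIN_FRAME_OCTETS = 64     # shorter frames are runts (Too Short Errors)
MAX_FRAME_OCTETS = 1518   # longer frames are Too Long Errors


def classify_length(octets: int) -> str:
    """Classify an otherwise well-formed frame by its length in octets."""
    if octets < MIN_FRAME_OCTETS:
        return "too short"
    if octets > MAX_FRAME_OCTETS:
        return "too long"
    return "ok"
```

For example, a maximum-sized frame that gains a 4-byte tag is 1522 octets long, so `classify_length(1522)` reports it as too long even though the frame itself is undamaged.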

Transmit Discards

Transmit Discards indicate that packets were not transmitted because of network congestion.

FDDI Ring Errors

Use these sections to identify and correct FDDI ring errors:

• FDDI Ring Errors Overview
• Identifying Ring Errors

Fiber Distributed Data Interface (FDDI) often corrects its own problems. However, because FDDI cannot correct all errors (especially those related to hardware problems), you should monitor FDDI errors.

Understanding the Problem

FDDI ring errors that you should monitor include:

• Elasticity Buffer Error Condition
• Frame Error Condition
• Frames Not Copied Condition
• Link Error Condition

Identifying the Problem

First, determine the type of FDDI ring errors and where they are occurring. Similar to the way you identify other FDDI problems, identify the upstream and downstream neighbors of the devices that you are monitoring.

Several types of network errors can cause FDDI performance problems. For example, problems with cables or physical connections may result in a link or frame error. Elasticity buffer (EB) errors can also lead to link and frame errors.

FDDI deals with port-related errors as follows:

• The variable PORTlerAlarm is the link error rate (LER) value at which a link connection generates an alarm. When the LER is greater than the alarm setting, Station Management (SMT) sends a Status Report Frame (SRF) to notify you that there is a problem with a port.

The PORTlerAlarm threshold is set lower than the PORTlerCutoff threshold so that you are notified of a problem before the port is actually removed from the ring.

• When link errors reach the threshold defined by the variable PORTlerCutoff, SMT breaks the connection, disabling the PHY that detected the problem. A Link Error Condition is also generated.
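The relationship between the two thresholds can be sketched as follows. The sketch treats the link error rate and both thresholds as plain error rates, which is an assumption made for readability, and the function name `port_action` and the example values are hypothetical.

```python
def port_action(ler: float, ler_alarm: float, ler_cutoff: float) -> str:
    """Return what SMT would do for a port with the given link error rate.

    The alarm threshold must be the lower of the two so that the alarm is
    raised before the connection is broken.
    """
    if ler_alarm >= ler_cutoff:
        raise ValueError("alarm threshold must be lower than cutoff threshold")
    if ler >= ler_cutoff:
        return "cutoff"   # connection broken, Link Error Condition generated
    if ler > ler_alarm:
        return "alarm"    # Status Report Frame sent, port stays in the ring
    return "ok"
```

With an alarm setting of 1e-8 and a cutoff of 1e-7 (illustrative values), a port running at 5e-8 would raise an alarm, and one running at 2e-7 would be removed from the ring.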

FDDI deals with MAC-related errors as follows

• When MAC frame errors reach a certain threshold, a Frame Error Condition is generated. Because the actual error can be further upstream than the immediate connection, the connection remains intact.

• For a large network, the worst case MACFrameErrorRatio is less than 0.1 percent. However, during network configuration, frame error ratios can reach 50 percent for short periods. When you detect a sustained frame error ratio of more than 0.1 percent, a problem exists between the station that is reporting the condition and the nearest upstream MAC.
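Because short spikes during configuration are expected, only a sustained ratio matters. One way to sketch that distinction is to require several consecutive samples above the limit; the `min_samples` value of 3 and the function name are assumptions made here, not values from the FDDI standard.

```python
def sustained_frame_errors(ratios, limit=0.001, min_samples=3):
    """Return True if the frame error ratio stays above `limit`
    for at least `min_samples` consecutive samples.

    `ratios` holds fractions (0.001 is 0.1 percent).
    """
    run = 0
    for ratio in ratios:
        run = run + 1 if ratio > limit else 0
        if run >= min_samples:
            return True
    return False
```

A single spike such as `[0.5, 0.0002, 0.0001]` is ignored, while `[0.002, 0.003, 0.002]` is reported as sustained.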

Solving the Problem

To solve problems related to FDDI errors, fix the hardware, cabling, or congestion problem.

Identifying Ring Errors


Use Status Watch to monitor your FDDI devices for Warning or Critical alerts.

Status Watch

Use Status Watch to identify FDDI ring errors.

Follow these steps:

1. Monitor the FDDI Status tool for the currently selected device.

2. Determine whether Status Watch is reporting Elasticity Buffer Errors or a high percentage of Frame Errors, Frames Not Copied, or Link Error Rates for the currently selected device.

FDDI Ring Errors Reference

This section provides additional conceptual and problem analysis detail.

Elasticity Buffer Error Condition

The Elasticity Buffer Error condition occurs when a port's elasticity buffer overflows or underflows. This condition usually indicates that a port's hardware is not operating within the tolerances that the FDDI standard specifies. Look for the problem in the hardware of either the port that is reporting the condition or of the immediately adjacent port.

Frame Error Condition

The Frame Error condition occurs when the percentage of frames that contain errors exceeds a preset threshold. When a device is an uplink to FDDI (that is, a device is transmitting onto FDDI), this type of condition indicates that the ring is saturated. The ring is out of buffer space, and packets are being dropped from the device's backbone port.

The problem indicated by the frame errors is usually located between the MAC that reports the condition and its upstream neighbor. Because many physical connections can lie along this path, the MACFrameErrorRatio variable identifies only the two MACs between which the problem is occurring.

Frames Not Copied Condition

The Frames Not Copied condition occurs when the percentage of frames that are dropped because of insufficient buffer space exceeds a preset threshold. This condition indicates that the station is congested and is unable to process frames as quickly as they arrive. To help eliminate congestion:

• Add more capacity to the station
• Reconfigure your network so that end stations that communicate heavily with one another are on the same bridge or switch
• Filter out certain traffic

Link Error Condition

The Link Error condition occurs when a port detects link errors at a rate that exceeds a preset threshold. When the Link Error threshold is exceeded, the station removes itself from the ring and tries to reinsert itself on the ring. This action creates a MAC Neighbor Change Event (which also occurs if a ring wraps).

Link errors may indicate an FDDI PHY hardware problem (such as a faulty transmitter) or a faulty cable or connector. Look for the problem in the portion of the network between the port that is reporting the condition and the first upstream transmitter.

MAC Neighbor Change Event

The MAC Neighbor Change event occurs when a MAC's upstream or downstream neighbor changes.

This event indicates either:

• A network reconfiguration
• Another station that is leaving or joining the ring

Network File Server Timeouts

Use these sections to identify and correct timeouts on network file servers:

• Network File Server Timeout Overview
• Looking for Obvious Errors
• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

See Network File Server Timeouts Reference for additional conceptual and problem analysis detail.

A network file server can time out if your network gets congested or if your server is having problems. Users might have problems downloading data from or to the server or copying files from or to the server. To help you to understand the troubleshooting process for this type of problem, an EXAMPLE throughout this section follows the symptoms, analysis, and resolution of a typical file server timeout problem.


Understanding the Problem

When users log in, their stations make network file server calls either to determine quotas (if this feature has been enabled) or to mount user home directories. The network file server timeout messages, even when spread across multiple nodes, indicate a problem either with the network or with a server.

EXAMPLE: UNIX users notice that it takes a long time - over 30 seconds in some cases - to log in to any machine. Some machines report network file server timeout messages, but the messages have no obvious pattern and are infrequent. You begin to get a sense of the problem.

Identifying the Problem

First, rule out the obvious causes. Ask these questions:

• Can you access the network file server with Telnet?
• Have any alarms been triggered?
• Are there any new errors?

The process of identifying the problem is developed in Looking for Obvious Errors.

Solving the Problem

To determine the cause, reproduce the fault while you monitor the network. After you know the cause, you can fix the problem.

The solutions to the network file server timeout are identified in these sections:

• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

Looking for Obvious Errors

To look for obvious errors, use these applications:

• Ping and Telnet - To check connectivity to the network file server nodes
• LANsentry Manager Alarms View - To search for triggered alarms
• LANsentry Manager Statistics View - To look for errors
• LANsentry Manager History View - To identify trends


Ping and Telnet

Determine whether you can contact network file server nodes using Ping and Telnet. If the response is extremely slow, then a problem may exist with the connections to the nodes. No delay indicates that the connections are normal, implying that the delay is occurring elsewhere. In this case, use LANsentry® Manager tools to determine whether packets are being lost or ignored.
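A reachability check of this kind can also be scripted. The sketch below times a TCP connection to a given port, which approximates what opening a Telnet session tells you; it does not send ICMP echo requests, so it is not a replacement for Ping. The function name `connect_time` and the host name in the usage note are hypothetical.

```python
import socket
import time


def connect_time(host, port, timeout=5.0):
    """Return the seconds taken to open a TCP connection, or None on failure."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return None
```

For example, `connect_time("fileserver.example", 23)` tries the standard Telnet port. A result of `None` or a value close to the timeout suggests a connectivity problem, while a consistently small value points the investigation elsewhere.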

LANsentry Manager Alarms View

Using the LANsentry Manager Alarms View, you can determine if any configured alarms have been triggered.

Search the Alarms View to see if any MAC events have been logged.

EXAMPLE: MAC events have not been logged for the network to which the UNIX users are attached.

Even though no alarms have occurred, errors may exist. For example, a low rate of background errors may exist just below the alarm threshold. Because they are based on maximum and minimum values, RMON alarms may miss constant, periodic, or low amounts of errors.

LANsentry Manager Statistics View

Using the LANsentry Manager Statistics View, you can display a multisegment graph of utilization and error statistics.

Follow these steps:

1. Set up a graph that shows utilization and errors on all your major segments.

2. Determine whether any segments are particularly busy or error prone.

EXAMPLE: You notice that one segment of the UNIX network, HUB3, is reporting Too Long Errors and FCS Errors roughly every second sample. While the amount is not higher than normal, it is currently higher than any other segment.

LANsentry Manager History View

Using the LANsentry Manager History View, you can display a rolling history table to determine if the errors that you are seeing are new. For example, if you have a history table that runs for 30-minute samples over two days, you can compare the most recent sample to a previous sample, looking for new errors. If your probe has the resources, use a much finer resolution sample stored for a shorter time (every 30 seconds for 2 hours) to more easily spot recent errors.
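Comparing the most recent sample with a previous one amounts to looking for error types that were absent before and are present now. A minimal sketch, assuming each sample has been exported as a mapping from error name to count (the export format and the function name `new_errors` are hypothetical):

```python
def new_errors(previous: dict, latest: dict) -> dict:
    """Return the error counters that are non-zero now but were zero
    (or missing) in the previous sample."""
    return {
        name: count
        for name, count in latest.items()
        if count > 0 and previous.get(name, 0) == 0
    }
```

For instance, comparing `{"FCS Errors": 0}` with `{"FCS Errors": 4, "Too Long Errors": 2}` flags both error types as new.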

EXAMPLE: You see that the history table shows that no error rates remained constant throughout the day. However, errors that did occur were on the device HUB3 and were Too Long Errors and FCS Errors.

Reproducing the Fault While Monitoring the Network

Although the RMON View in LANsentry Manager can show error rates and help you to identify the location of the problem, it may not provide enough data to solve the problem. To determine the cause of the problem, reproduce it while you monitor the network by using these applications:

• LANsentry Manager Top-N Graph - To locate a quiet node to use for reproducing the fault
• LANsentry Manager Packet Capture - To capture packets from the hub to which the quiet node is attached
• LANsentry Manager Packet Decode - To analyze the packets to assess network file server traffic and delays
• Address Tracker - To find the location of the problem nodes

EXAMPLE: Using LANsentry Manager, you find a hub on the network with a higher than normal error rate. However, the error rate does not seem high enough to cause login delays of 60 seconds or more.

LANsentry Manager Top-N Graph

Using the Top-N graph in the LANsentry Manager main window, locate a quiet node that has been showing the same problem. Choose a quiet node so that you do not receive excessive traffic when you try to isolate the problem.

EXAMPLE: You see that the node Monolith, which has the same Network File System (NFS) mounts as the other nodes on the network, is quiet. You decide to use this node for reproducing the fault. See Network File System (NFS) Protocol for more information about NFS.

LANsentry Manager Packet Capture

Using the LANsentry Manager Packet Capture application, capture packets from the network using predefined patterns and start-and-stop conditions.


Follow these steps:

1. Set up a capture buffer on a probe that is connected to the same hub as the quiet node. Until you know more about the problem, set a very general filter.

EXAMPLE: You select a MAC-layer filter and set a conversation filter to capture all packets to and from Monolith.

2. Telnet into and log out from the quiet node. Then reset the capture buffer. Repeat this procedure until you see the problem reflected in your captured data. To keep the buffer information clean, reset the buffer each time that you repeat the procedure.

3. When you see the delay, note the rough value of the packet count on the LANsentry Manager packet buffer.

By noting the packet count at which you think the delay has occurred, you can narrow the problem to within about 20 packets in the buffer. If you have used an extremely quiet node, you may even identify the exact packet.

LANsentry Manager Packet Decode

The LANsentry Manager Packet Decode application decodes all major protocols and displays the packet contents at three levels of detail: summary information, header information, and actual packet content.

Follow these steps:

1. Open the buffer in the Packet Decode application and locate the number of the packet at which the delay occurred.

2. Select the packet and launch a MAC-layer conversation filter. In the filter display, look for a gap in the conversation (that is, where the node sent a request and then resent it at approximately the same rate as the delay you experienced when recreating the problem).

3. Repeat the test to determine if the result concentrates on one node or if it appears on other nodes.

EXAMPLE: On the quiet node that you selected, the delay is obvious. You see an NFS request going out to a node and a repeat of the request 30 seconds later. During that time, the node did not respond. You now know that the delay occurred because nodes were not seeing responses for NFS requests. When you repeat the test on other nodes, you find that the delay is happening with more than one destination node.
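The search for a gap in the conversation can be sketched as a scan over decoded packets for a request that was sent again before any reply arrived. The packet representation used here, a tuple of timestamp, direction, and request identifier, is a simplification invented for this sketch and is not the Packet Decode export format.

```python
def find_retry_gaps(packets, min_gap_s):
    """Find requests that were resent, unanswered, after at least min_gap_s.

    packets: iterable of (timestamp_s, direction, request_id) in time order,
             where direction is "request" or "reply".
    Returns a list of (request_id, first_sent_s, resent_s).
    """
    last_sent = {}
    answered = set()
    gaps = []
    for timestamp, direction, request_id in packets:
        if direction == "reply":
            answered.add(request_id)
        elif direction == "request":
            previous = last_sent.get(request_id)
            if previous is not None and request_id not in answered:
                if timestamp - previous >= min_gap_s:
                    gaps.append((request_id, previous, timestamp))
            last_sent[request_id] = timestamp
    return gaps
```

In the scenario of the example, a request sent at 1.0 s and repeated at 31.0 s with no reply in between would be returned as a 30-second gap.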


Address Tracker

Use Address Tracker, which polls managed devices, to determine the hubs to which the problem nodes are attached. If the problem end stations are located on unmanaged devices, then you can at least narrow the problem to those unmanaged devices.

EXAMPLE: Although your network does not have managed hubs that Transcend® NCS management software can poll, it does have managed switches. When it polls the switches, Address Tracker displays the switch ports on which addresses were last seen. This information indicates the hub (but not the hub port) on which the device is located.

LANsentry Manager Packet Decode

After you know the location of the hub that has the problem node, monitor the problem from the hub using LANsentry Manager Packet Decode.

Follow these steps:

1. To capture packets from one of the nodes on the hub, set up another capture buffer and repeat the exercise that is described in LANsentry Manager Packet Capture. Because a delay may occur on a different node, use two capture buffers without stopping the first one.

Note the rough packet count where the delay appears.

2. Display a conversation filter of the packet where the delay appears and look for the gap in the conversation.

EXAMPLE: You hope that the nodes are on the same hub. You find that all the nodes are on HUB3. This result indicates that FCS Errors may be causing the timeouts. However, because the errors occur at a low rate, you decide to verify this diagnosis. You monitor the problem from the hub, logging in and out many times, and the delay eventually occurs. This time, the delay shows that the node's reply had an FCS Error even though the node received the request. The switch would not have transmitted this packet, causing a timeout on the NFS protocol. The retry time is presumably 30 seconds. During this test, you see the problem occurring on another node.

Correcting the Fault


Without a managed hub, you may find it very difficult to discover network file server timeout errors. To find the problematic node, you must either systematically isolate nodes by monitoring each node for a prolonged period or temporarily insert a managed hub.

EXAMPLE: You notice that the captured error packet failed FCS because it was corrupted by a regular pattern during transmission. A possible reason for this occurrence is a Jabbering node. This explanation makes sense because FCS/Jabber frames increased linearly when you were monitoring the live network.

Network File Server Timeouts Reference

This section explains terms that are relevant to network file server timeouts and provides additional conceptual and problem analysis detail.

Jabbering

Jabbering occurs when a node transmits illegal-length packets and is possibly not operating within carrier specifications. In effect, another node has written bad data over a valid packet. This bad data is often interpreted as a repeated sequence of data.

Network File System (NFS) Protocol

A distributed file system protocol, developed by Sun Microsystems, that allows a computer system to access files over a network as if they were on its local disks. This protocol has been incorporated into products by more than 200 companies. It is now a de facto Internet standard.

NFS is one protocol in the NFS suite of protocols, which includes NFS, RPC, XDR (External Data Representation), and others. These protocols are part of a larger architecture that Sun Microsystems refers to as Open Network Computing (ONC). ONC is a distributed applications architecture designed by Sun and currently controlled by a consortium led by Sun.

Troubleshooting TCP/IP - Detailed Steps

This article shows how to troubleshoot TCP/IP connectivity between computers on a Windows network. If you haven't already done so, disable XP's Internet Connection Firewall on all local area network connections and remove all firewall programs on the network. Improperly configured firewalls are the most common cause of TCP/IP problems.


Open a Command Prompt Window

For many of these steps, you'll be typing at the command prompt. To open a command prompt window in Windows 2000 or XP, click Start | Run, type cmd in the box, and click OK. To open a command prompt window in Windows 95, 98, or Me, click Start | Run, type command in the box, and click OK. Type one command per line, and press Enter after each one to execute it. To close the command prompt window, use the exit command.

Determine the TCP/IP Settings

Determine the TCP/IP settings of each computer on the local area network. In XP, open the Network Connections folder, right click the LAN connection, and click Status | Support | Details. For example, here are the Status and Details views for the LAN connection on an Internet Connection Sharing host.

In Windows 95/98/Me, click Start | Run, type winipcfg in the box, and click OK. Select the LAN adapter from the menu, and click More Info. Here's the winipcfg view for an ICS client running Windows Me.

You can also see the TCP/IP settings from the command prompt. This is especially convenient if a computer has more than one network adapter. Use the ipconfig /all command, which is available in all versions except Windows 95. The output from this command can be long, so it's best to write it to a file. Specify the file name in the command this way:

ipconfig /all >ipconfig.txt
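Once the output is saved to a file, the settings can be extracted from it and compared between computers. The Python sketch below is one hedged way to do that. It assumes the classic English-language layout of the ipconfig output (unindented section headings, indented "Name . . . : value" lines, and indented continuation lines for extra values); parse_ipconfig is a hypothetical helper name, not part of Windows.

```python
import re

# Matches indented setting lines such as
# "        IP Address. . . . . . . . . . . . : 192.168.1.101".
_SETTING = re.compile(r"^\s+(\S.*?)(?:\s?\.)+\s?:\s?(.*)$")


def parse_ipconfig(text):
    """Return {section heading: {setting name: [values]}} from ipconfig-style text."""
    sections = {}
    current = None
    last_key = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():
            # Unindented lines are headings, e.g. "Ethernet adapter Local Area Connection:".
            current = sections.setdefault(line.strip().rstrip(":"), {})
            last_key = None
            continue
        if current is None:
            continue
        match = _SETTING.match(line)
        if match:
            last_key = match.group(1).strip()
            current[last_key] = [match.group(2).strip()]
        elif last_key is not None:
            # Indented line without a "name : value" shape: an extra value
            # (for example a second DNS server) for the previous setting.
            current[last_key].append(line.strip())
    return sections
```

To use it, read the saved file (for example ipconfig.txt) into a string and pass it to parse_ipconfig; each adapter's IP Address, Subnet Mask, and Default Gateway can then be looked up by name.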

Description of TCP/IP Settings

Here are the TCP/IP settings that are used in network troubleshooting:

• IP Address - Unique address assigned to a network adapter. A computer with multiple network adapters has an IP address for each one, and each one must be in a different subnet.

• Subnet Mask - Used in conjunction with the IP address to determine which subnet an adapter belongs to. At the simplest level, communication is only possible between two network adapters when they're in the same subnet.

• Default Gateway - IP address of a computer or router, on one of this computer's local area networks, that knows how to communicate with subnets not present on this computer. For an Internet connection, the default gateway is a router belonging to your Internet service provider, and all access to sites on the Internet goes through it. For an ICS client, the default gateway is the ICS host. If you use a hardware router, it serves as the default gateway.

• DHCP Server - If an adapter is configured to obtain an IP address automatically, this is the address of the server that provides it. It could be your ISP, an ICS host, or a hardware router.

• DNS Servers - IP address of one or more Domain Name Server computers. DNS servers translate Internet names to their IP addresses (like 63.146.109.227).
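The role of the subnet mask can be illustrated with a short Python sketch. It uses the standard ipaddress module to test whether two adapters fall in the same subnet; same_subnet is a hypothetical helper name, and the addresses in the usage note are illustrative values only.

```python
import ipaddress


def same_subnet(ip_a, ip_b, subnet_mask):
    """Return True if both addresses belong to the same subnet under the given mask."""
    # strict=False lets us pass a host address; the host bits are masked off.
    net_a = ipaddress.ip_network(f"{ip_a}/{subnet_mask}", strict=False)
    net_b = ipaddress.ip_network(f"{ip_b}/{subnet_mask}", strict=False)
    return net_a == net_b
```

For example, same_subnet("192.168.1.101", "192.168.1.123", "255.255.255.0") is true, while comparing 192.168.1.101 with 169.254.10.20 under the same mask is false, so those two adapters could not communicate directly.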

Subnets

See our article on subnets for a brief description of how they work. For more details, see this Microsoft Knowledge Base article.

If two computers are supposed to be on the same subnet but aren't, something is wrong with the network hardware or software configuration. This is most likely to happen when one of them receives an IP address of 169.254.x.x, which indicates that:

• It's configured to obtain an IP address automatically.
• It couldn't find a DHCP server on the network to make the assignment.
• Windows assigned it an Automatic Private IP Address.
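A quick way to spot such an address in a list of settings is sketched below in Python. It relies on the standard ipaddress module, which classifies 169.254.0.0/16 as link-local; has_automatic_private_address is a hypothetical helper name.

```python
import ipaddress


def has_automatic_private_address(ip):
    """Return True if the address is in 169.254.0.0/16 (Automatic Private IP Addressing)."""
    address = ipaddress.ip_address(ip)
    # For IPv4, is_link_local is true exactly for the 169.254.0.0/16 range.
    return address.version == 4 and address.is_link_local
```

If this returns true for a computer that is supposed to get its address from a DHCP server, check the connection between that computer and the DHCP server first.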

See our article on Specific Networking Problems and Their Solutions for more information.

Pinging

The ping command is the basic tool for testing TCP/IP connectivity. It sends a special packet (called ICMP Echo) to a particular IP address and looks for a reply. If everything is working right, the reply comes back. If not, the ping times out in a few seconds. By default, the ping command repeats the process four times. Here's an example of an ICS client computer pinging a Windows XP Home Edition ICS host, using the host's IP address and its computer name.

When ping fails, you'll see one of these error messages:

• Request timed out - The IP address is valid, but there's no reply from it. If the IP address is on a local area network, the most likely cause is a firewall program blocking the ping.

• Unknown host <name> or Ping request could not find host <name> - The computer name doesn't exist on the local area network. Make sure that NetBIOS over TCP/IP is enabled.

• Destination host unreachable - The IP address isn't on a local area network, and the default gateway can't access it. Either there's no default gateway, its address is wrong, or it isn't functioning.
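The error messages above can be matched mechanically when ping output is saved or captured. The Python sketch below is a minimal illustration; diagnose_ping is a hypothetical helper, the message fragments are the English ones quoted in this article, and the diagnoses are paraphrased from the list above.

```python
# (message fragment, likely cause) pairs taken from the list of ping errors.
PING_DIAGNOSES = [
    ("request timed out",
     "Address is valid but did not reply; on a LAN, suspect a firewall."),
    ("unknown host",
     "Name not found on the LAN; check NetBIOS over TCP/IP."),
    ("could not find host",
     "Name not found on the LAN; check NetBIOS over TCP/IP."),
    ("destination host unreachable",
     "Not on the LAN and the default gateway cannot reach it; check the gateway."),
]


def diagnose_ping(output):
    """Return the likely cause for the first known error found, or None."""
    lowered = output.lower()
    for fragment, cause in PING_DIAGNOSES:
        if fragment in lowered:
            return cause
    return None
```

A return value of None only means that none of the listed messages appeared; it does not by itself prove that the ping succeeded.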


Pinging the Local Area Network

Here is a series of ping commands to use in finding where a problem occurs on a local area network. Run them in the order shown, and don't go on to the next command until all of the previous commands work properly. In this example:

• The computer being tested is named Winxp, with IP address 192.168.1.101.
• There's another computer on the network, named Win98, with IP address 192.168.1.123.

Substitute the appropriate IP addresses and computer names for your network.

Command              Target                          What Ping Failure Indicates
ping 127.0.0.1       Loopback address                Corrupted TCP/IP installation
ping localhost       Loopback name                   Corrupted TCP/IP installation
ping 192.168.1.101   This computer's IP address      Corrupted TCP/IP installation
ping winxp           This computer's name            Corrupted TCP/IP installation
ping 192.168.1.123   Another computer's IP address   Bad hardware or NIC driver
ping win98           Another computer's name         NetBIOS name resolution failure

To fix a corrupted TCP/IP installation on Windows XP, follow the steps in this Microsoft Knowledge Base article. For Windows 95/98/Me, un-install the TCP/IP protocol in Control Panel | Network, reboot, and re-install it. If that doesn't fix it, use this procedure on Windows 95 or 98.
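The ordered sequence of pings can be sketched as a small Python script that stops at the first failure and reports what that failure indicates. This is only an illustration: the targets are this article's example addresses and names, first_failure and ping are hypothetical helper names, and the ping helper assumes the operating system's ping command accepts -n (Windows) or -c (other systems) for the packet count.

```python
import platform
import subprocess

# Targets and failure meanings from the LAN ping sequence in this article.
# The addresses and names are examples; substitute your own.
LAN_STEPS = [
    ("127.0.0.1", "Corrupted TCP/IP installation"),
    ("localhost", "Corrupted TCP/IP installation"),
    ("192.168.1.101", "Corrupted TCP/IP installation"),
    ("winxp", "Corrupted TCP/IP installation"),
    ("192.168.1.123", "Bad hardware or NIC driver"),
    ("win98", "NetBIOS name resolution failure"),
]


def ping(target, count=4):
    """Return True if the system ping command exits with status 0."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    result = subprocess.run(
        ["ping", count_flag, str(count), target],
        capture_output=True,
        text=True,
    )
    # Caveat: the exit status is a coarse signal; read the printed
    # messages as well when a result looks suspicious.
    return result.returncode == 0


def first_failure(steps, ping_fn=ping):
    """Run the steps in order; return (target, meaning) for the first failure, else None."""
    for target, meaning in steps:
        if not ping_fn(target):
            return target, meaning
    return None
```

Stopping at the first failure mirrors the advice not to go on to the next command until all of the previous ones work. The ping_fn parameter exists so the logic can be exercised without touching the network.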

Pinging the Internet

You can also use ping to find a problem with Internet access. Run these commands in the order shown, and don't go on to the next command until all of the previous commands work properly. Use the Default Gateway and DNS Server addresses that you got from the winipcfg or ipconfig /all command.

Command                  Target                What Ping Failure Indicates
ping w.x.y.z             Default Gateway       Default Gateway down
ping w.x.y.z             DNS Server            DNS Server down
ping w.x.y.z             Web site IP address   Internet service provider or web site down
ping www.something.com   Web site name         DNS Server down or web site down

REFERENCE

WWW.PRACTICALLYNETWORK.COM

WWW.COMPUTERHOPE.COM

WWW.WIKIPEDIA.ORG


THANK YOU



Duplicate MAC Addresses

Each device on your network has a unique MAC address. This address identifies a single device on the network, allowing packets to be delivered to correct destinations.

Packets are delivered to their destinations by means of MAC-address-to-IP-address translation that the Address Resolution Protocol (ARP) provides. Therefore, if MAC addresses are duplicated on the network, ARP caches of routing devices contain erroneous destinations. In FDDI, devices monitor network traffic, searching for their own MAC address in each packet to determine whether to decode the packet. If MAC addresses are not unique, two stations cannot be distinguished from each other.

Duplicate MAC addresses can occur for the following reasons:

• Someone has manually configured a MAC address for a device, instead of using the address that the vendor supplied or allowing it to be assigned dynamically, and this address is also assigned to a different device.

• In rare circumstances, loops in a bridged network can cause a MAC hardware problem or an address learning problem that creates a duplicate MAC address entry in the bridging address table.

• On DECnet Phase 4 networks, MAC addresses are set from the DECnet address. A duplicate NET address can cause a duplicate MAC address.

Duplicate IP Addresses

Because IP addresses are critical for transmission of packets on TCP/IP networks, resolve duplicate IP addresses immediately.

Duplicate IP addresses can occur when someone has configured an IP address that is identical to an IP address that is assigned to a different device. Address assignments, although possible for you to configure manually, are usually made using one of these protocols:

• Dynamic Host Configuration Protocol (DHCP) - Allows your network to dynamically assign IP addresses to nodes. With this protocol, a DHCP server temporarily assigns an IP address to a node, or you can statically configure addresses as needed.

• BOOTstrap protocol (BootP) - Allows you to statically assign IP addresses to nodes. This protocol is more efficient than RARP.

• Reverse ARP (RARP) - Allows you to statically assign IP addresses to nodes. However, because this protocol relies on the MAC address to identify the node, you cannot use it on networks that dynamically assign hardware addresses.
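One symptom of a duplicate IP address is that the same IP address is seen with more than one MAC address. The Python sketch below illustrates that check on a list of (IP address, MAC address) observations. How the observations are gathered (for example, from ARP caches or an address tracking tool) is left open, and find_duplicate_ips is a hypothetical helper name.

```python
from collections import defaultdict


def find_duplicate_ips(observations):
    """Return {ip: sorted MAC addresses} for every IP seen with more than one MAC.

    observations is an iterable of (ip, mac) string pairs.
    """
    macs_by_ip = defaultdict(set)
    for ip, mac in observations:
        # Normalize case so "00-AA-..." and "00-aa-..." count as one address.
        macs_by_ip[ip].add(mac.lower())
    return {
        ip: sorted(macs)
        for ip, macs in macs_by_ip.items()
        if len(macs) > 1
    }
```

An empty result means no conflict was visible in the sample; it does not rule out a duplicate on a device that was silent while the observations were collected.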


Ethernet Packet Loss

Use these sections to identify and correct Ethernet packet loss:

• Ethernet Packet Loss Overview
• Searching for Packet Loss

See Ethernet Packet Loss Reference for additional conceptual and problem analysis detail.

If your Ethernet network shows signs of congestion, it may be experiencing packet loss. When your network is congested, utilization is usually high, packets are discarded because buffers are full, and collision rates are up. Problems related to Collisions are often at the heart of packet loss.

Understanding the Problem

Collisions are normal in Ethernet networks. In many cases, Collision rates of 50 percent do not cause a large decrease in throughput. The Collision rate helps mark the upper limit on your network (the maximum percentage of collisions that your network can bear), which is usually around 70 percent. If Collisions increase above this upper limit, your network can become unreliable.

When the Collision rates increase, so do Excessive Collisions, which causes a delay in transmitting data. An increase in Collisions also indicates that network utilization and network errors, such as FCS Errors, are probably increasing.

The real packet problems to watch for, however, are undetected collisions that show up as Late Collisions.

Identifying the Problem

To identify that your network's problem is related to packet loss, verify that frames are being dropped on your network by examining this packet loss data:

• Alignment Errors
• Collisions
• Excessive Collisions
• FCS Errors
• CRC Errors
• Late Collisions
• Receive Discards
• Too Long Errors
• Too Short Errors
• Transmit Discards

The process of identifying the problem is discussed in Searching for Packet Loss.

Solving the Problem

If you notice that packet loss data is consistently high, then your network is too congested. In this case, segment your network with the appropriate network device (such as a switch or router). If Collision data shows increases but your network's utilization is the same, then your network may have a physical problem, such as cabling that is too long. Other problems that packet loss data can indicate include:

• Faulty connectors or improper cabling
• Excessive numbers of repeaters between network devices
• Defective Ethernet transceivers or controllers

Possible solutions to these problems are explained in the procedures in Searching for Packet Loss.
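The reasoning in this section can be sketched as a small Python function that compares current readings with a baseline. The baseline values, the 10 percent tolerance, and the name triage_collisions are assumptions made for illustration; the source text does not define numeric thresholds for "increases" or "the same".

```python
def triage_collisions(collision_pct, utilization_pct,
                      baseline_collision_pct, baseline_utilization_pct,
                      tolerance=0.10):
    """Suggest a direction for troubleshooting from collision and utilization data."""
    # A reading counts as "increased" when it exceeds its baseline by more
    # than the (assumed) tolerance.
    collisions_up = collision_pct > baseline_collision_pct * (1 + tolerance)
    utilization_up = utilization_pct > baseline_utilization_pct * (1 + tolerance)
    if not collisions_up:
        return "collisions within baseline"
    if utilization_up:
        return "congestion: consider segmenting the network"
    return "possible physical problem: check cabling, connectors, repeaters, transceivers"
```

For instance, collisions that triple while utilization stays at its baseline point toward the physical causes listed above rather than toward congestion.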

Searching for Packet Loss

When you look for packet loss, use the following applications:

• Status Watch - For Ethernet and MIB-II data collection using SNMP polling
• LANsentry Manager Network Statistics Graph - For RMON data collection using an RMON probe
• Device View - On a per-device basis, you can evaluate statistics for any port on the device

Status Watch

Status Watch monitors:

• Alignment Errors
• Excessive Collisions
• FCS Errors
• Receive Discards
• Transmit Discards

Follow these steps:

1. Determine if the thresholds for the Alignment Errors tool and FCS Errors tool are being exceeded.


Table 16 identifies the problems that this data can indicate and your possible actions. For information about problems related to a nonstandard Ethernet implementation, see Nonstandard Ethernet Problems.

Table 16: Alignment Errors, FCS Errors, and CRC Errors Data

Possible Problem: Faulty cabling
Possible Action: Examine the cable and cable connections for breaks or damage.

Possible Problem: Network noise
Possible Action: Look for improper cabling, faulty cable, faulty network equipment, or cables that are too close to equipment that emits electromagnetic interference (lamps, for example).

Possible Problem: Faulty transceiver
Possible Action: Use an analyzer to identify the problematic transceiver. If necessary, replace the transceiver, network adapter, or station.

Possible Problem: Fault at the transmitting end station
Possible Action:
1. Locate the source of the errors by looking at the module and port statistics.
2. Verify the correct operation of the transceiver or adapter card of the device that is connected to the problem port.
3. If the card appears to be operating correctly, examine the cable and cable connections for breaks or damage.

Possible Problem: Station powering up or down
Possible Action: None required. Early implementations of Ethernet transceivers generate a significant amount of in-band noise when powering up; they frequently cause Alignment Errors and FCS Errors in an otherwise stable network. When powering up, some software drivers for Ethernet controllers also initiate Time Domain Reflectometry (TDR) tests to test the Ethernet media. Network monitors report TDR tests as Alignment Errors and FCS Errors.

Possible Problem: Faulty adapter
Possible Action: Replace the adapter.

2. Determine if the Excessive Collisions tool threshold is being exceeded. Table 17 identifies the problems that this data can indicate and your possible actions.

Table 17: Collisions and Excessive Collisions Data

Possible Problem: Busy network
Possible Action: Use a bridge, router, or switch to reconfigure your network into segments with fewer stations.

Possible Problem: Faulty device (adapter, switch, hub, and the like) that does not listen before broadcasting. This problem increases the incidence of all types of collisions.
Possible Action: Isolate each adapter to see if the problem stops.

Possible Problem: Network loop
Possible Action: Ensure that no redundant connections to the same station have both connections active simultaneously.


3. Determine if the Receive Discards and Transmit Discards tools' thresholds are being exceeded.

If these errors are high in conjunction with the data that you learned in steps 1 and 2, then your network is overloaded. Segment your network.

LANsentry Manager Network Statistics Graph

Use the LANsentry Manager Network Statistics graph to view data for:

• Collisions
• Late Collisions
• Bandwidth Utilization
• CRC Errors
• Too Long Errors
• Too Short Errors

Follow these steps:

1. Display a Network Statistics graph for the local Ethernet segment on which users have reported poor performance.

This graph shows the most recent trend in Collision rates. If you have set up a History sample, you can also look at the historical trend. If a number of segments are connected by repeaters, examine the graph for each Ethernet segment.

2. Analyze Utilization and Collision rates to determine whether collisions are caused by an overloaded segment or a faulty component.

• If Utilization rates are high - The collisions are probably caused by an overloaded segment. If you have added nodes or new applications to your network, consider reconfiguring the cabling system, using bridges and routers to filter out remote collisions and to keep local traffic on one segment. This action should level the network load.

• If Utilization rates are stable and appear normal - The collisions are probably caused by faulty components. In this case, do the following:

• If the network consists of repeaters - Compare the Network Statistics graphs for each segment connected to the repeater. Because repeaters repeat traffic across all connected segments (which makes many segments seem like one network), you should see similar levels of traffic on all segments. One segment that shows dissimilar levels of traffic and collisions may indicate faulty hardware. In this case, monitor several collisions to track the source station that is transmitting too soon after collisions, and repair the station. Packets that are transmitted too soon after collisions are unlikely to be valid. See Table 17 for more information about Collisions.

• On other networks - Determine the segment cable length.

3. Examine the CRC Errors and Late Collisions, which often indicate cabling or component problems.

Table 16 identifies the problems that CRC Errors can indicate and your possible actions. Table 18 identifies the problems that Late Collisions data can indicate and your possible actions.

Table 18: Late Collisions Data

Possible Problem: Cabling problems

• Segment too long
• Failing cable
• Segment not grounded properly (noise)
• Improper termination
• Taps too close (10BASE-5 and 10BASE-2 only)
• Noisy cable

Possible Action: Correct the cabling problem by doing one or more of the following:

• Reduce the segment length
• Replace the cable
• Ground the cable
• Terminate the cable correctly
• Check the taps
• Check for cables too close to equipment that emits electromagnetic interference

Possible Problem: Component problems

• Deaf or partially deaf node
• Failing repeater, transceiver, or controller cards

Possible Action: Correct the component problem by doing one of the following:

• Trace the failing component and replace it
• Replace the NIC or the transceiver

4. Trace Too Short Errors and Too Long Errors to the sender.

These errors often indicate faulty routers or LAN drivers and transceiver problems. Table 19 identifies the problems that this data can indicate and your possible actions.

Possible Problem: A transceiver on your network is adding bits to the packets that are transmitted by the attached station.
Possible Action:
1. Use a network analyzer to identify the problematic transceiver.
2. If necessary, replace the transceiver, network adapter, or station.

Possible Problem: The jabber protection mechanism on a transceiver has failed; it can no longer protect the network from the jabbering produced by the attached station.
Possible Action: Replace the network card.

Possible Problem: Excessive noise on the cable. Note: Some 10/100 Mbps cards that autodetect the network speed may connect to the network at the wrong speed, causing excessive noise.
Possible Action: Check for improper cabling, faulty cable, faulty network equipment, or cables too close to noisy electronic equipment (lamps, for example). If your network card autodetects the network speed and you have ruled out other problems, manually configure the network speed.

Possible Problem: Faulty routers (two different network types are connected, and the router is not enforcing proper frame size restrictions)
Possible Action: Notify the manufacturer.

Possible Problem: Faulty LAN driver
Possible Action: Replace the driver.

Possible Problem: A normal condition on a LinkSwitch® 1000, LinkSwitch® 3000, or CoreBuilder 5000 FastModule. If you use maximum-sized 1518-byte Ethernet frames, the device's VLT-enabled ports add a frame tag of 4 bytes, resulting in a misleading Too Long Frame error. These frames are passed successfully but will create the Too Long Frame error message.
Possible Action: If you want to eliminate the error message, reduce your Ethernet packet frames by 4 bytes.

Device View

Device View allows you to display a variety of port and device-level statistics relevant to Ethernet packet loss. Table 20 describes these statistics and their use in troubleshooting.

Table 20: Activity and Error Statistics in Device View

Statistics Group: Activity
Description: Displays the total network activity and errors on the selected port.
Use in Troubleshooting: This data shows readable packets, broadcast packets, Collisions, total errors, and runts, which cause Too Short Errors. You can interpret this data in the following ways:

• The presence of runts can often be caused by Collisions; however, if the values increase at specific times of the day, it may indicate you need to change the network topology to manage the traffic more efficiently (for example, with switches or routers).

• Runts can also be caused by a badly terminated coax cable.

• Large numbers of runts not associated with high levels of collisions can indicate a transmission problem (examine the cable).

• Particularly high numbers of Collisions compared to the total number of readable packets can point to a hardware problem (a bad adapter) or to a data loop.

• A high proportion of Broadcast Packets (>10%) on a heavily utilized network (>50% of available bandwidth) can point to an incorrectly configured bridge or router on the network.

Statistics Group: Errors
Description: Displays the number of frames with errors on the selected port.
Use in Troubleshooting: The significance of errors depends on accompanying errors and prevailing network conditions. See the following error data for more information:

• Alignment Errors: Table 16
• FCS Errors: Table 16
• Too Long Errors: Table 19
• Too Short Errors, or runts: Table 19
• Late Collisions: Table 18

To display Activity and Errors statistics for a device or port, follow these steps:

1. Select the required port or device.

2. From the shortcut menu, select Activity or Errors.

The statistics available depend on the type of port or device selected. See Table 20 for troubleshooting information.

Ethernet Packet Loss Reference

This section explains terms that are relevant to Ethernet packet loss and provides additional conceptual and problem analysis detail.

Alignment Errors

An Alignment Error indicates a received frame in which both of the following are true:

• The number of bits received is an uneven byte count (that is, not an integral multiple of 8).

• The frame has a Frame Check Sequence (FCS) error.

Alignment Errors often result from MAC layer packet formation problems, cabling problems that cause corrupted or lost data, and packets that pass through more than two cascaded multiport transceivers. See FCS Errors for more information about interpreting Alignment Errors.


Collisions

Collisions indicate that two devices detect that the network is idle and try to send packets at exactly the same time (within one round-trip delay). Because only one device can transmit at a time, both devices must stop sending and attempt to retransmit. Collisions are detected by the transmitting stations.

The retransmission algorithm helps to ensure that the packets do not retransmit at the same time. However, if the two devices retry at nearly the same time, packets can collide again; the process repeats until either the packets finally pass onto the network without collisions or 16 consecutive collisions occur and the packets are discarded.
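The retransmission behavior can be sketched in Python. The source text does not name the algorithm; the sketch below assumes the truncated binary exponential backoff used by IEEE 802.3, in which the random wait range doubles after each collision up to the 10th and the frame is discarded after 16 consecutive collisions. The function name backoff_slots is hypothetical.

```python
import random

MAX_ATTEMPTS = 16    # after 16 consecutive collisions the frame is discarded
BACKOFF_LIMIT = 10   # assumption from IEEE 802.3: the range stops doubling here


def backoff_slots(collision_count, rng=random):
    """Return how many slot times to wait after the given number of
    consecutive collisions, or None if the frame should be discarded."""
    if collision_count >= MAX_ATTEMPTS:
        return None  # counted as an Excessive Collision; the frame is dropped
    exponent = min(collision_count, BACKOFF_LIMIT)
    # Pick a whole number of slot times in the range 0 .. 2**exponent - 1.
    return rng.randrange(2 ** exponent)
```

Because each station draws its wait independently, two stations that collided once pick the same delay only half the time, and the chance of repeated collisions shrinks as the range grows.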

CRC Errors

A Cyclic Redundancy Check (CRC) Error is an RMON statistic that combines FCS Errors and Alignment Errors. These errors indicate that packets were received with:

• A bad FCS and an integral number of octets (FCS Errors)
• A bad FCS and a non-integral number of octets (Alignment Errors)

CRC Errors can cause an end station to freeze. If a large number of CRC Errors are attributed to a single station on the network, replace the station's network interface board. Typically, a CRC Error rate of more than 1 percent of network traffic is considered excessive.
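The two definitions and the 1 percent guideline can be expressed as a short Python sketch. The function names are hypothetical, and measuring the rate as errored frames divided by total frames is an assumption; the text says only "1 percent of network traffic".

```python
def classify_bad_frame(bit_count, fcs_ok):
    """Classify a received frame using the definitions in this section."""
    if fcs_ok:
        return None  # a good FCS means neither error applies
    if bit_count % 8 == 0:
        return "FCS Error"        # bad FCS, integral number of octets
    return "Alignment Error"      # bad FCS, non-integral number of octets


def crc_error_rate_excessive(crc_errors, total_frames, limit=0.01):
    """Return True if CRC Errors exceed the given share of traffic (1 percent by default)."""
    if total_frames == 0:
        return False
    return crc_errors / total_frames > limit
```

Both categories count toward the combined CRC Errors statistic, so the second function takes their sum as its first argument.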

Excessive Collisions

Excessive Collisions indicate that 16 consecutive collisions have occurred, usually a sign that the network is becoming congested. For each excessive collision count (or after 16 consecutive collisions), a packet is dropped. If you know the normal rate of excessive collisions, then you can determine when the rate of packet loss is affecting your network's performance. See Knowing Your Network's Configuration for more information.

FCS Errors

Frame Check Sequence (FCS) Errors, a type of CRC error, indicate that frames received by an interface are an integral number of octets long but do not pass the FCS check. The FCS is a mathematical way to ensure that all the frame's bits are correct without having the system examine each bit and compare it to the original. Packets with Alignment Errors also generate FCS Errors.
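The FCS check can be illustrated in Python. Ethernet's FCS is a 32-bit CRC that uses the same polynomial as the crc32 function in Python's zlib module. The sketch assumes the frame is available as bytes with the 4-byte FCS still attached, least significant byte first; many capture tools strip the FCS before a frame is saved. The helper names append_fcs and fcs_ok are hypothetical.

```python
import zlib


def append_fcs(frame_without_fcs):
    """Return the frame with a 4-byte CRC-32 FCS appended."""
    fcs = zlib.crc32(frame_without_fcs).to_bytes(4, "little")
    return frame_without_fcs + fcs


def fcs_ok(frame_with_fcs):
    """Return True if the trailing 4 bytes match the CRC-32 of the rest of the frame."""
    body, fcs = frame_with_fcs[:-4], frame_with_fcs[-4:]
    return zlib.crc32(body).to_bytes(4, "little") == fcs
```

The receiver never compares the frame with the original; it recomputes the checksum over the received bits, and any single flipped bit changes the result, which is how corrupted frames are detected.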

Both Alignment Errors and FCS Errors can be caused by equipment powering up or down or by interference (noise) on unshielded twisted-pair (10BASE-T) segments. In a network that complies with the Ethernet standard, FCS or Alignment Errors indicate bit errors during a transmission or reception. A very low rate is acceptable. Although Ethernet allows a 1 in 10^8 bit error rate, typical Ethernet performance is 1 in 10^12 or better.
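To put the bit error rates above in terms of whole frames, the sketch below estimates the chance that a frame contains at least one bit error. It assumes bit errors are independent, which is a simplification (noise often arrives in bursts), and the function name `frame_error_probability` is hypothetical.

```python
import math


def frame_error_probability(bit_error_rate, frame_octets=1518):
    """Probability that a frame of the given size has at least one bad bit.

    Assumes independent bit errors: P = 1 - (1 - BER) ** bits.
    log1p/expm1 keep the result accurate for very small rates.
    """
    bits = frame_octets * 8
    return -math.expm1(bits * math.log1p(-bit_error_rate))


if __name__ == "__main__":
    for ber in (1e-8, 1e-12):
        print(ber, frame_error_probability(ber))
```

Under this assumption, a maximum-length frame at a bit error rate of 1 in 10^8 (ten to the eighth power) is damaged roughly once in every eight thousand frames, which is why even the allowed rate is described as very low.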

Late Collisions

Late Collisions indicate that two devices have transmitted at the same time, but cabling errors (most commonly, excessive network segment length or repeaters between devices) prevent either transmitting device from detecting a collision. Neither device detects a collision because the time to propagate the signal from one end of the network to the other is longer than the time to put the entire packet on the network. As a result, neither of the devices that cause the late collision senses the other's transmission until the entire packet is on the network.

Although late collisions occur for small packets, the transmitter cannot detect them. As a result, a network suffering measurable Late Collisions for large packets is losing small packets as well.

Nonstandard Ethernet Problems

Table 21 lists the symptoms that typically occur if a system violates the Ethernet standard.

Table 21 Symptoms of Common Ethernet Network Problems

Symptoms: FCS Errors and Alignment Errors increase significantly.
Problem: Network cabling is too long.
Notes: If you use a promiscuous network monitor, the number of Late Collisions reported by stations should correlate with the FCS and Alignment Errors reported by the monitor.

Symptoms: FCS and Alignment Errors increase proportionally with interference (sometimes referred to as noise hits).
Problem: Network segment is noisy.
Notes: Typically observed on a 10BASE-T network segment in a noisy environment. If you use multiple promiscuous monitors, the FCS and Alignment Errors among the monitors will not correlate. If the monitor can track runts (also called Too Short Errors), the number of runt packets should be significantly higher than normal.

Symptoms: FCS and Alignment Errors are much higher than normal.
Problem: Networks do not conform to the access scheme of Carrier Sense Multiple Access with Collision Detect (CSMA/CD).
Notes: Occurs when some implementations of Ethernet in the segment are not entirely compatible with IEEE 802.3 repeaters. Collision fragments linger on the network long enough to collide with retry packets at the minimum interpacket gap (IPG). The IPG is smaller on one side of the repeated network, causing a lost packet. Ethernet controllers cannot receive packets that are separated by 4.7 µs or less. Some controllers cannot sustain receptions of packets separated by as much as 9.6 µs. If runt packets are received one after another and are followed by a collision fragment, Ethernet controllers that cannot sustain reception will lose packets.

Receive Discards

Receive Discards indicate that received packets could not be delivered to a higher-layer protocol because of congestion or packet errors.

Too Long Errors

A Too Long Error indicates that a packet is longer than 1518 octets (including FCS octets) but otherwise well formed. Too Long Errors are often caused by a bad transceiver, a malfunction of the jabber protection mechanism on a transceiver, or excessive noise on the cable.

Too Short Errors

A Too Short Error, also called a runt, indicates that a packet is fewer than 64 octets long (including FCS octets) but otherwise well formed.
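The two length limits above can be expressed as a small check. This is only a sketch of the size test; it does not verify that the packet is otherwise well formed, and the function and constant names are hypothetical.

```python
MIN_FRAME_OCTETS = 64     # shorter packets are Too Short Errors (runts)
MAX_FRAME_OCTETS = 1518   # longer packets are Too Long Errors


def classify_length(frame_octets):
    """Classify a packet by its length in octets, FCS octets included."""
    if frame_octets < MIN_FRAME_OCTETS:
        return "too short (runt)"
    if frame_octets > MAX_FRAME_OCTETS:
        return "too long"
    return "ok"


if __name__ == "__main__":
    for size in (60, 64, 1518, 1600):
        print(size, classify_length(size))
```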

Transmit Discards

Transmit Discards indicate that packets were not transmitted because of network congestion.

FDDI Ring Errors

Use these sections to identify and correct FDDI ring errors:

• FDDI Ring Errors Overview
• Identifying Ring Errors

Fiber Distributed Data Interface (FDDI) often corrects its own problems. However, because FDDI cannot correct all errors (especially those related to hardware problems), you should monitor FDDI errors.

Understanding the Problem

FDDI ring errors that you should monitor include:

• Elasticity Buffer Error Condition
• Frame Error Condition
• Frames Not Copied Condition
• Link Error Condition

Identifying the Problem

First, determine the type of FDDI ring errors and where they are occurring. Similar to the way you identify other FDDI problems, identify the upstream and downstream neighbors of the devices that you are monitoring.

Several types of network errors can cause FDDI performance problems. For example, problems with cables or physical connections may result in a link or frame error. Elasticity buffer (EB) errors can also lead to link and frame errors.

FDDI deals with port-related errors as follows:

• The variable PORTlerAlarm is the link error rate (LER) value at which a link connection generates an alarm. When the LER is greater than the alarm setting, Station Management (SMT) sends a Status Report Frame (SRF) to notify you that there is a problem with a port.

The PORTlerAlarm threshold is set lower than the PORTlerCutoff threshold so that you are notified of a problem before the port is actually removed from the ring.

• When link errors reach the threshold defined by the variable PORTlerCutoff, SMT breaks the connection, disabling the PHY that detected the problem. A Link Error Condition is also generated.

FDDI deals with MAC-related errors as follows:

• When MAC frame errors reach a certain threshold, a Frame Error Condition is generated. Because the actual error can be further upstream than the immediate connection, the connection remains intact.

• For a large network, the worst-case MACFrameErrorRatio is less than 0.1 percent. However, during network configuration, frame error ratios can reach 50 percent for short periods. When you detect a sustained frame error ratio of more than 0.1 percent, a problem exists between the station that is reporting the condition and the nearest upstream MAC.
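The distinction between a short spike and a sustained frame error ratio can be sketched as follows. This is an illustration only: the ratio here is a plain errors-over-total percentage rather than the exact SMT definition of MACFrameErrorRatio, the 0.1 percent threshold is the figure this section appears to intend, and the window of five samples and all names are assumptions made for the example.

```python
THRESHOLD_PERCENT = 0.1   # sustained ratio above this suggests a problem


def frame_error_ratio_percent(error_frames, total_frames):
    """Simple percentage of frames in error (not the exact SMT formula)."""
    if total_frames == 0:
        return 0.0
    return 100.0 * error_frames / total_frames


def is_sustained_problem(samples, window=5, threshold=THRESHOLD_PERCENT):
    """Return True when every one of the last `window` samples exceeds the threshold.

    samples: list of (error_frames, total_frames) tuples, oldest first.
    """
    if len(samples) < window:
        return False
    recent = samples[-window:]
    return all(frame_error_ratio_percent(e, t) > threshold for e, t in recent)


if __name__ == "__main__":
    spike = [(500, 1000)] + [(0, 1000)] * 4      # brief reconfiguration spike
    steady = [(2, 1000)] * 5                     # 0.2 percent in every sample
    print(is_sustained_problem(spike), is_sustained_problem(steady))
```

Requiring every recent sample to exceed the threshold is one way to ignore the short 50 percent bursts that occur during network configuration.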

Solving the Problem

To solve problems related to FDDI errors, fix the hardware, cabling, or congestion problem.

Identifying Ring Errors

Use Status Watch to monitor your FDDI devices for Warning or Critical alerts.

Status Watch

Use Status Watch to identify FDDI ring errors.

Follow these steps:

1. Monitor the FDDI Status tool for the currently selected device.

2. Determine whether Status Watch is reporting Elasticity Buffer Errors or a high percentage of Frame Errors, Frames Not Copied, or Link Error Rates for the currently selected device.

FDDI Ring Errors Reference

This section provides additional conceptual and problem analysis detail.

Elasticity Buffer Error Condition

The Elasticity Buffer Error condition occurs when a port's elasticity buffer overflows or underflows. This condition usually indicates that a port's hardware is not operating within the tolerances that the FDDI standard specifies. Look for the problem in the hardware of either the port that is reporting the condition or of the immediately adjacent port.

Frame Error Condition

The Frame Error condition occurs when the percentage of frames that contain errors exceeds a preset threshold. When a device is an uplink to FDDI (that is, a device is transmitting onto FDDI), this type of condition indicates that the ring is saturated. The ring is out of buffer space, and packets are being dropped from the device's backbone port.

The problem indicated by the frame errors is usually located between the MAC that reports the condition and its upstream neighbor. Because many physical connections can lie along this path, the MACFrameErrorRatio variable identifies only the two MACs between which the problem is occurring.

Frames Not Copied Condition

The Frames Not Copied condition occurs when the percentage of frames that are dropped because of insufficient buffer space exceeds a preset threshold. This condition indicates that the station is congested and is unable to process frames as quickly as they arrive. To help eliminate congestion:

• Add more capacity to the station
• Reconfigure your network so that end stations that communicate heavily with one another are on the same bridge or switch
• Filter out certain traffic

Link Error Condition

The Link Error condition occurs when a port detects link errors at a rate that exceeds a preset threshold. When the Link Error threshold is exceeded, the station removes itself from the ring and tries to reinsert itself on the ring. This action creates a MAC Neighbor Change Event (which also occurs if a ring wraps).

Link errors may indicate an FDDI PHY hardware problem (such as a faulty transmitter) or a faulty cable or connector. Look for the problem in the portion of the network between the port that is reporting the condition and the first upstream transmitter.

MAC Neighbor Change Event

The MAC Neighbor Change event occurs when a MAC's upstream or downstream neighbor changes.

This event indicates either:

• A network reconfiguration
• Another station that is leaving or joining the ring

Network File Server Timeouts

Use these sections to identify and correct timeouts on network file servers:

• Network File Server Timeout Overview
• Looking for Obvious Errors
• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

See Network File Server Timeouts Reference for additional conceptual and problem analysis detail.

A network file server can time out if your network gets congested or if your server is having problems. Users might have problems downloading data from or to the server, or copying files from or to the server. To help you understand the troubleshooting process for this type of problem, an EXAMPLE throughout this section follows the symptoms, analysis, and resolution of a typical file server timeout problem.

Understanding the Problem

When users log in, their stations make network file server calls either to determine quotas (if this feature has been enabled) or to mount user home directories. Network file server timeout messages, even when spread across multiple nodes, indicate a problem either with the network or with a server.

EXAMPLE: UNIX users notice that it takes a long time (over 30 seconds in some cases) to log in to any machine. Some machines report network file server timeout messages, but the messages have no obvious pattern and are infrequent. You begin to get a sense of the problem.

Identifying the Problem

First, rule out the obvious causes. Ask these questions:

• Can you access the network file server with Telnet?
• Have any alarms been triggered?
• Are there any new errors?

The process of identifying the problem is developed in Looking for Obvious Errors.

Solving the Problem

To determine the cause, reproduce the fault while you monitor the network. After you know the cause, you can fix the problem.

The solutions to the network file server timeout are identified in these sections:

• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

Looking for Obvious Errors

To look for obvious errors, use these applications:

• Ping and Telnet - To check connectivity to the network file server nodes
• LANsentry Manager Alarms View - To search for triggered alarms
• LANsentry Manager Statistics View - To look for errors
• LANsentry Manager History View - To identify trends

Ping and Telnet

Determine whether you can contact network file server nodes using Ping and Telnet. If the response is extremely slow, then a problem may exist with the connections to the nodes. No delay indicates that the connections are normal, implying that the delay is occurring elsewhere. In this case, use LANsentry® Manager tools to determine whether packets are being lost or ignored.

LANsentry Manager Alarms View

Using the LANsentry Manager Alarms View, you can determine if any configured alarms have been triggered.

Search the Alarms View to see if any MAC events have been logged.

EXAMPLE: MAC events have not been logged for the network to which the UNIX users are attached.

Even though no alarms have occurred, errors may exist. For example, a low rate of background errors may exist just below the alarm threshold. Because they are based on maximum and minimum values, RMON alarms may miss constant, periodic, or low amounts of errors.

LANsentry Manager Statistics View

Using the LANsentry Manager Statistics View, you can display a multisegment graph of utilization and error statistics.

Follow these steps:

1. Set up a graph that shows utilization and errors on all your major segments.

2. Determine whether any segments are particularly busy or error prone.

EXAMPLE: You notice that one segment of the UNIX network, HUB3, is reporting Too Long Errors and FCS Errors roughly every second sample. While the amount is not higher than normal, it is currently higher than on any other segment.

LANsentry Manager History View

Using the LANsentry Manager History View, you can display a rolling history table to determine if the errors that you are seeing are new. For example, if you have a history table that runs for 30-minute samples over two days, you can compare the most recent sample to a previous sample, looking for new errors. If your probe has the resources, use a much finer resolution sample stored for a shorter time (every 30 seconds for 2 hours) to more easily spot recent errors.

EXAMPLE: You see that the history table shows that no error rates remained constant throughout the day. However, errors that did occur were on the device HUB3 and were Too Long Errors and FCS Errors.

Reproducing the Fault While Monitoring the Network

Although the RMON View in LANsentry Manager can show error rates and help you to identify the location of the problem, it may not provide enough data to solve the problem. To determine the cause of the problem, reproduce it while you monitor the network by using these applications:

• LANsentry Manager Top-N Graph - To locate a quiet node to use for reproducing the fault
• LANsentry Manager Packet Capture - To capture packets from the hub to which the quiet node is attached
• LANsentry Manager Packet Decode - To analyze the packets to assess network file server traffic and delays
• Address Tracker - To find the location of the problem nodes

EXAMPLE: Using LANsentry Manager, you find a hub on the network with a higher than normal error rate. However, the error rate does not seem high enough to cause login delays of 60 seconds or more.

LANsentry Manager Top-N Graph

Using the Top-N graph in the LANsentry Manager main window, locate a quiet node that has been showing the same problem. Choose a quiet node so that you do not receive excessive traffic when you try to isolate the problem.

EXAMPLE: You see that the node Monolith, which has the same Network File System (NFS) mounts as the other nodes on the network, is quiet. You decide to use this node for reproducing the fault. See Network File System (NFS) Protocol for more information about NFS.

LANsentry Manager Packet Capture

Using the LANsentry Manager Packet Capture application, capture packets from the network using predefined patterns and start-and-stop conditions.

Follow these steps:

1. Set up a capture buffer on a probe that is connected to the same hub as the quiet node. Until you know more about the problem, set a very general filter.

EXAMPLE: You select a MAC-layer filter and set a conversation filter to capture all packets to and from Monolith.

2. Telnet into and log out from the quiet node. Then reset the capture buffer. Repeat this procedure until you see the problem reflected in your captured data. To keep the buffer information clean, reset the buffer each time that you repeat the procedure.

3. When you see the delay, note the rough value of the packet count on the LANsentry Manager packet buffer.

By noting the packet count at which you think the delay has occurred, you can narrow the problem to within about 20 packets in the buffer. If you have used an extremely quiet node, you may even identify the exact packet.

LANsentry Manager Packet Decode

The LANsentry Manager Packet Decode application decodes all major protocols and displays the packet contents at three levels of detail: summary information, header information, and actual packet content.

Follow these steps:

1. Open the buffer in the Packet Decode application and locate the number of the packet at which the delay occurred.

2. Select the packet and launch a MAC-layer conversation filter. In the filter display, look for a gap in the conversation (that is, where the node sent a request and then resent it at approximately the same rate as the delay you experienced when recreating the problem).

3. Repeat the test to determine if the result concentrates on one node or if it appears on other nodes.

EXAMPLE: On the quiet node that you selected, the delay is obvious. You see an NFS request going out to a node and a repeat of the request 30 seconds later. During that time, the node did not respond. You now know that the delay occurred because nodes were not seeing responses for NFS requests. When you repeat the test on other nodes, you find that the delay is happening with more than one destination node.
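The search for a gap in the conversation can also be sketched in code, for readers working from an exported list of packets rather than the Packet Decode display. This is a hypothetical illustration: the tuple layout (timestamp, kind, request id), the function name `find_retry_gaps`, and the 10-second minimum gap are all assumptions, not part of LANsentry Manager.

```python
def find_retry_gaps(packets, min_gap=10.0):
    """Find requests that were re-sent after a long silence with no reply.

    packets: list of (timestamp_seconds, kind, request_id) tuples sorted by
    time, where kind is "request" or "reply".
    Returns a list of (request_id, first_sent, resent) tuples.
    """
    pending = {}   # request_id -> time of the latest unanswered transmission
    gaps = []
    for timestamp, kind, request_id in packets:
        if kind == "reply":
            pending.pop(request_id, None)
        elif kind == "request":
            sent = pending.get(request_id)
            if sent is not None and timestamp - sent >= min_gap:
                gaps.append((request_id, sent, timestamp))
            pending[request_id] = timestamp
    return gaps


if __name__ == "__main__":
    capture = [
        (0.00, "request", 1),
        (0.01, "reply", 1),
        (1.00, "request", 2),    # no reply seen for this request
        (31.00, "request", 2),   # retry 30 seconds later
        (31.02, "reply", 2),
    ]
    print(find_retry_gaps(capture))
```

A request that reappears after roughly the same interval as the login delay, with no reply in between, is the pattern the steps above ask you to look for.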


Address Tracker

Use Address Tracker, which polls managed devices, to determine the hubs to which the problem nodes are attached. If the problem end stations are located on unmanaged devices, then you can at least narrow the problem to those unmanaged devices.

EXAMPLE: Although your network does not have managed hubs that Transcend® NCS management software can poll, it does have managed switches. When it polls the switches, Address Tracker displays the switch ports on which addresses were last seen. This information indicates the hub (but not the hub port) on which the device is located.

LANsentry Manager Packet Decode

After you know the location of the hub that has the problem node, monitor the problem from the hub using LANsentry Manager Packet Decode.

Follow these steps:

1. To capture packets from one of the nodes on the hub, set up another capture buffer and repeat the exercise that is described in LANsentry Manager Packet Capture. Because a delay may occur on a different node, use two capture buffers without stopping the first one.

Note the rough packet count where the delay appears.

2. Display a conversation filter of the packet where the delay appears and look for the gap in the conversation.

EXAMPLE: You hope that the nodes are on the same hub. You find that all the nodes are on HUB3. This result indicates that FCS Errors may be causing the timeouts. However, because the errors occur at a low rate, you decide to verify this diagnosis. You monitor the problem from the hub, logging in and out many times, and the delay eventually occurs. This time, the delay shows that the node's reply had an FCS Error even though the node received the request. The switch would not have transmitted this packet, causing a timeout on the NFS protocol. The retry time is presumably 30 seconds. During this test, you see the problem occurring on another node.

Correcting the Fault

Without a managed hub, you may find it very difficult to discover network file server timeout errors. To find the problematic node, you must either systematically isolate nodes by monitoring each node for a prolonged period or temporarily insert a managed hub.

EXAMPLE: You notice that the captured error packet failed FCS because it was corrupted by a regular pattern during transmission. A possible reason for this occurrence is a Jabbering node. This explanation makes sense because FCS/Jabber frames increased linearly when you were monitoring the live network.

Network File Server Timeouts Reference

This section explains terms that are relevant to network file server timeouts and provides additional conceptual and problem analysis detail.

Jabbering

Jabbering occurs when a node transmits illegal-length packets and is possibly not operating within carrier specifications. In effect, another node has written bad data over a valid packet. This bad data is often interpreted as a repeated sequence of data.

Network File System (NFS) Protocol

A distributed file system protocol developed by Sun Microsystems that allows a computer system to access files over a network as if they were on its local disks. This protocol has been incorporated into products by more than 200 companies. It is now a de facto Internet standard.

NFS is one protocol in the NFS suite of protocols, which includes NFS, RPC, XDR (External Data Representation), and others. These protocols are part of a larger architecture that Sun Microsystems refers to as Open Network Computing (ONC). ONC is a distributed applications architecture designed by Sun and currently controlled by a consortium led by Sun.

Troubleshooting TCP/IP - Detailed Steps

This article shows how to troubleshoot TCP/IP connectivity between computers on a Windows network. If you haven't already done so, disable XP's Internet Connection Firewall on all local area network connections and remove all firewall programs on the network. Improperly configured firewalls are the most common cause of TCP/IP problems.

Open a Command Prompt Window

For many of these steps, you'll be typing at the command prompt. To open a command prompt window in Windows 2000 or XP, click Start | Run, type cmd in the box, and click OK. To open a command prompt window in Windows 95, 98, or Me, click Start | Run, type command in the box, and click OK. Type one command per line and press Enter after each one to execute it. To close the command prompt window, use the exit command.

Determine the TCP/IP Settings

Determine the TCP/IP settings of each computer on the local area network. In XP, open the Network Connections folder, right-click the LAN connection, and click Status | Support | Details. The Status and Details views show the settings for that LAN connection, for example on an Internet Connection Sharing host.

In Windows 95/98/Me, click Start | Run, type winipcfg in the box, and click OK. Select the LAN adapter from the menu and click More Info. The winipcfg view shows the same settings, for example for an ICS client running Windows Me.

You can also see the TCP/IP settings from the command prompt. This is especially convenient if a computer has more than one network adapter. Use the ipconfig /all command, which is available in all versions except Windows 95. The output from this command can be long, so it's best to write it to a file. Specify the file name in the command this way:

ipconfig /all >ipconfig.txt

Description of TCP/IP Settings

Here are the TCP/IP settings that are used in network troubleshooting:

• IP Address - Unique address assigned to a network adapter. A computer with multiple network adapters has an IP address for each one, and each one must be in a different subnet.

• Subnet Mask - Used in conjunction with the IP address to determine which subnet an adapter belongs to. At the simplest level, communication is only possible between two network adapters when they're in the same subnet.

• Default Gateway - IP address of a computer or router, on one of this computer's local area networks, that knows how to communicate with subnets not present on this computer. For an Internet connection, the default gateway is a router belonging to your Internet service provider, and all access to sites on the Internet goes through it. For an ICS client, the default gateway is the ICS host. If you use a hardware router, it serves as the default gateway.

• DHCP Server - If an adapter is configured to obtain an IP address automatically, this is the address of the server that provides it. It could be your ISP, an ICS host, or a hardware router.

• DNS Servers - IP address of one or more Domain Name Server computers. DNS servers translate Internet names to their IP addresses (like 63.146.109.227).
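The relationship between IP address and subnet mask described in the list above can be checked with Python's standard `ipaddress` module. The sketch below is illustrative only; the function name `same_subnet` and the sample addresses are assumptions chosen for the example.

```python
import ipaddress


def same_subnet(ip_a, ip_b, subnet_mask):
    """Return True when both addresses fall in the same subnet for the given mask."""
    net_a = ipaddress.ip_interface(f"{ip_a}/{subnet_mask}").network
    net_b = ipaddress.ip_interface(f"{ip_b}/{subnet_mask}").network
    return net_a == net_b


if __name__ == "__main__":
    mask = "255.255.255.0"
    print(same_subnet("192.168.1.101", "192.168.1.123", mask))   # same subnet
    print(same_subnet("192.168.1.101", "192.168.2.5", mask))     # different subnets
```

Two adapters that fail this check cannot talk to each other directly and need a default gateway to route between their subnets.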

Subnets

See our article on subnets for a brief description of how they work. For more details, see this Microsoft Knowledge Base article.

If two computers are supposed to be on the same subnet but aren't, something is wrong with the network hardware or software configuration. This is most likely to happen when one of them receives an IP address of 169.254.x.x, which indicates that:

• It's configured to obtain an IP address automatically
• It couldn't find a DHCP server on the network to make the assignment
• Windows assigned it an Automatic Private IP Address

See our article on Specific Networking Problems and Their Solutions for more information.
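An address in the 169.254.x.x range can be recognized programmatically. The sketch below relies on the `is_link_local` property of Python's standard `ipaddress` module, which covers the 169.254.0.0/16 block for IPv4; the function name `looks_like_apipa` is hypothetical.

```python
import ipaddress


def looks_like_apipa(address):
    """Return True for an IPv4 address in the 169.254.x.x automatic private range."""
    ip = ipaddress.ip_address(address)
    return ip.version == 4 and ip.is_link_local


if __name__ == "__main__":
    for candidate in ("169.254.10.20", "192.168.1.101"):
        print(candidate, looks_like_apipa(candidate))
```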

Pinging

The ping command is the basic tool for testing TCP/IP connectivity. It sends a special packet (called ICMP Echo) to a particular IP address and looks for a reply. If everything is working right, the reply comes back. If not, the ping times out in a few seconds. By default, the ping command repeats the process four times. For example, an ICS client computer can ping a Windows XP Home Edition ICS host using either the host's IP address or its computer name.

When ping fails, you'll see one of these error messages:

• Request timed out - The IP address is valid, but there's no reply from it. If the IP address is on a local area network, the most likely cause is a firewall program blocking the ping.

• Unknown host <name> or Ping request could not find host <name> - The computer name doesn't exist on the local area network. Make sure that NetBIOS over TCP/IP is enabled.

• Destination host unreachable - The IP address isn't on a local area network, and the default gateway can't access it. Either there's no default gateway, its address is wrong, or it isn't functioning.


Pinging the Local Area Network

Here is a series of ping commands to use in finding where a problem occurs on a local area network. Run them in the order shown, and don't go on to the next command until all of the previous commands work properly. In this example:

• The computer being tested is named Winxp, with IP address 192.168.1.101.
• There's another computer on the network named Win98, with IP address 192.168.1.123.

Substitute the appropriate IP addresses and computer names for your network.

Command               Target                          What Ping Failure Indicates
ping 127.0.0.1        Loopback address                Corrupted TCP/IP installation
ping localhost        Loopback name                   Corrupted TCP/IP installation
ping 192.168.1.101    This computer's IP address      Corrupted TCP/IP installation
ping winxp            This computer's name            Corrupted TCP/IP installation
ping 192.168.1.123    Another computer's IP address   Bad hardware or NIC driver
ping win98            Another computer's name         NetBIOS name resolution failure

To fix a corrupted TCP/IP installation on Windows XP, follow the steps in this Microsoft Knowledge Base article. For Windows 95/98/Me, un-install the TCP/IP protocol in Control Panel | Network, reboot, and re-install it. If that doesn't fix it, use this procedure on Windows 95 or 98.
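The rule of running the local area network pings in order and stopping at the first failure can be sketched as a small script. This is an illustration under stated assumptions: the names `ping_once`, `first_failure`, and `LAN_CHECKS` are hypothetical, the addresses and computer names are the example values from this section, and the exit code of the system ping command is treated as the success signal even though on some systems it does not distinguish every failure message.

```python
import platform
import subprocess

# Ordered checks for the example network: (target, what a failure indicates).
LAN_CHECKS = [
    ("127.0.0.1", "Corrupted TCP/IP installation"),
    ("localhost", "Corrupted TCP/IP installation"),
    ("192.168.1.101", "Corrupted TCP/IP installation"),
    ("winxp", "Corrupted TCP/IP installation"),
    ("192.168.1.123", "Bad hardware or NIC driver"),
    ("win98", "NetBIOS name resolution failure"),
]


def ping_once(target, timeout_s=5):
    """Send one echo request with the system ping command; True on exit code 0."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    try:
        result = subprocess.run(
            ["ping", count_flag, "1", target],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0


def first_failure(checks, ping=ping_once):
    """Run the checks in order and return the first (target, diagnosis) that fails."""
    for target, diagnosis in checks:
        if not ping(target):
            return target, diagnosis
    return None


if __name__ == "__main__":
    print(first_failure(LAN_CHECKS))
```

Passing the ping function as a parameter keeps the ordering logic separate from the network call, so the sequence can be exercised without touching a real network.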

Pinging the Internet

You can also use ping to find a problem with Internet access. Run these commands in the order shown, and don't go on to the next command until all of the previous commands work properly. Use the Default Gateway and DNS Server addresses that you got from the winipcfg or ipconfig /all command.

Command                  Target                What Ping Failure Indicates
ping w.x.y.z             Default Gateway       Default Gateway down
ping w.x.y.z             DNS Server            DNS Server down
ping w.x.y.z             Web site IP address   Internet service provider or web site down
ping www.something.com   Web site name         DNS Server down or web site down

REFERENCES

WWW.PRACTICALLYNETWORK.COM

WWW.COMPUTERHOPE.COM

WWW.WIKIPEDIA.ORG

THANK YOU


                                                          • Troubleshooting TCPIP - Detailed Steps
                                                          • Open a Command Prompt Window
                                                          • Determine the TCPIP Settings
                                                          • Description of TCPIP Settings
                                                          • Subnets
                                                          • Pinging
                                                          • Pinging the Local Area Network
                                                          • Pinging the Internet

Ethernet Packet Loss

Use these sections to identify and correct Ethernet packet loss:

• Ethernet Packet Loss Overview
• Searching for Packet Loss

See Ethernet Packet Loss Reference for additional conceptual and problem analysis detail.

If your Ethernet network shows signs of congestion, it may be experiencing packet loss. When your network is congested, utilization is usually high, packets are discarded because buffers are full, and collision rates are up. Problems related to Collisions are often at the heart of packet loss.

Understanding the Problem

Collisions are normal in Ethernet networks. In many cases, Collision rates of 50 percent do not cause a large decrease in throughput. The Collision rate helps mark the upper limit on your network (the maximum percentage of collisions that your network can bear), which is usually around 70 percent. If Collisions increase above this upper limit, your network can become unreliable.

When the Collision rates increase, so do Excessive Collisions, which cause a delay in transmitting data. An increase in Collisions also indicates that network utilization and network errors, such as FCS Errors, are probably increasing.

The real packet problems to watch for, however, are undetected collisions that show up as Late Collisions.
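The rough bands described above can be sketched in a few lines of Python. This is only an illustration: it assumes the Collision rate is measured as collisions per 100 frames offered to the segment, and the function names and the "watch" label are hypothetical, not part of any monitoring tool.

```python
def collision_rate_percent(collisions: int, frames_sent: int) -> float:
    """Collisions as a percentage of frames offered to the segment (assumed definition)."""
    if frames_sent <= 0:
        return 0.0
    return 100.0 * collisions / frames_sent


def classify_collision_rate(rate_percent: float,
                            tolerable: float = 50.0,
                            upper_limit: float = 70.0) -> str:
    """Map a collision rate onto the bands in the text (50 and 70 percent)."""
    if rate_percent > upper_limit:
        return "unreliable"   # above the usual upper limit
    if rate_percent > tolerable:
        return "watch"        # approaching the upper limit
    return "normal"           # collisions at this level are expected
```

Both thresholds are parameters because the text gives them only as typical values; your network's actual upper limit may differ.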

Identifying the Problem

To identify whether your network's problem is related to packet loss, verify that frames are being dropped on your network by examining this packet loss data:

• Alignment Errors
• Collisions
• Excessive Collisions
• FCS Errors
• CRC Errors
• Late Collisions
• Receive Discards
• Too Long Errors
• Too Short Errors
• Transmit Discards

The process of identifying the problem is discussed in Searching for Packet Loss.

Solving the Problem

If you notice that packet loss data is consistently high, then your network is too congested. In this case, segment your network with the appropriate network device (such as a switch or router). If Collision data shows increases but your network's utilization is the same, then your network may have a physical problem, such as cabling that is too long. Other problems that packet loss data can indicate include:

• Faulty connectors or improper cabling
• Excessive numbers of repeaters between network devices
• Defective Ethernet transceivers or controllers

Possible solutions to these problems are explained in the procedures in Searching for Packet Loss.

Searching for Packet Loss

When you look for packet loss, use the following applications:

• Status Watch - For Ethernet and MIB-II data collection using SNMP polling
• LANsentry Manager Network Statistics Graph - For RMON data collection using an RMON probe
• Device View - On a per-device basis, you can evaluate statistics for any port on the device

Status Watch

Status Watch monitors:

• Alignment Errors
• Excessive Collisions
• FCS Errors
• Receive Discards
• Transmit Discards

Follow these steps:

1. Determine if the thresholds for the Alignment Errors tool and FCS Errors tool are being exceeded.

Table 16 identifies the problems that this data can indicate and your possible actions. For information about problems related to a nonstandard Ethernet implementation, see Nonstandard Ethernet Problems.

Table 16. Alignment Errors, FCS Errors, and CRC Errors Data

Possible Problem: Faulty cabling
Possible Action: Examine the cable and cable connections for breaks or damage.

Possible Problem: Network noise
Possible Action: Look for improper cabling, faulty cable, faulty network equipment, or cables that are too close to equipment that emits electromagnetic interference (lamps, for example).

Possible Problem: Faulty transceiver
Possible Action: Use an analyzer to identify the problematic transceiver. If necessary, replace the transceiver, network adapter, or station.

Possible Problem: Fault at the transmitting end station
Possible Action:
1. Locate the source of the errors by looking at the module and port statistics.
2. Verify the correct operation of the transceiver or adapter card of the device that is connected to the problem port.
3. If the card appears to be operating correctly, examine the cable and cable connections for breaks or damage.

Possible Problem: Station powering up or down
Possible Action: None required. Early implementations of Ethernet transceivers generate a significant amount of in-band noise when powering up; they frequently cause Alignment Errors and FCS Errors in an otherwise stable network. When powering up, some software drivers for Ethernet controllers also initiate Time Domain Reflectometry (TDR) tests to test the Ethernet media. Network monitors report TDR tests as Alignment Errors and FCS Errors.

Possible Problem: Faulty adapter
Possible Action: Replace the adapter.

2. Determine if the Excessive Collisions tool threshold is being exceeded. Table 17 identifies the problems that this data can indicate and your possible actions.

Table 17. Collisions and Excessive Collisions Data

Possible Problem: Busy network
Possible Action: Use a bridge, router, or switch to reconfigure your network into segments with fewer stations.

Possible Problem: Faulty device (adapter, switch, hub, and the like) that does not listen before broadcasting. This problem increases the incidence of all types of collisions.
Possible Action: Isolate each adapter to see if the problem stops.

Possible Problem: Network loop
Possible Action: Ensure that no redundant connections to the same station have both connections active simultaneously.

3. Determine if the Receive Discards and Transmit Discards tools' thresholds are being exceeded.

If these errors are high in conjunction with the data that you learned in steps 1 and 2, then your network is overloaded. Segment your network.
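The three Status Watch steps amount to a small decision procedure. The sketch below is one hypothetical way to write it down; the dictionary keys and the function name are invented for illustration and do not correspond to any Status Watch interface. It also assumes that step 3 applies when discards are high together with a finding from either step 1 or step 2, which is one reading of the text.

```python
def triage_status_watch(exceeded: dict) -> list:
    """Turn a map of 'threshold exceeded' flags into a list of suggested checks."""
    findings = []
    # Step 1: Alignment Errors or FCS Errors over threshold (see Table 16).
    if exceeded.get("alignment_errors") or exceeded.get("fcs_errors"):
        findings.append("check cabling, noise, transceivers, and adapters")
    # Step 2: Excessive Collisions over threshold (see Table 17).
    if exceeded.get("excessive_collisions"):
        findings.append("check for a busy network, a faulty device, or a loop")
    # Step 3: discards that accompany the findings above suggest overload.
    discards = exceeded.get("receive_discards") or exceeded.get("transmit_discards")
    if discards and findings:
        findings.append("network is probably overloaded; segment it")
    return findings
```

For example, `triage_status_watch({"fcs_errors": True, "receive_discards": True})` returns the cabling check followed by the overload finding.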

LANsentry Manager Network Statistics Graph

Use the LANsentry Manager Network Statistics graph to view data for:

• Collisions
• Late Collisions
• Bandwidth Utilization
• CRC Errors
• Too Long Errors
• Too Short Errors

Follow these steps:

1. Display a Network Statistics graph for the local Ethernet segment on which users have reported poor performance.

This graph shows the most recent trend in Collision rates. If you have set up a History sample, you can also look at the historical trend. If a number of segments are connected by repeaters, examine the graph for each Ethernet segment.

2. Analyze Utilization and Collision rates to determine whether collisions are caused by an overloaded segment or a faulty component.

• If Utilization rates are high - The collisions are probably caused by an overloaded segment. If you have added nodes or new applications to your network, consider reconfiguring the cabling system, using bridges and routers to filter out remote collisions and to keep local traffic on one segment. This action should level the network load.

• If Utilization rates are stable and appear normal - The collisions are probably caused by faulty components. In this case, do the following:

  • If the network consists of repeaters - Compare the Network Statistics graphs for each segment connected to the repeater. Because repeaters repeat traffic across all connected segments (which makes many segments seem like one network), you should see similar levels of traffic on all segments. One segment that shows dissimilar levels of traffic and collisions may indicate faulty hardware. In this case, monitor several collisions to track the source station that is transmitting too soon after collisions, and repair the station. Packets that are transmitted too soon after collisions are unlikely to be valid. See Table 17 for more information about Collisions.

  • On other networks - Determine the segment cable length.
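Step 2 depends on knowing the segment's utilization. As a sketch, the following Python computes percent utilization from packet and octet counts over a sampling interval, using the calculation given with the RMON Ethernet statistics group (each packet also occupies 64 bits of preamble and a 96-bit interframe gap). The 50 percent cut-off for "high" utilization and the function names are assumptions made for illustration.

```python
def ethernet_utilization_percent(packets: int, octets: int,
                                 interval_s: float,
                                 speed_bps: int = 10_000_000) -> float:
    """Percent utilization of a segment over one sampling interval."""
    # Preamble (64 bits) and interframe gap (96 bits) are charged per packet.
    bits_on_wire = packets * (64 + 96) + octets * 8
    return 100.0 * bits_on_wire / (interval_s * speed_bps)


def likely_collision_cause(utilization_percent: float,
                           high_utilization: float = 50.0) -> str:
    """Apply the step 2 rule of thumb; the cut-off is an assumed value."""
    if utilization_percent >= high_utilization:
        return "overloaded segment"
    return "faulty component"
```

A segment carrying 14,880 minimum-size (64-octet) packets in one second at 10 Mbps works out to just under 100 percent utilization, which is a useful sanity check on the formula.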

3. Examine the CRC Errors and Late Collisions, which often indicate cabling or component problems.

Table 16 identifies the problems that CRC Errors can indicate and your possible actions. Table 18 identifies the problems that Late Collisions data can indicate and your possible actions.

Table 18. Late Collisions Data

Possible Problem: Cabling problems
• Segment too long
• Failing cable
• Segment not grounded properly (noise)
• Improper termination
• Taps too close (10BASE-5 and 10BASE-2 only)
• Noisy cable
Possible Action: Correct the cabling problem by doing one or more of the following:
• Reduce the segment length
• Replace the cable
• Ground the cable
• Terminate the cable correctly
• Check the taps
• Check for cables too close to equipment that emits electromagnetic interference

Possible Problem: Component problems
• Deaf or partially deaf node
• Failing repeater, transceiver, or controller cards
Possible Action: Correct the component problem by doing one of the following:
• Trace the failing component and replace it
• Replace the NIC or the transceiver

4. Trace Too Short Errors and Too Long Errors to the sender.

These errors often indicate faulty routers or LAN drivers and transceiver problems. Table 19 identifies the problems that this data can indicate and your possible actions.

Possible Problem: A transceiver on your network is adding bits to the packets that are transmitted by the attached station.
Possible Action:
1. Use a network analyzer to identify the problematic transceiver.
2. If necessary, replace the transceiver, network adapter, or station.

Possible Problem: The jabber protection mechanism on a transceiver has failed; it can no longer protect the network from the jabbering produced by the attached station.
Possible Action: Replace the network card.

Possible Problem: Excessive noise on the cable.
Note: Some 10/100 Mbps cards that autodetect the network speed may connect to the network at the wrong speed, causing excessive noise.
Possible Action: Check for improper cabling, faulty cable, faulty network equipment, or cables too close to noisy electronic equipment (lamps, for example). If your network card autodetects the network speed and you have ruled out other problems, manually configure the network speed.

Possible Problem: Faulty routers (two different network types are connected, and the router is not enforcing proper frame size restrictions).
Possible Action: Notify the manufacturer.

Possible Problem: Faulty LAN driver
Possible Action: Replace the driver.

Possible Problem: A normal condition on a LinkSwitch® 1000, LinkSwitch® 3000, or CoreBuilder 5000 FastModule. If you use maximum-sized 1518-byte Ethernet frames, the device's VLT-enabled ports add a frame tag of 4 bytes, resulting in a misleading Too Long Frame error. These frames are passed successfully but will create the Too Long Frame error message.
Possible Action: If you want to eliminate the error message, reduce your Ethernet packet frames by 4 bytes.

Device View

Device View allows you to display a variety of port and device-level statistics relevant to Ethernet packet loss. Table 20 describes these statistics and their use in troubleshooting.

Table 20. Activity and Error Statistics in Device View

Statistics Group: Activity
Description: Displays the total network activity and errors on the selected port.
Use in Troubleshooting: This data shows readable packets, broadcast packets, Collisions, total errors, and runts, which cause Too Short Errors. You can interpret this data in the following ways:

• The presence of runts can often be caused by Collisions; however, if the values increase at specific times of the day, it may indicate that you need to change the network topology to manage the traffic more efficiently (for example, with switches or routers).

• Runts can also be caused by a badly terminated coax cable.

• Large numbers of runts not associated with high levels of collisions can indicate a transmission problem (examine the cable).

• Particularly high numbers of Collisions compared to the total number of readable packets can point to a hardware problem (a bad adapter) or to a data loop.

• A high proportion of Broadcast Packets (>10%) on a heavily utilized network (>50% of available bandwidth) can point to an incorrectly configured bridge or router on the network.

Statistics Group: Errors
Description: Displays the number of frames with errors on the selected port.
Use in Troubleshooting: The significance of errors depends on accompanying errors and prevailing network conditions. See the following error data for more information:

• Alignment Errors: Table 16
• FCS Errors: Table 16
• Too Long Errors: Table 19
• Too Short Errors, or runts: Table 19
• Late Collisions: Table 18

To display Activity and Errors statistics for a device or port, follow these steps:

1. Select the required port or device.

2. From the shortcut menu, select Activity or Errors.

The statistics available depend on the type of port or device selected. See Table 20 for troubleshooting information.
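One of the Table 20 rules of thumb has explicit numbers attached: a high share of broadcast packets on a heavily used network. The sketch below encodes that single check, reading the table's thresholds as more than 10 percent broadcast packets and more than 50 percent bandwidth utilization (an interpretation, since the symbols in the table are damaged). The function name and arguments are hypothetical.

```python
def broadcast_share_suspicious(broadcast_packets: int,
                               readable_packets: int,
                               utilization_percent: float) -> bool:
    """True when broadcasts exceed 10% of readable packets on a >50% utilized port."""
    if readable_packets <= 0:
        return False
    broadcast_share = 100.0 * broadcast_packets / readable_packets
    return broadcast_share > 10.0 and utilization_percent > 50.0
```

A positive result only points toward a possibly misconfigured bridge or router; it does not confirm one.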

Ethernet Packet Loss Reference

This section explains terms that are relevant to Ethernet packet loss and provides additional conceptual and problem analysis detail.

Alignment Errors

An Alignment Error indicates a received frame in which both of the following are true:

• The number of bits received is an uneven byte count (that is, not an integral multiple of 8).

• The frame has a Frame Check Sequence (FCS) error.

Alignment Errors often result from MAC layer packet formation problems, cabling problems that cause corrupted or lost data, and packets that pass through more than two cascaded multiport transceivers. See FCS Errors for more information about interpreting Alignment Errors.

Collisions

Collisions indicate that two devices detect that the network is idle and try to send packets at exactly the same time (within one round-trip delay). Because only one device can transmit at a time, both devices must stop sending and attempt to retransmit. Collisions are detected by the transmitting stations.

The retransmission algorithm helps to ensure that the packets do not retransmit at the same time. However, if the two devices retry at nearly the same time, packets can collide again; the process repeats until either the packets finally pass onto the network without collisions or 16 consecutive collisions occur and the packets are discarded.
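The retransmission algorithm referred to above is, in standard Ethernet, truncated binary exponential backoff: after the nth collision a station waits a random number of slot times between 0 and 2^k - 1, where k is n capped at 10, and gives up after 16 attempts. The Python below is a simplified simulation of that rule, not code from any driver; `collides` is a hypothetical callback standing in for the shared medium.

```python
import random

MAX_ATTEMPTS = 16    # the frame is discarded after 16 consecutive collisions
BACKOFF_LIMIT = 10   # the backoff exponent stops growing after 10 collisions


def backoff_slots(collision_count: int, rng=random) -> int:
    """Slot times to wait after the given number of consecutive collisions."""
    if not 1 <= collision_count < MAX_ATTEMPTS:
        raise ValueError("collision_count must be between 1 and 15")
    k = min(collision_count, BACKOFF_LIMIT)
    return rng.randrange(2 ** k)   # uniform in 0 .. 2**k - 1


def send_with_backoff(collides, rng=random):
    """Try to send one frame; returns (delivered, total_slots_waited)."""
    waited = 0
    for attempt in range(1, MAX_ATTEMPTS + 1):
        if not collides(attempt):
            return True, waited
        if attempt < MAX_ATTEMPTS:
            waited += backoff_slots(attempt, rng)
    return False, waited           # counted as an Excessive Collision
```

Because the waiting range doubles with each collision, two stations that collide once are unlikely to keep choosing the same delay, which is why 16 consecutive collisions usually signals congestion rather than bad luck.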

CRC Errors

A Cyclic Redundancy Check (CRC) Error is an RMON statistic that combines FCS Errors and Alignment Errors. These errors indicate that packets were received with:

• A bad FCS and an integral number of octets (FCS Errors)
• A bad FCS and a non-integral number of octets (Alignment Errors)

CRC Errors can cause an end station to freeze. If a large number of CRC Errors are attributed to a single station on the network, replace the station's network interface board. Typically, a CRC Error rate of more than 1 percent of network traffic is considered excessive.
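As a small worked example of the 1 percent guideline, the following Python combines the two counters into a CRC Error rate. It assumes the rate is taken against the total packets seen in the same sampling interval; the function names are illustrative.

```python
def crc_error_rate_percent(fcs_errors: int, alignment_errors: int,
                           total_packets: int) -> float:
    """CRC Errors (FCS plus Alignment) as a percentage of packets seen."""
    if total_packets <= 0:
        return 0.0
    return 100.0 * (fcs_errors + alignment_errors) / total_packets


def crc_errors_excessive(fcs_errors: int, alignment_errors: int,
                         total_packets: int,
                         limit_percent: float = 1.0) -> bool:
    """Apply the 'more than 1 percent of traffic' guideline."""
    rate = crc_error_rate_percent(fcs_errors, alignment_errors, total_packets)
    return rate > limit_percent
```

With 10 FCS Errors and 5 Alignment Errors in 1,000 packets, the rate is 1.5 percent, which this guideline treats as excessive.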

Excessive Collisions

Excessive Collisions indicate that 16 consecutive collisions have occurred, usually a sign that the network is becoming congested. For each excessive collision count (or after 16 consecutive collisions), a packet is dropped. If you know the normal rate of excessive collisions, then you can determine when the rate of packet loss is affecting your network's performance. See Knowing Your Network's Configuration for more information.

FCS Errors

Frame Check Sequence (FCS) Errors, a type of CRC Error, indicate that frames received by an interface are an integral number of octets long but do not pass the FCS check. The FCS is a mathematical way to ensure that all the frame's bits are correct without having the system examine each bit and compare it to the original. Packets with Alignment Errors also generate FCS Errors.

Both Alignment Errors and FCS Errors can be caused by equipment powering up or down or by interference (noise) on unshielded twisted-pair (10BASE-T) segments. In a network that complies with the Ethernet standard, FCS or Alignment Errors indicate bit errors during a transmission or reception. A very low rate is acceptable. Although Ethernet allows a 1 in 10^8 bit error rate, typical Ethernet performance is 1 in 10^12 or better.
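To see what those bit error rates mean per frame, the sketch below estimates the probability that a frame contains at least one bit error, reading the figures above as 1 error in 10^8 bits and 1 in 10^12 bits. It assumes bit errors are independent, which real noise often is not, so treat the result as a rough baseline.

```python
import math


def frame_error_probability(bit_error_rate: float,
                            frame_octets: int = 1518) -> float:
    """P(at least one bad bit) = 1 - (1 - BER) ** bits, computed stably."""
    bits = frame_octets * 8
    # log1p/expm1 avoid precision loss when the bit error rate is tiny.
    return -math.expm1(bits * math.log1p(-bit_error_rate))
```

For a maximum-size 1518-octet frame, a bit error rate of 1 in 10^8 gives roughly one damaged frame in 8,200, while 1 in 10^12 gives roughly one in 82 million.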

Late Collisions

Late Collisions indicate that two devices have transmitted at the same time, but cabling errors (most commonly, excessive network segment length or repeaters between devices) prevent either transmitting device from detecting a collision. Neither device detects a collision because the time to propagate the signal from one end of the network to the other is longer than the time to put the entire packet on the network. As a result, neither of the devices that cause the late collision senses the other's transmission until the entire packet is on the network.

Although late collisions occur for small packets, the transmitter cannot detect them. As a result, a network suffering measurable Late Collisions for large packets is losing small packets as well.
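The timing argument can be made concrete with a simplified model: a station is only guaranteed to notice a collision if the round-trip delay across the network is shorter than the time it spends transmitting the frame. The Python below compares the two. The signal velocity of 2 x 10^8 m/s and the optional equipment delay are illustrative assumptions, not measured values, and real networks add repeater and transceiver delays that this sketch ignores unless you supply them.

```python
def frame_time_us(frame_octets: int, speed_bps: int = 10_000_000) -> float:
    """Microseconds needed to put a whole frame on the wire."""
    return frame_octets * 8 / speed_bps * 1e6


def round_trip_us(path_length_m: float,
                  velocity_m_per_s: float = 2.0e8,
                  equipment_delay_us: float = 0.0) -> float:
    """Round-trip propagation delay across the network (assumed velocity)."""
    return 2 * path_length_m / velocity_m_per_s * 1e6 + equipment_delay_us


def late_collision_possible(path_length_m: float,
                            frame_octets: int = 64, **delay_kwargs) -> bool:
    """True if a frame can be fully sent before a collision could be sensed."""
    return round_trip_us(path_length_m, **delay_kwargs) > frame_time_us(frame_octets)
```

A minimum-size 64-octet frame takes 51.2 microseconds at 10 Mbps, so under these assumptions a 2,500 m path is safe while a 6,000 m path is not. Small frames are the first to go undetected, which matches the point made above.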

Nonstandard Ethernet Problems

Table 21 lists the symptoms that typically occur if a system violates the Ethernet standard.

Table 21. Symptoms of Common Ethernet Network Problems

Symptoms: FCS Errors and Alignment Errors increase significantly.
Problem: Network cabling is too long.
Notes: If you use a promiscuous network monitor, the number of Late Collisions reported by stations should correlate with the FCS and Alignment Errors reported by the monitor.

Symptoms: FCS and Alignment Errors increase proportionally with interference (sometimes referred to as noise hits).
Problem: Network segment is noisy.
Notes: Typically observed on a 10BASE-T network segment in a noisy environment. If you use multiple promiscuous monitors, the FCS and Alignment Errors among the monitors will not correlate. If the monitor can track runts, also called Too Short Errors, the number of runt packets should be significantly higher than normal.

Symptoms: FCS and Alignment Errors are much higher than normal.
Problem: Networks do not conform to the access scheme of Carrier Sense Multiple Access with Collision Detect (CSMA/CD).
Notes: Occurs when some implementations of Ethernet in the segment are not entirely compatible with IEEE 802.3 repeaters. Collision fragments linger on the network long enough to collide with retry packets at the minimum interpacket gap (IPG). The IPG is smaller on one side of the repeated network, causing a lost packet. Ethernet controllers cannot receive packets that are separated by 4.7 µs or less. Some controllers cannot sustain receptions of packets separated by as much as 9.6 µs. If runt packets are received one after another and are followed by a collision fragment, Ethernet controllers that cannot sustain reception will lose packets.

Receive Discards

Receive Discards indicate that received packets could not be delivered to a high-layer protocol because of congestion or packet errors.

Too Long Errors

A Too Long Error indicates that a packet is longer than 1518 octets (including FCS octets) but otherwise well formed. Too Long Errors are often caused by a bad transceiver, a malfunction of the jabber protection mechanism on a transceiver, or excessive noise on the cable.

Too Short Errors

A Too Short Error, also called a runt, indicates that a packet is fewer than 64 octets long (including FCS octets) but otherwise well formed.
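The two length limits can be expressed as a simple check. The sketch below classifies a frame by its length in octets (including the FCS); the optional `tag_octets` argument is a hypothetical way to allow for ports that add a 4-byte frame tag, as in the VLT case described in the Too Long Errors troubleshooting table.

```python
MIN_FRAME_OCTETS = 64     # shorter frames are runts (Too Short Errors)
MAX_FRAME_OCTETS = 1518   # longer frames are Too Long Errors


def classify_frame_length(octets: int, tag_octets: int = 0) -> str:
    """Return 'too_short', 'too_long', or 'ok' for a well-formed frame."""
    if octets < MIN_FRAME_OCTETS:
        return "too_short"
    if octets > MAX_FRAME_OCTETS + tag_octets:
        return "too_long"
    return "ok"
```

A 1522-octet frame is reported as too long by default but as acceptable when `tag_octets=4`, which mirrors why tagged maximum-size frames can produce misleading Too Long errors.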

Transmit Discards

Transmit Discards indicate that packets were not transmitted because of network congestion.

FDDI Ring Errors

Use these sections to identify and correct FDDI ring errors:

• FDDI Ring Errors Overview
• Identifying Ring Errors

Fiber Distributed Data Interface (FDDI) often corrects its own problems. However, because FDDI cannot correct all errors (especially those related to hardware problems), you should monitor FDDI errors.

Understanding the Problem

FDDI ring errors that you should monitor include:

• Elasticity Buffer Error Condition
• Frame Error Condition
• Frames Not Copied Condition
• Link Error Condition

Identifying the Problem

First, determine the type of FDDI ring errors and where they are occurring. Similar to the way you identify other FDDI problems, identify the upstream and downstream neighbors of the devices that you are monitoring.

Several types of network errors can cause FDDI performance problems. For example, problems with cables or physical connections may result in a link or frame error. Elasticity buffer (EB) errors can also lead to link and frame errors.

FDDI deals with port-related errors as follows:

• The variable PORTlerAlarm is the link error rate (LER) value at which a link connection generates an alarm. When the LER is greater than the alarm setting, Station Management (SMT) sends a Status Report Frame (SRF) to notify you that there is a problem with a port.

The PORTlerAlarm threshold is set lower than the PORTlerCutoff threshold so that you are notified of a problem before the port is actually removed from the ring.

• When link errors reach the threshold defined by the variable PORTlerCutoff, SMT breaks the connection, disabling the PHY that detected the problem. A Link Error Condition is also generated.

FDDI deals with MAC-related errors as follows:

• When MAC frame errors reach a certain threshold, a Frame Error Condition is generated. Because the actual error can be further upstream than the immediate connection, the connection remains intact.

• For a large network, the worst case MACFrameErrorRatio is less than 0.1 percent. However, during network configuration, frame error ratios can reach 50 percent for short periods. When you detect a sustained frame error ratio of more than 0.1 percent, a problem exists between the station that is reporting the condition and the nearest upstream MAC.
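The two-threshold behavior for ports and the "sustained" test for frame errors can be sketched as follows. This is an illustration only: it expresses link error rates as plain fractions, the default alarm and cutoff values are placeholders rather than settings taken from any device, the 0.1 percent limit is a reading of the figure in the text, and treating three consecutive samples as "sustained" is an assumption.

```python
def port_action(link_error_rate: float,
                ler_alarm: float = 1e-8,
                ler_cutoff: float = 1e-7) -> str:
    """Decide what SMT would do for a port at the given link error rate."""
    if link_error_rate >= ler_cutoff:
        return "disconnect"   # connection broken; Link Error Condition raised
    if link_error_rate > ler_alarm:
        return "alarm"        # Status Report Frame sent; port stays in the ring
    return "ok"


def frame_errors_sustained(ratios_percent: list,
                           limit_percent: float = 0.1,
                           samples: int = 3) -> bool:
    """True if the most recent samples all exceed the frame error limit."""
    recent = ratios_percent[-samples:]
    return len(recent) == samples and all(r > limit_percent for r in recent)
```

Requiring several consecutive samples keeps a brief spike during ring configuration from being reported as a fault.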

Solving the Problem

To solve problems related to FDDI errors, fix the hardware, cabling, or congestion problem.

Identifying Ring Errors

Use Status Watch to monitor your FDDI devices for Warning or Critical alerts.

Status Watch

Use Status Watch to identify FDDI ring errors.

Follow these steps:

1. Monitor the FDDI Status tool for the currently selected device.

2. Determine whether Status Watch is reporting Elasticity Buffer Errors or a high percentage of Frame Errors, Frames Not Copied, or Link Error Rates for the currently selected device.

FDDI Ring Errors Reference

This section provides additional conceptual and problem analysis detail.

Elasticity Buffer Error Condition

The Elasticity Buffer Error condition occurs when a port's elasticity buffer overflows or underflows. This condition usually indicates that a port's hardware is not operating within the tolerances that the FDDI standard specifies. Look for the problem in the hardware of either the port that is reporting the condition or of the immediately adjacent port.

Frame Error Condition

The Frame Error condition occurs when the percentage of frames that contain errors exceeds a preset threshold. In the situation when a device is an uplink to FDDI (that is, a device is transmitting onto FDDI), this type of condition indicates that the ring is saturated. The ring is out of buffer space, and packets are being dropped from the device's backbone port.

The problem indicated by the frame errors is usually located between the MAC that reports the condition and its upstream neighbor. Because many physical connections can lie along this path, the MACFrameErrorRatio variable identifies only the two MACs between which the problem is occurring.

Frames Not Copied Condition

The Frames Not Copied condition occurs when the percentage of frames that are dropped because of insufficient buffer space exceeds a preset threshold. This condition indicates that the station is congested and is unable to process frames as quickly as they arrive. To help eliminate congestion:

• Add more capacity to the station
• Reconfigure your network so that end stations that communicate heavily with one another are on the same bridge or switch
• Filter out certain traffic

Link Error Condition

The Link Error condition occurs when a port detects link errors at a rate that exceeds a preset threshold. When the Link Error threshold is exceeded, the station removes itself from the ring and tries to reinsert itself on the ring. This action creates a MAC Neighbor Change Event (which also occurs if a ring wraps).

Link errors may indicate an FDDI PHY hardware problem (such as a faulty transmitter) or a faulty cable or connector. Look for the problem in the portion of the network between the port that is reporting the condition and the first upstream transmitter.

MAC Neighbor Change Event

The MAC Neighbor Change event occurs when a MAC's upstream or downstream neighbor changes.

This event indicates either:

• A network reconfiguration
• Another station that is leaving or joining the ring

Network File Server Timeouts

Use these sections to identify and correct timeouts on network file servers:

• Network File Server Timeout Overview
• Looking for Obvious Errors
• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

See Network File Server Timeouts Reference for additional conceptual and problem analysis detail.

A network file server can time out if your network gets congested or if your server is having problems. Users might have problems downloading data from or to the server or copying files from or to the server. To help you to understand the troubleshooting process for this type of problem, an EXAMPLE throughout this section follows the symptoms, analysis, and resolution of a typical file server timeout problem.

Understanding the Problem

When users log in, their stations make network file server calls either to determine quotas (if this feature has been enabled) or to mount user home directories. The network file server timeout messages, even when spread across multiple nodes, indicate a problem either with the network or with a server.

EXAMPLE: UNIX users notice that it takes a long time - over 30 seconds in some cases - to log in to any machine. Some machines report network file server timeout messages, but the messages have no obvious pattern and are infrequent. You begin to get a sense of the problem.

Identifying the Problem

First, rule out the obvious causes. Ask these questions:

• Can you access the network file server with Telnet?
• Have any alarms been triggered?
• Are there any new errors?

The process of identifying the problem is developed in Looking for Obvious Errors.

Solving the Problem

To determine the cause, reproduce the fault while you monitor the network. After you know the cause, you can fix the problem.

The solutions to the network file server timeout are identified in these sections:

• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

Looking for Obvious Errors

To look for obvious errors, use these applications:

• Ping and Telnet - To check connectivity to the network file server nodes
• LANsentry Manager Alarms View - To search for triggered alarms
• LANsentry Manager Statistics View - To look for errors
• LANsentry Manager History View - To identify trends

Ping and Telnet

Determine whether you can contact network file server nodes using Ping and Telnet. If the response is extremely slow, then a problem may exist with the connections to the nodes. No delay indicates that the connections are normal, implying that the delay is occurring elsewhere. In this case, use LANsentry® Manager tools to determine whether packets are being lost or ignored.
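If you want to put a number on "extremely slow", a rough substitute for a manual Telnet attempt is to time how long a TCP connection to the server takes. The Python sketch below uses only the standard library; the function name is hypothetical, port 23 is the conventional Telnet port, and the 5-second timeout is an arbitrary choice.

```python
import socket
import time


def tcp_connect_seconds(host: str, port: int = 23, timeout: float = 5.0):
    """Return the seconds needed to open a TCP connection, or None on failure."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return None
```

A result near zero suggests the path to the server is healthy and the delay lies elsewhere; `None` or a value close to the timeout points back at connectivity.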

LANsentry Manager Alarms View

Using the LANsentry Manager Alarms View, you can determine if any configured alarms have been triggered.

Search the Alarms View to see if any MAC events have been logged.

EXAMPLE: MAC events have not been logged for the network on which the UNIX users are attached.

Even though no alarms have occurred, errors may exist. For example, a lower rate of background errors may exist just below the alarm threshold. Based on maximum and minimum values, RMON errors may miss constant, periodic, or low amounts of errors.

LANsentry Manager Statistics View

Using the LANsentry Manager Statistics View, you can display a multisegment graph of utilization and error statistics.

Follow these steps:

1. Set up a graph that shows utilization and errors on all your major segments.

2. Determine whether any segments are particularly busy or error prone.

EXAMPLE: You notice that one segment of the UNIX network, HUB3, is reporting Too Long Errors and FCS Errors roughly every second sample. While the amount is not higher than normal, it is currently higher than any other segment.

LANsentry Manager History View

Using the LANsentry Manager History View, you can display a rolling history table to determine if the errors that you are seeing are new. For example, if you have a history table that runs for 30-minute samples over two days, you can compare the most recent sample to a previous sample, looking for new errors. If your probe has the resources, use a much finer resolution sample stored for a shorter time (every 30 seconds for 2 hours) to more easily spot recent errors.

EXAMPLE: You see that the history table shows that no error rates remained constant throughout the day. However, errors that did occur were on the device HUB3 and were Too Long Errors and FCS Errors.
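The comparison of a recent sample against earlier ones can also be done on exported data. The sketch below assumes, purely for illustration, that each history sample is available as a dictionary of error counts, ordered oldest first; it is not tied to any LANsentry Manager export format.

```python
def new_error_types(history: list, recent: int = 1) -> list:
    """Error types seen in the latest samples but in none of the earlier ones."""
    baseline, latest = history[:-recent], history[-recent:]
    seen_before = {name for sample in baseline
                   for name, count in sample.items() if count > 0}
    seen_now = {name for sample in latest
                for name, count in sample.items() if count > 0}
    return sorted(seen_now - seen_before)
```

For instance, if earlier samples show zero counts and the latest shows FCS Errors and Too Long Errors, both are returned as new; errors that were already present earlier are not.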

Reproducing the Fault While Monitoring the Network

Although the RMON View in LANsentry Manager can show error rates and help you to identify the location of the problem, it may not provide enough data to solve the problem. To determine the cause of the problem, reproduce it while you monitor the network by using these applications:

• LANsentry Manager Top-N Graph - To locate a quiet node to use for reproducing the fault

• LANsentry Manager Packet Capture - To capture packets from the hub to which the quiet node is attached

• LANsentry Manager Packet Decode - To analyze the packets to assess network file server traffic and delays

• Address Tracker - To find the location of the problem nodes

EXAMPLE: Using LANsentry Manager, you find a hub on the network with a higher than normal error rate. However, the error rate does not seem high enough to cause login delays of 60 seconds or more.

LANsentry Manager Top-N Graph

Using the Top-N graph in the LANsentry Manager main window, locate a quiet node that has been showing the same problem. Choose a quiet node so that you do not receive excessive traffic when you try to isolate the problem.

EXAMPLE: You see that the node Monolith, which has the same Network File System (NFS) mounts as the other nodes on the network, is quiet. You decide to use this node for reproducing the fault. See Network File System (NFS) Protocol for more information about NFS.

LANsentry Manager Packet Capture

Using the LANsentry Manager Packet Capture application, capture packets from the network using predefined patterns and start-and-stop conditions.

Follow these steps:

1. Set up a capture buffer on a probe that is connected to the same hub as the quiet node. Until you know more about the problem, set a very general filter.

EXAMPLE: You select a MAC-layer filter and set a conversation filter to capture all packets to and from Monolith.

2. Telnet into and log out from the quiet node. Then reset the capture buffer. Repeat this procedure until you see the problem reflected in your captured data. To keep the buffer information clean, reset the buffer each time that you repeat the procedure.

3. When you see the delay, note the rough value of the packet count on the LANsentry Manager packet buffer.

By noting the packet count at which you think the delay has occurred, you can narrow the problem to within about 20 packets in the buffer. If you have used an extremely quiet node, you may even identify the exact packet.

LANsentry Manager Packet Decode

The LANsentry Manager Packet Decode application decodes all major protocols and displays the packet contents at three levels of detail: summary information, header information, and actual packet content.

Follow these steps:

1. Open the buffer in the Packet Decode application and locate the number of the packet at which the delay occurred.

2. Select the packet and launch a MAC-layer conversation filter. In the filter display, look for a gap in the conversation (that is, where the node sent a request and then resent it at approximately the same rate as the delay you experienced when recreating the problem).

3. Repeat the test to determine if the result concentrates on one node or if it appears on other nodes.

EXAMPLE: On the quiet node that you selected, the delay is obvious. You see an NFS request going out to a node and a repeat of the request 30 seconds later. During that time, the node did not respond. You now know that the delay occurred because nodes were not seeing responses for NFS requests. When you repeat the test on other nodes, you find that the delay is happening with more than one destination node.
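Looking for a gap in a conversation amounts to finding a request that is sent again after a long silence. The following Python sketch shows that idea on a list of captured packets. It is an illustration only: the packet representation (a timestamp in seconds plus a request identifier), the sample capture, and the function name find_retransmit_gaps are assumptions, and nothing here reflects the Packet Decode application's own data format.

```python
def find_retransmit_gaps(packets, min_gap):
    """packets: list of (timestamp_seconds, request_id) in capture order.
    Returns (request_id, first_time, resend_time) for every request that
    was sent again at least `min_gap` seconds after its previous send."""
    last_seen = {}
    gaps = []
    for timestamp, request_id in packets:
        previous = last_seen.get(request_id)
        if previous is not None and timestamp - previous >= min_gap:
            gaps.append((request_id, previous, timestamp))
        last_seen[request_id] = timestamp
    return gaps


# Hypothetical capture: one request is repeated about 30 seconds later.
capture = [(0.0, "lookup-1"), (0.2, "read-7"), (0.3, "read-8"), (30.2, "read-7")]

print(find_retransmit_gaps(capture, min_gap=25))
```

A request that reappears after roughly the delay you experienced is a strong hint that its first copy never received a reply.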


Address Tracker

Use Address Tracker, which polls managed devices, to determine the hubs to which the problem nodes are attached. If the problem end stations are located on unmanaged devices, then you can at least narrow the problem to those unmanaged devices.

EXAMPLE: Although your network does not have managed hubs that Transcend® NCS management software can poll, it does have managed switches. When it polls the switches, Address Tracker displays the switch ports on which addresses were last seen. This information indicates the hub (but not the hub port) on which the device is located.

LANsentry Manager Packet Decode

After you know the location of the hub that has the problem node, monitor the problem from the hub using LANsentry Manager Packet Decode.

Follow these steps:

1. To capture packets from one of the nodes on the hub, set up another capture buffer and repeat the exercise that is described in LANsentry Manager Packet Capture. Because a delay may occur on a different node, use two capture buffers without stopping the first one.

Note the rough packet count where the delay appears.

2. Display a conversation filter of the packet where the delay appears and look for the gap in the conversation.

EXAMPLE: You hope that the nodes are on the same hub. You find that all the nodes are on HUB3. This result indicates that FCS Errors may be causing the timeouts. However, because the errors occur at a low rate, you decide to verify this diagnosis. You monitor the problem from the hub, logging in and out many times, and the delay eventually occurs. This time, the delay shows that the node's reply had an FCS Error, even though the node received the request. The switch would not have transmitted this packet, causing a timeout on the NFS protocol. The retry time is presumably 30 seconds. During this test, you see the problem occurring on another node.

Correcting the Fault

Without a managed hub, you may find it very difficult to discover network file server timeout errors. To find the problematic node, you must either systematically isolate nodes by monitoring each node for a prolonged period or temporarily insert a managed hub.

EXAMPLE: You notice that the captured error packet failed FCS because it was corrupted by a regular pattern during transmission. A possible reason for this occurrence is a jabbering node. This explanation makes sense because FCS/Jabber frames increased linearly when you were monitoring the live network.

Network File Server Timeouts Reference

This section explains terms that are relevant to network file server timeouts and provides additional conceptual and problem analysis detail.

Jabbering

A condition in which a node transmits illegal-length packets and is possibly not operating within carrier specifications. In effect, another node has written bad data over a valid packet. This bad data is often interpreted as a repeated sequence of data.

Network File System (NFS) Protocol

A distributed file system protocol developed by Sun Microsystems that allows a computer system to access files over a network as if they were on its local disks. This protocol has been incorporated into products by more than 200 companies. It is now a de facto Internet standard.

NFS is one protocol in the NFS suite of protocols, which includes NFS, RPC, XDR (External Data Representation), and others. These protocols are part of a larger architecture that Sun Microsystems refers to as Open Network Computing (ONC). ONC is a distributed applications architecture designed by Sun and currently controlled by a consortium led by Sun.

Troubleshooting TCP/IP - Detailed Steps

This article shows how to troubleshoot TCP/IP connectivity between computers on a Windows network. If you haven't already done so, disable XP's Internet Connection Firewall on all local area network connections, and remove all firewall programs on the network. Improperly configured firewalls are the most common cause of TCP/IP problems.


Open a Command Prompt Window

For many of these steps, you'll be typing at the command prompt. To open a command prompt window in Windows 2000 or XP, click Start | Run, type cmd in the box, and click OK. To open a command prompt window in Windows 95, 98, or Me, click Start | Run, type command in the box, and click OK. Type one command per line, and press Enter after each one to execute it. To close the command prompt window, use the exit command.

Determine the TCP/IP Settings

Determine the TCP/IP settings of each computer on the local area network. In XP, open the Network Connections folder, right click the LAN connection, and click Status | Support | Details. For example, here are the Status and Details views for the LAN connection on an Internet Connection Sharing host.

In Windows 95/98/Me, click Start | Run, type winipcfg in the box, and click OK. Select the LAN adapter from the menu, and click More Info. Here's the winipcfg view for an ICS client running Windows Me.

You can also see the TCP/IP settings from the command prompt. This is especially convenient if a computer has more than one network adapter. Use the ipconfig /all command, which is available in all versions except Windows 95. The output from this command can be long, so it's best to write it to a file. Specify the file name in the command this way:

ipconfig /all >ipconfig.txt

Description of TCP/IP Settings

Here are the TCP/IP settings that are used in network troubleshooting:

• IP Address - Unique address assigned to a network adapter. A computer with multiple network adapters has an IP address for each one, and each one must be in a different subnet.

• Subnet Mask - Used in conjunction with the IP address to determine which subnet an adapter belongs to. At the simplest level, communication is only possible between two network adapters when they're in the same subnet.

• Default Gateway - IP address of a computer or router, on one of this computer's local area networks, that knows how to communicate with subnets not present on this computer. For an Internet connection, the default gateway is a router belonging to your Internet service provider, and all access to sites on the Internet goes through it. For an ICS client, the default gateway is the ICS host. If you use a hardware router, it serves as the default gateway.

• DHCP Server - If an adapter is configured to obtain an IP address automatically, this is the address of the server that provides it. It could be your ISP, an ICS host, or a hardware router.

• DNS Servers - IP address of one or more Domain Name Server computers. DNS servers translate Internet names to their IP addresses (like 63.146.109.227).
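How the subnet mask combines with the IP address can be shown with Python's standard ipaddress module. This is a minimal sketch: the function name same_subnet and the sample addresses are chosen for illustration only.

```python
import ipaddress


def same_subnet(ip_a, ip_b, mask):
    """True when both addresses fall in the same subnet under `mask`."""
    net_a = ipaddress.ip_interface(f"{ip_a}/{mask}").network
    net_b = ipaddress.ip_interface(f"{ip_b}/{mask}").network
    return net_a == net_b


# Illustrative addresses; substitute the values from your own adapters.
print(same_subnet("192.168.1.101", "192.168.1.123", "255.255.255.0"))
print(same_subnet("192.168.1.101", "192.168.2.5", "255.255.255.0"))
```

With a mask of 255.255.255.0, the first three numbers of each address identify the subnet, so the first pair matches and the second does not.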

Subnets

See our article on subnets for a brief description of how they work. For more details, see this Microsoft Knowledge Base article.

If two computers are supposed to be on the same subnet but aren't, something is wrong with the network hardware or software configuration. This is most likely to happen when one of them receives an IP address of 169.254.x.x, which indicates that:

• It's configured to obtain an IP address automatically
• It couldn't find a DHCP server on the network to make the assignment
• Windows assigned it an Automatic Private IP Address

See our article on Specific Networking Problems and Their Solutions for more information.
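A quick way to spot an Automatic Private IP Address in a list of settings is to test whether the address falls in the 169.254.0.0/16 range. The Python sketch below does this with the standard ipaddress module; the function name is an assumption made for this example.

```python
import ipaddress

# The 169.254.x.x range that Windows uses for automatic private addresses.
APIPA_RANGE = ipaddress.ip_network("169.254.0.0/16")


def is_automatic_private_address(ip):
    """True when `ip` is a 169.254.x.x address, which suggests that no
    DHCP server answered the adapter's request."""
    return ipaddress.ip_address(ip) in APIPA_RANGE


print(is_automatic_private_address("169.254.17.3"))
print(is_automatic_private_address("192.168.1.101"))
```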

Pinging

The ping command is the basic tool for testing TCP/IP connectivity. It sends a special packet (called ICMP Echo) to a particular IP address and looks for a reply. If everything is working right, the reply comes back. If not, the ping times out in a few seconds. By default, the ping command repeats the process four times. Here's an example of an ICS client computer pinging a Windows XP Home Edition ICS host, using the host's IP address and its computer name.

When ping fails, you'll see one of these error messages:

• Request timed out - The IP address is valid, but there's no reply from it. If the IP address is on a local area network, the most likely cause is a firewall program blocking the ping.

• Unknown host <name> or Ping request could not find host <name> - The computer name doesn't exist on the local area network. Make sure that NetBIOS over TCP/IP is enabled.

• Destination host unreachable - The IP address isn't on a local area network, and the default gateway can't access it. Either there's no default gateway, its address is wrong, or it isn't functioning.
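The three failure messages above can be mapped to their likely causes mechanically. The following Python sketch does so for text copied from a ping window. It is an illustration only: the function name and the wording of the returned hints are assumptions, and it matches just the messages listed above.

```python
def diagnose_ping_output(output):
    """Map the text printed by ping to the likely cause described above."""
    text = output.lower()
    if "request timed out" in text:
        return "No reply: on a LAN, a firewall blocking the ping is the most likely cause"
    if "unknown host" in text or "could not find host" in text:
        return "Name not found: make sure that NetBIOS over TCP/IP is enabled"
    if "destination host unreachable" in text:
        return "Unreachable: the default gateway is missing, wrong, or not functioning"
    return "No known failure message found"


print(diagnose_ping_output("Request timed out."))
print(diagnose_ping_output("Ping request could not find host win98."))
```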


Pinging the Local Area Network

Here is a series of ping commands to use in finding where a problem occurs on a local area network. Run them in the order shown, and don't go on to the next command until all of the previous commands work properly. In this example:

• The computer being tested is named Winxp, with IP address 192.168.1.101
• There's another computer on the network named Win98, with IP address 192.168.1.123

Substitute the appropriate IP addresses and computer names for your network.

Command | Target | What Ping Failure Indicates
ping 127.0.0.1 | Loopback address | Corrupted TCP/IP installation
ping localhost | Loopback name | Corrupted TCP/IP installation
ping 192.168.1.101 | This computer's IP address | Corrupted TCP/IP installation
ping winxp | This computer's name | Corrupted TCP/IP installation
ping 192.168.1.123 | Another computer's IP address | Bad hardware or NIC driver
ping win98 | Another computer's name | NetBIOS name resolution failure
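The rule of running the commands in order and stopping at the first failure can be sketched as a short Python routine. This is a hedged illustration: the function name first_failing_step is made up, the addresses and names are example values to be replaced with your own, and the ping argument is an assumed callable that returns True when the target replies (here a stand-in is used instead of a real ping).

```python
# Example targets in the order they should be tested, each paired with
# what a failure at that step indicates. Substitute your own values.
LAN_STEPS = [
    ("127.0.0.1", "Corrupted TCP/IP installation"),
    ("localhost", "Corrupted TCP/IP installation"),
    ("192.168.1.101", "Corrupted TCP/IP installation"),
    ("winxp", "Corrupted TCP/IP installation"),
    ("192.168.1.123", "Bad hardware or NIC driver"),
    ("win98", "NetBIOS name resolution failure"),
]


def first_failing_step(steps, ping):
    """Run `ping(target)` for each step in order and return the first
    (target, meaning) that fails, or None when every step succeeds."""
    for target, failure_meaning in steps:
        if not ping(target):
            return target, failure_meaning
    return None


# Stand-in for a real ping: only this computer's own targets reply.
reachable = {"127.0.0.1", "localhost", "192.168.1.101", "winxp"}
print(first_failing_step(LAN_STEPS, lambda target: target in reachable))
```

Stopping at the first failure matters because every later step depends on the earlier ones working.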

To fix a corrupted TCP/IP installation on Windows XP, follow the steps in this Microsoft Knowledge Base article. For Windows 95/98/Me, uninstall the TCP/IP protocol in Control Panel | Network, reboot, and reinstall it. If that doesn't fix it, use this procedure on Windows 95 or 98.

Pinging the Internet

You can also use ping to find a problem with Internet access. Run these commands in the order shown, and don't go on to the next command until all of the previous commands work properly. Use the Default Gateway and DNS Server addresses that you got from the winipcfg or ipconfig /all command.

Command | Target | What Ping Failure Indicates
ping w.x.y.z | Default Gateway | Default Gateway down
ping w.x.y.z | DNS Server | DNS Server down
ping w.x.y.z | Web site IP address | Internet service provider or web site down
ping www.something.com | Web site name | DNS Server down or web site down

REFERENCES

WWW.PRACTICALLYNETWORK.COM

WWW.COMPUTERHOPE.COM

WWW.WIKIPEDIA.ORG


THANK YOU


Page 41: 11591337-Basic-Network-Troubleshooting

The process of identifying the problem is discussed in Searching for Packet Loss.

Solving the Problem

If you notice that packet loss data is consistently high, then your network is too congested. In this case, segment your network with the appropriate network device (such as a switch or router). If Collision data shows increases but your network's utilization is the same, then your network may have a physical problem, such as cabling that is too long. Other problems that packet loss data can indicate include:

• Faulty connectors or improper cabling
• Excessive numbers of repeaters between network devices
• Defective Ethernet transceivers or controllers

Possible solutions to these problems are explained in the procedures in Searching for Packet Loss.
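The two rules of thumb in this section can be written as a small decision function. The Python sketch below is an illustration only: the function name, the three yes/no inputs, and the wording of the results are assumptions, and how you judge "consistently high" or "rising" depends on your own baseline.

```python
def interpret_packet_loss(loss_consistently_high, collisions_rising, utilization_rising):
    """Apply the two rules of thumb from this section to yes/no observations."""
    if loss_consistently_high:
        return "Network too congested: segment it with a switch or router"
    if collisions_rising and not utilization_rising:
        return "Possible physical problem, such as cabling that is too long"
    return "No conclusion from these indicators alone"


print(interpret_packet_loss(True, False, False))
print(interpret_packet_loss(False, True, False))
```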

Searching for Packet Loss

When you look for packet loss, use the following applications:

• Status Watch - For Ethernet and MIB-II data collection using SNMP polling

• LANsentry Manager Network Statistics Graph - For RMON data collection using an RMON probe

• Device View - On a per-device basis, you can evaluate statistics for any port on the device

Status Watch

Status Watch monitors:

• Alignment Errors
• Excessive Collisions
• FCS Errors
• Receive Discards
• Transmit Discards

Follow these steps:

1. Determine if the thresholds for the Alignment Errors tool and FCS Errors tool are being exceeded.

Table 16 identifies the problems that this data can indicate and your possible actions. For information about problems related to a nonstandard Ethernet implementation, see Nonstandard Ethernet Problems.

Table 16: Alignment Errors, FCS Errors, and CRC Errors Data

Possible Problem: Faulty cabling
Possible Action: Examine the cable and cable connections for breaks or damage.

Possible Problem: Network noise
Possible Action: Look for improper cabling, faulty cable, faulty network equipment, or cables that are too close to equipment that emits electromagnetic interference (lamps, for example).

Possible Problem: Faulty transceiver
Possible Action: Use an analyzer to identify the problematic transceiver. If necessary, replace the transceiver, network adapter, or station.

Possible Problem: Fault at the transmitting end station
Possible Action:
1. Locate the source of the errors by looking at the module and port statistics.
2. Verify the correct operation of the transceiver or adapter card of the device that is connected to the problem port.
3. If the card appears to be operating correctly, examine the cable and cable connections for breaks or damage.

Possible Problem: Station powering up or down
Possible Action: None required. Early implementations of Ethernet transceivers generate a significant amount of in-band noise when powering up; they frequently cause Alignment Errors and FCS Errors in an otherwise stable network. When powering up, some software drivers for Ethernet controllers also initiate Time Domain Reflectometry (TDR) tests to test the Ethernet media. Network monitors report TDR tests as Alignment Errors and FCS Errors.

Possible Problem: Faulty adapter
Possible Action: Replace the adapter.

2. Determine if the Excessive Collisions tool threshold is being exceeded. Table 17 identifies the problems that this data can indicate and your possible actions.

Table 17: Collisions and Excessive Collisions Data

Possible Problem: Busy network
Possible Action: Use a bridge, router, or switch to reconfigure your network into segments with fewer stations.

Possible Problem: Faulty device (adapter, switch, hub, and the like) that does not listen before broadcasting. This problem increases the incidence of all types of collisions.
Possible Action: Isolate each adapter to see if the problem stops.

Possible Problem: Network loop
Possible Action: Ensure that no redundant connections to the same station have both connections active simultaneously.

3. Determine if the Receive Discards and Transmit Discards tools' thresholds are being exceeded.

If these errors are high in conjunction with the data that you learned in steps 1 and 2, then your network is overloaded. Segment your network.

LANsentry Manager Network Statistics Graph

Use the LANsentry Manager Network Statistics graph to view data for:

• Collisions
• Late Collisions
• Bandwidth Utilization
• CRC Errors
• Too Long Errors
• Too Short Errors

Follow these steps:

1. Display a Network Statistics graph for the local Ethernet segment on which users have reported poor performance.

This graph shows the most recent trend in Collision rates. If you have set up a History sample, you can also look at the historical trend. If a number of segments are connected by repeaters, examine the graph for each Ethernet segment.

2. Analyze Utilization and Collision rates to determine whether collisions are caused by an overloaded segment or a faulty component.

• If Utilization rates are high - The collisions are probably caused by an overloaded segment. If you have added nodes or new applications to your network, consider reconfiguring the cabling system, using bridges and routers to filter out remote collisions and to keep local traffic on one segment. This action should level the network load.

• If Utilization rates are stable and appear normal - The collisions are probably caused by faulty components. In this case, do the following:

    • If the network consists of repeaters - Compare the Network Statistics graphs for each segment connected to the repeater. Because repeaters repeat traffic across all connected segments (which makes many segments seem like one network), you should see similar levels of traffic on all segments. One segment that shows dissimilar levels of traffic and collisions may indicate faulty hardware. In this case, monitor several collisions to track the source station that is transmitting too soon after collisions, and repair the station. Packets that are transmitted too soon after collisions are unlikely to be valid. See Table 17 for more information about Collisions.

    • On other networks - Determine the segment cable length.

3. Examine the CRC Errors and Late Collisions, which often indicate cabling or component problems.

Table 16 identifies the problems that CRC Errors can indicate and your possible actions. Table 18 identifies the problems that Late Collisions data can indicate and your possible actions.

Table 18: Late Collisions Data

Possible Problem: Cabling problems
• Segment too long
• Failing cable
• Segment not grounded properly (noise)
• Improper termination
• Taps too close (10BASE-5 and 10BASE-2 only)
• Noisy cable
Possible Action: Correct the cabling problem by doing one or more of the following:
• Reduce the segment length
• Replace the cable
• Ground the cable
• Terminate the cable correctly
• Check the taps
• Check for cables too close to equipment that emits electromagnetic interference

Possible Problem: Component problems
• Deaf or partially deaf node
• Failing repeater, transceiver, or controller cards
Possible Action: Correct the component problem by doing one of the following:
• Trace the failing component and replace it
• Replace the NIC or the transceiver

4. Trace Too Short Errors and Too Long Errors to the sender.

These errors often indicate faulty routers or LAN drivers and transceiver problems. Table 19 identifies the problems that this data can indicate and your possible actions.

Possible Problem: A transceiver on your network is adding bits to the packets that are transmitted by the attached station.
Possible Action:
1. Use a network analyzer to identify the problematic transceiver.
2. If necessary, replace the transceiver, network adapter, or station.

Possible Problem: The jabber protection mechanism on a transceiver has failed; it can no longer protect the network from the jabbering produced by the attached station.
Possible Action: Replace the network card.

Possible Problem: Excessive noise on the cable. Note: Some 10/100 Mbps cards that autodetect the network speed may connect to the network at the wrong speed, causing excessive noise.
Possible Action: Check for improper cabling, faulty cable, faulty network equipment, or cables too close to noisy electronic equipment (lamps, for example). If your network card autodetects the network speed and you have ruled out other problems, manually configure the network speed.

Possible Problem: Faulty routers (two different network types are connected, and the router is not enforcing proper frame size restrictions)
Possible Action: Notify the manufacturer.

Possible Problem: Faulty LAN driver
Possible Action: Replace the driver.

Possible Problem: A normal condition on a LinkSwitch® 1000, LinkSwitch® 3000, or CoreBuilder 5000 FastModule. If you use maximum-sized 1518-byte Ethernet frames, the device's VLT-enabled ports add a frame tag of 4 bytes, resulting in a misleading Too Long Frame error. These frames are passed successfully but will create the Too Long Frame error message.
Possible Action: If you want to eliminate the error message, reduce your Ethernet packet frames by 4 bytes.

Device View

Device View allows you to display a variety of port and device-level statistics relevant to Ethernet packet loss. Table 20 describes these statistics and their use in troubleshooting.

Table 20: Activity and Error Statistics in Device View

Statistics Group: Activity
Description: Displays the total network activity and errors on the selected port.
Use in Troubleshooting: This data shows readable packets, broadcast packets, Collisions, total errors, and runts, which cause Too Short Errors. You can interpret this data in the following ways:
• The presence of runts can often be caused by Collisions; however, if the values increase at specific times of the day, it may indicate you need to change the network topology to manage the traffic more efficiently (for example, with switches or routers).
• Runts can also be caused by a badly terminated coax cable.
• Large numbers of runts not associated with high levels of collisions can indicate a transmission problem (examine the cable).
• Particularly high numbers of Collisions compared to the total number of readable packets can point to a hardware problem (a bad adapter) or to a data loop.
• A high proportion of Broadcast Packets (>10%) on a heavily utilized network (>50% of available bandwidth) can point to an incorrectly configured bridge or router on the network.

Statistics Group: Errors
Description: Displays the number of frames with errors on the selected port.
Use in Troubleshooting: The significance of errors depends on accompanying errors and prevailing network conditions. See the following error data for more information:
• Alignment Errors: Table 16
• FCS Errors: Table 16
• Too Long Errors: Table 19
• Too Short Errors or runts: Table 19
• Late Collisions: Table 18

To display Activity and Errors statistics for a device or port, follow these steps:

1. Select the required port or device.

2. From the shortcut menu, select Activity or Errors.

The statistics available depend on the type of port or device selected. See Table 20 for troubleshooting information.

Ethernet Packet Loss Reference

This section explains terms that are relevant to Ethernet packet loss and provides additional conceptual and problem analysis detail.

Alignment Errors

An Alignment Error indicates a received frame in which both of the following are true:

• The number of bits received is an uneven byte count (that is, not an integral multiple of 8)

• The frame has a Frame Check Sequence (FCS) error

Alignment Errors often result from MAC layer packet formation problems, cabling problems that cause corrupted or lost data, and packets that pass through more than two cascaded multiport transceivers. See FCS Errors for more information about interpreting Alignment Errors.

63 46

Collisions

Collisions indicate that two devices detect that the network is idle and try to send packets at exactly the same time (within one round-trip delay). Because only one device can transmit at a time, both devices must stop sending and attempt to retransmit. Collisions are detected by the transmitting stations.

The retransmission algorithm helps to ensure that the packets are not retransmitted at the same time. However, if the two devices retry at nearly the same time, packets can collide again. The process repeats until either the packets finally pass onto the network without collisions, or 16 consecutive collisions occur and the packets are discarded.
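The retry behavior described above can be sketched as a truncated binary exponential backoff. This is a minimal illustration rather than anything taken from the source: the function names are hypothetical, and capping the exponent at 10 is an assumption based on the usual IEEE 802.3 scheme. Only the limit of 16 consecutive collisions comes from the text.

```python
import random

MAX_ATTEMPTS = 16  # from the text: after 16 consecutive collisions the packet is discarded
BACKOFF_CAP = 10   # assumption: the exponent stops growing after 10 collisions


def max_backoff_slots(collision_count):
    """Largest number of slot times a station may wait after this collision."""
    exponent = min(collision_count, BACKOFF_CAP)
    return 2 ** exponent - 1


def pick_backoff(collision_count, rng=random):
    """Return a random wait in slot times, or None when the packet must be dropped."""
    if collision_count >= MAX_ATTEMPTS:
        return None  # excessive collisions: give up on this packet
    return rng.randint(0, max_backoff_slots(collision_count))
```

Because each station draws its wait at random from a range that doubles after every collision, two stations that collided once are progressively less likely to pick the same retry slot again.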

CRC Errors

A Cyclic Redundancy Check (CRC) Error is an RMON statistic that combines FCS Errors and Alignment Errors. These errors indicate that packets were received with:

• A bad FCS and an integral number of octets (FCS Errors)
• A bad FCS and a non-integral number of octets (Alignment Errors)

CRC Errors can cause an end station to freeze. If a large number of CRC Errors are attributed to a single station on the network, replace the station's network interface board. Typically, a CRC Error rate of more than 1 percent of network traffic is considered excessive.
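As a rough illustration of the 1 percent guideline, the CRC Error rate can be computed from the two counters that the statistic combines. The function names and the sample counter values are hypothetical; how a given probe exposes these counters is not covered by the text.

```python
def crc_error_rate(fcs_errors, alignment_errors, total_packets):
    """CRC Errors (FCS Errors plus Alignment Errors) as a fraction of all packets."""
    if total_packets == 0:
        return 0.0
    return (fcs_errors + alignment_errors) / total_packets


def is_excessive(rate, threshold=0.01):
    """True when the rate is above the 1 percent guideline from the text."""
    return rate > threshold


# Example with made-up counters: 8 FCS Errors and 4 Alignment Errors in 1000 packets.
rate = crc_error_rate(8, 4, 1000)
print(f"{rate:.1%} excessive={is_excessive(rate)}")
```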

Excessive Collisions

Excessive Collisions indicate that 16 consecutive collisions have occurred, usually a sign that the network is becoming congested. For each excessive collision count (that is, after 16 consecutive collisions), a packet is dropped. If you know the normal rate of excessive collisions, then you can determine when the rate of packet loss is affecting your network's performance. See Knowing Your Network's Configuration for more information.

FCS Errors

Frame Check Sequence (FCS) Errors, a type of CRC Error, indicate that frames received by an interface are an integral number of octets long but do not pass the FCS check. The FCS is a mathematical way to ensure that all of the frame's bits are correct without having the system examine each bit and compare it to the original. Packets with Alignment Errors also generate FCS Errors.

Both Alignment Errors and FCS Errors can be caused by equipment powering up or down, or by interference (noise) on unshielded twisted-pair (10BASE-T) segments. In a network that complies with the Ethernet standard, FCS or Alignment Errors indicate bit errors during a transmission or reception. A very low rate is acceptable. Although Ethernet allows a 1 in 10^8 bit error rate, typical Ethernet performance is 1 in 10^12 or better.

Late Collisions

Late Collisions indicate that two devices have transmitted at the same time, but cabling errors (most commonly, excessive network segment length or repeaters between devices) prevent either transmitting device from detecting a collision. Neither device detects a collision because the time to propagate the signal from one end of the network to the other is longer than the time to put the entire packet on the network. As a result, neither of the devices that cause the late collision senses the other's transmission until the entire packet is on the network.

Although late collisions occur for small packets, the transmitter cannot detect them. As a result, a network suffering measurable Late Collisions for large packets is losing small packets as well.

Nonstandard Ethernet Problems

Table 21 lists the symptoms that typically occur if a system violates the Ethernet standard.

Table 21. Symptoms of Common Ethernet Network Problems

Symptoms: FCS Errors and Alignment Errors increase significantly.
Problem: Network cabling is too long.
Notes: If you use a promiscuous network monitor, the number of Late Collisions reported by stations should correlate with the FCS and Alignment Errors reported by the monitor.

Symptoms: FCS and Alignment Errors increase proportionally with interference (sometimes referred to as noise hits).
Problem: Network segment is noisy.
Notes: Typically observed on a 10BASE-T network segment in a noisy environment. If you use multiple promiscuous monitors, the FCS and Alignment Errors among the monitors will not correlate. If the monitor can track runts, also called Too Short Errors, the number of runt packets should be significantly higher than normal.

Symptoms: FCS and Alignment Errors are much higher than normal.
Problem: Networks do not conform to the access scheme of Carrier Sense Multiple Access with Collision Detect (CSMA/CD).
Notes: Occurs when some implementations of Ethernet in the segment are not entirely compatible with IEEE 802.3 repeaters. Collision fragments linger on the network long enough to collide with retry packets at the minimum interpacket gap (IPG). The IPG is smaller on one side of the repeated network, causing a lost packet. Ethernet controllers cannot receive packets that are separated by 4.7 µs or less. Some controllers cannot sustain receptions of packets separated by as much as 9.6 µs. If runt packets are received one after another and are followed by a collision fragment, Ethernet controllers that cannot sustain reception will lose packets.

Receive Discards

Receive Discards indicate that received packets could not be delivered to a higher-layer protocol because of congestion or packet errors.

Too Long Errors

A Too Long Error indicates that a packet is longer than 1518 octets (including FCS octets) but otherwise well formed. Too Long Errors are often caused by a bad transceiver, a malfunction of the jabber protection mechanism on a transceiver, or excessive noise on the cable.

Too Short Errors

A Too Short Error, also called a runt, indicates that a packet is fewer than 64 octets long (including FCS octets) but otherwise well formed.
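The size and FCS definitions in this reference can be combined into one small classifier. This is only a sketch of the definitions as stated here: the function name and return strings are hypothetical, and a real probe applies these checks in hardware.

```python
MIN_FRAME_OCTETS = 64    # shorter frames are Too Short Errors (runts)
MAX_FRAME_OCTETS = 1518  # longer frames are Too Long Errors


def classify_frame(bit_count, fcs_ok):
    """Classify a received frame from its length in bits and its FCS result."""
    whole_octets = bit_count % 8 == 0
    octets = bit_count // 8
    if not fcs_ok:
        # A bad FCS with a whole number of octets is an FCS Error;
        # a bad FCS with leftover bits is an Alignment Error.
        return "FCS Error" if whole_octets else "Alignment Error"
    # Size errors apply to frames that are otherwise well formed.
    if octets > MAX_FRAME_OCTETS:
        return "Too Long Error"
    if octets < MIN_FRAME_OCTETS:
        return "Too Short Error (runt)"
    return "OK"
```

For example, classify_frame(63 * 8, True) reports a runt, while classify_frame(100 * 8 + 3, False) reports an Alignment Error.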

Transmit Discards

Transmit Discards indicate that packets were not transmitted because of network congestion.

FDDI Ring Errors

Use these sections to identify and correct FDDI ring errors:

• FDDI Ring Errors Overview
• Identifying Ring Errors

Fiber Distributed Data Interface (FDDI) often corrects its own problems. However, because FDDI cannot correct all errors (especially those related to hardware problems), you should monitor FDDI errors.

Understanding the Problem

FDDI ring errors that you should monitor include:

• Elasticity Buffer Error Condition
• Frame Error Condition
• Frames Not Copied Condition
• Link Error Condition

Identifying the Problem

First, determine the type of FDDI ring errors and where they are occurring. As you do when identifying other FDDI problems, identify the upstream and downstream neighbors of the devices that you are monitoring.

Several types of network errors can cause FDDI performance problems. For example, problems with cables or physical connections may result in a link or frame error. Elasticity buffer (EB) errors can also lead to link and frame errors.

FDDI deals with port-related errors as follows:

• The variable PORTlerAlarm is the link error rate (LER) value at which a link connection generates an alarm. When the LER is greater than the alarm setting, Station Management (SMT) sends a Status Report Frame (SRF) to notify you that there is a problem with a port.

The PORTlerAlarm threshold is set lower than the PORTlerCutoff threshold so that you are notified of a problem before the port is actually removed from the ring.

• When link errors reach the threshold defined by the variable PORTlerCutoff, SMT breaks the connection, disabling the PHY that detected the problem. A Link Error Condition is also generated.

FDDI deals with MAC-related errors as follows:

• When MAC frame errors reach a certain threshold, a Frame Error Condition is generated. Because the actual error can be further upstream than the immediate connection, the connection remains intact.

• For a large network, the worst-case MACFrameErrorRatio is less than 0.1 percent. However, during network configuration, frame error ratios can reach 50 percent for short periods. When you detect a sustained frame error ratio of more than 0.1 percent, a problem exists between the station that is reporting the condition and the nearest upstream MAC.
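The distinction between a short spike during network configuration and a sustained frame error ratio can be sketched as follows. The function name is hypothetical, the threshold is read as 0.1 percent, and treating three consecutive samples as "sustained" is purely an assumption for illustration, since the text does not define the term.

```python
SUSTAINED_THRESHOLD = 0.001  # 0.1 percent, expressed as a fraction


def sustained_frame_errors(ratios, min_samples=3, threshold=SUSTAINED_THRESHOLD):
    """True when the most recent min_samples ratios all exceed the threshold.

    ratios is a list of frame error ratios (fractions), oldest first.
    min_samples=3 is an assumed definition of "sustained".
    """
    if len(ratios) < min_samples:
        return False
    return all(ratio > threshold for ratio in ratios[-min_samples:])


# A brief spike to 50 percent during configuration is not flagged:
print(sustained_frame_errors([0.5, 0.0005, 0.0004, 0.0002]))
# Several consecutive samples above the threshold are flagged:
print(sustained_frame_errors([0.0001, 0.002, 0.003, 0.004]))
```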

Solving the Problem

To solve problems related to FDDI errors, fix the hardware, cabling, or congestion problem.

Identifying Ring Errors

Use Status Watch to monitor your FDDI devices for Warning or Critical alerts.

Status Watch

Use Status Watch to identify FDDI ring errors.

Follow these steps:

1. Monitor the FDDI Status tool for the currently selected device.

2. Determine whether Status Watch is reporting Elasticity Buffer Errors or a high percentage of Frame Errors, Frames Not Copied, or Link Error Rates for the currently selected device.

FDDI Ring Errors Reference

This section provides additional conceptual and problem analysis detail.

Elasticity Buffer Error Condition

The Elasticity Buffer Error condition occurs when a port's elasticity buffer overflows or underflows. This condition usually indicates that a port's hardware is not operating within the tolerances that the FDDI standard specifies. Look for the problem in the hardware of either the port that is reporting the condition or the immediately adjacent port.

Frame Error Condition

The Frame Error condition occurs when the percentage of frames that contain errors exceeds a preset threshold. When a device is an uplink to FDDI (that is, a device is transmitting onto FDDI), this type of condition indicates that the ring is saturated. The ring is out of buffer space, and packets are being dropped from the device's backbone port.

The problem indicated by the frame errors is usually located between the MAC that reports the condition and its upstream neighbor. Because many physical connections can lie along this path, the MACFrameErrorRatio variable identifies only the two MACs between which the problem is occurring.

Frames Not Copied Condition

The Frames Not Copied condition occurs when the percentage of frames that are dropped because of insufficient buffer space exceeds a preset threshold. This condition indicates that the station is congested and is unable to process frames as quickly as they arrive. To help eliminate congestion:

• Add more capacity to the station.
• Reconfigure your network so that end stations that communicate heavily with one another are on the same bridge or switch.
• Filter out certain traffic.

Link Error Condition

The Link Error condition occurs when a port detects link errors at a rate that exceeds a preset threshold. When the Link Error threshold is exceeded, the station removes itself from the ring and tries to reinsert itself on the ring. This action creates a MAC Neighbor Change Event (which also occurs if a ring wraps).

Link errors may indicate an FDDI PHY hardware problem (such as a faulty transmitter) or a faulty cable or connector. Look for the problem in the portion of the network between the port that is reporting the condition and the first upstream transmitter.

MAC Neighbor Change Event

The MAC Neighbor Change event occurs when a MAC's upstream or downstream neighbor changes.

This event indicates either:

• A network reconfiguration
• Another station that is leaving or joining the ring

Network File Server Timeouts

Use these sections to identify and correct timeouts on network file servers:

• Network File Server Timeout Overview
• Looking for Obvious Errors
• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

See Network File Server Timeouts Reference for additional conceptual and problem analysis detail.

A network file server can time out if your network gets congested or if your server is having problems. Users might have problems downloading data from or to the server, or copying files from or to the server. To help you understand the troubleshooting process for this type of problem, an EXAMPLE throughout this section follows the symptoms, analysis, and resolution of a typical file server timeout problem.

Understanding the Problem

When users log in, their stations make network file server calls either to determine quotas (if this feature has been enabled) or to mount user home directories. The network file server timeout messages, even when spread across multiple nodes, indicate a problem either with the network or with a server.

EXAMPLE: UNIX users notice that it takes a long time - over 30 seconds in some cases - to log in to any machine. Some machines report network file server timeout messages, but the messages have no obvious pattern and are infrequent. You begin to get a sense of the problem.

Identifying the Problem

First, rule out the obvious causes. Ask these questions:

• Can you access the network file server with Telnet?
• Have any alarms been triggered?
• Are there any new errors?

The process of identifying the problem is developed in Looking for Obvious Errors.

Solving the Problem

To determine the cause, reproduce the fault while you monitor the network. After you know the cause, you can fix the problem.

The solutions to the network file server timeout are identified in these sections:

• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

Looking for Obvious Errors

To look for obvious errors, use these applications:

• Ping and Telnet - To test connectivity to the network file server nodes
• LANsentry Manager Alarms View - To search for triggered alarms
• LANsentry Manager Statistics View - To look for errors
• LANsentry Manager History View - To identify trends

Ping and Telnet

Determine whether you can contact network file server nodes using Ping and Telnet. If the response is extremely slow, then a problem may exist with the connections to the nodes. No delay indicates that the connections are normal, implying that the delay is occurring elsewhere. In this case, use LANsentry® Manager tools to determine whether packets are being lost or ignored.
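A quick way to put a number on "extremely slow" is to time how long a TCP connection to the Telnet port takes. The sketch below uses only the Python standard library; the function name is hypothetical, and port 23 is the conventional Telnet port, which your file server may or may not have enabled.

```python
import socket
import time


def connect_time(host, port=23, timeout=5.0):
    """Return the seconds needed to open a TCP connection, or None on failure."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return None
```

Calling connect_time with the name or address of a file server node several times and comparing the results gives a rough picture of whether connection setup itself is delayed.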

LANsentry Manager Alarms View

Using the LANsentry Manager Alarms View, you can determine if any configured alarms have been triggered.

Search the Alarms View to see if any MAC events have been logged.

EXAMPLE: MAC events have not been logged for the network to which the UNIX users are attached.

Even though no alarms have occurred, errors may exist. For example, a lower rate of background errors may exist just below the alarm threshold. Based on maximum and minimum values, RMON errors may miss constant, periodic, or low amounts of errors.

LANsentry Manager Statistics View

Using the LANsentry Manager Statistics View, you can display a multisegment graph of utilization and error statistics.

Follow these steps:

1. Set up a graph that shows utilization and errors on all your major segments.

2. Determine whether any segments are particularly busy or error prone.

EXAMPLE: You notice that one segment of the UNIX network, HUB3, is reporting Too Long Errors and FCS Errors roughly every second sample. While the amount is not higher than normal, it is currently higher than on any other segment.

LANsentry Manager History View

Using the LANsentry Manager History View, you can display a rolling history table to determine if the errors that you are seeing are new. For example, if you have a history table that runs for 30-minute samples over two days, you can compare the most recent sample to a previous sample, looking for new errors. If your probe has the resources, use a much finer resolution sample stored for a shorter time (every 30 seconds for 2 hours) to more easily spot recent errors.
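Comparing the most recent sample to a previous one amounts to looking for counters that were zero before and are non-zero now. The sketch below assumes each history sample has been exported as a dictionary of counter names to counts; that export format and the function name are hypothetical and are not a LANsentry Manager feature.

```python
def new_errors(previous, latest):
    """Return the counters that are non-zero now but were zero or absent before."""
    return {
        name: count
        for name, count in latest.items()
        if count > 0 and previous.get(name, 0) == 0
    }


# Made-up samples: Too Long Errors were already present, FCS Errors are new.
earlier = {"FCS Errors": 0, "Too Long Errors": 2}
recent = {"FCS Errors": 5, "Too Long Errors": 3}
print(new_errors(earlier, recent))
```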

EXAMPLE: You see that the history table shows that no error rates remained constant throughout the day. However, the errors that did occur were on the device HUB3 and were Too Long Errors and FCS Errors.

Reproducing the Fault While Monitoring the Network

Although the RMON View in LANsentry Manager can show error rates and help you to identify the location of the problem, it may not provide enough data to solve the problem. To determine the cause of the problem, reproduce it while you monitor the network by using these applications:

• LANsentry Manager Top-N Graph - To locate a quiet node to use for reproducing the fault

• LANsentry Manager Packet Capture - To capture packets from the hub to which the quiet node is attached

• LANsentry Manager Packet Decode - To analyze the packets to assess network file server traffic and delays

• Address Tracker - To find the location of the problem nodes

EXAMPLE: Using LANsentry Manager, you find a hub on the network with a higher than normal error rate. However, the error rate does not seem high enough to cause login delays of 60 seconds or more.

LANsentry Manager Top-N Graph

Using the Top-N graph in the LANsentry Manager main window, locate a quiet node that has been showing the same problem. Choose a quiet node so that you do not receive excessive traffic when you try to isolate the problem.

EXAMPLE: You see that the node Monolith, which has the same Network File System (NFS) mounts as the other nodes on the network, is quiet. You decide to use this node for reproducing the fault. See Network File System (NFS) Protocol for more information about NFS.

LANsentry Manager Packet Capture

Using the LANsentry Manager Packet Capture application, capture packets from the network using predefined patterns and start-and-stop conditions.

Follow these steps:

1. Set up a capture buffer on a probe that is connected to the same hub as the quiet node. Until you know more about the problem, set a very general filter.

EXAMPLE: You select a MAC-layer filter and set a conversation filter to capture all packets to and from Monolith.

2. Telnet into and log out from the quiet node. Then reset the capture buffer. Repeat this procedure until you see the problem reflected in your captured data. To keep the buffer information clean, reset the buffer each time that you repeat the procedure.

3. When you see the delay, note the rough value of the packet count on the LANsentry Manager packet buffer.

By noting the packet count at which you think the delay has occurred, you can narrow the problem to within about 20 packets in the buffer. If you have used an extremely quiet node, you may even identify the exact packet.

LANsentry Manager Packet Decode

The LANsentry Manager Packet Decode application decodes all major protocols and displays the packet contents at three levels of detail: summary information, header information, and actual packet content.

Follow these steps:

1. Open the buffer in the Packet Decode application and locate the number of the packet at which the delay occurred.

2. Select the packet and launch a MAC-layer conversation filter. In the filter display, look for a gap in the conversation (that is, where the node sent a request and then resent it after an interval that roughly matches the delay you experienced when recreating the problem).

3. Repeat the test to determine if the result concentrates on one node or if it appears on other nodes.

EXAMPLE: On the quiet node that you selected, the delay is obvious. You see an NFS request going out to a node and a repeat of the request 30 seconds later. During that time, the node did not respond. You now know that the delay occurred because nodes were not seeing responses for NFS requests. When you repeat the test on other nodes, you find that the delay is happening with more than one destination node.
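Looking for a gap in the conversation means finding a request that was sent again before any reply arrived. If the decoded packets are exported as simple records, that search can be sketched as below. The record layout (timestamp in seconds, request identifier, reply flag) and the function name are assumptions for illustration, not the Packet Decode export format.

```python
def find_retry_gaps(packets, min_gap=1.0):
    """Return (request_id, gap_seconds) for requests resent without a reply.

    packets is a list of (timestamp, request_id, is_reply) in capture order.
    """
    pending = {}  # request_id -> time the unanswered request was last sent
    gaps = []
    for timestamp, request_id, is_reply in packets:
        if is_reply:
            pending.pop(request_id, None)  # answered, so no longer pending
        elif request_id in pending:
            gap = timestamp - pending[request_id]
            if gap >= min_gap:
                gaps.append((request_id, gap))
            pending[request_id] = timestamp
        else:
            pending[request_id] = timestamp
    return gaps


# Made-up capture: request "b" is repeated 30 seconds later with no reply between.
capture = [
    (0.0, "a", False), (0.01, "a", True),
    (1.0, "b", False), (31.0, "b", False), (31.02, "b", True),
]
print(find_retry_gaps(capture))
```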

Address Tracker

Use Address Tracker, which polls managed devices, to determine the hubs to which the problem nodes are attached. If the problem end stations are located on unmanaged devices, then you can at least narrow the problem to those unmanaged devices.

EXAMPLE: Although your network does not have managed hubs that Transcend® NCS management software can poll, it does have managed switches. When it polls the switches, Address Tracker displays the switch ports on which addresses were last seen. This information indicates the hub (but not the hub port) on which the device is located.

LANsentry Manager Packet Decode

After you know the location of the hub that has the problem node, monitor the problem from the hub using LANsentry Manager Packet Decode.

Follow these steps:

1. To capture packets from one of the nodes on the hub, set up another capture buffer and repeat the exercise that is described in LANsentry Manager Packet Capture. Because a delay may occur on a different node, use two capture buffers without stopping the first one.

Note the rough packet count where the delay appears.

2. Display a conversation filter of the packet where the delay appears and look for the gap in the conversation.

EXAMPLE: You hope that the nodes are on the same hub. You find that all the nodes are on HUB3. This result indicates that FCS Errors may be causing the timeouts. However, because the errors occur at a low rate, you decide to verify this diagnosis. You monitor the problem from the hub, logging in and out many times, and the delay eventually occurs. This time, the delay shows that the node's reply had an FCS Error, even though the node received the request. The switch would not have transmitted this packet, causing a timeout on the NFS protocol. The retry time is presumably 30 seconds. During this test, you see the problem occurring on another node.

Correcting the Fault

Without a managed hub, you may find it very difficult to discover network file server timeout errors. To find the problematic node, you must either systematically isolate nodes by monitoring each node for a prolonged period or temporarily insert a managed hub.

EXAMPLE: You notice that the captured error packet failed FCS because it was corrupted by a regular pattern during transmission. A possible reason for this occurrence is a Jabbering node. This explanation makes sense because FCSJabber frames increased linearly when you were monitoring the live network.

Network File Server Timeouts Reference

This section explains terms that are relevant to network file server timeouts and provides additional conceptual and problem analysis detail.

Jabbering

Jabbering occurs when a node transmits illegal-length packets and is possibly not operating within carrier specifications. In effect, another node has written bad data over a valid packet. This bad data is often interpreted as a repeated sequence of data.

Network File System (NFS) Protocol

A distributed file system protocol developed by Sun Microsystems that allows a computer system to access files over a network as if they were on its local disks. This protocol has been incorporated into products by more than 200 companies. It is now a de facto Internet standard.

NFS is one protocol in the NFS suite of protocols, which includes NFS, RPC, XDR (External Data Representation), and others. These protocols are part of a larger architecture that Sun Microsystems refers to as Open Network Computing (ONC). ONC is a distributed applications architecture designed by Sun and currently controlled by a consortium led by Sun.

Troubleshooting TCP/IP - Detailed Steps

This article shows how to troubleshoot TCP/IP connectivity between computers on a Windows network. If you haven't already done so, disable XP's Internet Connection Firewall on all local area network connections and remove all firewall programs on the network. Improperly configured firewalls are the most common cause of TCP/IP problems.

Open a Command Prompt Window

For many of these steps, you'll be typing at the command prompt. To open a command prompt window in Windows 2000 or XP, click Start | Run, type cmd in the box, and click OK. To open a command prompt window in Windows 95, 98, or Me, click Start | Run, type command in the box, and click OK. Type one command per line and press Enter after each one to execute it. To close the command prompt window, use the exit command.

Determine the TCP/IP Settings

Determine the TCP/IP settings of each computer on the local area network. In XP, open the Network Connections folder, right-click the LAN connection, and click Status | Support | Details. For example, here are the Status and Details views for the LAN connection on an Internet Connection Sharing host.

In Windows 95/98/Me, click Start | Run, type winipcfg in the box, and click OK. Select the LAN adapter from the menu and click More Info. Here's the winipcfg view for an ICS client running Windows Me.

You can also see the TCP/IP settings from the command prompt. This is especially convenient if a computer has more than one network adapter. Use the ipconfig /all command, which is available in all versions except Windows 95. The output from this command can be long, so it's best to write it to a file. Specify the file name in the command this way:

ipconfig /all >ipconfig.txt

Description of TCP/IP Settings

Here are the TCP/IP settings that are used in network troubleshooting:

• IP Address - Unique address assigned to a network adapter. A computer with multiple network adapters has an IP address for each one, and each one must be in a different subnet.

• Subnet Mask - Used in conjunction with the IP address to determine which subnet an adapter belongs to. At the simplest level, communication is only possible between two network adapters when they're in the same subnet.

• Default Gateway - IP address of a computer or router, on one of this computer's local area networks, that knows how to communicate with subnets not present on this computer. For an Internet connection, the default gateway is a router belonging to your Internet service provider, and all access to sites on the Internet goes through it. For an ICS client, the default gateway is the ICS host. If you use a hardware router, it serves as the default gateway.

• DHCP Server - If an adapter is configured to obtain an IP address automatically, this is the address of the server that provides it. It could be your ISP, an ICS host, or a hardware router.

• DNS Servers - IP address of one or more Domain Name Server computers. DNS servers translate Internet names to their IP addresses (like 63.146.109.227).
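To make the role of the subnet mask concrete, the following sketch uses Python's standard ipaddress module to test whether two adapters fall in the same subnet. The function name, the example addresses, and the 255.255.255.0 mask are illustrative assumptions; substitute the values reported by your own computers.

```python
import ipaddress


def same_subnet(ip_a, ip_b, netmask):
    """True when both addresses belong to the same subnet under the given mask."""
    # strict=False lets a host address (not just a network address) be given.
    net_a = ipaddress.ip_network(f"{ip_a}/{netmask}", strict=False)
    net_b = ipaddress.ip_network(f"{ip_b}/{netmask}", strict=False)
    return net_a == net_b


print(same_subnet("192.168.1.101", "192.168.1.123", "255.255.255.0"))
print(same_subnet("192.168.1.101", "192.168.2.5", "255.255.255.0"))
```

The mask selects the network part of each address; the two adapters are in the same subnet only when those network parts match.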

Subnets

See our article on subnets for a brief description of how they work. For more details, see this Microsoft Knowledge Base article.

If two computers are supposed to be on the same subnet but aren't, something is wrong with the network hardware or software configuration. This is most likely to happen when one of them receives an IP address of 169.254.x.x, which indicates that:

• It's configured to obtain an IP address automatically.
• It couldn't find a DHCP server on the network to make the assignment.
• Windows assigned it an Automatic Private IP Address.
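Spotting an Automatic Private IP Address is a simple range check, sketched here with Python's standard ipaddress module. The function name is hypothetical, and the sample addresses are made up.

```python
import ipaddress

# The 169.254.x.x range that Windows uses for Automatic Private IP Addressing.
APIPA_RANGE = ipaddress.ip_network("169.254.0.0/16")


def is_apipa(address):
    """True when the address is in the 169.254.x.x automatic private range."""
    return ipaddress.ip_address(address) in APIPA_RANGE


print(is_apipa("169.254.10.20"))
print(is_apipa("192.168.1.101"))
```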

See our article on Specific Networking Problems and Their Solutions for more information.

Pinging

The ping command is the basic tool for testing TCP/IP connectivity. It sends a special packet (called ICMP Echo) to a particular IP address and looks for a reply. If everything is working right, the reply comes back. If not, the ping times out in a few seconds. By default, the ping command repeats the process four times. Here's an example of an ICS client computer pinging a Windows XP Home Edition ICS host, using the host's IP address and its computer name.

When ping fails, you'll see one of these error messages:

• Request timed out - The IP address is valid, but there's no reply from it. If the IP address is on a local area network, the most likely cause is a firewall program blocking the ping.

• Unknown host <name> or Ping request could not find host <name> - The computer name doesn't exist on the local area network. Make sure that NetBIOS over TCP/IP is enabled.

• Destination host unreachable - The IP address isn't on a local area network, and the default gateway can't access it. Either there's no default gateway, its address is wrong, or it isn't functioning.

Pinging the Local Area Network

Here is a series of ping commands to use in finding where a problem occurs on a local area network. Run them in the order shown, and don't go on to the next command until all of the previous commands work properly. In this example:

• The computer being tested is named Winxp, with IP address 192.168.1.101.
• There's another computer on the network, named Win98, with IP address 192.168.1.123.

Substitute the appropriate IP addresses and computer names for your network.

Command               Target                          What Ping Failure Indicates
ping 127.0.0.1        Loopback address                Corrupted TCP/IP installation
ping localhost        Loopback name                   Corrupted TCP/IP installation
ping 192.168.1.101    This computer's IP address      Corrupted TCP/IP installation
ping winxp            This computer's name            Corrupted TCP/IP installation
ping 192.168.1.123    Another computer's IP address   Bad hardware or NIC driver
ping win98            Another computer's name         NetBIOS name resolution failure

To fix a corrupted TCP/IP installation on Windows XP, follow the steps in this Microsoft Knowledge Base article. For Windows 95/98/Me, uninstall the TCP/IP protocol in Control Panel | Network, reboot, and reinstall it. If that doesn't fix it, use this procedure on Windows 95 or 98.
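The rule of running the pings in order and stopping at the first failure can be sketched as a small script. This is an illustration under stated assumptions: the function names are hypothetical, the script shells out to the system ping command (with -n and -w on Windows, -c elsewhere), and it trusts the command's exit code, which on some Windows versions can report success even for a "Destination host unreachable" reply.

```python
import platform
import subprocess


def ping_once(target, timeout_s=2):
    """Send one echo request with the system ping command; True on a zero exit code."""
    if platform.system() == "Windows":
        cmd = ["ping", "-n", "1", "-w", str(timeout_s * 1000), target]
    else:
        cmd = ["ping", "-c", "1", target]
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s + 5,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def first_failure(steps, ping=ping_once):
    """Run the steps in order; return the first (target, meaning) that fails, or None."""
    for target, meaning in steps:
        if not ping(target):
            return target, meaning
    return None


# Example addresses and names only; substitute those of your own network.
LAN_STEPS = [
    ("127.0.0.1", "Corrupted TCP/IP installation"),
    ("localhost", "Corrupted TCP/IP installation"),
    ("192.168.1.101", "Corrupted TCP/IP installation"),
    ("192.168.1.123", "Bad hardware or NIC driver"),
]
```

Calling first_failure(LAN_STEPS) returns the first target that did not answer together with what that failure indicates, or None when every step succeeds. The ping argument exists so that the ordering logic can be exercised without touching the network.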

Pinging the Internet

You can also use ping to find a problem with Internet access. Run these commands in the order shown, and don't go on to the next command until all of the previous commands work properly. Use the Default Gateway and DNS Server addresses that you got from the winipcfg or ipconfig /all command.

Command                  Target                What Ping Failure Indicates
ping w.x.y.z             Default Gateway       Default Gateway down
ping w.x.y.z             DNS Server            DNS Server down
ping w.x.y.z             Web site IP address   Internet service provider or web site down
ping www.something.com   Web site name         DNS Server down or web site down

REFERENCES

WWW.PRACTICALLYNETWORK.COM

WWW.COMPUTERHOPE.COM

WWW.WIKIPEDIA.ORG


  • Basic Network Troubleshooting
  • Basic network troubleshooting
    • Issue
    • Cause
    • Solution
      • Verify connections LEDs
      • Adapter resources
      • Adapter functionality
      • Protocol
      • Firewall
      • Additional time
      • Additional troubleshooting
        • Network Troubleshooting Overview
          • About Connectivity Problems
          • About Performance Problems
          • Solving Connectivity and Performance Problems
          • Recognizing Symptoms
            • User Comments
            • Network Management Software Alerts
            • Analyzing Symptoms
              • Understanding the Problem
              • Identifying and Testing the Cause of the Problem
                • Sample Problem Analysis
                • Equipment for Testing
                  • Solving the Problem
                    • Your Network Troubleshooting Toolbox
                      • Transcend Central
                      • Status Watch
                        • Web Reporter
                          • Address Tracker
                          • LANsentry Manager
                          • Traffix Manager
                          • Device View
                          • Ping
                            • Strategies for Using Ping
                            • Tips on Interpreting Ping Messages
                              • Telnet
                              • FTP and TFTP
                              • Analyzers
                              • Probes
                              • Cable Testers
                              • Knowing Your Networks Configuration
                                • Site Network Map
                                • Device Configuration Information
                                • Other Important Data About Your Network
                                  • Identifying Your Networks Normal Behavior
                                    • Baselining Your Network
                                    • Identifying Background Noise
                                        • FDDI Connectivity
                                          • Understanding the Problem
                                          • Identifying the Problem
                                          • Solving the Problem
                                          • Status Watch
                                          • Implementing Dual Homing
                                          • Installing an Optical Bypass Unit
                                          • Peer Wrap Condition
                                          • Twisted Ring Condition
                                          • Undesired Connection Attempt Event
                                          • Address Tracker
                                          • Device View
                                          • Broadcast Packets
                                          • Multicast Packets
                                            • Duplicate Addresses
                                              • Understanding the Problem
                                              • Identifying the Problem
                                              • Solving the Problem
                                              • Status Watch
                                              • Address Tracker
                                              • LANsentry Manager
                                              • Duplicate MAC Addresses
                                              • Duplicate IP Addresses
                                                • Ethernet Packet Loss
                                                  • Understanding the Problem
                                                  • Identifying the Problem
                                                  • Solving the Problem
                                                  • Status Watch
                                                  • LANsentry Manager Network Statistics Graph
                                                  • Device View
                                                  • Alignment Errors
                                                  • Collisions
                                                  • CRC Errors
                                                  • Excessive Collisions
                                                  • FCS Errors
                                                  • Late Collisions
                                                  • Nonstandard Ethernet Problems
                                                  • Receive Discards
                                                  • Too Long Errors
                                                  • Too Short Errors
                                                  • Transmit Discards
                                                    • FDDI Ring Errors
                                                      • Understanding the Problem
                                                      • Identifying the Problem
                                                      • Solving the Problem
                                                      • Status Watch
                                                      • Elasticity Buffer Error Condition
                                                      • Frame Error Condition
                                                      • Frames Not Copied Condition
                                                      • Link Error Condition
                                                      • MAC Neighbor Change Event
                                                        • Network File Server Timeouts
                                                          • Understanding the Problem
                                                          • Identifying the Problem
                                                          • Solving the Problem
                                                          • Ping and Telnet
                                                          • LANsentry Manager Alarms View
                                                          • LANsentry Manager Statistics View
                                                          • LANsentry Manager History View
                                                          • LANsentry Manager Top-N Graph
                                                          • LANsentry Manager Packet Capture
                                                          • LANsentry Manager Packet Decode
                                                          • Address Tracker
                                                          • LANsentry Manager Packet Decode
                                                          • Jabbering
                                                          • Network File System (NFS) Protocol
                                                          • Troubleshooting TCPIP - Detailed Steps
                                                          • Open a Command Prompt Window
                                                          • Determine the TCPIP Settings
                                                          • Description of TCPIP Settings
                                                          • Subnets
                                                          • Pinging
                                                          • Pinging the Local Area Network
                                                          • Pinging the Internet

Table 16 identifies the problems that this data can indicate and your possible actions. For information about problems related to a nonstandard Ethernet implementation, see Nonstandard Ethernet Problems.

Table 16. Alignment Errors, FCS Errors, and CRC Errors Data

Possible Problem: Faulty cabling
Possible Action: Examine the cable and cable connections for breaks or damage.

Possible Problem: Network noise
Possible Action: Look for improper cabling, faulty cable, faulty network equipment, or cables that are too close to equipment that emits electromagnetic interference (lamps, for example).

Possible Problem: Faulty transceiver
Possible Action: Use an analyzer to identify the problematic transceiver. If necessary, replace the transceiver, network adapter, or station.

Possible Problem: Fault at the transmitting end station
Possible Action:

1. Locate the source of the errors by looking at the module and port statistics.

2. Verify the correct operation of the transceiver or adapter card of the device that is connected to the problem port.

3. If the card appears to be operating correctly, examine the cable and cable connections for breaks or damage.

Possible Problem: Station powering up or down
Possible Action: None required.

Early implementations of Ethernet transceivers generate a significant amount of in-band noise when powering up; they frequently cause Alignment Errors and FCS Errors in an otherwise stable network.

When powering up, some software drivers for Ethernet controllers also initiate Time Domain Reflectometry (TDR) tests to test the Ethernet media. Network monitors report TDR tests as Alignment Errors and FCS Errors.

Possible Problem: Faulty adapter
Possible Action: Replace the adapter.

2. Determine if the Excessive Collisions tool threshold is being exceeded. Table 17 identifies the problems that this data can indicate and your possible actions.

Table 17. Collisions and Excessive Collisions Data

Possible Problem: Busy network
Possible Action: Use a bridge, router, or switch to reconfigure your network into segments with fewer stations.

Possible Problem: Faulty device (adapter, switch, hub, and the like) that does not listen before broadcasting. This problem increases the incidence of all types of collisions.
Possible Action: Isolate each adapter to see if the problem stops.

Possible Problem: Network loop
Possible Action: Ensure that no redundant connections to the same station have both connections active simultaneously.


3. Determine if the Receive Discards and Transmit Discards tools' thresholds are being exceeded.

If these errors are high in conjunction with the data that you learned in steps 1 and 2, then your network is overloaded. Segment your network.

LANsentry Manager Network Statistics Graph

Use the LANsentry Manager Network Statistics graph to view data for:

• Collisions
• Late Collisions
• Bandwidth Utilization
• CRC Errors
• Too Long Errors
• Too Short Errors

Follow these steps:

1. Display a Network Statistics graph for the local Ethernet segment on which users have reported poor performance.

This graph shows the most recent trend in Collision rates. If you have set up a History sample, you can also look at the historical trend. If a number of segments are connected by repeaters, examine the graph for each Ethernet segment.

2. Analyze Utilization and Collision rates to determine whether collisions are caused by an overloaded segment or a faulty component.

• If Utilization rates are high - The collisions are probably caused by an overloaded segment. If you have added nodes or new applications to your network, consider reconfiguring the cabling system, using bridges and routers to filter out remote collisions and to keep local traffic on one segment. This action should level the network load.

• If Utilization rates are stable and appear normal - The collisions are probably caused by faulty components. In this case, do the following:

  • If the network consists of repeaters - Compare the Network Statistics graphs for each segment connected to the repeater. Because repeaters repeat traffic across all connected segments (which makes many segments seem like one network), you should see similar levels of traffic on all segments. One segment that shows dissimilar levels of traffic and collisions may indicate faulty hardware. In this case, monitor several collisions to track the source station that is transmitting too soon after collisions, and repair the station. Packets that are transmitted too soon after collisions are unlikely to be valid. See Table 17 for more information about Collisions.

  • On other networks - Determine the segment cable length.
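The decision in step 2 can be sketched as a small function. This is only an illustration: the two threshold values are hypothetical placeholders, not figures from this guide, and should be replaced with values taken from your own network baseline.

```python
def diagnose_collisions(utilization_pct, collision_pct,
                        busy_threshold_pct=40.0,       # hypothetical baseline value
                        collision_threshold_pct=5.0):  # hypothetical baseline value
    """Suggest a likely cause for collisions seen on one Ethernet segment."""
    if collision_pct < collision_threshold_pct:
        return "collisions within normal range"
    if utilization_pct >= busy_threshold_pct:
        # High utilization plus high collisions points to an overloaded segment.
        return "overloaded segment"
    # Normal utilization plus high collisions points to hardware.
    return "possible faulty component"


print(diagnose_collisions(65.0, 12.0))
print(diagnose_collisions(15.0, 12.0))
```

A result of "overloaded segment" corresponds to the first bullet in step 2 (segment the network); "possible faulty component" corresponds to the second.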

3. Examine the CRC Errors and Late Collisions, which often indicate cabling or component problems.

Table 16 identifies the problems that CRC Errors can indicate and your possible actions. Table 18 identifies the problems that Late Collisions data can indicate and your possible actions.

Table 18. Late Collisions Data

Possible Problem: Cabling problems

• Segment too long
• Failing cable
• Segment not grounded properly (noise)
• Improper termination
• Taps too close (10BASE-5 and 10BASE-2 only)
• Noisy cable

Possible Action: Correct the cabling problem by doing one or more of the following:

• Reduce the segment length
• Replace the cable
• Ground the cable
• Terminate the cable correctly
• Check the taps
• Check for cables too close to equipment that emits electromagnetic interference

Possible Problem: Component problems

• Deaf or partially deaf node
• Failing repeater, transceiver, or controller cards

Possible Action: Correct the component problem by doing one of the following:

• Trace the failing component and replace it
• Replace the NIC or the transceiver

4. Trace Too Short Errors and Too Long Errors to the sender.

These errors often indicate faulty routers or LAN drivers and transceiver problems. Table 19 identifies the problems that this data can indicate and your possible actions.

Possible Problem: A transceiver on your network is adding bits to the packets that are transmitted by the attached station.
Possible Action:

1. Use a network analyzer to identify the problematic transceiver.

2. If necessary, replace the transceiver, network adapter, or station.

Possible Problem: The jabber protection mechanism on a transceiver has failed; it can no longer protect the network from the jabbering produced by the attached station.
Possible Action: Replace the network card.

Possible Problem: Excessive noise on the cable.

Note: Some 10/100 Mbps cards that autodetect the network speed may connect to the network at the wrong speed, causing excessive noise.

Possible Action: Check for improper cabling, faulty cable, faulty network equipment, or cables too close to noisy electronic equipment (lamps, for example).

If your network card autodetects the network speed and you have ruled out other problems, manually configure the network speed.

Possible Problem: Faulty routers (two different network types are connected and the router is not enforcing proper frame size restrictions).
Possible Action: Notify the manufacturer.

Possible Problem: Faulty LAN driver
Possible Action: Replace the driver.

Possible Problem: A normal condition on a LinkSwitch® 1000, LinkSwitch® 3000, or CoreBuilder 5000 FastModule.
Possible Action: If you use maximum-sized (1518-byte) Ethernet frames, the device's VLT-enabled ports add a frame tag of 4 bytes, resulting in a misleading Too Long Frame error.

These frames are passed successfully but still create the Too Long Frame error message.

If you want to eliminate the error message, reduce your Ethernet packet frames by 4 bytes.

Device View

Device View allows you to display a variety of port and device-level statistics relevant to Ethernet packet loss. Table 20 describes these statistics and their use in troubleshooting.

Table 20. Activity and Error Statistics in Device View

Statistics Group: Activity
Description: Displays the total network activity and errors on the selected port.
Use in Troubleshooting: This data shows readable packets, broadcast packets, Collisions, total errors, and runts (which cause Too Short Errors). You can interpret this data in the following ways:

• The presence of runts can often be caused by Collisions; however, if the values increase at specific times of the day, it may indicate that you need to change the network topology to manage the traffic more efficiently (for example, with switches or routers).

• Runts can also be caused by a badly terminated coax cable.

• Large numbers of runts not associated with high levels of collisions can indicate a transmission problem (examine the cable).

• Particularly high numbers of Collisions compared to the total number of readable packets can point to a hardware problem (a bad adapter) or to a data loop.

• A high proportion of Broadcast Packets (>10%) on a heavily utilized network (>50% of available bandwidth) can point to an incorrectly configured bridge or router on the network.

Statistics Group: Errors
Description: Displays the number of frames with errors on the selected port.
Use in Troubleshooting: The significance of errors depends on accompanying errors and prevailing network conditions. See the following error data for more information:

• Alignment Errors: Table 16
• FCS Errors: Table 16
• Too Long Errors: Table 19
• Too Short Errors, or runts: Table 19
• Late Collisions: Table 18
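As an illustration of the broadcast guideline in the Activity row, the check below flags a port when broadcast packets exceed 10 percent of traffic on a segment that is more than 50 percent utilized. It assumes the broadcast proportion is measured against readable packets; the function and parameter names are hypothetical and are not part of Device View.

```python
def broadcast_warning(broadcast_packets, readable_packets, utilization_pct):
    """Return True when broadcasts look excessive on a busy segment."""
    if readable_packets == 0:
        return False  # nothing to compare against
    broadcast_pct = 100.0 * broadcast_packets / readable_packets
    return broadcast_pct > 10.0 and utilization_pct > 50.0


# 15 percent broadcasts on a 60-percent-utilized segment is flagged.
print(broadcast_warning(1500, 10000, 60.0))
```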

To display Activity and Errors statistics for a device or port, follow these steps:

1. Select the required port or device.

2. From the shortcut menu, select Activity or Errors.

The statistics available depend on the type of port or device selected. See Table 20 for troubleshooting information.

Ethernet Packet Loss Reference

This section explains terms that are relevant to Ethernet packet loss and provides additional conceptual and problem analysis detail.

Alignment Errors

An Alignment Error indicates a received frame for which both of the following are true:

• The number of bits received is not a whole number of bytes (that is, not an integral multiple of 8).

• The frame has a Frame Check Sequence (FCS) error.

Alignment Errors often result from MAC layer packet formation problems, cabling problems that cause corrupted or lost data, and packets that pass through more than two cascaded multiport transceivers. See FCS Errors for more information about interpreting Alignment Errors.


Collisions

Collisions indicate that two devices detect that the network is idle and try to send packets at exactly the same time (within one round-trip delay). Because only one device can transmit at a time, both devices must stop sending and attempt to retransmit. Collisions are detected by the transmitting stations.

The retransmission algorithm helps to ensure that the packets do not retransmit at the same time. However, if the two devices retry at nearly the same time, packets can collide again; the process repeats until either the packets finally pass onto the network without collisions or 16 consecutive collisions occur and the packets are discarded.
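The retransmission algorithm is not spelled out here. The sketch below assumes the truncated binary exponential backoff defined for CSMA/CD in IEEE 802.3: after the n-th collision a station waits a random number of slot times between 0 and 2^k - 1, where k is the smaller of n and 10, and gives up on the 16th collision. The function name is illustrative.

```python
import random

SLOT_TIME_BITS = 512   # one slot time, expressed in bit times
ATTEMPT_LIMIT = 16     # the frame is discarded on the 16th consecutive collision
BACKOFF_LIMIT = 10     # the random range stops growing after the 10th collision


def backoff_slots(collision_count, rng=random):
    """Return how many slot times to wait after the n-th consecutive collision."""
    if collision_count >= ATTEMPT_LIMIT:
        raise RuntimeError("excessive collisions: frame discarded")
    k = min(collision_count, BACKOFF_LIMIT)
    return rng.randrange(2 ** k)  # uniform over 0 .. 2^k - 1


for n in (1, 2, 3, 10, 15):
    slots = backoff_slots(n)
    print(f"collision {n}: wait {slots} slots ({slots * SLOT_TIME_BITS} bit times)")
```

Because each station draws its own random delay, two stations that collided once are unlikely to pick the same delay again, which is why repeated collisions usually point to congestion or a fault.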

CRC Errors

A Cyclic Redundancy Check (CRC) Error is an RMON statistic that combines FCS Errors and Alignment Errors. These errors indicate that packets were received with:

• A bad FCS and an integral number of octets (FCS Errors)
• A bad FCS and a non-integral number of octets (Alignment Errors)

CRC Errors can cause an end station to freeze. If a large number of CRC Errors are attributed to a single station on the network, replace the station's network interface board. Typically, a CRC Error rate of more than 1 percent of network traffic is considered excessive.
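The two cases above, and the 1 percent guideline, can be expressed as a short sketch. The function names are hypothetical, and the rate is assumed to be measured as errored packets divided by total packets.

```python
def classify_crc_error(bit_count, fcs_ok):
    """Return the error type for a received frame, or None if the FCS is good."""
    if fcs_ok:
        return None
    # Bad FCS: a whole number of octets is an FCS Error, otherwise an Alignment Error.
    return "FCS Error" if bit_count % 8 == 0 else "Alignment Error"


def crc_rate_excessive(crc_errors, total_packets, limit_pct=1.0):
    """Return True when CRC Errors exceed the given percentage of traffic."""
    return total_packets > 0 and 100.0 * crc_errors / total_packets > limit_pct


print(classify_crc_error(512, fcs_ok=False))
print(classify_crc_error(515, fcs_ok=False))
print(crc_rate_excessive(30, 2000))
```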

Excessive Collisions

Excessive Collisions indicate that 16 consecutive collisions have occurred, usually a sign that the network is becoming congested. For each excessive collision count (that is, after 16 consecutive collisions), a packet is dropped. If you know the normal rate of excessive collisions, then you can determine when the rate of packet loss is affecting your network's performance. See Knowing Your Network's Configuration for more information.

FCS Errors

Frame Check Sequence (FCS) Errors, a type of CRC Error, indicate that frames received by an interface are an integral number of octets long but do not pass the FCS check. The FCS is a mathematical way to ensure that all the frame's bits are correct without having the system examine each bit and compare it to the original. Packets with Alignment Errors also generate FCS Errors.

Both Alignment Errors and FCS Errors can be caused by equipment powering up or down or by interference (noise) on unshielded twisted-pair (10BASE-T) segments. In a network that complies with the Ethernet standard, FCS or Alignment Errors indicate bit errors during a transmission or reception. A very low rate is acceptable. Although Ethernet allows a 1 in 10^8 bit error rate, typical Ethernet performance is 1 in 10^12 or better.
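To show how an FCS check works in principle, the sketch below uses the CRC-32 from Python's standard zlib module, which uses the same polynomial as the Ethernet FCS. It assumes the four FCS octets are stored least-significant byte first, as packet tools commonly present them; real adapters perform this check in hardware.

```python
import struct
import zlib


def append_fcs(frame_without_fcs: bytes) -> bytes:
    """Return the frame with a 4-octet CRC-32 appended."""
    fcs = zlib.crc32(frame_without_fcs) & 0xFFFFFFFF
    return frame_without_fcs + struct.pack("<I", fcs)


def fcs_ok(frame: bytes) -> bool:
    """Recompute the CRC-32 over the frame body and compare it to the trailer."""
    if len(frame) < 4:
        return False
    body, received = frame[:-4], frame[-4:]
    return struct.pack("<I", zlib.crc32(body) & 0xFFFFFFFF) == received


frame = append_fcs(bytes(60))                       # 60 octets of data + 4 of FCS
damaged = bytes([frame[0] ^ 0x01]) + frame[1:]      # flip a single bit
print(fcs_ok(frame), fcs_ok(damaged))
```

A single flipped bit changes the recomputed CRC, so the damaged frame fails the check and would be counted as an FCS Error.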

Late Collisions

Late Collisions indicate that two devices have transmitted at the same time, but cabling errors (most commonly, excessive network segment length or repeaters between devices) prevent either transmitting device from detecting a collision. Neither device detects a collision because the time to propagate the signal from one end of the network to the other is longer than the time to put the entire packet on the network. As a result, neither of the devices that cause the late collision senses the other's transmission until the entire packet is on the network.

Although late collisions occur for small packets, the transmitter cannot detect them. As a result, a network suffering measurable Late Collisions for large packets is losing small packets as well.
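The timing argument can be checked with simple arithmetic. The sketch below assumes 10 Mbps Ethernet and compares the time needed to transmit a frame with the round-trip delay of the segment; the delay values are hypothetical examples, not measurements.

```python
def transmit_time_us(frame_octets, mbps=10.0):
    """Time to put a whole frame on the wire, in microseconds."""
    return frame_octets * 8 / mbps


def collision_is_detectable(frame_octets, round_trip_us, mbps=10.0):
    """A sender can only notice a collision while it is still transmitting."""
    return transmit_time_us(frame_octets, mbps) >= round_trip_us


print(transmit_time_us(64))                    # minimum-size frame at 10 Mbps
print(collision_is_detectable(64, 45.0))       # segment within limits
print(collision_is_detectable(64, 60.0))       # segment too long: small frames miss it
print(collision_is_detectable(1518, 60.0))     # large frames still see a (late) collision
```

On the overlong segment, the 64-octet frame is fully sent before the collision can be sensed, while the 1518-octet frame is still on the wire, which matches the observation that measurable Late Collisions on large packets imply silent loss of small ones.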

Nonstandard Ethernet Problems

Table 21 lists the symptoms that typically occur if a system violates the Ethernet standard.

Table 21. Symptoms of Common Ethernet Network Problems

Symptoms: FCS Errors and Alignment Errors increase significantly.
Problem: Network cabling is too long.
Notes: If you use a promiscuous network monitor, the number of Late Collisions reported by stations should correlate with the FCS and Alignment Errors reported by the monitor.

Symptoms: FCS and Alignment Errors increase proportionally with interference (sometimes referred to as noise hits).
Problem: Network segment is noisy.
Notes: Typically observed on a 10BASE-T network segment in a noisy environment. If you use multiple promiscuous monitors, the FCS and Alignment Errors among the monitors will not correlate.

If the monitor can track runts, also called Too Short Errors, the number of runt packets should be significantly higher than normal.

Symptoms: FCS and Alignment Errors are much higher than normal.
Problem: Networks do not conform to the access scheme of Carrier Sense Multiple Access with Collision Detect (CSMA/CD).
Notes: Occurs when some implementations of Ethernet in the segment are not entirely compatible with IEEE 802.3 repeaters.

Collision fragments linger on the network long enough to collide with retry packets at the minimum interpacket gap (IPG). The IPG is smaller on one side of the repeated network, causing a lost packet.

Ethernet controllers cannot receive packets that are separated by 4.7 µs or less. Some controllers cannot sustain receptions of packets separated by as much as 9.6 µs. If runt packets are received one after another and are followed by a collision fragment, Ethernet controllers that cannot sustain reception will lose packets.

Receive Discards

Receive Discards indicate that received packets could not be delivered to a higher-layer protocol because of congestion or packet errors.

Too Long Errors

A Too Long Error indicates that a packet is longer than 1518 octets (including FCS octets) but otherwise well formed. Too Long Errors are often caused by a bad transceiver, a malfunction of the jabber protection mechanism on a transceiver, or excessive noise on the cable.

Too Short Errors

A Too Short Error, also called a runt, indicates that a packet is fewer than 64 octets long (including FCS octets) but otherwise well formed.
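The two size limits can be illustrated with a small classifier. The `tag_octets` parameter is a hypothetical addition that models the 4-byte frame tag mentioned in Table 19: a monitor that does not allow for the tag reports a tagged maximum-size frame as too long.

```python
MIN_FRAME_OCTETS = 64     # shortest valid frame, including FCS
MAX_FRAME_OCTETS = 1518   # longest valid frame, including FCS


def classify_frame_size(octets, tag_octets=0):
    """Classify an otherwise well-formed frame by its length in octets."""
    if octets < MIN_FRAME_OCTETS:
        return "Too Short Error (runt)"
    if octets > MAX_FRAME_OCTETS + tag_octets:
        return "Too Long Error"
    return "valid size"


print(classify_frame_size(60))
print(classify_frame_size(1518))
print(classify_frame_size(1522))                 # 1518 + 4-byte tag, tag not allowed for
print(classify_frame_size(1522, tag_octets=4))   # same frame, tag allowed for
```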

Transmit Discards

Transmit Discards indicate that packets were not transmitted because of network congestion.

FDDI Ring Errors

Use these sections to identify and correct FDDI ring errors:

• FDDI Ring Errors Overview
• Identifying Ring Errors

Fiber Distributed Data Interface (FDDI) often corrects its own problems. However, because FDDI cannot correct all errors (especially those related to hardware problems), you should monitor FDDI errors.

Understanding the Problem

FDDI ring errors that you should monitor include:

• Elasticity Buffer Error Condition
• Frame Error Condition
• Frames Not Copied Condition
• Link Error Condition

Identifying the Problem

First, determine the type of FDDI ring errors and where they are occurring. As with other FDDI problems, identify the upstream and downstream neighbors of the devices that you are monitoring.

Several types of network errors can cause FDDI performance problems. For example, problems with cables or physical connections may result in a link or frame error. Elasticity buffer (EB) errors can also lead to link and frame errors.

FDDI deals with port-related errors as follows:

• The variable PORTlerAlarm is the link error rate (LER) value at which a link connection generates an alarm. When the LER is greater than the alarm setting, Station Management (SMT) sends a Status Report Frame (SRF) to notify you that there is a problem with a port.

  The PORTlerAlarm threshold is set lower than the PORTlerCutoff threshold so that you are notified of a problem before the port is actually removed from the ring.

• When link errors reach the threshold defined by the variable PORTLERCutoff, SMT breaks the connection, disabling the PHY that detected the problem. A Link Error Condition is also generated.
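The two-threshold behavior described above can be sketched as follows. This is an illustration only: link error rates are modeled as plain fractions, the example threshold values are hypothetical, and the function is not part of any SMT implementation.

```python
def port_ler_action(ler, alarm_threshold, cutoff_threshold):
    """Return the action implied by a port's link error rate (LER)."""
    if alarm_threshold >= cutoff_threshold:
        raise ValueError("the alarm threshold must be lower than the cutoff threshold")
    if ler >= cutoff_threshold:
        return "cutoff: connection broken, Link Error Condition generated"
    if ler > alarm_threshold:
        return "alarm: Status Report Frame sent"
    return "ok"


ALARM = 1e-8    # hypothetical alarm setting (errors per bit)
CUTOFF = 1e-7   # hypothetical cutoff setting (errors per bit)

for rate in (1e-10, 5e-8, 2e-7):
    print(rate, "->", port_ler_action(rate, ALARM, CUTOFF))
```

Keeping the alarm below the cutoff gives an administrator a warning window before the port is removed from the ring.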

FDDI deals with MAC-related errors as follows:

• When MAC frame errors reach a certain threshold, a Frame Error Condition is generated. Because the actual error can be further upstream than the immediate connection, the connection remains intact.

• For a large network, the worst-case MACFrameErrorRatio is less than 0.1 percent. However, during network configuration, frame error ratios can reach 50 percent for short periods. When you detect a sustained frame error ratio of more than 0.1 percent, a problem exists between the station that is reporting the condition and the nearest upstream MAC.
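Because short spikes are expected during network configuration, a check on the frame error ratio should look for a sustained excess rather than a single high sample. The sketch below reads the limit as 0.1 percent and assumes, hypothetically, that three consecutive samples above the limit count as sustained; choose a sample count that suits your polling interval.

```python
def sustained_frame_errors(ratios_pct, limit_pct=0.1, min_samples=3):
    """Return True if the most recent samples all exceed the limit."""
    recent = ratios_pct[-min_samples:]
    return len(recent) == min_samples and all(r > limit_pct for r in recent)


# A brief spike during reconfiguration is ignored...
print(sustained_frame_errors([0.02, 50.0, 0.03, 0.02]))
# ...but a ratio that stays above the limit is reported.
print(sustained_frame_errors([0.02, 0.2, 0.3, 0.4]))
```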

Solving the Problem

To solve problems related to FDDI errors, fix the hardware, cabling, or congestion problem.

Identifying Ring Errors


Use Status Watch to monitor your FDDI devices for Warning or Critical alerts.

Status Watch

Use Status Watch to identify FDDI ring errors.

Follow these steps:

1. Monitor the FDDI Status tool for the currently selected device.

2. Determine whether Status Watch is reporting Elasticity Buffer Errors or a high percentage of Frame Errors, Frames Not Copied, or Link Error Rates for the currently selected device.

FDDI Ring Errors Reference

This section provides additional conceptual and problem analysis detail.

Elasticity Buffer Error Condition

The Elasticity Buffer Error condition occurs when a port's elasticity buffer overflows or underflows. This condition usually indicates that a port's hardware is not operating within the tolerances that the FDDI standard specifies. Look for the problem in the hardware of either the port that is reporting the condition or the immediately adjacent port.

Frame Error Condition

The Frame Error condition occurs when the percentage of frames that contain errors exceeds a preset threshold. When a device is an uplink to FDDI (that is, a device is transmitting onto FDDI), this type of condition indicates that the ring is saturated. The ring is out of buffer space, and packets are being dropped from the device's backbone port.

The problem indicated by the frame errors is usually located between the MAC that reports the condition and its upstream neighbor. Because many physical connections can lie along this path, the MACFrameErrorRatio variable identifies only the two MACs between which the problem is occurring.

Frames Not Copied Condition

The Frames Not Copied condition occurs when the percentage of frames that are dropped because of insufficient buffer space exceeds a preset threshold. This condition indicates that the station is congested and is unable to process frames as quickly as they arrive. To help eliminate congestion:

• Add more capacity to the station
• Reconfigure your network so that end stations that communicate heavily with one another are on the same bridge or switch
• Filter out certain traffic

Link Error Condition

The Link Error condition occurs when a port detects link errors at a rate that exceeds a preset threshold. When the Link Error threshold is exceeded, the station removes itself from the ring and tries to reinsert itself on the ring. This action creates a MAC Neighbor Change Event (which also occurs if a ring wraps).

Link errors may indicate an FDDI PHY hardware problem (such as a faulty transmitter) or a faulty cable or connector. Look for the problem in the portion of the network between the port that is reporting the condition and the first upstream transmitter.

MAC Neighbor Change Event

The MAC Neighbor Change event occurs when a MAC's upstream or downstream neighbor changes.

This event indicates either:

• A network reconfiguration
• Another station that is leaving or joining the ring

Network File Server Timeouts

Use these sections to identify and correct timeouts on network file servers:

• Network File Server Timeout Overview
• Looking for Obvious Errors
• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

See Network File Server Timeouts Reference for additional conceptual and problem analysis detail.

A network file server can time out if your network gets congested or if your server is having problems. Users might have problems downloading data from or to the server or copying files from or to the server. To help you understand the troubleshooting process for this type of problem, an EXAMPLE throughout this section follows the symptoms, analysis, and resolution of a typical file server timeout problem.


Understanding the Problem

When users log in, their stations make network file server calls either to determine quotas (if this feature has been enabled) or to mount user home directories. Network file server timeout messages, even when spread across multiple nodes, indicate a problem either with the network or with a server.

EXAMPLE: UNIX users notice that it takes a long time - over 30 seconds in some cases - to log in to any machine. Some machines report network file server timeout messages, but the messages have no obvious pattern and are infrequent. You begin to get a sense of the problem.

Identifying the Problem

First, rule out the obvious causes. Ask these questions:

• Can you access the network file server with Telnet?
• Have any alarms been triggered?
• Are there any new errors?

The process of identifying the problem is developed in Looking for Obvious Errors.

Solving the Problem

To determine the cause, reproduce the fault while you monitor the network. After you know the cause, you can fix the problem.

The solutions to the network file server timeout are identified in these sections:

• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

Looking for Obvious Errors

To look for obvious errors, use these applications:

• Ping and Telnet - To check connectivity to the network file server nodes
• LANsentry Manager Alarms View - To search for triggered alarms
• LANsentry Manager Statistics View - To look for errors
• LANsentry Manager History View - To identify trends


Ping and Telnet

Determine whether you can contact network file server nodes using Ping and Telnet. If the response is extremely slow, then a problem may exist with the connections to the nodes. No delay indicates that the connections are normal, implying that the delay is occurring elsewhere. In this case, use LANsentry® Manager tools to determine whether packets are being lost or ignored.
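If you want to script this check rather than run Telnet by hand, a minimal sketch using Python's standard socket module is shown below. It only measures how long a TCP connection takes to open; port 23 (Telnet) is the assumed default, and the function name is illustrative.

```python
import socket
import time


def tcp_connect_time(host, port=23, timeout=5.0):
    """Return the seconds taken to open a TCP connection, or None on failure."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return None
```

Call the function with the name or address of each file server node. A result of None, or a time close to the timeout, suggests a connection problem; a consistently small value suggests that the delay lies elsewhere.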

LANsentry Manager Alarms View

Using the LANsentry Manager Alarms View, you can determine if any configured alarms have been triggered.

Search the Alarms View to see if any MAC events have been logged.

EXAMPLE: MAC events have not been logged for the network to which the UNIX users are attached.

Even though no alarms have occurred, errors may exist. For example, a low rate of background errors may exist just below the alarm threshold. Because RMON alarms are based on maximum and minimum values, they may miss constant, periodic, or low amounts of errors.

LANsentry Manager Statistics View

Using the LANsentry Manager Statistics View, you can display a multisegment graph of utilization and error statistics.

Follow these steps:

1. Set up a graph that shows utilization and errors on all your major segments.

2. Determine whether any segments are particularly busy or error prone.

EXAMPLE: You notice that one segment of the UNIX network, HUB3, is reporting Too Long Errors and FCS Errors roughly every second sample. While the amount is not higher than normal, it is currently higher than on any other segment.

LANsentry Manager History View

Using the LANsentry Manager History View, you can display a rolling history table to determine if the errors that you are seeing are new. For example, if you have a history table that runs for 30-minute samples over two days, you can compare the most recent sample to a previous sample, looking for new errors. If your probe has the resources, use a much finer resolution sample stored for a shorter time (every 30 seconds for 2 hours) to more easily spot recent errors.

EXAMPLE: You see that the history table shows that no error rates remained constant throughout the day. However, errors that did occur were on the device HUB3 and were Too Long Errors and FCS Errors.
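The two history configurations mentioned above differ in how many samples the probe must store. The arithmetic can be checked with a short sketch; the function name is illustrative.

```python
def samples_needed(interval_seconds, duration_seconds):
    """Number of samples a history table holds for a given interval and span."""
    return duration_seconds // interval_seconds


coarse = samples_needed(30 * 60, 2 * 24 * 3600)   # 30-minute samples over two days
fine = samples_needed(30, 2 * 3600)               # 30-second samples over two hours

print(coarse, fine)   # prints "96 240"
```

The finer table needs more than twice as many samples even though it covers a much shorter period, which is why it depends on the probe having spare resources.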

Reproducing the Fault While Monitoring the Network

Although the RMON View in LANsentry Manager can show error rates and help you to identify the location of the problem, it may not provide enough data to solve the problem. To determine the cause of the problem, reproduce it while you monitor the network by using these applications:

• LANsentry Manager Top-N Graph - To locate a quiet node to use for reproducing the fault

• LANsentry Manager Packet Capture - To capture packets from the hub to which the quiet node is attached

• LANsentry Manager Packet Decode - To analyze the packets to assess network file server traffic and delays

• Address Tracker - To find the location of the problem nodes

EXAMPLE: Using LANsentry Manager, you find a hub on the network with a higher than normal error rate. However, the error rate does not seem high enough to cause login delays of 60 seconds or more.

LANsentry Manager Top-N Graph

Using the Top-N graph in the LANsentry Manager main window, locate a quiet node that has been showing the same problem. Choose a quiet node so that you do not receive excessive traffic when you try to isolate the problem.

EXAMPLE: You see that the node Monolith, which has the same Network File System (NFS) mounts as the other nodes on the network, is quiet. You decide to use this node for reproducing the fault. See Network File System (NFS) Protocol for more information about NFS.
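The selection rule (the quietest node among those that show the problem) can be sketched as below. The packet counts and the node names other than Monolith are made up for illustration; in practice the counts would come from the Top-N graph.

```python
def quietest_affected_node(packet_counts, affected_nodes):
    """Return the affected node with the lowest packet count, or None."""
    candidates = {node: count for node, count in packet_counts.items()
                  if node in affected_nodes}
    if not candidates:
        return None
    return min(candidates, key=candidates.get)


# Hypothetical packet counts per node over the sampling period.
counts = {"Monolith": 120, "Obelisk": 9000, "Pillar": 50}
affected = {"Monolith", "Obelisk"}   # nodes that show the login delay

print(quietest_affected_node(counts, affected))   # prints "Monolith"
```

Pillar is quieter still, but it is excluded because it does not show the problem.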

LANsentry Manager Packet Capture

Using the LANsentry Manager Packet Capture application, capture packets from the network using predefined patterns and start-and-stop conditions.

Follow these steps:

1. Set up a capture buffer on a probe that is connected to the same hub as the quiet node. Until you know more about the problem, set a very general filter.

EXAMPLE: You select a MAC-layer filter and set a conversation filter to capture all packets to and from Monolith.

2. Telnet into and log out from the quiet node. Then reset the capture buffer. Repeat this procedure until you see the problem reflected in your captured data. To keep the buffer information clean, reset the buffer each time that you repeat the procedure.

3. When you see the delay, note the rough value of the packet count on the LANsentry Manager packet buffer.

By noting the packet count at which you think the delay has occurred, you can narrow the problem to within about 20 packets in the buffer. If you have used an extremely quiet node, you may even identify the exact packet.

LANsentry Manager Packet Decode

The LANsentry Manager Packet Decode application decodes all major protocols and displays the packet contents at three levels of detail: summary information, header information, and actual packet content.

Follow these steps:

1. Open the buffer in the Packet Decode application and locate the number of the packet at which the delay occurred.

2. Select the packet and launch a MAC-layer conversation filter. In the filter display, look for a gap in the conversation (that is, where the node sent a request and then resent it at approximately the same rate as the delay you experienced when recreating the problem).

3. Repeat the test to determine whether the result concentrates on one node or appears on other nodes.

EXAMPLE: On the quiet node that you selected, the delay is obvious. You see an NFS request going out to a node and a repeat of the request 30 seconds later. During that time, the node did not respond. You now know that the delay occurred because nodes were not seeing responses for NFS requests. When you repeat the test on other nodes, you find that the delay is happening with more than one destination node.
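
The gap search in step 2 can also be done outside the decode tool if the capture is exported. The following is a minimal Python sketch, assuming a hypothetical export in which each packet is a (timestamp, source, destination, request identifier) tuple; the tuple layout and the function name find_retry_gaps are illustrative assumptions, not part of LANsentry Manager.

```python
def find_retry_gaps(packets, min_gap=1.0):
    """Find requests that were sent again after a pause of at least min_gap seconds.

    packets: iterable of (timestamp_seconds, src, dst, request_id) tuples.
    This layout is a hypothetical, simplified export of a capture buffer.
    Returns a list of ((src, dst, request_id), first_time, retry_time).
    """
    last_seen = {}
    gaps = []
    for ts, src, dst, request_id in sorted(packets):
        key = (src, dst, request_id)
        # A repeat of the same request after a long pause suggests that
        # the first copy never received a reply.
        if key in last_seen and ts - last_seen[key] >= min_gap:
            gaps.append((key, last_seen[key], ts))
        last_seen[key] = ts
    return gaps


if __name__ == "__main__":
    capture = [
        (0.00, "monolith", "server", "nfs-read-1"),
        (30.00, "monolith", "server", "nfs-read-1"),  # retry after 30 s
    ]
    for key, first, retry in find_retry_gaps(capture):
        print(key, "retried after", retry - first, "seconds")
```

A retry interval close to the delay that users report, as in the 30-second example above, points to a lost reply rather than a slow server.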

Address Tracker

Use Address Tracker, which polls managed devices, to determine the hubs to which the problem nodes are attached. If the problem end stations are located on unmanaged devices, then you can at least narrow the problem to those unmanaged devices.

EXAMPLE: Although your network does not have managed hubs that Transcend® NCS management software can poll, it does have managed switches. When it polls the switches, Address Tracker displays the switch ports on which addresses were last seen. This information indicates the hub (but not the hub port) on which the device is located.

LANsentry Manager Packet Decode

After you know the location of the hub that has the problem node, monitor the problem from the hub using LANsentry Manager Packet Decode.

Follow these steps:

1. To capture packets from one of the nodes on the hub, set up another capture buffer and repeat the exercise that is described in LANsentry Manager Packet Capture. Because a delay may occur on a different node, use two capture buffers without stopping the first one.

Note the rough packet count where the delay appears.

2. Display a conversation filter of the packet where the delay appears and look for the gap in the conversation.

EXAMPLE: You hope that the nodes are on the same hub. You find that all the nodes are on HUB3. This result indicates that FCS Errors may be causing the timeouts. However, because the errors occur at a low rate, you decide to verify this diagnosis. You monitor the problem from the hub, logging in and out many times, and the delay eventually occurs. This time, the delay shows that the node's reply had an FCS Error, even though the node received the request. The switch would not have transmitted this packet, causing a timeout on the NFS protocol. The retry time is presumably 30 seconds. During this test, you see the problem occurring on another node.

Correcting the Fault

Without a managed hub, you may find it very difficult to discover network file server timeout errors. To find the problematic node, you must either systematically isolate nodes by monitoring each node for a prolonged period or temporarily insert a managed hub.

EXAMPLE: You notice that the captured error packet failed FCS because it was corrupted by a regular pattern during transmission. A possible reason for this occurrence is a Jabbering node. This explanation makes sense because FCS/Jabber frames increased linearly when you were monitoring the live network.

Network File Server Timeouts Reference

This section explains terms that are relevant to network file server timeouts and provides additional conceptual and problem analysis detail.

Jabbering

Jabbering occurs when a node transmits illegal-length packets and is possibly not operating within carrier specifications. In effect, another node has written bad data over a valid packet. This bad data is often interpreted as a repeated sequence of data.

Network File System (NFS) Protocol

A distributed file system protocol developed by Sun Microsystems that allows a computer system to access files over a network as if they were on its local disks. This protocol has been incorporated into products by more than 200 companies. It is now a de facto Internet standard.

NFS is one protocol in the NFS suite of protocols, which includes NFS, RPC, XDR (External Data Representation), and others. These protocols are part of a larger architecture that Sun Microsystems refers to as Open Network Computing (ONC). ONC is a distributed applications architecture designed by Sun and currently controlled by a consortium led by Sun.

Troubleshooting TCP/IP - Detailed Steps

This article shows how to troubleshoot TCP/IP connectivity between computers on a Windows network. If you haven't already done so, disable XP's Internet Connection Firewall on all local area network connections and remove all firewall programs on the network. Improperly configured firewalls are the most common cause of TCP/IP problems.

Open a Command Prompt Window

For many of these steps, you'll be typing at the command prompt. To open a command prompt window in Windows 2000 or XP, click Start | Run, type cmd in the box, and click OK. To open a command prompt window in Windows 95, 98, or Me, click Start | Run, type command in the box, and click OK. Type one command per line and press Enter after each one to execute it. To close the command prompt window, use the exit command.

Determine the TCP/IP Settings

Determine the TCP/IP settings of each computer on the local area network. In XP, open the Network Connections folder, right-click the LAN connection, and click Status | Support | Details. For example, here are the Status and Details views for the LAN connection on an Internet Connection Sharing host.

In Windows 95/98/Me, click Start | Run, type winipcfg in the box, and click OK. Select the LAN adapter from the menu and click More Info. Here's the winipcfg view for an ICS client running Windows Me.

You can also see the TCP/IP settings from the command prompt. This is especially convenient if a computer has more than one network adapter. Use the ipconfig /all command, which is available in all versions except Windows 95. The output from this command can be long, so it's best to write it to a file. Specify the file name in the command this way:

ipconfig /all >ipconfig.txt

Description of TCP/IP Settings

Here are the TCP/IP settings that are used in network troubleshooting:

• IP Address - Unique address assigned to a network adapter. A computer with multiple network adapters has an IP address for each one, and each one must be in a different subnet.

• Subnet Mask - Used in conjunction with the IP address to determine which subnet an adapter belongs to. At the simplest level, communication is only possible between two network adapters when they're in the same subnet.

• Default Gateway - IP address of a computer or router, on one of this computer's local area networks, that knows how to communicate with subnets not present on this computer. For an Internet connection, the default gateway is a router belonging to your Internet service provider, and all access to sites on the Internet goes through it. For an ICS client, the default gateway is the ICS host. If you use a hardware router, it serves as the default gateway.

• DHCP Server - If an adapter is configured to obtain an IP address automatically, this is the address of the server that provides it. It could be your ISP, an ICS host, or a hardware router.

• DNS Servers - IP address of one or more Domain Name Server computers. DNS servers translate Internet names to their IP addresses (like 63.146.109.227).
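
The subnet test described under Subnet Mask can be checked by hand or with a short script. Below is a minimal Python sketch using the standard ipaddress module; the function name same_subnet and the 255.255.255.0 mask are illustrative assumptions, since the mask in use depends on your network.

```python
import ipaddress


def same_subnet(ip_a, ip_b, mask):
    """Return True if both addresses fall in the same subnet for the given mask."""
    # strict=False lets us pass a host address; the host bits are masked off.
    net_a = ipaddress.ip_network(f"{ip_a}/{mask}", strict=False)
    net_b = ipaddress.ip_network(f"{ip_b}/{mask}", strict=False)
    return net_a == net_b


if __name__ == "__main__":
    # Example addresses; the mask 255.255.255.0 is an assumed value.
    print(same_subnet("192.168.1.101", "192.168.1.123", "255.255.255.0"))
    print(same_subnet("192.168.1.101", "192.168.2.5", "255.255.255.0"))
```

If the function returns False for two adapters that should talk directly, traffic between them has to pass through a gateway.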

Subnets

See our article on subnets for a brief description of how they work. For more details, see this Microsoft Knowledge Base article.

If two computers are supposed to be on the same subnet but aren't, something is wrong with the network hardware or software configuration. This is most likely to happen when one of them receives an IP address of 169.254.x.x, which indicates that:

• It's configured to obtain an IP address automatically
• It couldn't find a DHCP server on the network to make the assignment
• Windows assigned it an Automatic Private IP Address

See our article on Specific Networking Problems and Their Solutions for more information.
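
An Automatic Private IP Address is easy to spot programmatically because it always falls in the 169.254.0.0/16 block. The following is a minimal Python sketch; the function name is_apipa is an illustrative assumption.

```python
import ipaddress

# The Automatic Private IP Addressing block (IPv4 link-local range).
APIPA_NETWORK = ipaddress.ip_network("169.254.0.0/16")


def is_apipa(address):
    """Return True if the address is an Automatic Private IP Address."""
    return ipaddress.ip_address(address) in APIPA_NETWORK


if __name__ == "__main__":
    for addr in ("169.254.10.20", "192.168.1.101"):
        print(addr, "APIPA" if is_apipa(addr) else "not APIPA")
```

An adapter that reports such an address most likely never reached a DHCP server, so check the cabling and the DHCP source before anything else.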

Pinging

The ping command is the basic tool for testing TCP/IP connectivity. It sends a special packet (called ICMP Echo) to a particular IP address and looks for a reply. If everything is working right, the reply comes back. If not, the ping times out in a few seconds. By default, the ping command repeats the process four times. Here's an example of an ICS client computer pinging a Windows XP Home Edition ICS host, using the host's IP address and its computer name.

When ping fails, you'll see one of these error messages:

• Request timed out - The IP address is valid, but there's no reply from it. If the IP address is on a local area network, the most likely cause is a firewall program blocking the ping.

• Unknown host <name> or Ping request could not find host <name> - The computer name doesn't exist on the local area network. Make sure that NetBIOS over TCP/IP is enabled.

• Destination host unreachable - The IP address isn't on a local area network, and the default gateway can't access it. Either there's no default gateway, its address is wrong, or it isn't functioning.

Pinging the Local Area Network

Here is a series of ping commands to use in finding where a problem occurs on a local area network. Run them in the order shown, and don't go on to the next command until all of the previous commands work properly. In this example:

• The computer being tested is named Winxp, with IP address 192.168.1.101
• There's another computer on the network named Win98, with IP address 192.168.1.123

Substitute the appropriate IP addresses and computer names for your network.

Command | Target | What Ping Failure Indicates
ping 127.0.0.1 | Loopback address | Corrupted TCP/IP installation
ping localhost | Loopback name | Corrupted TCP/IP installation
ping 192.168.1.101 | This computer's IP address | Corrupted TCP/IP installation
ping winxp | This computer's name | Corrupted TCP/IP installation
ping 192.168.1.123 | Another computer's IP address | Bad hardware or NIC driver
ping win98 | Another computer's name | NetBIOS name resolution failure

To fix a corrupted TCP/IP installation on Windows XP, follow the steps in this Microsoft Knowledge Base article. For Windows 95/98/Me, un-install the TCP/IP protocol in Control Panel | Network, reboot, and re-install it. If that doesn't fix it, use this procedure on Windows 95 or 98.
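
The ordered ping sequence above lends itself to a small script that stops at the first failure and reports what that failure indicates. The following is a minimal Python sketch; the names LAN_CHECKS, first_failure, and system_ping are illustrative assumptions, and the addresses and computer names are the example values, to be replaced with your own. Treating a zero exit code as success is also an assumption, because some ping implementations return zero even when the reply is "Destination host unreachable".

```python
import platform
import subprocess

# (target, what a failure indicates), in the order the checks should be run.
LAN_CHECKS = [
    ("127.0.0.1", "Corrupted TCP/IP installation"),
    ("localhost", "Corrupted TCP/IP installation"),
    ("192.168.1.101", "Corrupted TCP/IP installation"),
    ("winxp", "Corrupted TCP/IP installation"),
    ("192.168.1.123", "Bad hardware or NIC driver"),
    ("win98", "NetBIOS name resolution failure"),
]


def system_ping(target):
    """Send one echo request with the operating system's ping command."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    result = subprocess.run(
        ["ping", count_flag, "1", target],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,
    )
    # Assumption: exit code 0 means a reply was received.
    return result.returncode == 0


def first_failure(checks, ping):
    """Run checks in order; return (target, meaning) for the first failure, or None."""
    for target, meaning in checks:
        if not ping(target):
            return target, meaning
    return None


if __name__ == "__main__":
    failure = first_failure(LAN_CHECKS, system_ping)
    print("All checks passed" if failure is None else f"{failure[0]}: {failure[1]}")
```

Passing the ping function as a parameter keeps the ordering logic separate from the platform-specific command, so the same loop can be reused for the Internet checks.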

Pinging the Internet

You can also use ping to find a problem with Internet access. Run these commands in the order shown, and don't go on to the next command until all of the previous commands work properly. Use the Default Gateway and DNS Server addresses that you got from the winipcfg or ipconfig /all command.

Command | Target | What Ping Failure Indicates
ping w.x.y.z | Default Gateway | Default Gateway down
ping w.x.y.z | DNS Server | DNS Server down
ping w.x.y.z | Web site IP address | Internet service provider or web site down
ping www.something.com | Web site name | DNS Server down or web site down

REFERENCE

WWW.PRACTICALLYNETWORK.COM

WWW.COMPUTERHOPE.COM

WWW.WIKIPEDIA.ORG

THANK YOU

  • Basic Network Troubleshooting
  • Basic network troubleshooting
    • Issue
    • Cause
    • Solution
      • Verify connections LEDs
      • Adapter resources
      • Adapter functionality
      • Protocol
      • Firewall
      • Additional time
      • Additional troubleshooting
        • Network Troubleshooting Overview
          • About Connectivity Problems
          • About Performance Problems
          • Solving Connectivity and Performance Problems
          • Recognizing Symptoms
            • User Comments
            • Network Management Software Alerts
            • Analyzing Symptoms
              • Understanding the Problem
              • Identifying and Testing the Cause of the Problem
                • Sample Problem Analysis
                • Equipment for Testing
                  • Solving the Problem
                    • Your Network Troubleshooting Toolbox
                      • Transcend Central
                      • Status Watch
                        • Web Reporter
                          • Address Tracker
                          • LANsentry Manager
                          • Traffix Manager
                          • Device View
                          • Ping
                            • Strategies for Using Ping
                            • Tips on Interpreting Ping Messages
                              • Telnet
                              • FTP and TFTP
                              • Analyzers
                              • Probes
                              • Cable Testers
                              • Knowing Your Networks Configuration
                                • Site Network Map
                                • Device Configuration Information
                                • Other Important Data About Your Network
                                  • Identifying Your Networks Normal Behavior
                                    • Baselining Your Network
                                    • Identifying Background Noise
                                    • Identifying Background Noise
                                        • FDDI Connectivity
                                          • Understanding the Problem
                                          • Identifying the Problem
                                          • Solving the Problem
                                          • Status Watch
                                          • Implementing Dual Homing
                                          • Installing an Optical Bypass Unit
                                          • Peer Wrap Condition
                                          • Twisted Ring Condition
                                          • Undesired Connection Attempt Event
                                          • Address Tracker
                                          • Device View
                                          • Broadcast Packets
                                          • Multicast Packets
                                            • Duplicate Addresses
                                              • Understanding the Problem
                                              • Identifying the Problem
                                              • Solving the Problem
                                              • Status Watch
                                              • Address Tracker
                                              • LANsentry Manager
                                              • Duplicate MAC Addresses
                                              • Duplicate IP Addresses
                                                • Ethernet Packet Loss
                                                  • Understanding the Problem
                                                  • Identifying the Problem
                                                  • Solving the Problem
                                                  • Status Watch
                                                  • LANsentry Manager Network Statistics Graph
                                                  • Device View
                                                  • Alignment Errors
                                                  • Collisions
                                                  • CRC Errors
                                                  • Excessive Collisions
                                                  • FCS Errors
                                                  • Late Collisions
                                                  • Nonstandard Ethernet Problems
                                                  • Receive Discards
                                                  • Too Long Errors
                                                  • Too Short Errors
                                                  • Transmit Discards
                                                    • FDDI Ring Errors
                                                      • Understanding the Problem
                                                      • Identifying the Problem
                                                      • Solving the Problem
                                                      • Status Watch
                                                      • Elasticity Buffer Error Condition
                                                      • Frame Error Condition
                                                      • Frames Not Copied Condition
                                                      • Link Error Condition
                                                      • MAC Neighbor Change Event
                                                        • Network File Server Timeouts
                                                          • Understanding the Problem
                                                          • Identifying the Problem
                                                          • Solving the Problem
                                                          • Ping and Telnet
                                                          • LANsentry Manager Alarms View
                                                          • LANsentry Manager Statistics View
                                                          • LANsentry Manager History View
                                                          • LANsentry Manager Top-N Graph
                                                          • LANsentry Manager Packet Capture
                                                          • LANsentry Manager Packet Decode
                                                          • Address Tracker
                                                          • LANsentry Manager Packet Decode
                                                          • Jabbering
                                                          • Network File System (NFS) Protocol
                                                          • Troubleshooting TCPIP - Detailed Steps
                                                          • Open a Command Prompt Window
                                                          • Determine the TCPIP Settings
                                                          • Description of TCPIP Settings
                                                          • Subnets
                                                          • Pinging
                                                          • Pinging the Local Area Network
                                                          • Pinging the Internet
Page 43: 11591337-Basic-Network-Troubleshooting

3. Determine whether the thresholds of the Receive Discards and Transmit Discards tools are being exceeded.

If these errors are high in conjunction with the data that you learned in steps 1 and 2, then your network is overloaded. Segment your network.

LANsentry Manager Network Statistics Graph

Use the LANsentry Manager Network Statistics graph to view data for:

• Collisions
• Late Collisions
• Bandwidth Utilization
• CRC Errors
• Too Long Errors
• Too Short Errors

Follow these steps:

1. Display a Network Statistics graph for the local Ethernet segment on which users have reported poor performance.

This graph shows the most recent trend in Collision rates. If you have set up a History sample, you can also look at the historical trend. If a number of segments are connected by repeaters, examine the graph for each Ethernet segment.

2. Analyze Utilization and Collision rates to determine whether collisions are caused by an overloaded segment or a faulty component.

• If Utilization rates are high - The collisions are probably caused by an overloaded segment. If you have added nodes or new applications to your network, consider reconfiguring the cabling system, using bridges and routers to filter out remote collisions and to keep local traffic on one segment. This action should level the network load.

• If Utilization rates are stable and appear normal - The collisions are probably caused by faulty components. In this case, do the following:

    • If the network consists of repeaters - Compare the Network Statistics graphs for each segment connected to the repeater. Because repeaters repeat traffic across all connected segments (which makes many segments seem like one network), you should see similar levels of traffic on all segments. One segment that shows dissimilar levels of traffic and collisions may indicate faulty hardware. In this case, monitor several collisions to track the source station that is transmitting too soon after collisions, and repair the station. Packets that are transmitted too soon after collisions are unlikely to be valid. See Table 17 for more information about Collisions.

    • On other networks - Determine the segment cable length.

3. Examine the CRC Errors and Late Collisions, which often indicate cabling or component problems.

Table 16 identifies the problems that CRC Errors can indicate and your possible actions. Table 18 identifies the problems that Late Collisions data can indicate and your possible actions.

Table 18. Late Collisions Data

Possible Problem: Cabling problems

• Segment too long
• Failing cable
• Segment not grounded properly (noise)
• Improper termination
• Taps too close (10BASE-5 and 10BASE-2 only)
• Noisy cable

Possible Action: Correct the cabling problem by doing one or more of the following:

• Reduce the segment length
• Replace the cable
• Ground the cable
• Terminate the cable correctly
• Check the taps
• Check for cables too close to equipment that emits electromagnetic interference

Possible Problem: Component problems

• Deaf or partially deaf node
• Failing repeater, transceiver, or controller cards

Possible Action: Correct the component problem by doing one of the following:

• Trace the failing component and replace it
• Replace the NIC or the transceiver

4. Trace Too Short Errors and Too Long Errors to the sender.

These errors often indicate faulty routers or LAN drivers and transceiver problems. Table 19 identifies the problems that this data can indicate and your possible actions.

Possible Problem: A transceiver on your network is adding bits to the packets that are transmitted by the attached station.

Possible Action:

1. Use a network analyzer to identify the problematic transceiver.
2. If necessary, replace the transceiver, network adapter, or station.

Possible Problem: The jabber protection mechanism on a transceiver has failed; it can no longer protect the network from the jabbering produced by the attached station.

Possible Action: Replace the network card.

Possible Problem: Excessive noise on the cable.

Note: Some 10/100 Mbps cards that autodetect the network speed may connect to the network at the wrong speed, causing excessive noise.

Possible Action: Check for improper cabling, faulty cable, faulty network equipment, or cables too close to noisy electronic equipment (lamps, for example).

If your network card autodetects the network speed and you have ruled out other problems, manually configure the network speed.

Possible Problem: Faulty routers (two different network types are connected, and the router is not enforcing proper frame size restrictions).

Possible Action: Notify the manufacturer.

Possible Problem: Faulty LAN driver.

Possible Action: Replace the driver.

Possible Problem: A normal condition on a LinkSwitch® 1000, LinkSwitch® 3000, or CoreBuilder 5000 FastModule.

If you use maximum-sized 1518-byte Ethernet frames, the devices' VLT-enabled ports add a frame tag of 4 bytes, resulting in a misleading Too Long Frame error.

Possible Action: These frames are passed successfully but will create the Too Long Frame error message.

If you want to eliminate the error message, reduce your Ethernet packet frames by 4 bytes.

Device View

Device View allows you to display a variety of port and device-level statistics relevant to Ethernet packet loss. Table 20 describes these statistics and their use in troubleshooting.

Table 20. Activity and Error Statistics in Device View

Statistics Group: Activity

Description: Displays the total network activity and errors on the selected port.

Use in Troubleshooting: This data shows readable packets, broadcast packets, Collisions, total errors, and runts, which cause Too Short Errors. You can interpret this data in the following ways:

• The presence of runts can often be caused by Collisions; however, if the values increase at specific times of the day, it may indicate you need to change the network topology to manage the traffic more efficiently (for example, with switches or routers).

• Runts can also be caused by a badly terminated coax cable.

• Large numbers of runts not associated with high levels of collisions can indicate a transmission problem (examine the cable).

• Particularly high numbers of Collisions compared to the total number of readable packets can point to a hardware problem (a bad adapter) or to a data loop.

• A high proportion of Broadcast Packets (>10%) on a heavily utilized network (>50% of available bandwidth) can point to an incorrectly configured bridge or router on the network.

Statistics Group: Errors

Description: Displays the number of frames with errors on the selected port.

Use in Troubleshooting: The significance of errors depends on accompanying errors and prevailing network conditions. See the following error data for more information:

• Alignment Errors: Table 16
• FCS Errors: Table 16
• Too Long Errors: Table 19
• Too Short Errors, or runts: Table 19
• Late Collisions: Table 18

To display Activity and Errors statistics for a device or port, follow these steps:

1. Select the required port or device.

2. From the shortcut menu, select Activity or Errors.

The statistics available depend on the type of port or device selected. See Table 20 for troubleshooting information.

Ethernet Packet Loss Reference

This section explains terms that are relevant to Ethernet packet loss and provides additional conceptual and problem analysis detail.

Alignment Errors

An Alignment Error indicates a received frame in which both of the following are true:

• The number of bits received is an uneven byte count (that is, not an integral multiple of 8)

• The frame has a Frame Check Sequence (FCS) error

Alignment Errors often result from MAC layer packet formation problems, cabling problems that cause corrupted or lost data, and packets that pass through more than two cascaded multiport transceivers. See FCS Errors for more information about interpreting Alignment Errors.

Collisions

Collisions indicate that two devices detect that the network is idle and try to send packets at exactly the same time (within one round-trip delay). Because only one device can transmit at a time, both devices must stop sending and attempt to retransmit. Collisions are detected by the transmitting stations.

The retransmission algorithm helps to ensure that the packets do not retransmit at the same time. However, if the two devices retry at nearly the same time, packets can collide again; the process repeats until either the packets finally pass onto the network without collisions or 16 consecutive collisions occur and the packets are discarded.
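
The retransmission algorithm is not spelled out above. On standard Ethernet it is truncated binary exponential backoff, which the following minimal Python sketch illustrates. The constants are the usual values for 10 Mbps Ethernet as I understand them (a slot time of 512 bit times and a backoff exponent capped at 10); the function name backoff_slots is an illustrative assumption.

```python
import random

SLOT_TIME_US = 51.2   # 512 bit times at 10 Mbps (assumed standard value)
MAX_ATTEMPTS = 16     # after 16 consecutive collisions the packet is discarded
BACKOFF_LIMIT = 10    # the random range stops doubling after 10 collisions


def backoff_slots(collision_count, rng=random):
    """Return the number of slot times to wait after the given consecutive collision.

    Returns None when the packet must be discarded.
    """
    if collision_count >= MAX_ATTEMPTS:
        return None
    exponent = min(collision_count, BACKOFF_LIMIT)
    # Pick a random delay in the range 0 .. 2**exponent - 1 slots.
    return rng.randrange(2 ** exponent)


if __name__ == "__main__":
    for n in (1, 2, 3, 10, 15, 16):
        slots = backoff_slots(n)
        if slots is None:
            print(f"collision {n}: packet discarded")
        else:
            print(f"collision {n}: wait {slots * SLOT_TIME_US:.1f} microseconds")
```

Because each station draws its delay independently from a range that doubles with every collision, two stations that collided become less and less likely to retry in the same slot.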

CRC Errors

A Cyclic Redundancy Check (CRC) Error is an RMON statistic that combines FCS Errors and Alignment Errors. These errors indicate that packets were received with:

• A bad FCS and an integral number of octets (FCS Errors)
• A bad FCS and a non-integral number of octets (Alignment Errors)

CRC Errors can cause an end station to freeze. If a large number of CRC Errors are attributed to a single station on the network, replace the station's network interface board. Typically, a CRC Error rate of more than 1 percent of network traffic is considered excessive.
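
The two cases listed above and the 1 percent rule of thumb can be expressed in a few lines. The following is a minimal Python sketch; the function names are illustrative assumptions, and the counters would come from whatever monitoring tool you use.

```python
def classify_crc_error(bit_count, fcs_ok):
    """Classify a received frame as an FCS Error, an Alignment Error, or neither."""
    if fcs_ok:
        return None
    # A bad FCS with a whole number of octets is an FCS Error;
    # a bad FCS with leftover bits is an Alignment Error.
    return "FCS Error" if bit_count % 8 == 0 else "Alignment Error"


def crc_rate_excessive(crc_errors, total_packets, threshold=0.01):
    """Return True if CRC Errors exceed the threshold share of traffic (1 percent by default)."""
    return total_packets > 0 and crc_errors / total_packets > threshold


if __name__ == "__main__":
    print(classify_crc_error(bit_count=512, fcs_ok=False))
    print(classify_crc_error(bit_count=515, fcs_ok=False))
    print(crc_rate_excessive(crc_errors=150, total_packets=10000))
```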

Excessive Collisions

Excessive Collisions indicate that 16 consecutive collisions have occurred, usually a sign that the network is becoming congested. For each excessive collision count (or after 16 consecutive collisions), a packet is dropped. If you know the normal rate of excessive collisions, then you can determine when the rate of packet loss is affecting your network's performance. See Knowing Your Network's Configuration for more information.

FCS Errors

Frame Check Sequence (FCS) Errors, a type of CRC Error, indicate that frames received by an interface are an integral number of octets long but do not pass the FCS check. The FCS is a mathematical way to ensure that all the frame's bits are correct without having the system examine each bit and compare it to the original. Packets with Alignment Errors also generate FCS Errors.
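
To illustrate how an FCS check works, here is a minimal Python sketch. It relies on the fact that the Ethernet FCS is a 32-bit CRC that uses the same polynomial as zlib.crc32. Storing the value least significant byte first follows the way software tools commonly represent it, and that byte order, like the function names, should be treated as an assumption of this sketch.

```python
import struct
import zlib


def append_fcs(frame_without_fcs: bytes) -> bytes:
    """Return the frame with a 4-octet CRC-32 check value appended."""
    fcs = zlib.crc32(frame_without_fcs) & 0xFFFFFFFF
    # Assumed byte order: least significant byte first.
    return frame_without_fcs + struct.pack("<I", fcs)


def fcs_ok(frame: bytes) -> bool:
    """Recompute the CRC over the frame body and compare it with the trailing 4 octets."""
    if len(frame) < 4:
        return False
    body, received = frame[:-4], frame[-4:]
    return struct.pack("<I", zlib.crc32(body) & 0xFFFFFFFF) == received


if __name__ == "__main__":
    frame = append_fcs(bytes(60))            # 60 octets of data plus 4 octets of FCS
    damaged = bytes([frame[0] ^ 0x01]) + frame[1:]
    print(fcs_ok(frame), fcs_ok(damaged))
```

The receiver never compares the frame with the original; it only recomputes the check value, which is why a single flipped bit anywhere in the frame is enough to produce an FCS Error.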

Both Alignment Errors and FCS Errors can be caused by equipment powering up or down or by interference (noise) on unshielded twisted-pair (10BASE-T) segments. In a network that complies with the Ethernet standard, FCS or Alignment Errors indicate bit errors during a transmission or reception. A very low rate is acceptable. Although Ethernet allows a 1 in 10^8 bit error rate, typical Ethernet performance is 1 in 10^12 or better.
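
To put these bit error rates in terms of frames, the following minimal Python sketch estimates the chance that a maximum-sized frame contains at least one bad bit. It assumes that bit errors are independent, which real noise bursts often are not, so treat the result as a rough guide; the function name is an illustrative assumption.

```python
import math


def frame_error_probability(bit_error_rate, frame_octets=1518):
    """Probability that a frame of the given size has at least one bit error.

    Assumes independent bit errors: p = 1 - (1 - BER) ** bits.
    """
    bits = frame_octets * 8
    # expm1 and log1p keep precision when the bit error rate is very small.
    return -math.expm1(bits * math.log1p(-bit_error_rate))


if __name__ == "__main__":
    for ber in (1e-8, 1e-12):
        p = frame_error_probability(ber)
        print(f"BER {ber:g}: about 1 bad frame in {1 / p:,.0f}")
```

Under this assumption, the allowed rate of 1 in 10^8 corresponds to roughly one damaged maximum-sized frame in every 8,000, while the typical rate of 1 in 10^12 corresponds to roughly one in 80 million.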

Late Collisions

Late Collisions indicate that two devices have transmitted at the same time, but cabling errors (most commonly, excessive network segment length or repeaters between devices) prevent either transmitting device from detecting a collision. Neither device detects a collision because the time to propagate the signal from one end of the network to the other is longer than the time to put the entire packet on the network. As a result, neither of the devices that cause the late collision senses the other's transmission until the entire packet is on the network.

Although late collisions occur for small packets, the transmitter cannot detect them. As a result, a network suffering measurable Late Collisions for large packets is losing small packets as well.

Nonstandard Ethernet Problems

Table 21 lists the symptoms that typically occur if a system violates the Ethernet standard.

Table 21. Symptoms of Common Ethernet Network Problems

Symptoms: FCS Errors and Alignment Errors increase significantly.

Problem: Network cabling is too long.

Notes: If you use a promiscuous network monitor, the number of Late Collisions reported by stations should correlate with the FCS and Alignment Errors reported by the monitor.

Symptoms: FCS and Alignment Errors increase proportionally with interference (sometimes referred to as noise hits).

Problem: Network segment is noisy.

Notes: Typically observed on a 10BASE-T network segment in a noisy environment. If you use multiple promiscuous monitors, the FCS and Alignment Errors among the monitors will not correlate.

If the monitor can track runts, also called Too Short Errors, the number of runt packets should be significantly higher than normal.

Symptoms: FCS and Alignment Errors are much higher than normal.

Problem: Networks do not conform to the access scheme of Carrier Sense Multiple Access with Collision Detect (CSMA/CD).

Notes: Occurs when some implementations of Ethernet in the segment are not entirely compatible with IEEE 802.3 repeaters.

Collision fragments linger on the network long enough to collide with retry packets at the minimum interpacket gap (IPG). The IPG is smaller on one side of the repeated network, causing a lost packet.

Ethernet controllers cannot receive packets that are separated by 4.7 µs or less. Some controllers cannot sustain receptions of packets separated by as much as 9.6 µs. If runt packets are received one after another and are followed by a collision fragment, Ethernet controllers that cannot sustain reception will lose packets.

Receive Discards

Receive Discards indicate that received packets could not be delivered to a high-layer protocol because of congestion or packet errors.

Too Long Errors

A Too Long Error indicates that a packet is longer than 1518 octets (including FCS octets) but otherwise well formed. Too Long Errors are often caused by a bad transceiver, a malfunction of the jabber protection mechanism on a transceiver, or excessive noise on the cable.

Too Short Errors

A Too Short Error, also called a runt, indicates that a packet is fewer than 64 octets long (including FCS octets) but otherwise well formed.
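The two length limits above can be expressed as a small Python sketch. The function name and the returned labels are hypothetical. The sketch looks at length only, so it assumes the frame is otherwise well formed, as in the definitions above.

```python
MIN_FRAME_OCTETS = 64    # including FCS octets
MAX_FRAME_OCTETS = 1518  # including FCS octets


def classify_frame_length(octets):
    """Label a frame by length alone: 'too_short' (runt), 'too_long', or 'ok'."""
    if octets < MIN_FRAME_OCTETS:
        return "too_short"
    if octets > MAX_FRAME_OCTETS:
        return "too_long"
    return "ok"


for length in (60, 64, 1518, 1600):
    print(length, classify_frame_length(length))
```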

Transmit Discards

Transmit Discards indicate that packets were not transmitted because of network congestion.

FDDI Ring Errors

Use these sections to identify and correct FDDI ring errors:

• FDDI Ring Errors Overview
• Identifying Ring Errors

Fiber Distributed Data Interface (FDDI) often corrects its own problems. However, because FDDI cannot correct all errors (especially those related to hardware problems), you should monitor FDDI errors.

Understanding the Problem

FDDI ring errors that you should monitor include:

• Elasticity Buffer Error Condition
• Frame Error Condition
• Frames Not Copied Condition
• Link Error Condition

Identifying the Problem

First, determine the type of FDDI ring errors and where they are occurring. Similar to the way you identify other FDDI problems, identify the upstream and downstream neighbors of the devices that you are monitoring.

Several types of network errors can cause FDDI performance problems. For example, problems with cables or physical connections may result in a link or frame error. Elasticity buffer (EB) errors can also lead to link and frame errors.

FDDI deals with port-related errors as follows:

• The variable PORTlerAlarm is the link error rate (LER) value at which a link connection generates an alarm. When the LER is greater than the alarm setting, Station Management (SMT) sends a Status Report Frame (SRF) to notify you that there is a problem with a port.

The PORTlerAlarm threshold is set lower than the PORTlerCutoff threshold so that you are notified of a problem before the port is actually removed from the ring.

• When link errors reach the threshold defined by the variable PORTLERCutoff, SMT breaks the connection, disabling the PHY that detected the problem. A Link Error Condition is also generated.

FDDI deals with MAC-related errors as follows:

• When MAC frame errors reach a certain threshold, a Frame Error Condition is generated. Because the actual error can be further upstream than the immediate connection, the connection remains intact.

• For a large network, the worst case MACFrameErrorRatio is less than 0.1 percent. However, during network configuration, frame error ratios can reach 50 percent for short periods. When you detect a sustained frame error ratio of more than 0.1 percent, a problem exists between the station that is reporting the condition and the nearest upstream MAC.
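The difference between a short spike and a sustained frame error ratio can be sketched in Python. This is only an illustration: it assumes the 0.1 percent threshold, it assumes you already have per-interval counts of error frames and total frames from your monitoring tool, and the choice of three consecutive polling intervals as "sustained" is an arbitrary assumption. All names are hypothetical.

```python
SUSTAINED_THRESHOLD_PERCENT = 0.1


def frame_error_percent(error_frames, total_frames):
    """Frame error ratio for one polling interval, as a percentage."""
    if total_frames == 0:
        return 0.0
    return 100.0 * error_frames / total_frames


def sustained_frame_errors(samples, threshold=SUSTAINED_THRESHOLD_PERCENT,
                           min_consecutive=3):
    """True if the ratio stays above the threshold for min_consecutive intervals.

    samples is a list of (error_frames, total_frames) tuples, one per interval.
    min_consecutive=3 is an assumed definition of 'sustained'.
    """
    run = 0
    for error_frames, total_frames in samples:
        if frame_error_percent(error_frames, total_frames) > threshold:
            run += 1
            if run >= min_consecutive:
                return True
        else:
            run = 0
    return False


# A brief spike during reconfiguration is ignored; a steady 0.2 percent is not.
print(sustained_frame_errors([(500, 1000), (0, 1000), (0, 1000)]))
print(sustained_frame_errors([(2, 1000), (2, 1000), (2, 1000)]))
```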

Solving the Problem

To solve problems related to FDDI errors, fix the hardware, cabling, or congestion problem.

Identifying Ring Errors

Use Status Watch to monitor your FDDI devices for Warning or Critical alerts.

Status Watch

Use Status Watch to identify FDDI ring errors.

Follow these steps:

1. Monitor the FDDI Status tool for the currently selected device.

2. Determine whether Status Watch is reporting Elasticity Buffer Errors or a high percentage of Frame Errors, Frames Not Copied, or Link Error Rates for the currently selected device.

FDDI Ring Errors Reference

This section provides additional conceptual and problem analysis detail.

Elasticity Buffer Error Condition

The Elasticity Buffer Error condition occurs when a port's elasticity buffer overflows or underflows. This condition usually indicates that a port's hardware is not operating within the tolerances that the FDDI standard specifies. Look for the problem in the hardware of either the port that is reporting the condition or of the immediately adjacent port.

Frame Error Condition

The Frame Error condition occurs when the percentage of frames that contain errors exceeds a preset threshold. In the situation when a device is an uplink to FDDI (that is, a device is transmitting onto FDDI), this type of condition indicates that the ring is saturated. The ring is out of buffer space, and packets are being dropped from the device's backbone port.

The problem indicated by the frame errors is usually located between the MAC that reports the condition and its upstream neighbor. Because many physical connections can lie along this path, the MACFrameErrorRatio variable identifies only the two MACs between which the problem is occurring.

Frames Not Copied Condition

The Frames Not Copied condition occurs when the percentage of frames that are dropped because of insufficient buffer space exceeds a preset threshold. This condition indicates that the station is congested and is unable to process frames as quickly as they arrive. To help eliminate congestion:

• Add more capacity to the station
• Reconfigure your network so that end stations that communicate heavily with one another are on the same bridge or switch
• Filter out certain traffic

Link Error Condition

The Link Error condition occurs when a port detects link errors at a rate that exceeds a preset threshold. When the Link Error threshold is exceeded, the station removes itself from the ring and tries to reinsert itself on the ring. This action creates a MAC Neighbor Change Event (which also occurs if a ring wraps).

Link errors may indicate an FDDI PHY hardware problem (such as a faulty transmitter) or a faulty cable or connector. Look for the problem in the portion of the network between the port that is reporting the condition and the first upstream transmitter.

MAC Neighbor Change Event

The MAC Neighbor Change event occurs when a MAC's upstream or downstream neighbor changes.

This event indicates either:

• A network reconfiguration
• Another station that is leaving or joining the ring

Network File Server Timeouts

Use these sections to identify and correct timeouts on network file servers:

• Network File Server Timeout Overview
• Looking for Obvious Errors
• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

See Network File Server Timeouts Reference for additional conceptual and problem analysis detail.

A network file server can time out if your network gets congested or if your server is having problems. Users might have problems downloading data from or to the server, or copying files from or to the server. To help you to understand the troubleshooting process for this type of problem, an EXAMPLE throughout this section follows the symptoms, analysis, and resolution of a typical file server timeout problem.

Understanding the Problem

When users log in, their stations make network file server calls either to determine quotas (if this feature has been enabled) or to mount user home directories. The network file server timeout messages, even when spread across multiple nodes, indicate a problem either with the network or with a server.

EXAMPLE: UNIX users notice that it takes a long time - over 30 seconds in some cases - to log in to any machine. Some machines report network file server timeout messages, but the messages have no obvious pattern and are infrequent. You begin to get a sense of the problem.

Identifying the Problem

First, rule out the obvious causes. Ask these questions:

• Can you access the network file server with Telnet?
• Have any alarms been triggered?
• Are there any new errors?

The process of identifying the problem is developed in Looking for Obvious Errors.

Solving the Problem

To determine the cause, reproduce the fault while you monitor the network. After you know the cause, you can fix the problem.

The solutions to the network file server timeout are identified in these sections:

• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

Looking for Obvious Errors

To look for obvious errors, use these applications:

• Ping and Telnet - To test connectivity to the network file server nodes
• LANsentry Manager Alarms View - To search for triggered alarms
• LANsentry Manager Statistics View - To look for errors
• LANsentry Manager History View - To identify trends

Ping and Telnet

Determine whether you can contact network file server nodes using Ping and Telnet. If the response is extremely slow, then a problem may exist with the connections to the nodes. No delay indicates that the connections are normal, implying that the delay is occurring elsewhere. In this case, use LANsentry® Manager tools to determine whether packets are being lost or ignored.
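A quick way to put a number on "extremely slow" is to time how long a TCP connection to the server takes. The following Python sketch is an illustration only: the function name is hypothetical, it defaults to the Telnet port (23) on the assumption that the server accepts Telnet connections, and it measures connection setup time, not login time.

```python
import socket
import time


def tcp_connect_time(host, port=23, timeout=5.0):
    """Return the seconds taken to open a TCP connection, or None if it fails."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return None
```

For example, tcp_connect_time("fileserver") would return a small fraction of a second on a healthy local network, where "fileserver" stands in for the name of one of your own server nodes.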

LANsentry Manager Alarms View

Using the LANsentry Manager Alarms View, you can determine if any configured alarms have been triggered.

Search the Alarms View to see if any MAC events have been logged.

EXAMPLE: MAC events have not been logged for the network on which the UNIX users are attached.

Even though no alarms have occurred, errors may exist. For example, a lower rate of background errors may exist just below the alarm threshold. Because they are based on maximum and minimum values, RMON alarms may miss constant, periodic, or low levels of errors.

LANsentry Manager Statistics View

Using the LANsentry Manager Statistics View, you can display a multisegment graph of utilization and error statistics.

Follow these steps:

1. Set up a graph that shows utilization and errors on all your major segments.

2. Determine whether any segments are particularly busy or error prone.

EXAMPLE: You notice that one segment of the UNIX network, HUB3, is reporting Too Long Errors and FCS Errors roughly every second sample. While the amount is not higher than normal, it is currently higher than any other segment.

LANsentry Manager History View

Using the LANsentry Manager History View, you can display a rolling history table to determine if the errors that you are seeing are new. For example, if you have a history table that runs for 30-minute samples over two days, you can compare the most recent sample to a previous sample, looking for new errors. If your probe has the resources, use a much finer resolution sample stored for a shorter time (every 30 seconds for 2 hours) to more easily spot recent errors.

EXAMPLE: You see that the history table shows that no error rates remained constant throughout the day. However, errors that did occur were on the device HUB3 and were Too Long Errors and FCS Errors.

Reproducing the Fault While Monitoring the Network

Although the RMON View in LANsentry Manager can show error rates and help you to identify the location of the problem, it may not provide enough data to solve the problem. To determine the cause of the problem, reproduce it while you monitor the network by using these applications:

• LANsentry Manager Top-N Graph - To locate a quiet node to use for reproducing the fault
• LANsentry Manager Packet Capture - To capture packets from the hub to which the quiet node is attached
• LANsentry Manager Packet Decode - To analyze the packets to assess network file server traffic and delays
• Address Tracker - To find the location of the problem nodes

EXAMPLE: Using LANsentry Manager, you find a hub on the network with a higher than normal error rate. However, the error rate does not seem high enough to cause login delays of 60 seconds or more.

LANsentry Manager Top-N Graph

Using the Top-N graph in the LANsentry Manager main window, locate a quiet node that has been showing the same problem. Choose a quiet node so that you do not receive excessive traffic when you try to isolate the problem.

EXAMPLE: You see that the node Monolith, which has the same Network File System (NFS) mounts as the other nodes on the network, is quiet. You decide to use this node for reproducing the fault. See Network File System (NFS) Protocol for more information about NFS.

LANsentry Manager Packet Capture

Using the LANsentry Manager Packet Capture application, capture packets from the network using predefined patterns and start-and-stop conditions.

Follow these steps:

1. Set up a capture buffer on a probe that is connected to the same hub as the quiet node. Until you know more about the problem, set a very general filter.

EXAMPLE: You select a MAC-layer filter and set a conversation filter to capture all packets to and from Monolith.

2. Telnet into and log out from the quiet node. Then reset the capture buffer. Repeat this procedure until you see the problem reflected in your captured data. To keep the buffer information clean, reset the buffer each time that you repeat the procedure.

3. When you see the delay, note the rough value of the packet count on the LANsentry Manager packet buffer.

By noting the packet count at which you think the delay has occurred, you can narrow the problem to within about 20 packets in the buffer. If you have used an extremely quiet node, you may even identify the exact packet.

LANsentry Manager Packet Decode

The LANsentry Manager Packet Decode application decodes all major protocols and displays the packet contents at three levels of detail: summary information, header information, and actual packet content.

Follow these steps:

1. Open the buffer in the Packet Decode application and locate the number of the packet at which the delay occurred.

2. Select the packet and launch a MAC-layer conversation filter. In the filter display, look for a gap in the conversation (that is, where the node sent a request and then resent it at approximately the same rate as the delay you experienced when recreating the problem).

3. Repeat the test to determine if the result concentrates on one node or if it appears on other nodes.

EXAMPLE: On the quiet node that you selected, the delay is obvious. You see an NFS request going out to a node and a repeat of the request 30 seconds later. During that time, the node did not respond. You now know that the delay occurred because nodes were not seeing responses for NFS requests. When you repeat the test on other nodes, you find that the delay is happening with more than one destination node.
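The search for a gap in the conversation can also be sketched in Python for captures that have been exported as simple records. This is a hedged illustration, not part of LANsentry Manager: it assumes each record is a (timestamp in seconds, request identifier) pair in capture order, and the function name and the 10-second minimum gap are arbitrary choices.

```python
def find_retry_gaps(packets, min_gap_seconds=10.0):
    """Find requests that were resent after a long silence.

    packets is an iterable of (timestamp_seconds, request_id) in capture order.
    Returns a list of (request_id, previous_time, resent_time, gap_seconds).
    """
    last_seen = {}
    gaps = []
    for timestamp, request_id in packets:
        if request_id in last_seen:
            gap = timestamp - last_seen[request_id]
            if gap >= min_gap_seconds:
                gaps.append((request_id, last_seen[request_id], timestamp, gap))
        last_seen[request_id] = timestamp
    return gaps


# Hypothetical capture: request "a" goes unanswered and is resent 30 seconds later.
capture = [(0.0, "a"), (0.5, "b"), (30.0, "a")]
print(find_retry_gaps(capture))
```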


Address Tracker

Use Address Tracker, which polls managed devices, to determine the hubs to which the problem nodes are attached. If the problem end stations are located on unmanaged devices, then you can at least narrow the problem to those unmanaged devices.

EXAMPLE: Although your network does not have managed hubs that Transcend® NCS management software can poll, it does have managed switches. When it polls the switches, Address Tracker displays the switch ports on which addresses were last seen. This information indicates the hub (but not the hub port) on which the device is located.

LANsentry Manager Packet Decode

After you know the location of the hub that has the problem node, monitor the problem from the hub using LANsentry Manager Packet Decode.

Follow these steps:

1. To capture packets from one of the nodes on the hub, set up another capture buffer and repeat the exercise that is described in LANsentry Manager Packet Capture. Because a delay may occur on a different node, use two capture buffers without stopping the first one.

Note the rough packet count where the delay appears.

2. Display a conversation filter of the packet where the delay appears and look for the gap in the conversation.

EXAMPLE: You hope that the nodes are on the same hub. You find that all the nodes are on HUB3. This result indicates that FCS Errors may be causing the timeouts. However, because the errors occur at a low rate, you decide to verify this diagnosis. You monitor the problem from the hub, logging in and out many times, and the delay eventually occurs. This time, the delay shows that the node's reply had an FCS Error even though the node received the request. The switch would not have transmitted this packet, causing a timeout on the NFS protocol. The retry time is presumably 30 seconds. During this test, you see the problem occurring on another node.

Correcting the Fault

Without a managed hub, you may find it very difficult to discover network file server timeout errors. To find the problematic node, you must either systematically isolate nodes by monitoring each node for a prolonged period or temporarily insert a managed hub.

EXAMPLE: You notice that the captured error packet failed FCS because it was corrupted by a regular pattern during transmission. A possible reason for this occurrence is a Jabbering node. This explanation makes sense because FCS/Jabber frames increased linearly when you were monitoring the live network.

Network File Server Timeouts Reference

This section explains terms that are relevant to network file server timeouts and provides additional conceptual and problem analysis detail.

Jabbering

Jabbering occurs when a node transmits illegal-length packets and is possibly not operating within carrier specifications. In effect, another node has written bad data over a valid packet. This bad data is often interpreted as a repeated sequence of data.

Network File System (NFS) Protocol

A distributed file system protocol developed by Sun Microsystems that allows a computer system to access files over a network as if they were on its local disks. This protocol has been incorporated into products by more than 200 companies. It is now a de facto Internet standard.

NFS is one protocol in the NFS suite of protocols, which includes NFS, RPC, XDR (External Data Representation), and others. These protocols are part of a larger architecture that Sun Microsystems refers to as Open Network Computing (ONC). ONC is a distributed applications architecture designed by Sun and currently controlled by a consortium led by Sun.

Troubleshooting TCP/IP - Detailed Steps

This article shows how to troubleshoot TCP/IP connectivity between computers on a Windows network. If you haven't already done so, disable XP's Internet Connection Firewall on all local area network connections and remove all firewall programs on the network. Improperly configured firewalls are the most common cause of TCP/IP problems.


Open a Command Prompt Window

For many of these steps, you'll be typing at the command prompt. To open a command prompt window in Windows 2000 or XP, click Start | Run, type cmd in the box, and click OK. To open a command prompt window in Windows 95, 98, or Me, click Start | Run, type command in the box, and click OK. Type one command per line, and press Enter after each one to execute it. To close the command prompt window, use the exit command.

Determine the TCP/IP Settings

Determine the TCP/IP settings of each computer on the local area network. In XP, open the Network Connections folder, right click the LAN connection, and click Status | Support | Details. For example, here are the Status and Details views for the LAN connection on an Internet Connection Sharing host.

In Windows 95/98/Me, click Start | Run, type winipcfg in the box, and click OK. Select the LAN adapter from the menu, and click More Info. Here's the winipcfg view for an ICS client running Windows Me.

You can also see the TCP/IP settings from the command prompt. This is especially convenient if a computer has more than one network adapter. Use the ipconfig /all command, which is available in all versions except Windows 95. The output from this command can be long, so it's best to write it to a file. Specify the file name in the command this way:

ipconfig /all >ipconfig.txt

Description of TCP/IP Settings

Here are the TCP/IP settings that are used in network troubleshooting:

• IP Address - Unique address assigned to a network adapter. A computer with multiple network adapters has an IP address for each one, and each one must be in a different subnet.

• Subnet Mask - Used in conjunction with the IP address to determine which subnet an adapter belongs to. At the simplest level, communication is only possible between two network adapters when they're in the same subnet.

• Default Gateway - IP address of a computer or router, on one of this computer's local area networks, that knows how to communicate with subnets not present on this computer. For an Internet connection, the default gateway is a router belonging to your Internet service provider, and all access to sites on the Internet goes through it. For an ICS client, the default gateway is the ICS host. If you use a hardware router, it serves as the default gateway.

• DHCP Server - If an adapter is configured to obtain an IP address automatically, this is the address of the server that provides it. It could be your ISP, an ICS host, or a hardware router.

• DNS Servers - IP address of one or more Domain Name Server computers. DNS servers translate Internet names to their IP addresses (like 63.146.109.227).
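How the subnet mask and IP address combine can be illustrated with a short Python sketch that uses the standard ipaddress module. The function name is hypothetical, the addresses are examples only, and the sketch assumes both adapters use the same subnet mask.

```python
import ipaddress


def same_subnet(ip_a, ip_b, netmask):
    """True if both addresses fall in the same subnet under the given mask."""
    # strict=False lets ip_network accept a host address and mask off the host bits.
    net_a = ipaddress.ip_network(f"{ip_a}/{netmask}", strict=False)
    net_b = ipaddress.ip_network(f"{ip_b}/{netmask}", strict=False)
    return net_a == net_b


print(same_subnet("192.168.1.101", "192.168.1.123", "255.255.255.0"))
print(same_subnet("192.168.1.101", "192.168.2.5", "255.255.255.0"))
```

The first pair shares a subnet and can communicate directly. The second pair does not, so traffic between them would have to go through a gateway.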

Subnets

See our article on subnets for a brief description of how they work. For more details, see this Microsoft Knowledge Base article.

If two computers are supposed to be on the same subnet but aren't, something is wrong with the network hardware or software configuration. This is most likely to happen when one of them receives an IP address of 169.254.x.x, which indicates that:

• It's configured to obtain an IP address automatically
• It couldn't find a DHCP server on the network to make the assignment
• Windows assigned it an Automatic Private IP Address
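Spotting an Automatic Private IP Address can be automated with a small Python sketch. The function name is hypothetical. The check relies only on the 169.254.x.x range described above, which corresponds to the network 169.254.0.0/16.

```python
import ipaddress

AUTOMATIC_PRIVATE_RANGE = ipaddress.ip_network("169.254.0.0/16")


def is_automatic_private_address(ip):
    """True if the address is in the 169.254.x.x range."""
    return ipaddress.ip_address(ip) in AUTOMATIC_PRIVATE_RANGE


print(is_automatic_private_address("169.254.10.20"))
print(is_automatic_private_address("192.168.1.101"))
```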

See our article on Specific Networking Problems and Their Solutions for more information.

Pinging

The ping command is the basic tool for testing TCP/IP connectivity. It sends a special packet (called ICMP Echo) to a particular IP address and looks for a reply. If everything is working right, the reply comes back. If not, the ping times out in a few seconds. By default, the ping command repeats the process four times. Here's an example of an ICS client computer pinging a Windows XP Home Edition ICS host, using the host's IP address and its computer name.

When ping fails, you'll see one of these error messages:

• Request timed out - The IP address is valid, but there's no reply from it. If the IP address is on a local area network, the most likely cause is a firewall program blocking the ping.

• Unknown host <name> or Ping request could not find host <name> - The computer name doesn't exist on the local area network. Make sure that NetBIOS over TCP/IP is enabled.

• Destination host unreachable - The IP address isn't on a local area network, and the default gateway can't access it. Either there's no default gateway, its address is wrong, or it isn't functioning.


Pinging the Local Area Network

Here is a series of ping commands to use in finding where a problem occurs on a local area network. Run them in the order shown, and don't go on to the next command until all of the previous commands work properly. In this example:

• The computer being tested is named Winxp, with IP address 192.168.1.101
• There's another computer on the network, named Win98, with IP address 192.168.1.123

Substitute the appropriate IP addresses and computer names for your network.

Command               Target                          What Ping Failure Indicates
ping 127.0.0.1        Loopback address                Corrupted TCP/IP installation
ping localhost        Loopback name                   Corrupted TCP/IP installation
ping 192.168.1.101    This computer's IP address      Corrupted TCP/IP installation
ping winxp            This computer's name            Corrupted TCP/IP installation
ping 192.168.1.123    Another computer's IP address   Bad hardware or NIC driver
ping win98            Another computer's name         NetBIOS name resolution failure

To fix a corrupted TCP/IP installation on Windows XP, follow the steps in this Microsoft Knowledge Base article. For Windows 95/98/Me, un-install the TCP/IP protocol in Control Panel | Network, reboot, and re-install it. If that doesn't fix it, use this procedure on Windows 95 or 98.
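The rule of running the pings in order and stopping at the first failure can be sketched in Python. This is an illustration under several assumptions: the function names are hypothetical, the helper shells out to the system ping command (using -n for the count on Windows and -c elsewhere), and it treats a zero exit code as success, which is only an approximation because some ping implementations report success for certain error replies.

```python
import platform
import subprocess


def ping_once(target, timeout_s=5):
    """Send one echo request with the system ping command; True if it succeeds."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    try:
        result = subprocess.run(
            ["ping", count_flag, "1", target],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def first_failure(steps, ping=ping_once):
    """Run the steps in order and return the first (target, meaning) that fails.

    steps is an ordered list of (target, what_failure_indicates).
    Returns None if every ping succeeds.
    """
    for target, meaning in steps:
        if not ping(target):
            return target, meaning
    return None


# Example addresses and names only; substitute the values for your own network.
LAN_STEPS = [
    ("127.0.0.1", "Corrupted TCP/IP installation"),
    ("localhost", "Corrupted TCP/IP installation"),
    ("192.168.1.101", "Corrupted TCP/IP installation"),
    ("192.168.1.123", "Bad hardware or NIC driver"),
]
```

Calling first_failure(LAN_STEPS) returns the first target that does not answer, together with the likely cause. Passing a different ping function makes it possible to try the logic without touching the network.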

Pinging the Internet

You can also use ping to find a problem with Internet access. Run these commands in the order shown, and don't go on to the next command until all of the previous commands work properly. Use the Default Gateway and DNS Server addresses that you got from the winipcfg or ipconfig /all command.

Command                  Target                What Ping Failure Indicates
ping w.x.y.z             Default Gateway       Default Gateway down
ping w.x.y.z             DNS Server            DNS Server down
ping w.x.y.z             Web site IP address   Internet service provider or web site down
ping www.something.com   Web site name         DNS Server down or web site down

REFERENCE

WWW.PRACTICALLYNETWORK.COM

WWW.COMPUTERHOPE.COM

WWW.WIKIPEDIA.ORG


THANK YOU



soon after collisions and repair the station. Packets that are transmitted too soon after collisions are unlikely to be valid. See Table 17 for more information about Collisions.

• On other networks - Determine the segment cable length.

3. Examine the CRC Errors and Late Collisions, which often indicate cabling or component problems.

Table 16 identifies the problems that CRC Errors can indicate and your possible actions. Table 18 identifies the problems that Late Collisions data can indicate and your possible actions.

Table 18. Late Collisions Data

Possible Problem: Cabling problems

• Segment too long
• Failing cable
• Segment not grounded properly (noise)
• Improper termination
• Taps too close (10BASE-5 and 10BASE-2 only)
• Noisy cable

Possible Action: Correct the cabling problem by doing one or more of the following:

• Reduce the segment length
• Replace the cable
• Ground the cable
• Terminate the cable correctly
• Check the taps
• Check for cables too close to equipment that emits electromagnetic interference

Possible Problem: Component problems

• Deaf or partially deaf node
• Failing repeater, transceiver, or controller cards

Possible Action: Correct the component problem by doing one of the following:

• Trace the failing component and replace it
• Replace the NIC or the transceiver

4. Trace Too Short Errors and Too Long Errors to the sender.

These errors often indicate faulty routers or LAN drivers and transceiver problems. Table 19 identifies the problems that this data can indicate and your possible actions.

Possible Problem: A transceiver on your network is adding bits to the packets that are transmitted by the attached station.

Possible Action:

1. Use a network analyzer to identify the problematic transceiver.
2. If necessary, replace the transceiver, network adapter, or station.

Possible Problem: The jabber protection mechanism on a transceiver has failed; it can no longer protect the network from the jabbering produced by the attached station.

Possible Action: Replace the network card.

Possible Problem: Excessive noise on the cable.

Note: Some 10/100 Mbps cards that autodetect the network speed may connect to the network at the wrong speed, causing excessive noise.

Possible Action: Check for improper cabling, faulty cable, faulty network equipment, or cables too close to noisy electronic equipment (lamps, for example).

If your network card autodetects the network speed and you have ruled out other problems, manually configure the network speed.

Possible Problem: Faulty routers (two different network types are connected and the router is not enforcing proper frame size restrictions).

Possible Action: Notify the manufacturer.

Possible Problem: Faulty LAN driver.

Possible Action: Replace the driver.

Possible Problem: A normal condition on a LinkSwitch® 1000, LinkSwitch® 3000, or CoreBuilder 5000 FastModule. If you use maximum-sized (1518-byte) Ethernet frames, the device's VLT-enabled ports add a frame tag of 4 bytes, resulting in a misleading Too Long Frame error.

Possible Action: These frames are passed successfully but will create the Too Long Frame error message. If you want to eliminate the error message, reduce your Ethernet packet frames by 4 bytes.

Device View

Device View allows you to display a variety of port and device-level statistics relevant to Ethernet packet loss. Table 20 describes these statistics and their use in troubleshooting.

Table 20. Activity and Error Statistics in Device View

Statistics Group: Activity

Description: Displays the total network activity and errors on the selected port.

Use in Troubleshooting: This data shows readable packets, broadcast packets, Collisions, total errors, and runts, which cause Too Short Errors. You can interpret this data in the following ways:

• Runts are often caused by Collisions; however, if the values increase at specific times of the day, you may need to change the network topology to manage the traffic more efficiently (for example, with switches or routers).
• Runts can also be caused by a badly terminated coax cable.
• Large numbers of runts not associated with high levels of collisions can indicate a transmission problem (examine the cable).
• Particularly high numbers of Collisions compared to the total number of readable packets can point to a hardware problem (a bad adapter) or to a data loop.
• A high proportion of Broadcast Packets (>10%) on a heavily utilized network (>50% of available bandwidth) can point to an incorrectly configured bridge or router on the network.

Statistics Group: Errors

Description: Displays the number of frames with errors on the selected port.

Use in Troubleshooting: The significance of errors depends on accompanying errors and prevailing network conditions. See the following error data for more information:

• Alignment Errors: Table 16
• FCS Errors: Table 16
• Too Long Errors: Table 19
• Too Short Errors or runts: Table 19
• Late Collisions: Table 18
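The broadcast rule of thumb in Table 20 can be sketched as a small check. This is only an illustration: the function name is hypothetical, and it assumes that the broadcast share is measured against readable packets and that utilization is already available as a percentage.

```python
def broadcast_warning(readable_packets, broadcast_packets, utilization_pct):
    """Return True when broadcasts exceed 10% of readable packets
    on a port that is using more than 50% of available bandwidth.

    Assumption: the broadcast share is computed against readable packets.
    """
    if readable_packets <= 0:
        return False  # nothing to compare against
    broadcast_share_pct = 100.0 * broadcast_packets / readable_packets
    return broadcast_share_pct > 10.0 and utilization_pct > 50.0
```

A True result is only a hint to inspect the bridge or router configuration, not a diagnosis.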

To display Activity and Errors statistics for a device or port, follow these steps:

1. Select the required port or device.

2. From the shortcut menu, select Activity or Errors.

The statistics available depend on the type of port or device selected. See Table 20 for troubleshooting information.

Ethernet Packet Loss Reference

This section explains terms that are relevant to Ethernet packet loss and provides additional conceptual and problem analysis detail.

Alignment Errors

An Alignment Error indicates a received frame in which both are true:

• The number of bits received is an uneven byte count (that is, not an integral multiple of 8).
• The frame has a Frame Check Sequence (FCS) error.

Alignment Errors often result from MAC layer packet formation problems, cabling problems that cause corrupted or lost data, and packets that pass through more than two cascaded multiport transceivers. See FCS Errors for more information about interpreting Alignment Errors.

Collisions

Collisions indicate that two devices detect that the network is idle and try to send packets at exactly the same time (within one round-trip delay). Because only one device can transmit at a time, both devices must stop sending and attempt to retransmit. Collisions are detected by the transmitting stations.

The retransmission algorithm helps to ensure that the packets do not retransmit at the same time. However, if the two devices retry at nearly the same time, packets can collide again; the process repeats until either the packets finally pass onto the network without collisions or 16 consecutive collisions occur and the packets are discarded.
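The text does not name the retransmission algorithm. Standard Ethernet uses truncated binary exponential backoff, and the sketch below assumes that is what is meant. The function and constant names are illustrative, and the slot time shown applies to 10 Mbps Ethernet.

```python
import random

SLOT_TIME_US = 51.2   # one slot = 512 bit times at 10 Mbps
MAX_COLLISIONS = 16   # after this many consecutive collisions the frame is discarded

def backoff_slots(collision_count, rng=random):
    """Number of slot times to wait after the given consecutive collision.

    Returns None when the frame should be discarded (16th collision).
    The random range doubles with each collision and is capped at 2**10 slots.
    """
    if collision_count >= MAX_COLLISIONS:
        return None
    exponent = min(collision_count, 10)
    return rng.randrange(0, 2 ** exponent)
```

Because each station picks its delay at random from a growing range, two stations that collided are increasingly unlikely to retry in the same slot.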

CRC Errors

A Cyclic Redundancy Check (CRC) Error is an RMON statistic that combines FCS Errors and Alignment Errors. These errors indicate that packets were received with:

• A bad FCS and an integral number of octets (FCS Errors)
• A bad FCS and a non-integral number of octets (Alignment Errors)

CRC Errors can cause an end station to freeze. If a large number of CRC Errors are attributed to a single station on the network, replace the station's network interface board. Typically, a CRC Error rate of more than 1 percent of network traffic is considered excessive.

Excessive Collisions

Excessive Collisions indicate that 16 consecutive collisions have occurred, usually a sign that the network is becoming congested. For each excessive collision count (or after 16 consecutive collisions), a packet is dropped. If you know the normal rate of excessive collisions, then you can determine when the rate of packet loss is affecting your network's performance. See Knowing Your Network's Configuration for more information.

FCS Errors

Frame Check Sequence (FCS) Errors, a type of CRC, indicate that frames received by an interface are an integral number of octets long but do not pass the FCS check. The FCS is a mathematical way to ensure that all the frame's bits are correct without having the system examine each bit and compare it to the original. Packets with Alignment Errors also generate FCS Errors.

Both Alignment Errors and FCS Errors can be caused by equipment powering up or down or by interference (noise) on unshielded twisted-pair (10BASE-T) segments. In a network that complies with the Ethernet standard, FCS or Alignment Errors indicate bit errors during a transmission or reception. A very low rate is acceptable. Although Ethernet allows a 1 in 10^8 bit error rate, typical Ethernet performance is 1 in 10^12 or better.

Late Collisions

Late Collisions indicate that two devices have transmitted at the same time, but cabling errors (most commonly, excessive network segment length or repeaters between devices) prevent either transmitting device from detecting a collision. Neither device detects a collision because the time to propagate the signal from one end of the network to the other is longer than the time to put the entire packet on the network. As a result, neither of the devices that cause the late collision senses the other's transmission until the entire packet is on the network.

Although late collisions occur for small packets, the transmitter cannot detect them. As a result, a network suffering measurable Late Collisions for large packets is losing small packets as well.
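The timing argument can be made concrete with a short calculation. This is a minimal sketch assuming 10 Mbps Ethernet and a known round-trip delay; the function names and the 60 microsecond example value are illustrative, not taken from the original text.

```python
def transmit_time_us(frame_octets, mbps=10.0):
    """Time in microseconds to put a whole frame on the wire."""
    return frame_octets * 8 / mbps

def collision_detectable(frame_octets, round_trip_delay_us, mbps=10.0):
    """True if the sender is still transmitting when a collision
    signal could return, so the collision can be detected."""
    return transmit_time_us(frame_octets, mbps) > round_trip_delay_us

# On an over-long segment with a 60 us round trip, a minimum-size
# 64-octet frame (51.2 us) is fully sent before the collision returns,
# while a 1518-octet frame is still being sent and reports the collision late.
small_ok = collision_detectable(64, 60.0)
large_ok = collision_detectable(1518, 60.0)
```

This is why the counter only shows Late Collisions for larger packets even though small packets are being lost on the same segment.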

Nonstandard Ethernet Problems

Table 21 lists the symptoms that typically occur if a system violates the Ethernet standard.

Table 21. Symptoms of Common Ethernet Network Problems

Symptoms: FCS Errors and Alignment Errors increase significantly.

Problem: Network cabling is too long.

Notes: If you use a promiscuous network monitor, the number of Late Collisions reported by stations should correlate with the FCS and Alignment Errors reported by the monitor.

Symptoms: FCS and Alignment Errors increase proportionally with interference (sometimes referred to as noise hits).

Problem: Network segment is noisy.

Notes: Typically observed on a 10BASE-T network segment in a noisy environment. If you use multiple promiscuous monitors, the FCS and Alignment Errors among the monitors will not correlate.

If the monitor can track runts, also called Too Short Errors, the number of runt packets should be significantly higher than normal.

Symptoms: FCS and Alignment Errors are much higher than normal.

Problem: Networks do not conform to the access scheme of Carrier Sense Multiple Access with Collision Detect (CSMA/CD).

Notes: Occurs when some implementations of Ethernet in the segment are not entirely compatible with IEEE 802.3 repeaters.

Collision fragments linger on the network long enough to collide with retry packets at the minimum interpacket gap (IPG). The IPG is smaller on one side of the repeated network, causing a lost packet.

Ethernet controllers cannot receive packets that are separated by 4.7 µs or less. Some controllers cannot sustain receptions of packets separated by as much as 9.6 µs. If runt packets are received one after another and are followed by a collision fragment, Ethernet controllers that cannot sustain reception will lose packets.

Receive Discards

Receive Discards indicate that received packets could not be delivered to a high-layer protocol because of congestion or packet errors.

Too Long Errors

A Too Long Error indicates that a packet is longer than 1518 octets (including FCS octets) but otherwise well formed. Too Long Errors are often caused by a bad transceiver, a malfunction of the jabber protection mechanism on a transceiver, or excessive noise on the cable.

Too Short Errors

A Too Short Error, also called a runt, indicates that a packet is fewer than 64 octets long (including FCS octets) but otherwise well formed.

Transmit Discards

Transmit Discards indicate that packets were not transmitted because of network congestion.
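The frame error definitions in this reference can be summarized as a small classifier. This is a simplified sketch: the function name is hypothetical, the FCS result is assumed to be supplied by the caller, and frames that are both badly sized and fail the FCS check are simply reported as FCS or Alignment Errors.

```python
MIN_OCTETS = 64      # shortest legal frame, including FCS octets
MAX_OCTETS = 1518    # longest legal frame, including FCS octets

def classify_frame(bit_count, fcs_ok):
    """Classify a received frame using the definitions in this section."""
    octets, leftover_bits = divmod(bit_count, 8)
    if not fcs_ok:
        # Bad FCS: the bit count decides between the two CRC error types.
        return "Alignment Error" if leftover_bits else "FCS Error"
    if octets > MAX_OCTETS:
        return "Too Long Error"
    if octets < MIN_OCTETS:
        return "Too Short Error"
    return "OK"
```

For example, a frame of 100 octets plus 3 stray bits that fails the FCS check is an Alignment Error, and both it and a plain FCS Error would be counted in the RMON CRC Errors statistic.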

FDDI Ring Errors

Use these sections to identify and correct FDDI ring errors:

• FDDI Ring Errors Overview
• Identifying Ring Errors

Fiber Distributed Data Interface (FDDI) often corrects its own problems. However, because FDDI cannot correct all errors (especially those related to hardware problems), you should monitor FDDI errors.

Understanding the Problem

FDDI ring errors that you should monitor include:

• Elasticity Buffer Error Condition
• Frame Error Condition
• Frames Not Copied Condition
• Link Error Condition

Identifying the Problem

First, determine the type of FDDI ring errors and where they are occurring. Similar to the way you identify other FDDI problems, identify the upstream and downstream neighbors of the devices that you are monitoring.

Several types of network errors can cause FDDI performance problems. For example, problems with cables or physical connections may result in a link or frame error. Elasticity buffer (EB) errors can also lead to link and frame errors.

FDDI deals with port-related errors as follows:

• The variable PORTlerAlarm is the link error rate (LER) value at which a link connection generates an alarm. When the LER is greater than the alarm setting, Station Management (SMT) sends a Status Report Frame (SRF) to notify you that there is a problem with a port.

The PORTlerAlarm threshold is set lower than the PORTlerCutoff threshold so that you are notified of a problem before the port is actually removed from the ring.

• When link errors reach the threshold defined by the variable PORTlerCutoff, SMT breaks the connection, disabling the PHY that detected the problem. A Link Error Condition is also generated.
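The relationship between the two thresholds can be sketched as follows. This is an illustration only: the function name is hypothetical, and it assumes all three values are expressed as plain error rates on the same scale, which may differ from how a given SMT implementation encodes them.

```python
def port_ler_state(ler, alarm_threshold, cutoff_threshold):
    """Return 'ok', 'alarm', or 'cutoff' for a port's link error rate.

    Assumption: all values are error rates on the same scale, and the
    alarm threshold is set below the cutoff threshold.
    """
    if alarm_threshold >= cutoff_threshold:
        raise ValueError("alarm threshold must be lower than cutoff threshold")
    if ler >= cutoff_threshold:
        return "cutoff"   # SMT breaks the connection
    if ler > alarm_threshold:
        return "alarm"    # SMT sends a Status Report Frame
    return "ok"
```

Keeping the alarm below the cutoff leaves a window in which the port is still on the ring but already reporting trouble.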

FDDI deals with MAC-related errors as follows:

• When MAC frame errors reach a certain threshold, a Frame Error Condition is generated. Because the actual error can be further upstream than the immediate connection, the connection remains intact.

• For a large network, the worst case MACFrameErrorRatio is less than 0.1 percent. However, during network configuration, frame error ratios can reach 50 percent for short periods. When you detect a sustained frame error ratio of more than 0.1 percent, a problem exists between the station that is reporting the condition and the nearest upstream MAC.
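One way to separate a sustained frame error ratio from the short spikes seen during network configuration is to require several consecutive samples above the threshold. The sketch below takes the threshold to be 0.1 percent; the function name and the choice of three consecutive samples are assumptions made for illustration.

```python
def sustained_frame_errors(ratios_pct, threshold_pct=0.1, min_samples=3):
    """True if the frame error ratio stays above the threshold for at
    least min_samples consecutive polling samples."""
    run_length = 0
    for ratio in ratios_pct:
        run_length = run_length + 1 if ratio > threshold_pct else 0
        if run_length >= min_samples:
            return True
    return False
```

With these defaults, a single 50 percent spike between normal samples is ignored, while three samples in a row above 0.1 percent are flagged.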

Solving the Problem

To solve problems related to FDDI errors, fix the hardware, cabling, or congestion problem.

Identifying Ring Errors

Use Status Watch to monitor your FDDI devices for Warning or Critical alerts.

Status Watch

Use Status Watch to identify FDDI ring errors.

Follow these steps:

1. Monitor the FDDI Status tool for the currently selected device.

2. Determine whether Status Watch is reporting Elasticity Buffer Errors or a high percentage of Frame Errors, Frames Not Copied, or Link Error Rates for the currently selected device.

FDDI Ring Errors Reference

This section provides additional conceptual and problem analysis detail.

Elasticity Buffer Error Condition

The Elasticity Buffer Error condition occurs when a port's elasticity buffer overflows or underflows. This condition usually indicates that a port's hardware is not operating within the tolerances that the FDDI standard specifies. Look for the problem in the hardware of either the port that is reporting the condition or of the immediately adjacent port.

Frame Error Condition

The Frame Error condition occurs when the percentage of frames that contain errors exceeds a preset threshold. In the situation when a device is an uplink to FDDI (that is, a device is transmitting onto FDDI), this type of condition indicates that the ring is saturated. The ring is out of buffer space, and packets are being dropped from the device's backbone port.

The problem indicated by the frame errors is usually located between the MAC that reports the condition and its upstream neighbor. Because many physical connections can lie along this path, the MACFrameErrorRatio variable identifies only the two MACs between which the problem is occurring.

Frames Not Copied Condition

The Frames Not Copied condition occurs when the percentage of frames that are dropped because of insufficient buffer space exceeds a preset threshold. This condition indicates that the station is congested and is unable to process frames as quickly as they arrive. To help eliminate congestion:

• Add more capacity to the station.
• Reconfigure your network so that end stations that communicate heavily with one another are on the same bridge or switch.
• Filter out certain traffic.

Link Error Condition

The Link Error condition occurs when a port detects link errors at a rate that exceeds a preset threshold. When the Link Error threshold is exceeded, the station removes itself from the ring and tries to reinsert itself on the ring. This action creates a MAC Neighbor Change Event (which also occurs if a ring wraps).

Link errors may indicate an FDDI PHY hardware problem (such as a faulty transmitter) or a faulty cable or connector. Look for the problem in the portion of the network between the port that is reporting the condition and the first upstream transmitter.

MAC Neighbor Change Event

The MAC Neighbor Change event occurs when a MAC's upstream or downstream neighbor changes.

This event indicates either:

• A network reconfiguration
• Another station that is leaving or joining the ring

Network File Server Timeouts

Use these sections to identify and correct timeouts on network file servers:

• Network File Server Timeout Overview
• Looking for Obvious Errors
• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

See Network File Server Timeouts Reference for additional conceptual and problem analysis detail.

A network file server can time out if your network gets congested or if your server is having problems. Users might have problems downloading data from or to the server, or copying files from or to the server. To help you to understand the troubleshooting process for this type of problem, an EXAMPLE throughout this section follows the symptoms, analysis, and resolution of a typical file server timeout problem.

Understanding the Problem

When users log in, their stations make network file server calls either to determine quotas (if this feature has been enabled) or to mount user home directories. The network file server timeout messages, even when spread across multiple nodes, indicate a problem either with the network or with a server.

EXAMPLE: UNIX users notice that it takes a long time - over 30 seconds in some cases - to log in to any machine. Some machines report network file server timeout messages, but the messages have no obvious pattern and are infrequent. You begin to get a sense of the problem.

Identifying the Problem

First, rule out the obvious causes. Ask these questions:

• Can you access the network file server with Telnet?
• Have any alarms been triggered?
• Are there any new errors?

The process of identifying the problem is developed in Looking for Obvious Errors.

Solving the Problem

To determine the cause, reproduce the fault while you monitor the network. After you know the cause, you can fix the problem.

The solutions to the network file server timeout are identified in these sections:

• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

Looking for Obvious Errors

To look for obvious errors, use these applications:

• Ping and Telnet - To check connectivity to the network file server nodes
• LANsentry Manager Alarms View - To search for triggered alarms
• LANsentry Manager Statistics View - To look for errors
• LANsentry Manager History View - To identify trends

Ping and Telnet

Determine whether you can contact network file server nodes using Ping and Telnet. If the response is extremely slow, then a problem may exist with the connections to the nodes. No delay indicates that the connections are normal, implying that the delay is occurring elsewhere. In this case, use LANsentry® Manager tools to determine whether packets are being lost or ignored.
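Where a scripted check is more convenient than running Telnet by hand, the connection delay can be timed with a plain TCP connect. This is a minimal sketch, not part of the original procedure: the function name is hypothetical, and port 23 is assumed only because it is the standard Telnet port.

```python
import socket
import time

def tcp_connect_time(host, port=23, timeout=5.0):
    """Return the seconds taken to open a TCP connection, or None if
    the connection fails or times out."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        return None
```

A result of None or a connect time of several seconds points at the path to the node; a consistently fast connect suggests that the delay lies elsewhere.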

LANsentry Manager Alarms View

Using the LANsentry Manager Alarms View, you can determine if any configured alarms have been triggered.

Search the Alarms View to see if any MAC events have been logged.

EXAMPLE: MAC events have not been logged for the network on which the UNIX users are attached.

Even though no alarms have occurred, errors may exist. For example, a lower rate of background errors may exist just below the alarm threshold. Based on maximum and minimum values, RMON errors may miss constant, periodic, or low amounts of errors.

LANsentry Manager Statistics View

Using the LANsentry Manager Statistics View, you can display a multisegment graph of utilization and error statistics.

Follow these steps:

1. Set up a graph that shows utilization and errors on all your major segments.

2. Determine whether any segments are particularly busy or error prone.

EXAMPLE: You notice that one segment of the UNIX network, HUB3, is reporting Too Long Errors and FCS Errors roughly every second sample. While the amount is not higher than normal, it is currently higher than any other segment.

LANsentry Manager History View

Using the LANsentry Manager History View, you can display a rolling history table to determine if the errors that you are seeing are new. For example, if you have a history table that runs for 30-minute samples over two days, you can compare the most recent sample to a previous sample, looking for new errors. If your probe has the resources, use a much finer resolution sample stored for a shorter time (every 30 seconds for 2 hours) to more easily spot recent errors.

EXAMPLE: You see that the history table shows that no error rates remained constant throughout the day. However, errors that did occur were on the device HUB3 and were Too Long Errors and FCS Errors.

Reproducing the Fault While Monitoring the Network

Although the RMON View in LANsentry Manager can show error rates and help you to identify the location of the problem, it may not provide enough data to solve the problem. To determine the cause of the problem, reproduce it while you monitor the network by using these applications:

• LANsentry Manager Top-N Graph - To locate a quiet node to use for reproducing the fault
• LANsentry Manager Packet Capture - To capture packets from the hub to which the quiet node is attached
• LANsentry Manager Packet Decode - To analyze the packets to assess network file server traffic and delays
• Address Tracker - To find the location of the problem nodes

EXAMPLE: Using LANsentry Manager, you find a hub on the network with a higher than normal error rate. However, the error rate does not seem high enough to cause login delays of 60 seconds or more.

LANsentry Manager Top-N Graph

Using the Top-N graph in the LANsentry Manager main window, locate a quiet node that has been showing the same problem. Choose a quiet node so that you do not receive excessive traffic when you try to isolate the problem.

EXAMPLE: You see that the node Monolith, which has the same Network File System (NFS) mounts as the other nodes on the network, is quiet. You decide to use this node for reproducing the fault. See Network File System (NFS) Protocol for more information about NFS.

LANsentry Manager Packet Capture

Using the LANsentry Manager Packet Capture application, capture packets from the network using predefined patterns and start-and-stop conditions.

Follow these steps:

1. Set up a capture buffer on a probe that is connected to the same hub as the quiet node. Until you know more about the problem, set a very general filter.

EXAMPLE: You select a MAC-layer filter and set a conversation filter to capture all packets to and from Monolith.

2. Telnet into and log out from the quiet node. Then reset the capture buffer. Repeat this procedure until you see the problem reflected in your captured data. To keep the buffer information clean, reset the buffer each time that you repeat the procedure.

3. When you see the delay, note the rough value of the packet count on the LANsentry Manager packet buffer.

By noting the packet count at which you think the delay has occurred, you can narrow the problem to within about 20 packets in the buffer. If you have used an extremely quiet node, you may even identify the exact packet.

LANsentry Manager Packet Decode

The LANsentry Manager Packet Decode application decodes all major protocols and displays the packet contents at three levels of detail: summary information, header information, and actual packet content.

Follow these steps:

1. Open the buffer in the Packet Decode application and locate the number of the packet at which the delay occurred.

2. Select the packet and launch a MAC-layer conversation filter. In the filter display, look for a gap in the conversation (that is, where the node sent a request and then resent it at approximately the same rate as the delay you experienced when recreating the problem).

3. Repeat the test to determine if the result concentrates on one node or if it appears on other nodes.

EXAMPLE: On the quiet node that you selected, the delay is obvious. You see an NFS request going out to a node and a repeat of the request 30 seconds later. During that time, the node did not respond. You now know that the delay occurred because nodes were not seeing responses for NFS requests. When you repeat the test on other nodes, you find that the delay is happening with more than one destination node.
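If the decoded conversation is exported as a list of timestamps and request identifiers, the search for a gap followed by a resend can be automated. The sketch below is generic and does not use any LANsentry Manager API; the function name, the tuple layout, and the sample data are assumptions made for illustration.

```python
def find_retransmit_gaps(requests, min_gap_s):
    """Find requests that were sent again after a long silence.

    requests: iterable of (timestamp_seconds, request_id) in time order,
    containing only the packets sent by the node under test.
    Returns a list of (request_id, first_sent, resent_at) tuples.
    """
    last_sent = {}
    gaps = []
    for timestamp, request_id in requests:
        previous = last_sent.get(request_id)
        if previous is not None and timestamp - previous >= min_gap_s:
            gaps.append((request_id, previous, timestamp))
        last_sent[request_id] = timestamp
    return gaps

# Hypothetical capture: request "b" is repeated about 30 seconds later.
sample = [(0.0, "a"), (0.1, "b"), (30.1, "b"), (30.2, "c")]
gaps = find_retransmit_gaps(sample, min_gap_s=25.0)
```

Setting min_gap_s slightly below the delay that users report keeps ordinary fast retries out of the result.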

Address Tracker

Use Address Tracker, which polls managed devices, to determine the hubs to which the problem nodes are attached. If the problem end stations are located on unmanaged devices, then you can at least narrow the problem to those unmanaged devices.

EXAMPLE: Although your network does not have managed hubs that Transcend® NCS management software can poll, it does have managed switches. When it polls the switches, Address Tracker displays the switch ports on which addresses were last seen. This information indicates the hub (but not the hub port) on which the device is located.

LANsentry Manager Packet Decode

After you know the location of the hub that has the problem node, monitor the problem from the hub using LANsentry Manager Packet Decode.

Follow these steps:

1. To capture packets from one of the nodes on the hub, set up another capture buffer and repeat the exercise that is described in LANsentry Manager Packet Capture. Because a delay may occur on a different node, use two capture buffers without stopping the first one.

Note the rough packet count where the delay appears.

2. Display a conversation filter of the packet where the delay appears and look for the gap in the conversation.

EXAMPLE: You hope that the nodes are on the same hub. You find that all the nodes are on HUB3. This result indicates that FCS Errors may be causing the timeouts. However, because the errors occur at a low rate, you decide to verify this diagnosis. You monitor the problem from the hub, logging in and out many times, and the delay eventually occurs. This time the delay shows that the node's reply had an FCS Error, even though the node received the request. The switch would not have transmitted this packet, causing a timeout on the NFS protocol. The retry time is presumably 30 seconds. During this test, you see the problem occurring on another node.

Correcting the Fault

Without a managed hub, you may find it very difficult to discover network file server timeout errors. To find the problematic node, you must either systematically isolate nodes by monitoring each node for a prolonged period or temporarily insert a managed hub.

EXAMPLE: You notice that the captured error packet failed FCS because it was corrupted by a regular pattern during transmission. A possible reason for this occurrence is a Jabbering node. This explanation makes sense because FCS/Jabber frames increased linearly when you were monitoring the live network.

Network File Server Timeouts Reference

This section explains terms that are relevant to network file server timeouts and provides additional conceptual and problem analysis detail.

Jabbering

Jabbering occurs when a node transmits illegal-length packets and is possibly not operating within carrier specifications. In effect, another node has written bad data over a valid packet. This bad data is often interpreted as a repeated sequence of data.

Network File System (NFS) Protocol

NFS is a distributed file system protocol developed by Sun Microsystems that allows a computer system to access files over a network as if they were on its local disks. This protocol has been incorporated into products by more than 200 companies. It is now a de facto Internet standard.

NFS is one protocol in the NFS suite of protocols, which includes NFS, RPC, XDR (External Data Representation), and others. These protocols are part of a larger architecture that Sun Microsystems refers to as Open Network Computing (ONC). ONC is a distributed applications architecture designed by Sun and currently controlled by a consortium led by Sun.

Troubleshooting TCP/IP - Detailed Steps

This article shows how to troubleshoot TCP/IP connectivity between computers on a Windows network. If you haven't already done so, disable XP's Internet Connection Firewall on all local area network connections and remove all firewall programs on the network. Improperly configured firewalls are the most common cause of TCP/IP problems.

Open a Command Prompt Window

For many of these steps, you'll be typing at the command prompt. To open a command prompt window in Windows 2000 or XP, click Start | Run, type cmd in the box, and click OK. To open a command prompt window in Windows 95, 98, or Me, click Start | Run, type command in the box, and click OK. Type one command per line, and press Enter after each one to execute it. To close the command prompt window, use the exit command.

Determine the TCP/IP Settings

Determine the TCP/IP settings of each computer on the local area network. In XP, open the Network Connections folder, right click the LAN connection, and click Status | Support | Details. For example, here are the Status and Details views for the LAN connection on an Internet Connection Sharing host.

In Windows 95/98/Me, click Start | Run, type winipcfg in the box, and click OK. Select the LAN adapter from the menu, and click More Info. Here's the winipcfg view for an ICS client running Windows Me.

You can also see the TCP/IP settings from the command prompt. This is especially convenient if a computer has more than one network adapter. Use the ipconfig /all command, which is available in all versions except Windows 95. The output from this command can be long, so it's best to write it to a file. Specify the file name in the command this way:

ipconfig /all >ipconfig.txt

Description of TCP/IP Settings

Here are the TCP/IP settings that are used in network troubleshooting:

• IP Address – Unique address assigned to a network adapter. A computer with multiple network adapters has an IP address for each one, and each one must be in a different subnet.

• Subnet Mask – Used in conjunction with the IP address to determine which subnet an adapter belongs to. At the simplest level, communication is only possible between two network adapters when they're in the same subnet.

• Default Gateway – IP address of a computer or router, on one of this computer's local area networks, that knows how to communicate with subnets not present on this computer. For an Internet connection, the default gateway is a router belonging to your Internet service provider, and all access to sites on the Internet goes through it. For an ICS client, the default gateway is the ICS host. If you use a hardware router, it serves as the default gateway.

• DHCP Server – If an adapter is configured to obtain an IP address automatically, this is the address of the server that provides it. It could be your ISP, an ICS host, or a hardware router.

• DNS Servers – IP address of one or more Domain Name Server computers. DNS servers translate Internet names to their IP addresses (like 63.146.109.227).
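The subnet test described under Subnet Mask can be sketched in Python with the standard ipaddress module. This is a minimal illustration rather than part of the original procedure: the helper name same_subnet, the sample addresses, and the 255.255.255.0 mask are all assumptions chosen for the example.

```python
import ipaddress

def same_subnet(ip_a: str, ip_b: str, mask: str) -> bool:
    """Return True if both addresses fall in the same subnet under the given mask."""
    # strict=False lets us pass a host address and have the network derived from it.
    net_a = ipaddress.ip_network(f"{ip_a}/{mask}", strict=False)
    net_b = ipaddress.ip_network(f"{ip_b}/{mask}", strict=False)
    return net_a == net_b

# Hypothetical adapters with an assumed mask of 255.255.255.0.
print(same_subnet("192.168.1.101", "192.168.1.123", "255.255.255.0"))
print(same_subnet("192.168.1.101", "192.168.2.123", "255.255.255.0"))
```

Substitute the IP address and subnet mask that winipcfg or ipconfig reports for each adapter. Two adapters that should talk directly but compare as different subnets point to a configuration problem.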

Subnets

See our article on subnets for a brief description of how they work. For more details, see this Microsoft Knowledge Base article.

If two computers are supposed to be on the same subnet but aren't, something is wrong with the network hardware or software configuration. This is most likely to happen when one of them receives an IP address of 169.254.x.x, which indicates that:

• It's configured to obtain an IP address automatically.
• It couldn't find a DHCP server on the network to make the assignment.
• Windows assigned it an Automatic Private IP Address.

See our article on Specific Networking Problems and Their Solutions for more information.
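As a quick check for the self-assigned address case, the following Python sketch tests whether an address falls in the 169.254.0.0/16 block. The function name is_apipa and the sample addresses are assumptions made for illustration.

```python
import ipaddress

# 169.254.0.0/16 is the link-local block that Windows uses for
# Automatic Private IP Addressing.
APIPA_NETWORK = ipaddress.ip_network("169.254.0.0/16")

def is_apipa(address: str) -> bool:
    """Return True if the address looks self-assigned (no DHCP server answered)."""
    return ipaddress.ip_address(address) in APIPA_NETWORK

# Hypothetical addresses, as they might be copied from ipconfig output.
for addr in ("169.254.17.42", "192.168.0.5"):
    status = "self-assigned, check the DHCP server" if is_apipa(addr) else "not in the 169.254 block"
    print(addr, "->", status)
```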

Pinging

The ping command is the basic tool for testing TCP/IP connectivity. It sends a special packet (called ICMP Echo) to a particular IP address and looks for a reply. If everything is working right, the reply comes back. If not, the ping times out in a few seconds. By default, the ping command repeats the process four times. Here's an example of an ICS client computer pinging a Windows XP Home Edition ICS host, using the host's IP address and its computer name.

When ping fails, you'll see one of these error messages:

• Request timed out – The IP address is valid, but there's no reply from it. If the IP address is on a local area network, the most likely cause is a firewall program blocking the ping.

• Unknown host <name> or Ping request could not find host <name> – The computer name doesn't exist on the local area network. Make sure that NetBIOS over TCP/IP is enabled.

• Destination host unreachable – The IP address isn't on a local area network, and the default gateway can't access it. Either there's no default gateway, its address is wrong, or it isn't functioning.


Pinging the Local Area Network

Here is a series of ping commands to use in finding where a problem occurs on a local area network. Run them in the order shown, and don't go on to the next command until all of the previous commands work properly. In this example:

• The computer being tested is named Winxp, with IP address 192.168.1.101.
• There's another computer on the network, named Win98, with IP address 192.168.1.123.

Substitute the appropriate IP addresses and computer names for your network.

Command              Target                         What Ping Failure Indicates
ping 127.0.0.1       Loopback address               Corrupted TCP/IP installation
ping localhost       Loopback name                  Corrupted TCP/IP installation
ping 192.168.1.101   This computer's IP address     Corrupted TCP/IP installation
ping winxp           This computer's name           Corrupted TCP/IP installation
ping 192.168.1.123   Another computer's IP address  Bad hardware or NIC driver
ping win98           Another computer's name        NetBIOS name resolution failure
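The "run in order and stop at the first failure" rule from the table above can be sketched in Python. This is only an illustration under stated assumptions: the helper names ping_once and first_failure are invented here, the script relies on the operating system's own ping command, and it treats the command's exit code as the success indicator, which is an approximation (some ping versions report success even when the reply is an error message).

```python
import platform
import subprocess

def ping_once(target: str) -> bool:
    """Send a single echo request with the system ping command; True on success."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    try:
        result = subprocess.run(
            ["ping", count_flag, "1", target],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except OSError:  # the ping executable is not available
        return False
    return result.returncode == 0

def first_failure(steps, ping=ping_once):
    """Run the steps in order; return (target, diagnosis) for the first failed ping."""
    for target, diagnosis in steps:
        if not ping(target):
            return target, diagnosis
    return None  # every ping succeeded

# Example values only; substitute the addresses and names for your network.
LAN_STEPS = [
    ("127.0.0.1", "Corrupted TCP/IP installation"),
    ("localhost", "Corrupted TCP/IP installation"),
    ("192.168.1.101", "Corrupted TCP/IP installation"),
    ("winxp", "Corrupted TCP/IP installation"),
    ("192.168.1.123", "Bad hardware or NIC driver"),
    ("win98", "NetBIOS name resolution failure"),
]

# To run the real checks: print(first_failure(LAN_STEPS))
```

Passing the ping function as a parameter keeps the ordering logic separate from the actual network call, so the sequence can be tried out with a stand-in function before it is pointed at real hosts.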

To fix a corrupted TCP/IP installation on Windows XP, follow the steps in this Microsoft Knowledge Base article. For Windows 95/98/Me, un-install the TCP/IP protocol in Control Panel | Network, reboot, and re-install it. If that doesn't fix it, use this procedure on Windows 95 or 98.

Pinging the Internet

You can also use ping to find a problem with Internet access. Run these commands in the order shown, and don't go on to the next command until all of the previous commands work properly. Use the Default Gateway and DNS Server addresses that you got from the winipcfg or ipconfig /all command.


Command                  Target                What Ping Failure Indicates
ping w.x.y.z             Default Gateway       Default Gateway down
ping w.x.y.z             DNS Server            DNS Server down
ping w.x.y.z             Web site IP address   Internet service provider or web site down
ping www.something.com   Web site name         DNS Server down or web site down

REFERENCE

WWW.PRACTICALLYNETWORK.COM

WWW.COMPUTERHOPE.COM

WWW.WIKIPEDIA.ORG


THANK YOU


  • Basic Network Troubleshooting
  • Basic network troubleshooting
    • Issue
    • Cause
    • Solution
      • Verify connections LEDs
      • Adapter resources
      • Adapter functionality
      • Protocol
      • Firewall
      • Additional time
      • Additional troubleshooting
        • Network Troubleshooting Overview
          • About Connectivity Problems
          • About Performance Problems
          • Solving Connectivity and Performance Problems
          • Recognizing Symptoms
            • User Comments
            • Network Management Software Alerts
            • Analyzing Symptoms
              • Understanding the Problem
              • Identifying and Testing the Cause of the Problem
                • Sample Problem Analysis
                • Equipment for Testing
                  • Solving the Problem
                    • Your Network Troubleshooting Toolbox
                      • Transcend Central
                      • Status Watch
                        • Web Reporter
                          • Address Tracker
                          • LANsentry Manager
                          • Traffix Manager
                          • Device View
                          • Ping
                            • Strategies for Using Ping
                            • Tips on Interpreting Ping Messages
                              • Telnet
                              • FTP and TFTP
                              • Analyzers
                              • Probes
                              • Cable Testers
                              • Knowing Your Network's Configuration
                                • Site Network Map
                                • Device Configuration Information
                                • Other Important Data About Your Network
                                  • Identifying Your Network's Normal Behavior
                                    • Baselining Your Network
                                    • Identifying Background Noise
                                        • FDDI Connectivity
                                          • Understanding the Problem
                                          • Identifying the Problem
                                          • Solving the Problem
                                          • Status Watch
                                          • Implementing Dual Homing
                                          • Installing an Optical Bypass Unit
                                          • Peer Wrap Condition
                                          • Twisted Ring Condition
                                          • Undesired Connection Attempt Event
                                          • Address Tracker
                                          • Device View
                                          • Broadcast Packets
                                          • Multicast Packets
                                            • Duplicate Addresses
                                              • Understanding the Problem
                                              • Identifying the Problem
                                              • Solving the Problem
                                              • Status Watch
                                              • Address Tracker
                                              • LANsentry Manager
                                              • Duplicate MAC Addresses
                                              • Duplicate IP Addresses
                                                • Ethernet Packet Loss
                                                  • Understanding the Problem
                                                  • Identifying the Problem
                                                  • Solving the Problem
                                                  • Status Watch
                                                  • LANsentry Manager Network Statistics Graph
                                                  • Device View
                                                  • Alignment Errors
                                                  • Collisions
                                                  • CRC Errors
                                                  • Excessive Collisions
                                                  • FCS Errors
                                                  • Late Collisions
                                                  • Nonstandard Ethernet Problems
                                                  • Receive Discards
                                                  • Too Long Errors
                                                  • Too Short Errors
                                                  • Transmit Discards
                                                    • FDDI Ring Errors
                                                      • Understanding the Problem
                                                      • Identifying the Problem
                                                      • Solving the Problem
                                                      • Status Watch
                                                      • Elasticity Buffer Error Condition
                                                      • Frame Error Condition
                                                      • Frames Not Copied Condition
                                                      • Link Error Condition
                                                      • MAC Neighbor Change Event
                                                        • Network File Server Timeouts
                                                          • Understanding the Problem
                                                          • Identifying the Problem
                                                          • Solving the Problem
                                                          • Ping and Telnet
                                                          • LANsentry Manager Alarms View
                                                          • LANsentry Manager Statistics View
                                                          • LANsentry Manager History View
                                                          • LANsentry Manager Top-N Graph
                                                          • LANsentry Manager Packet Capture
                                                          • LANsentry Manager Packet Decode
                                                          • Address Tracker
                                                          • LANsentry Manager Packet Decode
                                                          • Jabbering
                                                          • Network File System (NFS) Protocol
                                                          • Troubleshooting TCP/IP - Detailed Steps
                                                          • Open a Command Prompt Window
                                                          • Determine the TCP/IP Settings
                                                          • Description of TCP/IP Settings
                                                          • Subnets
                                                          • Pinging
                                                          • Pinging the Local Area Network
                                                          • Pinging the Internet

2. If necessary, replace the transceiver, network adapter, or station.

The jabber protection mechanism on a transceiver has failed; it can no longer protect the network from the jabbering produced by the attached station.

Replace the network card.

Excessive noise on the cable.

Note: Some 10/100 Mbps cards that autodetect the network speed may connect to the network at the wrong speed, causing excessive noise.

Check for improper cabling, faulty cable, faulty network equipment, or cables too close to noisy electronic equipment (lamps, for example).

If your network card autodetects the network speed and you have ruled out other problems, manually configure the network speed.

Faulty routers (two different network types are connected, and the router is not enforcing proper frame size restrictions).

Notify the manufacturer.

Faulty LAN driver. Replace the driver.

A normal condition on a LinkSwitch® 1000, LinkSwitch® 3000, or CoreBuilder 5000 FastModule.

If you use maximum-sized 1518-byte Ethernet frames, the device's VLT-enabled ports add a frame tag of 4 bytes, resulting in a misleading Too Long Frame error.

These frames are passed successfully but will create the Too Long Frame error message.

If you want to eliminate the error message, reduce your Ethernet packet frames by 4 bytes.

Device View

Device View allows you to display a variety of port and device-level statistics relevant to Ethernet packet loss. Table 20 describes these statistics and their use in troubleshooting.

Table 20. Activity and Error Statistics in Device View

Statistics Group: Activity
Description: Displays the total network activity and errors on the selected port.
Use in Troubleshooting: This data shows readable packets, broadcast packets, Collisions, total errors, and runts, which cause Too Short Errors. You can interpret this data in the following ways:

• The presence of runts can often be caused by Collisions; however, if the values increase at specific times of the day, it may indicate that you need to change the network topology to manage the traffic more efficiently (for example, with switches or routers).

• Runts can also be caused by a badly terminated coax cable.

• Large numbers of runts not associated with high levels of collisions can indicate a transmission problem (examine the cable).

• Particularly high numbers of Collisions compared to the total number of readable packets can point to a hardware problem (a bad adapter) or to a data loop.

• A high proportion of Broadcast Packets (>10%) on a heavily utilized network (>50% of available bandwidth) can point to an incorrectly configured bridge or router on the network.

Statistics Group: Errors
Description: Displays the number of frames with errors on the selected port.
Use in Troubleshooting: The significance of errors depends on accompanying errors and prevailing network conditions. See the following error data for more information:

• Alignment Errors: Table 16
• FCS Errors: Table 16
• Too Long Errors: Table 19
• Too Short Errors, or runts: Table 19
• Late Collisions: Table 18

To display Activity and Errors statistics for a device or port, follow these steps:

1. Select the required port or device.

2. From the shortcut menu, select Activity or Errors.

The statistics available depend on the type of port or device selected. See Table 20 for troubleshooting information.

Ethernet Packet Loss Reference

This section explains terms that are relevant to Ethernet packet loss and provides additional conceptual and problem analysis detail.

Alignment Errors

An Alignment Error indicates a received frame in which both of the following are true:

• The number of bits received is an uneven byte count (that is, not an integral multiple of 8).

• The frame has a Frame Check Sequence (FCS) error.

Alignment Errors often result from MAC layer packet formation problems, cabling problems that cause corrupted or lost data, and packets that pass through more than two cascaded multiport transceivers. See FCS Errors for more information about interpreting Alignment Errors.


Collisions

Collisions indicate that two devices detect that the network is idle and try to send packets at exactly the same time (within one round-trip delay). Because only one device can transmit at a time, both devices must stop sending and attempt to retransmit. Collisions are detected by the transmitting stations.

The retransmission algorithm helps to ensure that the packets do not retransmit at the same time. However, if the two devices retry at nearly the same time, packets can collide again; the process repeats until either the packets finally pass onto the network without collisions, or 16 consecutive collisions occur and the packets are discarded.
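The retransmission algorithm is not named in this text. On standard Ethernet it is truncated binary exponential backoff, and the sketch below illustrates that scheme in Python. The constant and function names are assumptions made for the example, and the 51.2 microsecond slot time applies to 10 Mbps Ethernet only.

```python
import random

SLOT_TIME_US = 51.2   # one slot time on 10 Mbps Ethernet (512 bit times)
MAX_ATTEMPTS = 16     # the frame is discarded after 16 consecutive collisions
BACKOFF_LIMIT = 10    # the waiting range stops doubling after the 10th collision

def backoff_slots(collision_count: int, rng=random) -> int:
    """Pick a random wait, in slot times, after the nth consecutive collision."""
    if not 1 <= collision_count < MAX_ATTEMPTS:
        raise ValueError("the frame is discarded at 16 consecutive collisions")
    exponent = min(collision_count, BACKOFF_LIMIT)
    return rng.randrange(2 ** exponent)  # uniform over 0 .. 2**exponent - 1

def max_wait_us(collision_count: int) -> float:
    """Longest possible wait, in microseconds, after the nth consecutive collision."""
    return (2 ** min(collision_count, BACKOFF_LIMIT) - 1) * SLOT_TIME_US

for n in (1, 2, 3, 10, 15):
    print(f"collision {n}: wait up to {max_wait_us(n):.1f} microseconds")
```

Because each station draws its wait independently from a range that doubles with every collision, two colliding stations become steadily less likely to pick the same retry time.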

CRC Errors

A Cyclic Redundancy Check (CRC) Error is an RMON statistic that combines FCS Errors and Alignment Errors. These errors indicate that packets were received with:

• A bad FCS and an integral number of octets (FCS Errors)
• A bad FCS and a non-integral number of octets (Alignment Errors)

CRC Errors can cause an end station to freeze. If a large number of CRC Errors are attributed to a single station on the network, replace the station's network interface board. Typically, a CRC Error rate of more than 1 percent of network traffic is considered excessive.
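The two CRC Error categories and the 1 percent guideline can be expressed as a short Python sketch. The function names and the sample counter values are hypothetical; real counters would come from your RMON probe or management software.

```python
def classify_crc_error(bits_received: int) -> str:
    """Classify a frame that has already failed its FCS check."""
    # A whole number of octets means a plain FCS Error; otherwise an Alignment Error.
    return "FCS Error" if bits_received % 8 == 0 else "Alignment Error"

def crc_error_rate(fcs_errors: int, alignment_errors: int, total_frames: int) -> float:
    """Combined CRC Error rate as a fraction of all frames seen."""
    if total_frames <= 0:
        raise ValueError("total_frames must be positive")
    return (fcs_errors + alignment_errors) / total_frames

def is_excessive(rate: float) -> bool:
    """Apply the 'more than 1 percent' rule of thumb."""
    return rate > 0.01

# Hypothetical counters for one sampling interval.
rate = crc_error_rate(fcs_errors=90, alignment_errors=30, total_frames=10_000)
print(f"{rate:.2%} excessive: {is_excessive(rate)}")
```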

Excessive Collisions

Excessive Collisions indicate that 16 consecutive collisions have occurred, usually a sign that the network is becoming congested. For each excessive collision count (or after 16 consecutive collisions), a packet is dropped. If you know the normal rate of excessive collisions, then you can determine when the rate of packet loss is affecting your network's performance. See Knowing Your Network's Configuration for more information.

FCS Errors

Frame Check Sequence (FCS) Errors, a type of CRC, indicate that frames received by an interface are an integral number of octets long but do not pass the FCS check. The FCS is a mathematical way to ensure that all the frame's bits are correct without having the system examine each bit and compare it to the original. Packets with Alignment Errors also generate FCS Errors.

Both Alignment Errors and FCS Errors can be caused by equipment powering up or down or by interference (noise) on unshielded twisted-pair (10BASE-T) segments. In a network that complies with the Ethernet standard, FCS or Alignment Errors indicate bit errors during a transmission or reception. A very low rate is acceptable. Although Ethernet allows a 1 in 10^8 bit error rate, typical Ethernet performance is 1 in 10^12 or better.
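To make the FCS check concrete, here is a small Python sketch. It assumes the checksum is the CRC-32 computed by zlib.crc32, which uses the same polynomial as the Ethernet FCS, and it appends the value least significant byte first as software tools commonly do. The function names and the sample payload are invented for the example; real adapters perform this check in hardware.

```python
import zlib

def append_fcs(frame: bytes) -> bytes:
    """Return the frame with a 4-byte CRC-32 checksum appended."""
    return frame + zlib.crc32(frame).to_bytes(4, "little")

def fcs_ok(frame_with_fcs: bytes) -> bool:
    """Recompute the checksum over the body and compare it with the received one."""
    body, received = frame_with_fcs[:-4], frame_with_fcs[-4:]
    return zlib.crc32(body).to_bytes(4, "little") == received

sent = append_fcs(b"example payload standing in for an Ethernet frame")
damaged = bytes([sent[0] ^ 0x01]) + sent[1:]  # flip one bit, as line noise might

print("intact frame passes: ", fcs_ok(sent))
print("damaged frame passes:", fcs_ok(damaged))
```

The receiver never sees the original data; it only recomputes the checksum, which is why a single flipped bit anywhere in the frame is enough to register an FCS Error.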

Late Collisions

Late Collisions indicate that two devices have transmitted at the same time, but cabling errors (most commonly, excessive network segment length or repeaters between devices) prevent either transmitting device from detecting a collision. Neither device detects a collision because the time to propagate the signal from one end of the network to the other is longer than the time to put the entire packet on the network. As a result, neither of the devices that cause the late collision senses the other's transmission until the entire packet is on the network.

Although late collisions occur for small packets, the transmitter cannot detect them. As a result, a network suffering measurable Late Collisions for large packets is losing small packets as well.

Nonstandard Ethernet Problems

Table 21 lists the symptoms that typically occur if a system violates the Ethernet standard.

Table 21. Symptoms of Common Ethernet Network Problems

Symptoms: FCS Errors and Alignment Errors increase significantly.
Problem: Network cabling is too long.
Notes: If you use a promiscuous network monitor, the number of Late Collisions reported by stations should correlate with the FCS and Alignment Errors reported by the monitor.

Symptoms: FCS and Alignment Errors increase proportionally with interference (sometimes referred to as noise hits).
Problem: Network segment is noisy.
Notes: Typically observed on a 10BASE-T network segment in a noisy environment. If you use multiple promiscuous monitors, the FCS and Alignment Errors among the monitors will not correlate. If the monitor can track runts, also called Too Short Errors, the number of runt packets should be significantly higher than normal.

Symptoms: FCS and Alignment Errors are much higher than normal.
Problem: Networks do not conform to the access scheme of Carrier Sense Multiple Access with Collision Detect (CSMA/CD).
Notes: Occurs when some implementations of Ethernet in the segment are not entirely compatible with IEEE 802.3 repeaters. Collision fragments linger on the network long enough to collide with retry packets at the minimum interpacket gap (IPG). The IPG is smaller on one side of the repeated network, causing a lost packet. Ethernet controllers cannot receive packets that are separated by 4.7 µs or less. Some controllers cannot sustain receptions of packets separated by as much as 9.6 µs. If runt packets are received one after another and are followed by a collision fragment, Ethernet controllers that cannot sustain reception will lose packets.

Receive Discards

Receive Discards indicate that received packets could not be delivered to a high-layer protocol because of congestion or packet errors.

Too Long Errors

A Too Long Error indicates that a packet is longer than 1518 octets (including FCS octets) but otherwise well formed. Too Long Errors are often caused by a bad transceiver, a malfunction of the jabber protection mechanism on a transceiver, or excessive noise on the cable.

Too Short Errors

A Too Short Error, also called a runt, indicates that a packet is fewer than 64 octets long (including FCS octets) but otherwise well formed.

Transmit Discards

Transmit Discards indicate that packets were not transmitted because of network congestion.

FDDI Ring Errors

Use these sections to identify and correct FDDI ring errors:

• FDDI Ring Errors Overview
• Identifying Ring Errors

Fiber Distributed Data Interface (FDDI) often corrects its own problems. However, because FDDI cannot correct all errors (especially those related to hardware problems), you should monitor FDDI errors.

Understanding the Problem

FDDI ring errors that you should monitor include:

• Elasticity Buffer Error Condition
• Frame Error Condition
• Frames Not Copied Condition
• Link Error Condition

Identifying the Problem

First, determine the type of FDDI ring errors and where they are occurring. Similar to the way you identify other FDDI problems, identify the upstream and downstream neighbors of the devices that you are monitoring.

Several types of network errors can cause FDDI performance problems. For example, problems with cables or physical connections may result in a link or frame error. Elasticity buffer (EB) errors can also lead to link and frame errors.

FDDI deals with port-related errors as follows:

• The variable PORTlerAlarm is the link error rate (LER) value at which a link connection generates an alarm. When the LER is greater than the alarm setting, Station Management (SMT) sends a Status Report Frame (SRF) to notify you that there is a problem with a port.

The PORTlerAlarm threshold is set lower than the PORTlerCutoff threshold so that you are notified of a problem before the port is actually removed from the ring.

• When link errors reach the threshold defined by the variable PORTlerCutoff, SMT breaks the connection, disabling the PHY that detected the problem. A Link Error Condition is also generated.
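The two-threshold behavior for port errors can be sketched in Python as follows. This is an illustration only: it treats the link error rate and both thresholds as plain error rates, whereas a real FDDI station may encode these values differently, and the function name and sample numbers are hypothetical.

```python
def port_action(link_error_rate: float, ler_alarm: float, ler_cutoff: float) -> str:
    """Decide what happens to a port at a given link error rate.

    Assumes all three values are plain error rates and that the alarm
    threshold is lower than the cutoff threshold.
    """
    if ler_alarm >= ler_cutoff:
        raise ValueError("the alarm threshold must be lower than the cutoff threshold")
    if link_error_rate >= ler_cutoff:
        return "cutoff: connection broken, Link Error Condition generated"
    if link_error_rate > ler_alarm:
        return "alarm: Status Report Frame sent"
    return "ok"

# Hypothetical thresholds and measurements.
for measured in (1e-10, 5e-8, 2e-7):
    print(measured, "->", port_action(measured, ler_alarm=1e-8, ler_cutoff=1e-7))
```

Keeping the alarm below the cutoff gives you a window in which the port is still on the ring but has already been flagged.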

FDDI deals with MAC-related errors as follows:

• When MAC frame errors reach a certain threshold, a Frame Error Condition is generated. Because the actual error can be further upstream than the immediate connection, the connection remains intact.

• For a large network, the worst case MACFrameErrorRatio is less than 0.1 percent. However, during network configuration, frame error ratios can reach 50 percent for short periods. When you detect a sustained frame error ratio of more than 0.1 percent, a problem exists between the station that is reporting the condition and the nearest upstream MAC.

Solving the Problem

To solve problems related to FDDI errors, fix the hardware, cabling, or congestion problem.

Identifying Ring Errors

Use Status Watch to monitor your FDDI devices for Warning or Critical alerts.

Status Watch

Use Status Watch to identify FDDI ring errors.

Follow these steps:

1. Monitor the FDDI Status tool for the currently selected device.

2. Determine whether Status Watch is reporting Elasticity Buffer Errors or a high percentage of Frame Errors, Frames Not Copied, or Link Error Rates for the currently selected device.

FDDI Ring Errors Reference

This section provides additional conceptual and problem analysis detail.

Elasticity Buffer Error Condition

The Elasticity Buffer Error condition occurs when a port's elasticity buffer overflows or underflows. This condition usually indicates that a port's hardware is not operating within the tolerances that the FDDI standard specifies. Look for the problem in the hardware of either the port that is reporting the condition or of the immediately adjacent port.

Frame Error Condition

The Frame Error condition occurs when the percentage of frames that contain errors exceeds a preset threshold. In the situation when a device is an uplink to FDDI (that is, a device is transmitting onto FDDI), this type of condition indicates that the ring is saturated. The ring is out of buffer space, and packets are being dropped from the device's backbone port.

The problem indicated by the frame errors is usually located between the MAC that reports the condition and its upstream neighbor. Because many physical connections can lie along this path, the MACFrameErrorRatio variable identifies only the two MACs between which the problem is occurring.

Frames Not Copied Condition

The Frames Not Copied condition occurs when the percentage of frames that are dropped because of insufficient buffer space exceeds a preset threshold. This condition indicates that the station is congested and is unable to process frames as quickly as they arrive. To help eliminate congestion:

• Add more capacity to the station.
• Reconfigure your network so that end stations that communicate heavily with one another are on the same bridge or switch.
• Filter out certain traffic.

Link Error Condition

The Link Error condition occurs when a port detects link errors at a rate that exceeds a preset threshold. When the Link Error threshold is exceeded, the station removes itself from the ring and tries to reinsert itself on the ring. This action creates a MAC Neighbor Change Event (which also occurs if a ring wraps).

Link errors may indicate an FDDI PHY hardware problem (such as a faulty transmitter) or a faulty cable or connector. Look for the problem in the portion of the network between the port that is reporting the condition and the first upstream transmitter.

MAC Neighbor Change Event

The MAC Neighbor Change event occurs when a MAC's upstream or downstream neighbor changes.

This event indicates either:

• A network reconfiguration
• Another station that is leaving or joining the ring

Network File Server Timeouts

Use these sections to identify and correct timeouts on network file servers:

• Network File Server Timeout Overview
• Looking for Obvious Errors
• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

See Network File Server Timeouts Reference for additional conceptual and problem analysis detail.

A network file server can time out if your network gets congested or if your server is having problems. Users might have problems downloading data from or to the server or copying files from or to the server. To help you to understand the troubleshooting process for this type of problem, an EXAMPLE throughout this section follows the symptoms, analysis, and resolution of a typical file server timeout problem.


Understanding the Problem

When users log in, their stations make network file server calls either to determine quotas (if this feature has been enabled) or to mount user home directories. The network file server timeout messages, even when spread across multiple nodes, indicate a problem either with the network or with a server.

EXAMPLE: UNIX users notice that it takes a long time (over 30 seconds in some cases) to log in to any machine. Some machines report network file server timeout messages, but the messages have no obvious pattern and are infrequent. You begin to get a sense of the problem.

Identifying the Problem

First, rule out the obvious causes. Ask these questions:

• Can you access the network file server with Telnet?
• Have any alarms been triggered?
• Are there any new errors?

The process of identifying the problem is developed in Looking for Obvious Errors.

Solving the Problem

To determine the cause, reproduce the fault while you monitor the network. After you know the cause, you can fix the problem.

The solutions to the network file server timeout are identified in these sections:

• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

Looking for Obvious Errors

To look for obvious errors, use these applications:

• Ping and Telnet - To check for connectivity to the network file server nodes
• LANsentry Manager Alarms View - To search for triggered alarms
• LANsentry Manager Statistics View - To look for errors
• LANsentry Manager History View - To identify trends


Ping and Telnet

Determine whether you can contact network file server nodes using Ping and Telnet. If the response is extremely slow, then a problem may exist with the connections to the nodes. No delay indicates that the connections are normal, implying that the delay is occurring elsewhere. In this case, use LANsentry® Manager tools to determine whether packets are being lost or ignored.
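If you want a number to compare across nodes rather than an impression of "slow", the connection test can be approximated with a short Python sketch that times how long a TCP connection takes to open. The function name connect_time and the host name in the comment are hypothetical; port 23 is the standard Telnet port.

```python
import socket
import time

def connect_time(host: str, port: int, timeout: float = 5.0):
    """Return the seconds taken to open a TCP connection, or None if it fails."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        return None  # refused, unreachable, or timed out

# Example (hypothetical server name):
#   elapsed = connect_time("fileserver.example", 23)
#   print("no connection" if elapsed is None else f"connected in {elapsed:.3f} s")
```

A connection that opens quickly while logins still stall supports the conclusion above that the delay lies elsewhere than basic connectivity.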

LANsentry Manager Alarms View

Using the LANsentry Manager Alarms View, you can determine if any configured alarms have been triggered.

Search the Alarms View to see if any MAC events have been logged.

EXAMPLE: MAC events have not been logged for the network on which the UNIX users are attached.

Even though no alarms have occurred, errors may exist. For example, a lower rate of background errors may exist just below the alarm threshold. Because they are based on maximum and minimum values, RMON alarms may miss constant, periodic, or low amounts of errors.
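A small Python sketch shows why a threshold alarm can stay silent while errors accumulate. The sample counts and the threshold of 10 are invented for illustration and do not come from any real probe.

```python
def alarm_triggered(samples, rising_threshold):
    """True if any single sample exceeds the alarm threshold."""
    return any(count > rising_threshold for count in samples)

THRESHOLD = 10  # hypothetical: errors per sample needed to raise an alarm

steady = [9] * 48           # constant errors just under the threshold
burst = [0] * 47 + [25]     # one short burst, otherwise clean

for name, samples in (("steady", steady), ("burst", burst)):
    print(name, "total errors:", sum(samples),
          "alarm:", alarm_triggered(samples, THRESHOLD))
```

In this made-up data the steady segment loses far more packets in total than the bursty one, yet only the burst would raise an alarm, which is why the statistics and history views are worth checking even when the Alarms View is empty.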

LANsentry Manager Statistics View

Using the LANsentry Manager Statistics View, you can display a multisegment graph of utilization and error statistics.

Follow these steps:

1. Set up a graph that shows utilization and errors on all your major segments.

2. Determine whether any segments are particularly busy or error prone.

EXAMPLE: You notice that one segment of the UNIX network, HUB3, is reporting Too Long Errors and FCS Errors roughly every second sample. While the amount is not higher than normal, it is currently higher than any other segment.

LANsentry Manager History View

Using the LANsentry Manager History View, you can display a rolling history table to determine if the errors that you are seeing are new. For example, if you have a history table that runs for 30-minute samples over two days, you can compare the most recent sample to a previous sample, looking for new errors. If your probe has the resources, use a much finer resolution sample stored for a shorter time (every 30 seconds for 2 hours) to more easily spot recent errors.

EXAMPLE: You see that the history table shows that no error rates remained constant throughout the day. However, errors that did occur were on the device HUB3 and were Too Long Errors and FCS Errors.

Reproducing the Fault While Monitoring the Network

Although the RMON View in LANsentry Manager can show error rates and help you to identify the location of the problem, it may not provide enough data to solve the problem. To determine the cause of the problem, reproduce it while you monitor the network by using these applications:

• LANsentry Manager Top-N Graph - To locate a quiet node to use for reproducing the fault

• LANsentry Manager Packet Capture - To capture packets from the hub to which the quiet node is attached

• LANsentry Manager Packet Decode - To analyze the packets to assess network file server traffic and delays

• Address Tracker - To find the location of the problem nodes

EXAMPLE: Using LANsentry Manager, you find a hub on the network with a higher than normal error rate. However, the error rate does not seem high enough to cause login delays of 60 seconds or more.

LANsentry Manager Top-N Graph

Using the Top-N graph in the LANsentry Manager main window, locate a quiet node that has been showing the same problem. Choose a quiet node so that you do not receive excessive traffic when you try to isolate the problem.

EXAMPLE: You see that the node Monolith, which has the same Network File System (NFS) mounts as the other nodes on the network, is quiet. You decide to use this node for reproducing the fault. See Network File System (NFS) Protocol for more information about NFS.

LANsentry Manager Packet Capture

Using the LANsentry Manager Packet Capture application, capture packets from the network using predefined patterns and start-and-stop conditions.

Follow these steps:

1. Set up a capture buffer on a probe that is connected to the same hub as the quiet node. Until you know more about the problem, set a very general filter.

EXAMPLE: You select a MAC-layer filter and set a conversation filter to capture all packets to and from Monolith.

2. Telnet into and log out from the quiet node. Then reset the capture buffer. Repeat this procedure until you see the problem reflected in your captured data. To keep the buffer information clean, reset the buffer each time that you repeat the procedure.

3. When you see the delay, note the rough value of the packet count on the LANsentry Manager packet buffer.

By noting the packet count at which you think the delay has occurred, you can narrow the problem to within about 20 packets in the buffer. If you have used an extremely quiet node, you may even identify the exact packet.

LANsentry Manager Packet Decode

The LANsentry Manager Packet Decode application decodes all major protocols and displays the packet contents at three levels of detail: summary information, header information, and actual packet content.

Follow these steps:

1. Open the buffer in the Packet Decode application and locate the number of the packet at which the delay occurred.

2. Select the packet and launch a MAC-layer conversation filter. In the filter display, look for a gap in the conversation (that is, where the node sent a request and then resent it at approximately the same rate as the delay you experienced when recreating the problem).

3. Repeat the test to determine if the result concentrates on one node or if it appears on other nodes.

EXAMPLE: On the quiet node that you selected, the delay is obvious. You see an NFS request going out to a node and a repeat of the request 30 seconds later. During that time, the node did not respond. You now know that the delay occurred because nodes were not seeing responses for NFS requests. When you repeat the test on other nodes, you find that the delay is happening with more than one destination node.
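The search for a gap in the conversation can be sketched in Python. This is a minimal illustration, assuming the decoded capture has been exported as (timestamp in seconds, request description) pairs; the function name `find_retransmit_gaps`, the export format, and the sample packets are hypothetical and are not features of LANsentry Manager.

```python
from typing import Dict, List, Tuple

def find_retransmit_gaps(packets: List[Tuple[float, str]], min_gap: float) -> List[Tuple[str, float]]:
    """Return (request, gap in seconds) for each request that was resent
    at least min_gap seconds after its previous transmission."""
    last_seen: Dict[str, float] = {}
    gaps: List[Tuple[str, float]] = []
    for timestamp, request in packets:
        if request in last_seen:
            gap = timestamp - last_seen[request]
            if gap >= min_gap:
                gaps.append((request, gap))
        last_seen[request] = timestamp
    return gaps

# Hypothetical capture: one NFS request is repeated about 30 seconds later
capture = [
    (0.0, "NFS LOOKUP xid=1"),
    (0.2, "NFS READ xid=2"),
    (30.0, "NFS LOOKUP xid=1"),
]
print(find_retransmit_gaps(capture, min_gap=25.0))
```

Setting `min_gap` slightly below the delay that users report keeps ordinary retransmissions out of the result.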

Address Tracker

Use Address Tracker, which polls managed devices, to determine the hubs to which the problem nodes are attached. If the problem end stations are located on unmanaged devices, then you can at least narrow the problem to those unmanaged devices.

EXAMPLE: Although your network does not have managed hubs that Transcend® NCS management software can poll, it does have managed switches. When it polls the switches, Address Tracker displays the switch ports on which addresses were last seen. This information indicates the hub (but not the hub port) on which the device is located.

LANsentry Manager Packet Decode

After you know the location of the hub that has the problem node, monitor the problem from the hub using LANsentry Manager Packet Decode.

Follow these steps:

1. To capture packets from one of the nodes on the hub, set up another capture buffer and repeat the exercise that is described in LANsentry Manager Packet Capture. Because a delay may occur on a different node, use two capture buffers without stopping the first one.

Note the rough packet count where the delay appears.

2. Display a conversation filter of the packet where the delay appears and look for the gap in the conversation.

EXAMPLE: You hope that the nodes are on the same hub. You find that all the nodes are on HUB3. This result indicates that FCS Errors may be causing the timeouts. However, because the errors occur at a low rate, you decide to verify this diagnosis. You monitor the problem from the hub, logging in and out many times, and the delay eventually occurs. This time, the delay shows that the node's reply had an FCS Error, even though the node received the request. The switch would not have transmitted this packet, causing a timeout on the NFS protocol. The retry time is presumably 30 seconds. During this test, you see the problem occurring on another node.

Correcting the Fault

Without a managed hub, you may find it very difficult to discover network file server timeout errors. To find the problematic node, you must either systematically isolate nodes by monitoring each node for a prolonged period or temporarily insert a managed hub.

EXAMPLE: You notice that the captured error packet failed FCS because it was corrupted by a regular pattern during transmission. A possible reason for this occurrence is a Jabbering node. This explanation makes sense because FCS/Jabber frames increased linearly when you were monitoring the live network.

Network File Server Timeouts Reference

This section explains terms that are relevant to network file server timeouts and provides additional conceptual and problem analysis detail.

Jabbering

A condition in which a node transmits illegal-length packets and is possibly not operating within carrier specifications. In effect, another node has written bad data over a valid packet. This bad data is often interpreted as a repeated sequence of data.

Network File System (NFS) Protocol

A distributed file system protocol developed by Sun Microsystems that allows a computer system to access files over a network as if they were on its local disks. This protocol has been incorporated into products by more than 200 companies. It is now a de facto Internet standard.

NFS is one protocol in the NFS suite of protocols, which includes NFS, RPC, XDR (External Data Representation), and others. These protocols are part of a larger architecture that Sun Microsystems refers to as Open Network Computing (ONC). ONC is a distributed applications architecture designed by Sun and currently controlled by a consortium led by Sun.

Troubleshooting TCP/IP - Detailed Steps

This article shows how to troubleshoot TCP/IP connectivity between computers on a Windows network. If you haven't already done so, disable XP's Internet Connection Firewall on all local area network connections, and remove all firewall programs on the network. Improperly configured firewalls are the most common cause of TCP/IP problems.

Open a Command Prompt Window

For many of these steps, you'll be typing at the command prompt. To open a command prompt window in Windows 2000 or XP, click Start | Run, type cmd in the box, and click OK. To open a command prompt window in Windows 95, 98, or Me, click Start | Run, type command in the box, and click OK. Type one command per line, and press Enter after each one to execute it. To close the command prompt window, use the exit command.

Determine the TCP/IP Settings

Determine the TCP/IP settings of each computer on the local area network. In XP, open the Network Connections folder, right click the LAN connection, and click Status | Support | Details. For example, here are the Status and Details views for the LAN connection on an Internet Connection Sharing host.

In Windows 95/98/Me, click Start | Run, type winipcfg in the box, and click OK. Select the LAN adapter from the menu, and click More Info. Here's the winipcfg view for an ICS client running Windows Me.

You can also see the TCP/IP settings from the command prompt. This is especially convenient if a computer has more than one network adapter. Use the ipconfig /all command, which is available in all versions except Windows 95. The output from this command can be long, so it's best to write it to a file. Specify the file name in the command this way:

ipconfig /all >ipconfig.txt

Description of TCP/IP Settings

Here are the TCP/IP settings that are used in network troubleshooting:

• IP Address - Unique address assigned to a network adapter. A computer with multiple network adapters has an IP address for each one, and each one must be in a different subnet.

• Subnet Mask - Used in conjunction with the IP address to determine which subnet an adapter belongs to. At the simplest level, communication is only possible between two network adapters when they're in the same subnet.

• Default Gateway - IP address of a computer or router, on one of this computer's local area networks, that knows how to communicate with subnets not present on this computer. For an Internet connection, the default gateway is a router belonging to your Internet service provider, and all access to sites on the Internet goes through it. For an ICS client, the default gateway is the ICS host. If you use a hardware router, it serves as the default gateway.

• DHCP Server - If an adapter is configured to obtain an IP address automatically, this is the address of the server that provides it. It could be your ISP, an ICS host, or a hardware router.

• DNS Servers - IP address of one or more Domain Name Server computers. DNS servers translate Internet names to their IP addresses (like 63.146.109.227).

Subnets

See our article on subnets for a brief description of how they work. For more details, see this Microsoft Knowledge Base article.

If two computers are supposed to be on the same subnet but aren't, something is wrong with the network hardware or software configuration. This is most likely to happen when one of them receives an IP address of 169.254.x.x, which indicates that:

• It's configured to obtain an IP address automatically.
• It couldn't find a DHCP server on the network to make the assignment.
• Windows assigned it an Automatic Private IP Address.

See our article on Specific Networking Problems and Their Solutions for more information.
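Both checks described in this section (whether two adapters share a subnet, and whether an address is an Automatic Private IP Address) can be sketched with Python's standard ipaddress module. The helper names `same_subnet` and `is_apipa` and the sample addresses are illustrative assumptions, not part of any Windows tool.

```python
import ipaddress

def same_subnet(ip_a: str, ip_b: str, subnet_mask: str) -> bool:
    """True when both addresses fall in the same subnet under the given mask."""
    network_a = ipaddress.ip_interface(f"{ip_a}/{subnet_mask}").network
    network_b = ipaddress.ip_interface(f"{ip_b}/{subnet_mask}").network
    return network_a == network_b

def is_apipa(ip: str) -> bool:
    """True when the address lies in 169.254.0.0/16, the range Windows uses
    for Automatic Private IP Addressing."""
    return ipaddress.ip_address(ip) in ipaddress.ip_network("169.254.0.0/16")

# Hypothetical addresses for two computers on a home network
print(same_subnet("192.168.1.101", "192.168.1.123", "255.255.255.0"))
print(same_subnet("192.168.1.101", "169.254.10.20", "255.255.255.0"))
print(is_apipa("169.254.10.20"))
```

If `is_apipa` is true for one computer, the likely next step is to find out why that computer could not reach a DHCP server.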

Pinging

The ping command is the basic tool for testing TCP/IP connectivity. It sends a special packet (called ICMP Echo) to a particular IP address and looks for a reply. If everything is working right, the reply comes back. If not, the ping times out in a few seconds. By default, the ping command repeats the process four times. Here's an example of an ICS client computer pinging a Windows XP Home Edition ICS host, using the host's IP address and its computer name.

When ping fails, you'll see one of these error messages:

• Request timed out - The IP address is valid, but there's no reply from it. If the IP address is on a local area network, the most likely cause is a firewall program blocking the ping.

• Unknown host <name> or Ping request could not find host <name> - The computer name doesn't exist on the local area network. Make sure that NetBIOS over TCP/IP is enabled.

• Destination host unreachable - The IP address isn't on a local area network, and the default gateway can't access it. Either there's no default gateway, its address is wrong, or it isn't functioning.

Pinging the Local Area Network

Here is a series of ping commands to use in finding where a problem occurs on a local area network. Run them in the order shown, and don't go on to the next command until all of the previous commands work properly. In this example:

• The computer being tested is named Winxp, with IP address 192.168.1.101.
• There's another computer on the network, named Win98, with IP address 192.168.1.123.

Substitute the appropriate IP addresses and computer names for your network.

Command              Target                         What Ping Failure Indicates
ping 127.0.0.1       Loopback address               Corrupted TCP/IP installation
ping localhost       Loopback name                  Corrupted TCP/IP installation
ping 192.168.1.101   This computer's IP address     Corrupted TCP/IP installation
ping winxp           This computer's name           Corrupted TCP/IP installation
ping 192.168.1.123   Another computer's IP address  Bad hardware or NIC driver
ping win98           Another computer's name        NetBIOS name resolution failure

To fix a corrupted TCP/IP installation on Windows XP, follow the steps in this Microsoft Knowledge Base article. For Windows 95/98/Me, un-install the TCP/IP protocol in Control Panel | Network, reboot, and re-install it. If that doesn't fix it, use this procedure on Windows 95 or 98.
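The ordered ping sequence for the local area network can be sketched as a small Python script. This is an illustration under stated assumptions: the addresses and names are the example values from this section, the helper names `ping_once` and `first_failure` are hypothetical, and treating the ping exit status as the success indicator is a simplification (some ping implementations report success even when the reply is "Destination host unreachable").

```python
import platform
import subprocess
from typing import Callable, List, Optional, Tuple

# (target, what a failure indicates), in the order the commands should be run
LAN_CHECKS: List[Tuple[str, str]] = [
    ("127.0.0.1", "Corrupted TCP/IP installation"),
    ("localhost", "Corrupted TCP/IP installation"),
    ("192.168.1.101", "Corrupted TCP/IP installation"),
    ("winxp", "Corrupted TCP/IP installation"),
    ("192.168.1.123", "Bad hardware or NIC driver"),
    ("win98", "NetBIOS name resolution failure"),
]

def ping_once(target: str) -> bool:
    """Send a single echo request; assume exit status 0 means a reply arrived."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    result = subprocess.run(
        ["ping", count_flag, "1", target],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0

def first_failure(
    checks: List[Tuple[str, str]],
    ping: Callable[[str], bool] = ping_once,
) -> Optional[Tuple[str, str]]:
    """Run the checks in order and stop at the first one that fails."""
    for target, meaning in checks:
        if not ping(target):
            return target, meaning
    return None
```

Calling `first_failure(LAN_CHECKS)` returns the first target that did not answer together with the likely cause, or `None` when every ping succeeded. The `ping` parameter exists so that the sequence can be exercised without touching the network.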

Pinging the Internet

You can also use ping to find a problem with Internet access. Run these commands in the order shown, and don't go on to the next command until all of the previous commands work properly. Use the Default Gateway and DNS Server addresses that you got from the winipcfg or ipconfig /all command.

Command                 Target               What Ping Failure Indicates
ping w.x.y.z            Default Gateway      Default Gateway down
ping w.x.y.z            DNS Server           DNS Server down
ping w.x.y.z            Web site IP address  Internet service provider or web site down
ping www.something.com  Web site name        DNS Server down or web site down

REFERENCES

WWW.PRACTICALLYNETWORK.COM

WWW.COMPUTERHOPE.COM

WWW.WIKIPEDIA.ORG

  • Basic Network Troubleshooting
  • Basic network troubleshooting
    • Issue
    • Cause
    • Solution
      • Verify connections LEDs
      • Adapter resources
      • Adapter functionality
      • Protocol
      • Firewall
      • Additional time
      • Additional troubleshooting
        • Network Troubleshooting Overview
          • About Connectivity Problems
          • About Performance Problems
          • Solving Connectivity and Performance Problems
          • Recognizing Symptoms
            • User Comments
            • Network Management Software Alerts
            • Analyzing Symptoms
              • Understanding the Problem
              • Identifying and Testing the Cause of the Problem
                • Sample Problem Analysis
                • Equipment for Testing
                  • Solving the Problem
                    • Your Network Troubleshooting Toolbox
                      • Transcend Central
                      • Status Watch
                        • Web Reporter
                          • Address Tracker
                          • LANsentry Manager
                          • Traffix Manager
                          • Device View
                          • Ping
                            • Strategies for Using Ping
                            • Tips on Interpreting Ping Messages
                              • Telnet
                              • FTP and TFTP
                              • Analyzers
                              • Probes
                              • Cable Testers
                              • Knowing Your Networks Configuration
                                • Site Network Map
                                • Device Configuration Information
                                • Other Important Data About Your Network
                                  • Identifying Your Networks Normal Behavior
                                    • Baselining Your Network
                                    • Identifying Background Noise
                                    • Identifying Background Noise
                                        • FDDI Connectivity
                                          • Understanding the Problem
                                          • Identifying the Problem
                                          • Solving the Problem
                                          • Status Watch
                                          • Implementing Dual Homing
                                          • Installing an Optical Bypass Unit
                                          • Peer Wrap Condition
                                          • Twisted Ring Condition
                                          • Undesired Connection Attempt Event
                                          • Address Tracker
                                          • Device View
                                          • Broadcast Packets
                                          • Multicast Packets
                                            • Duplicate Addresses
                                              • Understanding the Problem
                                              • Identifying the Problem
                                              • Solving the Problem
                                              • Status Watch
                                              • Address Tracker
                                              • LANsentry Manager
                                              • Duplicate MAC Addresses
                                              • Duplicate IP Addresses
                                                • Ethernet Packet Loss
                                                  • Understanding the Problem
                                                  • Identifying the Problem
                                                  • Solving the Problem
                                                  • Status Watch
                                                  • LANsentry Manager Network Statistics Graph
                                                  • Device View
                                                  • Alignment Errors
                                                  • Collisions
                                                  • CRC Errors
                                                  • Excessive Collisions
                                                  • FCS Errors
                                                  • Late Collisions
                                                  • Nonstandard Ethernet Problems
                                                  • Receive Discards
                                                  • Too Long Errors
                                                  • Too Short Errors
                                                  • Transmit Discards
                                                    • FDDI Ring Errors
                                                      • Understanding the Problem
                                                      • Identifying the Problem
                                                      • Solving the Problem
                                                      • Status Watch
                                                      • Elasticity Buffer Error Condition
                                                      • Frame Error Condition
                                                      • Frames Not Copied Condition
                                                      • Link Error Condition
                                                      • MAC Neighbor Change Event
                                                        • Network File Server Timeouts
                                                          • Understanding the Problem
                                                          • Identifying the Problem
                                                          • Solving the Problem
                                                          • Ping and Telnet
                                                          • LANsentry Manager Alarms View
                                                          • LANsentry Manager Statistics View
                                                          • LANsentry Manager History View
                                                          • LANsentry Manager Top-N Graph
                                                          • LANsentry Manager Packet Capture
                                                          • LANsentry Manager Packet Decode
                                                          • Address Tracker
                                                          • LANsentry Manager Packet Decode
                                                          • Jabbering
                                                          • Network File System (NFS) Protocol
                                                          • Troubleshooting TCPIP - Detailed Steps
                                                          • Open a Command Prompt Window
                                                          • Determine the TCPIP Settings
                                                          • Description of TCPIP Settings
                                                          • Subnets
                                                          • Pinging
                                                          • Pinging the Local Area Network
                                                          • Pinging the Internet
Page 46: 11591337-Basic-Network-Troubleshooting

total number of readable packets can point to a hardware problem (a bad adapter) or to a data loop.

• A high proportion of Broadcast Packets (>10%) on a heavily utilized network (>50% of available bandwidth) can point to an incorrectly configured bridge or router on the network.

Errors: Displays the number of frames with errors on the selected port.

The significance of errors depends on accompanying errors and prevailing network conditions. See the following error data for more information:

• Alignment Errors, Table 16
• FCS Errors, Table 16
• Too Long Errors, Table 19
• Too Short Errors, or runts, Table 19
• Late Collisions, Table 18

To display Activity and Errors statistics for a device or port, follow these steps:

1. Select the required port or device.

2. From the shortcut menu, select Activity or Errors.

The statistics available depend on the type of port or device selected. See Table 20 for troubleshooting information.

Ethernet Packet Loss Reference

This section explains terms that are relevant to Ethernet packet loss and provides additional conceptual and problem analysis detail.

Alignment Errors

An Alignment Error indicates a received frame in which both are true:

• The number of bits received is an uneven byte count (that is, not an integral multiple of 8).

• The frame has a Frame Check Sequence (FCS) error.

Alignment Errors often result from MAC layer packet formation problems, cabling problems that cause corrupted or lost data, and packets that pass through more than two cascaded multiport transceivers. See FCS Errors for more information about interpreting Alignment Errors.

Collisions

Collisions indicate that two devices detect that the network is idle and try to send packets at exactly the same time (within one round-trip delay). Because only one device can transmit at a time, both devices must stop sending and attempt to retransmit. Collisions are detected by the transmitting stations.

The retransmission algorithm helps to ensure that the packets do not retransmit at the same time. However, if the two devices retry at nearly the same time, packets can collide again; the process repeats until either the packets finally pass onto the network without collisions, or 16 consecutive collisions occur and the packets are discarded.
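The retransmission behavior can be sketched in Python. The text does not spell out the algorithm, so this sketch assumes the truncated binary exponential backoff commonly described for IEEE 802.3: after the nth consecutive collision, a station waits a random number of slot times between 0 and 2^min(n, 10) - 1, and gives up after 16 attempts. The function names and the `collides` callback are hypothetical.

```python
import random
from typing import Callable, Tuple

MAX_ATTEMPTS = 16   # after 16 consecutive collisions the packet is discarded
BACKOFF_LIMIT = 10  # assumed cap: the exponent stops growing after 10 collisions

def backoff_slots(collision_count: int, rng: random.Random) -> int:
    """Pick a random wait, in slot times, after the nth consecutive collision."""
    exponent = min(collision_count, BACKOFF_LIMIT)
    return rng.randint(0, 2 ** exponent - 1)

def transmit(collides: Callable[[int], bool], rng: random.Random) -> Tuple[bool, int]:
    """Try to send one packet.

    collides(attempt) reports whether the given attempt (1-based) collided.
    Returns (delivered, attempts_used).
    """
    for attempt in range(1, MAX_ATTEMPTS + 1):
        if not collides(attempt):
            return True, attempt
        # A real station would now stay silent for this many slot times.
        backoff_slots(attempt, rng)
    return False, MAX_ATTEMPTS

rng = random.Random(1)
print(transmit(lambda attempt: attempt < 3, rng))  # collides twice, then succeeds
print(transmit(lambda attempt: True, rng))         # always collides: discarded
```

Because each station draws its own random wait, two stations that collided are unlikely to retry in the same slot, and the widening range makes repeated collisions progressively less likely.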

CRC Errors

A Cyclic Redundancy Check (CRC) Error is an RMON statistic that combines FCS Errors and Alignment Errors. These errors indicate that packets were received with:

• A bad FCS and an integral number of octets (FCS Errors)
• A bad FCS and a non-integral number of octets (Alignment Errors)

CRC Errors can cause an end station to freeze. If a large number of CRC Errors are attributed to a single station on the network, replace the station's network interface board. Typically, a CRC Error rate of more than 1 percent of network traffic is considered excessive.
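The 1 percent guideline can be checked with simple arithmetic. The sketch below assumes that "network traffic" means the total number of packets counted in the same interval; the function names and sample counts are hypothetical.

```python
def crc_error_percent(fcs_errors: int, alignment_errors: int, total_packets: int) -> float:
    """CRC Errors (FCS Errors plus Alignment Errors) as a percentage of all packets."""
    if total_packets == 0:
        return 0.0
    return 100.0 * (fcs_errors + alignment_errors) / total_packets

def is_excessive(percent: float, threshold_percent: float = 1.0) -> bool:
    """Apply the rule of thumb that more than 1 percent is excessive."""
    return percent > threshold_percent

# Hypothetical counters read from a probe over one sampling interval
rate = crc_error_percent(fcs_errors=15, alignment_errors=5, total_packets=1000)
print(rate, is_excessive(rate))
```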

Excessive Collisions

Excessive Collisions indicate that 16 consecutive collisions have occurred, usually a sign that the network is becoming congested. For each excessive collision count (or after 16 consecutive collisions), a packet is dropped. If you know the normal rate of excessive collisions, then you can determine when the rate of packet loss is affecting your network's performance. See Knowing Your Network's Configuration for more information.

FCS Errors

Frame Check Sequence (FCS) Errors, a type of CRC, indicate that frames received by an interface are an integral number of octets long but do not pass the FCS check. The FCS is a mathematical way to ensure that all the frame's bits are correct without having the system examine each bit and compare it to the original. Packets with Alignment Errors also generate FCS Errors.
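The idea behind the FCS check can be sketched in Python with the standard zlib module, which implements a 32-bit CRC. This is a simplified illustration, not a model of Ethernet hardware: the helper names `append_fcs` and `fcs_ok` are hypothetical, and the byte order used for the appended checksum is an assumption made for the example.

```python
import zlib

def append_fcs(payload: bytes) -> bytes:
    """Sender side: append a 4-octet CRC-32 checksum to the payload."""
    return payload + zlib.crc32(payload).to_bytes(4, "little")

def fcs_ok(frame: bytes) -> bool:
    """Receiver side: recompute the checksum and compare it with the received one."""
    payload, received_fcs = frame[:-4], frame[-4:]
    return zlib.crc32(payload).to_bytes(4, "little") == received_fcs

frame = append_fcs(b"hello network")
corrupted = bytes([frame[0] ^ 0x01]) + frame[1:]  # flip one bit in transit
print(fcs_ok(frame), fcs_ok(corrupted))
```

The receiver never sees the original data. It only recomputes the checksum over what arrived, which is why a single flipped bit is enough to make the comparison fail.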

Both Alignment Errors and FCS Errors can be caused by equipment powering up or down or by interference (noise) on unshielded twisted-pair (10BASE-T) segments. In a network that complies with the Ethernet standard, FCS or Alignment Errors indicate bit errors during a transmission or reception. A very low rate is acceptable. Although Ethernet allows a 1 in 10^8 bit error rate, typical Ethernet performance is 1 in 10^12 or better.

Late Collisions

Late Collisions indicate that two devices have transmitted at the same time, but cabling errors (most commonly, excessive network segment length or repeaters between devices) prevent either transmitting device from detecting a collision. Neither device detects a collision because the time to propagate the signal from one end of the network to the other is longer than the time to put the entire packet on the network. As a result, neither of the devices that cause the late collision senses the other's transmission until the entire packet is on the network.

Although late collisions occur for small packets, the transmitter cannot detect them. As a result, a network suffering measurable Late Collisions for large packets is losing small packets as well.
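The timing condition described here can be worked through numerically. The sketch below simply compares a propagation delay with the time needed to put a frame on the wire; the function names and the 60-microsecond delay are hypothetical values chosen for illustration, and a 10 Mbps data rate is assumed.

```python
def transmit_time_us(frame_octets: int, bits_per_second: float) -> float:
    """Time, in microseconds, needed to put a whole frame on the network."""
    return frame_octets * 8 / bits_per_second * 1_000_000

def collision_goes_unnoticed(propagation_us: float, frame_octets: int,
                             bits_per_second: float) -> bool:
    """True when the signal takes longer to cross the network than the frame
    takes to transmit, so the sender has finished before it could sense a collision."""
    return propagation_us > transmit_time_us(frame_octets, bits_per_second)

RATE = 10_000_000  # assumed 10 Mbps Ethernet
print(transmit_time_us(64, RATE))                  # smallest legal frame
print(collision_goes_unnoticed(60.0, 64, RATE))    # small frame on an over-long network
print(collision_goes_unnoticed(60.0, 1518, RATE))  # large frame on the same network
```

A small frame finishes transmitting sooner than a large one, so on an over-long network the sender of a small frame is the one most likely to miss the collision entirely.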

Nonstandard Ethernet Problems

Table 21 lists the symptoms that typically occur if a system violates the Ethernet standard.

Table 21. Symptoms of Common Ethernet Network Problems

Symptoms: FCS Errors and Alignment Errors increase significantly.
Problem: Network cabling is too long.
Notes: If you use a promiscuous network monitor, the number of Late Collisions reported by stations should correlate with the FCS and Alignment Errors reported by the monitor.

Symptoms: FCS and Alignment Errors increase proportionally with interference (sometimes referred to as noise hits).
Problem: Network segment is noisy.
Notes: Typically observed on a 10BASE-T network segment in a noisy environment. If you use multiple promiscuous monitors, the FCS and Alignment Errors among the monitors will not correlate.

If the monitor can track runts, also called Too Short Errors, the number of runt packets should be significantly higher than normal.

Symptoms: FCS and Alignment Errors are much higher than normal.
Problem: Networks do not conform to the access scheme of Carrier Sense Multiple Access with Collision Detect (CSMA/CD).
Notes: Occurs when some implementations of Ethernet in the segment are not entirely compatible with IEEE 802.3 repeaters.

Collision fragments linger on the network long enough to collide with retry packets at the minimum interpacket gap (IPG). The IPG is smaller on one side of the repeated network, causing a lost packet.

Ethernet controllers cannot receive packets that are separated by 4.7 µs or less. Some controllers cannot sustain receptions of packets separated by as much as 9.6 µs. If runt packets are received one after another and are followed by a collision fragment, Ethernet controllers that cannot sustain reception will lose packets.

Receive Discards

Receive Discards indicate that received packets could not be delivered to a high-layer protocol because of congestion or packet errors.

Too Long Errors

A Too Long Error indicates that a packet is longer than 1518 octets (including FCS octets) but otherwise well formed. Too Long Errors are often caused by a bad transceiver, a malfunction of the jabber protection mechanism on a transceiver, or excessive noise on the cable.

Too Short Errors

A Too Short Error, also called a runt, indicates that a packet is fewer than 64 octets long (including FCS octets) but otherwise well formed.
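The definitions in this reference section (Alignment Errors, FCS Errors, Too Long Errors, and Too Short Errors) can be summarized as one classification routine. This Python sketch is only a reading of those definitions; the function name `classify_frame` is hypothetical, and a frame with a valid FCS but a partial trailing octet is not covered by the definitions, so the sketch simply rounds its length down.

```python
MIN_OCTETS = 64    # shorter frames are runts (Too Short Errors)
MAX_OCTETS = 1518  # longer frames are Too Long Errors

def classify_frame(bit_length: int, fcs_valid: bool) -> str:
    """Classify a received frame from its length in bits and its FCS result."""
    whole_octets = bit_length % 8 == 0
    if not fcs_valid:
        # Bad FCS: integral octet count -> FCS Error, otherwise Alignment Error
        return "FCS Error" if whole_octets else "Alignment Error"
    octets = bit_length // 8  # assumption: ignore any partial trailing octet
    if octets > MAX_OCTETS:
        return "Too Long Error"
    if octets < MIN_OCTETS:
        return "Too Short Error"
    return "OK"

for bits, valid in [(64 * 8, True), (1519 * 8, True), (63 * 8, True),
                    (100 * 8, False), (100 * 8 + 3, False)]:
    print(bits, valid, classify_frame(bits, valid))
```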

Transmit Discards

Transmit Discards indicate that packets were not transmitted because of network congestion.

FDDI Ring Errors

Use these sections to identify and correct FDDI ring errors:

• FDDI Ring Errors Overview
• Identifying Ring Errors

Fiber Distributed Data Interface (FDDI) often corrects its own problems. However, because FDDI cannot correct all errors (especially those related to hardware problems), you should monitor FDDI errors.

Understanding the Problem

FDDI ring errors that you should monitor include:

• Elasticity Buffer Error Condition
• Frame Error Condition
• Frames Not Copied Condition
• Link Error Condition

Identifying the Problem

First, determine the type of FDDI ring errors and where they are occurring. Similar to the way you identify other FDDI problems, identify the upstream and downstream neighbors of the devices that you are monitoring.

Several types of network errors can cause FDDI performance problems. For example, problems with cables or physical connections may result in a link or frame error. Elasticity buffer (EB) errors can also lead to link and frame errors.

FDDI deals with port-related errors as follows:

• The variable PORTlerAlarm is the link error rate (LER) value at which a link connection generates an alarm. When the LER is greater than the alarm setting, Station Management (SMT) sends a Status Report Frame (SRF) to notify you that there is a problem with a port.

The PORTlerAlarm threshold is set lower than the PORTlerCutoff threshold so that you are notified of a problem before the port is actually removed from the ring.

• When link errors reach the threshold defined by the variable PORTLERCutoff, SMT breaks the connection, disabling the PHY that detected the problem. A Link Error Condition is also generated.

FDDI deals with MAC-related errors as follows:

• When MAC frame errors reach a certain threshold, a Frame Error Condition is generated. Because the actual error can be further upstream than the immediate connection, the connection remains intact.

• For a large network, the worst case MACFrameErrorRatio is less than 0.1 percent. However, during network configuration, frame error ratios can reach 50 percent for short periods. When you detect a sustained frame error ratio of more than 0.1 percent, a problem exists between the station that is reporting the condition and the nearest upstream MAC.
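The difference between a short spike and a sustained frame error ratio can be sketched in Python. This sketch reads the threshold in the text as 0.1 percent, and it treats "sustained" as a run of consecutive samples above that threshold; the run length of three samples and the function name `sustained_frame_errors` are assumptions made for illustration only.

```python
from typing import Iterable

def sustained_frame_errors(ratios_percent: Iterable[float],
                           threshold_percent: float = 0.1,
                           min_consecutive: int = 3) -> bool:
    """True when the frame error ratio stays above the threshold for at least
    min_consecutive samples in a row (an assumed definition of 'sustained')."""
    run = 0
    for ratio in ratios_percent:
        run = run + 1 if ratio > threshold_percent else 0
        if run >= min_consecutive:
            return True
    return False

# Hypothetical samples: a brief spike during reconfiguration, then a lasting problem
print(sustained_frame_errors([0.0, 50.0, 0.02, 0.01]))
print(sustained_frame_errors([0.2, 0.3, 0.15, 0.05]))
```

Requiring several consecutive samples keeps the brief, large ratios that occur during network configuration from being reported as a fault.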

Solving the Problem

To solve problems related to FDDI errors, fix the hardware, cabling, or congestion problem.

Identifying Ring Errors

Use Status Watch to monitor your FDDI devices for Warning or Critical alerts.

Status Watch

Use Status Watch to identify FDDI ring errors.

Follow these steps:

1. Monitor the FDDI Status tool for the currently selected device.

2. Determine whether Status Watch is reporting Elasticity Buffer Errors or a high percentage of Frame Errors, Frames Not Copied, or Link Error Rates for the currently selected device.

FDDI Ring Errors Reference

This section provides additional conceptual and problem analysis detail

Elasticity Buffer Error Condition

The Elasticity Buffer Error condition occurs when a ports elasticity buffer overflows or underflows This condition usually indicates that a ports hardware is not operating within the tolerances that the FDDI standard specifies Look for the problem in the hardware of either the port that is reporting the condition or of the immediately adjacent port

Frame Error Condition

The Frame Error condition occurs when the percentage of frames that contain errors exceeds a preset threshold In the situation when a device is an uplink to FDDI (that is a device is transmitting onto FDDI) this type of condition indicates that the ring is saturated The ring is out of buffer space and packets are being dropped from the devices backbone port

The problem indicated by the frame errors is usually located between the MAC that reports the condition and its upstream neighbor Because many physical connections can lie along this path the MACFrameErrorRatio variable identifies only the two MACs between which the problem is occurring

Frames Not Copied Condition

The Frames Not Copied condition occurs when the percentage of frames that are dropped because of insufficient buffer space exceeds a preset threshold. This condition indicates that the station is congested and is unable to process frames as quickly as they arrive. To help eliminate congestion:

• Add more capacity to the station.
• Reconfigure your network so that end stations that communicate heavily with one another are on the same bridge or switch.
• Filter out certain traffic.

Link Error Condition

The Link Error condition occurs when a port detects link errors at a rate that exceeds a preset threshold. When the Link Error threshold is exceeded, the station removes itself from the ring and tries to reinsert itself on the ring. This action creates a MAC Neighbor Change Event (which also occurs if a ring wraps).

Link errors may indicate an FDDI PHY hardware problem (such as a faulty transmitter) or a faulty cable or connector. Look for the problem in the portion of the network between the port that is reporting the condition and the first upstream transmitter.

MAC Neighbor Change Event

The MAC Neighbor Change event occurs when a MAC's upstream or downstream neighbor changes.

This event indicates either:

• A network reconfiguration
• Another station that is leaving or joining the ring

Network File Server Timeouts

Use these sections to identify and correct timeouts on network file servers:

• Network File Server Timeout Overview
• Looking for Obvious Errors
• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

See Network File Server Timeouts Reference for additional conceptual and problem analysis detail.

A network file server can time out if your network gets congested or if your server is having problems. Users might have problems transferring data to or from the server, or copying files to or from the server. To help you understand the troubleshooting process for this type of problem, an EXAMPLE throughout this section follows the symptoms, analysis, and resolution of a typical file server timeout problem.

Understanding the Problem

When users log in, their stations make network file server calls either to determine quotas (if this feature has been enabled) or to mount user home directories. The network file server timeout messages, even when spread across multiple nodes, indicate a problem either with the network or with a server.

EXAMPLE: UNIX users notice that it takes a long time (over 30 seconds in some cases) to log in to any machine. Some machines report network file server timeout messages, but the messages have no obvious pattern and are infrequent. You begin to get a sense of the problem.

Identifying the Problem

First, rule out the obvious causes. Ask these questions:

• Can you access the network file server with Telnet?
• Have any alarms been triggered?
• Are there any new errors?

The process of identifying the problem is developed in Looking for Obvious Errors.

Solving the Problem

To determine the cause, reproduce the fault while you monitor the network. After you know the cause, you can fix the problem.

The solutions to the network file server timeout are identified in these sections:

• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

Looking for Obvious Errors

To look for obvious errors, use these applications:

• Ping and Telnet - To check connectivity to the network file server nodes
• LANsentry Manager Alarms View - To search for triggered alarms
• LANsentry Manager Statistics View - To look for errors
• LANsentry Manager History View - To identify trends

Ping and Telnet

Determine whether you can contact network file server nodes using Ping and Telnet. If the response is extremely slow, then a problem may exist with the connections to the nodes. No delay indicates that the connections are normal, implying that the delay is occurring elsewhere. In this case, use LANsentry® Manager tools to determine whether packets are being lost or ignored.

LANsentry Manager Alarms View

Using the LANsentry Manager Alarms View, you can determine whether any configured alarms have been triggered.

Search the Alarms View to see if any MAC events have been logged.

EXAMPLE: MAC events have not been logged for the network to which the UNIX users are attached.

Even though no alarms have occurred, errors may exist. For example, a low rate of background errors may exist just below the alarm threshold. Because RMON alarms are based on maximum and minimum values, they may miss constant, periodic, or low amounts of errors.

LANsentry Manager Statistics View

Using the LANsentry Manager Statistics View, you can display a multisegment graph of utilization and error statistics.

Follow these steps:

1. Set up a graph that shows utilization and errors on all your major segments.

2. Determine whether any segments are particularly busy or error prone.

EXAMPLE: You notice that one segment of the UNIX network, HUB3, is reporting Too Long Errors and FCS Errors roughly every second sample. While the amount is not higher than normal, it is currently higher than on any other segment.

LANsentry Manager History View

Using the LANsentry Manager History View, you can display a rolling history table to determine whether the errors that you are seeing are new. For example, if you have a history table that runs for 30-minute samples over two days, you can compare the most recent sample to a previous sample, looking for new errors. If your probe has the resources, use a much finer resolution sample stored for a shorter time (every 30 seconds for 2 hours) to more easily spot recent errors.

EXAMPLE: You see that the history table shows that no error rates remained constant throughout the day. However, errors that did occur were on the device HUB3 and were Too Long Errors and FCS Errors.

Reproducing the Fault While Monitoring the Network

Although the RMON View in LANsentry Manager can show error rates and help you to identify the location of the problem, it may not provide enough data to solve the problem. To determine the cause of the problem, reproduce it while you monitor the network by using these applications:

• LANsentry Manager Top-N Graph - To locate a quiet node to use for reproducing the fault
• LANsentry Manager Packet Capture - To capture packets from the hub to which the quiet node is attached
• LANsentry Manager Packet Decode - To analyze the packets to assess network file server traffic and delays
• Address Tracker - To find the location of the problem nodes

EXAMPLE: Using LANsentry Manager, you find a hub on the network with a higher than normal error rate. However, the error rate does not seem high enough to cause login delays of 60 seconds or more.

LANsentry Manager Top-N Graph

Using the Top-N graph in the LANsentry Manager main window, locate a quiet node that has been showing the same problem. Choose a quiet node so that you do not receive excessive traffic when you try to isolate the problem.

EXAMPLE: You see that the node Monolith, which has the same Network File System (NFS) mounts as the other nodes on the network, is quiet. You decide to use this node for reproducing the fault. See Network File System (NFS) Protocol for more information about NFS.

LANsentry Manager Packet Capture

Using the LANsentry Manager Packet Capture application, capture packets from the network using predefined patterns and start-and-stop conditions.

Follow these steps:

1. Set up a capture buffer on a probe that is connected to the same hub as the quiet node. Until you know more about the problem, set a very general filter.

EXAMPLE: You select a MAC-layer filter and set a conversation filter to capture all packets to and from Monolith.

2. Telnet into and log out from the quiet node. Then reset the capture buffer. Repeat this procedure until you see the problem reflected in your captured data. To keep the buffer information clean, reset the buffer each time that you repeat the procedure.

3. When you see the delay, note the rough value of the packet count on the LANsentry Manager packet buffer.

By noting the packet count at which you think the delay has occurred, you can narrow the problem to within about 20 packets in the buffer. If you have used an extremely quiet node, you may even identify the exact packet.

LANsentry Manager Packet Decode

The LANsentry Manager Packet Decode application decodes all major protocols and displays the packet contents at three levels of detail: summary information, header information, and actual packet content.

Follow these steps:

1. Open the buffer in the Packet Decode application and locate the number of the packet at which the delay occurred.

2. Select the packet and launch a MAC-layer conversation filter. In the filter display, look for a gap in the conversation (that is, where the node sent a request and then resent it at approximately the same rate as the delay you experienced when recreating the problem).

3. Repeat the test to determine if the result concentrates on one node or if it appears on other nodes.

EXAMPLE: On the quiet node that you selected, the delay is obvious. You see an NFS request going out to a node and a repeat of the request 30 seconds later. During that time, the node did not respond. You now know that the delay occurred because nodes were not seeing responses for NFS requests. When you repeat the test on other nodes, you find that the delay is happening with more than one destination node.
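Looking for a gap in a conversation amounts to scanning consecutive packet timestamps for a long silence. The Python sketch below illustrates the idea on an exported list of packets. The Packet record, its fields, and the 10-second threshold are hypothetical and do not reflect the LANsentry Manager capture format.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    number: int        # position in the capture buffer
    timestamp: float   # seconds since the capture started
    summary: str       # one-line decode, for display only


def find_gaps(packets, min_gap_seconds=10.0):
    """Return (packet_before, packet_after, gap_seconds) for each long silence."""
    gaps = []
    for previous, current in zip(packets, packets[1:]):
        gap = current.timestamp - previous.timestamp
        if gap >= min_gap_seconds:
            gaps.append((previous.number, current.number, gap))
    return gaps


# Hypothetical conversation: a request that is repeated 30 seconds later.
conversation = [
    Packet(1, 0.5, "login traffic"),
    Packet(2, 1.5, "NFS request"),
    Packet(3, 31.5, "NFS request (retransmission)"),
    Packet(4, 31.75, "NFS reply"),
]
```

Calling find_gaps(conversation) points at packets 2 and 3, which is where to start reading the decode.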

Address Tracker

Use Address Tracker, which polls managed devices, to determine the hubs to which the problem nodes are attached. If the problem end stations are located on unmanaged devices, then you can at least narrow the problem to those unmanaged devices.

EXAMPLE: Although your network does not have managed hubs that Transcend® NCS management software can poll, it does have managed switches. When it polls the switches, Address Tracker displays the switch ports on which addresses were last seen. This information indicates the hub (but not the hub port) on which the device is located.

LANsentry Manager Packet Decode

After you know the location of the hub that has the problem node, monitor the problem from the hub using LANsentry Manager Packet Decode.

Follow these steps:

1. To capture packets from one of the nodes on the hub, set up another capture buffer and repeat the exercise that is described in LANsentry Manager Packet Capture. Because a delay may occur on a different node, use two capture buffers without stopping the first one.

Note the rough packet count where the delay appears.

2. Display a conversation filter of the packet where the delay appears and look for the gap in the conversation.

EXAMPLE: You hope that the nodes are on the same hub. You find that all the nodes are on HUB3. This result indicates that FCS Errors may be causing the timeouts. However, because the errors occur at a low rate, you decide to verify this diagnosis. You monitor the problem from the hub, logging in and out many times, and the delay eventually occurs. This time, the delay shows that the node's reply had an FCS Error, even though the node received the request. The switch would not have transmitted this packet, causing a timeout on the NFS protocol. The retry time is presumably 30 seconds. During this test, you see the problem occurring on another node.

Correcting the Fault

Without a managed hub, you may find it very difficult to discover network file server timeout errors. To find the problematic node, you must either systematically isolate nodes by monitoring each node for a prolonged period or temporarily insert a managed hub.

EXAMPLE: You notice that the captured error packet failed FCS because it was corrupted by a regular pattern during transmission. A possible reason for this occurrence is a jabbering node. This explanation makes sense because FCS/Jabber frames increased linearly when you were monitoring the live network.

Network File Server Timeouts Reference

This section explains terms that are relevant to network file server timeouts and provides additional conceptual and problem analysis detail.

Jabbering

A condition in which a node transmits illegal-length packets and is possibly not operating within carrier specifications. In effect, another node has written bad data over a valid packet. This bad data is often interpreted as a repeated sequence of data.

Network File System (NFS) Protocol

A distributed file system protocol, developed by Sun Microsystems, that allows a computer system to access files over a network as if they were on its local disks. This protocol has been incorporated into products by more than 200 companies. It is now a de facto Internet standard.

NFS is one protocol in the NFS suite of protocols, which includes NFS, RPC, XDR (External Data Representation), and others. These protocols are part of a larger architecture that Sun Microsystems refers to as Open Network Computing (ONC). ONC is a distributed applications architecture designed by Sun and currently controlled by a consortium led by Sun.

Troubleshooting TCP/IP - Detailed Steps

This article shows how to troubleshoot TCP/IP connectivity between computers on a Windows network. If you haven't already done so, disable XP's Internet Connection Firewall on all local area network connections and remove all firewall programs on the network. Improperly configured firewalls are the most common cause of TCP/IP problems.

Open a Command Prompt Window

For many of these steps, you'll be typing at the command prompt. To open a command prompt window in Windows 2000 or XP, click Start | Run, type cmd in the box, and click OK. To open a command prompt window in Windows 95, 98, or Me, click Start | Run, type command in the box, and click OK. Type one command per line and press Enter after each one to execute it. To close the command prompt window, use the exit command.

Determine the TCP/IP Settings

Determine the TCP/IP settings of each computer on the local area network. In XP, open the Network Connections folder, right click the LAN connection, and click Status | Support | Details. For example, here are the Status and Details views for the LAN connection on an Internet Connection Sharing host.

In Windows 95/98/Me, click Start | Run, type winipcfg in the box, and click OK. Select the LAN adapter from the menu and click More Info. Here's the winipcfg view for an ICS client running Windows Me.

You can also see the TCP/IP settings from the command prompt. This is especially convenient if a computer has more than one network adapter. Use the ipconfig /all command, which is available in all versions except Windows 95. The output from this command can be long, so it's best to write it to a file. Specify the file name in the command this way:

ipconfig /all >ipconfig.txt

Description of TCP/IP Settings

Here are the TCP/IP settings that are used in network troubleshooting:

• IP Address - Unique address assigned to a network adapter. A computer with multiple network adapters has an IP address for each one, and each one must be in a different subnet.

• Subnet Mask - Used in conjunction with the IP address to determine which subnet an adapter belongs to. At the simplest level, communication is only possible between two network adapters when they're in the same subnet.

• Default Gateway - IP address of a computer or router, on one of this computer's local area networks, that knows how to communicate with subnets not present on this computer. For an Internet connection, the default gateway is a router belonging to your Internet service provider, and all access to sites on the Internet goes through it. For an ICS client, the default gateway is the ICS host. If you use a hardware router, it serves as the default gateway.

• DHCP Server - If an adapter is configured to obtain an IP address automatically, this is the address of the server that provides it. It could be your ISP, an ICS host, or a hardware router.

• DNS Servers - IP address of one or more Domain Name Server computers. DNS servers translate Internet names to their IP addresses (like 63.146.109.227).
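To see how the subnet mask decides whether two adapters are in the same subnet, here is a small Python sketch that uses the standard ipaddress module. The function name and the sample addresses and mask are illustrative assumptions, not settings taken from any particular computer.

```python
import ipaddress


def same_subnet(ip_a, ip_b, subnet_mask):
    """Return True if both addresses fall in the same subnet under the given mask."""
    # strict=False lets us pass a host address; the host bits are masked off.
    network_a = ipaddress.ip_network(f"{ip_a}/{subnet_mask}", strict=False)
    network_b = ipaddress.ip_network(f"{ip_b}/{subnet_mask}", strict=False)
    return network_a == network_b


# Example values only: two LAN addresses and one address on another subnet.
print(same_subnet("192.168.1.101", "192.168.1.123", "255.255.255.0"))
print(same_subnet("192.168.1.101", "192.168.2.5", "255.255.255.0"))
```

The first pair shares the network 192.168.1.0, so the adapters can talk directly; the second pair does not, so traffic between them would have to go through a gateway.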

Subnets

See our article on subnets for a brief description of how they work. For more details, see this Microsoft Knowledge Base article.

If two computers are supposed to be on the same subnet but aren't, something is wrong with the network hardware or software configuration. This is most likely to happen when one of them receives an IP address of 169.254.x.x, which indicates that:

• It's configured to obtain an IP address automatically.
• It couldn't find a DHCP server on the network to make the assignment.
• Windows assigned it an Automatic Private IP Address.
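A quick way to spot this condition in a list of addresses is to test each one against the 169.254.0.0/16 range. The Python sketch below does that with the standard ipaddress module; the function name and the sample addresses are assumptions for illustration.

```python
import ipaddress

# Automatic Private IP Addresses are drawn from this range.
APIPA_NETWORK = ipaddress.ip_network("169.254.0.0/16")


def is_automatic_private_address(address):
    """Return True if the address is in the 169.254.x.x range."""
    return ipaddress.ip_address(address) in APIPA_NETWORK


# Example values only.
for candidate in ["192.168.1.101", "169.254.12.7"]:
    if is_automatic_private_address(candidate):
        print(candidate, "- no DHCP server answered; check the server, cabling, and firewall")
    else:
        print(candidate, "- not an Automatic Private IP Address")
```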

See our article on Specific Networking Problems and Their Solutions for more information.

Pinging

The ping command is the basic tool for testing TCP/IP connectivity. It sends a special packet (called ICMP Echo) to a particular IP address and looks for a reply. If everything is working right, the reply comes back. If not, the ping times out in a few seconds. By default, the ping command repeats the process four times. Here's an example of an ICS client computer pinging a Windows XP Home Edition ICS host, using the host's IP address and its computer name.

When ping fails, you'll see one of these error messages:

• Request timed out - The IP address is valid, but there's no reply from it. If the IP address is on a local area network, the most likely cause is a firewall program blocking the ping.

• Unknown host <name> or Ping request could not find host <name> - The computer name doesn't exist on the local area network. Make sure that NetBIOS over TCP/IP is enabled.

• Destination host unreachable - The IP address isn't on a local area network, and the default gateway can't access it. Either there's no default gateway, its address is wrong, or it isn't functioning.
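If you save ping output to a file, the error messages above can be matched mechanically. The following Python sketch maps each message to the likely cause given in the list. The function name and the lookup table are illustrative assumptions, and the matching is a plain substring search on the English messages, so localized or differently worded output will not be recognized.

```python
# (message fragment, likely cause) pairs, taken from the list of ping errors.
PING_FAILURES = [
    ("request timed out",
     "No reply from a valid address; on a LAN, suspect a firewall blocking the ping."),
    ("unknown host",
     "Name not found on the LAN; check that NetBIOS over TCP/IP is enabled."),
    ("could not find host",
     "Name not found on the LAN; check that NetBIOS over TCP/IP is enabled."),
    ("destination host unreachable",
     "Address is off the LAN and the default gateway is missing, wrong, or down."),
]


def explain_ping_failure(ping_output):
    """Return the likely cause for a failed ping, or None if no known error is present."""
    text = ping_output.lower()
    for fragment, cause in PING_FAILURES:
        if fragment in text:
            return cause
    return None
```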

Pinging the Local Area Network

Here is a series of ping commands to use in finding where a problem occurs on a local area network. Run them in the order shown, and don't go on to the next command until all of the previous commands work properly. In this example:

• The computer being tested is named Winxp, with IP address 192.168.1.101.
• There's another computer on the network named Win98, with IP address 192.168.1.123.

Substitute the appropriate IP addresses and computer names for your network.

Command               Target                          What Ping Failure Indicates
ping 127.0.0.1        Loopback address                Corrupted TCP/IP installation
ping localhost        Loopback name                   Corrupted TCP/IP installation
ping 192.168.1.101    This computer's IP address      Corrupted TCP/IP installation
ping winxp            This computer's name            Corrupted TCP/IP installation
ping 192.168.1.123    Another computer's IP address   Bad hardware or NIC driver
ping win98            Another computer's name         NetBIOS name resolution failure

To fix a corrupted TCP/IP installation on Windows XP, follow the steps in this Microsoft Knowledge Base article. For Windows 95/98/Me, un-install the TCP/IP protocol in Control Panel | Network, reboot, and re-install it. If that doesn't fix it, use this procedure on Windows 95 or 98.
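The "run in order and stop at the first failure" procedure can be scripted. The Python sketch below is one possible way to do it, not a tested diagnostic tool: the step list uses the example names and addresses from this section, the function names are assumptions, and the exit status of the ping command is only a coarse success signal (on some Windows versions ping can report success even when it prints "Destination host unreachable").

```python
import platform
import subprocess

# (target, what a failure indicates), in the order the checks should run.
LAN_STEPS = [
    ("127.0.0.1", "Corrupted TCP/IP installation"),
    ("localhost", "Corrupted TCP/IP installation"),
    ("192.168.1.101", "Corrupted TCP/IP installation"),
    ("winxp", "Corrupted TCP/IP installation"),
    ("192.168.1.123", "Bad hardware or NIC driver"),
    ("win98", "NetBIOS name resolution failure"),
]


def system_ping(target):
    """Send one echo request with the operating system's ping command."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    result = subprocess.run(
        ["ping", count_flag, "1", target],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def first_failure(steps, ping=system_ping):
    """Run the steps in order; return (target, meaning) for the first failure, or None."""
    for target, meaning in steps:
        if not ping(target):
            return target, meaning
    return None
```

Passing a different ping function makes the logic easy to try without touching the network, and a second step list with the default gateway, DNS server, and web site targets covers the Internet checks in the same way.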

Pinging the Internet

You can also use ping to find a problem with Internet access. Run these commands in the order shown, and don't go on to the next command until all of the previous commands work properly. Use the Default Gateway and DNS Server addresses that you got from the winipcfg or ipconfig /all command.

Command                  Target                What Ping Failure Indicates
ping w.x.y.z             Default Gateway       Default Gateway down
ping w.x.y.z             DNS Server            DNS Server down
ping w.x.y.z             Web site IP address   Internet service provider or web site down
ping www.something.com   Web site name         DNS Server down or web site down

REFERENCE

WWW.PRACTICALLYNETWORK.COM

WWW.COMPUTERHOPE.COM

WWW.WIKIPEDIA.ORG

THANK YOU

Page 47: 11591337-Basic-Network-Troubleshooting

Collisions

Collisions indicate that two devices detect that the network is idle and try to send packets at exactly the same time (within one round-trip delay). Because only one device can transmit at a time, both devices must stop sending and attempt to retransmit. Collisions are detected by the transmitting stations.

The retransmission algorithm helps to ensure that the packets do not retransmit at the same time. However, if the two devices retry at nearly the same time, packets can collide again; the process repeats until either the packets finally pass onto the network without collisions, or 16 consecutive collisions occur and the packets are discarded.
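The text does not name the retransmission algorithm. Standard Ethernet uses truncated binary exponential backoff, and the Python sketch below illustrates that scheme under stated assumptions: a 10 Mbps network (slot time of 51.2 microseconds), and function names that are purely illustrative. After each collision, a station waits a random number of slot times chosen from a range that doubles with every collision, up to a cap.

```python
import random

SLOT_TIME_US = 51.2    # 512 bit times at 10 Mbps
MAX_ATTEMPTS = 16      # the packet is discarded after 16 consecutive collisions
BACKOFF_LIMIT = 10     # the random range stops doubling after 10 collisions


def backoff_slots(collision_count, rng=random):
    """Pick how many slot times to wait after the given consecutive collision."""
    if not 1 <= collision_count < MAX_ATTEMPTS:
        raise ValueError("packet is discarded after 16 consecutive collisions")
    exponent = min(collision_count, BACKOFF_LIMIT)
    return rng.randrange(2 ** exponent)   # uniform in 0 .. 2**exponent - 1


def max_backoff_us(collision_count):
    """Return the longest possible wait, in microseconds, after a collision."""
    exponent = min(collision_count, BACKOFF_LIMIT)
    return (2 ** exponent - 1) * SLOT_TIME_US
```

Because each station draws its own random delay, two stations that collided once are unlikely to pick the same slot repeatedly, which is why 16 collisions in a row usually points to congestion rather than bad luck.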

CRC Errors

A Cyclic Redundancy Check (CRC) Error is an RMON statistic that combines FCS Errors and Alignment Errors. These errors indicate that packets were received with:

• A bad FCS and an integral number of octets (FCS Errors)
• A bad FCS and a non-integral number of octets (Alignment Errors)

CRC Errors can cause an end station to freeze. If a large number of CRC Errors are attributed to a single station on the network, replace the station's network interface board. Typically, a CRC Error rate of more than 1 percent of network traffic is considered excessive.

Excessive Collisions

Excessive Collisions indicate that 16 consecutive collisions have occurred, usually a sign that the network is becoming congested. For each excessive collision count (that is, after 16 consecutive collisions), a packet is dropped. If you know the normal rate of excessive collisions, then you can determine when the rate of packet loss is affecting your network's performance. See Knowing Your Network's Configuration for more information.

FCS Errors

Frame Check Sequence (FCS) Errors, a type of CRC Error, indicate that frames received by an interface are an integral number of octets long but do not pass the FCS check. The FCS is a mathematical way to ensure that all the frame's bits are correct without having the system examine each bit and compare it to the original. Packets with Alignment Errors also generate FCS Errors.

Both Alignment Errors and FCS Errors can be caused by equipment powering up or down or by interference (noise) on unshielded twisted-pair (10BASE-T) segments. In a network that complies with the Ethernet standard, FCS or Alignment Errors indicate bit errors during a transmission or reception. A very low rate is acceptable. Although Ethernet allows a 1 in 10^8 bit error rate, typical Ethernet performance is 1 in 10^12 or better.
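To relate a bit error rate to the FCS Errors you might actually count, you can estimate the chance that a frame contains at least one bad bit. The Python sketch below does this under a simplifying assumption that bit errors are independent; the function name is illustrative, and the two rates are read as 1 in 10^8 and 1 in 10^12.

```python
import math


def frame_error_probability(bit_error_rate, frame_octets=1518):
    """Probability that a frame of the given size contains at least one bit error.

    Assumes independent bit errors: P = 1 - (1 - BER) ** bits.
    log1p and expm1 keep the result accurate for very small rates.
    """
    bits = frame_octets * 8
    return -math.expm1(bits * math.log1p(-bit_error_rate))


worst_allowed = frame_error_probability(1e-8)    # the rate Ethernet allows
typical = frame_error_probability(1e-12)         # typical performance

print(f"Maximum-size frames with errors at 1 in 10^8:  about 1 in {round(1 / worst_allowed):,}")
print(f"Maximum-size frames with errors at 1 in 10^12: about 1 in {round(1 / typical):,}")
```

Under this model, even the worst allowed rate damages only a small fraction of maximum-size frames, so a segment that shows FCS Errors in a large share of its traffic is well outside normal behavior.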

Late Collisions

Late Collisions indicate that two devices have transmitted at the same time, but cabling errors (most commonly, excessive network segment length or repeaters between devices) prevent either transmitting device from detecting a collision. Neither device detects a collision because the time to propagate the signal from one end of the network to the other is longer than the time to put the entire packet on the network. As a result, neither of the devices that cause the late collision senses the other's transmission until the entire packet is on the network.

Although late collisions occur for small packets, the transmitter cannot detect them. As a result, a network suffering measurable Late Collisions for large packets is losing small packets as well.
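The timing argument can be made concrete with a little arithmetic. The Python sketch below compares the time needed to put a packet on a 10 Mbps network with a round-trip delay. It is a simplification under stated assumptions: the preamble is ignored, the function names are illustrative, and the 70-microsecond delay is a hypothetical value for an over-long network, not a measurement.

```python
def transmit_time_us(octets, bit_rate_mbps=10.0):
    """Time to put a packet of the given size on the wire, in microseconds."""
    return octets * 8 / bit_rate_mbps


def sender_can_detect_collision(octets, round_trip_delay_us, bit_rate_mbps=10.0):
    """A sender notices a collision only if it is still transmitting when it arrives."""
    return transmit_time_us(octets, bit_rate_mbps) > round_trip_delay_us


# Hypothetical over-long network with a 70-microsecond round-trip delay.
for size in (64, 1518):
    detectable = sender_can_detect_collision(size, round_trip_delay_us=70.0)
    print(f"{size} octets: {transmit_time_us(size):.1f} us on the wire, detectable={detectable}")
```

In this example a minimum-size packet is fully sent before the collision can be noticed, while a maximum-size packet is still on the wire, which matches the observation that measurable Late Collisions on large packets imply silent loss of small ones.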

Nonstandard Ethernet Problems

Table 21 lists the symptoms that typically occur if a system violates the Ethernet standard.

Table 21 Symptoms of Common Ethernet Network Problems

Symptoms: FCS Errors and Alignment Errors increase significantly.
Problem: Network cabling is too long.
Notes: If you use a promiscuous network monitor, the number of Late Collisions reported by stations should correlate with the FCS and Alignment Errors reported by the monitor.

Symptoms: FCS and Alignment Errors increase proportionally with interference (sometimes referred to as noise hits).
Problem: Network segment is noisy.
Notes: Typically observed on a 10BASE-T network segment in a noisy environment. If you use multiple promiscuous monitors, the FCS and Alignment Errors among the monitors will not correlate. If the monitor can track runts (also called Too Short Errors), the number of runt packets should be significantly higher than normal.

Symptoms: FCS and Alignment Errors are much higher than normal.
Problem: Networks do not conform to the access scheme of Carrier Sense Multiple Access with Collision Detect (CSMA/CD).
Notes: Occurs when some implementations of Ethernet in the segment are not entirely compatible with IEEE 802.3 repeaters. Collision fragments linger on the network long enough to collide with retry packets at the minimum interpacket gap (IPG). The IPG is smaller on one side of the repeated network, causing a lost packet. Ethernet controllers cannot receive packets that are separated by 4.7 µs or less. Some controllers cannot sustain reception of packets separated by as much as 9.6 µs. If runt packets are received one after another and are followed by a collision fragment, Ethernet controllers that cannot sustain reception will lose packets.

Receive Discards

Receive Discards indicate that received packets could not be delivered to a high-layer protocol because of congestion or packet errors.

Too Long Errors

A Too Long Error indicates that a packet is longer than 1518 octets (including FCS octets) but otherwise well formed. Too Long Errors are often caused by a bad transceiver, a malfunction of the jabber protection mechanism on a transceiver, or excessive noise on the cable.

Too Short Errors

A Too Short Error, also called a runt, indicates that a packet is fewer than 64 octets long (including FCS octets) but otherwise well formed.
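The two length limits above can be expressed as a simple check. This is a minimal sketch; the function name and return strings are illustrative, and only the 64-octet and 1518-octet limits come from the text.

```python
# Length limits quoted above, both including the FCS octets.
MIN_FRAME_OCTETS = 64
MAX_FRAME_OCTETS = 1518


def classify_frame_length(octets: int) -> str:
    """Classify an otherwise well-formed frame by its length alone."""
    if octets < MIN_FRAME_OCTETS:
        return "too short (runt)"
    if octets > MAX_FRAME_OCTETS:
        return "too long"
    return "ok"
```

A frame of exactly 64 or 1518 octets is within limits; 63 octets is a runt and 1519 octets is too long.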

Transmit Discards

Transmit Discards indicate that packets were not transmitted because of network congestion.

FDDI Ring Errors

Use these sections to identify and correct FDDI ring errors:

• FDDI Ring Errors Overview
• Identifying Ring Errors

Fiber Distributed Data Interface (FDDI) often corrects its own problems. However, because FDDI cannot correct all errors (especially those related to hardware problems), you should monitor FDDI errors.

Understanding the Problem

FDDI ring errors that you should monitor include:

• Elasticity Buffer Error Condition
• Frame Error Condition
• Frames Not Copied Condition
• Link Error Condition

Identifying the Problem

First, determine the type of FDDI ring errors and where they are occurring. Similar to the way you identify other FDDI problems, identify the upstream and downstream neighbors of the devices that you are monitoring.

Several types of network errors can cause FDDI performance problems. For example, problems with cables or physical connections may result in a link or frame error. Elasticity buffer (EB) errors can also lead to link and frame errors.

FDDI deals with port-related errors as follows:

• The variable PORTlerAlarm is the link error rate (LER) value at which a link connection generates an alarm. When the LER is greater than the alarm setting, Station Management (SMT) sends a Status Report Frame (SRF) to notify you that there is a problem with a port.

The PORTlerAlarm threshold is set lower than the PORTlerCutoff threshold so that you are notified of a problem before the port is actually removed from the ring.

• When link errors reach the threshold defined by the variable PORTLERCutoff, SMT breaks the connection, disabling the PHY that detected the problem. A Link Error Condition is also generated.

FDDI deals with MAC-related errors as follows:

• When MAC frame errors reach a certain threshold, a Frame Error Condition is generated. Because the actual error can be further upstream than the immediate connection, the connection remains intact.

• For a large network, the worst-case MACFrameErrorRatio is less than 0.1 percent. However, during network configuration, frame error ratios can reach 50 percent for short periods. When you detect a sustained frame error ratio of more than 0.1 percent, a problem exists between the station that is reporting the condition and the nearest upstream MAC.
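The difference between a short spike during configuration and a sustained frame error ratio can be sketched as follows. This is an illustration under stated assumptions: the ratio formula follows the definition commonly used for the FDDI MAC frame error ratio (lost plus error frames over total plus lost frames), the threshold is taken as 0.1 percent (0.001), and the function names and the three-sample window are hypothetical.

```python
# Sketch only: names and the 3-sample window are assumptions.
SUSTAINED_THRESHOLD = 0.001  # 0.1 percent expressed as a fraction


def frame_error_ratio(frames: int, errors: int, lost: int) -> float:
    """(lost + errors) / (frames + lost) for one polling interval."""
    total = frames + lost
    return (lost + errors) / total if total else 0.0


def is_sustained(ratios, threshold=SUSTAINED_THRESHOLD, min_samples=3) -> bool:
    """True only if the most recent min_samples ratios all exceed the threshold."""
    recent = list(ratios)[-min_samples:]
    return len(recent) == min_samples and all(r > threshold for r in recent)
```

With this check, a single 50 percent sample followed by normal samples is ignored, while several consecutive samples above the threshold point to a problem between the reporting station and its nearest upstream MAC.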

Solving the Problem

To solve problems related to FDDI errors, fix the hardware, cabling, or congestion problem.

Identifying Ring Errors

Use Status Watch to monitor your FDDI devices for Warning or Critical alerts.

Status Watch

Use Status Watch to identify FDDI ring errors.

Follow these steps:

1. Monitor the FDDI Status tool for the currently selected device.

2. Determine whether Status Watch is reporting Elasticity Buffer Errors or a high percentage of Frame Errors, Frames Not Copied, or Link Error Rates for the currently selected device.

FDDI Ring Errors Reference

This section provides additional conceptual and problem analysis detail.

Elasticity Buffer Error Condition

The Elasticity Buffer Error condition occurs when a port's elasticity buffer overflows or underflows. This condition usually indicates that a port's hardware is not operating within the tolerances that the FDDI standard specifies. Look for the problem in the hardware of either the port that is reporting the condition or of the immediately adjacent port.

Frame Error Condition

The Frame Error condition occurs when the percentage of frames that contain errors exceeds a preset threshold. In the situation when a device is an uplink to FDDI (that is, a device is transmitting onto FDDI), this type of condition indicates that the ring is saturated. The ring is out of buffer space, and packets are being dropped from the device's backbone port.

The problem indicated by the frame errors is usually located between the MAC that reports the condition and its upstream neighbor. Because many physical connections can lie along this path, the MACFrameErrorRatio variable identifies only the two MACs between which the problem is occurring.

Frames Not Copied Condition

The Frames Not Copied condition occurs when the percentage of frames that are dropped because of insufficient buffer space exceeds a preset threshold. This condition indicates that the station is congested and is unable to process frames as quickly as they arrive. To help eliminate congestion:

• Add more capacity to the station
• Reconfigure your network so that end stations that communicate heavily with one another are on the same bridge or switch
• Filter out certain traffic

Link Error Condition

The Link Error condition occurs when a port detects link errors at a rate that exceeds a preset threshold. When the Link Error threshold is exceeded, the station removes itself from the ring and tries to reinsert itself on the ring. This action creates a MAC Neighbor Change Event (which also occurs if a ring wraps).

Link errors may indicate an FDDI PHY hardware problem (such as a faulty transmitter) or a faulty cable or connector. Look for the problem in the portion of the network between the port that is reporting the condition and the first upstream transmitter.

MAC Neighbor Change Event

The MAC Neighbor Change event occurs when a MAC's upstream or downstream neighbor changes.

This event indicates either:

• A network reconfiguration
• Another station that is leaving or joining the ring

Network File Server Timeouts

Use these sections to identify and correct timeouts on network file servers:

• Network File Server Timeout Overview
• Looking for Obvious Errors
• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

See Network File Server Timeouts Reference for additional conceptual and problem analysis detail.

A network file server can time out if your network gets congested or if your server is having problems. Users might have problems downloading data from or to the server, or copying files from or to the server. To help you to understand the troubleshooting process for this type of problem, an EXAMPLE throughout this section follows the symptoms, analysis, and resolution of a typical file server timeout problem.

Understanding the Problem

When users log in, their stations make network file server calls, either to determine quotas (if this feature has been enabled) or to mount user home directories. The network file server timeout messages, even when spread across multiple nodes, indicate a problem either with the network or with a server.

EXAMPLE: UNIX users notice that it takes a long time - over 30 seconds in some cases - to log in to any machine. Some machines report network file server timeout messages, but the messages have no obvious pattern and are infrequent. You begin to get a sense of the problem.

Identifying the Problem

First, rule out the obvious causes. Ask these questions:

• Can you access the network file server with Telnet?
• Have any alarms been triggered?
• Are there any new errors?

The process of identifying the problem is developed in Looking for Obvious Errors.

Solving the Problem

To determine the cause, reproduce the fault while you monitor the network. After you know the cause, you can fix the problem.

The solutions to the network file server timeout are identified in these sections:

• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

Looking for Obvious Errors

To look for obvious errors, use these applications:

• Ping and Telnet - To check connectivity to the network file server nodes
• LANsentry Manager Alarms View - To search for triggered alarms
• LANsentry Manager Statistics View - To look for errors
• LANsentry Manager History View - To identify trends

Ping and Telnet

Determine whether you can contact network file server nodes using Ping and Telnet. If the response is extremely slow, then a problem may exist with the connections to the nodes. No delay indicates that the connections are normal, implying that the delay is occurring elsewhere. In this case, use LANsentry® Manager tools to determine whether packets are being lost or ignored.

LANsentry Manager Alarms View

Using the LANsentry Manager Alarms View, you can determine if any configured alarms have been triggered.

Search the Alarms View to see if any MAC events have been logged.

EXAMPLE: MAC events have not been logged for the network on which the UNIX users are attached.

Even though no alarms have occurred, errors may exist. For example, a lower rate of background errors may exist just below the alarm threshold. Based on maximum and minimum values, RMON errors may miss constant, periodic, or low amounts of errors.

LANsentry Manager Statistics View

Using the LANsentry Manager Statistics View, you can display a multisegment graph of utilization and error statistics.

Follow these steps:

1. Set up a graph that shows utilization and errors on all your major segments.

2. Determine whether any segments are particularly busy or error prone.

EXAMPLE: You notice that one segment of the UNIX network, HUB3, is reporting Too Long Errors and FCS Errors roughly every second sample. While the amount is not higher than normal, it is currently higher than any other segment.

LANsentry Manager History View

Using the LANsentry Manager History View, you can display a rolling history table to determine if the errors that you are seeing are new. For example, if you have a history table that runs for 30-minute samples over two days, you can compare the most recent sample to a previous sample, looking for new errors. If your probe has the resources, use a much finer resolution sample stored for a shorter time (every 30 seconds for 2 hours) to more easily spot recent errors.
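The trade-off between the two history tables, and the comparison of a recent sample with an earlier one, can be sketched as follows. The function names and counter keys are hypothetical; only the two sampling schemes come from the text.

```python
def samples_stored(interval_seconds: int, duration_seconds: int) -> int:
    """Number of rows a rolling history table holds for a given scheme."""
    return duration_seconds // interval_seconds


def new_error_counters(earlier: dict, latest: dict) -> dict:
    """Counters that are higher in the latest sample, with the increase."""
    return {
        name: latest[name] - earlier.get(name, 0)
        for name in latest
        if latest[name] > earlier.get(name, 0)
    }


coarse_rows = samples_stored(30 * 60, 2 * 24 * 3600)  # 30-minute samples over two days
fine_rows = samples_stored(30, 2 * 3600)              # 30-second samples over two hours
```

The finer table holds more rows for a much shorter period, which is why it depends on the probe having the resources.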

EXAMPLE: You see that the history table shows that no error rates remained constant throughout the day. However, errors that did occur were on the device HUB3 and were Too Long Errors and FCS Errors.

Reproducing the Fault While Monitoring the Network

Although the RMON View in LANsentry Manager can show error rates and help you to identify the location of the problem, it may not provide enough data to solve the problem. To determine the cause of the problem, reproduce it while you monitor the network by using these applications:

• LANsentry Manager Top-N Graph - To locate a quiet node to use for reproducing the fault
• LANsentry Manager Packet Capture - To capture packets from the hub to which the quiet node is attached
• LANsentry Manager Packet Decode - To analyze the packets to assess network file server traffic and delays
• Address Tracker - To find the location of the problem nodes

EXAMPLE: Using LANsentry Manager, you find a hub on the network with a higher than normal error rate. However, the error rate does not seem high enough to cause login delays of 60 seconds or more.

LANsentry Manager Top-N Graph

Using the Top-N graph in the LANsentry Manager main window, locate a quiet node that has been showing the same problem. Choose a quiet node so that you do not receive excessive traffic when you try to isolate the problem.

EXAMPLE: You see that the node Monolith, which has the same Network File System (NFS) mounts as the other nodes on the network, is quiet. You decide to use this node for reproducing the fault. See Network File System (NFS) Protocol for more information about NFS.

LANsentry Manager Packet Capture

Using the LANsentry Manager Packet Capture application, capture packets from the network using predefined patterns and start-and-stop conditions.

Follow these steps:

1. Set up a capture buffer on a probe that is connected to the same hub as the quiet node. Until you know more about the problem, set a very general filter.

EXAMPLE: You select a MAC-layer filter and set a conversation filter to capture all packets to and from Monolith.

2. Telnet into and log out from the quiet node. Then reset the capture buffer. Repeat this procedure until you see the problem reflected in your captured data. To keep the buffer information clean, reset the buffer each time that you repeat the procedure.

3. When you see the delay, note the rough value of the packet count on the LANsentry Manager packet buffer.

By noting the packet count at which you think the delay has occurred, you can narrow the problem to within about 20 packets in the buffer. If you have used an extremely quiet node, you may even identify the exact packet.

LANsentry Manager Packet Decode

The LANsentry Manager Packet Decode application decodes all major protocols and displays the packet contents at three levels of detail: summary information, header information, and actual packet content.

Follow these steps:

1. Open the buffer in the Packet Decode application and locate the number of the packet at which the delay occurred.

2. Select the packet and launch a MAC-layer conversation filter. In the filter display, look for a gap in the conversation (that is, where the node sent a request and then resent it at approximately the same rate as the delay you experienced when recreating the problem).

3. Repeat the test to determine if the result concentrates on one node or if it appears on other nodes.

EXAMPLE: On the quiet node that you selected, the delay is obvious. You see an NFS request going out to a node and a repeat of the request 30 seconds later. During that time, the node did not respond. You now know that the delay occurred because nodes were not seeing responses for NFS requests. When you repeat the test on other nodes, you find that the delay is happening with more than one destination node.
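The search for a gap in the conversation can be sketched in a few lines. This does not use LANsentry Manager; it assumes the decoded packets have been exported as (timestamp in seconds, description) pairs sorted by time, and the function name and sample descriptions are hypothetical.

```python
def find_retry_gaps(packets, min_gap_seconds):
    """Return (first_time, retry_time, description) for each request that
    reappears after at least min_gap_seconds, which suggests a retry."""
    gaps = []
    last_seen = {}
    for timestamp, description in packets:
        previous = last_seen.get(description)
        if previous is not None and timestamp - previous >= min_gap_seconds:
            gaps.append((previous, timestamp, description))
        last_seen[description] = timestamp
    return gaps


# Hypothetical capture: the second request is repeated 30 seconds later.
capture = [
    (0.00, "NFS LOOKUP call xid=1"),
    (0.01, "NFS LOOKUP reply xid=1"),
    (0.02, "NFS GETATTR call xid=2"),
    (30.02, "NFS GETATTR call xid=2"),
    (30.03, "NFS GETATTR reply xid=2"),
]
retries = find_retry_gaps(capture, min_gap_seconds=25)
```

Setting the minimum gap slightly below the delay you experienced keeps ordinary back-to-back traffic out of the result.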


Address Tracker

Use Address Tracker, which polls managed devices, to determine the hubs to which the problem nodes are attached. If the problem end stations are located on unmanaged devices, then you can at least narrow the problem to those unmanaged devices.

EXAMPLE: Although your network does not have managed hubs that Transcend® NCS management software can poll, it does have managed switches. When it polls the switches, Address Tracker displays the switch ports on which addresses were last seen. This information indicates the hub (but not the hub port) on which the device is located.

LANsentry Manager Packet Decode

After you know the location of the hub that has the problem node, monitor the problem from the hub using LANsentry Manager Packet Decode.

Follow these steps:

1. To capture packets from one of the nodes on the hub, set up another capture buffer and repeat the exercise that is described in LANsentry Manager Packet Capture. Because a delay may occur on a different node, use two capture buffers without stopping the first one.

Note the rough packet count where the delay appears.

2. Display a conversation filter of the packet where the delay appears and look for the gap in the conversation.

EXAMPLE: You hope that the nodes are on the same hub. You find that all the nodes are on HUB3. This result indicates that FCS Errors may be causing the timeouts. However, because the errors occur at a low rate, you decide to verify this diagnosis. You monitor the problem from the hub, logging in and out many times, and the delay eventually occurs. This time, the delay shows that the node's reply had an FCS Error even though the node received the request. The switch would not have transmitted this packet, causing a timeout on the NFS protocol. The retry time is presumably 30 seconds. During this test, you see the problem occurring on another node.

Correcting the Fault

Without a managed hub, you may find it very difficult to discover network file server timeout errors. To find the problematic node, you must either systematically isolate nodes by monitoring each node for a prolonged period or temporarily insert a managed hub.

EXAMPLE: You notice that the captured error packet failed FCS because it was corrupted by a regular pattern during transmission. A possible reason for this occurrence is a Jabbering node. This explanation makes sense because FCS/Jabber frames increased linearly when you were monitoring the live network.

Network File Server Timeouts Reference

This section explains terms that are relevant to network file server timeouts and provides additional conceptual and problem analysis detail.

Jabbering

Jabbering occurs when a node transmits illegal-length packets and is possibly not operating within carrier specifications. In effect, another node has written bad data over a valid packet. This bad data is often interpreted as a repeated sequence of data.

Network File System (NFS) Protocol

A distributed file system protocol, developed by Sun Microsystems, that allows a computer system to access files over a network as if they were on its local disks. This protocol has been incorporated into products by more than 200 companies. It is now a de facto Internet standard.

NFS is one protocol in the NFS suite of protocols, which includes NFS, RPC, XDR (External Data Representation), and others. These protocols are part of a larger architecture that Sun Microsystems refers to as Open Network Computing (ONC). ONC is a distributed applications architecture designed by Sun and currently controlled by a consortium led by Sun.

Troubleshooting TCP/IP - Detailed Steps

This article shows how to troubleshoot TCP/IP connectivity between computers on a Windows network. If you haven't already done so, disable XP's Internet Connection Firewall on all local area network connections, and remove all firewall programs on the network. Improperly configured firewalls are the most common cause of TCP/IP problems.

Open a Command Prompt Window

For many of these steps, you'll be typing at the command prompt. To open a command prompt window in Windows 2000 or XP, click Start | Run, type cmd in the box, and click OK. To open a command prompt window in Windows 95, 98, or Me, click Start | Run, type command in the box, and click OK. Type one command per line, and press Enter after each one to execute it. To close the command prompt window, use the exit command.

Determine the TCP/IP Settings

Determine the TCP/IP settings of each computer on the local area network. In XP, open the Network Connections folder, right click the LAN connection, and click Status | Support | Details. For example, here are the Status and Details views for the LAN connection on an Internet Connection Sharing host.

In Windows 95/98/Me, click Start | Run, type winipcfg in the box, and click OK. Select the LAN adapter from the menu, and click More Info. Here's the winipcfg view for an ICS client running Windows Me.

You can also see the TCP/IP settings from the command prompt. This is especially convenient if a computer has more than one network adapter. Use the ipconfig /all command, which is available in all versions except Windows 95. The output from this command can be long, so it's best to write it to a file. Specify the file name in the command this way:

ipconfig /all >ipconfig.txt

Description of TCP/IP Settings

Here are the TCP/IP settings that are used in network troubleshooting:

• IP Address – Unique address assigned to a network adapter. A computer with multiple network adapters has an IP address for each one, and each one must be in a different subnet.

• Subnet Mask – Used in conjunction with the IP address to determine which subnet an adapter belongs to. At the simplest level, communication is only possible between two network adapters when they're in the same subnet.

• Default Gateway – IP address of a computer or router, on one of this computer's local area networks, that knows how to communicate with subnets not present on this computer. For an Internet connection, the default gateway is a router belonging to your Internet service provider, and all access to sites on the Internet goes through it. For an ICS client, the default gateway is the ICS host. If you use a hardware router, it serves as the default gateway.

• DHCP Server – If an adapter is configured to obtain an IP address automatically, this is the address of the server that provides it. It could be your ISP, an ICS host, or a hardware router.

• DNS Servers – IP address of one or more Domain Name Server computers. DNS servers translate Internet names to their IP addresses (like 63.146.109.227).
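How the subnet mask decides whether two adapters share a subnet can be checked with Python's standard ipaddress module. This is a minimal sketch: the function name is illustrative, and the 255.255.255.0 mask in the usage lines is an assumed value for a typical home network, not one stated in this article.

```python
import ipaddress


def same_subnet(ip_a: str, ip_b: str, mask: str) -> bool:
    """True if both addresses fall in the same network under the given mask."""
    net_a = ipaddress.ip_network(f"{ip_a}/{mask}", strict=False)
    net_b = ipaddress.ip_network(f"{ip_b}/{mask}", strict=False)
    return net_a == net_b


# Assumed mask of 255.255.255.0 for illustration:
lan_pair = same_subnet("192.168.1.101", "192.168.1.123", "255.255.255.0")
other_pair = same_subnet("192.168.1.101", "192.168.2.5", "255.255.255.0")
```

The first pair shares the 192.168.1.0 network and can communicate directly; the second pair does not and would need a gateway.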

Subnets

See our article on subnets for a brief description of how they work. For more details, see this Microsoft Knowledge Base article.

If two computers are supposed to be on the same subnet but aren't, something is wrong with the network hardware or software configuration. This is most likely to happen when one of them receives an IP address of 169.254.x.x, which indicates that:

• It's configured to obtain an IP address automatically
• It couldn't find a DHCP server on the network to make the assignment
• Windows assigned it an Automatic Private IP Address
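Spotting an Automatic Private IP Address in a list of settings is easy to automate. The sketch below uses the standard ipaddress module; the function name is illustrative, and 169.254.0.0/16 is the range that the 169.254.x.x pattern above describes.

```python
import ipaddress

# The 169.254.x.x range used for Automatic Private IP Addressing.
APIPA_RANGE = ipaddress.ip_network("169.254.0.0/16")


def is_apipa(address: str) -> bool:
    """True if the address was self-assigned rather than handed out by DHCP."""
    return ipaddress.ip_address(address) in APIPA_RANGE
```

An adapter for which this returns True did not reach a DHCP server, so check cabling, the DHCP server, and any firewall before looking elsewhere.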

See our article on Specific Networking Problems and Their Solutions for more information.

Pinging

The ping command is the basic tool for testing TCP/IP connectivity. It sends a special packet (called ICMP Echo) to a particular IP address and looks for a reply. If everything is working right, the reply comes back. If not, the ping times out in a few seconds. By default, the ping command repeats the process four times. Here's an example of an ICS client computer pinging a Windows XP Home Edition ICS host, using the host's IP address and its computer name.

When ping fails, you'll see one of these error messages:

• Request timed out - The IP address is valid, but there's no reply from it. If the IP address is on a local area network, the most likely cause is a firewall program blocking the ping.

• Unknown host <name> or Ping request could not find host <name> - The computer name doesn't exist on the local area network. Make sure that NetBIOS over TCP/IP is enabled.

• Destination host unreachable - The IP address isn't on a local area network, and the default gateway can't access it. Either there's no default gateway, its address is wrong, or it isn't functioning.


Pinging the Local Area Network

Here is a series of ping commands to use in finding where a problem occurs on a local area network. Run them in the order shown, and don't go on to the next command until all of the previous commands work properly. In this example:

• The computer being tested is named Winxp, with IP address 192.168.1.101
• There's another computer on the network, named Win98, with IP address 192.168.1.123

Substitute the appropriate IP addresses and computer names for your network.

Command               Target                           What Ping Failure Indicates
ping 127.0.0.1        Loopback address                 Corrupted TCP/IP installation
ping localhost        Loopback name                    Corrupted TCP/IP installation
ping 192.168.1.101    This computer's IP address       Corrupted TCP/IP installation
ping winxp            This computer's name             Corrupted TCP/IP installation
ping 192.168.1.123    Another computer's IP address    Bad hardware or NIC driver
ping win98            Another computer's name          NetBIOS name resolution failure

To fix a corrupted TCP/IP installation on Windows XP, follow the steps in this Microsoft Knowledge Base article. For Windows 95/98/Me, un-install the TCP/IP protocol in Control Panel | Network, reboot, and re-install it. If that doesn't fix it, use this procedure on Windows 95 or 98.
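The ordered sequence above, stopping at the first failure, can be scripted. This is a sketch under stated assumptions: the function names are hypothetical, the targets are the example addresses and names from this section, and the exit code of the system ping command is used as the success signal, which may not be reliable on every Windows version (some report success for "Destination host unreachable" replies).

```python
import platform
import subprocess

# (target, what a failure indicates), in the order given in the table above.
LAN_STEPS = [
    ("127.0.0.1", "Corrupted TCP/IP installation"),
    ("localhost", "Corrupted TCP/IP installation"),
    ("192.168.1.101", "Corrupted TCP/IP installation"),
    ("winxp", "Corrupted TCP/IP installation"),
    ("192.168.1.123", "Bad hardware or NIC driver"),
    ("win98", "NetBIOS name resolution failure"),
]


def system_ping(target: str) -> bool:
    """Send one echo request with the operating system's ping command."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    result = subprocess.run(
        ["ping", count_flag, "1", target],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def first_failure(steps, ping=system_ping):
    """Run the steps in order; return (target, meaning) for the first failure, or None."""
    for target, meaning in steps:
        if not ping(target):
            return target, meaning
    return None
```

Calling first_failure(LAN_STEPS) on the computer being tested returns the first target that does not answer together with the likely cause, or None if every step succeeds. The ping argument exists so the logic can be tried with a stand-in function before running it against a real network.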

Pinging the Internet

You can also use ping to find a problem with Internet access. Run these commands in the order shown, and don't go on to the next command until all of the previous commands work properly. Use the Default Gateway and DNS Server addresses that you got from the winipcfg or ipconfig /all command.

Command                   Target                 What Ping Failure Indicates
ping w.x.y.z              Default Gateway        Default Gateway down
ping w.x.y.z              DNS Server             DNS Server down
ping w.x.y.z              Web site IP address    Internet service provider or web site down
ping www.something.com    Web site name          DNS Server down or web site down

REFERENCE

WWW.PRACTICALLYNETWORK.COM

WWW.COMPUTERHOPE.COM

WWW.WIKIPEDIA.ORG


Page 48: 11591337-Basic-Network-Troubleshooting

transmission or reception. A very low rate is acceptable. Although Ethernet allows a bit error rate of 1 in 10^8, typical Ethernet performance is 1 in 10^12 or better.

Late Collisions

Late Collisions indicate that two devices have transmitted at the same time, but cabling errors (most commonly, excessive network segment length or repeaters between devices) prevent either transmitting device from detecting a collision. Neither device detects a collision because the time to propagate the signal from one end of the network to the other is longer than the time to put the entire packet on the network. As a result, neither of the devices that cause the late collision senses the other's transmission until the entire packet is on the network.

Although late collisions occur for small packets, the transmitter cannot detect them. As a result, a network suffering measurable Late Collisions for large packets is losing small packets as well.
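The timing argument above can be checked with a little arithmetic. The sketch below assumes 10 Mbps Ethernet and the 64-octet minimum packet; the function names and the delay value you pass in are illustrative assumptions, not part of any monitoring tool.

```python
MIN_FRAME_OCTETS = 64  # minimum Ethernet packet length, including FCS octets


def frame_time_us(octets, bit_rate_bps=10_000_000):
    """Time in microseconds needed to put a packet of this length on the network."""
    return octets * 8 / bit_rate_bps * 1_000_000


def late_collision_possible(detection_delay_us, octets=MIN_FRAME_OCTETS,
                            bit_rate_bps=10_000_000):
    """Return True if the other device's signal could arrive only after the
    whole packet has already been transmitted (assumed, caller-supplied delay)."""
    return detection_delay_us > frame_time_us(octets, bit_rate_bps)
```

Because a small packet is on the network for the shortest time, it is the first to finish before a collision can be sensed, which is why small packets are lost without the transmitter noticing.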

Nonstandard Ethernet Problems

Table 21 lists the symptoms that typically occur if a system violates the Ethernet standard.

Table 21. Symptoms of Common Ethernet Network Problems

Symptoms: FCS Errors and Alignment Errors increase significantly.
Problem: Network cabling is too long.
Notes: If you use a promiscuous network monitor, the number of Late Collisions reported by stations should correlate with the FCS and Alignment Errors reported by the monitor.

Symptoms: FCS and Alignment Errors increase proportionally with interference (sometimes referred to as noise hits).
Problem: Network segment is noisy.
Notes: Typically observed on a 10BASE-T network segment in a noisy environment. If you use multiple promiscuous monitors, the FCS and Alignment Errors among the monitors will not correlate.

If the monitor can track runts (also called Too Short Errors), the number of runt packets should be significantly higher than normal.

Symptoms: FCS and Alignment Errors are much higher than normal.
Problem: Networks do not conform to the access scheme of Carrier Sense Multiple Access with Collision Detect (CSMA/CD).
Notes: Occurs when some implementations of Ethernet in the segment are not entirely compatible with IEEE 802.3 repeaters. Collision fragments linger on the network long enough to collide with retry packets at the minimum interpacket gap (IPG). The IPG is smaller on one side of the repeated network, causing a lost packet. Ethernet controllers cannot receive packets that are separated by 4.7 µs or less. Some controllers cannot sustain receptions of packets separated by as much as 9.6 µs. If runt packets are received one after another and are followed by a collision fragment, Ethernet controllers that cannot sustain reception will lose packets.

Receive Discards

Receive Discards indicate that received packets could not be delivered to a high-layer protocol because of congestion or packet errors.

Too Long Errors

A Too Long Error indicates that a packet is longer than 1518 octets (including FCS octets) but otherwise well formed. Too Long Errors are often caused by a bad transceiver, a malfunction of the jabber protection mechanism on a transceiver, or excessive noise on the cable.

Too Short Errors

A Too Short Error, also called a runt, indicates that a packet is fewer than 64 octets long (including FCS octets) but otherwise well formed.
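The two length limits above can be expressed as a simple check. This is a minimal sketch for illustration; the function name and return strings are hypothetical, and it looks only at length, not at whether the packet is otherwise well formed.

```python
MIN_OCTETS = 64     # fewer octets than this is a Too Short Error (runt)
MAX_OCTETS = 1518   # more octets than this is a Too Long Error


def classify_packet_length(octets):
    """Classify a packet by its length in octets, FCS octets included."""
    if octets < MIN_OCTETS:
        return "too short"
    if octets > MAX_OCTETS:
        return "too long"
    return "ok"
```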

Transmit Discards

Transmit Discards indicate that packets were not transmitted because of network congestion.

FDDI Ring Errors

Use these sections to identify and correct FDDI ring errors:

• FDDI Ring Errors Overview
• Identifying Ring Errors

Fiber Distributed Data Interface (FDDI) often corrects its own problems. However, because FDDI cannot correct all errors (especially those related to hardware problems), you should monitor FDDI errors.

Understanding the Problem

FDDI ring errors that you should monitor include:

• Elasticity Buffer Error Condition
• Frame Error Condition
• Frames Not Copied Condition
• Link Error Condition

Identifying the Problem

First, determine the type of FDDI ring errors and where they are occurring. Similar to the way you identify other FDDI problems, identify the upstream and downstream neighbors of the devices that you are monitoring.

Several types of network errors can cause FDDI performance problems. For example, problems with cables or physical connections may result in a link or frame error. Elasticity buffer (EB) errors can also lead to link and frame errors.

FDDI deals with port-related errors as follows:

• The variable PORTlerAlarm is the link error rate (LER) value at which a link connection generates an alarm. When the LER is greater than the alarm setting, Station Management (SMT) sends a Status Report Frame (SRF) to notify you that there is a problem with a port.

  The PORTlerAlarm threshold is set lower than the PORTlerCutoff threshold so that you are notified of a problem before the port is actually removed from the ring.

• When link errors reach the threshold defined by the variable PORTLERCutoff, SMT breaks the connection, disabling the PHY that detected the problem. A Link Error Condition is also generated.

FDDI deals with MAC-related errors as follows:

• When MAC frame errors reach a certain threshold, a Frame Error Condition is generated. Because the actual error can be further upstream than the immediate connection, the connection remains intact.

• For a large network, the worst case MACFrameErrorRatio is less than 0.1 percent. However, during network configuration, frame error ratios can reach 50 percent for short periods. When you detect a sustained frame error ratio of more than 0.1 percent, a problem exists between the station that is reporting the condition and the nearest upstream MAC.
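The distinction between a short spike and a sustained frame error ratio can be sketched as follows. The 0.1 percent threshold comes from the text above; the idea of counting consecutive samples, the default of three samples, and the function name are assumptions made for illustration.

```python
def sustained_frame_error(ratios_percent, threshold_percent=0.1, min_samples=3):
    """Return True if the frame error ratio stays above the threshold for
    at least min_samples consecutive samples (min_samples is an assumed value)."""
    run = 0
    for ratio in ratios_percent:
        run = run + 1 if ratio > threshold_percent else 0
        if run >= min_samples:
            return True
    return False
```

With this rule, a brief 50 percent spike during network configuration is ignored, while several consecutive samples slightly above 0.1 percent are flagged.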

Solving the Problem

To solve problems related to FDDI errors, fix the hardware, cabling, or congestion problem.

Identifying Ring Errors

Use Status Watch to monitor your FDDI devices for Warning or Critical alerts.

Status Watch

Use Status Watch to identify FDDI ring errors.

Follow these steps:

1. Monitor the FDDI Status tool for the currently selected device.

2. Determine whether Status Watch is reporting Elasticity Buffer Errors or a high percentage of Frame Errors, Frames Not Copied, or Link Error Rates for the currently selected device.

FDDI Ring Errors Reference

This section provides additional conceptual and problem analysis detail.

Elasticity Buffer Error Condition

The Elasticity Buffer Error condition occurs when a port's elasticity buffer overflows or underflows. This condition usually indicates that a port's hardware is not operating within the tolerances that the FDDI standard specifies. Look for the problem in the hardware of either the port that is reporting the condition or of the immediately adjacent port.

Frame Error Condition

The Frame Error condition occurs when the percentage of frames that contain errors exceeds a preset threshold. In the situation when a device is an uplink to FDDI (that is, a device is transmitting onto FDDI), this type of condition indicates that the ring is saturated. The ring is out of buffer space, and packets are being dropped from the device's backbone port.

The problem indicated by the frame errors is usually located between the MAC that reports the condition and its upstream neighbor. Because many physical connections can lie along this path, the MACFrameErrorRatio variable identifies only the two MACs between which the problem is occurring.

Frames Not Copied Condition

The Frames Not Copied condition occurs when the percentage of frames that are dropped because of insufficient buffer space exceeds a preset threshold. This condition indicates that the station is congested and is unable to process frames as quickly as they arrive. To help eliminate congestion:

• Add more capacity to the station
• Reconfigure your network so that end stations that communicate heavily with one another are on the same bridge or switch
• Filter out certain traffic

Link Error Condition

The Link Error condition occurs when a port detects link errors at a rate that exceeds a preset threshold. When the Link Error threshold is exceeded, the station removes itself from the ring and tries to reinsert itself on the ring. This action creates a MAC Neighbor Change Event (which also occurs if a ring wraps).

Link errors may indicate an FDDI PHY hardware problem (such as a faulty transmitter) or a faulty cable or connector. Look for the problem in the portion of the network between the port that is reporting the condition and the first upstream transmitter.

MAC Neighbor Change Event

The MAC Neighbor Change event occurs when a MAC's upstream or downstream neighbor changes.

This event indicates either:

• A network reconfiguration
• Another station that is leaving or joining the ring

Network File Server Timeouts

Use these sections to identify and correct timeouts on network file servers:

• Network File Server Timeout Overview
• Looking for Obvious Errors
• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

See Network File Server Timeouts Reference for additional conceptual and problem analysis detail.

A network file server can time out if your network gets congested or if your server is having problems. Users might have problems downloading data from or to the server, or copying files from or to the server. To help you to understand the troubleshooting process for this type of problem, an EXAMPLE throughout this section follows the symptoms, analysis, and resolution of a typical file server timeout problem.

Understanding the Problem

When users log in, their stations make network file server calls either to determine quotas (if this feature has been enabled) or to mount user home directories. The network file server timeout messages, even when spread across multiple nodes, indicate a problem either with the network or with a server.

EXAMPLE: UNIX users notice that it takes a long time - over 30 seconds in some cases - to log in to any machine. Some machines report network file server timeout messages, but the messages have no obvious pattern and are infrequent. You begin to get a sense of the problem.

Identifying the Problem

First, rule out the obvious causes. Ask these questions:

• Can you access the network file server with Telnet?
• Have any alarms been triggered?
• Are there any new errors?

The process of identifying the problem is developed in Looking for Obvious Errors.

Solving the Problem

To determine the cause, reproduce the fault while you monitor the network. After you know the cause, you can fix the problem.

The solutions to the network file server timeout are identified in these sections:

• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

Looking for Obvious Errors

To look for obvious errors, use these applications:

• Ping and Telnet - To check for connectivity to the network file server nodes
• LANsentry Manager Alarms View - To search for triggered alarms
• LANsentry Manager Statistics View - To look for errors
• LANsentry Manager History View - To identify trends

Ping and Telnet

Determine whether you can contact network file server nodes using Ping and Telnet. If the response is extremely slow, then a problem may exist with the connections to the nodes. No delay indicates that the connections are normal, implying that the delay is occurring elsewhere. In this case, use LANsentry® Manager tools to determine whether packets are being lost or ignored.
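As a rough, scripted counterpart to a manual Telnet test, the sketch below times how long it takes to open a TCP connection to a node. It assumes the node listens on the standard Telnet port 23; the function name and the timeout value are illustrative assumptions.

```python
import socket
import time


def time_tcp_connect(host, port=23, timeout=5.0):
    """Return the seconds taken to open a TCP connection, or None if it fails."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return None
```

A consistently large value, or None, suggests a problem with the connection to the node; a small value suggests that the delay is occurring elsewhere.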

LANsentry Manager Alarms View

Using the LANsentry Manager Alarms View, you can determine if any configured alarms have been triggered.

Search the Alarms View to see if any MAC events have been logged.

EXAMPLE: MAC events have not been logged for the network on which the UNIX users are attached.

Even though no alarms have occurred, errors may exist. For example, a lower rate of background errors may exist just below the alarm threshold. RMON alarms, which are based on maximum and minimum values, may miss constant, periodic, or low amounts of errors.

LANsentry Manager Statistics View

Using the LANsentry Manager Statistics View, you can display a multisegment graph of utilization and error statistics.

Follow these steps:

1. Set up a graph that shows utilization and errors on all your major segments.

2. Determine whether any segments are particularly busy or error prone.

EXAMPLE: You notice that one segment of the UNIX network, HUB3, is reporting Too Long Errors and FCS Errors roughly every second sample. While the amount is not higher than normal, it is currently higher than any other segment.

LANsentry Manager History View

Using the LANsentry Manager History View, you can display a rolling history table to determine if the errors that you are seeing are new. For example, if you have a history table that runs for 30-minute samples over two days, you can compare the most recent sample to a previous sample, looking for new errors. If your probe has the resources, use a much finer resolution sample stored for a shorter time (every 30 seconds for 2 hours) to more easily spot recent errors.

EXAMPLE: You see that the history table shows that no error rates remained constant throughout the day. However, errors that did occur were on the device HUB3 and were Too Long Errors and FCS Errors.
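Comparing a recent history sample with an earlier one amounts to looking for counters that were zero before and are non-zero now. The sketch below assumes each sample is a plain dictionary of error counts; the function name and the data layout are hypothetical and are not LANsentry Manager's own format.

```python
def new_error_types(previous_sample, recent_sample):
    """Return the names of error counters that are non-zero in the recent
    sample but were zero (or absent) in the previous sample."""
    return sorted(
        name
        for name, count in recent_sample.items()
        if count > 0 and previous_sample.get(name, 0) == 0
    )
```

Counters that were already non-zero in the earlier sample are treated as background errors and are not reported.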

Reproducing the Fault While Monitoring the Network

Although the RMON View in LANsentry Manager can show error rates and help you to identify the location of the problem, it may not provide enough data to solve the problem. To determine the cause of the problem, reproduce it while you monitor the network by using these applications:

• LANsentry Manager Top-N Graph - To locate a quiet node to use for reproducing the fault
• LANsentry Manager Packet Capture - To capture packets from the hub to which the quiet node is attached
• LANsentry Manager Packet Decode - To analyze the packets to assess network file server traffic and delays
• Address Tracker - To find the location of the problem nodes

EXAMPLE: Using LANsentry Manager, you find a hub on the network with a higher than normal error rate. However, the error rate does not seem high enough to cause login delays of 60 seconds or more.

LANsentry Manager Top-N Graph

Using the Top-N graph in the LANsentry Manager main window, locate a quiet node that has been showing the same problem. Choose a quiet node so that you do not receive excessive traffic when you try to isolate the problem.

EXAMPLE: You see that the node Monolith, which has the same Network File System (NFS) mounts as the other nodes on the network, is quiet. You decide to use this node for reproducing the fault. See Network File System (NFS) Protocol for more information about NFS.

LANsentry Manager Packet Capture

Using the LANsentry Manager Packet Capture application, capture packets from the network using predefined patterns and start-and-stop conditions.

Follow these steps:

1. Set up a capture buffer on a probe that is connected to the same hub as the quiet node. Until you know more about the problem, set a very general filter.

EXAMPLE: You select a MAC-layer filter and set a conversation filter to capture all packets to and from Monolith.

2. Telnet into and log out from the quiet node. Then reset the capture buffer. Repeat this procedure until you see the problem reflected in your captured data. To keep the buffer information clean, reset the buffer each time that you repeat the procedure.

3. When you see the delay, note the rough value of the packet count on the LANsentry Manager packet buffer.

By noting the packet count at which you think the delay has occurred, you can narrow the problem to within about 20 packets in the buffer. If you have used an extremely quiet node, you may even identify the exact packet.

LANsentry Manager Packet Decode

The LANsentry Manager Packet Decode application decodes all major protocols and displays the packet contents at three levels of detail: summary information, header information, and actual packet content.

Follow these steps:

1. Open the buffer in the Packet Decode application and locate the number of the packet at which the delay occurred.

2. Select the packet and launch a MAC-layer conversation filter. In the filter display, look for a gap in the conversation (that is, where the node sent a request and then resent it at approximately the same rate as the delay you experienced when recreating the problem).

3. Repeat the test to determine if the result concentrates on one node or if it appears on other nodes.

EXAMPLE: On the quiet node that you selected, the delay is obvious. You see an NFS request going out to a node and a repeat of the request 30 seconds later. During that time, the node did not respond. You now know that the delay occurred because nodes were not seeing responses for NFS requests. When you repeat the test on other nodes, you find that the delay is happening with more than one destination node.
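Looking for a gap in a conversation, as described in the steps above, can be sketched as a search for requests that are seen again after a long pause. The sketch assumes the capture has been reduced to (timestamp, request identifier) pairs; that layout, the function name, and the 10-second cutoff are hypothetical choices, not output from the Packet Decode application.

```python
def find_retry_gaps(packets, min_gap_seconds=10.0):
    """Given (timestamp_seconds, request_id) pairs in capture order, return
    (request_id, gap_seconds) for each request that is repeated after a gap
    of at least min_gap_seconds."""
    last_seen = {}
    gaps = []
    for timestamp, request_id in packets:
        if request_id in last_seen:
            gap = timestamp - last_seen[request_id]
            if gap >= min_gap_seconds:
                gaps.append((request_id, gap))
        last_seen[request_id] = timestamp
    return gaps
```

A request repeated 30 seconds after it was first sent, with no reply in between, would show up here as a single entry with a gap of about 30 seconds.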

Address Tracker

Use Address Tracker, which polls managed devices, to determine the hubs to which the problem nodes are attached. If the problem end stations are located on unmanaged devices, then you can at least narrow the problem to those unmanaged devices.

EXAMPLE: Although your network does not have managed hubs that Transcend® NCS management software can poll, it does have managed switches. When it polls the switches, Address Tracker displays the switch ports on which addresses were last seen. This information indicates the hub (but not the hub port) on which the device is located.

LANsentry Manager Packet Decode

After you know the location of the hub that has the problem node, monitor the problem from the hub using LANsentry Manager Packet Decode.

Follow these steps:

1. To capture packets from one of the nodes on the hub, set up another capture buffer and repeat the exercise that is described in LANsentry Manager Packet Capture. Because a delay may occur on a different node, use two capture buffers without stopping the first one.

Note the rough packet count where the delay appears.

2. Display a conversation filter of the packet where the delay appears and look for the gap in the conversation.

EXAMPLE: You hope that the nodes are on the same hub. You find that all the nodes are on HUB3. This result indicates that FCS Errors may be causing the timeouts. However, because the errors occur at a low rate, you decide to verify this diagnosis. You monitor the problem from the hub, logging in and out many times, and the delay eventually occurs. This time, the delay shows that the node's reply had an FCS Error even though the node received the request. The switch would not have transmitted this packet, causing a timeout on the NFS protocol. The retry time is presumably 30 seconds. During this test, you see the problem occurring on another node.

Correcting the Fault

Without a managed hub, you may find it very difficult to discover network file server timeout errors. To find the problematic node, you must either systematically isolate nodes by monitoring each node for a prolonged period or temporarily insert a managed hub.

EXAMPLE: You notice that the captured error packet failed FCS because it was corrupted by a regular pattern during transmission. A possible reason for this occurrence is a Jabbering node. This explanation makes sense because FCS/Jabber frames increased linearly when you were monitoring the live network.
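The way corruption causes an FCS failure can be illustrated with Python's zlib.crc32, which uses the same CRC-32 generator polynomial as the Ethernet frame check sequence. This is only a sketch: the payload bytes are made up, the helper name is hypothetical, and the bit ordering used on the wire is not modeled.

```python
import zlib


def fcs_matches(payload, received_fcs):
    """Recompute the CRC-32 of the payload and compare it with the received value."""
    return zlib.crc32(payload) == received_fcs


frame = b"example NFS reply payload"   # made-up data for illustration
fcs = zlib.crc32(frame)                # value the sender would append

corrupted = bytearray(frame)
corrupted[3] ^= 0xFF                   # overwrite one byte, as a jabbering node might
```

Here fcs_matches(frame, fcs) holds for the intact payload, while the corrupted copy no longer matches, so a receiving device would count an FCS Error and drop the packet.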

Network File Server Timeouts Reference

This section explains terms that are relevant to network file server timeouts and provides additional conceptual and problem analysis detail.

Jabbering

Jabbering occurs when a node transmits illegal-length packets and is possibly not operating within carrier specifications. In effect, another node has written bad data over a valid packet. This bad data is often interpreted as a repeated sequence of data.

Network File System (NFS) Protocol

A distributed file system protocol developed by Sun Microsystems that allows a computer system to access files over a network as if they were on its local disks. This protocol has been incorporated into products by more than 200 companies. It is now a de facto Internet standard.

NFS is one protocol in the NFS suite of protocols, which includes NFS, RPC, XDR (External Data Representation), and others. These protocols are part of a larger architecture that Sun Microsystems refers to as Open Network Computing (ONC). ONC is a distributed applications architecture designed by Sun and currently controlled by a consortium led by Sun.

Troubleshooting TCP/IP - Detailed Steps

This article shows how to troubleshoot TCP/IP connectivity between computers on a Windows network. If you haven't already done so, disable XP's Internet Connection Firewall on all local area network connections, and remove all firewall programs on the network. Improperly configured firewalls are the most common cause of TCP/IP problems.

Open a Command Prompt Window

For many of these steps, you'll be typing at the command prompt. To open a command prompt window in Windows 2000 or XP, click Start | Run, type cmd in the box, and click OK. To open a command prompt window in Windows 95, 98, or Me, click Start | Run, type command in the box, and click OK. Type one command per line, and press Enter after each one to execute it. To close the command prompt window, use the exit command.

Determine the TCP/IP Settings

Determine the TCP/IP settings of each computer on the local area network. In XP, open the Network Connections folder, right click the LAN connection, and click Status | Support | Details. For example, here are the Status and Details views for the LAN connection on an Internet Connection Sharing host.

In Windows 95/98/Me, click Start | Run, type winipcfg in the box, and click OK. Select the LAN adapter from the menu, and click More Info. Here's the winipcfg view for an ICS client running Windows Me.

You can also see the TCP/IP settings from the command prompt. This is especially convenient if a computer has more than one network adapter. Use the ipconfig /all command, which is available in all versions except Windows 95. The output from this command can be long, so it's best to write it to a file. Specify the file name in the command this way:

ipconfig /all >ipconfig.txt
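The same capture can be scripted when you need to collect settings from several computers. The following is a minimal sketch using Python's standard subprocess module; the helper name is hypothetical, and the ipconfig call shown in the comment applies only to Windows.

```python
import subprocess


def save_command_output(command, path):
    """Run a command and write its standard output to a text file.
    Returns the command's exit code."""
    result = subprocess.run(command, capture_output=True, text=True, check=False)
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(result.stdout)
    return result.returncode


# On Windows, this mirrors the command line above:
# save_command_output(["ipconfig", "/all"], "ipconfig.txt")
```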

Description of TCPIP Settings

Here are the TCP/IP settings that are used in network troubleshooting:

• IP Address - Unique address assigned to a network adapter. A computer with multiple network adapters has an IP address for each one, and each one must be in a different subnet.

• Subnet Mask - Used in conjunction with the IP address to determine which subnet an adapter belongs to. At the simplest level, communication is only possible between two network adapters when they're in the same subnet.

• Default Gateway - IP address of a computer or router, on one of this computer's local area networks, that knows how to communicate with subnets not present on this computer. For an Internet connection, the default gateway is a router belonging to your Internet service provider, and all access to sites on the Internet goes through it. For an ICS client, the default gateway is the ICS host. If you use a hardware router, it serves as the default gateway.

• DHCP Server - If an adapter is configured to obtain an IP address automatically, this is the address of the server that provides it. It could be your ISP, an ICS host, or a hardware router.

• DNS Servers - IP address of one or more Domain Name Server computers. DNS servers translate Internet names to their IP addresses (like 63.146.109.227).
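How the IP address and subnet mask together decide whether two adapters share a subnet can be shown with Python's standard ipaddress module. This is a sketch for illustration: the function name is hypothetical, and the addresses and the 255.255.255.0 mask in the usage note are assumed values.

```python
import ipaddress


def same_subnet(ip_a, ip_b, subnet_mask):
    """Return True if both addresses fall in the same subnet for the given mask."""
    net_a = ipaddress.ip_network(f"{ip_a}/{subnet_mask}", strict=False)
    net_b = ipaddress.ip_network(f"{ip_b}/{subnet_mask}", strict=False)
    return net_a == net_b
```

For example, same_subnet("192.168.1.101", "192.168.1.123", "255.255.255.0") is true, while replacing the second address with 192.168.2.5 makes it false, so those two adapters would need a gateway to communicate.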

Subnets

See our article on subnets for a brief description of how they work. For more details, see this Microsoft Knowledge Base article.

If two computers are supposed to be on the same subnet but aren't, something is wrong with the network hardware or software configuration. This is most likely to happen when one of them receives an IP address of 169.254.x.x, which indicates that:

• It's configured to obtain an IP address automatically
• It couldn't find a DHCP server on the network to make the assignment
• Windows assigned it an Automatic Private IP Address
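Spotting an Automatic Private IP Address in a list of settings is a simple range check on 169.254.x.x. The sketch below uses Python's standard ipaddress module; the function name is an assumption made for illustration.

```python
import ipaddress

# The 169.254.x.x range used for Automatic Private IP Addresses.
AUTOMATIC_PRIVATE_RANGE = ipaddress.ip_network("169.254.0.0/16")


def is_automatic_private_address(address):
    """Return True if the address lies in the 169.254.x.x range."""
    return ipaddress.ip_address(address) in AUTOMATIC_PRIVATE_RANGE
```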

See our article on Specific Networking Problems and Their Solutions for more information.

Pinging

The ping command is the basic tool for testing TCP/IP connectivity. It sends a special packet (called ICMP Echo) to a particular IP address and looks for a reply. If everything is working right, the reply comes back. If not, the ping times out in a few seconds. By default, the ping command repeats the process four times. Here's an example of an ICS client computer pinging a Windows XP Home Edition ICS host, using the host's IP address and its computer name.

When ping fails, you'll see one of these error messages:

• Request timed out - The IP address is valid, but there's no reply from it. If the IP address is on a local area network, the most likely cause is a firewall program blocking the ping.

• Unknown host <name> or Ping request could not find host <name> - The computer name doesn't exist on the local area network. Make sure that NetBIOS over TCP/IP is enabled.

• Destination host unreachable - The IP address isn't on a local area network, and the default gateway can't access it. Either there's no default gateway, its address is wrong, or it isn't functioning.
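The messages in the list above can be mapped to their meanings with a small lookup. This is a sketch only: it matches on the message text quoted in this article, ignoring case, and the function and table names are hypothetical.

```python
# (text to look for, meaning as described in the list above)
PING_DIAGNOSES = [
    ("request timed out",
     "The IP address is valid, but there is no reply from it."),
    ("unknown host",
     "The computer name does not exist on the local area network."),
    ("could not find host",
     "The computer name does not exist on the local area network."),
    ("destination host unreachable",
     "The IP address is not on a local area network, "
     "and the default gateway cannot access it."),
]


def diagnose_ping_output(output):
    """Return the meaning of the first known error message found, or None."""
    text = output.lower()
    for marker, diagnosis in PING_DIAGNOSES:
        if marker in text:
            return diagnosis
    return None
```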

Pinging the Local Area Network

Here is a series of ping commands to use in finding where a problem occurs on a local area network. Run them in the order shown, and don't go on to the next command until all of the previous commands work properly. In this example:

• The computer being tested is named Winxp, with IP address 192.168.1.101
• There's another computer on the network named Win98, with IP address 192.168.1.123

Substitute the appropriate IP addresses and computer names for your network.

Command               Target                          What Ping Failure Indicates
ping 127.0.0.1        Loopback address                Corrupted TCP/IP installation
ping localhost        Loopback name                   Corrupted TCP/IP installation
ping 192.168.1.101    This computer's IP address      Corrupted TCP/IP installation
ping winxp            This computer's name            Corrupted TCP/IP installation
ping 192.168.1.123    Another computer's IP address   Bad hardware or NIC driver
ping win98            Another computer's name         NetBIOS name resolution failure
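The ordered procedure in the table can be sketched as a script that stops at the first ping that fails. The targets and failure indications are the ones from the example; the function names are hypothetical, and treating the ping exit code as success or failure is an assumption that may not hold for every Windows version or error message.

```python
import platform
import subprocess

# (ping target, what a failure indicates), in the order given in the table
LAN_CHECKS = [
    ("127.0.0.1", "Corrupted TCP/IP installation"),
    ("localhost", "Corrupted TCP/IP installation"),
    ("192.168.1.101", "Corrupted TCP/IP installation"),
    ("winxp", "Corrupted TCP/IP installation"),
    ("192.168.1.123", "Bad hardware or NIC driver"),
    ("win98", "NetBIOS name resolution failure"),
]


def ping(target):
    """Send one echo request; return True if the ping command reports success."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    result = subprocess.run(["ping", count_flag, "1", target],
                            capture_output=True, text=True, check=False)
    return result.returncode == 0


def first_failure(checks, ping_func=ping):
    """Run the checks in order; return (target, indication) for the first
    failing ping, or None if every ping succeeds."""
    for target, indication in checks:
        if not ping_func(target):
            return target, indication
    return None
```

Calling first_failure(LAN_CHECKS) after substituting your own addresses and names reports the first step that fails, which is the step to fix before going on.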

To fix a corrupted TCP/IP installation on Windows XP, follow the steps in this Microsoft Knowledge Base article. For Windows 95/98/Me, un-install the TCP/IP protocol in Control Panel | Network, reboot, and re-install it. If that doesn't fix it, use this procedure on Windows 95 or 98.

Pinging the Internet

You can also use ping to find a problem with Internet access. Run these commands in the order shown, and don't go on to the next command until all of the previous commands work properly. Use the Default Gateway and DNS Server addresses that you got from the winipcfg or ipconfig /all command.

Command                   Target                What Ping Failure Indicates
ping w.x.y.z              Default Gateway       Default Gateway down
ping w.x.y.z              DNS Server            DNS Server down
ping w.x.y.z              Web site IP address   Internet service provider or web site down
ping www.something.com    Web site name         DNS Server down or web site down

REFERENCE

WWW.PRACTICALLYNETWORK.COM

WWW.COMPUTERHOPE.COM

WWW.WIKIPEDIA.ORG

THANK YOU


                                                          • LANsentry Manager Packet Decode
                                                          • Address Tracker
                                                          • LANsentry Manager Packet Decode
                                                          • Jabbering
                                                          • Network File System (NFS) Protocol
                                                          • Troubleshooting TCP/IP - Detailed Steps
                                                          • Open a Command Prompt Window
                                                          • Determine the TCP/IP Settings
                                                          • Description of TCP/IP Settings
                                                          • Subnets
                                                          • Pinging
                                                          • Pinging the Local Area Network
                                                          • Pinging the Internet

packets are received one after another and are followed by a collision fragment. Ethernet controllers that cannot sustain reception will lose packets.

Receive Discards

Receive Discards indicate that received packets could not be delivered to a high-layer protocol because of congestion or packet errors.

Too Long Errors

A Too Long Error indicates that a packet is longer than 1518 octets (including FCS octets) but otherwise well formed. Too Long Errors are often caused by a bad transceiver, a malfunction of the jabber protection mechanism on a transceiver, or excessive noise on the cable.

Too Short Errors

A Too Short Error, also called a runt, indicates that a packet is fewer than 64 octets long (including FCS octets) but otherwise well formed.
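The two length limits above can be expressed as a small check. The following is only a sketch: the function name `classify_frame_length` is made up for illustration, and it assumes that the length passed in already includes the FCS octets and that the frame is otherwise well formed.

```python
# Frame length limits quoted in the text (octets, including the FCS).
MIN_FRAME_OCTETS = 64
MAX_FRAME_OCTETS = 1518


def classify_frame_length(length_octets: int) -> str:
    """Return "too_short", "too_long", or "ok" for an otherwise well-formed frame."""
    if length_octets < MIN_FRAME_OCTETS:
        return "too_short"   # a runt
    if length_octets > MAX_FRAME_OCTETS:
        return "too_long"
    return "ok"
```

A frame of exactly 64 or exactly 1518 octets is treated as valid, because the text describes the errors as "fewer than 64" and "longer than 1518".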

Transmit Discards

Transmit Discards indicate that packets were not transmitted because of network congestion.

FDDI Ring Errors

Use these sections to identify and correct FDDI ring errors:

• FDDI Ring Errors Overview
• Identifying Ring Errors

Fiber Distributed Data Interface (FDDI) often corrects its own problems. However, because FDDI cannot correct all errors (especially those related to hardware problems), you should monitor FDDI errors.

Understanding the Problem

FDDI ring errors that you should monitor include:

• Elasticity Buffer Error Condition
• Frame Error Condition
• Frames Not Copied Condition
• Link Error Condition

Identifying the Problem

First, determine the type of FDDI ring errors and where they are occurring. As with other FDDI problems, identify the upstream and downstream neighbors of the devices that you are monitoring.

Several types of network errors can cause FDDI performance problems. For example, problems with cables or physical connections may result in a link or frame error. Elasticity buffer (EB) errors can also lead to link and frame errors.

FDDI deals with port-related errors as follows:

• The variable PORTlerAlarm is the link error rate (LER) value at which a link connection generates an alarm. When the LER is greater than the alarm setting, Station Management (SMT) sends a Status Report Frame (SRF) to notify you that there is a problem with a port.

The PORTlerAlarm threshold is set lower than the PORTlerCutoff threshold so that you are notified of a problem before the port is actually removed from the ring.

• When link errors reach the threshold defined by the variable PORTlerCutoff, SMT breaks the connection, disabling the PHY that detected the problem. A Link Error Condition is also generated.

FDDI deals with MAC-related errors as follows:

• When MAC frame errors reach a certain threshold, a Frame Error Condition is generated. Because the actual error can be further upstream than the immediate connection, the connection remains intact.

• For a large network, the worst-case MACFrameErrorRatio is less than 0.1 percent. However, during network configuration, frame error ratios can reach 50 percent for short periods. When you detect a sustained frame error ratio of more than 0.1 percent, a problem exists between the station that is reporting the condition and the nearest upstream MAC.
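The difference between a short spike during configuration and a sustained frame error ratio can be illustrated with a small check over polled samples. This is a sketch under stated assumptions: the text does not say how long "sustained" is, so the `min_consecutive` parameter is an assumption, and the function name is hypothetical. It is not part of any FDDI or SMT tool.

```python
from typing import Sequence

SUSTAINED_THRESHOLD_PERCENT = 0.1  # threshold quoted in the text


def is_sustained_frame_error(
    ratios_percent: Sequence[float],
    min_consecutive: int = 5,  # assumed meaning of "sustained"; tune for your polling interval
    threshold: float = SUSTAINED_THRESHOLD_PERCENT,
) -> bool:
    """Return True if `min_consecutive` samples in a row exceed `threshold`."""
    run = 0
    for ratio in ratios_percent:
        # Extend the current run of bad samples, or reset it on a good sample.
        run = run + 1 if ratio > threshold else 0
        if run >= min_consecutive:
            return True
    return False
```

With this rule, a brief burst that reaches 50 percent for two samples is ignored, while five consecutive samples slightly above 0.1 percent are flagged.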

Solving the Problem

To solve problems related to FDDI errors, fix the hardware, cabling, or congestion problem.

Identifying Ring Errors

Use Status Watch to monitor your FDDI devices for Warning or Critical alerts.

Status Watch

Use Status Watch to identify FDDI ring errors.

Follow these steps:

1. Monitor the FDDI Status tool for the currently selected device.

2. Determine whether Status Watch is reporting Elasticity Buffer Errors or a high percentage of Frame Errors, Frames Not Copied, or Link Error Rates for the currently selected device.

FDDI Ring Errors Reference

This section provides additional conceptual and problem analysis detail.

Elasticity Buffer Error Condition

The Elasticity Buffer Error condition occurs when a port's elasticity buffer overflows or underflows. This condition usually indicates that a port's hardware is not operating within the tolerances that the FDDI standard specifies. Look for the problem in the hardware of either the port that is reporting the condition or the immediately adjacent port.

Frame Error Condition

The Frame Error condition occurs when the percentage of frames that contain errors exceeds a preset threshold. When a device is an uplink to FDDI (that is, a device is transmitting onto FDDI), this type of condition indicates that the ring is saturated. The ring is out of buffer space, and packets are being dropped from the device's backbone port.

The problem indicated by the frame errors is usually located between the MAC that reports the condition and its upstream neighbor. Because many physical connections can lie along this path, the MACFrameErrorRatio variable identifies only the two MACs between which the problem is occurring.

Frames Not Copied Condition

The Frames Not Copied condition occurs when the percentage of frames that are dropped because of insufficient buffer space exceeds a preset threshold. This condition indicates that the station is congested and is unable to process frames as quickly as they arrive. To help eliminate congestion:

• Add more capacity to the station.
• Reconfigure your network so that end stations that communicate heavily with one another are on the same bridge or switch.
• Filter out certain traffic.

Link Error Condition

The Link Error condition occurs when a port detects link errors at a rate that exceeds a preset threshold. When the Link Error threshold is exceeded, the station removes itself from the ring and tries to reinsert itself on the ring. This action creates a MAC Neighbor Change Event (which also occurs if a ring wraps).

Link errors may indicate an FDDI PHY hardware problem (such as a faulty transmitter) or a faulty cable or connector. Look for the problem in the portion of the network between the port that is reporting the condition and the first upstream transmitter.

MAC Neighbor Change Event

The MAC Neighbor Change event occurs when a MAC's upstream or downstream neighbor changes.

This event indicates either:

• A network reconfiguration
• Another station that is leaving or joining the ring

Network File Server Timeouts

Use these sections to identify and correct timeouts on network file servers:

• Network File Server Timeout Overview
• Looking for Obvious Errors
• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

See Network File Server Timeouts Reference for additional conceptual and problem analysis detail.

A network file server can time out if your network gets congested or if your server is having problems. Users might have problems downloading data from or to the server, or copying files from or to the server. To help you understand the troubleshooting process for this type of problem, an EXAMPLE throughout this section follows the symptoms, analysis, and resolution of a typical file server timeout problem.


Understanding the Problem

When users log in, their stations make network file server calls either to determine quotas (if this feature has been enabled) or to mount user home directories. Network file server timeout messages, even when spread across multiple nodes, indicate a problem either with the network or with a server.

EXAMPLE: UNIX users notice that it takes a long time (over 30 seconds in some cases) to log in to any machine. Some machines report network file server timeout messages, but the messages have no obvious pattern and are infrequent. You begin to get a sense of the problem.

Identifying the Problem

First, rule out the obvious causes. Ask these questions:

• Can you access the network file server with Telnet?
• Have any alarms been triggered?
• Are there any new errors?

The process of identifying the problem is developed in Looking for Obvious Errors.

Solving the Problem

To determine the cause, reproduce the fault while you monitor the network. After you know the cause, you can fix the problem.

The solutions to the network file server timeout are identified in these sections:

• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

Looking for Obvious Errors

To look for obvious errors, use these applications:

• Ping and Telnet - To check connectivity to the network file server nodes
• LANsentry Manager Alarms View - To search for triggered alarms
• LANsentry Manager Statistics View - To look for errors
• LANsentry Manager History View - To identify trends


Ping and Telnet

Determine whether you can contact network file server nodes using Ping and Telnet. If the response is extremely slow, then a problem may exist with the connections to the nodes. No delay indicates that the connections are normal, implying that the delay is occurring elsewhere. In this case, use LANsentry® Manager tools to determine whether packets are being lost or ignored.

LANsentry Manager Alarms View

Using the LANsentry Manager Alarms View, you can determine if any configured alarms have been triggered.

Search the Alarms View to see if any MAC events have been logged.

EXAMPLE: MAC events have not been logged for the network to which the UNIX users are attached.

Even though no alarms have occurred, errors may exist. For example, a low rate of background errors may exist just below the alarm threshold. Because they are based on maximum and minimum values, RMON alarms may miss constant, periodic, or low amounts of errors.

LANsentry Manager Statistics View

Using the LANsentry Manager Statistics View, you can display a multisegment graph of utilization and error statistics.

Follow these steps:

1. Set up a graph that shows utilization and errors on all your major segments.

2. Determine whether any segments are particularly busy or error prone.

EXAMPLE: You notice that one segment of the UNIX network, HUB3, is reporting Too Long Errors and FCS Errors roughly every second sample. While the amount is not higher than normal, it is currently higher than on any other segment.

LANsentry Manager History View

Using the LANsentry Manager History View, you can display a rolling history table to determine if the errors that you are seeing are new. For example, if you have a history table that runs for 30-minute samples over two days, you can compare the most recent sample to a previous sample, looking for new errors. If your probe has the resources, use a much finer resolution sample stored for a shorter time (every 30 seconds for 2 hours) to more easily spot recent errors.

EXAMPLE: You see that the history table shows that no error rates remained constant throughout the day. However, errors that did occur were on the device HUB3 and were Too Long Errors and FCS Errors.

Reproducing the Fault While Monitoring the Network

Although the RMON View in LANsentry Manager can show error rates and help you to identify the location of the problem, it may not provide enough data to solve the problem. To determine the cause of the problem, reproduce it while you monitor the network by using these applications:

• LANsentry Manager Top-N Graph - To locate a quiet node to use for reproducing the fault
• LANsentry Manager Packet Capture - To capture packets from the hub to which the quiet node is attached
• LANsentry Manager Packet Decode - To analyze the packets to assess network file server traffic and delays
• Address Tracker - To find the location of the problem nodes

EXAMPLE: Using LANsentry Manager, you find a hub on the network with a higher than normal error rate. However, the error rate does not seem high enough to cause login delays of 60 seconds or more.

LANsentry Manager Top-N Graph

Using the Top-N graph in the LANsentry Manager main window, locate a quiet node that has been showing the same problem. Choose a quiet node so that you do not receive excessive traffic when you try to isolate the problem.

EXAMPLE: You see that the node Monolith, which has the same Network File System (NFS) mounts as the other nodes on the network, is quiet. You decide to use this node for reproducing the fault. See Network File System (NFS) Protocol for more information about NFS.

LANsentry Manager Packet Capture

Using the LANsentry Manager Packet Capture application, capture packets from the network using predefined patterns and start-and-stop conditions.

Follow these steps:

1. Set up a capture buffer on a probe that is connected to the same hub as the quiet node. Until you know more about the problem, set a very general filter.

EXAMPLE: You select a MAC-layer filter and set a conversation filter to capture all packets to and from Monolith.

2. Telnet into and log out from the quiet node. Then reset the capture buffer. Repeat this procedure until you see the problem reflected in your captured data. To keep the buffer information clean, reset the buffer each time that you repeat the procedure.

3. When you see the delay, note the rough value of the packet count on the LANsentry Manager packet buffer.

By noting the packet count at which you think the delay has occurred, you can narrow the problem to within about 20 packets in the buffer. If you have used an extremely quiet node, you may even identify the exact packet.

LANsentry Manager Packet Decode

The LANsentry Manager Packet Decode application decodes all major protocols and displays the packet contents at three levels of detail: summary information, header information, and actual packet content.

Follow these steps:

1. Open the buffer in the Packet Decode application and locate the number of the packet at which the delay occurred.

2. Select the packet and launch a MAC-layer conversation filter. In the filter display, look for a gap in the conversation (that is, where the node sent a request and then resent it after an interval roughly equal to the delay you experienced when re-creating the problem).

3. Repeat the test to determine if the result concentrates on one node or if it appears on other nodes.

EXAMPLE: On the quiet node that you selected, the delay is obvious. You see an NFS request going out to a node and a repeat of the request 30 seconds later. During that time, the node did not respond. You now know that the delay occurred because nodes were not seeing responses for NFS requests. When you repeat the test on other nodes, you find that the delay is happening with more than one destination node.
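The "gap in the conversation" search can also be sketched in code for captures that have been exported to some other form. The following is an illustration only and is not a LANsentry Manager feature: the `Packet` record, its `request_id` field (for example, an RPC transaction ID), the 20-second default gap, and the server names in the usage note are all assumptions.

```python
from typing import NamedTuple


class Packet(NamedTuple):
    time_s: float     # capture timestamp in seconds
    src: str          # sending node
    dst: str          # receiving node
    request_id: str   # hypothetical key that ties a reply to its request
    is_reply: bool


def find_unanswered_retries(packets, min_gap_s=20.0):
    """Return (first_request, retry) pairs where the same request was resent
    at least `min_gap_s` seconds later with no reply seen in between."""
    pending = {}  # (src, dst, request_id) -> earliest unanswered request
    gaps = []
    for pkt in sorted(packets, key=lambda p: p.time_s):
        if pkt.is_reply:
            # A reply travels dst -> src, so flip the addresses to find its request.
            pending.pop((pkt.dst, pkt.src, pkt.request_id), None)
            continue
        key = (pkt.src, pkt.dst, pkt.request_id)
        first = pending.get(key)
        if first is None:
            pending[key] = pkt
        elif pkt.time_s - first.time_s >= min_gap_s:
            gaps.append((first, pkt))
            pending[key] = pkt  # measure any further retry from this one
    return gaps
```

For example, a request from Monolith to a hypothetical `server2` at 5.0 seconds that is repeated at 35.0 seconds, with no reply in between, is reported as one gap of 30 seconds, which matches the retry interval seen in the EXAMPLE.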


Address Tracker

Use Address Tracker, which polls managed devices, to determine the hubs to which the problem nodes are attached. If the problem end stations are located on unmanaged devices, then you can at least narrow the problem to those unmanaged devices.

EXAMPLE: Although your network does not have managed hubs that Transcend® NCS management software can poll, it does have managed switches. When it polls the switches, Address Tracker displays the switch ports on which addresses were last seen. This information indicates the hub (but not the hub port) on which the device is located.

LANsentry Manager Packet Decode

After you know the location of the hub that has the problem node, monitor the problem from the hub using LANsentry Manager Packet Decode.

Follow these steps:

1. To capture packets from one of the nodes on the hub, set up another capture buffer and repeat the exercise that is described in LANsentry Manager Packet Capture. Because a delay may occur on a different node, use two capture buffers without stopping the first one.

Note the rough packet count where the delay appears.

2. Display a conversation filter of the packet where the delay appears and look for the gap in the conversation.

EXAMPLE: You hope that the nodes are on the same hub. You find that all the nodes are on HUB3. This result indicates that FCS Errors may be causing the timeouts. However, because the errors occur at a low rate, you decide to verify this diagnosis. You monitor the problem from the hub, logging in and out many times, and the delay eventually occurs. This time, the delay shows that the node's reply had an FCS Error even though the node received the request. The switch would not have transmitted this packet, causing a timeout on the NFS protocol. The retry time is presumably 30 seconds. During this test, you see the problem occurring on another node.

Correcting the Fault


Without a managed hub, you may find it very difficult to discover network file server timeout errors. To find the problematic node, you must either systematically isolate nodes by monitoring each node for a prolonged period or temporarily insert a managed hub.

EXAMPLE: You notice that the captured error packet failed FCS because it was corrupted by a regular pattern during transmission. A possible reason for this occurrence is a jabbering node. This explanation makes sense because FCS/Jabber frames increased linearly when you were monitoring the live network.

Network File Server Timeouts Reference

This section explains terms that are relevant to network file server timeouts and provides additional conceptual and problem analysis detail.

Jabbering

A condition in which a node transmits illegal-length packets and is possibly not operating within carrier specifications. In effect, another node has written bad data over a valid packet. This bad data is often interpreted as a repeated sequence of data.

Network File System (NFS) Protocol

A distributed file system protocol developed by Sun Microsystems that allows a computer system to access files over a network as if they were on its local disks. This protocol has been incorporated into products by more than 200 companies. It is now a de facto Internet standard.

NFS is one protocol in the NFS suite of protocols, which includes NFS, RPC, XDR (External Data Representation), and others. These protocols are part of a larger architecture that Sun Microsystems refers to as Open Network Computing (ONC). ONC is a distributed applications architecture designed by Sun and currently controlled by a consortium led by Sun.

Troubleshooting TCP/IP - Detailed Steps

This article shows how to troubleshoot TCP/IP connectivity between computers on a Windows network. If you haven't already done so, disable XP's Internet Connection Firewall on all local area network connections, and remove all firewall programs on the network. Improperly configured firewalls are the most common cause of TCP/IP problems.

Open a Command Prompt Window

For many of these steps, you'll be typing at the command prompt. To open a command prompt window in Windows 2000 or XP, click Start | Run, type cmd in the box, and click OK. To open a command prompt window in Windows 95, 98, or Me, click Start | Run, type command in the box, and click OK. Type one command per line, and press Enter after each one to execute it. To close the command prompt window, use the exit command.

Determine the TCP/IP Settings

Determine the TCP/IP settings of each computer on the local area network. In XP, open the Network Connections folder, right-click the LAN connection, and click Status | Support | Details. For example, here are the Status and Details views for the LAN connection on an Internet Connection Sharing host:

In Windows 95/98/Me, click Start | Run, type winipcfg in the box, and click OK. Select the LAN adapter from the menu and click More Info. Here's the winipcfg view for an ICS client running Windows Me:

You can also see the TCP/IP settings from the command prompt. This is especially convenient if a computer has more than one network adapter. Use the ipconfig /all command, which is available in all versions except Windows 95. The output from this command can be long, so it's best to write it to a file. Specify the file name in the command this way:

ipconfig /all >ipconfig.txt

Description of TCP/IP Settings

Here are the TCP/IP settings that are used in network troubleshooting:

• IP Address - Unique address assigned to a network adapter. A computer with multiple network adapters has an IP address for each one, and each one must be in a different subnet.

• Subnet Mask - Used in conjunction with the IP address to determine which subnet an adapter belongs to. At the simplest level, communication is only possible between two network adapters when they're in the same subnet.

• Default Gateway - IP address of a computer or router, on one of this computer's local area networks, that knows how to communicate with subnets not present on this computer. For an Internet connection, the default gateway is a router belonging to your Internet service provider, and all access to sites on the Internet goes through it. For an ICS client, the default gateway is the ICS host. If you use a hardware router, it serves as the default gateway.

• DHCP Server - If an adapter is configured to obtain an IP address automatically, this is the address of the server that provides it. It could be your ISP, an ICS host, or a hardware router.

• DNS Servers - IP address of one or more Domain Name Server computers. DNS servers translate Internet names to their IP addresses (like 63.146.109.227).
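The way the subnet mask and IP address combine to decide whether two adapters share a subnet can be checked with Python's standard `ipaddress` module. This is a minimal sketch: the function name `same_subnet` is made up for illustration, and the addresses and the 255.255.255.0 mask in the usage note are example values, not settings taken from any particular network.

```python
import ipaddress


def same_subnet(ip_a: str, ip_b: str, subnet_mask: str) -> bool:
    """Return True if both addresses fall in the same subnet under `subnet_mask`."""
    # strict=False accepts a host address and derives the network it belongs to.
    net_a = ipaddress.ip_network(f"{ip_a}/{subnet_mask}", strict=False)
    net_b = ipaddress.ip_network(f"{ip_b}/{subnet_mask}", strict=False)
    return net_a == net_b
```

For example, `same_subnet("192.168.1.101", "192.168.1.123", "255.255.255.0")` is true, while replacing the second address with `192.168.2.123` makes it false, so traffic between those two adapters would have to go through a gateway.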

Subnets

See our article on subnets for a brief description of how they work. For more details, see this Microsoft Knowledge Base article.

If two computers are supposed to be on the same subnet but aren't, something is wrong with the network hardware or software configuration. This is most likely to happen when one of them receives an IP address of 169.254.x.x, which indicates that:

• It's configured to obtain an IP address automatically.
• It couldn't find a DHCP server on the network to make the assignment.
• Windows assigned it an Automatic Private IP Address.
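Spotting an automatically assigned address in a list of settings is easy to automate. The sketch below uses Python's standard `ipaddress` module and assumes that the 169.254.x.x range corresponds to the network 169.254.0.0/16. The function name is hypothetical.

```python
import ipaddress

# The 169.254.x.x range, written as a network.
AUTOMATIC_PRIVATE_RANGE = ipaddress.ip_network("169.254.0.0/16")


def is_automatic_private_address(ip: str) -> bool:
    """Return True if `ip` lies in the 169.254.x.x range."""
    return ipaddress.ip_address(ip) in AUTOMATIC_PRIVATE_RANGE
```

An adapter that reports an address such as 169.254.10.20 passes this check, which suggests looking at the DHCP server or the connection to it before anything else.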

See our article on Specific Networking Problems and Their Solutions for more information.

Pinging

The ping command is the basic tool for testing TCP/IP connectivity. It sends a special packet (called ICMP Echo) to a particular IP address and looks for a reply. If everything is working right, the reply comes back. If not, the ping times out in a few seconds. By default, the ping command repeats the process four times. Here's an example of an ICS client computer pinging a Windows XP Home Edition ICS host, using the host's IP address and its computer name:

When ping fails, you'll see one of these error messages:

• Request timed out - The IP address is valid, but there's no reply from it. If the IP address is on a local area network, the most likely cause is a firewall program blocking the ping.

• Unknown host <name> or Ping request could not find host <name> - The computer name doesn't exist on the local area network. Make sure that NetBIOS over TCP/IP is enabled.

• Destination host unreachable - The IP address isn't on a local area network, and the default gateway can't access it. Either there's no default gateway, its address is wrong, or it isn't functioning.


Pinging the Local Area Network

Here is a series of ping commands to use in finding where a problem occurs on a local area network. Run them in the order shown, and don't go on to the next command until all of the previous commands work properly. In this example:

• The computer being tested is named Winxp, with IP address 192.168.1.101.
• There's another computer on the network, named Win98, with IP address 192.168.1.123.

Substitute the appropriate IP addresses and computer names for your network.

Command             | Target                        | What Ping Failure Indicates
ping 127.0.0.1      | Loopback address              | Corrupted TCP/IP installation
ping localhost      | Loopback name                 | Corrupted TCP/IP installation
ping 192.168.1.101  | This computer's IP address    | Corrupted TCP/IP installation
ping winxp          | This computer's name          | Corrupted TCP/IP installation
ping 192.168.1.123  | Another computer's IP address | Bad hardware or NIC driver
ping win98          | Another computer's name       | NetBIOS name resolution failure
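The "run in order and stop at the first failure" procedure can be sketched as a short script. This is an illustration under stated assumptions, not part of the original article: the targets are the example names and addresses from this section, the function names are made up, and the operating system's ping exit status is treated as the pass/fail signal, which is only a rough indicator (on Windows, for instance, a "Destination host unreachable" reply can still produce a zero exit status).

```python
import platform
import subprocess

# (target, what it is, what a failure indicates), in the order given above.
LAN_PING_STEPS = [
    ("127.0.0.1", "Loopback address", "Corrupted TCP/IP installation"),
    ("localhost", "Loopback name", "Corrupted TCP/IP installation"),
    ("192.168.1.101", "This computer's IP address", "Corrupted TCP/IP installation"),
    ("winxp", "This computer's name", "Corrupted TCP/IP installation"),
    ("192.168.1.123", "Another computer's IP address", "Bad hardware or NIC driver"),
    ("win98", "Another computer's name", "NetBIOS name resolution failure"),
]


def system_ping(target: str) -> bool:
    """Run the operating system's ping once; True if it exits with status 0."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    result = subprocess.run(
        ["ping", count_flag, "1", target],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def first_failure(steps=LAN_PING_STEPS, ping=system_ping):
    """Ping each target in order; return the first failing step, or None."""
    for target, description, failure_means in steps:
        if not ping(target):
            return target, description, failure_means
    return None
```

Calling `first_failure()` after substituting your own names and addresses returns the first step that failed together with its likely cause, or `None` if every ping succeeded. The `ping` parameter exists so that the ordering logic can be tried out with a stand-in function instead of real network traffic.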

To fix a corrupted TCP/IP installation on Windows XP, follow the steps in this Microsoft Knowledge Base article. For Windows 95/98/Me, un-install the TCP/IP protocol in Control Panel | Network, reboot, and re-install it. If that doesn't fix it, use this procedure on Windows 95 or 98.

Pinging the Internet

You can also use ping to find a problem with Internet access. Run these commands in the order shown, and don't go on to the next command until all of the previous commands work properly. Use the Default Gateway and DNS Server addresses that you got from the winipcfg or ipconfig /all command.

Command                | Target              | What Ping Failure Indicates
ping w.x.y.z           | Default Gateway     | Default Gateway down
ping w.x.y.z           | DNS Server          | DNS Server down
ping w.x.y.z           | Web site IP address | Internet service provider or web site down
ping www.something.com | Web site name       | DNS Server down or web site down

REFERENCE

www.practicallynetworked.com

www.computerhope.com

www.wikipedia.org


THANK YOU


                                                          • LANsentry Manager Statistics View
                                                          • LANsentry Manager History View
                                                          • LANsentry Manager Top-N Graph
                                                          • LANsentry Manager Packet Capture
                                                          • LANsentry Manager Packet Decode
                                                          • Address Tracker
                                                          • LANsentry Manager Packet Decode
                                                          • Jabbering
                                                          • Network File System (NFS) Protocol
                                                          • Troubleshooting TCP/IP - Detailed Steps
                                                          • Open a Command Prompt Window
                                                          • Determine the TCP/IP Settings
                                                          • Description of TCP/IP Settings
                                                          • Subnets
                                                          • Pinging
                                                          • Pinging the Local Area Network
                                                          • Pinging the Internet

• Frames Not Copied Condition
• Link Error Condition

Identifying the Problem

First, determine the type of FDDI ring errors and where they are occurring. As with other FDDI problems, identify the upstream and downstream neighbors of the devices that you are monitoring.

Several types of network errors can cause FDDI performance problems. For example, problems with cables or physical connections may result in a link or frame error. Elasticity buffer (EB) errors can also lead to link and frame errors.

FDDI deals with port-related errors as follows:

• The variable PORTlerAlarm is the link error rate (LER) value at which a link connection generates an alarm. When the LER is greater than the alarm setting, Station Management (SMT) sends a Status Report Frame (SRF) to notify you that there is a problem with a port.

  The PORTlerAlarm threshold is set lower than the PORTlerCutoff threshold so that you are notified of a problem before the port is actually removed from the ring.

• When link errors reach the threshold defined by the variable PORTlerCutoff, SMT breaks the connection, disabling the PHY that detected the problem. A Link Error Condition is also generated.
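The two-threshold behavior described above (warn first, disconnect later) can be sketched in a few lines of Python. This is only an illustration, not SMT code: the function name `classify_ler` is hypothetical, and error rates are represented as plain fractions purely for simplicity.

```python
def classify_ler(ler, alarm_rate, cutoff_rate):
    """Classify a measured link error rate (LER) against two thresholds.

    All three arguments are error rates expressed as fractions; this
    representation is an assumption made for the sketch.
    """
    if alarm_rate >= cutoff_rate:
        # The alarm must fire before the cutoff, so it has to be the lower rate.
        raise ValueError("alarm threshold must be below cutoff threshold")
    if ler >= cutoff_rate:
        # Cutoff reached: the connection is broken (Link Error Condition).
        return "cutoff"
    if ler > alarm_rate:
        # Alarm exceeded: a Status Report Frame warns of a port problem.
        return "alarm"
    return "ok"
```

Because the alarm rate is always below the cutoff rate, a degrading link passes through the "alarm" state before it is removed from the ring.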

FDDI deals with MAC-related errors as follows:

• When MAC frame errors reach a certain threshold, a Frame Error Condition is generated. Because the actual error can be further upstream than the immediate connection, the connection remains intact.

• For a large network, the worst-case MACFrameErrorRatio is less than 0.1 percent. However, during network configuration, frame error ratios can reach 50 percent for short periods. When you detect a sustained frame error ratio of more than 0.1 percent, a problem exists between the station that is reporting the condition and the nearest upstream MAC.
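The difference between a short burst and a sustained frame error ratio can be made concrete with a small Python sketch. It assumes the 0.1 percent reading of the figure quoted above; the function name and the choice of three consecutive samples as the meaning of "sustained" are hypothetical.

```python
def sustained_frame_errors(samples, threshold=0.001, min_consecutive=3):
    """Return True if the frame error ratio stays above threshold.

    samples: iterable of (error_frames, total_frames) pairs, one per poll.
    threshold: 0.001 is 0.1 percent expressed as a fraction.
    min_consecutive: hypothetical definition of "sustained" for this sketch.
    """
    run = 0
    for errors, total in samples:
        ratio = errors / total if total else 0.0
        # Extend the run while the ratio is high, otherwise start over.
        run = run + 1 if ratio > threshold else 0
        if run >= min_consecutive:
            return True
    return False
```

A single sample at 50 percent during reconfiguration does not trip this check, whereas several samples in a row slightly above the threshold do.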

Solving the Problem

To solve problems related to FDDI errors, fix the hardware, cabling, or congestion problem.

Identifying Ring Errors


Use Status Watch to monitor your FDDI devices for Warning or Critical alerts.

Status Watch

Use Status Watch to identify FDDI ring errors.

Follow these steps:

1. Monitor the FDDI Status tool for the currently selected device.

2. Determine whether Status Watch is reporting Elasticity Buffer Errors or a high percentage of Frame Errors, Frames Not Copied, or Link Error Rates for the currently selected device.

FDDI Ring Errors Reference

This section provides additional conceptual and problem analysis detail.

Elasticity Buffer Error Condition

The Elasticity Buffer Error condition occurs when a port’s elasticity buffer overflows or underflows. This condition usually indicates that a port’s hardware is not operating within the tolerances that the FDDI standard specifies. Look for the problem in the hardware of either the port that is reporting the condition or the immediately adjacent port.

Frame Error Condition

The Frame Error condition occurs when the percentage of frames that contain errors exceeds a preset threshold. When a device is an uplink to FDDI (that is, the device is transmitting onto FDDI), this condition indicates that the ring is saturated: the ring is out of buffer space, and packets are being dropped from the device’s backbone port.

The problem indicated by the frame errors is usually located between the MAC that reports the condition and its upstream neighbor. Because many physical connections can lie along this path, the MACFrameErrorRatio variable identifies only the two MACs between which the problem is occurring.

Frames Not Copied Condition

The Frames Not Copied condition occurs when the percentage of frames that are dropped because of insufficient buffer space exceeds a preset threshold. This condition indicates that the station is congested and is unable to process frames as quickly as they arrive. To help eliminate congestion:

• Add more capacity to the station.
• Reconfigure your network so that end stations that communicate heavily with one another are on the same bridge or switch.
• Filter out certain traffic.

Link Error Condition

The Link Error condition occurs when a port detects link errors at a rate that exceeds a preset threshold. When the Link Error threshold is exceeded, the station removes itself from the ring and tries to reinsert itself on the ring. This action creates a MAC Neighbor Change Event (which also occurs if a ring wraps).

Link errors may indicate an FDDI PHY hardware problem (such as a faulty transmitter) or a faulty cable or connector. Look for the problem in the portion of the network between the port that is reporting the condition and the first upstream transmitter.

MAC Neighbor Change Event

The MAC Neighbor Change event occurs when a MAC’s upstream or downstream neighbor changes.

This event indicates either:

• A network reconfiguration
• Another station that is leaving or joining the ring

Network File Server Timeouts

Use these sections to identify and correct timeouts on network file servers:

• Network File Server Timeout Overview
• Looking for Obvious Errors
• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

See Network File Server Timeouts Reference for additional conceptual and problem analysis detail.

A network file server can time out if your network gets congested or if your server is having problems. Users might have problems transferring data or copying files to or from the server. To help you understand the troubleshooting process for this type of problem, an EXAMPLE throughout this section follows the symptoms, analysis, and resolution of a typical file server timeout problem.


Understanding the Problem

When users log in, their stations make network file server calls either to determine quotas (if this feature has been enabled) or to mount user home directories. Network file server timeout messages, even when spread across multiple nodes, indicate a problem either with the network or with a server.

EXAMPLE: UNIX users notice that it takes a long time (over 30 seconds in some cases) to log in to any machine. Some machines report network file server timeout messages, but the messages have no obvious pattern and are infrequent. You begin to get a sense of the problem.

Identifying the Problem

First, rule out the obvious causes. Ask these questions:

• Can you access the network file server with Telnet?
• Have any alarms been triggered?
• Are there any new errors?

The process of identifying the problem is developed in Looking for Obvious Errors.

Solving the Problem

To determine the cause, reproduce the fault while you monitor the network. After you know the cause, you can fix the problem.

The solutions to the network file server timeout are identified in these sections:

• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

Looking for Obvious Errors

To look for obvious errors, use these applications:

• Ping and Telnet - To check connectivity to the network file server nodes
• LANsentry Manager Alarms View - To search for triggered alarms
• LANsentry Manager Statistics View - To look for errors
• LANsentry Manager History View - To identify trends


Ping and Telnet

Determine whether you can contact network file server nodes using Ping and Telnet. If the response is extremely slow, a problem may exist with the connections to the nodes. No delay indicates that the connections are normal, implying that the delay is occurring elsewhere. In this case, use LANsentry® Manager tools to determine whether packets are being lost or ignored.
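As a rough way to put a number on "extremely slow", the following Python sketch times how long it takes to open a TCP connection to a node. It is an illustration only: the function name `tcp_connect_time` is hypothetical, port 23 is assumed because it is the standard Telnet port, and the check only confirms that the port accepts a connection (it does not log in).

```python
import socket
import time


def tcp_connect_time(host, port=23, timeout=5.0):
    """Return the seconds taken to open a TCP connection, or None on failure."""
    start = time.monotonic()
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return None
```

Comparing the result for a file server node against a node that is known to be healthy gives a simple baseline for deciding whether the connection itself is slow.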

LANsentry Manager Alarms View

Using the LANsentry Manager Alarms View, you can determine whether any configured alarms have been triggered.

Search the Alarms View to see if any MAC events have been logged.

EXAMPLE: MAC events have not been logged for the network to which the UNIX users are attached.

Even though no alarms have occurred, errors may exist. For example, a low rate of background errors may exist just below the alarm threshold. Because RMON alarms are based on maximum and minimum values, they may miss constant, periodic, or low amounts of errors.

LANsentry Manager Statistics View

Using the LANsentry Manager Statistics View, you can display a multisegment graph of utilization and error statistics.

Follow these steps:

1. Set up a graph that shows utilization and errors on all your major segments.

2. Determine whether any segments are particularly busy or error prone.

EXAMPLE: You notice that one segment of the UNIX network, HUB3, is reporting Too Long Errors and FCS Errors roughly every second sample. While the amount is not higher than normal, it is currently higher than on any other segment.

LANsentry Manager History View

Using the LANsentry Manager History View, you can display a rolling history table to determine whether the errors that you are seeing are new. For example, if you have a history table that runs for 30-minute samples over two days, you can compare the most recent sample to a previous sample, looking for new errors. If your probe has the resources, use a much finer resolution sample stored for a shorter time (every 30 seconds for 2 hours) to more easily spot recent errors.

EXAMPLE: You see that the history table shows that no error rates remained constant throughout the day. However, the errors that did occur were on the device HUB3 and were Too Long Errors and FCS Errors.
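Comparing the latest history sample with an earlier one amounts to looking for counters that have risen. A minimal Python sketch of that comparison follows; the function name, the dictionary layout, and the counts used in the usage note are hypothetical and are not LANsentry Manager output.

```python
def new_errors(previous, latest):
    """Return the error counters that rose between two history samples.

    previous, latest: dicts mapping (segment, error_type) to a count.
    Returns a dict of the keys whose count increased, with the increase.
    """
    increases = {}
    for key, count in latest.items():
        # A counter missing from the earlier sample is treated as zero.
        delta = count - previous.get(key, 0)
        if delta > 0:
            increases[key] = delta
    return increases
```

For instance, if an earlier sample has `{("HUB3", "FCS Errors"): 4}` and the latest has `{("HUB3", "FCS Errors"): 9, ("HUB3", "Too Long Errors"): 2}`, both HUB3 counters are reported as new activity.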

Reproducing the Fault While Monitoring the Network

Although the RMON View in LANsentry Manager can show error rates and help you to identify the location of the problem, it may not provide enough data to solve the problem. To determine the cause of the problem, reproduce it while you monitor the network by using these applications:

• LANsentry Manager Top-N Graph - To locate a quiet node to use for reproducing the fault

• LANsentry Manager Packet Capture - To capture packets from the hub to which the quiet node is attached

• LANsentry Manager Packet Decode - To analyze the packets to assess network file server traffic and delays

• Address Tracker - To find the location of the problem nodes

EXAMPLE: Using LANsentry Manager, you find a hub on the network with a higher than normal error rate. However, the error rate does not seem high enough to cause login delays of 60 seconds or more.

LANsentry Manager Top-N Graph

Using the Top-N graph in the LANsentry Manager main window, locate a quiet node that has been showing the same problem. Choose a quiet node so that you do not receive excessive traffic when you try to isolate the problem.

EXAMPLE: You see that the node Monolith, which has the same Network File System (NFS) mounts as the other nodes on the network, is quiet. You decide to use this node for reproducing the fault. See Network File System (NFS) Protocol for more information about NFS.

LANsentry Manager Packet Capture

Using the LANsentry Manager Packet Capture application, capture packets from the network using predefined patterns and start-and-stop conditions.


Follow these steps:

1. Set up a capture buffer on a probe that is connected to the same hub as the quiet node. Until you know more about the problem, set a very general filter.

EXAMPLE: You select a MAC-layer filter and set a conversation filter to capture all packets to and from Monolith.

2. Telnet into and log out from the quiet node. Then reset the capture buffer. Repeat this procedure until you see the problem reflected in your captured data. To keep the buffer information clean, reset the buffer each time that you repeat the procedure.

3. When you see the delay, note the rough value of the packet count on the LANsentry Manager packet buffer.

By noting the packet count at which you think the delay has occurred, you can narrow the problem to within about 20 packets in the buffer. If you have used an extremely quiet node, you may even identify the exact packet.

LANsentry Manager Packet Decode

The LANsentry Manager Packet Decode application decodes all major protocols and displays the packet contents at three levels of detail: summary information, header information, and actual packet content.

Follow these steps:

1. Open the buffer in the Packet Decode application and locate the number of the packet at which the delay occurred.

2. Select the packet and launch a MAC-layer conversation filter. In the filter display, look for a gap in the conversation (that is, where the node sent a request and then resent it at approximately the same rate as the delay you experienced when recreating the problem).

3. Repeat the test to determine whether the result concentrates on one node or appears on other nodes.

EXAMPLE: On the quiet node that you selected, the delay is obvious. You see an NFS request going out to a node and a repeat of the request 30 seconds later. During that time, the node did not respond. You now know that the delay occurred because nodes were not seeing responses to NFS requests. When you repeat the test on other nodes, you find that the delay is happening with more than one destination node.
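Looking for a gap in a conversation means finding a request that was sent again after a silent interval. The Python sketch below shows that idea on a simplified packet list; the tuple layout, the function name, and the one-second minimum gap are hypothetical, and a real capture would need to be decoded into this form first.

```python
def find_retransmit_gaps(packets, min_gap=1.0):
    """Find requests that were resent after a silent interval.

    packets: list of (timestamp_seconds, source, request_id) tuples in
             capture order (a hypothetical, simplified layout).
    Returns a list of (source, request_id, gap_seconds), one entry for each
    repeat of the same request from the same source at least min_gap later.
    """
    last_seen = {}
    gaps = []
    for timestamp, source, request_id in packets:
        key = (source, request_id)
        if key in last_seen:
            gap = timestamp - last_seen[key]
            if gap >= min_gap:
                gaps.append((source, request_id, gap))
        # Remember the most recent time this request was sent.
        last_seen[key] = timestamp
    return gaps
```

In the scenario from the example, a request from the quiet node that reappears 30 seconds later with no reply in between would show up as a single entry with a gap of about 30 seconds.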


Address Tracker

Use Address Tracker, which polls managed devices, to determine the hubs to which the problem nodes are attached. If the problem end stations are located on unmanaged devices, you can at least narrow the problem to those unmanaged devices.

EXAMPLE: Although your network does not have managed hubs that Transcend® NCS management software can poll, it does have managed switches. When it polls the switches, Address Tracker displays the switch ports on which addresses were last seen. This information indicates the hub (but not the hub port) on which the device is located.

LANsentry Manager Packet Decode

After you know the location of the hub that has the problem node, monitor the problem from the hub using LANsentry Manager Packet Decode.

Follow these steps:

1. To capture packets from one of the nodes on the hub, set up another capture buffer and repeat the exercise that is described in LANsentry Manager Packet Capture. Because a delay may occur on a different node, use two capture buffers without stopping the first one.

Note the rough packet count where the delay appears.

2. Display a conversation filter of the packet where the delay appears, and look for the gap in the conversation.

EXAMPLE: You hope that the nodes are on the same hub. You find that all the nodes are on HUB3. This result indicates that FCS Errors may be causing the timeouts. However, because the errors occur at a low rate, you decide to verify this diagnosis. You monitor the problem from the hub, logging in and out many times, and the delay eventually occurs. This time, the capture shows that the node’s reply had an FCS Error, even though the node received the request. The switch would not have forwarded this packet, causing a timeout in the NFS protocol. The retry time is presumably 30 seconds. During this test, you see the problem occurring on another node.

Correcting the Fault


Without a managed hub, you may find it very difficult to discover network file server timeout errors. To find the problematic node, you must either systematically isolate nodes by monitoring each node for a prolonged period or temporarily insert a managed hub.

EXAMPLE: You notice that the captured error packet failed FCS because it was corrupted by a regular pattern during transmission. A possible reason for this occurrence is a Jabbering node. This explanation makes sense because FCS/Jabber frames increased linearly when you were monitoring the live network.

Network File Server Timeouts Reference

This section explains terms that are relevant to network file server timeouts and provides additional conceptual and problem analysis detail.

Jabbering

A condition in which a node transmits illegal-length packets and is possibly not operating within carrier specifications. In effect, another node has written bad data over a valid packet. This bad data is often interpreted as a repeated sequence of data.

Network File System (NFS) Protocol

A distributed file system protocol, developed by Sun Microsystems, that allows a computer system to access files over a network as if they were on its local disks. This protocol has been incorporated into products by more than 200 companies. It is now a de facto Internet standard.

NFS is one protocol in the NFS suite of protocols, which includes NFS, RPC, XDR (External Data Representation), and others. These protocols are part of a larger architecture that Sun Microsystems refers to as Open Network Computing (ONC). ONC is a distributed applications architecture designed by Sun and currently controlled by a consortium led by Sun.

Troubleshooting TCP/IP - Detailed Steps

This article shows how to troubleshoot TCP/IP connectivity between computers on a Windows network. If you haven’t already done so, disable XP’s Internet Connection Firewall on all local area network connections and remove all firewall programs on the network. Improperly configured firewalls are the most common cause of TCP/IP problems.


Open a Command Prompt Window

For many of these steps, you’ll be typing at the command prompt. To open a command prompt window in Windows 2000 or XP, click Start | Run, type cmd in the box, and click OK. To open a command prompt window in Windows 95, 98, or Me, click Start | Run, type command in the box, and click OK. Type one command per line, and press Enter after each one to execute it. To close the command prompt window, use the exit command.

Determine the TCP/IP Settings

Determine the TCP/IP settings of each computer on the local area network. In XP, open the Network Connections folder, right-click the LAN connection, and click Status | Support | Details.

[Figure: Status and Details views for the LAN connection on an Internet Connection Sharing host]

In Windows 95/98/Me, click Start | Run, type winipcfg in the box, and click OK. Select the LAN adapter from the menu and click More Info.

[Figure: winipcfg view for an ICS client running Windows Me]

You can also see the TCP/IP settings from the command prompt. This is especially convenient if a computer has more than one network adapter. Use the ipconfig /all command, which is available in all versions except Windows 95. The output from this command can be long, so it’s best to write it to a file. Specify the file name in the command this way:

ipconfig /all >ipconfig.txt

Description of TCPIP Settings

Here are the TCP/IP settings that are used in network troubleshooting:

• IP Address – Unique address assigned to a network adapter. A computer with multiple network adapters has an IP address for each one, and each one must be in a different subnet.

• Subnet Mask – Used in conjunction with the IP address to determine which subnet an adapter belongs to. At the simplest level, communication is only possible between two network adapters when they’re in the same subnet.

• Default Gateway – IP address of a computer or router, on one of this computer’s local area networks, that knows how to communicate with subnets not present on this computer. For an Internet connection, the default gateway is a router belonging to your Internet service provider, and all access to sites on the Internet goes through it. For an ICS client, the default gateway is the ICS host. If you use a hardware router, it serves as the default gateway.

• DHCP Server – If an adapter is configured to obtain an IP address automatically, this is the address of the server that provides it. It could be your ISP, an ICS host, or a hardware router.

• DNS Servers – IP address of one or more Domain Name Server computers. DNS servers translate Internet names to their IP addresses (like 63.146.109.227).
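To see how the subnet mask decides whether two adapters can talk directly, here is a minimal Python sketch using the standard `ipaddress` module. The function name `same_subnet` is hypothetical, and the addresses in the usage note are example values only.

```python
import ipaddress


def same_subnet(ip_a, ip_b, mask):
    """Return True if both IPv4 addresses fall in the same subnet under mask."""
    # ip_interface accepts "address/netmask" and exposes the enclosing network.
    net_a = ipaddress.ip_interface(f"{ip_a}/{mask}").network
    net_b = ipaddress.ip_interface(f"{ip_b}/{mask}").network
    return net_a == net_b
```

With a mask of 255.255.255.0, `same_subnet("192.168.1.101", "192.168.1.123", "255.255.255.0")` is true, while an adapter at 192.168.0.1 would fall in a different subnet and would need a gateway to be reached.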

Subnets

See our article on subnets for a brief description of how they work. For more details, see this Microsoft Knowledge Base article.

If two computers are supposed to be on the same subnet but aren’t, something is wrong with the network hardware or software configuration. This is most likely to happen when one of them receives an IP address of 169.254.x.x, which indicates that:

• It’s configured to obtain an IP address automatically.
• It couldn’t find a DHCP server on the network to make the assignment.
• Windows assigned it an Automatic Private IP Address.
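An automatically assigned private address is easy to recognize programmatically, because it always falls in the 169.254.0.0/16 range. The Python sketch below checks for that range with the standard `ipaddress` module; the function name `is_apipa` is hypothetical.

```python
import ipaddress

# The link-local block from which Automatic Private IP Addresses are taken.
APIPA_NETWORK = ipaddress.ip_network("169.254.0.0/16")


def is_apipa(address):
    """Return True if address lies in the 169.254.0.0/16 range."""
    return ipaddress.ip_address(address) in APIPA_NETWORK
```

If this returns true for an adapter that should have received its address from a DHCP server, the DHCP request most likely went unanswered.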

See our article on Specific Networking Problems and Their Solutions for more information.

Pinging

The ping command is the basic tool for testing TCP/IP connectivity. It sends a special packet (called ICMP Echo) to a particular IP address and looks for a reply. If everything is working right, the reply comes back. If not, the ping times out in a few seconds. By default, the ping command repeats the process four times.

[Figure: an ICS client computer pinging a Windows XP Home Edition ICS host, using the host’s IP address and its computer name]

When ping fails, you’ll see one of these error messages:

• Request timed out – The IP address is valid, but there’s no reply from it. If the IP address is on a local area network, the most likely cause is a firewall program blocking the ping.

• Unknown host <name> or Ping request could not find host <name> – The computer name doesn’t exist on the local area network. Make sure that NetBIOS over TCP/IP is enabled.

• Destination host unreachable – The IP address isn’t on a local area network, and the default gateway can’t access it. Either there’s no default gateway, its address is wrong, or it isn’t functioning.
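The error messages above can be turned into a small lookup that maps saved ping output to a likely cause. This Python sketch is illustrative only: the function name and the wording of the hints are hypothetical, and the matching relies solely on the message fragments quoted in the list.

```python
# (message fragment, likely cause) pairs taken from the list of ping errors.
PING_HINTS = [
    ("request timed out",
     "no reply; on a LAN, a firewall is the most likely cause"),
    ("unknown host",
     "name not found; check that NetBIOS over TCP/IP is enabled"),
    ("could not find host",
     "name not found; check that NetBIOS over TCP/IP is enabled"),
    ("destination host unreachable",
     "default gateway missing, wrong, or not functioning"),
]


def explain_ping_failure(output):
    """Return a hint for the first known error message found, else None."""
    text = output.lower()
    for fragment, hint in PING_HINTS:
        if fragment in text:
            return hint
    return None
```

For example, `explain_ping_failure("Request timed out.")` returns the firewall hint, and output with none of the known messages returns `None`.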


Pinging the Local Area Network

Here is a series of ping commands to use in finding where a problem occurs on a local area network. Run them in the order shown, and don’t go on to the next command until all of the previous commands work properly. In this example:

• The computer being tested is named Winxp, with IP address 192.168.1.101.
• There’s another computer on the network named Win98, with IP address 192.168.1.123.

Substitute the appropriate IP addresses and computer names for your network.

Command              Target                          What Ping Failure Indicates
ping 127.0.0.1       Loopback address                Corrupted TCP/IP installation
ping localhost       Loopback name                   Corrupted TCP/IP installation
ping 192.168.1.101   This computer’s IP address      Corrupted TCP/IP installation
ping winxp           This computer’s name            Corrupted TCP/IP installation
ping 192.168.1.123   Another computer’s IP address   Bad hardware or NIC driver
ping win98           Another computer’s name         NetBIOS name resolution failure
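The rule of running the checks in order and stopping at the first failure can be sketched in Python as follows. The function name `first_failure`, the `LAN_STEPS` list, and the injected `ping` callable are hypothetical; the targets assume the example machine names, with addresses 192.168.1.101 and 192.168.1.123.

```python
# Ordered checks: (target to ping, what a failure at this step indicates).
LAN_STEPS = [
    ("127.0.0.1", "corrupted TCP/IP installation"),
    ("localhost", "corrupted TCP/IP installation"),
    ("192.168.1.101", "corrupted TCP/IP installation"),
    ("winxp", "corrupted TCP/IP installation"),
    ("192.168.1.123", "bad hardware or NIC driver"),
    ("win98", "NetBIOS name resolution failure"),
]


def first_failure(steps, ping):
    """Run the checks in order and stop at the first one that fails.

    ping: callable that takes a target and returns True on success. It is
          passed in so the sketch does not depend on one ping implementation.
    Returns (target, meaning) for the first failing step, or None.
    """
    for target, meaning in steps:
        if not ping(target):
            return target, meaning
    return None
```

A real `ping` callable could wrap the system ping command; it is left out here because the command-line options differ between operating systems.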

To fix a corrupted TCP/IP installation on Windows XP, follow the steps in this Microsoft Knowledge Base article. For Windows 95/98/Me, uninstall the TCP/IP protocol in Control Panel | Network, reboot, and reinstall it. If that doesn’t fix it, use this procedure on Windows 95 or 98.

Pinging the Internet

You can also use ping to find a problem with Internet access. Run these commands in the order shown, and don’t go on to the next command until all of the previous commands work properly. Use the Default Gateway and DNS Server addresses that you got from the winipcfg or ipconfig /all command.

Command                  Target                What Ping Failure Indicates
ping w.x.y.z             Default Gateway       Default Gateway down
ping w.x.y.z             DNS Server            DNS Server down
ping w.x.y.z             Web site IP address   Internet service provider or web site down
ping www.something.com   Web site name         DNS Server down or web site down

REFERENCES

WWW.PRACTICALLYNETWORK.COM

WWW.COMPUTERHOPE.COM

WWW.WIKIPEDIA.ORG


THANK YOU


                                                          • Description of TCPIP Settings
                                                          • Subnets
                                                          • Pinging
                                                          • Pinging the Local Area Network
                                                          • Pinging the Internet
Page 51: 11591337-Basic-Network-Troubleshooting

Use Status Watch to monitor your FDDI devices for Warning or Critical alerts.

Status Watch

Use Status Watch to identify FDDI ring errors.

Follow these steps:

1. Monitor the FDDI Status tool for the currently selected device.

2. Determine whether Status Watch is reporting Elasticity Buffer Errors or a high percentage of Frame Errors, Frames Not Copied, or Link Error Rates for the currently selected device.

FDDI Ring Errors Reference

This section provides additional conceptual and problem analysis detail.

Elasticity Buffer Error Condition

The Elasticity Buffer Error condition occurs when a port's elasticity buffer overflows or underflows. This condition usually indicates that a port's hardware is not operating within the tolerances that the FDDI standard specifies. Look for the problem in the hardware of either the port that is reporting the condition or of the immediately adjacent port.

Frame Error Condition

The Frame Error condition occurs when the percentage of frames that contain errors exceeds a preset threshold. In the situation when a device is an uplink to FDDI (that is, a device is transmitting onto FDDI), this type of condition indicates that the ring is saturated. The ring is out of buffer space, and packets are being dropped from the device's backbone port.

The problem indicated by the frame errors is usually located between the MAC that reports the condition and its upstream neighbor. Because many physical connections can lie along this path, the MACFrameErrorRatio variable identifies only the two MACs between which the problem is occurring.
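As a rough illustration of the threshold logic described above, the following Python sketch computes a frame error percentage from two counters and compares it with a threshold. The function names, the counter values, and the 1 percent default are assumptions chosen for illustration; they are not taken from Status Watch or from the FDDI standard.

```python
def error_percentage(error_frames: int, total_frames: int) -> float:
    """Return the percentage of frames that contained errors."""
    if total_frames <= 0:
        return 0.0  # no traffic observed, so no ratio can be computed
    return 100.0 * error_frames / total_frames


def exceeds_threshold(error_frames: int, total_frames: int,
                      threshold_percent: float = 1.0) -> bool:
    """True when the error percentage is above the (hypothetical) threshold."""
    return error_percentage(error_frames, total_frames) > threshold_percent


if __name__ == "__main__":
    # Hypothetical counters sampled from one MAC over one polling interval.
    print(exceeds_threshold(error_frames=30, total_frames=2000))
```

The same comparison would apply to the Frames Not Copied and Link Error conditions, each with its own counter and threshold.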

Frames Not Copied Condition

The Frames Not Copied condition occurs when the percentage of frames that are dropped because of insufficient buffer space exceeds a preset threshold. This condition indicates that the station is congested and is unable to process frames as quickly as they arrive. To help eliminate congestion:

• Add more capacity to the station.
• Reconfigure your network so that end stations that communicate heavily with one another are on the same bridge or switch.
• Filter out certain traffic.

Link Error Condition

The Link Error condition occurs when a port detects link errors at a rate that exceeds a preset threshold. When the Link Error threshold is exceeded, the station removes itself from the ring and tries to reinsert itself on the ring. This action creates a MAC Neighbor Change Event (which also occurs if a ring wraps).

Link errors may indicate an FDDI PHY hardware problem (such as a faulty transmitter) or a faulty cable or connector. Look for the problem in the portion of the network between the port that is reporting the condition and the first upstream transmitter.

MAC Neighbor Change Event

The MAC Neighbor Change event occurs when a MAC's upstream or downstream neighbor changes.

This event indicates either:

• A network reconfiguration
• Another station that is leaving or joining the ring

Network File Server Timeouts

Use these sections to identify and correct timeouts on network file servers:

• Network File Server Timeout Overview
• Looking for Obvious Errors
• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

See Network File Server Timeouts Reference for additional conceptual and problem analysis detail.

A network file server can time out if your network gets congested or if your server is having problems. Users might have problems downloading data from or to the server, or copying files from or to the server. To help you to understand the troubleshooting process for this type of problem, an EXAMPLE throughout this section follows the symptoms, analysis, and resolution of a typical file server timeout problem.

Understanding the Problem

When users log in, their stations make network file server calls, either to determine quotas (if this feature has been enabled) or to mount user home directories. The network file server timeout messages, even when spread across multiple nodes, indicate a problem either with the network or with a server.

EXAMPLE: UNIX users notice that it takes a long time - over 30 seconds in some cases - to log in to any machine. Some machines report network file server timeout messages, but the messages have no obvious pattern and are infrequent. You begin to get a sense of the problem.

Identifying the Problem

First, rule out the obvious causes. Ask these questions:

• Can you access the network file server with Telnet?
• Have any alarms been triggered?
• Are there any new errors?

The process of identifying the problem is developed in Looking for Obvious Errors.

Solving the Problem

To determine the cause, reproduce the fault while you monitor the network. After you know the cause, you can fix the problem.

The solutions to the network file server timeout are identified in these sections:

• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

Looking for Obvious Errors

To look for obvious errors, use these applications:

• Ping and Telnet - To check connectivity to the network file server nodes
• LANsentry Manager Alarms View - To search for triggered alarms
• LANsentry Manager Statistics View - To look for errors
• LANsentry Manager History View - To identify trends

Ping and Telnet

Determine whether you can contact network file server nodes using Ping and Telnet. If the response is extremely slow, then a problem may exist with the connections to the nodes. No delay indicates that the connections are normal, implying that the delay is occurring elsewhere. In this case, use LANsentry® Manager tools to determine whether packets are being lost or ignored.
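One way to put a number on "extremely slow" is to time how long a TCP connection to the server takes. The Python sketch below does this with the standard socket module. It measures TCP connection setup rather than an ICMP ping, and the host name, the port, and the one-second cut-off are assumptions for illustration only.

```python
import socket
import time


def connect_time(host: str, port: int, timeout: float = 5.0) -> float:
    """Return the seconds taken to open a TCP connection to host:port.

    Raises OSError (including timeouts) if the connection cannot be made.
    """
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=timeout):
        pass  # connection opened; close it again immediately
    return time.monotonic() - start


def classify(seconds: float, slow_after: float = 1.0) -> str:
    """Label a connection time; slow_after is an arbitrary cut-off."""
    return "slow" if seconds > slow_after else "normal"
```

For example, classify(connect_time("fileserver.example", 23)) would time a connection to the Telnet port of a hypothetical server named fileserver.example.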

LANsentry Manager Alarms View

Using the LANsentry Manager Alarms View, you can determine if any configured alarms have been triggered.

Search the Alarms View to see if any MAC events have been logged.

EXAMPLE: MAC events have not been logged for the network on which the UNIX users are attached.

Even though no alarms have occurred, errors may exist. For example, a lower rate of background errors may exist just below the alarm threshold. Because they are based on maximum and minimum values, RMON alarms may miss constant, periodic, or low amounts of errors.

LANsentry Manager Statistics View

Using the LANsentry Manager Statistics View, you can display a multisegment graph of utilization and error statistics.

Follow these steps:

1. Set up a graph that shows utilization and errors on all your major segments.

2. Determine whether any segments are particularly busy or error prone.

EXAMPLE: You notice that one segment of the UNIX network, HUB3, is reporting Too Long Errors and FCS Errors roughly every second sample. While the amount is not higher than normal, it is currently higher than any other segment.

LANsentry Manager History View

Using the LANsentry Manager History View, you can display a rolling history table to determine if the errors that you are seeing are new. For example, if you have a history table that runs for 30-minute samples over two days, you can compare the most recent sample to a previous sample, looking for new errors. If your probe has the resources, use a much finer resolution sample stored for a shorter time (every 30 seconds for 2 hours) to more easily spot recent errors.

EXAMPLE: You see that the history table shows that no error rates remained constant throughout the day. However, errors that did occur were on the device HUB3 and were Too Long Errors and FCS Errors.
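The comparison of a recent history sample against an earlier one can be sketched in a few lines of Python. The dictionary layout and the counter values below are hypothetical; they do not reflect how LANsentry Manager stores or exports its history table.

```python
def new_errors(previous: dict[str, int], latest: dict[str, int]) -> dict[str, int]:
    """Return the error counters that grew, with the size of each increase."""
    increases = {}
    for name, count in latest.items():
        delta = count - previous.get(name, 0)
        if delta > 0:
            increases[name] = delta
    return increases


if __name__ == "__main__":
    # Hypothetical cumulative counters for one segment at two sample times.
    earlier = {"FCS Errors": 2, "Too Long Errors": 1, "Collisions": 40}
    recent = {"FCS Errors": 5, "Too Long Errors": 3, "Collisions": 40}
    print(new_errors(earlier, recent))
```

Counters that did not change between the two samples are left out of the result, so only the error types that are still growing remain.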

Reproducing the Fault While Monitoring the Network

Although the RMON View in LANsentry Manager can show error rates and help you to identify the location of the problem, it may not provide enough data to solve the problem. To determine the cause of the problem, reproduce it while you monitor the network by using these applications:

• LANsentry Manager Top-N Graph - To locate a quiet node to use for reproducing the fault
• LANsentry Manager Packet Capture - To capture packets from the hub to which the quiet node is attached
• LANsentry Manager Packet Decode - To analyze the packets to assess network file server traffic and delays
• Address Tracker - To find the location of the problem nodes

EXAMPLE: Using LANsentry Manager, you find a hub on the network with a higher than normal error rate. However, the error rate does not seem high enough to cause login delays of 60 seconds or more.

LANsentry Manager Top-N Graph

Using the Top-N graph in the LANsentry Manager main window, locate a quiet node that has been showing the same problem. Choose a quiet node so that you do not receive excessive traffic when you try to isolate the problem.

EXAMPLE: You see that the node Monolith, which has the same Network File System (NFS) mounts as the other nodes on the network, is quiet. You decide to use this node for reproducing the fault. See Network File System (NFS) Protocol for more information about NFS.

LANsentry Manager Packet Capture

Using the LANsentry Manager Packet Capture application, capture packets from the network using predefined patterns and start-and-stop conditions.

Follow these steps:

1. Set up a capture buffer on a probe that is connected to the same hub as the quiet node. Until you know more about the problem, set a very general filter.

EXAMPLE: You select a MAC-layer filter and set a conversation filter to capture all packets to and from Monolith.

2. Telnet into and log out from the quiet node. Then reset the capture buffer. Repeat this procedure until you see the problem reflected in your captured data. To keep the buffer information clean, reset the buffer each time that you repeat the procedure.

3. When you see the delay, note the rough value of the packet count on the LANsentry Manager packet buffer.

By noting the packet count at which you think the delay has occurred, you can narrow the problem to within about 20 packets in the buffer. If you have used an extremely quiet node, you may even identify the exact packet.

LANsentry Manager Packet Decode

The LANsentry Manager Packet Decode application decodes all major protocols and displays the packet contents at three levels of detail: summary information, header information, and actual packet content.

Follow these steps:

1. Open the buffer in the Packet Decode application and locate the number of the packet at which the delay occurred.

2. Select the packet and launch a MAC-layer conversation filter. In the filter display, look for a gap in the conversation (that is, where the node sent a request and then resent it at approximately the same rate as the delay you experienced when recreating the problem).

3. Repeat the test to determine if the result concentrates on one node or if it appears on other nodes.

EXAMPLE: On the quiet node that you selected, the delay is obvious. You see an NFS request going out to a node and a repeat of the request 30 seconds later. During that time, the node did not respond. You now know that the delay occurred because nodes were not seeing responses for NFS requests. When you repeat the test on other nodes, you find that the delay is happening with more than one destination node.
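The search for a gap in a conversation can also be expressed as a small script over exported packet records. The sketch below is in Python and assumes a hypothetical record layout (time, source, request identifier, reply flag); it is not the LANsentry Manager capture format. It reports requests that were resent after a long silence with no reply in between.

```python
from typing import NamedTuple


class Packet(NamedTuple):
    time: float      # seconds since the capture started
    source: str      # sending node
    request_id: int  # identifier that pairs a request with its retry or reply
    is_reply: bool


def unanswered_retries(packets: list[Packet],
                       min_gap: float) -> list[tuple[int, float]]:
    """Return (request_id, gap) for each request that was resent at least
    min_gap seconds after the previous attempt without a reply in between."""
    last_request: dict[int, float] = {}
    found = []
    for p in sorted(packets, key=lambda p: p.time):
        if p.is_reply:
            last_request.pop(p.request_id, None)  # answered; stop tracking
        elif p.request_id in last_request:
            gap = p.time - last_request[p.request_id]
            if gap >= min_gap:
                found.append((p.request_id, gap))
            last_request[p.request_id] = p.time
        else:
            last_request[p.request_id] = p.time
    return found
```

With a min_gap slightly below the delay that users reported, the result points at the requests whose first attempt went unanswered.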

Address Tracker

Use Address Tracker, which polls managed devices, to determine the hubs to which the problem nodes are attached. If the problem end stations are located on unmanaged devices, then you can at least narrow the problem to those unmanaged devices.

EXAMPLE: Although your network does not have managed hubs that Transcend® NCS management software can poll, it does have managed switches. When it polls the switches, Address Tracker displays the switch ports on which addresses were last seen. This information indicates the hub (but not the hub port) on which the device is located.

LANsentry Manager Packet Decode

After you know the location of the hub that has the problem node, monitor the problem from the hub using LANsentry Manager Packet Decode.

Follow these steps:

1. To capture packets from one of the nodes on the hub, set up another capture buffer and repeat the exercise that is described in LANsentry Manager Packet Capture. Because a delay may occur on a different node, use two capture buffers without stopping the first one.

Note the rough packet count where the delay appears.

2. Display a conversation filter of the packet where the delay appears and look for the gap in the conversation.

EXAMPLE: You hope that the nodes are on the same hub. You find that all the nodes are on HUB3. This result indicates that FCS Errors may be causing the timeouts. However, because the errors occur at a low rate, you decide to verify this diagnosis. You monitor the problem from the hub, logging in and out many times, and the delay eventually occurs. This time, the delay shows that the node's reply had an FCS Error, even though the node received the request. The switch would not have transmitted this packet, causing a timeout on the NFS protocol. The retry time is presumably 30 seconds. During this test, you see the problem occurring on another node.

Correcting the Fault

Without a managed hub, you may find it very difficult to discover network file server timeout errors. To find the problematic node, you must either systematically isolate nodes by monitoring each node for a prolonged period or temporarily insert a managed hub.

EXAMPLE: You notice that the captured error packet failed FCS because it was corrupted by a regular pattern during transmission. A possible reason for this occurrence is a Jabbering node. This explanation makes sense because FCS/Jabber frames increased linearly when you were monitoring the live network.

Network File Server Timeouts Reference

This section explains terms that are relevant to network file server timeouts and provides additional conceptual and problem analysis detail.

Jabbering

Jabbering occurs when a node transmits illegal-length packets and is possibly not operating within carrier specifications. In effect, another node has written bad data over a valid packet. This bad data is often interpreted as a repeated sequence of data.

Network File System (NFS) Protocol

A distributed file system protocol developed by Sun Microsystems that allows a computer system to access files over a network as if they were on its local disks. This protocol has been incorporated into products by more than 200 companies. It is now a de facto Internet standard.

NFS is one protocol in the NFS suite of protocols, which includes NFS, RPC, XDR (External Data Representation), and others. These protocols are part of a larger architecture that Sun Microsystems refers to as Open Network Computing (ONC). ONC is a distributed applications architecture designed by Sun and currently controlled by a consortium led by Sun.

Troubleshooting TCP/IP - Detailed Steps

This article shows how to troubleshoot TCP/IP connectivity between computers on a Windows network. If you haven't already done so, disable XP's Internet Connection Firewall on all local area network connections, and remove all firewall programs on the network. Improperly configured firewalls are the most common cause of TCP/IP problems.

Open a Command Prompt Window

For many of these steps, you'll be typing at the command prompt. To open a command prompt window in Windows 2000 or XP, click Start | Run, type cmd in the box, and click OK. To open a command prompt window in Windows 95, 98, or Me, click Start | Run, type command in the box, and click OK. Type one command per line, and press Enter after each one to execute it. To close the command prompt window, use the exit command.

Determine the TCP/IP Settings

Determine the TCP/IP settings of each computer on the local area network. In XP, open the Network Connections folder, right click the LAN connection, and click Status | Support | Details. For example, here are the Status and Details views for the LAN connection on an Internet Connection Sharing host:

In Windows 95/98/Me, click Start | Run, type winipcfg in the box, and click OK. Select the LAN adapter from the menu, and click More Info. Here's the winipcfg view for an ICS client running Windows Me:

You can also see the TCP/IP settings from the command prompt. This is especially convenient if a computer has more than one network adapter. Use the ipconfig /all command, which is available in all versions except Windows 95. The output from this command can be long, so it's best to write it to a file. Specify the file name in the command this way:

ipconfig /all >ipconfig.txt
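Once the output is in a file, the settings can be pulled out with a short script. The Python sketch below reads indented "label . . . : value" lines into a dictionary. The SAMPLE text is an illustrative approximation of ipconfig output, not a captured listing, and the parser is deliberately simple: with several adapters, later values overwrite earlier ones, and continuation lines without a colon are skipped.

```python
SAMPLE = """\
Ethernet adapter Local Area Connection:

        IP Address. . . . . . . . . . . . : 192.168.0.1
        Subnet Mask . . . . . . . . . . . : 255.255.255.0
        Default Gateway . . . . . . . . . :
"""


def parse_settings(text: str) -> dict[str, str]:
    """Collect indented 'label . . . : value' lines into a dictionary."""
    settings = {}
    for line in text.splitlines():
        if not line[:1].isspace() or ":" not in line:
            continue  # skip blank lines and unindented adapter headings
        label, _, value = line.partition(":")
        # Drop the dot leaders and collapse the remaining spaces in the label.
        key = " ".join(label.replace(".", " ").split())
        settings[key] = value.strip()
    return settings


if __name__ == "__main__":
    print(parse_settings(SAMPLE))
```

To use it on a saved file, pass the file's contents instead of SAMPLE, for example parse_settings(open("ipconfig.txt").read()).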

Description of TCP/IP Settings

Here are the TCP/IP settings that are used in network troubleshooting:

• IP Address - Unique address assigned to a network adapter. A computer with multiple network adapters has an IP address for each one, and each one must be in a different subnet.

• Subnet Mask - Used in conjunction with the IP address to determine which subnet an adapter belongs to. At the simplest level, communication is only possible between two network adapters when they're in the same subnet.

• Default Gateway - IP address of a computer or router, on one of this computer's local area networks, that knows how to communicate with subnets not present on this computer. For an Internet connection, the default gateway is a router belonging to your Internet service provider, and all access to sites on the Internet goes through it. For an ICS client, the default gateway is the ICS host. If you use a hardware router, it serves as the default gateway.

• DHCP Server - If an adapter is configured to obtain an IP address automatically, this is the address of the server that provides it. It could be your ISP, an ICS host, or a hardware router.

• DNS Servers - IP address of one or more Domain Name Server computers. DNS servers translate Internet names to their IP addresses (like 63.146.109.227).
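To make the role of the subnet mask concrete, the following Python sketch uses the standard ipaddress module to test whether two adapters fall in the same subnet. The function name and the addresses in the usage line are example values chosen for illustration.

```python
import ipaddress


def same_subnet(ip_a: str, ip_b: str, mask: str) -> bool:
    """True when both addresses fall in the same subnet under the given mask."""
    # strict=False lets ip_network accept a host address and mask off the
    # host bits, leaving only the network part for comparison.
    net_a = ipaddress.ip_network(f"{ip_a}/{mask}", strict=False)
    net_b = ipaddress.ip_network(f"{ip_b}/{mask}", strict=False)
    return net_a == net_b


if __name__ == "__main__":
    print(same_subnet("192.168.1.101", "192.168.1.123", "255.255.255.0"))
```

Two adapters that fail this test need a gateway between them, which is where the Default Gateway setting comes in.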

Subnets

See our article on subnets for a brief description of how they work. For more details, see this Microsoft Knowledge Base article.

If two computers are supposed to be on the same subnet but aren't, something is wrong with the network hardware or software configuration. This is most likely to happen when one of them receives an IP address of 169.254.x.x, which indicates that:

• It's configured to obtain an IP address automatically.
• It couldn't find a DHCP server on the network to make the assignment.
• Windows assigned it an Automatic Private IP Address.
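A quick check for an automatically assigned private address can be written with Python's standard ipaddress module. This is a minimal sketch; the function name is made up for illustration, and the range tested is the 169.254.0.0/16 block.

```python
import ipaddress

# The block from which automatic private addresses are drawn.
AUTO_PRIVATE = ipaddress.ip_network("169.254.0.0/16")


def is_automatic_private(ip: str) -> bool:
    """True when the address lies in the 169.254.x.x range."""
    return ipaddress.ip_address(ip) in AUTO_PRIVATE


if __name__ == "__main__":
    print(is_automatic_private("169.254.10.20"))
```

An adapter that reports such an address is a strong hint that the DHCP request went unanswered.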

See our article on Specific Networking Problems and Their Solutions for more information.

Pinging

The ping command is the basic tool for testing TCP/IP connectivity. It sends a special packet (called ICMP Echo) to a particular IP address and looks for a reply. If everything is working right, the reply comes back. If not, the ping times out in a few seconds. By default, the ping command repeats the process four times. Here's an example of an ICS client computer pinging a Windows XP Home Edition ICS host, using the host's IP address and its computer name:

When ping fails, you'll see one of these error messages:

• Request timed out - The IP address is valid, but there's no reply from it. If the IP address is on a local area network, the most likely cause is a firewall program blocking the ping.

• Unknown host <name> or Ping request could not find host <name> - The computer name doesn't exist on the local area network. Make sure that NetBIOS over TCP/IP is enabled.

• Destination host unreachable - The IP address isn't on a local area network, and the default gateway can't access it. Either there's no default gateway, its address is wrong, or it isn't functioning.
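The three messages above can be turned into a small lookup that suggests a likely cause from captured ping output. The Python sketch below matches on lower-cased substrings; the function name and the wording of the hints are illustrative, and the exact message text may differ between Windows versions.

```python
# (substring to look for, likely cause) pairs, checked in order.
HINTS = [
    ("request timed out",
     "Address is valid but silent; on a LAN, suspect a firewall."),
    ("unknown host",
     "Name not found; check that NetBIOS over TCP/IP is enabled."),
    ("could not find host",
     "Name not found; check that NetBIOS over TCP/IP is enabled."),
    ("destination host unreachable",
     "No route; check the default gateway setting and device."),
]


def explain(ping_output: str) -> str:
    """Return a likely cause for the first known error message found."""
    text = ping_output.lower()
    for marker, hint in HINTS:
        if marker in text:
            return hint
    return "No known error message found."
```

For example, explain("Request timed out.") returns the firewall hint.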

Pinging the Local Area Network

Here is a series of ping commands to use in finding where a problem occurs on a local area network. Run them in the order shown, and don't go on to the next command until all of the previous commands work properly. In this example:

• The computer being tested is named Winxp, with IP address 192.168.1.101.
• There's another computer on the network, named Win98, with IP address 192.168.1.123.

Substitute the appropriate IP addresses and computer names for your network.

Command              Target                         What Ping Failure Indicates
ping 127.0.0.1       Loopback address               Corrupted TCP/IP installation
ping localhost       Loopback name                  Corrupted TCP/IP installation
ping 192.168.1.101   This computer's IP address     Corrupted TCP/IP installation
ping winxp           This computer's name           Corrupted TCP/IP installation
ping 192.168.1.123   Another computer's IP address  Bad hardware or NIC driver
ping win98           Another computer's name        NetBIOS name resolution failure

To fix a corrupted TCP/IP installation on Windows XP, follow the steps in this Microsoft Knowledge Base article. For Windows 95/98/Me, un-install the TCP/IP protocol in Control Panel | Network, reboot, and re-install it. If that doesn't fix it, use this procedure on Windows 95 or 98.
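The run-in-order, stop-at-the-first-failure procedure can be sketched as a short Python script. The step list mirrors the example names and addresses used in this section. ping_once calls the system ping command and relies on its exit code, which is an assumption that does not hold on every platform (some versions report success for an unreachable reply), so treat this as a starting point rather than a definitive tool.

```python
import platform
import subprocess
from typing import Callable, Optional

# (target, what a failure indicates), in the order they should be tried.
LAN_STEPS = [
    ("127.0.0.1", "Corrupted TCP/IP installation"),
    ("localhost", "Corrupted TCP/IP installation"),
    ("192.168.1.101", "Corrupted TCP/IP installation"),
    ("winxp", "Corrupted TCP/IP installation"),
    ("192.168.1.123", "Bad hardware or NIC driver"),
    ("win98", "NetBIOS name resolution failure"),
]


def ping_once(target: str) -> bool:
    """Send one echo request with the system ping command; True on success."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    result = subprocess.run(["ping", count_flag, "1", target],
                            stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    return result.returncode == 0


def first_failure(steps: list[tuple[str, str]],
                  ping: Callable[[str], bool] = ping_once) -> Optional[str]:
    """Run the steps in order; return the diagnosis for the first failure."""
    for target, diagnosis in steps:
        if not ping(target):
            return f"{target}: {diagnosis}"
    return None  # every step replied
```

Calling first_failure(LAN_STEPS) on the computer being tested returns None when every ping succeeds, or the first failing target with its likely cause.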

Pinging the Internet

You can also use ping to find a problem with Internet access. Run these commands in the order shown, and don't go on to the next command until all of the previous commands work properly. Use the Default Gateway and DNS Server addresses that you got from the winipcfg or ipconfig /all command.

Command                 Target               What Ping Failure Indicates
ping w.x.y.z            Default Gateway      Default Gateway down
ping w.x.y.z            DNS Server           DNS Server down
ping w.x.y.z            Web site IP address  Internet service provider or web site down
ping www.something.com  Web site name        DNS Server down or web site down

REFERENCE

WWW.PRACTICALLYNETWORK.COM

WWW.COMPUTERHOPE.COM

WWW.WIKIPEDIA.ORG

THANK YOU

Page 52: 11591337-Basic-Network-Troubleshooting

bull Add more capacity to the station bull Reconfigure your network so that end stations that communicate heavily with

one another are on the same bridge or switch bull Filter out certain traffic

Link Error Condition

The Link Error condition occurs when a port detects link errors at a rate that exceeds a preset threshold. When the Link Error threshold is exceeded, the station removes itself from the ring and tries to reinsert itself on the ring. This action creates a MAC Neighbor Change Event (which also occurs if a ring wraps).

Link errors may indicate an FDDI PHY hardware problem (such as a faulty transmitter) or a faulty cable or connector. Look for the problem in the portion of the network between the port that is reporting the condition and the first upstream transmitter.

MAC Neighbor Change Event

The MAC Neighbor Change event occurs when a MAC's upstream or downstream neighbor changes.

This event indicates either:

• A network reconfiguration
• Another station that is leaving or joining the ring

Network File Server Timeouts

Use these sections to identify and correct timeouts on network file servers:

• Network File Server Timeout Overview
• Looking for Obvious Errors
• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

See Network File Server Timeouts Reference for additional conceptual and problem analysis detail.

A network file server can time out if your network gets congested or if your server is having problems. Users might have problems downloading data from or to the server, or copying files from or to the server. To help you to understand the troubleshooting process for this type of problem, an EXAMPLE throughout this section follows the symptoms, analysis, and resolution of a typical file server timeout problem.

Understanding the Problem

When users log in, their stations make network file server calls either to determine quotas (if this feature has been enabled) or to mount user home directories. The network file server timeout messages, even when spread across multiple nodes, indicate a problem either with the network or with a server.

EXAMPLE: UNIX users notice that it takes a long time - over 30 seconds in some cases - to log in to any machine. Some machines report network file server timeout messages, but the messages have no obvious pattern and are infrequent. You begin to get a sense of the problem.

Identifying the Problem

First, rule out the obvious causes. Ask these questions:

• Can you access the network file server with Telnet?
• Have any alarms been triggered?
• Are there any new errors?

The process of identifying the problem is developed in Looking for Obvious Errors.

Solving the Problem

To determine the cause, reproduce the fault while you monitor the network. After you know the cause, you can fix the problem.

The solutions to the network file server timeout are identified in these sections:

• Reproducing the Fault While Monitoring the Network
• Correcting the Fault

Looking for Obvious Errors

To look for obvious errors, use these applications:

• Ping and Telnet - To check connectivity to the network file server nodes
• LANsentry Manager Alarms View - To search for triggered alarms
• LANsentry Manager Statistics View - To look for errors
• LANsentry Manager History View - To identify trends

Ping and Telnet

Determine whether you can contact network file server nodes using Ping and Telnet. If the response is extremely slow, then a problem may exist with the connections to the nodes. No delay indicates that the connections are normal, implying that the delay is occurring elsewhere. In this case, use LANsentry® Manager tools to determine whether packets are being lost or ignored.
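
As a rough illustration of this check, the following Python sketch times a TCP connection to a node and labels the result. It is only a sketch under stated assumptions: the default port 23 (the standard Telnet port) and the 2-second "slow" cutoff are values chosen for illustration, not values taken from this guide.

```python
import socket
import time


def tcp_connect_time(host, port=23, timeout=5.0):
    """Return the seconds taken to open a TCP connection, or None on failure."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.perf_counter() - start
    except OSError:
        # Covers refused connections, timeouts, and name lookup failures.
        return None


def classify_response(seconds, slow_after=2.0):
    """Label a connection time; slow_after is an assumed, illustrative cutoff."""
    if seconds is None:
        return "no response"
    if seconds > slow_after:
        return "slow: check the connections to the node"
    return "normal: look for the delay elsewhere"
```

A call such as classify_response(tcp_connect_time("fileserver.example")) would label one node, where "fileserver.example" is a hypothetical host name.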

LANsentry Manager Alarms View

Using the LANsentry Manager Alarms View, you can determine if any configured alarms have been triggered.

Search the Alarms View to see if any MAC events have been logged.

EXAMPLE: MAC events have not been logged for the network to which the UNIX users are attached.

Even though no alarms have occurred, errors may exist. For example, a lower rate of background errors may exist just below the alarm threshold. Because RMON alarms are based on maximum and minimum values, they may miss constant, periodic, or low amounts of errors.
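
The point about errors hiding below the alarm threshold can be sketched in a few lines of Python. This is a simplified model of a rising/falling threshold alarm, not LANsentry Manager's implementation. The function name, the sample counts, and the threshold values are all hypothetical.

```python
def rising_alarm_events(samples, rising, falling):
    """Return the indexes of samples that would raise a rising-threshold alarm.

    The alarm fires when a sample reaches the rising threshold and is
    re-armed only after a sample drops to the falling threshold or below.
    """
    armed = True
    events = []
    for index, value in enumerate(samples):
        if armed and value >= rising:
            events.append(index)
            armed = False
        elif not armed and value <= falling:
            armed = True
    return events


# Hypothetical per-interval error counts.
background = [4, 5, 4, 5, 4]    # steady errors that never reach the threshold
bursty = [1, 12, 15, 1, 11]     # bursts that cross the threshold twice

print(rising_alarm_events(background, rising=10, falling=2))
print(rising_alarm_events(bursty, rising=10, falling=2))
```

In this sketch the steady background produces 22 errors in total and no alarm at all, which is why the Statistics and History views are still worth checking.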

LANsentry Manager Statistics View

Using the LANsentry Manager Statistics View, you can display a multisegment graph of utilization and error statistics.

Follow these steps:

1. Set up a graph that shows utilization and errors on all your major segments.

2. Determine whether any segments are particularly busy or error prone.

EXAMPLE: You notice that one segment of the UNIX network, HUB3, is reporting Too Long Errors and FCS Errors roughly every second sample. While the amount is not higher than normal, it is currently higher than any other segment.

LANsentry Manager History View

Using the LANsentry Manager History View, you can display a rolling history table to determine if the errors that you are seeing are new. For example, if you have a history table that runs for 30-minute samples over two days, you can compare the most recent sample to a previous sample, looking for new errors. If your probe has the resources, use a much finer resolution sample stored for a shorter time (every 30 seconds for 2 hours) to more easily spot recent errors.

EXAMPLE: You see that the history table shows that no error rates remained constant throughout the day. However, errors that did occur were on the device HUB3 and were Too Long Errors and FCS Errors.
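
Comparing the most recent sample with earlier ones can be sketched as follows. The data layout (one dictionary of error counts per sampling interval) and the function name are assumptions made for illustration. A real history table would be exported or read from the probe in whatever form the tool provides.

```python
def new_error_types(history):
    """Return error names seen in the latest sample but in no earlier one.

    history is a list of dicts, oldest first, each mapping an error name
    to its count for one sampling interval.
    """
    if not history:
        return []
    *earlier, latest = history
    seen_before = {
        name
        for sample in earlier
        for name, count in sample.items()
        if count > 0
    }
    return sorted(
        name for name, count in latest.items()
        if count > 0 and name not in seen_before
    )


# Hypothetical 30-minute samples for one segment.
samples = [
    {"FCS Errors": 0, "Too Long Errors": 0},
    {"FCS Errors": 2},
    {"FCS Errors": 3, "Too Long Errors": 1},
]
print(new_error_types(samples))
```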

Reproducing the Fault While Monitoring the Network

Although the RMON View in LANsentry Manager can show error rates and help you to identify the location of the problem, it may not provide enough data to solve the problem. To determine the cause of the problem, reproduce it while you monitor the network by using these applications:

• LANsentry Manager Top-N Graph - To locate a quiet node to use for reproducing the fault

• LANsentry Manager Packet Capture - To capture packets from the hub to which the quiet node is attached

• LANsentry Manager Packet Decode - To analyze the packets to assess network file server traffic and delays

• Address Tracker - To find the location of the problem nodes

EXAMPLE: Using LANsentry Manager, you find a hub on the network with a higher than normal error rate. However, the error rate does not seem high enough to cause login delays of 60 seconds or more.

LANsentry Manager Top-N Graph

Using the Top-N graph in the LANsentry Manager main window, locate a quiet node that has been showing the same problem. Choose a quiet node so that you do not receive excessive traffic when you try to isolate the problem.

EXAMPLE: You see that the node Monolith, which has the same Network File System (NFS) mounts as the other nodes on the network, is quiet. You decide to use this node for reproducing the fault. See Network File System (NFS) Protocol for more information about NFS.

LANsentry Manager Packet Capture

Using the LANsentry Manager Packet Capture application, capture packets from the network using predefined patterns and start-and-stop conditions.

Follow these steps:

1. Set up a capture buffer on a probe that is connected to the same hub as the quiet node. Until you know more about the problem, set a very general filter.

EXAMPLE: You select a MAC-layer filter and set a conversation filter to capture all packets to and from Monolith.

2. Telnet into and log out from the quiet node. Then reset the capture buffer. Repeat this procedure until you see the problem reflected in your captured data. To keep the buffer information clean, reset the buffer each time that you repeat the procedure.

3. When you see the delay, note the rough value of the packet count on the LANsentry Manager packet buffer.

By noting the packet count at which you think the delay has occurred, you can narrow the problem to within about 20 packets in the buffer. If you have used an extremely quiet node, you may even identify the exact packet.

LANsentry Manager Packet Decode

The LANsentry Manager Packet Decode application decodes all major protocols and displays the packet contents at three levels of detail: summary information, header information, and actual packet content.

Follow these steps:

1. Open the buffer in the Packet Decode application and locate the number of the packet at which the delay occurred.

2. Select the packet and launch a MAC-layer conversation filter. In the filter display, look for a gap in the conversation (that is, where the node sent a request and then resent it at approximately the same rate as the delay you experienced when recreating the problem).

3. Repeat the test to determine if the result concentrates on one node or if it appears on other nodes.

EXAMPLE: On the quiet node that you selected, the delay is obvious. You see an NFS request going out to a node and a repeat of the request 30 seconds later. During that time, the node did not respond. You now know that the delay occurred because nodes were not seeing responses for NFS requests. When you repeat the test on other nodes, you find that the delay is happening with more than one destination node.
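
Looking for a gap in a conversation can be sketched in Python as a search for requests that were sent again before any reply arrived. This is an illustration, not the Packet Decode application's logic. The tuple layout is hypothetical, and it assumes each request carries an identifier (for NFS over RPC, the transaction ID) that is reused when the request is retried.

```python
def find_retries(packets, min_gap=1.0):
    """Find requests that were repeated after a silent gap.

    packets is a list of (timestamp_seconds, kind, xid) tuples in capture
    order, where kind is "request" or "reply" and xid identifies the request.
    Returns a list of (xid, first_time, retry_time) tuples.
    """
    pending = {}   # xid -> time of the latest unanswered request
    retries = []
    for timestamp, kind, xid in packets:
        if kind == "reply":
            pending.pop(xid, None)
        elif xid in pending:
            if timestamp - pending[xid] >= min_gap:
                retries.append((xid, pending[xid], timestamp))
            pending[xid] = timestamp
        else:
            pending[xid] = timestamp
    return retries


# Hypothetical capture: request 2 goes unanswered and is retried 30 s later.
capture = [
    (0.0, "request", 1),
    (0.01, "reply", 1),
    (0.5, "request", 2),
    (30.5, "request", 2),
    (30.52, "reply", 2),
]
print(find_retries(capture))
```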

Address Tracker

Use Address Tracker, which polls managed devices, to determine the hubs to which the problem nodes are attached. If the problem end stations are located on unmanaged devices, then you can at least narrow the problem to those unmanaged devices.

EXAMPLE: Although your network does not have managed hubs that Transcend® NCS management software can poll, it does have managed switches. When it polls the switches, Address Tracker displays the switch ports on which addresses were last seen. This information indicates the hub (but not the hub port) on which the device is located.

LANsentry Manager Packet Decode

After you know the location of the hub that has the problem node, monitor the problem from the hub using LANsentry Manager Packet Decode.

Follow these steps:

1. To capture packets from one of the nodes on the hub, set up another capture buffer and repeat the exercise that is described in LANsentry Manager Packet Capture. Because a delay may occur on a different node, use two capture buffers without stopping the first one.

Note the rough packet count where the delay appears.

2. Display a conversation filter of the packet where the delay appears, and look for the gap in the conversation.

EXAMPLE: You hope that the nodes are on the same hub. You find that all the nodes are on HUB3. This result indicates that FCS Errors may be causing the timeouts. However, because the errors occur at a low rate, you decide to verify this diagnosis. You monitor the problem from the hub, logging in and out many times, and the delay eventually occurs. This time, the delay shows that the node's reply had an FCS Error even though the node received the request. The switch would not have transmitted this packet, causing a timeout on the NFS protocol. The retry time is presumably 30 seconds. During this test, you see the problem occurring on another node.

Correcting the Fault

Without a managed hub, you may find it very difficult to discover network file server timeout errors. To find the problematic node, you must either systematically isolate nodes by monitoring each node for a prolonged period or temporarily insert a managed hub.

EXAMPLE: You notice that the captured error packet failed FCS because it was corrupted by a regular pattern during transmission. A possible reason for this occurrence is a Jabbering node. This explanation makes sense because FCS/Jabber frames increased linearly when you were monitoring the live network.

Network File Server Timeouts Reference

This section explains terms that are relevant to network file server timeouts and provides additional conceptual and problem analysis detail.

Jabbering

Jabbering occurs when a node transmits illegal-length packets and is possibly not operating within carrier specifications. In effect, another node has written bad data over a valid packet. This bad data is often interpreted as a repeated sequence of data.

Network File System (NFS) Protocol

A distributed file system protocol developed by Sun Microsystems that allows a computer system to access files over a network as if they were on its local disks. This protocol has been incorporated into products by more than 200 companies. It is now a de facto Internet standard.

NFS is one protocol in the NFS suite of protocols, which includes NFS, RPC, XDR (External Data Representation), and others. These protocols are part of a larger architecture that Sun Microsystems refers to as Open Network Computing (ONC). ONC is a distributed applications architecture designed by Sun and currently controlled by a consortium led by Sun.

Troubleshooting TCP/IP - Detailed Steps

This article shows how to troubleshoot TCP/IP connectivity between computers on a Windows network. If you haven't already done so, disable XP's Internet Connection Firewall on all local area network connections and remove all firewall programs on the network. Improperly configured firewalls are the most common cause of TCP/IP problems.

Open a Command Prompt Window

For many of these steps, you'll be typing at the command prompt. To open a command prompt window in Windows 2000 or XP, click Start | Run, type cmd in the box, and click OK. To open a command prompt window in Windows 95, 98, or Me, click Start | Run, type command in the box, and click OK. Type one command per line, and press Enter after each one to execute it. To close the command prompt window, use the exit command.

Determine the TCP/IP Settings

Determine the TCP/IP settings of each computer on the local area network. In XP, open the Network Connections folder, right click the LAN connection, and click Status | Support | Details. For example, here are the Status and Details views for the LAN connection on an Internet Connection Sharing host.

In Windows 95/98/Me, click Start | Run, type winipcfg in the box, and click OK. Select the LAN adapter from the menu and click More Info. Here's the winipcfg view for an ICS client running Windows Me.

You can also see the TCP/IP settings from the command prompt. This is especially convenient if a computer has more than one network adapter. Use the ipconfig /all command, which is available in all versions except Windows 95. The output from this command can be long, so it's best to write it to a file. Specify the file name in the command this way:

ipconfig /all >ipconfig.txt

Description of TCP/IP Settings

Here are the TCP/IP settings that are used in network troubleshooting:

• IP Address - Unique address assigned to a network adapter. A computer with multiple network adapters has an IP address for each one, and each one must be in a different subnet.

• Subnet Mask - Used in conjunction with the IP address to determine which subnet an adapter belongs to. At the simplest level, communication is only possible between two network adapters when they're in the same subnet.

• Default Gateway - IP address of a computer or router, on one of this computer's local area networks, that knows how to communicate with subnets not present on this computer. For an Internet connection, the default gateway is a router belonging to your Internet service provider, and all access to sites on the Internet goes through it. For an ICS client, the default gateway is the ICS host. If you use a hardware router, it serves as the default gateway.

• DHCP Server - If an adapter is configured to obtain an IP address automatically, this is the address of the server that provides it. It could be your ISP, an ICS host, or a hardware router.

• DNS Servers - IP address of one or more Domain Name Server computers. DNS servers translate Internet names to their IP addresses (like 63.146.109.227).
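
How the subnet mask decides whether two adapters share a subnet can be sketched with Python's standard ipaddress module. The function name is hypothetical, and the addresses and the 255.255.255.0 mask are example values assumed for illustration.

```python
import ipaddress


def same_subnet(ip_a, ip_b, subnet_mask):
    """Return True if ip_b falls inside the subnet of ip_a under subnet_mask."""
    # strict=False lets us pass a host address instead of the network address.
    network = ipaddress.ip_network(f"{ip_a}/{subnet_mask}", strict=False)
    return ipaddress.ip_address(ip_b) in network


print(same_subnet("192.168.1.101", "192.168.1.123", "255.255.255.0"))
print(same_subnet("192.168.1.101", "192.168.2.5", "255.255.255.0"))
```

With this mask, the first pair shares the 192.168.1.0 network and the second pair does not, so the second pair could only communicate through a gateway.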

Subnets

See our article on subnets for a brief description of how they work. For more details, see this Microsoft Knowledge Base article.

If two computers are supposed to be on the same subnet but aren't, something is wrong with the network hardware or software configuration. This is most likely to happen when one of them receives an IP address of 169.254.x.x, which indicates that:

• It's configured to obtain an IP address automatically.
• It couldn't find a DHCP server on the network to make the assignment.
• Windows assigned it an Automatic Private IP Address.
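
A quick way to spot such an address in a list of collected settings is sketched below, using Python's standard ipaddress module. The function name and the sample addresses are hypothetical. The 169.254.0.0/16 range is the block that the 169.254.x.x pattern describes.

```python
import ipaddress

# The 169.254.x.x block used for Automatic Private IP Addressing.
APIPA_RANGE = ipaddress.ip_network("169.254.0.0/16")


def is_automatic_private_address(address):
    """Return True if the address is in the 169.254.x.x range."""
    return ipaddress.ip_address(address) in APIPA_RANGE


for candidate in ["192.168.1.101", "169.254.17.3"]:
    print(candidate, is_automatic_private_address(candidate))
```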

See our article on Specific Networking Problems and Their Solutions for more information.

Pinging

The ping command is the basic tool for testing TCP/IP connectivity. It sends a special packet (called ICMP Echo) to a particular IP address and looks for a reply. If everything is working right, the reply comes back. If not, the ping times out in a few seconds. By default, the ping command repeats the process four times. Here's an example of an ICS client computer pinging a Windows XP Home Edition ICS host, using the host's IP address and its computer name.

When ping fails, you'll see one of these error messages:

• Request timed out - The IP address is valid, but there's no reply from it. If the IP address is on a local area network, the most likely cause is a firewall program blocking the ping.

• Unknown host <name> or Ping request could not find host <name> - The computer name doesn't exist on the local area network. Make sure that NetBIOS over TCP/IP is enabled.

• Destination host unreachable - The IP address isn't on a local area network, and the default gateway can't access it. Either there's no default gateway, its address is wrong, or it isn't functioning.
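
The mapping from error message to likely cause can be written down as a small lookup, sketched here in Python. The function and table names are hypothetical, and the matching is a simple case-insensitive text search on the phrases listed above. Actual ping output varies by Windows version and language.

```python
# Phrase to look for in ping output, paired with the likely cause.
PING_DIAGNOSES = [
    ("request timed out",
     "Address is valid but did not reply; on a LAN, suspect a firewall."),
    ("unknown host",
     "Name not found on the LAN; check that NetBIOS over TCP/IP is enabled."),
    ("could not find host",
     "Name not found on the LAN; check that NetBIOS over TCP/IP is enabled."),
    ("destination host unreachable",
     "Address is off the LAN and the default gateway cannot reach it."),
]


def diagnose_ping(output):
    """Return the likely cause for a failed ping, or None if unrecognized."""
    text = output.lower()
    for phrase, meaning in PING_DIAGNOSES:
        if phrase in text:
            return meaning
    return None


print(diagnose_ping("Request timed out."))
```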

Pinging the Local Area Network

Here is a series of ping commands to use in finding where a problem occurs on a local area network. Run them in the order shown, and don't go on to the next command until all of the previous commands work properly. In this example:

• The computer being tested is named Winxp, with IP address 192.168.1.101.
• There's another computer on the network named Win98, with IP address 192.168.1.123.

Substitute the appropriate IP addresses and computer names for your network.

Command               Target                           What Ping Failure Indicates
ping 127.0.0.1        Loopback address                 Corrupted TCP/IP installation
ping localhost        Loopback name                    Corrupted TCP/IP installation
ping 192.168.1.101    This computer's IP address       Corrupted TCP/IP installation
ping winxp            This computer's name             Corrupted TCP/IP installation
ping 192.168.1.123    Another computer's IP address    Bad hardware or NIC driver
ping win98            Another computer's name          NetBIOS name resolution failure
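
Running the commands in order and stopping at the first failure can be sketched in Python as follows. This is an illustration under stated assumptions: the addresses and names follow this section's example written in dotted form, the function names are hypothetical, and the success test relies on the ping program's exit code, which may not reflect every failure on every Windows version.

```python
import platform
import subprocess

# (target, what a failure indicates), in the order the commands should be run.
LAN_STEPS = [
    ("127.0.0.1", "Corrupted TCP/IP installation"),
    ("localhost", "Corrupted TCP/IP installation"),
    ("192.168.1.101", "Corrupted TCP/IP installation"),
    ("winxp", "Corrupted TCP/IP installation"),
    ("192.168.1.123", "Bad hardware or NIC driver"),
    ("win98", "NetBIOS name resolution failure"),
]


def system_ping(target):
    """Send one echo request with the system ping; True if it exits with 0."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    result = subprocess.run(
        ["ping", count_flag, "1", target],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def first_failure(steps, ping=system_ping):
    """Return (target, meaning) for the first failing step, or None."""
    for target, meaning in steps:
        if not ping(target):
            return target, meaning
    return None
```

Calling first_failure(LAN_STEPS) on the computer being tested returns the first target that fails together with the likely cause, or None when every step succeeds. The ping argument can be replaced with any function that takes a target and returns True or False, which makes the sequence easy to try out without touching the network.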

To fix a corrupted TCP/IP installation on Windows XP, follow the steps in this Microsoft Knowledge Base article. For Windows 95/98/Me, un-install the TCP/IP protocol in Control Panel | Network, reboot, and re-install it. If that doesn't fix it, use this procedure on Windows 95 or 98.

Pinging the Internet

You can also use ping to find a problem with Internet access. Run these commands in the order shown, and don't go on to the next command until all of the previous commands work properly. Use the Default Gateway and DNS Server addresses that you got from the winipcfg or ipconfig /all command.

Command                   Target                 What Ping Failure Indicates
ping w.x.y.z              Default Gateway        Default Gateway down
ping w.x.y.z              DNS Server             DNS Server down
ping w.x.y.z              Web site IP address    Internet service provider or web site down
ping www.something.com    Web site name          DNS Server down or web site down

REFERENCE

WWW.PRACTICALLYNETWORK.COM

WWW.COMPUTERHOPE.COM

WWW.WIKIPEDIA.ORG

THANK YOU

Page 53: 11591337-Basic-Network-Troubleshooting

Understanding the Problem

When users log in their stations make network file server calls either to determine quotas (if this feature has been enabled) or to mount user home directories The network file server timeout messages even when spread across multiple nodes indicate a problem either with the network or with a server

EXAMPLE UNIX users notice that it takes a long time - over 30 seconds in some cases - to log in to any machine Some machines report network file server timeout messages but the messages have no obvious pattern and are infrequent You begin to get a sense of the problem

Identifying the Problem

First rule out the obvious causes Ask these questions

bull Can you access the network file server with Telnet bull Have any alarms been triggered bull Are there any new errors

The process of identifying the problem is developed in Looking for Obvious Errors

Solving the Problem

To determine the cause reproduce the fault while you monitor the network After you know the cause you can fix the problem

The solutions to the network file server timeout are identified in these sections

bull Reproducing the Fault While Monitoring the Network bull Correcting the Fault

Looking for Obvious Errors

To look for obvious errors use these applications

bull Ping and Telnet - To determine for connectivity to the network file server nodes

bull LANsentry Manager Alarms View - To search for triggered alarms bull LANsentry Manager Statistics View - To look for errors bull LANsentry Manager History View - To identify for trends

63 53

Ping and Telnet

Determine whether you can contact network file server nodes using Ping and Telnet If the response is extremely slow then a problem may exist with the connections to the nodes No delay indicates that the connections are normal implying that the delay is occurring elsewhere In this case use LANsentryreg Manager tools to determine whether packets are being lost or ignored

LANsentry Manager Alarms View

Using the LANsentry Manager Alarms View you can determine if any configured alarms have been triggered

Search the Alarms View to see if any MAC events have been logged

EXAMPLE MAC events have not been logged for the network on which the UNIX users are attached

Even though no alarms have occurred errors may exist For example a lower rate of background errors may exist just below the alarm threshold Based on maximum and minimum values RMON errors may miss constant periodic or low amounts of errors

LANsentry Manager Statistics View

Using the LANsentry Manager Statistics View you can display a multisegment graph of utilization and error statistics

Follow these steps

1 Set up a graph that shows utilization and errors on all your major segments

2 Determine whether any segments are particularly busy or error prone

EXAMPLE You notice that one segment of the UNIX network HUB3 is reporting Too Long Errors and FCS Errors roughly every second sample While the amount is not higher than normal it is currently higher than any other segment

LANsentry Manager History View

Using the LANsentry Manager History View you can display a rolling history table to determine if the errors that you are seeing are new For example if you have a history table that runs for 30-minute samples over two days you can compare the most recent sample to a previous sample

63 54

looking for new errors If your probe has the resources use a much finer resolution sample stored for a shorter time (every 30 seconds for 2 hours) to more easily spot recent errors

EXAMPLE You see that the history table shows that no error rates remained constant throughout the day However errors that did occur were on the device HUB3 and were Too Long Errors and FCS Errors

Reproducing the Fault While Monitoring the Network

Although the RMON View in LANsentry Manager can show error rates and help you to identify the location of the problem it may not provide enough data to solve the problem To determine the cause of the problem reproduce it while you monitor the network by using these applications

bull LANsentry Manager Top-N Graph - To locate a quiet node to use for reproducing the fault

bull LANsentry Manager Packet Capture - To capture packets from the hub to which the quiet node is attached

bull LANsentry Manager Packet Decode - To analyze the packets to assess network file server traffic and delays

bull Address Tracker - To find the location of the problem nodes

EXAMPLE Using LANsentry Manager you find a hub on the network with a higher than normal error rate However the error rate does not seem high enough to cause login delays of 60 seconds or more

LANsentry Manager Top-N Graph

Using the Top-N graph in the LANsentry Manager main window locate a quiet node that has been showing the same problem Choose a quiet node so that you do not receive excessive traffic when you try to isolate the problem

EXAMPLE You see that the node Monolith which has the same Network File System (NFS) mounts as the other nodes on the network is quiet You decide to use this node for reproducing the fault See Network File System (NFS) Protocol for more information about NFS

LANsentry Manager Packet Capture

Using the LANsentry Manager Packet Capture application capture packets from the network using predefined patterns and start-and-stop conditions

63 55

Follow these steps:

1. Set up a capture buffer on a probe that is connected to the same hub as the quiet node. Until you know more about the problem, set a very general filter.

EXAMPLE: You select a MAC-layer filter and set a conversation filter to capture all packets to and from Monolith.

2. Telnet into and log out from the quiet node. Then reset the capture buffer. Repeat this procedure until you see the problem reflected in your captured data. To keep the buffer information clean, reset the buffer each time that you repeat the procedure.

3. When you see the delay, note the rough value of the packet count on the LANsentry Manager packet buffer.

By noting the packet count at which you think the delay has occurred, you can narrow the problem to within about 20 packets in the buffer. If you have used an extremely quiet node, you may even identify the exact packet.

LANsentry Manager Packet Decode

The LANsentry Manager Packet Decode application decodes all major protocols and displays the packet contents at three levels of detail: summary information, header information, and actual packet content.

Follow these steps:

1. Open the buffer in the Packet Decode application and locate the number of the packet at which the delay occurred.

2. Select the packet and launch a MAC-layer conversation filter. In the filter display, look for a gap in the conversation (that is, where the node sent a request and then resent it at approximately the same rate as the delay you experienced when recreating the problem).

3. Repeat the test to determine if the result concentrates on one node or if it appears on other nodes.

EXAMPLE: On the quiet node that you selected, the delay is obvious. You see an NFS request going out to a node and a repeat of the request 30 seconds later. During that time, the node did not respond. You now know that the delay occurred because nodes were not seeing responses for NFS requests. When you repeat the test on other nodes, you find that the delay is happening with more than one destination node.
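Looking for a gap in a conversation can also be expressed as a small script over an exported packet list. The sketch below is a minimal illustration under stated assumptions: the `Packet` fields, the `request_id` identifier, the node name `server1`, and the 10-second threshold are hypothetical, and it does not read any real LANsentry Manager buffer format. It flags a request that is sent again to the same destination after a long silence.

```python
from typing import NamedTuple

class Packet(NamedTuple):
    number: int       # position in the capture buffer
    time: float       # seconds since the capture started
    src: str
    dst: str
    request_id: str   # hypothetical identifier that a retry reuses

def find_retries(packets, min_gap=10.0):
    """Return (first_number, retry_number, gap_seconds) for resent requests."""
    last_seen = {}
    retries = []
    for packet in packets:
        key = (packet.src, packet.dst, packet.request_id)
        previous = last_seen.get(key)
        if previous is not None:
            gap = packet.time - previous.time
            if gap >= min_gap:
                retries.append((previous.number, packet.number, gap))
        last_seen[key] = packet
    return retries

# Made-up capture: Monolith repeats one request 30 seconds later.
capture = [
    Packet(11, 0.25, "Monolith", "server1", "req-a"),
    Packet(12, 0.5, "Monolith", "server1", "req-b"),
    Packet(13, 30.5, "Monolith", "server1", "req-b"),
]
print(find_retries(capture))
```

In this made-up capture, packets 12 and 13 are reported with a 30-second gap, which matches the kind of repeat described in the example above.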

Address Tracker

Use Address Tracker, which polls managed devices, to determine the hubs to which the problem nodes are attached. If the problem end stations are located on unmanaged devices, then you can at least narrow the problem to those unmanaged devices.

EXAMPLE: Although your network does not have managed hubs that Transcend® NCS management software can poll, it does have managed switches. When it polls the switches, Address Tracker displays the switch ports on which addresses were last seen. This information indicates the hub (but not the hub port) on which the device is located.

LANsentry Manager Packet Decode

After you know the location of the hub that has the problem node, monitor the problem from the hub using LANsentry Manager Packet Decode.

Follow these steps:

1. To capture packets from one of the nodes on the hub, set up another capture buffer and repeat the exercise that is described in LANsentry Manager Packet Capture. Because a delay may occur on a different node, use two capture buffers without stopping the first one.

Note the rough packet count where the delay appears.

2. Display a conversation filter of the packet where the delay appears and look for the gap in the conversation.

EXAMPLE: You hope that the nodes are on the same hub. You find that all the nodes are on HUB3. This result indicates that FCS Errors may be causing the timeouts. However, because the errors occur at a low rate, you decide to verify this diagnosis. You monitor the problem from the hub, logging in and out many times, and the delay eventually occurs. This time, the delay shows that the node’s reply had an FCS Error, even though the node received the request. The switch would not have transmitted this packet, causing a timeout on the NFS protocol. The retry time is presumably 30 seconds. During this test, you see the problem occurring on another node.

Correcting the Fault

Without a managed hub, you may find it very difficult to discover network file server timeout errors. To find the problematic node, you must either systematically isolate nodes by monitoring each node for a prolonged period or temporarily insert a managed hub.

EXAMPLE: You notice that the captured error packet failed FCS because it was corrupted by a regular pattern during transmission. A possible reason for this occurrence is a Jabbering node. This explanation makes sense because FCS/Jabber frames increased linearly when you were monitoring the live network.
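To see why a corrupted packet fails its frame check sequence, a short Python sketch may help. It uses `zlib.crc32`, which computes a CRC-32 with the same polynomial that Ethernet uses for its FCS; the on-the-wire bit ordering is ignored here, and the payload bytes and helper names are assumptions made for illustration only.

```python
import zlib

def compute_fcs(frame_bytes):
    """CRC-32 over the frame contents, as an unsigned 32-bit value."""
    return zlib.crc32(frame_bytes) & 0xFFFFFFFF

def fcs_matches(frame_bytes, received_fcs):
    """True when the checksum recomputed by the receiver matches the sender's."""
    return compute_fcs(frame_bytes) == received_fcs

# Hypothetical frame contents; a real frame also carries addresses and a type field.
frame = b"NFS reply payload"
sent_fcs = compute_fcs(frame)

# Simulate another node writing bad data over part of the frame.
corrupted = bytearray(frame)
corrupted[3] ^= 0xFF

print(fcs_matches(frame, sent_fcs))             # intact frame
print(fcs_matches(bytes(corrupted), sent_fcs))  # overwritten frame
```

The intact frame passes the check and the overwritten one does not, which is why a switch would discard the damaged reply instead of forwarding it.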

Network File Server Timeouts Reference

This section explains terms that are relevant to network file server timeouts and provides additional conceptual and problem analysis detail.

Jabbering

Occurs when a node transmits illegal-length packets and is possibly not operating within carrier specifications. In effect, another node has written bad data over a valid packet. This bad data is often interpreted as a repeated sequence of data.
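As a rough illustration of how illegal-length packets are told apart from other errors, the sketch below sorts frames by length and checksum result. The 64-byte and 1518-byte limits are the standard sizes for untagged Ethernet frames, and the category names follow common RMON-style counters; how these map onto the Too Long Errors and FCS Errors counters named earlier is an assumption.

```python
MIN_FRAME = 64     # smallest legal Ethernet frame, in bytes (FCS included)
MAX_FRAME = 1518   # largest legal untagged Ethernet frame, in bytes

def classify_frame(length, fcs_valid):
    """Place a frame in an RMON-style category from its length and FCS result."""
    if length > MAX_FRAME:
        return "oversize" if fcs_valid else "jabber"
    if length < MIN_FRAME:
        return "undersize" if fcs_valid else "fragment"
    return "ok" if fcs_valid else "fcs_error"

for length, fcs_valid in [(512, True), (512, False), (4000, False), (4000, True)]:
    print(length, fcs_valid, classify_frame(length, fcs_valid))
```

A frame that is both too long and fails its checksum lands in the jabber category, which is the combination the definition above describes.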

Network File System (NFS) Protocol

A distributed file system protocol developed by Sun Microsystems that allows a computer system to access files over a network as if they were on its local disks. This protocol has been incorporated into products by more than 200 companies. It is now a de facto Internet standard.

NFS is one protocol in the NFS suite of protocols, which includes NFS, RPC, XDR (External Data Representation), and others. These protocols are part of a larger architecture that Sun Microsystems refers to as Open Network Computing (ONC). ONC is a distributed applications architecture designed by Sun and currently controlled by a consortium led by Sun.

Troubleshooting TCP/IP - Detailed Steps

This article shows how to troubleshoot TCP/IP connectivity between computers on a Windows network. If you haven’t already done so, disable XP’s Internet Connection Firewall on all local area network connections, and remove all firewall programs on the network. Improperly configured firewalls are the most common cause of TCP/IP problems.

Open a Command Prompt Window

For many of these steps, you’ll be typing at the command prompt. To open a command prompt window in Windows 2000 or XP, click Start | Run, type cmd in the box, and click OK. To open a command prompt window in Windows 95, 98, or Me, click Start | Run, type command in the box, and click OK. Type one command per line, and press Enter after each one to execute it. To close the command prompt window, use the exit command.

Determine the TCP/IP Settings

Determine the TCP/IP settings of each computer on the local area network. In XP, open the Network Connections folder, right-click the LAN connection, and click Status | Support | Details. For example, here are the Status and Details views for the LAN connection on an Internet Connection Sharing host.

In Windows 95/98/Me, click Start | Run, type winipcfg in the box, and click OK. Select the LAN adapter from the menu and click More Info. Here’s the winipcfg view for an ICS client running Windows Me.

You can also see the TCP/IP settings from the command prompt. This is especially convenient if a computer has more than one network adapter. Use the ipconfig /all command, which is available in all versions except Windows 95. The output from this command can be long, so it’s best to write it to a file. Specify the file name in the command this way:

ipconfig /all >ipconfig.txt
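Once the settings are saved to a file, a short script can pull out the values used in the rest of this article. The Python sketch below is a minimal illustration: the sample text is a made-up excerpt in the general "name . . . : value" layout, the gateway address 192.168.1.1 is invented, and the exact labels vary between Windows versions, so treat the key names as assumptions.

```python
import re

def parse_ipconfig(text):
    """Collect 'Name . . . : value' lines into a dict.

    If several adapters are listed, later values overwrite earlier ones,
    which is acceptable for a sketch but not for a multi-adapter machine.
    """
    settings = {}
    for line in text.splitlines():
        name, separator, value = line.partition(":")
        value = value.strip()
        if not separator or not value:
            continue  # blank lines and adapter headings carry no value
        name = re.sub(r"[.\s]+$", "", name).strip()  # drop the dot leaders
        settings[name] = value
    return settings

# Illustrative excerpt only; real output contains more lines.
SAMPLE = """\
Ethernet adapter Local Area Connection:

        IP Address. . . . . . . . . . . . : 192.168.1.101
        Subnet Mask . . . . . . . . . . . : 255.255.255.0
        Default Gateway . . . . . . . . . : 192.168.1.1
"""

settings = parse_ipconfig(SAMPLE)
print(settings["IP Address"], settings["Subnet Mask"], settings["Default Gateway"])
```

To use it on a saved file, read the contents of ipconfig.txt into a string and pass that string to `parse_ipconfig` in place of `SAMPLE`.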

Description of TCP/IP Settings

Here are the TCP/IP settings that are used in network troubleshooting:

• IP Address - Unique address assigned to a network adapter. A computer with multiple network adapters has an IP address for each one, and each one must be in a different subnet.

• Subnet Mask - Used in conjunction with the IP address to determine which subnet an adapter belongs to. At the simplest level, communication is only possible between two network adapters when they’re in the same subnet.

• Default Gateway - IP address of a computer or router, on one of this computer’s local area networks, that knows how to communicate with subnets not present on this computer. For an Internet connection, the default gateway is a router belonging to your Internet service provider, and all access to sites on the Internet goes through it. For an ICS client, the default gateway is the ICS host. If you use a hardware router, it serves as the default gateway.

• DHCP Server - If an adapter is configured to obtain an IP address automatically, this is the address of the server that provides it. It could be your ISP, an ICS host, or a hardware router.

• DNS Servers - IP address of one or more Domain Name Server computers. DNS servers translate Internet names to their IP addresses (like 63.146.109.227).
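The way an IP address and subnet mask together decide which subnet an adapter belongs to can be checked with Python’s standard `ipaddress` module. This is a small sketch, not part of any Windows tool; the function name `same_subnet` and the mask 255.255.255.0 are assumptions chosen for illustration.

```python
import ipaddress

def same_subnet(ip_a, ip_b, mask):
    """True when both addresses fall in the same subnet under the given mask."""
    # strict=False lets a host address stand in for its network.
    network_a = ipaddress.ip_network(f"{ip_a}/{mask}", strict=False)
    network_b = ipaddress.ip_network(f"{ip_b}/{mask}", strict=False)
    return network_a == network_b

print(same_subnet("192.168.1.101", "192.168.1.123", "255.255.255.0"))
print(same_subnet("192.168.1.101", "192.168.2.5", "255.255.255.0"))
```

The first pair shares the 192.168.1.0 network and can communicate directly; the second pair does not, so traffic between them would have to go through a gateway.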

Subnets

See our article on subnets for a brief description of how they work. For more details, see this Microsoft Knowledge Base article.

If two computers are supposed to be on the same subnet but aren’t, something is wrong with the network hardware or software configuration. This is most likely to happen when one of them receives an IP address of 169.254.x.x, which indicates that:

• It’s configured to obtain an IP address automatically.

• It couldn’t find a DHCP server on the network to make the assignment.

• Windows assigned it an Automatic Private IP Address.
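Spotting an automatically assigned private address can be automated with a short check. The sketch below assumes the 169.254.x.x range is written as the network 169.254.0.0/16; the function name `is_apipa` and the sample addresses are illustrative.

```python
import ipaddress

# The Automatic Private IP Address range described above.
APIPA_RANGE = ipaddress.ip_network("169.254.0.0/16")

def is_apipa(address):
    """True when the address was most likely self-assigned, not leased by DHCP."""
    return ipaddress.ip_address(address) in APIPA_RANGE

for address in ["169.254.17.3", "192.168.1.101"]:
    print(address, is_apipa(address))
```

An address that tests true here suggests that the computer never reached a DHCP server, so the cabling, the DHCP server, and any firewall are the first things to check.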

See our article on Specific Networking Problems and Their Solutions for more information

Pinging

The ping command is the basic tool for testing TCP/IP connectivity. It sends a special packet (called ICMP Echo) to a particular IP address and looks for a reply. If everything is working right, the reply comes back. If not, the ping times out in a few seconds. By default, the ping command repeats the process four times. Here’s an example of an ICS client computer pinging a Windows XP Home Edition ICS host, using the host’s IP address and its computer name.

When ping fails, you’ll see one of these error messages:

• Request timed out - The IP address is valid, but there’s no reply from it. If the IP address is on a local area network, the most likely cause is a firewall program blocking the ping.

• Unknown host <name> or Ping request could not find host <name> - The computer name doesn’t exist on the local area network. Make sure that NetBIOS over TCP/IP is enabled.

• Destination host unreachable - The IP address isn’t on a local area network, and the default gateway can’t access it. Either there’s no default gateway, its address is wrong, or it isn’t functioning.
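The error messages above can be recognized automatically when ping output has been saved or captured. The following Python sketch matches the message texts quoted in this section; the "Reply from" success marker, the category labels, and the function name are assumptions, and wording differs between operating systems and language versions.

```python
def classify_ping_output(output):
    """Map captured ping text onto the failure categories described above."""
    text = output.lower()
    # Check "unreachable" first: it can arrive inside a "Reply from" line.
    if "destination host unreachable" in text:
        return "unreachable"
    if "unknown host" in text or "could not find host" in text:
        return "name not found"
    if "request timed out" in text:
        return "timed out"
    if "reply from" in text:
        return "ok"
    return "unrecognized"

samples = [
    "Request timed out.",
    "Ping request could not find host win98.",
    "Destination host unreachable.",
    "Reply from 192.168.1.123: bytes=32 time<1ms TTL=128",
]
for sample in samples:
    print(classify_ping_output(sample), "<-", sample)
```

Because a single timeout is enough to return the timed-out category, output that mixes replies and timeouts is reported as a failure, which errs on the side of further investigation.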

Pinging the Local Area Network

Here is a series of ping commands to use in finding where a problem occurs on a local area network. Run them in the order shown, and don’t go on to the next command until all of the previous commands work properly. In this example:

• The computer being tested is named Winxp, with IP address 192.168.1.101.

• There’s another computer on the network named Win98, with IP address 192.168.1.123.

Substitute the appropriate IP addresses and computer names for your network.

Command              Target                          What Ping Failure Indicates

ping 127.0.0.1       Loopback address                Corrupted TCP/IP installation
ping localhost       Loopback name                   Corrupted TCP/IP installation
ping 192.168.1.101   This computer’s IP address      Corrupted TCP/IP installation
ping winxp           This computer’s name            Corrupted TCP/IP installation
ping 192.168.1.123   Another computer’s IP address   Bad hardware or NIC driver
ping win98           Another computer’s name         NetBIOS name resolution failure
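The ordered sequence of ping tests above can be sketched as a small script that stops at the first failure and reports what that failure suggests. This is only an illustration: the helper names are hypothetical, the targets are the example addresses from this section, and treating the ping command’s exit status as the success signal is an assumption that does not hold in every case.

```python
import platform
import subprocess

# Targets in the order given above, with what a failure at that step indicates.
LAN_CHECKS = [
    ("127.0.0.1", "Corrupted TCP/IP installation"),
    ("localhost", "Corrupted TCP/IP installation"),
    ("192.168.1.101", "Corrupted TCP/IP installation"),
    ("winxp", "Corrupted TCP/IP installation"),
    ("192.168.1.123", "Bad hardware or NIC driver"),
    ("win98", "NetBIOS name resolution failure"),
]

def system_ping(target):
    """Send one echo request with the operating system's ping command."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    result = subprocess.run(["ping", count_flag, "1", target], capture_output=True)
    return result.returncode == 0

def first_failure(checks, ping=system_ping):
    """Run the checks in order; return the first failing step, or None."""
    for target, meaning in checks:
        if not ping(target):
            return target, meaning
    return None

# Demonstration with a stand-in ping so the sketch runs without a network.
def fake_ping(target):
    return target != "192.168.1.123"

print(first_failure(LAN_CHECKS, ping=fake_ping))
```

Passing the ping function as a parameter keeps the ordering logic separate from the network call, so the sequence can be tried out with a stand-in before it is run against real machines with `first_failure(LAN_CHECKS)`.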

To fix a corrupted TCP/IP installation on Windows XP, follow the steps in this Microsoft Knowledge Base article. For Windows 95/98/Me, uninstall the TCP/IP protocol in Control Panel | Network, reboot, and reinstall it. If that doesn’t fix it, use this procedure on Windows 95 or 98.

Pinging the Internet

You can also use ping to find a problem with Internet access. Run these commands in the order shown, and don’t go on to the next command until all of the previous commands work properly. Use the Default Gateway and DNS Server addresses that you got from the winipcfg or ipconfig /all command.

Command                  Target                What Ping Failure Indicates

ping w.x.y.z             Default Gateway       Default Gateway down
ping w.x.y.z             DNS Server            DNS Server down
ping w.x.y.z             Web site IP address   Internet service provider or web site down
ping www.something.com   Web site name         DNS Server down or web site down

REFERENCES

WWW.PRACTICALLYNETWORK.COM

WWW.COMPUTERHOPE.COM

WWW.WIKIPEDIA.ORG

looking for new errors If your probe has the resources use a much finer resolution sample stored for a shorter time (every 30 seconds for 2 hours) to more easily spot recent errors

EXAMPLE You see that the history table shows that no error rates remained constant throughout the day However errors that did occur were on the device HUB3 and were Too Long Errors and FCS Errors

Reproducing the Fault While Monitoring the Network

Although the RMON View in LANsentry Manager can show error rates and help you to identify the location of the problem it may not provide enough data to solve the problem To determine the cause of the problem reproduce it while you monitor the network by using these applications

bull LANsentry Manager Top-N Graph - To locate a quiet node to use for reproducing the fault

bull LANsentry Manager Packet Capture - To capture packets from the hub to which the quiet node is attached

bull LANsentry Manager Packet Decode - To analyze the packets to assess network file server traffic and delays

bull Address Tracker - To find the location of the problem nodes

EXAMPLE Using LANsentry Manager you find a hub on the network with a higher than normal error rate However the error rate does not seem high enough to cause login delays of 60 seconds or more

LANsentry Manager Top-N Graph

Using the Top-N graph in the LANsentry Manager main window locate a quiet node that has been showing the same problem Choose a quiet node so that you do not receive excessive traffic when you try to isolate the problem

EXAMPLE You see that the node Monolith which has the same Network File System (NFS) mounts as the other nodes on the network is quiet You decide to use this node for reproducing the fault See Network File System (NFS) Protocol for more information about NFS

LANsentry Manager Packet Capture

Using the LANsentry Manager Packet Capture application capture packets from the network using predefined patterns and start-and-stop conditions

63 55

Follow these steps

1 Set up a capture buffer on a probe that is connected to the same hub as the quiet node Until you know more about the problem set a very general filter

EXAMPLE You select a MAC-layer filter and set a conversation filter to capture all packets to and from Monolith

2 Telnet into and log out from the quiet node Then reset the capture buffer Repeat this procedure until you see the problem reflected in your captured data To keep the buffer information clean reset the buffer each time that you repeat the procedure

3 When you see the delay note the rough value of the packet count on the LANsentry Manager packet buffer

By noting the packet count at which you think the delay has occurred you can narrow the problem to within about 20 packets in the buffer If you have used an extremely quiet node you may even identify the exact packet

LANsentry Manager Packet Decode

The LANsentry Manager Packet Decode application decodes all major protocols and displays the packet contents at three levels of detail summary information header information and actual packet content

Follow these steps

1 Open the buffer in the Packet Decode application and locate the number of the packet at which the delay occurred

2 Select the packet and launch a MAC-layer conversation filter In the filter display look for a gap in the conversation (that is where the node sent a request and then resent it at approximately the same rate as the delay you experienced when recreating the problem)

3 Repeat the test to determine if the result concentrates on one node or if it appears on other nodes

EXAMPLE On the quiet node that you selected the delay is obvious You see an NFS request going out to a node and a repeat of the request 30 seconds later During that time the node did not respond You now know that the delay occurred because nodes were not seeing responses for NFS requests When you repeat the test on other nodes you find that the delay is happening with more than one destination node

63 56

Address Tracker

Use Address Tracker which polls managed devices to determine the hubs to which the problem nodes are attached If the problem end stations are located on unmanaged devices then you can at least narrow the problem to those unmanaged devices

EXAMPLE Although your network does not have managed hubs that Transcendreg NCS management software can poll it does have managed switches When it polls the switches Address Tracker displays the switch ports on which addresses were last seen This information indicates the hub (but not the hub port) on which the device is located

LANsentry Manager Packet Decode

After you know the location of the hub that has the problem node monitor the problem from the hub using LANsentry Manager Packet Decode

Follow these steps

1 To capture packets from one of the nodes on the hub set up another capture buffer and repeat the exercise that is described in LANsentry Manager Packet Capture Because a delay may occur on a different node use two capture buffers without stopping the first one

Note the rough packet count where the delay appears

2 Display a conversation filter of the packet where the delay appears and look for the gap in the conversation

EXAMPLE You hope that the nodes are on the same hub You find that all the nodes are on HUB3 This result indicates that FCS Errors may be causing the timeouts However because the errors occur at a low rate you decide to verify this diagnosis You monitor the problem from the hub logging in and out many times and the delay eventually occurs This time the delay shows that the nodes reply had an FCS Error even though the node received the request The switch would not have transmitted this packet causing a timeout on the NFS protocol The retry time is presumably 30 seconds During this test you see the problem occurring on another node

Correcting the Fault

63 57

Without a managed hub you may find it very difficult to discover network file server timeout errors To find the problematic node you must either systematically isolate nodes by monitoring each node for a prolonged period or temporarily insert a managed hub

EXAMPLE You notice that the captured error packet failed FCS because it was corrupted by a regular pattern during transmission A possible reason for this occurrence is a Jabbering node This explanation makes sense because FCSJabber frames increased linearly when you were monitoring the live network

Network File Server Timeouts Reference

This section explains terms that are relevant to network file server timeouts and provides additional conceptual and problem analysis detail

Jabbering

When a node transmits illegal length packets and is possibly not operating within carrier specifications In effect another node has written bad data over a valid packet This bad data is often interpreted as a repeated sequence of data

Network File System (NFS) Protocol

A distributed file system protocol developed by Sun Microsystems that allows a computer system to access files over a network as if they were on its local disks This protocol has been incorporated into products by more than 200 companies It is now a de facto Internet standard

NFS is one protocol in the NFS suite of protocols which includes NFS RPC XDR (External Data Representation) and others These protocols are part of a larger architecture that Sun Microsystems refers to as Open Network Computing (ONC) ONC is a distributed applications architecture designed by Sun and currently controlled by a consortium led by Sun

Troubleshooting TCPIP - Detailed Steps

This article shows how to troubleshoot TCPIP connectivity between computers on a Windows network If you havenrsquot already done so disable XPrsquos Internet Connection Firewall on all local area network connections and remove all firewall programs on the network Improperly configured firewalls are the most common cause of TCPIP problems

63 58

Open a Command Prompt Window

For many of these steps yoursquoll be typing at the command prompt To open a command prompt window in Windows 2000 or XP click Start | Run type cmd in the box and click OK To open a command prompt window in Windows 95 98 or Me click Start | Run type command in the box and click OK Type one command per line and press Enter after each one to execute it To close the command prompt window use the exit command

Determine the TCPIP Settings

Determine the TCPIP settings of each computer on the local area network In XP open the Network Connections folder right click the LAN connection and click Status | Support | Details For example here are the Status and Details views for the LAN connection on an Internet Connection Sharing host

In Windows 9598Me click Start | Run type winipcfg in the box and click OK Select the LAN adapter from the menu and click More Info Herersquos the winipcfg view for an ICS client running Windows Me

You can also see the TCPIP settings from the command prompt This is especially convenient if a computer has more than one network adapter Use the ipconfig all command which is available in all versions except Windows 95 The output from this command can be long so itrsquos best to write it to a file Specify the file name in the command this way

ipconfig all gtipconfigtxt

Description of TCP/IP Settings

Here are the TCP/IP settings that are used in network troubleshooting:

• IP Address - Unique address assigned to a network adapter. A computer with multiple network adapters has an IP address for each one, and each one must be in a different subnet.

• Subnet Mask - Used in conjunction with the IP address to determine which subnet an adapter belongs to. At the simplest level, communication is only possible between two network adapters when they're in the same subnet.

• Default Gateway - IP address of a computer or router, on one of this computer's local area networks, that knows how to communicate with subnets not present on this computer. For an Internet connection, the default gateway is a router belonging to your Internet service provider, and all access to sites on the Internet goes through it. For an ICS client, the default gateway is the ICS host. If you use a hardware router, it serves as the default gateway.

• DHCP Server - If an adapter is configured to obtain an IP address automatically, this is the address of the server that provides it. It could be your ISP, an ICS host, or a hardware router.

• DNS Servers - IP address of one or more Domain Name Server computers. DNS servers translate Internet names to their IP addresses (like 63.146.109.227).
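
The way an IP address and a subnet mask together decide which subnet an adapter belongs to can be sketched in a few lines of Python. This is only an illustration, not part of the original procedure: the function name same_subnet and the sample addresses and mask are assumptions chosen for the example.

```python
import ipaddress

def same_subnet(ip_a: str, ip_b: str, mask: str) -> bool:
    """Return True if both addresses fall in the same subnet under `mask`."""
    # strict=False accepts a host address and masks off the host bits,
    # leaving only the network portion for comparison.
    net_a = ipaddress.ip_network(f"{ip_a}/{mask}", strict=False)
    net_b = ipaddress.ip_network(f"{ip_b}/{mask}", strict=False)
    return net_a == net_b

# Hypothetical addresses: two adapters on one LAN, then one that is not.
print(same_subnet("192.168.1.101", "192.168.1.123", "255.255.255.0"))
print(same_subnet("192.168.1.101", "169.254.10.20", "255.255.255.0"))
```

The first call compares two adapters whose network portions match, so they can talk directly. The second pair differs in the network portion, which is the situation in which traffic would have to go through a default gateway.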

Subnets

See our article on subnets for a brief description of how they work. For more details, see this Microsoft Knowledge Base article.

If two computers are supposed to be on the same subnet but aren't, something is wrong with the network hardware or software configuration. This is most likely to happen when one of them receives an IP address of 169.254.x.x, which indicates that:

• It's configured to obtain an IP address automatically.

• It couldn't find a DHCP server on the network to make the assignment.

• Windows assigned it an Automatic Private IP Address.
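
A quick way to spot such an address when reviewing collected settings is to test whether it falls in the 169.254.0.0/16 range. The following is a minimal sketch; the function name is_apipa and the sample addresses are assumptions made for illustration.

```python
import ipaddress

# The automatic private address block (169.254.x.x).
APIPA_RANGE = ipaddress.ip_network("169.254.0.0/16")

def is_apipa(address: str) -> bool:
    """Return True if `address` is an Automatic Private IP Address."""
    return ipaddress.ip_address(address) in APIPA_RANGE

# Hypothetical addresses copied from two computers' settings.
for addr in ("192.168.1.101", "169.254.37.5"):
    status = "no DHCP server was reached" if is_apipa(addr) else "looks normal"
    print(addr, "-", status)
```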

See our article on Specific Networking Problems and Their Solutions for more information.

Pinging

The ping command is the basic tool for testing TCP/IP connectivity. It sends a special packet (called ICMP Echo) to a particular IP address and looks for a reply. If everything is working right, the reply comes back. If not, the ping times out in a few seconds. By default, the ping command repeats the process four times. Here's an example of an ICS client computer pinging a Windows XP Home Edition ICS host, using the host's IP address and its computer name.

When ping fails, you'll see one of these error messages:

• Request timed out - The IP address is valid, but there's no reply from it. If the IP address is on a local area network, the most likely cause is a firewall program blocking the ping.

• Unknown host <name> or Ping request could not find host <name> - The computer name doesn't exist on the local area network. Make sure that NetBIOS over TCP/IP is enabled.

• Destination host unreachable - The IP address isn't on a local area network, and the default gateway can't access it. Either there's no default gateway, its address is wrong, or it isn't functioning.
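
If ping output has been saved to a file, the messages above can be matched mechanically. The sketch below assumes the output contains the message wording quoted in this article. The table PING_FAILURES and the function explain_ping_output are hypothetical names, and real ping implementations may word their messages differently.

```python
# Message fragments from the list above, paired with the likely cause.
PING_FAILURES = [
    ("request timed out",
     "No reply; on a LAN, most likely a firewall blocking the ping"),
    ("unknown host",
     "Name not found; check that NetBIOS over TCP/IP is enabled"),
    ("could not find host",
     "Name not found; check that NetBIOS over TCP/IP is enabled"),
    ("destination host unreachable",
     "Default gateway is missing, has a wrong address, or isn't functioning"),
]

def explain_ping_output(output: str) -> str:
    """Return the likely cause for the first known failure message found."""
    lowered = output.lower()
    for fragment, cause in PING_FAILURES:
        if fragment in lowered:
            return cause
    return "No known failure message found"

print(explain_ping_output("Ping request could not find host win98."))
```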

Pinging the Local Area Network

Here is a series of ping commands to use in finding where a problem occurs on a local area network. Run them in the order shown, and don't go on to the next command until all of the previous commands work properly. In this example:

• The computer being tested is named Winxp, with IP address 192.168.1.101.

• There's another computer on the network named Win98, with IP address 192.168.1.123.

Substitute the appropriate IP addresses and computer names for your network.

Command               Target                          What Ping Failure Indicates
ping 127.0.0.1        Loopback address                Corrupted TCP/IP installation
ping localhost        Loopback name                   Corrupted TCP/IP installation
ping 192.168.1.101    This computer's IP address      Corrupted TCP/IP installation
ping winxp            This computer's name            Corrupted TCP/IP installation
ping 192.168.1.123    Another computer's IP address   Bad hardware or NIC driver
ping win98            Another computer's name         NetBIOS name resolution failure

To fix a corrupted TCP/IP installation on Windows XP, follow the steps in this Microsoft Knowledge Base article. For Windows 95/98/Me, un-install the TCP/IP protocol in Control Panel | Network, reboot, and re-install it. If that doesn't fix it, use this procedure on Windows 95 or 98.
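
The "run in order and stop at the first failure" rule for the LAN ping sequence can be sketched as a small script. This is an illustration under stated assumptions rather than a definitive tool: LAN_CHECKS, system_ping and first_failure are hypothetical names, the addresses are the example ones from this article, and the exit code of the ping command is treated as a rough success signal (on some systems ping can exit with 0 even when the reply is an error message).

```python
import platform
import subprocess
from typing import Callable, List, Optional, Tuple

# (target, what a failure indicates), in the order the article gives.
LAN_CHECKS: List[Tuple[str, str]] = [
    ("127.0.0.1", "Corrupted TCP/IP installation"),
    ("localhost", "Corrupted TCP/IP installation"),
    ("192.168.1.101", "Corrupted TCP/IP installation"),
    ("winxp", "Corrupted TCP/IP installation"),
    ("192.168.1.123", "Bad hardware or NIC driver"),
    ("win98", "NetBIOS name resolution failure"),
]

def system_ping(target: str) -> bool:
    """Send one echo request with the system ping command; True if it exits 0."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    result = subprocess.run(
        ["ping", count_flag, "1", target],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0

def first_failure(
    checks: List[Tuple[str, str]],
    ping: Callable[[str], bool] = system_ping,
) -> Optional[Tuple[str, str]]:
    """Run the checks in order; return the first failing (target, diagnosis)."""
    for target, diagnosis in checks:
        if not ping(target):
            return target, diagnosis
    return None

# Demonstration with a stand-in ping that fails only for the other computer's IP.
fake_ping = lambda target: target != "192.168.1.123"
print(first_failure(LAN_CHECKS, ping=fake_ping))
```

Passing the ping function as a parameter keeps the ordering logic separate from the actual network call, so the sequence can be tried out without touching the network. Calling first_failure(LAN_CHECKS) with no second argument would use the real ping command.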

Pinging the Internet

You can also use ping to find a problem with Internet access. Run these commands in the order shown, and don't go on to the next command until all of the previous commands work properly. Use the Default Gateway and DNS Server addresses that you got from the winipcfg or ipconfig /all command.

Command                  Target                What Ping Failure Indicates
ping w.x.y.z             Default Gateway       Default Gateway down
ping w.x.y.z             DNS Server            DNS Server down
ping w.x.y.z             Web site IP address   Internet service provider or web site down
ping www.something.com   Web site name         DNS Server down or web site down

REFERENCE

WWW.PRACTICALLYNETWORK.COM

WWW.COMPUTERHOPE.COM

WWW.WIKIPEDIA.ORG

THANK YOU

  • Basic Network Troubleshooting
  • Basic network troubleshooting
    • Issue
    • Cause
    • Solution
      • Verify connections LEDs
      • Adapter resources
      • Adapter functionality
      • Protocol
      • Firewall
      • Additional time
      • Additional troubleshooting
        • Network Troubleshooting Overview
          • About Connectivity Problems
          • About Performance Problems
          • Solving Connectivity and Performance Problems
          • Recognizing Symptoms
            • User Comments
            • Network Management Software Alerts
            • Analyzing Symptoms
              • Understanding the Problem
              • Identifying and Testing the Cause of the Problem
                • Sample Problem Analysis
                • Equipment for Testing
                  • Solving the Problem
                    • Your Network Troubleshooting Toolbox
                      • Transcend Central
                      • Status Watch
                        • Web Reporter
                          • Address Tracker
                          • LANsentry Manager
                          • Traffix Manager
                          • Device View
                          • Ping
                            • Strategies for Using Ping
                            • Tips on Interpreting Ping Messages
                              • Telnet
                              • FTP and TFTP
                              • Analyzers
                              • Probes
                              • Cable Testers
                              • Knowing Your Networks Configuration
                                • Site Network Map
                                • Device Configuration Information
                                • Other Important Data About Your Network
                                  • Identifying Your Networks Normal Behavior
                                    • Baselining Your Network
                                    • Identifying Background Noise
                                    • Identifying Background Noise
                                        • FDDI Connectivity
                                          • Understanding the Problem
                                          • Identifying the Problem
                                          • Solving the Problem
                                          • Status Watch
                                          • Implementing Dual Homing
                                          • Installing an Optical Bypass Unit
                                          • Peer Wrap Condition
                                          • Twisted Ring Condition
                                          • Undesired Connection Attempt Event
                                          • Address Tracker
                                          • Device View
                                          • Broadcast Packets
                                          • Multicast Packets
                                            • Duplicate Addresses
                                              • Understanding the Problem
                                              • Identifying the Problem
                                              • Solving the Problem
                                              • Status Watch
                                              • Address Tracker
                                              • LANsentry Manager
                                              • Duplicate MAC Addresses
                                              • Duplicate IP Addresses
                                                • Ethernet Packet Loss
                                                  • Understanding the Problem
                                                  • Identifying the Problem
                                                  • Solving the Problem
                                                  • Status Watch
                                                  • LANsentry Manager Network Statistics Graph
                                                  • Device View
                                                  • Alignment Errors
                                                  • Collisions
                                                  • CRC Errors
                                                  • Excessive Collisions
                                                  • FCS Errors
                                                  • Late Collisions
                                                  • Nonstandard Ethernet Problems
                                                  • Receive Discards
                                                  • Too Long Errors
                                                  • Too Short Errors
                                                  • Transmit Discards
                                                    • FDDI Ring Errors
                                                      • Understanding the Problem
                                                      • Identifying the Problem
                                                      • Solving the Problem
                                                      • Status Watch
                                                      • Elasticity Buffer Error Condition
                                                      • Frame Error Condition
                                                      • Frames Not Copied Condition
                                                      • Link Error Condition
                                                      • MAC Neighbor Change Event
                                                        • Network File Server Timeouts
                                                          • Understanding the Problem
                                                          • Identifying the Problem
                                                          • Solving the Problem
                                                          • Ping and Telnet
                                                          • LANsentry Manager Alarms View
                                                          • LANsentry Manager Statistics View
                                                          • LANsentry Manager History View
                                                          • LANsentry Manager Top-N Graph
                                                          • LANsentry Manager Packet Capture
                                                          • LANsentry Manager Packet Decode
                                                          • Address Tracker
                                                          • LANsentry Manager Packet Decode
                                                          • Jabbering
                                                          • Network File System (NFS) Protocol
                                                          • Troubleshooting TCPIP - Detailed Steps
                                                          • Open a Command Prompt Window
                                                          • Determine the TCPIP Settings
                                                          • Description of TCPIP Settings
                                                          • Subnets
                                                          • Pinging
                                                          • Pinging the Local Area Network
                                                          • Pinging the Internet
Page 55: 11591337-Basic-Network-Troubleshooting

looking for new errors. If your probe has the resources, use a much finer resolution sample stored for a shorter time (every 30 seconds for 2 hours) to more easily spot recent errors.

EXAMPLE: You see that the history table shows that no error rates remained constant throughout the day. However, errors that did occur were on the device HUB3 and were Too Long Errors and FCS Errors.

Reproducing the Fault While Monitoring the Network

Although the RMON View in LANsentry Manager can show error rates and help you to identify the location of the problem, it may not provide enough data to solve the problem. To determine the cause of the problem, reproduce it while you monitor the network by using these applications:

• LANsentry Manager Top-N Graph - To locate a quiet node to use for reproducing the fault.

• LANsentry Manager Packet Capture - To capture packets from the hub to which the quiet node is attached.

• LANsentry Manager Packet Decode - To analyze the packets to assess network file server traffic and delays.

• Address Tracker - To find the location of the problem nodes.

EXAMPLE: Using LANsentry Manager, you find a hub on the network with a higher than normal error rate. However, the error rate does not seem high enough to cause login delays of 60 seconds or more.

LANsentry Manager Top-N Graph

Using the Top-N graph in the LANsentry Manager main window, locate a quiet node that has been showing the same problem. Choose a quiet node so that you do not receive excessive traffic when you try to isolate the problem.

EXAMPLE: You see that the node Monolith, which has the same Network File System (NFS) mounts as the other nodes on the network, is quiet. You decide to use this node for reproducing the fault. See Network File System (NFS) Protocol for more information about NFS.

LANsentry Manager Packet Capture

Using the LANsentry Manager Packet Capture application, capture packets from the network using predefined patterns and start-and-stop conditions.

Follow these steps:

1. Set up a capture buffer on a probe that is connected to the same hub as the quiet node. Until you know more about the problem, set a very general filter.

EXAMPLE: You select a MAC-layer filter and set a conversation filter to capture all packets to and from Monolith.

2. Telnet into and log out from the quiet node. Then reset the capture buffer. Repeat this procedure until you see the problem reflected in your captured data. To keep the buffer information clean, reset the buffer each time that you repeat the procedure.

3. When you see the delay, note the rough value of the packet count on the LANsentry Manager packet buffer.

By noting the packet count at which you think the delay has occurred, you can narrow the problem to within about 20 packets in the buffer. If you have used an extremely quiet node, you may even identify the exact packet.

LANsentry Manager Packet Decode

The LANsentry Manager Packet Decode application decodes all major protocols and displays the packet contents at three levels of detail: summary information, header information, and actual packet content.

Follow these steps:

1. Open the buffer in the Packet Decode application and locate the number of the packet at which the delay occurred.

2. Select the packet and launch a MAC-layer conversation filter. In the filter display, look for a gap in the conversation (that is, where the node sent a request and then resent it at approximately the same rate as the delay you experienced when recreating the problem).

3. Repeat the test to determine if the result concentrates on one node or if it appears on other nodes.

EXAMPLE: On the quiet node that you selected, the delay is obvious. You see an NFS request going out to a node and a repeat of the request 30 seconds later. During that time, the node did not respond. You now know that the delay occurred because nodes were not seeing responses for NFS requests. When you repeat the test on other nodes, you find that the delay is happening with more than one destination node.
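
Looking for a gap in a conversation amounts to comparing the timestamps of consecutive packets. The sketch below shows the idea on a hand-made list of packets. It does not read LANsentry Manager buffers; the Packet layout, the function name find_gaps, the sample data and the 10-second threshold are all assumptions made for illustration.

```python
from typing import List, Tuple

# (seconds since the capture started, short description of the packet)
Packet = Tuple[float, str]

def find_gaps(packets: List[Packet], threshold: float) -> List[Tuple[Packet, Packet, float]]:
    """Return consecutive packet pairs separated by at least `threshold` seconds."""
    gaps = []
    for earlier, later in zip(packets, packets[1:]):
        delta = later[0] - earlier[0]
        if delta >= threshold:
            gaps.append((earlier, later, delta))
    return gaps

# Hypothetical conversation: a request, a silent period, then the repeated request.
conversation = [
    (0.00, "NFS request"),
    (30.00, "NFS request (repeat)"),
    (30.02, "NFS reply"),
]

for earlier, later, delta in find_gaps(conversation, threshold=10.0):
    print(f"{delta:.1f} s gap between '{earlier[1]}' and '{later[1]}'")
```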

Address Tracker

Use Address Tracker, which polls managed devices, to determine the hubs to which the problem nodes are attached. If the problem end stations are located on unmanaged devices, then you can at least narrow the problem to those unmanaged devices.

EXAMPLE: Although your network does not have managed hubs that Transcend® NCS management software can poll, it does have managed switches. When it polls the switches, Address Tracker displays the switch ports on which addresses were last seen. This information indicates the hub (but not the hub port) on which the device is located.

LANsentry Manager Packet Decode

After you know the location of the hub that has the problem node, monitor the problem from the hub using LANsentry Manager Packet Decode.

Follow these steps:

1. To capture packets from one of the nodes on the hub, set up another capture buffer and repeat the exercise that is described in LANsentry Manager Packet Capture. Because a delay may occur on a different node, use two capture buffers without stopping the first one.

Note the rough packet count where the delay appears.

2. Display a conversation filter of the packet where the delay appears and look for the gap in the conversation.

EXAMPLE: You hope that the nodes are on the same hub. You find that all the nodes are on HUB3. This result indicates that FCS Errors may be causing the timeouts. However, because the errors occur at a low rate, you decide to verify this diagnosis. You monitor the problem from the hub, logging in and out many times, and the delay eventually occurs. This time, the delay shows that the node's reply had an FCS Error, even though the node received the request. The switch would not have transmitted this packet, causing a timeout on the NFS protocol. The retry time is presumably 30 seconds. During this test, you see the problem occurring on another node.

Correcting the Fault

Without a managed hub, you may find it very difficult to discover network file server timeout errors. To find the problematic node, you must either systematically isolate nodes by monitoring each node for a prolonged period or temporarily insert a managed hub.

EXAMPLE: You notice that the captured error packet failed FCS because it was corrupted by a regular pattern during transmission. A possible reason for this occurrence is a Jabbering node. This explanation makes sense because FCS/Jabber frames increased linearly when you were monitoring the live network.

Network File Server Timeouts Reference

This section explains terms that are relevant to network file server timeouts and provides additional conceptual and problem analysis detail.

Jabbering

A condition in which a node transmits illegal-length packets and is possibly not operating within carrier specifications. In effect, another node has written bad data over a valid packet. This bad data is often interpreted as a repeated sequence of data.

Network File System (NFS) Protocol

A distributed file system protocol, developed by Sun Microsystems, that allows a computer system to access files over a network as if they were on its local disks. This protocol has been incorporated into products by more than 200 companies. It is now a de facto Internet standard.

NFS is one protocol in the NFS suite of protocols, which includes NFS, RPC, XDR (External Data Representation), and others. These protocols are part of a larger architecture that Sun Microsystems refers to as Open Network Computing (ONC). ONC is a distributed applications architecture designed by Sun and currently controlled by a consortium led by Sun.

Troubleshooting TCP/IP - Detailed Steps

This article shows how to troubleshoot TCP/IP connectivity between computers on a Windows network. If you haven't already done so, disable XP's Internet Connection Firewall on all local area network connections and remove all firewall programs on the network. Improperly configured firewalls are the most common cause of TCP/IP problems.

Open a Command Prompt Window

For many of these steps, you'll be typing at the command prompt. To open a command prompt window in Windows 2000 or XP, click Start | Run, type cmd in the box, and click OK. To open a command prompt window in Windows 95, 98, or Me, click Start | Run, type command in the box, and click OK. Type one command per line, and press Enter after each one to execute it. To close the command prompt window, use the exit command.

Page 56: 11591337-Basic-Network-Troubleshooting

Follow these steps:

1. Set up a capture buffer on a probe that is connected to the same hub as the quiet node. Until you know more about the problem, set a very general filter.

EXAMPLE: You select a MAC-layer filter and set a conversation filter to capture all packets to and from Monolith.

2. Telnet into and log out from the quiet node. Then reset the capture buffer. Repeat this procedure until you see the problem reflected in your captured data. To keep the buffer information clean, reset the buffer each time that you repeat the procedure.

3. When you see the delay, note the rough value of the packet count on the LANsentry Manager packet buffer.

By noting the packet count at which you think the delay has occurred, you can narrow the problem to within about 20 packets in the buffer. If you have used an extremely quiet node, you may even identify the exact packet.

LANsentry Manager Packet Decode

The LANsentry Manager Packet Decode application decodes all major protocols and displays the packet contents at three levels of detail: summary information, header information, and actual packet content.

Follow these steps:

1. Open the buffer in the Packet Decode application and locate the number of the packet at which the delay occurred.

2. Select the packet and launch a MAC-layer conversation filter. In the filter display, look for a gap in the conversation (that is, a place where the node sent a request and then resent it after an interval that roughly matches the delay you experienced when recreating the problem).

3. Repeat the test to determine whether the result concentrates on one node or also appears on other nodes.

EXAMPLE: On the quiet node that you selected, the delay is obvious. You see an NFS request going out to a node and a repeat of the request 30 seconds later. During that time, the node did not respond. You now know that the delay occurred because nodes were not seeing responses for NFS requests. When you repeat the test on other nodes, you find that the delay is happening with more than one destination node.
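The search for a gap in a conversation can also be sketched in code. The following Python sketch is not part of LANsentry Manager; it assumes the decoded packets have been exported into simple records, and the names Packet, find_retry_gaps, and min_gap are hypothetical. It reports a request that was repeated to the same destination, after a long pause, with no traffic coming back in between.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    number: int    # position in the capture buffer
    time: float    # seconds since the capture started
    src: str       # sending node
    dst: str       # receiving node
    summary: str   # one-line decode, e.g. "NFS GETATTR call"


def find_retry_gaps(packets, min_gap=5.0):
    """Return (first, retry) pairs where an identical packet was resent
    to the same destination at least min_gap seconds later and nothing
    came back from that destination in between."""
    gaps = []
    pending = {}  # (src, dst, summary) -> most recent unanswered packet
    for pkt in packets:
        # Any packet travelling in the reverse direction counts as a
        # response, so forget whatever was waiting on it.
        answered = [k for k in pending if k[0] == pkt.dst and k[1] == pkt.src]
        for key in answered:
            del pending[key]
        key = (pkt.src, pkt.dst, pkt.summary)
        first = pending.get(key)
        if first is not None and pkt.time - first.time >= min_gap:
            gaps.append((first, pkt))
        pending[key] = pkt
    return gaps


# Hypothetical capture: the client repeats its request after 30 seconds.
capture = [
    Packet(1, 0.0, "client", "server", "NFS GETATTR call"),
    Packet(2, 0.1, "other", "server", "NFS READ call"),
    Packet(3, 0.2, "server", "other", "NFS READ reply"),
    Packet(4, 30.0, "client", "server", "NFS GETATTR call"),
    Packet(5, 30.01, "server", "client", "NFS GETATTR reply"),
]
gaps = find_retry_gaps(capture)
```

In this made-up capture the only pair reported is packets 1 and 4, which mirrors the 30-second repeat described in the example above. The threshold of 5 seconds is an arbitrary assumption; choose a value just below the delay that users report.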


Address Tracker

Use Address Tracker, which polls managed devices, to determine the hubs to which the problem nodes are attached. If the problem end stations are located on unmanaged devices, then you can at least narrow the problem to those unmanaged devices.

EXAMPLE: Although your network does not have managed hubs that Transcend® NCS management software can poll, it does have managed switches. When it polls the switches, Address Tracker displays the switch ports on which addresses were last seen. This information indicates the hub (but not the hub port) on which the device is located.

LANsentry Manager Packet Decode

After you know the location of the hub that has the problem node, monitor the problem from the hub using LANsentry Manager Packet Decode.

Follow these steps:

1. To capture packets from one of the nodes on the hub, set up another capture buffer and repeat the exercise that is described in LANsentry Manager Packet Capture. Because a delay may occur on a different node, use two capture buffers without stopping the first one.

Note the rough packet count where the delay appears.

2. Display a conversation filter of the packet where the delay appears and look for the gap in the conversation.

EXAMPLE: You hope that the nodes are on the same hub. You find that all the nodes are on HUB3. This result indicates that FCS Errors may be causing the timeouts. However, because the errors occur at a low rate, you decide to verify this diagnosis. You monitor the problem from the hub, logging in and out many times, and the delay eventually occurs. This time, the delay shows that the node's reply had an FCS Error even though the node received the request. The switch would not have transmitted this packet, causing a timeout on the NFS protocol. The retry time is presumably 30 seconds. During this test, you see the problem occurring on another node.

Correcting the Fault


Without a managed hub, you may find it very difficult to discover network file server timeout errors. To find the problematic node, you must either systematically isolate nodes by monitoring each node for a prolonged period or temporarily insert a managed hub.

EXAMPLE: You notice that the captured error packet failed FCS because it was corrupted by a regular pattern during transmission. A possible reason for this occurrence is a Jabbering node. This explanation makes sense because FCS/Jabber frames increased linearly when you were monitoring the live network.
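To make the FCS and jabber terminology concrete, here is a minimal Python sketch of how a captured frame could be checked. It relies on the fact that the Ethernet frame check sequence is a CRC-32 that matches zlib.crc32 when the four FCS bytes are read in little-endian order. The size limits and category names follow the usual RMON-style convention for untagged Ethernet frames; the function names are hypothetical, and the sketch assumes the capture tool kept the FCS bytes, which many do not.

```python
import zlib

MIN_FRAME = 64     # bytes, destination address through FCS
MAX_FRAME = 1518   # bytes, untagged frame including FCS


def append_fcs(frame_without_fcs: bytes) -> bytes:
    """Return the frame with its 4-byte frame check sequence appended."""
    fcs = zlib.crc32(frame_without_fcs) & 0xFFFFFFFF
    return frame_without_fcs + fcs.to_bytes(4, "little")


def classify_frame(frame: bytes) -> str:
    """Classify a captured frame (FCS included) by length and FCS validity."""
    if len(frame) < 4:
        return "fragment"          # too small to even hold an FCS
    body, fcs = frame[:-4], frame[-4:]
    expected = zlib.crc32(body) & 0xFFFFFFFF
    fcs_ok = expected == int.from_bytes(fcs, "little")
    if len(frame) > MAX_FRAME:
        return "too long" if fcs_ok else "jabber"
    if len(frame) < MIN_FRAME:
        return "too short" if fcs_ok else "fragment"
    return "ok" if fcs_ok else "fcs error"


# A valid minimum-size frame, then the same frame with one byte corrupted.
good = append_fcs(bytes(60))
bad = bytes([good[0] ^ 0xFF]) + good[1:]
```

Here classify_frame(good) yields "ok" and classify_frame(bad) yields "fcs error". An oversized frame with a bad FCS is reported as "jabber", which is the counter that rose in the example above.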

Network File Server Timeouts Reference

This section explains terms that are relevant to network file server timeouts and provides additional conceptual and problem analysis detail.

Jabbering

A condition in which a node transmits illegal-length packets and is possibly not operating within carrier specifications. In effect, another node has written bad data over a valid packet. This bad data is often interpreted as a repeated sequence of data.

Network File System (NFS) Protocol

A distributed file system protocol, developed by Sun Microsystems, that allows a computer system to access files over a network as if they were on its local disks. This protocol has been incorporated into products by more than 200 companies. It is now a de facto Internet standard.

NFS is one protocol in the NFS suite of protocols, which includes NFS, RPC, XDR (External Data Representation), and others. These protocols are part of a larger architecture that Sun Microsystems refers to as Open Network Computing (ONC). ONC is a distributed applications architecture designed by Sun and currently controlled by a consortium led by Sun.

Troubleshooting TCP/IP - Detailed Steps

This article shows how to troubleshoot TCP/IP connectivity between computers on a Windows network. If you haven’t already done so, disable XP’s Internet Connection Firewall on all local area network connections, and remove all firewall programs on the network. Improperly configured firewalls are the most common cause of TCP/IP problems.


Open a Command Prompt Window

For many of these steps, you’ll be typing at the command prompt. To open a command prompt window in Windows 2000 or XP, click Start | Run, type cmd in the box, and click OK. To open a command prompt window in Windows 95, 98, or Me, click Start | Run, type command in the box, and click OK. Type one command per line, and press Enter after each one to execute it. To close the command prompt window, use the exit command.

Determine the TCP/IP Settings

Determine the TCP/IP settings of each computer on the local area network. In XP, open the Network Connections folder, right-click the LAN connection, and click Status | Support | Details. For example, here are the Status and Details views for the LAN connection on an Internet Connection Sharing host:

[Figure: Status and Details views]

In Windows 95/98/Me, click Start | Run, type winipcfg in the box, and click OK. Select the LAN adapter from the menu, and click More Info. Here’s the winipcfg view for an ICS client running Windows Me:

[Figure: winipcfg view]

You can also see the TCP/IP settings from the command prompt. This is especially convenient if a computer has more than one network adapter. Use the ipconfig /all command, which is available in all versions except Windows 95. The output from this command can be long, so it’s best to write it to a file. Specify the file name in the command this way:

ipconfig /all >ipconfig.txt
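Once the output is saved to a file, the settings can be pulled out with a short script. The following Python sketch is only an illustration: the sample text is a hypothetical, trimmed imitation of XP-era ipconfig /all output, the function name parse_ipconfig is made up, and real field names vary between Windows versions and languages. It also does not handle continuation lines that contain colons, such as IPv6 addresses.

```python
import re

# "   Subnet Mask . . . . . : 255.255.255.0"  ->  ("Subnet Mask", "255.255.255.0")
FIELD = re.compile(r"^\s+(.+?)[ .]*:\s*(.*)$")

SAMPLE = """\
Windows IP Configuration

        Host Name . . . . . . . . . . . . : winxp

Ethernet adapter Local Area Connection:

        Dhcp Enabled. . . . . . . . . . . : Yes
        IP Address. . . . . . . . . . . . : 192.168.1.101
        Subnet Mask . . . . . . . . . . . : 255.255.255.0
        Default Gateway . . . . . . . . . : 192.168.1.1
        DNS Servers . . . . . . . . . . . : 192.168.1.1
                                            192.168.1.2
"""


def parse_ipconfig(text):
    """Return {section name: {field name: [values]}} for ipconfig-style text."""
    sections = {}
    current = None
    last_key = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():
            # An unindented line starts a new section (one per adapter).
            current = sections.setdefault(line.strip().rstrip(":"), {})
            last_key = None
            continue
        if current is None:
            continue
        match = FIELD.match(line)
        if match:
            last_key = match.group(1)
            value = match.group(2).strip()
            current[last_key] = [value] if value else []
        elif last_key is not None:
            # Indented line without a colon: an extra value for the
            # previous field, such as a second DNS server.
            current[last_key].append(line.strip())
    return sections


settings = parse_ipconfig(SAMPLE)
lan = settings["Ethernet adapter Local Area Connection"]
```

To use it on real output, read ipconfig.txt from disk and pass its contents to parse_ipconfig instead of SAMPLE.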

Description of TCP/IP Settings

Here are the TCP/IP settings that are used in network troubleshooting:

• IP Address – Unique address assigned to a network adapter. A computer with multiple network adapters has an IP address for each one, and each one must be in a different subnet.

• Subnet Mask – Used in conjunction with the IP address to determine which subnet an adapter belongs to. At the simplest level, communication is only possible between two network adapters when they’re in the same subnet.

• Default Gateway – IP address of a computer or router, on one of this computer’s local area networks, that knows how to communicate with subnets not present on this computer. For an Internet connection, the default gateway is a router belonging to your Internet service provider, and all access to sites on the Internet goes through it. For an ICS client, the default gateway is the ICS host. If you use a hardware router, it serves as the default gateway.

• DHCP Server – If an adapter is configured to obtain an IP address automatically, this is the address of the server that provides it. It could be your ISP, an ICS host, or a hardware router.

• DNS Servers – IP address of one or more Domain Name Server computers. DNS servers translate Internet names to their IP addresses (like 63.146.109.227).
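The relationship between the IP address, the subnet mask, and the default gateway can be checked with Python's standard ipaddress module. This is a minimal sketch; the function names same_subnet and needs_gateway are hypothetical, and the addresses in the usage lines are examples only.

```python
import ipaddress


def same_subnet(ip_a, ip_b, mask):
    """True when both addresses fall in the same subnet under this mask."""
    net_a = ipaddress.ip_interface(f"{ip_a}/{mask}").network
    net_b = ipaddress.ip_interface(f"{ip_b}/{mask}").network
    return net_a == net_b


def needs_gateway(local_ip, mask, destination):
    """True when the destination lies outside the local subnet, so the
    traffic has to be sent through the default gateway."""
    network = ipaddress.ip_interface(f"{local_ip}/{mask}").network
    return ipaddress.ip_address(destination) not in network


on_lan = same_subnet("192.168.1.101", "192.168.1.123", "255.255.255.0")
off_lan = needs_gateway("192.168.1.101", "255.255.255.0", "63.146.109.227")
```

Both values are True here: the two 192.168.1.x adapters share a subnet and can talk directly, while the Internet address can only be reached through the default gateway.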

Subnets

See our article on subnets for a brief description of how they work. For more details, see this Microsoft Knowledge Base article.

If two computers are supposed to be on the same subnet but aren’t, something is wrong with the network hardware or software configuration. This is most likely to happen when one of them receives an IP address of 169.254.x.x, which indicates that:

• It’s configured to obtain an IP address automatically.
• It couldn’t find a DHCP server on the network to make the assignment.
• Windows assigned it an Automatic Private IP Address.
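When you have collected the addresses of several computers, spotting the ones that fell back to an Automatic Private IP Address is a simple range test against 169.254.0.0/16. A short Python sketch follows; the host names and addresses are hypothetical, as is the function name find_apipa_hosts.

```python
import ipaddress

APIPA_RANGE = ipaddress.ip_network("169.254.0.0/16")


def is_apipa(address):
    """True for an address in the 169.254.x.x automatic private range."""
    return ipaddress.ip_address(address) in APIPA_RANGE


def find_apipa_hosts(hosts):
    """Given {computer name: IP address}, list the computers that did
    not get an address from a DHCP server."""
    return sorted(name for name, addr in hosts.items() if is_apipa(addr))


suspects = find_apipa_hosts({
    "winxp": "192.168.1.101",
    "win98": "169.254.33.7",
})
```

In this example only win98 is reported, so that is the computer whose path to the DHCP server should be examined first.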

See our article on Specific Networking Problems and Their Solutions for more information.

Pinging

The ping command is the basic tool for testing TCP/IP connectivity. It sends a special packet (called ICMP Echo) to a particular IP address and looks for a reply. If everything is working right, the reply comes back. If not, the ping times out in a few seconds. By default, the ping command repeats the process four times. Here’s an example of an ICS client computer pinging a Windows XP Home Edition ICS host, using the host’s IP address and its computer name:

[Figure: ping output]

When ping fails, you’ll see one of these error messages:

• Request timed out – The IP address is valid, but there’s no reply from it. If the IP address is on a local area network, the most likely cause is a firewall program blocking the ping.

• Unknown host <name> or Ping request could not find host <name> – The computer name doesn’t exist on the local area network. Make sure that NetBIOS over TCP/IP is enabled.

• Destination host unreachable – The IP address isn’t on a local area network, and the default gateway can’t access it. Either there’s no default gateway, its address is wrong, or it isn’t functioning.
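If you save ping output to a file, the three messages above can be recognized with plain text matching. The Python sketch below assumes English-language Windows ping output; the function name and the category labels are hypothetical.

```python
def classify_ping_output(output):
    """Map the text printed by ping to one of the failure types above."""
    text = output.lower()
    if "could not find host" in text or "unknown host" in text:
        return "name not resolved"
    # Checked before "reply from" because some Windows versions print
    # "Reply from <address>: Destination host unreachable."
    if "destination host unreachable" in text:
        return "unreachable"
    if "reply from" in text:
        return "reply"
    if "request timed out" in text:
        return "timed out"
    return "unrecognized"


result = classify_ping_output(
    "Pinging 192.168.1.123 with 32 bytes of data:\n"
    "Request timed out.\n"
    "Request timed out.\n"
)
```

For this input the result is "timed out". A run with some replies and some timeouts is reported as "reply", so treat that label as "at least one reply came back" and not as proof of a healthy link.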


Pinging the Local Area Network

Here is a series of ping commands to use in finding where a problem occurs on a local area network. Run them in the order shown, and don’t go on to the next command until all of the previous commands work properly. In this example:

• The computer being tested is named Winxp, with IP address 192.168.1.101.
• There’s another computer on the network named Win98, with IP address 192.168.1.123.

Substitute the appropriate IP addresses and computer names for your network.

Command               Target                          What Ping Failure Indicates
ping 127.0.0.1        Loopback address                Corrupted TCP/IP installation
ping localhost        Loopback name                   Corrupted TCP/IP installation
ping 192.168.1.101    This computer’s IP address      Corrupted TCP/IP installation
ping winxp            This computer’s name            Corrupted TCP/IP installation
ping 192.168.1.123    Another computer’s IP address   Bad hardware or NIC driver
ping win98            Another computer’s name         NetBIOS name resolution failure

To fix a corrupted TCP/IP installation on Windows XP, follow the steps in this Microsoft Knowledge Base article. For Windows 95/98/Me, un-install the TCP/IP protocol in Control Panel | Network, reboot, and re-install it. If that doesn’t fix it, use this procedure on Windows 95 or 98.
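The rule of running the pings in order and stopping at the first failure can be automated. The following Python sketch is one possible approach, not a tool from the original article: the helper names are hypothetical, the targets are the example names and addresses used above, and the ping helper judges success by the exit code of the system ping command. That exit code is only an approximation, since some Windows versions return success when the only answer is "Destination host unreachable".

```python
import platform
import subprocess


def ping(target, count=1):
    """Run the system ping once and report whether it exited successfully."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    result = subprocess.run(
        ["ping", count_flag, str(count), target],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


# (target, what a failure indicates), in the order they should be run.
LAN_STEPS = [
    ("127.0.0.1", "Corrupted TCP/IP installation"),
    ("localhost", "Corrupted TCP/IP installation"),
    ("192.168.1.101", "Corrupted TCP/IP installation"),
    ("winxp", "Corrupted TCP/IP installation"),
    ("192.168.1.123", "Bad hardware or NIC driver"),
    ("win98", "NetBIOS name resolution failure"),
]


def first_failure(steps, ping_fn=ping):
    """Ping each target in order; return (target, meaning) for the first
    one that fails, or None when every step succeeds."""
    for target, meaning in steps:
        if not ping_fn(target):
            return target, meaning
    return None
```

Calling first_failure(LAN_STEPS) on the computer being tested runs the real pings. The ping_fn parameter exists so that the logic can be exercised without a network, for example first_failure(LAN_STEPS, ping_fn=lambda target: target != "win98").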

Pinging the Internet

You can also use ping to find a problem with Internet access. Run these commands in the order shown, and don’t go on to the next command until all of the previous commands work properly. Use the Default Gateway and DNS Server addresses that you got from the winipcfg or ipconfig /all command.

Command                  Target                What Ping Failure Indicates
ping w.x.y.z             Default Gateway       Default Gateway down
ping w.x.y.z             DNS Server            DNS Server down
ping w.x.y.z             Web site IP address   Internet service provider or web site down
ping www.something.com   Web site name         DNS Server down or web site down
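The last row of the table has two possible causes. One way to tell them apart is to resolve the name separately from testing reachability, as in this Python sketch. The function name diagnose_name is hypothetical, and reachable stands for any check you supply that takes an IP address and returns True or False, such as a wrapper around ping.

```python
import socket


def diagnose_name(name, reachable, resolve=socket.gethostbyname):
    """Separate a DNS failure from an unreachable web site.

    resolve turns a host name into an IP address and raises OSError
    when the lookup fails (socket.gethostbyname behaves this way).
    """
    try:
        address = resolve(name)
    except OSError:
        return "name did not resolve: check the DNS server"
    if not reachable(address):
        return f"{name} resolved to {address} but did not answer: web site or ISP path down"
    return "ok"


def lookup_fails(name):
    # Stand-in resolver that simulates a DNS server that is down.
    raise socket.gaierror("simulated lookup failure")


dns_down = diagnose_name("www.something.com", lambda addr: True, resolve=lookup_fails)
```

Here dns_down reports that the name did not resolve, which points at the DNS server. If the name resolves but the address does not answer, the DNS server is working and the web site or the path to it is the more likely problem.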

REFERENCE

WWW.PRACTICALLYNETWORK.COM

WWW.COMPUTERHOPE.COM

WWW.WIKIPEDIA.ORG


THANK YOU



Page 58: 11591337-Basic-Network-Troubleshooting

Without a managed hub, you may find it very difficult to discover network file server timeout errors. To find the problematic node, you must either systematically isolate nodes by monitoring each node for a prolonged period, or temporarily insert a managed hub.

EXAMPLE: You notice that the captured error packet failed FCS because it was corrupted by a regular pattern during transmission. A possible reason for this occurrence is a jabbering node. This explanation makes sense because FCS/Jabber frames increased linearly when you were monitoring the live network.

Network File Server Timeouts Reference

This section explains terms that are relevant to network file server timeouts and provides additional conceptual and problem analysis detail.

Jabbering

Jabbering occurs when a node transmits illegal-length packets and is possibly not operating within carrier specifications. In effect, another node has written bad data over a valid packet. This bad data is often interpreted as a repeated sequence of data.
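As a rough illustration of the two symptoms described above (an illegal length and a failed FCS), the sketch below labels a captured frame. It assumes the standard untagged Ethernet limits of 64 to 1518 bytes and uses Python's zlib.crc32 as a stand-in for the FCS calculation. The function names, the labels, and the byte order of the trailing checksum are illustrative assumptions, not the behavior of any particular analyzer.

```python
import zlib

MIN_FRAME = 64    # assumed minimum Ethernet frame length in bytes, FCS included
MAX_FRAME = 1518  # assumed maximum untagged Ethernet frame length, FCS included


def with_fcs(body: bytes) -> bytes:
    """Append a CRC-32 checksum to a frame body (stand-in for the real FCS)."""
    return body + zlib.crc32(body).to_bytes(4, "little")


def classify_frame(frame: bytes) -> str:
    """Return a rough label for a captured frame (hypothetical helper)."""
    body, fcs = frame[:-4], frame[-4:]
    fcs_ok = zlib.crc32(body).to_bytes(4, "little") == fcs
    if len(frame) > MAX_FRAME:
        # Over-long frame: with a bad checksum it looks like jabber.
        return "too long" if fcs_ok else "possible jabber"
    if len(frame) < MIN_FRAME:
        return "too short"
    return "ok" if fcs_ok else "FCS error"
```

A frame that is both over-long and fails its checksum is the combination to look for when a jabbering node is suspected.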

Network File System (NFS) Protocol

A distributed file system protocol, developed by Sun Microsystems, that allows a computer system to access files over a network as if they were on its local disks. This protocol has been incorporated into products by more than 200 companies. It is now a de facto Internet standard.

NFS is one protocol in the NFS suite of protocols, which includes NFS, RPC, XDR (External Data Representation), and others. These protocols are part of a larger architecture that Sun Microsystems refers to as Open Network Computing (ONC). ONC is a distributed applications architecture designed by Sun and currently controlled by a consortium led by Sun.

Troubleshooting TCP/IP - Detailed Steps

This article shows how to troubleshoot TCP/IP connectivity between computers on a Windows network. If you haven't already done so, disable XP's Internet Connection Firewall on all local area network connections and remove all firewall programs on the network. Improperly configured firewalls are the most common cause of TCP/IP problems.


Open a Command Prompt Window

For many of these steps, you'll be typing at the command prompt. To open a command prompt window in Windows 2000 or XP, click Start | Run, type cmd in the box, and click OK. To open a command prompt window in Windows 95, 98, or Me, click Start | Run, type command in the box, and click OK. Type one command per line, and press Enter after each one to execute it. To close the command prompt window, use the exit command.

Determine the TCP/IP Settings

Determine the TCP/IP settings of each computer on the local area network. In XP, open the Network Connections folder, right click the LAN connection, and click Status | Support | Details. For example, here are the Status and Details views for the LAN connection on an Internet Connection Sharing host.

In Windows 95/98/Me, click Start | Run, type winipcfg in the box, and click OK. Select the LAN adapter from the menu, and click More Info. Here's the winipcfg view for an ICS client running Windows Me.

You can also see the TCP/IP settings from the command prompt. This is especially convenient if a computer has more than one network adapter. Use the ipconfig /all command, which is available in all versions except Windows 95. The output from this command can be long, so it's best to write it to a file. Specify the file name in the command this way:

ipconfig /all >ipconfig.txt
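Once the output is in a file, the settings can be pulled out programmatically. The following is a minimal Python sketch, assuming the usual "Name . . . : value" layout of ipconfig output. The sample text is made up for illustration (only 192.168.1.101 comes from this article; the gateway and DNS addresses are assumptions), and parse_ipconfig is a hypothetical helper, not part of Windows.

```python
import re

# "Name . . . . : value" -> name and value; the dot leader is discarded.
LINE = re.compile(r"^\s*(.+?)[ .]*:\s*(.*)$")


def parse_ipconfig(text: str) -> dict:
    """Collect the first value seen for each setting name."""
    settings = {}
    for line in text.splitlines():
        match = LINE.match(line)
        if match and match.group(2):  # skip adapter headings with no value
            settings.setdefault(match.group(1), match.group(2))
    return settings


# Illustrative sample only; real output has more fields and varies by version.
SAMPLE = """\
Ethernet adapter Local Area Connection:

        IP Address. . . . . . . . . . . . : 192.168.1.101
        Subnet Mask . . . . . . . . . . . : 255.255.255.0
        Default Gateway . . . . . . . . . : 192.168.1.1
        DNS Servers . . . . . . . . . . . : 192.168.1.1
"""

settings = parse_ipconfig(SAMPLE)
print(settings["Default Gateway"])
```

To use it on a real capture, read ipconfig.txt with open() and pass its contents to parse_ipconfig. A computer with several adapters would need the headings tracked as well, which this sketch does not do.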

Description of TCP/IP Settings

Here are the TCP/IP settings that are used in network troubleshooting:

• IP Address – Unique address assigned to a network adapter. A computer with multiple network adapters has an IP address for each one, and each one must be in a different subnet.

• Subnet Mask – Used in conjunction with the IP address to determine which subnet an adapter belongs to. At the simplest level, communication is only possible between two network adapters when they're in the same subnet.

• Default Gateway – IP address of a computer or router, on one of this computer's local area networks, that knows how to communicate with subnets not present on this computer. For an Internet connection, the default gateway is a router belonging to your Internet service provider, and all access to sites on the Internet goes through it. For an ICS client, the default gateway is the ICS host. If you use a hardware router, it serves as the default gateway.

• DHCP Server – If an adapter is configured to obtain an IP address automatically, this is the address of the server that provides it. It could be your ISP, an ICS host, or a hardware router.

• DNS Servers – IP address of one or more Domain Name Server computers. DNS servers translate Internet names to their IP addresses (like 63.146.109.227).
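The way the subnet mask and IP address combine can be checked with Python's standard ipaddress module. This is a small sketch rather than a troubleshooting tool: same_subnet is a hypothetical helper, and the 255.255.255.0 mask is an assumption, since the article does not state the mask used on the example network.

```python
import ipaddress


def same_subnet(ip_a: str, ip_b: str, mask: str) -> bool:
    """True when both addresses fall in the same subnet under the given mask."""
    net_a = ipaddress.ip_interface(f"{ip_a}/{mask}").network
    net_b = ipaddress.ip_interface(f"{ip_b}/{mask}").network
    return net_a == net_b


# The two example computers used later in this article, with an assumed mask.
print(same_subnet("192.168.1.101", "192.168.1.123", "255.255.255.0"))
```

If the function returns False for two computers that are supposed to talk directly, the addresses or masks are the first thing to check.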

Subnets

See our article on subnets for a brief description of how they work. For more details, see this Microsoft Knowledge Base article.

If two computers are supposed to be on the same subnet but aren't, something is wrong with the network hardware or software configuration. This is most likely to happen when one of them receives an IP address of 169.254.x.x, which indicates that:

• It's configured to obtain an IP address automatically.
• It couldn't find a DHCP server on the network to make the assignment.
• Windows assigned it an Automatic Private IP Address.
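Spotting an automatically assigned address is a simple range check. The sketch below assumes the 169.254.x.x range corresponds to the network 169.254.0.0/16; is_apipa is a hypothetical helper name, and 169.254.10.20 is just an illustrative address.

```python
import ipaddress

# Assumed range for Automatic Private IP Addressing (169.254.x.x).
APIPA_NETWORK = ipaddress.ip_network("169.254.0.0/16")


def is_apipa(address: str) -> bool:
    """True when the address lies in the 169.254.x.x range."""
    return ipaddress.ip_address(address) in APIPA_NETWORK


print(is_apipa("169.254.10.20"))
print(is_apipa("192.168.1.101"))
```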

See our article on Specific Networking Problems and Their Solutions for more information.

Pinging

The ping command is the basic tool for testing TCP/IP connectivity. It sends a special packet (called ICMP Echo) to a particular IP address and looks for a reply. If everything is working right, the reply comes back. If not, the ping times out in a few seconds. By default, the ping command repeats the process four times. Here's an example of an ICS client computer pinging a Windows XP Home Edition ICS host, using the host's IP address and its computer name.
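To make the "special packet" concrete, here is a minimal Python sketch that builds the bytes of an ICMP Echo request (type 8, code 0) and its checksum. It only constructs the packet; sending it requires a raw socket and usually administrator rights, which is left out. The function names, the identifier and sequence values, and the payload are illustrative assumptions.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum used by ICMP."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:  # fold the carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo_request(identifier: int, sequence: int, payload: bytes = b"ping") -> bytes:
    """Return an ICMP Echo request: type 8, code 0, checksum, id, sequence, payload."""
    header = struct.pack("!BBHHH", 8, 0, 0, identifier, sequence)
    checksum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, checksum, identifier, sequence) + payload


packet = build_echo_request(identifier=1, sequence=1)
print(packet.hex())
```

A receiver can validate the packet by running the same checksum over all of its bytes; a correct packet yields zero.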

When ping fails, you'll see one of these error messages:

• Request timed out – The IP address is valid, but there's no reply from it. If the IP address is on a local area network, the most likely cause is a firewall program blocking the ping.

• Unknown host <name> or Ping request could not find host <name> – The computer name doesn't exist on the local area network. Make sure that NetBIOS over TCP/IP is enabled.

• Destination host unreachable – The IP address isn't on a local area network, and the default gateway can't access it. Either there's no default gateway, its address is wrong, or it isn't functioning.
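The messages above can be mapped to their likely causes with a few string checks. This is a sketch that assumes the English wording quoted in this article; explain_ping_output is a hypothetical helper, and real ping output can differ between Windows versions and languages.

```python
def explain_ping_output(output: str) -> str:
    """Map captured ping output to the likely cause described above."""
    text = output.lower()
    if "request timed out" in text:
        return "No reply: on a LAN, most likely a firewall blocking the ping"
    if "unknown host" in text or "could not find host" in text:
        return "Name not found: check that NetBIOS over TCP/IP is enabled"
    # Checked before "reply from", because the unreachable message can be
    # reported as a reply from a gateway.
    if "destination host unreachable" in text:
        return "No route: default gateway missing, wrong, or not functioning"
    if "reply from" in text:
        return "Reply received"
    return "Unrecognized output"


print(explain_ping_output("Request timed out."))
```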


Pinging the Local Area Network

Here is a series of ping commands to use in finding where a problem occurs on a local area network. Run them in the order shown, and don't go on to the next command until all of the previous commands work properly. In this example:

• The computer being tested is named Winxp, with IP address 192.168.1.101.
• There's another computer on the network named Win98, with IP address 192.168.1.123.

Substitute the appropriate IP addresses and computer names for your network.

Command | Target | What Ping Failure Indicates
ping 127.0.0.1 | Loopback address | Corrupted TCP/IP installation
ping localhost | Loopback name | Corrupted TCP/IP installation
ping 192.168.1.101 | This computer's IP address | Corrupted TCP/IP installation
ping winxp | This computer's name | Corrupted TCP/IP installation
ping 192.168.1.123 | Another computer's IP address | Bad hardware or NIC driver
ping win98 | Another computer's name | NetBIOS name resolution failure

To fix a corrupted TCP/IP installation on Windows XP, follow the steps in this Microsoft Knowledge Base article. For Windows 95/98/Me, un-install the TCP/IP protocol in Control Panel | Network, reboot, and re-install it. If that doesn't fix it, use this procedure on Windows 95 or 98.
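The "run in order and stop at the first failure" rule for the local area network pings can be sketched in Python as follows. The targets and meanings are taken from the table in this section; first_failure and system_ping are hypothetical helpers. system_ping assumes the -n count flag on Windows and -c elsewhere, and it treats a zero exit code as success, which is only an approximation of what the ping output says.

```python
import platform
import subprocess
from typing import Callable, Iterable, Optional, Tuple

# (target, what a failure indicates), in the order given in this section.
LAN_STEPS = [
    ("127.0.0.1", "Corrupted TCP/IP installation"),
    ("localhost", "Corrupted TCP/IP installation"),
    ("192.168.1.101", "Corrupted TCP/IP installation"),
    ("winxp", "Corrupted TCP/IP installation"),
    ("192.168.1.123", "Bad hardware or NIC driver"),
    ("win98", "NetBIOS name resolution failure"),
]


def system_ping(target: str) -> bool:
    """Send one ping with the operating system's ping command."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    result = subprocess.run(["ping", count_flag, "1", target], capture_output=True)
    return result.returncode == 0


def first_failure(
    steps: Iterable[Tuple[str, str]], ping: Callable[[str], bool]
) -> Optional[Tuple[str, str]]:
    """Run the pings in order; return the first failing target and its meaning."""
    for target, meaning in steps:
        if not ping(target):
            return target, meaning
    return None
```

Calling first_failure(LAN_STEPS, system_ping) runs the real commands; passing a different function for ping makes it possible to try the logic without touching the network.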

Pinging the Internet

You can also use ping to find a problem with Internet access. Run these commands in the order shown, and don't go on to the next command until all of the previous commands work properly. Use the Default Gateway and DNS Server addresses that you got from the winipcfg or ipconfig /all command.


Command | Target | What Ping Failure Indicates
ping w.x.y.z | Default Gateway | Default Gateway down
ping w.x.y.z | DNS Server | DNS Server down
ping w.x.y.z | Web site IP address | Internet service provider or web site down
ping www.something.com | Web site name | DNS Server down or web site down
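The Internet table reads as a decision list: the first ping that fails names the likely culprit. A minimal Python sketch of that reading follows. diagnose_internet is a hypothetical helper that takes the four results as already-known True or False values, so it does not run any pings itself.

```python
def diagnose_internet(
    gateway_ok: bool, dns_ok: bool, site_ip_ok: bool, site_name_ok: bool
) -> str:
    """Return what the first failed ping indicates, following the table order."""
    if not gateway_ok:
        return "Default Gateway down"
    if not dns_ok:
        return "DNS Server down"
    if not site_ip_ok:
        return "Internet service provider or web site down"
    if not site_name_ok:
        return "DNS Server down or web site down"
    return "No problem found by ping"


# Example: everything answers except the web site name.
print(diagnose_internet(True, True, True, False))
```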

REFERENCE

www.practicallynetwork.com

www.computerhope.com

www.wikipedia.org


THANK YOU



                                                  • LANsentry Manager Network Statistics Graph
                                                  • Device View
                                                  • Alignment Errors
                                                  • Collisions
                                                  • CRC Errors
                                                  • Excessive Collisions
                                                  • FCS Errors
                                                  • Late Collisions
                                                  • Nonstandard Ethernet Problems
                                                  • Receive Discards
                                                  • Too Long Errors
                                                  • Too Short Errors
                                                  • Transmit Discards
                                                    • FDDI Ring Errors
                                                      • Understanding the Problem
                                                      • Identifying the Problem
                                                      • Solving the Problem
                                                      • Status Watch
                                                      • Elasticity Buffer Error Condition
                                                      • Frame Error Condition
                                                      • Frames Not Copied Condition
                                                      • Link Error Condition
                                                      • MAC Neighbor Change Event
                                                        • Network File Server Timeouts
                                                          • Understanding the Problem
                                                          • Identifying the Problem
                                                          • Solving the Problem
                                                          • Ping and Telnet
                                                          • LANsentry Manager Alarms View
                                                          • LANsentry Manager Statistics View
                                                          • LANsentry Manager History View
                                                          • LANsentry Manager Top-N Graph
                                                          • LANsentry Manager Packet Capture
                                                          • LANsentry Manager Packet Decode
                                                          • Address Tracker
                                                          • LANsentry Manager Packet Decode
                                                          • Jabbering
                                                          • Network File System (NFS) Protocol
                                                          • Troubleshooting TCPIP - Detailed Steps
                                                          • Open a Command Prompt Window
                                                          • Determine the TCPIP Settings
                                                          • Description of TCPIP Settings
                                                          • Subnets
                                                          • Pinging
                                                          • Pinging the Local Area Network
                                                          • Pinging the Internet
Page 60: 11591337-Basic-Network-Troubleshooting

through it. For an ICS client, the default gateway is the ICS host. If you use a hardware router, it serves as the default gateway.

• DHCP Server – If an adapter is configured to obtain an IP address automatically, this is the address of the server that provides it. It could be your ISP, an ICS host, or a hardware router.

• DNS Servers – IP address of one or more Domain Name Server computers. DNS servers translate Internet names to their IP addresses (like 63.146.109.227).

Subnets

See our article on subnets for a brief description of how they work. For more details, see this Microsoft Knowledge Base article.

If two computers are supposed to be on the same subnet but aren't, something is wrong with the network hardware or software configuration. This is most likely to happen when one of them receives an IP address of 169.254.x.x, which indicates that:

• It's configured to obtain an IP address automatically.
• It couldn't find a DHCP server on the network to make the assignment.
• Windows assigned it an Automatic Private IP Address.

See our article on Specific Networking Problems and Their Solutions for more information.
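The two checks described in this section (is an address in the 169.254.x.x range, and are two computers on the same subnet) can be sketched in Python with the standard ipaddress module. This is only an illustration: the function names are made up for this example, and the subnet mask 255.255.255.0 is an assumption, so use the mask that your own TCP/IP settings report.

```python
import ipaddress

# 169.254.0.0/16 is the range used for Automatic Private IP Addresses.
APIPA_RANGE = ipaddress.ip_network("169.254.0.0/16")


def is_apipa(address: str) -> bool:
    """Return True if the address is in the 169.254.x.x automatic private range."""
    return ipaddress.ip_address(address) in APIPA_RANGE


def same_subnet(addr_a: str, addr_b: str, netmask: str) -> bool:
    """Return True if both addresses fall in the same subnet under the given mask."""
    # strict=False lets us pass a host address and have the network derived from it.
    net_a = ipaddress.ip_network(f"{addr_a}/{netmask}", strict=False)
    net_b = ipaddress.ip_network(f"{addr_b}/{netmask}", strict=False)
    return net_a == net_b


if __name__ == "__main__":
    # Example addresses only; the mask is an assumed value.
    print(is_apipa("169.254.10.20"))
    print(same_subnet("192.168.1.101", "192.168.1.123", "255.255.255.0"))
    print(same_subnet("192.168.1.101", "169.254.10.20", "255.255.255.0"))
```

If is_apipa reports True for a computer that should have received an address from a DHCP server, that computer is the one to investigate first.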

Pinging

The ping command is the basic tool for testing TCP/IP connectivity. It sends a special packet (called ICMP Echo) to a particular IP address and looks for a reply. If everything is working right, the reply comes back. If not, the ping times out in a few seconds. By default, the ping command repeats the process four times. Here's an example of an ICS client computer pinging a Windows XP Home Edition ICS host, using the host's IP address and its computer name.

When ping fails, you'll see one of these error messages:

• Request timed out – The IP address is valid, but there's no reply from it. If the IP address is on a local area network, the most likely cause is a firewall program blocking the ping.

• Unknown host <name> or Ping request could not find host <name> – The computer name doesn't exist on the local area network. Make sure that NetBIOS over TCP/IP is enabled.

• Destination host unreachable – The IP address isn't on a local area network, and the default gateway can't access it. Either there's no default gateway, its address is wrong, or it isn't functioning.
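As a rough illustration, the three messages above can be matched against captured ping output and turned into the hint given for each one. The function name and the wording of the hints are made up for this sketch, and the matching is a simple text search that assumes English-language ping output.

```python
def explain_ping_failure(output: str) -> str:
    """Map ping output text to the likely cause described for each error message."""
    text = output.lower()
    if "request timed out" in text:
        return "No reply from a valid address; on a LAN, suspect a firewall blocking ping."
    if "unknown host" in text or "could not find host" in text:
        return "Name not found on the LAN; check that NetBIOS over TCP/IP is enabled."
    if "destination host unreachable" in text:
        return "Address is off the LAN and the default gateway cannot reach it."
    return "No known failure message found."


if __name__ == "__main__":
    print(explain_ping_failure("Request timed out."))
    print(explain_ping_failure("Ping request could not find host win98."))
```

The function only recognizes the three messages listed here; anything else falls through to the last line.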


Pinging the Local Area Network

Here is a series of ping commands to use in finding where a problem occurs on a local area network. Run them in the order shown, and don't go on to the next command until all of the previous commands work properly. In this example:

• The computer being tested is named Winxp, with IP address 192.168.1.101.
• There's another computer on the network named Win98, with IP address 192.168.1.123.

Substitute the appropriate IP addresses and computer names for your network.

Command              Target                          What Ping Failure Indicates
ping 127.0.0.1       Loopback address                Corrupted TCP/IP installation
ping localhost       Loopback name                   Corrupted TCP/IP installation
ping 192.168.1.101   This computer's IP address      Corrupted TCP/IP installation
ping winxp           This computer's name            Corrupted TCP/IP installation
ping 192.168.1.123   Another computer's IP address   Bad hardware or NIC driver
ping win98           Another computer's name         NetBIOS name resolution failure

To fix a corrupted TCP/IP installation on Windows XP, follow the steps in this Microsoft Knowledge Base article. For Windows 95/98/Me, uninstall the TCP/IP protocol in Control Panel | Network, reboot, and reinstall it. If that doesn't fix it, use this procedure on Windows 95 or 98.
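The ordered sequence of local area network pings can also be scripted. The sketch below is one possible way to do it, not a definitive tool: the names LAN_CHECKS, system_ping, and first_failure are invented for this example, and the addresses and computer names are the example values from this section. It relies on the exit code of the system ping command, which is only a rough signal (some systems report success even when the reply is an unreachable message), so read the actual ping output when in doubt.

```python
import platform
import subprocess
from typing import Callable, List, Optional, Tuple

# (target, what a failure indicates), in the order the checks should be run.
LAN_CHECKS: List[Tuple[str, str]] = [
    ("127.0.0.1", "Corrupted TCP/IP installation"),
    ("localhost", "Corrupted TCP/IP installation"),
    ("192.168.1.101", "Corrupted TCP/IP installation"),
    ("winxp", "Corrupted TCP/IP installation"),
    ("192.168.1.123", "Bad hardware or NIC driver"),
    ("win98", "NetBIOS name resolution failure"),
]


def system_ping(target: str) -> bool:
    """Run the system ping command four times and report whether it exited with success."""
    # Windows uses -n for the count; most other systems use -c.
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    result = subprocess.run(
        ["ping", count_flag, "4", target],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def first_failure(
    checks: List[Tuple[str, str]],
    ping: Callable[[str], bool] = system_ping,
) -> Optional[Tuple[str, str]]:
    """Run the checks in order and stop at the first one that fails."""
    for target, meaning in checks:
        if not ping(target):
            return target, meaning
    return None


if __name__ == "__main__":
    failure = first_failure(LAN_CHECKS)
    if failure is None:
        print("All local area network pings succeeded.")
    else:
        print(f"ping {failure[0]} failed: {failure[1]}")
```

Stopping at the first failure mirrors the advice not to go on to the next command until all of the previous ones work. The ping argument can be replaced with any function that returns True or False, which makes it easy to try the logic without touching the network.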

Pinging the Internet

You can also use ping to find a problem with Internet access. Run these commands in the order shown, and don't go on to the next command until all of the previous commands work properly. Use the Default Gateway and DNS Server addresses that you got from the winipcfg or ipconfig /all command.


Command                  Target                What Ping Failure Indicates
ping w.x.y.z             Default Gateway       Default Gateway down
ping w.x.y.z             DNS Server            DNS Server down
ping w.x.y.z             Web site IP address   Internet service provider or web site down
ping www.something.com   Web site name         DNS Server down or web site down
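The last row of the table leaves two possibilities open: the DNS server or the web site. One way to narrow it down, sketched below, is to ask the operating system to resolve the name without pinging anything. This step goes beyond the ping sequence itself; the function names are invented for this example, www.something.com is the placeholder name from the table, and the lookup may be answered from a local cache or hosts file rather than by the DNS server.

```python
import socket
from typing import Callable


def name_resolves(hostname: str) -> bool:
    """Return True if the operating system can translate the name to an IP address."""
    try:
        socket.gethostbyname(hostname)
        return True
    except socket.gaierror:
        return False


def diagnose_name_failure(
    hostname: str,
    resolver: Callable[[str], bool] = name_resolves,
) -> str:
    """Use after ping by name has failed while ping by IP address still works."""
    if resolver(hostname):
        return "Name resolves: the web site is probably down or not answering ping."
    return "Name does not resolve: suspect the DNS server or the DNS settings."


if __name__ == "__main__":
    # Placeholder name; substitute the site you were trying to reach.
    print(diagnose_name_failure("www.something.com"))
```

The resolver argument exists only so the decision logic can be exercised without a network connection.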

REFERENCE

WWW.PRACTICALLYNETWORK.COM

WWW.COMPUTERHOPE.COM

WWW.WIKIPEDIA.ORG


THANK YOU





                                                          • LANsentry Manager Packet Capture
                                                          • LANsentry Manager Packet Decode
                                                          • Address Tracker
                                                          • LANsentry Manager Packet Decode
                                                          • Jabbering
                                                          • Network File System (NFS) Protocol
                                                          • Troubleshooting TCP/IP - Detailed Steps
                                                          • Open a Command Prompt Window
                                                          • Determine the TCP/IP Settings
                                                          • Description of TCP/IP Settings
                                                          • Subnets
                                                          • Pinging
                                                          • Pinging the Local Area Network
                                                          • Pinging the Internet
Page 63: 11591337-Basic-Network-Troubleshooting

THANK YOU
