Fault Management
IACT 918 July 2004
Gene Awyzio
SITACS University of Wollongong
Overview
• Fault Management is the process of locating and correcting network problems or faults
• Comprehensive fault management is probably the most important task in Network Management
Benefits of the Fault Management Process
• Increased network reliability
– Provides tools that allow the engineer to quickly
• Detect problems
• Initiate recovery procedures
• Need to maintain the illusion of complete and continuous connectivity
• Also provides tools to extract information about the network's current state
Accomplishing Fault Management
• Can be considered a three (3) step process
– Identify the fault
– Isolate the cause of the fault
– Correct the fault if possible
Identifying the fault
• Gathering information to identify a problem
– To learn that a problem exists, we need to gather data about the current state of the network
• Two approaches
– Log critical network events
– Poll network devices
Identifying the fault
• Critical network events
– Examples
• Failure of a link
• Lack of response from a host
– Transmitted by a network device when a fault condition occurs
– Reactive method
– If the device itself fails, it cannot send an event
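Because event reporting is reactive, a device that has failed outright stays silent. A minimal sketch of one common remedy, assuming the manager records when it last heard from each device (through an event or a poll reply): treat long silence as a possible fault alongside the events that were actually reported. The names `Event`, `suspected_faults`, the event kinds, and the timeout are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Event:
    device: str
    kind: str          # hypothetical event names, e.g. "link_down"
    timestamp: float   # seconds


def suspected_faults(events, last_heard, now, timeout):
    """Combine reported events with silence detection.

    events:     events received from devices (the reactive method)
    last_heard: device -> time of its last event or poll reply (assumed available)
    now:        current time in seconds
    timeout:    seconds of silence after which a device is suspected
    """
    reported = {e.device for e in events}
    # A failed device cannot report itself, so also flag anything silent too long.
    silent = {d for d, t in last_heard.items() if now - t > timeout}
    return {"reported": reported, "silent": silent - reported}


events = [Event("router1", "link_down", 100.0)]
last_heard = {"router1": 100.0, "switch2": 10.0, "host3": 95.0}
print(suspected_faults(events, last_heard, now=120.0, timeout=60.0))
```

Here `switch2` never sent an event, but it is flagged because nothing has been heard from it for 110 seconds.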
Identifying the fault
• Periodic polling
– Can help find faults in a timely manner
– Tradeoff
• Degree of timeliness vs bandwidth consumption
– Other factors
• Number of devices to poll
• Bandwidth of links
– Example
• Assume each query and each response is 100 bytes long (including data and header information)
• For a network of 30 devices
– (100 + 100) * 30 = 6,000 bytes/polling interval = 48,000 bits/polling interval
• Polling every minute
– 48,000 bits / 60 seconds = 800 bits/second
– 48,000 bits/polling interval * 60 polls/hour = 2,880,000 bits/hour ≈ 2.9 Megabits/hour
• Polling every 10 minutes
– 48,000 bits/polling interval * 6 polls/hour = 288,000 bits/hour ≈ 0.29 Megabits/hour (80 bits/second)
– May not learn of an event for up to 10 minutes
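The polling arithmetic can be checked with a short Python sketch. The function name `polling_overhead` is hypothetical and, as in the example, it assumes one fixed-size query and one fixed-size response per device per polling interval.

```python
def polling_overhead(devices, query_bytes, response_bytes, interval_s):
    """Return (bits per polling interval, bits/second, bits/hour) for periodic polling."""
    bits_per_interval = (query_bytes + response_bytes) * devices * 8
    bits_per_second = bits_per_interval / interval_s
    polls_per_hour = 3600 / interval_s
    bits_per_hour = bits_per_interval * polls_per_hour
    return bits_per_interval, bits_per_second, bits_per_hour


for minutes in (1, 10):
    cycle, bps, bph = polling_overhead(30, 100, 100, minutes * 60)
    print(f"every {minutes} min: {cycle} bits/interval, {bps:g} bit/s, {bph / 1e6:g} Mbit/hour")
```

This prints 800 bit/s and 2.88 Mbit/hour for one-minute polling, and 80 bit/s and 0.288 Mbit/hour for ten-minute polling: a tenfold longer interval cuts the load tenfold but delays detection by up to ten minutes.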
Deciding Which Faults to Manage
• Need to decide which faults to manage
– Need to prioritise faults
– If the number of fault reports is high, the network may not be able to handle the volume
– Limiting event traffic can reduce redundant transmissions and storage
• Factors to consider
– Scope of control over network
– Size of network
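One way to prioritise faults and limit event traffic is to drop reports below a chosen severity and suppress repeats of the same fault within a time window. A minimal sketch; the severity scale, the `FaultReport` fields, and the 300-second window are assumptions for illustration, not values from the slides.

```python
from dataclasses import dataclass

# Hypothetical severity scale: higher number = more important.
SEVERITY = {"info": 0, "warning": 1, "major": 2, "critical": 3}


@dataclass(frozen=True)
class FaultReport:
    device: str
    fault: str
    severity: str
    timestamp: float  # seconds


def filter_reports(reports, min_severity="major", window_s=300.0):
    """Keep reports at or above min_severity, suppressing repeats of the same
    (device, fault) pair that arrive within window_s of the last kept one."""
    threshold = SEVERITY[min_severity]
    last_kept = {}  # (device, fault) -> timestamp of the last forwarded report
    kept = []
    for rep in sorted(reports, key=lambda rep: rep.timestamp):
        if SEVERITY[rep.severity] < threshold:
            continue  # below the priority we have chosen to manage
        pair = (rep.device, rep.fault)
        if pair in last_kept and rep.timestamp - last_kept[pair] < window_s:
            continue  # redundant repeat: don't transmit or store it again
        last_kept[pair] = rep.timestamp
        kept.append(rep)
    return kept


reports = [
    FaultReport("r1", "link_down", "critical", 0.0),
    FaultReport("r1", "link_down", "critical", 60.0),
    FaultReport("h2", "high_cpu", "warning", 10.0),
    FaultReport("r1", "link_down", "critical", 400.0),
]
print([(r.device, r.timestamp) for r in filter_reports(reports)])
```

The repeat at 60 s is suppressed and the warning is filtered out; the report at 400 s is forwarded because the window has expired. Choosing the threshold and window is where the scope of control and the size of the network come in.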
Fault Management in a Network Management System
• Simplest system
– Reports the existence of a fault but NOT its location
• More complex tool
– Uses the capabilities of hosts and network devices to
• Send critical network events
• Facilitate isolation of the fault's cause
• Advanced tool
– Can also correct the fault
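Isolating the cause often means telling apart the device that actually failed from the devices that are merely unreachable behind it. A minimal sketch, assuming a tree topology (one path from the management station to each device) and that every device is polled; `isolate_root_causes` and the device names are hypothetical.

```python
def isolate_root_causes(parent, unreachable):
    """parent[device] is the next hop toward the management station
    (None if attached directly to it); unreachable is the set of devices
    that did not answer a poll. Returns the likely causes: unreachable
    devices whose upstream neighbour is still reachable. Devices behind
    them are unreachable only as a consequence."""
    causes = set()
    for device in unreachable:
        upstream = parent[device]
        if upstream is None or upstream not in unreachable:
            causes.add(device)
    return causes


topology = {
    "core": None,
    "dist1": "core",
    "dist2": "core",
    "access1": "dist1",
    "access2": "dist1",
    "host7": "access2",
}
print(isolate_root_causes(topology, {"dist1", "access1", "access2", "host7"}))
```

Four devices fail to answer, but only `dist1` is reported as the likely cause. With redundant paths or a failed link (rather than a failed device) the reasoning needs more information than this simplification uses.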
Impact of a Fault on the Network
• A fault management tool MUST be capable of analysing how a fault can affect other areas of the network
• Need to know
– What services the fault
• STOPS
• IMPACTS
– Not only that a fault has occurred, but also how that fault affects other network communication
• Data can come from performance management tools
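A minimal sketch of separating services a fault STOPS from those it only IMPACTS, assuming each service is described by its alternative paths through the network (a hypothetical model). Here "impacted" is interpreted as "still available, but with lost redundancy"; in practice the degree of impact would come from performance data.

```python
def classify_impact(services, failed):
    """services: name -> list of alternative paths, each a set of network
    elements the service can run over. failed: set of failed elements.
    Returns name -> "stopped" | "impacted" | "ok"."""
    result = {}
    for name, paths in services.items():
        broken = [p for p in paths if p & failed]
        if len(broken) == len(paths):
            result[name] = "stopped"   # no working path remains
        elif broken:
            result[name] = "impacted"  # still runs, with less redundancy
        else:
            result[name] = "ok"
    return result


services = {
    "email": [{"core", "dist1", "mailsrv"}],
    "web": [{"core", "dist1", "websrv"}, {"core", "dist2", "websrv"}],
    "backup": [{"core", "dist2", "nas"}],
}
print(classify_impact(services, {"dist1"}))
# {'email': 'stopped', 'web': 'impacted', 'backup': 'ok'}
```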
Form of Reporting Faults
• Common forms of fault reporting
– Text
– Graphical
– Auditory signals
• Text
– Will work on any type of terminal
• Graphical
– Considered to be very effective
– Can use flashing images to gain attention
– Colour can be used to indicate device status
• Auditory signals
– Will quickly call attention to the occurrence of a fault
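The three forms can be combined in a single status line. A minimal sketch, assuming an ANSI-capable terminal for colour and using the ASCII BEL character as the auditory signal; plain text remains the default so the output works on any terminal. `format_status` and the status names are hypothetical.

```python
# ANSI colour codes (assumes a terminal that supports them).
COLOURS = {"up": "\033[32m", "degraded": "\033[33m", "down": "\033[31m"}
RESET = "\033[0m"
BELL = "\a"  # ASCII BEL: most terminals beep or flash


def format_status(device, status, use_colour=False, audible=False):
    """Render one status line: plain text by default, optionally colour-coded
    by status, with a bell prepended for devices that are down."""
    line = f"{device:<10} {status.upper()}"
    if use_colour:
        line = f"{COLOURS.get(status, '')}{line}{RESET}"
    if audible and status == "down":
        line = BELL + line
    return line


print(format_status("router1", "up"))
print(format_status("switch2", "down", use_colour=True, audible=True))
```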