Fault Management

IACT 918 July 2004

Gene Awyzio

SITACS University of Wollongong

Overview

• Fault Management is the process of locating and correcting network problems or faults

• Comprehensive fault management is probably the most important task in Network Management

Benefits of Fault Management Process

• Increased network reliability

– Provides tools allowing engineer to quickly

• Detect problems

• Initiate recovery procedures

• Need to maintain the illusion of complete and continuous connectivity

• Also provides tools to extract information about the network's current state

Accomplishing Fault Management

• Can be considered as a three (3) step process

– Identify the fault

– Isolate the cause of the fault

– Correct the fault if possible

Identifying the fault

• Gathering Information to identify a problem

– To learn that a problem exists we need to gather data about the current state of the network

• Two approaches

– Log critical network events

– Poll network devices

Identifying the fault

• Critical network events

– Examples

• Failure of a link

• Lack of response from host

– Transmitted by network device when fault conditions occur

– Reactive method

– If device fails it cannot send an event
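
The limitation of the reactive method can be illustrated with a minimal sketch. The device names, the `EventLog` class and the `poll` function are hypothetical and only model the idea: a device that is still running can report a fault, while a device that has failed completely stays silent until something asks it.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class EventLog:
    """Collects critical events pushed by devices (hypothetical model)."""
    events: List[str] = field(default_factory=list)

    def receive(self, device: str, message: str) -> None:
        # Store the event exactly as reported by the device.
        self.events.append(f"{device}: {message}")


def poll(devices_up: Dict[str, bool]) -> Set[str]:
    """Return the names of devices that did not answer a poll."""
    return {name for name, up in devices_up.items() if not up}


# Assumed network state: router-b has failed completely.
state = {"router-a": True, "router-b": False, "host-c": True}

log = EventLog()
# router-a is still running, so it can report the link it lost.
log.receive("router-a", "link to router-b down")
# router-b is down and sends nothing; only a poll reveals its failure.
silent = poll(state)
```

In this model the event log never contains an entry from `router-b` itself, which is why event logging is usually paired with some form of polling.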

Identifying the fault

• Occasional Polling

– Can help find faults in a timely manner

– Tradeoff

• Degree of timeliness vs bandwidth consumption

– Other factors

• Number of devices to poll

• Bandwidth of links

– Example

• Assume each query and response is 100 bytes long (including data and header information)

• For a network of 30 devices

– (100 + 100) * 30 = 6,000 bytes/polling interval = 48,000 bits/polling interval

• Polling every minute

– 48,000 bits / 60 secs = 800 bits/second

– 48,000 bits/polling interval * 60 polls/hour = 2,880,000 bits = 2.88 Megabits/hour

• Polling every 10 minutes

– 48,000 bits/polling interval * 6 polls/hour = 288,000 bits = 0.288 Megabits/hour

– May not know about event for 10 minutes
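
The polling arithmetic can be reproduced with a short sketch, which makes it easy to try other device counts and intervals. The function name `polling_load` and its parameters are hypothetical; the inputs are the ones assumed in the example (30 devices, 100-byte query, 100-byte response).

```python
def polling_load(devices: int, query_bytes: int, response_bytes: int,
                 interval_seconds: float) -> dict:
    """Estimate the traffic generated by polling every device once per interval."""
    # One query plus one response per device, converted from bytes to bits.
    bits_per_interval = (query_bytes + response_bytes) * devices * 8
    polls_per_hour = 3600 / interval_seconds
    return {
        "bits_per_interval": bits_per_interval,
        "bits_per_second": bits_per_interval / interval_seconds,
        "bits_per_hour": bits_per_interval * polls_per_hour,
    }


# 30 devices, 100-byte query and response: 48,000 bits per polling interval.
every_minute = polling_load(30, 100, 100, 60)         # 800 bits/second on average
every_ten_minutes = polling_load(30, 100, 100, 600)   # a fault may go unseen for 10 minutes

for label, load in (("1 min", every_minute), ("10 min", every_ten_minutes)):
    print(label, load["bits_per_second"], "bits/s,",
          load["bits_per_hour"] / 1e6, "Megabits/hour")
```

Lengthening the interval reduces the load proportionally, at the cost of a longer worst-case delay before a fault is noticed.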

Deciding Which Faults to Manage

• Need to decide which faults to manage

– Need to prioritise faults

– If the number of fault reports is high, the network may not handle the volume

– Limiting event traffic can reduce redundant transmissions and storage

• Factors to consider

– Scope of control over network

– Size of network
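
One possible way to prioritise faults and limit redundant event traffic is sketched below. The three-level severity scale, the event tuple layout and the `select_events` function are assumptions made for illustration, not part of any particular management system.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical severity scale: a higher number means a more important fault.
SEVERITY = {"info": 0, "warning": 1, "critical": 2}

Event = Tuple[str, str, str]  # (device, severity, description)


def select_events(events: Iterable[Event], minimum: str = "warning") -> List[Event]:
    """Keep events at or above `minimum`, drop repeats, most severe first."""
    threshold = SEVERITY[minimum]
    seen: Set[Tuple[str, str]] = set()
    kept: List[Event] = []
    for device, severity, description in events:
        if SEVERITY[severity] < threshold:
            continue  # below the priority we chose to manage
        key = (device, description)
        if key in seen:
            continue  # redundant repeat of an already-reported fault
        seen.add(key)
        kept.append((device, severity, description))
    # Stable sort: arrival order is preserved within a severity level.
    kept.sort(key=lambda event: SEVERITY[event[1]], reverse=True)
    return kept


incoming = [
    ("host-c", "info", "user logged in"),
    ("router-a", "warning", "high error rate"),
    ("router-a", "critical", "link down"),
    ("router-a", "critical", "link down"),
]
managed = select_events(incoming)
```

Here the informational event and the duplicate "link down" report are discarded, so only two events need to be transmitted and stored.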

Fault Management of a Network Management System

• Simplest system

– Reports existence of fault but NOT location

• More complex tool

– Uses capability of hosts and network devices to

• Send critical network events

• Facilitate isolation of fault cause

• Advanced tool

– Correction of fault

Impact of a Fault on the Network

• A fault management tool MUST be capable of analysing how a fault can affect other areas of the network

• Need to know

– What services the fault

• STOPS

• IMPACTS

– Not only that a fault has occurred but also how that fault affects other network communication

• Data can come from performance management tools
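
A minimal sketch of one way to work out what a fault stops: treat the network as a graph and find every node that can no longer be reached from the management station once the failed device is removed. The topology, the node names and the function `unreachable_after_fault` are hypothetical. The sketch covers only services that are cut off completely; services that are merely degraded would need performance data.

```python
from collections import deque
from typing import Dict, List, Set

# Assumed topology as an adjacency list (every link listed in both directions).
TOPOLOGY: Dict[str, List[str]] = {
    "core": ["router-a", "router-b"],
    "router-a": ["core", "mail-server"],
    "router-b": ["core", "web-server", "file-server"],
    "mail-server": ["router-a"],
    "web-server": ["router-b"],
    "file-server": ["router-b"],
}


def unreachable_after_fault(topology: Dict[str, List[str]],
                            start: str, failed: str) -> Set[str]:
    """Return the nodes cut off from `start` when `failed` stops forwarding."""
    reachable: Set[str] = set()
    queue: deque = deque()
    if start != failed:
        reachable.add(start)
        queue.append(start)
    # Breadth-first search that never passes through the failed device.
    while queue:
        node = queue.popleft()
        for neighbour in topology[node]:
            if neighbour != failed and neighbour not in reachable:
                reachable.add(neighbour)
                queue.append(neighbour)
    # Everything else, apart from the failed device itself, is cut off.
    return set(topology) - reachable - {failed}


cut_off = unreachable_after_fault(TOPOLOGY, "core", "router-b")
```

With `router-b` failed, the web and file servers behind it are reported as cut off, while the mail server behind `router-a` remains reachable.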

Form of Reporting Faults

• Common forms of fault reporting

– Text

– Graphical

– Auditory signals

• Text

– Will work on any type of terminal

• Graphical

– Considered to be very effective

– Can use flashing images to gain attention

– Colour can be used to indicate device status

• Auditory signals

– Will quickly call attention to the occurrence of a fault
