fault management

Fault Management IACT 918 July 2004 Gene Awyzio SITACS University of Wollongong

TRANSCRIPT

Page 1: Fault Management

Fault Management

IACT 918 July 2004

Gene Awyzio

SITACS University of Wollongong

Page 2: Fault Management

Overview

• Fault Management is the process of locating and correcting network problems or faults

• Comprehensive fault management is probably the most important task in Network Management

Page 3: Fault Management

Benefits of Fault Management Process

• Increased network reliability

– Provides tools that allow an engineer to quickly

• Detect problems

• Initiate recovery procedures

• Need to maintain the illusion of complete and continuous connectivity

• Also provides tools to extract information about the network's current state

Page 4: Fault Management

Accomplishing Fault Management

• Can be considered as a three (3) step process

– Identify the fault

– Isolate the cause of the fault

– Correct the fault if possible

Page 5: Fault Management

Identifying the fault

• Gathering Information to identify a problem

– To learn that a problem exists we need to gather data about the current state of the network

• Two approaches

– Log critical network events

– Poll network devices

Page 6: Fault Management

Identifying the fault

• Critical network events

– Examples

• Failure of a link

• Lack of response from host

– Transmitted by network device when fault conditions occur

– Reactive method

– If device fails it cannot send an event
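The last point, that a failed device cannot report its own failure, can be illustrated with a minimal Python sketch. The device table and the `collect_events` and `poll` helpers are hypothetical and only simulate a network in memory; they do not correspond to any real management tool or protocol API.

```python
# Simulated device states (hypothetical names and fields).
devices = {
    "router-a": {"alive": True, "link_down": True},    # can still report its failed link
    "switch-b": {"alive": False, "link_down": False},  # has failed completely
    "host-c": {"alive": True, "link_down": False},     # healthy
}


def collect_events(devices):
    """Reactive method: only devices that are still running can send an event."""
    events = []
    for name, state in devices.items():
        if state["alive"] and state["link_down"]:
            events.append((name, "link failure"))
    return events


def poll(devices):
    """Active method: query every device; a missing answer is itself a fault signal."""
    return [name for name, state in devices.items() if not state["alive"]]


print(collect_events(devices))  # [('router-a', 'link failure')]
print(poll(devices))            # ['switch-b']
```

In this sketch the event log never mentions `switch-b`, because the device is no longer able to send anything; only an active query reveals that it is down.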

Page 7: Fault Management

Identifying the fault

• Occasional Polling

– Can help find faults in a timely manner

– Tradeoff

• Degree of timeliness vs bandwidth consumption

– Other factors

• Number of devices to poll

• Bandwidth of links

– Example

• Assume each query and response is 100 bytes long (including data and header information)

• For a network of 30 devices

– (100 + 100) * 30 = 6000 bytes/polling interval = 48,000 bits/polling interval

• Polling every minute

– 48,000 bits / 60 secs = 800 bits/second

– 48,000 bits/polling interval * 60 polls/hour = 2,880,000 bits = 2.88 Megabits/hour

• Polling every 10 minutes

– 80 bits/second = 0.288 Megabits/hour

– May not know about event for 10 minutes
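The polling tradeoff can be worked through with a small Python sketch. It uses the slide's assumptions (100-byte query, 100-byte response, 30 devices); the function name `polling_load` and its parameters are hypothetical, and the sketch ignores retries and lower-layer framing overhead.

```python
def polling_load(devices, query_bytes, response_bytes, interval_seconds):
    """Return (bits per polling interval, bits per second, megabits per hour)."""
    bits_per_poll = (query_bytes + response_bytes) * devices * 8
    bits_per_second = bits_per_poll / interval_seconds
    megabits_per_hour = bits_per_second * 3600 / 1_000_000
    return bits_per_poll, bits_per_second, megabits_per_hour


# Polling 30 devices every minute and every 10 minutes.
for interval in (60, 600):
    bits, bps, mbph = polling_load(30, 100, 100, interval)
    print(f"every {interval:>3} s: {bits} bits/poll, {bps:.0f} bit/s, {mbph:.3f} Mbit/hour")
```

For these inputs, a one-minute interval works out to 800 bits/second (2.88 Megabits/hour) and a ten-minute interval to 80 bits/second (0.288 Megabits/hour). A fault may then go unnoticed for up to ten minutes.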

Page 8: Fault Management

Deciding Which Faults to Manage

• Need to decide which faults to manage

– Need to prioritise faults

– If the number of fault reports is high, the network may not handle the volume

– Limiting event traffic can reduce redundant transmissions and storage

• Factors to consider

– Scope of control over network

– Size of network
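Prioritising faults and limiting redundant reports could look roughly like the following Python sketch. The severity levels, the report tuples and the `prioritise` function are assumptions made for illustration; a real system would define its own severity scheme and would probably expire suppressed reports after some time.

```python
# Hypothetical severity ranking: a lower number means more urgent.
SEVERITY = {"critical": 0, "major": 1, "minor": 2}


def prioritise(reports):
    """Drop repeated (device, fault) reports, then order the rest by severity."""
    seen = set()
    unique = []
    for device, fault, severity in reports:
        if (device, fault) in seen:
            continue  # redundant report: neither stored nor forwarded again
        seen.add((device, fault))
        unique.append((device, fault, severity))
    # sort() is stable, so reports of equal severity keep their arrival order.
    unique.sort(key=lambda report: SEVERITY[report[2]])
    return [(device, fault) for device, fault, _ in unique]


reports = [
    ("router-a", "link down", "critical"),
    ("host-c", "high latency", "minor"),
    ("router-a", "link down", "critical"),  # duplicate
    ("switch-b", "port errors", "major"),
    ("host-c", "high latency", "minor"),    # duplicate
]
print(prioritise(reports))
# [('router-a', 'link down'), ('switch-b', 'port errors'), ('host-c', 'high latency')]
```

Five incoming reports are reduced to three, and the critical fault is handled first.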

Page 9: Fault Management

Fault Management of a Network Management System

• Simplest system

– Reports existence of fault but NOT location

• More complex tool

– Uses capability of hosts and network devices to

• Send critical network events

• Facilitate isolation of fault cause

• Advanced tool

– Correction of fault

Page 10: Fault Management

Impact of a Fault on the Network

• A fault management tool MUST be capable of analysing how a fault can affect other areas of the network

• Need to know

– What services the fault

• STOPS

• IMPACTS

– Not only that a fault has occurred but also how that fault affects other network communication

• Data can come from performance management tools
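One possible way to separate services a fault STOPS from services it only IMPACTS is a reachability check over the topology, sketched below in Python. This is an assumption-laden illustration rather than a method from the slides: the topology, the service names, and the rule that "impacted" means "still reachable, but over a longer path" are all hypothetical.

```python
from collections import deque


def distances(links, start, failed=frozenset()):
    """Breadth-first hop counts from start, skipping any failed nodes."""
    if start in failed:
        return {}
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in links.get(node, ()):
            if neighbour not in failed and neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def fault_impact(links, services, start, failed_node):
    """Return (stopped, impacted) service names when failed_node goes down."""
    before = distances(links, start)
    after = distances(links, start, failed={failed_node})
    stopped, impacted = [], []
    for service, host in sorted(services.items()):
        if host not in before:
            continue  # never reachable, so this fault is not the cause
        if host not in after:
            stopped.append(service)      # no path left at all
        elif after[host] > before[host]:
            impacted.append(service)     # reachable, but only via a detour
    return stopped, impacted


# Hypothetical topology: the web server has a backup path via sw2 and sw3.
links = {
    "core": ["sw1", "sw2"],
    "sw1": ["core", "mail", "web"],
    "sw2": ["core", "sw3"],
    "sw3": ["sw2", "web"],
    "mail": ["sw1"],
    "web": ["sw1", "sw3"],
}
services = {"email": "mail", "intranet": "web"}
print(fault_impact(links, services, "core", "sw1"))  # (['email'], ['intranet'])
```

Hop count stands in here for the kind of measurement that a performance management tool would normally supply.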

Page 11: Fault Management

Form of Reporting Faults

• Common forms of fault reporting

– Text

– Graphical

– Auditory signals

• Text

– Will work on any type of terminal

• Graphical

– Considered to be very effective

– Can use flashing images to gain attention

– Colour can be used to indicate device status

• Auditory signals

– Will quickly call attention to the occurrence of a fault

Page 12: Fault Management
