Network Management
Lecture 4
Performance Management
The practice of optimizing network service response time.
It also entails managing the consistency and quality of individual and overall network services.
The most important task is measuring user/application response time.
For most users, response time is the critical performance success factor. This variable shapes the perception of network success for both your users and your application administrators. (Cisco)
What is Performance Management?
Quantification of performance indicators for:
• Server
• Network
• Workstation
• Applications
Standard performance goals:
• Response time
• Utilization
• Throughput
• Capacity
In Performance Management
Need to:
• Maintain continuous indicators for performance evaluation
• Verify what levels of service need to be maintained
• Identify actual and potential bottlenecks
• Establish and report on usage trends
Objectives of Performance Management
Need to ensure that the network highway remains accessible and uncongested
Provide a consistent level of service
Avoid degradation of performance
Provide proactive management
Performance Indicators Required
Transmission capacity
• Expressed in bits per second
Signal propagation delay
• Time required to transmit a signal to its destination
• The longer the propagation path, the longer the delay
Performance Indicators Required
Topology
• Star, tree, ring, bus, or a combination of star and ring
• Limits the number of workstations or hosts per cable segment that can be attached to the network
• The higher the number of nodes, the lower the performance
Performance Indicators Required
Frame/packet size
• Most LANs are designed to support only a specific, fixed-size frame or packet
• If a message is larger than the frame size, it must be broken into smaller units
• An increase in the number of frames per message adds to the delay
Performance Indicators Required
Access protocols
• The most influential metric, e.g. CSMA/CD, Token Ring
User traffic profile
• Time of use
• Type of message generated by the user (single, broadcast)
• Number of users on the line
Performance Indicators Required
Buffer size
• A piece of memory used to receive, store, process, and forward messages
• If the buffer is too small, delays or discarding of packets may occur
Performance Indicators Required
Data collision and retransmission
• Collisions are inevitable
• Factors to be considered:
  • Time it takes to detect a collision
  • Transmission time of the collided messages
Performance Indicators Required
Resource usage
• How much of a resource is used by the user or application
• How much reserve is left
Processing delays
• Can be caused by both the host and the network
• Host delays are divided into system and application processing delays
Performance Indicators Required
Processing delays (cont'd)
• Network delays have both hardware and software causes (network card vs. network driver)
Throughput
• Measurement of transmission capacity
• A statistical measurement over time
Performance Indicators Required
Availability
• Service availability from an end user's point of view
• If delays are long, then even if the network is available, as far as the end user is concerned it is virtually unavailable
Performance Indicators Required
Fairness of measured data
• Important to take measurements at peak-to-average ratio levels
• Collect data at known high-usage and average-usage periods
Sample measurement
• Measurement of traffic volume
• Ensure the sampling interval is the same as the above
Performance Management Measurement Methods
Collect data on current utilisation of network devices and links
• Static vs. dynamic
• One-off or continuous sampling
• Event reporting or polling
Analyse the relevant data
• Set utilisation thresholds
• Simulate the network
Performance Management Measurement Methods
Collect a good sample size
• Do not just use one measurement; do several and take the average
Ensure samples are representative
• Take measurements at different times of the day/week
• Compare loads (e.g. lunchtime load vs. end-of-month load)
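The averaging guidance above can be sketched in a few lines of Python; the utilisation figures and sampling times are invented for illustration:

```python
from statistics import mean, stdev

# Hypothetical link-utilisation samples (%) taken at different times of day;
# several measurements per period, as the slides recommend.
samples = {
    "09:00": [42, 45, 40],
    "13:00": [78, 81, 75],   # lunchtime peak
    "17:00": [55, 52, 58],
}

for time_of_day, values in samples.items():
    print(f"{time_of_day}: mean {mean(values):.1f}%, spread {stdev(values):.1f}")

# Compare the busiest period with the overall average
# (the peak-to-average ratio mentioned on the previous slide).
overall = mean(v for values in samples.values() for v in values)
peak = max(mean(values) for values in samples.values())
print(f"peak-to-average ratio: {peak / overall:.2f}")
```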
Performance Management Measurement Methods
Beware of the unexpected
• Unusual use on the day of the test
• Backups at 3 am
Threshold and Exception Reporting
Define indicators
Determine frequency of measurements
Define threshold for each indicator
Get guidelines from vendors
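The steps above can be sketched as a minimal threshold/exception reporter; the indicator names and threshold values are made up for illustration, not vendor guidelines:

```python
# Hypothetical thresholds per indicator (illustrative values only).
thresholds = {
    "utilisation_pct": 80.0,
    "response_time_ms": 500.0,
    "packet_loss_pct": 5.0,
}

def exception_report(measurements):
    """Return only the indicators that exceed their threshold."""
    return {
        name: value
        for name, value in measurements.items()
        if name in thresholds and value > thresholds[name]
    }

current = {"utilisation_pct": 91.5, "response_time_ms": 120.0, "packet_loss_pct": 7.2}
print(exception_report(current))   # only the breached indicators are reported
```

Reporting only the exceptions, rather than every measurement, is what keeps the distribution matrix on the next slide manageable.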
Threshold and Exception Reporting
Design reporting systems
• Determine information areas and indicators
• What equipment, networks, or objects are monitored
Determine the distribution matrix
• Who gets reports
• How often and at what level of detail
• Presentation
Network Performance Analysis
Data analysis
• What are the effects of hardware/software on the network?
• Dependent on:
  • Network type/protocols
  • Packet size
  • Buffer size
  • Processes running
  • Routing algorithms
Network Performance Tuning
Tune to service requirements
Calculate payback in advance
Observe the 80-20 rule and the 1:4 internet traffic rule
Focus on critical resources
Determine when capacity is exhausted
Define objectives
Determine time frames
System Design for Better Performance
• CPU speed is more important than network speed: increasing raw network speed has no effect if the bottleneck lies elsewhere
• Reduce packet count to reduce software overhead
  • Each packet has its associated overhead
  • The more packets, the more overhead
• Increase packet size to reduce per-packet overhead
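The packet-overhead point can be shown with simple arithmetic; the 40-byte header and 64 KB message size are assumptions for illustration:

```python
# Sending the same message in fewer, larger packets reduces total header overhead.
header = 40          # bytes of per-packet header (assumed, e.g. TCP/IP)
message = 64_000     # bytes of application data (assumed)

for payload in (512, 1460, 8960):
    packets = -(-message // payload)          # ceiling division
    print(f"payload {payload:5d}: {packets:4d} packets, "
          f"overhead {packets * header / message:.1%}")
```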
System Design for Better Performance
Minimise context switching (e.g. kernel to user mode)
• Context switches waste processing time and power
• Reduced by having library procedures do internal buffering, collecting a substantial amount of data before processing
Minimise copying
• Copying, e.g. from buffer to kernel buffer to network-layer buffer to transport-layer buffer
• Copy procedures should be eliminated where not required
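The internal-buffering idea can be sketched as follows; `BufferedSender` and `raw_send` are hypothetical names standing in for a library wrapper and the expensive system call:

```python
# Collect small writes and hand them to the (expensive) send routine only
# once enough data has accumulated, reducing the number of kernel crossings.
class BufferedSender:
    def __init__(self, raw_send, threshold=4096):
        self.raw_send = raw_send      # the expensive per-call operation
        self.threshold = threshold
        self.buffer = bytearray()
        self.syscalls = 0

    def write(self, data: bytes):
        self.buffer.extend(data)
        if len(self.buffer) >= self.threshold:
            self.flush()

    def flush(self):
        if self.buffer:
            self.raw_send(bytes(self.buffer))
            self.syscalls += 1
            self.buffer.clear()

sent = []
s = BufferedSender(sent.append, threshold=8)
for chunk in (b"ab", b"cd", b"ef", b"gh", b"ij"):   # five small writes
    s.write(chunk)
s.flush()
print(s.syscalls)   # prints 2: five writes collapsed into two calls
```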
Event Correlation Techniques
Basic elements
• Detection and filtering of events
• Correlation of observed events using AI
• Localize the source of the problem
• Identify the cause of the problem
Techniques
• Rule-based reasoning
• Model-based reasoning
• Case-based reasoning
• Codebook correlation model
• State transition graph model
• Finite state machine model
Rule-Based Reasoning

[Figure 13.7 Basic Rule-Based Reasoning Paradigm: working memory at the data level holds data elements that rules can create, modify, or remove; the knowledge level matches potential rules against working memory (recognize); the control level selects the best rule and invokes its action (act).]
Rule-Based Reasoning
• The rule-based paradigm is an iterative process
• RBR is "brittle" if no precedent exists
• Exponential growth of the knowledge base poses a scalability problem
• Instability near thresholds:
  if packet loss < 10%, alarm green
  if packet loss >= 10% and < 15%, alarm yellow
  if packet loss >= 15%, alarm red
• Solution: fuzzy logic
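The crisp threshold rules from the slide can be written directly; the instability is visible when the loss rate oscillates around the 10% boundary:

```python
def alarm(packet_loss_pct: float) -> str:
    """Crisp RBR thresholds: green / yellow / red, with hard cut-offs."""
    if packet_loss_pct < 10:
        return "green"
    elif packet_loss_pct < 15:
        return "yellow"
    return "red"

# Samples oscillating around 10% flip the alarm colour on every reading.
print([alarm(x) for x in (9.9, 10.1, 9.8, 10.2)])
# -> ['green', 'yellow', 'green', 'yellow']
# A fuzzy-logic variant would assign graded membership to each colour
# instead of a hard cut-off, damping this flip-flopping.
```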
Configuration for RBR Example

[Figure 13.8 RBR-Based Correlation Example Scenario: Backbone — Router A — Router B — Hub C — Servers D1–D4, raising Alarm A, Alarm B, Alarm C, and Alarms Dx respectively.]
RBR Example
The correlation rules can be specified as follows:
Rule 0: Alarm A — send root-cause alarm A
Rule 1: Alarm B — if Alarm A is present, related to A; ignore
Rule 2: Alarm C — if Alarm B is present, related to B; ignore
Rule 3: Alarm Dx — if Alarm C is present, related to C; ignore
Correlation window: 20 seconds
• Arrival of Alarm A — Alarm A sent
• Arrival of Alarm B — correlated by Rule 1
• Arrival of Alarm C — correlated by Rule 2
• Arrival of Alarms Dx — correlated by Rule 3
• End of correlation window
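A minimal sketch of these rules: within one correlation window, an alarm is suppressed when the alarm it depends on is also present, so only the root cause is reported. The dependency chain (Dx → C → B → A) mirrors Rules 1–3:

```python
# Each alarm's upstream dependency, per the example topology.
depends_on = {"B": "A", "C": "B", "D1": "C", "D2": "C", "D3": "C", "D4": "C"}

def correlate(window_alarms):
    """Return only the root-cause alarms from one 20-second window."""
    present = set(window_alarms)
    return [a for a in window_alarms
            if depends_on.get(a) not in present]

print(correlate(["A", "B", "C", "D1", "D2"]))   # -> ['A']
print(correlate(["C", "D1"]))                    # -> ['C']
```

In the second call Alarm A never arrived, so Alarm C itself is reported as the root cause, which matches how the rules degrade when an upstream alarm is lost.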
Model-Based Reasoning
[Figure 13.11 Model-Based Reasoning Event Correlator: the physical network (Backbone Network — Router — Hub1, Hub2, Hub3) and its equivalent model in the NMS / correlator (a router model related to Hub1, Hub2, and Hub3 models).]
• Object-oriented model
• A model is a representation of the component it models
• A model has attributes and relations to other models
• Relationships between objects are reflected in similar relationships between models
MBR Event Correlator

Example: Hub 1 fails. The failure is recognized by the Hub 1 model, which queries the router model:
• If the router model declares a failure, the Hub 1 model declares no failure (the fault is upstream of the hub)
• If the router model declares no failure, the Hub 1 model declares a failure
Case-Based Reasoning
[Figure 13.12 General CBR Architecture: Input → Retrieve → Adapt → Process, drawing on a case library.]
• Unit of knowledge: in RBR a rule, in CBR a case
• CBR is based on cases experienced before, extended to the current situation by adaptation
• Three adaptation schemes:
  • Parameterized adaptation
  • Abstraction / re-specialization adaptation
  • Critic-based adaptation
CBR: Matching Trouble Ticket
Example: File transfer throughput problem
Trouble: file_transfer_throughput = F
Additional data: none
Resolution: A = f(F), adjust_network_load = A
Resolution status: good
Figure 13.13 Matching Trouble Ticket
CBR: Parameterized Adaptation
Trouble: file_transfer_throughput = F'
Additional data: none
Resolution: A' = f(F'), adjust_network_load = A'
Resolution status: good
Figure 13.14 Parameterized Adaptation
• A = f(F)
• A' = f(F')
• The functional relationship f(x) remains the same
CBR: Abstraction / Re-specialization
• Two possible resolutions:
  • A = f(F) — adjust network load level
  • B = g(F) — adjust bandwidth
• Resolution is chosen based on the constraint imposed
CBR: Critic-Based Adaptation
Trouble: file_transfer_throughput = F
Additional data: network_load = N
Resolution: A = f(F, N), adjust_network_load = A
Resolution status: good
Figure 13.16 Critic-Based Adaptation
• Human expertise introduces a new case
• N (network load) is an additional parameter added to the functional relationship
CBR-Based CRITTER

[Figure 13.17 CRITTER Architecture: a CBR engine (Input → Retrieve → Adapt → Process → Propose) with a case library, embedded in fault management — fault detection on the network feeds the input; determinators, application techniques, and user-based adaptation guide the Adapt step; fault resolution is carried out through Spectrum configuration management.]
Codebook Correlation Model: Generic Architecture
• Yemini et al. proposed this model
• Monitors capture alarm events
• The configuration model contains the configuration of the network
• The event model represents events and their causal relationships
• The correlator correlates alarm events with the event model and determines the problem that caused the events
[Architecture: monitors observe the network and deliver alarm events; the correlator combines these with the event model and the configuration model to output the problems.]
Codebook Approach
• Correlation algorithms are based on a coding approach to event correlation
• Problem events are viewed as messages generated by a system and encoded in sets of alarms
• The correlator decodes the problem messages to identify the problems
Approach — two phases:
1. Codebook selection phase: the problems to be monitored are identified and the symptoms they generate are associated with each problem. This generates the codebook (problem-symptom matrix).
2. The correlator compares alarm events with the codebook and identifies the problem.
Causality Graph
[Figure 13.19 Causality Graph: events E1–E7 as nodes, with directed edges from causing events to resulting events.]
• Each node is an event
• An event may cause other events
• Directed edges start at a causing event and terminate at a resulting event
• Picture causing events as problems and resulting events as symptoms
Labeled Causality Graph
[Figure 13.20 Labeled Causality Graph for Figure 13.19: problems P1–P3 and symptoms S1–S4, with directed edges from each problem to the symptoms it causes.]
• Ps are problems and Ss are symptoms
• P1 causes S1 and S2
• Note the directed edge from S1 to S2 is removed; S2 is caused directly or indirectly (via S1) by P1
• S2 could also be caused by either P2 or P3
Codebook
      P1  P2  P3
  S1   1   1   0
  S2   1   1   1
  S3   0   1   1
  S4   0   0   1

• The codebook is the problem-symptom matrix
• It is derived from the causality graph after removing the directed edges for propagation of symptoms
• Number of symptoms >= number of problems
• 2 rows are adequate to uniquely identify the 3 problems
Correlation Matrix
      P1  P2  P3
  S1   1   1   0
  S3   0   1   1

• The correlation matrix is the reduced codebook
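Decoding with the correlation matrix above can be sketched in a few lines: each problem's code is its column (S1, S3), and an observed symptom vector is matched to the nearest code by Hamming distance, which tolerates a lost or spurious alarm:

```python
# Columns of the correlation matrix: (S1, S3) per problem.
codes = {"P1": (1, 0), "P2": (1, 1), "P3": (0, 1)}

def hamming(a, b):
    """Number of positions in which two symptom vectors differ."""
    return sum(x != y for x, y in zip(a, b))

def decode(observed):
    """Return the problem whose code is closest to the observed symptoms."""
    return min(codes, key=lambda p: hamming(codes[p], observed))

print(decode((1, 0)))   # exact match -> P1
print(decode((1, 1)))   # -> P2
print(decode((0, 1)))   # -> P3
```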
State Transition Model
[Figure 13.27 State Transition Diagram for Ping / Response: the manager pings a node, waits for a response, and changes state according to whether one is received.]
• Used in Seagate's NerveCenter correlation system
• Integrated into NMSs such as OpenView
• Used to determine the status of a node
State Transition Model Example
[Physical network: Backbone Network — Router — Hub1, Hub2, Hub3, monitored by the NMS / correlator.]

• The NMS pings the hubs every minute
• Failure is indicated by the absence of a response
State Transition Graph

[Figure 13.28 State Transition Graph Example: from the ground state the NMS pings a hub and normally receives a response. If a hub that has been pinged twice gives no response, it is pinged a third time; if there is still no response, the NMS pings the router. If a response is received from the router, the hub itself has failed and the action is to send an alarm; if the router does not respond either, the fault lies upstream and no action is taken.]
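The transition logic in Figure 13.28 can be sketched as follows; `ping` is a stand-in for a real ICMP probe and here just consults a table of reachable nodes:

```python
# Which nodes answer pings in this hypothetical scenario: hub1 is down.
reachable = {"router": True, "hub1": False}

def ping(node: str) -> bool:
    """Stand-in for an ICMP probe."""
    return reachable.get(node, False)

def check_hub(hub: str, router: str = "router") -> str:
    # Ping the hub up to three times before suspecting a failure.
    for _ in range(3):
        if ping(hub):
            return "ok"                  # back to the ground state
    # No response from the hub: probe the router to localise the fault.
    if ping(router):
        return "alarm: hub down"         # router reachable -> hub at fault
    return "no action"                   # router also silent -> fault upstream

print(check_hub("hub1"))   # -> alarm: hub down
```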
Finite State Machine Model
[Figure 13.29 Communicating Finite State Machine: a client sends a request and receives a response; the server receives the request and sends the response; request and response messages travel over a communication channel.]
• The finite state machine model is a passive system; the state transition graph model is an active system
• An observer agent is present in each node and reports abnormalities (e.g. a Web agent)
• A central system correlates the events reported by the agents
• A failure is detected by a node entering an illegal state