Network Management
Lecture 4
Performance Management
The practice of optimizing network service response time.
It also entails managing the consistency and quality of individual and overall network services.
The most important task is measuring user/application response time.
For most users, response time is the critical performance success factor. This variable shapes the perception of network success for both your users and your application administrators. (Cisco)
What is Performance Management?
Quantification of performance indicators for:
• Server
• Network
• Workstation
• Applications
Standard performance goals:
• Response time
• Utilization
• Throughput
• Capacity
In Performance Management
Need to:
• Maintain continuous indicators for performance evaluation
• Verify what levels of service need to be maintained
• Identify actual and potential bottlenecks
• Establish and report on usage trends
Objectives of Performance Management
Need to ensure that the network highway remains accessible and uncongested
Provide a consistent level of service
Avoid degradation of performance
Provide proactive management
Performance Indicators Required
Transmission capacity
• Expressed in bits per second
Signal propagation delay
• Time required to transmit a signal to its destination
• The longer the propagation path, the longer the delay
Performance Indicators Required
Topology
• Star, tree, ring, bus, or a combination of star and ring
• Limits the number of workstations or hosts per cable segment that can be attached to the network
• The higher the number of nodes, the lower the performance
Performance Indicators Required
Frame/packet size
• Most LANs are designed to support only a specific, fixed-size frame or packet
• If a message is larger than the frame size, it must be broken into smaller units
• An increase in the number of frames per message adds to the delay
Performance Indicators Required
Access protocols
• The most influential metric, e.g. CSMA/CD, Token Ring
User traffic profile
• Time of use
• Type of message generated by the user (single, broadcast)
• Number of users on the line
Performance Indicators Required
Buffer size
• A piece of memory used to receive, store, process, and forward messages
• If the buffer is too small, delays or discarding of packets may occur
Performance Indicators Required
Data collision and retransmission
• Collisions are inevitable
• Factors to be considered:
  • Time it takes to detect a collision
  • Transmission time of the collided messages
Performance Indicators Required
Resource usage
• How much of a resource is used by the user or application
• How much reserve is left
Processing delays
• Can be caused by both the host and the network
• Host delays are divided into system and application processing delays
Performance Indicators Required
Processing delays (cont'd)
• Network delays have both hardware and software causes (network card vs. network driver)
Throughput
• Measurement of transmission capacity
• A statistical measurement over time
Performance Indicators Required
Availability
• Service availability from an end user's point of view
• If delays are long, then even if the network is available, as far as the end user is concerned it is virtually unavailable
Performance Indicators Required
Fairness of measured data
• Important to take measurements at peak-to-average ratio levels
• Collect data at known high-usage and average-usage periods
Sample measurement
• Measurement of traffic volume
• Ensure the sampling interval is the same as the above
Performance Management Measurement Methods
Collect data on current utilisation of network devices and links
• Static vs. dynamic
• One-off or continuous sampling
• Event reporting or polling
Analyse the relevant data
• Set utilisation thresholds
• Simulate the network
Performance Management Measurement Methods
Collect a good sample size
• Do not just use one measurement; do several and take the average
Ensure samples are representative
• Take measurements at different times of the day/week
• Compare loads (e.g. lunchtime load vs. end-of-month load)
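The averaging guidance above can be sketched in a few lines of Python; the utilisation figures and sampling times are invented for illustration:

```python
from statistics import mean, stdev

# Hypothetical link-utilisation samples (%) taken at different times of day;
# several measurements per period, as the slides recommend.
samples = {
    "09:00": [42, 45, 40],
    "13:00": [78, 81, 75],   # lunchtime peak
    "17:00": [55, 52, 58],
}

for time_of_day, values in samples.items():
    print(f"{time_of_day}: mean {mean(values):.1f}%, spread {stdev(values):.1f}")

# Compare the busiest period with the overall average
# (the peak-to-average ratio mentioned on the previous slide).
overall = mean(v for values in samples.values() for v in values)
peak = max(mean(values) for values in samples.values())
print(f"peak-to-average ratio: {peak / overall:.2f}")
```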
Performance Management Measurement Methods
Beware of the unexpected
• Unusual use on the day of the test
• Backups at 3 am
Threshold and Exception Reporting
Define indicators
Determine frequency of measurements
Define threshold for each indicator
Get guidelines from vendors
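The steps above can be sketched as a minimal threshold/exception reporter; the indicator names and threshold values are made up for illustration, not vendor guidelines:

```python
# Hypothetical thresholds per indicator (illustrative values only).
thresholds = {
    "utilisation_pct": 80.0,
    "response_time_ms": 500.0,
    "packet_loss_pct": 5.0,
}

def exception_report(measurements):
    """Return only the indicators that exceed their threshold."""
    return {
        name: value
        for name, value in measurements.items()
        if name in thresholds and value > thresholds[name]
    }

current = {"utilisation_pct": 91.5, "response_time_ms": 120.0, "packet_loss_pct": 7.2}
print(exception_report(current))   # only the breached indicators are reported
```

Reporting only the exceptions, rather than every measurement, is what keeps the distribution matrix on the next slide manageable.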
Threshold and Exception Reporting
Design reporting systems
• Determine information areas and indicators
• What equipment, networks, or objects are monitored
Determine the distribution matrix
• Who gets reports
• How often and at what level of detail
• Presentation
Network Performance Analysis
Data analysis
• What are the effects of hardware/software on the network?
• Dependent on:
  • Network type/protocols
  • Packet size
  • Buffer size
  • Processes running
  • Routing algorithms
Network Performance Tuning
Tune to service requirements
Calculate payback in advance
Observe the 80-20 rule and the 1:4 internet traffic rule
Focus on critical resources
Determine when capacity is exhausted
Define objectives
Determine time frames
System Design for Better Performance
• CPU speed is more important than network speed: increasing raw network speed has no effect if the bottleneck lies elsewhere
• Reduce packet count to reduce software overhead
  • Each packet has its associated overhead
  • The more packets, the more overhead
• Increase packet size to reduce per-packet overhead
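The packet-overhead point can be shown with simple arithmetic; the 40-byte header and 64 KB message size are assumptions for illustration:

```python
# Sending the same message in fewer, larger packets reduces total header overhead.
header = 40          # bytes of per-packet header (assumed, e.g. TCP/IP)
message = 64_000     # bytes of application data (assumed)

for payload in (512, 1460, 8960):
    packets = -(-message // payload)          # ceiling division
    print(f"payload {payload:5d}: {packets:4d} packets, "
          f"overhead {packets * header / message:.1%}")
```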
System Design for Better Performance
Minimise context switching (e.g. kernel to user mode)
• Context switches waste processing time and power
• Reduced by having library procedures do internal buffering, collecting a substantial amount of data before processing
Minimise copying
• Copying, e.g. from buffer to kernel buffer to network-layer buffer to transport-layer buffer
• Copy procedures should be eliminated where not required
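The internal-buffering idea can be sketched as follows; `BufferedSender` and `raw_send` are hypothetical names standing in for a library wrapper and the expensive system call:

```python
# Collect small writes and hand them to the (expensive) send routine only
# once enough data has accumulated, reducing the number of kernel crossings.
class BufferedSender:
    def __init__(self, raw_send, threshold=4096):
        self.raw_send = raw_send      # the expensive per-call operation
        self.threshold = threshold
        self.buffer = bytearray()
        self.syscalls = 0

    def write(self, data: bytes):
        self.buffer.extend(data)
        if len(self.buffer) >= self.threshold:
            self.flush()

    def flush(self):
        if self.buffer:
            self.raw_send(bytes(self.buffer))
            self.syscalls += 1
            self.buffer.clear()

sent = []
s = BufferedSender(sent.append, threshold=8)
for chunk in (b"ab", b"cd", b"ef", b"gh", b"ij"):   # five small writes
    s.write(chunk)
s.flush()
print(s.syscalls)   # prints 2: five writes collapsed into two calls
```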
Event Correlation Techniques
Basic elements
• Detection and filtering of events
• Correlation of observed events using AI
• Localize the source of the problem
• Identify the cause of the problem
Techniques
• Rule-based reasoning
• Model-based reasoning
• Case-based reasoning
• Codebook correlation model
• State transition graph model
• Finite state machine model
Rule-Based Reasoning

[Figure 13.7 Basic Rule-Based Reasoning Paradigm: working memory at the data level holds data elements that rules can create, modify, or remove; the knowledge level matches potential rules against working memory (recognize); the control level selects the best rule and invokes its action (act).]
Rule-Based Reasoning
• The rule-based paradigm is an iterative process
• RBR is "brittle" if no precedent exists
• Exponential growth of the knowledge base poses a scalability problem
• Instability near thresholds:
  if packet loss < 10%, alarm green
  if packet loss >= 10% and < 15%, alarm yellow
  if packet loss >= 15%, alarm red
• Solution: fuzzy logic
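The crisp threshold rules from the slide can be written directly; the instability is visible when the loss rate oscillates around the 10% boundary:

```python
def alarm(packet_loss_pct: float) -> str:
    """Crisp RBR thresholds: green / yellow / red, with hard cut-offs."""
    if packet_loss_pct < 10:
        return "green"
    elif packet_loss_pct < 15:
        return "yellow"
    return "red"

# Samples oscillating around 10% flip the alarm colour on every reading.
print([alarm(x) for x in (9.9, 10.1, 9.8, 10.2)])
# -> ['green', 'yellow', 'green', 'yellow']
# A fuzzy-logic variant would assign graded membership to each colour
# instead of a hard cut-off, damping this flip-flopping.
```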
Configuration for RBR Example

[Figure 13.8 RBR-Based Correlation Example Scenario: Backbone — Router A — Router B — Hub C — Servers D1–D4, raising Alarm A, Alarm B, Alarm C, and Alarms Dx respectively.]
RBR Example
The correlation rules can be specified as follows:
Rule 0: Alarm A — send root-cause alarm A
Rule 1: Alarm B — if Alarm A is present, related to A; ignore
Rule 2: Alarm C — if Alarm B is present, related to B; ignore
Rule 3: Alarm Dx — if Alarm C is present, related to C; ignore
Correlation window: 20 seconds
• Arrival of Alarm A — Alarm A sent
• Arrival of Alarm B — correlated by Rule 1
• Arrival of Alarm C — correlated by Rule 2
• Arrival of Alarms Dx — correlated by Rule 3
• End of correlation window
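A minimal sketch of these rules: within one correlation window, an alarm is suppressed when the alarm it depends on is also present, so only the root cause is reported. The dependency chain (Dx → C → B → A) mirrors Rules 1–3:

```python
# Each alarm's upstream dependency, per the example topology.
depends_on = {"B": "A", "C": "B", "D1": "C", "D2": "C", "D3": "C", "D4": "C"}

def correlate(window_alarms):
    """Return only the root-cause alarms from one 20-second window."""
    present = set(window_alarms)
    return [a for a in window_alarms
            if depends_on.get(a) not in present]

print(correlate(["A", "B", "C", "D1", "D2"]))   # -> ['A']
print(correlate(["C", "D1"]))                    # -> ['C']
```

In the second call Alarm A never arrived, so Alarm C itself is reported as the root cause, which matches how the rules degrade when an upstream alarm is lost.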
Model-Based Reasoning
[Figure 13.11 Model-Based Reasoning Event Correlator: the physical network (Backbone Network — Router — Hub1, Hub2, Hub3) and its equivalent model in the NMS / correlator (a router model related to Hub1, Hub2, and Hub3 models).]
• Object-oriented model
• A model is a representation of the component it models
• A model has attributes and relations to other models
• Relationships between objects are reflected in similar relationships between models
MBR Event Correlator

Example: Hub 1 fails. The failure is recognized by the Hub 1 model, which queries the router model:
• If the router model declares a failure, the Hub 1 model declares no failure (the fault is upstream of the hub)
• If the router model declares no failure, the Hub 1 model declares a failure
Case-Based Reasoning
[Figure 13.12 General CBR Architecture: Input → Retrieve → Adapt → Process, drawing on a case library.]
• Unit of knowledge: in RBR a rule, in CBR a case
• CBR is based on cases experienced before, extended to the current situation by adaptation
• Three adaptation schemes:
  • Parameterized adaptation
  • Abstraction / re-specialization adaptation
  • Critic-based adaptation
CBR: Matching Trouble Ticket
Example: File transfer throughput problem
Trouble: file_transfer_throughput = F
Additional data: none
Resolution: A = f(F), adjust_network_load = A
Resolution status: good
Figure 13.13 Matching Trouble Ticket
CBR: Parameterized Adaptation
Trouble: file_transfer_throughput = F'
Additional data: none
Resolution: A' = f(F'), adjust_network_load = A'
Resolution status: good
Figure 13.14 Parameterized Adaptation
• A = f(F)
• A' = f(F')
• The functional relationship f(x) remains the same
CBR: Abstraction / Re-specialization
• Two possible resolutions:
  • A = f(F) — adjust network load level
  • B = g(F) — adjust bandwidth
• Resolution is chosen based on the constraint imposed
CBR: Critic-Based Adaptation
Trouble: file_transfer_throughput = F
Additional data: network_load = N
Resolution: A = f(F, N), adjust_network_load = A
Resolution status: good
Figure 13.16 Critic-Based Adaptation
• Human expertise introduces a new case
• N (network load) is an additional parameter added to the functional relationship
CBR-Based CRITTER

[Figure 13.17 CRITTER Architecture: a CBR engine (Input → Retrieve → Adapt → Process → Propose) with a case library, embedded in fault management — fault detection on the network feeds the input; determinators, application techniques, and user-based adaptation guide the Adapt step; fault resolution is carried out through Spectrum configuration management.]
Codebook Correlation Model: Generic Architecture
• Yemini et al. proposed this model
• Monitors capture alarm events
• The configuration model contains the configuration of the network
• The event model represents events and their causal relationships
• The correlator correlates alarm events with the event model and determines the problem that caused the events
[Architecture: monitors observe the network and deliver alarm events; the correlator combines these with the event model and the configuration model to output the problems.]
Codebook Approach
• Correlation algorithms are based on a coding approach to event correlation
• Problem events are viewed as messages generated by a system and encoded in sets of alarms
• The correlator decodes the problem messages to identify the problems
Approach — two phases:
1. Codebook selection phase: the problems to be monitored are identified and the symptoms they generate are associated with each problem. This generates the codebook (problem-symptom matrix).
2. The correlator compares alarm events with the codebook and identifies the problem.
Causality Graph
[Figure 13.19 Causality Graph: events E1–E7 as nodes, with directed edges from causing events to resulting events.]
• Each node is an event
• An event may cause other events
• Directed edges start at a causing event and terminate at a resulting event
• Picture causing events as problems and resulting events as symptoms
Labeled Causality Graph
[Figure 13.20 Labeled Causality Graph for Figure 13.19: problems P1–P3 and symptoms S1–S4, with directed edges from each problem to the symptoms it causes.]
• Ps are problems and Ss are symptoms
• P1 causes S1 and S2
• Note the directed edge from S1 to S2 is removed; S2 is caused directly or indirectly (via S1) by P1
• S2 could also be caused by either P2 or P3
Codebook
      P1  P2  P3
  S1   1   1   0
  S2   1   1   1
  S3   0   1   1
  S4   0   0   1

• The codebook is the problem-symptom matrix
• It is derived from the causality graph after removing the directed edges for propagation of symptoms
• Number of symptoms >= number of problems
• 2 rows are adequate to uniquely identify the 3 problems
Correlation Matrix
      P1  P2  P3
  S1   1   1   0
  S3   0   1   1

• The correlation matrix is the reduced codebook
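Decoding with the correlation matrix above can be sketched in a few lines: each problem's code is its column (S1, S3), and an observed symptom vector is matched to the nearest code by Hamming distance, which tolerates a lost or spurious alarm:

```python
# Columns of the correlation matrix: (S1, S3) per problem.
codes = {"P1": (1, 0), "P2": (1, 1), "P3": (0, 1)}

def hamming(a, b):
    """Number of positions in which two symptom vectors differ."""
    return sum(x != y for x, y in zip(a, b))

def decode(observed):
    """Return the problem whose code is closest to the observed symptoms."""
    return min(codes, key=lambda p: hamming(codes[p], observed))

print(decode((1, 0)))   # exact match -> P1
print(decode((1, 1)))   # -> P2
print(decode((0, 1)))   # -> P3
```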
State Transition Model
[Figure 13.27 State Transition Diagram for Ping / Response: the manager pings a node, waits for a response, and changes state according to whether one is received.]
• Used in Seagate's NerveCenter correlation system
• Integrated into NMSs such as OpenView
• Used to determine the status of a node
State Transition Model Example
[Physical network: Backbone Network — Router — Hub1, Hub2, Hub3, monitored by the NMS / correlator.]

• The NMS pings the hubs every minute
• Failure is indicated by the absence of a response
State Transition Graph

[Figure 13.28 State Transition Graph Example: from the ground state the NMS pings a hub and normally receives a response. If a hub that has been pinged twice gives no response, it is pinged a third time; if there is still no response, the NMS pings the router. If a response is received from the router, the hub itself has failed and the action is to send an alarm; if the router does not respond either, the fault lies upstream and no action is taken.]
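The transition logic in Figure 13.28 can be sketched as follows; `ping` is a stand-in for a real ICMP probe and here just consults a table of reachable nodes:

```python
# Which nodes answer pings in this hypothetical scenario: hub1 is down.
reachable = {"router": True, "hub1": False}

def ping(node: str) -> bool:
    """Stand-in for an ICMP probe."""
    return reachable.get(node, False)

def check_hub(hub: str, router: str = "router") -> str:
    # Ping the hub up to three times before suspecting a failure.
    for _ in range(3):
        if ping(hub):
            return "ok"                  # back to the ground state
    # No response from the hub: probe the router to localise the fault.
    if ping(router):
        return "alarm: hub down"         # router reachable -> hub at fault
    return "no action"                   # router also silent -> fault upstream

print(check_hub("hub1"))   # -> alarm: hub down
```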
Finite State Machine Model
[Figure 13.29 Communicating Finite State Machine: a client sends a request and receives a response; the server receives the request and sends the response; request and response messages travel over a communication channel.]
• The finite state machine model is a passive system; the state transition graph model is an active system
• An observer agent is present in each node and reports abnormalities (e.g. a Web agent)
• A central system correlates the events reported by the agents
• A failure is detected by a node entering an illegal state