psu coe network management
Network Management
Issues, Approach & A Candidate Solution
William J. Burkhard
Associate Director
The Center for Electronic Design, Communications and Computing
James F. Carras
Network Coordinator
The Center for Electronic Design, Communications and Computing
Thomas G. Long
Senior Systems Analyst
The Center for Electronic Design, Communications and Computing
09 September 2002
College of Engineering, The Pennsylvania State University
Network Management - Topics
Issues
Network Management Elements
  Monitoring Resources
    Resource Condition Reporting
  Monitoring Traffic Loads
    Benefits of Monitoring Traffic
  Traffic Analysis & Filtering
    Importance of Traffic Analysis & Filtering
  Resource Control
    Importance of Resource Control
Network Management Configuration
Operational Schema
  Concept of Operations Flow Diagram
  Concept of Operations Description
System Recommendation & Cost
Summary and Conclusion
Network Management - Issues
By May 2002, the College’s network architecture migrated from a centrally manageable “Star” configuration to a completely distributed architecture.
The distributed architecture increased the College’s dependency on the backbone systems managed by the University’s Information Technology Services.
A single point of failure exists in the new distributed network management schema.
Loss of network connectivity to the College’s server farm in Hammond Building results in reports that all College network resources are in a failure mode.
Adequately managing the College’s new network architecture requires a philosophical change in the approach to network management.
Network Management - Monitoring Resources
Nagios: Host & Services Monitoring
Runs intermittent checks on hosts and services.
Sends notifications when problems are encountered.
Provides web-browser-accessible hardware status, historical logs and reports.
Network Services Monitored: SMTP, POP3, HTTP, NNTP, Ping…
Functionality:
  Ability to define network host hierarchy.
  Ability to send contact notification of problems via email and pager.
  Ability to define event handlers.
  Ability to apply on-the-fly command interface modifications.
Network Management - Resource Condition Reporting
Equipment Operational Status
Nagios provides the capability to establish control parameters and rules for monitoring and reporting on the up or down status of network closet routers and Ethernet switches.
When a system’s performance falls outside the parameters of its rule sets, an alert is generated. The alert can take one or multiple forms; the software can alert one or more individuals via email, pager, cell phone message, instant message, SMS, etc.
The software’s configurability supports network host hierarchy for the detection of and distinction between hosts that are down and those that are unreachable.
The web interface permits the viewing of network element status and facilitates on-the-fly configurations.
A web accessible external command interface enables the application of system monitoring and notification behaviors; these behaviors can also be configured via third-party applications.
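The distinction between hosts that are down and hosts that are merely unreachable can be illustrated with a minimal Python sketch. It is not Nagios code or configuration; the host names, the single-parent hierarchy and the `classify` helper are assumptions made for illustration only.

```python
from typing import Dict, Optional, Set

def classify(parents: Dict[str, Optional[str]], responding: Set[str]) -> Dict[str, str]:
    """Label each host UP, DOWN or UNREACHABLE.

    parents maps a host to its upstream parent (None for the top of the tree).
    responding is the set of hosts that answered their last check.
    """
    states: Dict[str, str] = {}

    def state(host: str) -> str:
        if host in states:
            return states[host]
        parent = parents[host]
        if parent is not None and state(parent) != "UP":
            # The path to this host is broken, so its own condition is unknown.
            result = "UNREACHABLE"
        elif host in responding:
            result = "UP"
        else:
            result = "DOWN"
        states[host] = result
        return result

    for host in parents:
        state(host)
    return states

# Hypothetical hierarchy: two building routers behind one backbone connection.
PARENTS = {
    "backbone": None,
    "hammond-router": "backbone",
    "hammond-switch-1": "hammond-router",
    "willard-router": "backbone",
    "willard-switch-2": "willard-router",
}
RESPONDING = {"backbone", "willard-router", "willard-switch-2"}
STATES = classify(PARENTS, RESPONDING)
```

In this example only the Hammond router is reported as down; the switch behind it is reported as unreachable, so a single alert points at the actual failure.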
Network Management - Monitoring Traffic Loads
Multi Router Traffic Grapher (MRTG)
Feature:
  Monitors traffic loads on network links.
Functionality:
  Provides live visual representation of port data traffic loads.
  Provides histories of port data traffic loads on a daily, weekly, monthly and annual basis.
  Provides a web interface for viewing port traffic loads.
[Figure: Willard–2 to 211 Hammond, In & Out]
Network Management - Benefits of Monitoring Traffic
Aside from a catastrophic network hardware failure reported by the resource monitoring application, traffic monitoring is a critical indicator of when a real or potential problem may exist in the network.
Monitoring data traffic rates down to the Ethernet switch level indicates when rates exceed the expected norm; this is the first indicator that a system is misbehaving, has been hacked and is broadcasting high outbound data rates over the network, or that a subnet is under attack from outside the College’s network.
Indications of high data rates allow the staff to analyze conditions and respond to these unpredictable but certain events swiftly and effectively.
As demands on the network’s bandwidth increase, traffic monitoring will also indicate when available bandwidth needs to be increased to support faculty usage demands.
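One simple way to express "rates exceed the expected norm" is a threshold derived from a baseline of recent samples. The Python sketch below is an illustration under stated assumptions, not something MRTG does itself: the sample values, the `flag_high_rates` helper and the choice of mean plus three standard deviations are all hypothetical.

```python
from statistics import mean, stdev
from typing import List, Sequence, Tuple

def flag_high_rates(baseline: Sequence[float],
                    samples: Sequence[float],
                    k: float = 3.0) -> List[Tuple[int, float]]:
    """Return (index, rate) for samples above mean(baseline) + k * stdev(baseline)."""
    threshold = mean(baseline) + k * stdev(baseline)
    return [(i, rate) for i, rate in enumerate(samples) if rate > threshold]

# Hypothetical port rates in Mbit/s: a quiet baseline, then new samples with one spike.
BASELINE = [10, 12, 11, 13, 12, 11, 10, 12]
NEW_SAMPLES = [12, 13, 95, 11]
FLAGGED = flag_high_rates(BASELINE, NEW_SAMPLES)
```

A flagged sample is only a prompt to look closer; deciding whether it is a misbehaving host, an attack or ordinary growth in demand still requires the traffic analysis step.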
Network Management - Traffic Analysis & Filtering
Ethereal
Features:
  Network protocol analysis.
  Filters for refining displayed packet summary information.
  Maintaining saved copies of network trace information.
Functionality:
  Live network data capture.
  Editing of capture files.
  Multi-protocol filtering of 289 protocols.
  GUI browsing of network data.
Network Management - Importance of Traffic Analysis
Traffic analysis is a necessary element in the network management structure because it complements traffic load monitoring.
Load monitoring indicates that something is potentially wrong, and traffic analysis answers the question of “what” is going wrong.
Traffic analysis monitors packet flow into and out of a network segment.
Traffic analysis alerts the network management team when the packet analysis process identifies data traffic that may match known hacker data signatures.
The configurability, monitoring flexibility and filtering capabilities all facilitate efficient threat analysis and identification.
Knowing the nature and identity of an attack enables the network management team to react swiftly and appropriately to these events; the employment of router filters can block subsequent hacker attacks.
Accurate information about an attack signature can be forwarded to the University Computer & Network Security Office for appropriate actions.
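The idea of filtering by protocol and then matching traffic against known signatures can be sketched in a few lines of Python. This is not Ethereal's interface or file format; the `Packet` record, the byte pattern, the signature name and the addresses (taken from documentation ranges) are assumptions for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Set

@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    protocol: str
    payload: bytes

def suspicious_sources(packets: Iterable[Packet],
                       protocols: Set[str],
                       signatures: Dict[str, bytes]) -> Dict[str, Set[str]]:
    """Map each source address to the names of the signatures its packets matched."""
    hits = defaultdict(set)
    for pkt in packets:
        if pkt.protocol not in protocols:
            continue  # protocols not of interest are filtered out
        for name, pattern in signatures.items():
            if pattern in pkt.payload:
                hits[pkt.src].add(name)
    return dict(hits)

# Hypothetical signature and capture data.
SIGNATURES = {"example-worm": b"\x90\x90\x90\x90"}
CAPTURE = [
    Packet("192.0.2.10", "198.51.100.5", "HTTP", b"GET /index.html"),
    Packet("203.0.113.7", "198.51.100.5", "HTTP", b"GET /\x90\x90\x90\x90"),
    Packet("203.0.113.7", "198.51.100.5", "NNTP", b"\x90\x90\x90\x90"),
]
HITS = suspicious_sources(CAPTURE, {"HTTP"}, SIGNATURES)
```

The resulting source addresses are the kind of information that would feed a router filter or a report to the security office.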
Network Management - Resource Control
iBoot
Features:
  Web Addressable & Configurable
  Remote Manual Control
  Automatic Failure Detection
  Password Protected
  Output Power Reset
  Switches up to 12 Amps @ 115 Volts
Functionality:
  Remote Reboot of any Device
  Automatic Reboot on Loss of Ping Response
Network Management - Importance of Resource Control
Resource control provides monitoring and power control for the most critical networking closet resource: the building router.
It provides a mechanism to quickly recover from possible “hung” system conditions.
In the automatic (ping) mode, the iBoot can detect the router’s failure to respond to pings; it treats this state as a sign that the router needs to be rebooted and cycles device power.
If router functionality is not restored after an automatic reboot, the network management team can remotely access the iBoot to force another reboot attempt.
The iBoot can also be used to remotely force a router reboot if the network management team determines that a reboot may clear an anomaly or other router problem.
It prevents costly “down time” by possibly eliminating the need for a site visit by the network management team.
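The automatic (ping) mode described above amounts to a small watchdog loop. The following Python sketch simulates that logic on a recorded sequence of ping results; it is not the iBoot's firmware or interface, and the `fail_limit` of three consecutive missed pings is an assumed value chosen for illustration.

```python
from typing import Iterable, List

def power_cycle_points(ping_results: Iterable[bool], fail_limit: int = 3) -> List[int]:
    """Return the indices of ping attempts at which power would be cycled.

    ping_results holds True for an answered ping and False for a missed one.
    """
    cycles: List[int] = []
    consecutive_failures = 0
    for index, answered in enumerate(ping_results):
        if answered:
            consecutive_failures = 0
            continue
        consecutive_failures += 1
        if consecutive_failures >= fail_limit:
            cycles.append(index)
            consecutive_failures = 0  # start counting again after the reboot
    return cycles

# Hypothetical ping history: one long outage, one short blip, another long outage.
HISTORY = [True, False, False, False, True, False, False, True, False, False, False]
CYCLES = power_cycle_points(HISTORY)
```

Requiring several consecutive failures before cycling power keeps a single dropped ping from rebooting a healthy router.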
Network Management - Network Management Configuration
[Diagram: closet configuration. Labeled elements: iBoot; Alcatel OmniCore OC-5022 Router; Internal Modem; POTS Line; Dell Management Server; Applications]
Network Management - Operating Schema
[Diagram: operating schema. Labeled elements: Dell Master Management Server; Applications; Network Operations Control Center, Room 151D Hammond; Status, Loads, Analysis; Web Based Desktop & Remote Monitoring; COE Network Connections; Remote User; Building 1; Building N]
Network Management - Operating Schema Flow Diagram
[Flow diagram. Box labels: Customer Call; Resource Monitoring Alert; Review Nagios Status; Hardware Response Problem (Yes: Repair; No: Review Traffic Analysis Data to Find Offending Sys); COE or External Problems; Review MRTG Data Rates; COE: Find & Repair or Block Sys; External: Review Traffic Analysis Data to Identify Attack Signature & Source; Place Filter on Router or Server Firewalls; Gather Forensic Data; Notify PSU Security]
Network Management - Concept of Operation Description
Data communications problems usually manifest themselves in one of two ways:
System hardware failures, which can be either user or network based.
Telephonic inquiries by Technical Contacts or users.
The appropriate immediate response is to determine whether a network failure occurred in a building; this is accomplished by referring to the network status plots provided by Nagios.
Network system hardware failures are always followed almost immediately by visual and audible alerts from Nagios. Nagios also provides pager and email alerts to notify staff during non-working hours.
Customer calls usually signify localized problems attributable to computer configuration issues, a computer-to-faceplate connection issue or other non-hardware network system issue.
The absence of any immediate indication of a hardware problem can indicate that computer systems within or external to the College may be participating in hacker activities such as Denial of Service attacks. Responses to internal attacks differ from responses to external attacks; both types of response rely on the analysis of data traffic.
A College system that traffic analysis identifies as misbehaving on the network is removed or blocked from network access; once a system repair is verified, the system is again given network access privileges. A system external to the College caught spamming or creating other problems for the network is filtered at a building router to prevent its access to and disruption of College computing.
Forensic analyses of attacks are conveyed to the University Network and Computing Security Office for analysis and any appropriate follow-on actions.
The remaining element not shown on the flow diagram is the iBoot. This device is used to remotely cycle closet router AC power as a step in attempting to remotely clear an observed problem with the router.
The main console in 151D Hammond provides continuous monitoring of all aspects of network management. Initial alerts and corrective actions can be handled locally on a desktop system; however, detailed troubleshooting, analysis and corrective actions are coordinated and accomplished in this facility.
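The response path described in this section can be summarized as a short decision function. The Python sketch below is one reading of the description, not an official procedure; the `triage` function, its parameters and the step wording are assumptions for illustration.

```python
from typing import List, Optional

def triage(hardware_problem: bool, offender: Optional[str] = None) -> List[str]:
    """Return the ordered response steps for an alert or customer call.

    offender is "COE" for a College system, "external" for an outside
    system, or None while the source has not yet been identified.
    """
    steps = ["Review Nagios status"]
    if hardware_problem:
        steps.append("Repair hardware")
        return steps
    steps += ["Review MRTG data rates", "Review traffic analysis data"]
    if offender == "COE":
        steps.append("Find and repair or block the offending system")
    elif offender == "external":
        steps += [
            "Identify attack signature and source",
            "Place filter on router or server firewalls",
            "Gather forensic data",
            "Notify PSU security",
        ]
    return steps

EXTERNAL_RESPONSE = triage(hardware_problem=False, offender="external")
```

Writing the flow down this way makes the branch point explicit: hardware problems end at repair, while everything else passes through traffic review before any blocking or reporting.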
Network Management - System Recommendation & Cost
Item                      | Cost / Closet         | Total Quantity | Cost / Month | Total 1st Year Cost | Recurring Annual Cost
Closet Servers            | $1,285                | 18             |              | $23,130             |
1 Gig Memory Upgrade      | $239                  | 18             |              | $4,302              |
Data Probe iBoot          | $275                  | 15             |              | $4,125              |
USB Cameras               | $29                   | 17             |              | $439                |
TNS POTS Lines            | $125 install, $18/mo. | 17             | $306         | $5,797              | $3,672
Pagers                    | $1,155                | 3              | $7.50        | $1,177.50           | $22.50
OR Cell Phones            | $105                  | 3              | $90          | $1,185              | $1,080
Totals (with pagers)      | $3,126                |                | $313.50      | $38,970.50          | $3,694.50
Totals (with cell phones) | $2,076                |                | $396         | $38,978             | $4,752
Software Applications are currently provided as freeware.
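The totals can be cross-checked with a few lines of Python. The sketch below uses only the figures stated in the cost table; the variable names and the `totals` helper are illustrative. Line-item first-year costs are taken as given rather than recomputed from unit prices.

```python
from decimal import Decimal

# POTS lines: 17 lines, $125 installation each, $18 per month each.
POTS_LINES = 17
pots_recurring = POTS_LINES * Decimal("18") * 12
pots_first_year = POTS_LINES * Decimal("125") + pots_recurring

# (first-year cost, recurring annual cost) per line item, as stated in the table.
COMMON_ITEMS = {
    "Closet Servers": (Decimal("23130"), Decimal("0")),
    "1 Gig Memory Upgrade": (Decimal("4302"), Decimal("0")),
    "Data Probe iBoot": (Decimal("4125"), Decimal("0")),
    "USB Cameras": (Decimal("439"), Decimal("0")),
    "TNS POTS Lines": (pots_first_year, pots_recurring),
}
NOTIFICATION_OPTIONS = {
    "Pagers": (Decimal("1177.50"), Decimal("22.50")),
    "Cell Phones": (Decimal("1185"), Decimal("1080")),
}

def totals(option: str):
    """Return (first-year total, recurring annual total) for one notification option."""
    first, recurring = NOTIFICATION_OPTIONS[option]
    for item_first, item_recurring in COMMON_ITEMS.values():
        first += item_first
        recurring += item_recurring
    return first, recurring

PAGER_TOTALS = totals("Pagers")
CELL_TOTALS = totals("Cell Phones")
```

The two options differ by only $7.50 in the first year, but the cell phone option costs $1,057.50 more in each following year.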
Network Management - Summary & Conclusion
Summary
The distribution of COE networking resources requires a change in network management philosophies and architecture.
Loss of network connectivity to TNS’ backbone from Hammond west results in an indication that the entire COE network is down.
A new COE network management architecture includes hardware and software solutions in all building closets where routers connect to TNS’ backbone.
A minimal hardware architecture design and applications are presented.
Applications for critical network hardware status monitoring, data traffic and protocol/packet analysis are imperative elements in network management.
A concept for operational procedures ties together the concept for integrated COE network management.
Analysis and control is remotely accessible via web or Telnet sessions.
Conclusion
The Center is in need of a network management structure that provides accurate early notification of hardware status and data traffic loads.
A minimal and cost effective network management architecture is proposed.
The proposed solution addresses all the basic needs for sound and qualitative network management.
The proposed network management architecture eliminates a single point of failure.
The proposed operational concept provides a formal structure for network management.
The proposed operational structure enables timely and accurate use of personnel resources to resolve problems.
Cell phones for three primary individuals are preferable to pagers; the initial cost is lower but recurring costs are higher. However, cell phones provide greater versatility.
Network Management
Addendum
Additional Reporting Details
16 September 2002
Network Management - Nagios
Resource Monitoring Reports
“Tactical” Overview of Resource Availability
Status Reports - Instantaneous & Historical
  Availability Overview
  Summary
  Network Grid Availability
  Network Mapping
  3-D Mapping
Troubleshooting Reports
  Identification & Alerting to Service Problems
  Identification & Alerting to Network Outages
  Notification of Data Traffic Trends
Others
  Availability of each Monitored Element
  History of Alerts Related to each Monitored Element
  Anomaly Notifications
  Host Resource Monitoring – Processor Loads, Disk & Memory Usage, Running Processes & Log Files
  Hierarchical Detection & Notification and Distinction of Services that are Down vs. Unreachable
  Escalation of Host and Services Notifications to Different Contact Groups.
Network Management - MRTG
Monitoring Traffic Reports
“The Multi Router Traffic Grapher (MRTG) is a tool to monitor the traffic load on network-links. MRTG generates HTML pages containing graphical images which provide a LIVE visual representation of this traffic.”
Instantaneous monitoring of daily port “in” and “out” data traffic loads, viewable with the web interface.
Histogram reports of cumulative port “in” and “out” data traffic loads, viewable with the web interface: weekly, monthly and yearly.
MRTG logfile information is available for use in user-developed analysis programs.
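As an example of a user-developed analysis program, the Python sketch below reads MRTG log text and reports the peak rate. It assumes the documented MRTG log layout: a first line holding a timestamp and the two raw counters, followed by lines of timestamp, average in, average out, maximum in and maximum out, with rates in bytes per second. The sample data and function names are hypothetical.

```python
from typing import List, NamedTuple

class Sample(NamedTuple):
    timestamp: int
    avg_in: int
    avg_out: int
    max_in: int
    max_out: int

def parse_mrtg_log(text: str) -> List[Sample]:
    """Parse MRTG log text into samples, skipping the leading counter line."""
    rows = [line.split() for line in text.strip().splitlines()]
    samples = []
    for fields in rows[1:]:
        if len(fields) != 5:
            continue  # ignore lines that do not match the assumed layout
        samples.append(Sample(*(int(value) for value in fields)))
    return samples

def peak_bits_per_second(samples: List[Sample]) -> int:
    """Return the highest recorded in or out rate, converted to bits per second."""
    return max(max(s.max_in, s.max_out) for s in samples) * 8

# Hypothetical log contents for one port.
LOG_TEXT = """1031580000 123456789 987654321
1031580000 1200 800 1500 900
1031579700 1100 750 40000 850
"""
SAMPLES = parse_mrtg_log(LOG_TEXT)
PEAK = peak_bits_per_second(SAMPLES)
```

A script like this could run on the management server and feed its results into the same alerting path used for hardware status.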
Network Management - Ethereal
Traffic Analysis & Filtering Reports
Ethereal Supports Data Capturing and Analysis of 289 Protocols.
When set up to Capture Packets of Selected Protocols, the Software Captures and Analyzes Content Packet by Packet.
Captured Data is Retained in a Database for Periodic Review or Tracking Down Protocols that Contain Problem Packets.
Format and Content of Captured Data Depends on which of the 289 Protocols are being Analyzed by Ethereal.
Ethereal is a Continuously Running Process Whose Output can be Saved or Printed for Human Analysis of Detected Anomalies.
Protocols not of Interest can be Filtered to Prevent Capturing Excess Data that Would Cloud the Analysis Process.
Manual Data Analysis of Packets Tagged as Having Bogus Signatures Leads to Identification of Denial of Service or Hacker Systems’ Addresses for Router Filters & Reporting to the University Security Office.