IT Monitoring WG: IT/CS Monitoring System

CERN IT Department, CH-1211 Genève 23, Switzerland, www.cern.ch/it. IT Monitoring WG: IT/CS Monitoring System. Virginie Longo, September 14th 2011

Page 1: IT Monitoring WG IT/CS Monitoring System

IT Monitoring WG

IT/CS Monitoring System

Virginie Longo September 14th 2011

Page 2: IT Monitoring WG IT/CS Monitoring System

Summary

CS Monitoring Systems
• Spectrum CA
• Performance Analysis
• Other Tools

Data storage

Requirements
• NMS Status
• Requirements
• Research

Page 3: IT Monitoring WG IT/CS Monitoring System

CS Monitoring systems

Page 4: IT Monitoring WG IT/CS Monitoring System

Spectrum CA

Description:
• Commercial tool
• Fault-management-oriented system
• Root cause analysis / alarm correlation
• Topology view
• Service Manager => relation with SLS view
• Basic performance manager

Volumes:
• ~3000 devices monitored
• Supports 3K Laser devices for simple alarms (UP/DOWN)
• Thousands of attributes polled and analyzed
• 6 GB of event data over 30 days

Monitoring protocols:
• SNMP and ICMP
  => Information fed only by SNMP (no remote agent)
• A few others supported: DNS / DHCP / TRACEROUTE / NTP / HTTP
• A few home-made scripts for DHCP and web monitoring
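The idea behind root cause analysis / alarm correlation on a topology can be illustrated with a minimal sketch. This is not Spectrum's actual algorithm: the topology, the device names and the `root_causes` function are hypothetical. It assumes each device has a single upstream parent, and treats a DOWN device as a symptom when its parent is also DOWN.

```python
def root_causes(parent, down):
    """Return the DOWN devices whose upstream parent is not DOWN.

    parent: dict mapping device -> upstream device (None for the root).
    down:   set of devices currently failing their UP/DOWN poll.
    """
    causes = set()
    for device in down:
        upstream = parent.get(device)
        # A device is reported only if nothing directly upstream of it is down.
        if upstream is None or upstream not in down:
            causes.add(device)
    return causes


# Hypothetical topology: router -> switch -> two access points.
parent = {"router-1": None, "switch-1": "router-1",
          "ap-1": "switch-1", "ap-2": "switch-1"}
down = {"switch-1", "ap-1", "ap-2"}
print(sorted(root_causes(parent, down)))
```

In this example only the switch is reported; the two access-point alarms are suppressed as consequences of the switch failure.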

Page 5: IT Monitoring WG IT/CS Monitoring System

Alarm Monitoring

Spectrum Architecture (Storage system)

[Diagram. Labels: Spectrum DB (models, topology, current polling values, alarms); SNMP; SSLogger; Oracle: Stats (CSR); Oracle: Alarm History (LANDB); Alarm Notifier; MySQL: Events; Remote MySQL; Service Manager; SLS; Devices Info. The diagram distinguishes the Spectrum system from non-Spectrum systems.]

Page 6: IT Monitoring WG IT/CS Monitoring System

Performance Analysis

Statistics architecture:
- Mix of a home-made system and the Spectrum tool
- Data extracted from Spectrum into an Oracle DB
- Data consolidated into RRDs
- Displayed on the Netstat website (PHP)

Volumes:
- ~9000 models (ports + devices) for 24K RRDs
- 36 metrics
- 157 attributes
- ~160K entries loaded into the Oracle DB per 5-minute poll
- Data kept 1 month in Oracle
- 2 years of consolidated data in RRDs

Note: a metric is a group of attributes, e.g. Bandwidth = in/out bits and in/out packets.
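The volume figures imply a rough row count for the Oracle side. The following is back-of-the-envelope arithmetic only, assuming exactly 160,000 entries per poll, a fixed 5-minute interval and a 30-day month. The attribute names in `METRICS` are hypothetical and only illustrate the metric/attribute grouping.

```python
ENTRIES_PER_POLL = 160_000   # ~160K entries per poll (from the slide)
POLL_INTERVAL_MIN = 5        # one poll every 5 minutes
RETENTION_DAYS = 30          # "1 month" taken as 30 days (assumption)

polls_per_day = 24 * 60 // POLL_INTERVAL_MIN      # polls in one day
rows_per_day = ENTRIES_PER_POLL * polls_per_day   # entries loaded per day
rows_retained = rows_per_day * RETENTION_DAYS     # entries held at any time

# A metric groups several attributes; the names below are hypothetical.
METRICS = {"Bandwidth": ["in_bits", "out_bits", "in_packets", "out_packets"]}

print(f"{polls_per_day} polls/day, {rows_per_day:,} rows/day, "
      f"{rows_retained:,} rows retained")
```

Under these assumptions the Oracle DB holds on the order of a billion entries at any time, which is why older data lives only in consolidated RRDs.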

Page 7: IT Monitoring WG IT/CS Monitoring System

Performance Analysis

Page 8: IT Monitoring WG IT/CS Monitoring System

Other Tools

Syslog event recording:
- Gathers all logs from network devices
- Stored in an Oracle DB
- Accessible from CSDB
- Filtering and propagation by notification
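As an illustration of filtering before notification, here is a minimal sketch that decodes the standard syslog priority value (PRI = facility × 8 + severity) and keeps only messages at or above a chosen severity. The threshold, the sample messages and the function names are assumptions, not the actual CS filter rules.

```python
import re

PRI_RE = re.compile(r"^<(\d{1,3})>")


def decode_pri(message):
    """Return (facility, severity) from a syslog line's <PRI> prefix, or None."""
    match = PRI_RE.match(message)
    if match is None:
        return None
    pri = int(match.group(1))
    return divmod(pri, 8)  # facility = pri // 8, severity = pri % 8


def should_notify(message, max_severity=3):
    """True if severity is 'error' (3) or more urgent (lower number)."""
    decoded = decode_pri(message)
    return decoded is not None and decoded[1] <= max_severity


# Hypothetical sample lines.
lines = ["<187>switch-1: %LINK-3-UPDOWN: Interface Gi0/1, changed state to down",
         "<190>switch-1: %SYS-6-LOGGINGHOST: logging started"]
for line in lines:
    print(decode_pri(line), should_notify(line))
```

Lines without a `<PRI>` prefix are never notified in this sketch; a real filter would need a policy for them.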

LHCOPN: Perfsonar tool
- Decentralized network tool
- Regular OWD (one-way delay), latency and throughput tests
- Other tools such as traceroute
- LHCOPN network analysis

Implementation ongoing; testing phase with a 1 Gb link; security tests not yet complete. (www.perfsonar.net)

Page 9: IT Monitoring WG IT/CS Monitoring System

Data storage

Page 10: IT Monitoring WG IT/CS Monitoring System

Data Storage

Summary:
• Spectrum proprietary DBs for core and alarms
• MySQL database for events and Service Manager
• Oracle database for stats (CSR) and alarm history (LANDB)
• Oracle database for Syslog info
• Standalone MySQL database for Perfsonar tools

=> Too many different types of storage.
=> Missing correlation between Syslog and SNMP.
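One way the missing Syslog/SNMP correlation could be approached is to join the two sources on device name within a time window. The sketch below works under those assumptions only; the record layout, the 2-minute window and the sample data are hypothetical and do not reflect the LANDB or Syslog schemas.

```python
from datetime import datetime, timedelta


def correlate(alarms, syslogs, window=timedelta(minutes=2)):
    """Pair each SNMP alarm with syslog messages from the same device
    whose timestamps fall within `window` of the alarm time.

    Both inputs are lists of (timestamp, device, text) tuples.
    """
    pairs = []
    for alarm_time, alarm_device, alarm_text in alarms:
        related = [text for log_time, log_device, text in syslogs
                   if log_device == alarm_device
                   and abs(log_time - alarm_time) <= window]
        pairs.append((alarm_text, related))
    return pairs


# Hypothetical sample data.
t0 = datetime(2011, 9, 14, 10, 0, 0)
alarms = [(t0, "switch-1", "DEVICE DOWN")]
syslogs = [(t0 - timedelta(seconds=30), "switch-1", "power supply failure"),
           (t0 - timedelta(minutes=10), "switch-1", "config saved"),
           (t0, "router-1", "BGP neighbor up")]
print(correlate(alarms, syslogs))
```

With a single shared store the same join could be expressed as a query; with the current split across Oracle, MySQL and proprietary DBs it requires extracting both sides first.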

Page 11: IT Monitoring WG IT/CS Monitoring System

Requirements

Page 12: IT Monitoring WG IT/CS Monitoring System

NMS Status

• Advantages:
- Efficient root cause analysis
- Correct event and alarm management
- High availability
- Really good topology views (useful for the intervention group)
- Supports NICE users
- Very good level of filtering (topology, alarms)
- Notification support

• Negative points / weaknesses:
- Expensive
- Polling limit is almost reached (a new version with a complete redesign of the polling system is expected in 2 years)
- Not a performance system: can't handle 50K statistics
- Integration of non-certified manufacturers is complex
- Data collection mostly limited to SNMP (changes ongoing)

Page 13: IT Monitoring WG IT/CS Monitoring System

Requirements

Mandatory:
- Root cause analysis
- High polling rate: 1-2 min for critical nodes, 3-5 min for others
- Network topology representation
- Notifications (SMS / mail / XMPP) and general console
- Distributed environment
- High-availability system
- Complete performance management
- IPv6 support

Nice to have:
- Autodiscovery system
- Mobile version
- Oracle centralized database

Numbers and storage time:
- Polling capacity for at least 5K nodes
- Performance statistics for 56K ports
- Data lifetime: 1 month without aggregation, maximum with aggregation
- Device alarms: around 2 years
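The polling intervals and target counts translate into an average poll rate the system must sustain. The arithmetic below is a sketch under simplifying assumptions: polls are spread evenly over the interval, and the two cases shown (every node polled each minute, every port polled every 5 minutes) are illustrative bounds rather than a stated design.

```python
def polls_per_second(targets, interval_minutes):
    """Average poll rate needed to visit every target once per interval."""
    return targets / (interval_minutes * 60)


NODES = 5_000    # "at least 5K nodes" (from the slide)
PORTS = 56_000   # "56K ports" (from the slide)

# Upper bound for status polling: every node treated as critical (1 min).
node_rate = polls_per_second(NODES, 1)
# Performance statistics for all ports at the slowest stated interval (5 min).
port_rate = polls_per_second(PORTS, 5)

print(f"nodes: {node_rate:.1f} polls/s, ports: {port_rate:.1f} polls/s")
```

Each port poll typically covers several attributes, so the number of individual values collected per second is a multiple of the port rate.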

Page 14: IT Monitoring WG IT/CS Monitoring System

Research

Tools that fit the requirements better:
• Icinga: Nagios-like (fork) (not yet tested, NYT)
• Zabbix: large polling scale, open source, notification, Oracle database, distributed (NYT) (http://www.zabbix.com/features.php)
• SolarWinds: commercial, but includes performance and is less expensive (NYT)
• OpenNMS:
  - Open source, completely customizable
  - High polling rate with distributed environment
  - Event correlation, alarm management, notification
  - Many data collection methods supported (SNMP, HTML, JMX, JDBC, NAGIOS-NSCLIENT)
  (http://www.opennms.org/about/)

Links:
• http://en.wikipedia.org/wiki/Comparison_of_network_monitoring_systems
• http://www.slac.stanford.edu/xorg/nmtf/nmtf-tools.html

Page 15: IT Monitoring WG IT/CS Monitoring System

Thanks. Questions?