Lessons in Nagios learnt from developing Opsview
Ton Voon
Altinity Limited
September 2008
Copyright Ton Voon. Released under Creative Commons, Attribution-Noncommercial
Classified: Top secrets of Nagios in Opsview
“Opsview”?
Our monitoring solution
Database back end
Web front end
Configuration and status
Distributed
Open source
Why?
Obligation – GPLv2
http://trac.opsview.org/browser/trunk/opsview-base
Moral duty
Business benefit:
Support moves to core projects
Easier for us to upgrade base code
Advertise ourselves as experts
Distributed environments
Batch uploading to master
Nagios documentation suggests running the ocsp_command, which calls send_nsca, after every result
Batch up requests and send all at once
Implemented as service_perfdata_file_processing_command
nsca --single
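The batching idea can be sketched as follows. This is a minimal illustration, not the Opsview implementation: it assumes results have already been read from the perfdata file as (host, service, return code, output) tuples, and the master host name and config path are hypothetical. It relies only on send_nsca reading tab-delimited results from standard input, one per line.

```python
import subprocess


def format_nsca_batch(results, delim="\t"):
    """Turn (host, service, return_code, output) tuples into send_nsca input."""
    lines = []
    for host, service, code, output in results:
        # send_nsca expects one result per line, so flatten multi-line output.
        clean = output.replace("\n", " ")
        lines.append(delim.join([host, service, str(code), clean]))
    if not lines:
        return ""
    return "\n".join(lines) + "\n"


def send_batch(results,
               master="master.example.com",                       # hypothetical host
               config="/usr/local/nagios/etc/send_nsca.cfg"):     # assumed path
    """Send every queued result to the master with a single send_nsca call."""
    payload = format_nsca_batch(results)
    if not payload:
        return 0
    proc = subprocess.run(
        ["send_nsca", "-H", master, "-c", config],
        input=payload.encode(),
        check=False,
    )
    return proc.returncode
```

One send_nsca process per batch replaces one process per check result, which is where the saving comes from.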
Aggregated writes
Problem: if the Nagios command file is not available, NSCA writes to a dump file. But when the command file comes back, NSCA doesn’t switch back to it
Affects Nagios reloads
Solution: while the dump file is in use, keep checking for the command file
In NSCA CVS HEAD, but not released yet
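The fix described above amounts to re-evaluating the write target on every result instead of deciding once. A rough sketch of that logic, written in Python for illustration only (NSCA itself is C, and the function names here are hypothetical):

```python
import os


def choose_target(command_file, dump_file):
    """Pick the command file whenever it exists, otherwise the dump file.

    Called on every write, so the writer switches back as soon as the
    command file reappears (for example after a Nagios reload).
    """
    return command_file if os.path.exists(command_file) else dump_file


def write_result(line, command_file, dump_file):
    """Append one result line to whichever target is currently valid."""
    target = choose_target(command_file, dump_file)
    with open(target, "a") as handle:
        handle.write(line + "\n")
    return target
```

In a real deployment the command file is a named pipe; the sketch treats both targets as plain paths.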
Distributing CGI commands
Problem: Submitting a command via CGI on master should go down to slaves
Solution: broker module altinity_distributed_commands. For a selected list of external commands, writes to a cache file
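To illustrate the "selected list" idea, here is a small sketch that filters external command lines against an allow-list. The broker module itself is written against the Nagios event broker API in C; this Python version only shows the selection step, and the set of command names is a hypothetical choice, not the list Opsview uses.

```python
import re

# Hypothetical allow-list; the real selection is not given in the slides.
DISTRIBUTABLE = {
    "SCHEDULE_HOST_DOWNTIME",
    "SCHEDULE_SVC_DOWNTIME",
    "ACKNOWLEDGE_HOST_PROBLEM",
    "ACKNOWLEDGE_SVC_PROBLEM",
}

# External command format: [timestamp] COMMAND_NAME;arg1;arg2;...
COMMAND_RE = re.compile(r"^\[(\d+)\] ([A-Z_]+)(?:;(.*))?$")


def parse_command(line):
    """Split an external command line into (timestamp, name, args)."""
    match = COMMAND_RE.match(line.strip())
    if not match:
        return None
    timestamp, name, args = match.groups()
    return int(timestamp), name, args or ""


def select_for_slaves(lines, allowed=DISTRIBUTABLE):
    """Keep only the commands that should be cached for the slaves."""
    selected = []
    for line in lines:
        parsed = parse_command(line)
        if parsed and parsed[1] in allowed:
            selected.append(line.strip())
    return selected
```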
Distributable API commands
Some commands cannot be sent to slaves
DEL_HOST_SVC_DOWNTIME;hostname
Freshness calculations
Nagios adds an arbitrary 15 seconds to automatically calculated freshness thresholds
We set to 30 minutes
In Nagios 3: additional_freshness_latency
Included libtap tests
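A worked example of the arithmetic, assuming the automatic threshold is derived as check interval times interval_length, plus check latency, plus the additional latency (the function name and defaults below are illustrative, not Nagios code):

```python
def auto_freshness_threshold(check_interval, interval_length=60,
                             latency=0.0, additional_latency=15):
    """Seconds a result may age before it is considered stale.

    Applies only when no explicit freshness_threshold is configured;
    additional_latency stands in for additional_freshness_latency.
    """
    return check_interval * interval_length + latency + additional_latency
```

For a service checked every 5 minutes with no latency, this gives 315 seconds with the default padding and 2100 seconds with the padding raised to 30 minutes (1800 seconds), which leaves a slave far more time to deliver its results to the master.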
NRPE
Centrally managed agents
Set allowed_hosts to blank
Commands defined to pass arguments from check_nrpe
command[check_disk]=/usr/local/nagios/libexec/check_disk $ARG1$
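With a definition like the one above, the thresholds live on the Nagios server and travel to the agent as a single argument. A minimal sketch of how the server-side call might be assembled, assuming the agent accepts command arguments (NRPE needs dont_blame_nrpe=1 for that) and that check_nrpe is installed at the path shown, which is an assumption:

```python
def check_nrpe_argv(host, command, args="",
                    plugin="/usr/local/nagios/libexec/check_nrpe"):  # assumed path
    """Build the argument vector for a check_nrpe call.

    Everything in `args` is passed as one argument, so it arrives on the
    agent as $ARG1$ and is expanded into the plugin's command line there.
    """
    argv = [plugin, "-H", host, "-c", command]
    if args:
        argv += ["-a", args]
    return argv
```

For example, `check_nrpe_argv("db1", "check_disk", "-w 10% -c 5% -p /")` yields a call whose final argument is the whole threshold string, so changing a threshold never requires touching nrpe.cfg on the monitored host.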
Increased output
Backwards compatible
Increases to whatever the nrpe server is compiled with
[Diagram, shown twice: check_nrpe on the Nagios server talking to nrpe on the remote host]
Nagios
Slicing services in CGI
Problem: Only want to show services specific to a user, not all services on the host
Solution: Removed the authorization check that lets a user see every service on a host they are authorized for
Initial states
Problem: Services and hosts go into a PENDING state. But this affects reporting and state changes because there has never been a result received. Also, there’s no entry in nagios_hoststatus/nagios_servicestatus
Solution: Create a broker module to send an UP/OK for all hosts/services
Changing command based on timeperiod
Could do via Nagios API
But requires external process to submit
Can now do via configuration:
define service {
    ...
    check_timeperiod_command workhours,command_name
}
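For comparison, the API route mentioned above needs something outside Nagios (cron, for instance) to submit a CHANGE_SVC_CHECK_COMMAND external command at each timeperiod boundary. A sketch of such a helper follows; the command file path is an assumption and the host, service and command names in any call are up to the caller.

```python
import time


def change_check_command(host, service, command, now=None):
    """Format a CHANGE_SVC_CHECK_COMMAND external command line."""
    timestamp = int(time.time() if now is None else now)
    return "[{}] CHANGE_SVC_CHECK_COMMAND;{};{};{}\n".format(
        timestamp, host, service, command)


def submit(line, command_file="/usr/local/nagios/var/rw/nagios.cmd"):  # assumed path
    """Write the command to the Nagios external command file (a named pipe)."""
    with open(command_file, "w") as pipe:
        pipe.write(line)
```

The configuration directive removes the need for this external scheduler entirely.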
Notification logic performance tuning
Problem: 100% CPU!
Solution: strace on nagios showed lots of time spent in notification logic, calculating macros
NDOutils
Case insensitive object names
nagios_objects is the key table
But name1, name2 are case insensitive
HostnameA and hostnamea are the same host in NDO
But not in Nagios
ALTER TABLE nagios_objects MODIFY name1 varchar(128) COLLATE latin1_bin
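The effect of the collation change can be demonstrated with a stand-in. The sketch below uses SQLite (not MySQL) purely because it runs anywhere: its NOCASE collation plays the role of a case-insensitive MySQL collation and BINARY plays the role of latin1_bin. The table is a hypothetical one-column imitation of nagios_objects.

```python
import sqlite3


def matching_rows(collation):
    """Count rows matching 'hostnamea' under the given column collation."""
    if collation not in ("NOCASE", "BINARY"):
        raise ValueError("unsupported collation: " + collation)
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE objects (name1 TEXT COLLATE {})".format(collation))
    conn.executemany(
        "INSERT INTO objects (name1) VALUES (?)",
        [("HostnameA",), ("hostnamea",)],
    )
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM objects WHERE name1 = 'hostnamea'").fetchone()
    conn.close()
    return count
```

With the case-insensitive collation a lookup for one host also matches the other; with the binary collation each name matches only itself, which is how Nagios treats them.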
Indexing
MySQL generally uses only one index per table per query
Column order in multi-column indexes matters
(instance_id, service_object_id, start_time, start_time_usec)
(start_time, instance_id, service_object_id, start_time_usec)
Use EXPLAIN to work out how MySQL will tackle your query
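Why the ordering matters can be shown with a deliberately simplified model of the leftmost-prefix rule: an index helps only as far as its leading columns are constrained by the query. The sketch considers equality conditions only and ignores range scans and everything else the optimizer does, so treat it as an intuition aid and confirm with EXPLAIN.

```python
def usable_prefix(index_columns, equality_columns):
    """Return the leading index columns a query can use.

    Walks the index left to right and stops at the first column the
    query does not constrain.
    """
    prefix = []
    for column in index_columns:
        if column not in equality_columns:
            break
        prefix.append(column)
    return prefix


INDEX_A = ("instance_id", "service_object_id", "start_time", "start_time_usec")
INDEX_B = ("start_time", "instance_id", "service_object_id", "start_time_usec")
```

A query that filters only on start_time gets nothing from INDEX_A but can use the first column of INDEX_B; a query that filters on instance_id and service_object_id is the reverse.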
Asynchronous imports
Problem:
broker modules are run synchronously
ndo2db also runs synchronously
Everything waits!
[Diagram, before: Nagios, ndomod, ndo2db, Disk]
[Diagram, after: Nagios, ndomod, import_ndologsd, ndo2db, DB, file2sock, Directory]
Asynchronous imports, 2
File IPC
Larger blocksize for file2sock
Host failures also rotate
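File-based IPC of this kind usually follows a spool-directory pattern: the producer finishes a file, moves it into place atomically, and a separate consumer drains the directory at its own pace. The sketch below shows that general pattern under assumed names; it is not the code of ndomod or import_ndologsd.

```python
import os


def spool_write(directory, name, data):
    """Write to a hidden temporary name, then rename into place.

    The rename is atomic on the same filesystem, so the consumer never
    sees a half-written file.
    """
    tmp_path = os.path.join(directory, "." + name + ".tmp")
    final_path = os.path.join(directory, name)
    with open(tmp_path, "w") as handle:
        handle.write(data)
    os.rename(tmp_path, final_path)
    return final_path


def spool_drain(directory, handler):
    """Process completed files in name order, deleting each when done."""
    processed = []
    for name in sorted(os.listdir(directory)):
        if name.startswith("."):
            continue  # still being written
        path = os.path.join(directory, name)
        with open(path) as handle:
            handler(handle.read())
        os.remove(path)
        processed.append(name)
    return processed
```

Because the producer only ever writes files, a slow or stopped database delays the import but does not block Nagios.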
Performance improvements
Strip unnecessary data being sent to NDO
Broker level
ndomod level
Helper tables - invoked at configdumpend
Multi valued inserts
Housekeeping external
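Of the items above, multi-valued inserts are the easiest to show in isolation: several rows go to the database in one INSERT statement instead of one statement per row. The sketch uses SQLite as a stand-in for MySQL, and the table and column names in any call are hypothetical; identifiers are interpolated directly, so they must come from trusted code, never from user input.

```python
import sqlite3


def multi_insert(conn, table, columns, rows):
    """Insert all rows with a single multi-valued INSERT statement."""
    if not rows:
        return 0
    row_placeholder = "(" + ", ".join("?" for _ in columns) + ")"
    sql = "INSERT INTO {} ({}) VALUES {}".format(
        table,
        ", ".join(columns),
        ", ".join(row_placeholder for _ in rows),
    )
    # Flatten the rows to match the flat list of placeholders.
    params = [value for row in rows for value in row]
    conn.execute(sql, params)
    return len(rows)
```

Each statement carries parsing and round-trip overhead, so folding many rows into one statement reduces that cost roughly in proportion to the batch size.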
Summary