Lessons in Nagios learnt from developing Opsview

Ton Voon

Altinity Limited

September 2008

Copyright Ton Voon. Released under Creative Commons, Attribution-Noncommercial

Classified: Top secrets of Nagios in Opsview

“Opsview”?

Our monitoring solution

Database back end

Web front end

Configuration and status

Distributed

Open source

Why?

Obligation – GPLv2

http://trac.opsview.org/browser/trunk/opsview-base

Moral duty

Business benefit:

Support moves to core projects

Easier for us to upgrade base code

Advertise ourselves as experts

Distributed environments

Batch uploading to master

Nagios documentation suggests calling the OCSP command (ocsp_command) after every result

send_nsca

Batch up requests and send all at once

Implemented as service_perfdata_file_processing_command

nsca --single
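A minimal sketch of the batching idea above, assuming results are collected as (host, service, return code, output) tuples. It only builds the tab-delimited payload that send_nsca reads on standard input; the function name and sample hosts are hypothetical, and the real implementation hooks in through service_perfdata_file_processing_command.

```python
def format_nsca_batch(results, delimiter="\t"):
    """Build one send_nsca payload from many service check results."""
    lines = []
    for host, service, return_code, output in results:
        # A newline inside the output would split one result into two records
        clean = output.replace("\n", " ")
        lines.append(delimiter.join([host, service, str(return_code), clean]))
    return ("\n".join(lines) + "\n") if lines else ""


# Hypothetical sample results, sent in one go instead of one call per result
batch = format_nsca_batch([
    ("web1", "HTTP", 0, "HTTP OK"),
    ("db1", "MySQL", 2, "CRITICAL - down"),
])
print(batch, end="")
```

The whole payload would then be piped to a single send_nsca invocation, so one connection carries many results.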

Aggregated writes

Problem: if the command file is not available, NSCA writes to a dump file. But when the command file comes back, NSCA doesn't switch back to it

Affects Nagios reloads

Solution: when the dump file is in use, keep checking for the command file

In NSCA CVS HEAD, but not released yet
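The fix described above can be sketched as a writer that re-checks for the command file on every write. This is an illustration only, not the NSCA code: the class name and paths are hypothetical, and a regular file stands in for the Nagios command file, which is really a named pipe.

```python
import os
import tempfile


class FallbackWriter:
    """Write to the command file when present, otherwise to a dump file."""

    def __init__(self, cmd_path, dump_path):
        self.cmd_path = cmd_path
        self.dump_path = dump_path

    def write(self, line):
        # Re-check on every write, so we switch back as soon as the
        # command file reappears (e.g. after a Nagios reload)
        if os.path.exists(self.cmd_path):
            target = self.cmd_path
        else:
            target = self.dump_path
        with open(target, "a") as handle:
            handle.write(line + "\n")
        return target


workdir = tempfile.mkdtemp()
cmd = os.path.join(workdir, "nagios.cmd")
dump = os.path.join(workdir, "nsca.dump")
writer = FallbackWriter(cmd, dump)

first = writer.write("result 1")   # command file missing: goes to the dump file
open(cmd, "w").close()             # command file comes back
second = writer.write("result 2")  # goes to the command file again
```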

Distributing CGI commands

Problem: Submitting a command via CGI on master should go down to slaves

Solution: broker module altinity_distributed_commands. For a selected list of external commands, writes to a cache file

Some commands cannot be sent to slaves

DEL_HOST_SVC_DOWNTIME;hostname

Distributable API commands
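A rough sketch of the filtering step, assuming commands arrive in the standard external command format "[timestamp] COMMAND;arguments". The allowed set below is a hypothetical selection, not the list used by altinity_distributed_commands, and the real module is a broker module rather than Python.

```python
# Hypothetical selection of commands that are safe to pass to slaves
DISTRIBUTABLE = {"SCHEDULE_HOST_DOWNTIME", "ACKNOWLEDGE_SVC_PROBLEM"}


def command_name(line):
    """Extract COMMAND from '[1220000000] COMMAND;arg1;arg2'."""
    _, _, rest = line.partition("] ")
    return rest.split(";", 1)[0]


def select_for_slaves(lines, allowed=DISTRIBUTABLE):
    """Keep only the commands that should be written to the cache file."""
    return [line for line in lines if command_name(line) in allowed]


submitted = [
    "[1220000000] ACKNOWLEDGE_SVC_PROBLEM;web1;HTTP;1;1;1;admin;looking into it",
    "[1220000001] RESTART_PROGRAM",
]
to_cache = select_for_slaves(submitted)
```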

Freshness calculations

Arbitrary 15 seconds

We set to 30 minutes

In Nagios 3: additional_freshness_latency

Included libtap tests
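To show why the grace period matters on a master that receives batched results, here is a simplified model of an automatically calculated freshness threshold. It assumes the threshold is the check interval plus latency plus the extra allowance; the function and its parameters are illustrative, not the Nagios source.

```python
def is_stale(now, last_check, check_interval, latency=0, extra_latency=15):
    """Simplified freshness test: all values are in seconds."""
    threshold = check_interval + latency + extra_latency
    return (now - last_check) > threshold


# A 5 minute check whose result reaches the master 10 minutes late
last = 1000
now = last + 300 + 600

stale_with_default = is_stale(now, last, check_interval=300)
stale_with_30_min = is_stale(now, last, check_interval=300, extra_latency=1800)
```

With the 15 second allowance the delayed result is treated as stale; with 30 minutes it is still considered fresh.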

NRPE

Centrally managed agents

Set allowed_hosts to blank

Commands defined to pass arguments from check_nrpe

command[check_disk]=/usr/local/nagios/libexec/check_disk $ARG1$
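The argument passing above can be illustrated with a small substitution routine. This is a sketch of the $ARGn$ expansion idea only, with a hypothetical function name; the NRPE daemon does this itself, and only when it is built and configured to accept command arguments (dont_blame_nrpe=1).

```python
import re


def expand_args(command_line, args):
    """Replace $ARG1$, $ARG2$, ... with the arguments sent by check_nrpe."""
    def replace(match):
        index = int(match.group(1)) - 1
        # Missing or out-of-range arguments expand to an empty string
        return args[index] if 0 <= index < len(args) else ""

    return re.sub(r"\$ARG(\d+)\$", replace, command_line)


definition = "/usr/local/nagios/libexec/check_disk $ARG1$"
expanded = expand_args(definition, ["-w 10% -c 5% -p /"])
```

Because the thresholds travel with the request, the agent configuration can stay identical on every host and be managed centrally.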

Increased output

Backwards compatible

Increases to whatever the nrpe server is compiled with

[Diagram: check_nrpe on the Nagios server, nrpe on the remote host]

Nagios

Problem: Only want to show services specific to a user, not all services on the host

Solution: Removed authentication that allows host

Slicing services in CGI

Initial states

Problem: Services and hosts go into a PENDING state. But this affects reporting and state changes because there has never been a result received. Also, there’s no entry in nagios_hoststatus/nagios_servicestatus

Solution: Create a broker module to send an UP/OK for all hosts/services
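Opsview does this in a broker module. As a swapped-in illustration of the same effect, the sketch below builds passive check results in the external command format, one UP per host and one OK per service. The function name, sample objects and the "Initial state" text are assumptions.

```python
def initial_results(hosts, services, now):
    """Build passive results that move objects out of PENDING."""
    lines = [
        f"[{now}] PROCESS_HOST_CHECK_RESULT;{host};0;Initial state"
        for host in hosts
    ]
    lines += [
        f"[{now}] PROCESS_SERVICE_CHECK_RESULT;{host};{service};0;Initial state"
        for host, service in services
    ]
    return lines


commands = initial_results(["web1"], [("web1", "HTTP")], now=1220000000)
for line in commands:
    print(line)
```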

Changing command based on timeperiod

Could do via Nagios API

But requires external process to submit

Can now do via configuration:

define service {
    ...
    check_timeperiod_command workhours,command_name
}
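One way to picture the behaviour, assuming check_timeperiod_command means "use this command while inside the named timeperiod, otherwise use the normal check command". That reading, the workhours definition and the command names are all assumptions for illustration.

```python
from datetime import datetime, time

# Hypothetical "workhours" timeperiod: Monday to Friday, 09:00 to 17:00
WORKHOURS = (time(9, 0), time(17, 0))


def pick_command(now, default_command, timeperiod_command, period=WORKHOURS):
    """Choose the check command based on whether 'now' is inside the period."""
    start, end = period
    in_period = now.weekday() < 5 and start <= now.time() < end
    return timeperiod_command if in_period else default_command


daytime = pick_command(datetime(2008, 9, 1, 10, 0), "check_basic", "check_strict")
evening = pick_command(datetime(2008, 9, 1, 22, 0), "check_basic", "check_strict")
```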

Notification logic performance tuning

Problem: 100% cpu!

Solution: strace on nagios showed lots of time spent in notification logic, calculating macros

NDOutils

Case insensitive object names

nagios_objects is the key table

But name1, name2 are case insensitive

HostnameA and hostnamea are the same host in NDO

But not in Nagios

ALTER TABLE nagios_objects MODIFY name1 varchar(128) COLLATE latin1_bin
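The effect of the collation change can be reproduced with SQLite from the Python standard library. SQLite is only a stand-in here: NOCASE plays the role of a case insensitive MySQL collation and BINARY plays the role of latin1_bin. Table and function names are illustrative.

```python
import sqlite3


def matching_hosts(collation):
    """Count rows matching 'hostnamea' under the given column collation."""
    conn = sqlite3.connect(":memory:")
    conn.execute(f"CREATE TABLE objects (name1 TEXT COLLATE {collation})")
    conn.executemany(
        "INSERT INTO objects (name1) VALUES (?)",
        [("HostnameA",), ("hostnamea",)],
    )
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM objects WHERE name1 = 'hostnamea'"
    ).fetchone()
    conn.close()
    return count


insensitive = matching_hosts("NOCASE")  # both hosts collapse into one name
sensitive = matching_hosts("BINARY")    # hosts stay distinct, as in Nagios
```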

Indexing

MySQL generally uses only one index per table per query

Multi column indexes have important ordering

(instance_id, service_object_id, start_time, start_time_usec)

(start_time, instance_id, service_object_id, start_time_usec)

Use EXPLAIN to work out how MySQL will tackle your query
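The column ordering point can be demonstrated with SQLite's EXPLAIN QUERY PLAN, used here as a stand-in for MySQL's EXPLAIN. The table name, the extra output column and the query are assumptions; the two index definitions are the ones listed above.

```python
import sqlite3

COLUMNS = "instance_id, service_object_id, start_time, start_time_usec"
QUERY = (
    "SELECT output FROM checks WHERE instance_id = 1 "
    "AND service_object_id = 5 AND start_time > 1000"
)


def plan_for(index_columns):
    """Return the query plan text when only the given index exists."""
    conn = sqlite3.connect(":memory:")
    conn.execute(f"CREATE TABLE checks ({COLUMNS}, output)")
    conn.execute(f"CREATE INDEX idx ON checks ({index_columns})")
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY).fetchall()
    conn.close()
    # The human readable detail is the last column of each plan row
    return " ".join(row[-1] for row in rows)


equality_first = plan_for(
    "instance_id, service_object_id, start_time, start_time_usec"
)
range_first = plan_for(
    "start_time, instance_id, service_object_id, start_time_usec"
)
print(equality_first)
print(range_first)
```

With the equality columns first, the plan can narrow on all three conditions. With start_time first, the range condition comes first and the index stops helping for the remaining columns.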

Asynchronous imports

Problem:

broker modules are run synchronously

ndo2db also runs synchronously

Everything waits!

[Diagram, before: Nagios, ndomod, ndo2db, Disk]

[Diagram, after: Nagios, ndomod, import_ndologsd, ndo2db, DB, file2sock, Directory]

Asynchronous imports, 2

File IPC

Larger blocksize for file2sock

Host failures also rotate
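The file-based hand-off can be sketched as a spool directory: the producer drops complete files and returns immediately, and a separate importer drains them in order. This is a sketch of the pattern only, with hypothetical function and file names; it is not the ndomod or import_ndologsd code.

```python
import os
import tempfile


def spool(directory, name, payload):
    """Producer side: write a complete file without waiting for the importer."""
    # Write under a hidden temporary name, then rename, so the importer
    # never picks up a half-written file
    tmp = os.path.join(directory, "." + name + ".tmp")
    with open(tmp, "w") as handle:
        handle.write(payload)
    os.rename(tmp, os.path.join(directory, name))


def import_pending(directory, sink):
    """Importer side: send each spooled file to the sink, oldest name first."""
    imported = []
    for name in sorted(os.listdir(directory)):
        if name.startswith("."):
            continue  # still being written
        path = os.path.join(directory, name)
        with open(path) as handle:
            sink(handle.read())
        os.remove(path)
        imported.append(name)
    return imported


spool_dir = tempfile.mkdtemp()
spool(spool_dir, "0001.log", "first batch")
spool(spool_dir, "0002.log", "second batch")

received = []
done = import_pending(spool_dir, received.append)
```

If the importer or database is slow, files simply accumulate in the directory instead of blocking the producer.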

Performance improvements

Strip unnecessary data being sent to NDO

Broker level

ndomod level

Helper tables - invoked at configdumpend

Multi valued inserts

Housekeeping external
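For the multi valued inserts item above, a minimal sketch of building one INSERT that carries several rows, run against an in-memory SQLite database as a stand-in for MySQL. The table, columns and helper name are hypothetical.

```python
import sqlite3


def multi_insert_sql(table, columns, row_count):
    """Build a single INSERT statement with one placeholder group per row."""
    row = "(" + ", ".join("?" for _ in columns) + ")"
    groups = ", ".join([row] * row_count)
    return f"INSERT INTO {table} ({', '.join(columns)}) VALUES {groups}"


rows = [(1, "web1"), (2, "db1"), (3, "mail1")]
sql = multi_insert_sql("hosts", ["id", "name"], len(rows))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hosts (id INTEGER, name TEXT)")
# One statement and one round trip instead of three
flat = [value for row in rows for value in row]
conn.execute(sql, flat)
(inserted,) = conn.execute("SELECT COUNT(*) FROM hosts").fetchone()
conn.close()
```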

Summary