Page 1: Nagios Core 4 - NETWAYS in Nagios Core query: “@ \0” ... commands are always handled ... Andreas Ericsson Created Date:

Nagios Core 4 News and improvements

About me

32 years old

Programming since I was seven

Work as “core architect” at op5

Nagios Core co-maintainer since 2009

Will be found at the bar in the evenings

About op5

http://www.op5.com

Nagios Core 4

Goals

Algorithm analysis crash course

Bottleneck analysis of Nagios Core 3

Bottleneck solutions in Nagios Core 4

New features

Future possibilities

Nagios Core 4 goals

Stability

low complexity

testing

Scalability

efficient, reusable, well-tested algorithms

efficient resource usage

Simplicity

useful APIs

no “magic” and no bloat

Algorithm analysis – big Oh

n = 100, one operation = 1 microsecond

O(1) = 0.000001 second

O(lg n) = 0.0000046 seconds

O(n) = 0.0001 second

O(n * lg n) = 0.00046 seconds

O(n^2) = 0.01 second

O(2^n) = 4*10^16 years

O(n!) = 2.96*10^144 years

Conclusion:

Good algorithms > beefy hardware
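
The two rows measured in years can be checked by multiplying the operation count by one microsecond. A minimal sketch, assuming a 365-day year; the function names are made up for this illustration and are not part of Nagios:

```c
#include <stdio.h>

/* Assumption: a year is 365 days when converting seconds to years. */
#define SECONDS_PER_YEAR (365.0 * 24.0 * 60.0 * 60.0)

/* 2^n as a double (exact for n = 100, since it is a power of two). */
double power_of_two(unsigned n)
{
    double r = 1.0;

    while (n--)
        r *= 2.0;
    return r;
}

/* n! as a double (an approximation for large n). */
double factorial(unsigned n)
{
    double r = 1.0;
    unsigned i;

    for (i = 2; i <= n; i++)
        r *= (double)i;
    return r;
}

/* Convert an operation count to years at one microsecond per operation. */
double ops_to_years(double ops)
{
    return ops * 1e-6 / SECONDS_PER_YEAR;
}

/* Print the O(2^n) and O(n!) rows for a given n. */
void print_rows_in_years(unsigned n)
{
    printf("O(2^n) = %.2e years\n", ops_to_years(power_of_two(n)));
    printf("O(n!)  = %.2e years\n", ops_to_years(factorial(n)));
}
```

Calling print_rows_in_years(100) reproduces the order of magnitude of the last two rows.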

Algorithm analysis – big Oh

[Chart: O(lg n), O(n) and O(n^2) plotted for n = 1 to 10; vertical axis from 0 to 120]

I/O media comparison

HDD seektime: 5.11ms

SSD seektime: 0.24ms

RAM seektime: 0.000013ms (13ns)

SSD is 21.3 times faster than HDD

RAM is 393077 times faster than HDD

RAM is 18461 times faster than SSD

Conclusion

All types of disk access are bad

Bottleneck analysis - Test setup

3000 hosts

200 000 services

5 minute check interval

really (really) stupid plugin: check_aok

Nagios 3 bottlenecks

configuration parsing

event queue insertion

add_event() runs in O(n) time 676 times per second, but the lower bound is O(lg n)

macro resolution

strcmp() ~3700 times/sec to handle checks

job spawning and check reaping

heavy on cache-line fills and disk I/O

insufficient parallelization

Nagios 3 check flowchart

Nagios reads scheduling queue

Nagios fork()s a child

child writes half a checkresult file

child fork()s and runs shell

shell parses commandline

shell fork()s and runs plugin

child reads status and output

child completes checkresult file

child creates an “ok-to-read” file

child exits

Nagios reads spooldir

Nagios finds a checkresult file

“ok-to-read”?

read the file (cache miss)

Nagios parses checkresult

remove result and “ok-to-read”

Nagios 3 check flowchart - hotspots

[Same flowchart as on the previous slide, with the hotspots highlighted]

Config parsing solution

depth-first search for host and service dependencies

O(n^2) -> O(n): 400000000 -> 20000 operations for 20000 dependencies

group members no longer duplicated

Verify exactly once

Effect: Nagios loads configurations really fast
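
A depth-first search can verify dependencies for circular paths while visiting every object and every dependency only once. The sketch below assumes dependencies are stored as an array of pointers per object; the struct and function names are hypothetical and do not mirror Nagios Core's own data structures:

```c
#include <stddef.h>

enum visit_state { UNVISITED = 0, IN_PROGRESS, DONE };

/* Hypothetical object: one host or service and the objects it depends on. */
struct dep_node {
    struct dep_node **deps;   /* array of dependencies */
    size_t num_deps;
    enum visit_state state;
};

/*
 * Returns 1 if a circular dependency is reachable from 'node', else 0.
 * Nodes marked DONE are never walked again, so checking all nodes in
 * turn touches each node and each dependency edge once.
 */
int has_circular_dependency(struct dep_node *node)
{
    size_t i;

    if (node->state == IN_PROGRESS)
        return 1;   /* we came back to a node on the current path */
    if (node->state == DONE)
        return 0;   /* already verified, nothing more to do */

    node->state = IN_PROGRESS;
    for (i = 0; i < node->num_deps; i++) {
        if (has_circular_dependency(node->deps[i]))
            return 1;
    }
    node->state = DONE;
    return 0;
}
```

The recursion keeps the sketch short; a very deep dependency chain would call for an explicit stack instead.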

Event queue solution

Move to priority queue on binary heap

Insertion: O(n) -> O(lg n)

Extract: O(1) -> O(lg n)

43000000 -> 9460 operations per second

Effect: Main nagios process uses (a lot) less CPU

Kudos: libpqueue author Volkan Yazici
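
To show why both operations become O(lg n), here is a minimal array-based binary min-heap keyed on the time an event should run. It is an illustration only: the names and the fixed capacity are assumptions, and this is not the libpqueue API.

```c
#include <stddef.h>

#define HEAP_CAPACITY 1024   /* fixed size keeps the sketch short */

/* Hypothetical event queue: a min-heap of run times (in seconds). */
struct event_heap {
    long when[HEAP_CAPACITY];
    size_t size;
};

static void swap_long(long *a, long *b)
{
    long t = *a;

    *a = *b;
    *b = t;
}

/* Insert in O(lg n): append, then sift up. Returns 0, or -1 if full. */
int heap_insert(struct event_heap *h, long when)
{
    size_t i;

    if (h->size == HEAP_CAPACITY)
        return -1;
    i = h->size++;
    h->when[i] = when;
    while (i > 0 && h->when[(i - 1) / 2] > h->when[i]) {
        swap_long(&h->when[(i - 1) / 2], &h->when[i]);
        i = (i - 1) / 2;
    }
    return 0;
}

/* Extract the earliest event in O(lg n): move the last entry to the
 * root, then sift down. Returns 0, or -1 if the heap is empty. */
int heap_extract(struct event_heap *h, long *when)
{
    size_t i = 0;

    if (h->size == 0)
        return -1;
    *when = h->when[0];
    h->when[0] = h->when[--h->size];
    for (;;) {
        size_t left = 2 * i + 1, right = left + 1, smallest = i;

        if (left < h->size && h->when[left] < h->when[smallest])
            smallest = left;
        if (right < h->size && h->when[right] < h->when[smallest])
            smallest = right;
        if (smallest == i)
            break;
        swap_long(&h->when[i], &h->when[smallest]);
        i = smallest;
    }
    return 0;
}
```

Each loop moves one level up or down the tree, so the work per operation is bounded by the height of the heap.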

Macros solution

Macro names sorted on startup

Lookups: O(n) -> O(lg n)

65360 -> 3010 operations per second

Effect: Main nagios process uses less CPU

Todo: Cache resolved check commands (when configured to)
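
The idea of sorting once and then searching in O(lg n) can be sketched with the standard qsort() and bsearch() functions. The table layout and function names below are assumptions for illustration; Nagios Core's real macro tables look different:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical macro table entry. */
struct macro_entry {
    const char *name;
    int code;
};

/* Order entries by macro name; shared by qsort() and bsearch(). */
static int compare_entries(const void *a, const void *b)
{
    const struct macro_entry *ea = a, *eb = b;

    return strcmp(ea->name, eb->name);
}

/* Sort once at startup: O(n lg n). */
void macro_table_sort(struct macro_entry *table, size_t n)
{
    qsort(table, n, sizeof(*table), compare_entries);
}

/* Look up a name with O(lg n) string comparisons.
 * Returns the entry's code, or -1 if the name is unknown. */
int macro_lookup(const struct macro_entry *table, size_t n, const char *name)
{
    struct macro_entry key;
    const struct macro_entry *hit;

    key.name = name;
    key.code = 0;
    hit = bsearch(&key, table, n, sizeof(*table), compare_entries);
    return hit ? hit->code : -1;
}
```

The table must be sorted with the same comparison function before the first lookup, otherwise bsearch() gives undefined results.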

Checks Solutions

Worker processes run all helper apps (checks, notification, eventhandlers)

fork()s/sec increased (800 with a 300MB process, 13900 with a 1MB process)

Effects:

Drastically reduced I/O load (100% -> 1%)

Drastically reduced CPU usage

Up to ~300000 checks / 5 minutes

Kudos: Sven Nierlein, William Leibzon & Jean Gabès

Worker processes breakdown

Workers are spawned by Nagios

Chosen in round-robin fashion

Workers communicate with Nagios using libnagios APIs exclusively

Todo:

Special-purpose workers calling in

Zero fork()s

Experimental implementation in op5 labs

Remote workers
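
Round-robin selection needs no more state than a cursor into the list of workers. A minimal sketch, with a hypothetical pool structure that only holds the bookkeeping needed for the choice (not Nagios Core's actual worker code):

```c
#include <stddef.h>

/* Hypothetical worker pool. */
struct worker_pool {
    size_t num_workers;
    size_t next;   /* index of the worker that gets the next job */
};

/* Returns the index of the worker to use and advances the cursor.
 * Returns (size_t)-1 if the pool has no workers. */
size_t pool_pick_worker(struct worker_pool *pool)
{
    size_t chosen;

    if (pool->num_workers == 0)
        return (size_t)-1;
    chosen = pool->next;
    pool->next = (pool->next + 1) % pool->num_workers;
    return chosen;
}
```

Each pick is O(1) and every worker gets the same share of jobs, regardless of how long the individual jobs take.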

Nagios 4 check flowchart - hotspots

Nagios reads scheduling queue

Nagios tells worker to run check

worker receives command

worker parses commandline

“Simple” commandline?

worker fork()s

shell fork()s

plugin runs

worker reads status and output

worker sends data back to Nagios

Nagios parses check result
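
The worker-side steps (fork, run the plugin, read status and output) can be sketched with plain POSIX calls. This is an illustration under simplifying assumptions, not the worker implementation in Nagios Core: the function name is hypothetical, only stdout is captured, there is no timeout, and interrupted reads are not retried.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run the program argv[0] with the given arguments and capture up to
 * buflen - 1 bytes of its stdout in buf (always NUL-terminated).
 * Returns the plugin's exit code, or -1 on error.
 */
int run_plugin(char *const argv[], char *buf, size_t buflen)
{
    int fds[2], status;
    size_t used = 0;
    ssize_t r;
    pid_t pid;

    if (buflen == 0 || pipe(fds) < 0)
        return -1;

    pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        /* child: send stdout into the pipe and become the plugin */
        close(fds[0]);
        dup2(fds[1], STDOUT_FILENO);
        close(fds[1]);
        execvp(argv[0], argv);
        _exit(127);   /* only reached if exec failed */
    }

    /* parent (the worker): read output until the plugin closes its end */
    close(fds[1]);
    while (used < buflen - 1 &&
           (r = read(fds[0], buf + used, buflen - 1 - used)) > 0)
        used += (size_t)r;
    buf[used] = '\0';
    close(fds[0]);

    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Because the result travels over a pipe rather than through a checkresult file, nothing in this path touches the disk.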

With special-purpose workers

Nagios reads scheduling queue

Nagios tells worker to run check

worker receives command

worker parses commandline

Voodoo

worker sends data back to Nagios

Nagios parses check result

Nagios 3 check flowchart - hotspots

[Same Nagios 3 flowchart as shown earlier, repeated for comparison with Nagios 4]

Check engine performance comparison

[Bar chart comparing Centreon, Icinga, Nagios 3, gearman, Shinken and Nagios 4; axis from 0 to 350000]

Nagios 4 – New features

Major:

libnagios

Query handler

NERD

Minor:

service parents

hourly_value + minimum_value

$CHECKSOURCE$

libnagios

iobroker – multiplexing library

iocache – bulk reading and writing

kvvec – key value vector handling

dkhash – dual-key hash api

bitmap – set-operations for large sets

squeue – fast scheduling queue

pqueue – priority queue (from Apache)

skiplist – previously in Nagios core only

nsock – simple socket library

libnagios – usage example

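    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <nagios/lib/libnagios.h>

    #define QH_SOCKET_PATH "/opt/monitor/var/nagios.qh"

    int main(int argc, char **argv)
    {
        int sd, r;
        char buf[4096];

        /* connect to the query handler's unix socket */
        sd = nsock_unix(QH_SOCKET_PATH, NSOCK_TCP | NSOCK_CONNECT);
        if (sd < 0) {
            printf("Failed to connect to '%s': %s: %m\n",
                   QH_SOCKET_PATH, nsock_strerror(sd));
            exit(1);
        }

        /* subscribe to a NERD channel; the query must end with a NUL byte */
        if (nsock_printf_nul(sd, "@nerd subscribe opathchecks") > 0) {
            /* copy everything the channel sends to stdout */
            while ((r = read(sd, buf, sizeof(buf))) > 0)
                write(fileno(stdout), buf, r);
        }

        close(sd);
        return 0;
    }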

Query handler

General purpose handler for addressable queries in Nagios Core

query: “@<address><SP><query>\0”

“echo” service built in

query_socket=/path/to/nagios.qh in nagios.cfg

Kudos for inspiration: Mathias Kettner
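
The query format above can be assembled with snprintf(). A small sketch, assuming the caller sends the returned number of bytes over the query socket; the function name is hypothetical and not part of libnagios:

```c
#include <stdio.h>

/*
 * Write "@<address> <query>" plus the terminating NUL into buf.
 * Returns the number of bytes to send (NUL included),
 * or 0 if the query does not fit in buf.
 */
size_t qh_format_query(char *buf, size_t buflen,
                       const char *address, const char *query)
{
    int len = snprintf(buf, buflen, "@%s %s", address, query);

    if (len < 0 || (size_t)len >= buflen)
        return 0;
    return (size_t)len + 1;   /* +1: the NUL byte is part of the query */
}
```

For example, qh_format_query(buf, sizeof(buf), "nerd", "subscribe servicechecks") fills buf with "@nerd subscribe servicechecks" followed by the NUL byte.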

NERD

Nagios Event Radio Dispatcher

Provides real-time data to outside addons

Can reduce I/O load of current addons

Queried as 'nerd' via query-handler

Example queries:

@nerd subscribe hostchecks

@nerd subscribe servicechecks

Todo: Macro support, 'alerts' channel

demo time :)

Other features

Service parents

servicedependencies made easy

hourly_value + minimum_value

$CHECKSOURCE$

Useful when adding remote checking modules

“make dox” and look in Documentation/html

Easter eggs / micro-features

The /dev/null hack

object_cache_file

status_file

nagios-devel package available

libnagios and Nagios Core headers

Addon status

Works:

mod_gearman

modpnp

livestatus (from http://github.com/ageric/livestatus)

merlin

Known bugs, issues and ToDo's

Host latency calculation is messed up

If use_aggressive_host_checks=1, on-demand host checks are still run synchronously

Environment macros are currently not supported

Deprecation notices

Command line

-o (don't verify objects) is removed and will throw an error

-x (don't verify object paths) is deprecated and will produce a warning

Deprecation notices, continued

Object configuration in nagios.cfg is now officially unsupported. Do not rely on it to work

Embedded perl has been removed

Too many reports on memory leaks

Performance improved in workers by removing it, due to smaller memory footprint

Deprecation notices, continued

nagios.cfg:

sleep_time - we now poll until it's time to run the next event

command_check_interval – commands are always handled immediately

last_command_check – as per above

failure_prediction* - never implemented

Everything relating to embedded perl

Deprecation notices, continued

objects

failure_prediction* - this was never implemented

group member exclusions no longer inherited by group-in-group inclusion

group1->members = A,B

group1->group_members = group2

group2->members = !B,C

group1 has A,B,C as members in Nagios 4

group1 had A,C as members in Nagios 3
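
The difference between the two behaviours can be modelled with one bit per member. This sketch only mirrors the A, B, C example above; the struct and function names are hypothetical and this is not how Nagios resolves groups internally:

```c
/* One bit per member; A, B and C are the members from the example. */
enum { MEMBER_A = 1 << 0, MEMBER_B = 1 << 1, MEMBER_C = 1 << 2 };

/* Hypothetical group: explicit members plus "!member" exclusions. */
struct group {
    unsigned members;
    unsigned excluded;
};

/* Nagios 4 behaviour: an exclusion only applies inside the group
 * that declares it. */
unsigned resolve_nagios4(const struct group *outer, const struct group *inner)
{
    unsigned inner_members = inner->members & ~inner->excluded;

    return (outer->members & ~outer->excluded) | inner_members;
}

/* Nagios 3 behaviour: the included group's exclusions are inherited
 * by the group that includes it. */
unsigned resolve_nagios3(const struct group *outer, const struct group *inner)
{
    return (outer->members | inner->members)
           & ~(outer->excluded | inner->excluded);
}
```

With group1 holding A and B, and group2 holding C while excluding B, the first function yields A, B and C and the second yields A and C.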

Special thanks

Ethan Galstad

Daniel Wittenberg

Armin Wolfermann

Joerg Linge

Sven Nierlein

Mark Frost

Robin Sonefors

William Leibzon

Everyone who sent me configs for testing

Questions?

Look me up between sessions

Check out the 'make dox' thingie

Online resources

http://www.github.com/ageric

http://www.op5.com