Nagios Conference 2012 - Mike Weber - Disaster

Post on 16-Apr-2017








10 Quick Steps To Disaster

Mike Weber

Inheriting Aberrations with Objects

Where are those settings coming from?

Object Inheritance

Object Priorities

Object Chaining

Incomplete Objects

Canceling Inheritance

Additive Inheritance

Object Inheritance

Object Inheritance: Templates

Object Inheritance: No Hostgroups?

Object Inheritance: From Hostgroup

Object Inheritance: Info Option

Object Priorities: Local then Inheritance

Object Priorities: Order in List (Chaining)

Incomplete Object: Only Lists One Image

Canceling Inheritance: Object Contains Parents

Canceling Inheritance: Wrong Parents

Canceling Inheritance: Cancel Parents

Canceling Inheritance: Canceled Parents

Additive Inheritance: Append Object Contents


Hoping BAD Things Won't Happen

Real BAD Things Will Happen




XI: Automated Backup

0 7 * * * root /root/scripts/automysqlbackup

0 8 * * * root /root/scripts/autopostgresqlbackup


XI: Upgrade Backup


##### BackUp Of Nagios Before Upgrade #####

# Timestamp Backups
TIMESTAMP=$(date +%Y%m%d_%H%M); echo $TIMESTAMP

service nagiosxi stop
service npcd stop
service ndo2db stop
service nagios stop

mkdir /bk/upgrade_$TIMESTAMP
tar cjf /bk/upgrade_$TIMESTAMP/nagios_$TIMESTAMP.tar.bz2 /usr/local/nagios
tar cjf /bk/upgrade_$TIMESTAMP/nagiosxi_$TIMESTAMP.tar.bz2 /usr/local/nagiosxi
pg_dump -U nagiosxi -c -F p nagiosxi | bzip2 -c > /bk/upgrade_$TIMESTAMP/pg_nagiosxi_$TIMESTAMP.sql.bz2
mysqldump -u root -pnagiosxi nagios | bzip2 -c > /bk/upgrade_$TIMESTAMP/my_nagios_$TIMESTAMP.sql.bz2
mysqldump -u root -pnagiosxi nagiosql | bzip2 -c > /bk/upgrade_$TIMESTAMP/my_nagiosql_$TIMESTAMP.sql.bz2

service nagios start
service ndo2db start
service npcd start
service nagiosxi start

Core: Backup

#!/bin/sh
# Timestamped Back Up

TIMESTAMP=`date +%Y%m%d_%H%M%S`; echo $TIMESTAMP

tar czvf /bk/nagios_dir_$TIMESTAMP.tar.gz /usr/local/nagios
tar czvf /bk/pnp4nagios_dir_$TIMESTAMP.tar.gz /usr/local/pnp4nagios
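The archives written by the Core backup script are only useful if they can be read back. A minimal sketch of such a check, assuming gzip-compressed tar files like the ones under /bk. The function name verify_backup and the BACKUP_DIR variable are made up for this illustration and are not part of Nagios.

```shell
#!/bin/sh
# verify_backup is a hypothetical helper, not part of Nagios.
# It succeeds only if the file is non-empty and tar can list
# every member of the gzip-compressed archive.
verify_backup() {
    [ -s "$1" ] && tar -tzf "$1" > /dev/null 2>&1
}

# Example: check every archive in a backup directory (default /bk).
BACKUP_DIR="${BACKUP_DIR:-/bk}"
for archive in "$BACKUP_DIR"/*.tar.gz; do
    [ -e "$archive" ] || continue   # no archives matched the pattern
    if verify_backup "$archive"; then
        echo "OK      $archive"
    else
        echo "BROKEN  $archive"
    fi
done
```

Listing an archive shows that it is readable, not that it contains everything needed for a restore. A periodic test restore on a spare machine is still the stronger check.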

System Warnings

Configuration Errors: Service Checks

Solution: Service Template Management

Service Template: Check Settings

Service Template: Alert Settings

Service Template: Add Hostgroup

Solution: Service Template Management

Max Concurrent Service Checks

Maximum Concurrent Checks

Edit nagios.cfg to avoid latency issues.
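Before editing, it helps to see what the limit is currently set to. A sketch that reads the max_concurrent_checks directive from the file, assuming the usual source-install path /usr/local/nagios/etc/nagios.cfg. The helper name cfg_value and the NAGIOS_CFG variable are hypothetical. In nagios.cfg a value of 0 places no restriction on the number of concurrent checks.

```shell
#!/bin/sh
# cfg_value is a hypothetical helper, not part of Nagios.
# Usage: cfg_value FILE DIRECTIVE
# Prints the value of a "directive=value" line, skipping comment lines.
# If the directive appears more than once, the last occurrence is printed.
cfg_value() {
    sed -n "s/^[[:space:]]*$2[[:space:]]*=[[:space:]]*//p" "$1" | tail -n 1
}

# Assumed default location; override with NAGIOS_CFG=/path/to/nagios.cfg
NAGIOS_CFG="${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}"
if [ -r "$NAGIOS_CFG" ]; then
    echo "max_concurrent_checks=$(cfg_value "$NAGIOS_CFG" max_concurrent_checks)"
fi
```

After changing the value, verify the configuration and restart Nagios so the new setting takes effect.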


Mangling Users and Contacts

Managing Users and Contacts

Users (access to the web interface)

Contacts (notifications)

Creating Users: Web Interface


Creating Users: Restricted


Managing Administrators: Full Access


Core: cgi.cfg




Monitoring Non-Existent Ports on Switches

Save Resources

Use AdminDown on Ports
* Administratively set unused ports as AdminDown
* Modify check_ifoperstatus

Turn Off Monitoring on Unused Ports

Remove the Checks

Unused Switch Ports: Wasting Resources

* check port status
* check bandwidth
* send notifications
* ignore notifications

Modify check_ifoperstatus

Here is the code that affects the output. In the block that begins with

if ( not defined $adminWarn or $adminWarn eq "w" ) {

change

$state = 'WARNING';

to

$state = 'OK';

The changed line is highlighted in the example.

##
if ( not ($response->{$snmpIfAdminStatus} == 1) ) {
    $answer = "Interface $name (index $snmpkey) is administratively down.";
    if ( not defined $adminWarn or $adminWarn eq "w" ) {
        $state = 'OK';    # changed from 'WARNING'
    } elsif ( $adminWarn eq "i" ) {
        $state = 'OK';
    } elsif ( $adminWarn eq "c" ) {
        $state = 'CRITICAL';
    } else { # If wrong value for -a, say warning
        $state = 'WARNING';
    }

Administratively Down Ports

Disable Port Checks

* 790 port checks disabled
* 1.5 GB of RAM saved
* 18% reduction in max service check execution time

Encouraging Non-Accountability for Changes

Who Makes Changes on Your Nagios?

Limit Admin Access

Require Training

Create Policy for Changes

Use a Test Server

Audit Log

Abusing Nagios XI Wizards

Wizard or Manual Creation: Assessment

Which method provides the most efficient installation?
Example: Using a wizard for a switch is most efficient.
Example: Manually creating a service check to be used on 100 servers is most efficient.

Will it provide access to view the grouping of devices?
Example: Can effective reports be created from visible devices?

Does it make management easier in the long run?
Example: The use of templates is an efficient method to manage multiple devices that are similar.

Template Management

Disregarding Network Relationships


Host: Manage Parents


Network Relationships: Parents

Importing Infectious Diseases

GUI Infection: Lack of Command Line Skills

* cron jobs
* manual backups
* verification

* disk space
* logs

* finding stuff
* processes
* permissions

Edit Files
* learning vi or nano

Short Cut Infection: Auto-Discovery

Overestimating Human Intelligence

Some of the Things We Do as Humans Defy Logic
