Learning Nagios

GnuGroup International

Nagios: IT infrastructure monitoring tool

ILG: Insight GNU/Linux Group

Reinventing the way you Think, Learn, Work

www.gnugroup.org 2

NAGIOS

MODULE - 2


Index of module - 2

● External Commands
● Event Handlers
● Volatile Services
● Freshness Checks
● State Stalking
● Flapping
● Using Templates
● Object Inheritance
● Passive Checks / NSCA
● Clustering - Distributed Monitoring
● Redundant and Failover Network Monitoring
● NagiosQL
● Security Considerations


External Commands

● Nagios can process commands from external applications (including the CGIs) and alter various aspects of its monitoring functions based on the commands it receives.

● External applications can submit commands by writing to the command file, which is periodically processed by the Nagios daemon.

Enabling External Commands

In order to have Nagios process external commands, make sure you do the following:

– Enable external command checking with the check_external_commands option.

– Set the frequency of command checks with the command_check_interval option.

– Specify the location of the command file with the command_file option.

– Set up proper permissions on the directory containing the external command file, as described in the quickstart guide.

When Does Nagios Check For External Commands?

– At regular intervals specified by the command_check_interval option in the main configuration file

– Immediately after event handlers are executed. This is in addition to the regular cycle of external command checks and is done to provide immediate action if an event handler submits commands to Nagios.

Using External Commands

– External commands can be used to accomplish a variety of things while Nagios is running.

– Examples of what can be done include temporarily disabling notifications for services and hosts, temporarily disabling service checks, forcing immediate service checks, and adding comments to hosts and services.
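As a rough illustration of how an external application might submit a command, the sketch below builds a line in the command file format (`[timestamp] COMMAND;arg;arg`) and writes it to the command pipe. The command file path is an assumption (a common source-install default); use whatever your command_file option points to. The helper names are hypothetical.

```python
import time


def format_external_command(name, *args, timestamp=None):
    """Build one command-file line: [epoch_seconds] COMMAND;arg1;arg2;..."""
    ts = int(time.time()) if timestamp is None else int(timestamp)
    fields = ";".join(str(part) for part in (name, *args))
    return f"[{ts}] {fields}\n"


def submit_external_command(line, command_file="/usr/local/nagios/var/rw/nagios.cmd"):
    """Write the line to the command file (assumed path; a named pipe read by Nagios)."""
    with open(command_file, "w") as pipe:
        pipe.write(line)


# Example: temporarily disable notifications for one service.
example = format_external_command(
    "DISABLE_SVC_NOTIFICATIONS", "localhost", "daemons", timestamp=1700000000
)
```

Calling `submit_external_command(example)` only works on a host where Nagios is running and the calling user may write to the command file.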

Event Handlers

● Event handlers are optional system commands (scripts or executables) that are run whenever a host or service state change occurs.

● An obvious use for event handlers is the ability for Nagios to proactively fix problems before anyone is notified. Some other uses for event handlers include:

– Restarting a failed service
– Entering a trouble ticket into a helpdesk system
– Logging event information to a database
– Etc.

When Are Event Handlers Executed?

● Event handlers are executed when a service or host:
– Is in a SOFT problem state
– Initially goes into a HARD problem state
– Initially recovers from a SOFT or HARD problem state

Event Handler Types

There are different types of optional event handlers that you can define to handle host and service state changes:

● Global host event handler
● Global service event handler
● Host-specific event handlers
● Service-specific event handlers

Enabling Event Handlers

● Event handlers can be enabled or disabled on a program-wide basis by using the enable_event_handlers option in your main configuration file.

● Host- and service-specific event handlers can be enabled or disabled by using the event_handler_enabled directive in your host and service definitions.

● Host- and service-specific event handlers will not be executed if the global enable_event_handlers option is disabled.

Example of Event Handlers

Service definition in the host's configuration file:

define service{
    use                     local-service
    host_name               localhost
    service_description     daemons
    check_command           check_nrpe!check_daemons
    event_handler           restart-services
}

In your commands.cfg file, make sure the event handler command is defined, something like:

define command{
    command_name    restart-services
    command_line    /usr/local/nagios/libexec/eventhandlers/restart-services $SERVICESTATE$ $SERVICESTATETYPE$ $SERVICEATTEMPT$
}

The problem is that the event handler runs as the Nagios user, which typically will not be able to restart a service.

Edit the sudoers file (with visudo) and add something like the lines below to the end of the file. If nagcmd is a group rather than a user on your system, write it as %nagcmd.

User_Alias NAGIOS = nagios,nagcmd
Cmnd_Alias NAGIOSCOMMANDS = /sbin/service
Defaults:NAGIOS !requiretty
NAGIOS ALL=(ALL) NOPASSWD: NAGIOSCOMMANDS
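The restart-services script itself is not shown in these slides. A minimal sketch of what such a handler could look like follows, assuming it receives the three macros from the command definition as arguments. The restart policy (act on the third SOFT attempt or on a HARD state) and the service name "httpd" are illustrative assumptions, not part of the original material.

```python
import subprocess


def should_restart(state, state_type, attempt, soft_attempt=3):
    """Decide whether to restart, given $SERVICESTATE$, $SERVICESTATETYPE$, $SERVICEATTEMPT$."""
    if state != "CRITICAL":
        return False  # OK, WARNING and UNKNOWN are left alone in this sketch
    if state_type == "HARD":
        return True
    # Try an early fix on a chosen SOFT attempt, before notifications go out.
    return state_type == "SOFT" and int(attempt) == soft_attempt


def main(argv):
    """Entry point: argv is [script, state, state_type, attempt]."""
    state, state_type, attempt = argv[1:4]
    if should_restart(state, state_type, attempt):
        # "httpd" is a placeholder; sudo is permitted by the sudoers entries.
        subprocess.run(["sudo", "/sbin/service", "httpd", "restart"], check=False)
    return 0
```

A real script would call `main(sys.argv)` at the bottom; it is left out here so the decision logic can be exercised on its own.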

Volatile Services

● Nagios has the ability to distinguish between "normal" services and "volatile" services.

● The is_volatile option in each service definition allows you to specify whether a specific service is volatile or not.

● For most people, the majority of all monitored services will be non-volatile (i.e. "normal"). However, volatile services can be very useful when used properly...

● What Are They Useful For?

Volatile services are useful for monitoring...

– Things that automatically reset themselves to an "OK" state each time they are checked

– Events such as security alerts which require attention every time there is a problem (and not just the first time)

Freshness Checks

● Nagios supports a feature that does "freshness" checking on the results of host and service checks.

The purpose of freshness checking is to ensure that host and service checks are being provided passively by external applications on a regular basis.

Freshness checking is useful when you want to ensure that passive checks are being received as frequently as you want.

This can be very useful in distributed and failover monitoring environments.

How Does Freshness Checking Work?

● Nagios periodically checks the freshness of the results for all hosts and services that have freshness checking enabled.

● A freshness threshold is calculated for each host or service.

● For each host/service, the age of its last check result is compared with the freshness threshold.

● If the age of the last check result is greater than the freshness threshold, the check result is considered "stale".

● If the check result is found to be stale, Nagios will force an active check of the host or service by executing the command specified by the check_command option in the host or service definition.
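The comparison described above can be sketched in a few lines. This is only a model of the logic, not Nagios code; the dictionary keys are hypothetical, and the threshold is taken as given rather than calculated.

```python
def is_stale(last_check, now, freshness_threshold):
    """True when the last result is older than the threshold (all values in seconds)."""
    return (now - last_check) > freshness_threshold


def checks_to_force(objects, now):
    """Return names of hosts/services whose results are stale.

    Each object is a dict with hypothetical keys:
    name, check_freshness, freshness_threshold, last_check.
    """
    return [
        obj["name"]
        for obj in objects
        if obj["check_freshness"]
        and is_stale(obj["last_check"], now, obj["freshness_threshold"])
    ]
```

For every name returned, Nagios would run the object's check_command as a forced active check.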

Enabling Freshness Checking

● Enable freshness checking on a program-wide basis with the check_service_freshness and check_host_freshness directives.

● Use the service_freshness_check_interval and host_freshness_check_interval options to tell Nagios how often it should check the freshness of service and host results.

● Enable freshness checking on a host- and service-specific basis by setting the check_freshness option in your host and service definitions to a value of 1.

● Configure freshness thresholds by setting the freshness_threshold option in your host and service definitions.

● Configure the check_command option in your host or service definitions to reflect a valid command that should be used to actively check the host or service when it is detected as stale.

● The check_period option in your host and service definitions is used when Nagios determines when a host or service can be checked for freshness, so make sure it is set to a valid timeperiod.

● An example of a service that might require freshness checking is one that reports the status of your nightly backup jobs.

● Perhaps you have an external script that submits the results of the backup job to Nagios once the backup is completed.

● In this case, all of the checks/results for the service are provided by an external application using passive checks.

● In order to ensure that the status of the backup job gets reported every day, you may want to enable freshness checking for the service.

● If the external script doesn’t submit the results of the backup job, you can have Nagios fake a critical result.

For example, the following service definition will accept passive checks but will report an error if they are not present:

define service{
    use                     generic-service
    host_name               linuxbox02
    service_description     SSH
    check_command           no-passive-check-results
    check_freshness         1
    freshness_threshold     43200
    active_checks_enabled   1
    passive_checks_enabled  1
}

The freshness_threshold option specifies the number of seconds after which an active check should be performed. In this case, it is set to 12 hours.

It is also necessary to define a command that will be run if no passive check results have been provided.

The following command will use the check_dummy plugin to report an error:

define command{
    command_name    no-passive-check-results
    command_line    $USER1$/check_dummy 2 "No passive check results"
}


State Stalking

● State "stalking" is a feature that most users will probably not use.

● When enabled, it allows you to log changes in the output of service and host checks even if the state of the host or service does not change.

● When stalking is enabled for a particular host or service, Nagios will watch that host or service very carefully and log any changes it sees in the output of check results.

● As you’ll see, it can be very helpful to you in later analysis of the log files.

● How Does It Work?

● Under normal circumstances, the result of a host or service check is only logged if the host or service has changed state since it was last checked.

● There are a few exceptions to this, but for the most part, that’s the rule.

● If you enable stalking for one or more states of a particular host or service, Nagios will log the results of the host or service check if the output from the check differs from the output from the previous check.

● Take the following example of eight consecutive checks of a service:

[Table: eight consecutive checks of a service]

● Why is this? With state stalking enabled, Nagios would have examined the output from each service check to see if it differed from the output of the previous check.

● If the output differed and the state of the service didn’t change between the two checks, the result of the newer service check would get logged.

● The decision to enable state stalking for a particular host or service will also depend on the plugin that you use to check that host or service.

● If the plugin always returns the same text output for a particular state, there is no reason to enable stalking for that state.

● You can enable state stalking for hosts and services by using the stalking_options directive in host and service definitions.

● Volatile services are similar, but will cause notifications and event handlers to run. Stalking is purely for logging purposes.
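The logging rule can be modelled roughly as follows. This is a simplified sketch: it ignores the exceptions mentioned earlier and uses full state names, whereas the stalking_options directive itself takes short state codes. The function name is hypothetical.

```python
def should_log(previous, current, stalking_options=()):
    """Decide whether a check result gets logged.

    previous and current are (state, output) tuples; stalking_options is the
    set of states for which stalking is enabled.
    """
    prev_state, prev_output = previous
    curr_state, curr_output = current
    if curr_state != prev_state:
        return True  # a state change is logged regardless of stalking
    # Same state: log only if this state is stalked and the output text changed.
    return curr_state in stalking_options and curr_output != prev_output
```

With stalking enabled for CRITICAL, two consecutive CRITICAL results with different plugin output produce a second log entry; without it, only the first does.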

Flapping

● Flapping is a situation where a host or service changes states very rapidly, constantly switching between working correctly and not working at all.

● This can happen for various reasons: a service might crash after a short period of operating correctly, or system administrators might be performing maintenance.

● Nagios can detect that a host or service is flapping, if it is configured to do so.

● It does so by analyzing previous results, in terms of how many state changes have happened within a specific period of time.

● Nagios keeps a history of the 21 most recent checks and analyzes changes within that history.
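To make the idea concrete, here is a simplified sketch of the calculation over such a history. It counts every state change equally, whereas Nagios gives newer changes more weight, so treat the numbers as illustrative. The threshold parameters are passed in explicitly and the function names are hypothetical.

```python
def percent_state_change(history):
    """Unweighted share of consecutive check pairs whose state differs, in percent."""
    possible = len(history) - 1
    if possible <= 0:
        return 0.0
    changes = sum(1 for older, newer in zip(history, history[1:]) if older != newer)
    return 100.0 * changes / possible


def update_flapping(was_flapping, percent, low_threshold, high_threshold):
    """Apply hysteresis: start above the high threshold, stop below the low one."""
    if not was_flapping and percent > high_threshold:
        return True
    if was_flapping and percent < low_threshold:
        return False
    return was_flapping
```

With 21 stored results there are 20 possible state changes, so 7 changes give 35 percent in this unweighted model.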

Using Templates

● Templates in Nagios allow you to create a set of parameters that can then be used in the definitions of multiple hosts, services, and contacts.

● The main purpose of templates is to keep parameters that are generic to all objects, or a group of objects, in one place.

● This way, you can avoid putting the same directives in hundreds of objects, and your configuration is more maintainable.

● It is also good to start using templates for hosts and services, and decide how they should be used.

Object Inheritance

● Basics

● There are three variables affecting recursion and inheritance that are present in all object definitions.

● The first variable is name. It's just a "template" name that can be referenced in other object definitions so they can inherit the object's properties/variables. Template names must be unique amongst objects of the same type.

● The second variable is use. This is where you specify the name of the template object that you want to inherit properties/variables from. The name you specify for this variable must be defined as another object’s template name (using the name variable).

● The third variable is register. Defining templates in Nagios is very similar to defining actual objects: you simply define the template as the required object type. The only difference is that you need to specify the register directive with a value of 0. This tells Nagios that it should not treat this as an actual object, but as a template.

define someobjecttype{
    object-specific variables ...
    name        template_name
    use         name_of_template_to_use
    register    [0/1]
}
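How name and use combine can be sketched with plain dictionaries. This is a simplified model, not the Nagios parser: it follows a single template per use, lets local values override inherited ones, and leaves name, use and register out of the result. The function name and sample values are hypothetical.

```python
def resolve(obj, templates, _seen=None):
    """Return the effective directives of obj after following its 'use' chain."""
    seen = set() if _seen is None else _seen
    merged = {}
    parent = obj.get("use")
    if parent is not None:
        if parent in seen:
            raise ValueError(f"circular template reference: {parent}")
        seen.add(parent)
        merged.update(resolve(templates[parent], templates, seen))
    merged.update(obj)  # locally defined directives win over inherited ones
    for key in ("name", "use", "register"):
        merged.pop(key, None)  # bookkeeping variables are not inherited
    return merged


templates = {
    "generic-service": {"name": "generic-service", "max_check_attempts": "3",
                        "check_interval": "10", "register": "0"},
    "local-service": {"name": "local-service", "use": "generic-service",
                      "check_interval": "5", "register": "0"},
}
service = {"use": "local-service", "host_name": "localhost",
           "service_description": "daemons"}
effective = resolve(service, templates)
```

Here `effective` picks up max_check_attempts from generic-service and the overridden check_interval from local-service.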


Passive Checks / NSCA

● Another great feature that Nagios offers is the ability for third-party software or other Nagios instances to report information on the status of services or hosts.

● This way, Nagios does not need to schedule and run checks by itself, but other applications can report information as it becomes available to them.

● This means that your applications can send problem reports directly to Nagios instead of just logging them.

● Nagios also offers a tool for sending passive check results for hosts and services over a network. It is called NSCA (Nagios Service Check Acceptor).

● It can be used to send results from one Nagios instance to another.

● This mechanism includes password protection, along with encryption, to prevent injection of false results into Nagios. This makes NSCA communication sent over the Internet more secure.

● Some checks originate from external applications or devices that want to report information directly to Nagios.

● This can be done to gather all critical errors in a single, central place. These types of checks are called passive checks.

● For example, when a web application cannot connect to the database, it will let Nagios know about it immediately.

● It can also send reports after a database recovery, or periodically, even if connectivity to the database has been consistently available, so that Nagios has an up-to-date status.

● This can be done in addition to active checks, to identify critical problems earlier.

● Nagios also offers a way of combining the benefits of both active and passive checks.

● The first thing that needs to be done in order to use passive checks for your Nagios setup is to make sure that you have the following options in your main Nagios configuration file:

accept_passive_service_checks=1
accept_passive_host_checks=1

● It is also a good idea to enable the logging of incoming passive checks. This makes it much easier to determine why a passive check was not processed. The following directive enables it:

log_passive_checks=1

● Setting up hosts or services for passive checking requires an object to be defined and set up so as not to perform active checks:

define host{
    use                     generic-host
    host_name               linbox1
    address                 10.1.1.45
    active_checks_enabled   0
    passive_checks_enabled  1
}

● Configuring services is exactly the same as with hosts:

define service{
    use                     ping-template
    host_name               linbox1
    service_description     PING
    active_checks_enabled   0
    passive_checks_enabled  1
}

● In this case, Nagios will never perform any active checks on its own and will only rely on the results that are passed to it.

● We can also configure Nagios so that if no new information has been provided within a certain period of time, it will use active checks to get the current status of the host or service, by setting the active_checks_enabled option to 1.

NSCA

● NSCA is an application that allows the sending of results directly to the Nagios external command pipe.

● NSCA consists of two parts: the server and the client.

● The part responsible for receiving check results and passing them to Nagios is the server.

● The server listens on a specific TCP port for NSCA clients passing information.

● It accepts and authenticates incoming connections and passes these results to the Nagios external command pipe.

● All information is encrypted using the MCrypt library.
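On the client side, the send_nsca program reads check results from standard input as tab-separated fields, one result per line. The sketch below only builds those lines; the function names are hypothetical, and it assumes the default tab delimiter is in use.

```python
def service_result_line(host, service, return_code, output):
    """host<TAB>service<TAB>return_code<TAB>plugin_output<NEWLINE>"""
    return "\t".join([host, service, str(return_code), output]) + "\n"


def host_result_line(host, return_code, output):
    """host<TAB>return_code<TAB>plugin_output<NEWLINE> (no service field)"""
    return "\t".join([host, str(return_code), output]) + "\n"
```

A service result carries four fields and a host result three; the server tells them apart by the field count.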


Clustering

● One of the first bottlenecks organizations will run into is performance when monitoring a large number of hosts and services.

● This can occur even earlier if you are using performance handlers on your service or host checks.

● One way to resolve performance problems is to cluster Nagios. Clustering is also very useful when there are a number of remote sites that need to be monitored by Nagios.

● Usually, there are one or more Nagios instances that report information to a single central Nagios instance.

● A server that reports information to another Nagios machine is referred to as a slave.

● A Nagios instance that receives reports from one or more slaves is referred to as a master.

One Nagios Instance


Many Nagios Instances


Data Flow

Remote Site Configuration

● Install Nagios as normal on the server and then change the following parameters in nagios.cfg to allow it to function properly in our Nagios cluster:

# We do not want this instance sending out notifications.
enable_notifications=0
# We want the remote server to obsess over services so all changes will be reported back to the master server.
obsess_over_services=1
# This is a custom command, defined next.
ocsp_command=nsca_send_result

● With these configuration changes in place, the remote Nagios server will call the command nsca_send_result after every service check executed on the remote host.

● The nsca_send_result script will then forward the service check results to the master Nagios server.

● Place the following definition for nsca_send_result in your commands configuration file (commands.cfg by default):

define command{
    command_name    nsca_send_result
    command_line    /usr/local/nagios/libexec/nsca_send_result $HOSTNAME$ '$SERVICEDESC$' $SERVICESTATE$ '$SERVICEOUTPUT$'
}
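The nsca_send_result script itself is not included in these slides. One possible shape for it, as a hedged sketch: map the state name from $SERVICESTATE$ to a plugin return code, build the tab-separated line, and pipe it to send_nsca. The master host name and both file paths below are placeholders, and treating unrecognised states as UNKNOWN is a choice made for this example.

```python
import subprocess

# Standard plugin return codes for the service state names.
STATE_CODES = {"OK": 0, "WARNING": 1, "CRITICAL": 2, "UNKNOWN": 3}


def build_result(host, service, state, output):
    """Turn the four command arguments into one send_nsca input line."""
    code = STATE_CODES.get(state, STATE_CODES["UNKNOWN"])
    return f"{host}\t{service}\t{code}\t{output}\n"


def send_result(line,
                master="nagios-master.example.com",            # placeholder host
                send_nsca="/usr/local/nagios/bin/send_nsca",    # assumed path
                config="/usr/local/nagios/etc/send_nsca.cfg"):  # assumed path
    """Pipe the line to send_nsca and return its exit status."""
    completed = subprocess.run(
        [send_nsca, "-H", master, "-c", config],
        input=line, text=True, check=False,
    )
    return completed.returncode
```

The script's entry point would pass its four arguments to `build_result` and hand the result to `send_result`.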


NagiosQL

● NagiosQL is a powerful web-based GUI tool that helps you configure, manage, and administer your Nagios network monitor.

● NagiosQL’s features include these capabilities:

● Build complex configurations

● Manage and use all of your configurations

● Create, delete, modify, and copy settings

● Create and export configuration files

● Create and download configuration files

● Easy configuration import

● Auto backup configuration files

● Consistency checks

● Syntax verification

● User management

● Instant activation of new configurations

● MySQL database platform

NagiosQL’s installation requirements

– Web server (Apache 2.x or greater preferred)

– MySQL 5.x or greater

– Nagios 2.x/3.x (local or remote)

– PHP 5.2.0 or greater including:

– PHP Module: Session

– PHP Module: MySQL

– PHP Module: gettext

– PHP Module: filter

– PHP Module: FTP (optional)

– PECL Extension: SSH (optional)

– JavaScript activated in the web browser

● Extract the downloaded file

● Open a terminal.

● Change to the document root with the command cd /var/www/html.

● Unpack the newly downloaded tar file with the command sudo tar xvzf nagiosql_XXX.tar.gz (XXX is the release number).

● Rename the newly created nagiosql32 directory to nagiosql with the command sudo mv nagiosql32 nagiosql.

● Change the permissions of the necessary folders

● You must run the following commands in order to give NagiosQL the proper permission to install and run. (Note: This assumes your web server runs under the www-data user name; if it doesn’t, alter the commands to suit your setup.)

● Nagios main configuration files

sudo chgrp www-data /etc/nagios
sudo chgrp www-data /etc/nagios/nagios.cfg
sudo chgrp www-data /etc/nagios/cgi.cfg
sudo chmod 775 /etc/nagios
sudo chmod 664 /etc/nagios/nagios.cfg
sudo chmod 664 /etc/nagios/cgi.cfg

● NagiosQL configuration

sudo chmod 6755 /etc/nagiosql
sudo chown www-data:nagios /etc/nagiosql
sudo chmod 6755 /etc/nagiosql/hosts
sudo chown www-data:nagios /etc/nagiosql/hosts
sudo chmod 6755 /etc/nagiosql/services
sudo chown www-data:nagios /etc/nagiosql/services

● NagiosQL backup configuration

sudo chmod 6755 /etc/nagiosql/backup
sudo chown www-data:nagios /etc/nagiosql/backup
sudo chmod 6755 /etc/nagiosql/backup/hosts
sudo chown www-data:nagios /etc/nagiosql/backup/hosts
sudo chmod 6755 /etc/nagiosql/backup/services
sudo chown www-data:nagios /etc/nagiosql/backup/services

● Amend already existing files

sudo chmod 644 /etc/nagiosql/*.cfg
sudo chown www-data:nagios /etc/nagiosql/*.cfg
sudo chmod 644 /etc/nagiosql/hosts/*.cfg
sudo chown www-data:nagios /etc/nagiosql/hosts/*.cfg
sudo chmod 644 /etc/nagiosql/services/*.cfg
sudo chown www-data:nagios /etc/nagiosql/services/*.cfg

● The Nagios binary must be executable by the Apache user

sudo chown nagios:www-data /usr/sbin/nagios
sudo chmod 750 /usr/sbin/nagios

Configuration

Directory Structure

/etc/nagiosql/                   -> Common configuration files
/etc/nagiosql/hosts              -> Host configuration files
/etc/nagiosql/services           -> Service configuration files
/etc/nagiosql/backup/            -> Backups of the common configuration files
/etc/nagiosql/backup/hosts       -> Backups of the host configuration files
/etc/nagiosql/backup/services    -> Backups of the service configuration files

● Begin the web install

● You should be able to fire up your browser and point it to http://ADDRESS_TO_SERVER/nagiosql/install/ (ADDRESS_TO_SERVER is the actual address of the server hosting NagiosQL), where you can begin the web-based installation.

● NagiosQL will make sure everything passes muster for the installation. If anything fails, this screen will give you plenty of information about the problem.

● Configure a database

The installer creates a database for you.

● Log in

You log in to your NagiosQL site by pointing your browser to http://ADDRESS_TO_SERVER/nagiosql/

NagiosQL Admin screen

Security Considerations

Best Practices

● Use a Dedicated Monitoring Box

● Don’t Run Nagios As Root

Nagios doesn’t need to run as root, so don’t do it. You can tell Nagios to drop privileges after startup and run as another user/group by using the nagios_user and nagios_group directives in the main config file. If you need to execute event handlers or plugins which require root access, you might want to try using sudo.

● Lock Down The Check Result Directory

Make sure that only the nagios user is able to read/write in the check result path. If users other than nagios (or root) are able to write to this directory, they could send fake host/service check results to the Nagios daemon.

● Lock Down The External Command File

If you enable external commands, make sure you set proper permissions on the /usr/local/nagios/var/rw directory. You only want the Nagios user (usually nagios) and the web server user (usually nobody, httpd, apache2, or www-data) to have permissions to write to the command file.

● Require Authentication In The CGIs

● Use Full Paths In Command Definitions

● Hide Sensitive Information With $USERn$ Macros

● Secure Communication Channels

● And many more


Thanks

&

Questions / Answers??????