Lee Myers - What to do when Nagios notifications don't meet your needs
![Page 1: Lee Myers - What To Do When Nagios Notification Don't Meet Your Needs](https://reader034.vdocuments.us/reader034/viewer/2022042907/58d022ed1a28ab97708b60c1/html5/thumbnails/1.jpg)
What to do when Nagios notifications don't meet your needs?
You Push It
![Page 2: Lee Myers - What To Do When Nagios Notification Don't Meet Your Needs](https://reader034.vdocuments.us/reader034/viewer/2022042907/58d022ed1a28ab97708b60c1/html5/thumbnails/2.jpg)
Background
Career Start
Intel - ASCI Red Supercomputer
• 1st TeraFlops Supercomputer
• 102 Cabinets - Drive & Compute clusters
• 4,536 Nodes
• 9,216 Processors (Pentium Pros)
• 9,216 Cores
• 1,600 Square Feet
Currently
NCAR - Yellowstone Computer
• 2012: 13th with 1.5 PetaFlops, now 50th
• 94 Cabinets - 74 Compute & 10 Drive clusters
• 4,542 Nodes
• 9,036 Processors (Intel Xeon E5-2670)
• 72,288 Cores
• 2,000 Square Feet
![Page 3: Lee Myers - What To Do When Nagios Notification Don't Meet Your Needs](https://reader034.vdocuments.us/reader034/viewer/2022042907/58d022ed1a28ab97708b60c1/html5/thumbnails/3.jpg)
Nagios Configuration
Primary Instance
• Hosts - 1289
• Services - 3235
Total Instances
• Hosts - 1410
• Services - 3867
Test Instance
• Hosts - 20,007
• Services - 40,045
• Passive Results from scripts
Primary Instance
• 4 Check_MK Monitored Servers
• 5 Remote Servers sending Passive Results
• 4 Sites being Monitored
Normal Load < 1 with 5 instances running.
Load with Test running < 4
Using OMD 1.2 (Nagios 3.5, Check_MK 1.2.4p5, Thruk 1.84-6, PNP4Nagios 0.6.24)
![Page 4: Lee Myers - What To Do When Nagios Notification Don't Meet Your Needs](https://reader034.vdocuments.us/reader034/viewer/2022042907/58d022ed1a28ab97708b60c1/html5/thumbnails/4.jpg)
Nagios Notification Configuration
Host / Service
• notification_period
– 24x7
– workhours
• contact_groups
Contact
• service_notification_period
– 24x7
– workhours
• host_notification_period
– 24x7
– workhours
• service_notification_options
– w,u,c,r,f
• host_notification_options
– d,u,r
![Page 5: Lee Myers - What To Do When Nagios Notification Don't Meet Your Needs](https://reader034.vdocuments.us/reader034/viewer/2022042907/58d022ed1a28ab97708b60c1/html5/thumbnails/5.jpg)
Standard Work Week
Simple distinction between work and home.
![Page 6: Lee Myers - What To Do When Nagios Notification Don't Meet Your Needs](https://reader034.vdocuments.us/reader034/viewer/2022042907/58d022ed1a28ab97708b60c1/html5/thumbnails/6.jpg)
Non-Standard Rotating Work Week
Complex and Every Week is Different.
![Page 7: Lee Myers - What To Do When Nagios Notification Don't Meet Your Needs](https://reader034.vdocuments.us/reader034/viewer/2022042907/58d022ed1a28ab97708b60c1/html5/thumbnails/7.jpg)
Since we have 24x7 coverage, why did we want notifications?
We are not always in our Operations Center at Night
• Doing nightly Visual Inspections
• Replacing hardware in the Supercomputer
• Working with facilities
• Talking with Security
• Eating a meal in our Kitchen
• Watching fireworks with facilities
• ...
![Page 8: Lee Myers - What To Do When Nagios Notification Don't Meet Your Needs](https://reader034.vdocuments.us/reader034/viewer/2022042907/58d022ed1a28ab97708b60c1/html5/thumbnails/8.jpg)
Our initial Failure
No Sound from iPad Web or Apps
![Page 9: Lee Myers - What To Do When Nagios Notification Don't Meet Your Needs](https://reader034.vdocuments.us/reader034/viewer/2022042907/58d022ed1a28ab97708b60c1/html5/thumbnails/9.jpg)
What We Needed
• Interface to Nagios Data
• Something to Parse for Unacknowledged Alerts
• Something to send out Notifications
• Program to give us our alerts on our Mobile Devices
![Page 10: Lee Myers - What To Do When Nagios Notification Don't Meet Your Needs](https://reader034.vdocuments.us/reader034/viewer/2022042907/58d022ed1a28ab97708b60c1/html5/thumbnails/10.jpg)
Interface to Nagios Data
Check_MK Livestatus
• Nagios Broker Module
• Written by Mathias Kettner
• Direct Connection to Nagios through a UNIX Socket
• No Database to administer
• No Configuration needed
• Single line needs to be added to nagios.cfg
• Access it from the shell with unixcat
• Uses Livestatus Query Language
• http://mathias-kettner.com/checkmk_livestatus.html
Example:
root@linux# echo 'GET hosts' | unixcat /var/lib/nagios/rw/live
acknowledged;action_url;address;alias;check_command;check_period;checks_enabled;contacts;in_check_period;in_notification_period;is_flapping;last_check;last_state_change;name;notes;notes_url;notification_period;scheduled_downtime_depth;state;total_services
0;/nagios/pnp/index.php?host=$HOSTNAME$;127.0.0.1;Acht;check-mk-ping;;1;check_mk,hh;1;1;0;1256194120;1255301430;Acht;;;24X7;0;0;7
0;/nagios/pnp/index.php?host=$HOSTNAME$;127.0.0.1;DREI;check-mk-ping;;1;check_mk,hh;1;1;0;1256194120;1255301431;DREI;;;24X7;0;0;1
0;/nagios/pnp/index.php?host=$HOSTNAME$;127.0.0.1;Drei;check-mk-ping;;1;check_mk,hh;1;1;0;1256194120;1255301435;Drei;;;24X7;0;0;4
![Page 11: Lee Myers - What To Do When Nagios Notification Don't Meet Your Needs](https://reader034.vdocuments.us/reader034/viewer/2022042907/58d022ed1a28ab97708b60c1/html5/thumbnails/11.jpg)
Something to Parse - Livestatus
LQL Queries
• “GET” and name of Table
• Arbitrary number of header lines consisting of a keyword, a colon and arguments.
• Empty line or ‘End of Transmission’
Tables
hosts services hostgroups
contacts commands servicegroups
log timeperiods contactgroups
status downtimes hostsbygroup
columns statehist comments
servicesbygroup servicesbyhostgroup
Columns
Columns: <list of column names to return in order>
Filters
Filter: <column name> <operator> <value>
Operators: =, ~, =~, ~~, <, >, <=, >=, !=, !~, !=~, !~~
Values: number, text
Combining filters
Or: <last X filters>
And: <last X filters>
Negate:
Others - Counting, Sums, Max, Min, Std Dev, and more
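The query layout above (a GET line followed by header lines) can be sketched in shell. This is a minimal sketch, not code from the talk: the helper name `build_lql` is hypothetical, and the socket path in the final comment is the one from the earlier unixcat example and will differ per site.

```bash
#!/bin/bash
# Hypothetical helper: print an LQL query, one header per line.
# Usage: build_lql TABLE "COLUMNS" [FILTER...]
build_lql() {
    local table="$1" columns="$2"
    shift 2
    printf 'GET %s\n' "$table"
    printf 'Columns: %s\n' "$columns"
    local f
    for f in "$@"; do
        printf 'Filter: %s\n' "$f"
    done
}

# Hosts that are not UP and have not been acknowledged.
build_lql hosts "name state" "state > 0" "acknowledged = 0"

# To send it to a live socket (path is site-specific):
#   build_lql hosts "name state" "state > 0" | unixcat /var/lib/nagios/rw/live
```

When the query is piped into unixcat, the end of input closes the query, so no trailing empty line is printed here.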
![Page 12: Lee Myers - What To Do When Nagios Notification Don't Meet Your Needs](https://reader034.vdocuments.us/reader034/viewer/2022042907/58d022ed1a28ab97708b60c1/html5/thumbnails/12.jpg)
Send out Notifications
Pushbullet
• Free
• Several APIs
– Android Extensions
– iPhone
– HTTP API
• https://docs.pushbullet.com
We're interested in the HTTP API; we are not writing a custom mobile app.
HTTP API Calls
• Objects
– /v2/pushes
– /v2/devices
– /v2/contacts
– /v2/users/me
• Accounts
– /oauth2
And more API calls which we don’t use.
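A call to /v2/pushes can be wrapped in a small shell function. The sketch below is an illustration under assumptions, not the talk's code: `pb_note` and the `PB_DRY_RUN` switch are hypothetical names, and `EXAMPLE_TOKEN` is a placeholder for a real access token. With the dry-run switch set, it only prints what would be sent, which is handy for testing without pushing to a phone.

```bash
#!/bin/bash
PB_URL=https://api.pushbullet.com/v2/pushes

# Hypothetical wrapper. Usage: pb_note ACCESS_TOKEN TITLE BODY
pb_note() {
    local token="$1" title="$2" body="$3"
    if [ "${PB_DRY_RUN:-0}" = 1 ]; then
        # Dry run: show what would be sent, without the token.
        printf 'POST %s type=note title=%s body=%s\n' "$PB_URL" "$title" "$body"
    else
        # --data-urlencode lets curl encode spaces and punctuation.
        curl --silent --user "$token:" "$PB_URL" \
            --data type=note \
            --data-urlencode "title=$title" \
            --data-urlencode "body=$body" > /dev/null
    fi
}

PB_DRY_RUN=1 pb_note "EXAMPLE_TOKEN" "web01" "PING CRITICAL - Packet loss = 100%"
```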
![Page 13: Lee Myers - What To Do When Nagios Notification Don't Meet Your Needs](https://reader034.vdocuments.us/reader034/viewer/2022042907/58d022ed1a28ab97708b60c1/html5/thumbnails/13.jpg)
Deliver to our Mobile Devices
![Page 14: Lee Myers - What To Do When Nagios Notification Don't Meet Your Needs](https://reader034.vdocuments.us/reader034/viewer/2022042907/58d022ed1a28ab97708b60c1/html5/thumbnails/14.jpg)
Our Solution
![Page 15: Lee Myers - What To Do When Nagios Notification Don't Meet Your Needs](https://reader034.vdocuments.us/reader034/viewer/2022042907/58d022ed1a28ab97708b60c1/html5/thumbnails/15.jpg)
nagios_push.sh
#!/bin/bash
# Get the person's access code for pushbullet
read AccessCode < /home/$USER/PushBulletAccessCode
# Query nagios for host alerts and send them to pushbullet
for i in $(/opt/omd/versions/1.00/bin/unixcat < /usr/local/sbin/PushBullet_query_hosts /omd/sites/noc/tmp/run/live | tr ' ' '_' | cut -f1,2 -d';'); do
    curl -u $AccessCode: https://api.pushbullet.com/v2/pushes -d type=note -d title="${i%;*}" -d body="${i#*;}" > /dev/null 2>&1
done
# Query nagios for service alerts and send them to pushbullet
for i in $(/opt/omd/versions/1.00/bin/unixcat < /usr/local/sbin/PushBullet_query_services /omd/sites/noc/tmp/run/live | tr ' ' '_' | cut -f1,2 -d';'); do
    curl -u $AccessCode: https://api.pushbullet.com/v2/pushes -d type=note -d title="${i%;*}" -d body="${i#*;}" > /dev/null 2>&1
done
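How the loop splits each alert into a title and a body may be easier to follow on a single line of input. The sketch below replays the same `tr`, `cut` and `${i%;*}` / `${i#*;}` steps on one made-up Livestatus result; the host name and plugin output are invented for illustration.

```bash
#!/bin/bash
# One line of Livestatus output (name;plugin_output;state), made up here.
line='web01;PING CRITICAL - Packet loss = 100%;1'

# Same preprocessing as the loop: spaces become underscores so that
# "for i in $(...)" sees one word per alert, and cut keeps fields 1 and 2.
i=$(printf '%s\n' "$line" | tr ' ' '_' | cut -f1,2 -d';')

title=${i%;*}   # remove the shortest suffix matching ";*", leaving the name
body=${i#*;}    # remove the shortest prefix matching "*;", leaving the output

echo "title=$title"
echo "body=$body"
```

Because `cut` leaves exactly one semicolon in each word, the two expansions always split at the same place.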
![Page 17: Lee Myers - What To Do When Nagios Notification Don't Meet Your Needs](https://reader034.vdocuments.us/reader034/viewer/2022042907/58d022ed1a28ab97708b60c1/html5/thumbnails/17.jpg)
PushBullet Command Files
/usr/local/sbin/PushBullet_query_hosts
GET hostsColumns: name plugin_output stateFilter: state > 0Filter: acknowledged = 0Filter: host_scheduled_downtime_depth = 0
/usr/local/sbin/PushBullet_query_services
GET servicesColumns: name plugin_output stateFilter: state > 0Filter: acknowledged = 0Filter: scheduled_downtime_depth = 0
![Page 18: Lee Myers - What To Do When Nagios Notification Don't Meet Your Needs](https://reader034.vdocuments.us/reader034/viewer/2022042907/58d022ed1a28ab97708b60c1/html5/thumbnails/18.jpg)
Our Support Scripts
![Page 19: Lee Myers - What To Do When Nagios Notification Don't Meet Your Needs](https://reader034.vdocuments.us/reader034/viewer/2022042907/58d022ed1a28ab97708b60c1/html5/thumbnails/19.jpg)
npush_on
#!/bin/bash
#Make sure it is not run as root
if [ $UID -eq 0 ]
then
    echo "Not to be run as root."
    exit
fi
if (crontab -l|grep -q nagios_push.sh)
then
    #UnComment out the crontab
    crontab -l | sed -e 's/#*\*\/4 \* \* \* \* \/usr\/local\/sbin\/nagios_push.sh/\*\/4 \* \* \* \* \/usr\/local\/sbin\/nagios_push.sh/'|crontab
else
    #Append the item to the crontab
    (crontab -l; echo "*/4 * * * * /usr/local/sbin/nagios_push.sh")|crontab
fi
#Let the user know when you are turning off the npush
hour=$(date +%H)
if [ "$hour" -lt 18 -a "$hour" -ge 6 ]; then
    /usr/bin/at -f /usr/local/bin/npush_off 7pm
    echo "Turning off npush at 7 PM"
else
    /usr/bin/at -f /usr/local/bin/npush_off 7am
    echo "Turning off npush at 7 AM"
fi
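The shift-end choice at the bottom of npush_on can be checked on its own. This is a sketch of the same comparison, assuming a hypothetical helper name `shift_end`; it takes an hour as printed by `date +%H` and prints the time that would be handed to `at`.

```bash
#!/bin/bash
# Hypothetical helper: map an hour (00-23) to the npush_off time.
shift_end() {
    # Drop one leading zero so "08" is a plain decimal number.
    local hour="${1#0}"
    if [ "$hour" -ge 6 ] && [ "$hour" -lt 18 ]; then
        echo 7pm    # day shift: 06:00 through 17:59
    else
        echo 7am    # night shift: 18:00 through 05:59
    fi
}

for h in 05 06 12 17 18 23; do
    echo "$h -> $(shift_end "$h")"
done
```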
![Page 20: Lee Myers - What To Do When Nagios Notification Don't Meet Your Needs](https://reader034.vdocuments.us/reader034/viewer/2022042907/58d022ed1a28ab97708b60c1/html5/thumbnails/20.jpg)
npush_off
#!/bin/bash
#Comment out the crontab
crontab -l | sed -e 's/\*\/4 \* \* \* \* \/usr\/local\/sbin\/nagios_push.sh/#\*\/4 \* \* \* \* \/usr\/local\/sbin\/nagios_push.sh/'|crontab
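The comment and uncomment steps in npush_off and npush_on can be tried on plain text before touching a real crontab. The sketch below is a variation, not the scripts as presented: the function names are hypothetical, the backup.sh entry is made up, `|` is used as the sed delimiter so the slashes need no escaping, and the pattern is anchored with `^`.

```bash
#!/bin/bash
# Hypothetical filters that read a crontab on stdin and write it to stdout.
npush_comment() {
    # "&" stands for the matched text, so this prefixes the line with "#".
    sed -e 's|^\*/4 \* \* \* \* /usr/local/sbin/nagios_push\.sh|#&|'
}
npush_uncomment() {
    # Strip any number of leading "#" from the push entry.
    sed -e 's|^#*\(\*/4 \* \* \* \* /usr/local/sbin/nagios_push\.sh\)|\1|'
}

crontab_text='0 2 * * * /usr/local/bin/backup.sh
*/4 * * * * /usr/local/sbin/nagios_push.sh'

off=$(printf '%s\n' "$crontab_text" | npush_comment)
on=$(printf '%s\n' "$off" | npush_uncomment)

echo "--- off ---"; printf '%s\n' "$off"
echo "--- on ---";  printf '%s\n' "$on"
```

Anchoring at the start of the line means a second run of the comment step leaves an already commented entry alone instead of stacking another `#` on it.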
![Page 21: Lee Myers - What To Do When Nagios Notification Don't Meet Your Needs](https://reader034.vdocuments.us/reader034/viewer/2022042907/58d022ed1a28ab97708b60c1/html5/thumbnails/21.jpg)
Future Upgrades
• Read Google Calendar for our schedule, no more remembering to turn it on.
• Send email alerts to PushBullet. (Without false alerts)
• Remove the Crontab line, instead of commenting it out.
• Anything else we can think of.
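One possible shape for the "remove the crontab line" upgrade is a filter that drops the entry entirely. This is only a sketch of the idea, not something from the talk: `npush_remove` is a hypothetical name and the sample crontab is made up.

```bash
#!/bin/bash
# Hypothetical filter: drop every line that mentions the push script,
# whether or not it is commented out.
# grep exits 1 when it prints nothing, so "|| true" keeps an empty
# result from being reported as a failure.
npush_remove() {
    grep -v -F '/usr/local/sbin/nagios_push.sh' || true
}

sample='0 2 * * * /usr/local/bin/backup.sh
#*/4 * * * * /usr/local/sbin/nagios_push.sh'

printf '%s\n' "$sample" | npush_remove
```

In npush_off this filter would sit where the sed command is now, between `crontab -l` and the final `crontab`; npush_on already appends the entry when it is missing.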
![Page 22: Lee Myers - What To Do When Nagios Notification Don't Meet Your Needs](https://reader034.vdocuments.us/reader034/viewer/2022042907/58d022ed1a28ab97708b60c1/html5/thumbnails/22.jpg)
Questions