Have you been stalking your servers?
A presentation for DrupalCon Prague 2013
Marji Cermak
Sysadmin & DevOps Engineer at Morpht
@cermakm
The rule of 3 things
picture: http://www.flickr.com/photos/helenaperezgarcia/5692392667/
1. What is monitoring and why do you want to monitor
2. Some monitoring tools available for you
3. It is easy to start with monitoring.
Part 1
What is monitoring and why do you want to monitor
photo: http://www.flickr.com/photos/tiagopadua/7903366470/
Monitoring
Monitoring is an intermittent (regular or irregular) series of observations in time, carried out to show the extent of compliance with a formulated standard or degree of deviation from an expected norm.
J. M. Hellawell (1991), modified by A. Brown (2000), http://jncc.defra.gov.uk/page-2268 (nature conservation area)
Why you need to monitor
● to know about the bad news before your customers (or your boss)
● to scale up your server in advance
● to tune up your app
Why you need to monitor (cont.)
● to prove your uptime of 99.999% :)
● to minimise downtime (expensive)
● to capture customer information
● to have data / metrics to diagnose
The fun of the nines
Source: http://en.wikipedia.org/wiki/High_availability
Nines: http://en.wikipedia.org/wiki/List_of_unusual_units_of_measurement#Nines
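The arithmetic behind "the nines" is easy to check: each extra nine cuts the allowed downtime by a factor of ten. A minimal sketch, assuming a 365-day year; the function names are illustrative and not from the talk. Under that assumption, 99.999% leaves roughly 5.26 minutes of downtime per year.

```python
# Allowed downtime for an availability target expressed as a number of nines.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumption: a 365-day year


def availability(nines: int) -> float:
    """Return the availability fraction for a number of nines (3 -> 0.999)."""
    return 1 - 10 ** (-nines)


def downtime_seconds_per_year(nines: int) -> float:
    """Return the yearly downtime budget, in seconds, for that many nines."""
    return SECONDS_PER_YEAR * 10 ** (-nines)


if __name__ == "__main__":
    for n in range(1, 7):
        minutes = downtime_seconds_per_year(n) / 60
        print(f"{n} nines ({availability(n):.4%}): {minutes:,.2f} minutes/year")
```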
Diagnosing your collected data
Watch out for:
● trends
● spikes
● irregularities
● thresholds
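Three of these checks (thresholds, spikes, trends) can be sketched on a plain list of samples. This is only an illustration with made-up load averages; the function names, the window size and the spike factor are assumptions, and real tools apply more careful statistics.

```python
from statistics import mean


def over_threshold(samples, limit):
    """Indexes of samples above a fixed limit."""
    return [i for i, value in enumerate(samples) if value > limit]


def spikes(samples, window=3, factor=2.0):
    """Indexes of samples exceeding `factor` times the mean of the previous `window` samples."""
    found = []
    for i in range(window, len(samples)):
        baseline = mean(samples[i - window:i])
        if baseline > 0 and samples[i] > factor * baseline:
            found.append(i)
    return found


def trend(samples):
    """Least-squares slope per sample (needs at least two samples); positive means growth."""
    n = len(samples)
    x_mean = (n - 1) / 2
    y_mean = mean(samples)
    numerator = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(samples))
    denominator = sum((x - x_mean) ** 2 for x in range(n))
    return numerator / denominator


if __name__ == "__main__":
    load = [0.4, 0.5, 0.4, 0.5, 2.1, 0.6, 0.7, 0.8]  # made-up load averages
    print("over threshold:", over_threshold(load, limit=1.0))
    print("spikes:", spikes(load))
    print("trend per sample:", round(trend(load), 3))
```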
Areas to monitor
● network
● server
● services
● applications
● users
photos:
http://www.flickr.com/photos/misja_klimov/2120956405/
http://www.flickr.com/photos/johnjack/3666997634/
http://www.flickr.com/photos/agustingodet/3691794089/
http://www.flickr.com/photos/agustingodet/3691792393/
http://www.flickr.com/photos/cheerfulstoic/942211994/
http://www.flickr.com/photos/jimmysmith/99528596/
Drupal areas to monitor
● network
● server
● services
  ○ webserver
  ○ database
● applications - your Drupal site(s)
● users
Part 2
Some monitoring tools available for you
Meet Nagios, Munin and others
● Nagios
● Munin
● APC dashboard
● related Drupal modules
Nagios /ˈnɑːɡiːoʊs/
● system, network and infrastructure monitoring software application
● monitors and alerts
● many plugins
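A Nagios plugin is just a program that prints one status line and reports its result through the exit code: 0 for OK, 1 for WARNING, 2 for CRITICAL, 3 for UNKNOWN. The sketch below shows that convention; `check_value` and the sample reading are hypothetical, and nothing here is taken from an existing plugin.

```python
# Exit codes defined by the Nagios plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def check_value(value, warn, crit):
    """Map a measured value to a (status code, status line) pair."""
    if value is None:
        return UNKNOWN, "UNKNOWN - no measurement available"
    if value >= crit:
        status = CRITICAL
    elif value >= warn:
        status = WARNING
    else:
        status = OK
    return status, f"{LABELS[status]} - value is {value}"


if __name__ == "__main__":
    # Hypothetical reading, e.g. a page response time in seconds.
    status, message = check_value(1.8, warn=1.0, crit=3.0)
    print(message)
    # A real plugin would finish with sys.exit(status) so Nagios sees the code.
```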
Nagios /ˈnɑːɡiːoʊs/
Name and pronunciation:
● NetSaint -> "Nagios Ain't Gonna Insist On Sainthood"
● "Agios" is a transliteration of the Greek word άγιος (saint)
Nagios /ˈnɑːɡiːoʊs/
● alerts by email/pager/IM...
● alerts to different contacts
● notification escalation
● service / host dependencies
● soft / hard states
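Soft and hard states are what keep Nagios from alerting on a single failed check: a problem stays SOFT while it is being rechecked and only becomes HARD, and triggers a notification, after a configured number of consecutive failures (`max_check_attempts` in Nagios). The class below is a simplified sketch of that idea, not the Nagios implementation; it ignores repeat notifications and escalations.

```python
class ServiceState:
    """Tracks SOFT/HARD state for one service, loosely following Nagios semantics."""

    def __init__(self, max_check_attempts=3):
        self.max_check_attempts = max_check_attempts
        self.attempt = 0          # consecutive non-OK results so far
        self.state_type = "HARD"  # a healthy service sits in a HARD OK state
        self.status = "OK"

    def record(self, status):
        """Record a check result; return True when a notification should go out."""
        previous = (self.status, self.state_type)
        self.status = status
        if status == "OK":
            self.attempt = 0
            self.state_type = "HARD"
            # Notify on recovery only if the problem had become HARD.
            return previous[0] != "OK" and previous[1] == "HARD"
        self.attempt = min(self.attempt + 1, self.max_check_attempts)
        if self.attempt < self.max_check_attempts:
            self.state_type = "SOFT"  # still rechecking, stay quiet
            return False
        self.state_type = "HARD"
        # Notify once, when the problem first becomes HARD.
        return previous != (status, "HARD")


if __name__ == "__main__":
    service = ServiceState(max_check_attempts=3)
    for result in ["CRITICAL", "CRITICAL", "CRITICAL", "OK"]:
        notify = service.record(result)
        print(result, service.state_type, "notify" if notify else "quiet")
```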
Drupal and Nagios
Munin
● network/system monitoring application
● outputs graphs through a web interface
● many plugins
Munin
● master / node architecture
● connects to all nodes at regular intervals
● it uses the RRDtool (round robin database tool, handles time-series data)
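"Round robin" means the database has a fixed number of slots: once it is full, every new sample overwrites the oldest one, so the file never grows. The sketch below illustrates only that idea with a bounded deque; it is not the RRDtool API, and the class name and the sample values are made up.

```python
from collections import deque


class RoundRobinArchive:
    """Fixed-size store: once full, each new sample pushes out the oldest one."""

    def __init__(self, slots):
        self.samples = deque(maxlen=slots)

    def update(self, timestamp, value):
        """Store one (timestamp, value) sample."""
        self.samples.append((timestamp, value))

    def average(self):
        """Mean of the stored values, or None when the archive is empty."""
        values = [value for _, value in self.samples]
        return sum(values) / len(values) if values else None


if __name__ == "__main__":
    archive = RoundRobinArchive(slots=3)
    for step, load in enumerate([0.2, 0.4, 0.9, 1.5]):
        archive.update(step * 300, load)  # hypothetical 5-minute samples
    print(list(archive.samples))  # only the three newest samples remain
```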
Munin Example
Drupal and Munin
Nagios & Munin
● they complement each other
● Nagios normally alerts on one “service”
● Munin can be used to correlate different things
APC - what is it?
The Alternative PHP Cache (APC) is a free and open opcode cache for PHP.
Its goal is to provide a free, open, and robust framework for caching and optimising PHP intermediate code.
Inside your webserver (not a webcache)
Monitoring APC: memory usage, hits & misses
Monitoring APC: fragmentation
Monitoring APC: memory usage
Monitoring APC: files in cache
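The two numbers worth reading off the dashboard first are the hit rate and the share of cache memory in use. A small sketch of both calculations, assuming you already have the raw counters; the function names and the example figures are hypothetical.

```python
def hit_rate(hits, misses):
    """Share of cache lookups served from the cache, as a percentage."""
    total = hits + misses
    return 100.0 * hits / total if total else 0.0


def memory_used_percent(used_bytes, total_bytes):
    """Share of the cache memory currently in use, as a percentage."""
    return 100.0 * used_bytes / total_bytes


if __name__ == "__main__":
    # Made-up counters for illustration only.
    print(f"hit rate: {hit_rate(990, 10):.1f}%")
    print(f"memory used: {memory_used_percent(48 * 2**20, 64 * 2**20):.1f}%")
```

A low hit rate or a cache that is close to full usually means the cache is too small for the code base.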
Other monitoring tools
● Collectd
● Graphite
● Shinken
● Sensu
● New Relic
● Pingdom
Part 3
It is easy to start with monitoring.
How to install these tools?
Munin
sudo apt-get install munin munin-node
Nagios
sudo apt-get install nagios3
APC dashboard
apc.php script from the php-apc package
How to configure these?
● It is a bit fiddly
● There are many guides targeting beginners
● You don’t want to do it again and again
puppet – a quick way to start
A system for automating system administration tasks:
● a declarative language for expressing system configuration,
● a client and server for distributing it,
● and a library for realising the configuration.
puppet – a quick way to start
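# Install the munin-node package.
package { 'munin-node': ensure => installed }
# Start the service at boot and keep it running, once the package is installed.
service { 'munin-node':
  enable  => true,
  ensure  => running,
  require => Package['munin-node'],
}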
puppet – a quick way to start
1. clone the stalk-your-box repo
2. run puppet apply on the code
3. monitor!
A quick way to start
$ git clone git://github.com/morpht/stalk-your-box.git /tmp/stalk-your-box
Cloning into '/tmp/stalk-your-box'...
remote: Counting objects: 23, done.
remote: Compressing objects: 100% (19/19), done.
remote: Total 23 (delta 1), reused 23 (delta 1)
Receiving objects: 100% (23/23), 11.35 KiB, done.
Resolving deltas: 100% (1/1), done.
A quick way to start
$ cd /tmp/stalk-your-box/
$ sudo puppet apply --modulepath=modules manifest.pp
notice: /Stage[main]/Nagios::Server/Package[nagios3]/ensure: ensure changed 'purged' to 'present'
notice: /Stage[main]/Nagios::Server/File[/etc/nagios3/htpasswd.users]/ensure: created
notice: /Stage[main]/Nagios::Server/Exec[update-nagios-htpasswd]/returns: Adding password for user nagiosadmin
notice: /Stage[main]/Nagios::Server/Exec[update-nagios-htpasswd]/returns: executed successfully
notice: /Stage[main]/Munin::Node/Package[libcache-cache-perl]/ensure: ensure changed 'purged' to 'present'
notice: /Stage[main]/Munin::Node/Package[munin-node]/ensure: ensure changed 'purged' to 'present'
notice: /Stage[main]/Munin::Node/File[munin-node.conf]/content: content changed '{md5}e486786f866d7d7e025dea401c300e7b' to '{md5}dbf97a87a8da86ef68155815ecae3c1c'
notice: /Stage[main]/Munin::Server/Service[apache2]: Triggered 'refresh' from 1 events
notice: Finished catalog run in 44.26 seconds
What this gives you
manifest.pp
# Execute apt-get update before any package is installed:
exec { 'apt-update':
  command => 'apt-get update',
  # but don't execute it more than once a day:
  unless  => 'test $(find /var/cache/apt/pkgcache.bin -mtime 0 | wc -l ) -eq 1',
}
Exec['apt-update'] -> Package <| |>
# Include minimal apache2 installation. Munin server, nagios
# and APC dashboard depend on it.
include 'apache2'
manifest.pp
# Install munin node and munin server:
class { 'munin::node': }
class { 'munin::server':
  htuser => 'munin',      # Username for basic access auth.
  htpass => 'Prague2013'  # Password for basic access auth.
}
# Install nagios:
class { 'nagios::server':
  contact_email => 'root@localhost', # Email to send alerts to.
  htpass        => 'Prague2013',     # Password for the nagiosadmin username.
}
manifest.pp
# Deploys APC dashboard - install php-apc package and
# deploy the apc.php script from it.
package { 'php-apc': ensure => installed }
exec { 'deploy-apc-dashboard':
  path    => '/bin:/usr/bin',
  command => 'gzip -dc /usr/share/doc/php-apc/apc.php.gz > /var/www/apc.php',
  notify  => Service['apache2'],
  unless  => '[ -f /var/www/apc.php ]',
  require => [ Package['php-apc'], Package['apache2'] ]
}
Summary
It is easy to start with monitoring.
The fun part - what’s wrong?
What’s wrong here?
Questions
Here is the repo to get started with monitoring: https://github.com/morpht/stalk-your-box
Resources
Rule of Three: en.wikipedia.org/wiki/Rule_of_three_(writing)
Nagios: http://www.nagios.org/
Munin: http://munin-monitoring.org/
Nagios module: https://drupal.org/project/nagios
Munin module: https://drupal.org/project/munin
Munin plugins (experimental): https://drupal.org/sandbox/murrayw/2084281
Sensu: http://sensuapp.org
MySQLTuner: http://MySQLTuner.pl
THANK YOU!
WHAT DID YOU THINK?
Locate this session at the DrupalCon Prague website: http://prague2013.drupal.org/schedule
Click the “Take the survey” link