Fact-Based Monitoring, PuppetConf 2014
Alexis Lê-Quôc @alq
Alexis Lê-Quôc, @alq, CTO at Datadog
Poll: Monitoring makes me…
• happy
• proud
• cry
• want to hide
Puppet brings Automation to Systems Management
Improve Monitoring the way Puppet has improved Systems Management
“The good old days”
• Your “CMDB” was Excel
• SSH in and hack away
• Little time for anything else
Then came Puppet…
• Expressive rules that capture the expected result
• Facts and classifiers, a.k.a. metadata, determine where to apply changes
• That freed up a lot of our time*
* on a per-machine basis
“Puppet brings immunity of configuration to change in infrastructure”
–Me (just now)
I have seen this before…
“[SQL brings] immunity of application to change in storage structure and access strategy”
–C.J. Date (1977)
http://www.cs.berkeley.edu/~brewer/cs262/SystemR.pdf
SQL
• 1974: IBM introduces System R and its Structured Query Language
• Expressive rules that capture the expected result
• Facts and predicates, a.k.a. metadata, determine what data to get
• That freed up a lot of development time
SQL
• From a time-consuming, imperative mess (“how”)
• … to expressive data queries (“what”)
SQL query
SELECT (desired facts) FROM (existing facts) WHERE (matching criteria)
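A minimal sketch of that SELECT/FROM/WHERE shape, using Python's built-in `sqlite3` module. The `host_facts` table, its columns and the sample rows are hypothetical, made up for illustration.

```python
import sqlite3

def web_hosts_in_prod(conn: sqlite3.Connection) -> list[str]:
    """SELECT (desired facts) FROM (existing facts) WHERE (matching criteria)."""
    rows = conn.execute(
        "SELECT hostname FROM host_facts "
        "WHERE role = ? AND environment = ? ORDER BY hostname",
        ("web", "prod"),
    )
    return [hostname for (hostname,) in rows]

# Hypothetical inventory of facts, one row per host.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE host_facts (hostname TEXT, role TEXT, environment TEXT)")
conn.executemany(
    "INSERT INTO host_facts VALUES (?, ?, ?)",
    [
        ("web-01", "web", "prod"),
        ("web-02", "web", "staging"),
        ("db-01", "db", "prod"),
        ("web-03", "web", "prod"),
    ],
)
print(web_hosts_in_prod(conn))  # ['web-01', 'web-03']
```

The query only states which rows are wanted; how SQLite finds them (full scan, index, join order) is left to the engine, which is the "immunity ... to change in storage structure and access strategy" quoted earlier.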
Puppet
• From a time-consuming, imperative mess (“how”)
• … to expressive configuration queries (“what”)
puppet apply
CHANGE (desired facts) FROM (existing puppet facts) WHERE (matching puppet classes)
Is there a pattern?
“Break free from ever more complex naming conventions for hostnames as a means of identity. Use a very rich set of meta data provided by each machine to address them.”
–MCollective overview
MCollective
• From a time-consuming, imperative mess (“how”)
• … to expressive orchestration queries (“what”)
mco rpc service restart service=nginx -F webpool=A
EXEC (desired actions) FROM (existing puppet facts) WHERE (matching puppet classes)
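The `-F webpool=A` filter picks target nodes by fact rather than by name. Below is a simplified Python model of that discovery step; it handles only exact `key=value` matches (MCollective filters support more than that), and the `discover` function, node names and `webpool` values are hypothetical.

```python
def discover(inventory: dict[str, dict[str, str]], fact_filters: list[str]) -> list[str]:
    """Return the nodes whose facts satisfy every key=value filter (a toy model of -F)."""
    wanted = dict(f.split("=", 1) for f in fact_filters)
    return sorted(
        node
        for node, facts in inventory.items()
        if all(facts.get(key) == value for key, value in wanted.items())
    )

# Hypothetical facts reported by each node.
inventory = {
    "node-a1": {"webpool": "A", "osfamily": "Debian"},
    "node-a2": {"webpool": "A", "osfamily": "RedHat"},
    "node-b1": {"webpool": "B", "osfamily": "Debian"},
}

# EXEC (restart nginx) WHERE (webpool=A): the targets are discovered, never typed in.
for node in discover(inventory, ["webpool=A"]):
    print(f"restart nginx on {node}")
```

Several filters combine with AND, so `["webpool=A", "osfamily=Debian"]` narrows the target set to `node-a1`.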
Back to monitoring
• Monitoring is to behavior what Puppet is to configuration
• Monitoring is to behavior what MCollective is to orchestration
Monitoring
• From a time-consuming, imperative mess (“how”)
• … to expressive monitoring queries (“what”)
Monitoring query
MONITOR (desired behavior) FROM (existing heartbeats/metrics) WHERE (matching puppet facts)
Examples
• “All provisioned web servers in the production environment, datacenter ABC must respond to queries within 200ms”
• “All PostgreSQL servers must have a postgres: bgwriter process running”
• “At least one ActiveMQ server is up to support MCollective”
• Never mention a hostname
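The MONITOR … FROM … WHERE query and the examples above can be sketched as one small Python function: filter hosts by facts, then require the behavior on all of them (or on at least one). The fact names (`role`, `environment`, `datacenter`), host IDs and metric values are hypothetical, and this is an illustration of the idea, not any monitoring product's API.

```python
from typing import Callable, Iterable, Mapping

Facts = Mapping[str, str]

def monitor(
    facts: Mapping[str, Facts],                 # FROM: host -> facts
    metrics: Mapping[str, float],               # FROM: host -> latest metric value
    where: Callable[[Facts], bool],             # WHERE: matching facts
    behavior: Callable[[float], bool],          # MONITOR: desired behavior
    quantifier: Callable[[Iterable[bool]], bool] = all,  # all(...) or any(...)
) -> bool:
    """Evaluate the rule over whatever hosts currently match the facts."""
    matching = [host for host, host_facts in facts.items() if where(host_facts)]
    # A matching host that reports no metric counts as failing the behavior.
    return quantifier(host in metrics and behavior(metrics[host]) for host in matching)

facts = {
    "i-0a1": {"role": "web", "environment": "production", "datacenter": "ABC"},
    "i-0b2": {"role": "web", "environment": "production", "datacenter": "ABC"},
    "i-0c3": {"role": "web", "environment": "staging", "datacenter": "ABC"},
    "i-0d4": {"role": "activemq", "environment": "production", "datacenter": "ABC"},
    "i-0e5": {"role": "activemq", "environment": "production", "datacenter": "XYZ"},
}
latency_ms = {"i-0a1": 120.0, "i-0b2": 185.0, "i-0c3": 450.0}
amq_up = {"i-0d4": 0.0, "i-0e5": 1.0}

# "All production web servers in datacenter ABC respond within 200ms"
web_ok = monitor(
    facts,
    latency_ms,
    where=lambda f: f["role"] == "web"
    and f["environment"] == "production"
    and f["datacenter"] == "ABC",
    behavior=lambda ms: ms < 200,
)
# "At least one ActiveMQ server is up"
amq_ok = monitor(
    facts, amq_up, where=lambda f: f["role"] == "activemq",
    behavior=lambda up: up == 1.0, quantifier=any,
)
print(web_ok, amq_ok)  # True True
```

Neither rule names a host: a sixth web server with the same facts is covered the moment it reports them.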
Hosts are not the center of the monitoring universe.
Facts are!
Hosts are just places where facts occur.
The proof is in the pudding…
Hosts at the center of the universe, a.k.a. the Wrong Way
“Its fairly straightforward, so hopefully you find things easy to understand…”
–Nagios Core 4 manual on monitoring clusters
Host-centric: Monitor a DNS cluster
check_command check_service_cluster!"DNS Cluster"!0!1!$SERVICESTATEID:host1:DNS Service$,$SERVICESTATEID:host2:DNS Service$,$SERVICESTATEID:host3:DNS Service$
Where do host1, host2, host3 come from?
Host-centric: can’t use facts directly
• “Host groups solve this problem.” No, they don’t.
• Combinatorial explosion, even in a trivial setup:
• 4 data centers (us-1, us-2, eu, apac)
• 5 classes (web, db, cache, appserver, hadoop)
• 3 environments (test, staging, prod)
• => up to 119 materialized host groups: each dimension is either fixed or left open, so (4+1) × (5+1) × (3+1) − 1 = 119, not counting the group of all hosts
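A short Python sketch that enumerates those host groups, assuming each group pins every dimension to one value or leaves it open (`None`); the dictionary simply restates the three dimensions listed above.

```python
from itertools import product

dimensions = {
    "datacenter": ["us-1", "us-2", "eu", "apac"],
    "class": ["web", "db", "cache", "appserver", "hadoop"],
    "environment": ["test", "staging", "prod"],
}

def host_groups(dims: dict[str, list[str]]) -> list[dict]:
    """Every combination where each dimension is pinned to a value or left open (None)."""
    choices = [[None, *values] for values in dims.values()]
    return [
        dict(zip(dims, combo))
        for combo in product(*choices)
        if any(value is not None for value in combo)  # drop the unfiltered "all hosts" group
    ]

groups = host_groups(dimensions)
print(len(groups))  # 119
```

Growth is multiplicative: one more environment gives 5 × 6 × 5 − 1 = 149 groups, each of which a host-centric tool would need materialized and kept in sync.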
Nagios-bashing?
• No!
• All host-centric monitoring tools share the same fatal flaw
• Host-centric monitoring forces an extra, expensive step:
• replicating fact-based conditionals in host-centric templates
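To make that extra step concrete, here is a Python sketch that regenerates the DNS cluster `check_command` shown earlier from a fact query. The `inventory` dictionary and the `role` fact are hypothetical stand-ins for whatever fact source feeds the Nagios configuration.

```python
def render_dns_cluster_check(
    inventory: dict[str, dict[str, str]], warn: int = 0, crit: int = 1
) -> str:
    """Turn a fact query ("role is dns") back into a host-centric Nagios check_command."""
    hosts = sorted(name for name, facts in inventory.items() if facts.get("role") == "dns")
    states = ",".join(f"$SERVICESTATEID:{host}:DNS Service$" for host in hosts)
    return f'check_service_cluster!"DNS Cluster"!{warn}!{crit}!{states}'

# Hypothetical fact inventory.
inventory = {
    "host1": {"role": "dns"},
    "host2": {"role": "dns"},
    "host3": {"role": "dns"},
    "web1": {"role": "web"},
}
print("check_command " + render_dns_cluster_check(inventory))
```

The fact-based rule ("every DNS server") never changes, but its host-centric copy does: this generator, followed by a Nagios reload, has to run every time a DNS server is added or removed.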
“Please note that this module is not for the faint of heart. Even I (the author) have my head hurt each time I have to make modifications to it…”
–puppet-nagios author
Facts at the center of the universe, a.k.a. the Right Way
"De Revolutionibus manuscript p9b" by Nicolas Copernicus - www.bj.uj.edu.pl. Licensed under Public domain via Wikimedia Commons - http://commons.wikimedia.org/wiki/File:De_Revolutionibus_manuscript_p9b.jpg#mediaviewer/File:De_Revolutionibus_manuscript_p9b.jpga
Earlier Examples
• “All provisioned web servers in the production environment, datacenter ABC must respond to queries within 200ms”
• “All PostgreSQL servers must have a postgres: bgwriter process running”
• “At least one ActiveMQ server is up to support MCollective”
In Sensu (heartbeats)
• “All PostgreSQL servers must have a postgres: bgwriter process running”
class postgres::monitoring::sensu {
  sensu::subscription { 'postgres': }
}
• Monitoring using a fact-based query
• Is the node of class “postgres”, and thus subscribed to “postgres”, or not?
• If so, it will execute the postgres check
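A simplified Python model of the subscription routing this relies on: a Sensu check definition lists its `subscribers`, a client lists its `subscriptions`, and a client runs the checks whose subscribers overlap its subscriptions. This is not Sensu code; the check names and the class-to-subscription mapping are hypothetical stand-ins for the manifest above.

```python
# Check definitions: each lists the subscriptions it is published to.
CHECKS = {
    "check_postgres_bgwriter": {"subscribers": ["postgres"]},
    "check_nginx_status": {"subscribers": ["nginx"]},
}

# Hypothetical mapping mirroring the manifest: including the class
# adds the subscription to the node's Sensu client configuration.
CLASS_SUBSCRIPTIONS = {"postgres::monitoring::sensu": "postgres"}

def subscriptions_for(classes: list[str]) -> set[str]:
    return {CLASS_SUBSCRIPTIONS[c] for c in classes if c in CLASS_SUBSCRIPTIONS}

def checks_for(classes: list[str]) -> list[str]:
    """Checks a node runs: those whose subscribers intersect the node's subscriptions."""
    subs = subscriptions_for(classes)
    return sorted(name for name, check in CHECKS.items() if subs & set(check["subscribers"]))

print(checks_for(["postgres", "postgres::monitoring::sensu"]))  # ['check_postgres_bgwriter']
print(checks_for(["nginx"]))  # []
```

Which nodes run the bgwriter check follows from classification alone; no check definition ever lists a host.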
In Datadog (metrics)
• “All provisioned web servers in the production environment, datacenter ABC must respond to queries within 200ms”
$ puppet module install datadog-datadog_agent
class { 'datadog_agent':
  api_key       => …,
  tags          => [$environment],
  facts_to_tags => ['datacenter'],
}
include datadog_agent::integrations::nginx
In Datadog (metrics)
• Monitoring using a fact-based query
• Puppet facts directly reused
max(nginx.request.latency{production,datacenter:ABC}) < 200
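A simplified Python model of how a tag-scoped query like this might be evaluated: keep every series whose tags include all tags in the scope, aggregate with `max`, compare to the threshold. It illustrates tag-scoped aggregation only and is not Datadog's evaluator; the `host:` tags and latency values are hypothetical, while the `production` and `datacenter:ABC` tags mirror the environment and fact-derived tags set up in the manifest.

```python
from typing import Optional

# Each point: the tags attached to one series, and its latest latency in ms.
series = [
    (frozenset({"production", "datacenter:ABC", "host:i-0a1"}), 120.0),
    (frozenset({"production", "datacenter:ABC", "host:i-0b2"}), 185.0),
    (frozenset({"staging", "datacenter:ABC", "host:i-0c3"}), 450.0),
]

def max_below(
    series: list[tuple[frozenset, float]], scope: set[str], threshold: float
) -> Optional[bool]:
    """max(metric{scope}) < threshold, over every series carrying all scope tags."""
    values = [value for tags, value in series if scope <= tags]
    if not values:
        return None  # nothing matches the scope: report "no data" rather than OK
    return max(values) < threshold

print(max_below(series, {"production", "datacenter:ABC"}, 200))  # True
```

A new web server that Puppet tags the same way falls inside the scope automatically; the query text does not change.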
What to take away
Fact-based monitoring
1. Hosts are not at the center of the monitoring universe
2. Expressive monitoring uses queries
3. Monitoring queries should use Puppet facts
Thank you!