metrics with ganglia
DESCRIPTION
Talk about using Ganglia and other tools for storing all kinds of web application metrics for both operations and business purposes. Presented at Cambridge Geek Night.
TRANSCRIPT
gareth rushgrove | morethanseven.net
Collecting Metrics With Ganglia and Friends
Cambridge Geek Night 28th March 2011
http://www.flickr.com/photos/memestate/45986749
Gareth Rushgrove
Work at FreeAgent
freeagentcentral.com
Blog at morethanseven.net
Curate devopsweekly.com
Covering (Business Version)
- Capacity planning metrics
- Metrics for your application
- Business analytics
- Having everything in one place
Covering (Tech Version)
- Ganglia: Store metrics and view graphs
- Logster: Get log files into Ganglia
- Gmetric: Get anything into Ganglia
- Syslog: Using Loggly to view individual log items
Everyone Uses Something Like?
Use Something Like This Too
What is Ganglia?
“Ganglia is a scalable distributed monitoring system for high-performance computing systems such as clusters and Grids.”
ganglia.sourceforge.net
Example: vagrantbox.es
Load Averages
CPU
Aggregate Graphs
Across Entire Cluster
Predicting When Your System Will Fail
“A strategy for anticipating future workloads of your computers, with the aim of creating a computing environment that can handle future workload.”
IBM
Disk Space
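One way to turn a disk space graph into a capacity planning number is to fit a straight line through recent readings and see where it crosses the volume size. The sketch below assumes growth is roughly linear; the function name `days_until_full` and the sample readings are made up for illustration and do not come from the talk.

```python
def days_until_full(samples, capacity_gb):
    """Estimate days until the disk is full from (day, used_gb) samples.

    Fits a least-squares straight line through the samples and
    extrapolates it to capacity_gb. Returns None when usage is flat
    or shrinking, since no fill date can be projected.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    if sxx == 0:
        raise ValueError("samples must span more than one day")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    slope = sxy / sxx  # growth in GB per day
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    last_day = max(x for x, _ in samples)
    full_day = (capacity_gb - intercept) / slope
    return max(0.0, full_day - last_day)


# Made-up readings: (day number, GB used) on a 100 GB volume.
readings = [(0, 40.0), (1, 42.0), (2, 44.0), (3, 46.0)]
print(days_until_full(readings, 100.0))  # prints 27.0
```

Real usage is rarely this tidy, so treat the result as a rough warning threshold rather than a prediction.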
Monitoring Your Application
86.26.7.33 - - [26/Mar/2011:20:39:52 +0000] "GET / HTTP/1.0" 200 5970 "-" "FunkLoad/1.14.0"
86.26.7.33 - - [26/Mar/2011:20:39:53 +0000] "GET / HTTP/1.0" 200 5970 "-" "FunkLoad/1.14.0"
86.26.7.33 - - [26/Mar/2011:20:39:53 +0000] "GET / HTTP/1.0" 200 5970 "-" "FunkLoad/1.14.0"
86.26.7.33 - - [26/Mar/2011:20:39:53 +0000] "GET / HTTP/1.0" 200 5970 "-" "FunkLoad/1.14.0"
86.26.7.33 - - [26/Mar/2011:20:39:53 +0000] "GET / HTTP/1.1" 200 2081 "-" "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_7; en-us) AppleWebKit/533.20.25 (KHTML, like Gecko) Version/5.0.4 Safari/533.20.27"
86.26.7.33 - - [26/Mar/2011:20:39:53 +0000] "GET / HTTP/1.0" 200 5970 "-" "FunkLoad/1.14.0"
86.26.7.33 - - [26/Mar/2011:20:39:53 +0000] "GET / HTTP/1.0" 200 5970 "-" "FunkLoad/1.14.0"
86.26.7.33 - - [26/Mar/2011:20:39:53 +0000] "GET / HTTP/1.0" 200 5970 "-" "FunkLoad/1.14.0"
86.26.7.33 - - [26/Mar/2011:20:39:53 +0000] "GET / HTTP/1.0" 200 5970 "-" "FunkLoad/1.14.0"
86.26.7.33 - - [26/Mar/2011:20:39:53 +0000] "GET / HTTP/1.0" 200 5970 "-" "FunkLoad/1.14.0"
86.26.7.33 - - [26/Mar/2011:20:39:53 +0000] "GET / HTTP/1.0" 200 5466 "-" "FunkLoad/1.14.0"
86.26.7.33 - - [26/Mar/2011:20:39:53 +0000] "GET / HTTP/1.0" 200 5970 "-" "FunkLoad/1.14.0"
86.26.7.33 - - [26/Mar/2011:20:39:53 +0000] "GET / HTTP/1.0" 200 5970 "-" "FunkLoad/1.14.0"
86.26.7.33 - - [26/Mar/2011:20:39:53 +0000] "GET / HTTP/1.0" 200 5466 "-" "FunkLoad/1.14.0"
86.26.7.33 - - [26/Mar/2011:20:39:53 +0000] "GET / HTTP/1.0" 200 5466 "-" "FunkLoad/1.14.0"
86.26.7.33 - - [26/Mar/2011:20:39:53 +0000] "GET / HTTP/1.0" 200 5970 "-" "FunkLoad/1.14.0"
86.26.7.33 - - [26/Mar/2011:20:39:53 +0000] "GET / HTTP/1.0" 200 5970 "-" "FunkLoad/1.14.0"
86.26.7.33 - - [26/Mar/2011:20:39:53 +0000] "GET / HTTP/1.0" 200 5970 "-" "FunkLoad/1.14.0"
86.26.7.33 - - [26/Mar/2011:20:39:53 +0000] "GET / HTTP/1.0" 200 5466 "-" "FunkLoad/1.14.0"
86.26.7.33 - - [26/Mar/2011:20:39:53 +0000] "GET / HTTP/1.0" 200 5970 "-" "FunkLoad/1.14.0"
86.26.7.33 - - [26/Mar/2011:20:39:53 +0000] "GET / HTTP/1.0" 200 5970 "-" "FunkLoad/1.14.0"
86.26.7.33 - - [26/Mar/2011:20:39:53 +0000] "GET / HTTP/1.0" 200 5466 "-" "FunkLoad/1.14.0"
86.26.7.33 - - [26/Mar/2011:20:39:53 +0000] "GET / HTTP/1.0" 200 5466 "-" "FunkLoad/1.14.0"
86.26.7.33 - - [26/Mar/2011:20:39:53 +0000] "GET / HTTP/1.0" 200 5970 "-" "FunkLoad/1.14.0"
86.26.7.33 - - [26/Mar/2011:20:39:53 +0000] "GET / HTTP/1.0" 200 5970 "-" "FunkLoad/1.14.0"
86.26.7.33 - - [26/Mar/2011:20:39:53 +0000] "GET / HTTP/1.0" 200 5970 "-" "FunkLoad/1.14.0"
86.26.7.33 - - [26/Mar/2011:20:39:53 +0000] "GET / HTTP/1.0" 200 5466 "-" "FunkLoad/1.14.0"
86.26.7.33 - - [26/Mar/2011:20:39:53 +0000] "GET / HTTP/1.0" 200 5970 "-" "FunkLoad/1.14.0"
86.26.7.33 - - [26/Mar/2011:20:39:53 +0000] "GET / HTTP/1.0" 200 5970 "-" "FunkLoad/1.14.0"
86.26.7.33 - - [26/Mar/2011:20:39:53 +0000] "GET / HTTP/1.0" 200 5466 "-" "FunkLoad/1.14.0"
86.26.7.33 - - [26/Mar/2011:20:39:53 +0000] "GET / HTTP/1.0" 200 5466 "-" "FunkLoad/1.14.0"
Web Server Logs
Logster from Etsy
Tail a log file and filter each line to generate metrics that can be sent to common monitoring packages.
Options:
  -p METRIC_PREFIX, --metric-prefix=METRIC_PREFIX
        Add prefix to all published metrics. This is for people that may multiple instances of same service on same host.
  --gmetric-options=GMETRIC_OPTIONS
        Options to pass to gmetric such as -d 180 -c /etc/ganglia/gmond.conf (default). These are passed directly to gmetric.
  --graphite-host=GRAPHITE_HOST
        Hostname and port for Graphite collector, e.g. graphite.example.com:2003
  -s STATE_DIR, --state-dir=STATE_DIR
        Where to store the logtail state file. Default location /var/run
  -d, --dry-run
        Parse the log file but send stats to standard output.
  -D, --debug
        Provide more verbose logging for debugging.
Logster
logster SampleGangliaLogster /../access.log
Logster Command Line
HTTP Responses with a 2xx Status Code
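To make the idea of filtering each log line into a metric concrete, here is a standalone sketch that counts responses by status class, which is the kind of number behind a 2xx graph. It is not Logster's own parser code; the regular expression and the name `count_status_classes` are assumptions written for this example, and the sample lines are taken from the access log shown earlier.

```python
import re
from collections import Counter

# Matches the request and status fields of an access log line,
# e.g. ... "GET / HTTP/1.0" 200 5970 ...
LINE_RE = re.compile(r'"[A-Z]+ \S+ HTTP/\d\.\d" (?P<status>\d{3}) ')


def count_status_classes(lines):
    """Return a Counter keyed by status class ('2xx', '4xx', ...)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # skip lines that do not look like requests
        counts[match.group("status")[0] + "xx"] += 1
    return counts


sample = [
    '86.26.7.33 - - [26/Mar/2011:20:39:52 +0000] "GET / HTTP/1.0" 200 5970 "-" "FunkLoad/1.14.0"',
    '86.26.7.33 - - [26/Mar/2011:20:39:53 +0000] "GET / HTTP/1.0" 200 5466 "-" "FunkLoad/1.14.0"',
    'not a log line',
]
print(count_status_classes(sample)["2xx"])  # prints 2
```

Dividing each count by the number of seconds the lines cover would give a per-second rate suitable for graphing.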
The Ganglia Metric Client (gmetric) announces a metric on the list of defined send channels defined in a configuration file
Usage: gmetric [OPTIONS]...
  -V, --version         Print version and exit
  -c, --conf=STRING     The configuration file to use for finding send channels (default='/etc/ganglia/gmond.conf')
  -n, --name=STRING     Name of the metric
  -v, --value=STRING    Value of the metric
  -t, --type=STRING     Either string|int8|uint8|int16|uint16|int32|uint32|float|double
  -u, --units=STRING    Unit of measure for the value e.g. Kilobytes, Celcius (default='')
  -s, --slope=STRING    Either zero|positive|negative|both (default='both')
  -x, --tmax=INT        The maximum time in seconds between gmetric calls (default='60')
  -d, --dmax=INT        The lifetime in seconds of this metric (default='0')
  -S, --spoof=STRING    IP address and name of host/device (colon separated) we are spoofing (default='')
  -H, --heartbeat       spoof a heartbeat message (use with spoof option)
Gmetric
Gmetric Scripts for Common Applications
gmetric -n sales -v 200 -t float
Gmetric Command Line
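The same command can be issued from application code. The following is a minimal sketch, assuming the gmetric binary is on the PATH; the helper names `gmetric_args` and `send_metric` are hypothetical, and only the -n, -v, -t and -u flags from the usage text are used.

```python
import subprocess


def gmetric_args(name, value, metric_type="float", units=""):
    """Build the argument list for a gmetric call."""
    args = ["gmetric", "-n", str(name), "-v", str(value), "-t", metric_type]
    if units:
        args += ["-u", units]
    return args


def send_metric(name, value, metric_type="float", units=""):
    """Run gmetric; raises CalledProcessError on a non-zero exit."""
    subprocess.check_call(gmetric_args(name, value, metric_type, units))


print(" ".join(gmetric_args("sales", 200)))  # prints: gmetric -n sales -v 200 -t float
```

Building the command as a list rather than a single string means metric names and values are passed to gmetric as-is, without a shell interpreting them.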
Our Custom Metric in Ganglia
import subprocess

from bottle import route, run, abort, default_app

@route('/:name/:value')
def index(name, value):
    # Pass arguments as a list so URL input is never interpreted by a shell.
    cmd = ['gmetric', '-n', name, '-v', value, '-t', 'float']
    try:
        subprocess.check_call(
            cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        return "Success: %s" % ' '.join(cmd)
    except (OSError, subprocess.CalledProcessError):
        # gmetric is missing or exited with a non-zero status.
        abort(500, "Error")

app = default_app()
Gmetric HTTP Interface
http://../sales/200
Gmetric URL
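import subprocess
import socketserver  # named SocketServer on Python 2

class GmetricTCPHandler(socketserver.BaseRequestHandler):

    def handle(self):
        # Expect a single "name value" message, e.g. "sales 200".
        self.data = self.request.recv(1024).strip().decode('utf-8', 'replace')
        items = self.data.split(' ')
        try:
            # List arguments avoid passing client input through a shell.
            cmd = ['gmetric', '-n', items[0], '-v', items[1], '-t', 'float']
            subprocess.check_call(
                cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
            reply = "Success: %s" % ' '.join(cmd)
        except Exception:
            reply = "Error"
        # Send the result to the client; a return value from handle() is discarded.
        self.request.sendall(reply.encode('utf-8') + b"\n")

if __name__ == "__main__":
    HOST, PORT = "0.0.0.0", 8001
    server = socketserver.TCPServer((HOST, PORT), GmetricTCPHandler)
    server.serve_forever()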
Gmetric TCP Interface
sales 200
Gmetric TCP
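A client for the TCP interface only needs to open a connection and write the name and value separated by a space. This is a sketch using the standard socket module; the function name `send_metric_line` and the timeout are assumptions for illustration.

```python
import socket


def send_metric_line(host, port, name, value, timeout=5.0):
    """Send "name value" over TCP and return the server's reply.

    Returns an empty string if the server closes the connection
    without sending anything back.
    """
    message = "%s %s\n" % (name, value)
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(message.encode("utf-8"))
        conn.shutdown(socket.SHUT_WR)  # signal that nothing more will be sent
        return conn.recv(1024).decode("utf-8", "replace").strip()
```

With the server above listening on port 8001, `send_metric_line("localhost", 8001, "sales", 200)` would submit the same metric as the command line example.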
Syslog
“Syslog is a standard for logging program messages. It allows separation of the software that generates messages from the system that stores them and the software that reports and analyzes them.”
Wikipedia
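That separation is visible from application code: the program only hands messages to a syslog daemon, and where they end up is the daemon's business. A minimal sketch using Python's standard `logging.handlers.SysLogHandler`, assuming a daemon listening for UDP on localhost port 514; the helper name `make_syslog_logger` and the logger name in the usage note are hypothetical.

```python
import logging
import logging.handlers


def make_syslog_logger(name, host="localhost", port=514):
    """Return a logger that sends records to a syslog daemon over UDP."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = logging.handlers.SysLogHandler(address=(host, port))
    # Prefix each message with the logger name so it can be filtered later.
    handler.setFormatter(logging.Formatter("%(name)s: %(message)s"))
    logger.addHandler(handler)
    return logger
```

Calling `make_syslog_logger("shop").info("sale completed")` would emit one syslog message; forwarding it to a central store is a matter of daemon configuration and is not shown here.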
Loggly - Logging as a Service
View logs
Logstash
Graylog2
Other Things You Could Monitor
- Database table sizes
- Cache hits
- Time taken for test runs
- Codebase size
- Signups, sales, subscriptions
- Twitter followers
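Most items in this list reduce to computing one number on a schedule and handing it to gmetric. As an illustration for codebase size, the sketch below counts lines in source files under a directory; the function name `count_source_lines` and the default extension are assumptions, and the result would be passed as the value of a metric.

```python
import os


def count_source_lines(root, extensions=(".py",)):
    """Count lines in files under root whose names end with extensions."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if filename.endswith(tuple(extensions)):
                path = os.path.join(dirpath, filename)
                # Read as bytes so unusual encodings do not raise errors.
                with open(path, "rb") as handle:
                    total += sum(1 for _ in handle)
    return total
```

Run from cron or a deploy hook, a value like this gives a slow-moving graph that can be read alongside the operational ones.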
What Next?
- Wikipedia: http://ganglia.wikimedia.org/
- Install Ganglia: deb and rpm packages available
- Add system metrics: web servers, databases
- Add business metrics: users, sales, tweets
- Try Loggly, or at least investigate syslog
Reading
CBGN11
2 months free on FreeAgent
Questions?
http://flickr.com/photos/psd/102332391/