Monitoring your API


Upload: andres-f-vargas

Post on 09-Feb-2017


Page 1: Monitoring your API
Page 2: Monitoring your API

WHAT IS THIS TALK ABOUT?

• Passive monitoring with graphite (collect statistics).

• What metrics to monitor.

• What tools.

• Graph examples.

Page 3: Monitoring your API

ASSUMPTIONS

• You are using Nginx as a proxy for your API.

• You are using Ubuntu (but this also works on other Linux distributions).

• You’ll be using Graphite to store the metrics: collectl sends the system metrics and logster sends the metrics extracted from the Nginx logs.

Page 4: Monitoring your API

WHAT TO MONITOR?

“The 15 Essential Nginx Metrics to Monitor” by Scalyr https://www.scalyr.com/community/guides/how-to-monitor-nginx-the-essential-guide

• Requests per second

• Response time

• Active connections

• Connection backlog queue

• Response codes

• Process open file handlers

• Process state*

• Server status*

• Server load average

• Server network usage

• Server disk space

• Hosting provider status*

• DNS expiration*

• SSL certificate expiration*

• User activity*

* Not the kind of thing you would collect as a numeric metric, so these are not covered in this talk.

Page 5: Monitoring your API

WHAT TO MONITOR?

• “The USE Method” by Brendan Gregg http://www.brendangregg.com/usemethod.html

• Methodology for analyzing the performance of any system.

• Summarized as: “For every resource, check utilization, saturation, and errors.”

• Consider software a resource as well

• “USE Method: Rosetta Stone of Performance Checklists” by Brendan Gregg http://www.brendangregg.com/USEmethod/use-rosetta.html

Page 6: Monitoring your API

WHAT TO MONITOR?

Resource               Utilization                Saturation                  Errors
App Performance        Response time, # Requests  -                           5xx code
Nginx Connections      Active                     Accepted - Handled          -
Open file descriptors  # open files               -                           -
CPU                    % Util                     Run queue size              -
Network                Rx or Tx / Max             Dropped                     Errors
Memory                 Used                       Swap                        -
Disk                   % Util                     Wait time and queue length  -

Page 7: Monitoring your API

WHAT TOOLS?

Resource               Utilization                Saturation                  Errors
App Performance        Response time, # Requests  -                           5xx code
Nginx Connections      Active                     Accepted - Handled          -
Open file descriptors  # open files               -                           -
CPU                    % Util                     Run queue size              -
Network                Rx or Tx / Max             Dropped                     Errors
Memory                 Used                       Swap                        -
Disk                   % Util                     Wait time and queue length  -
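The checklist above can also be kept as data. The sketch below maps some of its cells to the Graphite metric path suffixes that appear in the graph examples at the end of this talk. The dictionary layout and the helper name `metric_paths` are illustrative assumptions and are not part of collectl or logster; the device names `eth0` and `sda` are simply the ones used in those examples.

```python
# Illustrative only: USE checklist cells mapped to Graphite path suffixes.
# The suffixes are copied from the graph examples in this talk; the
# structure and the helper function are hypothetical.
USE_METRICS = {
    ("App Performance", "utilization"): ["logster.api.latency.*",
                                         "logster.api.requests.*"],
    ("Nginx Connections", "utilization"): ["collectl.ngix.conn.active"],
    ("Nginx Connections", "saturation"): ["collectl.ngix.conn.accepted",
                                          "collectl.ngix.conn.handled"],
    ("CPU", "utilization"): ["collectl.cputotals.*"],
    ("CPU", "saturation"): ["collectl.ctxint.run"],
    ("Memory", "utilization"): ["collectl.meminfo.used"],
    ("Memory", "saturation"): ["collectl.swapinfo.used"],
    ("Network", "utilization"): ["collectl.netinfo.kb*.eth0"],
    ("Network", "saturation"): ["collectl.netinfo.drpin.eth0",
                                "collectl.netinfo.drpout.eth0"],
    ("Network", "errors"): ["collectl.netinfo.errin.eth0",
                            "collectl.netinfo.errout.eth0"],
    ("Disk", "utilization"): ["collectl.diskinfo.util.sda"],
    ("Disk", "saturation"): ["collectl.diskinfo.wait.sda",
                             "collectl.diskinfo.quelen.sda"],
}


def metric_paths(hostname, resource, column):
    """Return full Graphite paths for one checklist cell (empty if unmapped)."""
    suffixes = USE_METRICS.get((resource, column), [])
    return [f"{hostname}.{suffix}" for suffix in suffixes]
```

Cells with no entry (for example open file descriptors) return an empty list, which makes the gaps in coverage easy to spot.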

Page 8: Monitoring your API

WHAT TOOLS? COLLECTL

• Created by HP

• Low overhead

• Available in all major Linux distributions

• Measures a rich set of metrics

• Stores data locally and exports to Ganglia and Graphite; custom imports and exports can be added

• Problem: doesn’t export all metrics to graphite
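For context on what an export to Graphite involves: Carbon's plaintext listener (port 2003 by default) accepts one `<path> <value> <timestamp>` line per data point. The following is a minimal sketch of that wire format, not collectl's own exporter; the function names and the example metric path are assumptions made for illustration.

```python
import socket
import time


def format_metric(path, value, timestamp=None):
    """Build one line of Graphite's plaintext protocol."""
    if timestamp is None:
        timestamp = time.time()
    return f"{path} {value} {int(timestamp)}\n"


def send_metrics(lines, host, port=2003, timeout=5.0):
    """Send already formatted lines to a Carbon plaintext listener over TCP."""
    payload = "".join(lines).encode("ascii")
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload)


# Example line (hypothetical host and value):
line = format_metric("web1.collectl.cputotals.user", 12.5, 1486600000)
# send_metrics([line], "<ip address>")  # uncomment with a real Graphite host
```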

Page 9: Monitoring your API

WHAT TOOLS? COLLECTL

• Install:

$ sudo apt-get install collectl libwww-curl-perl

• Patch the graphite export (to add metrics that aren't included by default):

$ wget -O graphite.patch https://gist.githubusercontent.com/andphe/2a08eab7fb4148d33888/raw/5d416d8faa5a9ca535cd5e062622d712f74c6f11/graphite.patch

$ sudo patch -p0 /usr/share/collectl/graphite.ph graphite.patch

• Install the Nginx import module:

$ git clone https://github.com/andphe/collectl-imports.git

$ cd collectl-imports

$ sudo cp nginx.ph /usr/share/collectl/

Page 10: Monitoring your API

WHAT TOOLS? COLLECTL

• Configure (/etc/collectl.conf):

DaemonCommands = -i 10 -s+YZDN --netopts e --import nginx,s=http,h=localhost,p=80,u=nginx_status --export graphite,<ip address>,p=.collectl

• Enable Nginx status (/etc/nginx/sites-available/default):

location /nginx_status {
    stub_status on;
    access_log off;
    allow 127.0.0.1;
    deny all;
}

• Restart:

$ sudo /etc/init.d/nginx reload

$ sudo /etc/init.d/collectl restart
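To make the Nginx status page concrete, here is a minimal sketch of parsing the text that `stub_status` returns and deriving the "accepted - handled" saturation figure from the checklist. It is not the code of the nginx.ph import module; the function name and the sample numbers are assumptions, and the sample only follows the layout that Nginx documents for this page.

```python
import re

# Sample stub_status output (layout as documented by Nginx; numbers made up).
SAMPLE = """Active connections: 291
server accepts handled requests
 16630948 16630948 31070465
Reading: 6 Writing: 179 Waiting: 106
"""


def parse_stub_status(text):
    """Extract the counters from a stub_status page into a dict."""
    active = int(re.search(r"Active connections:\s+(\d+)", text).group(1))
    # The three cumulative counters sit alone on their own line.
    accepts, handled, requests = map(
        int,
        re.search(r"^\s*(\d+)\s+(\d+)\s+(\d+)\s*$", text, re.MULTILINE).groups(),
    )
    reading, writing, waiting = map(
        int,
        re.search(r"Reading:\s*(\d+)\s+Writing:\s*(\d+)\s+Waiting:\s*(\d+)",
                  text).groups(),
    )
    return {
        "active": active,
        "accepts": accepts,
        "handled": handled,
        "requests": requests,
        "reading": reading,
        "writing": writing,
        "waiting": waiting,
        # Connections accepted but not handled, i.e. dropped.
        "dropped": accepts - handled,
    }


stats = parse_stub_status(SAMPLE)
```

With the location block above in place, the same text could be fetched from http://localhost/nginx_status on the server itself, since access is restricted to 127.0.0.1.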

Page 11: Monitoring your API

WHAT TOOLS? LOGSTER

• Created by Etsy

• Export to ganglia, graphite, statsd, cloudwatch, nagios

• Few dependencies

• New parsers can be added

• 1 minute resolution

• Problem: only sends requests / sec per response code

Page 12: Monitoring your API

WHAT TOOLS? LOGSTER

• Nginx can log the request time via the $request_time variable

• I created a parser for logster that takes advantage of $request_time

• Sends percentiles and max

• DOESN’T USE AVERAGES

• Sends the total number of requests per response code
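The kind of summary described above can be sketched as follows. This is not the author's parser: the nearest-rank percentile method, the choice of p50/p90/p99, and the function names are all assumptions made for illustration.

```python
import math
from collections import Counter


def percentile(sorted_values, pct):
    """Nearest-rank percentile of a sorted, non-empty list (pct in 0..100)."""
    rank = math.ceil(pct * len(sorted_values) / 100)
    return sorted_values[max(rank, 1) - 1]


def summarize(samples):
    """Summarize (status_code, request_time_seconds) pairs for one interval."""
    samples = list(samples)
    times = sorted(t for _, t in samples)
    summary = {"requests": dict(Counter(status for status, _ in samples))}
    if times:
        summary["latency"] = {
            "p50": percentile(times, 50),
            "p90": percentile(times, 90),
            "p99": percentile(times, 99),
            "max": times[-1],
        }
    return summary


# 98 fast requests, one slightly slower, and one very slow server error.
samples = [(200, 0.010)] * 98 + [(200, 0.020), (500, 2.5)]
result = summarize(samples)
```

In this sample the median is 0.010 s and the max is 2.5 s, while the mean would be about 0.035 s, a value that describes neither the typical request nor the worst one.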

Page 13: Monitoring your API

WHAT TOOLS? LOGSTER

• Why a new parser that doesn't use averages:

“#LatencyTipOfTheDay: Average (def): a random number that falls somewhere between the maximum and 1/2 the median. Most often used to ignore reality.” by Gil Tene http://latencytipoftheday.blogspot.com.co/2014/06/latencytipoftheday-average-random.html

Page 14: Monitoring your API

WHAT TOOLS? LOGSTER

• Why a new parser that doesn't use averages:

“#LatencyTipOfTheDay: If you are not measuring and/or plotting Max, what are you hiding (from)?” by Gil Tene http://latencytipoftheday.blogspot.com.co/2014/06/latencytipoftheday-if-you-are-not.html

Page 15: Monitoring your API

WHAT TOOLS? LOGSTER

• Why a new parser that doesn't use averages:

More resources about response times in web apps:

http://www.infoq.com/presentations/latency-pitfalls by Gil Tene

https://vimeo.com/104129953 by Andre Arko

Page 16: Monitoring your API

WHAT TOOLS? LOGSTER

• Install:

$ sudo apt-get install logtail

$ git clone https://github.com/etsy/logster.git

$ cd logster && sudo python setup.py install

• Configure (add a cron job):

* * * * * logster --output=graphite --graphite-host=<ip address>:2003 -p "<hostname>.logster.api" NginxLogster /var/log/nginx/access.log > /tmp/logster_out.txt 2>&1

Page 17: Monitoring your API

WHAT TOOLS? LOGSTER

• Install NginxParser (copy it to the parsers folder):

$ git clone https://github.com/andphe/logster-parsers.git

$ cd logster-parsers

$ sudo cp NginxParser.py /usr/local/lib/python2.7/dist-packages/logster-0.0.1-py2.7.egg/logster/parsers/

• Configure Nginx to log the request time:

log_format request_time '$remote_addr - $remote_user [$time_local] '
                        '"$request" $status "$request_time" $bytes_sent '
                        '"$http_referer" "$http_user_agent"';

access_log /var/log/nginx/access.log request_time;
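As an illustration of what this log format gives a parser to work with, the sketch below extracts the status code and the request time from one access log line. It is not the NginxParser from the repository above; the regular expression, the function name, and the sample line are assumptions written against the `request_time` format shown on this slide.

```python
import re

# One named group per field of the log_format defined above.
LINE_RE = re.compile(
    r'^(?P<remote_addr>\S+) - (?P<remote_user>\S+) \[(?P<time_local>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) "(?P<request_time>[\d.]+)" '
    r'(?P<bytes_sent>\d+) "(?P<http_referer>[^"]*)" "(?P<http_user_agent>[^"]*)"$'
)


def parse_line(line):
    """Return (status, request_time_seconds), or None if the line doesn't match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    return int(match.group("status")), float(match.group("request_time"))


# Made-up sample line in the format above.
sample = ('203.0.113.7 - - [09/Feb/2017:10:15:32 +0000] '
          '"GET /v1/users HTTP/1.1" 200 "0.123" 512 "-" "curl/7.47.0"')
parsed = parse_line(sample)
```

Lines written in a different format (for example the default combined format, which has no quoted request time) simply return None and can be skipped.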

Page 18: Monitoring your API

GRAPH EXAMPLES: APP PERFORMANCE

/render?from=-15minutes&until=now&width=400&height=300&target=aliasByNode(<hostname>.logster.api.requests.*%2C%204)&lineMode=staircase&areaAlpha=0.8&title=App%20Performance%20(%23%20Requests%2C%20HTTP%20Codes)&areaMode=all
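The percent-encoded URL above is easier to read and adapt when built from its parts. A minimal sketch, assuming a hypothetical Graphite base URL and hostname; the helper name `render_url` is also an assumption. `urlencode` escapes a few more characters than the URL on the slide (parentheses and `*`), which decodes to the same query.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def render_url(base_url, targets, title, params):
    """Build a Graphite /render URL from targets, a title, and extra options."""
    query = [("target", target) for target in targets]
    query.append(("title", title))
    query.extend(params.items())
    return f"{base_url}/render?{urlencode(query)}"


url = render_url(
    "http://graphite.example.com",  # hypothetical Graphite server
    ["aliasByNode(web1.logster.api.requests.*, 4)"],  # web1: hypothetical host
    "App Performance (# Requests, HTTP Codes)",
    {
        "from": "-15minutes",
        "until": "now",
        "width": 400,
        "height": 300,
        "lineMode": "staircase",
        "areaMode": "all",
        "areaAlpha": 0.8,
    },
)

# Decoding the query again recovers the readable values.
decoded = parse_qs(urlsplit(url).query)
```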

Page 19: Monitoring your API

GRAPH EXAMPLES: APP PERFORMANCE

/render?from=-15minutes&until=now&width=400&height=300&target=aliasByNode(<hostname>.logster.api.latency.*%2C4)&areaAlpha=0.8&title=App%20Performance%20(Response%20Time)

Page 20: Monitoring your API

GRAPH EXAMPLES: CPU

/render?from=-1hours&until=now&width=400&height=300&target=exclude(aliasByNode(<hostname>.collectl.cputotals.*%2C%203)%2C%20'idle')&title=CPU%20(Utilization%20%25)&areaMode=all&areaAlpha=0.8

Page 21: Monitoring your API

GRAPH EXAMPLES: CPU

/render?from=-1hours&until=now&width=400&height=300&target=alias(<hostname>.collectl.ctxint.run%2C%20'Run%20queue')&title=CPU%20(Saturation%20Tasks)&areaAlpha=0.8&areaMode=all

Page 22: Monitoring your API

GRAPH EXAMPLES: MEMORY

/render?from=-1hours&until=now&width=400&height=300&target=aliasByNode(<hostname>.collectl.meminfo.used%2C%203)&title=Memory%20(Utilization%20KB)&vtitle=%20&areaMode=stacked&areaAlpha=0.8

Page 23: Monitoring your API

GRAPH EXAMPLES: MEMORY

/render?from=-1hours&until=now&width=400&height=300&target=alias(<hostname>.collectl.swapinfo.used%2C%20'swap%20used')&title=Memory%20(Saturation%20KB)&areaMode=all&areaAlpha=0.8

Page 24: Monitoring your API

GRAPH EXAMPLES: NETWORK

/render?from=-1hours&until=now&width=400&height=300&target=alias(scale(highestMax(<hostname>.collectl.netinfo.kb*.eth0%2C%201)%2C%200.00008)%2C%20'eth0')&title=Network%20(Utilization%20%25%2C%20100Mb)&areaMode=all&areaAlpha=0.8

Page 25: Monitoring your API

GRAPH EXAMPLES: NETWORK

/render?from=-1hours&until=now&width=400&height=300&target=alias(scale(<hostname>.collectl.netinfo.drpout.eth0%2C-1)%2C'eth0%20out')&target=alias(<hostname>.collectl.netinfo.drpin.eth0%2C'eth0%20in')&title=Network%20(%20Saturation%2C%20Drops)&areaMode=all&areaAlpha=0.8

Page 26: Monitoring your API

GRAPH EXAMPLES: NETWORK

/render?from=-1hours&until=now&width=400&height=300&target=alias(scale(<hostname>.collectl.netinfo.errout.eth0%2C-1)%2C'eth0%20out')&target=alias(<hostname>.collectl.netinfo.errin.eth0%2C'eth0%20in')&title=Network%20(%20Errors)&areaMode=all&areaAlpha=0.8

Page 27: Monitoring your API

GRAPH EXAMPLES: DISK

/render?from=-1hours&until=now&width=400&height=300&target=aliasByNode(<hostname>.collectl.diskinfo.util.sda%2C%204)&title=Disk%20(Utilization%20%25)&areaMode=all&areaAlpha=0.8

Page 28: Monitoring your API

GRAPH EXAMPLES: DISK

/render?from=-1hours&until=now&width=400&height=300&target=aliasByNode(<hostname>.collectl.diskinfo.quelen.sda%2C%204)&title=Disk%20(Saturation%2C%20Queue%20Len%20%2F%20sec)&areaMode=all&areaAlpha=0.8

Page 29: Monitoring your API

GRAPH EXAMPLES: DISK

/render?from=-1hours&until=now&width=400&height=300&target=aliasByNode(<hostname>.collectl.diskinfo.wait.sda%2C%204)&title=Disk%20(Saturation%2C%20Time%20wait%20%2F%20sec)&areaMode=all&areaAlpha=0.8

Page 30: Monitoring your API

GRAPH EXAMPLES: NGINX

/render?from=-1hours&until=now&width=400&height=300&target=aliasByNode(<hostname>.collectl.ngix.conn.active%2C%204)&title=Nginx%20(Utilization%2C%20Connections)&areaMode=all&areaAlpha=0.8

Page 31: Monitoring your API

GRAPH EXAMPLES: NGINX

/render?from=-1hours&until=now&width=400&height=300&target=alias(diffSeries(<hostname>.collectl.ngix.conn.accepted%2C%20<hostname>.collectl.ngix.conn.handled)%2C%20'dropped')&title=Nginx%20(Saturation%2C%20Connections)&areaMode=all&areaAlpha=0.8

Page 32: Monitoring your API

THANK YOU!

QUESTIONS & ANSWERS
