Tek12: Graphing real-time performance with Graphite
![Page 1: Tek12: Graphing real-time performance with Graphite](https://reader033.vdocuments.us/reader033/viewer/2022052620/557622b0d8b42a4e1c8b4db9/html5/thumbnails/1.jpg)
Graphing real-time performance with Graphite
Neal Anders - https://joind.in/650
![Page 2: Tek12: Graphing real-time performance with Graphite](https://reader033.vdocuments.us/reader033/viewer/2022052620/557622b0d8b42a4e1c8b4db9/html5/thumbnails/2.jpg)
whoami
Neal Anders
Senior Software Engineer at Infoblox
http://github.com/nanderoo
http://neal-anders.com
@nanderoo
![Page 3: Tek12: Graphing real-time performance with Graphite](https://reader033.vdocuments.us/reader033/viewer/2022052620/557622b0d8b42a4e1c8b4db9/html5/thumbnails/3.jpg)
shameless plug
Infoblox is working on some cool stuff...
- DNS, DHCP, IPAM, NCCM
- IPv6 Center of Excellence
- IF-MAP / DNSSEC
- Hiring (sales, services, support, engineering)
![Page 4: Tek12: Graphing real-time performance with Graphite](https://reader033.vdocuments.us/reader033/viewer/2022052620/557622b0d8b42a4e1c8b4db9/html5/thumbnails/4.jpg)
disclaimer
These thoughts and opinions are my own, and not those of my employer, bla bla bla...
![Page 5: Tek12: Graphing real-time performance with Graphite](https://reader033.vdocuments.us/reader033/viewer/2022052620/557622b0d8b42a4e1c8b4db9/html5/thumbnails/5.jpg)
whois $USER
Quick poll:
- Designers
- Developers
- Sys-Admins
- Networking
- Management
- Other...?
![Page 6: Tek12: Graphing real-time performance with Graphite](https://reader033.vdocuments.us/reader033/viewer/2022052620/557622b0d8b42a4e1c8b4db9/html5/thumbnails/6.jpg)
overview
What will we cover:
- What is Graphite?
- What data to capture
- Chart interpretation
![Page 7: Tek12: Graphing real-time performance with Graphite](https://reader033.vdocuments.us/reader033/viewer/2022052620/557622b0d8b42a4e1c8b4db9/html5/thumbnails/7.jpg)
but why
I worked at a place with major scale fail
- boxed vs service
- 100s of servers in multiple datacenters
- manual processes, shell scripts
- no insight into the app, infrastructure
- n-tier architecture
- on-call duties
- needed therapy, got it, didn't help
![Page 8: Tek12: Graphing real-time performance with Graphite](https://reader033.vdocuments.us/reader033/viewer/2022052620/557622b0d8b42a4e1c8b4db9/html5/thumbnails/8.jpg)
what is graphite
- Scalable real-time graphing system
- 3 main components:
  - Web front-end, graphite
  - Processing backend, carbon
  - Database, whisper
- Python based*

\* It's good to learn other languages
![Page 9: Tek12: Graphing real-time performance with Graphite](https://reader033.vdocuments.us/reader033/viewer/2022052620/557622b0d8b42a4e1c8b4db9/html5/thumbnails/9.jpg)
what is graphite
Setup / Documentation:
- Easy to set up
- Decent documentation
- API and CLI access
![Page 10: Tek12: Graphing real-time performance with Graphite](https://reader033.vdocuments.us/reader033/viewer/2022052620/557622b0d8b42a4e1c8b4db9/html5/thumbnails/10.jpg)
what is graphite
What does it capture?
- Numeric time-series data...

point: some.data.path
value: 3.2
timestamp: 1337690041 (epoch)
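
To make the path / value / timestamp triple concrete, here is a minimal sketch of sending one data point to carbon over its plaintext protocol (one `path value timestamp` line per point). The host and port defaults are assumptions about a local install (2003 is the usual plaintext listener port), and the function names are illustrative, not part of Graphite.

```python
import socket
import time


def format_metric(path, value, timestamp=None):
    """Build one plaintext-protocol line: path, value, epoch seconds, newline."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"


def send_metric(path, value, timestamp=None, host="localhost", port=2003):
    """Send a single data point to a carbon listener over TCP.

    host and port are assumptions; adjust them to your own carbon setup.
    """
    line = format_metric(path, value, timestamp)
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))
```

With the values from the slide, `format_metric("some.data.path", 3.2, 1337690041)` produces the line `some.data.path 3.2 1337690041`, which `send_metric` would write to the socket.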
![Page 11: Tek12: Graphing real-time performance with Graphite](https://reader033.vdocuments.us/reader033/viewer/2022052620/557622b0d8b42a4e1c8b4db9/html5/thumbnails/11.jpg)
what is graphite
How much data?
- configurable
- precision
- retention period
- aggregation
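
As a rough illustration of how precision and retention period translate into stored data, the sketch below parses a retention string in the `precision:retention` style used in carbon's storage schema configuration (for example `10s:6h,1m:7d`) and counts the points each archive holds. The 12 bytes per point figure is an assumption about whisper's on-disk format and ignores the file header; the helper names are hypothetical.

```python
# Seconds per unit suffix used in retention strings such as "10s:6h".
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800, "y": 31536000}


def to_seconds(text):
    """Convert a value such as '10s' or '7d' to seconds."""
    return int(text[:-1]) * UNIT_SECONDS[text[-1]]


def parse_retentions(spec):
    """Turn '10s:6h,1m:7d' into [(seconds_per_point, number_of_points), ...]."""
    archives = []
    for part in spec.split(","):
        precision, retention = part.strip().split(":")
        step = to_seconds(precision)
        archives.append((step, to_seconds(retention) // step))
    return archives


def estimated_bytes(archives, bytes_per_point=12):
    """Approximate storage per metric; bytes_per_point is an assumption."""
    return sum(points for _, points in archives) * bytes_per_point
```

For `10s:6h,1m:7d` this gives 2160 + 10080 = 12240 points, or about 143 KB per metric under the 12-byte assumption. Multiply by the number of metrics to see why finer precision costs disk space and io.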
![Page 12: Tek12: Graphing real-time performance with Graphite](https://reader033.vdocuments.us/reader033/viewer/2022052620/557622b0d8b42a4e1c8b4db9/html5/thumbnails/12.jpg)
what is graphite
![Page 13: Tek12: Graphing real-time performance with Graphite](https://reader033.vdocuments.us/reader033/viewer/2022052620/557622b0d8b42a4e1c8b4db9/html5/thumbnails/13.jpg)
what is graphite
Notes / gotchas:
- Scales horizontally
- Heavy on disk-io
- Fault tolerance
- Data loss
- Precision or Storage Space / io
![Page 14: Tek12: Graphing real-time performance with Graphite](https://reader033.vdocuments.us/reader033/viewer/2022052620/557622b0d8b42a4e1c8b4db9/html5/thumbnails/14.jpg)
what data to capture
...so what information should we capture?
...how detailed do we get?
...and does it have historical relevance?
...are just a few key metrics enough?
![Page 15: Tek12: Graphing real-time performance with Graphite](https://reader033.vdocuments.us/reader033/viewer/2022052620/557622b0d8b42a4e1c8b4db9/html5/thumbnails/15.jpg)
what data to capture
![Page 16: Tek12: Graphing real-time performance with Graphite](https://reader033.vdocuments.us/reader033/viewer/2022052620/557622b0d8b42a4e1c8b4db9/html5/thumbnails/16.jpg)
what data to capture
Thoughts on maximum vs. minimum:
- What information do you need to capture?
- Application Data (yes!)
- System Data: cpu, disk-io, mem usage
- Network: Connections? Latency? Packet loss?
- Fine-grained vs summary and aggregate?
![Page 17: Tek12: Graphing real-time performance with Graphite](https://reader033.vdocuments.us/reader033/viewer/2022052620/557622b0d8b42a4e1c8b4db9/html5/thumbnails/17.jpg)
what data to capture
In your app:
- function / method / calculation time
- template / content generation
- database query execution
- Internal and 3rd-party API calls
- queue sizes, processing times
- A/B testing?
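
One way to capture function or query timings without cluttering application code is a small decorator. This is only a sketch: the `timed` name, the metric path `app.db.query_time`, and the `emit` callback are all hypothetical, and `emit` stands in for whatever actually forwards the point to carbon.

```python
import functools
import time


def timed(metric_path, emit):
    """Decorator: time each call and pass (path, seconds, timestamp) to emit."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Record the duration even if the wrapped call raised.
                elapsed = time.perf_counter() - start
                emit(metric_path, elapsed, int(time.time()))
        return wrapper
    return decorator


# Example: collect points in a list instead of sending them anywhere.
collected = []


@timed("app.db.query_time", lambda *point: collected.append(point))
def run_query():
    time.sleep(0.01)  # stand-in for a real database query
    return "rows"
```

Calling `run_query()` returns its normal result and appends one `(path, seconds, timestamp)` tuple to `collected`. Keeping the emitter as a plain callback makes it easy to swap a list for a real sender in dev, qa, and production.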
![Page 18: Tek12: Graphing real-time performance with Graphite](https://reader033.vdocuments.us/reader033/viewer/2022052620/557622b0d8b42a4e1c8b4db9/html5/thumbnails/18.jpg)
what data to capture
From the systems:
- cpu
- disk usage
- io (disk, network interface)
- memory / paging / swap
- file handles
- log entries
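
A few of these system values can be gathered with the Python standard library alone. The sketch below reads load averages and disk usage and returns them as `(path, value, timestamp)` points. The `servers.<host>.…` naming scheme is an assumption for illustration, and `os.getloadavg()` is only available on Unix-like systems.

```python
import os
import shutil
import socket
import time


def collect_system_metrics(prefix=None, disk_path="/"):
    """Return a list of (metric_path, value, timestamp) tuples for this host."""
    if prefix is None:
        # Dots separate path components in Graphite, so replace them in the hostname.
        prefix = "servers." + socket.gethostname().replace(".", "_")
    now = int(time.time())
    load1, load5, load15 = os.getloadavg()  # Unix only
    disk = shutil.disk_usage(disk_path)
    values = {
        "cpu.load.1min": load1,
        "cpu.load.5min": load5,
        "cpu.load.15min": load15,
        "disk.used_bytes": disk.used,
        "disk.free_bytes": disk.free,
    }
    return [(f"{prefix}.{name}", value, now) for name, value in values.items()]
```

Run from cron or a small loop, each tuple can be formatted as one line and sent to carbon. Memory, swap, and file-handle counts would need platform-specific sources and are left out here.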
![Page 19: Tek12: Graphing real-time performance with Graphite](https://reader033.vdocuments.us/reader033/viewer/2022052620/557622b0d8b42a4e1c8b4db9/html5/thumbnails/19.jpg)
what data to capture
At the network level:
- connection count
- socket state
- qos levels
- firewall stats
- cdn / cache response
- 3rd party status
![Page 20: Tek12: Graphing real-time performance with Graphite](https://reader033.vdocuments.us/reader033/viewer/2022052620/557622b0d8b42a4e1c8b4db9/html5/thumbnails/20.jpg)
chart interpretation
...it's like reading tea leaves...
...domains of knowledge leave gaps...
...that's not my job...
...forest through the trees...
![Page 21: Tek12: Graphing real-time performance with Graphite](https://reader033.vdocuments.us/reader033/viewer/2022052620/557622b0d8b42a4e1c8b4db9/html5/thumbnails/21.jpg)
chart interpretation
So what are we looking for:
- normality *
- deviations
- jitters
- historical performance
- double rainbows

\* not present per Cal's keynote
![Page 22: Tek12: Graphing real-time performance with Graphite](https://reader033.vdocuments.us/reader033/viewer/2022052620/557622b0d8b42a4e1c8b4db9/html5/thumbnails/22.jpg)
chart interpretation
Because at 3am when you get paged...
Wouldn't it be great to correlate the site going down...
due to swapping...
because of high memory usage...
thanks to that code that got pushed...
that had that change to how you processed row results from a large database query.
![Page 23: Tek12: Graphing real-time performance with Graphite](https://reader033.vdocuments.us/reader033/viewer/2022052620/557622b0d8b42a4e1c8b4db9/html5/thumbnails/23.jpg)
chart interpretation
Or that change window that just happened...
Where the security folks made some config changes to one of the firewalls...
that is now blocking your outbound API calls...
just from some app servers in one of the datacenters...
![Page 24: Tek12: Graphing real-time performance with Graphite](https://reader033.vdocuments.us/reader033/viewer/2022052620/557622b0d8b42a4e1c8b4db9/html5/thumbnails/24.jpg)
chart interpretation
What about that new kernel that fixes a memory leak... Can you compare side by side, and with historical context, what that looks like? What about a physical machine vs a virtual one?
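
For side-by-side comparison with historical context, Graphite's render API accepts several `target` parameters, and its `timeShift` function draws a series offset by a given period. The sketch below only builds such a URL. The base URL and the metric name `servers.app01.memory.used` are made-up placeholders, and `render_url` is a hypothetical helper.

```python
from urllib.parse import urlencode


def render_url(base_url, targets, from_time="-24h", fmt="png"):
    """Build a Graphite render URL with one 'target' parameter per series."""
    query = [("target", target) for target in targets]
    query += [("from", from_time), ("format", fmt)]
    return f"{base_url.rstrip('/')}/render?{urlencode(query)}"


# Hypothetical metric: current memory usage next to the same series one week earlier.
metric = "servers.app01.memory.used"
url = render_url(
    "http://graphite.example.com",
    [metric, f'timeShift({metric},"1w")'],
    from_time="-7d",
)
```

The same approach works for comparing a physical machine with a virtual one: pass both metric paths as targets so they land on one chart.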
![Page 25: Tek12: Graphing real-time performance with Graphite](https://reader033.vdocuments.us/reader033/viewer/2022052620/557622b0d8b42a4e1c8b4db9/html5/thumbnails/25.jpg)
chart interpretation
Do we need to retune our load-balancers, app servers, or database replication?
Does higher site traffic over the past few weeks show signs of strain?
Did that cache layer we added help any?
Is historical data choking once-fast pages?
![Page 26: Tek12: Graphing real-time performance with Graphite](https://reader033.vdocuments.us/reader033/viewer/2022052620/557622b0d8b42a4e1c8b4db9/html5/thumbnails/26.jpg)
demo
wordpress example
![Page 27: Tek12: Graphing real-time performance with Graphite](https://reader033.vdocuments.us/reader033/viewer/2022052620/557622b0d8b42a4e1c8b4db9/html5/thumbnails/27.jpg)
some final thoughts
- come full circle, stats back in
- this is one solution, there are others (statsd)
- part of a larger tool bag
- implement before big changes
- establish a reference / baseline
- suitable for dev, qa, and production
- make implementing data capture easy
![Page 28: Tek12: Graphing real-time performance with Graphite](https://reader033.vdocuments.us/reader033/viewer/2022052620/557622b0d8b42a4e1c8b4db9/html5/thumbnails/28.jpg)
resources
http://graphite.wikidot.com
http://wordpress.org
http://memgenerator.net
http://www.flickr.com/groups/webopsviz/
..more resources available online..
![Page 30: Tek12: Graphing real-time performance with Graphite](https://reader033.vdocuments.us/reader033/viewer/2022052620/557622b0d8b42a4e1c8b4db9/html5/thumbnails/30.jpg)
fin
Thank you.
![Page 31: Tek12: Graphing real-time performance with Graphite](https://reader033.vdocuments.us/reader033/viewer/2022052620/557622b0d8b42a4e1c8b4db9/html5/thumbnails/31.jpg)
Bonus
2001:1868:ad01:1::33