Graphite at CityGrid - LA DevOps April 2014
DESCRIPTION
High-level description of CityGrid's use of Graphite for collecting/displaying metrics, along with some interesting use-cases.
TRANSCRIPT
![Page 1: Graphite at CityGrid - LA DevOps April 2014](https://reader034.vdocuments.us/reader034/viewer/2022051314/554f92ecb4c9052a518b54a3/html5/thumbnails/1.jpg)
Graphite at CityGrid
if you can’t measure it, you can’t fix it
Wil Heitritter
Director, Tech Ops
Los Angeles DevOps
2014/04/28
![Page 2: Graphite at CityGrid - LA DevOps April 2014](https://reader034.vdocuments.us/reader034/viewer/2022051314/554f92ecb4c9052a518b54a3/html5/thumbnails/2.jpg)
Magnum esse solem philosophus probabit, quantus sit mathematicus
("The philosopher will prove that the sun is large; the mathematician, how large.")
-Seneca
![Page 3: Graphite at CityGrid - LA DevOps April 2014](https://reader034.vdocuments.us/reader034/viewer/2022051314/554f92ecb4c9052a518b54a3/html5/thumbnails/3.jpg)
Objectives
- Introduce Graphite to new users
- Show what we like, what we hate
- Present some interesting use-cases
- Generate discussion
![Page 4: Graphite at CityGrid - LA DevOps April 2014](https://reader034.vdocuments.us/reader034/viewer/2022051314/554f92ecb4c9052a518b54a3/html5/thumbnails/4.jpg)
Before Graphite
Ganglia
• Predictable interface
• Text “metrics” to store versions
• Slow
• Couldn’t pick and choose metrics to see
![Page 5: Graphite at CityGrid - LA DevOps April 2014](https://reader034.vdocuments.us/reader034/viewer/2022051314/554f92ecb4c9052a518b54a3/html5/thumbnails/5.jpg)
Why Ganglia sucked
- Clusters had to be pre-configured
- Multicast vs. Unicast
- Data Retention
- Static Web Interface (can’t pick and choose)
- Static Host List
![Page 6: Graphite at CityGrid - LA DevOps April 2014](https://reader034.vdocuments.us/reader034/viewer/2022051314/554f92ecb4c9052a518b54a3/html5/thumbnails/6.jpg)
What did we think we wanted?
Ease of adding metrics
Ease of sending metrics
Powerful metric display
Retain ganglia-style cluster dashboards
Long-term configurable metric retention
![Page 7: Graphite at CityGrid - LA DevOps April 2014](https://reader034.vdocuments.us/reader034/viewer/2022051314/554f92ecb4c9052a518b54a3/html5/thumbnails/7.jpg)
Graphite!
![Page 8: Graphite at CityGrid - LA DevOps April 2014](https://reader034.vdocuments.us/reader034/viewer/2022051314/554f92ecb4c9052a518b54a3/html5/thumbnails/8.jpg)
What is Graphite?
a highly scalable real-time graphing system
which collects numeric time-series data
is managed by carbon
and stored as whisper files
and visualized through web interfaces
or queried via the API
http://graphite.wikidot.com/
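As a rough illustration of "queried via the API", the sketch below builds a URL for graphite-web's render endpoint. The host `graphite.example.com` is a placeholder, and the `sumSeries` target with a wildcard path is a hypothetical query, not one shown in the talk.

```python
from urllib.parse import urlencode

def render_url(base, target, start="-1h", fmt="json"):
    """Build a graphite-web /render URL for one target expression."""
    query = urlencode({"target": target, "from": start, "format": fmt})
    return f"{base.rstrip('/')}/render?{query}"

# Hypothetical host and target, for illustration only.
url = render_url(
    "http://graphite.example.com",
    "sumSeries(servers.rvw.*.LW_api_reviews_QPS)",
)
print(url)
```

Fetching that URL returns the datapoints as JSON; changing `format` to `png` would return a rendered graph instead.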
![Page 9: Graphite at CityGrid - LA DevOps April 2014](https://reader034.vdocuments.us/reader034/viewer/2022051314/554f92ecb4c9052a518b54a3/html5/thumbnails/9.jpg)
Graphite: what we like
Sending metrics is simple
Retrieving metrics is simple
Dashboard creation and sharing… is simple
Many functions()
120MM+ metric values received daily
Backfilling past metrics is simple
Expandable - different frontends
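Backfilling is simple largely because every datapoint carries its own timestamp, so older values can be sent the same way as current ones, provided they fall within the metric's configured retention. A minimal sketch, assuming the plaintext `name value timestamp` line format; the values and start time are made up for illustration:

```python
def backfill_lines(metric, values, start_ts, step=60):
    """Yield plaintext-protocol lines for evenly spaced historical points."""
    for i, value in enumerate(values):
        yield f"{metric} {value} {start_ts + i * step}\n"

# Made-up data: three one-minute points starting at a past epoch time.
lines = list(backfill_lines("business.test.metric1", [3, 5, 4], 1398643200))
print("".join(lines), end="")
```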
![Page 10: Graphite at CityGrid - LA DevOps April 2014](https://reader034.vdocuments.us/reader034/viewer/2022051314/554f92ecb4c9052a518b54a3/html5/thumbnails/10.jpg)
Graphite: what sucks
Dashboard ownership/promotion
No ganglia-like standard dashboard
Data retention… is NOT as simple as we thought
![Page 11: Graphite at CityGrid - LA DevOps April 2014](https://reader034.vdocuments.us/reader034/viewer/2022051314/554f92ecb4c9052a518b54a3/html5/thumbnails/11.jpg)
CityGrid’s Graphite
Implementation
![Page 12: Graphite at CityGrid - LA DevOps April 2014](https://reader034.vdocuments.us/reader034/viewer/2022051314/554f92ecb4c9052a518b54a3/html5/thumbnails/12.jpg)
Metric Naming
Business Metrics
- These metrics are not tied to any particular server
- Format: business.${hierarchical}.${path}.${here}.${metric}
- Example: business.ec2.testaccount.us-east-1a.OnDemand.running.m2.4xlarge
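The naming scheme above could be wrapped in a small helper so that scripts build business paths consistently. This is only a sketch; the function name `business_metric` and its validation rules are assumptions, not something described in the talk.

```python
def business_metric(*parts):
    """Join hierarchy segments into a dotted business.* metric path."""
    if not parts:
        raise ValueError("at least one path segment is required")
    for part in parts:
        # Whitespace would break the space-separated plaintext protocol.
        if not part or any(ch.isspace() for ch in part):
            raise ValueError(f"invalid path segment: {part!r}")
    return ".".join(("business",) + parts)

name = business_metric(
    "ec2", "testaccount", "us-east-1a", "OnDemand", "running", "m2.4xlarge"
)
print(name)  # business.ec2.testaccount.us-east-1a.OnDemand.running.m2.4xlarge
```

Note that Graphite splits paths on dots, so the instance type `m2.4xlarge` occupies two levels of the hierarchy (`m2` and `4xlarge`).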
![Page 13: Graphite at CityGrid - LA DevOps April 2014](https://reader034.vdocuments.us/reader034/viewer/2022051314/554f92ecb4c9052a518b54a3/html5/thumbnails/13.jpg)
Metric Naming
Server Metrics
- These metrics are specific to a particular server (just like ganglia)
- Format: servers.${class}.${f_q_d_n}.${metric}
- Example: servers.rvw.aws1prdrvw1_subdom_cityg_com.LW_api_reviews_QPS
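Judging from the example, `${f_q_d_n}` appears to be the host's FQDN with dots replaced by underscores, so the host name stays a single level in the tree. A sketch under that assumption (the helper name `server_metric` and the dotted host name are hypothetical):

```python
def server_metric(server_class, fqdn, metric):
    """Build a servers.* path, flattening the FQDN so its dots do not add levels."""
    flat_host = fqdn.replace(".", "_")
    return f"servers.{server_class}.{flat_host}.{metric}"

print(server_metric("rvw", "aws1prdrvw1.subdom.cityg.com", "LW_api_reviews_QPS"))
# servers.rvw.aws1prdrvw1_subdom_cityg_com.LW_api_reviews_QPS
```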
![Page 14: Graphite at CityGrid - LA DevOps April 2014](https://reader034.vdocuments.us/reader034/viewer/2022051314/554f92ecb4c9052a518b54a3/html5/thumbnails/14.jpg)
Sending metrics
Sending directly from metric scripts
- /etc/graphite.conf
- May need to spread out sending if in volume
Collecting from gmond every minute
- Metrics are spread out to prevent spiking
- False data (gmond acts as a cache)
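One possible way to spread out sending (not necessarily how CityGrid did it) is to give each host a stable offset within the minute, derived from its name, and have it wait that long before sending. The host names below are placeholders.

```python
import zlib

def send_offset(hostname, interval=60):
    """Map a hostname to a stable second within the sending interval."""
    return zlib.crc32(hostname.encode("utf-8")) % interval

# Hypothetical host names; each host would sleep for its offset before sending.
for host in ("web1.example.com", "web2.example.com", "db1.example.com"):
    print(host, send_offset(host))
```

Because the offset depends only on the host name, each host sends at the same second every minute, while the fleet as a whole is spread across the interval.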
![Page 15: Graphite at CityGrid - LA DevOps April 2014](https://reader034.vdocuments.us/reader034/viewer/2022051314/554f92ecb4c9052a518b54a3/html5/thumbnails/15.jpg)
Impact of staggered sending
![Page 16: Graphite at CityGrid - LA DevOps April 2014](https://reader034.vdocuments.us/reader034/viewer/2022051314/554f92ecb4c9052a518b54a3/html5/thumbnails/16.jpg)
Sending is simply...
echo $metric $value $timestamp | nc $relay $port
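The same one-liner can be written in a script without `nc`. The sketch below assumes the plaintext protocol shown above (one `metric value timestamp` line per datapoint over TCP); the function names are illustrative.

```python
import socket
import time

def format_line(metric, value, timestamp=None):
    """Return one newline-terminated plaintext-protocol line."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{metric} {value} {int(timestamp)}\n"

def send_metric(relay, port, metric, value, timestamp=None):
    """Open a TCP connection to the relay, send one line, and close."""
    line = format_line(metric, value, timestamp)
    with socket.create_connection((relay, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))

# Example (requires a listening carbon relay or cache):
# send_metric("localhost", 2003, "business.test.metric1", 1)
```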
![Page 17: Graphite at CityGrid - LA DevOps April 2014](https://reader034.vdocuments.us/reader034/viewer/2022051314/554f92ecb4c9052a518b54a3/html5/thumbnails/17.jpg)
Performance
carbon-cache/carbon-relay
SSD
replication within minutes
![Page 18: Graphite at CityGrid - LA DevOps April 2014](https://reader034.vdocuments.us/reader034/viewer/2022051314/554f92ecb4c9052a518b54a3/html5/thumbnails/18.jpg)
Maintenance
Changing retention
- whisper-auto-resize.py
Filling holes
- whisper-fill $source $destination
Backups
- Dashboards
- Metrics
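Changing retention is awkward partly because each whisper file is preallocated for its full retention, so a new policy means resizing every existing file. The sketch below estimates points and file size for a retention string, assuming whisper's fixed layout of a 16-byte header, 12 bytes per archive, and 12 bytes per point. The policy `60s:30d,5m:1y` is a hypothetical example, and only the unit-suffixed `precision:duration` form is handled.

```python
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800, "y": 31536000}

def parse_archive(spec):
    """Parse 'precision:duration' (e.g. '60s:30d') into (seconds_per_point, points)."""
    precision, duration = spec.strip().split(":")
    step = int(precision[:-1]) * UNIT_SECONDS[precision[-1]]
    span = int(duration[:-1]) * UNIT_SECONDS[duration[-1]]
    return step, span // step

def whisper_file_bytes(retentions):
    """Estimate file size: header + per-archive info + fixed-size points."""
    archives = [parse_archive(s) for s in retentions.split(",")]
    return 16 + 12 * len(archives) + 12 * sum(points for _, points in archives)

# Hypothetical retention policy, for illustration only.
policy = "60s:30d,5m:1y"
print([parse_archive(s) for s in policy.split(",")])
print(whisper_file_bytes(policy))
```

Multiplying the per-file estimate by the number of metrics gives a rough idea of the disk space a retention change would require.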
![Page 19: Graphite at CityGrid - LA DevOps April 2014](https://reader034.vdocuments.us/reader034/viewer/2022051314/554f92ecb4c9052a518b54a3/html5/thumbnails/19.jpg)
Graphite Use-Cases
![Page 20: Graphite at CityGrid - LA DevOps April 2014](https://reader034.vdocuments.us/reader034/viewer/2022051314/554f92ecb4c9052a518b54a3/html5/thumbnails/20.jpg)
Single Metric
![Page 21: Graphite at CityGrid - LA DevOps April 2014](https://reader034.vdocuments.us/reader034/viewer/2022051314/554f92ecb4c9052a518b54a3/html5/thumbnails/21.jpg)
Combined Metrics
![Page 22: Graphite at CityGrid - LA DevOps April 2014](https://reader034.vdocuments.us/reader034/viewer/2022051314/554f92ecb4c9052a518b54a3/html5/thumbnails/22.jpg)
Key Metrics Dashboard
Examples of Key Metrics
- QPS
- Processing Time (Max/Mean/Distribution)
- Metrics about sub-requests
- Network usage
- CPU/load
![Page 23: Graphite at CityGrid - LA DevOps April 2014](https://reader034.vdocuments.us/reader034/viewer/2022051314/554f92ecb4c9052a518b54a3/html5/thumbnails/23.jpg)
Key Metrics Dashboard
![Page 24: Graphite at CityGrid - LA DevOps April 2014](https://reader034.vdocuments.us/reader034/viewer/2022051314/554f92ecb4c9052a518b54a3/html5/thumbnails/24.jpg)
![Page 25: Graphite at CityGrid - LA DevOps April 2014](https://reader034.vdocuments.us/reader034/viewer/2022051314/554f92ecb4c9052a518b54a3/html5/thumbnails/25.jpg)
Nagios Integration
check_graphite_target!highestMax(servers.mai.@[email protected]_map_return_code_5*_ratio, 1)!5!10
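The check command itself is not shown in the slides. As a sketch of how such a check might work, the function below takes data shaped like a `/render?format=json` response and maps the highest value seen to a Nagios exit code. Treating the trailing `5` and `10` as warning and critical thresholds is an assumption, and the sample data is made up.

```python
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

def check_series(series, warn, crit):
    """Return a (Nagios exit code, message) pair from render-API JSON data.

    `series` is a list of {"target": ..., "datapoints": [[value, ts], ...]}.
    """
    worst_name, worst_value = None, None
    for s in series:
        # Graphite reports missing points as null (None); skip them.
        values = [v for v, _ts in s["datapoints"] if v is not None]
        if values and (worst_value is None or max(values) > worst_value):
            worst_name, worst_value = s["target"], max(values)
    if worst_value is None:
        return UNKNOWN, "UNKNOWN: no datapoints returned"
    if worst_value >= crit:
        return CRITICAL, f"CRITICAL: {worst_name} = {worst_value}"
    if worst_value >= warn:
        return WARNING, f"WARNING: {worst_name} = {worst_value}"
    return OK, f"OK: {worst_name} = {worst_value}"

# Made-up sample shaped like a /render?format=json response.
sample = [
    {"target": "host_a.ratio", "datapoints": [[1.0, 1398643200], [None, 1398643260]]},
    {"target": "host_b.ratio", "datapoints": [[7.5, 1398643200], [2.0, 1398643260]]},
]
print(check_series(sample, warn=5, crit=10))
```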
![Page 26: Graphite at CityGrid - LA DevOps April 2014](https://reader034.vdocuments.us/reader034/viewer/2022051314/554f92ecb4c9052a518b54a3/html5/thumbnails/26.jpg)
How about Pie Charts?
![Page 27: Graphite at CityGrid - LA DevOps April 2014](https://reader034.vdocuments.us/reader034/viewer/2022051314/554f92ecb4c9052a518b54a3/html5/thumbnails/27.jpg)
![Page 29: Graphite at CityGrid - LA DevOps April 2014](https://reader034.vdocuments.us/reader034/viewer/2022051314/554f92ecb4c9052a518b54a3/html5/thumbnails/29.jpg)
What NOT to do
![Page 30: Graphite at CityGrid - LA DevOps April 2014](https://reader034.vdocuments.us/reader034/viewer/2022051314/554f92ecb4c9052a518b54a3/html5/thumbnails/30.jpg)
Trying it out for yourself
![Page 31: Graphite at CityGrid - LA DevOps April 2014](https://reader034.vdocuments.us/reader034/viewer/2022051314/554f92ecb4c9052a518b54a3/html5/thumbnails/31.jpg)
Quick Setup
Install & Start
# pip install https://github.com/graphite-project/ceres/tarball/master
# pip install whisper
# pip install carbon
# pip install graphite-web
start it up...
send it a metric:
echo business.test.metric1 1 `date "+%s"` | nc localhost 2003
OK, it’s almost that easy...
![Page 32: Graphite at CityGrid - LA DevOps April 2014](https://reader034.vdocuments.us/reader034/viewer/2022051314/554f92ecb4c9052a518b54a3/html5/thumbnails/32.jpg)
Discussion
![Page 33: Graphite at CityGrid - LA DevOps April 2014](https://reader034.vdocuments.us/reader034/viewer/2022051314/554f92ecb4c9052a518b54a3/html5/thumbnails/33.jpg)