Monitoring of SmartNews
Post on 16-Apr-2017
Self Introduction
• Nobutoshi Ogata
• Manager, Site Reliability Engineering
• @nobu666
• ❤ Whiskey, Cat, Heavy Metal
• Contract development (10y) ➡ GREE infrastructure division (3y) ➡ A startup (1y) ➡ SmartNews (2015/05-)
After Datadog - Phase1
• OK, we can manage monitoring centrally
• But...?
• We respect our engineers' freedom to develop as they like!
• Problem: some monitoring settings get left out
Phase2
• Introduced Interferon
• Datadog DSL
• Now we can monitor all resources automatically
• But...?
• It is not actively maintained!
• Can't freely mute monitors from the Web UI
• Lack of flexibility
Phase3
• Integrated itamae
• Our engineers were used to writing Chef
• Easy to override default settings
• It's asynchronous, so we can freely mute from the Web UI
• Integrated dogaws by @takus
• Yet another Datadog CloudWatch Integration
• We use it in combination with itamae
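The "easy to override default settings" idea can be illustrated with a minimal Python sketch. This is not the itamae or dogaws configuration from the talk. Every field name and value below (`DEFAULT_MONITOR`, `build_monitor`, the thresholds) is hypothetical and only shows the pattern of shared defaults plus per-monitor overrides.

```python
# Hypothetical default monitor settings; none of these names come from the talk.
DEFAULT_MONITOR = {
    "type": "metric alert",
    "notify_no_data": False,
    "thresholds": {"warning": 80, "critical": 90},
}


def build_monitor(name, query, overrides=None):
    """Return the shared defaults with per-monitor overrides applied on top."""
    # Copy the nested thresholds dict so the shared defaults are never mutated.
    monitor = {**DEFAULT_MONITOR, "thresholds": dict(DEFAULT_MONITOR["thresholds"])}
    monitor.update({"name": name, "query": query})
    for key, value in (overrides or {}).items():
        if key == "thresholds":
            # Merge thresholds key by key so a partial override keeps the rest.
            monitor["thresholds"].update(value)
        else:
            monitor[key] = value
    return monitor
```

A team that only wants a looser critical threshold would pass `overrides={"thresholds": {"critical": 95}}` and inherit everything else from the defaults.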
Datadog tips
• Easy anomaly detection
• Until quite recently, we couldn't compare against data more than 24 hours old
• We requested the ability to compare over longer periods. Thanks to Datadog for implementing it!
• This is a closed feature. If you want to use it, ask Datadog support
For example
• Compare Kinesis records count EWMA
pct_change(median(last_1h),1w_ago):ewma_20(avg:aws.kinesis.incoming_records{env:production,cost:smartnews} by {name}) > 50
• Compare application warn log
change(median(last_1h),1w_ago):sum:app.log.warn{env:production} by {autoscaling_group} > 25
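To make the arithmetic behind these week-over-week comparisons concrete, here is a rough Python sketch. It is not Datadog's implementation. It assumes a span-based smoothing factor `alpha = 2 / (span + 1)` for the EWMA, and it smooths each window separately, whereas the real monitor works on the continuous series. The function names are chosen for illustration.

```python
from statistics import median


def ewma(values, span):
    """Exponentially weighted moving average.

    Assumes alpha = 2 / (span + 1); Datadog's exact definition may differ.
    """
    alpha = 2.0 / (span + 1)
    smoothed = []
    for v in values:
        if not smoothed:
            smoothed.append(float(v))
        else:
            smoothed.append(alpha * v + (1 - alpha) * smoothed[-1])
    return smoothed


def pct_change(current_window, past_window):
    """Percent change of the median of current_window relative to past_window."""
    past = median(past_window)
    if past == 0:
        raise ValueError("past median is zero; percent change is undefined")
    return (median(current_window) - past) / past * 100.0


def should_alert(current_window, past_window, threshold_pct, span=20):
    """True when the smoothed last-hour median exceeds last week's by threshold_pct."""
    return pct_change(ewma(current_window, span), ewma(past_window, span)) > threshold_pct
```

For instance, `should_alert(last_hour, same_hour_last_week, 50)` mirrors the shape of the Kinesis monitor: smooth both windows, take medians, and alert when the increase is above 50 percent.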
We're hiring!
• Only two people on the Site Reliability Engineering team!
• SmartNews is hiring Site Reliability Engineers!
• http://about.smartnews.com/en/careers/