ignite (10m) how to not burn out your monitoring team
TRANSCRIPT
How to not burn out your production team
Gil Zellner (CloudifyDev at Gigaspaces)
Twitter: @Heathenaspargus
The cost of hiring a new employee is 1.5-3x their monthly salary
https://www.ergoflex.co.uk/blog/category/sleep-research/sleeponomics-could-sleep-deprivation-be-the-real-reason-politicians-make-bad-decisions
Easy (days)
- no changes to infrastructure
- just policy
Intermediate (months)
- Small changes to apps
- logging
- light automation
Hard (years)
- Design for better operability
- long term
Mandatory half day off after a night-time production issue
Allocate weekly time to resolve or automate issues that kept us up at night
Knowledge Matrix
Deploy System Mobile Link Backend
Gil V V
Karen V V
Ari V V
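One way to use a knowledge matrix like this is to look for systems that too few people can handle on their own. A minimal Python sketch, assuming the matrix is kept as a mapping from person to the systems they know. The people, the system names, and the `min_people` threshold are made up for illustration and are not taken from the slide:

```python
from collections import Counter


def under_covered(matrix, systems, min_people=2):
    """Return the systems known by fewer than min_people people, in the given order."""
    counts = Counter(s for known in matrix.values() for s in known)
    return [s for s in systems if counts[s] < min_people]


# Hypothetical data for illustration only; not the real matrix from the talk.
systems = ["deploy", "mobile", "backend"]
matrix = {
    "alice": {"deploy", "backend"},
    "bob": {"deploy", "mobile"},
    "carol": {"backend"},
}

print(under_covered(matrix, systems))  # prints ['mobile']
```

Systems that show up in the result are candidates for pairing or cross-training before the one person who knows them is the only one who can be woken up.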
Solution: alert only on things that meet the following criteria:
1) Alert on symptoms, not suspected "causes"
2) Actionable
3) Business breaking
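The three criteria above can be sketched as a paging filter. This is a rough Python illustration, assuming an alert has to meet all three criteria before it pages anyone. The `Alert` class, its field names, and the sample alerts are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class Alert:
    name: str
    is_symptom: bool            # a user-visible symptom, not a suspected cause
    is_actionable: bool         # someone can do something about it right now
    is_business_breaking: bool  # the business is hurt if nobody responds


def should_page(alert: Alert) -> bool:
    """Page a human only when all three criteria hold (an assumption of this sketch)."""
    return alert.is_symptom and alert.is_actionable and alert.is_business_breaking


# Hypothetical alerts for illustration only.
alerts = [
    Alert("checkout error rate above 5%", True, True, True),
    Alert("CPU at 90% on one web node", False, True, False),
    Alert("nightly report finished late", True, False, False),
]

for a in alerts:
    print(a.name, "->", "page" if should_page(a) else "log only")
```

Only the first alert pages in this sketch; the other two would be logged and reviewed during working hours.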
Facebook Auto Remediation
https://www.facebook.com/notes/facebook-engineering/making-facebook-self-healing/10150275248698920
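The general idea of auto remediation can be sketched as a lookup from alert name to a scripted fix, with escalation to a human when no fix is known or the fix fails. This is a generic Python illustration, not a description of the Facebook system linked above. The alert names, the `restart_service` handler, and the `remediate` function are all hypothetical:

```python
def restart_service(alert):
    """Hypothetical fix: pretend to restart the affected service."""
    print(f"restarting service for: {alert}")
    return True  # report success


# Hypothetical mapping from alert name to an automated fix.
REMEDIATIONS = {
    "worker process hung": restart_service,
}


def remediate(alert, remediations=REMEDIATIONS):
    """Try the scripted fix for an alert; return 'fixed' or 'escalate'."""
    fix = remediations.get(alert)
    if fix is None:
        return "escalate"  # no known fix, a human has to look
    try:
        return "fixed" if fix(alert) else "escalate"
    except Exception:
        return "escalate"  # the fix itself failed


print(remediate("worker process hung"))
print(remediate("database unreachable"))
```

Keeping the escalation path explicit means automation only absorbs the issues it can actually resolve, and everything else still reaches a person.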