Scaling Humans
Ops teams and incident management
dotScale, Paris 2015
David Mytton, CEO, Server Density
How much are you spending?
Respond
• First responder
1. Load incident response checklist
2. Log into Ops War Room
3. Log incident in JIRA
4. Begin investigation
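The first-responder steps above could be sketched as an ordered checklist that timestamps each completed step. This is only a minimal illustration, not tooling described in the talk; the class name `ResponderChecklist` and its methods are hypothetical, and only the step names come from the slide.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Step names are taken from the slide; everything else is a hypothetical sketch.
FIRST_RESPONDER_STEPS = [
    "Load incident response checklist",
    "Log into Ops War Room",
    "Log incident in JIRA",
    "Begin investigation",
]


@dataclass
class ResponderChecklist:
    steps: list = field(default_factory=lambda: list(FIRST_RESPONDER_STEPS))
    completed: list = field(default_factory=list)  # (step, UTC timestamp) pairs

    def next_step(self):
        """Return the next outstanding step, or None when all are done."""
        if len(self.completed) < len(self.steps):
            return self.steps[len(self.completed)]
        return None

    def complete(self, step):
        """Mark a step done; steps must be completed in order."""
        expected = self.next_step()
        if step != expected:
            raise ValueError(f"expected {expected!r}, got {step!r}")
        self.completed.append((step, datetime.now(timezone.utc)))
```

Enforcing the order is a design choice of this sketch: it keeps a responder from starting the investigation before the incident has been logged.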
Respond
• Key response principles
• Log everything
• Frequent public updates
• Gather the team
• Escalate!
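Two of the principles above, "Log everything" and "Frequent public updates", could be combined in a small timestamped incident log that flags when a public update is overdue. This is a rough sketch under stated assumptions: `IncidentLog` is a hypothetical name, and the 30-minute `UPDATE_INTERVAL` is an invented placeholder, since the slides say "frequent" without giving a number.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical cadence; the talk says "frequent" but gives no number.
UPDATE_INTERVAL = timedelta(minutes=30)


class IncidentLog:
    def __init__(self):
        self.entries = []  # (timestamp, message, is_public) tuples

    def log(self, message, public=False, now=None):
        """Record an action or observation with a UTC timestamp."""
        now = now or datetime.now(timezone.utc)
        self.entries.append((now, message, public))

    def public_update_due(self, now=None):
        """True if no public update exists or the last one is too old."""
        now = now or datetime.now(timezone.utc)
        public_times = [t for t, _, is_public in self.entries if is_public]
        if not public_times:
            return True
        return now - max(public_times) >= UPDATE_INTERVAL
```

The same entries, read in order, would also give the timeline needed to "tell the story" in the postmortem.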
Postmortem
• Within a few days
• Tell the story
• Appropriate technical detail
• What failed, why?
• How it’s going to be fixed