devops incident handling - making friends not enemies

How to win friends when handling outages and downtime. David Mytton, London DevOps, Oct 2014. blog.serverdensity.com

Upload: server-density

Post on 27-Jun-2015

Category:

Technology


DESCRIPTION

David Mytton, CEO of Server Density, presented this talk at the DevOps Meetup in London. It walks through how to handle DevOps incidents, outages and downtime and, more specifically, how to make friends, not enemies, in the process.

TRANSCRIPT

Page 1: DevOps Incident Handling - Making friends not enemies

How to win friends when handling outages and downtime

David Mytton

London DevOps - Oct 2014

blog.serverdensity.com

Page 2: DevOps Incident Handling - Making friends not enemies

David Mytton

Page 3: DevOps Incident Handling - Making friends not enemies

Server monitoring, cloud management, dashboards and alerting

serverdensity.com

Page 4: DevOps Incident Handling - Making friends not enemies

Slides: twitter.com/davidmytton

Page 5: DevOps Incident Handling - Making friends not enemies

Let’s talk about downtime

Page 6: DevOps Incident Handling - Making friends not enemies

2013 Spend: ~$5bn

Page 7: DevOps Incident Handling - Making friends not enemies

2013 Spend: ~$6bn

Page 8: DevOps Incident Handling - Making friends not enemies

2013 Spend: ~$4bn

Page 9: DevOps Incident Handling - Making friends not enemies

You will have downtime

How much do you spend?

Page 11: DevOps Incident Handling - Making friends not enemies

Preparation

Page 14: DevOps Incident Handling - Making friends not enemies

Preparation - On Call

● Primary?

● Secondary?

● Reachability - Tube, 3G/4G (EDGE?!), Do Not Disturb mode, at the gym, family emergency, system updates
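
The primary/secondary question and the reachability problems on this slide can be sketched as a small escalation routine. This is an illustration only: `Responder`, `escalate` and the `page` callback are hypothetical names, not something from the talk or from any real paging product.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Responder:
    name: str
    role: str  # "primary" or "secondary" (hypothetical labels)


def escalate(
    chain: Sequence[Responder],
    page: Callable[[Responder], bool],
) -> Optional[Responder]:
    """Page each responder in order and return the first who acknowledges.

    `page` is a hypothetical callback that returns True when the responder
    acknowledged within whatever timeout the caller enforces.
    """
    for responder in chain:
        if page(responder):
            return responder
    return None  # nobody reachable: the caller should alert the wider team


# Example: the primary is on the Tube with no signal, so the secondary is paged.
chain = [Responder("alice", "primary"), Responder("bob", "secondary")]
unreachable = {"alice"}
acknowledged_by = escalate(chain, lambda r: r.name not in unreachable)
print(acknowledged_by.name if acknowledged_by else "no one acknowledged")
```

Returning `None` rather than raising keeps the "nobody answered" case explicit, so the caller has to decide what happens next.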

Page 18: DevOps Incident Handling - Making friends not enemies

Preparation - On Call

● Off call

● Rotations

● Illness

● Work the next day?
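
Rotations and illness cover can be sketched as a simple weekly rota generator. This is a minimal sketch under stated assumptions: the one-week shift length, the `weekly_rotation` function and the engineer names are all made up for illustration.

```python
from datetime import date, timedelta
from itertools import cycle
from typing import AbstractSet, List, Sequence, Tuple


def weekly_rotation(
    engineers: Sequence[str],
    start: date,
    weeks: int,
    unavailable: AbstractSet[str] = frozenset(),
) -> List[Tuple[date, str]]:
    """Assign one on-call engineer per week, skipping anyone unavailable."""
    pool = [e for e in engineers if e not in unavailable]
    if not pool:
        raise ValueError("no one is available to be on call")
    schedule = []
    # cycle() repeats the pool so the rota wraps around when weeks > len(pool).
    for week, engineer in zip(range(weeks), cycle(pool)):
        schedule.append((start + timedelta(weeks=week), engineer))
    return schedule


# Example: bob is ill, so the rota alternates between the other two.
rota = weekly_rotation(
    ["alice", "bob", "carol"], date(2014, 10, 6), 4, unavailable={"bob"}
)
for week_start, engineer in rota:
    print(week_start.isoformat(), engineer)
```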

Page 23: DevOps Incident Handling - Making friends not enemies

Preparation - Documentation

● Searchable

● Easy to edit

● Independent of your infrastructure

● Up to date

Page 28: DevOps Incident Handling - Making friends not enemies

Preparation - Key Info

● Team contacts

● Key vendor contacts

● Credentials to key systems

Page 32: DevOps Incident Handling - Making friends not enemies

Unexpected failures

● Communication systems

● Network connectivity

● Access to support

Page 38: DevOps Incident Handling - Making friends not enemies

ALERT!

1. Load up incident response checklist

2. Log incident in JIRA

3. Log into Ops War Room

4. Public status post

5. Initial investigation
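
The five numbered steps on this slide can be tracked with a small checklist object that records when each one was done. This is a sketch only: the `Checklist` class is hypothetical, and it does not talk to JIRA, a chat room or a status page.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional

# Step names taken from the slide, in numbered order.
STEPS = [
    "Load up incident response checklist",
    "Log incident in JIRA",
    "Log into Ops War Room",
    "Public status post",
    "Initial investigation",
]


@dataclass
class Checklist:
    steps: List[str]
    completed_at: List[Optional[datetime]] = field(init=False)

    def __post_init__(self) -> None:
        # One completion timestamp slot per step; None means "not done yet".
        self.completed_at = [None] * len(self.steps)

    def next_step(self) -> Optional[str]:
        """Return the first step not yet completed, or None when finished."""
        for step, done in zip(self.steps, self.completed_at):
            if done is None:
                return step
        return None

    def complete_next(self) -> str:
        """Mark the next outstanding step as done and return its name."""
        for i, done in enumerate(self.completed_at):
            if done is None:
                self.completed_at[i] = datetime.now(timezone.utc)
                return self.steps[i]
        raise RuntimeError("all steps are already complete")


checklist = Checklist(STEPS)
checklist.complete_next()
checklist.complete_next()
print(checklist.next_step())  # Log into Ops War Room
```

Forcing the steps to be completed in order mirrors the numbered list; a team that works steps in parallel would want a different structure.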

Page 43: DevOps Incident Handling - Making friends not enemies

Key response principles

● Log everything

● Frequent public status updates

● Gather the team

● Escalate!
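
"Log everything" and "Frequent public status updates" can be combined in a timestamped incident log that also says when the next public update is due. This is a sketch under assumptions: the `IncidentLog` class is hypothetical and the 30-minute update interval is an arbitrary example, not a figure from the talk.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import List, Optional, Tuple


@dataclass
class IncidentLog:
    update_interval: timedelta = timedelta(minutes=30)  # assumed cadence
    entries: List[Tuple[datetime, str]] = field(default_factory=list)
    last_public_update: Optional[datetime] = None

    def note(self, message: str, when: datetime) -> None:
        """Record an internal, timestamped log entry."""
        self.entries.append((when, message))

    def public_update(self, message: str, when: datetime) -> None:
        """Record a public status post and remember when it was made."""
        self.note(f"PUBLIC: {message}", when)
        self.last_public_update = when

    def update_due(self, now: datetime) -> bool:
        """True if no public update exists yet or the last one is too old."""
        if self.last_public_update is None:
            return True
        return now - self.last_public_update >= self.update_interval


start = datetime(2014, 10, 1, 9, 0, tzinfo=timezone.utc)
log = IncidentLog()
log.note("Alert received", start)
log.public_update("Investigating elevated error rates", start + timedelta(minutes=5))
print(log.update_due(start + timedelta(minutes=20)))  # False
print(log.update_due(start + timedelta(minutes=40)))  # True
```

The same entries list later gives the postmortem its timeline.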

Page 48: DevOps Incident Handling - Making friends not enemies

Postmortem

● Within a few days

● Tell the story

● Provide technical detail

● Explain what failed and why

Page 49: DevOps Incident Handling - Making friends not enemies

Postmortem

● How it’s going to be fixed
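
The postmortem bullets can be turned into a plain-text template that refuses to render while a section is still empty. This is an illustrative sketch: the `Postmortem` class, its field names and the section titles are assumptions based on the bullets, not a format given in the talk.

```python
from dataclasses import dataclass


@dataclass
class Postmortem:
    story: str
    technical_detail: str
    what_failed_and_why: str
    fix: str

    def render(self) -> str:
        """Return the postmortem as plain text, one titled section per bullet."""
        sections = [
            ("What happened", self.story),
            ("Technical detail", self.technical_detail),
            ("What failed and why", self.what_failed_and_why),
            ("How it is going to be fixed", self.fix),
        ]
        missing = [title for title, body in sections if not body.strip()]
        if missing:
            raise ValueError("missing sections: " + ", ".join(missing))
        return "\n\n".join(f"{title}\n{body}" for title, body in sections)


# Placeholder text only; a real postmortem would carry the actual details.
draft = Postmortem(
    story="(timeline of the incident)",
    technical_detail="(relevant graphs, logs, configuration)",
    what_failed_and_why="(root cause)",
    fix="(planned changes)",
)
print(draft.render())
```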

Page 50: DevOps Incident Handling - Making friends not enemies

stspg.io/ZDC

Page 51: DevOps Incident Handling - Making friends not enemies

Summary

● Preparation

● Communication

● Checklists

● Documentation

● Postmortem

Page 52: DevOps Incident Handling - Making friends not enemies

Thank you very much

@davidmytton

[email protected]

blog.serverdensity.com

www.serverdensity.com