War Games - Flight training for DevOps @ TechSummit Amsterdam
![Page 1: War Games - Flight training for DevOps @ TechSummit Amsterdam](https://reader034.vdocuments.us/reader034/viewer/2022042907/5888f03b1a28ab87728b6567/html5/thumbnails/1.jpg)
Jorge Salamero Sanz
TechSummit Amsterdam 2 June 2016
War Games - Flight training for DevOps
https://joind.in/talk/2e223
![Page 2: War Games - Flight training for DevOps @ TechSummit Amsterdam](https://reader034.vdocuments.us/reader034/viewer/2022042907/5888f03b1a28ab87728b6567/html5/thumbnails/2.jpg)
Jorge Salamero
@bencerillo
@serverdensity
blog.serverdensity.com
![Page 3: War Games - Flight training for DevOps @ TechSummit Amsterdam](https://reader034.vdocuments.us/reader034/viewer/2022042907/5888f03b1a28ab87728b6567/html5/thumbnails/3.jpg)
How to Monitor MySQL
![Page 4: War Games - Flight training for DevOps @ TechSummit Amsterdam](https://reader034.vdocuments.us/reader034/viewer/2022042907/5888f03b1a28ab87728b6567/html5/thumbnails/4.jpg)
● Infrastructure automation
● Configuration automation
● Continuous testing
● Continuous deployment / delivery
● Monitoring
● Logs, error handling
● Feedback
● Human Ops
DevOps lifecycle
![Page 5: War Games - Flight training for DevOps @ TechSummit Amsterdam](https://reader034.vdocuments.us/reader034/viewer/2022042907/5888f03b1a28ab87728b6567/html5/thumbnails/5.jpg)
● Humans are part of any system
● Initial design, ongoing improvements
● Maintenance
● Upgrades
● Issues, Incident response
Humans in DevOps
![Page 6: War Games - Flight training for DevOps @ TechSummit Amsterdam](https://reader034.vdocuments.us/reader034/viewer/2022042907/5888f03b1a28ab87728b6567/html5/thumbnails/6.jpg)
● System issues = error rates + SLA + ...
● Human issues = alerts out of hours + interruptions + ...
● System issues = Human issues
Human issues = system issues
![Page 7: War Games - Flight training for DevOps @ TechSummit Amsterdam](https://reader034.vdocuments.us/reader034/viewer/2022042907/5888f03b1a28ab87728b6567/html5/thumbnails/7.jpg)
● Downtime = loss of users, reputation, revenue
● Downtime caused by unreliable systems
● Unhealthy teams reduce reliability
● Unhealthy teams = loss of users, reputation, revenue
Humans impact business
![Page 8: War Games - Flight training for DevOps @ TechSummit Amsterdam](https://reader034.vdocuments.us/reader034/viewer/2022042907/5888f03b1a28ab87728b6567/html5/thumbnails/8.jpg)
● Slip
● Lapse
● Mistake
● Violation
● (Always, again, again)
Human risk
![Page 9: War Games - Flight training for DevOps @ TechSummit Amsterdam](https://reader034.vdocuments.us/reader034/viewer/2022042907/5888f03b1a28ab87728b6567/html5/thumbnails/9.jpg)
What can we do?
![Page 10: War Games - Flight training for DevOps @ TechSummit Amsterdam](https://reader034.vdocuments.us/reader034/viewer/2022042907/5888f03b1a28ab87728b6567/html5/thumbnails/10.jpg)
● Prepare and practice
● Respond
● Postmortem
Expect downtime
![Page 11: War Games - Flight training for DevOps @ TechSummit Amsterdam](https://reader034.vdocuments.us/reader034/viewer/2022042907/5888f03b1a28ab87728b6567/html5/thumbnails/11.jpg)
Real example
(small war story, won’t be long)
![Page 12: War Games - Flight training for DevOps @ TechSummit Amsterdam](https://reader034.vdocuments.us/reader034/viewer/2022042907/5888f03b1a28ab87728b6567/html5/thumbnails/12.jpg)
● Power failure to half of our servers
● Automated failover unavailable (known failure condition)
● Manual DNS switch required
● Expected impact: 20 min
● Actual impact: 43 min
Incident example
![Page 13: War Games - Flight training for DevOps @ TechSummit Amsterdam](https://reader034.vdocuments.us/reader034/viewer/2022042907/5888f03b1a28ab87728b6567/html5/thumbnails/13.jpg)
![Page 14: War Games - Flight training for DevOps @ TechSummit Amsterdam](https://reader034.vdocuments.us/reader034/viewer/2022042907/5888f03b1a28ab87728b6567/html5/thumbnails/14.jpg)
Lessons learned?
![Page 15: War Games - Flight training for DevOps @ TechSummit Amsterdam](https://reader034.vdocuments.us/reader034/viewer/2022042907/5888f03b1a28ab87728b6567/html5/thumbnails/15.jpg)
● Unfamiliarity with the process
● Pressure of a time-sensitive event (panic effect)
● Escalation introduces delays
The Human Factor
![Page 16: War Games - Flight training for DevOps @ TechSummit Amsterdam](https://reader034.vdocuments.us/reader034/viewer/2022042907/5888f03b1a28ab87728b6567/html5/thumbnails/16.jpg)
Handling the Human factor
![Page 17: War Games - Flight training for DevOps @ TechSummit Amsterdam](https://reader034.vdocuments.us/reader034/viewer/2022042907/5888f03b1a28ab87728b6567/html5/thumbnails/17.jpg)
● First responder, acknowledge alert
● Load incident response checklist
● Log into #ops-war-room in Slack
● Log incident into JIRA
● Begin investigation
General response process
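The response process above can be sketched as a small checklist tracker that records when each step was ticked off. This is a minimal illustration, not the speaker's tooling: the `ChecklistRun` class and its methods are hypothetical, and only the step names come from the slide.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ChecklistRun:
    """Tracks which steps of a response checklist are done, and when."""
    steps: list[str]
    completed: dict[str, datetime] = field(default_factory=dict)

    def complete(self, step: str) -> None:
        if step not in self.steps:
            raise ValueError(f"unknown step: {step}")
        # Keep the first completion time if a step is ticked twice.
        self.completed.setdefault(step, datetime.now(timezone.utc))

    def pending(self) -> list[str]:
        # Steps still open, in checklist order.
        return [s for s in self.steps if s not in self.completed]


# Step names copied from the slide; everything else is an assumption.
GENERAL_RESPONSE = [
    "First responder, acknowledge alert",
    "Load incident response checklist",
    "Log into #ops-war-room in Slack",
    "Log incident into JIRA",
    "Begin investigation",
]

run = ChecklistRun(GENERAL_RESPONSE)
run.complete("First responder, acknowledge alert")
run.complete("Load incident response checklist")
print(run.pending())
```

Keeping timestamps per step gives the postmortem a timeline without relying on anyone's memory of the incident.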
![Page 18: War Games - Flight training for DevOps @ TechSummit Amsterdam](https://reader034.vdocuments.us/reader034/viewer/2022042907/5888f03b1a28ab87728b6567/html5/thumbnails/18.jpg)
1. Extended use of checklists
Documented procedures
![Page 19: War Games - Flight training for DevOps @ TechSummit Amsterdam](https://reader034.vdocuments.us/reader034/viewer/2022042907/5888f03b1a28ab87728b6567/html5/thumbnails/19.jpg)
● The “limits of human memory and attention”
  ○ Complexity
  ○ Stress and fatigue
  ○ Ego
● Pilots, doctors, divers: Bruce Willis Ruins All Films (BCD, weights, releases, air, final)
Pre-flight checklists
![Page 20: War Games - Flight training for DevOps @ TechSummit Amsterdam](https://reader034.vdocuments.us/reader034/viewer/2022042907/5888f03b1a28ab87728b6567/html5/thumbnails/20.jpg)
1. Extended use of checklists
2. Not to follow blindly, use knowledge and experience
3. Independent system
4. Searchable
5. List of known issues and documented workarounds/fixes
Documented procedures
![Page 21: War Games - Flight training for DevOps @ TechSummit Amsterdam](https://reader034.vdocuments.us/reader034/viewer/2022042907/5888f03b1a28ab87728b6567/html5/thumbnails/21.jpg)
● Realistic replica environment
● or mock command line
● Record actions and timing
● Multiple failures
● Unexpected results
War Games
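A mock command line that records actions and timing, as listed above, could look roughly like the sketch below. It assumes a scripted scenario: the `MockShell` class, the command names, and the canned outputs are all invented for illustration, and nothing is actually executed.

```python
import time
from typing import Callable


class MockShell:
    """Pretend command line for a drill: commands are logged, never run."""

    def __init__(self, canned: dict[str, str],
                 clock: Callable[[], float] = time.monotonic):
        self.canned = canned
        self.clock = clock
        self.started = clock()
        # (seconds since the drill started, command typed)
        self.log: list[tuple[float, str]] = []

    def run(self, command: str) -> str:
        elapsed = self.clock() - self.started
        self.log.append((elapsed, command))
        # Unscripted commands get a fallback reply instead of an exception.
        return self.canned.get(command, "command not found in scenario")


# Hypothetical scenario, loosely modelled on a failover drill.
shell = MockShell({
    "check-failover": "automated failover: UNAVAILABLE",
    "switch-dns secondary": "DNS now points at secondary",
})
print(shell.run("check-failover"))
print(shell.run("switch-dns secondary"))
for elapsed, command in shell.log:
    print(f"{elapsed:6.2f}s  {command}")
```

The injectable `clock` is a design choice for testing the recorder itself; in a real drill the default monotonic clock is enough to review how long each decision took.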
![Page 22: War Games - Flight training for DevOps @ TechSummit Amsterdam](https://reader034.vdocuments.us/reader034/viewer/2022042907/5888f03b1a28ab87728b6567/html5/thumbnails/22.jpg)
Results
![Page 23: War Games - Flight training for DevOps @ TechSummit Amsterdam](https://reader034.vdocuments.us/reader034/viewer/2022042907/5888f03b1a28ab87728b6567/html5/thumbnails/23.jpg)
● Team and individual test of response
● Run real commands
● Training the people
● Training the procedures
● Training the tools
Results
![Page 24: War Games - Flight training for DevOps @ TechSummit Amsterdam](https://reader034.vdocuments.us/reader034/viewer/2022042907/5888f03b1a28ab87728b6567/html5/thumbnails/24.jpg)
● Increase confidence
● Reduce panic
● Better coordination
● Trust relationships
● Improves time to resolution
Human results
![Page 25: War Games - Flight training for DevOps @ TechSummit Amsterdam](https://reader034.vdocuments.us/reader034/viewer/2022042907/5888f03b1a28ab87728b6567/html5/thumbnails/25.jpg)
● Review
● Suggestions for improvements
● Do it again
● Scenario evolves
● People forget
loop(): review and repeat
![Page 26: War Games - Flight training for DevOps @ TechSummit Amsterdam](https://reader034.vdocuments.us/reader034/viewer/2022042907/5888f03b1a28ab87728b6567/html5/thumbnails/26.jpg)
What else?
![Page 27: War Games - Flight training for DevOps @ TechSummit Amsterdam](https://reader034.vdocuments.us/reader034/viewer/2022042907/5888f03b1a28ab87728b6567/html5/thumbnails/27.jpg)
● Pressure of just waiting to be paged
● Trouble sleeping: 7.8 days per year productivity cost
● Prevent burnout
On call
![Page 28: War Games - Flight training for DevOps @ TechSummit Amsterdam](https://reader034.vdocuments.us/reader034/viewer/2022042907/5888f03b1a28ab87728b6567/html5/thumbnails/28.jpg)
● Half of interruptions are self-interruptions
● Avg 23 minutes to resume task
● Only alert on what is actionable now
Alert notifications
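One way to read the rule about actionable alerts is as a filter applied before anyone gets paged. The sketch below is only an interpretation of the slide: the `Alert` fields, the `should_page` rule, and the sample alerts are assumptions, not part of any monitoring product's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    name: str
    actionable: bool  # can a human do something about it?
    urgent: bool      # does it need doing right now?


def should_page(alert: Alert) -> bool:
    # Interrupt a person only when there is something to do, and to do now.
    return alert.actionable and alert.urgent


# Invented sample alerts for illustration.
alerts = [
    Alert("primary database down", actionable=True, urgent=True),
    Alert("disk 70% full", actionable=True, urgent=False),
    Alert("brief CPU spike, already recovered", actionable=False, urgent=False),
]
to_page = [a.name for a in alerts if should_page(a)]
print(to_page)
```

Alerts that fail the filter can still be logged or reviewed during working hours; they just should not cost someone the time it takes to resume an interrupted task.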
![Page 29: War Games - Flight training for DevOps @ TechSummit Amsterdam](https://reader034.vdocuments.us/reader034/viewer/2022042907/5888f03b1a28ab87728b6567/html5/thumbnails/29.jpg)
![Page 30: War Games - Flight training for DevOps @ TechSummit Amsterdam](https://reader034.vdocuments.us/reader034/viewer/2022042907/5888f03b1a28ab87728b6567/html5/thumbnails/30.jpg)
www.humanops.com
meetup.com/humanops-london/
meetup.com/humanops-sanfrancisco/
Human Ops Meetup
![Page 31: War Games - Flight training for DevOps @ TechSummit Amsterdam](https://reader034.vdocuments.us/reader034/viewer/2022042907/5888f03b1a28ab87728b6567/html5/thumbnails/31.jpg)
serverdensity.com/conferences
TECHSUMMIT
Shh! Free monitoring!
![Page 32: War Games - Flight training for DevOps @ TechSummit Amsterdam](https://reader034.vdocuments.us/reader034/viewer/2022042907/5888f03b1a28ab87728b6567/html5/thumbnails/32.jpg)
www.CloudStatusApp.com
https://joind.in/talk/2e223 @bencerillo