Sandstorm or Significant? The evolving role of situational context in incident management
![Page 1: Sandstorm or Significant? The evolving role of situational context in incident management](https://reader031.vdocuments.us/reader031/viewer/2022022820/58e4bca41a28ab1c1f8b68c3/html5/thumbnails/1.jpg)
The evolving role of context in Incident Management
![Page 2: Sandstorm or Significant? The evolving role of situational context in incident management](https://reader031.vdocuments.us/reader031/viewer/2022022820/58e4bca41a28ab1c1f8b68c3/html5/thumbnails/2.jpg)
Matthew Boeckman
Developer Advocate
Victorops.com/blog
@matthewboeckman
Background
● 18 years on-call Ops
● 15 years w/software teams
● Startup junkie
● DevOps enthusiast
![Page 3: Sandstorm or Significant? The evolving role of situational context in incident management](https://reader031.vdocuments.us/reader031/viewer/2022022820/58e4bca41a28ab1c1f8b68c3/html5/thumbnails/3.jpg)
3
What is VictorOps?
VictorOps ingests all of your alerts from your current monitoring tools and becomes the logical layer between your alerts and the people who receive them.
![Page 4: Sandstorm or Significant? The evolving role of situational context in incident management](https://reader031.vdocuments.us/reader031/viewer/2022022820/58e4bca41a28ab1c1f8b68c3/html5/thumbnails/4.jpg)
victorops.com/IMA
![Page 5: Sandstorm or Significant? The evolving role of situational context in incident management](https://reader031.vdocuments.us/reader031/viewer/2022022820/58e4bca41a28ab1c1f8b68c3/html5/thumbnails/5.jpg)
5
5 Phases of Incident Management
Detection
monitoring, metrics, thresholds
Response
alerting, on-call, escalation
Remediation
fixes, tickets, deployments
Analysis
postmortem, how or why, understand
Readiness
improvement,game days,learning
![Page 6: Sandstorm or Significant? The evolving role of situational context in incident management](https://reader031.vdocuments.us/reader031/viewer/2022022820/58e4bca41a28ab1c1f8b68c3/html5/thumbnails/6.jpg)
6
Standard Incident Workflow
Detection Response Remediation Analysis Readiness
![Page 7: Sandstorm or Significant? The evolving role of situational context in incident management](https://reader031.vdocuments.us/reader031/viewer/2022022820/58e4bca41a28ab1c1f8b68c3/html5/thumbnails/7.jpg)
7
Incident Management Assessment Matrix
Detection Response Remediation Analysis Preparedness
Novice
Beginner
Competent
Proficient
Expert
![Page 8: Sandstorm or Significant? The evolving role of situational context in incident management](https://reader031.vdocuments.us/reader031/viewer/2022022820/58e4bca41a28ab1c1f8b68c3/html5/thumbnails/8.jpg)
8
Incident Management Maturity Matrix
Detection Response Remediation Analysis Preparedness
Novice
Beginner x
Competent x x
Proficient x x
Expert
![Page 9: Sandstorm or Significant? The evolving role of situational context in incident management](https://reader031.vdocuments.us/reader031/viewer/2022022820/58e4bca41a28ab1c1f8b68c3/html5/thumbnails/9.jpg)
9
Self Assessment
Poll: How would you rate your overall team maturity?
A. Novice
B. Beginner
C. Competent
D. Proficient
E. Expert
![Page 10: Sandstorm or Significant? The evolving role of situational context in incident management](https://reader031.vdocuments.us/reader031/viewer/2022022820/58e4bca41a28ab1c1f8b68c3/html5/thumbnails/10.jpg)
10
The Focus Question
How can we help teams
mature their incident management practice?
(Stated plainly: Make On-Call suck less)
![Page 11: Sandstorm or Significant? The evolving role of situational context in incident management](https://reader031.vdocuments.us/reader031/viewer/2022022820/58e4bca41a28ab1c1f8b68c3/html5/thumbnails/11.jpg)
11
Situational Context
![Page 12: Sandstorm or Significant? The evolving role of situational context in incident management](https://reader031.vdocuments.us/reader031/viewer/2022022820/58e4bca41a28ab1c1f8b68c3/html5/thumbnails/12.jpg)
12
Incident Management Key Metrics
● Mean Time to Repair (MTTR)
● Availability (SLA)
● Ticket Volumes
● Escalations
● Customer Satisfaction
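As a rough illustration of the first metric, MTTR can be computed as the average of (resolved time minus detected time) across incidents. This is a minimal sketch: the record layout, field names, and timestamps are invented for the example and are not taken from VictorOps.

```python
from datetime import datetime, timedelta

def mean_time_to_repair(incidents):
    """Average of (resolved - detected) across incidents, as a timedelta."""
    durations = [i["resolved"] - i["detected"] for i in incidents]
    return sum(durations, timedelta()) / len(durations)

# Hypothetical incident records; field names and times are assumptions.
incidents = [
    {"detected": datetime(2017, 1, 1, 2, 0), "resolved": datetime(2017, 1, 1, 2, 30)},
    {"detected": datetime(2017, 1, 5, 14, 0), "resolved": datetime(2017, 1, 5, 15, 30)},
]
mttr = mean_time_to_repair(incidents)
print(mttr)
```

The same calculation could be split per phase (detection, response, remediation) if each phase boundary is timestamped.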
![Page 13: Sandstorm or Significant? The evolving role of situational context in incident management](https://reader031.vdocuments.us/reader031/viewer/2022022820/58e4bca41a28ab1c1f8b68c3/html5/thumbnails/13.jpg)
13
Incident Management Key Metrics
![Page 14: Sandstorm or Significant? The evolving role of situational context in incident management](https://reader031.vdocuments.us/reader031/viewer/2022022820/58e4bca41a28ab1c1f8b68c3/html5/thumbnails/14.jpg)
14
Time Spent Managing Incidents - Low Maturity
Detection Response Remediation Analysis
Readiness
Time to Repair (MTTR)
![Page 15: Sandstorm or Significant? The evolving role of situational context in incident management](https://reader031.vdocuments.us/reader031/viewer/2022022820/58e4bca41a28ab1c1f8b68c3/html5/thumbnails/15.jpg)
15
Time Spent Managing Incidents - Medium Maturity
Detection Response Remediation Analysis
Readiness
Time to Repair (MTTR)
![Page 16: Sandstorm or Significant? The evolving role of situational context in incident management](https://reader031.vdocuments.us/reader031/viewer/2022022820/58e4bca41a28ab1c1f8b68c3/html5/thumbnails/16.jpg)
16
Time Spent Managing Incidents - High Maturity
Detection
Response
Remediation Analysis Readiness
Time to Repair (MTTR)
![Page 17: Sandstorm or Significant? The evolving role of situational context in incident management](https://reader031.vdocuments.us/reader031/viewer/2022022820/58e4bca41a28ab1c1f8b68c3/html5/thumbnails/17.jpg)
17
A New Core Metric
Detection
Response
Remediation Analysis Readiness
Time to Repair (MTTR)
Time to Learn (TTL)
Identify trends
Capacity plan
Improve infrastructure
Gamedays
Cross train
Update runbooks
![Page 18: Sandstorm or Significant? The evolving role of situational context in incident management](https://reader031.vdocuments.us/reader031/viewer/2022022820/58e4bca41a28ab1c1f8b68c3/html5/thumbnails/18.jpg)
18
Beep Beep Beep
![Page 19: Sandstorm or Significant? The evolving role of situational context in incident management](https://reader031.vdocuments.us/reader031/viewer/2022022820/58e4bca41a28ab1c1f8b68c3/html5/thumbnails/19.jpg)
19
Standard Incident Workflow
![Page 20: Sandstorm or Significant? The evolving role of situational context in incident management](https://reader031.vdocuments.us/reader031/viewer/2022022820/58e4bca41a28ab1c1f8b68c3/html5/thumbnails/20.jpg)
20
Standard Diagnostic Procedure
1. Fire up the VPN
2. Navigate dashboards, find relevant section
3. Review ticket or incident history for host
4. Review Runbooks for associated host
![Page 21: Sandstorm or Significant? The evolving role of situational context in incident management](https://reader031.vdocuments.us/reader031/viewer/2022022820/58e4bca41a28ab1c1f8b68c3/html5/thumbnails/21.jpg)
21
Common Bottlenecks to Establishing Context
● Multiple sources of record
● Duplicate Runbooks or documentation
● Metric overload
● New responders unfamiliar with systems
![Page 22: Sandstorm or Significant? The evolving role of situational context in incident management](https://reader031.vdocuments.us/reader031/viewer/2022022820/58e4bca41a28ab1c1f8b68c3/html5/thumbnails/22.jpg)
22
Where Does it Hurt?
Poll: Which is the most painful problem you experience in establishing context?
A. Multiple sources of record
B. Duplicate documentation
C. Metric overload
D. Everything is equally on fire
E. Everything is fantastic
![Page 23: Sandstorm or Significant? The evolving role of situational context in incident management](https://reader031.vdocuments.us/reader031/viewer/2022022820/58e4bca41a28ab1c1f8b68c3/html5/thumbnails/23.jpg)
23
Beep Beep Beep
![Page 24: Sandstorm or Significant? The evolving role of situational context in incident management](https://reader031.vdocuments.us/reader031/viewer/2022022820/58e4bca41a28ab1c1f8b68c3/html5/thumbnails/24.jpg)
24
A Tale of Two Graphs
Massive spike above expected norm
Response: Fire up the laptop and put a pot of coffee on
![Page 25: Sandstorm or Significant? The evolving role of situational context in incident management](https://reader031.vdocuments.us/reader031/viewer/2022022820/58e4bca41a28ab1c1f8b68c3/html5/thumbnails/25.jpg)
25
A Tale of Two Graphs
Small spike for a consistently loaded box.
Response: ACK alert, go back to sleep
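The difference between the two graphs can be sketched by scoring a reading against the host's own history rather than a fixed threshold. This is only one possible approach (a simple standard-deviation score), and the sample values and host names below are invented for illustration.

```python
from statistics import mean, stdev

def spike_score(history, current):
    """How many standard deviations `current` sits above the host's own history."""
    sigma = stdev(history)
    if sigma == 0:
        return 0.0  # flat history: no basis for a score in this sketch
    return (current - mean(history)) / sigma

# Hypothetical CPU-load samples (percent) for two hosts.
quiet_host = [10, 12, 11, 9, 10, 11]
busy_host = [80, 90, 75, 88, 70, 85]

quiet_score = spike_score(quiet_host, 92)  # massive spike above the expected norm
busy_score = spike_score(busy_host, 92)    # small spike on a consistently loaded box
print(round(quiet_score, 1), round(busy_score, 1))
```

The same raw reading scores very differently on the two hosts, which is the context a responder needs before deciding whether to get out of bed.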
![Page 26: Sandstorm or Significant? The evolving role of situational context in incident management](https://reader031.vdocuments.us/reader031/viewer/2022022820/58e4bca41a28ab1c1f8b68c3/html5/thumbnails/26.jpg)
26
This Time, with Context!
![Page 27: Sandstorm or Significant? The evolving role of situational context in incident management](https://reader031.vdocuments.us/reader031/viewer/2022022820/58e4bca41a28ab1c1f8b68c3/html5/thumbnails/27.jpg)
27
Enhanced Contextual Workflow
![Page 28: Sandstorm or Significant? The evolving role of situational context in incident management](https://reader031.vdocuments.us/reader031/viewer/2022022820/58e4bca41a28ab1c1f8b68c3/html5/thumbnails/28.jpg)
28
Alert Enhancements
Poll: My team is doing some enhancement of alerts today.
A. True
B. False
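One way a team might enhance an alert is to attach the items a responder would otherwise look up by hand: the runbook for the host and its recent incident history. The sketch below is hypothetical; the field names, URL, and data are assumptions and do not reflect any particular tool's API.

```python
def enrich_alert(alert, runbooks, recent_incidents):
    """Return a copy of the alert with context a responder would otherwise look up."""
    host = alert["host"]
    enriched = dict(alert)  # leave the original alert untouched
    enriched["runbook"] = runbooks.get(host)
    enriched["recent_incidents"] = [i for i in recent_incidents if i["host"] == host]
    return enriched

# Hypothetical lookup data; hosts, URL, and summaries are invented.
runbooks = {"web01": "https://wiki.example.com/runbooks/web01"}
history = [
    {"host": "web01", "summary": "disk full"},
    {"host": "db01", "summary": "replication lag"},
]
alert = {"host": "web01", "check": "http_5xx", "state": "CRITICAL"}
enriched = enrich_alert(alert, runbooks, history)
print(enriched)
```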
![Page 29: Sandstorm or Significant? The evolving role of situational context in incident management](https://reader031.vdocuments.us/reader031/viewer/2022022820/58e4bca41a28ab1c1f8b68c3/html5/thumbnails/29.jpg)
29
CI/CD Exacerbates the Contextual Challenge
Many incidents can be traced to deploys
Developer Velocity = Constant Change
Silos impair communication
![Page 30: Sandstorm or Significant? The evolving role of situational context in incident management](https://reader031.vdocuments.us/reader031/viewer/2022022820/58e4bca41a28ab1c1f8b68c3/html5/thumbnails/30.jpg)
30
A Tale of Two Incidents
![Page 31: Sandstorm or Significant? The evolving role of situational context in incident management](https://reader031.vdocuments.us/reader031/viewer/2022022820/58e4bca41a28ab1c1f8b68c3/html5/thumbnails/31.jpg)
31
A Tale of Two Incidents
![Page 32: Sandstorm or Significant? The evolving role of situational context in incident management](https://reader031.vdocuments.us/reader031/viewer/2022022820/58e4bca41a28ab1c1f8b68c3/html5/thumbnails/32.jpg)
32
Introducing: The Scientific Method
Make Observations (the measurement)
Ask a question (why would a webserver quit working?)
Form a hypothesis (because we just deployed?)
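The "because we just deployed?" hypothesis can be checked mechanically by listing deploys that finished shortly before the alert fired. This is a sketch under assumptions: the 30-minute window, the record layout, and the service names are all invented for illustration.

```python
from datetime import datetime, timedelta

def deploys_before(alert_time, deploys, window=timedelta(minutes=30)):
    """Return deploys that finished within `window` before the alert fired."""
    return [
        d for d in deploys
        if timedelta(0) <= alert_time - d["finished"] <= window
    ]

# Hypothetical deploy log; services and times are assumptions.
deploys = [
    {"service": "web", "finished": datetime(2017, 3, 1, 13, 50)},
    {"service": "billing", "finished": datetime(2017, 3, 1, 9, 0)},
]
alert_time = datetime(2017, 3, 1, 14, 0)
suspects = deploys_before(alert_time, deploys)
print(suspects)
```

A match does not prove the deploy caused the incident; it only tells the responder which hypothesis to test first.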
![Page 33: Sandstorm or Significant? The evolving role of situational context in incident management](https://reader031.vdocuments.us/reader031/viewer/2022022820/58e4bca41a28ab1c1f8b68c3/html5/thumbnails/33.jpg)
33
The Sandstorm
![Page 34: Sandstorm or Significant? The evolving role of situational context in incident management](https://reader031.vdocuments.us/reader031/viewer/2022022820/58e4bca41a28ab1c1f8b68c3/html5/thumbnails/34.jpg)
34
No. Do not.
![Page 35: Sandstorm or Significant? The evolving role of situational context in incident management](https://reader031.vdocuments.us/reader031/viewer/2022022820/58e4bca41a28ab1c1f8b68c3/html5/thumbnails/35.jpg)
35
Measure Everything: the Anti-pattern
Measurements cost time and money
Busy dashboards lead to subconscious filtering
Measurements create a natural impulse to alert
![Page 36: Sandstorm or Significant? The evolving role of situational context in incident management](https://reader031.vdocuments.us/reader031/viewer/2022022820/58e4bca41a28ab1c1f8b68c3/html5/thumbnails/36.jpg)
36
Enhance
![Page 37: Sandstorm or Significant? The evolving role of situational context in incident management](https://reader031.vdocuments.us/reader031/viewer/2022022820/58e4bca41a28ab1c1f8b68c3/html5/thumbnails/37.jpg)
37
Stop
![Page 38: Sandstorm or Significant? The evolving role of situational context in incident management](https://reader031.vdocuments.us/reader031/viewer/2022022820/58e4bca41a28ab1c1f8b68c3/html5/thumbnails/38.jpg)
38
An Embarrassment of Dashboards
![Page 39: Sandstorm or Significant? The evolving role of situational context in incident management](https://reader031.vdocuments.us/reader031/viewer/2022022820/58e4bca41a28ab1c1f8b68c3/html5/thumbnails/39.jpg)
39
Rule of Thumb
Measure much
Alert on some
Contextualize all
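The three rules above can be sketched as a single sample-handling function: every sample is recorded, only metrics with an alert rule can page, and every alert that does fire carries context. The metric names, threshold, and runbook URL are hypothetical.

```python
ALERT_RULES = {"http_5xx_rate": 0.05}  # hypothetical: only this metric can page

def process_sample(name, value, store, context):
    """Record every sample; return an alert (with context) only for alertable metrics."""
    store.setdefault(name, []).append(value)      # measure much
    threshold = ALERT_RULES.get(name)
    if threshold is None or value <= threshold:   # alert on some
        return None
    return {"metric": name, "value": value, **context.get(name, {})}  # contextualize all

store = {}
context = {"http_5xx_rate": {"runbook": "https://wiki.example.com/runbooks/5xx"}}
quiet = process_sample("cpu_idle", 3.0, store, context)
paged = process_sample("http_5xx_rate", 0.20, store, context)
print(quiet, paged)
```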
![Page 40: Sandstorm or Significant? The evolving role of situational context in incident management](https://reader031.vdocuments.us/reader031/viewer/2022022820/58e4bca41a28ab1c1f8b68c3/html5/thumbnails/40.jpg)
40
Iteration is Key
Dialing in context takes time
Conduct blameless postmortems
Experiment with more and less context
Be objective in your assessment of what works
![Page 41: Sandstorm or Significant? The evolving role of situational context in incident management](https://reader031.vdocuments.us/reader031/viewer/2022022820/58e4bca41a28ab1c1f8b68c3/html5/thumbnails/41.jpg)
41
Leverage Situational Context
Providing incident responders with context
can meaningfully reduce MTTR,
paying dividends in time
to move your practice forward
![Page 42: Sandstorm or Significant? The evolving role of situational context in incident management](https://reader031.vdocuments.us/reader031/viewer/2022022820/58e4bca41a28ab1c1f8b68c3/html5/thumbnails/42.jpg)
42
The Beginning
Detection Response Remediation Analysis
Readiness
Time to Repair (MTTR)
![Page 43: Sandstorm or Significant? The evolving role of situational context in incident management](https://reader031.vdocuments.us/reader031/viewer/2022022820/58e4bca41a28ab1c1f8b68c3/html5/thumbnails/43.jpg)
43
The Goal
Detection
Response
Remediation Analysis Readiness
Time to Repair (MTTR)
Time to Learn (TTL)
Identify trends
Capacity plan
Improve infrastructure
Gamedays
Cross train
Update runbooks
![Page 44: Sandstorm or Significant? The evolving role of situational context in incident management](https://reader031.vdocuments.us/reader031/viewer/2022022820/58e4bca41a28ab1c1f8b68c3/html5/thumbnails/44.jpg)
Take the IMA!
http://victorops.com/ima
Questions?
44
Thank you!
Matthew Boeckman
@matthewboeckman
Slides on devops.com & slideshare.com