![Page 1: The Evolution of Site Reliability Engineering · your side of the boat.’” –Fred Kofman •Attack the problem, not the person •No human gatekeepers. Build automated gatekeepers](https://reader033.vdocuments.us/reader033/viewer/2022042117/5e94eee34bf2bf6ea86ac1e2/html5/thumbnails/1.jpg)
The Evolution of Site Reliability Engineering
Ben Purgason
Director, Site Reliability
The Founding Principles
• Site Up
• Empower Developer Ownership
• Operations is an Engineering Problem
The Firefighter
• Incident management
• Purely reactive
• Keeps the company alive one more day
SRE - SWE Roles and Responsibilities
SRE (Firefighter)
• Incident Management
• Automating manual fire suppression
• Seeking to understand the stack
• Monitoring
• Alerting
SWE
• Feature / Product Development
• Escalation Point for SRE
Tools SRE
The Firefighters
• Something was always broken, literally
• GCN, postmortem, action items, repeat
If you can do just two things…
“If you’re going through hell, keep going.” – Winston Churchill
• Every day is Monday in Operations
• What gets measured gets fixed
The Gatekeeper
• Change control
• Reactive towards SWE plans
• Protect “our” site from “them”
SRE - SWE Roles and Responsibilities
SRE (Gatekeeper)
• Incident Management
• Deployments
• Change Control
• Monitoring
• Alerting
SWE
• Feature / Product Development
• Request Changes from SRE
• Escalation Point for SRE
Tools SRE
The Gatekeepers
• Human gatekeeping doesn’t scale
• Service Guard, dividing users since 2014
If you can do just two things…
“There is no such thing as ‘the hole is in your side of the boat.’” – Fred Kofman
• Attack the problem, not the person
• No human gatekeepers. Build automated gatekeepers that use mutually agreed upon data.
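One way to picture an automated gatekeeper is a small check that reads thresholds both teams have signed off on and decides whether a change may proceed. The sketch below is illustrative only: the names `GateRule`, `evaluate_gate`, and `AGREED_RULES`, the metric names, and the threshold values are all hypothetical and do not come from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateRule:
    """One threshold that SRE and SWE have agreed on (hypothetical shape)."""
    metric: str
    max_value: float


def evaluate_gate(rules, observed):
    """Return (allowed, reasons) for a proposed change.

    `observed` maps metric names to their current values. A metric with
    no data blocks the change, because the gate cannot vouch for it.
    """
    reasons = []
    for rule in rules:
        value = observed.get(rule.metric)
        if value is None:
            reasons.append(f"{rule.metric}: no data")
        elif value > rule.max_value:
            reasons.append(f"{rule.metric}: {value} > {rule.max_value}")
    return (not reasons, reasons)


# Assumed thresholds for illustration; real values would be negotiated
# between the teams and kept somewhere both can review.
AGREED_RULES = [
    GateRule("error_rate", 0.01),
    GateRule("p99_latency_ms", 500.0),
]

if __name__ == "__main__":
    allowed, reasons = evaluate_gate(
        AGREED_RULES, {"error_rate": 0.002, "p99_latency_ms": 620.0}
    )
    print("allowed" if allowed else "blocked", reasons)
```

Because the decision rests on shared data rather than on a person, a blocked change comes with reasons that either team can inspect and dispute.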
Center of Gravity:
The principal thing or activity that must be kept in balance or under control for an org to operate
Center of Gravity:
“The ability to influence and be influenced by our partner teams”
The Advocate
• Creating a site up culture
• Reactive towards SWE plans
• Rebuilds trusted relationships
SRE - SWE Roles and Responsibilities
SRE (Advocate)
• Incident Management
• Monitoring and Alerting
• Partnering in the creation of “gate keeping data”
• Developing systems that empower ownership
• Relentless propagation of Site Up culture
SWE
• Feature / Product Development
• Escalation Point for SRE
• Monitoring and Alerting
• Partnering in the creation of “gate keeping data”
Tools SRE
The Advocates
• Site up helps everyone.
• Help us help you.
• How do you want to spend your time?
If you can do just two things…
“Consistency over time equals trust.” – Jeff Weiner
• Be an advocate – make an advocate
• Do not insulate, share the pain
The Partner
• Empowers intelligent risk
• Proactive, joint planning with SWE
• Collaborating to magnify impact
SRE - SWE Roles and Responsibilities
SRE (Partner)
• Incident Management
• Monitoring and Alerting
• Building products for reliability and scale
• Relentless propagation of Site Up culture
SWE
• Incident Management
• Monitoring and Alerting
• Building products for reliability and scale
• Feature / Product Development
Tools SRE
The Partners
• All teams plan together with partners
• Contributions to core libraries
• Contributions across org boundaries
If you can do just three things…
“We should operate on what needs to get done, not on an org structure!” – Dan Grillo
• Unified SRE and SWE planning
• Unified SRE and SWE priorities
• Contribute where it counts
The Engineer
• Reliability throughout software lifecycle
• Proactive, one plan for SRE+SWE
• Everyone has the same job.
SRE Evolution
• The Firefighter
• The Gatekeeper
• The Advocate
• The Partner
• The Engineer
Want to have a conversation?
Thank you