All Day DevOps 2016 - Turning Human Capital into High Performance Organizational Capital
Post on 16-Apr-2017
Devops: Turning Human Capital into High Performance Organizational Capital
John Willis @botchagalupe
About Me
• One of the founding members of “Devopsdays”
• Co-author of the “Devops Handbook”
• Author of the “Introduction to Devops” on Linux Foundation edX
• Podcaster at devopscafe.org
• Devops Enterprise Summit - Cofounder
• Ninth person in at Chef (VP of Customer Enablement)
• Formerly Director of Devops at Dell
• Founder of Socketplane (Acquired by Docker)
• 10 Startups over 25 years
https://github.com/botchagalupe/my-presentations
How would I describe Devops to a CEO?
How would you describe Devops to a CEO?
The consequences of failure have never been greater…
Wanna know how?
Devops Practices and Patterns
• Continuous Delivery
  • Everything in version control
  • Small batch principle
  • Trunk based deployments
  • Manage flow (WIP)
  • Automate everything
• Culture
  • Everyone is responsible
  • Done means released
  • Stop the line when it breaks
  • Remove silos
itrevolution.com/devops-handbook
Human Capital and High Performance Organizations
Recent IT Performance Data is Compelling
High performers compared to their peers…
• 30x more frequent deployments
• 200x faster lead times
• 60x the change success rate
• 168x faster mean time to recover (MTTR)
• 2x more likely to exceed profitability, market share & productivity goals
• 50% higher market capitalization growth over 3 years*
Data from 2014/2015 State of DevOps Report - https://puppetlabs.com/2015-devops-report
Faster
Higher Quality
More Effective
2555x
Conventional Wisdom
Fast, Cheap, Good
“Pick Two!”
Faster, Better, and Cheaper?
Organizational culture was one of the strongest predictors of both IT performance and the overall performance of the organization.
Devops is about Humans
Devops is a set of practices and patterns that turn human capital into high performance organizational capital.
• Over 15,000 engineers in over 40 offices • 4,000+ projects under active development • 5500+ code submissions per day (20+ p/m) • Over 75M test cases run daily • 50% of code changes monthly • Single source tree
Amazon
• 11.6 second mean time between deploys
• 1,079 max deploys in a single hour
• 10,000 mean number of hosts simultaneously receiving a deploy
• 30,000 max number of hosts simultaneously receiving a deploy
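To put the first figure in perspective, a mean gap of 11.6 seconds between deploys can be converted into average deploys per hour and per day. The sketch below is only that arithmetic; the function and constant names are illustrative and do not come from the talk.

```python
SECONDS_PER_HOUR = 60 * 60
SECONDS_PER_DAY = 24 * SECONDS_PER_HOUR


def deploys_per_period(mean_gap_seconds: float, period_seconds: int) -> float:
    """Average number of deploys in a period, given the mean time between deploys."""
    return period_seconds / mean_gap_seconds


# 11.6 seconds is the mean time between deploys quoted on the slide.
per_hour = deploys_per_period(11.6, SECONDS_PER_HOUR)
per_day = deploys_per_period(11.6, SECONDS_PER_DAY)

print(f"{per_hour:.0f} deploys per hour on average")  # 310
print(f"{per_day:.0f} deploys per day on average")    # 7448
```

On these numbers the quoted peak of 1,079 deploys in a single hour is roughly three and a half times the average hourly rate.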
Unicorns and Horses (Enterprises)
Unicorns
Enterprise
Shamelessly stolen and repurposed from: Pete Cheslock
Enterprise Organizations
• Ticketmaster - 98% reduction in MTTR
• Nordstrom - 20% shorter Lead Time
• Target - Full Stack Deploy 3 months to minutes
• USAA - Release from 28 days to 7 days
• ING - 500 application teams doing devops
• CSG - From 200 incidents per release to 18
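Two of these results are given as before/after pairs rather than percentages. As a small worked example, the sketch below converts them to percent reductions so they can be read alongside the Ticketmaster and Nordstrom figures. The helper name is illustrative, and the inputs are only the numbers quoted above.

```python
def percent_reduction(before: float, after: float) -> float:
    """Percent decrease from `before` to `after`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) * 100 / before


# Before/after pairs taken from the list above.
usaa_release_days = percent_reduction(28, 7)
csg_incidents = percent_reduction(200, 18)

print(f"USAA release cycle: {usaa_release_days:.0f}% shorter")      # 75%
print(f"CSG incidents per release: {csg_incidents:.0f}% fewer")     # 91%
```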
Faster, Better, and Cheaper. How?
• Lean • Safety Culture • Learning Organization
Lean
[Figure: value stream map for Parts Unlimited - "Major Release 6" (early 2014). Phase labels on the map: Initiation and Planning; Steering; Design; Dev Breakdown; Dev / Test; Server Requirements Gathering; Server Approval and Assignment; Provisioning; Staging Release; Production Release. Individual steps are annotated with durations ranging from 1 week to 16 weeks, plus two 3-month planning steps with a hold / pause. Tools shown: ServiceNow, Jira, Confluence, SharePoint, MS Project, MS Office.]
Problems called out on the value stream map:
• Gaps in requirements: licenses; dependencies on 3rd party apps; capacity planning always seems low ("robbing Peter to pay Paul"); don't purchase in advance even though we know it's coming
• Duplicate info across different documents
• Procurement of physical servers can take months (lead times for procurement plus facilities groups)
• Too many environments in one ticket causes audit confusion
• Piecemeal requests ("2 this week, 3 next week")
• 1 queue for delivery team with ~1,000 tickets at once
• Capacity issues cause delay
• Often told to stop everything and do something else
• No monitoring or backup for some environments
• 30% of delivery team's time spent "consulting" on performance and dealing with unfounded requests for more capacity
• 3-5 days to fix, ~10% S/R
• Often skips CAB. What CAB reviews is often not what is built
• All manual setup. 1 person really knows how. Low data quality.
• Manual process with lots of back and forth
• Many tickets with mismatched priorities
• Mostly manual testing
• Manual, per cluster
• Frequently down
• External service updates take offline. Lots of contention.
• Sometimes submits server requests directly to delivery. Ad-hoc requests get lost, maybe 2-3 week delays
• 9+ months of planning before implementation starts (and information / requirements still incorrect or incomplete!)
• Dev and QA told to submit server request 6-8 weeks in advance (only done 50% of time)
[S/R values marked on individual steps: 90%, 55%, 15%, 20%, 50%, 75%. Callout markers on the map: TS, PD, EP, D, M, W, H.]
Numbered improvement items on the map:
1. Fully Automated Environment Provisioning
2. Visualization of flow of work and expected upcoming work
3. Standard product catalog ("Environments on Demand")
4. Shorten from Design to Implementation
5. New "white glove" engagement model
7. Small Batches
8. Write end-to-end customer func. tests
9. Service Verification test writing: shift left to Dev (test early)
10. Test data setup automation
11. Resolve interface to legacy
12. Remove Bottleneck and Environment Contention (test more)
13. Dev Deploy to Prod for legacy
14. Unify change management tools
15. Tool
• Make the work visible for all • Manage flow and eliminate waste • Build alignment and consensus across team boundaries • Empower teams to find and fix what is getting in the way
• Small Batch • Reduce Work in Process (WIP) • 1x1 Flow • Reduce Bottlenecks (TOC) • Optimize Globally
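One way to see why reducing WIP matters is Little's Law, a standard queueing result that is not cited in the talk itself: for a stable system, average lead time equals average work in process divided by average throughput. The sketch below applies it to the ~1,000 ticket delivery queue from the value stream map. The throughput of 50 tickets per week is a purely hypothetical assumption for illustration, not a number from the slides.

```python
def average_lead_time(wip: float, throughput_per_week: float) -> float:
    """Little's Law: average lead time (weeks) = WIP / average weekly throughput."""
    if throughput_per_week <= 0:
        raise ValueError("throughput must be positive")
    return wip / throughput_per_week


# Hypothetical assumption: the team closes 50 tickets per week.
ASSUMED_THROUGHPUT = 50

# 1,000 is the queue size quoted on the map; 500 and 100 are illustrative targets.
for wip in (1000, 500, 100):
    weeks = average_lead_time(wip, ASSUMED_THROUGHPUT)
    print(f"WIP {wip:>4} tickets -> about {weeks:.0f} weeks average lead time")
```

Under that assumption, the same team with the same throughput would see average lead time fall from about 20 weeks to about 2 weeks just by limiting how much work is in the queue at once.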
Where does lean come from?
Let’s talk Kata
I fear not the man who has practiced 10,000 kicks once, but I fear the man who has practiced one kick 10,000 times.
- Bruce Lee
Toyota is not a story about techniques. It’s an organization defined primarily by the unique behavior routines it continually teaches to all its members.
- Mike Rother (Page 262-263)
I have no idea how to answer that question. It would literally never occur to me not to do it!
KATA
We are what we repeatedly do. Excellence, then, is not an act, but a habit.
- Aristotle
Safety Culture
Views on Human Error
▪ The old view of human error (First Story)
  ▪ Human error is the cause of accidents
  ▪ To explain failure, you must seek failure
  ▪ You must find people’s inaccurate assessments, wrong decisions, bad judgments
▪ The new view of human error (Second Story)
  ▪ Human error is a symptom of trouble deeper inside a system
  ▪ To explain failure, do not try to find where people went wrong
  ▪ Instead, find how people’s assessments and actions made sense at the time, given the circumstances that surrounded them
▪ Bad Apple Theory - Throw away the bad apples
  ▪ Complex systems are basically safe; they need to be protected from unreliable people (bad apples)
  ▪ Human errors cause accidents: humans are the dominant contributor to more than two thirds of mishaps
  ▪ Errors occur because of human loss of situation awareness, complacency, negligence
  ▪ Errors are introduced to the system only through the inherent unreliability of people
Murphy’s Law is Wrong!
What can go wrong usually goes right, but then we draw the wrong conclusion.
- Sidney Dekker, The Field Guide to Human Error
Your organization must continually affirm that individuals are NEVER the ‘root cause’ of outages.
▪ Hindsight bias: the "knew-it-all-along" effect; seeing the event as having been predictable; counterfactuals
▪ Outcome bias: evaluating the quality of a decision when the outcome of that decision is already known
▪ Availability bias: preference by decision makers for information and events that are more recent
▪ Fundamental attribution error: explaining behavior in terms of internal disposition, such as personality traits, abilities, motives, etc., as opposed to external situational factors
▪ Just Culture at Etsy (John Allspaw)
▪ Encourage learning by having these blameless Post-Mortems on outages and accidents
▪ Understand how an accident happens, in order to better equip ourselves to prevent it from happening in the future
▪ Gather details from multiple perspectives on failures, and we don’t punish people for making mistakes
▪ Enable and encourage people who do make mistakes to be the experts on educating the rest of the organization how not to make them in the future
▪ Accept that there is always a discretionary space where humans can decide to take actions or not, and that the judgement of those decisions lies in hindsight
▪ Accept that the Hindsight Bias will continue to cloud our assessment of past events, and work hard to eliminate it
▪ Accept that the Fundamental Attribution Error is also difficult to escape, so we focus on the environment and circumstances people are working in when investigating accidents
"In dynamic fault management, intervention precedes or is interwoven with diagnosis"
- Woods (1994)
Source: (Woods) John Allspaw - http://bit.ly/AllspawThesis
Learning Organization
That’s how it’s always been done around here!
You are either building a learning organization… or you will be losing to someone who is
- Walter Sobchak - Andrew Clay Shafer
▪ Dr. Deming
A learning organization is a place where people are continually discovering how they create their reality.
- Peter Senge
Ladder of Inference (Chris Argyris)
• Action • Beliefs • Conclusions • Assumptions • Meanings • Select • Observe
Ladder of Inference
▪ Can create bad judgement
▪ Our assumptions can lead us to bad conclusions
▪ Question your assumptions and conclusions
▪ Seek contrary data
▪ Make your assumptions visible to others
▪ Invite others to test your assumptions and conclusions
▪ Inquire into other people’s assumptions and conclusions
▪ Move down the ladder instead of up
Ladder of Inference - Bad Judgement
▪ Observe - I notice people in the first row
▪ Select - Person in the front row keeps looking at their phone
▪ Meaning - Not listening to my presentation
▪ Assumption - They are not interested
▪ Conclusion - Doesn’t like my new idea
▪ Beliefs - Their team always blocks new ideas
▪ Action - I send a nasty email to their boss
Ladder of Inference - Alternative Assumption
▪ Observe - I notice people in the first row
▪ Select - Person in the front row keeps looking at their phone
▪ Meaning - Not listening to my presentation
▪ Assumption - Try to engage with a question (safely)
▪ Conclusion - Might find out that they are late for another meeting and really don’t want to miss this one, so they sent an email notifying the next meeting's team that they will be late
▪ Beliefs - They are very excited about this new idea
▪ Action - Both teams set up another meeting to engage