Managing Technical Debt
DESCRIPTION
Managing the Technical Debt lifecycle. In this presentation we explore the evolution of the metaphor, the value it brings to organizations, and the challenges to successful adoption. The full audio and video can be viewed at http://blog.acrowire.com/td-webinar.
TRANSCRIPT
1
Webinar: Managing Technical Debt
Audio and video of this presentation are available at the link below
http://blog.acrowire.com/td-webinar
2
Ted Theodoropoulos, President
Michael Milutis, Director of Marketing
Computer Aid, Inc. (CAI)
President of Acrowire Technology Consulting
• Application Development
• Business Process Improvement
• ALM/Tech debt assessments
Programming since 1982
• TI-99/4a using BASIC
Microsoft SQL Server Team
10 years at Bank of America
• Development Team Manager
• IT Auditor
• Senior VP in Operational Risk
Undergrad in Mathematics & MBA from UNC
Six Sigma Black Belt/CSM/MCP
Ted Theodoropoulos
3
4
The Project Management Institute has accredited this webinar with PDUs
PDU CREDITS FOR THIS WEBINAR
Managing Technical Debt
ITMPI Webinar ▪ May 15, 2012
1. Introduction
6
1. Introduction
2. What is technical debt?
3. Opportunities and challenges
4. Business impacts
5. Foursquare case study
6. Managing the lifecycle
1. Introduction - Outline
7
2. What is technical debt?
8
2. What is technical debt?
Ward Cunningham
Invented the wiki in 1994
Coined the term at OOPSLA in 1992
Technical Debt includes those internal things that you choose not to do now, but which will impede future development if left undone. This includes deferred refactoring.
Technical Debt doesn't include deferred functionality, except possibly in edge cases where delivered functionality is "good enough" for the customer, but doesn't satisfy some standard (e.g., a UI element that isn't fully compliant with some UI standard).
Evolution
9
Jeff Sutherland
Cofounder of Scrum
Opined at Scrum Gathering in 2006
Described the following technical debt scenarios:
1. The code is considered part of a core legacy system, in which its functionality is connected to so many other parts of the system that it’s impossible to isolate any one component.
2. There is either no testing or minimal testing surrounding the code. Although it may sound redundant, it is necessary to point out that without comprehensive unit tests, it is impossible to refactor the code to a more manageable state.
3. There is highly compartmentalized knowledge regarding the core/legacy system, supported by only one or two people in the company.
2. What is technical debt? - Evolution
10
Steve McConnell
Author/Software Engineer
Proposed First Taxonomy in 2007
I. Debt incurred unintentionally due to low quality work
II. Debt incurred intentionally
II.A. Short-term debt, usually incurred reactively, for tactical reasons
II.A.1. Individually identifiable shortcuts (like a car loan)
II.A.2. Numerous tiny shortcuts (like credit card debt)
II.B. Long-term debt, usually incurred proactively, for strategic reasons
2. What is technical debt? - Evolution
11
Martin Fowler
Author/Software Engineer
Established TD Quadrants in 2009
2. What is technical debt? - Evolution
12
Gartner, Inc.
IT Research and Advisory
Estimated $500 Billion of “IT Debt” in 2010
Gartner Estimates Global 'IT Debt' to Be $500 Billion This Year, with Potential to Grow to $1 Trillion by 2015
“The issue is not just that maintenance keeps on getting deferred, it is that the lack of an application inventory and the absence of a structured review process for the application portfolio means the IT management team is simply never aware of the true scale of the problem.”
2. What is technical debt? - Evolution
13
Ted Theodoropoulos
Technical Debt Practitioner
Proposed “Stakeholder Perspective” at SEI in 2011
“Technical debt is any gap within the technology infrastructure, or its implementation, which has a material impact on the required level of quality.”
2. What is technical debt? - Current
14
2. What is technical debt? - Stakeholder Perspective
[Diagram: stakeholders surrounding the Technical Environment]
Internal: Business Executives, Risk Managers, Board of Directors, Development Team, Infrastructure Team, Internal Auditors
External: Customers, Shareholders, Regulators, External Auditors, Analysts
Stakeholders need better transparency and engagement around issues affecting quality in the technical environment.
15
2. What is technical debt? - Quality Requirements
• Gaps impacting required levels of quality represent technical debt
• Teams can “borrow” against the ideal solution to speed initial delivery
• Interest is then paid in the form of lower productivity and/or incremental risk
• Maintenance and enhancement activities become more onerous and expensive
• Interest compounds as workarounds are applied on top of workarounds
16
2. What is technical debt? - Deficits and Surpluses
• The yellow section shows the gap in maintainability on which interest is paid
• Conversely, blue represents unneeded functionality that must be maintained
• Deficits and surpluses in application quality cost the organization money
• Ideally, the green area would fill the area within the dashed line
17
3. Opportunities and Challenges
18
3. Opportunities and Challenges - Opportunities
void clearIECache()
{
    ClearFolder(new DirectoryInfo(
        Environment.GetFolderPath(Environment.SpecialFolder.InternetCache)));
}
$658,031.35
Stakeholder
19
Translation
• Business leaders always want to build new stuff
• Quantifying gaps in dollars levels the playing field
• Getting the business to recognize the value of refactoring is difficult
• New initiatives can be prioritized based on ROI against debt reduction
3. Opportunities and Challenges - Opportunities
20
Prioritization
Know what is beneath the surface!
3. Opportunities and Challenges - Opportunities
21
Transparency
Know what is beneath the surface!
3. Opportunities and Challenges - Opportunities
22
Risk Management
No Conceptual Model
23
3. Opportunities and Challenges - Challenges
Quality Debt
Refactoring Debt
Pairing Debt
Design Debt
Configuration Management Debt
Platform Experience Debt
SEO Debt
Documentation Debt
Access Control Debt
Data Quality Debt
Legacy Debt
“Cruft is technical debt!”
-Ted Theodoropoulos
“Cruft isn’t technical debt!”
-Uncle Bob Martin
Testing Debt
Vendor Support Debt
Concept Fragmentation
24
3. Opportunities and Challenges - Challenges
Unknown Future State
25
3. Opportunities and Challenges - Challenges
• No standards organization currently manages the concept
• Uncertainty around where technical debt is headed
• Adoption will be hampered by this uncertainty
• SEI is leading efforts to move the concept forward
4. Business Impacts
26
27
4. Business Impacts - Platform Stability
• Technical debt is often code that is fragile or difficult to maintain
• It has a destabilizing effect on production systems
• This type of technical debt decreases agility and increases defects
• Increases risk of production issues with customer impact
• Decreases ability to seize market opportunities
• Increases fire drills, which impacts morale
• Lower employee satisfaction makes talent retention challenging
28
4. Business Impacts - Cost of Change
• Technical debt typically compounds over time
• This phenomenon increases the cost of change (CoC) exponentially
• Customer responsiveness is inversely proportional to CoC
29
4. Business Impacts - Technical Bankruptcy
• Unabated technical debt leads to ballooning interest payments
• Over time, the interest payments become all-consuming
• First, there are no resources available for enhancements
• Then, interest payments exceed the available resources
• This is known as technical bankruptcy
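The slide toward technical bankruptcy can be sketched as a small simulation. This is an illustrative model only, not something taken from the presentation: the function name, the fixed per-period growth rate, the constant team capacity, and the sample figures are all assumptions.

```python
from typing import Optional


def periods_until_bankruptcy(
    initial_interest: float,
    growth_rate: float,
    capacity: float,
    max_periods: int = 1000,
) -> Optional[int]:
    """Return the first period in which interest consumes all capacity.

    Assumption (hypothetical): interest compounds at a fixed rate each
    period and the team's capacity stays constant.
    """
    interest = initial_interest
    for period in range(max_periods + 1):
        if interest >= capacity:
            return period
        interest *= 1 + growth_rate
    return None  # no bankruptcy within the simulated horizon


# Hypothetical team: 100 hours per sprint, 10 of them already spent on interest,
# with interest growing 10% per sprint.
print(periods_until_bankruptcy(initial_interest=10, growth_rate=0.10, capacity=100))
```

Under these made-up numbers the team runs out of capacity after 25 sprints. Lowering the growth rate, for example by paying down principal, pushes that point out.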
5. Foursquare Case Study
30
5. Foursquare Case Study - Background
31
• In spring 2011, Amazon had a major outage in AWS
• Multiple availability zones (AZs) were impacted
• While the outage was disappointing, it did not violate the SLA
• As Gartner points out below, there were no SLAs for the impacted services
Amazon’s SLA for EC2 is 99.95% for multi-AZ deployments. That means that you should expect that you can have about 4.5 hours of total region downtime each year without Amazon violating their SLA. Note, by the way, that this outage does not actually violate their SLA. Their SLA defines unavailability as a lack of external connectivity to EC2 instances, coupled with the inability to provision working instances. In this case, EC2 was just fine by that definition. It was EBS and RDS which weren’t, and neither of those services have SLAs.
5. Foursquare Case Study - Architecture
32
• Amazon is an infrastructure as a service (IaaS) provider
• IaaS consumers can design applications as they see fit
• Individual requirements dictate architecture
• If an app requires high availability (HA), the design must accommodate it
• Failing to satisfy requirements introduces risk into the environment
• Foursquare replicated across AZs instead of across data centers
• Best practices for HA were not followed
5. Foursquare Case Study - Technical Debt
33
• Implementing full redundancy is not cheap
• Startup capital is a scarce resource and must be used wisely
• Replicating across AZs was cheaper than across data centers
• This architecture created a requirements gap, which represents debt
• The principal of the technical debt is the cost to provide full HA
• The interest takes the form of the incremental risk
5. Foursquare Case Study - Debt Calculation
34
Incremental Risk: 4% - 0.5% = 3.5%
Cost of Failure: $1,000,000
Interest: 3.5% × $1,000,000 = $35,000
• Based on optimal design, the risk of an event is 0.5%
• Design shortcuts increased the risk to 4%
• The incremental risk associated with the design is 3.5%
• If an outage occurs, there is damage to brand and investor confidence
• Additionally, there will be lost users and market share
• The estimated cost of such an event is $1M
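The interest figure on this slide is the incremental risk multiplied by the cost of failure. A minimal sketch of that arithmetic follows. The function and parameter names are illustrative assumptions; the numbers are the ones from the slide.

```python
def debt_interest(actual_risk: float, optimal_risk: float, cost_of_failure: float) -> float:
    """Interest on technical debt, modeled as incremental risk times cost of failure."""
    incremental_risk = actual_risk - optimal_risk
    return incremental_risk * cost_of_failure


# Slide figures: 4% risk as built, 0.5% risk for the optimal design, $1M cost of failure.
interest = debt_interest(actual_risk=0.04, optimal_risk=0.005, cost_of_failure=1_000_000)
print(f"Interest: ${interest:,.0f}")  # prints "Interest: $35,000"
```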
5. Foursquare Case Study - Prudent Debt
35
Principal: $100,000
Interest: $35,000
Return on Investment: 35%
• Technical debt can be leveraged responsibly, just like financial debt
• Assume the appropriate design costs an additional $100K to implement
• That relatively large investment would eliminate $35K in risk
• Such an investment would provide a 35% ROI
• Each dollar invested would give $0.35 back to the business
• Currently, paying off the debt might be a questionable use of capital
5. Foursquare Case Study - Imprudent Debt
36
Principal: $5,000
Interest: $35,000
Return on Investment: 700%
• Sometimes the risk/reward equation is out of balance
• Assume the appropriate design costs an additional $5K to implement
• That relatively small investment would eliminate the same $35K in risk
• Such an investment would provide a 700% ROI
• Each dollar invested would give $7 back to the business
• Currently, paying off the debt would be a wise use of capital
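The prudent and imprudent scenarios differ only in the principal. In both, ROI is the interest eliminated divided by the principal paid. A short sketch using the slide figures follows. The function name and the dictionary labels are assumptions made for illustration.

```python
def debt_roi(principal: float, interest: float) -> float:
    """ROI of paying off debt: interest (risk) eliminated per dollar of principal."""
    if principal <= 0:
        raise ValueError("principal must be positive")
    return interest / principal


# Slide figures: the same $35K of interest against two different principals.
scenarios = {"prudent": 100_000, "imprudent": 5_000}
for label, principal in scenarios.items():
    print(f"{label}: {debt_roi(principal, 35_000):.0%}")
# prints "prudent: 35%" then "imprudent: 700%"
```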
“Amazon EC2 Outage Hobbles Websites”
-Information Week
“Amazon Server Troubles Take down Reddit, Foursquare & Hootsuite”
-Mashable
“Amazon Server Outage Blanks Popular Websites”
-Fox News
“Massive failure at Amazon Web Services causes havoc…”
-GeekWire
37
5. Foursquare Case Study - Initial Focus
“Amazon’s Web Services outage: End of cloud innocence?”
-ZDNet
“Amazon Malfunction Raises Doubts About Cloud Computing”
-NY Times
“Failing to plan is planning to fail”
-Acrowire
“The AWS story shows how important it is to think about engineering when you're designing systems for the cloud.”
-DataPipe
“Lessons from a cloud failure: It’s not Amazon, it’s YOU!”
-Webmonkey
38
5. Foursquare Case Study - Retrospective Focus
“In short, if your systems failed in the Amazon cloud this week, it wasn't Amazon's fault,”
-O’Reilly Media
“We designed for failure from day one. Any of our instances, or any group of instances in an AZ, can be ‘shot in the head’ and our system will recover.”
-SmugMug
“Why were some websites impacted while others were not? For Netflix, the short answer is that our systems are designed explicitly for these sorts of failures.”
-Netflix
6. Manage the Lifecycle
39
6. Manage the Lifecycle - Lifecycle Phases
40
Technical Debt
6. Manage the Lifecycle - Define
41
• Define what qualifies as technical debt in your organization
• Think through the implications of the defined boundaries
• The process must be collaborative, not done in a vacuum
• Will key stakeholders (e.g., audit, risk management, IT) buy into it?
[Diagram: stakeholders surrounding the Technical Environment]
Internal: Business Executives, Risk Managers, Board of Directors, Development Team, Infrastructure Team, Internal Auditors
External: Customers, Shareholders, Regulators, External Auditors, Analysts
6. Manage the Lifecycle - Define
42
Framework Alignment
6. Manage the Lifecycle - Identify
43
Signs you might have it…
• Don’t we have documentation on the file layouts?
• I thought we had a test for that!
• If I change X, it is going to break Y… I think.
• Don’t touch that code. The last time we did, it took weeks to fix.
• The server is down. Where are the backups?
• Where is the email about that bug?
• We can’t upgrade. No one understands the code.
6. Manage the Lifecycle - Identify
44
Establish KRIs
Leading Indicators
• Poor code quality
• Inadequate code coverage
• Non-compliance
Lagging Indicators
• SLA violations
• Audit failures
• Data quality issues
6. Manage the Lifecycle - Measure
45
Calculating Principal
n = Number of resources required
R = Rate (hourly average) of resource
H = Hours required
C = Costs associated with benefits, payroll, recruitment (usually ~40% of hourly rate)
HC = Hardware Costs
SL = Software Licenses
MI = Migration and Implementation expenses (e.g., consulting engagements, training, etc.)
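The variables above can be combined into a principal estimate in more than one way, and the slide's own formula is not reproduced in this transcript. The sketch below is one plausible reading, offered as an assumption rather than as the presenter's equation: loaded labor cost n × H × (R + C), with C treated as an hourly overhead amount, plus the hardware, license, and migration costs. All sample figures are hypothetical.

```python
def debt_principal(
    n: int,           # number of resources required
    R: float,         # average hourly rate per resource
    H: float,         # hours required per resource
    C: float,         # hourly overhead (benefits, payroll, recruitment)
    HC: float = 0.0,  # hardware costs
    SL: float = 0.0,  # software licenses
    MI: float = 0.0,  # migration and implementation expenses
) -> float:
    """Estimate principal as loaded labor plus one-off costs.

    ASSUMPTION: this combination of terms is illustrative; the original
    formula is not available in the transcript.
    """
    labor = n * H * (R + C)
    return labor + HC + SL + MI


# Hypothetical example: 2 engineers, $100/hour, 160 hours each, $40/hour overhead
# (about 40% of the hourly rate), plus one-off costs.
print(debt_principal(n=2, R=100, H=160, C=40, HC=5_000, SL=3_000, MI=10_000))
```

With these made-up inputs the estimate comes to 62,800: 44,800 of loaded labor plus 18,000 of one-off costs.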
6. Manage the Lifecycle - Remediate
46
ROI
Prioritization
• Refactoring initiatives can be evaluated
• Quantifying gaps in dollars levels the playing field
• Getting the business to recognize the value of refactoring is difficult
• New initiatives can be prioritized based on ROI against debt reduction
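Once debt reduction and new initiatives are both expressed in dollars, they can be ranked on the same ROI scale. The sketch below shows one way to do that. The `Initiative` class, the `prioritize` helper, and every backlog entry are hypothetical and serve only to illustrate the ranking.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Initiative:
    name: str
    cost: float     # investment required (principal, for debt reduction)
    benefit: float  # expected return (interest eliminated, for debt reduction)

    @property
    def roi(self) -> float:
        return self.benefit / self.cost


def prioritize(initiatives: List[Initiative]) -> List[Initiative]:
    """Rank new work and debt reduction together, highest ROI first."""
    return sorted(initiatives, key=lambda item: item.roi, reverse=True)


# Hypothetical backlog mixing a new feature with two debt-reduction options.
backlog = [
    Initiative("New feature", cost=50_000, benefit=60_000),
    Initiative("Small refactoring", cost=5_000, benefit=35_000),
    Initiative("Full HA redesign", cost=100_000, benefit=35_000),
]
for item in prioritize(backlog):
    print(f"{item.name}: {item.roi:.0%}")
```

In this made-up backlog the small refactoring ranks first at 700%, ahead of the new feature at 120% and the full redesign at 35%.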
6. Manage the Lifecycle - Govern
47
Capital Structure
• Evaluate free cash flow volatility over time
• Determine appropriate technical debt to equity ratio
• Monitor your technical balance sheet diligently
• Establish centralized debt registration database
• Implement credit limits for high risk areas of the infrastructure
• Foster a risk management culture within the organization
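A centralized debt register with per-area credit limits could look something like the sketch below. The presentation does not describe an implementation, so the `DebtRegister` class, its methods, the area names, and the limits are all hypothetical.

```python
from collections import defaultdict
from typing import Dict


class DebtRegister:
    """In-memory register that tracks debt principal by infrastructure area."""

    def __init__(self, credit_limits: Dict[str, float]) -> None:
        self._limits = dict(credit_limits)  # areas without an entry are unlimited
        self._balances: Dict[str, float] = defaultdict(float)

    def balance(self, area: str) -> float:
        return self._balances[area]

    def register(self, area: str, principal: float) -> bool:
        """Record new debt; refuse it if the area's credit limit would be exceeded."""
        limit = self._limits.get(area, float("inf"))
        if self._balances[area] + principal > limit:
            return False
        self._balances[area] += principal
        return True


# Hypothetical high-risk area with a $50K credit limit.
register = DebtRegister({"payments": 50_000})
print(register.register("payments", 30_000))  # accepted
print(register.register("payments", 25_000))  # rejected: would exceed the limit
print(register.balance("payments"))
```

A rejected registration is the point at which the governance process would escalate: either pay down existing debt in that area or get explicit approval to raise the limit.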
48
Questions?
49
Ted Theodoropoulos, President
Michael Milutis, Director of Marketing
Computer Aid, Inc. (CAI)