IS 2540
IT Governance:
A Practical Transition Strategy
Based on:
The Visible Ops Handbook, Behr, Kim & Spafford, 2005
All figures from Visible Ops
Visible Ops
• Based on extensive observation / data collection from hundreds of IT organizations
• Goal was to discover what IT practices distinguish high-performing organizations
• Employs a benchmarking approach
• Main IT success factors
  – Pervasive change management practices
  – Understanding of cause-effect relationships
  – Use of effective & auditable controls
• IT management based on facts not intuition or gut feel
• 80% of outages due to operator or application errors
• IT organizational culture problems
  – Bureaucratic change management (CM)
  – ‘End runs’ around CM
  – Delusional agility of cowboy culture
  – Control isn’t possible, so page the IT firefighters
  – Firefighting in lieu of fire prevention
  – Auditors see chaos, so they push for more controls
  – IT doesn’t know which controls to implement
    • Does COBIT have the right controls?
    • In which order should they be implemented?
• Symptoms of effective IT
  – High service levels / availability
    • High MTBF and low MTTR
  – Lots of changes, successfully implemented
    • 100s-1000s / week
    • > 99% successful
  – Invest in early phases of IT processes
    • Lowers cost of defect repair
    • Sound familiar?
  – Process integration between organizational units
    • Leads to collaborative working relationships
  – Compliance-oriented
    • Relevant controls in place & working
    • Controls documented & easily verified
  – Low % of unplanned work
    • < 5% spent on unplanned / urgent work
    • Frees up resources for fire prevention
    • Sound familiar?
  – Huge leverage with respect to IT assets & human resources
    • Server:SysAdmin > 100:1 (5X the average)
    • Process effectiveness leads to higher productivity
    • Sound familiar?
• Common culture of high-performing organizations
  – Change management
    • Viewed as absolutely critical
    • Not viewed as bureaucratic
    • All changes must be successful
  – Causality
    • 80% of outages due to changes
    • 80% of MTTR spent finding the change that caused the outage
    • By analyzing changes, first fix is 90% effective
  – Continual optimization
    • Discover root causes of IT problems
    • Prevent problems before they happen
    • Highest level of compliance with least effort to maintain compliance
    • Sound familiar? (Quality is free!)
• Common IT processes
  – None succeeded due to COBIT or ITIL!
  – Each rediscovered good practices
  – Each developed their own terminology
  – Causes communication problems
  – Visible Ops standardizes process terminology
• Visible Ops standardizes processes with respect to the ITIL process framework
  – Release
  – Control
  – Resolution
  – Relationship
  – Service delivery
• Successful companies used 3 out of 5
  – Release
    • Invest your efforts in pre-production activities
    • Plan, build, design, configure before release
  – Control
    • Control to prevent service disruptions
    • Effective controls allow greater agility, not less
  – Resolution
    • Minimize rework efforts & downtimes
    • Requires cause-effect knowledge
    • Frees resources for release & control activities
• Other success factors
  – Controls are visible to management, security & auditors
  – Effective CM must address human factors
  – Rebuilding is easier than repairing
• Visible Ops key to success
  – Make transition short, easy & practical
    • Multi-year death marches don’t work
      – Could lose management sponsorship
      – Staff will circumvent
  – Use fewest # of processes possible
  – Implement 3 of 4 processes within 90 days
  – Process projects are
    • Definitive, with a clearly defined objective
    • Ordered to build on previous phase
    • Catalytic, to free up more resources than they use
    • Auditable, to create ongoing documentation of controls
    • Sustaining, by creating value for the enterprise
• 4 Visible Ops phases
  – Stabilize The Patient
  – Catch & Release and Find Fragile Artifacts
  – Establish Repeatable Build Library
  – Enable Continuous Improvement
• Stabilize The Patient
  – Medical triage for IT
  – Goal
    • Reduce unplanned work to < 25%
    • Frees resources for more productive work
  – Symptoms
    • Unplanned work 35-45% on average, can exceed 65%. Sound familiar?
    • IT creates most of its own problems
    • Most of downtime spent diagnosing cause of problem, only 20% spent in actual repair
    • Don’t know who made change or why
    • Changes undo other changes
    • Lack of confidence in IT
• For each fragile IT asset
  – Reduce / eliminate access
  – No changes unless explicitly authorized
  – Communicate change lockdown to stakeholders
  – Allow change only during specified time window
  – Enforce / reinforce CM process
  – Effective CM plays critical role in stabilizing IT
  – Responsibility & accountability for everyone
  – Use automated detection tools like Tripwire to ID unauthorized changes
    • For each unauthorized change
      – Who did it?
      – What was changed?
      – Can it be rolled back?
      – How to prevent recurrence?
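The detection step above can be illustrated with a minimal file-integrity sketch. This is not how Tripwire itself is implemented; it only shows the general idea of comparing a trusted baseline of file digests against the current state. The function names `snapshot` and `detect_changes` are hypothetical, chosen for this example.

```python
import hashlib
from pathlib import Path


def snapshot(root: Path) -> dict[str, str]:
    """Map each file under root (by relative path) to its SHA-256 digest."""
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def detect_changes(baseline: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Compare two snapshots and report added, removed, and modified files."""
    common = baseline.keys() & current.keys()
    return {
        "added": sorted(current.keys() - baseline.keys()),
        "removed": sorted(baseline.keys() - current.keys()),
        "modified": sorted(p for p in common if baseline[p] != current[p]),
    }
```

A report like this answers only "what was changed?". Who made the change, and whether it was authorized, would still have to be reconciled against the change request records.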
• CM key to success is
  – Create culture of accountability
  – Enforce maintenance windows
  – Manage by facts, not beliefs
  – # of acceptable unauthorized changes = 0
  – Eliminating changes decreases outages, reducing the amount of unplanned work
  – Frees up resources for productive work
  – Create a Change Advisory Board (CAB) to manage changes
    • Accept that business events cause IT change events
    • All major IT groups on CAB + senior management
    • Create emergency change procedure & use it sparingly
  – Implement change request tracking system
    • Document & track all changes throughout their lifecycle
    • Automated tools are available
    • Collect change control metrics & generate reports
  – CAB weekly meetings to authorize changes
    • Goal is maximum effectiveness with minimum bureaucracy
    • Use meeting agenda template (pp. 33-34)
  – For each change request do complete analysis of impacts
    • Who, What, When, How, & What If questions
    • Rank requests by priority
    • Identify change dependencies
    • Major risks involved
    • Rollback strategy
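As a rough sketch of the analysis steps above, the following orders change requests so that dependencies are scheduled first and, among those ready, the highest priority goes first. It also rejects any request with no rollback strategy. The `ChangeRequest` fields, the priority scale (1 = most urgent), and the `schedule` function are assumptions made for this example, not something the handbook prescribes.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class ChangeRequest:
    change_id: str
    priority: int                      # assumed scale: 1 = most urgent
    rollback_plan: str = ""
    depends_on: list[str] = field(default_factory=list)


def schedule(requests: list[ChangeRequest]) -> list[str]:
    """Return change IDs with dependencies first, then by priority."""
    by_id = {r.change_id: r for r in requests}
    missing = [r.change_id for r in requests if not r.rollback_plan.strip()]
    if missing:
        raise ValueError(f"no rollback plan for: {missing}")
    unknown = {d for r in requests for d in r.depends_on} - set(by_id)
    if unknown:
        raise ValueError(f"unknown dependencies: {sorted(unknown)}")

    # Map each change to the changes that must be implemented before it.
    sorter = TopologicalSorter({r.change_id: r.depends_on for r in requests})
    sorter.prepare()                   # raises CycleError on circular dependencies

    order, ready = [], []
    while sorter.is_active():
        ready.extend(sorter.get_ready())
        ready.sort(key=lambda cid: (by_id[cid].priority, cid))
        next_id = ready.pop(0)         # most urgent change that is unblocked
        order.append(next_id)
        sorter.done(next_id)
    return order
```

For example, an urgent application change that depends on a lower-priority database change is scheduled after the database change, but ahead of unrelated lower-priority work.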
  – Effective CM
    • Post-implementation reviews
    • Measure success rate & learn from it
    • Everyone attends meetings
    • Document all change outcomes
  – Ineffective CM
    • Authorize changes without rollback plan
    • Rubber stamping
    • Outright waivers
  – Primary reasons for any process failure are
    • Lack of accountability
    • Lack of strong management support
  – General perception of nimbleness & agility is a delusion
• Stabilize The Patient benefits
  – Higher availability
  – Less firefighting
  – Higher change success rate
  – CM process that’s efficient & effective
  – Increased MTBF due to change windows
  – Decreased MTTR due to CM
  – Increased individual accountability
  – Improved organizational communication
• Phase 2: Catch & Release / Find Fragile Artifacts
  – Create & maintain inventory of IT assets (esp. production assets)
  – Symptoms
    • How to start building a CMDB?
    • Knowledge is individual, not organizational
    • Uncontrolled changes cause unknown configuration states
    • Explosion in # of configurations
  – Tasks
    • Senior staff to inventory all managed assets
    • Thoroughly document all assets (p. 42 has checklist of questions)
    • Tag the fragile assets
      – ID those requiring most unplanned work
      – ‘Do Not Touch’
      – Focus efforts on unstable assets
    • Prevent new builds until inventory completed
      – Exceptions only via CAB
  – Benefits
    • Service catalog documenting most critical services being supported
    • CMDB containing all configuration items (CIs)
      – Supports queries / ad hoc reporting based on metrics
    • Prioritized list of projects to replace fragile assets
    • More organizational learning
• Phase 3: Repeatable Build Library
  – Create library of repeatable builds focusing first on fragile configurations
  – A datacenter of Golden Images
  – Enables replace instead of repair
  – Symptoms
    • Configurations are unique, irreplaceable works of art
    • Production configurations evolve, rendering release configuration obsolete
    • More configurations require more specialized knowledge about each configuration
    • Patches cause crashes
    • Patches not incorporated into builds
  – Create release management team
    • Operate earlier in cycle to reduce defects in production
    • Engineer repeatable builds
      – Require constant time to rebuild
      – Reduces configuration variance
      – Junior staff does the builds
      – Frees senior staff for more proactive tasks
    • Goal is to reduce # of configurations while increasing their shelf life
  – Create repeatable build process
    • Generates Golden Builds (master images)
      – Thoroughly planned, tested and approved
      – Kept current with new patches & upgrades
      – Stored in definitive software library (DSL)
        » Along with associated assets (documentation, licenses, keys, etc.)
      – DSL is the software Fort Knox
  – Creating a DSL
    • ID lowest common IT asset denominators
      – Operating systems, applications, business rules & data
    • Create build catalog of components that must be standardized
    • Create a repeatable build process for each item in catalog
    • Isolate build network from other networks
    • Place master builds in DSL
    • Keep master builds current
    • Designate a DSL manager
    • Create a DSL approval process for submitting master builds
    • Keep all copies under revision control
    • Initial 1 year amnesty for all running applications
      – Replace with certified master builds as they become available
    • Weed out unnecessary master builds
  – Establish acceptance process between production and release teams
    • Release team designs and builds configurations
    • Production team accepts and deploys
    • Production must get CAB approval prior to deployment
      – Can’t put any configuration into production unless accepted by production team
    • Production only tests configurations in DSL
  – For security reasons, developers not part of build process
    • Could insert malicious code
  – Patching
    • Belongs in release management
    • Patch and Pray to be avoided
    • Successful IT organizations patch less often
    • Apply / test patches before releasing to production
    • Patch to production system may be overwritten by subsequent build
    • Use detective control tools to ensure build integrity
  – Benefits
    • Build library cuts unplanned work to < 15%
    • Release management team with well defined roles
    • Process for repeatable builds
    • Can repair by rebuilding
    • Free up senior staff resources
    • Tighter integration between release & production
    • Reduced # of configurations
    • Reduced patch risks
• Phase 4: Continual Improvement
  – Goal is to collect & use metrics to improve performance
  – Simply adopting best practices = competitive parity…not good enough
  – Can’t manage what you can’t measure
    • Sound familiar?
  – Key IT metric is availability
    • MTBF & MTTR
    • Affected by factors in release & controls
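The link between availability, MTBF and MTTR can be made concrete with the standard steady-state relation availability = MTBF / (MTBF + MTTR). The sketch below assumes the only input is an observation period and a list of outage durations, both in hours. The function name `availability_metrics` is hypothetical, and MTBF is taken here as mean uptime per failure.

```python
def availability_metrics(period_hours: float, outage_hours: list[float]) -> dict[str, float]:
    """Derive MTBF, MTTR, and availability from an outage log."""
    if not outage_hours:
        # No failures observed: MTBF is unbounded for this period.
        return {"mtbf": float("inf"), "mttr": 0.0, "availability": 1.0}
    downtime = sum(outage_hours)
    failures = len(outage_hours)
    mttr = downtime / failures                      # mean time to repair
    mtbf = (period_hours - downtime) / failures     # mean uptime per failure
    return {"mtbf": mtbf, "mttr": mttr, "availability": mtbf / (mtbf + mttr)}
```

With a 720-hour month and three outages of 2, 4, and 6 hours, MTTR is 4 hours, MTBF is 236 hours, and availability is 236 / 240, or about 98.3%. Raising MTBF or lowering MTTR both push availability up.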
  – Release
    • Are we efficient at provisioning infrastructure?
  – Controls
    • Are we making good change management decisions?
  – Resolution
    • Are we quickly diagnosing and fixing problems?
  – IT needs metrics for all 3 process areas
  – Release metrics
    • Time to provision good build
    • # of build revisions before accepted
    • Build shelf life
    • % systems that are good builds
    • % builds with security sign-off
    • # builds rushed into production
    • Release Engineers : SysAdmin ratio
      – Higher is better
  – Controls metrics
    • # authorized changes / week
    • # actual changes / week
      – Should equal # authorized
    • # unauthorized changes
      – Should be zero
    • Change success rate
      – Should be > 99%
    • # outages
    • # emergency changes per CAB
    • # ‘special’ changes outside CAB
    • # ‘business as usual’ changes
    • CM overhead in man-hours
    • Changes submitted vs. reviewed
  – Resolution metrics
    • MTTR
    • MTBF
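Several of the controls metrics listed above can be computed by reconciling the changes the CAB authorized against the changes actually observed in production. The sketch below is one possible way to do that. The `ObservedChange` record, the `controls_metrics` function, and the idea that both sources share a change ID are assumptions for this example. The targets (zero unauthorized changes, success rate above 99%) come from the slides.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ObservedChange:
    change_id: str       # assumed to match the ID in the change tracking system
    succeeded: bool


def controls_metrics(authorized_ids: set[str], observed: list[ObservedChange]) -> dict:
    """Reconcile authorized vs. actual changes for one reporting period."""
    unauthorized = sorted(c.change_id for c in observed if c.change_id not in authorized_ids)
    success_rate: Optional[float] = None
    if observed:
        success_rate = sum(c.succeeded for c in observed) / len(observed)
    return {
        "authorized": len(authorized_ids),
        "actual": len(observed),
        "unauthorized": unauthorized,
        "success_rate": success_rate,
        # Targets from the slides: no unauthorized changes, > 99% success.
        "meets_targets": bool(
            not unauthorized and success_rate is not None and success_rate > 0.99
        ),
    }
```

Note that matching counts alone are not enough: three authorized and three actual changes can still hide an unauthorized change if one authorized change was never implemented, which is why the sketch compares IDs and not totals.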