ITIL Best Practice for Software Companies
DESCRIPTION
A detailed outline of the Information Technology Infrastructure Library (ITIL), a set of practices for IT service management (ITSM) that focuses on aligning IT services with business needs, here applied to software companies.
TRANSCRIPT
Introduction to Service Management
ITIL Services for Software Companies
Daniel Brody © 2014
Service Components
Process Orientated Working
Incident Management
Problem Management
Service Level Management
Change Management
What is ITIL?
IT Infrastructure Library
• A set of books describing a code of best practice for IT service provision
• UK Government
• First edition – Late 80’s
• Revised in 2000
• Non-proprietary
• Platform independent
ITIL
Two Books
Service Delivery
Service Support
ITIL’s Service Management
Service Support
Focuses on the day to day operation and support of IT Services
Service Delivery
Focuses on long term planning and improvement of IT Service provision
ITIL Publications
Planning to Implement Service Management
The Business Perspective
Application Management
ICT Infrastructure Management
Security Management
Service Support
Service Delivery
[Diagram: the ITIL publication framework. Service Management (Service Support and Service Delivery) sits between The Business (The Business Perspective) and The Technology (ICT Infrastructure Management), together with Planning to Implement Service Management, Applications Management and Security Management.]
IT Service Management….
Foundation Support
Configuration Management
Mission
To identify, control and audit the information required to manage IT services by defining and maintaining a database of controlled items, their status, lifecycles and relationships and any information needed to manage the quality of IT services cost effectively
Asset Management
vs
Configuration Management
Objectives
• Identify & record management information
• Account for all IT assets & configurations
• Control the information in the database
• Ensure that information reflects reality
• Provide a basis for management
• Provide status of components
Key Terms
Configuration
• Hardware
• Software
• Documentation
• Communication
• Environmental Equipment
• Staff
Anything that needs to be controlled
Configuration Item (CI)
[Diagram: a PC made up of Keyboard, Monitor and System Unit; the System Unit made up of CD-ROM, Stiffy Drive, Memory, Hard Drive and CPU.]
• A component within a configuration
• A configuration in its own right
How low do you go???
• CI Levels
– Lowest level of independent change
– Who are you and what are you doing?
– Information value vs collection effort
Attributes
Memory
Key Terms
• Relationship
– Primary
– Secondary
• Baseline
– Snapshot of a CI at a time or stage
• Variant
– A baseline with minor differences
• Model, Version and Copy numbers
– Type
– Unique
– Version / Copy
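As a minimal sketch of how these terms might map onto a data structure, a CI can hold its own attributes plus its component CIs, and a baseline can be taken as a frozen copy. The class, field and function names are illustrative assumptions, not ITIL definitions.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class ConfigurationItem:
    """A CI: a component of a configuration, and possibly a configuration itself."""
    name: str
    attributes: dict = field(default_factory=dict)   # e.g. model, version, copy number
    components: list = field(default_factory=list)   # child CIs

    def walk(self):
        """Yield this CI and every CI beneath it, depth first."""
        yield self
        for child in self.components:
            yield from child.walk()


def take_baseline(ci):
    """Return a snapshot of a CI that later edits to the live CI cannot alter."""
    return copy.deepcopy(ci)


# Hypothetical example loosely mirroring the PC breakdown on the slide.
memory = ConfigurationItem("Memory", {"version": "1"})
system_unit = ConfigurationItem("System Unit", components=[memory])
pc = ConfigurationItem("PC", components=[ConfigurationItem("Keyboard"), system_unit])

baseline = take_baseline(pc)
memory.attributes["version"] = "2"   # a later change to the live CI
```

After the change, the baseline still records version "1" for Memory, which is what makes it usable as a reference point.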
Life Cycles
• Stages in the life of a CI
• Allow CIs to be moved and tracked
Ordered
Delivered
Set up
Installed
Withdrawn
Maintenance
Key Activities
Stages in Configuration Management
• Identification
• Control
• Status Accounting
• Verification / Audit
Identification
• Logical
– What items need to be recorded?
– What do we need to know about them?
• Physical
– Marking items that are under Configuration Management control
Logical & Physical
Basic Principles
• CIs must be uniquely identified
• Prominent & clearly visible
• Meaningful naming
• Copy numbers must be catered for
• Cater for growth
Control
• Information in the CMDB
– Access
– Changes
– Adding new items
• To achieve control
– Agree and freeze CI specification
– Only allow changes through change management
Status Accounting
• Uses lifecycles and attributes
• Records and reports on
– Current data
– Historical data
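A small sketch of status accounting, assuming each CI keeps a time-stamped history of lifecycle stages. The stage names come from the Life Cycles slide; the class names, the CI identifier and the dates are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum


class Stage(Enum):
    ORDERED = "Ordered"
    DELIVERED = "Delivered"
    SET_UP = "Set up"
    INSTALLED = "Installed"
    MAINTENANCE = "Maintenance"
    WITHDRAWN = "Withdrawn"


@dataclass
class StatusRecord:
    """Status accounting for one CI: current stage plus full history."""
    ci_name: str
    history: list = field(default_factory=list)  # (stage, timestamp) pairs, oldest first

    def move_to(self, stage, when):
        """Record that the CI entered a new lifecycle stage."""
        self.history.append((stage, when))

    @property
    def current(self):
        """Current data: the most recent stage, or None if nothing is recorded."""
        return self.history[-1][0] if self.history else None

    def stage_at(self, when):
        """Historical data: the stage the CI was in at a given time."""
        result = None
        for stage, timestamp in self.history:
            if timestamp <= when:
                result = stage
        return result


record = StatusRecord("PC-0001")  # hypothetical CI identifier
record.move_to(Stage.ORDERED, datetime(2014, 1, 6))
record.move_to(Stage.DELIVERED, datetime(2014, 1, 20))
record.move_to(Stage.INSTALLED, datetime(2014, 2, 3))
```

Keeping the whole history, rather than overwriting a single status field, is what lets the same record answer both current and historical queries.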
Verification & Audit
• Does the CMDB reflect reality?
• Accuracy is improved by
– Active rather than passive CMDB
– Automatic updating
– Integration with other processes
– Automatic checks
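An automatic check of whether the CMDB reflects reality could look roughly like the sketch below, which compares recorded CIs against an inventory discovered on the network. The function name, identifiers and attributes are made up for illustration.

```python
def audit_cmdb(cmdb, discovered):
    """Compare recorded CIs with what was actually found.

    Both arguments map a CI identifier to a dict of attributes.
    """
    missing = sorted(set(cmdb) - set(discovered))        # recorded but not found
    unrecorded = sorted(set(discovered) - set(cmdb))     # found but not recorded
    mismatched = {}
    for ci_id in sorted(set(cmdb) & set(discovered)):
        diffs = {
            key: (cmdb[ci_id].get(key), discovered[ci_id].get(key))
            for key in sorted(set(cmdb[ci_id]) | set(discovered[ci_id]))
            if cmdb[ci_id].get(key) != discovered[ci_id].get(key)
        }
        if diffs:
            mismatched[ci_id] = diffs   # attribute: (recorded value, found value)
    return {"missing": missing, "unrecorded": unrecorded, "mismatched": mismatched}


# Hypothetical data for illustration only.
cmdb = {"PC-0001": {"memory_gb": 4}, "PC-0002": {"memory_gb": 8}}
discovered = {"PC-0001": {"memory_gb": 8}, "PC-0003": {"memory_gb": 4}}
report = audit_cmdb(cmdb, discovered)
```

Each of the three result lists suggests a different follow-up: a missing CI may have been withdrawn without a record, an unrecorded CI may be an unauthorised change, and a mismatch may be either.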
The CMDB
[Diagram sequence: business functions (Accounts, Sales, Marketing, Manufacturing, Inventory, Purchasing, Distribution, HR) each hold their own information; feeding a shared Corporate Database yields Management Information. In the same way, the IT processes (Network, SLM, Change, Problem, Capacity, Financial, Service Desk, Performance) share a CMDB to produce Management Information. The CMDB may be a virtual CMDB built over underlying databases.]
Benefits, costs & problems
Benefits
• Accurate information & documentation on CIs
• Control of valuable CIs
• Legal Obligations
• Financial & expenditure planning
• Registration of Software Changes
• Contingency planning
• Improving Release Management
• Improved security
• Trending data
Costs
• Staff costs
– Initial audit
– Management
• HW & SW identification & level of control
• Number of users who have access
• Need for tailoring
• Diversity & quality of information
• Level of integration
Possible Problems
• Incorrect CI level
• Emergency changes
• Over-ambitious schedules
• Circumvention of procedures
• Manual systems
• Over expectation
• Isolated implementation
• Difficult without Change Management
• Difficult to cost justify
• No operational use of the system
Change Management
Mission
To manage all changes that could impact on IT’s ability to deliver services through a formal, centralised process of approval, scheduling and control to ensure that the IT Infrastructure stays aligned to business requirements with minimum risk
Objectives
• Manage the process of:
– Requesting changes
– Assessing changes
– Authorising changes
– Implementing changes
• Prevent unauthorised changes
• Minimise disruption
• Ensure proper research and relevant input
• Coordinate build, test and implementation
Scope
· Hardware
· System Software
· Communications Equipment and Software
· ‘Live’ Application Software
· All documentation, plans and procedures relevant to the running, support and maintenance of live systems
· Environmental Equipment
Key Terms
Key Terms and Roles
• Request for Change (RFC)
– Contains all necessary information to make the change
• Change Advisory Board (CAB)
– Assesses resource requirements and impact
– Advises the Change Manager
• CAB Emergency Committee (CAB/EC)
– Urgent changes
– 1-3 senior staff
• Forward Schedule of Change (FSC)
– Details of approved changes & dates
• Projected Service Availability (PSA)
– Best time for change to be implemented
• Change Model
– Pre-defined path
• Standard Change
– Pre-authorised change
Change Management Procedure
Normal Change Procedure (flowchart)
• Initiate change
• Filter requests (unsuitable requests are rejected)
• Allocate initial priority
• Urgent? If yes, go to the urgent procedure
• Change Model? If yes, go to the Change Model procedure
• Decide category: Minor, Significant or Major
– Minor: authorises and schedules change; reports action to CAB
– Significant: circulates RFCs to CAB members
– Major: refers RFC upwards; IT Director decides, then passes to CAB for actioning
• Assess impact and resources; confirm priority and schedule
• Authorised? (Yes / No)
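The routing decisions in the normal change procedure can be sketched as a single function. This is only an illustration: the `RFC` fields and the `route` function are hypothetical, and the order of the checks (urgent first, then Change Model, then category) is an assumption read off the flowchart.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RFC:
    """A Request for Change, reduced to the fields the routing needs."""
    summary: str
    urgent: bool = False
    change_model: Optional[str] = None   # name of a pre-defined Change Model, if any
    category: str = "Minor"              # "Minor", "Significant" or "Major"


def route(rfc):
    """Return the next step for an RFC that has passed the initial filter."""
    if rfc.urgent:
        return "urgent procedure"
    if rfc.change_model is not None:
        return "change model procedure"
    routes = {
        "Minor": "authorise and schedule; report action to CAB",
        "Significant": "circulate RFC to CAB members",
        "Major": "refer RFC upwards; IT Director decides, then CAB actions",
    }
    try:
        return routes[rfc.category]
    except KeyError:
        raise ValueError(f"unknown category: {rfc.category!r}") from None
```

For example, `route(RFC("replace core switch", category="Major"))` sends the request upwards, while an RFC flagged `urgent=True` bypasses categorisation and goes to the urgent procedure.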
Normal Change Implementation Procedure (flowchart, continued from Normal Change)
• Build change; devise testing & back out plans
• Independent testing (on failure, return to build)
• Co-ordinates implementation
• Working? If no, back out / refer back to CAB
• Update documentation
• Monitor / review change
• Successful? If yes, close; if no, return to start
Urgent Change Procedure (flowchart)
• Urgent CAB or CAB/EC meeting
• Assess impact, resource requirements and urgency
• Urgent? If no, go to the normal procedure
• Urgently prepares the change
• Time for test? If yes, urgent testing (on failure, return to preparation)
• Co-ordinates implementation
• Satisfactory? If no, implements back-out plans; change is referred back to CAB/EC
• Ensure records are brought up to date
• Review change
• Satisfactory? If yes, close; if no, return to start
Benefits, Costs & Problems
Benefits to Business
• Greater IT & business alignment
• Higher availability
• Increased productivity
• More communication – greater trust
• IT can handle more changes
• Balance between need for change & potential impact
Financial Benefits
• More accurate forecasting
• Better quality decisions
• Reduction in amount of rework
IT Benefits
• Easier to meet SLAs
• Fewer change failures
• Back out plans – easier restore
• Valuable input for problem & availability
• Increased productivity of IT Staff
Costs
• Software costs
• Integration & modification
• Staff
• Accommodation
Possible Problems
• Bureaucratic procedures
• Resistance to “control culture”
• Bypassing of procedures
• Integration to Configuration Management
• Inaccurate information
• Handling urgent changes
• Detecting unauthorised changes
• Too broad scope for a change
• Unclear ownership
Release Management
Mission
To take a holistic view of a change to an IT Service and ensure that all aspects of a release, both technical and non-technical, are considered together
Why Release Management?
• Large or critical hardware roll-outs
• Major software roll-outs
• Bundling or batching related sets of changes
In-house applications
“Other” software
Utility Software
System Software
Hardware Specifications
Assembly Instructions
User Manuals
Key Concepts
• Release
– Collection of authorised changes
– Major / minor / emergency
• Definitive Hardware Store (DHS)
– Storage of hardware spares
• CMDB
– Definitions of planned releases
– Records of CIs impacted by release
– Information about the target environment
Key Concepts
• Definitive Software Library (DSL)
– Physical secure storage
– Source code & original media
• Build Management
– Controlled environment
– Compiled on dedicated “build hardware”
• Release Policy
– Roles, responsibility & content
– Form part of initial planning
• Release Unit
– Components released together
Release Units
• Systems, suites, programs and modules
• Factors affecting the level of release
– Number and extent of changes
– Number of changes that can be managed
– Available resources and time
– Ease of implementation
– Complexity of the release
Release Units
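[Diagram: the IT Infrastructure breaks down into System 1, System 2 and System 3; System 2 into Suite 2.1, Suite 2.2 and Suite 2.3; Suite 2.2 into Program 2.2.1, Program 2.2.2 and Program 2.2.3; Program 2.2.2 into Module 2.2.2.1, Module 2.2.2.2 and Module 2.2.2.3.]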
Development Releases - Alpha and Beta
• Managed by development
• Must not affect live services
• Should not require production resources
• Customer agreement obtained
• Usage covered in SLAs
• Must not replace live systems
• Must be licensed
Normal Release - Full
• All components built, tested, distributed & implemented together
• Better integrated testing
• Easier to detect & rectify problems
• Complex & will require more resources
Normal Release - Delta
• Partial release
• Contains only new or changed items
• Not as stable as full releases
• Authorisation of a delta release depends on:
– Size of a full release compared to the delta
– Urgency of required facilities
– Number of changes already made
– Potential business impact
– Available resources
Normal Release – Package
• Combination of release units
• Reduces number and frequency of releases
• Better integration and testing
• Less old or incompatible software
• Could result in delays to fixes or enhancements
• Greater potential for disruption
[Diagram: Full, Delta and Package releases illustrated with components C1 to C4 and modules M1 to M4.]
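The difference between the three normal release types can be sketched in terms of what each one ships. The function names and version data are hypothetical, and the package sketch assumes the bundled releases do not share component names.

```python
def full_release(current):
    """A full release ships every component of the release unit."""
    return dict(current)


def delta_release(previous, current):
    """A delta release ships only components that are new or changed."""
    return {
        name: version
        for name, version in current.items()
        if previous.get(name) != version
    }


def package_release(*releases):
    """A package release bundles several releases into one roll-out."""
    package = {}
    for release in releases:
        package.update(release)
    return package


# Hypothetical module versions for one program.
previous = {"M1": "1.0", "M2": "1.0", "M3": "1.0"}
current = {"M1": "1.1", "M2": "1.0", "M3": "1.0", "M4": "1.0"}
```

With this data the delta contains only the changed M1 and the new M4, while the full release contains all four modules regardless of whether they changed.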
Urgent Releases
• Disruptive and error prone
• Often used to bypass Change Management
• Controls are essential
– Use software from the DSL
– Software must be replaced through the DSL
– Must follow Change Management
– CMDB must be updated
– Version control
– Testing and documentation
– Give notice
Back-Out Plan
• Documents actions that will restore service
• Still part of change
• Two approaches
– A full reversal of release
– Contingency plans to restore as much as possible
• Should be verified and tested
Key Activities
[Diagram: release management activities, underpinned by the Configuration Management Database (CMDB) & Software Library: Release Policy; Release Planning; Design & develop, or order & purchase software; Build & configure the Release; Fit-for-purpose testing; Release Acceptance; Roll-out planning; Communication, Preparation & training; Distribution & installation. The activities span the Development Environment, the Controlled Test Environment and the Live Environment.]
Release Management
Release Policy
• Basis of subsequent activities
• Management roles & responsibilities
Release Planning
• Agreeing release content
• Planning phases of releases
• Produce schedule
• Assess hardware at target site
• Plan resource requirements
• Obtain quotes if upgrades are required
• Produce back out plans
• Develop quality plan
• Plan acceptance of support groups
Designing, Building & Configuring
• Components assembled in controlled process
• All components of release should be under Configuration control
Testing & Release Acceptance
• Before going live
• Types of testing
– Functional testing
– Operational testing
– Performance testing
– Integration testing
– Testing & back out plans
• Final acceptance & sign off – part of Change
• Rejection treated as failed change
Rollout Planning
• Wholesale / “big bang”
• Phased roll outs
– Geographical
– Functional
– Technological
– Combination
Communication, Preparation & Training
• Support staff & customers
• Training
• Parallel working
• Involvement in acceptance process
• Rollout planning meetings
Distribution & Installation
• Distribution
– Equipment reaches destination in time & intact
– Secure Storage Areas
– Checked against relevant documentation
– Final check before implementation
• Installation
– Functional checks of equipment
– Automate deployment
– Installation routines
– Include check of target
– User checklists?
Normal Flow of Software (flowchart)
• Software ordered, or software developed and supplied
• Acceptance checks. OK? If no, rectification action
• Software placed in DSL
• Final approval
• Package built in test environment
• Operational acceptance testing. OK? (Yes / No)
• Build in live environment
• Distribute to live environment
• Implemented on live environment
• CMDB
Benefits, Costs & Problems
Business Benefits
• Minimum disruption
• Better quality of service
• Fewer & less frequent releases
• Effective scheduling of users for testing
• Overall reduction in business risk
• Business knows what to expect & can plan
Financial Benefits
• Assets more controlled
• Less time & resources spent on rework
• More responsive to revenue producing opportunities
• Prevention of duplication of activities
IT Benefits
• Consistent quality of releases
• Centralised control
• Improved quality and control of changes
• Effective planning of staff activities
• Number of regressions is reduced
• Easier detection of unauthorised and incorrect versions
• Less blame shifting
Costs
• Storage costs
• Build, test and archive environments
• Secure equipment stores
• Software distribution tools
• Network bandwidth
• Telecommunications
• Staff and training
Problems
• Circumvention of procedures
• Emergency fixes
• Distribution of builds directly from development
• Uncoordinated implementation of Software and Hardware
• Resources not available for testing
• Test results are invalid
• Process is seen to be unclear or bureaucratic
Relationships
• Configuration Management
• Change Management
• Problem Management
• Service Desk
• Project Management
• Developers and suppliers
Service Desk
The Service Desk
Structure not a process
• Drive & improve service to the business
• Single point of contact
– Advice
– Guidance
– Rapid restoration to service
Role of the Service Desk
• Supports the incident & problem management function
• Provide a central point of contact
– Preventing the same incident being reported to different people over & over
– Preventing the loss of incidents
– Preventing technical people being disrupted
– Preventing unnecessary work if already a known error
Objectives
• Single point of contact for reporting of incidents
• Accurately record information about incidents
• Co-ordinate activities to restore service to normal
• Support the incident & problem management functions
• Provide management information
• Provide support & advice to business
Key Elements & Processes
Service Desk Functions
• Log Incident
• Pre-scan phase
– Not Known Error
– Proper procedures have been followed
– Required supporting evidence is complete & present
• Incident Management
• Service Desk remains responsible
• Responsible for escalation
• Regularly feeds back to user
Service Desk & Change Management
• Log Changes & cross reference to problems
• Issue change schedules
• Monitor & track changes & assist with escalation
• Inform users of change once complete & update change schedules
Common Features of Service Desks
• A single point of contact for all users
• A central log of all incidents
• Each incident uniquely numbered and date/time stamped
• Diagnostic scripts and other aids
• Configuration Management Support Tools
• Known Error Lists
• An impact coding system
• Automatic escalation procedures based on impact, priority and elapsed time
• Telephone and electronic mail communication with all support staff
• Interface to Service Level Agreements
• Regular progress reporting
• Classification of incidents at call closure
• Regular management summaries of calls received and resolved
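Two of these features, the central log and the unique number plus date/time stamp on each incident, can be sketched as below. The class names and fields are illustrative assumptions rather than any particular Service Desk tool.

```python
import itertools
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class LoggedIncident:
    number: int            # unique within the log
    logged_at: datetime    # date/time stamp
    description: str
    impact: str            # impact code, e.g. "High", "Medium" or "Low"


class IncidentLog:
    """A central log that numbers and time-stamps every incident it records."""

    def __init__(self, clock=datetime.now):
        self._counter = itertools.count(1)
        self._clock = clock            # injectable so the log can be tested
        self.entries = []

    def log(self, description, impact="Low"):
        """Record an incident and return the stored entry."""
        entry = LoggedIncident(next(self._counter), self._clock(), description, impact)
        self.entries.append(entry)
        return entry


service_desk_log = IncidentLog()
first = service_desk_log.log("Printer offline", impact="Medium")
second = service_desk_log.log("Cannot log in")
```

Assigning the number inside the log, rather than letting callers supply it, is one simple way to guarantee that no two incidents share a reference.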
Service Desk Structures
Local Service Desk
• Local desk meeting local needs
• Support staff also local
• Becomes impractical with multiple locations
• Several local desks – operational standards
• Common processes across all locations
[Diagram: Local Service Desk. Local users contact the Service Desk (first line support), which is backed by Third Party Support, Network & Operations Support, Application Support and Desktop Support.]
Centralised Service Desk
[Diagram: Customer Sites 1, 2 and 3 contact a single Service Desk, which is backed by second line support: Third Party Support, Network & Operations Support, Application Support and Desktop Support.]
Virtual Service Desk
[Diagram: service desks in Paris, Sydney, Cape Town, London, Toronto and Durban, plus a third party supplier service desk, share the Service Management Database(s). Local and remote users at multiple sites reach them by telephone, fax, modem, LAN, WAN and the Internet.]
Virtual Desks
• Physical location immaterial
• Used for global organisation
• Benefits include
– Reduced operational costs
– Consolidated management overview
– Improved usage of available skills
– Knowledge sharing
• Onsite assistance still required
Outsourcing
• Have outsourcers use your Service Desk tool
• Keep ownership of management information
• Ensure suitably skilled staff
• Request details of staff
• Monitor value for money
• Check supplier dependencies
• Ensure deliverables are clearly understood
Service Desk
Skill Sets
Staff Profiles
• Understanding of business
• Understanding of IT Infrastructure
• Exceptional interpersonal skills
Technically Unskilled Staff
• Centralised Service Desks
• Emphasis on interpersonal skills
• Large call volumes, little support
• Administrates and coordinates calls
• Relies on diagnostic scripts and other tools
• Technical staff are not distracted or demotivated
• No in-depth support
• Potential job satisfaction is high
Technically Skilled Staff
• Lower call volumes, greater support
• Longer call times
• May become too involved in technical aspects
• Job satisfaction issues
• Customer satisfaction issues
• Peak time staffing issues
• Familiarity breeds contempt
Expert Staff
• Resolve all calls
• Staff are more important than procedures
• Will play the role of technical departments
Incident Management
Definition of an Incident
Any event which is not part of the standard operation of a service and which causes, or may cause, an interruption to, or reduction in, the quality of that service
Includes
• New services
• Automatically registered events
Mission
To minimise the impact of service disruptions to the business by restoring that service through effective management of incidents
Scope
• Inputs
– Incident details from service desk
– Configuration details
– Matched incidents, problems & known errors
– Resolution details
– RFC
• Outputs
– RFC for resolution
– Resolved & closed incidents
– Communication to Customers
– Management information
Objectives
• Restoration of service as quickly as possible
• Ensure timely resolution of all incidents
• Identify trends that may assist in incident resolution
• Assist problem management in identifying trends
Key Concepts
Incident Handling
• Service Desk owns Incidents
• Progress reporting
• Incident Lifecycles
– New
– Accepted / Assigned
– Scheduled
– WIP
– On Hold / Waiting
– Resolved
– Closed
Levels of Support
• 1st Line Support
– Service Desk
• 2nd Line Support
– Incident Management
• 3rd Line Support
– Specialist Group
Key Concepts (cont.)
• Ownership & Communication
– Monitor status against open Incidents
– Incidents passed between support groups
– Affected users are kept informed
– Check for similar Incidents
– Incidents that are likely to exceed SLA times
• Escalation
– Functional Escalation
– Hierarchical Escalation
Classification
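• Category
[Diagram: category tree with top-level categories Hardware, Software and Network, and example subcategories including Operating System, Application, Financials, Line, Network Connection, Modem, Monitor, Mouse, Printer, Terminal and Connector.]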
Impact Code
IMPACT CODE | DESCRIPTION | TARGET RESOLUTION TIME
High | Major service unavailable; many users affected | 1 Hour
Medium | Customer terminal or printer down; cannot function | 4 Hours
Low | Customer terminal or printer experiencing intermittent failure | 8 Hours
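The impact codes translate directly into a lookup of target resolution times. The sketch below uses the three targets from the table; the function names and the 75% escalation warning fraction are illustrative assumptions, and elapsed time is plain clock time with no allowance for service hours.

```python
from datetime import datetime, timedelta

# Target resolution times taken from the impact code table.
TARGET_RESOLUTION = {
    "High": timedelta(hours=1),
    "Medium": timedelta(hours=4),
    "Low": timedelta(hours=8),
}


def resolution_deadline(logged_at, impact):
    """Time by which an incident of this impact should be resolved."""
    return logged_at + TARGET_RESOLUTION[impact]


def needs_escalation(logged_at, impact, now, warn_fraction=0.75):
    """True once the elapsed time reaches a fraction of the target.

    warn_fraction is an illustrative assumption, not an ITIL value.
    """
    elapsed = now - logged_at
    return elapsed >= TARGET_RESOLUTION[impact] * warn_fraction


logged = datetime(2014, 3, 3, 9, 0)   # hypothetical logging time
```

An incident logged at 09:00 with impact Medium has a 13:00 deadline and, under the assumed 75% rule, would be flagged for escalation from 12:00.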
Key Concepts (cont.)
• Incidents, Problems and Known Errors
– Incidents are events or occurrences that degrade or disrupt a Service
– A Problem is the underlying cause of one or more Incidents that has not yet been diagnosed
– Known Errors are
• Problems that have been diagnosed and have not yet been rectified
• Problems that have been diagnosed and for which a resolution or circumvention exists
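Matching a new Incident against Known Errors can be sketched as a simple lookup. This is a deliberately naive illustration: the `KnownError` fields, the keyword matching and the sample records are all assumptions, and a real tool would match far more carefully.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class KnownError:
    ci: str                            # the CI the error is recorded against
    symptom: str                       # keyword that identifies the error
    workaround: Optional[str] = None   # circumvention, if one exists


def match_known_error(ci, description, known_errors):
    """Return the first Known Error on the same CI whose symptom appears
    in the incident description, or None if there is no match."""
    text = description.lower()
    for error in known_errors:
        if error.ci == ci and error.symptom.lower() in text:
            return error
    return None


# Hypothetical Known Error records.
known_errors = [
    KnownError("PRN-07", "paper jam", workaround="Use tray 2"),
    KnownError("APP-FIN", "timeout"),
]
hit = match_known_error("PRN-07", "Repeated paper jam on floor 3", known_errors)
miss = match_known_error("PRN-07", "Toner low", known_errors)
```

A match lets the Service Desk offer the recorded workaround straight away; no match leaves the Incident to go through investigation and diagnosis.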
Incident Management Activities
Reasons for classification
• Identifying the service the Incident is related to
• Associate the Incident with the SLA
• Selecting the most suitable support team
• Indication of the impact and/or severity
• Match Incidents to Known Errors
• Determine a reporting structure
Incident detecting & recording
Initial classification & support
Service Request
Service Request Procedure
• Service Desk
• System Monitoring tools
• Capturing base & initial data
• Diagnostic Scripts
• Known Error Database
• Skill Levels
• Knowledge base and/or expert software
[Diagram: the Incident Management process. Incident detecting & recording, then Initial classification & support (Service Requests go to the Service Request Procedure), then Investigation & diagnosis, Resolution & recovery and Incident Closure, with ownership, monitoring, tracking and communication running throughout.]
• Support group accepts assignment
• Advise if work around can be provided
• Attempt resolution
• Record all details
• Monitor status against open Incidents
• Incidents passed between support groups
• Affected users are kept informed
• Check for similar Incidents
• Incidents likely to exceed SLA times
• Escalation
[Flowchart: incident investigation and diagnosis. Inputs: the Incident, the CMDB, the Incident, Problem & Known Error databases, and diagnostic data (system dumps and journals). Steps: support staff allocation; basic fact gathering; enquiries on historical data; diagnostic data search; diagnosis / circumvention found? (Y / N); escalation threshold exceeded? (Y / N); allocate further support; liaise with Problem Management to create a Problem or Known Error record where appropriate; Incident Closure. Recorded along the way: an incident progress summary as a free format text record (who, when, what, why, results, correlations, dumps, IDs, and the diagnosis and resolution / circumvention action).]
Problem Management
Mission
To minimise the disruption of IT services by organising IT resources to resolve problems, preventing them from recurring and recording information that will improve the way in which IT deals with problems, resulting in higher levels of availability and productivity.
Scope
• Reactive
– Solving problems in response to incidents
• Proactive
– Solving problems before incidents occur
The main goal of Problem Management is the detection of the underlying causes of an incident and their subsequent resolution and prevention
Objectives
• Identify, manage and resolve problems
• Prevent recurrence of problems
• Reduce the number and severity of problems
• Minimise impact to business
• Ensure right level of staff
• Record & manage information
• Ensure vendor compliance when resolving problems
Key Concepts
[Diagram: Incident, then Problem, then Known Error, then Change, handled respectively by Incident Control (Service Desk / Incident Management), Problem Control and Error Control (Problem Management), and Change Control (Change Management).]
Problem vs Error Control
Problem Control
Transforms Problems into Known Errors
Error Control
Resolving Known Errors via the Change Management process
Problem Control
Problem Identification
Problem Classification
Problem Investigation and Diagnosis
Problem Identification
• Initial support could not match the Incident to a known problem
• Analysis of Incidents
• Analysis of IT infrastructure
• Significant or Major Incidents
Problem Classification
• Impact
– Direct effect on the business
• Urgency
– The measure of business criticality based on impact and business need
• Priority
– The order in which a series of items should be addressed
– P = I x S x U
Defining Priority
Priority – sequence in which an Incident or Problem needs to be resolved
Impact – measure of the business criticality of an incident
Severity – what is the effect on the infrastructure / resources?
Urgency – extent to which the resolution of a Problem or error can bear delay
Priority = Impact x Severity x Urgency
P = I x S x U
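As a worked illustration of the formula, the sketch below scores each factor from 1 (low) to 3 (high) and orders a work queue by the product. The 1 to 3 scale, the function name and the sample problems are assumptions; the slides give only the formula itself.

```python
def priority(impact, severity, urgency):
    """P = I x S x U, with each factor scored from 1 (low) to 3 (high).

    The 1-3 scale is an assumption made for this example.
    """
    for name, value in (("impact", impact), ("severity", severity), ("urgency", urgency)):
        if value not in (1, 2, 3):
            raise ValueError(f"{name} must be 1, 2 or 3, got {value!r}")
    return impact * severity * urgency


# Hypothetical problems: (identifier, impact, severity, urgency).
problems = [("PRB-1", 1, 2, 1), ("PRB-2", 3, 3, 2), ("PRB-3", 2, 2, 2)]

# Highest priority score first.
work_order = sorted(problems, key=lambda p: priority(*p[1:]), reverse=True)
```

Here PRB-2 scores 3 x 3 x 2 = 18, PRB-3 scores 2 x 2 x 2 = 8 and PRB-1 scores 1 x 2 x 1 = 2, so they are worked in that order.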
Investigation & Diagnosis
• Diagnosis of root cause
• Update of problem record
• May reclassify at closure
• Methods of problem Analysis
Error Control
Error Control Activities
Error identification & recording
Error assessment
Error resolution recording
Error closure
Error resolution monitoring
Error Identification
• Identification of root cause
• Identification of work around
• Two sources of known errors
– Problem control
– Development
Error Assessment
• Assessment of resolution
– Priority, impact & urgency
• Logging of change
Error Resolution Recording
• Resolution recording
– Known error database
• Data on all CIs is available for incident matching
Error Closure
• Closed together with any Incidents
• Resolved, or Closed pending review
Error Resolution monitoring
– Not responsible for RFCs
– Monitor progress
– Escalate through CAB if necessary
Proactive Problem Management
• Identifying and resolving problems before incidents occur
• Activities include:
– Trend Analysis
– Targeting support action
– Providing information to business
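Trend analysis can start as simply as counting past incidents per CI and flagging the repeat offenders as candidates for a Problem record. The function name, the threshold of three and the sample history are illustrative assumptions.

```python
from collections import Counter


def problem_candidates(incidents, threshold=3):
    """Return CIs with at least `threshold` incidents, most affected first.

    `incidents` is an iterable of (ci, category) pairs; the threshold is
    an illustrative assumption.
    """
    counts = Counter(ci for ci, _category in incidents)
    return [(ci, n) for ci, n in counts.most_common() if n >= threshold]


# Hypothetical incident history.
incidents = [
    ("PRN-07", "Hardware"), ("PRN-07", "Hardware"), ("APP-FIN", "Software"),
    ("PRN-07", "Hardware"), ("APP-FIN", "Software"), ("LAN-2", "Network"),
]
candidates = problem_candidates(incidents)
```

With this history only PRN-07 reaches the threshold, so support action would be targeted there before further incidents occur.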