Cognizant Technology Solutions
Using Queuing Theory to Estimate the FTE in
Production Support Projects
Mahua Seth
Arunabha Sengupta Cognizant Technology Solutions, Kolkata
Abstract:
Maintenance projects remain the mainstay of the software industry, and Production
Support and Bug Fix form a major share of the work in these projects.
While production support projects vary in type and complexity, a large
number of them consist of addressing customer calls and applying immediate fixes to the
reported problems. In stable, steady projects, the problems reported tend to be of a consistent
frequency and complexity over a long period of time. Yet it is difficult to
balance the conflicting needs of defined Service Level Agreements, resource
utilization, budget and system stability. This is becoming increasingly
relevant as more projects move towards the fixed-bid business model.
This paper demonstrates the use of Queuing Theory in estimating the number of
resources needed in the support team so that the project is optimized in terms of
resource cost and of service and waiting time for incoming problem reports, while keeping the
number of unaddressed problem reports in check and minimizing the idle time of
resources.
1.0 Introduction
The outsourcing business has grown rapidly and has evolved to include more
diverse and complex projects. However, the bulk of its revenue still comes from
maintenance projects, where support is the primary service. These
projects were traditionally priced on a time-and-material model. The recent
shift is towards total-ownership outsourcing, where, more often than not, the fixed-bid model is
the preferred pricing model.
Given this change in the industry, it has become difficult for
companies to estimate person-power requirements, especially in a fixed-bid scenario,
such that the necessary margins are maintained. The estimate therefore needs to be more
structured, accurate and scientific.
Problem Statement
A large number of maintenance projects in today's software industry
consist of routine production support jobs. These involve predictable fixes for bugs that
are common in nature and belong to a known range of problems. Over a
steady period, such bugs attain a more or less uniform and predictable rate of occurrence.
A common problem in projects of this type is to estimate the optimal
number of Full Time Employees (FTEs) to be allocated. The difficulty
stems from the scarcity of robust estimation methodologies for
maintenance projects. Function Points, Use Case Points and Feature Points are
techniques better suited to estimating size and effort for development projects.
The constraints on this allocation can be manifold, depending on the project requirements.
The typical constraints are Service Level Management, Cost / Resource Utilization, and
Profitability Optimization.
From the service point of view, it can be transformed to the problem of server
estimation, keeping in mind the business needs of top-line and bottom-line for the
project. This paper looks at applying the principles of Queuing Theory for Optimal FTE
allocation.
2.0 Applying Queuing Theory
In the steady-state phase of a maintenance project, bugs are raised by the customer
and reported to the project team at a given rate. A number of engineers
allocated to the project respond to the reported bugs. When a bug arrives, if one
or more FTEs are free, the bug is handled at once. If all the FTEs are busy, the bug waits
its turn until one of the FTEs becomes free.
During the steady state of the service, when the bugs are of a regular and
predictable nature, it can be assumed that over a long period the solution time for the
bugs becomes more or less constant.
The problem can then be stated as follows: estimate the number of FTEs to be
assigned so as to optimize the project in terms of resource loading / utilization,
resource cost, time to respond to the bugs, waiting time for a bug, etc.
This can be formulated as a single queue multi server Queuing Theory problem
where the arriving bugs can be considered as customers in a queue and the engineers
allocated as FTEs can be considered to be the servers.
Queuing Theory Formulae
For a single-queue, multi-server system, let
λ = rate of arrival of customers,
µ = 1/(time to service a customer) and
ρ = λ/µ.
Solutions exist only when the number of servers n is such that
ρ/n < 1; otherwise the number of customers in the queue grows without bound.
For such cases, we have the following formulae.
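$$p_0 = \left\{ 1 + \frac{\rho}{1!} + \frac{\rho^2}{2!} + \dots + \frac{\rho^n}{n!} + \frac{\rho^{n+1}}{n \cdot n!} \cdot \frac{1}{1 - \rho/n} \right\}^{-1}$$
$$p_k = \frac{\rho^k}{k!}\, p_0 \qquad (1 \le k \le n)$$
$$p_{n+r} = \frac{\rho^{n+r}}{n^r\, n!}\, p_0 \qquad (r \ge 1)$$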
where
p0 is the probability that all the servers are idle,
pk is the probability that k servers are busy,
pn+r is the probability that all servers are busy and r customers are in queue.
Given these, we have the following equations for the system characteristics:
Average number of customers in queue: $\bar{r} = \frac{\rho}{n}\, p_n \left(1 - \frac{\rho}{n}\right)^{-2}$
Average number of customers in system: $\bar{z} = \bar{r} + \rho$
Average waiting time in system: $\bar{t}_w = \bar{z} / \lambda$
Average waiting time in queue: $\bar{t}_q = \bar{r} / \lambda$
Translating this series of equations in the production support scenario, we have:
λ = Arrival rate of bugs (per hour)
µ = 1/(time taken to solve a bug) = Bugs solved per hour by one FTE
p0 is the probability that all the FTEs are idle,
pk is the probability that k FTEs are busy,
pn+r is the probability that all FTEs are busy and r bugs are in queue.
Given these, we have the following equations:
$\bar{r} = \frac{\rho}{n}\, p_n \left(1 - \frac{\rho}{n}\right)^{-2}$ = Average number of bugs in queue
$\bar{z} = \bar{r} + \rho$ = Average number of bugs in system (in queue and in the process of being solved)
$\bar{t}_w = \bar{z} / \lambda$ = Average waiting time of bugs in system (in queue and in the process of being solved)
$\bar{t}_q = \bar{r} / \lambda$ = Average waiting time of bugs in queue
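The formulae above translate directly into a few lines of code. The following Python sketch is an illustration only: the function name `mmn_metrics` and its parameter names are assumptions introduced here and do not come from the paper. It takes the arrival rate λ (bugs per hour), the solution time 1/µ (hours per bug) and the number of FTEs n, and returns p0 together with the four averages defined above.

```python
from math import factorial


def mmn_metrics(lam, solve_time, n):
    """Steady-state figures for a single-queue system with n servers.

    lam        -- arrival rate of bugs (per hour)
    solve_time -- time one FTE takes to solve a bug (hours), i.e. 1/mu
    n          -- number of FTEs (servers)
    """
    rho = lam * solve_time              # rho = lambda / mu
    load = rho / n                      # must stay below 1
    if load >= 1:
        raise ValueError("rho/n >= 1: the queue grows without bound")

    # p0 = { 1 + rho/1! + ... + rho^n/n! + rho^(n+1)/(n*n!) / (1 - rho/n) }^-1
    series = sum(rho**k / factorial(k) for k in range(n + 1))
    tail = rho**(n + 1) / (n * factorial(n)) / (1 - load)
    p0 = 1.0 / (series + tail)

    pn = rho**n / factorial(n) * p0     # all n FTEs busy, no bug waiting
    r = load * pn / (1 - load) ** 2     # average number of bugs in queue
    z = r + rho                         # average number of bugs in system
    return {"p0": p0, "r": r, "z": z, "tq": r / lam, "tw": z / lam}


# Hypothetical single-FTE example: 0.5 bugs per hour, 1 hour per bug.
m = mmn_metrics(lam=0.5, solve_time=1.0, n=1)
print(m)
```

For the hypothetical single-FTE input shown, the formulae reduce to the single-server case: p0 = 0.5, an average of 0.5 bugs in queue, and an average time in system of 2 hours.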
3.0 Solving the FTE Allocation Problem
The Queuing Theory Principles can be used to optimally estimate the number of
FTEs required for a particular Production Support project based on the arrival rate of
bugs and solution time, depending upon the various typical project requirements:
a) Service Level Agreement (for response)
b) Service Level Agreement (for solution)
c) Permissible number of bugs in queue (in the process of solution and waiting for
response)
d) Optimizing for Project cost (FTE cost)
e) Minimizing idle time of FTEs
3.1 Case Study 1
System Behavior
Bug Arrival Rate : 20 per day
Average Solution Time : 3 hours per bug
Project Requirements
Service Level Agreement for the time to respond to a bug : <= 30 minutes
Service Level Agreement for solution : <= 4 hours 30 minutes
In this problem, the different system characteristics can be observed by
varying the number of FTEs to arrive at the optimal solution for the Service Level
Agreements.
So, from this table, it is observed:
To meet the SLA for solution of <= 4.5 hours, that is, for the average waiting time of a
bug in the system to be at most 4.5 hours, the minimum number of FTEs needed is 9.
To meet the SLA for response of <= 0.5 hour, that is, for the average waiting time of a
bug in the queue to be at most 0.5 hour, the minimum number of FTEs needed is 10.
Hence the optimal number of FTEs to meet both Service Level Agreements is 10.
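The sweep over the number of FTEs in this case study can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' own calculation: the helper names `waiting_times` and `min_ftes` are hypothetical, and the conversion from bugs per day to bugs per hour assumes an 8-hour working day, which the paper does not state.

```python
from math import factorial

HOURS_PER_DAY = 8  # assumption: the paper does not state the length of the working day


def waiting_times(lam, solve_time, n):
    """Return (tq, tw) in hours for n FTEs, or None if rho/n >= 1."""
    rho = lam * solve_time
    load = rho / n
    if load >= 1:
        return None
    p0 = 1.0 / (sum(rho**k / factorial(k) for k in range(n + 1))
                + rho**(n + 1) / (n * factorial(n)) / (1 - load))
    r = load * (rho**n / factorial(n) * p0) / (1 - load) ** 2
    return r / lam, (r + rho) / lam


def min_ftes(lam, solve_time, meets_sla, max_n=50):
    """Smallest n whose (tq, tw) satisfies the meets_sla predicate."""
    for n in range(1, max_n + 1):
        times = waiting_times(lam, solve_time, n)
        if times is not None and meets_sla(*times):
            return n
    return None


lam = 20 / HOURS_PER_DAY   # bugs per hour
solve_time = 3.0           # hours per bug

for n in range(8, 13):
    tq, tw = waiting_times(lam, solve_time, n)
    print(f"n={n:2d}  tq={tq:5.2f} h  tw={tw:5.2f} h")

n_solution = min_ftes(lam, solve_time, lambda tq, tw: tw <= 4.5)
n_response = min_ftes(lam, solve_time, lambda tq, tw: tq <= 0.5)
print(n_solution, n_response, max(n_solution, n_response))
```

Under the 8-hour assumption the sketch gives 9 FTEs for the solution SLA and 10 for the response SLA, in agreement with the figures stated above.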
3.2 Case Study 2
System Behavior
Bug Arrival Rate : 8.75 per day
Average Solution Time : 0.9 hour per bug
Project Requirements
Service Level Agreement for time to solution : <= 1.2 hours
High resource utilization : All resources together cannot be idle for more than 37% of the time
In this problem, the different system characteristics can be observed by
varying the number of FTEs to arrive at the optimal solution for the Service Level
Agreement and Resource Utilization.
So, from this table, it is observed:
To meet the SLA for solution of ≤ 1.2 hours, that is, for the average waiting time of a
bug in the system to be less than 1.2 hours, the number of FTEs needs to be greater
than 2.
However, to meet the utilization requirement, the number of FTEs should
not exceed 3.
Hence the solution of 3 FTEs suits both constraints.
3.3 Case Study 3
System Behavior
Bug Arrival Rate : 15 per day
Average Solution Time : 1 hour per bug
Project Requirements
Bugs should be solved ASAP : Minimize waiting time for solution
Resource utilization needs to be high : All FTEs cannot be idle for more than 15% of the time
In this problem, the different system characteristics and the probability of all
resources being free, p0, can be observed by varying the number of FTEs to arrive
at the optimal solution for the waiting time and resource utilization.
So, from this table, it is observed:
The average waiting time in the system for a bug is 8.25 hours with 2
FTEs, 1.34 hours with 3 FTEs and 1.0681 hours with 4 FTEs, with progressively lower
values until it converges to 1 hour (the solution time) when the number of FTEs exceeds 8.
However, from 5 FTEs onwards, the percentage of time that all the FTEs are
free exceeds 15%.
So, in this case, the optimal solution is reached when the number of FTEs is
equal to 4.
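The trade-off in this case study, lower waiting time against a cap on p0, can be sketched in Python as below. The helper name `p0_and_tw` and the constant names are hypothetical, and the 8-hour working day is an assumption; with it, the sketch gives about 8.26 hours for 2 FTEs, in line with the 8.25 hours quoted above.

```python
from math import factorial

HOURS_PER_DAY = 8   # assumption: not stated in the paper
MAX_ALL_IDLE = 0.15  # requirement: all FTEs idle at most 15% of the time


def p0_and_tw(lam, solve_time, n):
    """Return (p0, tw): probability that all FTEs are idle, and average time in system."""
    rho = lam * solve_time
    load = rho / n
    if load >= 1:
        raise ValueError("rho/n >= 1: the queue grows without bound")
    p0 = 1.0 / (sum(rho**k / factorial(k) for k in range(n + 1))
                + rho**(n + 1) / (n * factorial(n)) / (1 - load))
    r = load * (rho**n / factorial(n) * p0) / (1 - load) ** 2
    return p0, (r + rho) / lam


lam = 15 / HOURS_PER_DAY   # bugs per hour
solve_time = 1.0           # hours per bug

rows = {n: p0_and_tw(lam, solve_time, n) for n in range(2, 10)}
for n, (p0, tw) in rows.items():
    print(f"n={n}  p0={p0:.4f}  tw={tw:.4f} h")

# Waiting time falls as n grows, so the best choice is the largest n
# that still keeps p0 within the idle-time limit.
best = max(n for n, (p0, _) in rows.items() if p0 <= MAX_ALL_IDLE)
print("optimal number of FTEs:", best)
```

Because the waiting time only decreases as FTEs are added, picking the largest n that respects the idle-time cap is sufficient here; under the stated assumption this selects 4 FTEs.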
3.4 Case Study 4
System Behavior
Bug Arrival Rate : 12 per day
Average Solution Time : ½ hour per bug
Project Requirements
Minimize Resource Cost : Minimize number of FTEs
There should not be more than 2 bugs unattended at any given point of time : Number of bugs in queue < 3
In this problem, the different system characteristics and the probability of
bugs in queue, pn+r, can be observed by varying the number of FTEs to arrive at
the optimal solution for the resource cost and the number of bugs in queue.
So, from this table, it is observed:
The probability of there being 3 bugs in the queue converges to 0 when the
number of FTEs is 4 or more.
Hence, in this case, the optimal solution is 4 FTEs.
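The queue-length check in this case study can be sketched in Python as follows. The function name `queue_probability` is hypothetical, the 8-hour working day is an assumption, and so is the cut-off of four decimal places: the paper does not say at what precision the probability is taken to have converged to 0.

```python
from math import factorial

HOURS_PER_DAY = 8  # assumption: not stated in the paper
DECIMALS = 4       # assumption: precision at which the probability counts as 0


def queue_probability(lam, solve_time, n, r):
    """p_{n+r}: all n FTEs are busy and exactly r bugs are waiting (r >= 1)."""
    rho = lam * solve_time
    load = rho / n
    if load >= 1:
        raise ValueError("rho/n >= 1: the queue grows without bound")
    p0 = 1.0 / (sum(rho**k / factorial(k) for k in range(n + 1))
                + rho**(n + 1) / (n * factorial(n)) / (1 - load))
    return rho**(n + r) / (n**r * factorial(n)) * p0


lam = 12 / HOURS_PER_DAY   # bugs per hour
solve_time = 0.5           # hours per bug

for n in range(1, 7):
    p = queue_probability(lam, solve_time, n, r=3)
    print(f"n={n}  P(3 bugs in queue)={p:.{DECIMALS}f}")

# Smallest n for which the probability of 3 queued bugs rounds to zero.
optimal = next(n for n in range(1, 20)
               if round(queue_probability(lam, solve_time, n, 3), DECIMALS) == 0)
print("optimal number of FTEs:", optimal)
```

With these assumptions the probability of 3 bugs in queue is still about 0.0005 for 3 FTEs and rounds to 0.0000 from 4 FTEs onwards.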
3.5 Case Study 5
System Behavior
Bug Arrival Rate : 15 per day
Average Solution Time : 2 hours per bug
Project Requirements
Client agrees to pay for 4 FTEs : The project team needs to agree to an SLA for response and solution
Desirable SLA for client : < 4 hours
In this problem, the different system characteristics can be observed by
varying the number of FTEs to determine the Service Level
Agreement that the team can commit to.
So, from this table, it is observed:
For 4 FTEs, the average waiting time for a response to a bug is 6.9 hours and
the average time to solve a bug is 8.9 hours.
Hence, the SLA can reasonably be defined as 7 hours for response and 9 hours for
solution.
However, it can be suggested to the client that increasing the number of
FTEs to 5 will bring the SLA for response down to 1 hour and the SLA for solution down to
3 hours, which meets the client's desired SLA.
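The comparison between 4 and 5 FTEs can be checked with a short Python sketch. As before, the helper name `response_and_solution_time` is hypothetical and the 8-hour working day is an assumption that the paper does not state.

```python
from math import factorial

HOURS_PER_DAY = 8  # assumption: not stated in the paper


def response_and_solution_time(lam, solve_time, n):
    """Return (tq, tw): average hours in queue and average hours in system."""
    rho = lam * solve_time
    load = rho / n
    if load >= 1:
        raise ValueError("rho/n >= 1: the queue grows without bound")
    p0 = 1.0 / (sum(rho**k / factorial(k) for k in range(n + 1))
                + rho**(n + 1) / (n * factorial(n)) / (1 - load))
    r = load * (rho**n / factorial(n) * p0) / (1 - load) ** 2
    return r / lam, (r + rho) / lam


lam = 15 / HOURS_PER_DAY   # bugs per hour
solve_time = 2.0           # hours per bug

results = {n: response_and_solution_time(lam, solve_time, n) for n in (4, 5)}
for n, (tq, tw) in results.items():
    print(f"n={n}  response={tq:.2f} h  solution={tw:.2f} h")
```

Under the stated assumption, 4 FTEs give roughly 6.9 hours for response and 8.9 hours for solution, while 5 FTEs bring both below the 1-hour and 3-hour SLA values suggested above.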
4.0 Conclusion
The above case studies show that Queuing Theory is well suited to solving FTE allocation problems under various
project constraints. The case studies
cover only a small number of scenarios, and numerous other decisions
can be addressed by judicious use of the same concepts. Hence, Queuing Theory can be
appropriately used to solve optimization problems across Production Support projects whose bug arrival rates and solution times are steady and predictable.