Improving service level control with process mining

A study of how managers can control the service levels of their products using the event log of the incident management system

Subject: Research project

Student: Ing. R.H.J.C. van Wel

Date: 09 January 2013

Status: Complete


Summary

The objective of this research was to examine whether the information registered in the event log of the incident management system can add value in controlling the service level of a product.

By using process mining techniques and tools, we gained insight into the distribution and handling activities of the incident management process. During the process discovery phase we found that the service level of the product types Desktop and Laptop decreases rapidly when incidents are handled by two or more assignment groups. In addition, we found that the incident management system does not always register the correct timestamp of the activities executed by incident handlers. We also saw that some incident handlers execute unusual process activities, and that the incident management system does not add extra service level time when an incident is reopened after it was closed. Finally, we found that the company can extract data from the event log that serves as a predictive indicator of an increasing or decreasing workload.

The conclusion of this research is that the event log of the incident management system contains enough information to visualize the distribution and handling activities of the incident management process. By using this information, the company can gain more control over the service levels of its products.



Colophon

CAI Master of Science program

University Leiden

Course element Research project

Student Ing. R.H.J.C. van Wel

Email [email protected]

Version 1.4


Index

Summary
1 Introduction
1.1 Preface
1.2 Business case
1.3 Research relevance
1.4 Theoretical framework
1.5 Research question
1.6 Scope & delineation
2 Research methodology
3 Research results
3.1 Analyze event log
3.2 Process discovery
3.3 Process conformance
3.4 Workload prediction
4 Conclusion
4.1 Conclusion
4.2 Recommendations
4.3 Discussion
References
Appendix



1 Introduction

This research has been conducted within a company whose name cannot be

mentioned for security reasons.

1.1 Preface

Many IT companies use incident management processes and incident management systems to control their incident handling. To manage this process, one can use Key Performance Indicators (KPIs).

A KPI can, for example, show how well (or poorly) the service level of a certain product has performed, or how well a related business unit has performed in handling incidents.

1.2 Business case

In an interview, the Senior Process Manager (SPM) of the company stated that the company is currently not able to respond quickly enough to increasing workloads. One of the main reasons is that most Business Unit Managers (BUMs) focus on the monthly KPI Incidents Resolved in Time.

With the KPI Incidents Resolved in Time, BUMs can only act reactively, because the distribution and handling process has already taken place. This KPI also does not show how incidents were distributed and handled by the business units. It is therefore difficult to find the reason why the service level of a product has decreased.

1.3 Research relevance

The purpose of this research is to examine whether the information registered in the event log of the incident management system can add value in controlling the service level of a product. The first goal is therefore to gain insight into the distribution and handling activities of the incident management process.

The second goal is to examine whether the information from this event log can be used to predict increasing workloads and to see what the effects of these increasing workloads are.



1.4 Theoretical framework

According to van der Aalst (2011, p.55) the performance of a process or organization can be defined in different ways. Typically, three dimensions of performance are identified: time, cost and quality. For each of these performance dimensions, different Key Performance Indicators (KPIs) can be defined. When looking at the time dimension, the following performance indicators can be identified:

• The lead time (also referred to as flow time) is the total time from the creation of the case to the completion of the case;
• The service time is the time actually worked on a case;
• The waiting time is the time a case is waiting for a resource to become available;
• The synchronization time is the time an activity is not yet fully enabled and waiting for an external trigger or another parallel branch.

Many systems have some kind of event log, often referred to as “history”, “audit trail”, “transaction log”, etc. The event log typically contains information about events referring to an activity and a case. The case (also named process instance) is the “thing” which is being handled, e.g., a customer order, a job application, an insurance claim, a building permit, etc. The activity (also named task, operation, action, or work item) is some operation on the case. Typically, events have a timestamp indicating the time of occurrence. Moreover, when people are involved, event logs will characteristically contain information on the person executing or initiating the event, i.e., the performer (van der Aalst & van Hee, 2002).
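The structure described above can be sketched in a few lines of Python. This is an illustration only: the class and function names are assumptions, the sample timestamps and resources are borrowed from Table 3.3.1, and the case ID "INC-1" is made up.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    case_id: str         # the "thing" being handled, e.g. an incident number
    activity: str        # the operation performed on the case
    timestamp: datetime  # time of occurrence
    resource: str        # the performer, here an assignment group


def build_traces(events):
    """Group events per case and order each group by timestamp."""
    traces = defaultdict(list)
    for event in events:
        traces[event.case_id].append(event)
    for trace in traces.values():
        trace.sort(key=lambda e: e.timestamp)
    return dict(traces)


def lead_time(trace):
    """Lead time: from the first event of a case to its last event."""
    return trace[-1].timestamp - trace[0].timestamp


# Rows taken from Table 3.3.1, deliberately out of order; "INC-1" is made up.
log = [
    Event("INC-1", "Resolved", datetime(2011, 6, 6, 10, 1, 1), "Resolved group 1"),
    Event("INC-1", "Opened", datetime(2011, 6, 4, 23, 43, 56), "Control group 1 EUS"),
    Event("INC-1", "Closed", datetime(2011, 6, 6, 10, 1, 29), "Closed group 1 EUS"),
    Event("INC-1", "Reassignment Group", datetime(2011, 6, 6, 7, 34, 20), "Assignment group 1"),
]

traces = build_traces(log)
print([e.activity for e in traces["INC-1"]])
print(lead_time(traces["INC-1"]))
```

Note that this lead time is plain calendar time; it does not take any service level window into account.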

The idea of process mining is to discover, monitor and improve real processes (i.e. not assumed processes) by extracting knowledge from event logs readily available in today’s systems (van der Aalst, 2011).

According to van der Aalst (2011, p.9) event logs can be used to conduct three types of process mining, namely:

1. Process discovery
The first type of process mining is discovery. A discovery technique takes an event log and produces a model without using a-priori information. […] If the event log contains information about resources, one can also discover resource-related models, e.g., a social network showing how people work together in an organization.

2. Process conformance
The second type of process mining is conformance. Here, an existing process model is compared with an event log of the same process.

3. Process enhancement
The third type of process mining is enhancement. Here, the idea is to extend or improve an existing process model using information about the actual process recorded in some event log. Whereas conformance checking measures the alignment between model and reality, this third type of process mining aims at changing or extending the a-priori model.



This research addresses the process discovery phase and the process conformance phase. How these phases were executed is described in Section 2. The research conclusion and recommendations are given in Section 4 and are meant to be used for the process enhancement phase in further research.

1.5 Research question

To answer the main research question, the following sub questions have to be answered:

Sub question 1: Which event log data must be used as information to visualize the distribution and handling activities of the incident management process?

Sub question 2: Which information should business unit managers extract from the event log to be able to predict a workload increase and see what the effects are of these increasing workloads?

1.6 Scope & delineation

This research focuses only on incident management activities that were managed by business units, and in particular on one business unit which we will call EUS (End User Services). This research therefore does not examine human resource activities.

The process mining tools ProM and Disco will be used to execute the process discovery phase. ProM will be used because it is an open-source tool with many plugins (e.g. social networks and Petri nets) that can be used for process analyses. The commercial process mining tool Disco, however, is easier to work with for the process conformance phase.

The results of the process discovery phase and the process conformance phase will be based on quantitative measurements. The quality of the incident management process and related activities can be discussed when the process enhancement phase is executed.

The main research question asks how managers can control the service level of their products:

Which information should business unit managers extract from the incident management event log to control the service level of their products?

In this research, “service level control” means that one is able to explain the causes and effects of a service level performance, based on the information registered in the event log. If one can explain these causes and effects, one can also share this information and take action when necessary.



2 Research methodology

Analyze event log

First we need to extract the data registered in the event log of the incident management system. To analyze distribution and handling activities we need a substantial amount of historical data. We will therefore extract an event log that contains all incidents closed between 01-09-2010 and 30-09-2012.

After analyzing this event log, we will determine the research focus and define which data (Case ID, Activity ID, Resource ID and time dimension) can be used for the process discovery step.

Process discovery

To determine which information is valuable for controlling the service level of a product, we need to gain insight into how the distribution and handling activities were executed. We will therefore visualize the incident management process using the process mining tool Disco. The results will show information about the distribution activities of the incident management process. To see how the incidents were handled by the business units, we will use a social network plugin from the process mining tool ProM. These results will show how the business units interacted with each other.

Process conformance

The results of the process discovery phase will be discussed with the SPM of the incident management system, and each unusual observation will be described.

After executing the process discovery phase and the process conformance phase, we will be able to identify which information is needed to visualize the distribution and handling activities of the incident management process (Sub question 1).

Workload prediction

To predict an increasing or decreasing workload, we will build hypotheses and examine them based on the information registered in the event log. For each hypothesis we will explain our assumptions, explain what must be done to examine them, and analyze the results. This hypothesis cycle will continue until we can define which information business unit managers should extract from the event log to be able to predict a workload increase and see what the effects are of these increasing workloads (Sub question 2).

After answering sub question 1 and sub question 2, we will be able to conclude which information business unit managers should extract from the incident management event log to control the service levels of their products (Main research question).

Process enhancement

The research conclusion and recommendations are defined in Section 4 and meant

to be used for the Process enhancement phase in further research.



3 Research results

3.1 Analyze event log

The extracted event log contains a large amount of information. To define which information we can extract from the event log, we have aggregated the data into one table¹.

The event log contains the numbers under which the incidents are registered. These incident numbers can be used as Case IDs. The executed activities (Opened, Assignment, Resolved, Closed) can be used as Activity IDs. Each Activity ID is linked to a Time Stamp and a Resource ID. This Resource ID shows the name of the business unit (named assignment group in the event log) that executed the activity.

To distinguish the different types of assignment groups, we rename these assignment groups in the event log before we execute the process discovery phase.

The assignment group that is linked to an Opened activity will be called the Control group. According to the SPM, this type of resource is responsible for forwarding the incident to the assignment group that is responsible for resolving it.

The assignment groups that handle the activities after the Control group will be called First Reassignment group, Second Reassignment group, Third Reassignment group, Fourth Reassignment group, Fifth Reassignment group and Nth Reassignment group².

The assignment group that is linked to a Resolved activity will be called the Resolved group. The assignment group that is linked to a Closed activity will be called the Closed group. This assignment group is always the same assignment group as the Control group.
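The renaming scheme can be sketched as a small function. This is not the procedure used in the research, only an illustration: the function name, the input format (time-ordered pairs of activity and group name) and the sample group names are assumptions.

```python
ORDINALS = ["First", "Second", "Third", "Fourth", "Fifth"]


def relabel(trace):
    """Map each (activity, group) pair of one incident to a role label."""
    roles = []
    reassignments = 0
    for activity, group in trace:
        if activity == "Opened":
            roles.append("Control group")
        elif activity == "Assignment":
            reassignments += 1
            if reassignments <= len(ORDINALS):
                roles.append(f"{ORDINALS[reassignments - 1]} Reassignment group")
            else:
                # The sixth and every later reassignment share one label.
                roles.append("Nth Reassignment group")
        elif activity == "Resolved":
            roles.append("Resolved group")
        elif activity == "Closed":
            roles.append("Closed group")
        else:
            roles.append(group)  # unknown activity: keep the original name
    return roles


# Hypothetical incident that is reassigned twice before it is resolved.
example = [
    ("Opened", "Service desk"),
    ("Assignment", "Team A"),
    ("Assignment", "Team B"),
    ("Resolved", "Team B"),
    ("Closed", "Service desk"),
]
print(relabel(example))
```

Counting reassignments per incident in this way is what later makes it possible to compare service levels per reassignment depth.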

The event log also shows which incidents Breached the service level time and which incidents were Resolved in time. We can therefore use this information to determine the service level performance of a product.

Each incident is registered on a Product type. We can therefore use this information to filter on specific product types that were managed by the business unit EUS. We can use the Elapsed time³ data to see how much time it took to resolve an incident. This measure counts only the time stipulated by the Service Level Agreement of a product.

Research focus

Our event log covers all incidents that were closed between 01 September 2010 and 30 September 2012. In this period the company resolved 162677 incidents. These incidents were registered on 1679 different product types. Of these incidents, 52546 were registered on the product types Desktop and Laptop and managed by the business unit EUS. We will therefore continue this research by focusing on all incidents that were registered on the product types Desktop and Laptop.

¹ See appendix Event log information
² Nth Reassignment group means the sixth or a later reassignment group
³ Elapsed time = service level time (measured time between opened time and resolved time)



Trimmed mean

Looking at the spread of the elapsed times, we see several outliers. The maximum recorded elapsed time is 3403 hours and the minimum recorded elapsed time is 0.0 hours. We will call the outliers with a high elapsed time top-outliers and the outliers with a low elapsed time bottom-outliers.

Moore and McCabe (2006) describe outliers as individual values that fall outside the overall pattern. The trimmed mean is a measure of centre that is more resistant than the mean but uses more of the available information than the median. Trimming eliminates the effect of a small number of outliers.
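A trimmed mean of this kind can be sketched as follows. The function names and the sample values are assumptions. The source does not state how the 5% count was rounded; rounding it up is chosen here because it reproduces the reported size of the trimmed event log (52546 incidents reduced to 47290).

```python
import math
from statistics import mean


def trim(values, fraction=0.05):
    """Drop the lowest and the highest `fraction` of the values.

    The number of values dropped per tail is rounded up (an assumption).
    """
    ordered = sorted(values)
    k = math.ceil(len(ordered) * fraction)
    return ordered[k:len(ordered) - k] if k else ordered


def trimmed_mean(values, fraction=0.05):
    """Mean of the values that remain after trimming both tails."""
    return mean(trim(values, fraction))


# Made-up elapsed times in hours with one extreme value at each end.
elapsed = [0.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 3403.0]
print(mean(elapsed))               # pulled far upwards by the 3403-hour outlier
print(trimmed_mean(elapsed, 0.1))  # 3.25
```

With very small samples or large fractions the trimmed list can become empty, in which case `mean` raises an error; a production version would need to guard against that.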

According to the SPM, these outliers should not be taken into account in this research, because they represent unusual circumstances and would distort the research results. We therefore compute a 5% trimmed mean. To do so, we discarded the highest 5% (top-outliers) and the lowest 5% (bottom-outliers) of the elapsed times. After trimming, the event log consists of 47290 incidents. The maximum elapsed time is 309 hours and the minimum elapsed time is 1.3 hours.

Table 3.1.1 shows the number of incidents that were controlled or resolved by the business unit EUS. Table 3.1.2 shows the number of incidents that were controlled and resolved by the business unit EUS. Table 3.1.3 shows the number of incidents that were controlled by the business unit EUS and resolved by other business units.

Table 3.1.1 Incidents divided per Control group and Resolved group

Control group           Opened   Resolved
Business unit EUS        47257      20600
Other business units        33      26690
Total                    47290      47290

Table 3.1.2 Controlled and resolved incidents by business unit EUS

Control group           Resolved group EUS
Business unit EUS       20598

Table 3.1.3 Control group EUS / all resolved groups except EUS

Control group           Resolved group ALL except EUS
Business unit EUS       26652

3.2 Process discovery

Figure 3.2.1 shows the process model that Disco discovered based on the 47257 incidents that were managed by the business unit EUS. The process model visualizes the flow of the incident distribution process. The arrows show how the incidents were forwarded between the Control group (01 Opened), the assignment groups (02 First Reassignment group, 03 Second Reassignment group, 04 Third Reassignment group, 05 Fourth Reassignment group, 06 Fifth Reassignment group and 07 Nth Reassignment group), the Resolved group (08 Resolved) and the Closed group (09 Closed).

The frequency of the activities is visualized by colour (low frequency = light blue, high frequency = dark blue), by number, and by the thickness of the arrows (low frequency = thin arrow, high frequency = thick arrow).

We will comment on the process model in Section 3.3.
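The frequencies shown on the arrows of such a process model can be approximated by counting how often one activity directly follows another. The sketch below is a simplified stand-in and not Disco's own algorithm, which the source does not describe; the function name and the sample traces are assumptions.

```python
from collections import Counter


def directly_follows(traces):
    """Count how often activity a is immediately followed by activity b."""
    counts = Counter()
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            counts[(a, b)] += 1
    return counts


# Hypothetical traces, each a time-ordered list of activity labels.
sample_traces = [
    ["Opened", "First Reassignment group", "Resolved", "Closed"],
    ["Opened", "First Reassignment group", "Second Reassignment group", "Resolved", "Closed"],
    ["Opened", "Resolved", "Closed"],
]

for (a, b), n in directly_follows(sample_traces).most_common():
    print(f"{a} -> {b}: {n}")
```

Each counted pair corresponds to one arrow; the count would determine its thickness and label.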

“Identifying outliers is a matter for judgement. Look for points that are clearly apart from the body of the data, not just the most extreme observations in a distribution.” (Moore & McCabe, 2006)


Figure 3.2.1 Process model


Incident handling process

To visualize how the incidents were handled between the control groups and the resolved groups, we used a social network plugin within the process mining tool ProM. We divided the results into:

• Incidents that were managed (controlled) and resolved by the business unit EUS (Figure 3.2.2);
• Incidents that were managed (controlled) by the business unit EUS and resolved by other business units (Figure 3.2.3).

The size of each circle illustrates the number of incidents that the control group or resolved group handled. The arrows show the relation between the control groups and the resolved groups. The colours are used to distinguish the control groups from the resolved groups.
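The data behind such a social network can be sketched as node sizes and edge weights derived from one pair of groups per incident. This is an illustration and not the ProM plugin's algorithm; the function name and the group names are assumptions.

```python
from collections import Counter


def social_network(incidents):
    """Build node sizes and edge weights.

    `incidents` is an iterable of (control_group, resolved_group) pairs,
    one pair per incident.
    """
    handled = Counter()  # node size: incidents handled per group
    edges = Counter()    # edge weight: incidents per control/resolved pair
    for control, resolved in incidents:
        edges[(control, resolved)] += 1
        handled[control] += 1
        if resolved != control:
            handled[resolved] += 1
    return handled, edges


# Hypothetical incidents.
pairs = [
    ("Control group 1", "Resolved group 1"),
    ("Control group 1", "Resolved group 1"),
    ("Control group 1", "Resolved group 2"),
    ("Control group 2", "Resolved group 1"),
]
handled, edges = social_network(pairs)
print(handled.most_common())
print(edges.most_common())
```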

Figure 3.2.2 Control groups EUS & resolved groups EUS

Figure 3.2.3 Control groups EUS & all resolved groups except EUS

We see that Control group 1 EUS managed most of the incidents that were resolved within the business unit EUS, and also most of the incidents that were resolved by other business units. The social networks (Figure 3.2.2 and Figure 3.2.3) also show how many different resolved groups exist. By creating these social networks, BUMs can judge the importance of each relationship and use this information to control the service level of their product.



3.3 Process conformance

The process model gives a good overview of the handling of the incident

management process. However, looking at the process model and data in the event

log we also observe some unusual process activities, namely:

Observation 1

We would not have expected that incidents need to be reassigned to an assignment group after an incident is closed.

In every case where this occurs, the time difference between the registered activities Closed and Reassignment group is 1 second (see, for example, Table 3.3.1). We assume that both activities were executed simultaneously by one resource, but that the system registered them with a small time difference. This activity should not occur, because it gives a misleading picture of the incident distribution and handling process. We therefore recommend that the SPM examines this observation further.

Table 3.3.1 Example reassignment activity after closed activity

Activity             Resource              Date        Time
Opened               Control group 1 EUS   04.06.2011  23:43:56
Reassignment Group   Assignment group 1    06.06.2011  7:34:20
Resolved             Resolved group 1      06.06.2011  10:01:01
Closed               Closed group 1 EUS    06.06.2011  10:01:29
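Deviations of this kind can be found mechanically by listing every event that is recorded after a given activity. The sketch below is an illustration under stated assumptions: the function name is made up, and the last row of the sample trace is hypothetical, added only to show a reassignment one second after closing.

```python
from datetime import datetime


def events_after(trace, marker):
    """Return the events recorded after the first `marker` activity.

    `trace` is a time-ordered list of (activity, timestamp) pairs.
    """
    for index, (activity, _timestamp) in enumerate(trace):
        if activity == marker:
            return trace[index + 1:]
    return []


# Rows from Table 3.3.1, plus one HYPOTHETICAL reassignment after closing.
trace = [
    ("Opened", datetime(2011, 6, 4, 23, 43, 56)),
    ("Reassignment Group", datetime(2011, 6, 6, 7, 34, 20)),
    ("Resolved", datetime(2011, 6, 6, 10, 1, 1)),
    ("Closed", datetime(2011, 6, 6, 10, 1, 29)),
    ("Reassignment Group", datetime(2011, 6, 6, 10, 1, 30)),  # hypothetical
]

closed_at = next(ts for activity, ts in trace if activity == "Closed")
for activity, timestamp in events_after(trace, "Closed"):
    print(activity, "recorded", timestamp - closed_at, "after Closed")
```

Calling the same function with "Resolved" as the marker lists activities that follow a resolution.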


Observation 2

We would not have expected that an incident needs to be reassigned to another assignment group after it is resolved.

It seems that incident handlers execute additional tasks after the incident was resolved (see, for example, Table 3.3.2). These activities do not influence the service level performance, because the incident is already resolved. However, according to the process model this sort of activity should not be executed. We therefore recommend that the SPM examines this observation further.

Table 3.3.2 Example reassignment activity after resolved activity

Activity   Resource   Date   Time






Observation 3

When an incident is closed, it is possible to reopen it, for example when the end user is not satisfied with the solution provided. However, when an incident is reopened, the incident management system does not continue counting the elapsed time. As a result, the actual service level time is not registered correctly.



In Table 3.3.3 we can see that the registered elapsed time of a case is 6 hours and 28 minutes. This time is based on the opened activity (16.08.2012 / 8:17:11) and the first resolved activity (16.08.2012 / 14:45:45). Because the incident was reopened, the elapsed time should be measured up until the second resolved activity, which was executed on 30.11.2012 / 9:01:52.

Because the incident management system does not register the actual elapsed time, it is likely that more incidents breached the service level time than the event log shows. We therefore recommend that the SPM examines the cause of this type of occurrence and shows how it affects the service level performance.

Table 3.3.3 Actual elapsed time vs. registered elapsed time

Activity   Resource              Date        Time
Reopen     Control group 1 EUS   17.08.2012  12:35:33
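The registered elapsed time in this example can be checked with simple date arithmetic, using the timestamps quoted in the text. The calendar span to the second resolution is shown for comparison only: it is not the actual service level time, because elapsed time counts only time inside the service level window, so it merely bounds that time from above. The variable names are assumptions.

```python
from datetime import datetime

opened = datetime(2012, 8, 16, 8, 17, 11)
first_resolved = datetime(2012, 8, 16, 14, 45, 45)
second_resolved = datetime(2012, 11, 30, 9, 1, 52)

registered = first_resolved - opened      # what the system registered
calendar_span = second_resolved - opened  # upper bound on the actual time

print(registered)     # 6:28:34
print(calendar_span)  # 106 days, 0:44:41
```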







Observation 4

Only the time dimension elapsed time is usable without modifying the original event log data, because the incident management system has already calculated the actual service level time. For the other time dimensions this is not the case: we cannot measure, for example, the waiting time between the activity Opened and the activity First Reassignment directly. The process mining tool Disco also does not provide a filter that measures only the service level time. To solve this problem we built a formula into the event log that measures only the time within the service level window.
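The formula itself is not given in the source. A minimal sketch of the idea, assuming a hypothetical service window of Monday to Friday from 08:00 to 17:00, could look like this; the window, the names and the absence of public holidays are all assumptions.

```python
from datetime import datetime, time, timedelta

# HYPOTHETICAL service window; the real window depends on the SLA.
WINDOW_START = time(8, 0)
WINDOW_END = time(17, 0)


def service_window_time(start: datetime, end: datetime) -> timedelta:
    """Sum the time between start and end that falls inside the window."""
    total = timedelta()
    day = start.date()
    while day <= end.date():
        if day.weekday() < 5:  # 0-4 means Monday to Friday
            window_open = datetime.combine(day, WINDOW_START)
            window_close = datetime.combine(day, WINDOW_END)
            begin = max(start, window_open)
            finish = min(end, window_close)
            if finish > begin:
                total += finish - begin
        day += timedelta(days=1)
    return total


# Timestamps from Table 3.3.1: opened on a Saturday night, resolved on Monday.
opened = datetime(2011, 6, 4, 23, 43, 56)
resolved = datetime(2011, 6, 6, 10, 1, 1)
print(resolved - opened)                      # calendar time
print(service_window_time(opened, resolved))  # time inside the assumed window
```

Under the assumed window, only the part of Monday between 08:00 and the resolution counts, which illustrates how far calendar time and service level time can diverge.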

Observation 5

By using the formula described in Observation 4, we can measure the lead time and two types of waiting time, namely:

• Waiting time between the opened activity and the first reassignment activity;
• Waiting time between the resolved activity and the closed activity.

It is not possible to measure the time dimensions service time and synchronization time, because the event log does not provide data that can be used to measure them. Because we cannot measure the service time and synchronization time, it is also difficult to determine how many skilled staff (human resources) are needed to cope with the current (or future) workload. We therefore recommend that the SPM examines whether the incident management system is able to measure the time dimensions service time and synchronization time.
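The two waiting times can be sketched as differences between the first occurrences of the relevant activities. The example uses the rows of Table 3.3.1 and plain calendar time; applying a service level window is left out. The function names and the activity labels used as keys are assumptions.

```python
from datetime import datetime


def first_timestamp(trace, activity):
    """Timestamp of the first event with the given activity, or None."""
    return next((ts for name, ts in trace if name == activity), None)


def waiting_times(trace):
    """Return both waiting times for one time-ordered trace."""
    opened = first_timestamp(trace, "Opened")
    reassigned = first_timestamp(trace, "Reassignment Group")
    resolved = first_timestamp(trace, "Resolved")
    closed = first_timestamp(trace, "Closed")
    return {
        "opened_to_first_reassignment": reassigned - opened,
        "resolved_to_closed": closed - resolved,
    }


# Rows from Table 3.3.1.
incident = [
    ("Opened", datetime(2011, 6, 4, 23, 43, 56)),
    ("Reassignment Group", datetime(2011, 6, 6, 7, 34, 20)),
    ("Resolved", datetime(2011, 6, 6, 10, 1, 1)),
    ("Closed", datetime(2011, 6, 6, 10, 1, 29)),
]

for name, value in waiting_times(incident).items():
    print(name, value)
```

The sketch assumes that every trace contains all four activities; incidents without a reassignment would need separate handling.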



Observation 6

Because we distinguished the assignment groups from each other in Section 3.1, we can also examine the effect on the service level performance when incidents are handled by one or more assignment groups within the business unit EUS. Table 3.3.4 shows how the service level performance (% Resolved in time Business unit EUS) decreases as incidents are handled by more assignment groups.

By comparing the % Resolved in time Business unit EUS with the % Average norm, we observe that incidents most likely do not meet the service level norm when they are not resolved by the first reassignment group. This effect is illustrated in Figure 3.3.5. In addition, we observe that more incidents were closed after being forwarded to three different assignment groups (Third Reassignment group) than after being forwarded to two assignment groups (Second Reassignment group).

Table 3.3.4 Service level performance

Business unit EUS            # Resolved in time   # Breached   # Total   % Resolved in time   % Average norm
No Reassignment group                       108            4       112                96.4%              85%
First Reassignment group                  13063         2406     15469                84.4%              85%
Second Reassignment group                   911          308      1219                74.7%              85%
Third Reassignment group                    984          458      1442                68.2%              85%
Fourth Reassignment group                   176           86       262                67.2%              85%
Fifth Reassignment group                     89           80       169                52.7%              85%
Nth Reassignment group                       55           67       122                45.1%              85%
Total                                     15386         3409     18795                81.9%              85%

These results show how important it is that incidents are assigned to the correct assignment group in order to meet the service level norm. We therefore recommend that the SPM examines how incidents can be forwarded more efficiently in order to meet the service level norm.
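The percentages in Table 3.3.4 follow directly from the counts, as the short calculation below shows. The counts and the 85% norm are taken from the table; the function and variable names are assumptions.

```python
NORM = 85.0  # % Average norm from Table 3.3.4

# (resolved in time, breached) per row of Table 3.3.4
counts = {
    "No Reassignment group": (108, 4),
    "First Reassignment group": (13063, 2406),
    "Second Reassignment group": (911, 308),
    "Third Reassignment group": (984, 458),
    "Fourth Reassignment group": (176, 86),
    "Fifth Reassignment group": (89, 80),
    "Nth Reassignment group": (55, 67),
}


def pct_resolved_in_time(resolved, breached):
    """Share of incidents resolved in time, as a percentage."""
    return 100.0 * resolved / (resolved + breached)


for group, (resolved, breached) in counts.items():
    pct = pct_resolved_in_time(resolved, breached)
    verdict = "meets norm" if pct >= NORM else "below norm"
    print(f"{group:<28}{pct:5.1f}%  {verdict}")

total_resolved = sum(r for r, _ in counts.values())
total_breached = sum(b for _, b in counts.values())
print(f"{'Total':<28}{pct_resolved_in_time(total_resolved, total_breached):5.1f}%")
```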

Figure 3.3.5 Service level performance



Answering sub question 1

Which event log data must be used as information to visualize the distribution and handling activities of the incident management process?

To visualize the distribution and handling activities, the following data must be used:

Event log data        Information
Incident number       Case ID
Opened activity       Activity ID
Assignment activity   Activity ID
Resolved activity     Activity ID
Closed activity       Activity ID
Time stamps           Waiting time & total time
Elapsed time          Service level time
Control group         Resource ID
Assignment group      Resource ID
Resolved group        Resource ID
Closed group          Resource ID

While answering sub question 1 we also found that, after renaming the assignment groups, we could generate social networks that show how the assignment groups interact with each other. Using the Resolved in time data, we could show the effect on the service level performance when incidents are handled by one or more assignment groups. The recommendations are given in Section 4.2 and discussed with the SPM in Section 4.3.

3.4 Workload prediction

In this section we use the answer to sub question 1 to find information that predicts an increasing workload. Based on the information from the event log we build hypotheses and examine them. For each hypothesis we explain our assumptions, explain what must be done to examine them, and analyze the results. This hypothesis cycle continues until we can define which information business unit managers should extract from the event log to be able to predict a workload increase and see what the effects are of these increasing workloads.

Hypothesis 1

We assume that the time dimensions average waiting time first assignment and total average elapsed time will be affected when the workload increases. We also assume that the number of incidents that breached the service level time will increase when the workload (the number of opened incidents) increases.

Examine hypothesis 1

To examine this hypothesis, we count the number of opened and closed incidents and compare these with the number of incidents that were resolved in time and the number that breached the service level time. In addition, we add the time dimensions and analyze whether a time dimension can be related to an increasing workload.



Observation hypothesis 1

Based on Figure 3.4.1 and Figure 3.4.2 we conclude that we cannot relate a time dimension to an increasing workload. We see that the numbers Resolved in time and Breached are related to the numbers Closed and Resolved, but none of these numbers shows predictive signals that the BUM can act upon.

Figure 3.4.1 Results hypothesis 1

Figure 3.4.2 Results hypothesis 1

To act proactively on an increasing workload, we need to find information in the event log that predicts this increase. In the information visualized in Figure 3.4.1 and Figure 3.4.2 we do not see any warning signals showing that the service level of the product is increasing or decreasing. When the number of Opened incidents increases, this affects the values Closed, Resolved in time and Breached. Because we want to extract information to control the service level of products, we need information that tells us what the effects are on the incidents that breached the service level time or were resolved in time. We therefore formulated the second hypothesis.

Hypothesis 2

We assume that when a business unit is not able to handle an increasing workload,

the value Resolved in time will decrease and the value Breached will increase.

Therefore we think that the difference between the value Opened and Resolved in

time will correlate with the value breached.

Execute hypothesis 2

To examine our hypothesis we need to subtract the value Resolved in time from the value Opened and analyze the relation between this value (Opened – Resolved in time) and the value Breached. In addition, we will show how these results affect the service level performance.
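The value Opened – Resolved in time is a simple per-period difference. The sketch below illustrates the calculation; the monthly counts are hypothetical and not taken from the event log, and the function name is an assumption.

```python
# HYPOTHETICAL monthly counts, for illustration only.
opened = {"2012-01": 2100, "2012-02": 2400, "2012-03": 2250}
resolved_in_time = {"2012-01": 1800, "2012-02": 1850, "2012-03": 1900}


def opened_minus_resolved_in_time(opened, resolved_in_time):
    """Return Opened - Resolved in time for every period present in both inputs."""
    periods = sorted(opened.keys() & resolved_in_time.keys())
    return {p: opened[p] - resolved_in_time[p] for p in periods}


indicator = opened_minus_resolved_in_time(opened, resolved_in_time)
for period, value in indicator.items():
    print(period, value)
```

The same function works for weekly periods if the keys are week labels instead of months.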


Figure 3.4.3 Results hypothesis 2 Relation (Open-resolved in time | Breached)

Figure 3.4.4 Results hypothesis 2 service level performance


In Figure 3.4.3 we can see that the value Opened – Resolved in time is related to the value Breached. In addition, we see that the value Opened – Resolved in time has a predictive character when the workload rapidly increases or decreases. We also see that when the value Opened – Resolved in time increases or decreases, this affects the service level performance in a later time period. To examine how well the values Opened – Resolved in time and Breached correlate⁴ with each other, we measure the correlation coefficient of the two values.

Correlation                    Breached
Opened – Resolved in time      r = 0.74

The correlation coefficient confirms that these two values have a relatively strong relationship.

⁴ The correlation measures the direction and strength of the linear relationship between two quantitative variables. The correlation (r) is always a number between -1 and 1. Values of r close to -1 or 1 indicate a close linear relationship (Moore & McCabe, 2006).
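The correlation coefficient r can be computed with the standard Pearson formula. The sketch below is a plain-Python illustration; the function name and the sample series are assumptions, and the sample values are hypothetical rather than taken from the event log.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# HYPOTHETICAL series: Opened - Resolved in time, and Breached, per period.
opened_minus_resolved = [300, 550, 350, 420, 610]
breached = [120, 210, 160, 150, 240]
print(round(pearson_r(opened_minus_resolved, breached), 2))
```

If one of the series is constant, the denominator is zero and the function raises a `ZeroDivisionError`; the correlation is undefined in that case.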



Now we will examine hypothesis 2 again to see if the value Opened – Resolved in time also has a predictive character, based on weekly results.

Figure 3.4.5 Weekly results hypothesis 2 (Open-resolved in time | Breached)

Figure 3.4.6 Weekly results hypothesis 2 service level performance


In Figure 3.4.5 we can see that the value Opened – Resolved in time still shows a predictive character. However, the relation between the two values appears to be weaker. The correlation coefficient based on these weekly results confirms this.

Correlation                    Breached
Opened – Resolved in time      r = 0.59


Answering sub question 2

Which information should business unit managers extract from the event log to be

able to predict a workload increase and see what the effects are of these increasing

workloads?

Based on our examinations, we observed that when the number of incidents that were

Resolved in time is subtracted from the number of incidents that were opened, the

resulting value has a predictive character with respect to the number of incidents that

breached the service level time. In addition, we see that when the value Opened –

Resolved in time increases or decreases, this affects the service level performance in

a later time period. Our results also show that the value Opened – Resolved in time

has a stronger predictive character based on monthly results (r = 0.74) than based on

weekly results (r = 0.59).
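
A minimal sketch of how the value Opened – Resolved in time could be computed per month or per week. The record layout (the keys opened, resolved and breached) and the sample incidents are assumptions made for illustration. The counting rule is our reading of the text: an incident counts as opened in the period of its opening timestamp, and as resolved in time in the period of its resolve timestamp when it did not breach.

```python
from collections import Counter
from datetime import datetime

def period_key(ts, granularity):
    """Map a timestamp to a month label ('2012-03') or ISO week label ('2012-W10')."""
    if granularity == "month":
        return f"{ts.year}-{ts.month:02d}"
    if granularity == "week":
        iso_year, iso_week, _ = ts.isocalendar()
        return f"{iso_year}-W{iso_week:02d}"
    raise ValueError("granularity must be 'month' or 'week'")

def opened_minus_resolved_in_time(incidents, granularity="month"):
    """Return {period: opened count - resolved-in-time count}."""
    opened = Counter()
    resolved_in_time = Counter()
    for inc in incidents:
        opened[period_key(inc["opened"], granularity)] += 1
        # Only resolved incidents that did not breach count as resolved in time
        if inc["resolved"] is not None and not inc["breached"]:
            resolved_in_time[period_key(inc["resolved"], granularity)] += 1
    periods = sorted(set(opened) | set(resolved_in_time))
    return {p: opened[p] - resolved_in_time[p] for p in periods}

# Hypothetical incidents (NOT the research data)
incidents = [
    {"opened": datetime(2012, 3, 1, 9), "resolved": datetime(2012, 3, 1, 12), "breached": False},
    {"opened": datetime(2012, 3, 5, 9), "resolved": datetime(2012, 3, 20, 9), "breached": True},
    {"opened": datetime(2012, 3, 28, 9), "resolved": datetime(2012, 4, 2, 9), "breached": False},
    {"opened": datetime(2012, 4, 3, 9), "resolved": None, "breached": False},
]

monthly = opened_minus_resolved_in_time(incidents, "month")
weekly = opened_minus_resolved_in_time(incidents, "week")
print(monthly)
print(weekly)
```

Running the same function with both granularities gives the monthly and weekly series that are compared with the number of breached incidents.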

According to the SPM, most BUMs focus on the KPI incidents resolved in time. Our

research results show that the value Opened – Resolved in time can be used to

predict the service level performance. By using this value, BUMs can act

more proactively and therefore be more in control of their service level.

Observation 7

Because the results on a monthly overview are more accurate than the results on a

weekly overview, we recommend that BUMs use the monthly overviews for long-term

decision making and the weekly overviews to see the effects of short-term

decisions.



4 Conclusion

4.1 Conclusion

Which information should business unit managers extract from the incident

management event log to control the service level of their products?

To visualize distribution and handling activities, the BUM should extract the following

information from the event log of the incident management system and import this

information into a process mining tool.

Event log data         Information
Incident number        Case ID
Opened activity        Activity ID
Assignment activity    Activity ID
Resolved activity      Activity ID
Closed activity        Activity ID
Control group          Resource ID
Assignment group       Resource ID
Resolved group         Resource ID
Closed group           Resource ID

By renaming the assignment groups to first reassignment group, second

reassignment group, etc., the BUM can create social networks and visualize how

business units interact with each other.
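
The mapping above can be sketched as a small conversion step that flattens one incident record into event rows with a case ID, an activity, a resource and a timestamp. The record keys, the sample values and the way the ordinal label is attached to the assignment activity are assumptions for this example; they show one possible reading of the renaming step, not the exact procedure used in this research.

```python
import csv
import io

ORDINALS = ["first", "second", "third", "fourth", "fifth"]

def reassignment_label(index):
    """Label for the index-th (0-based) assignment of an incident."""
    ordinal = ORDINALS[index] if index < len(ORDINALS) else f"{index + 1}th"
    return f"{ordinal} reassignment group"

def incident_to_events(incident):
    """Flatten one incident into (case_id, activity, resource, timestamp) rows."""
    case_id = incident["incident"]
    events = [(case_id, "Opened", incident["control_group"], incident["opened_on"])]
    for i, (group, ts) in enumerate(incident["assignments"]):
        # The ordinal label goes into the activity, the real group stays the resource
        events.append((case_id, f"Assigned to {reassignment_label(i)}", group, ts))
    events.append((case_id, "Resolved", incident["resolved_group"], incident["resolved_on"]))
    events.append((case_id, "Closed", incident["closed_group"], incident["closed_on"]))
    return events

def events_to_csv(events):
    """Serialise event rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["case_id", "activity", "resource", "timestamp"])
    writer.writerows(events)
    return buffer.getvalue()

# Hypothetical incident record (NOT taken from the research data)
sample = {
    "incident": "IM0001",
    "control_group": "Service desk",
    "opened_on": "2012-03-01 09:00",
    "assignments": [
        ("Desktop support", "2012-03-01 09:10"),
        ("Network team", "2012-03-01 11:30"),
    ],
    "resolved_group": "Network team",
    "resolved_on": "2012-03-01 14:00",
    "closed_group": "Service desk",
    "closed_on": "2012-03-02 08:30",
}

events = incident_to_events(sample)
print(events_to_csv(events))
```

Because every row carries both the activity and the group that executed it, the resulting file contains the handovers between business units that a social network view is built from.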

By comparing the number of incidents that were Resolved in time with the number

of incidents that were closed in the same time period, the BUM can calculate the

service level performance percentage (KPI incidents resolved in time). By also

adding the variable assignment groups, the BUM is able to see the effect on the

service level performance when incidents are handled by one or more business

units.
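
The calculation described above can be sketched as follows. We assume that the KPI is the number of incidents resolved in time divided by the number of closed incidents, expressed as a percentage, and that each closed incident carries a count of the assignment groups that handled it. The key names and the sample records are hypothetical.

```python
from collections import defaultdict

def service_level_by_group_count(closed_incidents):
    """Percentage of closed incidents resolved in time, per number of assignment groups."""
    closed = defaultdict(int)
    in_time = defaultdict(int)
    for inc in closed_incidents:
        n = inc["assignment_groups"]
        closed[n] += 1
        if not inc["breached"]:
            in_time[n] += 1
    return {n: 100.0 * in_time[n] / closed[n] for n in sorted(closed)}

# Hypothetical closed incidents for one period (NOT the research data)
closed_incidents = [
    {"assignment_groups": 1, "breached": False},
    {"assignment_groups": 1, "breached": False},
    {"assignment_groups": 1, "breached": True},
    {"assignment_groups": 1, "breached": False},
    {"assignment_groups": 2, "breached": True},
    {"assignment_groups": 2, "breached": False},
    {"assignment_groups": 3, "breached": True},
]

performance = service_level_by_group_count(closed_incidents)
for groups, pct in performance.items():
    print(f"{groups} assignment group(s): {pct:.1f}% resolved in time")
```

Filtering the input on one product type before calling the function gives the same breakdown for a single product.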

The BUM can filter on the Product type field to focus on the products

for which he or she is responsible.

By subtracting the number of incidents that were resolved in time from the number

of incidents that were opened in that time period, the BUM obtains

a predictive indicator of how the service level of a product will perform in the

future if no action is taken. Our research results show that this

predictive indicator is more accurate on a monthly overview than on a

weekly overview.

4.2 Recommendations

Based on our observations, the following recommendations are made:

Recommendation 1:

During the process conformance phase, we observed that in 80 cases the incident

management system registered an incorrect timestamp on the closed activity in the event

log. This gives a misleading impression of how incidents were

handled. Therefore we recommend that the settings of the incident management

system be changed, so that BUMs have an accurate view of the incident

management distribution process.

Recommendation 2:

During the process conformance phase, we observed that in 72 cases an incident

handler executed unusual activities in the incident management system between the

moment the incident was resolved and the moment it was closed.

According to the SPM, this type of activity is not part of

the incident management process. Therefore, we recommend that the

SPM examines the cause of this type of activity. Depending on the cause, one of two

changes is needed:

1. The incident handler must execute this type of activity to be able to close

the incident. In this case the SPM needs to change the incident

management process model;

2. The incident handler executes an unnecessary activity. In this case the

incident handler needs to be briefed on how the incident should be handled

within the incident management system.

Recommendation 3:

The incident management system does not add the extra elapsed time when an

incident is reopened after it was closed. This means that there is a relatively

high chance that many incidents breached the service level time after they were reopened.

This automatically affects the service level performance of a product. Therefore we

recommend that the SPM examines the cause of this type of occurrence and shows

how this affects the service level performance. If the effects on the service level

performance are relatively high, we recommend that the settings of the incident

management system be changed, so that BUMs have more accurate

information to control the service level of their products.

Recommendation 4:

It is difficult to measure the number of skills (human resources) that are needed to

stay in control of the service level, because the event log does not provide

activity data that can be used to measure the time dimensions service time and

synchronization time. We recommend that the SPM examines the possibility of

measuring the service times. If this is possible, the BUM can compare this information

with the number of skills (human resources) and the KPI incidents resolved in time

and calculate the number of extra skills (human resources) that are needed when a

service level decreases.

Recommendation 5:

The performance of the service level rapidly decreases when incidents are handled by

two or more assignment groups. Therefore it is important that the quality of the

information with which incidents are registered improves, so that the incident

coordinator knows which business unit must resolve the incident. If this quality can

be increased, this will automatically affect the performance of the service level in a

positive way. We recommend that the SPM examines how the quality of the

information can be increased so that incident handlers can act more efficiently and

more effectively.



Recommendation 6:

By subtracting the number of incidents that were resolved in time from the number

of incidents that were opened in that time period, the BUM obtains

a predictive indicator of how the service level of a product will perform in the

future if no action is taken. Because the results on a monthly overview

are more accurate than the results on a weekly overview, we recommend that BUMs

use the monthly overviews for long-term decision making (approximately 4 weeks)

and the weekly overviews to see the effects of short-term decisions

(approximately 1 week).

4.3 Discussion

Based on the conclusions and recommendations, the SPM stated the following:

The research results are very interesting, because now we know that we can use

valuable information from the event log of the incident management system to

control the performance of our service levels.

Recommendation 1:

Although these unusual time stamp registrations will not affect the performance of

the service level, it is interesting to see that process mining techniques can visualize

these kinds of problems. We always strive to improve our processes including the

systems that support the handling of these processes. Therefore I will ask an expert

to examine this problem and change the registration activities when this is possible.

Recommendation 2:

This unusual activity also does not affect the performance of the service level. Based

on your observation, I would like to know why this activity is executed. Therefore I

will ask a process manager to examine this type of activity and make changes when

this is needed.

Recommendation 3:

It is important that the incident management system registers the absolute elapsed

time. Therefore I will examine what percentage of the incidents was reopened

after being closed. If this percentage is significant, then we will look for

ways to measure and register the absolute elapsed time within our

incident management system.

Recommendation 4:

At the moment it is not possible to measure the service time from the incident

management system. To measure the service times, we extract data from our

Enterprise Resource Planning (ERP) system and compare this information with the

incidents that were resolved per employee. These results give us a good estimation

of how many extra skills (human resources) are needed. Therefore we do not need to

examine the possibilities to measure the service time from the incident management

system.

Recommendation 5:

By visualizing the effects on the performance of a service level when incidents are

handled by two or more assignment groups, we see that the quality of the information with

which incidents are registered must be improved. If we are able to improve the

quality of information, incidents will be resolved quicker. In addition, if the incident

coordinators are better able to assign the incidents to the correct assignment

group, the workload of other assignment groups will decrease. These effects will

increase the performance of the service levels. Therefore I will examine how we can

improve the quality of the information with which incidents are registered.

Recommendation 6:

The research results show that we can use relatively simple data as

a predictive indicator to control our service levels in a proactive way. Unfortunately,

we can only use these variables based on judgment. I will investigate whether we can use

these variables on different product types. If so, I will ask the BUM to use

these variables and see what the effects are on our service levels.



References

Literature:

• Jonker, J. & Pennink, B.J.W. (2004). De kern van methodologie. De kern van

organisatieonderzoek. 2e dr. Assen: Koninklijke Van Gorcum.

• Leeuw, A.C.J. de. (2005). Bedrijfskundige methodologie. Management van

onderzoek. 6e dr. Assen: Koninklijke Van Gorcum.

• Turban, E. & Sharda, R. & Delen, D. (2011). Decision Support and Business

Intelligence Systems. 9th edition. New Jersey: Pearson Education, Inc.

• Aalst, W.M.P. van der (2011). Navigeren met process mining. Automatisering

Gids.

• Aalst, W.M.P. van der & Reijers, H.A. & Weijters, A.J.M.M. & Dongen, B.F.

van & Alves de Medeiros, A.K. & Song, M. & Verbeek, H.M.W. (2007).

Business process mining: An industrial application. Information systems

Volume 32, issue 5, pages 713-732. Amsterdam: Elsevier.

• Aalst, W.M.P. van der (2011). Process mining. Dordrecht: Springer.

• Aalst, W.M.P. van der & Hee, K.M. van (2002). Workflow Management:

Models, Methods, and Systems. Cambridge: MIT Press.

• Moore, D.S. & McCabe, G.P. (2006). Introduction to the practice of statistics,

fifth edition. W.H. Freeman and Company.

Internet sources Process mining tooling:

• Process mining tool ProM

www.processmining.org

• Process mining tool Disco

http://www.fluxicon.com



Appendix



Event log information

Nr.  Name                      Description
1    Agreement ID              Service level contract number
2    Assignee                  Incident handler who executed the activity
3    Assignment group          Business unit to which the incident was assigned when the activity was handled
4    Breached                  Was the incident resolved in time (True, False)
5    Brief Description         One-line description of the incident problem
6    Calamity                  Did the incident lead to a calamity
9    Closed Group              Business unit that closed the incident
10   Closed By                 Incident handler who closed the incident
11   Closed on (date/time)     Date and time when the incident was closed
15   Company                   Company that registered the incident
17   Control group             Business unit that was responsible for managing the incident
17   Impact                    Impact of the incident (e.g. Users, Site, Enterprise)
18   Incident                  Registered incident number
19   Incident (type)           - Incident: a (potential) disruption of an agreed service
                               - Pro-active incident: a (system) message that is not (yet) a disruption of service
                               - Information request: a question about a service
                               - User support: a request to provide user support for a service
20   Linked to problem         Whether the incident is linked to a problem
21   Norm                      Service level resolve time (e.g. 4 hours, 11 hours, 33 hours, 110 hours)
22   Opened by                 Incident handler who opened the incident
23   Opened on (date/time)     Date and time when the incident was opened
27   Priority                  Priority of the incident, based on the Impact and Urgency variables (e.g. Low, Standard, High, Major, Critical)
28   Problem Type              Product type name (e.g. Desktop, Laptop)
31   Resolved group            Assignment group that resolved the incident
32   Resolved by               Incident handler who resolved the incident
33   Resolved on (date/time)   Date and time on which the incident was resolved
35   SLA title                 Name of the Service Level Agreement
36   SLO end date/time         Date and time when the incident was closed
38   SLO expiration date/time  Date and time by which the incident must be resolved
40   SLO name                  Service Level Object name
41   SLO start date            Date and time when the incident was opened
44   Suspended                 Is the incident currently suspended? (True/False)
45   Ticket status             Closed, Open
46   Urgency                   Urgency of the incident (e.g. Low, Normal, Major)
47   Elapsed time              Measured time between open activity and resolved activity
