advanced information systems engineering · giulio petrucci . caise 2017 doctoral consortium...

Advanced Information Systems Engineering 29th International Conference CAiSE 2017 Essen, Germany, June 12-16, 2017 Proceedings of CAiSE Forum and Doctoral Consortium Papers Edited by Xavier Franch Universitat Politècnica de Catalunya, Spain Jolita Ralyté University of Geneva, Switzerland Raimundas Matulevičius University of Tartu, Estonia Camille Salinesi University Paris 1 Panthéon Sorbonne, France Roel Wieringa University of Twente, The Netherlands


  • Advanced Information Systems Engineering 29th International Conference CAiSE 2017

    Essen, Germany, June 12-16, 2017

    Proceedings of CAiSE Forum and Doctoral Consortium Papers

    Edited by

    Xavier Franch

    Universitat Politècnica de Catalunya, Spain

    Jolita Ralyté

    University of Geneva, Switzerland

    Raimundas Matulevičius

    University of Tartu, Estonia

    Camille Salinesi

    University Paris 1 Panthéon Sorbonne, France

    Roel Wieringa

    University of Twente, The Netherlands

  • CAiSE 2017 Forum and Doctoral Consortium Papers Proceedings

    This volume of CEUR-WS Proceedings contains 20 Forum and 4 Doctoral Consortium papers presented at the 29th International Conference on Advanced Information Systems Engineering (CAiSE 2017). The conference was held in Essen, Germany, June 12-16, 2017. Copyright © 2017 for the individual papers by the papers’ authors. Copying permitted only for private and academic purposes. This volume is published and copyrighted by its editors. CEUR-WS.org, ISSN 1613-0073

  • CAiSE 2017 Forum Foreword

    The objective of the CAiSE conferences is to provide a forum for the exchange of experience, research results, ideas and prototypes between the research community and industry, in the field of information systems engineering. Over almost three decades, the conference has become the yearly worldwide meeting point for the information systems engineering community. This year, the 29th edition of the CAiSE conference is held in Essen, Germany, from the 12th to the 16th of June 2017.

    One of the usual tracks in the CAiSE conference is the Forum, and this year is no exception. The Forum sessions facilitate interaction, discussion, and the exchange of ideas among presenters and participants. Intended to serve as an interactive platform, the Forum aims at the presentation of emerging new topics and controversial positions, as well as the demonstration of innovative systems, tools and applications. Accordingly, two types of submissions were invited to the Forum:

    Visionary papers present innovative research projects that are still at a relatively early stage and do not necessarily include a full-scale validation.

    Demo papers describe innovative tools and prototypes that implement the results of research efforts. The tools and prototypes are presented as demos in the Forum.

    Each submission to the CAiSE’17 Forum was reviewed by three Program Committee members. Only those submissions for which there was an agreement on the relevance, novelty and rigor were accepted for presentation in the Forum. Additionally, some papers were invited to the Forum as a result of the evaluation process in the main conference. All in all, there was a total of 20 papers that were presented as part of the main conference program. The presenters gave a 3-minute elevator pitch and were available to discuss their work through a poster and/or system demonstration in a dedicated session. The 8-page papers describing the works are compiled in these proceedings.

    We would like to thank everyone who contributed to the CAiSE’17 Forum. First, we thank our excellent Program Committee members, who provided thorough evaluations of the papers and contributed to the promotion of the event. We thank all the authors who submitted and presented papers to the Forum for having shared their work with the community. Last, we would like to thank the CAiSE’17 Program Committee and General Chairs as well as the Local Organization Committee for their support.

    June 2017

    Xavier Franch Jolita Ralyté

    CAiSE Forum Co-Chairs

  • CAiSE’17 Forum Co-Chairs

    Xavier Franch, Universitat Politècnica de Catalunya, Spain
    Jolita Ralyté, University of Geneva, Switzerland

    CAiSE’17 Forum Committee

    Carina Alves, Universidade Federal de Pernambuco, Brazil
    Said Assar, Institut Mines-Telecom, France
    Juan Pablo Carvallo, CEDIA; Universidad De Cuenca, Ecuador
    Dolors Costal, Universitat Politècnica de Catalunya, Spain
    Rébecca Deneckère, Université Paris 1 Panthéon Sorbonne, France
    Deepak Dhungana, Siemens, Austria
    Christophe Feltus, Luxembourg Institute of Science and Technology, Luxembourg
    Agnès Front, University of Grenoble, France
    Smita Ghaisas, Tata, India
    Chiara Ghidini, Fondazione Bruno Kessler, Italy
    Irit Hadar, University of Haifa, Israel
    Jennifer Horkoff, University of Gothenburg, Sweden
    Marta Indulska, University of Queensland, Australia
    Haruhiko Kaiya, Kanagawa University, Japan
    Evangelia Kavakli, University of the Aegean, Greece
    Christian Kop, Alpen-Adria-Universitaet Klagenfurt, Austria
    Dejan Lavbič, University of Ljubljana, Slovenia
    Lysanne Lessard, University of Ottawa, Canada
    Emmanuel Letier, University College London, UK
    Grace Lewis, Software Engineering Institute, USA
    Gilles Perrouin, University of Namur, Belgium
    Pilar Rodríguez, University of Oulu, Finland
    Marcela Ruiz, Utrecht University, Netherlands
    Arnon Sturm, Ben-Gurion University of the Negev, Israel
    Gianluigi Viscusi, EPFL, Switzerland
    Yong Xia, IBM China, China

    Additional Reviewers

    Faeq Alrimawi
    Sorren Hanvey
    Giulio Petrucci

  • CAiSE 2017 Doctoral Consortium Foreword

    The CAiSE 2017 Doctoral Consortium (DC) was the 24th Doctoral Consortium of a series held in conjunction with the International CAiSE conference. It brought together PhD students working on foundations, techniques, tools and applications of Information Systems Engineering and provided them with an opportunity to present and discuss their research with an audience of peers and senior faculty in a supportive environment. The CAiSE 2017 DC was a unique opportunity to:

    − Give the selected doctoral students fruitful feedback and advice on their research project;

    − Meet experts from different backgrounds working on topics related to the Information Systems Engineering field;

    − Interact with other doctoral students and stimulate an exchange of ideas and suggestions among participants;

    − Discuss concerns about research, supervision, the job market, and other career-related issues.

    The doctoral students involved were selected after a careful evaluation of their papers by a couple of senior academics. Besides the usual quality, originality and thematic criteria, candidates had to have at least 6-12 months of work remaining before expected completion (and at least 12 months of work already performed), so as to benefit fully from the Doctoral Consortium. Based on the recommendations provided by the mentors, papers were revised before publication in the proceedings. A collection of 4 papers was selected and then presented at the meeting:

    In Stage-based Business Process Mining, Hoang Nguyen presented a set of techniques for process mining at the level of process stages. The major goals of the research were to extract business process stages from event logs, to mine process logs and to perform predictive process monitoring at different process stages. The paper reports preliminary results on discovering process stages and on mining process logs based on those stages.

    Christian Fleig presented the paper Towards the Design of a Process Mining-Enabled Decision Support System for Digital Business Process Transformation. The author proposed process mining-enabled decision support systems to guide the development and transformation of business processes.

    Towards Operationalization of Business Models: Designing Service Compositions for Service-Dominant Business Models by Bambang Suratno considered the transformation of business process models into service compositions. A successful composition requires an understanding of the essential properties of, and guidance for, service composition and execution.

    Alex Mircoli, in his presentation of Automatic Emotional Text Annotation Using Facial Expression Analysis, discussed an approach to enriching textual information with emotional aspects. The major challenge of this study is to explain how to capture such information from speech and video presentations.

    Last but not least, the CAiSE DC featured a short tutorial on research methods given by Prof. Roel Wieringa.

  • We would like to warmly thank the DC mentors for their dedication and advice to the doctoral students. We hope the students could benefit fully from all the advice provided on their papers and during the meeting, and we wish them a long and fruitful career in research and higher education.

    June 2017

    Raimundas Matulevičius
    Camille Salinesi
    Roel Wieringa

    CAiSE 2017 DC Co-Chairs

    Doctoral Consortium Co-Chairs

    Raimundas Matulevičius, University of Tartu, Estonia
    Camille Salinesi, Université Paris 1 Panthéon Sorbonne, France
    Roel Wieringa, University of Twente, The Netherlands

    Doctoral Consortium Mentors

    Marite Kirikova, Riga Technical University, Latvia
    Selmin Nurcan, Université Paris 1 Panthéon Sorbonne, France
    Oscar Pastor, Universitat Politècnica de Valencia, Spain
    Barbara Pernici, Politecnico di Milano, Italy
    Hans Weigand, Tilburg University, The Netherlands
    Jelena Zdravkovic, Stockholm University, Sweden

  • Table of Contents

    CAiSE 2017 Forum Papers

    A Data-Driven Approach to Improve the Process of Data-Intensive API Creation and Evolution

    Alberto Abelló, Claudia Ayala, Carles Farré, Cristina Gómez, Marc Oriol, and Oscar Romero

    1

    Smart Logistics: An Enterprise Architecture Perspective

    Prince M. Singh, Marten van Sinderen, Roel Wieringa

    9

    Enriching Business Artifacts with Coordination

    Matteo Baldoni, Cristina Baroglio, Federico Capuzzimati, and Roberto Micalizio

    17

    EthDrive: A Peer-to-Peer Data Storage with Provenance

    Xiao Liang Yu, Xiwei Xu, and Bin Liu

    25

    Hybrid Remote Expert - an Emerging Pattern of Industrial Remote Support

    Ethan Hadar, Joseph Shtok, Benjamin Cohen, Yochay Tzur, and Leonid Karlinsky

    33

    XES Tensorflow – Process Prediction using the Tensorflow Deep-Learning Framework

    Joerg Evermann, Jana-Rebecca Rehse, and Peter Fettke

    41

    A Process Mining Based Model for Customer Journey Mapping

    Gaël Bernard and Periklis Andritsos

    49

    VarMeR – A Variability Mechanisms Recommender for Software Artifacts

    Iris Reinhartz-Berger and Anna Zamansky

    57

    Cloudy with a Chance of Usage? – Towards a Model of Cloud Computing Adoption in German SME

    Robert Deil and Philipp Brune

    65

    Model Fragment Reuse Driven by Requirements

    Raúl Lapeña, Jaime Font, Carlos Cetina, and Óscar Pastor

    73

    Regerator: a Registry Generator for Blockchain

    An Binh Tran, Xiwei Xu, Ingo Weber, Mark Staples, and Paul Rimba

    81

    Regression Testing for Visual Models

    Ralf Laue, Arian Storch, and Markus Schnädelbach

    89

    Privacy Level Agreements for Public Administration Information Systems

    Vasiliki Diamantopoulou, Michalis Pavlidis, and Haralambos Mouratidis

    97

  • Artifact-driven Process Monitoring: Dynamically Binding Real-world Objects to Running Processes

    Giovanni Meroni, Claudio Di Ciccio, and Jan Mendling

    105

    GH4RE: Repository Recommendation on GitHub for Requirements Elicitation Reuse

    Roxana Lisette Quintanilla Portugal, Marco Antonio Casanova, Tong Li, and Julio Cesar Sampaio do Prado Leite

    113

    Improving Problem Resolving on the Shop Floor by Context-Aware Decision Information Packages

    Eva Hoos, Pascal Hirmer, and Bernhard Mitschang

    121

    Information Logistics and Fog Computing: The DITAS* Approach

    Pierluigi Plebani, David Garcia-Perez, Maya Anderson, David Bermbach, Cinzia Cappiello, Ronen I. Kat, Frank Pallas, Barbara Pernici, Stefan Tai, and Monica Vitali

    129

    Towards Multi-decision-maker Requirements Prioritisation via Multi-Objective Optimisation

    Fitsum Meshesha Kifetew, Angelo Susi, Denisse Muñante, Anna Perini, Alberto Siena, and Paolo Busetta

    137

    An Empirical Evaluation to Identify Conflicts Among Quality Attributes in Web Services Monitoring

    Jael Zela Ruiz and Cecilia M. F. Rubira

    145

    Business Process Modelling for a Data Exchange Platform

    Christoph Quix, Arnab Chakrabarti, Sebastian Kleff, and Jaroslav Pullmann

    153

    CAiSE 2017 Doctoral Consortium Papers

    Stage-based Business Process Mining

    Hoang Nguyen

    161

    Towards the Design of a Process Mining-Enabled Decision Support System for Business Process Transformation

    Christian Fleig

    170

    Towards Operationalization of Business Models: Designing Service Compositions for Service-Dominant Business Models

    Bambang Suratno

    179

    Automatic Emotional Text Annotation Using Facial Expression Analysis

    Alex Mircoli

    188

  • Stage-based Business Process Mining

    Hoang Nguyen1

    Supervisors: Marcello La Rosa1, Marlon Dumas2 and Arthur H.M. ter Hofstede1

    1 Queensland University of Technology, [email protected], {a.terhofstede,m.larosa}@qut.edu.au

    2 University of Tartu, Estonia, {marlon.dumas}@ut.ee

    Abstract. Evidence-based BPM has gained significant momentum in recent years, thanks to the widespread adoption of enterprise systems that store detailed business process execution data in event logs. Techniques for analyzing business processes using event logs are termed “process mining” techniques. Their objective is to aid business analysts in improving business processes by learning knowledge from massive data. To date, techniques for process mining abound. For example, one can measure processing time and waiting time, diagnose process delays and quality issues, and replay an entire event log over a process model discovered from the log itself. However, these techniques often suffer from limited applicability, particularly when used on top of unpredictable processes such as patient treatment processes in healthcare, as opposed to predictable processes such as a car manufacturing process. They fail to extract well-fitting process models, are awkward at measuring process performance, and are inaccurate in predictive monitoring. In addition, they offer no clear way to divide the problem into sub-problems for better solutions. This research aims at designing a novel set of techniques based on a notion of business process stages which can improve over existing process mining techniques.

    Keywords: Business process management, process mining, multistage, stage-based, decomposition

    1 Research Motivation

    Process Mining [1] originated in the field of Business Process Management (BPM), which oversees and improves human work in organizations [2]. Therefore, Process Mining is also concerned with common tasks in BPM such as process performance analysis, conformance checking and root cause analysis. However, unlike the social science branch of BPM, which relies on interviews, workshops and surveys for data collection, Process Mining focuses on analysing large and rich business process data (called event logs) available in enterprise IT systems in order to extract useful knowledge [3]. Process Mining is thus a bridge between BPM and data mining.

    Like data mining, process mining techniques exploit data features (or variables) in event logs to learn useful knowledge for process improvement. These techniques fall into a number of categories. Process discovery [4] derives process models from event logs. Conformance checking [5] aligns an event log with a process model to verify whether the process execution complies with the process design. Performance analysis [6] measures process performance metrics to identify bottlenecks. Deviance mining [7] derives business rules from event logs that can explain the root cause of positive or negative deviants. Predictive monitoring [8] aims at building predictive models that allow one to make forecasts of process performance. Finally, comparative analysis [9] contrasts process variants and extracts distinguishing behaviors.

    From another perspective, these techniques fall into two themes: structural analysis and behavioral analysis. In structural analysis, the purpose is to search event logs for a structure that closely represents the process, e.g. a process model, which can support performance analysis and conformance checking and serve as a basis for process re-engineering. In behavioral analysis, the purpose is to search for a set of behaviors (e.g. activity patterns) that are strongly correlated with a target variable, e.g. long case duration. Behavioral analysis is common in deviance mining and predictive monitoring based on trained classifiers such as decision trees [8], random forests [10], and neural networks [11]. In addition, some works also regard process models as a source of generalized behaviors for descriptive analysis [9].

    X. Franch, J. Ralyté, R. Matulevičius, C. Salinesi, and R. Wieringa (Eds.): CAiSE 2017 Forum and Doctoral Consortium Papers, pp. 161-169, 2017. Copyright 2017 for this paper by its authors. Copying permitted for private and academic purposes.

    Thus far, the main challenge to process mining is that many event logs exhibit a highly complex feature space. For example, real-life event logs can be found on the Business Process Intelligence web site from 2011 to 2017 (footnote 1). Notably, they often record knowledge-intensive processes [12] such as patient treatment, insurance claim handling, IT incident handling, and loan application assessment. Their feature space often includes, but is not limited to, activities, humans, data payload, process context [13] and timestamps. Three main challenges of this feature space are the heterogeneity of case context, the decomposition into sub-processes, and the variability of data features. Different case contexts exist because process cases, e.g. customer orders or patients, are often prioritized based on different types, e.g. low-value and high-value cases, and processed differently. Mixing case contexts can therefore create greater variation in data features, making process structural analysis more difficult. Sub-processes often exist and may be in sequence, in parallel or overlapping. They are interrelated but fairly independent. Ignoring these sub-processes in an analysis might cause inaccurate models. Moreover, the inherent variability of data features in a business environment is a challenge to frequent feature mining for business processes. In many cases, it is the combination of these three challenges that creates a very heterogeneous feature space. Consequently, the current problems faced by process mining are scalability and accuracy. For example, process discovery techniques struggle with ill-structured processes [14]. Mining human-readable rules from event logs remains an issue [7]. The error rate of predictive monitoring remains remarkably high [15, 11].

    Various process mining techniques have been proposed to deal with the above complex feature space. A common approach is to decompose event logs into clusters, making it possible to work with clusters (i.e. a higher abstraction level) instead of individual events. This is also known as the divide-and-conquer approach, which has been implemented for process discovery [16–20], conformance checking [19], performance analysis [21], deviance mining [22], and predictive monitoring [23]. Decomposition can be horizontal (i.e. by cases) or vertical (i.e. by activities). However, although scalability has been improved, the accuracy issue remains [18, 19, 23, 11]. The proposed techniques seem ad hoc in that they work only with some specific datasets and struggle with others. Empirical results suggest several reasons. For structural analysis, the proposed decompositions may underrepresent the real process structure [24]. Thus, when the models are tested against the logs, the result has low fitness and precision [6]. For behavioral analysis, despite the use of decomposition and strong classifiers, the accuracy can be affected by limited coverage of the process feature space, e.g. when a classification model contains only control-flow features but many process cases are driven by resources and context [11].

    1 www.win.tue.nl/bpi/doku.php?id=2017:challenge
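To make the two decomposition directions concrete, the following minimal Python sketch splits a toy event log horizontally (by cases) and vertically (by activities). The log, the cluster labels and the function names are illustrative assumptions; they are not taken from the cited techniques, which derive the clusters automatically rather than from a hand-written mapping.

```python
from collections import defaultdict

# Toy event log: (case_id, activity) pairs in execution order (hypothetical data).
EVENT_LOG = [
    ("c1", "register"), ("c1", "diagnose"), ("c1", "treat"),
    ("c2", "register"), ("c2", "treat"),
    ("c3", "register"), ("c3", "diagnose"), ("c3", "treat"),
]


def horizontal_split(log, cluster_of_case):
    """Split by cases: every event of a case goes to that case's cluster."""
    clusters = defaultdict(list)
    for case_id, activity in log:
        clusters[cluster_of_case[case_id]].append((case_id, activity))
    return dict(clusters)


def vertical_split(log, cluster_of_activity):
    """Split by activities: each case is cut into per-cluster fragments."""
    clusters = defaultdict(list)
    for case_id, activity in log:
        clusters[cluster_of_activity[activity]].append((case_id, activity))
    return dict(clusters)


# Hand-written cluster assignments, assumed for illustration only.
by_case = horizontal_split(EVENT_LOG, {"c1": "full", "c2": "short", "c3": "full"})
by_activity = vertical_split(
    EVENT_LOG, {"register": "intake", "diagnose": "care", "treat": "care"}
)
```

In the horizontal split each cluster holds complete cases, whereas in the vertical split every case contributes a fragment to each cluster whose activities it executed.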

    From the above background, this research proposes a novel process mining approach based on a notion of business process stages. Semantically, stages are a common way for humans to divide their work into manageable parts. A stage is thus also a sub-process. For example, an outpatient treatment process involves stages such as reception, diagnosis, medication, and consultation. Stages have also been observed in BPM research and real datasets, including patient treatment [9], IT service delivery [25], government agency processes [26], bank loan application [27], and product development [28]. Traditionally, process stages have been studied in different disciplines. For example, in manufacturing they are known through the state space model for fault diagnosis [29, 30]. In patient flow research, the corresponding notion is the compartment model [31–33]. In product development, it is known as the stage-gate model [34]. Recently, stage-based analysis has been studied in process mining for inter-organizational comparative analysis, but only on a manual basis [9, 35]. Continuing this stream, this research aims to develop stage-based techniques for knowledge-intensive processes, taking advantage of event logs and foundational techniques of process mining.

    The intuition here is that stages can help to improve process mining techniques. Data features within the same stage tend to exhibit stronger relationships than those from different stages; thus, stage-based techniques could produce better results than those applied to the whole process. For example, stages could provide a vertical decomposition of event logs (i.e. by stages) in order to improve the quality of process models. The first question is how to discover stages from event logs that mimic the actual stage decomposition. Once stages have been correctly discovered, they can be used to discover process models by stages instead of one flat model for the whole log. Another application of stages is to measure flow performance [36]. This kind of performance is of particular interest in service organizations such as hospitals, product development and IT services because they are concerned with how smoothly cases are pulled through the organization. Since a stage decomposition consists of adjacent stages, each of which is a fairly independent queueing system, the flow of cases (i.e. queuing items) can be measured with queuing measures computed from event logs, e.g. arrival rate, departure rate, and queue length. In addition, in predictive monitoring, it could be more accurate to build classifiers within a stage to provide predictions within that stage only, combined with inter-stage classifiers to provide a final prediction.
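As an illustration of the queuing measures mentioned above, the sketch below computes arrival rate, departure rate and queue length for a single stage. It assumes that each case's entry and exit times for the stage have already been extracted from the event log; the data and the function names are hypothetical.

```python
# Hypothetical visits to one stage: (case_id, enter_time, exit_time), in hours.
STAGE_VISITS = [
    ("c1", 0.0, 4.0),
    ("c2", 1.0, 3.0),
    ("c3", 2.0, 8.0),
    ("c4", 5.0, 9.0),
]


def arrival_rate(visits, start, end):
    """Cases entering the stage per time unit within [start, end)."""
    arrivals = sum(1 for _, enter, _ in visits if start <= enter < end)
    return arrivals / (end - start)


def departure_rate(visits, start, end):
    """Cases leaving the stage per time unit within [start, end)."""
    departures = sum(1 for _, _, leave in visits if start <= leave < end)
    return departures / (end - start)


def queue_length(visits, t):
    """Cases inside the stage at time t (entered but not yet exited)."""
    return sum(1 for _, enter, leave in visits if enter <= t < leave)
```

When the arrival rate of a stage stays above its departure rate over a period, its queue length grows, which is one simple signal of a bottleneck forming in that stage.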

    2 Research Problems & Research Questions

    The previous section has discussed current research problems in detail. They are summarized as follows.

    1. Current process discovery techniques suffer from low accuracy for ill-structured processes

    2. Current process performance analysis techniques are limited in measuring the flow performance of business processes

    3. Current predictive process monitoring techniques suffer from high error rate for ill-structured processes


    Our research will be structured to address the following research questions:

    1. How to discover business process stages from event logs?

    2. How to mine business process performance from event logs based on stages?

    3. How to discover process models from event logs based on stages?

    4. How to perform predictive process monitoring in stages?

    3 Research Approach

    This research project aims at developing stage-based techniques that can produce better results than existing techniques. We consider Design Science (DS) [37] a relevant research method, as its nature is to produce knowledge based on the development of artifacts (e.g. models, frameworks, and methods) to solve a problem [37]. In our research, the problems are the research problems summarized above and the artifacts are computer software that implements our proposed techniques.

    Following the Design Science method, this project will primarily undergo five main steps to develop a technique [37]: (i) Define the problem; (ii) Suggest a solution; (iii) Develop artifacts; (iv) Evaluate the artifacts; (v) Conclude. Among these, the validity of DS-based research is mainly determined by the evaluation of the artifacts [38]. There are different validation approaches including observational, analytical, experimental, testing and descriptive [39]. This project will mainly take the experimental approach given the data-driven nature of the research.

    A rigorous approach to experimental evaluation is thus vital to this project. The evaluations will generally consist of two parts: data-based and user-based. The former makes use of objective and quantitative measures, while the latter involves humans, where needed, in qualitative assessment. An outline of the research experiments is given below.

    – Experiments will be carried out on event logs of varied characteristics

    – Evaluation will be performed based on well-established criteria in Data Mining and Process Mining

    – Controlled experimentation [40] will be conducted with stakeholders, where needed, to evaluate the subjective aspect of the research criteria

    – The proposed technique will be benchmarked against baselines available in the literature

    With regard to data collection and analysis, event logs are the main datasets used for experiments in this research. Access to data is planned in the following ways:

    – Synthetic datasets will be created for the first validation using business process simulation software, e.g. BIMP (footnote 2) and CPN-Tools (footnote 3).

    – Real-life datasets will be sourced from repositories of publicly available logs and from industrial as well as academic partners. The publicly available logs are provided on academic public data repositories such as 3TU.Datacentrum (footnote 4) of Eindhoven University of Technology, which have been used as benchmarking data for experiments in previous Process Mining research.

    – The student may request access to datasets of other research projects within the BPM Discipline. The request will be in compliance with the Ethics Clearance of the projects.

    – The student will contact the pool of industrial partners of the BPM Discipline, such as Commonwealth Bank of Australia, Suncorp and St Andrews War Memorial Hospital, to access further real-life logs, should this be needed.

    2 bimp.cs.ut.ee

    3 www.cpntools.org

    4 data.3tu.nl/repository/collection:event logs

    4 Preliminary Results

    The research thus far has worked towards addressing the first two research questions: mining process stages from event logs and mining process performance based on stages. The results are reported in the following sections.

    4.1 Mining Business Process Stages from Event Logs

    Process mining techniques suffer from scalability issues when applied to large event logs, both in terms of computational requirements and in terms of interpretability of the produced outputs. For example, process models discovered from large event logs are often spaghetti-like and provide limited insights [14].

    A common approach to tackle this limitation is to decompose the process into stages, such that each stage can be mined separately. This idea has been successfully applied in the context of automated process discovery [24] and performance mining [41]. The question is then how to identify a suitable set of stages and how to map the events in the log into stages. For simpler processes, the stage decomposition can be manually identified, but for complex processes, automated support for stage identification is required. Accordingly, several automated approaches to stage decomposition have been proposed [18, 19, 42]. However, these approaches have not been designed with the goal of approximating manual decompositions, and as we show in this work, the decompositions they produce turn out to be far apart from the corresponding manual decompositions.

    This paper puts forward an automated technique to split an event log into stages, in a way that mimics manual stage decompositions. The proposed technique is designed based on two key observations: (i) that stages are intuitively fragments of the process in-between two milestone events; and (ii) that the stage decomposition is modular, meaning that there is a high number of direct dependencies inside each stage (high cohesion), and a low number of dependencies across stages (low coupling) – an observation that has also been applied in the context of process model decomposition [43] and more broadly in the fields of systems design and programming in general. For example, a loan origination process at a bank has multiple stages in which the application is assessed (accepted/rejected milestone), offered (offer letter sent milestone), negotiated (agreement signed milestone), and settled (agreement executed milestone). There may be many back-and-forth moves or jumps inside a stage, but relatively few across these stages.

    The proposed technique starts by constructing a graph of direct control-flow dependencies from the event log. Candidate milestones are then identified by using techniques for computing graph cuts. A subset of these potential cut points is finally selected in a way that maximizes the modularity of the resulting stage decomposition according to a modularity measure borrowed from the field of social network analysis. The technique has been evaluated using real-life logs in terms of its ability to approximate manual decompositions using a well-accepted measure for the assessment of cluster quality.
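The modularity-based selection step can be sketched as follows. The code builds a directly-follows graph from traces and scores a candidate stage decomposition with a directed, weighted form of Newman's modularity. The exact modularity formula, the toy traces and all names are assumptions made for illustration, since the text above does not specify them; the graph-cut step that proposes candidate milestones is omitted.

```python
from collections import Counter

# Toy traces (hypothetical): activities a, b loop together, as do c, d.
TRACES = [
    ["a", "b", "c", "d"],
    ["a", "b", "a", "b", "c", "d"],
    ["a", "b", "c", "d", "c", "d"],
]


def directly_follows(traces):
    """Count how often activity src is directly followed by activity dst."""
    dfg = Counter()
    for trace in traces:
        for src, dst in zip(trace, trace[1:]):
            dfg[(src, dst)] += 1
    return dfg


def modularity(dfg, stage_of):
    """Directed weighted modularity of a stage assignment (assumed formula)."""
    total = sum(dfg.values())
    out_weight, in_weight = Counter(), Counter()
    for (src, dst), weight in dfg.items():
        out_weight[src] += weight
        in_weight[dst] += weight
    score = 0.0
    for src in stage_of:
        for dst in stage_of:
            if stage_of[src] == stage_of[dst]:
                # Observed intra-stage weight minus the weight expected by chance.
                expected = out_weight[src] * in_weight[dst] / total
                score += dfg[(src, dst)] - expected
    return score / total


dfg = directly_follows(TRACES)
cohesive = {"a": 1, "b": 1, "c": 2, "d": 2}   # keeps each loop inside one stage
scattered = {"a": 1, "b": 2, "c": 2, "d": 1}  # cuts through both loops
```

Under these assumptions the cohesive assignment scores higher than the scattered one, matching the high-cohesion, low-coupling intuition; a selection procedure would keep the candidate cut points that yield the highest score.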


    4.2 Mining Process Performance Based on Staged Process Flows

    Process Performance Mining (PPM) is a subset of process mining techniquesconcerned with the analysis of processes with respect to performance dimen-sions, chiefly time (how fast a process is executed); cost (how much a processexecution costs); quality (how well the process meets customer requirements andexpectations); and flexibility (how rapidly can a process adjust to changes in theenvironment) [2].

    Along the time and flexibility dimensions, one recurrent analysis task is tounderstand how the temporal performance of a process evolves over a given pe-riod of time – also known as flow performance analysis in lean management [44].For example, a bank manager may wish to know how the waiting times in aloan application process have evolved over the past month in order to adjust theresource allocation policies so as to minimize the effects of bottlenecks.

    Existing PPM techniques are not designed to address such flow performancequestions. Instead, these techniques focus on analyzing process performance ina “snapshot” manner, by taking as input an event log recorded during a periodof time and extracting aggregate measures such as mean waiting time, process-ing time or cycle time of the process and its activities. For example, both thePerformance Analysis plugins of ProM [45] and Disco [46] calculate aggregateperformance measures (e.g. mean waiting time) over the entire period coveredby an event log and display these measures by color-coding the elements of aprocess model. These tools can also produce animations of the flow of casesalong a process model over time. However, extracting flow performance insightsfrom these animations requires close and continuous attention from the analystin order to detect visual cues of performance trends, bottleneck formation anddissolution, and phase transitions in the process performance. In other words,animation techniques allow analysts to get a broad picture of performance issues,but not to precisely quantify the evolution of process performance over time.

    In this setting, this paper presents a PPM approach designed to provide a precise and quantifiable picture of flow performance. The approach relies on an abstraction of business processes called Staged Process Flow (SPF). An SPF breaks down a process into a series of queues corresponding to user-defined stages. Each stage is associated with a number of performance characteristics that are computed at each time point in an observation window. The evolution of these characteristics is then plotted via several visualization techniques that collectively allow flow performance to be analyzed from multiple perspectives in order to address the following questions:

    Q1. How does the overall process performance evolve over time?

    Q2. How does the formation and dissolution of bottlenecks affect the overall process performance?

    Q3. How do changes in demand and capacity affect the overall process performance?
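
    The idea of treating each stage as a queue whose characteristics are computed at every time point can be sketched as follows. This is an illustration under stated assumptions, not the SPF implementation: the `StageVisit` record, the stage names, the integer time unit and the three characteristics chosen here (arrivals, departures, cases in the queue) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageVisit:
    case_id: str
    stage: str
    entered: int  # hypothetical time unit, e.g. a day number
    left: int     # the case occupies the stage during [entered, left)


# Hypothetical stage visits derived from an event log.
VISITS = [
    StageVisit("c1", "assess", 0, 2),
    StageVisit("c2", "assess", 1, 4),
    StageVisit("c3", "assess", 1, 5),
    StageVisit("c1", "approve", 2, 3),
    StageVisit("c2", "approve", 4, 5),
]


def stage_series(visits, stage, window):
    """Per time point in the window: arrivals, departures and cases queued in one stage."""
    in_stage = [v for v in visits if v.stage == stage]
    series = []
    for t in window:
        series.append({
            "t": t,
            "arrivals": sum(1 for v in in_stage if v.entered == t),
            "departures": sum(1 for v in in_stage if v.left == t),
            "in_queue": sum(1 for v in in_stage if v.entered <= t < v.left),
        })
    return series
```

    Calling `stage_series(VISITS, "assess", range(0, 6))` yields one record per time point. Plotting `in_queue` over the window would show where cases accumulate in a stage, and comparing `arrivals` with `departures` gives a rough view of demand against capacity.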

    The paper demonstrates the advantages of the SPF approach over state-of-the-art process performance mining tools using real-life event logs from a Dutch bank and from the IT department of Volvo Belgium.

    5 Conclusion and Future Work

    This paper describes an overall approach to stage-based process mining, motivated by observed gaps in current process mining techniques. So far, we have proposed two stage-based techniques: one for discovering business process stages from event logs and one for mining process flow performance from event logs based on stages. The former work shows that our stage decomposition technique can provide results that are measurably closer to the ground truth than the baselines. The latter work shows that the stage-based performance mining technique provides insights and addresses questions that cannot be answered by existing performance mining techniques. In the future, we will continue developing stage-based techniques for process discovery and predictive process monitoring.

    References

    1. van der Aalst, W.: Process Mining: Discovery, Conformance and Enhancement of Business Processes. Springer (2011)

    2. Dumas, M., La Rosa, M., Mendling, J., Reijers, H.: Fundamentals of Business Process Management. Springer (2013)

    3. Van Der Aalst, W., Adriansyah, A., de Medeiros, A.K.A., Arcieri, F., Baier, T., Blickle, T., Bose, J.C., van den Brand, P., Brandtjen, R., Buijs, J.: Process mining manifesto. In: Business Process Management, Springer 169–194

    4. Tiwari, A., Turner, C.J., Majeed, B.: A review of business process mining: state-of-the-art and future trends. Business Process Management Journal 14(1) (2008) 5–22

    5. Rozinat, A., van der Aalst, W.M.: Conformance checking of processes based on monitoring real behavior. Information Systems 33(1) (2008) 64–95

    6. van der Aalst, W., Adriansyah, A., van Dongen, B.: Replaying history on process models for conformance checking and performance analysis. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 2(2) (2012) 182–192

    7. Nguyen, H., Dumas, M., La Rosa, M., Maggi, F.M., Suriadi, S.: Mining business process deviance: a quest for accuracy. In: On the Move to Meaningful Internet Systems, Springer (2014) 436–445

    8. Maggi, F.M., Di Francescomarino, C., Dumas, M., Ghidini, C.: Predictive monitoring of business processes. In: International Conference on Advanced Information Systems Engineering, Springer (2014) 457–472

    9. Suriadi, S., Mans, R.S., Wynn, M.T., Partington, A., Karnon, J.: Measuring patient flow variations: A cross-organisational process mining approach. In: Proc. of AP-BPM, Springer (2014) 43–58

    10. Leontjeva, A., Conforti, R., Di Francescomarino, C., Dumas, M., Maggi, F.M.: Complex symbolic sequence encodings for predictive monitoring of business processes. In: International Conference on Business Process Management, Springer (2015) 297–313

    11. Tax, N., Verenich, I., La Rosa, M., Dumas, M.: Predictive business process monitoring with LSTM neural networks. arXiv preprint arXiv:1612.02130 (2016)

    12. Di Ciccio, C., Marrella, A., Russo, A.: Knowledge-intensive processes: Characteristics, requirements and analysis of contemporary approaches. Journal on Data Semantics 4(1) (2014) 29–57

    13. Van der Aalst, W.M., Dustdar, S.: Process mining put into context. IEEE Internet Computing 16(1) (2012) 82–86

    14. van der Aalst, W.M.: Process mining: Discovering and improving spaghetti and lasagna processes. In: Proc. of CIDM, IEEE (2011)

    15. van Dongen, B.F., Crooy, R.A., van der Aalst, W.M.: Cycle time prediction: When will this case finally be finished? In: OTM Confederated International Conferences, Springer (2008) 319–336

    16. Weerdt, D.: Leveraging process discovery with trace clustering and text mining for intelligent analysis of incident management processes. In: IEEE Congress on Evolutionary Computation. (2012) 1–8

    17. Rebuge, A., Ferreira, D.R.: Business process analysis in healthcare environments: A methodology based on process mining. Information Systems 37(2) (2012) 99–116

    18. Carmona, J., Cortadella, J., Kishinevsky, M.: Divide-and-conquer strategies for process mining. In: Proc. of BPM, Springer (2009) 327–343

    19. Van Der Aalst, W.M.: A general divide and conquer approach for process mining. In: Proc. of FedCSIS, IEEE (2013) 1–10

    20. Conforti, R., Dumas, M., García-Bañuelos, L., La Rosa, M.: BPMN miner: Automated discovery of BPMN process models with hierarchical structure. Information Systems 56 (2016) 284–303

    21. van Dongen, B.F., Adriansyah, A.: Process mining: fuzzy clustering and performance visualization. In: Proc. of BPM Workshops, Springer (2010) 158–169

    22. Ghattas, J., Peleg, M., Soffer, P., Denekamp, Y.: Learning the context of a clinical process. In: Business Process Management Workshops, Springer (2010) 545–556

    23. Di Francescomarino, C., Dumas, M., Maggi, F.M., Teinemaa, I.: Clustering-based predictive process monitoring. IEEE Transactions on Services Computing (2016)

    24. Hompes, B., Verbeek, H., van der Aalst, W.M.: Finding suitable activity clusters for decomposed process discovery. In: Proc. of SIMPDA, Springer (2014) 32–57

    25. Naldi, M., La Pinta, F., Lombardozzi, M., Picciano, M.: A phase model of the service delivery process for bundle services. In: Computer Modeling and Simulation (EMS), 2012 Sixth UKSim/AMSS European Symposium on, IEEE (2012) 263–268

    26. van Dongen, B.: BPI Challenge 2015 Municipality 1 (2015)

    27. van Dongen, B.F.: BPI Challenge 2012. Eindhoven University of Technology. Dataset (2012) http://dx.doi.org/10.4121/uuid:3926db30-f712-4394-aebc-75976070e91f.

    28. Pietzsch, J.B., Shluzas, L.A., Paté-Cornell, M.E., Yock, P.G., Linehan, J.H.: Stage-gate process for the development of medical devices. Journal of Medical Devices 3(2) (2009) 021004

    29. Sulek, J.M., Marucheck, A., Lind, M.R.: Measuring performance in multi-stage service operations: An application of cause selecting control charts. Journal of Operations Management 24(5) (2006) 711–727

    30. Shi, J., Zhou, S.: Quality control and improvement for multistage systems: A survey. IIE Transactions 41(9) (2009) 744–753

    31. McClean, S.I., Millard, P.H.: A three compartment model of the patient flows in a geriatric department: a decision support approach. Health Care Management Science 1(2) (1998) 159–163

    32. Mackay, M., Lee, M.: Choice of models for the analysis and forecasting of hospital beds. Health Care Management Science 8(3) (2005) 221–230

    33. Harrison, G.W., Escobar, G.J.: Length of stay and imminent discharge probability distributions from multistage models: variation by diagnosis, severity of illness, and hospital. Health Care Management Science 13(3) (2010) 268–279

    34. Cooper, R.G.: Stage-gate systems: a new tool for managing new products. Business Horizons 33(3) (1990) 44–54

    35. Partington, A., Wynn, M., Suriadi, S., Ouyang, C., Karnon, J.: Process mining for clinical processes: A comparative analysis of four Australian hospitals. ACM Transactions on Management Information Systems 5(4) (2015) 19

    36. Anupindi, R., Chopra, S., Deshmukh, S.D., Mieghem, J.A.V., Zemel, E.: Managing Business Process Flows (3rd ed.). Prentice Hall (2012)

    37. Dresch, A., Lacerda, D.P., Antunes Jr, J.A.V.: Design Science Research: A Method for Science and Technology Advancement. Springer (2014)

    38. Pries-Heje, J., Baskerville, R.: The design theory nexus. MIS Quarterly (2008) 731–755

    39. Von Alan, R.H., March, S.T., Park, J., Ram, S.: Design science in information systems research. MIS Quarterly 28(1) (2004) 75–105

    40. Shadish, W.R., Cook, T.D., Campbell, D.T.: Experimental and quasi-experimental designs for generalized causal inference. Wadsworth Cengage Learning (2002)

    41. Nguyen, H., Dumas, M., ter Hofstede, A.H., La Rosa, M., Maggi, F.M.: Business process performance mining with staged process flows. In: Proc. of CAiSE, Springer (2016)

    42. Verbeek, H., van der Aalst, W.M., Munoz-Gama, J.: Divide and conquer. Technical report, BPM Center Report Series (2016)

    43. Reijers, H.A., Mendling, J., Dijkman, R.M.: Human and automatic modularizations of process models to enhance their comprehension. Inf. Syst. 36(5) (2011) 881–897

    44. Modig, N., Ahlström, P.: This is lean: Resolving the efficiency paradox. Rheologica (2012)

    45. Hornix, P.T.: Performance analysis of business processes through process mining. Master’s thesis, Eindhoven University of Technology (2007)

    46. Gunther, C.W., Rozinat, A.: Disco: Discover your processes. In: Proc. of BPM Demos. Volume 940 of CEUR Workshop Proceedings. (2012) 40–44
