detection and quantification of flow ... - andrea.burattin.net · b andrea burattin...

Softw Syst ModelDOI 10.1007/s10270-017-0576-y

SPECIAL SECTION PAPER

Detection and quantification of flow consistency in businessprocess models

Andrea Burattin1 · Vered Bernstein3 · Manuel Neurauter1 · Pnina Soffer3 ·Barbara Weber1,2

Received: 15 October 2015 / Revised: 18 October 2016 / Accepted: 23 December 2016© The Author(s) 2017. This article is published with open access at Springerlink.com

Abstract Business process models abstract complex busi-ness processes by representing them as graphical models.Their layout, as determined by the modeler, may have aneffectwhen thesemodels are used.However, this effect is cur-rently not fully understood. In order to systematically studythis effect, a basic set of measurable key visual features isproposed, depicting the layout properties that are meaning-ful to the humanuser. The aimof this research is thus twofold:first, to empirically identify key visual features of businessprocessmodelswhich are perceived asmeaningful to the userand second, to show how such features can be quantified intocomputational metrics, which are applicable to business pro-cess models. We focus on one particular feature, consistencyof flow direction, and show the challenges that arise whentransforming it into a precise metric. We propose three dif-ferent metrics addressing these challenges, each following adifferent view of flow consistency. We then report the resultsof an empirical evaluation, which indicates which metric ismore effective in predicting the human perception of this fea-ture. Moreover, two other automatic evaluations describingthe performance and the computational capabilities of ourmetrics are reported as well.

Keywords Business process modeling · Metrics · Visuallayout · Qualitative empirical study · Consistency of flow

Communicated by Dr. Selmin Nurcan.

B Andrea [email protected]

1 University of Innsbruck, Innsbruck, Austria

2 Technical University of Denmark, Lyngby, Denmark

3 University of Haifa, Haifa, Israel

1 Introduction

Business process modeling is a broad and important areafor practice and for applied research. Process modelingrefers to the representation of organizational or business pro-cesses in a graphical manner, usually as a flow of activities[10]. It is common across industries—important for design-ing and improving business processes, analyzing industrygoals and outcomes—including organizational efficiency,revenues, and social impact [10]. For these purposes, thequality of the process model is of importance. Model qualityhas been classified to syntactic (“correctness” of a model),semantic (the extent to which the model captures the behav-ior of the domain), and pragmatic (usefulness) quality [14].When focusing on pragmatic quality, i.e., the correspondencebetween a model and people’s interpretation of it, importantaspects considered are the understandability and the read-ability of the model by a human user. Several efforts havebeen made, attempting to study and to improve user com-prehension of process models. For example, [20] lists anddescribes in detail seven process modeling guidelines, withthe aim of improving the understandability of the modelsdesigned according to such guidelines. Another work [29]proposes a framework which allows to improve the read-ability of large process models by abstracting not usefulelements and tailoring the visualization (both in terms ofappearance and format) to the needs of the end users. How-ever, a comprehensive set of features for the quantificationof users’ comprehension of process models is still miss-ing.

A large body of research has addressed factors that influ-ence model understandability and readability, relating bothto business process models and to other kinds of concep-tual models. Much attention in this respect has been givento semantic clarity of the modeling language [19,27] and to

123

http://crossmark.crossref.org/dialog/?doi=10.1007/s10270-017-0576-y&domain=pdf

http://orcid.org/0000-0002-0837-0183

A. Burattin et al.

the graphical elements of the modeling language [21,28].Additional factors identified relate to specific properties ofan individual model (e.g., complexity metrics [7,17,34,35]).Visual features of elements in a model [30,32] have beenstudied, specifically the effect of what is sometimes called“secondary notation” on model understandability [15,31].In contrast, the specific layout of a model has received lit-tle attention. To the best of our knowledge, only a fewstudies have investigated how layout features of a processmodel affect its understandability, providing very limitedcoverage. One specific example is [6], addressing the flowdirection of a process model and its effect on model under-standing. This investigation addressed only consistent flowdirections (e.g., left to right, top to bottom), not con-sidering the possibility of change in the direction of themodel.

Cognitive psychology research has shown that the appear-ance of a model in general has a significant effect on user’scomprehension.Because of that, approaches to automaticallylay out graphs while preserving the mental map have beenproposed [21].Moreover, results reported in [16] indicate thatpatterns involving graphical representations (see, for exam-ple, patterns P11 and P22 which received the highest scores)are effective in terms of both usefulness and ease to use.Therefore, the visual layout of a process model is centralto achieve its aims—effectively communicating the intendedprocess, ensuring comprehension by its users, and enablingrevision and improvement of the process model. Yet, we cur-rently lack an agreement upon set of concepts for describingand characterizing layout properties as perceived by humansand this lack hampers the ability to comprehensively studythe effect of model layout on its understandability and read-ability.

In the context of modeling and model-generating tools,model layout has received attention and has been character-ized by precisely defined properties. These tools typicallyinclude functionality that determines the layout and rear-ranges the elements in the model in order to improve itsreadability (e.g., [8,9,11]). The algorithms that are employedrelate to precisely defined properties, such as avoiding linecrossings in the model, alignment of model elements, usageof straight angles with the goal to produce a “neatly” appear-ing model. However, there is no indication regarding howcomprehensively these properties correspond to and capturethe human perception of the model. In fact, there is no cog-

1 P1 refers to Layout Guidance: “the availability of layout conventionsor advice to organize the various model elements on a canvas. Theseinclude indications on the orientation, alignment, and spacing of modelelements in the space.”2 P2 refers to Enclosure Highlight: “availability of modeling constructsto visually enclose a set of logically related model elements, and add acomment to characterize the group.”

nitive anchoring of the specific selection of features that arecurrently addressed.

Webelieve that a set of concepts describing layout featuresshould comprehensively correspond to how humans perceivethe layout of the model. At the same time, it should be pre-cisely defined and allow quantification and measurement oflayout properties, serving several purposes. First, it will per-mit a more focused study of the effect of process modellayout on model understanding. Such studies may addressspecific properties or combinations of properties. Second, itcan become a basis for guiding the creation of process mod-els. Currently, although there have been some broad efforts toguide modeling from a visual perspective—such as 7PMG[18]—process modelers individually decide how to designthe process model layout. Note that most of the modelingguidance that is available (e.g., in 7PMG) is semantic andstructural, while the visual perspective is only addressed toa very limited extent. Third, systematically developed lay-out guidelines may support the training of modelers. Whenapplied in an online setting, they can further serve as basisfor intelligent modeling environments that provide feedbackto modelers (on how to improve the model). Finally, theycan serve for the development of automatic layout featuresof modeling tools.

The aim of this paper is to take a step toward a set ofhuman-meaningful and measurable layout features of pro-cess models. This paper extends our previous work [1]where key visual features were elicited in an exploratorystudy. This study, as reported here, resulted in a set oflayout features reflecting human perception. Our main con-tribution in the current paper relates to the transformationof human-perceived layout features into quantitative met-rics. We focus on one specific feature (i.e., consistency offlow) to highlight the challenges of such transformation.Then we propose two different approaches operationalizedinto three metrics to address these challenges, each high-lighting a different aspect of the flow consistency. Finally,we report results of an empirical evaluation, which indi-cates the extent to which these metrics are consistent withthe human perception; results showing the models wherethesemetrics differ; and results regarding computational effi-ciency.

Accordingly, the remainder of the paper is structured asfollows: Sect. 2 presents the methodology and setting of theexploratory study; Sect. 3 reports the findings of this study asa list of identified layout features; Sect. 4 focuses on the con-sistency of flow feature and presents alternative approachesfor its quantification. Section 5 deals with the empirical eval-uation of the proposed approaches and discusses the findings,while Sect. 6 reviews related work. Section 7 provides theconclusions and highlights implications of this study forfuture research.

123

Detection and quantification of flow consistency in business process models

2 Exploratory study

The exploratory study was guided by the research questionof what layout features of process models are perceived asmeaningful bymodel users. Due to the nature of the question,which seeks to discover features rather than to corrobo-rate hypotheses, the study was exploratory in nature. Yet, toincrease validity we have also conducted a validation study,intended to validate the findings of the exploratory studywith a different subject population. To identify candidatevisual features of a process model, an empirical qualitativestudy was conducted. Qualitative data gathering was neededin order to get an understanding of a user’s point of view[4]. Acquiring knowledge from participants was essential tounderstand how they perceive the visual layout of businessprocess models.

2.1 Setting

The exploratory study consisted of two steps that tookplace at the Department of Information Systems at theUniversity of Haifa. It was later on complemented by avalidation study. Participants in the exploratory study were15 undergraduate (in the first step) and seven graduatestudents (in the second step). Participants all had someknowledge of business processes modeling. All participantscame with similar educational background—all took a vari-ety of information systems analysis courses and studiedmodeling languages. Participation in the study was volun-tary.

The study included questionnaires and interviews. Atthe first step, 15 undergraduate students were asked to fillout a questionnaire. Following an initial screening of theanswers that were obtained, the second step included inter-viewswith seven graduate students, intended to gain a deeperunderstanding than what could be gained based on the ques-tionnaires. The interviews were based on the questionnaires,but allowed for interaction and prompting deeper explana-tions. Thus, interviews were used in order to get a betterunderstanding of the user perspective on the visual layout ofbusiness process models [22].

2.2 Data collection process

Questionnaires The goal of the questionnaire3 was to elicitparticipants’ beliefs and perceptions of the visual layout fea-tures of process models. A pilot questionnaire was givenahead of time to three participants in order to simplify andimprove the questions in the questionnaire. The questionnaire

3 All the material used for this study is available at http://bpm.q-e.at/experiment_flow_consistency.

had five pairs of BPMN models whose visual similaritiesand differences were to be elicited. The models were madesmall to fit a single page, yet it was ensured that their struc-ture was clearly visible. All activity labels or edge labelswere blurred in order to have participants address the visualaspect of the model exclusively rather than “read into it.”Some of the pairs included two models which appeared tobe visually different according to the judgment of one ofthe researchers, while others appeared visually similar. Inother words, the pairs were selected to have different lev-els of similarity according to the researcher’s judgment. Themodels were all presented in black and white in order to haveparticipants focus on layout features of the models. Colormight have drawn much attention and blur the effect of lessdominant features [22]. An example pair from the question-naire is shown in Fig. 1. The questionnaire consisted of thesame set of questions referring to each of the pairs. Thisset included one question asking the participants to rate thevisual similarity of the models on a 7-point Likert scale. Thegoal of using a Likert scale was to prompt participants toactually look at the figures, compare their layout, and eval-uate their similarity. Following the Likert scale evaluation,two open-ended questions were presented to the participants,asking them to indicate differences and similarities betweenthe models (at least three of each). Only the answers of theopen-ended questions were considered in the data analy-sis; the Likert scale evaluation was only intended to promptthe comparison of the models and was not analyzed after-ward.

Interviews As a second step of the study, the interviews wereintended to gain a more comprehensive view of layout fea-tures that influence the perception of visual appearance ofmodels. To this end, the interviews were semi-structured,based on the questionnaires. First, participants were askedto fill the questionnaire. Next, the participants were askedspecific questions about their answers, prompting addi-tional explanations about specific differences or similaritiesbetween the models of each pair. In particular, partici-pants were asked to explain their answers and justify themby indicating relevant parts of the models. They wereasked to express their preferences regarding the modelsand the specific features. They were also asked to com-pare the models in terms of clarity and to indicate whatimprovements could be made to increase the clarity oftheir appearance. The interviews were recorded and tran-scribed.

3 Analysis and findings

The data collected from the questionnaires and interviewswere qualitatively coded and classified into categories.

123

http://bpm.q-e.at/experiment_flow_consistency


A. Burattin et al.

(a)

(b)

Fig. 1 Example pair of models from the questionnaire. In this case, labels should be ignored since, on the questionnaire, they were blurred to justfocus on the visual aspect

123


3.1 Analysis

We started by analyzing the interview transcriptions, sincethey were more informative than the written answers of thequestionnaires. The text was broken into segments, eachreferring to a single layout feature. Using qualitative dataanalysis methods [33], textual segments were coded by themodel(s) they referred to and classified to categories in a pro-cess of open coding. Saturation of the categories was reachedby the fourth interview, so eventually no new categories werefound with additional data analysis. We nevertheless ana-lyzed all the collected data in a similar manner, includingthe written answers of the questionnaires obtained at the firststep of the study. Open coding was followed by an axial cod-ing, where categories were related to each other and groupedinto more general categories to form a hierarchy. In our case,grouping related to the model’s elements (e.g., edges) andproperties (e.g., structure).

3.2 Categories

Table 1 summarizes the categories found by qualitative anal-ysis. It provides a definition for each category, and examplesof supporting statements taken from the questionnaires andinterviews. The lower-level categories are grouped under theidentified higher-level ones (where applicable). All the cate-gories in the table are supported by at least two statements.

3.3 Study validation

Although saturation was reached during the data analysis ofthe exploratory study, the relatively uniform population par-ticipating in the study posed two main threats to the validityof its findings. First, all participants had a similar educationalbackground and little practical experience. Second, theywereall Hebrew speakers, trained in reading left to right as wellas right to left, and this is not typical of other populations. Toovercome these limitations, we performed a separate valida-tion study, interviewing modeling experts and practitionersfrom a more diverse geographical distribution. The valida-tion study included seven modeling experts and practitionersfrom Israel, Austria, Italy, and Estonia. Each one was inter-viewed using the same questionnaire and procedure as inthe second step of the exploratory study. When analyzingthe data, rather than repeating the full qualitative analysisprocess, we attempted to map the textual segments to thecategories already found in the exploratory study. Only if astatement could not be mapped to one of these categories,it was marked for further analysis. Eventually, we analyzedand classified the (few) unmapped statements.

The results support all the categories identified in theexploratory study, each with at least two indications by dif-ferent participants. In addition, we got two other categories

supported by at least two participants each and consideredone of them as a relevant addition to our list:

• The first category is fixed size of activity boxes, whichrefers to the possibility of having different sizes of theactivity boxes for short and long textual descriptions ofthe activities. Interestingly, the activity boxes in the mod-els were all of fixed size. Yet, the experienced modelersrecognized the possibility of varying the box size. As oneof the interviewees said: “in both models the sizes of theactivity boxes are fixed. When I create models I some-times change the box size to fit longer descriptions. It isstrange they didn’t do so.”

• The other category deals with implicit versus explicitgateways, which represents a known property associatedwith the pragmatic quality of BPMN models. However,since this feature is specific to the syntax of BPMN wedecided to leave it out of our list.

In summary, the results of the validation study fully sup-port the results of the exploratory study and extend it withone additional feature.

3.4 Threats to validity

Our study is subject to several threats regarding externalvalidity. According to themethodological guidelines of qual-itative text analysis, a clear indication that sufficient datahave been collected is when the theoretical saturation pointis reached (i.e., no newcategories are revealedwith additionaldata) [2]. According to [2], when saturation is reached, noadditional data collection is needed. In our study, theoreticalsaturationwas reached at the fourth interview.Yet,we contin-ued analyzingdata beyond the saturationpoint andperformedan additional validation study with a different population toensure the validity of the findings.

Another potential threat relates to the homogeneity of oursubject group in terms of their interest in process models. Inthe context of our study, however, the fact that all our sub-jects are familiar with BPMN is a desirable feature: Processmodels are not general graphs since they have a very specificsemantic associated with the elements. Therefore, only peo-ple familiar with this formalism are able to grasp the peculiarcharacteristics of the representation. The lack of homogene-ity, however, also implies that no generalizations can bemaderegarding the formalism.

Another potential threat relates to the fact that the sub-jects of the exploratory study were students. Here we wouldlike to argue that for this particular study, this does not rep-resent a major issue, since our questionnaires do not relyon experience [13]. Moreover, this risk was mitigated by theuse of experts and practitioners in the validation study, whoseresults supported the ones of the exploratory study. In order to

123

A. Burattin et al.

Table 1 Identified featurecategories

Feature Description Example reference from data

Edges

Length of edges The length of the edges in themodel. A model may varyconsisting very short edges(creating a dense model) to verylong edges (creating a widelyspread model), or a mixture oflengths

“The model on the right doesn’tseem right since there are manylong edges throughout themodel”

Edges style:straight, curved,or with bendingpoints

Edges can be straight or curved, orthey may consist of one or morebending points, which divide theedge into two segments or more

“Need to straighten all the brokenedges”

Crossing edges Edges that cross each otherintersect with other edges.Intersecting edges might createconfusion when following theflow of the model

“There are edges here that just goone on top of the other,” “Thislooks like a spider web”

Text on edges Existence and amount of textannotations on edges. The textcan either be descriptive orconditional

“When something is written on theedge, it is difficult to understandwhich edge it refers to”

Number of ending points The total number of ending pointsin the model. An ending point isan end event or an element withno outgoing edges

“There are many ending points,”“One ending point connected tomany edges, appears like a loop”

Angles The angles used in bending pointsof edges: 90◦ angles, angleslarger than 45◦, angles smallerthan 45◦

“I would improve the angles in thismodel to be 90◦ angles,” “Changethe edges to be straight lines”

Model’s structure

Model’s shape The general shape of the modelrefers to the way the model isspread on the canvas. Thisusually is characterized as asquare or rectangle

“The structure in both models ishorizontal”

Model’s area The area taken by the model on thecanvas

“The size of the models isdifferent”

Model’s direction

General direction The general direction/flow of themodel. The direction of themodel can be characterized asvertical or horizontal: Left–rightdirection, right–left direction,top-to-bottom direction,bottom-to-top direction

“This model goes in a cleardirection,” “Both models arevertical”

Placement of ending event The location of ending points inthe model in relation to thestarting point of the model

“Location of the ending pointmakes it clear where the processends”

Branching off Branching off of the model fromone main path to more than one,where each branch flows in adifferent direction

“I don’t like to wonder where anedge leads to”

Consistency of flow The flow of the model can be inone definite direction from thebeginning till the end of themodel. Alternatively, it can beunclear or changing throughoutthe model to different directions

“There is a change in the directionof the model,” “Both models arebuilt stepwise”

123


Table 1 continuedFeature Description Example reference from data

Symmetry in blocks Referring to structured blocks inthe model—symmetry ofelements arrangement across theblock

“This block in the model is verysymmetrical and therefore veryunderstandable”

Alignment in the model Alignment of the elements in themodel in relation to each other

“This model is clearer because ofthe alignment of the wholemodel. It is very aesthetic”

further mitigate this risk, we cross-checked our results withthe literature on BPM, graphs theory, and conceptual model-ing. As reported, this literature research showed that all thefeatures addressed by existing studies were included in ourset, while at the same time our set also contains additionalfeatures.

4 Quantification of flow consistency

In Table 1, the different visual features identified in theexploratory study are highlighted. In order to enable the sys-tematic investigation of how these features impact on modelcreation and use, it is necessary to transform them into met-rics that can be automatically computed. In our previouspaper [1], we already reported details about some of thesemetrics. In the following, we will focus on one of the iden-tified features, namely consistency of flow. In general, wecan state that the consistency of flow measures the extent towhich the layout of a process model reflects the temporallogical ordering of the process. This metric, not analyzed indetail in our previous work, is particularly challenging sinceit involves “high-level concepts” and how such concepts arerepresented. Moreover, there are several different ways ofcomputing it, and it is not obvious which approach wouldmost closely reflect human perception.

For example, let us consider themodels in Fig. 2. Figure 2adepicts a model which structures the flow in one, horizontal,left-to-right direction. The model in Fig. 2b, in turn, containsthree “horizontal lines,” each of them with a clear direction.In this case, in order to read the complete process, it is neces-sary to change the reading direction between each line: Thereader has to “go back” with the eyes to the left side beforecontinuing to the right again. Therefore, the flow directionof this process is less consistent compared to Fig. 2a. For themodel in Fig. 2c, it is even more difficult to identify a clearflow direction.

For most of the visual features described in Table 1, aclear mathematical quantification is mentioned in our previ-ous work [1]. The description of the consistency of the flow,however, is just sketched. In particular, several options existon how to approach the quantification. Metrics for the auto-

matic identification of changes in the flow direction can bebased on global or local features. Global features look at theoverall shape of the model and allow to detect the flow con-sistency based on the “global shape” of the process. Localfeatures, in turn, consider how activities (i.e., vertices ofthe graph representation of the process) and sequences (i.e.,edges of the graph representation of the process) are posi-tioned in relation to each other. Concerning global features,human beings can easily detect that the model illustratedin Fig 2a consists of one horizontal line, while the modelshown in Fig. 2b is spread over several lines. For algorithms,however, the identification of such patterns is rather difficult,while local features can be much better dealt with. Since ourgoal is to provide algorithmic solutions, we decided to fol-low the second approach, based on local features. To thisextent, and since we also want to closely reflect the humanperception, we devised three different metrics. The first twometrics calculate the direction of each edge; determine themost frequent direction, based on majority voting; and then,based on this predominant direction, compute the extent towhich the edges of the model are consistent with this direc-tion. The third metric, instead, looks at pairs of activities anddetermines whether their positioning reflects their temporallocal ordering.

4.1 Graphical representation of processes

In order to present our metrics, we define the graphi-cal representation of a process model as a diagram G =(V, E, LV , LE ). G is a tuple composed of the set of verticesV and the set of directed edges E ⊆ V × V . Each vertex4

and each edge has a corresponding graphical representation.Therefore, different from typical graph characterizationsavailable in the literature [3], we added two more relationsLV and LE , with information regarding the positioning of therespective elements on themodeling canvas (i.e., coordinateson the Cartesian plane). For vertices, we consider the centralpoint of its graphical representation LV : V → (N × N).

4 In our definition, vertices represent all the graphical nodes that wemight observe in a process model, not only activities. For example, con-sidering BPMN processes, we should also consider events, gateways,etc.

123

A. Burattin et al.

(a)

(b)

(c)

Fig. 2 Examples of models with different layouts, obtained starting from the same process description. a Process model with a consistent directionof the flow. bModel with some violations of the flow consistency. c Model without a strong flow consistency

123


(sx, sy)

(ex, ey)

e1

(a)

(sx, sy)

(ex, ey)

e2

(b)

(sx, sy)

(ex, ey)

e3

(c)

Fig. 3 Three differently shaped edges e1, e2, and e3 that, using our abstraction framework, are equally represented, since it considers only theirstarting and ending coordinates. In this case, the representation is given by the pair of points ((sx , sy), (ex , ey))

Fig. 4 Snippets of threecommon representations ofedges outgoing from a BPMNgateway to activities. Edges ofeach snippet are laid outaccording to the edge shapesreported in Fig. 3

A

B

A

B

(a)

A

B

(b)

A

B

A

B

(c)

For edges, in turn, we consider two coordinates, one repre-senting the starting and one the ending points. LE : E →(N×N)×(N×N). Note that this edge representation abstractsfrom the actual path of the edge (cf. Fig. 3, where the threeedges have exactly the same representation in our framework)and is therefore able to deal with edges whose layout is typ-ical of business process models. For example, Fig. 4 depictsthree possible and common ways of representing edges exit-ing from a gateway. The three different representations arebased on the edge styles reported in Fig. 3 and can also beobserved in the processes depicted in Figs. 1 and 2.

4.2 Metrics based on edges’ directions

In this section,we introduce a function,Cons(G), whichwillbe operationalized into two different metrics. Both metricsquantify the consistency of flow, i.e., the extent to which thelayout of a process model reflects the temporal logical order-ing of the process. To determine the extent of consistency,these metrics primarily focus on the edges of the processmodel and quantify the consistency of all the edges E ingraph G.

To outline the idea behind these metrics, consider the pro-cess model depicted in Fig. 5, which shows a typical layoutfor a business process model. The fundamental idea is todetermine, in a first step, the predominant flow direction ofthe process graph G. When looking at Fig. 5, we can easilysee that the diagram points east, i.e., the predominant flowdirection is east. Then in a second step, the metrics check the

consistency of all the edges of graphG with the predominantflow direction. Analyzing each edge, we can see that e1, e2,e7, and e8 clearly point east, i.e., they are consistent with thepredominant flow direction. For edges e3, e4, e5, and e6, itis less obvious since these edges cannot be assigned to oneclear direction easily. Following a naïve approach, we couldjust consider themost predominant direction of the edge (likewe did for the entire process graph):We may classify edgese3 and e6 as pointing north (the edges look slightly morenorth than east). Edges e4 and e5, in turn, would be classi-fied as pointing south (the edges look slightly more souththan east). The consistency of flow could then be calculatedby dividing the number of edges that are consistent with thepredominant flow direction (i.e., all edges pointing east), bythe overall number of edges resulting in a consistency scoreof 4/8 = 0.5. The assignment of just one direction to an edgewould result, against our intuition, in a relatively low consis-tency of flow. Assigning two directions to each edge, insteadof one, allows to better reflect our intuition of consistencyof flow. Edges e1, e2, e7, and e8 would still be classifiedas east. Edges e3 and e6, however, would be classified asboth north–east, and edges e4 and e5 would be classifiedas south–east. To calculate the consistency of flow, we cannow consider all edges that have one direction assigned thatis consistent with the flow (i.e., all edges pointing towardnorth–east or south–east would be considered correct) anddivide them by the overall number of edges. In this case,and in line with our intuition, we would obtain a consistencyscore of 8/8 = 1.

123

A. Burattin et al.

Fig. 5 Illustrating example formetrics based on edges’direction

e3

B

C

A D

EndStart

e4

e5

e6

e7 e8e2e1

In the following, we describe more formally how met-ric Cons(G) captures our intuition of how the consistencyof flow should be calculated. The proposed metric assignsto each graph G a value between 0 and 1 quantifyingthe degree of consistency. For this, we assume that thegraph G has a predominant flow direction and we haveto determine it. The set of all possible directions is D ={North,East,South,West}, whereby the selection of themost predominant flow direction is based on majority vot-ing (i.e., the direction most edges belong to is considered asthe predominant flow direction). Precondition for the major-ity voting is that we can identify the direction of each edge.The function Direction : LE → P(D)5, given the layoutinformation of an edge, returns the set of directions (i.e.,potentially more than one) the edge belongs to. In Sect. 4.2.1,we present the naïve approach to compute Direction, onlyconsidering one direction (which will serve as a baseline).In Sect. 4.2.2, in turn, we describe the classification ofedges into two directions. We can then calculate the over-all consistency Cons(G) by dividing the number of edgesbelonging to the predominant flowdirection by the number ofedges.

Algorithm 1 highlights the calculation of the metric. Theprocedure requires a graphG as input (with the layout details)and a Direction function, in order to get the directions ofan edge. The procedure then iterates through each edge(line 5) and adds its directions to the proper direction counters(line 9). Frequency of the predominant direction is computed(line 13), and the final result is returned (line 14).

The two upcoming subsections explain in detail the twopossible instantiations of the Direction function. The firstassigns to each edge exactly one direction; the second assignstwo directions to each edge. Since these two definitionsof Direction affect the final result of the Cons function,we assign to the two possible combination of Cons andDirection different names: M-E1 and M-E2.

5 With P(S), we indicate the powerset (i.e., the set of all subsets) of S.

Algorithm 1: Algorithmic specification of Cons(G).Input: G = (V, E, LV , LE ): graph with the representation of

the process; Direction: a function to obtain thedirection(s) of an edge

/* Define the directions, and initializeone counter for each direction */

1 freqs[North] ← 02 freqs[East] ← 03 freqs[South] ← 04 freqs[West] ← 0

/* Iterate through all edges to populatefreqs */

5 for e ∈ E do/* Contribution of the edge to each

direction */6 dirse ← Direction(e) /* dirs is a set with all

directions the edge e is pointing to*/

7 for d ∈ {North,East,South,West} do/* If the direction d is one of edge’s

direction, then increment thecorresponding counter */

8 if d ∈ dirse then9 freqs[d] ← freqs[d] + 1 /* The same edge is

allowed to belong to more thanone direction */

10 end11 end12 end

/* Obtain the cost of the predominantdirection */

13 predominant ←max {freqs[North], freqs[East], freqs[South], freqs[West]}/* Return the final consistency score,

assuming the graph has at least one edge(and therefore |E | > 0 */

14 return predominant/|E |

4.2.1 One direction per edge (M-E1)

Thefirst instanceof theDirection functionweanalyze,whichmight be considered as a naïve approach, assigns one direc-tion to each edge. For readability purposes, in the rest of this

123


0◦

−45◦

−90◦

−135◦

−180◦

180◦

135◦

90◦

45◦

0◦

−45◦

−90◦

−135◦

−180◦

180◦

135◦

90◦

45◦

(a) (b)

Fig. 6 Division of the radius according to the number of directions peredge that could be defined. These two cases report North (gray filledarea), East (dotted area), South (grid area), and West (lined area)directions. a Division of the radius to assign one direction to each edge

(M-E1). Each direction does not overlap with any other. b Division ofthe radius to assign two directions to each edge (M-E2). Each directionoverlaps with the two adjacent ones

paper, we refer to the Cons function using the Directiondescribed in this section as M-E1.

In order to identify the directions of each edge,we considerthe “angle” created by the edge. Starting from the coordinatesof the start and end points of an edge and using the arctangentfunction with two arguments, we can get the actual angle ofthe edge. To determine the direction of the edge,we divide theradius into four equal parts of 90◦ (one for each direction, i.e.,North,East,South,West). Figure 6a highlights the fourdirections: The filled area identifies the North direction; thedotted area identifies East; the grid area represents South;and the lined area identifies the West direction.

We then check whether the angle obtained for a particularedge is within the intervals referring to each direction. Sincethe angles corresponding to each direction do not overlap,Direction always assigns exactly one direction to each edge.

Examples

In the following, we illustrate the calculation of the M-E1metric using the examples shown in Fig. 2. Table 2 summa-rizes the obtained results. For example, if we consider theprocess of Fig. 2a, we can see that it contains 1 edge look-ing North, 48 looking East, 2 looking West, and 0 lookingSouth. As reported in line 14 ofAlg. 1, the final score is givenby predominant/|E |, where predominant is the frequency ofthe prevalent direction (East in our case, with frequency 48)and |E | is the number of edges of the graph. Therefore, ifwe apply these counters, the final consistency score for thismodel is 48/51 = 0.941.

In Fig. 2b, we have 1 edge looking North, 50 lookingEast, 4 looking West, and 4 looking South. Therefore, thefinal consistency score is 50/59 = 0.847.

Table 2 Consistency scores computed using the M-E1 metric

Model Consistency using M-E1

Figure 2a 0.941

Figure 2b 0.847

Figure 2c 0.392

Finally, in Fig. 2cwe have 5 edges lookingNorth, 20 look-ingEast, 17 lookingWest, and 9 lookingSouth. Therefore,the final consistency score is 20/51 = 0.392.

4.2.2 Two directions per edge (M-E2)

In this subsection, we are going to report details regardingthe second definition of the Direction function. This newversion of the function assigns two directions to each edge.For readability purposes, in the rest of this paper, we referto the Cons function using the Direction described in thissection as M-E2.

Themain differencewith respect to the definition providedbefore is that now each direction corresponds to a 180◦ por-tion of the angle. Based on this definition of direction, eachportion overlapswith two others (cf. Fig. 6b). In this case, it ispossible to see that the East direction (dotted area) overlapswith both the North (filled area) and the South (grid area)directions. The result is that, with this metric, each edge isalways associated with exactly two directions.

Examples

Despite this slight change, the overall metric could generatevery different values. For example, if we consider again the

123

A. Burattin et al.

Table 3 Consistency scorescomputed using the M-E2metric

Model Consistencyusing M-E2

Figure 2a 0.960

Figure 2b 0.915

Figure 2c 0.588

process models seen so far, we can observe the score valuesreported in Table 3.

The process in Fig. 2a contains 28 edges looking North,23 pointing South, 49 pointing East, and 2 looking West,with the predominant flow direction East. Therefore, if wecompute the same values ofAlg. 1, the final consistency scorefor this model is 49/51 = 0.960.

In Fig. 2b, we have 28 edges looking North, 31 pointingSouth, 54 pointing East, and 5 looking West, again withthe predominant flow direction East. Therefore, the finalconsistency score is 54/59 = 0.915.

Finally, in Fig. 2c we have 21 edges looking North, 30pointing South, 27 pointing East, and 24 looking West.Therefore, the final consistency score is 30/51 = 0.588.Unlike with previous examples for this model, the predomi-nant flow direction is South.

Please note that the time complexity of both theDirectionprocedures reported in the last two subsections is constant,given an edge. Therefore, the general complexity of theConsfunction is linear on the number of edges of the graph.Another key characteristic of these metrics is their seman-tic independence, i.e., they can be applied to any directedgraph.

4.3 Metric based on behavioral profiles (M-BP)

While the two metrics introduced in Sect. 4.2 consider theedges of a process model, this metric puts its focus on activi-ties to determine the extent towhich the layout of themodel isconsistent with the temporal logical ordering of the businessprocess. For this, the metric looks at the relations betweenpairs of activities (i.e., their behavioral profiles [37]) andevaluates, for each of them, whether the way they are placedin relation to each other is consistent with their temporal logi-cal ordering. For readability purposes, in the rest of this paper,we refer to the metric described in this section as M-BP.

The fundamental idea behind behavioral profiles consistsof the characterization of a process using relations betweentwo activities, defined according to three fundamental possi-bilities: (i) strict order; (ii) exclusiveness; or (iii) interleavingorder. Let us present these relations using the example pro-cessmodel depicted in Fig. 7. Between activities A and B, thestrict order is holding, identified as A → B (i.e., A alwaysoccurs before B, and never the other way round). Activi-ties C1 and C2 are in as exclusiveness relation C1 + C2

(i.e., C1 cannot appear before C2 and C2 cannot appearbefore C1). Finally, E1 and E2 (but also E3) are in inter-leaving order: E1‖E2 (i.e., E1 might appear before E2 andE2 might appear before E1 as well).

The main idea of the metricM-BP is to measure the extentto which the layout of a process model reflects the temporallogical ordering of the activities. The behavioral relation thatimposes a restrictive order among activities is the strict orderbehavioral relation. Therefore, we need to analyze the posi-tion of nodes referring to activities belonging to such relation.Then, for each strict order relation, we check whether thesource node (i.e., the node that must occur first) is “graphi-cally before” the target node (i.e., the node that must occurlater). The “graphically before” condition holds if the targetnode is placed east or south of the source node, i.e., the posi-tioning of the two activities reflects their temporal logicalordering. To calculate the consistency score, we divide thenumber of graphically before relations by the overall numberof strict relations.

More formally, given aprocess graphG = (V, E, LV , LE ),let us define a behavioral relationship b as the tuple b =〈r, s, t〉 with r ∈ {→,+, ‖} indicating the relation type ands, t ∈ V indicating the nodes associated with the source andthe target activities (therefore, s and t must refer to nodesrepresenting activities, not just “general nodes” of G). Forconvenience, we define projection operators for a behavioralrelation instance b = 〈r, s, t〉 such that #relation(b) = r ,#source(b) = s, and #target(b) = t . We also assume to haveavailable a procedure BehavioralProfiles which extracts allbehavioral relations out of a process.6 The complete pseu-docode of the algorithm is reported in Algorithm 2.The algorithm starts by initializing several counters and byextracting all the behavioral profiles available in the pro-cess (line 4). Then, it has to iterate through all relations andconsider only the strict orders (line 6). For these relations,the coordinates of source and start activities are extracted(lines 7–8) and the system checks whether the graphicallybefore condition holds (lines 9–14). The final score is com-puted as the ratio of the graphically before relations to thetotal number of strict relations (line 18).

The complexity of the algorithm is linear on the number ofbehavioral relations that could be extracted. Behavioral pro-files can be computed quite efficiently, in a low polynomialtime to the size of the process model [36].

4.3.1 Examples

Ifwe consider the example processmodels seen so far,we canobserve the score values reported in Table 4. For example,the process in Fig. 2a has 43 strict relations (computed with

6 In this work, we assume to compute the behavioral profiles using lookahead value 1.

123


Start

A B

C2

C1

D E2

E3

E1

F

End

Fig. 7 Example of process model for illustrating behavioral profiles between activities

Algorithm 2: Specification of metric for the compu-tation of the layout consistency using the behavioralrelations.Input: G = (V, E, LV , LE ):

1 tstrict ← 02 correctEast ← 03 correctSouth ← 04 BP ← BehavioralProfiles(G) /* Compute all

behavioral relations */5 foreach bp ∈ BP do6 if #relation(bp) =→ then /* Only strict order

relations */

/* Extract the coordinates of thecentral points of the source andtarget nodes */

7 (sx , sy) ← LV (#source(bp))8 (tx , ty) ← LV (#target(bp))

9 if sx < tx then /* Check for the Eastdirection */

10 correctEast ← correctEast + 111 end12 if sy < ty then /* Check for the South

direction */13 correctSouth ← correctSouth + 114 end

15 tstrict ← tstrict + 116 end17 end18 return max {correctEast, correctSouth} /tstrict /* Final

consistency score as the dominantdirection, divided by the total numberof strict relations */

look ahead 1) and 40 of them are pointing South − East.Therefore, the final score is 40/43 = 0.930. In Fig. 2b, wehave 38 strict relations (with look ahead 1) and 33 of them arepointingSouth−East. Therefore, the final score is 33/38 =0.868. Finally, in Fig. 2c we have 37 strict relations (withlook ahead 1) and 23 of them are pointing South − East.Therefore, the final score is 23/37 = 0.622.

Please note that, since we are using look ahead equal to 1,we do not penalize the violation of the “graphically before”condition for relations involving activities very far apart in the

Table 4 Consistency scorescomputed using the M-BPmetric

Model Consistencyusing M-BP

Figure 2a 0.930

Figure 2b 0.868

Figure 2c 0.622

process. In particular, value 1 for the look ahead indicates thatonly pairs of very close activities are considered. Althoughthis is a parameter of our approach (i.e., it can be changed),we opted for this configuration since each local violationimplies a change in direction and, this way, we count thechanges without penalizing more than once for each change.

5 Evaluation of flow consistency

To gain a better understanding of our new flow consistencymetrics, we conducted several empirical evaluations. In total,we performed 3 evaluations, useful to provide a more com-prehensive picture of the capabilities of our metrics.

Since the approaches we defined rely on different assump-tions, and therefore quantify the flow consistency usingdifferent feature sets, with the first evaluation we wantedto establish to what extent the metrics “agree” on the samemodel. We can use this analysis to ensure that the featureswe are taking into account are not redundant to each otherand therefore be sure that our approaches are measuring theconsistency of the flow considering different perspectives. Ifthis is the case, we could identify the conditions under whicha metric performs better than another one. Another evalua-tion focused only on the time efficiency of our metrics, i.e.,the time required to compute each metric on all the modelsof our dataset. The third evaluation is an experimental one,performed with the support of several people familiar withprocess modeling, and aims at comparing the human assess-ment of flow consistency with the outcomes provided by ourmetrics.

123

A. Burattin et al.

Both automated analyses are based on a process modeldataset which was generated during a modeling session con-ducted in December 2012 at the Eindhoven University ofTechnology, with students following programs on operationsmanagement and logistics, business information systems,innovation management, and human–technology interaction[23]. In total, the dataset contains 125 models, all referringto the same process description. The experimental evaluationis based on a subset of this dataset containing 14 mod-els.

5.1 Metrics agreement

The first analysis we performed aimed at establishing theextent to which different metrics agree. In order to evaluatethis aspect,we counted the number ofmodels that eachmetricplaces within a provided consistency score interval. Figure 8depicts the distribution of the values, using intervals of 0.1.It is interesting to note that the metrics tend to assign scoresin slightly different intervals. For example, M-E1 distributesthe scores rather uniformly, but there is a considerable setof models (above the average) with scores in the interval0.6–0.9. Metric M-BP, in turn, assigns rather high values,with only very few exceptions below 0.6. Finally, metric M-E2 assigns even higher scores, and most models lie in theinterval 0.8–1.

In order to compare the agreement of our metrics, wedecided to use ranking. Specifically, using the scores gen-erated by each metric we ranked the dataset, from the mostconsistent model to the least consistent one. Therefore, foreach model, we ended up with three ranking positions (onefor each metric). We computed the average and the stan-dard deviation of the rankings for each model, and we plotthe latter value. Figure 9 contains the results of our anal-ysis. The figure does not only show the evolution of thestandard deviation as the average ranking increases, but also

reports the average values, computed every 10 data points.By looking at the average values of the standard devia-tion, it is interesting to highlight that our metrics tend toagree with the ranking next to the extremes of the chart(i.e., lower standard deviation averages for very high orvery low rankings). Instead, on the middle ranking positions,values are more spread apart: The peak of the average stan-dard deviation is reached for ranking positions 52–58. Thisclearly indicates that all metrics, basically, agree on “veryconsistent” and “very inconsistent” models. However, foraverage process quality scenarios, there is less agreement.This behavior can be identified also on the chart reported inFig. 10.

The observed behavior is in line with our expectationssince very consistent and very inconsistent models are “glob-ally” recognized, no matter what the considered features are.This is highlighted by our observation: Our metrics (and thefeatures taken into account) capture very consistent or veryinconsistent scenarios similarly (i.e., low standard deviationson ranking). Moreover, for average situations (i.e., averagerankings), the characteristics of eachmetric play an importantrole: The different feature sets used indeed provide differentcharacterizationswhich, in turn, results in valuesmore spreadapart. These large standard deviations, on average cases, alsoindicate that the features of the metric described in the pre-vious sections (i.e., majority voting with edge angles of 90◦and 180◦, and directions with respect to behavioral profiles)are not redundant to each other.

Consider as example the linear process model illustratedin Fig. 2a. It is ranked in position 3, 5, and 24 (respectively, byM-E1, M-E2, and M-BP metrics), with a rather low standarddeviation of 9.4. This is an example of amodel that allmetricsconsider as a consistent one. It is placed on the very left sideof the chart in Fig. 9.

The model reported in Fig. 2c has an even lower standarddeviation (1.63) with the following rankings: 118 for M-E1,

0

10

20

30

40

50

60

0 - 0.1 0.1 - 0.2 0.2 - 0.3 0.3 - 0.4 0.4 - 0.5 0.5 - 0.6 0.6 - 0.7 0.7 - 0.8 0.8 - 0.9 0.9 - 1

Num

ber o

f Mod

els

Consistency Score Intervals

M-E1 M-E2 M-BP

Fig. 8 Number of process models within different consistency score intervals

123


0

5

10

15

20

25

30

35

40

45

0 10 20 30 40 50 60 70 80 90 100 110 120 130

Stan

dard

Dev

ia�o

n of

Ran

king

Average Posi�on on Ranking

Avrage Std Dev (every 10 posi�ons) Standard devia�on for the given process

Fig. 9 In this chart, each dot represents a process model. The position on the x-axis indicates its average ranking; the position on the y-axisindicates the standard deviation of the rankings. Both averages and standard deviations are computed for all three metrics

0

5

10

15

20

25

30

35

40

45

0 10 20 30 40 50 60 70 80 90 100 110 120

Stan

dard

Dev

ia�o

n of

Ran

king

Ranking Posi�ons Intervals

Min and max values for the interval Average std dev for the interval

Fig. 10 In this chart, each line represents the average standard deviation for the interval considered (instead, the chart in Fig. 9 reports the averageevery 10 positions). The light-blue area indicates how values are distributed for the interval under consideration (i.e., min and max values)

122 for M-E2, and 120 for M-BP. This indicates that allmetrics consider this an inconsistent model. It is located onthe very right side of the chart in Fig. 9.

Themodel depicted in Fig. 2b is not consistently ranked byallmetrics (positions 11 forM-E1, 36 forM-E2, and54 forM-BP). This model has a standard deviation of 17.6 and appearstoward the central part of Fig. 9. This model clearly high-lights the features that each metric takes into account: edges’direction and behavioral profiles violations. In particular,considering the directions of edges, this model, comparedto other models, has only few inconsistencies (such as thoseconnecting each horizontal fragment) andmany properly ori-ented edges. From a behavioral profiles perspective, instead,the model has a considerable amount of strict relations thatare violating the “graphically before” condition (again, com-pared to the other models).

To sum up, the metrics we described in this paper are pro-viding consistent results regarding “extreme models” (i.e.,

models with very high and very low consistency scores).Instead, on average scenarios, it depends onwhat the end userwants to analyze: Metric M-BP is concentrating on the posi-tion of activities; M-E1 and M-E2 are based on the directionof edges. In Sect. 5.3, we will investigate which representa-tion is more in line with the human perception.

5.2 Efficiency

All our metrics have been implemented as extension of theCheetah Experimental Platform [24]. In order to evaluate theefficiency of these metrics, we compared the time requiredto compute them for each model. The machine we used isa standard-level laptop with a Intel Core i7-2620M CPU(2.70GHz), equipped with 8GB of RAM and a 64-bit Win-dows 7 OS. The test was performed on the Java VirtualMachine version 1.8.0.25, with 64 bits.

123

A. Burattin et al.

Table 5 Time required to compute the metrics for one process model

M-E1 M-E2 M-BP

Average (ms) 0.1533 0.0693 34.4179

Max (ms) 2.0011 0.8164 174.4437

Min (ms) 0.0524 0.0161 2.4495

Final results are reported in Table 5 (all values areexpressed in milliseconds). The results report the average,the minimum and the maximum computation time requiredto calculate the metrics for the whole dataset. To providemore general results and to avoid that specific conditionsinfluence the computation, we computed each metric 5 timesfor each process (i.e., 5× 125 = 625 computations for eachmetric) and the average times are reported. As the complex-ity analyses suggested, all three metrics can be considered asvery efficient. From all three metrics, the behavioral profiles-based metric, M-BP, is the most demanding one, since ithas to compute the behavioral profiles in advance. Still themetric can be calculated, on average, within 34ms. Theseresults clearly show the applicability of the proposed met-rics even for settings with high performance requirements(e.g., online and real-time computations needed, for exam-ple, to include such calculations in an intelligent modelingenvironment that provides recommendations to users basedon observed behavior).

5.3 Experimental evaluation

The evaluation reported in Sect. 5.1 shows that the three pro-posed metrics tend to agree on the assessment of processmodels with very high and very low consistency of flow. Formodels with average ratings, the assessment is less consis-tent. In this section, we aim to evaluate how far the proposedmetrics reflect human perception of consistency of flow. Todo that, we conducted an empirical study in which we askedhuman readers to rate a set of models regarding their flowconsistency. We then compared the human assessment withour three metrics and looked for correlations. This way wecould tell which metric is a better approximation and to whatextent.

Subjects For our evaluations, we targeted subjects familiarwith process modeling. To this end, we decided to target theparticipants of theBPM2015 conference7, sincewe expectedthem to be familiar with process modeling, most of themcoming from academia. In total, we collected data from 47subjects.

7 See http://bpm2015.q-e.at/ for more information.

Objects We decided to select a subset of 14 models from ourdataset. The models we picked have been sampled accordingto the standard deviation on the average ranking. In particu-lar, we used the representation provided in Fig. 11:We evenlysampled our space and, for each average ranking positions,twomodels are selected: onewith low standard deviation andone with high standard deviation. By doing that, we evenlypartitioned our dataset, both with respect to the average rank-ing and the standard deviation. By selecting one process perpartition, we guarantee that each partition is actually repre-sented in our dataset.

We prepared a questionnaire for our subjects with the 14models that were selected.8 To avoid that the ordering of theprocess models causes a bias on the evaluation, we actuallyassembled two versions of the same questionnaire (labeled“A” and “B”), with the processes presented in inverted order.

Response Variables and Data Collection For each modelreported in the questionnaire, we asked participants to ratethe consistency of flow on a 7-point Likert scale ranging from1 “No consistency at all” to 7 “Complete consistency.”

Execution The actual evaluation was conducted betweenAugust 31 and September 3, 2015, during the BPM con-ference, which took place in Innsbruck. We asked all theconference participants to fill our questionnaire and return theanswers to us, without providing them any additional instruc-tion. We randomly distributed to each participant either aquestionnaire labeled “A” or “B”. These two questionnairetypes have the ordering of the process models inverted. Intotal, we collected 47 answers (25 labeled “A” and 22 “B”).

Data Analysis and Results Once we collected all our ques-tionnaires, we rescaled the values from the closed interval[1, 7] into [0, 1], just to simplify the comparison with theoutput of ourmetrics. For eachmodel, we computed the aver-age scores assigned by our participants. Then, we comparedthe average human evaluation against the automatic metricsdefined in this paper. The average human evaluation scoresof all the models as well as the values of the three metricsare reported in Table 6.

The first test we employed was the one-sample Kolmo-gorov–Smirnov, in order to verify the normality of our datadistribution which is precondition to compute the Pearsoncorrelation. The data we collected, actually, are fitting a nor-mal distribution with mean 0.541 and standard deviation0.16. The significance score observed is 0.919.

Once we verified such condition, we computed the Pear-son correlation between each metric and the average human

8 All the material used for this study is available at http://bpm.q-e.at/experiment_flow_consistency.

123

http://bpm2015.q-e.at/




0

5

10

15

20

25

30

35

40

45

0 10 20 30 40 50 60 70 80 90 100 110 120 130

Stan

dard

Dev

ia�o

n of

Ran

knin

g

Average Posi�on on Ranking

Process models Models for Human Evalua�on

Fig. 11 Average position on ranking with processes selected for the human assessment represented by large, bright-blue dots

Table 6 Descriptive variables for every model of the study. Scores forthe three metrics are reported, together with the average score and thestandard deviation of the human assessment

M-E1 M-E2 M-BP Human evaluationAverage SD

Model 1 0.73 0.85 0.68 0.43 0.25

Model 2 0.38 0.57 0.57 0.36 0.27

Model 3 0.73 0.84 0.83 0.52 0.25

Model 4 0.79 0.87 0.85 0.48 0.28

Model 5 0.37 0.59 0.78 0.39 0.26

Model 6 0.75 0.91 0.92 0.32 0.24

Model 7 0.50 0.88 0.95 0.76 0.19

Model 8 0.69 0.94 0.91 0.72 0.25

Model 9 0.55 0.64 0.70 0.50 0.30

Model 10 0.86 0.92 0.93 0.73 0.20

Model 11 0.78 0.86 0.71 0.35 0.26

Model 12 0.74 0.96 1.00 0.80 0.19

Model 13 0.63 0.81 0.81 0.55 0.29

Model 14 0.87 0.96 0.97 0.66 0.25

Table 7 Correlation of our three different approaches computed withaverage scores assigned by our subjects

M-E1 M-E2 M-BP

Pearson correlation 0.263 0.567 0.719

Significance 0.364 0.034 0.004

Bold values indicates the best scores

scores. The results are reported in Table 7. Our results sug-gest that the metric based on behavioral profiles, M-BP,best reflects human assessment, with a Pearson correlationscore of 0.719 and a significance value of 0.004. Please notethat the absolute scores of the human evaluation and M-BPare linearly shifted by a factor which, on average, is 0.29.Also metric M-E2 obtained a significant correlation withthe human judgment (Pearson correlation 0.567, significance

0.034), butwhen compared tometricM-BP, the correlation isless strong. Metric M-E1, in turn, with a Pearson correlationof 0.263 (and significance 0.364) is not correlated to humanjudgment. Figure 12 depicts the comparison of the trends ofthe scores assigned by the three approaches with the averagehuman assessments. The high correlation value of metric M-BP is reflected in the picture: Even though the two curves inFig. 12c do not overlap (i.e., the actual scores differ), theydescribe very similar behavior. Comparable effect is reportedin Fig. 12b, which represents M-E2. In Fig. 12a, instead, thecurves are touching in some points; however, the two shapesare more distinct from each other, thus indicating lower cor-relation.

From these results, we can also infer that, when evaluatingthe consistency of the flow, human perception tends to givemore importance to the position of activities than to the actualdirection of single edges. This effect entails the ability toabstract from the drawn path of edges and, instead, just focuson the actual flow of the activities.

Limitations As Table 6 reports, for the human evaluationwe had, almost for every model, quite high standard devi-ation values. This is due to the different grading habits ofpersons: Someone tends to never give the maximum score,whereas others have lower expectations and therefore assignsthe best grade more easily. Similar situation can be observedwith minimum score. Actually, this type of problems israther common when dealing with intuitive assessment: Weexpected such a phenomenon. Grouping our subjects intoclusters, generated according to the scores they share (forexample, by identifying groups of persons that almost nevergave best/worst grade), could help us in improving our evalu-ation and therefore refining our validation. Another approacharound the problem could be to “normalize” the data. How-ever, in this case, the risk of obtaining unreliable results (byapplying an artificial transformation on the input data) israther high.

123

A. Burattin et al.

Model 1Model 2

Model 3

Model 4

Model 5

Model 6

Model 7Model 8

Model 9

Model 10

Model 11

Model 12

Model 13

Model 14

M-E1 Avg. Human Evalua�on

Model 1Model 2

Model 3

Model 4

Model 5

Model 6

Model 7Model 8

Model 9

Model 10

Model 11

Model 12

Model 13

Model 14

M-E2 Avg. Human Evalua�on

Model 1Model 2

Model 3

Model 4

Model 5

Model 6

Model 7Model 8

Model 9

Model 10

Model 11

Model 12

Model 13

Model 14

M-BP Avg. Human Evalua�on

(a) (b)

(c)

Fig. 12 Comparison of human assessment and values calculated by our metrics, on all the models of our sample dataset. a M-E1 versus humanevaluation. bM-E2 versus human evaluation. c M-BP versus human evaluation

6 Related work

Related to our paper is work on the visual layout of busi-ness processes in general and work on the flow of a businessprocess model specifically.

Visual Layout of Business Process Models Research on thevisual layout of business process models has largely reliedon studies done in the field of graph drawing. A considerablebody of knowledge exists on how to automatically set the lay-out of graphical models in order to improve their readability.Studies done in the graph drawing field mainly explored thefollowing visual layout features: edge crossing, edge bends,the minimum angle between edges leaving a node, orthogo-nality, symmetry, flow direction, edge length variation, andwidth of layout [25,26]. The direct relation between thesemetrics and understandability was also investigated.

Research on aesthetics of graph layout in general [25]found that an important feature to users is minimizing linecrosses; less important are: minimizing bends, maximiz-ing symmetry; other features were not found to have a

significant effect. Research on users’ preferences of UMLlayout/appearance indicates that users rated features as fol-lows: arc crossings, orthogonality, direction of flow, arcbends, text direction, width of layout, and font type. Consid-ering processmodels, [32] explored understanding of processmodels by experts and novices in regard to the followinglayout features: line crossings, edge bends, symmetry, andvicinity of related elements. [5] investigated user preferencesof layout aesthetics for BPMN models, considering hetero-geneous user groups with the goal of designing a modelingtool for BPMN. They used line crossings, orthogonal lines,drawing area, line bends, and flow. Findings showed thatthe aforementioned layout criteria were most relevant forusers with average or greater experience and at least basiceducation in business process modeling. The layout featuresdescribed above were all identified as part of the findings inour exploratory study.

Another body of work has developed or evaluated algo-rithms that change an existing layout of a business processmodel manually or automatically, to match a desirable aes-thetical pattern for effective visual layout of a model. In

123


[9,10], both studies present algorithms which are based ona set of constraints targeting a readable layout of a processmodel (unified flow direction, minimal edge crossing, mini-mal bending points, usage of Manhattan layout). Automaticlayout ofBPMNmodels is presented in [14] and is focused onedge positioning. The study in [29] presents a comprehensiveframework which allows for a personalized process modelvisualization, meaning that the model’s visual appearancecan be tailored to the specific needs of different user groups.In addition, in the field of graph drawing, applied research hasdeveloped algorithms and related tools to automatically ormanually improve visual layout of graphs and thus improvetheir understandability. GraphEd system in [12] comparedand evaluated different algorithms of graph drawing whileconsidering the following layout criteria: edge length, edgedistribution, area, density, bends, crossings, and orthogonaledges. The work presented in [21] suggested an algorithmwhich reorders a diagram using orthogonal ordering whilepreserving the “mental map” of the diagram.

The conclusion from the reviewed works is that variousattempts to provide precise metrics of specific layout fea-tures have been made. Yet, as far as we know, all existingworks address a conveniently selected set of features, typ-ically those that are immediate to think of and possible toautomatically address. In this paper, instead, we present a setof features that are elicited based on human perception. Inaddition, we extend our own previous work [1] toward twomain directions. First, we provided three different metrics toformalize and quantify the consistency of flow feature. Thesecond important contribution consists of the evaluation ofthe newly proposed metrics. We evaluated their computa-tional requirements; we analyzed their scores over a datasetof processmodels; and finally, we compared ourmetrics withhuman perception.

Consistency of the Flow In [26], the consistency of the flowis evaluated with respect to a target direction, which canbe parameterized, considering all fragments of all edges.Specifically, the formalization computes the ratio of frag-ments pointing toward the target direction divided by thetotal number of fragments. This definition is different fromthe definitions we devised and reported in this paper in twoaspects. First, it considers only the edges’ angle. Second,in [26] authors can deal with just one direction at the time.Therefore, this approach could have problems, for exam-ple, with processes containing structures similar to the onesreported in Fig. 4, since it considers each fragment inde-pendently, or with the process in Fig. 5 which requires theassignment of two directions for each edge.

Related to our work is also the study reported in [5] whichlooks into the impact ofmodel aesthetics. The operationaliza-tion of flow remains somewhat unclear and is only informallystated as “edges are drawn such that they consider the reading

direction.” The applied notion of consistency again focuseson edges, activities as relevant features characterizing floware not considered.

Another study on the impact of the flow direction of amodel is reported in [6]. Thereby, the paper compares inan experimental setting the effect of different flow directions(either left to right, or right to left, or top to bottom, or bottomto top) on model comprehension. While the models consid-ered in this study all had a clear flow direction, focus of ourwork is a metric that is able to deal with models that onlyhave a partially consistent flow direction.

7 Conclusions and future work

The visual layout of the model, the way elements of themodel are laid out on the canvas, is an important factor forthe user’s understanding of the model. Since layout proper-ties are mostly not addressed by modeling languages, and inthe absence of enforced layout conventions, the modeler hasmuch freedom to decide on how a model will be laid out.A common set of properties, which can be used to describedifferent aspects of layout, is an essential basis needed fordeveloping an understanding, appropriate conventions, andtools that enforce them.

The contribution of this paper is twofold. First, duringa human-centered investigation, visual layout features inbusiness process models were elicited based on what usersperceive as important. Second, we concentrated on one ofthese features: the consistency of the flow. We operational-ized such a feature, which is not trivial to quantify, into threepossible metrics in a way that closely reflects human percep-tion. The outcomes of the evaluations we performed, bothautomatic and involving humans, show that all metrics canbe calculated very efficiently. Moreover, we show that twoof the proposed metrics correlate with human perception.In particular, metric M-BP achieved the highest correlationvalue. This suggests that, in this context, the position of activ-ities represents a more important features as opposed to theorientation of the edges.

Currently, all the metrics are implemented in a processmodeling tool, allowing to precisely “measure” the layoutproperties of everymodel.Yet, validation has been performedjust for the flow consistency feature. As future work, we aimto validate additional metrics to corroborate the correlationbetween the calculated metrics and the human perception ofmodels’ layout. In addition, future work may include quan-titative studies to experimentally test to what degree theselayout features indeed affect user comprehension. Once sucha link has been established, it surely would be very useful touse our metrics as an optimization criterion for the automaticdrawing of process models.

123

A. Burattin et al.

As future work, we also plan to adapt the metrics reportedin this paper to use them “on the fly,” during the processdesign phase (i.e., even before the process is completelymod-eled). This way, we could provide continuous and interactivefeedbacks to the user modeling the process.

Acknowledgements This work is partially funded by theAustrian Sci-ence Fund (FWF) projects “The Modeling Mind: Behavior Patterns inProcess Modeling” (P26609) and “ModErARe” (P26140). We are alsograteful to the anonymous reviewers who helped improving the paper.

Open Access This article is distributed under the terms of the CreativeCommons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution,and reproduction in any medium, provided you give appropriate creditto the original author(s) and the source, provide a link to the CreativeCommons license, and indicate if changes were made.

References

1. Bernstein, V., Soffer, P.: Identifying and quantifying visual layoutfeatures of business process models. In: Proceedings of BPMDS2015 and EMMSAD 2015, pp. 200–213. Springer, Berlin (2015)

2. Corbin, J., Strauss, A.: Basics of Qualitative Research: Techniquesand Procedures for Developing Grounded Theory. Sage Publica-tions, Thousand Oaks (1998)

3. Cormen, T.H., Stein, C., Rivest, R.L., Leiserson, C.E.: Introductionto Algorithms, 2nd edn. The MIT Press, Cambridge (2001)

4. Creswell, J.: Research Design: Qualitative, Quantitative, andMixed Methods Approaches. SAGE Publications, Incorporated,Thousand Oaks (2009)

5. Effinger, P., Jogsch, N., Seiz, S.: On a study of layout aestheticsfor business process models using BPMN. In: Business ProcessModeling Notation, pp. 31–45. Springer, Berlin (2010)

6. Figl, K., Strembeck, M.: Findings from an experiment on flowdirection of business process models. In: Proceedings of the 6thInternational Workshop on Enterprise Modelling and InformationSystems Architectures (EMISA), Lecture Notes in Informatics(LNI), September (2015)

7. Figl, K., Strembeck, M.: On the importance of flow direction inbusiness process models. In: 9th International Conference on Soft-ware Engineering and Applications (ICSOFTEA), pp. 132–136.Vienna, Austria (2014)

8. Gruhn, V., Laue, R.: Complexity metrics for business processmodels. In: 9th International Conference on Business InformationSystems (BIS 2006), vol. 85, pp. 1–12. Citeseer (2006)

9. Gschwind, T., Pinggera, J., Zugal, S., Reijers, H.A., Weber, B.:Edges, structures, and constraints: the layout of business pro-cess models. Technical report, Technical Report RZ 3825, IBMResearch, Zurich (2011)

10. Gschwind, T., Pinggera, J., Zugal, S., Reijers, H.A., Weber, B.: Alinear time layout algorithm for business process models. J. Vis.Lang. Comput. 25(2), 117–132 (2014)

11. Havey, M.: Essential Business Process Modeling. O’Reilly MediaInc, Sebastopol (2005)

12. Himsolt, M.: Comparing and evaluating layout algorithms withinGraphEd. J. Vis. Lang. Comput. 6(3), 255–273 (1995)

13. Kitchenham, B., Pfleeger, S., Pickard, L., Jones, P., Hoaglin, D.,El Emam, K., Rosenberg, J.: Preliminary guidelines for empiricalresearch in software engineering. IEEE Trans. Softw. Eng. 28(8),721–734 (2002)

14. Kitzmann, I., König, C., Lübke, D., Singer, L.: A simple algo-rithm for automatic layout of BPMN processes. In: Commerce andEnterprise Computing, 2009. CEC’09. IEEE Conference on, pp.391–398. IEEE (2009)

15. Krogstie, J., Sindre,G., Jørgensen,H.: Processmodels representingknowledge for action: a revised quality framework. Eur. J. Inf. Syst.15(1), 91–102 (2006)

16. La Rosa, M., ter Hofstede, A.H., Wohed, P., Reijers, H.A.,Mendling, J., van der Aalst, W.M.: Managing process model com-plexity via concrete syntax modifications. IEEE Trans. Ind. Inf.7(2), 255–265 (2011)

17. Mayer, R.E.: Multimedia Learning. Cambridge University Press,Cambridge (2009)

18. Mendling, J.: Metrics for Process Models: Empirical Foundationsof Verification, Error Prediction, and Guidelines for Correctness,vol. 6. Springer, Berlin (2008)

19. Mendling, J., Reijers, H.A., Cardoso, J.:What makes process mod-els understandable? In: Business Process Management, pp. 48–63.Springer, Berlin (2007)

20. Mendling, J., Reijers, H.A., van der Aalst, W.M.P.: Seven processmodeling guidelines (7PMG). Inf. Softw. Technol. 52(2), 127–136(2010)

21. Misue, K., Eades, P., Lai, W., Sugiyama, K.: Layout adjustmentand the mental map. J. Vis. Lang. Comput. 6(2), 183–210 (1995)

22. Moody, D.L.: The “physics” of notations: toward a scientific basisfor constructing visual notations in software engineering. IEEETrans. Softw. Eng. 35(6), 756–779 (2009)

23. Pinggera, J.: The Process of Process Modeling. Ph.D. Thesis, Uni-versity of Innsbruck (2014)

24. Pinggera, J., Zugal, S., Weber, B.: Investigating the Process ofProcess Modeling with Cheetah Experimental Platform. In: Pro-ceedings of ER-POIS ’10, pp. 13–15 (2010)

25. Purchase, H.C.: Which aesthetic has the greatest effect on humanunderstanding? In: Graph Drawing, pp. 248–261. Springer, Berlin(1997)

26. Purchase, H.C.: Metrics for graph drawing aesthetics. J. Vis. Lang.Comput. 13(5), 501–516 (2002)

27. Purchase, H.C., Allder, J.A., Carrington, D.: User preferenceof graph layout aesthetics: a UML study. In: Graph Drawing,pp. 5–18. Springer, Berlin (2001)

28. Recker, J., Rosemann, M., Indulska, M., Green, P.: Business pro-cess modeling—a comparative analysis. J. Assoc. Inf. Syst. 10(4),1 (2009)

29. Reichert, M.: Visualizing large business process models: chal-lenges, techniques, applications. In: Business ProcessManagementWorkshops, pp. 725–736. Springer, Berlin (2013)

30. Reijers, H.A., Mendling, J., et al.: A study into the factors thatinfluence the understandability of business process models. IEEETrans. Syst.Man.Cybern. PartASyst.Hum.41(3), 449–462 (2011)

31. Rinderle, S., Bobrik, R., Reichert, M., Bauer, T.: Business processvisualization—use cases, challenges, solutions. In: Manolopoulos,Y., Filipe, J., Constantopoulos, P., Cordeiro, J. (eds.) Proceedingsof the Eighth International Conference on Enterprise InformationSystems (ICEIS’06): Information System Analysis and Specifica-tion, pp. 204–211. INSTICC Press (2006)

32. Schrepfer, M., Wolf, J., Mendling, J., Reijers, H.A.: The impactof secondary notation on process model understanding. In: ThePractice of Enterprise Modeling, pp. 161–175. Springer, Berlin(2009)

33. Streit, A., Pham, B., Brown, R.: Visualization support for man-aging large business process specifications. In: Business ProcessManagement, pp. 205–219. Springer, Berlin (2005)

34. Taylor-Powell, E.,Renner,M.:AnalyzingQualitativeData.Univer-sity of Wisconsin-Extension Program Development & Evaluation,Madison (2003)

123

http://creativecommons.org/licenses/by/4.0/

http://creativecommons.org/licenses/by/4.0/


35. Vanderfeesten, I., Cardoso, J., Mendling, J., Reijers, H.A., van derAalst,WilM.P.: Quality metrics for business process models. BPMWorkflow Handb. 144, 179–190 (2007)

36. Weidlich, M.: Behavioural profiles: a Relational Approach toBehaviourConsistency. Ph.D.Thesis,HassoPlattner Institute,Uni-versity of Potsdam (2011)

37. Weidlich, M., Mendling, J., Weske, M.: Efficient consistency mea-surement based on behavioral profiles of process models. IEEETrans. Softw. Eng. 37(3), 410–429 (2011)

Andrea Burattin is postdoc-toral researcher at the Uni-versity of Innsbruck. Previ-ously, he worked as postdoctoralresearcher at the University ofPadua. In 2013 he obtained hisPh.D. degree from University ofBologna and Padua. The IEEETask Force on Process Miningawarded to his Ph.D. thesis theBest Process Mining Disserta-tionAward2012-2013.His Ph.D.thesis has then been publishedas Springer Monograph in theLNBIP series. He served as orga-

nizer of BPI workshop since 2015, and special sessions on processmining at IEEE CIDM since 2013. He is also in the program committeeof several conferences. His research interests include process miningtechniques and, in particular, online approaches over event streams.Recently, he has been working on secondary notation of business pro-cesses and on the analysis of the process of process modelling.

Vered Bernstein has a BScin software engineering fromChamplain College in Burling-ton Vermont, and is currentlya MSc student in informationsystems at the University ofHaifa. Her MSc thesis researchis focused on the quantificationof business process model visuallayout.

Manuel Neurauter is a Ph.D.candidate at the University ofInnsbruck (Austria). Manuel isa member of Quality Engineer-ing (QE) Research Group andmember of the Research ClusteronBusiness Processes andWork-flows at QE. Manuel received hisM.Sc. degree from the Depart-ment of Psychology, Universityof Innsbruck in 2013. His mainresearch interest is the influenceof human cognition and cogni-tive load on the design processof process models. His research

interests further include the influence of cognitive abilities in generaland particularlyworkingmemory functions on business process quality.Manuel has published five papers in international journals, conferences,and workshops.

Pnina Soffer is an AssociateProfessor, the head of the Infor-mation Systems Department atthe University of Haifa. Shereceived her BSc (1991) andMSc (1993) in Industrial Engi-neering, Ph.D. in InformationSystems Engineering from theTechnion—Israel Institute ofTechnology (2002). Her researchdeals with business processmod-eling and management, require-ments engineering and concep-tual modeling, addressing issuessuch as goal orientation, flexibil-

ity, human aspects and process mining. Pnina has published over 120papers in journals and conference proceedings. Among others, herresearch has appeared in journals such as Journal of the AIS, EuropeanJ of IS, Decision Support Systems, and Information Systems. She hasserved as a guest editor of a number of journal special issues and in sev-eral journal editorial boards. Pnina has served in program committeesand held organizational positions in numerous conferences, includingCAiSE, BPM, ER, and ICIS. She is a member of the CAiSE steeringcommittee, an organizer of the BPMDS working conference, and sheserved as a program chair in BPM and in CAiSE.

123

A. Burattin et al.

Barbara Weber is Full Pro-fessor and Head of the Soft-ware Engineering Section at theTechnical University of Den-mark, Denmark. Moreover, sheholds an Associate Professorposition at theUniversity of Inns-bruck (Austria). Barbara holdsa Habilitation degree in Com-puter Science and Ph.D. fromthe University of Innsbruck. Bar-bara’s research interests includeprocess model understandabil-ity, process of process modeling,process flexibility, and user sup-

port in flexible process-aware systems as well as neuro-adaptiveinformation systems. Barbara has published more than 130 refereedpapers, for example, in Nature Scientific Reports, Data and Knowl-edge Engineering and Software and System Modeling, Computers inIndustry, Science of Computer Programming, Enterprise InformationSystems and Information and Software Technology and co-author ofthe book “Enabling Flexibility in Process-aware Information Systems:Challenges, Methods, Technologies” by Springer.

123

detection and quantification of flow ... - andrea.burattin.net · b andrea burattin...

Documents