Jan Mendling and Stefanie Rinderle-Ma, eds.: Proceedings of EMISA 2016, Gesellschaft für Informatik, Bonn 2016
Discovering Interacting Artifacts from ERP Systems (Extended Abstract)³
D. Fahland¹, X. Lu¹, Marijn Nagelkerke², Dennis van de Wiel²
Abstract: Enterprise Resource Planning (ERP) systems are widely used to manage business documents along business processes and allow very detailed recording of event data of past process executions and involved documents. This recorded event data is the basis for auditing and detecting unusual flows. Process mining techniques can analyze event data of processes stored in linear event logs to discover a process model that reveals unusual executions. Existing techniques assume a linear event log that uses a single case identifier to which all behavior can be related. However, in ERP systems, processes such as Order to Cash operate on multiple interrelated business objects, each having its own case identifier and its own behavior, and interacting with the others. Forcing these into a single case creates ambiguous dependencies caused by data convergence and divergence, which obscures unusual flows in the resulting process model. We present a new semi-automatic, end-to-end approach for analyzing event data in a plain database of an ERP system for unusual executions. We identify an artifact-centric process model describing the business objects, their life-cycles, and how the various objects interact along their life-cycles. The technique was validated in two case studies and reliably revealed unusual flows later confirmed by domain experts. The work summarized in this extended abstract has been published in [Lu15].
Keywords: Process Mining, ERP-System, Artifact-Centric Model, Object Life-Cycle, Interaction Discovery
1 Introduction and Problem Description
Information systems (IS) not only store and process data in an organization but also record event data about how and when information changed. This “historical event data” can be used to analyze, for instance, whether information processing in the past conformed to the prescribed processes or to compliance requirements. Process mining [Aa11] offers automated techniques for this task. In particular, exploring visual models discovered from event data makes it possible to identify unusual flows and their circumstances, based on which concrete measures for process improvement can be devised [Ec15]. A prerequisite to this analysis is a process event log that holds events about information changes, with the assumption that each event belongs to one specific execution of a specific process.
In general, information access is not tied to a particular process execution; rather the same information can be accessed and changed from various processes and applications. For
¹ [email protected], Eindhoven University of Technology, The Netherlands
² KPMG IT Advisory N.V., Eindhoven, The Netherlands
³ This article summarizes problem, approach, and selected findings of a study published as Xixi Lu, Marijn Nagelkerke, Dennis van de Wiel, and Dirk Fahland. Discovering Interacting Artifacts from ERP Systems. Services Computing, IEEE Transactions on, 8(6), 2015 doi:10.1109/TSC.2015.2474358 [Lu15].
Documents Changes
Change id   Date changed   Reference id   Table name   Change type               Old Value   New Value
1           17-5-2020      S1             SD           Price updated             100         80
2           19-5-2020      S1             SD           Delivery block released   X           -
3           19-5-2020      S1             SD           Billing block released    X           -
4           10-6-2020      B1             BD           Invoice date updated      20-6-2020   21-6-2020

Billing documents (BD)
BD id   Date created   Document type   Clearing date
B1      20-5-2020      Invoice         31-5-2020
B2      24-5-2020      Invoice         5-6-2020

Delivery documents (DD)
DD id   Date created   Reference SD id   Reference BD   Document type     Picking date
D1      18-5-2020      S1                B1             Delivery          31-5-2020
D2      22-5-2020      S1                B2             Delivery          5-6-2020
D3      25-5-2020      S2                B2             Delivery          5-6-2020
D4      12-6-2020      S3                null           Return Delivery   NULL

Sales documents (SD)
SD id   Date created   Reference id   Document type   Value   Last change
S1      16-5-2020      null           Sales Order     100     10-6-2020
S2      17-5-2020      null           Sales Order     200     31-5-2020
S3      10-6-2020      S1             Return Order    10      NULL

Relations between the tables are labelled F1, F2, F3, and F4 in the original figure; its legend distinguishes parent table and child table.
Fig. 1: The tables of the simplified OTC example
[Figure 2, three panels. (a) Timeline of document creation, annotated with divergence and convergence: 7 events “Created” related to S1, 3 events “Created” related to S2. (b) Process model over the event types Sales order created, Delivery created, Invoice created, Return order created, Return delivery created. (c) Artifact-centric model over the artifacts Sales order, Delivery, Invoice, Return order, Return delivery. Legend: artifact, event type, causal relation or interaction, deviating interaction.]
Fig. 2: Creation of documents of Fig. 1 along time (a), a classic process model based on a single case identifier (b), and an artifact-centric model based on 5 case identifiers (c).
instance, in Enterprise Resource Planning (ERP) systems (such as SAP and Oracle Enterprise), information is stored in business objects (or documents) which are linked via one-to-many and many-to-many relations, typically in a relational database. The objects themselves are encapsulated in services [AMZ00] which are invoked by high-level end-to-end business processes; each invocation is called a transaction, which is logged in the data object itself.
Fig. 1 shows a simplified example of the transactional data of an Order to Cash (OTC) process supported by SAP systems; Fig. 2(a) visualizes the events of Fig. 1 that are related to document creation. There are two sales orders S1 and S2; creation of S1 is followed by creation of a delivery document D1, an invoice B1, another delivery document D2, and another invoice B2 which also contains billing information about S2. Creation of S2 is also followed by creation of another delivery document D3. Further, there is a return order S3 related to S1 with its own return delivery document D4. The many-to-many relations between documents surface in the transactional data of Fig. 1: a sales document can be related to multiple billing documents (S1 is related to B1 and B2) and a billing document can be related to multiple sales documents (B2 is related to S1 and S2). This behavior already contains an unusual flow: twice the delivery document was created before the billing document (main flow), but once the order was reversed (B2 before D3).
When applying classical process mining techniques, one first has to extract an event log based on a single case identifier to which all event data can be related. Choosing SD id in Fig. 1 leads to the two sequences of events shown in Fig. 2(a). Process discovery on this log yields the model of Fig. 2(b), which is wrong: two invoices are created before their deliveries instead of one, and three invoices are created instead of two (known as divergence and convergence, respectively) [Pi11].
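The effect can be reproduced on the creation events of Fig. 1 with a few lines of Python. This is a minimal sketch, not the extraction procedure of [Pi11]: the `LEADS_TO` mapping and the `flatten` helper are illustrative assumptions that relate every document reachable from a sales document (via the reference columns of Fig. 1) to that sales document's case.

```python
from collections import Counter
from datetime import date

# Creation events taken from Fig. 1: document id -> (document type, date created).
CREATED = {
    "S1": ("Sales Order", date(2020, 5, 16)),
    "S2": ("Sales Order", date(2020, 5, 17)),
    "S3": ("Return Order", date(2020, 6, 10)),
    "D1": ("Delivery", date(2020, 5, 18)),
    "D2": ("Delivery", date(2020, 5, 22)),
    "D3": ("Delivery", date(2020, 5, 25)),
    "D4": ("Return Delivery", date(2020, 6, 12)),
    "B1": ("Invoice", date(2020, 5, 20)),
    "B2": ("Invoice", date(2020, 5, 24)),
}

# Assumption: a document "leads to" the documents that reference it (sales
# document -> delivery, sales order -> return order) and a delivery leads to
# the billing document it references.
LEADS_TO = {
    "S1": ["D1", "D2", "S3"],
    "S2": ["D3"],
    "S3": ["D4"],
    "D1": ["B1"],
    "D2": ["B2"],
    "D3": ["B2"],
}

def flatten(case_id):
    """Collect all documents reachable from one sales document, ordered by creation date."""
    seen, stack = set(), [case_id]
    while stack:
        doc = stack.pop()
        if doc not in seen:
            seen.add(doc)
            stack.extend(LEADS_TO.get(doc, []))
    return sorted(seen, key=lambda d: CREATED[d][1])

# One trace per sales order, i.e. SD id is the single case identifier.
log = {sd: flatten(sd) for sd in ("S1", "S2")}
traces = {sd: [CREATED[d][0] for d in docs] for sd, docs in log.items()}

# Convergence: B2 is copied into both cases, so the log holds 3 invoice events
# although only 2 invoices exist.
invoice_events = sum(trace.count("Invoice") for trace in traces.values())
distinct_invoices = sum(1 for doc_type, _ in CREATED.values() if doc_type == "Invoice")

# Divergence: within case S1, invoice B1 is directly followed by the unrelated
# delivery D2, which adds a second "Invoice before Delivery" observation.
directly_follows = Counter(
    pair for trace in traces.values() for pair in zip(trace, trace[1:])
)

print(log)
print(invoice_events, distinct_invoices, directly_follows[("Invoice", "Delivery")])
```

Under these assumptions the case of S1 contains 7 creation events and the case of S2 contains 3, matching the annotations of Fig. 2(a).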
2 Approach: Discovering Artifact-Centric Models
We propose to approach the problem under the “conceptual lens” of artifact-centric models [CH09]. An artifact is a data object over an information model; each artifact instance exposes services that allow changing its informational contents; a life-cycle model governs when which service of the artifact can be invoked; the invocation of a service in one artifact may trigger the invocation of another service in another artifact. Information models of different artifacts can be in one-to-many and many-to-many relations, which allows describing behavior over complex data in terms of multiple objects interacting via service invocations. Under this lens, each document of an ERP system can be seen as an artifact; a transaction on a document is a service call on the artifact; behavioral dependencies between transactions of documents can be seen as life-cycle behavior and dependencies of service calls. Describing the transactional data of Fig. 1 with artifact-centric concepts yields the model of Fig. 2(c); it visualizes the order in which objects are created and also highlights the unusual flow of invoice B2 being created before delivery D3.
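The artifact-centric reading of Fig. 1 can be sketched in Python as follows. The sketch keeps one set of creation events per document type and counts interactions only between documents that actually reference each other. The `REFERENCES` list is transcribed from the reference columns of Fig. 1; the rule that flags the less frequent direction of an interaction as deviating is an illustrative assumption, not the paper's outlier criterion.

```python
from collections import Counter, defaultdict
from datetime import date

# Creation events taken from Fig. 1: document id -> (document type, date created).
CREATED = {
    "S1": ("Sales Order", date(2020, 5, 16)),
    "S2": ("Sales Order", date(2020, 5, 17)),
    "S3": ("Return Order", date(2020, 6, 10)),
    "D1": ("Delivery", date(2020, 5, 18)),
    "D2": ("Delivery", date(2020, 5, 22)),
    "D3": ("Delivery", date(2020, 5, 25)),
    "D4": ("Return Delivery", date(2020, 6, 12)),
    "B1": ("Invoice", date(2020, 5, 20)),
    "B2": ("Invoice", date(2020, 5, 24)),
}

# (referencing document, referenced document), read off the reference columns.
REFERENCES = [
    ("D1", "S1"), ("D2", "S1"), ("D3", "S2"), ("D4", "S3"),  # DD -> SD
    ("D1", "B1"), ("D2", "B2"), ("D3", "B2"),                # DD -> BD
    ("S3", "S1"),                                            # SD -> SD
]

# Each document type keeps its own events: every document is counted once.
events_per_artifact = Counter(doc_type for doc_type, _ in CREATED.values())

# An interaction is observed once per pair of related documents, directed from
# the document created first to the document created second.
interactions = defaultdict(list)
for doc, referenced in REFERENCES:
    first, second = sorted((doc, referenced), key=lambda d: CREATED[d][1])
    interactions[(CREATED[first][0], CREATED[second][0])].append((first, second))

# Assumed heuristic: a direction is deviating if the opposite direction between
# the same two document types is observed more often.
deviating = {
    pair: instances
    for pair, instances in interactions.items()
    if len(interactions.get(pair[::-1], [])) > len(instances)
}

print(dict(events_per_artifact))
print({pair: len(instances) for pair, instances in interactions.items()})
print(deviating)
```

Because no event is copied between cases, the two invoices are counted twice rather than three times, and the only flagged interaction is the single invoice (B2) created before a delivery that references it (D3).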
[Figure 3: From the data source and its database schema, step 1.1 discovers artifact types, step 1.2 extracts one event log per artifact type (cases A1 to An, B1 to Bm, C1 to Ck), and step 1.3 discovers life-cycle models. In parallel, step 2.1 discovers type-level interactions, step 2.2 adds case references, and step 2.3 discovers activity-level interactions.]
Fig. 3: An overview on our approach.
The problem of discovering an artifact-centric process model from relational ERP data decomposes into two sub-problems. (1) Given a relational data source, identify a set of artifacts, extract for each artifact an event log, and discover a model of its life-cycle. (2) Given a set of artifacts and their data source, identify interactions between the artifacts, between their instances, between their event types, and between their events. Figure 3 shows the overview of our approach. (1.1) We use the data schema of the data source to discover artifact types, which detail all timestamped columns related to a particular business object. (1.2) For each artifact we then extract a classical event log [Aa11], in which each case describes all events related to one instance of the artifact. (1.3) Existing process discovery algorithms allow discovering a life-cycle model of the artifact. In parallel, (2.1) we discover interactions between artifacts from foreign key relations in the data source; (2.2) during log extraction, each case of an artifact is annotated with references to cases of other artifacts this case interacts with. (2.3) The case references are refined into interactions between activities of different artifact life-cycles.
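Steps 1.1, 1.2, 2.1, and 2.2 can be illustrated on an in-memory SQLite copy of part of Fig. 1. This is a rough sketch under simplifying assumptions and not the implementation used in the study: the table and column names are invented for the example, one artifact type is derived per table (the distinction by document type is ignored), and every `DATE` column is treated as an event type. Life-cycle discovery (1.3) and the refinement to activity-level interactions (2.3) are left out.

```python
import sqlite3

# Hypothetical schema: a reduced version of the SD, DD, and BD tables of Fig. 1.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE BD (id TEXT PRIMARY KEY, date_created DATE, clearing_date DATE);
CREATE TABLE SD (id TEXT PRIMARY KEY, date_created DATE, last_change DATE,
                 ref_sd TEXT REFERENCES SD(id));
CREATE TABLE DD (id TEXT PRIMARY KEY, date_created DATE, picking_date DATE,
                 ref_sd TEXT REFERENCES SD(id), ref_bd TEXT REFERENCES BD(id));
INSERT INTO BD VALUES ('B1', '2020-05-20', '2020-05-31'),
                      ('B2', '2020-05-24', '2020-06-05');
INSERT INTO SD VALUES ('S1', '2020-05-16', '2020-06-10', NULL),
                      ('S2', '2020-05-17', '2020-05-31', NULL),
                      ('S3', '2020-06-10', NULL, 'S1');
INSERT INTO DD VALUES ('D1', '2020-05-18', '2020-05-31', 'S1', 'B1'),
                      ('D2', '2020-05-22', '2020-06-05', 'S1', 'B2'),
                      ('D3', '2020-05-25', '2020-06-05', 'S2', 'B2'),
                      ('D4', '2020-06-12', NULL, 'S3', NULL);
""")

def artifact_types(conn):
    """Step 1.1: one artifact type per table, described by its timestamp columns."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    return {
        table: [col[1] for col in conn.execute(f"PRAGMA table_info({table})")
                if col[2].upper() == "DATE"]
        for table in tables
    }

def extract_log(conn, table, ts_columns):
    """Step 1.2: one case per row, one event per non-null timestamp value."""
    log = {}
    for row in conn.execute(f"SELECT id, {', '.join(ts_columns)} FROM {table}"):
        events = [(ts, col) for col, ts in zip(ts_columns, row[1:]) if ts is not None]
        log[row[0]] = sorted(events)  # ISO date strings sort chronologically
    return log

def type_interactions(conn, tables):
    """Step 2.1: each foreign key yields (child table, referencing column, parent table)."""
    return [(table, fk[3], fk[2])
            for table in tables
            for fk in conn.execute(f"PRAGMA foreign_key_list({table})")]

def case_references(conn, interactions):
    """Step 2.2: annotate each case with the cases of other artifacts it references."""
    refs = {}
    for child, column, _parent in interactions:
        query = f"SELECT id, {column} FROM {child} WHERE {column} IS NOT NULL"
        for case_id, referenced in conn.execute(query):
            refs.setdefault(case_id, []).append(referenced)
    return refs

types = artifact_types(conn)
logs = {table: extract_log(conn, table, cols) for table, cols in types.items()}
interactions = type_interactions(conn, list(types))
references = case_references(conn, interactions)
print(types, logs["DD"]["D3"], sorted(interactions), references, sep="\n")
```

Each extracted log is a classical event log with a single case identifier, so an existing discovery algorithm could be applied to it per artifact; the case references carry the information needed to relate the resulting life-cycle models to each other.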
3 Results
We implemented our approach based on [NvDF12] and conducted two case studies. By separating data into artifacts along one-to-many relations, we eliminated divergence and convergence; the interaction flows discovered from one-to-many relations were meaningful to business users, and unusual flows were detected.
[Figure 4: two process-model graphs; the edge labels are not reproduced here. Event types, each with the number printed next to it in the figure: Created 2581, Delivery H_Created 5118, Payment05or15_Payment Received 822, PostInAR_PostedInAR 3479, Invoice H_Created 2629, InvoiceCancellation H_Created 40, Contract H_Created 741, ProFormaInvoice H_Created 730, ReturnDelivery H_Created 1, CreditMemo H_Created 34, ReturnOrder H_Created 1, CreditMemoRequest H_Created 33, DebitMemoRequest H_Created 2, DebitMemo H_Created 14.]
Fig. 4: SAP OTC process, classical process model obtained from single event log (top) and artifact-centric model highlighting outliers (bottom)
Fig. 4 shows models obtained from two months of data of an SAP Order-to-Cash process (11 document header tables, 134,826 records of 5-49 attributes); the model at the top was obtained with a classical approach (only 29 of 77 edges are correct); the model at the bottom was obtained using our approach. In both case studies the discovered process models were assessed as accurate graphical representations of the source data by domain experts; all edges, including outlier edges, were assessed as correct and traced back to the source data together with domain experts. These insights could be obtained exploratively and much faster than with existing best practices.
References
[Aa11] Aalst, W.M.P. van der: Process Mining: Discovery, Conformance and Enhancement of Business Processes. Springer, 2011.
[AMZ00] Al-Mashari, Majed; Zairi, Mohamed: Supply-chain re-engineering using enterprise resource planning (ERP) systems: an analysis of a SAP R/3 implementation case. IJPDLM, 30(3/4):296–313, 2000.
[CH09] Cohn, D.; Hull, R.: Business artifacts: A data-centric approach to modeling business operations and processes. Bulletin of the IEEE Computer Society TCDE, 32(3):3–9, 2009.
[Ec15] van Eck, Maikel L.; Lu, Xixi; Leemans, Sander J. J.; van der Aalst, Wil M. P.: PM^2: A Process Mining Project Methodology. In: CAiSE 2015. volume 9097 of LNCS. Springer, pp. 297–313, 2015.
[Lu15] Lu, Xixi; Nagelkerke, Marijn; van de Wiel, Dennis; Fahland, Dirk: Discovering Interacting Artifacts from ERP Systems. IEEE Trans. Services Computing, 8(6):861–873, 2015.
[NvDF12] Nooijen, Erik H. J.; van Dongen, Boudewijn F.; Fahland, Dirk: Automatic Discovery of Data-Centric and Artifact-Centric Processes. In: DAB’12. volume 132 of LNBIP. Springer, pp. 316–327, 2012.
[Pi11] Piessens, D.A.M.: Event Log Extraction from SAP ECC 6.0. Master’s thesis, Eindhoven University of Technology, 2011.