iniset@caise 2011
DESCRIPTION
Slides of my presentation at INISET workshop at CAiSE conference, 21 June 2011, London, UK
TRANSCRIPT
Faculty of Economics and Business Administration Department of Management Information and Operations Management
Jan Claes for INISET@CAiSE 2011, 21 June 2011
FACULTY OF ECONOMICS AND BUSINESS ADMINISTRATION
Integrating Computer Log Files for Process Mining
A Genetic Algorithm Inspired Technique
Jan Claes
[email protected]
http://processmining.ugent.be
Ghent University, Belgium
1. Process Mining
A plane crashed... What happened?
Analyse the ‘black box’
A process failed... What happened?
Analyse the ‘black box’: look for historical data
Process Mining:
Reconstruct and analyse processes
From historical process data
• Log files
• Audit trails
• Database history fields/tables
Process Mining
Processes are supported by IT systems
IT systems record actual process data
Process data can be used to automatically
Discover process model
Check conformance with existing process info
Extend existing process model
Attention
Only As-Is
Only (correctly) recorded information
Process Mining steps
Preparation
• Collect data: find traces
• Merge data: from different sources
• Structure data: group per instance
• Convert data: to tool-specific format
Process mining
Make decisions, take action
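The "structure data" step can be illustrated with a minimal sketch. It assumes each event is a hypothetical (case identifier, activity, timestamp) record; the field layout and the sample values are invented for illustration and do not come from the slides.

```python
from collections import defaultdict

# Hypothetical event records: (case_id, activity, timestamp).
events = [
    ("SO2", "Sales order", 4),
    ("SO1", "Invoice", 3),
    ("SO1", "Sales order", 1),
    ("SO2", "Invoice", 6),
]

def group_per_instance(events):
    """Group events by case identifier and order each trace by timestamp."""
    traces = defaultdict(list)
    for case_id, activity, timestamp in events:
        traces[case_id].append((timestamp, activity))
    # Sort each trace chronologically and keep only the activity names.
    return {case: [a for _, a in sorted(evts)] for case, evts in traces.items()}

traces = group_per_instance(events)
```

Each process instance ends up with its own chronologically ordered trace, which is the shape most process mining tools expect before conversion to a tool-specific format.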
2. Merging log files
Example
Product ordering: registered events:
Sales order: document creation (administration)
Delivery: truck load confirmation (warehouse)
Invoice: document creation (administration)
Logging
from administration software
from warehouse software
How to merge both log files?
Example 1
Merge based on matching trace identifiers
Trace   Administration   Warehouse   Merged
SO1     SO > Inv         Deliver     SO > Deliver > Inv
SO2     SO > Inv         Deliver     SO > Deliver > Inv
SO3     SO > Inv         Deliver     SO > Deliver > Inv
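A minimal sketch of merging on matching trace identifiers. The dictionary layout and the integer timestamps are assumptions added for illustration; the slide only shows the activity order.

```python
# Hypothetical logs: trace identifier -> list of (timestamp, activity).
admin_log = {
    "SO1": [(1, "SO"), (3, "Inv")],
    "SO2": [(4, "SO"), (6, "Inv")],
    "SO3": [(7, "SO"), (9, "Inv")],
}
warehouse_log = {
    "SO1": [(2, "Deliver")],
    "SO2": [(5, "Deliver")],
    "SO3": [(8, "Deliver")],
}

def merge_by_trace_id(log_a, log_b):
    """Combine traces that share an identifier and order events by time."""
    merged = {}
    for trace_id in log_a.keys() | log_b.keys():
        events = log_a.get(trace_id, []) + log_b.get(trace_id, [])
        merged[trace_id] = [activity for _, activity in sorted(events)]
    return merged

merged = merge_by_trace_id(admin_log, warehouse_log)
```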
Example 2
Merge based on matching attribute values
Administration    Warehouse              Merged
SO1: SO > Inv     Del1: Deliver (SO1)    SO1: SO > Deliver > Inv
SO2: SO > Inv     Del2: Deliver (SO2)    SO2: SO > Deliver > Inv
SO3: SO > Inv     Del3: Deliver (SO3)    SO3: SO > Deliver > Inv
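A minimal sketch of merging on matching attribute values, where the warehouse traces carry their own identifiers (Del1, Del2, Del3) but each delivery records the sales order it belongs to. The attribute name `order`, the data layout and the timestamps are assumptions for illustration.

```python
# Hypothetical administration log: trace identifier -> (timestamp, activity).
admin_log = {
    "SO1": [(1, "SO"), (3, "Inv")],
    "SO2": [(4, "SO"), (6, "Inv")],
    "SO3": [(7, "SO"), (9, "Inv")],
}
# Hypothetical warehouse log: each trace stores the related order as an attribute.
warehouse_log = {
    "Del1": {"order": "SO1", "events": [(2, "Deliver")]},
    "Del2": {"order": "SO2", "events": [(5, "Deliver")]},
    "Del3": {"order": "SO3", "events": [(8, "Deliver")]},
}

def merge_by_attribute(admin, warehouse, attribute="order"):
    """Attach each warehouse trace to the admin trace named by its attribute."""
    merged = {trace_id: list(events) for trace_id, events in admin.items()}
    for trace in warehouse.values():
        target = trace.get(attribute)
        if target in merged:
            merged[target].extend(trace["events"])
    return {tid: [a for _, a in sorted(evts)] for tid, evts in merged.items()}

merged = merge_by_attribute(admin_log, warehouse_log)
```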
Example 3
Merge based on time information
t1 < t2 < t3 << t4 < t5 < t6 << t7 < t8 < t9
Administration             Warehouse             Merged
SO1: SO (t1) > Inv (t3)    Arr1: Deliver (t2)    SO1: SO > Deliver > Inv
SO2: SO (t4) > Inv (t6)    Arr2: Deliver (t5)    SO2: SO > Deliver > Inv
SO3: SO (t7) > Inv (t9)    Arr3: Deliver (t8)    SO3: SO > Deliver > Inv
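A minimal sketch of linking on time information only: a warehouse trace is linked to the first administration trace whose time span contains it. The integers 1 to 9 stand in for t1 to t9, and the containment rule is an assumption for illustration, not necessarily the rule used in the presented technique.

```python
# Hypothetical logs with integer timestamps standing in for t1..t9.
# Events inside each trace are assumed to be in chronological order.
admin_log = {
    "SO1": [(1, "SO"), (3, "Inv")],
    "SO2": [(4, "SO"), (6, "Inv")],
    "SO3": [(7, "SO"), (9, "Inv")],
}
warehouse_log = {
    "Arr1": [(2, "Deliver")],
    "Arr2": [(5, "Deliver")],
    "Arr3": [(8, "Deliver")],
}

def link_by_time(admin, warehouse):
    """Link each warehouse trace to an admin trace whose time span contains it."""
    links = {}
    for w_id, w_events in warehouse.items():
        w_start, w_end = w_events[0][0], w_events[-1][0]
        for a_id, a_events in admin.items():
            a_start, a_end = a_events[0][0], a_events[-1][0]
            if a_start <= w_start and w_end <= a_end:
                links[w_id] = a_id
                break
    return links

links = link_by_time(admin_log, warehouse_log)
```

With overlapping process instances several administration traces could contain the same delivery, which is one reason a single indicator is not enough in general.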
Merging computer log files
Merge based on
Example 1: matching trace identifiers → indicator 1
Example 2: matching attribute values → indicator 2
Example 3: time information → indicator 3
General solution → algorithm combining different indicators
Genetic algorithm → indicators build up fitness function
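One way the three indicators could build up a fitness function is a weighted sum per link, sketched below. The indicator definitions, the equal weights and the trace fields (`id`, `attrs`, `start`, `end`) are all assumptions for illustration; the slides do not specify how the indicators are combined.

```python
def same_trace_id(a, b):
    """Indicator 1: the two traces share the same identifier."""
    return 1.0 if a["id"] == b["id"] else 0.0

def shared_attribute_values(a, b):
    """Indicator 2: number of attribute values the two traces have in common."""
    return float(len(set(a["attrs"]) & set(b["attrs"])))

def time_overlap(a, b):
    """Indicator 3: trace b starts within the time span of trace a."""
    return 1.0 if a["start"] <= b["start"] <= a["end"] else 0.0

# Assumed equal weights; in practice these would need tuning.
INDICATORS = [
    (same_trace_id, 1.0),
    (shared_attribute_values, 1.0),
    (time_overlap, 1.0),
]

def link_score(a, b):
    """Weighted sum of all indicators for one candidate link."""
    return sum(weight * indicator(a, b) for indicator, weight in INDICATORS)

def fitness(links):
    """Fitness of a whole solution: total score over all its links."""
    return sum(link_score(a, b) for a, b in links)

order = {"id": "SO1", "attrs": ["SO1"], "start": 1, "end": 3}
delivery = {"id": "Del1", "attrs": ["SO1"], "start": 2, "end": 2}
score = link_score(order, delivery)  # identifiers differ, attribute and time match
```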
3. Genetic algorithm
Genetic algorithm
1st generation, 2nd generation, 3rd generation
cross-over
mutation
survival of the fittest
Genetic algorithm
1st generation, 2nd generation, 3rd generation
mutation
cross-over
survival of the fittest
Fitness function score (values shown in the figure): 14, 27, 6, 18, 29, 5, 18, 28, 32
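The generic cycle of cross-over, mutation and survival of the fittest can be sketched on a toy problem. The problem (maximise the number of ones in a bit string), the population size, the mutation rate and all function names are assumptions for illustration; this is a textbook-style genetic algorithm, not the merge technique itself.

```python
import random

def fitness(individual):
    """Toy fitness function: count the ones in the bit string."""
    return sum(individual)

def crossover(parent_a, parent_b, rng):
    """Single-point cross-over: head of one parent, tail of the other."""
    point = rng.randrange(1, len(parent_a))
    return parent_a[:point] + parent_b[point:]

def mutate(individual, rng, rate=0.05):
    """Flip each bit with a small probability."""
    return [1 - gene if rng.random() < rate else gene for gene in individual]

def evolve(pop_size=20, length=16, generations=30, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    best_per_generation = []
    for _ in range(generations):
        # Survival of the fittest: keep the better half unchanged.
        population.sort(key=fitness, reverse=True)
        best_per_generation.append(fitness(population[0]))
        survivors = population[: pop_size // 2]
        # Refill the population with mutated offspring of the survivors.
        children = [
            mutate(crossover(rng.choice(survivors), rng.choice(survivors), rng), rng)
            for _ in range(pop_size - len(survivors))
        ]
        population = survivors + children
    return best_per_generation

history = evolve()
```

Because the survivors are carried over unchanged, the best fitness score never decreases from one generation to the next.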
Genetic algorithm inspired technique
Find links between traces of both log files and merge them chronologically into a new log file
Steps
Make initial solution (best individual links)
Make pseudo-random changes (try to improve score for one specific factor)
Evaluate (keep original or changed solution)
Stop condition (fixed number of steps)
Only one solution, no cross-over
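The steps above can be sketched as a single-solution search loop. The score table, the penalty for linking several traces to the same sales order and the purely random choice of changes are assumptions for illustration; the presented technique targets one specific factor per change, which this sketch does not reproduce.

```python
import random

# Hypothetical individual link scores between admin and warehouse traces.
SCORES = {
    ("SO1", "Del1"): 3.0, ("SO2", "Del1"): 1.0,
    ("SO1", "Del2"): 2.5, ("SO2", "Del2"): 2.0,
    ("SO3", "Del3"): 3.0,
}

def score(a, b):
    return SCORES.get((a, b), 0.0)

def initial_solution(traces_a, traces_b):
    """Step 1: link each trace in B to its best individual match in A."""
    return {b: max(traces_a, key=lambda a: score(a, b)) for b in traces_b}

def fitness(solution):
    """Total link score, minus an assumed penalty per reused trace in A."""
    total = sum(score(a, b) for b, a in solution.items())
    reused = len(solution) - len(set(solution.values()))
    return total - 2.0 * reused

def improve(solution, traces_a, steps=200, seed=0):
    rng = random.Random(seed)
    current = dict(solution)
    for _ in range(steps):                      # Step 4: fixed number of steps
        candidate = dict(current)
        b = rng.choice(sorted(candidate))       # Step 2: pseudo-random change
        candidate[b] = rng.choice(traces_a)
        if fitness(candidate) > fitness(current):
            current = candidate                 # Step 3: keep the better one
    return current

traces_a = ["SO1", "SO2", "SO3"]
traces_b = ["Del1", "Del2", "Del3"]
start = initial_solution(traces_a, traces_b)
result = improve(start, traces_a)
```

Only one solution is kept at any time and there is no cross-over, so the loop behaves like a local search that borrows mutation and selection from genetic algorithms.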
4. Experiment results
Experiment: proof of concept
Simulated data
Given model
Generate
• random set of logs
• single log (=solution)
Use merge algorithm to merge set of logs
Check resulting log with solution log
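A minimal sketch of such a simulation set-up. The three-activity model, the random time gaps and the naming of the warehouse traces are assumptions for illustration and are not the simulation used in the experiment.

```python
import random

def simulate(n_cases=5, seed=0):
    """Generate a solution log and the two partial logs derived from it."""
    rng = random.Random(seed)
    solution, admin, warehouse = {}, {}, {}
    clock = 0
    for i in range(1, n_cases + 1):
        case = f"SO{i}"
        trace = []
        # Assumed given model: SO, then Deliver, then Inv.
        for activity in ("SO", "Deliver", "Inv"):
            clock += rng.randint(1, 5)          # strictly increasing timestamps
            trace.append((clock, activity))
        solution[case] = trace
        # Split the single log into one log per (hypothetical) system.
        admin[case] = [e for e in trace if e[1] != "Deliver"]
        warehouse[f"Del{i}"] = [e for e in trace if e[1] == "Deliver"]
    return solution, admin, warehouse

solution, admin, warehouse = simulate()

# Merging along the known correct links must reproduce the solution log.
rebuilt = {
    f"SO{i}": sorted(admin[f"SO{i}"] + warehouse[f"Del{i}"])
    for i in range(1, 6)
}
```

Because the solution log is generated first, the output of any merge algorithm can be compared against it directly.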
Experiment: proof of concept
Advantages of using simulated data
Solution is known
Controllable parameters (e.g. noise, overlap, matching id)
Disadvantages of using simulated data
Limited internal validity (are results realistic?)
No external validity (results not generalisable)
Experiment results
Incorrect links relative to total links identified
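The reported measure can be computed as a simple ratio, sketched below. The dictionary representation of links and the sample values are assumptions for illustration and are not experiment data.

```python
def incorrect_link_ratio(found, expected):
    """Share of identified links that differ from the known solution."""
    if not found:
        return 0.0
    incorrect = sum(1 for b, a in found.items() if expected.get(b) != a)
    return incorrect / len(found)

# Hypothetical links: warehouse trace -> administration trace.
expected = {"Del1": "SO1", "Del2": "SO2", "Del3": "SO3"}
found = {"Del1": "SO1", "Del2": "SO3", "Del3": "SO3"}
ratio = incorrect_link_ratio(found, expected)  # one of three links is wrong
```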
5. Discussion
Future work
Optimise genetic algorithm
Fewer incorrect links
Faster implementation (AIS algorithm)
Fitness function factors
Validation with real test cases
Ghent University DPO (Human Resources)
Century21 (Real Estate) & FlexPack (Packaging)
BNP Paribas Fortis (Finance)
...
Contact information
http://processmining.ugent.be
Twitter: @janclaesbelgium