Post on 20-Dec-2015
Fast Business Process Similarity Search with Feature-based Estimation
Zhiqiang Yan*, Remco Dijkman, Paul Grefen
Contents
• Business Process Similarity Search
• Process Graph Similarity Estimation
• Feature Matching and Process Graph Similarity
• Evaluation
• Conclusion
Business Process Similarity Search
Given a process model repository and a query process, similarity search returns all processes in the repository that are similar to the query process.
State of the art
• Dijkman et al. (BPM09) present algorithms that rank all the business process models in a repository based on their similarity to a given query process model.
• However, these algorithms compare the query model with every model in the repository.
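Process Graph Similarity Estimation
• Model sets: relevant, potentially relevant, irrelevant.
• Only rank the models in the “potentially relevant” set with the similarity algorithms, e.g., BPM09.
[Figure: the repository divided into Relevant, Potentially relevant and Irrelevant models; only the potentially relevant models go to the Rank step.]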
Features
Features are:
• small fragments
• characteristic for a model
This makes them suitable for quick rough measurements.
Features are:
• labels
• structures (start, stop, split, join and regular (sequence))
  • role of a node
  • combination of nodes
Features
• Node feature
  • Label: {Buy Goods, Receive Goods, Verify Invoice}
  • Role: {(start, split), (regular), (join, stop)}
• Seq(2) feature: {(Buy Goods, Verify Invoice), (Buy Goods, Receive Goods), (Receive Goods, Verify Invoice)}
• Split(3) feature: {(Buy Goods, (Verify Invoice, Receive Goods))}
• Merge(3) feature: {((Buy Goods, Receive Goods), Verify Invoice)}
The number in parentheses, e.g., Seq(2), is the number of nodes in the feature.
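The feature sets listed for the Buy Goods example can be reproduced with a short sketch. It assumes the example graph has the three edges implied by the Seq(2) set, and that roles follow from node degree (no inputs: start, no outputs: stop, more than one output: split, more than one input: join, otherwise regular). The function names and the dictionary layout are illustrative and do not come from the slides.

```python
from itertools import combinations


def node_roles(edges):
    """Derive role sets plus predecessor/successor maps from directed edges."""
    nodes = {n for edge in edges for n in edge}
    preds = {n: set() for n in nodes}
    succs = {n: set() for n in nodes}
    for src, dst in edges:
        succs[src].add(dst)
        preds[dst].add(src)
    roles = {}
    for n in nodes:
        r = set()
        if not preds[n]:
            r.add("start")
        if not succs[n]:
            r.add("stop")
        if len(succs[n]) > 1:
            r.add("split")
        if len(preds[n]) > 1:
            r.add("join")
        if not r:  # exactly one input and one output
            r.add("regular")
        roles[n] = r
    return roles, preds, succs


def extract_features(edges):
    """Collect node roles and Seq(2), Split(3), Merge(3) features."""
    roles, preds, succs = node_roles(edges)
    seq2 = set(edges)
    # A Split(3) feature is a node together with two of its successors.
    split3 = {(n, frozenset(pair))
              for n in succs
              for pair in combinations(sorted(succs[n]), 2)}
    # A Merge(3) feature is two predecessors together with their common node.
    merge3 = {(frozenset(pair), n)
              for n in preds
              for pair in combinations(sorted(preds[n]), 2)}
    return {"roles": roles, "seq2": seq2, "split3": split3, "merge3": merge3}


# Edges assumed from the Seq(2) feature set on the slide.
edges = [("Buy Goods", "Receive Goods"),
         ("Buy Goods", "Verify Invoice"),
         ("Receive Goods", "Verify Invoice")]
features = extract_features(edges)
```

Unordered pairs are stored as frozensets because the slide does not say whether the order of the two branches matters.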
Label Feature Similarity
• Label feature
$$\mathrm{lSim}(l_1, l_2) = 1.0 - \frac{\mathrm{Ed}(l_1, l_2)}{\max(|l_1|, |l_2|)}$$
where Ed(l1,l2) is the string edit distance between label1 and label2, and max(|l1|,|l2|) is the maximum length of label1 and label2.
• Example: lSim(l1,l2) = 1.0 - 7/13 = 0.46
• lSim(l1,l2) >= lcutoff: the labels are similar
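A minimal sketch of the label similarity, assuming that the string edit distance is the Levenshtein distance (insertions, deletions and substitutions each cost 1) and that labels are compared as given, without lowercasing or stemming. The slides do not specify either point, and the function names are illustrative.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def label_similarity(l1: str, l2: str) -> float:
    """lSim(l1, l2) = 1.0 - Ed(l1, l2) / max(|l1|, |l2|)."""
    longest = max(len(l1), len(l2))
    if longest == 0:
        return 1.0  # two empty labels are treated as identical (assumption)
    return 1.0 - edit_distance(l1, l2) / longest
```

For instance, "kitten" and "sitting" have edit distance 3 and maximum length 7, so `label_similarity("kitten", "sitting")` evaluates to 1 - 3/7.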
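Role Feature Similarity
• Role feature
$$
\mathrm{rSim}(n_1, n_2) =
\begin{cases}
1 & \text{if } start \in croles \wedge stop \in croles \\
\mathrm{avg}\left(1 - \frac{\mathrm{abs}(|{\bullet}n_1| - |{\bullet}n_2|)}{|{\bullet}n_1| + |{\bullet}n_2|},\ 1\right) & \text{if } start \notin croles \wedge stop \in croles \\
\mathrm{avg}\left(1,\ 1 - \frac{\mathrm{abs}(|n_1{\bullet}| - |n_2{\bullet}|)}{|n_1{\bullet}| + |n_2{\bullet}|}\right) & \text{if } start \in croles \wedge stop \notin croles \\
\mathrm{avg}\left(1 - \frac{\mathrm{abs}(|{\bullet}n_1| - |{\bullet}n_2|)}{|{\bullet}n_1| + |{\bullet}n_2|},\ 1 - \frac{\mathrm{abs}(|n_1{\bullet}| - |n_2{\bullet}|)}{|n_1{\bullet}| + |n_2{\bullet}|}\right) & \text{if } start \notin croles \wedge stop \notin croles
\end{cases}
$$
where croles = role(n1) ∩ role(n2), •n is the set of input nodes of n, and n• is the set of output nodes of n.
• Example: similarity of input role: 1 - 0/(1+1) = 1; similarity of output role: 1 - 2/2 = 0
• rSim(n1,n2) = (1+0)/2 = 0.5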
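The four cases of rSim collapse into one rule: the input side counts as 1 when both nodes share the start role, the output side counts as 1 when both share the stop role, and the result is the average of the two sides. The sketch below encodes that reading. The function names and the zero-degree guard are assumptions, and the degrees in the usage line are chosen only because they reproduce the numbers of the slide example.

```python
def degree_similarity(a: int, b: int) -> float:
    """1 - abs(a - b) / (a + b), with 1.0 when both degrees are zero."""
    if a + b == 0:
        return 1.0  # guard added for safety; not stated on the slide
    return 1.0 - abs(a - b) / (a + b)


def role_similarity(roles1, in1, out1, roles2, in2, out2):
    """rSim for two nodes given their role sets and in/out degrees."""
    common = set(roles1) & set(roles2)  # croles
    input_sim = 1.0 if "start" in common else degree_similarity(in1, in2)
    output_sim = 1.0 if "stop" in common else degree_similarity(out1, out2)
    return (input_sim + output_sim) / 2


# Assumed degrees: n1 has 1 input and 2 outputs, n2 has 1 input and 0 outputs.
example = role_similarity({"split"}, 1, 2, {"stop"}, 1, 0)
```

With these degrees the input side is 1 - 0/2 = 1 and the output side is 1 - 2/2 = 0, so `example` is 0.5, matching the slide.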
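Discriminative Role Feature
• Discriminative role feature: a role r is discriminative when only a small fraction of the nodes has it
$$\frac{|\{n \mid n \in N,\ r \in role(n)\}|}{|N|} \le dcutoff \;\Rightarrow\; discriminative(r)$$
• Discriminative power
$$
\mathrm{disc}(n_1, n_2) =
\begin{cases}
1 & \text{if } \exists\, r \in role(n_1) \cap role(n_2) : discriminative(r) \\
0 & \text{otherwise}
\end{cases}
$$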
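As a sketch of the discriminative-role test: a role is discriminative when the fraction of nodes carrying it is at most dcutoff, and disc is 1 when two nodes share at least one such role. The slide does not say which node set N covers, so the function simply takes whatever mapping from node to role set it is given. All names here are illustrative.

```python
from collections import Counter


def discriminative_roles(roles_by_node, dcutoff):
    """Return the roles carried by at most a dcutoff fraction of the nodes."""
    total = len(roles_by_node)
    if total == 0:
        return set()
    counts = Counter(r for roles in roles_by_node.values() for r in roles)
    return {r for r, count in counts.items() if count / total <= dcutoff}


def disc(roles1, roles2, discriminative):
    """1 if the two nodes share a discriminative role, 0 otherwise."""
    return 1 if set(roles1) & set(roles2) & set(discriminative) else 0


# Hypothetical node set: three of five nodes are regular.
roles_by_node = {
    "a": {"start", "split"},
    "b": {"regular"},
    "c": {"regular"},
    "d": {"regular"},
    "e": {"join", "stop"},
}
rare = discriminative_roles(roles_by_node, dcutoff=0.25)
```

In this hypothetical set every role except regular occurs in one node out of five, so `rare` contains start, split, join and stop, and two regular nodes get `disc` equal to 0.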
Feature Matching
• Node feature matching rules:
  • lSim(l1,l2) >= lcutoffh: matched
  • lSim(l1,l2) >= lcutoffm and rSim(n1,n2) * disc(n1,n2) >= rcutoff: matched
• Sequence, split and join feature matching rules:
  • based on node feature matching
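The two node matching rules can be written as a single predicate. This sketch takes the already computed lSim, rSim and disc values as inputs. The slides give no cutoff values, so the numbers in the usage lines are placeholders for illustration only.

```python
def nodes_match(l_sim, r_sim, disc_value,
                lcutoff_high, lcutoff_medium, rcutoff):
    """Apply the two node feature matching rules."""
    # Rule 1: the labels alone are similar enough.
    if l_sim >= lcutoff_high:
        return True
    # Rule 2: moderately similar labels backed by a similar,
    # discriminative role.
    return l_sim >= lcutoff_medium and r_sim * disc_value >= rcutoff


# Placeholder cutoffs, not taken from the slides.
by_label = nodes_match(0.85, 0.0, 0, 0.8, 0.2, 0.9)
by_role = nodes_match(0.50, 1.0, 1, 0.8, 0.2, 0.9)
no_match = nodes_match(0.50, 1.0, 0, 0.8, 0.2, 0.9)
```

Because disc is either 0 or 1, the product in the second rule means a role can only support a match when it is discriminative.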
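Feature-based Process Graph Similarity and Pre-Selection
$$\mathrm{GSim}(g_1, g_2) = \frac{m_1 + m_2}{n_1 + n_2}$$
where m1+m2 is the number of features that are matched and n1+n2 is the number of features in g1 and g2.
• GSim above ratior: relevant
• GSim between ratiop and ratior: potentially relevant
• GSim below ratiop: irrelevant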
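A sketch of the pre-selection step: compute GSim from the feature counts, then place the model in one of the three sets using the two thresholds. Whether a value exactly equal to a threshold falls in the higher or the lower set is not stated on the slide. The code assumes the higher set, and the threshold values in the usage lines are placeholders.

```python
def graph_similarity(m1, m2, n1, n2):
    """GSim = (m1 + m2) / (n1 + n2): matched features over all features."""
    total = n1 + n2
    if total == 0:
        return 0.0  # no features to compare (assumption)
    return (m1 + m2) / total


def preselect(gsim, ratio_r, ratio_p):
    """Classify a model by its estimated similarity to the query."""
    if gsim >= ratio_r:
        return "relevant"
    if gsim >= ratio_p:
        return "potentially relevant"
    return "irrelevant"


# Placeholder counts and thresholds, not taken from the slides.
score = graph_similarity(3, 3, 4, 4)
label = preselect(score, ratio_r=0.7, ratio_p=0.3)
```

Only models labelled "potentially relevant" then need the more expensive ranking algorithm, which is where the saving in search time comes from.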
Quality Evaluation
• 100 'document' processes
• 10 'search' processes
• 'documents' relevant to each 'search' determined by human judgement
• retrieve 'documents' based on features
• comparison between automatically retrieved results and human judgement
• compute Precision(R)
Quality Evaluation
Feature(n)   Relevant   Potentially relevant   Irrelevant   Precision(R)
Dijkman09    0          100                    0            0.84
Node(1)      5.5        10.9                   83.6         0.84
Seq(2)       8.1        8                      83.9         0.83
Seq(3)       7.8        10.1                   82.1         0.83
Split(3)     7.8        10.1                   82.1         0.83
Split(4)     7.8        10.1                   82.1         0.83
Merge(3)     7.8        10.1                   82.1         0.83
Merge(4)     7.8        10.1                   82.1         0.83
Time Evaluation
Feature(n)   Relevant   Potentially relevant   Irrelevant   Time
Dijkman09    0          604                    0            0.6s
Node(1)      7          73                     524          0.17s
Seq(2)       13.7       44.9                   554.4        0.11s
Seq(3)       9.5        73.2                   521.3        0.21s
Split(3)     9.5        73.2                   521.3        0.21s
Split(4)     9.5        73.2                   521.3        0.21s
Merge(3)     9.5        73.2                   521.3        0.21s
Merge(4)     9.5        73.2                   521.3        0.21s
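Conclusions
• 7 types of features to pre-select processes
• Node and Path(2) features work well
• Larger features do not help
• Search time is reduced (from 0.6s to between 0.11s and 0.21s in the time evaluation)
• Precision(R) is stable (0.83 to 0.84, against 0.84 for Dijkman09)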