Fast Business Process Similarity Search with Feature-based Estimation

Zhiqiang Yan*, Remco Dijkman, Paul Grefen

Contents

• Business Process Similarity Search
• Process Graph Similarity Estimation
• Feature Matching and Process Graph Similarity
• Evaluation
• Conclusion

Business Process Similarity Search

Given a process model repository and a query process, similarity search returns all processes in the repository that are similar to the query process.

[Figure: example process models, one marked "Similar to" and one marked "not Similar to" the query process]

State of the art

• Dijkman et al. (BPM09) present algorithms that rank all business process models in a repository based on their similarity to a given query process model.

• However, these algorithms compare the query model with every model in the repository.


Process Graph Similarity Estimation

• Model sets: relevant, potentially relevant, irrelevant.
• Only models in the “potentially relevant” set are ranked with algorithms such as those of BPM09.

[Figure: the repository is split into relevant, potentially relevant, and irrelevant models; only the potentially relevant models are ranked]


Features

Features are:
• small fragments
• characteristic for a model

This makes them suitable for quick, rough measurements.

Features are:
• labels
• structures (start, stop, split, join and regular (sequence))
  • role of a node
  • combination of nodes

Features

• Node feature
  • Label: {Buy Goods, Receive Goods, Verify Invoice}
  • Role: {(start, split), (regular), (join, stop)}

• Seq(2) feature: {(Buy Goods, Verify Invoice), (Buy Goods, Receive Goods), (Receive Goods, Verify Invoice)}

• Split(3) feature: {(Buy Goods, (Verify Invoice, Receive Goods))}

• Merge(3) feature: {((Buy Goods, Receive Goods), Verify Invoice)}

The number in parentheses after a feature name is the number of nodes in the feature.
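The feature sets listed above can be reproduced with a small sketch. The edge list below is an assumption inferred from the Seq(2) pairs on the slide, and the function names (`roles`, `seq2`, `split3`, `merge3`) are hypothetical helpers, not the authors' implementation.

```python
from itertools import combinations

# Assumed process graph: edges inferred from the Seq(2) feature set above.
EDGES = {
    "Buy Goods": ["Receive Goods", "Verify Invoice"],
    "Receive Goods": ["Verify Invoice"],
    "Verify Invoice": [],
}


def predecessors(edges):
    """Invert the adjacency list: node -> list of nodes pointing to it."""
    preds = {node: [] for node in edges}
    for src, targets in edges.items():
        for dst in targets:
            preds[dst].append(src)
    return preds


def roles(edges):
    """Assign start/stop/split/join roles; 'regular' if none applies."""
    preds = predecessors(edges)
    result = {}
    for node, succs in edges.items():
        r = set()
        if not preds[node]:
            r.add("start")
        if not succs:
            r.add("stop")
        if len(succs) > 1:
            r.add("split")
        if len(preds[node]) > 1:
            r.add("join")
        result[node] = r or {"regular"}
    return result


def seq2(edges):
    """Seq(2) features: every directly connected pair of nodes."""
    return {(s, t) for s, ts in edges.items() for t in ts}


def split3(edges):
    """Split(3) features: a node together with a pair of its successors."""
    return {(s, pair) for s, ts in edges.items()
            for pair in combinations(sorted(ts), 2)}


def merge3(edges):
    """Merge(3) features: a pair of predecessors together with their target."""
    preds = predecessors(edges)
    return {(pair, t) for t, ps in preds.items()
            for pair in combinations(sorted(ps), 2)}


if __name__ == "__main__":
    print(roles(EDGES))
    print(seq2(EDGES))
    print(split3(EDGES))
    print(merge3(EDGES))
```

Successor and predecessor pairs are sorted so that the same split or merge always yields the same tuple, regardless of edge order.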

Label Feature Similarity

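• Label feature

$$\mathrm{lSim}(l_1, l_2) = 1.0 - \frac{\mathrm{Ed}(l_1, l_2)}{\max(|l_1|, |l_2|)}$$

where Ed(l1, l2) is the string edit distance between label1 and label2, and max(|l1|, |l2|) is the maximum length of label1 and label2.

• Example: lSim(l1, l2) = 1.0 - 7/13 = 0.46

• If lSim(l1, l2) >= lcutoff, the labels are similar.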
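A minimal sketch of the label similarity, assuming the string edit distance is the standard Levenshtein distance. The labels "Buy Goods" and "Receive Goods" are used here because they happen to reproduce the 7/13 arithmetic of the example; that these were the labels behind the slide's numbers is an assumption.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def l_sim(l1: str, l2: str) -> float:
    """lSim = 1.0 - Ed(l1, l2) / max(|l1|, |l2|)."""
    longest = max(len(l1), len(l2))
    if longest == 0:
        # Assumption: two empty labels are treated as identical.
        return 1.0
    return 1.0 - edit_distance(l1, l2) / longest


if __name__ == "__main__":
    print(edit_distance("Buy Goods", "Receive Goods"))
    print(round(l_sim("Buy Goods", "Receive Goods"), 2))
```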

Role Feature Similarity

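• Role feature

$$
\mathrm{rSim}(n_1, n_2) =
\begin{cases}
1 & \text{if } start \in croles \wedge stop \in croles \\
\mathrm{avg}\left(1 - \frac{\mathrm{abs}(|{\bullet}n_1| - |{\bullet}n_2|)}{|{\bullet}n_1| + |{\bullet}n_2|},\ 1\right) & \text{if } start \notin croles \wedge stop \in croles \\
\mathrm{avg}\left(1,\ 1 - \frac{\mathrm{abs}(|n_1{\bullet}| - |n_2{\bullet}|)}{|n_1{\bullet}| + |n_2{\bullet}|}\right) & \text{if } start \in croles \wedge stop \notin croles \\
\mathrm{avg}\left(1 - \frac{\mathrm{abs}(|{\bullet}n_1| - |{\bullet}n_2|)}{|{\bullet}n_1| + |{\bullet}n_2|},\ 1 - \frac{\mathrm{abs}(|n_1{\bullet}| - |n_2{\bullet}|)}{|n_1{\bullet}| + |n_2{\bullet}|}\right) & \text{if } start \notin croles \wedge stop \notin croles
\end{cases}
$$

where croles = role(n1) ∩ role(n2), |•n| is the number of inputs of node n, and |n•| is the number of its outputs.

• Example: similarity of input role: 1 - 0/(1+1) = 1; similarity of output role: 1 - 2/2 = 0

• rSim(n1, n2) = (1 + 0)/2 = 0.5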
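The four cases of rSim reduce to averaging an input similarity and an output similarity, each fixed to 1 when the corresponding role (start or stop) is shared. The sketch below assumes roles are given as Python sets and that input and output counts are passed in directly; the function and parameter names are hypothetical.

```python
def _count_sim(a: int, b: int) -> float:
    """1 - abs(a - b) / (a + b) for two input (or output) counts."""
    if a + b == 0:
        # Defensive assumption: two nodes without any inputs (or outputs)
        # are treated as fully similar on that side.
        return 1.0
    return 1.0 - abs(a - b) / (a + b)


def r_sim(roles1: set, in1: int, out1: int,
          roles2: set, in2: int, out2: int) -> float:
    """Role similarity of two nodes from their roles and in/out counts."""
    croles = roles1 & roles2
    # Shared start role: the input side counts as fully similar.
    in_sim = 1.0 if "start" in croles else _count_sim(in1, in2)
    # Shared stop role: the output side counts as fully similar.
    out_sim = 1.0 if "stop" in croles else _count_sim(out1, out2)
    return (in_sim + out_sim) / 2


if __name__ == "__main__":
    # Counts from the example: one input each, two outputs versus none.
    print(r_sim({"split"}, 1, 2, {"stop"}, 1, 0))
```

When both start and stop are shared, both sides are 1 and the average is 1, which matches the first case of the definition.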

Discriminative Role Feature

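• Discriminative role feature: a role r is discriminative if the fraction of nodes in N that have it is at most dcutoff.

$$\frac{|\{n \mid n \in N,\ r \in role(n)\}|}{|N|} \le dcutoff \;\Rightarrow\; discriminative(r)$$

• Discriminative power

$$
disc(n_1, n_2) =
\begin{cases}
1 & \text{if } \exists\, r \in role(n_1) \cap role(n_2) : discriminative(r) \\
0 & \text{otherwise}
\end{cases}
$$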
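A short sketch of both definitions, assuming each node's roles are stored as a set in a dictionary. The value `dcutoff=0.5` and the five-node example are hypothetical and chosen only for illustration.

```python
def discriminative_roles(node_roles: dict, dcutoff: float) -> set:
    """Roles carried by at most a fraction `dcutoff` of all nodes."""
    total = len(node_roles)
    counts = {}
    for role_set in node_roles.values():
        for role in role_set:
            counts[role] = counts.get(role, 0) + 1
    return {role for role, count in counts.items() if count / total <= dcutoff}


def disc(roles1: set, roles2: set, discriminative: set) -> int:
    """1 if the two nodes share at least one discriminative role, else 0."""
    return 1 if roles1 & roles2 & discriminative else 0


if __name__ == "__main__":
    # Hypothetical node set: one start node, three regular nodes, one stop node.
    node_roles = {
        "a": {"start"},
        "b": {"regular"},
        "c": {"regular"},
        "d": {"regular"},
        "e": {"stop"},
    }
    rare = discriminative_roles(node_roles, dcutoff=0.5)
    print(sorted(rare))
    print(disc({"regular"}, {"regular"}, rare))
    print(disc({"start"}, {"start", "split"}, rare))
```

In this example the regular role covers 3 of 5 nodes, so it exceeds the cutoff and sharing it alone gives no discriminative power.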

Feature Similarity

• Role feature

• If rSim(n1, n2) * disc(n1, n2) >= rcutoff, the role features are similar.

Feature Matching

• Node feature matching rules. Two node features are matched if either:
  • lSim(l1, l2) >= lcutoffh, or
  • lSim(l1, l2) >= lcutoffm and rSim(n1, n2) * disc(n1, n2) >= rcutoff

• Sequence, split and join feature matching rules:
  • based on node feature matching
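The two node matching rules can be written as a single predicate. In the sketch below the similarity scores are passed in as numbers, the threshold values are hypothetical, and the position-by-position rule for sequence features is an assumption, since the slide only states that larger features build on node feature matching.

```python
def nodes_match(l_sim: float, r_sim: float, disc: int,
                lcutoff_high: float, lcutoff_med: float, rcutoff: float) -> bool:
    """Apply the two node feature matching rules."""
    # Rule 1: the labels alone are similar enough.
    if l_sim >= lcutoff_high:
        return True
    # Rule 2: moderately similar labels plus a similar, discriminative role.
    return l_sim >= lcutoff_med and r_sim * disc >= rcutoff


def sequences_match(seq_a, seq_b, node_match) -> bool:
    """Assumed rule: sequences match if their nodes match position by position."""
    return len(seq_a) == len(seq_b) and all(
        node_match(a, b) for a, b in zip(seq_a, seq_b))


if __name__ == "__main__":
    # Hypothetical thresholds, for illustration only.
    cutoffs = dict(lcutoff_high=0.8, lcutoff_med=0.4, rcutoff=1.0)
    print(nodes_match(0.9, 0.0, 0, **cutoffs))  # rule 1 applies
    print(nodes_match(0.5, 1.0, 1, **cutoffs))  # rule 2 applies
    print(nodes_match(0.5, 1.0, 0, **cutoffs))  # role not discriminative
```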

Feature-based Process Graph Similarity and Pre-Selection

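$$\mathrm{GSim}(g_1, g_2) = \frac{m_1 + m_2}{n_1 + n_2}$$

where m1 + m2 is the number of features that are matched and n1 + n2 is the number of features in g1 and g2.

Two thresholds on GSim, ratior and ratiop, divide the models into three sets: GSim = ratior is the boundary between relevant and potentially relevant models, and GSim = ratiop is the boundary between potentially relevant and irrelevant models.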
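The estimation and pre-selection step can be sketched as follows. The threshold values are hypothetical, and treating a score exactly on a boundary as belonging to the higher set is an assumption, since the slide only shows where the boundaries lie.

```python
def g_sim(m1: int, m2: int, n1: int, n2: int) -> float:
    """GSim = (m1 + m2) / (n1 + n2): share of features that are matched."""
    total = n1 + n2
    if total == 0:
        # Assumption: two graphs without features have similarity 0.
        return 0.0
    return (m1 + m2) / total


def preselect(score: float, ratio_r: float, ratio_p: float) -> str:
    """Assign a model to one of the three sets based on its GSim score."""
    if score >= ratio_r:
        return "relevant"
    if score >= ratio_p:
        return "potentially relevant"
    return "irrelevant"


if __name__ == "__main__":
    # Hypothetical counts and thresholds, for illustration only.
    score = g_sim(m1=3, m2=3, n1=5, n2=7)
    print(score)
    print(preselect(score, ratio_r=0.8, ratio_p=0.3))
```

Only models labeled "potentially relevant" would then be passed to a full ranking algorithm, which is where the search time is saved.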


Quality Evaluation

• 100 'document' processes
• 10 'search' processes
• 'documents' relevant to each 'search' determined by human judgement
• retrieve 'documents' based on features
• compare the automatically retrieved results with the human judgement
• compute precision(R)
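As an illustration of the last step, the sketch below assumes that precision(R) denotes R-precision: the precision over the top R retrieved results, where R is the number of documents judged relevant for the query. The document identifiers are hypothetical.

```python
def r_precision(ranked: list, relevant: set) -> float:
    """Precision over the first R ranked results, with R = len(relevant)."""
    r = len(relevant)
    if r == 0:
        # Assumption: a query with no relevant documents scores 0.
        return 0.0
    hits = sum(1 for doc in ranked[:r] if doc in relevant)
    return hits / r


if __name__ == "__main__":
    # Hypothetical ranking and human judgement for one search process.
    ranked = ["d1", "d2", "d3", "d4"]
    relevant = {"d1", "d3", "d9"}
    print(r_precision(ranked, relevant))
```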

Feature(n)   Relevant   Potentially relevant   Irrelevant   Precision(R)
Dijkman09    0          100                    0            0.84
Node(1)      5.5        10.9                   83.6         0.84
Seq(2)       8.1        8                      83.9         0.83
Seq(3)       7.8        10.1                   82.1         0.83
Split(3)     7.8        10.1                   82.1         0.83
Split(4)     7.8        10.1                   82.1         0.83
Merge(3)     7.8        10.1                   82.1         0.83
Merge(4)     7.8        10.1                   82.1         0.83

Time Evaluation

• 604 'document' processes
• 10 'search' processes
• measure the search time

Feature(n)   Relevant   Potentially relevant   Irrelevant   Time
Dijkman09    0          604                    0            0.6s
Node(1)      7          73                     524          0.17s
Seq(2)       13.7       44.9                   554.4        0.11s
Seq(3)       9.5        73.2                   521.3        0.21s
Split(3)     9.5        73.2                   521.3        0.21s
Split(4)     9.5        73.2                   521.3        0.21s
Merge(3)     9.5        73.2                   521.3        0.21s
Merge(4)     9.5        73.2                   521.3        0.21s

Conclusions

• 7 types of features to pre-select processes
• Node and Seq(2) features work well
• Larger features do not help
• Search time is reduced
• Precision(R) is stable
