From Informal Process Diagrams To Formal Process Models

Debdoot Mukherjee‡, Pankaj Dhoolia‡, Saurabh Sinha‡

Aubrey J Rembert†, Mangala Gowri Nanda‡

IBM Research - India, New Delhi, India‡

IBM TJ Watson Research Center, New York, USA†

"We build too many walls and not enough bridges" - Sir Isaac Newton

Free-form diagramming tools (e.g., Visio, PowerPoint) are preferred for creating initial process models: they are easy to use, intuitive, and ubiquitous, and they do not hinder creativity

Process modeling software (e.g., WBM, ARIS) creates models with formal underpinnings, which allow formal analysis and model checking, process reuse, process improvement, and traceability with the realized executable process

A sound, automatic approach to convert process diagrams to formal process models is essential: a bridge between the worlds of diagramming and formal modeling

September 14, 2010, International Conference on Business Process Management, Hoboken, NJ, USA

Outline

Challenges: ambiguities in diagrams, limitations of existing capabilities

Approach: structure inference, semantic interpretation

Empirical Study

Related Work & Future Directions

Challenges in Diagram Interpretation

Humans can interpret different visual cues in drawings to correctly resolve the structure and semantics of the models, but machines cannot do the same.


Structural Ambiguities

Dangling connectors: connectors not glued to shapes at their endpoints

[Figure: three examples, each annotated "Missing Edge"]

Structural Ambiguities

Unlinked labels: text annotations that are not explicitly part of the shape of any node or edge

Semantic Ambiguities

Over-specification: the same semantics is conveyed by multiple shapes

Under-specification: the same shape conveys multiple semantics


Limitations of Existing Tools

Popular BPM tools such as WebSphere Business Modeler, ARIS, Lombardi, and Telelogic System Architect have Visio import capabilities

They create imprecise flow structure when faced with structural ambiguities

They interpret semantics through a simple mapping (fixed or pluggable) from a set of diagram shapes to a target set of process semantics. Such an approach cannot deal with under-specification. Building an exhaustive mapping is painful in the presence of over-specification

Approach

[Figure: approach pipeline with the stages Process Diagram, Shapes & Attributes, Flow Graph, Process Model]


Edge Inference


[Figure: example diagram with nodes A, B, C, D and labels C1 to C8]


Uses the notion of connection points, created at node-line and line-line intersections

Assigns a direction to each connection point

Starting at the connection points attached to nodes, propagates their directions along paths on which the directions are consistent, and identifies the nodes reached

Creates an edge if the connection point at the reached node has a different direction

[Figure: connection points annotated with the directions SRC, TGT, NEU, and UNK]
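
The edge-inference steps above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `ConnectionPoint` class, the `infer_edges` function, and the rule for what counts as a "consistent" path are all assumptions made for this sketch. The labels SRC, TGT, NEU, and UNK come from the slide figure; reading them as source, target, neutral, and unknown is also an assumption.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional

# Direction labels as they appear in the figure (meanings assumed).
SRC, TGT, NEU, UNK = "SRC", "TGT", "NEU", "UNK"


@dataclass(frozen=True)
class ConnectionPoint:
    """Hypothetical record for a node-line or line-line intersection."""
    name: str
    direction: str
    node: Optional[str] = None  # set only when the point lies on a node


def infer_edges(points, links):
    """Return directed (source_node, target_node) pairs.

    points: dict mapping point name -> ConnectionPoint
    links:  pairs of point names joined by a line segment
    """
    neighbors = {name: set() for name in points}
    for a, b in links:
        neighbors[a].add(b)
        neighbors[b].add(a)

    edges = set()
    for start in points.values():
        # Propagation starts only at directed points attached to nodes.
        if start.node is None or start.direction not in (SRC, TGT):
            continue
        seen = {start.name}
        queue = deque([start.name])
        while queue:
            current = queue.popleft()
            for name in neighbors[current]:
                if name in seen:
                    continue
                seen.add(name)
                reached = points[name]
                if reached.node is not None:
                    # Reached a node: create an edge only if directions differ.
                    if (reached.node != start.node
                            and reached.direction != start.direction):
                        if start.direction == SRC:
                            edges.add((start.node, reached.node))
                        else:
                            edges.add((reached.node, start.node))
                    continue  # never propagate through a node
                # Assumed consistency rule: same, neutral, or unknown direction.
                if reached.direction in (start.direction, NEU, UNK):
                    queue.append(name)
    return edges


# A connector leaves node A, meets a line-line intersection (c2),
# and branches to nodes B and C.
points = {
    "c1": ConnectionPoint("c1", SRC, node="A"),
    "c2": ConnectionPoint("c2", NEU),
    "c3": ConnectionPoint("c3", TGT, node="B"),
    "c4": ConnectionPoint("c4", TGT, node="C"),
}
links = [("c1", "c2"), ("c2", "c3"), ("c2", "c4")]
print(sorted(infer_edges(points, links)))  # [('A', 'B'), ('A', 'C')]
```

Because the traversal follows line-line intersections instead of relying on connectors being glued to shapes, the branch at `c2` still yields both edges.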

Semantic Interpretation

Train a classifier to mimic human reasoning to decide process semantics

Features used for classification:

Relational: indegree, outdegree, count of nodes contained within

Geometric: shape name, count of horizontal, vertical, and diagonal lines

Textual: count of cue words for every target entity
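
As a rough illustration of the three feature groups, the sketch below builds a feature record for one diagram shape. The `Shape` class, the `extract_features` function, and the cue-word lists are hypothetical; the slides do not give the actual cue words or data structures.

```python
import re
from dataclasses import dataclass

# Hypothetical cue words per target entity (not taken from the slides).
CUE_WORDS = {
    "Activity": {"create", "review", "send"},
    "Gateway": {"approved", "if", "yes", "no"},
}


@dataclass
class Shape:
    """Hypothetical record for one shape extracted from a diagram."""
    name: str
    shape_name: str
    text: str
    contained_nodes: int = 0
    horizontal_lines: int = 0
    vertical_lines: int = 0
    diagonal_lines: int = 0


def extract_features(shape, edges):
    """Build relational, geometric, and textual features for one shape.

    edges: list of (source_name, target_name) pairs from structure inference
    """
    tokens = re.findall(r"[a-z]+", shape.text.lower())
    features = {
        # Relational features
        "indegree": sum(1 for _, tgt in edges if tgt == shape.name),
        "outdegree": sum(1 for src, _ in edges if src == shape.name),
        "contained_nodes": shape.contained_nodes,
        # Geometric features
        "shape_name": shape.shape_name,
        "horizontal_lines": shape.horizontal_lines,
        "vertical_lines": shape.vertical_lines,
        "diagonal_lines": shape.diagonal_lines,
    }
    # Textual features: one cue-word count per target entity
    for entity, cues in CUE_WORDS.items():
        features[f"cue_{entity}"] = sum(1 for t in tokens if t in cues)
    return features


decision = Shape("n2", "Diamond", "Approved?", diagonal_lines=4)
flow = [("n1", "n2"), ("n2", "n3"), ("n2", "n4")]
feats = extract_features(decision, flow)
print(feats["indegree"], feats["outdegree"], feats["cue_Gateway"])  # 1 2 1
```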

Supervised Method


[Figure: supervised pipeline with the elements Flow Diagrams, Structure Inference, {Nodes, Edges} annotated by features, {Nodes, Edges} annotated by features + process semantic, Classifier]

An expert labels all nodes & edges in the input set of diagrams by their semantics

The classifier establishes a correspondence between the features and the labels for process semantics
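
The supervised setup can be sketched with a deliberately simple stand-in. The slides do not say which classifier is used, so the 1-nearest-neighbor rule below is an assumption chosen only because it fits in a few lines; the feature tuples (indegree, outdegree, diagonal line count) and the labels are made-up examples.

```python
import math


def train(labeled_examples):
    """Store expert-labeled (feature_vector, label) pairs.

    A 1-nearest-neighbor model needs no fitting beyond keeping the data.
    """
    return list(labeled_examples)


def predict(model, features):
    """Return the label of the closest stored example (Euclidean distance)."""
    _, label = min(model, key=lambda example: math.dist(example[0], features))
    return label


# Hypothetical expert labels: (indegree, outdegree, diagonal_lines) -> semantic
model = train([
    ((1, 1, 0), "Activity"),
    ((1, 2, 4), "Gateway"),
    ((0, 1, 0), "Start Event"),
])
print(predict(model, (1, 3, 4)))  # Gateway
```

Any standard classifier could take the place of `predict`; the point is only that labels come from an expert and the model learns the feature-to-label correspondence.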

Unsupervised Method

[Figure: unsupervised pipeline with the elements Flow Diagrams, Structure Inference, {Nodes, Edges} annotated by features, Clusterer, Cluster A and Cluster B (clusters have common semantics), Cluster A = Semantic X, Cluster B = Semantic Y, {Nodes, Edges} annotated by features + process semantic, Classifier]

An expert looks at exemplars from each cluster to label the process semantic of the cluster
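
A minimal sketch of the unsupervised variant follows. The slides do not name the clustering algorithm, so plain k-means with a deterministic start is used here as an assumption; the feature vectors and the expert's cluster labels are invented for illustration.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=20):
    """Assign each point to one of k clusters; returns a list of cluster ids.

    Centroids start at the first k points so the result is deterministic.
    """
    centroids = [tuple(p) for p in points[:k]]
    assignment = None
    for _ in range(iterations):
        new_assignment = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged
        assignment = new_assignment
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return assignment or []


# Hypothetical feature vectors: (indegree, outdegree)
vectors = [(1.0, 1.0), (1.0, 3.0), (1.0, 1.0), (2.0, 3.0)]
clusters = kmeans(vectors, k=2)

# The expert inspects exemplars and names each cluster once.
expert_labels = {0: "Activity", 1: "Gateway"}
semantics = [expert_labels[c] for c in clusters]
print(semantics)  # ['Activity', 'Gateway', 'Activity', 'Gateway']
```

The expert labels two clusters instead of four individual elements, which is where the labeling effort is saved.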


Empirical Study

Data Set: 185 Visio process diagrams created in real business-transformation projects

Objective: Compare the accuracy of our tool, iDISCOVER, with that of a popular modeling tool (referred to as PMT for proprietary reasons)

Method: Compare tool outputs with models created manually by human experts to measure precision & recall

$\text{Precision} = \dfrac{|\text{Actual} \cap \text{Retrieved}|}{|\text{Retrieved}|}, \qquad \text{Recall} = \dfrac{|\text{Actual} \cap \text{Retrieved}|}{|\text{Actual}|}$
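
The two measures can be computed directly from sets of model elements. The sketch below is illustrative only; the function name and the example edge sets are made up, with `actual` standing for the expert-created model and `retrieved` for a tool's output.

```python
def precision_recall(actual, retrieved):
    """Return (precision, recall) for two sets of model elements."""
    overlap = len(actual & retrieved)
    precision = overlap / len(retrieved) if retrieved else 0.0
    recall = overlap / len(actual) if actual else 0.0
    return precision, recall


# Hypothetical edge sets: expert model vs. tool output
actual = {("A", "B"), ("B", "C"), ("C", "D")}
retrieved = {("A", "B"), ("B", "C"), ("B", "D"), ("X", "Y")}
precision, recall = precision_recall(actual, retrieved)
print(precision)  # 0.5
```

Here two of the four retrieved edges are correct (precision 0.5), and two of the three actual edges are found (recall 2/3).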

Evaluation - Structure Inference

Element   iDISCOVER                      PMT
          Precision (%)   Recall (%)     Precision (%)   Recall (%)
Node      96.93           95.91          70.44           86.29
Edge      93.26           90.86          63.43           59.87


Ambiguity            Instances per File          % Files with Ambiguity
                     High         Average
Dangling Connector   47 (100%)    3 (14%)        56%
Unlinked Labels      46 (39%)     2 (3.7%)       38%

The count of dangling connectors correlates more strongly (and negatively) with the edge recall of PMT (ρ = −0.48) than with the edge recall of iDISCOVER (ρ = −0.08).
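
To make the correlation figures concrete, the sketch below computes a correlation coefficient between per-file dangling-connector counts and edge recall. The slides do not state which coefficient ρ denotes, so the Pearson coefficient used here is an assumption, and the data values are invented purely to show the computation.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up per-file data: recall drops steadily as dangling connectors increase.
dangling_counts = [0, 1, 2, 3]
edge_recall = [0.9, 0.7, 0.5, 0.3]
rho = pearson(dangling_counts, edge_recall)
print(round(rho, 6))  # -1.0
```

A value near −1 means recall falls as dangling connectors increase; a value near 0, as reported for iDISCOVER, means recall is largely unaffected by them.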

Evaluation – Semantic Interpretation


• Our precision (overall Δ ≈ 30%) and recall (overall Δ ≈ 20%) for all process semantic classes are greater than those of PMT.

• Unsupervised is almost as good as supervised.

• Accuracy is low only for scarce entities like Intermediate Events and Data Objects (together less than 3% of the data set).

• Better results are possible with a more equitable distribution of entities.

• The size of the training data need not be huge. Classification could work almost as well with only a third of the data set size.

Related Work

A large body of work exists in the area of understanding line drawings and hand sketches (e.g., Futrelle, Gross, Barbu). It focuses on identifying shape geometry. Semantic interpretation follows directly from a fixed mapping between shape geometry and target semantics.

Visual Language theory prescribes geometry detection with grammar rules.


Future Work

More efficient modeling of textual cues: text is the only reliable feature in highly ambiguous scenarios

Tracking spatial patterns of shapes and labels that emerge due to local styles

Identification of higher-level relations (block structures) between model entities (e.g., sub-process, loop, and fork-merge)

Extend to other diagram types

Conclusion

Informal process diagrams contain structural and semantic ambiguities, which need to be dealt with in order to discover precise formal models

Existing capabilities are limited because they do not resolve structural ambiguities, and because interpreting semantics based on shape name alone does not suffice

Standard pattern-classification techniques can be successfully employed in interpreting process semantics if the feature space is carefully modeled to mimic human reasoning. Unsupervised clustering can almost match supervised techniques in performance
