from informal process diagrams to formal process models
DESCRIPTION
Process modeling is an important activity in business transformation projects. Free-form diagramming tools, such as PowerPoint and Visio, are the preferred tools for creating process models. However, the designs created using such tools are informal sketches, which are not amenable to automated analysis. Formal models, although desirable, are rarely created (during early design) because of the usability problems associated with formal-modeling tools. In this paper, we present an approach for automatically inferring formal process models from informal business process diagrams, so that the strengths of both types of tools can be leveraged. We discuss different sources of structural and semantic ambiguities, commonly present in informal diagrams, which pose challenges for automated inference. Our approach consists of two phases. First, it performs structural inference to identify the set of nodes and edges that constitute a process model. Then, it performs semantic interpretation, using a classifier that mimics human reasoning to associate modeling semantics with the nodes and edges. We discuss both supervised and unsupervised techniques for training such a classifier. Finally, we report results of empirical studies, conducted using flow diagrams from real projects, which illustrate the effectiveness of our approach.
TRANSCRIPT
From Informal Process Diagrams To Formal Process Models
Debdoot Mukherjee‡, Pankaj Dhoolia‡, Saurabh Sinha‡
Aubrey J Rembert†, Mangala Gowri Nanda‡
IBM Research - India, New Delhi, India‡
IBM TJ Watson Research Center, New York, USA†
"We build too many walls and not enough bridges" - Sir Isaac Newton
Free-form diagramming tools (e.g., Visio, PowerPoint) are preferred for creating initial process models
• Ease of use, intuitiveness
• Ubiquity
• Do not hinder creativity
Process-modeling software (e.g., WBM, ARIS) creates models with formal underpinnings
• Allows formal analysis, model checking
• Process reuse
• Process improvement
• Traceability with the realized executable process
A sound, automatic approach to convert process diagrams to formal process models is essential
• A bridge between the worlds of diagramming and formal modeling
September 14, 2010, International Conference on Business Process Management, Hoboken, NJ, USA
Outline
• Challenges
  • Ambiguities in diagrams
  • Limitations of existing capabilities
• Approach
  • Structure Inference
  • Semantic Interpretation
• Empirical Study
• Related Work & Future Directions
Challenges in Diagram Interpretation
Humans can interpret the visual cues in a drawing to correctly resolve the structure and semantics of the model, but machines cannot do the same!
Structural Ambiguities
Dangling Connectors: Connectors not glued to shapes at their endpoints
[Figure: three callouts labeled "Missing Edge"]
Structural Ambiguities
Unlinked Labels: Text annotations not explicitly part of any shape for node/edge
Semantic Ambiguities
Over-specification: The same semantics is conveyed by multiple shapes
Under-specification: The same shape conveys multiple semantics
Limitation of Existing Tools
Popular BPM tools, such as WebSphere Business Modeler, ARIS, Lombardi, and Telelogic System Architect, have Visio import capabilities
• Create imprecise flow structure when faced with structural ambiguities
• Employ a simple mapping (fixed or pluggable) from a set of diagram shapes to a target set of process semantics to interpret semantics
  • Such an approach cannot deal with under-specification
  • Building an exhaustive mapping is painful in the presence of over-specification
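To make the limitation concrete, here is a minimal sketch of a fixed shape-to-semantics mapping. The shape names, the semantic labels, and the `FIXED_MAPPING` table are hypothetical illustrations, not the mapping used by any of the tools named above.

```python
# Hypothetical fixed mapping from diagram shape names to process semantics.
FIXED_MAPPING = {
    "Rectangle": "Activity",
    "Rounded Rectangle": "Activity",  # over-specification: two shapes, one meaning
    "Diamond": "Gateway",
}

def interpret(shape_name):
    """Look up the semantics of a shape; unknown shapes get no interpretation."""
    return FIXED_MAPPING.get(shape_name)

# Under-specification: the author used one shape for two different meanings,
# but a lookup keyed on shape name alone can only ever return one of them.
nodes = [
    {"shape": "Rectangle", "text": "Approve order", "intended": "Activity"},
    {"shape": "Rectangle", "text": "Order form", "intended": "Data Object"},
]
misread = [n["text"] for n in nodes if interpret(n["shape"]) != n["intended"]]
print(misread)
```

Every shape variant has to be enumerated by hand (the over-specification cost), and no amount of enumeration separates the two rectangles (the under-specification failure).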
Approach
[Figure: Process Diagram → Shapes & Attributes → Flow Graph → Process Model]
Edge Inference
[Figure: example diagram with nodes A, B, C, and D; connection points C1 to C8]
• Uses the notion of connection points created at node-line and line-line intersections
• Assign a direction to each connection point
• Starting at connection points attached to nodes, propagate their directions along paths in which the directions are consistent, and identify the reached nodes
• Create an edge if the connection point at the reached node has a different direction
[Figure: connection points annotated with the directions SRC, TGT, NEU, and UNK]
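The propagation steps on this slide can be sketched roughly as follows. This is a simplified illustration, not the authors' implementation: the data layout (`points`, `segments`) and the function name `infer_edges` are hypothetical, and the sketch assumes that an unattached point is "consistent" when it is NEU or carries the propagated direction, while a UNK point blocks propagation.

```python
from collections import deque

SRC, TGT, NEU, UNK = "SRC", "TGT", "NEU", "UNK"

def infer_edges(points, segments):
    """Infer directed edges between nodes.

    points:   {point_id: (direction, attached_node_or_None)}
    segments: iterable of (point_id, point_id) pairs joined by a line
    """
    neighbors = {p: set() for p in points}
    for a, b in segments:
        neighbors[a].add(b)
        neighbors[b].add(a)

    edges = set()
    for start, (direction, node) in points.items():
        if node is None or direction not in (SRC, TGT):
            continue  # propagation starts only at directed points attached to nodes
        seen, queue = {start}, deque([start])
        while queue:
            current = queue.popleft()
            for nxt in neighbors[current] - seen:
                seen.add(nxt)
                nxt_dir, nxt_node = points[nxt]
                if nxt_node is None:
                    # Assumption: NEU or same-direction points are consistent; UNK blocks.
                    if nxt_dir in (NEU, direction):
                        queue.append(nxt)
                elif nxt_node != node and nxt_dir in (SRC, TGT) and nxt_dir != direction:
                    # Opposite directions at the two ends: emit a source-to-target edge.
                    edges.add((node, nxt_node) if direction == SRC else (nxt_node, node))
    return edges

# One connector leaves A, branches at a neutral junction, and reaches B and C.
points = {"c1": (SRC, "A"), "c2": (NEU, None), "c3": (TGT, "B"), "c4": (TGT, "C")}
segments = [("c1", "c2"), ("c2", "c3"), ("c2", "c4")]
print(sorted(infer_edges(points, segments)))  # [('A', 'B'), ('A', 'C')]
```

Because edges are derived from intersections rather than from glued endpoints, a branching line yields both edges, and no edge is created between B and C, whose connection points share the same direction.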
Semantic Interpretation
• Train a classifier to mimic human reasoning to decide process semantics
• Features used for classification:
  • Relational: indegree, outdegree, count of nodes contained within
  • Geometric: shape name, count of horizontal, vertical, and diagonal lines
  • Textual: count of cue words for every target entity
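As a rough illustration of the feature list above, the sketch below builds a feature dictionary for one node of a flow graph. The function name `node_features`, the `CUE_WORDS` vocabulary, and the entity names are hypothetical; the slides do not give the actual cue words. Line-count features are omitted because they need the raw shape geometry.

```python
# Hypothetical cue words per target entity (not taken from the slides).
CUE_WORDS = {
    "Gateway": {"if", "whether", "decide"},
    "Start Event": {"start", "begin"},
}

def node_features(node_id, shape_name, label, edges, contained=()):
    """Build relational, geometric, and textual features for one node."""
    words = label.lower().split()
    features = {
        # Relational features, read off the inferred edges.
        "indegree": sum(1 for _, tgt in edges if tgt == node_id),
        "outdegree": sum(1 for src, _ in edges if src == node_id),
        "contained_nodes": len(contained),
        # Geometric feature: only the shape name is used in this sketch.
        "shape_name": shape_name,
    }
    # Textual features: one cue-word count per target entity.
    for entity, cues in CUE_WORDS.items():
        features["cue_" + entity] = sum(1 for w in words if w in cues)
    return features

edges = [("a", "d"), ("d", "b"), ("d", "c")]
print(node_features("d", "Diamond", "Decide if approved", edges))
```

Here node `d` has one incoming edge, two outgoing edges, and two cue words for the hypothetical "Gateway" entity, which is the kind of combined evidence a shape-name lookup alone would miss.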
Supervised Method
[Figure: Flow Diagrams → Structure Inference → {Nodes, Edges} annotated by features → {Nodes, Edges} annotated by features + process semantics → Classifier]
• An expert labels all nodes & edges in the input set of diagrams by their semantics
• The classifier establishes a correspondence between the features and the labels for process semantics
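A minimal sketch of the supervised setup follows. The slides do not name the classifier, so a 1-nearest-neighbor rule stands in for it here; the feature vectors and labels are made-up examples.

```python
import math

def train(examples):
    """'Training' a 1-nearest-neighbor model just stores the expert-labeled examples."""
    return list(examples)

def predict(model, features):
    """Return the label of the stored example closest to `features`."""
    _, label = min(model, key=lambda example: math.dist(example[0], features))
    return label

# Hypothetical feature vectors: (indegree, outdegree, cue-word count for "Gateway").
labeled = [
    ((1, 1, 0), "Activity"),
    ((1, 2, 1), "Gateway"),
    ((0, 1, 0), "Start Event"),
]
model = train(labeled)
print(predict(model, (1, 3, 2)))  # Gateway
```

Any standard classifier could take the place of the nearest-neighbor rule; the point is only that the mapping from features to semantics is learned from expert labels rather than fixed in advance.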
Unsupervised Method
[Figure: Flow Diagrams → Structure Inference → {Nodes, Edges} annotated by features → Clusterer → Cluster A, Cluster B (clusters have common semantics) → Cluster A = Semantic X, Cluster B = Semantic Y → {Nodes, Edges} annotated by features + process semantics → Classifier]
• An expert looks at exemplars from each cluster to label the process semantics of the cluster
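The unsupervised variant can be sketched as cluster first, then label each cluster once. The slides do not name the clustering algorithm, so plain k-means is assumed here; the data points, the `expert_labels` dictionary, and the helper names are hypothetical.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain k-means; the first k points seed the centroids so the run is deterministic."""
    centroids = [tuple(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assignment, centroids

def exemplar(points, assignment, centroids, cluster):
    """Index of the cluster member closest to its centroid, to be shown to the expert."""
    members = [i for i, a in enumerate(assignment) if a == cluster]
    return min(members, key=lambda i: math.dist(points[i], centroids[cluster]))

# Hypothetical feature vectors: (indegree, outdegree).
points = [(1, 1), (1, 3), (1, 1), (2, 3), (2, 1), (1, 4)]
assignment, centroids = kmeans(points, k=2)

# The expert inspects one exemplar per cluster and names the cluster (made-up labels).
expert_labels = {0: "Activity", 1: "Gateway"}
labels = [expert_labels[a] for a in assignment]
print(labels)
```

The expert labels two exemplars instead of six nodes; the propagated labels can then serve as training data for the classifier, as in the supervised method.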
Empirical Study
Data Set: 185 Visio process diagrams created in real business-transformation projects
Objective: Compare accuracy of our tool iDISCOVER and a popular modeling tool (called PMT for proprietary reasons)
Method: Compare tool outputs with models created manually by human experts to measure precision & recall
Precision = |Actual ∩ Retrieved| / |Retrieved|, Recall = |Actual ∩ Retrieved| / |Actual|
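The two measures can be computed directly from sets of model elements. In this small sketch, `actual` plays the role of the expert-created model and `retrieved` the tool output; the edge sets are made-up examples, and returning 0.0 for an empty set is a convention chosen here.

```python
def precision_recall(actual, retrieved):
    """Precision and recall of `retrieved` elements against the `actual` ones."""
    actual, retrieved = set(actual), set(retrieved)
    hits = len(actual & retrieved)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(actual) if actual else 0.0
    return precision, recall

actual = {("A", "B"), ("A", "C"), ("C", "D"), ("B", "D")}
retrieved = {("A", "B"), ("A", "C"), ("A", "D")}
precision, recall = precision_recall(actual, retrieved)
print(round(precision, 2), round(recall, 2))  # 0.67 0.5
```

Two of the three retrieved edges are correct (precision 2/3), and two of the four actual edges were found (recall 1/2).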
Evaluation - Structure Inference
Element   iDISCOVER                    PMT
          Precision (%)  Recall (%)    Precision (%)  Recall (%)
Node      96.93          95.91         70.44          86.29
Edge      93.26          90.86         63.43          59.87
Ambiguity            Instances per file         % Files with ambiguity
                     High         Average
Dangling Connector   47 (100%)    3 (14%)       56%
Unlinked Labels      46 (39%)     2 (3.7%)      38%
The count of dangling connectors correlates more strongly with the edge recall of PMT (ρ = −0.48) than with the edge recall of iDISCOVER (ρ = −0.08).
Evaluation – Semantic Interpretation
• Our precision (overall Δ ≈ 30%) and recall (overall Δ ≈ 20%) are greater than those of PMT for all process-semantic classes
• Unsupervised is almost as good as supervised
• Accuracy is low only for scarce entities such as Intermediate Events and Data Objects (together less than 3% of the data set)
• Better results are possible with a more equitable distribution of entities
• The training data need not be huge: classification could work almost as well with only a third of the data set
Related Work
• Large body of work in the area of understanding line drawings and hand sketches (e.g., Futrelle, Gross, Barbu)
  • Focus on identifying shape geometry
  • Semantic interpretation follows directly from a fixed mapping between shape geometry and target semantics
• Visual Language theory prescribes geometry detection with grammar rules
Future Work
• More efficient modeling of textual cues
  • Text is the only reliable feature in highly ambiguous scenarios
• Tracking spatial patterns of shapes and labels that emerge due to local styles
• Identification of higher-level relations (block structures) between model entities (e.g., sub-process, loop, and fork-merge)
• Extend to other diagram types
Conclusion
• Informal process diagrams contain structural and semantic ambiguities, which must be dealt with in order to discover precise formal models
• Existing capabilities are limited because:
  • They do not resolve structural ambiguities
  • Interpreting semantics based on shape name alone does not suffice
• Standard pattern-classification techniques can be successfully employed in interpreting process semantics if the feature space is carefully modeled to mimic human reasoning
  • Unsupervised clustering can almost match supervised techniques in performance