scc talk

© 2009 IBM Corporation

Organizing Documented Processes

Biplav Srivastava

Debdoot Mukherjee

IBM Research, India


SCC 2009, Organizing Documented Processes

2 23 Sept, 2009

Research Theme

Establish an effective framework for organizing design-level documentation on business processes and linked business artifacts in order to:

– Boost information reuse across engagements
– Maintain coherence in enterprise process repositories
– Reduce costs and improve quality in business transformation exercises

Setting: Enterprise Resource Planning Projects

– Off-the-shelf software to manage common business functions (e.g. Finance, Supply Chain)
– Businesses buy this software and then engage service providers to tailor it
– AMR Research estimates that spending on consulting, integration and support for packaged application services was $103B in 2007, and is expected to reach $174B by 2012




Motivation

Blueprinting is the crucial activity in ERP projects: it decides how the ERP functionality will be used and how any new customizations will be implemented

Documented business processes and related artifacts are the key outputs of blueprinting

Business Processes are captured in large numbers and in multiple representations

– Typically over 100 business processes per engagement
– Flow Diagrams: Visio, PowerPoint
– Text Documents: Word, Excel

Effective reuse of process information from past engagements will yield great benefits

– Conventional document management systems are not capable of providing a process-centric view of information

– How to search for the most effective business artifacts in the current “process” context?




Related Literature

Work in measuring similarity (diagnosing differences) in business process models

– e.g., Ehrig et al (APCCM ’07), Dijkman (BPM ’08), Van der Aalst et al (BPM ’06)
– Compares flow models in structured formats viz. Petri net, EPC, YAWL
– Linguistic, semantic and structural dimensions of comparing process elements

Extensive literature in Process Mining from execution logs – ProM framework

Research on choosing an appropriate granularity of process model reuse

– Holschke et al (BPM ’09), Mendling et al (BPM ’08)

Extraction and management of useful process variants (Sadiq BPM ’06)

Traditional methods in legacy text mining and organization

– But they do not specifically focus on process information

No known effort to target design level process information with linkage to business artifacts of interest viz. requirements, KPIs, use-cases




Key Information Elements

Business Process Hierarchies

– Industry Specific
– Cross Industry

Process Specific Artifacts

– Scenario
– Process
– Process Step
– Inputs, Outputs

Non-Process Business Artifacts

– Requirement
– Use-case
– Gap
– KPI




Data, Data Everywhere... Nor Any Drop to Use!!

Design information on business artifacts implemented in engagements is locked in documents

– Need to turn them into reusable assets
– Retrieve information into a model-based format

Enterprise asset repositories are not well organized

– Essentially, a dump of unlinked process documentation in different formats
– No meta-data available against silos of documents

Inconsistencies in process data

– Multiple teams are responsible for various aspects of process design




Process Organization & Reuse

Extract model based content

Enterprise repositories

Process Organization Framework

Content Reuse

Duplicate Detection




Process Information Extraction - Text

Utilize semi-structured nature of data

Extract content segments present in a document collection, which can map to some process semantics

Seek an appropriate tag (preferably from a pre-defined meta model) from the user

Utilize layout of content segments in the document to establish cardinality and relations between various pieces of flat tagged content

Extract Tag




Process Information Extraction - Diagrams

General purpose diagramming tools viz. Visio, PowerPoint, Xfig etc. are used to capture business processes

– Reasons: Ubiquitous (low cost), Familiarity (intuitive to use)

No formal modeling tool provides sound import capabilities from diagramming formats!!

Challenges in Model Discovery

– Ambiguities are commonplace in informal drawings
– Humans can understand intent from visual cues – machine interpretation is hard!
– Dangling connectors, Unlinked Labels, Over-specification, Under-specification

Steps in Model Discovery: Flow Structure Extraction, Semantic Interpretation

[Figure: examples of over-specification, under-specification and dangling connectors in informal flow diagrams, drawn with steps such as CreateOrder, ProcessOrder and ShipOrder]
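Of the ambiguities named on this slide, dangling connectors are the easiest to check mechanically once shapes and connectors have been read out of a drawing. The following is a minimal sketch only: the `Connector` class, the `find_dangling` helper and the sample drawing are hypothetical and do not correspond to any diagramming tool's API or to the authors' implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Connector:
    """A drawn arrow; an end is None when it is not glued to any shape."""
    source: Optional[str]
    target: Optional[str]


def find_dangling(shapes, connectors):
    """Return connectors with an end that is missing or names an unknown shape."""
    known = set(shapes)
    return [c for c in connectors
            if c.source not in known or c.target not in known]


# Hypothetical drawing: the second arrow was never attached to a target shape.
shapes = ["CreateOrder", "ProcessOrder", "ShipOrder"]
connectors = [
    Connector("CreateOrder", "ProcessOrder"),
    Connector("ProcessOrder", None),
]
dangling = find_dangling(shapes, connectors)
print(dangling)
```

A real extractor would still have to decide what the author intended, for example by attaching the loose end to the nearest shape; this check only flags the candidates.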




Problem: Organizing Process Information

Given a dump of business process documentation (both text and diagrams) from an engagement, how should it be organized so that the information it contains can be effectively harvested?

Three sub-problems

– Problem 1: Link text and visual representation
– Problem 2: Normalize content in linked text and visual forms
– Problem 3: Group normalized content in similar clusters

Demonstrate benefit of better organization




Process Information in Text and Visual Formats




Benefits in text:
• Process information is detailed

Problems in text:
• Control flow details are lost
• Unintuitive, e.g., swim lanes are missing

Benefits in flow:
• Control flow is detailed
• Intuitive

Problems in flow:
• Names in flow do not match text (Functional FP&A Planner v/s FP&A Planner)
• Limited information. E.g., is an activity system or manual? Text has the details

Example




Steps in Process Organization

Set of work products (files) describing business processes

Link textual and flow (visual) files

Normalize process step information in linked text and flow

Cluster normalized process information

Clusters of business processes with linked non-process artifacts

• Enrichment of information
• Consistency: single view of truth
• Structured representation

• Name
• Description
• Role
• Predecessors
• Successors
• Inputs
• Outputs
• Nature
• Miscellaneous

• Define suitable similarity measures to deal with atomic and composite content
• Run a clustering algorithm without a priori information on number of clusters
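The last two bullets can be made concrete with a small sketch. The slides do not say which similarity measures or which clustering algorithm were used, so everything here is an assumption chosen for illustration: `ProcessStep` keeps only a few of the normalized fields, atomic fields are compared by case-insensitive equality, composite (set-valued) fields by Jaccard overlap, and clusters are the connected components of a thresholded similarity graph, which needs no cluster count up front.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessStep:
    # A subset of the normalized fields listed above.
    name: str
    role: str = ""
    inputs: set = field(default_factory=set)
    outputs: set = field(default_factory=set)


def jaccard(a, b):
    """Set overlap in [0, 1]; two empty sets count as identical."""
    return 1.0 if not a and not b else len(a & b) / len(a | b)


def step_similarity(x, y):
    """Assumed composite measure: unweighted mean of per-field similarities."""
    atomic = [float(x.name.lower() == y.name.lower()),
              float(x.role.lower() == y.role.lower())]
    composite = [jaccard(x.inputs, y.inputs), jaccard(x.outputs, y.outputs)]
    scores = atomic + composite
    return sum(scores) / len(scores)


def cluster(items, similarity, threshold):
    """Group items into connected components of the 'similar enough' graph.

    The number of clusters is not supplied; it falls out of the threshold.
    """
    parent = list(range(len(items)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if similarity(items[i], items[j]) >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i, item in enumerate(items):
        groups.setdefault(find(i), []).append(item)
    return list(groups.values())


# Hypothetical steps; the 0.7 threshold is an arbitrary illustrative value.
steps = [
    ProcessStep("Create Order", "Planner", {"request"}, {"order"}),
    ProcessStep("create order", "Planner", {"request"}, {"order", "log"}),
    ProcessStep("Ship Order", "Clerk", {"order"}, {"shipment"}),
]
clusters = cluster(steps, step_similarity, threshold=0.7)
```

Connected components are the simplest option with this property; hierarchical or density-based clustering would fit the same bullet and may behave better when similarity chains link unrelated processes.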




Empirical Evaluation ― Results

Input

– 240 Process Definition Documents (PDDs)
– 315 Process Flow Diagrams

Linking

Similarity Measure   Pair-wise Matches   # PDDs   Precision (%)
Jaro                 126                 30       48
Exact                11                  11       100

Normalization

Similarity Measure   % Match (Name)   % Match (Name + Role)
Jaro                 37               8
Exact                45.5             13
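The two measures compared here differ in how they treat near-misses such as "Functional FP&A Planner" versus "FP&A Planner". The sketch below implements the textbook Jaro string similarity next to an exact comparison. It is not the authors' code, and the slides do not state the threshold at which a Jaro score counted as a match.

```python
def jaro(s, t):
    """Jaro similarity in [0, 1]; 1.0 means identical strings."""
    if s == t:
        return 1.0
    if not s or not t:
        return 0.0
    # Characters match only if they are equal and within this distance.
    window = max(max(len(s), len(t)) // 2 - 1, 0)
    s_hit = [False] * len(s)
    t_hit = [False] * len(t)
    matches = 0
    for i, ch in enumerate(s):
        lo, hi = max(0, i - window), min(len(t), i + window + 1)
        for j in range(lo, hi):
            if not t_hit[j] and t[j] == ch:
                s_hit[i] = t_hit[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    s_matched = [c for c, hit in zip(s, s_hit) if hit]
    t_matched = [c for c, hit in zip(t, t_hit) if hit]
    transpositions = sum(a != b for a, b in zip(s_matched, t_matched)) // 2
    return (matches / len(s) + matches / len(t)
            + (matches - transpositions) / matches) / 3


flow_name = "Functional FP&A Planner"
text_name = "FP&A Planner"
exact = flow_name == text_name      # exact comparison rejects this pair
score = jaro(flow_name, text_name)  # Jaro gives it a graded score instead
print(exact, round(score, 3))
```

A graded score admits more candidate pairs than exact comparison, which is consistent with the trade-off in the Linking table: many more pair-wise matches for Jaro, at lower precision.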




Empirical Evaluation ― Results (2)

Dataset: A set of 240 Process Definition Documents from an actual ERP project engagement

Number of pair-wise similar processes: 266

Number of clusters found: 23

Range of cluster sizes: (2, 21)

Number of processes similar to at least one other process: 134 (i.e., 55% of total)

Effectiveness of discovered clusters in boosting similarity of non-process business artifacts written in context of business processes

Artifact                    Similarity inside clusters   Overall Similarity   Similarity Boost (%)
Requirement                 0.209                        0.014                1430.55
Integration Consideration   0.620                        0.115                438.54
Supplier                    0.844                        0.109                671.22
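The slide does not define the boost column. One reading that fits the numbers is the relative increase of in-cluster similarity over overall similarity, expressed as a percentage. The sketch below applies that assumed formula to the rounded similarities from the table; the results land close to the reported figures but not exactly on them, presumably because the reported boosts were computed from unrounded similarities.

```python
def similarity_boost(inside, overall):
    """Percentage increase of in-cluster similarity over overall similarity."""
    return (inside - overall) / overall * 100.0


# (similarity inside clusters, overall similarity), as printed in the table.
rows = {
    "Requirement": (0.209, 0.014),
    "Integration Consideration": (0.620, 0.115),
    "Supplier": (0.844, 0.109),
}
boosts = {name: similarity_boost(i, o) for name, (i, o) in rows.items()}
for name, boost in boosts.items():
    print(f"{name}: {boost:.2f}%")
```

The Requirement row is the most sensitive to rounding, since its overall similarity is given to only two significant digits.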




Application to File Duplicate Detection

Scenario

– Input: 1520 files organized in a complex directory structure, 13 different asset types, files per asset type known

– Problem: Find duplicates or near-similar files in an asset type

Approach

– Harvest content of files per asset type
– Cluster based on content
– Files in each cluster are duplicates

Type   # Files   # Clusters   # Files in Some Cluster   % Unique
PDD    866       116          786                       23% (196/866)
BPP    463       121          406                       38% (178/463)
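The % Unique column can be reproduced from the other columns if a unique file is taken to be either a file that falls in no cluster or one representative per cluster. That reading is an inference from the numbers rather than something the slide states; the `unique_files` helper is illustrative.

```python
def unique_files(total, clusters, clustered):
    """Files outside every cluster, plus one representative per cluster."""
    return (total - clustered) + clusters


# (# Files, # Clusters, # Files in Some Cluster), from the table above.
table = {"PDD": (866, 116, 786), "BPP": (463, 121, 406)}
for asset_type, (total, clusters, clustered) in table.items():
    unique = unique_files(total, clusters, clustered)
    print(f"{asset_type}: {unique}/{total} = {unique / total:.0%}")
```

Under this reading, roughly three quarters of the PDD files and about three fifths of the BPP files are redundant copies or near-copies.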




Scope for Future Work

Improve precision of text similarity measures

– Use domain-specific WordNets
– Apply sound aggregation measures for robust relational learning

Build ontologies of ERP concepts and utilize relationships therein to improve search for similar business artifacts in the context of a business process

Extraction of process documentation into standardized representations




Conclusions

Efficient organization of design-level process documentation, which may not have execution semantics, can ease information reuse

Process information can help in searching for useful non-process business artifacts

– e.g., Searching for the correct use-case or performance indicator can be easy if these are maintained along with process information

Enriching and normalizing process information from multiple representations is important

– Removal of duplicate and inconsistent data is critical




Thank You
