Master Data Governance
A systematic approach to managing enterprise data assets
White Paper
April 2014
Contents
Executive Summary
The Problem
Master Data Governance Approach
Master Data Governance (Example)
Use Cases
Global IDs’ Technology
Company Overview
Contact Information
© Global IDs Inc. 2001-2014
Executive Summary
Master Data
Governance
Every year, corporations spend large sums of money on their IT environments. Much
of that money is spent on making sure that each business unit has the data to function
properly.
In spite of these large expenditures, most CIOs will readily admit that their enterprise
environments lack any form of data governance. Other than financial data, there are
few explicit guarantees that enterprise data is reliable and trustworthy.
Can any business operate without formal checks and balances?
If an organization's data assets are critical to its long-term success, shouldn't there be a
systematic set of controls and audits? Shouldn't there be adequate checks and balances
to ensure that the data that exists within the enterprise is reliable? Shouldn't the CIO be
assuring executive management that the information within the enterprise has gone
through rigorous quality control tests?
As it turns out, most organizations are satisfied with "good enough" data. As long as
business operations are functioning sufficiently well and the number of "defects" is not
too high (i.e. can be handled by the IT staff), there is not much cause for worry.
This lack of systematic governance has insidious effects that are not immediately
observable. Over time, these defects accumulate and decay kicks in. The entire
environment has to be continuously fixed with a variety of "band-aids". The "quality
debt" that has been accumulated over many years is eventually repaid through
expensive remediation projects or increasing IT staff size.
This white paper suggests an alternative path.
It describes a systematic way of managing enterprise information that can dramatically
reduce the "quality debt", by building in automated quality controls in all data
management operations. This approach can reduce costs while increasing the
efficiency of the IT environment.
In the following sections we will describe:
1. How the complexity of IT systems creates growth in costs.
2. Why master data governance is key.
3. A structured methodology for master data governance.
4. The quantifiable benefits that can be derived.
The Problem
"Big Data" =
"Huge Costs"
Large organizations, today, have hundreds or thousands of databases and applications.
Creating business value from this large amount of data is no longer an easy task.
The advent of "Big Data" -- the deluge of information originating outside the bounds
of the organization -- represents both an opportunity and a risk for large organizations. While
these new types of data are potentially valuable to business, the cost of processing this
information can be exorbitant.
Inundated by this data deluge, organizations are spending increasing amounts of
money to manage their data in traditional ways -- asking people to organize the
information, and relate it to business value. As the cost of data management keeps
going up, business managers need to understand whether there are alternative
approaches to managing data.
Cost is proportional to the lack of data transparency
One of the primary contributors to information management cost is the lack of data
transparency in large organizations. Put simply, no-one understands the data assets of
the organization at an enterprise level, because all the data is opaque and "locked
away".
This lack of transparency manifests itself in a variety of ways:
1. Costs associated with data integration
Since the systems that need to be integrated were not developed with an enterprise view in mind, bringing these systems together is costly and time-consuming.
2. Costs of data migration
Moving data from legacy systems (e.g. mainframes) to modern open systems can be very difficult, because the lack of transparency leads to a lack of standardized data across enterprise databases.
3. Significant costs of maintaining data quality
The lack of visibility into enterprise data makes the measurement and assurance of data quality difficult and expensive.
Some of the above costs can be avoided, if the data environment is rendered
transparent. If transparency can be created across the data landscape, without
compromising on data security or data privacy, many redundancies can be detected
and exposed. This, in turn, can lead to further identification of inefficiencies in
business processes.
Cost is
proportional to
complexity
During the evolution of organizations over many decades, the complexity of IT
environments has grown exponentially. With periodic mergers and acquisitions, there
is a trend towards further increase in complexity.
With increasing complexity, it becomes increasingly difficult to leverage operational,
financial, sales, marketing and external data. Turning raw data into business insight
can become time-consuming and painstaking.
The primary contributors to large-scale complexity are:
Heterogeneity (the degree of differentiation between enterprise databases).
Environment Heterogeneity: Differences in hardware platforms, operating system environments, network
protocols, and database management platforms.
Volume: Data volume is increasing exponentially. To further complicate the matter, companies are inundated with
different types of data every day, as the data (both structured and
unstructured) grows.
Data Complexity: Complexity due to the number of tables (thousands), number of schemas, customization, errors etc.
Metadata Complexity: Differences in data models, data types, schemas, semantics, standards and
language.
Systematic Governance is Key to reducing cost
Most efforts to rein in the cost of IT environments take the form of targeted cost
reduction projects, either by headcount reduction or decommissioning of aging
systems. While this targeted approach has the desired short-term impact, it rarely fixes
the long-term problem: increasing complexity brings increasing costs.
Just as systematic governance of financial data has created transparency and
accountability, while preventing inefficiency and waste, we believe that systematic
governance of core business data can yield a plethora of benefits.
In this paper, we suggest that a systematic governance program can reduce the
long-term costs of managing complex data environments. Our focus will be Master
Data Governance, i.e. the systematic and automated management of critical business
data assets ("Master Data").
Master Data Governance Approach
What is Master
Data?
Master Data describes the core data assets on which the business of an organization
runs.
For commercial organizations, master data includes data on customers, products,
employees, suppliers, locations, legal entities and about 100 other critical data
objects.
For non-commercial institutions, master data includes data about people,
organizations, partners, vendors, regulators etc.
Our approach
Global IDs uses a number of specialized applications to automate the analysis of the
master data environment inside large organizations.
We use a systematic and automated approach to:
a) Create transparency within the data environment
b) Create quality assurance monitors on the data environment
c) Establish a master data governance portal for data stewards
Stage 1: Transparency Processing
Global IDs software first uses a four-step process to perform core analysis on the data
that is found inside an organization’s databases.
Step 1: Data Discovery
The software scans repositories of data in the systems environment.
Step 2: Data Profiling
The software analyzes the data in both databases and content repositories.
Step 3: Data Classification
The software determines all the patterns that exist in the environment, and
categorizes the data.
Step 4: Data Mapping
The software connects all similar data objects together, creating enterprise
maps of business objects (e.g. Customer, Product, Employee, Vendor etc.)
Stage 2: Quality Processing
Next, the Global IDs software uses a four-step process to perform core quality analysis
on the data that is found inside each database that the software has encountered.
Step 5: Data Verification
The software applies quality rules against known categories of data domains.
Step 6: Data Validation
The software applies multi-column business rules to validate the data.
Step 7: Data Stewardship
The software establishes a RACI matrix, to determine who is responsible and
accountable for the quality of master data.
Step 8: Data Monitoring
The software continuously monitors the data landscape to ensure that master
data conforms to data integrity standards across the whole enterprise.
Stage 3: Portal Generation
Global IDs software then generates a web portal for each master data object, so that
data stewards can interact with core business data.
Step 9: Web-Portal Generation
A portal is generated for each of the master data objects that have been
automatically mapped in Step 4.
Step 10: Web-Portal Customization
Since the individual needs of each organization are different, the software
is customized to meet the needs of each individual data steward.
Stage 4: Portal Operations
To make the Portal valuable to the Data Steward for daily operational use, the
Master Data Governance Portal provides the following functionality:
1. An Enterprise View of each Master Data Object
(Showing the distribution of master data across all databases within the organization)
2. Enterprise Search
(Providing the ability to search for specific master data within the organization)
3. Audits
(Showing the conformance of master data with data quality rules)
4. Business Rules
(Showing the business rules that pertain to the master data object)
5. Controls
(Showing active controls, violations, exception conditions etc.)
6. Analytical Reports
(Showing multiple reports with the results of different types of analysis on master data)
7. Metrics
(Showing business metrics on master data - growth, change, quality improvements, etc.)
8. Policies and Procedures
(Showing the documented policies of the organization, and actual conformance)
9. Domain Analysis: Attributes, Code Tables, Subtypes
(Analysis of each attribute, code set, and subtypes belonging to the master data object)
10. Hierarchies
(Showing business-relevant hierarchies built on the master data object)
The above list shows the default functionality that is available in the Global IDs
portal. The software permits customization and can be tailored to individual users.
Results
Using the above approach, each business data object that is important to the
functioning of the organization can be systematically governed, and business data
stewards can ensure that conformance to quality controls is being continuously
monitored.
Some examples of master data objects (in multiple industries) that could be subject to
systematic Master Data Governance are shown below.
Educational Publishing Company
1. Customer
2. Product
3. Sales Representatives
4. Author
5. Printer
6. Bookstores
7. ...
Financial Services Company
1. Client
2. Account
3. Security
4. Counter-Party
5. Registered Representatives
6. ...
Healthcare Company
1. Member
2. Provider
3. Hospital
4. Policy
5. Claim
6. ...
Chemicals Manufacturing Company
1. Customer
2. Product
3. Distributor
4. Regulator
5. Supplier
6. Plant
7. ...
Pharmaceutical Company
1. Customer
2. Product
3. Patient
4. Regulatory Agency
5. Pharmacy
6. Doctors
7. ...
The Global IDs software comes pre-loaded with multiple master data objects, and
hundreds of known master data domains. The software can be extended to other
master data objects that are relevant to a specific organization.
Master Data Governance (Example)
Scenario
The details behind the approach can be described by applying the approach to a real-world
problem.
Most organizations make it a priority to organize their customer master data, since it is
directly related to their revenue stream. The organization's core applications maintain
accurate records of the customer data, by ensuring all the following types of data are
verified by customer support representatives.
Customer Master Data
1. Customer Identifier
2. Customer Name
3. Customer Shipping Address
4. Customer Billing Address
5. Customer Phone Number
6. Customer Email Address
7. Customer Website Address
8. Active Customer Flag
9. ...
Initially, when the organization is small, maintaining this type of customer data is
easy. As the organization evolves, entropy sets in.
The evolution of Customer Data
As the organization develops, either through growth or acquisition, the number of
applications and databases that hold customer information increases, with the
“customer data landscape” becoming larger and more complex. Over time, the
sheer volume and complexity of the data cause the databases to fall out of sync,
and numerous inaccuracies start creeping into the customer master data.
In the case of large companies that have evolved over decades, the problems are even
more pronounced. Reasons include:
1. There are a large number of databases, and no-one is certain where all the
customer master data resides.
2. The databases are spread out geographically, with limited oversight or
control. Different country databases could be using different identifiers for
the same customer.
3. The customer data entry systems are diverse, and data is often input without
ensuring that the data is accurate or complete.
4. Due to data flow across different databases, across different interfaces, and
with intermediate transformations, it becomes increasingly difficult to trace
which downstream systems are consuming the customer master data.
5. Knowledge of customer master data is dispersed across different business
units, and no-one is accountable for the customer data, at the enterprise level.
Due to these realities, there are multiple versions of customer master data across the
organization, but no “single version of the truth”. With the underlying databases
changing all the time, no-one can understand the diversity of customer master
data at the enterprise level. In the majority of organizations, no single person is
accountable for the customer data.
Once the process of maintaining centralized customer data becomes costly and time-consuming,
the organization distributes the responsibility, which creates multiple silos
of customer data that do not have consistent checks and balances. Each silo starts
diverging, and inaccuracies start creeping into the customer data environment.
The Business Impact of “Bad” Customer Data
The divergence in the quality and integrity of customer data initiates a series of events
that annoy the customer and degrade the relationship. Common events that lead to
customer dissatisfaction include:
Existing customers are sent invitations to become new customers.
(the marketing databases have not been synchronized with the active customer database)
The same customer is solicited multiple times in a single mailing campaign.
(the marketing databases have not been cleansed of duplicates)
The number of errors in shipping and billing increases, due to inaccuracies in address information.
(data quality errors in customer information systems)
Even after a customer corrects inaccurate data, the errors persist.
(multiple customer data silos with redundant information)
When customers start noticing problems with their information, business problems
start to manifest themselves. Each encounter with the customer that leads to a negative
perception degrades the customer relationship and can eventually lead to the loss of the
customer to a competitor.
How to improve customer data quality
A systematic Customer Master Data Governance Program can prevent degradation of
customer data quality by putting controls around enterprise customer data.
Given the enormity of the tasks related to maintaining enterprise customer data, manual
approaches to enterprise data quality are often infeasible. It is important to
emphasize that the controls must be automated through software that continuously monitors the data.
A software-centric methodology to Customer Master Data Governance can
systematically place quality controls on the data environment. The step-by-step
methodology that is used by Global IDs is described below.
Step 1: Customer Data Discovery
In the first step, Data Discovery software systematically "scans" the environment, to create an inventory of all data and information assets.
Since most customer data resides in structured databases (i.e. relational databases), a
structured data scan can be performed on enterprise databases.
Structured Database Scan
The software is used to connect to relational databases and extract the metadata from the database.
Scanning is supported for most relational database platforms,
such as:
Oracle
DB2 on Unix
DB2 on Mainframe
MS SQL Server
MySQL
Sybase
Teradata
Apache Derby
(other relational databases with Type 4 JDBC drivers can also be supported)
For relational databases the scanning processes can be parallelized, and
scans can be completed in hours or days instead of weeks or months.
It is important to conduct the scans such that:
The load on the database is minimal, and there is no performance
impact on the database.
Data security policies are adhered to. Scanning is only performed
for those tables where authorization has been granted.
There is no change to the database after the scan is completed. This
is achieved by READ-ONLY ACCESS permission.
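The scan constraints above can be illustrated with a small sketch. It is not the Global IDs implementation: it uses Python's built-in sqlite3 module as a stand-in for the enterprise platforms listed earlier, and the function names, table names and demo database are hypothetical. The connection is opened read-only, so the scan can extract table and column metadata but cannot change the database.

```python
import os
import sqlite3
import tempfile


def scan_metadata(db_path: str) -> dict:
    """Return {table_name: [(column_name, declared_type), ...]} using a read-only connection."""
    # mode=ro asks SQLite to open the file read-only, so the scan cannot alter it.
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    try:
        tables = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
        inventory = {}
        for table in tables:
            # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
            columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
            inventory[table] = [(col[1], col[2]) for col in columns]
        return inventory
    finally:
        conn.close()


def build_demo_database() -> str:
    """Create a small throwaway database (hypothetical tables) to scan."""
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE CUSTOMER (CUST_ID TEXT, CUST_NAME TEXT)")
    conn.execute("CREATE TABLE ORDERS (ORDER_ID INTEGER, CUST_ID TEXT)")
    conn.commit()
    conn.close()
    return path


demo_path = build_demo_database()
inventory = scan_metadata(demo_path)
os.remove(demo_path)
for table_name, columns in inventory.items():
    print(table_name, columns)
```

On a production platform the same idea would normally be enforced with a database account that has been granted read-only permission on the authorized tables only.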
Figure 1 : Shows Database Instances and Schema in Data Landscape
Step 2: Customer Data Profiling
In the second step, software can be used to systematically "profile" and analyze the
data that is present in the enterprise data landscape to understand the patterns that are
present in the data.
Profiling is performed for all accessible
Schemas
Tables
Columns
Clusters
This activity results in a detailed analysis of the content of each database and provides
a deep understanding of the data environment. It allows the user to identify all the
customer data tables in the databases that have been scanned.
Multiple types of profiling activities should be performed.
Profiling Type            Value of Analysis
Pattern Mining            Find pattern deviations
Domain Profiling          Recognize domains. Understand quality rules. Identify deviations.
Relationship Profiling    Understand undocumented relationships. Identify orphans.
ID Profiling              Recognize identifier columns and composite keys
Statistical Profiling     Find statistical outliers
Hierarchy Profiling       Find hierarchical relationships across data columns
Distribution Profiling    Find distributions of values
Sub-Type Profiling        Recognize subtypes and subtype descriptions. Find unusual subtypes.
Sub-Table Profiling       Partition table by subtype. Profile each sub-table. Compare profiles.
Quality Profiling         Compute quality metrics
Numeric Profiling         Compute standard deviations. Find outliers.
Time Series Profiling     Compute time series metrics. Identify outliers.
Duplicate Profiling       Understand record duplication (multiple matching algorithms)
Dependency Profiling      Understand dependencies between columns in a table
Cluster Profiling         Profile subsets of columns (clusters)
Dynamic Profiling         Profile current data. Compare results with previous snapshots.
Records Profiling         Analyze which records satisfy quality audits (pass/fail)
File Profiling            Profile external data files
Code Table Profiling      Recognize codes and code descriptions. Find unusual codes.
Some of the key types of data profiling are described below.
1. Pattern Mining
The software is used to analyze each value in each column of data and find the patterns associated with the column.
For example, the pattern mining of a Customer Phone Number column may
give the following patterns
(nnn)nnn-nnnn
+1(nnn)nnn-nnnn
1nnnnnnnnnn
nnn.nnn.nnnn
….
Erroneous data can be identified by the presence of erroneous patterns.
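As an illustration of pattern mining, the following minimal sketch replaces every digit with "n" and every letter with "a", then counts how often each resulting pattern occurs in a column. The function names and sample phone numbers are hypothetical, and the generalization rule is an assumption: unlike the "+1(nnn)nnn-nnnn" pattern shown above, this sketch does not keep a literal country code.

```python
from collections import Counter


def value_pattern(value: str) -> str:
    """Generalize a value: digits become 'n', letters become 'a', other characters are kept."""
    pattern = []
    for ch in value:
        if ch.isdigit():
            pattern.append("n")
        elif ch.isalpha():
            pattern.append("a")
        else:
            pattern.append(ch)
    return "".join(pattern)


def mine_patterns(values) -> Counter:
    """Count how often each generalized pattern occurs in a column."""
    return Counter(value_pattern(v) for v in values)


phone_numbers = ["(212)555-0100", "(646)555-0199", "212.555.0100", "unknown"]
patterns = mine_patterns(phone_numbers)
for pattern, count in patterns.most_common():
    print(pattern, count)
```

Rare patterns, such as the one produced by the value "unknown", are candidates for review as erroneous data.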
2. Domain Profiling
The software is used to analyze all the values in each column of data, to determine whether it belongs to a known data domain.
Domain profiling results in the recognition of many of the columns in the
database tables. The types of domains that can be recognized include:
Global Domains
(e.g. Names, Addresses, Phone Numbers, Emails, URLs, Dates,
Countries, Zip Codes etc)
Business Domains
(Identifiers associated with Customers, Products, Employees etc. )
Industry Domains
(domains that are specific to a particular industry,
Healthcare:Patient_IDs, Healthcare:Provider_IDs etc)
Global IDs comes with hundreds of domains predefined inside the software.
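A simplified way to picture domain recognition is to test the values of a column against a small library of domain definitions and accept the domain that most values conform to. The sketch below makes several assumptions: the three regular expressions are deliberately rough, the 90% threshold is arbitrary, and all names are hypothetical. It does not reflect the predefined domains shipped with the software.

```python
import re

# Hypothetical, deliberately simple domain definitions.
DOMAIN_PATTERNS = {
    "Email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    "US Zip Code": re.compile(r"[0-9]{5}(-[0-9]{4})?"),
    "URL": re.compile(r"https?://\S+"),
}


def recognize_domain(values, threshold=0.9):
    """Return the domain matched by at least `threshold` of the non-empty values, else None."""
    non_empty = [v.strip() for v in values if v and v.strip()]
    if not non_empty:
        return None
    best_domain, best_ratio = None, 0.0
    for domain, pattern in DOMAIN_PATTERNS.items():
        matches = sum(1 for v in non_empty if pattern.fullmatch(v))
        ratio = matches / len(non_empty)
        if ratio > best_ratio:
            best_domain, best_ratio = domain, ratio
    return best_domain if best_ratio >= threshold else None


print(recognize_domain(["ann@example.com", "bob@example.org"]))
print(recognize_domain(["10001", "94105-1234", ""]))
print(recognize_domain(["hello", "world"]))
```

Values that fail the pattern of a recognized domain are the deviations that domain profiling is meant to surface.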
3. Relationship Profiling
The software is used to understand the relationships of the tables to each other.
The profiling results can be used to understand
Explicit Relationships ( primary key – foreign key constraints)
Implicit Relationships ( undefined relationships across key fields)
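Implicit relationships can be approximated by checking value containment: if most values of one column also appear in a unique column of another table, the pair may form an undeclared key relationship, and the values that do not appear are orphans. The sketch below is one such heuristic under stated assumptions (in-memory column lists, an arbitrary coverage threshold, hypothetical table names). It is not presented as the method used by the Global IDs software.

```python
def find_implicit_relationships(tables, min_coverage=0.95):
    """tables: {table: {column: [values]}}. Return (child, parent, orphans) tuples."""
    columns = [(t, c, vals) for t, cols in tables.items() for c, vals in cols.items()]
    results = []
    for child_table, child_col, child_vals in columns:
        child_set = {v for v in child_vals if v is not None}
        if not child_set:
            continue
        for parent_table, parent_col, parent_vals in columns:
            if parent_table == child_table:
                continue
            non_null = [v for v in parent_vals if v is not None]
            parent_set = set(non_null)
            # Only a unique, non-empty column can play the role of a key.
            if not parent_set or len(parent_set) != len(non_null):
                continue
            coverage = len(child_set & parent_set) / len(child_set)
            if coverage >= min_coverage:
                orphans = sorted(child_set - parent_set, key=str)
                results.append((f"{child_table}.{child_col}",
                                f"{parent_table}.{parent_col}", orphans))
    return results


tables = {
    "CUSTOMER": {"CUST_ID": ["C1", "C2", "C3"], "NAME": ["Ann", "Bob", "Cy"]},
    "ORDERS": {"ORDER_ID": [1, 2, 3, 4], "CUST_ID": ["C1", "C1", "C2", "C9"]},
}
relationships = find_implicit_relationships(tables, min_coverage=0.6)
print(relationships)
```

In this example ORDERS.CUST_ID is reported as referencing CUSTOMER.CUST_ID, with "C9" flagged as an orphan.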
4. ID Profiling
The software is used to analyze the columns within each table and detect the identifiers that are present.
Many types of Identifiers can be detected.
Primary Keys
Natural Keys (non-unique)
Composite Key Combinations
Surrogate Keys
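One simple way to detect candidate identifiers is to search for the smallest column combinations whose values are unique and non-null across all rows. The sketch below assumes a table small enough to hold in memory and limits the search to two-column combinations; the function name and sample rows are hypothetical.

```python
from itertools import combinations


def find_candidate_keys(rows, max_columns=2):
    """rows: list of dicts with identical keys. Return minimal unique, non-null column combinations."""
    if not rows:
        return []
    columns = list(rows[0].keys())
    keys = []
    for size in range(1, max_columns + 1):
        for combo in combinations(columns, size):
            # Skip combinations that contain an already-found key (they are not minimal).
            if any(set(key) <= set(combo) for key in keys):
                continue
            tuples = [tuple(row[col] for col in combo) for row in rows]
            if any(value is None for t in tuples for value in t):
                continue
            if len(set(tuples)) == len(tuples):
                keys.append(combo)
    return keys


rows = [
    {"CUST_ID": "C1", "COUNTRY": "US", "LOCAL_NO": 1},
    {"CUST_ID": "C2", "COUNTRY": "US", "LOCAL_NO": 2},
    {"CUST_ID": "C3", "COUNTRY": "DE", "LOCAL_NO": 1},
]
candidate_keys = find_candidate_keys(rows)
print(candidate_keys)
```

Here CUST_ID is found as a single-column key, and COUNTRY together with LOCAL_NO is found as a composite key combination.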
5. Statistical Profiling
The software is used to analyze the columns and compute statistical metrics
related to the data.
For descriptive columns, a variety of completeness metrics are computed.
Number of Nulls
Number of Blanks
Number of default values ( NA, TBD, N/A etc)
Number of duplicates
For numeric columns, a variety of statistics are computed.
Minimum, Maximum, Average
Standard Deviation
Most frequent occurrence
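The metrics listed above are straightforward to compute. The following sketch uses only the Python standard library; the function names are hypothetical, the placeholder list is taken from the examples above, and the use of the population standard deviation (rather than the sample standard deviation) is an assumption.

```python
import statistics
from collections import Counter

DEFAULT_PLACEHOLDERS = {"NA", "TBD", "N/A"}


def completeness_metrics(values):
    """Completeness metrics for a descriptive (text) column; None represents a null."""
    non_null = [v for v in values if v is not None]
    blanks = sum(1 for v in non_null if v.strip() == "")
    defaults = sum(1 for v in non_null if v.strip().upper() in DEFAULT_PLACEHOLDERS)
    counts = Counter(v for v in non_null if v.strip() != "")
    # Every occurrence beyond the first counts as a duplicate.
    duplicates = sum(n - 1 for n in counts.values() if n > 1)
    return {"nulls": len(values) - len(non_null), "blanks": blanks,
            "defaults": defaults, "duplicates": duplicates}


def numeric_metrics(values):
    """Basic statistics for a numeric column."""
    return {"min": min(values), "max": max(values),
            "average": statistics.mean(values),
            "std_dev": statistics.pstdev(values),
            "most_frequent": Counter(values).most_common(1)[0][0]}


name_metrics = completeness_metrics(["Acme", None, "", "N/A", "Acme", "tbd", "Globex"])
amount_metrics = numeric_metrics([10, 20, 20, 30])
print(name_metrics)
print(amount_metrics)
```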
6. Hierarchy Profiling
The software is used to analyze the data and determine whether there are hierarchical relationships present in the data.
A variety of hierarchy types can be detected.
Subtype-based hierarchies
Recursion-based hierarchies
Rule-based hierarchies
7. Distribution Profiling
The software is used to analyze the frequencies of values within columns to come up with a distribution.
Different types of distributions can be detected.
Length Distributions (within columns that contain text)
Value Distributions (within columns that contain duplicate values)
Numeric Clusters (within columns that contain numeric data)
The distributions can be used to detect outliers in the data sets, locating potential
data quality errors.
8. Quality Profiling
The software is used to determine quality related metrics.
Completeness
Duplication
Consistency
Conformity
Integrity
Trust
9. Dependency Profiling
In this type of profiling, the dependencies between columns are determined from the data. This analysis is useful to find common dependencies across
columns, and then find deviations from identified dependencies.
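A dependency between two columns can be tested by checking whether each value of the first column is almost always paired with the same value of the second. The sketch below treats the most common pairing as the expected one and reports the remaining rows as deviations. The column names, sample data and support threshold are hypothetical assumptions.

```python
from collections import Counter, defaultdict


def dependency_deviations(rows, determinant, dependent, min_support=0.9):
    """Return (holds, deviating_rows) for the dependency determinant -> dependent."""
    if not rows:
        return False, []
    groups = defaultdict(Counter)
    for row in rows:
        groups[row[determinant]][row[dependent]] += 1
    # For each determinant value, the most common dependent value is treated as expected.
    expected = {key: counter.most_common(1)[0][0] for key, counter in groups.items()}
    deviations = [row for row in rows if row[dependent] != expected[row[determinant]]]
    support = 1 - len(deviations) / len(rows)
    return support >= min_support, deviations


rows = [
    {"ZIP": "10001", "CITY": "New York"},
    {"ZIP": "10001", "CITY": "New York"},
    {"ZIP": "10001", "CITY": "New York"},
    {"ZIP": "10001", "CITY": "Newark"},
    {"ZIP": "94105", "CITY": "San Francisco"},
    {"ZIP": "94105", "CITY": "San Francisco"},
]
holds, deviations = dependency_deviations(rows, "ZIP", "CITY", min_support=0.8)
print(holds, deviations)
```

In this example five of six rows support the dependency, so it is accepted at the chosen threshold and the single non-conforming row is reported as a deviation.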
10. Subtype Profiling
The software is used to determine which subtypes are present in the data.
Subtypes include fields like
CUSTOMER_TYPE
PRODUCT_TYPE
PRODUCT_SUBTYPE
11. Numeric Profiling
The software is used to analyze the values of numeric columns, and determine the following metrics.
Whether there are any numeric outliers in the data. Outliers are computed
on the basis of standard deviations from the mean.
Whether the numeric data is associated with any units.
Whether there are dependencies across numeric columns.
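Outlier detection based on standard deviations from the mean can be sketched as follows. The function name, the sample values and the thresholds are hypothetical, and the population standard deviation is assumed.

```python
import statistics


def find_numeric_outliers(values, max_deviations=3.0):
    """Return values lying more than `max_deviations` standard deviations from the mean."""
    if len(values) < 2:
        return []
    mean = statistics.mean(values)
    std_dev = statistics.pstdev(values)
    if std_dev == 0:
        return []  # all values are identical, so nothing can be an outlier
    return [v for v in values if abs(v - mean) / std_dev > max_deviations]


order_amounts = [98, 101, 99, 100, 102, 100, 99, 101, 100, 100, 500]
outliers = find_numeric_outliers(order_amounts, max_deviations=2.0)
print(outliers)
```

A lower threshold is used in the example because, in a very small sample, a single extreme value inflates the standard deviation and limits how many deviations from the mean it can reach.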
12. Time Series Profiling
The software analyzes time-series data to determine whether the data conforms to the norm.
A variety of time-series metrics are computed.
Unusual changes in the time-series data (value domain)
Unusual changes in the time-series data (frequency domain)
Figure 2 : Results from Statistical Profiling
Step 3 : Customer Data Classification
In the third step, the Global IDs software tries to systematically “recognize” all
the fields that contain customer identifiers. By examining all the patterns that have
been mined in the previous step, the software is able to determine which customer
IDs are found in which tables in which databases.
Effectively, this classification step “organizes” all the customer
data that is found across a complex data landscape, and establishes the distribution of
customer data across the enterprise.
For example, the software is likely to recognize
Different types of customer IDs (keys) that are used across the
enterprise
Whether customers are organizations or individuals.
Natural key fields like Customer Name, Customer Address, Customer
Phone numbers, Customer URLs, Customer Email Addresses
Through the patterns that have been mined previously, the software is able to
recognize the rules related to the format and pattern of the customer IDs.
As a result, the software is able to isolate customer identifiers that do not follow the
rules, and detect these outliers.
Even in the presence of "dirty" data or "ambiguous" data, the software is robust
enough to identify Customer IDs. Consequently, it works in situations where there
are:
• No enterprise naming standard
CID, CUST_ID, CNUM
• No standardization of abbreviations
CUST, CUSTOMER, CUSTOM
• Non-standard names that have been used by different DBAs
CUSTOMER_IDENTITY, CID, CUSTOMER_IDENTIFIER
• Typos in column headers
CSTOMER_IDENTTY_NUMBR
Custom rules can also be included by users to improve the recognition logic.
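The name-based part of this recognition can be pictured with a small sketch that tolerates abbreviations and typos in column headers. It uses fuzzy string matching from Python's difflib module, which is a stand-in technique chosen for illustration; the token lists, the alias set and the similarity cutoff are hypothetical. As described above, the software also relies on the patterns mined from the data itself, which this sketch does not cover.

```python
import difflib

# Hypothetical vocabularies built from the column names shown above.
CUSTOMER_TOKENS = ["CUST", "CUSTOM", "CUSTOMER"]
ID_TOKENS = ["ID", "IDENTITY", "IDENTIFIER", "NUM", "NUMBER"]
KNOWN_ALIASES = {"CID", "CNUM"}  # user-supplied custom rules could extend this set


def _matches(token, vocabulary, cutoff=0.8):
    """True if the token is close enough to any word in the vocabulary."""
    return bool(difflib.get_close_matches(token, vocabulary, n=1, cutoff=cutoff))


def looks_like_customer_id(column_name):
    """Heuristic check of a column header for a customer token plus an identifier token."""
    name = column_name.upper()
    if name in KNOWN_ALIASES:
        return True
    tokens = [t for t in name.split("_") if t]
    has_customer = any(_matches(t, CUSTOMER_TOKENS) for t in tokens)
    has_id = any(_matches(t, ID_TOKENS) for t in tokens)
    return has_customer and has_id


for header in ["CID", "CUST_ID", "CUSTOMER_IDENTIFIER",
               "CSTOMER_IDENTTY_NUMBR", "ORDER_ID", "CUSTOMER_NAME"]:
    print(header, looks_like_customer_id(header))
```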
Figure 3 : Results of Classification Step showing distribution of domains.
Step 4: Customer Data Mapping and Lineage
Now the software is in a position to create a map of all customer data across the
enterprise. This map adds attribute information to the customer identifiers that have
been previously mapped.
Global IDs uses a technique called “semantic equivalence mapping” to establish the
Customer Data Maps. The software uses the output of the profiling analytics to
understand the semantic content of each column of data, before attempting to map
equivalent semantic domains.
The software is able to create a comprehensive picture of the customer entity and a fairly comprehensive map of customer data. Using this map, the organization
can trace the presence of customer data across the data landscape.
It should be noted that these types of customer data maps are extremely costly and
time-consuming to create, especially for complex data landscapes. By automating the
generation of the Customer Data Map, the software reduces the level of manual effort
that is required for understanding customer data in a comprehensive way.
Figure 4 : Sample Customer Map showing database tables mapped to canonical form.
Step 5: Customer Data Verification
In the next step, the software tries to systematically verify each customer data domain
using the rules that it has detected in the profiling step. The types of rules that have
been harvested include:
Format Rules
Length Rules
Pattern Rules
Data Type Rules
Data Encoding Rules
Reference Value Rules
Substring Rules
Special character Rules
Language Rules
By collecting additional rules from the user community, the software is able to create
a comprehensive rule-base to establish the constraints that each domain is allowed to
have.
For example, the Customer ID domain may have the following rules associated with
it.
Format Rule = nnn-nn-nnnn
Length Rules = fixed length 11
Pattern Rules = 3[1-9]-2[0-9]-3[0-9]
Data Type Rules = varchar
Data Encoding Rules = ASCII
Reference Value Rules = none
Substring Rules = none
Special character Rules = “-” allowed
With these rules in place, all Customer IDs in the data landscape can be examined for
compliance with this rule-base.
Similarly, separate rule-bases can be created for each of the Customer Data Domains.
Customer Identifier
Customer Name
Customer Shipping Address
Customer Billing Address
Customer Phone Number
Customer Email Address
Customer Website Address
Active Customer Flag
These types of “single-column” rules can be automatically established using software.
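A single-column rule base of this kind can be represented as a small data structure and applied to each value. The sketch below encodes four of the example Customer ID rules listed above (format nnn-nn-nnnn, fixed length 11, "-" as the only allowed special character, ASCII encoding). The class and field names are hypothetical, and the Pattern Rule from the example is not modeled.

```python
import re
from dataclasses import dataclass


@dataclass
class ColumnRuleBase:
    format_regex: str      # Format Rule expressed as a regular expression
    fixed_length: int      # Length Rule
    allowed_special: str   # Special character Rule

    def violations(self, value: str) -> list:
        """Return the names of the rules that the value violates."""
        found = []
        if len(value) != self.fixed_length:
            found.append("length")
        if not re.fullmatch(self.format_regex, value):
            found.append("format")
        if any(not ch.isalnum() and ch not in self.allowed_special for ch in value):
            found.append("special character")
        if not value.isascii():
            found.append("encoding")
        return found


# nnn-nn-nnnn, fixed length 11, "-" allowed, ASCII only
CUSTOMER_ID_RULES = ColumnRuleBase(
    format_regex=r"[0-9]{3}-[0-9]{2}-[0-9]{4}",
    fixed_length=11,
    allowed_special="-",
)

for customer_id in ["123-45-6789", "123456789", "123/45/6789"]:
    print(customer_id, CUSTOMER_ID_RULES.violations(customer_id))
```

Running every Customer ID in the data landscape through such a rule base yields the list of non-compliant values for each column.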
Figure 5 : Results of data validation showing errors identified by software
Step 6: Customer Data Validation
Given that the software does not understand the business context, it becomes
important to capture the business domain knowledge that is possessed by the user
community.
In the data validation step, the user community adds known business rules to the customer
rule base.
For example, the Customer Type domain and the Customer Subtype domain may have a hierarchical parent-child relationship between them.
If CustomerType = Active
then CustomerSubtype is not null
These types of “multi-column” relationships are difficult to detect automatically.
Hence, the participation of the user community is required to establish these business
rules.
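Once a business user has stated a multi-column rule, it can be captured as a small function and used to audit records. The sketch below encodes the example rule above; the record layout, helper names and sample data are hypothetical.

```python
def subtype_required_when_active(record) -> bool:
    """Business rule: if CustomerType = Active then CustomerSubtype is not null."""
    if record.get("CustomerType") == "Active":
        return record.get("CustomerSubtype") is not None
    return True


def audit(records, rule):
    """Return the records that violate the given rule."""
    return [record for record in records if not rule(record)]


customers = [
    {"CustomerID": "C1", "CustomerType": "Active", "CustomerSubtype": "Retail"},
    {"CustomerID": "C2", "CustomerType": "Active", "CustomerSubtype": None},
    {"CustomerID": "C3", "CustomerType": "Inactive", "CustomerSubtype": None},
]
violations = audit(customers, subtype_required_when_active)
print(violations)
```

Only the second record is reported, because the rule places no constraint on customers that are not active.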
Collectively, the “single-column” rule-bases and the “multi-column” rule-bases
form the Customer Rule Base, and establish the constraints with which all customer
master data must comply.
The advantage of this approach is that it provides a comprehensive set of rules for
customer data. An audit of customer master data is now possible, and all violations of
these rules across the data landscape can be systematically rooted out.
Figure 6 : User Interface for establishing controls based on Business Rules.
Step 7: Customer Data Stewardship
In the Data Stewardship layer, the software tries to establish accountability for data
quality, by associating one or more people with responsibility for Customer Master
Data.
The Global IDs software allows assignment of customer data domains with a RACI
matrix, showing who is responsible, accountable, consulted and informed. These data
stewards become responsible for compliance with PII policies.
The Data Stewards in the Master Data Governance team are responsible for
Creating controls
Generating compliance reports and compliance metrics
Running periodic audits
With an accurate and comprehensive understanding of the distribution of Customer
Master data in the enterprise, the governance team can take steps to ensure that
Customer Master Data can be trusted to have accurate information
Customer Master Data has gone through systematic checks and balances
Customer Master Data quality is being measured periodically
Customer Master Data can be integrated in a Master Data Hub.
Figure 7 : RACI Matrix for Database Schemas
Step 8: Customer Data Monitoring
Since the data in enterprise databases is continuously changing, the quality of this data
can degrade over time. The software helps prevent this degradation by continuously
monitoring the Customer Master Data in the data landscape.
By periodically auditing the data environment, and ensuring that the compliance status of
each data source is improving over time, an effective governance process can be put in
place.
The software can be used to periodically re-run the scans and generate compliance
metrics for Customer Master Data in each database schema.
As additional new environments are added to the enterprise data landscape, those
environments can be included in the compliance monitoring process.
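The monitoring loop can be sketched as two small functions: one computes a compliance rate for a set of records against a rule base, and the other compares the rates from the current scan with those from the previous scan to flag schemas whose compliance has dropped. The function names, schema names and rates below are hypothetical.

```python
def compliance_rate(records, rules) -> float:
    """Fraction of records that satisfy every rule (1.0 for an empty data set)."""
    if not records:
        return 1.0
    passed = sum(1 for record in records if all(rule(record) for rule in rules))
    return passed / len(records)


def find_regressions(previous, current, tolerance=0.0):
    """previous/current: {schema: compliance rate}. Return schemas whose rate has dropped."""
    return sorted(schema for schema, rate in current.items()
                  if schema in previous and rate < previous[schema] - tolerance)


rate = compliance_rate(
    [{"CustomerID": "C1"}, {"CustomerID": ""}],
    [lambda record: record["CustomerID"] != ""],
)

previous_scan = {"CRM": 0.95, "BILLING": 0.90}
current_scan = {"CRM": 0.97, "BILLING": 0.85, "MARKETING": 0.70}
regressions = find_regressions(previous_scan, current_scan)
print(rate, regressions)
```

A newly added environment, such as the MARKETING schema in this example, has no earlier baseline, so it is measured in the current scan and compared from the next scan onward.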
Figure 8 : Monitor of Record Count Changes in Customer Tables
Step 9: Customer Data Governance Portal Generation
To make the governance of Customer Master Data intuitive and user-friendly, the
Global IDs software generates a Customer Master Data Governance portal. The web-
based portal allows data stewards and business users to carry out their operational
activities easily.
The following activities can be supported through the Customer Data Governance
portal:
1. An Enterprise View of Customer Master Data (showing the distribution of customer master data across all databases within the organization)
2. Enterprise Search (providing the ability to search for customer master data within the organization)
3. Audits (showing the conformance of customer master data with data quality rules)
4. Business Rules (showing the business rules that pertain to the customer master data object)
5. Controls (showing active controls, violations, exception conditions, etc.)
6. Analytical Reports (showing multiple reports with the results of different types of analysis on customer master data)
7. Metrics (showing business metrics on customer master data: growth, change, quality improvements, etc.)
8. Policies and Procedures (showing the documented policies of the organization, and actual conformance)
9. Domain Analysis: Attributes, Code Tables, Subtypes (analysis of each attribute, code set, and subtype belonging to the customer master data object)
10. Hierarchies (showing business-relevant hierarchies built on the customer master data object)
Figure 9 : Portal Designer Module showing configuration of Data Governance Portal
Step 10: Customer Data Governance Portal Customization
Once generated, the Customer Master Data Governance portal can be customized so
that data stewards and business users can carry out their operational activities
easily.
The following customization activities can be supported:
1. Portlet Creation
2. Graph Generation
3. Analytical Reports
4. Compliance Metrics
Figure 10 : Sample of auto-generated Customer Data Governance Portal, awaiting
customization.
Use Cases
Why govern Customer Data?
In most organizations, Customer Data is the key to all revenue-generating activity.
A systematic governance process for customer-related data assets may be able to
improve the quality and reliability of customer data across the enterprise data
landscape.
Some possible use cases are described below.
Use Case 1: Creating a 360-degree view of customer relationships
Many organizations aspire to create a 360-degree view of their customer relationships.
In other words, there is a desire to gain an understanding of all interactions with the
customer, thereby generating a “holistic” view. Gaining this view allows the
organization to understand how to improve each customer relationship, thereby
increasing both revenue and profit.
Given that customer data is distributed across a large number of applications and
databases, and given the heterogeneous nature of the enterprise database environment,
it is likely that there are many discrepancies in the customer data across the databases.
The pattern mining / domain classification approach can be used to map customer data
across the enterprise data landscape, and create systematic governance processes on
this data, thereby generating this holistic view.
The equivalent steps for this effort would be:
1. Discovery of Enterprise Databases
2. Profiling of Structured and Unstructured Data for Customer IDs
3. Domain Classification of Customer IDs
4. Object Mapping of all Customer identifiers
5. Identification of Customer Data Stewards: RACI matrix creation
6. Customer Data Quality Monitoring and Governance
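To make steps 2 and 3 more concrete, the sketch below profiles sampled column values against a pattern and classifies a column as holding customer IDs when most of its non-empty values match. This is a simplified stand-in for pattern mining and domain classification, not the algorithm used by the Global IDs software. The ID format (`CUST-` followed by six digits), the column names, and the 0.9 threshold are all assumptions.

```python
import re

# Hypothetical customer ID format, assumed for illustration.
CUSTOMER_ID_PATTERN = re.compile(r"CUST-\d{6}")


def match_ratio(values, pattern):
    """Fraction of non-empty sampled values that fully match the pattern."""
    non_empty = [v for v in values if v]
    if not non_empty:
        return 0.0
    hits = sum(1 for v in non_empty if pattern.fullmatch(v))
    return hits / len(non_empty)


def classify_columns(samples, pattern, threshold=0.9):
    """Return the columns whose sampled values look like customer IDs."""
    return sorted(
        col for col, vals in samples.items()
        if match_ratio(vals, pattern) >= threshold
    )


# Sampled values keyed by (database, table, column); all names are made up.
samples = {
    ("crm", "customers", "cust_id"): ["CUST-000123", "CUST-004567", "CUST-000999"],
    ("billing", "invoices", "payer_ref"): ["CUST-000123", "CUST-000999", None, "CUST-004567"],
    ("billing", "invoices", "amount"): ["19.99", "5.00", "120.50"],
}

print(classify_columns(samples, CUSTOMER_ID_PATTERN))
```

Columns classified this way, such as `payer_ref` here, are candidates for object mapping even when their names give no hint that they carry customer identifiers.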
Use Case 2: Customer Master Data Integration
Organizations need a trusted and reliable source of customer data.
In reality, most organizations have multiple sources of customer data in different
databases. These databases are usually out of sync, and each database contains
information that is partially correct. As a result, it is difficult to maintain a “master” file that serves as the “system of record”.
Many factors make it difficult for most organizations to maintain accurate and timely
customer information:
continuous mobility (address changes, contact phone number changes)
continuous variation in identification (name changes, email changes)
occasional variation in status (changes in creditworthiness)
variation in personal circumstances (financial status, household status)
Unless the organization is continuously enriching its databases with these types of
information, discrepancies between real-world data and the stale data in its
databases, as well as discrepancies across databases, will be inevitable.
The master data governance approach, coupled with a data enrichment strategy, can be
used to create a Customer Master Data Hub and to govern high-quality customer data
systematically.
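A minimal sketch of detecting discrepancies across databases is shown below. It assumes that customer records have already been matched to a shared customer ID (the result of object mapping), and it compares selected fields source by source. The source names, field names, and sample records are hypothetical.

```python
def find_discrepancies(records_by_source, fields):
    """Compare fields of the same customer across sources.

    records_by_source: {source: {customer_id: {field: value}}}
    Returns {(customer_id, field): {source: value}} for every field
    whose value differs between the sources that hold that customer.
    """
    all_ids = set()
    for records in records_by_source.values():
        all_ids.update(records)

    discrepancies = {}
    for cid in sorted(all_ids):
        for field in fields:
            values = {
                src: records[cid].get(field)
                for src, records in records_by_source.items()
                if cid in records
            }
            if len(set(values.values())) > 1:
                discrepancies[(cid, field)] = values
    return discrepancies


# Hypothetical records from two out-of-sync databases.
sources = {
    "crm": {
        "C1": {"email": "a@example.com", "phone": "555-0100"},
        "C2": {"email": "b@example.com", "phone": "555-0101"},
    },
    "billing": {
        "C1": {"email": "a@example.com", "phone": "555-0199"},
        "C2": {"email": "b@example.com", "phone": "555-0101"},
    },
}

print(find_discrepancies(sources, ["email", "phone"]))
```

The output lists only the conflicting values (here, the phone number of customer `C1`). Deciding which source holds the correct value is a separate stewardship or enrichment decision.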
Use Case 3: Customer Data Quality Measurement and Audits
Given the importance of customer data, it is critical that all such data be
systematically audited on a periodic basis.
To ensure that the quality and accuracy of the data is continuously improving, the
audits should be conducted frequently, and the quality metrics should be
monitored periodically by the data stewards. A continuously running background
process that systematically audits the quality of customer data is
an effective, low-cost way of achieving these objectives.
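One pass of such an audit might look like the sketch below, which checks each customer record against a small set of data quality rules and derives a single quality score. The two rules, the record layout, and the scoring formula (the share of rule checks that pass) are assumptions chosen for illustration. A background process would repeat this pass on a schedule and record the score over time.

```python
import re

# Hypothetical data quality rules: rule name -> check on one record.
RULES = {
    "email_has_at_sign": lambda rec: "@" in (rec.get("email") or ""),
    "postal_code_is_5_digits": lambda rec: re.fullmatch(
        r"\d{5}", rec.get("postal_code") or ""
    ) is not None,
}


def audit(records, rules):
    """Return {rule name: [customer IDs that violate the rule]}."""
    violations = {name: [] for name in rules}
    for rec in records:
        for name, check in rules.items():
            if not check(rec):
                violations[name].append(rec["customer_id"])
    return violations


def quality_score(records, rules):
    """Share of (record, rule) checks that pass; 1.0 if nothing was checked."""
    total_checks = len(records) * len(rules)
    if total_checks == 0:
        return 1.0
    failed = sum(len(ids) for ids in audit(records, rules).values())
    return 1 - failed / total_checks


# Made-up sample records.
records = [
    {"customer_id": "C1", "email": "a@example.com", "postal_code": "08542"},
    {"customer_id": "C2", "email": "not-an-email", "postal_code": "08542"},
    {"customer_id": "C3", "email": "c@example.com", "postal_code": "8542"},
    {"customer_id": "C4", "email": "d@example.com", "postal_code": "10001"},
]

print(audit(records, RULES))
print(quality_score(records, RULES))
```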
Use Case 4: Customer Data Enrichment for improved marketing
Marketing data is often customer-centric, and needs to be enriched with demographic
information to improve segmentation strategies.
The ability to automatically enrich customer information with external data can
empower an organization to be better aware of the customer's circumstances, and
target marketing efforts with greater focus.
The approach described here, along with automated customer data enrichment
strategies, can be used to improve the effectiveness of marketing campaigns.
Use Case 5: Mergers and Acquisitions
Most large organizations go through periodic merger and acquisition activity. As a
result, the customer data landscape is continuously changing and new sets of customer
data must be integrated into existing master data repositories.
The methodology in this white paper allows organizations to adapt their customer
data governance framework to support M&A activity. For example, it can greatly
reduce the cost of customer data integration between the merged companies,
while increasing the speed of Customer Data Integration (CDI) projects.
Global IDs’ Technology
Global IDs Platform
Global IDs provides a unique software platform to support the needs of
large organizations that have complex IT environments. The software provides
a broad functional framework to address different types of data management
projects:
Understand and integrate data at an enterprise level
Provide information-on-demand from across the enterprise
Automate the cleansing and validation of corporate data
Address the complexity of global data (different languages / standards)
Automate the stewardship and governance of corporate data
Some of the benefits associated with the Global IDs platform are described below.
Enterprise Scalability
Since the software performs complex workflows with a high degree of automation, we
are agnostic about the size or complexity of the enterprise IT environment. As a result,
we have the
Scalability
Our software uses hands-off computation processes to handle complex
workflows that can span multiple data sources. From the software perspective,
there is no difference between a single data source and 100 data sources.
Robust Performance
The software platform uses Massively Parallel Processing (MPP).
Performance can scale with hardware availability (CPU / memory resources).
Fault Tolerance
The software has been designed to exist in a distributed computing
environment, and has persistence built into the platform.
Security
Global IDs software prioritizes compliance with enterprise security standards.
Company Overview
Global IDs was founded in 2001.
The company was created specifically to address the problem of large-scale information
integration. Our mission is to develop agent-based integration solutions that can
accomplish integration across hundreds of systems.
The company created the world’s first data integration software that employs
intelligent mobile agents for large-scale integration tasks. Our solutions make it
possible for large global companies to address the integration tasks that are required to
improve business performance analysis.
Our client portfolio includes major Fortune 2000 companies.
Contact Information
For further details on Global IDs products and services, please contact
us via email or telephone at:
Global IDs
184 Nassau Street
Princeton, NJ 08542, USA
(646) 201-9498
www.globalids.com