
Master Data Governance

A systematic approach to managing enterprise data assets

White Paper

April 2014

Contents

Executive Summary

The Problem

Master Data Governance Approach

Master Data Governance (Example)

Use Cases

Global IDs’ Technology

Company Overview

Contact Information

© Global IDs Inc. 2001-2014


Executive Summary

Master Data Governance

Every year, corporations spend large sums of money on their IT environments. Much

of that money is spent on making sure that each business unit has the data to function

properly.

In spite of these large expenditures, most CIOs will readily admit that their enterprise environments lack any form of data governance. Other than financial data, there are few explicit guarantees that enterprise data is reliable and trustworthy.

Can any business operate without formal checks and balances?

If an organization's data assets are critical to its long-term success, shouldn't there be a systematic set of controls and audits? Shouldn't there be adequate checks and balances to ensure that the data that exists within the enterprise is reliable? Shouldn't the CIO be assuring executive management that the information within the enterprise has gone through rigorous quality control tests?

As it turns out, most organizations are satisfied with "good enough" data. As long as

business operations are functioning sufficiently well and the number of "defects" is not

too high (i.e. can be handled by the IT staff), there is not much cause for worry.

This lack of systematic governance has insidious effects that are not immediately

observable. Over time, these defects accumulate and decay kicks in. The entire

environment has to be continuously fixed with a variety of "band-aids". The "quality

debt" that has been accumulated over many years is eventually repaid through

expensive remediation projects or increasing IT staff size.

This white paper suggests an alternative path.

It describes a systematic way of managing enterprise information that can dramatically

reduce the "quality debt", by building in automated quality controls in all data

management operations. This approach can reduce costs while increasing the

efficiency of the IT environment.

In the following sections we will describe:

1. How the complexity of IT systems creates growth in costs.

2. Why master data governance is key.

3. A structured methodology for master data governance.

4. The quantifiable benefits that can be derived.



The Problem

"Big Data" = "Huge Costs"

Large organizations, today, have hundreds or thousands of databases and applications.

Creating business value from this large amount of data is no longer an easy task.

The advent of "Big Data" -- the deluge of information originating outside the bounds of the organization -- represents both an opportunity and a risk for large organizations. While these new types of data are potentially valuable to business, the cost of processing this information can be exorbitant.

Inundated by this data deluge, organizations are spending increasing amounts of

money to manage their data in traditional ways -- asking people to organize the

information, and relate it to business value. As the cost of data management keeps

going up, business managers need to understand whether there are alternative

approaches to managing data.

Cost is proportional to the lack of data transparency

One of the primary contributors to information management cost is the lack of data

transparency in large organizations. Put simply, no-one understands the data assets of

the organization at an enterprise level, because all the data is opaque and "locked

away".

This lack of transparency manifests itself in a variety of ways:

1. Costs associated with data integration

Since the systems that need to be integrated were not developed with an enterprise view in mind, bringing these systems together is costly and time-consuming.

2. Costs of data migration

Moving data from legacy systems (e.g. mainframes) to modern open systems can be very difficult, because the lack of transparency leads to a lack of standardized data across enterprise databases.

3. Significant costs of maintaining data quality

The lack of visibility into enterprise data makes the measurement and assurance of data quality difficult and expensive.

Some of the above costs can be avoided, if the data environment is rendered

transparent. If transparency can be created across the data landscape, without

compromising on data security or data privacy, many redundancies can be detected

and exposed. This, in turn, can lead to further identification of inefficiencies in

business processes.



Cost is proportional to complexity

During the evolution of organizations over many decades, the complexity of IT

environments has grown exponentially. With periodic mergers and acquisitions, there

is a trend towards further increase in complexity.

With increasing complexity, it becomes increasingly difficult to leverage operational, financial, sales, marketing and external data. Turning raw data into business insight can become time-consuming and painstaking.

The primary contributors to large-scale complexity are:

Heterogeneity: The degree of differentiation between enterprise databases.

Environment Heterogeneity: Differences in hardware platforms, operating system environments, network protocols, and database management platforms.

Volume: Data volume is increasing exponentially. To further complicate the matter, companies are inundated with different types of data every day, as the data (both structured and unstructured) grows.

Data Complexity: Complexity due to the number of tables (thousands), number of schemas, customization, errors etc.

Metadata Complexity: Differences in data models, data types, schemas, semantics, standards and language.

Systematic Governance is Key to reducing cost

Most efforts to rein in the cost of IT environments take the form of targeted cost reduction projects, either through headcount reduction or the decommissioning of aging systems. While this targeted approach has the desired short-term impact, it rarely fixes the long-term problem: increasing complexity brings increasing costs.

Just as systematic governance of financial data has created transparency and

accountability, while preventing inefficiency and waste, we believe that systematic

governance of core business data can yield a plethora of benefits.

In this paper, we suggest that a systematic governance program can reduce the long-term costs of managing complex data environments. Our focus will be Master Data Governance, i.e. the systematic and automated management of critical business data assets ("Master Data").



Master Data Governance Approach

What is Master Data?

Master Data describes the core data assets on which the business of an organization

runs.

For commercial organizations, master data includes data on customers, products, employees, suppliers, locations, legal entities and about 100 other critical data objects.

For non-commercial institutions, master data includes data about people,

organizations, partners, vendors, regulators etc.

Our approach

Global IDs uses a number of specialized applications to automate the analysis of the master data environment inside large organizations.

We use a systematic and automated approach to:

a) Create transparency within the data environment

b) Create quality assurance monitors on the data environment

c) Establish a master data governance portal for data stewards

Stage 1: Transparency Processing

Global IDs software first uses a four-step process to perform core analysis on the data that is found inside an organization’s databases.

Step 1: Data Discovery

The software scans repositories of data in the systems environment.

Step 2: Data Profiling

The software analyzes the data in both databases and content repositories.

Step 3: Data Classification

The software determines all the patterns that exist in the environment, and categorizes the data.

Step 4: Data Mapping

The software connects all similar data objects together, creating enterprise

maps of business objects (e.g. Customer, Product, Employee, Vendor etc.)

Stage 2: Quality Processing

Next, the Global IDs software uses a four-step process to perform core quality analysis on the data that is found inside each database that the software has encountered.

Step 5: Data Verification

The software applies quality rules against known categories of data domains.

Step 6: Data Validation

The software applies multi-column business rules to validate the data.

Step 7: Data Stewardship

The software establishes a RACI matrix, to determine who is responsible and

accountable for the quality of master data.

Step 8: Data Monitoring

The software continuously monitors the data landscape to ensure that master

data conforms to data integrity standards across the whole enterprise.

Stage 3: Portal Generation

Global IDs software then generates a web portal for each master data object, so that

data stewards can interact with core business data.

Step 9: Web-Portal Generation

A portal is generated for each of the master data objects that have been automatically mapped in Step 4.

Step 10: Web-Portal Customization

Since the individual needs of each organization are different, the software

is customized to meet the needs of each individual data steward.

Stage 4: Portal Operations

To make the Portal valuable to the Data Steward for daily operational use, the Master Data Governance Portal provides the following functionality:

1. An Enterprise View of each Master Data Object (showing the distribution of master data across all databases within the organization)

2. Enterprise Search (providing the ability to search for specific master data within the organization)

3. Audits (showing the conformance of master data with data quality rules)

4. Business Rules (showing the business rules that pertain to the master data object)

5. Controls (showing active controls, violations, exception conditions etc.)

6. Analytical Reports (showing multiple reports with the results of different types of analysis on master data)

7. Metrics (showing business metrics on master data - growth, change, quality improvements, etc.)

8. Policies and Procedures (showing the documented policies of the organization, and actual conformance)

9. Domain Analysis: Attributes, Code Tables, Subtypes (analysis of each attribute, code set, and subtypes belonging to the master data object)

10. Hierarchies (showing business-relevant hierarchies built on the master data object)

The above list shows the default functionality that is available in the Global IDs

portal. The software permits customization and can be tailored to individual users.

Results

Using the above approach, each business data object that is important to the functioning of the organization can be systematically governed, and business data stewards can ensure that conformance to quality controls is being continuously monitored.



Some examples of master data objects (in multiple industries) that could be subject to systematic Master Data Governance are shown below.

Educational Publishing Company

1. Customer

2. Product

3. Sales Representatives

4. Author

5. Printer

6. Bookstores

7. ...

Financial Services Company

1. Client

2. Account

3. Security

4. Counter-Party

5. Registered Representatives

6. ...

Healthcare Company

1. Member

2. Provider

3. Hospital

4. Policy

5. Claim

6. ...

Chemicals Manufacturing Company

1. Customer

2. Product

3. Distributor

4. Regulator

5. Supplier

6. Plant

7. ...

Pharmaceutical Company

1. Customer

2. Product

3. Patient

4. Regulatory Agency

5. Pharmacy

6. Doctors

7. ...

The Global IDs software comes pre-loaded with multiple master data objects, and

hundreds of known master data domains. The software can be extended to other

master data objects that are relevant to a specific organization.



Master Data Governance (Example)

Scenario

The details behind the approach can be described by applying the approach to a real-world problem.

Most organizations make it a priority to organize their customer master data, since it is

directly related to their revenue stream. The organization's core applications maintain

accurate records of the customer data, by ensuring all the following types of data are

verified by customer support representatives.

Customer Master Data

1. Customer Identifier

2. Customer Name

3. Customer Shipping Address

4. Customer Billing Address

5. Customer Phone Number

6. Customer Email Address

7. Customer Website Address

8. Active Customer Flag

9. ...

Initially, when the organization is small, maintaining this type of customer data is

easy. As the organization evolves, entropy sets in.

The evolution of Customer Data

As the organization develops, either through growth or acquisition, the number of applications and databases that hold customer information increases, and the “customer data landscape” becomes larger and more complex. Over time, the sheer volume and complexity of the data cause the databases to fall out of sync, and numerous inaccuracies start creeping into the customer master data.

In the case of large companies that have evolved over decades, the problems are even

more pronounced. Reasons include:

1. There are a large number of databases, and no-one is certain where all the

customer master data resides.

2. The databases are spread out geographically, with limited oversight or

control. Different country databases could be using different identifiers for

the same customer.



3. The customer data entry systems are diverse, and data is often input without

ensuring that the data is accurate or complete.

4. Due to data flow across different databases, across different interfaces, and

with intermediate transformations, it becomes increasingly difficult to trace

which downstream systems are consuming the customer master data.

5. Knowledge of customer master data is dispersed across different business

units, and no-one is accountable for the customer data, at the enterprise level.

Due to these realities, there are multiple versions of customer master data across the organization, but no “single version of the truth”. With the underlying databases changing all the time, no-one can understand the diversity of customer master data at the enterprise level. In the majority of organizations, no single person is accountable for the customer data.

Once the process of maintaining centralized customer data becomes costly and time-consuming, the organization distributes the responsibility, which creates multiple silos of customer data that do not have consistent checks and balances. Each silo starts diverging, and inaccuracies start creeping into the customer data environment.

The Business Impact of “Bad” Customer Data

The divergence in the quality and integrity of customer data initiates a series of events that annoy the customer and degrade the relationship. Common events that lead to customer dissatisfaction include:

Existing customers are sent invitations to become new customers.
(The marketing databases have not been synchronized with the active customer database.)

The same customer is solicited multiple times in a single mailing campaign.
(The marketing database has not been cleansed of duplicates.)

The number of errors in shipping and billing increases, due to inaccuracies in address information.
(Data quality errors in customer information systems.)

Even after a customer corrects inaccurate data, the errors persist.
(Multiple customer data silos with redundant information.)

When customers start noticing problems with their information, business problems start to manifest themselves. Each encounter that leaves the customer with a negative perception degrades the customer relationship and eventually leads to the loss of the customer to a competitor.

How to improve customer data quality

A systematic Customer Master Data Governance Program can prevent degradation of

customer data quality by putting controls around enterprise customer data.

Given the enormity of the tasks related to maintaining enterprise customer data, manual approaches to enterprise data quality are often infeasible. It is important to emphasize that the controls must be automated through software that continuously monitors the data.

A software-centric methodology to Customer Master Data Governance can

systematically place quality controls on the data environment. The step-by-step

methodology that is used by Global IDs is described below.



Step 1: Customer Data Discovery

In the first step, Data Discovery software systematically "scans" the environment, to create an inventory of all data and information assets.

Since most customer data resides in structured databases (i.e. relational databases), a structured data scan can be performed on enterprise databases.

Structured Database Scan

The software is used to connect to relational databases and extract the metadata from the database.

Scanning is supported for most relational database platforms, such as:

Oracle

DB2 on Unix

DB2 on Mainframe

MS SQL Server

MySQL

Sybase

Teradata

Apache Derby

(other relational databases with Type 4 JDBC drivers can also be supported)

For relational databases the scanning processes can be parallelized, and

scans can be completed in hours or days instead of weeks or months.

It is important to conduct the scans such that:

The load on the database is minimal, and there is no performance impact on the database.

Data security policies are adhered to. Scanning is only performed for those tables where authorization has been granted.

There is no change to the database after the scan is completed. This is achieved through READ-ONLY ACCESS permission.
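As a rough illustration of a read-only metadata scan, the sketch below inventories tables and columns without touching the data. It is not the Global IDs implementation: it assumes an in-memory SQLite database as a stand-in for an enterprise schema, and the function name `scan_metadata` and the demo tables are hypothetical.

```python
import sqlite3

def scan_metadata(conn):
    """Return {table_name: [(column_name, declared_type), ...]} without modifying data."""
    # SQLite-specific guard (an assumption of this sketch): reject writes on this connection.
    conn.execute("PRAGMA query_only = ON")
    inventory = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        inventory[table] = [(col[1], col[2]) for col in columns]
    return inventory

# Hypothetical demo database standing in for one enterprise schema.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE customer (cust_id TEXT, cust_name TEXT)")
demo.execute("CREATE TABLE orders (order_id INTEGER, cust_id TEXT)")
inventory = scan_metadata(demo)
```

On other platforms the same idea would normally rely on a database account that has been granted read-only permissions, rather than on a connection-level setting.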

Figure 1 : Shows Database Instances and Schema in Data Landscape



Step 2: Customer Data Profiling

In the second step, software can be used to systematically "profile" and analyze the

data that is present in the enterprise data landscape to understand the patterns that are

present in the data.

Profiling is performed for all accessible

Schemas

Tables

Columns

Clusters

This activity results in a detailed analysis of the content of each database and provides

a deep understanding of the data environment. It allows the user to identify all the

customer data tables in the databases that have been scanned.

Multiple types of profiling activities should be performed.

Profiling Type            Value of Analysis
Pattern Mining            Find pattern deviations
Domain Profiling          Recognize domains. Understand quality rules. Identify deviations.
Relationship Profiling    Understand undocumented relationships. Identify orphans.
ID Profiling              Recognize identifier columns and composite keys
Statistical Profiling     Find statistical outliers
Hierarchy Profiling       Find hierarchical relationships across data columns
Distribution Profiling    Find distributions of values
Sub-Type Profiling        Recognize subtypes and subtype descriptions. Find unusual subtypes.
Sub-Table Profiling       Partition table by subtype. Profile each sub-table. Compare profiles.
Quality Profiling         Compute quality metrics
Numeric Profiling         Compute standard deviations. Find outliers.
Time Series Profiling     Compute time series metrics. Identify outliers.
Duplicate Profiling       Understand record duplication (multiple matching algorithms)
Dependency Profiling      Understand dependencies between columns in a table
Cluster Profiling         Profile subsets of columns (clusters)
Dynamic Profiling         Profile current data. Compare results with previous snapshots.
Records Profiling         Analyze which records satisfy quality audits (pass/fail)
File Profiling            Profile external data files
Code Table Profiling      Recognize codes and code descriptions. Find unusual codes.

Some of the key types of data profiling are described below

1. Pattern Mining

The software is used to analyze each value in each column of data and find the patterns associated with the column.



For example, pattern mining of a Customer Phone Number column may give the following patterns:

(nnn)nnn-nnnn

+1(nnn)nnn-nnnn

1nnnnnnnnnn

nnn.nnn.nnnn

...

Erroneous data can be identified by the presence of erroneous patterns.
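The idea of pattern mining can be sketched in a few lines. The example below is a simplified illustration, not the vendor's algorithm: it assumes every digit is generalized to "n" and every letter to "a" (so a country code such as "+1" becomes "+n"), and the names `value_pattern` and `mine_patterns` are hypothetical.

```python
from collections import Counter

def value_pattern(value):
    """Replace every digit with 'n' and every letter with 'a'; keep other characters."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("n")
        elif ch.isalpha():
            out.append("a")
        else:
            out.append(ch)
    return "".join(out)

def mine_patterns(values):
    """Count how often each generalized pattern occurs in a column."""
    return Counter(value_pattern(v) for v in values)

# Hypothetical phone number column with one clearly erroneous entry.
phones = ["(212)555-0100", "(646)555-0199", "+1(212)555-0100", "212.555.0100", "CALL ME"]
patterns = mine_patterns(phones)
```

Patterns that occur rarely, or that contain letters where only digits are expected, are candidates for review.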

2. Domain Profiling

The software is used to analyze all the values in each column of data, to determine whether the column belongs to a known data domain.

Domain profiling results in the recognition of many of the columns in the

database tables. The types of domains that can be recognized include:

Global Domains
(e.g. Names, Addresses, Phone Numbers, Emails, URLs, Dates, Countries, Zip Codes etc.)

Business Domains
(identifiers associated with Customers, Products, Employees etc.)

Industry Domains
(domains that are specific to a particular industry, e.g. Healthcare:Patient_IDs, Healthcare:Provider_IDs etc.)

Global IDs comes with hundreds of domains predefined inside the software.
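One simple way to think about domain recognition is to test what share of a column's values match a rule for each known domain. The sketch below assumes three deliberately simplified regular expressions and an 80% threshold; the rules, the threshold, and the name `recognize_domain` are illustrative assumptions, not the predefined domains shipped with the software.

```python
import re

# Hypothetical, simplified domain rules.
DOMAIN_RULES = {
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    "us_zip": re.compile(r"[0-9]{5}(-[0-9]{4})?"),
    "url": re.compile(r"https?://\S+"),
}

def recognize_domain(values, threshold=0.8):
    """Return the best-matching domain if enough non-empty values conform, else None."""
    non_empty = [v for v in values if v]
    if not non_empty:
        return None
    best_domain, best_share = None, 0.0
    for domain, rule in DOMAIN_RULES.items():
        share = sum(1 for v in non_empty if rule.fullmatch(v)) / len(non_empty)
        if share > best_share:
            best_domain, best_share = domain, share
    return best_domain if best_share >= threshold else None

# Hypothetical column: five valid email addresses and one bad value.
column = ["a@example.com", "b@example.org", "c@example.net",
          "d@example.com", "e@example.org", "not-an-email"]
domain = recognize_domain(column)
```

Values that fail the rule of the recognized domain (here, "not-an-email") are the deviations that domain profiling is meant to surface.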

3. Relationship Profiling

The software is used to understand the relationships of the tables to each other.

The profiling results can be used to understand

Explicit Relationships (primary key – foreign key constraints)

Implicit Relationships (undefined relationships across key fields)
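An implicit relationship can be suspected when most values in one column also occur in a key column of another table. The sketch below measures that containment and lists orphan values; it is an assumed heuristic for illustration, and the function names and sample values are hypothetical.

```python
def containment(child_values, parent_values):
    """Share of distinct non-null child values that also occur in the parent column."""
    child = {v for v in child_values if v is not None}
    parent = {v for v in parent_values if v is not None}
    if not child:
        return 0.0
    return len(child & parent) / len(child)

def find_orphans(child_values, parent_values):
    """Distinct non-null child values with no matching parent value."""
    parent = set(parent_values)
    return sorted({v for v in child_values if v is not None and v not in parent})

# Hypothetical columns: a customer key and a column in an orders table.
customer_ids = ["C1", "C2", "C3", "C4"]
order_customer_ids = ["C1", "C1", "C3", "C9", None]
score = containment(order_customer_ids, customer_ids)
orphans = find_orphans(order_customer_ids, customer_ids)
```

A high containment score without a declared foreign key suggests an undocumented relationship, and the orphan list shows which records break it.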

4. ID Profiling

The software is used to analyze the columns within each table and detect the identifiers that are present.

Many types of Identifiers can be detected.

Primary Keys

Natural Keys (non-unique)

Composite Key Combinations

Surrogate Keys
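Identifier detection can be approximated by searching for columns, or small combinations of columns, whose values are unique and never null. The brute-force sketch below is an assumption made for illustration (the name `candidate_keys` and the sample rows are hypothetical) and would need sampling or pruning to scale to large tables.

```python
from itertools import combinations

def candidate_keys(rows, columns, max_width=2):
    """Return column combinations (up to max_width wide) that are unique and non-null."""
    found = []
    for width in range(1, max_width + 1):
        for combo in combinations(columns, width):
            # Skip supersets of combinations already known to be unique.
            if any(set(key) <= set(combo) for key in found):
                continue
            seen = set()
            unique = True
            for row in rows:
                key = tuple(row[c] for c in combo)
                if None in key or key in seen:
                    unique = False
                    break
                seen.add(key)
            if unique:
                found.append(combo)
    return found

# Hypothetical customer rows.
rows = [
    {"cust_id": 1, "country": "US", "local_no": "A1", "name": "Acme"},
    {"cust_id": 2, "country": "US", "local_no": "A2", "name": "Acme"},
    {"cust_id": 3, "country": "DE", "local_no": "A1", "name": "Bolt"},
]
keys = candidate_keys(rows, ["cust_id", "country", "local_no", "name"])
```

On three rows some combinations are unique only by coincidence, which is why results of this kind are candidates for review rather than confirmed keys.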

5. Statistical Profiling

The software is used to analyze the columns and compute statistical metrics

related to the data.

For descriptive columns, a variety of completeness metrics are computed.

Number of Nulls

Number of Blanks

Number of default values (NA, TBD, N/A etc.)

Number of duplicates



For numeric columns, a variety of statistics are computed.

Minimum, Maximum, Average

Standard Deviation

Most frequent occurrence
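The completeness and numeric metrics listed above are straightforward to compute. The following sketch uses only the Python standard library; the function names, the set of default markers, and the choice of population standard deviation are assumptions of this example.

```python
import statistics
from collections import Counter

# Placeholder values named in the text, compared case-insensitively.
DEFAULT_MARKERS = {"NA", "N/A", "TBD"}

def completeness_metrics(values):
    """Completeness metrics for a descriptive (text) column."""
    non_null = [v for v in values if v is not None]
    counts = Counter(non_null)
    return {
        "nulls": len(values) - len(non_null),
        "blanks": sum(1 for v in non_null if v.strip() == ""),
        "defaults": sum(1 for v in non_null if v.strip().upper() in DEFAULT_MARKERS),
        "duplicates": sum(c - 1 for c in counts.values() if c > 1),
    }

def numeric_metrics(values):
    """Basic statistics for a numeric column."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "stdev": statistics.pstdev(values),
        "most_frequent": Counter(values).most_common(1)[0][0],
    }

# Hypothetical sample columns.
names = ["Acme", None, "", "N/A", "Acme", "tbd", "Bolt"]
name_profile = completeness_metrics(names)
balance_profile = numeric_metrics([10, 20, 20, 30])
```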

6. Hierarchy Profiling

The software is used to analyze the data and determine whether there are hierarchical relationships present in the data.

A variety of hierarchy types can be detected.

Subtype-based hierarchies

Recursion-based hierarchies

Rule-based hierarchies

7. Distribution Profiling

The software is used to analyze the frequencies of values within columns to come up with a distribution.

Different types of distributions can be detected.

Length Distributions (within columns that contain text)

Value Distributions (within columns that contain duplicate values)

Numeric Clusters (within columns that contain numeric data)

The distributions can be used to detect outliers in the data sets, locating potential

data quality errors.

8. Quality Profiling

The software is used to determine quality related metrics.

Completeness

Duplication

Consistency

Conformity

Integrity

Trust

9. Dependency Profiling

In this type of profiling, the dependencies between columns are determined from the data. This analysis is useful to find common dependencies across columns, and then find deviations from the identified dependencies.
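A dependency between two columns can be estimated by checking how consistently one column's value predicts the other's. The sketch below is one possible approach under stated assumptions: it treats the most common dependent value for each determinant value as the expected one, and all names and sample rows are hypothetical.

```python
from collections import Counter, defaultdict

def co_occurrences(rows, determinant, dependent):
    """For each determinant value, count the dependent values seen with it."""
    groups = defaultdict(Counter)
    for row in rows:
        groups[row[determinant]][row[dependent]] += 1
    return groups

def dependency_strength(rows, determinant, dependent):
    """Fraction of rows that agree with the most common dependent value."""
    groups = co_occurrences(rows, determinant, dependent)
    agreeing = sum(c.most_common(1)[0][1] for c in groups.values())
    return agreeing / len(rows)

def dependency_deviations(rows, determinant, dependent):
    """Rows whose dependent value differs from the usual one for their determinant."""
    groups = co_occurrences(rows, determinant, dependent)
    usual = {key: c.most_common(1)[0][0] for key, c in groups.items()}
    return [row for row in rows if row[dependent] != usual[row[determinant]]]

# Hypothetical address rows: zip code should determine city.
rows = (
    [{"zip": "10001", "city": "New York"}] * 3
    + [{"zip": "10001", "city": "Newark"}]
    + [{"zip": "94105", "city": "San Francisco"}] * 2
)
strength = dependency_strength(rows, "zip", "city")
deviants = dependency_deviations(rows, "zip", "city")
```

A strength close to 1 indicates a likely dependency, and the few rows that deviate from it are candidates for data quality review.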

10. Subtype Profiling

The software is used to determine which subtypes are present in the data.

Subtypes include fields like

CUSTOMER_TYPE

PRODUCT_TYPE

PRODUCT_SUBTYPE

11. Numeric Profiling

The software is used to analyze the values of numeric columns, and determine the following metrics.



Whether there are any numeric outliers in the data. Outliers are computed

on the basis of standard deviations from the mean.

Whether the numeric data is associated with any units.

Whether there are dependencies across numeric columns.
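Outlier detection based on standard deviations from the mean can be sketched as follows. The threshold of three standard deviations, the function name `numeric_outliers`, and the sample data are assumptions of this example rather than documented product settings.

```python
import statistics

def numeric_outliers(values, max_deviations=3.0):
    """Return values lying more than max_deviations standard deviations from the mean."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []  # all values identical, so nothing can be an outlier
    return [v for v in values if abs(v - mean) / stdev > max_deviations]

# Hypothetical order amounts: twenty ordinary values and one suspicious entry.
amounts = [100 + (i % 5) for i in range(20)] + [10000]
outliers = numeric_outliers(amounts)
```

A single extreme value also inflates the standard deviation itself, so on very small samples this test may flag nothing; the approach is most meaningful on columns with many rows.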

12. Time Series Profiling

The software analyzes time-series data to determine whether the data conforms to the norm.

A variety of time-series metrics are computed.

Unusual changes in the time-series data (value domain)

Unusual changes in the time-series data (frequency domain)

Figure 2 : Results from Statistical Profiling

Step 3 : Customer Data Classification

In the third step, the Global IDs software tries to systematically “recognize” all the fields that contain customer identifiers. By examining all the patterns that have been mined in the previous step, the software is able to determine which customer IDs are found in which tables in which databases.

Effectively, this classification step “organizes” all the customer data that is found across a complex data landscape, and establishes the distribution of customer data across the enterprise.

For example, the software is likely to recognize:

Different types of customer IDs (keys) that are used across the

enterprise

Whether customers are organizations or individuals.

Natural key fields like Customer Name, Customer Address, Customer

Phone numbers, Customer URLs, Customer Email Addresses



Through the patterns that have been mined previously, the software is able to recognize the rules related to the format and pattern of the customer IDs.

As a result, the software is able to isolate customer identifiers that do not follow the

rules, and detect these outliers.

Even in the presence of "dirty" or "ambiguous" data, the software is robust enough to identify Customer IDs. Consequently, it works in situations with:

• No enterprise naming standard
CID, CUST_ID, CNUM

• No standardization of abbreviations
CUST, CUSTOMER, CUSTOM

• Non-standard names that have been used by different DBAs
CUSTOMER_IDENTITY, CID, CUSTOMER_IDENTIFIER

• Typos in column headers
CSTOMER_IDENTTY_NUMBR

Custom rules can be included by users to improve the recognition logic.
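Tolerance to abbreviations and typos in column names can be illustrated with fuzzy string matching. The sketch below uses Python's standard `difflib` module with a small hypothetical vocabulary; it considers names only, whereas the approach described above also relies on the patterns mined from the data, which is how terse names such as CID would have to be resolved.

```python
import difflib

# Hypothetical vocabulary; a real deployment would maintain a much larger one.
CONCEPT_TERMS = {
    "customer": ["CUSTOMER", "CUSTOM", "CUST"],
    "identifier": ["IDENTIFIER", "IDENTITY", "ID", "NUMBER", "NUM"],
}

def token_concept(token, cutoff=0.8):
    """Map one name token to a concept, tolerating small typos."""
    for concept, terms in CONCEPT_TERMS.items():
        if difflib.get_close_matches(token, terms, n=1, cutoff=cutoff):
            return concept
    return None

def is_customer_id_name(column_name):
    """True if the name contains both a customer-like and an identifier-like token."""
    tokens = [t for t in column_name.upper().split("_") if t]
    concepts = {token_concept(t) for t in tokens}
    return {"customer", "identifier"} <= concepts

# Column names taken from the examples in the text, plus one unrelated name.
names = ["CUST_ID", "CUSTOMER_IDENTIFIER", "CSTOMER_IDENTTY_NUMBR", "ORDER_DATE"]
recognized = [n for n in names if is_customer_id_name(n)]
```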

Figure 3 : Results of Classification Step showing distribution of domains.

Step 4: Customer Data Mapping and Lineage

Now the software is in a position to create a map of all customer data across the

enterprise. This map adds attribute information, to the customer identifiers that have

been previously mapped.

Global IDs uses a technique called “semantic equivalence mapping” to establish the

Customer Data Maps. The software uses the output of the profiling analytics to

understand the semantic content of each column of data, before attempting to map

equivalent semantic domains.



The software is able to create a comprehensive picture of the customer entity, in the form of a fairly comprehensive map of customer data. Using this map, the organization can trace the presence of customer data across the data landscape.
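Once columns have been classified, a customer data map is essentially a grouping of physical columns under canonical attributes. The sketch below shows only that grouping step, under the assumption that classification results are available as tuples; it does not attempt to reproduce the “semantic equivalence mapping” technique, and the databases, tables, and function name are hypothetical.

```python
from collections import defaultdict

def build_entity_map(classified_columns):
    """Group physical columns by the canonical attribute they were classified as."""
    entity_map = defaultdict(list)
    for database, table, column, canonical in classified_columns:
        entity_map[canonical].append(f"{database}.{table}.{column}")
    return dict(entity_map)

# Hypothetical classification output: (database, table, column, canonical attribute).
classified = [
    ("crm", "CUSTOMER", "CUST_ID", "Customer Identifier"),
    ("billing", "ACCT", "CID", "Customer Identifier"),
    ("crm", "CUSTOMER", "EMAIL_ADDR", "Customer Email Address"),
]
customer_map = build_entity_map(classified)
```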

It should be noted that these types of customer data maps are extremely costly and

time-consuming to create, especially for complex data landscapes. By automating the

generation of the Customer Data Map, the software reduces the level of manual effort

that is required for understanding customer data in a comprehensive way.

Figure 4 : Sample Customer Map showing database tables mapped to canonical form.

Step 5: Customer Data Verification

In the next step, the software tries to systematically verify each customer data domain using the rules that it has detected in the profiling step. The types of rules that have been harvested include:

Format Rules

Length Rules

Pattern Rules

Data Type Rules

Data Encoding Rules

Reference Value Rules

Substring Rules

Special character Rules

Language Rules

By collecting additional rules from the user community, the software is able to create

a comprehensive rule-base to establish the constraints that each domain is allowed to

have.

For example, the Customer ID domain may have the following rules associated with

it.

Format Rule = nnn-nn-nnnn

Length Rules = fixed length 11



Pattern Rules = 3[1-9]-2[0-9]-4[0-9]

Data Type Rules = varchar

Data Encoding Rules = ASCII

Reference Value Rules = none

Substring Rules = none

Special character Rules = “-” allowed
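A single-column rule-base of this kind can be expressed as a set of named checks. The sketch below encodes the format, length, encoding, and special character rules from the example; the dictionary layout and the name `verify` are assumptions for illustration, and the pattern rule is omitted because its notation is not defined in the text.

```python
import re

# Rules taken from the Customer ID example; the structure itself is hypothetical.
CUSTOMER_ID_RULES = {
    "format": lambda v: re.fullmatch(r"[0-9]{3}-[0-9]{2}-[0-9]{4}", v) is not None,
    "length": lambda v: len(v) == 11,
    "encoding": lambda v: v.isascii(),
    "special_characters": lambda v: all(ch.isalnum() or ch == "-" for ch in v),
}

def verify(value, rules):
    """Return the names of the rules that the value violates."""
    return [name for name, check in rules.items() if not check(value)]

# Hypothetical Customer ID values found in the data landscape.
sample_ids = ["123-45-6789", "123456789", "123_45_6789"]
report = {value: verify(value, CUSTOMER_ID_RULES) for value in sample_ids}
```

Reporting which rule failed, rather than a plain pass or fail, makes it easier to see whether a violation is a formatting issue or a genuinely invalid identifier.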

With these rules in place, all Customer IDs in the data landscape can be examined for

compliance with this rule-base.

Similarly, separate rule-bases can be created for each of the Customer Data Domains.

Customer Identifier

Customer Name

Customer Shipping Address

Customer Billing Address

Customer Phone Number

Customer Email Address

Customer Website Address

Active Customer Flag

These types of “single-column” rules can be automatically established using software.

Figure 5 : Results of data validation showing errors identified by software

Step 6: Customer Data Validation

Given that the software does not understand the business context, it becomes

important to capture the business domain knowledge that is possessed by the user

community.

In the data validation step, the user community adds known business rules to the customer rule base.



For example, the Customer Type domain and the Customer Subtype domain may have a hierarchical parent-child relationship between them.

If CustomerType = Active

then CustomerSubtype is not null

These types of “multi-column” relationships are difficult to detect automatically. Hence, the participation of the user community is required to establish these business rules.

Collectively, the “single-column” rule-bases and the “multi-column” rule-bases form the Customer Rule Base, and establish the constraints with which all customer master data must comply.

The advantage of this approach is that it provides a comprehensive set of rules for

customer data. An audit of customer master data is now possible, and all violations of

these rules across the data landscape can be systematically rooted out.
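A multi-column business rule, such as the Customer Type and Customer Subtype rule given earlier, can be written as a function over a whole record and then applied in an audit. The record layout, the rule registry, and the function names below are assumptions for illustration.

```python
def active_requires_subtype(record):
    """If CustomerType = Active then CustomerSubtype is not null."""
    if record.get("CustomerType") == "Active":
        return record.get("CustomerSubtype") is not None
    return True

# Hypothetical registry of user-supplied multi-column rules.
BUSINESS_RULES = {"active_requires_subtype": active_requires_subtype}

def audit(records, rules):
    """Return (record index, rule name) for every violation found."""
    violations = []
    for index, record in enumerate(records):
        for name, rule in rules.items():
            if not rule(record):
                violations.append((index, name))
    return violations

# Hypothetical customer records.
customers = [
    {"CustomerType": "Active", "CustomerSubtype": "Retail"},
    {"CustomerType": "Active", "CustomerSubtype": None},
    {"CustomerType": "Inactive", "CustomerSubtype": None},
]
violations = audit(customers, BUSINESS_RULES)
```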

Figure 6 : User Interface for establishing controls based on Business Rules.

Step 7: Customer Data Stewardship

In the Data Stewardship layer, the software tries to establish accountability for data

quality, by associating one or more people with responsibility for Customer Master

Data.

The Global IDs software allows assignment of customer data domains with a RACI

matrix, showing who is responsible, accountable, consulted and informed. These data

stewards become responsible for compliance with PII policies.
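A RACI matrix can be represented as a mapping from each data domain to the people assigned to it. The sketch below also checks a common RACI convention, namely that each domain has exactly one accountable person; that convention, the data layout, and the names used are assumptions of this example and are not stated in the text.

```python
def validate_raci(matrix):
    """Return the domains that do not have exactly one accountable person."""
    problems = []
    for domain, assignments in matrix.items():
        accountable = [p for p, role in assignments.items() if role == "accountable"]
        if len(accountable) != 1:
            problems.append(domain)
    return problems

# Hypothetical assignments of people to customer data domains.
raci = {
    "Customer Identifier": {"alice": "accountable", "bob": "responsible", "carol": "informed"},
    "Customer Email Address": {"bob": "responsible", "dave": "consulted"},
}
unowned = validate_raci(raci)
```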

The Data Stewards in the Master Data Governance team are responsible for:

Creating controls

Generating compliance reports and compliance metrics

Running periodic audits

With an accurate and comprehensive understanding of the distribution of Customer

Master data in the enterprise, the governance team can take steps to ensure that

Customer Master Data can be trusted to have accurate information



Customer Master Data has gone through systematic checks and balances

Customer Master Data quality is being measured periodically

Customer Master Data can be integrated in a Master Data Hub.

Figure 7 : RACI Matrix for Database Schemas

Step 8: Customer Data Monitoring

Since the data in enterprise databases is continuously changing, the quality of this data

can degrade over time. The software prevents this degradation by continuously

monitoring the Customer Master Data in the data landscape.

By periodically auditing the data environment, and ensuring that compliance status of

each data source is improving over time, an effective governance process can be put in

place.

The software can be used to periodically re-run the scans and generate compliance

metrics for Customer Master Data in each database schema.

As additional new environments are added to the enterprise data landscape, those

environments can be included in the compliance monitoring process.
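The idea of tracking whether each data source's compliance status is improving over time can be sketched as follows. This is an illustrative Python example under stated assumptions: the `ScanResult` structure, the schema names, and the figures are invented, and the compliance rate is simply taken to be compliant records divided by records checked.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ScanResult:
    schema: str             # database schema that was scanned
    scan_date: str          # ISO date of the scan, e.g. "2014-03-01"
    records_checked: int
    records_compliant: int

    @property
    def compliance_rate(self) -> float:
        if self.records_checked == 0:
            return 1.0      # nothing to violate; treated as compliant by assumption
        return self.records_compliant / self.records_checked


def degraded_schemas(history: List[ScanResult]) -> List[str]:
    """Return schemas whose latest compliance rate is below the previous scan's."""
    by_schema: Dict[str, List[ScanResult]] = {}
    for result in history:
        by_schema.setdefault(result.schema, []).append(result)
    flagged = []
    for schema, scans in by_schema.items():
        scans.sort(key=lambda s: s.scan_date)   # ISO dates sort chronologically
        if len(scans) >= 2 and scans[-1].compliance_rate < scans[-2].compliance_rate:
            flagged.append(schema)
    return sorted(flagged)


# Invented scan history for two hypothetical schemas.
history = [
    ScanResult("crm", "2014-01-01", 1000, 900),
    ScanResult("crm", "2014-02-01", 1100, 1045),
    ScanResult("billing", "2014-01-01", 500, 480),
    ScanResult("billing", "2014-02-01", 520, 468),
]
flagged = degraded_schemas(history)
```

A newly added environment would simply contribute its own `ScanResult` entries, so it enters the same comparison from its second scan onward.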



Figure 8 : Monitor of Record Count Changes in Customer Tables

Step 9: Customer Data Governance Portal Generation

To make the governance of Customer Master Data intuitive and user-friendly, the

Global IDs software generates a Customer Master Data Governance portal. The web-

based portal allows data stewards and business users to carry out their operational

activities easily.

The following activities can be supported through the Customer Data Governance

portal.

1. An Enterprise View of Customer Master Data (showing the distribution of customer master data across all databases within the organization)

2. Enterprise Search (providing the ability to search for customer master data within the organization)

3. Audits (showing the conformance of customer master data with data quality rules)

4. Business Rules (showing the business rules that pertain to the customer master data object)

5. Controls (showing active controls, violations, exception conditions, etc.)

6. Analytical Reports (showing multiple reports with the results of different types of analysis on customer master data)

7. Metrics (showing business metrics on customer master data: growth, change, quality improvements, etc.)

8. Policies and Procedures (showing the documented policies of the organization, and actual conformance)

9. Domain Analysis: Attributes, Code Tables, Subtypes (analysis of each attribute, code set, and subtype belonging to the customer master data object)

10. Hierarchies (showing business-relevant hierarchies built on the customer master data object)



Figure 9 : Portal Designer Module showing configuration of Data Governance Portal

Step 10: Customer Data Governance Portal Customization


The following customization activities can be supported:

1. Portlet Creation

2. Graph Generation

3. Analytical Reports

4. Compliance Metrics

Figure 10 : Sample of auto-generated Customer Data Governance Portal, awaiting

customization.



Use Cases

Why govern Customer Data?

In most organizations, Customer Data is the key to all revenue generating activity.

A systematic governance process for customer-related data assets may be able to

improve the quality and reliability of customer data across the enterprise data

landscape.

Some possible use cases are described below.

Use Case 1: Creating a 360-degree view of customer relationships

Many organizations aspire to create a 360-degree view of their customer relationships.

In other words, there is a desire to gain an understanding of all interactions with the

customer, thereby generating a “holistic” view. Gaining this view allows the

organization to understand how to improve each customer relationship, thereby

increasing both revenue and profit.

Given that customer data is distributed across a large number of applications and

databases, and given the heterogeneous nature of the enterprise database environment,

it is likely that there are many discrepancies in the customer data across the databases.

The pattern mining / domain classification approach can be used to map customer data

across the enterprise data landscape, and create systematic governance processes on

this data, thereby generating this holistic view.

The equivalent steps for this effort would be:

1. Discovery of Enterprise Databases

2. Profiling of Structured and Unstructured Data for Customer IDs

3. Domain Classification of Customer IDs

4. Object Mapping of all Customer identifiers

5. Identification of Customer Data Stewards: RACI matrix creation

6. Customer Data Quality Monitoring and Governance

Use Case 2: Customer Master Data Integration

Organizations need a trusted and reliable source of customer data.

In reality, most organizations have multiple sources of customer data in different

databases. These databases are usually out of sync, and each database contains



information that is partially correct. As a result, it is difficult to maintain a “master” file that contains the “system of record”.

Many factors make it difficult for most organizations to maintain accurate and timely

customer information:

continuous mobility (address changes, contact phone number changes)

continuous variation in identification (name changes, email changes)

occasional variation in status (changes in creditworthiness)

variation in personal circumstances (financial status, household status)

Unless the organization is continuously enriching its databases with these types of

information, discrepancies between real-world data and the stale data in its

databases, as well as discrepancies across databases, will be inevitable.
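To make the notion of cross-database discrepancies concrete, the following Python sketch compares customer attributes held in two sources. It assumes that records in both sources have already been linked by a shared customer identifier (in practice, the object mapping step would have to establish that link); the source names `crm` and `billing`, the attribute names, and the data are invented for illustration.

```python
from typing import Dict, List, Tuple

Source = Dict[str, Dict[str, str]]   # customer_id -> {attribute: value}

# Invented sample data for two hypothetical source databases.
crm: Source = {
    "C001": {"email": "ann@example.com", "phone": "555-0100"},
    "C002": {"email": "raj@example.com", "phone": "555-0101"},
}
billing: Source = {
    "C001": {"email": "ann@example.com", "phone": "555-0199"},
    "C003": {"email": "li@example.com", "phone": "555-0102"},
}


def find_discrepancies(a: Source, b: Source) -> List[Tuple[str, str, str, str]]:
    """List (customer_id, attribute, value_in_a, value_in_b) where shared customers differ."""
    out = []
    for customer_id in sorted(a.keys() & b.keys()):
        shared_attrs = a[customer_id].keys() & b[customer_id].keys()
        for attr in sorted(shared_attrs):
            if a[customer_id][attr] != b[customer_id][attr]:
                out.append((customer_id, attr,
                            a[customer_id][attr], b[customer_id][attr]))
    return out


def unmatched_ids(a: Source, b: Source) -> List[str]:
    """Customers present in only one of the two sources."""
    return sorted(a.keys() ^ b.keys())


discrepancies = find_discrepancies(crm, billing)
orphans = unmatched_ids(crm, billing)
```

The comparison only reports that two sources disagree; deciding which value is current still requires a system of record or an external enrichment source.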

The master data governance approach, coupled with a data enrichment strategy, can be

used to create a Customer Master Data Hub and to establish systematic governance of

high-quality customer data.

Use Case 3: Customer Data Quality Measurement and Audits

Given the importance of customer data, it is critical that all such data is

systematically audited on a periodic basis.

To ensure that the quality and accuracy of the data is continuously improving, the

audits should be conducted frequently, and the quality metrics should be

monitored periodically by the data stewards. A continuously running background

software process that systematically audits the quality of customer data is

an effective, low-cost way of achieving these objectives.

Use Case 4: Customer Data Enrichment for improved marketing

Marketing data is often customer-centric, and needs to be enriched with demographic

information to improve segmentation strategies.

The ability to automatically enrich customer information with external data can

empower an organization to be better aware of the customer's circumstances, and

target marketing efforts with greater focus.

The approach described here, along with automated customer data enrichment

strategies, can be used to improve the effectiveness of marketing campaigns.

Use Case 5: Mergers and Acquisitions

Most large organizations go through periodic merger and acquisition activity. As a

result, the customer data landscape is continuously changing and new sets of customer

data must be integrated into existing master data repositories.

The methodology in this white paper allows organizations to adapt their customer

data governance framework to help during M&A activity. For example, it can greatly

reduce the cost of customer data integration activity between the merged companies,

while increasing the speed of CDI (“Customer Data Integration”) projects.



Global IDs’ Technology

Global IDs Platform

Global IDs provides a unique software platform to support the needs of

large organizations that have complex IT environments. The software provides

a broad functional framework to address different types of data management

projects:

Understand and integrate data at an enterprise level

Provide information-on-demand from across the enterprise

Automate the cleansing and validation of corporate data

Address the complexity of global data (different languages / standards)

Automate the stewardship and governance of corporate data

Some of the benefits associated with the Global IDs platform are described below.

Enterprise Scalability

Since the software performs complex workflows with a high degree of automation, we

are agnostic about the size or complexity of the enterprise IT environment.

Scalability

Our software uses hands-off computation processes to handle complex

workflows that can span multiple data sources. From the software perspective,

there is no difference between a single data source and 100 data sources.

Robust Performance

The software platform uses Massively Parallel Processing (MPP).

Performance can scale with hardware availability (CPU / memory resources).

Fault Tolerance

The software has been designed to exist in a distributed computing

environment, and has persistence built into the platform.

Security

Global IDs software prioritizes compliance with enterprise security standards.



Company Overview

Global IDs was founded in 2001.

The company was created specifically to address the problem of large scale information

integration. Our mission is to develop agent-based integration solutions that can

accomplish integration across hundreds of systems.

The company created the world’s first data integration software that employs

intelligent mobile agents for large-scale integration tasks. Our solutions make it

possible for large global companies to address the integration tasks that are required to

improve business performance analysis.

Our client portfolio includes major Fortune 2000 companies.


Contact Information

For further details on Global IDs products and services, please contact

us via email or telephone at:

[email protected]

Global IDs 184 Nassau Street Princeton, NJ 08542, USA

(646) 201-9498

www.globalids.com

