5 steps to more valuable enterprise data
8/14/2019 5 Steps to More Valuable Enterprise Data
Leader in Data Quality
and Data Integration
www.datafux.com
877846FLUX
International
+44 (0) 1753 272 02
A DataFlux White Paper
Prepared by: DataFlux Corporation
Five Steps to More Valuable Enterprise Data
Introduction
Every good business decision is based upon good data. Whether making operational,
analytical or strategic decisions, every department in an organization from the boardroom
to the sales floor requires reliable, accurate information (from knowing which customers
own what products to knowing which customers are potential opportunities for up-sell)
to make those decisions correctly. But it's also true that bad data inevitably leads to bad
business decisions, improper strategies and poor customer service.
As companies undertake data-dependent initiatives, such as enterprise resource planning
(ERP), customer relationship management (CRM) or enterprise initiatives leading to
complete data governance, it is important to remember that data quality issues exist in
any organization. Successfully addressing these data quality issues and establishing the
controls to maintain the quality of that data is critical to the success of any of these
initiatives.
There are many reasons for the explosion of inconsistent or unusable data, from simple
human error to the lack of data standards across systems, business units or divisions. To
overcome the problems of bad data, organizations need to create a technological
foundation for finding and eliminating data quality problems.
But data quality issues cannot be resolved by technology alone. Having the correct people
and processes in place is also crucial. This paper will show how a five-phase process can
successfully organize people, processes and technology in a proven data quality
methodology that allows organizations to analyze, improve and control corporate data.
Data quality and data governance require a combination of people, processes and technology.
The Five Steps to More Valuable Enterprise Data
Technology is only part of the solution to produce consolidated, high-quality corporate
information. While it's essential to have robust technology supporting any data quality
initiative, having an appropriate methodology for using that technology will be even more
essential to the success of the initiative.
DataFlux has developed a five-component methodology for enterprise data quality and
data governance based on over a decade of experience and thousands of
implementations. The five components are data profiling, data quality, data integration,
data enrichment and data monitoring. Together, these five steps provide a proven,
practical approach to data governance that allows organizations to analyze, improve and
control their data.
Figure 1 - The five components of a data management program.
These components provide a framework for the successful management of the entire data
improvement process. When followed, this methodology results in a unified view of any
type of data, including customer, product and supplier information anywhere in the
enterprise.
Data Profiling
The first step in the process, data profiling, is the examination of the structure,
relationships and content of existing data sources to help create an accurate picture of
the state of the data. By determining the current state of the data, data profiling helps in
planning the best ways to correct or reconcile information assets.
Data profiling, data quality, data integration, data enrichment and data monitoring are the five components of a data management program.
There are three separate components of a data profiling exercise, each of which analyzes
data in a unique way, that together create a clear picture of the nature and scope of data
quality issues.
Data profiling is essentially the process of addressing key questions about the data for
each of these components:
Structure discovery: Do the data patterns match expected patterns? Does the data match the corresponding metadata?
Data discovery: Are the data values complete, accurate and unambiguous? Is the data standardized according to established conventions?
Relationship discovery: Does the data adhere to specified required key relationships across columns and tables? Are there inferred relationships across columns, tables or databases? Is there redundant data?
Figure 2 shows the type of report generated during data profiling efforts. This analysis is
an example of data discovery, as it shows the number of null counts for different fields
within the database. Note that the overall count of 5000 records is consistent, but some
fields show a high number of null values. With an understanding that there are significant
percentages of missing values in several key fields (ADDRESS, CITY, STATE, and
particularly GENDER), business analysts can be immediately aware of the scope and
nature of data quality issues.
Figure 2 - This chart displays a high number of null values in the Sales table.
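The null-count analysis described above can be approximated in a few lines of Python. This is a minimal sketch, not the DataFlux report itself: the `null_counts` helper, the field names and the three sample records are assumptions made for illustration.

```python
def null_counts(records, fields):
    """Count the records in which each field is missing, None, or blank."""
    counts = {field: 0 for field in fields}
    for record in records:
        for field in fields:
            value = record.get(field)
            if value is None or str(value).strip() == "":
                counts[field] += 1
    return counts

# Hypothetical sample rows (illustrative only, not the Sales table from Figure 2).
sample = [
    {"NAME": "Ann Lee", "CITY": "Cary", "STATE": "NC", "GENDER": ""},
    {"NAME": "Bob Ray", "CITY": None, "STATE": "NC", "GENDER": None},
    {"NAME": "Cy Dole", "CITY": "Apex", "STATE": "", "GENDER": "M"},
]

report = null_counts(sample, ["NAME", "CITY", "STATE", "GENDER"])
for field, missing in report.items():
    share = 100.0 * missing / len(sample)
    print(f"{field:<8} {missing:>3} null ({share:.0f}%)")
```

Treating blank strings as nulls is a design choice; some sources distinguish an empty value from a missing one, and a profiling routine may need to report the two separately.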
Data profiling provides an accurate picture of the current state of corporate data.
Profiling provides quantifiable information detailing the strengths and weaknesses of
corporate data. This knowledge can be used as the basis for subsequent data
improvement. The information gained from data profiling feeds the next phase - data
quality.
Data Quality
The data quality phase applies the knowledge gained during data profiling to begin the
process of building better data. This phase helps correct errors, standardize information,
and validate data throughout the enterprise.
There are a number of different tactics employed during this phase. Some of the most
commonly employed measures include:
Data standardization: to address and correct multiple permutations of data. For example, ACME Manufacturing Corporation may be represented in the same data source as Acme Mftg Corp, ACME and ACME Manufacturing. Intelligent fuzzy matching allows these variants to be standardized to create a single nomenclature.
Pattern standardization: to create valid patterns of data across tables and columns. Some pieces of data, such as phone numbers or Social Security Numbers, are found in easily recognized variant patterns. Others, such as product or item data, may have different standards across industries or companies. Pattern standardization can take information in non-standard formats and transform it into an accepted standard format.
Address verification: to confirm that addresses are valid and actionable. For example, DataFlux software can determine if the city and state do not match the ZIP code. The software searches postal data, locates the proper ZIP code and changes it to create an accurate address.
Figure 3 shows how technology can help create step-by-step business rules to address
data quality problems. In this example, the workflow starts with the source table, then
progresses through standardization, address verification and other routines. The earlier
data profiling routine uncovered these issues, and the user creates steps to address each
identified problem.
The data quality phase uses the knowledge gained by profiling to address data issues and begin building better data.
Figure 3 - A sample data quality workflow.
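Two of the tactics named in this phase, data standardization and pattern standardization, can be sketched in Python as follows. The abbreviation table, the token-prefix rule and the function names are assumptions for illustration. The prefix rule is a deliberately simple stand-in for the intelligent fuzzy matching the paper refers to, and the phone format chosen as the "standard" is arbitrary.

```python
import re

# Hypothetical abbreviation table; a real project would maintain a larger one.
ABBREVIATIONS = {"mftg": "manufacturing", "corp": "corporation"}

def normalize_tokens(name):
    """Lowercase a name, strip punctuation, and expand known abbreviations."""
    words = re.findall(r"[a-z0-9]+", name.lower())
    return [ABBREVIATIONS.get(word, word) for word in words]

def standardize_company(name, standard_names):
    """Map a variant to a standard name whose tokens begin with the variant's tokens."""
    tokens = normalize_tokens(name)
    for standard in standard_names:
        if tokens and normalize_tokens(standard)[:len(tokens)] == tokens:
            return standard
    return name  # no match: leave unchanged for manual review

def standardize_us_phone(raw):
    """Return a 10-digit US number as NNN-NNN-NNNN, or None if it does not fit."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop a leading country code
    if len(digits) != 10:
        return None  # do not guess at incomplete numbers
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"

standards = ["ACME Manufacturing Corporation"]
variants = ["Acme Mftg Corp", "ACME", "ACME Manufacturing", "Apex Tools"]
names = [standardize_company(v, standards) for v in variants]

phones = [standardize_us_phone(p)
          for p in ["(919) 555-0100", "919.555.0100", "1 919 555 0100", "555-0100"]]
```

A prefix rule this loose would map a bare "ACME" to whichever standard name comes first, so in practice ambiguous variants are better routed to the business area responsible for the data.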
The specific approach taken may differ for each data element, and the decision on the
approach falls to the business area responsible for the data.
It is also important to note that the data quality activity improves the existing data but
does not address the root cause of the data problems. If the enterprise is truly interested
in improving data quality, it must also investigate the reasons that the data contained
errors and initiate appropriate actions, including incentives and changes to business
procedures, to improve future data.
With data quality complete, organizations have enacted the measures to bring the
completeness and accuracy of the data in each source to acceptable levels. The next step
is integration from multiple sources.
Data Integration
For any type of data, different data elements about the same item will often exist in
multiple databases. Data integration is useful when organizations attempt to rationalize
data across different sources. Data integration provides the ability to take these
divergent pieces of information and unify them within a single master record.
For example, consider a company with two product files: a master product file extracted
from its USA-based enterprise resource planning (ERP) package and a product database
from Europe. The company sells the same products in both areas, but the products are
listed in each database under different names. In addition, the product, brand and
description data in each file was entered by different data entry personnel.
Investigating and addressing the causes of poor data quality is essential.
Data integration combines information from multiple data sources into a master record.
The first challenge in data integration is to recognize that the same product exists in each
of the two sources (the process of linking), and the second challenge is to combine the
data into a single view of the product (the process of consolidation).
With customer data, there is often a common field, such as a tax identification number,
that can help identify commonality. When this occurs, companies can identify multiple
records for the same customer quickly and easily.
With product data, this is often not the case. In the earlier example, data integration
technology enabled the company to identify the product information for linking across the
two different systems and consolidate the data.
Intelligent fuzzy matching identified a product description that was common to both files.
The USA file contained the description, brand name and product identifier all in one field
and in various patterns. The European file contained only product descriptions, also with
varied patterns and abbreviations. To uncover the connection, data integration
technology was used to:
Parse the description from the USA file into product-specific attributes and into a brand name
Reconcile the differences in brand names
Reconcile the differences in product attributes (short forms, abbreviations, etc.)
Phonetically match the reconciled data
Display reports of matching products
After data integration, the company possessed a comprehensive view of all the data
known about a subject. The same principles and techniques may be applied to any type of
data to create integrated master records.
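The parse, reconcile and match steps listed above might look roughly like the following Python sketch. Everything specific here is an assumption: the sample records, the brand and short-form tables, the layout of the USA field (identifier first) and the 0.85 threshold are invented for illustration. The sketch also substitutes plain string similarity from Python's `difflib` for the phonetic matching described in the paper.

```python
from difflib import SequenceMatcher

# Hypothetical lookup tables (illustrative only; not from the paper).
KNOWN_BRANDS = {"acme"}
SHORT_FORMS = {"hmr": "hammer", "stl": "steel", "oz": "ounce"}

def reconcile(text):
    """Lowercase the text, drop commas, and expand known short forms."""
    tokens = text.lower().replace(",", " ").split()
    return [SHORT_FORMS.get(token, token) for token in tokens]

def canonical(tokens):
    """Sort tokens so that word order does not affect the comparison."""
    return " ".join(sorted(tokens))

def parse_usa_field(field):
    """Split a combined field into (product_id, brand, canonical description).

    Assumes the identifier is the first whitespace-separated token.
    """
    product_id, _, rest = field.partition(" ")
    tokens = reconcile(rest)
    brand = next((t for t in tokens if t in KNOWN_BRANDS), None)
    description = canonical(t for t in tokens if t not in KNOWN_BRANDS)
    return product_id, brand, description

def link(usa_fields, europe_descriptions, threshold=0.85):
    """Pair each USA record with its most similar European description."""
    report = []
    for field in usa_fields:
        product_id, brand, usa_desc = parse_usa_field(field)
        scored = [
            (SequenceMatcher(None, usa_desc, canonical(reconcile(d))).ratio(), d)
            for d in europe_descriptions
        ]
        score, best = max(scored, default=(0.0, None))
        if score >= threshold:
            report.append((product_id, brand, best))
    return report

usa = ["P100 ACME stl hmr 16 oz", "P200 ACME wood saw"]
europe = ["Hammer, steel, 16 ounce", "Screwdriver set"]
matches = link(usa, europe)
for product_id, brand, description in matches:
    print(product_id, brand, "<->", description)
```

Records that fall below the threshold are simply left unlinked in this sketch; a production workflow would more likely queue them for review than discard them.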
The next step is to increase the data's value by enhancing it with additional information.
Data Enrichment
Data enrichment entails incorporating additional external data to add value to existing
records. There are multiple ways this may occur. Data enrichment may mean enhancing
corporate data with third-party data to increase an understanding of the customer and
their buying potential and loyalty. Or it may mean using advanced geocoding to add
enriched geographic information. Knowing, for example, that all the houses in a
particular ZIP-code-plus-4 area were built after 1980 and exceed a certain property
value provides information to target certain product offerings.
Intelligent fuzzy matching technology can greatly simplify data integration projects.
Enriching corporate data by combining it with data from trusted third-party sources can add value.
An understanding of the behavior of customers can also be attained with analysis of
certain attributes. By combining that data with specific customer data, one could segment
customers more effectively to identify specific opportunities.
Figure 4 illustrates how data improvement technology can add crucial components to a
regular street address by matching the submitted record to external data sources.
Submitted Data          Standardized Data
DataFlux                DataFlux Corporation
940 Cary Pkwy           940 Cary Pkwy, Ste. 201
Cary, NC                Cary, NC 27513-2792

New Data Points Added
Carrier Route ID        34
Delivery Point ID       1
County Number           183
County Name             Wake
Congressional District  4
Latitude                35.8306
Longitude               -78.7841
Census Block Group      371830535122
Figure 4 - Data enrichment on a street address.
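At its simplest, enrichment of this kind is a lookup against a reference source keyed on a standardized form of the record. The sketch below assumes a small in-memory reference table populated with a few of the Figure 4 values. The `REFERENCE` table, the key format and the function names are hypothetical, and real reference data would come from a postal or geocoding provider.

```python
# Hypothetical reference data keyed by a standardized address; values follow Figure 4.
REFERENCE = {
    ("940 CARY PKWY", "CARY", "NC"): {
        "zip": "27513-2792",
        "county_name": "Wake",
        "latitude": 35.8306,
        "longitude": -78.7841,
    }
}

def address_key(street, city, state):
    """Build a case-insensitive, whitespace-insensitive lookup key."""
    return (" ".join(street.upper().split()), city.strip().upper(), state.strip().upper())

def enrich(record):
    """Return a copy of the record with reference fields appended when a match exists."""
    key = address_key(record["street"], record["city"], record["state"])
    return {**record, **REFERENCE.get(key, {})}

submitted = {"name": "DataFlux", "street": "940 Cary  Pkwy", "city": "Cary", "state": "nc"}
enriched = enrich(submitted)
unmatched = enrich({"name": "Other", "street": "1 Main St", "city": "Apex", "state": "NC"})
```

Returning a copy rather than modifying the submitted record keeps the original values available for audit, which matters when enrichment results are later disputed.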
Data enrichment can be used for more than just address data. Many types of data,
including product, inventory and financial data, can have value added by connecting the
data to external sources.
This can mean appending industry-standard product codes from registries like UNSPSC or
eCl@ss to standardize product or inventory records and streamline buying and selling
goods. Integrating your database with these universal codes can guarantee the
correctness of inventory, increase efficiency in ordering and help pave the way for smooth
international operations.
Data enrichment also helps manage corporate compliance issues, by comparing customer
lists or transactions to external lists in accordance with government or industry
regulations. This can be used to conduct anti-money laundering checks, enact fraud
detection or integrate do-not-call registries.
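As one hedged illustration of list comparison, the following Python sketch screens a customer list against a do-not-call list. The phone numbers, record layout and `screen` function are assumptions for illustration; real registries define their own formats and matching rules.

```python
import re

def phone_digits(raw):
    """Reduce a phone number to its last ten digits for comparison."""
    return re.sub(r"\D", "", raw)[-10:]

def screen(customers, do_not_call):
    """Split customers into (callable, suppressed) based on the registry."""
    blocked = {phone_digits(number) for number in do_not_call}
    callable_customers, suppressed = [], []
    for customer in customers:
        if phone_digits(customer["phone"]) in blocked:
            suppressed.append(customer)
        else:
            callable_customers.append(customer)
    return callable_customers, suppressed

# Hypothetical data (illustrative only).
customers = [
    {"name": "Ann Lee", "phone": "(919) 555-0100"},
    {"name": "Bob Ray", "phone": "919-555-0101"},
]
registry = ["1-919-555-0100"]

ok_to_call, do_not_contact = screen(customers, registry)
```

Normalizing both lists to the same digit form before comparing is what makes the match reliable, which is why this step depends on the earlier standardization work.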
Data enrichment can be used to add commodity coding standards, such as eCl@ss or UNSPSC.
Data enrichment aids in compliance initiatives by integrating watch list and do-not-call information.
Data Monitoring
Data monitoring is essential to a complete data governance program, ensuring that
hard-won data improvement and enrichment efforts aren't degraded by the creeping return
of data errors. Active data monitoring can help companies understand the condition of
their data and isolate and correct the causes of data quality issues.
Data monitoring can take a number of forms. The simplest version of data monitoring
automates a profiling report, a periodic analysis of the data that searches for exceptions
and non-standard data. Trigger events, such as too great a percentage of exceptional
data, would generate an email or a system alert to the individuals or department
responsible for the data.
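A threshold-based trigger of this kind can be sketched in Python as follows. The 25 percent threshold, the validation rule and the `notify` callback are assumptions for illustration; in practice the callback would send an email or raise a system alert rather than append to a list.

```python
def exception_rate(records, is_valid):
    """Return the fraction of records that fail the validation rule."""
    if not records:
        return 0.0
    failures = sum(1 for record in records if not is_valid(record))
    return failures / len(records)

def check_batch(records, is_valid, threshold=0.05, notify=print):
    """Call notify and return True when the exception rate exceeds the threshold."""
    rate = exception_rate(records, is_valid)
    if rate > threshold:
        notify(f"Data quality alert: {rate:.1%} of records failed validation")
        return True
    return False

# Hypothetical batch and rule (illustrative only).
batch = [{"zip": "27513"}, {"zip": ""}, {"zip": None}, {"zip": "27511"}]
has_zip = lambda record: bool(record.get("zip"))

alerts = []
triggered = check_batch(batch, has_zip, threshold=0.25, notify=alerts.append)
```

Running such a check on a schedule turns the one-time profiling report into a recurring control.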
Another, more dynamic - and more effective - method of data monitoring is to actively
enforce business rules as web services. Using this method, the same rules which were
used to cleanse and enhance the data can be applied to data in real time as it enters and
moves through the enterprise. The advantages of this approach include the ability to:
Enforce data governance rules: Ensure business-specific rules for how organizational data should be managed and handled are followed.
Understand and refine mission-critical processes: Violations or exceptions to business rules are logged in a repository, to track and address trends.
Invoke events to correct the data: The monitoring engine can invoke events to automatically correct data, email business users, log the data to the repository, or simply write the data to a database table.
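The three capabilities above (enforcing rules, logging violations and invoking corrective events) can be illustrated with a small in-process sketch in Python. It is not a web service and not the DataFlux monitoring engine: the `Rule` and `Monitor` classes, the two sample rules and the in-memory violation log are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class Rule:
    name: str
    check: Callable[[dict], bool]                   # returns True when the record passes
    fix: Optional[Callable[[dict], dict]] = None    # optional automatic correction

@dataclass
class Monitor:
    rules: list
    violations: list = field(default_factory=list)  # stands in for the repository

    def process(self, record):
        """Apply each rule in order, logging violations and applying any fix."""
        for rule in self.rules:
            if rule.check(record):
                continue
            self.violations.append((rule.name, dict(record)))
            if rule.fix is not None:
                record = rule.fix(record)
        return record

# Hypothetical business rules (illustrative only).
rules = [
    Rule("state_uppercase",
         check=lambda r: r["state"].isupper(),
         fix=lambda r: {**r, "state": r["state"].upper()}),
    Rule("zip_present", check=lambda r: bool(r.get("zip"))),
]

monitor = Monitor(rules)
cleaned = monitor.process({"name": "DataFlux", "state": "nc", "zip": ""})
violated = [name for name, _ in monitor.violations]
```

Keeping the rules as data, separate from the engine that applies them, is what allows the same rules used for cleansing to be reused for ongoing monitoring.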
Data monitoring allows business rules developed through the process to be deployed as ongoing safeguards against declining data quality.
Summary
The quality of any business analysis is only as good as the data at its foundation. Without
data that is consistent, accurate and reliable across the enterprise, an organization can
easily reach misleading, faulty and potentially harmful conclusions.
An organization's success in data analysis begins with finding a methodology and solution
that encompasses each of these building blocks: data profiling, data quality, data
integration, data enrichment and data monitoring.
Today's powerful new data quality and data integration technology allows organizations to
support data-driven initiatives on an unprecedented level, providing a unified view of
customer, product, supplier and other data assets on a wide variety of applications across
the enterprise.
Through the combination of an intelligent methodology and appropriate technology, an
organization can elevate the role of data quality to an ongoing corporate activity, enabling
better business decisions and competitive advantage, all based on high-quality
information.
Analyze, improve and control corporate data.