5 steps to more valuable enterprise data

Upload: whojam9717

Post on 30-May-2018

  • 8/14/2019 5 Steps to More Valuable Enterprise Data

    1/10

    Leader in Data Quality and Data Integration

    www.dataflux.com

    877-846-FLUX

    International

    +44 (0) 1753 272 02

    A DataFlux White Paper

    Prepared by: DataFlux Corporation

    Five Steps to More Valuable Enterprise Data

    Introduction

    Every good business decision is based upon good data. Whether making operational, analytical or strategic decisions, every department in an organization, from the boardroom to the sales floor, requires reliable, accurate information to make those decisions correctly, from knowing which customers own what products to knowing which customers are potential opportunities for up-sell. Conversely, bad data inevitably leads to bad business decisions, improper strategies and poor customer service.

    As companies undertake data-dependent initiatives, such as enterprise resource planning

    (ERP), customer relationship management (CRM) or enterprise initiatives leading to

    complete data governance, it is important to remember that data quality issues exist in

    any organization. Successfully addressing these data quality issues and establishing the

    controls to maintain the quality of that data is critical to the success of any of these

    initiatives.

    There are many reasons for the explosion of inconsistent or unusable data, from simple

    human error to the lack of data standards across systems, business units or divisions. To

    overcome the problems of bad data, organizations need to create a technological

    foundation for finding and eliminating data quality problems.

    But data quality issues cannot be resolved by technology alone. Having the correct people

    and processes in place is also crucial. This paper will show how a five-phase process can

    successfully organize people, processes and technology in a proven data quality

    methodology that allows organizations to analyze, improve and control corporate data.

    Data quality and data governance require a combination of people, processes and technology.

    The Five Steps to More Valuable Enterprise Data

    Technology is only part of the solution to produce consolidated, high-quality corporate

    information. While it's essential to have robust technology supporting any data quality

    initiative, having an appropriate methodology for using that technology will be even more

    essential to the success of the initiative.

    DataFlux has developed a five-component methodology for enterprise data quality and

    data governance based on over a decade of experience and thousands of

    implementations. The five components are data profiling, data quality, data integration,

    data enrichment and data monitoring. Together, these five steps provide a proven,

    practical approach to data governance that allows organizations to analyze, improve and

    control their data.

    Figure 1 - The five components of a data management program.

    These components provide a framework for the successful management of the entire data

    improvement process. When followed, this methodology results in a unified view of any

    type of data, including customer, product and supplier information anywhere in the

    enterprise.

    Data Profiling

    The first step in the process, data profiling is the examination of the structure,

    relationships and content of existing data sources to help create an accurate picture of

    the state of the data. By determining the current state of the data, data profiling helps in

    planning the best ways to correct or reconcile information assets.

    Data profiling, data quality, data integration, data enrichment and data monitoring are the five components of a data management program.

    A data profiling exercise has three separate components. Each analyzes data in a unique way, and together they create a clear picture of the nature and scope of data quality issues.

    Data profiling is essentially the process of addressing key questions about the data for

    each of these components:

    - Structure discovery: Do the data patterns match expected patterns? Does the data match the corresponding metadata?

    - Data discovery: Are the data values complete, accurate and unambiguous? Is the data standardized according to established conventions?

    - Relationship discovery: Does the data adhere to specified required key relationships across columns and tables? Are there inferred relationships across columns, tables or databases? Is there redundant data?

    Figure 2 shows the type of report generated during data profiling efforts. This analysis is an example of data discovery, as it shows the null counts for different fields within the database. Note that the overall count of 5000 records is consistent, but some fields show a high number of null values. With an understanding that there are significant percentages of missing values in several key fields (ADDRESS, CITY, STATE and particularly GENDER), business analysts can be immediately aware of the scope and nature of data quality issues.

    Figure 2 - This chart displays a high number of null values in the Sales table.
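
    A null-count report of the kind described for Figure 2 can be approximated in a few lines of Python. This is only a minimal sketch: the sample rows, the field names and the null_profile function are illustrative assumptions, not DataFlux output or a DataFlux API.

```python
from collections import Counter

# Hypothetical sample rows; None or "" stands for a missing value.
ROWS = [
    {"NAME": "Ann Lee", "CITY": "Cary", "STATE": "NC", "GENDER": None},
    {"NAME": "Bob Ray", "CITY": "", "STATE": "NC", "GENDER": "M"},
    {"NAME": "Cy Doe", "CITY": None, "STATE": None, "GENDER": None},
    {"NAME": "Di Fox", "CITY": "Apex", "STATE": "NC", "GENDER": None},
]

def null_profile(rows):
    """Return {field: (null_count, null_percent)} for every field seen."""
    total = len(rows)
    nulls = Counter()
    fields = []
    for row in rows:
        for field, value in row.items():
            if field not in fields:
                fields.append(field)
            # Treat None and blank strings as missing.
            if value is None or str(value).strip() == "":
                nulls[field] += 1
    return {f: (nulls[f], 100.0 * nulls[f] / total) for f in fields}

if __name__ == "__main__":
    for field, (count, pct) in null_profile(ROWS).items():
        print(f"{field:<8} nulls={count:<3} ({pct:.0f}%)")
```

    Reporting both the count and the percentage keeps the result comparable across tables of different sizes.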

    Data profiling provides an accurate picture of the current state of corporate data.

    Profiling provides quantifiable information detailing the strengths and weaknesses of

    corporate data. This knowledge can be used as the basis for subsequent data

    improvement. The information gained from data profiling feeds the next phase - data

    quality.

    Data Quality

    The data quality phase applies the knowledge gained during data profiling to begin the

    process of building better data. This phase helps correct errors, standardize information,

    and validate data throughout the enterprise.

    There are a number of different tactics employed during this phase. Some of the most

    commonly employed measures include:

    - Data standardization to address and correct multiple permutations of data. For example, "ACME Manufacturing Corporation" may be represented in the same data source as "Acme Mftg Corp", "ACME" and "ACME Manufacturing". Intelligent fuzzy matching allows these variants to be standardized to create a single nomenclature.

    - Pattern standardization to create valid patterns of data across tables and columns. Some pieces of data, such as phone numbers or Social Security Numbers, are found in easily recognized variant patterns. Others, such as product or item data, may have different standards across industries or companies. Pattern standardization can take information in non-standard formats and transform it into an accepted standard format.

    - Address verification to confirm that addresses are valid and actionable. For example, DataFlux software can determine if the city and state do not match the ZIP code. The software searches postal data, locates the proper ZIP code and changes it to create an accurate address.

    Figure 3 shows how technology can help create step-by-step business rules to address data quality problems. In this example, the workflow starts with the source table, then progresses through standardization, address verification and other routines. The earlier data profiling routine uncovered these issues, and the user creates steps to address each identified problem.
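
    The idea of a workflow built from ordered cleansing steps can be sketched in Python as follows. Everything here is an assumption made for illustration: the abbreviation table, the master name list and the function names are hypothetical, and the token-prefix comparison is a much simpler stand-in for the intelligent fuzzy matching mentioned above. Address verification is omitted because it depends on postal reference data.

```python
import re

# Hypothetical abbreviation table; a real scheme would be far larger.
ABBREVIATIONS = {"mftg": "manufacturing", "mfg": "manufacturing",
                 "corp": "corporation", "inc": "incorporated"}

# Hypothetical master list of approved company names.
MASTER_NAMES = ["ACME Manufacturing Corporation"]

def tokens(name):
    """Lowercase, strip punctuation, and expand known abbreviations."""
    words = re.findall(r"[a-z0-9]+", name.lower())
    return [ABBREVIATIONS.get(w, w) for w in words]

def standardize_company(name):
    """Map a variant to a master name when its tokens are a prefix of it."""
    variant = tokens(name)
    for master in MASTER_NAMES:
        if variant and tokens(master)[:len(variant)] == variant:
            return master
    return name  # leave unrecognized names untouched for manual review

def standardize_phone(phone):
    """Rewrite a 10-digit US number in the single pattern NNN-NNN-NNNN."""
    digits = re.sub(r"\D", "", phone)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop the country code
    if len(digits) != 10:
        return phone  # not a recognizable pattern; keep the original
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"

# The workflow is an ordered list of (field, rule) steps.
WORKFLOW = [("company", standardize_company), ("phone", standardize_phone)]

def run_workflow(record):
    """Apply each step in order and return a cleaned copy of the record."""
    cleaned = dict(record)
    for field, rule in WORKFLOW:
        cleaned[field] = rule(cleaned[field])
    return cleaned
```

    Keeping the steps in a list makes it easy to add a rule for each problem that profiling uncovers, and values that match no rule pass through unchanged rather than being guessed at.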

    The data quality phase uses the knowledge gained by profiling to address data issues and begin building better data.

    Figure 3 - A sample data quality workflow.

    The specific approach taken may differ for each data element, and the decision on the

    approach falls to the business area responsible for the data.

    It is also important to note that the data quality activity improves the existing data but

    does not address the root cause of the data problems. If the enterprise is truly interested

    in improving data quality, it must also investigate the reasons that the data contained errors and initiate appropriate actions, including incentives and changes to business

    procedures to improve future data.

    With data quality complete, organizations have enacted the measures to bring the

    completeness and accuracy of the data in each source to acceptable levels. The next step

    is integration from multiple sources.

    Data Integration

    For any type of data, different data elements about the same item will often exist in

    multiple databases. Data integration is useful when organizations attempt to rationalize

    data across different sources. Data integration provides the ability to take these

    divergent pieces of information and unify them within a single master record.

    For example, a company has two product files: a master product file extracted from its USA-based enterprise resource planning (ERP) package and a product database from Europe. The company sells the same products in both areas, but the products are listed in each database by different names. In addition, the product, brand and descriptions in each file were input by different data entry personnel.

    Investigating and addressing the causes of poor data quality is essential.

    Data integration combines information from multiple data sources into a master record.

    The first challenge in data integration is to recognize that the same product exists in each of the two sources (the process of linking), and the second challenge is to combine the data into a single view of the product (the process of consolidation).

    With customer data, there is often a common field, such as a tax identification number,

    that can help identify commonality. When this occurs, companies can identify multiple

    records for the same customer quickly and easily.
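
    When such a common field exists, linking reduces to grouping records on that key. The sketch below assumes a hypothetical tax_id field and made-up sample records; it is an illustration of exact-key linking, not a description of any particular product.

```python
from collections import defaultdict

# Hypothetical customer records drawn from two systems.
CUSTOMERS = [
    {"name": "ACME Corp", "tax_id": "12-3456789", "source": "CRM"},
    {"name": "Acme Corporation", "tax_id": "123456789", "source": "ERP"},
    {"name": "Apex Ltd", "tax_id": "98-7654321", "source": "CRM"},
    {"name": "Unknown", "tax_id": None, "source": "ERP"},
]

def link_by_key(records, key="tax_id"):
    """Group records that share the same non-empty key value."""
    groups = defaultdict(list)
    for record in records:
        # Ignore formatting differences such as hyphens.
        value = (record.get(key) or "").replace("-", "").strip()
        if value:
            groups[value].append(record)
    # Only keys seen more than once point to multiple records per customer.
    return {k: v for k, v in groups.items() if len(v) > 1}
```

    Records with no key value are left out of every group, so they still need a matching technique that does not rely on the shared field.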

    With product data, this is often not the case. In the earlier example, data integration

    technology enabled the company to identify the product information for linking across the

    two different systems and consolidate the data.

    Intelligent fuzzy matching identified a product description that was common to both files.

    The USA file contained the description, brand name and product identifier all in one field

    and in various patterns. The European file contained only product descriptions, also with

    varied patterns and abbreviations. To uncover the connection, data integration

    technology was used to:

    - Parse the description from the USA file into product-specific attributes and into a brand name

    - Reconcile the differences in brand names

    - Reconcile the differences in product attributes (short forms, abbreviations, etc.)

    - Phonetically match the reconciled data

    - Display reports of matching products
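
    The steps above can be sketched in Python under some strong simplifying assumptions. The brand table, the short-form table, the "#" product identifier convention and the sample product strings are all hypothetical, and classic Soundex is used here as one well-known phonetic code; the paper does not say which phonetic algorithm the DataFlux technology uses.

```python
import re

# Hypothetical lookup tables; real projects would maintain far larger ones.
BRANDS = {"brightco": "BrightCo", "bright co": "BrightCo"}
SHORT_FORMS = {"lrg": "large", "sm": "small", "blu": "blue"}

# Standard Soundex digit groups.
CODES = {ch: digit
         for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                                ("l", "4"), ("mn", "5"), ("r", "6"))
         for ch in letters}

def soundex(word):
    """Return the four-character Soundex code of a word."""
    word = "".join(ch for ch in word.lower() if ch.isalpha())
    if not word:
        return ""
    result, prev = word[0].upper(), CODES.get(word[0], "")
    for ch in word[1:]:
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        if ch not in "hw":  # h and w do not separate equal codes
            prev = code
    return (result + "000")[:4]

def reconcile(text):
    """Lowercase the words and expand known short forms."""
    return [SHORT_FORMS.get(w, w) for w in re.findall(r"[a-z]+", text.lower())]

def parse_usa(field):
    """Split a combined field into product ID, brand and attribute words."""
    text = field.lower()
    id_match = re.search(r"#\s*([a-z0-9-]+)", text)
    product_id = id_match.group(1).upper() if id_match else None
    if id_match:
        text = text.replace(id_match.group(0), " ")
    brand = None
    for variant, canonical in BRANDS.items():
        if variant in text:
            brand, text = canonical, text.replace(variant, " ")
            break
    return {"id": product_id, "brand": brand, "attributes": reconcile(text)}

def phonetic_key(words):
    """Order-independent set of phonetic codes for a list of words."""
    return frozenset(soundex(w) for w in words)

def match_products(usa_fields, europe_descriptions):
    """Return (USA product ID, European description) pairs that match."""
    matches = []
    for field in usa_fields:
        parsed = parse_usa(field)
        key = phonetic_key(parsed["attributes"])
        for description in europe_descriptions:
            if key == phonetic_key(reconcile(description)):
                matches.append((parsed["id"], description))
    return matches
```

    Comparing sets of phonetic codes makes the match insensitive to word order and to small spelling differences, at the cost of occasional false matches that a report for human review would need to surface.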

    After data integration, the company possessed a comprehensive view of all the data known about a subject. The same principles and techniques may be applied to any type of data to create integrated master records.

    The next step is to increase the data's value by enhancing it with additional information.

    Data Enrichment

    Data enrichment entails incorporating additional external data to add value to existing records. There are multiple ways this may occur. Data enrichment may mean enhancing corporate data with third-party data to increase an understanding of the customer and their buying potential and loyalty. Or it may mean advanced geocoding to add enriched geographic information. Knowing, for example, that all the houses in a particular ZIP-code-plus-4 area were built after 1980 and exceed a certain property value provides information to target certain product offerings.

    Intelligent fuzzy matching technology can greatly simplify data integration projects.

    Enriching corporate data by combining it with data from trusted third-party sources can add value.

    An understanding of the behavior of customers can also be attained with analysis of

    certain attributes. By combining that data with specific customer data, one could segment

    customers more effectively to identify specific opportunities.

    Figure 4 illustrates how data improvement technology can add crucial components to a

    regular street address by matching the submitted record to external data sources.

    Submitted Data        Standardized Data
    DataFlux              DataFlux Corporation
    940 Cary Pkwy         940 Cary Pkwy, Ste. 201
    Cary, NC              Cary, NC 27513-2792

    New Data Points Added
    Carrier Route ID          34
    Delivery Point ID         1
    County Number             183
    County Name               Wake
    Congressional District    4
    Latitude                  35.8306
    Longitude                 -78.7841
    Census Block Group        371830535122

    Figure 4 - Data enrichment on a street address.
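
    At its simplest, enrichment of this kind is a lookup against reference data keyed on a standardized value. The sketch below assumes a hypothetical in-memory reference table keyed by ZIP+4, populated with a few of the values shown in Figure 4; a real reference source would be an external postal or geographic database.

```python
# Hypothetical external reference data keyed by ZIP+4 (values from Figure 4).
REFERENCE = {
    "27513-2792": {
        "county_name": "Wake",
        "county_number": "183",
        "congressional_district": "4",
        "latitude": 35.8306,
        "longitude": -78.7841,
    },
}

def enrich(record, reference=REFERENCE):
    """Return a copy of the record with reference fields appended."""
    enriched = dict(record)
    extra = reference.get(record.get("zip4"))
    if extra:
        for field, value in extra.items():
            # Never overwrite data the record already carries.
            enriched.setdefault(field, value)
    # Flag whether a reference match was found, for later review.
    enriched["enriched"] = extra is not None
    return enriched
```

    The lookup only works once the address has been standardized, which is why enrichment follows the data quality and integration phases.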

    Data enrichment can be used for more than just address data. Many types of data,

    including product, inventory and financial data, can have value added by connecting the

    data to external sources.

    This can mean appending industry-standard product codes from registries like UNSPSC or

    eCl@ss to standardize product or inventory records and streamline buying and selling

    goods. Integrating your database with these universal codes can guarantee the

    correctness of inventory, increase efficiency in ordering and help pave the way for smooth

    international operations.

    Data enrichment also helps manage corporate compliance issues, by comparing customer lists or transactions to external lists in accordance with government or industry

    regulations. This can be used to conduct anti-money laundering checks, enact fraud

    detection or integrate do-not-call registries.
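
    Screening a customer list against an external list can be sketched as a set lookup on a normalized key. The example below uses a do-not-call list; the record layout, the function names and the choice to compare on the last ten digits are assumptions made for illustration only.

```python
import re

def normalize_phone(phone):
    """Reduce a phone number to its last 10 digits for comparison."""
    digits = re.sub(r"\D", "", phone or "")
    return digits[-10:]

def screen_do_not_call(customers, registry):
    """Split customers into (callable, suppressed) using the registry."""
    blocked = {normalize_phone(p) for p in registry}
    allowed, suppressed = [], []
    for customer in customers:
        if normalize_phone(customer["phone"]) in blocked:
            suppressed.append(customer)
        else:
            allowed.append(customer)
    return allowed, suppressed
```

    Normalizing both sides before comparing matters here: without it, the same number written in two formats would slip past the check.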

    Data enrichment can be used to add commodity coding standards, such as eCl@ss or UNSPSC.

    Data enrichment aids in compliance initiatives by integrating watch list and do-not-call information.

    Data Monitoring

    Data monitoring is essential to a complete data governance program, ensuring that hard-won data improvement and enrichment efforts aren't degraded by the creeping return of data errors. Active data monitoring can help companies understand the condition of their data and isolate and correct the causes of data quality issues.

    Data monitoring can take a number of forms. The simplest version automates a profiling report, a periodic analysis of the data that searches for exceptions and non-standard data. Trigger events, such as too great a percentage of exceptional data, would generate an email or a system alert to the individuals or department responsible for the data.
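
    A threshold-based trigger of this kind might look like the following sketch. The validity rule, the threshold value and the notify callback are all assumptions; in practice the callback would send an email or raise a system alert, and a scheduler would run the check periodically.

```python
def has_valid_state(row):
    """Hypothetical rule: STATE must be a two-letter code."""
    state = row.get("state") or ""
    return len(state) == 2 and state.isalpha()

def exception_rate(rows, is_valid):
    """Fraction of rows that fail the validity check."""
    if not rows:
        return 0.0
    failures = sum(1 for row in rows if not is_valid(row))
    return failures / len(rows)

def monitor(rows, is_valid, threshold, notify):
    """Run one profiling pass; call notify() when the rate exceeds threshold."""
    rate = exception_rate(rows, is_valid)
    if rate > threshold:
        notify(f"Data quality alert: {rate:.0%} of records failed validation "
               f"(threshold {threshold:.0%})")
        return True
    return False
```

    Passing the notification function in as a parameter keeps the check itself independent of how the responsible department prefers to be alerted.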

    Another, more dynamic - and more effective - method of data monitoring is to actively

    enforce business rules as web services. Using this method, the same rules which were

    used to cleanse and enhance the data can be applied to data in real time as it enters and

    moves through the enterprise. The advantages of this approach include the ability to:

    - Enforce data governance rules: Ensure business-specific rules for how organizational data should be managed and handled are followed.

    - Understand and refine mission-critical processes: Violations or exceptions to business rules are logged in a repository, to track and address trends.

    - Invoke events to correct the data: The monitoring engine can invoke events to automatically correct data, email business users, log the data to the repository, or simply write the data to a database table.
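
    The three capabilities listed above can be illustrated with a small in-process rule engine. This is a sketch under stated assumptions: the rule names, the record fields, the list used as a repository and the alert callback are hypothetical, and exposing the function as a web service is left out.

```python
# Each rule is (name, check, optional correction). All names are hypothetical.
RULES = [
    ("state_is_uppercase",
     lambda r: r["state"] == r["state"].upper(),
     lambda r: {**r, "state": r["state"].upper()}),
    ("email_present",
     lambda r: bool(r.get("email")),
     None),  # no safe automatic fix; a person must follow up
]

def enforce(record, rules, repository, alert):
    """Apply rules to one incoming record; log violations, then fix or alert."""
    for name, check, correct in rules:
        if check(record):
            continue
        # Log every violation so trends can be tracked later.
        repository.append({"rule": name, "record": dict(record)})
        if correct is not None:
            record = correct(record)  # automatic correction event
        else:
            alert(f"Rule {name} violated")  # notify a business user
    return record
```

    Because the rules are plain data, the same list can be reused for batch cleansing and for checking records as they arrive.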

    Data monitoring allows business rules developed through the process to be deployed as ongoing safeguards against declining data quality.

    Summary

    The quality of any business analysis is only as good as the data at its foundation. Without

    data that is consistent, accurate and reliable across the enterprise, an organization can

    easily reach misleading, faulty and potentially harmful conclusions.

    An organization's success in data analysis begins with finding a methodology and solution that encompasses each of these building blocks: data profiling, data quality, data integration, data enrichment and data monitoring.

    Today's powerful new data quality and data integration technology allows organizations to

    support data-driven initiatives on an unprecedented level, providing a unified view of

    customer, product, supplier and other data assets on a wide variety of applications across

    the enterprise.

    Through the combination of an intelligent methodology and appropriate technology, an

    organization can elevate the role of data quality to an ongoing corporate activity, enabling

    better business decisions and competitive advantage, all based on high-quality

    information.

    Analyze, improve and control corporate data.