Enterprise Data Quality Dashboards and Alerts: Holistic Data Quality

Jay Zaidi, Bonnie O’Neil (Fannie Mae). Data Governance Winter Conference, Ft. Lauderdale, Florida, November 16-18, 2011

Page 1: Enterprise Data Quality Dashboards and Alerts:  Holistic Data Quality Jay Zaidi Bonnie O’Neil

Enterprise Data Quality Dashboards and Alerts: Holistic Data Quality

Jay Zaidi, Bonnie O’Neil (Fannie Mae)

Data Governance Winter Conference, Ft. Lauderdale, Florida, November 16-18, 2011

Page 2

Agenda

1. Introduction

2. Data Quality Challenges and Opportunities

3. Holistic Data Quality (HDQ)

4. Enterprise Data Quality Solutions Architecture

5. Enterprise Data Quality Dashboard Example

Page 3

Meet the Authors – Jay Zaidi

Enterprise Data Quality Program Lead, Fannie Mae

15+ years in Enterprise Data Management and Solution Architecture

Specialized in Financial Services and Healthcare domains

Contact: 202-590-3131 [email protected]

Page 4

Meet the Authors – Bonnie O’Neil

Technical Data Architect, Fannie Mae

20+ years as a Data Architect

Author: 3 books (most recent: Business Metadata)

Author, over 50 articles & white papers

Page 5

Data Quality Management – Challenges and Opportunities

Challenges:

• Data Silos
• Data Volumes and Velocity
• Complex Data Architectures
• Real Time Enterprise Requirements
• Lack of Accountability
• Reactive Mode
• Lack of Straight Through Processing
• Structured and Unstructured Data (email, video, logs, system events etc.)

High level of maturity in Data Quality Management is required to address operational challenges.

Opportunities: “Holistic Data Quality (HDQ)”

• Data Optimization and Scalability
• Simplify Data Architecture
• Real Time Data Quality Monitoring
• Strong Data Governance
• Proactive Data Quality Controls
• Automated controls and monitoring
• Leverage “Big Data” Solutions

Page 6

The Data Quality Maturity Journey

STEP ONE: FOUNDATION & FRAMEWORK

• DQ Use Cases
• Solution Architecture
• Industry Tool Selection
• Consistent DQ Definitions

STEP TWO: CONSTRUCTING THE RAILROAD

• Tool Deployment
• Reporting Capabilities
• Training
• Communication

STEP THREE: EXECUTION

• Change Management
• Awareness
• Proactive DQ Controls
• DQ Continuous Improvement
• DQ Services

Robust data quality management is required to support Regulatory Compliance, Risk Management, Accounting, Financial reporting and other business functions.

Page 7

The Data Architecture Spaghetti

How do you manage the quality of business critical data in a dynamic and highly complex environment?

[Diagram labels: Transactional Store, Operational Data Store, Data Mart, Data Warehouses; Department One, Department Two, Department Three]

Diagram by Arnon Rotem-Gal-Oz, April 2007

Page 8

The Information Supply Chain

Diagram by George Marinos - The Information Supply Chain: Achieving Business Objectives by Enhancing Critical Business Processes, April 2005

Each link of the information supply chain is dependent on the others, so strong controls are needed to manage business critical data.

Transparency into quality across the supply chain

Page 9

Guiding Principles

• Identify and address data quality issues at the point of entry into the ecosystem

• Externalize data quality rules from code (rules engine, calculation libraries, derivation logic, etc., with governance and controls)

• Manage enterprise critical data at the enterprise level (enterprise DG and enterprise DQ group) and line of business data at the local level (local DG and DQ)

• Measure quality of data at systems of record and critical stores, compare against thresholds and tolerances, and remediate proactively

• EDQ team will monitor and manage
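The principle of externalizing data quality rules from code can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the JSON rule format, the rule ids, the field names and the three check types are hypothetical and are not taken from the presentation or from any particular rules engine.

```python
import json

# Hypothetical rule definitions. In practice these would live in a governed
# rules repository rather than inside application code.
RULES_JSON = """
[
  {"id": "R1", "field": "loan_amount", "check": "min", "value": 0},
  {"id": "R2", "field": "state", "check": "in_set", "value": ["FL", "VA", "MD"]},
  {"id": "R3", "field": "borrower_id", "check": "not_null"}
]
"""

# The only logic kept in code is a small set of generic check types.
CHECKS = {
    "not_null": lambda v, _: v is not None and v != "",
    "min": lambda v, limit: v is not None and v >= limit,
    "in_set": lambda v, allowed: v in allowed,
}


def evaluate(record, rules):
    """Return the ids of the rules that the record violates."""
    failed = []
    for rule in rules:
        check = CHECKS[rule["check"]]
        if not check(record.get(rule["field"]), rule.get("value")):
            failed.append(rule["id"])
    return failed


rules = json.loads(RULES_JSON)
good = {"loan_amount": 250000, "state": "FL", "borrower_id": "B-1"}
bad = {"loan_amount": -5, "state": "ZZ", "borrower_id": ""}
print(evaluate(good, rules))  # []
print(evaluate(bad, rules))   # ['R1', 'R2', 'R3']
```

Because the rules are data, they can be reviewed, versioned and changed under governance without redeploying the applications that apply them.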

Page 10

Data Quality Maturity

Page 11

Data Quality Use Cases

• Process externally supplied data

• Reconcile data between data stores or data store and files

• Certify the quality of data

• Score the quality of data

• Identify data anomalies in data (db, files, xml, etc.)
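As one illustration of the reconciliation use case, the sketch below compares two keyed record sets and reports keys missing on either side plus field-level mismatches. The loan ids, field names and values are made up for the example, and plain dictionaries stand in for the two data stores.

```python
def reconcile(source, target, fields):
    """Compare two keyed record sets and report where they differ."""
    missing_in_target = sorted(source.keys() - target.keys())
    missing_in_source = sorted(target.keys() - source.keys())
    mismatches = []
    for key in sorted(source.keys() & target.keys()):
        for field in fields:
            left = source[key].get(field)
            right = target[key].get(field)
            if left != right:
                mismatches.append((key, field, left, right))
    return {
        "missing_in_target": missing_in_target,
        "missing_in_source": missing_in_source,
        "mismatches": mismatches,
    }


# Hypothetical contents of a system of record and a downstream data mart.
source = {
    "L1": {"balance": 100.0, "status": "ACTIVE"},
    "L2": {"balance": 250.0, "status": "ACTIVE"},
    "L3": {"balance": 75.0, "status": "CLOSED"},
}
target = {
    "L1": {"balance": 100.0, "status": "ACTIVE"},
    "L2": {"balance": 205.0, "status": "ACTIVE"},
    "L4": {"balance": 10.0, "status": "ACTIVE"},
}

report = reconcile(source, target, ["balance", "status"])
print(report["missing_in_target"])  # ['L3']
print(report["missing_in_source"])  # ['L4']
print(report["mismatches"])         # [('L2', 'balance', 250.0, 205.0)]
```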

Page 12

Data Quality Toolkit

• DQ Standards and Policies
• DQ Methodology
• DQ Dimensional Framework
• DQ Development and Support Model (roles, responsibilities, deliverables by team across the SDLC life cycle)
• DQ Best Practices
• Data Quality Requirements Template
• Data Quality Metrics Template
• DQ tasks inside SDLC Methodology
• DQ Solution Architecture
• DQ Training Documentation
• DQ Business Case Deck with elevator speech
• Governance structure – custodians, trustees, stewards, business data lead
• Map of critical data, SORs, custodian, trustee, BDL
• Project plan activities related to a DQ project
• On-boarding documentation for tools, dashboards etc.
• DQ Deployment Model (Centralized vs. Federated vs. Hybrid)
• Lessons Learned/Challenges you will hit
• Change Management Plan
• Stakeholder Communication Plan
• DQ Charter, Strategy, Approach, Sponsorship
• DQ Case Studies – business value add
• Synergy between DQ and DG
• Organizational structure

Page 13

Conceptual Solution Architecture

Page 14

Deployment Models

Central vs Federated

Page 15

Challenges You Will Face and Your Response

Page 16

Typical Business Scenario

Analyze Data and Conduct Forensics (Data Quality Tool)

Implement Real Time Data Quality using DQ Services (Data Quality Tool)

Identify anomalies and remediate issues (Data Quality Tool and EDQ Dashboard)

Internally or Externally Supplied data

Enterprise Applications

Reports & Executive Dashboards

Enterprise Data Stores (Transactional, Operational, Marts and Warehouses)

The Enterprise Data Quality Platform provides the tools, methodologies and best practices to identify and remediate data quality issues.

Page 17

Issue Logging and Resolution

Page 18

Holistic Data Quality

Our focus should be on addressing systemic issues. This requires a switch from “reactive” to “proactive” approaches to data quality and quality that is not evaluated or managed in silos, but addressed using a holistic cross-silo approach. “Holistic Data Quality (HDQ)” is the term that I have coined to address this need.

– Jay Zaidi

Implementing HDQ at the enterprise level is a strategic, multi-year effort for mid to large-sized firms. If done right, the return on investment is manyfold.

Page 19

Do Not Boil The Ocean

General population of data elements*

Critical data for the enterprise* (“Enterprise Critical”)

Critical data for a line of business* (“LOB Critical”)

Figure labels (estimated counts): 10,000 to 20,000; 2,000 to 3,000; 400 to 500

Identify data critical for the enterprise

Initial focus should be on “Enterprise Critical” data

Narrowing the scope of the effort will ensure success

Enterprise level governance and quality efforts should focus on Enterprise Critical data. Lines of business should govern and manage the quality of their business critical data.

* Estimates Only

Page 20

Dimensions of Data Quality

The concept of Dimensions of Data Quality has been established by many authors in the industry, such as David Loshin and Danette McGilvray:

“To be able to correlate data quality issues to business impacts, we must be able to both classify our data quality expectations as well as our business impact criteria.” - David Loshin

Dimensions are facets or specific measurements of data quality, pertaining to specific data elements

The authors propose many variations but the main ones that most agree on are:

– Accuracy
– Conformity
– Completeness
– Consistency/Duplication
– Timeliness (sometimes called Currency)
– Integrity

Data Quality Dimensions facilitate the consistent definition of data quality requirements and metrics across various organizations.

Page 21

Data Quality Development and Support Model

Page 22

Business Intelligence for Enterprise Data Quality

SOLUTION COMPONENTS

• Business intelligence tool (COTS)
• Data quality Commercial-off-the-shelf (COTS) product
• Data quality data mart (custom)
• Data quality issue management system
• Extract Transform and Load (ETL) product
• Enterprise Service Bus (SOA and Data Quality Services)

[Diagram labels: Data Stores, Files, Data Quality Tool (Profiling/Rule Execution), Data Quality Rules, Data Quality Results, ETL, Data Quality Mart, Business Intelligence Tool, Enterprise Dashboard]
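To make the role of the data quality mart more concrete, here is a minimal sketch in which rule execution results are loaded into a single table and summarized the way a business intelligence tool might query them for a dashboard. The table name `dq_result`, its columns and the sample rows are assumptions for illustration; the presentation does not describe the mart's schema, and an in-memory SQLite database stands in for the custom mart.

```python
import sqlite3

# In-memory database standing in for the custom data quality mart.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE dq_result (
        run_date TEXT,
        line_of_business TEXT,
        data_element TEXT,
        dimension TEXT,
        rows_checked INTEGER,
        rows_failed INTEGER
    )
    """
)

# Hypothetical results produced by the data quality tool and loaded via ETL.
rows = [
    ("2011-11-01", "Retail", "borrower_ssn", "Completeness", 1000, 20),
    ("2011-11-01", "Retail", "loan_amount", "Accuracy", 1000, 80),
    ("2011-11-01", "Wholesale", "product_code", "Conformity", 500, 5),
]
conn.executemany("INSERT INTO dq_result VALUES (?, ?, ?, ?, ?, ?)", rows)


def pass_rate_by_lob(connection):
    """Return (line_of_business, pass rate in percent) pairs for a dashboard."""
    query = """
        SELECT line_of_business,
               ROUND(100.0 * (SUM(rows_checked) - SUM(rows_failed))
                     / SUM(rows_checked), 1)
        FROM dq_result
        GROUP BY line_of_business
        ORDER BY line_of_business
    """
    return connection.execute(query).fetchall()


print(pass_rate_by_lob(conn))  # [('Retail', 95.0), ('Wholesale', 99.0)]
```

Keeping results in a mart, separate from the tool that produces them, lets trends and breakdowns be reported without rerunning the rules.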

Page 23

Replace Paper Reports with Business Intelligence

Operational Incidents

Data Quality Issues Report

Weekly Data Management Status Reports

Audit Findings

Regulatory Compliance Issues

Replace mounds of paper with a business intelligence solution – gain access to summary and detailed information on key quality indicators on-demand.

Page 24

ENTERPRISE DATA QUALITY DASHBOARD (Enterprise View)

[Dashboard panel labels: Quality by Line of Business (Wholesale, Retail, Commercial); Data Quality Maturity; Critical Data Breakdown; Release 1; Release 2; Trending of Data Quality; Regional Trend; Customer Data; Product Data; Health Indicators; Quality Rating for Each Data Element; Overall Health]

Page 25

ENTERPRISE DATA QUALITY DASHBOARD (Retail Business View)

[Dashboard panel labels: Release 1; Release 2; Trending of Data Quality; Data Store Trend; Loan Data; Borrower Data; Health Indicators; Quality Rating for Each LOB Data Element; Overall Health; Critical Data Breakdown; Data Quality Server Utilization]

Page 26

Continuously Measure and Improve Quality

Step 1 - Define: Define the scope, goal, budget, duration and the data quality problem to be addressed.

Step 2 - Measure: All relevant data quality statistics and measures important to the enterprise are collected at this stage.

Step 3 - Analyze and Improve: Analysis of the data collected in the previous phase is conducted and root cause(s) identified. Data remediation is implemented to improve the quality of data.

Step 4 - Control: Monitor the quality after remediation to ensure that data is defect free. If there are any further changes to be made, the team makes changes and again measures the quality.

The Enterprise Data Quality dashboard provides transparency into data quality hotspots that must be addressed proactively.
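The Control step and the idea of surfacing hotspots proactively could be sketched as a simple alerting check over successive measurement runs. The 95% threshold, the 2-point maximum drop, the element names and the pass rates below are all assumed values chosen for illustration, not figures from the presentation.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    data_element: str
    run: int            # sequence number of the measurement run
    pass_rate: float    # percent of rows passing all rules


def find_alerts(history, threshold=95.0, max_drop=2.0):
    """Flag elements whose latest pass rate is low or has dropped sharply."""
    latest, previous = {}, {}
    for m in sorted(history, key=lambda m: m.run):
        if m.data_element in latest:
            previous[m.data_element] = latest[m.data_element]
        latest[m.data_element] = m

    alerts = []
    for name in sorted(latest):
        cur = latest[name]
        if cur.pass_rate < threshold:
            alerts.append(
                f"{name}: pass rate {cur.pass_rate:.1f}% is below {threshold:.1f}%"
            )
        prev = previous.get(name)
        if prev is not None and prev.pass_rate - cur.pass_rate > max_drop:
            drop = prev.pass_rate - cur.pass_rate
            alerts.append(
                f"{name}: pass rate dropped {drop:.1f} points since run {prev.run}"
            )
    return alerts


# Hypothetical measurements from two consecutive runs.
history = [
    Measurement("loan_amount", 1, 99.0),
    Measurement("loan_amount", 2, 96.0),
    Measurement("borrower_ssn", 1, 94.0),
    Measurement("borrower_ssn", 2, 93.5),
    Measurement("property_zip", 1, 98.0),
    Measurement("property_zip", 2, 98.5),
]

alerts = find_alerts(history)
for line in alerts:
    print(line)
# borrower_ssn: pass rate 93.5% is below 95.0%
# loan_amount: pass rate dropped 3.0 points since run 1
```

Checking both the absolute level and the change between runs means a sharp decline is flagged before the element actually breaches its threshold.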

Page 27

Lessons Learned

Changing behavior is hard – so use a carrot and stick approach to get people to change

Recognize team members that display the expected behavior and highlight what they did

Roll out the data quality platform (tools, methodologies, best practices) in a phased manner

Educate team members at all levels of the enterprise on the value of strong governance and data quality

Facilitate adoption of the tools and business intelligence offerings by providing them to all organizations free of cost or at a very low cost

Highlight the fact that data is “owned” by the enterprise and not by a particular individual or line of business

Hold people accountable by using operational metrics, data quality metrics and compliance metrics to make your case

Measure the hard savings and business value added by the program and communicate up and down the chain on a regular basis (KPI Dashboard)

Page 28

Summary

Effective data management provides order out of chaos

Implementing “Holistic Data Quality” provides transparency into data quality issues across the information supply chain and helps in identifying systemic issues

Focus must be on “Enterprise Critical” data initially. Do not try to boil the ocean.

The solution architecture’s core components are the data quality COTS product, a data quality Data Mart and a Business Intelligence tool

Proactive monitoring and measurement of data quality, coupled with an alerting mechanism, significantly reduces operational incidents

Implementing HDQ is a strategic initiative and requires C-level sponsorship and support

Page 29

Questions!!

Page 30

Typical Current State Data Flow

[Diagram labels: External Data Feeds, Transactional and Operational Stores, Data Warehouse, Data Marts; marker: Potential data quality problem]

The current siloed approach to data management is wasteful and doesn’t provide transparency into systemic issues.

Page 31

Future State Data Flow: Continuous Data Quality Monitoring

[Diagram labels: External Data Feeds, Transactional and Operational Stores, Data Warehouse, Data Marts, DQ Monitoring]

Enterprise Data Architecture should enable straight through processing and offer operational efficiencies.

Page 32

Key Process Steps

For each data element that you will monitor do the following (use a template):

– Identify the trustee and custodian (DG/DQ)

– Identify the system of record (DG/DQ)

– Identify the dimensions of data quality that apply (Custodian/Trustee/DQ)

– Capture the data quality rules per dimension (Custodian/Trustee/DQ)

– Capture the frequency of rule execution (Custodian/Trustee)

– Capture the data quality thresholds and tolerances for Red/Yellow/Green status (Custodian/Trustee)

– Identify the key metrics that you wish to capture (Custodian/Trustee)

– Conduct the logical to physical data mapping (to the data source) for the data element (DQ/Technology)
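The per-element template described in these steps might look like the following sketch. The class name, field names, sample values and the specific Red/Yellow/Green cut-offs are hypothetical; the slide only says that a template should capture these items.

```python
from dataclasses import dataclass


@dataclass
class MonitoredElement:
    name: str
    trustee: str
    custodian: str
    system_of_record: str
    dimensions: list        # data quality dimensions that apply
    rules: dict             # dimension -> list of rule descriptions
    frequency: str          # how often the rules are executed
    green_min: float        # pass rate (%) at or above this is Green
    yellow_min: float       # pass rate (%) at or above this is Yellow
    metrics: list           # key metrics to capture
    physical_mapping: str   # logical element -> physical source location

    def status(self, pass_rate):
        """Translate a measured pass rate into a Red/Yellow/Green status."""
        if pass_rate >= self.green_min:
            return "Green"
        if pass_rate >= self.yellow_min:
            return "Yellow"
        return "Red"


# Hypothetical entry for one monitored data element.
element = MonitoredElement(
    name="Borrower Date of Birth",
    trustee="Retail Business Lead",
    custodian="Loan Operations",
    system_of_record="LOAN_ORIGINATION",
    dimensions=["Completeness", "Conformity"],
    rules={
        "Completeness": ["Date of birth must be populated"],
        "Conformity": ["Date of birth must be a valid calendar date"],
    },
    frequency="daily",
    green_min=98.0,
    yellow_min=95.0,
    metrics=["rows checked", "rows failed", "pass rate"],
    physical_mapping="LOAN_ORIGINATION.BORROWER.BIRTH_DT",
)

print(element.status(99.2))  # Green
print(element.status(96.0))  # Yellow
print(element.status(90.0))  # Red
```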

Page 33

Dimensions of Data Quality - Explanation

There are a dozen or more Data Quality Dimensions that can be defined, but organizations should pick the ones that best meet their needs.

Accuracy: How much does the data conform to the real world?

Conformity: How much does the data conform to formats and domain values?

Integrity: Does the data conform to integrity rules appropriately? Are relationships between elements retained?

Completeness: How much required data is missing?

Duplication: Does the same data exist in multiple systems? If so, is it represented the same?

Currency: How current is the data? When was it last entered or refreshed?
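Several of these dimensions lend themselves to simple measurements. The sketch below computes Completeness, Conformity, Duplication and Currency as ratios over a small made-up record set. The field names, the 5-digit ZIP pattern and the 30-day freshness window are assumptions. Accuracy and Integrity are left out because they need a real-world reference or related data sets to check against. Duplication is simplified here to repeated keys within one data set rather than across systems.

```python
import re
from datetime import date

# Hypothetical records; the last row deliberately repeats id 3.
records = [
    {"id": 1, "zip": "33301", "email": "a@example.com", "updated": date(2011, 11, 1)},
    {"id": 2, "zip": "3330", "email": None, "updated": date(2011, 6, 1)},
    {"id": 3, "zip": "22102", "email": "c@example.com", "updated": date(2011, 10, 20)},
    {"id": 3, "zip": "22102", "email": "c@example.com", "updated": date(2011, 10, 20)},
]


def completeness(rows, field):
    """Share of rows in which the field is populated."""
    return sum(1 for r in rows if r[field] not in (None, "")) / len(rows)


def conformity(rows, field, pattern):
    """Share of rows whose field value fully matches the expected format."""
    regex = re.compile(pattern)
    return sum(1 for r in rows if r[field] and regex.fullmatch(r[field])) / len(rows)


def duplication(rows, key):
    """Share of rows whose key repeats a key seen earlier."""
    seen, dupes = set(), 0
    for r in rows:
        if r[key] in seen:
            dupes += 1
        seen.add(r[key])
    return dupes / len(rows)


def currency(rows, field, as_of, max_age_days):
    """Share of rows refreshed within max_age_days of the as_of date."""
    return sum(1 for r in rows if (as_of - r[field]).days <= max_age_days) / len(rows)


print(completeness(records, "email"))                       # 0.75
print(conformity(records, "zip", r"\d{5}"))                 # 0.75
print(duplication(records, "id"))                           # 0.25
print(currency(records, "updated", date(2011, 11, 16), 30)) # 0.75
```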