Enterprise Data Quality Dashboards and Alerts: Holistic Data Quality
Jay Zaidi, Bonnie O’Neil (Fannie Mae)
Data Governance Winter Conference
Ft. Lauderdale, Florida
November 16-18, 2011
Page 2
Agenda
1 Introduction
2 Data Quality Challenges and Opportunities
3 Holistic Data Quality (HDQ)
4 Enterprise Data Quality Solutions Architecture
5 Enterprise Data Quality Dashboard Example
Page 3
Meet the Authors – Jay Zaidi
Enterprise Data Quality Program Lead, Fannie Mae
15+ years in Enterprise Data Management and Solution Architecture
Specialized in Financial Services and Healthcare domains
Contact: 202-590-3131 [email protected]
Page 4
Meet the Authors – Bonnie O’Neil
Technical Data Architect, Fannie Mae
20+ years as a Data Architect
Author: 3 books – most recent: Business Metadata
Author: over 50 articles & white papers
Page 5
Data Quality Management – Challenges and Opportunities
Data Silos
Data Volumes and Velocity
Complex Data Architectures
Real Time Enterprise Requirements
Lack of Accountability
Reactive Mode
Lack of Straight Through Processing
Structured and Unstructured Data (email, video, logs, system events, etc.)
High level of maturity in Data Quality Management is required to address operational challenges.
“Holistic Data Quality (HDQ)”
Data Optimization and Scalability
Simplify Data Architecture
Real Time Data Quality Monitoring
Strong Data Governance
Proactive Data Quality Controls
Automated controls and monitoring
Leverage “Big Data” Solutions
Page 6
The Data Quality Maturity Journey
STEP ONE: FOUNDATION & FRAMEWORK
• DQ Use Cases
• Solution Architecture
• Industry Tool Selection
• Consistent DQ Definitions
STEP TWO: CONSTRUCTING THE RAILROAD
• Tool Deployment
• Reporting Capabilities
• Training
• Communication
STEP THREE: EXECUTION
• Change Management
• Awareness
• Proactive DQ Controls
• DQ Continuous Improvement
• DQ Services
Robust data quality management is required to support Regulatory Compliance, Risk Management, Accounting, Financial reporting and other business functions.
Page 7
The Data Architecture Spaghetti
How do you manage the quality of business critical data in a dynamic and highly complex environment?
[Diagram labels: Transactional Stores, Operational Data Stores, Data Marts, Data Warehouses; Department One, Department Two, Department Three]
Diagram by Arnon Rotem-Gal-Oz, April 2007
Page 8
The Information Supply Chain
Diagram by George Marinos - The Information Supply Chain: Achieving Business Objectives by Enhancing Critical Business Processes, April 2005
Each link of the information supply chain is dependent on the others – strong controls are needed to manage business critical data.
Transparency into quality across the supply chain
Page 9
Guiding Principles
Identify and address data quality issues at the point of entry into the ecosystem
Externalize data quality rules from code (rules engine, calculation libraries, derivation logic, etc., with governance and controls)
Manage enterprise critical data at the enterprise level (enterprise DG and DQ groups) and line of business data at the local level (local DG and DQ)
Measure the quality of data at systems of record and critical stores, compare it against thresholds and tolerances, and remediate proactively
EDQ team will monitor and manage
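The first two principles can be illustrated with a minimal sketch in Python. It assumes rules are stored as data (here a plain list; in practice they might come from a governed rules repository) and applied to each record as it enters the ecosystem. The rule ids, field names and check types below are hypothetical and are not taken from any particular data quality product.

```python
import re

# Hypothetical rule definitions, kept as data rather than hard-coded
# in application logic, so they can be governed and changed separately.
RULES = [
    {"id": "R1", "field": "loan_amount", "check": "range", "min": 1, "max": 5_000_000},
    {"id": "R2", "field": "state", "check": "pattern", "regex": r"[A-Z]{2}"},
    {"id": "R3", "field": "borrower_id", "check": "required"},
]


def violates(rule, record):
    """Return True when the record breaks the given rule."""
    value = record.get(rule["field"])
    if value is None or value == "":
        # In this sketch a missing value breaks every rule type.
        return True
    if rule["check"] == "required":
        return False
    if rule["check"] == "range":
        return not (rule["min"] <= value <= rule["max"])
    if rule["check"] == "pattern":
        return re.fullmatch(rule["regex"], str(value)) is None
    raise ValueError(f"unknown check type: {rule['check']}")


def validate_at_entry(record, rules=RULES):
    """Return the ids of all rules the incoming record violates."""
    return [rule["id"] for rule in rules if violates(rule, record)]


good = {"loan_amount": 250_000, "state": "FL", "borrower_id": "B-1"}
bad = {"loan_amount": 0, "state": "Florida", "borrower_id": ""}
print(validate_at_entry(good))  # []
print(validate_at_entry(bad))   # ['R1', 'R2', 'R3']
```

Because the rules live outside the validation function, a rule can be added or tightened without touching the code that applies it.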
Page 10
Data Quality Maturity
Page 11
Data Quality Use Cases
Process externally supplied data
Reconcile data between data stores, or between a data store and files
Certify the quality of data
Score the quality of data
Identify anomalies in data (databases, files, XML, etc.)
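As an illustration of the reconciliation use case, the following Python sketch compares two record sets by key and reports records missing on either side and records whose values differ. The function name, field names and sample rows are assumptions made for this example only.

```python
def reconcile(source_rows, target_rows, key, fields):
    """Compare two record sets by key and report the differences."""
    source = {row[key]: row for row in source_rows}
    target = {row[key]: row for row in target_rows}
    shared = source.keys() & target.keys()
    return {
        "missing_in_target": sorted(source.keys() - target.keys()),
        "missing_in_source": sorted(target.keys() - source.keys()),
        # Keys present on both sides whose compared fields disagree.
        "mismatched": sorted(
            k for k in shared
            if any(source[k].get(f) != target[k].get(f) for f in fields)
        ),
    }


# Made-up rows standing in for a system of record and a downstream mart.
system_of_record = [
    {"id": 1, "balance": 100},
    {"id": 2, "balance": 200},
    {"id": 3, "balance": 300},
]
data_mart = [
    {"id": 1, "balance": 100},
    {"id": 2, "balance": 250},
    {"id": 4, "balance": 400},
]

report = reconcile(system_of_record, data_mart, key="id", fields=["balance"])
print(report)
# {'missing_in_target': [3], 'missing_in_source': [4], 'mismatched': [2]}
```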
Page 12
Data Quality Toolkit
DQ Standards and Policies
DQ Methodology
DQ Dimensional Framework
DQ Development and Support Model (roles, responsibilities, deliverables by team across the SDLC life cycle)
DQ Best Practices
Data Quality Requirements Template
Data Quality Metrics Template
DQ tasks inside SDLC Methodology
DQ Solution Architecture
DQ Training Documentation
DQ Business Case Deck with elevator speech
Governance structure – custodians, trustees, stewards, business data lead
Map of critical data, SORs, custodian, trustee, BDL
Project plan activities related to a DQ project
On-boarding documentation for tools, dashboards, etc.
DQ Deployment Model (Centralized vs. Federated vs. Hybrid)
Lessons Learned/Challenges you will hit
Change Management Plan
Stakeholder Communication Plan
DQ Charter, Strategy, Approach, Sponsorship
DQ Case Studies – business value add
Synergy between DQ and DG
Organizational structure
Page 13
Conceptual Solution Architecture
Page 14
Deployment Models
Central vs Federated
Page 15
Challenges You Will Face and Your Response
Page 16
Typical Business Scenario
Internally or Externally Supplied data
Analyze Data and Conduct Forensics (Data Quality Tool)
Implement Real Time Data Quality using DQ Services (Data Quality Tool)
Identify anomalies and remediate issues (Data Quality Tool and EDQ Dashboard)
Enterprise Data Stores (Transactional, Operational, Marts and Warehouses)
Enterprise Applications
Reports & Executive Dashboards
The Enterprise Data Quality Platform provides the tools, methodologies and best practices to identify and remediate data quality issues.
Page 17
Issue Logging and Resolution
Page 18
Holistic Data Quality
Our focus should be on addressing systemic issues. This requires a switch from “reactive” to “proactive” approaches to data quality and quality that is not evaluated or managed in silos, but addressed using a holistic cross-silo approach. “Holistic Data Quality (HDQ)” is the term that I have coined to address this need.
– Jay Zaidi
Implementing HDQ at the enterprise level is a strategic, multi-year effort for mid to large-sized firms. If done right, the return on investment is manyfold.
Page 19
Do Not Boil The Ocean
Narrowing the scope of the effort will ensure success
Identify data critical for the enterprise
General population of data elements*: 10,000 to 20,000
Critical data for a line of business* (“LOB Critical”): 2,000 to 3,000
Critical data for the enterprise* (“Enterprise Critical”): 400 to 500
Initial focus should be on “Enterprise Critical” data
Enterprise level governance and quality efforts should focus on Enterprise Critical data. Lines of business should govern and manage the quality of their business critical data.
* Estimates Only
Page 20
Dimensions of Data Quality
The concept of Dimensions of Data Quality has been established by many authors in the industry, such as David Loshin and Danette McGilvray:
“To be able to correlate data quality issues to business impacts, we must be able to both classify our data quality expectations as well as our business impact criteria.” - David Loshin
Dimensions are facets or specific measurements of data quality, pertaining to specific data elements
The authors propose many variations but the main ones that most agree on are:
– Accuracy
– Conformity
– Completeness
– Consistency/Duplication
– Timeliness (sometimes called Currency)
– Integrity
Data Quality Dimensions facilitate the consistent definition of data quality requirements and metrics across various organizations.
Page 21
Data Quality Development and Support Model
Page 22
Business Intelligence for Enterprise Data Quality
SOLUTION COMPONENTS
Business intelligence tool (COTS)
Data quality commercial-off-the-shelf (COTS) product
Data quality data mart (custom)
Data quality issue management system
Extract, Transform and Load (ETL) product
Enterprise Service Bus (SOA and Data Quality Services)
[Diagram labels: Data Stores, Files, Data Quality Tool (Profiling/Rule Execution), Data Quality Rules, Data Quality Results, ETL, Data Quality Mart, Business Intelligence Tool, Enterprise Dashboard]
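To make the role of the data quality mart concrete, here is a minimal sketch in Python that uses an in-memory SQLite database as a stand-in for the mart. The table name `dq_result`, its columns and the sample rows are all assumptions made for illustration; the slides do not describe the actual mart schema.

```python
import sqlite3

# In-memory SQLite database standing in for the data quality mart.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE dq_result (
        run_date         TEXT,
        line_of_business TEXT,
        data_element     TEXT,
        dimension        TEXT,
        rows_checked     INTEGER,
        rows_failed      INTEGER
    )
    """
)

# Made-up rule execution results, as an ETL job might load them.
results = [
    ("2011-11-01", "Retail", "loan_amount", "Completeness", 1000, 20),
    ("2011-11-01", "Retail", "borrower_id", "Conformity", 1000, 10),
    ("2011-11-01", "Wholesale", "loan_amount", "Completeness", 500, 50),
]
conn.executemany("INSERT INTO dq_result VALUES (?, ?, ?, ?, ?, ?)", results)


def quality_by_lob(connection):
    """Return (line_of_business, pass rate in percent) for a dashboard panel."""
    query = """
        SELECT line_of_business,
               100.0 * (SUM(rows_checked) - SUM(rows_failed)) / SUM(rows_checked)
        FROM dq_result
        GROUP BY line_of_business
        ORDER BY line_of_business
    """
    return connection.execute(query).fetchall()


print(quality_by_lob(conn))  # [('Retail', 98.5), ('Wholesale', 90.0)]
```

A business intelligence tool would run queries of this kind against the mart instead of reading the data quality tool's output directly.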
Page 23
Replace Paper Reports with Business Intelligence
Operational Incidents
Data Quality Issues Report
Weekly Data Management Status Reports
Audit Findings
Regulatory Compliance Issues
Replace mounds of paper with a business intelligence solution – gain access to summary and detailed information on key quality indicators on-demand.
Page 24
ENTERPRISE DATA QUALITY DASHBOARD (Enterprise View)
[Dashboard panels: Quality by Line of Business (Wholesale, Retail, Commercial); Data Quality Maturity; Critical Data Breakdown; Release 1; Release 2; Trending of Data Quality; Regional Trend; Customer Data; Product Data; Health Indicators; Quality Rating for Each Data Element; Overall Health]
Page 25
ENTERPRISE DATA QUALITY DASHBOARD (Retail Business View)
[Dashboard panels: Release 1; Release 2; Trending of Data Quality; Data Store Trend; Loan Data; Borrower Data; Health Indicators; Quality Rating for Each LOB Data Element; Overall Health; Critical Data Breakdown; Data Quality Server Utilization]
Page 26
Continuously Measure and Improve Quality
Step 1 - Define: Define the scope, goal, budget, duration and the data quality problem to be addressed.
Step 2 - Measure: All relevant data quality statistics and measures important to the enterprise are collected at this stage.
Step 3 - Analyze and Improve: Analysis of the data collected in the previous phase is conducted and root cause(s) identified. Data remediation is implemented to improve the quality of data.
Step 4 - Control: Monitor the quality after remediation to ensure that data is defect free. If there are any further changes to be made, the team makes changes and again measures the quality.
The Enterprise Data Quality dashboard provides transparency into data quality hotspots that must be addressed proactively.
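The Control step lends itself to a simple alerting check. The Python sketch below assumes quality scores are recorded per period as whole-number percentages, and raises an alert when a score falls below a threshold or drops sharply from the previous period. The function name, parameters and sample values are hypothetical.

```python
def control_alerts(history, threshold, max_drop):
    """Return alert messages for a chronological list of (period, score)."""
    alerts = []
    previous = None
    for period, score in history:
        if score < threshold:
            alerts.append(f"{period}: score {score} is below threshold {threshold}")
        elif previous is not None and previous - score > max_drop:
            alerts.append(f"{period}: score fell by {previous - score} points")
        previous = score
    return alerts


# Made-up weekly scores for one data element.
weekly_scores = [("W1", 97), ("W2", 96), ("W3", 91), ("W4", 88)]
for message in control_alerts(weekly_scores, threshold=90, max_drop=3):
    print(message)
# W3: score fell by 5 points
# W4: score 88 is below threshold 90
```

Flagging the sharp drop in W3, before the threshold is breached in W4, is one way to act on a hotspot proactively.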
Page 27
Lessons Learned
Changing behavior is hard – so use a carrot and stick approach to get people to change
Recognize team members that display the expected behavior and highlight what they did
Roll out the data quality platform (tools, methodologies, best practices) in a phased manner
Educate team members at all levels of the enterprise on the value of strong governance and data quality
Facilitate adoption of the tools and business intelligence offerings by providing them to all organizations free of cost or at a very low cost
Highlight the fact that data is “owned” by the enterprise and not by a particular individual or line of business
Hold people accountable by using operational metrics, data quality metrics and compliance metrics to make your case
Measure the hard savings and business value added by the program and communicate up and down the chain on a regular basis (KPI Dashboard)
Page 28
Summary
Effective data management provides order out of chaos
Implementing “Holistic Data Quality” provides transparency into data quality issues across the information supply chain and helps in identifying systemic issues
Focus must be on “Enterprise Critical” data initially. Do not try to boil the ocean.
The solution architecture’s core components are the data quality COTS product, a data quality Data Mart and a Business Intelligence tool
Proactive monitoring and measurement of data quality, coupled with an alerting mechanism, significantly reduces operational incidents
Implementing HDQ is a strategic initiative and requires C-level sponsorship and support
Page 29
Questions!!
Page 30
Typical Current State Data Flow
[Diagram labels: External Data Feeds, Transactional and Operational Stores, Data Warehouse, Data Marts; legend: Potential data quality problem]
The current siloed approach to data management is wasteful and doesn’t provide transparency into systemic issues.
Page 31
Future State Data Flow: Continuous Data Quality Monitoring
[Diagram labels: External Data Feeds, Transactional and Operational Stores, Data Warehouse, Data Marts, DQ Monitoring]
Enterprise Data Architecture should enable straight through processing and offer operational efficiencies.
Page 32
Key Process Steps
For each data element that you will monitor do the following (use a template):
– Identify the trustee and custodian (DG/DQ)
– Identify the system of record (DG/DQ)
– Identify the dimensions of data quality that apply (Custodian/Trustee/DQ)
– Capture the data quality rules per dimension (Custodian/Trustee/DQ)
– Capture the frequency of rule execution (Custodian/Trustee)
– Capture the data quality thresholds and tolerances for Red/Yellow/Green status (Custodian/Trustee)
– Define the key metrics that you wish to track (Custodian/Trustee)
– Conduct the logical to physical data mapping (to the data source) for the data element (DQ/Technology)
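The template mentioned above could be sketched as a small Python data structure. Everything here is illustrative: the class name, field names, role titles, thresholds and the physical mapping are assumptions, and a real template would follow the organization's own governance model.

```python
from dataclasses import dataclass


@dataclass
class MonitoredElement:
    """One row of a hypothetical data element monitoring template."""
    name: str
    trustee: str
    custodian: str
    system_of_record: str
    rules_by_dimension: dict  # dimension name -> list of rule descriptions
    frequency: str            # how often the rules are executed
    green_min: float          # pass rate (%) at or above this is Green
    yellow_min: float         # pass rate (%) at or above this is Yellow
    physical_mapping: str     # physical location of the logical element

    def status(self, pass_rate):
        """Map a measured pass rate to a Red/Yellow/Green status."""
        if pass_rate >= self.green_min:
            return "Green"
        if pass_rate >= self.yellow_min:
            return "Yellow"
        return "Red"


element = MonitoredElement(
    name="Loan Amount",
    trustee="Retail business data lead",
    custodian="Loan servicing IT",
    system_of_record="LOAN_SOR",
    rules_by_dimension={
        "Completeness": ["loan amount must be populated"],
        "Conformity": ["loan amount must be a positive number"],
    },
    frequency="daily",
    green_min=99.0,
    yellow_min=95.0,
    physical_mapping="LOAN_SOR.LOAN.LOAN_AMT",
)

print(element.status(99.5))  # Green
print(element.status(96.0))  # Yellow
print(element.status(80.0))  # Red
```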
Page 33
Dimensions of Data Quality - Explanation
There are a dozen or more Data Quality Dimensions that can be defined, but organizations should pick the ones that best meet their needs.
Accuracy: How much does the data conform to the real world?
Conformity: How much does the data conform to formats and domain values?
Integrity: Does the data conform to integrity rules appropriately? Are relationships between elements retained?
Completeness: How much required data is missing?
Duplication: Does the same data exist in multiple systems? If so, is it represented the same?
Currency: How current is the data? When was it last entered or refreshed?
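Several of these dimensions can be measured directly from the data. The Python sketch below shows one possible way to compute completeness, conformity, duplication and currency for a handful of made-up rows. The field names, the five-digit ZIP pattern and the 30-day staleness limit are assumptions for illustration. Accuracy and integrity usually need a reference source or related tables, so they are left out here.

```python
import re
from collections import Counter
from datetime import date

# Made-up sample rows; field names are illustrative only.
ROWS = [
    {"id": 1, "zip": "33301", "updated": date(2011, 11, 1)},
    {"id": 2, "zip": None, "updated": date(2011, 10, 1)},
    {"id": 3, "zip": "3330A", "updated": date(2011, 11, 10)},
    {"id": 3, "zip": "33304", "updated": date(2010, 1, 1)},
    {"id": 4, "zip": "33305", "updated": date(2011, 11, 15)},
]


def completeness(rows, field):
    """Percentage of rows in which the field is populated."""
    populated = sum(1 for row in rows if row[field] not in (None, ""))
    return 100.0 * populated / len(rows)


def conformity(rows, field, pattern):
    """Percentage of populated values that match the expected format."""
    values = [row[field] for row in rows if row[field] not in (None, "")]
    matching = sum(1 for value in values if re.fullmatch(pattern, value))
    return 100.0 * matching / len(values)


def duplicate_keys(rows, key):
    """Key values that appear on more than one row."""
    counts = Counter(row[key] for row in rows)
    return sorted(k for k, n in counts.items() if n > 1)


def stale_rows(rows, field, as_of, max_age_days):
    """Number of rows last refreshed more than max_age_days before as_of."""
    return sum(1 for row in rows if (as_of - row[field]).days > max_age_days)


print(completeness(ROWS, "zip"))                          # 80.0
print(conformity(ROWS, "zip", r"\d{5}"))                  # 75.0
print(duplicate_keys(ROWS, "id"))                         # [3]
print(stale_rows(ROWS, "updated", date(2011, 11, 16), 30))  # 2
```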