table29 data validation 95
DESCRIPTION
Informatica DVOTRANSCRIPT
1
2
Data Validation Overview
3
What is Data Validation?
Identifying errors in data sets that have
been Moved or Transformed to ensure
they are Complete and Accurate and
meet Expectations or Requirements.
4
How most companies test today
• Many companies perform testing manually by writing SQL scripts, using Excel, or hand coding testing logic into their integration processes
• Reconciliation is done manually (if done at all)− basic SQL scripts (row counts, aggregates) or manually written mappings/logic
• Customers estimate data testing SHOULD take 25-30% of all hours spent on Data Integration− Most customers admit they do not do enough data validation, resulting in
poorer data quality and higher project risk
• PowerCenter upgrades can take up to weeks or months to complete due to manual testing effort− It takes one day to upgrade the ETL software
5
Problems with Manual Testing• Takes a long time and is expensive
− Time is spent writing queries and waiting for them to run and then searching through the results
• Error-prone manual process− “Stare and compare”
• Cannot perform thorough testing− Time/Cost pressure leads to “try it here and there” approach− Testing ends when the deadline is reached, done or not
• Usual problems associated with writing custom code− No audit trail− No reuse− No methodology
6
Current Approach: Like a Photo Hunt
7
Current Approach: Stare and Compare
Data Set #1 Data Set #2
8
What is the Data Validation Option?
DVO is a independent (black box)
testing solution that provides
automation, repeatability and
auditability to virtually any data
testing or reconciliation process.
9
Some DVO use cases Data Being Transformed
• ETL Reconciliation• Data Masking• ETL Testing• Application Migration
Data is Identical• ETL version upgrade• ETL Migration• Database migration• Application Retirement
10
Two Value Propositions for DVO
Production Reconciliation
Protect the integrity of data that is loaded into
production systems.
Erroneous data due to failed loads, faulty logic or operational issues is caught in a proactiveautomated manner and can be
addressed as needed
Development & Test
Provide automation for unit and regression testing of integration logic.
Ensure that data produced by DI code meets requirements and
expectations
Ensure the integrity of data as it moves throughthe IT environment.
11
Two Value Propositions for DVOEnsure the integrity of data as it moves through
the IT environment.
12
How DVO works with PowerCenter
Data Validation OptionRepository
& WarehouseDVO ClientsReportsDatabase
Views
Id: namename: stringPrice: integerDate in: dateDate out: dateSalary: float
V_Summary
Id: namename: stringPrice: integerDate in: dateDate out: dateSalary: float
V_Tests
Id: namename: stringPrice: integerDate in: dateDate out: dateSalary: float
V_Results
Enterprise Data
Repository and Integration Services
Repository
PowerCenter
Define Tests
Execute TestsData
Accessed
Results
13
Key Features of DVO• Broad data connectivity
• DBMS (Oracle, SQL Server, DB2, Sybase, Teradata, Netezza)• Mainframe (DB2 z/OS, DB2 AS/400, IMS, Adabas, MF Flat files, VSAM)• SalesForce.com , SAP transparent tables, SAS, ODBC and Flat files
• Numerous built-in tests• COUNT, COUNT_DISTINCT, COUNT_ROWS, MIN, MAX, AVG, SUM• SET AinB, SET BinA, SET AeqB• VALUE, OUTER VALUE, Expressions
• Model ETL constructs• LOOKUPs, Arbitrary SQL Relationships
• Other• Run from GUI or CLI (DVOCmd)• Built-in reporting
14
Comparing DVO with Manual Testing
15
• Have created a test suite of over 1000 Tests
• Testers can manage the testing environment
• Can test large volumes of data• Testing time reduced from 1
week to 1 day (80% less)• Spend “free time” on higher
level tasks
SAAS provider of Sales Compensation and analytics• Data absolutely has to be correct as it affects peoples’
paychecks• Very high visibility of the data with users• Trust in the data is key
KEY BUSINESS IMPERATIVE AND IT INITIATIVE
INFORMATICA ADVANTAGE RESULTS/BENEFITS• New release every ~1 month• 1 Full week of data testing
by QA team per release• Developers wrote SQL for
testing the data• Testers would execute the
SQL, track errors and work with Developers to resolve
• And who was testing the SQL to make sure it was correct?
THE CHALLENGE
Technology CompanyReduced data testing time by 80%
with Data Validation Option
• With DVO they are able to test 100,000s rows of data in regression tests
• Developers no longer required to write SQL
• Testers are now empowered and independent of developers
Informatica Confidential – Under NDA
Development and Test
16
• DVO found where data was missing
• Found thousands of missing records due to bad coding, & improperly rerun failed jobs
• Reloaded all missing data in two weeks
• They are looking to implement ongoing incremental validation for all new data loaded into tables
Good data is essential to good business decisions. Their calculations of portfolio risk and value must be correct.
• Spends “hundreds of millions” purchasing troubled debt in the USA• The data and risk calculations on those assets must be correct.• Bad data could cost them “millions” and put them out of business.
KEY BUSINESS IMPERATIVE AND IT INITIATIVE
INFORMATICA ADVANTAGE RESULTS/BENEFITS• Business users were
complaining about missing data in the systems.
• Data errors can lead to very costly bad business decisions.
• They were doing manual testing via developer-written mappings and PL/SQL
• Other products available today could not meet their requirements
THE CHALLENGE
Financial Services CompanyEnsures DW is Complete and Accurate
with Data Validation Option
• With DVO they are able to perform detailed reconciliations across source and target systems.
• With DVO, they have a complete audit trail.
Informatica Confidential – Under NDA
Production Reconciliation
17
• Identified errors due to faulty DI logic, and error handling process
• Ensured incorrect records no longer being use in marketing campaigns
• Bad customer data no longer reaching customer in portal
Customer and contact hub is pivotal to efficient business operationsMillions of records processed across various systemsEnsure BAs, line managers and customers had access to accurate and complete data based on their needs
KEY BUSINESS IMPERATIVE AND IT INITIATIVE
INFORMATICA ADVANTAGE RESULTS/BENEFITS• No easy way to reconcile
data in systems to identify bad data or identify extent of errors
• Incorrectly augmented data in systems
• Gold record data didn’t always match across systems
• Faulty records propagated downstream.
THE CHALLENGE
Mid-size Technology CompanyReconciling MDM data using
Data Validation Option
• DVO reconciled data across systems (e.g. SalesForce and Hub) and found:
• 1000s of missing records between systems
• Incorrectly augmented D&B data
• Improperly coded golden records
Informatica Confidential – Under NDA
Production Reconciliation
18