automate data warehouse etl testing and migration testing the agile way
TRANSCRIPT
Fast forward Data
Warehouse/ETL Testing Migration
TestingThe Agile
Way
ETL Testing & Monitoring Platform
Author: Sandesh GawandeCTO -iCEDQTorana, Inc.Email: [email protected]: 203 666 4442Twitter: @sandesh_gawandeSkype: Sandesh.Gawande.ToranaIncLinkedIn: https://www.linkedin.com/in/sandesh-gawande-1a25757
About: Torana, Inc. Year established 2005 Stamford, CT Developers of Big Data Integration &
Data Migration, Data warehouse/ETL Testing Software
Fortune 500 customers in Banking, Insurance, Healthcare, e-commerce, Manufacturing
Development & QA Testing
Customer Finds It Post Release
Leads to Litigation, Fines, Reputation Loss
1x
100x
10,000xFind the issues
earlyor the costs are too High
Finding Issues in QA Stage is the Best, but QA is…
Not Agile With waterfall approch its too late...
Not Automated Manual data checks Wasted Time No repeatability or consistency No way to test millions of rows Wrong focus on creating scripts rather than the business
problem Cannot reconcile data across systems (e.g. Files vs. Database)
Not Collobrative QA teams work in isolation No feedabck to developers or business users Disoragnized
No Transparancy Late Discovery of Issues Project Failure or High Costs
Not Agile
Manual & Slow
Not Collaborative Or Feedback
No Transparency or Compliance
But Why is it so Difficult to Automate ETL Testing?
ETL Processes don’t have screens Conventional QA Automation product were designed for
screen based testing New Concepts
Source Data + Transformation = Target Data Quality of an ETL Process = Expected Data vs. Actual Data Most developers are from traditional software development New to concepts such as data reconciliation for ETL Testing Mix up of QA/QC concept with Data Quality
High Volume of Data (Millions of rows) Since the source data and target data could be in two
different systems reconciliation is difficult
Introducing iCEDQ…
Automated & Fast
Agile
Collaborative
Feedback & Transparency
An automation platform for:A.Data Warehouse
Testing• ETL Testing• MDM Testing• Data Integration Testing
B.Data Migration TestingC.Data Monitoring
iCEDQ has in-Memory Rules Engine
It Tests ETL Transformations by Validating the output data generated by ETL Validation Rule
Reconciliation Rule
It Tests ETL Transformations by Reconciling Source Data Vs. Target Data …
ETL
Data Warehouse/ETL Test AutomationData Sources ETL Data Warehouse
Tech Validation Test Biz Validation Test
Biz Reconciliation Test
Tech Reconciliation Test
Validation Tests Reconciliation Tests Technical Validation Rule Business Validate Rules Technical Reconciliation Rules Business Reconciliation Rules
Validate incoming data before processing. Test for… • Data format• Nulls• Data types• more
Business rules based validation will indicate if there is an data issue becuase of ETL processes, Source data or wrong requirements...• Check if Net Amount =? Gross
Amout – (Taxes + Fees + Commisions)
These rules test specific to an ETL process which is doing transformation…• An ETL processes calculating end
of day balances from daily transactions can be tested. Sum of todays transactions =? Today’s End of Day balance – Yesterday’s End of Day balance
These tests are designed to test the overall system independent of the ETL processes, Source data or business requirements
A
Data Migration Test AutomationLegacy System ETL New System
Initial Reconciliation Test
Post Reconciliation Test Initial Migration Testing Post Migration Testing
1st Create the data structures in the target system. Ex. Table, columns. 2nd copy the initial data from the legacy system to the new database• iCEDQ can validate the tables, columns, data types &
precision• Reconcile the legacy vs. target data to make sure they have
the same initial state
Once the initial state is populated & tested. The post migration phase involves. Feeding the same data or triggering of same business processes in legacy system and the new system.• iCEDQ can reconcile the data to make sure the after running
the business processes the data generated same• Because regardless of the system change, unless there is a
business rule change the net output from business point of view must be same
B
Production Data Monitoring AutomationSource Stage Stage Data
WarehouseData Warehouse Data
MartsData Marts Reports /
Extracts
ProcessLoad StageCustomer
ProcessLoad Stage
Policy
ProcessLoad Stage
Claims
Process Load Dim Customer
ProcessLoad Daily
Claims
ProcessLoad Month
Policy
ProcessLoad P&L
Process Load Dim Customer
Process Load Month
Claims
Start
Stop
Monitoring in Series Monitoring in ParallelEmbed iCEDQ Rules in the batch process• If Audit Fails the users are notified and the process can be
stopped automatically
The Audit Rules are run in parallel to the batch process• If Audit Fails the users are notified but the process is not
stopped automatically
iCEDQ
C
User
Sto
ry
Tech Requirements Mapping Document ETL Process
Audit Requirements Test Case
iCEDQ Rule 1
iCEDQ Rule 2
…
Test processes in parallel to the development pipeline
No reasons to wait!
iCEDQ is Agile
Development Pipeline
QA Pipeline
iCEDQ-Central Repository & Collaborative
Centralized Repository for Rules Library
An collaborative environment to work together
Work together regardless of the Location time Role
iCEDQ-Feedback & Transparency
Dashboard Fails & Custom Reports Integration with ALM & Issue
management Auto Notification Ability to drill down to an defect Audit Logs & execution histroy…
iCEDQ-What changed?
Before AfterNO Reconcile Across Files & database YES
Very Complicated SQL NO SQL or Simple SQLTest millions of rows
CostTest Coverage
NO Repeatability & Consistency YESNO Scheduling YES
Desktop Based Test Execution Server BasedNO Transparency & Reporting YES
Cost of DefectNO Regression Testing & Audit YESNO Production Monitoring YES
1000… Millions…100% 60%
High Low
HighLow
Who uses iCEDQ?
Stock Exchange Banks Insurance Manufacturing Healthcare E-Commerce Manufacturing …
iCEDQ Healthcare Client
iCEDQ Usage iCEDQ was used for Migration Testing
Test provider data migration from Mainframe to MDM
iCEDQ Enterprise Data Warehouse Testing Test Members Data, Enrolment data, Plans
Data, Claims Data load from Legacy to (Enterprise Data warehouse)EDW & Health Rules to EDW
iCEDQ to Validate External Feeds Test data feeds to State of Maryland, CMS
(Centers for Medicare & Medicaid Services)
iCEDQ Feedback Helped Finalize Requirements
It found anomalies in the requirements and mapping documents and provided feedback
Helped Test Automation It was able to automatically reconcile feeds
from legacy as well as new system. This was impossible to test manually
Transparency to management It linked with defect management system
and auto generated status
Fast forward Data Warehouse
& Migration Testing
ETL Testing & Monitoring Platform