etl testing and datawarehouse testing
TRANSCRIPT
-
7/24/2019 ETL Testing and Datawarehouse Testing
1/15
ETL Testing or Datawarehouse Testing :
Before we learn anything about ETL Testing its important to learn about Business
Intelligence and Dataware. Lets get started
What is BI?
Business Intelligence is the process of collecting raw data or business data and turning
it into information that is useful and more meaningful. The raw data is the records of
the daily transaction of an organization such as interactions with customers
administration of finance and management of employee and so on. These datas will be
used for !"eporting #nalysis Data mining Data $uality and Interpretation %redicti&e
#nalysis'.
What is Data Warehouse?
# data warehouse is a database that is designed for $uery and analysis rather than for
transaction processing. The data warehouse is constructed by integrating the data from
multiple heterogeneous sources.It enables the company or organization to consolidate
data from se&eral sources and separates analysis wor(load from transaction wor(load.
Data is turned into high $uality information to meet all enterprise reporting
re$uirements for all le&els of users.
What is ETL?
ETL stands for E)tract*Transform*Load and it is a process of how data is loaded from
the source system to the data warehouse. Data is e)tracted from an +LT% database
transformed to match the data warehouse schema and loaded into the data warehouse
http://www.guru99.com/sap-fico-training-tutorials.htmlhttp://www.guru99.com/sap-fico-training-tutorials.html -
7/24/2019 ETL Testing and Datawarehouse Testing
2/15
database. ,any data warehouses also incorporate data from non*+LT% systems such as
te)t files legacy systems and spreadsheets.
Let see how it wor(s
-or e)ample there is a retail store which has different departments li(e sales
mar(eting logistics etc. Each of them is handling the customer information
independently and the way they store that data is $uite different. The sales
department ha&e stored it by customers name while mar(eting department by
customer id.
ow if they want to chec( the history of the customer and want to (now what the
different products he/she bought owing to different mar(eting campaigns0 it would be
&ery tedious.
The solution is to use a Datawarehouse to store information from different sources in a
uniform structure using ETL. ETL can transform dissimilar data sets into an unified
structure.Later use BI tools to deri&e meaningful insights and reports from this data.
The following diagram gi&es you the "+#D ,#% of the ETL process
1. Extract
E)tract rele&ant data
2. Transform
Transform data to D3 4Data 3arehouse5 format
Build (eys * # (ey is one or more data attributes that uni$uelyidentify an entity. 6arious types of (eys are primary (ey alternate
-
7/24/2019 ETL Testing and Datawarehouse Testing
3/15
(ey foreign (ey composite (ey surrogate (ey. The datawarehouse
owns these (eys and ne&er allows any other entity to assign them.
7leansing of data 8#fter the data is e)tracted it will mo&e into the
ne)t phase of cleaning and conforming of data. 7leaning does the
omission in the data as well as identifying and fi)ing the errors.
7onforming means resol&ing the conflicts between those datas that
is incompatible so that they can be used in an enterprise data
warehouse. In addition to these this system creates meta*data that
is used to diagnose source system problems and impro&es data
$uality.
9. Load
Load data into D3 4 Data 3arehouse5
Build aggregates * 7reating an aggregate is summarizing and storing
data which is a&ailable in fact table in order to impro&e the
performance of end*user $ueries.
ETL Testing Process
:imilar to other Testing %rocess ETL also go through different phases. The different
phases of ETL testing process is as follows
ETL testing is performed in fi&e stages
1. Identifying data sources and re$uirements
-
7/24/2019 ETL Testing and Datawarehouse Testing
4/15
2. Data ac$uisition
9. Implement business logics and dimensional ,odelling
;. Build and populate data
-
7/24/2019 ETL Testing and Datawarehouse Testing
5/15
-
7/24/2019 ETL Testing and Datawarehouse Testing
6/15
Types of ETL Testing
Types Of Testing Testing Process
%roduction 6alidation Testing
!Table balancing' or !production reconciliation' this
type of ETL testing is done on data as it is being
mo&ed into production systems. To support your
business decision the data in your production
systems has to be in the correct order. Informatica
Data 6alidation +ption pro&ides the ETL testing
automation and management capabilities to ensurethat production systems are not compromised by
the data.
:ource to Target Testing
46alidation Testing5
:uch type of testing is carried out to &alidate
whether the data &alues transformed are the
e)pected data &alues.
#pplication =pgrades
:uch type of ETL testing can be automatically
generated sa&ing substantial test de&elopment
time. This type of testing chec(s whether the data
e)tracted from an older application or repository
are e)actly same as the data in a repository or new
application.
,etadata Testing
,etadata testing includes testing of data type
chec( data length chec( and inde)/constraint
chec(.
Data 7ompleteness Testing
To &erify that all the e)pected data is loaded in
target from the source data completeness testing
is done. :ome of the tests that can be run are
compare and &alidate counts aggregates and
actual data between the source and target for
columns with simple transformation or no
transformation.
Data #ccuracy Testing This testing is done to ensure that the data is
-
7/24/2019 ETL Testing and Datawarehouse Testing
7/15
accurately loaded and transformed as e)pected.
Data Transformation Testing
Testing data transformation is done as in many
cases it cannot be achie&ed by writing one source
:>L $uery and comparing the output with the
target. ,ultiple :>L $ueries may need to be run for
each row to &erify the transformation rules.
Data >uality Testing
Data >uality Tests includes synta) and reference
tests. In order to a&oid any error due to date or
order number during business process Data >uality
testing is done.
:ynta) Tests8 It will report dirty data based on
in&alid characters character pattern incorrect
upper or lower case order etc.
"eference Tests8 It will chec( the data according to
the data model. -or e)ample8 7ustomer ID
Data $uality testing includes number chec( date
chec( precision chec( data chec( null chec( etc.
Incremental ETL testing
This testing is done to chec( the data integrity of
old and new data with the addition of new data.
Incremental testing &erifies that the inserts and
updates are getting processed as e)pected during
incremental ETL process.
?=I/a&igation TestingThis testing is done to chec( the na&igation or ?=Iaspects of the front end reports.
How to create ETL Test Case
ETL testing is a concept which can be applied to different tools and databases in
information management industry. The objective of ETL testing is to assure that the
data that has been loaded from a source to destination after business
transformation is accurateIt also in&ol&es the &erification of data at &arious middle
stages that are being used between source and destination.
-
7/24/2019 ETL Testing and Datawarehouse Testing
8/15
3hile performing ETL testing two documents that will always be used by an ETL tester
are
1. ETL mapping sheets :#n ETL mapping sheets contain all the information
of source and destination tables including each and e&ery column and
their loo(*up in reference tables. #n ETL testers need to be comfortable
with :>L $ueries as ETL testing may in&ol&e writing big $ueries with
multiple @oins to &alidate data at any stage of ETL. ETL mapping sheets
pro&ide a significant help while writing $ueries for data &erification.
2. D! "chema of "ource# Target: It should be (ept handy to &erify any
detail in mapping sheets.
ETL Test Scenarios and Test Cases
Test "cenario Test $ases
,apping doc &alidation
6erify mapping doc whether corresponding ETL
information is pro&ided or not. 7hange log should
maintain in e&ery mapping doc.
6alidation
1. 6alidate the source and target table structure
against corresponding mapping doc.
2. :ource data type and target data type should
be same
9. Length of data types in both source and
target should be e$ual
;. 6erify that data field types and formats are
specified
-
7/24/2019 ETL Testing and Datawarehouse Testing
9/15
as e)pected
Data consistency issues
1. The data type and length for a particular
attribute may &ary in files or tables though the
semantic definition is the same.
2. ,isuse of integrity constraints
7ompleteness Issues
1. Ensure that all e)pected data is loaded into
target table.
2. 7ompare record counts between source and
target.
9. 7hec( for any re@ected records
;. 7hec( data should not be truncated in the
column of target tables
uality1. umber chec(8 eed to number chec( and
&alidate it
2. Date 7hec(8 They ha&e to follow date format
and it should be same across all records
-
7/24/2019 ETL Testing and Datawarehouse Testing
10/15
9. %recision 7hec(
;. Data chec(
-
7/24/2019 ETL Testing and Datawarehouse Testing
11/15
and target table minus a $uery in a best solution
2. 3e need to source minus target and target
minus source
9. If minus $uery returns any &alue those
should be considered as mismatching rows
;. eeds to matching rows among source and
target using intersect statement
-
7/24/2019 ETL Testing and Datawarehouse Testing
12/15
Type of !ugs Description
=ser interface
bugs/cosmetic bugs
"elated to ?=I of application
-ont style font size colors alignment spelling
mista(es na&igation and so on
Boundary 6alue
#nalysis 4B6#5 related
bug ,inimum and ma)imum &alues
E$ui&alence 7lass
%artitioning 4E7%5
related bug 6alid and in&alid type
Input/+utput bugs 6alid &alues not accepted
In&alid &alues accepted
7alculation bugs ,athematical errors
-inal output is wrongLoad 7ondition bugs Does not allows multiple users
-
7/24/2019 ETL Testing and Datawarehouse Testing
13/15
Does not allows customer e)pected load
"ace 7ondition bugs :ystem crash C hang
:ystem cannot run client platforms
6ersion control bugs o logo matching o &ersion information a&ailable
This occurs usually in regression testing
/3 bugs De&ice is not responding to the application
elp :ource bugs ,ista(es in help documents
Dierence etween Dataase testing and ETL
testing
ETL Testing Data !ase Testing
6erifies whether data is mo&ed as e)pected
The primary goal is to chec( if the data is
following the rules/ standards defined in
the Data ,odel
6erifies whether counts in the source and
target are matching
6erifies whether the data transformed is as
per e)pectation
6erify that there are no orphan records
and foreign*primary (ey relations are
maintained
6erifies that the foreign primary (ey
relations are preser&ed during the ETL
6erifies that there are no redundant tables
and database is optimally normalized
6erifies for duplication in loaded data6erify if data is missing in columns where
re$uired
!esponsii"ities of an ETL tester
ey responsibilities of an ETL tester are segregated into three categories
:tage table/ :-: or ,-:
Business transformation logic applied Target table loading from stage file or table after applying atransformation.
-
7/24/2019 ETL Testing and Datawarehouse Testing
14/15
:ome of the responsibilities of an ETL tester are
Test ETL software
Test components of ETL datawarehouse
E)ecute bac(end data*dri&en test 7reate design and e)ecute test cases test plans and test harness
Identify the problem and pro&ide solutions for potential issues
#ppro&e re$uirements and design specifications
Data transfers and Test flat file
3riting :>L $ueries9 for &arious scenarios li(e count test
ETL Perfor#ance Testing and Tuning
ETL performance testingis a confirmation test to ensure that an ETL system can handle
the load of multiple users and transactions. The goal of performance tuning is to
optimize session performance by eliminating performance bottlenec(s. To tune or
impro&e the performance of the session you ha&e to identify performance bottlenec(s
and eliminate it. %erformance bottlenec(s can be found in source and target databases
the mapping the session and the system. +ne of the best tools used for performance
testing is Informatica.
$uto#ation of ETL TestingThe general methodology of ETL testing is to use :>L scripting or do !eyeballing' of
data.. These approaches to ETL testing are time*consuming error*prone and seldom
pro&ide complete test co&erage. To accelerate impro&e co&erage reduce costs impro&e
defect detection ration of ETL testing in production and de&elopment en&ironments
automation is the need of the hour. +ne such tool is Informatica.
Best Practices for ETL Testing1. ,a(e sure data is transformed correctly
2. 3ithout any data loss and truncation pro@ected data should be loaded into the
data warehouse
9. Ensure that ETL application appropriately re@ects and replaces with default
&alues and reports in&alid data
;. eed to ensure that the data loaded in data warehouse within prescribed and
e)pected time frames to confirm scalability and performance
-
7/24/2019 ETL Testing and Datawarehouse Testing
15/15