etl testing and datawarehouse testing

Upload: vibhaskelk

Post on 23-Feb-2018

250 views

Category:

Documents


3 download

TRANSCRIPT

  • 7/24/2019 ETL Testing and Datawarehouse Testing

    1/15

    ETL Testing or Datawarehouse Testing :

    Before we learn anything about ETL Testing its important to learn about Business

    Intelligence and Dataware. Lets get started

    What is BI?

    Business Intelligence is the process of collecting raw data or business data and turning

    it into information that is useful and more meaningful. The raw data is the records of

    the daily transaction of an organization such as interactions with customers

    administration of finance and management of employee and so on. These datas will be

    used for !"eporting #nalysis Data mining Data $uality and Interpretation %redicti&e

    #nalysis'.

    What is Data Warehouse?

    # data warehouse is a database that is designed for $uery and analysis rather than for

    transaction processing. The data warehouse is constructed by integrating the data from

    multiple heterogeneous sources.It enables the company or organization to consolidate

    data from se&eral sources and separates analysis wor(load from transaction wor(load.

    Data is turned into high $uality information to meet all enterprise reporting

    re$uirements for all le&els of users.

    What is ETL?

    ETL stands for E)tract*Transform*Load and it is a process of how data is loaded from

    the source system to the data warehouse. Data is e)tracted from an +LT% database

    transformed to match the data warehouse schema and loaded into the data warehouse

    http://www.guru99.com/sap-fico-training-tutorials.htmlhttp://www.guru99.com/sap-fico-training-tutorials.html
  • 7/24/2019 ETL Testing and Datawarehouse Testing

    2/15

    database. ,any data warehouses also incorporate data from non*+LT% systems such as

    te)t files legacy systems and spreadsheets.

    Let see how it wor(s

    -or e)ample there is a retail store which has different departments li(e sales

    mar(eting logistics etc. Each of them is handling the customer information

    independently and the way they store that data is $uite different. The sales

    department ha&e stored it by customers name while mar(eting department by

    customer id.

    ow if they want to chec( the history of the customer and want to (now what the

    different products he/she bought owing to different mar(eting campaigns0 it would be

    &ery tedious.

    The solution is to use a Datawarehouse to store information from different sources in a

    uniform structure using ETL. ETL can transform dissimilar data sets into an unified

    structure.Later use BI tools to deri&e meaningful insights and reports from this data.

    The following diagram gi&es you the "+#D ,#% of the ETL process

    1. Extract

    E)tract rele&ant data

    2. Transform

    Transform data to D3 4Data 3arehouse5 format

    Build (eys * # (ey is one or more data attributes that uni$uelyidentify an entity. 6arious types of (eys are primary (ey alternate

  • 7/24/2019 ETL Testing and Datawarehouse Testing

    3/15

    (ey foreign (ey composite (ey surrogate (ey. The datawarehouse

    owns these (eys and ne&er allows any other entity to assign them.

    7leansing of data 8#fter the data is e)tracted it will mo&e into the

    ne)t phase of cleaning and conforming of data. 7leaning does the

    omission in the data as well as identifying and fi)ing the errors.

    7onforming means resol&ing the conflicts between those datas that

    is incompatible so that they can be used in an enterprise data

    warehouse. In addition to these this system creates meta*data that

    is used to diagnose source system problems and impro&es data

    $uality.

    9. Load

    Load data into D3 4 Data 3arehouse5

    Build aggregates * 7reating an aggregate is summarizing and storing

    data which is a&ailable in fact table in order to impro&e the

    performance of end*user $ueries.

    ETL Testing Process

    :imilar to other Testing %rocess ETL also go through different phases. The different

    phases of ETL testing process is as follows

    ETL testing is performed in fi&e stages

    1. Identifying data sources and re$uirements

  • 7/24/2019 ETL Testing and Datawarehouse Testing

    4/15

    2. Data ac$uisition

    9. Implement business logics and dimensional ,odelling

    ;. Build and populate data

  • 7/24/2019 ETL Testing and Datawarehouse Testing

    5/15

  • 7/24/2019 ETL Testing and Datawarehouse Testing

    6/15

    Types of ETL Testing

    Types Of Testing Testing Process

    %roduction 6alidation Testing

    !Table balancing' or !production reconciliation' this

    type of ETL testing is done on data as it is being

    mo&ed into production systems. To support your

    business decision the data in your production

    systems has to be in the correct order. Informatica

    Data 6alidation +ption pro&ides the ETL testing

    automation and management capabilities to ensurethat production systems are not compromised by

    the data.

    :ource to Target Testing

    46alidation Testing5

    :uch type of testing is carried out to &alidate

    whether the data &alues transformed are the

    e)pected data &alues.

    #pplication =pgrades

    :uch type of ETL testing can be automatically

    generated sa&ing substantial test de&elopment

    time. This type of testing chec(s whether the data

    e)tracted from an older application or repository

    are e)actly same as the data in a repository or new

    application.

    ,etadata Testing

    ,etadata testing includes testing of data type

    chec( data length chec( and inde)/constraint

    chec(.

    Data 7ompleteness Testing

    To &erify that all the e)pected data is loaded in

    target from the source data completeness testing

    is done. :ome of the tests that can be run are

    compare and &alidate counts aggregates and

    actual data between the source and target for

    columns with simple transformation or no

    transformation.

    Data #ccuracy Testing This testing is done to ensure that the data is

  • 7/24/2019 ETL Testing and Datawarehouse Testing

    7/15

    accurately loaded and transformed as e)pected.

    Data Transformation Testing

    Testing data transformation is done as in many

    cases it cannot be achie&ed by writing one source

    :>L $uery and comparing the output with the

    target. ,ultiple :>L $ueries may need to be run for

    each row to &erify the transformation rules.

    Data >uality Testing

    Data >uality Tests includes synta) and reference

    tests. In order to a&oid any error due to date or

    order number during business process Data >uality

    testing is done.

    :ynta) Tests8 It will report dirty data based on

    in&alid characters character pattern incorrect

    upper or lower case order etc.

    "eference Tests8 It will chec( the data according to

    the data model. -or e)ample8 7ustomer ID

    Data $uality testing includes number chec( date

    chec( precision chec( data chec( null chec( etc.

    Incremental ETL testing

    This testing is done to chec( the data integrity of

    old and new data with the addition of new data.

    Incremental testing &erifies that the inserts and

    updates are getting processed as e)pected during

    incremental ETL process.

    ?=I/a&igation TestingThis testing is done to chec( the na&igation or ?=Iaspects of the front end reports.

    How to create ETL Test Case

    ETL testing is a concept which can be applied to different tools and databases in

    information management industry. The objective of ETL testing is to assure that the

    data that has been loaded from a source to destination after business

    transformation is accurateIt also in&ol&es the &erification of data at &arious middle

    stages that are being used between source and destination.

  • 7/24/2019 ETL Testing and Datawarehouse Testing

    8/15

    3hile performing ETL testing two documents that will always be used by an ETL tester

    are

    1. ETL mapping sheets :#n ETL mapping sheets contain all the information

    of source and destination tables including each and e&ery column and

    their loo(*up in reference tables. #n ETL testers need to be comfortable

    with :>L $ueries as ETL testing may in&ol&e writing big $ueries with

    multiple @oins to &alidate data at any stage of ETL. ETL mapping sheets

    pro&ide a significant help while writing $ueries for data &erification.

    2. D! "chema of "ource# Target: It should be (ept handy to &erify any

    detail in mapping sheets.

    ETL Test Scenarios and Test Cases

    Test "cenario Test $ases

    ,apping doc &alidation

    6erify mapping doc whether corresponding ETL

    information is pro&ided or not. 7hange log should

    maintain in e&ery mapping doc.

    6alidation

    1. 6alidate the source and target table structure

    against corresponding mapping doc.

    2. :ource data type and target data type should

    be same

    9. Length of data types in both source and

    target should be e$ual

    ;. 6erify that data field types and formats are

    specified

  • 7/24/2019 ETL Testing and Datawarehouse Testing

    9/15

    as e)pected

    Data consistency issues

    1. The data type and length for a particular

    attribute may &ary in files or tables though the

    semantic definition is the same.

    2. ,isuse of integrity constraints

    7ompleteness Issues

    1. Ensure that all e)pected data is loaded into

    target table.

    2. 7ompare record counts between source and

    target.

    9. 7hec( for any re@ected records

    ;. 7hec( data should not be truncated in the

    column of target tables

    uality1. umber chec(8 eed to number chec( and

    &alidate it

    2. Date 7hec(8 They ha&e to follow date format

    and it should be same across all records

  • 7/24/2019 ETL Testing and Datawarehouse Testing

    10/15

    9. %recision 7hec(

    ;. Data chec(

  • 7/24/2019 ETL Testing and Datawarehouse Testing

    11/15

    and target table minus a $uery in a best solution

    2. 3e need to source minus target and target

    minus source

    9. If minus $uery returns any &alue those

    should be considered as mismatching rows

    ;. eeds to matching rows among source and

    target using intersect statement

  • 7/24/2019 ETL Testing and Datawarehouse Testing

    12/15

    Type of !ugs Description

    =ser interface

    bugs/cosmetic bugs

    "elated to ?=I of application

    -ont style font size colors alignment spelling

    mista(es na&igation and so on

    Boundary 6alue

    #nalysis 4B6#5 related

    bug ,inimum and ma)imum &alues

    E$ui&alence 7lass

    %artitioning 4E7%5

    related bug 6alid and in&alid type

    Input/+utput bugs 6alid &alues not accepted

    In&alid &alues accepted

    7alculation bugs ,athematical errors

    -inal output is wrongLoad 7ondition bugs Does not allows multiple users

  • 7/24/2019 ETL Testing and Datawarehouse Testing

    13/15

    Does not allows customer e)pected load

    "ace 7ondition bugs :ystem crash C hang

    :ystem cannot run client platforms

    6ersion control bugs o logo matching o &ersion information a&ailable

    This occurs usually in regression testing

    /3 bugs De&ice is not responding to the application

    elp :ource bugs ,ista(es in help documents

    Dierence etween Dataase testing and ETL

    testing

    ETL Testing Data !ase Testing

    6erifies whether data is mo&ed as e)pected

    The primary goal is to chec( if the data is

    following the rules/ standards defined in

    the Data ,odel

    6erifies whether counts in the source and

    target are matching

    6erifies whether the data transformed is as

    per e)pectation

    6erify that there are no orphan records

    and foreign*primary (ey relations are

    maintained

    6erifies that the foreign primary (ey

    relations are preser&ed during the ETL

    6erifies that there are no redundant tables

    and database is optimally normalized

    6erifies for duplication in loaded data6erify if data is missing in columns where

    re$uired

    !esponsii"ities of an ETL tester

    ey responsibilities of an ETL tester are segregated into three categories

    :tage table/ :-: or ,-:

    Business transformation logic applied Target table loading from stage file or table after applying atransformation.

  • 7/24/2019 ETL Testing and Datawarehouse Testing

    14/15

    :ome of the responsibilities of an ETL tester are

    Test ETL software

    Test components of ETL datawarehouse

    E)ecute bac(end data*dri&en test 7reate design and e)ecute test cases test plans and test harness

    Identify the problem and pro&ide solutions for potential issues

    #ppro&e re$uirements and design specifications

    Data transfers and Test flat file

    3riting :>L $ueries9 for &arious scenarios li(e count test

    ETL Perfor#ance Testing and Tuning

    ETL performance testingis a confirmation test to ensure that an ETL system can handle

    the load of multiple users and transactions. The goal of performance tuning is to

    optimize session performance by eliminating performance bottlenec(s. To tune or

    impro&e the performance of the session you ha&e to identify performance bottlenec(s

    and eliminate it. %erformance bottlenec(s can be found in source and target databases

    the mapping the session and the system. +ne of the best tools used for performance

    testing is Informatica.

    $uto#ation of ETL TestingThe general methodology of ETL testing is to use :>L scripting or do !eyeballing' of

    data.. These approaches to ETL testing are time*consuming error*prone and seldom

    pro&ide complete test co&erage. To accelerate impro&e co&erage reduce costs impro&e

    defect detection ration of ETL testing in production and de&elopment en&ironments

    automation is the need of the hour. +ne such tool is Informatica.

    Best Practices for ETL Testing1. ,a(e sure data is transformed correctly

    2. 3ithout any data loss and truncation pro@ected data should be loaded into the

    data warehouse

    9. Ensure that ETL application appropriately re@ects and replaces with default

    &alues and reports in&alid data

    ;. eed to ensure that the data loaded in data warehouse within prescribed and

    e)pected time frames to confirm scalability and performance

  • 7/24/2019 ETL Testing and Datawarehouse Testing

    15/15