etl testing training presentation

36
The process of updating the data warehouse. ETL

Upload: apurba-biswas

Post on 17-Aug-2015

447 views

Category:

Software


53 download

TRANSCRIPT

Page 1: ETL Testing Training Presentation

The process of updating the data warehouse.

ETL

Page 2: ETL Testing Training Presentation

Defining ETL

ETL stands for extract, transform, and load. In many organizations, the setup of the IT department happened long time back. So in most of the cases the way each department handles the data is different. For example, in a retail organization you may have different departments such as sales, marketing, logistics, etc. Each of them is handling the customer information but way they store that data could be quite different. The sales department store it by name where as marketing department has it in the number format. ETL can take all this data from different sources and transform it into a uniform presentation, such as for storing in

a database or data warehouse.

Page 3: ETL Testing Training Presentation

Another challenge of the old IT infrastructure is the different data format used by different departments. So it might happen for HRD department somebody is using SAP where as the sales department you have Oracle Apps. So to take a business critical decision it is difficult for the higher management to get data from different platforms and consolidate them. ETL can do that job easily. It will take data from different sources and transform it to a uniform formatand store it into DB tables. From the DB, you can generate the required reports.

Page 4: ETL Testing Training Presentation

Sample ETL Tools

Teradata Warehouse Builder from Teradata

DataStage from Ascential SoftwareSAS System from SAS InstitutePower Mart/Power Center from

InformaticaSagent Solution from Sagent SoftwareHummingbird Genio Suite from

Hummingbird Communications

Page 5: ETL Testing Training Presentation

ETL Testing Tools

Query SurgeTalendSQL Server Integration Services (SSIS) 

Page 6: ETL Testing Training Presentation

Purpose of ETL Testing:

From the above points, clearly accuracy of the data after transformation is critical

in ETL testing because if the data is not accurate, then the business decision will be

wrong.

So following test types are commonly used in ETL testing:

No Data Losses: In this testing type, we first determine an N number of entities in the source system. Say the total number of employee is one such entity. If I have 2000 employees in the source system, then after ETL transformation and data store I should still get 2000 employees in the destination database. Only exception is when we have some business rule applied to the transformation. An example could be in the new system the business does not want to keep data for people who no longer works for the organization. So in that situation the destination system will discard the record of those employee and store number less than 2000.

Page 7: ETL Testing Training Presentation

This testing is done by mainly querying the source and designation database. Depending on the complexity of the system these SQL queries might be very complex and beyond the skill of a tester. However, running the query and checking the result is still the responsibility of the QA team.

Validation of Transformation rules: This is the second big validation point. In ETL testing, you will get the requirements in terms of transformation rules. One example could be the data format in the source was yyyy/dd/mm where as in the destination database this is mm/dd/yy. So you need to check them by taking some sample data. Here you apply all the good conventional test theories like boundary value analysis, equivalent partition, etc. All the database field level and record level data integrity must be tested here. So you need to select the test data in such a way that at the end all the fields are touched into the destination database after transformation.Business processes testing: Once the transformation is done these data feeds will go to many consumer systems (sales, marketing, HRD deferment applications). You need to ensure by testing from those systems that they can receive the data in the preformed format. All the critical end to end business flows must be tested from the GUI.

Purpose of ETL Testing :

Page 8: ETL Testing Training Presentation

Data Warehousing

Data Warehousing?A data warehouse can be considered as a

storage area where relevant data is stored irrespective of the source.

Data warehousing merges data from multiple sources into an easy and complete form.

Virtual Data Warehousing?A virtual data warehouse provides a

collective view of the completed data. I t can be considered as a logical data model of the containing metadataActive data warehousing?

An active data warehouse represents a single state of the business. It considers the analytic perspectives of customers and suppliers. It helps to deliver the updated data through reports.

Page 9: ETL Testing Training Presentation

Data Warehousing Cont…

Snapshot in a data warehouse:Snapshot refers to a complete visualization of data at

the time of extraction. It occupies less space and can be used to back up and restore data quickly.ETL process in data warehousing:

ETL stands for Extraction, transformation and loading.Extracting data from different sources such as flat files,

databases or XML data, transforming this data depending on the application’s needs and load this data into a data warehouse.

Difference between data mining and data warehousing:Data mining is a method for comparing large amounts of data

for the purpose of finding patterns. It is normally used for models and forecasting.

Data warehousing is the central repository for the data of several business systems in an enterprise. Data from various resources extracted and organized in the data warehouse selectively for analysis and accessibility.

Page 10: ETL Testing Training Presentation

Data Modelling and Data Mining

Data Modeling is a technique used to define and analyze the requirements of data that supports organization’s business process. In simple terms, it is used for the analysis of data objects in order to identify the relationships among these data objects in any business.

Data Mining is a technique used to analyze datasets to derive useful insights/information. It is mainly used in retail, consumer goods, telecommunication and financial organizations that have a strong consumer orientation in order to determine the impact on sales, customer satisfaction and profitability.

Page 11: ETL Testing Training Presentation

Data Warehousing Vs Business Intelligence

Data warehousing relates to all aspects of data management starting from the development, implementation and operation of the data sets. It is a back up of all data relevant to business.(  data store).

Business Intelligence is used to analyze the data from the point of business to measure any organization’s success.

The factors like sales, profitability, marketing campaign effectiveness, market shares and operational efficiency etc are analyzed using Business Intelligence tools like Cognos, Informatica etc.

Page 12: ETL Testing Training Presentation

OLTP Vs OLAP

OLTP = OnLine Transaction Processing.Applications that supports and manages

transactions which involve high volumes of data are supported by OLTP system. OLTP is based on client-server architecture and supports

transactions across networks.

OLAP = OnLine Analytical Processing.Business data analysis and complex

calculations on low volumes of data are performed by OLAP. An insight of data coming from various resources can be gained by a user with the support of OLAP.

Page 13: ETL Testing Training Presentation

Types of ETL Testing    1)      Constraint Testing:

In the phase of constraint testing, the test engineers identifies whether the data is mapped from source to target or not.

The Test Engineer follows the below scenarios in ETL Testing process.

a)      NOT NULL

b)      UNIQUE

c)       Primary Key

d)      Foreign key

e)      Check

f)       Default

g)      NULL

2)      Source to Target Count Testing:

In the Source to Target data is matched or not. A Tester can check in this view whether it is ascending order or descending order it doesn’t matter .Only count is required for Tester.

Due to lack of time a tester can follow this type of Testing.

3)      Source to Target Data Validation Testing:

In this Testing, a tester can validate the each and every point of the source to target data.

Most of the financial projects, a tester can identify the decimal factors.

Page 14: ETL Testing Training Presentation

Types of ETL Testing Cont..

  4)      Threshold/Data Integrated Testing:

In this Testing, the Ranges of the data, A test Engineer can usually identifies the population calculation and share marketing and business finance analysis

(quarterly, halferly, Yearly)  

MIN MAX RANGE

4 10 6

5)      Field to Field Testing:

In the field to field testing, a test engineer can identify that how much space is occupied in the database. The data is integrated in the table cum data types.

  NOTE: To check the order of the columns and source column to target column.

Page 15: ETL Testing Training Presentation

Types of ETL Testing Cont.. 6)      Duplicate Check Testing:

In this phase of ETL Testing, a Tester can face duplicate value very frequently so, at that time the tester follows database queries why because huge amount of data is present in source and Target tables.

Select ENO, ENAME, SAL, COUNT (*) FROM EMP GROUP BY ENO, ENAME, SAL HAVING COUNT (*) >1;

Note:

1)  There are no mistakes in Primary Key or no Primary Key is allotted then the duplicates may arise.

2)  Sometimes, a developer can do mistakes while transferring the data from source to target at that time duplicates may arise.

3)  Due to Environment Mistakes also duplicates arise (Due to improper plugins in the tool).

7)      Error/Exception Logical Testing:1)      Delimiter is available in Valid Tables2)      Delimiter is not available in invalid tables(Exception Tables)

  8)      Incremental and Historical Process Testing:

In the Incremental data, the historical data is not corrupted. When the historical data is corrupted then this is the condition where bugs raise.

Page 16: ETL Testing Training Presentation

Types of ETL Testing Cont..

   9)      Control Columns and Defect Values Testing:

This is introduced by IBM

   10)   Navigation Testing:

Navigation Testing is the End user point of view testing. An end user cannot follow the friendly of the application that navigation is called as bad or poor Navigation.

At the time of Testing, A tester can identify this type of navigation scenarios to avoid unnecessary navigation.

11)   Initialization testing:

A combination of hardware and software installed in platform is called the Initialization Testing

12)   Transformation Testing:

At the time of mapping from source table to target table, Transformation is not in mapping condition, then the Test Engineer raises bugs.

Page 17: ETL Testing Training Presentation

Types of ETL Testing Cont..

 13)   Regression Testing:Code modification to fix a bug or to implement a new

functionality which makes us to to find errors. These introduced errors are called regression. Identifying for regression effect is called regression testing.14)   Retesting:

Re executing the failed test cases after fixing the bug.   15)   System Integration Testing:

Integration testing: After the completion of programming process. Developer can integrate the modules there are 3 models

a)  Top Downb)  Bottom Upc)  Hybrid

Page 18: ETL Testing Training Presentation

Test Scenarios for ETL Testing

    1. To  validate  the  data in table (emp)    2. To validate the table structure.    3. To validate the null values  of the table.    4. To validate the null values of very attribute.    5. To check the duplicate values of  the table.    6. To check the duplicate values of each attribute of the table    7. To check the field value or space (length of the field size)       8. To check the constraints (foreign ,primary key)    9. To check the name of the employer who has not earned any

commission    10. To check the all employers who are work in dept no (Account

dept,sales dept)    11. To check the row count of each attribute.     12. To check the row count of the table.    13. To check the max salary from emp table.    14. To check the min salary from emp table.

Page 19: ETL Testing Training Presentation

Types of ETL Bugs

1. User interface bugs/cosmetic bugs:- Related to GUI of application Navigation, spelling mistakes, font style, font size, colors,

alignment.

2. BVA Related bug:- Minimum and maximum values

3. EP Related bug:- Valid and invalid type

4. Input/output bugs:- Valid values not accepted Invalid values accepted 

5. Calculation bugs:- Mathematical errors Final output is wrong 

 

Page 20: ETL Testing Training Presentation

Types of ETL Bugs Cont…

6. Load condition bugs:- Does not allows multiple users Does not allows customer expected load

  7. Race condition bugs:- System crash & hang System cannot run client plat forms

  8. Version control bugs:- No logo matching No version information available This occurs usually in regression testing

9. H/W bugs:- Device is not responding to the application 

  10. Source bugs:- Mistakes in help documents

Page 21: ETL Testing Training Presentation

SQL Queries for ETL Testing

Page 22: ETL Testing Training Presentation

1.Create a table:

CREATE TABLE customer First_Name char(50),Last_Name char(50),Address char(50), City char(50),Country char(25),Birth_Date date);

2. Add a column to a table:

ALTER TABLE customer ADD SO_INSURANCE_PROVIDER Varchar2(35);

3. DROP a column to a table

ALTER TABLE customer DROP column SO_INSURANCE_PROVIDER Varchar2(35);

4. Add a default value to a column

ALTER TABLE customer MODIFY SO_INSURANCE_PROVIDER Varchar2(35) DEFAULT 'ABC Ins';

5. Renaming a table:

ALTER TABLE suppliers RENAME TO vendors;

Page 23: ETL Testing Training Presentation

6. Modifying column(s) in a table:ALTER TABLE supplier MODIFY supplier_namevarchar2(100) not null;

7. Drop column(s) in a table:ALTER TABLE supplier DROP COLUMN supplier_name;

8. Primary key:CREATE TABLE supplier( supplier_id numeric(10) not null,supplier_namevarchar2(50) not null,contact_namevarchar2(50),CONSTRAINT supplier_pk PRIMARY KEY (supplier_id,supplier_name));

9.Add primary key: ALTER TABLE supplier add CONSTRAINT supplier_pk PRIMARY KEY (supplier_id);

10.Drop primary key: ALTER TABLE supplier drop CONSTRAINT supplier_pk;

11.Disable primary key: ALTER TABLE supplier disable CONSTRAINT supplier_pk;

12.Enable primary key: ALTER TABLE supplier enable CONSTRAINT supplier_pk;

Page 24: ETL Testing Training Presentation

13.Foreign key creation:

CREATE TABLE supplier(supplier_id numeric(10) not null,supplier_namevarchar2(50) not null,contact_namevarchar2(50),CONSTRAINT supplier_pk PRIMARY KEY (supplier_id, supplier_name));CREATE TABLE products(Product_id numeric (10) not null,Supplier_id numeric (10) not null,supplier_name varchar2 (50) not null,CONSTRAINT fk_supplier_compFOREIGN KEY (supplier_id, supplier_name)REFERENCES supplier (supplier_id, supplier_name));

Page 25: ETL Testing Training Presentation

14. Alter foreign key:

ALTER TABLE products add CONSTRAINT fk_supplier FOREIGN KEY

(supplier_id) REFERENCES supplier (supplier_id);

15. Drop foreign key:

ALTER TABLE SALES_ORDER_LINE DROP FOREIGN KEY

FK_SALES_ORDER_LINE_PRODUCT

16. Check constraint:

ALTER TABLE EMPLOYEE ADD CONSTRAINT REVENUE CHECK

(SALARY + COMM > 25000)

17. Drop check constraint:

ALTER TABLE EMPLOYEE DROP CONSTRAINT REVENUE CHECK

(SALARY + COMM > 25000)

18. Drop Table:

DROP TABLE customer;

Page 26: ETL Testing Training Presentation

19. Truncate Statement:

Truncate table customer;

*********************************************************************End of DDL

Statements***********************************************************************

****************************************************************** DML Statements

***************************************************************************

20. Insert rows in table:

1) INSERT INTO Store_Information (store_name, Sales, Date)

VALUES ('Los Angeles', 900, 'Jan-10-1999')

2) INSERT INTO Store_Information (store_name, Sales, Date) SELECT

store_name, Sales, Date FROM Sales_Info WHERE Year

(Date) = 1998

Page 27: ETL Testing Training Presentation

21. Update Statement in table:

UPDATE suppliers SET name = 'HP' WHERE name = 'IBM';

UPDATE suppliers

SET supplier_name =

(SELECT customers.name

FROM customers

WHERE customers. customer_id = suppliers. supplier_id);

22. Delete Statement in table:

DELETE FROM suppliers WHERE supplier_name = 'IBM';

DELETE FROM suppliers

WHERE EXISTS

( select customers.name

from customers

wherecustomers.customer_id = suppliers.supplier_id

andcustomers.customer_name = 'IBM' );

************************** select statement

Page 28: ETL Testing Training Presentation

23.Select Statement in table:

1. SELECT LastName, FirstName FROM Persons;

2. SELECT * FROM Persons;

The SELECT DISTINCT Statement:

SELECT DISTINCT Company FROM Orders;

The WHERE Clause:

SELECT * FROM Persons WHERE City='Sandnes‘

Using LIKE

SELECT * FROM Persons WHERE FirstName LIKE 'O%'

Page 29: ETL Testing Training Presentation

Arithmetic Operation:

Operator Description

= Equal

<> Not equal

> Greater than

< Less than

>= Greater than or equal

<= Less than or equal

BETWEEN Between an inclusive range

LIKE Search for a pattern

IN If you know the exact value you

want to return for at least one of the

columns

Page 30: ETL Testing Training Presentation

BETWEEN ... AND:SELECT * FROM Persons WHERE LastName BETWEEN 'Hansen' AND

'Pettersen';INSELECT * FROM Persons WHERE LastName IN 'Hansen','Pettersen');Column Name AliasSELECT LastName AS Family, FirstName AS Name FROM PersonsAND & ORSELECT * FROM Persons WHERE FirstName='Tove' ANDLastName='Svendson'SELECT * FROM Persons WHERE firstname='Tove' ORlastname='Svendson'ORDER BYSELECT Company, OrderNumber FROM Orders ORDER BY CompanySELECT Company, OrderNumber FROM Orders ORDER BY CompanyDESC, OrderNumber ASC

Page 31: ETL Testing Training Presentation

Group by Clause:

SELECT Company, SUM (Amount) FROM Sales GROUP BY Company;

Having Clause:

SELECT Company, SUM (Amount) FROM Sales GROUP BY Company HAVING SUM (Amount)>10000;

Using UNION Clause:

SELECT E_Name FROM Employees_Norway UNION SELECT E_Name FROM Employees_USA

UNION ALL Clause:

SELECT E_Name FROM Employees_Norway

UNION ALL

SELECT E_Name FROM Employees_USA

Page 32: ETL Testing Training Presentation

JOINS:

Referring to Two Tables:

SELECT Employees.Name, Orders.Product FROM Employees, Orders WHERE Employees.Employee_ID=Orders.Employee_ID

INNER JOIN:

SELECT Employees.Name, Orders.Product FROM Employees INNER JOIN Orders ON Employees.Employee_ID=Orders.Employee_ID;

LEFT JOIN:

SELECT Employees.Name, Orders.Product FROM Employees LEFT JOIN Orders

ON Employees.Employee_ID=Orders.Employee_ID;

RIGHT JOIN:

SELECT Employees.Name, Orders.Product FROM Employees RIGHT JOIN Orders

ON Employees.Employee_ID=Orders.Employee_ID;

Page 33: ETL Testing Training Presentation

Subqueries:

1) Select distinct country from Northwind.dbo.Customers where country not in (select distinct country from Northwind.dbo.Suppliers);

2) Select top 1 OrderId, convert (char (10), OrderDate, 121) Last_Paris_Order,

(Select convert (char (10), max (OrderDate), 121) from Northwind.dbo.Orders) Last_OrderDate,datediff(dd,OrderDate, (select Max(OrderDate) from Northwind.dbo.Orders))Day_Diff fromNorthwind.dbo.Orders whereShipCity = 'Paris' order by OrderDatedesc;

Page 34: ETL Testing Training Presentation

Commit & Rollback Statements:

1) UPDATE suppliers SET name = 'HP' WHERE name = 'IBM';

Commit;

2) UPDATE suppliers SET name = 'HP' WHERE name = 'IBM';

Rollback;

SavepointStatement:

INSERT INTO DEPARTMENT VALUES ('A20', 'MARKETING', 301);

SAVEPOINT SAVEPOINT1;

INSERT INTO DEPARTMENT VALUES ('B30', 'FINANCE', 520);

SAVEPOINT SAVEPOINT2;

Page 35: ETL Testing Training Presentation

ETL Tool QuerySurge at a Glance

Page 36: ETL Testing Training Presentation

Thank You!