Database & ETL Testing

Upload: mallanna-rb

Post on 12-Oct-2015


Database (Data) Testing: The Comprehensive Guide

In most cases the GUI receives the most emphasis from test managers and development team members, since the graphical user interface is the most visible part of the application. However, it is equally important to validate the information that can be considered the heart of the application: the database.

Consider a banking application in which a user makes transactions. From a database testing viewpoint, the following things are important:

1. The application stores the transaction information in the application database and displays it correctly to the user.
2. No information is lost in the process.
3. No partially performed or aborted operation is saved by the application.
4. No unauthorized individual is allowed to access the user's information.

To ensure these objectives, we need to use data validation or data testing.

Fundamental differences between user-interface and data testing

| User-Interface testing | Database or Data testing |
| --- | --- |
| Also known as Graphical User Interface (GUI) testing or front-end testing. | Also known as back-end testing or data testing. |
| Chiefly deals with the testable items open to the user for viewing and interaction: forms, presentation, graphs, menus, reports, etc. (created through front-end tools such as VB, VB.NET, VC++, Delphi). | Chiefly deals with the testable items generally hidden from the user: internal processing and storage, e.g. a DBMS such as Oracle, SQL Server, or MySQL. |
| Includes validating text boxes, select dropdowns, calendars and buttons, navigation from one page to another, display of images, and the look and feel of the overall application. | Involves validating the schema, database tables, columns, keys and indexes, stored procedures, triggers, database server configuration, and data duplication. |
| The tester must be thoroughly knowledgeable about the business requirements as well as the development tools and automation frameworks in use. | The tester must have a strong background in database servers and Structured Query Language (SQL) concepts. |
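Back-end testing of this kind ultimately means running SQL against the database and checking what was actually stored. A minimal sketch, using Python's standard sqlite3 module; the `transactions` table and its columns are hypothetical, not from any real banking system:

```python
import sqlite3

# Hypothetical schema for a banking application; table and column
# names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE transactions (
        txn_id      INTEGER PRIMARY KEY,
        account_no  TEXT NOT NULL,
        amount      REAL NOT NULL,
        status      TEXT NOT NULL
    )
""")

# Simulate what the front end would write on a successful transfer.
conn.execute(
    "INSERT INTO transactions (account_no, amount, status) VALUES (?, ?, ?)",
    ("ACC-1001", 250.00, "COMMITTED"),
)
conn.commit()

# Back-end check 1: the row was stored with the expected values.
row = conn.execute(
    "SELECT account_no, amount, status FROM transactions WHERE txn_id = 1"
).fetchone()
assert row == ("ACC-1001", 250.0, "COMMITTED")

# Back-end check 2: no partially performed (non-committed) rows exist.
pending = conn.execute(
    "SELECT COUNT(*) FROM transactions WHERE status != 'COMMITTED'"
).fetchone()[0]
assert pending == 0
print("back-end validation passed")
```

The same two queries could be run against a real database server after driving the transaction through the user interface, which is exactly the front-end-then-back-end flow described later in this guide.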

Types of database testing

The three types of database testing are:

1. Structural testing
2. Functional testing
3. Non-functional testing

Let's look into each type and its sub-types one by one.

Structural database testing

Structural database testing involves the validation of all those elements inside the data repository that are used primarily for storage of data and that end users are not allowed to manipulate directly. The validation of the database servers is also a very important consideration in this type of testing. Successful completion of this phase requires mastery of SQL queries.

Schema testing

The chief aspect of schema testing is to ensure that the schema mapping between the front end and the back end is consistent; hence it is also referred to as mapping testing. The most important checkpoints for schema testing:

1. Validation of the various schema formats associated with the databases. The mapping format of a table may not be compatible with the mapping format present at the user-interface level of the application.
2. Verification of unmapped tables/views/columns.
3. Verification that heterogeneous databases in an environment are consistent with the overall application mapping.

Some interesting tools for validating database schemas: DBUnit, which integrates with Ant, is very suitable for mapping testing, and SQL Server allows testers to check and query the database schema by writing simple queries rather than code.

For example, if the developers want to change a table structure or delete it, the tester would want to ensure that all the stored procedures and views that use that table are compatible with the change. Another example: if the testers want to check for schema changes between two databases, they can do so with simple queries.

Database table and column testing

The various checks for database table and column testing:

1. Whether the mapping of the database fields and columns in the back end is compatible with the mappings in the front end.
2. Validation of the length and naming convention of the database fields and columns as specified by the requirements.
3. Validation of the presence of any unused/unmapped database tables/columns.
4. Validation of the compatibility of the data types and field lengths of the back-end database columns with those present in the front end of the application.
5. Whether the database fields allow the user to provide the desired inputs as required by the business requirement specification documents.

Keys and indexes testing

Important checks for keys and indexes:

1. Check whether the required primary key and foreign key constraints have been created on the required tables.
2. Check whether the foreign key references are valid.
3. Check whether the data types of the primary key and the corresponding foreign keys are the same in the two tables.
4. Check whether the required naming conventions have been followed for all keys and indexes.
5. Check the size and length of the required fields and indexes.
6. Whether the required clustered and non-clustered indexes have been created on the required tables as specified by the business requirements.

Stored procedures testing

The most important things to validate for stored procedures:

1. Whether the development team adopted the required coding standard conventions and exception/error handling for all the stored procedures in all modules of the application under test.
2. Whether the development team covered all the conditions/loops by applying the required input data to the application under test.
3. Whether the development team properly applied TRIM operations whenever data is fetched from the required tables in the database.
4. Whether manual execution of the stored procedure provides the end user with the required result.
5. Whether manual execution of the stored procedure ensures the table fields are updated as required by the application under test.
6. Whether execution of the stored procedures implicitly invokes the required triggers.
7. Validation of the presence of any unused stored procedures.
8. Validation of the Allow Null condition, which can be done at the database level.
9. Validation that all stored procedures and functions execute successfully when the database under test is blank.
10. Validation of the overall integration of the stored procedure modules as per the requirements of the application under test.

Some interesting tools for testing stored procedures are LINQ, the SP Test tool, etc.

Trigger testing

1. Whether the required coding conventions have been followed during the coding phase of the triggers.
2. Check whether the triggers executed for the respective DML transactions have fulfilled the required conditions.
3. Whether the trigger updates the data correctly once executed.
4. Validation of the required update/insert/delete trigger functionality in the realm of the application under test.

Database server validations

1. Check the database server configurations as specified by the business requirements.
2. Check that the required user is authorized to perform only those levels of actions required by the application.
3. Check that the database server can cater to the maximum allowed number of user transactions as specified by the business requirement specifications.

Functional database testing

Functional database testing needs to ensure that the transactions and operations performed by end users are consistent with the requirement specifications. The basic conditions to observe for database validations:

- Whether the field is mandatory, while allowing NULL values on that field.
- Whether the length of each field is of sufficient size.
- Whether all similar fields have the same names across tables.
- Whether there are any computed fields present in the database.

This process validates the field mappings from the end-user viewpoint. In this scenario the tester performs an operation at the database level and then navigates to the relevant user-interface item to observe and validate whether the proper field validations have been carried out. The reverse, where an operation is first carried out at the user interface and then validated from the back end, is also a valid option.

Checking data integrity and consistency

The following checks are important:

1. Whether the data is logically well organized.
2. Whether the data stored in the tables is correct and as per the business requirements.
3. Whether any unnecessary data is present in the application under test.
4. Whether data updated from the user interface has been stored as per the requirements.
5. Whether TRIM operations are performed on the data before it is inserted into the database under test.
6. Whether the transactions have been performed according to the business requirement specifications and whether the results are correct.
7. Whether the data has been properly committed when a transaction executes successfully.
8. Whether the data has been rolled back successfully when a transaction does not execute successfully.
9. Whether the data has been rolled back at all when a transaction fails and multiple heterogeneous databases are involved in the transaction in question.
10. Whether all transactions have been executed using the required design procedures as specified by the system business requirements.

Login and user security

Validation of the login and user security credentials needs to consider the following:

1. Whether the application prevents the user from proceeding further in the case of an invalid username with a valid password, a valid username with an invalid password, or an invalid username with an invalid password, while allowing a valid username with a valid password.
2. Whether the user is allowed to perform only those specific operations specified by the business requirements.
3. Whether the data is secured from unauthorized access.
4. Whether different user roles are created with different permissions.
5. Whether all users have the required levels of access on the specified database as required by the business specifications.
6. Check that sensitive data such as passwords and credit card numbers are encrypted and not stored as plain text in the database. It is good practice to ensure all accounts have passwords that are complex and not easily guessed.

Non-functional testing

Non-functional testing in the context of database testing can be categorized as required by the business requirements: load testing, stress testing, security testing, usability testing, compatibility testing, and so on. Load testing and stress testing, which can be grouped under the gamut of performance testing, serve two specific purposes:

Risk quantification - Quantification of risk helps the stakeholders ascertain the system response time requirements under the required levels of load. This is the original intent of any quality assurance task. Load testing does not mitigate risk directly, but through risk identification and quantification it presents corrective opportunities and an impetus for remediation that will mitigate risk.

Minimum system equipment requirement - The understanding, obtained through formal testing, of the minimum system configuration that will allow the system to meet the formally stated performance expectations of stakeholders, so that extraneous hardware, software, and the associated cost of ownership can be minimized. This can be categorized as an overall business optimization requirement.

Load testing

The purpose of any load test should be clearly understood and documented. The following configurations are a must for load testing:

1. Include the most frequently used user transactions, since they have the potential to impact the performance of all other transactions if they are not efficient.
2. Include at least one non-editing user transaction in the final test suite, so that the performance of such transactions can be differentiated from more complex transactions.
3. Include the more important transactions that facilitate the core objectives of the system, since failure under load of these transactions has, by definition, the greatest impact.
4. Include at least one editable transaction, so that the performance of such transactions can be differentiated from other transactions.
5. Observe the optimum response time under a huge number of virtual users for all prospective requirements.
6. Observe the effective times for fetching various records.

Important load testing tools are LoadRunner, WinRunner, and JMeter.

Stress testing

Stress testing is sometimes referred to as torturous testing, as it stresses the application under test with enormous workloads until the system fails. This helps identify the breakdown points of the system. Important stress testing tools are LoadRunner, WinRunner, and JMeter.

Most commonly occurring issues during database testing

1. Issue: A significant amount of overhead may be involved in determining the state of the database transactions.
   Solution: Overall process planning and timing should be organized so that no time- and cost-based issues appear.
2. Issue: New test data have to be designed after cleaning up the old test data.
   Solution: A prior plan and methodology for test data generation should be at hand.
3. Issue: An SQL generator is required to transform SQL validators so that the SQL queries are apt for handling the required database test cases.
   Solution: Maintenance of the SQL queries and their continuous updating is a significant part of the overall testing process and should be part of the overall test strategy.
4. Issue: The above prerequisites mean that setting up the database testing procedure can be costly as well as time consuming.
   Solution: There should be a fine balance between quality and overall project schedule duration.

Myths and misconceptions related to database testing

1. Myth: Database testing requires plenty of expertise and is a very tedious job.
   Reality: Effective and efficient database testing provides long-term functional stability to the overall application, so it is necessary to put in the hard work behind it.
2. Myth: Database testing adds an extra work bottleneck.
   Reality: On the contrary, database testing adds value to the overall work by finding hidden issues and thus proactively helping to improve the application.
3. Myth: Database testing slows down the overall development process.
   Reality: A significant amount of database testing helps in the overall improvement of quality for the database application.
4. Myth: Database testing could be excessively costly.
   Reality: Any expenditure on database testing is a long-term investment that leads to long-term stability and robustness of the application.

Best practices

- All data, including the metadata as well as the functional data, needs to be validated according to its mapping in the requirement specification documents.
- The test data created by, or in consultation with, the development team needs to be verified.
- Validate the output data using both manual and automated procedures.
- Deploy techniques such as cause-effect graphing, equivalence partitioning, and boundary-value analysis to generate the required test data conditions.
- The referential-integrity validation rules for the required database tables also need to be validated.
- The selection of default table values for validating database consistency is a very important concept.
- Check whether log events have been successfully added to the database for all required login events.
- Check whether scheduled jobs execute in a timely manner.
- Take timely backups of the database.
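The boundary-value analysis and equivalence partitioning techniques recommended above for test data generation can be sketched in a few lines. This is an illustrative Python sketch; the 1-10000 `amount` range is an assumed example requirement, not from any real specification:

```python
def boundary_values(lo, hi):
    """Boundary-value analysis for an inclusive numeric range:
    values at, just inside, and just outside each boundary."""
    return [lo - 1, lo, lo + 1, hi - 1, hi, hi + 1]

def equivalence_classes(lo, hi):
    """One representative per equivalence class: below the range,
    inside the range, above the range."""
    return {
        "invalid_low": lo - 10,
        "valid": (lo + hi) // 2,
        "invalid_high": hi + 10,
    }

# Example: an 'amount' field that must lie between 1 and 10000.
print(boundary_values(1, 10000))       # → [0, 1, 2, 9999, 10000, 10001]
print(equivalence_classes(1, 10000))   # one test input per partition
```

Each generated value would then be fed to the field under test, with the out-of-range values expected to be rejected by the application and the database constraints.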

ETL / Data Warehouse Testing: The Ultimate Guide

Before we learn ETL testing, let's understand what BI is. Business intelligence is the process of collecting raw data or business data and turning it into information that is useful and more meaningful. The raw data is the record of the daily transactions of an organization, such as interactions with customers, administration of finance, management of employees, and so on. This data is used for reporting, analysis, data mining, data quality and interpretation, and predictive analysis.

What is a Data Warehouse?

A data warehouse is a database designed for query and analysis rather than for transaction processing. It is constructed by integrating data from multiple heterogeneous sources. It enables a company or organization to consolidate data from several sources and separates the analysis workload from the transaction workload. Data is turned into high-quality information to meet all enterprise reporting requirements for all levels of users.

What is ETL?

ETL stands for Extract-Transform-Load, and it is the process by which data is loaded from the source system into the data warehouse. Data is extracted from an OLTP database, transformed to match the data warehouse schema, and loaded into the data warehouse database. Many data warehouses also incorporate data from non-OLTP systems such as text files, legacy systems, and spreadsheets.

Let's see how it works. For example, consider a retail store with different departments such as sales, marketing, and logistics. Each handles customer information independently, and the way they store that data is quite different: the sales department stores it by customer name, while the marketing department stores it by customer id. Now if they want to check the history of a customer and know which different products he or she bought owing to different marketing campaigns, it would be very tedious.

The solution is to use a data warehouse to store information from different sources in a uniform structure using ETL. ETL can transform dissimilar data sets into a unified structure. BI tools can later be used to derive meaningful insights and reports from this data. The following diagram gives you the road map of the ETL process.

1. Extract: extract the relevant data.
2. Transform: transform the data to the DW (data warehouse) format.
   - Build keys: a key is one or more data attributes that uniquely identify an entity. The various types of keys are primary key, alternate key, foreign key, composite key, and surrogate key. The data warehouse owns these keys and never allows any other entity to assign them.
   - Cleansing of data: after the data is extracted, it moves into the next phase, cleaning and conforming. Cleaning handles omissions in the data as well as identifying and fixing errors. Conforming means resolving conflicts between data sets that are incompatible, so that they can be used in an enterprise data warehouse. In addition, this step creates metadata that is used to diagnose source-system problems and improve data quality.
3. Load: load the data into the DW (data warehouse).
   - Build aggregates: creating an aggregate means summarizing and storing data that is available in the fact table, in order to improve the performance of end-user queries.

ETL Testing Process

Like other testing processes, ETL testing also goes through different phases. The phases of the ETL testing process are as follows.

ETL testing is performed in five stages:

1. Identifying data sources and requirements
2. Data acquisition
3. Implementing business logic and dimensional modelling
4. Building and populating data
5. Building reports
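The extract-transform-load flow that these stages exercise can be sketched end to end in miniature. The following Python example (standard sqlite3 module) is illustrative only; the source rows, the `transform` logic, and the `dim_customer_order` table are all invented for the sketch:

```python
import sqlite3

# A toy ETL run: extract rows from a "source" (an in-memory list
# standing in for an OLTP extract), transform them to the warehouse
# shape, and load them into a target table.
source_rows = [
    {"customer_name": " alice ", "order_total": "120.50"},
    {"customer_name": "bob",     "order_total": "75.00"},
]

def transform(row, surrogate_key):
    # Cleansing: TRIM strings, conform types, and build a surrogate key
    # owned by the warehouse (never taken from the source system).
    return (
        surrogate_key,
        row["customer_name"].strip().upper(),
        float(row["order_total"]),
    )

target = sqlite3.connect(":memory:")
target.execute(
    "CREATE TABLE dim_customer_order (sk INTEGER PRIMARY KEY, customer TEXT, total REAL)"
)
target.executemany(
    "INSERT INTO dim_customer_order VALUES (?, ?, ?)",
    (transform(r, sk) for sk, r in enumerate(source_rows, start=1)),
)
target.commit()

# Completeness check: the source count must equal the target count.
count = target.execute("SELECT COUNT(*) FROM dim_customer_order").fetchone()[0]
assert count == len(source_rows)
print(target.execute("SELECT * FROM dim_customer_order ORDER BY sk").fetchall())
# → [(1, 'ALICE', 120.5), (2, 'BOB', 75.0)]
```

Real ETL pipelines differ enormously in tooling, but the testable surface is the same: the transformed values and the source-versus-target counts.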

Types of ETL Testing

Production Validation Testing: Also called table balancing or production reconciliation, this type of ETL testing is done on data as it is being moved into production systems. To support your business decisions, the data in your production systems has to be correct. Informatica Data Validation Option provides ETL testing automation and management capabilities to ensure that production systems are not compromised by the data.

Source to Target Testing (Validation Testing): Carried out to validate whether the transformed data values are the expected data values.

Application Upgrades: This type of ETL test can be generated automatically, saving substantial test development time. It checks whether the data extracted from an older application or repository is exactly the same as the data in the new repository or application.

Metadata Testing: Includes data type checks, data length checks, and index/constraint checks.

Data Completeness Testing: Done to verify that all the expected data is loaded into the target from the source. Tests that can be run include comparing and validating counts, aggregates, and actual data between source and target for columns with simple or no transformation.

Data Accuracy Testing: Done to ensure that the data is accurately loaded and transformed as expected.

Data Transformation Testing: In many cases transformation testing cannot be achieved by writing one source SQL query and comparing its output with the target; multiple SQL queries may need to be run for each row to verify the transformation rules.

Data Quality Testing: Includes syntax and reference tests, done to avoid errors due to dates, order numbers, and similar fields during business processes. Syntax tests report dirty data based on invalid characters, invalid character patterns, incorrect upper or lower case, and so on. Reference tests check the data against the data model; for example, customer id. Data quality testing also includes number checks, date checks, precision checks, data checks, null checks, etc.

Incremental ETL Testing: Done to check the data integrity of old and new data when new data is added. Incremental testing verifies that inserts and updates are processed as expected during the incremental ETL process.

GUI/Navigation Testing: Done to check the navigation and GUI aspects of the front-end reports.
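A data completeness check of the count-and-aggregate kind described above can be sketched as follows (Python with sqlite3; the `src_orders`/`tgt_orders` tables and their contents are hypothetical):

```python
import sqlite3

# Completeness check sketch: compare the row count and a column
# aggregate between a source table and a target table.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE src_orders (id INTEGER, amount REAL);
    CREATE TABLE tgt_orders (id INTEGER, amount REAL);
    INSERT INTO src_orders VALUES (1, 10.0), (2, 20.0), (3, 30.0);
    INSERT INTO tgt_orders VALUES (1, 10.0), (2, 20.0), (3, 30.0);
""")

def completeness_check(src, tgt):
    # Count and SUM(amount) must agree between source and target.
    q = "SELECT COUNT(*), COALESCE(SUM(amount), 0) FROM {}"
    src_count, src_sum = db.execute(q.format(src)).fetchone()
    tgt_count, tgt_sum = db.execute(q.format(tgt)).fetchone()
    return src_count == tgt_count and src_sum == tgt_sum

print(completeness_check("src_orders", "tgt_orders"))  # → True
```

A dropped or rejected row on the target side would make either the count or the aggregate disagree, so the check returns False and flags the load for investigation.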

How to create an ETL test case

ETL testing is a concept that can be applied to different tools and databases in the information management industry. The objective of ETL testing is to assure that the data loaded from a source to a destination after business transformation is accurate. It also involves the verification of data at the various middle stages between source and destination.

While performing ETL testing, two documents will always be used by an ETL tester:

1. ETL mapping sheets: an ETL mapping sheet contains all the information about the source and destination tables, including each and every column and their look-ups in reference tables. ETL testers need to be comfortable with SQL queries, as ETL testing may involve writing big queries with multiple joins to validate data at any stage of ETL. ETL mapping sheets provide significant help when writing queries for data verification.
2. DB schemas of source and target: these should be kept handy to verify any detail in the mapping sheets.

ETL Test Scenarios and Test Cases

Mapping doc validation: Verify that the corresponding ETL information is provided in the mapping doc. A change log should be maintained in every mapping doc.

Validation:
1. Validate the source and target table structures against the corresponding mapping doc.
2. Source and target data types should be the same.
3. The lengths of the data types in source and target should be equal.
4. Verify that data field types and formats are specified.
5. The source data type length should not be less than the target data type length.
6. Validate the column names of the table against the mapping doc.

Constraint Validation: Ensure the constraints are defined for each specific table as expected.

Data consistency issues:
1. The data type and length for a particular attribute may vary in files or tables even though the semantic definition is the same.
2. Misuse of integrity constraints.

Completeness issues:
1. Ensure that all expected data is loaded into the target table.
2. Compare record counts between source and target.
3. Check for any rejected records.
4. Check that data is not truncated in the columns of the target tables.
5. Check boundary values.
6. Compare the unique values of key fields between the data loaded to the warehouse and the source data.

Correctness issues:
1. Data that is misspelled or inaccurately recorded.
2. Null, non-unique, or out-of-range data.

Transformation: Validate that the business transformation rules are applied as specified in the mapping doc.

Data Quality:
1. Number check: validate numeric fields.
2. Date check: dates must follow the specified format, consistently across all records.
3. Precision check.
4. Data check.
5. Null check.

Null Validation: Verify that there are no null values in columns where Not Null is specified.

Duplicate Check:
1. Validate that the unique key, primary key, and any other columns that should be unique per the business requirements contain no duplicate rows.
2. Check whether any duplicate values exist in a column that is extracted from multiple source columns and combined into one column.
3. As per the client requirements, ensure that there are no duplicates in the combination of multiple columns within the target.

Date Validation: Date values are used in many areas of ETL development:
1. To know the row creation date.
2. To identify active records from the ETL development perspective.
3. To identify active records from the business requirements perspective.
4. Sometimes updates and inserts are generated based on the date values.

Complete Data Validation:
1. To validate the complete data set between the source and target tables, a minus query is the best solution.
2. We need to run source-minus-target and target-minus-source.
3. If the minus query returns any rows, those should be considered mismatching rows.
4. Match the rows between source and target using the intersect statement.
5. The count returned by intersect should match the individual counts of the source and target tables.
6. If the minus query returns rows and the intersect count is less than the source or target count, then duplicate rows exist.

Data Cleanness: Unnecessary columns should be deleted before loading into the staging area.
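The minus-query approach described under Complete Data Validation can be demonstrated concretely. SQLite, used here only for the sketch, spells MINUS as EXCEPT; the `src` and `tgt` tables and their rows are illustrative:

```python
import sqlite3

# Minus-query validation sketch: rows in source but not in target,
# rows in target but not in source, and the matching (intersect) rows.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE src (id INTEGER, name TEXT);
    CREATE TABLE tgt (id INTEGER, name TEXT);
    INSERT INTO src VALUES (1, 'A'), (2, 'B'), (3, 'C');
    INSERT INTO tgt VALUES (1, 'A'), (2, 'B'), (4, 'D');
""")

src_minus_tgt = db.execute(
    "SELECT * FROM src EXCEPT SELECT * FROM tgt ORDER BY id"
).fetchall()
tgt_minus_src = db.execute(
    "SELECT * FROM tgt EXCEPT SELECT * FROM src ORDER BY id"
).fetchall()
matching = db.execute(
    "SELECT * FROM src INTERSECT SELECT * FROM tgt ORDER BY id"
).fetchall()

# Any rows returned by either minus query are mismatches.
print("in source only:", src_minus_tgt)   # → [(3, 'C')]
print("in target only:", tgt_minus_src)   # → [(4, 'D')]
print("matching rows :", matching)        # → [(1, 'A'), (2, 'B')]
```

On databases such as Oracle the same checks are written with the MINUS keyword; the logic (source minus target, target minus source, then intersect counts) is unchanged.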

Types of ETL Bugs

| Type of Bug | Description |
| --- | --- |
| User interface / cosmetic bugs | Related to the GUI of the application: font style, font size, colors, alignment, spelling mistakes, navigation, and so on |
| Boundary Value Analysis (BVA) related bugs | Minimum and maximum values |
| Equivalence Class Partitioning (ECP) related bugs | Valid and invalid types |
| Input/output bugs | Valid values not accepted; invalid values accepted |
| Calculation bugs | Mathematical errors; the final output is wrong |
| Load condition bugs | Does not allow multiple users; does not allow the customer-expected load |
| Race condition bugs | System crash and hang; system cannot run on client platforms |
| Version control bugs | No logo matching; no version information available; these usually occur in regression testing |
| Hardware bugs | Device not responding to the application |
| Help source bugs | Mistakes in help documents |

Difference between database testing and ETL testing

| ETL Testing | Database Testing |
| --- | --- |
| Verifies whether data is moved as expected. | The primary goal is to check whether the data follows the rules/standards defined in the data model. |
| Verifies whether the counts in the source and target match, and whether the transformed data is as per expectation. | Verifies that there are no orphan records and that foreign key-primary key relations are maintained. |
| Verifies that the foreign key-primary key relations are preserved during the ETL. | Verifies that there are no redundant tables and that the database is optimally normalized. |
| Verifies for duplication in the loaded data. | Verifies whether data is missing in columns where required. |
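The orphan-record check mentioned here (child rows whose foreign key has no matching parent) is typically written as a LEFT JOIN with an IS NULL filter. A sketch in Python with sqlite3; the `customers`/`orders` tables are hypothetical:

```python
import sqlite3

# Orphan-record check sketch: find orders whose customer_id has no
# matching row in the customers table.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO orders VALUES (10, 1), (11, 2), (12, 99);  -- 99 has no parent
""")

orphans = db.execute("""
    SELECT o.order_id, o.customer_id
    FROM orders o
    LEFT JOIN customers c ON c.customer_id = o.customer_id
    WHERE c.customer_id IS NULL
""").fetchall()

print(orphans)  # → [(12, 99)]
```

An empty result means the foreign key-primary key relation holds; any returned rows are orphans that either the ETL or the database constraints should have prevented.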

Responsibilities of an ETL tester

The key responsibilities of an ETL tester fall into three categories:

- Stage table / SFS or MFS
- Business transformation logic applied
- Target table loading from the stage file or table after applying a transformation

Some of the responsibilities of an ETL tester are:

- Test ETL software
- Test components of the ETL data warehouse
- Execute backend data-driven tests
- Create, design, and execute test cases, test plans, and test harnesses
- Identify problems and provide solutions for potential issues
- Approve requirements and design specifications
- Test data transfers and flat files
- Write SQL queries for various scenarios such as count tests

ETL Performance Testing and Tuning

ETL performance testing is a confirmation test to ensure that an ETL system can handle the load of multiple users and transactions. The goal of performance tuning is to optimize session performance by eliminating performance bottlenecks. To tune or improve the performance of a session, you have to identify the performance bottlenecks and eliminate them. Performance bottlenecks can be found in the source and target databases, the mapping, the session, and the system. One of the best tools used for performance testing is Informatica.

Automation of ETL Testing

The general methodology of ETL testing is to use SQL scripting or to eyeball the data. These approaches are time-consuming, error-prone, and seldom provide complete test coverage. To accelerate testing, improve coverage, reduce costs, and improve the defect detection ratio of ETL testing in production and development environments, automation is the need of the hour. One such tool is Informatica.

Best Practices for ETL Testing

1. Make sure data is transformed correctly.
2. Projected data should be loaded into the data warehouse without any data loss or truncation.
3. Ensure that the ETL application appropriately rejects invalid data, replaces it with default values, and reports it.
4. Ensure that the data is loaded into the data warehouse within the prescribed and expected time frames, to confirm scalability and performance.
5. All methods should have appropriate unit tests regardless of visibility.
6. All unit tests should use appropriate coverage techniques to measure their effectiveness.
7. Strive for one assertion per test case.
8. Create unit tests that target exceptions.
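The "one assertion per test case" practice pairs naturally with data-driven backend tests: each input/expected pair gets its own check. A minimal Python sketch; the `transform_name` rule is a made-up stand-in for real ETL business logic:

```python
# Data-driven test sketch: one (input, expected) pair per case, one
# assertion per case. The transformation under test (uppercasing a
# trimmed name) is illustrative only.

def transform_name(raw):
    return raw.strip().upper()

cases = [
    (" alice ", "ALICE"),
    ("Bob",     "BOB"),
    ("",        ""),      # boundary case: empty input
]

for raw, expected in cases:
    actual = transform_name(raw)
    assert actual == expected, f"{raw!r}: expected {expected!r}, got {actual!r}"
print(f"{len(cases)} cases passed")
```

In a real suite each pair would typically become a parameterized test case in the project's test framework, so a single failing input is reported individually rather than aborting the whole loop.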