
    Testing a Datawarehouse

- an industrial challenge

by Harry M. Sneed

    ANECON GmbH, [email protected]

Abstract: This contribution is an experience report on system testing, and in particular on the testing of a datawarehouse system. Datawarehouses are large databases used solely for querying and reporting purposes. The datawarehouse in question here was dedicated to fulfilling the reporting requirements of the BASEL-II agreement on the provision of auditing data by the banks, the European equivalent of the Sarbanes-Oxley Act. The purpose of the testing project was to prove that the contents of the datawarehouse are correct in accordance with the rules specified to fill them. In the end, the only way to achieve this was to rewrite the rule specifications in a machine-readable form and to transform them into post assertions, which could be interpreted by a data verification tool for comparison of the actual data contents with the expected data contents. The testing project was never fully completed, since not all of the rules could be properly transformed.

Keywords: System Testing, Datawarehouse Testing, Data Transformation Rules, Post Condition Assertions, Formal Verification.

1. Conventional System Testing

The project described in this paper presented both a theoretical and a practical challenge to the testing team of the ANECON GmbH in Vienna. ANECON has been doing system testing on a contract basis for five years now. In that time, it has tested a savings and loan system in a client/server architecture, tested an ERP system installation in a hospital, tested a web-based dues collection system and tested an e-government internet portal. All of these projects were more or less successful, if there is anything like a success criterion for system testing. This project proved to be different.

The problem with this project, as with most testing projects, stems from the fact that there was no adequate language available to specify the requirements. If requirements are properly specified, they can be properly verified. In conventional testing projects it is the tester who bridges the gap between informal specifications and formal verification. In the case of datawarehouse testing, that is not possible due to the nature of the problem.

The conventional way to test a system is to have a battery of testers sitting at the user interfaces, each testing a subset of the system's use cases. The testers submit predetermined inputs and check to see that the resulting output is what they should expect. They might even scan selected database contents to see how they are affected by the test transactions. If any reports are generated, they will trigger the jobs to produce them and check their contents. If there are any doubts about the results, they will consult with the analysts or with the end users. The checking of the output is done on a spot-check basis. Through intuition or by means of some expert domain knowledge the tester is able to interpret what is correct and what is not.

At the center of conventional system testing is the concept of a use case. A system is considered to be functionally tested when all of its use cases with all of their usage variants have been tested. The use case is also the basis for defining test cases [1]. One use case may have several test cases, one for each alternate path through the use case. Seldom are all of the possible results checked. To do so would require too much time and effort. This conventional system testing approach has been well defined in the pertinent testing literature [2]. Sometimes it is recommended to automate the filling in and the checking of the user interface by some kind of capture/replay tool to expedite the test [3], and sometimes it is recommended not to do that, since automation only clouds the issue and diverts the tester from his responsibility for the accuracy of the test [4]. ANECON had always used the conventional test approach before. It was believed that a similar approach would work for a datawarehouse project; the problem was only one of finding enough testers to run the jobs and check the results. This belief turned out to be false.

    2. Datawarehouse System Testing

A datawarehouse consists of hundreds of related relational database tables, each with a number of attributes. A single database table corresponds to some logical entity in the real world, such as an account. If the database is normalized, the number of tables will be large but the number of attributes per table will be small, perhaps no more than 40. The goal is to allow the user to join the data together in any thinkable combination of attributes in order to satisfy any possible information request. Thus, the ultimate element of processing is the individual attribute, which can be combined with any other attributes. An attribute corresponds to a column of a table, whereas the lines contain all attributes pertaining to one instance of the entity being described. Each instance of an entity, i.e. each line of a table, must be uniquely identifiable and distinguishable from all other lines of that table. For that purpose, one or more columns serve as a unique identifier.

In the datawarehouse in question there were 266 tables with more than 11,000 attributes, giving an average of some 40 attributes per table. Each attribute was to have a rule describing how that attribute is derived. Attributes can be taken from operational data, they can be computed, or they can be set to a constant value. How this is to be accomplished is described in the attribute rule. For this datawarehouse, the input data was delivered from the various bank subsidiaries scattered throughout Eastern Europe. Since the local bank analysts were the only ones familiar with their data, it was up to them to provide a data model of their operational data together with an English-language description of the individual data elements. These models were merged by the central bank analysts to map the operational data onto the attributes of the datawarehouse. This led to the so-called mapping rules, of which there were, in the end, some 5317.

The goal of datawarehouse testing is to demonstrate that the contents of the datawarehouse are correct. To achieve this means checking the attributes against their rules, i.e. comparing actual values with expected values based on the mapping rules. This could be done with a random sample of all attributes, with a subset of critical attributes, or with the complete set of attributes [5]. As was discovered, the effort of checking even a small subset of attributes manually is so great that even statistical testing becomes very expensive. On the other hand, if the rule verification is done automatically, then it might as well be done for all the attributes, since there is no additional price to pay.

3. The Challenge of Defining a Rule Specification Language

Due to the volume of data and the complexity of the mapping rules, there is no real alternative to test automation in testing a datawarehouse. The test has to be automated to reach any degree of coverage and reliability. The problem here is how to bridge the gap between informally stated rules and formal verification criteria. This is where academia could help. What is needed is a data rule specification language, simple enough to be used by non-technical domain experts but formal enough to be parsed and converted into stringent verification conditions. Once the language has been defined, it should be usable as a basis for both generating test data and validating test results.

4. Specifying the Data Transformation Rules

Unfortunately, the data transformation rules for this project were formulated in English prose by the bank analysts. The medium used to do this was an Excel table, generated from the E/R modelling tool ERwin. In this table there was a line for each attribute and a column for each attribute descriptor. The attribute descriptors were:

    the name of the logical entity = table name

    the purpose of the table

    the name of the attribute

    the purpose of the attribute

    the type of attribute

    the significance of the attribute and

    the rule through which the attribute is derived.

The rules were texts of several lines describing how this particular attribute was to be computed. At first glance, it appeared to be a hopeless task to try to interpret the rules. Many were indeed long algorithms expressed in a pidgin-English form by non-English speakers. However, a closer analysis revealed that there were really only seven basic templates used for over 90% of the rules. These seven basic rule templates were:

a simple 1:1 assignment of a source value from an input file to this attribute

a simple 1:1 assignment of a given constant value to this attribute

a m:1 assignment of a set of alternate constant values to this attribute

a m:1 assignment of a set of concatenated source and constant values from one input source to this attribute

a m:1 assignment of a set of concatenated source attributes from several different input files to this attribute

a 1:1 arithmetic assignment of a computed value to this attribute

a 1:1 assignment of the result of some standard function to this attribute.

These assignments could be conditional or non-conditional. A conditional assignment was qualified by a Boolean expression based on values contained in the input source files and on constant values. A special condition was the successful completion of an SQL join operation. Otherwise the conditions were nested, logical expressions with if, else and otherwise clauses (see Sample 1).
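To make the shape of such a rule concrete, the following minimal Python sketch renders one nested conditional assignment as an if/else cascade. It mirrors the converted rule shown later in Sample 2; the dictionary-based record access is an assumption for illustration only, not part of the project's tooling.

# Hypothetical rendering of a nested conditional assignment (cf. Sample 2).
# Each branch assigns the target attribute from different source attributes.
def derive_dr_interest_id(reiacd: dict, accntab: dict) -> str:
    if reiacd["DICT"] != "0":             # account is linked to an interest group
        return reiacd["DICT"] + reiacd["DCST"] + accntab["CCY"] + "D"
    elif accntab["DRIS"] != "0":          # debit spread rate is available
        return accntab["CONCAT"]
    elif accntab["DRIB"] != "00":         # only the base rate code has a value
        return accntab["CCY"] + accntab["DRIB"]
    else:                                 # no interest condition could be mapped
        return "*nomap*"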

Unfortunately, the assignments and their conditions were formulated in slightly different ways throughout the rule table. However, they were similar enough to be parsed and recognized. The great majority - some 3950 - could be processed as they were. The remainder had to be reformulated in a semi-formal syntax. The syntax was as follows:

assign <Table>.<Attribute> for a 1:1 assignment of an input value

assign <constant> for a 1:1 assignment of a constant value

assign <constant_1> ! <constant_2> ! ... for a set of alternate values

assign Table.Attribute_1 | ... | Table.Attribute_n for a concatenation assignment

assign join Table_A.Attribute_A1 | Table_A.Attribute_A2 with Table_B.Attribute_B1 | Table_B.Attribute_B2 for concatenating values from different data sources

assign Table_A.Attribute_A1 + Table_B.Attribute_B1 * 2; for arithmetic expressions. There was no nesting of clauses nor precedence rules here, so the arithmetic expression was resolved in a simple left-to-right sequence

assign Function (Param_1, Param_2, Param_3) whereby the parameters could be attributes in source input files, i.e. Table_C.Attribute_C1, or constant values, i.e. 10.5

With these assignment expressions, enhanced by an if expression in the form

if (Table_A.Attribute_A1 <relop> <operand>) whereby <relop> := <, >, =, <>, etc. and <operand> = <Table>.<Attribute> or <constant>

more than 4700 rules could be resolved automatically and converted into post-condition assertions, which could then be tested. Of these rules, some 750 had to be manually adjusted, requiring more than 150 hours of effort. That not more mapping rules were adjusted was due not only to the limited time available for the test, but also to the informal nature of the rules. Some rules were formulated in such a confusing manner that they defied reformulation. There was simply no way to formalize them. It was a fault of the project that the rules were not properly defined to begin with. Had they been formulated at least in some semi-formal form, it would have been possible to process them automatically without having to spend valuable tester time rewriting them (see Sample 2).
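By way of illustration, the following minimal Python sketch shows how rules in this semi-formal syntax can be converted mechanically into post-condition assertions. It covers only the first two of the seven templates, and the regular expressions and function names are illustrative assumptions, not the actual GenTest implementation.

# Hypothetical sketch of the rule-to-assertion conversion (not GenTest itself).
# It recognizes two of the semi-formal templates described above and emits
# assert statements in the paper's assertion syntax.
import re

SOURCE_RULE = re.compile(r"^assign\s+(\w+)\.(\w+)$")    # assign Table.Attribute
CONSTANT_RULE = re.compile(r"^assign\s+'([^']*)'$")     # assign 'constant'

def rule_to_assertion(target_attr: str, rule: str):
    rule = rule.strip().rstrip(",")
    m = SOURCE_RULE.match(rule)
    if m:
        _table, attr = m.groups()
        return f"assert new.{target_attr} = old.{attr};"
    m = CONSTANT_RULE.match(rule)
    if m:
        return f"assert new.{target_attr} = '{m.group(1)}';"
    return None   # rule not recognized: flag it for manual reformulation

print(rule_to_assertion("DR_INTEREST_ID", "assign ACCNTAB.DRIB"))
# -> assert new.DR_INTEREST_ID = old.DRIB;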

5. Post Assertions for Validating Test Results

Assertions have a long tradition in software testing, going back to the mid 1970s [6]. Mostly they have been used for module testing. However, this author has used them to validate the results of entire systems. The tool DataTest uses assertions both to generate new test data and to validate the contents of output data. It has been used in previous projects for selective regression testing [7]. Basically, it compares the values of attributes in a new database or new outputs with the values which existed for the same attributes in a previous database or output. For every entity, i.e. database table, file or report, a set of assertions is written, associating the new attributes with the old ones or with manipulations of the old ones. The assertions are of the form

assert new.Attr_1 = old.Attr_2;
assert new.Attr_2 > old.Attr_3;
assert new.Attr_4 < old.Attr_4;

    Instead of comparing a new value with an oldvalue, it was also possible to compare a new valuewith a constant, a set of constants, or a computedvalue as depicted below

assert new.Attr_2 = 100.50;
assert new.Attr_2 = "A" ! "B" ! "C" ! "D";
assert new.Attr_2 = old.Attr_2 + 100 / 2;

For this project, the assert statement was extended to include concatenations:

assert new.Attr_3 = old.Attr_3 |-| old.Attr_4;

Any assertion could become conditional by qualifying it with an if clause in the form

assert new.Attr_4 = old.Attr_5
if (old.Attr_5 > 100 & old.Attr_5 < 200);

There could be any number of "and" conditions. Logical "or" conditions were not allowed. They are expressed in another form, namely by assigning different assertions to the same attribute:

assert new.Attr_5 = old.Attr_6;
assert new.Attr_5 < 100;
assert new.Attr_5 > 200;

If the value of Attr_5 fulfills any one of these assertions, it is considered to be correct.

The assertions are grouped together into assertion procedures, one per entity, and qualified by a key condition. The key condition matches the keys of the new or output entity with those of the old or input entity.

if ( new.key_1 = old.key_1 &
     new.key_2 = old.key_2 ... );
assert new.Attr_1 = old.Attr_2;
assert new.Attr_2 = 0;
assert new.Attr_3 = old.Attr_3 + 5;
end;
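The following minimal Python sketch illustrates this evaluation semantics: an attribute value is accepted if any one of the assertions written for it holds. The lambda predicates stand in for compiled assertions and are assumptions for illustration only, not the DataTest internals.

# Minimal sketch of the "or" semantics: a value is correct if ANY of the
# assertions written for its attribute is fulfilled.
def attribute_ok(new_value, assertions):
    return any(check(new_value) for check in assertions)

old = {"Attr_6": "150"}                    # hypothetical old record
asserts_for_attr5 = [
    lambda v: v == old["Attr_6"],          # assert new.Attr_5 = old.Attr_6;
    lambda v: float(v) < 100,              # assert new.Attr_5 < 100;
    lambda v: float(v) > 200,              # assert new.Attr_5 > 200;
]
print(attribute_ok("150", asserts_for_attr5))  # True: first assertion holds
print(attribute_ok("120", asserts_for_attr5))  # False: no assertion holds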

For XML and WSDL there can be several entity types included in any one output report or response. Therefore, there must be a separate set of assertions for every entity type. These assertions are qualified by the object name:

if (object = This_object &
    new.key_1 = old.key_1 )
assert new.Tab.Attr = old.Tab.Attr;
end_object;


A comparison job is started after every test. It compiles the assertions into internal symbol tables and then processes the database tables or output files one by one, comparing the content of each asserted attribute against the result of the assertion. Attributes which do not comply with their assertions are reported as incorrect. In this way, the results of a test can be automatically validated without having to manually scan through and check them. This method is both faster and more reliable. A typical assertion procedure is depicted among the samples at the end (see Sample 3).

6. The Tools for Datawarehouse Testing

For the datawarehouse test, four tools were required:

GenTest for transforming the mapping rules into assertion procedures

GenSQL for generating SQL procedures from the assertion procedures

AsrtComp for checking and compiling the assertion procedures

DataVal for validating the database contents

GenTest had the task of reading the rule table, extracting the rules, parsing them and converting them into assertions which could then be compared. It generated an assertion script for each target database table. As a by-product, GenTest also generated test case specifications, one for each rule, to be stored in the Mercury Testcase Repository. Since this was done fully automatically, there was no extra cost to the project. Finally, it produced a statistical report on the number of entities, attributes and rules processed. This was important for measuring the degree of data coverage (see Samples 4 & 5).

GenSQL was a follow-up tool to GenTest. It parsed each generated assertion procedure to determine which attributes were required from which tables. It then generated two sets of SQL query procedures: one for selecting data from the n input tables and joining them together to create a single CSV file for each datawarehouse table, and another for selecting data from the target datawarehouse table and downloading it into a CSV file. These two CSV files were to be the inputs to the data validation (see Sample 6).
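A minimal sketch of this generation step follows; the table and column parameters are invented for illustration, and the real output format is the one shown in Sample 6.

# Hypothetical sketch of the GenSQL idea: given the attributes an assertion
# procedure needs, emit one query joining the n input tables (-> old CSV)
# and one downloading the target table (-> new CSV).
def gen_selects(target_table, target_attrs, input_tables):
    """input_tables: list of (table, [attributes], join_key) tuples."""
    first, *rest = input_tables
    cols = ", ".join(f"{t}.{a}" for t, attrs, _ in input_tables for a in attrs)
    joins = "".join(f" JOIN {t} ON {first[0]}.{k} = {t}.{k}" for t, _, k in rest)
    old_sql = f"SELECT {cols} FROM {first[0]}{joins};"
    new_sql = f"SELECT {', '.join(target_attrs)} FROM {target_table};"
    return old_sql, new_sql

old_sql, new_sql = gen_selects(
    "ST_ACCOUNT", ["ACCOUNT_ID", "DR_INTEREST_ID"],
    [("REINTD", ["ACCOUNT_ID", "KEY"], "ACCOUNT_ID"),
     ("ACCNTAB", ["DRIB", "DRIS"], "ACCOUNT_ID")],
)
print(old_sql)   # the join over the input tables
print(new_sql)   # the download of the target table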

AsrtComp is a tool for checking the syntax of the assertion procedures, which could also be manually edited, and compiling them into intermediate symbol tables. There can be five tables for each entity to be validated (a sketch of these structures follows the list):

a header table for identifying the objects to be validated and the number of entries in each table

a key condition table with the names and types of the keys to be matched

a comparison table assigning which new attributes are to be compared with which old attributes and/or constants

a constant table containing an entry for each constant value used as an operand in the assertions

a condition table containing all of the conditions to be fulfilled for the assertions to be executed. A pointer links the conditions to the assertions to which they apply.
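As an illustration only, the five symbol tables could be modelled along the following lines; the field names are assumptions, since the paper does not document the tool's internal data model.

# Assumed sketch of the five intermediate symbol tables, not AsrtComp's
# real data model, expressed as Python dataclasses.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Header:                 # object to be validated, with its entry count
    entity: str
    entries: int

@dataclass
class KeyCondition:           # names and types of the keys to be matched
    key_names: list
    key_types: list

@dataclass
class Comparison:             # which new attribute is compared with what
    new_attr: str
    operand: str              # an old attribute, constant or expression
    condition: Optional[int] = None   # pointer into the condition table

@dataclass
class Constant:               # one entry per constant operand
    value: str

@dataclass
class Condition:              # gates the assertions linked to it
    expression: str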

DataVal is the final tool in the set. After reading the symbol tables, it first processes the old CSV file, i.e. the inputs, and stores the values in a temporary database with their keys as an index. It takes the attribute tags from the first line of the CSV file and subsequently counts the columns to associate the values with the tags. It then processes the new CSV file, i.e. the outputs, and matches each new record by key to an existing old record. If a match is found, the contents of the new record are compared with the values of the old record, or with the constants in the symbol table, or with computed values, or with concatenated values, or with a set of alternate constant values, or with the lower and upper bounds of a range, depending on the conditions. Thus, there are many ways to verify the correctness of an output value. If no match is found, the old record is considered to be missing. After processing all new records, a second search is made of all the old records in the temporary database to see whether they were compared or not. If not, they are considered to be missing in the new file. A protocol lists all of the incorrect data values, i.e. those that do not comply with their assertions, as well as all missing records. A set of test statistics summarizes the percentage of missing records and incorrect attributes (see Sample 7).
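The matching pass just described can be sketched as follows. This is a simplified illustration of the algorithm, not the DataVal implementation; the in-memory dictionary stands in for the temporary database, and the check functions stand in for compiled assertions.

# Simplified sketch of the comparison pass: index the old CSV by key, then
# match each new record by key and check the asserted attributes.
import csv

def validate(old_csv, new_csv, key, checks):
    """checks: {new_attr: function(new_value, old_record) -> bool}"""
    with open(old_csv, newline="") as f:
        old = {row[key]: row for row in csv.DictReader(f)}   # temporary index
    unmatched_old = set(old)
    errors = []
    with open(new_csv, newline="") as f:
        for row in csv.DictReader(f):
            match = old.get(row[key])
            if match is None:                       # old record is missing
                errors.append((row[key], "missing in old file"))
                continue
            unmatched_old.discard(row[key])
            for attr, ok in checks.items():
                if not ok(row[attr], match):        # assertion not fulfilled
                    errors.append((row[key], f"{attr} = {row[attr]!r}"))
    # second pass: old records never matched are missing in the new file
    errors += [(k, "missing in new file") for k in sorted(unmatched_old)]
    return errors   # the protocol of incorrect values and missing records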

7. The Datawarehouse Test Process

Tim Koomen and Martin Pol have emphasized the necessity of a well-defined test process [8]. An efficient test combines automated test tools with a proven process. Together these add up to high test productivity. The process for testing this datawarehouse system was, if anything, well defined and automated. It consisted of 8 steps, of which 4 were fully automated:

Step 1: The mapping rules which did not comply with the language standard had to be reformulated. This was the most time-consuming task of the whole process and could have been avoided if the rules had been properly specified in the first place.

    Step 2: The mapping rules were automaticallyconverted into assertions by the GenTest tool


Step 3: The SQL procedures were automatically generated from the assertions by the GenSQL tool

Step 4: The input attributes for each target datawarehouse table were selected from the input tables, joined and downloaded into a CSV file

Step 5: The output attributes for each target datawarehouse table were downloaded into a CSV file

Step 6: The assertion procedures were compiled by the tool AsrtComp

Step 7: The input and output CSV files were matched and the contents of the output file verified against the post assertions by the tool DataVal

Step 8: The testers examined the data validation results and reported any errors
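Purely for illustration, the four automated steps could be chained by a driver such as the following; the command names follow the paper's tools, but every command-line argument shown here is an assumption, since the paper does not document how the tools are invoked.

# Illustrative driver for the automated steps 2-7. All CLI arguments are
# invented; the paper does not specify the tools' interfaces.
import subprocess

def run(step, cmd):
    print(f"Step {step}:", " ".join(cmd))
    subprocess.run(cmd, check=True)        # stop the cycle if a tool fails

run(2, ["GenTest", "rules.xls"])           # mapping rules -> assertion scripts
run(3, ["GenSQL", "asserts/"])             # assertion scripts -> SQL procedures
# Steps 4 and 5: the generated SQL produces old.csv (inputs) and new.csv (outputs)
run(6, ["AsrtComp", "asserts/"])           # compile assertions to symbol tables
run(7, ["DataVal", "old.csv", "new.csv"])  # match and verify against assertions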

In the end, a report came out with the data errors, which were then fed into the error tracking system by the testers. Once the rules had been reformulated, the whole testing process could be repeated within a day. Normally such a test cycle would require at least 10 days, so the automation led in this case to a significant improvement in test productivity.

8. Conclusions from the Test Project

This project demonstrated how important it is to have a proper specification language for testing. In testing an online system or a web-based system, testers must simulate the end users. There are limits to what you can automate. Creative testing plays an important role, since it is impossible to foresee all modes in which the system could be used. Often the tester has to improvise. Therefore, the human tester has a significant role to play.

That is not true of a datawarehouse test. Here the problem is to prove that the contents of the datawarehouse are 100% correct. For that, the contents have to be 100% specified and the actual contents have to be verified against the specifications. The challenge here is to provide a specification language which will accommodate both goals. Once the mapping rules have been specified, it is the job of the tools to run the tests and verify the database contents. The role of the tester can be compared to that of an engineer on a robotic assembly line, monitoring the work of the robots and only intervening when something goes wrong. For this, he should understand the function of the robots without having to do the work himself. Such is the case in datawarehouse testing.

In the datawarehouse described here, only 88% of the attributes were actually tested. This resulted from the fact that 12% of the rules were not verifiable. Nevertheless, those that could be verified were verified, and more than 200 incorrect values were identified. This project is a good example of improvising to make the best of a bad situation. It is always difficult to assess the success of a test project. The only objective way of doing so is to compare the errors found in testing with the errors which come up later in production. Since this datawarehouse system has yet to go into production, it is impossible to know how many errors might come up there. If they do, it will be because of incomplete and inconsistent rules. The specification language problem remains the source of most software system errors, and in the case of datawarehouse systems particularly so. It is here where academia could make a significant contribution.

References:

[1] Bach, James: Reframing Requirements Analysis, IEEE Computer, Vol. 32, No. 6, 2000, p. 113
[2] Hutcheson, Maggie: Software Testing Fundamentals, John Wiley & Sons, Indianapolis, 2003, p. 12
[3] Fewster, M. / Graham, D.: Software Test Automation, Addison-Wesley, New York, 1999, p. 248
[4] Kaner, C. / Bach, J. / Pettichord, B.: Lessons Learned in Software Testing, John Wiley & Sons, New York, 2002, p. 111
[5] Dyer, Michael: Statistical Testing, in: The Cleanroom Approach to Quality Software Development, John Wiley & Sons, New York, 1992, p. 123
[6] Taylor, R.: Assertions in Programming Languages, in: Proc. of NCC, Chicago, 1978, p. 105
[7] Sneed, H.: Selective Regression Testing of a Host to DotNet Migration, submitted to ICSM-2006, IEEE Computer Society, Philadelphia, Sept. 2006
[8] Koomen, T. / Pol, M.: Improving the Test Process, John Wiley & Sons, London, 1999, p. 7

[Figure: The Datawarehouse Test Process. The original rule specs, written by the bank analysts in semi-formal prose, are reformulated and enhanced by the testers into formal rule specs. An assertion generator converts these into test cases and assertion procedures, one per DB entity, of the form "if (new.key = old.key) assert new.Attribute = old.Attribute if (...); assert new.Attribute = old.Attribute + value*value;". The tester sets up the test data and executes the test on the system under test; the DataTest test result validator then checks the target data against the source data using the assertion procedures and produces an exception report, from which the tester checks the results and reports the errors.]


    Samples:

    Sample 1: An extract from the rule table before the rule has been converted

    DR_INTEREST_ID;"Link to TB0_ACCOUNT_INTEREST.Debit interest conditions applicable to theaccount."; If REIACD in field DICT(debit) has a value other than 0, account is linked tointerest group. The following then applies: REINTD / KEY (Position 3-4) (Interest Type) 2 AlphaREINTD / KEY (Position 5-9) (Interest Subtype) 5 Alpha REINTD / KEY (Position 10-12)(Currency) 3 Alphas The above Key fields are concatenated in ID. If in REIACD the DICTvalues are zeroes, the account interest condition has to be extracted from the ACCNTAB:ACCNTAB / DRIB (debit Base Rate Code) ACCNTAB / DRIS (debit Spread Rate) If both of 0value, extract ACCOUNT_ID If ACCNTAB / DRIS is available ( 0), extract the same as forACCOUNT_ID If only DRIB of value, extract DRIB

    Sample 2: An extract from the rule table after the rule has been converted

    DR_INTEREST_ID;"Link to TB0_ACCOUNT_INTEREST.

    Debit interest conditions applicable to the account.";" ? assign REIACD/DICT | REIACD/DCST | ACCNTAB/CCY | 'D' if REIACD/DICT (debit) '0',

    assign ACCNTAB/CONCAT if ACCNTAB/DRIS '0',assign ACCNTAB/CCY|ACCNTAB/DRIB if ACCNTAB/DRIB '0',assign '*nomap*' if REIACD/DICT = '00' and ACCNTAB/DRIS = '0' and ACCNTAB/DRIB = '00'assign ACCNTAB/CNUM|CCY|ACNO|BRCA if other. (18-digit account Id made up of CNUM length 6,

    leading zeros length 4, leading zeros +ACSQ length 2, leading zeros +BRCA concatenated).";

    Sample 3: A generated assertion procedure

file: ACCOUNT;
// This comparison procedure assumes that the old ACCOUNT file contains the following attributes:
// from ST_ACCOUNT Table: ACCOUNT_ID; ATYP; STYP; RETB; CRIB; CRIS; DRIB; DRIS; PRFC; OD; FACT; FCNO
// from REIACD Table: REIACD.CICT; REIACD.DICT; REIACD.DACP; REIACD.CACP
// from FCLTYFM Table: FCLTYFM.RVDT

if ( new.ACCOUNT_ID = old.concat_acct_id );
assert new.GROUP_ACCOUNT_TYPE_ID = old.atyp if (old.atyp = "G");
assert new.GROUP_ACCOUNT_TYPE_ID = "S" if (old.atyp = "R" & old.styp = "S");
assert new.GROUP_ACCOUNT_TYPE_ID = "C" if (old.atyp = "R" & old.styp = "C");
assert new.GROUP_ACCOUNT_TYPE_ID = "L" if (old.atyp = "R" & old.styp = "L");
assert new.STATUS_BLOCKED = "1" if (old.retb = X"04");
assert new.STATUS_BLOCKED = "2" if (old.retb = X"02");
assert new.STATUS_BLOCKED = "3" if (old.retb = X"06");
assert new.STATUS_BLOCKED = "*n.a.*" if (other);
assert new.CR_INTEREST_ID = old.crib if (old.cict = "0");
assert new.DR_INTEREST_ID = old.drib;
assert new.DR_INTEREST_LIQU_ACCT = old.dacp;
assert new.CR_INTEREST_LIQU_ACCT = old.cacp;
assert new.PROFIT_CENTRE_DESCRIPTION = old.prfc;
assert new.OVERDRAFT_START_DATE = "2005.05.15" if (old.od <> "0");
assert new.OVERDRAFT_REVIEW_DATE = old.rvdt if (old.fact <> "0" & old.fcno <> "0");
assert new.OVERDRAFT_REVIEW_DATE = "NULL" if (old.fact = "0");
assert new.PRIMARY_ACCOUNT_INDICATOR = "P" ! "S" ! "*n.a.*";
assert new.STATUS_INDICATOR = "inactiv" if (old.retb = X"3");
assert new.STATUS_INDICATOR = "inactiv" if (old.retb = X"1");
assert new.STATUS_INDICATOR = "closed" if (old.reci = "C");
assert new.STATUS_INDICATOR = "active" if (other);
end;


    Sample 4: A section from the assertion generation log

+-------------------------------------------------------------------+
| &New Assert Script: Asrt-011 = ST_ACCOUNTING_STANDARD
+-------------------------------------------------------------------+
| New Attribute ACCOUNTING_STANDARD_CODE has no rule!
| New Attribute APPLICATION_ID has a rule that was not processable
|   Rule = text file
| New Attribute QUALITY_INDICATOR has no rule!
| New Attribute SOURCE_SYSTEM_ID has a rule that was not processable
|   Rule = text file
+-------------------------------------------------------------------+
| Number of new Attributes      = 013
| Number of Rules               = 011
| Number of Basel Rules         = 003
| Number of complex Rules       = 005
| Number of complex Basel Rules = 002
| Number of generated Keys      = 002
| Number of generated Asserts   = 009
| Number of unrecognized Rules  = 002
+-------------------------------------------------------------------+

    Sample 5: The generation statistics

+-------------------------------------------------------------------+
| Assertion Generation completed!
| Number of E/R Rule Lines processed       = 11384
| Number of new Tables processed           = 266
| Number of new Attributes processed       = 10910
| Number of old Tables processed           = 132
| Number of old Attributes processed       = 00891
| Number of new Attributes without a rule  = 05573
| Number of Transformation Rules specified = 05337
| Number of Transformation Rules processed = 04742
| Number of Basel-II Rules recognized      = 00890
| Number of Basel-II Rules processed       = 00852
| Number of Complex Rules recognized       = 00853
| Number of Complex Rules processed        = 00562
| Number of Complex Basel Rules recognized = 00157
+-------------------------------------------------------------------+
| Number of Assert Scripts generated       = 00266
| Number of Assertions generated           = 04897
| Number of Table Selections generated     = 00181
| Number of Assert Keys generated          = 00023
| Number of Assert Conditions generated    = 00308
| Number of Assert Concatenates generated  = 00103
| Number of Assert Alternates generated    = 00365
| Number of Assert Arithmetic generated    = 00009
| Number of Test Cases generated           = 05337
+-------------------------------------------------------------------+

Sample 6: A generated select procedure for the incoming data

// SQL Select for Stage File: ST_ACCOUNT_INTEREST_RATE
SELECT
// SQL Select for MiDas File: REINTD
SELECT
ACCOUNT_ID,
KEY,
BAL2,
FROM REINTD JOIN
// SQL Select for MiDas File: SDBANKPD
SELECT


BJRDNB,
FROM SDBANKPD JOIN
// SQL Select for MiDas File: ACCNTAB
SELECT
LCD,
FROM ACCNTAB JOIN
END SELECT;
// End of Select for: ST_ACCOUNT_INTEREST_RATE

    Sample 7: A section of the data validation report

+--------------------------------------------+---------------------------------+
| File/Table Comparison Report               |                                 |
| Key Fields of Record (new, old)            |                                 |
| New: ACCOUNT_ID                            |                                 |
| Old: concat_acct_id                        |                                 |
+--------------------------------------------+---------------------------------+
| RecKey: 100000USD114601001                 | duplicate key in old File/Table |
+--------------------------------------------+---------------------------------+
| RecKey: 100000ATS104501001                 |                                 |
| New: GROUP_ACCOUNT_TYPE_ID                 | G                               |
| Old: Constant_Value                        | L                               |
+--------------------------------------------+---------------------------------+
| RecKey: 100000ATS104501001                 |                                 |
| New: CR_INTEREST_ID                        | XXX00                           |
| Old: crib                                  | 0                               |
+--------------------------------------------+---------------------------------+
| RecKey: 100000ATS104501001                 |                                 |
| New: DR_INTEREST_LIQU_ACCT                 | 0                               |
| Old: dacp                                  | 1                               |
+--------------------------------------------+---------------------------------+
| RecKey: 100000ATS104501001                 |                                 |
| New: OVERDRAFT_START_DATE                  | NULL                            |
| Old: Constant_Value                        | 2005.05.15                      |
+--------------------------------------------+---------------------------------+
| RecKey: 100000ATS104501001                 |                                 |
| New: OVERDRAFT_REVIEW_DATE                 | 9999-08-01_00:00:00             |
| Old: Constant_Value                        | NULL                            |
+--------------------------------------------+---------------------------------+
| RecKey: 100000ATS104501001                 |                                 |
| New: APPLICATION_ID                        | GL                              |
| Old: Constant_Value                        | RE                              |
+--------------------------------------------+---------------------------------+
| RecKey: 100000ATS104601001                 |                                 |
| New: OVERDRAFT_REVIEW_DATE                 | 9999-08-01_00:00:00             |
| Old: Constant_Value                        | NULL                            |
+--------------------------------------------+---------------------------------+
| RecKey: 100000ATS104701001                 |                                 |
| New: GROUP_ACCOUNT_TYPE_ID                 | G                               |
| Old: Constant_Value                        | C                               |
+--------------------------------------------+---------------------------------+
| RecKey: 100000ATS196501100                 | missing from the new File/Table |
+--------------------------------------------+---------------------------------+
| Total Number of old Records checked: 91                                      |
| Number of old Records found in new File: 08                                  |
| Number of old Records with duplicate Keys: 72                                |
| Number of old Records not in new Table: 11                                   |
| Total Number of new Records checked: 59                                      |
| Number of new Records found in old File: 08                                  |
| Number of new Records with alternate Keys: 00                                |
| Number of new Records not in old File: 51                                    |
| Total Number of Fields checked: 93                                           |
| Total Number of non-Matching Fields: 46                                      |
| Percentage of matching Fields: 51 %                                          |
| Percentage of matching Records: 14 %                                         |
+------------------------------------------------------------------------------+

Proceedings of the Testing: Academic & Industrial Conference - Practice And Research Techniques (TAIC PART'06), 0-7695-2672-1/06 $20.00 (c) 2006 IEEE