Testing a Datawarehouse
- an industrial challenge
by Harry M. Sneed
ANECON GmbH, [email protected]
Abstract: This contribution is an experience report on system testing and in particular on the testing of a datawarehouse system. Datawarehouses are large databases used solely for querying and reporting purposes. The datawarehouse in question here was dedicated to fulfilling the reporting requirements of the BASEL-II agreement on the provision of auditing data by the banks, the European equivalent of the Sarbanes-Oxley Act. The purpose of the testing project was to prove that the contents of the datawarehouse are correct in accordance with the rules specified to fill them. In the end, the only way to achieve this was to rewrite the rule specifications in a machine-readable form and to transform them into post assertions, which could be interpreted by a data verification tool for comparison of the actual data contents with the expected data contents. The testing project was never fully completed, since not all of the rules could be properly transformed.

Keywords: System Testing, Datawarehouse Testing, Data Transformation Rules, Post Condition Assertions, Formal Verification.
1. Conventional System Testing

The project described in this paper presented both a theoretical and a practical challenge to the testing team of the ANECON GmbH in Vienna. ANECON has been doing system testing on a contract basis for five years now. In that time, they have tested a savings and loan system in a client/server architecture, tested an ERP system installation in a hospital, tested a web-based dues collection system and tested an e-government internet portal. All of these projects were more or less successful, if there is anything like a success criterion for system testing. This project proved to be different.
The problem with this project, as with most testing projects, stems from the fact that there was no adequate language available to specify the requirements. If requirements are properly specified, they can be properly verified. In conventional testing projects it is the tester who bridges the gap between informal specifications and formal verification. In the case of datawarehouse testing that is not possible due to the nature of the problem.
The conventional way to test a system is to have a battery of testers sitting at the user interfaces, each testing a subset of the system's use cases. The testers submit predetermined inputs and check to see that the resulting output is what they should expect. They might even scan selected database contents to see how they are affected by the test transactions. If any reports are generated, they will trigger the jobs to produce them and check their contents. If there are any doubts about the results, they will consult with the analysts or with the end users. The checking of the output is done on a spot-check basis. Through intuition or by means of some expert domain knowledge the tester is able to interpret what is correct and what is not.
At the center of conventional system testing is the concept of a use case. A system is considered to be functionally tested when all of its use cases with all of their usage variants have been tested. The use case is also the basis for defining test cases [1]. One use case may have several test cases, one for each alternate path through the use case. Seldom are all of the possible results checked; to do so would require too much time and effort. This conventional system testing approach has been well defined in the pertinent testing literature [2]. Sometimes it is recommended to automate the filling and checking of the user interface with some kind of capture/replay tool to expedite the test [3], and sometimes it is recommended not to do that, since automation only clouds the issue and diverts the tester from his responsibility for the accuracy of the test [4]. ANECON had always used the conventional test approach before. It was believed that a similar approach would work for a datawarehouse project; the problem was only one of finding enough testers to run the jobs and check the results. This belief proved to be false.
2. Datawarehouse System Testing
A datawarehouse consists of hundreds of related relational database tables, each with a number of attributes. A single database table corresponds to some logical entity in the real world, such as an account. If the database is normalized, the number of tables will be large but the number of attributes per table will be small, perhaps no more than 40. The goal is to allow the user to join the data together in any thinkable combination of attributes in order to satisfy any possible information request. Thus, the ultimate element of processing is the individual attribute, which can be combined with any other attributes. An attribute corresponds to a column of a table, whereas the lines contain all attributes pertaining to one
instance of the entity being described. Each instance of an entity, i.e. each line of a table, must be uniquely identifiable and distinguishable from all other lines of that table. For that purpose, one or more columns serve as a unique identifier.

Proceedings of the Testing: Academic & Industrial Conference - Practice And Research Techniques (TAIC PART'06), 0-7695-2672-1/06 $20.00 © 2006 IEEE
In the datawarehouse in question there were 266 tables with more than 11,000 attributes, giving an average of some 40 attributes per table. Each attribute was to have a rule describing how that attribute is derived. Attributes can be taken from operational data, they can be computed, or they can be set to a constant value. How this is to be accomplished is described in the attribute rule. For this datawarehouse, the input data was delivered from the various bank subsidiaries scattered throughout Eastern Europe. Since the local bank analysts were the only ones familiar with their data, it was up to them to provide a data model of their operational data together with an English-language description of the individual data elements. These models were merged by the central bank analysts to map the operational data onto the attributes of the datawarehouse. This led to the so-called mapping rules, of which there were, in the end, some 5,317.
The goal of datawarehouse testing is to demonstrate that the contents of the datawarehouse are correct. To achieve this means checking the attributes against their rules, i.e. comparing actual values with expected values based on the mapping rules. This could be done with a random sample of all attributes, with a subset of critical attributes, or with the complete set of attributes [5]. As was discovered, the manual effort of checking even a small subset of attributes is so great that even statistical testing becomes very expensive. On the other hand, if the rule verification is done automatically, then it might as well be done for all the attributes, since there is no additional price to pay.
3. The Challenge of Defining a Rule Specification Language

Due to the volume of data and the complexity of the mapping rules, there is no real alternative to test automation in testing a datawarehouse. The test has to be automated to reach any degree of coverage and reliability. The problem here is how to bridge the gap between informally stated rules and formal verification criteria. This is where academia could help. What is needed is a data rule specification language, simple enough to be used by non-technical domain experts but formal enough to be parsed and converted into stringent verification conditions. Once the language has been defined, it should be usable as a basis for both generating test data and validating test results.
4. Specifying the Data Transformation Rules

Unfortunately the data transformation rules for this project were formulated in English prose by the bank analysts. The medium used to do this was an Excel table, generated from the E/R modelling tool ERwin. In this table there was a line for each attribute and a column for each attribute descriptor. The attribute descriptors were:
the name of the logical entity = table name
the purpose of the table
the name of the attribute
the purpose of the attribute
the type of attribute
the significance of the attribute and
the rule through which the attribute is derived.
The rules were texts of several lines describing how this particular attribute was to be computed. At first glance, it appeared to be a hopeless task to try and interpret the rules. Many were indeed long algorithms expressed in a pidgin English form by non-English speakers. However, a closer analysis revealed that there were really only seven basic templates used for over 90% of the rules. These seven basic rule templates were:
a simple 1:1 assignment of a source value from an input file to this attribute
a simple 1:1 assignment of a given constant value to this attribute
a m:1 assignment of a set of alternate constant values to this attribute
a m:1 assignment of a set of concatenated source and constant values from one input source to this attribute
a m:1 assignment of a set of concatenated source attributes from several different input files to this attribute
a 1:1 arithmetic assignment of a computed value to this attribute
a 1:1 assignment of the result of some standard function to this attribute.
These assignments could be conditional or non-conditional. A conditional assignment was qualified by a boolean expression based on values contained in the input source files and on constant values. A special condition was the successful completion of an SQL join operation. Otherwise the conditions were nested, logical expressions with if, else and otherwise clauses (see Sample 1).
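The seven templates lend themselves to simple pattern matching. As a rough illustration only - this is not the actual GenTest implementation, and the keyword patterns and function name are invented - a rule text can be sorted into a template like this:

```python
import re

# Hypothetical keyword patterns for the seven rule templates; the real
# rules were English prose, so a production classifier would be richer.
TEMPLATES = [
    ("function_result",      re.compile(r"function\s*\(", re.I)),
    ("constant_assignment",  re.compile(r"set to|constant value", re.I)),
    ("multi_source_concat",  re.compile(r"concatenat\w* .*several", re.I)),
    ("single_source_concat", re.compile(r"concatenat", re.I)),
    ("alternate_constants",  re.compile(r"\bone of\b|\balternat", re.I)),
    ("arithmetic",           re.compile(r"\b(sum|plus|minus|times|divided)\b", re.I)),
    ("direct_copy",          re.compile(r"\b(taken|copied|extracted) from\b", re.I)),
]

def classify_rule(rule_text):
    """Return the first template whose pattern matches, else 'unrecognized'."""
    for name, pattern in TEMPLATES:
        if pattern.search(rule_text):
            return name
    return "unrecognized"  # roughly 10% of the real rules fell outside the templates
```

The fallback case mirrors the project's experience: any rule matching none of the templates had to be handled manually.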
Unfortunately, the assignments and their conditions were formulated in slightly different modes throughout the rule table. However, they were similar enough to be parsed and recognized. The great majority - some 3,950 - could be processed as
they were. The remainder had to be reformulated in a semi-formal syntax. The syntax was as follows:
assign <Table>.<Attribute> for a 1:1 assignment of an input value
assign <constant> for a 1:1 assignment of a constant value
assign <value_1> ! <value_2> ! ... for a set of alternate values
assign Table.Attribute_1 | ... | Table.Attribute_n for a concatenation assignment
assign join Table_A.Attribute_A1 | Table_A.Attribute_A2 with Table_B.Attribute_B1 | Table_B.Attribute_B2 for concatenating values from different data sources
assign Table_A.Attribute_A1 + Table_B.Attribute_B1 * 2; for arithmetic expressions. There was no nesting of clauses nor precedence rules here, so the arithmetic expression was resolved in a simple left-to-right sequence
assign Function (Param_1, Param_2, Param_3) whereby the parameters could be attributes in source input files, i.e. Table_C.Attribute_C1, or constant values, i.e. 10.5
With these assignment expressions, enhanced by an if expression in the form

if (Table_A.Attribute_A1 <relop> <operand>)

whereby <relop> := <, >, =, <>, etc. and <operand> = <attribute> or <constant>,
more than 4,700 rules could be resolved automatically and converted into post condition assertions, which could then be tested. Of these rules, some 750 were manually adjusted, requiring more than 150 hours of effort. The fact that not more mapping rules were adjusted was due not only to the limited time available for the test, but also to the informal nature of the rules. Some rules were formulated in such a confusing manner that they defied reformulation; there was simply no way to formalize them. It was a fault of the project that the rules were not properly defined to begin with. Had they been formulated at least in some semi-formal form, it would have been possible to process them automatically without having to spend valuable tester time in rewriting them (see Sample 2).
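To give an idea of how such a semi-formal syntax can be handled mechanically, here is a minimal parsing sketch. It is my own illustration, not the project's tool: only the operator conventions described above (! for alternates, | for concatenation, left-to-right arithmetic, an optional trailing if clause) are assumed, and the function name is invented.

```python
import re

# Matches: assign <right-hand side> [if (<condition>)] [;,.]
ASSIGN_RE = re.compile(
    r"assign\s+(?P<rhs>.+?)\s*(?:if\s*\((?P<cond>.+)\))?\s*[;,.]?\s*$",
    re.I,
)

def parse_assign(stmt):
    """Parse one semi-formal assign statement into a structured form."""
    m = ASSIGN_RE.match(stmt.strip())
    if not m:
        raise ValueError("not an assign statement: %r" % stmt)
    rhs, cond = m.group("rhs"), m.group("cond")
    if "!" in rhs:                      # alternate constant values
        kind, parts = "alternates", [p.strip() for p in rhs.split("!")]
    elif "|" in rhs:                    # concatenation of values
        kind, parts = "concatenation", [p.strip() for p in rhs.split("|")]
    elif re.search(r"[+\-*/]", rhs):    # left-to-right arithmetic
        kind, parts = "arithmetic", [rhs]
    else:                               # simple 1:1 assignment
        kind, parts = "simple", [rhs]
    return {"kind": kind, "operands": parts, "condition": cond}
```

For example, parse_assign("assign ACCNTAB.CCY | ACCNTAB.DRIB if (ACCNTAB.DRIB > '0');") yields a concatenation of the two attributes guarded by the condition.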
5. Post Assertions for Validating Test Results

Assertions have a long tradition in software testing, going back to the mid 1970s [6]. Mostly they have been used for module testing. However, this author has used them to validate the results of whole systems. The tool DataTest uses assertions both to generate new test data and to validate the contents of output data. It has been used in previous projects for selective regression testing [7]. Basically it compares the values of attributes in a new database or new outputs with the values which existed for the same attributes in a previous database or output. For every entity, i.e. database table, file or report, a set of assertions is written, associating the new attributes with the old ones or with manipulations on the old ones. The assertions are of the form:
assert new.Attr_1 = old.Attr_2;
assert new.Attr_2 > old.Attr_3;
assert new.Attr_4 < old.Attr_4;
Instead of comparing a new value with an old value, it was also possible to compare a new value with a constant, a set of constants, or a computed value, as depicted below:

assert new.Attr_2 = 100.50;
assert new.Attr_2 = A ! B ! C ! D;
assert new.Attr_2 = old.Attr_2 + 100 / 2;
For this project, the assert statement was extended to include concatenations:

assert new.Attr_3 = old.Attr_3|-|old.Attr_4;
Any assertion could become conditional by qualifying it with an if clause in the form:

assert new.Attr_4 = old.Attr_5
if (old.Attr_5 > 100 & old.Attr_5 < 200);
There could be any number of and conditions. Logical or conditions were not allowed; they are expressed in another form, namely by assigning different assertions to the same attribute:

assert new.Attr_5 = old.Attr_6;
assert new.Attr_5 < 100;
assert new.Attr_5 > 200;

If the value of Attr_5 fulfills any one of these assertions, it is considered to be correct.
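In other words, the assertions for one attribute are combined disjunctively. The evaluation logic can be sketched in a few lines of illustrative Python - this is not the DataTest implementation, and the sample values are invented:

```python
# OR semantics: a new value is considered correct if it satisfies at least
# one of the assertions registered for its attribute.
def value_passes(new_value, checks):
    return any(check(new_value) for check in checks)

old = {"Attr_6": 250}  # the matching old record (invented sample data)

# The three assertions on Attr_5 from the example above:
checks_attr5 = [
    lambda v: v == old["Attr_6"],  # assert new.Attr_5 = old.Attr_6;
    lambda v: v < 100,             # assert new.Attr_5 < 100;
    lambda v: v > 200,             # assert new.Attr_5 > 200;
]

print(value_passes(250, checks_attr5))  # True: equals old.Attr_6
print(value_passes(150, checks_attr5))  # False: fails all three assertions
```

A value of 150 is rejected because it satisfies none of the three assertions, while 250 is accepted as soon as a single assertion holds.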
The assertions are grouped together into assertion procedures, one per entity, and qualified by a key condition. The key condition matches the keys of the new or output entity with those of the old or input entity:

if ( new.key_1 = old.key_1 &
     new.key_2 = old.key_2 );
assert new.Attr_1 = old.Attr_2;
assert new.Attr_2 = 0;
assert new.Attr_3 = old.Attr_3 + 5;
end;
For XML and WSDL there can be several entity types included in any one output report or response. Therefore, there must be a separate set of assertions for every entity type. These assertions are qualified by the object name:

if (object = This_object &
    new.key_1 = old.key_1 )
assert new.Tab.Attr = old.Tab.Attr;
end_object;
A comparison job is started after every test, which compiles the assertions into internal symbol tables and then processes the database tables or output files one by one, comparing the content of each asserted attribute against the result of the assertion. Attributes which do not comply with their assertions are reported as incorrect. In this way, the results of a test can be automatically validated without having to manually scan through and check them. This method is both faster and more reliable. A typical assertion procedure is depicted among the samples at the end (see Sample 3).
6. The Tools for Datawarehouse Testing

For the datawarehouse test four tools were required:

GenTest for transforming the mapping rules into assertion procedures
GenSQL for generating SQL procedures from the assertion procedures
AsrtComp for checking and compiling the assertion procedures
DataVal for validating the database contents
GenTest had the task of reading the rule table, extracting the rules, parsing them and converting them into assertions which could then be compared. It generated an assertion script for each target database table. As a by-product, GenTest also generated test case specifications, one for each rule, to be stored in the Mercury Testcase Repository. Since this was done fully automatically, there was no extra cost to the project. Finally it produced a statistical report on the number of entities, attributes and rules processed. This was important for measuring the degree of data coverage (see Samples 4 & 5).
GenSQL was a follow-up tool to GenTest. It parsed each generated assertion procedure to determine which attributes were required from what tables. It then generated two sets of SQL query procedures, one for selecting data from the n input tables and joining them together to create a single CSV file for each datawarehouse table, and another for selecting data from the target datawarehouse table and downloading it into a CSV file. These two CSV files were to be the inputs to the data validation (see Sample 6).
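The generation side of this is essentially string assembly. The sketch below is my own simplification, not GenSQL itself: the function names are invented, and the table and column names merely follow the style of Sample 6.

```python
# Build a chained select procedure from a map of source table -> required
# attributes, in the spirit of the generated procedure shown in Sample 6.
def gen_select(table, attributes):
    cols = ",\n  ".join(attributes)
    return "// SQL Select for MiDas File: %s\nSELECT\n  %s,\nFROM %s" % (table, cols, table)

def gen_procedure(needed):
    body = "\nJOIN\n".join(gen_select(t, attrs) for t, attrs in needed.items())
    return body + "\nEND SELECT;"

# Attributes required per source table, as extracted from an assertion procedure:
needed = {"REINTD": ["ACCOUNT_ID", "KEY", "BAL2"], "ACCNTAB": ["LCD"]}
print(gen_procedure(needed))
```

The point is that once the assertion procedures exist in machine-readable form, the selection scripts fall out of them with no additional manual effort.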
AsrtComp is a tool for checking the syntax of the assertion procedures, which could also be manually edited, and compiling them into intermediate symbol tables. There can be five tables for each entity to be validated:

a header table for identifying the objects to be validated and the number of entries in each table
a key condition table with the names and types of keys to be matched
a comparison table assigning which new attributes are to be compared with which old attributes and/or constants
a constant table containing an entry for each constant value used as an operand in the assertions
a condition table containing all of the conditions to be fulfilled for the assertions to be executed. A pointer links the conditions to the assertions to which they apply.
DataVal is the final tool in the set. After reading the symbol tables, it first processes the old CSV file, i.e. the inputs, and stores the values into a temporary database with their keys as an index. It takes the attribute tags from the first line of the CSV file and subsequently counts the columns to associate the values with the tags. It then processes the new CSV file, i.e. the outputs, and matches each new record by key to an existing old record. If a match is found, the contents of the new record are compared with the values of the old record, or with the constants in the symbol table, or with computed values, or with concatenated values, or with a set of alternate constant values, or with the lower and upper bounds of a range, depending on the conditions. Thus, there are many ways to verify the correctness of an output value. If no match is found, the old record is considered to be missing. After processing all new records, a second search is made of all the old records in the temporary database to see whether they were compared or not. If not, they are considered to be missing in the new file. A protocol lists all of the incorrect data values, i.e. those that do not comply with their assertions, as well as all missing records. A set of test statistics summarizes the percentage of missing records and incorrect attributes (see Sample 7).
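The core of this matching logic fits in a few lines. The following condensed sketch is illustrative only: the key name, column names and check functions are invented, and the real tool additionally handles ranges, alternates and concatenations.

```python
import csv
import io

def validate(old_csv, new_csv, key, checks):
    """Index old (input) records by key, then compare each new (output)
    record against its assertions; report mismatches and missing records."""
    old_index = {row[key]: row for row in csv.DictReader(old_csv)}
    errors, seen = [], set()
    for row in csv.DictReader(new_csv):
        old = old_index.get(row[key])
        if old is None:
            errors.append((row[key], "missing in old file"))
            continue
        seen.add(row[key])
        for attr, check in checks.items():
            if not check(row[attr], old):
                errors.append((row[key], attr + " fails its assertion"))
    for k in old_index.keys() - seen:  # old records never matched by a new one
        errors.append((k, "missing in new file"))
    return errors

old_data = "ACCOUNT_ID,atyp\nA1,G\nA2,R\n"
new_data = "ACCOUNT_ID,GROUP_ACCOUNT_TYPE_ID\nA1,G\n"
checks = {"GROUP_ACCOUNT_TYPE_ID": lambda new, old: new == old["atyp"]}
errors = validate(io.StringIO(old_data), io.StringIO(new_data), "ACCOUNT_ID", checks)
print(errors)  # [('A2', 'missing in new file')]
```

Here A1 is matched and passes its assertion, while A2 exists only in the input file and is therefore reported as missing, just as in the comparison report of Sample 7.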
7. The Datawarehouse Test Process

Tim Koomen and Martin Pol have emphasized the necessity of a well-defined test process [8]. An efficient test combines automated test tools with a proven process. Together this adds up to high test productivity. The process for testing this datawarehouse system was, if anything, well defined and automated. It consisted of 8 steps, of which 4 were fully automated:

Step 1: The mapping rules which did not comply with the language standard had to be reformulated. This was the most time-consuming task of the whole process and could have been avoided if the rules had been properly specified in the first place.
Step 2: The mapping rules were automatically converted into assertions by the GenTest tool
Step 3: The SQL procedures were automatically generated from the assertions by the GenSQL tool
Step 4: The input attributes for each target datawarehouse table were selected from the input tables, joined and downloaded into a CSV file
Step 5: The output attributes for each target datawarehouse table were downloaded into a CSV file
Step 6: The assertion procedures were compiled by the tool AsrtComp
Step 7: The input and output CSV files were matched and the contents of the output file verified against the post assertions by the tool DataVal
Step 8: The testers examined the data validation results and reported any errors

In the end a report came out with the data errors, which were then fed into the error tracking system by the testers. Once the rules had been reformulated, the whole testing process could be repeated within a day. Normally such a test cycle would require at least 10 days, so the automation led in this case to a significant improvement in test productivity.
8. Conclusions from the Test Project
This project demonstrated how important it is to have a proper specification language for testing. In testing an online system or a web-based system, testers must simulate the end users. There are limits to what you can automate. Creative testing plays an important role, since it is impossible to foresee all modes in which the system could be used. Often the tester has to improvise. Therefore, the human tester has a significant role to play.
That is not true of a datawarehouse test. Here the problem is to prove that the contents of the datawarehouse are 100% correct. For that, the contents have to be 100% specified and the actual contents have to be verified against the specifications. The challenge here is to provide a specification language which will accommodate both goals. Once the mapping rules have been specified, it is the job of the tools to run the tests and verify the database contents. The role of the tester can be compared to that of an engineer on a robotic assembly line, monitoring the work of the robots and only intervening when something goes wrong. For this, he should understand the function of the robots, without having to do the work himself. Such is the case in datawarehouse testing.
In the datawarehouse described here only 88% of the attributes were actually tested. This resulted from the fact that 12% of the rules were not verifiable. Nevertheless, those that could be verified were verified, and more than 200 incorrect values could be identified. This project is a good example of improvising to make the best of a bad situation. It is always difficult to assess the success of a test project. The only objective way of doing it is to compare the errors found in testing with the errors which come up later in production. Since this datawarehouse system has yet to go into production, it is impossible to know how many errors might come up there. If they do, it will be because of incomplete and inconsistent rules. The specification language problem remains the source of most software system errors, and in the case of datawarehouse systems, particularly so. It is here where academia could make a significant contribution.
References:
[01] Bach, James: Reframing Requirements Analysis, IEEE Computer, Vol. 32, No. 6, 2000, p. 113
[02] Hutcheson, Maggie: Software Testing Fundamentals, John Wiley & Sons, Indianapolis, 2003, p. 12
[03] Fewster, M. / Graham, D.: Software Test Automation, Addison-Wesley, New York, 1999, p. 248
[04] Kaner, C. / Bach, J. / Pettichord, B.: Lessons Learned in Software Testing, John Wiley & Sons, New York, 2002, p. 111
[05] Dyer, Michael: Statistical Testing, in The Cleanroom Approach to Quality Software Development, John Wiley & Sons, New York, 1992, p. 123
[06] Taylor, R.: Assertions in Programming Languages, in Proc. of NCC, Chicago, 1978, p. 105
[07] Sneed, H.: Selective Regression Testing of a Host to DotNet Migration, submitted to ICSM-2006, IEEE Computer Society, Philadelphia, Sept. 2006
[08] Koomen, T. / Pol, M.: Improving the Test Process, John Wiley & Sons, London, 1999, p. 7
[Figure: Datawarehouse Test Process. The original rule specs, written in semi-formal prose by the bank analysts, are formalized by the tester into formal rule specs. The assertion generator converts these into assertion procedures (one per DB entity) and test cases. The tester sets up the source test data and executes the test against the system under test; the DataTest result validator then checks the target data against the assertion procedures, producing an exception report from which the tester reports errors into the error reports.]
Samples:
Sample 1: An extract from the rule table before the rule has been converted
DR_INTEREST_ID;"Link to TB0_ACCOUNT_INTEREST. Debit interest conditions applicable to the account.";
If REIACD in field DICT (debit) has a value other than 0, account is linked to interest group. The following then applies:
REINTD / KEY (Position 3-4) (Interest Type) 2 Alpha
REINTD / KEY (Position 5-9) (Interest Subtype) 5 Alpha
REINTD / KEY (Position 10-12) (Currency) 3 Alphas
The above Key fields are concatenated in ID.
If in REIACD the DICT values are zeroes, the account interest condition has to be extracted from the ACCNTAB:
ACCNTAB / DRIB (debit Base Rate Code)
ACCNTAB / DRIS (debit Spread Rate)
If both of 0 value, extract ACCOUNT_ID
If ACCNTAB / DRIS is available ( 0), extract the same as for ACCOUNT_ID
If only DRIB of value, extract DRIB
Sample 2: An extract from the rule table after the rule has been converted
DR_INTEREST_ID;"Link to TB0_ACCOUNT_INTEREST. Debit interest conditions applicable to the account.";
"? assign REIACD/DICT | REIACD/DCST | ACCNTAB/CCY | 'D' if REIACD/DICT (debit) '0',
assign ACCNTAB/CONCAT if ACCNTAB/DRIS '0',
assign ACCNTAB/CCY | ACCNTAB/DRIB if ACCNTAB/DRIB '0',
assign '*nomap*' if REIACD/DICT = '00' and ACCNTAB/DRIS = '0' and ACCNTAB/DRIB = '00',
assign ACCNTAB/CNUM|CCY|ACNO|BRCA if other.
(18-digit account Id made up of CNUM length 6, leading zeros length 4, leading zeros + ACSQ length 2, leading zeros + BRCA concatenated).";
Sample 3: A generated assertion procedure
file: ACCOUNT;
// This comparison procedure assumes that the old ACCOUNT file contains the following attributes:
// from ST_ACCOUNT Table
// ACCOUNT_ID;ATYP;STYP;RETB;CRIB;CRIS;DRIB;DRIS;PRFC;OD;FACT;FCNO
// from REIACD Table
// REIACD.CICT;REIACD.DICT;REIACD.DACP;REIACD.CACP
// from FCLTYFM Table
// FCLTYFM.RVDT
if ( new.ACCOUNT_ID = old.concat_acct_id );
assert new.GROUP_ACCOUNT_TYPE_ID = old.atyp if (old.atyp = "G");
assert new.GROUP_ACCOUNT_TYPE_ID = "S" if (old.atyp = "R" & old.styp = "S");
assert new.GROUP_ACCOUNT_TYPE_ID = "C" if (old.atyp = "R" & old.styp = "C");
assert new.GROUP_ACCOUNT_TYPE_ID = "L" if (old.atyp = "R" & old.styp = "L");
assert new.STATUS_BLOCKED = "1" if (old.retb = X"04");
assert new.STATUS_BLOCKED = "2" if (old.retb = X"02");
assert new.STATUS_BLOCKED = "3" if (old.retb = X"06");
assert new.STATUS_BLOCKED = "*n.a.*" if (other);
assert new.CR_INTEREST_ID = old.crib if (old.cict = "0");
assert new.DR_INTEREST_ID = old.drib;
assert new.DR_INTEREST_LIQU_ACCT = old.dacp;
assert new.CR_INTEREST_LIQU_ACCT = old.cacp;
assert new.PROFIT_CENTRE_DESCRIPTION = old.prfc;
assert new.OVERDRAFT_START_DATE = "2005.05.15" if (old.od "0");
assert new.OVERDRAFT_REVIEW_DATE = old.rvdt if (old.fact "0" & old.fcno "0");
assert new.OVERDRAFT_REVIEW_DATE = "NULL" if (old.fact = "0");
assert new.PRIMARY_ACCOUNT_INDICATOR = "P" ! "S" ! "*n.a.*";
assert new.STATUS_INDICATOR = "inactiv" if (old.retb = X"3");
assert new.STATUS_INDICATOR = "inactiv" if (old.retb = X"1");
assert new.STATUS_INDICATOR = "closed" if (old.reci = "C");
assert new.STATUS_INDICATOR = "active" if (other);
end;
Sample 4: A section from the assertion generation log
+-------------------------------------------------------------------+
|&New Assert Script: Asrt-011 = ST_ACCOUNTING_STANDARD
+-------------------------------------------------------------------+
| New Attribute ACCOUNTING_STANDARD_CODE has no rule!
| New Attribute APPLICATION_ID has a rule that was not processable
| Rule = text file
| New Attribute QUALITY_INDICATOR has no rule!
| New Attribute SOURCE_SYSTEM_ID has a rule that was not processable
| Rule = text file
+-------------------------------------------------------------------+
| Number of new Attributes = 013
| Number of Rules = 011
| Number of Basel Rules = 003
| Number of complex Rules = 005
| Number of complex Basel Rules = 002
| Number of generated Keys = 002
| Number of generated Asserts = 009
| Number of unrecognized Rules = 002
+-------------------------------------------------------------------+
Sample 5: The generation statistics
+-------------------------------------------------------------------+
| Assertion Generation completed !
| Number of E/R Rule Lines processed = 11384
| Number of new Tables processed = 266
| Number of new Attributes processed = 10910
| Number of old Tables processed = 132
| Number of old Attributes processed = 00891
| Number of new Attributes without a rule = 05573
| Number of Transformation Rules specified = 05337
| Number of Transformation Rules processed = 04742
| Number of Basel-II Rules recognized = 00890
| Number of Basel-II Rules processed = 00852
| Number of Complex Rules recognized = 00853
| Number of Complex Rules processed = 00562
| Number of Complex Basel Rules recognized = 00157
+-------------------------------------------------------------------+
| Number of Assert Scripts generated = 00266
| Number of Assertions generated = 04897
| Number of Table Selections generated = 00181
| Number of Assert Keys generated = 00023
| Number of Assert Conditions generated = 00308
| Number of Assert Concatenates generated = 00103
| Number of Assert Alternates generated = 00365
| Number of Assert Arithmetic generated = 00009
| Number of Test Cases generated = 05337
+-------------------------------------------------------------------+
Sample 6: A generated select procedure for the incoming data

// SQL Select for Stage File: ST_ACCOUNT_INTEREST_RATE
SELECT
// SQL Select for MiDas File: REINTD
SELECT
  ACCOUNT_ID,
  KEY,
  BAL2,
FROM REINTD JOIN
// SQL Select for MiDas File: SDBANKPD
SELECT
oceedings of the Testing: Academic & Industrial Conference Practice And Research Techniques (TAIC PART'06)
7695-2672-1/06 $20.00 2006IEEEAuthorized licensed use limited to: JSS ACADEMY OF TECHNICAL EDUCATION. Downloaded on May 10,2010 at 04:00:02 UTC from IEEE Xplore. Restrictions apply.
-
8/13/2019 Testing a Datawarehouse
8/8
  BJRDNB,
FROM SDBANKPD JOIN
// SQL Select for MiDas File: ACCNTAB
SELECT
  LCD,
FROM ACCNTAB JOIN
END SELECT;
// End of Select for: ST_ACCOUNT_INTEREST_RATE
Sample 7: A section of the data validation report
+----------------------------------------------------------------------------------------+
| File/Table Comparison Report                                                            |
| Key Fields of Record (new,old)                                                          |
+----------------------------------------------------------------------------------------+
| New: ACCOUNT_ID                                                                         |
| Old: concat_acct_id                                                                     |
+----------------------------------------------+-----------------------------------------+
| RecKey:100000USD114601001                    | duplicate key in old File/Table         |
+----------------------------------------------+-----------------------------------------+
| RecKey:100000ATS104501001                    |                                         |
| New: GROUP_ACCOUNT_TYPE_ID                   | G                                       |
| Old: Constant_Value                          | L                                       |
+----------------------------------------------+-----------------------------------------+
| RecKey:100000ATS104501001                    |                                         |
| New: CR_INTEREST_ID                          | XXX00                                   |
| Old: crib                                    | 0                                       |
+----------------------------------------------+-----------------------------------------+
| RecKey:100000ATS104501001                    |                                         |
| New: DR_INTEREST_LIQU_ACCT                   | 0                                       |
| Old: dacp                                    | 1                                       |
+----------------------------------------------+-----------------------------------------+
| RecKey:100000ATS104501001                    |                                         |
| New: OVERDRAFT_START_DATE                    | NULL                                    |
| Old: Constant_Value                          | 2005.05.15                              |
+----------------------------------------------+-----------------------------------------+
| RecKey:100000ATS104501001                    |                                         |
| New: OVERDRAFT_REVIEW_DATE                   | 9999-08-01_00:00:00                     |
| Old: Constant_Value                          | NULL                                    |
+----------------------------------------------+-----------------------------------------+
| RecKey:100000ATS104501001                    |                                         |
| New: APPLICATION_ID                          | GL                                      |
| Old: Constant_Value                          | RE                                      |
+----------------------------------------------+-----------------------------------------+
| RecKey:100000ATS104601001                    |                                         |
| New: OVERDRAFT_REVIEW_DATE                   | 9999-08-01_00:00:00                     |
| Old: Constant_Value                          | NULL                                    |
+----------------------------------------------+-----------------------------------------+
| RecKey:100000ATS104701001                    |                                         |
| New: GROUP_ACCOUNT_TYPE_ID                   | G                                       |
| Old: Constant_Value                          | C                                       |
+----------------------------------------------+-----------------------------------------+
| RecKey:100000ATS196501100                    | missing from the new File/Table         |
+----------------------------------------------+-----------------------------------------+
| Total Number of old Records checked: 91                                                 |
| Number of old Records found in new File: 08                                             |
| Number of old Records with duplicate Keys: 72                                           |
| Number of old Records not in new Table: 11                                              |
| Total Number of new Records checked: 59                                                 |
| Number of new Records found in old File: 08                                             |
| Number of new Records with alternate Keys: 00                                           |
| Number of new Records not in old File: 51                                               |
| Total Number of Fields checked: 93                                                      |
| Total Number of non-Matching Fields: 46                                                 |
| Percentage of matching Fields: 51 %                                                     |
| Percentage of matching Records: 14 %                                                    |
+----------------------------------------------------------------------------------------+