A Simple Approach To Multi-Tenant Data Testing

By Melvin Laguren

With all the different types of testing methods performed on a product, the tester always has the following question in the back of his or her mind:

Have I done enough testing?

This question moves to the front of the tester’s mind when the application being tested is multi-tenant in nature, and new questions begin to form in the back of the mind:

How can I guarantee that one customer does not see another customer’s data?

What testing method can I use to guarantee it?

Testing this by hand and taking screenshots of the results is a long, tedious process that is prone to user error. As the number of customers and their data increases, so do the time it takes to perform this testing and the chance of error. Another issue with this approach is that the process has to be repeated with each new software release. In the end, it can potentially become a full-time job for one person.

Automating the manual process is a step in the right direction. The only drawback is that the automated code has to be maintained and updated with each software upgrade.

Both methods require accountability from the tester. Whereas the manual process requires the tester to take a snapshot and store it somewhere, the automated process is designed to record the testing activity. Both methods require a lot of planning to ensure that the data is captured correctly.

Where to begin?

When creating your multi-tenant data test solution, you must identify not only the problem but also all the components that lead to a solution.

Let’s start by formally identifying the problem. Borrowing from the agile development method, a good way to write the problem is in the form of a user story:

AS A PROFESSIONAL TESTER, I WOULD LIKE TO BE ABLE TO TEST THAT THE DATA CREATED IN THE APPLICATION CAN ONLY BE VIEWED BY THE CUSTOMER WHO CREATED IT, SO THAT THE CUSTOMER IS CONFIDENT THAT THEIR INFORMATION CANNOT BE VIEWED BY ANOTHER.

The “reason” in the user story (its “so that” clause) explains it all: the test solution being developed must make the customer confident that their information cannot be viewed by another customer.

Background Information

To begin identifying the solution, the following parameters will be added to the problem:

• The multi-tenant application is an Ajax-based web application.

• There is no budget to purchase tools, and there are currently none at the tester’s disposal.

• The customer works in a regulated industry; therefore, the customer will perform an external audit before accepting the application.

In reviewing the identified parameters, it is easy to see that the third bullet will be important in developing the solution: the results of the test must be clearly documented to convince an auditor that thorough testing was performed on the application to ensure data security.

Identify Possible Solutions

Now examine the first two parameters. The second parameter implies that the tester must use a manual solution.

Manual Solution 1

1. Log into the application as a customer

2. Take screenshots of all web pages that contain customer data

3. Save the screenshots in a folder

4. Log out of the application

5. Log into the application as a different customer and repeat steps 2 through 4

6. Compare the data saved for the first customer with the data saved for the second customer

By testing the application manually (saving screenshots of the pages that display each customer’s unique data, then visually comparing the corresponding pages across customers), this solution addresses the three parameters given for the problem.

Drawbacks to Manual Solution 1

The first and foremost drawback of this solution is human error. A thorough tester would test more than just two customers, and as the number of “customers” used for testing increases, so does the chance that the tester will inaccurately record the data in steps 2, 3 and 4.

Another drawback is that the data used for testing may fail to uncover real-world problems. This is especially true for the first release of an application. Setting up test data can be time-consuming, and a tester may miss something, especially if data is being entered for more than two customers.

So, what now? Analyzing the solution has introduced two new problems that need to be accounted for:

• As the number of customers used for testing increases, so does the chance of incorrectly recording data.

• As the number of customers increases, so does the chance that the test data will fail to expose a real problem.

Manual Solution 2

Looking at the parameters again, the third parameter (the external audit) could help solve the newly discovered problems. With the audit in mind, the solution now includes a second set of eyes.

Manual Solution 2A

1. Log into the application as a customer

2. Take screenshots of all web pages that contain customer data

3. Save the screenshots in a folder

4. Log out of the application

5. Log into the application as a different customer and repeat steps 2 through 4

6. Compare the data saved for the first customer with the data saved for the second customer

7. Repeat steps 1-6 with another tester

Manual Solution 2B

1. Tester A creates data in Test Environment A

2. Tester B creates data in Test Environment B

3. Tester A performs Manual Solution 2A (steps 1-6) on environment B

4. Tester B performs Manual Solution 2A (steps 1-6) on environment A

5. Testers A & B switch environments and repeat Manual Solution 2A (steps 1-6)

Drawbacks to Manual Solution 2

The solution reduces the potential for error because a second set of eyes is involved. What drawbacks exist with this solution?

The difference between the first and second solutions is that a second tester is involved in the process. What if this resource is not available? Part of the second parameter states that “there is no budget to purchase tools.” If there is no budget even for tools, hiring another tester is unlikely.

The second solution is definitely more reassuring to an auditor than the first. Then again, even if another tester is available to assist, there is no guarantee that the second solution will ensure information security, since it does not truly solve the human-error problem discussed under the first solution.

How about automation?

Automation might seem to violate the second parameter. However, since the first parameter says that the application is web-based, a multitude of open-source tools is available to automate either of the two processes mentioned above without purchasing anything. Automating the process requires a considerable initial investment of effort, and that investment should be focused on automating the second solution. The reason is that once the solution has been automated, the tester will have more time to create application data to be used in the test.

Automating the second solution is a very good beginning. The advantage for the tester is that the investment made now pays off down the road: more time can be spent creating additional test data, and the automated scripts need only minor updates for future versions of the application.

Apart from doing nothing to reduce the chance that the test data misses a problem, the biggest drawback of an automation tool is that it cannot perform the data comparison. That responsibility still belongs to the tester(s), who must verify that the data is unique between customers, and this becomes troublesome as the number of comparisons increases.

Problem Solved?

So far, three possible solutions have been discussed. All three follow essentially the same pattern:

1. Test the application as a customer

2. Record the data being shown on each page

3. Repeat steps 1 and 2 with a different customer

4. Compare the results to make sure that they are unique

Under a tight deadline and with limited resources, the tester’s focus will be on the functionality and performance of the application, which may lead him or her to think that as long as any of the three methods is used, the odds of a data “bleed” are small. Others involved in the software development process may feel that the architecture of the software will ensure that this “bleed” does not occur, especially when combined with one of these three testing methods.

This should not put the tester’s mind at ease, and it definitely will not put the auditor’s at ease. So how does the tester put this issue to rest and focus on everything else that can possibly go wrong? Getting involved early in the development process helps a great deal, especially if the tester has seen the common mistakes that are made when designing such an application.

Increase Reliability

To increase the reliability of the testing, there are two items the tester needs to get involved with long before implementing a repeatable test process. Covering these two items leads to a much more solid solution, one that will not only convince you and your colleagues that the odds of data bleeding elsewhere are very slim, but will also convince outside observers that the testing is more than adequate.

Common Errors

The first item is to understand the two most common errors that lead to the problem, and how to spot them.

Missing Foreign Keys

In the design of the database that will store the information, it is very rare for a direct relationship between two tables to be created without a foreign key between the parent table and the child table. The more common mistake occurs as more relationships are added to a table.

To illustrate the problem, suppose one of the requirements for an application is that a contractor can return to a bid and make changes before finalizing it.

Figure 1

Being involved at the design phase, the tester would be able to see that the tables designed in Figure 1 do not satisfy this requirement. Because the BID table has no column linking a bid to the contractor who made it, the application cannot tell one contractor’s bids from another’s: when it displays the bids for a project, the contractor will see every bid made on the project they are interested in.

Catching this early in the design ensures that the data stored in the BID table is associated with the CONTRACTOR table as well as the PROJECT table (Figure 2).

Figure 2

The Where Clause

Forgetting the WHERE clause, or leaving a condition out of it, is another common error that can result in data “bleeding”. Even with the database improvements made in Figure 2, consider the following query:

SELECT PROJECT.NAME, BID.AMOUNT, BID.FINAL
FROM PROJECT
INNER JOIN BID ON PROJECT.ID = BID.PROJECT_ID
WHERE PROJECT.ID = 1;

When this query is executed by the web application, the contractor sees all bids made on the project. Depending on further requirements, the contractor could potentially edit or delete other contractors’ bids on the project. A tester reviewing the query would catch this error, and the following correction would be made:

SELECT PROJECT.NAME, BID.AMOUNT, BID.FINAL
FROM PROJECT
INNER JOIN BID ON PROJECT.ID = BID.PROJECT_ID
WHERE PROJECT.ID = 1
AND BID.CONTRACTOR_ID = ?;  -- ? is bound to the logged-in contractor's id

Being involved in the design process and catching these common mistakes will decrease the odds of customers accessing data that they should not have access to.

Tables in Figure 1 (original design):

PROJECT: id, name, description, closing_date

BID: id, project_id, amount, final

CONTRACTOR: id, contractor_name

Tables in Figure 2 (revised design):

PROJECT: id, name, description, closing_date

BID: id, project_id, contractor_id, amount, final

CONTRACTOR: id, contractor_name

Mind Mapping

The second important item is to know the data model. Understanding the database tables, and what is accessible from the user interface, will help you identify where to focus the testing. For example, if every customer has access to the same contractor, the test does not need to check whether the contractor is unique per customer. If the contractor works for different customers, however, it is very important to check whether the application lets a customer see other customers that use the same contractor.

Mind mapping is an ideal technique for drawing out which important information needs to be isolated from other users of the application, and many free tools can help create mind maps. Figure 3 was created using a free tool called FreeMind.

Figure 3

With the mind map created, it is easy to see the key data that should be isolated. Now the automation tool can focus on accessing the web pages that display this information.

Developing The Automated Solution

Since the application is web-based, there is an abundance of open-source tools that can be used. The tool of choice depends on the overall solution being developed. The ideal approach would be a simple script that executes the test and returns a report.

Gather Data → Compare Data → Generate Report

Gathering Data Using JMeter

JMeter, a functional load testing tool, is ideal for accomplishing the first part of the test. As a load testing tool, it can automatically log into the application and navigate the various pages as many users at once. Unlike the manual method, which logs in one customer at a time, JMeter logs in with all N users at once.

Figure 4

JMeter also provides a listener component, “Save Responses to a file”. This component reads the server’s response to each request made by JMeter and writes it to a file. Place this component after each HTTP request that calls a web page displaying customer-only data. As Figure 4 shows, JMeter can add a prefix to the name of each file it writes. It is very important to give each request its own unique prefix; JMeter then appends a number to that prefix, producing files such as User1.xml, User2.xml, etc. In Figure 5, each request’s file prefix is unique, so that later on the comparisons can be made between similar files.

Figure 5

Finally, JMeter can be executed from a script in non-GUI mode. This makes it easier to integrate into a multi-tenant testing application, especially since getting the files is only the beginning of the problem.

Shell Scripting with Cygwin

Early on, it was noted that one of the drawbacks of the automated solution is that the automation tool cannot perform the comparison, which means the tester would have to perform that task. A scripting language can do the same job, and one option is shell scripting. Since JMeter runs on either a Windows or a *nix computer, shell scripting is a good choice: on Windows, the scripts can run under Cygwin, a Linux-like environment for Windows.

What should the shell script accomplish? Since the script must perform several different tasks, it is best to create several scripts to accomplish the following:

1. Execute JMeter

2. Gather the files created by JMeter and group them so that it will be easy to compare

3. Remove excess information from the files¹ for easier comparison

4. Perform the comparison

5. Create a report

Finally, one script can be created to execute each of the scripts above in order.
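The single top-level script could look something like the minimal sketch below. It assumes each step has been saved as its own executable script; run_pipeline is an illustrative helper, and the script names in the usage line are hypothetical placeholders rather than names used by the article.

```shell
#!/bin/sh
# run_all.sh (hypothetical name): run each step script in order.
# run_pipeline takes the step scripts as arguments and stops at the first
# one that exits with a non-zero status, so later steps never work on
# partial or missing data.
run_pipeline() {
  for step in "$@"; do
    echo "Running $step"
    if ! "$step"; then
      echo "Step failed: $step" >&2
      return 1
    fi
  done
  echo "All steps completed"
}
```

Saved as run_all.sh, the script would end with a call such as `run_pipeline ./run_jmeter.sh ./gather.sh ./cleanup.sh ./compare.sh ./report.sh`. Stopping at the first failure matters here because each step consumes the files written by the step before it.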

Run JMeter

The first script is straightforward. It navigates to JMeter’s bin directory and executes the command ‘./jmeter -n -t multi-tenant.jmx’, where -n runs JMeter in non-GUI mode and -t names the test plan, multi-tenant.jmx, created in JMeter. To make the script more robust, a simple series of checks can be performed first to make sure that the conditions for running JMeter are valid.

The first check makes sure that the number of threads defined in multi-tenant.jmx is less than or equal to the number of lines in the CSV configuration file used for logging in. If more threads are defined, JMeter starts again at the beginning of the CSV file to supply the login parameters for the extra threads (the CSV Data Set Config’s default “Recycle on EOF” behavior), which will result in duplicate data down the road.

The second check ensures that the test is set up to run correctly. If the number of threads defined in multi-tenant.jmx is 1, the test is invalid, since a single customer leaves nothing to compare. If the CSV configuration file named in multi-tenant.jmx does not exist, there is also a problem.

Adding these two checks will make for a much stronger testing tool.
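The checks and the JMeter call can be combined in one script, sketched below under several assumptions. Rather than changing into JMeter’s bin directory, it calls the jmeter launcher by its full path so that relative paths to the test plan keep working. JMETER_HOME, the function names, the results file name multi-tenant.jtl, and the way the thread count and CSV path are read out of the .jmx file are all assumptions for illustration; -n, -t and -l are standard JMeter command-line options.

```shell
#!/bin/sh
# run_jmeter.sh (sketch): check the test plan, then run it without the GUI.
# Assumptions: JMETER_HOME points at the JMeter installation; the plan was
# saved by the JMeter GUI, so the thread count is the text of
# <stringProp name="ThreadGroup.num_threads"> and the login CSV path is the
# first non-empty <stringProp name="filename">; the CSV has no header line;
# a relative CSV path is resolved against the test plan's directory.

jmx_prop() {  # print the first non-empty value of property $2 in file $1
  sed -n "s|.*<stringProp name=\"$2\">\([^<][^<]*\)</stringProp>.*|\1|p" "$1" | head -n 1
}

preflight() {
  plan=$1
  threads=$(jmx_prop "$plan" ThreadGroup.num_threads)
  csv=$(jmx_prop "$plan" filename)
  case $csv in
    /*) ;;                                # absolute path: use as is
    *) csv="$(dirname "$plan")/$csv" ;;   # relative: next to the test plan
  esac
  case $threads in
    ''|*[!0-9]*) echo "No numeric thread count in $plan" >&2; return 1 ;;
  esac
  if [ "$threads" -le 1 ]; then
    echo "Only $threads thread: nothing to compare" >&2; return 1
  fi
  if [ ! -f "$csv" ]; then
    echo "Login CSV not found: $csv" >&2; return 1
  fi
  logins=$(( $(wc -l < "$csv") ))
  if [ "$threads" -gt "$logins" ]; then
    # JMeter would wrap around to the top of the CSV and reuse logins
    echo "$threads threads but only $logins logins in $csv" >&2; return 1
  fi
  return 0
}

run_jmeter() {
  plan=${1:-multi-tenant.jmx}
  log=${2:-multi-tenant.jtl}     # sample results file (assumed name)
  if [ -z "$JMETER_HOME" ]; then
    echo "JMETER_HOME is not set" >&2; return 1
  fi
  preflight "$plan" || return 1
  # -n: non-GUI mode, -t: test plan to run, -l: file to log samples to
  "$JMETER_HOME/bin/jmeter" -n -t "$plan" -l "$log"
}
```

Reading the .jmx with sed is fragile (for example, a thread count supplied through a JMeter property instead of a literal number is rejected), which is why the sketch refuses to run rather than guess.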

Gather the Results

Once JMeter has completed its run, the next task is to gather the saved files into a single folder that identifies the test run. Within that folder, there should be a subfolder for each of the different data sets used for comparison.
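One way to do the grouping is sketched below. It assumes a hypothetical naming convention in which each request’s prefix is the data-set name followed by _User, so JMeter’s files are named like Bids_User7.xml; each file is moved to <run folder>/Bids/User7.xml, which matches the User<number>.xml names that comparison_manager.sh expects later. The gather function and the timestamped run-folder name in the usage note are illustrative.

```shell
#!/bin/sh
# gather.sh (sketch): group JMeter's saved responses for one test run.
# Assumed file naming: <DataSet>_User<number>.<ext>, e.g. Bids_User7.xml.
# Result: <run_dir>/<DataSet>/User<number>.<ext>, one subfolder per data set.
gather() {
  src=$1       # folder JMeter wrote the responses to
  run_dir=$2   # new folder that identifies this test run
  mkdir -p "$run_dir" || return 1
  for f in "$src"/*_User*; do
    [ -f "$f" ] || continue        # the glob matched nothing
    name=${f##*/}                  # Bids_User7.xml
    set_name=${name%_User*}        # Bids
    mkdir -p "$run_dir/$set_name" &&
      mv "$f" "$run_dir/$set_name/${name##*_}" || return 1   # .../User7.xml
  done
}
```

For example, `gather ./responses "run_$(date +%Y%m%d_%H%M%S)"` puts each run in its own dated folder, which also keeps earlier runs available for the auditor.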

¹ Since JMeter writes the server’s response to a file, additional information (headers, HTML code, etc.) will also be present.

Again, to make the script more robust, a simple file count in each subfolder will determine whether the correct number of files exists. Even if a particular request returns no data, a file is still created, so the count can confirm that the number of threads defined in multi-tenant.jmx produced the correct number of responses.
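The file-count check can be a small function like the one sketched below. The check_counts name is hypothetical, and the expected count is passed in by the caller (for example, the thread count defined in multi-tenant.jmx); the sketch assumes the run folder contains only the data-set subfolders.

```shell
#!/bin/sh
# check_counts (sketch): each data-set subfolder of a run should contain one
# file per JMeter thread (user), even for requests that returned no data.
check_counts() {
  run_dir=$1    # folder created for this test run
  expected=$2   # number of threads defined in multi-tenant.jmx
  rc=0
  for dir in "$run_dir"/*/; do
    [ -d "$dir" ] || continue
    count=$(( $(find "$dir" -type f | wc -l) ))
    if [ "$count" -ne "$expected" ]; then
      echo "${dir%/}: $count files, expected $expected" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

The function reports every mismatched folder before failing, rather than stopping at the first one, so a single run shows the whole picture.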

File Clean Up

Since the parameters state that the multi-tenant application is an Ajax-based web application, it is assumed here that the data transported from the server to the client browser is in some form of XML. Each file can now be cleaned up so that only the XML data in question remains. The other important task is to put each individual element and its attributes on their own lines, which makes the file comparison easier.
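A sketch of the cleanup for a single file follows, using awk. It assumes, as the text does, that the payload is XML, and additionally that everything before the first < (for example, response headers) can be discarded. It puts each element, and each attribute after the first, on its own line. The clean_response name is hypothetical, and this is plain text processing rather than XML parsing, so text content containing a quote followed by a space could be split imperfectly.

```shell
#!/bin/sh
# clean_response (sketch): reduce one saved response to its XML payload with
# one element, or one attribute, per line. Text processing, not an XML parser.
clean_response() {
  awk '
    { buf = buf $0 " " }                      # join the whole file into one line
    END {
      sub(/^[^<]*/, "", buf)                  # drop everything before the first tag
      gsub(/>[ \t\r]*</, ">\n<", buf)         # break between adjacent tags
      gsub(/" +/, "\"\n", buf)                # break between attributes
      n = split(buf, parts, "\n")
      for (i = 1; i <= n; i++) {
        line = parts[i]
        gsub(/^[ \t\r]+|[ \t\r]+$/, "", line) # trim surrounding whitespace
        if (line != "") print line
      }
    }' "$1"
}
```

Used as `clean_response Bids/User1.xml > Bids/User1.clean`, a response such as `<bids><bid id="7" amount="500"/></bids>` becomes four lines: `<bids>`, `<bid id="7"`, `amount="500"/>` and `</bids>`.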

By this point, there is probably no need for additional error checking before running the cleanup. If anything, a post-check could confirm that none of the files to be used by the next script are blank.
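The optional post-check could be as small as the sketch below. The find_blank_files name is hypothetical, and the sketch assumes the <run folder>/<data set>/<file> layout described earlier.

```shell
#!/bin/sh
# find_blank_files (sketch): optional post-check after the cleanup. Prints
# each file under the run folder that is empty or whitespace-only, and
# returns 1 if any were found, 0 otherwise.
find_blank_files() {
  blank=0
  for f in "$1"/*/*; do
    [ -f "$f" ] || continue
    if ! grep -q '[^[:space:]]' "$f"; then   # no non-space character at all
      echo "Blank file: $f"
      blank=1
    fi
  done
  [ "$blank" -eq 0 ]
}
```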

Comparison

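Now comes the time-consuming and confusing part of the test: comparing the files within each group. The more files generated for each data set, the longer it takes. Each of the n files in a group must be compared once with every other file, so, to be exact, the number of comparisons per group is:

$$\frac{n(n+1)}{2} - n = \frac{n(n-1)}{2}$$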
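As a quick worked example of the formula, the hypothetical helper below evaluates n(n-1)/2 with shell arithmetic, optionally multiplied by a number of groups: 10 customers need 45 comparisons per group, or 135 across three groups.

```shell
#!/bin/sh
# comparisons (sketch): pairwise comparisons needed for GROUPS data sets of
# n files each, using n(n-1)/2 per group. n(n-1) is always even, so the
# integer division is exact.
comparisons() {
  n=$1
  groups=${2:-1}
  echo $(( groups * n * (n - 1) / 2 ))
}
```

For instance, `comparisons 10` prints 45 and `comparisons 10 3` prints 135, which shows how quickly the work grows as customers are added.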

When writing the script, the first thing it needs to know is the number of data comparisons, that is, the number of folders created by the second script (the one that gathers the results). Since the mind map in Figure 3 identified three groups of data, a check can be performed to make sure that exactly three folders were created for the data sets being compared:

ls -d */ > folders.txt
if [ $(wc -l < folders.txt) -ne 3 ]; then
    echo "Expected 3 data-set folders, one per group in the mind map" >&2; exit 1
fi

The next part of the script navigates into each folder and executes the comparison. Just as before, a file is first generated listing the files in the folder that need to be compared.
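The file listing and the walk through each folder might look like the sketch below. Both function names are hypothetical; list_datafiles assumes the User<number>.xml names that comparison_manager.sh uses and writes the numbers, one per line, to datafiles.txt.

```shell
#!/bin/sh
# list_datafiles (sketch): write datafiles.txt for the current folder, one
# number per line, taken from names of the form User<number>.xml, in
# numeric order.
list_datafiles() {
  for f in User*.xml; do
    [ -f "$f" ] || continue
    n=${f#User}
    n=${n%.xml}
    case $n in ''|*[!0-9]*) continue ;; esac   # not User<number>.xml
    echo "$n"
  done | sort -n > datafiles.txt
}

# for_each_folder (sketch): inside every data-set folder of the current run
# folder, write datafiles.txt and then run the given command there.
for_each_folder() {
  for dir in */; do
    [ -d "$dir" ] || continue
    ( cd "$dir" && list_datafiles && "$@" ) || return 1
  done
}
```

Run from the run folder, `for_each_folder "$SCRIPTS/comparison_manager.sh"` (where SCRIPTS is a hypothetical variable holding the scripts’ directory) performs the comparison in every group. The subshell keeps each cd from leaking into the next iteration.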

Before tackling the ordeal of executing the correct number of comparisons, the comparison script itself (referred to as datacompare.sh) should be addressed. All *nix systems provide a diff command, and the inclination would be to use it in the shell script. It is the wrong tool here: diff aligns the two files line by line, in order, and reports how to turn one into the other. It does not directly answer the question being asked, which is whether each line of file A appears anywhere in file B. For that, the ever-popular grep command is more than enough.

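# datacompare.sh
# Append to results.txt each line of file2.txt that contains a line of file1.txt
while IFS= read -r line
do
    [ -n "$line" ] || continue                   # an empty pattern would match every line
    grep -F -- "$line" file2.txt >> results.txt  # -F: match literal text, not a regex
done < file1.txt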

Replace file1.txt and file2.txt with "$1" and "$2", respectively, and datacompare.sh will take the two parameters it needs for the comparison.
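For larger files, the per-line loop can be replaced by a single grep call. The sketch below is an alternative version of datacompare.sh, not the one above: grep -F -x -f reads every line of the first file as a fixed-string pattern and prints the lines of the second file that match one of them exactly. Whole-line matching is stricter than the loop’s substring match; it avoids false hits from short lines but will miss a value embedded in a longer line. The datacompare function name simply mirrors the script name.

```shell
#!/bin/sh
# datacompare (sketch): print each line of file $2 that also appears, as a
# whole line, in file $1. Returns 0 whether or not lines match, 2 on error.
datacompare() {
  [ -r "$1" ] && [ -r "$2" ] || return 2      # both files must be readable
  patterns=$(mktemp) || return 2
  # Drop blank lines first: an empty pattern would match every line of $2.
  grep -v '^[[:space:]]*$' "$1" > "$patterns"
  # -F fixed strings, -x whole-line matches, -f read the patterns from a file
  grep -F -x -f "$patterns" "$2"
  rc=$?
  rm -f "$patterns"
  # grep exits 1 when nothing matched, which here means "no shared data".
  [ "$rc" -le 1 ] || return 2
}
```

Saved as datacompare.sh with a final line `datacompare "$@"`, it would be called as `./datacompare.sh User1.xml User2.xml >> results.txt`, with the caller doing the redirection.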

Managing the data files for the comparisons can be handled by a nested while loop over the numbers that JMeter added before each data file’s extension.

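# comparison_manager.sh: performed on folder XXX
# datafiles.txt holds the number JMeter added to each file name (1, 2, 3, ...)
while read -r i; do
    while read -r j; do
        if [ "$i" -lt "$j" ]; then                # compare each pair of customers once
            : > results.txt                       # start each pair with an empty results file
            ./datacompare.sh "User$i.xml" "User$j.xml" < /dev/null
            # sort -u rather than uniq: uniq only drops adjacent duplicates
            sort -u results.txt > "${i}_to_${j}.data"
        fi
    done < datafiles.txt
done < datafiles.txt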

Above is the basic comparison algorithm. When it completes, the result is a set of files, one per pair of customers, each holding the result of comparing one customer’s data with another’s.

Analyze and Report

The solution is almost complete. All that is left is to comb through the various result files to determine whether any data is in question. The same technique used to gather the files created by JMeter applies to the files generated by the comparison script.

Once the files have been sorted, the first thing to do is to clean up the data in them. Why? Because the collected data is stored in XML, the XML tags will show up as a match in every comparison. Removing the tags makes it extremely easy to see whether there is a problem, since a file with no real matches is left with nothing but blank lines. A simple sed command² can then remove the blank lines from each file:

sed -i '/^$/d' "$1"
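Removing the XML tags can also be done with sed. The sketch below strips anything that looks like a tag, deletes the lines left blank and, in the spirit of footnote 2, writes through a temporary file rather than relying on sed -i. The strip_tags name is hypothetical, and the approach is line-oriented: a tag is removed only when its < and > are on the same line, so a partial line such as <bid id="7" that still carries data survives and is reported.

```shell
#!/bin/sh
# strip_tags (sketch): remove XML tags from one comparison result file, then
# delete the lines left empty. Lines that were only markup (e.g. <bids>)
# vanish; data values such as 500 from <amount>500</amount> remain.
strip_tags() {
  tmp=$(mktemp) || return 1
  sed -e 's/<[^>]*>//g' -e '/^[[:space:]]*$/d' "$1" > "$tmp" &&
    cat "$tmp" > "$1"   # write back in place, keeping the original file
  rc=$?
  rm -f "$tmp"
  return "$rc"
}
```

After `strip_tags 1_to_2.data`, an empty file means customers 1 and 2 share nothing, and anything left is a candidate data bleed.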

After the XML tags and blank lines have been removed from the comparison result files, the following script creates a report file listing only the files that may need further investigation, each followed by its contents.

# report.sh: performed on folder XXX
ls *.data > files.rpt
while read -r line
do
    if [ -s "$line" ]; then        # non-empty: the two customers share data
        echo "$line" >> XXX.rpt
        cat "$line" >> XXX.rpt
    fi
done < files.rpt

² If the -i option is not available (it is not part of POSIX sed), redirect the output to a temporary file and then replace the original file with it.

Provided that adequate testing was performed on both the JMeter test case and the shell scripts throughout the development of the solution, the generated report will give accurate information about the application being tested.

Upgrade and Expansion

Like any automated script, this solution will need ongoing maintenance as the application being tested grows. Maintenance will be easy to perform, since the solution was designed to call the following functions:

1. Execute JMeter to gather the HTTP responses from the web-based application and save them in data files.

2. Execute a shell script to group the data files and clean them up, leaving only the data identified in the mind map.

3. Execute a shell script to compare the data groups and create comparison result files.

4. Execute a shell script to group the comparison result files and clean them up.

5. Analyze the cleaned comparison result files.

6. Generate a report that identifies any problems.

If designed properly, this script can be adapted to any other multi-tenant web-based application with minor changes to the various shell scripts and the single JMeter test case. For applications that are not web-based, the method can be applied by replacing JMeter with an emulator and making the necessary modifications to the remaining scripts.
