eurostar 2013 albert witteveen final

With cloud computing: who needs performance testing?

Albert Witteveen

You just woke up after a 10-year nap:

Team member: “We can add extra processing power and memory on the fly. An extra database has a lead time of two weeks.”

Imagine

Does this sound familiar?

Performance test: everything OK

Day 1 in production: we end up adding more than four times the hardware

Question

1. The tools simulate but are not quite equal
2. Load profiles are based on too many assumptions
3. We report more accurately than we can measure
4. Long setup time → limited number of tests
5. We hide it all in complex reports

Load testing weak points

We send and accept the same requests and responses, but can't anticipate slight changes

In production, a lot more is going on than just our test

Did we really get a good response?

Similar hardware is expensive

Our tools simulate reality but are not equal

Cloud computing: adding extra hardware can be done on the fly and at a moment's notice

Given the high cost of performance testing and how easily we can 'speed things up' if needed:

Why bother testing? The money is better spent on that extra hardware

Cloud computing

Just start with an overkill of hardware and scale down to what is actually used!

Dutch auction?

Introducing: queuing theory

The naïve tester (me)

Computers are either running or idling.

Queuing theory is an established model for performance engineers.

It can describe the behavior of systems on every layer.

Queuing theory

Simple queue

Multiple registers

Take a number

Queuing center: a location in our system where waiting (queuing) occurs; a bottleneck, if you will
◦ They can exist anywhere: CPU, memory, network, IO, other systems
◦ There is always at least one queuing center
◦ A queuing center really determines the performance
◦ The queuing centers provide key information on scalability
◦ Service and wait time are the real components of performance

Queuing center

Queuing models can describe anything: large connected systems, small, embedded ...

You can 'zoom in' and the model can describe the behavior of the server

You can keep zooming in to CPU, network etc.

Multiple layers

Multiple zoom levels

Residence time = wait time + service time

There is always a queuing center

No queuing center found: look harder

Queuing models
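The relation "residence time = wait time + service time" can be made concrete with the simplest textbook queuing model. The following is a minimal sketch, assuming a single server with random (Poisson) arrivals and exponentially distributed service times (the M/M/1 model). The slides do not prescribe this model; the function name and the example numbers are purely illustrative.

```python
def mm1_times(service_time: float, arrival_rate: float) -> tuple[float, float, float]:
    """Return (utilization, wait_time, residence_time) for an M/M/1 queue.

    service_time: mean seconds of actual work per request (assumed value)
    arrival_rate: mean requests per second (assumed value)
    """
    utilization = arrival_rate * service_time
    if not 0 <= utilization < 1:
        raise ValueError("unstable queue: utilization must be below 1")
    # Standard M/M/1 result: residence time grows as the server gets busier.
    residence_time = service_time / (1.0 - utilization)
    wait_time = residence_time - service_time
    return utilization, wait_time, residence_time


if __name__ == "__main__":
    for rate in (1, 5, 9, 9.9):
        u, w, r = mm1_times(service_time=0.1, arrival_rate=rate)
        print(f"load {rate:>4}/s  utilization {u:.0%}  wait {w:.3f}s  residence {r:.3f}s")
```

Under these assumptions the service time stays fixed, while the wait time grows sharply as utilization approaches 100%. That growth is what makes a queuing center visible in measurements.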

Cloud computing is not infinite:
◦ Financial limit
◦ Technical: IO/network/CPU speed per process

We don't build supercomputers to calculate a mortgage offer

Back to the cloud
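The technical limit can be illustrated with a multi-server queue: adding servers removes waiting, but one request never finishes faster than its own service time. This is a hedged sketch using the standard Erlang C formula for an M/M/c queue (Poisson arrivals, exponential service times, identical servers). The model choice, the function name and the numbers are assumptions for illustration, not something taken from the presentation.

```python
from math import factorial


def mmc_residence_time(service_time: float, arrival_rate: float, servers: int) -> float:
    """Mean residence time (wait + service) for an M/M/c queue via Erlang C."""
    if servers < 1:
        raise ValueError("need at least one server")
    offered_load = arrival_rate * service_time  # average number of busy servers
    utilization = offered_load / servers
    if not 0 <= utilization < 1:
        raise ValueError("unstable queue: utilization must be below 1")
    tail = offered_load ** servers / factorial(servers)
    head = sum(offered_load ** k / factorial(k) for k in range(servers))
    # Erlang C: probability that an arriving request has to wait at all.
    p_wait = tail / ((1.0 - utilization) * head + tail)
    wait_time = p_wait * service_time / (servers - offered_load)
    return wait_time + service_time


if __name__ == "__main__":
    for c in (1, 2, 4, 16):
        r = mmc_residence_time(service_time=0.1, arrival_rate=9, servers=c)
        print(f"{c:>2} servers: residence time {r:.4f}s")
```

In this sketch the residence time approaches the 0.1 s service time as servers are added and never drops below it. More hardware buys less waiting, not faster processing per request.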

Always find the queuing centers

Based on the result, judge: 'yes, we are likely to meet requirements X, Y and Z'

Show where the risks are: 'requirement X cannot feasibly be met for function Y'

Explore the risks

How to apply the queuing theory

Explore identified resource-heavy components with stakeholders, developers and oracles
◦ Other uses of this component?
◦ Real frequency of usage?
◦ Validity of the (generic) requirement for this function?

Place the results in context:
◦ You may have a bigger issue than you thought
◦ Or it is actually OK for this usage

Exploring the risks

Define a set of key functions/use cases with stakeholders and experts (e.g. functional testers)

Per test, identify at least one queuing center

Compare with generic requirements
◦ Can they be met?
◦ Risk exists → explore → place in context → define further tests

The model allows you to place real behavior in context and make a realistic assessment of risk

Approach
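The step "per test, identify at least one queuing center" could be supported by a small helper that scans the monitoring data of a test run. This is only a sketch: the resource names, the 0.8 threshold and the idea of using peak utilization as the indicator are all assumptions, and a real queuing center may not show up as high utilization at all.

```python
from typing import Optional


def find_queuing_center(utilization: dict[str, float], threshold: float = 0.8) -> Optional[str]:
    """Return the busiest monitored resource if it reaches the threshold, else None.

    utilization maps a (hypothetical) resource name to its measured busy
    fraction between 0.0 and 1.0. None means: no candidate found, so the
    monitoring probably needs to go deeper.
    """
    if not utilization:
        return None
    name, value = max(utilization.items(), key=lambda item: item[1])
    return name if value >= threshold else None


if __name__ == "__main__":
    measured = {"cpu": 0.35, "disk_io": 0.93, "network": 0.40}
    center = find_queuing_center(measured)
    print(center or "no queuing center found: look harder")
```

A `None` result is not a pass. Following the rule in the slides, it is a signal to extend the monitoring.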

If no queuing center was found → monitoring was not sufficient

Queuing centers:
◦ Tell you about the risks to core functionality: performance and financial
◦ Tell you about the ability to scale
◦ Help improve response time when scaling up

For stakeholders

Stakeholders don't (necessarily) understand queuing models

Explain in terms of what matters to them, e.g. 'when making the offer, it takes 15 seconds to generate'

Think of the systems as queuing systems and explain behavior

Make a model?

Knowing what the behavior is can tell you:
◦ if you can handle the requirements
◦ how to scale if needed
◦ whether performance can be met within budget
◦ if you need to adapt your cloud (e.g. improve IO/network, CPU)

So yes: it still makes sense to do performance testing

Summary for the cloud

Batch process tested to be run from multiple servers

Process needed to be faster

Risk: 'on-line' processes on the server should not be impacted

Finding: 3 servers, three times as fast. But no queuing center found???

A deep dive into CPU monitoring showed the queuing center: the process was pausing/waiting after each cycle

Conclusion: → on-line processes not impacted as there was sufficient CPU time for other processes

Example: batch process
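The batch finding can be reasoned through with a few lines of arithmetic. This sketch assumes each batch process alternates between a fixed amount of work and a fixed pause, and that the servers share no other resource. The timings are made-up numbers, not measurements from the project.

```python
def batch_profile(work_time: float, pause_time: float, servers: int) -> tuple[float, float]:
    """Return (total cycles per second, CPU busy fraction of one batch process).

    work_time and pause_time are hypothetical seconds per cycle.
    """
    cycle_time = work_time + pause_time
    throughput = servers / cycle_time  # independent servers scale linearly
    cpu_busy = work_time / cycle_time  # the rest is the self-imposed pause
    return throughput, cpu_busy


if __name__ == "__main__":
    for n in (1, 3):
        throughput, cpu_busy = batch_profile(work_time=0.02, pause_time=0.08, servers=n)
        print(f"{n} server(s): {throughput:.0f} cycles/s, CPU busy {cpu_busy:.0%} per process")
```

With these assumed timings, three servers give three times the throughput while each process keeps its CPU only 20% busy. The pause itself is the queuing center, which is consistent with the conclusion that the on-line processes still had enough CPU time.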

Stress point found

Unclear where the queuing center was

Cause: Java memory management can be deceiving at OS level.

The rule that the queuing center needed to be found made us find out. The absence of a queuing center makes you look further

Example: Java
