continuous delivery while minimizing performance risks
DESCRIPTION
These are the slides from my talk at Velocity Europe 2012: http://velocityconf.com/velocityeu2012/public/schedule/detail/26162
TRANSCRIPT
CONTINUOUS
DELIVERY WHILE
MINIMISING
PERFORMANCE
RISKS
INTRODUCTION
OBJECTIVE
Put working software into production as quickly as possible,
whilst minimising risk of load-related problems:
› Bad response times
› Insufficient capacity
› Low availability
› Excessive system resource use
RISK PREVENTION IS A BIG SUBJECT
Photo by chillihead: www.flickr.com/photos/chillihead/1778980935
PREVENTING RISK IS A BIG SUBJECT,
WHAT FOLLOWS IS TAKEN FROM OUR EXPERIENCE
CONTINUOUS DELIVERY LITERATURE PROVIDES METHODS
THAT HELP REDUCE RISK
› Blue-green deployments
› Dark launching
› Feature toggles
› Canary releasing
› Production immune systems
Jez Humble, http://continuousdelivery.com
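A feature toggle from the list above can be sketched in a few lines of Python (the toggle and function names are hypothetical, not from the talk): the new code path is deployed to production "dark" and only activated by configuration.

```python
# Hypothetical sketch of a feature toggle: the new code path ships
# disabled and is switched on per environment (or per canary group).
TOGGLES = {"new_checkout": False}  # flip to True for a canary group

def legacy_checkout(cart):
    return sum(cart.values())

def new_checkout(cart):
    # New implementation: deployed, but dormant until the toggle is on.
    return sum(price for price in cart.values())

def checkout(cart):
    if TOGGLES.get("new_checkout", False):
        return new_checkout(cart)
    return legacy_checkout(cart)
```

Because the old path stays in place, rolling back is a configuration change rather than a redeploy.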
BUT LEGACY SYSTEMS OFTEN LACK THE REQUIRED
RESILIENCE
WHILE WE WORK ON OUR RESILIENCE, WE USE LOAD TESTS
TO HELP IDENTIFY THE BIGGEST RISKS
PRE-PROD LOAD TESTING IS NOT FREE
› Extra code to maintain
› Test runs usually last several hours
› A production-like environment is expensive
› Realistic testing is hard
› Not all developers like writing (performance) tests
USE IT WISELY, WHERE PRODUCTION TESTING IS STILL
INAPPROPRIATE
› It provides no guarantee
› Use it to find any showstoppers you can
› Essentially, an optional service that teams can use
Photo by vastateparksstaff: www.flickr.com/photos/vastateparksstaff/5330257235
USE IT AS A PLAYGROUND TO TRY RISKY CHANGES
[Pipeline diagram: build and unit tests run very often; functional integration tests run less often; load tests run about once a day (at night). A second version of the pipeline adds a load test script check stage.]
THE AIM IS NOT PERFECTION, GO FOR “AS REALISTIC AS
NEEDED”
SET UP TEST DATA AT THE WEEKEND, TO MINIMISE
DISRUPTION
WHEN IS A PROBLEM REALLY A PROBLEM?
FIND AN OBJECTIVE WAY TO JUDGE YOUR FINDINGS
ESTABLISH REQUIREMENTS TO MAKE CLEAR WHAT IS
ACCEPTABLE
› Seen from the main stakeholders’ perspective
– Response time: users
– System resources: ops
– Capacity: business
› Specific
› Measurable
› Achievable
› Relevant
Concurrent users
› Stakeholder: Business
› Scale: Maximum load in a day, while response times are still according to spec.
› Meter: Session table row count.
› Intention: The website should at least be able to manage our typical daily load, but we would like some margin for growth and marketing campaigns.
› Fail: < 100k
› Target: 200k
› Now: 150k
FOR RESPONSE TIMES TOO, FOCUS ON THE MAIN
STAKEHOLDER!
SO USE A REAL BROWSER TO TEST
A REAL USER’S EXPERIENCE
Response time    Fail      Today     Target
Homepage.FV      > 6 sec   3.9 sec   2 sec
Homepage.RV      > 5 sec   2.8 sec   1 sec
Checkout.FV      > 8 sec   6.5 sec   2 sec
Details.FV       > 6 sec   1.9 sec   2 sec
Details.RV       > 5 sec   1.7 sec   1 sec
Search.FV        > 6 sec   4.8 sec   2 sec
Search.RV        > 5 sec   3.7 sec   1 sec
Cart.FV          > 6 sec   4.4 sec   2 sec
Cart.RV          > 5 sec   3.4 sec   1 sec
LoginForm.FV     > 6 sec   3.5 sec   2 sec
LoginForm.RV     > 5 sec   2.5 sec   1 sec
(FV = first view, RV = repeat view)
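Fail/target numbers like the ones in the table above can be checked mechanically after each test run. A minimal sketch (spec entries taken from the table; the helper name is my own):

```python
# Judge a measured response time against the fail/target spec.
# Keys are page.view; values are (fail_above_seconds, target_seconds).
SPEC = {
    "Homepage.FV": (6.0, 2.0),
    "Homepage.RV": (5.0, 1.0),
    "Checkout.FV": (8.0, 2.0),
}

def judge(page, seconds):
    fail_above, target = SPEC[page]
    if seconds > fail_above:
        return "fail"
    if seconds <= target:
        return "target met"
    return "ok, but short of target"
```

Encoding the spec as data keeps the pass/fail judgement objective and lets the load test report it automatically.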
TO MAKE COMPARISONS MEANINGFUL, MAKE YOUR TESTS
DETERMINISTIC
Stub systems that you have no control over
LOAD TESTING SHOULD BE OPTIONAL, THE ONLY THING
THAT COUNTS IS PRODUCTION!
› Your definition of done should reflect that
› The aim is to get early feedback from a safe environment
ANYTHING YOU FIND IS AN OPPORTUNITY TO FIX MORE
THAN ONE PROBLEM
SO WHAT MONITORING IS TYPICALLY NEEDED?
› Be able to localise where latency is coming from!
– For every system, all incoming and outgoing calls (count and
time spent stats)
› Finite resources (pools, CPU, I/O, etc.)
› Number of active users
› Response size, where possible
› Add whatever you need
It should be identical on all environments!
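The count-and-time-spent stats per call can be collected with a small wrapper around each incoming and outgoing call. A sketch (in practice this would feed the monitoring stack rather than an in-process dict):

```python
import collections
import functools
import time

# Per-call statistics: how often it was made and total time spent in it.
STATS = collections.defaultdict(lambda: {"count": 0, "total_s": 0.0})

def monitored(call_name):
    """Wrap an incoming or outgoing call, recording count and time spent."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                entry = STATS[call_name]
                entry["count"] += 1
                entry["total_s"] += time.perf_counter() - start
        return wrapper
    return decorator
```

With every call wrapped this way, latency can be localised by comparing time spent per call across systems, on test and production environments alike.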
SET CLEAR TARGETS, SO YOU KNOW YOUR SITUATION
› How many errors would be OK in production?
› What kind of errors do we care about?
[Chart: stale server session requests per minute over 24 hours (00:00 to 00:00), y-axis 0 to 250. Annotations mark the points where other servers were taken out of the load balancer and put back in.]
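Once an error target is agreed, per-minute error counts like the ones charted above can be checked automatically. A sketch (the threshold is an invented example, not a number from the talk):

```python
# Flag the minutes where the error count exceeds the agreed target.
def minutes_over_target(counts_per_minute, target):
    """counts_per_minute: {"HH:MM": error_count}. Returns offending minutes."""
    return sorted(
        minute
        for minute, count in counts_per_minute.items()
        if count > target
    )
```

Agreeing the target up front turns "is this spike a problem?" from a debate into a lookup.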
CONCLUSION
Support your continuous delivery process with optional load
tests and strong specs.
Use the load tests to identify some pain points, so you can
modify the code and add monitoring, making it safer to do
dark releases and canary testing in production.
Constantly ask yourself: what would it take to do this only in
production? Is it adding value?