Bad Experiments: The 18 Ways Your A/B Tests Are Going Wrong

Learn from the 18+ mistakes that I’ve made while running experimentation programs & how to overcome them.


Martijn Scheijbeler

Growth & SEO at Postmates. Former Marketing Director at The Next Web.

@MartijnSch


You finally realized: “We want to do more A/B Testing for our

- Every Manager Ever

What I’m seeing: Most Experimentation Programs Are Based on Guesswork

@MARTIJNSCH

#LAUNCHSCALE

[Chart: percentages by month, January through December]


Or in other words:

Given the options, I prefer to learn from success.


A/B Testing process:

Ideas → Hypothesis → Design & Engineering → Quality Assurance → Run & Analysis → Repeat

Repeat this process…


Or in other words:

Test. Eat. Sleep. Repeat.


WHAT CAN YOU TEST? You want to know what might improve the bottom line.

Ideas: Generate a backlog of ideas

You want to generate a lot of ideas so that there is always something new to test with the resources you have. A longer backlog gives you the opportunity to test and prioritize at the same time.

PROCESSES ARE ESSENTIAL


They just launch, they don’t test.

MISTAKE #1

By not testing what you’re launching you’re setting yourself up for failure.

Companies that believe they’re wasting money.

MISTAKE #2

“We’re losing money by testing.” It’s the opposite. The companies that aren’t testing are losing money, because they’re putting losers online without looking at the numbers.


Expecting Big Wins

MISTAKE #3

They’re expecting to win the lottery by setting up their testing program. The truth is that most of the tests you run will be losers. But these are the biggest wins you’ll get, because they teach you something.

My Competitor is Doing X, so that’s why we’re testing X.

MISTAKE #4

Looking at your competitors can be useful for coming up with new ideas for your backlog. But 99% of the time your competitors won’t have a clue what they’re doing either.


Running tests when you don’t have traffic.

MISTAKE #5

When you don’t have enough traffic, you won’t reach the significance level (95%+) that you need to say whether you have a winner or a loser.
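To get a feel for how much traffic is "enough", here is a rough sketch of the standard sample-size approximation for comparing two conversion rates. The function name, the 5% baseline rate, the 10% relative lift and the 80% power default are illustrative assumptions, not numbers from this talk.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline_rate, relative_lift, alpha=0.05, power=0.80):
    """Approximate visitors needed per variant for a two-sided test.

    baseline_rate: current conversion rate, e.g. 0.05 for 5%
    relative_lift: smallest lift worth detecting, e.g. 0.10 for +10%
    alpha:         1 - significance level (0.05 matches the 95% mark)
    power:         chance of detecting the lift if it is real (assumed 80%)
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    # Sum of the binomial variances of the two groups.
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


if __name__ == "__main__":
    # Hypothetical example: 5% baseline, hoping to detect a 10% relative lift.
    print(sample_size_per_variant(0.05, 0.10))
```

With these assumed inputs the answer is on the order of 31,000 visitors per variant, which is why small lifts on low-traffic pages rarely reach significance.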



Hypothesis: Do you really know what you will be testing?

Write a hypothesis: you want to know why you’re testing what you’re testing. Otherwise you can’t check whether your tests have the outcome that you’re expecting. It will also help make sure that your tests provide an actual learning instead of just hunting for results.



They don’t create a hypothesis.

MISTAKE #6

When you start testing you want to make sure that you know what kind of impact you’re expecting from an experiment.


Testing multiple variables at the same time. Making 3 changes basically requires 3 tests.

MISTAKE #7

“Let’s test the button, oh and let’s also change the header + CTA”.

Use numbers as the basis of your research, not your gut feeling.

MISTAKE #8

“Let’s run a test because I like the colors green and yellow more and I heard on BART that they work better”.



Design & Engineering: Have designers & engineers work together to create treatments

When you’re designing and engineering your tests (because, in all honesty, who is still using the WYSIWYG editors after a few tests?), you want to stay in constant sync with your designers. That way the tests you’re running are within the guidelines of your design (or not) and the code is written as it’s supposed to be.



Before and After is not an A/B test. We launched, let’s see what the impact is.

MISTAKE #9

A/B testing doesn’t work when you just launch a feature and compare the results before and after. That comparison doesn’t take into account all the other variables that could be impacting your results.
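What separates an A/B test from a before/after comparison is that both versions run at the same time on randomly split traffic, so outside factors hit both groups equally. A minimal sketch of one common way to do that split, hashing a user ID into a bucket, is shown here. The function name, the experiment name and the user ID format are hypothetical.

```python
import hashlib


def assign_variant(user_id, experiment, variants=("control", "treatment")):
    """Deterministically assign a user to one variant of an experiment.

    The same user always gets the same variant for a given experiment,
    and including the experiment name keeps splits independent per test.
    """
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    bucket = int(digest, 16) % len(variants)
    return variants[bucket]


if __name__ == "__main__":
    # Hypothetical experiment name and user IDs.
    for uid in ("user-1", "user-2", "user-3"):
        print(uid, assign_variant(uid, "new-checkout-button"))
```

Because the assignment depends only on the ID and the experiment name, a returning visitor keeps seeing the same treatment without any extra storage.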

They go over 71616 revisions for the design.

MISTAKE #10

Many times I’ve seen companies or designers worry too much about the designs that they’re testing. Test with multiple variants if that’s possible, so it’s not up to the designer to decide what is working.



Quality Assurance: Are you sure your test is working?

You have the test ‘working’, as your designers and engineers have created the deliverables for it. But are you really sure that your test is working in all the browsers that your visitors are using? Make sure to test extensively before every test that you’re activating.



They don’t QA their tests. Even your mother can have an opinion this time.

MISTAKE #11

Make sure that you’re QA’ing your experiment before you activate it. It will prevent you from stopping a test halfway when your results are skewed.

#lovemoms



Run & Analysis: Time to have users experience your treatments

It’s time to start running your test. It’s essential that you run your tests for the right duration. After this comes the best part: analysis.



Not running your tests long enough, calling the results early.

MISTAKE #12

You want to make sure that you reach significance, but at the same time you also need to account for your business cycles.


Running multiple tests with overlap... it’s possible, but segment the sh*t out of your tests.

MISTAKE #13

You have multiple tests running?

Great, but make sure that you also segment based on that data and plot the impact of the overlap between tests.
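Segmenting on overlap can be as simple as breaking down conversion rate by every combination of variants a visitor saw. The sketch below assumes you can export one row per visitor with the variant of each test and a conversion flag. The row layout and function name are hypothetical.

```python
from collections import defaultdict


def conversion_by_overlap(rows):
    """Conversion rate per combination of two overlapping tests.

    rows: iterable of (variant_in_test_a, variant_in_test_b, converted) tuples
    Returns {(variant_a, variant_b): conversion_rate}.
    """
    counts = defaultdict(lambda: [0, 0])  # combination -> [visitors, conversions]
    for variant_a, variant_b, converted in rows:
        cell = counts[(variant_a, variant_b)]
        cell[0] += 1
        cell[1] += int(converted)
    return {combo: conv / visitors for combo, (visitors, conv) in counts.items()}


if __name__ == "__main__":
    # Made-up sample rows, only to show the shape of the data.
    sample = [
        ("control", "control", True),
        ("control", "control", False),
        ("treatment", "control", True),
        ("treatment", "treatment", False),
    ]
    for combo, rate in conversion_by_overlap(sample).items():
        print(combo, rate)
```

If the lift of one test looks very different depending on which variant of the other test the visitor saw, the two tests are interacting and should be read together.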


Data is not sent to your main analytics tool, or you’re comparing your A/B testing tool to analytics. Good luck.

MISTAKE #14

Going with your results without significance.

MISTAKE #15

Basic fail #1: as long as you don’t have the right significance, you’re basically running your testing program like a coin flip, with a 50/50 chance of finding a winner.
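If your testing tool doesn’t report significance, you can check it yourself. Below is a minimal sketch of a two-sided, two-proportion z-test against the 95% mark mentioned earlier. The function names and the visitor counts in the example are made up, and your tool may use a different statistical method.

```python
import math
from statistics import NormalDist


def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for the difference between two conversion rates."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        return 1.0  # no conversions (or all conversions) in both groups
    z = (rate_b - rate_a) / std_err
    return 2 * (1 - NormalDist().cdf(abs(z)))


def is_significant(conv_a, n_a, conv_b, n_b, confidence=0.95):
    """True when the difference clears the chosen confidence level."""
    return two_proportion_p_value(conv_a, n_a, conv_b, n_b) < 1 - confidence


if __name__ == "__main__":
    # Hypothetical counts: 1,000 visitors per variant.
    print(is_significant(50, 1000, 55, 1000))    # small difference
    print(is_significant(100, 1000, 200, 1000))  # large difference
```

A result that doesn’t clear the bar isn’t a loser, it’s an unknown, which is exactly the coin flip described above.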



Repeat: Start all over again (eat, test, sleep, repeat)

You’ve been running your first experiments that hopefully have been successful. Now comes the next step: repeating the process.



Not deploying your winner fast enough, takes 2 months to launch.

MISTAKE #16

Speed is one of the most essential components of a Growth process. So not launching your winning tests, or waiting a while to launch them, is not helping you or your company.

They’re not keeping track of their tests. No documentation.

MISTAKE #17

Keep track of what you’re testing. Once you exceed 50 experiments, it’s unlikely that you still know what you’ve tested before and what the results were.
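On the documentation point (mistake #17): keeping track doesn’t need a fancy tool. As a sketch, even a small structured log that you can search before starting a new test goes a long way. The field names and class names below are hypothetical; a spreadsheet with the same columns works just as well.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ExperimentRecord:
    name: str
    hypothesis: str
    start: date
    end: date
    result: str          # "winner", "loser" or "inconclusive"
    learnings: str = ""


class ExperimentLog:
    """In-memory log of past experiments that can be searched by keyword."""

    def __init__(self):
        self._records = []

    def add(self, record):
        self._records.append(record)

    def search(self, keyword):
        """Return records whose name or hypothesis mentions the keyword."""
        keyword = keyword.lower()
        return [
            r for r in self._records
            if keyword in r.name.lower() or keyword in r.hypothesis.lower()
        ]


if __name__ == "__main__":
    log = ExperimentLog()
    # Made-up entry, only to show the shape of a record.
    log.add(ExperimentRecord(
        name="Checkout button copy",
        hypothesis="Clearer button copy increases checkout starts",
        start=date(2017, 1, 9),
        end=date(2017, 1, 23),
        result="inconclusive",
    ))
    print([r.name for r in log.search("button")])
```

Searching the log before you write a new hypothesis is the cheapest way to avoid rerunning experiment number 12 as experiment number 57.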


They give up.

MISTAKE #18

NEVER EVER GIVE UP ON TESTING.


“If you double the number of experiments you’re going to double your inventiveness”

- Jeff Bezos, Amazon


Summary

Ideas:
1. They just launch, they don’t test.
2. Companies that believe they’re wasting money.
3. Expecting Big Wins.
4. My Competitor is Doing X, so that’s why we’re testing X.
5. Running tests when you don’t have traffic.

Hypothesis:
1. They don’t create a hypothesis.
2. Testing multiple variables at the same time, making 3 changes basically requires 3 tests.
3. Use numbers as the basis of your research, not your gut feeling.

Design & Engineering:
1. Before and After is not an A/B test. We launched, let’s see what the impact is.
2. They go over 71616 revisions for the design.


Quality Assurance:
1. They don’t QA their tests. Even your mother can have an opinion this time.

Run & Analysis:
1. Not running your tests long enough, calling the results early.
2. Running multiple tests with overlap... it’s possible, but segment the sh*t out of your tests.
3. Data is not sent to your main analytics tool, or you’re comparing your A/B testing tool to analytics. Good luck.
4. Going with your results without significance.
5. You run your tests for too long… more than 4 weeks is not advised: cookie deletion, multiple issues.

Repeat:
1. Not deploying your winner fast enough, takes 2 months to launch.
2. They’re not keeping track of their tests. No documentation.
3. They Give Up!


Thank you, have fun!


@MartijnSch
