
A/B TESTING PITFALLS AND BEST PRACTICES

WHO'S THIS DUDE?
ALEJANDRO PARDO LOPEZ

Client Side Developer && Team Leader @ Booking.com
@apardolopez

WHAT IS A/B TESTING?

Test different versions of a website or feature by randomly assigning your users into two or more groups, each one exposed to a different variant of the website/feature, and comparing the impact of the variants against each other.

WHY A/B TESTING?

The ability to detect and measure the real impact of our changes on the UX or on performance.

NO HiPPOs, NO EXPERT OPINION
(HiPPO = Highest Paid Person's Opinion)

COMMON A/B TEST

Two-variant test (Base and Variant)
User base split 50% / 50%

EXAMPLES

Source: http://unbounce.com/a-b-testing/shocking-results/

WHAT DO WE WANT TO MEASURE

USUAL SUSPECTS

Conversion (successful sign-up / purchase / click on CTA per visitor)
Bounce rates
Time spent
Other user metrics (e.g. navigation times)

OTHER USEFUL METRICS

Front-end performance (page load times, navigation times)
Back-end performance (CPU wallclock, SQL wallclock)
Errors
External impact (e.g. # of Customer Care tickets)

BEYOND THE TYPICAL A/B TEST

MULTIVARIANT TESTS (I.E. MORE THAN 2 VARIANTS)

Test multiple variations of the same feature
Compare each variant against the others
The more variants, the bigger your user base needs to be to detect a change
Use a power calculator to determine how many users you need to detect a certain amount of impact, e.g. http://www.evanmiller.org/ab-testing/sample-size.html (a sketch of the underlying formula follows this list)
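As a rough sketch, this is the standard two-proportion approximation that such calculators compute; the function name, inputs, and hardcoded z-values are illustrative, not part of any real framework:

// A minimal sketch of the usual sample-size approximation for comparing
// two conversion rates, at 95% confidence and 80% power (z-values hardcoded).
function sampleSizePerVariant( baseRate, absoluteEffect ) {
    var zAlpha = 1.96; // two-sided test, 95% confidence
    var zBeta  = 0.84; // 80% power
    var p1 = baseRate;
    var p2 = baseRate + absoluteEffect;
    var variance = p1 * ( 1 - p1 ) + p2 * ( 1 - p2 );
    return Math.ceil( Math.pow( zAlpha + zBeta, 2 ) * variance /
                      Math.pow( p2 - p1, 2 ) );
}

// e.g. 5% base conversion, detecting a 1% absolute lift:
// roughly 8,100 users per variant
console.log( sampleSizePerVariant( 0.05, 0.01 ) );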

EXAMPLES OF MULTIVARIANTS

CTA colour (Google's famous 40 shades of blue)
Copy experiments (CTAs, email headlines)

REDUCED USER GROUP (E.G. 10% OR LESS OF TOTAL TRAFFIC)

Expose experimental features to a reduced group of users for early feedback
Early detection of errors
Enabling potentially dangerous code (e.g. heavy DB queries); see the rollout sketch after this list
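A minimal sketch of how such a reduced group might be selected, assuming a stable user id is available; the hash and the helper names are illustrative:

// Deterministically bucket a user into 0..99 from a stable id, then expose
// the feature only to buckets below the rollout percentage.
function bucket( userId ) {
    var hash = 0;
    for ( var i = 0; i < userId.length; i++ ) {
        hash = ( hash * 31 + userId.charCodeAt( i ) ) >>> 0; // keep it unsigned 32-bit
    }
    return hash % 100;
}

function inReducedGroup( userId, percentage ) {
    return bucket( userId ) < percentage;
}

var currentUserId = 'user-12345'; // assumed to come from your session
if ( inReducedGroup( currentUserId, 10 ) ) { /* enable the experimental feature */ }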

GRACEFUL DEGRADATION - EMERGENCY SWITCHES

Disable light actions (see the switch sketch after this list)
Reduce the data shown, to cut queries to an overloaded DB
Hide buttons that lead to pages in trouble (e.g. in another datacenter that is under pressure)
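A minimal sketch of what an emergency switch can look like in code; the `switches` object stands in for server-provided configuration and is an assumption, not Booking.com's actual API:

// `switches` is assumed to be populated from server-side configuration,
// so it can be flipped in an emergency without a deploy.
var switches = { disableLightboxes: false };

function openItemLightbox( item ) {
    if ( switches.disableLightboxes ) {
        // Degrade gracefully: plain navigation instead of the lightbox
        window.location.href = item.url;
        return;
    }
    showLightbox({ title: item.title });
}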

A/A EXPERIMENTS

No change between the variants

Used for validating the tracking framework and the analysis reports

INTERPRETING RESULTS OF A/B TESTS

CONCLUSIVE VS INCONCLUSIVE RESULTS

CONCLUSIVE RESULTS

We are confident that the difference between the variants is statistically significant

WHUT???!!!!

IN OTHER WORDS...

We can confidently say which one is the winner (or loser)

Source: https://developer.amazon.com/sdk/ab-testing.html
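Under the hood, "conclusive" usually means a significance test on the difference. A minimal sketch of a two-proportion z-test, with illustrative names and numbers; your experiment framework normally does this for you:

// z-score for the difference between two conversion rates,
// using the pooled standard error.
function zScore( convA, totalA, convB, totalB ) {
    var pA = convA / totalA;
    var pB = convB / totalB;
    var pooled = ( convA + convB ) / ( totalA + totalB );
    var se = Math.sqrt( pooled * ( 1 - pooled ) * ( 1 / totalA + 1 / totalB ) );
    return ( pB - pA ) / se;
}

// |z| > 1.96 means significant at the 95% confidence level (two-sided)
var z = zScore( 500, 10000, 580, 10000 );
console.log( Math.abs( z ) > 1.96 ); // true: the rates are ~2.5 standard errors apart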

INCONCLUSIVE RESULTS

If there was an effect, it was too small to be measured

SECONDARY METRICS FTW!!!

Secondary sign-ups
Performance impact
Errors
Inconclusive can also be the target impact (e.g. a code refactor should change nothing)

TRUSTWORTHY DATA"When running online experiments, getting numbers is easy;

getting numbers you can trust is hard"

TRUSTWORTHY DATA

Without data you can trust, you cannot make a decision. Basically, you know nothing about the results of your test.

ROBOTS

They can bias your results (see the filtering sketch after this list):

Visitor numbers will be inflated
Visitor numbers can be altered in just one variant, making distributions uneven
Conversion rates can be affected as well, due to the increase in visitors
But also clicks! Some robots parse JavaScript
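One simple, far-from-complete mitigation is filtering obvious robots by user agent before tracking; the pattern below is illustrative only, and sophisticated bots will not match it:

// Skip tracking for clients whose user agent matches obvious bot markers.
var BOT_PATTERN = /bot|crawler|spider|headless/i;

function trackHuman( feature ) {
    if ( BOT_PATTERN.test( navigator.userAgent ) ) {
        return null; // do not put robots into the experiment
    }
    return track( feature );
}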

INTERFERING EXPERIMENTS

Modifications to the same feature running at the same time can bias results

E.g. a button colour change and a button position change
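One common remedy, not spelled out on the slide, is to make the two experiments mutually exclusive so that each user ever sees only one of them. A minimal sketch, reusing the illustrative hash from the rollout example; the experiment names are hypothetical:

// Deterministically assign each user to exactly one of the conflicting experiments.
function pickExperiment( userId, experiments ) {
    var hash = 0;
    for ( var i = 0; i < userId.length; i++ ) {
        hash = ( hash * 31 + userId.charCodeAt( i ) ) >>> 0;
    }
    return experiments[ hash % experiments.length ];
}

// A given user sees either the colour change or the position change, never both
var currentUserId = 'user-12345'; // assumed stable id
var active = pickExperiment( currentUserId, [ 'buttonColour', 'buttonPosition' ] );

The trade-off: each experiment now gets only part of the traffic, so both have to run longer to reach significance.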

TRACKING

AKA PUTTING USERS IN YOUR EXPERIMENT

WRONG TRACKING === USELESS DATA

...and wasted time...
...and unmeasured impact on the site...
...and rage++...

TRACKING CHALLENGES

ASSIGN USERS TO VARIANTS RANDOMLY

The distribution of visitors should match the expected split (a sanity-check sketch follows)
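A simple check on top of the assignment is a sample-ratio-mismatch test: with a binomial split, the observed counts should stay within a few standard deviations of the expected ones. A minimal sketch; the 3-sigma threshold is a common rule of thumb, assumed here rather than taken from the talk:

// Flag the experiment when the observed split drifts too far from the
// expected one (a sample-ratio mismatch usually means broken tracking).
function splitLooksHealthy( countBase, countB, expectedShareB ) {
    var total = countBase + countB;
    var expected = total * expectedShareB;
    var sd = Math.sqrt( total * expectedShareB * ( 1 - expectedShareB ) );
    return Math.abs( countB - expected ) <= 3 * sd;
}

console.log( splitLooksHealthy( 49900, 50100, 0.5 ) ); // true
console.log( splitLooksHealthy( 49000, 51000, 0.5 ) ); // false: investigate your tracking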

AVOID NOISE

Track only people who are actually exposed to the change. Otherwise, spotting a change in the results is much harder, and the experiment has to run for longer.

E.g. tracking everyone visiting the website when the change is only on the product page.

TRACK ALL VARIANTS

Don't forget any (e.g. track base)

USING JAVASCRIPT FOR TRACKING

VERY POWERFUL

More precise tracking (e.g. tracking based on user interactions)

TRACK USERS ONLY WHEN THEY ARE EXPOSED TO THE CHANGE

Lightboxes
Change is actually viewed in the browser viewport

BUT WEAKER TOO

Sensitive to JS errors
Cookie overrides by HTTP requests (use server-side cookies instead)

EXAMPLE TRACKING API

// track( feature ) puts the user into the experiment and returns
// the variant they were assigned to
track( feature )

if ( track( featureA ) === 'b' ) { /* Cool stuff */ }

$('.item').on('click', function( e ) {
    var title = 'Base title for lightbox';

    /* Do some stuff */

    showLightbox({ title: title });
});

$('.item').on('click', function( e ) {
    var title = 'Base title for lightbox';

    /* Do some stuff */

    if ( track( featureA ) === 'b' ) {
        title = 'New title for lightbox';
    }

    showLightbox({ title: title });
});

TRACKING PITFALL #1

$('.item').on('click', function( e ) {
    var title = 'Base title for lightbox',
        position;

    /* Do some stuff */
    // If '#elem' is missing, offset() returns undefined and '.top' throws,
    // so the track() call below never runs for this user
    position = $('#elem').offset().top;
    console.log( position );
    /* End do some stuff */

    if ( track( featureA ) === 'b' ) {
        title = 'New title for lightbox';
    }

    showLightbox({ title: title });
});

TRACKING PITFALL #1

Track as early as possible, but at the point where the change is shown; any JS error that fires before track() means the user is never counted.

TRACKING PITFALL #2

$('.item').on('click', function( e ) {
    var title = 'Base title for lightbox',
        content;

    /* Do some stuff */
    // Synchronous call, returns default content or variant B content
    content = readContentFromServer() || {};
    /* End do some stuff */

    // Pitfall: when useVariantBcontent is false, && short-circuits
    // and track() is never called, so base users are never tracked
    if ( content.useVariantBcontent && track( featureA ) === 'b' ) {
        title = 'New title for lightbox';
    }

    showLightbox({ title: title });
});

TRACKING PITFALL #2

Always track base: every user who could see the change must reach the track() call, whichever variant they are in.

"ADVANCED" TRACKING

TRACK WHEN AN ELEMENT BECOMES VISIBLE

Useful for elements below the fold that require scrolling to be seen

// Footer content is changed in the template, based on the variant
(function(){
    track.onView( '#selector', feature );
})();

// Simple onView implementation
track.onView = function( selector, feature ) {
    if ( !selector || !feature ) return;

    var trackIfVisible = function( data ) {
        if ( isVisible( data.selector ) ) {
            track.feature( data.feature );
            return true;
        }
        return false;
    };

    // Track immediately if already in view, otherwise wait for scrolling;
    // throttled, so we don't run all the function code on each scroll event
    if ( !trackIfVisible({ selector: selector, feature: feature }) ) {
        throttle( trackIfVisible, { selector: selector, feature: feature } )
            .on( 'scroll' );
    }
};
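The snippet above leans on `isVisible` and `throttle` helpers that were not shown; minimal illustrative versions might look like this (the details are assumptions, not the original implementation):

// Is the first element matching the selector at least partially in the viewport?
function isVisible( selector ) {
    var el = document.querySelector( selector );
    if ( !el ) return false;
    var rect = el.getBoundingClientRect();
    return rect.top < window.innerHeight && rect.bottom > 0;
}

// Run fn( data ) at most once per interval while the event keeps firing,
// and stop listening once fn reports success by returning true.
function throttle( fn, data, interval ) {
    var waiting = false;
    return {
        on: function( eventName ) {
            window.addEventListener( eventName, function handler() {
                if ( waiting ) return;
                waiting = true;
                setTimeout( function() {
                    waiting = false;
                    if ( fn( data ) ) {
                        window.removeEventListener( eventName, handler );
                    }
                }, interval || 200 );
            });
        }
    };
}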

TRACKING PITFALL #3

<p>Some content</p>
<p data-track-on-view="feature">New content added by variant B</p>
<p>Some other content</p>

<p>Some content</p>
<p>New content added by variant B</p>
<p data-track-on-view="feature">Some other content</p>

<p>Some content</p>
<div class="empty-visible-div" data-track-on-view="feature"></div>
<p>New content added by variant B</p>
<p>Some other content</p>

TRACKING PITFALL #3

On-view tracking is sensitive to the element's position: if the tracked element sits at different scroll positions in base and variant, you might end up with visitor distribution issues.

TRACK WHEN USER NAVIGATES AWAY

<a data-track-on-click="feature" href="bazinga.html">Get me outta here!</a>
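A minimal sketch of how such an attribute might be wired up; the attribute name comes from the slide, but the delegated handler is an assumption:

// Track the feature named in the data attribute when the link is clicked;
// the browser then starts navigating away, which is exactly pitfall #4 below.
$( document ).on( 'click', '[data-track-on-click]', function() {
    track( $( this ).data( 'trackOnClick' ) );
});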

TRACKING PITFALL #4

Assume you might lose some visitors in the experiment
Calling a tracking pixel or firing an AJAX request while the browser is loading another page is completely unreliable
You can store the feature in localStorage or a cookie and track it on the next page load (still not 100% reliable); see the sketch after this list
Alternatively, pass a parameter in the URL so the server can do the tracking while rendering the next page
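A minimal sketch of the localStorage approach from the list above; the storage key is illustrative, and as the slide says, it is still not 100% reliable (e.g. if the next page load never happens):

// On click: remember which feature to track instead of racing the navigation.
$( document ).on( 'click', '[data-track-on-click]', function() {
    try {
        localStorage.setItem( 'pendingTrack', $( this ).data( 'trackOnClick' ) );
    } catch ( e ) { /* storage unavailable: accept losing this visitor */ }
});

// On every page load: flush whatever the previous page left behind.
(function() {
    var pending = localStorage.getItem( 'pendingTrack' );
    if ( pending ) {
        localStorage.removeItem( 'pendingTrack' );
        track( pending );
    }
})();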

QUESTIONS?

FEEDBACK