Advanced A/B Testing - GeeCON Krakow 2015
Experimenting on Humans: Advanced A/B Testing
Aviran Mordo
Head of Back-end Engineering
@aviranm
www.linkedin.com/in/aviran
www.aviransplace.com
Wix In Numbers
Over 65M users + 1.5M new users/month
Static storage is >2 PB of data
3 data centers + 3 clouds (Google, Amazon, Azure)
2B HTTP requests/day
1,000 people work at Wix, of whom ~500 are in R&D
Basic A/B testing
Experiment driven development
PETRI – Wix’s 3rd generation open source experiment system
Challenges and best practices
Complexities and effect on product
Agenda
EVERY new feature is A/B tested
We open the new feature to a % of users
Measure success
If it is better, we keep it
If worse, we check why and improve
If flawed, the impact is limited to a % of our users
Conclusion
New code can have bugs
Conversion can drop
Usage can drop
Unexpected cross-test dependencies
Sh*t happens (Test could fail)
Language
GEO
Browser
User-agent
OS
Minimize affected users (in case of failure)
Gradual exposure (percentage of…)
Company employees
User roles
Any other criteria you have (extendable)
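The filter criteria above can be combined into a single eligibility check that runs before any toss. A minimal sketch in Python; `Visitor`, `ExperimentFilter` and `is_eligible` are hypothetical names chosen for illustration and are not Petri's actual API.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class Visitor:
    language: str
    country: str
    browser: str
    os: str
    is_employee: bool = False


@dataclass
class ExperimentFilter:
    # None means "no restriction" for that criterion (assumed convention).
    languages: Optional[Set[str]] = None
    countries: Optional[Set[str]] = None
    browsers: Optional[Set[str]] = None
    operating_systems: Optional[Set[str]] = None
    employees_only: bool = False

    def is_eligible(self, visitor: Visitor) -> bool:
        """Return True only if the visitor passes every configured filter."""
        if self.languages is not None and visitor.language not in self.languages:
            return False
        if self.countries is not None and visitor.country not in self.countries:
            return False
        if self.browsers is not None and visitor.browser not in self.browsers:
            return False
        if (self.operating_systems is not None
                and visitor.os not in self.operating_systems):
            return False
        if self.employees_only and not visitor.is_employee:
            return False
        return True
```

A visitor who fails any filter never enters the experiment, which keeps the number of affected users small if the new feature turns out to be flawed.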
All users
First time visitors = Never visited wix.com
New registered users = Untainted users
Existing registered users = Already familiar with the
service
Not all users are equal
Solution – Pause the experiment!
Maintain NEW experience for already exposed users
No additional users will be exposed to the NEW feature
PETRI’s pause implementation
Use cookies to persist assignment
If the user changes browsers, the assignment is unknown
Server side persistence solves this
You pay in performance & scalability
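The pause behaviour with cookie-based persistence could look roughly like the sketch below. It assumes a plain dictionary stands in for the browser's cookies and that an experiment is a small dictionary with `name`, `paused` and `test_fraction` keys; these names, and the `conduct` function itself, are illustrative assumptions rather than Petri's real implementation.

```python
import random


def conduct(experiment: dict, cookies: dict, rng=random.random) -> str:
    """Return "A" (old experience) or "B" (new experience) for this visitor."""
    key = "exp_" + experiment["name"]

    # Already tossed: keep the same experience, even while paused.
    if key in cookies:
        return cookies[key]

    # Paused: do not expose anyone new, and do not persist a decision,
    # so the visitor can still be tossed once the experiment resumes.
    if experiment["paused"]:
        return "A"

    # Running: toss once and persist the result in the cookie.
    group = "B" if rng() < experiment["test_fraction"] else "A"
    cookies[key] = group
    return group
```

Because the assignment lives only in the cookie, a visitor who switches browsers arrives with an empty `cookies` dictionary and is treated as new. Storing the assignment server side would close that gap at the cost of an extra lookup per request.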
Decision (what to do with the data)
Keep feature
Drop feature
Improve code & resume experiment
Keep backwards compatibility for exposed users forever?
Migrate users to another equivalent feature
Drop it altogether (users lose data/work)
Numbers look good but sample size is small
We need more data!
Expand
Reaching statistical significance
Test Group (B): 25% 50% 75% 100%
Control Group (A): 75% 50% 25% 0%
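One common way to judge whether "numbers look good" is more than noise is a two-proportion z-test on conversion counts. The talk does not say which test Wix uses, so the sketch below is an assumption for illustration only; `two_proportion_p_value` is a hypothetical helper name.

```python
import math


def two_proportion_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value for the difference between two conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that A and B convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no variation at all, nothing to distinguish
    z = (p_b - p_a) / se
    # Two-sided tail probability of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))


# Same conversion rates (10% vs 15%), different sample sizes.
small = two_proportion_p_value(10, 100, 15, 100)
large = two_proportion_p_value(1000, 10000, 1500, 10000)
```

With 100 users per group the difference is not significant at the usual 0.05 level, while the same rates over 10,000 users per group are. That is the reason for expanding the test group before making a decision.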
Signed-in user
Test group is determined by the user ID
Guarantee toss consistency across browsers
Anonymous user (Home page)
Test group is randomly determined
Cannot guarantee a consistent experience across browsers
11% of Wix users use more than one desktop browser
Keeping consistent UX
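For signed-in users, a deterministic toss can be derived from the user ID so that every browser and device yields the same group. The talk only states that the group is determined by the user ID; hashing with SHA-256 into 100 buckets, as below, is an assumed scheme, and `toss` is a hypothetical function name.

```python
import hashlib


def toss(experiment_name: str, user_id: str, test_percent: int) -> str:
    """Deterministically assign a user to "A" (control) or "B" (test)."""
    # Mixing in the experiment name keeps tosses independent across experiments.
    digest = hashlib.sha256(f"{experiment_name}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # 0..99
    return "B" if bucket < test_percent else "A"
```

Since the bucket depends only on the experiment name and the user ID, raising `test_percent` from 25 to 50 keeps every user who was already in group B in group B and only adds new ones. Anonymous visitors have no stable ID, so they need a random toss stored in a cookie, which cannot be consistent across browsers.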
# of active experiments    Possible # of states
10                         1,024
20                         1,048,576
30                         1,073,741,824
Possible states >= 2^(# experiments)
Wix has ~400 active experiments: ~2.58225e+120 possible states
Supporting 2^N different users is challenging
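The numbers above follow directly from the lower bound of two variants per experiment. A short sketch that reproduces them; `possible_states` is a hypothetical helper name.

```python
def possible_states(active_experiments: int, variants_per_experiment: int = 2) -> int:
    """Lower bound on distinct user experiences with independent experiments."""
    return variants_per_experiment ** active_experiments


for n in (10, 20, 30, 400):
    # Scientific notation keeps the 400-experiment case readable.
    print(n, f"{possible_states(n):.5e}")
```

Experiments with more than two variants push the count higher still, which is why the bound is written as >= 2^N.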
How do you know which experiment causes errors?
Managing an ever changing production env.
Enable features by existing content
What will happen when you remove a component?
Enable features by document owner’s assignment
The friend now expects to find the new feature on his own docs
Exclude experimental features from shared documents
You are not really testing the entire system
Possible solutions
Petri is more than just an A/B test framework
Feature toggle
A/B Test
Personalization
Internal testing
Continuous deployment
Jira integration
Experiments
Dynamic configuration
QA
Automated testing
Petri is an open source project
https://github.com/wix/petri
Q&A
https://github.com/wix/petri
http://goo.gl/dqyely
Modeled experiment lifecycle
Open source (developed using TDD from day 1)
Running at scale in production
No deployment necessary
Both back-end and front-end experiments
Flexible architecture
Why Petri