volodymyrk
How to conclude online experiments in Python
Volodymyr (Vlad) Kazantsev
Head of Data Science at Product Madness
Goal of the tutorial
Uncover the “magic” behind statistics used for A/B testing and other online experiments
A quick bio
Now
● Head of Data Science (Social Gaming)
● Product Manager at King
● MBA at London Business School
● Visual Effect developer (Avatar, Batman, ...)
● MSc in Probability (Kiev Uni, Ukraine)
2004
Different kinds of tests
● Classic A/B tests
● Long running activities with control groups
● Longitudinal tests
Why bother?
● To test your hypothesis and learn
● To avoid blindly following HiPPOs
● To audit performance of product and marketing teams
Why Stats?
● To separate data from the noise
● To quantify uncertainty
Fruit Crush Epic
The story of an almost-real mobile game, in an almost-real gaming company... and one Data Scientist
Day-1
3 seconds panic-attack
Day 1 - loading time panic-attack!
Taxonomy of Classical stat testing
Which Test?
● 1 Sample
  ● Mean
    ● σ known: z-test one sample
    ● σ unknown: t-test one sample
  ● Proportion: z-test for proportion
  ● Variance: Chi-squared test
● 2 Samples
  ● Mean
    ● independent
      ● σ1,σ2 known: z-test for (μ1-μ2)
      ● σ1,σ2 unknown: t-test for (μ1-μ2)
    ● dependent samples: z-test or t-test for dependent samples
  ● Proportion: z-test, 2 proportions
  ● Variance: F-test
● >2 Samples: ANOVA
One sample t-test
Null Hypothesis:
- avg. loading time <= 3 seconds for last hour's observations
Alternative Hypothesis:
- population mean is > 3 seconds for last hour's observations
Test:
- single sample, one-sided t-test
One sample t-test
t_value = t-test(samples, expected mean)
p-value = t-distribution lookup(t_value, sample_size) = 0.086
p-value: probability of obtaining a result at least as extreme as the one observed, assuming the null hypothesis is true
If you want to code it yourself
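The slide's own code did not survive extraction. A minimal sketch of coding the test by hand, assuming it amounts to computing the t statistic from the sample mean and standard error and then reading the upper tail of the t-distribution. The function name and the loading times are hypothetical, not the talk's data.

```python
import math

import numpy as np
from scipy import stats


def one_sample_t_test_greater(samples, expected_mean):
    """One-sided, one-sample t-test for H1: population mean > expected_mean."""
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    # Standard error of the mean, using the sample standard deviation (ddof=1).
    sem = samples.std(ddof=1) / math.sqrt(n)
    t_value = (samples.mean() - expected_mean) / sem
    # Upper-tail probability of the t-distribution with n - 1 degrees of freedom.
    p_value = stats.t.sf(t_value, df=n - 1)
    return t_value, p_value


# Hypothetical loading times in seconds (illustrative only).
loading_times = [3.4, 2.9, 3.8, 3.1, 2.7, 3.6, 3.3, 3.0, 3.9, 3.2]
t_value, p_value = one_sample_t_test_greater(loading_times, expected_mean=3.0)
print(f"t = {t_value:.3f}, one-sided p = {p_value:.4f}")
```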
Stats in Python
Classical: numpy, scipy.stats, statsmodels.stats
Bayesian: theano, pymc3
* High-level view. Lots of stuff missing here. pymc3 uses statsmodels for GLM
One sample t-test and z-test
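A sketch of both tests with scipy, standing in for the slide's lost code. The loading times are hypothetical, and the known population standard deviation `sigma` needed by the z-test is an assumed value chosen for illustration.

```python
import math

import numpy as np
from scipy import stats

# Hypothetical loading times in seconds (illustrative only).
loading_times = np.array([3.4, 2.9, 3.8, 3.1, 2.7, 3.6, 3.3, 3.0, 3.9, 3.2])
target = 3.0

# t-test (sigma unknown): scipy returns a two-sided p-value by default.
t_value, p_two_sided = stats.ttest_1samp(loading_times, popmean=target)
# Convert to the one-sided p-value for H1: mean > target.
p_t = p_two_sided / 2 if t_value > 0 else 1 - p_two_sided / 2

# z-test (sigma known): the population standard deviation is an assumed value.
sigma = 0.4
z_value = (loading_times.mean() - target) / (sigma / math.sqrt(loading_times.size))
p_z = stats.norm.sf(z_value)  # upper tail of the standard normal

print(f"t-test: t = {t_value:.3f}, p = {p_t:.4f}")
print(f"z-test: z = {z_value:.3f}, p = {p_z:.4f}")
```

The only difference between the two is where the spread comes from: the t-test estimates it from the sample, the z-test takes it as given.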
Confidence Interval
Confidence Interval for the Mean
Standard Error of the Mean in Python
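A short sketch of the standard error of the mean and the confidence interval built from it, using `scipy.stats.sem` and the t-distribution. The sample values are hypothetical.

```python
import numpy as np
from scipy import stats

# Hypothetical loading times in seconds (illustrative only).
samples = np.array([3.4, 2.9, 3.8, 3.1, 2.7, 3.6, 3.3, 3.0, 3.9, 3.2])

mean = samples.mean()
sem = stats.sem(samples)  # sample std (ddof=1) divided by sqrt(n)

# 95% confidence interval for the mean from the t-distribution.
low, high = stats.t.interval(0.95, df=samples.size - 1, loc=mean, scale=sem)
print(f"mean = {mean:.3f}, SEM = {sem:.3f}, 95% CI = ({low:.3f}, {high:.3f})")
```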
Next Day
Day-2
OMG, my Retention is low!
Is my day-1 retention low?
Day-1 results:
installs 448
returned next day 123
Day-1 retention 27.46%
Retention target 30%
One sample z-test for proportion
Null Hypothesis:
- avg. retention >= 30%
Alternative Hypothesis:
- avg. retention < 30%
Test:
- single sample, one-sided z-test for proportion
In Python...
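A sketch of the one-sided z-test for the day-1 retention numbers (123 of 448 against a 30% target), computed by hand with scipy's normal distribution. Using the standard error under the null hypothesis is an assumption about how the slide's code did it.

```python
import math

from scipy import stats

installs = 448
returned = 123
target = 0.30

p_hat = returned / installs
# Standard error under the null hypothesis (retention equals the target).
se_null = math.sqrt(target * (1 - target) / installs)
z_value = (p_hat - target) / se_null
# Lower-tail p-value for H1: retention < target.
p_value = stats.norm.cdf(z_value)
print(f"retention = {p_hat:.4f}, z = {z_value:.3f}, p-value = {p_value:.3f}")
```

Under these assumptions the p-value comes out at roughly 0.12, so this sample alone would not reject the 30% target at the usual 5% level.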
So what is my confidence interval?
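One way to answer this for the retention example is a normal-approximation (Wald) interval for a proportion. This is a sketch under that assumption; the slide may have used a different interval.

```python
import math

from scipy import stats

installs = 448
returned = 123

p_hat = returned / installs
# Standard error from the observed proportion (not the target).
se = math.sqrt(p_hat * (1 - p_hat) / installs)
z_crit = stats.norm.ppf(0.975)  # two-sided 95% critical value

low = p_hat - z_crit * se
high = p_hat + z_crit * se
print(f"95% CI for day-1 retention: ({low:.4f}, {high:.4f})")
```

The interval includes the 30% target, which agrees with the one-sided test not rejecting it.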
Day-5
Connect with Facebook or Die!
The First A/B test
A/B test 1 - connect to Facebook
A/B test design
Group A (50%): Have seen prompt 2501, Connected 1104, Connect rate 44.1%
Group B (50%): Have seen prompt 2141, Connected 1076, Connect rate 50.2%
Flow labels: Start Level 1, Start Level 1, Finish Level 1
Is it statistically significant?
Two samples z-test for proportion
Null Hypothesis:
- avg. connection rate is the same. P1 = P2
Alternative Hypothesis:
- P1 ≠ P2
Test:
- two samples z-test for proportion. Two sided
Two samples z-test for proportion in Python
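A sketch of the two-sided, pooled two-proportion z-test applied to the Facebook-connect counts. The helper name is hypothetical, and treating the first set of counts (1104 of 2501) as Group A and the second (1076 of 2141) as Group B is an assumption about the slide layout.

```python
import math

from scipy import stats


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test for H0: P1 = P2, using the pooled proportion."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z_value = (p_a - p_b) / se
    p_value = 2 * stats.norm.sf(abs(z_value))
    return z_value, p_value


z_value, p_value = two_proportion_z_test(1104, 2501, 1076, 2141)
print(f"z = {z_value:.3f}, p-value = {p_value:.6f}")
```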
Confidence interval for difference in proportion
In Python
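A sketch of a 95% confidence interval for the difference in connect rates, assuming the usual unpooled normal approximation and the same group assignment as in the design slide's counts (first set as A, second as B).

```python
import math

from scipy import stats

connected_a, seen_a = 1104, 2501
connected_b, seen_b = 1076, 2141

p_a = connected_a / seen_a
p_b = connected_b / seen_b
diff = p_b - p_a

# Unpooled standard error: each group contributes its own variance.
se = math.sqrt(p_a * (1 - p_a) / seen_a + p_b * (1 - p_b) / seen_b)
z_crit = stats.norm.ppf(0.975)

low = diff - z_crit * se
high = diff + z_crit * se
print(f"difference = {diff:.4f}, 95% CI = ({low:.4f}, {high:.4f})")
```

An interval that excludes zero tells the same story as a significant two-sided test, and also shows how large the lift might plausibly be.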
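What should we measure, exactly?
[Figure: player flow from Start Level 1 to Start Level 2 for two groups of 1000 players. One group: connected 47%, retained 82%. Other group: connected 50%, retained 80%. Node counts in extraction order: 1000, 1000, 150, 400, 450, 30, 390, 430, 160, 840, 40, 400, 400.]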
What about Bayesian Stats?
Bayesian Credible Interval vs. CI
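The slide's example is not recoverable, so here is a minimal sketch of a credible interval for a proportion. It reuses the day-1 retention counts (123 of 448) and assumes a uniform Beta(1, 1) prior; both choices are illustrative assumptions, not the talk's setup.

```python
from scipy import stats

installs = 448
returned = 123

# With a Beta(1, 1) prior, the posterior is Beta(1 + successes, 1 + failures).
posterior = stats.beta(1 + returned, 1 + installs - returned)

# 95% equal-tailed credible interval for the retention rate.
low, high = posterior.ppf([0.025, 0.975])
# Posterior probability that retention is below the 30% target.
prob_below_target = posterior.cdf(0.30)

print(f"95% credible interval: ({low:.4f}, {high:.4f})")
print(f"P(retention < 30%) = {prob_below_target:.3f}")
```

Unlike a confidence interval, the credible interval is a direct probability statement about the parameter, conditional on the prior and the data.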
Day-30
Do you want to buy last chance?
A/B testing Revenue
How much is an extra life worth?
LOSER!!! Purchase another chance for only.. $0.99
LOSER!!! Purchase another chance for only.. $1.99
How are we going to test it?
Consider
● There are multiple items to buy in game (lives, boosters, blenders, etc)
● We expect more people to make a $0.99 purchase, so we hope to make more money overall, even at lower price
A/B test Design
● We will show A/B test to new users only
● Will run for 2 months
● We will measure overall revenue per user in the first 30 days
● Null-hypothesis: we make more money from $0.99 group
Measurements
● Difference in Average Revenue Per User (ARPU) in 30 days
● Difference in Conversion Rate (% of users who make at least 1 purchase)
Results
count 450 390
mean 151.9 214.2
25% 20.8 26.5
50% 55.3 69.4
75% 147.3 231.3
max 3960 3647.8
* random generator used in the example is available in IPython notebooks
** distribution is made more extreme than what is normally observed in a casual game, like our imaginary match-3 title
Results
30,000 users in each group
450 payers vs. 390 payers
Conversion rate: p-value = 0.037. Significant
Revenue: p-value = ??? Is it significant?
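The 0.037 figure is consistent with a two-sided, pooled two-proportion z-test on the payer counts (450 vs. 390 payers out of 30,000 users each). A sketch under the assumption that this is the test behind the slide:

```python
import math

from scipy import stats

users_per_group = 30_000
payers_a, payers_b = 450, 390

rate_a = payers_a / users_per_group
rate_b = payers_b / users_per_group
# Pooled conversion rate under H0: both groups convert at the same rate.
pooled = (payers_a + payers_b) / (2 * users_per_group)
se = math.sqrt(pooled * (1 - pooled) * (2 / users_per_group))

z_value = (rate_a - rate_b) / se
p_value = 2 * stats.norm.sf(abs(z_value))
print(f"z = {z_value:.3f}, p-value = {p_value:.3f}")
```

Conversion is a proportion, so the z-test applies directly. Revenue per user is a heavily skewed mean, which is why its p-value needs more care.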
Welch's t-test (σ1≠σ2)
Can we actually use t-test?
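A sketch of Welch's t-test with `scipy.stats.ttest_ind(..., equal_var=False)`. The revenue data here is synthetic: log-normal draws with assumed parameters stand in for the skewed per-payer revenue, since the talk's generator lives in its notebooks.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Synthetic, heavily skewed revenue per payer (log-normal is an assumption).
revenue_a = rng.lognormal(mean=4.0, sigma=1.3, size=450)
revenue_b = rng.lognormal(mean=4.3, sigma=1.3, size=390)

# equal_var=False selects Welch's t-test, which does not assume equal variances.
t_value, p_value = stats.ttest_ind(revenue_a, revenue_b, equal_var=False)
print(f"Welch t = {t_value:.3f}, p-value = {p_value:.4f}")
```

Welch's variant handles unequal variances, but it still relies on the sample means being roughly normal, which is the open question with data this skewed.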
Poor man’s non-parametric test: split 5
p < 3%
If you don’t know enough stats - simulate!
This is very close to p-value from t-test
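The slide's simulation is not recoverable from the transcript. One common way to simulate a p-value is a permutation test on the difference in means, sketched here as a stand-in. The function name, the number of permutations, and the synthetic log-normal revenue are all assumptions.

```python
import numpy as np


def permutation_p_value(a, b, n_permutations=2000, seed=0):
    """Two-sided permutation test for a difference in means."""
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    observed = abs(a.mean() - b.mean())
    pooled = np.concatenate([a, b])
    hits = 0
    for _ in range(n_permutations):
        # Shuffle group labels: under H0 the labels carry no information.
        shuffled = rng.permutation(pooled)
        diff = abs(shuffled[: a.size].mean() - shuffled[a.size:].mean())
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from an impossible p = 0.
    return (hits + 1) / (n_permutations + 1)


rng = np.random.default_rng(42)
# Synthetic, heavily skewed revenue per payer (log-normal is an assumption).
revenue_a = rng.lognormal(mean=4.0, sigma=1.3, size=450)
revenue_b = rng.lognormal(mean=4.3, sigma=1.3, size=390)
print(f"permutation p-value = {permutation_p_value(revenue_a, revenue_b):.4f}")
```

The simulation makes no normality assumption, so agreement with the t-test p-value is evidence that the t-test is usable on this kind of data.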
Can we improve sensitivity?
27 players have spent > $1000 across both groups.
10 in $0.99 group and 17 in $1.99 group
Max spent = $3960
And we re-run our analysis
Again, we can use t-test
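The transcript does not say how the big spenders were handled before the re-run. A sketch assuming their revenue is capped at $1000 rather than dropped, again on synthetic log-normal data with assumed parameters:

```python
import numpy as np
from scipy import stats

CAP = 1000.0  # threshold from the slide; capping (not removing) is an assumption

rng = np.random.default_rng(42)
# Synthetic, heavily skewed revenue per payer (log-normal is an assumption).
revenue_a = rng.lognormal(mean=4.0, sigma=1.3, size=450)
revenue_b = rng.lognormal(mean=4.3, sigma=1.3, size=390)

# Cap extreme spenders so a handful of whales cannot dominate the variance.
capped_a = np.minimum(revenue_a, CAP)
capped_b = np.minimum(revenue_b, CAP)

_, p_raw = stats.ttest_ind(revenue_a, revenue_b, equal_var=False)
_, p_capped = stats.ttest_ind(capped_a, capped_b, equal_var=False)
print(f"p-value raw = {p_raw:.4f}, capped = {p_capped:.4f}")
```

Capping lowers the variance of each group, which can make a real difference easier to detect, at the cost of changing the metric being compared.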
Final Thoughts
Can we analyse distributions?
You can quantify the difference between the two curves
Area under the curve is Average Revenue per User
Is 30 day revenue a good metric?
LTV projection A
LTV projection B
Summary:
● There are only a few stats tests that any Data Scientist must know
● t-tests are robust enough to be useful even with skewed data sets
● Bayesian stats and MCMC are cool, but don’t use MCMC for trivial cases
● It is hard to detect a difference in heavily-skewed cases
IPython Notebooks for this tutorial are available at: http://nbviewer.ipython.org/github/VolodymyrK/stats-testing-in-python