Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha
![Page 1: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha](https://reader031.vdocuments.us/reader031/viewer/2022030317/587155591a28ab8e5b8b5087/html5/thumbnails/1.jpg)
![Page 5: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha](https://reader031.vdocuments.us/reader031/viewer/2022030317/587155591a28ab8e5b8b5087/html5/thumbnails/5.jpg)
- listen to ~100 billion ad opportunities daily
- respond with optimal bids within milliseconds
- petabytes of data (ad impressions, visits, clicks, conversions)
![Page 7: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha](https://reader031.vdocuments.us/reader031/viewer/2022030317/587155591a28ab8e5b8b5087/html5/thumbnails/7.jpg)
Predicting user response to ads is a Machine-Learning problem,
but quantifying the impact of ad exposure is a Measurement problem.
![Page 9: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha](https://reader031.vdocuments.us/reader031/viewer/2022030317/587155591a28ab8e5b8b5087/html5/thumbnails/9.jpg)
Spark: existing vs simulated data
Most Spark applications process existing big data sets.
Today we’re talking about analyzing simulated big data.
![Page 17: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha](https://reader031.vdocuments.us/reader031/viewer/2022030317/587155591a28ab8e5b8b5087/html5/thumbnails/17.jpg)
Key Conceptual Take-aways
- Issues in ad-lift measurement
  - Proper definition
  - Confidence bounds
- Bayesian methods for ad-lift confidence bounds
  - Gibbs Sampling (MCMC – Markov Chain Monte Carlo)
- Using Spark for:
  - Monte Carlo sampling for confidence bounds
  - Monte Carlo simulations
![Page 18: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha](https://reader031.vdocuments.us/reader031/viewer/2022030317/587155591a28ab8e5b8b5087/html5/thumbnails/18.jpg)
Application context: ad impact measurement
- Advertisers want to know the impact of showing ads to users.
![Page 23: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha](https://reader031.vdocuments.us/reader031/viewer/2022030317/587155591a28ab8e5b8b5087/html5/thumbnails/23.jpg)
Measuring Ad Impact: Two Approaches
- Observational studies:
  - Compare users who happen to be exposed vs. not exposed
  - Bias is a big issue
- Randomized tests:
  - Randomly assign users to test (exposed), compare with control (unexposed)
![Page 26: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha](https://reader031.vdocuments.us/reader031/viewer/2022030317/587155591a28ab8e5b8b5087/html5/thumbnails/26.jpg)
Ideal Randomized Test
![Page 28: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha](https://reader031.vdocuments.us/reader031/viewer/2022030317/587155591a28ab8e5b8b5087/html5/thumbnails/28.jpg)
Ideal Randomized Test: Ad lift
![Page 31: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha](https://reader031.vdocuments.us/reader031/viewer/2022030317/587155591a28ab8e5b8b5087/html5/thumbnails/31.jpg)
Ad Lift: Response Rates
If we see k = 200 conversions out of N = 10,000 users,
what is a good estimate for the response-rate?
Estimated response-rate R̂ = k/N = 200/10,000 = 2%...
But how confident are we?
![Page 34: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha](https://reader031.vdocuments.us/reader031/viewer/2022030317/587155591a28ab8e5b8b5087/html5/thumbnails/34.jpg)
Response Rate 90% Confidence Bounds
$P(R > \hat{R} \mid r = q_5) = 5\%$
$P(R < \hat{R} \mid r = q_{95}) = 5\%$
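One way to make these two conditions concrete is to solve them numerically under a binomial model for the conversion count. The sketch below is only an illustration, not necessarily how the speakers computed their bounds. It assumes conversions follow Binomial(N, r), reads "R > R̂" as "more than k conversions", and uses SciPy's `binom` and `brentq`. The helper name `tail_bounds` is hypothetical.

```python
from scipy.optimize import brentq
from scipy.stats import binom


def tail_bounds(N, k, alpha=0.05):
    """Solve P(K > k | r=q_lo) = alpha and P(K < k | r=q_hi) = alpha.

    Assumes K ~ Binomial(N, r) and 0 < k < N (illustrative assumptions).
    """
    r_hat = k / N
    eps = 1e-12
    # P(K > k | r) increases with r, so the root lies below the observed rate.
    q_lo = brentq(lambda r: binom.sf(k, N, r) - alpha, eps, r_hat)
    # P(K < k | r) = P(K <= k-1 | r) decreases with r, so the root lies above it.
    q_hi = brentq(lambda r: binom.cdf(k - 1, N, r) - alpha, r_hat, 1 - eps)
    return q_lo, q_hi


q5, q95 = tail_bounds(10_000, 200)
print(f"q5 = {q5:.4f}, q95 = {q95:.4f}")
```

The two roots bracket the observed rate k/N: the lower bound is the smallest rate that still makes the observation plausible at the 5% level, and the upper bound is the largest.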
![Page 38: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha](https://reader031.vdocuments.us/reader031/viewer/2022030317/587155591a28ab8e5b8b5087/html5/thumbnails/38.jpg)
Response-Rate Confidence Bounds
How to find $(q_5, q_{95})$?
![Page 39: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha](https://reader031.vdocuments.us/reader031/viewer/2022030317/587155591a28ab8e5b8b5087/html5/thumbnails/39.jpg)
![Page 42: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha](https://reader031.vdocuments.us/reader031/viewer/2022030317/587155591a28ab8e5b8b5087/html5/thumbnails/42.jpg)
Response-Rate: Bayesian Confidence Bounds
Randomly generate response rates that are consistent with the data.
(Sample rates from the posterior distribution given the data.)
Find the (0.05, 0.95) quantiles of these rates.
![Page 48: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha](https://reader031.vdocuments.us/reader031/viewer/2022030317/587155591a28ab8e5b8b5087/html5/thumbnails/48.jpg)
Response-Rate: Bayesian Confidence Bounds
- Assume an unknown true rate $r$, with a prior distribution $p(r)$
  - assume $p(r) = \mathrm{Beta}(1, 1) = \mathrm{Unif}(0, 1)$
- Sample from the posterior distribution of the rate $r$
  - conditional on the observed data ($k$ conversions out of $N$)

$$
\begin{aligned}
P(r \mid k) &\propto P(k \mid r) \cdot p(r) \\
&\propto r^{k} (1 - r)^{N-k} \cdot \mathrm{Beta}(1, 1) \\
&\propto r^{(k+1)-1} (1 - r)^{(N-k+1)-1} \\
&\propto \mathrm{Beta}(k + 1, N - k + 1)
\end{aligned}
$$

- Compute (0.05, 0.95) quantiles from the generated rates.
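The conjugacy step can be checked numerically. The sketch below normalizes likelihood times flat prior on a grid and compares it with the closed-form Beta(k + 1, N − k + 1) density. The small data set (N = 100, k = 20) and the grid resolution are assumptions picked for illustration; they are not from the talk.

```python
import numpy as np
from scipy.stats import beta

# Hypothetical small data set, chosen only for illustration.
N, k = 100, 20

# Grid of candidate rates, kept away from 0 and 1 so the logs stay finite.
r = np.linspace(1e-4, 1 - 1e-4, 20_001)

# Log of likelihood times the flat prior: k*log(r) + (N-k)*log(1-r) + log(1).
log_unnorm = k * np.log(r) + (N - k) * np.log1p(-r)

# Exponentiate stably, then normalize with the trapezoid rule.
unnorm = np.exp(log_unnorm - log_unnorm.max())
area = np.sum(0.5 * (unnorm[1:] + unnorm[:-1]) * np.diff(r))
posterior = unnorm / area

# Closed-form posterior density for comparison.
exact = beta.pdf(r, k + 1, N - k + 1)

max_abs_diff = float(np.max(np.abs(posterior - exact)))
print(f"max |grid - Beta(k+1, N-k+1)| = {max_abs_diff:.2e}")
```

Working in log space avoids underflow: for larger counts such as k = 200, the raw product of powers drops below what a double-precision float can represent.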
![Page 50: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha](https://reader031.vdocuments.us/reader031/viewer/2022030317/587155591a28ab8e5b8b5087/html5/thumbnails/50.jpg)
Response-Rate: Bayesian Confidence Bounds
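A simple form of Gibbs Sampling (more later):
- sample $M$ values of $r$ from the posterior $P(r \mid k) \sim \mathrm{Beta}(k + 1, N - k + 1)$
- compute (0.05, 0.95) quantiles

```python
from numpy.random import beta
from scipy.stats.mstats import mquantiles

def conf(N, k, samples=500):
    # Draw posterior samples of the response rate from Beta(k+1, N-k+1).
    rates = beta(k + 1, N - k + 1, samples)
    # The empirical 5% and 95% quantiles give the 90% bounds.
    return mquantiles(rates, prob=[0.05, 0.95])
```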
![Page 55: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha](https://reader031.vdocuments.us/reader031/viewer/2022030317/587155591a28ab8e5b8b5087/html5/thumbnails/55.jpg)
Response-Rate: Bayesian Confidence Bounds
![Page 57: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha](https://reader031.vdocuments.us/reader031/viewer/2022030317/587155591a28ab8e5b8b5087/html5/thumbnails/57.jpg)
Response Rates: Example
If we see k = 200 conversions out of N = 10,000 users,
what is a good estimate for the response-rate?
Estimated response-rate R̂ = k/N = 200/10,000 = 2%...
⟹ 90% confidence region (1.8%, 2.2%)
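This region can be cross-checked without random sampling. With a flat Beta(1, 1) prior the posterior is Beta(k + 1, N − k + 1), so its quantiles can be read from the inverse CDF. The sketch below uses SciPy's `beta.ppf` for that; using it instead of sampling is a choice made here, not something the talk prescribes. The result should land close to the region quoted above.

```python
from scipy.stats import beta

N, k = 10_000, 200  # counts from the example above

# Inverse CDF of the Beta(k+1, N-k+1) posterior at the 5% and 95% levels.
q5, q95 = beta.ppf([0.05, 0.95], k + 1, N - k + 1)
print(f"90% region: ({q5:.2%}, {q95:.2%})")
```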
![Page 58: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha](https://reader031.vdocuments.us/reader031/viewer/2022030317/587155591a28ab8e5b8b5087/html5/thumbnails/58.jpg)
We’ve talked about Response Rates...
now let’s consider Ad Lift
![Page 62: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha](https://reader031.vdocuments.us/reader031/viewer/2022030317/587155591a28ab8e5b8b5087/html5/thumbnails/62.jpg)
Ad Lift: Simple Example
- control: 10,000 users, 200 conversions
- test: 100,000 users, 2,200 conversions

Observed response-rates:
- control: $\hat{R}_c = 200/10{,}000 = 2\%$
- test: $\hat{R}_t = 2{,}200/100{,}000 = 2.2\%$

Estimated lift $\hat{L} = 2.2/2 - 1 = 10\%$
This is a great lift!
Not so fast! Is this a reliable estimate?
Could the true lift $\ell$ be 0%, or even negative?
![Page 67: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha](https://reader031.vdocuments.us/reader031/viewer/2022030317/587155591a28ab8e5b8b5087/html5/thumbnails/67.jpg)
Ad Lift: Bayesian Confidence Bounds
Sampling approach:
Observed data: control: $(k_c, N_c)$, test: $(k_t, N_t)$
1. Repeat $M$ times:
   - draw control response rate $r_c$ from posterior $P(r_c \mid k_c) \sim \mathrm{Beta}(k_c + 1, N_c - k_c + 1)$
   - draw test response rate $r_t$ from posterior $P(r_t \mid k_t) \sim \mathrm{Beta}(k_t + 1, N_t - k_t + 1)$
   - compute lift $L = r_t / r_c - 1$
2. Compute (0.05, 0.95) quantiles of the set of $M$ lifts $\{L\}$.
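The two steps above can be sketched with NumPy. This is a minimal illustration rather than the speakers' implementation: the function name `lift_bounds`, the default M = 100,000 and the fixed seed are assumptions made here. The M repetitions are vectorized instead of written as an explicit loop. The inputs are the counts from the simple example (200 of 10,000 in control, 2,200 of 100,000 in test).

```python
import numpy as np


def lift_bounds(k_c, N_c, k_t, N_t, M=100_000, seed=0):
    """Monte Carlo 90% bounds on lift = r_t / r_c - 1 under flat Beta(1, 1) priors."""
    rng = np.random.default_rng(seed)  # fixed seed only for reproducibility
    # Step 1: draw M control and test rates from their Beta posteriors.
    r_c = rng.beta(k_c + 1, N_c - k_c + 1, size=M)
    r_t = rng.beta(k_t + 1, N_t - k_t + 1, size=M)
    lifts = r_t / r_c - 1.0
    # Step 2: empirical 5% and 95% quantiles of the M sampled lifts.
    lo, hi = np.quantile(lifts, [0.05, 0.95])
    return float(lo), float(hi)


lo, hi = lift_bounds(k_c=200, N_c=10_000, k_t=2_200, N_t=100_000)
print(f"90% lift bounds: ({lo:.1%}, {hi:.1%})")
```

Most of the interval's width comes from the control group: it has only 200 conversions, so its posterior rate is much less certain than the test group's.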
![Page 69: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha](https://reader031.vdocuments.us/reader031/viewer/2022030317/587155591a28ab8e5b8b5087/html5/thumbnails/69.jpg)
Ad Lift: Bayesian Confidence Intervals
- control: $N_c = 10{,}000$ users, $k_c = 200$ conversions
- test: $N_t = 100{,}000$ users, $k_t = 2{,}200$ conversions

Observed response-rates:
- control: $\hat{R}_c = 200/10{,}000 = 2\%$
- test: $\hat{R}_t = 2{,}200/100{,}000 = 2.2\%$

Estimated lift $\hat{L} = 2.2/2 - 1 = 10\%$
90% confidence interval: $(-2.7\%, 23.6\%)$
![Page 70: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha](https://reader031.vdocuments.us/reader031/viewer/2022030317/587155591a28ab8e5b8b5087/html5/thumbnails/70.jpg)
![Page 71: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha](https://reader031.vdocuments.us/reader031/viewer/2022030317/587155591a28ab8e5b8b5087/html5/thumbnails/71.jpg)
Complication 1:
Auction win-bias
![Page 75: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha](https://reader031.vdocuments.us/reader031/viewer/2022030317/587155591a28ab8e5b8b5087/html5/thumbnails/75.jpg)
Ideal Randomized Test
Bids on control users are wasted!
![Page 76: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha](https://reader031.vdocuments.us/reader031/viewer/2022030317/587155591a28ab8e5b8b5087/html5/thumbnails/76.jpg)
![Page 77: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha](https://reader031.vdocuments.us/reader031/viewer/2022030317/587155591a28ab8e5b8b5087/html5/thumbnails/77.jpg)
A Less Wasteful Randomized Test
![Page 78: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha](https://reader031.vdocuments.us/reader031/viewer/2022030317/587155591a28ab8e5b8b5087/html5/thumbnails/78.jpg)
A Less Wasteful Randomized Test: Win-bias
Cannot simply compare Test Winners (tw) and Control (c):
- test-winners selection bias: “win bias”
![Page 79: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha](https://reader031.vdocuments.us/reader031/viewer/2022030317/587155591a28ab8e5b8b5087/html5/thumbnails/79.jpg)
Ad Lift: Proper Definition
![Page 80: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha](https://reader031.vdocuments.us/reader031/viewer/2022030317/587155591a28ab8e5b8b5087/html5/thumbnails/80.jpg)
![Page 81: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha](https://reader031.vdocuments.us/reader031/viewer/2022030317/587155591a28ab8e5b8b5087/html5/thumbnails/81.jpg)
![Page 82: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha](https://reader031.vdocuments.us/reader031/viewer/2022030317/587155591a28ab8e5b8b5087/html5/thumbnails/82.jpg)
![Page 83: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha](https://reader031.vdocuments.us/reader031/viewer/2022030317/587155591a28ab8e5b8b5087/html5/thumbnails/83.jpg)
![Page 84: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha](https://reader031.vdocuments.us/reader031/viewer/2022030317/587155591a28ab8e5b8b5087/html5/thumbnails/84.jpg)
![Page 85: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha](https://reader031.vdocuments.us/reader031/viewer/2022030317/587155591a28ab8e5b8b5087/html5/thumbnails/85.jpg)
Ad Lift Estimation
Main ideas:
- observe the test-losers response rate $R_{tL}$
- observe the test win-rate $w$
- we show one can estimate $R^0_{tw} = \dfrac{R_c - (1 - w)\,R_{tL}}{w}$
- compute the lift $L = R^1_{tw} / R^0_{tw} - 1$
- similar to Treatment Effect Under Non-compliance in clinical trials.
![Page 90: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha](https://reader031.vdocuments.us/reader031/viewer/2022030317/587155591a28ab8e5b8b5087/html5/thumbnails/90.jpg)
Ad Lift Estimation
How to compute the 90% confidence interval for L?
![Page 91: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha](https://reader031.vdocuments.us/reader031/viewer/2022030317/587155591a28ab8e5b8b5087/html5/thumbnails/91.jpg)
![Page 92: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha](https://reader031.vdocuments.us/reader031/viewer/2022030317/587155591a28ab8e5b8b5087/html5/thumbnails/92.jpg)
Ad Lift: Confidence Intervals with Gibbs sampler
Bayesian approach (details omitted, see Chickering/Pearl 1997):
- Assume a random parameter vector $\theta$ consisting of:
  - user latent (potential) behaviors
  - their probabilities
- Set up a prior distribution $\theta \sim p(\theta)$ (Dirichlet)
- Sample $M$ values of the unknown $\theta$ from the posterior with a Gibbs sampler:
  $P(\theta \mid \text{Data}) \propto P(\text{Data} \mid \theta) \cdot p(\theta)$
- For each sampled $\theta$, compute the lift $L$ using the formula above
- Compute the (0.05, 0.95) quantiles of the sampled $L$ values
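The last two steps (compute $L$ per draw, then take quantiles) can be sketched without the full latent-behavior model. The block below is a deliberate simplification and not the Chickering/Pearl Gibbs sampler: it draws each observed rate from an independent Beta posterior under a uniform prior, which needs no Gibbs iterations. The counts, the function name `lift_interval`, and the rule of skipping draws with a non-positive $R^0_{tw}$ are all assumptions made for illustration.

```python
import random

def lift_interval(n_c, y_c, n_tl, y_tl, n_tw, y_tw, m=2_000, seed=0):
    """(0.05, 0.95) quantiles of lift L over m posterior draws.

    n_* are group sizes and y_* responder counts for control (c),
    test losers (tl) and test winners (tw).
    """
    rng = random.Random(seed)
    lifts = []
    for _ in range(m):
        # Beta(1 + successes, 1 + failures) is the posterior of a rate
        # under a uniform Beta(1, 1) prior.
        r_c = rng.betavariate(1 + y_c, 1 + n_c - y_c)
        r_tl = rng.betavariate(1 + y_tl, 1 + n_tl - y_tl)
        r1_tw = rng.betavariate(1 + y_tw, 1 + n_tw - y_tw)
        w = rng.betavariate(1 + n_tw, 1 + n_tl)
        r0_tw = (r_c - (1 - w) * r_tl) / w
        if r0_tw <= 0:
            continue  # crude truncation: drop draws the formula cannot handle
        lifts.append(r1_tw / r0_tw - 1)
    lifts.sort()
    lo = lifts[int(0.05 * (len(lifts) - 1))]
    hi = lifts[int(0.95 * (len(lifts) - 1))]
    return lo, hi

# Made-up counts consistent with a true lift of 0.5.
lo, hi = lift_interval(n_c=1_000_000, y_c=15_000,
                       n_tl=500_000, y_tl=5_000,
                       n_tw=500_000, y_tw=15_000)
print(f"90% interval for L: ({lo:.3f}, {hi:.3f})")
```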
![Page 100: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha](https://reader031.vdocuments.us/reader031/viewer/2022030317/587155591a28ab8e5b8b5087/html5/thumbnails/100.jpg)
Ad Lift: Confidence Intervals
Gibbs sampler convergence may depend on the prior distribution:
- start with multiple (say 100) priors
- run them all in parallel using Spark.
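A sketch of the fan-out pattern, under stated assumptions: each task carries its own prior and seed, runs independently, and returns a summary that can be compared across priors. The sampler here is the same simplified independent-Beta draw, not a real Gibbs chain, and the data, prior ranges, and names (`DATA`, `run_chain`, `tasks`) are hypothetical. The driver uses the built-in `map`; the comment shows how the same function would be handed to Spark.

```python
import random

# Hypothetical (group size, responders) counts.
DATA = {"c": (1_000_000, 15_000), "tl": (500_000, 5_000), "tw": (500_000, 15_000)}

def run_chain(task):
    """One independent sampling run for one prior; returns the median lift."""
    alpha, beta, seed = task
    rng = random.Random(seed)

    def rate(n, y):
        # Posterior draw of a rate under a Beta(alpha, beta) prior.
        return rng.betavariate(alpha + y, beta + n - y)

    n_tl, n_tw = DATA["tl"][0], DATA["tw"][0]
    lifts = []
    for _ in range(500):
        w = rng.betavariate(alpha + n_tw, beta + n_tl)
        r0_tw = (rate(*DATA["c"]) - (1 - w) * rate(*DATA["tl"])) / w
        if r0_tw > 0:
            lifts.append(rate(*DATA["tw"]) / r0_tw - 1)
    lifts.sort()
    return lifts[len(lifts) // 2]

prior_rng = random.Random(42)
tasks = [(prior_rng.uniform(0.5, 5), prior_rng.uniform(0.5, 5), seed)
         for seed in range(100)]

# Local stand-in for the cluster. With a SparkContext `sc`, the equivalent is:
#   medians = sc.parallelize(tasks).map(run_chain).collect()
medians = list(map(run_chain, tasks))
spread = max(medians) - min(medians)
print(f"median lift across {len(medians)} priors varies by {spread:.4f}")
```

A small spread across priors suggests the result is not sensitive to the prior; a large one is a cue to look harder at convergence.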
![Page 102: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha](https://reader031.vdocuments.us/reader031/viewer/2022030317/587155591a28ab8e5b8b5087/html5/thumbnails/102.jpg)
Uses of Monte Carlo Simulations
- confidence intervals
- determine “sufficient” population sizes for reliably estimating
  - response rates
  - lift
- understand the effect of complex phenomena
- validate/verify analytical formulas
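Two of these uses can be illustrated together in a small sketch: simulate many populations of a given size, measure how much the estimated response rate scatters, and check the scatter against the textbook binomial formula $\sqrt{r(1-r)/n}$. The response rate, population sizes, and the name `rate_spread` are illustrative assumptions.

```python
import random
import statistics

def rate_spread(n, r=0.05, trials=200, seed=0):
    """Std. dev. of the estimated response rate across simulated populations of size n."""
    rng = random.Random(seed)
    estimates = [sum(rng.random() < r for _ in range(n)) / n
                 for _ in range(trials)]
    return statistics.stdev(estimates)

sizes = [500, 2_000, 8_000]
simulated = {n: rate_spread(n) for n in sizes}
analytic = {n: (0.05 * 0.95 / n) ** 0.5 for n in sizes}

for n in sizes:
    print(f"n={n:>5}  simulated={simulated[n]:.5f}  analytic={analytic[n]:.5f}")
```

Reading down the table shows how large $n$ must be before the estimate is tight enough, and the two columns agreeing is the validation of the formula.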
![Page 108: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha](https://reader031.vdocuments.us/reader031/viewer/2022030317/587155591a28ab8e5b8b5087/html5/thumbnails/108.jpg)
Complication 2:
Control contamination due to users with multiple cookies
![Page 109: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha](https://reader031.vdocuments.us/reader031/viewer/2022030317/587155591a28ab8e5b8b5087/html5/thumbnails/109.jpg)
![Page 110: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha](https://reader031.vdocuments.us/reader031/viewer/2022030317/587155591a28ab8e5b8b5087/html5/thumbnails/110.jpg)
Control Contamination due to Multiple Cookies
![Page 111: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha](https://reader031.vdocuments.us/reader031/viewer/2022030317/587155591a28ab8e5b8b5087/html5/thumbnails/111.jpg)
![Page 112: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha](https://reader031.vdocuments.us/reader031/viewer/2022030317/587155591a28ab8e5b8b5087/html5/thumbnails/112.jpg)
![Page 113: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha](https://reader031.vdocuments.us/reader031/viewer/2022030317/587155591a28ab8e5b8b5087/html5/thumbnails/113.jpg)
![Page 114: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha](https://reader031.vdocuments.us/reader031/viewer/2022030317/587155591a28ab8e5b8b5087/html5/thumbnails/114.jpg)
![Page 115: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha](https://reader031.vdocuments.us/reader031/viewer/2022030317/587155591a28ab8e5b8b5087/html5/thumbnails/115.jpg)
Cookie-Contamination Questions
- How does cookie contamination affect measured lift?
- Does the cookie distribution matter?
  - everyone has k cookies vs. an average of k cookies
- What is the influence of the control percentage?
- Simulations are the best way to understand this
![Page 120: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha](https://reader031.vdocuments.us/reader031/viewer/2022030317/587155591a28ab8e5b8b5087/html5/thumbnails/120.jpg)
Simulations for cookie-contamination
- A scenario is a combination of parameters:
  - $M$ = number of trials for this scenario, usually 10K-1M
  - $n$ = number of users, typically 10K-10M
  - $p$ = control percentage (usually 10-50%)
  - $k$ = cookie distribution, expressed as 1 : 100, or 1 : 70, 3 : 30
  - $r$ = (uncontaminated) control user response rate
  - $a$ = true lift, i.e. exposed user response rate = $r \cdot (1 + a)$
- A scenario file specifies a scenario in each row.
  - could be thousands of scenarios
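One trial of such a scenario might look like the sketch below. Several modelling choices are assumptions, not taken from the slides: `k` is read as "cookies per user : percent of users", each cookie is assigned to control independently with probability `p` (given as a fraction), every test cookie is treated as exposed, a user with any test cookie responds at rate $r(1+a)$, and a response is credited to one of the user's cookies chosen at random. The function name `measured_lift` is hypothetical.

```python
import random

def measured_lift(n, p, k, r, a, seed=0):
    """Measured lift for one trial; k maps cookies-per-user -> percent of users."""
    rng = random.Random(seed)
    counts, weights = list(k), list(k.values())
    cookies = {"test": 0, "control": 0}
    responses = {"test": 0, "control": 0}
    for _ in range(n):
        n_cookies = rng.choices(counts, weights=weights)[0]
        groups = ["control" if rng.random() < p else "test"
                  for _ in range(n_cookies)]
        for g in groups:
            cookies[g] += 1
        # Contamination: one test cookie exposes the user behind ALL of
        # their cookies, including the ones sitting in control.
        exposed = "test" in groups
        rate = r * (1 + a) if exposed else r
        if rng.random() < rate:
            responses[rng.choice(groups)] += 1
    test_rate = responses["test"] / cookies["test"]
    control_rate = responses["control"] / cookies["control"]
    return test_rate / control_rate - 1

clean = measured_lift(n=200_000, p=0.5, k={1: 100}, r=0.1, a=0.5)
contaminated = measured_lift(n=200_000, p=0.5, k={2: 100}, r=0.1, a=0.5)
print(f"one cookie each: {clean:.2f}   two cookies each: {contaminated:.2f}")
```

Under these assumptions the two-cookie case works out to $(1+a)/(1+a/2) - 1 = 0.2$ in expectation, well below the true lift of 0.5. Repeating the call $M$ times with different seeds gives the distribution for one scenario row.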
![Page 121: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha](https://reader031.vdocuments.us/reader031/viewer/2022030317/587155591a28ab8e5b8b5087/html5/thumbnails/121.jpg)
Scenario Simulations in Spark
![Page 122: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha](https://reader031.vdocuments.us/reader031/viewer/2022030317/587155591a28ab8e5b8b5087/html5/thumbnails/122.jpg)
![Page 123: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha](https://reader031.vdocuments.us/reader031/viewer/2022030317/587155591a28ab8e5b8b5087/html5/thumbnails/123.jpg)
![Page 124: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha](https://reader031.vdocuments.us/reader031/viewer/2022030317/587155591a28ab8e5b8b5087/html5/thumbnails/124.jpg)
![Page 125: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha](https://reader031.vdocuments.us/reader031/viewer/2022030317/587155591a28ab8e5b8b5087/html5/thumbnails/125.jpg)
![Page 126: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha](https://reader031.vdocuments.us/reader031/viewer/2022030317/587155591a28ab8e5b8b5087/html5/thumbnails/126.jpg)
![Page 127: Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasani and Ram Sriharsha](https://reader031.vdocuments.us/reader031/viewer/2022030317/587155591a28ab8e5b8b5087/html5/thumbnails/127.jpg)