CausalImpact - tests using the R package

- Christophe Moinet - [email protected]

Upload: christophe-moinet

Post on 21-Jan-2018


Page 2: CausalImpact - tests using the R package

Introduction

Data

Choice of control stores

Run of CausalImpact

Stability of the control stores

Conclusion


Page 3: CausalImpact - tests using the R package

CausalImpact is a new open-source R package for estimating causal effects in time series, using a Bayesian model (Bayesian Structural Time Series methods). It is developed by Google teams.

This test with CausalImpact uses daily store sales with a causal event spanning several periods.

The objective of this test is to compute the impact of this causal using:

◦ a set of stores impacted by this causal (test stores)

◦ a set of stores not impacted (control stores)

The data used for this test are simulated; they are not real data.

This document describes this test.


Page 4: CausalImpact - tests using the R package

Dimensions :

◦ Stores : around 300

◦ Days of the year : 365

◦ Causal : 1

◦ Product line : 1

Facts :

◦ Sales, by Store, Day, for this product line

◦ Causal event (Yes/No) by Store, day for this product line

◦ Reminder : Data used in this test have been simulated and are not real.
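A data set with these dimensions and facts could be laid out as one row per store and day. The sketch below is only an illustration: the store split, the start day of the causal event, the baseline level and the uplift are all assumptions, not the values used to produce the slides.

```r
set.seed(2018)
n_stores <- 300   # "around 300" stores
n_days   <- 365   # days of the year

# Long format: one row per store and day, for the single product line.
store_sales <- expand.grid(day = seq_len(n_days), store = seq_len(n_stores))

# Assumptions for illustration only: the first 50 stores receive the causal
# event, and it is active from day 251 until the end of the year.
store_sales$causal <- store_sales$store <= 50 & store_sales$day >= 251

# Hypothetical sales: a noisy baseline plus an arbitrary uplift while the
# causal event is active.
store_sales$sales <- 92 + 25 * store_sales$causal +
  rnorm(nrow(store_sales), sd = 8)

str(store_sales)
```

The `causal` column plays the role of the Yes/No fact, and `sales` is the measure by store and day.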


Page 5: CausalImpact - tests using the R package

Sales data are averaged by day to build:

◦ A control set of stores that are not impacted by the studied causal

◦ A test set of stores that are impacted by the causal

Statistics:

◦ 50 stores are impacted by the studied causal from September to December (test stores)

◦ 250 stores are not impacted from January to December (control stores)

Sales:


Average sales

Stores     Pre.period   Post.period
Test       92           126
Control    93.5         105.5
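The averaging by day and the pre/post table on this slide can be sketched in base R. The input below is a tiny made-up data set (4 stores, 6 days), and the column names `group` and `period` are hypothetical; only the aggregation pattern is the point.

```r
# Tiny hypothetical input: stores 1-2 are test stores, stores 3-4 are control
# stores, and the causal event is active on days 4 to 6.
store_sales <- expand.grid(day = 1:6, store = 1:4)
store_sales$group  <- ifelse(store_sales$store <= 2, "Test", "Control")
store_sales$period <- ifelse(store_sales$day >= 4, "Post.period", "Pre.period")
store_sales$sales  <- 90 + store_sales$day +
  ifelse(store_sales$group == "Test" & store_sales$day >= 4, 20, 0)

# Average across stores for each day: one daily series per group
# (rows = days, columns = Control / Test).
daily <- tapply(store_sales$sales,
                list(store_sales$day, store_sales$group), mean)

# Average sales by group and period, same layout as the table on this slide.
avg_table <- tapply(store_sales$sales,
                    list(store_sales$group, store_sales$period), mean)

print(daily)
print(avg_table)
```

The two columns of `daily` are the kind of series that would then be passed to CausalImpact, with the test series as the response and the control series as the covariate.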

Page 6: CausalImpact - tests using the R package

Choosing the control stores is always a tricky part.

=> But it is essential for the impact calculation.

Issues appeared with the first set of control stores I prepared, because their sales trend was too different from the sales trend of the test stores.

These issues are detected using CausalImpact.


Page 7: CausalImpact - tests using the R package

Pre.period (pre-intervention period) is the period without causal effect.

Post.period is the period with causal effect.

During the pre.period, the differences between actual and predicted sales are not randomly distributed around the horizontal zero axis.

This points to an issue with the control data.


Page 8: CausalImpact - tests using the R package

This issue is more visible if we try to predict sales during the second part of the pre.period (periods 151 to 250).

A negative effect is shown, although no causal event occurs in that period.

=> The control data need to be cleaned.

Running CausalImpact on the pre.period in this way is really helpful for validating control stores.
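This check amounts to a placebo run: fit on periods 1 to 150 and predict periods 151 to 250, where no causal event occurs. The sketch below uses invented daily averages in which the control series deliberately drifts away from the test series after day 150; `CausalImpact()`, `pre.period` and `post.period` come from the package, everything else is an assumption for illustration.

```r
library(CausalImpact)

set.seed(1)
days   <- 1:250                         # pre-intervention days only
weekly <- 5 * sin(2 * pi * days / 7)    # hypothetical shared weekly pattern

# Hypothetical daily averages: after day 150 the control series grows faster
# than the test series, mimicking a badly matched control set.
control <- 93 + weekly + 0.05 * pmax(days - 150, 0) + rnorm(250, sd = 1)
test    <- 92 + weekly + rnorm(250, sd = 1)

# CausalImpact expects the response in the first column, covariates after it.
series <- cbind(test = test, control = control)

# Placebo run: nothing happens in days 151-250, so the estimated effect
# should be close to zero if the control stores are well matched.
placebo <- CausalImpact(series,
                        pre.period  = c(1, 150),
                        post.period = c(151, 250))

summary(placebo)   # a clearly non-zero effect points to a control issue
plot(placebo)      # original, pointwise and cumulative panels
```

With a well-matched control series, the pointwise differences in the second panel should scatter around zero and the cumulative panel should stay flat.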


[Chart panels: difference between actual and estimated sales; cumulative effect]

Page 9: CausalImpact - tests using the R package

To correct this issue, I simulated a better set of control stores.

Now, no impact appears in periods 151 to 250.

The sales trend of these control stores matches that of the test stores more closely.

The earlier issues are fixed.

This approach could be used in real projects to validate the pre.period data.


[Chart panels: original sales; difference between actual and estimated sales; cumulative effect]

Page 10: CausalImpact - tests using the R package

CausalImpact shows a significant impact on sales: relative effect = 20%


Additional basic statistics (average sales):

Stores     Pre.period   Post.period
Test       92           126
Control    93.5         105.5

[Chart panels: original sales; difference between actual and estimated sales; cumulative effect]
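As a rough sanity check, the average sales on this slide can be turned into a naive estimate by assuming that, without the causal, test stores would have grown at the same rate as control stores. This back-of-the-envelope calculation is not what CausalImpact computes (there is no time-series model and no uncertainty interval); it only shows that the 20% figure is of a plausible size.

```r
# Average sales from the table on this slide.
test_pre <- 92;   test_post <- 126
ctrl_pre <- 93.5; ctrl_post <- 105.5

# Naive counterfactual: test stores grow like the control stores.
expected_test_post <- test_pre * (ctrl_post / ctrl_pre)

# Relative effect of the causal under that assumption.
naive_rel_effect <- test_post / expected_test_post - 1

round(100 * naive_rel_effect, 1)   # 21.4
```

The naive figure of about 21% is close to the 20% relative effect reported by CausalImpact.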

Page 11: CausalImpact - tests using the R package


Page 12: CausalImpact - tests using the R package

◦ I developed an R process that simulates the value of the causal impact for 500 samples of control stores.

◦ Each sample contains 80% of the control stores, selected at random.

◦ The algorithm is shown on the next slide.


Page 13: CausalImpact - tests using the R package

Total set of stores

◦ Test branch: selection of the test stores, then averaging by period (day)

◦ Control branch: selection of the control stores, then random selection of 80% of control stores, then averaging by period (day)

CausalImpact (loops 1 to 500)

Set of 500 causal impact computations
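The flow on this slide could be sketched as the R function below. It is not the author's code: the function name `relative_effects`, its arguments and the simulated input are hypothetical, and reading the result from `impact$summary["Average", "RelEffect"]` reflects my understanding of the package's summary table.

```r
library(CausalImpact)

# Hypothetical helper. Assumed inputs (not from the slides):
#   test_sales    : matrix, days x test stores
#   control_sales : matrix, days x control stores
relative_effects <- function(test_sales, control_sales,
                             pre.period, post.period,
                             n_samples = 500, fraction = 0.8, niter = 1000) {
  test_mean <- rowMeans(test_sales)            # average test sales by day
  n_control <- ncol(control_sales)
  n_keep    <- round(fraction * n_control)

  vapply(seq_len(n_samples), function(i) {
    keep <- sample(n_control, n_keep)          # random 80% of control stores
    control_mean <- rowMeans(control_sales[, keep, drop = FALSE])
    impact <- CausalImpact(cbind(test_mean, control_mean),
                           pre.period  = pre.period,
                           post.period = post.period,
                           model.args  = list(niter = niter))
    impact$summary["Average", "RelEffect"]     # relative effect as a fraction
  }, numeric(1))
}

# Small simulated example; 10 samples instead of 500 to keep it fast.
set.seed(7)
n_days <- 120
weekly <- 5 * sin(2 * pi * seq_len(n_days) / 7)
control_sales <- replicate(20, 93 + weekly + rnorm(n_days))
test_sales    <- replicate(5,  92 + weekly + rnorm(n_days))
test_sales[91:120, ] <- test_sales[91:120, ] * 1.2   # simulated 20% uplift

effects <- relative_effects(test_sales, control_sales,
                            pre.period = c(1, 90), post.period = c(91, 120),
                            n_samples = 10)

hist(100 * effects, xlab = "RelEffect (%)", ylab = "Frequency", main = "")
```

A narrow histogram suggests that the estimated impact does not depend much on which control stores are included.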

Page 14: CausalImpact - tests using the R package

As a result:

- We get 500 different values of the percent impact.

- We can show the distribution of these impacts.

- 20% is the value of the impact from the whole control sample (see slide 10). It is consistent with this distribution of impacts.

- The variation of the impacts is quite low, which is good for the stability of the impact value.

- But this could be different in real cases.

Chart legend:

- RelEffect is the causal impact (in %)

- Frequency is the number of simulations


Page 15: CausalImpact - tests using the R package

This package is really great:

◦ The general methodology can be understood and explained easily. However, the statistical methodology is a bit more difficult.

◦ Output charts are really easy to use and understand.

◦ The detailed report is helpful for a non-statistician.

◦ It is helpful for validating any set of control stores.

◦ It is possible to go deeper than the standard use.

◦ It is easy to use as a step of an R process: it can be run in a loop to test different stores, different clusters, different causals.


Page 16: CausalImpact - tests using the R package

Results must be validated (significance of the effect). A good way to validate the results is to use cross-validation.

And:

◦ The difference between actual and estimated values must be checked for the pre-intervention period, as shown in this document.

◦ Any issue found in this check calls for a validation of the data. Control stores might be biased.


Page 17: CausalImpact - tests using the R package


Thank you

Feel free to contact me:

Christophe Moinet [email protected] +33 6 58 00 33 36