Customer-Base Analysis Using Aggregated Data (Or: The Joys of RCSS)

Kinshuk Jerath, Carnegie Mellon University

Peter S. Fader, Wharton/Univ. of Penn

Bruce G. S. Hardie, London Business School


Customer-Base Analysis

Faced with a customer transaction database, we may wish to determine:

The level of transactions we expect in future periods, both collectively and individually

Key characteristics of the cohort (e.g., degree of heterogeneity in behavior)

Formal financial metrics (such as “customer lifetime value”) to guide resource allocation decisions


Typical Data Structure

Models for customer-base analysis typically require access to individual-customer-level data


Long-Standing IT Challenges


Too-Much-Data Problem


Data Privacy Issues


Barriers to Disaggregate Data

Many firms may not (be able to) keep detailed individual-level records:

General weaknesses in the firm’s information-systems capabilities

Corporate information silos make data integration difficult

Wariness given high-profile stories of data loss

Data protection laws (with bans on trans-border data flows)

“Anonymizing” (and other statistical disclosure control methods) is costly and potentially ineffective


Key Challenges

What data formats are easy to create and maintain, and also privacy-preserving?

Can we adapt our “tried and true” models to accommodate these data limitations but still work well?

How much do we lose in the process?


Repeated Cross-Sectional Summary Data


Proof of Concept: Tuscan Lifestyles


Tuscan Lifestyles Data


How would we proceed if we had disaggregate data?


“Buy Till You Die” Model

Transaction Process (“Buy”)

While “alive”, a customer purchases randomly around his mean transaction rate

Transaction rates vary across customers

Dropout Process (“Till You Die”)

Each customer has an unobserved “lifetime”

Dropout rates vary across customers


“Shop Till You Drop”


The Pareto/NBD Model (Schmittlein, Morrison, and Colombo 1987)

Transaction Process: While active, the number of transactions made by a customer follows a Poisson process with transaction rate λ

Transaction rates are distributed gamma(r,α) across the population

Dropout Process: Each customer has an unobserved lifetime of length τ, which is exponentially distributed with dropout rate μ

Dropout rates are distributed gamma(s,β) across the population
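To make the two processes concrete, here is a minimal simulation sketch in Python (standard library only). It assumes gamma(r, α) and gamma(s, β) are parameterized by shape and rate, so the scale passed to `random.gammavariate` is 1/α (or 1/β), and that every customer is acquired at time 0; the function names are illustrative, not taken from the original study.

```python
import random


def simulate_customer(r, alpha, s, beta, T, rng):
    """Return one customer's transaction times in (0, T] under Pareto/NBD assumptions."""
    lam = rng.gammavariate(r, 1.0 / alpha)  # transaction rate λ ~ gamma(r, α), α a rate
    mu = rng.gammavariate(s, 1.0 / beta)    # dropout rate μ ~ gamma(s, β), β a rate
    # Unobserved lifetime τ ~ exponential(μ); guard against a numerically zero rate.
    tau = rng.expovariate(mu) if mu > 0 else float("inf")
    end = min(tau, T)  # purchases only happen while the customer is "alive"
    times = []
    if lam <= 0:
        return times
    t = rng.expovariate(lam)  # Poisson process: exponential inter-purchase gaps
    while t <= end:
        times.append(t)
        t += rng.expovariate(lam)
    return times


def simulate_panel(n, r, alpha, s, beta, T, seed=0):
    """Simulate n independent customers observed over (0, T]."""
    rng = random.Random(seed)
    return [simulate_customer(r, alpha, s, beta, T, rng) for _ in range(n)]


# Scenario 1 parameters used later in the study: 2500 customers, 104 weeks
panel = simulate_panel(2500, r=0.5, alpha=5, s=0.5, beta=5, T=104)
print(sum(len(times) for times in panel) / len(panel))  # mean transactions per customer
```

Seeding a private `random.Random` instance keeps each synthetic panel reproducible without touching the global generator.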

Astonishingly good fit and predictive performance


The Pareto/NBD works very well…

…given individual-level (disaggregate) data.


Pareto/NBD using RCSS data

Same assumptions as for the usual Pareto/NBD implementation

Calculate purchase probabilities over discrete intervals: P(X(t, t + 1) = x), P(X(t + 1, t + 2) = x), P(X(t + 2, t + 3) = x), etc.

Apply these to the RCSS histograms and use standard maximum likelihood estimation (MLE)

Parameter estimation is fast, stable, and robust

All of the usual Pareto/NBD diagnostics (e.g., “P(Alive)”) can be obtained from the parameter estimates
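The discrete-interval probabilities can be sketched as follows. Conditional on λ and μ, a customer acquired at time 0 records x purchases in the interval (t, t + τ] by (a) being alive throughout it, (b) dying inside it after exactly x purchases, or (c), for x = 0 only, having died before t. This Python sketch codes those three terms exactly but, in place of integrating analytically over the gamma heterogeneity distributions, it averages over Monte Carlo draws of (λ, μ). It further assumes a shape/rate gamma parameterization, a right-censored top bucket “x_max or more”, and that the log-likelihood is a sum of independent multinomial terms, one per histogram. All function names are hypothetical.

```python
import math
import random


def p_interval_given_rates(x, t, tau, lam, mu):
    """P(X(t, t + tau) = x | λ, μ) for a customer acquired (and alive) at time 0."""
    surv_t = math.exp(-mu * t)  # P(still alive at the start of the interval)
    # (a) Alive for the whole interval: Poisson(λτ) pmf, built iteratively to avoid overflow.
    pois = math.exp(-lam * tau)
    for j in range(x):
        pois *= lam * tau / (j + 1)
    alive_all = surv_t * math.exp(-mu * tau) * pois
    # (b) Dies inside the interval after exactly x purchases; needs the Poisson(z) CDF at x.
    z = (lam + mu) * tau
    cdf, term = 0.0, math.exp(-z)
    for j in range(x + 1):
        cdf += term
        term *= z / (j + 1)
    dies_inside = surv_t * (mu / (lam + mu)) * (lam / (lam + mu)) ** x * max(0.0, 1.0 - cdf)
    # (c) Already dead before t: contributes only to the zero bucket.
    dead_before = (1.0 - surv_t) if x == 0 else 0.0
    return alive_all + dies_inside + dead_before


def draw_rates(r, alpha, s, beta, n_draws=2000, seed=1):
    """Monte Carlo draws of (λ, μ) from gamma(r, α) x gamma(s, β), shape/rate form."""
    rng = random.Random(seed)
    return [(rng.gammavariate(r, 1 / alpha), rng.gammavariate(s, 1 / beta))
            for _ in range(n_draws)]


def interval_pmf(t, tau, draws, x_max):
    """[P(X=0), ..., P(X=x_max-1), P(X>=x_max)], averaged over the (λ, μ) draws."""
    pmf = [0.0] * (x_max + 1)
    for lam, mu in draws:
        for x in range(x_max):
            pmf[x] += p_interval_given_rates(x, t, tau, lam, mu)
    pmf = [p / len(draws) for p in pmf]
    pmf[x_max] = max(1.0 - sum(pmf[:x_max]), 1e-300)  # right-censored top bucket
    return pmf


def rcss_log_likelihood(histograms, draws, x_max):
    """Sum of multinomial log-likelihoods over histograms given as (t, tau, counts),
    where counts has x_max + 1 entries and the last one is the "x_max or more" bucket."""
    ll = 0.0
    for t, tau, counts in histograms:
        pmf = interval_pmf(t, tau, draws, x_max)
        ll += sum(n * math.log(p) for n, p in zip(counts, pmf) if n > 0)
    return ll
```

To estimate (r, α, s, β), one would maximize `rcss_log_likelihood` with a numerical optimizer. Because redrawing (λ, μ) makes a Monte Carlo objective jumpy in the parameters, a deterministic quadrature over λ and μ (or analytical expressions) would suit actual MLE better than this sketch.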


Model Fit


Do We Need All Five Years of Data?

Calibrate the model on years 1-3 only, predict for years 4 and 5.


Customer-Base Analysis Using Repeated Cross-Sectional Summary (RCSS) Data

Under more general conditions, what is the “information loss” from aggregating data?

Under what conditions can a model built using aggregated data accurately mimic its individual-level counterpart?

How much aggregated data is required to do this job well?


Reminder – RCSS Data


Research Design

Manipulate the four parameters of the Pareto/NBD:

r, s = 0.5, 1.0, 1.5; α, β = 5, 10, 15

We have 3⁴ = 81 “worlds”

For each “world,” simulate 104 weeks of data for five synthetic panels of 2500 customers (first 78 weeks for calibration, last 26 weeks for holdout)

Fit the Pareto/NBD model to the raw transaction data – obtain disaggregate LL and parameters

“Backward-looking” (“Chopping it up”) analysis

“Forward-looking” (“Build as you go”) analysis
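The research design can be laid out as a grid. This sketch (Python standard library; names are illustrative) enumerates the 3⁴ = 81 parameter combinations with `itertools.product` and encodes the 78-week calibration / 26-week holdout split; the simulation and fitting steps themselves are left out.

```python
from itertools import product

SHAPE_LEVELS = (0.5, 1.0, 1.5)    # levels for r and s
ALPHA_BETA_LEVELS = (5, 10, 15)   # levels for α and β

# One dict per "world"; the first one is Scenario 1 (r = 0.5, α = 5, s = 0.5, β = 5).
WORLDS = [
    {"r": r, "alpha": a, "s": s, "beta": b}
    for r, a, s, b in product(SHAPE_LEVELS, ALPHA_BETA_LEVELS, SHAPE_LEVELS, ALPHA_BETA_LEVELS)
]

N_PANELS, N_CUSTOMERS = 5, 2500
TOTAL_WEEKS, CALIBRATION_WEEKS = 104, 78  # holdout = the last 26 weeks


def split_calibration_holdout(times):
    """Split one customer's transaction times into calibration and holdout parts."""
    calibration = [t for t in times if t <= CALIBRATION_WEEKS]
    holdout = [t for t in times if CALIBRATION_WEEKS < t <= TOTAL_WEEKS]
    return calibration, holdout


print(len(WORLDS), WORLDS[0])
```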


“Backward-Looking” Analysis

How many cross-sectional summaries should be created? (How to “chop it up”?)

One 78-week histogram? Two 39-week histograms? Three 26-week histograms? … Six 13-week histograms?

For each of the six aggregation conditions, fit the Pareto/NBD to the resulting RCSS data, and:

1. Compare RCSS parameter estimates to the disaggregate benchmarks

2. Evaluate the disaggregate LL functions using the RCSS parameter estimates and compare to the disaggregate benchmark LL

3. Evaluate the fit of the predicted histograms from RCSS and disaggregate parameter estimates to the actual holdout histograms
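One way to “chop up” the calibration period is sketched below: split (0, 78] into k equal periods and, for each period, count how many customers made 0, 1, 2, … transactions in it. Zero-transaction customers are included so that every histogram sums to the panel size; that, and the half-open (lo, hi] period convention, are assumptions of this sketch rather than details from the study. The function names are hypothetical.

```python
from collections import Counter


def equal_edges(total_weeks, k):
    """k equal-length periods covering (0, total_weeks]; returns the k + 1 edges."""
    return [total_weeks * i / k for i in range(k + 1)]


def rcss_histograms(panel, edges):
    """One Counter per period (lo, hi]: number of customers by transaction count."""
    histograms = []
    for lo, hi in zip(edges, edges[1:]):
        counts = Counter(sum(1 for t in times if lo < t <= hi) for times in panel)
        histograms.append(counts)
    return histograms


# The six backward-looking conditions: 78, 39, 26, 19.5, 15.6 and 13 weeks per histogram
for k in range(1, 7):
    print(k, equal_edges(78, k))
```

Because only these counts are kept, the individual transaction times can be discarded once the histograms are built.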


Scenario 1: r = 0.5, α = 5, s = 0.5, β = 5

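# Hist.   Avg. LL    Dev.   RMSE   r      α      s      β
1         -23452.4   3.1%   37.1   0.37   3.83   1.65   40.01
2         -22813.3   0.3%   5.4    0.40   4.63   0.65   11.65
3         -22759.0   0.0%   5.0    0.41   4.48   0.57   7.89
4         -22759.4   0.0%   4.9    0.41   4.56   0.58   8.54
5         -22767.8   0.1%   5.0    0.41   4.58   0.56   8.18
6         -22754.9   0.0%   5.0    0.46   4.79   0.50   5.32
Disagg.   -22748.1   –      5.7    0.44   4.85   0.56   7.35

Avg. LL: disaggregate LL evaluated at the estimates; Dev.: gap between Avg. LL and the disaggregate benchmark LL, as a percentage of the benchmark; RMSE: fit of the predicted to the actual holdout histograms.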
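The table does not spell out how Dev. and RMSE are computed. Reading Dev. as the shortfall of Avg. LL relative to the disaggregate benchmark, in percent of the benchmark, reproduces the column, as this small Python check shows; the RMSE helper, taken bucket by bucket over the histograms, is an assumed definition added for completeness, and both function names are hypothetical.

```python
import math


def ll_deviation_pct(ll_rcss, ll_disagg):
    """Shortfall of an RCSS-based LL relative to the disaggregate LL, in percent."""
    return 100.0 * (ll_disagg - ll_rcss) / abs(ll_disagg)


def histogram_rmse(predicted, actual):
    """Assumed definition: RMSE between predicted and actual counts, bucket by bucket."""
    if len(predicted) != len(actual):
        raise ValueError("histograms must have the same buckets")
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


# Avg. LL values from the Scenario 1 backward-looking table
disagg_ll = -22748.1
for n_hist, ll in [(1, -23452.4), (2, -22813.3), (5, -22767.8)]:
    print(n_hist, f"{ll_deviation_pct(ll, disagg_ll):.1f}%")  # 3.1%, 0.3%, 0.1%
```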


“Forward-Looking” Analysis

How many quarterly (13-week) histograms are required? (How many to “build as you go”?)

One (total 13 weeks)? Two (total 26 weeks)? Three (total 39 weeks)? … Six (total 78 weeks)?

For each of the six “number of histogram” conditions, fit the Pareto/NBD to the resulting RCSS data, and:

1. Compare RCSS parameter estimates to the disaggregate benchmarks

2. Evaluate the disaggregate LL functions on the full data using the RCSS parameter estimates and compare to the disaggregate benchmark LL

3. Evaluate the fit of the predicted histograms from RCSS and disaggregate parameter estimates to the actual holdout histograms
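The forward-looking conditions use different period boundaries from the backward-looking ones. In this sketch (names illustrative), condition n takes the first n consecutive 13-week quarters. With n = 6 the edges coincide with the backward-looking split into six 13-week histograms, which is consistent with both Scenario 1 tables reporting the same parameter estimates (0.46, 4.79, 0.50, 5.32) in their sixth rows.

```python
QUARTER_WEEKS = 13


def forward_edges(n_quarters, quarter=QUARTER_WEEKS):
    """Edges of the first n_quarters consecutive quarterly histograms."""
    return [quarter * i for i in range(n_quarters + 1)]


def backward_edges(total_weeks, k):
    """Edges of k equal-length histograms covering (0, total_weeks]."""
    return [total_weeks * i / k for i in range(k + 1)]


for n in range(1, 7):
    # Condition n uses 13 * n weeks of data in total
    print(n, forward_edges(n)[-1], forward_edges(n))

# Six quarters "built as you go" cover exactly the same periods as six 13-week "chops"
assert forward_edges(6) == backward_edges(78, 6)
```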


Scenario 1: r = 0.5, α = 5, s = 0.5, β = 5

# Qtrs.   Avg. LL    Dev.   RMSE    r      α      s      β
1         -23411.8   2.9%   167.1   0.45   4.20   3.28   40.01
2         -22761.9   0.1%   19.5    0.49   4.91   0.37   2.67
3         -22756.5   0.0%   17.0    0.49   4.88   0.44   3.49
4         -22749.8   0.0%   7.2     0.45   4.75   0.49   5.22
5         -22750.2   0.0%   4.8     0.46   4.83   0.49   4.93
6         -22750.1   0.0%   5.0     0.46   4.79   0.50   5.32
Disagg.   -22748.1   –      5.7     0.44   4.85   0.56   7.37


Summary of Results

Using three or more quarters always provides the same performance as disaggregate data in terms of:

Parameter recovery

In-sample LL

Out-of-sample predictions


Conclusions

We can estimate the Pareto/NBD using RCSS data; the findings from the Tuscan Lifestyles study are generalizable

Useful/interesting model diagnostics still emerge – even in the absence of any individual-level data

Three cross-sections are generally sufficient
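As an illustration of diagnostics that need nothing but the parameter estimates, the sketch below computes two cohort-level quantities under the Pareto/NBD assumptions (shape/rate gamma parameterization assumed): the expected share of customers still alive at time t, E[exp(−μt)] = (β/(β + t))^s, and the expected number of transactions per customer in (0, t]. The individual-level “P(Alive)” mentioned earlier is not reproduced here, and the function names are illustrative.

```python
import math


def share_alive(t, s, beta):
    """E[exp(-μ t)] with μ ~ gamma(s, β): expected fraction of the cohort alive at t."""
    return (beta / (beta + t)) ** s


def expected_transactions(t, r, alpha, s, beta):
    """E[X(0, t)] per customer under the Pareto/NBD (λ ~ gamma(r, α), μ ~ gamma(s, β))."""
    if math.isclose(s, 1.0):
        # Limit of the general formula as s -> 1
        return (r * beta / alpha) * math.log(1.0 + t / beta)
    return (r * beta) / (alpha * (s - 1.0)) * (1.0 - (beta / (beta + t)) ** (s - 1.0))


# Disaggregate estimates for Scenario 1 (forward-looking table)
r, alpha, s, beta = 0.44, 4.85, 0.56, 7.37
for week in (26, 52, 78, 104):
    print(week, round(share_alive(week, s, beta), 3),
          round(expected_transactions(week, r, alpha, s, beta), 2))
```

Both formulas come from averaging the individual-level quantities exp(−μt) and (λ/μ)(1 − exp(−μt)) over the gamma mixing distributions, so RCSS-based estimates plug in exactly as disaggregate ones would.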


Other Desirable Properties

Just the percentage of total customers in each bucket is sufficient – we don’t even need the actual counts

Data can be “aperiodic” (they just have to be “repeated”)

Histograms can be of different time lengths, e.g., 3-month + 6-month + 4-month

Histograms can be missing, e.g., Qtr. 1, –, Qtr. 3, Qtr. 4

Data management/storage benefits
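A short argument for the first property, and for why irregular or missing histograms pose no difficulty, assuming the RCSS likelihood is a sum of multinomial log-likelihoods over whichever histograms are available: let histogram k in the index set K cover (t_k, t_k + τ_k], let n_kx be the number of customers with x transactions in it, π_kx = n_kx / N the corresponding share of a cohort of N customers, and θ = (r, α, s, β).

```latex
\ell(\theta)
  = \sum_{k \in \mathcal{K}} \sum_{x} n_{kx} \log P\bigl(X(t_k, t_k + \tau_k) = x \mid \theta\bigr)
  = N \sum_{k \in \mathcal{K}} \sum_{x} \pi_{kx} \log P\bigl(X(t_k, t_k + \tau_k) = x \mid \theta\bigr)
```

Because N > 0 multiplies the whole expression, the shares π_kx alone determine the maximizing θ (N still matters for standard errors, since the curvature of ℓ scales with N). Nothing in the expression requires the τ_k to be equal, the t_k to be evenly spaced, or K to contain every period, so 3-, 6- and 4-month histograms, or a set with Qtr. 2 missing, enter the same way.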


What Would Managers (and Customers) Rather Use?
