Showcase: on segmentation importance for marketing campaign in retail using R and H2O

Wit Jakuczun, WLOG Solutions

Upload: wit-jakuczun

Post on 12-Apr-2017

Introduction

Agenda

- “At the corner” business case
- What is segmentation?
- How to build (optimal) segmentation models in H2O?
- How to combine segmentation and predictive modelling?
- Summary

Who am I?

- Job
  - Owner of the company WLOG Solutions
- Education
  - Mathematician (Warsaw University)
- My expertise:
  - Solving business problems with analytical solutions
  - Implementing and delivering optimization and predictive models
- Contact details:
  - email: [email protected]
  - WWW: www.wlogsolutions.com

If you want to follow me…

I have used:

- R version 3.3.2
- H2O version 3.10.0.8

Code available at WLOG’s GitHub space

- Direct link

First step

- Download the repository using the link and extract it into any folder
- Open retail-segmentation-based-marketing-campaign-in-r-and-h2o.Rproj in RStudio

The result

Packages installation

source("install_packages.R")

Important: all installation is done locally in the libs folder. Your R environment is not messed up!
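What install_packages.R does internally is not shown on the slides. As a rough sketch of how a project-local library can work in R, assuming a libs folder under the current working directory and an illustrative package list (both are assumptions, as is the helper name install_locally):

```r
# Sketch only: folder name, package list and helper name are assumptions,
# not the contents of install_packages.R.
lib_dir <- file.path(getwd(), "libs")
dir.create(lib_dir, showWarnings = FALSE, recursive = TRUE)

# Put the project library first on the search path for this session only.
.libPaths(c(lib_dir, .libPaths()))

# Install only the packages that are not yet present in the project library.
install_locally <- function(pkgs, repos = "https://cloud.r-project.org") {
  present <- rownames(installed.packages(lib.loc = lib_dir))
  missing_pkgs <- setdiff(pkgs, present)
  if (length(missing_pkgs) > 0) {
    install.packages(missing_pkgs, lib = lib_dir, repos = repos)
  }
  invisible(missing_pkgs)
}

# Example call (downloads from CRAN, so it is left commented out):
# install_locally(c("data.table", "pROC"))
```

Because the change to .libPaths() lasts only for the current session, the user's global library stays untouched.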

At the corner’s business case

What is At the corner?

At the corner is an analytically driven retail chain selling a wide variety of products.

What is At the corner’s business challenge?

At the corner would like to introduce a new product.

What is their business approach?

At the corner decided to go with an e-mail marketing campaign. To optimize campaign costs and customers’ comfort, they decided to carefully select the customers that would be contacted in the campaign.

What has already been done?

- A pilot campaign has been conducted and customers’ responses gathered
- An analytical table has been prepared

What is in the analytical table?

library(data.table)  # provides fread()
data_train <- fread("data/retail_train.csv")
colnames(data_train)

We have the following categories of variables:

- marketing - did we contact the customer before?
- purchased - did the customer purchase as a result of the pilot campaign?
- demographic - sex, age, income
- behavioural - what is the customer’s buying pattern?
  - basket of products
  - basket value
  - purchases in the nearest shop
  - mean distance to shops

What is our goal?

Score customers in data/retail_test.csv.

Our approach

We will build three types of models:

- (M1) Logistic regression with variables from the analytical table.
- (M2) Logistic regression with variables from the analytical table and a variable from a segmentation model based on behavioural variables.
- (M3) Local logistic regression models with variables from the analytical table for segments calculated by a segmentation model based on behavioural variables.

We will select the best one using the AUC measure.
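Since all three model types are compared by AUC, it may help to recall what the measure computes: the probability that a randomly chosen purchaser receives a higher score than a randomly chosen non-purchaser. Below is a minimal base R sketch using the rank (Mann-Whitney) formulation. The function name auc_rank and the toy vectors are illustrative and not part of the repository, which relies on h2o.auc and pROC instead.

```r
# AUC from ranks: tied scores receive average ranks.
auc_rank <- function(labels, scores) {
  stopifnot(length(labels) == length(scores), all(labels %in% c(0, 1)))
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  r <- rank(scores)
  # Sum of positive ranks minus its minimum possible value,
  # divided by the number of (positive, negative) pairs.
  (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Toy example: 3 of the 4 (purchaser, non-purchaser) pairs are ordered correctly.
labels <- c(0, 0, 1, 1)
scores <- c(0.1, 0.4, 0.35, 0.8)
auc_rank(labels, scores)  # 0.75
```

An AUC of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which puts the values reported later in the deck in context.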

First model - no segmentation

Most important parts:

- source("build_p2b_nosegmentation_model.R")
- What is interesting?
  - ?h2o.grid - fitting of model meta-parameters
  - find_best_model.R - find the best model according to the AUC measure
  - ?h2o.auc - internal H2O function for calculating AUC
  - pROC package
    - ?pROC::roc - ROC curve calculation
    - ?pROC::auc - AUC measure for ROC curve calculation
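The slides do not show the contents of build_p2b_nosegmentation_model.R, so the following is only a sketch of how a meta-parameter search with h2o.grid and model selection by AUC can look. It assumes a running local H2O instance, uses synthetic data in place of the analytical table, and tunes only the elastic-net mixing parameter alpha. The column names, the grid id and the choice of alpha values are all assumptions.

```r
library(h2o)
h2o.init(nthreads = -1)

# Synthetic stand-in for the analytical table; every column name is hypothetical.
set.seed(1)
n <- 1000
df <- data.frame(
  age          = runif(n, 18, 80),
  income       = rlnorm(n, 8, 0.5),
  basket_value = rexp(n, 1 / 50)
)
df$purchased <- factor(rbinom(n, 1, plogis((df$age - 50) / 20)))
train_hf <- as.h2o(df)

# Grid of logistic regressions (binomial GLM), one model per alpha value.
grid <- h2o.grid(
  algorithm      = "glm",
  grid_id        = "glm_alpha_grid",  # hypothetical id
  x              = c("age", "income", "basket_value"),
  y              = "purchased",
  training_frame = train_hf,
  family         = "binomial",
  lambda_search  = TRUE,
  nfolds         = 5,
  hyper_params   = list(alpha = c(0, 0.5, 1))
)

# Order the grid by AUC (best first) and fetch the top model.
sorted_grid <- h2o.getGrid("glm_alpha_grid", sort_by = "auc", decreasing = TRUE)
best_model  <- h2o.getModel(sorted_grid@model_ids[[1]])
best_auc    <- h2o.auc(best_model, xval = TRUE)  # cross-validated AUC
best_auc
```

The same grid and its models can then be inspected in Flow, since they live in the H2O instance rather than in the R session.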

First model - results

- Our baseline is AUC = 0.6460
- And what does Flow say?
  - http://localhost:54321/flow/index.html

Can we do better with segmentation?

What is segmentation?

What is your definition?

Naive definition

An unsupervised approach for discovering groups of similar objects according to some distance/similarity measure.

My (our) definition

Discovering latent variables that are strongly non-linear transformations of the input space. These transformations, being based on a metric on the input space, are too difficult for standard supervised algorithms to discover.

Why is segmentation difficult?

- Business perspective
  - It is almost impossible to formalize the requirements for a good segmentation in general.
  - But it is possible (next slides) to formalize the requirements for a good segmentation in predictive modelling.
- Technical perspective
  - The final segments depend on both the variables and the distance.
  - The number of segments is unknown and must be calculated from data or given by an oracle.

Things to be considered

- Popular algorithms (like k-means) are randomized
  - Repeat the segmentation N times.
  - Select the best segments using e.g. the within sum of squares metric
- and iterative
  - Allow enough iterations to be sure the algorithm has converged.
- Sometimes segment centres cannot be a mean
  - Can use more expensive medoid approaches
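The first two points can be illustrated with base R's kmeans, used here instead of H2O so that the sketch runs without a cluster. The data are synthetic and the column names are hypothetical.

```r
set.seed(123)

# Synthetic, standardized behavioural data with two natural groups.
x <- scale(cbind(
  basket_value = c(rnorm(100, 20, 5), rnorm(100, 80, 10)),
  distance     = c(rnorm(100, 2, 0.5), rnorm(100, 10, 2))
))

# Randomized: repeat the segmentation N times from different random starts.
n_rounds <- 20
fits <- lapply(seq_len(n_rounds), function(i) {
  kmeans(x, centers = 2, iter.max = 100, nstart = 1)
})

# Select the run with the smallest total within-cluster sum of squares.
wss  <- vapply(fits, function(f) f$tot.withinss, numeric(1))
best <- fits[[which.min(wss)]]

# Iterative: ifault is 0 when the default algorithm reports no problem,
# for example when it did not stop at the iteration limit.
best$ifault == 0
```

Calling kmeans with nstart = 20 performs the same repeat-and-keep-the-best loop internally. When a mean is not a meaningful centre, cluster::pam is one medoid-based alternative.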

How to measure goodness of segmentation methods?

- A very informative method is the silhouette
  - Only useful if we have the same distance.
    - For example, when choosing the number of segments.
- But we are in predictive modelling
  - Use the predictive power of the final models!
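As an illustration of the silhouette used for choosing the number of segments under one fixed distance, here is a small sketch with base R's kmeans and the cluster package (a recommended package shipped with standard R distributions). The data are synthetic and the range of candidate k values is an arbitrary choice.

```r
library(cluster)

set.seed(7)
# Three well-separated synthetic groups in two dimensions.
centres <- rbind(c(0, 0), c(10, 10), c(20, 0))
x <- do.call(rbind, lapply(1:3, function(j) {
  cbind(rnorm(50, centres[j, 1]), rnorm(50, centres[j, 2]))
}))

# One fixed (Euclidean) distance shared by all candidate segmentations.
d <- dist(x)

# Average silhouette width for each candidate number of segments.
ks <- 2:6
avg_sil <- vapply(ks, function(k) {
  km <- kmeans(x, centers = k, nstart = 20, iter.max = 100)
  mean(silhouette(km$cluster, d)[, "sil_width"])
}, numeric(1))
names(avg_sil) <- ks

# The candidate with the highest average silhouette width.
best_k <- ks[which.max(avg_sil)]
avg_sil
best_k
```

Silhouette widths lie between -1 and 1, and they are comparable across k only because every segmentation is scored against the same distance matrix d.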

What is good segmentation for a predictive model?

A good segmentation is one that significantly improves a predictive model quality measure (e.g. AUC).

How to build segmentation models in H2O and R?

What is available?

- H2O provides the k-means algorithm
- Tutorial is here

Let’s analyse the code (1)

- Check build_segmentation_models.R
  - For each number of segments in the given range cluster_cnts
  - Generate rounds segmentations and select the best one

Let’s analyse the code (2)

Main part - fitting the model:

segmentation_model <- h2o.kmeans(training_frame = training_frame,
                                 x = segmentation_vars,
                                 k = cluster_cnt,
                                 model_id = sprintf("segmentation_model_%s", cluster_cnt),
                                 init = "PlusPlus",
                                 standardize = TRUE)

Let’s analyse the code (2)

And scoring the segmentation model (check file predict_segmentation_models.R):

h2o.predict(segmentation_model, newdata = train_df)
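The following sketch ties fitting, selecting the best of several rounds, and scoring into one self-contained script. How the repository's scripts choose the best round is not shown on the slides, so the use of total within-cluster sum of squares here is an assumption carried over from the earlier slide. The synthetic data, the column names and the values of cluster_cnt and rounds are also assumptions. It requires a running local H2O instance.

```r
library(h2o)
h2o.init(nthreads = -1)

set.seed(1)
# Synthetic behavioural variables; column names are hypothetical.
df <- data.frame(
  basket_value = c(rnorm(200, 20, 5), rnorm(200, 80, 10)),
  distance     = c(rnorm(200, 2, 0.5), rnorm(200, 10, 2))
)
train_hf <- as.h2o(df)
segmentation_vars <- colnames(df)
cluster_cnt <- 2   # assumed number of segments
rounds <- 5        # assumed number of repetitions

# Fit the same k-means model several times with different seeds.
fits <- lapply(seq_len(rounds), function(i) {
  h2o.kmeans(training_frame = train_hf,
             x = segmentation_vars,
             k = cluster_cnt,
             init = "PlusPlus",
             standardize = TRUE,
             max_iterations = 100,
             seed = i)
})

# Keep the round with the smallest total within-cluster sum of squares.
wss <- vapply(fits, function(f) h2o.tot_withinss(f), numeric(1))
segmentation_model <- fits[[which.min(wss)]]

# Score: one predicted segment per row, attached as an extra column.
segments  <- h2o.predict(segmentation_model, newdata = train_hf)
scored_hf <- h2o.cbind(train_hf, segments)
head(scored_hf)
```

The same h2o.predict call applies to a test frame, which is how segments for new customers would be obtained before fitting or scoring predictive models.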

How to combine segmentation and predictive modelling?

Two approaches

- Use segment assignment as another predictor.
- Build local models for segments.
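To make the difference between the two approaches concrete, here is a sketch in base R (glm and kmeans rather than H2O, so it runs without a cluster). The data are synthetic and deliberately built so that the effect of age flips between two behavioural groups. All names are hypothetical and the numbers it prints are unrelated to the AUC values reported in this deck.

```r
set.seed(42)
n <- 600
customers <- data.frame(
  basket_value = c(rnorm(n / 2, 20, 5), rnorm(n / 2, 80, 10)),
  distance     = c(rnorm(n / 2, 2, 0.5), rnorm(n / 2, 10, 2)),
  age          = runif(n, 18, 80)
)
group <- rep(c(-1, 1), each = n / 2)  # hidden behavioural group
customers$purchased <- rbinom(n, 1, plogis(group * (customers$age - 50) / 15))

train_idx <- sample(n, 400)
train <- customers[train_idx, ]
test  <- customers[-train_idx, ]
behavioural <- c("basket_value", "distance")

# Segmentation on standardized behavioural variables (training data only).
mu  <- colMeans(train[, behavioural])
sdv <- apply(train[, behavioural], 2, sd)
standardize <- function(df) scale(df[, behavioural], center = mu, scale = sdv)
km <- kmeans(standardize(train), centers = 2, nstart = 10)

# Assign each row to the nearest training centre.
assign_segment <- function(df) {
  z <- standardize(df)
  d <- sapply(seq_len(nrow(km$centers)), function(j) {
    rowSums(sweep(z, 2, km$centers[j, ])^2)
  })
  factor(max.col(-d, ties.method = "first"),
         levels = seq_len(nrow(km$centers)))
}
train$segment <- assign_segment(train)
test$segment  <- assign_segment(test)

# Approach 1: segment as another predictor in one global model.
m2 <- glm(purchased ~ basket_value + distance + age + segment,
          data = train, family = binomial)
p_m2 <- predict(m2, newdata = test, type = "response")

# Approach 2: one local model per segment.
p_m3 <- numeric(nrow(test))
for (s in levels(train$segment)) {
  local_fit <- glm(purchased ~ basket_value + distance + age,
                   data = train[train$segment == s, ], family = binomial)
  rows <- test$segment == s
  p_m3[rows] <- predict(local_fit, newdata = test[rows, ], type = "response")
}

# Rank-based AUC for a quick comparison on the test set.
auc <- function(y, p) {
  r <- rank(p); n1 <- sum(y == 1)
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * sum(y == 0))
}
c(segment_as_predictor = auc(test$purchased, p_m2),
  local_models         = auc(test$purchased, p_m3))
```

In the first approach the segment only shifts the intercept, while local models let every coefficient differ by segment. That is why this synthetic setup, with its flipped age effect, favours the second approach.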

Segment as another predictor

- Check build_p2b_segmentation_model.R
- Most important parts:
  - Lines 45-49: building segmentation models
  - Lines 53-55: predict segmentation models
  - Lines 59-77: build predictive models with segments
    - Lines 61-62: assign segments to customers
    - Lines 64-74: select best model for given number of segments
  - Lines 81-93: select best model

Segment as another predictor - results

- Best number of segments is 2.
- We obtained AUC = 0.6470

Local models for segments

- Check build_p2b_segmentation_local_models.R
- Most important parts:
  - Lines 48-52: building segmentation models
  - Lines 55-57: predict segmentation models
  - Lines 61-83: build local models for segments
    - Line 63: assign segments to customers
    - Lines 67-79: build models for segments for different numbers of segments
  - Lines 87-105: predict local models for test data
  - Lines 107-132: select best models

Local models for segments - results

- Best number of segments is 2.
- We obtained AUC = 0.6512

Summary

Summary of results

- No segmentation was worst with AUC = 0.6460
- Segmentation as a predictor was second best with AUC = 0.6470
- Local models were best with AUC = 0.6512

Are the differences significant?

- Check compare_models.R
- Most important parts
  - One can test the significance of differences between ROC curves
  - We used DeLong’s test
- Conclusions
  - Adding segmentation as a predictor gives a significant improvement.
  - Local models give a significant improvement over segment as a predictor.
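The contents of compare_models.R are not reproduced on the slides. As a sketch of how two ROC curves computed on the same customers can be compared with DeLong's test, the pROC package offers roc.test. The data and the two score vectors below are synthetic and the names are hypothetical.

```r
library(pROC)

set.seed(1)
n  <- 500
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- rbinom(n, 1, plogis(x1 + x2))

# Two competing scores for the same observations.
score_a <- plogis(x1)        # uses only part of the signal
score_b <- plogis(x1 + x2)   # uses all of the signal

roc_a <- roc(response = y, predictor = score_a)
roc_b <- roc(response = y, predictor = score_b)

# Paired DeLong test: both curves come from the same observations.
res <- roc.test(roc_a, roc_b, method = "delong", paired = TRUE)
res$p.value
c(auc_a = as.numeric(auc(roc_a)), auc_b = as.numeric(auc(roc_b)))
```

A small p-value indicates that the difference between the two AUCs is unlikely to be due to sampling noise alone, which is the sense in which the conclusions above call an improvement significant.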

Thank you for your attention!