lecture 5: customer lifecycle management – classification ... · classification problems...

47
Business Data Analytics Lecture 5: Customer Lifecycle Management – Classification Problems MTAT.03.319 The slides are available under creative common license. The original owner of these slides is the University of Tartu. 1

Upload: others

Post on 16-Oct-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Lecture 5: Customer Lifecycle Management – Classification ... · Classification Problems MTAT.03.319 The slides are available under creative common license. The original owner of

Business Data Analytics

Lecture 5: Customer Lifecycle Management –Classification Problems

MTAT.03.319

The slides are available under creative common license. The original owner of these slides is the University of Tartu.

1

Page 2: Lecture 5: Customer Lifecycle Management – Classification ... · Classification Problems MTAT.03.319 The slides are available under creative common license. The original owner of

Outline

• Classification and its applications to CLM

• Classification with logistic regression

• Classification with decision trees

• Black-box classification models

• Measuring the quality of classification models

• Pitfalls: class imbalance & overfitting

2

Page 3: Lecture 5: Customer Lifecycle Management – Classification ... · Classification Problems MTAT.03.319 The slides are available under creative common license. The original owner of

Unsupervised learning Supervised learning

Feature 1

Fea

ture

1

Fea

ture

2

Feature 1

3

Page 4: Lecture 5: Customer Lifecycle Management – Classification ... · Classification Problems MTAT.03.319 The slides are available under creative common license. The original owner of

Raw Dataset

Regressor

/

Class

10

55

23

12

Features

Sam

ples

Raw Dataset

Classifier

Features

Sam

ples

Quantity

Supervised LearningRegression vs. classification

Sample

Sample

Class probability (confidence)

0.78

38Preprocessing

Preprocessing Training

Training Prediction

Prediction

Note: A classifier can be trained to classify each sample into more than two classes.4

Page 5: Lecture 5: Customer Lifecycle Management – Classification ... · Classification Problems MTAT.03.319 The slides are available under creative common license. The original owner of

Classification in Customer Lifecycle Management

Lead propensity Cross-sell/up-sell propensity

Response modeling / buying

propensity5

Page 6: Lecture 5: Customer Lifecycle Management – Classification ... · Classification Problems MTAT.03.319 The slides are available under creative common license. The original owner of

Applying classification

• We are a charity. We have a database containing 100K donors who have not donated in the past 12 months. We know their basic demographics, address and how much they have donated in past (and when).

• Sending a Christmas post card asking for donation costs 60 cents/piece. When we mail out, the average donation (when there is one) comes at about 2 euros.

• Should we send a (postal) mail to all 100K donors?

• How can a classifier help me answer this question?

6

Page 7: Lecture 5: Customer Lifecycle Management – Classification ... · Classification Problems MTAT.03.319 The slides are available under creative common license. The original owner of

Classification in Customer Lifecycle Management

Lead propensity Fault/Complaint prediction

Churn propensity

Cross-sell/up-sell propensity

Response modeling / buying

propensity

Win-back propensity

7

Page 8: Lecture 5: Customer Lifecycle Management – Classification ... · Classification Problems MTAT.03.319 The slides are available under creative common license. The original owner of

event-basedsubscription-based

What is churn?

8

Page 9: Lecture 5: Customer Lifecycle Management – Classification ... · Classification Problems MTAT.03.319 The slides are available under creative common license. The original owner of

Why care?

For most companies, the customer acquisition cost (cost of acquiring a new customer) is multiple times higher than the cost

of retaining an existing customer

Bringing a new customer to the level of profitability as an old customer takes time

Churners/switchers often can be avoided with proactive measures

9

Page 10: Lecture 5: Customer Lifecycle Management – Classification ... · Classification Problems MTAT.03.319 The slides are available under creative common license. The original owner of

How to decrease churn?

Long-termTactical intervention

Short-termOperational intervention

10

Page 11: Lecture 5: Customer Lifecycle Management – Classification ... · Classification Problems MTAT.03.319 The slides are available under creative common license. The original owner of

How to decrease short-term churn?

Recurrent payments (automatic subscription renewal)

Reminder emails (e.g. account expiration)

Providing extra services and knowledge

(e.g. figure out where clients have troubles and provide help with that)

Use scoring model to determine the intervention type and moment

Use scoring model to determine the intervention type and moment

11

Page 12: Lecture 5: Customer Lifecycle Management – Classification ... · Classification Problems MTAT.03.319 The slides are available under creative common license. The original owner of

is not regression!*

Logistic regression

it is classification

* it is called so due to its similarity to linear regression12

Page 13: Lecture 5: Customer Lifecycle Management – Classification ... · Classification Problems MTAT.03.319 The slides are available under creative common license. The original owner of

means that y is binary: (0,1)

Binary logistic regression

13

Page 14: Lecture 5: Customer Lifecycle Management – Classification ... · Classification Problems MTAT.03.319 The slides are available under creative common license. The original owner of

Binary logistic regression

• sigmoid means S-shaped

• also known as squashing function since it maps the line to [0,1],

• which is necessary if the output needs to be interpreted as probability

14

Page 15: Lecture 5: Customer Lifecycle Management – Classification ... · Classification Problems MTAT.03.319 The slides are available under creative common license. The original owner of

Binary logistic regression

“0”

“1”

If we threshold the output at 0.5, we create

a decision rule of the form

15

Page 16: Lecture 5: Customer Lifecycle Management – Classification ... · Classification Problems MTAT.03.319 The slides are available under creative common license. The original owner of

logit <- glm(data=train, as.factor(danger)~waiting, family='binomial')test$predictions_probability <- predict(newdata=test, logit, type = 'response')test$predictions_binary <- ifelse(test$predictions_probability<=0.5, 0, 1)table(real=test$danger, predictions=test$predictions_binary)

predicted valuereal 0 1

0 34 01 2 64

Binary logistic regression: example in R

16

Page 17: Lecture 5: Customer Lifecycle Management – Classification ... · Classification Problems MTAT.03.319 The slides are available under creative common license. The original owner of

Binary logistic regression: example in R

logit <- glm(data=train, as.factor(danger)~waiting, family='binomial')summary(logit)

Coefficients:Estimate Std. Error z value Pr(>|z|)

(Intercept) -42.5845 11.5448 -3.689 0.000225 ***waiting 0.6380 0.1728 3.692 0.000223 ***---Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Coefficients are difficult to interpret directly: they are logarithm of odds

We can take exponent of coefficients: exp(coef(logit))

(Intercept) waiting "0.000" "1.893"

17

An increase of waiting by

one unit increases the odds

of “danger” by 1.893

Page 18: Lecture 5: Customer Lifecycle Management – Classification ... · Classification Problems MTAT.03.319 The slides are available under creative common license. The original owner of

Outline

• Classification and applications to CLM

• Classification with logistic regression

• Classification with decision trees

• Black-box classification models

• Measuring the quality of classification models

• Pitfalls: class imbalance & overfitting

18

Page 19: Lecture 5: Customer Lifecycle Management – Classification ... · Classification Problems MTAT.03.319 The slides are available under creative common license. The original owner of

Decision Trees

Minivan

Age

Car Type

YES NO

YES

<30 >=30

Sports, Truck

0 30 60 Age

YES

YES

NO

Minivan

Sports,Truck

Page 20: Lecture 5: Customer Lifecycle Management – Classification ... · Classification Problems MTAT.03.319 The slides are available under creative common license. The original owner of

Decision Tree: Split Selection

• Numerical or ordered attributes: Find a split point that separates the (two) classes

(Yes: No: )

30 35

Age

Page 21: Lecture 5: Customer Lifecycle Management – Classification ... · Classification Problems MTAT.03.319 The slides are available under creative common license. The original owner of

Decision Tree: Split Selection

Categorical attributes: How to group?

Sport: Truck: Minivan:

(Sport, Truck): (Minivan):

(Sport): (Truck, Minivan):

(Sport, Minivan): (Truck)

Which split should we pick?Hint: the one with the higher “purity” impurity measure

Page 22: Lecture 5: Customer Lifecycle Management – Classification ... · Classification Problems MTAT.03.319 The slides are available under creative common license. The original owner of

Decision Tree

Advantages

1. Easy to Understand

2. Useful in Data exploration

3. Very fast to calculate (even for large datasets)

Disadvantages

1. Low accuracy (like logistic regression)

2. Over fitting (more on this later)

22

Page 23: Lecture 5: Customer Lifecycle Management – Classification ... · Classification Problems MTAT.03.319 The slides are available under creative common license. The original owner of

Input

Output

Input

Output

White-Box vs. Black-Box Models

Logistic regression

Decision tree

Ensemble methods:

Random Forests

Gradient boosting (e.g. XGBoost)

23

Page 24: Lecture 5: Customer Lifecycle Management – Classification ... · Classification Problems MTAT.03.319 The slides are available under creative common license. The original owner of

Ensemble methods• Idea: Train multiple classifiers on subsets of the

data or subsets of features and have them compete

• Examples:

• Random forests

24

Page 25: Lecture 5: Customer Lifecycle Management – Classification ... · Classification Problems MTAT.03.319 The slides are available under creative common license. The original owner of

Random forest

25

3 key ideas:1. Tree Bagging: each tree built with a random subset of data2. Each tree is built with a random subset of the features3. Voting

Page 26: Lecture 5: Customer Lifecycle Management – Classification ... · Classification Problems MTAT.03.319 The slides are available under creative common license. The original owner of

Ensemble methods• Idea: Train multiple classifiers on subsets of the

data or subsets of features and have them compete

• Examples:

• Random forests

• XGBoost, LightGBM (state of the art)

• Why is it powerful?

• Because we can play around with the parameters, e.g. number of trees, depth, etc.

This is called “hyperparameter optimization”

26

Page 27: Lecture 5: Customer Lifecycle Management – Classification ... · Classification Problems MTAT.03.319 The slides are available under creative common license. The original owner of

Outline

• Classification and applications to CLM

• Classification with logistic regression

• Classification with decision trees

• Black-box classification models

• Measuring the quality of classification models

• Pitfalls: class imbalance & overfitting

27

Page 28: Lecture 5: Customer Lifecycle Management – Classification ... · Classification Problems MTAT.03.319 The slides are available under creative common license. The original owner of

Quality of Classifiers

28

Page 29: Lecture 5: Customer Lifecycle Management – Classification ... · Classification Problems MTAT.03.319 The slides are available under creative common license. The original owner of

Quality of classifiers

29

Page 30: Lecture 5: Customer Lifecycle Management – Classification ... · Classification Problems MTAT.03.319 The slides are available under creative common license. The original owner of

Calculate a confusion matrix

It depends on the class probability threshold!It depends on the class probability threshold!

Quality of Classifiers

30

Actual classProbability of positive classFeatures

Page 31: Lecture 5: Customer Lifecycle Management – Classification ... · Classification Problems MTAT.03.319 The slides are available under creative common license. The original owner of

Threshold= 0.5i.e., if prediction >= 0.5

then churn (1)else no churn (0)

Threshold= 0.5i.e., if prediction >= 0.5

then churn (1)else no churn (0)

Quality of Classifiers

Actual/Predicted 1 (pos) 0 (neg)

1 (pos) 2 1

0 (neg) 0 2

31

Actual classProbability of positive classFeatures

Page 32: Lecture 5: Customer Lifecycle Management – Classification ... · Classification Problems MTAT.03.319 The slides are available under creative common license. The original owner of

confusion matrix

Quality of Classifiers

Actual/Predicted 1 (P) 0 (N)

1 (P) 2 1

0 (N) 0 2

Precision = 2/3

2 3

32

Recall = 2/2

32

Actual classProbability of positive classFeatures

Page 33: Lecture 5: Customer Lifecycle Management – Classification ... · Classification Problems MTAT.03.319 The slides are available under creative common license. The original owner of

Threshold= 0.8i.e., if prediction >= 0.8

then churn (1)else no churn (0)

Threshold= 0.8i.e., if prediction >= 0.8

then churn (1)else no churn (0)

Quality of Classifiers

Actual/Predicted 1 (pos) 0 (neg)

1 (pos) 2 0

0 (neg) 0 3

33

Actual classProbability of positive classFeatures

Now: what is the precision and the recall?

Page 34: Lecture 5: Customer Lifecycle Management – Classification ... · Classification Problems MTAT.03.319 The slides are available under creative common license. The original owner of

Quality of Classifiers

• Accuracy: Ratio of correctly predicted observation to the total observations

• Accuracy = (TP+TN)/(TP+FP+FN+TN)

• Intuitive, but terrible in case of class imbalance

• F Score: Weighted average of Precision and Recall.

• F-Score = 2*(Recall * Precision) / (Recall + Precision)

• More resilient to class imbalance

34

Page 35: Lecture 5: Customer Lifecycle Management – Classification ... · Classification Problems MTAT.03.319 The slides are available under creative common license. The original owner of

Outline

• Classification and applications to CLM

• Classification with logistic regression

• Classification with decision trees

• Black-box classification models

• Measuring the quality of classification models

• Pitfalls: class imbalance & overfitting

35

Page 36: Lecture 5: Customer Lifecycle Management – Classification ... · Classification Problems MTAT.03.319 The slides are available under creative common license. The original owner of

Class imbalance

A credit card company collects data about OK vs fraudulent transactions

OK transactions make up 99.999% of transactions

We train a classifier to classify transactions into OK and fraudulent

We get an accuracy of 99.999%

Why?

Page 37: Lecture 5: Customer Lifecycle Management – Classification ... · Classification Problems MTAT.03.319 The slides are available under creative common license. The original owner of

Class imbalance

always predict

not-fraud

99.997% accuracy

99.997% precision

0 recall!

Why does the classifier learn this?

Because twisting the classifier to catch the bad guys, makes us also “catch” some good guys

… so no visible improvement

Slide by: David Kauchak

Page 38: Lecture 5: Customer Lifecycle Management – Classification ... · Classification Problems MTAT.03.319 The slides are available under creative common license. The original owner of

Dealing with class imbalance

38

• Use a quality measure that is as robust as possible

• F-Score is OK in extreme cases (like the previous one)

• Less robust in more subtle cases, e.g. 90-10 class imbalance

• ROC and AUC (next) are better…

• Area Under the Precision Recall Curve even better (AUPRC)

Page 39: Lecture 5: Customer Lifecycle Management – Classification ... · Classification Problems MTAT.03.319 The slides are available under creative common license. The original owner of

ROC Curve and AUC: Another metric robust against imbalanced dataset

ROC – Robust Measurement of Classifier Quality

True Positive Rate (TPR) = Sensitivity =

False Positive Rate (FPR) = fp/(fp + tn)

Let’s plot the ROC curve for the following example

TP

R

FPR

01

1

0.5

0.25

0.75

0.25 0.5 0.75

39

Page 40: Lecture 5: Customer Lifecycle Management – Classification ... · Classification Problems MTAT.03.319 The slides are available under creative common license. The original owner of

random guess

both our model, and the second one predict correctly

when they are confident in their scores

model1: Area under the curve: 0.9991

model2: Area under the curve: 0.8365

model2 makes

mistakes earlier

Intuition behind ROC

40

Page 41: Lecture 5: Customer Lifecycle Management – Classification ... · Classification Problems MTAT.03.319 The slides are available under creative common license. The original owner of

random guess

both our model, and the second one predict correctly

when they are confident in their scores

model1: Area under the curve: 0.9991

model2: Area under the curve: 0.8365

model2 makes

mistakes earlier

Intuition behind ROC

AUCAUC

41

Page 42: Lecture 5: Customer Lifecycle Management – Classification ... · Classification Problems MTAT.03.319 The slides are available under creative common license. The original owner of

Dealing within class imbalance

• Use a quality measure that is as robust as possible

• F-Score is OK in extreme cases (like the previous one)

• Less robust in more subtle cases, e.g. 90-10 class imbalance

• ROC and AUC (next) are better…

• Area Under the Precision Recall Curve even better (AUPRC)

• Undersampling

• Deliberately throw away training data

• E.g. we have 6000 positive samples, 500 negative for training

• We randomly select 500 of the 6000 positive samples

• Oversampling

• Synthetic generation of minority class data, e.g. SMOTE42

Page 43: Lecture 5: Customer Lifecycle Management – Classification ... · Classification Problems MTAT.03.319 The slides are available under creative common license. The original owner of

http://www.inf.ed.ac.uk/teaching/courses/iaml/slides/eval-2x2.pdf

Under- and overfitting

43

Page 44: Lecture 5: Customer Lifecycle Management – Classification ... · Classification Problems MTAT.03.319 The slides are available under creative common license. The original owner of

How to detect overfitting

• Compute AUC (or F-score) on the training set and on the testing set

• The larger the diff, the more likely overfitting has taken place

• E.g. AUC on training = 0.97, AUC on testing = 0.85

Page 45: Lecture 5: Customer Lifecycle Management – Classification ... · Classification Problems MTAT.03.319 The slides are available under creative common license. The original owner of

Detecting overfitting (more rigorously)

Source: http://karlrosaen.com/ml/learning-log/2016-06-20/

K-Folds cross validation

Step1: We split our data into K different splits.

Step2: We use (K-1) folds for training and Kth for testing.

Step3: We do the step 2 for every different K set.

Step4: Finally we do average of the model quality across all the models.

The measure you get is more reflective of the likely performance in practice…

E = Output (Error or Accuracy) of your model

45

Page 46: Lecture 5: Customer Lifecycle Management – Classification ... · Classification Problems MTAT.03.319 The slides are available under creative common license. The original owner of

Practicetime!

https://courses.cs.ut.ee/2018/bda/fall/Main/Practice

46

Page 47: Lecture 5: Customer Lifecycle Management – Classification ... · Classification Problems MTAT.03.319 The slides are available under creative common license. The original owner of

In case you need more…

TheoryLogistic regression: https://www.youtube.com/watch?v=Z5WKQr4H4XkDecision Trees: https://www.youtube.com/watch?v=opQ49Xr748kRandom Forest: https://www.youtube.com/watch?v=gmmV4drPTS4

PracticalLogistic regression: https://www.youtube.com/watch?v=nubin7hq4-sDecision Tree: https://www.youtube.com/watch?v=JFJIQ0_2ijg&t=1026sRandom Forest: https://www.youtube.com/watch?v=nVMw7fTlj4o

47