maximizing a churn campaign’s profitability with cost sensitive predictive analytics

Copyright © 2014 SAS Institute Inc. All rights reserved. #analytics2014

Maximizing a Churn Campaign’s Profitability With Cost-Sensitive Predictive Analytics

Alejandro Correa Bahnsen, Luxembourg University

Andres Felipe Gonzalez Montoya, DIRECTV


Agenda

• Churn modeling

• Evaluation Measures

• Offers

• Predictive modeling

• Cost-Sensitive Predictive Modeling

Cost Proportionate Sampling

Bayes Minimum Risk

CS – Decision Trees

• Conclusions


Churn Modeling

• Detect which customers are likely to abandon

Voluntary churn

Involuntary churn


Customer Churn Management Campaign

[Figure: flow of customers through a churn campaign. New customers (inflow) join the customer base of active customers. The churn model prediction splits them into predicted churners (TP: actual churners, FP: actual non-churners) and predicted non-churners (FN: actual churners, TN: actual non-churners). Effective churners leave as outflow. Edge labels in the figure: 1, 1, 1 − γ, γ, 1.]

*Verbraken et al. (2013). A novel profit maximizing metric for measuring classification performance of customer churn prediction models.


Evaluation of a Campaign

• Confusion Matrix

                            True class ($y_i$)
Predicted class ($c_i$)     Churner ($y_i = 1$)     Non-Churner ($y_i = 0$)
Churner ($c_i = 1$)         TP                      FP
Non-Churner ($c_i = 0$)     FN                      TN

• Accuracy = $\frac{TP + TN}{TP + TN + FP + FN}$

• Recall = $\frac{TP}{TP + FN}$

• Precision = $\frac{TP}{TP + FP}$

• F1-Score = $2 \cdot \frac{Precision \cdot Recall}{Precision + Recall}$
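As an illustration of the four measures, the following minimal sketch computes them from a pair of label lists. The function names and the toy labels are hypothetical and are not taken from the campaign data.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, FN, TN) for binary labels where 1 = churner."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for y, c in pairs if y == 1 and c == 1)
    fp = sum(1 for y, c in pairs if y == 0 and c == 1)
    fn = sum(1 for y, c in pairs if y == 1 and c == 0)
    tn = sum(1 for y, c in pairs if y == 0 and c == 0)
    return tp, fp, fn, tn


def campaign_metrics(y_true, y_pred):
    """Accuracy, recall, precision and F1-Score from the confusion matrix."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    # Guard against empty denominators (no churners, or no predicted churners).
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "recall": recall,
            "precision": precision, "f1": f1}


# Toy labels (hypothetical): 3 churners among 8 customers.
y_true = [1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 0, 1, 0]
metrics = campaign_metrics(y_true, y_pred)
print(metrics)
```

Note that every error counts the same in these measures, regardless of which customer it concerns.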


Evaluation of a Campaign

• However, these measures assign the same weight to different errors

• That is not the case in a churn model, since failing to predict a churner carries a different cost than wrongly predicting a non-churner

• Churners also differ in their financial impact


Financial Evaluation of a Campaign

[Figure: the same campaign flow (inflow of new customers, customer base of active customers, churn model prediction into predicted churners TP/FP and predicted non-churners FN/TN, outflow of effective churners), with each path labelled by its cost. Edge labels in the figure: 0, $CLV$, $CLV + C_a$, $C_o + C_a$, $C_o + C_a$.]

*Verbraken et al. (2013). A novel profit maximizing metric for measuring classification performance of customer churn prediction models.



• Cost Matrix

                            True class ($y_i$)
Predicted class ($c_i$)     Churner ($y_i = 1$)     Non-Churner ($y_i = 0$)
Churner ($c_i = 1$)         $C_{TP_i}$              $C_{FP_i}$
Non-Churner ($c_i = 0$)     $C_{FN_i}$              $C_{TN_i}$

$C_{TP_i} = \gamma_i C_{o_i} + (1 - \gamma_i)(CLV_i + C_a)$

$C_{FP_i} = C_{o_i} + C_a$

$C_{FN_i} = CLV_i$

$C_{TN_i} = 0$

where:

$C_a$ = Administrative cost

$CLV_i$ = Client Lifetime Value of customer $i$

$C_{o_i}$ = Cost of the offer made to customer $i$

$\gamma_i$ = Probability that customer $i$ accepts the offer
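A minimal sketch of the cost matrix for a single customer follows. The function name, argument names and numbers are hypothetical. The true-positive cost is read here as $\gamma_i C_{o_i} + (1 - \gamma_i)(CLV_i + C_a)$, which is an assumption about how the terms on the slide are grouped.

```python
def churn_costs(clv, offer_cost, gamma, admin_cost):
    """Return the four example-dependent costs for one customer.

    Assumed reading of the slide: a correctly targeted churner accepts the
    offer with probability gamma (cost: the offer) and otherwise leaves
    (cost: the lifetime value plus the administrative cost).
    """
    return {
        "TP": gamma * offer_cost + (1 - gamma) * (clv + admin_cost),
        "FP": offer_cost + admin_cost,  # offer made to a customer who would stay
        "FN": clv,                      # churner missed, lifetime value lost
        "TN": 0.0,                      # nothing done, nothing lost
    }


# Hypothetical customer: CLV 100, offer worth 10, 50% acceptance, admin cost 1.
costs = churn_costs(clv=100.0, offer_cost=10.0, gamma=0.5, admin_cost=1.0)
print(costs)
```

Because every input carries the customer index, each customer ends up with a different cost matrix.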


Financial Evaluation of a Campaign

• Using the cost matrix, the total cost is calculated as:

$$C = \sum_{i=1}^{N} \Big( y_i \big( c_i \cdot C_{TP_i} + (1 - c_i) C_{FN_i} \big) + (1 - y_i) \big( c_i \cdot C_{FP_i} + (1 - c_i) C_{TN_i} \big) \Big)$$

• Additionally, the savings are defined as:

$$C_s = \frac{C_0 - C}{C_0}$$

where $C_0$ is the cost when all the customers are predicted as non-churners
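The total cost and the savings can be sketched as below. The per-customer cost dictionaries and the labels are hypothetical toy values, and the sketch assumes the total cost is the sum of the selected cost-matrix entry over all customers.

```python
def total_cost(y_true, y_pred, costs):
    """Sum the cost-matrix entry selected by each (true, predicted) pair."""
    total = 0.0
    for y, c, cost in zip(y_true, y_pred, costs):
        total += y * (c * cost["TP"] + (1 - c) * cost["FN"])
        total += (1 - y) * (c * cost["FP"] + (1 - c) * cost["TN"])
    return total


def savings(y_true, y_pred, costs):
    """Relative cost reduction versus predicting everyone as a non-churner."""
    base = total_cost(y_true, [0] * len(y_true), costs)  # this is C_0
    # Note: base is zero when there are no churners; not handled in this sketch.
    return (base - total_cost(y_true, y_pred, costs)) / base


# Hypothetical cost matrices for three customers.
costs = [
    {"TP": 55.5, "FP": 11.0, "FN": 100.0, "TN": 0.0},
    {"TP": 30.0, "FP": 11.0, "FN": 50.0, "TN": 0.0},
    {"TP": 55.5, "FP": 11.0, "FN": 100.0, "TN": 0.0},
]
y_true = [1, 0, 0]
y_pred = [1, 1, 0]

cost = total_cost(y_true, y_pred, costs)
saved = savings(y_true, y_pred, costs)
print(cost, saved)
```

In this toy case one churner is caught and one non-churner is targeted by mistake, so the campaign cost drops from 100 to 66.5.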


• Customer Lifetime Value


*Glady et al. (2009). Modeling churn using customer lifetime value.




Offers

• The same offer may not apply to all customers (e.g., those who already have premium channels)

• An offer should be chosen so that it maximizes the probability of acceptance (𝛾) and the CLV


Offers clusters


Offers Analysis

[Figure: three offers (Improve to HD DVR, Monthly Discount, Premium Channels) are evaluated by their performance.]


Offers Analysis

[Figure: Churn Rate and Gamma (right axis) for Cluster 1, Cluster 2, Cluster 3 and Cluster 4. Axis ranges: 88% to 100% and 0.0% to 6.0%.]

𝛾 = Probability that a customer accepts the offer


Predictive Modeling

• Using predictive analytics to detect the behavioral patterns of customers who have defected in the past


Predictive Modeling

• Then check which of the current customers share the same patterns


Predictive Modeling

• Dataset

Dataset            N       Churn     $C_0$ (Euros)
Total              9410    4.83%     580,884
Training           3758    5.05%     244,542
Validation         2824    4.77%     174,171
Testing            2825    4.42%     162,171
Under-Sampling     374     50.80%    244,542


Predictive Modeling

• Algorithms

Decision Trees

Logistic Regression

Random Forest


Predictive Modeling - Results

[Figure: F1-Score (axis 0 to 0.14) and Savings (axis 0% to 8%) of Decision Trees, Logistic Regression and Random Forest, trained on the Training and Under-Sampling sets.]


Predictive Modeling - SMOTE

• Synthetic Minority Over-sampling Technique

[Figure: synthetic samples in a two-dimensional feature space (Dim 1, Dim 2).]
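The idea behind SMOTE can be sketched as interpolating between a minority example and one of its nearest minority neighbors. This is a simplified illustration and not the implementation used for the results here; the function name, `k`, the seed and the toy points are all assumptions.

```python
import math
import random


def smote(minority, n_synthetic, k=2, seed=0):
    """Create synthetic minority points by interpolating toward neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority neighbors of the chosen point (excluding itself).
        others = [p for p in minority if p is not base]
        neighbors = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment base -> neighbor
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbor))
        )
    return synthetic


# Hypothetical churners described by two features (Dim 1, Dim 2).
minority = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.2), (3.0, 3.0)]
new_points = smote(minority, n_synthetic=5)
print(new_points)
```

Each synthetic point lies on a segment between two real churners, so the minority class grows without exact duplicates.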



• Dataset

Dataset            N       Churn     $C_0$ (Euros)
Total              9410    4.83%     580,884
Training           3758    5.05%     244,542
Validation         2824    4.77%     174,171
Testing            2825    4.42%     162,171
Under-Sampling     374     50.80%    244,542
SMOTE              6988    48.94%    4,273,083



[Figure: F1-Score (axis 0 to 0.14) and Savings (axis 0% to 8%) of Decision Trees, Logistic Regression and Random Forest, trained on the Training, Under-Sampling and SMOTE sets.]



• Sampling techniques help to improve the models’ predictive power, but not necessarily the savings

• There is a need for methods that aim to increase savings




Cost-Sensitive Predictive Modeling

• Traditional methods assume the same cost for different errors

• Not the case in Churn modeling

• Some cost-sensitive methods assume a constant cost difference between errors

• Example-Dependent Cost-Sensitive Predictive Modeling


Cost-Sensitive Predictive Modeling

• Changing class distribution

Cost Proportionate Rejection Sampling

Cost Proportionate Over Sampling

• Direct Cost

Bayes Minimum Risk

• Modifying a learning algorithm

CS – Decision Tree



• Normalized Cost weight

$$w_i = \begin{cases} C_{FP_i} & \text{if } y_i = 0 \\ C_{FN_i} & \text{if } y_i = 1 \end{cases}$$

$$\hat{w}_i = \frac{w_i}{\max_j w_j}$$
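A minimal sketch of the weight computation is given below. The function name and the cost values are hypothetical; the costs were picked so that the resulting weights match the five-example dataset (weights 1, 10, 2, 20, 1) used in the sampling slides.

```python
def normalized_weights(y_true, cost_fp, cost_fn):
    """Weight each example by the cost of misclassifying it, scaled to (0, 1]."""
    weights = [
        fn if y == 1 else fp
        for y, fp, fn in zip(y_true, cost_fp, cost_fn)
    ]
    largest = max(weights)
    return [w / largest for w in weights]


# Hypothetical costs for five customers.
y_true = [0, 1, 0, 1, 0]
cost_fp = [1, 3, 2, 5, 1]     # used when y_i = 0
cost_fn = [8, 10, 9, 20, 7]   # used when y_i = 1
w_hat = normalized_weights(y_true, cost_fp, cost_fn)
print(w_hat)  # [0.05, 0.5, 0.1, 1.0, 0.05]
```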



• Cost Proportionate Over Sampling

Example    $y_i$    $w_i$
1          0        1
2          1        10
3          0        2
4          1        20
5          0        1

Initial Dataset

(1,0,1) (2,1,10) (3,0,2) (4,1,20) (5,0,1)

Cost Proportionate Dataset

(1,0,1)

(2,1,1), (2,1,1), …, (2,1,1)

(3,0,1), (3,0,1)

(4,1,1), (4,1,1), (4,1,1), …, (4,1,1), (4,1,1)

(5,0,1)

*Elkan, C. (2001). The Foundations of Cost-Sensitive Learning.
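The over-sampling example can be sketched as follows, assuming integer weights and that each example is simply repeated $w_i$ times with its weight reset to 1. The function name is hypothetical, and the procedure used to build the actual CS over-sampling set may differ.

```python
def cost_proportionate_oversample(examples):
    """Repeat each (id, y, w) example w times with its weight reset to 1."""
    resampled = []
    for example_id, y, w in examples:
        # Assumption: w is a positive integer number of copies.
        resampled.extend([(example_id, y, 1)] * int(w))
    return resampled


initial = [(1, 0, 1), (2, 1, 10), (3, 0, 2), (4, 1, 20), (5, 0, 1)]
resampled = cost_proportionate_oversample(initial)

churn_share = sum(y for _, y, _ in resampled) / len(resampled)
print(len(resampled), churn_share)
```

The costly churners now dominate the dataset, so a cost-insensitive learner trained on it is pushed toward avoiding the expensive errors.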



• Cost Proportionate Rejection Sampling

Example    $y_i$    $w_i$    $\hat{w}_i$
1          0        1        0.05
2          1        10       0.5
3          0        2        0.1
4          1        20       1
5          0        1        0.05

Initial Dataset

(1,0,1) (2,1,10) (3,0,2) (4,1,20) (5,0,1)

Cost Proportionate Dataset

(2,1,1) (4,1,1) (4,1,1) (5,0,1)

*Zadrozny et al. (2003). Cost-sensitive learning by cost-proportionate example weighting.
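A minimal sketch of rejection sampling, assuming each example is kept independently with probability equal to its normalized weight. The function name and seed are hypothetical. In this sketch an example is drawn at most once, so its output will not match the slide's listing, where (4,1,1) appears twice.

```python
import random


def cost_proportionate_rejection_sample(examples, seed=0):
    """Keep each (id, y, w) example with probability w / max(w)."""
    rng = random.Random(seed)
    largest = max(w for _, _, w in examples)
    kept = []
    for example_id, y, w in examples:
        # random() is in [0, 1), so the most costly example is always kept.
        if rng.random() < w / largest:
            kept.append((example_id, y, 1))
    return kept


initial = [(1, 0, 1), (2, 1, 10), (3, 0, 2), (4, 1, 20), (5, 0, 1)]
sample = cost_proportionate_rejection_sample(initial)
print(sample)
```

Unlike over-sampling, this shrinks the dataset, which is consistent with the small CS – Rejection-Sampling set in the dataset table.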



• Dataset

Dataset                    N       Churn     $C_0$ (Euros)
Total                      9410    4.83%     580,884
Training                   3758    5.05%     244,542
Validation                 2824    4.77%     174,171
Testing                    2825    4.42%     162,171
Under-Sampling             374     50.80%    244,542
SMOTE                      6988    48.94%    4,273,083
CS – Rejection-Sampling    428     41.35%    231,428
CS – Over-Sampling         5767    31.24%    2,350,285



[Figure: Savings (axis 0% to 25%) and F1-Score (axis 0 to 0.14) of Decision Trees, Logistic Regression and Random Forest, trained on the Training, Under-Sampling, SMOTE, CS-Rejection and CS-Over sets.]


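Bayes Minimum Risk

• Decision model based on quantifying tradeoffs between various decisions using probabilities and the costs that accompany such decisions

• Risk of classification

$$R(c_i = 0 \mid x_i) = C_{TN_i} (1 - \hat{p}_i) + C_{FN_i} \cdot \hat{p}_i$$

$$R(c_i = 1 \mid x_i) = C_{FP_i} (1 - \hat{p}_i) + C_{TP_i} \cdot \hat{p}_i$$

where $\hat{p}_i$ is the estimated probability that customer $i$ is a churner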


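Bayes Minimum Risk

• Using the different risks, the prediction is made based on the following condition:

$$c_i = \begin{cases} 0 & \text{if } R(c_i = 0 \mid x_i) \le R(c_i = 1 \mid x_i) \\ 1 & \text{otherwise} \end{cases}$$

• Example-dependent threshold

$$t_{BMR_i} = \frac{C_{FP_i} - C_{TN_i}}{C_{FN_i} - C_{TN_i} - C_{TP_i} + C_{FP_i}}$$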
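The decision rule and its threshold can be sketched as below. The function names and the cost values are hypothetical, and `p_churn` stands for the churn probability estimated by any cost-insensitive model.

```python
def bayes_minimum_risk(p_churn, cost):
    """Predict 1 (churner) only if that decision has the lower expected cost."""
    risk_non_churner = cost["TN"] * (1 - p_churn) + cost["FN"] * p_churn
    risk_churner = cost["FP"] * (1 - p_churn) + cost["TP"] * p_churn
    return 0 if risk_non_churner <= risk_churner else 1


def bmr_threshold(cost):
    """Probability above which predicting 'churner' has the lower risk."""
    return (cost["FP"] - cost["TN"]) / (
        cost["FN"] - cost["TN"] - cost["TP"] + cost["FP"]
    )


# Hypothetical cost matrix for one customer.
cost = {"TP": 55.5, "FP": 11.0, "FN": 100.0, "TN": 0.0}
threshold = bmr_threshold(cost)
predictions = [bayes_minimum_risk(p, cost) for p in (0.1, 0.3)]
print(threshold, predictions)
```

For this customer the threshold is about 0.198, far below the usual 0.5, because missing a churner is much more expensive than a wasted offer. Comparing the probability with the threshold gives the same decision as comparing the two risks.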


Bayes Minimum Risk

[Figure: Savings (axis 0% to 35%) of Decision Trees, Logistic Regression and Random Forest, each without and with BMR, for the Training, Under-Sampling, SMOTE, CS-Rejection and CS-Over sets.]


Bayes Minimum Risk

[Figure: F1-Score (axis 0 to 0.14) of Decision Trees, Logistic Regression and Random Forest, each without and with BMR.]



Bayes Minimum Risk

• Bayes Minimum Risk increases the savings by using a cost-insensitive method and then introducing the costs

• Why not introduce the costs during the estimation of the methods?



• Decision trees

Classification model that iteratively creates binary decision rules $(x^j, l^j_m)$ that maximize certain criteria

where $(x^j, l^j_m)$ refers to making a rule using feature $j$ on value $m$


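• Decision trees – Construction

• Then the impurity of each leaf is calculated using:

Misclassification: $I_m(\pi_1) = 1 - \max(\pi_1, 1 - \pi_1)$

Entropy: $I_e(\pi_1) = -\pi_1 \log \pi_1 - (1 - \pi_1) \log(1 - \pi_1)$

Gini: $I_g(\pi_1) = 2 \pi_1 (1 - \pi_1)$

$\pi_1$ is the percentage of positives.

[Figure: the rule $(x^j, l^j_m)$ splits the set $S$ into $S^l$ and $S^r$, with $S^l = \{X_i \mid X_i \in S \wedge x^j_i \le l^j_m\}$ and $S^r = \{X_i \mid X_i \in S \wedge x^j_i > l^j_m\}$.]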
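The three impurity measures can be sketched as plain functions of the share of positives. The function names are hypothetical, and the entropy uses the natural logarithm as an assumption, since the slide does not state the base.

```python
import math


def misclassification(pi1):
    """Share of examples that the majority label gets wrong."""
    return 1 - max(pi1, 1 - pi1)


def entropy(pi1):
    """Binary entropy (natural log assumed); zero for a pure leaf."""
    if pi1 in (0.0, 1.0):
        return 0.0  # avoids log(0); the limit of p * log(p) is 0
    return -pi1 * math.log(pi1) - (1 - pi1) * math.log(1 - pi1)


def gini(pi1):
    """Gini impurity for two classes."""
    return 2 * pi1 * (1 - pi1)


for share in (0.0, 0.05, 0.5):
    print(share, misclassification(share), entropy(share), gini(share))
```

All three are zero for a pure leaf and reach their maximum when half of the leaf is positive.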


• Decision trees – Construction

• Afterwards the gain of applying a given rule to the set $S$ is:

$$Gain(x^j, l^j_m) = I(\pi_1) - \frac{|S^l|}{|S|} I(\pi_1^l) - \frac{|S^r|}{|S|} I(\pi_1^r)$$



• Decision trees – Construction

• The rule that maximizes the gain is selected

$$(best_x, best_l) = \operatorname{argmax}_{(j,m)} Gain(x^j, l^j_m)$$

• The process is repeated until a stopping criterion is met

[Figure: a tree grown by repeatedly splitting the set S.]

CS – Decision Trees

• Decision trees – Pruning

• Calculation of the Tree error and pruned Tree error

$\epsilon(Tree)$

$\epsilon(EB(Tree, branch)) - \epsilon(Tree)$

$$\frac{\epsilon(EB(Tree, branch)) - \epsilon(Tree)}{|Tree| - |EB(Tree, branch)|}$$

• After calculating the pruning criteria for all possible trees, the maximum improvement is selected and the Tree is pruned.

• Later the process is repeated until there is no further improvement.

[Figure: a tree and its successively pruned versions.]




• Maximizing the accuracy is different from minimizing the cost

• To solve this, some studies have proposed methods that aim to introduce cost-sensitivity into the algorithms

• However, research has focused on class-dependent methods. Instead we used:

Example-dependent cost based impurity measure

Example-dependent cost based pruning criteria


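CS – Decision Trees

• Cost based impurity measure

• The impurity of each leaf is calculated using:

$$I_c(S) = \min\{C_0, C_1\}$$

$$f(S) = \begin{cases} 0 & \text{if } C_0 \le C_1 \\ 1 & \text{otherwise} \end{cases}$$

where $C_0$ and $C_1$ are the costs of the set $S$ when all its examples are predicted as non-churners and as churners, respectively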
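A minimal sketch of the cost based impurity follows. It assumes that $C_0$ and $C_1$ are the total costs of labelling every example in the leaf as a non-churner or as a churner; the function name and the cost values are hypothetical.

```python
def cost_impurity(labels, costs):
    """Return (impurity, leaf prediction) under example-dependent costs."""
    # C_0: cost of predicting every example in the leaf as a non-churner.
    cost_all_negative = sum(
        c["FN"] if y == 1 else c["TN"] for y, c in zip(labels, costs)
    )
    # C_1: cost of predicting every example in the leaf as a churner.
    cost_all_positive = sum(
        c["TP"] if y == 1 else c["FP"] for y, c in zip(labels, costs)
    )
    prediction = 0 if cost_all_negative <= cost_all_positive else 1
    return min(cost_all_negative, cost_all_positive), prediction


# Hypothetical leaf: one churner and two non-churners, same cost matrix.
cost = {"TP": 55.5, "FP": 11.0, "FN": 100.0, "TN": 0.0}
impurity, prediction = cost_impurity([1, 0, 0], [cost, cost, cost])
print(impurity, prediction)
```

Here the leaf is labelled as churner even though churners are the minority, because a single missed churner costs more than two wasted offers.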




• Cost sensitive pruning

$$PC_c = C(EB(Tree, branch)) - C(Tree)$$

• New pruning criterion that evaluates the improvement in cost of eliminating a particular branch



[Figure: Savings (axis 0% to 50%) and F1-Score (axis 0 to 0.3) of Decision Trees and Cost-Sensitive Decision Trees, with Error Pruning and Cost Pruning.]



Comparison of Models

[Figure: Savings and F1-Score (axis 0% to 50%) of Random Forest (Train), Logistic Regression (CS-Rejection), Logistic Regression (BMR, Train), Decision Tree (Cost Pruning, CS-Rejection) and CS-Decision Tree (Train).]


Conclusions

• Selecting models based on traditional statistics does not give the best results when measured by savings

• Incorporating the costs into the modeling helps to achieve higher savings


Other Applications

• Fraud Detection

Correa Bahnsen et al. (2013). Cost Sensitive Credit Card Fraud Detection using Bayes Minimum Risk.

Correa Bahnsen et al. (2014). Improving Credit Card Fraud Detection with Calibrated Probabilities.

• Credit Scoring

Correa Bahnsen et al. (2014). Example-Dependent Cost-Sensitive Credit Scoring using Bayes Minimum Risk.

• Direct Marketing

Correa Bahnsen et al. (2014). Example-Dependent Cost-Sensitive Decision Trees.


Contact Information

Alejandro Correa Bahnsen

University of Luxembourg

Luxembourg

[email protected]

http://www.linkedin.com/in/albahnsen

http://www.slideshare.net/albahnsen

Andres Gonzalez Montoya

DIRECTV

Colombia

[email protected]

