Maximizing a Churn Campaign’s Profitability With Cost-Sensitive Predictive Analytics
DESCRIPTION
Presentation at the SAS Analytics 2014 conference. Predictive analytics has been applied to solve a wide range of real-world problems. Nevertheless, current state-of-the-art predictive analytics models are not well aligned with business needs, since they don't include the real financial costs and benefits during the training and evaluation phases. Churn modeling does not yield the best results when it is measured by the investment per subscriber on a loyalty campaign and the financial impact of failing to detect a churner versus wrongly predicting a non-churner. This presentation shows how a cost-sensitive modeling approach leads to better results in terms of profitability and predictive power, and is applicable to many other business challenges.
TRANSCRIPT
Copyright © 2014 SAS Institute Inc. All rights reserved. #analytics2014
Maximizing a Churn Campaign’s Profitability With Cost-Sensitive Predictive Analytics
Alejandro Correa Bahnsen, Luxembourg University
Andres Felipe Gonzalez Montoya, DIRECTV
Agenda
• Churn modeling
• Evaluation Measures
• Offers
• Predictive modeling
• Cost-Sensitive Predictive Modeling
Cost Proportionate Sampling
Bayes Minimum Risk
CS – Decision Trees
• Conclusions
Churn Modeling
• Detect which customers are likely to abandon
Voluntary churn
Involuntary churn
Customer Churn Management Campaign
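[Figure: flow of a churn management campaign. New customers (inflow) enter the customer base of active customers. The churn model splits the active customers into predicted churners (TP: actual churners, FP: actual non-churners) and predicted non-churners (FN: actual churners, TN: actual non-churners). A fraction $\gamma$ of the TP accepts the offer and stays, while $1 - \gamma$ leaves; the FN leave as effective churners (outflow); the FP and TN stay in the customer base.]
*Verbraken et al. (2013). A novel profit maximizing metric for measuring classification performance of customer churn prediction models.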
Evaluation of a Campaign
• Confusion Matrix
• Accuracy $= \frac{TP + TN}{TP + TN + FP + FN}$
• Recall $= \frac{TP}{TP + FN}$
• Precision $= \frac{TP}{TP + FP}$
• F1-Score $= 2 \cdot \frac{Precision \cdot Recall}{Precision + Recall}$
                            True class ($y_i$)
Predicted class ($c_i$)     Churner ($y_i = 1$)    Non-Churner ($y_i = 0$)
Churner ($c_i = 1$)         TP                     FP
Non-Churner ($c_i = 0$)     FN                     TN
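The four measures above can be computed directly from the confusion-matrix counts. The sketch below is illustrative only: the function name and the counts are hypothetical, not figures from the presentation.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, recall, precision and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "recall": recall,
            "precision": precision, "f1": f1}

# Hypothetical counts for a campaign with 50 actual churners out of 1000 customers
metrics = classification_metrics(tp=30, fp=70, fn=20, tn=880)
```

Note that none of these measures look at how much each error costs, which is the point made on the next slide.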
Evaluation of a Campaign
• However, these measures assign the same weight to different errors
• This is not the case in a churn model, since failing to predict a churner carries a different cost than wrongly predicting a non-churner
• Churners also differ in their financial impact
Financial Evaluation of a Campaign
[Figure: the campaign flow annotated with costs. TN: $0$. FN: $CLV$. TP: $CLV + C_a$ when the offer is rejected (probability $1 - \gamma$) and $C_o + C_a$ when it is accepted (probability $\gamma$). FP: $C_o + C_a$.]
*Verbraken et al. (2013). A novel profit maximizing metric for measuring classification performance of customer churn prediction models.
Financial Evaluation of a Campaign
• Cost Matrix
                            True class ($y_i$)
Predicted class ($c_i$)     Churner ($y_i = 1$)                                           Non-Churner ($y_i = 0$)
Churner ($c_i = 1$)         $C_{TP_i} = \gamma_i C_{o_i} + (1 - \gamma_i) CLV_i + C_a$    $C_{FP_i} = C_{o_i} + C_a$
Non-Churner ($c_i = 0$)     $C_{FN_i} = CLV_i$                                            $C_{TN_i} = 0$
where:
$C_a$ = Administrative cost
$CLV_i$ = Client Lifetime Value of customer $i$
$C_{o_i}$ = Cost of the offer made to customer $i$
$\gamma_i$ = Probability that customer $i$ accepts the offer
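A minimal sketch of the cost matrix for a single customer follows. It assumes the true-positive cost reads as $\gamma_i C_{o_i} + (1 - \gamma_i) CLV_i + C_a$, which matches the branch costs shown in the campaign diagram. The function name and the numeric values are hypothetical.

```python
def churn_costs(clv, offer_cost, gamma, admin_cost):
    """Example-dependent costs for one customer.

    Assumption: the administrative cost is paid once for every contacted
    customer, whether or not the offer is accepted.
    """
    return {
        "TP": gamma * offer_cost + (1.0 - gamma) * clv + admin_cost,
        "FP": offer_cost + admin_cost,
        "FN": clv,   # an undetected churner costs the full lifetime value
        "TN": 0.0,   # a correctly ignored non-churner costs nothing
    }

# Hypothetical customer: CLV of 1000, offer worth 50, 90% acceptance, 1 of admin cost
costs = churn_costs(clv=1000.0, offer_cost=50.0, gamma=0.9, admin_cost=1.0)
```

Because every input is indexed by customer, each customer gets their own cost matrix, which is what makes the problem example-dependent.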
Financial Evaluation of a Campaign
• Using the cost matrix, the total cost is calculated as:
$C = \sum_{i=1}^{N} \left[ y_i \left( c_i C_{TP_i} + (1 - c_i) C_{FN_i} \right) + (1 - y_i) \left( c_i C_{FP_i} + (1 - c_i) C_{TN_i} \right) \right]$
• Additionally, the savings are defined as:
$C_s = \frac{C_0 - C}{C_0}$
where $C_0$ is the cost when all the customers are predicted as non-churners
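The total cost and the savings can be sketched as below. The per-customer cost dictionaries, with keys "TP", "FP", "FN" and "TN", are an assumed representation, and all values are hypothetical.

```python
def total_cost(y_true, y_pred, costs):
    """Sum the example-dependent cost of each (true label, prediction) pair."""
    total = 0.0
    for y, c, cost in zip(y_true, y_pred, costs):
        if y == 1:
            total += cost["TP"] if c == 1 else cost["FN"]
        else:
            total += cost["FP"] if c == 1 else cost["TN"]
    return total

def savings(y_true, y_pred, costs):
    """Relative savings versus predicting every customer as a non-churner."""
    baseline = total_cost(y_true, [0] * len(y_true), costs)  # this is C_0
    return (baseline - total_cost(y_true, y_pred, costs)) / baseline

# Hypothetical per-customer costs
costs = [
    {"TP": 146.0, "FP": 51.0, "FN": 1000.0, "TN": 0.0},
    {"TP": 211.0, "FP": 21.0, "FN": 400.0, "TN": 0.0},
    {"TP": 100.0, "FP": 31.0, "FN": 300.0, "TN": 0.0},
]
y_true = [1, 0, 1]
y_pred = [1, 1, 0]
campaign_cost = total_cost(y_true, y_pred, costs)
campaign_savings = savings(y_true, y_pred, costs)
```

The savings are undefined when $C_0 = 0$, that is, when the evaluated set contains no churners; a real implementation would need to guard against that case.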
Financial Evaluation of a Campaign
• Customer Lifetime Value
*Glady et al. (2009). Modeling churn using customer lifetime value.
Agenda
• Churn modeling
• Evaluation Measures
• Offers
• Predictive modeling
• Cost-Sensitive Predictive Modeling
Cost Proportionate Sampling
Bayes Minimum Risk
CS – Decision Trees
• Conclusions
Offers
• The same offer may not apply to all customers (e.g. those who already have premium channels)
• An offer should be chosen so that it maximizes both the probability of acceptance ($\gamma$) and the CLV
Offers clusters
Offers Analysis
[Diagram: three offers (Improve to HD DVR, Monthly Discount, Premium Channels) feed into the evaluation of offer performance]
Offers Analysis
[Figure: churn rate and gamma (right axis) for Clusters 1 to 4]
$\gamma$ = Probability that a customer accepts the offer
Predictive Modeling
• Use predictive analytics to detect the behavioral patterns of customers who have defected in the past
Predictive Modeling
• Then check which of the current customers share the same patterns
Predictive Modeling
• Dataset
Dataset           N       Churn     $C_0$ (Euros)
Total             9410    4.83%     580,884
Training          3758    5.05%     244,542
Validation        2824    4.77%     174,171
Testing           2825    4.42%     162,171
Under-Sampling    374     50.80%    244,542
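The Under-Sampling row is a roughly balanced subset of the training set. The slides do not say how it was drawn, so the sketch below shows plain random under-sampling of the majority class as one plausible reading. The function name, the seed and the toy data are hypothetical.

```python
import random

def under_sample(X, y, seed=0):
    """Randomly drop non-churners (y == 0) until both classes have the same size.

    Assumption: churners (y == 1) are the minority class.
    """
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    keep = sorted(pos + rng.sample(neg, min(len(pos), len(neg))))
    return [X[i] for i in keep], [y[i] for i in keep]

# Toy data: 5 churners among 100 customers
X = [[float(i)] for i in range(100)]
y = [1] * 5 + [0] * 95
X_bal, y_bal = under_sample(X, y)
```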
Predictive Modeling
• Algorithms
Decision Trees
Logistic Regression
Random Forest
Predictive Modeling - Results
[Figure: F1-Score and savings of Decision Trees, Logistic Regression and Random Forest trained on the Training and Under-Sampling sets]
Predictive Modeling - SMOTE
• Synthetic Minority Over-sampling Technique
[Figure: synthetic samples in the Dim 1 / Dim 2 plane]
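The idea behind SMOTE is to create each synthetic minority sample on the segment between a minority point and one of its nearest minority neighbors. The sketch below is a simplified illustration of that idea under stated assumptions (pure Python, Euclidean distance, hypothetical names and data), not the implementation used in the presentation.

```python
import random

def smote(minority, n_synthetic, k=5, seed=0):
    """Create synthetic minority samples by interpolating towards near neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # The k nearest minority neighbors of the base point (squared Euclidean distance)
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbor = rng.choice(others[:k])
        u = rng.random()  # position along the segment from base to neighbor
        synthetic.append([a + u * (b - a) for a, b in zip(base, neighbor)])
    return synthetic

# Hypothetical minority-class points in two dimensions
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote(minority, n_synthetic=6, k=2)
```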
Predictive Modeling - SMOTE
• Dataset
Dataset           N       Churn     $C_0$ (Euros)
Total             9410    4.83%     580,884
Training          3758    5.05%     244,542
Validation        2824    4.77%     174,171
Testing           2825    4.42%     162,171
Under-Sampling    374     50.80%    244,542
SMOTE             6988    48.94%    4,273,083
Predictive Modeling - SMOTE
[Figure: F1-Score and savings of Decision Trees, Logistic Regression and Random Forest trained on the Training, Under-Sampling and SMOTE sets]
Predictive Modeling - SMOTE
• Sampling techniques help to improve the models’ predictive power, but not necessarily the savings
• There is a need for methods that aim to increase savings
Agenda
• Churn modeling
• Evaluation Measures
• Offers
• Predictive modeling
• Cost-Sensitive Predictive Modeling
Cost Proportionate Sampling
Bayes Minimum Risk
CS – Decision Trees
• Conclusions
Cost-Sensitive Predictive Modeling
• Traditional methods assume the same cost for different errors
• Not the case in Churn modeling
• Some cost-sensitive methods assume a constant cost difference between errors
• Example-Dependent Cost-Sensitive Predictive Modeling
Cost-Sensitive Predictive Modeling
• Changing class distribution
Cost Proportionate Rejection Sampling
Cost Proportionate Over Sampling
• Direct Cost
Bayes Minimum Risk
• Modifying a learning algorithm
CS – Decision Tree
Cost Proportionate Sampling
• Normalized Cost weight
$w_i = \begin{cases} C_{FP_i} & \text{if } y_i = 0 \\ C_{FN_i} & \text{if } y_i = 1 \end{cases}$
$\bar{w}_i = \frac{w_i}{\max_j w_j}$
Cost Proportionate Sampling
• Cost Proportionate Over Sampling
Example    $y_i$    $w_i$
1          0        1
2          1        10
3          0        2
4          1        20
5          0        1
Initial Dataset
(1,0,1) (2,1,10) (3,0,2) (4,1,20) (5,0,1)
Cost Proportionate Dataset
(1,0,1)
(2,1,1), (2,1,1), …, (2,1,1)
(3,0,2), (3,0,2)
(4,1,1), (4,1,1), (4,1,1), …, (4,1,1), (4,1,1)
(5,0,1)
*Elkan, C. (2001). The Foundations of Cost-Sensitive Learning.
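The over-sampling example above can be sketched as repeating each example as many times as its cost weight. The function name is hypothetical, and rounding non-integer weights is an assumption of this sketch.

```python
def cost_proportionate_over_sampling(examples, weights):
    """Repeat each example a number of times equal to its (rounded) cost weight."""
    resampled = []
    for example, w in zip(examples, weights):
        resampled.extend([example] * int(round(w)))
    return resampled

# The five examples from the slide, as (example id, y_i) pairs with their weights w_i
examples = [(1, 0), (2, 1), (3, 0), (4, 1), (5, 0)]
weights = [1, 10, 2, 20, 1]
dataset = cost_proportionate_over_sampling(examples, weights)
```

The resulting set grows with the size of the weights, which is consistent with the large over-sampled sets reported later in the dataset table.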
Cost Proportionate Sampling
• Cost Proportionate Rejection Sampling
Example    $y_i$    $w_i$    $\bar{w}_i$
1          0        1        0.05
2          1        10       0.5
3          0        2        0.1
4          1        20       1
5          0        1        0.05
Initial Dataset
(1,0,1) (2,1,10) (3,0,2) (4,1,20) (5,0,1)
Cost Proportionate Dataset
(2,1,1) (4,1,1) (4,1,1) (5,0,1)
*Zadrozny et al. (2003). Cost-sensitive learning by cost-proportionate example weighting.
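In rejection sampling each example is kept with probability equal to its normalized weight $\bar{w}_i = w_i / \max_j w_j$. A minimal sketch, assuming one independent draw per example; the function name and seed are hypothetical.

```python
import random

def cost_proportionate_rejection_sampling(examples, weights, seed=0):
    """Keep each example with probability w_i / max_j w_j."""
    rng = random.Random(seed)
    w_max = max(weights)
    return [ex for ex, w in zip(examples, weights) if rng.random() < w / w_max]

# The five examples from the slide, as (example id, y_i) pairs with their weights w_i
examples = [(1, 0), (2, 1), (3, 0), (4, 1), (5, 0)]
weights = [1, 10, 2, 20, 1]
sampled = cost_proportionate_rejection_sampling(examples, weights)
```

The most expensive example is always kept, while cheap examples are usually discarded, so the sampled set is small and skewed towards costly errors.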
Cost Proportionate Sampling
• Dataset
Dataset                    N       Churn     $C_0$ (Euros)
Total                      9410    4.83%     580,884
Training                   3758    5.05%     244,542
Validation                 2824    4.77%     174,171
Testing                    2825    4.42%     162,171
Under-Sampling             374     50.80%    244,542
SMOTE                      6988    48.94%    4,273,083
CS – Rejection-Sampling    428     41.35%    231,428
CS – Over-Sampling         5767    31.24%    2,350,285
Cost Proportionate Sampling
[Figure: savings and F1-Score of Decision Trees, Logistic Regression and Random Forest trained on the Training, Under-Sampling, SMOTE, CS-Rejection and CS-Over sets]
Bayes Minimum Risk
• Decision model based on quantifying tradeoffs between various decisions using probabilities and the costs that accompany such decisions
• Risk of classification
$R(c_i = 0 \mid x_i) = C_{TN_i} (1 - \hat{p}_i) + C_{FN_i} \cdot \hat{p}_i$
$R(c_i = 1 \mid x_i) = C_{FP_i} (1 - \hat{p}_i) + C_{TP_i} \cdot \hat{p}_i$
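Bayes Minimum Risk
• Using the different risks, the prediction is made based on the following condition:
$c_i = \begin{cases} 0 & \text{if } R(c_i = 0 \mid x_i) \le R(c_i = 1 \mid x_i) \\ 1 & \text{otherwise} \end{cases}$
• Example-dependent threshold
$t_{BMR_i} = \frac{C_{FP_i} - C_{TN_i}}{C_{FN_i} - C_{TN_i} - C_{TP_i} + C_{FP_i}}$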
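The Bayes minimum risk rule can be sketched as follows. It assumes $\hat{p}_i$ is the churn probability estimated by any underlying classifier and that the costs are stored in a dictionary with keys "TP", "FP", "FN" and "TN"; the names and values are hypothetical.

```python
def bmr_predict(p_churn, cost):
    """Predict churn (1) only when its expected cost is strictly lower than predicting 0."""
    risk_0 = cost["TN"] * (1.0 - p_churn) + cost["FN"] * p_churn
    risk_1 = cost["FP"] * (1.0 - p_churn) + cost["TP"] * p_churn
    return 0 if risk_0 <= risk_1 else 1

def bmr_threshold(cost):
    """Probability above which predicting churn has the lower risk.

    Assumes the denominator is positive, which holds when C_FN > C_TP and C_FP > C_TN.
    """
    return (cost["FP"] - cost["TN"]) / (
        cost["FN"] - cost["TN"] - cost["TP"] + cost["FP"]
    )

# Hypothetical costs for one customer
cost = {"TP": 146.0, "FP": 51.0, "FN": 1000.0, "TN": 0.0}
threshold = bmr_threshold(cost)        # 51 / 905, roughly 0.056
low_risk = bmr_predict(0.05, cost)     # below the threshold
high_risk = bmr_predict(0.10, cost)    # above the threshold
```

For this customer a churn probability of only about 6% already justifies an offer, far below the usual 0.5 cut-off, because losing the customer is much more expensive than making the offer.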
Bayes Minimum Risk
[Figure: savings of Decision Trees, Logistic Regression and Random Forest, without and with BMR, trained on the Training, Under-Sampling, SMOTE, CS-Rejection and CS-Over sets]
Bayes Minimum Risk
[Figure: F1-Score of Decision Trees, Logistic Regression and Random Forest, without and with BMR, trained on the Training, Under-Sampling, SMOTE, CS-Rejection and CS-Over sets]
Bayes Minimum Risk
• Bayes Minimum Risk increases the savings by using a cost-insensitive method and then introducing the costs
• Why not introduce the costs during the estimation of the methods?
CS – Decision Trees
• Decision trees
Classification model that iteratively creates binary decision rules $(x^j, l^j_m)$ that maximize certain criteria
Where $(x^j, l^j_m)$ refers to making a rule using feature $j$ on value $m$
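CS – Decision Trees
• Decision trees – Construction
[Diagram: the rule $(x^j, l^j_m)$ splits the set $S$ into $S^l$ and $S^r$]
$S^l = \{X_i \mid X_i \in S \wedge x_i^j \le l^j_m\}$
$S^r = \{X_i \mid X_i \in S \wedge x_i^j > l^j_m\}$
• Then the impurity of each leaf is calculated using:
Misclassification: $I_m(\pi_1) = 1 - \max(\pi_1, 1 - \pi_1)$
Entropy: $I_e(\pi_1) = -\pi_1 \log \pi_1 - (1 - \pi_1) \log(1 - \pi_1)$
Gini: $I_g(\pi_1) = 2 \pi_1 (1 - \pi_1)$
$\pi_1$ is the percentage of positives.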
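The three impurity measures are simple functions of $\pi_1$. In the sketch below the natural logarithm is an assumption, since the slide does not state the base, and $0 \log 0$ is taken as $0$.

```python
import math

def misclassification(pi1):
    """Share of examples that disagree with the majority class."""
    return 1.0 - max(pi1, 1.0 - pi1)

def entropy(pi1):
    """Binary entropy; terms with zero probability are skipped so pure leaves score 0."""
    return -sum(p * math.log(p) for p in (pi1, 1.0 - pi1) if p > 0.0)

def gini(pi1):
    """Gini impurity for two classes."""
    return 2.0 * pi1 * (1.0 - pi1)

# All three measures are zero for a pure leaf and largest at pi1 = 0.5
balanced = (misclassification(0.5), entropy(0.5), gini(0.5))
```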
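CS – Decision Trees
• Decision trees – Construction
• Afterwards the gain of applying a given rule to the set $S$ is:
$Gain(x^j, l^j_m) = I(\pi_1) - \frac{|S^l|}{|S|} I(\pi_1^l) - \frac{|S^r|}{|S|} I(\pi_1^r)$
[Diagram: the rule $(x^j, l^j_m)$ splits the set $S$ into $S^l$ and $S^r$]
$S^l = \{X_i \mid X_i \in S \wedge x_i^j \le l^j_m\}$
$S^r = \{X_i \mid X_i \in S \wedge x_i^j > l^j_m\}$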
CS – Decision Trees
• Decision trees – Construction
• The rule that maximizes the gain is selected
$(best_x, best_l) = \arg\max_{(j,m)} Gain(x^j, l^j_m)$
• The process is repeated until a stopping criterion is met
[Diagram: the tree grows by recursively splitting each set $S$]
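A minimal sketch of the gain computation and the exhaustive search for the best rule, using Gini impurity. Taking every observed feature value as a candidate threshold is an assumption of this sketch, and the toy data is hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (0 for an empty list)."""
    if not labels:
        return 0.0
    pi1 = sum(labels) / len(labels)
    return 2.0 * pi1 * (1.0 - pi1)

def gain(X, y, j, threshold, impurity=gini):
    """Impurity reduction from splitting on feature j at the given threshold."""
    left = [label for row, label in zip(X, y) if row[j] <= threshold]
    right = [label for row, label in zip(X, y) if row[j] > threshold]
    n = len(y)
    return (impurity(y)
            - len(left) / n * impurity(left)
            - len(right) / n * impurity(right))

def best_split(X, y, impurity=gini):
    """Return the (feature index, threshold) pair with the largest gain."""
    candidates = [(j, row[j]) for row in X for j in range(len(row))]
    return max(candidates, key=lambda rule: gain(X, y, rule[0], rule[1], impurity))

# Toy data: feature 0 separates the classes perfectly at 2.0
X = [[1.0, 5.0], [2.0, 3.0], [3.0, 4.0], [4.0, 1.0]]
y = [0, 0, 1, 1]
best_j, best_l = best_split(X, y)
```

Growing a full tree repeats `best_split` on the left and right subsets until a stopping criterion, such as a minimum gain or a maximum depth, is reached.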
CS – Decision Trees
• Decision trees – Pruning
• Calculation of the Tree error $\epsilon(Tree)$ and of the pruned Tree error $\epsilon(EB(Tree, branch))$, combined into the pruning criterion:
$\frac{\epsilon(EB(Tree, branch)) - \epsilon(Tree)}{|Tree| - |EB(Tree, branch)|}$
• After calculating the pruning criterion for all possible pruned trees, the one with the maximum improvement is selected and the Tree is pruned.
• The process is then repeated until there is no further improvement.
[Diagram: a full tree and two successively pruned versions, each pruned tree annotated with the pruning criterion]
CS – Decision Trees
• Maximizing the accuracy is different from minimizing the cost
• To solve this, some studies have proposed methods that aim to introduce cost-sensitivity into the algorithms
• However, research has focused on class-dependent methods
• Instead we used:
Example-dependent cost-based impurity measure
Example-dependent cost-based pruning criterion
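CS – Decision Trees
• Cost based impurity measure
• The impurity of each leaf is calculated using:
$I_c(S) = \min\{C_0, C_1\}$
$f(S) = \begin{cases} 0 & \text{if } C_0 \le C_1 \\ 1 & \text{otherwise} \end{cases}$
[Diagram: the rule $(x^j, l^j_m)$ splits the set $S$ into $S^l$ and $S^r$]
$S^l = \{X_i \mid X_i \in S \wedge x_i^j \le l^j_m\}$
$S^r = \{X_i \mid X_i \in S \wedge x_i^j > l^j_m\}$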
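A sketch of the cost-based impurity under one assumed reading: $C_0$ and $C_1$ are the total costs of the examples in $S$ when all of them are predicted as non-churners or as churners, respectively. The function name, the cost dictionaries and the values are hypothetical.

```python
def cost_impurity(y, costs):
    """Return (impurity, leaf prediction) for the examples in a node.

    Assumed reading: C_0 / C_1 are the total costs of predicting every
    example in the node as 0 / 1.
    """
    cost_all_0 = sum(c["FN"] if label == 1 else c["TN"] for label, c in zip(y, costs))
    cost_all_1 = sum(c["TP"] if label == 1 else c["FP"] for label, c in zip(y, costs))
    prediction = 0 if cost_all_0 <= cost_all_1 else 1
    return min(cost_all_0, cost_all_1), prediction

# Hypothetical node with one churner and two non-churners
y = [1, 0, 0]
costs = [
    {"TP": 146.0, "FP": 51.0, "FN": 1000.0, "TN": 0.0},
    {"TP": 211.0, "FP": 21.0, "FN": 400.0, "TN": 0.0},
    {"TP": 100.0, "FP": 31.0, "FN": 300.0, "TN": 0.0},
]
impurity, leaf_prediction = cost_impurity(y, costs)
```

Here the node is labeled as churn even though two of its three customers are non-churners, because missing the single churner would cost more than making three offers. A majority-vote leaf would predict the opposite.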
CS – Decision Trees
• Cost sensitive pruning
$PC_c = \frac{C(EB(Tree, branch)) - C(Tree)}{|Tree| - |EB(Tree, branch)|}$
• New pruning criterion that evaluates the improvement in cost from eliminating a particular branch
CS – Decision Trees
[Figure: savings of Decision Trees (error pruning and cost pruning) and Cost-Sensitive Decision Trees trained on the Training, Under-Sampling, SMOTE, CS-Rejection and CS-Over sets]
CS – Decision Trees
[Figure: F1-Score of the decision tree models trained on the Training, Under-Sampling, SMOTE, CS-Rejection and CS-Over sets]
Comparison of Models
[Figure: savings and F1-Score of Random Forest (Training), Logistic Regression (CS-Rejection), Logistic Regression with BMR (Training), Decision Tree with cost pruning (CS-Rejection) and CS-Decision Tree (Training)]
Conclusions
• Selecting models based on traditional statistics does not give the best results as measured by savings
• Incorporating the costs into the modeling helps to achieve higher savings
Other Applications
• Fraud Detection
Correa Bahnsen et al. (2013). Cost Sensitive Credit Card Fraud Detection using Bayes Minimum Risk.
Correa Bahnsen et al. (2014). Improving Credit Card Fraud Detection with Calibrated Probabilities.
• Credit Scoring
Correa Bahnsen et al. (2014). Example-Dependent Cost-Sensitive Credit Scoring using Bayes Minimum Risk.
• Direct Marketing
Correa Bahnsen et al. (2014). Example-Dependent Cost-Sensitive Decision Trees.
Contact Information
Alejandro Correa Bahnsen
University of Luxembourg
Luxembourg
http://www.linkedin.com/in/albahnsen
http://www.slideshare.net/albahnsen
Andres Gonzalez Montoya
DIRECTV
Colombia
Thank you!
Alejandro Correa Bahnsen, Luxembourg University
Andres Felipe Gonzalez Montoya, DIRECTV