Data mining to improve e-mail marketing
Tayko: Smart Marketing Using Analytics
Business Problem
Tayko is a software catalog firm that sells games and educational software.
It wants to market a new collection through e-mail. As a member of an industry consortium, Tayko can pull 200,000 e-mail addresses from the consortium's central repository. To maximize the benefit, Tayko wants to pull the records with a high probability of response and a high expected sale value.
Analytics Problem
1. Create a classification model that groups customers as responders or purchasers (1) and non-responders or non-purchasers (0).
2. Create a prediction model that predicts the sale value for responders (1).
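One way to let both the response probability and the sale value drive which records are pulled is to rank candidates by expected spend: the classifier's purchase probability multiplied by the prediction model's spend estimate. This is a minimal sketch under that assumption; the function name `rank_by_expected_spend`, the tuple format and the example figures are hypothetical and not taken from the study.

```python
def rank_by_expected_spend(customers, top_n):
    """Rank customers by expected spend = P(purchase) * predicted spend.

    `customers` is a list of (customer_id, p_purchase, predicted_spend)
    tuples, a hypothetical format assumed for this sketch.
    """
    scored = [(cid, p * spend) for cid, p, spend in customers]
    # Highest expected spend first
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]


if __name__ == "__main__":
    candidates = [
        ("c1", 0.90, 50.0),   # likely buyer, small order: expected spend 45.0
        ("c2", 0.30, 400.0),  # unlikely buyer, large order: expected spend 120.0
        ("c3", 0.60, 100.0),  # expected spend 60.0
    ]
    for cid, score in rank_by_expected_spend(candidates, top_n=2):
        print(cid, round(score, 2))
```

In this toy example c2 outranks c1 despite its lower purchase probability, which is exactly the trade-off between response probability and sale value described in the business problem.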
Data Collection
Supervised learning techniques are applied because the desired outputs (the target variables) are already defined.
A sample of 2,000 customers is drawn from the central repository, and a test e-mail campaign is run on this sample.
The two target variables, Purchased and Spending, are recorded for the sample.
The test resulted in 1,000 purchasers and 1,000 non-purchasers.
Data partitioning
The data set is partitioned into:
• Training: 60% (1,200 records)
• Testing: 20% (400 records)
• Validation: 20% (400 records)
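The slides do not say how records were assigned to the three sets. A minimal sketch, assuming a simple random split with a fixed seed for reproducibility (the function name `partition` is hypothetical), could look like this:

```python
import random


def partition(records, seed=42):
    """Randomly split records into 60% training, 20% testing, 20% validation."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split reproducible
    n = len(shuffled)
    n_train = n * 60 // 100  # integer arithmetic avoids float rounding surprises
    n_test = n * 20 // 100
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    validation = shuffled[n_train + n_test:]  # the remainder, so no record is dropped
    return train, test, validation


if __name__ == "__main__":
    train, test, validation = partition(range(2000))
    print(len(train), len(test), len(validation))  # 1200 400 400
```

A stratified split (sampling purchasers and non-purchasers separately) would be an alternative if each set had to keep the 50/50 balance exactly.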
Initial Study: What kinds of variables are present?
Finding variables with strong differentiating power: Nominal Variables
Customers whose source catalog is A, T, U or P show a high percentage of purchasers.
Customers whose source catalog is O or H show a high percentage of non-purchasers.
However, only catalogs A and U are the source for more than 100 customers each, catalog H for more than 50, and each of the rest for fewer than 50. Because the catalogs are unevenly distributed, the percentages for the smaller catalogs rest on few customers.
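The comparison above needs two numbers per catalog: the purchase rate and the number of customers behind it. A minimal sketch, assuming each row pairs a source catalog with a 0/1 purchase flag (the function name and the `min_count=50` threshold are illustrative choices, not from the slides):

```python
from collections import defaultdict


def purchase_rate_by_category(rows, min_count=50):
    """Return {category: (count, purchase_rate, reliable)}.

    `rows` is an iterable of (category, purchased) pairs, where purchased is
    1 for a purchaser and 0 otherwise. `reliable` flags categories backed by
    at least `min_count` customers.
    """
    counts = defaultdict(int)
    buyers = defaultdict(int)
    for category, purchased in rows:
        counts[category] += 1
        buyers[category] += purchased
    return {
        cat: (counts[cat], buyers[cat] / counts[cat], counts[cat] >= min_count)
        for cat in counts
    }


if __name__ == "__main__":
    # Hypothetical data: catalog A is large, catalog T is small.
    rows = [("A", 1)] * 80 + [("A", 0)] * 40 + [("T", 1)] * 9 + [("T", 0)] * 1
    for cat, (n, rate, ok) in sorted(purchase_rate_by_category(rows).items()):
        print(cat, n, round(rate, 3), ok)
```

In this made-up example catalog T has the higher purchase rate, but it rests on only 10 customers, which is why the uneven catalog distribution matters.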
Other Nominal Variables
Among the other categorical variables, “Order Online” is the only one that shows some power to differentiate purchasers from non-purchasers.
Ordinal Variables
The number of purchases last year shows a clear trend. Customers who made no purchase last year did not make any purchase from the new catalogs either.
Customers who made more than 3 purchases last year all made a purchase this time as well.
Scale Variables
Of the two scale variables, “Last update to customer record” is the one that shows a significant difference in mean between purchasers and non-purchasers.
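The slides do not name the test used to judge the difference in means. One standard choice that does not assume equal variances is Welch's t-test; the sketch below computes its statistic and approximate degrees of freedom with the standard library only. The function name and example values are hypothetical. If SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False)` returns the same statistic together with a p-value.

```python
import math
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    # statistics.variance uses the n - 1 (sample) denominator
    var_mean_a = variance(group_a) / n_a  # squared standard error of mean A
    var_mean_b = variance(group_b) / n_b
    t = (mean(group_a) - mean(group_b)) / math.sqrt(var_mean_a + var_mean_b)
    df = (var_mean_a + var_mean_b) ** 2 / (
        var_mean_a ** 2 / (n_a - 1) + var_mean_b ** 2 / (n_b - 1)
    )
    return t, df


if __name__ == "__main__":
    # Hypothetical "days since last update" values for the two groups
    purchasers = [1, 2, 3]
    non_purchasers = [4, 5, 6]
    print(welch_t(purchasers, non_purchasers))
```

A large absolute t relative to the t distribution with `df` degrees of freedom indicates that the group means differ.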
Target Variables
Purchasers and non-purchasers are equally represented (1,000 each). However, the sale value, i.e. the amount spent by each customer, follows a non-normal distribution.
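One simple way to quantify how far a variable such as the amount spent departs from normality is its sample skewness, which is 0 for a symmetric distribution and positive for a long right tail. The slides do not say which measure was used; this is a minimal sketch of the moment-based (Fisher-Pearson) estimator, with a hypothetical function name:

```python
from statistics import fmean


def skewness(values):
    """Moment-based sample skewness: m3 / m2**1.5 (biased estimator)."""
    m = fmean(values)
    n = len(values)
    m2 = sum((x - m) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - m) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5


if __name__ == "__main__":
    print(skewness([1, 2, 3]))          # symmetric data
    print(skewness([1, 1, 1, 1, 10]))   # one large value creates a right tail
```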
Classification: Who will make a purchase?
Logistic Regression – Training
Final set of variables:
1. Frequency: number of transactions in the last year at the source catalog
2. Web Order: customer placed at least 1 order via the web
3. Address is Residence: address is a residence
4. Source_a, h or u: source catalog is A, H or U
Logistic Regression – Testing & Validation
Test overall accuracy: 80%
Validation overall accuracy: 77%
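Overall accuracy is the share of correctly classified records, taken from the confusion matrix. On the 400-record test set, 80% corresponds to 320 correct classifications; on the 400-record validation set, 77% corresponds to 308. A minimal sketch (function name hypothetical):

```python
def confusion_and_accuracy(actual, predicted):
    """Return 2x2 confusion counts and overall accuracy for 0/1 labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # purchasers found
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # non-purchasers found
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # false alarms
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # missed purchasers
    accuracy = (tp + tn) / len(pairs)
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn}, accuracy


if __name__ == "__main__":
    counts, acc = confusion_and_accuracy([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
    print(counts, acc)
```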
Decision Tree – Training
The CHAID growing method gave the best results.
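CHAID grows a tree by testing each candidate predictor against the target with a chi-square test and splitting on the most significant one. Full CHAID also merges similar categories and applies a Bonferroni adjustment to the p-values; the sketch below keeps only the core selection step and restricts itself to binary (0/1) predictors, where every test has one degree of freedom, so the largest statistic is also the smallest p-value. Predictor names and data are hypothetical.

```python
def chi_square_2x2(xs, ys):
    """Pearson chi-square statistic for two binary (0/1) variables."""
    n = len(xs)
    observed = [[0, 0], [0, 0]]
    for x, y in zip(xs, ys):
        observed[x][y] += 1
    row_totals = [sum(row) for row in observed]
    col_totals = [observed[0][j] + observed[1][j] for j in range(2)]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:  # a constant predictor contributes nothing
                stat += (observed[i][j] - expected) ** 2 / expected
    return stat


def best_binary_split(predictors, target):
    """Pick the binary predictor most associated with the target.

    `predictors` maps a name to a list of 0/1 values. All candidates have
    1 degree of freedom, so comparing statistics equals comparing p-values.
    """
    scores = {name: chi_square_2x2(vals, target) for name, vals in predictors.items()}
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    target = [1, 1, 1, 0, 0, 0]
    predictors = {
        "web_order": [1, 1, 1, 0, 0, 0],
        "address_is_res": [1, 0, 1, 0, 1, 0],
    }
    print(best_binary_split(predictors, target))
```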
Decision Tree – Test & Validate
Test overall accuracy: 76%
Validation overall accuracy: 74%
Result
Logistic regression performs better than the decision tree on both the test set (80% vs 76% overall accuracy) and the validation set (77% vs 74%).
Prediction: How much will a purchaser spend?
New Calculated Variables
• “last_update_days_ago” and “1st_update_days_ago” are highly correlated.
• A new calculated variable, DayDiff, is defined as the difference between the two variables.
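A minimal sketch of the correlation check and the derived variable, assuming records are dictionaries keyed by the two column names above and that DayDiff is taken as 1st_update_days_ago minus last_update_days_ago (the slides do not state the order; with this order the value is non-negative because the first update is the older one). Function names and values are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient (undefined if either list is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def add_day_diff(records):
    """Add DayDiff = 1st_update_days_ago - last_update_days_ago to each record."""
    for r in records:
        r["DayDiff"] = r["1st_update_days_ago"] - r["last_update_days_ago"]
    return records


if __name__ == "__main__":
    records = [
        {"1st_update_days_ago": 3000, "last_update_days_ago": 2900},
        {"1st_update_days_ago": 1200, "last_update_days_ago": 1000},
        {"1st_update_days_ago": 2000, "last_update_days_ago": 1950},
    ]
    first = [r["1st_update_days_ago"] for r in records]
    last = [r["last_update_days_ago"] for r in records]
    print(round(pearson(first, last), 3))
    print([r["DayDiff"] for r in add_day_diff(records)])
```

Replacing one of two highly correlated predictors with their difference keeps the information while reducing collinearity in the regression.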
Multiple Linear Regression
Pre-processing: univariate analysis and transformation of the target variable “Spend”
Outlier removal, filtering and transformation
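The slides do not specify the outlier rule or the transformation applied to Spend. As one common combination, this sketch drops values outside 1.5 interquartile ranges of the quartiles and then applies log(1 + x), which compresses a long right tail and tolerates zero values. Both choices, the function names and the sample values are assumptions.

```python
import math
from statistics import quantiles


def remove_outliers_iqr(values, k=1.5):
    """Keep values within [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)  # quartiles, default 'exclusive' method
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


def log_transform(values):
    """log(1 + x) transform for non-negative values such as amounts spent."""
    return [math.log1p(v) for v in values]


if __name__ == "__main__":
    spend = [10, 12, 13, 14, 15, 16, 18, 500]  # hypothetical amounts
    kept = remove_outliers_iqr(spend)
    print(kept)
    print([round(v, 3) for v in log_transform(kept)])
```

If a model is fitted on the transformed target, its predictions must be mapped back (here with `math.expm1`) before errors are reported in the original units.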
Model & Performance
Four models are generated, one for each combination of address type and web ordering:
Case 1: Non-residence address & not a web order (R-sqr: 0.569, Adj R-sqr: 0.566)
Spending = -15.733 + 79.11 * No. of transactions last year - 47.825 * Catalog D + 30.632 * Catalog U
Case 2: Non-residence address & web order (R-sqr: 0.62, Adj R-sqr: 0.616)
Spending = -42.285 + 115.976 * No. of transactions last year + 45.506 * Catalog U - 247.655 * Catalog H + 55.605 * Catalog R
Case 3: Residence address & not a web order (R-sqr: 0.516, Adj R-sqr: 0.507)
Spending = -26.965 + 69.218 * No. of transactions last year + 66.219 * Catalog U - 113.587 * Catalog H
Case 4: Residence address & web order (R-sqr: 0.612, Adj R-sqr: 0.592)
Spending = -4.616 + 65.114 * No. of transactions last year - 111.934 * Catalog H - 81.28 * Catalog R - 129.754 * Catalog C + 66.242 * Catalog A
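Because a different equation applies in each case, scoring a purchaser means first choosing the case from the two flags and then evaluating that equation. The sketch below copies the coefficients from the four models above and treats each catalog as a 0/1 dummy; it assumes, as the equations are written, that they predict Spending in its original units. The dictionary layout, the `freq` key and the function name are hypothetical.

```python
# Coefficients copied from the four fitted models.
# Key: (residence_address, web_order) -> (intercept, {predictor: coefficient});
# "freq" is the number of transactions last year, letters are source catalogs.
MODELS = {
    (False, False): (-15.733, {"freq": 79.11, "D": -47.825, "U": 30.632}),
    (False, True): (-42.285, {"freq": 115.976, "U": 45.506, "H": -247.655, "R": 55.605}),
    (True, False): (-26.965, {"freq": 69.218, "U": 66.219, "H": -113.587}),
    (True, True): (-4.616, {"freq": 65.114, "H": -111.934, "R": -81.28,
                            "C": -129.754, "A": 66.242}),
}


def predict_spending(residence, web_order, freq, catalogs):
    """Predicted spending for one purchaser.

    residence, web_order: booleans selecting the case.
    freq: number of transactions last year.
    catalogs: set of source catalog letters, e.g. {"U", "H"}.
    """
    intercept, coefs = MODELS[(residence, web_order)]
    total = intercept
    for name, coef in coefs.items():
        value = freq if name == "freq" else (1 if name in catalogs else 0)
        total += coef * value
    return total


if __name__ == "__main__":
    # Case 1, two transactions last year, source catalog U
    print(round(predict_spending(False, False, 2, {"U"}), 3))
```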
MAD (mean absolute deviation) & MAPE (mean absolute percentage error)
Training MAD : 68.89 MAPE : 103%
Test MAD : 104.53 MAPE : 109%
Validation MAD : 104.03 MAPE : 101%
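Both error measures compare actual and predicted spending record by record: MAD averages the absolute errors in currency units, and MAPE averages the absolute errors as a percentage of the actual value. A minimal sketch with hypothetical values:

```python
def mad(actual, predicted):
    """Mean absolute deviation between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent; actual values must be non-zero."""
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


if __name__ == "__main__":
    actual = [100, 200, 50]
    predicted = [150, 180, 100]
    print(mad(actual, predicted), round(mape(actual, predicted), 2))
```

A MAPE above 100%, as in every row above, means the average error is larger than the amount actually spent; small actual amounts inflate MAPE sharply, which is one reason MAD is reported alongside it.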
Regression Tree – Exhaustive CHAID
MAD & MAPE
Training MAD : 105.37 MAPE : 95%
Test MAD : 121.54 MAPE : 103%
Validation MAD : 121.31 MAPE : 113%
Decision
Both models are weak at predicting the amount spent: the error measures are high, with test and validation MAPE above 100% for both. One likely reason is the lack of scale variables and the high correlation between the few scale variables available. Since most variables are nominal, converting the prediction problem into a classification problem might produce better results, but this was outside the scope of the given problem.
Conclusion
The classification of customers into purchasers and non-purchasers shows good results, and the selected logistic regression model (77% validation accuracy) is expected to perform well in a live setting too.
However, the prediction models perform weakly, and a high degree of error is expected if they are used in their current state.