DATA SCIENCE FOR BUSINESS
Lecture 5. Customer relationship management. Churn prediction. Classification.
School of Data Analysis and Artificial Intelligence Department of Computer Science
Moscow, May 22nd, 2020.
CUSTOMER RELATIONSHIP MANAGEMENT
CRM: an approach to managing a company's interaction with current and potential customers to improve business relationships with customers, focusing on customer retention and increasing sales.
Company functions
• Marketing
• Sales
• Customer service support
CUSTOMER RELATIONSHIP MANAGEMENT
The service sector produces services rather than end products.
• Competitive landscape, easy to switch
• High customer acquisition cost
• Repetitive interaction (service)
• Large customer lifetime value (LTV)
Service sector industries
• Telecommunication
• Healthcare
• Education
• Financial services (banking, insurance, investment)
• Professional services (accounting, legal, consulting)
• Transportation
• Hospitality and tourism
• Mass media
• Real estate
• Utilities
PROPENSITY MODELING
Efficient customer communication and interaction
• Propensity to buy a new service or product
• Propensity to respond to an offer
• Propensity to churn
CLASSIFICATION ALGORITHMS
Examples of classification algorithms
• Logistic regression
• Decision trees
• k-nearest neighbors (KNN)
• Naïve Bayes
• Ensemble methods:
  • Random forest
  • Gradient boosting decision trees (xgboost)
• Neural networks and deep learning (DL)
Classification algorithms
• Binary classification
• Multi-class classification
Results:
• Class assignments
• Probability or score/ranking
LOGISTIC REGRESSION
Problem set-up:
• Target variable y: binary 0/1 (class)
• Predictors (x1, x2, x3): numerical
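A minimal sketch of how a fitted logistic regression turns numerical predictors into a class probability, assuming the standard sigmoid form of the model. The intercept, coefficients, and predictor values below are hypothetical and are not taken from the lecture.

```python
import math

def predict_proba(x, intercept, coefs):
    """Return P(y = 1 | x) for a logistic regression model."""
    z = intercept + sum(b * xi for b, xi in zip(coefs, x))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients for three numerical predictors (x1, x2, x3).
INTERCEPT = -1.0
COEFS = [0.8, -0.5, 0.3]

customer = [2.0, 1.0, 0.5]        # hypothetical predictor values
p = predict_proba(customer, INTERCEPT, COEFS)
label = 1 if p >= 0.5 else 0      # class assignment from the probability
print(f"P(y=1) = {p:.3f}, class = {label}")
```

The same model therefore yields both kinds of result: a probability (usable as a score for ranking) and a class assignment once a cut-off is chosen.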
DECISION TREE
• Decision node: feature/attribute test and rule (numerical or categorical)
• Leaf node: outcome (class in classification)
Problem set-up:
• Target variable: binary class / multi-class
• Predictors (x1, x2, x3): numerical or categorical
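The decision-node and leaf-node structure can be illustrated with a small hand-built tree. This is only a sketch: the feature names, the threshold, and the class labels are hypothetical, and a real tree would be learned from data rather than written by hand.

```python
# A hand-built tree; feature names, thresholds, and classes are hypothetical.
TREE = {
    "feature": "monthly_calls", "threshold": 10,    # numerical test: value <= 10 ?
    "yes": {"leaf": "churn"},
    "no": {
        "feature": "contract", "equals": "annual",  # categorical test
        "yes": {"leaf": "stay"},
        "no": {"leaf": "churn"},
    },
}

def predict(node, record):
    """Walk from the root to a leaf, applying one test per decision node."""
    while "leaf" not in node:
        value = record[node["feature"]]
        if "threshold" in node:
            passed = value <= node["threshold"]
        else:
            passed = value == node["equals"]
        node = node["yes"] if passed else node["no"]
    return node["leaf"]

print(predict(TREE, {"monthly_calls": 25, "contract": "annual"}))
```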
RANDOM FOREST
Problem set-up:
• Target variable: binary class / multi-class
• Predictors (x1, x2, x3): numerical or categorical
• Bootstrap sampling of examples
• Subsample of features/predictors per tree
• Voting for the result
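The three ingredients listed for a random forest can be sketched separately. The helper names and the toy data are hypothetical, and the tree-fitting step is deliberately left out, so this is an outline of the sampling and voting logic rather than a working forest.

```python
import random
from collections import Counter

def bootstrap_sample(rows, rng):
    """Draw len(rows) examples with replacement."""
    return [rng.choice(rows) for _ in rows]

def feature_subsample(n_features, k, rng):
    """Pick k distinct feature indices for one tree."""
    return sorted(rng.sample(range(n_features), k))

def majority_vote(predictions):
    """Return the class predicted by most trees."""
    return Counter(predictions).most_common(1)[0][0]

rng = random.Random(0)   # fixed seed for repeatability
rows = [([1.0, 0.2, 3.1], 1), ([0.4, 1.5, 2.2], 0),
        ([2.3, 0.1, 0.7], 1), ([0.9, 1.1, 1.8], 0)]   # hypothetical (x, y) pairs

for tree_id in range(3):
    sample = bootstrap_sample(rows, rng)
    features = feature_subsample(n_features=3, k=2, rng=rng)
    # A real forest would fit a decision tree on `sample` using `features` here.
    print(tree_id, len(sample), features)

print(majority_vote([1, 0, 1]))   # votes from three hypothetical trees
```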
MODEL TRAINING AND TESTING
The same rules apply as for regression algorithms.
CLASSIFICATION EVALUATION
Quality metrics

| Predicted \ Actual | Positive | Negative |
|---|---|---|
| Yes (+) | True positives (TP) | False positives (FP) |
| No (-) | False negatives (FN) | True negatives (TN) |

True positive = Predict event and event happens
True negative = Predict event does not happen, nothing happens
False positive = Predict event and event does not happen (false alarm)
False negative = Fail to predict event that does happen (missed alarm)
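The four outcomes defined above can be counted directly from paired actual and predicted labels. A minimal sketch; the label lists are hypothetical, with 1 meaning the event happens and 0 meaning it does not.

```python
def confusion_counts(actual, predicted):
    """Count TP, FP, FN, TN for binary labels (1 = event, 0 = no event)."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == 1 and a == 1:
            tp += 1   # predicted event, event happened
        elif p == 1 and a == 0:
            fp += 1   # false alarm
        elif p == 0 and a == 1:
            fn += 1   # missed alarm
        else:
            tn += 1   # predicted no event, nothing happened
    return tp, fp, fn, tn

actual    = [1, 1, 1, 0, 0, 0, 0, 1]   # hypothetical ground truth
predicted = [1, 0, 1, 0, 1, 0, 0, 1]   # hypothetical model output
print(confusion_counts(actual, predicted))   # (TP, FP, FN, TN)
```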
CLASSIFIER EVALUATION
Confusion matrix

| Predicted \ Actual | Positive | Negative |
|---|---|---|
| Yes (+) | True positives (TP) | False positives (FP) |
| No (-) | False negatives (FN) | True negatives (TN) |

Class balance of two example data sets:

| | Positive | Negative |
|---|---|---|
| Data set 1 | 50% | 50% |
| Data set 2 | 90% | 10% |

"Majority class" classifier (classifies every instance as positive)
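A short sketch of why accuracy alone can mislead: a classifier that labels every instance positive is scored on the two class balances from the slide. The data set size of 100 instances is an assumption made only to turn the percentages into counts.

```python
def accuracy(tp, fp, fn, tn):
    """Share of all instances that were classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)

def all_positive_counts(n_pos, n_neg):
    """Confusion counts (TP, FP, FN, TN) when every instance is predicted positive."""
    return n_pos, n_neg, 0, 0

# Assumed size: 100 instances per data set.
acc1 = accuracy(*all_positive_counts(50, 50))   # Data set 1: 50% positive
acc2 = accuracy(*all_positive_counts(90, 10))   # Data set 2: 90% positive
print(f"Data set 1: {acc1:.0%}, Data set 2: {acc2:.0%}")
```

The same trivial classifier scores 50% on the balanced data set and 90% on the skewed one, which is why the following slides normalize the confusion matrix by the class totals.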
CLASSIFIER EVALUATION
Normalized confusion matrix

Counts, with column totals P and N:

| Predicted \ Actual | Positive | Negative |
|---|---|---|
| Yes (+) | True positives (TP) | False positives (FP) |
| No (-) | False negatives (FN) | True negatives (TN) |
| Total | P | N |

Rates (each count divided by its column total):

| Predicted \ Actual | Positive | Negative |
|---|---|---|
| Yes (+) | True positive rate (tpr) | False positive rate (fpr) |
| No (-) | False negative rate (fnr) | True negative rate (tnr) |

tpr = TP / P = TP / (TP + FN)
fpr = FP / N = FP / (FP + TN)
fnr = FN / P = 1 - tpr
tnr = TN / N = 1 - fpr
Accuracy = (TP + TN) / (P + N) = tpr * P / (P + N) + (1 - fpr) * N / (P + N)
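The rate definitions and the accuracy identity can be checked numerically. A minimal sketch with hypothetical counts; the two accuracy expressions should agree up to floating-point rounding.

```python
def rates(tp, fp, fn, tn):
    """Normalize confusion counts by the actual-class totals P and N."""
    p, n = tp + fn, fp + tn
    return {"tpr": tp / p, "fpr": fp / n, "fnr": fn / p, "tnr": tn / n}

tp, fp, fn, tn = 40, 8, 60, 92   # hypothetical counts
p, n = tp + fn, fp + tn
r = rates(tp, fp, fn, tn)

acc_from_counts = (tp + tn) / (p + n)
acc_from_rates = r["tpr"] * p / (p + n) + (1 - r["fpr"]) * n / (p + n)
print(r)
print(acc_from_counts, acc_from_rates)
```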
CLASSIFIER EVALUATION
ROC space
Instance is positive, algorithm predicts positive: tpr = TP/P = 'sensitivity' = 'recall' (% of positives detected)
Instance is negative, algorithm predicts positive: fpr = FP/N = 'false alarm rate' = 1 - 'specificity' (% of negatives wrongly reacted to)

Example classifiers (normalized confusion matrices):

| Example | tpr | fpr | fnr | tnr |
|---|---|---|---|---|
| 1 | 0.4 | 0.08 | 0.6 | 0.92 |
| 2 | 0.9 | 0.6 | 0.1 | 0.4 |
| 3 | 0.6 | 0.6 | 0.4 | 0.4 |

Corner points of ROC space: "no positives" (never predicts positive) at (fpr, tpr) = (0, 0); "all positives" (always predicts positive) at (fpr, tpr) = (1, 1).
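Each classifier on this slide is a single point (fpr, tpr) in ROC space. The sketch below places the three example matrices and the two trivial classifiers in that space and compares each with the diagonal tpr = fpr, which corresponds to random guessing. The helper name is hypothetical.

```python
# (fpr, tpr) for the three example matrices, plus the two trivial classifiers.
points = {
    "example 1": (0.08, 0.4),
    "example 2": (0.6, 0.9),
    "example 3": (0.6, 0.6),
    "no positives": (0.0, 0.0),
    "all positives": (1.0, 1.0),
}

def gap_to_diagonal(fpr, tpr):
    """tpr - fpr: zero on the diagonal, positive above it."""
    return tpr - fpr

for name, (fpr, tpr) in points.items():
    gap = gap_to_diagonal(fpr, tpr)
    verdict = "better than random" if gap > 1e-12 else "no better than random"
    print(f"{name}: fpr={fpr}, tpr={tpr}, {verdict}")
```

Example 3 and both trivial classifiers sit on the diagonal, so only examples 1 and 2 carry information about the class.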
RANKING CLASSIFIERS
Threshold on ranking
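A ranking classifier outputs a score per instance, and a threshold on that score turns the ranking into class assignments. A minimal sketch with hypothetical scores, showing how lowering the threshold labels more instances positive.

```python
def classify(scores, threshold):
    """Label an instance positive (1) when its score reaches the threshold."""
    return [1 if s >= threshold else 0 for s in scores]

scores = [0.95, 0.80, 0.62, 0.41, 0.30, 0.12]   # hypothetical, ranked high to low
for threshold in (0.9, 0.5, 0.2):
    print(threshold, classify(scores, threshold))
```

Each threshold produces a different confusion matrix, and therefore a different point in ROC space.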
RECEIVER OPERATING CHARACTERISTIC (ROC) CURVE
RECEIVER OPERATING CHARACTERISTIC (ROC) CURVE
Classifier quality metric: ROC AUC
ROC AUC = area under the ROC curve
ROC AUC = 1: perfect
ROC AUC = 0.5: random (bad)
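A sketch of how the ROC curve and its area can be computed by sweeping the threshold down a ranking. The labels and scores are hypothetical, the scores are assumed to be distinct (ties would need extra handling), and the area uses the trapezoidal rule.

```python
def roc_points(actual, scores):
    """ROC points from moving the threshold down the ranking, one instance at a time."""
    p = sum(actual)
    n = len(actual) - p
    ranked = sorted(zip(scores, actual), reverse=True)
    points = [(0.0, 0.0)]          # threshold above every score: no positives
    tp = fp = 0
    for _, label in ranked:
        if label == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / n, tp / p))
    return points                  # last point is (1, 1): all positives

def auc(points):
    """Trapezoidal area under a curve given as (x, y) points sorted by x."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

actual = [1, 1, 0, 1, 0, 0]                      # hypothetical true classes
scores = [0.95, 0.80, 0.62, 0.41, 0.30, 0.12]    # hypothetical model scores
print(round(auc(roc_points(actual, scores)), 4))
```

For this toy ranking, 8 of the 9 positive/negative pairs are ordered correctly, which matches the computed area of 8/9.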
CUMULATIVE RESPONSE AND LIFT CURVES
Ranking classifiers
Cumulative response curves plot the percentage of positives correctly classified (tpr = TP/P) as a function of the percentage of instances in the data that are targeted, taken in order of decreasing score.
The lift of a classifier represents the advantage it provides over random guessing.
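A minimal sketch of both curves on a toy ranking. It assumes the common definition of lift as the cumulative response divided by the share of instances targeted, so that random guessing gives a lift of 1. Labels and scores are hypothetical.

```python
def cumulative_response(actual, scores):
    """Return (share targeted, share of positives captured) down the ranking."""
    p = sum(actual)
    ranked = [label for _, label in sorted(zip(scores, actual), reverse=True)]
    curve, found = [], 0
    for k, label in enumerate(ranked, start=1):
        found += label
        curve.append((k / len(ranked), found / p))
    return curve

def lift(curve):
    """Lift = captured share / targeted share; random guessing gives 1."""
    return [(x, y / x) for x, y in curve]

actual = [1, 1, 0, 1, 0, 0]                      # hypothetical true classes
scores = [0.95, 0.80, 0.62, 0.41, 0.30, 0.12]    # hypothetical model scores
curve = cumulative_response(actual, scores)
for (x, y), (_, l) in zip(curve, lift(curve)):
    print(f"targeted {x:.0%}: captured {y:.0%}, lift {l:.2f}")
```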
CUSTOMER CHURN MODELING
COST-SENSITIVE LEARNING
Cost-benefit analysis
Probability of each prediction given the actual class:
p(Y|p) = tpr
p(Y|n) = fpr
p(N|p) = fnr
p(N|n) = tnr
Class priors:
p(p) = P/(P+N)
p(n) = N/(P+N)
PROFIT CURVES
Profit from churn management campaign

Normalized confusion matrix:

| Predicted \ Actual | P | N |
|---|---|---|
| Y | 0.92 | 0.14 |
| N | 0.08 | 0.86 |

Benefit per customer:

| Predicted \ Actual | P | N |
|---|---|---|
| Y | $90 | -$10 |
| N | $0 | $0 |

Total (∑): $44.7
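A sketch of the cost-benefit calculation, assuming the usual expected-value form: each cell's benefit is weighted by the class prior and by the rate of that prediction within the class. The rates and benefits are the ones on the slide; the class prior is hypothetical, because the priors behind the slide's total are not in the transcript, so the printed value is not meant to reproduce that total.

```python
def expected_profit(prior_pos, rates, benefits):
    """Expected profit per customer from priors, rates, and per-cell benefits."""
    prior_neg = 1.0 - prior_pos
    positives = rates["Y|p"] * benefits["Y,p"] + rates["N|p"] * benefits["N,p"]
    negatives = rates["Y|n"] * benefits["Y,n"] + rates["N|n"] * benefits["N,n"]
    return prior_pos * positives + prior_neg * negatives

rates = {"Y|p": 0.92, "Y|n": 0.14, "N|p": 0.08, "N|n": 0.86}       # from the slide
benefits = {"Y,p": 90.0, "Y,n": -10.0, "N,p": 0.0, "N,n": 0.0}     # from the slide
PRIOR_POS = 0.5   # hypothetical p(p); not given in the transcript

profit = expected_profit(PRIOR_POS, rates, benefits)
print(f"expected profit per customer: ${profit:.2f}")
```

Evaluating this at every threshold of a ranking classifier gives one profit value per threshold, which is the idea behind a profit curve.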
ONE MORE BOOK