Predictive Analytics for Everyone! Building CART Models using R - Chantal D. Larose, Ph.D.

Predictive Analytics for Everyone! Building CART Models using R Chantal Larose Assistant Professor of Decision Science (Statistics) School of Business, SUNY New Paltz DASH Lab Workshop March 8 2017

Upload: chantal-larose

Post on 07-Apr-2017

Predictive Analytics for Everyone! Building CART Models using R

Chantal Larose

Assistant Professor of Decision Science (Statistics), School of Business, SUNY New Paltz

DASH Lab Workshop, March 8, 2017

Why Predictive Analytics?

Sports, healthcare, customer service – the world is full of data!

Fun to pull stories out of a mess of numbers

Examples:

A Logistic Regression Approach to Predicting Who Will Make the NBA Playoffs [1]

Data Mining Major League Baseball's Pace of Play Problem [2]

More sports applications at: New England Symposium on Statistics in Sports, Saturday, September 23, 2017

[1] Ryan Elmore, Department of Business Information and Analytics, Daniels School of Business, University of Denver

[2] Aaron Crowley, Zhuolin He, and Rachael Hageman Blair, Department of Biostatistics, State University of New York at Buffalo


Why R?

Open source, free to download

Active and helpful community

Appeals to non-programmers: Different user interfaces (e.g. RStudio) allow point-and-click operation for some tasks

Appeals to programmers: Customizable – program your own functions, etc.


Set-up for the Workshop

Open up RStudio on your laptop (Apps → Other → RStudio)

Go to the Workshop’s website: hawksites.newpaltz.edu/dashlab/predictive-analytics-for-everyone/

Download the Churn data set (.csv file)

Download the Adult data set (.csv file)

Download the Do It Yourself! guide (.R file)

The analyses in this workshop are covered in more detail in Data Mining and Predictive Analytics, Second Edition. Larose & Larose, Wiley, 2015.


Getting Acquainted with R

Open the Do It Yourself! R file.


Getting Acquainted with R

Input the data set:
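The screenshot for this step did not survive in the transcript. As a minimal sketch of reading a .csv file into R: the file name churn_demo.csv, its four columns, and its three rows are invented stand-ins, written to a temporary folder so the snippet runs anywhere. In the workshop you would point read.csv() at the Churn file you downloaded instead.

```r
# ASSUMPTION: this tiny file is an invented stand-in for the real Churn .csv
csv_path <- file.path(tempdir(), "churn_demo.csv")
writeLines(c(
  "Day Mins,CustServ Calls,VMail Plan,Churn",
  "250.0,1,yes,False.",
  "150.0,5,no,True.",
  "180.0,0,no,False."
), csv_path)

# read.csv() turns the file into a data frame;
# stringsAsFactors = TRUE stores text columns as categorical variables
churn <- read.csv(csv_path, stringsAsFactors = TRUE)

# By default, spaces in column names become dots (e.g. "Day Mins" -> Day.Mins)
names(churn)

# str() lists each variable with its type and first few values
str(churn)
```

If the file sits in your Downloads folder, the only change needed is the path passed to read.csv().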


Getting Acquainted with R

Let’s look at some code:


Getting Acquainted with R

How do we tell R to run the code?

1. Highlight the code and press the Run button

2. Put your cursor on the line you want to run and press Control+Enter (no need to highlight code)


Activity 1: CART Models

We want to predict the value of one variable, using other variables

For our first example:

We want to predict the value of Churn,

i.e. whether or not a customer leaves our company.

We will predict Churn using variables such as:

Day Mins: How many minutes during the day a customer uses their phone [3]

CustServ Calls: How many times a customer has called customer service

VMail Plan: Whether or not a customer has the voicemail plan

[3] The data are from a time when day and evening charges were different


Activity 1: CART Models

Wait – What about regression?

The data may be too messy to meet the normality requirements, even with transformations

Regression interpretations get very complex very fast (especially with transformations)

The data set is too large! At some point you have so many records that the F and t tests from regression will come back significant even for effects too small to matter in practice

CART models generate easy-to-understand “decision rules” (IF this, THEN that) that make intuitive sense
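The large-sample point can be illustrated with a short simulation. This is only a sketch: the sample size, the seed, and the tiny true slope of 0.01 are invented for the illustration and have nothing to do with the Churn data.

```r
# Simulated data: y depends on x only very weakly (slope 0.01 is an invented value)
set.seed(2017)
n <- 1e6
x <- rnorm(n)
y <- 0.01 * x + rnorm(n)

# Ordinary least-squares regression of y on x
fit <- lm(y ~ x)
s   <- summary(fit)

# p-value of the t test for the slope, and the share of variance explained
p_value   <- coef(s)["x", "Pr(>|t|)"]
r_squared <- s$r.squared

cat("p-value for slope:", format(p_value, digits = 3), "\n")
cat("R-squared:        ", format(r_squared, digits = 3), "\n")
```

With a million records the slope tests as highly significant, yet x explains almost none of the variation in y. Statistical significance alone says little about practical importance at this scale.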


Activity 1: CART Models - Setup
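The setup code shown on these slides is not reproduced in the transcript. The sketch below shows what fitting a CART model in R can look like, assuming the rpart package (an assumption; the slides do not name the package here). The data are simulated, and the churn mechanism, column names, and probabilities are invented so that the example runs without the workshop files.

```r
library(rpart)  # ASSUMPTION: rpart is used for CART; it ships with standard R installs

# Simulated, churn-like data (NOT the workshop's Churn data set)
set.seed(42)
n <- 2000
churn_sim <- data.frame(
  Day.Mins       = runif(n, 0, 350),
  CustServ.Calls = rpois(n, 1.5),
  VMail.Plan     = factor(sample(c("no", "yes"), n, replace = TRUE,
                                 prob = c(0.7, 0.3)))
)

# Invented mechanism: heavy day users without voicemail, and frequent
# callers to customer service, are more likely to leave
p_churn <- with(churn_sim,
  0.05 + 0.60 * (Day.Mins >= 264 & VMail.Plan == "no") +
         0.45 * (CustServ.Calls >= 4))
p_churn <- pmin(p_churn, 0.95)
churn_sim$Churn <- factor(ifelse(runif(n) < p_churn, "True.", "False."))

# Fit a classification tree: predict Churn from the other three variables
fit <- rpart(Churn ~ Day.Mins + CustServ.Calls + VMail.Plan,
             data = churn_sim, method = "class")
print(fit)  # text listing of the splits and node counts

# Predicted class for each record, and share predicted correctly
pred     <- predict(fit, newdata = churn_sim, type = "class")
accuracy <- mean(pred == churn_sim$Churn)
baseline <- max(prop.table(table(churn_sim$Churn)))
cat("Tree accuracy:", round(accuracy, 3),
    " Always-guess-majority accuracy:", round(baseline, 3), "\n")
```

The formula Churn ~ ... reads as "predict Churn using these variables"; method = "class" asks for a classification tree because the target is categorical.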


Activity 1: CART Models

CART Model to Predict Churn. Each node: predicted class, # Correct / Total

Root: False. 2850 / 3333
  Day Mins < 264: False. 2766 / 3122
    CustServ Calls < 3.5: False. 2642 / 2871
      Int'l Plan = no: False. 2476 / 2604
        Day Mins < 223: False. 2161 / 2221
        Day Mins >= 223: False. 315 / 383
          Eve Mins < 260: False. 298 / 332
          Eve Mins >= 260: True. 34 / 51
            VMail Plan = yes: False. 11 / 11
            VMail Plan = no: True. 34 / 40
      Int'l Plan = yes: False. 166 / 267
        Intl Calls >= 2.5: False. 166 / 216
          Intl Mins < 13: False. 166 / 173
          Intl Mins >= 13: True. 43 / 43
        Intl Calls < 2.5: True. 51 / 51
    CustServ Calls >= 3.5: True. 127 / 251
      Day Mins >= 160: False. 111 / 149
        Eve Mins >= 142: False. 106 / 130
          Day Mins >= 176: False. 86 / 96
          Day Mins < 176: False. 20 / 34
            Eve Mins >= 212: False. 18 / 18
            Eve Mins < 212: True. 14 / 16
        Eve Mins < 142: True. 14 / 19
      Day Mins < 160: True. 89 / 102
  Day Mins >= 264: True. 127 / 211
    VMail Plan = yes: False. 47 / 53
    VMail Plan = no: True. 121 / 158
      Eve Mins < 188: False. 32 / 57
        Day Mins < 278: False. 21 / 25
        Day Mins >= 278: True. 21 / 32
          Eve Mins < 144: False. 7 / 8
          Eve Mins >= 144: True. 20 / 24
      Eve Mins >= 188: True. 96 / 101
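The next slide shows a smaller tree that keeps only the top splits of this one. One common way to get a smaller tree in rpart is cost-complexity pruning with prune(); whether the workshop used that route is an assumption. The sketch below grows a large tree on simulated data (invented churn mechanism and column names) and then prunes it. The cp values 0.001 and 0.02 are arbitrary choices for illustration.

```r
library(rpart)

# Simulated, churn-like data (NOT the workshop's Churn data set)
set.seed(8)
n <- 3000
sim <- data.frame(
  Day.Mins       = runif(n, 0, 350),
  Eve.Mins       = runif(n, 0, 350),
  CustServ.Calls = rpois(n, 1.5)
)
# Invented churn mechanism, for illustration only
p <- with(sim, 0.05 + 0.55 * (Day.Mins >= 264) +
                      0.45 * (CustServ.Calls >= 4) +
                      0.10 * (Eve.Mins >= 260))
sim$Churn <- factor(ifelse(runif(n) < pmin(p, 0.95), "True.", "False."))

# A low complexity parameter (cp) lets the tree grow many splits
big_tree <- rpart(Churn ~ ., data = sim, method = "class",
                  control = rpart.control(cp = 0.001))

# printcp() lists candidate cp values with their cross-validated error
printcp(big_tree)

# prune() removes the splits that do not pay for themselves at the given cp
small_tree <- prune(big_tree, cp = 0.02)

# Helper (hypothetical name): count terminal nodes of an rpart tree
n_leaves <- function(tree) sum(tree$frame$var == "<leaf>")
cat("Leaves before pruning:", n_leaves(big_tree), "\n")
cat("Leaves after pruning: ", n_leaves(small_tree), "\n")
```

A smaller tree gives fewer, broader decision rules, which are easier to explain at the cost of some detail.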


Activity 1: CART Models

CART Model to Predict Churn. Each node: predicted class, # Correct / Total

Root: False. 2850 / 3333
  Day Mins < 264: False. 2766 / 3122
    CustServ Calls < 3.5: False. 2642 / 2871
    CustServ Calls >= 3.5: True. 127 / 251
      Day Mins >= 160: False. 111 / 149
      Day Mins < 160: True. 89 / 102
  Day Mins >= 264: True. 127 / 211
    VMail Plan = yes: False. 47 / 53
    VMail Plan = no: True. 121 / 158
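Each terminal node of this smaller tree can be read as one decision rule. The sketch below types in the five terminal nodes and computes a confidence for each rule as # correct / total. Two assumptions are involved: the rule wording is a reading of the tree from the node counts on the slide, and "confidence" is taken to mean the share of records in the node that the rule classifies correctly.

```r
# Terminal nodes of the smaller tree, typed in from the slide's node counts
leaves <- data.frame(
  rule = c(
    "IF Day Mins < 264 AND CustServ Calls < 3.5",
    "IF 160 <= Day Mins < 264 AND CustServ Calls >= 3.5",
    "IF Day Mins < 160 AND CustServ Calls >= 3.5",
    "IF Day Mins >= 264 AND VMail Plan = yes",
    "IF Day Mins >= 264 AND VMail Plan = no"
  ),
  predicted = c("False.", "False.", "True.", "False.", "True."),
  correct   = c(2642, 111, 89, 47, 121),
  total     = c(2871, 149, 102, 53, 158)
)

# ASSUMPTION: confidence of a rule = # correct / total in its terminal node
leaves$confidence <- leaves$correct / leaves$total

# Share of all records classified correctly by the tree as a whole
overall_accuracy <- sum(leaves$correct) / sum(leaves$total)

print(transform(leaves, confidence = round(confidence, 3)), row.names = FALSE)
cat("Overall accuracy:", round(overall_accuracy, 3), "\n")
```

For example, the last row reads: IF Day Mins >= 264 AND VMail Plan = no, THEN Churn = True., which is right for 121 of the 158 customers in that node.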


Activity 2: On Your Own!

After you complete the Churn example, go to Line 75 to begin Example 2.

All the code you need is there. Follow the directions and run the code.

Task: After building the CART model, use the model to find at least two decision rules. State the confidence level of each one.
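As a hedged sketch of how decision rules and their confidence levels can be pulled out of a fitted tree in code rather than read off the plot: the example assumes the rpart package and uses the kyphosis data set that ships with rpart as a stand-in, since the Adult data are not reproduced here. Confidence is again taken to mean the share of records in a terminal node that belong to the predicted class.

```r
library(rpart)

# Stand-in model on rpart's built-in kyphosis data (NOT the Adult data set)
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, method = "class")

# fit$frame has one row per node; terminal nodes are marked "<leaf>"
frame    <- fit$frame
is_leaf  <- frame$var == "<leaf>"
leaf_ids <- as.numeric(rownames(frame)[is_leaf])

# path.rpart() returns the chain of split conditions leading to each node
paths <- path.rpart(fit, nodes = leaf_ids, print.it = FALSE)

rules <- data.frame(
  # drop the leading "root" entry and join the remaining conditions
  rule       = sapply(paths, function(p) paste(p[-1], collapse = " AND ")),
  predicted  = attr(fit, "ylevels")[frame$yval[is_leaf]],
  n          = frame$n[is_leaf],
  # for a default classification tree, dev is the number misclassified in the node
  confidence = 1 - frame$dev[is_leaf] / frame$n[is_leaf]
)

print(transform(rules, confidence = round(confidence, 3)), row.names = FALSE)
```

Each row is one rule of the form IF rule THEN predicted, together with the number of records it covers and its confidence.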
