
Page 1: Other Modeling Techniques

© Deloitte Consulting, 2004

Other Modeling Techniques

James Guszcza, FCAS, MAAA

CAS Predictive Modeling Seminar

Chicago

October, 2004

Page 2: Other Modeling Techniques


Agenda

CART overview

Case study: Spam Detection

Page 3: Other Modeling Techniques


CART

Classification And Regression Trees

Page 4: Other Modeling Techniques


CART

Developed by Breiman, Friedman, Olshen, and Stone in the early 1980s.

Jerome Friedman wrote the original CART software (Fortran) to accompany the original CART monograph (1984).

One of many tree-based modeling techniques: CART, CHAID, C5.0, and software-package variants.

Page 5: Other Modeling Techniques


Preface

“Tree Methodology… is a child of the computer age. Unlike many other statistical procedures which were moved from pencil and paper to calculators and then to computers, this use of trees was unthinkable before computers” --Breiman, Friedman, Olshen, Stone

Page 6: Other Modeling Techniques


The Basic Idea

Recursive Partitioning:

Take all of your data.

Consider all possible values of all variables.

Select the variable/value (X = t1) that produces the greatest "separation" in the target. (X = t1) is called a "split".

If X < t1, send the data point to the "left"; otherwise, send it to the "right".

Now repeat the same process on these two "nodes". CART uses only binary splits.

Page 7: Other Modeling Techniques


Let’s Split

Suppose you have 3 variables:

# vehicles: {1, 2, 3, …, 10+}
Age category: {1, 2, 3, …, 6}
Liability-only: {0, 1}

At each iteration, CART tests all 15 possible splits:

(#veh < 2), (#veh < 3), …, (#veh < 10)
(age < 2), …, (age < 6)
(lia < 1)

Select the split resulting in the greatest marginal purity.
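The split search on this slide can be sketched in a few lines of Python. This is an illustrative toy, not the CART software; the tiny data set and field names (`veh`, `age`, `lia`) are invented:

```python
# Enumerate the 15 candidate binary splits described above and pick the one
# with the lowest weighted Gini impurity of the two child nodes.

def gini(labels):
    """Gini impurity p*(1-p) of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return p * (1 - p)

def best_split(rows, target, candidates):
    """candidates: list of (variable, threshold); the split rule is var < threshold."""
    best = None
    for var, t in candidates:
        left  = [y for r, y in zip(rows, target) if r[var] < t]
        right = [y for r, y in zip(rows, target) if r[var] >= t]
        # weighted average impurity of the two child nodes
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(target)
        if best is None or score < best[0]:
            best = (score, var, t)
    return best

# Tiny made-up portfolio: (#vehicles, age category, liability-only) -> claim?
rows = [{"veh": 1, "age": 3, "lia": 1}, {"veh": 8, "age": 2, "lia": 0},
        {"veh": 9, "age": 1, "lia": 0}, {"veh": 2, "age": 5, "lia": 0}]
target = [0, 1, 1, 0]

candidates = ([("veh", t) for t in range(2, 11)]     # (#veh<2) ... (#veh<10)
              + [("age", t) for t in range(2, 7)]    # (age<2) ... (age<6)
              + [("lia", 1)])                        # (lia<1)
print(len(candidates))          # 15 candidate splits
print(best_split(rows, target, candidates))
```

In this toy data, the split (#veh < 3) separates claims from non-claims perfectly, so the search returns it with impurity 0.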

Page 8: Other Modeling Techniques


Classification Tree Example: predict likelihood of a claim

Root (Node 1, split on NUM_VEH), N = 57203:
  class 0: 37891 (66.2%); class 1: 19312 (33.8%)

NUM_VEH <= 4.5 -> Terminal Node 1, N = 36359:
  class 0: 29083 (80.0%); class 1: 7276 (20.0%)

NUM_VEH > 4.5 -> Terminal Node 2, N = 20844:
  class 0: 8808 (42.3%); class 1: 12036 (57.7%)

Page 9: Other Modeling Techniques


Classification Tree Example: predict likelihood of a claim

Root (Node 1, split on NUM_VEH), N = 57203

NUM_VEH <= 4.5 -> Node 2 (split on LIAB_ONLY), N = 36359
  LIAB_ONLY <= 0.5 -> Node 3 (split on FREQ1_F_RPT), N = 28489
    FREQ1_F_RPT <= 0.5 -> Terminal Node 1 (Class = 0), N = 24122:
      class 0: 18984 (78.7%); class 1: 5138 (21.3%)
    FREQ1_F_RPT > 0.5 -> Terminal Node 2 (Class = 1), N = 4367:
      class 0: 2508 (57.4%); class 1: 1859 (42.6%)
  LIAB_ONLY > 0.5 -> Terminal Node 3 (Class = 0), N = 7870:
      class 0: 7591 (96.5%); class 1: 279 (3.5%)

NUM_VEH > 4.5 -> Node 4 (split on NUM_VEH again), N = 20844
  NUM_VEH <= 10.5 -> Node 5 (split on AVGAGE_CAT), N = 11707
    AVGAGE_CAT <= 8.5 -> Terminal Node 4 (Class = 1), N = 8998:
      class 0: 4327 (48.1%); class 1: 4671 (51.9%)
    AVGAGE_CAT > 8.5 -> Terminal Node 5 (Class = 0), N = 2709:
      class 0: 2072 (76.5%); class 1: 637 (23.5%)
  NUM_VEH > 10.5 -> Terminal Node 6 (Class = 1), N = 9137:
      class 0: 2409 (26.4%); class 1: 6728 (73.6%)

Page 10: Other Modeling Techniques


Categorical Splits

Categorical predictors: CART considers every possible subset of categories

Left (1st split): dump, farm, no truck

Right (1st split): contractor, hauling, food delivery, special delivery, waste, other

Root (Node 1, split on LINE_IND$), N = 38300
  = ("dump", …) -> Terminal Node 1, N = 11641
  = ("contr", …) -> Node 2 (LINE_IND$), N = 26659
    = ("contr", …) -> Terminal Node 4, N = 25758
    = ("hauling", …) -> Node 3 (LINE_IND$), N = 901
      = ("hauling") -> Terminal Node 2, N = 652
      = ("specDel") -> Terminal Node 3, N = 249
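A short sketch of what "every possible subset of categories" means for a categorical predictor. Category names follow the slide; the counting fact used, 2^(k-1) - 1 distinct binary splits for k categories, is standard (a subset and its complement describe the same split):

```python
# Enumerate the distinct binary splits of a categorical predictor.
from itertools import combinations

cats = ["dump", "farm", "no truck", "contractor", "hauling",
        "food delivery", "special delivery", "waste", "other"]

# Each candidate split sends one non-empty proper subset left, the rest right.
# Complementary subsets describe the same split, so fix one category ("dump")
# on the left to avoid counting each split twice.
splits = []
for r in range(1, len(cats)):            # non-empty proper subsets only
    for left in combinations(cats, r):
        if "dump" in left:
            splits.append(set(left))

print(len(splits))  # 2**(9-1) - 1 = 255 distinct binary splits
```

This exponential growth in candidates is why categorical predictors with many levels are expensive for tree algorithms.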

Page 11: Other Modeling Techniques


Gains Chart

Node 6: 16% of policies, 35% of claims.

Node 4: 16% of policies, 24% of claims.

Node 2: 8% of policies, 10% of claims.

…etc.

The higher the gains chart, the stronger the model.

Page 12: Other Modeling Techniques


Splitting Rules

Select the variable/value (X = t1) that produces the greatest "separation" in the target variable.

"Separation" can be defined in many ways:

Regression Trees (continuous target): use sum of squared errors.

Classification Trees (categorical target): choice of entropy, Gini measure, or the "twoing" splitting rule.

Page 13: Other Modeling Techniques


Regression Trees

Tree-based modeling for a continuous target variable.

The most intuitively appropriate method for loss ratio analysis.

Find the split that produces the greatest separation in Σ(y - E(y))².

I.e.: find nodes with minimal within-node variance, and therefore greatest between-node variance (like credibility theory).

Every record in a node is assigned the same yhat, so the model is a step function.
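A minimal sketch of this criterion: each child node predicts its mean, and the best split minimizes the total squared error. The one-predictor data set is invented for illustration:

```python
# Regression-tree split search: minimize within-node sum of squared errors.

def sse(ys):
    """Sum of squared deviations from the node mean (within-node variance term)."""
    if not ys:
        return 0.0
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys)

def split_sse(xs, ys, t):
    """Total SSE of the two child nodes produced by the split x < t."""
    left  = [y for x, y in zip(xs, ys) if x < t]
    right = [y for x, y in zip(xs, ys) if x >= t]
    return sse(left) + sse(right)

xs = [1, 2, 3, 10, 11, 12]           # a single predictor
ys = [0.5, 0.6, 0.4, 2.0, 2.2, 2.1]  # loss ratios, say

# Splitting between the two clusters minimizes within-node variance:
best_t = min(range(2, 13), key=lambda t: split_sse(xs, ys, t))
print(best_t, split_sse(xs, ys, best_t))
```

Any threshold between the clusters (4 through 10) gives the same minimal error; `min` returns the first such threshold.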

Page 14: Other Modeling Techniques


Classification Trees

Tree-based modeling for a discrete target variable.

In contrast with regression trees, various measures of purity are used.

Common measures of purity: Gini, entropy, "twoing".

Intuition: an ideal retention model would produce nodes that contain either defectors only or non-defectors only: completely pure nodes.

Page 15: Other Modeling Techniques


More on Splitting Criteria

Gini measure of a node: p(1 - p), where p = relative frequency of defectors.

Entropy of a node: -Σ p·log(p), which in the binary case is -[p·log(p) + (1 - p)·log(1 - p)].

Entropy/Gini are maximized when p = 0.5, and minimized when p = 0 or 1.

Gini might produce small but pure nodes.

The "twoing" rule strikes a balance between purity and creating roughly equal-sized nodes.
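The purity measures above are easy to check numerically in the binary case:

```python
# Gini and entropy for a binary node with defector frequency p.
import math

def gini(p):
    return p * (1 - p)

def entropy(p):
    if p in (0.0, 1.0):
        return 0.0                      # lim p*log(p) = 0 as p -> 0
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))

# Both are maximized at p = 0.5 (maximally impure node)
# and minimized at p = 0 or 1 (completely pure node).
for p in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"p={p:<4}: gini={gini(p):.3f}  entropy={entropy(p):.3f}")
```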

Page 16: Other Modeling Techniques


Classification Trees vs. Regression Trees

Classification Trees:

Splitting criteria: Gini, entropy, twoing.
Goodness-of-fit measure: misclassification rates.
Prior probabilities and misclassification costs available as model "tuning parameters".

Regression Trees:

Splitting criterion: sum of squared errors.
Goodness of fit: the same measure! Sum of squared errors.
No priors or misclassification costs… just let it run.

Page 17: Other Modeling Techniques


CART advantages

Nonparametric (no probabilistic assumptions).

Automatically performs variable selection.

Uses any combination of continuous/discrete variables.

Discovers "interactions" among variables.

Page 18: Other Modeling Techniques


CART advantages

CART handles missing values automatically, using "surrogate splits".

Invariant to monotonic transformations of predictive variables.

Not sensitive to outliers in predictive variables.

Great way to explore and visualize data.

Page 19: Other Modeling Techniques


CART Disadvantages

The model is a step function, not a continuous score. So if a tree has 10 terminal nodes, yhat can take on only 10 possible values. MARS improves on this.

Might take a large tree to get good lift, but large trees are hard to interpret.

Instability of model structure: with correlated variables, random data fluctuations can result in entirely different trees.

CART does a poor job of modeling linear structure.

Page 20: Other Modeling Techniques


Case Study

Spam Detection

CART
MARS
Neural Nets
GLM

Page 21: Other Modeling Techniques


The Data

Goal: build a model to predict whether an incoming email is spam.

Analogous to insurance fraud detection.

About 6000 data points, each representing an email message sent to an HP scientist.

Binary target variable:
1 = the message was spam
0 = the message was not spam

Predictive variables created based on frequencies of various words & characters.

Page 22: Other Modeling Techniques


The Predictive Variables

57 variables created:
Frequency of "George" (the scientist's first name)
Frequency of "!", "$", etc.
Frequency of long strings of capital letters
Frequency of "receive", "free", "credit", …
Etc.

Variable creation required insight that (as yet) can't be automated.

Analogous to the insurance variables an insightful actuary or underwriter can create.
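The flavor of this variable creation can be sketched as follows. This is a simplified toy; the actual spam data used somewhat different definitions (e.g. percentages of words, and run lengths of capital letters):

```python
# Turn raw email text into word/character frequency features.

def make_features(text, words=("george", "free", "credit", "receive")):
    tokens = text.lower().split()
    n = max(len(tokens), 1)
    # word frequencies, as a share of all tokens
    feats = {f"freq_{w}": tokens.count(w) / n for w in words}
    # character frequencies, as a share of all characters
    feats["freq_!"] = text.count("!") / max(len(text), 1)
    feats["freq_$"] = text.count("$") / max(len(text), 1)
    # longest run of capital letters, a crude version of capLen
    run = best = 0
    for ch in text:
        run = run + 1 if ch.isupper() else 0
        best = max(best, run)
    feats["capLen_max"] = best
    return feats

print(make_features("FREE credit NOW!!! Send $$$ to george"))
```

Feeding a model the raw text would be hopeless for 2004-era techniques; features like these are where the predictive power comes from.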

Page 23: Other Modeling Techniques


Methodology

Divide data 60%-40% into train-test.

Use multiple techniques to fit models on the train data.

Apply the models to the test data.

Compare their power using gains charts.
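The evaluation step can be sketched end-to-end with a toy scoring model. Here `score` is a stand-in for any fitted model's prediction, and the test data are simulated (spam records get a higher average score by construction):

```python
# Build a gains chart: sort test records by model score, then accumulate
# the share of spam captured as we work down the ranking.
import random

random.seed(0)
# fake test data: (model_score, is_spam), 40% spam as in the case study
data = [(random.random() + 0.5 * y, y) for y in [1] * 40 + [0] * 60]

def gains(points):
    """Return (pct_of_population, pct_of_spam_captured) pairs, best scores first."""
    ranked = sorted(points, key=lambda p: p[0], reverse=True)
    total_spam = sum(y for _, y in ranked)
    out, caught = [], 0
    for i, (_, y) in enumerate(ranked, 1):
        caught += y
        out.append((i / len(ranked), caught / total_spam))
    return out

curve = gains(data)
print(curve[49])   # share of spam found in the top half of the ranking
```

A model better than random sits above the 45-degree line: its curve reaches a given share of spam using a smaller share of the population.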

Page 24: Other Modeling Techniques


Un-pruned Tree

Just let CART keep splitting until the marginal improvement in purity diminishes.

Too big!

Use cross-validation (on the train data) to prune back.

Select the optimal sub-tree.

Page 25: Other Modeling Techniques


Pruned Tree

[Figure: pruned classification tree. Splits, in order of appearance: freq_! < 0.0785; freq_remove < 0.045; freq_money < 0.01; freq_$ < 0.0565; freq_george >= 0.08; capLen_avg < 2.755; freq_remove < 0.025; freq_free < 0.24; freq_your < 0.615; freq_$ < 0.015; freq_our < 0.58; freq_hp >= 0.41. Each leaf shows its predicted class and its non-spam/spam counts, e.g. class 0 with 1458/126 at one extreme and class 1 with 32/655 at the other.]

Page 26: Other Modeling Techniques


CART Gains Chart

Use test data. 40% were spam.

The outer black line is the best one could do.

The 45° line is the monkey throwing darts.

The pruned tree is simple but does a good job.

[Figure: "Spam Email Detection - Gains Charts". X-axis: Perc.Total.Pop (0 to 1); Y-axis: Perc.Spam (0 to 1); curves shown: perfect model and CART.]

Page 27: Other Modeling Techniques


Other Models

Fit a purely additive MARS model to the data (no interactions among basis functions).

Fit a neural network with 3 hidden nodes.

Fit a logistic regression (GLM).

Fit an ordinary multiple regression. (This is a sin: the target is binary, not normal!)

Page 28: Other Modeling Techniques


Neural Net Weights

Page 29: Other Modeling Techniques


Neural Net Intuition

You can think of a NNET as a set of logistic regressions embedded in another logistic regression.

[Diagram: inputs X1, X2, X3 feed hidden nodes Z1, Z2 via weights a01, a02, a11, a12, a21, a22, a31, a32; Z1 and Z2 feed the output Y via weights b0, b1, b2.]

Z1 = 1 / (1 + exp(-(a01 + a11·X1 + a21·X2 + a31·X3)))
Z2 = 1 / (1 + exp(-(a02 + a12·X1 + a22·X2 + a32·X3)))
Y  = 1 / (1 + exp(-(b0 + b1·Z1 + b2·Z2)))
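A numeric sketch of this 3-input, 2-hidden-node architecture. The weights are arbitrary illustrations, not the fitted weights from the case study:

```python
# Forward pass of a tiny neural net: logistic regressions feeding a
# logistic regression, matching the slide's diagram.
import math

def sigmoid(u):
    return 1 / (1 + math.exp(-u))

def nnet(x1, x2, x3, a, b):
    # hidden layer: two logistic regressions on the inputs
    z1 = sigmoid(a[0][0] + a[0][1] * x1 + a[0][2] * x2 + a[0][3] * x3)
    z2 = sigmoid(a[1][0] + a[1][1] * x1 + a[1][2] * x2 + a[1][3] * x3)
    # output layer: a logistic regression on the hidden nodes
    return sigmoid(b[0] + b[1] * z1 + b[2] * z2)

# arbitrary illustrative weights: a[j] = (a0j, a1j, a2j, a3j), b = (b0, b1, b2)
a = [(-1.0, 2.0, -0.5, 0.3), (0.5, -1.5, 1.0, 0.2)]
b = (-0.4, 1.2, 0.8)

y = nnet(1.0, 0.0, 1.0, a, b)
print(round(y, 3))   # a probability between 0 and 1
```

Training consists of tuning the a's and b's to fit the data; the forward pass itself is just nested logistic regressions.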


Page 31: Other Modeling Techniques


MARS Basis Functions

[Figure: MARS basis functions (Response 1) plotted against 18 predictors: freq_our (Predictor 5), freq_remove (7), freq_free (16), freq_you (19), freq_credit (20), freq_your (21), freq_font (22), freq_money (24), freq_hp (25), freq_george (27), freq_650 (28), freq_meeting (42), freq_project (44), freq_edu (46), freq_; (49), freq_! (52), freq_$, and capLen_avg.]

Page 32: Other Modeling Techniques


MARS Intuition

This MARS model is just a regression model on the basis functions that MARS automatically found!

Less black-boxy than a NNET.

No interactions in this particular model.

Finding the basis functions is like CART taken a step further.

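The "basis function" idea can be made concrete with MARS-style hinge functions. The knots and coefficients below are invented for illustration; only the hinge form is from MARS:

```python
# A MARS basis function is a hinge h(x) = max(0, x - t) (or max(0, t - x)),
# and a purely additive MARS model is a linear combination of such hinges.

def hinge(x, t, direction=+1):
    """Hinge basis function with knot t; direction=-1 gives max(0, t - x)."""
    return max(0.0, direction * (x - t))

def additive_model(freq_remove, freq_money, capLen_avg):
    # purely additive: no products of basis functions (no interactions),
    # coefficients and knots invented for illustration
    return (0.3
            + 0.9 * hinge(freq_remove, 0.045)
            + 0.5 * hinge(freq_money, 0.01)
            + 0.001 * hinge(capLen_avg, 2.755))

print(additive_model(0.1, 0.0, 50.0))
```

Each panel in the figure above is one such piecewise-linear function of a single predictor; the fitted model is their sum, which is why it stays readable.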

Page 33: Other Modeling Techniques


Comparison of Techniques

All techniques work pretty well!

Good variable creation at least as important as modeling technique.

MARS/NNET a bit stronger.

GLM a strong contender.

CART weaker.

Even regression isn't too bad!

[Figure: "Spam Email Detection - Gains Charts" on test data, comparing perfect model, MARS, neural net, decision tree, GLM, and regression. X-axis: Perc.Total.Pop; Y-axis: Perc.Spam.]

Page 34: Other Modeling Techniques


Concluding Thoughts

Often the true power of a predictive model comes from insightful variable creation.

Subject-matter expertise is critical. We don't have true AI yet!

CART is highly intuitive and a great way to select variables and get a feel for your data.

GLM remains a great bet.

Do a cost-benefit analysis to decide whether MARS or NNET are worth the complexity and trouble.

Page 35: Other Modeling Techniques


Concluding Thoughts

Generating a bunch of answers is easy. Asking the right questions is the hard part!

Strategic goal?
How to manage the project?
Model design?
Variable creation?
How to do IT implementation?
How to manage organizational buy-in?
How do we measure the success of the project? (Not just the model.)