Predictive Modeling Using Decision Trees
TRANSCRIPT
Introduction
• Decision trees are powerful and popular for classification and prediction.
• Decision trees represent rules.
  – Rules can be expressed in English, for example:
    IF Age <= 43 AND Sex = Male AND Credit Card Insurance = No
    THEN Life Insurance Promotion = No
• Decision trees are useful for exploring data to gain insight into the relationships of a large number of candidate input variables to a target (output) variable.
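Such a rule maps directly onto code. A minimal sketch, using the attribute names and values from the example rule above (the function name and fallback return value are illustrative):

def life_insurance_promotion(age, sex, credit_card_insurance):
    # The example rule above: one path through a decision tree.
    if age <= 43 and sex == "Male" and credit_card_insurance == "No":
        return "No"
    return None  # the remaining cases are handled by other branches of the tree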
Decision Tree – What is it?
A structure that can be used to divide up a large collection of records into successively smaller sets of records by applying a sequence of simple decision rules.

A decision tree model consists of a set of rules for dividing a large heterogeneous population into smaller, more homogeneous groups with respect to a particular target variable.
Decision Trees: HMEQ Example
Banking marketing scenario (HMEQ):
Target:
• default on a home-equity line of credit (BAD)
Inputs:
• number of delinquent trade lines (DELINQ)
• number of credit inquiries (NINQ)
• debt-to-income ratio (DEBTINC)
• possibly many other inputs
Introduction to Decision Tree Modeling
Decision Trees: Interpretation of the Fitted Decision Tree
• The internal nodes contain rules.
• Start at the root node (top) and follow the rules until a terminal node (leaf) is reached.
• The leaves contain the estimate of the expected value of the target – in this case the posterior probability of BAD. The probability can then be used to allocate cases to classes.
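To make the scoring process concrete, here is a minimal sketch of following the rules from root to leaf for one case. The split variables echo the HMEQ inputs, but the thresholds and leaf probabilities are hypothetical, not the fitted HMEQ tree:

def p_bad(case):
    # Start at the root node and follow the rules to a leaf.
    if case["DEBTINC"] < 45:        # root node rule (hypothetical threshold)
        if case["DELINQ"] < 2:      # internal node rule (hypothetical)
            return 0.02             # leaf: posterior probability of BAD
        return 0.20
    return 0.21

print(p_bad({"DEBTINC": 30, "DELINQ": 0}))  # 0.02 -> allocate to class 0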
Decision Tree Template
• Drawn top-to-bottom or left-to-right
• Top (or left-most) node = Root Node
• Descendent node(s) = Child Node(s)
• Bottom (or right-most) node(s) = Leaf Node(s)
• Unique path from root to each leaf = Rule
[Figure: example tree diagram showing a root node, child nodes, and leaf nodes]
Divide and Conquer
[Figure: a root node of n = 5,000 with 10% BAD is split on Debt-to-Income Ratio < 45 into a "yes" child (n = 3,350, 5% BAD) and a "no" child (n = 1,650, 21% BAD)]
The tree is fitted to the data by recursive partitioning. Partitioning refers to segmenting the data into subgroups that are as homogeneous as possible with respect to the target. In this case, the binary split (Debt-to-Income Ratio < 45) was chosen. The 5,000 cases were split into two groups, one with a 5% BAD rate and the other with a 21% BAD rate.
The method is recursive because each subgroup results from splitting a subgroup from a previous split. Thus, the 3,350 cases in the left child node and the 1,650 cases in the right child node are split again in similar fashion.
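The same recursive partitioning can be reproduced with scikit-learn, one common implementation outside SAS Enterprise Miner. A sketch, assuming the HMEQ data sits in a file named hmeq.csv with columns BAD, DELINQ, NINQ, and DEBTINC (the file name, and restricting to complete cases, are assumptions):

import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

df = pd.read_csv("hmeq.csv").dropna(subset=["BAD", "DELINQ", "NINQ", "DEBTINC"])
X, y = df[["DELINQ", "NINQ", "DEBTINC"]], df["BAD"]

# A shallow tree, like the two-level example above.
tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
print(export_text(tree, feature_names=list(X.columns)))  # the rules, in English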
The Cultivation of Trees
• Split Search – Which splits are to be considered?
• Splitting Criterion – Which split is best?
• Stopping Rule – When should the splitting stop?
• Pruning Rule – Should some branches be lopped off?
Splitting Criteria
How is the best split determined? In some situations, the worth of a split is obvious: if the expected target is the same in the child nodes as in the parent node, no improvement was made, and the split is worthless. In contrast, if a split results in pure child nodes, the split is undisputedly best.
For classification trees, the three most widely used splitting criteria are based on the Pearson chi-squared test, the Gini index, and entropy. All three measure the difference in class distributions across the child nodes, and they usually give similar results.
Example splits (the parent node contains 4,500 Not Bad and 500 Bad cases):

Perfect Split:
            Left    Right
Not Bad     4500    0
Bad         0       500

Debt-to-Income Ratio < 45:
            Left    Right
Not Bad     3196    1304
Bad         154     346

A Competing Three-Way Split:
            Left    Center  Right
Not Bad     2521    1188    791
Bad         115     162     223
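The worth of these candidate splits can be computed directly from the tables. A minimal sketch using entropy (information gain), one of the three criteria named above; the (not bad, bad) counts are taken from the tables:

from math import log2

def entropy(counts):
    n = sum(counts)
    return -sum(c / n * log2(c / n) for c in counts if c > 0)

def information_gain(parent, children):
    n = sum(parent)
    return entropy(parent) - sum(sum(c) / n * entropy(c) for c in children)

parent = (4500, 500)
print(information_gain(parent, [(4500, 0), (0, 500)]))       # perfect split
print(information_gain(parent, [(3196, 154), (1304, 346)]))  # DTI < 45
print(information_gain(parent,
                       [(2521, 115), (1188, 162), (791, 223)]))  # three-way

The perfect split recovers all of the parent's entropy; the other two recover less, and the criterion prefers whichever split recovers more.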
Decision Tree Types
• Binary trees – only two choices in each split; can be non-uniform (uneven) in depth.
• N-way trees (ternary, 3-way, 4-way, etc.) – three or more choices in at least one of the splits.
Split Criteria
• The best split is defined as the one that does the best job of separating the data into groups where a single class predominates in each group.
• The measure used to evaluate a potential split is purity.
• The best split is the one that increases the purity of the subsets by the greatest amount.
• A good split also creates nodes of similar size, or at least does not create very small nodes.
Tests for Choosing Best Split
Purity (diversity) measures:
• Gini (population diversity)
• Entropy (information gain)
Gini (Population Diversity)
The Gini measure of a node is the sum of the squares of the proportions of the classes.
• Root node: 0.5^2 + 0.5^2 = 0.5 (even balance)
• Leaf nodes: 0.1^2 + 0.9^2 = 0.82 (close to pure)
• Gini score of the split = 0.5 × 0.82 + 0.5 × 0.82 = 0.82 (close to pure), weighting each leaf by its share of the records
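A minimal sketch of this calculation (the function name is illustrative):

def gini(proportions):
    # Sum of squares of the class proportions, as defined above.
    return sum(p ** 2 for p in proportions)

print(gini([0.5, 0.5]))   # root node: 0.5 (even balance)
print(gini([0.1, 0.9]))   # each leaf: 0.82 (close to pure)

# Gini score of the split: each leaf weighted by its share of records.
print(0.5 * gini([0.1, 0.9]) + 0.5 * gini([0.9, 0.1]))  # 0.82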
Decision Tree Advantages
1. Easy to understand
2. Map nicely to a set of business rules
3. Successfully applied to real-world problems
4. Make no prior assumptions about the data
5. Able to process both numerical and categorical data
Benefits of Trees
• Interpretability – tree-structured presentation
• Mixed measurement scales
• Regression trees
• Handling of outliers
• Handling of missing values
The Right-Sized Tree
• Stunting – stop the tree from growing too large in the first place, using stopping rules.
• Pruning – grow a large tree, then cut back branches that do not improve performance.
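A sketch of the pruning idea using scikit-learn's cost-complexity pruning (an analogous mechanism to Enterprise Miner's validation-based pruning, not the same one; the stand-in data is synthetic):

from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=0)  # stand-in data
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X, y)

# Larger ccp_alpha -> more aggressive pruning -> smaller tree.
for alpha in path.ccp_alphas[::10]:
    pruned = DecisionTreeClassifier(ccp_alpha=alpha, random_state=0).fit(X, y)
    print(f"alpha={alpha:.4f}  nodes={pruned.tree_.node_count}")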
Building and Interpreting Decision Trees
• Explore the types of decision tree models available in Enterprise Miner.
• Build a decision tree model.
• Examine the model results and interpret these results.
• Choose a decision threshold theoretically and empirically.
The Scenario
• Determine who should be approved for a home equity loan.
• The target variable is a binary variable that indicates whether an applicant eventually defaulted on the loan.
• The input variables include the amount of the loan, the amount due on the existing mortgage, the value of the property, and the number of recent credit inquiries.
The HMEQ data set contains baseline and loan performance information for 5,960 recent home equity loans. The target (BAD) is a binary variable that indicates if an applicant eventually defaulted or was seriously delinquent. This adverse outcome occurred in 1,189 cases (20%). For each applicant, 12 input variables were recorded.
Presume that every two dollars loaned eventually returns three dollars if the loan is paid off in full.
Accuracy Measures (Classification)
• Misclassification error: classifying a record as belonging to one class when it belongs to another class.
• Error rate: the percentage of misclassified records out of the total records in the validation data.
Confusion Matrix
• 201 1's correctly classified as "1"
• 85 1's incorrectly classified as "0"
• 25 0's incorrectly classified as "1"
• 2,689 0's correctly classified as "0"

Classification confusion matrix:
                 Predicted Class
Actual Class     1       0
1                201     85
0                25      2689
Error Rate
• Overall error rate = (25 + 85)/3000 = 3.67%
• Accuracy = 1 – error rate = (201 + 2689)/3000 = 96.33%
• With multiple classes, error rate = (sum of misclassified records)/(total records)
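A minimal sketch reproducing these figures from the four confusion-matrix counts:

tp, fn, fp, tn = 201, 85, 25, 2689  # counts from the matrix above
total = tp + fn + fp + tn           # 3,000 validation records
error_rate = (fp + fn) / total
accuracy = (tp + tn) / total        # equals 1 - error_rate
print(f"error rate = {error_rate:.2%}, accuracy = {accuracy:.2%}")
# error rate = 3.67%, accuracy = 96.33%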
Cutoff for Classification
Most DM algorithms classify via a two-step process. For each record:
1. Compute the probability of belonging to class "1".
2. Compare it to the cutoff value, and classify accordingly.
• The default cutoff value is 0.50: if the probability is >= 0.50, classify as "1"; if < 0.50, classify as "0".
• Different cutoff values can be used.
• Typically, the error rate is lowest for a cutoff of 0.50.
Cutoff Table

Actual Class   Prob. of "1"      Actual Class   Prob. of "1"
1              0.996             1              0.506
1              0.988             0              0.471
1              0.984             0              0.337
1              0.980             1              0.218
1              0.948             0              0.199
1              0.889             0              0.149
1              0.848             0              0.048
0              0.762             0              0.038
1              0.707             0              0.025
1              0.681             0              0.022
1              0.656             0              0.016
0              0.622             0              0.004

• If the cutoff is 0.50: 13 records are classified as "1".
• If the cutoff is 0.80: 7 records are classified as "1".
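A sketch verifying those counts; the 24 (actual class, probability) pairs are transcribed from the cutoff table:

records = [(1, .996), (1, .988), (1, .984), (1, .980), (1, .948), (1, .889),
           (1, .848), (0, .762), (1, .707), (1, .681), (1, .656), (0, .622),
           (1, .506), (0, .471), (0, .337), (1, .218), (0, .199), (0, .149),
           (0, .048), (0, .038), (0, .025), (0, .022), (0, .016), (0, .004)]

for cutoff in (0.50, 0.80):
    n = sum(p >= cutoff for _, p in records)
    print(f"cutoff {cutoff}: {n} records classified as '1'")
# cutoff 0.5: 13 records; cutoff 0.8: 7 records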
Confusion Matrix for Different Cutoffs

Cutoff = 0.25:
                 Predicted Class
Actual Class     owner   non-owner
owner            11      1
non-owner        4       8

Cutoff = 0.75:
                 Predicted Class
Actual Class     owner   non-owner
owner            7       5
non-owner        1       11
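The counts in these matrices match the 24 records from the cutoff table (reading 1 as owner and 0 as non-owner). A sketch reproducing them:

records = [(1, .996), (1, .988), (1, .984), (1, .980), (1, .948), (1, .889),
           (1, .848), (0, .762), (1, .707), (1, .681), (1, .656), (0, .622),
           (1, .506), (0, .471), (0, .337), (1, .218), (0, .199), (0, .149),
           (0, .048), (0, .038), (0, .025), (0, .022), (0, .016), (0, .004)]

for cutoff in (0.25, 0.75):
    tp = sum(a == 1 and p >= cutoff for a, p in records)
    fn = sum(a == 1 and p < cutoff for a, p in records)
    fp = sum(a == 0 and p >= cutoff for a, p in records)
    tn = sum(a == 0 and p < cutoff for a, p in records)
    print(f"cutoff {cutoff}: TP={tp} FN={fn} FP={fp} TN={tn}")
# cutoff 0.25: TP=11 FN=1 FP=4 TN=8
# cutoff 0.75: TP=7  FN=5 FP=1 TN=11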
[Table: a sample of the HMEQ data scored by the fitted tree. For each case it lists the assigned node (_NODE_) and leaf (_LEAF_), the posterior probabilities P_DEFAULT1 and P_DEFAULT0 (e.g., 1.0000/0.0000 and 0.8282/0.1718), the classification, residual, and warning columns, and input variables such as LOAN, MORTGAGE, VALUE, and REASON (DebtCon/HomeImp).]
Consequences of a Decision

             Decision 1        Decision 0
Actual 1     True Positive     False Negative
Actual 0     False Positive    True Negative
Consequences of a Decision: Profit Matrix (SAS EM)

             Decision 1                     Decision 0
Actual 1     True Positive (Profit = $2)    False Negative
Actual 0     False Positive (Loss = $1)     True Negative
Bayes Rule: Optimal Threshold

θ = 1 / (1 + cost of false negative / cost of false positive)

Using the cost structure defined for the home equity example, the optimal threshold is 1/(1 + 2/1) = 1/3. That is:
• reject all applications whose predicted probability of default exceeds 0.3333.
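A minimal sketch of this calculation. The cost assignments follow the profit matrix: a false negative (lending to a defaulter) loses the $2 lent, and a false positive (rejecting a good applicant) forgoes $1 of profit:

def optimal_threshold(cost_false_negative, cost_false_positive):
    # Bayes rule threshold, as defined above.
    return 1 / (1 + cost_false_negative / cost_false_positive)

theta = optimal_threshold(cost_false_negative=2, cost_false_positive=1)
print(theta)  # 0.333...: reject applicants whose P(default) exceeds 1/3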