Classification and Decision Trees
Lecture 7

Slide 1: Outline
1. Overview of classification and decision trees
2. Algorithm to build a decision tree
3. Formulas to measure information
4. Weka, data preparation, and visualization
Slide 2: 1. Illustration of the Classification Task
(Courtesy of Professor David Mease for the next 10 slides.)
[Figure: a training set is fed into a learning algorithm (induction) to produce a model; the model is then applied (deduction) to a test set.]

Slide 3: Classification: Definition
Given a collection of records (the training set), where each record contains a set of attributes (x) plus one additional attribute, the class (y):
Find a model that predicts the class as a function of the values of the other attributes.
Goal: previously unseen records should be assigned a class as accurately as possible. A test set is used to determine the accuracy of the model. Usually the given data set is divided into a training set and a test set; the training set is used to build the model and the test set to validate it.
Slide 4: Classification Examples
- Classifying credit card transactions as legitimate or fraudulent
- Classifying secondary structures of proteins as alpha-helix, beta-sheet, or random coil
- Categorizing news stories as finance, weather, entertainment, sports, etc.
- Predicting tumor cells as benign or malignant
Slide 5: Classification Techniques
- There are many techniques/algorithms for carrying out classification.
- In this chapter we will study only decision trees.
- In Chapter 5 we will study other techniques, including some very modern and effective ones.
Slide 6: An Example of a Decision Tree
The training data are ten records with a categorical attribute Refund, a categorical attribute Marital Status (MarSt), a continuous attribute Taxable Income (TaxInc), and the class Cheat. The model built from them is a decision tree with the following splitting attributes:
- Root: Refund. Refund = Yes -> class No.
- Refund = No: split on MarSt. Married -> class No; Single or Divorced -> split on TaxInc: < 80K -> class No, >= 80K -> class Yes.

Slides 7-12: Applying the Tree Model to Predict the Class for a New Observation
Start from the root of the tree and, at each internal node, follow the branch that matches the test record (Refund = No, MarSt = Married, TaxInc = 80K) until a leaf is reached. The record is assigned Cheat = No.

Slide 13: Decision Tree Characteristics
- Easy to understand; similar to the human decision process
- Deals with both discrete and continuous features
- Simple, nonparametric classifier: no assumptions regarding probability distribution types
- Finding an optimal tree is NP-complete, but the heuristic algorithms used in practice are computationally inexpensive
- Can represent arbitrarily complex decision boundaries
- Overfitting can be a problem
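The traversal on slides 7-12 can be written out as nested conditionals. Below is a minimal Python sketch; the function name and the numeric encoding of the 80K threshold are my own, while the rules come from the tree on slide 6.

```python
def classify(refund, marital_status, taxable_income):
    """Walk the slides' example tree for one record.

    Returns the predicted Cheat class, "Yes" or "No".
    """
    if refund == "Yes":
        return "No"                  # Refund = Yes -> class No
    if marital_status == "Married":
        return "No"                  # MarSt = Married -> class No
    if taxable_income < 80_000:      # Single/Divorced: split on TaxInc
        return "No"
    return "Yes"

# The test record from the slides: Refund = No, Married, 80K
print(classify("No", "Married", 80_000))  # -> No
```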
Slide 14: 2. Algorithm to Build Decision Trees
- Defined recursively
- Select an attribute as the root node
- Use a greedy algorithm to create a branch for each possible value of the selected attribute
- Repeat recursively until either running out of instances or attributes, or reaching a predefined purity threshold
- Use only branches that are reached by instances

Slide 15: Weather Example (9 yes / 5 no)
Slide 16: Splitting on Outlook

Outlook = Sunny (2 yes / 3 no):
Temp  Hum     Wind   Play
Hot   high    FALSE  No
Hot   high    TRUE   No
Mild  high    FALSE  No
Cool  normal  FALSE  Yes
Mild  normal  TRUE   Yes

Outlook = Overcast (4 yes / 0 no):
Temp  Hum     Wind   Play
Hot   high    FALSE  Yes
Cool  normal  TRUE   Yes
Mild  high    TRUE   Yes
Hot   normal  FALSE  Yes

Outlook = Rainy (3 yes / 2 no):
Temp  Hum     Wind   Play
Mild  high    FALSE  Yes
Cool  normal  FALSE  Yes
Cool  normal  TRUE   No
Mild  normal  FALSE  Yes
Mild  high    TRUE   No

Slides 17-19: Decision Trees: Weather Example
The corresponding candidate splits on Temperature (Hot / Mild / Cool), Humidity (High / Normal), and Windy (True / False).

Slide 20: 3. Information Measures
To select the attribute upon which to split, we need a measure of purity/information:
- Entropy
- Gini index
- Classification error

Slide 21: A Graphical Comparison
[Figure: the three measures plotted for comparison.]
Slide 22: Entropy
- Measures purity, similarly to the Gini index; for a node t with class proportions p(j|t), Entropy(t) = -Σj p(j|t) log2 p(j|t)
- Used in C4.5
- After the entropy is computed in each node, the overall value of the entropy is computed as the weighted average of the entropy in each node, as with the Gini index
- The decrease in entropy is called the information gain (page 160)
Slide 23: Entropy Examples for a Single Node
- P(C1) = 0/6 = 0, P(C2) = 6/6 = 1
  Entropy = -0 log2 0 - 1 log2 1 = 0 - 0 = 0
- P(C1) = 1/6, P(C2) = 5/6
  Entropy = -(1/6) log2(1/6) - (5/6) log2(5/6) = 0.65
- P(C1) = 2/6, P(C2) = 4/6
  Entropy = -(2/6) log2(2/6) - (4/6) log2(4/6) = 0.92

Slide 24: 3. Entropy: Calculating Information
All three measures are consistent with one another; we will use entropy as the example. The less pure a node, the more bits are needed to describe it. With Outlook as the root:
Info[2, 3] = 0.971 bits (Sunny)
Info[4, 0] = 0.0 bits (Overcast)
Info[3, 2] = 0.971 bits (Rainy)
Total info = 5/14 * 0.971 + 4/14 * 0.0 + 5/14 * 0.971 = 0.693 bits
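These single-node values can be checked with a few lines of Python; the helper name `entropy` is my own.

```python
from math import log2

def entropy(counts):
    """Entropy of a node given its class counts, in bits.

    Terms with a zero count contribute 0 (the 0 log 0 convention).
    """
    n = sum(counts)
    return -sum((c / n) * log2(c / n) for c in counts if c > 0)

print(round(entropy([0, 6]), 2))  # 0.0
print(round(entropy([1, 5]), 2))  # 0.65
print(round(entropy([2, 4]), 2))  # 0.92
```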
Slide 25: 3. Selecting the Root Attribute
Initial info = Info[9, 5] = 0.940 bits
Gain(Outlook) = 0.940 - 0.693 = 0.247 bits
Gain(Temperature) = 0.029 bits
Gain(Humidity) = 0.152 bits
Gain(Windy) = 0.048 bits
So select Outlook as the root for splitting.
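The gain for Outlook can be reproduced numerically. This is a sketch with my own variable names; the class counts come from the weather tables above.

```python
from math import log2

def entropy(counts):
    """Entropy in bits from class counts; zero counts contribute 0."""
    n = sum(counts)
    return -sum((c / n) * log2(c / n) for c in counts if c > 0)

# (yes, no) counts before the split and in each Outlook branch
parent = [9, 5]
branches = [[2, 3], [4, 0], [3, 2]]   # Sunny, Overcast, Rainy

n = sum(parent)
weighted = sum(sum(b) / n * entropy(b) for b in branches)
gain = entropy(parent) - weighted
print(round(gain, 3))  # 0.247
```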
Slides 26-27: The Resulting Trees
[Figure: two candidate trees rooted at Outlook (Sunny / Overcast / Rainy), with further splits on Humidity (High / Normal) and Windy (FALSE / TRUE). Slide 27 highlights a contradicted training example.]

Slide 28: 3. Hunt's Algorithm
Many algorithms use a version of the top-down, divide-and-conquer approach known as Hunt's algorithm (page 152):
- Let Dt be the set of training records that reach a node t.
- If Dt contains records that all belong to the same class yt, then t is a leaf node labeled yt.
- If Dt contains records that belong to more than one class, use an attribute test to split the data into smaller subsets, and recursively apply the procedure to each subset.
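The two cases of Hunt's algorithm map directly onto a recursive function. The sketch below is illustrative, not the textbook's pseudocode: it splits on attributes in a fixed order rather than choosing the best test, and it falls back to the majority class when attributes run out.

```python
from collections import Counter

def hunts(records, attributes):
    """records: list of (feature_dict, label) pairs.

    Returns a label (leaf) or an (attribute, {value: subtree}) pair.
    """
    labels = [y for _, y in records]
    if len(set(labels)) == 1:          # Case 1: pure node -> leaf
        return labels[0]
    if not attributes:                 # no tests left -> majority leaf
        return Counter(labels).most_common(1)[0][0]
    attr = attributes[0]               # fixed order, for illustration only
    children = {}
    for value in sorted({x[attr] for x, _ in records}):
        subset = [(x, y) for x, y in records if x[attr] == value]
        children[value] = hunts(subset, attributes[1:])
    return (attr, children)

# Three toy records in the spirit of the Cheat example
data = [({"Refund": "Yes", "MarSt": "Single"}, "No"),
        ({"Refund": "No", "MarSt": "Married"}, "No"),
        ({"Refund": "No", "MarSt": "Single"}, "Yes")]
tree = hunts(data, ["Refund", "MarSt"])
print(tree)
```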
Slide 29: An Example of Hunt's Algorithm
[Figure: the tree for the Cheat data grows in stages. Start with a single "Don't Cheat" leaf; split on Refund (Yes -> Don't Cheat); split the Refund = No branch on Marital Status (Married -> Don't Cheat, Single/Divorced -> Cheat); finally split the Single/Divorced branch on Taxable Income (< 80K -> Don't Cheat, >= 80K -> Cheat).]
Slide 30: How to Apply Hunt's Algorithm
- Usually it is done in a greedy fashion: the optimal split is chosen at each stage according to some criterion.
- The result may not be optimal at the end, even for the same criterion.
- However, the greedy approach is computationally efficient, which is why it is popular.

Slide 31: Using the greedy approach, we still have to decide three things:
#1) What attribute test conditions to consider
#2) What criterion to use to select the best split
#3) When to stop splitting
- For #1 we will consider only binary splits for both numeric and categorical predictors, as discussed on the next slide.
- For #2 we will consider misclassification error, the Gini index, and entropy.
- #3 is a subtle business involving model selection; it is tricky because we don't want to overfit or underfit.

Slide 32: Misclassification Error
- Misclassification error is usually our final metric, which we want to minimize on the test set, so there is a logical argument for using it as the split criterion.
- It is simply the fraction of total cases misclassified.
- 1 - misclassification error = accuracy (page 149).

Slide 33: Gini Index
- Commonly used in many algorithms, such as CART and the rpart() function in R
- For a node t with class proportions p(j|t), Gini(t) = 1 - Σj p(j|t)^2
- After the Gini index is computed in each node, the overall value of the Gini index is computed as the weighted average of the Gini index in each node
Slide 34: Gini Examples for a Single Node
- P(C1) = 0/6 = 0, P(C2) = 6/6 = 1
  Gini = 1 - P(C1)^2 - P(C2)^2 = 1 - 0 - 1 = 0
- P(C1) = 1/6, P(C2) = 5/6
  Gini = 1 - (1/6)^2 - (5/6)^2 = 0.278
- P(C1) = 2/6, P(C2) = 4/6
  Gini = 1 - (2/6)^2 - (4/6)^2 = 0.444
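These node values follow from a one-line Python function (the helper name `gini` is my own):

```python
def gini(counts):
    """Gini index of a node from its class counts."""
    n = sum(counts)
    return 1 - sum((c / n) ** 2 for c in counts)

print(round(gini([0, 6]), 3))  # 0.0
print(round(gini([1, 5]), 3))  # 0.278
print(round(gini([2, 4]), 3))  # 0.444
```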
Slide 35: Misclassification Error vs. Gini Index
Split a parent node containing 7 C1 and 3 C2 records (Gini = 0.42, misclassification error = 30%) on a binary attribute A into nodes N1 and N2, where N1 receives records (3, 0) and N2 receives (4, 3):
Gini(N1) = 1 - (3/3)^2 - (0/3)^2 = 0
Gini(N2) = 1 - (4/7)^2 - (3/7)^2 = 0.490
Gini(children) = 3/10 * 0 + 7/10 * 0.490 = 0.343
The Gini index decreases from 0.42 to 0.343 while the misclassification error stays at 30%. This illustrates why we often want to use a surrogate loss function like the Gini index even if we really only care about misclassification.
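The contrast between the two criteria on this split can be verified directly. A sketch using the counts from the slide; the helper names are mine.

```python
def gini(counts):
    """Gini index of a node from its class counts."""
    n = sum(counts)
    return 1 - sum((c / n) ** 2 for c in counts)

def misclass(counts):
    """Misclassification error: fraction not in the majority class."""
    return 1 - max(counts) / sum(counts)

parent = [7, 3]
children = [[3, 0], [4, 3]]           # N1, N2
total = sum(parent)

g_children = sum(sum(c) / total * gini(c) for c in children)
e_children = sum(sum(c) / total * misclass(c) for c in children)
print(round(gini(parent), 2), round(g_children, 3))      # 0.42 0.343
print(round(misclass(parent), 2), round(e_children, 2))  # 0.3 0.3
```

The Gini index registers the improvement from the split while the misclassification error does not, which is exactly the point made on the slide.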
Slide 36: 5. Discretization of Numeric Data
[Figure (from slides 2-3): a training set feeds a learning algorithm (induction) to produce a model; the model is applied (deduction) to the test set.]

Training data (Cheat example):
Tid  Refund  Marital Status  Taxable Income  Cheat
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes
Test data:
Refund  Marital Status  Taxable Income  Cheat
No      Married         80K             ?
Weather data (Sheet1):
ID  Outlook   Temperature  Humidity  Windy  Play
a   sunny     hot          high      FALSE  no
b   sunny     hot          high      TRUE   no
c   overcast  hot          high      FALSE  yes
d   rainy     mild         high      FALSE  yes
e   rainy     cool         normal    FALSE  yes
f   rainy     cool         normal    TRUE   no
g   overcast  cool         normal    TRUE   yes
h   sunny     mild         high      FALSE  no
i   sunny     cool         normal    FALSE  yes
j   rainy     mild         normal    FALSE  yes
k   sunny     mild         normal    TRUE   yes
l   overcast  mild         high      TRUE   yes
m   overcast  hot          normal    FALSE  yes
n   rainy     mild         high      TRUE   no
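For the outline's Weka item: Weka reads data in ARFF format. A nominal-attribute ARFF file for this weather table would look like the following sketch (the relation name is arbitrary; the ID column is dropped since it carries no predictive information):

```
@relation weather.nominal

@attribute outlook {sunny, overcast, rainy}
@attribute temperature {hot, mild, cool}
@attribute humidity {high, normal}
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}

@data
sunny,hot,high,FALSE,no
sunny,hot,high,TRUE,no
overcast,hot,high,FALSE,yes
rainy,mild,high,FALSE,yes
rainy,cool,normal,FALSE,yes
rainy,cool,normal,TRUE,no
overcast,cool,normal,TRUE,yes
sunny,mild,high,FALSE,no
sunny,cool,normal,FALSE,yes
rainy,mild,normal,FALSE,yes
sunny,mild,normal,TRUE,yes
overcast,mild,high,TRUE,yes
overcast,hot,normal,FALSE,yes
rainy,mild,high,TRUE,no
```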