Decision Tree Learning
Chapter 3
• Decision tree representation
• ID3 learning algorithm
• Entropy, information gain
• Overfitting
Review example: Image Categorization (two phases)
[Figure: a two-phase pipeline.
Training: training images → image features, plus training labels → classifier training → trained classifier.
Testing: test image → image features → trained classifier → prediction (e.g. “Outdoor”).]
Inductive Learning
• Learning a function from examples
Occam’s Razor: prefer the simplest hypothesis consistent with the data.

One of the most widely used inductive learning methods: Decision Tree Learning.
Decision Tree Example

[Figure: a decision tree with root test Color (red / green / blue); branches lead to further tests on Shape (round / square) and Size (big / small); leaves are labeled + or -.]

+ : filled with blue
- : filled with red
• Each internal node corresponds to a test
• Each branch corresponds to a result of the test
• Each leaf node assigns a classification
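These three bullets map directly onto a nested-dictionary encoding: an internal node is a dict from one attribute to its branches, and a leaf is a class label. A minimal sketch (the tree below is a hypothetical variant in the spirit of the Color/Size/Shape figure, not an exact reproduction of it):

```python
# Internal node: {attribute: {value: subtree_or_leaf, ...}}; leaf: a class label.
tree = {"Color": {"red": "+",
                  "green": {"Size": {"big": "+", "small": "-"}},
                  "blue": {"Shape": {"round": "+", "square": "-"}}}}

def classify(node, example):
    while isinstance(node, dict):          # internal node: apply its test
        attr = next(iter(node))            # the attribute tested at this node
        node = node[attr][example[attr]]   # follow the branch for the value
    return node                            # leaf: the classification

print(classify(tree, {"Color": "red"}))                   # +
print(classify(tree, {"Color": "green", "Size": "small"})) # -
```

Classification is a single root-to-leaf walk, one test per level.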
PlayTennis: Training Examples
[Table: the PlayTennis training examples — each row is a sample, the columns are attributes, and the last column is the target attribute.]
A Decision Tree for the concept PlayTennis
Outlook?
├─ Sunny → Humidity?
│   ├─ High → No
│   └─ Normal → Yes
├─ Overcast → Yes
└─ Rain → Wind?
    ├─ Strong → No
    └─ Weak → Yes
• Finding the most suitable attribute for the root
• Finding redundant attributes, like Temperature
Converting a tree to rules
Decision trees can represent any Boolean function
If (O=Sunny AND H=Normal) OR (O=Overcast) OR (O=Rain AND W=Weak) then YES
• “A disjunction of conjunctions of constraints on attribute values”
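This disjunction-of-conjunctions view can be made concrete by enumerating root-to-leaf paths: each path is one conjunction, and the paths ending in YES together form the disjunction above. A minimal sketch on a nested-dict tree (function name and encoding are my own choices):

```python
def tree_to_rules(node, path=()):
    """Each root-to-leaf path becomes one (conditions, label) rule."""
    if not isinstance(node, dict):
        return [(path, node)]               # leaf: emit the accumulated path
    attr = next(iter(node))
    rules = []
    for value, subtree in node[attr].items():
        rules += tree_to_rules(subtree, path + ((attr, value),))
    return rules

play = {"Outlook": {"Sunny": {"Humidity": {"High": "No", "Normal": "Yes"}},
                    "Overcast": "Yes",
                    "Rain": {"Wind": {"Strong": "No", "Weak": "Yes"}}}}

for conds, label in tree_to_rules(play):
    print(" AND ".join(f"{a}={v}" for a, v in conds), "->", label)
```

The three rules labeled Yes reproduce the disjunction on the slide.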
Decision trees can represent any Boolean function
• In the worst case, the tree needs exponentially many nodes
• XOR is an extreme case
Decision tree, decision boundaries
Decision Regions
Decision Trees
• One of the most widely used and practical methods for inductive inference
• Approximates discrete-valued functions – can be extended to continuous valued functions
• Can be used for classification (most common) or regression problems
Decision Trees for Regression (Continuous Values)
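For regression, the common adaptation is to predict the mean target value at each leaf and to choose splits by variance reduction instead of information gain. A minimal sketch of that standard variant (my own illustration, not code from the slides):

```python
def mean(ts):
    return sum(ts) / len(ts)

def variance(ts):
    m = mean(ts)
    return sum((t - m) ** 2 for t in ts) / len(ts)

def variance_reduction(parent, children):
    """Regression analogue of information gain: drop in
    (size-weighted) variance after a candidate split."""
    n = len(parent)
    return variance(parent) - sum(len(ch) / n * variance(ch) for ch in children)

# A split separating small targets from large ones removes most variance.
parent = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
left, right = [1.0, 1.2, 0.8], [5.0, 5.2, 4.8]
print(variance_reduction(parent, [left, right]))
```

Each leaf then predicts the mean of the targets that reach it (here about 1.0 on the left, 5.0 on the right).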
Decision tree learning algorithm
• Learning process: finding the tree from the training set
• For a given training set, there are many trees that encode it without any error
• Finding the smallest tree is NP-complete (Quinlan 1986), hence we are forced to use some (local) search algorithm to find reasonable solutions
ID3: The basic decision tree learning algorithm
• Basic idea: A decision tree can be constructed by considering attributes of instances one by one.
– Which attribute should be considered first?
– The height of a decision tree depends on the order in which attributes are considered.
==> Entropy
Which Attribute is ”best”?
A1=? : True → [21+, 5-], False → [8+, 30-]   (S = [29+, 35-])
A2=? : True → [18+, 33-], False → [11+, 2-]   (S = [29+, 35-])

Entropy: large entropy => more information
Entropy
Entropy(S) = -p+ log2 p+ - p- log2 p-
• S is the set of training examples
• p+ is the proportion of positive examples
• p- is the proportion of negative examples
• Exercise: calculate the entropy in the two cases below (note that 0·log2 0 = 0):
– p+ = 0.5, p- = 0.5
– p+ = 1, p- = 0
Question: Why (-) in the equation?
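The exercise can be checked with a short function; this is a minimal sketch (the name `entropy` and the count-based interface are my own choices), using the 0·log2 0 = 0 convention from the slide:

```python
import math

def entropy(counts):
    """Shannon entropy (base 2) of a list of class counts."""
    total = sum(counts)
    ent = 0.0
    for c in counts:
        if c > 0:                    # convention: 0 * log2(0) = 0
            p = c / total
            ent -= p * math.log2(p)
    return ent

print(entropy([1, 1]))   # p+ = p- = 0.5 -> 1.0 (maximum uncertainty)
print(entropy([1, 0]))   # p+ = 1, p- = 0 -> 0.0 (no uncertainty)
```

Since it sums over all counts, the same function already covers the multi-class generalization on the next slide.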
Entropy
A measure of uncertainty
Example:
– the probability of heads for a fair coin
– the same for a coin with two heads
Information theory
High entropy = high uncertainty = more randomness in the phenomenon = more information = more bits needed for encoding.
Entropy
• For multi-class problems with c categories, entropy generalizes to:

Entropy(S) = -Σ_{i=1..c} pi log2 pi

• Q: Why log2?
Information Gain
• Gain(S,A): expected reduction in entropy due to sorting S on attribute A
Gain(S,A) = Entropy(S) - Σ_{v ∈ Values(A)} (|Sv|/|S|) · Entropy(Sv)

• where Sv is the subset of S having value v for attribute A
Information Gain

A1=? : True → [21+, 5-], False → [8+, 30-]   (S = [29+, 35-])
A2=? : True → [18+, 33-], False → [11+, 2-]   (S = [29+, 35-])

Entropy(S) = ?, Gain(S,A1) = ?, Gain(S,A2) = ?

Entropy([29+,35-]) = -29/64 log2(29/64) - 35/64 log2(35/64) = 0.99

(Recall: Gain(S,A) = Entropy(S) - Σ_{v ∈ Values(A)} (|Sv|/|S|) · Entropy(Sv), with Entropy(S) = -Σ_{i=1..c} pi log2 pi)
Information Gain
A1=? : True → [21+, 5-], False → [8+, 30-]   (S = [29+, 35-])

Entropy([21+,5-]) = 0.71
Entropy([8+,30-]) = 0.74
Gain(S,A1) = Entropy(S) - 26/64·Entropy([21+,5-]) - 38/64·Entropy([8+,30-]) = 0.27

A2=? : True → [18+, 33-], False → [11+, 2-]   (S = [29+, 35-])

Entropy([18+,33-]) = 0.94
Entropy([11+,2-]) = 0.62
Gain(S,A2) = Entropy(S) - 51/64·Entropy([18+,33-]) - 13/64·Entropy([11+,2-]) = 0.11

Gain(S,A1) > Gain(S,A2), so A1 is placed higher in the tree.
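These two gains can be reproduced numerically. A minimal sketch (`entropy` and `info_gain` are my own helper names; counts are [positives, negatives]):

```python
import math

def entropy(counts):
    """Shannon entropy (base 2) of a class-count distribution."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

def info_gain(parent, children):
    """Gain(S,A) = Entropy(S) - sum over v of |Sv|/|S| * Entropy(Sv)."""
    total = sum(parent)
    return entropy(parent) - sum(sum(ch) / total * entropy(ch) for ch in children)

s = [29, 35]
print(round(info_gain(s, [[21, 5], [8, 30]]), 2))   # Gain(S,A1) ~ 0.27
print(round(info_gain(s, [[18, 33], [11, 2]]), 2))  # Gain(S,A2) ~ 0.12, slightly
                                                    # above the slide's rounded 0.11
```

The small discrepancy on A2 comes from the slide rounding the intermediate entropies to two decimals before combining them.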
ID3 for the PlayTennis example
ID3: The Basic Decision Tree Learning Algorithm
• What is the “best” attribute?
• [“best” = the attribute with the highest information gain]
• Answer: Outlook
ID3 (Cont’d)
[Figure: the training examples D1–D14 partitioned by the root test Outlook into its Sunny, Overcast, and Rain branches.]

What are the “best” next attributes? Humidity and Wind.
PlayTennis Decision Tree
Outlook?
├─ Sunny → Humidity?
│   ├─ High → No
│   └─ Normal → Yes
├─ Overcast → Yes
└─ Rain → Wind?
    ├─ Strong → No
    └─ Weak → Yes
Stopping criteria
• each leaf node contains examples of only one class, or
• the algorithm has run out of attributes
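The whole ID3 procedure, with exactly these two stopping criteria, fits in a short recursive function. A minimal sketch (function names and the toy dataset are my own; examples are dicts mapping attribute names to values):

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def id3(examples, labels, attributes):
    if len(set(labels)) == 1:                 # stop: one class -> pure leaf
        return labels[0]
    if not attributes:                        # stop: no attributes left
        return Counter(labels).most_common(1)[0][0]   # majority-class leaf

    def gain(a):                              # information gain of attribute a
        rem = 0.0
        for v in set(ex[a] for ex in examples):
            sub = [lab for ex, lab in zip(examples, labels) if ex[a] == v]
            rem += len(sub) / len(labels) * entropy(sub)
        return entropy(labels) - rem

    best = max(attributes, key=gain)          # highest-gain attribute
    tree = {best: {}}
    for v in set(ex[best] for ex in examples):
        sub_ex = [ex for ex in examples if ex[best] == v]
        sub_lab = [lab for ex, lab in zip(examples, labels) if ex[best] == v]
        tree[best][v] = id3(sub_ex, sub_lab, [a for a in attributes if a != best])
    return tree

# A four-example toy set in the spirit of PlayTennis:
examples = [{"Outlook": "Sunny", "Humidity": "High"},
            {"Outlook": "Sunny", "Humidity": "Normal"},
            {"Outlook": "Overcast", "Humidity": "High"},
            {"Outlook": "Overcast", "Humidity": "Normal"}]
labels = ["No", "Yes", "Yes", "Yes"]
print(id3(examples, labels, ["Outlook", "Humidity"]))
```

The recursion mirrors the slides: pick the best attribute, split the examples by its values, and recurse until a stopping criterion fires.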
ID3
Overfitting in Decision Trees
• Why “over”-fitting?
– A model can become more complex than the true target function (concept) when it tries to satisfy noisy data as well.

[Figure: accuracy vs. hypothesis complexity — one curve for accuracy on training data, one for accuracy on test data.]
Overfitting in Decision Trees
Overfitting Example
[Figure: current (I) vs. voltage (V) plot of 10 measured points with a wiggly 9th-degree polynomial through them.]

Testing Ohm’s law: V = IR
• Experimentally measure 10 points
• Fit a curve to the resulting data
• Perfect fit to the training data with a 9th-degree polynomial (n points can be fit exactly with a degree-(n-1) polynomial)
• “Ohm was wrong, we have found a more accurate function!”
Overfitting Example
[Figure: the same current–voltage data fitted with a straight line.]

Testing Ohm’s law: V = IR
• Better generalization with a linear function that fits the training data less accurately.
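The contrast can be demonstrated with the one-parameter model the physics dictates. A minimal sketch with synthetic data (the “measurements” are generated with a small deterministic perturbation standing in for noise, not real experimental values):

```python
# Ten synthetic measurements of V = I * R with true R = 5 ohms,
# perturbed slightly so no simple model fits them exactly.
data = [(i / 10, 5 * i / 10 + 0.02 * (-1) ** i) for i in range(1, 11)]

def fit_resistance(points):
    """Least-squares slope through the origin: R = sum(I*V) / sum(I*I)."""
    num = sum(i * v for i, v in points)
    den = sum(i * i for i, v in points)
    return num / den

print(fit_resistance(data))  # close to the true R = 5
```

A 9th-degree polynomial through these 10 points would also reproduce every perturbation, so its predictions between and beyond the points generalize worse than the one-parameter line.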
Avoiding over-fitting the data
• How can we avoid overfitting? There are two approaches:
1. Early stopping: stop growing the tree before it perfectly classifies the training data
2. Pruning: grow full tree, then prune
• Reduced error pruning
• Rule post-pruning
– The pruning approach is found to be more useful in practice.
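Reduced-error pruning can be sketched on a nested-dict tree: bottom-up, replace a subtree with its majority-class leaf whenever accuracy on held-out validation data does not decrease. A minimal sketch (the tree, helper names, and validation examples are all hypothetical):

```python
from collections import Counter

def classify(node, example):
    while isinstance(node, dict):
        attr = next(iter(node))
        node = node[attr].get(example[attr])
    return node

def prune(node, val):
    """val: validation (example, label) pairs that reach this node."""
    if not isinstance(node, dict):
        return node
    attr = next(iter(node))
    for v in node[attr]:                          # prune children bottom-up
        sub = [(ex, lab) for ex, lab in val if ex[attr] == v]
        node[attr][v] = prune(node[attr][v], sub)
    labels = [lab for _, lab in val]
    if not labels:                                # no validation data here
        return node
    leaf = Counter(labels).most_common(1)[0][0]   # majority class at this node
    kept = sum(classify(node, ex) == lab for ex, lab in val)
    collapsed = sum(lab == leaf for lab in labels)
    return leaf if collapsed >= kept else node    # collapse if no worse

play = {"Outlook": {"Sunny": {"Humidity": {"High": "No", "Normal": "Yes"}},
                    "Overcast": "Yes",
                    "Rain": {"Wind": {"Strong": "No", "Weak": "Yes"}}}}
val = [({"Outlook": "Rain", "Wind": "Strong", "Humidity": "High"}, "Yes"),
       ({"Outlook": "Rain", "Wind": "Weak", "Humidity": "High"}, "Yes"),
       ({"Outlook": "Sunny", "Humidity": "High", "Wind": "Weak"}, "No"),
       ({"Outlook": "Sunny", "Humidity": "Normal", "Wind": "Weak"}, "Yes"),
       ({"Outlook": "Overcast", "Humidity": "High", "Wind": "Weak"}, "Yes")]

pruned = prune(play, val)
print(pruned)  # the Rain/Wind subtree collapses to the leaf "Yes"
```

Here the validation data says Rain examples are always Yes, so the Wind test under Rain is collapsed, while the Sunny/Humidity subtree survives because collapsing it would cost accuracy.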
Other issues in Decision tree learning
• Incorporating continuous valued attributes
• Alternative measures for selecting attributes
• Handling training examples with missing attribute values
• Handling attributes with different costs
Strengths and Advantages of Decision Trees
• Rule extraction from trees
– A decision tree can be used for feature extraction
(e.g. seeing which features are useful)
• Interpretability: human experts may verify and/or discover patterns
• It is a compact and fast classification method
Your Assignments
• HW1 is uploaded, Due date: 94/08/14
• Proposal: same due date
– One page maximum
– Include the following information:
• Project title
• Data set
• Project idea, approximately two paragraphs.
• Software you will need to write.
• Papers to read. Include 1-3 relevant papers.
• Teammate (if any) and work division. We expect projects done in a group to be more substantial than projects done individually.