decision trees with minimal costs (icml 2004, banff, canada) charles x. ling, univ of western...
Post on 21-Dec-2015
![Page 1: Decision Trees with Minimal Costs (ICML 2004, Banff, Canada) Charles X. Ling, Univ of Western Ontario, Canada Qiang Yang, HK UST, Hong Kong Jianning Wang,](https://reader034.vdocuments.us/reader034/viewer/2022051619/56649d6c5503460f94a4cbd4/html5/thumbnails/1.jpg)
Decision Trees with Minimal Costs
(ICML 2004, Banff, Canada)
Charles X. Ling, Univ of Western Ontario, Canada
Qiang Yang, HK UST, Hong Kong
Jianning Wang, Univ of Western Ontario, Canada
Shichao Zhang, UTS, Australia
Contact: [email protected]
Outline
Introduction
Building Trees with Minimal Total Costs
Testing Strategies
Experiments and Results
Conclusions
Costs in Machine Learning
Most inductive learning algorithms: minimizing classification errors
– Different types of misclassification have different costs, e.g. false positives (FP) and false negatives (FN)
In this talk:
– Test costs should also be considered
– Cost-sensitive learning considers a variety of costs; see survey by Peter Turney (2000)
Applications
Medical Practice
– Doctors may ask a patient to go through a number of tests (e.g., blood tests, X-rays)
– Which of these new tests will bring about higher value?
Biological Experimental Design
– When testing a new drug, new tests are costly
– Which experiments to perform?
Previous Work
Many previous works consider the two types of cost (misclassification and test) separately, an obvious oversight
(Turney 1995): ICET, uses a genetic algorithm to build trees that minimize the total cost
(Zubek and Dietterich 2002): a Markov Decision Process (MDP), searches in a state space for optimal policies
(Greiner et al. 2002): PAC learning
An Example of Our Problem
Training: with ?, cannot obtain values
ID    C1 Fever    C2 X-ray    C3 Blood_1    C4 Blood_2    C5 …    D
12    101         ?           H             ?             …       Yes
23    ?           L           M             L             …       No
Goal 1: build a tree that minimizes the total cost
Test: with many ?, may obtain values at a cost
ID    C1 Fever    C2 X-ray    C3 Blood_1    C4 Blood_2    C5 …    D
45    98          ?           ?             ?             …       ?
58    ?           ?           ?             ?             …       ?
Goal 2: obtain test values at a cost to minimize the total cost
Building Trees with Minimal Total Costs
Assumption: binary classes, costs: FP and FN
Goal: minimize total cost
– Total cost = misclassification cost + test cost
Previous Work
– Information Gain as an attribute selection criterion
In this work, need a new attribute selection criterion
Attribute Selection Criterion: C4.5
Minimal total cost (C4.5: minimal entropy)
– If growing a tree has a smaller total cost than stopping at a leaf
  then choose an attribute with minimal total cost
  else stop and form a leaf
Label leaf according to minimal total cost
(P, N: numbers of positive and negative examples in the leaf)
If (P×FN > N×FP)
  then class = positive
  else class = negative
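The leaf-labeling rule can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the name `label_leaf` and its parameters are hypothetical, the comparison is read as P×FN > N×FP (label the leaf positive when calling it negative would cost more), and breaking ties toward the negative class is an arbitrary choice.

```python
def label_leaf(num_pos: int, num_neg: int, fp_cost: float, fn_cost: float) -> str:
    """Pick the leaf label with the smaller total misclassification cost."""
    # Labeling the leaf negative turns every positive example into a false negative.
    cost_if_negative = num_pos * fn_cost
    # Labeling the leaf positive turns every negative example into a false positive.
    cost_if_positive = num_neg * fp_cost
    # Tie-breaking toward "negative" is an arbitrary choice in this sketch.
    return "positive" if cost_if_negative > cost_if_positive else "negative"


if __name__ == "__main__":
    # One positive among nine negatives: the label depends on the cost ratio.
    print(label_leaf(1, 9, fp_cost=100, fn_cost=1000))
    print(label_leaf(1, 9, fp_cost=100, fn_cost=100))
```

With a false negative ten times as expensive as a false positive, a single positive example is enough to make the leaf positive even though negatives dominate by count.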
Difference on ? values
First, how to handle ? values in training data
Previous work
– built ? branch;
– problematic
This work
– deal with unknown values in the training set:
– no branch for ? will be built,
– examples are “gathered” inside the internal nodes
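One way to combine the minimal-total-cost criterion with this treatment of ? values is sketched below in Python. The cost accounting is an assumption, since the slides do not spell it out: the test cost is charged once for every example whose value is known, and examples with an unknown value stay at the node and contribute the cost of the cheaper label there. The names `leaf_cost`, `split_cost` and `choose_attribute` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]  # None stands for an unknown value "?"


def leaf_cost(labels: List[int], fp_cost: float, fn_cost: float) -> float:
    """Misclassification cost of the cheaper of the two possible labels."""
    pos = sum(labels)  # labels are 1 (positive) or 0 (negative)
    neg = len(labels) - pos
    return min(pos * fn_cost, neg * fp_cost)


def split_cost(rows: List[Row], labels: List[int], attr: str,
               test_cost: float, fp_cost: float, fn_cost: float) -> float:
    """Total cost of splitting on `attr` under the assumed accounting."""
    branches = defaultdict(list)
    gathered = []  # examples whose value for `attr` is unknown
    for row, y in zip(rows, labels):
        value = row.get(attr)
        if value is None:
            gathered.append(y)  # no "?" branch: the example stays at this node
        else:
            branches[value].append(y)
    tested = len(labels) - len(gathered)
    cost = test_cost * tested  # assumption: the test is paid only where a value exists
    cost += sum(leaf_cost(ys, fp_cost, fn_cost) for ys in branches.values())
    cost += leaf_cost(gathered, fp_cost, fn_cost)
    return cost


def choose_attribute(rows: List[Row], labels: List[int], test_costs: Dict[str, float],
                     fp_cost: float, fn_cost: float) -> Optional[str]:
    """Return the cheapest attribute to split on, or None to form a leaf."""
    best_attr, best_cost = None, leaf_cost(labels, fp_cost, fn_cost)
    for attr, test_cost in test_costs.items():
        cost = split_cost(rows, labels, attr, test_cost, fp_cost, fn_cost)
        if cost < best_cost:  # split only if strictly cheaper than stopping
            best_attr, best_cost = attr, cost
    return best_attr
```

Under this accounting, raising every test cost far above the misclassification costs makes `choose_attribute` return `None`, so the node becomes a leaf.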
Desirable Properties
1. Effect of difference between misclassification costs and the test costs
[Figure: trees built when all test costs are 0, 20 and 300; the trees test A1 and A6, and one of them is a single leaf P]
2. Prefer attribute with smaller test costs
Test costs of A1 to A6 under three settings:
      A1     A2     A3     A4     A5     A6
#1    20     20     20     20     20     20
#2    200    20     100    100    200    200
#3    200    100    100    100    20     200
[Figure: one tree per setting; the first tests A1 and A6, the second A2 and A1, the third A5 and A1]
3. If test cost increases, attribute tends to be “pushed” down and “falls out” of the tree
[Figure: three trees for increasing cost of A1. Cost of A1=20: A1 above two A6 nodes. Cost of A1=50: A6 above A1. Cost of A1=80: A6 and A2, without A1.]
Missing values in test cases
A new patient arrives:
Blood test    X-ray result    Urine test    S-test
?             good            ?             ?
OST: Intuition
Explain the intuition of OST here
Four Testing Strategies
First: Optimal Sequential Test (OST)
(Simple batch test: do all tests)
Second: No test will be performed, predict with internal node
Third: No test will be performed, predict with weighted sum of subtrees
Fourth: A new tree is built dynamically for each test case using only the known attributes
[Figure: two example trees, one with A1 above two A6 nodes and one testing A1 only]
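The first two strategies can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: OST is read as "walk down the tree and pay for a test only when the attribute at the current node is unknown", and the second strategy as "stop at the first unknown attribute and return the label stored at that internal node". The `Node` class, the `run_test` callback and both function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple


@dataclass
class Node:
    label: str                   # cost-minimal label of the examples at this node
    attr: Optional[str] = None   # attribute tested here; None for a leaf
    children: Dict[str, "Node"] = field(default_factory=dict)


def classify_ost(root: Node, example: Dict[str, Optional[str]],
                 test_costs: Dict[str, float],
                 run_test: Callable[[str], str]) -> Tuple[str, float]:
    """Sequential testing: request a value only when the tree asks for it."""
    known = dict(example)
    paid = 0.0
    node = root
    while node.attr is not None:
        value = known.get(node.attr)
        if value is None:
            value = run_test(node.attr)   # perform the test and pay for it
            known[node.attr] = value
            paid += test_costs[node.attr]
        child = node.children.get(value)
        if child is None:                 # value never seen in training: stop here
            break
        node = child
    return node.label, paid


def classify_no_test(root: Node, example: Dict[str, Optional[str]]) -> str:
    """No tests: stop at the first unknown attribute and use that node's label."""
    node = root
    while node.attr is not None:
        child = node.children.get(example.get(node.attr))
        if child is None:
            break
        node = child
    return node.label
```

A caller would supply `run_test` as whatever obtains the real value, for example a lookup into held-out ground truth during evaluation. The total cost of a prediction is then `paid` plus the FP or FN cost if the returned label is wrong.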
Experiment - settings
Five datasets, binary-class
60/40 for training/testing, repeat 5 times
Unknown values for training/test examples are selected randomly by a specific probability
Also compare to C4.5 tree, using OST for testing
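A protocol of this kind could be set up as in the Python sketch below. Several details are assumptions, because the slide does not state them: each attribute value is masked independently with the same probability, the 60/40 split is a plain random shuffle without stratification, and the five repetitions use fresh shuffles from one seeded generator. All function names are hypothetical.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple

Row = Dict[str, Optional[str]]  # None stands for an unknown value "?"


def mask_unknowns(rows: Sequence[Row], prob: float, rng: random.Random) -> List[Row]:
    """Replace each attribute value by None independently with probability `prob`."""
    return [{a: (None if rng.random() < prob else v) for a, v in row.items()}
            for row in rows]


def split_60_40(n: int, rng: random.Random) -> Tuple[List[int], List[int]]:
    """Shuffle the indices 0..n-1 and cut them 60/40 into train and test indices."""
    idx = list(range(n))
    rng.shuffle(idx)
    cut = (n * 6) // 10  # integer arithmetic avoids floating-point rounding
    return idx[:cut], idx[cut:]


def repeated_splits(n: int, repeats: int = 5, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Draw `repeats` independent 60/40 splits from one seeded generator."""
    rng = random.Random(seed)
    return [split_60_40(n, rng) for _ in range(repeats)]
```

Costs would then be averaged over the repeated splits; the seed is there only to make a run reproducible.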
Results with different % of unknown
[Figure: x-axis: percentage of unknown attributes (20, 40, 60, 80); y-axis: 0 to 160; series: M1 (OST), M2 (no test, internal), M3 (no test, distributed), M4 (no test, lazy tree), C4.5 (C4.5 tree, OST)]
OST is best; M4 and C4.5 next; M3 is worst
OST does not increase with more ?; the others do overall
Results with different test costs
[Figure: x-axis: test costs (50, 100, 200, 400); y-axis: 0 to 600; series: M1 (OST), M2 (no test, internal), M3 (no test, distributed), M4 (no test, lazy tree), C4.5 (C4.5 tree, OST)]
With large test costs, OST = M2 = M3 = M4
C4.5 is much worse (tree building is cost-insensitive)
Results with unbalanced class costs
[Figure: x-axis: test costs (50, 100, 200, 400); y-axis: 0 to 600; series: M1 (OST), M2 (no test, internal), M3 (no test, distributed), M4 (no test, lazy tree), C4.5 (C4.5 tree, OST)]
With large test costs, OST = M2 = M4
C4.5 is much worse (tree building is cost-insensitive)
M3 is worse than M2… (M3 is used in C4.5)
Comparing OST/C4.5 across 5 datasets
OST always outperforms C4.5
[Figure (a): x-axis: percentage of unknown attributes (20, 40, 60, 80); y-axis: 0 to 0.9; series: Ecoli, Breast, Heart, Thyroid, Australia]
[Figure (b): x-axis: test costs (50, 100, 200, 400); y-axis: 0 to 1; series: Ecoli, Breast, Heart, Thyroid, Australia]
Conclusions
New tree building algorithm for minimal costs
– Desirable properties
– Computationally efficient (similar to C4.5)
Test strategies (OST and batch) are very effective
Can solve many real-world diagnosis problems
Future Work
More intelligent “Batch Test” methods
Consider cost of additional batch test
– Optimal sequential batch test
  batch 1 = (test 1, test 2)
  batch 2 = (test 3, test 4, test 5), …
Other learning algorithms with minimal total cost
A wrapper that works for any “black box”