Evaluation
Interactive decision tree construction
• Load segment-challenge.arff; look at the dataset
• Select UserClassifier (tree classifier)
• Use the test set segment-test.arff
• Examine the data visualizer and tree visualizer
• Plot region-centroid-row vs. intensity-mean
• Rectangle, Polygon and Polyline selection tools
… several selections …
• Right click in Tree visualizer and Accept the tree
Over to you: how well can you do?
Be a classifier!
Build a tree: what strategy did you use?
Given enough time, you could produce a “perfect”
tree for the dataset
• but would it perform well on the test set?
Be a classifier!
[Diagram: training data feeds the ML algorithm, which builds a classifier; the classifier is evaluated on test data to produce evaluation results, then deployed]
Training and Testing
[Diagram: training data feeds the ML algorithm, which builds a classifier; the classifier is evaluated on test data to produce evaluation results, then deployed]
Basic assumption: training and test sets produced by independent sampling from an infinite population
Training and Testing
Use J48 to analyze the segment dataset
• Open file segment‐challenge.arff
• Choose J48 decision tree learner (trees>J48)
• Supplied test set segment‐test.arff
• Run it: 96% accuracy
• Evaluate on training set: 99% accuracy
• Evaluate on percentage split: 95% accuracy
• Do it again: get exactly the same result!
Training and Testing
Basic assumption:
• training and test sets sampled independently
from an infinite population
Just one dataset? — hold some out for testing
Expect slight variation in results… but Weka
produces the same results each time… Why?
• E.g. J48 on segment‐challenge dataset
Training and Testing
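Weka's determinism comes from seeding its random-number generator. A minimal stdlib sketch (not Weka's actual splitting code; the function name `percentage_split` is hypothetical) shows why a fixed seed always yields the same holdout partition:

```python
import random

def percentage_split(instances, train_fraction=0.9, seed=1):
    """Seeded holdout: shuffle, then split into train and test sets.

    A fixed seed makes the shuffle -- and hence the evaluation --
    reproducible, which is why repeated runs give identical results
    unless the seed is changed."""
    rng = random.Random(seed)      # seeded generator => deterministic shuffle
    shuffled = instances[:]        # copy, so the caller's list is untouched
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

data = list(range(100))
train1, test1 = percentage_split(data, seed=1)
train2, test2 = percentage_split(data, seed=1)   # same seed: identical split
train3, test3 = percentage_split(data, seed=2)   # new seed: different split
```

Changing the seed is what produces the "slight variation in results" the next slides measure.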
Evaluate J48 on segment‐challenge
• With segment‐challenge and J48 (trees>J48)
• Set percentage split to 90%
• Run it: 96.7% accuracy
• [More options] Repeat with a different seed
• Use 2, 3, 4, 5, 6, 7, 8, 9, 10
Repeated Training and Testing
Accuracies (seeds 1–10): 0.967, 0.940, 0.940, 0.967, 0.953, 0.967, 0.920, 0.947, 0.933, 0.947
Accuracies (seeds 1–10): 0.967, 0.940, 0.940, 0.967, 0.953, 0.967, 0.920, 0.947, 0.933, 0.947
Sample mean: x̄ = Σxᵢ / n
Variance: σ² = Σ(xᵢ − x̄)² / (n − 1)
Standard deviation: σ
x̄ = 0.949, σ = 0.0158
Repeated Training and Testing
Evaluate J48 on segment‐challenge
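The slide's summary statistics can be reproduced from the ten reported accuracies, using Python's statistics module as a quick check:

```python
import statistics

# The ten accuracies from the slide (percentage split, seeds 1-10)
accuracies = [0.967, 0.940, 0.940, 0.967, 0.953, 0.967,
              0.920, 0.947, 0.933, 0.947]

mean = statistics.mean(accuracies)    # x-bar = sum(x_i) / n
stdev = statistics.stdev(accuracies)  # sample std dev: (n - 1) denominator

print(f"mean = {mean:.3f}, std dev = {stdev:.4f}")
```

`statistics.stdev` uses the n − 1 denominator, matching the variance formula above. Recomputing from the rounded three-decimal accuracies gives x̄ ≈ 0.948 (within rounding of the slide's 0.949, which was presumably computed from unrounded values) and σ ≈ 0.0158.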
Basic assumption:
• training and test sets sampled independently
from an infinite population
Expect slight variation in results … get it by
setting the random‐number seed
Can calculate mean and standard deviation
experimentally
Repeated Training and Testing
Use the diabetes dataset and the default holdout
• Open file diabetes.arff
• Test option: Percentage split
Try these classifiers:
• trees > J48: 76%
• bayes > NaiveBayes: 77%
• lazy > IBk: 73%
• rules > PART: 74%
768 instances (500 negative, 268 positive)
Always guess “negative”: 500/768 = 65%
• rules > ZeroR: predicts the most likely class!
Baseline Accuracy
Sometimes the baseline is best!
• Open supermarket.arff and blindly apply:
• rules > ZeroR: 64%
• trees > J48: 63%
• bayes > NaiveBayes: 63%
• lazy > IBk: 38%
• rules > PART: 63%
• Attributes are not informative
• Caution: Don’t just apply Weka to a dataset:
you need to understand what’s going on
Baseline Accuracy
Consider whether differences are significant
Always try a simple baseline, e.g. rules > ZeroR
Caution: Don’t just apply Weka to a dataset: you
need to understand what’s going on
Baseline Accuracy
Can we improve upon repeated holdout (i.e.
reduce variance)?
Cross‐validation
Stratified cross‐validation
Cross-Validation
Repeated holdout: hold out 10% for testing, repeat 10 times
Cross-Validation
10-fold cross-validation
• Divide the dataset into 10 parts (folds)
• Hold out each part in turn
• Average the results
Each data point is used once for testing, 9 times for training
Stratified cross-validation
• Ensure that each fold has the right proportion of each class value
Cross-Validation
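The fold construction above can be sketched as follows (a minimal stratified split, not Weka's implementation; `stratified_folds` is a hypothetical helper): dealing each class's instances round-robin across the folds keeps the class proportions roughly equal, and every instance lands in exactly one fold.

```python
from collections import defaultdict

def stratified_folds(labels, k=10):
    """Assign each instance index to one of k folds, round-robin per
    class, so every fold gets roughly the right proportion of each
    class value."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)   # deal indices out like cards
    return folds

# 80 instances of class "a", 20 of class "b": each fold holds 8 + 2
labels = ["a"] * 80 + ["b"] * 20
folds = stratified_folds(labels, k=10)
```

In a full cross-validation, each fold in turn serves as the test set while the other nine are used for training, and the ten accuracies are averaged.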
Cross‐validation better than repeated holdout
Stratified is even better
Practical rule of thumb:
• Lots of data? Use percentage split
• Otherwise, use stratified 10-fold cross-validation
Cross-Validation
Is cross‐validation really better than repeated holdout?
Diabetes dataset
Baseline accuracy (rules > ZeroR): 65.1%
trees > J48, 10-fold cross-validation: 73.8%
… with different random-number seeds:

Seed      1     2     3     4     5     6     7     8     9     10
Accuracy  73.8  75.0  75.5  75.5  74.4  75.6  73.6  74.0  74.5  73.0
Cross-Validation Results
holdout (10%): 75.3, 77.9, 80.5, 74.0, 71.4, 70.1, 79.2, 71.4, 80.5, 67.5
cross-validation (10-fold): 73.8, 75.0, 75.5, 75.5, 74.4, 75.6, 73.6, 74.0, 74.5, 73.0
Sample mean: x̄ = Σxᵢ / n
Variance: σ² = Σ(xᵢ − x̄)² / (n − 1)
Standard deviation: σ
Holdout: x̄ = 74.8, σ = 4.6
Cross-validation: x̄ = 74.5, σ = 0.9
Cross-Validation Results
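Recomputing the summary statistics from the slide's ten estimates shows the variance reduction directly:

```python
import statistics

# Accuracy estimates from the slide (J48 on the diabetes data, ten seeds)
holdout = [75.3, 77.9, 80.5, 74.0, 71.4, 70.1, 79.2, 71.4, 80.5, 67.5]
cross_val = [73.8, 75.0, 75.5, 75.5, 74.4, 75.6, 73.6, 74.0, 74.5, 73.0]

for name, results in [("holdout (10%)", holdout),
                      ("10-fold cross-validation", cross_val)]:
    print(f"{name}: mean = {statistics.mean(results):.1f}, "
          f"std dev = {statistics.stdev(results):.1f}")
```

Both methods give roughly the same mean accuracy, but the cross-validation estimates scatter far less (σ ≈ 0.9 vs. 4.6), which is exactly the point of this slide.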
Why 10‐fold? E.g. 20‐fold: 75.1%
Cross‐validation really is better than repeated holdout
It reduces the variance of the estimate
Cross-Validation Results
Evaluation Methods: Exercises
Plan
To evaluate the performance of machine learning algorithms classifying Tic-Tac-Toe games.
Classification on Tic-Tac-Toe
Download Tic-Tac-Toe dataset tic-tac-toe.zip from Course Page.
Work as a team to evaluate the performance of machine learning algorithms classifying Tic-Tac-Toe games.
Evaluation Methods
Using Training Set (use 100% of instances to train/learn and use 100% of instances to test performance)
10-fold Cross-Validation
Split 70% (use 70% of instances to train/learn and use the remaining 30% of instances to test performance)
Classifiers Being Used
Decision Tree
• trees → J48
Neural Network
• functions → MultilayerPerceptron (trainingTime = 50)
Bayes Network
• bayes → NaiveBayes
Nearest Neighbor
• lazy → IBk (k = 3)
Using Weka
• Extract Tic-Tac-Toe.zip to the Weka folder
• Load the Weka program
• Open Tic-Tac-Toe.arff
• Choose Explorer
Using Weka (cont.)
• Click the Classify tab
• Choose the J48 classifier under trees
• Set the Test options to "Use training set"
• Enable "Output predictions" in More options
• Click Start to run
Using Weka (cont.)
[Screenshot: classifier output, highlighting the accuracy rate]
Reporting
• Download Tic-tac-toe-report.docx
• Complete the table evaluating the performance of different learning methods in Q1
• Find the best performer in Q2, Q3, and Q4