experimental design, multi-class classification, multi ... · option 1: develop a multi-class...
TRANSCRIPT
![Page 1: Experimental Design, Multi-class classification, Multi ... · Option 1: Develop a multi-class version Option 2: Build a one-vs-all (OvA) classifier Option 3: Build an all-vs-all (AvA)](https://reader034.vdocuments.us/reader034/viewer/2022050117/5f4e0233558aa1349f08f130/html5/thumbnails/1.jpg)
Experimental Setup,Multi-class vs. Multi-label classification,
andEvaluation
CMSC 678
UMBC
![Page 2: Experimental Design, Multi-class classification, Multi ... · Option 1: Develop a multi-class version Option 2: Build a one-vs-all (OvA) classifier Option 3: Build an all-vs-all (AvA)](https://reader034.vdocuments.us/reader034/viewer/2022050117/5f4e0233558aa1349f08f130/html5/thumbnails/2.jpg)
Central Question: How Well Are We Doing?
Classification
Regression
Clustering
the task: what kindof problem are you
solving?
• Precision, Recall, F1
• Accuracy• Log-loss• ROC-AUC• …
• (Root) Mean Square Error• Mean Absolute Error• …
• Mutual Information• V-score• …
![Page 3: Experimental Design, Multi-class classification, Multi ... · Option 1: Develop a multi-class version Option 2: Build a one-vs-all (OvA) classifier Option 3: Build an all-vs-all (AvA)](https://reader034.vdocuments.us/reader034/viewer/2022050117/5f4e0233558aa1349f08f130/html5/thumbnails/3.jpg)
Central Question: How Well Are We Doing?
Classification
Regression
Clustering
the task: what kindof problem are you
solving?
• Precision, Recall, F1
• Accuracy• Log-loss• ROC-AUC• …
• (Root) Mean Square Error• Mean Absolute Error• …
• Mutual Information• V-score• …
This does not have to be the same thing as the
loss function
you optimize
![Page 4: Experimental Design, Multi-class classification, Multi ... · Option 1: Develop a multi-class version Option 2: Build a one-vs-all (OvA) classifier Option 3: Build an all-vs-all (AvA)](https://reader034.vdocuments.us/reader034/viewer/2022050117/5f4e0233558aa1349f08f130/html5/thumbnails/4.jpg)
Outline
Experimental Design: Rule 1
Multi-class vs. Multi-label classification
Evaluation
Regression Metrics
Classification Metrics
![Page 5: Experimental Design, Multi-class classification, Multi ... · Option 1: Develop a multi-class version Option 2: Build a one-vs-all (OvA) classifier Option 3: Build an all-vs-all (AvA)](https://reader034.vdocuments.us/reader034/viewer/2022050117/5f4e0233558aa1349f08f130/html5/thumbnails/5.jpg)
Experimenting with Machine Learning Models
All your data
Training DataDevData
Test Data
![Page 6: Experimental Design, Multi-class classification, Multi ... · Option 1: Develop a multi-class version Option 2: Build a one-vs-all (OvA) classifier Option 3: Build an all-vs-all (AvA)](https://reader034.vdocuments.us/reader034/viewer/2022050117/5f4e0233558aa1349f08f130/html5/thumbnails/6.jpg)
Rule #1
![Page 7: Experimental Design, Multi-class classification, Multi ... · Option 1: Develop a multi-class version Option 2: Build a one-vs-all (OvA) classifier Option 3: Build an all-vs-all (AvA)](https://reader034.vdocuments.us/reader034/viewer/2022050117/5f4e0233558aa1349f08f130/html5/thumbnails/7.jpg)
Experimenting with Machine Learning Models
What is “correct?”
What is working “well?”
Training DataDevData
Test Data
Learn model parameters from training set
set hyper-parameters
![Page 8: Experimental Design, Multi-class classification, Multi ... · Option 1: Develop a multi-class version Option 2: Build a one-vs-all (OvA) classifier Option 3: Build an all-vs-all (AvA)](https://reader034.vdocuments.us/reader034/viewer/2022050117/5f4e0233558aa1349f08f130/html5/thumbnails/8.jpg)
Experimenting with Machine Learning Models
What is “correct?”
What is working “well?”
Training DataDevData
Test Data
set hyper-parameters
Evaluate the learned model on devwith that hyperparameter setting
Learn model parameters from training set
![Page 9: Experimental Design, Multi-class classification, Multi ... · Option 1: Develop a multi-class version Option 2: Build a one-vs-all (OvA) classifier Option 3: Build an all-vs-all (AvA)](https://reader034.vdocuments.us/reader034/viewer/2022050117/5f4e0233558aa1349f08f130/html5/thumbnails/9.jpg)
Experimenting with Machine Learning Models
What is “correct?”
What is working “well?”
Training DataDevData
Test Data
set hyper-parameters
Evaluate the learned model on devwith that hyperparameter setting
Learn model parameters from training set
perform final evaluation on test,using the hyperparameters that optimized dev performance and
retraining the model
![Page 10: Experimental Design, Multi-class classification, Multi ... · Option 1: Develop a multi-class version Option 2: Build a one-vs-all (OvA) classifier Option 3: Build an all-vs-all (AvA)](https://reader034.vdocuments.us/reader034/viewer/2022050117/5f4e0233558aa1349f08f130/html5/thumbnails/10.jpg)
Experimenting with Machine Learning Models
What is “correct?”
What is working “well?”
Training DataDevData
Test Data
set hyper-parameters
Evaluate the learned model on devwith that hyperparameter setting
Learn model parameters from training set
perform final evaluation on test,using the hyperparameters that optimized dev performance and
retraining the model
Rule 1: DO NOT ITERATE ON THE TEST DATA
![Page 11: Experimental Design, Multi-class classification, Multi ... · Option 1: Develop a multi-class version Option 2: Build a one-vs-all (OvA) classifier Option 3: Build an all-vs-all (AvA)](https://reader034.vdocuments.us/reader034/viewer/2022050117/5f4e0233558aa1349f08f130/html5/thumbnails/11.jpg)
On-board Exercise
Produce dev and test tables for a linear regression model with learned weights and
set/fixed (non-learned) bias
![Page 12: Experimental Design, Multi-class classification, Multi ... · Option 1: Develop a multi-class version Option 2: Build a one-vs-all (OvA) classifier Option 3: Build an all-vs-all (AvA)](https://reader034.vdocuments.us/reader034/viewer/2022050117/5f4e0233558aa1349f08f130/html5/thumbnails/12.jpg)
Outline
Experimental Design: Rule 1
Multi-class vs. Multi-label classification
Evaluation
Regression Metrics
Classification Metrics
![Page 13: Experimental Design, Multi-class classification, Multi ... · Option 1: Develop a multi-class version Option 2: Build a one-vs-all (OvA) classifier Option 3: Build an all-vs-all (AvA)](https://reader034.vdocuments.us/reader034/viewer/2022050117/5f4e0233558aa1349f08f130/html5/thumbnails/13.jpg)
Multi-class Classification
Given input 𝑥, predict discrete label 𝑦
Multi-label Classification
![Page 14: Experimental Design, Multi-class classification, Multi ... · Option 1: Develop a multi-class version Option 2: Build a one-vs-all (OvA) classifier Option 3: Build an all-vs-all (AvA)](https://reader034.vdocuments.us/reader034/viewer/2022050117/5f4e0233558aa1349f08f130/html5/thumbnails/14.jpg)
Multi-class Classification
Given input 𝑥, predict discrete label 𝑦
If 𝑦 ∈ {0,1} (or 𝑦 ∈{True, False}), then a
binary classification task
Multi-label Classification
![Page 15: Experimental Design, Multi-class classification, Multi ... · Option 1: Develop a multi-class version Option 2: Build a one-vs-all (OvA) classifier Option 3: Build an all-vs-all (AvA)](https://reader034.vdocuments.us/reader034/viewer/2022050117/5f4e0233558aa1349f08f130/html5/thumbnails/15.jpg)
Multi-class Classification
Given input 𝑥, predict discrete label 𝑦
If 𝑦 ∈ {0,1} (or 𝑦 ∈{True, False}), then a
binary classification task
If 𝑦 ∈ {0,1, … , 𝐾 − 1} (for finite K), then a multi-class
classification task
Q: What are some examples of multi-class classification?
Multi-label Classification
![Page 16: Experimental Design, Multi-class classification, Multi ... · Option 1: Develop a multi-class version Option 2: Build a one-vs-all (OvA) classifier Option 3: Build an all-vs-all (AvA)](https://reader034.vdocuments.us/reader034/viewer/2022050117/5f4e0233558aa1349f08f130/html5/thumbnails/16.jpg)
Multi-class Classification
Given input 𝑥, predict discrete label 𝑦
If 𝑦 ∈ {0,1} (or 𝑦 ∈{True, False}), then a
binary classification task
If 𝑦 ∈ {0,1, … , 𝐾 − 1} (for finite K), then a multi-class
classification task
Q: What are some examples of multi-class classification?
A: Many possibilities. See A2, Q{1,2,4-7}
Multi-label Classification
![Page 17: Experimental Design, Multi-class classification, Multi ... · Option 1: Develop a multi-class version Option 2: Build a one-vs-all (OvA) classifier Option 3: Build an all-vs-all (AvA)](https://reader034.vdocuments.us/reader034/viewer/2022050117/5f4e0233558aa1349f08f130/html5/thumbnails/17.jpg)
Multi-class Classification
Given input 𝑥, predict discrete label 𝑦
If 𝑦 ∈ {0,1} (or 𝑦 ∈{True, False}), then a
binary classification task
If 𝑦 ∈ {0,1, … , 𝐾 − 1} (for finite K), then a multi-class
classification task
Multi-label Classification
Single
output
Multi-
output
If multiple 𝑦𝑙 are predicted, then a multi-label classification task
![Page 18: Experimental Design, Multi-class classification, Multi ... · Option 1: Develop a multi-class version Option 2: Build a one-vs-all (OvA) classifier Option 3: Build an all-vs-all (AvA)](https://reader034.vdocuments.us/reader034/viewer/2022050117/5f4e0233558aa1349f08f130/html5/thumbnails/18.jpg)
Multi-class Classification
Given input 𝑥, predict discrete label 𝑦
If 𝑦 ∈ {0,1} (or 𝑦 ∈{True, False}), then a
binary classification task
If 𝑦 ∈ {0,1, … , 𝐾 − 1} (for finite K), then a multi-class
classification task
Multi-label Classification
Single
output
Multi-
output
Given input 𝑥, predict multiple discrete labels 𝑦 = (𝑦1, … , 𝑦𝐿)
If multiple 𝑦𝑙 are predicted, then a multi-label classification task
![Page 19: Experimental Design, Multi-class classification, Multi ... · Option 1: Develop a multi-class version Option 2: Build a one-vs-all (OvA) classifier Option 3: Build an all-vs-all (AvA)](https://reader034.vdocuments.us/reader034/viewer/2022050117/5f4e0233558aa1349f08f130/html5/thumbnails/19.jpg)
Multi-class Classification
Given input 𝑥, predict discrete label 𝑦
If 𝑦 ∈ {0,1} (or 𝑦 ∈{True, False}), then a
binary classification task
If 𝑦 ∈ {0,1, … , 𝐾 − 1} (for finite K), then a multi-class
classification task
Multi-label Classification
Single
output
Multi-
output
Given input 𝑥, predict multiple discrete labels 𝑦 = (𝑦1, … , 𝑦𝐿)
If multiple 𝑦𝑙 are predicted, then a multi-label classification task
Each 𝑦𝑙 could be binary or multi-class
![Page 20: Experimental Design, Multi-class classification, Multi ... · Option 1: Develop a multi-class version Option 2: Build a one-vs-all (OvA) classifier Option 3: Build an all-vs-all (AvA)](https://reader034.vdocuments.us/reader034/viewer/2022050117/5f4e0233558aa1349f08f130/html5/thumbnails/20.jpg)
Multi-Label Classification…
Will not be a primary focus of this course
Many of the single output classification methods apply to multi-label classification
Predicting “in the wild” can be trickier
Evaluation can be trickier
![Page 21: Experimental Design, Multi-class classification, Multi ... · Option 1: Develop a multi-class version Option 2: Build a one-vs-all (OvA) classifier Option 3: Build an all-vs-all (AvA)](https://reader034.vdocuments.us/reader034/viewer/2022050117/5f4e0233558aa1349f08f130/html5/thumbnails/21.jpg)
We’ve only developed binary classifiers so far…
Option 1: Develop a multi-class version
Option 2: Build a one-vs-all (OvA) classifier
Option 3: Build an all-vs-all (AvA) classifier
(there can be others)
![Page 22: Experimental Design, Multi-class classification, Multi ... · Option 1: Develop a multi-class version Option 2: Build a one-vs-all (OvA) classifier Option 3: Build an all-vs-all (AvA)](https://reader034.vdocuments.us/reader034/viewer/2022050117/5f4e0233558aa1349f08f130/html5/thumbnails/22.jpg)
We’ve only developed binary classifiers so far…
Option 1: Develop a multi-class version
Option 2: Build a one-vs-all (OvA) classifier
Option 3: Build an all-vs-all (AvA) classifier
(there can be others)
Loss function may (or may not) need to be extended & the model structure may need to change (big or small)
![Page 23: Experimental Design, Multi-class classification, Multi ... · Option 1: Develop a multi-class version Option 2: Build a one-vs-all (OvA) classifier Option 3: Build an all-vs-all (AvA)](https://reader034.vdocuments.us/reader034/viewer/2022050117/5f4e0233558aa1349f08f130/html5/thumbnails/23.jpg)
We’ve only developed binary classifiers so far…
Option 1: Develop a multi-class version
Option 2: Build a one-vs-all (OvA) classifier
Option 3: Build an all-vs-all (AvA) classifier
(there can be others)
Loss function may (or may not) need to be extended & the model structure may need to change (big or small)
Common change:instead of a single weight vector 𝑤, keep a weight vector 𝑤(𝑐) for each class c
Compute class specific scores, e.g., 𝑦𝑖(𝑐)
= 𝑤(𝑐) 𝑇𝑥 + 𝑏(𝑐)
![Page 24: Experimental Design, Multi-class classification, Multi ... · Option 1: Develop a multi-class version Option 2: Build a one-vs-all (OvA) classifier Option 3: Build an all-vs-all (AvA)](https://reader034.vdocuments.us/reader034/viewer/2022050117/5f4e0233558aa1349f08f130/html5/thumbnails/24.jpg)
Multi-class Option 1: Linear Regression/Perceptron
𝐰
𝑥
𝑦
𝑦 = 𝐰𝑇𝑥 + 𝑏
output:if y > 0: class 1
else: class 2
![Page 25: Experimental Design, Multi-class classification, Multi ... · Option 1: Develop a multi-class version Option 2: Build a one-vs-all (OvA) classifier Option 3: Build an all-vs-all (AvA)](https://reader034.vdocuments.us/reader034/viewer/2022050117/5f4e0233558aa1349f08f130/html5/thumbnails/25.jpg)
Multi-class Option 1: Linear Regression/Perceptron: A Per-Class View
𝐰
𝑥
𝑦
𝑦 = 𝐰𝑇𝑥 + 𝑏
𝐰𝟐
𝑥𝑦
𝑦1 = 𝐰𝟏𝑇𝑥 + 𝑏1
𝐰𝟏
𝑦2
𝑦2 = 𝐰𝟐𝑇𝑥 + 𝑏2
𝑦1
output:if y > 0: class 1
else: class 2
output:i = argmax {y1, y2}
class i
binary version is special case
![Page 26: Experimental Design, Multi-class classification, Multi ... · Option 1: Develop a multi-class version Option 2: Build a one-vs-all (OvA) classifier Option 3: Build an all-vs-all (AvA)](https://reader034.vdocuments.us/reader034/viewer/2022050117/5f4e0233558aa1349f08f130/html5/thumbnails/26.jpg)
Multi-class Option 1: Linear Regression/Perceptron: A Per-Class View (alternative)
𝐰
𝑥
𝑦
𝑦 = 𝐰𝑇𝑥 + 𝑏
𝐰𝟐
𝑥𝑦
𝑦1 = 𝒘𝟏; 𝒘𝟐𝑻[𝑥; 𝟎] + 𝑏1
𝐰𝟏
𝑦2
𝑦1
output:if y > 0: class 1
else: class 2
output:i = argmax {y1, y2}
class i
𝑦2 = 𝒘𝟏; 𝒘𝟐𝑻[𝟎; 𝑥] + 𝑏2
concatenation
Q: (For discussion) Why does this work?
![Page 27: Experimental Design, Multi-class classification, Multi ... · Option 1: Develop a multi-class version Option 2: Build a one-vs-all (OvA) classifier Option 3: Build an all-vs-all (AvA)](https://reader034.vdocuments.us/reader034/viewer/2022050117/5f4e0233558aa1349f08f130/html5/thumbnails/27.jpg)
We’ve only developed binary classifiers so far…
Option 1: Develop a multi-class version
Option 2: Build a one-vs-all (OvA) classifier
Option 3: Build an all-vs-all (AvA) classifier
(there can be others)
With C classes:
Train C different binary classifiers 𝛾𝑐(𝑥)
𝛾𝑐(𝑥) predicts 1 if x is likely class c, 0 otherwise
![Page 28: Experimental Design, Multi-class classification, Multi ... · Option 1: Develop a multi-class version Option 2: Build a one-vs-all (OvA) classifier Option 3: Build an all-vs-all (AvA)](https://reader034.vdocuments.us/reader034/viewer/2022050117/5f4e0233558aa1349f08f130/html5/thumbnails/28.jpg)
We’ve only developed binary classifiers so far…
Option 1: Develop a multi-class version
Option 2: Build a one-vs-all (OvA) classifier
Option 3: Build an all-vs-all (AvA) classifier
(there can be others)
With C classes:
Train C different binary classifiers 𝛾𝑐(𝑥)
𝛾𝑐(𝑥) predicts 1 if x is likely class c, 0 otherwise
To test/predict a new instance z:Get scores 𝑠𝑐 = 𝛾𝑐(𝑧)
Output the max of these scores, ො𝑦 = argmax𝑐 𝑠
𝑐
![Page 29: Experimental Design, Multi-class classification, Multi ... · Option 1: Develop a multi-class version Option 2: Build a one-vs-all (OvA) classifier Option 3: Build an all-vs-all (AvA)](https://reader034.vdocuments.us/reader034/viewer/2022050117/5f4e0233558aa1349f08f130/html5/thumbnails/29.jpg)
We’ve only developed binary classifiers so far…
Option 1: Develop a multi-class version
Option 2: Build a one-vs-all (OvA) classifier
Option 3: Build an all-vs-all (AvA) classifier
(there can be others)
With C classes:
Train 𝐶2
different binary classifiers 𝛾𝑐1,𝑐2(𝑥)
![Page 30: Experimental Design, Multi-class classification, Multi ... · Option 1: Develop a multi-class version Option 2: Build a one-vs-all (OvA) classifier Option 3: Build an all-vs-all (AvA)](https://reader034.vdocuments.us/reader034/viewer/2022050117/5f4e0233558aa1349f08f130/html5/thumbnails/30.jpg)
We’ve only developed binary classifiers so far…
Option 1: Develop a multi-class version
Option 2: Build a one-vs-all (OvA) classifier
Option 3: Build an all-vs-all (AvA) classifier
(there can be others)
With C classes:
Train 𝐶2
different binary classifiers 𝛾𝑐1,𝑐2(𝑥)
𝛾𝑐1,𝑐2(𝑥) predicts 1 if x is likely class 𝑐1, 0 otherwise (likely class 𝑐2)
![Page 31: Experimental Design, Multi-class classification, Multi ... · Option 1: Develop a multi-class version Option 2: Build a one-vs-all (OvA) classifier Option 3: Build an all-vs-all (AvA)](https://reader034.vdocuments.us/reader034/viewer/2022050117/5f4e0233558aa1349f08f130/html5/thumbnails/31.jpg)
We’ve only developed binary classifiers so far…
Option 1: Develop a multi-class version
Option 2: Build a one-vs-all (OvA) classifier
Option 3: Build an all-vs-all (AvA) classifier
(there can be others)
With C classes:
Train 𝐶2
different binary classifiers 𝛾𝑐1,𝑐2(𝑥)
𝛾𝑐1,𝑐2(𝑥) predicts 1 if x is likely class 𝑐1, 0 otherwise (likely class 𝑐2)
To test/predict a new instance z:Get scores or predictions 𝑠𝑐1,𝑐2 =𝛾𝑐1,𝑐2 𝑧
![Page 32: Experimental Design, Multi-class classification, Multi ... · Option 1: Develop a multi-class version Option 2: Build a one-vs-all (OvA) classifier Option 3: Build an all-vs-all (AvA)](https://reader034.vdocuments.us/reader034/viewer/2022050117/5f4e0233558aa1349f08f130/html5/thumbnails/32.jpg)
We’ve only developed binary classifiers so far…
Option 1: Develop a multi-class version
Option 2: Build a one-vs-all (OvA) classifier
Option 3: Build an all-vs-all (AvA) classifier
(there can be others)
With C classes:
Train 𝐶2
different binary classifiers 𝛾𝑐1,𝑐2(𝑥)
𝛾𝑐1,𝑐2(𝑥) predicts 1 if x is likely class 𝑐1, 0 otherwise (likely class 𝑐2)
To test/predict a new instance z:Get scores or predictions 𝑠𝑐1,𝑐2 =𝛾𝑐1,𝑐2 𝑧Multiple options for final prediction:
(1) count # times a class c was predicted(2) margin-based approach
![Page 33: Experimental Design, Multi-class classification, Multi ... · Option 1: Develop a multi-class version Option 2: Build a one-vs-all (OvA) classifier Option 3: Build an all-vs-all (AvA)](https://reader034.vdocuments.us/reader034/viewer/2022050117/5f4e0233558aa1349f08f130/html5/thumbnails/33.jpg)
We’ve only developed binary classifiers so far…
Option 1: Develop a multi-class version
Option 2: Build a one-vs-all (OvA) classifier
Option 3: Build an all-vs-all (AvA) classifier
(there can be others)
Q: (to discuss)
Why might you want to use option 1 or options
OvA/AvA?
What are the benefits of OvA vs. AvA?
![Page 34: Experimental Design, Multi-class classification, Multi ... · Option 1: Develop a multi-class version Option 2: Build a one-vs-all (OvA) classifier Option 3: Build an all-vs-all (AvA)](https://reader034.vdocuments.us/reader034/viewer/2022050117/5f4e0233558aa1349f08f130/html5/thumbnails/34.jpg)
We’ve only developed binary classifiers so far…
Option 1: Develop a multi-class version
Option 2: Build a one-vs-all (OvA) classifier
Option 3: Build an all-vs-all (AvA) classifier
(there can be others)
Q: (to discuss)
Why might you want to use option 1 or options
OvA/AvA?
What are the benefits of OvA vs. AvA?
What if you start with a balanced dataset, e.g.,
100 instances per class?
![Page 35: Experimental Design, Multi-class classification, Multi ... · Option 1: Develop a multi-class version Option 2: Build a one-vs-all (OvA) classifier Option 3: Build an all-vs-all (AvA)](https://reader034.vdocuments.us/reader034/viewer/2022050117/5f4e0233558aa1349f08f130/html5/thumbnails/35.jpg)
Outline
Experimental Design: Rule 1
Multi-class vs. Multi-label classification
Evaluation
Regression Metrics
Classification Metrics
![Page 36: Experimental Design, Multi-class classification, Multi ... · Option 1: Develop a multi-class version Option 2: Build a one-vs-all (OvA) classifier Option 3: Build an all-vs-all (AvA)](https://reader034.vdocuments.us/reader034/viewer/2022050117/5f4e0233558aa1349f08f130/html5/thumbnails/36.jpg)
Regression Metrics
(Root) Mean Square Error
𝑅𝑀𝑆𝐸 =1
𝑁
𝑖
𝑁
𝑦𝑖 − ෝ𝑦𝑖2
![Page 37: Experimental Design, Multi-class classification, Multi ... · Option 1: Develop a multi-class version Option 2: Build a one-vs-all (OvA) classifier Option 3: Build an all-vs-all (AvA)](https://reader034.vdocuments.us/reader034/viewer/2022050117/5f4e0233558aa1349f08f130/html5/thumbnails/37.jpg)
Regression Metrics
(Root) Mean Square Error Mean Absolute Error
𝑅𝑀𝑆𝐸 =1
𝑁
𝑖
𝑁
𝑦𝑖 − ෝ𝑦𝑖2 𝑀𝐴𝐸 =
1
𝑁
𝑖
𝑁
|𝑦𝑖 − ෝ𝑦𝑖|
![Page 38: Experimental Design, Multi-class classification, Multi ... · Option 1: Develop a multi-class version Option 2: Build a one-vs-all (OvA) classifier Option 3: Build an all-vs-all (AvA)](https://reader034.vdocuments.us/reader034/viewer/2022050117/5f4e0233558aa1349f08f130/html5/thumbnails/38.jpg)
Regression Metrics
(Root) Mean Square Error Mean Absolute Error
𝑅𝑀𝑆𝐸 =1
𝑁
𝑖
𝑁
𝑦𝑖 − ෝ𝑦𝑖2 𝑀𝐴𝐸 =
1
𝑁
𝑖
𝑁
|𝑦𝑖 − ෝ𝑦𝑖|
Q: How can these reward/punish predictions
differently?
![Page 39: Experimental Design, Multi-class classification, Multi ... · Option 1: Develop a multi-class version Option 2: Build a one-vs-all (OvA) classifier Option 3: Build an all-vs-all (AvA)](https://reader034.vdocuments.us/reader034/viewer/2022050117/5f4e0233558aa1349f08f130/html5/thumbnails/39.jpg)
Regression Metrics
(Root) Mean Square Error Mean Absolute Error
𝑅𝑀𝑆𝐸 =1
𝑁
𝑖
𝑁
𝑦𝑖 − ෝ𝑦𝑖2 𝑀𝐴𝐸 =
1
𝑁
𝑖
𝑁
|𝑦𝑖 − ෝ𝑦𝑖|
Q: How can these reward/punish predictions
differently?
A: RMSE punishes outlier predictions more harshly
![Page 40: Experimental Design, Multi-class classification, Multi ... · Option 1: Develop a multi-class version Option 2: Build a one-vs-all (OvA) classifier Option 3: Build an all-vs-all (AvA)](https://reader034.vdocuments.us/reader034/viewer/2022050117/5f4e0233558aa1349f08f130/html5/thumbnails/40.jpg)
Outline
Experimental Design: Rule 1
Multi-class vs. Multi-label classification
Evaluation
Regression Metrics
Classification Metrics
![Page 41: Experimental Design, Multi-class classification, Multi ... · Option 1: Develop a multi-class version Option 2: Build a one-vs-all (OvA) classifier Option 3: Build an all-vs-all (AvA)](https://reader034.vdocuments.us/reader034/viewer/2022050117/5f4e0233558aa1349f08f130/html5/thumbnails/41.jpg)
Training Loss vs. Evaluation Score
In training, compute loss to update parameters
Sometimes loss is a computational compromise
- surrogate loss
The loss you use might not be as informative as you’d like
Binary classification: 90 of 100 training examples are +1, 10 of 100 are -1
![Page 42: Experimental Design, Multi-class classification, Multi ... · Option 1: Develop a multi-class version Option 2: Build a one-vs-all (OvA) classifier Option 3: Build an all-vs-all (AvA)](https://reader034.vdocuments.us/reader034/viewer/2022050117/5f4e0233558aa1349f08f130/html5/thumbnails/42.jpg)
Some Classification Metrics
Accuracy
PrecisionRecall
AUC (Area Under Curve)
F1
Confusion Matrix
![Page 43: Experimental Design, Multi-class classification, Multi ... · Option 1: Develop a multi-class version Option 2: Build a one-vs-all (OvA) classifier Option 3: Build an all-vs-all (AvA)](https://reader034.vdocuments.us/reader034/viewer/2022050117/5f4e0233558aa1349f08f130/html5/thumbnails/43.jpg)
Classification Evaluation:the 2-by-2 contingency table
ActuallyCorrect
Actually Incorrect
Selected/Guessed
Not selected/not guessed
Classes/Choices
![Page 44: Experimental Design, Multi-class classification, Multi ... · Option 1: Develop a multi-class version Option 2: Build a one-vs-all (OvA) classifier Option 3: Build an all-vs-all (AvA)](https://reader034.vdocuments.us/reader034/viewer/2022050117/5f4e0233558aa1349f08f130/html5/thumbnails/44.jpg)
Classification Evaluation:the 2-by-2 contingency table
ActuallyCorrect
Actually Incorrect
Selected/Guessed
True Positive (TP)
Not selected/not guessed
Classes/Choices
Correct Guessed
![Page 45: Experimental Design, Multi-class classification, Multi ... · Option 1: Develop a multi-class version Option 2: Build a one-vs-all (OvA) classifier Option 3: Build an all-vs-all (AvA)](https://reader034.vdocuments.us/reader034/viewer/2022050117/5f4e0233558aa1349f08f130/html5/thumbnails/45.jpg)
Classification Evaluation:the 2-by-2 contingency table
ActuallyCorrect
Actually Incorrect
Selected/Guessed
True Positive (TP)
False Positive (FP)
Not selected/not guessed
Classes/Choices
Correct Guessed Correct Guessed
![Page 46: Experimental Design, Multi-class classification, Multi ... · Option 1: Develop a multi-class version Option 2: Build a one-vs-all (OvA) classifier Option 3: Build an all-vs-all (AvA)](https://reader034.vdocuments.us/reader034/viewer/2022050117/5f4e0233558aa1349f08f130/html5/thumbnails/46.jpg)
Classification Evaluation:the 2-by-2 contingency table
ActuallyCorrect
Actually Incorrect
Selected/Guessed
True Positive (TP)
False Positive (FP)
Not selected/not guessed
False Negative (FN)
Classes/Choices
Correct Guessed Correct Guessed
Correct Guessed
![Page 47: Experimental Design, Multi-class classification, Multi ... · Option 1: Develop a multi-class version Option 2: Build a one-vs-all (OvA) classifier Option 3: Build an all-vs-all (AvA)](https://reader034.vdocuments.us/reader034/viewer/2022050117/5f4e0233558aa1349f08f130/html5/thumbnails/47.jpg)
Classification Evaluation:the 2-by-2 contingency table
ActuallyCorrect
Actually Incorrect
Selected/Guessed
True Positive (TP)
False Positive (FP)
Not selected/not guessed
False Negative (FN)
True Negative (TN)
Classes/Choices
Correct Guessed Correct Guessed
Correct Guessed Correct Guessed
![Page 48: Experimental Design, Multi-class classification, Multi ... · Option 1: Develop a multi-class version Option 2: Build a one-vs-all (OvA) classifier Option 3: Build an all-vs-all (AvA)](https://reader034.vdocuments.us/reader034/viewer/2022050117/5f4e0233558aa1349f08f130/html5/thumbnails/48.jpg)
Classification Evaluation:Accuracy, Precision, and Recall
Accuracy: % of items correct
Actually Correct Actually Incorrect
Selected/Guessed True Positive (TP) False Positive (FP)
Not select/not guessed False Negative (FN) True Negative (TN)
TP + TN
TP + FP + FN + TN
![Page 49: Experimental Design, Multi-class classification, Multi ... · Option 1: Develop a multi-class version Option 2: Build a one-vs-all (OvA) classifier Option 3: Build an all-vs-all (AvA)](https://reader034.vdocuments.us/reader034/viewer/2022050117/5f4e0233558aa1349f08f130/html5/thumbnails/49.jpg)
Classification Evaluation:Accuracy, Precision, and Recall
Accuracy: % of items correct
Precision: % of selected items that are correct
Actually Correct Actually Incorrect
Selected/Guessed True Positive (TP) False Positive (FP)
Not select/not guessed False Negative (FN) True Negative (TN)
TP
TP + FP
TP + TN
TP + FP + FN + TN
![Page 50: Experimental Design, Multi-class classification, Multi ... · Option 1: Develop a multi-class version Option 2: Build a one-vs-all (OvA) classifier Option 3: Build an all-vs-all (AvA)](https://reader034.vdocuments.us/reader034/viewer/2022050117/5f4e0233558aa1349f08f130/html5/thumbnails/50.jpg)
Classification Evaluation:Accuracy, Precision, and Recall
Accuracy: % of items correct
Precision: % of selected items that are correct
Recall: % of correct items that are selected
Actually Correct Actually Incorrect
Selected/Guessed True Positive (TP) False Positive (FP)
Not select/not guessed False Negative (FN) True Negative (TN)
TP
TP + FP
TP
TP + FN
TP + TN
TP + FP + FN + TN
![Page 51: Experimental Design, Multi-class classification, Multi ... · Option 1: Develop a multi-class version Option 2: Build a one-vs-all (OvA) classifier Option 3: Build an all-vs-all (AvA)](https://reader034.vdocuments.us/reader034/viewer/2022050117/5f4e0233558aa1349f08f130/html5/thumbnails/51.jpg)
Classification Evaluation:Accuracy, Precision, and Recall
Accuracy: % of items correct
Precision: % of selected items that are correct
Recall: % of correct items that are selected
Actually Correct Actually Incorrect
Selected/Guessed True Positive (TP) False Positive (FP)
Not select/not guessed False Negative (FN) True Negative (TN)
TP
TP + FP
TP
TP + FN
TP + TN
TP + FP + FN + TN
Min: 0 ☹️Max: 1 😀
![Page 52: Experimental Design, Multi-class classification, Multi ... · Option 1: Develop a multi-class version Option 2: Build a one-vs-all (OvA) classifier Option 3: Build an all-vs-all (AvA)](https://reader034.vdocuments.us/reader034/viewer/2022050117/5f4e0233558aa1349f08f130/html5/thumbnails/52.jpg)
Precision and Recall Present a Tradeoff
precision
recall0
0
1
1
Q: Where do you want your ideal
model?model
![Page 53: Experimental Design, Multi-class classification, Multi ... · Option 1: Develop a multi-class version Option 2: Build a one-vs-all (OvA) classifier Option 3: Build an all-vs-all (AvA)](https://reader034.vdocuments.us/reader034/viewer/2022050117/5f4e0233558aa1349f08f130/html5/thumbnails/53.jpg)
Precision and Recall Present a Tradeoff
precision
recall0
0
1
1
Q: Where do you want your ideal
model?model
Q: You have a model that always identifies correct instances. Where on this graph is it?
model
![Page 54: Experimental Design, Multi-class classification, Multi ... · Option 1: Develop a multi-class version Option 2: Build a one-vs-all (OvA) classifier Option 3: Build an all-vs-all (AvA)](https://reader034.vdocuments.us/reader034/viewer/2022050117/5f4e0233558aa1349f08f130/html5/thumbnails/54.jpg)
Precision and Recall Present a Tradeoff
precision
recall0
0
1
1
Q: You have a model that always identifies correct instances. Where on this graph is it?
model
Q: You have a model that only
make correct predictions. Where on this graph is it?
model
Q: Where do you want your ideal
model?model
![Page 55: Experimental Design, Multi-class classification, Multi ... · Option 1: Develop a multi-class version Option 2: Build a one-vs-all (OvA) classifier Option 3: Build an all-vs-all (AvA)](https://reader034.vdocuments.us/reader034/viewer/2022050117/5f4e0233558aa1349f08f130/html5/thumbnails/55.jpg)
Precision and Recall Present a Tradeoff
precision
recall0
0
1
1
Q: You have a model that always identifies correct instances. Where on this graph is it?
model
Q: You have a model that only
make correct predictions. Where on this graph is it?
model
Q: Where do you want your ideal
model?model
![Page 56: Experimental Design, Multi-class classification, Multi ... · Option 1: Develop a multi-class version Option 2: Build a one-vs-all (OvA) classifier Option 3: Build an all-vs-all (AvA)](https://reader034.vdocuments.us/reader034/viewer/2022050117/5f4e0233558aa1349f08f130/html5/thumbnails/56.jpg)
Precision and Recall Present a Tradeoff
precision
recall0
0
1
1
Q: You have a model that always identifies correct instances. Where on this graph is it?
model
Q: You have a model that only
make correct predictions. Where on this graph is it?
model
Q: Where do you want your ideal
model?model
Idea: measure the tradeoff between
precision and recall
Remember those hyperparameters: Each
point is a differently trained/tuned model
![Page 57: Experimental Design, Multi-class classification, Multi ... · Option 1: Develop a multi-class version Option 2: Build a one-vs-all (OvA) classifier Option 3: Build an all-vs-all (AvA)](https://reader034.vdocuments.us/reader034/viewer/2022050117/5f4e0233558aa1349f08f130/html5/thumbnails/57.jpg)
Precision and Recall Present a Tradeoff
precision
recall0
0
1
1
Q: You have a model that always identifies correct instances. Where on this graph is it?
model
Q: You have a model that only
make correct predictions. Where on this graph is it?
model
Q: Where do you want your ideal
model?model
Idea: measure the tradeoff between
precision and recall
Improve overall model: push the curve that way
![Page 58: Experimental Design, Multi-class classification, Multi ... · Option 1: Develop a multi-class version Option 2: Build a one-vs-all (OvA) classifier Option 3: Build an all-vs-all (AvA)](https://reader034.vdocuments.us/reader034/viewer/2022050117/5f4e0233558aa1349f08f130/html5/thumbnails/58.jpg)
Measure this Tradeoff:Area Under the Curve (AUC)
AUC measures the area under this tradeoff curve
pre
cisi
on
recall0
0
1
1
Improve overall model: push the curve that way
Min AUC: 0 ☹️Max AUC: 1 😀
![Page 59: Experimental Design, Multi-class classification, Multi ... · Option 1: Develop a multi-class version Option 2: Build a one-vs-all (OvA) classifier Option 3: Build an all-vs-all (AvA)](https://reader034.vdocuments.us/reader034/viewer/2022050117/5f4e0233558aa1349f08f130/html5/thumbnails/59.jpg)
Measure this Tradeoff:Area Under the Curve (AUC)
AUC measures the area under this tradeoff curve
1. Computing the curve
You need true labels & predicted labels with some score/confidence estimate
Threshold the scores and for each threshold compute precision and recall
pre
cisi
on
recall0
0
1
1
Improve overall model: push the curve that way
Min AUC: 0 ☹️Max AUC: 1 😀
![Page 60: Experimental Design, Multi-class classification, Multi ... · Option 1: Develop a multi-class version Option 2: Build a one-vs-all (OvA) classifier Option 3: Build an all-vs-all (AvA)](https://reader034.vdocuments.us/reader034/viewer/2022050117/5f4e0233558aa1349f08f130/html5/thumbnails/60.jpg)
Measure this Tradeoff:Area Under the Curve (AUC)
AUC measures the area under this tradeoff curve
1. Computing the curveYou need true labels & predicted labels with some score/confidence estimate
Threshold the scores and for each threshold compute precision and recall
2. Finding the areaHow to implement: trapezoidal rule (& others)
In practice: external library like the sklearn.metrics module
pre
cisi
on
recall0
0
1
1
Improve overall model: push the curve that way
Min AUC: 0 ☹️Max AUC: 1 😀
![Page 61: Experimental Design, Multi-class classification, Multi ... · Option 1: Develop a multi-class version Option 2: Build a one-vs-all (OvA) classifier Option 3: Build an all-vs-all (AvA)](https://reader034.vdocuments.us/reader034/viewer/2022050117/5f4e0233558aa1349f08f130/html5/thumbnails/61.jpg)
Measure A Slightly Different Tradeoff:ROC-AUC
AUC measures the area under this tradeoff curve
1. Computing the curveYou need true labels & predicted labels with some score/confidence estimateThreshold the scores and for each threshold compute metrics
2. Finding the areaHow to implement: trapezoidal rule (& others)
In practice: external library like the sklearn.metrics module
Tru
e p
osi
tive
rat
e
False positive rate0
0
1
1
Improve overall model: push the curve that way
Min ROC-AUC: 0.5 ☹️Max ROC-AUC: 1 😀
Main variant: ROC-AUCSame idea as before but with some
flipped metrics
![Page 62: Experimental Design, Multi-class classification, Multi ... · Option 1: Develop a multi-class version Option 2: Build a one-vs-all (OvA) classifier Option 3: Build an all-vs-all (AvA)](https://reader034.vdocuments.us/reader034/viewer/2022050117/5f4e0233558aa1349f08f130/html5/thumbnails/62.jpg)
A combined measure: F
Weighted (harmonic) average of Precision & Recall
𝐹 =1
𝛼1𝑃+ (1 − 𝛼)
1𝑅
![Page 63: Experimental Design, Multi-class classification, Multi ... · Option 1: Develop a multi-class version Option 2: Build a one-vs-all (OvA) classifier Option 3: Build an all-vs-all (AvA)](https://reader034.vdocuments.us/reader034/viewer/2022050117/5f4e0233558aa1349f08f130/html5/thumbnails/63.jpg)
A combined measure: F
Weighted (harmonic) average of Precision & Recall
𝐹 =1
𝛼1𝑃+ (1 − 𝛼)
1𝑅
=1 + 𝛽2 ∗ 𝑃 ∗ 𝑅
(𝛽2 ∗ 𝑃) + 𝑅
algebra (not important)
![Page 64: Experimental Design, Multi-class classification, Multi ... · Option 1: Develop a multi-class version Option 2: Build a one-vs-all (OvA) classifier Option 3: Build an all-vs-all (AvA)](https://reader034.vdocuments.us/reader034/viewer/2022050117/5f4e0233558aa1349f08f130/html5/thumbnails/64.jpg)
A combined measure: F
Weighted (harmonic) average of Precision & Recall
Balanced F1 measure: β=1
𝐹 =1 + 𝛽2 ∗ 𝑃 ∗ 𝑅
(𝛽2 ∗ 𝑃) + 𝑅
𝐹1 =2 ∗ 𝑃 ∗ 𝑅
𝑃 + 𝑅
![Page 65: Experimental Design, Multi-class classification, Multi ... · Option 1: Develop a multi-class version Option 2: Build a one-vs-all (OvA) classifier Option 3: Build an all-vs-all (AvA)](https://reader034.vdocuments.us/reader034/viewer/2022050117/5f4e0233558aa1349f08f130/html5/thumbnails/65.jpg)
P/R/F in a Multi-class Setting:Micro- vs. Macro-Averaging
If we have more than one class, how do we combine multiple performance measures into one quantity?
Macroaveraging: Compute performance for each class, then average.
Microaveraging: Collect decisions for all classes, compute contingency table, evaluate.
Sec. 15.2.4
![Page 66: Experimental Design, Multi-class classification, Multi ... · Option 1: Develop a multi-class version Option 2: Build a one-vs-all (OvA) classifier Option 3: Build an all-vs-all (AvA)](https://reader034.vdocuments.us/reader034/viewer/2022050117/5f4e0233558aa1349f08f130/html5/thumbnails/66.jpg)
P/R/F in a Multi-class Setting:Micro- vs. Macro-Averaging
Macroaveraging: Compute performance for each class, then average.
Microaveraging: Collect decisions for all classes, compute contingency table, evaluate.
Sec. 15.2.4
microprecision =σcTPc
σcTPc +σc FPc
macroprecision =
𝑐
TPcTPc + FPc
=
𝑐
precision𝑐
![Page 67: Experimental Design, Multi-class classification, Multi ... · Option 1: Develop a multi-class version Option 2: Build a one-vs-all (OvA) classifier Option 3: Build an all-vs-all (AvA)](https://reader034.vdocuments.us/reader034/viewer/2022050117/5f4e0233558aa1349f08f130/html5/thumbnails/67.jpg)
P/R/F in a Multi-class Setting:Micro- vs. Macro-Averaging
Macroaveraging: Compute performance for each class, then average.
Microaveraging: Collect decisions for all classes, compute contingency table, evaluate.
Sec. 15.2.4
microprecision =σcTPc
σc TPc + σc FPc
macroprecision =
𝑐
TPcTPc + FPc
=
𝑐
precision𝑐
when to prefer the macroaverage?
when to prefer the microaverage?
![Page 68: Experimental Design, Multi-class classification, Multi ... · Option 1: Develop a multi-class version Option 2: Build a one-vs-all (OvA) classifier Option 3: Build an all-vs-all (AvA)](https://reader034.vdocuments.us/reader034/viewer/2022050117/5f4e0233558aa1349f08f130/html5/thumbnails/68.jpg)
Micro- vs. Macro-Averaging: Example
Truth: yes
Truth: no
Classifier: yes
10 10
Classifier: no
10 970
Truth: yes
Truth: no
Classifier: yes
90 10
Classifier: no
10 890
Truth: yes
Truth: no
Classifier: yes
100 20
Classifier: no
20 1860
Class 1 Class 2 Micro Ave. Table
Sec. 15.2.4
Macroaveraged precision: (0.5 + 0.9)/2 = 0.7
Microaveraged precision: 100/120 = .83
Microaveraged score is dominated by score on frequent classes
![Page 69: Experimental Design, Multi-class classification, Multi ... · Option 1: Develop a multi-class version Option 2: Build a one-vs-all (OvA) classifier Option 3: Build an all-vs-all (AvA)](https://reader034.vdocuments.us/reader034/viewer/2022050117/5f4e0233558aa1349f08f130/html5/thumbnails/69.jpg)
Confusion Matrix: Generalizing the 2-by-2 contingency table
Correct Value
Guessed Value
# # #
# # #
# # #
![Page 70: Experimental Design, Multi-class classification, Multi ... · Option 1: Develop a multi-class version Option 2: Build a one-vs-all (OvA) classifier Option 3: Build an all-vs-all (AvA)](https://reader034.vdocuments.us/reader034/viewer/2022050117/5f4e0233558aa1349f08f130/html5/thumbnails/70.jpg)
Confusion Matrix: Generalizing the 2-by-2 contingency table
Correct Value
Guessed Value
80 9 11
7 86 7
2 8 9
Q: Is this a good result?
![Page 71: Experimental Design, Multi-class classification, Multi ... · Option 1: Develop a multi-class version Option 2: Build a one-vs-all (OvA) classifier Option 3: Build an all-vs-all (AvA)](https://reader034.vdocuments.us/reader034/viewer/2022050117/5f4e0233558aa1349f08f130/html5/thumbnails/71.jpg)
Confusion Matrix: Generalizing the 2-by-2 contingency table
Correct Value
Guessed Value
30 40 30
25 30 50
30 35 35
Q: Is this a good result?
![Page 72: Experimental Design, Multi-class classification, Multi ... · Option 1: Develop a multi-class version Option 2: Build a one-vs-all (OvA) classifier Option 3: Build an all-vs-all (AvA)](https://reader034.vdocuments.us/reader034/viewer/2022050117/5f4e0233558aa1349f08f130/html5/thumbnails/72.jpg)
Confusion Matrix: Generalizing the 2-by-2 contingency table
Correct Value
Guessed Value
7 3 90
4 8 88
3 7 90
Q: Is this a good result?
![Page 73: Experimental Design, Multi-class classification, Multi ... · Option 1: Develop a multi-class version Option 2: Build a one-vs-all (OvA) classifier Option 3: Build an all-vs-all (AvA)](https://reader034.vdocuments.us/reader034/viewer/2022050117/5f4e0233558aa1349f08f130/html5/thumbnails/73.jpg)
Some Classification Metrics
Accuracy
PrecisionRecall
AUC (Area Under Curve)
F1
Confusion Matrix
![Page 74: Experimental Design, Multi-class classification, Multi ... · Option 1: Develop a multi-class version Option 2: Build a one-vs-all (OvA) classifier Option 3: Build an all-vs-all (AvA)](https://reader034.vdocuments.us/reader034/viewer/2022050117/5f4e0233558aa1349f08f130/html5/thumbnails/74.jpg)
Outline
Experimental Design: Rule 1
Multi-class vs. Multi-label classification
Evaluation
Regression Metrics
Classification Metrics