Dr Athanasios Tsanas (‘Thanasis’)
Oxford Centre for Industrial and Applied Mathematics
Institute of Biomedical Engineering, Dept. of Engineering Science
Sleep and Circadian Neuroscience Institute, Dept. of Medicine
Tidal Generation: Rolls-Royce EMEC 500kW
Tidal Generation: Rolls-Royce EMEC 500kW
Pipeline: Characterise signal → Feature generation → Feature selection/transformation → Statistical mapping

[Figure: signal waveform, amplitude vs. time (s), annotated with signal amplitude and pitch period]
Supervised learning setting
Samples   feature 1   feature 2   ...   feature M   Outcome
S1        3.1         1.3         ...   0.9         type 1
S2        3.7         1.0         ...   1.3         type 2
S3        2.9         2.6         ...   0.6         type 1
…
SN        1.7         2.0         ...   0.7         type 5

X: design matrix of N rows (samples) and M columns (features or characteristics); y: the outcomes

y = f(X)
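The tabular setting above can be sketched in code; a minimal illustration assuming Python with scikit-learn (the synthetic dataset and the kNN learner here are placeholders for illustration, not from the slides):

```python
# Minimal sketch of the supervised learning setting y = f(X), using
# scikit-learn (an assumption for illustration; the slides are tool-agnostic).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Design matrix X: N samples x M features; outcome vector y
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

f = KNeighborsClassifier().fit(X_tr, y_tr)  # learn the mapping f
print(f.score(X_te, y_te))                  # out-of-sample accuracy
```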
Thin and fat datasets
Thin: many samples, few features (N ≫ M); fat: few samples, many features (M ≫ N)
Curse of dimensionality
Many features M ⇒ curse of dimensionality
Solution to the problem
Reduce the initial M-dimensional feature space to m dimensions, with m ≪ M, via:
Feature selection
Feature transformation
Feature transformation
Manifold embedding (e.g. PCA)
Not easily interpretable
Feature selection
Minimal feature subset with maximal predictive power
Interpretable
Reminder: supervised learning
Samples   feature 1   feature 2   ...   feature M   Outcome
S1        3.1         1.3         ...   0.9         type 1
S2        3.7         1.0         ...   1.3         type 2
S3        2.9         2.6         ...   0.6         type 1
…
SN        1.7         2.0         ...   0.7         type 5

X: design matrix of N rows (samples) and M columns (features or characteristics); y: the outcomes

y = f(X)
Functional mapping
y = f (X)
Simple approaches (LDA, kNN…)
Support Vector Machines (SVM)
Ensembles
Ensembles
Ensemble = combination of learners
Sequential vs. parallel:
Sequential: Learner 1 → … → Learner K → final model
Parallel: Learner 1, …, Learner K combined into the final model
Ensembles founding question
“Can a set of weak learners create
a single strong learner?”
A brief history of ensembles
Resampling data: bootstrapping → bagging
Classifier design: boosting → adaptive boosting (AdaBoost; Freund and Schapire, 1997)
Decision trees
Powerful, conceptually simple learners
Decision trees: an example
Tree growing process I
Find the best split and partition the data into two sub-regions (nodes)
Exhaustive search: determine the pair of half-planes $R_1(j,s)$ and $R_2(j,s)$:

$$R_1(j,s) = \{\mathbf{X} \mid \mathbf{f}_j \le s\}, \qquad R_2(j,s) = \{\mathbf{X} \mid \mathbf{f}_j > s\}$$

Stop when the number of data points in a node falls below some threshold (minimum node size)
Tree growing process II
Optimal feature $\mathbf{f}_j$ and splitting point $s$ according to the loss function:

$$\min_{j,s}\left[\min_{c_1}\sum_{x_i \in R_1(j,s)} (y_i - c_1)^2 \;+\; \min_{c_2}\sum_{x_i \in R_2(j,s)} (y_i - c_2)^2\right]$$

where $c_1, c_2$ are the mean values of the $y_i$ in each node when using the sum of squares as the loss function:

$$c_1 = \mathrm{mean}(y_i \mid \mathbf{x}_i \in R_1(j,s)), \qquad c_2 = \mathrm{mean}(y_i \mid \mathbf{x}_i \in R_2(j,s))$$

The equations differ depending on the loss function to be optimized.
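The exhaustive split search can be sketched directly from these equations; a plain NumPy illustration (not the slides' code), using the sum-of-squares loss and the node means $c_1, c_2$:

```python
# Sketch of the exhaustive best-split search described above, with the
# sum-of-squares loss and node means c1, c2 (plain NumPy, for illustration).
import numpy as np

def best_split(X, y):
    """Return (feature index j, threshold s) minimising the split loss."""
    best_j, best_s, best_loss = None, None, np.inf
    for j in range(X.shape[1]):                    # exhaustive over features
        for s in np.unique(X[:, j]):               # and candidate thresholds
            left, right = y[X[:, j] <= s], y[X[:, j] > s]  # R1(j,s), R2(j,s)
            if len(left) == 0 or len(right) == 0:
                continue
            # c1, c2 are the node means; the loss is the summed squared error
            loss = ((left - left.mean()) ** 2).sum() \
                 + ((right - right.mean()) ** 2).sum()
            if loss < best_loss:
                best_j, best_s, best_loss = j, s, loss
    return best_j, best_s

X = np.array([[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]])
y = np.array([0.0, 0.0, 0.0, 5.0, 5.0, 5.0])
print(best_split(X, y))  # best split of feature 0 at threshold s = 3.0
```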
Random Forests (RF)
Parallel combination of decision trees
Use many decision trees (typically 500)
Bagging: each tree is grown on a bootstrap sample of the data
At each node of each tree, only a random subset of √M features is considered
Majority voting across trees gives the final prediction
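The recipe above can be run off-the-shelf; a sketch assuming scikit-learn (the wine dataset is a placeholder):

```python
# Sketch of the Random Forest recipe above using scikit-learn (an assumption):
# 500 trees, bootstrap samples (bagging), sqrt(M) features per split,
# majority voting across trees.
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_wine(return_X_y=True)

rf = RandomForestClassifier(n_estimators=500, max_features="sqrt",
                            bootstrap=True, random_state=0)
scores = cross_val_score(rf, X, y, cv=5)
print(scores.mean())

# Bonus noted later on the slides: feature importance scores come for free
rf.fit(X, y)
print(rf.feature_importances_)
```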
Some results
Comparing Logistic Regression (LR) with Random Forests (RF)
Some results
Comparing Support Vector Machines with Random Forests on one dataset

[Figure: misclassification rate vs. number of features (1–30) on the Sonar dataset, for SVM (left) and RF (right); feature selection methods compared: LASSO, mRMR, mRMRSpearman, GSO, RELIEF, LLBFS, RRCT]
Final remarks on Random Forests
State-of-the-art learner
Very robust to hyper-parameter selection (just use it off-the-shelf!)
Bonus: provides feature importance scores
Weakness: cannot capture linear relationships well (its piecewise-constant splits approximate lines in steps)
Powerful in multi-class classification settings
Boosting and adaptive boosting
Boosting: reduce bias by combining sequential learners
Adaptive boosting: adjust the weights of the samples
Sequential ensemble: Adaboost
Train sequential learners
Introduce weights on misclassified samples
Sensitive to noise and outliers in the data
Early claims of no overfitting; later results showed it can overfit the data
Can be used with different types of base learners
Convergence if all learners are better than chance
Adaboost
Given $\{(\mathbf{x}_i, y_i)\}_{i=1\ldots N}$, with $\mathbf{x}_i \in \mathbb{R}^M$ and $y_i \in \{+1, -1\}$
Initialize the weights $\mathbf{D}_1$ over the samples: $\mathbf{D}_1(i) = 1/N$
For $t = 1 \ldots T$ sequential weak learners:
o Find the learner $f(\mathbf{D}_t)$ minimising the weighted loss $\sum_i \mathbf{D}_t(i)\, L(y_i, \hat{y}_i)$
o Ensure that the weak learner is better than chance (weighted error below 0.5)
o Update the weights $\mathbf{D}_{t+1}(i)$, increasing them for the samples with $\hat{y}_i \neq y_i$
Free parameter: influence of misclassified samples
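The loop above is available off-the-shelf; a sketch assuming scikit-learn (the slides give only pseudocode, and the breast-cancer dataset is a placeholder):

```python
# Sketch of AdaBoost using scikit-learn (an assumption). The default weak
# learner is a depth-1 decision stump; each round reweights the samples
# misclassified by the previous learners, and predictions use weighted voting.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

ada = AdaBoostClassifier(n_estimators=100, random_state=0)  # T = 100 weak learners
scores = cross_val_score(ada, X, y, cv=5)
print(scores.mean())
```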
Summary of Adaboost
Easy to implement
Generic framework: can work with any type of learner
Competitive with Random Forests: no clear winner
Variants for robustness (see Matlab's native function "fitensemble")
Adaboost versus RF
Instead of bootstrapping (RF), uses sample reweighting
Instead of majority voting (RF), uses weighted voting
Explicitly revisits samples misclassified by previous weak learners
Choosing algorithms
What kind of data do you have? Labeled data? Data growing daily?
Often the biggest practical challenge is creating or obtaining enough training data
For many problems and algorithms, hundreds or thousands of examples from each class are required to produce a high-performance classifier
Number of samples, features, correlations, interactions… use plots.
No free lunch theorem: no universally best learner!
Thanasis’ rules of thumb
Get to know your data before any processing!
• (1) Plot densities and scatter plots – univariate analysis and an intuitive "feel"; perhaps these suggest a simple data transformation
• (2) Compute correlations and the correlation matrix; identify interactions (e.g. via partial correlations and conditional mutual information)
• (3) Feature selection – a simple guide: if there are low correlations use LASSO; if there are few (low) interactions use mRMR
• (4) In my experience, SVMs may work better for binary classification problems, and RF may work better for multi-class classification problems
I would strongly advise testing both SVM and RF (and possibly other statistical mapping algorithms) on the problems you study
Appendix: food for thought
A Unifying Review of Linear Gaussian Models
http://www.cs.nyu.edu/~roweis/papers/NC110201.pdf
an excellent paper by Sam Roweis and Zoubin Ghahramani unifying linear models
Statistical modelling: the two cultures
http://faculty.smu.edu/tfomby/eco5385/lecture/Breiman%27s%20Two%20Cultures%20paper.pdf
Great introduction: first principles versus statistical modelling by Leo Breiman
Relations Between Machine Learning Problems
http://videolectures.net/nipsworkshops2011_williamson_machine/
Check out these lectures!
Additional Slides
Feature selection concepts
Relevance
Redundancy
Complementarity
LASSO (L1 regularization)
Start with classical ordinary least squares regression
L1 penalty: sparsity-promoting; some coefficients become exactly 0
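The sparsity effect can be seen directly; a sketch assuming scikit-learn's Lasso on synthetic data (the dataset and penalty strength are illustrative choices):

```python
# Sketch of the L1 penalty's sparsity-promoting effect using scikit-learn's
# Lasso (an assumption; data here are synthetic, for illustration).
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=100)  # 2 relevant features

lasso = Lasso(alpha=0.1).fit(X, y)
print(np.round(lasso.coef_, 2))     # most coefficients are exactly 0
print(np.flatnonzero(lasso.coef_))  # indices of the selected features
```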
RELIEF
Concept: work with nearest neighbours
Nearest hit (NH) and nearest miss (NM)
Great for datasets with interactions, but does not account for information redundancy
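The NH/NM idea can be sketched as a weight update; an illustrative implementation of the basic RELIEF rule written from the description above (not the original authors' code):

```python
# Sketch of the basic RELIEF weight update using nearest hit (NH) and
# nearest miss (NM), written from the description above (illustrative only).
import numpy as np

def relief(X, y, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    N, M = X.shape
    w = np.zeros(M)
    for _ in range(n_iter):
        i = rng.integers(N)
        d = np.abs(X - X[i]).sum(axis=1)           # L1 distances to sample i
        d[i] = np.inf                              # exclude the sample itself
        same, diff = (y == y[i]), (y != y[i])
        nh = np.argmin(np.where(same, d, np.inf))  # nearest hit (same class)
        nm = np.argmin(np.where(diff, d, np.inf))  # nearest miss (other class)
        # features that are close to the NH and far from the NM gain weight
        w += np.abs(X[i] - X[nm]) - np.abs(X[i] - X[nh])
    return w / n_iter

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = (X[:, 0] > 0).astype(int)   # only feature 0 is relevant
print(relief(X, y))             # feature 0 gets the largest weight
```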
mRMR
minimum Redundancy Maximum Relevance (mRMR)
Generally works very well
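The greedy trade-off behind mRMR can be sketched as follows; an illustrative version where absolute correlation stands in for the mutual information terms (an assumption, not the canonical estimator):

```python
# Sketch of greedy mRMR-style selection: at each step pick the feature with
# maximum relevance to y minus mean redundancy with the already-selected set.
# Absolute correlation stands in for mutual information here (an assumption).
import numpy as np

def mrmr(X, y, k):
    M = X.shape[1]
    rel = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(M)])
    selected = [int(np.argmax(rel))]   # start from the most relevant feature
    while len(selected) < k:
        best_j, best_score = None, -np.inf
        for j in range(M):
            if j in selected:
                continue
            red = np.mean([abs(np.corrcoef(X[:, j], X[:, s])[0, 1])
                           for s in selected])
            if rel[j] - red > best_score:   # relevance minus redundancy
                best_j, best_score = j, rel[j] - red
        selected.append(best_j)
    return selected

rng = np.random.default_rng(0)
f0, f1 = rng.normal(size=300), rng.normal(size=300)
y = f0 + f1
X = np.column_stack([f0, f1, f0 + 0.5 * rng.normal(size=300),
                     rng.normal(size=300)])
print(mrmr(X, y, 2))  # picks feature 1 plus one of the correlated pair {0, 2}
```

Note how the redundancy term prevents features 0 and 2 (near-duplicates) from both being selected, while the irrelevant feature 3 is never chosen.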
My new algorithm RRCT
$$\mathrm{RRCT} \;\triangleq\; \max_{i \,\in\, Q-S} \Big[ \underbrace{r_{\mathrm{IT}}(\mathbf{f}_i; \mathbf{y})}_{\text{relevance}} \;-\; \underbrace{\tfrac{1}{|S|} \sum_{s \in S} r_{\mathrm{IT}}(\mathbf{f}_i; \mathbf{f}_s)}_{\text{redundancy}} \;+\; \underbrace{\mathrm{sign}\!\big(r_{\mathrm{p}}(\mathbf{f}_i; \mathbf{y}|S)\big) \cdot \mathrm{sign}\!\big(r_{\mathrm{p}}(\mathbf{f}_i; \mathbf{y}|S) - r(\mathbf{f}_i; \mathbf{y})\big) \cdot r_{\mathrm{p,IT}}}_{\text{complementarity}} \Big]$$
https://www.youtube.com/watch?v=LJS7Igvk6ZM
"The best result will come when everyone in the group is doing what is best for himself… and the group."
My new algorithm RRCT

Complementarity via the partial correlation:

$$r_{\mathrm{p}}(X, Y|Z) = \frac{r(X,Y) - r(X,Z) \cdot r(Y,Z)}{\sqrt{\big(1 - r^2(X,Z)\big) \cdot \big(1 - r^2(Y,Z)\big)}}$$

Relevance & redundancy trade-off:

$$\mathbf{D} = -0.5 \cdot \log \begin{pmatrix} 1 - r_1^2 & 1 - \rho_{12}^2 & \dots & 1 - \rho_{1M}^2 \\ 1 - \rho_{12}^2 & 1 - r_2^2 & \cdots & 1 - \rho_{2M}^2 \\ \vdots & \vdots & \ddots & \vdots \\ 1 - \rho_{1M}^2 & 1 - \rho_{2M}^2 & \dots & 1 - r_M^2 \end{pmatrix}$$
My new algorithm RRCT
[Figure: information content vs. correlation coefficient (−1 to 1), tending asymptotically to infinity as |r| → 1]

Normalizing pdfs of variables
Information theoretic transformation: $r_{\mathrm{IT}} = -0.5 \cdot \log(1 - r^2)$
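The transformation is a one-liner; a plain NumPy sketch of $r_{\mathrm{IT}}$ applied to a correlation coefficient:

```python
# Sketch of the information-theoretic transformation r_IT = -0.5*log(1 - r^2)
# applied to a Pearson correlation coefficient (plain NumPy, for illustration).
import numpy as np

def r_it(r):
    """Information content (in nats) of a correlation coefficient r."""
    return -0.5 * np.log(1.0 - r ** 2)

print(r_it(0.0))    # uncorrelated variables carry no information
print(r_it(0.9))    # information content grows rapidly as |r| grows
print(r_it(0.999))  # and tends asymptotically to infinity as |r| -> 1
```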
Validation setting
Matching ‘true’ feature subset
o Possible only for artificial datasets
Maximize the out-of-sample prediction performance
o adds an additional 'layer': the learner
o potential problem: feature exportability
o BUT… in practice this is really what is of most interest!
Dataset info
Dataset              Design matrix   Associated task               Type
MONK1                124×6           Classification (2 classes)    D (6)
Artificial 1         500×150         Classification (2 classes)    C (150)
Artificial 2         1000×100        Classification (10 classes)   C (100)
Hepatitis            155×19          Classification (2 classes)    C (17), D (2)
Parkinson's          195×22          Classification (2 classes)    C (22)
Sonar                208×60          Classification (2 classes)    C (60)
Wine                 178×13          Classification (3 classes)    C (13)
Image segmentation   2310×19         Classification (7 classes)    C (16), D (3)
Cardiotocography     2129×21         Classification (10 classes)   C (14), D (7)
Ovarian cancer       72×592          Classification (2 classes)    C (592)
SRBCT                88×2308         Classification (4 classes)    C (2308)
(C: continuous features, D: discrete features; feature counts in parentheses)
False Discovery Rate (FDR)
[Figure: false discovery rate vs. number of selected features for Artificial1 (1–10 features) and Artificial2 (1–20 features); methods: LASSO, mRMR, mRMRSpearman, GSO, RELIEF, LLBFS, RRCT]
Artificial datasets with known ground truth
Indicative performance
Classifier accuracy as proxy for FS accuracy
[Figure: misclassification rate (RF, left; SVM, right) vs. number of features (1–30) on the Sonar dataset; methods: LASSO, mRMR, mRMRSpearman, GSO, RELIEF, LLBFS, RRCT]
Indicative performance
Classifier accuracy as proxy for FS accuracy
[Figure: misclassification rate (SVM, left; RF, right) vs. number of features (1–19) on the Image segmentation dataset; methods: LASSO, mRMR, mRMRSpearman, GSO, RELIEF, LLBFS, RRCT]
Indicative performance
Classifier accuracy as proxy for FS accuracy
[Figure: misclassification rate (SVM, left; RF, right) vs. number of features (1–20) on the Cardiotocography dataset; methods: LASSO, mRMR, mRMRSpearman, GSO, RELIEF, LLBFS, RRCT]
Performance in fat datasets
Very challenging setting for many algorithms

[Figure: misclassification rate (RF) vs. number of features on the Ovarian cancer (1–30) and SRBCT (1–50) datasets; methods: LASSO, mRMR, mRMRSpearman, GSO, RELIEF, LLBFS, RRCT]
Supervised learning setting
f: mapping; X: design matrix; y: outcome

Samples   feature 1   feature 2   ...   feature M
S1        3.1         1.3         ...   0.9
S2        3.7         1.0         ...   1.3
S3        2.9         2.6         ...   0.6
…
SN        1.7         2.0         ...   0.7