Classification of Affective States - GP Semi-Supervised Learning, SVM and kNN
MAS 622J Course Project
Hyungil Ahn ([email protected])
Objective & Dataset
• Recognize the affective states of a child solving a puzzle
• Affective Dataset
- 1024 features from Face, Posture, and Game
- 3 affective states, with labels annotated by teachers:
High interest (61), Low interest (59), Refreshing (16)
Task & Approaches
Binary Classification
High interest (61 samples) vs.
Low interest or Refreshing (75 samples)
Approaches
- Semi-Supervised Learning: Gaussian Process (GP)
- Support Vector Machine
- k-Nearest Neighbor (k = 1)
GP Semi-Supervised Learning
Given the labeled data, predict the labels of the unlabeled points.
Assume the data and the data-generation process: X : inputs, y : vector of observed labels, t : vector of hidden soft labels.
Each label (binary classification): y_i ∈ {+1, −1}; final classifier: y_i = sign[ t_i ].
Infer the posterior over the soft labels, p(t | X, y), given the inputs and the observed labels.
Similarity function (kernel): k(x_i, x_j), measuring how close two inputs are on the data manifold.
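To make the setup concrete, here is a minimal numerical sketch of semi-supervised soft-label prediction. It is my simplification, not the slides' exact model: I replace the flipping-noise likelihood and EP inference with a Gaussian likelihood, which gives the posterior mean of t at the unlabeled points in closed form.

```python
import numpy as np

def rbf_kernel(A, B, sigma):
    """Similarity function k(x, x') = exp(-||x - x'||^2 / (2 sigma^2))."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def gp_soft_labels(X_lab, y_lab, X_unl, sigma=1.0, noise=0.1):
    """Posterior mean of the hidden soft labels t at the unlabeled points,
    under a Gaussian (not flipping-noise) likelihood on the observed labels.
    The predicted hard label is y = sign[t]."""
    K_ll = rbf_kernel(X_lab, X_lab, sigma) + noise * np.eye(len(X_lab))
    K_ul = rbf_kernel(X_unl, X_lab, sigma)
    t_unl = K_ul @ np.linalg.solve(K_ll, y_lab.astype(float))
    return np.sign(t_unl), t_unl

# Toy usage: two well-separated clusters with labels in {-1, +1}.
rng = np.random.default_rng(0)
X_lab = np.vstack([rng.normal(-2, 0.3, (5, 2)), rng.normal(2, 0.3, (5, 2))])
y_lab = np.array([-1] * 5 + [1] * 5)
X_unl = np.array([[-2.0, -2.0], [2.0, 2.0]])
pred, t_unl = gp_soft_labels(X_lab, y_lab, X_unl)
```

With a Gaussian likelihood the posterior mean is linear in the observed labels; the slides' EP treatment is needed precisely because the sign-flip likelihood breaks this closed form.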
GP Semi-Supervised Learning
Bayesian Model
p(t | X) : prior of the classifier
p(y | t) : likelihood of the classifier given the labeled data
Infer the posterior: p(t | X, y) ∝ p(t | X) p(y | t)
GP Semi-Supervised Learning
How do we model the prior and the likelihood?
The prior: using a GP, p(t | X) = N(t | 0, K), with K_ij = k(x_i, x_j)
(Soft labels vary smoothly across the data manifold!)
The likelihood: a flipping-noise model with labeling error rate ε,
p(y_i | t_i) = 1 − ε if y_i = sign(t_i), and ε otherwise
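The two ingredients can be sketched as below. Both formulas are my hedged reconstructions of the slide: an RBF kernel for the GP prior covariance, and a flipping-noise likelihood parameterized by the labeling error rate ε.

```python
import numpy as np

def rbf_kernel(X, sigma):
    """GP prior covariance K_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)):
    nearby inputs get strongly coupled soft labels, so t varies smoothly."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def flip_likelihood(y, t, eps):
    """p(y_i | t_i): the observed label agrees with sign(t_i) with
    probability 1 - eps and is flipped with probability eps."""
    agree = (y * np.sign(t)) > 0
    return np.where(agree, 1.0 - eps, eps)

X = np.array([[0.0, 0.0], [0.1, 0.0], [3.0, 3.0]])
K = rbf_kernel(X, sigma=1.0)          # K[0,1] near 1, K[0,2] near 0
lik = flip_likelihood(np.array([1, 1, -1]),
                      np.array([0.9, 0.8, -0.2]), eps=0.05)
```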
GP Semi-Supervised Learning
EP (Expectation Propagation): approximate the posterior as a Gaussian.
Select the hyperparameters { kernel width σ, labeling error rate ε } that maximize the evidence!
Advantage of using EP: we get the evidence as a side product.
EP estimates the leave-one-out predictive performance without performing any expensive cross-validation.
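The selection loop can be sketched as follows. Since full EP does not fit in a short snippet, I substitute the closed-form Gaussian-process log marginal likelihood for the EP evidence; the grid over σ and the synthetic data are assumptions for illustration.

```python
import numpy as np

def log_evidence(X, y, sigma, noise=0.1):
    """Stand-in for the EP evidence: the closed-form log marginal
    likelihood log N(y | 0, K + noise*I) with an RBF kernel of width sigma."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / (2.0 * sigma ** 2)) + noise * np.eye(len(X))
    _, logdet = np.linalg.slogdet(K)
    y = y.astype(float)
    return -0.5 * (y @ np.linalg.solve(K, y) + logdet
                   + len(y) * np.log(2 * np.pi))

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-1, 0.5, (10, 2)), rng.normal(1, 0.5, (10, 2))])
y = np.array([-1] * 10 + [1] * 10)

# Pick the kernel width that maximizes the (approximate) evidence.
grid = [0.1, 0.3, 1.0, 3.0, 10.0]
best_sigma = max(grid, key=lambda s: log_evidence(X, y, s))
```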
Support Vector Machine
OSU SVM toolbox, RBF kernel
Hyperparameter { C, σ } selection: use leave-one-out validation!
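The slides used the OSU SVM toolbox (MATLAB); below is an equivalent sketch with scikit-learn (my substitution), selecting { C, σ } by leave-one-out validation over a small grid. The synthetic two-class data is assumed only to make the example run.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, LeaveOneOut
from sklearn.svm import SVC

# Synthetic stand-in for the affect features (two separable classes).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 0.5, (15, 4)), rng.normal(1, 0.5, (15, 4))])
y = np.array([-1] * 15 + [1] * 15)

# RBF kernel with gamma = 1 / (2 sigma^2): searching gamma searches sigma.
grid = {"C": [0.1, 1, 10, 100], "gamma": [0.01, 0.1, 1]}
search = GridSearchCV(SVC(kernel="rbf"), grid, cv=LeaveOneOut())
search.fit(X, y)
best = search.best_params_  # {C, gamma} maximizing leave-one-out accuracy
```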
kNN (k = 1)
The label of a test point follows that of its nearest training point.
This algorithm is simple to implement, and its accuracy can serve as a baseline.
However, it sometimes gives a good result!
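A minimal 1-NN baseline as described; the slides do not name an implementation, so scikit-learn here is an assumption.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X_train = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [6.0, 5.0]])
y_train = np.array([-1, -1, 1, 1])

# k = 1: each test point takes the label of its single nearest train point.
knn = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)
pred = knn.predict(np.array([[0.2, 0.1], [5.5, 5.2]]))
```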
Split of the dataset & Experiment
GP Semi-supervised learning
- Randomly select labeled data (p % of the overall data), use the remaining data as unlabeled data, and predict the labels of the unlabeled data (in this setting, unlabeled data == test data)
- 50 tries for each p (p = 10, 20, 30, 40, 50)
- Each time select the hyperparameter that maximizes the evidence from EP
SVM and kNN
- Randomly select train data (p % of the overall data), use the remaining data as test data, and predict the labels of the test data
- 50 tries for each p (p = 10, 20, 30, 40, 50)
- For the SVM, leave-one-out validation for hyperparameter selection used only the train data
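The split protocol above can be sketched as below; the synthetic data and the 1-NN `run_once` placeholder are assumptions standing in for the affect features and for whichever of the three classifiers is under test.

```python
import numpy as np

def run_once(X_lab, y_lab, X_unl):
    """Placeholder classifier (1-NN), assumed only to make the sketch run;
    in the project this slot is filled by GP, SVM, or kNN."""
    d2 = ((X_unl[:, None, :] - X_lab[None, :, :]) ** 2).sum(-1)
    return y_lab[d2.argmin(axis=1)]

def accuracy_over_tries(X, y, p, n_tries=50, seed=0):
    """Randomly take p% of the data as labeled/train data, n_tries times,
    and report the mean accuracy on the remaining (unlabeled/test) data."""
    rng = np.random.default_rng(seed)
    n = len(y)
    n_lab = max(2, round(n * p / 100))
    accs = []
    for _ in range(n_tries):
        idx = rng.permutation(n)
        lab, unl = idx[:n_lab], idx[n_lab:]
        pred = run_once(X[lab], y[lab], X[unl])
        accs.append((pred == y[unl]).mean())
    return float(np.mean(accs))

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 0.3, (20, 2)), rng.normal(1, 0.3, (20, 2))])
y = np.array([-1] * 20 + [1] * 20)
mean_acc = accuracy_over_tries(X, y, p=30)
```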
GP – evidence & accuracy
[Figure: recognition accuracy on unlabeled points and log evidence vs. σ (hyperparameter); percentage of train points per class = 50 %, averaged over 10 tries. (Note) An offset was added to the log evidence to plot all curves in the same figure.]
Max of Rec Accuracy ≈ Max of Log Evidence → find the optimal hyperparameter by using the evidence from EP
SVM – hyperparameter selection
[Figure: evidence from leave-one-out validation over log(C) and log(1/σ).]
Select the hyperparameters { C, σ } that maximize the evidence from leave-one-out validation!
Classification Accuracy
[Figure: percentage of recognition on unlabeled (or test) points vs. percent of labeled (or train) points per class, for GP, kNN (k = 1), and SVM.]
As expected, kNN is bad at a small # of train pts and better at a large # of train pts.
SVM has good accuracy even when the # of train pts is small, why?
GP has bad accuracy when the # of train pts is small, why?
Analysis-SVM
Why does SVM give a good test accuracy even when the number of train points is small?
The best explanations I can offer:
1. {# support vectors} / {# train points} is high in this task, in particular when the percentage of train points is low. The support vectors determine the decision boundary, but a high SV ratio is not guaranteed to be related to test accuracy. In fact, it is known that the leave-one-out CV error is at most {# support vectors} / {# train points}.
2. The CV accuracy rate is high even when the # of train pts is small, and the CV accuracy rate is closely related to the test accuracy rate.
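The bound cited in point 1 (leave-one-out CV error ≤ {# support vectors} / {# train points}) can be checked numerically; the synthetic data and RBF settings here are assumptions for illustration.

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 0.8, (20, 3)), rng.normal(1, 0.8, (20, 3))])
y = np.array([-1] * 20 + [1] * 20)

# Ratio of support vectors after training on all points.
clf = SVC(kernel="rbf", C=1.0, gamma=0.5).fit(X, y)
sv_ratio = clf.n_support_.sum() / len(y)

# Actual leave-one-out error: it never exceeds the SV ratio, because
# removing a non-support vector leaves the decision boundary unchanged.
loo_acc = cross_val_score(SVC(kernel="rbf", C=1.0, gamma=0.5),
                          X, y, cv=LeaveOneOut()).mean()
loo_err = 1.0 - loo_acc
```

The bound holds because a point that is not a support vector lies strictly outside the margin; deleting it leaves the trained SVM unchanged, so leave-one-out errors can only occur at support vectors.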
[Figure, left: # of SVs and # of train points vs. percent of train points per class.]
[Figure, right: CV accuracy rate, test accuracy rate, and # SVs / # train pts vs. percent of train points per class.]
Analysis-GP
Why does GP give a bad test accuracy when the number of train points is small?
[Figure, left: recognition accuracy (unlabeled) and log evidence vs. σ (hyperparameter); percentage of train points per class = 50 %. Max of Rec Accuracy ≈ Max of Log Evidence.]
[Figure, right: the same curves at percentage of train points per class = 10 %. The log-evidence curve is flat → it fails to find the optimal σ!]
Conclusion
GP: small number of train points → bad accuracy; large number of train points → good accuracy
SVM: good accuracy regardless of the number of train points
kNN (k = 1): small number of train points → bad accuracy; large number of train points → good accuracy