Classification of Affective States - GP Semi-Supervised Learning, SVM and kNN MAS 622J Course Project Hyungil Ahn ([email protected])


Page 1: Classification of Affective States - GP Semi-Supervised Learning, SVM and kNN

MAS 622J Course Project

Hyungil Ahn ([email protected])

Page 2: Objective & Dataset

• Recognize the affective states of a child solving a puzzle

• Affective Dataset

- 1024 features from Face, Posture, Game
- 3 affective states, labels annotated by teachers:
  High interest (61), Low interest (59), Refreshing (16)

Page 3: Task & Approaches

Binary Classification

High interest (61 samples) vs. Low interest or Refreshing (75 samples)

Approaches

- Semi-Supervised Learning: Gaussian Process (GP)

- Support Vector Machine

- k-Nearest Neighbor (k = 1)

Page 4: GP Semi-Supervised Learning

Given labeled data (X_L, y_L), predict the labels of the unlabeled points X_U.

Assume the following data-generation process:
X : inputs, y : vector of labels, t : vector of hidden soft labels

Each label (binary classification): y_i = sign(t_i). Final classifier: y = sign[t].

Infer p(t | X, y_L).

Define a similarity function: k(x_i, x_j) = exp(−‖x_i − x_j‖² / 2σ²), with kernel width σ.
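The similarity function itself was stripped from the transcript; assuming the usual RBF form with kernel width σ (the hyperparameter named on Page 7), a minimal sketch:

```python
import numpy as np

def rbf_similarity(X, sigma=1.0):
    """Pairwise similarity K[i, j] = exp(-||x_i - x_j||^2 / (2 * sigma^2))."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma ** 2))

# Three toy points: the first two coincide, the third is 5 units away.
X = np.array([[0.0, 0.0], [0.0, 0.0], [3.0, 4.0]])
K = rbf_similarity(X, sigma=1.0)
```

Identical points get similarity 1, distant points a similarity close to 0; σ controls how quickly similarity decays with distance.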

Page 5: Bayesian Model

p(t | X) : prior of the classifier

p(y_L | t) : likelihood of the classifier given the labeled data

Infer the posterior: p(t | X, y_L) ∝ p(t | X) · p(y_L | t)

Page 6: GP Semi-Supervised Learning

How to model the prior & the likelihood?

The prior: using a GP, p(t | X) = N(t; 0, K), with covariance K built from the similarity function.
(Soft labels vary smoothly across the data manifold!)

The likelihood: a labeling-noise model with error rate ε, p(y_i | t_i) = ε + (1 − 2ε) · 1[y_i t_i > 0].
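A minimal numeric sketch of both ingredients. The exact formulas were lost in the transcript; the flipping-noise likelihood with error rate ε and the Laplacian-style smoothness prior below are assumptions consistent with the hyperparameters named on the next slide, and the function names are illustrative:

```python
import numpy as np

def flip_noise_likelihood(y, t, eps=0.1):
    """Assumed flipping-noise model: p(y_i | t_i) = eps + (1 - 2*eps) * 1[y_i * t_i > 0]."""
    agree = (np.asarray(y) * np.asarray(t) > 0).astype(float)
    return eps + (1.0 - 2.0 * eps) * agree

def smoothness_prior_cov(K, reg=1e-2):
    """Illustrative smoothness prior over soft labels t: covariance taken as the
    inverse of (graph Laplacian + reg * I), so similar points get correlated t's."""
    L = np.diag(K.sum(axis=1)) - K
    return np.linalg.inv(L + reg * np.eye(len(K)))

# Toy similarity matrix: points 0 and 1 are similar, point 2 is dissimilar.
K = np.array([[1.0, 0.9, 0.1],
              [0.9, 1.0, 0.1],
              [0.1, 0.1, 1.0]])
C = smoothness_prior_cov(K)
```

Under this likelihood, a label agreeing with the sign of its soft label has probability 1 − ε, and a disagreeing (mislabeled) one has probability ε.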

Page 7: GP Semi-Supervised Learning

EP (Expectation Propagation): approximate the posterior as a Gaussian.

Select the hyperparameters { kernel width σ, labeling error rate ε } that maximize the evidence!

Advantage of using EP: we get the evidence as a side product.

EP estimates the leave-one-out predictive performance without performing any expensive cross-validation.
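The EP machinery itself is involved; as a rough stand-in for evidence-based hyperparameter selection, scikit-learn's GaussianProcessClassifier exposes an approximate log marginal likelihood (computed with a Laplace approximation rather than EP, and for plain supervised GP classification rather than the semi-supervised variant), which can be scanned over kernel widths on toy data:

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF

# Toy stand-in data; the slides' affective dataset is not public.
X, y = make_moons(n_samples=60, noise=0.2, random_state=0)

best_sigma, best_evidence = None, -np.inf
for sigma in (0.1, 0.3, 1.0, 3.0):
    # optimizer=None keeps the kernel width fixed so the evidences are comparable.
    gpc = GaussianProcessClassifier(kernel=RBF(length_scale=sigma), optimizer=None)
    gpc.fit(X, y)
    evidence = gpc.log_marginal_likelihood()  # approximate log evidence
    if evidence > best_evidence:
        best_sigma, best_evidence = sigma, evidence
```

The selection rule is the same as on the slide: keep the hyperparameter whose model gives the training labels the highest (approximate) evidence.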

Page 8: Support Vector Machine

OSU SVM toolbox, RBF kernel.

Hyperparameter {C, σ} selection: use leave-one-out validation!
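The slides used the OSU SVM toolbox (MATLAB); an equivalent leave-one-out grid search in scikit-learn, on stand-in data, might look like:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, LeaveOneOut
from sklearn.svm import SVC

# Toy stand-in for the affective dataset.
X, y = make_classification(n_samples=40, n_features=8, random_state=0)

# For sklearn's RBF kernel, gamma corresponds to 1 / (2 * sigma^2).
param_grid = {"C": [0.1, 1.0, 10.0], "gamma": [0.01, 0.1, 1.0]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=LeaveOneOut())
search.fit(X, y)
```

Each {C, gamma} pair is scored by the fraction of held-out points classified correctly across all n leave-one-out folds, and the best pair is refit on the full training set.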

Page 9: kNN (k = 1)

The label of a test point follows that of its nearest training point.

This algorithm is simple to implement, and its accuracy can be used as a baseline.

Nevertheless, it sometimes gives a good result!
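A self-contained 1-NN sketch (toy data; `knn1_predict` is an illustrative helper, not from the slides):

```python
import numpy as np

def knn1_predict(X_train, y_train, X_test):
    """1-NN: each test point takes the label of its nearest training point."""
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    return y_train[np.argmin(d2, axis=1)]

# Two training points with different labels; test points land near one each.
X_train = np.array([[0.0], [10.0]])
y_train = np.array([0, 1])
pred = knn1_predict(X_train, y_train, np.array([[1.0], [9.0]]))
```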

Page 10: Split of the dataset & Experiment

GP Semi-supervised learning

- randomly select labeled data (p % of overall data), use the remaining data as unlabeled data, predict the labels of unlabeled data (In this setting, unlabeled data == test data)

- 50 tries for each p (p = 10, 20, 30, 40, 50)

- Each time select the hyperparameter that maximizes the evidence from EP

SVM and kNN

- randomly select train data (p % of overall data), use the remaining data as test data, predict the labels of test data

- 50 tries for each p (p = 10, 20, 30, 40, 50)

- In the SVM, leave-one-out validation for hyperparameter selection used only the train data
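The split-and-repeat protocol above can be sketched as follows, using synthetic stand-in data of roughly the dataset's size and 1-NN as the classifier (the GP and SVM variants differ only in the model fitted per split):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in with roughly the slides' dataset size (136 samples).
X, y = make_classification(n_samples=136, n_features=20, random_state=0)

results = {}
for p in (10, 20, 30, 40, 50):
    accs = []
    for trial in range(50):  # 50 random splits per p, as in the slides
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, train_size=p / 100.0, stratify=y, random_state=trial)
        clf = KNeighborsClassifier(n_neighbors=1).fit(X_tr, y_tr)
        accs.append(clf.score(X_te, y_te))
    results[p] = float(np.mean(accs))  # mean test accuracy at this p
```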

Page 11: GP – evidence & accuracy

[Figure: recognition accuracy (unlabeled) and log evidence vs. sigma (hyperparameter).]

The case of percentage of train points per class = 50% (average over 10 tries). (Note: an offset was added to the log evidence to plot all curves in the same figure.)

Max of recognition accuracy ≈ max of log evidence → find the optimal hyperparameter by using the evidence from EP.

Page 12: SVM – hyperparameter selection

[Figure: evidence from leave-one-out validation as a function of log(C) and log(1/σ).]

Select the hyperparameter {C, σ} that maximizes the evidence from leave-one-out validation!

Page 13: Classification Accuracy

[Figure: percentage of recognition on unlabeled (or test) points vs. percent of labeled (or train) points per class, for GP, kNN (k = 1), and SVM.]

As expected, kNN is bad at a small # of train pts and better at a large # of train pts.

SVM has good accuracy even when the # of train pts is small, why? GP has bad accuracy when the # of train pts is small, why?

Page 14: Analysis-SVM

The best explanations I can offer:

1. {# support vectors} / {# train points} is high in this task, in particular when the percentage of train points is low. The support vectors determine the decision boundary, but it is not guaranteed that the SV ratio is strongly related to the test accuracy. Actually, it is known that the leave-one-out CV error is bounded above by {# support vectors} / {# train points}.

2. The CV accuracy rate is high even when the # of train pts is small, and the CV accuracy rate is strongly related to the test accuracy rate.

Why does SVM give a good test accuracy even when the number of train points is small?
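Point 1 can be checked on synthetic data: with scikit-learn's SVC, the SV ratio `clf.support_.size / n_train` tends to be larger for small training sets (the data and sizes here are illustrative, not the slides' dataset):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Toy stand-in data.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

ratios = {}
for n_train in (10, 80):
    X_tr, _, y_tr, _ = train_test_split(
        X, y, train_size=n_train, stratify=y, random_state=0)
    clf = SVC(kernel="rbf", C=1.0).fit(X_tr, y_tr)
    # Fraction of training points that became support vectors.
    ratios[n_train] = clf.support_.size / n_train
```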

[Figure: number of support vectors and number of train points vs. percent of train points per class.]

[Figure: CV accuracy rate, test accuracy rate, and # SVs / # train pts vs. percent of train points per class.]

Page 15: Analysis-GP

[Figure: recognition accuracy (unlabeled) and log evidence vs. sigma (hyperparameter), percentage of train points per class = 50%.]

Why does GP give a bad test accuracy when the number of train points is small?

[Figure: recognition accuracy (unlabeled) and log evidence vs. sigma (hyperparameter), percentage of train points per class = 10%.]

Percentage of train points per class = 50%: max of recognition accuracy ≈ max of log evidence.

Percentage of train points per class = 10%: the log evidence curve is flat → fail to find the optimal sigma!

Page 16: Conclusion

GP: small number of train points → bad accuracy; large number of train points → good accuracy.

SVM: good accuracy regardless of the number of train points.

kNN (k = 1): small number of train points → bad accuracy; large number of train points → good accuracy.