sparse wavelet representations and classiﬁcationsparse wavelet representations and classiﬁcation...

Sparse Wavelet Representations andClassification

Challenge Kaggle in class

DREEM

March 5th 2015

MVA | Sparse Wavelet Representations and Classification | A. RECANATI, S. BUSSY 1/10

Introduction

Introduction

I Context: challenge kaggle in class proposed by thestart-up Dreem. Binary classification problem.

I Data: EEG signals, 3 sec sampled at 200 Hz on 20subjects.

I Labels: 1 if a Slow Oscillation occurs during the next0.5 sec, 0 else.

I Skewed data set: only 7% of positive exs.I AUC metric.


Introduction

Introduction




I Skewed data set: only 7% of positive exs.

I AUC metric.


Introduction

Introduction




I Skewed data set: only 7% of positive exs.I AUC metric.


Introduction

The data

The AUC metric

Classification algorithms we triedScatNet + AUC gradient descent (Herschtal, Raskutti,ICML 2004ScatNet + Mixture of Probabilistic PrincipalComponent Analysis (MPPCA) (Tipping, Bishop,1999)Neural NetworksIdea (cheat ?) use subjects IDs

Results and Conclusion


The data

The data

I Times series

I Proportion of each subject in the training setI Positive rate for each subject in the training set


The data

The dataI Times seriesI Proportion of each subject in the training set

I Positive rate for each subject in the training set


The data

The dataI Times seriesI Proportion of each subject in the training setI Positive rate for each subject in the training set


The AUC metric

The AUC metric

I Area under ROC curve

I Estimator (Wilcoxon-Man-Whitney) :P(f (x+) > f (x−))

I Adapted to skewed data

false positive rate0 0.2 0.4 0.6 0.8 1

tru

e p

os

itiv

e r

ate

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1ROC curve - subj. 3 - (test on 20% of train)

AUC GD - 0.8854MPPCA - 0.8480NN - 0.8318


The AUC metric

The AUC metric

I Area under ROC curveI Estimator (Wilcoxon-

Man-Whitney) :P(f (x+) > f (x−))

I Adapted to skewed data

false positive rate0 0.2 0.4 0.6 0.8 1

tru

e p

os

itiv

e r

ate

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1ROC curve - subj. 3 - (test on 20% of train)

AUC GD - 0.8854MPPCA - 0.8480NN - 0.8318


Classification algorithms we tried

AUC gradient descentI linear classifier :

ˆAUC(~ )β =1

PQ

P∑j=1

Q∑k=1

g(~β.(x+j − x−

k )) (1)

I softening (for differentiability)

R(~β) =1

PQ

P∑j=1

Q∑k=1

sigmoid(~β.(x+j − x−

k )) (2)

I computationally feasible estimator :

Rl(~β) =1Q

Q∑k=1

sigmoid(~β.(x+kmodP − x−

k )) (3)

I progressively increase norm of β =⇒ close to realAUC + no local maximum



MPPCA

I Generative model

t |x ∼ N (Wx + µ;σ2Id) , x ∼ N (0; Iq)

I Idea : train a model for each classI Probabilistic model⇒ independent of ratio +/−I Compare probabilities of belonging to each class



MPPCA

I Generative model

t |x ∼ N (Wx + µ;σ2Id) , x ∼ N (0; Iq)

I Idea : train a model for each class

I Probabilistic model⇒ independent of ratio +/−I Compare probabilities of belonging to each class



MPPCA

I Generative model

t |x ∼ N (Wx + µ;σ2Id) , x ∼ N (0; Iq)

I Idea : train a model for each classI Probabilistic model⇒ independent of ratio +/−

I Compare probabilities of belonging to each class



MPPCA

I Generative model

t |x ∼ N (Wx + µ;σ2Id) , x ∼ N (0; Iq)

I Idea : train a model for each classI Probabilistic model⇒ independent of ratio +/−I Compare probabilities of belonging to each class



Neural Networks

I Apply on raw data (and not on top of ScatNet) not tooverfit.

I Cross-validations in the hyper-parameters space.I Used as a black box.I Best architecture : 4 hidden layers with 14,13,10,9

neurons, and 400 iterations.



Neural Networks


I Cross-validations in the hyper-parameters space.

I Used as a black box.I Best architecture : 4 hidden layers with 14,13,10,9




Neural Networks


I Cross-validations in the hyper-parameters space.I Used as a black box.

I Best architecture : 4 hidden layers with 14,13,10,9neurons, and 400 iterations.



Neural Networks


I Cross-validations in the hyper-parameters space.I Used as a black box.I Best architecture : 4 hidden layers with 14,13,10,9




Use subjects IDs

I Idea 1 : different behaviors among subjects⇒ train 1classifier/subject

I ... lead to overfittingI and not useful for “ready trained” device



Conclusion

MPPCA NN AUC GD0.81578 0.84976 0.86933

I Best score : linear classifier on scattering transformsI NN : blind but goodI All algorithms : less than 15 minutes on a laptop for

trainingI Possible improvements : ScatNet + AUCSVM

(Brefeld, Scheffer) ? ScatNet+SVM-RBFcross-validated ?

I Reinforcement learning to adapt the algorithm to thesubject using the device.



Conclusion

MPPCA NN AUC GD0.81578 0.84976 0.86933

I Best score : linear classifier on scattering transforms

I NN : blind but goodI All algorithms : less than 15 minutes on a laptop for






Conclusion

MPPCA NN AUC GD0.81578 0.84976 0.86933

I Best score : linear classifier on scattering transformsI NN : blind but good

I All algorithms : less than 15 minutes on a laptop fortraining

I Possible improvements : ScatNet + AUCSVM(Brefeld, Scheffer) ? ScatNet+SVM-RBFcross-validated ?




Conclusion

MPPCA NN AUC GD0.81578 0.84976 0.86933


training

I Possible improvements : ScatNet + AUCSVM(Brefeld, Scheffer) ? ScatNet+SVM-RBFcross-validated ?




Conclusion

MPPCA NN AUC GD0.81578 0.84976 0.86933






sparse wavelet representations and classiﬁcationsparse wavelet representations and classiﬁcation...

Documents