sparse wavelet representations and classificationsparse wavelet representations and classification...

33
Sparse Wavelet Representations and Classification Challenge Kaggle in class D REEM March 5th 2015 MVA | Sparse Wavelet Representations and Classification | A. RECANATI, S. BUSSY 1/10

Upload: others

Post on 12-Mar-2021

7 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Sparse Wavelet Representations and ClassificationSparse Wavelet Representations and Classification Challenge Kaggle in class DREEM March 5th 2015 MVA j Sparse Wavelet Representations

Sparse Wavelet Representations andClassification

Challenge Kaggle in class

DREEM

March 5th 2015

MVA | Sparse Wavelet Representations and Classification | A. RECANATI, S. BUSSY 1/10

Page 2: Sparse Wavelet Representations and ClassificationSparse Wavelet Representations and Classification Challenge Kaggle in class DREEM March 5th 2015 MVA j Sparse Wavelet Representations

Introduction

Introduction

I Context: challenge kaggle in class proposed by thestart-up Dreem. Binary classification problem.

I Data: EEG signals, 3 sec sampled at 200 Hz on 20subjects.

I Labels: 1 if a Slow Oscillation occurs during the next0.5 sec, 0 else.

I Skewed data set: only 7% of positive exs.I AUC metric.

MVA | Sparse Wavelet Representations and Classification | A. RECANATI, S. BUSSY 2/10

Page 3: Sparse Wavelet Representations and ClassificationSparse Wavelet Representations and Classification Challenge Kaggle in class DREEM March 5th 2015 MVA j Sparse Wavelet Representations

Introduction

Introduction

I Context: challenge kaggle in class proposed by thestart-up Dreem. Binary classification problem.

I Data: EEG signals, 3 sec sampled at 200 Hz on 20subjects.

I Labels: 1 if a Slow Oscillation occurs during the next0.5 sec, 0 else.

I Skewed data set: only 7% of positive exs.I AUC metric.

MVA | Sparse Wavelet Representations and Classification | A. RECANATI, S. BUSSY 2/10

Page 4: Sparse Wavelet Representations and ClassificationSparse Wavelet Representations and Classification Challenge Kaggle in class DREEM March 5th 2015 MVA j Sparse Wavelet Representations

Introduction

Introduction

I Context: challenge kaggle in class proposed by thestart-up Dreem. Binary classification problem.

I Data: EEG signals, 3 sec sampled at 200 Hz on 20subjects.

I Labels: 1 if a Slow Oscillation occurs during the next0.5 sec, 0 else.

I Skewed data set: only 7% of positive exs.I AUC metric.

MVA | Sparse Wavelet Representations and Classification | A. RECANATI, S. BUSSY 2/10

Page 5: Sparse Wavelet Representations and ClassificationSparse Wavelet Representations and Classification Challenge Kaggle in class DREEM March 5th 2015 MVA j Sparse Wavelet Representations

Introduction

Introduction

I Context: challenge kaggle in class proposed by thestart-up Dreem. Binary classification problem.

I Data: EEG signals, 3 sec sampled at 200 Hz on 20subjects.

I Labels: 1 if a Slow Oscillation occurs during the next0.5 sec, 0 else.

I Skewed data set: only 7% of positive exs.

I AUC metric.

MVA | Sparse Wavelet Representations and Classification | A. RECANATI, S. BUSSY 2/10

Page 6: Sparse Wavelet Representations and ClassificationSparse Wavelet Representations and Classification Challenge Kaggle in class DREEM March 5th 2015 MVA j Sparse Wavelet Representations

Introduction

Introduction

I Context: challenge kaggle in class proposed by thestart-up Dreem. Binary classification problem.

I Data: EEG signals, 3 sec sampled at 200 Hz on 20subjects.

I Labels: 1 if a Slow Oscillation occurs during the next0.5 sec, 0 else.

I Skewed data set: only 7% of positive exs.I AUC metric.

MVA | Sparse Wavelet Representations and Classification | A. RECANATI, S. BUSSY 2/10

Page 7: Sparse Wavelet Representations and ClassificationSparse Wavelet Representations and Classification Challenge Kaggle in class DREEM March 5th 2015 MVA j Sparse Wavelet Representations

Introduction

The data

The AUC metric

Classification algorithms we triedScatNet + AUC gradient descent (Herschtal, Raskutti,ICML 2004ScatNet + Mixture of Probabilistic PrincipalComponent Analysis (MPPCA) (Tipping, Bishop,1999)Neural NetworksIdea (cheat ?) use subjects IDs

Results and Conclusion

MVA | Sparse Wavelet Representations and Classification | A. RECANATI, S. BUSSY 3/10

Page 8: Sparse Wavelet Representations and ClassificationSparse Wavelet Representations and Classification Challenge Kaggle in class DREEM March 5th 2015 MVA j Sparse Wavelet Representations

The data

The data

I Times series

I Proportion of each subject in the training setI Positive rate for each subject in the training set

MVA | Sparse Wavelet Representations and Classification | A. RECANATI, S. BUSSY 4/10

Page 9: Sparse Wavelet Representations and ClassificationSparse Wavelet Representations and Classification Challenge Kaggle in class DREEM March 5th 2015 MVA j Sparse Wavelet Representations

The data

The dataI Times seriesI Proportion of each subject in the training set

I Positive rate for each subject in the training set

MVA | Sparse Wavelet Representations and Classification | A. RECANATI, S. BUSSY 4/10

Page 10: Sparse Wavelet Representations and ClassificationSparse Wavelet Representations and Classification Challenge Kaggle in class DREEM March 5th 2015 MVA j Sparse Wavelet Representations

The data

The dataI Times seriesI Proportion of each subject in the training setI Positive rate for each subject in the training set

MVA | Sparse Wavelet Representations and Classification | A. RECANATI, S. BUSSY 4/10

Page 11: Sparse Wavelet Representations and ClassificationSparse Wavelet Representations and Classification Challenge Kaggle in class DREEM March 5th 2015 MVA j Sparse Wavelet Representations

The AUC metric

The AUC metric

I Area under ROC curve

I Estimator (Wilcoxon-Man-Whitney) :P(f (x+) > f (x−))

I Adapted to skewed data

false positive rate0 0.2 0.4 0.6 0.8 1

tru

e p

os

itiv

e r

ate

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1ROC curve - subj. 3 - (test on 20% of train)

AUC GD - 0.8854MPPCA - 0.8480NN - 0.8318

MVA | Sparse Wavelet Representations and Classification | A. RECANATI, S. BUSSY 5/10

Page 12: Sparse Wavelet Representations and ClassificationSparse Wavelet Representations and Classification Challenge Kaggle in class DREEM March 5th 2015 MVA j Sparse Wavelet Representations

The AUC metric

The AUC metric

I Area under ROC curveI Estimator (Wilcoxon-

Man-Whitney) :P(f (x+) > f (x−))

I Adapted to skewed data

false positive rate0 0.2 0.4 0.6 0.8 1

tru

e p

os

itiv

e r

ate

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1ROC curve - subj. 3 - (test on 20% of train)

AUC GD - 0.8854MPPCA - 0.8480NN - 0.8318

MVA | Sparse Wavelet Representations and Classification | A. RECANATI, S. BUSSY 5/10

Page 13: Sparse Wavelet Representations and ClassificationSparse Wavelet Representations and Classification Challenge Kaggle in class DREEM March 5th 2015 MVA j Sparse Wavelet Representations

The AUC metric

The AUC metric

I Area under ROC curveI Estimator (Wilcoxon-

Man-Whitney) :P(f (x+) > f (x−))

I Adapted to skewed data

false positive rate0 0.2 0.4 0.6 0.8 1

tru

e p

os

itiv

e r

ate

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1ROC curve - subj. 3 - (test on 20% of train)

AUC GD - 0.8854MPPCA - 0.8480NN - 0.8318

MVA | Sparse Wavelet Representations and Classification | A. RECANATI, S. BUSSY 5/10

Page 14: Sparse Wavelet Representations and ClassificationSparse Wavelet Representations and Classification Challenge Kaggle in class DREEM March 5th 2015 MVA j Sparse Wavelet Representations

Classification algorithms we tried

AUC gradient descentI linear classifier :

ˆAUC(~ )β =1

PQ

P∑j=1

Q∑k=1

g(~β.(x+j − x−

k )) (1)

I softening (for differentiability)

R(~β) =1

PQ

P∑j=1

Q∑k=1

sigmoid(~β.(x+j − x−

k )) (2)

I computationally feasible estimator :

Rl(~β) =1Q

Q∑k=1

sigmoid(~β.(x+kmodP − x−

k )) (3)

I progressively increase norm of β =⇒ close to realAUC + no local maximum

MVA | Sparse Wavelet Representations and Classification | A. RECANATI, S. BUSSY 6/10

Page 15: Sparse Wavelet Representations and ClassificationSparse Wavelet Representations and Classification Challenge Kaggle in class DREEM March 5th 2015 MVA j Sparse Wavelet Representations

Classification algorithms we tried

AUC gradient descentI linear classifier :

ˆAUC(~ )β =1

PQ

P∑j=1

Q∑k=1

g(~β.(x+j − x−

k )) (1)

I softening (for differentiability)

R(~β) =1

PQ

P∑j=1

Q∑k=1

sigmoid(~β.(x+j − x−

k )) (2)

I computationally feasible estimator :

Rl(~β) =1Q

Q∑k=1

sigmoid(~β.(x+kmodP − x−

k )) (3)

I progressively increase norm of β =⇒ close to realAUC + no local maximum

MVA | Sparse Wavelet Representations and Classification | A. RECANATI, S. BUSSY 6/10

Page 16: Sparse Wavelet Representations and ClassificationSparse Wavelet Representations and Classification Challenge Kaggle in class DREEM March 5th 2015 MVA j Sparse Wavelet Representations

Classification algorithms we tried

AUC gradient descentI linear classifier :

ˆAUC(~ )β =1

PQ

P∑j=1

Q∑k=1

g(~β.(x+j − x−

k )) (1)

I softening (for differentiability)

R(~β) =1

PQ

P∑j=1

Q∑k=1

sigmoid(~β.(x+j − x−

k )) (2)

I computationally feasible estimator :

Rl(~β) =1Q

Q∑k=1

sigmoid(~β.(x+kmodP − x−

k )) (3)

I progressively increase norm of β =⇒ close to realAUC + no local maximum

MVA | Sparse Wavelet Representations and Classification | A. RECANATI, S. BUSSY 6/10

Page 17: Sparse Wavelet Representations and ClassificationSparse Wavelet Representations and Classification Challenge Kaggle in class DREEM March 5th 2015 MVA j Sparse Wavelet Representations

Classification algorithms we tried

AUC gradient descentI linear classifier :

ˆAUC(~ )β =1

PQ

P∑j=1

Q∑k=1

g(~β.(x+j − x−

k )) (1)

I softening (for differentiability)

R(~β) =1

PQ

P∑j=1

Q∑k=1

sigmoid(~β.(x+j − x−

k )) (2)

I computationally feasible estimator :

Rl(~β) =1Q

Q∑k=1

sigmoid(~β.(x+kmodP − x−

k )) (3)

I progressively increase norm of β =⇒ close to realAUC + no local maximum

MVA | Sparse Wavelet Representations and Classification | A. RECANATI, S. BUSSY 6/10

Page 18: Sparse Wavelet Representations and ClassificationSparse Wavelet Representations and Classification Challenge Kaggle in class DREEM March 5th 2015 MVA j Sparse Wavelet Representations

Classification algorithms we tried

MPPCA

I Generative model

t |x ∼ N (Wx + µ;σ2Id) , x ∼ N (0; Iq)

I Idea : train a model for each classI Probabilistic model⇒ independent of ratio +/−I Compare probabilities of belonging to each class

MVA | Sparse Wavelet Representations and Classification | A. RECANATI, S. BUSSY 7/10

Page 19: Sparse Wavelet Representations and ClassificationSparse Wavelet Representations and Classification Challenge Kaggle in class DREEM March 5th 2015 MVA j Sparse Wavelet Representations

Classification algorithms we tried

MPPCA

I Generative model

t |x ∼ N (Wx + µ;σ2Id) , x ∼ N (0; Iq)

I Idea : train a model for each class

I Probabilistic model⇒ independent of ratio +/−I Compare probabilities of belonging to each class

MVA | Sparse Wavelet Representations and Classification | A. RECANATI, S. BUSSY 7/10

Page 20: Sparse Wavelet Representations and ClassificationSparse Wavelet Representations and Classification Challenge Kaggle in class DREEM March 5th 2015 MVA j Sparse Wavelet Representations

Classification algorithms we tried

MPPCA

I Generative model

t |x ∼ N (Wx + µ;σ2Id) , x ∼ N (0; Iq)

I Idea : train a model for each classI Probabilistic model⇒ independent of ratio +/−

I Compare probabilities of belonging to each class

MVA | Sparse Wavelet Representations and Classification | A. RECANATI, S. BUSSY 7/10

Page 21: Sparse Wavelet Representations and ClassificationSparse Wavelet Representations and Classification Challenge Kaggle in class DREEM March 5th 2015 MVA j Sparse Wavelet Representations

Classification algorithms we tried

MPPCA

I Generative model

t |x ∼ N (Wx + µ;σ2Id) , x ∼ N (0; Iq)

I Idea : train a model for each classI Probabilistic model⇒ independent of ratio +/−I Compare probabilities of belonging to each class

MVA | Sparse Wavelet Representations and Classification | A. RECANATI, S. BUSSY 7/10

Page 22: Sparse Wavelet Representations and ClassificationSparse Wavelet Representations and Classification Challenge Kaggle in class DREEM March 5th 2015 MVA j Sparse Wavelet Representations

Classification algorithms we tried

Neural Networks

I Apply on raw data (and not on top of ScatNet) not tooverfit.

I Cross-validations in the hyper-parameters space.I Used as a black box.I Best architecture : 4 hidden layers with 14,13,10,9

neurons, and 400 iterations.

MVA | Sparse Wavelet Representations and Classification | A. RECANATI, S. BUSSY 8/10

Page 23: Sparse Wavelet Representations and ClassificationSparse Wavelet Representations and Classification Challenge Kaggle in class DREEM March 5th 2015 MVA j Sparse Wavelet Representations

Classification algorithms we tried

Neural Networks

I Apply on raw data (and not on top of ScatNet) not tooverfit.

I Cross-validations in the hyper-parameters space.

I Used as a black box.I Best architecture : 4 hidden layers with 14,13,10,9

neurons, and 400 iterations.

MVA | Sparse Wavelet Representations and Classification | A. RECANATI, S. BUSSY 8/10

Page 24: Sparse Wavelet Representations and ClassificationSparse Wavelet Representations and Classification Challenge Kaggle in class DREEM March 5th 2015 MVA j Sparse Wavelet Representations

Classification algorithms we tried

Neural Networks

I Apply on raw data (and not on top of ScatNet) not tooverfit.

I Cross-validations in the hyper-parameters space.I Used as a black box.

I Best architecture : 4 hidden layers with 14,13,10,9neurons, and 400 iterations.

MVA | Sparse Wavelet Representations and Classification | A. RECANATI, S. BUSSY 8/10

Page 25: Sparse Wavelet Representations and ClassificationSparse Wavelet Representations and Classification Challenge Kaggle in class DREEM March 5th 2015 MVA j Sparse Wavelet Representations

Classification algorithms we tried

Neural Networks

I Apply on raw data (and not on top of ScatNet) not tooverfit.

I Cross-validations in the hyper-parameters space.I Used as a black box.I Best architecture : 4 hidden layers with 14,13,10,9

neurons, and 400 iterations.

MVA | Sparse Wavelet Representations and Classification | A. RECANATI, S. BUSSY 8/10

Page 26: Sparse Wavelet Representations and ClassificationSparse Wavelet Representations and Classification Challenge Kaggle in class DREEM March 5th 2015 MVA j Sparse Wavelet Representations

Classification algorithms we tried

Use subjects IDs

I Idea 1 : different behaviors among subjects⇒ train 1classifier/subject

I ... lead to overfittingI and not useful for “ready trained” device

MVA | Sparse Wavelet Representations and Classification | A. RECANATI, S. BUSSY 9/10

Page 27: Sparse Wavelet Representations and ClassificationSparse Wavelet Representations and Classification Challenge Kaggle in class DREEM March 5th 2015 MVA j Sparse Wavelet Representations

Classification algorithms we tried

Use subjects IDs

I Idea 1 : different behaviors among subjects⇒ train 1classifier/subject

I ... lead to overfittingI and not useful for “ready trained” device

MVA | Sparse Wavelet Representations and Classification | A. RECANATI, S. BUSSY 9/10

Page 28: Sparse Wavelet Representations and ClassificationSparse Wavelet Representations and Classification Challenge Kaggle in class DREEM March 5th 2015 MVA j Sparse Wavelet Representations

Results and Conclusion

Conclusion

MPPCA NN AUC GD0.81578 0.84976 0.86933

I Best score : linear classifier on scattering transformsI NN : blind but goodI All algorithms : less than 15 minutes on a laptop for

trainingI Possible improvements : ScatNet + AUCSVM

(Brefeld, Scheffer) ? ScatNet+SVM-RBFcross-validated ?

I Reinforcement learning to adapt the algorithm to thesubject using the device.

MVA | Sparse Wavelet Representations and Classification | A. RECANATI, S. BUSSY 10/10

Page 29: Sparse Wavelet Representations and ClassificationSparse Wavelet Representations and Classification Challenge Kaggle in class DREEM March 5th 2015 MVA j Sparse Wavelet Representations

Results and Conclusion

Conclusion

MPPCA NN AUC GD0.81578 0.84976 0.86933

I Best score : linear classifier on scattering transforms

I NN : blind but goodI All algorithms : less than 15 minutes on a laptop for

trainingI Possible improvements : ScatNet + AUCSVM

(Brefeld, Scheffer) ? ScatNet+SVM-RBFcross-validated ?

I Reinforcement learning to adapt the algorithm to thesubject using the device.

MVA | Sparse Wavelet Representations and Classification | A. RECANATI, S. BUSSY 10/10

Page 30: Sparse Wavelet Representations and ClassificationSparse Wavelet Representations and Classification Challenge Kaggle in class DREEM March 5th 2015 MVA j Sparse Wavelet Representations

Results and Conclusion

Conclusion

MPPCA NN AUC GD0.81578 0.84976 0.86933

I Best score : linear classifier on scattering transformsI NN : blind but good

I All algorithms : less than 15 minutes on a laptop fortraining

I Possible improvements : ScatNet + AUCSVM(Brefeld, Scheffer) ? ScatNet+SVM-RBFcross-validated ?

I Reinforcement learning to adapt the algorithm to thesubject using the device.

MVA | Sparse Wavelet Representations and Classification | A. RECANATI, S. BUSSY 10/10

Page 31: Sparse Wavelet Representations and ClassificationSparse Wavelet Representations and Classification Challenge Kaggle in class DREEM March 5th 2015 MVA j Sparse Wavelet Representations

Results and Conclusion

Conclusion

MPPCA NN AUC GD0.81578 0.84976 0.86933

I Best score : linear classifier on scattering transformsI NN : blind but goodI All algorithms : less than 15 minutes on a laptop for

training

I Possible improvements : ScatNet + AUCSVM(Brefeld, Scheffer) ? ScatNet+SVM-RBFcross-validated ?

I Reinforcement learning to adapt the algorithm to thesubject using the device.

MVA | Sparse Wavelet Representations and Classification | A. RECANATI, S. BUSSY 10/10

Page 32: Sparse Wavelet Representations and ClassificationSparse Wavelet Representations and Classification Challenge Kaggle in class DREEM March 5th 2015 MVA j Sparse Wavelet Representations

Results and Conclusion

Conclusion

MPPCA NN AUC GD0.81578 0.84976 0.86933

I Best score : linear classifier on scattering transformsI NN : blind but goodI All algorithms : less than 15 minutes on a laptop for

trainingI Possible improvements : ScatNet + AUCSVM

(Brefeld, Scheffer) ? ScatNet+SVM-RBFcross-validated ?

I Reinforcement learning to adapt the algorithm to thesubject using the device.

MVA | Sparse Wavelet Representations and Classification | A. RECANATI, S. BUSSY 10/10

Page 33: Sparse Wavelet Representations and ClassificationSparse Wavelet Representations and Classification Challenge Kaggle in class DREEM March 5th 2015 MVA j Sparse Wavelet Representations

Results and Conclusion

Conclusion

MPPCA NN AUC GD0.81578 0.84976 0.86933

I Best score : linear classifier on scattering transformsI NN : blind but goodI All algorithms : less than 15 minutes on a laptop for

trainingI Possible improvements : ScatNet + AUCSVM

(Brefeld, Scheffer) ? ScatNet+SVM-RBFcross-validated ?

I Reinforcement learning to adapt the algorithm to thesubject using the device.

MVA | Sparse Wavelet Representations and Classification | A. RECANATI, S. BUSSY 10/10