Parallelization and optimization of the neuromorphic simulation code
Application to the MNIST problem
Raphaël Couturier, Michel Salomon
FEMTO-ST - DISC Department - AND Team
November 2 & 3, 2015 / Besançon
Dynamical Systems and Brain-inspired Information Processing Workshop
Introduction

Background
• Emergence of hardware RC (Reservoir Computing) implementations
  • Analogue electronic; optoelectronic; fully optical
  Larger et al. - Photonic information processing beyond Turing: an optoelectronic implementation of reservoir computing, Opt. Express 20, 3241-3249 (2012)
• Matlab simulation code
  • Study of processing conditions
  • Parameter tuning
  • Pre- and post-processing by computer

Motivation
• Study the concept of Reservoir Computing
• Design a faster simulation code
• Apply it to new problems
FEMTO-ST Institute 2 / 16
Outline
1. Neuromorphic processing
2. Parallelization and optimization
3. Performance on the MNIST problem
4. Conclusion and perspectives
Delay Dynamics as a Reservoir
Spatio-temporal viewpoint of a DDE (Larger et al. - Opt. Express 20:3 2012)
• δτ → temporal spacing; τD → time delay
• f(x) → nonlinear transformation; h(t) → impulse response
Computer simulation with an Ikeda-type NLDDE

τ dx/dt (t) = −x(t) + β sin²[α x(t − τD) + ρ u_in(t − τD) + Φ0]
α → feedback scaling; β → gain; ρ → amplification; Φ0 → offset
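A minimal sketch of how such a delay equation can be integrated numerically (simple Euler stepping with a history buffer standing in for the Runge-Kutta routine of the actual code; all parameter values are illustrative, not those of the experiments):

```python
import math

def simulate_ikeda(u, dt=0.01, tau=1.0, tau_D=1.0, alpha=0.9,
                   beta=1.5, rho=0.5, phi0=0.2):
    """Euler integration of
    tau * dx/dt = -x(t) + beta * sin^2(alpha*x(t - tau_D) + rho*u(t - tau_D) + phi0).
    u is the input sampled every dt; the delayed terms are read from
    history buffers covering the last tau_D / dt steps."""
    n_delay = int(round(tau_D / dt))
    x_hist = [0.0] * n_delay          # x over the last delay interval
    u_hist = [0.0] * n_delay          # input over the last delay interval
    x = 0.0
    out = []
    for u_now in u:
        x_del, u_del = x_hist.pop(0), u_hist.pop(0)
        feedback = beta * math.sin(alpha * x_del + rho * u_del + phi0) ** 2
        x += dt / tau * (-x + feedback)
        x_hist.append(x)
        u_hist.append(u_now)
        out.append(x)
    return out
```

The recorded trajectory `out` is the reservoir transient response; in the pipeline below it would form one row block of the matrix A.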
Spoken Digits Recognition
Input (pre-processing)
• Lyon ear model transformation of each speech sample → 60 samples × 86 frequency channels
• Channels connected to the reservoir (400 neurons) → sparse and random connections

Reservoir transient response
Temporal series recorded for Read-Out processing
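A sketch of the sparse, random channel-to-reservoir connection: an input matrix W_I whose few nonzero entries are random signs (the density value and the ±1 weights are assumptions for illustration):

```python
import random

def make_input_mask(n_nodes=400, n_channels=86, density=0.1, seed=0):
    """Sparse random input matrix W_I: each entry is +1 or -1 with
    probability `density`, and 0 otherwise."""
    rng = random.Random(seed)
    return [[rng.choice([-1.0, 1.0]) if rng.random() < density else 0.0
             for _ in range(n_channels)]
            for _ in range(n_nodes)]
```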
Spoken Digits Recognition
Output (post-processing)
• Training of the Read-Out → optimize the W_R matrix for the digits of the training set
• Regression problem for A × W_R ≈ B

W_R,opt = (Aᵀ A + λ I)⁻¹ Aᵀ B

• A = concatenation of the reservoir transient responses for each digit
• B = concatenation of the target matrices
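The Read-Out training above is a standard ridge (Tikhonov) regression. A minimal self-contained sketch, with a naive Gaussian-elimination solver in place of the optimized linear algebra a real implementation (e.g. Eigen) would use:

```python
def solve(M, Y):
    """Solve M X = Y by Gauss-Jordan elimination with partial pivoting.
    M is n x n, Y is n x k; returns X as n x k."""
    n, k = len(M), len(Y[0])
    A = [row[:] + y[:] for row, y in zip(M, Y)]   # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(n):
            if r != col and A[r][col] != 0.0:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [[A[r][n + j] / A[r][r] for j in range(k)] for r in range(n)]

def ridge_readout(A, B, lam=1e-6):
    """W_R = (A^T A + lambda I)^-1 A^T B."""
    n, d = len(A), len(A[0])
    AtA = [[sum(A[r][i] * A[r][j] for r in range(n)) + (lam if i == j else 0.0)
            for j in range(d)] for i in range(d)]
    AtB = [[sum(A[r][i] * B[r][j] for r in range(n)) for j in range(len(B[0]))]
           for i in range(d)]
    return solve(AtA, AtB)
```

With a tiny λ and a well-conditioned A, the regression recovers an exact linear map; larger λ trades fit for robustness.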
Testing
• Dataset of 500 speech samples → 5 female speakers
• 20-fold cross-validation → 20 × 25 test samples
• Performance evaluation → Word Error Rate (WER)
Matlab Simulation Code
Main steps
1. Pre-processing
  • Input data formatting (1D vector; sampling period → δτ)
  • W_I initialization (random; normalized)
2. Concatenation of 1D vectors → batch processing
3. Nonlinear transient computation
  • Numerical integration using a Runge-Kutta C routine
  • Computation of matrices A and B
4. Read-Out training → Moore-Penrose matrix inversion
5. Testing of the solution (cross-validation)

Computation time
12 min for 306 “neurons” on a quad-core i7 1.8 GHz (2013)
Parallelization Scheme
Guidelines
• The reservoir response to each input sample is independent of the others → the computation of matrices A and B can be parallelized
• The different regression tests are also independent

In practice
• Simulation code rewritten in C++
• Eigen C++ library for the linear algebra operations
• Inter-process communication → Message Passing Interface (MPI)

Performance on the speech recognition problem
• Similar classification accuracy → same WER
• Reduced computation time
→ We can now study problems whose Matlab computation time was prohibitive
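The data decomposition behind this scheme can be sketched as follows. The actual code uses C++ and MPI; here a Python thread pool and a toy stand-in for the transient computation only illustrate the key property, that each sample's row block of A is independent and the blocks can be concatenated in input order:

```python
from concurrent.futures import ThreadPoolExecutor

def reservoir_response(sample):
    """Stand-in for the nonlinear transient computation of one sample
    (the real code integrates the delay equation; this is a toy map)."""
    x, out = 0.0, []
    for u in sample:
        x = 0.9 * x + 0.1 * (u * u)   # toy nonlinear update
        out.append(x)
    return out

def build_A(samples, n_workers=4):
    """Process samples in parallel; pool.map preserves input order,
    so the row blocks of A line up with the target matrix B."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(reservoir_response, samples))
```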
Finding Optimal Parameters
What parameters can be optimized?
• Currently
  • Pitch of the Read-Out
  • Amplitude parameters → δ; β; Φ0
  • Regression parameter → λ
• Next
  • Number of nodes significantly improving the solution (threshold)
  • Input data filter (convolution filter for images)
Potentially, any parameter can be optimized

Optimization heuristics
• Currently → simulated annealing (probabilistic global search controlled by a cooling schedule)
• Next → other metaheuristics such as evolutionary algorithms
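A hedged sketch of simulated annealing over a single parameter, with a geometric cooling schedule; the toy quadratic objective stands in for the simulation's error rate, and the schedule constants are illustrative:

```python
import math, random

def simulated_annealing(cost, x0, step=0.5, t0=1.0, cooling=0.995,
                        n_iter=3000, seed=42):
    """Probabilistic global search: always accept improving moves,
    accept worsening moves with probability exp(-delta / T), and let
    the temperature T decay geometrically (the cooling schedule)."""
    rng = random.Random(seed)
    x, c = x0, cost(x0)
    best_x, best_c = x, c
    T = t0
    for _ in range(n_iter):
        cand = x + rng.uniform(-step, step)
        cc = cost(cand)
        if cc < c or rng.random() < math.exp(-(cc - c) / T):
            x, c = cand, cc
            if c < best_c:
                best_x, best_c = x, c
        T *= cooling
    return best_x, best_c
```

In the setting of the slides, `cost` would run the full simulation for a candidate parameter set (e.g. β, Φ0, λ) and return the error rate, which is why a fast parallel code matters.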
Application to the MNIST problem
Task of handwritten digit recognition
National Institute of Standards and Technology database
• Training dataset → American Census Bureau employees
• Test dataset → American high school students

The Mixed-NIST (MNIST) database is widely used in machine learning
→ mixes both datasets and improves the images
• Datasets
  • Training → 60K samples
  • Test → 10K samples
• Grayscale images
  • Normalized to fit into a 20 × 20 pixel bounding box
  • Centered and anti-aliased
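The bounding-box step of this normalization can be illustrated with a short sketch that crops an image to its ink region (the subsequent rescaling to 20 × 20 and the anti-aliasing are omitted):

```python
def bounding_box(img):
    """Return the tight bounding box (top, left, bottom, right), inclusive,
    of the nonzero pixels of a 2D grayscale image (list of rows)."""
    rows = [i for i, row in enumerate(img) if any(v > 0 for v in row)]
    cols = [j for j in range(len(img[0]))
            if any(row[j] > 0 for row in img)]
    return rows[0], cols[0], rows[-1], cols[-1]

def crop(img):
    """Crop the image to the bounding box of its nonzero pixels."""
    t, l, b, r = bounding_box(img)
    return [row[l:r + 1] for row in img[t:b + 1]]
```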
Performance of the parallel code

Classification error for 10K images
• 1 reservoir of 2000 neurons → Digit Error Rate (DER): 7.14%
• 1000 reservoirs of 2 neurons → DER: 3.85%
Speedup
[Figure: speedup vs. number of cores (up to 35) for 1000 reservoirs of 2 neurons and for 1 reservoir of 2000 neurons, compared with the ideal linear speedup]
Exploring ways to improve the results
Using the parallel NTC code
• Many small reservoirs and one Read-Out
• Feature extraction using a simple 3 × 3 convolution filter
• Best error without convolution: around 3%
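A minimal sketch of the 3 × 3 convolution feature-extraction step ("valid" convolution only; the actual filter coefficients are among the parameters one could optimize, so the kernels below are placeholders):

```python
def convolve3x3(img, kernel):
    """Valid 3x3 convolution (more precisely cross-correlation, as in
    most machine-learning codes) of a 2D image given as a list of rows.
    The output is (h-2) x (w-2)."""
    h, w = len(img), len(img[0])
    return [[sum(img[i + a][j + b] * kernel[a][b]
                 for a in range(3) for b in range(3))
             for j in range(w - 2)]
            for i in range(h - 2)]
```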
Using the Oger toolbox
• Increasing the dataset with transformed images → 15 × 15 pixel bounding box and rotated images
• Subsampling of the reservoir response
• Committee of reservoirs
• Lower errors with the complete reservoir response
  • 1 reservoir of 1200 neurons → 1.42%
  • Committee of 31 reservoirs of 1000 neurons → 1.25%
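One common way to form such a committee is to average the class scores of the individual read-outs before taking the arg-max; the slides do not specify the exact combination rule, so the averaging below is an assumption for illustration:

```python
def committee_predict(score_lists):
    """score_lists[r][c] = score of class c from reservoir r.
    Average the scores across reservoirs, then pick the best class."""
    n_res = len(score_lists)
    n_cls = len(score_lists[0])
    avg = [sum(s[c] for s in score_lists) / n_res for c in range(n_cls)]
    return max(range(n_cls), key=lambda c: avg[c])
```

Averaging lets a majority of confident members outvote a single mistaken reservoir, which is why committees tend to lower the error rate.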
Comparison with other approaches
Convolutional Neural Networks
• Feedforward multilayer networks for visual information
• Different types of layers
  • Convolutional layer → feature extraction
  • Pooling layer → variance reduction
• Many parameters to train

Multilayer Reservoir Computing (Jalalvand et al. - CICSyN 2015)
• Stacking of reservoirs → each layer “corrects” the previous one
  • Same outputs
  • Trained one after the other
• 3-layer system
  • 16K neurons per reservoir
  • 528K trainable parameters → 16K nodes × 11 readouts × 3 layers
Comparison with other approaches
Classification errors

Approach                                 Error rate (%)   Reference
LeNet-1 (CNN)                            1.7              LeCun et al. - 1998
A reservoir of 1200 neurons              1.42             Schaetti et al. - 2015
SVM with Gaussian kernel                 1.4
Committee of 31 reservoirs               1.25             Schaetti et al. - 2015
3-layer reservoir                        0.92             Jalalvand et al. - 2015
CNN of 551 neurons                       0.35             Ciresan et al. - 2011
Committee of 7 CNNs (221 neurons each)   0.23             Ciresan et al. - 2012
Remarks
• CNNs give the best results, but have a long training time
• A reservoir of 1000 neurons is trained in 15 minutes
• Automatic feature extraction improves the results
Conclusion and perspectives
Results
• A parallel code allowing fast simulations
• A first evaluation on the MNIST problem

Future work
• Further code improvements → parallel regression
• Use of several reservoirs
  • Committees
  • Correcting the errors of one reservoir with another
• Other applications
  • Simulation of lung motion
  • Airflow prediction
  • etc.