Parallelization and optimization of the neuromorphic simulation code
Application to the MNIST problem
Raphaël Couturier, Michel Salomon
FEMTO-ST - DISC Department - AND Team
November 2 & 3, 2015 / Besançon
Dynamical Systems and Brain-inspired Information Processing Workshop
Introduction

Background
• Emergence of hardware RC (Reservoir Computing) implementations
  • Analogue electronic; optoelectronic; fully optical
  Larger et al. - Photonic information processing beyond Turing: an optoelectronic implementation of reservoir computing, Opt. Express 20, 3241-3249 (2012)
• Matlab simulation code
  • Study of processing conditions
  • Parameter tuning
  • Pre- and post-processing by computer

Motivation
• Study the concept of Reservoir Computing
• Design a faster simulation code
• Apply it to new problems
FEMTO-ST Institute 2 / 16
Outline
1. Neuromorphic processing
2. Parallelization and optimization
3. Performance on the MNIST problem
4. Conclusion and perspectives
Delay Dynamics as a Reservoir
Spatio-temporal viewpoint of a DDE (Larger et al. - Opt. Express 20:3 2012)
• δτ → temporal spacing; τD → time delay
• f(x) → nonlinear transformation; h(t) → impulse response
Computer simulation with an Ikeda-type NLDDE

τ dx/dt (t) = −x(t) + β sin²[α x(t − τD) + ρ u_in(t − τD) + Φ0]
α → feedback scaling; β → gain; ρ → amplification; Φ0 → offset
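A minimal sketch of how such a delay equation can be integrated numerically (simple Euler stepping with a history buffer standing in for the Runge-Kutta routine of the actual code; all parameter values are illustrative, not those of the experiments):

```python
import math

def simulate_ikeda(u, dt=0.01, tau=1.0, tau_D=1.0, alpha=0.9,
                   beta=1.5, rho=0.5, phi0=0.2):
    """Euler integration of
    tau * dx/dt = -x(t) + beta * sin^2(alpha*x(t - tau_D) + rho*u(t - tau_D) + phi0).
    u is the input sampled every dt; the delayed terms are read from
    history buffers covering the last tau_D / dt steps."""
    n_delay = int(round(tau_D / dt))
    x_hist = [0.0] * n_delay          # x over the last delay interval
    u_hist = [0.0] * n_delay          # input over the last delay interval
    x = 0.0
    out = []
    for u_now in u:
        x_del, u_del = x_hist.pop(0), u_hist.pop(0)
        feedback = beta * math.sin(alpha * x_del + rho * u_del + phi0) ** 2
        x += dt / tau * (-x + feedback)
        x_hist.append(x)
        u_hist.append(u_now)
        out.append(x)
    return out
```

The recorded trajectory `out` is the reservoir transient response; in the pipeline below it would form one row block of the matrix A.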
Spoken Digits Recognition
Input (pre-processing)
• Lyon ear model transformation of each speech sample → 60 samples × 86 frequency channels
• Channels connected to the reservoir (400 neurons) → sparse and random connections

Reservoir transient response
Temporal series recorded for Read-Out processing
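A sketch of the sparse, random channel-to-reservoir connection: an input matrix W_I whose few nonzero entries are random signs (the density value and the ±1 weights are assumptions for illustration):

```python
import random

def make_input_mask(n_nodes=400, n_channels=86, density=0.1, seed=0):
    """Sparse random input matrix W_I: each entry is +1 or -1 with
    probability `density`, and 0 otherwise."""
    rng = random.Random(seed)
    return [[rng.choice([-1.0, 1.0]) if rng.random() < density else 0.0
             for _ in range(n_channels)]
            for _ in range(n_nodes)]
```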
Spoken Digits Recognition
Output (post-processing)
• Training of the Read-Out → optimize the W_R matrix for the digits of the training set
• Regression problem for A × W_R ≈ B

W_R,opt = (Aᵀ A + λ I)⁻¹ Aᵀ B

• A = concatenation of the reservoir transient responses for each digit
• B = concatenation of the target matrices
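The Read-Out training above is a standard ridge (Tikhonov) regression. A minimal self-contained sketch, with a naive Gaussian-elimination solver in place of the optimized linear algebra a real implementation (e.g. Eigen) would use:

```python
def solve(M, Y):
    """Solve M X = Y by Gauss-Jordan elimination with partial pivoting.
    M is n x n, Y is n x k; returns X as n x k."""
    n, k = len(M), len(Y[0])
    A = [row[:] + y[:] for row, y in zip(M, Y)]   # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(n):
            if r != col and A[r][col] != 0.0:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [[A[r][n + j] / A[r][r] for j in range(k)] for r in range(n)]

def ridge_readout(A, B, lam=1e-6):
    """W_R = (A^T A + lambda I)^-1 A^T B."""
    n, d = len(A), len(A[0])
    AtA = [[sum(A[r][i] * A[r][j] for r in range(n)) + (lam if i == j else 0.0)
            for j in range(d)] for i in range(d)]
    AtB = [[sum(A[r][i] * B[r][j] for r in range(n)) for j in range(len(B[0]))]
           for i in range(d)]
    return solve(AtA, AtB)
```

With a tiny λ and a well-conditioned A, the regression recovers an exact linear map; larger λ trades fit for robustness.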
Testing
• Dataset of 500 speech samples → 5 female speakers
• 20-fold cross-validation → 20 × 25 test samples
• Performance evaluation → Word Error Rate (WER)
Matlab Simulation Code
Main steps
1. Pre-processing
  • Input data formatting (1D vector; sampling period → δτ)
  • W_I initialization (random; normalized)
2. Concatenation of 1D vectors → batch processing
3. Nonlinear transient computation
  • Numerical integration using a Runge-Kutta C routine
  • Computation of matrices A and B
4. Read-Out training → Moore-Penrose matrix inversion
5. Testing of the solution (cross-validation)

Computation time
12 min for 306 “neurons” on a quad-core i7 1.8 GHz (2013)
Parallelization Scheme
Guidelines
• The reservoir response to each input sample is independent of the others → the computation of matrices A and B can be parallelized
• The different regression tests are also independent

In practice
• Simulation code rewritten in C++
• Eigen C++ library for the linear algebra operations
• Inter-process communication → Message Passing Interface (MPI)

Performance on the speech recognition problem
• Similar classification accuracy → same WER
• Reduced computation time
→ We can now study problems whose Matlab computation time was prohibitive
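The data decomposition behind this scheme can be sketched as follows. The actual code uses C++ and MPI; here a Python thread pool and a toy stand-in for the transient computation only illustrate the key property, that each sample's row block of A is independent and the blocks can be concatenated in input order:

```python
from concurrent.futures import ThreadPoolExecutor

def reservoir_response(sample):
    """Stand-in for the nonlinear transient computation of one sample
    (the real code integrates the delay equation; this is a toy map)."""
    x, out = 0.0, []
    for u in sample:
        x = 0.9 * x + 0.1 * (u * u)   # toy nonlinear update
        out.append(x)
    return out

def build_A(samples, n_workers=4):
    """Process samples in parallel; pool.map preserves input order,
    so the row blocks of A line up with the target matrix B."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(reservoir_response, samples))
```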
Finding Optimal Parameters
What parameters can be optimized?
• Currently
  • Pitch of the Read-Out
  • Amplitude parameters → δ; β; Φ0
  • Regression parameter → λ
• Next
  • Number of nodes significantly improving the solution (threshold)
  • Input data filter (convolution filter for images)
Potentially, any parameter can be optimized

Optimization heuristics
• Currently → simulated annealing (probabilistic global search controlled by a cooling schedule)
• Next → other metaheuristics such as evolutionary algorithms
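A hedged sketch of simulated annealing over a single parameter, with a geometric cooling schedule; the toy quadratic objective stands in for the simulation's error rate, and the schedule constants are illustrative:

```python
import math, random

def simulated_annealing(cost, x0, step=0.5, t0=1.0, cooling=0.995,
                        n_iter=3000, seed=42):
    """Probabilistic global search: always accept improving moves,
    accept worsening moves with probability exp(-delta / T), and let
    the temperature T decay geometrically (the cooling schedule)."""
    rng = random.Random(seed)
    x, c = x0, cost(x0)
    best_x, best_c = x, c
    T = t0
    for _ in range(n_iter):
        cand = x + rng.uniform(-step, step)
        cc = cost(cand)
        if cc < c or rng.random() < math.exp(-(cc - c) / T):
            x, c = cand, cc
            if c < best_c:
                best_x, best_c = x, c
        T *= cooling
    return best_x, best_c
```

In the setting of the slides, `cost` would run the full simulation for a candidate parameter set (e.g. β, Φ0, λ) and return the error rate, which is why a fast parallel code matters.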
Application to the MNIST problem
Task of handwritten digit recognition
National Institute of Standards and Technology database
• Training dataset → American Census Bureau employees
• Test dataset → American high school students

The Mixed-NIST (MNIST) database is widely used in machine learning
→ mixes both datasets and improves the images
• Datasets
  • Training → 60K samples
  • Test → 10K samples
• Grayscale images
  • Normalized to fit into a 20 × 20 pixel bounding box
  • Centered and anti-aliased
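The bounding-box step of this normalization can be illustrated with a short sketch that crops an image to its ink region (the subsequent rescaling to 20 × 20 and the anti-aliasing are omitted):

```python
def bounding_box(img):
    """Return the tight bounding box (top, left, bottom, right), inclusive,
    of the nonzero pixels of a 2D grayscale image (list of rows)."""
    rows = [i for i, row in enumerate(img) if any(v > 0 for v in row)]
    cols = [j for j in range(len(img[0]))
            if any(row[j] > 0 for row in img)]
    return rows[0], cols[0], rows[-1], cols[-1]

def crop(img):
    """Crop the image to the bounding box of its nonzero pixels."""
    t, l, b, r = bounding_box(img)
    return [row[l:r + 1] for row in img[t:b + 1]]
```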
Performance of the parallel code

Classification error for 10K images
• 1 reservoir of 2000 neurons → Digit Error Rate (DER): 7.14%
• 1000 reservoirs of 2 neurons → DER: 3.85%
Speedup
[Figure: speedup vs. number of cores (up to 35) for 1000 reservoirs of 2 neurons and for 1 reservoir of 2000 neurons, compared with the ideal linear speedup]
Exploring ways to improve the results
Using the parallel NTC code
• Many small reservoirs and one Read-Out
• Feature extraction using a simple 3 × 3 convolution filter
• Best error without convolution: around 3%
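A minimal sketch of the 3 × 3 convolution feature-extraction step ("valid" convolution only; the actual filter coefficients are among the parameters one could optimize, so the kernels below are placeholders):

```python
def convolve3x3(img, kernel):
    """Valid 3x3 convolution (more precisely cross-correlation, as in
    most machine-learning codes) of a 2D image given as a list of rows.
    The output is (h-2) x (w-2)."""
    h, w = len(img), len(img[0])
    return [[sum(img[i + a][j + b] * kernel[a][b]
                 for a in range(3) for b in range(3))
             for j in range(w - 2)]
            for i in range(h - 2)]
```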
Using the Oger toolbox
• Increasing the dataset with transformed images → 15 × 15 pixel bounding box and rotated images
• Subsampling of the reservoir response
• Committee of reservoirs
• Lower errors with the complete reservoir response
  • 1 reservoir of 1200 neurons → 1.42%
  • Committee of 31 reservoirs of 1000 neurons → 1.25%
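One common way to form such a committee is to average the class scores of the individual read-outs before taking the arg-max; the slides do not specify the exact combination rule, so the averaging below is an assumption for illustration:

```python
def committee_predict(score_lists):
    """score_lists[r][c] = score of class c from reservoir r.
    Average the scores across reservoirs, then pick the best class."""
    n_res = len(score_lists)
    n_cls = len(score_lists[0])
    avg = [sum(s[c] for s in score_lists) / n_res for c in range(n_cls)]
    return max(range(n_cls), key=lambda c: avg[c])
```

Averaging lets a majority of confident members outvote a single mistaken reservoir, which is why committees tend to lower the error rate.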
Comparison with other approaches
Convolutional Neural Networks
• Feedforward multilayer networks for visual information
• Different types of layers
  • Convolutional layer → feature extraction
  • Pooling layer → variance reduction
• Many parameters to train

Multilayer Reservoir Computing (Jalalvand et al. - CICSyN 2015)
• Stacking of reservoirs → each layer “corrects” the previous one
  • Same outputs
  • Trained one after the other
• 3-layer system
  • 16K neurons per reservoir
  • 528K trainable parameters → 16K nodes × 11 readouts × 3 layers
Comparison with other approaches
Classification errors

Approach                                 Error rate (%)   Reference
LeNet-1 (CNN)                            1.7              LeCun et al. - 1998
A reservoir of 1200 neurons              1.42             Schaetti et al. - 2015
SVM with Gaussian kernel                 1.4
Committee of 31 reservoirs               1.25             Schaetti et al. - 2015
3-layer reservoir                        0.92             Jalalvand et al. - 2015
CNN of 551 neurons                       0.35             Ciresan et al. - 2011
Committee of 7 CNNs (221 neurons each)   0.23             Ciresan et al. - 2012
Remarks
• CNNs give the best results, but have a long training time
• A reservoir of 1000 neurons is trained in 15 minutes
• Automatic feature extraction improves the results
Conclusion and perspectives
Results
• A parallel code allowing fast simulations
• A first evaluation on the MNIST problem

Future work
• Further code improvements → parallel regression
• Use of several reservoirs
  • Committees
  • Correcting the errors of one reservoir with another
• Other applications
  • Simulation of lung motion
  • Airflow prediction
  • etc.