TRANSCRIPT
Hands-on Deep Learning in Python
Imry Kissos - Deep Learning Meetup TLV, August 2015
Outline
● Problem Definition
● Training a DNN
● Improving the DNN
● Open Source Packages
● Summary
Problem Definition
Deep Convolution Network
[1] http://danielnouri.org/notes/2014/12/17/using-convolutional-neural-nets-to-detect-facial-keypoints-tutorial/
Tutorial
● Goal: Detect facial landmarks on (normal) face images
● Data set provided by Dr. Yoshua Bengio
● Tutorial code available: https://github.com/dnouri/kfkd-tutorial/blob/master/kfkd.py
Flow
[Flow diagram: Train Model - General → Train Model - “Nose Tip” → Train Model - “Mouth Corners” → Predict Points on Test Set]
Flow
[Flow diagram: Train Images + Train Points → Fit → Trained Net]
Flow
[Flow diagram: Test Images → Predict → Predicted Points]
Python Deep Learning Framework
High level → low level:
● nolearn - wrapper around Lasagne
● Lasagne - Theano extension for Deep Learning
● Theano - define, optimize, and evaluate mathematical expressions
● CUDA/cuDNN - efficient GPU computation for DNNs
HW support: GPU & CPU. OS: Linux, OS X, Windows.
Training a Deep Neural Network
1. Data Analysis
2. Architecture Engineering
3. Optimization
4. Training the DNN
Training a Deep Neural Network
1. Data Analysis
   a. Exploration + Validation
   b. Pre-Processing
   c. Batch and Split
2. Architecture Engineering
3. Optimization
4. Training the DNN
Data Exploration + Validation
Data:
● 7K gray-scale images of detected faces
● 96x96 pixels per image
● 15 landmarks per image
Data validation:
● Some landmarks are missing
Pre-Processing
Data Normalization
Shuffle train data
[Diagram: one epoch’s data is split into train, validation, and test batches]
The train/valid/test splits are constant across epochs.
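A minimal numpy sketch of the normalization and shuffling steps above, assuming pixel values in [0, 255] and landmark coordinates in [0, 96] as in the tutorial's data:

```python
import numpy as np

def preprocess(X, y, seed=42):
    """Normalize the data and shuffle the training set (numpy-only sketch)."""
    X = X.astype(np.float32) / 255.0   # scale pixels to [0, 1]
    y = (y - 48.0) / 48.0              # scale coordinates from [0, 96] to [-1, 1]
    idx = np.random.RandomState(seed).permutation(len(X))  # shuffle train data
    return X[idx], y[idx].astype(np.float32)
```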
Train / Validation Split
For classification, the train/validation split should preserve the class proportions (a stratified split).
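With recent scikit-learn this is one call; the `stratify` argument below is what preserves the proportions. A sketch with dummy data (the tutorial itself is a regression problem, so this applies to classification tasks):

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(1000, 10)                 # dummy features
labels = np.random.randint(0, 3, size=1000)  # dummy class labels

# stratify=labels keeps each class's proportion identical
# in the train and validation subsets
X_train, X_valid, y_train, y_valid = train_test_split(
    X, labels, test_size=0.2, stratify=labels, random_state=0)
```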
Training a Deep Neural Network
1. Data Analysis
2. Architecture Engineering
   a. Layers Definition
   b. Layers Implementation
3. Optimization
4. Training the DNN
Architecture
[Diagram: X → Conv → Pool → Dense → Output → Y]
Layers Definition
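The tutorial defines the layers declaratively through nolearn's `NeuralNet`. A trimmed sketch in that style - the filter counts and unit sizes below are illustrative assumptions, not the tutorial's exact values:

```python
from lasagne import layers
from nolearn.lasagne import NeuralNet

net = NeuralNet(
    layers=[
        ('input',  layers.InputLayer),
        ('conv1',  layers.Conv2DLayer),
        ('pool1',  layers.MaxPool2DLayer),
        ('hidden', layers.DenseLayer),
        ('output', layers.DenseLayer),
    ],
    input_shape=(None, 1, 96, 96),   # 96x96 gray-scale images
    conv1_num_filters=32, conv1_filter_size=(3, 3),
    pool1_pool_size=(2, 2),
    hidden_num_units=500,
    output_num_units=30,             # 15 landmarks = 30 coordinates
    output_nonlinearity=None,        # linear output for regression
    update_learning_rate=0.01,
    update_momentum=0.9,
    regression=True,
    max_epochs=400,
    verbose=1,
)
```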
Activation Function
ReLU
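The rectified linear unit is a one-liner; for reference:

```python
import numpy as np

def relu(x):
    # ReLU: identity for positive inputs, zero otherwise
    return np.maximum(0.0, x)
```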
Dense Layer
Dropout
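Dropout randomly silences units during training to reduce overfitting. A numpy sketch of the common "inverted dropout" form, assuming drop probability `p`; at test time the layer is just the identity:

```python
import numpy as np

def dropout(activations, p=0.5, rng=np.random):
    # randomly zero each unit with probability p ...
    mask = rng.binomial(1, 1.0 - p, size=activations.shape)
    # ... and rescale so the expected activation is unchanged
    return activations * mask / (1.0 - p)
```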
Training a Deep Neural Network
1. Data Analysis
2. Architecture Engineering
3. Optimization
   a. Back Propagation
   b. Objective
   c. SGD
   d. Updates
   e. Convergence Tuning
4. Training the DNN
Back Propagation - Forward Path
[Diagram: X → Conv → Dense → Output Points → Y]
Back Propagation - Forward Path
[Diagram: X → Conv → Dense → Output Points, compared against the Training Points]
Back Propagation - Backward Path
[Diagram: the error flows back from the output through Dense → Conv]
Back Propagation - Update
For all layers: W := W - α·∂Loss/∂W
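In Theano the backward path and the update come almost for free: `T.grad` derives the gradient symbolically, and the update rule is attached to the compiled function. A minimal sketch with a single hypothetical dense layer (not the tutorial's code):

```python
import numpy as np
import theano
import theano.tensor as T

# hypothetical single dense layer: 9216 inputs (96*96 pixels) -> 30 outputs
W = theano.shared(np.zeros((9216, 30), dtype=theano.config.floatX), name='W')
x = T.matrix('x')               # input batch (forward path)
t = T.matrix('t')               # training points
y = T.dot(x, W)                 # predicted points
loss = T.mean((y - t) ** 2)     # MSE objective
grad = T.grad(loss, W)          # backward path: dLoss/dW
# the update, applied for all layers: W := W - alpha * dLoss/dW
train_step = theano.function([x, t], loss, updates=[(W, W - 0.01 * grad)])
```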
Objective
Minimize the mean squared error between the predicted and the true landmark coordinates.
S.G.D.
Stochastic Gradient Descent updates the network after each batch.
Karpathy’s “babysitting” tip: the weights/updates magnitude ratio should be ~1e3 (i.e., updates roughly 1/1000 of the weights).
Optimization - Updates
[Animations by Alec Radford comparing update rules]
Adjusting Learning Rate & Momentum
Both the learning rate and the momentum are adjusted linearly in the epoch number.
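The tutorial does this with an `on_epoch_finished` callback, `AdjustVariable`, which interpolates the value linearly over `max_epochs`. A sketch close to the tutorial's code, which assumes the hyperparameter was passed to `NeuralNet` as a Theano shared variable:

```python
import numpy as np

class AdjustVariable(object):
    """Linearly anneal a NeuralNet hyperparameter over the epochs."""
    def __init__(self, name, start=0.03, stop=0.0001):
        self.name, self.start, self.stop = name, start, stop
        self.ls = None

    def __call__(self, nn, train_history):
        if self.ls is None:
            self.ls = np.linspace(self.start, self.stop, nn.max_epochs)
        epoch = train_history[-1]['epoch']
        getattr(nn, self.name).set_value(np.float32(self.ls[epoch - 1]))
```

It is hooked in via, e.g., `on_epoch_finished=[AdjustVariable('update_learning_rate', 0.03, 0.0001), AdjustVariable('update_momentum', 0.9, 0.999)]`.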
Convergence Tuning
Early stopping: training stops according to the validation loss and returns the best weights.
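The tutorial's `EarlyStopping` callback does both: it tracks the best validation loss, and when no improvement is seen for `patience` epochs it restores the best weights and stops. A sketch close to the tutorial's code:

```python
import numpy as np

class EarlyStopping(object):
    def __init__(self, patience=100):
        self.patience = patience
        self.best_valid = np.inf
        self.best_valid_epoch = 0
        self.best_weights = None

    def __call__(self, nn, train_history):
        current_valid = train_history[-1]['valid_loss']
        current_epoch = train_history[-1]['epoch']
        if current_valid < self.best_valid:
            self.best_valid = current_valid
            self.best_valid_epoch = current_epoch
            self.best_weights = nn.get_all_params_values()
        elif self.best_valid_epoch + self.patience < current_epoch:
            print('Early stopping; best valid loss {:.6f} at epoch {}.'
                  .format(self.best_valid, self.best_valid_epoch))
            nn.load_params_from(self.best_weights)  # return best weights
            raise StopIteration()
```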
Training a Deep Neural Network
1. Data Analysis
2. Architecture Engineering
3. Optimization
4. Training the DNN
   a. Fit
   b. Fine Tune Pre-Trained
   c. Learning Curves
Fit
Each epoch: loop over the train batches (forward + backprop), then loop over the validation batches (forward only).
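In nolearn this whole loop is a single call; a sketch assuming the `net` defined earlier and preprocessed float32 arrays:

```python
net.fit(X_train, y_train)      # loops over train + validation batches
y_pred = net.predict(X_test)   # forward path only, no backprop
```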
Fine Tune Pre-Trained
● Change the output layer
● Load the pre-trained weights
● Fine tune the specialist net (see the sketch below)
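A sketch of the specialist fine-tuning in the tutorial's style: clone the general net, shrink the output layer, and copy over the matching pre-trained weights. The output size below is an illustrative assumption for a "mouth corners" specialist:

```python
from sklearn.base import clone

specialist = clone(net)            # same architecture, fresh state
specialist.output_num_units = 4    # two mouth-corner landmarks, (x, y) each
specialist.load_params_from(net)   # copy weights of the matching layers
specialist.fit(X_train, y_mouth)   # fine tune on the specialist targets
```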
Learning Curves
Loop over 6 nets:
[Plot: RMSE learning curves vs. epochs for the 6 nets]
Learning Curves Analysis
[Two plots of RMSE vs. epochs: Net 1 shows overfitting; Net 2 shows convergence jittering]
Part 1 Summary
Training a DNN:
Part 1 End - Break
Part 2 - Beyond Training
Outline
● Problem Definition
● Motivation
● Training a DNN
● Improving the DNN
● Open Source Packages
● Summary
Beyond Training
1. Improving the DNN
   a. Analysis Capabilities
   b. Augmentation
   c. Forward - Backward Path
   d. Monitor Layers’ Training
2. Open Source Packages
3. Summary
Improving the DNN
Very tempting:
● >1M images
● >1M parameters
● Large gap: theory ↔ practice
⇒ Brute-force experiments?!
Analysis Capabilities
1. Theoretical explanation
   a. E.g. dropout and augmentation decrease overfitting
2. Empirical claims about a phenomenon
   a. E.g. normalization improves convergence
3. Numerical understanding
   a. E.g. exploding / vanishing updates
Reduce Overfitting
[Plot: RMSE vs. epochs - Net 1 overfits, its validation error diverging from the train error]
Solution: Data Augmentation
Data Augmentation
Horizontal flip; perturbation
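A numpy sketch of the horizontal flip for landmark regression, assuming targets scaled to [-1, 1] and a `flip_indices` list of left/right landmark pairs to swap. The pairs below are illustrative; the tutorial defines the full list:

```python
import numpy as np

flip_indices = [(0, 2), (1, 3)]  # e.g. left-eye x,y <-> right-eye x,y

def horizontal_flip(X, y):
    X = X[:, :, :, ::-1]         # mirror the (N, 1, 96, 96) images
    y = y.copy()
    y[:, ::2] *= -1              # mirror all x-coordinates
    for a, b in flip_indices:    # swap left/right landmark columns
        y[:, [a, b]] = y[:, [b, a]]
    return X, y
```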
Advanced Augmentation
http://benanne.github.io/2015/03/17/plankton.html
Convergence Challenges
Need to monitor the forward + backward path.
[Plots: RMSE vs. epochs - data error, normalization]
Forward - Backward Path
Forward
Backward: gradient w.r.t. the parameters
Monitor Layers’ Training
nolearn - visualize.py
Monitor Layers’ Training
X. Glorot, Y. Bengio, Understanding the difficulty of training deep feedforward neural networks:
“Monitoring activation and gradients across layers and training iterations is a powerful investigation tool”
This is easy to monitor in the Theano framework.
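A sketch of such monitoring with Lasagne/Theano, assuming `net` is a trained nolearn `NeuralNet` and `X_batch` is one batch of inputs: compile one function that returns every layer's output, then inspect the activation statistics per layer:

```python
import theano
import lasagne

layer_list = list(net.layers_.values())  # nolearn keeps an OrderedDict of layers
outputs = lasagne.layers.get_output(layer_list, deterministic=True)
monitor = theano.function([net.layers_['input'].input_var], outputs)

for layer, act in zip(layer_list, monitor(X_batch)):
    print(layer.name, act.mean(), act.std())  # watch for saturation / collapse
```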
Weight Initialization matters (1)
The Layer 1 gradients are close to zero - vanishing gradients.
Weight Initialization matters (2)
The network returns close-to-zero values for all inputs.
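Glorot initialization (Lasagne's default for `DenseLayer` and `Conv2DLayer`) addresses exactly this: it scales the initial weights by the layer's fan-in/fan-out so activations and gradients keep comparable variance across layers. A minimal illustration:

```python
import lasagne
from lasagne.init import GlorotUniform, Constant

l_in = lasagne.layers.InputLayer((None, 1, 96, 96))
l_hidden = lasagne.layers.DenseLayer(
    l_in, num_units=500,
    W=GlorotUniform(),  # weight variance scaled by fan-in/fan-out
    b=Constant(0.0))
```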
Monitoring Activation
Plateaus are sometimes seen when training neural networks: for most epochs the network returns close-to-zero output for all inputs.
Such objective plateaus can sometimes be explained by saturation.
Monitoring the weights/updates ratio
[Plots vs. epoch: max of the Conv1 weights (scale ~1e-1) and max of the Conv1 updates (scale ~1e-3)]
http://cs231n.github.io/neural-networks-3/#baby
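A sketch of a per-epoch callback (hypothetical, nolearn-style, not from the tutorial) that tracks Conv1's max weight, max update, and their ratio, which Karpathy suggests should sit around 1e-3:

```python
import numpy as np

class MonitorConv1(object):
    """Track conv1's max |weight| and max |update| after each epoch."""
    def __init__(self):
        self.prev = None

    def __call__(self, nn, train_history):
        W = nn.layers_['conv1'].W.get_value()
        if self.prev is not None:
            max_w = np.abs(W).max()
            max_dw = np.abs(W - self.prev).max()
            # healthy training: max_dw / max_w around 1e-3
            print('conv1  max|W|=%.2e  max|dW|=%.2e  ratio=%.2e'
                  % (max_w, max_dw, max_dw / max_w))
        self.prev = W.copy()
```

It would be hooked in alongside the other callbacks, e.g. `on_epoch_finished=[MonitorConv1()]`.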
Beyond Training
1. Improving the DNN
2. Open Source Packages
   a. Hardware and OS
   b. Python Framework
   c. Deep Learning Open Source Packages
   d. Effort Estimation
3. Summary
Hardware and OS
● Amazon Cloud GPU: AWS Lasagne GPU Setup, Spot ~ $0.0031 per GPU Instance Hour
● IBM Cloud GPU: http://www-03.ibm.com/systems/platformcomputing/products/symphony/gpuharvesting.html
● Your Linux machine GPU: pip install -r https://raw.githubusercontent.com/dnouri/kfkd-tutorial/master/requirements.txt
● Windows install: http://deeplearning.net/software/theano/install_windows.html#install-windows
Starting Tips
● Sanity checks:
   ○ DNN architecture: “Overfit a tiny subset of data” (Karpathy) - see the sketch below
   ○ Check that increasing the regularization increases the loss
● Use a pre-trained VGG as a baseline
● Start with ~3 conv layers with ~16 filters each - iterate quickly
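A sketch of the first sanity check, assuming the `net` defined earlier: a sound architecture should drive the training loss to almost zero on a tiny subset.

```python
# Sanity check: overfit a tiny subset (Karpathy's tip).
# If the train loss does not approach zero, the architecture
# or the data pipeline is broken.
net.max_epochs = 300
net.fit(X_train[:50], y_train[:50])
```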
Python
● Rich eco-system
● State-of-the-art
● Easy to port from prototype to production
Podcast: http://www.reversim.com/2015/10/277-scientific-python.html
Python Deep Learning Framework
Keras, pylearn2, OpenDeep, and Lasagne share a common base: Theano.
Tips from Deep Learning Packages
● Torch: code organization
● Caffe: separation of configuration ↔ code
● NeuralNet → YAML text format defining the experiment’s configuration
Deep Learning Open Source Packages
● Caffe for applications (black box)
● Torch and Theano for research on Deep Learning itself (white box)
http://fastml.com/torch-vs-theano/
Open source progresses rapidly → impossible to predict the industry’s standard.
Disruptive Effort Estimation
[Chart comparing development effort: Feature Engineering vs. Deep Learning]
Deep Learning still requires algorithmic expertise.
Summary
● Dove into training a DNN
● Presented analysis capabilities
● Reviewed open source packages
References
● Hinton, Coursera Neural Networks: https://www.coursera.org/course/neuralnets
● Technion Deep Learning course: http://moodle.technion.ac.il/course/view.php?id=4128
● Oxford Deep Learning course: https://www.youtube.com/playlist?list=PLE6Wd9FR--EfW8dtjAuPoTuPcqmOV53Fu
● CS231n CNN for Visual Recognition: http://cs231n.github.io/
● Deep Learning Book: http://www.iro.umontreal.ca/~bengioy/dlbook/
● Montreal DL summer school: http://videolectures.net/deeplearning2015_montreal/
Questions?