Convolutional Architectures Training Architecture search Bibliography
Convolutional Neural Networks and Supervised Learning
Eilif Solberg
August 30, 2018
Outline
• Convolutional Architectures
  • Convolutional neural networks
• Training
  • Loss
  • Optimization
  • Regularization
  • Hyperparameter search
• Architecture search
  • NAS1
  • NAS2
• Bibliography
Convolutional Architectures
Template matching
Figure: Illustration from http://pixuate.com/technology/template-matching/
1. Try to match the template at each location by "sliding a window" over the image
2. Threshold the match score for detection
For 2D objects this is feasible, but difficult
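The two steps above can be sketched in a few lines of NumPy. This is only an illustration: the match score used here (normalized cross-correlation) and the names `match_template`, `scores` and `detections` are assumptions, since the slide does not name a specific score.

```python
import numpy as np

def match_template(image, template, threshold):
    """Slide `template` over `image`; return the score map and detections.

    The score is the normalized cross-correlation in [-1, 1]
    (an assumed choice; the slide does not specify one).
    """
    th, tw = template.shape
    t = template - template.mean()
    t_norm = np.linalg.norm(t)
    out_h = image.shape[0] - th + 1
    out_w = image.shape[1] - tw + 1
    scores = np.zeros((out_h, out_w))
    # Step 1: compute a match score at every window location.
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + th, j:j + tw]
            p = patch - patch.mean()
            denom = np.linalg.norm(p) * t_norm
            # A flat patch has no pattern to match, so it scores 0.
            scores[i, j] = (p * t).sum() / denom if denom > 0 else 0.0
    # Step 2: threshold the score map to get (row, col) detections.
    detections = np.argwhere(scores >= threshold)
    return scores, detections

# Toy example: plant a 3x3 cross in an otherwise empty 8x8 image.
template = np.array([[0.0, 1.0, 0.0],
                     [1.0, 1.0, 1.0],
                     [0.0, 1.0, 0.0]])
image = np.zeros((8, 8))
image[2:5, 4:7] = template
scores, detections = match_template(image, template, threshold=0.99)
```

The only detection is the top-left corner of the planted cross, at row 2, column 4.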
Convolution
Which filter has produced the activation map on the right?
Convolutional layer
-> Glorified template matching
• Many templates (aka output filters)
• We learn the templates; the weights are the templates
• Intermediate detection results are only a means to an end
  • treat them as features, which we again match new templates to
• Starting from the second layer we have "nonlinear filters"
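To make the "many templates, stacked" view concrete, here is a minimal NumPy sketch of a convolutional layer. It assumes no padding, stride 1 and randomly initialised weights; the names `conv_layer`, `w1` and `w2` are illustrative and not from the slides.

```python
import numpy as np

def conv_layer(x, weights, bias):
    """Match every window of x [H, W, C_in] against C_out templates.

    weights has shape [FH, FW, C_in, C_out]; no padding, stride 1.
    """
    fh, fw, _, c_out = weights.shape
    out_h = x.shape[0] - fh + 1
    out_w = x.shape[1] - fw + 1
    out = np.zeros((out_h, out_w, c_out))
    for i in range(out_h):
        for j in range(out_w):
            patch = x[i:i + fh, j:j + fw, :]
            # Contract the patch with all C_out templates at once.
            out[i, j, :] = np.tensordot(patch, weights, axes=3) + bias
    return out

def relu(z):
    return np.maximum(z, 0.0)

rng = np.random.default_rng(0)
image = rng.normal(size=(8, 8, 3))
w1 = rng.normal(size=(3, 3, 3, 4))   # 4 templates matched to the image
w2 = rng.normal(size=(3, 3, 4, 2))   # 2 templates matched to the features
features = relu(conv_layer(image, w1, np.zeros(4)))   # shape [6, 6, 4]
scores = conv_layer(features, w2, np.zeros(2))        # shape [4, 4, 2]
```

Because of the ReLU in between, `scores` is a nonlinear function of `image`, which is the sense in which later layers act as nonlinear filters.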
Hyperparameters of convolutional layer
1. Kernel height and width - template sizes
2. Stride - skips between template matches
3. Dilation rate
   • "Holes" in the template where we don't care
   • Larger field-of-view without more weights...
4. Number of output filters - number of templates
5. Padding - expand image, typically with zeros
Figure: Image from http://neuralnetworksanddeeplearning.com/
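How these hyperparameters interact is easiest to see through the output size along one spatial dimension. The sketch below uses the standard formula with symmetric padding; the function name `conv_output_size` is illustrative.

```python
def conv_output_size(input_size, kernel_size, stride=1, dilation=1, padding=0):
    """Number of template positions along one spatial dimension."""
    # Dilation spreads the kernel taps apart without adding weights.
    effective_kernel = dilation * (kernel_size - 1) + 1
    return (input_size + 2 * padding - effective_kernel) // stride + 1

plain = conv_output_size(32, 3)                        # 30
same = conv_output_size(32, 3, padding=1)              # 32
strided = conv_output_size(32, 3, stride=2, padding=1) # 16
dilated = conv_output_size(32, 3, dilation=2)          # 28
```

A 3x3 kernel with dilation 2 covers the same field of view as a 5x5 kernel while still using only 9 weights per input channel.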
Detector / activation function
• Non-saturating activation functions such as ReLU and leaky ReLU dominate
Figure: Sigmoid function
Figure: Tanh function
Figure: ReLU function
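What "non-saturating" means can be checked numerically: for large inputs the derivatives of sigmoid and tanh shrink towards zero, while the ReLU derivative stays at 1. The sketch below uses a leaky slope of 0.01, which is an assumed default rather than a value from the slides.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def d_sigmoid(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def d_tanh(z):
    return 1.0 - np.tanh(z) ** 2

def relu(z):
    return np.maximum(z, 0.0)

def d_relu(z):
    return (z > 0).astype(float)

def leaky_relu(z, alpha=0.01):
    return np.where(z > 0, z, alpha * z)

def d_leaky_relu(z, alpha=0.01):
    return np.where(z > 0, 1.0, alpha)

z = np.array([-10.0, -1.0, 1.0, 10.0])
saturating = d_sigmoid(z), d_tanh(z)          # tiny at |z| = 10
non_saturating = d_relu(z), d_leaky_relu(z)   # 1 for every positive z
```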
Basic CNN architecture for image classification
Image -> [Conv -> ReLU]xN -> Fully Connected -> Softmax
• Increase filter depth when using stride
Improve with:
• Batch normalization
• Skip connections à la ResNet or DenseNet
• No fully connected layer; average-pool the predictions instead
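The last point can be sketched as follows, assuming the final convolutional layer already outputs one score map per class. The names `global_average_pool` and `class_maps` are illustrative.

```python
import numpy as np

def softmax(logits):
    # Subtract the maximum for numerical stability.
    e = np.exp(logits - logits.max())
    return e / e.sum()

def global_average_pool(class_maps):
    """Average [H, W, num_classes] score maps over all spatial positions."""
    return class_maps.mean(axis=(0, 1))

rng = np.random.default_rng(0)
class_maps = rng.normal(size=(4, 4, 10))   # stand-in for the last conv output
probs = softmax(global_average_pool(class_maps))
```

Since the pooling averages over whatever spatial size it receives, the same head works for inputs of different resolution.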
Training
How do we fit the model?
How do we find parameters θ for our network?
Supervised learning
• Training data comes as (X, Y) pairs, where Y is the target
• Want to learn f(x) ∼ p(y | x), the conditional distribution of Y given X
• Define a differentiable surrogate loss function, e.g. for a single sample

$$l(f(X), Y) = (f(X) - Y)^2 \quad \text{(regression)} \tag{1}$$

$$l(f(X), Y) = -\sum_c Y_c \log(f(X)_c) \quad \text{(classification)} \tag{2}$$
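Both per-sample losses are one-liners in NumPy. The sketch assumes that the classification target is one-hot encoded and that the predicted probabilities are strictly positive, so the logarithm is finite.

```python
import numpy as np

def squared_error(prediction, target):
    """Loss (1): regression."""
    return (prediction - target) ** 2

def cross_entropy(probs, target_one_hot):
    """Loss (2): classification, with probs[c] = f(X)_c > 0."""
    return -np.sum(target_one_hot * np.log(probs))

probs = np.array([0.7, 0.2, 0.1])    # predicted class probabilities
target = np.array([1.0, 0.0, 0.0])   # true class is class 0
regression_loss = squared_error(2.5, 2.0)
classification_loss = cross_entropy(probs, target)
```

With a one-hot target, the cross-entropy reduces to minus the log-probability assigned to the true class.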
Gradient
• The direction in which the function increases the most
Figure: Gradient of the function $f(x, y) = x / e^{x^2 + y^2}$
[By Vivekj78 [CC BY-SA 3.0 (https://creativecommons.org/licenses/by-sa/3.0)], from Wikimedia Commons]
Backpropagation
• Efficient bookkeeping scheme for applying the chain rule of differentiation
• Biologically implausible?
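The bookkeeping idea can be illustrated on a tiny network, assumed here to be one ReLU hidden layer with a squared-error loss: the forward pass caches intermediate values and the backward pass reuses them while applying the chain rule from the loss backwards. The network shape and all names are illustrative; a finite-difference check confirms one gradient entry.

```python
import numpy as np

def forward(w1, w2, x, y):
    z = w1 @ x                # hidden pre-activation
    h = np.maximum(z, 0.0)    # ReLU
    y_hat = w2 @ h            # scalar prediction
    loss = (y_hat - y) ** 2
    return loss, (z, h, y_hat)

def backward(w2, x, y, cache):
    z, h, y_hat = cache
    d_y_hat = 2.0 * (y_hat - y)   # dloss / dy_hat
    d_w2 = d_y_hat * h            # dloss / dw2
    d_h = d_y_hat * w2            # dloss / dh
    d_z = d_h * (z > 0)           # chain rule through the ReLU
    d_w1 = np.outer(d_z, x)       # dloss / dw1
    return d_w1, d_w2

rng = np.random.default_rng(0)
x, y = rng.normal(size=3), 1.0
w1, w2 = rng.normal(size=(4, 3)), rng.normal(size=4)
loss, cache = forward(w1, w2, x, y)
d_w1, d_w2 = backward(w2, x, y, cache)

# Finite-difference check of a single entry of d_w1.
eps = 1e-6
w1_plus, w1_minus = w1.copy(), w1.copy()
w1_plus[0, 0] += eps
w1_minus[0, 0] -= eps
numeric = (forward(w1_plus, w2, x, y)[0] - forward(w1_minus, w2, x, y)[0]) / (2 * eps)
```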
(Stochastic) gradient descent
Taking steps in the opposite direction of the gradient
Figure: [By Vivekj78 [CC BY-SA 3.0 (https://creativecommons.org/licenses/by-sa/3.0)], from Wikimedia Commons]
• Full gradient too expensive / not necessary

$$\sum_{i=1}^{N} \nabla_\theta l(f(X_i), Y_i) \approx \sum_{i=1}^{n} \nabla_\theta l(f(X_{P(i)}), Y_{P(i)}) \tag{3}$$

for a random permutation P, with minibatch size n < N (the two sides agree in expectation up to the factor N/n).
Many different extensions to standard SGD
• SGD with momentum, RMSprop, Adam.
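A minimal sketch of minibatch SGD with momentum on a toy linear-regression problem follows. The data, learning rate, momentum coefficient and batch size are all assumed values chosen for illustration; each epoch draws a fresh random permutation P and walks through it in minibatches of size n.

```python
import numpy as np

rng = np.random.default_rng(0)
N, n = 1000, 32                       # dataset size and minibatch size
X = rng.normal(size=(N, 3))
true_theta = np.array([2.0, -1.0, 0.5])
Y = X @ true_theta + 0.01 * rng.normal(size=N)

def grad(theta, X_batch, Y_batch):
    """Mean gradient of the squared error over the minibatch."""
    residual = X_batch @ theta - Y_batch
    return 2.0 * X_batch.T @ residual / len(Y_batch)

theta = np.zeros(3)
velocity = np.zeros(3)
learning_rate, momentum = 0.01, 0.9   # assumed values
for epoch in range(20):
    P = rng.permutation(N)            # random permutation of the samples
    for start in range(0, N, n):
        batch = P[start:start + n]
        g = grad(theta, X[batch], Y[batch])
        # Momentum keeps a decaying running sum of past gradient steps.
        velocity = momentum * velocity - learning_rate * g
        theta = theta + velocity
```

Setting `momentum = 0.0` recovers plain SGD.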
Network, loss, optimization
• Weight penalty added to the loss term, usually the squared L2 norm, applied uniformly to all parameters

$$l(\theta) + \lambda \|\theta\|_2^2 \tag{4}$$

• Dropout
• Batch normalization
  • Intersection of optimization and generalization
  • Your best friend and your worst enemy
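The first two regularizers can be sketched briefly. The dropout variant shown is "inverted" dropout, which rescales the surviving units during training so that nothing needs to change at inference; that variant is an assumption, as the slide does not specify one.

```python
import numpy as np

def l2_penalized_loss(data_loss, theta, lam):
    """Equation (4): data loss plus lam times the squared L2 norm."""
    return data_loss + lam * np.sum(theta ** 2)

def l2_penalty_grad(theta, lam):
    """Gradient of the penalty term; it pulls every weight towards zero."""
    return 2.0 * lam * theta

def dropout(x, drop_prob, rng, training=True):
    """Inverted dropout: zero units with probability drop_prob, rescale the rest."""
    if not training or drop_prob == 0.0:
        return x
    keep = rng.random(x.shape) >= drop_prob
    return x * keep / (1.0 - drop_prob)

theta = np.array([3.0, 4.0])
penalized = l2_penalized_loss(1.0, theta, lam=0.1)   # 1.0 + 0.1 * 25
rng = np.random.default_rng(0)
dropped = dropout(np.ones(10000), drop_prob=0.5, rng=rng)
```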
More on batch normalization
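For a tensor [batch_size × height × width × depth], normalize "template matching scores" for each template d by

$$\mu_d \leftarrow \frac{1}{N H W} \sum_{i=1}^{N} \sum_{h=1}^{H} \sum_{w=1}^{W} x_{i,h,w,d} \tag{5}$$

$$\sigma_d^2 \leftarrow \frac{1}{N H W} \sum_{i=1}^{N} \sum_{h=1}^{H} \sum_{w=1}^{W} (x_{i,h,w,d} - \mu_d)^2 \tag{6}$$

$$\hat{x}_{i,h,w,d} \leftarrow \frac{x_{i,h,w,d} - \mu_d}{\sqrt{\sigma_d^2 + \epsilon}} \tag{7}$$

$$y_{i,h,w,d} \leftarrow \gamma \hat{x}_{i,h,w,d} + \beta \tag{8}$$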
where N, H and W represent batch size, height and width.
• "Template/feature more present than usual or not"
• During inference we use stored values for µ_d and σ_d.
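Equations (5) to (8) translate almost directly into NumPy. In this sketch the statistics returned from the training pass are simply reused at inference time; how the stored values are maintained in practice (for example as a running average over batches) is not stated on the slide and is left out.

```python
import numpy as np

def batch_norm_train(x, gamma, beta, eps=1e-5):
    """x has shape [N, H, W, D]; statistics are computed per template d."""
    mu = x.mean(axis=(0, 1, 2))              # eq. (5)
    var = x.var(axis=(0, 1, 2))              # eq. (6)
    x_hat = (x - mu) / np.sqrt(var + eps)    # eq. (7)
    return gamma * x_hat + beta, mu, var     # eq. (8)

def batch_norm_inference(x, gamma, beta, stored_mu, stored_var, eps=1e-5):
    """Same transform, but with stored instead of batch statistics."""
    x_hat = (x - stored_mu) / np.sqrt(stored_var + eps)
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(8, 5, 5, 4))
gamma, beta = 1.0, 0.0
y, mu, var = batch_norm_train(x, gamma, beta)
y_inference = batch_norm_inference(x, gamma, beta, mu, var)
```

After normalization each of the 4 templates has mean close to 0 and variance close to 1 over the batch, so a positive value reads as "more present than usual".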
Data augmentation
Idea: apply a random transformation to X that does not alter Y.
• Normally you would like the result X′ to be plausible, i.e. it could have been a sample from the distribution of interest
• Which transformations you may use is application-dependent.
Image data
• Horizontal mirroring (an issue for objects that are not left/right symmetric)
• Random crop
• Scale
• Aspect ratio
• Lighting etc.
Text data
• Synonym insertion
• Back-translation: translate to another language and back again, e.g. with Google Translate
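For image data, the first two transformations can be sketched in NumPy as below. The flip probability of 0.5 and the crop size are assumed values, and `random_flip_and_crop` is an illustrative name.

```python
import numpy as np

def random_flip_and_crop(image, crop_size, rng):
    """Randomly mirror an image [H, W, C] horizontally, then crop it."""
    if rng.random() < 0.5:
        image = image[:, ::-1, :]          # horizontal mirroring
    height, width, _ = image.shape
    crop_h, crop_w = crop_size
    # Choose the top-left corner so that the crop stays inside the image.
    top = rng.integers(0, height - crop_h + 1)
    left = rng.integers(0, width - crop_w + 1)
    return image[top:top + crop_h, left:left + crop_w, :]

rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))
augmented = random_flip_and_crop(image, crop_size=(24, 24), rng=rng)
```

The label Y is left untouched, so every call yields a new (X′, Y) training pair.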
Hyperparameters to search
• Learning rate (and learning rate schedule)
• Regularization parameters: L2 penalty, (dropout rate)
Search strategies
• Use random search rather than grid search
• Sample on a log scale when appropriate
• Be careful when the best values lie on the border of the search range
• Refine the search if needed
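A minimal random-search sketch using only the standard library follows. The search ranges and the toy objective are hypothetical; in practice `evaluate` would train a model with the given configuration and return its validation score.

```python
import math
import random

def sample_config(rng):
    """Draw one configuration; the ranges are hypothetical examples."""
    return {
        # Log scale: sample the exponent uniformly, not the value itself.
        "learning_rate": 10 ** rng.uniform(-5, -1),
        "l2": 10 ** rng.uniform(-6, -2),
        "dropout": rng.uniform(0.0, 0.5),
    }

def random_search(evaluate, num_trials, seed=0):
    rng = random.Random(seed)
    best_config, best_score = None, -math.inf
    for _ in range(num_trials):
        config = sample_config(rng)
        score = evaluate(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score

def toy_validation_score(config):
    """Stand-in objective that peaks at a learning rate of 1e-3."""
    return -(math.log10(config["learning_rate"]) + 3) ** 2

best_config, best_score = random_search(toy_validation_score, num_trials=50)
```

If the best learning rate came out near 1e-5 or 1e-1, the range should be extended before refining, which is the border caveat above.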
Architecture search
Architecture search
1. Define the search space.
2. Decide upon the optimization algorithm
   • random search, reinforcement learning, genetic algorithms
Neural architecture search
Figure: An overview of Neural Architecture Search. Figure and caption from [?].
NAS1 - search space
Fixed structure:
• Architecture is a series of layers of the form
  conv2D(FH, FW, N) → batch-norm → ReLU
Degrees of freedom:
• Parameters of conv layer
  • filter height, filter width and number of output filters
• Input layers to each conv layer
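To make this search space concrete, the sketch below draws one random architecture from it, i.e. random search as the optimization algorithm. The candidate values for FH, FW and N, and the rule that each layer always takes the preceding layer as an input, are hypothetical choices made for illustration and are not taken from the slides.

```python
import random

FILTER_HEIGHTS = [1, 3, 5, 7]     # hypothetical candidate values
FILTER_WIDTHS = [1, 3, 5, 7]      # hypothetical candidate values
NUM_FILTERS = [24, 36, 48, 64]    # hypothetical candidate values

def sample_architecture(num_layers, rng):
    """Sample conv parameters and input layers for every layer."""
    layers = []
    for i in range(num_layers):
        if i == 0:
            inputs = ["image"]
        else:
            # Assumption: always keep the preceding layer as an input and
            # add each earlier layer with probability 0.5.
            inputs = [j for j in range(i - 1) if rng.random() < 0.5] + [i - 1]
        layers.append({
            "FH": rng.choice(FILTER_HEIGHTS),
            "FW": rng.choice(FILTER_WIDTHS),
            "N": rng.choice(NUM_FILTERS),
            "inputs": inputs,
        })
    return layers

def describe(layer):
    return f"conv2D({layer['FH']}, {layer['FW']}, {layer['N']}) -> batch-norm -> ReLU"

rng = random.Random(0)
architecture = sample_architecture(num_layers=4, rng=rng)
summary = [describe(layer) for layer in architecture]
```

A layer with several entries in `inputs` would receive their outputs concatenated along the depth dimension.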
NAS1 - discovered architecture
Figure: FH is filter height, FW is filter width and N is number of filters. If one layer has many input layers then all input layers are concatenated in the depth dimension. Figure from [?].
NAS2 - search space
Fixed structure:
Figure: Architecture for CIFAR-10 and ImageNet. Figure from [?].
Degrees of freedom:
• Some freedom in the normal cell and the reduction cell, as we shall see shortly
NAS2 - discovered convolutional cells
Figure: NASNet-A identified with CIFAR-10. Panels: Normal Cell and Reduction Cell. Each cell maps the hidden states h_i and h_{i-1} to h_{i+1} by combining separable convolutions (3x3, 5x5, 7x7), 3x3 average and max pooling, and identity branches with add operations, followed by a concat. Figure and caption from [?].
NAS2 - Performance (computational cost)
Figure: Performance on ILSVRC12 as a function of the number of floating-point multiply-add operations needed to process an image. x-axis: # Mult-Add operations (millions); y-axis: accuracy (precision @ 1). Models shown: NASNet-A (4 @ 1056), NASNet-A (5 @ 1538), NASNet-A (7 @ 1920), NASNet-A (6 @ 4032), MobileNet, ShuffleNet, Inception-v1, Inception-v2, Inception-v3, Inception-v4, Inception-ResNet-v2, Xception, VGG-16, ResNet-152, ResNeXt-101, PolyNet, DPN-131, SENet. Figure from [?].
NAS2 - Performance (# parameters)
Figure: Performance on ILSVRC12 as a function of the number of parameters. x-axis: # parameters (millions); y-axis: accuracy (precision @ 1). Models shown: NASNet-A (4 @ 1056), NASNet-A (5 @ 1538), NASNet-A (7 @ 1920), NASNet-A (6 @ 4032), MobileNet, ShuffleNet, Inception-v1, Inception-v2, Inception-v3, Inception-v4, Inception-ResNet-v2, Xception, VGG-16, ResNet-152, ResNeXt-101, PolyNet, DPN-131, SENet. Figure from [?].
Bibliography
Bibliography I