![Page 1: Basics of DL•PyTorch (can use TensorFlow… •You have trained already several models and know how to debug problems, observe training curves, prepare training/validation/test data](https://reader036.vdocuments.us/reader036/viewer/2022062606/5fe6f72049d8616b751f1f0d/html5/thumbnails/1.jpg)
Basics of DL
Prof. Leal-Taixé and Prof. Niessner 1
![Page 2: Basics of DL•PyTorch (can use TensorFlow… •You have trained already several models and know how to debug problems, observe training curves, prepare training/validation/test data](https://reader036.vdocuments.us/reader036/viewer/2022062606/5fe6f72049d8616b751f1f0d/html5/thumbnails/2.jpg)
What we assume you know• Linear Algebra & Programming!
• Basics from the Introduction to Deep Learning lecture
• PyTorch (can use TensorFlow…)
• You have trained already several models and know how to debug problems, observe training curves, prepare training/validation/test data.
Prof. Leal-Taixé and Prof. Niessner 2
![Page 3: Basics of DL•PyTorch (can use TensorFlow… •You have trained already several models and know how to debug problems, observe training curves, prepare training/validation/test data](https://reader036.vdocuments.us/reader036/viewer/2022062606/5fe6f72049d8616b751f1f0d/html5/thumbnails/3.jpg)
What is a Neural Network?
Prof. Leal-Taixé and Prof. Niessner 3
![Page 4: Basics of DL•PyTorch (can use TensorFlow… •You have trained already several models and know how to debug problems, observe training curves, prepare training/validation/test data](https://reader036.vdocuments.us/reader036/viewer/2022062606/5fe6f72049d8616b751f1f0d/html5/thumbnails/4.jpg)
Neural Network• Linear score function 𝑓 = 𝑊𝑥
Prof. Leal-Taixé and Prof. Niessner 4
On CIFAR-10
On ImageNetCredit: Li/Karpathy/Johnson
![Page 5: Basics of DL•PyTorch (can use TensorFlow… •You have trained already several models and know how to debug problems, observe training curves, prepare training/validation/test data](https://reader036.vdocuments.us/reader036/viewer/2022062606/5fe6f72049d8616b751f1f0d/html5/thumbnails/5.jpg)
Neural Network• Linear score function 𝑓 = 𝑊𝑥
• Neural network is a nesting of ‘functions’– 2-layers: 𝑓 = 𝑊2max(0,𝑊1𝑥)
– 3-layers: 𝑓 = 𝑊3max(0,𝑊2max(0,𝑊1𝑥))
– 4-layers: 𝑓 = 𝑊4 tanh (W3, max(0,𝑊2max(0,𝑊1𝑥)))
– 5-layers: 𝑓 = 𝑊5𝜎(𝑊4 tanh(W3, max(0,𝑊2max(0,𝑊1𝑥))))
– … up to hundreds of layers
Prof. Leal-Taixé and Prof. Niessner 5
![Page 6: Basics of DL•PyTorch (can use TensorFlow… •You have trained already several models and know how to debug problems, observe training curves, prepare training/validation/test data](https://reader036.vdocuments.us/reader036/viewer/2022062606/5fe6f72049d8616b751f1f0d/html5/thumbnails/6.jpg)
Neural Network
Prof. Leal-Taixé and Prof. Niessner 6
2-layer network: 𝑓 = 𝑊2max(0,𝑊1𝑥)
𝑥ℎ𝑊1
128 × 128 = 16384 1000
𝑓𝑊2
10
1-layer network: 𝑓 = 𝑊𝑥
𝑥𝑊
128 × 128 = 16384
𝑓
10
![Page 7: Basics of DL•PyTorch (can use TensorFlow… •You have trained already several models and know how to debug problems, observe training curves, prepare training/validation/test data](https://reader036.vdocuments.us/reader036/viewer/2022062606/5fe6f72049d8616b751f1f0d/html5/thumbnails/7.jpg)
Neural Network
Prof. Leal-Taixé and Prof. Niessner 7
Credit: Li/Karpathy/Johnson
![Page 8: Basics of DL•PyTorch (can use TensorFlow… •You have trained already several models and know how to debug problems, observe training curves, prepare training/validation/test data](https://reader036.vdocuments.us/reader036/viewer/2022062606/5fe6f72049d8616b751f1f0d/html5/thumbnails/8.jpg)
Loss functions
Prof. Leal-Taixé and Prof. Niessner 8
![Page 9: Basics of DL•PyTorch (can use TensorFlow… •You have trained already several models and know how to debug problems, observe training curves, prepare training/validation/test data](https://reader036.vdocuments.us/reader036/viewer/2022062606/5fe6f72049d8616b751f1f0d/html5/thumbnails/9.jpg)
Neural networksWhat is the shape of this function?
9Prof. Leal-Taixé and Prof. Niessner
Loss (Softmax,
Hinge)
Prediction
![Page 10: Basics of DL•PyTorch (can use TensorFlow… •You have trained already several models and know how to debug problems, observe training curves, prepare training/validation/test data](https://reader036.vdocuments.us/reader036/viewer/2022062606/5fe6f72049d8616b751f1f0d/html5/thumbnails/10.jpg)
Loss functions• Softmax loss function
• Hinge Loss (derived from the Multiclass SVM loss)
10Prof. Leal-Taixé and Prof. Niessner
Evaluate the ground truth score for the image
![Page 11: Basics of DL•PyTorch (can use TensorFlow… •You have trained already several models and know how to debug problems, observe training curves, prepare training/validation/test data](https://reader036.vdocuments.us/reader036/viewer/2022062606/5fe6f72049d8616b751f1f0d/html5/thumbnails/11.jpg)
Loss functions• Softmax loss function
– Optimizes until the loss is zero
• Hinge Loss (derived from the Multiclass SVM loss)
– Saturates whenever it has learned a class “well enough”
11Prof. Leal-Taixé and Prof. Niessner
![Page 12: Basics of DL•PyTorch (can use TensorFlow… •You have trained already several models and know how to debug problems, observe training curves, prepare training/validation/test data](https://reader036.vdocuments.us/reader036/viewer/2022062606/5fe6f72049d8616b751f1f0d/html5/thumbnails/12.jpg)
Activation functions
Prof. Leal-Taixé and Prof. Niessner 12
![Page 13: Basics of DL•PyTorch (can use TensorFlow… •You have trained already several models and know how to debug problems, observe training curves, prepare training/validation/test data](https://reader036.vdocuments.us/reader036/viewer/2022062606/5fe6f72049d8616b751f1f0d/html5/thumbnails/13.jpg)
SigmoidForward
13Prof. Leal-Taixé and Prof. Niessner
Saturated neurons kill the gradient flow
![Page 14: Basics of DL•PyTorch (can use TensorFlow… •You have trained already several models and know how to debug problems, observe training curves, prepare training/validation/test data](https://reader036.vdocuments.us/reader036/viewer/2022062606/5fe6f72049d8616b751f1f0d/html5/thumbnails/14.jpg)
Problem of positive output
14Prof. Leal-Taixé and Prof. Niessner
More on zero-mean data later
![Page 15: Basics of DL•PyTorch (can use TensorFlow… •You have trained already several models and know how to debug problems, observe training curves, prepare training/validation/test data](https://reader036.vdocuments.us/reader036/viewer/2022062606/5fe6f72049d8616b751f1f0d/html5/thumbnails/15.jpg)
tanh
15Prof. Leal-Taixé and Prof. Niessner
Zero-centered
Still saturates
Still saturates
LeCun 1991
![Page 16: Basics of DL•PyTorch (can use TensorFlow… •You have trained already several models and know how to debug problems, observe training curves, prepare training/validation/test data](https://reader036.vdocuments.us/reader036/viewer/2022062606/5fe6f72049d8616b751f1f0d/html5/thumbnails/16.jpg)
Rectified Linear Units (ReLU)
16Prof. Leal-Taixé and Prof. Niessner
Large and consistent gradients
Does not saturateFast convergence
What happens if a ReLU outputs zero?
Dead ReLU
![Page 17: Basics of DL•PyTorch (can use TensorFlow… •You have trained already several models and know how to debug problems, observe training curves, prepare training/validation/test data](https://reader036.vdocuments.us/reader036/viewer/2022062606/5fe6f72049d8616b751f1f0d/html5/thumbnails/17.jpg)
Maxout units
17Prof. Leal-Taixé and Prof. Niessner
Generalization of ReLUs
Linear regimes
Does not die
Does not saturate
Increase of the number of parameters
![Page 18: Basics of DL•PyTorch (can use TensorFlow… •You have trained already several models and know how to debug problems, observe training curves, prepare training/validation/test data](https://reader036.vdocuments.us/reader036/viewer/2022062606/5fe6f72049d8616b751f1f0d/html5/thumbnails/18.jpg)
Optimization
Prof. Leal-Taixé and Prof. Niessner 18
![Page 19: Basics of DL•PyTorch (can use TensorFlow… •You have trained already several models and know how to debug problems, observe training curves, prepare training/validation/test data](https://reader036.vdocuments.us/reader036/viewer/2022062606/5fe6f72049d8616b751f1f0d/html5/thumbnails/19.jpg)
Gradient Descent for Neural Networks
Prof. Leal-Taixé and Prof. Niessner 19
𝑥0
𝑥1
𝑥2
ℎ0
ℎ1
ℎ2
ℎ3
𝑦0
𝑦1
𝑡0
𝑡1
𝑦𝑖 = 𝐴(𝑏1,𝑖 +
𝑗
ℎ𝑗𝑤1,𝑖,𝑗)ℎ𝑗 = 𝐴(𝑏0,𝑗 +
𝑘
𝑥𝑘𝑤0,𝑗,𝑘)
𝐿𝑖 = 𝑦𝑖 − 𝑡𝑖2
𝛻𝑤,𝑏𝑓𝑥,𝑡 (𝑤) =
𝜕𝑓
𝜕𝑤0,0,0……𝜕𝑓
𝜕𝑤𝑙,𝑚,𝑛……𝜕𝑓
𝜕𝑏𝑙,𝑚
Just simple: 𝐴 𝑥 = max(0, 𝑥)
![Page 20: Basics of DL•PyTorch (can use TensorFlow… •You have trained already several models and know how to debug problems, observe training curves, prepare training/validation/test data](https://reader036.vdocuments.us/reader036/viewer/2022062606/5fe6f72049d8616b751f1f0d/html5/thumbnails/20.jpg)
Stochastic Gradient Descent (SGD)𝜃𝑘+1 = 𝜃𝑘 − 𝛼𝛻𝜃𝐿(𝜃
𝑘 , 𝑥{1..𝑚}, 𝑦{1..𝑚})
𝛻𝜃𝐿 =1
𝑚σ𝑖=1𝑚 𝛻𝜃𝐿𝑖
Note the terminology: iteration vs epoch
Prof. Leal-Taixé and Prof. Niessner 20
𝑘 now refers to 𝑘-th iteration
𝑚 training samples in the current batch
Gradient for the 𝑘-th batch
![Page 21: Basics of DL•PyTorch (can use TensorFlow… •You have trained already several models and know how to debug problems, observe training curves, prepare training/validation/test data](https://reader036.vdocuments.us/reader036/viewer/2022062606/5fe6f72049d8616b751f1f0d/html5/thumbnails/21.jpg)
Gradient Descent with Momentum𝑣𝑘+1 = 𝛽 ⋅ 𝑣𝑘 + 𝛻𝜃𝐿(𝜃
𝑘)
𝜃𝑘+1 = 𝜃𝑘 − 𝛼 ⋅ 𝑣𝑘+1
Exponentially-weighted average of gradient
Important: velocity 𝑣𝑘 is vector-valued!
Prof. Leal-Taixé and Prof. Niessner 21
Gradient of current minibatchvelocity
accumulation rate(‘friction’, momentum)
learning ratevelocitymodel
![Page 22: Basics of DL•PyTorch (can use TensorFlow… •You have trained already several models and know how to debug problems, observe training curves, prepare training/validation/test data](https://reader036.vdocuments.us/reader036/viewer/2022062606/5fe6f72049d8616b751f1f0d/html5/thumbnails/22.jpg)
Gradient Descent with Momentum
𝜃𝑘+1 = 𝜃𝑘 − 𝛼 ⋅ 𝑣𝑘+1
Prof. Leal-Taixé and Prof. Niessner 22
Step will be largest when a sequence of gradients all point to the same direction
Fig. credit: I. Goodfellow
Hyperparameters are 𝛼, 𝛽𝛽 is often set to 0.9
![Page 23: Basics of DL•PyTorch (can use TensorFlow… •You have trained already several models and know how to debug problems, observe training curves, prepare training/validation/test data](https://reader036.vdocuments.us/reader036/viewer/2022062606/5fe6f72049d8616b751f1f0d/html5/thumbnails/23.jpg)
RMSProp
𝑠𝑘+1 = 𝛽 ⋅ 𝑠𝑘 + (1 − 𝛽)[𝛻𝜃𝐿 ∘ 𝛻𝜃𝐿]
𝜃𝑘+1 = 𝜃𝑘 − 𝛼 ⋅𝛻𝜃𝐿
𝑠𝑘+1 + 𝜖
Hyperparameters: 𝛼, 𝛽, 𝜖
Prof. Leal-Taixé and Prof. Niessner 23
Typically 10−8Often 0.9
Element-wise multiplication
Needs tuning!
![Page 24: Basics of DL•PyTorch (can use TensorFlow… •You have trained already several models and know how to debug problems, observe training curves, prepare training/validation/test data](https://reader036.vdocuments.us/reader036/viewer/2022062606/5fe6f72049d8616b751f1f0d/html5/thumbnails/24.jpg)
RMSProp
Prof. Leal-Taixé and Prof. Niessner 24
X-directionSmall gradients
Y-D
irec
tio
nLa
rge
grad
ien
ts
Fig. credit: A. Ng
𝑠𝑘+1 = 𝛽 ⋅ 𝑠𝑘 + (1 − 𝛽)[𝛻𝜃𝐿 ∘ 𝛻𝜃𝐿]
𝜃𝑘+1 = 𝜃𝑘 − 𝛼 ⋅𝛻𝜃𝐿
𝑠𝑘+1 + 𝜖We’re dividing by square gradients:- Division in Y-Direction will be large- Division in X-Direction will be small
(uncentered) variance of gradients -> second momentum
Can increase learning rate!
![Page 25: Basics of DL•PyTorch (can use TensorFlow… •You have trained already several models and know how to debug problems, observe training curves, prepare training/validation/test data](https://reader036.vdocuments.us/reader036/viewer/2022062606/5fe6f72049d8616b751f1f0d/html5/thumbnails/25.jpg)
Adaptive Moment Estimation (Adam)Combines Momentum and RMSProp
𝑚𝑘+1 = 𝛽1 ⋅ 𝑚𝑘 + 1 − 𝛽1 𝛻𝜃𝐿 𝜃𝑘
𝑣𝑘+1 = 𝛽2 ⋅ 𝑣𝑘 + (1 − 𝛽2)[𝛻𝜃𝐿 𝜃𝑘 ∘ 𝛻𝜃𝐿 𝜃𝑘 ]
𝜃𝑘+1 = 𝜃𝑘 − 𝛼 ⋅𝑚𝑘+1
𝑣𝑘+1+𝜖
Prof. Leal-Taixé and Prof. Niessner 25
First momentum: mean of gradients
Second momentum: variance of gradients
![Page 26: Basics of DL•PyTorch (can use TensorFlow… •You have trained already several models and know how to debug problems, observe training curves, prepare training/validation/test data](https://reader036.vdocuments.us/reader036/viewer/2022062606/5fe6f72049d8616b751f1f0d/html5/thumbnails/26.jpg)
AdamCombines Momentum and RMSProp
𝑚𝑘+1 = 𝛽1 ⋅ 𝑚𝑘 + 1 − 𝛽1 𝛻𝜃𝐿 𝜃𝑘
𝑣𝑘+1 = 𝛽2 ⋅ 𝑣𝑘 + (1 − 𝛽2)[𝛻𝜃𝐿 𝜃𝑘 ∘ 𝛻𝜃𝐿 𝜃𝑘 ]
𝜃𝑘+1 = 𝜃𝑘 − 𝛼 ⋅ෝ𝑚𝑘+1
ො𝑣𝑘+1+𝜖
Prof. Leal-Taixé and Prof. Niessner 26
𝑚𝑘+1 and 𝑣𝑘+1 are initialized with zero-> bias towards zero
Typically, bias-corrected moment updates
ෝ𝑚𝑘+1 =𝑚𝑘
1 − 𝛽1
ො𝑣𝑘+1 =𝑣𝑘
1 − 𝛽2
![Page 27: Basics of DL•PyTorch (can use TensorFlow… •You have trained already several models and know how to debug problems, observe training curves, prepare training/validation/test data](https://reader036.vdocuments.us/reader036/viewer/2022062606/5fe6f72049d8616b751f1f0d/html5/thumbnails/27.jpg)
Convergence
27
![Page 28: Basics of DL•PyTorch (can use TensorFlow… •You have trained already several models and know how to debug problems, observe training curves, prepare training/validation/test data](https://reader036.vdocuments.us/reader036/viewer/2022062606/5fe6f72049d8616b751f1f0d/html5/thumbnails/28.jpg)
Training NNs
Prof. Leal-Taixé and Prof. Niessner 28
![Page 29: Basics of DL•PyTorch (can use TensorFlow… •You have trained already several models and know how to debug problems, observe training curves, prepare training/validation/test data](https://reader036.vdocuments.us/reader036/viewer/2022062606/5fe6f72049d8616b751f1f0d/html5/thumbnails/29.jpg)
Importance of Learning Rate
Prof. Leal-Taixé and Prof. Niessner 29
![Page 30: Basics of DL•PyTorch (can use TensorFlow… •You have trained already several models and know how to debug problems, observe training curves, prepare training/validation/test data](https://reader036.vdocuments.us/reader036/viewer/2022062606/5fe6f72049d8616b751f1f0d/html5/thumbnails/30.jpg)
Over- and Underfitting
Prof. Leal-Taixé and Prof. Niessner 30
Figure extracted from Deep Learning by Adam Gibson, Josh Patterson, O‘Reily Media Inc., 2017
![Page 31: Basics of DL•PyTorch (can use TensorFlow… •You have trained already several models and know how to debug problems, observe training curves, prepare training/validation/test data](https://reader036.vdocuments.us/reader036/viewer/2022062606/5fe6f72049d8616b751f1f0d/html5/thumbnails/31.jpg)
Over- and Underfitting
Prof. Leal-Taixé and Prof. Niessner 31
Source: http://srdas.github.io/DLBook/ImprovingModelGeneralization.html
![Page 32: Basics of DL•PyTorch (can use TensorFlow… •You have trained already several models and know how to debug problems, observe training curves, prepare training/validation/test data](https://reader036.vdocuments.us/reader036/viewer/2022062606/5fe6f72049d8616b751f1f0d/html5/thumbnails/32.jpg)
Basic recipe for machine learning• Split your data
Prof. Leal-Taixé and Prof. Niessner 32
Find your hyperparameters
20%
train testvalidation
20%60%
![Page 33: Basics of DL•PyTorch (can use TensorFlow… •You have trained already several models and know how to debug problems, observe training curves, prepare training/validation/test data](https://reader036.vdocuments.us/reader036/viewer/2022062606/5fe6f72049d8616b751f1f0d/html5/thumbnails/33.jpg)
Basic recipe for machine learning
Prof. Leal-Taixé and Prof. Niessner 33
![Page 34: Basics of DL•PyTorch (can use TensorFlow… •You have trained already several models and know how to debug problems, observe training curves, prepare training/validation/test data](https://reader036.vdocuments.us/reader036/viewer/2022062606/5fe6f72049d8616b751f1f0d/html5/thumbnails/34.jpg)
Regularization
Prof. Leal-Taixé and Prof. Niessner 34
![Page 35: Basics of DL•PyTorch (can use TensorFlow… •You have trained already several models and know how to debug problems, observe training curves, prepare training/validation/test data](https://reader036.vdocuments.us/reader036/viewer/2022062606/5fe6f72049d8616b751f1f0d/html5/thumbnails/35.jpg)
Regularization• Any strategy that aims to
35Prof. Leal-Taixé and Prof. Niessner
Lower validation error
Increasing training error
![Page 36: Basics of DL•PyTorch (can use TensorFlow… •You have trained already several models and know how to debug problems, observe training curves, prepare training/validation/test data](https://reader036.vdocuments.us/reader036/viewer/2022062606/5fe6f72049d8616b751f1f0d/html5/thumbnails/36.jpg)
Data augmentation
36Prof. Leal-Taixé and Prof. Niessner Krizhevsky 2012
![Page 37: Basics of DL•PyTorch (can use TensorFlow… •You have trained already several models and know how to debug problems, observe training curves, prepare training/validation/test data](https://reader036.vdocuments.us/reader036/viewer/2022062606/5fe6f72049d8616b751f1f0d/html5/thumbnails/37.jpg)
Early stoppingTraining time is also a hyperparameter
37Prof. Leal-Taixé and Prof. Niessner
Overfitting
![Page 38: Basics of DL•PyTorch (can use TensorFlow… •You have trained already several models and know how to debug problems, observe training curves, prepare training/validation/test data](https://reader036.vdocuments.us/reader036/viewer/2022062606/5fe6f72049d8616b751f1f0d/html5/thumbnails/38.jpg)
Bagging and ensemble methods • Bagging: uses k different datasets
38Prof. Leal-Taixé and Prof. Niessner
Training Set 1 Training Set 2 Training Set 3
![Page 39: Basics of DL•PyTorch (can use TensorFlow… •You have trained already several models and know how to debug problems, observe training curves, prepare training/validation/test data](https://reader036.vdocuments.us/reader036/viewer/2022062606/5fe6f72049d8616b751f1f0d/html5/thumbnails/39.jpg)
Dropout• Disable a random set of neurons (typically 50%)
39Prof. Leal-Taixé and Prof. Niessner Srivastava 2014
Fo
rward
![Page 40: Basics of DL•PyTorch (can use TensorFlow… •You have trained already several models and know how to debug problems, observe training curves, prepare training/validation/test data](https://reader036.vdocuments.us/reader036/viewer/2022062606/5fe6f72049d8616b751f1f0d/html5/thumbnails/40.jpg)
How to deal with images?
Prof. Leal-Taixé and Prof. Niessner 40
![Page 41: Basics of DL•PyTorch (can use TensorFlow… •You have trained already several models and know how to debug problems, observe training curves, prepare training/validation/test data](https://reader036.vdocuments.us/reader036/viewer/2022062606/5fe6f72049d8616b751f1f0d/html5/thumbnails/41.jpg)
Using CNNs in Computer Vision
Prof. Leal-Taixé and Prof. Niessner 41Credit: Li/Karpathy/Johnson
![Page 42: Basics of DL•PyTorch (can use TensorFlow… •You have trained already several models and know how to debug problems, observe training curves, prepare training/validation/test data](https://reader036.vdocuments.us/reader036/viewer/2022062606/5fe6f72049d8616b751f1f0d/html5/thumbnails/42.jpg)
Image filters• Each kernel gives us a different image filter
Prof. Leal-Taixé and Prof. Niessner 42
InputEdge detection−1 −1 −1−1 8 −1−1 −1 −1
Sharpen0 −1 0−1 5 −10 −1 0
Box mean1
9
1 1 11 1 11 1 1
Gaussian blur1
16
1 2 12 4 21 2 1
![Page 43: Basics of DL•PyTorch (can use TensorFlow… •You have trained already several models and know how to debug problems, observe training curves, prepare training/validation/test data](https://reader036.vdocuments.us/reader036/viewer/2022062606/5fe6f72049d8616b751f1f0d/html5/thumbnails/43.jpg)
Convolutions on RGB Images
Prof. Leal-Taixé and Prof. Niessner 43
32
32
3
35
5
32 × 32 × 3 image (pixels 𝑥)
5 × 5 × 3 filter (weights 𝑤)
1
28
28
activation map(also feature map)
Convolve
slide over all spatial locations 𝑥𝑖and compute all output 𝑧𝑖 ;w/o padding, there are 28 × 28 locations
![Page 44: Basics of DL•PyTorch (can use TensorFlow… •You have trained already several models and know how to debug problems, observe training curves, prepare training/validation/test data](https://reader036.vdocuments.us/reader036/viewer/2022062606/5fe6f72049d8616b751f1f0d/html5/thumbnails/44.jpg)
Convolution Layer
Prof. Leal-Taixé and Prof. Niessner 44
32
32
3
32 × 32 × 3 image
528
28
activation maps
Convolve
Let’s apply **five** filters,each with different weights!
Convolution “Layer”
![Page 45: Basics of DL•PyTorch (can use TensorFlow… •You have trained already several models and know how to debug problems, observe training curves, prepare training/validation/test data](https://reader036.vdocuments.us/reader036/viewer/2022062606/5fe6f72049d8616b751f1f0d/html5/thumbnails/45.jpg)
Prof. Leal-Taixé and Prof. Niessner 45
CNN PrototypeConvNet is concatenation of Conv Layers and activations
32
32
3
28
28
5
24
24
8
Conv +ReLU
Conv +ReLU
Conv +ReLU
12
5 filters5 × 5 × 3
8 filters5 × 5 × 5
12 filters5 × 5 × 8
Input Image
20
![Page 46: Basics of DL•PyTorch (can use TensorFlow… •You have trained already several models and know how to debug problems, observe training curves, prepare training/validation/test data](https://reader036.vdocuments.us/reader036/viewer/2022062606/5fe6f72049d8616b751f1f0d/html5/thumbnails/46.jpg)
CNN learned filters
Prof. Leal-Taixé and Prof. Niessner 46
![Page 47: Basics of DL•PyTorch (can use TensorFlow… •You have trained already several models and know how to debug problems, observe training curves, prepare training/validation/test data](https://reader036.vdocuments.us/reader036/viewer/2022062606/5fe6f72049d8616b751f1f0d/html5/thumbnails/47.jpg)
Prof. Leal-Taixé and Prof. Niessner 47
Pooling Layer: Max Pooling
3 1 3 5
6 0 7 9
3 2 1 4
0 2 4 3
6 9
3 4
Single depth slice of input
Max pool with2 × 2 filters and stride 2
‘Pooled’ output
![Page 48: Basics of DL•PyTorch (can use TensorFlow… •You have trained already several models and know how to debug problems, observe training curves, prepare training/validation/test data](https://reader036.vdocuments.us/reader036/viewer/2022062606/5fe6f72049d8616b751f1f0d/html5/thumbnails/48.jpg)
Classic CNN architectures
Prof. Leal-Taixé and Prof. Niessner 48
![Page 49: Basics of DL•PyTorch (can use TensorFlow… •You have trained already several models and know how to debug problems, observe training curves, prepare training/validation/test data](https://reader036.vdocuments.us/reader036/viewer/2022062606/5fe6f72049d8616b751f1f0d/html5/thumbnails/49.jpg)
LeNet• Digit recognition: 10 classes
• Conv -> Pool -> Conv -> Pool -> Conv -> FC• As we go deeper: Width, height Number of filters
Prof. Leal-Taixé and Prof. Niessner 49
60k parameters
![Page 50: Basics of DL•PyTorch (can use TensorFlow… •You have trained already several models and know how to debug problems, observe training curves, prepare training/validation/test data](https://reader036.vdocuments.us/reader036/viewer/2022062606/5fe6f72049d8616b751f1f0d/html5/thumbnails/50.jpg)
AlexNet
• Softmax for 1000 classes
Prof. Leal-Taixé and Prof. Niessner 50
[Krizhevsky et al. 2012]
![Page 51: Basics of DL•PyTorch (can use TensorFlow… •You have trained already several models and know how to debug problems, observe training curves, prepare training/validation/test data](https://reader036.vdocuments.us/reader036/viewer/2022062606/5fe6f72049d8616b751f1f0d/html5/thumbnails/51.jpg)
VGGNet• Striving for simplicity
• CONV = 3x3 filters with stride 1, same convolutions
• MAXPOOL = 2x2 filters with stride 2
Prof. Leal-Taixé and Prof. Niessner 51
[Simonyan and Zisserman 2014]
![Page 52: Basics of DL•PyTorch (can use TensorFlow… •You have trained already several models and know how to debug problems, observe training curves, prepare training/validation/test data](https://reader036.vdocuments.us/reader036/viewer/2022062606/5fe6f72049d8616b751f1f0d/html5/thumbnails/52.jpg)
VGGNet
Prof. Leal-Taixé and Prof. Niessner 52
Conv=3x3,s=1,sameMaxpool=2x2,s=2
Still very common: VGG-16
![Page 53: Basics of DL•PyTorch (can use TensorFlow… •You have trained already several models and know how to debug problems, observe training curves, prepare training/validation/test data](https://reader036.vdocuments.us/reader036/viewer/2022062606/5fe6f72049d8616b751f1f0d/html5/thumbnails/53.jpg)
ResNet
Prof. Leal-Taixé and Prof. Niessner 53
[He et al. 2015]
![Page 54: Basics of DL•PyTorch (can use TensorFlow… •You have trained already several models and know how to debug problems, observe training curves, prepare training/validation/test data](https://reader036.vdocuments.us/reader036/viewer/2022062606/5fe6f72049d8616b751f1f0d/html5/thumbnails/54.jpg)
ResNet
- Xavier/2 init by He et al.• Xavier/2 initialization• SGD + Momentum (0.9)• Learning rate 0.1, divided by 10 when plateau• Mini-batch size 256• Weight decay of 1e-5• No dropout
Prof. Leal-Taixé and Prof. Niessner 54
[He et al. 2015]
![Page 55: Basics of DL•PyTorch (can use TensorFlow… •You have trained already several models and know how to debug problems, observe training curves, prepare training/validation/test data](https://reader036.vdocuments.us/reader036/viewer/2022062606/5fe6f72049d8616b751f1f0d/html5/thumbnails/55.jpg)
ResNet• If we make the network deeper, at some point
performance starts to degrade
• Too many parameters, the optimizer cannot properly train the network
Prof. Leal-Taixé and Prof. Niessner 55
![Page 56: Basics of DL•PyTorch (can use TensorFlow… •You have trained already several models and know how to debug problems, observe training curves, prepare training/validation/test data](https://reader036.vdocuments.us/reader036/viewer/2022062606/5fe6f72049d8616b751f1f0d/html5/thumbnails/56.jpg)
ResNet• If we make the network deeper, at some point
performance starts to degrade
Prof. Leal-Taixé and Prof. Niessner 56
![Page 57: Basics of DL•PyTorch (can use TensorFlow… •You have trained already several models and know how to debug problems, observe training curves, prepare training/validation/test data](https://reader036.vdocuments.us/reader036/viewer/2022062606/5fe6f72049d8616b751f1f0d/html5/thumbnails/57.jpg)
Inception layer
Prof. Leal-Taixé and Prof. Niessner 57
![Page 58: Basics of DL•PyTorch (can use TensorFlow… •You have trained already several models and know how to debug problems, observe training curves, prepare training/validation/test data](https://reader036.vdocuments.us/reader036/viewer/2022062606/5fe6f72049d8616b751f1f0d/html5/thumbnails/58.jpg)
GoogLeNet: using the inception layer
Prof. Leal-Taixé and Prof. Niessner 58
[Szegedy et al. 2014]
Inception block
![Page 59: Basics of DL•PyTorch (can use TensorFlow… •You have trained already several models and know how to debug problems, observe training curves, prepare training/validation/test data](https://reader036.vdocuments.us/reader036/viewer/2022062606/5fe6f72049d8616b751f1f0d/html5/thumbnails/59.jpg)
CNN Architectures
Prof. Leal-Taixé and Prof. Niessner 59
![Page 60: Basics of DL•PyTorch (can use TensorFlow… •You have trained already several models and know how to debug problems, observe training curves, prepare training/validation/test data](https://reader036.vdocuments.us/reader036/viewer/2022062606/5fe6f72049d8616b751f1f0d/html5/thumbnails/60.jpg)
ADL4CV Content
Prof. Leal-Taixé and Prof. Niessner 64
![Page 61: Basics of DL•PyTorch (can use TensorFlow… •You have trained already several models and know how to debug problems, observe training curves, prepare training/validation/test data](https://reader036.vdocuments.us/reader036/viewer/2022062606/5fe6f72049d8616b751f1f0d/html5/thumbnails/61.jpg)
Rough Outline• Lecture 1: introduction• Lecture 2: advanced architectures (e.g. siamese, capsules, attention) • Lecture 3: advanced architectures con’t• Lecture 4: Visualization, t-sne, grad-cam (active heatmaps), deep dream,
excitation backprop• Lecture 5: Bayesian Deep Learning• Lecture 6: Autoencoders, VAE
Lecture 7: GANs 1: Generative models, GANs.• Lecture 8: GANs 2: Generative models, GANs• Lecture 9: CNN++ / Audio<->Visual - autoregressive, pixelcnn• Lecture 10: RNN -> NLP <-> Visual Q&A (focus on the cross domain: CNN
for image, RNN for text) / • Lecture 11: Multi-dimensional CNN, 3D DL, video DL: pooling vs fully-conv,
operations… Self-supervised / unsupervised learning• Lecture 12: Domain Adaptation / Transfer LearningProf. Leal-Taixé and Prof. Niessner 65
![Page 62: Basics of DL•PyTorch (can use TensorFlow… •You have trained already several models and know how to debug problems, observe training curves, prepare training/validation/test data](https://reader036.vdocuments.us/reader036/viewer/2022062606/5fe6f72049d8616b751f1f0d/html5/thumbnails/62.jpg)
ADL4CV• Use Moodle!
• Enjoy 6 vs 30-40ish relationship (make use of it to prepare yourself for research)
• Interactive Lectures!
Prof. Leal-Taixé and Prof. Niessner 66
![Page 63: Basics of DL•PyTorch (can use TensorFlow… •You have trained already several models and know how to debug problems, observe training curves, prepare training/validation/test data](https://reader036.vdocuments.us/reader036/viewer/2022062606/5fe6f72049d8616b751f1f0d/html5/thumbnails/63.jpg)
How to train your neural network?
Prof. Leal-Taixé and Prof. Niessner 67
![Page 64: Basics of DL•PyTorch (can use TensorFlow… •You have trained already several models and know how to debug problems, observe training curves, prepare training/validation/test data](https://reader036.vdocuments.us/reader036/viewer/2022062606/5fe6f72049d8616b751f1f0d/html5/thumbnails/64.jpg)
Is data loading correct?• Data output (target): overfit to single training sample
(needs to have 100% because it just memorizes input)– It’s irrespective of input !!!
• Data input: overfit to a handful (e.g., 4) training samples– It’s now conditioned on input data
• Save and re-load data from PyTorch / TensorFlow
Prof. Leal-Taixé and Prof. Niessner 68
![Page 65: Basics of DL•PyTorch (can use TensorFlow… •You have trained already several models and know how to debug problems, observe training curves, prepare training/validation/test data](https://reader036.vdocuments.us/reader036/viewer/2022062606/5fe6f72049d8616b751f1f0d/html5/thumbnails/65.jpg)
Network debugging• Move from overfitting to a hand-full of samples
– 5, 10, 100, 1000…– At some point, we should see generalization
• Apply common sense: can we overfit to the current number of samples?
• Always be aware of network parameter count!
Prof. Leal-Taixé and Prof. Niessner 69
![Page 66: Basics of DL•PyTorch (can use TensorFlow… •You have trained already several models and know how to debug problems, observe training curves, prepare training/validation/test data](https://reader036.vdocuments.us/reader036/viewer/2022062606/5fe6f72049d8616b751f1f0d/html5/thumbnails/66.jpg)
Check timings• How long does each iteration take?
– Get precise timings!!!– If an iteration takes > 500ms, things get dicey…
• Where is the bottleneck: data loading vs backprop?– Speed up data loading: smaller resolutions, compression, train
from SSD – e.g., network training is good idea– Speed up backprop:
• Estimate total timings: how long until you see some pattern? How long till convergence?
Prof. Leal-Taixé and Prof. Niessner 70
![Page 67: Basics of DL•PyTorch (can use TensorFlow… •You have trained already several models and know how to debug problems, observe training curves, prepare training/validation/test data](https://reader036.vdocuments.us/reader036/viewer/2022062606/5fe6f72049d8616b751f1f0d/html5/thumbnails/67.jpg)
Network Architecture• 100% mistake so far: “let’s use super big network and
train for two weeks and we see where we stand.” [because we desperately need those 2%...]
• Start with simplest network possible: rule of thumb divide #layers you started with by 5.
• Get debug cycles down – ideally, minutes!!!
Prof. Leal-Taixé and Prof. Niessner 71
![Page 68: Basics of DL•PyTorch (can use TensorFlow… •You have trained already several models and know how to debug problems, observe training curves, prepare training/validation/test data](https://reader036.vdocuments.us/reader036/viewer/2022062606/5fe6f72049d8616b751f1f0d/html5/thumbnails/68.jpg)
Debugging• Need train/val/test curves
– Evaluation needs to be consistent!– Numbers need to be comparable
• Only make one change at a time– “I’ve added 5 more layers and double the training size, and
now I also trained 5 days longer” – it’s better, but WHY?
Prof. Leal-Taixé and Prof. Niessner 72
![Page 69: Basics of DL•PyTorch (can use TensorFlow… •You have trained already several models and know how to debug problems, observe training curves, prepare training/validation/test data](https://reader036.vdocuments.us/reader036/viewer/2022062606/5fe6f72049d8616b751f1f0d/html5/thumbnails/69.jpg)
OverfittingONLY THINKG ABOUT THIS ONCE YOU’R TRAINING LOSS GOES DOWN AND YOU CAN OVERFIT!
Typically try this order:• Network too big – makes things also faster • More regularization; e.g., weight decay• Not enough data - makes things slower!• Dropout - makes things slower!• Guideline: make training harder -> generalize better
Prof. Leal-Taixé and Prof. Niessner 73
![Page 70: Basics of DL•PyTorch (can use TensorFlow… •You have trained already several models and know how to debug problems, observe training curves, prepare training/validation/test data](https://reader036.vdocuments.us/reader036/viewer/2022062606/5fe6f72049d8616b751f1f0d/html5/thumbnails/70.jpg)
Pushing the limits!PROCEED ONLY IF YOU GENERALIZE AND YOU ADDRESSED OVERFITTING ISSUES!
• Bigger network -> more capacity, more power - needs also more data!
• Better architecture -> ResNet is typically standard, but InceptionNet architectures perform often better (e.g., InceptionNet v4, XceptionNet, etc.)
• Schedules for learning rate decay• Class-based re-weighting (e.g., give under-represented classes
higher weight)• Hyperparameter tuning: e.g., grid search; apply common sense!
Prof. Leal-Taixé and Prof. Niessner 74
![Page 71: Basics of DL•PyTorch (can use TensorFlow… •You have trained already several models and know how to debug problems, observe training curves, prepare training/validation/test data](https://reader036.vdocuments.us/reader036/viewer/2022062606/5fe6f72049d8616b751f1f0d/html5/thumbnails/71.jpg)
Bad signs…• Train error doesn’t go down…• Validation error doesn’t go down… (ahhh we don’t learn)• Validation performs better than train… (trust me, this
scenario is very unlikely – unless you have a bug )• Test on train set is different error than train… (forgot
dropout?)• Often people mess up the last batch in an epoch…
• You are training set contains test data…• You debug your algorithm on test data…
Prof. Leal-Taixé and Prof. Niessner 75
![Page 72: Basics of DL•PyTorch (can use TensorFlow… •You have trained already several models and know how to debug problems, observe training curves, prepare training/validation/test data](https://reader036.vdocuments.us/reader036/viewer/2022062606/5fe6f72049d8616b751f1f0d/html5/thumbnails/72.jpg)
“Most common” neural net mistakes1) you didn't try to overfit a single batch first. 2) you forgot to toggle train/eval mode for the net. 3) you forgot to .zero_grad() (in pytorch) before
.backward(). 4) you passed softmaxed outputs to a loss that expects
raw logits. 5) you didn't use bias=False for your Linear/Conv2d
layer when using BatchNorm, or conversely forget to include it for the output layer
Prof. Leal-Taixé and Prof. Niessner 76Credit: A. Karpathy
![Page 73: Basics of DL•PyTorch (can use TensorFlow… •You have trained already several models and know how to debug problems, observe training curves, prepare training/validation/test data](https://reader036.vdocuments.us/reader036/viewer/2022062606/5fe6f72049d8616b751f1f0d/html5/thumbnails/73.jpg)
Next lecture
• Next Monday: advanced architectures– Lecture always from 10am to 11:30am
• Keep projects in mind!– Start actively discussing -> reach out to us if you have
questions!
Prof. Leal-Taixé and Prof. Niessner 77