Deep Learning Basics Lecture 1: Feedforward
Princeton University COS 495
Instructor: Yingyu Liang
Motivation I: representation learning
Machine learning 1-2-3
• Collect data and extract features
• Build model: choose hypothesis class 𝓗 and loss function 𝑙
• Optimization: minimize the empirical loss
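The three steps can be sketched on a toy problem. This is a minimal illustration, not from the slides: the data are synthetic, the hypothesis class is linear predictors, the loss is squared loss, and minimizing the empirical loss reduces to ordinary least squares.

```python
import numpy as np

# Step 1: collect data and extract features (synthetic toy data here).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))        # feature vectors
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true                        # labels from a noiseless linear target

# Step 2: hypothesis class = linear predictors y = w^T x,
#         loss l(w; x, y) = (w^T x - y)^2 (squared loss).
# Step 3: minimize the empirical loss; for squared loss this is
#         ordinary least squares, solvable in closed form.
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.allclose(w_hat, w_true))    # recovers the target exactly
```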
Features
Example: color histogram (red, green, blue channels)
Extract features 𝜙(𝑥) from input 𝑥
Build hypothesis 𝑦 = 𝑤^𝑇𝜙(𝑥)
Features: part of the model
Build hypothesis 𝑦 = 𝑤^𝑇𝜙(𝑥)
Linear model
Nonlinear model
Example: Polynomial kernel SVM
[Figure: data plotted in (𝑥1, 𝑥2) space]
𝑦 = sign(𝑤^𝑇𝜙(𝑥) + 𝑏)
Fixed 𝜙(𝑥)
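A sketch of the fixed feature map idea, with illustrative (not learned) values: the degree-2 polynomial map below is the kind of 𝜙(𝑥) a polynomial kernel implicitly uses, and the weights and bias are assumed for the example rather than fit by an actual SVM.

```python
import numpy as np

def phi(x):
    """Hypothetical degree-2 polynomial feature map for x = (x1, x2)."""
    x1, x2 = x
    return np.array([x1, x2, x1 * x1, np.sqrt(2) * x1 * x2, x2 * x2])

# Assumed weights w and bias b for illustration; a real SVM would learn them.
w = np.array([0.0, 0.0, 1.0, 0.0, 1.0])   # scores x1^2 + x2^2
b = -1.0

def predict(x):
    return np.sign(w @ phi(x) + b)        # y = sign(w^T phi(x) + b)

print(predict(np.array([2.0, 0.0])))      # outside the unit circle -> 1.0
print(predict(np.array([0.1, 0.1])))      # inside the unit circle -> -1.0
```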
Motivation: representation learning
• Why don’t we also learn 𝜙(𝑥)?
[Diagram: 𝑥 → learn 𝜙(𝑥) → learn 𝑤 → 𝑦 = 𝑤^𝑇𝜙(𝑥)]
Feedforward networks
• View each dimension of 𝜙(𝑥) as something to be learned
[Diagram: 𝑥 → 𝜙(𝑥) → 𝑦 = 𝑤^𝑇𝜙(𝑥)]
Feedforward networks
• Linear functions 𝜙𝑖(𝑥) = 𝜃𝑖^𝑇𝑥 don’t work: the composition of linear maps is still linear, so some nonlinearity is needed
[Diagram: 𝑥 → 𝜙(𝑥) → 𝑦 = 𝑤^𝑇𝜙(𝑥)]
Feedforward networks
• Typically, set 𝜙𝑖(𝑥) = 𝑟(𝜃𝑖^𝑇𝑥), where 𝑟(⋅) is some nonlinear function
[Diagram: 𝑥 → 𝜙(𝑥) → 𝑦 = 𝑤^𝑇𝜙(𝑥)]
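A minimal forward pass for this one-hidden-layer construction, with randomly drawn parameters standing in for learned ones (the choice of tanh as 𝑟 is just one option):

```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 4, 8                       # input dim, number of features phi_i(x)

# Assumed random parameters for illustration; training would learn these.
Theta = rng.normal(size=(m, d))   # rows are the theta_i
w = rng.normal(size=m)

def r(z):
    return np.tanh(z)             # any nonlinear r(.) works here

def forward(x):
    phi = r(Theta @ x)            # phi_i(x) = r(theta_i^T x)
    return w @ phi                # y = w^T phi(x)

x = rng.normal(size=d)
print(forward(x))
```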
Feedforward deep networks
• What if we go deeper?
[Diagram: 𝑥 → ℎ1 → ℎ2 → ⋯ → ℎ𝐿 → 𝑦]
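Going deeper just means chaining this construction: each hidden layer feeds the next. A sketch with assumed layer sizes and random weights, using a linear output for regression:

```python
import numpy as np

rng = np.random.default_rng(0)
sizes = [4, 8, 8, 8, 1]           # x -> h1 -> h2 -> h3 -> y (L = 3 here)

# One assumed weight matrix and bias vector per layer.
Ws = [rng.normal(size=(n_out, n_in)) * 0.5
      for n_in, n_out in zip(sizes[:-1], sizes[1:])]
bs = [np.zeros(n_out) for n_out in sizes[1:]]

def forward(x):
    h = x
    for i, (W, b) in enumerate(zip(Ws, bs)):
        z = W @ h + b
        # nonlinearity on hidden layers; linear output layer for regression
        h = np.tanh(z) if i < len(Ws) - 1 else z
    return h

y = forward(rng.normal(size=4))
print(y.shape)                    # (1,)
```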
Figure from Deep Learning, by Goodfellow, Bengio, Courville. Dark boxes are things to be learned.
Motivation II: neurons
Motivation: neurons
Figure from Wikipedia
Motivation: abstract neuron model
• Neuron activated when the correlation between the input and a pattern 𝜃 exceeds some threshold 𝑏
• 𝑦 = threshold(𝜃^𝑇𝑥 − 𝑏), or 𝑦 = 𝑟(𝜃^𝑇𝑥 − 𝑏)
• 𝑟(⋅) is called the activation function
[Diagram: inputs 𝑥1, 𝑥2, …, 𝑥𝑑 → neuron → 𝑦]
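The abstract neuron in a few lines, with an assumed pattern 𝜃 and threshold 𝑏 for illustration:

```python
import numpy as np

def neuron(x, theta, b):
    """Threshold neuron: fires (outputs 1) when the correlation
    theta^T x exceeds the threshold b."""
    return 1.0 if theta @ x - b >= 0 else 0.0

theta = np.array([1.0, 1.0])      # assumed pattern
b = 1.5                           # assumed threshold

print(neuron(np.array([1.0, 1.0]), theta, b))   # 2.0 - 1.5 >= 0 -> 1.0
print(neuron(np.array([1.0, 0.0]), theta, b))   # 1.0 - 1.5 <  0 -> 0.0
```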
Motivation: artificial neural networks
Motivation: artificial neural networks
• Put into layers: feedforward deep networks
[Diagram: 𝑥 → ℎ1 → ℎ2 → ⋯ → ℎ𝐿 → 𝑦]
Components in Feedforward networks
Components
• Representations:
  • Input
  • Hidden variables
• Layers/weights:
  • Hidden layers
  • Output layer
Components
[Diagram: input 𝑥 → first layer → hidden variables ℎ1, ℎ2, …, ℎ𝐿 → output layer → 𝑦]
Input
• Represented as a vector
• Sometimes requires some preprocessing, e.g.:
  • Subtract the mean
  • Normalize to [−1, 1]
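Both preprocessing steps in NumPy, on hypothetical raw inputs (e.g. pixel intensities in [0, 255]):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(0, 255, size=(100, 3))          # hypothetical raw inputs

X_centered = X - X.mean(axis=0)                 # subtract the mean per dimension
max_abs = np.abs(X_centered).max(axis=0)
X_scaled = X_centered / max_abs                 # normalize to [-1, 1]

print(np.allclose(X_centered.mean(axis=0), 0))  # centered
print(X_scaled.min() >= -1 and X_scaled.max() <= 1)
```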
Output layers
• Regression: 𝑦 = 𝑤^𝑇ℎ + 𝑏
• Linear units: no nonlinearity
[Diagram: ℎ → output layer → 𝑦]
Output layers
• Multi-dimensional regression: 𝑦 = 𝑊^𝑇ℎ + 𝑏
• Linear units: no nonlinearity
[Diagram: ℎ → output layer → 𝑦]
Output layers
• Binary classification: 𝑦 = 𝜎(𝑤^𝑇ℎ + 𝑏)
• Corresponds to using logistic regression on ℎ
[Diagram: ℎ → output layer → 𝑦]
Output layers
• Multi-class classification: 𝑦 = softmax(𝑧) where 𝑧 = 𝑊^𝑇ℎ + 𝑏
• Corresponds to using multi-class logistic regression on ℎ
[Diagram: ℎ → 𝑧 → output layer → 𝑦]
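The softmax output layer is easy to write down; here the weights 𝑊, bias 𝑏, and hidden vector ℎ are random placeholders, and the max-subtraction is a standard trick for numerical stability that does not change the result:

```python
import numpy as np

def softmax(z):
    z = z - z.max()               # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Assumed parameters: 3 classes on top of a 4-dimensional h.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))
b = np.zeros(3)
h = rng.normal(size=4)

z = W.T @ h + b                   # z = W^T h + b
y = softmax(z)                    # y = softmax(z)

print(y.sum())                    # probabilities sum to 1
```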
Hidden layers
• Each neuron takes a weighted linear combination of the previous layer
• So each neuron can be thought of as outputting one value to the next layer
[Diagram: layer ℎ𝑖 → layer ℎ𝑖+1]
Hidden layers
• 𝑦 = 𝑟(𝑤^𝑇𝑥 + 𝑏)
• Typical activation functions 𝑟:
  • Threshold: 𝑡(𝑧) = 𝕀[𝑧 ≥ 0]
  • Sigmoid: 𝜎(𝑧) = 1/(1 + exp(−𝑧))
  • Tanh: tanh(𝑧) = 2𝜎(2𝑧) − 1
[Diagram: 𝑥 → 𝑟(⋅) → 𝑦]
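These activations are one-liners in NumPy, and the tanh identity above can be checked numerically:

```python
import numpy as np

def threshold(z):
    return (z >= 0).astype(float)             # t(z) = 1[z >= 0]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))           # sigma(z) = 1/(1 + exp(-z))

z = np.linspace(-3, 3, 7)
# tanh(z) = 2*sigma(2z) - 1, as stated above
print(np.allclose(np.tanh(z), 2 * sigmoid(2 * z) - 1))   # True
```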
Hidden layers
• Problem: saturation — in the flat regions of the sigmoid/tanh curves the gradient is too small for learning to make progress
Figure borrowed from Pattern Recognition and Machine Learning, Bishop
Hidden layers
• Activation function ReLU (rectified linear unit): ReLU(𝑧) = max{𝑧, 0}
Figure from Deep learning, by Goodfellow, Bengio, Courville.
Hidden layers
• Activation function ReLU (rectified linear unit): ReLU(𝑧) = max{𝑧, 0}
• Gradient is 0 for 𝑧 < 0 and 1 for 𝑧 > 0, so it does not saturate on the positive side
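ReLU and its (almost-everywhere) gradient in NumPy:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)                 # ReLU(z) = max{z, 0}

def relu_grad(z):
    return (z > 0).astype(float)              # gradient 1 for z > 0, else 0

z = np.array([-2.0, -0.5, 0.5, 2.0])
print(relu(z))        # [0.  0.  0.5 2. ]
print(relu_grad(z))   # [0. 0. 1. 1.]
```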
Hidden layers
• Generalization of ReLU: gReLU(𝑧) = max{𝑧, 0} + 𝛼 min{𝑧, 0}
• Leaky ReLU: 𝛼 = 0.01, i.e., Leaky-ReLU(𝑧) = max{𝑧, 0} + 0.01 min{𝑧, 0}
• Parametric ReLU: 𝛼 learnable
[Figure: graph of gReLU(𝑧) against 𝑧]
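The whole ReLU family follows from the one generalized formula; leaky ReLU just fixes 𝛼 = 0.01, while parametric ReLU treats 𝛼 as a learnable parameter:

```python
import numpy as np

def g_relu(z, alpha):
    """gReLU(z) = max{z, 0} + alpha * min{z, 0}."""
    return np.maximum(z, 0.0) + alpha * np.minimum(z, 0.0)

def leaky_relu(z):
    return g_relu(z, 0.01)        # leaky ReLU: fixed alpha = 0.01

z = np.array([-2.0, 3.0])
print(g_relu(z, 0.0))             # alpha = 0 recovers plain ReLU: [0. 3.]
print(leaky_relu(z))              # [-0.02  3.  ]
# Parametric ReLU: same formula, with alpha learned during training.
```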