deep learning basics lecture 6: convolutional nn · figure from deep learning, by goodfellow,...

36
Deep Learning Basics Lecture 6: Convolutional NN Princeton University COS 495 Instructor: Yingyu Liang

Upload: others

Post on 26-May-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Deep Learning Basics Lecture 6: Convolutional NN · Figure from Deep Learning, by Goodfellow, Bengio, and Courville the same weight shared for all output nodes ... •Theano –CPU/GPU

Deep Learning Basics Lecture 6: Convolutional NN

Princeton University COS 495

Instructor: Yingyu Liang

Page 2: Deep Learning Basics Lecture 6: Convolutional NN · Figure from Deep Learning, by Goodfellow, Bengio, and Courville the same weight shared for all output nodes ... •Theano –CPU/GPU

Review: convolutional layers

Page 3: Deep Learning Basics Lecture 6: Convolutional NN · Figure from Deep Learning, by Goodfellow, Bengio, and Courville the same weight shared for all output nodes ... •Theano –CPU/GPU

Convolution: two dimensional case

a b c d

e f g h

i j k l

w x

y z

bw + cx + fy + gz

wa + bx + ey + fz

Kernel/filter

Feature map

Input

Page 4: Deep Learning Basics Lecture 6: Convolutional NN · Figure from Deep Learning, by Goodfellow, Bengio, and Courville the same weight shared for all output nodes ... •Theano –CPU/GPU

Convolutional layers

Figure from Deep Learning, by Goodfellow, Bengio, and Courville

the same weight shared for all output nodes

𝑚 output nodes

𝑛 input nodes

𝑘 kernel size

Page 5: Deep Learning Basics Lecture 6: Convolutional NN · Figure from Deep Learning, by Goodfellow, Bengio, and Courville the same weight shared for all output nodes ... •Theano –CPU/GPU

Terminology

Figure from Deep Learning, by Goodfellow, Bengio, and Courville

Page 6: Deep Learning Basics Lecture 6: Convolutional NN · Figure from Deep Learning, by Goodfellow, Bengio, and Courville the same weight shared for all output nodes ... •Theano –CPU/GPU

Case study: LeNet-5

Page 7: Deep Learning Basics Lecture 6: Convolutional NN · Figure from Deep Learning, by Goodfellow, Bengio, and Courville the same weight shared for all output nodes ... •Theano –CPU/GPU

LeNet-5

• Proposed in “Gradient-based learning applied to document recognition” , by Yann LeCun, Leon Bottou, Yoshua Bengio and Patrick Haffner, in Proceedings of the IEEE, 1998

Page 8: Deep Learning Basics Lecture 6: Convolutional NN · Figure from Deep Learning, by Goodfellow, Bengio, and Courville the same weight shared for all output nodes ... •Theano –CPU/GPU

LeNet-5

• Proposed in “Gradient-based learning applied to document recognition” , by Yann LeCun, Leon Bottou, Yoshua Bengio and Patrick Haffner, in Proceedings of the IEEE, 1998

• Apply convolution on 2D images (MNIST) and use backpropagation

Page 9: Deep Learning Basics Lecture 6: Convolutional NN · Figure from Deep Learning, by Goodfellow, Bengio, and Courville the same weight shared for all output nodes ... •Theano –CPU/GPU

LeNet-5

• Proposed in “Gradient-based learning applied to document recognition” , by Yann LeCun, Leon Bottou, Yoshua Bengio and Patrick Haffner, in Proceedings of the IEEE, 1998

• Apply convolution on 2D images (MNIST) and use backpropagation

• Structure: 2 convolutional layers (with pooling) + 3 fully connected layers• Input size: 32x32x1

• Convolution kernel size: 5x5

• Pooling: 2x2

Page 10: Deep Learning Basics Lecture 6: Convolutional NN · Figure from Deep Learning, by Goodfellow, Bengio, and Courville the same weight shared for all output nodes ... •Theano –CPU/GPU

LeNet-5

Figure from Gradient-based learning applied to document recognition,by Y. LeCun, L. Bottou, Y. Bengio and P. Haffner

Page 11: Deep Learning Basics Lecture 6: Convolutional NN · Figure from Deep Learning, by Goodfellow, Bengio, and Courville the same weight shared for all output nodes ... •Theano –CPU/GPU

LeNet-5

Figure from Gradient-based learning applied to document recognition,by Y. LeCun, L. Bottou, Y. Bengio and P. Haffner

Page 12: Deep Learning Basics Lecture 6: Convolutional NN · Figure from Deep Learning, by Goodfellow, Bengio, and Courville the same weight shared for all output nodes ... •Theano –CPU/GPU

LeNet-5

Figure from Gradient-based learning applied to document recognition,by Y. LeCun, L. Bottou, Y. Bengio and P. Haffner

Filter: 5x5, stride: 1x1, #filters: 6

Page 13: Deep Learning Basics Lecture 6: Convolutional NN · Figure from Deep Learning, by Goodfellow, Bengio, and Courville the same weight shared for all output nodes ... •Theano –CPU/GPU

LeNet-5

Figure from Gradient-based learning applied to document recognition,by Y. LeCun, L. Bottou, Y. Bengio and P. Haffner

Pooling: 2x2, stride: 2

Page 14: Deep Learning Basics Lecture 6: Convolutional NN · Figure from Deep Learning, by Goodfellow, Bengio, and Courville the same weight shared for all output nodes ... •Theano –CPU/GPU

LeNet-5

Figure from Gradient-based learning applied to document recognition,by Y. LeCun, L. Bottou, Y. Bengio and P. Haffner

Filter: 5x5x6, stride: 1x1, #filters: 16

Page 15: Deep Learning Basics Lecture 6: Convolutional NN · Figure from Deep Learning, by Goodfellow, Bengio, and Courville the same weight shared for all output nodes ... •Theano –CPU/GPU

LeNet-5

Figure from Gradient-based learning applied to document recognition,by Y. LeCun, L. Bottou, Y. Bengio and P. Haffner

Pooling: 2x2, stride: 2

Page 16: Deep Learning Basics Lecture 6: Convolutional NN · Figure from Deep Learning, by Goodfellow, Bengio, and Courville the same weight shared for all output nodes ... •Theano –CPU/GPU

LeNet-5

Figure from Gradient-based learning applied to document recognition,by Y. LeCun, L. Bottou, Y. Bengio and P. Haffner

Weight matrix: 400x120

Page 17: Deep Learning Basics Lecture 6: Convolutional NN · Figure from Deep Learning, by Goodfellow, Bengio, and Courville the same weight shared for all output nodes ... •Theano –CPU/GPU

LeNet-5

Figure from Gradient-based learning applied to document recognition,by Y. LeCun, L. Bottou, Y. Bengio and P. Haffner

Weight matrix: 120x84

Weight matrix: 84x10

Page 18: Deep Learning Basics Lecture 6: Convolutional NN · Figure from Deep Learning, by Goodfellow, Bengio, and Courville the same weight shared for all output nodes ... •Theano –CPU/GPU

Software platforms for CNNUpdated in April 2016; checked more recent ones online

Page 19: Deep Learning Basics Lecture 6: Convolutional NN · Figure from Deep Learning, by Goodfellow, Bengio, and Courville the same weight shared for all output nodes ... •Theano –CPU/GPU

Platform: Marvin (marvin.is)

Page 20: Deep Learning Basics Lecture 6: Convolutional NN · Figure from Deep Learning, by Goodfellow, Bengio, and Courville the same weight shared for all output nodes ... •Theano –CPU/GPU

Platform: Marvin by

Page 21: Deep Learning Basics Lecture 6: Convolutional NN · Figure from Deep Learning, by Goodfellow, Bengio, and Courville the same weight shared for all output nodes ... •Theano –CPU/GPU

LeNet in Marvin: convolutional layer

Page 22: Deep Learning Basics Lecture 6: Convolutional NN · Figure from Deep Learning, by Goodfellow, Bengio, and Courville the same weight shared for all output nodes ... •Theano –CPU/GPU

LeNet in Marvin: pooling layer

Page 23: Deep Learning Basics Lecture 6: Convolutional NN · Figure from Deep Learning, by Goodfellow, Bengio, and Courville the same weight shared for all output nodes ... •Theano –CPU/GPU

LeNet in Marvin: fully connected layer

Page 24: Deep Learning Basics Lecture 6: Convolutional NN · Figure from Deep Learning, by Goodfellow, Bengio, and Courville the same weight shared for all output nodes ... •Theano –CPU/GPU

Platform: Caffe (caffe.berkeleyvision.org)

Page 25: Deep Learning Basics Lecture 6: Convolutional NN · Figure from Deep Learning, by Goodfellow, Bengio, and Courville the same weight shared for all output nodes ... •Theano –CPU/GPU

LeNet in Caffe

Page 26: Deep Learning Basics Lecture 6: Convolutional NN · Figure from Deep Learning, by Goodfellow, Bengio, and Courville the same weight shared for all output nodes ... •Theano –CPU/GPU

Platform: Tensorflow (tensorflow.org)

Page 27: Deep Learning Basics Lecture 6: Convolutional NN · Figure from Deep Learning, by Goodfellow, Bengio, and Courville the same weight shared for all output nodes ... •Theano –CPU/GPU

Platform: Tensorflow (tensorflow.org)

Page 28: Deep Learning Basics Lecture 6: Convolutional NN · Figure from Deep Learning, by Goodfellow, Bengio, and Courville the same weight shared for all output nodes ... •Theano –CPU/GPU

Platform: Tensorflow (tensorflow.org)

Page 29: Deep Learning Basics Lecture 6: Convolutional NN · Figure from Deep Learning, by Goodfellow, Bengio, and Courville the same weight shared for all output nodes ... •Theano –CPU/GPU

Others

• Theano – CPU/GPU symbolic expression compiler in python (from MILA lab at University of Montreal)

• Torch – provides a Matlab-like environment for state-of-the-art machine learning algorithms in lua

• Lasagne - Lasagne is a lightweight library to build and train neural networks in Theano

• See: http://deeplearning.net/software_links/

Page 30: Deep Learning Basics Lecture 6: Convolutional NN · Figure from Deep Learning, by Goodfellow, Bengio, and Courville the same weight shared for all output nodes ... •Theano –CPU/GPU

Optimization: momentum

Page 31: Deep Learning Basics Lecture 6: Convolutional NN · Figure from Deep Learning, by Goodfellow, Bengio, and Courville the same weight shared for all output nodes ... •Theano –CPU/GPU

Basic algorithms

• Minimize the (regularized) empirical loss

𝐿𝑅 𝜃 =1

𝑛σ𝑡=1

𝑛 𝑙(𝜃, 𝑥𝑡 , 𝑦𝑡) + 𝑅(𝜃)

where the hypothesis is parametrized by 𝜃

• Gradient descent𝜃𝑡+1 = 𝜃𝑡 − 𝜂𝑡𝛻𝐿𝑅 𝜃𝑡

Page 32: Deep Learning Basics Lecture 6: Convolutional NN · Figure from Deep Learning, by Goodfellow, Bengio, and Courville the same weight shared for all output nodes ... •Theano –CPU/GPU

Mini-batch stochastic gradient descent

• Instead of one data point, work with a small batch of 𝑏 points

(𝑥𝑡𝑏+1,𝑦𝑡𝑏+1),…, (𝑥𝑡𝑏+𝑏,𝑦𝑡𝑏+𝑏)

• Update rule

𝜃𝑡+1 = 𝜃𝑡 − 𝜂𝑡𝛻1

𝑏

1≤𝑖≤𝑏

𝑙 𝜃𝑡 , 𝑥𝑡𝑏+𝑖 , 𝑦𝑡𝑏+𝑖 + 𝑅(𝜃𝑡)

Page 33: Deep Learning Basics Lecture 6: Convolutional NN · Figure from Deep Learning, by Goodfellow, Bengio, and Courville the same weight shared for all output nodes ... •Theano –CPU/GPU

Momentum

• Drawback of SGD: can be slow when gradient is small

• Observation: when the gradient is consistent across consecutive steps, can take larger steps

• Metaphor: rolling marble ball on gentle slope

Page 34: Deep Learning Basics Lecture 6: Convolutional NN · Figure from Deep Learning, by Goodfellow, Bengio, and Courville the same weight shared for all output nodes ... •Theano –CPU/GPU

Momentum

Figure from Deep Learning, by Goodfellow, Bengio, and Courville

Contour: loss functionPath: SGD with momentumArrow: stochastic gradient

Page 35: Deep Learning Basics Lecture 6: Convolutional NN · Figure from Deep Learning, by Goodfellow, Bengio, and Courville the same weight shared for all output nodes ... •Theano –CPU/GPU

Momentum

• work with a small batch of 𝑏 points

(𝑥𝑡𝑏+1,𝑦𝑡𝑏+1),…, (𝑥𝑡𝑏+𝑏,𝑦𝑡𝑏+𝑏)

• Keep a momentum variable 𝑣𝑡, and set a decay rate 𝛼

• Update rule

𝑣𝑡 = 𝛼𝑣𝑡−1 − 𝜂𝑡𝛻1

𝑏

1≤𝑖≤𝑏

𝑙 𝜃𝑡 , 𝑥𝑡𝑏+𝑖 , 𝑦𝑡𝑏+𝑖 + 𝑅(𝜃𝑡)

𝜃𝑡+1 = 𝜃𝑡 + 𝑣𝑡

Page 36: Deep Learning Basics Lecture 6: Convolutional NN · Figure from Deep Learning, by Goodfellow, Bengio, and Courville the same weight shared for all output nodes ... •Theano –CPU/GPU

Momentum

• Keep a momentum variable 𝑣𝑡, and set a decay rate 𝛼

• Update rule

𝑣𝑡 = 𝛼𝑣𝑡−1 − 𝜂𝑡𝛻1

𝑏

1≤𝑖≤𝑏

𝑙 𝜃𝑡 , 𝑥𝑡𝑏+𝑖 , 𝑦𝑡𝑏+𝑖 + 𝑅(𝜃𝑡)

𝜃𝑡+1 = 𝜃𝑡 + 𝑣𝑡

• Practical guide: 𝛼 is set to 0.5 until the initial learning stabilizes and then is increased to 0.9 or higher.