![Page 1: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com](https://reader031.vdocuments.us/reader031/viewer/2022031005/5b88e9ba7f8b9abe1e8c52a0/html5/thumbnails/1.jpg)
Large-Scale Visual Recognition With Deep Learning
Sunday 23 June 2013
Marc'Aurelio Ranzato
ranzato@google.com, …/~ranzato
![Slide 2](https://reader031.vdocuments.us/reader031/viewer/2022031005/5b88e9ba7f8b9abe1e8c52a0/html5/thumbnails/2.jpg)
Why Is Recognition Hard?
Object Recognizer → "panda"
Ranzato
![Slide 3](https://reader031.vdocuments.us/reader031/viewer/2022031005/5b88e9ba7f8b9abe1e8c52a0/html5/thumbnails/3.jpg)
Challenge: pose
![Slide 4](https://reader031.vdocuments.us/reader031/viewer/2022031005/5b88e9ba7f8b9abe1e8c52a0/html5/thumbnails/4.jpg)
Challenge: occlusion
![Slide 5](https://reader031.vdocuments.us/reader031/viewer/2022031005/5b88e9ba7f8b9abe1e8c52a0/html5/thumbnails/5.jpg)
Challenge: multiple objects
![Slide 6](https://reader031.vdocuments.us/reader031/viewer/2022031005/5b88e9ba7f8b9abe1e8c52a0/html5/thumbnails/6.jpg)
Challenge: inter-class similarity
![Slide 7](https://reader031.vdocuments.us/reader031/viewer/2022031005/5b88e9ba7f8b9abe1e8c52a0/html5/thumbnails/7.jpg)
Ideal Features
Ideal Feature Extractor output:
- window, top-left
- clock, top-middle
- shelf, left
- drawing, middle
- statue, bottom-left
- …
- hat, bottom-right

Q.: What objects are in the image? Where is the clock? What is on top of the table? ...
![Slide 8](https://reader031.vdocuments.us/reader031/viewer/2022031005/5b88e9ba7f8b9abe1e8c52a0/html5/thumbnails/8.jpg)
Ideal Features Are Non-Linear
I1 → Ideal Feature Extractor → club, angle = 90; man, frontal pose; ...
I2 → Ideal Feature Extractor → club, angle = 360; man, side pose; ...
? → Ideal Feature Extractor → club, angle = 270; man, frontal pose; ...
![Slide 9](https://reader031.vdocuments.us/reader031/viewer/2022031005/5b88e9ba7f8b9abe1e8c52a0/html5/thumbnails/9.jpg)
INPUT IS NOT THE AVERAGE!
![Slide 10](https://reader031.vdocuments.us/reader031/viewer/2022031005/5b88e9ba7f8b9abe1e8c52a0/html5/thumbnails/10.jpg)
![Slide 11](https://reader031.vdocuments.us/reader031/viewer/2022031005/5b88e9ba7f8b9abe1e8c52a0/html5/thumbnails/11.jpg)
The Manifold of Natural Images
![Slide 12](https://reader031.vdocuments.us/reader031/viewer/2022031005/5b88e9ba7f8b9abe1e8c52a0/html5/thumbnails/12.jpg)
The Manifold of Natural Images
We need to linearize the manifold: learn non-linear features!
![Slide 13](https://reader031.vdocuments.us/reader031/viewer/2022031005/5b88e9ba7f8b9abe1e8c52a0/html5/thumbnails/13.jpg)
Ideal Feature Extraction
[Figure: input space with axes Pixel 1, Pixel 2, …, Pixel n → Ideal Feature Extractor → feature space with axes Expression and Pose]
![Slide 14](https://reader031.vdocuments.us/reader031/viewer/2022031005/5b88e9ba7f8b9abe1e8c52a0/html5/thumbnails/14.jpg)
Learning Non-Linear Features
Features: $f(x; \theta)$
Q.: Which class of non-linear functions should we consider?
![Slide 15](https://reader031.vdocuments.us/reader031/viewer/2022031005/5b88e9ba7f8b9abe1e8c52a0/html5/thumbnails/15.jpg)
Learning Non-Linear Features
Given a dictionary of simple non-linear functions $g_1, \dots, g_n$:
Proposal #1: linear combination, $f(x) \approx \sum_j \alpha_j g_j(x)$
Proposal #2: composition, $f(x) \approx g_1(g_2(\dots g_n(x) \dots))$
![Slide 16](https://reader031.vdocuments.us/reader031/viewer/2022031005/5b88e9ba7f8b9abe1e8c52a0/html5/thumbnails/16.jpg)
Proposal #1 (linear combination) is SHALLOW: kernel learning, boosting, ...
Proposal #2 (composition) is DEEP: deep learning, scattering networks (wavelet cascade), S.C. Zhu & D. Mumford "grammar"
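To make the two proposals concrete, here is a minimal Python sketch. The three-function dictionary (tanh, ReLU, squaring) and the weights `alpha` are arbitrary illustrative choices, not ones from the talk: `shallow` sums weighted dictionary outputs, while `deep` nests the functions so that $g_n$ is applied first.

```python
import math

# Hypothetical dictionary of simple non-linear functions g_1, g_2, g_3.
G = [math.tanh, lambda t: max(0.0, t), lambda t: t * t]


def shallow(x, alpha):
    """Proposal #1 (linear combination): f(x) ~ sum_j alpha_j * g_j(x)."""
    return sum(a_j * g_j(x) for a_j, g_j in zip(alpha, G))


def deep(x):
    """Proposal #2 (composition): f(x) ~ g_1(g_2(g_3(x))), innermost first."""
    h = x
    for g_j in reversed(G):
        h = g_j(h)
    return h


print(shallow(0.5, [1.0, 2.0, 3.0]))  # tanh(0.5) + 2*0.5 + 3*0.25
print(deep(0.5))                      # tanh(max(0, 0.5**2)) = tanh(0.25)
```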
![Slide 17](https://reader031.vdocuments.us/reader031/viewer/2022031005/5b88e9ba7f8b9abe1e8c52a0/html5/thumbnails/17.jpg)
Linear Combination
Input image → template matchers → + → prediction of class
BAD: it may require an exponential number of templates!!!
![Slide 18](https://reader031.vdocuments.us/reader031/viewer/2022031005/5b88e9ba7f8b9abe1e8c52a0/html5/thumbnails/18.jpg)
Composition
Input image → low-level parts → mid-level parts → high-level parts → prediction of class
GOOD: (exponentially) more efficient, thanks to the reuse of intermediate parts and distributed representations
Lee et al. "Convolutional DBN's ..." ICML 2009
![Slide 19](https://reader031.vdocuments.us/reader031/viewer/2022031005/5b88e9ba7f8b9abe1e8c52a0/html5/thumbnails/19.jpg)
The Big Advantage of Deep Learning
Efficiency: intermediate concepts can be re-used
Zeiler, Fergus 2013
![Slide 20](https://reader031.vdocuments.us/reader031/viewer/2022031005/5b88e9ba7f8b9abe1e8c52a0/html5/thumbnails/20.jpg)
![Slide 21](https://reader031.vdocuments.us/reader031/viewer/2022031005/5b88e9ba7f8b9abe1e8c52a0/html5/thumbnails/21.jpg)
![Slide 22](https://reader031.vdocuments.us/reader031/viewer/2022031005/5b88e9ba7f8b9abe1e8c52a0/html5/thumbnails/22.jpg)
A Potential Problem with Deep Learning
Optimization is difficult: non-convex, non-linear system
[Figure: a system of four stacked layers, numbered 1 to 4]
![Slide 23](https://reader031.vdocuments.us/reader031/viewer/2022031005/5b88e9ba7f8b9abe1e8c52a0/html5/thumbnails/23.jpg)
Solution #1: freeze the first N-1 layers (engineer the features). This makes it shallow!
[Figure: SIFT → k-Means → Pooling → Classifier; only layer 4, the classifier, is learned]
![Slide 24](https://reader031.vdocuments.us/reader031/viewer/2022031005/5b88e9ba7f8b9abe1e8c52a0/html5/thumbnails/24.jpg)
Solution #2: live with it!
It will converge to a local minimum, but it is much more powerful!!
[Figure: all four layers, 1 to 4, are learned]
Given lots of data, engineer less and learn more!!
![Slide 25](https://reader031.vdocuments.us/reader031/viewer/2022031005/5b88e9ba7f8b9abe1e8c52a0/html5/thumbnails/25.jpg)
Deep Learning in Practice
Optimization is easy; you just need to know a few tricks of the trade.
Q: What's the feature extractor? And what's the classifier?
A: No distinction, end-to-end learning!
![Slide 26](https://reader031.vdocuments.us/reader031/viewer/2022031005/5b88e9ba7f8b9abe1e8c52a0/html5/thumbnails/26.jpg)
Deep Learning in Practice
It works very well in practice:
![Slide 27](https://reader031.vdocuments.us/reader031/viewer/2022031005/5b88e9ba7f8b9abe1e8c52a0/html5/thumbnails/27.jpg)
KEY IDEAS: WHY DEEP LEARNING
We need a non-linear system
We need to learn it from data
Build feature hierarchies (function composition)
End-to-end learning
![Slide 28](https://reader031.vdocuments.us/reader031/viewer/2022031005/5b88e9ba7f8b9abe1e8c52a0/html5/thumbnails/28.jpg)
Outline
Motivation
Deep Learning: The Big Picture
From neural nets to convolutional nets
Applications
A practical guide
![Slide 29](https://reader031.vdocuments.us/reader031/viewer/2022031005/5b88e9ba7f8b9abe1e8c52a0/html5/thumbnails/29.jpg)
![Slide 30](https://reader031.vdocuments.us/reader031/viewer/2022031005/5b88e9ba7f8b9abe1e8c52a0/html5/thumbnails/30.jpg)
What Is Deep Learning?
![Slide 31](https://reader031.vdocuments.us/reader031/viewer/2022031005/5b88e9ba7f8b9abe1e8c52a0/html5/thumbnails/31.jpg)
Buzz Words
It's Contrastive Divergence
It's a Convolutional Net
It's just old Neural Nets
It's Feature Learning
It's a Deep Belief Net
It's Unsupervised Learning
![Slide 32](https://reader031.vdocuments.us/reader031/viewer/2022031005/5b88e9ba7f8b9abe1e8c52a0/html5/thumbnails/32.jpg)
(My) Definition
A Deep Learning method is a method that makes predictions by using a sequence of non-linear processing stages. The resulting intermediate representations can be interpreted as feature hierarchies, and the whole system is jointly learned from data.
Some deep learning methods are probabilistic, others are loss-based; some are supervised, others unsupervised...
It's a large family!
![Slide 33](https://reader031.vdocuments.us/reader031/viewer/2022031005/5b88e9ba7f8b9abe1e8c52a0/html5/thumbnails/33.jpg)
THE SPACE OF MACHINE LEARNING METHODS
[Figure: Perceptron (1957)]
![Slide 34](https://reader031.vdocuments.us/reader031/viewer/2022031005/5b88e9ba7f8b9abe1e8c52a0/html5/thumbnails/34.jpg)
[Figure: adds Neural Net ('80s), AutoEncoders, BM]
![Slide 35](https://reader031.vdocuments.us/reader031/viewer/2022031005/5b88e9ba7f8b9abe1e8c52a0/html5/thumbnails/35.jpg)
[Figure: timeline 1957, '80s, '90s to early '00s; adds Conv. Net, Boosting, Decision Tree, SVM, GMM, Sparse Coding]
![Slide 36](https://reader031.vdocuments.us/reader031/viewer/2022031005/5b88e9ba7f8b9abe1e8c52a0/html5/thumbnails/36.jpg)
[Figure: 2006 added to the timeline; adds DBN, RBM]
![Slide 37](https://reader031.vdocuments.us/reader031/viewer/2022031005/5b88e9ba7f8b9abe1e8c52a0/html5/thumbnails/37.jpg)
[Figure: 2009 added to the timeline; adds ΣΠ, Bayes NP, DBM, D-AE]
![Slide 38](https://reader031.vdocuments.us/reader031/viewer/2022031005/5b88e9ba7f8b9abe1e8c52a0/html5/thumbnails/38.jpg)
[Figure: the same methods, with 2012 added to the timeline]
![Slide 39](https://reader031.vdocuments.us/reader031/viewer/2022031005/5b88e9ba7f8b9abe1e8c52a0/html5/thumbnails/39.jpg)
[Figure: the same methods arranged along a DEEP / SHALLOW axis]
![Slide 40](https://reader031.vdocuments.us/reader031/viewer/2022031005/5b88e9ba7f8b9abe1e8c52a0/html5/thumbnails/40.jpg)
[Figure: adds a second axis, Probabilistic Models / Neural Networks]
![Slide 41](https://reader031.vdocuments.us/reader031/viewer/2022031005/5b88e9ba7f8b9abe1e8c52a0/html5/thumbnails/41.jpg)
[Figure: adds a third distinction, Supervised / Unsupervised]
![Slide 42](https://reader031.vdocuments.us/reader031/viewer/2022031005/5b88e9ba7f8b9abe1e8c52a0/html5/thumbnails/42.jpg)
In this talk, we'll focus on convolutional networks.
![Slide 43](https://reader031.vdocuments.us/reader031/viewer/2022031005/5b88e9ba7f8b9abe1e8c52a0/html5/thumbnails/43.jpg)
Outline
Motivation
Deep Learning: The Big Picture
From neural nets to convolutional nets
Applications
A practical guide
![Slide 44](https://reader031.vdocuments.us/reader031/viewer/2022031005/5b88e9ba7f8b9abe1e8c52a0/html5/thumbnails/44.jpg)
Linear Classifier: SVM
Input: $x \in \mathbb{R}^D$
Binary label: $y \in \{-1, 1\}$
Parameters: $w \in \mathbb{R}^D$
Output prediction: $w^T x$
Loss: $L = \frac{1}{2}\|w\|^2 + \lambda \max[0, 1 - w^T x\, y]$
[Figure: hinge loss $L$ as a function of $w^T x\, y$, with the hinge at 1]
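For a single training example, the SVM loss and one (sub)gradient step can be sketched in plain Python as below. The function name `svm_loss_and_grad` and the argument `lam` (standing in for $\lambda$) are illustrative; since the hinge is not differentiable at a margin of exactly 1, the code uses a subgradient.

```python
def svm_loss_and_grad(w, x, y, lam):
    """L = 0.5*||w||^2 + lam * max(0, 1 - (w^T x) y) for one example (x, y),
    with y in {-1, +1}; returns the loss and a subgradient dL/dw."""
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    hinge = max(0.0, 1.0 - margin)
    loss = 0.5 * sum(wi * wi for wi in w) + lam * hinge
    # The regularizer contributes w; the hinge contributes -lam*y*x only
    # while the example is inside the margin (margin < 1).
    active = margin < 1.0
    grad = [wi - (lam * y * xi if active else 0.0) for wi, xi in zip(w, x)]
    return loss, grad


w, x, y = [0.0, 0.0], [1.0, 2.0], 1
loss, grad = svm_loss_and_grad(w, x, y, lam=1.0)
w = [wi - 0.1 * gi for wi, gi in zip(w, grad)]  # one gradient step, step size 0.1
print(loss, grad, w)
```

Once the margin exceeds 1 the hinge term stops contributing, and only the regularizer keeps shrinking $w$.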
![Slide 45](https://reader031.vdocuments.us/reader031/viewer/2022031005/5b88e9ba7f8b9abe1e8c52a0/html5/thumbnails/45.jpg)
Linear Classifier: Logistic Regression
Input: $x \in \mathbb{R}^D$
Binary label: $y \in \{-1, 1\}$
Parameters: $w \in \mathbb{R}^D$
Output prediction: $p(y = 1 \mid x) = \frac{1}{1 + e^{-w^T x}}$
Loss: $L = \frac{1}{2}\|w\|^2 + \lambda \log(1 + \exp(-w^T x\, y))$
[Figure: log loss $L$ as a function of $w^T x\, y$; logistic output as a function of $w^T x$, saturating at 1]
![Slide 46](https://reader031.vdocuments.us/reader031/viewer/2022031005/5b88e9ba7f8b9abe1e8c52a0/html5/thumbnails/46.jpg)
Loss: $L = \frac{1}{2}\|w\|^2 - \lambda \log p(y \mid x)$
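Because $p(y \mid x) = 1/(1 + e^{-w^T x\, y})$ for $y \in \{-1, 1\}$, the two ways of writing the logistic-regression loss agree. A small Python check, with illustrative function names and values (`lam` stands in for $\lambda$):

```python
import math


def p_y_given_x(w, x, y):
    """p(y | x) for y in {-1, +1}: 1 / (1 + exp(-(w^T x) y))."""
    s = sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-s * y))


def logistic_loss(w, x, y, lam):
    """L = 0.5*||w||^2 + lam * log(1 + exp(-(w^T x) y))."""
    s = sum(wi * xi for wi, xi in zip(w, x))
    return 0.5 * sum(wi * wi for wi in w) + lam * math.log1p(math.exp(-s * y))


w, x, lam = [0.5, -1.0], [2.0, 0.5], 0.3
for y in (-1, 1):
    alt = 0.5 * sum(wi * wi for wi in w) - lam * math.log(p_y_given_x(w, x, y))
    print(y, logistic_loss(w, x, y, lam), alt)  # the two loss values should match
```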
![Slide 47](https://reader031.vdocuments.us/reader031/viewer/2022031005/5b88e9ba7f8b9abe1e8c52a0/html5/thumbnails/47.jpg)
Graphical Representation
[Figure: inputs $x_1, x_2, x_3, x_4$ connected through weights $w_1, w_2, w_3, w_4$ to a single output computed from $w^T x$; inset: the logistic curve as a function of $w^T x$, saturating at 1]
![Slide 48](https://reader031.vdocuments.us/reader031/viewer/2022031005/5b88e9ba7f8b9abe1e8c52a0/html5/thumbnails/48.jpg)
![Slide 49](https://reader031.vdocuments.us/reader031/viewer/2022031005/5b88e9ba7f8b9abe1e8c52a0/html5/thumbnails/49.jpg)
From Logistic Regression To Neural Nets
![Slide 50](https://reader031.vdocuments.us/reader031/viewer/2022031005/5b88e9ba7f8b9abe1e8c52a0/html5/thumbnails/50.jpg)
![Slide 51](https://reader031.vdocuments.us/reader031/viewer/2022031005/5b88e9ba7f8b9abe1e8c52a0/html5/thumbnails/51.jpg)
![Slide 52](https://reader031.vdocuments.us/reader031/viewer/2022031005/5b88e9ba7f8b9abe1e8c52a0/html5/thumbnails/52.jpg)
Neural Network
[Figure: a 2-hidden-layer neural network (4-layer neural network), labeled with inputs, weights, hidden unit or feature, activation function, and output]
![Slide 53](https://reader031.vdocuments.us/reader031/viewer/2022031005/5b88e9ba7f8b9abe1e8c52a0/html5/thumbnails/53.jpg)
Learning Non-Linear Features
Proposal #1:
Proposal #2:
Each box is a feature detector.
![Slide 54](https://reader031.vdocuments.us/reader031/viewer/2022031005/5b88e9ba7f8b9abe1e8c52a0/html5/thumbnails/54.jpg)
Neural Nets
NOTE: In practice, each module does NOT need to be a logistic regression classifier. Any (a.e. differentiable) non-linear transformation is potentially good.
$x \to f_1(x; \theta_1) \to h_1 \to f_2(h_1; \theta_2) \to h_2 \to f_3(h_2; \theta_3) \to y$
![Slide 55](https://reader031.vdocuments.us/reader031/viewer/2022031005/5b88e9ba7f8b9abe1e8c52a0/html5/thumbnails/55.jpg)
Forward Propagation (FPROP)
1) Given $x$, compute: $h_1 = f_1(x; \theta_1)$
![Slide 56](https://reader031.vdocuments.us/reader031/viewer/2022031005/5b88e9ba7f8b9abe1e8c52a0/html5/thumbnails/56.jpg)
For instance, $h_1 = \max(0, W_1 x + b_1)$
![Slide 57](https://reader031.vdocuments.us/reader031/viewer/2022031005/5b88e9ba7f8b9abe1e8c52a0/html5/thumbnails/57.jpg)
2) Given $h_1$, compute: $h_2 = f_2(h_1; \theta_2)$
![Slide 58](https://reader031.vdocuments.us/reader031/viewer/2022031005/5b88e9ba7f8b9abe1e8c52a0/html5/thumbnails/58.jpg)
3) Given $h_2$, compute: $y = f_3(h_2; \theta_3)$
![Slide 59](https://reader031.vdocuments.us/reader031/viewer/2022031005/5b88e9ba7f8b9abe1e8c52a0/html5/thumbnails/59.jpg)
For instance, $y_i = p(\text{class} = i \mid x) = \frac{e^{W_{3i} h_2 + b_{3i}}}{\sum_k e^{W_{3k} h_2 + b_{3k}}}$
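The softmax output layer can be sketched in Python as follows. Subtracting the largest score before exponentiating is a standard numerical-stability trick, not something stated on the slide; it leaves the result unchanged because the common factor cancels between numerator and denominator. `W3` (a list of rows) and `b3` are names chosen to mirror the formula.

```python
import math


def softmax(z):
    """y_i = exp(z_i) / sum_k exp(z_k), computed stably via z - max(z)."""
    m = max(z)
    e = [math.exp(zi - m) for zi in z]
    total = sum(e)
    return [ei / total for ei in e]


def output_layer(W3, b3, h2):
    """y_i = p(class = i | x) with scores z_i = W3[i] . h2 + b3[i]."""
    z = [sum(w * h for w, h in zip(row, h2)) + b for row, b in zip(W3, b3)]
    return softmax(z)


print(softmax([0.0, math.log(3.0)]))  # approximately [0.25, 0.75]
print(softmax([1000.0, 1000.0]))      # [0.5, 0.5], no overflow
print(output_layer([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0], [0.0, math.log(3.0)]))
```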
![Slide 60](https://reader031.vdocuments.us/reader031/viewer/2022031005/5b88e9ba7f8b9abe1e8c52a0/html5/thumbnails/60.jpg)
This is the typical processing at test time.
At training time, we need to compute an error measure and tune the parameters to decrease the error.
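Steps 1) to 3) can be sketched as a tiny pure-Python forward pass. As an assumption for illustration, both $f_1$ and $f_2$ use the ReLU form given as the example for $h_1$, and $f_3$ is the softmax layer; the parameter layout `params = [(W1, b1), (W2, b2), (W3, b3)]` is hypothetical.

```python
import math


def affine(W, b, v):
    """W v + b, with W given as a list of rows."""
    return [sum(w * vj for w, vj in zip(row, v)) + bi for row, bi in zip(W, b)]


def relu(a):
    """Element-wise max(0, a)."""
    return [max(0.0, ai) for ai in a]


def softmax(z):
    m = max(z)
    e = [math.exp(zi - m) for zi in z]
    s = sum(e)
    return [ei / s for ei in e]


def fprop(params, x):
    """Returns y = f3(f2(f1(x; theta1); theta2); theta3) as class probabilities."""
    (W1, b1), (W2, b2), (W3, b3) = params
    h1 = relu(affine(W1, b1, x))        # 1) h1 = max(0, W1 x + b1)
    h2 = relu(affine(W2, b2, h1))       # 2) h2 = f2(h1; theta2), ReLU assumed
    return softmax(affine(W3, b3, h2))  # 3) y_i = p(class = i | x)


I2 = [[1.0, 0.0], [0.0, 1.0]]
params = [(I2, [0.0, 0.0]), (I2, [0.0, 0.0]), (I2, [0.0, 0.0])]
print(fprop(params, [-1.0, 2.0]))  # ReLU zeroes the first input
```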
![Slide 61](https://reader031.vdocuments.us/reader031/viewer/2022031005/5b88e9ba7f8b9abe1e8c52a0/html5/thumbnails/61.jpg)
Loss
$x \to f_1(x; \theta_1) \to h_1 \to f_2(h_1; \theta_2) \to h_2 \to f_3(h_2; \theta_3) \to y \to \text{Loss}$
The measure of how well the model fits the training set is given by a suitable loss function: $L(x, \bar{y}; \theta)$
The loss depends on the input $x$, the target label $\bar{y}$, and the parameters $\theta$.
![Slide 62](https://reader031.vdocuments.us/reader031/viewer/2022031005/5b88e9ba7f8b9abe1e8c52a0/html5/thumbnails/62.jpg)
For instance, $L(x, \bar{y} = k; \theta) = -\log p(\text{class} = k \mid x)$
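The example loss is the negative log-probability that the network assigns to the correct class. A minimal Python sketch; averaging over a mini-batch mirrors the division by `batch_size` in the toy trainer, and the function names are illustrative.

```python
import math


def nll(y, k):
    """L(x, ybar = k; theta) = -log p(class = k | x), where y is the
    probability vector produced by the network and k the target class."""
    return -math.log(y[k])


def batch_nll(Y, targets):
    """Mean loss over a mini-batch of probability vectors and target indices."""
    return sum(nll(y, k) for y, k in zip(Y, targets)) / len(targets)


print(nll([0.25, 0.75], 1))                          # -log(0.75)
print(batch_nll([[0.5, 0.5], [0.25, 0.75]], [0, 1]))
```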
![Slide 63](https://reader031.vdocuments.us/reader031/viewer/2022031005/5b88e9ba7f8b9abe1e8c52a0/html5/thumbnails/63.jpg)
Q.: How do we tune the parameters to decrease the loss?
If the loss is (a.e.) differentiable, we can compute gradients.
We can use the chain rule, a.k.a. back-propagation, to compute the gradients w.r.t. the parameters at the lower layers.
Rumelhart et al. "Learning representations by back-propagating errors" Nature 1986
![Slide 64](https://reader031.vdocuments.us/reader031/viewer/2022031005/5b88e9ba7f8b9abe1e8c52a0/html5/thumbnails/64.jpg)
Backward Propagation (BPROP)
Given $\frac{\partial L}{\partial y}$ and assuming the Jacobian of each module is easy to compute, we have:
$$\frac{\partial L}{\partial h_2} = \frac{\partial L}{\partial y}\frac{\partial y}{\partial h_2}, \qquad \frac{\partial L}{\partial \theta_3} = \frac{\partial L}{\partial y}\frac{\partial y}{\partial \theta_3}$$
![Slide 65](https://reader031.vdocuments.us/reader031/viewer/2022031005/5b88e9ba7f8b9abe1e8c52a0/html5/thumbnails/65.jpg)
For instance, with the softmax output and the log loss above (writing $\theta_3$ for the weight matrix, ignoring the bias, and letting $\bar{y}$ be the one-hot target):
$$\frac{\partial L}{\partial h_2} = (y - \bar{y})^T \theta_3, \qquad \frac{\partial L}{\partial \theta_3} = (y - \bar{y})\, h_2^T$$
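One way to sanity-check these output-layer gradients is to compare them with central finite differences. The sketch below assumes $y = \mathrm{softmax}(\theta_3 h_2)$ with no bias and the log loss $-\log y_k$; all names and numbers are illustrative.

```python
import math


def softmax(z):
    m = max(z)
    e = [math.exp(zi - m) for zi in z]
    s = sum(e)
    return [ei / s for ei in e]


def scores(theta3, h2):
    """theta3 h2, with theta3 given as a list of rows."""
    return [sum(t * h for t, h in zip(row, h2)) for row in theta3]


def loss(theta3, h2, k):
    """L = -log y_k with y = softmax(theta3 h2)."""
    return -math.log(softmax(scores(theta3, h2))[k])


def analytic_grads(theta3, h2, k):
    """dL/dh2 = (y - ybar)^T theta3 and dL/dtheta3 = (y - ybar) h2^T."""
    y = softmax(scores(theta3, h2))
    delta = [yi - (1.0 if i == k else 0.0) for i, yi in enumerate(y)]  # y - ybar
    d_h2 = [sum(delta[i] * theta3[i][j] for i in range(len(delta)))
            for j in range(len(h2))]
    d_theta3 = [[d * h for h in h2] for d in delta]
    return d_h2, d_theta3


def numeric_grad_h2(theta3, h2, k, eps=1e-6):
    """Central differences (L(h2 + eps e_j) - L(h2 - eps e_j)) / (2 eps)."""
    grad = []
    for j in range(len(h2)):
        plus, minus = list(h2), list(h2)
        plus[j] += eps
        minus[j] -= eps
        grad.append((loss(theta3, plus, k) - loss(theta3, minus, k)) / (2 * eps))
    return grad


theta3 = [[0.2, -0.5, 1.0], [0.7, 0.1, -0.3]]
h2, k = [1.0, 2.0, -1.0], 1
d_h2, d_theta3 = analytic_grads(theta3, h2, k)
print(d_h2)
print(numeric_grad_h2(theta3, h2, k))  # compare with d_h2
```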
![Slide 66](https://reader031.vdocuments.us/reader031/viewer/2022031005/5b88e9ba7f8b9abe1e8c52a0/html5/thumbnails/66.jpg)
Given $\frac{\partial L}{\partial h_2}$, we can now compute:
$$\frac{\partial L}{\partial h_1} = \frac{\partial L}{\partial h_2}\frac{\partial h_2}{\partial h_1}, \qquad \frac{\partial L}{\partial \theta_2} = \frac{\partial L}{\partial h_2}\frac{\partial h_2}{\partial \theta_2}$$
![Slide 67](https://reader031.vdocuments.us/reader031/viewer/2022031005/5b88e9ba7f8b9abe1e8c52a0/html5/thumbnails/67.jpg)
Given $\frac{\partial L}{\partial h_1}$, we can now compute:
$$\frac{\partial L}{\partial \theta_1} = \frac{\partial L}{\partial h_1}\frac{\partial h_1}{\partial \theta_1}$$
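As a worked instance of this last step, take the ReLU module $h_1 = \max(0, W_1 x + b_1)$ used as the example of $f_1$, so $\theta_1 = (W_1, b_1)$. Writing $a_1 = W_1 x + b_1$ and treating gradients as column vectors (the convention of the toy MATLAB code), the chain rule gives the following. The ReLU derivative is taken to be 0 at $a_1 = 0$, the one point where it is not differentiable (hence "a.e. differentiable").

$$a_1 = W_1 x + b_1, \qquad h_1 = \max(0, a_1)$$

$$\frac{\partial L}{\partial a_1} = \frac{\partial L}{\partial h_1} \odot \mathbb{1}[a_1 > 0]$$

$$\frac{\partial L}{\partial W_1} = \frac{\partial L}{\partial a_1}\, x^T, \qquad \frac{\partial L}{\partial b_1} = \frac{\partial L}{\partial a_1}$$

Here $\odot$ is element-wise multiplication and $\mathbb{1}[\cdot]$ is the indicator applied per element; this mask plays the role of `jac` in the toy trainer.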
![Slide 68](https://reader031.vdocuments.us/reader031/viewer/2022031005/5b88e9ba7f8b9abe1e8c52a0/html5/thumbnails/68.jpg)
Optimization
Stochastic Gradient Descent (on mini-batches):
$$\theta \leftarrow \theta - \eta \frac{\partial L}{\partial \theta}, \qquad \eta \in \mathbb{R}$$
Stochastic Gradient Descent with Momentum:
$$\Delta \leftarrow 0.9\,\Delta + \frac{\partial L}{\partial \theta}, \qquad \theta \leftarrow \theta - \eta \Delta$$
Schaul et al. "No more pesky learning rates" ICML 2013
Sutskever et al. "On the importance of initialization and momentum..." ICML 2013
LeCun et al. "Efficient BackProp" Neural Networks: Tricks of the Trade 1998
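The two update rules can be compared on a toy problem. This Python sketch minimizes the hypothetical quadratic $L(\theta) = \tfrac12(\theta_1 - 3)^2 + (\theta_2 + 1)^2$ using exact gradients, so there is no mini-batch noise. The learning rate and step count are arbitrary choices; only the 0.9 momentum coefficient comes from the slide.

```python
def grad(theta):
    """Gradient of L = 0.5*(t1 - 3)^2 + (t2 + 1)^2 (a toy objective)."""
    t1, t2 = theta
    return [t1 - 3.0, 2.0 * (t2 + 1.0)]


def sgd(theta, lr=0.1, steps=300):
    """theta <- theta - lr * dL/dtheta."""
    for _ in range(steps):
        theta = [t - lr * g for t, g in zip(theta, grad(theta))]
    return theta


def sgd_momentum(theta, lr=0.1, mu=0.9, steps=300):
    """Delta <- mu * Delta + dL/dtheta;  theta <- theta - lr * Delta."""
    delta = [0.0] * len(theta)
    for _ in range(steps):
        delta = [mu * d + g for d, g in zip(delta, grad(theta))]
        theta = [t - lr * d for t, d in zip(theta, delta)]
    return theta


print(sgd([0.0, 0.0]))           # close to the minimum [3, -1]
print(sgd_momentum([0.0, 0.0]))  # close to the minimum [3, -1]
```

In real training, `grad` would be estimated on a mini-batch, which is where the "stochastic" in SGD comes from.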
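Toy Code: Neural Net Trainer
```matlab
% Assumed to exist: x (input, one column per sample), target (one-hot, same
% number of columns), batch_size, lr, nr_layers, cell arrays W and b, and a
% user-supplied nonlinearity() returning [activation, element-wise derivative].
% Cell indices start at 1, so h{1} holds the input. Uses implicit expansion
% (MATLAB R2016b+ or Octave).

% F-PROP
h{1} = x;
for i = 1 : nr_layers - 2
  [h{i+1}, jac{i+1}] = nonlinearity(W{i} * h{i} + b{i});
end
h{nr_layers} = W{nr_layers-1} * h{nr_layers-1} + b{nr_layers-1};
s = h{nr_layers};
p = exp(s - max(s, [], 1));   % subtract the column max for numerical stability
prediction = p ./ sum(p, 1);  % softmax over each column

% CROSS ENTROPY LOSS
loss = - sum(sum(log(prediction) .* target)) / batch_size;

% B-PROP
dh{nr_layers} = prediction - target;  % gradient w.r.t. the pre-softmax scores
for i = nr_layers - 1 : -1 : 1
  Wgrad{i} = dh{i+1} * h{i}';
  bgrad{i} = sum(dh{i+1}, 2);
  if i > 1  % no gradient is needed with respect to the input
    dh{i} = (W{i}' * dh{i+1}) .* jac{i};
  end
end

% UPDATE
for i = 1 : nr_layers - 1
  W{i} = W{i} - (lr / batch_size) * Wgrad{i};
  b{i} = b{i} - (lr / batch_size) * bgrad{i};
end
```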
KEY IDEAS: Training NNets
Neural Net = stack of feature detectors
F-Prop / B-Prop
Learning by SGD
Example: 1000x1000 image, 1M hidden units
10^12 parameters!!!
- Spatial correlation is local
- Better to put resources elsewhere!
FULLY CONNECTED NEURAL NET
LOCALLY CONNECTED NEURAL NET
Example: 1000x1000 image, 1M hidden units, filter size: 10x10
100M parameters
Filter/Kernel/Receptive field: the input patch to which a hidden unit is connected.
STATIONARITY? Statistics are similar at different locations (translation invariance).
CONVOLUTIONAL NET
Share the same parameters across different locations: convolutions with learned kernels.
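Weight sharing becomes concrete in a short pure-Python sketch: a single learned kernel is slid over a single-channel image at every "valid" position, with stride 1 and no padding. Like most conv-net code, it computes a cross-correlation, meaning the kernel is not flipped. The function name and the toy edge-detector kernel are illustrative assumptions.

```python
def conv2d_valid(image, kernel):
    """Slide one kernel over every valid position of a 2D image (no padding, stride 1).
    The same kernel weights are reused at every location: this is the weight sharing."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(kernel[a][b] * image[y + a][x + b]
                 for a in range(kh) for b in range(kw))
             for x in range(out_w)]
            for y in range(out_h)]


# A vertical step edge and a 1x2 kernel that responds to left-to-right increases.
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]
kernel = [[-1, 1]]
print(conv2d_valid(image, kernel))  # [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
```

The edge is detected wherever it appears, because the same two weights are applied at every position.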
Learn multiple filters.
E.g.: 1000x1000 image, 100 filters, filter size: 10x10
10K parameters
CONVOLUTIONAL NET
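The parameter counts in the three examples (fully connected, locally connected, convolutional) follow from simple products. This sketch ignores biases, which is an assumption because the slides do not say whether biases are counted. Note that the convolutional count does not depend on the image size at all.

```python
def fully_connected_params(height, width, n_hidden):
    # every hidden unit is connected to every pixel
    return height * width * n_hidden


def locally_connected_params(n_hidden, filter_size):
    # every hidden unit has its own filter_size x filter_size weights
    return n_hidden * filter_size * filter_size


def convolutional_params(n_filters, filter_size):
    # weights are shared across locations: one set of weights per filter
    return n_filters * filter_size * filter_size


print(fully_connected_params(1000, 1000, 10**6))  # 1000000000000 (10^12)
print(locally_connected_params(10**6, 10))        # 100000000 (100M)
print(convolutional_params(100, 10))              # 10000 (10K)
```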
CONVOLUTIONAL NET
[Figure: a feature map; each hidden unit is a filter response]
CONVOLUTIONAL LAYER
[Figure: input feature maps, a 3D kernel (filter), and the output feature map]
CONVOLUTIONAL LAYER
[Figure: input feature maps, many 3D kernels (filters), and output feature maps]
NOTE: the number of output feature maps is usually larger than the number of input feature maps.
CONVOLUTIONAL LAYER
[Figure: input feature maps → Convolutional Layer → output feature maps]
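A convolutional layer generalizes the single-kernel case. Each output feature map has its own 3D kernel that spans all input feature maps, and the per-map responses are summed, plus one bias per output map. This minimal sketch assumes square kernels, stride 1 and no padding. The names and toy values are illustrative.

```python
def conv_layer(inputs, kernels, biases):
    """inputs:  C_in feature maps, each an H x W list of lists.
    kernels: C_out 3D kernels, each a list of C_in (k x k) slices.
    biases:  C_out numbers, one per output map.
    Returns C_out output maps of size (H - k + 1) x (W - k + 1)."""
    c_in = len(inputs)
    height, width = len(inputs[0]), len(inputs[0][0])
    out = []
    for kernel, bias in zip(kernels, biases):
        assert len(kernel) == c_in, "each 3D kernel spans all input feature maps"
        k = len(kernel[0])
        out.append([[bias + sum(kernel[c][a][b] * inputs[c][y + a][x + b]
                                for c in range(c_in)
                                for a in range(k) for b in range(k))
                     for x in range(width - k + 1)]
                    for y in range(height - k + 1)])
    return out


# Two 3x3 input maps and three 2x2x2 kernels -> three 2x2 output maps.
inputs = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
          [[1, 1, 1], [1, 1, 1], [1, 1, 1]]]
kernels = [[[[1, 0], [0, 0]], [[0, 0], [0, 0]]],  # picks the top-left pixel of map 0
           [[[0, 0], [0, 0]], [[1, 1], [1, 1]]],  # sums a 2x2 patch of map 1
           [[[0, 0], [0, 1]], [[0, 0], [0, 0]]]]  # picks the bottom-right pixel of map 0
print(conv_layer(inputs, kernels, [0, 0, 1]))
# [[[1, 2], [4, 5]], [[4, 4], [4, 4]], [[6, 7], [9, 10]]]
```

The number of output maps is set by the number of kernels, independently of the number of input maps.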
KEY IDEAS: CONV. NETS
A standard neural net applied to images:
- scales quadratically with the size of the input
- does not leverage stationarity
Solution:
- connect each hidden unit to a small patch of the input
- share the weights across hidden units
This is called: convolutional network.
LeCun et al. “Gradient-based learning applied to document recognition” IEEE 1998
SPECIAL LAYERS
Over the years, some new modules have proven to be very effective when plugged into conv-nets:
- Pooling (average, L2, max)
- Local Contrast Normalization (over space / features)
Max pooling: $h_{i+1,x,y} = \max_{(j,k) \in N(x,y)} h_{i,j,k}$
Local contrast normalization: $h_{i+1,x,y} = \dfrac{h_{i,x,y} - m_{i,N(x,y)}}{\sigma_{i,N(x,y)}}$
[Figure: for each module, unit $(x,y)$ in layer $i+1$ and its neighborhood $N(x,y)$ in layer $i$]
Jarrett et al. “What is the best multi-stage architecture...?” ICCV 2009
Let us assume the filter is an “eye” detector.
Q.: how can we make the detection robust to the exact location of the eye?
POOLING
By “pooling” (e.g., taking the max of) filter responses at different locations, we gain robustness to the exact spatial location of features.
POOLING
$$h_{i+1,x,y} = \max_{(j,k) \in N(x,y)} h_{i,j,k}$$
POOLING LAYER
[Figure: input feature maps → output feature maps]
NOTE: 1) the number of output feature maps is the same as the number of input feature maps; 2) the spatial resolution is reduced: each patch is collapsed into one value, and a stride > 1 is used.
POOLING LAYER
[Figure: input feature maps → Pooling Layer → output feature maps]
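Max pooling can be sketched in a few lines. It is applied to each feature map independently, so the number of maps is unchanged, while a stride > 1 reduces the spatial resolution. The example assumes 2x2 windows with stride 2. It shows that moving a strong "eye" response to another position within the same pooling window leaves the pooled output unchanged. A shift that crosses a window boundary would not be absorbed.

```python
def max_pool(fmap, size, stride):
    """Max over each size x size window, moving the window by `stride` positions."""
    height, width = len(fmap), len(fmap[0])
    return [[max(fmap[y + a][x + b] for a in range(size) for b in range(size))
             for x in range(0, width - size + 1, stride)]
            for y in range(0, height - size + 1, stride)]


def pooling_layer(fmaps, size=2, stride=2):
    """Pool every feature map independently: the number of maps does not change."""
    return [max_pool(m, size, stride) for m in fmaps]


# The same strong "eye" response (9) at two nearby positions inside one 2x2 window.
eye_here = [[0, 0, 0, 0],
            [0, 9, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0]]
eye_shifted = [[9, 0, 0, 0],
               [0, 0, 0, 0],
               [0, 0, 0, 0],
               [0, 0, 0, 0]]
print(pooling_layer([eye_here, eye_shifted]))  # [[[9, 0], [0, 0]], [[9, 0], [0, 0]]]
```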
LOCAL CONTRAST NORMALIZATION
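$$h_{i+1,x,y} = \frac{h_{i,x,y} - m_{i,N(x,y)}}{\sigma_{i,N(x,y)}}$$
where $m_{i,N(x,y)}$ and $\sigma_{i,N(x,y)}$ are the mean and standard deviation of layer $i$ over the neighborhood $N(x,y)$.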
LOCAL CONTRAST NORMALIZATION
$$h_{i+1,x,y} = \frac{h_{i,x,y} - m_{i,N(x,y)}}{\sigma_{i,N(x,y)}}$$
We want the same response.
LOCAL CONTRAST NORMALIZATION
$$h_{i+1,x,y} = \frac{h_{i,x,y} - m_{i,N(x,y)}}{\sigma_{i,N(x,y)}}$$
Also performed across features and in the higher layers.
Effects:
- improves invariance
- improves optimization
- increases sparsity
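Here is a sketch of local contrast normalization, $h_{i+1,x,y} = (h_{i,x,y} - m_{i,N(x,y)})/\sigma_{i,N(x,y)}$, on a single feature map. It assumes a uniform k×k neighborhood $N(x,y)$ clipped at the borders, plus a small constant to avoid division by zero. The original formulation may weight the neighborhood differently and also normalize across features, so treat this as an approximation. It illustrates the "same response" argument: an input and a copy with more contrast and a brightness offset produce identical outputs.

```python
import math


def local_contrast_norm(fmap, k=3, eps=1e-8):
    """Subtract the mean and divide by the standard deviation of the k x k
    neighborhood around each position (neighborhoods are clipped at the borders)."""
    height, width = len(fmap), len(fmap[0])
    r = k // 2
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            nb = [fmap[j][i]
                  for j in range(max(0, y - r), min(height, y + r + 1))
                  for i in range(max(0, x - r), min(width, x + r + 1))]
            m = sum(nb) / len(nb)
            sigma = math.sqrt(sum((v - m) ** 2 for v in nb) / len(nb))
            row.append((fmap[y][x] - m) / max(sigma, eps))
        out.append(row)
    return out


patch = [[1.0, 2.0, 3.0],
         [4.0, 0.0, 6.0],
         [7.0, 8.0, 2.0]]
brighter = [[3.0 * v + 5.0 for v in row] for row in patch]  # more contrast + offset
a, b = local_contrast_norm(patch), local_contrast_norm(brighter)
print(max(abs(u - v) for ra, rb in zip(a, b) for u, v in zip(ra, rb)))  # ~0 up to rounding
```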
CONV NETS: TYPICAL ARCHITECTURE
[Figure: one stage (zoom): Convol. → LCN → Pooling]
The convolutional layer increases the number of feature maps. The pooling layer decreases the spatial resolution.
CONV NETS: TYPICAL ARCHITECTURE
[Figure: one stage (zoom): Convol. → LCN → Pooling]
Example with only two filters.
CONV NETS: TYPICAL ARCHITECTURE
[Figure: one stage (zoom): Convol. → LCN → Pooling]
A hidden unit in the first hidden layer is influenced by a small neighborhood (equal to the size of the filter).
CONV NETS: TYPICAL ARCHITECTURE
[Figure: one stage (zoom): Convol. → LCN → Pooling]
A hidden unit after the pooling layer is influenced by a larger neighborhood (its size depends on the filter sizes and strides).
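The size of that neighborhood can be computed layer by layer with the standard receptive-field recursion. Each layer with window size k grows the field by (k − 1) times the current spacing between units, and a stride s multiplies that spacing by s. This sketch assumes square windows and ignores the LCN neighborhood. The layer sizes in the example are hypothetical.

```python
def receptive_fields(layers):
    """layers: (window_size, stride) pairs from the input upwards.
    Returns the receptive-field width, in input pixels, of one unit after each layer."""
    field, spacing = 1, 1  # one input pixel; adjacent units are 1 pixel apart
    fields = []
    for window, stride in layers:
        field += (window - 1) * spacing
        spacing *= stride
        fields.append(field)
    return fields


# Hypothetical stack: 5x5 conv, 2x2 pooling (stride 2), 5x5 conv, 2x2 pooling (stride 2).
print(receptive_fields([(5, 1), (2, 2), (5, 1), (2, 2)]))  # [5, 6, 14, 16]
```

The second 5x5 convolution adds 8 pixels rather than 4 because it operates on units that are already 2 pixels apart.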
CONV NETS: TYPICAL ARCHITECTURE
[Figure: whole system: Input Image → 1st stage → 2nd stage → 3rd stage → Fully Conn. Layers → Class Labels; one stage (zoom): Convol. → LCN → Pooling]
After a few stages, the residual spatial resolution is very small. We have learned a descriptor for the whole image.
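Tracking the map size through the stages shows why the residual spatial resolution becomes small. This sketch assumes "valid" convolutions with stride 1, each followed by pooling with the given window and stride. The 96×96 input and the layer sizes are hypothetical.

```python
def output_size(n, window, stride):
    """Spatial size after sliding a window over n positions (no padding)."""
    return (n - window) // stride + 1


def stage_sizes(n, stages):
    """stages: list of (conv_window, pool_window, pool_stride); returns the size after each stage."""
    sizes = [n]
    for conv_w, pool_w, pool_s in stages:
        n = output_size(n, conv_w, 1)       # convolution (stride 1)
        n = output_size(n, pool_w, pool_s)  # pooling
        sizes.append(n)
    return sizes


print(stage_sizes(96, [(5, 2, 2)] * 3))  # [96, 46, 21, 8]
```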
CONV NETS: TYPICAL ARCHITECTURE
[Figure: one stage (zoom): Convol. → LCN → Pooling]
Conceptually similar to:
SIFT → K-Means → Pyramid Pooling → SVM
Lazebnik et al. “...Spatial Pyramid Matching...” CVPR 2006
SIFT → Fisher Vect. → Pooling → SVM
Sanchez et al. “Image classification with F.V.: Theory and practice” IJCV 2012
CONV NETS: TRAINING
Algorithm: given a small mini-batch,
- F-PROP
- B-PROP
- PARAMETER UPDATE
All layers are differentiable (a.e.). We can use standard back-propagation.
KEY IDEAS: CONV. NETS
Conv. Nets have special layers like:
- pooling, and
- local contrast normalization
Back-propagation can still be applied.
These layers are useful to:
- reduce computational burden
- increase invariance
- ease the optimization
Outline
Motivation
Deep Learning: The Big Picture
From neural nets to convolutional nets
Applications
A practical guide
CONV NETS: EXAMPLES
- OCR / House number & Traffic sign classification
Ciresan et al. “MCDNN for image classification” CVPR 2012
Wan et al. “Regularization of neural networks using dropconnect” ICML 2013
CONV NETS: EXAMPLES
- Texture classification
Sifre et al. “Rotation, scaling and deformation invariant scattering...” CVPR 2013
CONV NETS: EXAMPLES
- Pedestrian detection
Sermanet et al. “Pedestrian detection with unsupervised multi-stage..” CVPR 2013
CONV NETS: EXAMPLES
- Scene Parsing
Farabet et al. “Learning hierarchical features for scene labeling” PAMI 2013
CONV NETS: EXAMPLES
- Segmentation of 3D volumetric images
Ciresan et al. “DNN segment neuronal membranes...” NIPS 2012
Turaga et al. “Maximin learning of image segmentation” NIPS 2009
CONV NETS: EXAMPLES
- Action recognition from videos
Taylor et al. “Convolutional learning of spatio-temporal features” ECCV 2010
CONV NETS: EXAMPLES
- Robotics
Sermanet et al. “Mapping and planning ...with long range perception” IROS 2008
CONV NETS: EXAMPLES
- Denoising
Burger et al. “Can plain NNs compete with BM3D?” CVPR 2012
[Figure: original, noised, and denoised images]
CONV NETS: EXAMPLES
- Dimensionality reduction / learning embeddings
Hadsell et al. “Dimensionality reduction by learning an invariant mapping” CVPR 2006
CONV NETS: EXAMPLES
- Deployed in commercial systems (Google & Baidu, spring 2013)
CONV NETS: EXAMPLES
- Image classification
Krizhevsky et al. “ImageNet Classification with deep CNNs” NIPS 2012
[Figure: Object Recognizer → railcar]
Architecture
input → CONV → LOCAL CONTRAST NORM → MAX POOLING → CONV → LOCAL CONTRAST NORM → MAX POOLING → CONV → CONV → CONV → MAX POOLING → FULLY CONNECTED → FULLY CONNECTED → LINEAR → category prediction
Krizhevsky et al. “ImageNet Classification with deep CNNs” NIPS 2012
Architecture
| Layer (from input to category prediction) | Nr. params | Nr. flops |
|---|---|---|
| CONV | 35K | 105M |
| LOCAL CONTRAST NORM | | |
| MAX POOLING | | |
| CONV | 307K | 223M |
| LOCAL CONTRAST NORM | | |
| MAX POOLING | | |
| CONV | 884K | 149M |
| CONV | 1.3M | 224M |
| CONV | 442K | 74M |
| MAX POOLING | | |
| FULLY CONNECTED | 37M | 37M |
| FULLY CONNECTED | 16M | 16M |
| LINEAR | 4M | 4M |
| Total | 60M | 832M |
Krizhevsky et al. “ImageNet Classification with deep CNNs” NIPS 2012
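The per-layer numbers can be reproduced with a short calculation. This sketch assumes one flop = one multiply-add and ignores biases. It also uses the sizes reported for that network: a 55×55×96 output after the first convolution with 11×11×3 kernels, and a 6×6×256 input to the first fully connected layer of 4096 units. Under these assumptions the results match the table's 35K/105M and 37M/37M entries.

```python
def conv_cost(k, c_in, c_out, out_h, out_w):
    """Weights and multiply-adds of a convolutional layer (biases ignored)."""
    params = k * k * c_in * c_out
    flops = params * out_h * out_w  # every output location reuses all the weights
    return params, flops


def fc_cost(n_in, n_out):
    """A fully connected layer uses each weight exactly once per image."""
    params = n_in * n_out
    return params, params


print(conv_cost(11, 3, 96, 55, 55))  # (34848, 105415200)   -> 35K params, 105M flops
print(fc_cost(6 * 6 * 256, 4096))    # (37748736, 37748736) -> 37M params, 37M flops
```

Convolutional layers hold few parameters but dominate the compute, while fully connected layers show the opposite pattern. This is why the table lists both columns.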
Optimization
SGD with momentum:
- Learning rate = 0.01
- Momentum = 0.9
Improving generalization by:
- Weight sharing (convolution)
- Input distortions
- Dropout = 0.5
- Weight decay = 0.0005
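Dropout with rate 0.5 can be sketched as follows, in the form described by Hinton et al. (2012). During training, each hidden unit is set to zero with probability p. At test time, all units are kept and their outputs are multiplied by (1 − p). Many later implementations instead rescale by 1/(1 − p) during training ("inverted dropout"). The function name and signature here are illustrative.

```python
import random


def dropout(h, p=0.5, train=True, rng=random):
    """Training: zero each unit independently with probability p.
    Test: keep every unit and scale by (1 - p), so expected activations match training."""
    if train:
        return [0.0 if rng.random() < p else v for v in h]
    return [v * (1.0 - p) for v in h]


rng = random.Random(0)
h = [1.0, 2.0, 3.0, 4.0]
print(dropout(h, rng=rng))      # a random subset of the units is zeroed
print(dropout(h, train=False))  # [0.5, 1.0, 1.5, 2.0]
```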
Results: ILSVRC 2012
Results
First layer learned filters (processing raw pixel values).
Krizhevsky et al. “ImageNet Classification with deep CNNs” NIPS 2012
[Figure: test image and retrieved images]
Outline
Motivation
Deep Learning: The Big Picture
From neural nets to convolutional nets
Applications
A practical guide
CHOOSING THE ARCHITECTURE
[Convolution → LCN → pooling]* + fully connected layer
Cross-validation
Task dependent
The more data: the more layers and the more kernels
Look at the number of parameters at each layer
Look at the number of flops at each layer
Computational cost
Be creative :)
HOW TO OPTIMIZE
SGD (with momentum) usually works very well
Pick the learning rate by running on a subset of the data
Bottou “Stochastic Gradient Tricks” Neural Networks 2012
Start with a large learning rate and divide by 2 until the loss does not diverge
Decay the learning rate by a factor of ~100 or more by the end of training
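The learning-rate heuristic can be sketched as a loop. A hypothetical toy "training run" stands in for a short run on a subset of the data: gradient descent on $L(\theta) = \frac{1}{2} a \theta^2$, which diverges when the rate exceeds $2/a$. A geometric schedule then decays the chosen rate by a factor of 100 over training. Both helper names and the shape of the schedule are assumptions.

```python
import math


def diverges(lr, a=10.0, theta0=1.0, steps=20):
    """Hypothetical stand-in for a short training run on a data subset:
    gradient descent on L(theta) = 0.5 * a * theta^2; True if the loss blew up."""
    theta = theta0
    for _ in range(steps):
        theta -= lr * a * theta
    final_loss = 0.5 * a * theta ** 2
    return math.isnan(final_loss) or final_loss > 0.5 * a * theta0 ** 2


def pick_learning_rate(lr=1.0, min_lr=1e-8):
    """Start large and divide by 2 until the loss does not diverge."""
    while lr > min_lr and diverges(lr):
        lr /= 2
    return lr


def decayed_lr(lr0, step, total_steps, factor=100.0):
    """Geometric decay from lr0 at step 0 to lr0 / factor at the last step."""
    return lr0 * factor ** (-step / total_steps)


lr0 = pick_learning_rate()
print(lr0)                          # 0.125 for this toy problem (2 / a = 0.2)
print(decayed_lr(lr0, 1000, 1000))  # ~0.00125
```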
Use non-linearity
Initialize parameters so that each feature across layers has similar variance. Avoid units in saturation.
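One common way to get "similar variance across layers", in the spirit of LeCun et al.'s “Efficient BackProp”, is to draw each weight with standard deviation $1/\sqrt{\text{fan-in}}$. The sketch isolates the effect by assuming purely linear layers with no non-linearity. The activation variance stays in the vicinity of 1 through five 256-unit layers, whereas unit-variance weights would multiply it by roughly 256 per layer.

```python
import math
import random


def init_weights(fan_in, fan_out, rng):
    """Gaussian weights with standard deviation 1 / sqrt(fan_in)."""
    std = 1.0 / math.sqrt(fan_in)
    return [[rng.gauss(0.0, std) for _ in range(fan_in)] for _ in range(fan_out)]


def matvec(W, h):
    return [sum(w * v for w, v in zip(row, h)) for row in W]


def variance(v):
    m = sum(v) / len(v)
    return sum((x - m) ** 2 for x in v) / len(v)


rng = random.Random(0)
dim = 256
h = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # unit-variance input
variances = [variance(h)]
for _ in range(5):  # five linear layers, no non-linearity
    h = matvec(init_weights(dim, dim, rng), h)
    variances.append(variance(h))
print([round(v, 2) for v in variances])  # all stay in the vicinity of 1.0
```

With a saturating non-linearity such as tanh, the same scaling also keeps pre-activations mostly out of the flat, saturated region.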
HOW TO IMPROVE GENERALIZATION
Weight sharing (greatly reduces the number of parameters)
Data augmentation (e.g., jittering, noise injection, etc.)
Dropout
Hinton et al. “Improving NNs by preventing co-adaptation of feature detectors” arXiv 2012
Weight decay (L2, L1)
Sparsity in the hidden units
Multi-task (unsupervised learning)
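Data augmentation by jittering can be sketched as random crops plus random horizontal flips of each training image. The crop size and flip probability here are assumptions, and other distortions such as noise injection follow the same pattern.

```python
import random


def jitter(image, crop, rng=random):
    """Return a random crop x crop window of a 2D image, horizontally flipped half the time."""
    height, width = len(image), len(image[0])
    y = rng.randrange(height - crop + 1)
    x = rng.randrange(width - crop + 1)
    patch = [row[x:x + crop] for row in image[y:y + crop]]
    if rng.random() < 0.5:
        patch = [row[::-1] for row in patch]
    return patch


rng = random.Random(0)
image = [[10 * r + c for c in range(5)] for r in range(5)]
for _ in range(3):
    print(jitter(image, 3, rng))  # a random 3x3 view of the same image
```

Each epoch then sees slightly different versions of every training image, at no labeling cost.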
OTHER THINGS GOOD TO KNOW
Check gradients numerically by finite differences
Visualize features (feature maps need to be uncorrelated and have high variance).
[Figure: hidden-unit activations, samples × hidden units]
Good training: hidden units are sparse across samples and across features.
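Checking gradients by finite differences can be sketched with central differences and a relative error. The test function, its hand-derived gradient and the deliberately buggy gradient are illustrative assumptions. As a rough rule of thumb, a tiny relative error suggests that B-PROP is correct, and a large one points to a bug.

```python
import math


def numerical_gradient(f, theta, eps=1e-5):
    """Central differences: (f(theta + eps*e_i) - f(theta - eps*e_i)) / (2*eps)."""
    grad = []
    for i in range(len(theta)):
        plus, minus = list(theta), list(theta)
        plus[i] += eps
        minus[i] -= eps
        grad.append((f(plus) - f(minus)) / (2 * eps))
    return grad


def relative_error(g1, g2):
    diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(g1, g2)))
    scale = math.sqrt(sum(a * a for a in g1)) + math.sqrt(sum(b * b for b in g2))
    return diff / max(scale, 1e-12)


def f(t):
    return t[0] ** 3 + t[1] ** 3 + t[0] * t[1]


def grad_ok(t):
    return [3 * t[0] ** 2 + t[1], 3 * t[1] ** 2 + t[0]]


def grad_buggy(t):  # forgets the derivative of the t[0] * t[1] term
    return [3 * t[0] ** 2, 3 * t[1] ** 2]


theta = [1.0, -2.0]
num = numerical_gradient(f, theta)
print(relative_error(num, grad_ok(theta)))     # tiny, far below 1e-6
print(relative_error(num, grad_buggy(theta)))  # about 0.09
```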
[Figure: hidden-unit activations, samples × hidden units]
Bad training: many hidden units ignore the input and/or exhibit strong correlations.
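Besides visual inspection, a few numbers can flag these "bad training" symptoms. One is units whose activation barely varies across samples, meaning they ignore the input. Another is pairs of units that are strongly correlated. This sketch assumes activations are collected as one list per sample, and its thresholds are arbitrary illustrative choices.

```python
import math


def diagnose(acts, dead_tol=1e-6, corr_tol=0.95):
    """acts[s][j] = activation of hidden unit j on sample s.
    Returns (units with ~zero variance across samples, pairs with |correlation| > corr_tol)."""
    n, d = len(acts), len(acts[0])
    mean = [sum(s[j] for s in acts) / n for j in range(d)]
    std = [math.sqrt(sum((s[j] - mean[j]) ** 2 for s in acts) / n) for j in range(d)]
    dead = [j for j in range(d) if std[j] < dead_tol]
    correlated = []
    for a in range(d):
        for b in range(a + 1, d):
            if std[a] < dead_tol or std[b] < dead_tol:
                continue  # correlation is undefined for constant units
            cov = sum((s[a] - mean[a]) * (s[b] - mean[b]) for s in acts) / n
            if abs(cov / (std[a] * std[b])) > corr_tol:
                correlated.append((a, b))
    return dead, correlated


# Toy activations for 4 samples and 4 units: unit 1 = 2 * unit 0, unit 2 is constant.
acts = [[1.0, 2.0, 0.5, 1.0],
        [2.0, 4.0, 0.5, -1.0],
        [3.0, 6.0, 0.5, 1.0],
        [4.0, 8.0, 0.5, -1.0]]
print(diagnose(acts))  # ([2], [(0, 1)])
```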
Visualize parameters
Good training: learned filters exhibit structure and are uncorrelated.
[Figure: learned filters: GOOD; BAD (too noisy); BAD (too correlated); BAD (lack structure)]
Measure error on both training and validation set.
Test on a small subset of the data and check the error → 0.
WHAT IF IT DOES NOT WORK?
Training diverges:
- Learning rate may be too large → decrease learning rate
- BPROP is buggy → numerical gradient checking
Parameters collapse / loss is minimized but accuracy is low:
- Check loss function: Is it appropriate for the task you want to solve? Does it have degenerate solutions?
Network is underperforming:
- Compute flops and nr. params. → if too small, make net larger
- Visualize hidden units/params → fix optimization
Network is too slow:
- Compute flops and nr. params. → GPU, distrib. framework, make net smaller
FUTURE CHALLENGES
Scalability
- Hardware: GPU / distributed frameworks
- Algorithms: better losses, better optimizers
Learning better representations
- Video
- Unsupervised learning
- Multi-task learning
Feedback at training and inference time
Structured prediction
Black-box tool (hyper-parameter optimization)
Snoek et al. “Practical Bayesian optimization of ML algorithms” NIPS 2012
SUMMARY
Want to efficiently learn non-linear adaptive hierarchical systems
End-to-end learning
Gradient-based learning
Adapting neural nets to vision:
- Weight sharing
- Pooling and Contrast Normalization
Improving generalization on small datasets:
- Weight decay, dropout, sparsity, multi-task
Training a convnet means:
- Design architecture
- Design loss function
- Optimization (SGD)
Very successful (large-scale) applications
SOFTWARE
Torch7: learning library that supports neural net training
- http://www.torch.ch
- http://code.cogbits.com/wiki/doku.php (tutorial with demos by C. Farabet)
Python-based learning library (U. Montreal)
- http://deeplearning.net/software/theano/ (does automatic differentiation)
C++ code for ConvNets (Sermanet)
- http://eblearn.sourceforge.net/
Efficient CUDA kernels for ConvNets (Krizhevsky)
- code.google.com/p/cuda-convnet
REFERENCES
Convolutional Nets
– LeCun, Bottou, Bengio and Haffner: Gradient-Based Learning Applied to Document Recognition, Proceedings of the IEEE, 86(11):2278-2324, November 1998
– Krizhevsky, Sutskever, Hinton: ImageNet Classification with Deep Convolutional Neural Networks, NIPS 2012
– Jarrett, Kavukcuoglu, Ranzato, LeCun: What is the Best Multi-Stage Architecture for Object Recognition?, Proc. International Conference on Computer Vision (ICCV'09), IEEE, 2009
– Kavukcuoglu, Sermanet, Boureau, Gregor, Mathieu, LeCun: Learning Convolutional Feature Hierarchies for Visual Recognition, Advances in Neural Information Processing Systems (NIPS 2010), 23, 2010
– see yann.lecun.com/exdb/publis for references on many different kinds of convnets.
– see http://www.cmap.polytechnique.fr/scattering/ for scattering networks (similar to convnets but with less learning and stronger mathematical foundations)
131
REFERENCES
Applications of Convolutional Nets
– Farabet, Couprie, Najman, LeCun: Scene Parsing with Multiscale Feature Learning, Purity Trees, and Optimal Covers, ICML 2012
– Pierre Sermanet, Koray Kavukcuoglu, Soumith Chintala and Yann LeCun: Pedestrian Detection with Unsupervised Multi-Stage Feature Learning, CVPR 2013
– D. Ciresan, A. Giusti, L. Gambardella, J. Schmidhuber: Deep Neural Networks Segment Neuronal Membranes in Electron Microscopy Images, NIPS 2012
– Raia Hadsell, Pierre Sermanet, Marco Scoffier, Ayse Erkan, Koray Kavukcuoglu, Urs Muller and Yann LeCun: Learning Long-Range Vision for Autonomous Off-Road Driving, Journal of Field Robotics, 26(2):120-144, 2009
– Burger, Schuler, Harmeling: Image Denoising: Can Plain Neural Networks Compete with BM3D?, CVPR 2012
– Hadsell, Chopra, LeCun: Dimensionality Reduction by Learning an Invariant Mapping, CVPR 2006
– Bergstra et al.: Making a Science of Model Search: Hyperparameter Optimization in Hundreds of Dimensions for Vision Architectures, ICML 2013
132
REFERENCES
Deep Learning in general
– deep learning tutorial slides at ICML 2013
– Bengio: Learning Deep Architectures for AI, Foundations and Trends in Machine Learning, 2(1), pp. 1-127, 2009
– LeCun, Chopra, Hadsell, Ranzato, Huang: A Tutorial on Energy-Based Learning, in Bakir, G. and Hofmann, T. and Schölkopf, B. and Smola, A. and Taskar, B. (Eds), Predicting Structured Data, MIT Press, 2006
133
ACKNOWLEDGEMENTS
Yann LeCun - NYU
Alex Krizhevsky - Google
Jeff Dean - Google
134
THANK YOU!