large-scale visual recognition with deep …ranzato/publications/ranzato_cvpr13.pdflarge-scale...

134
Large-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato [email protected] www.cs.toronto.edu/~ranzato

Upload: doanh

Post on 31-Aug-2018

221 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

Large-Scale Visual RecognitionWith Deep Learning

Sunday 23 June 2013

Marc'Aurelio Ranzato

[email protected]/~ranzato

Page 2: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

2

Why Is Recognition Hard?

ObjectRecognizer panda

Ranzato

Page 3: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

3

Why Is Recognition Hard?

ObjectRecognizer panda

Pose

Ranzato

Page 4: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

4

Why Is Recognition Hard?

ObjectRecognizer panda

Occlusion

Ranzato

Page 5: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

5

Why Is Recognition Hard?

ObjectRecognizer panda

Multiple objects

Ranzato

Page 6: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

6

Why Is Recognition Hard?

ObjectRecognizer panda

Inter-classsimilarity

Ranzato

Page 7: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

7

Ideal Features

Ideal Feature Extractor

- window, top-left- clock, top-middle- shelf, left- drawing,middle- statue, bottom left- …

- hat, bottom right

Q.: What objects are in the image? Where is the clock? What is on the top of the table? ... Ranzato

Page 8: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

8

Ideal Features Are Non-Linear

Ideal Feature Extractor

Ideal Feature Extractor

- club, angle = 90- man, frontal pose...

- club, angle = 360- man, side pose...

Ideal Feature Extractor

- club, angle = 270- man, frontal pose...

?

I 1

I 2Ranzato

Page 9: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

9

Ideal Features Are Non-Linear

Ideal Feature Extractor

Ideal Feature Extractor

- club, angle = 90- man, frontal pose...

- club, angle = 360- man, side pose...

Ideal Feature Extractor

- club, angle = 270- man, frontal pose...

I 1

I 2Ranzato

INPUT IS NOT THE

AVERAGE!

Page 10: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

10

Ideal Features Are Non-Linear

Ideal Feature Extractor

Ideal Feature Extractor

- club, angle = 90- man, frontal pose...

- club, angle = 360- man, side pose...

Ideal Feature Extractor

- club, angle = 270- man, frontal pose...

I 1

I 2Ranzato

Page 11: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

11

The Manifold of Natural Images

Page 12: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

12

The Manifold of Natural Images

We need to linearize the manifold: learn non-linear features!

Ranzato

Page 13: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

13

Ideal Feature Extraction

Pixel 1

Pixel 2

Pixel n

Expression

Pose

Ideal Feature Extractor

Ranzato

Page 14: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

14

Learning Non-Linear Features

featuresf x ;

Q.: which class of non-linear functions shall we consider?

Ranzato

Page 15: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

15

Learning Non-Linear Features

Proposal #1: linear combination

Proposal #2: composition

Given a dictionary of simple non-linear functions: g1 , , g n

f x≈∑ jg j

f x≈g 1g2 gn x

+

Ranzato

Page 16: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

16

Learning Non-Linear Features

Proposal #1: linear combination

Proposal #2: composition

Given a dictionary of simple non-linear functions: g1 , , g n

f x≈g 1g2 gn x

Ranzato

Kernel learning Boosting ...

Deep learning Scattering networks (wavelet cascade) S.C. Zhou & D. Mumford “grammar”

S h a l l o w

D e e p

f x≈∑ jg j

Page 17: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

17

Linear Combination

Ranzato

+

...

Input image

templete matchers

prediction of class

BAD: it may require an exponential nr. of

templates!!!

Page 18: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

18

Composition

RanzatoInput image

low level parts

prediction of class

GOOD: (exponentially) more efficient

mid-level parts

high-level parts

reuse of intermediate parts distributed representations

Lee et al. “Convolutional DBN's ...” ICML 2009

Page 19: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

19

The Big Advantage of Deep Learning

Efficiency: intermediate concepts can be re-used

RanzatoZeiler, Fergus 2013

Page 20: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

20

The Big Advantage of Deep Learning

Efficiency: intermediate concepts can be re-used

RanzatoZeiler, Fergus 2013

Page 21: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

21

The Big Advantage of Deep Learning

Efficiency: intermediate concepts can be re-used

Zeiler, Fergus 2013

Page 22: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

22

A Potential Problem with Deep Learning

Optimization is difficult: non-convex, non-linear system

1 2 3 4

Ranzato

Page 23: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

23

A Potential Problem with Deep Learning

Optimization is difficult: non-convex, non-linear system

4

Solution #1: freeze first N-1 layer (engineer the features) It makes it shallow!

Ranzato

SIFT

k-Me ans

Pool ing

Clas sifie r

Page 24: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

24

A Potential Problem with Deep Learning

Optimization is difficult: non-convex, non-linear system

4Solution #2: live with it!

It will converge to a local minimum. It is much more powerful!!

1 2 3

RanzatoGiven lots of data, engineer less and learn more!!

Page 25: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

25

Deep Learning in Practice

Optimization is easy, need to know a few tricks of the trade.

4

Q: What's the feature extractor? And what's the classifier?

1 2 3

A: No distinction, end-to-end learning!

Ranzato

Page 26: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

26

Deep Learning in Practice

It works very well in practice:

Ranzato

Page 27: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

27

KEY IDEAS: WHY DEEP LEARNING

We need non-linear system

We need to learn it from data

Build feature hierarchies (function composition)

End-to-end learning

Ranzato

Page 28: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

28

Outline

Motivation

Deep Learning: The Big Picture

From neural nets to convolutional nets

Applications

A practical guide

Ranzato

Page 29: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

29

Outline

Motivation

Deep Learning: The Big Picture

From neural nets to convolutional nets

Applications

A practical guide

Ranzato

Page 30: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

30

What Is Deep Learning?

Ranzato

Page 31: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

31

Buzz Words

It's a Contrastive Divergence

It's a Convolutional Net

It's just old Neural Nets

It's a Feature Learning

It's a Deep Belief Net

It's a Unsupervised Learning

Ranzato

Page 32: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

32

(My) Definition

A Deep Learning method is: a method which makes predictions by using a sequence of non-linear processing stages. The resulting intermediate representations can be interpreted as feature hierarchies and the whole system is jointly learned from data.

Some deep learning methods are probabilistic, others are loss-based, some are supervised, other unsupervised...

It's a large family!

Ranzato

Page 33: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

33

Perceptron

1957

THE SPACE OF MACHINE LEARNING METHODS

Page 34: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

34

Perceptron

1957 Neural Net '80s

AutoEncoders

BM

Page 35: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

35

Perceptron

Neural Net

Conv. Net

Boosting

'90s – early '00s DecisionTree

SVM

GMM

AutoEncoders

Sparse Coding

1957 '80s

BM

Page 36: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

36

Perceptron

Neural Net

Conv. Net

Boosting

DecisionTree

SVM

GMM

AutoEncoders

DBNRBM

Sparse Coding

2006

1957 '80s

BM

Page 37: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

37

Perceptron

Neural Net

Conv. Net

Boosting

DecisionTree

SVM

GMM

AutoEncoders

DBNRBM

Sparse Coding

2006

20091957

ΣΠBayesNP

DBM

D-AE

BM

Page 38: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

38

Perceptron

Neural Net

Conv. Net

Boosting

SVM

GMM

AutoEncoders

DBNRBM

Sparse Coding

2006

20091957

2012

ΣΠBayesNP

D-AE

DBM

DecisionTree

BM

Page 39: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

39

Perceptron

Neural Net

Conv. Net

Boosting

DecisionTree

SVM

GMM

AutoEncoders

DBNRBM

Sparse Coding

ΣΠBayesNP

DBM

D-AE

DEEPSHALLOW

BM

Page 40: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

40

Perceptron

Neural Net

Conv. Net

Boosting

SVM

GMM

AutoEncoders

DBNRBM

Sparse Coding

ΣΠBayesNP

DBM

D-AE

DEEPSHALLOW

Probabilistic Models

Neural Networks

BM

Page 41: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

41

Perceptron

Neural Net

Conv. Net

Boosting

SVM

GMM

AutoEncoders

DBNRBM

Sparse Coding

ΣΠBayesNP

DBM

D-AE

DEEPSHALLOW

Probabilistic Models

Neural Networks

SupervisedUnsupervised

Supervised

BM

Page 42: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

42

In this talk, we'll focus on convolutional networks.

Ranzato

Page 43: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

43

Outline

Motivation

Deep Learning: The Big Picture

From neural nets to convolutional nets

Applications

A practical guide

RanzatoRanzato

Page 44: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

44

Linear Classifier: SVMInput:

Binary label:

Parameters:

Output prediction:

Loss:

x∈RD

y∈{−1,1 }

w∈RD

wT x

L=12∥w∥

2max [0,1−w

Tx y ]

L

wT x y

Hinge Loss

1 Ranzato

Page 45: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

45

Linear Classifier: Logistic RegressionInput:

Binary label:

Parameters:

Output prediction:

Loss:

x∈RD

y∈{−1,1 }

w∈RD

p y=1∣x =1

1e−wT x

L=12∥w∥

2 log 1exp −w

Tx y

L

wT x y1

Log Loss

wT x

1

Ranzato

Page 46: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

46

Linear Classifier: Logistic RegressionInput:

Binary label:

Parameters:

Output prediction:

Loss:

x∈RD

y∈{−1,1 }

w∈RD

p y=1∣x =1

1e−wT x

L=12∥w∥

2− log p y∣x

L

wT x y1

Log Loss

wT x

1

Ranzato

Page 47: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

47

Graphical Representation

Ranzato

wT x

1

wTx output

x1x2x3x 4

outputoutputx

ww1w2w3w 4

Page 48: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

48

Graphical Representation

Ranzato

x1x2x3x 4

outputoutputx

w

wTx output

w1w2w3w 4

Page 49: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

49

From Logistic Regression To Neural Nets

Ranzato

Page 50: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

50

From Logistic Regression To Neural Nets

Ranzato

Page 51: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

51

From Logistic Regression To Neural Nets

Ranzato

Page 52: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

52

wT x

1

Ranzato

Neural Network

hidden unit or feature

outputoutput

inputs

output

weights

2 hidden layer neural network(4 layer neural network)

activation function

Page 53: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

53

Learning Non-Linear Features

Proposal #1:

Proposal #2:

+

Ranzato

Each of box is a feature detector

Page 54: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

54

Neural Nets

NOTE: In practice, each module does NOT need to be a logistic regression classifier. Any (a.e. differentiable) non-linear transformation is potentially good.

h2h1xf 1 x ;1 f 2 h1 ;2 f 3h2 ;3

y

Page 55: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

55

Forward Propagation (FPROP)

h2h1xf 1 x ;1 f 2 h1 ;2 f 3h2 ;3

1) Given compute:x h1= f 1 x ;1

y

Page 56: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

56

h2h1xf 1 x ;1 f 2 h1 ;2 f 3h2 ;3

1) Given compute:x h1= f 1 x ;1

h1=max0,W 1 xb1For instance,

y

Forward Propagation (FPROP)

Page 57: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

57

h2h1xf 1 x ;1 f 2 h1 ;2 f 3h2 ;3

1) Given compute:x h1= f 1 x ;1

2) Given compute:h1 h2= f 2h1 ;2

y

Forward Propagation (FPROP)

Page 58: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

58

h2h1xf 1 x ;1 f 2 h1 ;2 f 3h2 ;3

1) Given compute:x h1= f 1 x ;1

2) Given compute:h1 h2= f 2h1 ;2

3) Given compute:h2 y= f 3h2 ;3

y

Forward Propagation (FPROP)

Page 59: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

59

h2h1xf 1 x ;1 f 2 h1 ;2 f 3h2 ;3

1) Given compute:x h1= f 1 x ;1

2) Given compute:h1 h2= f 2h1 ;2

3) Given compute:h2

y i= pclass=i∣x =eW 3 ih2b 3 i

∑keW 3 k h2b3 k

y

y= f 3h2 ;3

For instance,

Forward Propagation (FPROP)

Page 60: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

60

h2h1xf 1 x ;1 f 2 h1 ;2 f 3h2 ;3

1) Given compute:x h1= f 1 x ;1

2) Given compute:h1 h2= f 2h1 ;2

3) Given compute:h2

This is the typical processing at test time.

At training time, we need to compute an error measure and tune the parameters to decrease the error.

y

y= f 3h2 ;3

Forward Propagation (FPROP)

Page 61: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

61

Loss

h2h1xf 1 x ;1 f 2 h1 ;2 f 3h2 ;3

The measure of how well the model fits the training set is given by a suitable loss function:

The loss depends on the input , the target label , and the parameters .

y

Lossy

L x , y ;

x y

Page 62: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

62

Loss

h2h1xf 1 x ;1 f 2 h1 ;2 f 3h2 ;3

The measure of how well the model fits the training set is given by a suitable loss function:

For instance,

Lossy

L x , y ;

L x , y=k ; =− log pclass=k∣x

y

Page 63: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

63

Loss

h2h1xf 1 x ;1 f 2 h1 ;2 f 3h2 ;3

Q.: how to tune the parameters to decrease the loss?

Lossy

If loss is (a.e.) differentiable we can compute gradients.

We can use chain-rule, a.k.a. back-propagation, to compute the gradients w.r.t. parameters at the lower layers.

Rumelhart et al. “Learning internal representations by back-propagating..” Nature 1986

y

Page 64: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

64

Backward Propagation (BPROP)

h2h1xf 1 x ;1 f 2 h1 ;2 f 3h2 ;3

Lossy

Given and assumiing the Jacobian of each module is

easy to compute, then we have:

∂L∂ y

∂ L∂h2

=∂ L∂ y

∂ y∂h2

∂ L∂3

=∂L∂ y

∂ y∂3

∂L∂ y

Page 65: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

65

h2h1xf 1 x ;1 f 2 h1 ;2 f 3h2 ;3

Lossy

Given and assumiing the Jacobian of each module is

easy to compute, then we have:

∂L∂ y

∂ L∂h2

= y− y 3 '∂ L∂3

= y− y h2 '

∂L∂ y

Backward Propagation (BPROP)

Page 66: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

66

h1xf 1 x ;1 f 2 h1 ;2 f 3h2 ;3

Lossy

Given we can compute now:∂ L∂h2

∂ L∂h1

=∂ L∂h2

∂ h2∂h1

∂ L∂2

=∂ L∂h2

∂h2∂2

∂ L∂h2

∂L∂ y

Backward Propagation (BPROP)

Page 67: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

67

xf 1 x ;1 f 2 h1 ;2 f 3h2 ;3

Lossy

Given we can compute now:∂ L∂h1

∂ L∂1

=∂ L∂h1

∂h1∂1

∂ L∂h2

∂ L∂h1

∂L∂ y

Backward Propagation (BPROP)

Page 68: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

68

Optimization

Stochastic Gradient Descent (on mini-batches):

−∂ L∂,∈R

Stochastic Gradient Descent with Momentum:

0.9∂L∂

Schaul et al. “No more pesky learning rates” ICML 2013Sutskever et al. “On the importance of initialization and momentum...” ICML 2013

LeCun et al. “Efficient BackProp” Neural Networks: Tricks of the trade 1998

Page 69: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

69

Toy Code: Neural Net Trainer% F-PROPfor i = 1 : nr_layers - 1 [h{i} jac{i}] = nonlinearity(W{i} * h{i-1} + b{i});endh{nr_layers-1} = W{nr_layers-1} * h{nr_layers-2} + b{nr_layers-1};prediction = softmax(h{l-1});

% CROSS ENTROPY LOSSloss = - sum(sum(log(prediction) .* target)) / batch_size;

% B-PROPdh{l-1} = prediction - target;for i = nr_layers – 1 : -1 : 1 Wgrad{i} = dh{i} * h{i-1}'; bgrad{i} = sum(dh{i}, 2); dh{i-1} = (W{i}' * dh{i}) .* jac{i-1}; end

% UPDATEfor i = 1 : nr_layers - 1 W{i} = W{i} – (lr / batch_size) * Wgrad{i}; b{i} = b{i} – (lr / batch_size) * bgrad{i}; end

Ranzato

Page 70: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

70

KEY IDEAS: Training NNets

Neural Net = stack of feature detectors

F-Prop / B-Prop

Learning by SGD

Ranzato

Page 71: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

71

Example: 1000x1000 image 1M hidden units

10^12 parameters!!!

- Spatial correlation is local- Better to put resources elsewhere!

FULLY CONNECTED NEURAL NET

Ranzato

Page 72: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

72

LOCALLY CONNECTED NEURAL NET

Example: 1000x1000 image 1M hidden units Filter size: 10x10

100M parameters

Ranzato

Filter/Kernel/Receptive field: input patch which the hidden unit is connected to.

Page 73: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

73

STATIONARITY? Statistics are similar at different locations

(translation invariance)

Example: 1000x1000 image 1M hidden units Filter size: 10x10

100M parameters

LOCALLY CONNECTED NEURAL NET

Ranzato

Page 74: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

74

CONVOLUTIONAL NET

Share the same parameters across different locations:Convolutions with learned kernels

Ranzato

Page 75: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

75

Learn multiple filters.

E.g.: 1000x1000 image 100 Filters Filter size: 10x10

10K parameters

CONVOLUTIONAL NET

Ranzato

Page 76: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

76

CONVOLUTIONAL NET

Ranzato

featu

re m

ap

hidden unit /filter response

Page 77: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

77

CONVOLUTIONAL LAYER

RanzatoInput feature maps

output feature map

3D kernel(filter)

Page 78: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

78

CONVOLUTIONAL LAYER

Ranzato

Input feature maps

output feature maps

many 3D kernes (filters)

NOTE: the nr. of output feature maps isusually larger than the nr. of input feature maps

Page 79: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

79

CONVOLUTIONAL LAYER

Ranzato

input feature maps output feature maps

Convolutional Layer

NOTE: the nr. of output feature maps isusually larger than the nr. of input feature maps

Page 80: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

80

KEY IDEAS: CONV. NETS

A standard neural net applied to images:- scales quadratically with the size of the input- does not leverage stationarity

Solution:- connect each hidden unit to a small patch of the input- share the weight across hidden units

This is called: convolutional network.LeCun et al. “Gradient-based learning applied to document recognition” IEEE 1998

Ranzato

Page 81: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

81

SPECIAL LAYERSOver the years, some new modules have proven to be very effective when plugged into conv-nets:

- Pooling (average, L2, max)

- Local Contrast Normalization (over space / features)

hi1, x , y=max j , k ∈N x , y hi , j , k

hi1, x , y=hi , x , y−mi , x , y

i , x , y

layer i1layer i

x , yN x , y

layer i1layer i

x , yN x , y

Jarrett et al. “What is the best multi-stage architecture...?” ICCV 2009 Ranzato

Page 82: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

82

Let us assume filter is an “eye” detector.

Q.: how can we make the detection robust to the exact location of the eye?

POOLING

Ranzato

Page 83: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

83

By “pooling” (e.g., taking max) filterresponses at different locations we gainrobustness to the exact spatial locationof features.

POOLING

Ranzato

hi1, x , y=max j , k ∈N x , y hi , j , k

Page 84: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

84

POOLING LAYER

RanzatoInput feature maps output feature maps

NOTE: 1) the nr. of output feature maps is the same as the nr. of input feature maps2) spatial resolution is reduced – patch collapsed into one value – use of stride > 1

Page 85: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

85

POOLING LAYER

Ranzato

NOTE: 1) the nr. of output feature maps is the same as the nr. of input feature maps2) spatial resolution is reduced – patch collapsed into one value – use of stride > 1

input feature mapsoutput feature maps

Pooling Layer

Page 86: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

86

LOCAL CONTRAST NORMALIZATION

h i1, x , y=hi , x , y−mi , N x , y

i , N x , y

Ranzato

Page 87: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

87

LOCAL CONTRAST NORMALIZATION

h i1, x , y=hi , x , y−mi , N x , y

i , N x , y

We want the same response.

Ranzato

Page 88: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

88

LOCAL CONTRAST NORMALIZATION

h i1, x , y=hi , x , y−mi , N x , y

i , N x , y

Performed also across features and in the higher layers.

Effects:– improves invariance– improves optimization– increases sparsity

Ranzato

Page 89: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

89

CONV NETS: TYPICAL ARCHITECTURE

Convol. LCN Pooling

One stage (zoom)

Ranzato

Convol. LCN Pooling

Convolutional layer increases nr. feature maps.Pooling layer decreases spatial resolution.

Page 90: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

90

CONV NETS: TYPICAL ARCHITECTURE

Convol. LCN Pooling

One stage (zoom)

Ranzato

Convol.LCN Pooling

Example with only two filters.

Page 91: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

91

CONV NETS: TYPICAL ARCHITECTURE

Convol. LCN Pooling

One stage (zoom)

Ranzato

Convol.LCN Pooling

A hidden unit in the first hidden layer is influenced by a small neighborhood (equal to size of filter).

Page 92: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

92

CONV NETS: TYPICAL ARCHITECTURE

Convol. LCN Pooling

One stage (zoom)

Ranzato

Convol.LCN Pooling

A hidden unit after the pooling layer is influenced by a larger neighborhood (it depends on filter sizes and strides).

Page 93: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

93

CONV NETS: TYPICAL ARCHITECTURE

Convol. LCN Pooling

One stage (zoom)

Fully Conn. Layers

Whole system

1st stage 2nd stage 3rd stage

Input Image

ClassLabels

Ranzato

After a few stages, residual spatial resolution is very small. We have learned a descriptor for the whole image.

Page 94: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

94

CONV NETS: TYPICAL ARCHITECTURE

Convol. LCN Pooling

One stage (zoom)

Ranzato

SIFT → K-Means → Pyramid Pooling → SVM

SIFT → Fisher Vect. → Pooling → SVM

Lazebnik et al. “...Spatial Pyramid Matching...” CVPR 2006

Sanchez et al. “Image classifcation with F.V.: Theory and practice” IJCV 2012

Conceptually similar to:

Page 95: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

95

CONV NETS: TRAINING

Algorithm:Given a small mini-batch- F-PROP- B-PROP- PARAMETER UPDATE

All layers are differentiable (a.e.). We can use standard back-propagation.

Ranzato

Page 96: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

96

KEY IDEAS: CONV. NETS

Conv. Nets have special layers like:– pooling, and– local contrast normalizationBack-propagation can still be applied.

These layers are useful to:– reduce computational burden– increase invariance– ease the optimization

Ranzato

Page 97: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

97

Outline

Motivation

Deep Learning: The Big Picture

From neural nets to convolutional nets

Applications

A practical guide

Ranzato

Page 98: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

98

CONV NETS: EXAMPLES- OCR / House number & Traffic sign classification

Ciresan et al. “MCDNN for image classification” CVPR 2012Wan et al. “Regularization of neural networks using dropconnect” ICML 2013

Page 99: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

99

CONV NETS: EXAMPLES- Texture classification

Sifre et al. “Rotation, scaling and deformation invariant scattering...” CVPR 2013

Page 100: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

100

CONV NETS: EXAMPLES- Pedestrian detection

Sermanet et al. “Pedestrian detection with unsupervised multi-stage..” CVPR 2013

Page 101: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

101

CONV NETS: EXAMPLES- Scene Parsing

Farabet et al. “Learning hierarchical features for scene labeling” PAMI 2013

Page 102: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

102

CONV NETS: EXAMPLES- Segmentation 3D volumetric images

Ciresan et al. “DNN segment neuronal membranes...” NIPS 2012Turaga et al. “Maximin learning of image segmentation” NIPS 2009

Page 103: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

103

CONV NETS: EXAMPLES- Action recognition from videos

Taylor et al. “Convolutional learning of spatio-temporal features” ECCV 2010

Page 104: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

104

CONV NETS: EXAMPLES- Robotics

Sermanet et al. “Mapping and planning ...with long range perception” IROS 2008

Page 105: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

105

CONV NETS: EXAMPLES- Denoising

Burger et al. “Can plain NNs compete with BM3D?” CVPR 2012

original noised denoised

Page 106: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

106

CONV NETS: EXAMPLES- Dimensionality reduction / learning embeddings

Hadsell et al. “Dimensionality reduction by learning an invariant mapping” CVPR 2006

Page 107: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

107

CONV NETS: EXAMPLES- Deployed in commercial systems (Google & Baidu, spring 2013)

Page 108: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

108

CONV NETS: EXAMPLES- Image classification

Krizhevsky et al. “ImageNet Classification with deep CNNs” NIPS 2012

ObjectRecognizer railcar

Page 109: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

109

Architecture

CONV

LOCAL CONTRAST NORM

MAX POOLING

FULLY CONNECTED

LINEAR

CONV

LOCAL CONTRAST NORM

MAX POOLING

CONV

CONV

CONV

MAX POOLING

FULLY CONNECTED

RanzatoKrizhevsky et al. “ImageNet Classification with deep CNNs” NIPS 2012

category prediction

input

Page 110: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

110

Architecture

CONV

LOCAL CONTRAST NORM

MAX POOLING

FULLY CONNECTED

LINEAR

CONV

LOCAL CONTRAST NORM

MAX POOLING

CONV

CONV

CONV

MAX POOLING

FULLY CONNECTED

Total nr. params: 60M

4M

16M

37M

442K

1.3M

884K

307K

35K

RanzatoKrizhevsky et al. “ImageNet Classification with deep CNNs” NIPS 2012

category prediction

input

Page 111: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

111

Architecture

CONV

LOCAL CONTRAST NORM

MAX POOLING

FULLY CONNECTED

LINEAR

CONV

LOCAL CONTRAST NORM

MAX POOLING

CONV

CONV

CONV

MAX POOLING

FULLY CONNECTED

Total nr. params: 60M

4M

16M

37M

442K

1.3M

884K

307K

35K

Total nr. flops: 832M

4M

16M37M

74M

224M

149M

223M

105M

RanzatoKrizhevsky et al. “ImageNet Classification with deep CNNs” NIPS 2012

category prediction

input

Page 112: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

112

Optimization

SGD with momentum:

Learning rate = 0.01

Momentum = 0.9

Improving generalization by:

Weight sharing (convolution)

Input distortions

Dropout = 0.5

Weight decay = 0.0005

Ranzato

Page 113: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

113

Results: ILSVRC 2012

Ranzato

Page 114: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

114

Results: ILSVRC 2012

Ranzato

Page 115: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

115

Results

First layer learned filters (processing raw pixel values).

RanzatoKrizhevsky et al. “ImageNet Classification with deep CNNs” NIPS 2012

Page 116: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

116

Page 117: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

117

TEST IMAGE RETRIEVED IMAGES

Page 118: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

118

Outline

Motivation

Deep Learning: The Big Picture

From neural nets to convolutional nets

Applications

A practical guide

Ranzato

Page 119: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

119

CHOOSING THE ARCHITECTURE

[Convolution → LCN → pooling]* + fully connected layer

Cross-validation

Task dependent

The more data: the more layers and the more kernelsLook at the number of parameters at each layerLook at the number of flops at each layer

Computational cost

Be creative :)Ranzato

Page 120: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

120

HOW TO OPTIMIZE

SGD (with momentum) usually works very well

Pick learning rate by running on a subset of the dataBottou “Stochastic Gradient Tricks” Neural Networks 2012Start with large learning rate and divide by 2 until loss does not divergeDecay learning rate by a factor of ~100 or more by the end of training

Use non-linearity

Initialize parameters so that each feature across layers has similar variance. Avoid units in saturation.

Ranzato

Page 121: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

121

HOW TO IMPROVE GENERALIZATION

Weight sharing (greatly reduce the number of parameters)

Data augmentation (e.g., jittering, noise injection, etc.)

Dropout Hinton et al. “Improving Nns by preventing co-adaptation of feature detectors” arxiv 2012

Weight decay (L2, L1)

Sparsity in the hidden units

Multi-task (unsupervised learning)

Ranzato

Page 122: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

122

OTHER THINGS GOOD TO KNOW

Check gradients numerically by finite differences

Visualize features (feature maps need to be uncorrelated) and have high variance.

Ranzato

sam

p les

hidden unitGood training: hidden units are sparse across samples and across features.

Page 123: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

123

OTHER THINGS GOOD TO KNOW

Check gradients numerically by finite differences

Visualize features (feature maps need to be uncorrelated) and have high variance.

Ranzato

sam

p les

hidden unitBad training: many hidden units ignore the input and/or exhibit strong correlations.

Page 124: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

124

OTHER THINGS GOOD TO KNOW

Check gradients numerically by finite differences

Visualize features (feature maps need to be uncorrelated) and have high variance.

Visualize parameters

Good training: learned filters exhibit structure and are uncorrelated.

GOOD BADBAD BAD

too noisy too correlated lack structure

Ranzato

Page 125: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

125

OTHER THINGS GOOD TO KNOW

Check gradients numerically by finite differences

Visualize features (feature maps need to be uncorrelated) and have high variance.

Visualize parameters

Measure error on both training and validation set.

Test on a small subset of the data and check the error → 0.

Ranzato

Page 126: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

126

WHAT IF IT DOES NOT WORK?

Training diverges:Learning rate may be too large → decrease learning rateBPROP is buggy → numerical gradient checking

Parameters collapse / loss is minimized but accuracy is low Check loss function:

Is it appropriate for the task you want to solve?Does it have degenerate solutions?

Network is underperformingCompute flops and nr. params. → if too small, make net largerVisualize hidden units/params → fix optmization

Network is too slowCompute flops and nr. params. → GPU,distrib. framework, make net smaller

Ranzato

Page 127: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

127

FUTURE CHALLENGES Scalability

HardwareGPU / distributed frameworks

AlgorithmsBetter lossesBetter optimizers

Learning better representationsVideoUnsupervised learningMulti-task learning

Feedback at training and inference time

Structure prediction

Black-box tool (hyper-parameters optimization)RanzatoSnoek et al. “Practical Bayesian optimization of ML algorithms” NIPS 2012

Page 128: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

128

SUMMARY

Ranzato

Want to efficiently learn non-linear adaptive hierarchical systems

End-to-end learning

Gradient-based learning

Adapting neural nets to vision:Weight sharingPooling and Contrast Normalization

Improving generalization on small datasets:Weight decay, dropout, sparsity, multi-task

Training a convnet means:Design architectureDesign loss functionOptimization (SGD)

Very successful (large-scale) applications

Page 129: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

129

SOFTWARETorch7: learning library that supports neural net traininghttp://www.torch.chhttp://code.cogbits.com/wiki/doku.php (tutorial with demos by C. Farabet)

Python-based learning library (U. Montreal)

- http://deeplearning.net/software/theano/ (does automatic differentiation)

C++ code for ConvNets (Sermanet)

– http://eblearn.sourceforge.net/

Efficient CUDA kernels for ConvNets (Krizhevsky)

– code.google.com/p/cuda-convnet

Ranzato

Page 130: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

130

REFERENCESConvolutional Nets– LeCun, Bottou, Bengio and Haffner: Gradient-Based Learning Applied to Document Recognition, Proceedings of the IEEE, 86(11):2278-2324, November 1998

- Krizhevsky, Sutskever, Hinton “ImageNet Classification with deep convolutional neural networks” NIPS 2012

– Jarrett, Kavukcuoglu, Ranzato, LeCun: What is the Best Multi-Stage Architecture for Object Recognition?, Proc. International Conference on Computer Vision (ICCV'09), IEEE, 2009

- Kavukcuoglu, Sermanet, Boureau, Gregor, Mathieu, LeCun: Learning Convolutional Feature Hierachies for Visual Recognition, Advances in Neural Information Processing Systems (NIPS 2010), 23, 2010

– see yann.lecun.com/exdb/publis for references on many different kinds of convnets.

– see http://www.cmap.polytechnique.fr/scattering/ for scattering networks (similar to convnets but with less learning and stronger mathematical foundations)

Ranzato

Page 131: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

131

REFERENCESApplications of Convolutional Nets

– Farabet, Couprie, Najman, LeCun. Scene Parsing with Multiscale Feature Learning, Purity Trees, and Optimal Covers”, ICML 2012

– Pierre Sermanet, Koray Kavukcuoglu, Soumith Chintala and Yann LeCun: Pedestrian Detection with Unsupervised Multi-Stage Feature Learning, CVPR 2013

- D. Ciresan, A. Giusti, L. Gambardella, J. Schmidhuber. Deep Neural Networks Segment Neuronal Membranes in Electron Microscopy Images. NIPS 2012

- Raia Hadsell, Pierre Sermanet, Marco Scoffier, Ayse Erkan, Koray Kavackuoglu, Urs Muller and Yann LeCun. Learning Long-Range Vision for Autonomous Off-Road Driving, Journal of Field Robotics, 26(2):120-144, 2009

– Burger, Schuler, Harmeling. Image Denoisng: Can Plain Neural Networks Compete with BM3D?, CVPR 2012

– Hadsell, Chopra, LeCun. Dimensionality reduction by learning an invariant mapping, CVPR 2006

– Bergstra et al. Making a science of model search: hyperparameter optimization in hundred of dimensions for vision architectures, ICML 2013 Ranzato

Page 132: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

132

REFERENCESDeep Learning in general

– deep learning tutorial slides at ICML 2013

– Yoshua Bengio, Learning Deep Architectures for AI, Foundations and Trends in Machine Learning, 2(1), pp.1-127, 2009.

– LeCun, Chopra, Hadsell, Ranzato, Huang: A Tutorial on Energy-Based Learning, in Bakir, G. and Hofman, T. and Schölkopf, B. and Smola, A. and Taskar, B. (Eds), Predicting Structured Data, MIT Press, 2006

Ranzato

Page 133: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

133

ACKNOWLEDGEMENTS

Ranzato

Yann LeCun - NYU

Alex Krizhevsky - Google

Jeff Dean - Google

Page 134: Large-Scale Visual Recognition With Deep …ranzato/publications/ranzato_cvpr13.pdfLarge-Scale Visual Recognition With Deep Learning Sunday 23 June 2013 Marc'Aurelio Ranzato ranzato@google.com

134

THANK YOU!

Ranzato