visual deep learning models,cs.wellesley.edu/~vision/slides/tommy_class.pdf · • human brain...
TRANSCRIPT
Center Name Presenter Name
Visual deep learning models, in particular for face recognition
and models of invariant recognition
in the ventral stream
Towards a theory of the above
tomaso poggio, CBMM, BCS, CSAIL, McGovern MIT
Plan
• Recognition in visual cortex • DCLNs • Deep Face systems • iTheory
Second Annual NSF Site Visit, June 2 – 3, 2015
Theoretical/conceptual framework for vision
• The first 100ms of vision: feedforward and invariant: what, who, where
• Top-down needed for verification step and more complex questions: generative models, probabilistic inference, top-down visual routines.
Following this conceptual framework we are working on:
1.a theory of invariance cortical computation —> i-theory2.a generative approach, probabilistic in nature 3.visual routines, and of how they may be learned.
Object recogni-on
• Human Brain –1010-1011 neurons (~1 million flies) –1014- 1015 synapses
Vision: what is where
• Ventral stream in rhesus monkey –~109 neurons in the ventral stream
(350 106 in each emisphere) –~15 106 neurons in AIT (Anterior
InferoTemporal) cortex
• ~200M in V1, ~200M in V2, 50M in V4
Van Essen & Anderson, 1990
Source: Lennie, Maunsell, Movshon
Vision: what is where
[software available online]Riesenhuber & Poggio 1999, 2000; Serre Kouh Cadieu Knoblich Kreiman & Poggio 2005; Serre Oliva Poggio 2007
• It is in the family of “Hubel-Wiesel” models (Hubel & Wiesel, 1959: qual. Fukushima, 1980: quant; Oram & Perrett, 1993: qual; Wallis & Rolls, 1997; Riesenhuber & Poggio, 1999; Thorpe, 2002; Ullman et al., 2002; Mel, 1997; Wersing and Koerner, 2003; LeCun et al 1998: not-bio; Amit & Mascaro, 2003: not-bio; Hinton, LeCun, Bengio not-bio; Deco & Rolls 2006…)
• As a biological model of object recognition in the ventral stream – from V1 to PFC -- it is perhaps the most quantitatively faithful to known neuroscience data
Recogni-on in Visual Cortex: ‘’classical model”, selec-ve and invariant
Feedforward Models: “predict” rapid categorization (82% model vs. 80% humans)
Hierarchical feedforward models of the ventral stream
Why do these networks including DLCNs
work so well?
Models are not enough… we need a theory!
Plan
• Recognition in visual cortex • DCLNs • Deep Face systems • iTheory
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
Invariance via pooling
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
New name for virtual examples
45
46
A poor man regularization!
47
48
49
50
51
52
53
54
55
56
57
58
59
60
Mobileye
Plan
• Recognition in visual cortex • DCLNs • Deep Face systems • iTheory
Plan
• Recognition in visual cortex • DCLNs • Deep Face systems • iTheory
71
i-theoryLearning of invariant&selective Representations in Sensory Cortex
i-theory: exploring a new hypothesis
A main computational goal of the feedforward ventral stream hierarchy — and of vision — is to compute a representation for each incoming image which is invariant to transformations previously experienced in the visual environment.
73
Empirical demonstraCon: invariant representaCon leads to lower sample complexity for a supervised classifier
Theorem (transla)on case) Consider a space of images of dimensions pixels which may appear in any posiCon within a window of size pixels. The usual image representaCon yields a sample complexity ( of a linear c l a s s i fi e r ) o f order ;the oracle representaCon (invariant) yields (because of much smaller covering numbers) a sample complexity of order
d × d
rd × rd
m = O(r2d 2 )
moracle = O(d2 ) =
mimage
r2
74
An algorithm that learns in an unsupervised way to compute invariant representations
ν
P(ν )
νµkn(I) = 1/|G|
|G|X
i=1
�(I · gitk + n�)
75
Invariant signature from a single image of a new object
We need only a finite number of projections, K, to distinguish among n images.
Similar in spirit to Johnson-Lindestrauss
Local and global invariance: whole-parts theorem
l=4
l=3
l=2
l=1HW module
biophysics: prediction
...
Basic machine: a HW module (dot products and histograms/moments for image seen through RF)
• The cumulative histogram (empirical cdf) can be be computed as
• This maps directly into a set of simple cells with threshold
• …and a complex cell indexed by n and k summating the simple cells
µnk (I ) = 1
|G |σ ( I ,git
k + nΔ)i=1
|G |
∑
nΔ
The nonlinearity can be rather arbitrary for invariance provided it is stationary in time
Second Annual NSF Site Visit, June 2 – 3, 2015
Dendrites of a complex cells as simple cells…
Active properties in the dendrites of the complex cell
Plan
• i-theory • DCLNs • equivalence to DCLNs, theory notes • Some predictions of i-theory • Deep Face systems