Going after object recognition performance to discover how...
Post on 24-Apr-2020
TRANSCRIPT
James DiCarlo MD, PhD
Professor of Neuroscience
Head, Department of Brain and Cognitive Sciences
Investigator, The McGovern Institute for Brain Research
Massachusetts Institute of Technology, Cambridge MA, USA
Going after object recognition performance to discover how the ventral stream works.
“invariance” is crux problem
hierarchical, working system
Ventral visual stream
Systems neuroscience: the non human primate model
Powerful set of visual features
Understanding the brain and discovering game-changing information processing technology are two sides of the same coin.
How the brain works
When biological brains perform better than computers
computer science
neuroscience
psychophysics
The convergence of three fields
How the brain works
When computers perform as well as or better than biological brains
Falsifiable hypotheses
Attempt to test/falsify those hypotheses
New ideas, algorithm parameters
New phenomena
Common physical source (object) leads to many images
Poggio, Ullman, Grossberg, Edelman, Biederman, etc.
DiCarlo and Cox, TICS (2007); Pinto, Cox, and DiCarlo, PLoS Comp Bio (2008)
“identity preserving image variation”
View: position, size, pose, illumination
Clutter, occlusion, illumination
Intraclass
Deformation, articulation
computer science
neuroscience
The convergence of three fields
How the brain works
New ideas, algorithm parameters
New phenomena
psychophysics
Examples:
• Hubel & Wiesel (1962)
• Fukushima (1980)
• Perrett & Oram (1993)
• Wallis & Rolls (1997)
• LeCun et al. (1998)
• Riesenhuber & Poggio (1999)
• Serre, Kouh, et al. (2005)
Brain-inspired computer algorithms
1. Selectivity 2. Tolerance
“AND” “OR”
Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005
FROM BIOLOGY:
• Hierarchy
• Spatially local filters
• Convolution
• Normalization
• Threshold NL
• Unsupervised learning
• ...
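The two ingredients named above, selectivity (“AND”) and tolerance (“OR”), can be sketched in a few lines. This is only a toy illustration in the spirit of the cited models, not the HMAX implementation: the function names `and_like` and `or_like`, the 1-D signal, and the template are all hypothetical. Selectivity is approximated by a normalized template match at each position, and tolerance by a max over positions.

```python
import numpy as np

def and_like(signal, template):
    """Selectivity: normalized template match at every position (hypothetical helper)."""
    k = len(template)
    t = template / np.linalg.norm(template)
    out = np.zeros(len(signal) - k + 1)
    for i in range(len(out)):
        patch = signal[i:i + k]
        norm = np.linalg.norm(patch)
        # Response is high only when the whole local pattern is present together.
        out[i] = patch @ t / norm if norm > 0 else 0.0
    return out

def or_like(responses):
    """Tolerance: respond if the preferred pattern occurs at any position (max pooling)."""
    return responses.max()

template = np.array([1.0, 2.0, 1.0])
a = np.zeros(10)
a[2:5] = template          # pattern at position 2
b = np.zeros(10)
b[6:9] = template          # same pattern shifted to position 6

ra, rb = and_like(a, template), and_like(b, template)
print("selective responses peak at:", np.argmax(ra), np.argmax(rb))
print("pooled responses:", or_like(ra), or_like(rb))
```

The selective responses peak at different positions for the two inputs, while the pooled responses agree, which is the sense in which alternating the two operations buys tolerance to position without giving up selectivity.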
computer science
neuroscience
psychophysics
The convergence of three fields
How the brain works
Falsifiable hypotheses
Attempt to test/falsify those hypotheses
e.g. HMAX
~2008: But HMAX and other models failed to explain neurons
HMAX model
Representational similarity analysis
Kriegeskorte, Frontiers in Neuroscience (2009)
Biological ventral stream
Models of ventral stream
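As a rough sketch of how representational similarity analysis compares a model with a neural population: build an image-by-image dissimilarity matrix (RDM) for each, then rank-correlate the two matrices. The details here are assumptions for illustration (1 minus Pearson correlation as the dissimilarity, a tie-free Spearman correlation, and random arrays standing in for recorded responses); the helper names `rdm`, `rank`, and `rdm_similarity` are hypothetical.

```python
import numpy as np

def rdm(responses):
    """responses: (n_images, n_units). Returns an (n_images, n_images) matrix of 1 - Pearson r."""
    return 1.0 - np.corrcoef(responses)

def rank(x):
    """Ranks 0..n-1; no tie handling, which is adequate for continuous values."""
    r = np.empty(len(x))
    r[np.argsort(x)] = np.arange(len(x))
    return r

def rdm_similarity(rdm_a, rdm_b):
    """Spearman correlation between the upper triangles of two RDMs."""
    iu = np.triu_indices_from(rdm_a, k=1)
    return np.corrcoef(rank(rdm_a[iu]), rank(rdm_b[iu]))[0, 1]

rng = np.random.default_rng(0)
neural = rng.normal(size=(20, 50))                          # placeholder "population" (20 images, 50 units)
model_close = neural + 0.1 * rng.normal(size=neural.shape)  # placeholder model resembling the population
model_far = rng.normal(size=(20, 30))                       # placeholder unrelated model

sim_close = rdm_similarity(rdm(neural), rdm(model_close))
sim_far = rdm_similarity(rdm(neural), rdm(model_far))
print(f"close model: {sim_close:.2f}, unrelated model: {sim_far:.2f}")
```

Because only the RDMs are compared, the model and the population need not have the same number of units, which is what makes the comparison between brain areas and model layers possible.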
computer science
neuroscience
psychophysics
What went wrong?
How the brain works
Falsifiable hypotheses
Attempt to test/falsify those hypotheses
New ideas, algorithm parameters
New phenomena
Stringency of these “Brains vs. Machines” tests was far too weak
“V1-like” models
One problem was insufficient variation in the test sets.
~2008: Tests of performance were not stringent enough.
Pinto, Cox, and DiCarlo, PLoS Comp Bio (2008)
SLF (~HMAX)
Caltech 101 benchmark
[Figure: Performance (%), axis 50 to 100, for the Head, Close-body, Medium-body, and Far-body image sets]
“HMAX 2.0” (Serre et al. PNAS 2007)
Pinto, Majaj, Barhomi, Salomon, Cox, DiCarlo COSYNE 2010
Animal vs. Non-animal
Humans
V1-like
Example object recognition task: “car detection”
Pinto, Cox & DiCarlo, PLoS Comp Biol (2008); Pinto, DiCarlo and Cox, ECCV (2008); Pinto, Doukhan, DiCarlo & Cox, PLoS Comp Biol (2009)
Image generation strategy:
2009: More stringent, but compact tests of “object recognition”
Image generation strategy:
- Parametric control of task demand (esp. invariance)
- Few images needed to bring computer vision features to their knees
no variation more variation lots of variation
“car” not “car”
...... n>100 n>700
Basic car task, variation level: 3
2009: Toward more stringent tests of “object recognition”
Δ
Data merged here: 48 basic-level tasks (8 labels x 6 levels of variation)
Machines lose to humans
2010: Machines vs. human brains
Machines beat humans!
Increasing Composite Variation (levels 0 to 6):

level   position (x-axis)   position (y-axis)   scale   in-plane rotation   in-depth rotation
0       0%                  0%                  0%      0°                  0°
1       10%                 20%                 10%     15°                 15°
2       20%                 40%                 20%     30°                 30°
3       30%                 60%                 30%     45°                 45°
4       40%                 80%                 40%     60°                 60°
5       50%                 100%                50%     75°                 75°
6       60%                 120%                60%     90°                 90°
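The variation ranges listed above lend themselves to a small parameter sampler, sketched here. The per-level maxima are copied from the slide; everything else is an assumption for illustration: the level numbering 0 to 6, the uniform draw from [-max, +max], and the names `MAX_VARIATION`, `NAMES`, and `sample_view`. The slides give the ranges but not the sampling distribution or the rendering pipeline.

```python
import numpy as np

# Maximum variation per level, copied from the slide.
# Order: position x (%), position y (%), scale (%), in-plane rotation (deg), in-depth rotation (deg)
MAX_VARIATION = {
    0: (0, 0, 0, 0, 0),
    1: (10, 20, 10, 15, 15),
    2: (20, 40, 20, 30, 30),
    3: (30, 60, 30, 45, 45),
    4: (40, 80, 40, 60, 60),
    5: (50, 100, 50, 75, 75),
    6: (60, 120, 60, 90, 90),
}
NAMES = ("pos_x_pct", "pos_y_pct", "scale_pct", "rot_in_plane_deg", "rot_in_depth_deg")

def sample_view(level, rng):
    """Draw one set of view parameters for a variation level.

    Assumption: each parameter is uniform in [-max, +max] for that level.
    """
    limits = np.array(MAX_VARIATION[level], dtype=float)
    values = limits * rng.uniform(-1.0, 1.0, size=limits.shape)
    return dict(zip(NAMES, values))

rng = np.random.default_rng(0)
for level in (0, 3, 6):
    print(level, {k: round(float(v), 1) for k, v in sample_view(level, rng).items()})
```

A single `level` argument controls all five parameters at once, which is the sense in which task demand is under parametric control: the same object set becomes a harder test simply by raising the level.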
[Figure: a) “cars vs. planes” task: Performance (%), axis 50 to 100, against increasing composite variation (levels 0 to 6) for Pixels, V1-like, SIFT, SLF, Geometric Blur, PHOG, and PHOW, with chance marked. b) controls (new draw, more training, other objects, multi-class): Performance relative to Pixels (%)]
Pinto, Barhomi, Cox & DiCarlo, WACV(2010)
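[Figure: Performance (simple decode) of pixels, V1-like, SIFT, PHOG, PHOW, SLF (~HMAX), and HMAX features compared with the V4 population, the IT population, and human level; HMO, SuperVision, and Zeiler & Fergus are also labeled, with a “?” marker]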
[Figure, panels a-c: image-by-image similarity matrices for Pixels, the V1-like, V2-like, and HMO models, V4 neuronal units, and IT neuronal units, over eight categories (Animals, Boats, Cars, Chairs, Faces, Fruits, Planes, Tables), under image generalization, object generalization, and category generalization. Bar plot: Spearman correlation coefficient, axis 0.0 to 0.9, for Pixels, V1-like, SIFT, HMAX, V2-like, HMO, V4 units, and IT units split-half]
Neural population similarity of images along the ventral stream
other models
[Figure, panels a-c: image-by-image similarity matrices for the V1-like, V2-like, and HMO models, V4 neuronal units, and IT neuronal units, over eight categories (Animals, Boats, Cars, Chairs, Faces, Fruits, Planes, Tables), under image generalization, object generalization, and category generalization. Bar plot: Population similarity to IT (RDM correlation), axis 0.0 to 0.9, for Pixels, V1-like, SIFT, HMAX, V2-like, HMO, V4 units, and IT units split-half; the HMAX model bar is annotated]
Explanatory power of HMO model
Current maximum expected explanatory power *
Yamins, Hong, Solomon, Seibert and DiCarlo (under review)
Inspired by N. Kriegeskorte et al. (2008, 2009)
[Figure, panels a-d: Goodness of fit to IT response (r²). Bar plot, axis 0.0 to 0.8, for Pixels, V1-like, SIFT, HMAX, V2-like, HMO, HMO (M1 IT only), HMO (M2 IT only), and IT split-half. Histogram of r² across units (Number of units, n = 147). Bar plot, axis 0.0 to 0.5, for Pixels, V1-like, SIFT, HMAX, V2-like, and HMO under image generalization, object generalization, and category generalization. Example units across Animals, Boats, Cars, Chairs, Faces, Fruits, Planes, Tables: Unit 1: r² = 0.48; Unit 2: r² = 0.55; Unit 3: r² = 0.34]
Goodness of fit to individual IT unit’s response (r²)
Yamins, Hong, Solomon, Seibert and DiCarlo (under review)
Ability to predict IT responses to new images and new objects is dramatically better than previous models.
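One way to make “predicting responses to new images” concrete is to fit a linear map from model features to a site's response on some images and score it on held-out images. The sketch below is illustrative only: the slides do not state the fitting procedure, so ridge regression is swapped in as a simple stand-in, r² is computed as the coefficient of determination, and the data are synthetic placeholders rather than recordings. The names `fit_ridge` and `r_squared` are hypothetical.

```python
import numpy as np

def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression; the intercept is handled by centering."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b

def r_squared(y_true, y_pred):
    """Coefficient of determination on the given (held-out) data."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(0)
n_images, n_features = 200, 20
features = rng.normal(size=(n_images, n_features))    # stand-in for model features per image
true_w = rng.normal(size=n_features)
response = features @ true_w + 0.5 * rng.normal(size=n_images)  # synthetic "neural site"

train, test = slice(0, 150), slice(150, 200)           # fit on some images, test on new ones
w, b = fit_ridge(features[train], response[train])
r2 = r_squared(response[test], features[test] @ w + b)
print(f"held-out r^2 = {r2:.2f}")
```

Holding out whole objects or whole categories instead of individual images would give the stricter generalization tests named on the slides; only the train/test split changes.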
Predictions of single site IT responses from current best model
Response of neural site
Prediction of HMO model
[Figure: a) Basic operations: Filter (filter bank 1, 2, ..., k), Threshold & Saturate, Pool, Normalize, with parameters filter, thr, sat, pool, norm. These neural-like basic operations form each layer: L1, L2, L3]
Hierarchical Stacking
Basic bio-constrained model component inside HMO
Hubel & Wiesel (1962), Fukushima (1980); Perrett & Oram (1993); Wallis & Rolls (1997); LeCun et al. (1998); Riesenhuber & Poggio (1999); Serre, Kouh, et al. (2005), etc....
Pinto, Doukan, DiCarlo & Cox, PLoS Comp Biol (2009)
“Output” is thousands of visual features
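The filter, threshold and saturate, pool, normalize chain and its hierarchical stacking can be sketched as follows. This is a minimal sketch under stated assumptions, not the HMO model: the filters are random, the pooling is a non-overlapping max, the normalization divides by the L2 norm across filters at each location, and the function name `layer` and all parameter values are hypothetical.

```python
import numpy as np

def layer(x, filters, thr=0.0, sat=1.0, pool=2, eps=1e-6):
    """One stage: filter -> threshold & saturate -> pool -> normalize.

    x: (channels, H, W); filters: (k, channels, fh, fw). Returns (k, H', W').
    """
    k, _, fh, fw = filters.shape
    _, H, W = x.shape
    oh, ow = H - fh + 1, W - fw + 1
    # Filter: spatially local, same filter bank applied at every position.
    maps = np.empty((k, oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = x[:, i:i + fh, j:j + fw]
            maps[:, i, j] = np.tensordot(filters, patch, axes=3)
    # Threshold & saturate: clip activations to [thr, sat].
    maps = np.clip(maps, thr, sat)
    # Pool: max over non-overlapping pool x pool neighborhoods.
    ph, pw = oh // pool, ow // pool
    maps = maps[:, :ph * pool, :pw * pool].reshape(k, ph, pool, pw, pool).max(axis=(2, 4))
    # Normalize: divide by the L2 norm across filters at each location.
    norm = np.sqrt((maps ** 2).sum(axis=0, keepdims=True))
    return maps / (norm + eps)

rng = np.random.default_rng(0)
image = rng.random((1, 32, 32))                       # placeholder single-channel image
l1 = layer(image, rng.normal(size=(4, 1, 3, 3)))      # L1
l2 = layer(l1, rng.normal(size=(8, 4, 3, 3)))         # L2 takes L1's maps as input
l3 = layer(l2, rng.normal(size=(16, 8, 3, 3)))        # L3
print(l1.shape, l2.shape, l3.shape)
```

Each layer consumes the previous layer's feature maps, so the number of features grows while the spatial extent shrinks; flattening the top layer gives the feature vector that a decoder or regression would read out.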
[Figure: Ability of artificial visual features to predict IT responses (% variance explained; Explained Variance of IT Neurons, axis 0% to 50%) plotted against Performance of artificial visual features (% correct), across an exploration of the basic model class]
We are optimizing this way
The better a model performs, the better it explains IT responses.
(2013)
[Figure: Performance (simple decode) of pixels, V1-like, and HMAX features compared with the V4 population, the IT population, and human level, together with HMO, SuperVision, and Zeiler & Fergus, with a “??” marker]
Today:
computer science
neuroscience
psychophysics
Follow the performance trail...
How the brain works
Falsifiable hypotheses
Attempt to test/falsify those hypotheses
New ideas, algorithm parameters
New phenomena
Stringency of these tests is crucial.
Must include “invariance”.
The power of stringent tests to elucidate biological brains
1)
• Discover IT neuronal codes that can explain behavior
• Demonstrate that other possible codes CANNOT
• Demonstrate which computer vision features CANNOT
2)
• Driving discovery (“learning?”) of new CV features
• These are becoming more and more capable of explaining what the brain is doing
Dan Yamins Ha Hong Charles Cadieu Dave Cox Nicolas Pinto
Dan Yamins Ha Hong Ethan Solomon