Going after object recognition performance to discover how...

James DiCarlo MD, PhD

Professor of Neuroscience
Head, Department of Brain and Cognitive Sciences
Investigator, The McGovern Institute for Brain Research
Massachusetts Institute of Technology, Cambridge MA, USA

Going after object recognition performance to discover how the ventral stream works.

“Invariance” is the crux problem

hierarchical, working system

Ventral visual stream

Systems neuroscience: the nonhuman primate model

Powerful set of visual features

Understanding the brain and discovering game-changing information-processing technology are two sides of the same coin.

[Figure: The convergence of three fields (computer science, neuroscience, psychophysics) on how the brain works. Panel labels: “When biological brains perform better than computers” and “When computers perform as well as or better than biological brains”. Cycle labels: “Falsifiable hypotheses”, “Attempt to test/falsify those hypotheses”, “New ideas, algorithm parameters”, “New phenomena”.]

A common physical source (an object) leads to many images: “identity-preserving image variation.”

• View: position, size, pose, illumination
• Clutter, occlusion, illumination
• Intraclass variation
• Deformation, articulation

Poggio, Ullman, Grossberg, Edelman, Biederman, etc.; DiCarlo and Cox, TICS (2007); Pinto, Cox, and DiCarlo, PLoS Comp Bio (2008)
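To make identity-preserving image variation concrete, here is a toy sketch. The shapes, image size and distance measure are illustrative assumptions, not stimuli from the talk. It renders a hypothetical object at two positions and compares pixel-space distances:

```python
# Toy illustration (hypothetical stimuli, not from the talk): the same object
# shifted in position can be farther apart in pixel space than two different
# objects at the same position.

def render(size, top, left, shape):
    """Place a binary shape (list of rows) into a blank size x size image."""
    img = [[0] * size for _ in range(size)]
    for r, row in enumerate(shape):
        for c, v in enumerate(row):
            if 0 <= top + r < size and 0 <= left + c < size:
                img[top + r][left + c] = v
    return img


def pixel_distance(a, b):
    """Euclidean distance between two images treated as flat pixel vectors."""
    return sum((x - y) ** 2
               for row_a, row_b in zip(a, b)
               for x, y in zip(row_a, row_b)) ** 0.5


T_SHAPE = [[1, 1, 1],
           [0, 1, 0],
           [0, 1, 0]]
CROSS = [[0, 1, 0],
         [1, 1, 1],
         [0, 1, 0]]

t_left = render(8, 2, 0, T_SHAPE)
t_right = render(8, 2, 5, T_SHAPE)      # same object, new position
cross_left = render(8, 2, 0, CROSS)     # different object, same position

same_object = pixel_distance(t_left, t_right)      # sqrt(10), about 3.16
diff_object = pixel_distance(t_left, cross_left)   # 2.0
print(same_object > diff_object)                   # True
```

Because the shifted copy lands farther from the original than a different object does, a classifier working directly on pixels would tend to confuse them; this is one way to see why invariance is hard.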


Brain-inspired computer algorithms

Examples:
• Hubel & Wiesel (1962)
• Fukushima (1980)
• Perrett & Oram (1993)
• Wallis & Rolls (1997)
• LeCun et al. (1998)
• Riesenhuber & Poggio (1999)
• Serre, Kouh, et al. (2005)

1. Selectivity (“AND”)
2. Tolerance (“OR”)

Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005
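A minimal 1-D sketch of the two operations in the style of HMAX-type models: Gaussian tuning to a template for selectivity (“AND”) and a MAX over positions for tolerance (“OR”). The template, signals and `sigma` are hypothetical; real models apply these operations to 2-D images across many feature channels and layers.

```python
import math


def s_unit(patch, template, sigma=1.0):
    """Selectivity ("AND"-like): Gaussian tuning around a stored template.
    The response is high only if all inputs match the template at once."""
    d2 = sum((p - t) ** 2 for p, t in zip(patch, template))
    return math.exp(-d2 / (2 * sigma ** 2))


def s_layer(signal, template, sigma=1.0):
    """One S unit per position: slide the template across a 1-D signal."""
    k = len(template)
    return [s_unit(signal[i:i + k], template, sigma)
            for i in range(len(signal) - k + 1)]


def c_unit(s_responses):
    """Tolerance ("OR"-like): MAX over S units tuned to the same feature
    at different positions."""
    return max(s_responses)


template = [0, 1, 1, 0]                  # hypothetical preferred feature
at_start = [0, 1, 1, 0, 0, 0, 0, 0]      # feature at position 0
at_end = [0, 0, 0, 0, 0, 1, 1, 0]        # same feature at position 4
partial = [0, 1, 0, 0, 0, 0, 0, 0]       # only part of the feature

for name, sig in [("start", at_start), ("end", at_end), ("partial", partial)]:
    print(name, round(c_unit(s_layer(sig, template)), 3))
```

The S-layer responses differ between the two positions, but the C unit gives the same full response to both, while the partial feature yields a weaker response at every position.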

FROM BIOLOGY:
• Hierarchy
• Spatially local filters
• Convolution
• Normalization
• Threshold nonlinearity
• Unsupervised learning
• ...


e.g., HMAX (Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005)

HMAX successes (~2005 and ~2007), under limited human viewing conditions (Serre, Oliva & Poggio 2007)

[Figure, circa 2007: performance scale with labels pixels, HMAX, IT population, and human level.]

~2008: But HMAX and other models failed to explain neurons

[Figure: representational similarity analysis comparing the biological ventral stream with models of the ventral stream, including the HMAX model (Kriegeskorte, Frontiers in Neuroscience (2009)).]
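Representational similarity analysis can be sketched in a few lines: build a representational dissimilarity matrix (RDM) for each system over the same images, then correlate the RDMs. The choices below (1 minus Pearson correlation as dissimilarity, Spearman rank correlation to compare RDMs) are common in RSA work but are assumptions here, and the response values are made up.

```python
def mean(xs):
    return sum(xs) / len(xs)


def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def rdm(responses):
    """Upper triangle of a representational dissimilarity matrix.
    responses[i] is the response vector (units or model features) to image i;
    dissimilarity is 1 - Pearson correlation between image pairs."""
    n = len(responses)
    return [1 - pearson(responses[i], responses[j])
            for i in range(n) for j in range(i + 1, n)]


def ranks(xs):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    out = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def rsa_score(resp_a, resp_b):
    """Spearman correlation between two RDMs (same images, same order)."""
    return pearson(ranks(rdm(resp_a)), ranks(rdm(resp_b)))


# Made-up responses of a 3-unit "population" to 4 images (illustration only).
neural = [[1.0, 2.0, 3.0], [1.0, 2.5, 2.0], [3.0, 1.0, 0.5], [2.0, 0.5, 1.0]]
model_affine = [[2 * v + 1 for v in row] for row in neural]   # same geometry
model_swapped = [neural[1], neural[0], neural[2], neural[3]]  # images 0, 1 swapped

print(rsa_score(neural, model_affine))
print(rsa_score(neural, model_swapped))
```

A model whose responses are an affine copy of the neural ones has the same RDM and scores 1, while swapping which images the model "sees" lowers the score; RSA compares representational geometry without requiring a unit-to-unit mapping.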


What went wrong?


The stringency of these “Brains vs. Machines” tests was far too low

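~2008: Tests of performance were not stringent enough.

One problem was insufficient variation in the test sets: a simple “V1-like” model performed surprisingly well on standard benchmarks such as Caltech 101.

[Figure: performance of pixels, “V1-like” models, SLF (~HMAX), “HMAX 2.0” (Serre et al. PNAS 2007), and humans on the Caltech 101 benchmark and on animal vs. non-animal (far-body) images. Pinto, Cox, and DiCarlo, PLoS Comp Bio (2008); Pinto, Majaj, Barhomi, Salomon, Cox, DiCarlo COSYNE 2010.]

[Figure: performance scale with labels pixels, V1-like, HMAX, IT population, and human level.]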
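A hypothetical sketch of a V1-like stage: filter with an oriented Gabor, threshold by half-wave rectification, and average over space. The “V1-like” models referenced above are more elaborate (for example, banks of Gabor filters at many orientations and spatial frequencies, plus normalization); the single filter, parameter values and grating stimulus here are illustrative assumptions.

```python
import math


def gabor_kernel(theta, wavelength, sigma, half):
    """2-D Gabor (Gaussian-windowed cosine) of size (2*half+1) x (2*half+1)."""
    k = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            g = math.exp(-(xr ** 2 + yr ** 2) / (2 * sigma ** 2))
            row.append(g * math.cos(2 * math.pi * xr / wavelength))
        k.append(row)
    return k


def filter_valid(img, k):
    """Correlate img with kernel k at every position where k fits entirely."""
    kh, kw = len(k), len(k[0])
    return [[sum(img[r + i][c + j] * k[i][j] for i in range(kh) for j in range(kw))
             for c in range(len(img[0]) - kw + 1)]
            for r in range(len(img) - kh + 1)]


def v1_like_response(img, theta, wavelength=4.0, sigma=2.0, half=3):
    """Filter, threshold (half-wave rectify), then average over space."""
    resp = filter_valid(img, gabor_kernel(theta, wavelength, sigma, half))
    vals = [max(0.0, v) for row in resp for v in row]
    return sum(vals) / len(vals)


# Toy stimulus: vertical grating (luminance varies along x only).
grating = [[math.cos(2 * math.pi * c / 4.0) for c in range(16)] for _ in range(16)]
print(v1_like_response(grating, theta=0.0))          # filter matched to the grating
print(v1_like_response(grating, theta=math.pi / 2))  # orthogonal filter
```

The matched orientation responds far more strongly than the orthogonal one: such features are orientation selective but only locally tolerant, which is why harder, more variable test sets are needed to separate them from ventral-stream-like representations.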

2009: More stringent, but compact, tests of “object recognition”

Example object recognition task: “car detection” (“car” vs. not “car”)

Image generation strategy:
- Parametric control of task demand (esp. invariance)
- Few images are needed to bring computer vision features to their knees

[Figure: example images at increasing variation (no variation, more variation, lots of variation; n>100, n>700); example shown: basic car task, variation level 3.]

Pinto, Cox & DiCarlo, PLoS Comp Biol (2008); Pinto, DiCarlo & Cox, ECCV (2008); Pinto, Doukan, DiCarlo & Cox, PLoS Comp Biol (2009)

2009: Toward more stringent tests of “object recognition”

Data merged here: 48 basic-level tasks (8 labels x 6 levels of variation)

Machines lose to humans

2010: Machines vs. human brains

Machines beat humans!

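Increasing composite variation (parameter values by variation level):

Level | Position (x-axis) | Position (y-axis) | Scale | In-plane rotation | In-depth rotation
0 | 0% | 0% | 0% | 0° | 0°
1 | 10% | 20% | 10% | 15° | 15°
2 | 20% | 40% | 20% | 30° | 30°
3 | 30% | 60% | 30% | 45° | 45°
4 | 40% | 80% | 40% | 60° | 60°
5 | 50% | 100% | 50% | 75° | 75°
6 | 60% | 120% | 60% | 90° | 90°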
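The composite variation can be parameterized per level. This sketch assumes the columns map in the order listed (position x 10%, position y 20%, scale 10%, in-plane and in-depth rotation 15° per level, for levels 0 to 6) and, as a further assumption, draws each parameter uniformly within ± that range. The parameter names are hypothetical.

```python
import random

# Increment per variation level for each rendering parameter (assumed mapping).
STEP = {
    "pos_x_pct": 10.0,
    "pos_y_pct": 20.0,
    "scale_pct": 10.0,
    "rot_in_plane_deg": 15.0,
    "rot_in_depth_deg": 15.0,
}


def ranges_for_level(level):
    """Maximum absolute deviation of each parameter at a variation level."""
    if not 0 <= level <= 6:
        raise ValueError("variation level must be in 0..6")
    return {name: step * level for name, step in STEP.items()}


def sample_params(level, rng):
    """Draw one set of rendering parameters, each uniform in [-max, +max].
    (Uniform +/- sampling is an assumption, not taken from the talk.)"""
    return {name: rng.uniform(-m, m) for name, m in ranges_for_level(level).items()}


rng = random.Random(0)
print(ranges_for_level(3))
print(sample_params(3, rng))
```

Tying every parameter to a single level makes task difficulty a single knob, which is what lets a small image set probe invariance in a controlled way.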

[Figure: performance vs. variation level for pixels, V1-like, PHOW, PHOG, and ~HMAX features, with chance level marked. Panels: a) “cars vs. planes” task; b) controls (new draw, more training, other objects, multi-class). Pinto, Barhomi, Cox & DiCarlo, WACV (2010).]

[Figure: performance scale, built up over several slides, with labels pixels, V1-like, HMAX, V4 population (simple decode), IT population (simple decode), HMO, SuperVision, Zeiler & Fergus (marked “?”), and human level.]
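The slides do not specify the decoder behind “simple decode”. As a hedged illustration, a nearest-class-mean classifier is one simple linear readout of population responses; the site responses below are made up.

```python
def centroid(vectors):
    """Mean response vector of one class."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def fit_nearest_centroid(train):
    """train: dict mapping label -> list of population response vectors."""
    return {label: centroid(vecs) for label, vecs in train.items()}


def decode(model, x):
    """Assign x to the class whose mean response is closest (squared Euclidean).
    For two classes the decision boundary is a hyperplane: a linear readout."""
    def d2(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(model, key=lambda label: d2(model[label]))


def accuracy(model, test):
    pairs = [(label, x) for label, vecs in test.items() for x in vecs]
    return sum(decode(model, x) == label for label, x in pairs) / len(pairs)


# Toy 3-site "population" responses (made up for illustration only).
train = {
    "car":   [[5.0, 1.0, 2.0], [6.0, 1.5, 2.5]],
    "plane": [[1.0, 5.0, 2.0], [1.5, 6.0, 1.5]],
}
test = {
    "car":   [[5.5, 0.5, 2.2]],
    "plane": [[0.8, 5.5, 1.8]],
}
model = fit_nearest_centroid(train)
print(accuracy(model, test))  # 1.0 on this toy data
```

Keeping the readout this simple puts the burden on the representation itself: if a linear decoder succeeds, the category information is already explicit in the population.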

[Figure: Neural population similarity of images along the ventral stream (inspired by N. Kriegeskorte et al. (2008, 2009)). Representations compared: pixels, V1-like model, V2-like model, HMAX model, HMO model, V4 neuronal units, IT neuronal units, and other models. Images: animals, boats, cars, chairs, faces, fruits, planes, tables (8 per category; 4 per category in one panel). Generalization conditions: image, object, and category generalization. Explanatory power of the HMO model, compared with the current maximum expected explanatory power. Yamins, Hong, Solomon, Seibert and DiCarlo (under review).]

[Figure: goodness of fit to individual IT units' responses (r²) under image generalization. Example units across the eight categories (animals, boats, cars, chairs, faces, fruits, planes, tables): unit 1, r² = 0.48; unit 2, r² = 0.55; unit 3, r² = 0.34. Distribution over n = 147. Yamins, Hong, Solomon, Seibert and DiCarlo (under review).]
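One way to compute a goodness-of-fit score like the r² values above: fit a mapping from model features to a site's response on training images, then score predictions on new images. This sketch assumes a single model feature, ordinary least squares, and r² = 1 - SS_res/SS_tot; the procedure behind the slide's numbers may differ (for example, many features and noise correction), and the data are invented.

```python
def mean(xs):
    return sum(xs) / len(xs)


def fit_line(x, y):
    """Least-squares fit y ~ slope * x + intercept on training images."""
    mx, my = mean(x), mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return slope, my - slope * mx


def r_squared(measured, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot.
    On held-out data it can be negative if predictions are worse than the mean."""
    m = mean(measured)
    ss_tot = sum((y - m) ** 2 for y in measured)
    ss_res = sum((y - p) ** 2 for y, p in zip(measured, predicted))
    return 1 - ss_res / ss_tot


# Made-up numbers: one model feature and one IT site's response (illustration only).
train_feat, train_resp = [0.1, 0.4, 0.35, 0.8], [2.0, 3.1, 3.0, 4.6]
test_feat, test_resp = [0.2, 0.6, 0.9], [2.4, 3.9, 5.0]   # new images

slope, intercept = fit_line(train_feat, train_resp)
predicted = [slope * f + intercept for f in test_feat]
print(round(r_squared(test_resp, predicted), 3))
```

Scoring only on images not used for fitting is what makes the number a test of prediction rather than of curve fitting.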

The HMO model's ability to predict IT responses to new images and new objects is dramatically better than that of previous models.

[Figure: predictions of single-site IT responses from the current best model. Example sites: response of neural site vs. prediction of HMO model.]


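Neural-like basic operations: the basic bio-constrained model component inside HMO

(a) Basic operations: filter, threshold & saturate, pool, normalize. One stage applies them in sequence:

$$\mathcal{N}(x) = \mathrm{norm}\Big(\mathrm{pool}\big(\mathrm{sat}\big(\mathrm{thr}\big(\mathrm{filter}(x)\big)\big)\big)\Big)$$

Hierarchical stacking:

$$\mathcal{M} = \mathcal{N}_k \circ \cdots \circ \mathcal{N}_2 \circ \mathcal{N}_1$$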

Hubel & Wiesel (1962); Fukushima (1980); Perrett & Oram (1993); Wallis & Rolls (1997); LeCun et al. (1998); Riesenhuber & Poggio (1999); Serre, Kouh, et al. (2005), etc.

Pinto, Doukan, DiCarlo & Cox, PLoS Comp Biol (2009)
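A minimal 1-D sketch of one filter, threshold, saturate, pool, normalize stage and of hierarchical stacking. The specific choices (clamping for threshold and saturation, max pooling, global L2 normalization, the kernels and parameter values) are assumptions for illustration, not the HMO model's actual settings.

```python
import math


def filt(x, kernel):
    """Filter: 1-D valid correlation with a kernel."""
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k)) for i in range(len(x) - k + 1)]


def thr(x, t):
    """Threshold: values below t are raised to t."""
    return [max(v, t) for v in x]


def sat(x, s):
    """Saturate: values above s are capped at s."""
    return [min(v, s) for v in x]


def pool(x, size):
    """Pool: max over non-overlapping windows (leftover samples dropped)."""
    return [max(x[i:i + size]) for i in range(0, len(x) - size + 1, size)]


def norm(x):
    """Normalize: divide by the vector's L2 norm (left unchanged if zero)."""
    n = math.sqrt(sum(v * v for v in x))
    return [v / n for v in x] if n > 0 else list(x)


def layer(x, kernel, t=0.0, s=1.0, size=2):
    """One stage: norm(pool(sat(thr(filt(x)))))."""
    return norm(pool(sat(thr(filt(x, kernel), t), s), size))


def stack(x, kernels):
    """Hierarchical stacking: each layer's output feeds the next layer."""
    for kernel in kernels:
        x = layer(x, kernel)
    return x


signal = [math.sin(i / 2) for i in range(16)]
out = stack(signal, [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]])
print(len(out), out)
```

Each stage shrinks the representation (16 samples become 7, then 2 here) while the normalization keeps the output scale fixed; in a real model each stage would also expand the number of feature channels.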

“Output” is thousands of visual features

[Figure: performance of artificial visual features (% correct) during exploration of the basic model class; annotation: “We are optimizing this way.”]

The better a model performs, the better it explains IT responses.

(2013)
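“We are optimizing this way” describes searching a model class for parameters that maximize recognition performance. The HMO procedure in the cited work is more elaborate; this sketch swaps in plain random search over a hypothetical parameter space, with a stand-in objective in place of a real benchmark.

```python
import random

# Hypothetical parameter space for a model class (names are illustrative only).
SPACE = {
    "n_filters": [16, 32, 64, 128],
    "pool_size": [2, 3, 4],
    "threshold": [0.0, 0.1, 0.2],
}


def sample_config(rng):
    return {name: rng.choice(values) for name, values in SPACE.items()}


def random_search(evaluate, n_trials, seed=0):
    """Return (best_score, best_config) over n_trials random configurations.
    `evaluate` maps a config to a performance score (e.g. % correct on a
    held-out recognition task) and is supplied by the caller."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        cfg = sample_config(rng)
        score = evaluate(cfg)
        if best is None or score > best[0]:
            best = (score, cfg)
    return best


# Stand-in objective for illustration only; not a real recognition benchmark.
def toy_performance(cfg):
    return cfg["n_filters"] / 128 - 0.1 * abs(cfg["pool_size"] - 3) - cfg["threshold"]


best_score, best_cfg = random_search(toy_performance, n_trials=50)
print(best_score, best_cfg)
```

The point of the slide's claim is what happens after this loop: configurations selected only for performance are then checked against IT data they were never fit to.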

Today:

[Figure: performance scale with labels pixels, V1-like, V4 population, SuperVision, Zeiler & Fergus, and simple decode (marked “??”).]


Follow the performance trail...


The stringency of these tests is crucial: they must include “invariance”.

The power of stringent tests to elucidate biological brains

• Discover IT neuronal codes that can explain behavior
• Demonstrate that other possible codes CANNOT
• Demonstrate which computer vision features CANNOT
• Drive discovery (“learning?”) of new CV features
• These features are becoming more and more capable of explaining what the brain is doing

Dan Yamins, Ha Hong, Charles Cadieu, Dave Cox, Nicolas Pinto

Dan Yamins, Ha Hong, Ethan Solomon
