Feed-forward deep neural networks diverge from primates on core visual object recognition behavior

Rishi Rajalingham*, Elias B. Issa1*, Kailyn Schmidt, Kohitij Kar, Pouya Bashivan, and James J. DiCarlo
Dept. of Brain and Cognitive Sciences, McGovern Institute for Brain Research, MIT
1Department of Neuroscience, Zuckerman Mind Brain Behavior Institute, Columbia University
[Figure: Consistency to Human Pool for the object-level metrics (O1, O2). Bars: V1, ALEXNET, VGG, GOOGLENET, RESNET, INCEPTION-v3, NYU. Reference lines: Human Pool, Monkey Pool (n=5 subjects), Heldout Human Pool (n=4 subjects); shaded Primate Zone. Example behavioral patterns shown for Inception-v3.]
[Figure: Label probability (0 to 1) for one example image, compared across Human, Monkey, and HCNN (Inception-v3); Human Pool and Monkey Pool (n=5) shown for reference.]
O2: Performance per object for each distracter (confusion matrix)
[Figure: Consistency to Human Pool (I1c) for V1, ALEXNET, VGG, GOOGLENET, RESNET, INCEPTION-v3, and NYU, with the Primate Zone, Monkey Pool (n=5 subjects), and Heldout Human Pool (n=4 subjects) marked.]
[Figure: Scatter plots of HCNN performance (Δd') and Monkey performance (Δd') versus Human performance (Δd'), n=240 images.]
I1c: Performance per image across distracters
Introduction

Humans can rapidly and accurately recognize objects in spite of variation in viewing parameters (e.g. position, pose, and size) and background conditions.

To understand this invariant object recognition ability, one can construct and test models that aim to accurately capture human behavior, including making the same patterns of errors over all images. Here, we applied this straightforward visual "Turing test" to the leading feed-forward computational models of human vision (hierarchical convolutional neural networks, HCNNs) and to a leading animal model (rhesus macaques).

Our results reveal a failure of current HCNNs to fully replicate the image-level behavioral patterns of primates.

Methods
Behavioral paradigm: We tested object recognition performance on a two-alternative forced choice (2AFC) match-to-sample task using brief (100ms) presentations of naturalistic synthetic images of basic-level objects.
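In a 2AFC match-to-sample design, each trial pairs a sample object with one distracter, and the subject's choice either matches the sample (a hit for that pair) or not. A minimal sketch of tallying per-pair hit rates from raw trial records (the trial tuples here are hypothetical illustrations, not the actual dataset):

```python
import collections

# Hypothetical trial records: (sample_object, distracter_object, chosen_object)
trials = [
    ("dog", "camel", "dog"),
    ("dog", "camel", "camel"),
    ("camel", "dog", "camel"),
    ("camel", "dog", "camel"),
]

# Tally 2AFC outcomes per (sample, distracter) pair: a trial counts as
# a hit when the subject chose the sample object over the distracter.
hits = collections.Counter()
counts = collections.Counter()
for sample, distracter, choice in trials:
    counts[(sample, distracter)] += 1
    if choice == sample:
        hits[(sample, distracter)] += 1

hit_rate = {pair: hits[pair] / counts[pair] for pair in counts}
```

Aggregating such pair-wise rates over images or objects is what feeds the behavioral metrics below.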
[Apparatus schematic: Android tablet, juice reward tube, Arduino controller]
High-throughput psychophysics: Over a million behavioral trials across hundreds of humans and five monkeys were aggregated to characterize each species on each image. To collect such large behavioral datasets, we used Amazon Mechanical Turk for humans and a novel home-cage touchscreen system for monkeys.
Stimuli: To enforce invariant object recognition behavior, we generated several thousand naturalistic synthetic images (2400 images, 24 objects, 100 images/object), each with one foreground object (by rendering a 3D model of each object with random viewing parameters) on a random natural image background.
[Task schematic: Fixation (100ms) → Test Image (100ms) → Choice Screen (1000ms); example HARD and EASY images shown]
[Data collection platforms: Amazon Mechanical Turk (humans); "MonkeyTurk" home-cage touchscreen (monkeys)]
Objects (24): Tank, Bird, Shorts, Elephant, House, Hanger, Pen, Hammer, Zebra, Fork, Guitar, Bear, Watch, Dog, Spider, Truck, Wrench, Rhino, Leg, Camel, Table, Calculator, Gun, Knife
O1: Performance per object across distracters
Results
[Figure: Consistency to Human Pool (I2c) for V1, ALEXNET, VGG, GOOGLENET, RESNET, INCEPTION-v3, and NYU, with the Primate Zone and Human Pool / Monkey Pool (n=5) references; example I2c pattern shown for Inception-v3.]
I2c: Performance per image for each distracter (confusion matrix)
Conclusions
Behavioral metrics: We characterized the core object recognition behavior of humans, macaques and HCNNs over a large set of images and distracter objects using several behavioral metrics. Each behavioral metric computes a pattern of unbiased behavioral performances, using a sensitivity index (d'), at the resolution of objects (O1, O2) and images (I1, I2). To isolate image-driven behavioral variance that is object independent, we normalized image-level behavioral metrics by subtracting the mean performance values over all images of the same object (I1c, I2c).
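The two computations above can be sketched directly: the sensitivity index d' is the difference of z-scored hit and false-alarm rates, and the I1c/I2c normalization subtracts each object's mean image-level performance. A minimal sketch (function names and the clipping bounds are illustrative assumptions, not the exact pipeline):

```python
import numpy as np
from scipy.stats import norm

def dprime(hit_rate, fa_rate, clip=5.0):
    """Sensitivity index d' = Z(hit rate) - Z(false-alarm rate)."""
    # Clip rates away from 0 and 1 to avoid infinite z-scores.
    hr = np.clip(hit_rate, 0.01, 0.99)
    fa = np.clip(fa_rate, 0.01, 0.99)
    return float(np.clip(norm.ppf(hr) - norm.ppf(fa), -clip, clip))

def normalize_image_metric(i1, object_of_image):
    """I1c: subtract the mean d' over all images of the same object,
    leaving only object-independent, image-driven variance."""
    i1 = np.asarray(i1, dtype=float)
    obj = np.asarray(object_of_image)
    i1c = i1.copy()
    for o in np.unique(obj):
        mask = obj == o
        i1c[mask] -= i1[mask].mean()
    return i1c
```

The same subtraction applied to per-image, per-distracter matrices yields I2c.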
Each object can produce myriad images under variations in viewing parameters, with some images being more challenging than others.
Consistent with previous work, macaque monkeys and all tested HCNN models are highly consistent with humans with respect to object-level metrics (O1, O2), while a lower level representation (V1) is less consistent.
[Figure: Normalized image-level performance (Δd') for the Human Pool and Monkey Pool versus image attributes (eccentricity, size, pose difference, mean luminance, BG spatial frequency, segmentation index); regression weights for each attribute.]
All tested HCNN models fail to accurately capture the image-level behavioral patterns of primates.
In contrast to the corresponding object-level metric (O1), all tested HCNNs diverge from primates with respect to the one-versus-all image-level behavioral pattern (I1c).
Zooming in on one example image of a pen, humans and monkeys make similar confusions (choices in object labels), while the HCNN model differs significantly.

Example image with corresponding behavioral patterns

(Left, top row) For each behavioral metric, we defined a "primate zone" as the range of consistency values delimited by extrapolated estimates of the consistency to the human pool of a large (i.e. infinitely many subjects) pool of rhesus macaque monkeys and of humans, respectively.

Thus, the primate zone defines a range of consistency values that correspond to models that accurately capture the behavior of the human pool, at least as well as a monkey pool.

(Left) Consistency at the image level could not be easily rescued by pooling together several "model subjects" (top panels) or by simple modifications (bottom panels), such as primate-like retinal input sampling or additional model training on synthetic images generated in the same manner as our test set.

(Right) Additionally, the gap in consistency could not be attributed to simple image attributes (e.g. viewpoint meta-parameters, low-level image characteristics).
[Figure: Consistency to Human Pool versus # subjects in pool, for the Monkey Pool and an HCNN pool, for the object-level metric (O2) and the image-level metric (I2c); Consistency to Human Pool versus model performance (d') for Inception-v3, early HCNN layers, a retina-sampled HCNN, and a fine-tuned HCNN, with the Primate Zone marked.]
The difference in image-level behavioral pattern between HCNNs and primates is not strongly dependent on image attributes.

• In recent years, specific HCNN models, optimized to match human-level object recognition performance, have proven to be a dramatic advance in the state of the art of artificial vision systems.
• HCNN models match primates in their core object recognition behavior with respect to object-level difficulty and object-level confusion patterns.
• HCNN models fail to match primate behavior in image-level difficulty and fail to match image-level confusion patterns.
• Attempts to rescue the image-level gap in consistency, including adding retinal sampling on the model front-end, adding various decoders on the model back-end, and supplementing model training with similar images, did not reduce the gap between models and primates.
• Furthermore, no particular property of the image (luminance, spatial frequency) or of the object in the image (size, pose) was strongly correlated with model failure to capture primate image-level performance.

Taken together, these results suggest that high-resolution behavior could serve as a strong constraint for discovering models that more precisely capture the neural mechanisms underlying primate object recognition.
To measure the behavioral consistency between two systems with respect to a given behavioral metric, we computed a noise-adjusted correlation, which accounts for the reliability (i.e. internal consistency) of behavioral measurements.
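A common form of noise-adjusted correlation divides the raw correlation between two behavioral patterns by the geometric mean of each system's reliability, where reliability can be estimated from Spearman-Brown-corrected split-half correlations. A minimal sketch under those assumptions (the exact estimator used on the poster may differ):

```python
import numpy as np

def split_half_reliability(half_a, half_b):
    """Internal consistency: correlation between metric estimates from
    two disjoint halves of the trials, Spearman-Brown corrected to the
    reliability of the full dataset."""
    r = np.corrcoef(half_a, half_b)[0, 1]
    return 2.0 * r / (1.0 + r)

def noise_adjusted_correlation(x, y, rel_x, rel_y):
    """Consistency between two systems' behavioral patterns (e.g. I1c
    vectors), normalized by the geometric mean of their reliabilities
    so that a perfectly consistent but noisy system can still reach 1."""
    r = np.corrcoef(x, y)[0, 1]
    return r / np.sqrt(rel_x * rel_y)
```

With this normalization, a model whose raw correlation to the human pool equals the measurement ceiling scores a consistency of 1.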
In contrast to the corresponding object-level metric (O2), all tested HCNNs diverge from primates with respect to the one-versus-other image-level behavioral pattern (I2c).