comparing intuitive physics engine and convnetson physical...

13
Comparing Intuitive Physics Engine and ConvNets on Physical Scene Understanding Renqiao Zhang* Jiajun Wu* Chengkai Zhang Bill Freeman Josh Tenenbaum (* equal contributions) CogSci 2016

Upload: others

Post on 26-Sep-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Comparing Intuitive Physics Engine and ConvNetson Physical …blocks.csail.mit.edu/talks/blocks_slides_cogsci.pdf · Video from Facebook AI blog post. Motivation Observations: •

Comparing Intuitive Physics Engine and ConvNets on Physical Scene Understanding

Renqiao Zhang* Jiajun Wu* Chengkai Zhang Bill Freeman Josh Tenenbaum

(* equal contributions)CogSci 2016

Page 2: Comparing Intuitive Physics Engine and ConvNetson Physical …blocks.csail.mit.edu/talks/blocks_slides_cogsci.pdf · Video from Facebook AI blog post. Motivation Observations: •

Understanding Physical Events

Video from Facebook AI blog post

Page 3: Comparing Intuitive Physics Engine and ConvNetson Physical …blocks.csail.mit.edu/talks/blocks_slides_cogsci.pdf · Video from Facebook AI blog post. Motivation Observations: •

Motivation

Observations:• Humans can understand physical events very quickly• They can also transfer the knowledge to other tasks

Goal: a proper computational model for this processModels:• a intuitive physics engine (IPE) • a convolutional neural net (CNN)

Page 4: Comparing Intuitive Physics Engine and ConvNetson Physical …blocks.csail.mit.edu/talks/blocks_slides_cogsci.pdf · Video from Facebook AI blog post. Motivation Observations: •

Stimuli

Our Stimuli

The classic tower block scenario

• Battaglia, Hamrick, Tenenbaum. PNAS, 2013.• Lerer, Gross, Fergus. ICML, 2016.

Page 5: Comparing Intuitive Physics Engine and ConvNetson Physical …blocks.csail.mit.edu/talks/blocks_slides_cogsci.pdf · Video from Facebook AI blog post. Motivation Observations: •

IPE: Inverse Graphics

Graphics engine

SamplingLikelihood function

Inverted representation Rendered image Input

image

Battaglia, Hamrick, Tenenbaum. PNAS, 2013

Page 6: Comparing Intuitive Physics Engine and ConvNetson Physical …blocks.csail.mit.edu/talks/blocks_slides_cogsci.pdf · Video from Facebook AI blog post. Motivation Observations: •

IPE: Simulation

Battaglia, Hamrick, Tenenbaum. PNAS, 2013

Page 7: Comparing Intuitive Physics Engine and ConvNetson Physical …blocks.csail.mit.edu/talks/blocks_slides_cogsci.pdf · Video from Facebook AI blog post. Motivation Observations: •

Convolutional Networks

• LeNet: smaller, 2 convs and 2 linear layers• AlexNet: widely used, 5 convs and 3 linear layers• AlexNet pretrained on ImageNet

Page 8: Comparing Intuitive Physics Engine and ConvNetson Physical …blocks.csail.mit.edu/talks/blocks_slides_cogsci.pdf · Video from Facebook AI blog post. Motivation Observations: •

Observation: Asymmetry

Observations:• Networks have better overall performance.• There is an asymmetry in human accuracies.• The IPE can model the asymmetry, but not the CNNs.

Accuracies (%) of humans, IPE, LeNet, and AlexNet

Page 9: Comparing Intuitive Physics Engine and ConvNetson Physical …blocks.csail.mit.edu/talks/blocks_slides_cogsci.pdf · Video from Facebook AI blog post. Motivation Observations: •

Size of Training Set

Observations:• Networks achieves human-like accuracies with 1K images.• AlexNet performs better, but only with enough data.• Pretrained networks suffer less from the lack of data.

Accuracies (%) of networks with different sizes of training data

Page 10: Comparing Intuitive Physics Engine and ConvNetson Physical …blocks.csail.mit.edu/talks/blocks_slides_cogsci.pdf · Video from Facebook AI blog post. Motivation Observations: •

Accuracy with Limited Training Data

Still, only the IPE can model the asymmetry, not the CNNs.

Accuracies (%) of humans, IPE, LeNet, and AlexNet

Page 11: Comparing Intuitive Physics Engine and ConvNetson Physical …blocks.csail.mit.edu/talks/blocks_slides_cogsci.pdf · Video from Facebook AI blog post. Motivation Observations: •

Correlation with Human Responses

Both models correlate with human responses reasonably well.

Correlations between human and model responses (𝑝 for Pearson’s coefficients)

Page 12: Comparing Intuitive Physics Engine and ConvNetson Physical …blocks.csail.mit.edu/talks/blocks_slides_cogsci.pdf · Video from Facebook AI blog post. Motivation Observations: •

Visual Instability

Images

Results

Stable images may have different visual instability.How would models perform in these cases?

Observation: Both human and the IPE perform worse as visual instability increases, but not CNNs.

Page 13: Comparing Intuitive Physics Engine and ConvNetson Physical …blocks.csail.mit.edu/talks/blocks_slides_cogsci.pdf · Video from Facebook AI blog post. Motivation Observations: •

Generalization

Observation: Both human and the IPE perform worse as the number of blocks increases, but not CNNs.

Data Results

Can models generalize to images with a different number of blocks?