Source: vision.stanford.edu/teaching/cs131_fall1617/lectures/lecture_amir.pdf

Open-World Vision

Amir R. Zamir

What, Why, and How of “Representation”

Things... Our Knowledge...

“Transcript”

Cat

Macbeth was guilty.

[ 81 20 84 64 58 39 17 54 72 15]

Representation Mathematical Model (e.g., classifier)

[Figure: animals marked (X) along a weight axis w; example weights ~8 lbs and ~12 lbs; axis labels -5, 0, 7, 11, 15, +20.]

Representation: Weight (w)

Mathematical Model (Classifier): w>11

Type B

Type A
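The weight example above pairs a one-number representation (w) with a very simple model (a threshold at 11). A minimal sketch of that idea follows. The function name is hypothetical, and the slide does not say which side of the threshold is Type A or Type B, so the mapping below is an assumption.

```python
def classify_by_weight(weight_lbs, threshold=11.0):
    """Classify from a single-number representation: the weight w.

    Assumption: w > threshold maps to "Type B". The slide only shows
    the rule "w>11" and the two labels, not which label goes where.
    """
    return "Type B" if weight_lbs > threshold else "Type A"


if __name__ == "__main__":
    # The two example weights shown on the slide.
    for w in (8.0, 12.0):
        print(w, "lbs ->", classify_by_weight(w))
```

The point of the sketch is that the classifier is trivial once the representation is good: all the discriminative information sits in w.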

Represent these cats for a cat detector!

Represent these cats for a cat detector! (II)

Represent these cats for a cat detector! (III)

Represent these cats for a cat detector! (IV)

Color Histograms

Deformable Part-based Models (DPM)

Histogram of Oriented Gradients (HOG)

Shape-based Models

Felzenszwalb et al. 2010. Dalal and Triggs, 2005. Beis and Lowe, 1997.
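Of the hand-designed representations listed above, the color histogram is the simplest to write down. The following is a minimal sketch, assuming an image is given as a list of (r, g, b) tuples with 8-bit values. The function name and the default bin count are illustrative choices, not taken from the slides.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Return a normalized joint RGB histogram as a flat list.

    pixels: iterable of (r, g, b) tuples with values in 0..255.
    The result has bins_per_channel ** 3 entries that sum to 1
    (or all zeros if there are no pixels).
    """
    bin_width = 256 // bins_per_channel
    top = bins_per_channel - 1
    hist = [0.0] * (bins_per_channel ** 3)
    count = 0
    for r, g, b in pixels:
        # Quantize each channel, clamping so 255 never overflows the last bin.
        ri = min(r // bin_width, top)
        gi = min(g // bin_width, top)
        bi = min(b // bin_width, top)
        # Flatten the 3D bin index (ri, gi, bi) into one position.
        hist[(ri * bins_per_channel + gi) * bins_per_channel + bi] += 1.0
        count += 1
    if count == 0:
        return hist
    return [h / count for h in hist]


if __name__ == "__main__":
    toy_image = [(255, 0, 0), (255, 0, 0), (0, 0, 255), (0, 255, 0)]
    print(color_histogram(toy_image, bins_per_channel=2))
```

Such a vector ignores where colors occur in the image, which is one reason the more structured representations on the slide (HOG, DPM) exist.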

Not always as easy (Happy vs Sad)

Not always as easy (Sad)

Learning Representations

Convolutional Neural Network

Autoencoder

LeCun et al. 1998. Hinton et al. 2006.

Two approaches to learning

Unsupervised: Representation constrained on reconstruction.

Supervised: Representation constrained on task(s).

LeCun et al. 1998. Hinton et al. 2006.
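The difference between the two approaches comes down to which objective constrains the representation. A minimal sketch of the two objectives follows, assuming mean squared error for reconstruction and cross-entropy for a classification task. Both are common choices, but the slides do not name specific losses, and the function names are hypothetical.

```python
import math


def reconstruction_loss(x, x_reconstructed):
    """Unsupervised objective: mean squared error between the input
    and what the model rebuilds from its representation."""
    return sum((a - b) ** 2 for a, b in zip(x, x_reconstructed)) / len(x)


def task_loss(class_probabilities, true_label):
    """Supervised objective: cross-entropy for one example,
    i.e. -log of the probability assigned to the correct class."""
    return -math.log(class_probabilities[true_label])


if __name__ == "__main__":
    print(reconstruction_loss([0.0, 0.0], [1.0, 3.0]))
    print(task_loss([0.5, 0.5], 0))
```

In the unsupervised case no label appears anywhere in the objective. In the supervised case the input itself never does.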

Stanford CS231n

Lightning overview of Neural Networks

Neural Net

A Neuron

Convolutional X
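For the "A Neuron" item above, a minimal sketch of a single artificial neuron is a weighted sum of inputs plus a bias, passed through a nonlinearity. The sigmoid used here is an assumption, since the slide does not name an activation function.

```python
import math


def neuron(inputs, weights, bias):
    """One artificial neuron: weighted sum plus bias, then a sigmoid.

    The sigmoid is an illustrative choice of nonlinearity.
    """
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    print(neuron([1.0, 2.0], [0.5, -0.25], 0.0))
```

A neural net stacks many such units in layers. A convolutional layer reuses the same weights at every image location.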

Understanding Representations

Embedding

Maaten and Hinton, 2008.

Understanding Representations

Inverting a representation

[ 81 20 84 64 58 39 17 54 72 15]

Dosovitskiy and Brox, 2015.

Representations in NLP, Brain, Speech, etc.

Word2Vec (NLP)

fMRI Scan (brain)

Mikolov et al. 2013.
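Vector representations of words, as in Word2Vec, are typically compared by the angle between them. The sketch below shows cosine similarity on made-up 3-dimensional vectors. The numbers are invented purely for illustration and are not real Word2Vec embeddings.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical toy vectors, NOT actual learned embeddings.
toy_vectors = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "car": [0.0, 0.1, 0.9],
}

if __name__ == "__main__":
    print(cosine_similarity(toy_vectors["cat"], toy_vectors["kitten"]))
    print(cosine_similarity(toy_vectors["cat"], toy_vectors["car"]))
```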

Representation Mathematical Model (e.g., classifier)

CS231n

CS331b CS229

Now that we’re done with background building…

Open-World (Generic) Representations

Open-World (Generic) Computer Vision

An Exciting Time!

Fully Supervised Learning

• Fully supervised learning is task specific.
• Will not lead to a human-like comprehensive perception.

• Characterized by Generalization & Abstraction

➔ How to develop a system/representation with Generalization & Abstraction?

How to achieve generalization & abstraction?

• Proposition:
• (Instead of providing supervision over the desired tasks)
• Provide supervision over a set of selected foundational tasks ⇒ generalization to novel tasks and abstraction capabilities.

Held & Hein. 1963.

• But how to pick the foundational tasks?
• Biology!
• Inspirations from developmental stages of visual skills in the brain

Generic 3D Representation Learning

Generic 3D Representation via Pose Estimation and Matching. Amir Zamir, Tilman Wekel, Pulkit Agrawal, Colin Wei, Jitendra Malik, Silvio Savarese. ECCV 2016.
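The title of the paper names two foundational tasks, pose estimation and matching, supervised together. One generic way to train a shared representation on two tasks is to sum their losses. The sketch below illustrates that idea only and is not the paper's actual objective: the squared-error pose loss, the binary cross-entropy matching loss, the `pose_weight` parameter, and all function names are assumptions.

```python
import math


def pose_loss(predicted_pose, true_pose):
    """Squared error between predicted and ground-truth pose vectors
    (assumed form, for illustration)."""
    return sum((p - t) ** 2 for p, t in zip(predicted_pose, true_pose))


def matching_loss(match_probability, is_match):
    """Binary cross-entropy for the decision 'do these two patches match?'."""
    # Clamp to avoid log(0) on over-confident predictions.
    p = min(max(match_probability, 1e-12), 1.0 - 1e-12)
    return -math.log(p) if is_match else -math.log(1.0 - p)


def joint_loss(predicted_pose, true_pose, match_probability, is_match,
               pose_weight=1.0):
    """Weighted sum of the two task losses; pose_weight is hypothetical."""
    return (pose_weight * pose_loss(predicted_pose, true_pose)
            + matching_loss(match_probability, is_match))


if __name__ == "__main__":
    print(joint_loss([1.0, 0.0], [0.0, 0.0], 0.5, False, pose_weight=2.0))
```

Because both terms depend on the same underlying representation, minimizing the sum pushes that representation to serve both tasks at once.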

Learn it from the world!

Dataset Coverage

The 3D Representation

Evaluations

Supervised Tasks

• Camera pose estimation
• Matching (wide-baseline)

State-of-the-art Human-level

Unsupervised Tasks

• Surface Normal
• 3D Object Pose
• 3D Scene Layout
• Visual Abstraction

State-of-the-art unsupervised

Pose (surface normal) Embedding

ImageNet (AlexNet)

Generic 3D Rep.

Krizhevsky et al. 2012. Russakovsky et al. 2015.

MIT Places

ImageNet (AlexNet) Generic 3D Rep.

Zhou et al. 2014. Krizhevsky et al. 2012. Russakovsky et al. 2015.

3D Object Pose - ImageNet

Generic 3D Rep.

Wang & Gupta. 2015. Krizhevsky et al. 2012. Russakovsky et al. 2015.

http://3drepresentation.stanford.edu/

Query Image

Generic 3D Representation

ImageNet (AlexNet)

3D Object Pose Estimation - Abstraction

Wang & Gupta. 2015. Agrawal et al. 2015. Russakovsky et al. 2015. Ozuysal et al. 2009.

3D Object Pose Estimation – Cross Category

Wang & Gupta. 2015. Agrawal et al. 2015. Russakovsky et al. 2015. Xiang et al. 2014.

3D Layout Estimation - Abstraction

Wang & Gupta. 2015. Agrawal et al. 2015. Russakovsky et al. 2015. Zhang et al. 2016.

Unsupervised Evaluations

Surface Normal Estimation (NYUv2)

Scene Layout Classification (LSUN)

Scene Layout Estimation (LSUN)

Object Pose Estimation (PASCAL3D)
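The slides do not state which metrics these evaluations use. For surface normal estimation, a commonly reported quantity is the angle between the predicted and ground-truth normal at each pixel. A minimal sketch of that per-pixel angular error follows, offered as an assumption about a typical metric rather than the one used in this work.

```python
import math


def angular_error_degrees(predicted_normal, true_normal):
    """Angle in degrees between two non-zero 3D vectors."""
    dot = sum(p * t for p, t in zip(predicted_normal, true_normal))
    norm_p = math.sqrt(sum(p * p for p in predicted_normal))
    norm_t = math.sqrt(sum(t * t for t in true_normal))
    # Clamp against rounding drift before taking the arc cosine.
    cosine = max(-1.0, min(1.0, dot / (norm_p * norm_t)))
    return math.degrees(math.acos(cosine))


if __name__ == "__main__":
    print(angular_error_degrees([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))
```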

What’s under the hood – Vanishing Points?

Mahendran & Vedaldi. 2015. Angladon et al. 2015. Denis et al. 2008. Li et al. 2010.

Matching Evaluation

Zagoruyko & Komodakis. 2015. Simo-Serra et al. 2015. Lowe. 2004. Arandjelovic & Zisserman. 2012. Wu et al. 2011. Simonyan & Zisserman. 2014. Tola et al. 2008. Morel et al. 2009.

Matching Results

Pose Regression Results

GT

Estimated

Pose Estimation Evaluation

Wu. 2011. Geiger et al. 2011. Wu et al. 2011.

• Task Taxonomy
  • Problem space is unknown.
  • Essential for the 3D-complete representation

• Proper Fusion Techniques
  • Beyond simplistic late-fusion or ConvNet fine tuning
  • Essential for the vision-complete representation

• Proper Data!

Generic 3D Representation via Pose Estimation and Matching. Amir Zamir, Tilman Wekel, Pulkit Agrawal, Colin Wei, Jitendra Malik, Silvio Savarese. ECCV 2016.

zamir@cs.stanford.edu

http://www.cs.stanford.edu/~amirz/

http://3DRepresentation.stanford.edu/
