bair retreat 3/28/17 · learning a universal driving policy • self driving as egomotion...

108
BAIR Retreat 3/28/17 Trevor Darrell UC Berkeley

Upload: others

Post on 05-Jun-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use

BAIR Retreat 3/28/17

Trevor DarrellUC Berkeley

Page 2: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use

Overview

Adversarial Domain Adaptation

Learning end-to-end driving models from crowdsourced dashcams

Vision and Language: Learning to reason to answer and explain

2

Page 3: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use

Adversarial Domain Adaptation

3

Trevor

Darrell

UC Berkeley

Judy

HoffmanUC Berkeley/

Stanford

Eric

TzengUC Berkeley

Page 4: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use

Source Data

backpack chair bike

fc8conv1 conv5 source

data

fc6

fc7

classificationloss

Adapting across domains ?

Page 5: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use

Source Data

backpack chair bike

Target Data

?

fc8conv1 conv5

fc6

fc7

Adapting across domains ?

• Applying source classifier to target domain can yield inferior performance…

classificationloss

Page 6: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use

Source Data

backpack chair bike

Target Databackpack

?

fc8

conv1 conv5fc6

fc7 labeled

target data

fc8conv1 conv5 source

data

fc6

fc7

classificationloss

shar

ed

shar

ed

shar

ed

shar

ed

shar

ed

Adapting across domains ?

• Fine tune? …..Zero or few labels in target domain

• Siamese network?…..No paired / aligned instance examples!

Page 7: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use

Adapting across domains: minimize discrepancy

[ICCV 2015]

object classifier

Page 8: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use

Adapting across domains: minimize discrepancy

[ICCV 2015]

object classifier

Discrepency

Page 9: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use

Adapting across domains: minimize discrepancy

[ICCV 2015]

object classifier

domain classifier

Page 10: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use

Adapting across domains: minimize discrepancy

[ICCV 2015]

domain classifier

Page 11: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use

Adapting across domains: minimize discrepancy

[ICCV 2015]

domain classifier

Page 12: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use

Adapting across domains: minimize discrepancy

[ICCV 2015]

domain classifier

Page 13: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use

Adapting across domains: minimize discrepancy

[ICCV 2015]

object classifier

Page 14: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use

Source Data

backpac

kchair bike

fc

8conv1 conv5

fc

6

fc

7labeled target

data

fc

8conv1 conv5source

data

fc

D

fc

6

fc

7

classification

loss

domain

confusion

loss

domain

classifier

loss

share

d

share

d

share

d

share

d

share

d

Domain Label Cross-entropy with uniform distribution

Adversarial Training of domain label predictor

and domain confusion loss:

Deep domain confusion

Unlabeled Target Data

?

[Tzeng ICCV15 ]

Page 15: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use

Deep domain confusion

Train a network to minimize classification loss AND confuse two domains

[Tzeng ICCV15 ]

source

inputs

target

inputs

network

parameters

(fixed)

domain

classifier

(learn)domain classifier loss

domain classifier prediction

network

parameters (learn)

domain confusion loss

= 𝑝(𝑦𝐷 = 1|𝑥)

(cross-entropy with uniform distribution)

domain

classifier (fixed)

16

iterate

Page 16: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use

Source Data + Labels

backpac

kchair bike

Unlabeled Target Data

?

Encoder

Cla

ssifie

r

Encoderclassification

loss

Adversarial Discriminative Domain Adaptation (ADDA)

Discriminator Adversarial

loss

Which loss?Shared or not?Generative or

discriminative?GAN

(in submission)

Page 17: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use

Adversarial Discriminative Domain Adaptation (ADDA)

source images + labels

Clas

sifie

r

Pre-training

classlabel

source images

SourceCNN

Dis

crim

inat

or

Adversarial Adaptation

domainlabel

TargetCNN

target images

Clas

sifie

r

Testing

classlabel

TargetCNN

target image

SourceCNN

(in submission)

Page 18: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use

ADDA: Adaptation on digits (in submission)

Page 19: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use

ADDA: Adaptation on RGB-D (in submission)

Train on RGB

Test on depth

Page 20: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use

Autonomous Driving Paradigms

1) Learn affordances to predict state; apply rules or learned classic controllers

2) Abandon engineering principles, learn “end-to-end” policy

Page 21: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use

Autonomous Driving Paradigms

1) Learn affordances to predict state; apply rules or learned classic controllers

How can visual sensing be robust to new enviroments?

2) Abandon engineering principles, learn “end-to-end” policy

How to learn generic driving policies from diverse data?

Page 22: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use

Autonomous Driving Paradigms

1) Learn affordances to predict state; apply rules or learned classic controllers

How can visual sensing be robust to new enviroments? …Fully Convolutional Domain Adaptation “in the wild”

2) Abandon engineering principles, learn “end-to-end” policy

How to learn generic driving policies from diverse data?…Learning end-to-end driving policy/model from crowdsourced videos

Page 23: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use

BDD Dataset

BDD Video BDD Segmentation

• 720p 30fps 40s video clips

• ~50K clips

• GPS + IMU

• 720p images

• Fine instance segmentation

• Compatible with Cityscapes

Page 24: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use

In-domain fully supervised FCN

Train on Cityscapes, Test on Cityscapes

Page 25: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use

Domain shift: Cityscapes to SF

Train on Cityscapes, Test on San Francisco Dashcam

Page 26: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use

No tunnels in CityScapes?...

Page 27: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use
Page 28: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use

Medium Shift: Cross Seasons Adaptation

Page 29: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use

Small Shift: Cross City Adaptation

Page 30: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use

Before domain confusion After domain confusion

Effect of domain confusion loss

Page 31: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use

Before domain confusion After domain confusion

Effect of domain confusion loss

Page 32: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use

Before domain confusion After domain confusion

Effect of domain confusion loss

Page 33: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use

Before domain confusion After domain confusion

Effect of domain confusion loss

Page 34: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use

BDD Dataset – static

Page 35: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use

Overview

Adversarial Domain Adaptation

Learning end-to-end driving models from crowdsourced dashcams

Vision and Language: Learning to reason to answer and explain

37

Page 36: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use

Learning and Adapting from Large-Scale Driving Data

• Fully Convolutional Domain Adaptation “in the wild”

• Learning end-to-end driving policy/model from dashcam videos

Page 37: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use

End-to-End Paradigm

• ALVINN

• DAVE

• NVIDIA

• BDD RC Cars

• BDD WebCam

Page 38: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use

AVLINN(1989)

Page 39: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use

DAVE (2003)

Yann LeCun, Eric Cosatto, Jan Ben, Urs Muller, Beat Flepp: End-to-End Learning of Vision-Based Obstacle Avoidance for Off-Road Robots. Delivered at the Learning@Snowbird Workshop, April 2004.

Page 40: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use

NVIDIA (2016)

Page 41: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use

[Karl Zipser]

Page 42: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use

model driving car, ‘direct’ mode

Page 43: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use

10 future

timepoints

10 future

timepoints

conv1 96 channels

384 channels

4 channels

conv2

metadata input 4 channels

camera input

fully connected 512 channels

steering motor

Page 44: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use
Page 45: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use
Page 46: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use

Driving Policy

Page 47: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use

Learning a universal driving policy

• Self driving as egomotion prediction

• Learn general driving policy that is

applicable to all car models.

• Use a large number of easily accessible

dashcam videos as self-supervision.

Page 48: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use

FCN-LSTM

Visual Encoder

• Dilated Fully Convolutional Nets could provide more spatial details than CNN

Temporal Fusion

• Fuse the visual information, vehicle state (speed and angular velocity) from

each frame

Page 49: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use

FCN-LSTM

Privileged Learning

• The model should implicitly know what

objects are in the scene

• We use the semantic segmentation mask

from Cityscapes as extra source of

supervision

• It ultimately improves the learnt

representation of the dilated FCN

Vapnik V, Vashist A. A new learning paradigm: Learning using privileged information[J]. Neural Networks the Official Journal of the International Neural Network Society, 2009, 22(5-6):544-57.

Page 50: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use

Dataset

Sample frames from the dataset

• Real first person driving

videos

• Diverse• City

• Highway

• Rainy days

• Nights and evenings

• Construction zones

Page 51: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use

Scene and Trajectory Reconstruction of

Crowd-sourced Driving Videos using

Semantic Filtered SfM

Yang Gao*, Huazhe Xu*, Christian Hane, Fisher Yu,

Trevor Darrell

Page 52: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use

Challenging Driving Videos in the Wild

Challenges

Moving Objects

Subtle behaviors

Lane changing

Slight Steering

Unknown Camera Calibration

Rolling Shutters

Page 53: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use

Existing Motion-Based Method Failed to Reject Moving Object from

the Scene

Keypoints from motion-based keypoints rejection

methods

Keypoints from our Semantic Filtered SfM pipeline. Most

moving keypoints have been filtered out.

Page 54: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use

Semantically Filtered SfM: 𝑆𝑓 2𝑀

Classical keypoints matching as points pair preference ranking

𝑀 𝑖1, 𝑖2 =1

𝑑 𝐼1, 𝑖1 , 𝑑 𝐼2, 𝑖2 2

M is the preference score over point pair 𝑖1, 𝑖2 , defined by distance between two low level descriptors 𝑑(⋅,⋅).

Classical matchings could be formulated as ranking based on 𝑀 ⋅,⋅

Semantics should be incorporated in SfM to be robust to moving objects

𝑀 𝑖1, 𝑖2 =𝑆𝑒𝑚𝑎𝑛𝑡𝑖𝑐 𝐼1,𝐼2 [𝑖1,𝑖2]

𝑑 𝐼1,𝑖1 ,𝑑 𝐼2,𝑖2 2

Use the FCN as a semantic term

𝑆𝑒𝑚𝑎𝑛𝑡𝑖𝑐 𝐼1, 𝐼2 𝑖1, 𝑖2 = 𝐹𝐶𝑁 𝐼1 𝑖1 ⋅ 𝐹𝐶𝑁(𝐼2)[𝑖2]

Page 55: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use

City Turning Example

Page 56: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use

Lots of Moving Vehicles Example

Page 57: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use

Recover the subtle car backing behavior

Page 58: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use

Experiments – Continuous Actions

Lane following: left and right

Page 59: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use

Experiments – Continuous Actions

Intersection

Page 60: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use

Experiments – Continuous Actions

Side Walk

Page 61: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use
Page 62: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use

BDD Dataset – video

Page 63: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use

Overview

Adversarial Domain Adaptation

Learning end-to-end driving models from crowdsourced dashcams

Vision and Language: Learning to reason to answer and explain

65

Page 64: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use

Explainable AI (XAI): Visual Explanations

Western Grebe

This is a Western Grebe because this bird has a long white neck, pointy yellow beak and red eye.

Laysan AlbatrosThis is a Laysan Albatross because this bird has a hooked yellow beak white neck and black back.

This bird has black and white feathers, with a white neck and a yellow beak.

Visual Explanations

Class definitions

ImageDescriptions

Image Relevance

Class discriminating

Page 65: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use

Explainable Models with Implicit Capabilities

•Translate DNN hidden state into • human-interpretable language • visualizations and exemplars

Can you park here?

Page 66: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use

Visual Question Answering

Page 67: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use
Page 68: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use

NAACL 2016: MCB with Attention• Predict spatial attentions with MCB

CN

N(R

esNet1

52

)

Tile

MCB

WE, LSTM

16k x14x14Signed

Sqrt

L2 N

orm

alization2048x14x14

2048x14x14

Co

nv, R

elu

512

x 1

4 x

14

Softm

ax

1 x

14 x

14

Weigh

ted Su

m

Co

nv

Full C

on

nected

CornMCB

16k

Softm

ax

Signed

Sqrt

L2 N

orm

alization

16k

16k

3000

2048

2048

What is the yellow food?

Attention for captioning : - K. Xu, Show, Attend and Tell: Neural Image Caption Generation with Visual Attention

Attention for VQA : - H. Xu, K. Saenko Ask, Attend and Answer: Exploring Question-Guided Spatial Attention for Visual Question Answering- J.Lu Hierarchal Question-Image Co-Attention for Visual Question Answering

Page 69: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use

Winner VQA Challenge 2016 (real open ended)

535455565758596061626364656667

UC

Be

rkel

ey &

So

ny

Nav

er

Lab

s

DLA

IT

snu

bi-

nav

erl

abs

PO

STEC

H

Bra

nd

eis

VTC

om

pu

terV

iso

n

MIL

-UT

klab

SHB

_10

26

MM

CX

VT_

CV

_Jia

sen

LV-N

US

AC

VT_

Ad

elai

de

UC

Be

rkel

ey (

DN

MN

)

CN

NA

tt

san

UC

Be

rkel

ey (

NM

N)

glo

bal

_vis

ion

vqat

eam

-dee

pe

rLST

M_N

orm

lizeC

NN

Mu

jtab

a h

asan RIT

UP

V_U

B

Bo

lei

att

vqat

eam

-lst

m_

cnn

UP

C

Test

-Ch

alle

nge

Acc

ura

cy

ou

rs

Page 70: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use

Attention Visualizations

What is the woman feeding the giraffe?Carrot[Groundtruth: Carrot]

Page 71: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use

Attention Visualizations

What color is her shirt?Purple[Groundtruth: Purple]

Page 72: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use

Attention Visualizations

What is her hairstyle for the picture?Ponytail[Groundtruth: Ponytail]

Page 73: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use

Attention Visualizations

What color is the chain on the red dress?Pink[Groundtruth: Gold]

• Correct Attention, Incorrect Fine-grained Recognition

Page 74: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use

Attention Visualizations

Is the man going to fall down?No[Groundtruth: No]

Page 75: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use

Attention Visualizations

What is the surface of the court made of?Clay[Groundtruth: Clay]

Page 76: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use

Attention Visualizations

What sport is being played?Tennis[Groundtruth: Tennis]

Page 77: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use

Attentive Explanations: Justifying Decisions and Pointing to the Evidence

Page 78: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use
Page 79: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use

Human ground truth for the textual justification task.

Page 80: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use

Human ground truth for the pointing task.

Page 81: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use

Discussing different evidence for different images.

Page 82: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use

Discussing different evidence for different questions.

Page 83: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use

Discussing different evidence for different questions.

Page 84: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use

Differentiating between some activities requires understanding

special equipment.

Page 85: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use

Differentiating between some activities requires recognizing

specific context.

Page 86: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use

Differentiating between some activities requires recognizing

specific context.

Page 87: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use

Explanations when the model predicts the wrong answer.

Page 88: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use

Explainable Models with Explicit Capabilities

Explain higher-level reasoning in DNNs

Explainable decision path for multi-task, control and planning

Provide structure and intermediate state

Can you park here?

Page 89: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use

Explainable Models with

Explicit and Implicit Capabilities

No, because there is a person in the bus.

Page 90: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use

Grounded question answering

yes

93

Is there a red shape above

a circle?

Page 91: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use

Neural nets learn lexical groundings

yes

94

[Iyyer et al. 2014, Bordes et al. 2014, Yang et al. 2015, Malinowski et al., 2015]

Is there a red shape above

a circle?

Page 92: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use

Semantic parsers learn composition

yes

97

[Wong & Mooney 2007, Kwiatkowski et al. 2010,Liang et al. 2011, A et al. 2013]

Is there a red shape above

a circle?

Page 93: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use

Neural module networks learn both!

yesIs there a red shape above

a circle?

redandand

98

Page 94: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use

Neural module networks

Is there a red shape above a circle?

red

exists true↦

above ↦

99

Page 95: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use

Neural module networks

Is there a red shape above a circle?

red

exists true↦

above ↦

circle

red above

exists

and

100

Page 96: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use

Neural module networks

yesIs there a red shape

above a circle?

red

exists true↦

above ↦

circle

red above

exists

and101

Page 97: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use

Q: Are there an equal number of large things and metal spheres?

Q: What size is the cylinder that is left of the brown

metal thingthat is left of the big sphere?

Q: There is a sphere with the same size as the metal cube; is

it made of the same material as the small red sphere?

Q: How many objects are either small cylinders or red things?

Questions in CLEVR test various aspects of visual reasoning

including attribute identification,counting, comparison, spatial

relationships, andlogical operations.

Page 98: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use

Learning to Reason: End-to-End Module

Networks for Visual Question Answering

R. Hu, J. Andreas, M. Rohrbach, T. Darrell, K. Saenko

Page 99: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use

Background

Natural language is compositional: the meaning of a sentence comes from the meanings of its components.

Different questions may require significantly different reasoning procedure.

● What kind of vehicle is the one on the left of the brown car that is next to the building?

● Why is the person running away?

Page 100: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use

Background

● Neural Module Networks: dynamic inference structure for each question

● Previous work: structure from NLP parser or parser re-ranking

● This work: learned layout policy to dynamically build a network

Page 101: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use

End-to-End Module Networks (N2NMN)

Components

● Layout policy p(l | q) with sequence-to-sequence RNN

● Neural modules with co-attention, dynamically assembled into a network

Page 102: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use

End-to-end Training

Loss

where is the softmax loss of the answer

Optimization: policy gradient method

Easier: behavior cloning from expert layout policy

Page 103: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use

Experiments on the CLEVR dataset

Page 104: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use

Cloning

expert

End-to-end

optimization

after cloning

Page 105: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use

Policy Search from scratch (no experts used)

Even without resorting to an expert policy during training, our method still achieves state-of-the-art

performance with reinforcement learning from scratch.

Page 106: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use

Accuracy v.s. Question length

Even on long questions with 30+ words, our method still achieves relatively high accuracy (Figure a).

Page 107: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use

Detailed output visualization

Page 108: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use

Overview

Adversarial Domain Adaptation

Learning end-to-end driving models from crowdsourced dashcams

Vision and Language: Learning to reason to answer and explain

113