training models for road scene understanding with ... · automated ground truth / cross-sensor...

58
Training models for road scene understanding with automated ground truth Dan Levi With: Noa Garnett, Ethan Fetaya, Shai Silberstein, Rafi Cohen, Shaul Oron, Uri Verner, Ariel Ayash, Kobi Horn, Vlad Golder, Tomer Peer

Upload: others

Post on 27-Jun-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Training models for road scene understanding with ... · Automated ground truth / Cross-sensor learning 1. Solve an easier problem - Run time - Completeness 2. Promise - Scalability

Training models for road scene understanding with automated ground truth

Dan Levi

With: Noa Garnett, Ethan Fetaya, Shai Silberstein, Rafi Cohen, Shaul Oron, Uri Verner, Ariel Ayash, Kobi Horn, Vlad Golder, Tomer Peer

Page 2: Training models for road scene understanding with ... · Automated ground truth / Cross-sensor learning 1. Solve an easier problem - Run time - Completeness 2. Promise - Scalability

GM Advanced Technical Center in Israel (ATCI)

Page 3: Training models for road scene understanding with ... · Automated ground truth / Cross-sensor learning 1. Solve an easier problem - Run time - Completeness 2. Promise - Scalability

Agenda

• Road scene understanding

• Acquiring training data with automated ground truth (AGT)

• Test cases:• General obstacle detection• Road segmentation• General obstacle classification• Curb detection• Freespace

• Challenges and limitations

• Summary and future work

Page 4: Training models for road scene understanding with ... · Automated ground truth / Cross-sensor learning 1. Solve an easier problem - Run time - Completeness 2. Promise - Scalability

On-board road scene understanding

Static:

• Road edge

• Road markings, complex lane understanding

• Signs

• Obstacles: clutter, construction zone cones

Dynamic:

• Classified objects (cars, pedestrians, bicycles, animals …)

• General obstacles: animals, carts

Page 5: Training models for road scene understanding with ... · Automated ground truth / Cross-sensor learning 1. Solve an easier problem - Run time - Completeness 2. Promise - Scalability

Obstacle detection: general and category based

Page 6: Training models for road scene understanding with ... · Automated ground truth / Cross-sensor learning 1. Solve an easier problem - Run time - Completeness 2. Promise - Scalability

General obstacles, freespace, road segmentation

- Road segmentation (vision)Mono-camera using semantic segmentation

- General obstacle detection - Freespace (all non-flat road delimiters)3D sensors (Stereo, Lidar)

Page 7: Training models for road scene understanding with ... · Automated ground truth / Cross-sensor learning 1. Solve an easier problem - Run time - Completeness 2. Promise - Scalability

Training Data

Revisiting Unreasonable Effectiveness of Data in Deep Learning Era [Sun et al. 2017]

Page 8: Training models for road scene understanding with ... · Automated ground truth / Cross-sensor learning 1. Solve an easier problem - Run time - Completeness 2. Promise - Scalability

Manual Annotation

• Time: ~60 min per image

• ~1000 annotators

The Cityscapes Dataset for Semantic Urban Scene Understanding [Cordts et al. 2016]

Page 9: Training models for road scene understanding with ... · Automated ground truth / Cross-sensor learning 1. Solve an easier problem - Run time - Completeness 2. Promise - Scalability

Computer graphics simulated data

• Photo-realism

• Scenario generation

Page 10: Training models for road scene understanding with ... · Automated ground truth / Cross-sensor learning 1. Solve an easier problem - Run time - Completeness 2. Promise - Scalability

Simulated data for optical flow

FlowNet: Learning Optical Flow with Convolutional Networks [Dosovitskiy et. al. 2015]

Page 11: Training models for road scene understanding with ... · Automated ground truth / Cross-sensor learning 1. Solve an easier problem - Run time - Completeness 2. Promise - Scalability

Automated ground truth(AGT) / Cross-sensor learning

Velodyne LIDAR

Page 12: Training models for road scene understanding with ... · Automated ground truth / Cross-sensor learning 1. Solve an easier problem - Run time - Completeness 2. Promise - Scalability

Depth from a single image

Semi-Supervised Deep Learning for Monocular Depth Map Prediction [Kuznietsov et al. 2017]

Page 13: Training models for road scene understanding with ... · Automated ground truth / Cross-sensor learning 1. Solve an easier problem - Run time - Completeness 2. Promise - Scalability

Map supervised road detection

Map-supervised road detection [Laddha et al. 2016]

Page 14: Training models for road scene understanding with ... · Automated ground truth / Cross-sensor learning 1. Solve an easier problem - Run time - Completeness 2. Promise - Scalability

AGT for road scene understanding – general setup

“Target” sensors

Perquisite: Full alignment and synchronization between sensors

“Supervising” sensors

Page 15: Training models for road scene understanding with ... · Automated ground truth / Cross-sensor learning 1. Solve an easier problem - Run time - Completeness 2. Promise - Scalability

AGT for road scene understanding: scheme

“Supervising” sensors:

“Target” sensors:

Task: object detection

Data

AGT:1. Compute Task on Supervising sensors:- Offline- Temporal

Ground truth

AGT:2. Project output to target sensor domain

Page 16: Training models for road scene understanding with ... · Automated ground truth / Cross-sensor learning 1. Solve an easier problem - Run time - Completeness 2. Promise - Scalability

Automated ground truth / Cross-sensor learning

1. Solve an “easier” problem- Run time- Completeness

2. Promise- Scalability- Continuous (un-bounded)

improvement

1. Challenging setup2. Annotation quality / accuracy3. Inherent limitations of “supervisor”:- Learning beyond supervisor

capabilities- Learning from the same sensor

(bootstrapping)

Page 17: Training models for road scene understanding with ... · Automated ground truth / Cross-sensor learning 1. Solve an easier problem - Run time - Completeness 2. Promise - Scalability

AGT for General obstacle detection

“Supervising” sensors:

“Target” sensors:

Task: General obs. Det.

Data

Ground truthAGT

Velodyne HDL64

Front camera

Page 18: Training models for road scene understanding with ... · Automated ground truth / Cross-sensor learning 1. Solve an easier problem - Run time - Completeness 2. Promise - Scalability

StixelNet: Monocular obstacle detection

Levi, Dan, Noa Garnett, Ethan Fetaya. StixelNet : A Deep Convolutional Network for Obstacle Detection and Road Segmentation. In BMVC 2015.

Page 19: Training models for road scene understanding with ... · Automated ground truth / Cross-sensor learning 1. Solve an easier problem - Run time - Completeness 2. Promise - Scalability

StixelNet column based approach

INPUT

OUTPUT

Page 20: Training models for road scene understanding with ... · Automated ground truth / Cross-sensor learning 1. Solve an easier problem - Run time - Completeness 2. Promise - Scalability

AGT for obstacle detection – version I

KITTI Dataset [Geiger et al. 2013]

Page 21: Training models for road scene understanding with ... · Automated ground truth / Cross-sensor learning 1. Solve an easier problem - Run time - Completeness 2. Promise - Scalability

Velodyne LIDAR

AGT for obstacle detection – version I

• Raw images: 56 sequences (50 Train, 6 Test). • 6,000 train images (every 5th frame) and 800 test.• Ground truth result:• After GT: 331K training columns and 57K testing

Page 22: Training models for road scene understanding with ... · Automated ground truth / Cross-sensor learning 1. Solve an easier problem - Run time - Completeness 2. Promise - Scalability

AGT for general obstacles in image plane

Figure taken from [Fernandes et al. 2015]

1. Project Lidar points to image plane 2. Interpolate depth to all pixels

3. Find columns with depth profile typical to transition: road roughly vertical obstacle

Page 23: Training models for road scene understanding with ... · Automated ground truth / Cross-sensor learning 1. Solve an easier problem - Run time - Completeness 2. Promise - Scalability

General obstacles AGT examples

Limitations:- Cannot handle: close obstacles, ‘’clear’’ columns- Low coverage (~30%)

Page 24: Training models for road scene understanding with ... · Automated ground truth / Cross-sensor learning 1. Solve an easier problem - Run time - Completeness 2. Promise - Scalability

INPUT OUTPUT

64200

5

3

Max pooling (8X4)

45

5

11

1

1024

Dense Dense Dense

Layer 1:convolution

al

Layer 2:convolution

al

Layer 3: fully connected

Layer 4:fully

connected

Layer 5:fully

connected

11

5

24

370

2048

50

3

Max pooling (4X3)

StixelNet (v1) 5 Layer CNN

y

Page 25: Training models for road scene understanding with ... · Automated ground truth / Cross-sensor learning 1. Solve an easier problem - Run time - Completeness 2. Promise - Scalability
Page 26: Training models for road scene understanding with ... · Automated ground truth / Cross-sensor learning 1. Solve an easier problem - Run time - Completeness 2. Promise - Scalability

Experimental Results (Max Probability)

Page 27: Training models for road scene understanding with ... · Automated ground truth / Cross-sensor learning 1. Solve an easier problem - Run time - Completeness 2. Promise - Scalability

Stereo [Badino et al. 2009]“StixelNet”

Comparison with stereo

Page 28: Training models for road scene understanding with ... · Automated ground truth / Cross-sensor learning 1. Solve an easier problem - Run time - Completeness 2. Promise - Scalability

AGT for Obstacle classification

“Supervising” sensors:

“Target” sensors:

Task: Obstacle classification

Data

Ground truthAGT

Front camera

Page 29: Training models for road scene understanding with ... · Automated ground truth / Cross-sensor learning 1. Solve an easier problem - Run time - Completeness 2. Promise - Scalability

AGT for obstacle classification

Image based detection

Source: http://self-driving-future.com/the-eyes/velodyne/

Lidar based verification

Page 30: Training models for road scene understanding with ... · Automated ground truth / Cross-sensor learning 1. Solve an easier problem - Run time - Completeness 2. Promise - Scalability

Obstacle classification trained net result: pedestrians

Page 31: Training models for road scene understanding with ... · Automated ground truth / Cross-sensor learning 1. Solve an easier problem - Run time - Completeness 2. Promise - Scalability

AGT for General obstacle detection (ver 2)

“Supervising” sensors:

“Target” sensors:

Task: General obs. Det.

Data

Ground truthAGT

Velodyne HDL64

Front camera

Page 32: Training models for road scene understanding with ... · Automated ground truth / Cross-sensor learning 1. Solve an easier problem - Run time - Completeness 2. Promise - Scalability

Unified network: StixelNet + Object detection + Object pose estimation

Noa Garnett, Shai Silberstein, Shaul Oron, Ethan Fetaya, Uri Verner, Ariel Ayash, Vlad Goldner, Rafi Cohen, Kobi Horn, Dan Levi. Real-time category-based and general obstacle detection for autonomous driving. CVRSUAD Workshop, ICCV2017.

Page 33: Training models for road scene understanding with ... · Automated ground truth / Cross-sensor learning 1. Solve an easier problem - Run time - Completeness 2. Promise - Scalability

Object-centric obstacle detection AGTEstimate and subtract road

plane

3D Clustering (objects above

20cm)

Project to image, smooth

Bottom contour via dynamic

programming

Detect clear columns:No object above 5cm + far enough returns

‘’near’’ obstacles:1. Below lidar

coverage2. During training

Page 34: Training models for road scene understanding with ... · Automated ground truth / Cross-sensor learning 1. Solve an easier problem - Run time - Completeness 2. Promise - Scalability

General obstacles: old vs. new AGT

Page 35: Training models for road scene understanding with ... · Automated ground truth / Cross-sensor learning 1. Solve an easier problem - Run time - Completeness 2. Promise - Scalability

New general obstacle dataset with fisheye lens camera

#images #instances (columns)

Kitti--train 6K 5M

Internal-train

16K 20M

Kitti-test 760 11K

Internal-test 910 19K

Page 36: Training models for road scene understanding with ... · Automated ground truth / Cross-sensor learning 1. Solve an easier problem - Run time - Completeness 2. Promise - Scalability

StixelNet2: New network architecture

Page 37: Training models for road scene understanding with ... · Automated ground truth / Cross-sensor learning 1. Solve an easier problem - Run time - Completeness 2. Promise - Scalability

Improved results on KITTI

Page 38: Training models for road scene understanding with ... · Automated ground truth / Cross-sensor learning 1. Solve an easier problem - Run time - Completeness 2. Promise - Scalability

0

0.2

0.4

0.6

0.8

1

Kitti - max Pr. Internal - max Pr. kitti - avg. Pr Internal - avg. Pr

Old New

Experimental results with new AGT

Page 39: Training models for road scene understanding with ... · Automated ground truth / Cross-sensor learning 1. Solve an easier problem - Run time - Completeness 2. Promise - Scalability

Experimental results with new AGT

0

0.2

0.4

0.6

0.8

1

Kitti - max Pr. Internal - max Pr. kitti - avg. Pr Internal - avg. Pr

Chart TitleOld New

Edge cases excluded (“near”, “clear”)

Page 40: Training models for road scene understanding with ... · Automated ground truth / Cross-sensor learning 1. Solve an easier problem - Run time - Completeness 2. Promise - Scalability

Cross dataset generalization

0

0.2

0.4

0.6

0.8

1

kitti-test max kitti-test avg internal-testmax

internal-testavg

Train on KITTI Train on internal Train on both

Page 41: Training models for road scene understanding with ... · Automated ground truth / Cross-sensor learning 1. Solve an easier problem - Run time - Completeness 2. Promise - Scalability

AGT for car pose estimation

“Supervising” sensors:

“Target” sensors:

Task: pose estimation

Data

Ground truthAGT

IMU

Page 42: Training models for road scene understanding with ... · Automated ground truth / Cross-sensor learning 1. Solve an easier problem - Run time - Completeness 2. Promise - Scalability

AGT for pose estimation

Source: http://self-driving-future.com/the-eyes/velodyne/

Multi sensor, temporal object detection

8 orientation bins pose representation

Dynamic Static

Page 44: Training models for road scene understanding with ... · Automated ground truth / Cross-sensor learning 1. Solve an easier problem - Run time - Completeness 2. Promise - Scalability

AGT for Curb detection

Page 45: Training models for road scene understanding with ... · Automated ground truth / Cross-sensor learning 1. Solve an easier problem - Run time - Completeness 2. Promise - Scalability

Curb detection trained net result examples

Page 46: Training models for road scene understanding with ... · Automated ground truth / Cross-sensor learning 1. Solve an easier problem - Run time - Completeness 2. Promise - Scalability

AGT for freespace

“Supervising” sensors:

“Target” sensors:

Task: freespace

Data

Ground truthAGT

Page 47: Training models for road scene understanding with ... · Automated ground truth / Cross-sensor learning 1. Solve an easier problem - Run time - Completeness 2. Promise - Scalability

AGT for freespace with 3D beams

Analyze single Lidar “Beam”

Estimate and subtract road plane

Project freespace limit to image plane, find ‘’near’’ and ‘’clear’’

Project limit to ground plane

Velodyne

Velodyne scan direction

Page 48: Training models for road scene understanding with ... · Automated ground truth / Cross-sensor learning 1. Solve an easier problem - Run time - Completeness 2. Promise - Scalability

Obstacles vs. Freespace AGT

Page 49: Training models for road scene understanding with ... · Automated ground truth / Cross-sensor learning 1. Solve an easier problem - Run time - Completeness 2. Promise - Scalability

Freespace + object detection + car 3D pose

Page 50: Training models for road scene understanding with ... · Automated ground truth / Cross-sensor learning 1. Solve an easier problem - Run time - Completeness 2. Promise - Scalability

Freespace + object detection + car 3D pose

Page 51: Training models for road scene understanding with ... · Automated ground truth / Cross-sensor learning 1. Solve an easier problem - Run time - Completeness 2. Promise - Scalability

Freespace + object detection + car 3D pose

Page 52: Training models for road scene understanding with ... · Automated ground truth / Cross-sensor learning 1. Solve an easier problem - Run time - Completeness 2. Promise - Scalability

Freespace + object detection + car 3D pose

Page 53: Training models for road scene understanding with ... · Automated ground truth / Cross-sensor learning 1. Solve an easier problem - Run time - Completeness 2. Promise - Scalability

Freespace + object detection + car 3D pose

Page 54: Training models for road scene understanding with ... · Automated ground truth / Cross-sensor learning 1. Solve an easier problem - Run time - Completeness 2. Promise - Scalability

Finetuning from AGT: road segmentation

1. Fine-tune on KITTI Road segmentation (manually labelled)

2. Graph-cut segmentation

3. State-of-the-art accuracy among non-anonymous (94.88% MaxF)

Page 55: Training models for road scene understanding with ... · Automated ground truth / Cross-sensor learning 1. Solve an easier problem - Run time - Completeness 2. Promise - Scalability

AGT challenges: How accurate is the AGT?

Page 56: Training models for road scene understanding with ... · Automated ground truth / Cross-sensor learning 1. Solve an easier problem - Run time - Completeness 2. Promise - Scalability

AGT challenges: calibration, synchronization

Page 57: Training models for road scene understanding with ... · Automated ground truth / Cross-sensor learning 1. Solve an easier problem - Run time - Completeness 2. Promise - Scalability

AGT Perception mistakes

Non-flat road

Assumptions / coverage

Page 58: Training models for road scene understanding with ... · Automated ground truth / Cross-sensor learning 1. Solve an easier problem - Run time - Completeness 2. Promise - Scalability

Thank you!