high speed obstacle avoidance using monocular vision and reinforcement learning jeff michels...

High Speed Obstacle Avoidance using Monocular Vision and Reinforcement Learning

Jeff MichelsAshutosh SaxenaAndrew Y. Ng

Stanford University

ICML 2005.

http://ai.stanford.edu/~asaxena/rccar/

ICML 2005. Jeff Michels, Ashutosh Saxena & Andrew Y. Ng

Problem Drive a remote control car

at high speeds Unstructured outdoor

environments Off the shelf hardware,

inexpensive cameras and little processing power

Vision and Driving ControlQuickTime™ and a

TIFF (Uncompressed) decompressorare needed to see this picture.


Prior Work: Vision Estimating depth from multiple

images: Stereovision (e.g., Scharstein &

Szeliski, 2002) Depth from Defocus (e.g.,

Klarquist et al., 1995) Optical Flow/Structure from

motion (e.g., Barron et al., 1994)

Motivation #1: Monocular vision. Stereo vision has limits

baseline distance between cameras vibration and blur

We would like to explore the use of monocular cues.


Prior Work: Driving Control

Driving Stereo-vision for driving (LeCun, 2003) Highways with clear lane markings (Pomerleau, 1989) Single camera for indoor robot, but known color and texture of ground

(Gini & Marchi, 2002)

Motivation #2: Reinforcement learning Many past successes used model-based RL. Does model-based RL still make sense even for tasks requiring

complex perception? (To simulate vision input, we need to use computer graphics!)


Approach

Vision System Estimate distance to nearest obstacle in each

possible steering direction.

Driving Control Map from the output of the vision system into

steering commands for the car. Use reinforcement learning to learn the policy.


Vision System: Training Data Image divided into vertical

columns corresponding to possible steering directions.

Image labeled with depth for each vertical column

Laser range finder -- ground truth distances


Vision System: Monocular Cues Monocular Cues used by

humans for depth perception Texture Variations - Laws’ Texture Gradient (Linear

Perspective) - Radon, Harris Haze - Color Occlusion Known Object Size

(Loomis, Nature 2001)


Feature Vector: Monocular Cues Texture Variation Texture Gradient Occlusion, Object Size,

Global structure Overlapping windows Appending adjacent stripe’s

vectors

The feature vector size is 858 dimension


Learning Algorithm

Supervised learning to estimate the distance d in each column of the image.

Learn weights w via ordinary least squares with quadratic cost.

arg minw ∑i (di - wT xi )2

Other regression methods (SVR, robust regression) gave similar results

weightsdepth

featuresi = columns, images


Results: Learning Depth Estimates

Errors on a log scaleE = ∑ | log10(d) – log10(destimated) |

Able to predict depth with a average error of 0.26 orders of magnitude.

0.2

0.25

0.3

0.35

0.4

Radon(TextureGradient)

Harris(TextureGradient)

Laws(Texture

Variations)

All


Synthetic Graphics Data

Graphics images for training the vision system.

Variable degree of graphical realism

Can a system trained on synthetic images predict distances on real images?


Results: Combined Vision System

0

5

10

15

20

25

Random Graphics Real Combined

Haz

ard

Rate

(%)

When the distance to nearest obstacle in the chosen direction is less than 5 m, then it is a hazard.

Hazard rate improves by combining the real and synthetic trained system.

24% hazard rate reduction over using only real images.


Control: Reinforcement Learning Model based RL -- hard perception problem Randomly generated environment in

graphics simulator Pegasus (Ng & Jordan, 2000) to learn

control policy Car initialized at (0,0) and ran for fixed time

horizon. Learning algorithm converged after 1674

iterations of policy search.


Reinforcement Learning: Parameters

1: spatial smoothing of predicted distances

2: threshold distance for evasive action

3: steering angle parameter

4, 5: evasive action parameters

6: throttle parameter


QuickTime™ and a decompressor

are needed to see this picture.


Results: Actual Driving Experiments


QuickTime™ and aCinepak decompressor



QuickTime™ and a decompressor



Results: Driving Times

QuickTime™ and aTIFF (LZW) decompressor



Summary

Monocular depth estimation is an interesting and important problem.

Supervised learning for depth estimation. Model-based RL, using computer graphics

simulator, to learn controller.


Extensions/Future Work

Learn complete depth maps

Markov Random Field (MRF) to estimate depths.

Image Ground Truth Predicted

[also with Sung Chung.]

Learning depth from single monocular images,

Ashutosh Saxena, Sung H. Chung, Andrew Y. Ng.

In NIPS 2005.


Contact:

Ashutosh Saxena, [email protected]

http://ai.stanford.edu/~asaxena/rccar/

http://ai.stanford.edu/~asaxena/learningdepth/

high speed obstacle avoidance using monocular vision and reinforcement learning jeff michels...

Documents