TRANSCRIPT
Real-time visual methods for shape-based object detection and RGB-D
camera relocalisation
Walterio Mayol-Cuevas
Computer Science Department, University of Bristol
Qualcomm Seminar Series @ TU Vienna, February 2013
Work with
• Andrew Calway
• Andrew Davison
• Andrew Gee
• Denis Chekhlov
• Pished Bunnun
• Dima Damen
[Chekhlov, Pupilli, Mayol and Calway, ISVC06/CVPR07]
• Uses SIFT-like descriptors (histograms of gradients) around
Harris corners. Gets scale from SLAM = “predictive SIFT”.
[Chekhlov, Gee, Calway and Mayol ISMAR07]
SLAM with higher-level structure discovery for AR
see Andrew Gee’s PhD thesis
Rationale for the work in this talk: Useful Mapping
• Maps for robots and other intelligent systems need to be more than
collections of geometric features: from “Beautiful Maps” to Useful Maps.
• Object Recognition: semantic inference
• Activity Recognition: what happens where?
E.g. for Mapping egocentric activities
Sundaram & Mayol-Cuevas ISVC 2010/ICRA 2012
• Localisation via SLAM
• Learning action models via graphs of HoGs
• Modelling manipulation workflows
Bristol: D. Damen, A. Gee, A. Calway & W. Mayol-Cuevas
EU FP7
Two related tasks:
• 3D Object detection
• Camera relocalisation
• Both can be seen as a search + pose
computation problem.
• They happen at “different scales”:
• Object detection is likely to be for multiple objects
under various poses and scales and under partial
occlusion.
• Relocalisation is likely to be for a set of places and
where some things have moved around.
Part I: Texture-less object
detection
Texture-less object detection
O. Carmichael and M. Hebert, BMVC 2002; M. Leordeanu et al., CVPR 2007;
P. Yarlagadda et al., ECCV 2010; S. Fidler et al., ECCV 2010
Texture-less object detection
Dominant Orientation Templates: S. Hinterstoisser et al., CVPR 2010
Our motivation:
• For multiple known 3D objects
• Texture-minimal / Texture-less
• At multiple frames per second
• Scalable & invariance built-in
• And online training: in-situ operation
Detecting Texture-Minimal Objects
Tackling the tractability problem
On a “simple” image like this, the maximum number of edge
configurations can be of the order of tens of thousands of millions of
possibilities. For a 5-edgelet chain on an image with n edgelets this is
n(n−1)(n−2)(n−3)(n−4), i.e. roughly n⁵.
Key idea: use fixed paths. Instead of searching and training for the object
over every possible configuration, fixing the paths does this in a
pre-determined manner. For this image it means about 8 orders of magnitude
fewer options: only ~1.5K.
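To make the scale concrete, a quick back-of-the-envelope check in Python. This is an illustrative model only: `num_chains` counts ordered chains of distinct edgelets as n!/(n−5)!, which need not be the paper’s exact combinatorial model, and n = 150 is borrowed from the mobile-phone results later in the talk.

```python
import math

def num_chains(n: int, k: int = 5) -> int:
    """Ordered chains of k distinct edgelets among n: n! / (n - k)!"""
    return math.perm(n, k)

n = 150  # typical edgelet count, as in the mobile results later in the talk
print(f"{num_chains(n):,}")  # 70,992,003,600: tens of thousands of millions
# Ratio to the ~1.5K options left after fixing the paths:
print(round(math.log10(num_chains(n) / 1500)))  # about 8 orders of magnitude
```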
What is a fixed path?
Damen, et al. BMVC 2012
A fixed path:
• Extracts many different descriptors, i.e. lengths and edgelets’
relative orientations
• A single, fixed path still covers different objects well
Damen, et al. BMVC 2012
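As an illustrative sketch of the idea (hypothetical code, not the released implementation; the edgelet representation, angle tolerance and the descriptor layout are all assumptions): starting at an edgelet, a fixed path turns by a pre-set tuple of angles, each time jumping to the nearest edgelet along the new ray; the lengths and relative orientations collected along the way form the descriptor.

```python
import math

def trace_fixed_path(start, edgelets, angle_tuple, max_len=200.0, tol=0.05):
    """Sketch: edgelets are (x, y, orientation) triples. From `start`, cast a
    ray whose direction is the current orientation rotated by the next fixed
    angle, take the nearest edgelet lying on that ray (within `tol` radians),
    and record (segment length, relative orientation) for the descriptor."""
    chain, descriptor = [start], []
    x, y, theta = start
    for d_angle in angle_tuple:          # the fixed path: a tuple of turns
        ray = theta + d_angle
        best, best_dist = None, max_len
        for ex, ey, etheta in edgelets:
            dx, dy = ex - x, ey - y
            dist = math.hypot(dx, dy)
            # wrap the angular difference into (-pi, pi] before testing
            ang_err = (math.atan2(dy, dx) - ray + math.pi) % (2 * math.pi) - math.pi
            if 1e-6 < dist < best_dist and abs(ang_err) < tol:
                best, best_dist = (ex, ey, etheta), dist
        if best is None:
            return None                  # ray leaves the image: no constellation
        descriptor.append((best_dist, (best[2] - theta) % math.pi))
        x, y, theta = best
        chain.append(best)
    return chain, descriptor
```

A single starting edgelet thus yields one constellation per path; running all (few) fixed paths from every edgelet keeps the total count around the ~1.5K mentioned above, instead of the full combinatorial explosion.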
How to Select the Paths??
• We performed this once:
• Randomly selected 100 angle tuples
• Tested performance on an independent set of objects
• Tested # of extracted constellations + ambiguity of descriptor
• Best 6 paths were selected
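The selection protocol above can be sketched as follows (hypothetical code: `score_fn` is a stand-in for the real evaluation on the independent object set, and the path length of 4 turn angles is an assumption):

```python
import random

def select_paths(score_fn, n_candidates=100, n_keep=6, path_len=4, seed=0):
    """Draw random angle tuples, score each candidate path on held-out
    objects (more extracted constellations, less descriptor ambiguity =
    higher score), and keep the best n_keep paths."""
    rng = random.Random(seed)
    candidates = [
        tuple(rng.uniform(-math_pi, math_pi) for _ in range(path_len))
        for _ in range(n_candidates)
    ]
    ranked = sorted(candidates, key=score_fn, reverse=True)
    return ranked[:n_keep]

math_pi = 3.141592653589793  # avoids an import for this tiny sketch
```

Since this evaluation is done once, offline, its cost does not affect the online training or detection times reported next.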
On-line Training
Damen, et al. BMVC 2012
Testing
Damen, et al. BMVC 2012
30 objects & tools
Damen, et al. BMVC 2012
Results – Recall vs Precision
Damen, et al. BMVC 2012
Results - Scalability
Damen, et al. BMVC 2012
Clutter handling
Damen, et al. BMVC 2012
30 objects & tools
Detection using the N900
Bunnun, Damen, Calway, Mayol-Cuevas, ISMAR 2012
Today @ Vienna
In-situ modelling, tracking and
detection on a mobile
In-situ modelling 6D tracking Detection
Bunnun, Damen, Calway, Mayol-Cuevas, ISMAR 2012
Results on mobile phone
About 0.8 s per successful detection on images with about 150 edgelets
Bunnun, Damen, Calway, Mayol-Cuevas, ISMAR 2012
Detection primed by eye-gaze
Code Released
• C++, tested on Ubuntu and works in ROS
• http://www.cs.bris.ac.uk/~damen/MultiObjDetector.htm
Part II: 6D relocalisation
Fast RGB-D relocalisation
Exploit workspace info:
• Camera trajectory constrained by task
• Known 3D model of workspace
Our approach:
• Sample low-resolution synthetic views around known trajectory
• Pose estimation by regression over views
Gee & Mayol-Cuevas, BMVC 2012
● Operation in low-texture environments
● Industrial settings, parts and tools
● Frequent occlusions and moving objects
● Recovery during continuous motion
● User not aware of tracking failures
● Minimise computational load
Gee & Mayol-Cuevas, BMVC 2012
RGB-D SLAM baseline
Tracking:
•~30 Hz
•Fails for fast/erratic motion
SURF relocalisation:
•~4 Hz
•Camera moves before relocalisation is finished
Long gaps in trajectories!
General regression of camera pose x for input image I0 over a set of m synthetic views Ij and their poses xj, for j = 1 … m, with kernel function K and bandwidth h:

x̂ = ( Σ_{j=1}^{m} K((I0 − Ij)/h) · xj ) / ( Σ_{j=1}^{m} K((I0 − Ij)/h) )
We use a Gaussian kernel of the form

K(I0, Ij) = exp( −α · Med( ((c0 − cj)/σc)² + ((ρ0 − ρj)/σρ)² ) )

where Med(...) is the median over pixels, c and ρ denote per-pixel intensity and depth, σc and σρ are vectors of std. dev. in intensity and depth per pixel over all sample views, and α is a smoothing factor.
Gee & Mayol-Cuevas, BMVC 2012
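A minimal sketch of the regression step, assuming grayscale-only views and a simplified kernel (the paper’s kernel also uses depth and normalises by per-pixel standard deviations; `relocalise` and its signature are hypothetical). Note that naively averaging pose parameters is only valid when the weighted views are close together in pose space.

```python
import numpy as np

def relocalise(I0, views, poses, alpha=1.0):
    """Kernel regression of pose over synthetic views.
    I0: query image (H, W); views: stack of m sampled views (m, H, W);
    poses: their poses (m, d). Returns the kernel-weighted average pose."""
    diffs = (views - I0[None]) ** 2                          # per-pixel sq. error
    med = np.median(diffs.reshape(len(views), -1), axis=1)   # Med over pixels
    w = np.exp(-alpha * med)                                 # Gaussian kernel
    w = w / w.sum()
    return w @ poses                                         # weighted average
```

With 20 × 15-pixel views (as on the next slide), the median and the weighted sum are cheap enough to run at every frame, which is what makes the constant-relocalisation mode below feasible.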
Fast segmentation via constant relocalisation
Current performance:
• > 30 Hz relocalisation and segmentation
• Option to run further pose refinement if required
Future modifications:
• Improve robustness to occlusions
• Improve sampling method
Using images of 20 × 15 pixels and only regression, without further pose optimisation
Constant relocalisation:
With an RGBD camera and SLAM
Damen, Gee, Calway, Mayol-Cuevas, IROS Workshop 2011
Novelty areas highlighted
VFH
Rusu et al. IROS 2010
Damen, Gee, Calway, Mayol-Cuevas, IROS Workshop 2011
Results
Damen, Gee, Calway, Mayol-Cuevas, IROS Workshop 2011
On 3 manipulation/assembly tasks
Modelling Workflows: combining camera
localisation and object detection
Bristol: D. Damen, A. Gee, A. Calway & W. Mayol-Cuevas
Summary
• Object detection via tractable edge configuration extraction.
On-line training and amenable to mobile hardware.
• Code at: http://www.cs.bris.ac.uk/~damen/MultiObjDetector.htm
• Fast 6D pose estimation from tiny images and a regression framework.