TRANSCRIPT
Real-time visual methods for shape-based object detection and RGB-D
camera relocalisation
Walterio Mayol-Cuevas
Computer Science Department, University of Bristol
Qualcomm Seminar Series @ TU Vienna, February 2013
Work with
• Andrew Calway
• Andrew Davison
• Andrew Gee
• Denis Chekhlov
• Pished Bunnun
• Dima Damen
[Chekhlov, Pupilli, Mayol and Calway, ISVC06/CVPR07]
• Uses SIFT-like descriptors (histograms of gradients) around
Harris corners. Gets scale from SLAM = “predictive SIFT”.
[Chekhlov, Gee, Calway and Mayol ISMAR07]
SLAM with higher-level structure discovery for AR
see Andrew Gee’s PhD thesis
Rationale for the work in this talk: Useful Mapping
• Maps for robots and other intelligent systems need to be more than
collections of geometric features: from “Beautiful Maps” to Useful Maps.
• Object Recognition: semantic inference
• Activity Recognition: what happens where?
E.g. for Mapping egocentric activities
Sundaram & Mayol-Cuevas ISVC 2010/ICRA 2012
• Localisation via SLAM
• Learning action models via graphs of HoGs
• Modelling manipulation workflows
Bristol: D. Damen, A. Gee, A. Calway & W. Mayol-Cuevas
EU FP7
Two related tasks:
• 3D Object detection
• Camera relocalisation
• Both can be seen as a search + pose
computation problem.
• They happen at “different scales”:
• Object detection is likely to be for multiple objects
under various poses and scales and under partial
occlusion.
• Relocalisation is likely to be for a set of places and
where some things have moved around.
Part I: Texture-less object
detection
Texture-less object detection
O. Carmichael and M. Hebert, BMVC 2002; M. Leordeanu et al., CVPR 2007;
P. Yarlagadda et al., ECCV 2010; S. Fidler et al., ECCV 2010
Texture-less object detection
Dominant Orientation Templates: S. Hinterstoisser et al., CVPR 2010
Our motivation:
• For multiple known 3D objects
• Texture-minimal / Texture-less
• At multiple frames per second
• Scalable & invariance built-in
• And online training: in-situ operation
Detecting Texture-Minimal Objects
Tackling the tractability problem
On a “simple” image like this, the maximum number of edge
configurations can be of the order of tens of thousands of millions of
possibilities. For a 5-edgelet chain on an image with n edgelets this is
n(n−1)(n−2)(n−3)(n−4), i.e. roughly n⁵.
Key idea: use fixed paths. Instead of searching and training for the object
over every possible configuration, fixing the paths does this in a
pre-determined manner. For this image it means about 8 orders of magnitude
fewer options: only ~1.5K.
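To make the scale concrete, a quick back-of-the-envelope check in Python. This is an illustrative model only: `num_chains` counts ordered chains of distinct edgelets as n!/(n−5)!, which need not be the paper’s exact combinatorial model, and n = 150 is borrowed from the mobile-phone results later in the talk.

```python
import math

def num_chains(n: int, k: int = 5) -> int:
    """Ordered chains of k distinct edgelets among n: n! / (n - k)!"""
    return math.perm(n, k)

n = 150  # typical edgelet count, as in the mobile results later in the talk
print(f"{num_chains(n):,}")  # 70,992,003,600: tens of thousands of millions
# Ratio to the ~1.5K options left after fixing the paths:
print(round(math.log10(num_chains(n) / 1500)))  # about 8 orders of magnitude
```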
What is a fixed path?
Damen, et al. BMVC 2012
A fixed path:
• Extracts many different descriptors, i.e. lengths and edgelets’
relative orientations
• A single, fixed path still covers different objects well
Damen, et al. BMVC 2012
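As an illustrative sketch of the idea (hypothetical code, not the released implementation; the edgelet representation, angle tolerance and the descriptor layout are all assumptions): starting at an edgelet, a fixed path turns by a pre-set tuple of angles, each time jumping to the nearest edgelet along the new ray; the lengths and relative orientations collected along the way form the descriptor.

```python
import math

def trace_fixed_path(start, edgelets, angle_tuple, max_len=200.0, tol=0.05):
    """Sketch: edgelets are (x, y, orientation) triples. From `start`, cast a
    ray whose direction is the current orientation rotated by the next fixed
    angle, take the nearest edgelet lying on that ray (within `tol` radians),
    and record (segment length, relative orientation) for the descriptor."""
    chain, descriptor = [start], []
    x, y, theta = start
    for d_angle in angle_tuple:          # the fixed path: a tuple of turns
        ray = theta + d_angle
        best, best_dist = None, max_len
        for ex, ey, etheta in edgelets:
            dx, dy = ex - x, ey - y
            dist = math.hypot(dx, dy)
            # wrap the angular difference into (-pi, pi] before testing
            ang_err = (math.atan2(dy, dx) - ray + math.pi) % (2 * math.pi) - math.pi
            if 1e-6 < dist < best_dist and abs(ang_err) < tol:
                best, best_dist = (ex, ey, etheta), dist
        if best is None:
            return None                  # ray leaves the image: no constellation
        descriptor.append((best_dist, (best[2] - theta) % math.pi))
        x, y, theta = best
        chain.append(best)
    return chain, descriptor
```

A single starting edgelet thus yields one constellation per path; running all (few) fixed paths from every edgelet keeps the total count around the ~1.5K mentioned above, instead of the full combinatorial explosion.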
How to Select the Paths??
• We performed this once:
• Randomly selected 100 angle tuples
• Tested performance on an independent set of objects
• Tested # of extracted constellations + ambiguity of descriptor
• Best 6 paths were selected
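The selection protocol above can be sketched as follows (hypothetical code: `score_fn` is a stand-in for the real evaluation on the independent object set, and the path length of 4 turn angles is an assumption):

```python
import random

def select_paths(score_fn, n_candidates=100, n_keep=6, path_len=4, seed=0):
    """Draw random angle tuples, score each candidate path on held-out
    objects (more extracted constellations, less descriptor ambiguity =
    higher score), and keep the best n_keep paths."""
    rng = random.Random(seed)
    candidates = [
        tuple(rng.uniform(-math_pi, math_pi) for _ in range(path_len))
        for _ in range(n_candidates)
    ]
    ranked = sorted(candidates, key=score_fn, reverse=True)
    return ranked[:n_keep]

math_pi = 3.141592653589793  # avoids an import for this tiny sketch
```

Since this evaluation is done once, offline, its cost does not affect the online training or detection times reported next.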
On-line Training
Damen, et al. BMVC 2012
Testing
Damen, et al. BMVC 2012
30 objects & tools
Damen, et al. BMVC 2012
Results – Recall vs Precision
Damen, et al. BMVC 2012
Results - Scalability
Damen, et al. BMVC 2012
Clutter handling
Damen, et al. BMVC 2012
30 objects & tools
Detection using the N900
Bunnun, Damen, Calway, Mayol-Cuevas, ISMAR 2012
Today @ Vienna
In-situ modelling, tracking and
detection on a mobile
In-situ modelling 6D tracking Detection
Bunnun, Damen, Calway, Mayol-Cuevas, ISMAR 2012
Results on mobile phone
About 0.8 s per successful detection on images with about 150 edgelets
Bunnun, Damen, Calway, Mayol-Cuevas, ISMAR 2012
Detection primed by eye-gaze
Code Released
• C++, tested on Ubuntu and works in ROS
• http://www.cs.bris.ac.uk/~damen/MultiObjDetector.htm
Part II: 6D relocalisation
Fast RGB-D relocalisation
Exploit workspace info:
• Camera trajectory constrained by task
• Known 3D model of workspace
Our approach:
• Sample low-resolution synthetic views around known trajectory
• Pose estimation by regression over views
Gee & Mayol-Cuevas, BMVC 2012
● Operation in low-texture environments
● Industrial settings, parts and tools
● Frequent occlusions and moving objects
● Recovery during continuous motion
● User not aware of tracking failures
● Minimise computational load
Gee & Mayol-Cuevas, BMVC 2012
RGB-D SLAM baseline
Tracking:
•~30 Hz
•Fails for fast/erratic motion
SURF relocalisation:
•~4 Hz
•Camera moves before relocalisation is finished
Long gaps in trajectories!
General regression of camera pose x for input image I0 over a set of m synthetic views Ij and their poses xj, for j = 1 … m, with kernel function K and bandwidth h:

x̂ = ( Σ_{j=1}^{m} K((I0 − Ij)/h) · xj ) / ( Σ_{j=1}^{m} K((I0 − Ij)/h) )
We use a Gaussian kernel of the form

K(I0, Ij) = exp( −α · Med( ((c0 − cj)/σc)² + ((ρ0 − ρj)/σρ)² ) )

where Med(...) is the median over pixels, c and ρ denote per-pixel intensity and depth, σc and σρ are vectors of std. dev. in intensity and depth per pixel over all sample views, and α is a smoothing factor.
Gee & Mayol-Cuevas, BMVC 2012
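A minimal sketch of the regression step, assuming grayscale-only views and a simplified kernel (the paper’s kernel also uses depth and normalises by per-pixel standard deviations; `relocalise` and its signature are hypothetical). Note that naively averaging pose parameters is only valid when the weighted views are close together in pose space.

```python
import numpy as np

def relocalise(I0, views, poses, alpha=1.0):
    """Kernel regression of pose over synthetic views.
    I0: query image (H, W); views: stack of m sampled views (m, H, W);
    poses: their poses (m, d). Returns the kernel-weighted average pose."""
    diffs = (views - I0[None]) ** 2                          # per-pixel sq. error
    med = np.median(diffs.reshape(len(views), -1), axis=1)   # Med over pixels
    w = np.exp(-alpha * med)                                 # Gaussian kernel
    w = w / w.sum()
    return w @ poses                                         # weighted average
```

With 20 × 15-pixel views (as on the next slide), the median and the weighted sum are cheap enough to run at every frame, which is what makes the constant-relocalisation mode below feasible.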
Fast segmentation via constant relocalisation
Current performance:
• > 30 Hz relocalisation and segmentation
• Option to run further pose refinement if required
Future modifications:
• Improve robustness to occlusions
• Improve sampling method
Using images of 20 × 15 pixels and only regression, without further pose optimisation
Constant relocalisation:
With an RGBD camera and SLAM
Damen, Gee, Calway, Mayol-Cuevas, IROS Workshop 2011
Novelty areas highlighted
VFH
Rusu et al. IROS 2010
Damen, Gee, Calway, Mayol-Cuevas, IROS Workshop 2011
Results
Damen, Gee, Calway, Mayol-Cuevas, IROS Workshop 2011
On 3 manipulation/assembly tasks
Modelling Workflows: combining camera
localisation and object detection
Bristol: D. Damen, A. Gee, A. Calway & W. Mayol-Cuevas
Summary
• Object detection via tractable edge configuration extraction.
On-line training and amenable to mobile hardware.
• Code at: http://www.cs.bris.ac.uk/~damen/MultiObjDetector.htm
• Fast 6D pose estimation from tiny images and a regression framework.