Practical Modeling and Recognition using RGB-D Cameras
Joint work with Liefeng Bo, Kevin Lai, Peter Henry, Evan Herbst, Mike Krainin, Hao Du and others @ University of Washington
Xiaofeng Ren, Dieter Fox
Intel Labs, University of Washington
June 27, 2011
At RGB-D 2010 Workshop:
3D modeling of indoor environments
RGBD-ICP matching + Loop closure; Flythrough visualization
3D modeling of everyday objects
Robot in-hand modeling through real-time registration and modeling
Robust recognition of everyday objects
Preliminary object dataset captured with RGB-D
Preliminary results on sparse distance learning
RGB-D Perception @ UW and Intel
3D modeling of objects & environments
Indoor Modeling: [Henry, Krainin, Herbst, Ren, Fox; ISER ’10]
Interactive Modeling: [Hao, Henry, Ren, Fox, Seitz; Ubicomp ’11]
Dynamic Scene Modeling: [Herbst, Ren, Fox; ICRA ’11, IROS ’11]
Object Manipulation: [Krainin, Henry, Ren, Fox; IJRR ’10]
Interactive 3D Visualization: [Cheng, Ren; ’11]
Robust recognition of everyday objects
Egocentric recognition: [Ren, Gu; CVPR ’10]
Joint object-pose recognition: [Gu, Ren; ECCV ’10]
Kernel Descriptors: [Bo, Ren, Fox; NIPS ’10, IROS ’11]
Hierarchical Kernel Descriptors: [Bo, Lai, Ren, Fox; CVPR ’11]
RGB-D Benchmark: [Lai, Bo, Ren, Fox; ICRA ’11]
Sparse distance learning: [Lai, Bo, Ren, Fox; ICRA ’11] (best vision paper)
Scalable and hierarchical recognition: [Lai, Bo, Ren, Fox; AAAI ’11]
Discovering and Learning Objects
• (Robot) capturing scenes in RGB-D over extended period of time
• 3D scene reconstruction for efficient representation
• Proper sensor models for both color and depth
• Pairwise scene differencing with sensor models and MRF clean-up
[Herbst-Henry-Ren-Fox; ICRA 2011]
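The pairwise differencing step can be illustrated with a minimal Python sketch. It is not the method of [Herbst-Henry-Ren-Fox; ICRA 2011], which the slide does not detail. It assumes two already-aligned depth maps given as nested lists of metres, a hypothetical noise model in which the depth standard deviation grows quadratically with range (`sigma_at`, with made-up constants), and a 3x3 majority vote as a crude stand-in for the MRF clean-up. Color is ignored.

```python
def sigma_at(depth_m, base=0.002, quad=0.003):
    """Hypothetical depth noise in metres; grows quadratically with range."""
    return base + quad * depth_m * depth_m


def raw_change_mask(depth_a, depth_b, k=3.0):
    """Flag pixels whose depth difference exceeds k combined sigmas."""
    mask = []
    for row_a, row_b in zip(depth_a, depth_b):
        out = []
        for da, db in zip(row_a, row_b):
            if da <= 0 or db <= 0:  # assumption: 0 marks a missing reading
                out.append(False)
                continue
            sigma = (sigma_at(da) ** 2 + sigma_at(db) ** 2) ** 0.5
            out.append(abs(da - db) > k * sigma)
        mask.append(out)
    return mask


def majority_filter(mask):
    """3x3 majority vote; a crude stand-in for MRF smoothing."""
    h, w = len(mask), len(mask[0])
    out = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            votes = total = 0
            for yy in range(max(0, y - 1), min(h, y + 2)):
                for xx in range(max(0, x - 1), min(w, x + 2)):
                    total += 1
                    votes += mask[yy][xx]
            out[y][x] = 2 * votes > total
    return out


def changed_pixels(depth_a, depth_b, k=3.0):
    """Noise-aware differencing followed by the smoothing pass."""
    return majority_filter(raw_change_mask(depth_a, depth_b, k))
```

Scaling the threshold with range keeps distant, noisier pixels from being flagged for small fluctuations, and the smoothing pass drops isolated detections while keeping the interior of a changed region.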
Discovering and Learning Objects
• Handling changed detections in multiple visits with multi-label MRF
• Matching potential objects by movements and appearance
• ICP for shape matching
• Color image recognition with kernel descriptors
• Spectral clustering for object discovery
[Herbst-Ren-Fox; IROS 2011]
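The "ICP for shape matching" step can be sketched as follows. This is a minimal illustration in Python, not the authors' implementation: it works in 2D rather than 3D for brevity, uses brute-force nearest neighbours, and all function names are hypothetical.

```python
import math


def best_rigid_2d(src, dst):
    """Least-squares rotation angle and translation mapping src[i] onto dst[i]."""
    n = len(src)
    sx = sum(p[0] for p in src) / n
    sy = sum(p[1] for p in src) / n
    dx = sum(q[0] for q in dst) / n
    dy = sum(q[1] for q in dst) / n
    cross = dot = 0.0
    for (px, py), (qx, qy) in zip(src, dst):
        ax, ay, bx, by = px - sx, py - sy, qx - dx, qy - dy
        cross += ax * by - ay * bx
        dot += ax * bx + ay * by
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)
    return theta, (dx - (c * sx - s * sy), dy - (s * sx + c * sy))


def transform(points, theta, t):
    """Rotate by theta, then translate by t."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + t[0], s * x + c * y + t[1]) for x, y in points]


def nearest(p, cloud):
    """Brute-force nearest neighbour of p in cloud."""
    return min(cloud, key=lambda q: (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)


def mean_sq_error(src, dst):
    """Mean squared distance from each src point to its nearest dst point."""
    total = 0.0
    for p in src:
        q = nearest(p, dst)
        total += (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2
    return total / len(src)


def icp_2d(src, dst, iterations=20):
    """Alternate nearest-neighbour matching and rigid re-alignment."""
    current = list(src)
    for _ in range(iterations):
        matches = [nearest(p, dst) for p in current]
        theta, t = best_rigid_2d(current, matches)
        current = transform(current, theta, t)
    return current
```

Each iteration can only lower (or keep) the mean squared nearest-neighbour distance, so the residual after alignment gives a simple shape-matching score between two candidate objects.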
RGB-D Object Dataset
300 objects from 51 categories, 250,000 RGB-D views
Cluttered scenes
[Lai-Bo-Ren-Fox; ICRA 2011]
http://www.cs.washington.edu/rgbd-dataset/ (search “rgbd” + “dataset”)
Benchmarking RGB-D Recognition
Category-Level Recognition (51 categories)
Classifier                  Shape (Depth)   Vision (RGB)   RGB-D
Linear SVM                  51.7 ± 1.8      72.7 ± 3.2     80.5 ± 2.9
Kernel SVM                  63.5 ± 2.3      72.9 ± 3.2     83.0 ± 3.7
RandomForest                65.5 ± 2.4      73.1 ± 3.7     78.5 ± 4.1
Kernel Desc. + Linear SVM   75.7 ± 2.2      76.1 ± 2.6     84.1 ± 2.2
Instance-Level Recognition (303 instances)
Classifier                  Shape (Depth)   Vision (RGB)   RGB-D
Linear SVM                  29.4 ± 0.5      90.4 ± 0.5     89.6 ± 0.5
Kernel SVM                  50.1 ± 0.9      90.8 ± 0.5     90.4 ± 0.6
RandomForest                51.6 ± 1.1      89.6 ± 0.7     90.2 ± 0.3
[Lai-Bo-Ren-Fox; ICRA 2011]
RGB-D Object Recognition
Pipeline: Image → Patch features → Image features → Recognition
Patch features: SIFT (or HOG)?
Image features: Bag-of-Words; Sparse Coding (LLC, LCC); Spatial Pyramid Matching (SPM); Efficient Match Kernel (EMK); Feed-forward Networks
Recognition: Your favorite model
Kernel Descriptors: Generalizing SIFT
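Gradient Match Kernel
$$K_{\mathrm{grad}}(P, Q) = \sum_{u \in P} \sum_{v \in Q} m_u \, m_v \, k_o(\theta_u, \theta_v) \, k_p(u, v)$$
P, Q: image patches
u, v: pixel coordinates
m_u, m_v: normalized gradient magnitude
θ_u, θ_v: gradient orientation
k_o, k_p: kernels (over orientation and over pixel position)
Includes SIFT as a special case
Avoids any “binning” issues in histogram features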
Linear kernel on SIFT descriptors
= a product of two histograms
= a product summed over all pairs of pixels
[Bo-Ren-Fox; NIPS 2010]
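The claim that a linear kernel on histogram descriptors is a product summed over all pairs of pixels can be checked numerically. The Python sketch below is an illustration, not code from [Bo-Ren-Fox; NIPS 2010]. It assumes each patch is a list of (position, magnitude, orientation) triples, uses a single spatial cell (so the position kernel is constant), and the values `gamma_o` and `n_bins` are arbitrary choices.

```python
import math


def grad_match_kernel(P, Q, k_o, k_p):
    """Sum m_u * m_v * k_o(theta_u, theta_v) * k_p(u, v) over all pixel pairs."""
    total = 0.0
    for u, m_u, th_u in P:
        for v, m_v, th_v in Q:
            total += m_u * m_v * k_o(th_u, th_v) * k_p(u, v)
    return total


def orientation_bin(theta, n_bins):
    """Index of the hard orientation bin that theta (radians) falls into."""
    return int((theta % (2 * math.pi)) / (2 * math.pi) * n_bins) % n_bins


def binned_orientation(n_bins=8):
    """Hard-binning kernel: 1 if both angles share a bin, else 0."""
    return lambda a, b: 1.0 if orientation_bin(a, n_bins) == orientation_bin(b, n_bins) else 0.0


def smooth_orientation(gamma_o=5.0):
    """Gaussian kernel on unit orientation vectors (gamma_o is an assumed value)."""
    def k(a, b):
        d2 = (math.cos(a) - math.cos(b)) ** 2 + (math.sin(a) - math.sin(b)) ** 2
        return math.exp(-gamma_o * d2)
    return k


def orientation_histogram(P, n_bins=8):
    """Magnitude-weighted orientation histogram of one patch (single cell)."""
    hist = [0.0] * n_bins
    for _, m, th in P:
        hist[orientation_bin(th, n_bins)] += m
    return hist


def constant_position(u, v):
    """Position kernel for a single spatial cell: every pixel pair counts."""
    return 1.0
```

With `binned_orientation()` and `constant_position`, `grad_match_kernel(P, Q, ...)` equals the dot product of the two orientation histograms. Swapping in `smooth_orientation()` keeps the same double sum but lets nearby orientations contribute partially, which is how the binning issue is avoided in this sketch.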
Kernel Descriptors: Image Recognition
Scene-15
KDES: 86.7%  SIFT: 82.2%
Caltech-101
KDES: 76.4%  CDBN [2]: 65.5%  SPM [1]: 64.4%  LCC [4]: 73.4%
CIFAR10
KDES: 76.0%  LCC [4]: 74.5%  mcRBM-DBN [3]: 71.0%  TCNN [5]: 73.1%
[1] Lazebnik, Schmid, Ponce, CVPR ’06.
[2] Lee, Grosse, Ranganath, Ng, ICML ’09.
[3] Ranzato & Hinton, CVPR ’10.
[4] Yu & Zhang, ICML ’10.
[5] Le, Ngiam, Chen, Chia, Koh & Ng, NIPS ’10.
• Low-dimensional approximations of match kernels
• Explicitly compute descriptors/features from patches
• Easily generalize gradient features to color, binary shape, etc.
• Outperform SIFT and sophisticated feature learning techniques
[Bo-Ren-Fox; NIPS 2010]
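The idea of replacing a match kernel by explicit low-dimensional features can be illustrated with a short Python sketch. It uses random Fourier features, a swapped-in technique chosen for brevity and not the construction of [Bo-Ren-Fox; NIPS 2010], which the slide does not spell out. The feature dimension, `gamma`, and all function names are assumptions.

```python
import math
import random


def gaussian_kernel(x, y, gamma):
    """exp(-gamma * ||x - y||^2) for equal-length tuples."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def make_feature_map(dim_in, dim_out, gamma, seed=0):
    """Random Fourier features z with z(x) . z(y) ~= exp(-gamma * ||x - y||^2)."""
    rng = random.Random(seed)
    scale = math.sqrt(2.0 * gamma)  # frequencies drawn from N(0, 2 * gamma)
    freqs = [[rng.gauss(0.0, scale) for _ in range(dim_in)] for _ in range(dim_out)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(dim_out)]
    norm = math.sqrt(2.0 / dim_out)

    def z(x):
        return [norm * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
                for w, b in zip(freqs, phases)]
    return z


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def patch_descriptor(pixels, z):
    """Sum per-pixel features so one dot product replaces the double sum."""
    feats = [z(p) for p in pixels]
    return [sum(col) for col in zip(*feats)]
```

Because the feature map is explicit, the sum of a kernel over all pixel pairs of two patches is approximated by a single dot product between two fixed-length descriptors. Those descriptors can then be fed to a linear classifier.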
Kernel Descriptors: RGB-D Recognition
Category-Level Recognition (51 categories)
Classifier                  Shape (Depth)   Vision (RGB)   RGB-D
Linear SVM                  51.7 ± 1.8      72.7 ± 3.2     80.5 ± 2.9
Kernel SVM                  63.5 ± 2.3      72.9 ± 3.2     83.0 ± 3.7
RandomForest                65.5 ± 2.4      73.1 ± 3.7     78.5 ± 4.1
Kernel Desc. + Linear SVM   75.7 ± 2.2      76.1 ± 2.6     84.1 ± 2.2
Instance-Level Recognition (303 instances)
Classifier                  Shape (Depth)   Vision (RGB)   RGB-D
Linear SVM                  29.4 ± 0.5      90.4 ± 0.5     89.6 ± 0.5
Kernel SVM                  50.1 ± 0.9      90.8 ± 0.5     90.4 ± 0.6
RandomForest                51.6 ± 1.1      89.6 ± 0.7     90.2 ± 0.3
[Bo-Lai-Ren-Fox; CVPR 2011; IROS 2011]
Toward Practical Recognition
• A mug?
• Kevin’s mug?
• A mug facing right?
• A mug with orientation (90,15,0)
… …
Scalable and Hierarchical Recognition
[Lai-Bo-Ren-Fox; AAAI 2011]
8 discrete views
continuous angles
Joint Recognition with Object-Pose Tree
• Tree structure enables efficient joint recognition
• Object-Pose tree outperforms nearest neighbor and one-vs-all (1vsA) baselines
• Joint tree-based learning outperforms separate learning
• Promising pose estimation results on generic objects
• Natural tree structure of category-instance-pose works really well
[Lai-Bo-Ren-Fox; AAAI 2011]
RGB-D Dataset: 300 objects, 51 categories, 250,000 color-depth pairs
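A category-instance-pose tree can be sketched as a top-down traversal. The Python example below is a toy illustration under stated assumptions, not the model of [Lai-Bo-Ren-Fox; AAAI 2011]: every node holds a hypothetical linear scoring vector `w`, the descent is greedy, the weights are hand-picked rather than jointly learned, and the node names are invented.

```python
def score(weights, features):
    """Linear score of a feature vector under one node's weights."""
    return sum(w * x for w, x in zip(weights, features))


def classify(node, features):
    """Descend from the root, following the best-scoring child at every level."""
    path = []
    while node.get("children"):
        name, node = max(node["children"].items(),
                         key=lambda item: score(item[1]["w"], features))
        path.append(name)
    return path


# Toy tree: category -> instance -> discrete view (all values are made up).
VIEWS = {"view_0": {"w": [0, 0, 1]}, "view_4": {"w": [0, 0, -1]}}
TOY_TREE = {"children": {
    "mug": {"w": [1, 0, 0], "children": {
        "mug_1": {"w": [0, 1, 0], "children": VIEWS},
        "mug_2": {"w": [0, -1, 0], "children": VIEWS},
    }},
    "bowl": {"w": [-1, 0, 0], "children": {
        "bowl_1": {"w": [0, 1, 0], "children": VIEWS},
    }},
}}

print(classify(TOY_TREE, [0.9, -0.4, -0.7]))  # -> ['mug', 'mug_2', 'view_4']
```

Only the children along one root-to-leaf path are scored, so the cost grows with the depth and branching of the tree rather than with the total number of category-instance-pose combinations.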
Application: Interactive LEGO
RGB-D used for object recognition and hand tracking
[Ziola-Harrison-Powledge-Lai-Bo-Ren-Fox]
Application: Chess Playing Robot
[Matuszek-Mayton-Aimi-Bo-Deisenroth-Chu-Kung-LeGrand-Smith-Fox]
RGB-D Perception: Summary
RGB-D cameras provide synchronized color and depth, making visual perception both robust and efficient.
RGB-D mapping generates detailed 3D maps in near real time and enables on-the-fly user interaction and feedback.
Kernel descriptors provide a principled way to extract rich features from pixel attributes, outperforming SIFT and leading to robust RGB-D recognition.
Robust RGB-D recognition and modeling enable interesting scenarios for object-aware interactions and applications.
RGB-D Perception: The Future?
Will RGB-D have a deep impact on vision applications?
YES! It’s already happening, faster than we can track.
Will RGB-D start a revolution in vision applications?
NO. We still need to solve recognition, segmentation, tracking, scene understanding, etc. etc.
YES! RGB-D helps address two BIG issues in computer vision: loss of 3D from projection; lighting conditions.
RGB-D helps “abstract away” many low-level problems.
Is RGB-D the future for smart vision-based systems?
Why not? At $50 today and $10 tomorrow.
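The point about "loss of 3D from projection" can be made concrete: with a depth value per pixel, the 3D position is recovered by inverting the pinhole projection. The Python sketch below assumes a pinhole camera model. The default intrinsics (`fx`, `fy`, `cx`, `cy`) are placeholder values for a 640x480 sensor, not calibration data from the talk.

```python
def backproject(u, v, depth_m, fx=525.0, fy=525.0, cx=319.5, cy=239.5):
    """Pinhole back-projection of pixel (u, v) with depth in metres to camera XYZ."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)
```

A color-only camera supplies `u` and `v` but not `depth_m`, so the same pixel is consistent with every point along a ray. The depth channel selects a single point on that ray.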