![Page 1: Recognition Pipeline and Object Detection Scalability · Recognition Pipeline Motivation Easy to use vision algorithms without actually writing vision code Easy to write vision algorithms](https://reader036.vdocuments.us/reader036/viewer/2022063009/5fc15d97f612ff0b646da692/html5/thumbnails/1.jpg)
Recognition Pipeline and Object Detection Scalability
Marius Muja
University of British Columbia
Summer 2010 Internship Presentation
![Page 2: Recognition Pipeline and Object Detection Scalability · Recognition Pipeline Motivation Easy to use vision algorithms without actually writing vision code Easy to write vision algorithms](https://reader036.vdocuments.us/reader036/viewer/2022063009/5fc15d97f612ff0b646da692/html5/thumbnails/2.jpg)
Recognition Pipeline
● Motivation
  ● Easy to use vision algorithms without actually writing vision code
  ● Easy to write vision algorithms without much knowledge of the rest of the system
  ● “Plug and play”, swappable vision algorithms
  ● Each algorithm is a “building block” that consumes input(s), produces some output(s) and is configured by a set of parameters
    – Implemented as nodelets for efficiency reasons
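A minimal sketch of the building-block idea, written in plain Python rather than as ROS nodelets. The `Block` class, `run_pipeline` and both toy blocks are hypothetical names chosen for illustration; they are not the API of the 'recognition_pipeline' package.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Data = Dict[str, Any]


@dataclass
class Block:
    """Hypothetical building block: named inputs in, named outputs out, plus parameters."""
    name: str
    run: Callable[[Data, Data], Data]
    params: Data = field(default_factory=dict)

    def __call__(self, inputs: Data) -> Data:
        return self.run(inputs, self.params)


def threshold_detector(inputs: Data, params: Data) -> Data:
    # Toy "detector": every value above the threshold counts as a detection.
    return {"detections": [v for v in inputs["image"] if v > params["threshold"]]}


def keep_best(inputs: Data, params: Data) -> Data:
    # Toy "merger": keep only the strongest detections.
    ranked = sorted(inputs["detections"], reverse=True)
    return {"detections": ranked[: params["max_detections"]]}


def run_pipeline(blocks: List[Block], inputs: Data) -> Data:
    # Each block reads what it needs from the shared data and adds its outputs.
    data = dict(inputs)
    for block in blocks:
        data.update(block(data))
    return data


pipeline = [
    Block("detector", threshold_detector, {"threshold": 10}),
    Block("merger", keep_best, {"max_detections": 2}),
]
result = run_pipeline(pipeline, {"image": [3, 12, 40, 7, 25]})
```

Swapping a detector then amounts to replacing one `Block` in the list, which is the "plug and play" property described above.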
![Page 3: Recognition Pipeline and Object Detection Scalability · Recognition Pipeline Motivation Easy to use vision algorithms without actually writing vision code Easy to write vision algorithms](https://reader036.vdocuments.us/reader036/viewer/2022063009/5fc15d97f612ff0b646da692/html5/thumbnails/3.jpg)
Example
[Diagram: example pipeline built from the blocks Stereo Camera, Attention Operator, Detector 1, Detector 2, Detector 3, Detection Merger, Pose Estimator, Model Fitter and Grasping Pipeline]
![Page 4: Recognition Pipeline and Object Detection Scalability · Recognition Pipeline Motivation Easy to use vision algorithms without actually writing vision code Easy to write vision algorithms](https://reader036.vdocuments.us/reader036/viewer/2022063009/5fc15d97f612ff0b646da692/html5/thumbnails/4.jpg)
Recognition Pipeline Components
[Diagram: inputs and outputs of each component type; labels as they appear on the slide]
● Detector: image, point_cloud, rois/masks, detections, poses
● Attention Operator: image, point_cloud, rois, masks
● Pose estimator: image, point_cloud, detections, poses
![Page 5: Recognition Pipeline and Object Detection Scalability · Recognition Pipeline Motivation Easy to use vision algorithms without actually writing vision code Easy to write vision algorithms](https://reader036.vdocuments.us/reader036/viewer/2022063009/5fc15d97f612ff0b646da692/html5/thumbnails/5.jpg)
Example
● Adding a new object detector to the recognition pipeline
![Page 6: Recognition Pipeline and Object Detection Scalability · Recognition Pipeline Motivation Easy to use vision algorithms without actually writing vision code Easy to write vision algorithms](https://reader036.vdocuments.us/reader036/viewer/2022063009/5fc15d97f612ff0b646da692/html5/thumbnails/6.jpg)
Recognition Pipeline Design
● Components
  ● ROS-independent vision algorithms
  ● ROS wrappers for those algorithms (nodelets)
● Features
  ● Easy to include additional algorithms as plugins (dynamically loadable/unloadable)
  ● Built-in model persistence (currently using postgresql and sqlite3 databases or the file system)
  ● TrainerServer – framework and GUI for training new models
● Located in the 'recognition_pipeline' package
![Page 7: Recognition Pipeline and Object Detection Scalability · Recognition Pipeline Motivation Easy to use vision algorithms without actually writing vision code Easy to write vision algorithms](https://reader036.vdocuments.us/reader036/viewer/2022063009/5fc15d97f612ff0b646da692/html5/thumbnails/7.jpg)
Binarized Gradient Grids (BiGG)
● Goal: fast and scalable object detection for rigid, non-articulated objects
● A template-based object detection method
![Page 8: Recognition Pipeline and Object Detection Scalability · Recognition Pipeline Motivation Easy to use vision algorithms without actually writing vision code Easy to write vision algorithms](https://reader036.vdocuments.us/reader036/viewer/2022063009/5fc15d97f612ff0b646da692/html5/thumbnails/8.jpg)
BiGG – Algorithm Outline
[Diagram: BiGG] Input image → Compute Gradient Image → Discretize Gradients → Filter Noisy Gradients → Compute Summary Image → Sliding Window Matching → Detections
Template Database (used by Sliding Window Matching)
![Page 9: Recognition Pipeline and Object Detection Scalability · Recognition Pipeline Motivation Easy to use vision algorithms without actually writing vision code Easy to write vision algorithms](https://reader036.vdocuments.us/reader036/viewer/2022063009/5fc15d97f612ff0b646da692/html5/thumbnails/9.jpg)
BiGG – Algorithm Outline
● Using gradient information instead of pixel values to be more robust to illumination changes
[Figure panels: Magnitude, Orientation]
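A minimal sketch of the gradient step, assuming a grayscale image stored as a list of rows and simple central differences. The slides do not say which gradient operator is used, so `gradient_image` is only an illustration.

```python
import math


def gradient_image(img):
    """Return (magnitude, orientation) grids; border pixels are left at zero."""
    h, w = len(img), len(img[0])
    mag = [[0.0] * w for _ in range(h)]
    ori = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (img[y][x + 1] - img[y][x - 1]) / 2.0  # horizontal difference
            gy = (img[y + 1][x] - img[y - 1][x]) / 2.0  # vertical difference
            mag[y][x] = math.hypot(gx, gy)
            ori[y][x] = math.atan2(gy, gx)
    return mag, ori


# Vertical edge: dark on the left, bright on the right.
img = [[0, 0, 10, 10] for _ in range(4)]
mag, ori = gradient_image(img)

# The same scene under brighter illumination (constant offset added).
brighter = [[v + 50 for v in row] for row in img]
mag_b, ori_b = gradient_image(brighter)
```

Here `mag` and `ori` are identical for `img` and `brighter`, which illustrates why gradients are less sensitive to illumination changes than raw pixel values.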
![Page 10: Recognition Pipeline and Object Detection Scalability · Recognition Pipeline Motivation Easy to use vision algorithms without actually writing vision code Easy to write vision algorithms](https://reader036.vdocuments.us/reader036/viewer/2022063009/5fc15d97f612ff0b646da692/html5/thumbnails/10.jpg)
BiGG – Algorithm Outline
● Discretize each gradient orientation into 8 bins
● Use only orientation information to be more robust to contrast changes
● Makes the algorithm robust to slight rotation changes
● Use bit operations for fast matching
● Ignore polarity of orientation
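The discretization can be sketched as follows, assuming each orientation is stored as a one-hot byte so that two orientations can be compared with a single AND. The function name `orientation_bit` and the exact bin boundaries are assumptions, not taken from the slides.

```python
import math

N_BINS = 8  # one bit per orientation bin, so a pixel fits in one byte


def orientation_bit(theta):
    """Map an angle in radians to a one-hot byte, ignoring polarity."""
    folded = theta % math.pi  # theta and theta + pi land in the same bin
    index = int(folded / math.pi * N_BINS) % N_BINS
    return 1 << index


a = orientation_bit(0.1)                # light-to-dark edge
b = orientation_bit(0.1 + math.pi)      # same edge with opposite polarity
c = orientation_bit(math.pi / 2 + 0.1)  # roughly perpendicular edge

same_orientation = (a & b) != 0
different_orientation = (a & c) != 0
```

Each bin spans 22.5 degrees, so a small rotation usually leaves the bit unchanged, and the magnitude is never consulted, so contrast changes do not affect the code.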
![Page 11: Recognition Pipeline and Object Detection Scalability · Recognition Pipeline Motivation Easy to use vision algorithms without actually writing vision code Easy to write vision algorithms](https://reader036.vdocuments.us/reader036/viewer/2022063009/5fc15d97f612ff0b646da692/html5/thumbnails/11.jpg)
BiGG – Algorithm Outline
● Only use pixels with the magnitude above a certain threshold to be robust to noise
[Figure panels: Magnitude, Orientation, Discretized Orientation]
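A small sketch of the magnitude threshold combined with the 8-bin encoding, assuming weak pixels are stored as 0 (no orientation bit set). The `binarize` name and the threshold value are illustrative assumptions.

```python
import math


def binarize(mag, ori, threshold, n_bins=8):
    """One-hot orientation byte for strong gradients, 0 for weak ones."""
    out = []
    for mag_row, ori_row in zip(mag, ori):
        row = []
        for m, theta in zip(mag_row, ori_row):
            if m > threshold:
                index = int((theta % math.pi) / math.pi * n_bins) % n_bins
                row.append(1 << index)
            else:
                row.append(0)  # below threshold: treated as noise
        out.append(row)
    return out


mag = [[0.5, 6.0], [12.0, 0.0]]
ori = [[0.3, 0.0], [math.pi / 2, 1.0]]
grid = binarize(mag, ori, threshold=5.0)
```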
![Page 12: Recognition Pipeline and Object Detection Scalability · Recognition Pipeline Motivation Easy to use vision algorithms without actually writing vision code Easy to write vision algorithms](https://reader036.vdocuments.us/reader036/viewer/2022063009/5fc15d97f612ff0b646da692/html5/thumbnails/12.jpg)
BiGG – Algorithm Outline
● Noisy gradients are filtered by non-maxima suppression on 3x3 cells
● Discard singleton values (shot noise)
● A summary image is computed by down-sampling the discretized gradient image
  ● Split the image into n×n cells (n=8)
  ● OR the gradients in each cell
  ● Speeds up the matching and makes it more robust to small shifts
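A sketch of the last two steps under stated assumptions: "singleton" is interpreted here as a pixel whose orientation bit appears in none of its 8 neighbours (the slides do not define it precisely), and the summary image ORs the bytes in each n×n cell. Both function names are hypothetical.

```python
def discard_singletons(grid):
    """Zero out pixels whose orientation bit is absent from all 8 neighbours."""
    h, w = len(grid), len(grid[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            neighbours = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = y + dy, x + dx
                    if (dy or dx) and 0 <= ny < h and 0 <= nx < w:
                        neighbours |= grid[ny][nx]
            out[y][x] = grid[y][x] & neighbours
    return out


def summary_image(grid, n=8):
    """Down-sample by OR-ing all orientation bytes inside each n x n cell."""
    h, w = len(grid), len(grid[0])
    out = []
    for cy in range(0, h, n):
        row = []
        for cx in range(0, w, n):
            cell = 0
            for y in range(cy, min(cy + n, h)):
                for x in range(cx, min(cx + n, w)):
                    cell |= grid[y][x]
            row.append(cell)
        out.append(row)
    return out


grid = [
    [1, 1, 0, 0],
    [0, 0, 0, 4],
    [0, 0, 0, 0],
    [2, 0, 0, 0],
]
filtered = discard_singletons(grid)       # the isolated 4 and 2 are removed
summary = summary_image(grid, n=2)        # small n only to keep the example tiny
summary_filtered = summary_image(filtered, n=2)
```

Because a summary cell keeps every orientation seen anywhere inside it, a template shifted by a few pixels still finds its bits in the same cell.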
![Page 13: Recognition Pipeline and Object Detection Scalability · Recognition Pipeline Motivation Easy to use vision algorithms without actually writing vision code Easy to write vision algorithms](https://reader036.vdocuments.us/reader036/viewer/2022063009/5fc15d97f612ff0b646da692/html5/thumbnails/13.jpg)
BiGG – Algorithm Outline
● Slide a template over the image and compute response at each location
● The score is computed by an AND operation between the template and the image region
● If the score is above a threshold, the location is considered a detection
● Apply non-maxima suppression to eliminate overlapping detections
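The matching step can be sketched like this. The slides only say the score comes from an AND between template and image region; counting the cells where that AND is non-zero is an assumption, as are the function names and the greedy overlap rule used for non-maxima suppression.

```python
def match_score(template, image, top, left):
    """Number of template cells sharing at least one orientation bit with the image."""
    score = 0
    for ty, t_row in enumerate(template):
        for tx, t in enumerate(t_row):
            if t & image[top + ty][left + tx]:
                score += 1
    return score


def sliding_window(template, image, threshold):
    """Return (score, top, left) for every window scoring at least the threshold."""
    th, tw = len(template), len(template[0])
    hits = []
    for top in range(len(image) - th + 1):
        for left in range(len(image[0]) - tw + 1):
            s = match_score(template, image, top, left)
            if s >= threshold:
                hits.append((s, top, left))
    return hits


def non_maxima_suppression(hits, th, tw):
    """Greedily keep the best hit and drop any hit whose window overlaps a kept one."""
    kept = []
    for s, top, left in sorted(hits, reverse=True):
        if all(abs(top - kt) >= th or abs(left - kl) >= tw for _, kt, kl in kept):
            kept.append((s, top, left))
    return kept


template = [[1, 2], [4, 8]]
image = [
    [1, 3, 2, 0],
    [0, 12, 8, 0],
    [0, 0, 0, 0],
]
hits = sliding_window(template, image, threshold=3)
detections = non_maxima_suppression(hits, th=2, tw=2)
```

In this toy image the window at column 0 scores 3 and the one at column 1 scores 4; they overlap, so only the stronger one survives suppression.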
![Page 14: Recognition Pipeline and Object Detection Scalability · Recognition Pipeline Motivation Easy to use vision algorithms without actually writing vision code Easy to write vision algorithms](https://reader036.vdocuments.us/reader036/viewer/2022063009/5fc15d97f612ff0b646da692/html5/thumbnails/14.jpg)
BiGG Limitations
● Sliding window approach
  ● Large image search space
● Not scalable
  ● Number of templates grows linearly with the number of objects
  ● Large template search space (for a large number of templates)
![Page 15: Recognition Pipeline and Object Detection Scalability · Recognition Pipeline Motivation Easy to use vision algorithms without actually writing vision code Easy to write vision algorithms](https://reader036.vdocuments.us/reader036/viewer/2022063009/5fc15d97f612ff0b646da692/html5/thumbnails/15.jpg)
Scaling BiGG
● Binarized Gradient Grids Pyramid
  ● Use a pyramid of binarized gradient images instead of a single down-sampled gradient image
  ● Index the templates in a tree structure that mirrors the image pyramid
    – Low-resolution templates on the root nodes, high-resolution templates on leaf nodes
  ● Reduces both the image search space and the template search space
![Page 16: Recognition Pipeline and Object Detection Scalability · Recognition Pipeline Motivation Easy to use vision algorithms without actually writing vision code Easy to write vision algorithms](https://reader036.vdocuments.us/reader036/viewer/2022063009/5fc15d97f612ff0b646da692/html5/thumbnails/16.jpg)
BiGGPy
[Diagram: the two pipelines side by side]
BiGG: Input image → Compute Gradient Image → Discretize Gradients → Filter Noisy Gradients → Compute Summary Image → Sliding Window Matching → Detections
BiGGPy: Input image → Compute Gradient Image → Discretize Gradients → Filter Noisy Gradients → Compute Image Pyramid → Pyramid Matching → Detections
![Page 17: Recognition Pipeline and Object Detection Scalability · Recognition Pipeline Motivation Easy to use vision algorithms without actually writing vision code Easy to write vision algorithms](https://reader036.vdocuments.us/reader036/viewer/2022063009/5fc15d97f612ff0b646da692/html5/thumbnails/17.jpg)
Computing Image Pyramid
● Each level of the pyramid is computed by OR-ing together 2x2 cells from the lower level
● Templates are indexed in a tree that mirrors the structure of the image pyramid
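The pyramid construction can be sketched as repeated 2x2 OR-pooling of the binarized gradient grid. `next_level` and `build_pyramid` are illustrative names, and storing the finest level first is an arbitrary choice made for this example.

```python
def next_level(grid):
    """Build the next (coarser) level by OR-ing each 2x2 block of cells."""
    h, w = len(grid), len(grid[0])
    out = []
    for y in range(0, h, 2):
        row = []
        for x in range(0, w, 2):
            cell = 0
            for yy in range(y, min(y + 2, h)):
                for xx in range(x, min(x + 2, w)):
                    cell |= grid[yy][xx]
            row.append(cell)
        out.append(row)
    return out


def build_pyramid(grid, levels):
    """pyramid[0] is the input grid; each later entry halves the resolution."""
    pyramid = [grid]
    for _ in range(levels - 1):
        pyramid.append(next_level(pyramid[-1]))
    return pyramid


base = [
    [1, 0, 0, 0],
    [0, 2, 0, 0],
    [0, 0, 4, 0],
    [0, 0, 0, 8],
]
pyramid = build_pyramid(base, levels=3)
```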
![Page 18: Recognition Pipeline and Object Detection Scalability · Recognition Pipeline Motivation Easy to use vision algorithms without actually writing vision code Easy to write vision algorithms](https://reader036.vdocuments.us/reader036/viewer/2022063009/5fc15d97f612ff0b646da692/html5/thumbnails/18.jpg)
Pyramid Matching
● Start at top level with sliding window matching
  ● Fast due to low resolution of the gradient image and few templates of low resolution
● For each detection on the top level, search the next level in that neighborhood using the children of the template that matched at the top level
● Repeat the previous step for all the levels
● Return detections on the lowest level
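A coarse-to-fine sketch of these steps under several assumptions: the pyramid is stored finest level first, a hit requires a fraction of the non-empty template cells to match, and the "neighborhood" is taken as one cell either side of the projected position. `TemplateNode`, `search` and `pyramid_match` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List

Grid = List[List[int]]


@dataclass
class TemplateNode:
    label: str
    grid: Grid  # template at this node's pyramid level
    children: List["TemplateNode"] = field(default_factory=list)


def search(node, image, tops, lefts, min_fraction):
    """Positions in tops x lefts where enough non-empty template cells match."""
    th, tw = len(node.grid), len(node.grid[0])
    active = sum(1 for row in node.grid for t in row if t)
    hits = []
    for top in tops:
        for left in lefts:
            if not (0 <= top <= len(image) - th and 0 <= left <= len(image[0]) - tw):
                continue  # window would fall outside the image
            matched = sum(
                1
                for ty, row in enumerate(node.grid)
                for tx, t in enumerate(row)
                if t & image[top + ty][left + tx]
            )
            if matched >= min_fraction * active:
                hits.append((top, left))
    return hits


def pyramid_match(roots, pyramid, min_fraction=1.0):
    """pyramid[0] is the finest level, pyramid[-1] the coarsest."""
    coarse = pyramid[-1]
    frontier = []
    for root in roots:  # full sliding window only at the top level
        for pos in search(root, coarse, range(len(coarse)), range(len(coarse[0])), min_fraction):
            frontier.append((root, pos))
    for level in range(len(pyramid) - 2, -1, -1):
        refined = []
        for node, (top, left) in frontier:
            tops = range(2 * top - 1, 2 * top + 2)    # assumed neighbourhood
            lefts = range(2 * left - 1, 2 * left + 2)
            for child in node.children:  # only children of templates that matched
                for pos in search(child, pyramid[level], tops, lefts, min_fraction):
                    refined.append((child, pos))
        frontier = refined
    return sorted({(node.label, pos) for node, pos in frontier})


fine = [
    [0, 0, 0, 0],
    [0, 0, 1, 2],
    [0, 0, 4, 8],
    [0, 0, 0, 0],
]
pyramid = [fine, [[0, 3], [0, 12]]]  # second level is the 2x2 OR of the first
root_a = TemplateNode("A", [[15]], [TemplateNode("A", [[1, 2], [4, 8]])])
root_b = TemplateNode("B", [[16]], [TemplateNode("B", [[16, 16], [16, 16]])])
detections = pyramid_match([root_a, root_b], pyramid)
```

Template "B" fails at the top level, so its high-resolution child is never evaluated, and template "A" is only tested near the two coarse hits rather than at every fine position.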
![Page 19: Recognition Pipeline and Object Detection Scalability · Recognition Pipeline Motivation Easy to use vision algorithms without actually writing vision code Easy to write vision algorithms](https://reader036.vdocuments.us/reader036/viewer/2022063009/5fc15d97f612ff0b646da692/html5/thumbnails/19.jpg)
Image Search Space Reduction
![Page 20: Recognition Pipeline and Object Detection Scalability · Recognition Pipeline Motivation Easy to use vision algorithms without actually writing vision code Easy to write vision algorithms](https://reader036.vdocuments.us/reader036/viewer/2022063009/5fc15d97f612ff0b646da692/html5/thumbnails/20.jpg)
Template Space Search Reduction
[Diagram: template tree with two nodes marked "Detection candidate"]
![Page 21: Recognition Pipeline and Object Detection Scalability · Recognition Pipeline Motivation Easy to use vision algorithms without actually writing vision code Easy to write vision algorithms](https://reader036.vdocuments.us/reader036/viewer/2022063009/5fc15d97f612ff0b646da692/html5/thumbnails/21.jpg)
Demo
Demo...
![Page 22: Recognition Pipeline and Object Detection Scalability · Recognition Pipeline Motivation Easy to use vision algorithms without actually writing vision code Easy to write vision algorithms](https://reader036.vdocuments.us/reader036/viewer/2022063009/5fc15d97f612ff0b646da692/html5/thumbnails/22.jpg)
Other work
● Deformable Part Models object detector (P. Felzenszwalb, R. Girshick, D. McAllester, D. Ramanan)
  ● One of the top performers in the VOC challenge
  ● Wrapped it to work inside the recognition pipeline ('dpm_detector' package)
● Scales linearly with the number of objects
● High training time
![Page 23: Recognition Pipeline and Object Detection Scalability · Recognition Pipeline Motivation Easy to use vision algorithms without actually writing vision code Easy to write vision algorithms](https://reader036.vdocuments.us/reader036/viewer/2022063009/5fc15d97f612ff0b646da692/html5/thumbnails/23.jpg)
Future Work
● Integrate BiGGPy with VFH (Viewpoint Feature Histogram) Classifier (in progress)
  ● VFH would filter out false positives and estimate the pose of the object
● Do a quantitative evaluation on a large object dataset
  ● Confirm the sub-linear scalability with respect to the number of objects
● Use the compute cluster to scale to a very large number of objects
![Page 24: Recognition Pipeline and Object Detection Scalability · Recognition Pipeline Motivation Easy to use vision algorithms without actually writing vision code Easy to write vision algorithms](https://reader036.vdocuments.us/reader036/viewer/2022063009/5fc15d97f612ff0b646da692/html5/thumbnails/24.jpg)
Thank you!