regionlets for generic object detection xiaoyu wang, ming yang, shenghuo zhu, and yuanqing lin nec...
TRANSCRIPT
![Page 1: Regionlets for Generic Object Detection Xiaoyu Wang, Ming Yang, Shenghuo Zhu, and Yuanqing Lin NEC Labs America, Inc. Facebook, Inc. ICCV 2013 oral paper](https://reader036.vdocuments.us/reader036/viewer/2022081513/551aa30055034656628b474c/html5/thumbnails/1.jpg)
Regionlets for Generic Object
DetectionXiaoyu Wang, Ming Yang, Shenghuo Zhu, and Yuanqing Lin
NEC Labs America, Inc. Facebook, Inc. ICCV 2013 oral paper
Presenter: Ming-Ming Cheng @ VGG reading group, 6/2/2014.
![Page 2: Regionlets for Generic Object Detection Xiaoyu Wang, Ming Yang, Shenghuo Zhu, and Yuanqing Lin NEC Labs America, Inc. Facebook, Inc. ICCV 2013 oral paper](https://reader036.vdocuments.us/reader036/viewer/2022081513/551aa30055034656628b474c/html5/thumbnails/2.jpg)
Generic object detection
![Page 3: Regionlets for Generic Object Detection Xiaoyu Wang, Ming Yang, Shenghuo Zhu, and Yuanqing Lin NEC Labs America, Inc. Facebook, Inc. ICCV 2013 oral paper](https://reader036.vdocuments.us/reader036/viewer/2022081513/551aa30055034656628b474c/html5/thumbnails/3.jpg)
Review: rigid vs. deformable
• Classical cascaded AdaBoost
• Deformable part models (DPM)
• Spatial pyramid matching (SPM) of BoW
Rigid objects
Deformable objects
Tackles local variations, hardly handle deformations
High deformation tolerance results in imprecise localization or false positives for rigid objects
Can we mode both rigid and deformable objects in a unified framework?
![Page 4: Regionlets for Generic Object Detection Xiaoyu Wang, Ming Yang, Shenghuo Zhu, and Yuanqing Lin NEC Labs America, Inc. Facebook, Inc. ICCV 2013 oral paper](https://reader036.vdocuments.us/reader036/viewer/2022081513/551aa30055034656628b474c/html5/thumbnails/4.jpg)
Review: parameter selection• Deformable Part-based Model (DPM)• Specify the number of deformable parts
• Spatial Pyramid Matching• Specify the number of pyramids to build
Do we have to pre-define model parameters to handle different degrees of deformation?
![Page 5: Regionlets for Generic Object Detection Xiaoyu Wang, Ming Yang, Shenghuo Zhu, and Yuanqing Lin NEC Labs America, Inc. Facebook, Inc. ICCV 2013 oral paper](https://reader036.vdocuments.us/reader036/viewer/2022081513/551aa30055034656628b474c/html5/thumbnails/5.jpg)
Review: multi scales/viewpoints
• DPM• Resize an image to detect objects at a fixed scale• Multiple models, each deals with one viewpoint
• Spatial Pyramid Matching• No need to resize the image• One model, a codebook is used to encode features
Can we learn a model that can be easily adapted to arbitrary scales and viewpoints?
![Page 6: Regionlets for Generic Object Detection Xiaoyu Wang, Ming Yang, Shenghuo Zhu, and Yuanqing Lin NEC Labs America, Inc. Facebook, Inc. ICCV 2013 oral paper](https://reader036.vdocuments.us/reader036/viewer/2022081513/551aa30055034656628b474c/html5/thumbnails/6.jpg)
Review: sliding window vs. selective search
• Classical: sliding window• Check hundreds of thousands (if not millions) of sliding windows.• Resize image and detect objects at fixed scale• Needs to use very efficient classifiers
• Recent: selective search• Thousands of object-independent candidate regions
• [12PAMI] B. Alexe, T. Deselaers, and V. Ferrari, Measuring the objectness of image window.
• [13IJCV] K. E. A. Van de Sande, J. R. R. Uijlings, T. Gevers, and A. W. M. Smeulders, Selective search for object recognition
• [13PAMI] Ian Endres, and Derek Hoiem, Category-independent object proposals with diverse ranking.
• [11ICCV] E. Rahtu, J. Kannala, and M. Blaschko. Learning a category independent object detection cascade.
• [11CVPR] Z. Zhang, J. Warrell, and P. H. S. Torr. Proposal generation for object detection using cascaded ranking SVMs. InCVPR, 2011
• Allowing strong classifier, needs to work with arbitrary window size
![Page 7: Regionlets for Generic Object Detection Xiaoyu Wang, Ming Yang, Shenghuo Zhu, and Yuanqing Lin NEC Labs America, Inc. Facebook, Inc. ICCV 2013 oral paper](https://reader036.vdocuments.us/reader036/viewer/2022081513/551aa30055034656628b474c/html5/thumbnails/7.jpg)
Motivation• A flexible and general object-level representation with• Hassle free deformation handling• Arbitrary scales and aspect ratio handling
![Page 8: Regionlets for Generic Object Detection Xiaoyu Wang, Ming Yang, Shenghuo Zhu, and Yuanqing Lin NEC Labs America, Inc. Facebook, Inc. ICCV 2013 oral paper](https://reader036.vdocuments.us/reader036/viewer/2022081513/551aa30055034656628b474c/html5/thumbnails/8.jpg)
Detection framework• Generate candidate detection bounding boxes
• Boosting classifier cascades
![Page 9: Regionlets for Generic Object Detection Xiaoyu Wang, Ming Yang, Shenghuo Zhu, and Yuanqing Lin NEC Labs America, Inc. Facebook, Inc. ICCV 2013 oral paper](https://reader036.vdocuments.us/reader036/viewer/2022081513/551aa30055034656628b474c/html5/thumbnails/9.jpg)
Regionlet: Definition• Region (): feature extraction region• Regionlet (): A sub-region in a feature extraction area
whose position/resolution are relative and normalized to a detection window
Bounding boxRegionRegionlets
Why regionlets?Some part of the region may not be informative or even distractive.
Why ‘regionlets’?
![Page 10: Regionlets for Generic Object Detection Xiaoyu Wang, Ming Yang, Shenghuo Zhu, and Yuanqing Lin NEC Labs America, Inc. Facebook, Inc. ICCV 2013 oral paper](https://reader036.vdocuments.us/reader036/viewer/2022081513/551aa30055034656628b474c/html5/thumbnails/10.jpg)
Regionlet: definition (cont.)• Relative normalized position
![Page 11: Regionlets for Generic Object Detection Xiaoyu Wang, Ming Yang, Shenghuo Zhu, and Yuanqing Lin NEC Labs America, Inc. Facebook, Inc. ICCV 2013 oral paper](https://reader036.vdocuments.us/reader036/viewer/2022081513/551aa30055034656628b474c/html5/thumbnails/11.jpg)
Regionlet: Feature extraction
![Page 12: Regionlets for Generic Object Detection Xiaoyu Wang, Ming Yang, Shenghuo Zhu, and Yuanqing Lin NEC Labs America, Inc. Facebook, Inc. ICCV 2013 oral paper](https://reader036.vdocuments.us/reader036/viewer/2022081513/551aa30055034656628b474c/html5/thumbnails/12.jpg)
Regionlet• Constructing the regions/regionlets pool• Small region, fewer regionlets -> fine spatial layout• Large region, more regionlets -> robust to deformation• Use cascaded boosting learning to select the most
discriminative regionlets from a largely over-complete pool
![Page 13: Regionlets for Generic Object Detection Xiaoyu Wang, Ming Yang, Shenghuo Zhu, and Yuanqing Lin NEC Labs America, Inc. Facebook, Inc. ICCV 2013 oral paper](https://reader036.vdocuments.us/reader036/viewer/2022081513/551aa30055034656628b474c/html5/thumbnails/13.jpg)
Regionlet: Training• Construct regionlets pool• Enumerating all possible
regions is impractical and not necessary.• Propose a set of regionlets
with random positions (up to 5) inside each region with identical size.
![Page 14: Regionlets for Generic Object Detection Xiaoyu Wang, Ming Yang, Shenghuo Zhu, and Yuanqing Lin NEC Labs America, Inc. Facebook, Inc. ICCV 2013 oral paper](https://reader036.vdocuments.us/reader036/viewer/2022081513/551aa30055034656628b474c/html5/thumbnails/14.jpg)
Regionlet: Training• Constructing the regions/regionlets poolLearning
Boosting cascades• 16K region/regionlets candidates for each cascade• Learning of each cascade stops when the error rate is
achieved (1% for positive, 37.5% for negative)• Last cascade stops after collecting 5000 weak classifiers• Result in 4-7 cascades• 2-3 hours to finish training one category on a 8-core machine
![Page 15: Regionlets for Generic Object Detection Xiaoyu Wang, Ming Yang, Shenghuo Zhu, and Yuanqing Lin NEC Labs America, Inc. Facebook, Inc. ICCV 2013 oral paper](https://reader036.vdocuments.us/reader036/viewer/2022081513/551aa30055034656628b474c/html5/thumbnails/15.jpg)
Regionlets: Testing• No image resizing• Any scale, any aspect ratio• Adapt the model size to the same size as the object
candidate bounding box
![Page 16: Regionlets for Generic Object Detection Xiaoyu Wang, Ming Yang, Shenghuo Zhu, and Yuanqing Lin NEC Labs America, Inc. Facebook, Inc. ICCV 2013 oral paper](https://reader036.vdocuments.us/reader036/viewer/2022081513/551aa30055034656628b474c/html5/thumbnails/16.jpg)
Experiments• Datasets• PASCAL VOC 2007, 2010
• 20 object categories• ImageNet Large Scale Object Detection Dataset
• 200 object categories
• Investigated Features• HOG• LBP• Covariance• Deep Convolutional Neural Network (DCNN) feature (only for
the ImageNet challenge)
![Page 17: Regionlets for Generic Object Detection Xiaoyu Wang, Ming Yang, Shenghuo Zhu, and Yuanqing Lin NEC Labs America, Inc. Facebook, Inc. ICCV 2013 oral paper](https://reader036.vdocuments.us/reader036/viewer/2022081513/551aa30055034656628b474c/html5/thumbnails/17.jpg)
Experiments: PASCAL VOC
• Baselines• DPM: excellent at detecting rigid objects. E.g. car, bus, etc.• SS_SPM: for objects with significant global deformations.
• E.g. cat, cow sheep, etc.• SPM outperforms DPM in airplane and TV. Due to very diverse
viewpoints or rich sub-categories.• Objectness (use old version of DPM): selective search itself
dose not benefit detection tool much (0.6% improvement) in terms of accuracy.
![Page 18: Regionlets for Generic Object Detection Xiaoyu Wang, Ming Yang, Shenghuo Zhu, and Yuanqing Lin NEC Labs America, Inc. Facebook, Inc. ICCV 2013 oral paper](https://reader036.vdocuments.us/reader036/viewer/2022081513/551aa30055034656628b474c/html5/thumbnails/18.jpg)
Experiments: PASCAL VOC
• Baselines: • DPM• SPM• Objectness
• Regionlets• Won 16 out of 20 categories• ‘To our best knowledge, is the best performance in VOC07’• Multiple regionlets consistently outperforms single regionlets
![Page 19: Regionlets for Generic Object Detection Xiaoyu Wang, Ming Yang, Shenghuo Zhu, and Yuanqing Lin NEC Labs America, Inc. Facebook, Inc. ICCV 2013 oral paper](https://reader036.vdocuments.us/reader036/viewer/2022081513/551aa30055034656628b474c/html5/thumbnails/19.jpg)
Experiments: PASCAL VOC
![Page 20: Regionlets for Generic Object Detection Xiaoyu Wang, Ming Yang, Shenghuo Zhu, and Yuanqing Lin NEC Labs America, Inc. Facebook, Inc. ICCV 2013 oral paper](https://reader036.vdocuments.us/reader036/viewer/2022081513/551aa30055034656628b474c/html5/thumbnails/20.jpg)
Experiments: ImageNet
![Page 21: Regionlets for Generic Object Detection Xiaoyu Wang, Ming Yang, Shenghuo Zhu, and Yuanqing Lin NEC Labs America, Inc. Facebook, Inc. ICCV 2013 oral paper](https://reader036.vdocuments.us/reader036/viewer/2022081513/551aa30055034656628b474c/html5/thumbnails/21.jpg)
Running speed• 0.2 second per image using a single core if candidate
bounding boxes are given, real time(>30 frames per second) using 8 cores • 2 seconds per image to generate candidate bounding
boxes • 2-3 hours to finish training one category on a 8-core
machine
![Page 22: Regionlets for Generic Object Detection Xiaoyu Wang, Ming Yang, Shenghuo Zhu, and Yuanqing Lin NEC Labs America, Inc. Facebook, Inc. ICCV 2013 oral paper](https://reader036.vdocuments.us/reader036/viewer/2022081513/551aa30055034656628b474c/html5/thumbnails/22.jpg)
Conclusions• A new object representation for object detection• Non-local max-pooling of regionlets• Relative normalized locations of regionlets • Flexibility to incorporate various types of features
• A principled data-driven detection framework, effective in handling deformation, multiple scales, multiple viewpoints• Superior performance with a fast running speed
![Page 23: Regionlets for Generic Object Detection Xiaoyu Wang, Ming Yang, Shenghuo Zhu, and Yuanqing Lin NEC Labs America, Inc. Facebook, Inc. ICCV 2013 oral paper](https://reader036.vdocuments.us/reader036/viewer/2022081513/551aa30055034656628b474c/html5/thumbnails/23.jpg)
Latest results• Regionlets with Deep CNN feature (outside data)
![Page 24: Regionlets for Generic Object Detection Xiaoyu Wang, Ming Yang, Shenghuo Zhu, and Yuanqing Lin NEC Labs America, Inc. Facebook, Inc. ICCV 2013 oral paper](https://reader036.vdocuments.us/reader036/viewer/2022081513/551aa30055034656628b474c/html5/thumbnails/24.jpg)
Some really exciting parts during the reading group will be made public valuable after CVPR 2014 results
announced, together with source code!
![Page 25: Regionlets for Generic Object Detection Xiaoyu Wang, Ming Yang, Shenghuo Zhu, and Yuanqing Lin NEC Labs America, Inc. Facebook, Inc. ICCV 2013 oral paper](https://reader036.vdocuments.us/reader036/viewer/2022081513/551aa30055034656628b474c/html5/thumbnails/25.jpg)
My thoughts towards a real-time system
• Current bottleneck is object proposal generation• State of the art methods needs 2+ seconds to process one
image [IJCV13, PAMI12, PAMI13]
• Our recent results
Training on VOC: 20sTesting speed: 300fps
Generic ability
![Page 26: Regionlets for Generic Object Detection Xiaoyu Wang, Ming Yang, Shenghuo Zhu, and Yuanqing Lin NEC Labs America, Inc. Facebook, Inc. ICCV 2013 oral paper](https://reader036.vdocuments.us/reader036/viewer/2022081513/551aa30055034656628b474c/html5/thumbnails/26.jpg)
My thoughts towards a real-time system
![Page 27: Regionlets for Generic Object Detection Xiaoyu Wang, Ming Yang, Shenghuo Zhu, and Yuanqing Lin NEC Labs America, Inc. Facebook, Inc. ICCV 2013 oral paper](https://reader036.vdocuments.us/reader036/viewer/2022081513/551aa30055034656628b474c/html5/thumbnails/27.jpg)
My thoughts towards a real-time system