object detection (d2l4 2017 upc deep learning for computer vision)
TRANSCRIPT
![Page 1: Object Detection (D2L4 2017 UPC Deep Learning for Computer Vision)](https://reader030.vdocuments.us/reader030/viewer/2022021507/5a650ce67f8b9aa2548b64d7/html5/thumbnails/1.jpg)
[course site]
Object DetectionDay 2 Lecture 4
#DLUPC
Amaia [email protected]
PhD CandidateUniversitat Politècnica de Catalunya
![Page 2: Object Detection (D2L4 2017 UPC Deep Learning for Computer Vision)](https://reader030.vdocuments.us/reader030/viewer/2022021507/5a650ce67f8b9aa2548b64d7/html5/thumbnails/2.jpg)
Object Detection
CAT, DOG, DUCK
The task of assigning a label and a bounding box to all objects in the image
2
![Page 3: Object Detection (D2L4 2017 UPC Deep Learning for Computer Vision)](https://reader030.vdocuments.us/reader030/viewer/2022021507/5a650ce67f8b9aa2548b64d7/html5/thumbnails/3.jpg)
Object Detection: Datasets
3
20 categories6k training images
6k validation images10k test images
200 categories456k training images
60k validation + test images
80 categories200k training images60k val + test images
![Page 4: Object Detection (D2L4 2017 UPC Deep Learning for Computer Vision)](https://reader030.vdocuments.us/reader030/viewer/2022021507/5a650ce67f8b9aa2548b64d7/html5/thumbnails/4.jpg)
Object Detection as Classification
Classes = [cat, dog, duck]
Cat ? NO
Dog ? NO
Duck? NO
4
![Page 5: Object Detection (D2L4 2017 UPC Deep Learning for Computer Vision)](https://reader030.vdocuments.us/reader030/viewer/2022021507/5a650ce67f8b9aa2548b64d7/html5/thumbnails/5.jpg)
Classes = [cat, dog, duck]
Cat ? NO
Dog ? NO
Duck? NO
5
Object Detection as Classification
![Page 6: Object Detection (D2L4 2017 UPC Deep Learning for Computer Vision)](https://reader030.vdocuments.us/reader030/viewer/2022021507/5a650ce67f8b9aa2548b64d7/html5/thumbnails/6.jpg)
Classes = [cat, dog, duck]
Cat ? YES
Dog ? NO
Duck? NO
6
Object Detection as Classification
![Page 7: Object Detection (D2L4 2017 UPC Deep Learning for Computer Vision)](https://reader030.vdocuments.us/reader030/viewer/2022021507/5a650ce67f8b9aa2548b64d7/html5/thumbnails/7.jpg)
Classes = [cat, dog, duck]
Cat ? NO
Dog ? NO
Duck? NO
7
Object Detection as Classification
![Page 8: Object Detection (D2L4 2017 UPC Deep Learning for Computer Vision)](https://reader030.vdocuments.us/reader030/viewer/2022021507/5a650ce67f8b9aa2548b64d7/html5/thumbnails/8.jpg)
Problem: Too many positions & scales to test
Solution: If your classifier is fast enough, go for it8
Object Detection as Classification
![Page 9: Object Detection (D2L4 2017 UPC Deep Learning for Computer Vision)](https://reader030.vdocuments.us/reader030/viewer/2022021507/5a650ce67f8b9aa2548b64d7/html5/thumbnails/9.jpg)
Object Detection with ConvNets?
Convnets are computationally demanding. We can’t test all positions & scales !
Solution: Look at a tiny subset of positions. Choose them wisely :)9
![Page 10: Object Detection (D2L4 2017 UPC Deep Learning for Computer Vision)](https://reader030.vdocuments.us/reader030/viewer/2022021507/5a650ce67f8b9aa2548b64d7/html5/thumbnails/10.jpg)
Region Proposals
● Find “blobby” image regions that are likely to contain objects● “Class-agnostic” object detector● Look for “blob-like” regions
Slide Credit: CS231n 10
![Page 11: Object Detection (D2L4 2017 UPC Deep Learning for Computer Vision)](https://reader030.vdocuments.us/reader030/viewer/2022021507/5a650ce67f8b9aa2548b64d7/html5/thumbnails/11.jpg)
Region Proposals
Selective Search (SS) Multiscale Combinatorial Grouping (MCG)
[SS] Uijlings et al. Selective search for object recognition. IJCV 2013
[MCG] Arbeláez, Pont-Tuset et al. Multiscale combinatorial grouping. CVPR 2014 11
![Page 12: Object Detection (D2L4 2017 UPC Deep Learning for Computer Vision)](https://reader030.vdocuments.us/reader030/viewer/2022021507/5a650ce67f8b9aa2548b64d7/html5/thumbnails/12.jpg)
Object Detection with Convnets: R-CNN
Girshick et al. Rich feature hierarchies for accurate object detection and semantic segmentation. CVPR 2014
12
![Page 13: Object Detection (D2L4 2017 UPC Deep Learning for Computer Vision)](https://reader030.vdocuments.us/reader030/viewer/2022021507/5a650ce67f8b9aa2548b64d7/html5/thumbnails/13.jpg)
R-CNN
Girshick et al. Rich feature hierarchies for accurate object detection and semantic segmentation. CVPR 2014
1. Train network on proposals
2. Post-hoc training of SVMs & Box regressors on fc7 features
13
![Page 14: Object Detection (D2L4 2017 UPC Deep Learning for Computer Vision)](https://reader030.vdocuments.us/reader030/viewer/2022021507/5a650ce67f8b9aa2548b64d7/html5/thumbnails/14.jpg)
R-CNN
14
We expect: We get:
![Page 15: Object Detection (D2L4 2017 UPC Deep Learning for Computer Vision)](https://reader030.vdocuments.us/reader030/viewer/2022021507/5a650ce67f8b9aa2548b64d7/html5/thumbnails/15.jpg)
R-CNN
Girshick et al. Rich feature hierarchies for accurate object detection and semantic segmentation. CVPR 2014
1. Train network on proposals
2. Post-hoc training of SVMs & Box regressors on fc7 features
3. Non Maximum Suppression + score threshold
15
![Page 16: Object Detection (D2L4 2017 UPC Deep Learning for Computer Vision)](https://reader030.vdocuments.us/reader030/viewer/2022021507/5a650ce67f8b9aa2548b64d7/html5/thumbnails/16.jpg)
R-CNN
Girshick et al. Rich feature hierarchies for accurate object detection and semantic segmentation. CVPR 2014
16
![Page 17: Object Detection (D2L4 2017 UPC Deep Learning for Computer Vision)](https://reader030.vdocuments.us/reader030/viewer/2022021507/5a650ce67f8b9aa2548b64d7/html5/thumbnails/17.jpg)
R-CNN: Problems
1. Slow at test-time: need to run full forward pass of CNN for each region proposal
2. SVMs and regressors are post-hoc: CNN features not updated in response to SVMs and regressors
3. Complex multistage training pipeline
Slide Credit: CS231n 17
![Page 18: Object Detection (D2L4 2017 UPC Deep Learning for Computer Vision)](https://reader030.vdocuments.us/reader030/viewer/2022021507/5a650ce67f8b9aa2548b64d7/html5/thumbnails/18.jpg)
Fast R-CNN
Girshick Fast R-CNN. ICCV 2015
Solution: Share computation of convolutional layers between region proposals for an image
R-CNN Problem #1: Slow at test-time: need to run full forward pass of CNN for each region proposal
18
![Page 19: Object Detection (D2L4 2017 UPC Deep Learning for Computer Vision)](https://reader030.vdocuments.us/reader030/viewer/2022021507/5a650ce67f8b9aa2548b64d7/html5/thumbnails/19.jpg)
Fast R-CNN: Sharing features
Hi-res input image:3 x 800 x 600
with region proposal
Convolution and Pooling
Hi-res conv features:C x H x W
with region proposal
Fully-connected layers
Max-pool within each grid cell
RoI conv features:C x h x w
for region proposal
Fully-connected layers expect low-res conv features:
C x h x w
Slide Credit: CS231n 19Girshick Fast R-CNN. ICCV 2015
![Page 20: Object Detection (D2L4 2017 UPC Deep Learning for Computer Vision)](https://reader030.vdocuments.us/reader030/viewer/2022021507/5a650ce67f8b9aa2548b64d7/html5/thumbnails/20.jpg)
Fast R-CNN
Solution: Train it all at together E2E
R-CNN Problem #2&3: SVMs and regressors are post-hoc. Complex training.
20Girshick Fast R-CNN. ICCV 2015
![Page 21: Object Detection (D2L4 2017 UPC Deep Learning for Computer Vision)](https://reader030.vdocuments.us/reader030/viewer/2022021507/5a650ce67f8b9aa2548b64d7/html5/thumbnails/21.jpg)
Fast R-CNN
Slide Credit: CS231n
R-CNN Fast R-CNN
Training Time: 84 hours 9.5 hours
(Speedup) 1x 8.8x
Test time per image 47 seconds 0.32 seconds
(Speedup) 1x 146x
mAP (VOC 2007) 66.0 66.9
Using VGG-16 CNN on Pascal VOC 2007 dataset
Faster!
FASTER!
Better!
21
![Page 22: Object Detection (D2L4 2017 UPC Deep Learning for Computer Vision)](https://reader030.vdocuments.us/reader030/viewer/2022021507/5a650ce67f8b9aa2548b64d7/html5/thumbnails/22.jpg)
Fast R-CNN: Problem
Slide Credit: CS231n
R-CNN Fast R-CNN
Test time per image 47 seconds 0.32 seconds
(Speedup) 1x 146x
Test time per imagewith Selective Search 50 seconds 2 seconds
(Speedup) 1x 25x
Test-time speeds don’t include region proposals
22
![Page 23: Object Detection (D2L4 2017 UPC Deep Learning for Computer Vision)](https://reader030.vdocuments.us/reader030/viewer/2022021507/5a650ce67f8b9aa2548b64d7/html5/thumbnails/23.jpg)
Faster R-CNN
Con
v la
yers Region Proposal Network
FC6
Class probabilitiesFC7
FC8
RPN Proposals
RoI Pooling
Conv5_3
RPN Proposals
23Ren et al. Faster R-CNN: Towards real-time object detection with region proposal networks. NIPS 2015
Learn proposals end-to-end sharing parameters with the classification network
![Page 24: Object Detection (D2L4 2017 UPC Deep Learning for Computer Vision)](https://reader030.vdocuments.us/reader030/viewer/2022021507/5a650ce67f8b9aa2548b64d7/html5/thumbnails/24.jpg)
Faster R-CNN
Con
v la
yers Region Proposal Network
FC6
Class probabilitiesFC7
FC8
RPN Proposals
RoI Pooling
Conv5_3
RPN Proposals
Fast R-CNN
24Ren et al. Faster R-CNN: Towards real-time object detection with region proposal networks. NIPS 2015
Learn proposals end-to-end sharing parameters with the classification network
![Page 25: Object Detection (D2L4 2017 UPC Deep Learning for Computer Vision)](https://reader030.vdocuments.us/reader030/viewer/2022021507/5a650ce67f8b9aa2548b64d7/html5/thumbnails/25.jpg)
Region Proposal Network
Objectness scores(object/no object)
Bounding Box Regression
In practice, k = 9 (3 different scales and 3 aspect ratios)
25Ren et al. Faster R-CNN: Towards real-time object detection with region proposal networks. NIPS 2015
![Page 26: Object Detection (D2L4 2017 UPC Deep Learning for Computer Vision)](https://reader030.vdocuments.us/reader030/viewer/2022021507/5a650ce67f8b9aa2548b64d7/html5/thumbnails/26.jpg)
Faster R-CNN: Training
Con
v la
yers Region Proposal Network
FC6
Class probabilitiesFC7
FC8
RPN Proposals
RoI Pooling
Conv5_3
RPN Proposals
26Ren et al. Faster R-CNN: Towards real-time object detection with region proposal networks. NIPS 2015
RoI Pooling is not differentiable w.r.t box coordinates. Solutions:● Alternate training● Ignore gradient of classification branch w.r.t proposal coordinates● Make pooling function differentiable (spoiler D3L6)
![Page 27: Object Detection (D2L4 2017 UPC Deep Learning for Computer Vision)](https://reader030.vdocuments.us/reader030/viewer/2022021507/5a650ce67f8b9aa2548b64d7/html5/thumbnails/27.jpg)
Faster R-CNN
Ren et al. Faster R-CNN: Towards real-time object detection with region proposal networks. NIPS 2015
R-CNN Fast R-CNN Faster R-CNN
Test time per image(with proposals)
50 seconds 2 seconds 0.2 seconds
(Speedup) 1x 25x 250x
mAP (VOC 2007) 66.0 66.9 66.9
Slide Credit: CS231n 27
![Page 28: Object Detection (D2L4 2017 UPC Deep Learning for Computer Vision)](https://reader030.vdocuments.us/reader030/viewer/2022021507/5a650ce67f8b9aa2548b64d7/html5/thumbnails/28.jpg)
Faster R-CNN
28
● Faster R-CNN is the basis of the winners of COCO and ILSVRC 2015&2016 object detection competitions.
He et al. Deep residual learning for image recognition. CVPR 2016
![Page 29: Object Detection (D2L4 2017 UPC Deep Learning for Computer Vision)](https://reader030.vdocuments.us/reader030/viewer/2022021507/5a650ce67f8b9aa2548b64d7/html5/thumbnails/29.jpg)
YOLO: You Only Look Once
29Redmon et al. You Only Look Once: Unified, Real-Time Object Detection, CVPR 2016
Proposal-free object detection pipeline
![Page 30: Object Detection (D2L4 2017 UPC Deep Learning for Computer Vision)](https://reader030.vdocuments.us/reader030/viewer/2022021507/5a650ce67f8b9aa2548b64d7/html5/thumbnails/30.jpg)
YOLO: You Only Look Once
Redmon et al. You Only Look Once: Unified, Real-Time Object Detection, CVPR 2016 30
![Page 31: Object Detection (D2L4 2017 UPC Deep Learning for Computer Vision)](https://reader030.vdocuments.us/reader030/viewer/2022021507/5a650ce67f8b9aa2548b64d7/html5/thumbnails/31.jpg)
YOLO: You Only Look Once
31
Each cell predicts:
- For each bounding box:- 4 coordinates (x, y, w, h)- 1 confidence value
- Some number of class probabilities
For Pascal VOC:
- 7x7 grid- 2 bounding boxes / cell- 20 classes
7 x 7 x (2 x 5 + 20) = 7 x 7 x 30 tensor = 1470 outputs
![Page 32: Object Detection (D2L4 2017 UPC Deep Learning for Computer Vision)](https://reader030.vdocuments.us/reader030/viewer/2022021507/5a650ce67f8b9aa2548b64d7/html5/thumbnails/32.jpg)
YOLO: You Only Look Once
Redmon et al. You Only Look Once: Unified, Real-Time Object Detection, CVPR 2016 32
Dog
Bicycle Car
Dining Table
Predict class probability for each cell
![Page 33: Object Detection (D2L4 2017 UPC Deep Learning for Computer Vision)](https://reader030.vdocuments.us/reader030/viewer/2022021507/5a650ce67f8b9aa2548b64d7/html5/thumbnails/33.jpg)
YOLO: You Only Look Once
Redmon et al. You Only Look Once: Unified, Real-Time Object Detection, CVPR 2016 33
+ NMS+ Score threshold
![Page 34: Object Detection (D2L4 2017 UPC Deep Learning for Computer Vision)](https://reader030.vdocuments.us/reader030/viewer/2022021507/5a650ce67f8b9aa2548b64d7/html5/thumbnails/34.jpg)
SSD: Single Shot MultiBox Detector
Liu et al. SSD: Single Shot MultiBox Detector, ECCV 2016 34
Same idea as YOLO, + several predictors at different stages in the network
![Page 35: Object Detection (D2L4 2017 UPC Deep Learning for Computer Vision)](https://reader030.vdocuments.us/reader030/viewer/2022021507/5a650ce67f8b9aa2548b64d7/html5/thumbnails/35.jpg)
YOLOv2
35Redmon & Farhadi. YOLO900: Better, Faster, Stronger. CVPR 2017
![Page 36: Object Detection (D2L4 2017 UPC Deep Learning for Computer Vision)](https://reader030.vdocuments.us/reader030/viewer/2022021507/5a650ce67f8b9aa2548b64d7/html5/thumbnails/36.jpg)
YOLOv2
36
Results on Pascal VOC 2007
![Page 37: Object Detection (D2L4 2017 UPC Deep Learning for Computer Vision)](https://reader030.vdocuments.us/reader030/viewer/2022021507/5a650ce67f8b9aa2548b64d7/html5/thumbnails/37.jpg)
YOLOv2
37
Results on COCO test-dev 2015
![Page 38: Object Detection (D2L4 2017 UPC Deep Learning for Computer Vision)](https://reader030.vdocuments.us/reader030/viewer/2022021507/5a650ce67f8b9aa2548b64d7/html5/thumbnails/38.jpg)
Summary
38
Proposal-based methods● R-CNN● Fast R-CNN● Faster R-CNN● SPPnet ● R-FCNProposal-free methods● YOLO, YOLOv2● SSD
![Page 39: Object Detection (D2L4 2017 UPC Deep Learning for Computer Vision)](https://reader030.vdocuments.us/reader030/viewer/2022021507/5a650ce67f8b9aa2548b64d7/html5/thumbnails/39.jpg)
Resources
39
● Official implementations:○ Faster R-CNN [caffe]○ Yolov2 [darknet]○ SSD [caffe]○ R-FCN [caffe][MxNet]
● Unofficial ports to other frameworks are likely to exist… eg type “yolo tensorflow” in your browser and pick the one you like best.
● Or… use the newly released Object detection API by Google: SSD, R-FCN & Faster R-CNN (code & pretrained models in tensorflow)
Object detection tutorials (project ideas maybe?):● Toy object detection (squares, circles, etc.) (keras)● Object detection (pets dataset) (tensorflow)
![Page 40: Object Detection (D2L4 2017 UPC Deep Learning for Computer Vision)](https://reader030.vdocuments.us/reader030/viewer/2022021507/5a650ce67f8b9aa2548b64d7/html5/thumbnails/40.jpg)
Questions?
![Page 41: Object Detection (D2L4 2017 UPC Deep Learning for Computer Vision)](https://reader030.vdocuments.us/reader030/viewer/2022021507/5a650ce67f8b9aa2548b64d7/html5/thumbnails/41.jpg)
YOLO: Training
41Slide credit: YOLO Presentation @ CVPR 2016
For training, each ground truth bounding box is matched into the right cell
![Page 42: Object Detection (D2L4 2017 UPC Deep Learning for Computer Vision)](https://reader030.vdocuments.us/reader030/viewer/2022021507/5a650ce67f8b9aa2548b64d7/html5/thumbnails/42.jpg)
YOLO: Training
42Slide credit: YOLO Presentation @ CVPR 2016
For training, each ground truth bounding box is matched into the right cell
![Page 43: Object Detection (D2L4 2017 UPC Deep Learning for Computer Vision)](https://reader030.vdocuments.us/reader030/viewer/2022021507/5a650ce67f8b9aa2548b64d7/html5/thumbnails/43.jpg)
YOLO: Training
43Slide credit: YOLO Presentation @ CVPR 2016
Optimize class prediction in that cell:dog: 1, cat: 0, bike: 0, ...
![Page 44: Object Detection (D2L4 2017 UPC Deep Learning for Computer Vision)](https://reader030.vdocuments.us/reader030/viewer/2022021507/5a650ce67f8b9aa2548b64d7/html5/thumbnails/44.jpg)
YOLO: Training
44Slide credit: YOLO Presentation @ CVPR 2016
Predicted boxes for this cell
![Page 45: Object Detection (D2L4 2017 UPC Deep Learning for Computer Vision)](https://reader030.vdocuments.us/reader030/viewer/2022021507/5a650ce67f8b9aa2548b64d7/html5/thumbnails/45.jpg)
YOLO: Training
45Slide credit: YOLO Presentation @ CVPR 2016
Find the best one wrt ground truth bounding box, optimize it (i.e. adjust its coordinates to be closer to the ground truth’s coordinates)
![Page 46: Object Detection (D2L4 2017 UPC Deep Learning for Computer Vision)](https://reader030.vdocuments.us/reader030/viewer/2022021507/5a650ce67f8b9aa2548b64d7/html5/thumbnails/46.jpg)
YOLO: Training
46Slide credit: YOLO Presentation @ CVPR 2016
Increase matched box’s confidence, decrease non-matched boxes confidence
![Page 47: Object Detection (D2L4 2017 UPC Deep Learning for Computer Vision)](https://reader030.vdocuments.us/reader030/viewer/2022021507/5a650ce67f8b9aa2548b64d7/html5/thumbnails/47.jpg)
YOLO: Training
47Slide credit: YOLO Presentation @ CVPR 2016
Increase matched box’s confidence, decrease non-matched boxes confidence
![Page 48: Object Detection (D2L4 2017 UPC Deep Learning for Computer Vision)](https://reader030.vdocuments.us/reader030/viewer/2022021507/5a650ce67f8b9aa2548b64d7/html5/thumbnails/48.jpg)
YOLO: Training
48Slide credit: YOLO Presentation @ CVPR 2016
For cells with no ground truth detections, confidences of all predicted boxes are decreased
![Page 49: Object Detection (D2L4 2017 UPC Deep Learning for Computer Vision)](https://reader030.vdocuments.us/reader030/viewer/2022021507/5a650ce67f8b9aa2548b64d7/html5/thumbnails/49.jpg)
YOLO: Training
49Slide credit: YOLO Presentation @ CVPR 2016
For cells with no ground truth detections:● Confidences of all predicted
boxes are decreased ● Class probabilities are not
adjusted
![Page 50: Object Detection (D2L4 2017 UPC Deep Learning for Computer Vision)](https://reader030.vdocuments.us/reader030/viewer/2022021507/5a650ce67f8b9aa2548b64d7/html5/thumbnails/50.jpg)
YOLO: Training, formally
50Slide credit: YOLO Presentation @ CVPR 2016
Bounding box coordinate regression
Bounding boxscore prediction
Classscore prediction
= 1 if box j and cell i are matched together, 0 otherwise
= 1 if box j and cell i are NOT matched together
= 1 if cell i has an object present