introduction to object detection - github pages...2019/04/03 · one stage detector: yolo...
TRANSCRIPT
![Page 2: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/2.jpg)
Visual Recognition
A fundamental task in computer vision
• Classification
• Object Detection
• Semantic Segmentation
• Instance Segmentation
• Key point Detection
• VQA
…
![Page 3: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/3.jpg)
Category-level Recognition
Category-level Recognition Instance-level Recognition
![Page 4: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/4.jpg)
Representation
• Bounding-box• Face Detection, Human Detection, Vehicle Detection, Text Detection,
general Object Detection• Point
• Semantic segmentation (will be discussed in next week)• Keypoint
• Face landmark• Human Keypoint
![Page 5: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/5.jpg)
Outline
• Detection• Human Keypoint• Conclusion
![Page 6: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/6.jpg)
Outline
• Detection• Human Keypoint• Conclusion
![Page 7: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/7.jpg)
Detection - Evaluation Criteria
Average Precision (AP) and mAP
Figures are from wikipedia
![Page 8: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/8.jpg)
Detection - Evaluation Criteria
mmAP
Figures are from http://cocodataset.org
![Page 9: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/9.jpg)
How to perform a detection?
• Sliding window: enumerate all the windows (up to millions of windows)• VJ detector: cascade chain
• Fully Convolutional network• shared computation
Robust Real-time Object Detection; Viola, Jones; IJCV 2001
http://www.vision.caltech.edu/html-files/EE148-2005-Spring/pprs/viola04ijcv.pdf
![Page 10: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/10.jpg)
General Detection Before Deep Learning
• Feature + classifier• Feature
• Haar Feature• HOG (Histogram of Gradient)• LBP (Local Binary Pattern)• ACF (Aggregated Channel Feature)• …
• Classifier• SVM• Bootsing• Random Forest
![Page 11: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/11.jpg)
Traditional Hand-crafted Feature: HoG
![Page 12: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/12.jpg)
Traditional Hand-crafted Feature: HoG
![Page 13: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/13.jpg)
General Detection Before Deep Learning
Traditional Methods
• Pros• Efficient to compute (e.g., HAAR, ACF) on CPU• Easy to debug, analyze the bad cases• reasonable performance on limited training data
• Cons• Limited performance on large dataset• Hard to be accelerated by GPU
![Page 14: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/14.jpg)
Deep Learning for Object Detection
Based on the whether following the “proposal and refine”
• One Stage• Example: Densebox, YOLO (YOLO v2), SSD, Retina Net• Keyword: Anchor, Divide and conquer, loss sampling
• Two Stage• Example: RCNN (Fast RCNN, Faster RCNN), RFCN, FPN, MaskRCNN• Keyword: speed, performance
![Page 15: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/15.jpg)
A bit of History
ImageFeature
Extractor
classification
localization
(bbox)
One stage detector
Densebox (2015) UnitBox (2016) EAST (2017)
YOLO (2015) Anchor Free
Anchor importedYOLOv2 (2016)
SSD (2015)
RON(2017)
RetinaNet(2017)
DSSD (2017)
two stages detector
ImageFeature
Extractor
classification
localization
(bbox)
Proposal
classification
localization
(bbox)
Refine
RCNN (2014) Fast RCNN(2015)Faster RCNN (2015)
RFCN (2016)
MultiBox(2014)
RFCN++ (2017)
FPN (2017)
Mask RCNN (2017)
OverFeat(2013)
![Page 16: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/16.jpg)
One Stage Detector: Densebox
DenseBox: Unifying Landmark Localization with End to End Object Detection, Huang etc, 2015
https://arxiv.org/abs/1509.04874
![Page 17: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/17.jpg)
One Stage Detector: Densebox
• No Anchor: GT Assignment• A sub-circle in the GT is labeled as positive
• fail when two GT highly overlaps• the size of the sub-circle matters• more attention (loss) will be placed to large faces
• Loss sampling• All pos/negative positions will be used to compute the cls loss
![Page 18: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/18.jpg)
One Stage Detector: Densebox
Problems
• L2 loss is not robust to scale variation (UnitBox)• learnt features are not robust
• GT assignment issue (SSD)• Fail to handle the crowd case
• relatively large localization error (Two stages detector)• more false positive (FP) (Two stages detector)
• does not obviously kill the fp
![Page 19: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/19.jpg)
One Stage Detector: Densebox -> UnitBox
UnitBox: An Advanced Object Detection Network, Yu etc, 2016
http://cn.arxiv.org/pdf/1608.01471.pdf
![Page 20: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/20.jpg)
One Stage Detector: Densebox -> UnitBox->EAST
EAST: An Efficient and Accurate Scene Text Detector, Zhou etc, CVPR 2017
https://arxiv.org/abs/1704.03155
![Page 21: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/21.jpg)
One Stage Detector: YOLO
You Only Look Once: Unified, Real-Time Object Detection, Redmon etc, CVPR 2016
https://arxiv.org/abs/1506.02640
![Page 22: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/22.jpg)
One Stage Detector: YOLO
You Only Look Once: Unified, Real-Time Object Detection, Redmon etc, CVPR 2016
https://arxiv.org/abs/1506.02640
![Page 23: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/23.jpg)
One Stage Detector: YOLO
• No Anchor • GT assignment is based on the cells (7x7)
• Loss sampling• all pos/neg predictions are evaluated (but more sparse than densebox)
You Only Look Once: Unified, Real-Time Object Detection, Redmon etc, CVPR 2016
https://arxiv.org/abs/1506.02640
![Page 24: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/24.jpg)
One Stage Detector: YOLO
Discussion
• fc reshape (4096-> 7x7x30)• more context• but not fully convolutional
• One cell can output up to two boxes in one category • fail to work on the crowd case
• Fast speed• small imagenet base model• small input size (448x448)
You Only Look Once: Unified, Real-Time Object Detection, Redmon etc, CVPR 2016
https://arxiv.org/abs/1506.02640
![Page 25: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/25.jpg)
One Stage Detector: YOLO
Experiments on general detection
You Only Look Once: Unified, Real-Time Object Detection, Redmon etc, CVPR 2016
https://arxiv.org/abs/1506.02640
Method VOC 2007 test VOC 2012 test COCO time
YOLO 57.9/NA 52.7/63.4 NA fps: 45/155
![Page 26: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/26.jpg)
One Stage Detector: YOLO -> YOLOv2
YOLO9000: Better, Faster, Stronger Redmon etc, CVPR 2016
https://arxiv.org/abs/1612.08242
![Page 27: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/27.jpg)
One Stage Detector: YOLO -> YOLOv2
Experiments:
YOLO9000: Better, Faster, Stronger Redmon etc, CVPR 2016
https://arxiv.org/abs/1612.08242
Method VOC 2007 test VOC 2012 test COCO time
YOLO 52.7/63.4 57.9/NA NA fps: 45/155
YOLOv2 78.6 73.4 21.6 fps: 40
![Page 28: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/28.jpg)
One Stage Detector: YOLO -> YOLOv2
Video demo: https://pjreddie.com/darknet/yolo/
YOLO9000: Better, Faster, Stronger Redmon etc, CVPR 2016
https://arxiv.org/abs/1612.08242
![Page 29: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/29.jpg)
One Stage Detector: SSD
SSD: Single Shot MultiBox Detector, Liu etc
https://arxiv.org/pdf/1512.02325.pdf
![Page 30: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/30.jpg)
One Stage Detector: SSD
SSD: Single Shot MultiBox Detector, Liu etc, ECCV 2016
https://arxiv.org/pdf/1512.02325.pdf
![Page 31: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/31.jpg)
One Stage Detector: SSD• Anchor
• GT-anchor assignment• GT is predicted by one best matched (IOU) anchor or matched with an
anchor with IOU > 0.5• better recall
• dense or sparse anchor?• Divide and Conquer
• Different layers handle the objects with different scales• Assume small objects can be predicted in earlier layers (not very strong
semantics)• Loss sampling
• OHEM: negative positions are sampled (not balanced pos/neg ratio)• negative:pos is at most 3:1
SSD: Single Shot MultiBox Detector, Liu etc, ECCV 2016
https://arxiv.org/pdf/1512.02325.pdf
![Page 32: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/32.jpg)
One Stage Detector: SSD
Discussion:
• Assume small objects can be predicted in earlier layers (not very strong semantics) (DSSD, RON, RetinaNet)
• strong data augmentation• VGG model (Replace by resnet in DSSD)
• cannot be easily adapted to other models• a lot of hacks
• A long tail (Large computation)
SSD: Single Shot MultiBox Detector, Liu etc, ECCV 2016
https://arxiv.org/pdf/1512.02325.pdf
![Page 33: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/33.jpg)
One Stage Detector: SSD
Experiments
SSD: Single Shot MultiBox Detector, Liu etc, ECCV 2016
https://arxiv.org/pdf/1512.02325.pdf
Method VOC 2007 test VOC 2012 test COCO time (fps)
YOLO 52.7/63.4 57.9/NA NA 45/155
YOLOv2 78.6 73.4 21.6 40
SSD 77.2/79.8 75.8/78.5 25.1/28.8 46/19
![Page 34: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/34.jpg)
One Stage Detector: SSD -> DSSD
DSSD : Deconvolutional Single Shot Detector, Fu etc 2017,
https://arxiv.org/abs/1701.06659
![Page 35: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/35.jpg)
One Stage Detector: DSSD
Experiments
Method VOC 2007 test VOC 2012 test COCO time (fps)
YOLO 52.7/63.4 57.9/NA NA 45/155
YOLOv2 78.6 73.4 21.6 40
SSD 77.2/79.8 75.8/78.5 25.1/28.8 46/19
DSSD 81.5 80.0 33.2 5.5
DSSD : Deconvolutional Single Shot Detector, Fu etc 2017,
https://arxiv.org/abs/1701.06659
![Page 36: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/36.jpg)
One Stage Detector: SSD -> RON
RON: Reverse Connection with Objectness Prior Networks for Object Detection, Kong etc, CVPR 2017
https://arxiv.org/pdf/1707.01691.pdf
![Page 37: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/37.jpg)
One Stage Detector: RON
• Anchor• Divide and conquer
• Reverse Connect (similar to FPN)• Loss Sampling
• Objectness prior• pos/neg unbalanced issue• split to 1) binary cls 2) multi-class cls
RON: Reverse Connection with Objectness Prior Networks for Object Detection, Kong etc, CVPR 2017
https://arxiv.org/pdf/1707.01691.pdf
![Page 38: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/38.jpg)
One Stage Detector: RON
Experiments
Method VOC 2007 test VOC 2012 test COCO time (fps)
YOLO 52.7/63.4 57.9/NA NA 45/155
YOLOv2 78.6 73.4 21.6 40
SSD 77.2/79.8 75.8/78.5 25.1/28.8 46/19
DSSD 81.5 80.0 33.2 5.5
RON 81.3 80.7 27.4 15
RON: Reverse Connection with Objectness Prior Networks for Object Detection, Kong etc, CVPR 2017
https://arxiv.org/pdf/1707.01691.pdf
![Page 39: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/39.jpg)
One Stage Detector: SSD -> RetinaNet
Focal Loss for Dense Object Detection, Lin etc, ICCV 2017
https://arxiv.org/pdf/1708.02002.pdf
![Page 40: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/40.jpg)
One Stage Detector: SSD -> RetinaNet
Focal Loss for Dense Object Detection, Lin etc, ICCV 2017
https://arxiv.org/pdf/1708.02002.pdf
![Page 41: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/41.jpg)
One Stage Detector: RetinaNet
• Anchor• Divide and Conquer
• FPN• Loss Sampling
• Focal loss• pos/neg unbalanced issue• new setting (e.g., more anchor)
Focal Loss for Dense Object Detection, Lin etc, ICCV 2017
https://arxiv.org/pdf/1708.02002.pdf
![Page 42: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/42.jpg)
One Stage Detector: RetinaNet
Experiments
Method VOC 2007 test VOC 2012 test COCO time (fps)
YOLO 52.7/63.4 57.9/NA NA 45/155
YOLOv2 78.6 73.4 21.6 40
SSD 77.2/79.8 75.8/78.5 25.1/28.8 46/19
DSSD 81.5 80.0 33.2 5.5
RON 81.3 80.7 27.4 15
RetinaNet NA N 39.1 5
Focal Loss for Dense Object Detection, Lin etc, ICCV 2017
https://arxiv.org/pdf/1708.02002.pdf
![Page 43: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/43.jpg)
One Stage Detector: Summary
• Anchor• No anchor: YOLO, densebox/unitbox/east • Anchor: YOLOv2, SSD, DSSD, RON, RetinaNet
• Divide and conquer• SSD, DSSD, RON, RetinaNet
• loss sample• all sample: densebox• OHEM: SSD• focal loss: RetinaNet
![Page 44: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/44.jpg)
One Stage Detector: DiscussionAnchor (YOLO v2, SSD, RetinaNet) or Without Anchor (Densebox, YOLO)
• Model Complexity• Difference on the extremely small model (< 30M flops on 224x224 input)
• Sampling • Application
• No Anchor: Face• With Anchor: Human, General Detection
• Problem for one stage detector• Unbalanced pos/neg data• Pool localization precision
![Page 45: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/45.jpg)
Two Stages Detector: RCNN
Rich feature hierarchies for accurate object detection and semantic segmentation, Girshirk etc, CVPR 2014
https://arxiv.org/pdf/1311.2524.pdf
![Page 46: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/46.jpg)
Two Stages Detector: RCNN
Discussion
• Extremely slow speed• selective search proposal (CPU)/warp
• not end-to-end optimized• Good for small objects
Rich feature hierarchies for accurate object detection and semantic segmentation, Girshirk etc, CVPR 2014
https://arxiv.org/pdf/1311.2524.pdf
![Page 47: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/47.jpg)
Two Stages Detector: RCNN
Experiments
Method VOC 2007 test VOC 2012 test COCO time (fps)
YOLO 52.7/63.4 57.9/NA NA 45/155
YOLOv2 78.6 73.4 21.6 40
SSD 77.2/79.8 75.8/78.5 25.1/28.8 46/19
DSSD 81.5 80.0 33.2 5.5
RON 81.3 80.7 27.4 15
RetinaNet NA N 39.1 5
RCNN 66 NA NA 47s
Rich feature hierarchies for accurate object detection and semantic segmentation, Girshirk etc, CVPR 2014
https://arxiv.org/pdf/1311.2524.pdf
![Page 48: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/48.jpg)
Two Stages Detector: RCNN -> Fast RCNN
Fast R-CNN, Girshick etc, ICCV 2015
https://arxiv.org/pdf/1504.08083.pdf
![Page 49: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/49.jpg)
Two Stages Detector: Fast RCNN
Discussion
• slow speed• selective search proposal (CPU)
• not end-to-end optimized• ROI pooling
• alignment issue• sampling• aspect ratio changes
Fast R-CNN, Girshick etc, ICCV 2015
https://arxiv.org/pdf/1504.08083.pdf
![Page 50: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/50.jpg)
Two Stages Detector: Fast RCNN
Experiments
Method VOC 2007 test VOC 2012 test COCO time (fps)
YOLO 52.7/63.4 57.9/NA NA 45/155
YOLOv2 78.6 73.4 21.6 40
SSD 77.2/79.8 75.8/78.5 25.1/28.8 46/19
DSSD 81.5 80.0 33.2 5.5
RON 81.3 80.7 27.4 15
RetinaNet NA N 39.1 5
RCNN 66 NA NA 47s
Fast RCNN 77.0 82.3 (wth coco data) NA 0.5s
Fast R-CNN, Girshick etc, ICCV 2015
https://arxiv.org/pdf/1504.08083.pdf
![Page 51: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/51.jpg)
Two Stages Detector: RCNN -> Fast RCNN -> FasterRCNN
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, Ren etc, CVPR 2016
https://arxiv.org/pdf/1506.01497.pdf
![Page 52: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/52.jpg)
Two Stages Detector: Faster RCNN
Discussion
• speed• selective search proposal (CPU) -> RPN
• alternative optimization/end-to-end optimization• Recall issue due to two stages detector
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, Ren etc, CVPR 2016
https://arxiv.org/pdf/1506.01497.pdf
![Page 53: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/53.jpg)
Two Stages Detector: Faster RCNNExperiments
Method VOC 2007 test VOC 2012 test COCO time (fps)
YOLO 52.7/63.4 57.9/NA NA 45/155
YOLOv2 78.6 73.4 21.6 40
SSD 77.2/79.8 75.8/78.5 25.1/28.8 46/19
DSSD 81.5 80.0 33.2 5.5
RON 81.3 80.7 27.4 15
RetinaNet NA N 39.1 5
RCNN 66 NA NA 47s
Fast RCNN 77.0 82.3 (wth coco data) NA 0.5s
Faster RCNN 73.2 70.4 NA 5
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, Ren etc, CVPR 2016
https://arxiv.org/pdf/1506.01497.pdf
![Page 54: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/54.jpg)
Two Stages Detector: RCNN -> Fast RCNN -> FasterRCNN -> RFCN
R-FCN: Object Detection via Region-based Fully Convolutional Networks, Dai etc, NIPS 2016,
https://arxiv.org/pdf/1605.06409.pdf
![Page 55: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/55.jpg)
Two Stages Detector: RFCN
Discussion
• Share convolution• fasterRCNN: shared Res1-4 (RPN), not shared Res5 (RCNN)• RFCN: shared Res1-5 (both RPN and RCNN)
• PSPooling• a large number of channels:(7x7xC)xWxH• Problems in ROIPooling also exist
• Fully connected vs Convolution• fc: global context• conv: can be shared but the context is relative small• trade-off: large kernel
R-FCN: Object Detection via Region-based Fully Convolutional Networks, Dai etc, NIPS 2016,
https://arxiv.org/pdf/1605.06409.pdf
![Page 56: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/56.jpg)
Two Stages Detector: RFCNExperiments
Method VOC 2007 test VOC 2012 test COCO time (fps)
YOLO 52.7/63.4 57.9/NA NA 45/155
YOLOv2 78.6 73.4 21.6 40
SSD 77.2/79.8 75.8/78.5 25.1/28.8 46/19
DSSD 81.5 80.0 33.2 5.5
RON 81.3 80.7 27.4 15
RetinaNet NA N 39.1 5
RCNN 66 NA NA 47s
Fast RCNN 77.0 82.3 (wth coco data) NA 0.5s
Faster RCNN 73.2 70.4 NA 200ms
RFCN 79.5 77.6 29.9 170ms
R-FCN: Object Detection via Region-based Fully Convolutional Networks, Dai etc, NIPS 2016,
https://arxiv.org/pdf/1605.06409.pdf
![Page 57: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/57.jpg)
Two Stages Detector: RFCN -> Deformable Convolutional Networks
Deformable Convolutional Networks, Dai etc, ICCV 2017
https://arxiv.org/abs/1703.06211
![Page 58: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/58.jpg)
Two Stages Detector: RFCN -> Deformable Convolutional Networks
Deformable Convolutional Networks, Dai etc, ICCV 2017
https://arxiv.org/abs/1703.06211
![Page 59: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/59.jpg)
Two Stages Detector: RFCN -> Deformable Convolutional Networks
Discussion
• Deformable pool is similar to ROIAlign (in Mask RCNN)• Deformable conv
• flexible to learn the non-rigid objects
Deformable Convolutional Networks, Dai etc, ICCV 2017
https://arxiv.org/abs/1703.06211
![Page 60: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/60.jpg)
Two Stages Detector: RCNN -> Fast RCNN -> FasterRCNN -> FPN
Feature Pyramid Networks for Object Detection, Lin etc, CVPR 2017
https://arxiv.org/pdf/1612.03144.pdf
![Page 61: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/61.jpg)
Two Stages Detector: FPN
Discussion
• FasterRCNN reproduced (setting)• Deeply supervised (better feature)
Feature Pyramid Networks for Object Detection, Lin etc, CVPR 2017
https://arxiv.org/pdf/1612.03144.pdf
![Page 62: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/62.jpg)
Two Stages Detector: FPNExperiments
Method VOC 2007 test VOC 2012 test COCO time (fps)
YOLO 52.7/63.4 57.9/NA NA 45/155
YOLOv2 78.6 73.4 21.6 40
SSD 77.2/79.8 75.8/78.5 25.1/28.8 46/19
DSSD 81.5 80.0 33.2 5.5
RON 81.3 80.7 27.4 15
RetinaNet NA N 39.1 5
RCNN 66 NA NA 47s
Fast RCNN 77.0 82.3 (wth coco data) NA 0.5s
Faster RCNN 73.2 70.4 NA 200ms
RFCN 79.5 77.6 29.9 170ms
FPN NA NA 36.2 6
Feature Pyramid Networks for Object Detection, Lin etc, CVPR 2017
https://arxiv.org/pdf/1612.03144.pdf
![Page 63: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/63.jpg)
Two Stages Detector: RCNN -> Fast RCNN -> FasterRCNN -> FPN -> MaskRCNN
Mask R-CNN, He etc, ICCV 2017
https://arxiv.org/pdf/1703.06870.pdf
![Page 64: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/64.jpg)
Two Stages Detector: RCNN -> Fast RCNN -> FasterRCNN -> FPN -> MaskRCNN
Mask R-CNN, He etc, ICCV 2017
https://arxiv.org/pdf/1703.06870.pdf
![Page 65: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/65.jpg)
Two Stages Detector: Mask RCNN
Discussion
• Alignment issue in ROIPooling -> ROIAlign• Multi-task learning: detection & mask
Mask R-CNN, He etc, ICCV 2017
https://arxiv.org/pdf/1703.06870.pdf
![Page 66: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/66.jpg)
Two Stages Detector: Mask RCNNExperiments
Method VOC 2007 test VOC 2012 test COCO time (fps)
YOLO 52.7/63.4 57.9/NA NA 45/155
YOLOv2 78.6 73.4 21.6 40
SSD 77.2/79.8 75.8/78.5 25.1/28.8 46/19
DSSD 81.5 80.0 33.2 5.5
RON 81.3 80.7 27.4 15
RetinaNet NA N 39.1 5
RCNN 66 NA NA 47s
Fast RCNN 77.0 82.3 (wth coco data) NA 0.5s
Faster RCNN 73.2 70.4 NA 200ms
RFCN 79.5 77.6 29.9 170ms
FPN NA NA 36.2 6
Mask RCNN NA NA 38.2 2.5
Mask R-CNN, He etc, ICCV 2017
https://arxiv.org/pdf/1703.06870.pdf
![Page 67: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/67.jpg)
Two Stages Detector: Summary
• Speed• RCNN -> Fast RCNN -> Faster RCNN -> RFCN
• performance• Divide and conquer
• FPN• Deformable Pool/ROIAlign• Deformable Conv• Multi-task learning
![Page 68: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/68.jpg)
Two Stages Detector: Discussion
FasterRCNN vs RFCN
One stage vs two Stage
![Page 69: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/69.jpg)
MegDetection
Introduction & Demo Video
![Page 70: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/70.jpg)
Open Problem in Detection
• FP• NMS (detection in crowd)• GT assignment issue• Detection in video
• detect & track in a network
![Page 71: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/71.jpg)
Outline
• Detection• Human Keypoint• Conclusion
![Page 72: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/72.jpg)
Human Keypoint Task
• Single Person Skeleton• Cropped RGB image -> 2d key points / 3d key points• Keyword: inter-middle loss, large receptive field, context
• Multiple-Person Skeleton• RGB image -> human localization & human Keypoint for each person
![Page 73: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/73.jpg)
Single Person Skeleton: CPM
Convolutional Pose Machines, Wei etc, CVPR 2016
https://arxiv.org/pdf/1602.00134.pdf
![Page 74: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/74.jpg)
Single Person Skeleton: Hourglass
Stacked Hourglass Networks for Human Pose Estimation, Newell etc, ECCV 2016
https://arxiv.org/pdf/1603.06937.pdf
![Page 75: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/75.jpg)
Multiple-Person Skeleton
• Top Down• Detect -> Single person skeleton
• Bottom Up• Deep/Deeper Cut• OpenPose• Associative Embedding
![Page 76: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/76.jpg)
Multiple-Person Skeleton: OpenPose
CPM + PAF
Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields, Cao etc, CVPR 2017
https://arxiv.org/pdf/1611.08050.pdf
![Page 77: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/77.jpg)
Multiple-Person Skeleton: OpenPose
Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields, Cao etc, CVPR 2017
https://arxiv.org/pdf/1611.08050.pdf
https://github.com/CMU-Perceptual-Computing-Lab/openpose
![Page 78: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/78.jpg)
Multiple-Person Skeleton: Associative Embedding
Hourglass + AE
Associative Embedding: End-to-End Learning for Joint Detection and Grouping, Newell etc, NIPS 2017
https://arxiv.org/pdf/1611.05424.pdf
![Page 79: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/79.jpg)
Multiple-Person Skeleton: Associative Embedding
Associative Embedding: End-to-End Learning for Joint Detection and Grouping, Newell etc, NIPS 2017
https://arxiv.org/pdf/1611.05424.pdf
![Page 80: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/80.jpg)
Multiple-Person Skeleton: Discussion
• Top Down:• Depends on the detector
• Fail in the crowd case• Fail with partial observation• can detect the small-scale human
• More computation• Better localization when the input-size of single person skeleton is large
• Bottom up:• Fast computational speed• good at localizing the human with partial observation• Hard to assemble human
![Page 81: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/81.jpg)
Challenges in Skeleton
• combine top-down approaches with bottom-up approaches
• perform pose track
• handle the crowd case
![Page 82: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/82.jpg)
MegSkeleton
Introduction and demo Video
![Page 83: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/83.jpg)
Outline
• Detection• Human Keypoint• Conclusion
![Page 84: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/84.jpg)
Conclusion
• Detection• One stage: Densebox, YOLO, SSD, RetinaNet• Two Stage: RCNN, Fast RCNN, FasterRCNN, RFCN, FPN, Mask RCNN
• Skeleton• Single Person Skeleton: CPM, Hourglass• Multi-person Skeleton
• Top Down• Bottom up: Openpose, Associative Embedding
![Page 85: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/85.jpg)
Introduction to Object DetectionPart 2
![Page 86: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/86.jpg)
Outline
• Modern Object detectors• One Stage detector vs Two-stage detector
• Challenges• Backbone• Head• Scale• Batch Size• Crowd
• Conclusion
![Page 87: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/87.jpg)
What is object detection?
![Page 88: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/88.jpg)
Modern Object detectors
Backbone Head
• Modern object detectors• RetinaNet
• f1-f7 for backbone, f3-f7 with 4 convs for head• FPN with ROIAlign
• f1-f6 for backbone, two fcs for head• Recall vs localization
• One stage detector: Recall is high but compromising the localization ability• Two stage detector: Strong localization ability
Postprocess
NMS
![Page 89: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/89.jpg)
One Stage detector: RetinaNet
• FPN Structure
• Focal loss
Focal Loss for Dense Object Detection, Lin etc, ICCV 2017 Best student paper
![Page 90: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/90.jpg)
One Stage detector: RetinaNet
• FPN Structure
• Focal loss
Focal Loss for Dense Object Detection, Lin etc, ICCV 2017 Best student paper
![Page 91: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/91.jpg)
Two-Stage detector: FPN/Mask R-CNN
• FPN Structure
• ROIAlign
Mask R-CNN, He etc, ICCV 2017 Best paper
![Page 92: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/92.jpg)
What is next for object detection?
• The pipeline seems to be mature
• There still exists a large gap between existing state-of-arts and product requirements
• The devil is in the detail
![Page 93: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/93.jpg)
Challenges Overview
• Backbone• Head• Scale• Batch Size• Crowd
Backbone Head Postprocess
NMS
![Page 94: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/94.jpg)
Challenges - Backbone
• Backbone network is designed for classification task but not for localization task
• Receptive Field vs Spatial resolution
• Only f1-f5 is pretrained but randomly initializing f6 and f7 (if applicable)
![Page 95: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/95.jpg)
Backbone - DetNet
• DetNet: A Backbone network for Object Detection, Li etc, 2018, https://arxiv.org/pdf/1804.06215.pdf
![Page 96: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/96.jpg)
Backbone - DetNet
![Page 97: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/97.jpg)
Backbone - DetNet
![Page 98: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/98.jpg)
Backbone - DetNet
![Page 99: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/99.jpg)
Backbone - DetNet
![Page 100: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/100.jpg)
Backbone - DetNet
![Page 101: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/101.jpg)
Challenges - Head
• Speed is significantly improved for the two-stage detector• RCNN - > Fast RCNN -> Faster RCNN - > RFCN
• How to obtain efficient speed as one stage detector like YOLO, SSD?• Small Backbone
• Light Head
![Page 102: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/102.jpg)
Head – Light head RCNN• Light-Head R-CNN: In Defense of Two-Stage Object Detector, 2017,
https://arxiv.org/pdf/1711.07264.pdfCode: https://github.com/zengarden/light_head_rcnn
![Page 103: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/103.jpg)
Head – Light head RCNN
• Backbone• L: Resnet101
• S: Xception145
• Thin Feature map• L:C_{mid} = 256
• S: C_{mid} =64
• C_{out} = 10 * 7 * 7
• R-CNN subnet• A fc layer is connected to the PS ROI pool/Align
![Page 104: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/104.jpg)
Head – Light head RCNN
![Page 105: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/105.jpg)
Head – Light head RCNN
![Page 106: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/106.jpg)
Head – Light head RCNN
• Mobile Version• A technique report will be released soon
• The first two-stage detector on mobile devices
![Page 107: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/107.jpg)
Challenges - Scale
• Scale variations is extremely large for object detection
![Page 108: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/108.jpg)
Challenges - Scale
• Scale variations is extremely large for object detection
• Previous works• Divide and Conquer: SSD, DSSD, RON, FPN, …
• Limited Scale variation
• Scale Normalization for Image Pyramids, Singh etc, CVPR2018• Slow inference speed
• How to address extremely large scale variation without compromising inference speed?
![Page 109: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/109.jpg)
Scale - SFace
• SFace: An Efficient Network for Face Detection in Large Scale Variations, 2018, http://cn.arxiv.org/pdf/1804.06559.pdf• Anchor-based:
• Good localization for the scales which are covered by anchors
• Difficult to address all the scale ranges of faces
• Anchor-free:• Able to cover various face scales
• Not good for the localization ability
![Page 110: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/110.jpg)
Scale - SFace
![Page 111: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/111.jpg)
Scale - SFace
![Page 112: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/112.jpg)
Scale - SFace
![Page 113: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/113.jpg)
Scale - SFace
• Summary:• Integrate anchor-based and anchor-free for the scale issue
• A new benchmark for face detection with large scale variations: 4K Face
![Page 114: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/114.jpg)
Challenges - Batchsize
• Small mini-batchsize for general object detection• 2 for R-CNN, Faster RCNN
• 16 for RetinaNet, Mask RCNN
• Problem with small mini-batchsize• Long training time
• Insufficient BN statistics
• Inbalanced pos/neg ratio
![Page 115: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/115.jpg)
Batchsize – MegDet
• MegDet: A Large Mini-Batch Object Detector, CVPR2018, https://arxiv.org/pdf/1711.07240.pdf
![Page 116: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/116.jpg)
Batchsize – MegDet
• Techniques• Learning rate warmup
• Cross-GPU Batch Normalization
![Page 117: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/117.jpg)
Challenges - Crowd
• NMS is a post-processing step to eliminate multiple responses on one object instance• Reasonable for mild crowdness like COCO and VOC
• Will Fail in the case when the objects are in a crowd
![Page 118: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/118.jpg)
Challenges - Crowd
• A few works have been devoted to this topic• Softnms, Bodla etc, ICCV 2017, http://www.cs.umd.edu/~bharat/snms.pdf
• Relation Networks, Hu etc, CVPR 2018, https://arxiv.org/pdf/1711.11575.pdf
• Lacking a good benchmark for evaluation in the literature
![Page 119: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/119.jpg)
Crowd - CrowdHuman
• CrowdHuman: A Benchmark for Detecting Human in a Crowd, 2018, https://arxiv.org/pdf/1805.00123.pdf, http://www.crowdhuman.org/• A benchmark with Head, Visible Human, Full body bounding-box
• Generalization ability for other head/pedestrian datasets
• Crowdness
![Page 120: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/120.jpg)
Crowd - CrowdHuman
![Page 121: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/121.jpg)
Crowd-CrowdHuman
![Page 122: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/122.jpg)
Crowd-CrowdHuman
• Generalization• Head
• Pedestrian
• COCO
![Page 123: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/123.jpg)
Conclusion
• The task of object detection is still far from solved
• Details are important to further improve the performance• Backbone
• Head
• Scale
• Batchsize
• Crowd
• The improvement of object detection will be a significantly boost for the computer vision industry
![Page 124: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/124.jpg)
Introduction to Face++ Detection Team• Category-level Recognition
• Detection• Face Detection:
• FAN: https://arxiv.org/pdf/1711.07246.pdf
• Sface: https://arxiv.org/pdf/1804.06559.pdf
• Human Detection:
• Repulsion loss: https://arxiv.org/abs/1711.07752
• CrowdHuman: https://arxiv.org/pdf/1805.00123.pdf
• General Object Detection:
• Light Head: https://arxiv.org/pdf/1711.07264.pdfhttps://github.com/zengarden/light_head_rcnn
• MegDet: https://arxiv.org/pdf/1711.07240.pdf
• DetNet: https://arxiv.org/pdf/1804.06215.pdf
• Segmentation• Large Kernel Matters: https://arxiv.org/pdf/1703.02719.pdf
• DFN: https://arxiv.org/pdf/1804.09337.pdf
• BiseNet: https://arxiv.org/pdf/1808.00897.pdf
• Skeleton:• CPN: https://arxiv.org/pdf/1711.07319.pdf
• https://github.com/chenyilun95/tf-cpn
![Page 125: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/125.jpg)
Introduction to Face++ Detection Team
• COCO2017 Detection/Skeleton Winner (ICCV 2017)
• ActivityNet AVA Winner (CVPR 2018)
• WAD Instance Segmentation Winner (CVPR 2018)
• WiderFace Challenge Winner (ECCV 2018)
• COCO 2018 Detection/Panoptic Seg/Skeleton Winner (ECCV 2018)
• Mapillary Panoptic Seg Winner (ECCV 2018)
![Page 127: Introduction to Object Detection - GitHub Pages...2019/04/03 · One Stage Detector: YOLO Discussion • fc reshape (4096-> 7x7x30) • more context • but not fully convolutional](https://reader030.vdocuments.us/reader030/viewer/2022040116/5ece2f9f6bbfcd2591178dc7/html5/thumbnails/127.jpg)
Thanks