g-cnn: an iterative grid based object detectorvgg/rg/slides/gcnn.pdf · g-cnn: an iterative grid...

26
G-CNN: an Iterative Grid Based Object Detector CVPR, 2016 Larry S. Davis Univ. Maryland Mohammad Rastegari Univ. Maryland Magyar Najibi Univ. Maryland VGG Reading Group - Sam Albanie

Upload: lytu

Post on 05-Nov-2018

216 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: G-CNN: an Iterative Grid Based Object Detectorvgg/rg/slides/GCNN.pdf · G-CNN: an Iterative Grid Based Object Detector CVPR, 2016 Larry S. Davis Univ. Maryland Mohammad Rastegari

G-CNN: an Iterative Grid Based Object Detector

CVPR, 2016

Larry S. Davis Univ. Maryland

Mohammad Rastegari Univ. Maryland

Magyar Najibi Univ. Maryland

VGG Reading Group - Sam Albanie

Page 2: G-CNN: an Iterative Grid Based Object Detectorvgg/rg/slides/GCNN.pdf · G-CNN: an Iterative Grid Based Object Detector CVPR, 2016 Larry S. Davis Univ. Maryland Mohammad Rastegari

Motivation: Proposals are Expensive

Example: find the father of the internet

Selective Search2.24 seconds

EdgeBoxes0.38 seconds

Page 3: G-CNN: an Iterative Grid Based Object Detectorvgg/rg/slides/GCNN.pdf · G-CNN: an Iterative Grid Based Object Detector CVPR, 2016 Larry S. Davis Univ. Maryland Mohammad Rastegari

Motivation: Proposals are Expensive

Cheaper Alternative: gridsExample: find the father of the internet

Page 4: G-CNN: an Iterative Grid Based Object Detectorvgg/rg/slides/GCNN.pdf · G-CNN: an Iterative Grid Based Object Detector CVPR, 2016 Larry S. Davis Univ. Maryland Mohammad Rastegari

Motivation: Keep accuracy with iterations!

Downside of using grids: loss of accuracy

In G-CNN, high accuracy is achieved with grid proposals by using an iterative bounding box regression scheme

Inspired by IEF - move the work into the regression space!

Page 5: G-CNN: an Iterative Grid Based Object Detectorvgg/rg/slides/GCNN.pdf · G-CNN: an Iterative Grid Based Object Detector CVPR, 2016 Larry S. Davis Univ. Maryland Mohammad Rastegari

Some members of the postdeepluvian* object detection family tree

*Serge Bolongieism

R-CNN

CVPR 2014

FAST R-CNN

SPP-Net

ARXIV Nov, 2013

R-CNN minus R

YOLO

Faster RCNN

ProNet

G-CNNFully connected bidirectional inspiration layer

ARXIV Apr, 2015

ICCV 2015ARXIV June, 2014

ECCV 2014

ARXIV June, 2015

BMVC 2015

ARXIV Nov, 2015

ARXIV June, 2015

CVPR 2016

NIPS 2015

LocNet

ARXIV Nov, 2015

CVPR 2016ARXIV June, 2015

ARXIV Dec, 2015

CVPR 2016

CVPR 2016

Page 6: G-CNN: an Iterative Grid Based Object Detectorvgg/rg/slides/GCNN.pdf · G-CNN: an Iterative Grid Based Object Detector CVPR, 2016 Larry S. Davis Univ. Maryland Mohammad Rastegari

Potentially interestingR-CNN authors investigated iterative procedure:

“At test time, we score each proposal and predict its new detection window only once. In principle, we could iterate this procedure (i.e. re-score the newly predicted bounding box and then predict a new bounding box from it, and so on). However, we found that iterating does not improve results”

APPENDIX C - Rich Feature Hierarchies for accurate object detection and semantic segmentation

Page 7: G-CNN: an Iterative Grid Based Object Detectorvgg/rg/slides/GCNN.pdf · G-CNN: an Iterative Grid Based Object Detector CVPR, 2016 Larry S. Davis Univ. Maryland Mohammad Rastegari

Discussion

Page 8: G-CNN: an Iterative Grid Based Object Detectorvgg/rg/slides/GCNN.pdf · G-CNN: an Iterative Grid Based Object Detector CVPR, 2016 Larry S. Davis Univ. Maryland Mohammad Rastegari

How does it work?

Page 9: G-CNN: an Iterative Grid Based Object Detectorvgg/rg/slides/GCNN.pdf · G-CNN: an Iterative Grid Based Object Detector CVPR, 2016 Larry S. Davis Univ. Maryland Mohammad Rastegari

Bounding Box Regression In Object Detection: Recap

Introduced in the DPM paper (geometric features)

Revisited in the R-CNN paper (CNN features)

Key idea: snakes are not the same shape as donkeys. (i.e. once you have predicted the object category, you should be able to improve your bounding box)

Page 10: G-CNN: an Iterative Grid Based Object Detectorvgg/rg/slides/GCNN.pdf · G-CNN: an Iterative Grid Based Object Detector CVPR, 2016 Larry S. Davis Univ. Maryland Mohammad Rastegari

Bounding Box Regression In Object Detection: R-CNN styleTraining: The goal is to learn a mapping per category from a proposed box P to a ground truth box G. Inputs are N training pairs

{(Pi

, Gi

)}i=1,...,N , where P i = (P i

x

, P i

y

, P i

w

, P i

h

)

Parameterise mapping with linear functions such that:

dx

(P ), dy

(P ), dw

(P ), dh

(P )

Gx

= Pw

dx

(P ) + Px

Gy

= Ph

dy

(P ) + Py

ˆGw = Pwexp(dw(P )))

ˆGh = Phexp(dh(P )))

(scale invariant) (log space)

The functions are learned with ridge regression.

Page 11: G-CNN: an Iterative Grid Based Object Detectorvgg/rg/slides/GCNN.pdf · G-CNN: an Iterative Grid Based Object Detector CVPR, 2016 Larry S. Davis Univ. Maryland Mohammad Rastegari

Training Architecture

Notes: bounding box colours

Page 12: G-CNN: an Iterative Grid Based Object Detectorvgg/rg/slides/GCNN.pdf · G-CNN: an Iterative Grid Based Object Detector CVPR, 2016 Larry S. Davis Univ. Maryland Mohammad Rastegari

Bounding Box Regression: The Nitty Gritty for G-CNN

Training - each bounding box with IoU > 0.2 assigned to one of ground truth boxes in the same image, based on its initial grid position.

The function is learned with piece-wise regression, using target boxes at step 1 < s < S_train

Page 13: G-CNN: an Iterative Grid Based Object Detectorvgg/rg/slides/GCNN.pdf · G-CNN: an Iterative Grid Based Object Detector CVPR, 2016 Larry S. Davis Univ. Maryland Mohammad Rastegari

Bounding Box Regression: The Nitty Gritty for G-CNN

Loss function:

L_reg is the smooth L1 loss from Fast R-CNN:

L({Bi}) =StrainX

s=1

NX

i=1

[I(B1i /2 BBG)⇥ Lreg(�

si,li ��(Bs

i ,�(Bsi ,A(Bs

i ), s)))]

Page 14: G-CNN: an Iterative Grid Based Object Detectorvgg/rg/slides/GCNN.pdf · G-CNN: an Iterative Grid Based Object Detector CVPR, 2016 Larry S. Davis Univ. Maryland Mohammad Rastegari

Bounding Box Regression: The Nitty Gritty for G-CNN

For efficiency during training, approximate predicted update

with the perfect update

Page 15: G-CNN: an Iterative Grid Based Object Detectorvgg/rg/slides/GCNN.pdf · G-CNN: an Iterative Grid Based Object Detector CVPR, 2016 Larry S. Davis Univ. Maryland Mohammad Rastegari

Optimisation

SGD ftw.

Note: sampling biases early iteration steps

Page 16: G-CNN: an Iterative Grid Based Object Detectorvgg/rg/slides/GCNN.pdf · G-CNN: an Iterative Grid Based Object Detector CVPR, 2016 Larry S. Davis Univ. Maryland Mohammad Rastegari

Test-time architecture

Comparison to R-CNN: N_proposal vs (S_test x N_grid)

Page 17: G-CNN: an Iterative Grid Based Object Detectorvgg/rg/slides/GCNN.pdf · G-CNN: an Iterative Grid Based Object Detector CVPR, 2016 Larry S. Davis Univ. Maryland Mohammad Rastegari

Demo

Page 18: G-CNN: an Iterative Grid Based Object Detectorvgg/rg/slides/GCNN.pdf · G-CNN: an Iterative Grid Based Object Detector CVPR, 2016 Larry S. Davis Univ. Maryland Mohammad Rastegari

Experiments

Page 19: G-CNN: an Iterative Grid Based Object Detectorvgg/rg/slides/GCNN.pdf · G-CNN: an Iterative Grid Based Object Detector CVPR, 2016 Larry S. Davis Univ. Maryland Mohammad Rastegari

Config2x2 5x5 10x10

Training overlaps: [0.9, 0.8, 0.7]

Test overlaps: [0.7, 0.5, 0]

Regression network is trained for S = 3 steps

Page 20: G-CNN: an Iterative Grid Based Object Detectorvgg/rg/slides/GCNN.pdf · G-CNN: an Iterative Grid Based Object Detector CVPR, 2016 Larry S. Davis Univ. Maryland Mohammad Rastegari

Experiment 1:VOC 2007

Each network was based on Alexnet, trained on VOC 2007 trainval set and evaluated on the test set.

G-CNN(3) := Three steps at test time + approx 1500 initial boxes

G-CNN(5) := Five steps at test time + approx 180 initial boxes

FR-CNN := One step at test time + approx 2000 initial boxes (SS)

Page 21: G-CNN: an Iterative Grid Based Object Detectorvgg/rg/slides/GCNN.pdf · G-CNN: an Iterative Grid Based Object Detector CVPR, 2016 Larry S. Davis Univ. Maryland Mohammad Rastegari

Experiment 2:VOC 2007

Each network was based on VGG-16, trained on VOC 2007 trainval set and evaluated on the test set.

Claim: G-CNN effectively moves small # of boxes to targets

Page 22: G-CNN: an Iterative Grid Based Object Detectorvgg/rg/slides/GCNN.pdf · G-CNN: an Iterative Grid Based Object Detector CVPR, 2016 Larry S. Davis Univ. Maryland Mohammad Rastegari

Experiment 3:VOC 2012

Each network was based on VGG-16 with the following training 12 := VOC2012 trainval, 07+12 := VOC2007 trainval + VOC2012 trainval 07++12 := VOC2007 trainval/test + VOC2012 trainval

Claim: G-CNN provides best mAP without a proposal stage

Page 23: G-CNN: an Iterative Grid Based Object Detectorvgg/rg/slides/GCNN.pdf · G-CNN: an Iterative Grid Based Object Detector CVPR, 2016 Larry S. Davis Univ. Maryland Mohammad Rastegari

Experiment 4:VOC 2007

Each network was based on Alexnet, trained on VOC 2007 trainval set and evaluated on the test set, with five steps at test time

IF-FRCNN := Apply FR-CNN iteratively

Claim: Stepwise training matters

1Step-Grid := Train G-CNN with all tuples in one step

Page 24: G-CNN: an Iterative Grid Based Object Detectorvgg/rg/slides/GCNN.pdf · G-CNN: an Iterative Grid Based Object Detector CVPR, 2016 Larry S. Davis Univ. Maryland Mohammad Rastegari

Analysis of Detection Results

Claim: Removing proposal stage did not hurt localisation

Page 25: G-CNN: an Iterative Grid Based Object Detectorvgg/rg/slides/GCNN.pdf · G-CNN: an Iterative Grid Based Object Detector CVPR, 2016 Larry S. Davis Univ. Maryland Mohammad Rastegari

Detection Run Time

Fast R-CNN: 0.5 fps

Benchmarks with two K40 GPUs with VGG16 Net

G-CNN: 3 fps

Page 26: G-CNN: an Iterative Grid Based Object Detectorvgg/rg/slides/GCNN.pdf · G-CNN: an Iterative Grid Based Object Detector CVPR, 2016 Larry S. Davis Univ. Maryland Mohammad Rastegari

Rough comparison with current state of the art (VOC 2007 test set)Different training sets give an idea of how well the model scales with additional data. Table compiled July 2016

R-FCN (mAP of 69.9) @5fps

NOTE: By the time you are reading this, it is probably out of date…

Model Training Speed (juice) mAPG-CNN 07 3fps (2xK40) 66.8

Faster R-CNN 07+12 5fps (K40) 73.2SSD-300 07+12 58fps (TITAN X) 72.1SSD-500 07+12 23fps (TITAN X) 75.1R-FCN 07+12 6fps (K40) 80.5

Faster R-CNN 07+12+CO 5fps (K40) 85.6R-FCN 07+12+CO 6fps (K40) 83.6