spp-netimlab.postech.ac.kr/dkim/class/csed514_2019s/sppnet.pdf · - 20-60x faster than r-cnn, as...

Post on 26-Dec-2019


SPP-net: Spatial Pyramid Pooling in Deep Convolutional Networks

Highlights

• ILSVRC 2014 (all provided-data tracks)

• DET - 2nd

• CLS - 3rd

• LOC - 5th

• ECCV 2014 paper

• Published 2 months ago (arXiv:1406.4729v1, June 18)

• Details disclosed (arXiv:1406.4729v2)

Overview

• SPP-net - a new network structure

• Classification - improves all CNNs

• Detection - 20-60x faster than R-CNN, as accurate

Spatial Pyramid Matching

• SPM: very successful in traditional computer vision

[Grauman & Darrell, ICCV 2005] “The Pyramid Match Kernel: Discriminative Classification with Sets of Image Features”

[Lazebnik et al, CVPR 2006] “Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories”

[Figure: traditional pipeline: dense SIFT → encoded (VQ, SC, FV) → SPM → SVM → prediction. CNN counterparts: “conv layers” → simply pooling? → “fc layers”.]

SPP-net: SPM in CNN

[Figure: traditional CNN: fixed-size input → conv → fc (4096, 4096, 1000). SPP-net: any-size input → conv → spatial pyramid pooling → fc (4096, 4096, 1000).]

• Fix bin numbers

• DO NOT fix bin size
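The two rules above can be illustrated with a minimal pure-Python sketch. It max-pools a single-channel feature map into a fixed n x n grid, so the bin count stays constant while each bin's size scales with the input. The function name `pool_fixed_bins` and the floor/ceil bin boundaries are assumptions of this sketch, not necessarily the exact window rule used in the slides.

```python
import math

def pool_fixed_bins(fmap, n):
    """Max-pool a 2-D feature map (list of rows) into an n x n grid.

    The number of bins (n * n) is fixed; the bin size adapts to the input.
    Bin boundaries use floor/ceil so the bins cover the whole map
    (an assumption of this sketch).
    """
    h, w = len(fmap), len(fmap[0])
    out = []
    for i in range(n):
        r0, r1 = (i * h) // n, math.ceil((i + 1) * h / n)
        row = []
        for j in range(n):
            c0, c1 = (j * w) // n, math.ceil((j + 1) * w / n)
            row.append(max(fmap[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        out.append(row)
    return out

small = [[r * 4 + c for c in range(4)] for r in range(4)]   # 4 x 4 map
large = [[r * 7 + c for c in range(7)] for r in range(5)]   # 5 x 7 map

print(pool_fixed_bins(small, 2))   # [[5, 7], [13, 15]]
print(pool_fixed_bins(large, 2))   # [[17, 20], [31, 34]]
```

Both inputs yield a 2 x 2 output even though their sizes differ, which is what lets fixed-length fc layers follow a variable-size conv output.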

SPP-net

• variable input size/scale

  • multi-size training

  • multi-scale testing

  • full-image view

• multi-level pooling

  • robust to deformation

• operates on feature maps

  • pooling in regions

[Figure: input image → conv layers → conv feature maps → spatial pyramid pooling layer (concatenate) → fc layers]
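The pooling-and-concatenate step in the figure can be sketched as follows. This is an illustrative pure-Python version: the pyramid levels `(1, 2, 4)`, the helper `make_maps`, and the floor/ceil bin boundaries are all hypothetical choices for the example, not values taken from the slides.

```python
import math

def spatial_pyramid_pool(channels, levels=(1, 2, 4)):
    """Concatenate max-pooled bins from every pyramid level and channel.

    `channels` is a list of 2-D feature maps (one per conv filter).
    The output length is len(channels) * sum(n * n for n in levels),
    independent of the spatial size of the maps.
    """
    h, w = len(channels[0]), len(channels[0][0])
    vector = []
    for n in levels:                      # one n x n grid per pyramid level
        for fmap in channels:
            for i in range(n):
                r0, r1 = (i * h) // n, math.ceil((i + 1) * h / n)
                for j in range(n):
                    c0, c1 = (j * w) // n, math.ceil((j + 1) * w / n)
                    vector.append(max(fmap[r][c]
                                      for r in range(r0, r1)
                                      for c in range(c0, c1)))
    return vector

def make_maps(num_channels, h, w):
    """Hypothetical stand-in for the conv feature maps of an h x w input."""
    return [[[(k + 1) * (r * w + c) for c in range(w)] for r in range(h)]
            for k in range(num_channels)]

a = spatial_pyramid_pool(make_maps(3, 13, 13))
b = spatial_pyramid_pool(make_maps(3, 10, 17))
print(len(a), len(b))   # 63 63, i.e. 3 * (1 + 4 + 16) for both inputs
```

Because the vector length depends only on the channel count and the pyramid levels, the fc layers that follow see the same input dimension for any image size.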

[Figure: bar chart of ILSVRC top-5 val error (%), 10-view, for 4 architectures]

                         ZF-5    Convnet*-5   Overfeat-5   Overfeat-7
no-SPP baselines         14.76   13.92        13.52        11.97
multi-level pooling      14.14   13.54        12.80        11.12
+ multi-size training    13.64   13.33        12.33        10.95

All CNNs improved! (4 architectures)

ILSVRC 2014 CLS Results

• “shallow”

  • 7-conv, 1 Titan GPU, 3 weeks

• but potential

  • SPP can improve deeper nets: >1% gain post-competition

team            top-5 test
GoogLeNet       6.66
Oxford VGG      7.32
ours            8.06
Howard          8.11
DeeperVision    9.50
NUS-BST         9.79
TTIC_ECP        10.22

7-conv SPP-net, 10-view                                 10.95%
7-conv SPP-net, multi-scale/view (96-view + 2-full)     9.08%
multiple SPP-nets                                       8.06%

Detection: SPP on Regions

[Figure: input image → conv layers → conv feature maps → region → SPP → fc layers]

RCNN vs. SPP

• image regions vs. feature map regions

[Figure: SPP-net: 1 net on full image (image → net → feature maps → one feature per region)]

[Figure: R-CNN: 2000 nets on image regions (each region: image → net → feature)]

• With regional features, we can do everything RCNN does

  • fine-tune, SVM, bbox regression…

• similar accuracy, much faster
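The "feature map regions" idea can be sketched as below: the conv feature map is computed once per image, and each proposal is projected onto it and pooled to a fixed length. This is a simplified illustration; the total stride of 16, the floor/ceil projection, the 20 x 30 map, and the proposal boxes are all hypothetical, and the names `project_box` and `pool_region` are not from the slides.

```python
import math

def project_box(box, stride):
    """Map an image-space box (x0, y0, x1, y1) onto feature-map cells.

    Assumes the conv layers downsample by a single total `stride`;
    the floor/ceil rounding is a simplification of this sketch.
    """
    x0, y0, x1, y1 = box
    fx0, fy0 = x0 // stride, y0 // stride
    fx1 = max(fx0 + 1, math.ceil(x1 / stride))   # keep at least one cell
    fy1 = max(fy0 + 1, math.ceil(y1 / stride))
    return fx0, fy0, fx1, fy1

def pool_region(fmap, fbox, n):
    """Max-pool the feature-map window `fbox` into n * n values."""
    fx0, fy0, fx1, fy1 = fbox
    h, w = fy1 - fy0, fx1 - fx0
    feats = []
    for i in range(n):
        r0, r1 = fy0 + (i * h) // n, fy0 + math.ceil((i + 1) * h / n)
        for j in range(n):
            c0, c1 = fx0 + (j * w) // n, fx0 + math.ceil((j + 1) * w / n)
            feats.append(max(fmap[r][c]
                             for r in range(r0, r1)
                             for c in range(c0, c1)))
    return feats

# Hypothetical 20 x 30 feature map, computed once for the whole image.
fmap = [[r * 30 + c for c in range(30)] for r in range(20)]

# Hypothetical region proposals in image coordinates.
proposals = [(0, 0, 160, 160), (64, 32, 400, 300)]

# Every proposal reuses the same feature map; no per-region conv pass.
features = [pool_region(fmap, project_box(b, stride=16), n=2)
            for b in proposals]
print(features[0])   # [124, 129, 274, 279]
```

Each proposal costs only a pooling pass over the shared map, whereas the R-CNN scheme described above runs the network once per region.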

VOC2007

                  SPP-net 1-scale   SPP-net 5-scale   RCNN
mAP               58.0              59.2              58.5
GPU time / img    0.14s             0.38s             9s
speed-up          64x               24x               -
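As a quick check, the speed-up figures follow directly from the per-image GPU times quoted above. The variable names are illustrative only.

```python
# Per-image GPU times on VOC2007, in seconds, as quoted on the slide.
rcnn_time = 9.0
spp_times = {"1-scale": 0.14, "5-scale": 0.38}

# Speed-up is the ratio of RCNN time to SPP-net time.
speedups = {name: rcnn_time / t for name, t in spp_times.items()}

for name, s in speedups.items():
    print(f"SPP-net {name}: {s:.0f}x faster than RCNN")
# SPP-net 1-scale: 64x faster than RCNN
# SPP-net 5-scale: 24x faster than RCNN
```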

                  SPP-net    RCNN
GPU time / img    0.6s       32s
40k test imgs     8 hours    15 days

cost of a single model
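The totals for 40k test images can be approximated from the per-image GPU times. This back-of-the-envelope sketch counts GPU time only, so it lands close to, but not exactly on, the quoted totals; any remaining gap is presumably overhead that the slides do not break down.

```python
num_images = 40_000
seconds_per_image = {"SPP-net": 0.6, "RCNN": 32.0}

# Total GPU time in hours, from per-image time alone.
total_hours = {name: num_images * t / 3600
               for name, t in seconds_per_image.items()}

print(f"SPP-net: {total_hours['SPP-net']:.1f} hours")    # SPP-net: 6.7 hours
print(f"RCNN: {total_hours['RCNN'] / 24:.1f} days")      # RCNN: 14.8 days
```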

ILSVRC 2014 DET Results

“provided data” track

                         mAP
NUS                      37.2
ours, multi SPP-nets     35.1
UvA                      32.0
ours, 1 SPP-net          31.8
Southeast-CASIA          30.4
1-HKUST                  28.8
CASIA_CRIPAC_2           28.6

• Conclusion

  • SPM in CNNs

  • CLS: improve all CNNs in the literature

  • DET: practical, fast, and accurate

• Future work

  • SPP on advanced networks

• Resources

  • code, config, tech report… http://research.microsoft.com/en-us/um/people/kahe/

• Acknowledgement

  • We thank NVIDIA for the GPU donation.
