cpsc 425: computer vision › ~ftung › cpsc425 › lecture22.pdf · learning the model can be...

CPSC 425: Computer Vision

Instructor: Fred [email protected]

Department of Computer ScienceUniversity of British Columbia

Lecture Notes 2015/2016 Term 2

1 / 40

Menu March 29, 2016

Topics:Object Detection (cont.)— Deformable part model— Object proposalsConvolutional Neural Networks (preview)

Reading:Today: Forsyth & Ponce (2nd ed.) 17.1–17.2

Reminders:Assignment 7 due Tuesday, April 5www: http://www.cs.ubc.ca/~ftung/cpsc425/piazza: https://piazza.com/ubc.ca/winterterm22015/cpsc425/

2 / 40

http://www.cs.ubc.ca/~ftung/cpsc425/

https://piazza.com/ubc.ca/winterterm22015/cpsc425/

Today’s Fun ExampleGoogle DeepMind’s AlphaGo

Image source: Nature, 2016

3 / 40

Lecture 21: Re-cap

Sliding window strategy: Train an image classifier. ‘Slide’ adetection window across the image and evaluate the classifier oneach window.

Viola and Jones’ face detector is a classic example— An integral image enables constant-time summing of arectangular region— A classifier cascade tries to quickly rule out negative windowsusing fast classifiers early on

A deformable part model consists of— root: coarse model that gives overall appearance of the object— parts: object components that have reliable appearance butmight appear at somewhat different locations for differentinstances

4 / 40

Deformable Part Model

Felzenszwalb et al., 2010

Each part has an appearance model and a natural locationrelative to the root

Finding a window that looks a lot like the part close to that part’snatural location relative to the root yields evidence that the objectis present

5 / 40


A parts model for a bicycle, containing a root and 6 parts

Figure source: Felzenszwalb et al., 2010

6 / 40



The learned root model is a set of linear weights β(r) applied tothe feature descriptor of the root window

The i th learned part model consists of— a set of linear weights β(pi ) applied to the feature descriptor ofthe part window— a natural location (offset) relative to the root v(pi ) = (u(pi ), v (pi ))

— a set of distance weights d(pi ) = (d (pi )1 ,d (pi )

2 ,d (pi )3 ,d (pi )

4 )

7 / 40

Sliding Window with Deformable Part Model

The overall score of the deformable parts model at a particularwindow will be the sum of several scores

— A root score compares the root to the window

— Each part has its own score, consisting of an appearance scoreand a location score

Model score = Root score +∑

i

Part i score (1)

8 / 40


Denote by φ(x , y) the feature descriptor of a part window at offset(x , y) relative to the root.Denote by (dx ,dy) = (u(pi ), v (pi ))− (x , y) the difference from thepart’s natural offset relative to the root.The score for part i at offset (x , y) is given by

S(pi )(x , y ;β(pi ),d(pi ),v(pi )) = β(pi )φ(x , y) (2)

− (d (pi )1 dx + d (pi )

2 dy + d (pi )3 dx2 + d (pi )

4 dy2)

The final part i score is the best score found over all possibleoffsets (x , y)

Part i score = max(x ,y)

S(pi )(x , y ;β(pi ),d(pi ),v(pi )) (3)

9 / 40


Denote by φ(x , y) the feature descriptor of a part window at offset(x , y) relative to the root.Denote by (dx ,dy) = (u(pi ), v (pi ))− (x , y) the difference from thepart’s natural offset relative to the root.The score for part i at offset (x , y) is given by

S(pi )(x , y ;β(pi ),d(pi ),v(pi )) = β(pi )φ(x , y) (2)

− (d (pi )1 dx + d (pi )

2 dy + d (pi )3 dx2 + d (pi )

4 dy2)

The final part i score is the best score found over all possibleoffsets (x , y)

Part i score = max(x ,y)

S(pi )(x , y ;β(pi ),d(pi ),v(pi )) (3)

10 / 40

Learning a Deformable Part Model

Learning the model can be tricky. Why?

A class model can consist of multiple component modelsrepresenting different canonical views— e.g. a front and lateral model of a bicycle

We do not know which component model should respond to whichtraining example

We also do not know the locations of the parts in the trainingexamples

11 / 40






12 / 40






13 / 40


However, notice that if the component and the part locations foreach training example are given (fixed), we can simply train alinear SVM as usual

This observation leads to the following iterative strategy:

— Assume components and part locations are given (fixed).Compute appearance and offset models.

— Assume appearance and offset models are given (fixed).Re-estimate components and part locations.

14 / 40


However, notice that if the component and the part locations foreach training example are given (fixed), we can simply train alinear SVM as usual

This observation leads to the following iterative strategy:

— Assume components and part locations are given (fixed).Compute appearance and offset models.

— Assume appearance and offset models are given (fixed).Re-estimate components and part locations.

15 / 40

Deformable Part Model: Hard Negative Mining

Sliding window detectors must search over an immense numberof windows— Even a small false positive rate becomes noticeable

As a result, we want to train on as many negative examples aspossible, but remain computationally feasible

Hard negative mining: As we train the classifier, apply it to thenegative examples (e.g. ‘not a bicycle’) and keep track of onesthat get a strong response (e.g. are mistakenly detected asbicycles). Include these in the next round of training.

16 / 40

Deformable Part Model: Examples

A learned car model


17 / 40

Deformable Part Model: Examples

A learned cat model


18 / 40

Recall: Sliding Window

Train a window classifier. ‘Slide’ a detection window across theimage and evaluate the classifier on each window.

Image credit: KITTI Vision Benchmark

This is a lot of possible windows! And most will not contain theobject we are looking for.

19 / 40

Object Proposals

Object proposal algorithms generate a short list of regions thathave generic object-like properties

— These regions are likely to contain some kind of foregroundobject instead of background texture

The object detector then considers these candidate regions only,instead of exhaustive sliding window search

20 / 40

Object Proposals

First introduced by Alexe et al., who asked ‘what is an object?’and defined an ‘objectness’ score based on several visual cues

Figure credit: Alexe et al., 2012

This work argued that objects typically— are unique within the image and stand out as salient— have a contrasting appearance from surroundingsand/or— have a well-defined closed boundary in space

21 / 40

Object Proposals

First introduced by Alexe et al., who asked ‘what is an object?’and defined an ‘objectness’ score based on several visual cues


This work argued that objects typically— are unique within the image and stand out as salient— have a contrasting appearance from surroundingsand/or— have a well-defined closed boundary in space

22 / 40

Object ProposalsMultiscale Saliency— Favours regions with a unique appearance within the image


23 / 40

Object Proposals

Colour Contrast— Favours regions with a contrasting colour appearance fromimmediate surroundings


24 / 40

Object ProposalsSuperpixels Straddling— Favours regions with a well-defined closed boundary— Measures the extent to which superpixels (obtained by imagesegmentation) contain pixels both inside and outside of thewindow


25 / 40

Object ProposalsSuperpixels Straddling— Favours regions with a well-defined closed boundary— Measures the extent to which superpixels (obtained by imagesegmentation) contain pixels both inside and outside of thewindow


26 / 40

Object Proposals

Table credit: Alexe et al., 2012

Speeding up [11] HOG pedestrian detector [18] Deformable partmodel detector [33] Bag of words detector

27 / 40

Object Proposals: Other Strategies (Aside)

1. Hierarchical segmentation

Generate object proposals using multiple hierarchicalsegmentations. Perform segmentations in colour spaces withdifferent sensitivities to shadows, shading, and highlights.

Figure credit: K.E.A. van de Sande

28 / 40


2. Seed growing

Solve several figure-ground separation problems from differentinitial seeds

Figure credit: Carreira and Sminchisescu, 2012

29 / 40


3. Edges and image gradients

Score windows based on the number and strength of edgescontained within the window

Figure credit: Zitnick and Dollar, 2014

30 / 40

Convolutional Neural Networks (Preview)Recent years have seen increasing interest in applying neuralnetwork (‘deep learning’) techniques to solve visual recognitionproblems

Image source: Technology Review, 2013

31 / 40

A Neuron

Figure credit: Fei-Fei and Karpathy

The basic unit of computation in a neural network is a neuron.A neuron accepts some number of input signals, computes theirweighted sum, and applies an activation function (or non-linearity)to the sum.Common activation functions include sigmoid and rectified linearunit (ReLU)

32 / 40

Sigmoid

f (x) = 1/(1 + e−x)


Common in many early neural networks

Biological analogy to saturated firing rate of neurons

Maps the input to the range [0,1]

33 / 40

ReLU

f (x) = max(0, x)


Found to accelerate convergence during learning

Used in the most recent neural networks

34 / 40

Neural NetworksA neural network comprises neurons connected in an acyclicgraphThe outputs of neurons can become inputs to other neuronsNeural networks typically contain multiple layers of neurons

Example of a neural network with three inputs, a single hidden layer offour neurons, and an output layer of two neurons


35 / 40

Clicker Quiz (1 question)

Please turn on clickers

36 / 40

Neural Networks

Training a neural network requires estimating a large number ofparametersModern convolutional neural networks contain 10-20 layers andon the order of 100 million parameters

39 / 40

Summary

Detection scores in the deformable part model are based on bothappearance and location

The deformable part model is trained iteratively by alternating thesteps

1 Assume components and part locations given; computeappearance and offset models

2 Assume appearance and offset models given; computecomponents and part locations

An object proposal algorithm generates a short list of regions withgeneric object-like properties that can be evaluated by an objectdetector in place of an exhaustive sliding window search

40 / 40

cpsc 425: computer vision › ~ftung › cpsc425 › lecture22.pdf · learning the model can be...

Documents