cpsc 425: computer vision › ~ftung › cpsc425 › lecture22.pdf · learning the model can be...

38
CPSC 425: Computer Vision Instructor: Fred Tung [email protected] Department of Computer Science University of British Columbia Lecture Notes 2015/2016 Term 2 1 / 40

Upload: others

Post on 07-Jun-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: CPSC 425: Computer Vision › ~ftung › cpsc425 › lecture22.pdf · Learning the model can be tricky. Why? A class model can consist of multiple component models ... 21/40. Object

CPSC 425: Computer Vision

Instructor: Fred [email protected]

Department of Computer ScienceUniversity of British Columbia

Lecture Notes 2015/2016 Term 2

1 / 40

Page 2: CPSC 425: Computer Vision › ~ftung › cpsc425 › lecture22.pdf · Learning the model can be tricky. Why? A class model can consist of multiple component models ... 21/40. Object

Menu March 29, 2016

Topics:Object Detection (cont.)— Deformable part model— Object proposalsConvolutional Neural Networks (preview)

Reading:Today: Forsyth & Ponce (2nd ed.) 17.1–17.2

Reminders:Assignment 7 due Tuesday, April 5www: http://www.cs.ubc.ca/~ftung/cpsc425/piazza: https://piazza.com/ubc.ca/winterterm22015/cpsc425/

2 / 40

Page 3: CPSC 425: Computer Vision › ~ftung › cpsc425 › lecture22.pdf · Learning the model can be tricky. Why? A class model can consist of multiple component models ... 21/40. Object

Today’s Fun ExampleGoogle DeepMind’s AlphaGo

Image source: Nature, 2016

3 / 40

Page 4: CPSC 425: Computer Vision › ~ftung › cpsc425 › lecture22.pdf · Learning the model can be tricky. Why? A class model can consist of multiple component models ... 21/40. Object

Lecture 21: Re-cap

Sliding window strategy: Train an image classifier. ‘Slide’ adetection window across the image and evaluate the classifier oneach window.

Viola and Jones’ face detector is a classic example— An integral image enables constant-time summing of arectangular region— A classifier cascade tries to quickly rule out negative windowsusing fast classifiers early on

A deformable part model consists of— root: coarse model that gives overall appearance of the object— parts: object components that have reliable appearance butmight appear at somewhat different locations for differentinstances

4 / 40

Page 5: CPSC 425: Computer Vision › ~ftung › cpsc425 › lecture22.pdf · Learning the model can be tricky. Why? A class model can consist of multiple component models ... 21/40. Object

Deformable Part Model

Felzenszwalb et al., 2010

Each part has an appearance model and a natural locationrelative to the root

Finding a window that looks a lot like the part close to that part’snatural location relative to the root yields evidence that the objectis present

5 / 40

Page 6: CPSC 425: Computer Vision › ~ftung › cpsc425 › lecture22.pdf · Learning the model can be tricky. Why? A class model can consist of multiple component models ... 21/40. Object

Deformable Part Model

A parts model for a bicycle, containing a root and 6 parts

Figure source: Felzenszwalb et al., 2010

6 / 40

Page 7: CPSC 425: Computer Vision › ~ftung › cpsc425 › lecture22.pdf · Learning the model can be tricky. Why? A class model can consist of multiple component models ... 21/40. Object

Deformable Part Model

Figure source: Felzenszwalb et al., 2010

The learned root model is a set of linear weights β(r) applied tothe feature descriptor of the root window

The i th learned part model consists of— a set of linear weights β(pi ) applied to the feature descriptor ofthe part window— a natural location (offset) relative to the root v(pi ) = (u(pi ), v (pi ))

— a set of distance weights d(pi ) = (d (pi )1 ,d (pi )

2 ,d (pi )3 ,d (pi )

4 )

7 / 40

Page 8: CPSC 425: Computer Vision › ~ftung › cpsc425 › lecture22.pdf · Learning the model can be tricky. Why? A class model can consist of multiple component models ... 21/40. Object

Sliding Window with Deformable Part Model

The overall score of the deformable parts model at a particularwindow will be the sum of several scores

— A root score compares the root to the window

— Each part has its own score, consisting of an appearance scoreand a location score

Model score = Root score +∑

i

Part i score (1)

8 / 40

Page 9: CPSC 425: Computer Vision › ~ftung › cpsc425 › lecture22.pdf · Learning the model can be tricky. Why? A class model can consist of multiple component models ... 21/40. Object

Sliding Window with Deformable Part Model

Denote by φ(x , y) the feature descriptor of a part window at offset(x , y) relative to the root.Denote by (dx ,dy) = (u(pi ), v (pi ))− (x , y) the difference from thepart’s natural offset relative to the root.The score for part i at offset (x , y) is given by

S(pi )(x , y ;β(pi ),d(pi ),v(pi )) = β(pi )φ(x , y) (2)

− (d (pi )1 dx + d (pi )

2 dy + d (pi )3 dx2 + d (pi )

4 dy2)

The final part i score is the best score found over all possibleoffsets (x , y)

Part i score = max(x ,y)

S(pi )(x , y ;β(pi ),d(pi ),v(pi )) (3)

9 / 40

Page 10: CPSC 425: Computer Vision › ~ftung › cpsc425 › lecture22.pdf · Learning the model can be tricky. Why? A class model can consist of multiple component models ... 21/40. Object

Sliding Window with Deformable Part Model

Denote by φ(x , y) the feature descriptor of a part window at offset(x , y) relative to the root.Denote by (dx ,dy) = (u(pi ), v (pi ))− (x , y) the difference from thepart’s natural offset relative to the root.The score for part i at offset (x , y) is given by

S(pi )(x , y ;β(pi ),d(pi ),v(pi )) = β(pi )φ(x , y) (2)

− (d (pi )1 dx + d (pi )

2 dy + d (pi )3 dx2 + d (pi )

4 dy2)

The final part i score is the best score found over all possibleoffsets (x , y)

Part i score = max(x ,y)

S(pi )(x , y ;β(pi ),d(pi ),v(pi )) (3)

10 / 40

Page 11: CPSC 425: Computer Vision › ~ftung › cpsc425 › lecture22.pdf · Learning the model can be tricky. Why? A class model can consist of multiple component models ... 21/40. Object

Learning a Deformable Part Model

Learning the model can be tricky. Why?

A class model can consist of multiple component modelsrepresenting different canonical views— e.g. a front and lateral model of a bicycle

We do not know which component model should respond to whichtraining example

We also do not know the locations of the parts in the trainingexamples

11 / 40

Page 12: CPSC 425: Computer Vision › ~ftung › cpsc425 › lecture22.pdf · Learning the model can be tricky. Why? A class model can consist of multiple component models ... 21/40. Object

Learning a Deformable Part Model

Learning the model can be tricky. Why?

A class model can consist of multiple component modelsrepresenting different canonical views— e.g. a front and lateral model of a bicycle

We do not know which component model should respond to whichtraining example

We also do not know the locations of the parts in the trainingexamples

12 / 40

Page 13: CPSC 425: Computer Vision › ~ftung › cpsc425 › lecture22.pdf · Learning the model can be tricky. Why? A class model can consist of multiple component models ... 21/40. Object

Learning a Deformable Part Model

Learning the model can be tricky. Why?

A class model can consist of multiple component modelsrepresenting different canonical views— e.g. a front and lateral model of a bicycle

We do not know which component model should respond to whichtraining example

We also do not know the locations of the parts in the trainingexamples

13 / 40

Page 14: CPSC 425: Computer Vision › ~ftung › cpsc425 › lecture22.pdf · Learning the model can be tricky. Why? A class model can consist of multiple component models ... 21/40. Object

Learning a Deformable Part Model

However, notice that if the component and the part locations foreach training example are given (fixed), we can simply train alinear SVM as usual

This observation leads to the following iterative strategy:

— Assume components and part locations are given (fixed).Compute appearance and offset models.

— Assume appearance and offset models are given (fixed).Re-estimate components and part locations.

14 / 40

Page 15: CPSC 425: Computer Vision › ~ftung › cpsc425 › lecture22.pdf · Learning the model can be tricky. Why? A class model can consist of multiple component models ... 21/40. Object

Learning a Deformable Part Model

However, notice that if the component and the part locations foreach training example are given (fixed), we can simply train alinear SVM as usual

This observation leads to the following iterative strategy:

— Assume components and part locations are given (fixed).Compute appearance and offset models.

— Assume appearance and offset models are given (fixed).Re-estimate components and part locations.

15 / 40

Page 16: CPSC 425: Computer Vision › ~ftung › cpsc425 › lecture22.pdf · Learning the model can be tricky. Why? A class model can consist of multiple component models ... 21/40. Object

Deformable Part Model: Hard Negative Mining

Sliding window detectors must search over an immense numberof windows— Even a small false positive rate becomes noticeable

As a result, we want to train on as many negative examples aspossible, but remain computationally feasible

Hard negative mining: As we train the classifier, apply it to thenegative examples (e.g. ‘not a bicycle’) and keep track of onesthat get a strong response (e.g. are mistakenly detected asbicycles). Include these in the next round of training.

16 / 40

Page 17: CPSC 425: Computer Vision › ~ftung › cpsc425 › lecture22.pdf · Learning the model can be tricky. Why? A class model can consist of multiple component models ... 21/40. Object

Deformable Part Model: Examples

A learned car model

Figure source: Felzenszwalb et al., 2010

17 / 40

Page 18: CPSC 425: Computer Vision › ~ftung › cpsc425 › lecture22.pdf · Learning the model can be tricky. Why? A class model can consist of multiple component models ... 21/40. Object

Deformable Part Model: Examples

A learned cat model

Figure source: Felzenszwalb et al., 2010

18 / 40

Page 19: CPSC 425: Computer Vision › ~ftung › cpsc425 › lecture22.pdf · Learning the model can be tricky. Why? A class model can consist of multiple component models ... 21/40. Object

Recall: Sliding Window

Train a window classifier. ‘Slide’ a detection window across theimage and evaluate the classifier on each window.

Image credit: KITTI Vision Benchmark

This is a lot of possible windows! And most will not contain theobject we are looking for.

19 / 40

Page 20: CPSC 425: Computer Vision › ~ftung › cpsc425 › lecture22.pdf · Learning the model can be tricky. Why? A class model can consist of multiple component models ... 21/40. Object

Object Proposals

Object proposal algorithms generate a short list of regions thathave generic object-like properties

— These regions are likely to contain some kind of foregroundobject instead of background texture

The object detector then considers these candidate regions only,instead of exhaustive sliding window search

20 / 40

Page 21: CPSC 425: Computer Vision › ~ftung › cpsc425 › lecture22.pdf · Learning the model can be tricky. Why? A class model can consist of multiple component models ... 21/40. Object

Object Proposals

First introduced by Alexe et al., who asked ‘what is an object?’and defined an ‘objectness’ score based on several visual cues

Figure credit: Alexe et al., 2012

This work argued that objects typically— are unique within the image and stand out as salient— have a contrasting appearance from surroundingsand/or— have a well-defined closed boundary in space

21 / 40

Page 22: CPSC 425: Computer Vision › ~ftung › cpsc425 › lecture22.pdf · Learning the model can be tricky. Why? A class model can consist of multiple component models ... 21/40. Object

Object Proposals

First introduced by Alexe et al., who asked ‘what is an object?’and defined an ‘objectness’ score based on several visual cues

Figure credit: Alexe et al., 2012

This work argued that objects typically— are unique within the image and stand out as salient— have a contrasting appearance from surroundingsand/or— have a well-defined closed boundary in space

22 / 40

Page 23: CPSC 425: Computer Vision › ~ftung › cpsc425 › lecture22.pdf · Learning the model can be tricky. Why? A class model can consist of multiple component models ... 21/40. Object

Object ProposalsMultiscale Saliency— Favours regions with a unique appearance within the image

Figure credit: Alexe et al., 2012

23 / 40

Page 24: CPSC 425: Computer Vision › ~ftung › cpsc425 › lecture22.pdf · Learning the model can be tricky. Why? A class model can consist of multiple component models ... 21/40. Object

Object Proposals

Colour Contrast— Favours regions with a contrasting colour appearance fromimmediate surroundings

Figure credit: Alexe et al., 2012

24 / 40

Page 25: CPSC 425: Computer Vision › ~ftung › cpsc425 › lecture22.pdf · Learning the model can be tricky. Why? A class model can consist of multiple component models ... 21/40. Object

Object ProposalsSuperpixels Straddling— Favours regions with a well-defined closed boundary— Measures the extent to which superpixels (obtained by imagesegmentation) contain pixels both inside and outside of thewindow

Figure credit: Alexe et al., 2012

25 / 40

Page 26: CPSC 425: Computer Vision › ~ftung › cpsc425 › lecture22.pdf · Learning the model can be tricky. Why? A class model can consist of multiple component models ... 21/40. Object

Object ProposalsSuperpixels Straddling— Favours regions with a well-defined closed boundary— Measures the extent to which superpixels (obtained by imagesegmentation) contain pixels both inside and outside of thewindow

Figure credit: Alexe et al., 2012

26 / 40

Page 27: CPSC 425: Computer Vision › ~ftung › cpsc425 › lecture22.pdf · Learning the model can be tricky. Why? A class model can consist of multiple component models ... 21/40. Object

Object Proposals

Table credit: Alexe et al., 2012

Speeding up [11] HOG pedestrian detector [18] Deformable partmodel detector [33] Bag of words detector

27 / 40

Page 28: CPSC 425: Computer Vision › ~ftung › cpsc425 › lecture22.pdf · Learning the model can be tricky. Why? A class model can consist of multiple component models ... 21/40. Object

Object Proposals: Other Strategies (Aside)

1. Hierarchical segmentation

Generate object proposals using multiple hierarchicalsegmentations. Perform segmentations in colour spaces withdifferent sensitivities to shadows, shading, and highlights.

Figure credit: K.E.A. van de Sande

28 / 40

Page 29: CPSC 425: Computer Vision › ~ftung › cpsc425 › lecture22.pdf · Learning the model can be tricky. Why? A class model can consist of multiple component models ... 21/40. Object

Object Proposals: Other Strategies (Aside)

2. Seed growing

Solve several figure-ground separation problems from differentinitial seeds

Figure credit: Carreira and Sminchisescu, 2012

29 / 40

Page 30: CPSC 425: Computer Vision › ~ftung › cpsc425 › lecture22.pdf · Learning the model can be tricky. Why? A class model can consist of multiple component models ... 21/40. Object

Object Proposals: Other Strategies (Aside)

3. Edges and image gradients

Score windows based on the number and strength of edgescontained within the window

Figure credit: Zitnick and Dollar, 2014

30 / 40

Page 31: CPSC 425: Computer Vision › ~ftung › cpsc425 › lecture22.pdf · Learning the model can be tricky. Why? A class model can consist of multiple component models ... 21/40. Object

Convolutional Neural Networks (Preview)Recent years have seen increasing interest in applying neuralnetwork (‘deep learning’) techniques to solve visual recognitionproblems

Image source: Technology Review, 2013

31 / 40

Page 32: CPSC 425: Computer Vision › ~ftung › cpsc425 › lecture22.pdf · Learning the model can be tricky. Why? A class model can consist of multiple component models ... 21/40. Object

A Neuron

Figure credit: Fei-Fei and Karpathy

The basic unit of computation in a neural network is a neuron.A neuron accepts some number of input signals, computes theirweighted sum, and applies an activation function (or non-linearity)to the sum.Common activation functions include sigmoid and rectified linearunit (ReLU)

32 / 40

Page 33: CPSC 425: Computer Vision › ~ftung › cpsc425 › lecture22.pdf · Learning the model can be tricky. Why? A class model can consist of multiple component models ... 21/40. Object

Sigmoid

f (x) = 1/(1 + e−x)

Figure credit: Fei-Fei and Karpathy

Common in many early neural networks

Biological analogy to saturated firing rate of neurons

Maps the input to the range [0,1]

33 / 40

Page 34: CPSC 425: Computer Vision › ~ftung › cpsc425 › lecture22.pdf · Learning the model can be tricky. Why? A class model can consist of multiple component models ... 21/40. Object

ReLU

f (x) = max(0, x)

Figure credit: Fei-Fei and Karpathy

Found to accelerate convergence during learning

Used in the most recent neural networks

34 / 40

Page 35: CPSC 425: Computer Vision › ~ftung › cpsc425 › lecture22.pdf · Learning the model can be tricky. Why? A class model can consist of multiple component models ... 21/40. Object

Neural NetworksA neural network comprises neurons connected in an acyclicgraphThe outputs of neurons can become inputs to other neuronsNeural networks typically contain multiple layers of neurons

Example of a neural network with three inputs, a single hidden layer offour neurons, and an output layer of two neurons

Figure credit: Fei-Fei and Karpathy

35 / 40

Page 36: CPSC 425: Computer Vision › ~ftung › cpsc425 › lecture22.pdf · Learning the model can be tricky. Why? A class model can consist of multiple component models ... 21/40. Object

Clicker Quiz (1 question)

Please turn on clickers

36 / 40

Page 37: CPSC 425: Computer Vision › ~ftung › cpsc425 › lecture22.pdf · Learning the model can be tricky. Why? A class model can consist of multiple component models ... 21/40. Object

Neural Networks

Training a neural network requires estimating a large number ofparametersModern convolutional neural networks contain 10-20 layers andon the order of 100 million parameters

39 / 40

Page 38: CPSC 425: Computer Vision › ~ftung › cpsc425 › lecture22.pdf · Learning the model can be tricky. Why? A class model can consist of multiple component models ... 21/40. Object

Summary

Detection scores in the deformable part model are based on bothappearance and location

The deformable part model is trained iteratively by alternating thesteps

1 Assume components and part locations given; computeappearance and offset models

2 Assume appearance and offset models given; computecomponents and part locations

An object proposal algorithm generates a short list of regions withgeneric object-like properties that can be evaluated by an objectdetector in place of an exhaustive sliding window search

40 / 40