Instructor: Mircea Nicolescu Lecture 17 CS 485 / 685 Computer Vision

Upload: linda-logan

Post on 17-Dec-2015

Instructor: Mircea Nicolescu

Lecture 17

CS 485 / 685

Computer Vision

Object Recognition Using SIFT Features

1. Match individual SIFT features from an image to a database of SIFT features from known objects (i.e., find nearest neighbors).

2. Find clusters of SIFT features belonging to a single object (hypothesis generation).

3. Estimate object pose (i.e., recover the transformation that the model has undergone) using at least three matches.

4. Verify that additional features agree on object pose.

Nearest Neighbor Search

• Linear search: too slow for a large database
• kD trees: become slow when the dimensionality k > 10

• Approximate nearest neighbor search:
− Best-bin-first [Beis et al. 97] (modification to the kD-tree algorithm)
− Examine only the N closest bins of the kD-tree
− Use a heap to identify bins in order by their distance from the query

• Can give a speedup by a factor of 1000 while finding the nearest neighbor (of interest) 95% of the time.
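The best-bin-first idea above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation of Beis et al. or of FLANN: the tree stores one point per node (so "bins" are approximated by tree nodes), `max_checks` plays the role of N, and all function names are hypothetical.

```python
import heapq

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build_kd_tree(points, depth=0):
    """Build a kD-tree by splitting at the median, cycling through the axes."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kd_tree(points[:mid], depth + 1),
        "right": build_kd_tree(points[mid + 1:], depth + 1),
    }

def best_bin_first(root, query, max_checks):
    """Approximate nearest neighbor: examine at most max_checks tree nodes,
    visiting unexplored branches in order of their distance from the query."""
    best, best_d = None, float("inf")
    # Heap entries: (lower bound on squared distance, tie-breaker, node).
    heap = [(0.0, 0, root)]
    pushed, checks = 1, 0
    while heap and checks < max_checks:
        bound, _, node = heapq.heappop(heap)
        if bound >= best_d:
            break  # no remaining branch can contain a closer point
        while node is not None and checks < max_checks:
            d = sq_dist(node["point"], query)
            checks += 1
            if d < best_d:
                best, best_d = node["point"], d
            diff = query[node["axis"]] - node["point"][node["axis"]]
            near, far = ((node["left"], node["right"]) if diff < 0
                         else (node["right"], node["left"]))
            if far is not None:
                # The far branch lies beyond the splitting plane.
                heapq.heappush(heap, (diff * diff, pushed, far))
                pushed += 1
            node = near
    return best, best_d
```

With `max_checks` equal to the number of stored points the search is exact; smaller values trade accuracy for speed, which is the trade-off quoted in the bullet above.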

Marius Muja and David G. Lowe, "Fast Approximate Nearest Neighbors with Automatic Algorithm Configuration", International Conference on Computer Vision Theory and Applications, 2009.

FLANN - Fast Library for Approximate Nearest Neighbors

Estimate Object Pose

• Now, given feature matches…
− Find clusters of features corresponding to a single object
− Solve for transformation (e.g., affine transformation)
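A 2D affine transformation has six unknowns and each match supplies two equations, which is why three matches are the minimum. The sketch below solves for the transformation exactly from three matches using Cramer's rule; the function names are hypothetical, and a practical system would typically fit all matches in a cluster by least squares instead.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def solve3(m, rhs):
    """Solve the 3x3 linear system m * x = rhs with Cramer's rule."""
    d = det3(m)
    if abs(d) < 1e-12:
        raise ValueError("model points are collinear; affine fit is undefined")
    solution = []
    for col in range(3):
        mc = [row[:] for row in m]
        for r in range(3):
            mc[r][col] = rhs[r]
        solution.append(det3(mc) / d)
    return solution

def affine_from_three_matches(model_pts, scene_pts):
    """Return ((a, b, tx), (c, d, ty)) such that
    u = a*x + b*y + tx and v = c*x + d*y + ty for each match (x, y) -> (u, v)."""
    m = [[x, y, 1.0] for x, y in model_pts]
    row_u = tuple(solve3(m, [u for u, _ in scene_pts]))
    row_v = tuple(solve3(m, [v for _, v in scene_pts]))
    return row_u, row_v

def apply_affine(params, pt):
    """Map a model point into the scene with the estimated transformation."""
    (a, b, tx), (c, d, ty) = params
    x, y = pt
    return (a * x + b * y + tx, c * x + d * y + ty)
```

For example, `affine_from_three_matches([(0, 0), (1, 0), (0, 1)], [(3, 4), (5, 3), (4, 4.5)])` recovers the rows `(2, 1, 3)` and `(-1, 0.5, 4)` up to floating-point error.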

• Need to consider clusters of size >= 3
• How do we find three “good” (true) matches?

• Pose clustering

− Each feature is associated with four parameters: 2D location, scale, and orientation.

− For every model-scene match (mi, sj), estimate a similarity transformation (tx, ty, s, θ) between mi and sj, and let the match vote for it.

− Transformation space is 4D: (tx, ty, s, θ). Each match casts votes for its own estimate: (tx, ty, s, θ), (t’x, t’y, s’, θ’), …

− Partial voting: vote for neighboring bins as well, and use a large bin size to better tolerate errors

− Transformations that accumulate at least three votes are selected (hypothesis generation)

− Using model-scene matches, compute object pose (i.e., affine transformation) and apply verification
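The voting scheme can be sketched as follows. This is a minimal illustration under assumptions not stated on the slides: each feature is a tuple `(x, y, scale, orientation)` with orientation in radians, the bin sizes (`t_bin` pixels for translation, a factor of 2 for scale, `theta_bins` orientation bins) are hypothetical defaults, and partial voting is implemented as a vote for the two closest bins in each dimension (16 bins per match).

```python
import math
from collections import defaultdict
from itertools import product

def similarity_from_match(model_f, scene_f):
    """Estimate (tx, ty, s, theta) mapping one model feature onto a scene feature."""
    xm, ym, sm, om = model_f
    xs, ys, ss, os_ = scene_f
    s = ss / sm                               # relative scale
    theta = (os_ - om) % (2 * math.pi)        # relative orientation
    c, sn = math.cos(theta), math.sin(theta)
    tx = xs - s * (c * xm - sn * ym)          # translation left after scaling and rotating
    ty = ys - s * (sn * xm + c * ym)
    return tx, ty, s, theta

def two_nearest_bins(coord):
    """Indices of the two unit-width bins whose centers are closest to coord."""
    base = math.floor(coord)
    neighbor = base + 1 if coord - base >= 0.5 else base - 1
    return (base, neighbor)

def pose_clusters(matches, t_bin=20.0, theta_bins=12, min_votes=3):
    """Accumulate votes in the 4D pose space; keep bins with >= min_votes matches."""
    votes = defaultdict(list)
    for idx, (model_f, scene_f) in enumerate(matches):
        tx, ty, s, theta = similarity_from_match(model_f, scene_f)
        coords = (tx / t_bin, ty / t_bin, math.log2(s),
                  theta / (2 * math.pi) * theta_bins)
        options = [two_nearest_bins(c) for c in coords]
        for key in product(*options):
            key = key[:3] + (key[3] % theta_bins,)   # orientation wraps around
            votes[key].append(idx)
    return {k: v for k, v in votes.items() if len(v) >= min_votes}
```

Each surviving bin lists the indices of the matches that voted for it; those matches form one hypothesis to pass on to the affine fit and verification.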

Verification

• Back-project model on the scene and look for additional matches.

• Discard outliers (incorrect matches) by imposing stricter matching constraints (e.g., requiring each match to agree within half the error range used for the pose-clustering bins).

• Find additional matches by refining the transformation computed (i.e., iterative affine refinements).

• Evaluate probability that match is correct.

− Use a Bayesian (probabilistic) model to estimate the probability that a model is present based on the actual number of matching features.

− The Bayesian model takes into account:
  − Object size in image
  − Textured regions
  − Model feature count in database
  − Accuracy of fit

Lowe, D.G. 2001. Local feature view clustering for 3D object recognition. IEEE Conference on Computer Vision and Pattern Recognition, Kauai, Hawaii, pp. 682–688.

Planar Recognition

• Training images (models)

• Reliably recognized at a rotation of 60° away from the camera.

• Affine fit approximates perspective projection.

• Only 3 points are needed for recognition.

3D Object Recognition

• Training images

• Only 3 keypoints are needed for recognition; extra keypoints provide robustness.

• Affine model is no longer as accurate.

Recognition Under Occlusion

Illumination Invariance

Object Categorization

Bag-of-Features (BoF) Models

Good for object categorization

Origin 1: Texture Recognition

• Texture is characterized by the repetition of basic elements or textons.

• Many times, it is the identity of the textons, not their spatial arrangement, that matters.

[Figure: each texture image is represented by a histogram over a universal texton dictionary.]

Origin 2: Document Retrieval

• Orderless document representation: frequencies of words from a dictionary [Salton & McGill 1983]

BoF for Object Categorization

G. Csurka et al., "Visual Categorization with Bags of Keypoints", European Conference on Computer Vision, Czech Republic, 2004.

Need a “visual” dictionary!

BoF: Main Steps

Characterize objects in terms of parts or local features

Step 1: Feature extraction (e.g., SIFT features)

Step 2: Learn “visual” vocabulary

[Figure: features are extracted from the training images and clustered; the cluster centers form the “visual” vocabulary.]

Example: K-Means Clustering

Algorithm:
• Randomly initialize K cluster centers
• Iterate until convergence:
− Assign each data point to the nearest center.
− Re-compute each cluster center as the mean of all points assigned to it.
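The algorithm above translates almost line for line into code. This is a minimal sketch with hypothetical names; the `seed` and `init` parameters are additions for reproducibility, and empty clusters simply keep their previous center, which is one of several possible conventions.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def k_means(points, k, max_iter=100, seed=0, init=None):
    """Return (centers, assignment) after running K-means on the points."""
    rng = random.Random(seed)
    # Random initialization: pick k distinct data points as starting centers.
    centers = list(init) if init is not None else rng.sample(points, k)
    assignment = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest center for every point.
        new_assignment = [
            min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each center to the mean of its assigned points.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centers[c] = mean(members)
    return centers, assignment
```

K-means only finds a local optimum, so the result depends on the initialization; running it several times with different seeds and keeping the best clustering is a common remedy.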

BoF: Main Steps

Step 3: Quantize features using “visual” vocabulary (i.e., represent each feature by the closest cluster center)

Step 4: Represent images by frequencies of “visual words” (i.e., bags of features)
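Steps 3 and 4 can be sketched together: quantize each descriptor to its nearest visual word, then count word occurrences. The names below are hypothetical, descriptors are plain tuples of numbers, and normalizing the histogram to sum to 1 is an assumption (it makes images with different feature counts comparable).

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def quantize(descriptor, vocabulary):
    """Step 3: index of the closest cluster center (visual word)."""
    return min(range(len(vocabulary)),
               key=lambda i: sq_dist(descriptor, vocabulary[i]))

def bag_of_features(descriptors, vocabulary, normalize=True):
    """Step 4: histogram of visual-word frequencies for one image."""
    hist = [0.0] * len(vocabulary)
    for d in descriptors:
        hist[quantize(d, vocabulary)] += 1.0
    total = sum(hist)
    if normalize and total > 0:
        hist = [h / total for h in hist]
    return hist
```

For instance, with the vocabulary `[(0, 0), (10, 10), (0, 10)]` and descriptors `[(1, 1), (9, 9), (0, 1), (1, 9)]`, two descriptors fall on the first word and one on each of the others, giving the histogram `[0.5, 0.25, 0.25]`.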

BoF Object Categorization

• How do we use BoF for object categorization?

• Nearest Neighbor (NN) Classifier

• K-Nearest Neighbor (KNN) Classifier

Find the k closest points from training data.

Labels of the k points “vote” to classify.

Works well provided there is lots of data and the distance function is good.
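A minimal KNN sketch over BoF histograms follows. The function name, the `(histogram, label)` training format, and the default squared Euclidean distance are assumptions; any histogram distance can be passed in through `distance`.

```python
from collections import Counter

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def knn_classify(query, training, k=3, distance=sq_dist):
    """Classify a histogram by majority vote among its k nearest training samples.

    training is a list of (histogram, label) pairs.
    """
    # Find the k closest points from the training data.
    nearest = sorted(training, key=lambda item: distance(query, item[0]))[:k]
    # Labels of the k points vote to classify.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

With `k=1` this reduces to the nearest neighbor (NN) classifier of the previous slide.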

• Functions for comparing histograms
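The slide's own formulas did not survive in this transcript. As an illustration only, here are two widely used histogram comparison functions, histogram intersection (a similarity) and the chi-square distance; they are common choices, not necessarily the ones shown in the lecture.

```python
def histogram_intersection(h1, h2):
    """Similarity: sum of bin-wise minima (equals 1.0 for identical normalized histograms)."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def chi_square_distance(h1, h2):
    """Distance: 0.5 * sum((a - b)^2 / (a + b)), skipping bins that are empty in both."""
    total = 0.0
    for a, b in zip(h1, h2):
        if a + b > 0:
            total += (a - b) ** 2 / (a + b)
    return 0.5 * total
```

Intersection grows with similarity while chi-square shrinks, so a nearest-neighbor search should maximize the former or minimize the latter.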

• SVM classifier

Example

Dictionary quality and size are very important parameters!

Appearance-Based Recognition

• Represent an object by the set of its possible appearances (i.e., under all possible viewpoints and illumination conditions).

• Identifying an object implies finding the closest stored image.

• In practice, a subset of all possible appearances is used.

• Images are highly correlated, so “compress” them into a low-dimensional space that captures key appearance characteristics (e.g., use Principal Component Analysis (PCA)).

M. Turk and A. Pentland, Eigenfaces for Recognition, Journal of Cognitive Neuroscience, vol. 3, no. 1, pp. 71-86, 1991.

H. Murase and S. Nayar, Visual Learning and Recognition of 3D Objects from Appearance, International Journal of Computer Vision, vol 14, pp. 5-24, 1995.

Image Segmentation

• Goals and Difficulties

− The goal of segmentation is to partition an image into regions (e.g., separate objects from background).

− The results of segmentation are very important in determining the eventual success or failure of image analysis.

− Segmentation is a very difficult problem in general!

• Increasing accuracy and robustness

− Introduce enough knowledge about the application domain
− Assume control over the environment (e.g., in industrial applications)
− Select type of sensors to enhance the objects of interest (e.g., use infrared imaging for target recognition applications)

• Segmentation approaches

− Edge-based approaches:
  − Use the boundaries of regions to segment the image
  − Detect abrupt changes in intensity (discontinuities)

− Region-based approaches:
  − Use similarity among pixels to find different regions

− Theoretically, both approaches should give identical results, but this is not true in practice.

Region Detection

• A region is a group of connected pixels with similar properties.

• Region-based approaches use similarity and spatial proximity among pixels to find different regions.

• The goal is to divide the image into regions, so that:
− each region is homogeneous in some sense
− adjacent regions are not homogeneous if taken together, in the same sense.

• Properties for region-based segmentation

− Partition an image R into sub-regions R1, R2,..., Rn

− Assume P(Ri) is a logical predicate – a property that pixel values of region Ri satisfy (e.g., intensity between 100 and 120).

− The following properties must be true:
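The list of properties did not survive in this transcript. The standard textbook formulation, which matches the homogeneity goals stated earlier and is given here as an assumption about what the slide showed, is:

```latex
\[
\begin{aligned}
&\text{(a)}\quad \bigcup_{i=1}^{n} R_i = R \\
&\text{(b)}\quad R_i \text{ is a connected region}, \quad i = 1, \dots, n \\
&\text{(c)}\quad R_i \cap R_j = \emptyset \quad \text{for all } i \neq j \\
&\text{(d)}\quad P(R_i) = \text{TRUE} \quad \text{for } i = 1, \dots, n \\
&\text{(e)}\quad P(R_i \cup R_j) = \text{FALSE} \quad \text{for adjacent regions } R_i, R_j,\ i \neq j
\end{aligned}
\]
```

Conditions (a) and (c) say the regions cover the image without overlapping, (d) says each region is homogeneous, and (e) says two adjacent regions could not be merged into one homogeneous region.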

• Main approaches for region detection

− Thresholding (pixel classification)
− Region growing (splitting and merging)
− Relaxation

Thresholding

• The simplest approach to image segmentation is by thresholding:

if f(x,y) < T then f(x,y) = 0 else f(x,y) = 255
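The rule above, applied to every pixel, can be written as a short sketch. It assumes a grayscale image stored as a list of rows of integer intensities; the function name is hypothetical.

```python
def threshold(image, t):
    """Binarize a grayscale image: pixels below t become 0, all others 255."""
    return [[0 if pixel < t else 255 for pixel in row] for row in image]
```

For example, `threshold([[10, 200], [128, 127]], 128)` returns `[[0, 255], [255, 0]]`.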

• Automatic thresholding

− To make segmentation more robust, the threshold should be automatically selected by the system.

− Knowledge about the objects, the application, and the environment should be used to choose the threshold automatically:

  − Intensity characteristics of the objects
  − Sizes of the objects
  − Fractions of an image occupied by the objects
  − Number of different types of objects appearing in an image

• Choosing the threshold using the image histogram

− Regions with uniform intensity give rise to strong peaks in the histogram

− Multilevel thresholding is also possible

− In general, good thresholds can be selected if the histogram peaks are tall, narrow, symmetric, and separated by deep valleys.
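One way to pick a threshold from the histogram automatically is Otsu's method, which the slides do not name; it is used here only as a well-known example of a histogram-based rule. It chooses the threshold that maximizes the between-class variance of the two resulting pixel groups, which tends to land in the valley between two well-separated peaks. The sketch assumes integer intensities in `0..levels-1` and the convention that pixels below T become 0.

```python
def histogram(image, levels=256):
    """Count how many pixels take each intensity value."""
    hist = [0] * levels
    for row in image:
        for pixel in row:
            hist[pixel] += 1
    return hist

def otsu_threshold(hist):
    """Return T maximizing the between-class variance of {f < T} and {f >= T}."""
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_score = 0, -1.0
    w0, sum0 = 0, 0.0          # pixel count and intensity sum of the dark class
    for t in range(1, len(hist)):
        w0 += hist[t - 1]
        sum0 += (t - 1) * hist[t - 1]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue           # one class is empty; not a valid split
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        score = w0 * w1 * (mu0 - mu1) ** 2   # proportional to between-class variance
        if score > best_score:
            best_score, best_t = score, t
    return best_t
```

When the histogram has two tall, narrow peaks separated by a deep valley, every threshold inside the valley gives the same score, and this sketch returns the lowest of them.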