
CS 485 / 685 Computer Vision

Lecture 17

Instructor: Mircea Nicolescu

Object Recognition Using SIFT Features

1. Match individual SIFT features from an image to a database of SIFT features from known objects (i.e., find nearest neighbors).

2. Find clusters of SIFT features belonging to a single object (hypothesis generation).

3. Estimate object pose (i.e., recover the transformation that the model has undergone) using at least three matches.

4. Verify that additional features agree on the object pose.


Nearest Neighbor Search

• Linear search: too slow for a large database

• kD-trees: become slow when k > 10


• Approximate nearest neighbor search:
− Best-bin-first [Beis et al. 97] (a modification of the kD-tree algorithm)
− Examine only the N closest bins of the kD-tree
− Use a heap to identify bins in order of their distance from the query

• Can give a speedup by a factor of 1000 while still finding the nearest neighbor of interest 95% of the time.

Marius Muja and David G. Lowe, "Fast Approximate Nearest Neighbors with Automatic Algorithm Configuration", International Conference on Computer Vision Theory and Applications, 2009.

FLANN - Fast Library for Approximate Nearest Neighbors
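As an illustration, here is a minimal sketch of approximate nearest-neighbor matching of SIFT descriptors with FLANN through OpenCV; the file names, index parameters, and the 0.7 ratio threshold are illustrative choices, not from the slides.

```python
import cv2

# Model and scene images (hypothetical file names).
img1 = cv2.imread("model.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# FLANN with randomized kD-trees; 'checks' bounds how many bins are
# examined, in the spirit of best-bin-first (more checks = more accuracy).
index_params = dict(algorithm=1, trees=5)   # FLANN_INDEX_KDTREE
search_params = dict(checks=50)
flann = cv2.FlannBasedMatcher(index_params, search_params)

# Lowe's ratio test: keep a match only if it is clearly better than
# the second-best candidate.
matches = flann.knnMatch(des1, des2, k=2)
good = []
for pair in matches:
    if len(pair) == 2 and pair[0].distance < 0.7 * pair[1].distance:
        good.append(pair[0])
print(f"{len(good)} putative matches")
```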


Estimate Object Pose

• Now, given feature matches…
− Find clusters of features corresponding to a single object
− Solve for the transformation (e.g., an affine transformation)


• Need to consider clusters of size ≥ 3

• How do we find three “good” (true) matches?


• Pose clustering
− Each feature is associated with four parameters: 2D location, scale, and orientation.
− For every model-scene match (mi, sj), estimate a similarity transformation (tx, ty, s, θ) between mi and sj, and cast a vote for it.


− The transformation space is 4D: (tx, ty, s, θ). Each match votes for a bin (tx, ty, s, θ) in this space, and votes accumulate per bin.


− Partial voting: vote for neighboring bins as well, and use a large bin size to better tolerate errors.

− Transformations that accumulate at least three votes are selected (hypothesis generation).

− Using the model-scene matches, compute the object pose (i.e., an affine transformation) and apply verification, as sketched below.
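A minimal sketch of this 4D pose clustering, assuming each match already yields a similarity-transform estimate; the bin widths, the use of log scale, and the data layout are illustrative choices.

```python
import numpy as np
from collections import defaultdict
from itertools import product

# Illustrative bin widths for (tx, ty, log s, theta); coarse bins
# tolerate estimation error, as noted above.
BIN = (32.0, 32.0, np.log(2.0), np.radians(30.0))

def pose_cluster(match_params):
    """match_params: list of (tx, ty, log_scale, theta) tuples, one per
    model-scene feature match. Returns lists of match indices whose
    bins collected at least three votes (hypotheses)."""
    votes = defaultdict(list)
    for i, params in enumerate(match_params):
        coords = [p / b for p, b in zip(params, BIN)]
        # Partial voting: vote for the two closest bins in each of the
        # four dimensions (up to 16 bins per match).
        for corner in set(product(*[(int(np.floor(c)), int(np.ceil(c)))
                                    for c in coords])):
            votes[corner].append(i)
    return [idx for idx in votes.values() if len(idx) >= 3]
```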


Verification

• Back-project the model onto the scene and look for additional matches.

• Discard outliers (incorrect matches) by imposing stricter matching constraints (e.g., half the error threshold).

• Find additional matches by refining the computed transformation (i.e., iterative affine refinements), as sketched below.
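The slides do not include code for the pose computation and refinement step; a minimal least-squares sketch for fitting a 2D affine transform to n ≥ 3 model-scene matches might look like this (the parameter ordering is an arbitrary choice):

```python
import numpy as np

def fit_affine(model_pts, scene_pts):
    """model_pts, scene_pts: (n, 2) arrays of matched points, n >= 3.
    Solves scene ~= A @ model + t in least squares."""
    n = len(model_pts)
    M = np.zeros((2 * n, 6))
    b = scene_pts.reshape(-1)                     # x'0, y'0, x'1, y'1, ...
    M[0::2, 0:2] = model_pts; M[0::2, 4] = 1.0    # x' = a*x + b*y + tx
    M[1::2, 2:4] = model_pts; M[1::2, 5] = 1.0    # y' = c*x + d*y + ty
    params, *_ = np.linalg.lstsq(M, b, rcond=None)
    return params                                 # (a, b, c, d, tx, ty)
```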


• Evaluate the probability that the match is correct.

− Use a Bayesian (probabilistic) model to estimate the probability that the model is present, based on the actual number of matching features.

− The Bayesian model takes into account:
− Object size in the image
− Textured regions
− Model feature count in the database
− Accuracy of fit

Lowe, D.G. 2001. Local feature view clustering for 3D object recognition. IEEE Conference on Computer Vision and Pattern Recognition, Kauai, Hawaii, pp. 682–688.


Planar Recognition

• Training images (models)


• Reliably recognized at a rotation of 60° away from the camera.

• Affine fit approximates perspective projection.

• Only 3 points are needed for recognition.


3D Object Recognition

• Training images


• Only 3 keypoints are needed for recognition; extra keypoints provide robustness.

• The affine model is no longer as accurate.


Recognition Under Occlusion


Illumination Invariance


Object Categorization


Bag-of-Features (BoF) Models

Good for object categorization


Origin 1: Texture Recognition

• Texture is characterized by the repetition of basic elements or textons.

• Often, it is the identity of the textons, not their spatial arrangement, that matters.


[Figure: textures represented as histograms of texton frequencies over a universal texton dictionary]

Origin 2: Document Retrieval

• Orderless document representation: frequencies of words from a dictionary (Salton & McGill, 1983)


BoF for Object Categorization

G. Csurka et al., "Visual Categorization with Bags of Keypoints", European Conference on Computer Vision, Czech Republic, 2004.

Need a “visual” dictionary!


BoF: Main Steps

Characterize objects in terms of parts or local features



Step 1: Feature extraction (e.g., SIFT features)


Step 2: Learn “visual” vocabulary



The “visual” vocabulary is the set of cluster centers (the “visual words”).


Example: K-Means Clustering

Algorithm:
• Randomly initialize K cluster centers
• Iterate until convergence:
− Assign each data point to the nearest center.
− Re-compute each cluster center as the mean of all points assigned to it.
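A minimal NumPy sketch of these two alternating steps; the data and K below are illustrative.

```python
import numpy as np

def kmeans(points, k, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), k, replace=False)]
    for _ in range(iters):
        # Assignment step: nearest center for every point.
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each center becomes the mean of its points
        # (keep the old center if a cluster goes empty).
        new_centers = np.array([points[labels == j].mean(axis=0)
                                if np.any(labels == j) else centers[j]
                                for j in range(k)])
        if np.allclose(new_centers, centers):   # converged
            break
        centers = new_centers
    return centers, labels

# Example: cluster 200 random 2-D points into 3 groups.
pts = np.random.default_rng(1).normal(size=(200, 2))
centers, labels = kmeans(pts, k=3)
```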



Step 3: Quantize features using the “visual” vocabulary (i.e., represent each feature by the closest cluster center)


Step 4: Represent images by frequencies of “visual words” (i.e., bags of features)
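A minimal sketch combining steps 3 and 4, assuming a vocabulary of cluster centers such as the k-means output above:

```python
import numpy as np

def bof_histogram(descriptors, vocabulary):
    """descriptors: (n, d) features from one image;
    vocabulary: (k, d) cluster centers ("visual words")."""
    dists = np.linalg.norm(descriptors[:, None, :] - vocabulary[None, :, :],
                           axis=2)
    words = dists.argmin(axis=1)                 # step 3: quantization
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return hist / hist.sum()                     # step 4: word frequencies
```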



BoF Object Categorization

• How do we use BoF for object categorization?



• Nearest Neighbor (NN) Classifier



• K-Nearest Neighbor (KNN) Classifier
− Find the k closest points from the training data.
− The labels of the k points “vote” to classify.
− Works well provided there is a lot of data and the distance function is good.
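A minimal sketch of such a classifier over BoF histograms; the histogram-intersection distance used here is one common choice (see the comparison functions below).

```python
import numpy as np
from collections import Counter

def intersection_distance(h1, h2):
    # 1 - histogram intersection: 0 for identical normalized histograms.
    return 1.0 - np.minimum(h1, h2).sum()

def knn_classify(query_hist, train_hists, train_labels, k=5):
    dists = [intersection_distance(query_hist, h) for h in train_hists]
    nearest = np.argsort(dists)[:k]
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]   # majority label among k neighbors
```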


• Functions for comparing histograms
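The slide's formulas are not in the transcript; two commonly used comparisons for normalized histograms are histogram intersection and the chi-squared distance:

```latex
d_{\cap}(h_1, h_2) = 1 - \sum_i \min\big(h_1(i),\, h_2(i)\big)
\qquad
\chi^2(h_1, h_2) = \sum_i \frac{\big(h_1(i) - h_2(i)\big)^2}{h_1(i) + h_2(i)}
```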


• SVM classifier
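A minimal sketch of training an SVM on BoF histograms, using scikit-learn as an illustrative library (the kernel, C, and placeholder data are arbitrary choices, not from the slides):

```python
import numpy as np
from sklearn.svm import SVC

# Placeholder data: 100 training images, 50 visual words, 4 categories.
train_hists = np.random.rand(100, 50)
train_labels = np.random.randint(0, 4, size=100)

clf = SVC(kernel="rbf", C=10.0)    # multi-class handled internally
clf.fit(train_hists, train_labels)
predictions = clf.predict(train_hists[:5])
```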


Example


Dictionary quality and size are very important parameters!


Appearance-Based Recognition

• Represent an object by the set of its possible appearances (i.e., under all possible viewpoints and illumination conditions).

• Identifying an object implies finding the closest stored image.


• In practice, a subset of all possible appearances is used.

• Images are highly correlated, so “compress” them into a low-dimensional space that captures key appearance characteristics (e.g., use Principal Component Analysis (PCA)).

M. Turk and A. Pentland, Eigenfaces for Recognition, Journal of Cognitive Neuroscience, vol. 3, no. 1, pp. 71-86, 1991.

H. Murase and S. Nayar, Visual Learning and Recognition of 3D Objects from Appearance, International Journal of Computer Vision, vol. 14, pp. 5-24, 1995.
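A minimal sketch of the PCA compression and closest-image lookup described above (the data shapes and number of components are illustrative):

```python
import numpy as np

def pca_train(images, n_components=20):
    """images: (n, h*w) flattened training appearances."""
    mean = images.mean(axis=0)
    centered = images - mean
    # Principal directions via SVD; rows of vt are the components.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:n_components]
    coords = centered @ basis.T          # low-dimensional representations
    return mean, basis, coords

def recognize(query, mean, basis, coords):
    """Return the index of the closest stored appearance."""
    q = (query - mean) @ basis.T
    return int(np.linalg.norm(coords - q, axis=1).argmin())
```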


Image Segmentation

• Goals and Difficulties

− The goal of segmentation is to partition an image into regions (e.g., separate objects from the background).

− The results of segmentation are very important in determining the eventual success or failure of image analysis.

− Segmentation is a very difficult problem in general!

• Increasing accuracy and robustness

− Introduce enough knowledge about the application domain.
− Assume control over the environment (e.g., in industrial applications).
− Select the type of sensors to enhance the objects of interest (e.g., use infrared imaging for target recognition applications).

• Segmentation approaches

− Edge-based approaches:
− Use the boundaries of regions to segment the image
− Detect abrupt changes in intensity (discontinuities)

− Region-based approaches:
− Use similarity among pixels to find different regions

− Theoretically, both approaches should give identical results, but this is not true in practice.

Region Detection

• A region is a group of connected pixels with similar properties.

• Region-based approaches use similarity and spatial proximity among pixels to find different regions.

• The goal is to divide the image into regions, so that:
− each region is homogeneous in some sense
− adjacent regions are not homogeneous if taken together, in the same sense.

• Properties for region-based segmentation

− Partition an image R into sub-regions R1, R2, ..., Rn.

− Assume P(Ri) is a logical predicate, i.e., a property that the pixel values of region Ri satisfy (e.g., intensity between 100 and 120).

− The following properties must be true:
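The property list itself did not survive in the transcript; the standard conditions for such a partition (as given in common textbooks, e.g., Gonzalez & Woods) are:

```latex
\begin{aligned}
&\text{(a)}\ \textstyle\bigcup_{i=1}^{n} R_i = R \\
&\text{(b)}\ R_i \text{ is a connected region, for } i = 1, \dots, n \\
&\text{(c)}\ R_i \cap R_j = \varnothing \text{ for all } i \neq j \\
&\text{(d)}\ P(R_i) = \text{TRUE, for } i = 1, \dots, n \\
&\text{(e)}\ P(R_i \cup R_j) = \text{FALSE, for adjacent regions } R_i \text{ and } R_j
\end{aligned}
```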


• Main approaches for region detection

− Thresholding (pixel classification)
− Region growing (splitting and merging)
− Relaxation

Thresholding

• The simplest approach to image segmentation is by thresholding:

if f(x,y) < T then f(x,y) = 0 else f(x,y) = 255
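A minimal NumPy sketch of this rule (T = 128 is an arbitrary placeholder):

```python
import numpy as np

def threshold(image, T=128):
    """image: 2-D array of gray levels; returns a binary 0/255 image."""
    return np.where(image < T, 0, 255).astype(np.uint8)
```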



• Automatic thresholding

− To make segmentation more robust, the threshold should be selected automatically by the system.

− Knowledge about the objects, the application, and the environment should be used to choose the threshold automatically:
− Intensity characteristics of the objects
− Sizes of the objects
− Fractions of the image occupied by the objects
− Number of different types of objects appearing in the image


• Choosing the threshold using the image histogram

− Regions with uniform intensity give rise to strong peaks in the histogram

− Multilevel thresholding is also possible

− In general, good thresholds can be selected if the histogram peaks are tall, narrow, symmetric, and separated by deep valleys.
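One standard way to automate this histogram-based choice (not named on the slide) is Otsu's method, which picks the threshold that maximizes the between-class variance of the two resulting pixel groups; a minimal sketch:

```python
import numpy as np

def otsu_threshold(image):
    """image: 2-D uint8 array; returns the threshold T in [0, 255]."""
    hist = np.bincount(image.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()
    levels = np.arange(256)
    best_t, best_var = 0, 0.0
    for t in range(1, 256):
        w0, w1 = p[:t].sum(), p[t:].sum()
        if w0 == 0.0 or w1 == 0.0:
            continue   # one class empty: skip this candidate
        mu0 = (levels[:t] * p[:t]).sum() / w0
        mu1 = (levels[t:] * p[t:]).sum() / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t
```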
