Object Recognition using Local Descriptors
Javier Ruiz-del-Solar and Patricio Loncomilla
Center for Web Research
Universidad de Chile
Outline
• Motivation & Recognition Examples
• Dimensionality problems
• Object Recognition using Local Descriptors
• Matching & Storage of Local Descriptors
• Conclusions
Motivation
• Object recognition approaches based on local invariant descriptors (features) have become increasingly popular and have developed impressively in recent years.
• Invariance against: scale, in-plane rotation, partial occlusion, partial distortion, partial change of point of view.
• The recognition process consists of two stages:
1. Scale-invariant local descriptors (features) of the observed scene are computed.
2. These descriptors are matched against descriptors of object prototypes already stored in a model database. These prototypes correspond to images of objects under different view angles.
Recognition Examples (1/2)
Recognition Examples (2/2)
Image Matching Examples (1/2)
Image Matching Examples (2/2)
Some applications
• Object retrieval in multimedia databases (e.g. Web)
• Image retrieval by similarity in multimedia databases
• Robot self-localization
• Binocular vision
• Image alignment and matching
• Movement compensation
• …
However … there are some problems
Dimensionality problems
• A given image can produce ~100-1,000 descriptors of 128 components (real values)
• The model database can contain up to 1,000-10,000 objects in some special applications
• => large number of comparisons => long processing time
• => large database size
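To give a sense of scale, the figures above can be turned into rough numbers. The sketch below assumes one reference image per object, the same descriptor count in every image, and 4-byte float components; none of these assumptions come from the slides, and `brute_force_cost` is a hypothetical helper name.

```python
# Back-of-the-envelope cost of exhaustive descriptor matching.
# Assumptions (not from the slides): one reference image per object,
# the same number of descriptors in every image, 4-byte float components.

def brute_force_cost(desc_per_image, n_objects, dims=128, bytes_per_value=4):
    """Return (distance computations per query image, database size in bytes)."""
    db_descriptors = desc_per_image * n_objects
    comparisons = desc_per_image * db_descriptors
    db_bytes = db_descriptors * dims * bytes_per_value
    return comparisons, db_bytes

for desc_per_image, n_objects in [(100, 1_000), (1_000, 10_000)]:
    comparisons, db_bytes = brute_force_cost(desc_per_image, n_objects)
    print(f"{desc_per_image} descriptors/image, {n_objects} objects: "
          f"{comparisons:.1e} comparisons, {db_bytes / 2**20:.0f} MiB")
```

Under these assumptions the upper end of the range means about 10^10 distance computations per query image and roughly 5 GB of descriptors.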
Main motivation of this talk:
• To get some ideas about how to compare local descriptors efficiently, and how to store them efficiently …
Recognition Process
The recognition process consists of two stages:
1. Scale-invariant local descriptors (features) of the observed scene are computed.
2. These descriptors are matched against descriptors of object prototypes already stored in a model database. These prototypes correspond to images of objects under different view angles.
[Figure: block diagram of the recognition system. Recognition: Input Image → Interest Points Detection → Scale Invariant Descriptors (SIFT) Calculation → SIFT Matching → Affine Transform Calculation → Affine Transform Parameters. Offline Database Creation: Reference Image → Interest Points Detection → Scale Invariant Descriptors (SIFT) Calculation → SIFT Database, which is used by SIFT Matching.]
Interest Points Detection (1/2)
Interest points correspond to maxima of the SDoG (Subsampled Difference of Gaussians) scale-space (x, y, σ).
[Figure: the scale space and the corresponding SDoG.]
Ref: Lowe 1999
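As an illustration of what "maxima of the scale-space" means in practice, the sketch below scans a stack of difference-of-Gaussians images for samples that exceed all 26 neighbours in (scale, y, x). It assumes the levels are already computed and stored at the same resolution in one 3D array, which skips the subsampling of the SDoG; the function name and the `threshold` parameter are illustrative.

```python
import numpy as np

def scale_space_maxima(dog, threshold=0.0):
    """Return (scale_index, y, x) of interior samples larger than all 26 neighbours.

    `dog` is assumed to be a (scales, height, width) array of
    difference-of-Gaussians values at a common resolution.
    """
    dog = np.asarray(dog, dtype=float)
    n_scales, height, width = dog.shape
    peaks = []
    for s in range(1, n_scales - 1):
        for y in range(1, height - 1):
            for x in range(1, width - 1):
                value = dog[s, y, x]
                if value <= threshold:
                    continue
                cube = dog[s - 1:s + 2, y - 1:y + 2, x - 1:x + 2]
                # Keep the sample only if it is the unique maximum of its 3x3x3 cube
                if value == cube.max() and np.count_nonzero(cube == value) == 1:
                    peaks.append((s, y, x))
    return peaks

# Toy stack: one peak and a weaker neighbour that must not be reported
dog = np.zeros((5, 7, 9))
dog[2, 3, 4] = 1.0
dog[2, 3, 5] = 0.5
print(scale_space_maxima(dog))
```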
Interest Points Detection (2/2)
Examples of detected interest points.
Our improvement: Subpixel location of interest points by a 3D quadratic approximation around the detected interest point in the scale-space.
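One common way to realise a 3D quadratic approximation is to fit a second-order Taylor expansion around the detected sample and solve for its extremum. The sketch below does this with finite differences; it is a minimal illustration under that assumption, not necessarily the authors' implementation, and it again assumes a same-resolution (scale, y, x) array.

```python
import numpy as np

def refine_extremum(dog, s, y, x):
    """Offset (ds, dy, dx) of the fitted quadratic's extremum from sample (s, y, x)."""
    d = np.asarray(dog, dtype=float)
    c = d[s, y, x]
    # Gradient by central differences
    g = 0.5 * np.array([
        d[s + 1, y, x] - d[s - 1, y, x],
        d[s, y + 1, x] - d[s, y - 1, x],
        d[s, y, x + 1] - d[s, y, x - 1],
    ])
    # Hessian by second differences
    hss = d[s + 1, y, x] - 2 * c + d[s - 1, y, x]
    hyy = d[s, y + 1, x] - 2 * c + d[s, y - 1, x]
    hxx = d[s, y, x + 1] - 2 * c + d[s, y, x - 1]
    hsy = 0.25 * (d[s + 1, y + 1, x] - d[s + 1, y - 1, x]
                  - d[s - 1, y + 1, x] + d[s - 1, y - 1, x])
    hsx = 0.25 * (d[s + 1, y, x + 1] - d[s + 1, y, x - 1]
                  - d[s - 1, y, x + 1] + d[s - 1, y, x - 1])
    hyx = 0.25 * (d[s, y + 1, x + 1] - d[s, y + 1, x - 1]
                  - d[s, y - 1, x + 1] + d[s, y - 1, x - 1])
    hessian = np.array([[hss, hsy, hsx],
                        [hsy, hyy, hyx],
                        [hsx, hyx, hxx]])
    # Extremum of c + g.T @ o + 0.5 * o.T @ H @ o is at o = -H^-1 g
    return -np.linalg.solve(hessian, g)

# Synthetic quadratic bump whose true maximum lies between samples
ss, yy, xx = np.meshgrid(np.arange(5), np.arange(7), np.arange(8), indexing="ij")
dog = -((ss - 2.2) ** 2 + 2 * (yy - 3.1) ** 2 + (xx - 3.7) ** 2)
print(np.array([2, 3, 4]) + refine_extremum(dog, 2, 3, 4))  # close to (2.2, 3.1, 3.7)
```

On real data the offset is usually accepted only when each component is small (for example below 0.5 in magnitude); that acceptance rule is an assumption and is not stated on the slide.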
SIFT Calculation
For each obtained keypoint, a descriptor or feature vector that considers the gradient values around the keypoint is computed. These descriptors are called SIFT (Scale-Invariant Feature Transform).
SIFT descriptors provide invariance to scale and orientation.
Ref: Lowe 2004
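To make the idea of "a feature vector that considers the gradient values around the keypoint" concrete, here is a heavily simplified sketch that builds a 128-component vector from a 16x16 grey-level patch: 4x4 cells with an 8-bin gradient-orientation histogram each. It omits the weighting, interpolation, and orientation normalization of the full method described in Lowe 2004, so it is not scale or rotation invariant by itself; the function name and patch size are assumptions.

```python
import numpy as np

def gradient_histogram_descriptor(patch, cells=4, bins=8):
    """Simplified SIFT-like descriptor of a square grey-level patch (e.g. 16x16)."""
    patch = np.asarray(patch, dtype=float)
    dy, dx = np.gradient(patch)                  # gradients along rows, then columns
    magnitude = np.hypot(dx, dy)
    orientation = np.arctan2(dy, dx) % (2 * np.pi)
    bin_index = np.floor(orientation / (2 * np.pi) * bins).astype(int) % bins
    cell_size = patch.shape[0] // cells
    histograms = np.zeros((cells, cells, bins))
    for row in range(cells * cell_size):
        for col in range(cells * cell_size):
            # Each pixel votes for its orientation bin, weighted by gradient magnitude
            histograms[row // cell_size, col // cell_size,
                       bin_index[row, col]] += magnitude[row, col]
    vector = histograms.ravel()                  # 4 * 4 * 8 = 128 components
    norm = np.linalg.norm(vector)
    return vector / norm if norm > 0 else vector

# A horizontal intensity ramp: every gradient points in the same direction
patch = np.tile(np.arange(16.0), (16, 1))
descriptor = gradient_histogram_descriptor(patch)
print(descriptor.shape, np.count_nonzero(descriptor))
```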
SIFT Matching
The Euclidean distance between the SIFT descriptors (vectors) is employed.
Affine Transform Calculation (1/2)
Several stages are employed:
1. Object Pose Prediction
• In the pose space, a Hough transform is employed to obtain a coarse prediction of the object pose: each matched keypoint votes for all object poses that are consistent with it.
• A candidate object pose is obtained if at least 3 entries are found in a Hough bin.
2. Affine Transformation Calculation
• A least-squares procedure is employed to find an affine transformation that correctly accounts for each obtained pose.
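$$\begin{pmatrix} u \\ v \end{pmatrix} = \begin{pmatrix} m_1 & m_2 \\ m_3 & m_4 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} t_x \\ t_y \end{pmatrix}$$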
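The least-squares step can be sketched as follows: every matched pair contributes two linear equations in the six unknowns m1, m2, m3, m4, tx, ty, and the stacked system is solved with `numpy.linalg.lstsq`. The sketch assumes that (x, y) are reference-image coordinates and (u, v) the matched input-image coordinates, which the slide does not state explicitly; `fit_affine` is an illustrative name.

```python
import numpy as np

def fit_affine(model_xy, image_uv):
    """Least-squares affine fit: returns (M, t) with uv ≈ M @ xy + t."""
    model_xy = np.asarray(model_xy, dtype=float)
    image_uv = np.asarray(image_uv, dtype=float)
    n = len(model_xy)
    A = np.zeros((2 * n, 6))
    A[0::2, 0:2] = model_xy      # u rows: m1*x + m2*y + tx
    A[0::2, 4] = 1.0
    A[1::2, 2:4] = model_xy      # v rows: m3*x + m4*y + ty
    A[1::2, 5] = 1.0
    b = image_uv.reshape(-1)     # u1, v1, u2, v2, ...
    params, *_ = np.linalg.lstsq(A, b, rcond=None)
    return params[:4].reshape(2, 2), params[4:]

# Synthetic check: recover a known transformation from four matches
M_true = np.array([[0.9, -0.2], [0.3, 1.1]])
t_true = np.array([5.0, -3.0])
xy = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [2.0, 3.0]])
uv = xy @ M_true.T + t_true
M, t = fit_affine(xy, uv)
print(M, t)
```

At least three non-collinear matches are needed to determine the six parameters, which is consistent with the minimum of 3 entries per Hough bin.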
Affine Transform Calculation (2/2)
3. Affine Transformation Verification Stages:
• Verification using a probabilistic model (Bayes classifier).
• Verification based on Geometrical Distortion
• Verification based on Spatial Correlation
• Verification based on Graphical Correlation
• Verification based on the Object Rotation
4. Transformations Merging based on Geometrical Overlapping
In the original slides, the verification stages proposed by us for improving the detection of robot heads are shown in blue.
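Returning to stage 1 (Object Pose Prediction), the Hough voting can be sketched as below. Each match proposes a rotation, a scale ratio, and a translation, and is dropped into a coarse bin; bins with at least 3 entries become candidate poses. The keypoint layout `(x, y, scale, angle)`, the bin sizes, and the single vote per match are assumptions made for illustration; the slides do not give these details.

```python
import math
from collections import defaultdict

def hough_pose_candidates(matches, loc_bin=32.0, rot_bin_deg=30.0, min_votes=3):
    """Group matches by coarse pose.

    matches: list of (model_kp, image_kp); each keypoint is a hypothetical
    tuple (x, y, scale, angle_in_radians).
    Returns {bin_key: [match indices]} for bins with at least min_votes entries.
    """
    bins = defaultdict(list)
    for index, (model_kp, image_kp) in enumerate(matches):
        mx, my, m_scale, m_angle = model_kp
        ix, iy, i_scale, i_angle = image_kp
        rotation = (i_angle - m_angle) % (2 * math.pi)
        scale = i_scale / m_scale
        # Where the model origin would land in the image under this pose
        cos_r, sin_r = math.cos(rotation), math.sin(rotation)
        tx = ix - scale * (cos_r * mx - sin_r * my)
        ty = iy - scale * (sin_r * mx + cos_r * my)
        key = (
            math.floor(tx / loc_bin),
            math.floor(ty / loc_bin),
            round(math.log2(scale)),                        # factor-of-2 scale bins
            math.floor(math.degrees(rotation) / rot_bin_deg),
        )
        bins[key].append(index)
    return {key: votes for key, votes in bins.items() if len(votes) >= min_votes}
```

A single vote per match is sensitive to poses that fall near a bin boundary; voting for neighbouring bins as well, as the bullet "voting for all object poses that are consistent with the keypoint" suggests, reduces that effect at the cost of more candidates.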
AIBO Head Pose Detection Example
[Figure: input image and reference images.]
Matching & Storage of Local Descriptors
• Each reference image gives a set of keypoints.
• Each keypoint has a graphical descriptor, which is a 128-component vector.
• All the (keypoint, vector) pairs corresponding to a set of reference images are stored in a set T.
[Figure: a reference image yields the keypoint records (1), (2), (3), (4), …, each holding x, y, n and the descriptor components v1, v2, …, v128; together these records form T.]
Matching & Storage of Local Descriptors
More compact notation
[Figure: the reference image yields the pairs (p1, d1), (p2, d2), (p3, d3), (p4, d4), …, which form T.]
Matching & Storage of Local Descriptors
• In the matching-generation stage, an input image gives another set of keypoints and vectors.
• For each input descriptor, the first and second nearest descriptors in T must be found.
• Then, a pair of nearest descriptors (d, dFIRST) gives a pair of matched keypoints (p, pFIRST).
[Figure: each pair (p, d) of the input image is searched for in T = {(p1, d1), (p2, d2), …}; the search returns the nearest pair (pFIRST, dFIRST) and the second nearest pair (pSEC, dSEC).]
Matching & Storage of Local Descriptors
• The match is accepted if the ratio between the distance to the first nearest descriptor and the distance to the second nearest descriptor is lower than a given threshold.
• This indicates that there is no possible confusion in the search results.
Accepted if:
distance(d, dFIRST) < threshold * distance(d, dSEC)
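The acceptance rule can be sketched with an exhaustive search over T, which is the baseline that the kd-tree storage is meant to speed up. The default ratio of 0.8 is the value suggested in Lowe 2004, not a value given on these slides, and `match_descriptors` is an illustrative name.

```python
import numpy as np

def match_descriptors(query, reference, ratio=0.8):
    """Return (query_index, reference_index) pairs that pass the distance-ratio test.

    `reference` plays the role of the descriptors stored in T and must hold
    at least two descriptors.
    """
    query = np.asarray(query, dtype=float)
    reference = np.asarray(reference, dtype=float)
    matches = []
    for qi, d in enumerate(query):
        # Euclidean distance from d to every descriptor in T
        distances = np.linalg.norm(reference - d, axis=1)
        first, second = np.argsort(distances)[:2]
        if distances[first] < ratio * distances[second]:
            matches.append((qi, int(first)))
    return matches

# Toy 2-component "descriptors" instead of 128 components
reference = [[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]]
query = [[0.5, 0.0],    # clearly nearest to reference 0: accepted
         [5.0, 0.0]]    # equally far from references 0 and 1: rejected
print(match_descriptors(query, reference))
```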
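Storage: Kd-trees
• One way to store the set T in an ordered way is to use a kd-tree.
• In this case, we will use a 128d-tree.
• As is well known, in a kd-tree the elements are stored in the leaves. The other nodes are divisions of the space along some dimension.
[Figure: a 2-d example tree. The root is the division node 1>2: all the vectors with more than 2 in the first dimension are stored on the right side. Its left child is the division node 2>3, with the storage nodes (1,3) and (2,7); its right child is the division node 2>5, with the storage nodes (6,5) and (8,9).]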
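Storage: Kd-trees
• Generation of balanced kd-trees:
• We have a set of N vectors a = (a_1, a_2, …), b = (b_1, b_2, …), c = (c_1, c_2, …), d = (d_1, d_2, …), …
• We calculate the means and variances for each dimension i:
$$M_i = \frac{1}{N}\left(a_i + b_i + c_i + \dots\right), \qquad V_i = \frac{1}{N}\left[(a_i - M_i)^2 + (b_i - M_i)^2 + \dots\right]$$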
Storage: Kd-trees
Tree construction:
• Select the dimension iMAX with the largest variance.
• Order the vectors with respect to the iMAX dimension.
• Select the median M in this dimension.
• Get a division node.
• Repeat the process recursively.
[Figure: division node iMAX>M; the vectors with iMAX component less than M go to one side and the vectors with iMAX component greater than M go to the other.]
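The construction steps can be sketched as a short recursive function. The class names are illustrative, the median is taken as the largest value kept on the left side, and ties at the median are not treated specially; these are assumptions of the sketch. The demo uses the four 2-d vectors (1,3), (2,7), (6,5), (8,9) as read from the storage figure.

```python
import numpy as np

class StorageNode:
    """Leaf: holds one stored vector."""
    def __init__(self, vector):
        self.vector = vector

class DivisionNode:
    """Inner node: vectors with component `dim` greater than `threshold` are on the right."""
    def __init__(self, dim, threshold, left, right):
        self.dim, self.threshold = dim, threshold
        self.left, self.right = left, right

def build_kdtree(vectors):
    """Build a balanced kd-tree from a non-empty sequence of equal-length vectors."""
    vectors = np.asarray(vectors, dtype=float)
    if len(vectors) == 1:
        return StorageNode(vectors[0])
    # iMAX: dimension with the largest variance V_i (population variance, divide by N)
    dim = int(np.argmax(vectors.var(axis=0)))
    ordered = vectors[np.argsort(vectors[:, dim], kind="stable")]
    half = len(ordered) // 2
    threshold = ordered[half - 1, dim]   # median M: largest value kept on the left
    return DivisionNode(dim, threshold,
                        build_kdtree(ordered[:half]),
                        build_kdtree(ordered[half:]))

tree = build_kdtree([(1, 3), (2, 7), (6, 5), (8, 9)])
# Dimensions are 0-based in the code and 1-based on the slides
print(f"root: {tree.dim + 1}>{tree.threshold:g}")
print(f"children: {tree.left.dim + 1}>{tree.left.threshold:g}, "
      f"{tree.right.dim + 1}>{tree.right.threshold:g}")
```

For these four vectors the sketch produces the same division nodes as the storage figure: 1>2 at the root, then 2>3 and 2>5.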
Search Process
Search for the nearest neighbors, two alternatives:
• Compare almost all the descriptors in T with the given descriptor and return the nearest one, or
• Compare at most Q nodes and return the nearest of them (to compare means to calculate the Euclidean distance)
  • Requires a good search strategy
  • It can fail
  • The failure probability is controllable by Q
We choose the second option and use the BBF (Best Bin First) algorithm.
Search Process: BBF Algorithm
• Set:
  • v: query vector
  • Q: priority queue ordered by distance to v (initially empty)
  • r: initially the root of T
  • vFIRST: initially undefined, with an infinite distance to v
  • ncomp: number of comparisons, initially zero
• While (!finish):
  • Search for v in T starting from r => arrive at a leaf c
  • Add all the directions not taken during the search to Q in an ordered way (each division node in the path gives one not-taken direction)
  • If c is nearer to v than vFIRST, then vFIRST = c
  • Make r = the first node in Q (the nearest to v), ncomp++
  • If distance(r,v) > distance(vFIRST,v), finish = 1
  • If ncomp > ncompMAX, finish = 1
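A minimal sketch of this loop, using Python's `heapq` as the priority queue, is given below. The tree is written as nested tuples built by two hypothetical helpers, `leaf` and `div`; the priority of a not-taken branch is the distance from the query to the splitting value, which is how the distances in the Search Example appear to be computed. The demo tree and query are the ones of the Search Example slides, as far as they can be read from the transcript.

```python
import heapq
import itertools
import numpy as np

def leaf(*components):
    return ("leaf", np.array(components, dtype=float))

def div(dim, threshold, left, right):
    """Division node: component `dim` (0-based) greater than `threshold` goes right."""
    return ("div", dim, threshold, left, right)

def bbf_nearest(tree, v, ncomp_max=200):
    """Best Bin First search: returns (nearest vector found, its distance, comparisons)."""
    v = np.asarray(v, dtype=float)
    counter = itertools.count()               # tie-breaker so the heap never compares nodes
    queue = [(0.0, next(counter), tree)]      # (distance bound, tie-breaker, subtree)
    best, best_dist, ncomp = None, np.inf, 0
    while queue and ncomp < ncomp_max:
        bound, _, node = heapq.heappop(queue)
        if bound >= best_dist:                # nothing left in the queue can be nearer
            break
        while node[0] == "div":               # descend to a leaf
            _, dim, threshold, left, right = node
            taken, other = (right, left) if v[dim] > threshold else (left, right)
            # Queue the not-taken direction with its distance to the query
            heapq.heappush(queue, (abs(v[dim] - threshold), next(counter), other))
            node = taken
        dist = float(np.linalg.norm(node[1] - v))
        ncomp += 1
        if dist < best_dist:
            best, best_dist = node[1], dist
    return best, best_dist, ncomp

# Tree and query of the Search Example (dimension 1 on the slides is index 0 here)
tree = div(0, 2,
           div(1, 3, leaf(1, 3), leaf(2, 7)),
           div(1, 7, leaf(20, 7), div(0, 6, leaf(5, 1500), leaf(9, 1000))))
best, dist, ncomp = bbf_nearest(tree, (20, 8))
print(best, dist, ncomp)
```

On this tree the sketch first reaches the leaf (9,1000), then restarts from the best queue entry and reaches (20,7), stopping after 2 comparisons. With `ncomp_max=1` it returns (9,1000) instead, which illustrates how the comparison limit trades search time against the probability of failure.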
Search Example
[Figure sequence: a BBF search on a small 2-d tree, shown step by step over several slides. A division node i>M sends vectors with component i greater than M to the right. The recoverable content is:]
• Tree: the root is 1>2. Its left child is 2>3, with leaves (1,3) and (2,7). Its right child is 2>7, with the leaf (20,7) on the left and the node 1>6 on the right, whose leaves are (5,1500) and (9,1000).
• Requested vector: (20,8). The queue and CMIN are initially empty; comparisons: 0.
• At 1>2: 20>2, go right. The not-taken option is added to the queue with distance 18 (the distance between 2 and 20).
• At 2>7: 8>7, go right. The not-taken option is added to the queue with distance 1.
• At 1>6: 20>6, go right. The not-taken option is added to the queue with distance 14.
• We arrive at a leaf, (9,1000), at distance 992. Store the nearest leaf in CMIN. comparisons: 1.
• The distance from the best entry in the queue (1) is less than the distance from CMIN (992): start a new search from the best entry in the queue and delete it from the queue.
• Go down from there. We arrive at a leaf, (20,7), at distance 1. Store the nearest leaf in CMIN. comparisons: 2.
• The distance from the best entry in the queue is NOT less than the distance from CMIN: finish. comparisons: 2.
Conclusions
• BBF + kd-trees: a good trade-off between short search time and high success probability.
• But perhaps BBF + kd-trees is not the optimal solution.
• Finding a better methodology is very important for massive applications (for example, Web image retrieval).