Department of Electrical & Computer Engineering
Object Recognition from Local
Scale-Invariant Features
David G. Lowe
Presented by: Yan Fang
What are Local Features?
• An image pattern that differs from its immediate neighborhood
• Associated with a change in an image property
• Used to build image descriptors
Why Local Features?
• Specific semantic interpretation in the limited context of a
certain application.
- Edge -> road
- Blob -> impurities
• Limited set of well-localized and individually identifiable anchor
points.
- Motion tracking
- 3D reconstruction
- Image alignment or mosaicing
• A robust image representation, with no need for segmentation.
- Scene recognition
- Object recognition
- The features themselves need not carry any meaning
Some Terms
• Detector - A tool that extracts features from images
• Descriptor - A representation of an extracted feature
• Invariant - A function is invariant under a certain family of
transformations if its value does not change when a
transformation from this family is applied to its argument.
• Local feature - Ideally a location in space with no spatial extent; in practice:
- Interest points
- Regions
- Edge segments
Ideal Local Features
• Repeatability: the same features are found in images of the same object or
scene taken under different conditions.
- Invariance
- Robustness
• Distinctiveness/informativeness: the intensity patterns underlying
the detected features should show a lot of variation.
• Locality: features should be local, to reduce the probability of occlusion and to allow simple
model approximations of the geometric and photometric
deformations.
• Quantity: a sufficiently large number of features, even on small objects.
• Accuracy: the detected features should be accurately localized,
both in image location and with respect to scale and possibly shape.
• Efficiency: detection should be fast and computationally cheap.
Discussion on Local Features
• Repeatability: depends on invariance, robustness, and quantity.
• Distinctiveness vs. locality
- More local features carry less information and are harder to match
- In some cases (e.g. mosaicing), locality can be sacrificed
• Distinctiveness vs. invariance
- Depends on the degrees of freedom of the transformation: more invariance leaves less distinguishing information
• Distinctiveness vs. robustness
- Robustness comes at the cost of information loss
- Denoising vs. detail
Comparison with Other Features
• Global features
- Describe the image content as a whole, e.g. with a color histogram
- Usage: segmentation, object recognition
- Fail to distinguish foreground from background
- Image clutter and occlusion are problems
• Image segments
- Segmentation is difficult by itself and requires much information from the image
- Search for blobs based on texture/color
• Sampled features
- Exhaustively sample subparts of the image with a sliding window
- Solves the background problem, but not partial occlusion
- Fixed-grid sampling: invariance is difficult
- Random sampling: better localization but poor repeatability, so not
used alone
- Sampling from edges: works well with wiry objects
Corner Detector – Harris Detector
• Distinguishes “flat”, “edge”, and “corner” regions
• Auto-correlation matrix: describes the gradient distribution in a local
neighborhood
- Smoothed with a Gaussian kernel
- The two eigenvalues indicate the image signal change in two directions
- Large eigenvalues in both directions indicate a potential corner
• Measure the cornerness
Corner Detector – Harris Detector
For interest point detection, extract the local maxima of the cornerness
function with non-maximum suppression.
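The cornerness formula itself did not survive on the slide, so the sketch below assumes the commonly used Harris measure R = det(M) - k * trace(M)^2 with k = 0.04. To stay short it sums gradients over a box window instead of a Gaussian kernel; the function names and the test image are illustrative only.

```python
def harris_response(img, k=0.04, radius=1):
    """Cornerness R = det(M) - k * trace(M)^2 at each interior pixel.

    img is a list of rows of floats. M is the auto-correlation matrix of the
    gradients, summed over a (2*radius+1)^2 box window (an assumption: a box
    window stands in for the Gaussian kernel to keep the sketch short).
    """
    h, w = len(img), len(img[0])
    # Central-difference gradients; border pixels are left at zero.
    ix = [[0.0] * w for _ in range(h)]
    iy = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix[y][x] = (img[y][x + 1] - img[y][x - 1]) / 2.0
            iy[y][x] = (img[y + 1][x] - img[y - 1][x]) / 2.0
    resp = [[0.0] * w for _ in range(h)]
    for y in range(radius, h - radius):
        for x in range(radius, w - radius):
            sxx = sxy = syy = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    gx, gy = ix[y + dy][x + dx], iy[y + dy][x + dx]
                    sxx += gx * gx
                    sxy += gx * gy
                    syy += gy * gy
            det = sxx * syy - sxy * sxy
            trace = sxx + syy
            resp[y][x] = det - k * trace * trace
    return resp


def local_maxima(resp, threshold):
    """Non-maximum suppression: keep pixels that beat all 8 neighbors."""
    h, w = len(resp), len(resp[0])
    points = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            v = resp[y][x]
            if v <= threshold:
                continue
            neighbors = [resp[y + dy][x + dx]
                         for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                         if (dy, dx) != (0, 0)]
            if all(v > n for n in neighbors):
                points.append((x, y))
    return points


# Synthetic test image: a bright 4x4 square on a dark 10x10 background.
image = [[1.0 if 3 <= y <= 6 and 3 <= x <= 6 else 0.0 for x in range(10)]
         for y in range(10)]
response = harris_response(image)
corners = local_maxima(response, threshold=0.01)
```

On this test image the surviving points are the four corners of the square, returned as (x, y) pairs. The threshold is a tuning parameter chosen for this example.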
Example of Harris Detector
Results on rotated example images
Note that T-junctions are also detected, in addition to true corners.
Select Feature Detector
• Select feature detectors based on image content
and category
• Do not use more invariance than needed. Note the
tradeoff between invariance and distinguishing power.
• Consider other properties depending on the application
scenario
- Localization accuracy for camera calibration or 3D modeling
- Efficiency for large datasets
Introduction to SIFT
• Problem: object recognition in cluttered real-world scenes
• Challenge: finding image features that are robust to object
variation
• Proposed method: Scale Invariant Feature Transform (SIFT)
Invariance
• Illumination
• Scale
• Rotation
• Affine
Previous Work
• Candidate feature types
– line segments
– groupings of edges
– regions
• Zhang et al.
– Harris corner detection
– Detects peaks in local image variation
• Schmid and Mohr
– Harris corner detection for interest points
– Orientation-invariant vector of derivative-of-Gaussian image measurements
Motivation & Improvement
Limitations of related work:
• Examines the image at only a single scale
• Difficult to extend to other circumstances
• Focuses on feature detection and overlooks the descriptor
This work:
• Identifies key locations in scale space
• Selects feature vectors invariant to scaling,
stretching, rotation, and other variations
• Improves the feature descriptor
• Efficient: recognition takes less than 2 seconds, even with clutter and
occlusion
Stages of SIFT Object Recognition
• Feature Detection
• Local Image Description
• Indexing and Matching
• Model Verification
Scale Space
The proper scale of objects in a new image is unknown.
Exploring features at different scales helps to recognize different objects.
Difference of Gaussian (DoG)
• A = Convolve image with vertical and horizontal 1D
Gaussians, $\sigma = \sqrt{2}$
• B = Convolve A with vertical and horizontal 1D
Gaussians, $\sigma = \sqrt{2}$
• DoG (Difference of Gaussian) = A - B
• Downsample B with bilinear interpolation with pixel
spacing of 1.5 (linear combination of 4 adjacent
pixels)
$D(x, y, \sigma) = \left(G(x, y, k\sigma) - G(x, y, \sigma)\right) * I(x, y), \quad k = \sqrt{2}$
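As a rough illustration of the A, B, and DoG steps, the sketch below blurs an image with separable 1D Gaussians and subtracts the two results. The kernel truncation at about 3 sigma, the border clamping, and the default sigma of sqrt(2) (the value used in Lowe's paper) are assumptions made here; the downsampling step is omitted.

```python
import math


def gaussian_kernel(sigma):
    """Normalized 1D Gaussian kernel truncated at about 3 sigma."""
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def convolve_rows(img, kernel):
    """Convolve every row with a 1D kernel, clamping at the borders."""
    radius = len(kernel) // 2
    w = len(img[0])
    out = []
    for row in img:
        out.append([
            sum(kernel[k + radius] * row[min(max(x + k, 0), w - 1)]
                for k in range(-radius, radius + 1))
            for x in range(w)
        ])
    return out


def transpose(img):
    return [list(col) for col in zip(*img)]


def gaussian_blur(img, sigma):
    """Separable blur: a horizontal 1D pass, then a vertical 1D pass."""
    kernel = gaussian_kernel(sigma)
    horizontal = convolve_rows(img, kernel)
    return transpose(convolve_rows(transpose(horizontal), kernel))


def dog_level(img, sigma=math.sqrt(2)):
    """Return (A, B, DoG) for one pyramid level, with DoG = A - B."""
    a = gaussian_blur(img, sigma)
    b = gaussian_blur(a, sigma)
    dog = [[pa - pb for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]
    return a, b, dog


# A single bright pixel on a dark background.
image = [[0.0] * 15 for _ in range(15)]
image[7][7] = 1.0
A, B, DoG = dog_level(image)
```

For the single bright pixel, the DoG response is positive at the center and negative in a surrounding ring, which is the band-pass behavior the detector relies on.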
Image Pyramid of DoG
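[Figure: DoG pyramid. At each level n, the image is smoothed with a Gaussian (G) to give A_n and smoothed again to give B_n; the DoG image for that level is A_n - B_n (levels 1 to 3 shown). B_n is downsampled to form the input to the next level.]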
Pyramid of DoG (Octave)
[Figure: pyramid of Gaussian-smoothed images grouped into octaves, with scale labels σ, kσ, 2σ, 2kσ, 2k²σ. Source: David G. Lowe, IJCV 2004]
DoG Example
[Figure: example images A1, B1, DoG1; A2, B2, DoG2; A3, B3, DoG3. Credit: Ashley L. Kapron]
Feature Detection
• Find the maxima and minima of scale space
• For each point on a DoG level:
– Compare it to its 26 neighbors: 8 on the same level and 9 on each adjacent level
• Repeat for each DoG level
• The points that remain are the key points
David G. Lowe, IJCV 2004
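The 26-neighbor comparison can be sketched as follows. This assumes the DoG levels are stored as a stack of equally sized 2D lists (so it applies within one octave) and treats only strict maxima or minima as extrema; the function names are illustrative.

```python
def is_extremum(dog, s, y, x):
    """True if dog[s][y][x] is strictly above or strictly below all 26 neighbors."""
    v = dog[s][y][x]
    neighbors = [dog[s + ds][y + dy][x + dx]
                 for ds in (-1, 0, 1)
                 for dy in (-1, 0, 1)
                 for dx in (-1, 0, 1)
                 if (ds, dy, dx) != (0, 0, 0)]
    return all(v > n for n in neighbors) or all(v < n for n in neighbors)


def find_extrema(dog):
    """Scan the interior points of a stack of same-sized DoG levels."""
    keys = []
    for s in range(1, len(dog) - 1):
        for y in range(1, len(dog[s]) - 1):
            for x in range(1, len(dog[s][y]) - 1):
                if is_extremum(dog, s, y, x):
                    keys.append((s, y, x))
    return keys


# Three 5x5 levels of zeros with one peak and one pit in the middle level.
stack = [[[0.0] * 5 for _ in range(5)] for _ in range(3)]
stack[1][1][1] = 2.0
stack[1][3][3] = -2.0
keypoints = find_extrema(stack)
```

Here `keypoints` holds the (level, row, column) index of the peak and of the pit; flat points are rejected because they are not strictly larger or smaller than their neighbors.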
SIFT key stability - Illumination
• For all levels, compute
– Gradient magnitude
– $M_{ij} = \sqrt{(A_{ij} - A_{i+1,j})^2 + (A_{ij} - A_{i,j+1})^2}$
• Threshold the gradient magnitudes:
– Remove all key points with $M_{ij}$ less than 0.1 times the maximum gradient value
• Motivation: low-contrast points are generally less reliable as feature points than high-contrast ones
SIFT key stability - Orientation
• For all levels, compute
– Gradient orientation
– $R_{ij} = \operatorname{atan2}(A_{ij} - A_{i+1,j},\ A_{i,j+1} - A_{ij})$
[Figure: Gaussian-smoothed image with its gradient orientation and gradient magnitude maps. Credit: Ashley L. Kapron]
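A minimal sketch of the two per-pixel maps, using simple pixel differences on a smoothed image A. It assumes the same row difference A[i][j] - A[i+1][j] appears in both the magnitude and the orientation, and it drops the last row and column because they have no forward neighbor.

```python
import math


def gradient_maps(a):
    """Pixel-difference gradient magnitude M and orientation R of image a.

    M[i][j] = sqrt((a[i][j] - a[i+1][j])^2 + (a[i][j] - a[i][j+1])^2)
    R[i][j] = atan2(a[i][j] - a[i+1][j], a[i][j+1] - a[i][j])
    """
    rows, cols = len(a) - 1, len(a[0]) - 1
    mag = [[0.0] * cols for _ in range(rows)]
    ori = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            d_row = a[i][j] - a[i + 1][j]
            d_col = a[i][j + 1] - a[i][j]
            mag[i][j] = math.hypot(d_row, d_col)
            ori[i][j] = math.atan2(d_row, d_col)  # radians in (-pi, pi]
    return mag, ori


# Horizontal ramp: intensity grows by 1 per column and is constant per row.
ramp = [[float(j) for j in range(4)] for _ in range(4)]
M, R = gradient_maps(ramp)
```

On the ramp, every magnitude is 1 and every orientation is 0, since the intensity changes only along the columns.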
SIFT key stability - Orientation
• The gradient magnitude is weighted by a 2D Gaussian
[Figure: gradient magnitude * 2D Gaussian = weighted magnitude. Credit: Ashley L. Kapron]
SIFT key stability - Orientation
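• Build a histogram of gradient orientations, summing the weighted magnitudes in each bin
• Identify the peak
• Assign the peak orientation and its sum of magnitudes to the key point
[Figure: histogram of gradient orientation (x-axis) against sum of weighted magnitudes (y-axis), with the peak marked. Credit: Ashley L. Kapron]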
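The orientation assignment can be sketched as a magnitude-weighted histogram whose peak gives the key point's orientation. The bin count (36 bins of 10 degrees) and the Gaussian width are assumptions chosen for illustration, and the function takes precomputed magnitude and orientation patches centered on the key point.

```python
import math


def dominant_orientation(mag, ori, sigma=1.5, bins=36):
    """Peak of a magnitude-weighted orientation histogram.

    mag and ori are square patches (lists of rows) centered on the key point;
    ori is in radians. Each sample adds its magnitude, weighted by a 2D
    Gaussian centered on the patch, to the bin of its orientation.
    Returns (peak orientation in degrees at the bin center, histogram).
    """
    n = len(mag)
    center = (n - 1) / 2.0
    hist = [0.0] * bins
    bin_width = 360.0 / bins
    for i in range(n):
        for j in range(n):
            dist2 = (i - center) ** 2 + (j - center) ** 2
            weight = math.exp(-dist2 / (2.0 * sigma * sigma))
            angle = math.degrees(ori[i][j]) % 360.0
            hist[int(angle // bin_width) % bins] += weight * mag[i][j]
    peak = max(range(bins), key=lambda b: hist[b])
    return (peak + 0.5) * bin_width, hist


# Patch whose gradients all point at 95 degrees.
patch_mag = [[1.0] * 5 for _ in range(5)]
patch_ori = [[math.radians(95.0)] * 5 for _ in range(5)]
angle, histogram = dominant_orientation(patch_mag, patch_ori)
```

The returned angle is the center of the winning bin, so its resolution is limited by the bin width; the histogram value at the peak is the sum of weighted magnitudes that the slide assigns to the key point.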
Example of Key Points
[Figure: key points after each step: maxima/minima from the DoG pyramid; after filtering for illumination; after filtering for edge orientation. Credit: Ashley L. Kapron]
Stability Test
78% of the keys survive rotation, scaling, stretching, changes in brightness and
contrast, and the addition of pixel noise.
Local Image Description
• Each SIFT key is assigned:
– Location
– Scale (corresponding to the level at which it was detected)
– Orientation (assigned in the previous canonical
orientation steps)
• Now: describe the local image region in a way that is invariant to
the above transformations
SIFT Key Example
Local Image Description
For each key point:
• Identify the 8x8 neighborhood (on the DoG level at which it was detected)
• Align the orientation to the x-axis (subtract the key point's orientation from the gradient orientations)
Local Image Description
• Calculate the gradient magnitude and orientation maps and weight the magnitudes by a
Gaussian
• Within each 4x4 region, build a histogram with 8 gradient-orientation bins by summing the weighted gradient magnitudes of the samples whose direction falls nearest each bin
• This array of histograms is the image descriptor.
Ashley L. Kapron
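The descriptor steps above can be sketched as follows for an 8x8 patch split into 4x4 regions with 8 orientation bins, giving 2 x 2 x 8 = 32 values. This is a simplified sketch: it only rotates the gradient orientations by the key point's orientation (the sample grid itself is not rotated), each sample goes into a single bin without interpolation, and the Gaussian width is an assumed value.

```python
import math


def describe_patch(mag, ori, key_orientation, sigma=4.0, cell=4, bins=8):
    """Build a descriptor from a square patch of gradient samples.

    mag, ori: 8x8 lists of gradient magnitude and orientation (radians).
    key_orientation: the key point's orientation (radians), subtracted from
    every sample so the descriptor is expressed relative to it.
    Returns a flat list of (patch size / cell)^2 * bins histogram values.
    """
    n = len(mag)
    cells = n // cell
    center = (n - 1) / 2.0
    hist = [[[0.0] * bins for _ in range(cells)] for _ in range(cells)]
    for i in range(n):
        for j in range(n):
            # Gaussian weight centered on the patch.
            weight = math.exp(-((i - center) ** 2 + (j - center) ** 2)
                              / (2.0 * sigma * sigma))
            rel = (ori[i][j] - key_orientation) % (2.0 * math.pi)
            b = int(rel / (2.0 * math.pi) * bins) % bins
            hist[i // cell][j // cell][b] += weight * mag[i][j]
    return [v for row in hist for cell_hist in row for v in cell_hist]


# Hypothetical patch: unit magnitudes, all gradients 0.1 rad past the key orientation.
patch_mag = [[1.0] * 8 for _ in range(8)]
patch_ori = [[1.0] * 8 for _ in range(8)]
descriptor = describe_patch(patch_mag, patch_ori, key_orientation=0.9)
```

Because every sample in the example points in the same relative direction, only the first bin of each of the four region histograms is non-zero.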
Orientations Numbers
David G. Lowe, IJCV 2004
Image Matching
[Figure: panels labeled Database and Input Image]
Image Matching
• Find all key points in the target image
– Each key point has a 2D location, scale, and orientation, as
well as an invariant descriptor vector
• For each key point, search for similar descriptor vectors in the
reference image database.
– A descriptor vector may match more than one reference pose in the
database
– The key point “votes” for the matching pose(s)
• Use the best-bin-first algorithm for the search
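To make the matching step concrete, the sketch below finds the closest database descriptor by exhaustive search over squared Euclidean distance. This is a stand-in, not best-bin-first: best-bin-first is an approximate nearest-neighbor search over a k-d tree, which returns the same kind of answer much faster on large databases. The toy descriptors are hypothetical.

```python
def nearest_descriptor(query, database):
    """Index and squared distance of the database descriptor closest to query."""
    best_index, best_dist = -1, float("inf")
    for index, candidate in enumerate(database):
        dist = sum((q - c) ** 2 for q, c in zip(query, candidate))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index, best_dist


def match_keys(target_descriptors, database):
    """Match every target descriptor to its nearest database entry."""
    return [nearest_descriptor(d, database)[0] for d in target_descriptors]


# Hypothetical 3-element descriptors; real SIFT descriptors are much longer.
database = [[0.0, 0.0, 1.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
targets = [[0.9, 0.1, 0.0], [0.1, 0.0, 0.8]]
matches = match_keys(targets, database)
```

Each entry of `matches` is the database index that the corresponding target key point would vote for.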
Hough Transform Clustering
• Create a 4D Hough Transform (HT) space for each
reference pose
1. Orientation bin = 30°
2. Scale bin = factor of 2
3. X location bin = 0.25 * reference image width
4. Y location bin = 0.25 * reference image height
• If a key point “votes” for a reference pose, count the vote in the bin
given by its estimate of location and pose
• Keep a list of which key points vote for each bin
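A small sketch of the voting step, using the bin sizes listed above. It assumes each matched key point has already been converted into a pose prediction (orientation in degrees, scale ratio, x, y), and it casts a single vote per prediction; voting into neighboring bins to soften boundary effects is left out. All names and values are illustrative.

```python
import math
from collections import defaultdict


def hough_bin(orientation_deg, scale, x, y, ref_width, ref_height):
    """Quantize a pose prediction into a 4D bin index.

    Bin sizes: 30 degrees, a factor of 2 in scale, and 0.25 times the
    reference image width and height.
    """
    return (
        int((orientation_deg % 360.0) // 30.0),
        int(math.floor(math.log2(scale))),
        int(math.floor(x / (0.25 * ref_width))),
        int(math.floor(y / (0.25 * ref_height))),
    )


def cluster_votes(predictions, ref_width, ref_height):
    """Map each bin to the list of key point indices that voted for it."""
    bins = defaultdict(list)
    for key_index, (orientation_deg, scale, x, y) in enumerate(predictions):
        bins[hough_bin(orientation_deg, scale, x, y,
                       ref_width, ref_height)].append(key_index)
    return bins


# Hypothetical pose predictions (orientation, scale, x, y), one per key point.
predictions = [
    (10.0, 1.1, 30.0, 40.0),
    (14.0, 1.3, 32.0, 44.0),
    (12.0, 1.0, 28.0, 41.0),
    (200.0, 3.0, 150.0, 90.0),
]
votes = cluster_votes(predictions, ref_width=200, ref_height=200)
best_bin = max(votes, key=lambda b: len(votes[b]))
```

The first three predictions agree and land in the same bin, while the fourth is isolated; keeping the key point indices per bin is what lets the verification stage fit a transformation to exactly the points that voted together.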
Verification
• Identify the bins with the most votes (each must have at least 3).
• Using the list of key points that voted for a bin, compute the
affine transformation parameters (M, T)
• Use the corresponding coordinates of the reference model (x, y)
and the target image (u, v).
• If there are more than three points, solve in the least-squares sense
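The least-squares fit can be sketched as below, assuming the usual affine model [u, v] = M [x, y] + T with a 2x2 matrix M and a translation T. Since u and v each depend linearly on (x, y, 1), the sketch solves two 3-parameter normal-equation systems that share the same matrix. The helper names and the sample correspondences are hypothetical.

```python
def solve3(a, b):
    """Solve the 3x3 system a p = b by Gaussian elimination with pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    p = [0.0] * 3
    for r in (2, 1, 0):
        p[r] = (m[r][3] - sum(m[r][c] * p[c] for c in range(r + 1, 3))) / m[r][r]
    return p


def fit_affine(model_pts, image_pts):
    """Least-squares M (2x2) and T (2-vector) with [u, v] = M [x, y] + T."""
    if len(model_pts) < 3:
        raise ValueError("need at least 3 correspondences")
    # Normal equations; the design matrix has rows [x, y, 1].
    ata = [[0.0] * 3 for _ in range(3)]
    atu = [0.0] * 3
    atv = [0.0] * 3
    for (x, y), (u, v) in zip(model_pts, image_pts):
        row = (x, y, 1.0)
        for i in range(3):
            atu[i] += row[i] * u
            atv[i] += row[i] * v
            for j in range(3):
                ata[i][j] += row[i] * row[j]
    m1, m2, tx = solve3(ata, atu)
    m3, m4, ty = solve3(ata, atv)
    return [[m1, m2], [m3, m4]], [tx, ty]


# Hypothetical correspondences generated by u = 2x + 3, v = 2y - 1.
model = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
image = [(3.0, -1.0), (5.0, -1.0), (3.0, 1.0), (5.0, 1.0)]
M, T = fit_affine(model, image)
```

With exactly three non-collinear points the system is solved exactly; with more, as here, the result is the least-squares fit, which recovers a scale of 2 and a translation of (3, -1) for this noise-free example.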
Remove Outliers
• After applying the affine transformation to the key points, determine the difference between the calculated location and the actual location in the target image
• Each candidate must agree within:
– Orientation: 15°
– Scale: a factor of √2
– X, Y location: 0.2 * model size
• Repeat the least-squares solution until no more points are removed
• If fewer than 3 points remain, the match is rejected
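The refit-and-prune loop can be sketched independently of the model being fitted. In the sketch below, `fit` and `error` are caller-supplied functions (hypothetical names); for SIFT verification they would be the affine least-squares fit and a check against the orientation, scale, and location tolerances. The demonstration uses a much simpler 1D offset model purely to show the control flow.

```python
def prune_outliers(matches, fit, error, tolerance, min_matches=3):
    """Refit and drop outliers until the set of matches is stable.

    fit(matches) returns model parameters; error(params, match) returns the
    disagreement of one match with the model. Returns (params, matches), or
    None if fewer than min_matches remain.
    """
    matches = list(matches)
    while True:
        if len(matches) < min_matches:
            return None
        params = fit(matches)
        kept = [m for m in matches if error(params, m) <= tolerance]
        if len(kept) == len(matches):
            return params, matches
        matches = kept


# Stand-in model for the demo: u = x + offset, fitted by the mean offset.
def fit_offset(matches):
    return sum(u - x for x, u in matches) / len(matches)


def offset_error(offset, match):
    x, u = match
    return abs(u - x - offset)


pairs = [(0.0, 5.0), (1.0, 6.0), (2.0, 7.1), (3.0, 8.0), (4.0, 30.0)]
result = prune_outliers(pairs, fit_offset, offset_error, tolerance=5.0)
```

In the demo the last pair is discarded on the first pass, the model is refitted to the remaining four, and the loop stops because no further points are removed.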
Object Recognition Example
Pros & Cons
• Numerous keys can be generated from scale space, even for small objects
• Partial occlusion and image clutter can be handled
• Object models can undergo limited affine projection
• Individual features can be matched to a large database of objects
• Robust recognition can be performed quickly
• Fully affine transformations require additional steps
• The method was not evaluated on a large data set covering varied cases
Future Work
• Deeper exploration in scale space with octave of
incremental Gaussian filtering
• Sub-pixel localization with 3D curve fitting
• Filter edge and low contrast points
• More?
Questions?