Lecture 10 - Interest Point Detectors & RANSAC
TRANSCRIPT
Interest Point Detectors & RANSAC
Instructor - Simon Lucey
16-423 - Designing Computer Vision Apps
Today
• Fundamental Matrix
• Interest Point Detectors
• Matching Regions
• RANSAC
Reminder: Essential Matrix
• Rank 2
• 5 degrees of freedom
• Non-linear constraints between elements
Adapted from: Computer vision: models, learning and inference. Simon J.D. Prince
Reminder: Epipolar Lines

Recovering Epipolar Lines
Equation of a line:
ax + by + c = 0
or l^T x = 0, where l = [a, b, c]^T and x = [x, y, 1]^T (homogeneous coordinates)
Recovering Epipolar Lines
Equation of a line: l^T x = 0
Now consider the epipolar constraint x2^T E x1 = 0.
This has the form l^T x2 = 0, where l = E x1.
So the epipolar lines are l2 = E x1 in image 2 and l1 = E^T x2 in image 1.
Decomposition of E
Essential matrix: E = [τ]x Ω (translation cross-product matrix times rotation)
To recover translation and rotation use the matrix:
W = [0 -1 0; 1 0 0; 0 0 1]
We take the SVD E = U L V^T and then we set
[τ]x = U L W U^T,  Ω = U W^{-1} V^T
Four Interpretations
To get the different solutions, we multiply τ by -1 and/or substitute W^T for W.
The Fundamental Matrix
Now consider two cameras that are not normalized (intrinsic matrices K1 and K2).
By a similar procedure to before, we get the relation
x2^T F x1 = 0
where F = K2^{-T} E K1^{-1}
Relation between essential and fundamental: E = K2^T F K1
Estimation of the Fundamental Matrix
When the fundamental matrix is correct, the epipolar line induced by a point in the first image should pass through the matching point in the second image, and vice-versa.
This suggests the criterion: minimize the sum of squared distances from each point to the epipolar line induced by its match,
Σ_i [ dist(x2_i, F x1_i)² + dist(x1_i, F^T x2_i)² ]
If l = [a, b, c]^T and x = [x, y, 1]^T, then dist(x, l) = |ax + by + c| / sqrt(a² + b²).
Unfortunately, there is no closed form solution for this quantity.
The 8 Point Algorithm
Approach:
• solve for the fundamental matrix using homogeneous coordinates
• closed form solution (but to the wrong problem!)
• known as the 8 point algorithm
Start with the fundamental matrix relation
x2^T F x1 = 0
Writing out in full:
[x2 y2 1] [f11 f12 f13; f21 f22 f23; f31 f32 f33] [x1 y1 1]^T = 0
or
x2 x1 f11 + x2 y1 f12 + x2 f13 + y2 x1 f21 + y2 y1 f22 + y2 f23 + x1 f31 + y1 f32 + f33 = 0
The 8 Point Algorithm
Can be written as: a^T f = 0
where
a = [x2 x1, x2 y1, x2, y2 x1, y2 y1, y2, x1, y1, 1]^T and f = [f11, f12, f13, f21, f22, f23, f31, f32, f33]^T
Stacking together constraints from at least 8 pairs of points, we get the system of equations A f = 0.
Minimum direction problem of the form: find the minimum of ||A f||² subject to ||f|| = 1.
To solve, compute the SVD A = U L V^T and then set f to the last column of V.
Fitting Concerns
• This procedure does not ensure that the solution is rank 2. Solution: set the last singular value to zero.
• Can be unreliable because of numerical problems to do with the data scaling – better to re-scale the data first.
• Needs 8 points in general position (cannot all be planar).
• Fails if there is not sufficient translation between the views.
• Use this solution to start non-linear optimisation of the true criterion (must ensure the non-linear constraints are obeyed).
• There is also a 7 point algorithm.
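The full pipeline above (build A, solve the minimum-direction problem via SVD, enforce rank 2, and re-scale the data first) can be sketched in numpy. `eight_point` is an illustrative name, and the Hartley-style normalization constants are conventional choices, not something fixed by the slides; the convention is x2^T F x1 = 0 as above.

```python
import numpy as np

def eight_point(x1, x2):
    """Estimate the fundamental matrix F (x2^T F x1 = 0) from >= 8 matches.

    x1, x2 : (N, 2) arrays of corresponding pixel coordinates.
    """
    def normalize(x):
        # Re-scale the data first (zero mean, sqrt(2) average norm) for stability.
        mean = x.mean(axis=0)
        scale = np.sqrt(2.0) / np.mean(np.linalg.norm(x - mean, axis=1))
        T = np.array([[scale, 0.0, -scale * mean[0]],
                      [0.0, scale, -scale * mean[1]],
                      [0.0, 0.0, 1.0]])
        xh = np.column_stack([x, np.ones(len(x))]) @ T.T
        return xh, T

    p1, T1 = normalize(x1)
    p2, T2 = normalize(x2)
    # Each match gives one row a^T of the system A f = 0.
    A = np.column_stack([
        p2[:, 0] * p1[:, 0], p2[:, 0] * p1[:, 1], p2[:, 0],
        p2[:, 1] * p1[:, 0], p2[:, 1] * p1[:, 1], p2[:, 1],
        p1[:, 0], p1[:, 1], np.ones(len(p1))])
    # Minimum-direction problem: f is the last right singular vector of A.
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)
    # Enforce rank 2: set the last singular value to zero.
    U, S, Vt = np.linalg.svd(F)
    F = U @ np.diag([S[0], S[1], 0.0]) @ Vt
    # Undo the normalization.
    F = T2.T @ F @ T1
    return F / F[2, 2]
```

With noise-free synthetic correspondences from two cameras with sufficient translation, the recovered F satisfies the epipolar constraint to numerical precision.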
Today
• Fundamental Matrix
• Interest Point Detectors
• Matching Regions
• RANSAC
[Figure: scatter plots of I(x, y) against I(x + d, y + d) for offsets d = 1, 8, 16 and 50 – the correlation between pixel values falls off as the offset grows. Simoncelli & Olshausen 2001]
Pixel Coherence
• The pixel coherence assumption states that pixels within natural images are heavily correlated with one another within a local neighborhood (e.g., +/- 5 pixels).
• For example,
CS143 Intro to Computer Vision, Sept 2007 © Michael J. Black
Image Filtering
3 4 3
2 ? 5
5 4 2
What assumptions are you making to infer the center value?
“Typically assume that pixels within +/- 3, 5 or 7 pixels are highly correlated.”
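A minimal numerical version of the filtering example above: under the coherence assumption, one reasonable guess for the missing center is simply the average of its eight known neighbours (averaging is one possible choice, not the only one).

```python
import numpy as np

# The 3x3 patch from the slide, with the unknown center marked as NaN.
patch = np.array([[3.0, 4.0, 3.0],
                  [2.0, np.nan, 5.0],
                  [5.0, 4.0, 2.0]])

# Pixel coherence suggests the center resembles its neighborhood,
# so estimate it as the mean of the 8 known neighbors.
center = np.nanmean(patch)
print(center)  # 3.5
```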
Estimating Gradients
• Images are a discretely sampled representation of a continuous signal.
(Black)
Estimating Gradients
• What if I want to know I(x0 + Δx) given that I have only the appearance at x0, i.e. I(x0)?
• Again, simply take the Taylor series approximation:
I(x0 + Δx) ≈ I(x0) + ∇I(x0)^T Δx
(Black)
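The Taylor approximation above can be checked numerically on a sampled 1-D signal, using a central-difference gradient estimate (the sine test signal is purely illustrative).

```python
import numpy as np

# Densely sample a smooth 1-D "image" I(x).
x = np.linspace(0.0, 2.0 * np.pi, 1000)
I = np.sin(x)

# Central-difference estimate of dI/dx at every sample.
dI = np.gradient(I, x)

# First-order Taylor prediction: I(x0 + dx) ~= I(x0) + I'(x0) * dx.
i0 = 300
dx = x[1] - x[0]            # predict one sample ahead
pred = I[i0] + dI[i0] * dx
actual = I[i0 + 1]
err = abs(pred - actual)    # small for small dx
print(err)
```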
Interest Point Detectors
• Interest point detectors, in general, detect corners.
• A dense approach (using all pixels) will be far too slow.
• Matching lines suffers from the “aperture problem” (i.e. has the line moved along itself?).
• Corners are:
  • relatively sparse
  • well-localized
  • good for matching
  • reasonably cheap to compute
  • appear quite robustly
  • useful for geometry.
Interest Point Detectors
• The approach is based on the concept of auto-correlation over a local neighborhood N(x, y):
A(Δx, Δy) = Σ_{(xk, yk) ∈ N(x, y)} ||I(xk, yk) − I(xk + Δx, yk + Δy)||²
Interest Point Detectors
• The same auto-correlation in vector form, with local neighborhood N(x):
A(Δx) = Σ_{xk ∈ N(x)} ||I(xk) − I(xk + Δx)||²
Interest Point Detectors
• Which patches match the auto-correlation responses?
A(Δx) = Σ_{xk ∈ N(x)} ||I(xk) − I(xk + Δx)||²
[Figure: example 25×25 patches alongside their auto-correlation surfaces A(Δx)]
Harris Detector
• How can one characterize the three situations (flat region, edge, corner) from
A(Δx) = Σ_{xk ∈ N(x)} ||I(xk) − I(xk + Δx)||² ?
• Harris & Stephens (1988) proposed to simplify this through a linear approximation,
A(Δx) ≈ Σ_{i ∈ N} ||I(xi) − I(xi) − (∂I(xi)/∂x)^T Δx||² ≈ Δx^T H Δx
• where
H = Σ_{i ∈ N} (∂I(xi)/∂x)(∂I(xi)/∂x)^T
Harris Corner Detector
Make the decision based on the image structure tensor
H = Σ_{i ∈ N} (∂I(xi)/∂x)(∂I(xi)/∂x)^T
Harris Detector
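A compact numpy sketch of a detector built from the structure tensor H above, using the Harris & Stephens score R = det(H) − k·trace(H)². The window size, k = 0.04, and the wrap-around border handling are illustrative choices, not prescribed by the slides.

```python
import numpy as np

def harris_response(im, k=0.04, w=1):
    """Harris corner score R = det(H) - k * trace(H)^2 at every pixel,
    where H is the structure tensor summed over a (2w+1)x(2w+1) window."""
    Iy, Ix = np.gradient(im.astype(float))  # central-difference gradients

    def window_sum(a):
        # Sum over the local neighborhood N via shifted copies.
        # (np.roll wraps at the borders -- acceptable for a sketch.)
        out = np.zeros_like(a)
        for dy in range(-w, w + 1):
            for dx in range(-w, w + 1):
                out += np.roll(np.roll(a, dy, axis=0), dx, axis=1)
        return out

    Sxx = window_sum(Ix * Ix)
    Syy = window_sum(Iy * Iy)
    Sxy = window_sum(Ix * Iy)
    # R > 0: corner, R < 0: edge, R ~ 0: flat region.
    return Sxx * Syy - Sxy ** 2 - k * (Sxx + Syy) ** 2
```

On a synthetic bright square, the score is positive at the square's corners, negative along its edges, and near zero in flat regions.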
Effects of Scale
• Consider a single row or column of the image – plotting intensity as a function of position gives a signal.
• Where is the edge?
Source: S. Seitz
Search Multiple Scales
• Smooth the signal f by convolving it with a Gaussian g to give f * g.
• To find edges, look for peaks in the derivative d/dx (f * g).
Source: S. Seitz
SIFT Detector
• Filter with difference of Gaussian filters at increasing scales
• Build image stack (scale space)
• Find extrema in this 3D volume
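The three steps above can be sketched in plain numpy; the particular σ values and the contrast threshold are illustrative choices, not Lowe's exact parameters.

```python
import numpy as np

def gaussian_blur(im, sigma):
    """Separable Gaussian blur (1-D kernel along rows, then columns)."""
    r = int(3 * sigma)
    xs = np.arange(-r, r + 1)
    k = np.exp(-xs ** 2 / (2.0 * sigma ** 2))
    k /= k.sum()
    out = np.apply_along_axis(lambda v: np.convolve(v, k, mode="same"), 0, im)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="same"), 1, out)

def dog_extrema(im, sigmas=(1.0, 1.6, 2.56, 4.1), thresh=0.01):
    """Build a difference-of-Gaussian stack and find extrema in the 3D volume.

    Returns (scale_index, y, x) triples where the DoG value is an extremum
    over its 26 scale-space neighbors and exceeds the contrast threshold.
    """
    stack = np.stack([gaussian_blur(im, s) for s in sigmas])  # scale space
    dog = stack[1:] - stack[:-1]                              # DoG stack
    pts = []
    for s in range(1, dog.shape[0] - 1):
        for y in range(1, dog.shape[1] - 1):
            for x in range(1, dog.shape[2] - 1):
                v = dog[s, y, x]
                nb = dog[s - 1:s + 2, y - 1:y + 2, x - 1:x + 2]
                if abs(v) > thresh and (v >= nb.max() or v <= nb.min()):
                    pts.append((s, y, x))
    return pts
```

A Gaussian blob centered in a synthetic image produces a DoG extremum at (or very near) the blob center.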
Coarse to Fine

Excerpt from Mikolajczyk & Schmid, “Scale & Affine Invariant Interest Point Detectors”:

scale estimates the characteristic length of the corresponding image structures, in a similar manner as the notion of characteristic length is used in physics. The selected scale is characteristic in the quantitative sense, since it measures the scale at which there is maximum similarity between the feature detection operator and the local image structures. This scale estimate will (for a given image operator) obey perfect scale invariance under rescaling of the image pattern.

Given a point in an image and a scale selection operator we compute the operator responses for a set of scales σn (Fig. 1). The characteristic scale corresponds to the local extremum of the responses. Note that there might be several maxima or minima, that is several characteristic scales corresponding to different local structures centered on this point. The characteristic scale is relatively independent of the image resolution. It is related to the structure and not to the resolution at which the structure is represented. The ratio of the scales at which the extrema are found for corresponding points is the actual scale factor between the point neighborhoods. In Mikolajczyk and Schmid (2001) we compared several differential operators and we noticed that the scale-adapted Harris measure rarely attains maxima over scales in a scale-space representation. If too few interest points are detected, the image content is not reliably represented. Furthermore, the experiments showed that Laplacian-of-Gaussians finds the highest percentage of correct characteristic scales to be found.

|LoG(x, σn)| = σn² |Lxx(x, σn) + Lyy(x, σn)|   (3)

Figure 1. Example of characteristic scales. The top row shows two images taken with different focal lengths. The bottom row shows the response Fnorm(x, σn) over scales, where Fnorm is the normalized LoG (cf. Eq. (3)). The characteristic scales are 10.1 and 3.89 for the left and right image, respectively. The ratio of scales corresponds to the scale factor (2.5) between the two images. The radius of displayed regions in the top row is equal to 3 times the characteristic scale.

When the size of the LoG kernel matches with the size of a blob-like structure, the response attains an extremum. The LoG kernel can therefore be interpreted as a matching filter (Duda and Hart, 1973). The LoG is well adapted to blob detection due to its circular symmetry, but it also provides a good estimation of the characteristic scale for other local structures such as corners, edges, ridges and multi-junctions. Many previous results confirm the usefulness of the Laplacian function for scale selection (Chomat et al., 2000; Lindeberg, 1993, 1998; Lowe, 1999).
2.2. Harris-Laplace Detector
In the following we explain in detail our scale invariant feature detection algorithm. The Harris-Laplace detector uses the scale-adapted Harris function (Eq. (2)) to localize points in scale-space. It then selects the points for which the Laplacian-of-Gaussian, Eq. (3), attains a maximum over scale. We propose two algorithms. The first one is an iterative algorithm which detects simultaneously the location and the scale of characteristic regions. The second one is a simplified algorithm, which is less accurate but more efficient.
SIFT Detector
• Identified corners
• Remove those on edges
• Remove those where contrast is low
Today
• Fundamental Matrix
• Interest Point Detectors
• Matching Regions
• RANSAC
Matching Regions
• Once we have detected local regions, we must then match them.
• SSD (sum of squared differences) between the patch at (x1, y1) in image 1 and the patch at (x2, y2) in image 2:
SSD = Σ_{[Δx, Δy]^T ∈ N} ||I1(x1 + Δx, y1 + Δy) − I2(x2 + Δx, y2 + Δy)||²_2
• Small difference values → similar patches.
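A brute-force sketch of matching under the SSD criterion above. In practice one would restrict the search (e.g. to the epipolar line) or compare descriptors rather than scanning the whole image; the function name and window size are illustrative.

```python
import numpy as np

def best_match(im1, im2, x1, y1, w=3):
    """Find the (x2, y2) in im2 whose (2w+1)x(2w+1) patch minimizes the
    SSD against the patch centered at (x1, y1) in im1."""
    p1 = im1[y1 - w:y1 + w + 1, x1 - w:x1 + w + 1]
    best_ssd, best_xy = np.inf, None
    H, W = im2.shape
    for y2 in range(w, H - w):
        for x2 in range(w, W - w):
            p2 = im2[y2 - w:y2 + w + 1, x2 - w:x2 + w + 1]
            ssd = np.sum((p1 - p2) ** 2)  # sum of squared differences
            if ssd < best_ssd:
                best_ssd, best_xy = ssd, (x2, y2)
    return best_xy
```

On a second image that is just a shifted copy of the first, the minimum-SSD location recovers the shift exactly.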
Problems with SSD
• SSD measures are sensitive to both
  • geometric and
  • photometric variation.
• Common practice is to use descriptors.
[Figure: image pair I1, I2 illustrating these variations]
Planar Affine Patch Assumption
[Figure: the same planar patch seen in “View 1” and “View 2”]
• A number of techniques have been proposed to detect affine covariant regions (Schmid et al. 2004).
Rotation Invariance
• Estimation of the dominant orientation
  – extract gradient orientation
  – histogram over gradient orientation
  – peak in this histogram
• Rotate patch in dominant direction
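The orientation-histogram recipe above can be sketched as follows; 36 bins and magnitude weighting are common choices rather than requirements.

```python
import numpy as np

def dominant_orientation(patch, nbins=36):
    """Histogram gradient orientations (weighted by gradient magnitude)
    and return the center of the peak bin."""
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)        # gradient magnitude
    ang = np.arctan2(gy, gx)      # gradient orientation in [-pi, pi]
    hist, edges = np.histogram(ang, bins=nbins,
                               range=(-np.pi, np.pi), weights=mag)
    peak = hist.argmax()
    return 0.5 * (edges[peak] + edges[peak + 1])
```

A linear intensity ramp gives a dominant orientation along its gradient direction, up to the histogram's bin width.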
SIFT Descriptor
1. Compute image gradients
2. Pool into local histograms
3. Concatenate histograms
4. Normalize histograms
More on this in future lectures.
Why Pool?
• Convolution (∗) with an “average” filter: “histogram”, “pooling” and “blurring” are closely related operations.
MATLAB Example
• Example in MATLAB,
>> h = fspecial('gaussian', [25, 25], 3);
>> resp = imfilter(im, h);
im ∗ h → resp
Other Descriptors
• Since SIFT, a plethora of other descriptors have been proposed.
• SIFT is sometimes problematic to use in practice as it is protected under existing patents.
• In OpenCV alone there are:
  • SURF (patent protected)
  • BRIEF
  • FREAK
  • ORB
  • BRISK
  • FAST
• See the following link for a tutorial in OpenCV.
Today
• Fundamental Matrix
• Interest Point Detectors
• Matching Regions
• RANSAC
Robust Estimation
• The least squares criterion is not robust to outliers.
• For example, the two outliers here cause the fitted line to be quite wrong.
• One approach to fitting under these circumstances is to use RANSAC – “RANdom SAmple Consensus”.
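A minimal RANSAC sketch for the line-fitting example above; the iteration count, inlier threshold, and final least-squares refit are illustrative choices.

```python
import numpy as np

def ransac_line(pts, n_iters=200, thresh=0.1, rng=None):
    """Fit y = a*x + b robustly: repeatedly sample a minimal set (2 points),
    fit a candidate line, and keep the one with the largest consensus set."""
    rng = rng if rng is not None else np.random.default_rng(0)
    best_inliers = None
    for _ in range(n_iters):
        i, j = rng.choice(len(pts), size=2, replace=False)
        (xa, ya), (xb, yb) = pts[i], pts[j]
        if xa == xb:
            continue  # vertical sample, skip
        a = (yb - ya) / (xb - xa)
        b = ya - a * xa
        resid = np.abs(pts[:, 1] - (a * pts[:, 0] + b))
        inliers = resid < thresh
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Refit by least squares on the final consensus set.
    a, b = np.polyfit(pts[best_inliers, 0], pts[best_inliers, 1], 1)
    return a, b, best_inliers
```

With a few gross outliers added to points near y = 2x + 1, RANSAC recovers the underlying line while plain least squares would be pulled away.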
RANSAC
Fitting a homography with RANSAC
Original images Initial matches Inliers from RANSAC
Things to try in iOS - GPUImage
See - https://github.com/BradLarson/GPUImage for more details.
More to read…
• Prince – Chapter 13, Sections 2–3.
• Corke – Chapter 13, Section 3.
• E. Rublee et al., “ORB: an efficient alternative to SIFT or SURF”, ICCV 2011.