7/29/2019 ISM in NTU
Scale Space Methods in Feature Extraction
ISM Report
Isa Nasser - [2012]
Examiner - Prof Jiang Xudong
Matric no - G1003149F
1. Introduction .................................................................................................................................... 2
1.1 Flow of this report ..................................................................................................................... 2
2. Types of Invariance ......................................................................................................................... 4
2.1 Illumination ............................................................................................................................... 4
2.2 Scale .......................................................................................................................................... 4
2.3 Rotation ..................................................................................................................................... 5
2.4 Affine ......................................................................................................................................... 5
2.5 Full Perspective ......................................................................................................................... 6
3. Gaussian Kernel Derivation ............................................................................................................. 7
3.1 Axiomatic Derivation of Gaussian Kernel .................................................................................. 7
3.2 Scale space from Causality ...................................................................................................... 10
3.3 Properties of the Gaussian Kernel .......................................................................................... 12
4. Differential Structure of Images ................................................................................................... 14
4.1 Motivation ............................................................................................................................... 14
4.2 Isophotes ................................................................................................................................. 14
4.3 Flowlines ................................................................................................................................. 14
4.4 Coordinate Systems and Transformation ............................................................................... 14
4.5 Image Transformation and Special Transformation ................................................................ 15
4.6 Invariance ................................................................................................................................ 15
4.7 First Order Gauge Coordinates ............................................................................................... 16
5. Feature Extraction Example: SURF and SIFT ................................................................................. 18
5.1 Properties of SIFT and SURF .................................................................................................... 18
5.2 Mechanics of SIFT and SURF ................................................................................................... 18
6. Performance of SURF and SIFT ..................................................................................................... 27
6.1 Rotation Invariance ................................................................................................................. 27
6.2 Illumination Invariance ........................................................................................................... 27
6.3 Noise Invariance ...................................................................................................................... 28
6.4 Scale Invariance ....................................................................................................................... 28
6.5 Viewpoint Invariance .............................................................................................................. 29
6.6 Matching ................................................................................................................................. 29
7. Conclusion ..................................................................................................................................... 31
1. INTRODUCTION
Features are a description of an image, or of part of an image, that is invariant to one or many transforms. These transforms must describe the changes we observe when taking two different pictures of the same scene. The human brain is able to recognize objects that appear in both pictures intuitively. This intuition is not easily derived mathematically, since we can't program a computer to do that same recognition without first studying this subject and modelling it.
For a computer to recognize the objects between the two pictures, it will have to identify certain features which occur in both. The two pictures can be seen as one picture together with a clone which has some information removed, some information added, and existing information put through a transform.
What we want to derive is how to identify the existing information in picture 1 and transform it in a way that makes it unchanged by the transform applied when the second picture is taken. Not all the information in the first picture has the required properties to make it not change, and not all of the information with this property is captured in the second picture, since the camera might not physically capture it or it goes through some transform to which it is not invariant.
1.1 Flow of this report
This report takes a top-down approach. In the first part, we will see the different types of invariance that are useful for a feature to have. We will then build an understanding of why the Gaussian kernel is used to derive the scale space of an image, and discuss feature-generating functions and how we might approach discovering these functions using gauge coordinates.
We will then run through the mechanics of two popular algorithms, SIFT and SURF, as modern techniques for feature extraction using the image scale space. Finally, we end by comparing the performance of both algorithms.
[Diagram: flow of the report, from "Types of Invariances" and "Why Gaussian Kernel?" through "How to Generate Features" and "Two Example Algorithms: SURF and SIFT" to "Performance of SURF and SIFT"]
2. TYPES OF INVARIANCE
This section lists the invariances that are generally desirable for a descriptor. These transformations occur in real life when capturing an image using a camera. Their effects can be seen intuitively in the photographic examples.
2.1 Illumination
FIGURE 2.1: IMAGES WITH DIFFERENT ILLUMINATION
Illumination differences occur when there are different light intensities in the images (figure 2.1), causing one to be lighter than the other. It gets more difficult when there are multiple illuminants in the scene, which causes the lighting at various angles to differ. This creates problems in identifying features, since the intensities of the pixels used as features will change solely due to the illumination changes.
2.2 Scale
FIGURE 2.2: IMAGES WITH DIFFERENT SCALES. LEFT IMAGE IS ZOOMED OUT WHILE RIGHT IMAGE IS ZOOMED IN
The scaling transformation can be seen as the convolution of a more zoomed-in image with a Gaussian kernel. The amount of scaling is defined by the standard deviation parameter of the Gaussian kernel; this will be derived later. There are also two scales to define in this image. The
inner scale is the size of the smallest details that the image is describing relative to another image of
the same scene. That is, how zoomed in the image is when it is taken.
The second is the outer scale, which is how large an area the image describes compared to another image of the same scene. For example, in figure 2.2 the left image has a smaller inner scale but a larger outer scale compared to the right image.
2.3 Rotation
FIGURE 2.3: IMAGES WITH A ROTATIONAL TRANSFORMATION
A rotation can be described with the following transformation matrix:

\[ \begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} \]
Having invariance to rotation is desirable since we may roll the camera when taking an image (figure
2.3).
Many features are invariant to rotation, one of which is the Laplacian, defined as

\[ \nabla^2 L(x, y) = \frac{\partial^2 L(x, y)}{\partial x^2} + \frac{\partial^2 L(x, y)}{\partial y^2} \]
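We can check this invariance numerically. The following sketch (not from the report; NumPy assumed, the synthetic image and its size are arbitrary choices) computes a discrete Laplacian at the centre of an image before and after an exact 90-degree rotation:

```python
import numpy as np

def laplacian_at(img, y, x):
    # Discrete Laplacian Lxx + Lyy via central differences.
    return (img[y, x + 1] + img[y, x - 1] + img[y + 1, x] + img[y - 1, x]
            - 4.0 * img[y, x])

# Synthetic anisotropic blob centred in a 31x31 grid.
n = 31
yy, xx = np.mgrid[0:n, 0:n] - n // 2
img = np.exp(-(xx**2 + 2.0 * yy**2) / 40.0)

c = n // 2
before = laplacian_at(img, c, c)
after = laplacian_at(np.rot90(img), c, c)   # exact 90-degree rotation
print(abs(before - after) < 1e-12)          # prints True
```

The four-neighbour set at the centre maps onto itself under a 90-degree rotation, so the Laplacian value is preserved exactly.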
2.4 Affine
FIGURE 2.4: IMAGES WITH AFFINE TRANSFORMATION
The affine transformation can be described with the following linear transformation,
-
7/29/2019 ISM in NTU
7/32
6
\[ \begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} e \\ f \end{pmatrix}, \qquad ad - bc \neq 0 \]
We can intuitively visualize this transformation as skewing in either or both the horizontal and
vertical axes as in figure 2.4.
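As a small illustration (not from the report; the matrix entries and translation below are arbitrary hypothetical values), an affine map can be applied to points and inverted as long as ad - bc is non-zero:

```python
import numpy as np

# Hypothetical affine parameters: linear part A = [[a, b], [c, d]], translation t = (e, f).
A = np.array([[1.0, 0.4],
              [0.1, 1.2]])
t = np.array([5.0, -3.0])
assert abs(np.linalg.det(A)) > 1e-12    # ad - bc != 0, so the map is invertible

pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
warped = pts @ A.T + t                   # x' = A x + t for each point
recovered = (warped - t) @ np.linalg.inv(A).T
print(np.allclose(recovered, pts))       # prints True
```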
2.5 Full Perspective
When we view a scene, our vision is not in Cartesian coordinates. Parallel horizontal lines on the floor do not stay parallel in our vision; they incline towards each other and intersect at the horizon. This is because the image undergoes a transformation from 3-dimensional Cartesian coordinates through 4-dimensional homogeneous coordinates.
The transformation matrix that maps the Cartesian coordinates to the 2D image captured by the camera requires information about the camera. Assuming the camera used for capturing the image can be modelled as a pinhole camera with focal length f, the camera matrix that transforms physical 3-dimensional Cartesian coordinates (which we have good intuition in) to the 2D image captured is

\[ \begin{pmatrix} y_1 \\ y_2 \\ 1 \end{pmatrix} \sim \begin{pmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ 1 \end{pmatrix} \]
y is in the 2-dimensional camera image coordinates, while x is in the Cartesian coordinates of the 3-dimensional physical scene. Therefore, when x1, x2 and x3 move between two image captures, y1 and y2 of the two images are transformed accordingly. This distortion is the full perspective transformation, which we can see in figure 2.5.
FIGURE 2.5: IMAGES WITH DIFFERENT PERSPECTIVE DUE TO PROJECTION PLANE
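To make the projection concrete, here is a minimal sketch of the pinhole model (the focal length value below is an arbitrary assumption for illustration):

```python
import numpy as np

f = 1.5  # assumed focal length, arbitrary for this sketch
# Pinhole camera matrix mapping homogeneous 3D points to the image plane.
P = np.array([[f, 0.0, 0.0, 0.0],
              [0.0, f, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0]])

X = np.array([2.0, 1.0, 4.0, 1.0])   # homogeneous world point (x1, x2, x3, 1)
y = P @ X
y = y[:2] / y[2]                      # dehomogenize: (f*x1/x3, f*x2/x3)
print(y)                              # -> (0.75, 0.375)
```

Dividing by depth x3 is exactly what makes distant parallel lines converge in the image.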
3. GAUSSIAN KERNEL DERIVATION
When we look at an image, the lens in our eye samples the image not only pixel by pixel but through different scales. Our mind can't process every pixel to identify what's in our view. We have evidence for this since, after looking at an image, it is impossible for us to even decently reproduce the whole image on a canvas at the inner scale of our choice.
It shows that our eyes look at an image and take pictures at different blurs, or aperture sizes. This is useful. The popular example is whether we would like to observe a tree or its leaves. When looking at a panorama of trees, we obviously don't need the information at the tiny inner scale.
It happens that this blurring is through a Gaussian kernel. The choice of Gaussian kernel is purely due
to a few very intuitive constraints or assumptions we make of our view when not knowing what it is.
FIGURE 3.1: GENERATING SCALE SPACE BY CONVOLUTION WITH GAUSSIAN KERNEL
The variance of the Gaussian kernel is also called the scale of the resulting image. A feature which can be extracted at all scales is called scale invariant. The idea is that this feature will appear even when we capture a photo of it at a distance, i.e. at a larger scale. Having this invariance is obviously useful, since what we perceive as a feature in our minds can then be detected at all scales.
Why do we use the Gaussian kernel to build a scale space? This part explains in an intuitive manner why the Gaussian kernel is the unique kernel for defining different scales of an image.
3.1 Axiomatic Derivation of Gaussian Kernel
We would like to derive the set of mathematical equations to describe the act of our eyes focusing
on an image in scale space. When we focus on scenery, we are looking at the scenery through our
eyes with a certain aperture width. This aperture width is how we focus and what gives the image
scale. This aperture width is in meters.
Theres also the spatial coordinate of the objects which is w, meters in Cartesian coordinates or
radians in frequency domain. Lastly, we need light to see these objects. L 0 is the luminance of this
scenery while L is the luminance as we perceive it when it enters through the aperture. With those definitions, we can analyse these units using dimensional analysis.
        σ    ω    L0   L
Meter   1   -1   -2   -2
Candela 0    0    1    1

\[ A = \begin{pmatrix} 1 & -1 & -2 & -2 \\ 0 & 0 & 1 & 1 \end{pmatrix} \]

Find the null space x of A:

\[ A\,x = 0 \quad \Rightarrow \quad x \in \operatorname{span}\left\{ (1, 1, 0, 0)^T,\; (0, 0, 1, -1)^T \right\} \]

Using Buckingham's pi theorem, the dimensionless groups are therefore \(\sigma\omega\) and \(L / L_0\), so the observation must take the form

\[ \frac{L}{L_0} = G(\sigma\omega) \]
We have to determine the aperture function G. The strategy is to subject the function to constraints, or axioms, that we know we have to assume when focusing using an aperture.
The function should not have any preference for any location and should be linear. This leads to convolution, where the function is swept across every location without any preference:
\[ \frac{L_1(x, y)}{L_0(x, y)} = G \quad \Rightarrow \quad L_1(x, y) = L_0(x, y) * G(x, y) = \int\!\!\int L_0(u, v)\, G(x - u, y - v)\, du\, dv \]

This motivates us to look at the variables in the frequency domain, where the convolution operator is replaced by a simple multiplication:

\[ L_1(\omega_x, \omega_y) = L_0(\omega_x, \omega_y)\, G(\omega_x, \omega_y) \]
Next is the constraint of isotropy, which loosely means the function has to be the same in all directions. This means that we don't have to account separately for the frequencies along the horizontal and vertical axes in Cartesian coordinates; we only have to account for the distance:

\[ \omega = \sqrt{\omega_x^2 + \omega_y^2} \]
\[ L_1(\omega) = L_0(\omega)\, G(\sigma\omega) \]

Next is the constraint of scale invariance. When we observe multiple times using the aperture, we observe using a wider or narrower aperture, but the aperture function itself doesn't change:

\[ G(\sigma_1\omega)\, G(\sigma_2\omega) = G(\sigma_{12}\,\omega) \]

The aperture is applied twice as two observations are made; in the frequency domain this is a multiplication. The resulting observation should be equivalent to observing using a single aperture with a wider diameter. The only general solution to this equation is the exponential function with a multiplier \(\alpha\), an unknown constant:

\[ \exp(\alpha\sigma_1\omega)\, \exp(\alpha\sigma_2\omega) = \exp\big(\alpha(\sigma_1\omega + \sigma_2\omega)\big) \]

However, since \(\sigma\omega\) is the dimensionless parameter we derived earlier, we have to raise this parameter to the power of another unknown constant p:

\[ \exp\big(\alpha(\sigma_1\omega)^p\big)\, \exp\big(\alpha(\sigma_2\omega)^p\big) = \exp\Big(\alpha\big((\sigma_1\omega)^p + (\sigma_2\omega)^p\big)\Big) \]

In short,

\[ G(\sigma\omega) = \exp\big(\alpha(\sigma\omega)^p\big) \]
To find the unknown constant p, we apply the isotropy constraint again on this new equation. The dimensions are independent and thus separable:

\[ G(\sigma\omega) = G(\sigma\omega_1 e_1)\, G(\sigma\omega_2 e_2) \cdots G(\sigma\omega_N e_N) \]

where \(e_i\) is the i-th basis unit vector. Since the dimensions are in the spatial domain and we are using Cartesian unit vectors as the bases, the length \(\omega\) follows from Pythagoras on the projections along \(e_i\):

\[ \omega^2 = \omega_1^2 + \omega_2^2 + \cdots + \omega_N^2 \]

Separability of the exponential requires

\[ (\sigma\omega)^p = (\sigma\omega_1)^p + (\sigma\omega_2)^p + \cdots + (\sigma\omega_N)^p \]

With p = 1 the components do not add this way in general, but with p = 2 this is exactly the Pythagorean relation above. We can immediately see that p = 2 causes the components to add and is therefore the correct value for the constant. Now we find \(\alpha\).
The next constraint is that when we open the aperture fully, we blur everything to the point that there's no information left. Mathematically we can represent this as:
\[ \lim_{\sigma \to \infty} G(\sigma\omega) = \lim_{\sigma \to \infty} \exp\big(\alpha(\sigma\omega)^2\big) = 0 \]

This can be true only if \(\alpha\) is negative. Also, we know that \(\alpha\) has to be real. We shall make the assumption that \(\alpha = -\tfrac{1}{2}\). It happens that this choice makes the aperture function a solution to the diffusion equation. So far we have

\[ G(\sigma\omega) = \exp\left( -\tfrac{1}{2}(\sigma\omega)^2 \right) \]
Now, going back to the spatial domain, we perform an inverse Fourier transform:

\[ g(x, \sigma) = F^{-1}\left( \exp\left( -\tfrac{1}{2}(\sigma\omega)^2 \right) \right) \propto \exp\left( -\frac{x^2}{2\sigma^2} \right) \]

Finally, the last constraint is that the aperture function is normalized, such that the integral of the function across all spatial dimensions equals 1. To do this, we integrate \(g(x, \sigma)\) and divide by that value:

\[ \int_{-\infty}^{\infty} \exp\left( -\frac{x^2}{2\sigma^2} \right) dx = \sigma\sqrt{2\pi} \]

The final result is therefore

\[ g(x, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left( -\frac{x^2}{2\sigma^2} \right) \]
This is the Gaussian kernel. It shows that the Gaussian function is the unique aperture function or
kernel given the constraints or axioms we deduce from our understanding of the process of visual
observation.
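A sampled version of this kernel can be constructed and its defining properties checked numerically (a sketch, not from the report; sigma and the truncation radius are arbitrary choices):

```python
import numpy as np

def gaussian_kernel(sigma, radius):
    # Sampled 1-D Gaussian, renormalized so the discrete sum is exactly 1.
    x = np.arange(-radius, radius + 1, dtype=float)
    g = np.exp(-x**2 / (2.0 * sigma**2)) / (sigma * np.sqrt(2.0 * np.pi))
    return g / g.sum()

g = gaussian_kernel(sigma=2.0, radius=10)
print(np.isclose(g.sum(), 1.0))   # prints True: unit area, i.e. normalized
print(np.isclose(g[5], g[15]))    # prints True: symmetric about the centre (isotropy in 1-D)
```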
3.2 Scale space from Causality
This is from Koenderink's paper "The structure of images" [Koenderink1984a]. The argument in this paper is simple but rather useful in deriving the Gaussian kernel:
When we increase the scale and blur the image further, the final blurred image is caused by the image we started from. That is, we can't introduce new information, or structure as it is called in the paper, into the image. A higher scale of an image must always contain less structure than a lower scale of the same image. It is not possible that new structures are formed, since we are just observing.
This is illustrated clearly in the 1D case in figure 3.2:
FIGURE 3.2: LESS STRUCTURE MEANS THE PEAK WILL HAVE TO DECREASE WITHOUT NEW PEAKS FORMING
The arrows illustrate the direction in which the 1-D signal moves as the scale is increased. The signal should get smoother, and new structures such as a new peak should not form. The arrows are therefore the only possible outcome for this condition to be satisfied.
Describing this mathematically:

\[ \frac{\partial u}{\partial t} < 0 \quad \text{where} \quad \frac{\partial^2 u}{\partial x^2} < 0 \]

that is, intensity decreases wherever the signal has negative spatial curvature (the downward arrows in figure 3.2), and

\[ \frac{\partial u}{\partial t} > 0 \quad \text{where} \quad \frac{\partial^2 u}{\partial x^2} > 0 \]

that is, intensity increases wherever the signal has positive spatial curvature (the upward arrows in figure 3.2).
To summarize:

\[ \frac{\partial u}{\partial t} \cdot \frac{\partial^2 u}{\partial x^2} > 0 \]
The next axiom we have to impose is that the rate of change of the intensity at a particular location when we rescale the signal must be proportional to the spatial curvature of the intensity there. This implies that the function relating \(\partial u / \partial t\) and \(\partial^2 u / \partial x^2\) is not only linear but proportional:

\[ \alpha \frac{\partial^2 u}{\partial x^2} = \frac{\partial u}{\partial t} \]

We can set \(\alpha = 1\), since we are free to define the physical measure of the amount we scale, or the time elapsed, per unit numerical value:

\[ \frac{\partial^2 u}{\partial x^2} = \frac{\partial u}{\partial t} \]
This is the linear isotropic diffusion equation. The Green's function of this equation is the Gaussian kernel. It means that any function (for example an image) to which diffusion is applied is convolved with the Gaussian kernel. We can see this since the following relation is true.
Let \(\sigma^2 = 2t\); that is, the variance of the Gaussian kernel becomes time for the diffusion process. We can first agree to this intuitively, since as time increases the variance of the kernel used for blurring should increase, as the function gets more and more diffused. Take

\[ u(x, t) = \frac{1}{\sqrt{4\pi t}} \exp\left( -\frac{x^2}{4t} \right) \]

Substituting into the diffusion equation,

\[ \frac{\partial^2 u}{\partial x^2} = \frac{\partial u}{\partial t} = \frac{1}{\sqrt{4\pi t}} \left( \frac{x^2}{4t^2} - \frac{1}{2t} \right) \exp\left( -\frac{x^2}{4t} \right) \]

and this turns out to be true.
This can be extended to the 2 dimensional case which will not be elaborated in this report.
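The 1-D relation above can be checked numerically with finite differences (a sketch, not from the report; the evaluation point and step size are arbitrary choices):

```python
import numpy as np

def heat_kernel(x, t):
    # Green's function of u_t = u_xx: a Gaussian with variance 2t.
    return np.exp(-x**2 / (4.0 * t)) / np.sqrt(4.0 * np.pi * t)

x, t, h = 0.7, 1.3, 1e-4
u_t  = (heat_kernel(x, t + h) - heat_kernel(x, t - h)) / (2.0 * h)
u_xx = (heat_kernel(x + h, t) - 2.0 * heat_kernel(x, t)
        + heat_kernel(x - h, t)) / h**2
print(abs(u_t - u_xx) < 1e-6)   # prints True: the kernel satisfies the diffusion equation
```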
3.3 Properties of the Gaussian Kernel
Normalized Kernel: Average Grey Level Invariance

\[ g(x, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left( -\frac{x^2}{2\sigma^2} \right) \]

The constant \(\frac{1}{\sigma\sqrt{2\pi}}\) in the Gaussian kernel is the normalization constant. It causes the integral of the Gaussian kernel over all x to be unity, since

\[ \int_{-\infty}^{\infty} \exp\left( -\frac{x^2}{2\sigma^2} \right) dx = \sigma\sqrt{2\pi} \]
Since the area underneath the kernel is constant, increasing the width or the variance of the kernel
will both widen and shorten the kernel. That is, increasing the variance will decrease its amplitude.
Normalization ensures that the average grey levels of the image remains the same after blurring.
This property is known as average grey level invariance.
Self-Similarity: Cascade Property
When we convolve two Gaussian functions with different variances, the result is a Gaussian whose variance is the sum of the two variances:

\[ G(x; \sigma_1^2) * G(x; \sigma_2^2) = G(x; \sigma_1^2 + \sigma_2^2) \]
This is a useful property, since convolving twice with a Gaussian kernel produces the same effect as convolving once with a wider Gaussian kernel with twice the variance.
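This cascade property can be verified numerically on sampled kernels (a sketch, not from the report; the variances 2 and 3 are arbitrary choices):

```python
import numpy as np

def g(x, var):
    # Zero-mean Gaussian density with the given variance.
    return np.exp(-x**2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

dx = 0.01
x = np.linspace(-30.0, 30.0, 6001)   # symmetric grid, odd sample count
# Discrete convolution scaled by dx to approximate the continuous integral.
conv = np.convolve(g(x, 2.0), g(x, 3.0), mode="same") * dx
print(np.allclose(conv, g(x, 5.0), atol=1e-6))   # prints True: variances 2 + 3 = 5
```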
The Scale Parameter
In order to avoid the addition of squares when cascading kernels, substitute

\[ \sigma^2 = 2t \]

to give a neater N-dimensional Gaussian kernel:

\[ G(x; t) = \frac{1}{(2\sqrt{\pi t})^N} \exp\left( -\frac{x^2}{4t} \right) \]

To give an even neater kernel, we re-parameterize the x axis:

\[ \tilde{x} = \frac{x}{\sqrt{2t}} = \frac{x}{\sigma} \]

We call this particular re-parameterization of the coordinates the natural coordinates. When increasing or decreasing \(\tilde{x}\) by one unit, we are moving in steps of \(\sigma = (2t)^{1/2}\). This parameterization of \(\tilde{x}\) removes the variance or scale factor (aperture width) from the spatial coordinates.
If we take the scale factor to infinity, the blurred image will have homogeneous intensity: the average intensity of the image before blurring.
Relation to Generalized Functions
Taking the limit of the scale factor to zero produces the Dirac delta function:

\[ \lim_{\sigma \to 0} \frac{1}{\sigma\sqrt{2\pi}} \exp\left( -\frac{x^2}{2\sigma^2} \right) = \delta(x) \]

The cumulative distribution function of the Gaussian kernel is the error function:

\[ \int_{-\infty}^{x} \frac{1}{\sigma\sqrt{2\pi}} \exp\left( -\frac{u^2}{2\sigma^2} \right) du = \frac{1}{2}\left( 1 + \operatorname{erf}\left( \frac{x}{\sigma\sqrt{2}} \right) \right) \]
4. DIFFERENTIAL STRUCTURE OF IMAGES
4.1 Motivation
Given a 2D image L, the transformation

\[ \sqrt{L_x^2 + L_y^2} \]

is a good edge detector, while

\[ L_x^2 L_{yy} - 2 L_x L_y L_{xy} + L_y^2 L_{xx} \]

is a good corner detector. An obvious question is how these expressions are produced and whether there's a method to constrain the possible differential expressions that can produce features on the image. This section explores a strategy to isolate a set of such expressions that describe useful structures on the image.
4.2 Isophotes
We can look at the intensities of a grayscale image as heights: the higher the intensity, the taller the height and vice versa. The image can then be divided into contiguous regions of the same height. Isophotes are the closed curves/lines which outline these regions. All isophotes together contain all the information in the image.
Properties of Isophotes
1. Isophotes are closed curves. Most isophotes in 2D are non-self-intersecting planar curves called Jordan curves.
2. Isophotes can intersect themselves in critical cases.
3. Isophotes do not intersect other isophotes.
4. Their shape is independent of grayscale transforms such as changing brightness and contrast.
A special class of isophotes occurs when an isophote passes through a singularity: a maximum, a minimum or a saddle point. An isophote can only intersect itself when it goes through a saddle point.
4.3 Flowlines
Flowlines are lines that are everywhere perpendicular to the isophotes. Flowlines are the dual of isophotes and isophotes are the dual of flowlines; one can be calculated from the other.
4.4 Coordinate Systems and Transformation
The set of derivatives taken at a particular location can be thought of as a language for us to define a
certain local feature. We can create expressions of derivatives of any order and combination. For
example, using a Taylor series we can take derivatives of increasing order at a local point and multiply them by a small offset to describe the structure at that slight offset.
4.5 Image Transformation and Special Transformation
An image transformation is described by a transformation matrix. For example, the rotational transformation matrix is

\[ R = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \]

When an image is transformed, the volume is often changed and the density is distributed over different areas. To calculate this change in volume, we can calculate the Jacobian matrix, where \(x'\) are the transformed image coordinates and \(x\) are the original image coordinates.
Looking at the change of an infinitesimally small volume, we can see that the determinant of the Jacobian matrix is the factor by which the volume changes. If the determinant is 1, we call the transformation a special transformation.
4.6 Invariance
When looking for features in an image, we would like them to be detected regardless of whether the image is rotated or translated. It's quite obvious why this is necessary: features that depend on a particular orientation will be rare and not repeatable across multiple practical photographs taken with different tilts. If a feature needs to be very special to be detected, it can't be called a feature. This is why we need features to be invariant to rotation and translation.
Example

\[ \sqrt{L_x^2 + L_y^2} \]

is the gradient magnitude. Using Mathematica to simplify the expression after substituting rotated coordinates, the function doesn't change after rotation and is thus rotationally invariant.
4.7 First Order Gauge Coordinates
The usual global coordinates (the X and Y Cartesian coordinates) change their orientation relative to the image when an orthogonal transformation is applied. Creating feature-generating functions using these coordinates is difficult, since the functions would not be invariant to such transformations most of the time.
If we can localize the coordinates such that they take their orientation from properties of the location that are invariant under orthogonal transformations, we can then create functions using these coordinates which will in turn be invariant. These localized coordinates are the gauge coordinates.
The direction of maximal change of intensity is the gradient:

\[ w = \left( \frac{\partial L(x, y)}{\partial x},\; \frac{\partial L(x, y)}{\partial y} \right) \]

To find the orthogonal coordinate, we rotate this vector by 90 degrees clockwise:

\[ v = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} w = \left( \frac{\partial L(x, y)}{\partial y},\; -\frac{\partial L(x, y)}{\partial x} \right) \]

Since the gauge coordinates are localized and can be derived from the intensity around the point, any derivative expressed in gauge coordinates is an orthogonal invariant.
For example, differentiating the intensity w.r.t. w gives

\[ \frac{\partial L(v, w)}{\partial w} = L_w \]

This is the gradient magnitude at the point, in the direction of w itself. \(L_w\) is an invariant.
Now we try differentiating the intensity w.r.t. v:

\[ \frac{\partial L(v, w)}{\partial v} = 0 \]

This is obvious since the gradient points along w, which is perpendicular to v. Zero is also an invariant.
It is useful to express these invariants in Cartesian coordinates instead of gauge coordinates so that we can identify these points easily in a computer.
The family of features that can be derived from the partial derivatives of the first order gauge coordinates can be written as functions of the image derivatives

\[ f = f\!\left( \frac{\partial^{k+l} L(x, y)}{\partial x^k\, \partial y^l} \right) \]

Here we start to see some of the famous invariant features:
TABLE 4-1: FEATURES WHICH ARE FUNCTIONS OF PARTIAL DERIVATIVES OF FIRST ORDER GAUGE COORDINATES

Operator in Gauge Coordinates | Description
L_vv                          | Ridge detector
L_vv / L_w                    | Isophote curvature
L_vv L_w^2                    | Corner detection
L_vv / L_w^3                  | Affine invariant corner detection
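As a numerical illustration of the invariance argument above (a sketch, not from the report; the synthetic image is an arbitrary choice), the gauge derivative L_w, i.e. the gradient magnitude, is unchanged under an exact 90-degree image rotation:

```python
import numpy as np

def grads(img, y, x):
    # Central-difference first derivatives Lx, Ly at (y, x).
    Lx = 0.5 * (img[y, x + 1] - img[y, x - 1])
    Ly = 0.5 * (img[y + 1, x] - img[y - 1, x])
    return Lx, Ly

n = 31
yy, xx = np.mgrid[0:n, 0:n] - n // 2
img = np.exp(-((xx - 3)**2 + 2.0 * (yy + 2)**2) / 60.0)   # off-centre blob

c = n // 2
Lx, Ly = grads(img, c, c)
Lw = np.hypot(Lx, Ly)                      # L_w: gradient magnitude
Lx2, Ly2 = grads(np.rot90(img), c, c)      # exact 90-degree rotation
print(np.isclose(Lw, np.hypot(Lx2, Ly2)))  # prints True
```

Under the rotation the components (Lx, Ly) permute with a sign change, so their magnitude is preserved exactly.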
5. FEATURE EXTRACTION EXAMPLE: SURF AND SIFT
5.1 Properties of SIFT and SURF
Invariant to:
1. Uniform scaling
2. Orientation
Partially invariant to:
1. Affine distortion
2. Illumination
5.2 Mechanics of SIFT and SURF
SIFT and SURF can be separated into 3 processes:
a. Identifying keypoints (Steps 1 to 3)
b. Subtracting weak keypoints (Step 3)
c. Attaching an orientation to the keypoint and packing it into a data structure (Steps 4 and 5)
Step 1: Create Scale Space out of the image
SIFT
SURF
In SURF this convolution is estimated for computational efficiency. It involves first generating the integral image, which is just the cumulative integration of the image.
Followed by the convolution of this image with a box filter, which is defined as,
The computation of the convolution can be done in just 3 operations, assuming the integral image has been prepared.
FIGURE 5.1: INTEGRAL IMAGE WITH BOX FILTER CONVOLUTION FORMULA
The variance of the box filter with a square support of is
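The integral-image trick described above can be sketched as follows (not from the report; NumPy assumed, image and window sizes arbitrary): once the table is built, any box sum costs 4 lookups and 3 additions/subtractions.

```python
import numpy as np

def integral_image(img):
    # Cumulative sum along both axes, padded with a zero row and column.
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = img.cumsum(0).cumsum(1)
    return ii

def box_sum(ii, y0, x0, y1, x1):
    # Sum of img[y0:y1, x0:x1] in O(1): 4 lookups, 3 add/sub operations.
    return ii[y1, x1] - ii[y0, x1] - ii[y1, x0] + ii[y0, x0]

rng = np.random.default_rng(0)
img = rng.random((50, 60))
ii = integral_image(img)
print(np.isclose(box_sum(ii, 10, 5, 30, 25), img[10:30, 5:25].sum()))   # prints True
```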
FIGURE 5.2: COMPARISON OF GAUSSIAN BLUR AND ITS ESTIMATE USING SURF
There are some visible vertical artefacts on the approximation, but it is quite close to the Gaussian
blur.
Step 2: Compute the Second Order Derivative of the Scale Space
Scale-invariant features may be detected by using scale-normalized second order derivatives on the
scale space representation.
SIFT computes the scale space and aims to obtain its second order derivative, which is its Laplacian.
SURF doesn't compute the scale space of the Gaussian kernel. Instead, it computes the scale space of an estimate of the second order derivative of the Gaussian kernel directly.
SIFT: Laplacian
The Laplacian of the Gaussian is simply the second derivative of the blurred image; the Laplacian of an image identifies its edges:

\[ \nabla^2 L(x, y, t) \]

Blurring the image smooths it out, which makes the edge detection more stable and effective.
Computing the Laplacian is computationally intensive. SIFT tackles this by estimating it using the difference of Gaussians shown in figure 5.3.
FIGURE 5.3: DIFFERENCE OF GAUSSIAN APPROXIMATES THE LAPLACIAN OF GAUSSIANS
\[ DoG(x, y; \sigma_1, \sigma_2) = \frac{1}{2\pi\sigma_1^2} \exp\left( -\frac{x^2 + y^2}{2\sigma_1^2} \right) - \frac{1}{2\pi\sigma_2^2} \exp\left( -\frac{x^2 + y^2}{2\sigma_2^2} \right) \]

It happens that this is a good approximation for the Laplacian of Gaussians.
Using the diffusion equation, the Gaussian scale space satisfies

\[ \frac{\partial G}{\partial \sigma} = \sigma \nabla^2 G \approx \frac{G(x, y; k\sigma) - G(x, y; \sigma)}{k\sigma - \sigma} \]

so that

\[ G(x, y; k\sigma) - G(x, y; \sigma) \approx (k - 1)\, \sigma^2 \nabla^2 G \]

The difference of two adjacent levels of the scale space therefore approximates the scale-normalized Laplacian up to the constant factor (k - 1):

\[ L(x, y; k\sigma) - L(x, y; \sigma) \approx (k - 1)\, \sigma^2 \nabla^2 L(x, y; \sigma) \]

This is therefore what's used in SIFT.
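The quality of this approximation can be checked numerically in 1-D (a sketch, not from the report; sigma = 1.6 and k = 1.26 are illustrative values close to those used in SIFT):

```python
import numpy as np

def G(x, sigma):
    # 1-D Gaussian kernel.
    return np.exp(-x**2 / (2.0 * sigma**2)) / (sigma * np.sqrt(2.0 * np.pi))

sigma, k = 1.6, 1.26
x = np.linspace(-8.0, 8.0, 1601)
dog = G(x, k * sigma) - G(x, sigma)
# Scale-normalized 1-D Laplacian of Gaussian: sigma^2 * d^2 G / dx^2.
log = sigma**2 * (x**2 / sigma**4 - 1.0 / sigma**2) * G(x, sigma)
print(np.abs(dog - (k - 1.0) * log).max() < 0.03)   # prints True: DoG tracks (k-1)*LoG
```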
SURF: Determinant of Hessian
To find the Hessian matrix of the scale space of the image, we need the second order partial derivatives, which are computationally intensive to compute. As an approximation, SURF uses 2D Haar-like box kernels.
The convolution of these kernels can be computed very quickly.
and the determinant of the approximated Hessian is

\[ \det(H_{\text{approx}}) = D_{xx} D_{yy} - \left( r\, D_{xy} \right)^2 \]

where the four quadrants of the \(D_{xy}\) box filter with side length \(l\) are

\[ [0, l] \times [0, l], \quad [-l, 0] \times [0, l], \quad [0, l] \times [-l, 0], \quad [-l, 0] \times [-l, 0] \]

The division in the formulas above is for normalization due to the variance of the kernel. \(r(\sigma)\) is the ratio of the Frobenius norms (the square root of the sum of the squares of each coefficient of a matrix) of the approximated and exact Hessian of Gaussian kernels. This is of course precalculated.
Step 3: Feature Selection
SIFT: Find the minima/maxima of sub pixels in DoG
Sub-pixels refer to interpolation between pixels. This is useful since the extrema often do not lie exactly on a pixel. To obtain this, take the Taylor series at a point over 3 dimensions: x and y as well as the scale σ.
\[ D(\mathbf{x}) = D + \frac{\partial D}{\partial \mathbf{x}}^{T} \mathbf{x} + \frac{1}{2}\, \mathbf{x}^{T} \frac{\partial^2 D}{\partial \mathbf{x}^2}\, \mathbf{x} \]

where \(\mathbf{x}\) is the offset in the 3 dimensions (x, y, σ) and D is the 3-dimensional scale space centred on the candidate. Taking the first derivative of this and setting it to zero gives the offset \(\hat{\mathbf{x}}\) of the potential extremum. To check whether the extremum lies closer to a neighbouring sample point, we check whether

\[ \left| \hat{x}_{opt} \right| > 0.5 \quad \text{in any of } (x, y, \sigma) \]

If it is more than 0.5, move to that next candidate point and find its sub-pixel offset by interpolation using the Taylor series again. If the magnitude of the final maximum/minimum, \(|D(\hat{\mathbf{x}})|\), is below 0.03, remove it as a candidate keypoint; otherwise add it as a candidate keypoint.
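The refinement step above can be sketched as follows (not from the report; the sampled gradient and Hessian values are hypothetical):

```python
import numpy as np

def refine(grad, hess):
    # Offset that extremizes the quadratic Taylor model: x_hat = -H^{-1} g.
    return -np.linalg.solve(hess, grad)

# Hypothetical gradient and Hessian of D at a candidate, in (x, y, sigma) order.
g = np.array([0.02, -0.01, 0.005])
H = np.array([[0.30, 0.02, 0.00],
              [0.02, 0.25, 0.01],
              [0.00, 0.01, 0.20]])
offset = refine(g, H)
value = 0.1 + 0.5 * g @ offset          # D(x_hat) = D + (1/2) g^T x_hat, contrast test value
print(np.all(np.abs(offset) < 0.5), value > 0.03)   # prints True True
```

Here the offset stays within half a pixel in every dimension and the interpolated value clears the 0.03 contrast threshold, so the keypoint is accepted.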
SIFT: Remove Low Contrast and Edge Keypoints
This step removes keypoints that are either of low intensity or lying on an edge.
Removing low intensity keypoints is simply a matter of comparing the intensity with a designed threshold; if it's lower than the threshold, the keypoint is removed.
Keypoints near edges have a high value after the DoG even though they might not be robust to small noise. To remedy this, we remove features that have a high edge response but poorly determined locations. Poorly defined peaks in the DoG have a small principal curvature along the edge compared to the principal curvature across it. The principal curvatures are proportional to the eigenvalues of the Hessian, the matrix of second order partial derivatives at the point. The ratio of the two eigenvalues determines whether the feature should be kept.
\[ H = \begin{pmatrix} \dfrac{\partial^2 L}{\partial x^2} & \dfrac{\partial^2 L}{\partial x\, \partial y} \\ \dfrac{\partial^2 L}{\partial y\, \partial x} & \dfrac{\partial^2 L}{\partial y^2} \end{pmatrix}, \qquad r = \lambda_1 / \lambda_2 \]

Instead of computing the eigenvalues directly, a less computationally intensive test is to check

\[ \frac{\operatorname{Tr}(H)^2}{\det(H)} < \frac{(r + 1)^2}{r} \]

We then compare against a threshold \(r_{th}\), which is set to 10.
\frac{(r_{th} + 1)^{2}}{r_{th}} = \frac{(10 + 1)^{2}}{10} = 12.1

If \mathrm{Tr}(H)^{2} / \det(H) is greater than this threshold, the keypoint is removed, since the ratio of principal curvatures is then too large and the point lies on an edge.
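Under the same assumptions as before (a single DoG slice indexed D[y, x], finite-difference derivatives), the edge test can be sketched as:

```python
import numpy as np

def passes_edge_test(D, x, y, r_th=10.0):
    """Hessian-based edge rejection on a single DoG slice D[y, x].

    Keeps the keypoint only if Tr(H)^2 / det(H) < (r_th + 1)^2 / r_th,
    i.e. the ratio of the principal curvatures is at most r_th.
    """
    # 2x2 Hessian by finite differences.
    dxx = D[y, x + 1] - 2 * D[y, x] + D[y, x - 1]
    dyy = D[y + 1, x] - 2 * D[y, x] + D[y - 1, x]
    dxy = (D[y + 1, x + 1] - D[y + 1, x - 1]
           - D[y - 1, x + 1] + D[y - 1, x - 1]) / 4.0
    tr = dxx + dyy
    det = dxx * dyy - dxy * dxy
    if det <= 0:  # curvatures of opposite sign: reject outright
        return False
    return tr * tr / det < (r_th + 1) ** 2 / r_th
```

An isotropic blob passes the test, while a pure edge (curvature in one direction only) fails it.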
SURF: Feature Selection
Local maxima are selected by iterating through a 3x3x3 neighbourhood of every point. This exhaustively compares every point of the scale space with its 26 nearest neighbours.
For all local maxima, a threshold is then applied to eliminate those with a low DoH (determinant of Hessian) value.
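The 26-neighbour comparison can be expressed compactly with SciPy's maximum_filter rather than an explicit triple loop; the DoH stack layout doh[s, y, x] and the threshold value are assumptions of this sketch:

```python
import numpy as np
from scipy.ndimage import maximum_filter

def local_maxima_3d(doh, threshold):
    """Select 3x3x3 local maxima of a DoH scale-space stack doh[s, y, x],
    then discard weak responses below `threshold`."""
    # A point survives if it equals the maximum over its 27-point
    # neighbourhood (itself plus its 26 nearest neighbours).
    neighborhood_max = maximum_filter(doh, size=3, mode='constant', cval=-np.inf)
    mask = (doh == neighborhood_max) & (doh > threshold)
    return np.argwhere(mask)  # rows of (s, y, x) indices
```

The thresholding and the neighbourhood test are done in one pass over the stack, which is typically much faster than per-point Python loops.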
Step 5: Assign Keypoint Orientation
This step is similar to the gauge coordinates concept introduced later. To make the feature rotation
invariant, the gradient of the feature can be used to identify the rotational orientation of the feature
in different images. The magnitude of the gradient of a point, and subsequently its angle can be
calculated using the following formulas.
The gradient angle of every pixel in an area the size of the Gaussian kernel around the feature is calculated. The angle of every pixel is collected into one of 36 bins, each covering a range of 10 degrees of the full 360 degrees, and the gradient magnitude m is added to that bin.
The index of the largest bin (1 to 36) is assigned to the keypoint as its orientation. In addition, any peak that is 80% or more of the highest peak is also assigned to the keypoint, creating new keypoints.
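The histogram construction above might look like this in NumPy; the window radius and the use of simple central differences are assumptions of the sketch, and the bin smoothing and peak interpolation of the full algorithm are omitted:

```python
import numpy as np

def orientation_histogram(L, x, y, radius=8):
    """36-bin gradient-orientation histogram around keypoint (x, y).

    L is a Gaussian-smoothed image L[y, x]; each pixel in the window
    votes into one of 36 bins (10 degrees each) with weight m.
    """
    hist = np.zeros(36)
    for j in range(y - radius, y + radius + 1):
        for i in range(x - radius, x + radius + 1):
            if not (0 < i < L.shape[1] - 1 and 0 < j < L.shape[0] - 1):
                continue  # skip pixels whose neighbours fall outside L
            gx = L[j, i + 1] - L[j, i - 1]
            gy = L[j + 1, i] - L[j - 1, i]
            m = np.hypot(gx, gy)                       # gradient magnitude
            theta = np.degrees(np.arctan2(gy, gx)) % 360.0
            hist[int(theta // 10) % 36] += m           # 10-degree bins
    return hist
```

Peaks within 80% of the highest bin would then spawn additional keypoints, e.g. via np.flatnonzero(hist >= 0.8 * hist.max()).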
Step 6: Generate SIFT Features
A 16x16 window is taken around the keypoint and broken down into 16 4x4 parts (figure 5.4).
FIGURE 5.4: GENERATING KEYPOINTS BASED ON GRADIENTS OF 4X4 WINDOWS
For every 4x4 window, the angle and magnitude of the gradient are calculated and collected into 8 bins of 45 degrees each (figure 5.5), in the same manner as before. This time, however, the gradient magnitude is weighted using a Gaussian weighting function so that gradients further from the keypoint contribute less while those nearer contribute more. Doing this for all 16 of the 4x4 parts gives 4x4x8 = 128 numbers describing the orientation structure of the keypoint.
FIGURE 5.5: STORING GRADIENTS IN 8 BINS 45 DEGREES APART
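A stripped-down version of the descriptor computation, assuming the window is already rotated to the keypoint orientation and omitting the trilinear interpolation and normalisation of the full algorithm:

```python
import numpy as np

def sift_descriptor(L, x, y, sigma=8.0):
    """4x4x8 = 128-number descriptor from a 16x16 window around (x, y).

    Each 4x4 cell accumulates an 8-bin (45 degrees per bin) histogram of
    gradient orientations, with magnitudes weighted by a Gaussian centred
    on the keypoint so that distant gradients contribute less.
    """
    desc = np.zeros((4, 4, 8))
    for dj in range(-8, 8):
        for di in range(-8, 8):
            j, i = y + dj, x + di
            if not (0 < i < L.shape[1] - 1 and 0 < j < L.shape[0] - 1):
                continue
            gx = L[j, i + 1] - L[j, i - 1]
            gy = L[j + 1, i] - L[j - 1, i]
            m = np.hypot(gx, gy)
            theta = np.degrees(np.arctan2(gy, gx)) % 360.0
            w = np.exp(-(di ** 2 + dj ** 2) / (2 * sigma ** 2))  # Gaussian weight
            cell_r, cell_c = (dj + 8) // 4, (di + 8) // 4        # which 4x4 cell
            desc[cell_r, cell_c, int(theta // 45) % 8] += w * m
    return desc.ravel()  # 128-dimensional feature vector
```

The flattened 128-vector is the keypoint's SIFT feature; longer/shorter variants (32, 64, 96) compared in the next section trade descriptor length against matching cost.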
6. PERFORMANCE OF SURF AND SIFT
This section is based on the work of N. Khan, B. McCane and G. Wyvill. It is useful to see how robust the features generated by SIFT and SURF are and which invariances they possess.
6.1 Rotation Invariance
1) The first 500 object images from the dataset are picked and used for testing. These are rotated at different angles in a clockwise direction to generate 500 training images.
2) The imrotate function of MATLAB with bilinear interpolation is used for performing the rotations.
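The same setup can be approximated in Python with scipy.ndimage.rotate, where order=1 selects bilinear interpolation; the angle and test image here are illustrative:

```python
import numpy as np
from scipy.ndimage import rotate

# Rotate a test image by 30 degrees clockwise (negative angle) with
# bilinear interpolation (order=1), keeping the original frame size --
# roughly mirroring MATLAB's imrotate(I, -30, 'bilinear', 'crop').
img = np.random.default_rng(0).random((64, 64))
rotated = rotate(img, angle=-30, reshape=False, order=1, mode='constant')
```

With reshape=False the output keeps the input shape, so corners rotated out of frame are lost, as with imrotate's 'crop' option.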
All the algorithms are robust to rotation.
6.2 Illumination Invariance
The 500 test images are brightened or darkened by adding or subtracting a number of pixel-value offsets (i.e. every pixel's red, green and blue channel intensities are incremented equally).
The sum is clamped to be within 0-255. The offsets tested are: 50, 70, 100, 120, -30, -50, -70, -90.
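The brightening operation amounts to a clamped per-channel addition, e.g.:

```python
import numpy as np

def brighten(img, offset):
    """Add a constant offset to every channel of a uint8 image and
    clamp the result to [0, 255]."""
    # Widen to a signed type first so the addition cannot wrap around.
    return np.clip(img.astype(np.int16) + offset, 0, 255).astype(np.uint8)
```

Widening before the addition matters: adding directly on uint8 would overflow and wrap instead of saturating.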
SIFT with word lengths of 32 and 64 does not work well on a heavily brightened image. SURF, and SIFT with word lengths of 96 and 128, are still robust.
6.3 Noise Invariance
The imnoise function of MATLAB is used with Gaussian, salt-and-pepper and speckle noise to generate the training data:
1. Gaussian noise with σ² = 0.1 and σ² = 0.2
2. Salt-and-pepper noise with densities of 15% and 35%
3. Multiplicative (speckle) white noise with mean 0 and σ² = 0.04
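The three noise models can be approximated directly in NumPy; the helper names and the [0, 1] image range are choices of this sketch, not MATLAB's imnoise:

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_noise(img, var):
    """Additive zero-mean Gaussian noise; image assumed in [0, 1]."""
    return np.clip(img + rng.normal(0.0, np.sqrt(var), img.shape), 0, 1)

def salt_and_pepper(img, density):
    """Set a fraction `density` of pixels to 0 (pepper) or 1 (salt)."""
    out = img.copy()
    mask = rng.random(img.shape) < density
    out[mask] = rng.integers(0, 2, mask.sum())  # random 0 or 1
    return out

def speckle(img, var):
    """Multiplicative noise: img + img * n with n ~ N(0, var)."""
    return np.clip(img + img * rng.normal(0.0, np.sqrt(var), img.shape), 0, 1)
```

Each helper returns a corrupted copy; feeding these into the detectors reproduces the spirit of the noise experiment.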
Interestingly, SIFT(32) does not work well with salt-and-pepper noise. SURF is still robust, as is SIFT with longer word lengths.
6.4 Scale Invariance
The 500 objects from the dataset for which two images were available at different scales are used (i.e. a total of 1000 images).
Interestingly, SURF does not work well here. This is probably due to the approximation of the Gaussian kernel with box kernels. SIFT(32) and above does well here; however, it is also computationally much slower than SURF.
6.5 Viewpoint Invariance
The 500 objects from the dataset for which two images taken from different viewpoints were available are used (i.e. a total of 1000 images).
The first image of each object is used for training while the other is used for testing.
SIFT outperforms SURF here, but not by a large margin.
6.6 Matching
SURF takes the longest time for matching. This is surprising, since SURF is the faster algorithm; it is explained by the fact that SURF also produces more features, and matching these features therefore takes more time.
Using a shorter descriptor also reduces the matching time for SIFT.
Average matching time required to match one query image by all descriptors
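Brute-force matching of descriptors scales with both the number of features and the descriptor length, which explains both observations. A sketch using nearest-neighbour search with a ratio test (the ratio criterion is an assumption of this sketch; the cited comparison's exact matching rule is not specified in this report):

```python
import numpy as np

def match_descriptors(query, train, ratio=0.8):
    """Brute-force nearest-neighbour matching with a ratio test.

    query: (Nq, d) descriptors, train: (Nt, d) with Nt >= 2. Cost grows
    with both the number of features and the descriptor length d, which
    is why SURF's larger feature count and SIFT's longer descriptors
    both slow matching down.
    """
    matches = []
    for qi, q in enumerate(query):
        dists = np.linalg.norm(train - q, axis=1)  # distance to every train descriptor
        order = np.argsort(dists)
        best, second = order[0], order[1]
        if dists[best] < ratio * dists[second]:    # keep unambiguous matches only
            matches.append((qi, best))
    return matches
```

Ambiguous query descriptors, whose two nearest neighbours are about equally close, produce no match at all rather than a doubtful one.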
7. CONCLUSION
We started by looking at the different types of invariance that are useful for features to have. We then studied why the Gaussian kernel is the unique kernel for deriving a scale-space representation given some logical assumptions.
We also identified a whole class of operators in first-order gauge coordinates that are invariant under orthogonal transformations and emphasize useful features such as corners and ridges.
With the ideas of scale space and operators with certain invariances, we then went through the mechanics of the SIFT and SURF algorithms to identify the computations used to obtain the keypoints.
Lastly, we looked at the performance of SIFT- and SURF-generated features and found that they are indeed invariant to the typical real-world transformations discussed in the first chapter.