
Scale Space Methods in Feature Extraction

ISM Report

Isa Nasser [2012]
Examiner: Prof Jiang Xudong
Matric no: G1003149F


CONTENTS

1. Introduction
   1.1 Flow of this report
2. Types of Invariance
   2.1 Illumination
   2.2 Scale
   2.3 Rotation
   2.4 Affine
   2.5 Full Perspective
3. Gaussian Kernel Derivation
   3.1 Axiomatic Derivation of Gaussian Kernel
   3.2 Scale Space from Causality
   3.3 Properties of the Gaussian Kernel
4. Differential Structure of Images
   4.1 Motivation
   4.2 Isophotes
   4.3 Flowlines
   4.4 Coordinate Systems and Transformation
   4.5 Image Transformation and Special Transformation
   4.6 Invariance
   4.7 First Order Gauge Coordinates
5. Feature Extraction Example: SURF and SIFT
   5.1 Properties of SIFT and SURF
   5.2 Mechanics of SIFT and SURF
6. Performance of SURF and SIFT
   6.1 Rotation Invariance
   6.2 Illumination Invariance
   6.3 Noise Invariance
   6.4 Scale Invariance
   6.5 Viewpoint Invariance
   6.6 Matching
7. Conclusion


1. INTRODUCTION

Features are descriptions of an image, or of part of an image, that are invariant to one or more transforms. These transforms must describe the changes we observe when taking two different pictures of the same scene. The human brain recognizes objects that appear in both pictures intuitively. This intuition is not easily derived mathematically: we cannot program a computer to perform the same recognition without first studying the subject and modelling it.

For a computer to recognize the objects shared by the two pictures, it has to identify features that occur in both. The second picture can be seen as a clone of the first with some information removed, some information added, and the remaining information passed through a transform.

What we want to derive is how to identify the existing information in picture 1 and transform it in a way that does not change when the second transform is applied as the second picture is taken. Not all of the information in the first picture has the properties required to make it unchanging, and not all of the information with this property is captured in the second picture: the camera might not physically capture it, or it may go through some transform to which it is not invariant.

1.1 Flow of this report

This report takes a top-down approach. In the first part, we survey the different types of invariance that are useful for a feature to have. We then build an understanding of why the Gaussian kernel is used to derive the scale space of an image, discuss feature-generating functions, and show how we might discover such functions using gauge coordinates.

We then run through the mechanics of two popular algorithms, SIFT and SURF, as modern techniques for feature extraction using the image scale space. Finally, we end by comparing the performance of the two algorithms.


[Figure: roadmap of the report, bottom-up: Types of Invariances → Why Gaussian Kernel? → How to Generate Features → Two Example Algorithms: SURF and SIFT → Performance of SURF and SIFT]


2. TYPES OF INVARIANCE

This section lists the invariances that are generally desirable for a descriptor. The corresponding transformations occur in real life when capturing an image with a camera, and their effects can be seen intuitively in the photographic examples.

    2.1 Illumination

    FIGURE 2.1: IMAGES WITH DIFFERENT ILLUMINATION

Illumination differences occur when there are different light intensities in the images (figure 2.1), causing one to be lighter than the other. It gets more difficult when there are multiple illuminants in the scene, which causes the lighting at various angles to differ. This creates problems in identifying features, since the intensities of the pixels used as features change solely due to the illumination changes.

    2.2 Scale

FIGURE 2.2: IMAGES WITH DIFFERENT SCALES. LEFT IMAGE IS ZOOMED OUT WHILE RIGHT IMAGE IS ZOOMED IN

The scaling transformation can be seen as the convolution of a more zoomed-in image with a Gaussian kernel. The amount of scaling is defined by the standard deviation parameter of the Gaussian kernel; this will be derived later. There are also two scales to define for an image. The


    inner scale is the size of the smallest details that the image is describing relative to another image of

    the same scene. That is, how zoomed in the image is when it is taken.

The second is the outer scale, which is how large an area the image describes compared to another image of the same scene. For example, in figure 2.2, the left image has a smaller inner scale but a larger outer scale compared to the right image.

    2.3 Rotation

    FIGURE 2.3: IMAGES WITH A ROTATIONAL TRANSFORMATION

A rotation can be described with the following transformation matrix:

$$R = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$$

    Having invariance to rotation is desirable since we may roll the camera when taking an image (figure

    2.3).

Many features are invariant to rotation, one of which is the Laplacian, defined as

$$\nabla^2 L(x,y) = \frac{\partial^2 L(x,y)}{\partial x^2} + \frac{\partial^2 L(x,y)}{\partial y^2}$$

    2.4 Affine

    FIGURE 2.4: IMAGES WITH AFFINE TRANSFORMATION

    The affine transformation can be described with the following linear transformation,


$$\begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} e \\ f \end{pmatrix}, \qquad ad - bc \neq 0$$

    We can intuitively visualize this transformation as skewing in either or both the horizontal and

    vertical axes as in figure 2.4.

    2.5 Full Perspective

When we view a scene, our vision is not in Cartesian coordinates. Parallel horizontal lines on the floor do not stay parallel in our vision; they incline towards each other and intersect at the horizon. This is because the image undergoes a transformation from the 3-dimensional Cartesian coordinates to the 4-dimensional homogeneous coordinates.

The transformation matrix that maps the Cartesian coordinates to the 2D image captured by the camera requires information about the camera. Assuming the camera can be modelled as a pinhole camera with focal length f, the camera matrix that transforms physical 3-dimensional Cartesian coordinates (in which we have good intuition) to the 2D image captured is

$$\begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix} = \begin{pmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ 1 \end{pmatrix}$$

with image coordinates $(y_1/y_3,\ y_2/y_3)$. Here y is in the 2-dimensional camera image coordinates while x is in the Cartesian coordinates of the 3-dimensional physical scene. Therefore, when $x_1$, $x_2$ and $x_3$ move between two image captures, $y_1$ and $y_2$ of the two images are transformed. This distortion is the full perspective transformation, which we can see in figure 2.5.

    FIGURE 2.5: IMAGES WITH DIFFERENT PERSPECTIVE DUE TO PROJECTION PLANE


3. GAUSSIAN KERNEL DERIVATION

When we look at an image, the lens in our eye samples it not only pixel by pixel but through different scales. Our mind cannot process every pixel to identify what is in our view. We have evidence for this: after looking at an image, it is impossible for us to even decently reproduce the whole image on a canvas at the inner scale of our choice.

This suggests that our eyes look at an image and take pictures at different blurs, or aperture sizes. This is useful. The popular example is whether we would like to observe a tree or its leaves: when looking at a panorama of trees, we obviously do not need information at the tiny inner scale.

It happens that this blurring is through a Gaussian kernel. The choice of the Gaussian kernel follows purely from a few very intuitive constraints, or assumptions, we make about our view before knowing what it contains.

    FIGURE 3.1: GENERATING SCALE SPACE BY CONVOLUTION WITH GAUSSIAN KERNEL

The variance of the Gaussian kernel is also called the scale of the resulting image. A feature that can be extracted at all scales is called scale invariant. The idea is that this feature will appear even when we capture a photo of it at a distance, that is, at a larger scale. Having this invariance is obviously useful, since what we perceive as a feature in our minds can then be detected at all scales.

Why do we use the Gaussian kernel to build a scale space? This part explains in an intuitive manner why the Gaussian kernel is the unique kernel for defining different scales of an image.

    3.1 Axiomatic Derivation of Gaussian Kernel

We would like to derive the set of mathematical equations that describes the act of our eyes focusing on an image in scale space. When we focus on scenery, we look at it through an aperture of a certain width s; this aperture width is how we focus and is what gives the image scale. It is measured in meters.

There is also the spatial coordinate of the objects, which appears as the frequency ω (dimension meter⁻¹) in the frequency domain. Lastly, we need light to see these objects: L₀ is the luminance of the


scenery (in candela per square meter), while L is the luminance as we perceive it after it enters through the aperture. With those definitions, we can analyse these quantities using dimensional analysis.

            s     ω     L₀    L
Meter       1    -1    -2    -2
Candela     0     0     1     1

$$A = \begin{pmatrix} 1 & -1 & -2 & -2 \\ 0 & 0 & 1 & 1 \end{pmatrix}$$

Find the null space of A, i.e. the exponent vectors x with

$$A\,x = 0, \qquad x = \alpha\,(1,\ 1,\ 0,\ 0)^{T} + \beta\,(0,\ 0,\ 1,\ -1)^{T}$$

Using Buckingham's pi theorem, the only independent dimensionless products are $s\omega$ and $L/L_0$, so the observation must take the form

$$\frac{L}{L_0} = G(s\omega)$$

We have to determine the aperture function G. The strategy is to subject the function to constraints, or axioms, that we know must hold when focusing through an aperture.

The function should be linear and should have no preference for any location. This leads to convolution, where the function is swept across every location without preference:

$$L(x,y) = L_0(x,y) * G(x,y) = \iint L_0(u,v)\, G(x-u,\ y-v)\, du\, dv$$

    This motivates us to look at the variables in the frequency domain where the convolution operator is

    replaced by a simple multiplication,

$$\hat L(\omega_x, \omega_y) = \hat L_0(\omega_x, \omega_y)\, \hat G(\omega_x, \omega_y)$$

Next is the constraint of isotropy, which loosely means the function has to be the same in all directions. We then do not have to account separately for the frequencies along the horizontal and vertical Cartesian axes; only the radial frequency matters:

$$\omega = \sqrt{\omega_x^2 + \omega_y^2}$$


$$\hat L(\omega) = \hat L_0(\omega)\, \hat G(s\omega)$$

Next is the constraint of scale invariance. When we observe multiple times through the aperture, we may use a wider or a narrower aperture, but the aperture function itself does not change:

$$\hat G(\beta_1)\, \hat G(\beta_2) = \hat G(\beta_3), \qquad \beta = s\omega$$

The aperture is applied twice as two observations are made; in the frequency domain this is a multiplication. The resulting observation should be equivalent to observing through an aperture with a wider diameter. The only general solution of this functional equation is the exponential function with a multiplier α, an unknown constant:

$$\exp(\alpha\beta_1)\, \exp(\alpha\beta_2) = \exp\!\left(\alpha(\beta_1 + \beta_2)\right)$$

However, since the β are the dimensionless parameters we derived earlier, we have to raise this parameter to another unknown constant power p:

$$\exp(\alpha\beta_1^p)\, \exp(\alpha\beta_2^p) = \exp\!\left(\alpha(\beta_1^p + \beta_2^p)\right)$$

In short,

$$\hat G(\beta) = \exp(\alpha\beta^p)$$

To find the unknown constant p, we apply the isotropy constraint again to this new equation. The dimensions are independent and thus separable:

$$\vec\beta = \beta_1\vec e_1 + \beta_2\vec e_2 + \beta_3\vec e_3 + \cdots + \beta_I\vec e_I$$

Here $\vec e_i$ is the i-th basis unit vector. Since the dimensions are in the spatial domain and we are using Cartesian unit vectors as the basis, the length β is found by Pythagoras from the projections along the $\vec e_i$:

$$\beta^2 = \beta_x^2 + \beta_y^2 + \beta_z^2$$

For p = 1: $\ \beta_x + \beta_y + \beta_z \neq \left(\beta_x^2 + \beta_y^2 + \beta_z^2\right)^{1/2} = \beta$ in general.

For p = 2: $\ \beta_x^2 + \beta_y^2 + \beta_z^2 = \beta^2$.

We can immediately see that p = 2 makes the components add, and is therefore the correct value for the constant. Now we find α.

The next constraint is that when we open the aperture fully, we blur everything to the point that there is no information left. Mathematically we can represent this as


$$\lim_{\beta \to \infty} \hat G(\beta) = \lim_{\beta \to \infty} \exp\!\left(\alpha\beta^2\right) = 0$$

This can only be true if α is negative. We also know that α has to be real. We shall make the assumption that α = −1/2; it happens that this choice makes the aperture function a solution of the diffusion equation. So far we have

$$\hat G(\beta) = \exp\!\left(-\tfrac{1}{2}\beta^2\right) = \exp\!\left(-\tfrac{1}{2}(s\omega)^2\right)$$

    Now going back to the spatial domain, we shall perform an inverse Fourier transform,

$$g(x, s) = F^{-1}\!\left\{\hat G(s\omega)\right\} = F^{-1}\!\left\{\exp\!\left(-\tfrac{1}{2} s^2\omega^2\right)\right\} \propto \exp\!\left(-\frac{x^2}{2 s^2}\right)$$

Finally, the last constraint is that the aperture function is normalized, such that its integral over the whole spatial domain equals 1. To achieve this, we integrate g(x, s) and divide by that value:

$$\int_{-\infty}^{\infty} \exp\!\left(-\frac{x^2}{2s^2}\right) dx = s\sqrt{2\pi}$$

The final result is therefore

$$g(x, s) = \frac{1}{s\sqrt{2\pi}} \exp\!\left(-\frac{x^2}{2s^2}\right)$$

    This is the Gaussian kernel. It shows that the Gaussian function is the unique aperture function or

    kernel given the constraints or axioms we deduce from our understanding of the process of visual

    observation.
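As a quick numerical illustration of this result (not part of the original derivation), the following sketch in Python with NumPy builds the 1-D kernel and checks the normalization constraint; the function name gaussian_kernel and the chosen grid are ours:

import numpy as np

def gaussian_kernel(x, sigma):
    # Normalized 1-D Gaussian aperture g(x; sigma) derived above
    return np.exp(-x**2 / (2.0 * sigma**2)) / (sigma * np.sqrt(2.0 * np.pi))

x = np.linspace(-50.0, 50.0, 100001)
g = gaussian_kernel(x, sigma=3.0)
dx = x[1] - x[0]
print(g.sum() * dx)  # ~1.0: the kernel integrates to unity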

3.2 Scale Space from Causality

This is from Koenderink's paper "The structure of images" [Koenderink1984a]. The argument in this paper is simple but rather useful in deriving the Gaussian kernel.

When we increase the scale and blur the image further, the final blurred image is caused by the image we started from. That is, we cannot introduce new information, or structure as it is called in the paper, into the image. A higher scale of an image must always contain less structure than a lower scale of the same image. It is not possible for new structures to form, since we are just observing.


This is illustrated clearly for the 1-D case in figure 3.2:

    FIGURE 3.2: LESS STRUCTURE MEANS THE PEAK WILL HAVE TO DECREASE WITHOUT NEW PEAKS FORMING

The arrows illustrate the direction in which the 1-D signal moves as the scale is increased. The signal should get smoother, and new structures such as a new peak should not form. The arrows therefore show the only possible outcome satisfying this condition.

Describing this mathematically:

$$\frac{\partial u}{\partial t} < 0 \ \text{ where }\ \frac{\partial^2 u}{\partial x^2} < 0$$

intensity decreases wherever the signal has negative spatial curvature (the downward arrows in figure 3.2), and

$$\frac{\partial u}{\partial t} > 0 \ \text{ where }\ \frac{\partial^2 u}{\partial x^2} > 0$$

intensity increases wherever the signal has positive spatial curvature (the upward arrows in figure 3.2). To summarize,

$$\frac{\partial u}{\partial t} \cdot \frac{\partial^2 u}{\partial x^2} > 0$$

The next axiom we impose is that the rate of change of the intensity at a particular location, as we rescale the signal, must be proportional to the spatial curvature of the intensity there. This implies the function relating $\frac{\partial u}{\partial t}$ and $\frac{\partial^2 u}{\partial x^2}$ is not only linear, but proportional:

$$\frac{\partial^2 u}{\partial x^2} = \alpha\, \frac{\partial u}{\partial t}$$

We can set α = 1, since we are free to define the physical measure of the amount of rescaling (or time elapsed) per numerical unit:

$$\frac{\partial^2 u}{\partial x^2} = \frac{\partial u}{\partial t}$$


This is the linear isotropic diffusion equation. The Green's function of this equation is the Gaussian kernel, which means that any function (for example an image) to which diffusion is applied is convolved with the Gaussian kernel. We can see this from the following relation.

Let $\sigma^2 = 2t$; that is, the variance of the Gaussian kernel plays the role of time in the diffusion process. We can first agree to this intuitively: as time increases, the variance of the blurring kernel should increase, since the function gets more and more diffused. The kernel becomes

$$u(x,t) = \frac{1}{\sqrt{4\pi t}} \exp\!\left(-\frac{x^2}{4t}\right)$$

Substituting it into the diffusion equation, both sides evaluate to

$$\frac{\partial^2 u}{\partial x^2} = \frac{\partial u}{\partial t} = \left(\frac{x^2}{4t^2} - \frac{1}{2t}\right) \frac{1}{\sqrt{4\pi t}} \exp\!\left(-\frac{x^2}{4t}\right)$$

so this turns out to be true.

    This can be extended to the 2 dimensional case which will not be elaborated in this report.
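As an illustrative sketch of the causality argument (our own check, not from the report), we can count the extrema of a 1-D signal as it is blurred with increasing scale and observe that the count never increases; the signal and scales below are arbitrary choices:

import numpy as np
from scipy.ndimage import gaussian_filter1d

rng = np.random.default_rng(0)
signal = rng.standard_normal(512).cumsum()  # an arbitrary rough 1-D signal

def count_extrema(u):
    d = np.sign(np.diff(u))
    d = d[d != 0]
    return int(np.sum(d[1:] != d[:-1]))  # slope sign changes = peaks + valleys

for sigma in (1, 2, 4, 8, 16):
    print(sigma, count_extrema(gaussian_filter1d(signal, sigma)))
# The number of extrema is non-increasing with scale: no new structure forms.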

    3.3 Properties of the Gaussian Kernel

Normalized Kernel and Average Grey Level Invariance

$$g(x;\sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{x^2}{2\sigma^2}\right)$$

The constant $\frac{1}{\sigma\sqrt{2\pi}}$ in the Gaussian kernel is the normalization constant. It causes the integral of the Gaussian kernel over all x to be unity, since

$$\int_{-\infty}^{\infty} \exp\!\left(-\frac{x^2}{2\sigma^2}\right) dx = \sigma\sqrt{2\pi}$$

Since the area underneath the kernel is constant, increasing the width (the variance) of the kernel will both widen and shorten it; that is, increasing the variance decreases its amplitude. Normalization ensures that the average grey level of the image remains the same after blurring. This property is known as average grey level invariance.

Self-Similarity: the Cascade Property

When we convolve two Gaussian functions with different variances, the result is a Gaussian whose variance is the sum of the two variances:

$$G(x; \sigma_1^2) * G(x; \sigma_2^2) = G(x; \sigma_1^2 + \sigma_2^2)$$


This is a useful property: convolving twice with the same Gaussian kernel produces the same effect as convolving once with a wider Gaussian kernel of twice the variance.
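A minimal numerical check of the cascade property (our own sketch; the impulse signal and sigmas are arbitrary):

import numpy as np
from scipy.ndimage import gaussian_filter1d

impulse = np.zeros(1001)
impulse[500] = 1.0

twice = gaussian_filter1d(gaussian_filter1d(impulse, 2.0), 3.0)
once = gaussian_filter1d(impulse, np.hypot(2.0, 3.0))  # sqrt(2^2 + 3^2)
print(np.abs(twice - once).max())  # ~0: cascading equals one wider kernel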

The Scale Parameter

In order to avoid the addition of squares when cascading kernels, substitute

$$2\sigma^2 = t$$

to give a neater N-dimensional Gaussian kernel:

$$G(\vec x;\, t) = \frac{1}{(\sqrt{\pi t})^{N}} \exp\!\left(-\frac{|\vec x|^2}{t}\right)$$

To give an even neater kernel, we re-parameterize the x axis:

$$\tilde x = \frac{x}{\sqrt{t}}$$

$$G(\tilde x;\, t) = \frac{1}{(\sqrt{\pi t})^{N}} \exp\!\left(-\tilde x^2\right)$$

We call this particular re-parameterization of the coordinates the natural coordinates. Increasing or decreasing $\tilde x$ by one unit corresponds to moving in steps of $\sqrt{t} = \sigma\sqrt{2}$ in x. This parameterization removes the variance or scale factor (the aperture width) from the spatial coordinates.

In the limit of the scale factor going to infinity, the blurred image has homogeneous intensity, equal to the average intensity of the image before blurring.

Relation to Generalized Functions

Taking the limit of the scale factor to zero produces a Dirac delta function:

$$\lim_{\sigma \to 0} \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{x^2}{2\sigma^2}\right) = \delta(x)$$

The cumulative distribution function of the Gaussian kernel is the error function:

$$\int_{-\infty}^{x} \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{u^2}{2\sigma^2}\right) du = \frac{1}{2}\left(1 + \operatorname{erf}\!\left(\frac{x}{\sigma\sqrt{2}}\right)\right)$$


4. DIFFERENTIAL STRUCTURE OF IMAGES

4.1 Motivation

Given a 2D image L, the transformation

$$\sqrt{\left(\frac{\partial L}{\partial x}\right)^{2} + \left(\frac{\partial L}{\partial y}\right)^{2}}$$

is a good edge detector, and

$$\frac{\partial^2 L}{\partial y^2}\left(\frac{\partial L}{\partial x}\right)^{2} - 2\,\frac{\partial L}{\partial x}\,\frac{\partial L}{\partial y}\,\frac{\partial^2 L}{\partial x\,\partial y} + \frac{\partial^2 L}{\partial x^2}\left(\frac{\partial L}{\partial y}\right)^{2}$$

is a good corner detector. An obvious question is how these expressions are produced, and whether there is a method to constrain the possible differential expressions that can produce features on the image. This section explores a strategy for isolating a set of such expressions that describe useful structures in the image.
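These two expressions can be evaluated directly from Gaussian derivatives of the image. A minimal sketch in Python with SciPy (the function name and scale are our choices; axis 0 is taken as y and axis 1 as x):

import numpy as np
from scipy.ndimage import gaussian_filter

def edge_and_corner(L, sigma=2.0):
    # Gaussian derivatives of the image at scale sigma
    Lx = gaussian_filter(L, sigma, order=(0, 1))
    Ly = gaussian_filter(L, sigma, order=(1, 0))
    Lxx = gaussian_filter(L, sigma, order=(0, 2))
    Lyy = gaussian_filter(L, sigma, order=(2, 0))
    Lxy = gaussian_filter(L, sigma, order=(1, 1))
    edge = np.sqrt(Lx**2 + Ly**2)                        # gradient magnitude
    corner = Lyy * Lx**2 - 2 * Lxy * Lx * Ly + Lxx * Ly**2
    return edge, corner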

    4.2 Isophotes

We can look at the intensities of a grayscale image as heights: the higher the intensity, the taller the height, and vice versa. The image can then be divided into contiguous regions of the same height. Isophotes are the closed curves that outline these regions. All isophotes together contain all the information in the image.

    Properties of Isophotes

1. Isophotes are closed curves. Most isophotes in 2D are non-self-intersecting planar curves called Jordan curves.
2. An isophote can intersect itself in the critical case.
3. Isophotes do not intersect other isophotes.
4. The shape of an isophote is independent of grayscale transforms such as changes of brightness and contrast.

A special class of isophotes occurs where an isophote passes through a singularity: a maximum, a minimum or a saddle point. An isophote can only intersect itself when it goes through a saddle point.

    4.3 Flowlines

Flowlines are lines everywhere perpendicular to the isophotes. Flowlines are the dual of isophotes, and isophotes the dual of flowlines; one can be calculated from the other.

    4.4 Coordinate Systems and Transformation

The set of derivatives taken at a particular location can be thought of as a language with which to define a local feature. We can create expressions from derivatives of any order and combination. For


example, using a Taylor series we can take derivatives of increasing order at a point and multiply them by a small offset to describe the structure at that offset.

4.5 Image Transformation and Special Transformation

An image transformation is described by a transformation matrix. For example, the rotational transformation matrix is

$$R = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$$

When an image is transformed, the volume is often changed and the density is redistributed over different areas. To calculate this change in volume we use the Jacobian matrix

$$J = \frac{\partial x'}{\partial x}$$

where $x'$ denotes the transformed image coordinates and $x$ the original image coordinates. Looking at the change of an infinitesimally small volume,

$$dV' = \left|\det J\right|\, dV$$

we can see that the determinant of the Jacobian matrix is the factor by which the volume changes. If the determinant is 1, we call the transformation a special transformation.

4.6 Invariance

When looking for features in an image, we would like them to be detected regardless of whether the image is rotated or translated. It is quite obvious why this is necessary: features that depend on an absolute orientation will be rare and not repeatable across practical photographs taken with different tilts. If a feature needs very special conditions to be detected, it can hardly be called a feature. This is why we need features to be invariant to rotation and translation.


Example

$L_w = \sqrt{L_x^2 + L_y^2}$ is the gradient magnitude. Using Mathematica to substitute rotated coordinates and simplify the expression shows that the function does not change after rotation, and is thus rotationally invariant.
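The report performs this check in Mathematica; an equivalent symbolic check in Python's sympy might look like the following (the rotated-coordinate substitution is ours):

import sympy as sp

x, y, th = sp.symbols('x y theta', real=True)
L = sp.Function('L')

# Rotate the coordinates: (u, v) = R(theta) (x, y)
u = sp.cos(th) * x - sp.sin(th) * y
v = sp.sin(th) * x + sp.cos(th) * y

grad_sq = sp.diff(L(u, v), x)**2 + sp.diff(L(u, v), y)**2
print(sp.simplify(grad_sq))
# theta drops out: the squared gradient magnitude is unchanged by rotation,
# so sqrt(Lx^2 + Ly^2) is rotationally invariant.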

4.7 First Order Gauge Coordinates

The usual global coordinates (the X and Y Cartesian coordinates) change their orientation relative to the image when an orthogonal transformation is applied. Creating feature-generating functions in these coordinates is difficult, since such functions would usually not be invariant to those transformations.

If we instead localize the coordinate frame so that it takes its orientation from properties of the location that are invariant under orthogonal transformations, then functions created in these coordinates will in turn be invariant. These localized coordinates are the gauge coordinates.

The direction of maximal change of intensity is the gradient:

$$\vec w = \left(\frac{\partial L(x,y)}{\partial x},\ \frac{\partial L(x,y)}{\partial y}\right)$$

To find the orthogonal coordinate we rotate this vector by 90 degrees clockwise:

$$\vec v = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} \vec w = \left(\frac{\partial L(x,y)}{\partial y},\ -\frac{\partial L(x,y)}{\partial x}\right)$$

Since the gauge coordinates are localized and derived from the intensities around the point, any derivative expressed in gauge coordinates is an orthogonal invariant.

For example, we differentiate the intensity L with respect to w:

$$\frac{\partial L(v,w)}{\partial w} = L_w$$

This is the gradient magnitude at the point, which is the defining property of w itself. $L_w$ is an invariant.


Now we try differentiating the intensity with respect to v:

$$\frac{\partial L(v,w)}{\partial v} = 0$$

This is obvious, since the gradient points along w, which is perpendicular to v. Zero is also an invariant.

    It is useful to express these invariants in Cartesian coordinates instead of gauge coordinates for us to

    identify these points easily in a computer.

The family of features that can be derived from partial derivatives in the first order gauge coordinates can be written as

$$L_{v^k w^l} = \frac{\partial^{\,k+l} L(x,y)}{\partial v^k\, \partial w^l}$$

    Here we start to see some of the famous invariant features:

TABLE 4-1: FEATURES WHICH ARE FUNCTIONS OF PARTIAL DERIVATIVES OF FIRST ORDER GAUGE COORDINATES

Operator in Gauge Coordinates    Description
$L_{vv}$                         Ridge detector
$L_{vv}/L_w$                     Isophote curvature
$L_{vv}L_w^2$                    Corner detection
$L_{vv}/L_w^3$                   Affine invariant corner detection
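To identify these invariants in a computer, they are expressed back in Cartesian derivatives, for instance $L_w = \sqrt{L_x^2 + L_y^2}$ and $L_{vv} = (L_{xx}L_y^2 - 2L_{xy}L_xL_y + L_{yy}L_x^2)/(L_x^2 + L_y^2)$. A sketch of this in Python with SciPy (the names and the small eps guard against division by zero are ours):

import numpy as np
from scipy.ndimage import gaussian_filter

def gauge_derivatives(L, sigma=2.0, eps=1e-12):
    Lx = gaussian_filter(L, sigma, order=(0, 1))
    Ly = gaussian_filter(L, sigma, order=(1, 0))
    Lxx = gaussian_filter(L, sigma, order=(0, 2))
    Lyy = gaussian_filter(L, sigma, order=(2, 0))
    Lxy = gaussian_filter(L, sigma, order=(1, 1))
    Lw2 = Lx**2 + Ly**2
    Lw = np.sqrt(Lw2)                                    # gradient magnitude
    Lvv = (Lxx * Ly**2 - 2 * Lxy * Lx * Ly + Lyy * Lx**2) / (Lw2 + eps)
    return Lw, Lvv, Lvv * Lw2                            # last: corner detector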


5. FEATURE EXTRACTION EXAMPLE: SURF AND SIFT

5.1 Properties of SIFT and SURF

Invariant to:
1. Uniform scaling
2. Orientation

Partially invariant to:
1. Affine distortion
2. Illumination

    5.2 Mechanics of SIFT and SURF

SIFT and SURF can be separated into three processes:

a. Identifying keypoints (steps 1 to 3)
b. Discarding weak keypoints (step 3)
c. Attaching an orientation to each keypoint and packing it into a data structure (steps 4 and 5)

    Step 1: Create Scale Space out of the image

In SIFT, the scale space is generated by convolving the image with Gaussian kernels of increasing scale. In SURF, this convolution is approximated for computational efficiency. It involves first generating the integral image, which is just the cumulative integration of the image,


followed by the convolution of the image with a box filter. Given the prepared integral image, the sum of the image over any box, and hence the box-filter convolution at a point, can be computed with just three additions and subtractions of integral image values.

    FIGURE 5.1: INTEGRAL IMAGE WITH BOX FILTER CONVOLUTION FORMULA

The variance of a box filter with square support of side l is l²/12 along each axis.
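A minimal sketch of the integral image and the three-operation box sum (Python with NumPy; the zero-padding convention is our choice):

import numpy as np

def integral_image(img):
    # Cumulative sum over rows then columns, with a zero border for indexing
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def box_sum(ii, r0, c0, r1, c1):
    # Sum of img[r0:r1, c0:c1] in three additions/subtractions
    return ii[r1, c1] - ii[r0, c1] - ii[r1, c0] + ii[r0, c0]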


FIGURE 5.2: COMPARISON OF GAUSSIAN BLUR AND ITS ESTIMATE USING SURF

    There are some visible vertical artefacts on the approximation, but it is quite close to the Gaussian

    blur.


    Step 2: Compute the Second Order Derivative of the Scale Space

    Scale-invariant features may be detected by using scale-normalized second order derivatives on the

    scale space representation.

SIFT computes the Gaussian scale space and then aims to obtain its second order derivative, the Laplacian. SURF does not compute the scale space of the Gaussian kernel; instead, it directly computes the scale space of an estimate of the second order derivative of the Gaussian kernel.

SIFT: Laplacian

The Laplacian of the Gaussian-blurred image is simply its second spatial derivative,

$$\nabla^2 L(x, y, t)$$

and it identifies the edges of the image. Blurring the image first smooths it out, which makes the edge detection more stable and effective. Computing the Laplacian is computationally intensive; SIFT tackles this by estimating it with the difference of Gaussians shown in figure 5.3.

    FIGURE 5.3: DIFFERENCE OF GAUSSIAN APPROXIMATES THE LAPLACIAN OF GAUSSIANS

$$DoG(x, y;\, \sigma_1, \sigma_2) = \frac{1}{2\pi\sigma_1^2} \exp\!\left(-\frac{x^2+y^2}{2\sigma_1^2}\right) - \frac{1}{2\pi\sigma_2^2} \exp\!\left(-\frac{x^2+y^2}{2\sigma_2^2}\right)$$

    It happens that this is a good approximation for the Laplacian of Gaussians.


From the diffusion equation, $\frac{\partial L}{\partial t} = \nabla^2 L$, so a finite difference across neighbouring scales t and kt gives

$$\nabla^2 L(x,y;t) \approx \frac{L(x,y;kt) - L(x,y;t)}{kt - t}$$

and therefore

$$L(x,y;kt) - L(x,y;t) \approx (k-1)\, t\, \nabla^2 L(x,y;t)$$

The difference of two Gaussian-blurred images approximates the scale-normalized Laplacian up to the constant factor (k − 1), which is the same for all scales.
This is therefore what is used in SIFT.
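A compact sketch of the DoG computation (Python with SciPy; sigma0 = 1.6 and k = 2^(1/3) are typical choices from the SIFT literature, not values fixed by this report):

import numpy as np
from scipy.ndimage import gaussian_filter

def dog_stack(img, sigma0=1.6, k=2 ** (1 / 3), levels=5):
    # Adjacent Gaussian blurs; their differences approximate (k-1) t Laplacian
    sigmas = [sigma0 * k**i for i in range(levels + 1)]
    blurred = [gaussian_filter(img.astype(float), s) for s in sigmas]
    return [b2 - b1 for b1, b2 in zip(blurred, blurred[1:])]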

SURF: Determinant of Hessian

To find the Hessian matrix of the scale space of the image, we need its second order partial derivatives, which is computationally intensive. As an approximation, SURF replaces the second order Gaussian derivative kernels with box filters, whose convolution with the image can be computed very quickly on the integral image.


The determinant of the approximated Hessian is then computed as

$$\det(H_{\text{approx}}) = D_{xx}\, D_{yy} - \left(r\, D_{xy}\right)^2$$

where $D_{xx}$, $D_{yy}$ and $D_{xy}$ are the box-filter responses, and the four lobes of the $D_{xy}$ filter have the supports

$$\Omega_1 = [0,l] \times [0,l], \quad \Omega_2 = [-l,0] \times [0,l], \quad \Omega_3 = [0,l] \times [-l,0], \quad \Omega_4 = [-l,0] \times [-l,0]$$

The filter responses are divided by their area for normalization, due to the variance of the kernel. The weight r is the ratio of the Frobenius norms (the square root of the sum of the squares of the coefficients of a matrix) of the approximated and exact Hessian of Gaussian kernels. This is of course pre-calculated.
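In code, the response reduces to one line once the box-filter outputs are available (our sketch; r = 0.9 is the weight reported in the SURF paper):

def doh_response(Dxx, Dyy, Dxy, r=0.9):
    # Approximated determinant of the Hessian from box-filter responses;
    # works elementwise on NumPy arrays of responses as well as scalars.
    return Dxx * Dyy - (r * Dxy) ** 2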

    Step 3: Feature Selection

SIFT: Find the minima/maxima of sub-pixels in the DoG

Sub-pixel refers to interpolation between pixels. This is useful since the extrema often do not lie exactly on a pixel. To obtain the sub-pixel position, take the Taylor series at a point over three dimensions: x and y as well as the scale σ.


$$D(\mathbf{x}) = D + \frac{\partial D}{\partial \mathbf{x}}^{T} \mathbf{x} + \frac{1}{2}\, \mathbf{x}^{T}\, \frac{\partial^2 D}{\partial \mathbf{x}^2}\, \mathbf{x}$$

Here $\mathbf{x} = (x, y, \sigma)$ is the offset in the three dimensions and D is the 3-dimensional scale space centred on the candidate. Taking the first derivative of this expression and setting it to zero gives the offset of the potential extremum:

$$\hat{\mathbf{x}} = -\left(\frac{\partial^2 D}{\partial \mathbf{x}^2}\right)^{-1} \frac{\partial D}{\partial \mathbf{x}}$$

If any component of $\hat{\mathbf{x}}$ is larger than 0.5, the extremum lies closer to a neighbouring sample point; we move to that candidate point and repeat the sub-pixel interpolation there using the Taylor series.

If the magnitude of the interpolated value at the final extremum, $|D(\hat{\mathbf{x}})|$, is below 0.03, remove it as a candidate keypoint. Otherwise, add it as a candidate keypoint.
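A sketch of the refinement step (Python with NumPy; grad and hessian are assumed to be finite-difference estimates of the first and second derivatives of D at the candidate):

import numpy as np

def refine_keypoint(grad, hessian):
    # Solve (d2D/dx2) x_hat = -dD/dx for the sub-pixel offset x_hat
    x_hat = -np.linalg.solve(hessian, grad)
    move_on = np.any(np.abs(x_hat) > 0.5)  # True: re-run at the neighbour
    return x_hat, move_on

def value_at_offset(D, grad, x_hat):
    # Interpolated |D(x_hat)| used for the 0.03 contrast test
    return D + 0.5 * float(grad @ x_hat)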

SIFT: Remove Low Contrast and Edge Keypoints

This step removes keypoints that are either of low intensity or lying on an edge.

Removing low-intensity keypoints is simply a matter of comparing the intensity with a designed threshold: if it is lower than the threshold, remove the keypoint.

Keypoints near edges have a high DoG response even though they may not be robust to small noise. To remedy this, we remove features that have a high edge response but a poorly determined location. Poorly defined peaks in the DoG have a small principal curvature along the edge compared to the principal curvature across it. The principal curvatures are proportional to the eigenvalues of the Hessian, the matrix of second order partial derivatives at the point, and the ratio of the two eigenvalues determines whether the feature should be kept:

$$H = \begin{pmatrix} D_{xx} & D_{xy} \\ D_{xy} & D_{yy} \end{pmatrix}, \qquad r = \lambda_1 / \lambda_2$$

Instead of computing the eigenvalues explicitly, a less computationally intensive approach is to use

$$\frac{\operatorname{Tr}(H)^2}{\det(H)} = \frac{(r+1)^2}{r}$$

We then compare this to a threshold $r_{th}$, which is set equal to 10.


$$\frac{(r_{th}+1)^2}{r_{th}} = \frac{(1+10)^2}{10} = 12.1$$

If $\operatorname{Tr}(H)^2 / \det(H)$ exceeds this threshold, the keypoint is removed.
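A sketch of this test (our own; Dxx, Dyy and Dxy are the second differences of the DoG at the keypoint):

def edge_like(Dxx, Dyy, Dxy, r_th=10.0):
    # Edge criterion on the 2x2 Hessian of the DoG at the keypoint
    tr = Dxx + Dyy
    det = Dxx * Dyy - Dxy**2
    if det <= 0:                # curvatures of opposite sign: discard
        return True
    return tr * tr / det > (r_th + 1) ** 2 / r_th   # 12.1 for r_th = 10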

SURF: Feature Selection

Local maxima are selected by iterating over a 3×3×3 neighbourhood of every point, exhaustively comparing each point of the scale space with its 26 nearest neighbours. For all local maxima, a threshold is then applied to eliminate those with a low DoH value.
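A sketch of the 26-neighbour selection (Python with SciPy; doh is the 3-D stack of DoH responses over scale, and the threshold is a free parameter):

import numpy as np
from scipy.ndimage import maximum_filter

def select_keypoints(doh, threshold):
    # A point survives if it dominates its 3x3x3 (26-neighbour) block
    is_peak = doh == maximum_filter(doh, size=3)
    return np.argwhere(is_peak & (doh > threshold))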

Step 4: Assign Keypoint Orientation

This step is similar to the gauge coordinate concept introduced earlier. To make the feature rotation invariant, the gradient around the feature is used to identify its rotational orientation in different images. The magnitude m of the gradient at a point, and subsequently its angle θ, can be calculated from pixel differences:

$$m(x,y) = \sqrt{\left(L(x{+}1,y) - L(x{-}1,y)\right)^2 + \left(L(x,y{+}1) - L(x,y{-}1)\right)^2}$$

$$\theta(x,y) = \tan^{-1}\frac{L(x,y{+}1) - L(x,y{-}1)}{L(x{+}1,y) - L(x{-}1,y)}$$

The gradient angle of every pixel in an area the size of the Gaussian kernel around the feature is calculated. The angle of each pixel is collected into one of 36 bins, each covering a range of 10 degrees of the full 360 degrees, and the magnitude of the gradient m is added to the bin.

The index (1 to 36) of the largest bin is assigned to the keypoint as its orientation. In addition, any peak that is 80% or more of the highest peak spawns a new keypoint with that orientation.
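A sketch of the orientation histogram (our own; mag and angle are arrays of gradient magnitudes and angles, in degrees, over the region around the keypoint; bins are 0-indexed here):

import numpy as np

def orientation_peaks(mag, angle, bins=36):
    idx = (np.asarray(angle) % 360 / (360 / bins)).astype(int)  # 10-degree bins
    hist = np.bincount(idx.ravel(), weights=np.asarray(mag).ravel(),
                       minlength=bins)
    return np.flatnonzero(hist >= 0.8 * hist.max())  # bins that spawn keypoints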

Step 5: Generate SIFT Features

A 16×16 window is taken around the keypoint and broken down into sixteen 4×4 parts (figure 5.4).


    FIGURE 5.4: GENERATING KEYPOINTS BASED ON GRADIENTS OF 4X4 WINDOWS

For every 4×4 window, the angle and magnitude of the gradient are calculated and collected into 8 bins of 45 degrees each (figure 5.5), in a similar manner as before. This time, however, the gradient magnitude is weighted with a Gaussian weighting function, so that gradients further from the keypoint contribute less while nearer gradients contribute more. Doing this for all sixteen 4×4 parts gives 4×4×8 = 128 numbers describing the orientation structure of the keypoint.

FIGURE 5.5: STORING GRADIENTS IN 8 BINS 45 DEGREES APART
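A sketch of the descriptor assembly (our own; mag and angle are 16×16 arrays of Gaussian-weighted gradient magnitudes and angles, in degrees, measured relative to the keypoint orientation):

import numpy as np

def sift_descriptor(mag, angle):
    desc = np.zeros((4, 4, 8))
    bins = (np.asarray(angle) % 360 // 45).astype(int)  # 8 bins of 45 degrees
    for r in range(16):
        for c in range(16):
            desc[r // 4, c // 4, bins[r, c]] += mag[r, c]
    return desc.ravel()  # 4 x 4 x 8 = 128 numbers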


6. PERFORMANCE OF SURF AND SIFT

This section is based on the work of N. Khan, B. McCane and G. Wyvill. It is useful to see how robust the features generated by SIFT and SURF are, and which invariances they possess.

    6.1 Rotation Invariance

1) The first 500 object images from the dataset are picked and used for testing, then rotated at different angles in a clockwise direction to generate 500 trained images.
2) The imrotate function of MATLAB with bilinear interpolation is used for performing the rotations.

    All the algorithms are robust.

    6.2 Illumination Invariance

The 500 test images are brightened or darkened by adding a number of pixel-value offsets (i.e. every pixel's red, green and blue channel intensities are incremented equally). The sum is clamped to the range 0-255. The offsets tested are: +50, +70, +100, +120, -30, -50, -70, -90.


SIFT with 32 and 64 word lengths does not work well on heavily brightened images. SURF, and SIFT with 96 and 128 word lengths, are still robust.

6.3 Noise Invariance

The imnoise function of MATLAB is used to generate the trained data with Gaussian, salt-and-pepper and speckle noise:

1. Gaussian noise with σ² = 0.1 and σ² = 0.2
2. Salt-and-pepper noise with densities of 15% and 35%
3. Multiplicative white (speckle) noise with mean 0 and σ² = 0.04

Interestingly, SIFT(32) does not work well with salt-and-pepper noise. SURF is still robust, as are the longer word length variants of SIFT.

6.4 Scale Invariance

500 objects are taken from the dataset for which two images were available at different scales (i.e. a total of 1000 images).


Interestingly, SURF does not work well here. This is probably due to the approximation of the Gaussian kernel with box kernels. SIFT(32) and above do well here; however, SIFT is also computationally much slower than SURF.

6.5 Viewpoint Invariance

500 objects are taken from the dataset for which two images were available taken from different viewpoints (i.e. a total of 1000 images). The first object image is used for training while the other is used for testing.

SIFT outperforms SURF here, but not by a large margin.

6.6 Matching


SURF takes the longest time for matching. This is surprising, since SURF is the faster algorithm; the explanation is that SURF also produces more features, and matching these features therefore takes more time. For SIFT, using a shorter descriptor shortens the matching time.

    Average matching time required to match one query image by all descriptors


7. CONCLUSION

We started by looking at the different types of invariance that are useful for features to have. We then studied why the Gaussian kernel is the unique kernel for deriving a scale space representation, given some logical assumptions.

We also identified a whole class of operators in first order gauge coordinates that are orthogonal invariants and emphasize useful structures such as corners and ridges.

With the ideas of scale space and operators with certain invariances in place, we went through the mechanics of the SIFT and SURF algorithms to identify the computations used to obtain the keypoints.

Lastly, we looked at the performance of SIFT- and SURF-generated features and found that they are indeed invariant to the typical real-world transformations discussed in the first chapter.