
Scale Space Methods in Feature Extraction

ISM Report

Isa Nasser [2012]
Examiner: Prof Jiang Xudong
Matric no: G1003149F


CONTENTS

1. Introduction
   1.1 Flow of this report
2. Types of Invariance
   2.1 Illumination
   2.2 Scale
   2.3 Rotation
   2.4 Affine
   2.5 Full Perspective
3. Gaussian Kernel Derivation
   3.1 Axiomatic Derivation of Gaussian Kernel
   3.2 Scale Space from Causality
   3.3 Properties of the Gaussian Kernel
4. Differential Structure of Images
   4.1 Motivation
   4.2 Isophotes
   4.3 Flowlines
   4.4 Coordinate Systems and Transformation
   4.5 Image Transformation and Special Transformation
   4.6 Invariance
   4.7 First Order Gauge Coordinates
5. Feature Extraction Example: SURF and SIFT
   5.1 Properties of SIFT and SURF
   5.2 Mechanics of SIFT and SURF
6. Performance of SURF and SIFT
   6.1 Rotation Invariance
   6.2 Illumination Invariance
   6.3 Noise Invariance
   6.4 Scale Invariance
   6.5 Viewpoint Invariance
   6.6 Matching
7. Conclusion


1. INTRODUCTION

Features are descriptions of an image, or of part of an image, that are invariant to one or more transforms. These transforms must describe the changes we observe when taking two different pictures of the same scene. The human brain recognizes objects that appear in both pictures intuitively. This intuition is not easily derived mathematically: we cannot program a computer to perform the same recognition without first studying the subject and modelling it.

For a computer to recognize the objects shared by the two pictures, it has to identify features that occur in both. The second picture can be seen as a clone of the first with some information removed, some information added, and the remaining information passed through a transform.

What we want to derive is how to identify the existing information in picture 1 and transform it in a way that does not change when the second transform is applied as the second picture is taken. Not all of the information in the first picture has the properties required to make it unchanging, and not all of the information with this property is captured in the second picture: the camera might not physically capture it, or it may go through some transform to which it is not invariant.

1.1 Flow of this report

This report takes a top-down approach. In the first part, we survey the different types of invariance that are useful for a feature to have. We then build an understanding of why the Gaussian kernel is used to derive the scale space of an image, discuss feature-generating functions, and show how we might discover such functions using gauge coordinates.

We then run through the mechanics of two popular algorithms, SIFT and SURF, as modern techniques for feature extraction using the image scale space. Finally, we end by comparing the performance of the two algorithms.


[Figure: roadmap of the report, bottom-up: Types of Invariances → Why Gaussian Kernel? → How to Generate Features → Two Example Algorithms: SURF and SIFT → Performance of SURF and SIFT]


2. TYPES OF INVARIANCE

This section lists the invariances that are generally desirable for a descriptor. The corresponding transformations occur in real life when capturing an image with a camera, and their effects can be seen intuitively in the photographic examples.

    2.1 Illumination

    FIGURE 2.1: IMAGES WITH DIFFERENT ILLUMINATION

Illumination differences occur when there are different light intensities in the images (figure 2.1), causing one to be lighter than the other. It gets more difficult when there are multiple illuminants in the scene, which causes the lighting at various angles to differ. This creates problems in identifying features, since the intensities of the pixels used as features change solely due to the illumination changes.

    2.2 Scale

FIGURE 2.2: IMAGES WITH DIFFERENT SCALES. LEFT IMAGE IS ZOOMED OUT WHILE RIGHT IMAGE IS ZOOMED IN

The scaling transformation can be seen as the convolution of a more zoomed-in image with a Gaussian kernel. The amount of scaling is defined by the standard deviation parameter of the Gaussian kernel; this will be derived later. There are also two scales to define for an image. The


    inner scale is the size of the smallest details that the image is describing relative to another image of

    the same scene. That is, how zoomed in the image is when it is taken.

The second is the outer scale, which is how large an area the image describes compared to another image of the same scene. For example, in figure 2.2, the left image has a smaller inner scale but a larger outer scale compared to the right image.

    2.3 Rotation

    FIGURE 2.3: IMAGES WITH A ROTATIONAL TRANSFORMATION

A rotation can be described with the following transformation matrix:

$$R = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$$

    Having invariance to rotation is desirable since we may roll the camera when taking an image (figure

    2.3).

Many features are invariant to rotation, one of which is the Laplacian, defined as

$$\nabla^2 L(x,y) = \frac{\partial^2 L(x,y)}{\partial x^2} + \frac{\partial^2 L(x,y)}{\partial y^2}$$

    2.4 Affine

    FIGURE 2.4: IMAGES WITH AFFINE TRANSFORMATION

    The affine transformation can be described with the following linear transformation,


$$\begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} e \\ f \end{pmatrix}, \qquad ad - bc \neq 0$$

    We can intuitively visualize this transformation as skewing in either or both the horizontal and

    vertical axes as in figure 2.4.

    2.5 Full Perspective

When we view a scene, our vision is not in Cartesian coordinates. Parallel horizontal lines on the floor do not stay parallel in our vision; they incline towards each other and intersect at the horizon. This is because the image undergoes a transformation from the 3-dimensional Cartesian coordinates to the 4-dimensional homogeneous coordinates.

The transformation matrix that maps the Cartesian coordinates to the 2D image captured by the camera requires information about the camera. Assuming the camera can be modelled as a pinhole camera with focal length f, the camera matrix that transforms physical 3-dimensional Cartesian coordinates (in which we have good intuition) to the 2D image captured is

$$\begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix} = \begin{pmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ 1 \end{pmatrix}$$

with image coordinates $(y_1/y_3,\ y_2/y_3)$. Here y is in the 2-dimensional camera image coordinates while x is in the Cartesian coordinates of the 3-dimensional physical scene. Therefore, when $x_1$, $x_2$ and $x_3$ move between two image captures, $y_1$ and $y_2$ of the two images are transformed. This distortion is the full perspective transformation, which we can see in figure 2.5.

    FIGURE 2.5: IMAGES WITH DIFFERENT PERSPECTIVE DUE TO PROJECTION PLANE


3. GAUSSIAN KERNEL DERIVATION

When we look at an image, the lens in our eye samples it not only pixel by pixel but through different scales. Our mind cannot process every pixel to identify what is in our view. We have evidence for this: after looking at an image, it is impossible for us to even decently reproduce the whole image on a canvas at the inner scale of our choice.

This suggests that our eyes look at an image and take pictures at different blurs, or aperture sizes. This is useful. The popular example is whether we would like to observe a tree or its leaves: when looking at a panorama of trees, we obviously do not need information at the tiny inner scale.

It happens that this blurring is through a Gaussian kernel. The choice of the Gaussian kernel follows purely from a few very intuitive constraints, or assumptions, we make about our view before knowing what it contains.

    FIGURE 3.1: GENERATING SCALE SPACE BY CONVOLUTION WITH GAUSSIAN KERNEL

The variance of the Gaussian kernel is also called the scale of the resulting image. A feature that can be extracted at all scales is called scale invariant. The idea is that this feature will appear even when we capture a photo of it at a distance, that is, at a larger scale. Having this invariance is obviously useful, since what we perceive as a feature in our minds can then be detected at all scales.

Why do we use the Gaussian kernel to build a scale space? This part explains in an intuitive manner why the Gaussian kernel is the unique kernel for defining different scales of an image.

    3.1 Axiomatic Derivation of Gaussian Kernel

We would like to derive the set of mathematical equations that describes the act of our eyes focusing on an image in scale space. When we focus on scenery, we look at it through an aperture of a certain width s; this aperture width is how we focus and is what gives the image scale. It is measured in meters.

There is also the spatial coordinate of the objects, which appears as the frequency ω (dimension meter⁻¹) in the frequency domain. Lastly, we need light to see these objects: L₀ is the luminance of the


scenery (in candela per square meter), while L is the luminance as we perceive it after it enters through the aperture. With those definitions, we can analyse these quantities using dimensional analysis.

            s     ω     L₀    L
Meter       1    -1    -2    -2
Candela     0     0     1     1

$$A = \begin{pmatrix} 1 & -1 & -2 & -2 \\ 0 & 0 & 1 & 1 \end{pmatrix}$$

Find the null space of A, i.e. the exponent vectors x with

$$A\,x = 0, \qquad x = \alpha\,(1,\ 1,\ 0,\ 0)^{T} + \beta\,(0,\ 0,\ 1,\ -1)^{T}$$

Using Buckingham's pi theorem, the only independent dimensionless products are $s\omega$ and $L/L_0$, so the observation must take the form

$$\frac{L}{L_0} = G(s\omega)$$

We have to determine the aperture function G. The strategy is to subject the function to constraints, or axioms, that we know must hold when focusing through an aperture.

The function should be linear and should have no preference for any location. This leads to convolution, where the function is swept across every location without preference:

$$L(x,y) = L_0(x,y) * G(x,y) = \iint L_0(u,v)\, G(x-u,\ y-v)\, du\, dv$$

    This motivates us to look at the variables in the frequency domain where the convolution operator is

    replaced by a simple multiplication,

$$\hat L(\omega_x, \omega_y) = \hat L_0(\omega_x, \omega_y)\, \hat G(\omega_x, \omega_y)$$

Next is the constraint of isotropy, which loosely means the function has to be the same in all directions. We then do not have to account separately for the frequencies along the horizontal and vertical Cartesian axes; only the radial frequency matters:

$$\omega = \sqrt{\omega_x^2 + \omega_y^2}$$


$$\hat L(\omega) = \hat L_0(\omega)\, \hat G(s\omega)$$

Next is the constraint of scale invariance. When we observe multiple times through the aperture, we may use a wider or a narrower aperture, but the aperture function itself does not change:

$$\hat G(\beta_1)\, \hat G(\beta_2) = \hat G(\beta_3), \qquad \beta = s\omega$$

The aperture is applied twice as two observations are made; in the frequency domain this is a multiplication. The resulting observation should be equivalent to observing through an aperture with a wider diameter. The only general solution of this functional equation is the exponential function with a multiplier α, an unknown constant:

$$\exp(\alpha\beta_1)\, \exp(\alpha\beta_2) = \exp\!\left(\alpha(\beta_1 + \beta_2)\right)$$

However, since the β are the dimensionless parameters we derived earlier, we have to raise this parameter to another unknown constant power p:

$$\exp(\alpha\beta_1^p)\, \exp(\alpha\beta_2^p) = \exp\!\left(\alpha(\beta_1^p + \beta_2^p)\right)$$

In short,

$$\hat G(\beta) = \exp(\alpha\beta^p)$$

To find the unknown constant p, we apply the isotropy constraint again to this new equation. The dimensions are independent and thus separable:

$$\vec\beta = \beta_1\vec e_1 + \beta_2\vec e_2 + \beta_3\vec e_3 + \cdots + \beta_I\vec e_I$$

Here $\vec e_i$ is the i-th basis unit vector. Since the dimensions are in the spatial domain and we are using Cartesian unit vectors as the basis, the length β is found by Pythagoras from the projections along the $\vec e_i$:

$$\beta^2 = \beta_x^2 + \beta_y^2 + \beta_z^2$$

For p = 1: $\ \beta_x + \beta_y + \beta_z \neq \left(\beta_x^2 + \beta_y^2 + \beta_z^2\right)^{1/2} = \beta$ in general.

For p = 2: $\ \beta_x^2 + \beta_y^2 + \beta_z^2 = \beta^2$.

We can immediately see that p = 2 makes the components add, and is therefore the correct value for the constant. Now we find α.

The next constraint is that when we open the aperture fully, we blur everything to the point that there is no information left. Mathematically we can represent this as


$$\lim_{\beta \to \infty} \hat G(\beta) = \lim_{\beta \to \infty} \exp\!\left(\alpha\beta^2\right) = 0$$

This can only be true if α is negative. We also know that α has to be real. We shall make the assumption that α = −1/2; it happens that this choice makes the aperture function a solution of the diffusion equation. So far we have

$$\hat G(\beta) = \exp\!\left(-\tfrac{1}{2}\beta^2\right) = \exp\!\left(-\tfrac{1}{2}(s\omega)^2\right)$$

    Now going back to the spatial domain, we shall perform an inverse Fourier transform,

$$g(x, s) = F^{-1}\!\left\{\hat G(s\omega)\right\} = F^{-1}\!\left\{\exp\!\left(-\tfrac{1}{2} s^2\omega^2\right)\right\} \propto \exp\!\left(-\frac{x^2}{2 s^2}\right)$$

Finally, the last constraint is that the aperture function is normalized, such that its integral over the whole spatial domain equals 1. To achieve this, we integrate g(x, s) and divide by that value:

$$\int_{-\infty}^{\infty} \exp\!\left(-\frac{x^2}{2s^2}\right) dx = s\sqrt{2\pi}$$

The final result is therefore

$$g(x, s) = \frac{1}{s\sqrt{2\pi}} \exp\!\left(-\frac{x^2}{2s^2}\right)$$

    This is the Gaussian kernel. It shows that the Gaussian function is the unique aperture function or

    kernel given the constraints or axioms we deduce from our understanding of the process of visual

    observation.
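As a quick numerical illustration of this result (not part of the original derivation), the following sketch in Python with NumPy builds the 1-D kernel and checks the normalization constraint; the function name gaussian_kernel and the chosen grid are ours:

import numpy as np

def gaussian_kernel(x, sigma):
    # Normalized 1-D Gaussian aperture g(x; sigma) derived above
    return np.exp(-x**2 / (2.0 * sigma**2)) / (sigma * np.sqrt(2.0 * np.pi))

x = np.linspace(-50.0, 50.0, 100001)
g = gaussian_kernel(x, sigma=3.0)
dx = x[1] - x[0]
print(g.sum() * dx)  # ~1.0: the kernel integrates to unity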

3.2 Scale Space from Causality

This is from Koenderink's paper "The structure of images" [Koenderink1984a]. The argument in this paper is simple but rather useful in deriving the Gaussian kernel.

When we increase the scale and blur the image further, the final blurred image is caused by the image we started from. That is, we cannot introduce new information, or structure as it is called in the paper, into the image. A higher scale of an image must always contain less structure than a lower scale of the same image. It is not possible for new structures to form, since we are just observing.


This is illustrated clearly for the 1-D case in figure 3.2:

    FIGURE 3.2: LESS STRUCTURE MEANS THE PEAK WILL HAVE TO DECREASE WITHOUT NEW PEAKS FORMING

The arrows illustrate the direction in which the 1-D signal moves as the scale is increased. The signal should get smoother, and new structures such as a new peak should not form. The arrows therefore show the only possible outcome satisfying this condition.

Describing this mathematically:

$$\frac{\partial u}{\partial t} < 0 \ \text{ where }\ \frac{\partial^2 u}{\partial x^2} < 0$$

intensity decreases wherever the signal has negative spatial curvature (the downward arrows in figure 3.2), and

$$\frac{\partial u}{\partial t} > 0 \ \text{ where }\ \frac{\partial^2 u}{\partial x^2} > 0$$

intensity increases wherever the signal has positive spatial curvature (the upward arrows in figure 3.2). To summarize,

$$\frac{\partial u}{\partial t} \cdot \frac{\partial^2 u}{\partial x^2} > 0$$

The next axiom we impose is that the rate of change of the intensity at a particular location, as we rescale the signal, must be proportional to the spatial curvature of the intensity there. This implies the function relating $\frac{\partial u}{\partial t}$ and $\frac{\partial^2 u}{\partial x^2}$ is not only linear, but proportional:

$$\frac{\partial^2 u}{\partial x^2} = \alpha\, \frac{\partial u}{\partial t}$$

We can set α = 1, since we are free to define the physical measure of the amount of rescaling (or time elapsed) per numerical unit:

$$\frac{\partial^2 u}{\partial x^2} = \frac{\partial u}{\partial t}$$


This is the linear isotropic diffusion equation. The Green's function of this equation is the Gaussian kernel, which means that any function (for example an image) to which diffusion is applied is convolved with the Gaussian kernel. We can see this from the following relation.

Let $\sigma^2 = 2t$; that is, the variance of the Gaussian kernel plays the role of time in the diffusion process. We can first agree to this intuitively: as time increases, the variance of the blurring kernel should increase, since the function gets more and more diffused. The kernel becomes

$$u(x,t) = \frac{1}{\sqrt{4\pi t}} \exp\!\left(-\frac{x^2}{4t}\right)$$

Substituting it into the diffusion equation, both sides evaluate to

$$\frac{\partial^2 u}{\partial x^2} = \frac{\partial u}{\partial t} = \left(\frac{x^2}{4t^2} - \frac{1}{2t}\right) \frac{1}{\sqrt{4\pi t}} \exp\!\left(-\frac{x^2}{4t}\right)$$

so this turns out to be true.

    This can be extended to the 2 dimensional case which will not be elaborated in this report.
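As an illustrative sketch of the causality argument (our own check, not from the report), we can count the extrema of a 1-D signal as it is blurred with increasing scale and observe that the count never increases; the signal and scales below are arbitrary choices:

import numpy as np
from scipy.ndimage import gaussian_filter1d

rng = np.random.default_rng(0)
signal = rng.standard_normal(512).cumsum()  # an arbitrary rough 1-D signal

def count_extrema(u):
    d = np.sign(np.diff(u))
    d = d[d != 0]
    return int(np.sum(d[1:] != d[:-1]))  # slope sign changes = peaks + valleys

for sigma in (1, 2, 4, 8, 16):
    print(sigma, count_extrema(gaussian_filter1d(signal, sigma)))
# The number of extrema is non-increasing with scale: no new structure forms.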

    3.3 Properties of the Gaussian Kernel

Normalized Kernel and Average Grey Level Invariance

$$g(x;\sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{x^2}{2\sigma^2}\right)$$

The constant $\frac{1}{\sigma\sqrt{2\pi}}$ in the Gaussian kernel is the normalization constant. It causes the integral of the Gaussian kernel over all x to be unity, since

$$\int_{-\infty}^{\infty} \exp\!\left(-\frac{x^2}{2\sigma^2}\right) dx = \sigma\sqrt{2\pi}$$

Since the area underneath the kernel is constant, increasing the width (the variance) of the kernel will both widen and shorten it; that is, increasing the variance decreases its amplitude. Normalization ensures that the average grey level of the image remains the same after blurring. This property is known as average grey level invariance.

Self-Similarity: the Cascade Property

When we convolve two Gaussian functions with different variances, the result is a Gaussian whose variance is the sum of the two variances:

$$G(x; \sigma_1^2) * G(x; \sigma_2^2) = G(x; \sigma_1^2 + \sigma_2^2)$$


This is a useful property: convolving twice with the same Gaussian kernel produces the same effect as convolving once with a wider Gaussian kernel of twice the variance.
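A minimal numerical check of the cascade property (our own sketch; the impulse signal and sigmas are arbitrary):

import numpy as np
from scipy.ndimage import gaussian_filter1d

impulse = np.zeros(1001)
impulse[500] = 1.0

twice = gaussian_filter1d(gaussian_filter1d(impulse, 2.0), 3.0)
once = gaussian_filter1d(impulse, np.hypot(2.0, 3.0))  # sqrt(2^2 + 3^2)
print(np.abs(twice - once).max())  # ~0: cascading equals one wider kernel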

The Scale Parameter

In order to avoid the addition of squares when cascading kernels, substitute

$$2\sigma^2 = t$$

to give a neater N-dimensional Gaussian kernel:

$$G(\vec x;\, t) = \frac{1}{(\sqrt{\pi t})^{N}} \exp\!\left(-\frac{|\vec x|^2}{t}\right)$$

To give an even neater kernel, we re-parameterize the x axis:

$$\tilde x = \frac{x}{\sqrt{t}}$$

$$G(\tilde x;\, t) = \frac{1}{(\sqrt{\pi t})^{N}} \exp\!\left(-\tilde x^2\right)$$

We call this particular re-parameterization of the coordinates the natural coordinates. Increasing or decreasing $\tilde x$ by one unit corresponds to moving in steps of $\sqrt{t} = \sigma\sqrt{2}$ in x. This parameterization removes the variance or scale factor (the aperture width) from the spatial coordinates.

In the limit of the scale factor going to infinity, the blurred image has homogeneous intensity, equal to the average intensity of the image before blurring.

Relation to Generalized Functions

Taking the limit of the scale factor to zero produces a Dirac delta function:

$$\lim_{\sigma \to 0} \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{x^2}{2\sigma^2}\right) = \delta(x)$$

The cumulative distribution function of the Gaussian kernel is the error function:

$$\int_{-\infty}^{x} \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{u^2}{2\sigma^2}\right) du = \frac{1}{2}\left(1 + \operatorname{erf}\!\left(\frac{x}{\sigma\sqrt{2}}\right)\right)$$


4. DIFFERENTIAL STRUCTURE OF IMAGES

4.1 Motivation

Given a 2D image L, the transformation

$$\sqrt{\left(\frac{\partial L}{\partial x}\right)^{2} + \left(\frac{\partial L}{\partial y}\right)^{2}}$$

is a good edge detector, and

$$\frac{\partial^2 L}{\partial y^2}\left(\frac{\partial L}{\partial x}\right)^{2} - 2\,\frac{\partial L}{\partial x}\,\frac{\partial L}{\partial y}\,\frac{\partial^2 L}{\partial x\,\partial y} + \frac{\partial^2 L}{\partial x^2}\left(\frac{\partial L}{\partial y}\right)^{2}$$

is a good corner detector. An obvious question is how these expressions are produced, and whether there is a method to constrain the possible differential expressions that can produce features on the image. This section explores a strategy for isolating a set of such expressions that describe useful structures in the image.
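These two expressions can be evaluated directly from Gaussian derivatives of the image. A minimal sketch in Python with SciPy (the function name and scale are our choices; axis 0 is taken as y and axis 1 as x):

import numpy as np
from scipy.ndimage import gaussian_filter

def edge_and_corner(L, sigma=2.0):
    # Gaussian derivatives of the image at scale sigma
    Lx = gaussian_filter(L, sigma, order=(0, 1))
    Ly = gaussian_filter(L, sigma, order=(1, 0))
    Lxx = gaussian_filter(L, sigma, order=(0, 2))
    Lyy = gaussian_filter(L, sigma, order=(2, 0))
    Lxy = gaussian_filter(L, sigma, order=(1, 1))
    edge = np.sqrt(Lx**2 + Ly**2)                        # gradient magnitude
    corner = Lyy * Lx**2 - 2 * Lxy * Lx * Ly + Lxx * Ly**2
    return edge, corner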

    4.2 Isophotes

We can look at the intensities of a grayscale image as heights: the higher the intensity, the taller the height, and vice versa. The image can then be divided into contiguous regions of the same height. Isophotes are the closed curves that outline these regions. All isophotes together contain all the information in the image.

    Properties of Isophotes

1. Isophotes are closed curves. Most isophotes in 2D are non-self-intersecting planar curves called Jordan curves.
2. An isophote can intersect itself in the critical case.
3. Isophotes do not intersect other isophotes.
4. The shape of an isophote is independent of grayscale transforms such as changes of brightness and contrast.

A special class of isophotes occurs where an isophote passes through a singularity: a maximum, a minimum or a saddle point. An isophote can only intersect itself when it goes through a saddle point.

    4.3 Flowlines

Flowlines are lines everywhere perpendicular to the isophotes. Flowlines are the dual of isophotes, and isophotes the dual of flowlines; one can be calculated from the other.

    4.4 Coordinate Systems and Transformation

The set of derivatives taken at a particular location can be thought of as a language with which to define a local feature. We can create expressions from derivatives of any order and combination. For


example, using a Taylor series we can take derivatives of increasing order at a point and multiply them by a small offset to describe the structure at that offset.

4.5 Image Transformation and Special Transformation

An image transformation is described by a transformation matrix. For example, the rotational transformation matrix is

$$R = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$$

When an image is transformed, the volume is often changed and the density is redistributed over different areas. To calculate this change in volume we use the Jacobian matrix

$$J = \frac{\partial x'}{\partial x}$$

where $x'$ denotes the transformed image coordinates and $x$ the original image coordinates. Looking at the change of an infinitesimally small volume,

$$dV' = \left|\det J\right|\, dV$$

we can see that the determinant of the Jacobian matrix is the factor by which the volume changes. If the determinant is 1, we call the transformation a special transformation.

4.6 Invariance

When looking for features in an image, we would like them to be detected regardless of whether the image is rotated or translated. It is quite obvious why this is necessary: features that depend on an absolute orientation will be rare and not repeatable across practical photographs taken with different tilts. If a feature needs very special conditions to be detected, it can hardly be called a feature. This is why we need features to be invariant to rotation and translation.


Example

$L_w = \sqrt{L_x^2 + L_y^2}$ is the gradient magnitude. Using Mathematica to substitute rotated coordinates and simplify the expression shows that the function does not change after rotation, and is thus rotationally invariant.
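The report performs this check in Mathematica; an equivalent symbolic check in Python's sympy might look like the following (the rotated-coordinate substitution is ours):

import sympy as sp

x, y, th = sp.symbols('x y theta', real=True)
L = sp.Function('L')

# Rotate the coordinates: (u, v) = R(theta) (x, y)
u = sp.cos(th) * x - sp.sin(th) * y
v = sp.sin(th) * x + sp.cos(th) * y

grad_sq = sp.diff(L(u, v), x)**2 + sp.diff(L(u, v), y)**2
print(sp.simplify(grad_sq))
# theta drops out: the squared gradient magnitude is unchanged by rotation,
# so sqrt(Lx^2 + Ly^2) is rotationally invariant.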

4.7 First Order Gauge Coordinates

The usual global coordinates (the X and Y Cartesian coordinates) change their orientation relative to the image when an orthogonal transformation is applied. Creating feature-generating functions in these coordinates is difficult, since such functions would usually not be invariant to those transformations.

If we instead localize the coordinate frame so that it takes its orientation from properties of the location that are invariant under orthogonal transformations, then functions created in these coordinates will in turn be invariant. These localized coordinates are the gauge coordinates.

The direction of maximal change of intensity is the gradient:

$$\vec w = \left(\frac{\partial L(x,y)}{\partial x},\ \frac{\partial L(x,y)}{\partial y}\right)$$

To find the orthogonal coordinate we rotate this vector by 90 degrees clockwise:

$$\vec v = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} \vec w = \left(\frac{\partial L(x,y)}{\partial y},\ -\frac{\partial L(x,y)}{\partial x}\right)$$

Since the gauge coordinates are localized and derived from the intensities around the point, any derivative expressed in gauge coordinates is an orthogonal invariant.

For example, we differentiate the intensity L with respect to w:

$$\frac{\partial L(v,w)}{\partial w} = L_w$$

This is the gradient magnitude at the point, which is the defining property of w itself. $L_w$ is an invariant.


Now we try differentiating the intensity with respect to v:

$$\frac{\partial L(v,w)}{\partial v} = 0$$

This is obvious, since the gradient points along w, which is perpendicular to v. Zero is also an invariant.

    It is useful to express these invariants in Cartesian coordinates instead of gauge coordinates for us to

    identify these points easily in a computer.

The family of features that can be derived from partial derivatives in the first order gauge coordinates can be written as

$$L_{v^k w^l} = \frac{\partial^{\,k+l} L(x,y)}{\partial v^k\, \partial w^l}$$

    Here we start to see some of the famous invariant features:

TABLE 4-1: FEATURES WHICH ARE FUNCTIONS OF PARTIAL DERIVATIVES OF FIRST ORDER GAUGE COORDINATES

Operator in Gauge Coordinates    Description
$L_{vv}$                         Ridge detector
$L_{vv}/L_w$                     Isophote curvature
$L_{vv}L_w^2$                    Corner detection
$L_{vv}/L_w^3$                   Affine invariant corner detection
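To identify these invariants in a computer, they are expressed back in Cartesian derivatives, for instance $L_w = \sqrt{L_x^2 + L_y^2}$ and $L_{vv} = (L_{xx}L_y^2 - 2L_{xy}L_xL_y + L_{yy}L_x^2)/(L_x^2 + L_y^2)$. A sketch of this in Python with SciPy (the names and the small eps guard against division by zero are ours):

import numpy as np
from scipy.ndimage import gaussian_filter

def gauge_derivatives(L, sigma=2.0, eps=1e-12):
    Lx = gaussian_filter(L, sigma, order=(0, 1))
    Ly = gaussian_filter(L, sigma, order=(1, 0))
    Lxx = gaussian_filter(L, sigma, order=(0, 2))
    Lyy = gaussian_filter(L, sigma, order=(2, 0))
    Lxy = gaussian_filter(L, sigma, order=(1, 1))
    Lw2 = Lx**2 + Ly**2
    Lw = np.sqrt(Lw2)                                    # gradient magnitude
    Lvv = (Lxx * Ly**2 - 2 * Lxy * Lx * Ly + Lyy * Lx**2) / (Lw2 + eps)
    return Lw, Lvv, Lvv * Lw2                            # last: corner detector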


5. FEATURE EXTRACTION EXAMPLE: SURF AND SIFT

5.1 Properties of SIFT and SURF

Invariant to:
1. Uniform scaling
2. Orientation

Partially invariant to:
1. Affine distortion
2. Illumination

    5.2 Mechanics of SIFT and SURF

SIFT and SURF can be separated into three processes:

a. Identifying keypoints (steps 1 to 3)
b. Discarding weak keypoints (step 3)
c. Attaching an orientation to each keypoint and packing it into a data structure (steps 4 and 5)

    Step 1: Create Scale Space out of the image

In SIFT, the scale space is generated by convolving the image with Gaussian kernels of increasing scale. In SURF, this convolution is approximated for computational efficiency. It involves first generating the integral image, which is just the cumulative integration of the image,


followed by the convolution of the image with a box filter. Given the prepared integral image, the sum of the image over any box, and hence the box-filter convolution at a point, can be computed with just three additions and subtractions of integral image values.

    FIGURE 5.1: INTEGRAL IMAGE WITH BOX FILTER CONVOLUTION FORMULA

The variance of a box filter with square support of side l is l²/12 along each axis.
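A minimal sketch of the integral image and the three-operation box sum (Python with NumPy; the zero-padding convention is our choice):

import numpy as np

def integral_image(img):
    # Cumulative sum over rows then columns, with a zero border for indexing
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def box_sum(ii, r0, c0, r1, c1):
    # Sum of img[r0:r1, c0:c1] in three additions/subtractions
    return ii[r1, c1] - ii[r0, c1] - ii[r1, c0] + ii[r0, c0]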


FIGURE 5.2: COMPARISON OF GAUSSIAN BLUR AND ITS ESTIMATE USING SURF

    There are some visible vertical artefacts on the approximation, but it is quite close to the Gaussian

    blur.


    Step 2: Compute the Second Order Derivative of the Scale Space

    Scale-invariant features may be detected by using scale-normalized second order derivatives on the

    scale space representation.

SIFT computes the Gaussian scale space and then aims to obtain its second order derivative, the Laplacian. SURF does not compute the scale space of the Gaussian kernel; instead, it directly computes the scale space of an estimate of the second order derivative of the Gaussian kernel.

SIFT: Laplacian

The Laplacian of the Gaussian-blurred image is simply its second spatial derivative,

$$\nabla^2 L(x, y, t)$$

and it identifies the edges of the image. Blurring the image first smooths it out, which makes the edge detection more stable and effective. Computing the Laplacian is computationally intensive; SIFT tackles this by estimating it with the difference of Gaussians shown in figure 5.3.

    FIGURE 5.3: DIFFERENCE OF GAUSSIAN APPROXIMATES THE LAPLACIAN OF GAUSSIANS

$$DoG(x, y;\, \sigma_1, \sigma_2) = \frac{1}{2\pi\sigma_1^2} \exp\!\left(-\frac{x^2+y^2}{2\sigma_1^2}\right) - \frac{1}{2\pi\sigma_2^2} \exp\!\left(-\frac{x^2+y^2}{2\sigma_2^2}\right)$$

    It happens that this is a good approximation for the Laplacian of Gaussians.


From the diffusion equation, $\frac{\partial L}{\partial t} = \nabla^2 L$, so a finite difference across neighbouring scales t and kt gives

$$\nabla^2 L(x,y;t) \approx \frac{L(x,y;kt) - L(x,y;t)}{kt - t}$$

and therefore

$$L(x,y;kt) - L(x,y;t) \approx (k-1)\, t\, \nabla^2 L(x,y;t)$$

The difference of two Gaussian-blurred images approximates the scale-normalized Laplacian up to the constant factor (k − 1), which is the same for all scales.
This is therefore what is used in SIFT.
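A compact sketch of the DoG computation (Python with SciPy; sigma0 = 1.6 and k = 2^(1/3) are typical choices from the SIFT literature, not values fixed by this report):

import numpy as np
from scipy.ndimage import gaussian_filter

def dog_stack(img, sigma0=1.6, k=2 ** (1 / 3), levels=5):
    # Adjacent Gaussian blurs; their differences approximate (k-1) t Laplacian
    sigmas = [sigma0 * k**i for i in range(levels + 1)]
    blurred = [gaussian_filter(img.astype(float), s) for s in sigmas]
    return [b2 - b1 for b1, b2 in zip(blurred, blurred[1:])]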

SURF: Determinant of Hessian

To find the Hessian matrix of the scale space of the image, we need its second order partial derivatives, which is computationally intensive. As an approximation, SURF replaces the second order Gaussian derivative kernels with box filters, whose convolution with the image can be computed very quickly on the integral image.


The determinant of the approximated Hessian is then computed as

$$\det(H_{\text{approx}}) = D_{xx}\, D_{yy} - \left(r\, D_{xy}\right)^2$$

where $D_{xx}$, $D_{yy}$ and $D_{xy}$ are the box-filter responses, and the four lobes of the $D_{xy}$ filter have the supports

$$\Omega_1 = [0,l] \times [0,l], \quad \Omega_2 = [-l,0] \times [0,l], \quad \Omega_3 = [0,l] \times [-l,0], \quad \Omega_4 = [-l,0] \times [-l,0]$$

The filter responses are divided by their area for normalization, due to the variance of the kernel. The weight r is the ratio of the Frobenius norms (the square root of the sum of the squares of the coefficients of a matrix) of the approximated and exact Hessian of Gaussian kernels. This is of course pre-calculated.
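In code, the response reduces to one line once the box-filter outputs are available (our sketch; r = 0.9 is the weight reported in the SURF paper):

def doh_response(Dxx, Dyy, Dxy, r=0.9):
    # Approximated determinant of the Hessian from box-filter responses;
    # works elementwise on NumPy arrays of responses as well as scalars.
    return Dxx * Dyy - (r * Dxy) ** 2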

    Step 3: Feature Selection

SIFT: Find the minima/maxima of sub-pixels in the DoG

Sub-pixel refers to interpolation between pixels. This is useful since the extrema often do not lie exactly on a pixel. To obtain the sub-pixel position, take the Taylor series at a point over three dimensions: x and y as well as the scale σ.


$$D(\mathbf{x}) = D + \frac{\partial D}{\partial \mathbf{x}}^{T} \mathbf{x} + \frac{1}{2}\, \mathbf{x}^{T}\, \frac{\partial^2 D}{\partial \mathbf{x}^2}\, \mathbf{x}$$

Here $\mathbf{x} = (x, y, \sigma)$ is the offset in the three dimensions and D is the 3-dimensional scale space centred on the candidate. Taking the first derivative of this expression and setting it to zero gives the offset of the potential extremum:

$$\hat{\mathbf{x}} = -\left(\frac{\partial^2 D}{\partial \mathbf{x}^2}\right)^{-1} \frac{\partial D}{\partial \mathbf{x}}$$

If any component of $\hat{\mathbf{x}}$ is larger than 0.5, the extremum lies closer to a neighbouring sample point; we move to that candidate point and repeat the sub-pixel interpolation there using the Taylor series.

If the magnitude of the interpolated value at the final extremum, $|D(\hat{\mathbf{x}})|$, is below 0.03, remove it as a candidate keypoint. Otherwise, add it as a candidate keypoint.
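A sketch of the refinement step (Python with NumPy; grad and hessian are assumed to be finite-difference estimates of the first and second derivatives of D at the candidate):

import numpy as np

def refine_keypoint(grad, hessian):
    # Solve (d2D/dx2) x_hat = -dD/dx for the sub-pixel offset x_hat
    x_hat = -np.linalg.solve(hessian, grad)
    move_on = np.any(np.abs(x_hat) > 0.5)  # True: re-run at the neighbour
    return x_hat, move_on

def value_at_offset(D, grad, x_hat):
    # Interpolated |D(x_hat)| used for the 0.03 contrast test
    return D + 0.5 * float(grad @ x_hat)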

SIFT: Remove Low Contrast and Edge Keypoints

This step removes keypoints that are either of low intensity or lying on an edge.

Removing low-intensity keypoints is simply a matter of comparing the intensity with a designed threshold: if it is lower than the threshold, remove the keypoint.

Keypoints near edges have a high DoG response even though they may not be robust to small noise. To remedy this, we remove features that have a high edge response but a poorly determined location. Poorly defined peaks in the DoG have a small principal curvature along the edge compared to the principal curvature across it. The principal curvatures are proportional to the eigenvalues of the Hessian, the matrix of second order partial derivatives at the point, and the ratio of the two eigenvalues determines whether the feature should be kept:

$$H = \begin{pmatrix} D_{xx} & D_{xy} \\ D_{xy} & D_{yy} \end{pmatrix}, \qquad r = \lambda_1 / \lambda_2$$

Instead of computing the eigenvalues explicitly, a less computationally intensive approach is to use

$$\frac{\operatorname{Tr}(H)^2}{\det(H)} = \frac{(r+1)^2}{r}$$

We then compare this to a threshold $r_{th}$, which is set equal to 10.


$$\frac{(r_{th}+1)^2}{r_{th}} = \frac{(1+10)^2}{10} = 12.1$$

If $\operatorname{Tr}(H)^2 / \det(H)$ exceeds this threshold, the keypoint is removed.
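A sketch of this test (our own; Dxx, Dyy and Dxy are the second differences of the DoG at the keypoint):

def edge_like(Dxx, Dyy, Dxy, r_th=10.0):
    # Edge criterion on the 2x2 Hessian of the DoG at the keypoint
    tr = Dxx + Dyy
    det = Dxx * Dyy - Dxy**2
    if det <= 0:                # curvatures of opposite sign: discard
        return True
    return tr * tr / det > (r_th + 1) ** 2 / r_th   # 12.1 for r_th = 10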

SURF: Feature Selection

Local maxima are selected by iterating over a 3×3×3 neighbourhood of every point, exhaustively comparing each point of the scale space with its 26 nearest neighbours. For all local maxima, a threshold is then applied to eliminate those with a low DoH value.
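A sketch of the 26-neighbour selection (Python with SciPy; doh is the 3-D stack of DoH responses over scale, and the threshold is a free parameter):

import numpy as np
from scipy.ndimage import maximum_filter

def select_keypoints(doh, threshold):
    # A point survives if it dominates its 3x3x3 (26-neighbour) block
    is_peak = doh == maximum_filter(doh, size=3)
    return np.argwhere(is_peak & (doh > threshold))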

Step 4: Assign Keypoint Orientation

This step is similar to the gauge coordinate concept introduced earlier. To make the feature rotation invariant, the gradient around the feature is used to identify its rotational orientation in different images. The magnitude m of the gradient at a point, and subsequently its angle θ, can be calculated from pixel differences:

$$m(x,y) = \sqrt{\left(L(x{+}1,y) - L(x{-}1,y)\right)^2 + \left(L(x,y{+}1) - L(x,y{-}1)\right)^2}$$

$$\theta(x,y) = \tan^{-1}\frac{L(x,y{+}1) - L(x,y{-}1)}{L(x{+}1,y) - L(x{-}1,y)}$$

The gradient angle of every pixel in an area the size of the Gaussian kernel around the feature is calculated. The angle of each pixel is collected into one of 36 bins, each covering a range of 10 degrees of the full 360 degrees, and the magnitude of the gradient m is added to the bin.

The index (1 to 36) of the largest bin is assigned to the keypoint as its orientation. In addition, any peak that is 80% or more of the highest peak spawns a new keypoint with that orientation.
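A sketch of the orientation histogram (our own; mag and angle are arrays of gradient magnitudes and angles, in degrees, over the region around the keypoint; bins are 0-indexed here):

import numpy as np

def orientation_peaks(mag, angle, bins=36):
    idx = (np.asarray(angle) % 360 / (360 / bins)).astype(int)  # 10-degree bins
    hist = np.bincount(idx.ravel(), weights=np.asarray(mag).ravel(),
                       minlength=bins)
    return np.flatnonzero(hist >= 0.8 * hist.max())  # bins that spawn keypoints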

Step 5: Generate SIFT Features

A 16×16 window is taken around the keypoint and broken down into sixteen 4×4 parts (figure 5.4).


    FIGURE 5.4: GENERATING KEYPOINTS BASED ON GRADIENTS OF 4X4 WINDOWS

For every 4×4 window, the angle and magnitude of the gradient are calculated and collected into 8 bins of 45 degrees each (figure 5.5), in a similar manner as before. This time, however, the gradient magnitude is weighted with a Gaussian weighting function, so that gradients further from the keypoint contribute less while nearer gradients contribute more. Doing this for all sixteen 4×4 parts gives 4×4×8 = 128 numbers describing the orientation structure of the keypoint.

FIGURE 5.5: STORING GRADIENTS IN 8 BINS 45 DEGREES APART
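A sketch of the descriptor assembly (our own; mag and angle are 16×16 arrays of Gaussian-weighted gradient magnitudes and angles, in degrees, measured relative to the keypoint orientation):

import numpy as np

def sift_descriptor(mag, angle):
    desc = np.zeros((4, 4, 8))
    bins = (np.asarray(angle) % 360 // 45).astype(int)  # 8 bins of 45 degrees
    for r in range(16):
        for c in range(16):
            desc[r // 4, c // 4, bins[r, c]] += mag[r, c]
    return desc.ravel()  # 4 x 4 x 8 = 128 numbers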


6. PERFORMANCE OF SURF AND SIFT

This section is based on the work of N. Khan, B. McCane and G. Wyvill. It is useful to see how robust the features generated by SIFT and SURF are, and which invariances they possess.

    6.1 Rotation Invariance

1) The first 500 object images from the dataset are picked and used for testing, then rotated at different angles in a clockwise direction to generate 500 trained images.
2) The imrotate function of MATLAB with bilinear interpolation is used for performing the rotations.

    All the algorithms are robust.

    6.2 Illumination Invariance

The 500 test images are brightened or darkened by adding a number of pixel-value offsets (i.e. every pixel's red, green and blue channel intensities are incremented equally). The sum is clamped to the range 0-255. The offsets tested are: +50, +70, +100, +120, -30, -50, -70, -90.


SIFT with 32 and 64 word lengths does not work well on heavily brightened images. SURF, and SIFT with 96 and 128 word lengths, are still robust.

6.3 Noise Invariance

The imnoise function of MATLAB is used to generate the trained data with Gaussian, salt-and-pepper and speckle noise:

1. Gaussian noise with σ² = 0.1 and σ² = 0.2
2. Salt-and-pepper noise with densities of 15% and 35%
3. Multiplicative white (speckle) noise with mean 0 and σ² = 0.04

Interestingly, SIFT(32) does not work well with salt-and-pepper noise. SURF is still robust, as are the longer word length variants of SIFT.

6.4 Scale Invariance

500 objects are taken from the dataset for which two images were available at different scales (i.e. a total of 1000 images).


Interestingly, SURF does not work well here. This is probably due to the approximation of the Gaussian kernel with box kernels. SIFT(32) and above do well here; however, SIFT is also computationally much slower than SURF.

6.5 Viewpoint Invariance

500 objects are taken from the dataset for which two images were available taken from different viewpoints (i.e. a total of 1000 images). The first object image is used for training while the other is used for testing.

SIFT outperforms SURF here, but not by a large margin.

6.6 Matching


SURF takes the longest time for matching. This is surprising, since SURF is the faster algorithm; the explanation is that SURF also produces more features, and matching these features therefore takes more time. For SIFT, using a shorter descriptor shortens the matching time.

    Average matching time required to match one query image by all descriptors


7. CONCLUSION

We started by looking at the different types of invariance that are useful for features to have. We then studied why the Gaussian kernel is the unique kernel for deriving a scale space representation, given some logical assumptions.

We also identified a whole class of operators in first order gauge coordinates that are orthogonal invariants and emphasize useful structures such as corners and ridges.

With the ideas of scale space and operators with certain invariances in place, we went through the mechanics of the SIFT and SURF algorithms to identify the computations used to obtain the keypoints.

Lastly, we looked at the performance of SIFT- and SURF-generated features and found that they are indeed invariant to the typical real-world transformations discussed in the first chapter.