maximum detector response markers for sift and · pdf filemaximum detector response markers...

10
Maximum Detector Response Markers for SIFT and SURF Florian Schweiger, Bernhard Zeisl, Pierre Georgel, Georg Schroth, Eckehard Steinbach, Nassir Navab Technische Universit¨ at M ¨ unchen Email: {florian.schweiger, eckehard.steinbach, schroth}@tum.de [email protected] {georgel, navab}@in.tum.de Abstract In this paper, we introduce optimal markers to be used with the SIFT and SURF feature detectors. They can be applied to trigger the detection of fea- ture points at desired locations. Unlike conventional marker systems, we do not propose a standalone so- lution comprising a set of markers and a thereto adapted detection algorithm. Instead, our mark- ers are adapted to existing and established detec- tors. In particular, we introduce markers optimally suited for SIFT and SURF. We derive the optimal design and show their high detectability within a wide range of different imaging conditions in ex- periments on both synthetic and real data. 1 Introduction In the field of Augmented Reality, the use of so called fiducial markers is common practice in order to obatain reliably extractable 2D-3D correspon- dences. Subsequent tasks such as pose estimation or object recognition are hence vastly simplified. Due to their versatility and low installation costs, fidu- cial markers are often chosen over tracking systems employing more costly infra-red or active markers. Example applications from other fields include vi- sual servoing in robotic surgery and monitoring in production logistics. In many applications, fiducial markers are sup- plemented with image features originating in salient points in the scene itself. Many algorithms exist for feature extraction. Maybe the most widely used are SIFT and SURF, two closely related algorithms that yield scale and rotation invariant image feature points. At the other end of the scale, in merely feature-based applications, it might even so be de- sirable to place some reliably detectable reference Figure 1: Our SIFT (left) and SURF markers. points on unstructured surfaces, which would usu- ally not be detected. In this work, we propose a light-weight marker framework that conflates the 2-stage strategy con- sisting of marker detection and feature point extrac- tion into a more efficient 1-step approach. We aim at the vast number of computer vision applications which are based on feature points, enhancing them with markers that fully integrate into the existing detection process. The markers we have developed for SIFT and SURF are provably optimal in that they trigger maximum response by the respective detector. The remainder of the paper is structured as fol- lows. In Section 2, we will give a brief overview on existing marker systems and the functioning of the SIFT and SURF feature detectors. Section 3 introduces the optimal markers for both these de- tectors, which are then validated experimentally in Section 4. We conclude the paper in Section 5 giv- ing an outlook on future work. VMV 2009 M. Magnor, B. Rosenhahn, H. Theisel (Editors)

Upload: hoangque

Post on 07-Feb-2018

226 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Maximum Detector Response Markers for SIFT and · PDF fileMaximum Detector Response Markers for SIFT and SURF Florian Schweiger, Bernhard Zeisl, Pierre Georgel, Georg Schroth, Eckehard

Maximum Detector Response Markers for SIFT and SURF

Florian Schweiger, Bernhard Zeisl, Pierre Georgel, Georg Schroth, Eckehard Steinbach, Nassir Navab

Technische Universitat Munchen

Email: {florian.schweiger, eckehard.steinbach, schroth}@[email protected]{georgel, navab}@in.tum.de

Abstract

In this paper, we introduce optimal markers to beused with the SIFT and SURF feature detectors.They can be applied to trigger the detection of fea-ture points at desired locations. Unlike conventionalmarker systems, we do not propose a standalone so-lution comprising a set of markers and a theretoadapted detection algorithm. Instead, our mark-ers are adapted to existing and established detec-tors. In particular, we introduce markers optimallysuited for SIFT and SURF. We derive the optimaldesign and show their high detectability within awide range of different imaging conditions in ex-periments on both synthetic and real data.

1 Introduction

In the field of Augmented Reality, the use of socalled fiducial markers is common practice in orderto obatain reliably extractable 2D-3D correspon-dences. Subsequent tasks such as pose estimation orobject recognition are hence vastly simplified. Dueto their versatility and low installation costs, fidu-cial markers are often chosen over tracking systemsemploying more costly infra-red or active markers.Example applications from other fields include vi-sual servoing in robotic surgery and monitoring inproduction logistics.

In many applications, fiducial markers are sup-plemented with image features originating in salientpoints in the scene itself. Many algorithms existfor feature extraction. Maybe the most widely usedare SIFT and SURF, two closely related algorithmsthat yield scale and rotation invariant image featurepoints. At the other end of the scale, in merelyfeature-based applications, it might even so be de-sirable to place some reliably detectable reference

Figure 1: Our SIFT (left) and SURF markers.

points on unstructured surfaces, which would usu-ally not be detected.

In this work, we propose a light-weight markerframework that conflates the 2-stage strategy con-sisting of marker detection and feature point extrac-tion into a more efficient 1-step approach. We aimat the vast number of computer vision applicationswhich are based on feature points, enhancing themwith markers that fully integrate into the existingdetection process. The markers we have developedfor SIFT and SURF are provably optimal in thatthey trigger maximum response by the respectivedetector.

The remainder of the paper is structured as fol-lows. In Section 2, we will give a brief overviewon existing marker systems and the functioning ofthe SIFT and SURF feature detectors. Section 3introduces the optimal markers for both these de-tectors, which are then validated experimentally inSection 4. We conclude the paper in Section 5 giv-ing an outlook on future work.

VMV 2009 M. Magnor, B. Rosenhahn, H. Theisel (Editors)

Page 2: Maximum Detector Response Markers for SIFT and · PDF fileMaximum Detector Response Markers for SIFT and SURF Florian Schweiger, Bernhard Zeisl, Pierre Georgel, Georg Schroth, Eckehard

2 Related Work

2.1 Fiducial Markers

Fiducial markers are widely used in a variety ofdifferent fields. Depending on the particular appli-cation, different designs have been proposed. Anoverview and comparison of different marker sys-tems can be found in [13]. Fiducial marker sys-tems typically consist of a set of distinguishablelabels that are placed in the scene and can be de-tected and decoded by an associated algorithm. Ac-cording to the design of the markers, the detectionconsists of several nontrivial steps such as edge de-tection, linking and line fitting. Decoding refers tothe identification of a detected marker. Early sys-tems used correlation-based approaches to matchthe appearance of a marker against a database oftemplates [4, 5]. State-of-the-art marker systemsemploy binary error correcting codes to allow theunique and robust identification of thousands of dif-ferent markers [3]. Recently, there have been effortstowards lowering the computational complexity formobile real-time applications [12, 11]. Comparedto these highly specialized systems, our proposedmarker scheme offers very elementary, yet highlyvaluable functionality: making feature points reli-ably detectable, at no additional expenses.

Desirable properties of a conventional markersystem are low false positive and false negativerates, as well as low inter-marker confusion ratesif the system comprises more than one distinguish-able marker. That is, the system should neither re-port a detected marker when there is none, nor missor mistake an actually present marker.

As mentioned before, state-of-the-art marker sys-tems like, e. g., ARTag [3] use error correctionmechanisms which allow them to reach unrivaledperformance in terms of false positive and inter-marker confusion rates. As our light-weight markersystem provides only one marker, respectively twomarkers in the case of SURF, inter-marker confu-sion is not an issue. Our main goal is to have ahighly detectable low-cost marker, not necessarilya uniquely detectable one.

The key virtue of our system is its remarkablylow false negative rate. As we demonstrate in Sec-tion 4, it can be virtually guaranteed that our mark-ers get detected even under dramatically varyingimaging conditions. This makes them highly usefulin applications where SIFT or SURF points need to

be found at a desired location.

2.2 Scale Invariant Feature Transformand Speeded Up Robust Features

Both SIFT [7] and SURF [1] belong to the fam-ily of scale invariant feature detectors. As such,they analyse an input image at different resolutionsin order to repeatably find characteristic blob-likestructures independently of their actual size in animage. To this end, both algorithms use multi-scale detection operators to analyze the so calledscale space representation of an image. In a sec-ond step, detected features are assigned a rotation-invariant descriptor computed from the surround-ing pixel neighborhood. The interested reader isreferred to the original publications of SIFT andSURF for more detailed information about their in-ner workings. In the following, we will briefly re-view those parts of both algorithms which are rele-vant for our marker design: the detection operators,and the composition of the descriptor.

Detection Operators It has been shown thatgradually filtering with a Gaussian kernel is par-ticularly suited to build the scale space represen-tation of natural images [6]. Furthermore, theLaplacian operator has proven very usable for in-terest point detection [8]. Consequently, SIFT usesan approximation to the Laplacian of Gaussiansin order to find features on discrete scale levels.This is achieved by convolving the image I withDifference-of-Gaussian (DoG) filters of increasingsize. The SIFT detection filter output reads:

DSIFT(x, y, σk) =(Hk

DoG ? I)(x, y),

where HkDoG = Gσk+1 −Gσk (1)

Here, Gσk is a bivariate Gaussian with standard de-viation σk representing the scale. Figure 2 showsthe typical “flipped mexican hat” shape of Hk

DoG.The discrete scales are chosen appropriately tocover a reasonable range. Features are found aslocal maxima of the filter output in 3-dimensionalscale space. Every feature point hence is a combi-nation of location and scale maximizing DSIFT.

SURF builds on the concepts of SIFT but intro-duces more radical approximations in order to speedup the detection process. Due to the use of inte-gral images the complexity of SURF is greatly re-

Page 3: Maximum Detector Response Markers for SIFT and · PDF fileMaximum Detector Response Markers for SIFT and SURF Florian Schweiger, Bernhard Zeisl, Pierre Georgel, Georg Schroth, Eckehard

Figure 2: Difference of Gaussians (DoG) filter usedby SIFT.

Figure 3: Box filteres used by SURF to approximatesecond order Gaussian derivatives.

duced, yet, SURF often achieves superior perfor-mance than its predecessor. Instead of the Laplacianoperator, SURF uses the determinant of the Hes-sian for feature detection in scale space. Figure 3shows the box filters Gσxx, Gσyy and Gσxy approxi-mating the second order derivatives of the GaussianGσ . Scales are again discretized and depend on thesize of the used box filters. By definition, a kernelsize of s×s pixels corresponds to σ= 1.2

9s. Apply-

ing these box filters to an image I yields the entriesof the Hessian matrix, and we have the followingdetection operator output:

DSURF(x, y, σ) = det

[H11(x, y) H12(x, y)H21(x, y) H22(x, y)

](2)

where

H11 = Gσxx ? I, H22 = Gσyy ? I,

H12 = H21 = Gσxy ? I

It is important to note here that, as opposed to theDoG filters, the SURF detection operator is nonlin-ear in the input image I .

10 20 30 40 50 60

descriptor entries

20 40 60 80 100 120descriptor entries

Figure 4: Left: The support used to compute thedescriptors in size comparison with the respectivemarkers. Right: The descriptors of the markers.The highlighted entries corresponding to the innerfour subblocks constitute the marker’s signature.

Descriptors After detection, every feature is as-signed a ”fingerprint” computed from the gradientdistribution around its pixel location. The more dis-tinct these descriptors are, the better the results ob-tained in a subsequent feature matching step. In ourcase, descriptors will also be used to identify mark-ers in an image.

The SIFT descriptor is constructed from a squareneighborhood of side length 12σ pixels, where σis the scale of the feature. This neighborhood isaligned with the dominant local gradient direction,and Figure 4 shows its 4×4 subdivision. For eachof the 16 subregions a 8-bin histogram of weightedgradients is built, and finally the descriptor is com-piled by sorting the bins’ contents from all sub-blocks into a vector of length 8×4×4=128. Dueto the scale and rotation adaptive creation process,SIFT descriptors are mostly invariant to moderategeometric transformations.

SURF uses a very similar descriptor layout, alsobased on a square region around the feature pointwhich is aligned with the dominant gradient, andsubdivided into 16 subblocks (see Fig. 4). Theonly differences are that the neighborhood is chosen20σ pixels wide, and that every subblock only con-tributes 4 descriptor entries. Instead of histogramvalues, the sum and absolute sum of the gradient’sx- and y-components are used. In total, the SURFdescriptor hence comprises 64 entries.

Page 4: Maximum Detector Response Markers for SIFT and · PDF fileMaximum Detector Response Markers for SIFT and SURF Florian Schweiger, Bernhard Zeisl, Pierre Georgel, Georg Schroth, Eckehard

3 Optimal Marker Design

In this section, we derive the optimal input imagesfor the SIFT and SURF detectors, which then lendthemselves as ideal markers for the respective de-tector. In this context, optimal refers to giving riseto the highest possible detector output.

3.1 The SIFT Marker

In case of SIFT, the task of determining the optimalmarker is that of maximizing the output of a linearfilter. As discussed in Section 2.2, this filter is aDoG as given in Equation (1).

In signal processing, matched filters are usedto recover noisy signals of known shape [9]. Amatched filter maximizes the signal-to-noise ratio,and its impulse response is simply the reversed ver-sion of the given signal. In our case, we are givena DoG filter and we search for the energy-limitedsignal which, superimposed with image noise, willyield maximum response at the filter output. Inanalogy to the matched filter design, we conse-quently choose our SIFT marker to be a DoG it-self. Its shape and appearance are shown in Fig-ures 2 and 1. Note that the reversed DoG is a valid“matched signal”, thus marker, as well.

Figure 4 shows the associated descriptor whichcan subsequently be used to identify the marker. Itscharacteristic shape due to the fact that the mainlobes of the DoG fall inside the inner four subblocksis beneficial for descriptor matching. We will referto these parts of the descriptor as the marker’s sig-nature.

3.2 The SURF Marker

SURF on the other hand uses a nonlinear detectionoperator. Hence, the matched filter approach wetook for SIFT is no longer applicable in this case.Nevertheless, we can derive the input image thatwill maximize the detector response by solving thecorresponding optimization problem.

Without loss of generality, we will derive theSURF marker of size 9× 9 pixels here which in-volves 81 variables in our optimization. The prob-lem can of course also be solved for the other ker-nel sizes used by SURF, i. e., 15, 21, 27, and soon. However, as the complexity of the problem in-creases with the square of the filter size, it should

be stated that the resulting markers will just be up-scaled versions of the 9×9 marker. Hence, we settlefor the case s=9 in the following, and consequentlyσ=1.2. Assume we want (2) to attain a maximumat position (x0, y0). Then the 81 pixels in the squareregion centered on (x0, y0) have to be taken intoconsideration. These are the only pixels which areactually covered by the support of the consideredoperator and their values must be adjusted such asto maximize DSURF(x0, y0, 1.2).

First, we arrange our 81 variables into the vec-tor x, and accordingly the entries of the box filtersinto the vectors gxx, gxy and gyy . From (2), thefilter output can then be written as follows.

DSURF(x0, y0, 1.2) =

∣∣∣∣g>xxx g>xyx

g>xyx g>yyx

∣∣∣∣= x> (gxxg

>yy − gxyg

>xy)︸ ︷︷ ︸

G

x

= 12x>(G>+G)x

With A = G>+ G, this leads to the followingquadratic optimization problem if we additionallyrequire that the pixel values be upper bounded.

maxx

x>Ax, s.t. ‖x‖ ≤ 1 (3)

It can be shown that rankA= 3, so the eigenvaluedecomposition A= UΛU> together with the sub-stitution y=U>x yields the equivalent problem:

maxy

y>Λ y = maxy

(λ1y21 + λ2y

22 + λ3y

23),

s.t. ‖y‖ ≤ 1

There are two solutions yopt =[±1, 0, . . . , 0]>and back-substitution reveals that xopt is the eigen-vector corresponding to the largest eigenvalue ofA (and its inverse respectively). Rearranging xopt

into a 9×9 image and adjusting its values to spanthe whole range of gray values eventually leads tothe desired SURF markers, as depicted in Figures 5and 1.

Due to their discrete nature, the SURF markersleave an even more characteristic descriptor signa-ture than their SIFT counterparts (see Figure 4). Itis particularly noteworthy that the signatures of thedark and light versions of the SURF marker are dis-tinguishable. However, the entries corresponding tothe sum of absolute gradients values are identicalfor both.

Page 5: Maximum Detector Response Markers for SIFT and · PDF fileMaximum Detector Response Markers for SIFT and SURF Florian Schweiger, Bernhard Zeisl, Pierre Georgel, Georg Schroth, Eckehard

Figure 5: Maximum response SURF marker opti-mized for a kernel size of 15 pixels. Note the quali-tative similarity to the reversed Mexican hat of Fig-ure 2.

4 Experiments on Detectability

The markers we have derived in the previous sec-tion are provably optimal only if viewed under per-fect conditions, i. e., fronto-parallel to the camera,upright, and at the exact same size as the operatorthey were optimized for. The experiments we ranon synthetic data suggest, however, that our mark-ers are still detectable if imaged at sizes halfway be-tween scales, and under perspective distortions. Inthis section, we will give the results of our syntheticexperiments, and demonstrate detectability also inreal images. For all our experiments, we used theSIFT implementation by Andrea Vedaldi [10] andthe OpenCV [2] implementaion of SURF.

4.1 Detectability under Distortion

In general, there are two properties related to theperformance of the proposed markers that have tobe distinguished.Detectability: A marker is detectable in an image

if it generates a local maximum in scale space,i. e., the detector will report a feature point atthe marker position. This is the minimum re-quirement towards our markers.

Unique detectability: If the imaged marker eventriggers the global maximum in scale space,or if the combination of high detector re-

Figure 6: Examples of the distortions we applied tothe SURF marker in our synthetic experiments. Inplane rotation by 10◦, out-of-plane rotation by 40◦,Gaussian noise with standard deviation equal to 50gray values. The green cross indicates the positionat which the marker has been detected

sponse and signature similarity identifies it asa marker, it is uniquely detectable. The exper-iments we describe in Section 4.2 suggest thatour markers also have this property.

The goal of the following experiments is tobackup the theoretical optimality of our marker de-sign. More specifically, we want to show thateven under unfavorable viewing conditions, i. e.,other than those assumed during the derivations, themarkers still yield over-average detector response.What we would ideally expect is a detector responseinvariant to all these effects.

We used the following setup: A syntheticallygenerated marker, either SIFT or SURF, is placedcentered in front of a uniform background in an im-age of size 513×513 pixels. The initial size of theSURF marker is s=147 pixels, and σ=15 pixels inthe SIFT case. After applying different distortionsto the image, we run the respective feature detec-tor and measure the detector response. At the sametime, we follow the localization error with respectto ground truth, and if the marker is detected morethan 3 pixels off, we declare it undetected. This is arather strict rule compared to the used marker size.The distortions we have investigated are(a) scaling,(b) in-plane rotation,(c) out-of plane rotation and(d) image noise.

Figure 6 illustrates the applied distortions in thecase of the SURF marker. Note that for the depictedout-of-plane rotation, the detection of the markerdid not pass our 3 pixel accuracy test. Yet, the de-tection itself was apparently correct.

Figures 7 to 9 show the behavior of the responsevalues for selected examples. Our overall observa-

Page 6: Maximum Detector Response Markers for SIFT and · PDF fileMaximum Detector Response Markers for SIFT and SURF Florian Schweiger, Bernhard Zeisl, Pierre Georgel, Georg Schroth, Eckehard

0 50 100 150 200 2500

10

20

30

marker size σ in image (pixel)

SIFT

resp

onse

0 50 100 150 200 2500

5

10 x 104

marker size s in image (pixel)

SUR

F re

spon

se

Figure 7: Behavior of detector response under scal-ing of the marker.

tion is that the detector response in general stays ona fairly high level, i. e., close to the maximum valuereachable in the absence of distortion. Within theplotted ranges the 3 pixel threshold was never vio-lated (except for major angles in the out-of-planerotation scenario). However, there are some unex-pected effects which require further investigations.

A first interesting observation is the behavior ofthe response over the actual marker scale (Fig. 7).Both SURF and SIFT exhibit isolated ditches atmid-scales. Apparently, the spacing between neigh-boring scale values is adversely coarse in these re-gions. Apart from that, the two curves show the ex-pected behavior. At too small sizes, inferior to thesmallest detector operator, the marker does not getdetected at all. In the case of SURF, there is also anupper limit to the detectable marker size as, differ-ent from SIFT, there is a predefined number of scalesteps. This experiment suggests a minimum size forthe SURF marker of 12 pixels, and a maximum sizeof 210 pixels. In practice, typical sizes are likelynot to exceed 50 pixels as markers are preferred tobe as nonintrusive as possible.

Regarding the behavior with respect to in-planerotation, the results are convincing (Fig. 8). Eventhough there are minor variations in the detec-tor output, the overall response level remains con-stantly high. This was to be expected for the rota-tionally symmetric SIFT marker, because the usedresolution is sufficiently high to avoid severe alias-ing artifacts. It is remarkable in case of the SURF

0 10 20 30 40 50 60 70 80 900

10

20

30

angle (degrees)

SIFT

resp

onse

0 10 20 30 40 50 60 70 80 900

5

10 x 104

angle (degrees)

SUR

F re

spon

se

angle (degrees)

Figure 8: Behavior of detector response under in-plane rotation.

marker however, for which reduced detectability forrotation angles around 45◦ would seem inevitable.

For out-of-plane rotations (Fig. 9), the markershit the limits that are given by the feature detec-tors themselves. It is known that both detectorsdo not cope well with angles beyond some 40 de-grees. While the SIFT marker does well in termsof high detector output, it exceeds the 3 px local-ization accuracy test for angles greater than 35 de-grees. The SURF marker exhibits a significant re-sponse decrease and also deviates from ground truthmore than 3 pixels for angles superior to 25 degrees.

The behavior with respect to artificially addedimage noise is not shown here. It turns out thatboth marker variants are highly robust against noise.The performance degradation follows a linear de-scent and is only relevant for unnaturally high noiselevels with a standard deviation of at least 20% ofthe range of gray values.

4.2 Example Applications

Our markers are particularly useful wheneverhomogeneous, unstructured surfaces need to be”spiced up” with detectable features. Here, wetouch on two example applications .

Robotic Navigation Imagine an experimentalsetup where a solely vision-based robot is to nav-igate through the lab in order to fulfill a certaintask. Without a map or detectable markers in the

Page 7: Maximum Detector Response Markers for SIFT and · PDF fileMaximum Detector Response Markers for SIFT and SURF Florian Schweiger, Bernhard Zeisl, Pierre Georgel, Georg Schroth, Eckehard

0 10 20 30 40 50 600

10

20

30

angle (degrees)

SIFT

resp

onse

0 10 20 30 40 50 600

2

4

6

8 x 104

angle (degrees)

SUR

F re

spon

se

Figure 9: Behavior of detector response under per-spective distortion. The dotted red lines indicatethe limit above which the localization error ex-ceeds 3 px.

scene, the robot would certainly lose track. Assumethe task involves the recognition of certain objectswhich is based on the use of SURF features already.Such a prototypical scenario is the ideal applicationfor our markers: We lay out the path to be followedby simply lining it with markers the robot can detectwith its already built in functionalities.

Given the sensitivity to out-of-plane rotations ac-cording to our experiments from Section 4.1, wepropose a simple preprocessing step for this partic-ular scenario: As the height of the robot mountedcamera remains constant, we can assume that themarkers are always perceived perspectively dis-torted in the same way, i. e., they appear wider thanhigh. We found that stretching the image verticallyby a factor 2 (using bilinear interpolation) improvesmarker detection significantly.

As illustrated in Figure 10, we placed 14 8×8 cmmarkers on the floor and ran SURF with very lowresponse threshold on the depicted image. In to-tal, some 30000 features were detected. Neverthe-less, the markers derived in this paper were reli-ably found as those features with the highest detec-tor response values. Two of the markers were de-tected twice, each time with different orientation, sothere are 16 highest response features for 14 presentmarkers.

Figure 11 shows the response values in descend-ing order. The labels in Figure 10 relate to this rank-ing. A drop from feature 16 to feature 17 is clearly

1

3

4

5

6

7

8 9

10

12

13

14

15

16

1

2

3

4

5

6

7

8 9

10

1112

1314

15

16

Figure 10: Image captured with a Panasonic DMC-FX1 at resolution 2048×1536 pixels, after verticalupsampling by factor 2. The detected SURF mark-ers are labelled according to the ranking in Fig-ure 11.

visible., i. e., the markers are indeed uniquely de-tectable in the image. Figure 12 shows the simi-larity between the descriptor signature of detectedpoints and the signature of the dark marker tem-plate. Again, the markers clearly stand out. Notethat there are other features, e. g., the 56th, with sim-ilar signatures, but their rank, and equivalently thedetector response they triggered are much lower.

A property of this example setup that is worth-while mentioning is that since every detectedmarker comes with a scale assigned by the SURFalgorithm, the robot can already make first assump-tions about the marker distances without the needfor stereo vision and triangulation.

SIFT/SURF CAVE Another example applicationthat takes a similar line is to have a wallpaper orposter textured with the markers from Section 3. By

Page 8: Maximum Detector Response Markers for SIFT and · PDF fileMaximum Detector Response Markers for SIFT and SURF Florian Schweiger, Bernhard Zeisl, Pierre Georgel, Georg Schroth, Eckehard

0 10 20 30 40 50 600

0.5

1

1.5

2

2.5

3

3.5

4

4.5x 104 Ordered detector response values

feature rank

SUR

F de

tect

or re

spon

se

markersother features

Figure 11: The SURF detector responses from theexample in Figure 10 in descending order. The first16 values belong to the imaged markers.

0 10 20 30 40 50 600

0.2

0.4

0.6

0.8

1

1.2

1.4

feature rank

Euclidean distance to reference signature

Figure 12: Signature similarity for the SURF fea-tures from the example in Figure 10. For every fea-ture, the Euclidean distance between its signatureand the signature of the template marker is plotted.

simply putting up such posters, a computer visionlab can easily be turned into a ”SIFT/SURF CAVE”with dense feature points all over the walls.

Mixing light and dark versions of the marker,such posters can carry one bit per marker allowing,e. g., to encode information about different parts ofthe room. In principle, there are two ways to dis-tinguish both marker variants from each other, ei-ther based on marker signature or by means of theLaplacian of the respective marker, i. e., the sum ofsecond derivatives. For SIFT, it is precisely the de-tector output (1) that approximates the Laplacian.In the case of SURF, it is given by the trace of theHessian matrix introduced in (2). The Laplacian isstrictly negative for dark markers and strictly posi-tive for light ones, hence a simple thresholding is

sufficient for separation. We found that this ap-proach is more reliable than signature comparison,and we will use it in the following.

We illustrate the combination of markers usingarrays composed of three SURF markers each. Fig-ure 13 shows two such super-markers printed out onA4 paper and attached in our lab. The upper left cor-ner of the two-by-two arrays is left blank to have anobvious array orientation. This design allows eightdifferent marker arrays which could be used to tagstrategic points in the room.

We propose the following simple algorithm toidentify and decode these marker arrays. First, theSURF markers are uniquely detected in the imageby means of their outstanding detector response andsignature similarity. The feasibility of this step willbe demonstrated shortly. Second, those markers areidentified whose closest two neighbors satisfy cer-tain geometric constraints, namely (a) their respec-tive distances are similar and (b) they lie in roughlyperpendicular directions. This yields 3-marker clus-ters in linear time with regard to the number ofmarkers in place. Assuming that all the markerarrays are imaged close to upright, the topmostmarker of each triple is then defined to carry the firstbit, the leftmost one the second and the remainingmarker the third bit. Finally, the bit value of eachmarker is determined using the sign of its Laplacian(see Figure 15). This approach is supposed to illus-trate the combination of markers and can clearly beextended in a more sophisticated way.

Figure 14 shows the results of the SURF detec-tion step. Similarly to the robot navigation exam-ple, we examine the found features in terms of de-tector response and distance to the reference signa-ture from Figure 4. Every point in the figure cor-responds to a detected feature, and obviously, thesix features that belong to the markers form a clus-ter in an area with high detector response and lowsignature distance. As opposed to the previous ex-periment where only dark markers were deployed,we aim at simultaneously identifying both versionsof the SURF marker in this example. Therefore, wedecided not to work with the full marker signatures,but only those entries which are common to bothmarker variants. In the OpenCV implementationof SURF the corresponding descriptor dimensionsare 23, 24, 27, 28, 39, 40, 43 and 44. We call thisreduced representation version independent signa-ture (VIS).

Page 9: Maximum Detector Response Markers for SIFT and · PDF fileMaximum Detector Response Markers for SIFT and SURF Florian Schweiger, Bernhard Zeisl, Pierre Georgel, Georg Schroth, Eckehard

4 5

6 1

2 3

Figure 13: Two out of eight possible 3-bit markerarrays photographed with a Sony DSC-S85 at res-olution 2272×1704 pixels. The location of the sixSURF features which yielded highest detector re-sponse are displayed as green crosses.

The distribution of response-distance pairs inFigure 14 suggests the use of a more complex clas-sification algorithm than the current one which isprimarily based on detector response. Instead, anoblique separation line seems more suited to theproblem. In future work, it should be investigatedto what extent other linear or nonlinear classifica-tion approaches can improve marker identification.

5 Conclusion

We have presented our work on maximum detec-tor response markers for SIFT and SURF. We de-rived their optimal design theoretically, and pre-sented first experimental results. The markers wepropose are inexpensive and easy to use. It is suf-ficient to print them out and arbitrarily place themin the scene. No additional detection algorithms ontop of SIFT or SURF are required. The stated al-gorithms detect our markers under a wide range ofviewpoints, such that they can be uniquely identi-fied due to their high response and their discrimina-tive descriptor signature.

We plan to conduct more exhaustive experimentson real data in order to assess the markers’ perfor-mance quantitatively. Especially a sound evaluationof the localization error in natural applications willbe addressed in future work.

If you are interested in testing the markers we

0 0.5 1 1.5 2 2.5 3 3.5x 104

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

SURF detector response

Eucl

idea

n si

gnat

ure

dist

ance

markersother features

Figure 14: Detector response and VIS distance val-ues for all the SURF features extracted from the im-age in Figure 13. The markers form an apparentcluster.

1 2 3 4 5 6−1

0

+1Sign of the Laplacian

feature rank

Figure 15: The sign of the Laplacian allows to reli-ably distinguish the light (+1) and dark (-1) versionsof the SURF marker (compare with Figure 13).

proposed in this paper yourself, we invite you tovisit our homepage1 and download the markers inPDF format. Make sure you print them in highquality as this is essential for optimal results! Themarkers we used in the real world experiments fromSection 4.2 were printed on a HP Color Laser-Jet CP4005 at 1200 dpi using bright-white paper.Please feel free to share your experiences with us!

1http://www.lmt.ei.tum.de/florian/markers

Page 10: Maximum Detector Response Markers for SIFT and · PDF fileMaximum Detector Response Markers for SIFT and SURF Florian Schweiger, Bernhard Zeisl, Pierre Georgel, Georg Schroth, Eckehard

References

[1] H. Bay, T. Tuytelaars, and L. Van Gool. Surf:Speeded up robust features. Lecture Notes inComputer Science, 3951:404, 2006.

[2] G. Bradski and A. Kaehler. LearningOpenCV: Computer Vision with the OpenCVLibrary. O’Reilly Media, Inc., 2008.

[3] M. Fiala. ARTag, a fiducial marker systemusing digital techniques. In IEEE ComputerSociety Conference on Computer Vision andPattern Recognition, 2005. CVPR 2005, vol-ume 2, 2005.

[4] H. Kato and M. Billinghurst. Marker track-ing and HMD calibration for a video-basedaugmented reality conferencing system. In2nd IEEE and ACM International Workshopon Augmented Reality, 1999.(IWAR’99) Pro-ceedings, pages 85–94, 1999.

[5] H. Kato, M. Billinghurst, I. Poupyrev,K. Imamoto, and K. Tachibana. Virtual ob-ject manipulation on a table-top AR environ-ment. In IEEE and ACM International Sym-posium on Augmented Reality, 2000.(ISAR2000). Proceedings, pages 111–119, 2000.

[6] T. Lindeberg. Scale-space theory: A basic toolfor analyzing structures at different scales.Journal of applied statistics, 21(1):225–270,1994.

[7] D.G. Lowe. Distinctive image features fromscale-invariant keypoints. International Jour-nal of Computer Vision, 60(2):91–110, 2004.

[8] K. Mikolajczyk. Detection of local featuresinvariant to affine transformations. PhD the-sis, Institut National Polytechnique de Greno-ble, France, 2002.

[9] G. Turin. An introduction to matched fil-ters. Information Theory, IRE Transactionson, 6(3):311–329, 1960.

[10] A. Vedaldi. An open implementation of theSIFT detector and descriptor. Technical re-port, 070012, UCLA CSD, 2007.2.

[11] D. Wagner, T. Langlotz, and D. Schmalstieg.Robust and Unobtrusive Marker Tracking onMobile Phones. In 7th IEEE/ACM Interna-tional Symposium on Mixed and AugmentedReality, 2008. ISMAR 2008, pages 121–124,2008.

[12] D. Wagner and D. Schmalstieg. Artoolkit-plus for pose tracking on mobile devices. In

Proceedings of 12th Computer Vision WinterWorkshop (CVWW’07), pages 139–146, 2007.

[13] X. Zhang, S. Fronz, and N. Navab. Visualmarker detection and decoding in ar systems:A comparative study. In Proceedings of the1st International Symposium on Mixed andAugmented Reality. IEEE Computer SocietyWashington, DC, USA, 2002.