
3D real-time positioning for autonomous navigation using a nine-point landmark

Alberto Martín, Antonio Adán

Escuela Superior de Informática, Universidad de Castilla-La Mancha, 13071 Ciudad Real, Spain

Article info

Article history:
Received 29 July 2010
Received in revised form 4 April 2011
Accepted 11 July 2011
Available online 23 July 2011

Keywords: Camera pose; Autonomous navigation; Camera calibration; Landmarks; Occlusion; Augmented reality


Abstract

The objective of this paper is to propose a new monocular-vision strategy for real-time positioning applications. This is an important aspect whose solution is still necessary in many autonomous landmark-based navigation systems that run in non-controlled environments. The method is based on the analysis of the properties of the projected image of a single pattern consisting of eight small dots belonging to the vertices of an octagon and one more dot in the centre of it. The paper discusses how the pose is calculated by using the parameters of the ellipse that best fits the dots of the pattern and the relative position of the dots in it. The first part of this document provides a qualitative comparison with regard to other similar approaches. The method presented here has several notable properties. Firstly, the pattern can be easily recognized in the image and, more importantly, works under occlusion and noise circumstances. Secondly, it is capable of working in real-time conditions and deals with ranges of 30–700 cm. Finally, the method can be used in different applications (mobile robots and augmented reality systems) and yields better accuracy than others. An extensive report on experiments conducted in real situations is shown at the end of the paper. In order to make our method more compelling, an experimental comparison under occlusion and noise conditions is also made with one of the most widely used pose solutions, the ARToolkit method.

© 2011 Elsevier Ltd. All rights reserved.

1. Introduction

One of the key points in autonomous navigation systems consists of obtaining an accurate camera pose as quickly as possible. In other words, the observer's position and orientation with regard to the world coordinate system must be calculated in real-time.

Despite the existence of positioning and tracking systems in controlled environments (for example, technologies based on inertial devices or on networks with a signal receiver/emitter), new solutions must be found for positioning in autonomous systems. This issue consequently continues to be an open research field in which innovative solutions are suggested every year.

In the context of non-controlled environments, it is possible to find a multitude of published papers which concern the well known pose estimation problem, along with a considerable amount of theoretical solutions that were proposed many years ago, including those based on the three-point pose technique [22]. However, very few proposals deal with the problem in an extensive manner by considering a complete set of aspects such as: performance under occlusion, robustness under noise, distance range, accuracy, real-time performance, landmark recognition processes, applications and so on. Fewer still provide either experimental information about all these parameters or experimental comparisons with other methods. The aim of this paper therefore is to present a new pose approach, to provide complete information about the method's performance, to clearly show the results under occlusion and noise and to compare the method with others.

From a theoretical point of view, analytical solutions of the PnP (Perspective-n-Point) problem, and particularly the P3P problem (also denominated the triangle pose problem), have been studied by numerous authors for many years [22,24,26]. It is well known that there are at most four algebraic solutions [24] for this case. Extensions with four and five points, which eliminate P3P ambiguity, have also been used by other authors [25]. Suffice it to say that critical configurations exist in which the pose remains ambiguous, such as when the projection centre is coplanar with any three points. Pose solutions from redundant data have also been proposed [27,28,29]. These approaches rely on iterative methods which suffer from initialization and convergence problems. These techniques, which are based on P3P solutions, do not therefore provide a single solution and, more importantly, are inefficient when one or more points are occluded in the image. The lack of methods that directly provide a single pose solution for the redundant data case has motivated us to develop a new solution, which is presented in this paper.


Pose techniques based on line segments, polygons and curves rather than points have also been developed, mainly in computer vision. In this section we compare some of the principal techniques published over the past few years that provide a single pose solution based on different strategies to those of PnP, considering the parameters presented above and identifying their advantages and limitations. Table 1 provides a summary of the essential characteristics of over 20 systems and methods that have appeared in journals and conferences. In order to compare our system with others, our proposal (9-Dots) is shown in the final row of this table. 'NR' (not reported) appears in those cases in which the authors did not include any explicit information. In other cases it was necessary to simplify the results by computing average values. We also deduced or estimated certain characteristics on the basis of the information included in each paper. The properties included in Table 1 are concerned with the following aspects.

Column C1 ('Landmark') concerns the pattern proposed by each author. Note that several pose methods use natural landmarks [8–11,6,1] and that others define specific artificial patterns. Column C2 ('Image processing') evaluates whether there is a specific image processing technique with which to recognize the landmark in the image. For example, if the process consists of usual algorithms, such as those used to recognize rectangles [13,14], circles (9-Dots) or barcodes in the image [12], we have included the word 'No', signifying 'No specific image processing'. However, the majority of authors do not provide information about this issue. The next column ('Features') shows the features that are defined in the image in order to later calculate the pose. The 'Dynamic' column is used to indicate whether or not the method can work in moving environments.

One of the most discriminant aspects of this qualitative comparison concerns column C5 ('Occlusion'). The majority of authors do not make any reference to the performance of their methods when the landmark is occluded [1–8], and none of them provides quantitative occlusion information. The authors that consider occlusion circumstances assume that the landmark is partially occluded but in a non-critical sense. These cases therefore correspond to non-severe occlusion. Natural features [9–11], artificial landmarks [12–14] or both [15] are used. Some authors simply mention that the system works under occlusion but do not prove this properly [14], whereas others [10,11,16] deal with the problem in depth. In [17] the authors present a self-tracker that fuses data from inertial and vision sensors. In [12], the patterns designed consist of a common vertical barcode and a single horizontal barcode that distinguish them from others. In the tests performed, all the landmarks are correctly detected in spite of partial occlusions. In [13], several patterns appear in the scene. Thus, if one of them is partially occluded, the pose estimation can be easily handled by using any of the non-occluded patterns in the image. In [18], a robust pose method analyses the geometric distortions of the objects when subjected to changes in position. As will be demonstrated in Section 5, the '9-Dots' method works under 40% occlusion without considering the pose of earlier frames.

Column C6 ('Noise') shows which methods have been tested under noise conditions. Almost none of the methods have been tested under noise, with the exception of [1,5,9,16].

With regard to the adaptability of the pattern to be used in wide distance ranges, column C7 ('Distance') shows that most of the methods referenced are designed for indoor environments and work with short and constant ranges. However, very little information is provided in these terms. Exceptions can be found in [4], in which the camera itself establishes a variable range from 0.85 to 1.7 or 3.3 m, and in [7], in which a tracking procedure works for distances from 50 cm to 5 m, depending on the multi-ring fiducial diameter. The 9-Dots technique can work from 30 to 700 cm.

The next column, 'FPS', provides information about how fast the method is capable of working, since fast image processing may be the key factor when working under real-time requirements. Several authors provide detailed information about performance time [4,5,7,10–12,14,15,18], but only a few of them state what the image size is. In [15] the authors use an artificial landmark placed in front of the camera at start-up and natural features to track and map the environment. The authors of [19] present a method for image patch recognition for pose estimation based on ferns. Most of the systems work in real-time conditions: specific rates are 36–81 fps (frames per second) [12], 30 fps [5,15,17], 10 fps [14], 7.3–8.1 fps [7], 15–25 fps [10,11] and 15 fps [20]. The size of the image, the kind of camera and the image processing may influence the final rate of the pose system. For example, in [4], the system works between 4.2 and 7.2 fps, depending on the kind of camera chosen. In [7], the performance of the system depends on the number and size of potential fiducials in the image. The 9-Dots method can operate at up to 35 fps with 640×480 frames on a desktop PC.

A further subject of interest corresponds with the 'Accuracy' column. Many authors do not provide any information concerning accuracy [8,14,12,9,15,13]. Others provide relative measurements, giving error percentages [10,11,1], absolute position errors [2–5,17] or both (9-Dots). In this case, only relative accuracy can be taken if a reliable comparison is to be made. The 9-Dots method has an average error of 2.6%.

Finally, the last column, C10, is concerned with the environment in which the pose technique has been tested. In this respect, we can find methods which have not been tested in any specific environments [6,1,2,15,18,19], and others which were used for mobile robot/vehicle positioning [5,4,12,3,9] or for augmented reality applications [7,10,11,14,13,21]. The 9-Dots approach has been tested for both mobile robot positioning and augmented reality applications.

The document presented here can be considered as a reviewed, extended and improved version of an earlier publication [21]. This was an initial and incomplete pose solution developed under considerable restrictions. Some of these limitations were as follows:

1. The initial solution only worked in ideal conditions in which occlusion and noise were not permitted. This is a highly important issue which has now been dealt with in the reviewed version.
2. The method did not assume that the optical axis deviated from the pattern's centre. In this case, as we explain in the text, the ellipse changes a little with regard to the centred ellipse for reasons of perspective.
3. There were neither theoretical nor experimental comparisons with other methods.
4. Only a simple report for mobile robot positioning applications using a pan/tilt controlled camera was presented in the paper. This device allowed the pattern to be centred within the image so that misalignment of the optical axis was avoided.
5. The method worked solely under static requirements for only one frame of the scene. In the experimentation presented, the robot stopped before taking an image of the pattern.

All these points have been taken into consideration and solved in the version presented herein. The principal aspects to be considered in this publication are as follows. Firstly, this paper provides a complete and more extensive explanation of the method itself. This is an essential point if other researchers wish to understand and reproduce the method. Section 1, which presents a specific comparison between our approach and the majority of the most similar methods, is new in this document. New references and tables of comparison have been incorporated. With regard to the pose solution itself, this version reformulates the initial version, thus making the method more robust and applicable.


Table 1. Comparative study of pose estimation systems. For each system, columns C1–C10 report: Landmark, Image processing, Features, Dynamic, Occlusion, Noise, Distance, FPS, Accuracy and Applications ('NR' = not reported).

Jang (2005) [5]. Landmark: Bi-colour pattern (rectangular) and a black line over it. Image processing: NR. Features: Relative position of the line with regard to the colour pattern. Dynamic: Yes. Occlusion: NR. Noise: Yes, Gaussian noise added produces 0.31 standard deviation. Distance: NR. FPS: Yes, 30 fps, 10 cm/s in real time. Accuracy: Mean error 7 cm with added noise. Applications: Autonomous robot.

Neumann (1999) [7]. Landmark: Multi-ring colour pattern. Image processing: NR. Features: Indoor: concentric rings; outdoor: natural features. Dynamic: Yes, a fusion of inertial gyro and vision. Occlusion: NR. Noise: NR. Distance: 1st level: 1.5–3.7 ft, 2nd level: 3.0–7.4 ft, 3rd level: 5.9–14.8 ft. FPS: 1st level: 6.3 fps, 2nd level: 7.3 fps, 3rd level: 8.1 fps. Accuracy: Better than Horn and Schunk, Lucas–Kanade, Anandan, Fleet and Jepson. Applications: HMD.

Xu (2008) [8]. Landmark: Natural. Image processing: No, correspondence algorithms. Features: Natural features. Dynamic: Yes, uses pre-captured frames. Occlusion: NR. Noise: NR. Distance: Outdoor distances. FPS: No. Accuracy: NR. Applications: Arbitrary scene.

Vachetti (2004) [10,11]. Landmark: Natural. Image processing: No. Features: NR. Dynamic: Yes. Occlusion: Yes, the system does not work for severe occlusion. Noise: NR. Distance: NR. FPS: 25 fps (320×240), 15 fps (640×480) using pre-processed key frames. Accuracy: 3% pose error. Applications: Single camera, augmented reality.

Koller (1997) [14]. Landmark: Eight rectangular patterns attached to a wall. Image processing: No. Features: Search for the corners of rectangular patterns. Dynamic: Robustness and accuracy within an AR indoor application. Occlusion: NR. Noise: NR. Distance: NR. FPS: Yes, 10 Hz. Accuracy: NR. Applications: Augmented reality applications.

Fiala (2004) [4]. Landmark: Eighteen linear encoded patterns, similar to a barcode. Image processing: Yes, 360° field of view. Features: Vertical edges and absence of horizontal edges, sequences of '0' and '1'. Dynamic: Robust even with low resolution. Occlusion: NR. Noise: NR. Distance: Panoramic cameras: 0.85–3.3 m; analog cameras: 0.85–1.7 m. FPS: No, quasi-real time, 4.2–7.2 fps, depending on camera. Accuracy: Average error from 9 to 15.1 cm. Applications: Robot.

Briggs (2000) [12]. Landmark: Self-similar landmark pattern with barcode. Image processing: No, intensity distribution and barcode decoding. Features: Patterns are p-similar in one direction and constant in the other. Dynamic: Yes, reliable under varying lighting conditions and distances. Occlusion: Yes. Noise: NR. Distance: NR. FPS: Yes, 36 fps (640×480), 81 fps (320×240). Accuracy: NR. Applications: Mobile robots.

Feng (2008) [3]. Landmark: 3 landmarks on a horizontal line. Image processing: Fisheye lens, special image processing techniques. Features: The view angle to the landmarks. Dynamic: Yes. Occlusion: NR. Noise: NR. Distance: NR. FPS: Yes, the response time delay is low. Accuracy: From 3 to 25 cm, with critical area. Applications: Autonomous guided vehicles.

Josephson (2007) [6]. Landmark: Natural. Image processing: NR. Features: Hybrid features. Dynamic: No, bad accuracy. Occlusion: NR. Noise: NR. Distance: NR. FPS: No, static technique. Accuracy: 64% 2D–2D solver and 43% 2D–3D solver accuracy. Applications: NR.

Se (2002) [9]. Landmark: Natural. Image processing: Yes, SIFT algorithm. Features: Image features. Dynamic: Yes. Occlusion: NR. Noise: Yes, accuracy affected. Distance: NR. FPS: No, high processing required. Accuracy: NR. Applications: Robot.

Cobzas (2009) [1]. Landmark: Natural. Image processing: No. Features: Intensity differences. Dynamic: Yes. Occlusion: No. Noise: No, it does not handle illumination variation. Distance: NR. FPS: Yes, 3D position based on pre-computed plane equations. Accuracy: 9% pose error. Applications: NR.

Duan (2008) [2]. Landmark: Orthogonal. Image processing: No. Features: Trapezium. Dynamic: Yes. Occlusion: No. Noise: NR. Distance: NR. FPS: Yes. Accuracy: 7 cm, no occlusion. Applications: NR.

Hager (1998) [18]. Landmark: Parametric models. Image processing: Yes, low resolution algorithms. Features: Parametric models. Dynamic: Yes, for reasonable deviations. Occlusion: Yes, partial. Noise: NR. Distance: NR. FPS: Yes (30 Hz). Accuracy: NR. Applications: NR.

Kato (2000) [13]. Landmark: Square landmark. Image processing: No. Features: High contrast difference. Dynamic: Yes. Occlusion: Yes, multiple landmarks required. Noise: NR. Distance: NR. FPS: Yes. Accuracy: NR. Applications: HMD.

Davison (2007) [15]. Landmark: Known start-up target. Image processing: NR. Features: SIFT. Dynamic: Yes. Occlusion: Yes. Noise: NR. Distance: 0.5–5.0 m. FPS: Yes, 30 fps. Accuracy: 1–2 cm jitter levels. Applications: Unknown scenes.

Lepetit et al. (2005) [16]. Landmark: Keypoint. Image processing: NR. Features: NR. Dynamic: Yes. Occlusion: Yes, soft occlusion. Noise: Yes, it disturbs the classification task. Distance: NR. FPS: No, 25 fps. Accuracy: Faster than SIFT when too much perspective distorts the object view. Applications: NR.

Ozuysal et al. (2007) [19]. Landmark: Keypoint. Image processing: NR. Features: NR. Dynamic: Yes. Occlusion: Yes. Noise: NR. Distance: NR. FPS: Yes, 50 fps. Accuracy: Better than SIFT. Applications: NR.

Wagner et al. (2008) [20]. Landmark: Known start-up target. Image processing: NR. Features: NR. Dynamic: Yes. Occlusion: Yes. Noise: NR. Distance: NR. FPS: Close to real time. Accuracy: It depends on the mobile phone. Applications: AR in future.

Foxlin and Naimark (2003) [17]. Landmark: Artificial. Image processing: NR. Features: NR. Dynamic: Yes. Occlusion: Yes, line-of-sight occlusion. Noise: NR. Distance: NR. FPS: Yes, 60 fps. Accuracy: 100 fiducials at a time. Applications: HMD.

9-Dots. Landmark: Nine dots. Image processing: No. Features: Centre of dots' coordinates. Dynamic: Yes. Occlusion: Yes, for severe occlusion accuracy decreases. Noise: Yes, 1% relative error. Distance: 30–700 cm. FPS: Yes, 30 fps at 640×480. Accuracy: 2.6% pose error. Applications: HMD and mobile robots.



Several changes have been made in order to establish a clearer solution. In this respect, we deal with the occlusion and pattern misalignment problems, proposing new solutions. The calculation of the optical axis deviation is also tackled. None of these subjects have ever been considered previously. With regard to experimentation, we devote a section to presenting the performance of the method in occlusion circumstances and for augmented reality applications. Moreover, an extensive and sound experimental comparison with the widely used ARToolkit technique is presented. Three cases are considered in this comparison: performance with and without occlusion and performance under noise conditions. We prove that our system is more accurate, robust and efficient in all three cases.

2. Pose parameters and camera model

Before presenting the pose calculation, we shall first outline the framework, the general pose strategy and the parameters to be calculated.

Let us assume a navigation entity – which in our case could be a mobile robot or a human wearing an AR system – with an integrated camera which aims to obtain the 3D pose through the closest landmark located in the scene. We have illustrated this situation in Fig. 1, which contains our 9-Dots pattern. The reference systems to be considered are the following: world reference system (Sw), navigation entity reference system (Sh), camera reference system (Sc), image reference system (Se), computer reference system (Ss) (which is the digital image reference system) and landmark reference system (S0). Note that the relationship Sw/S0 is imposed when the landmark is positioned in a specific place and that Sh/Sc is established by the users themselves. Moreover, the relationships Sc/Se and Se/Ss are established by the intrinsic calibration of the camera. The pose problem is consequently reduced to finding the transformation S0/Sc, which varies as the navigation entity moves.

Fig. 1. The pattern and the reference systems: parameters ψ, φ, θ and D′ in the pattern reference system. Here, the navigation entity corresponds with a user wearing a portable AR system.

Fig. 2. Ellipse fitted to the external spots from different viewpoints.

The autonomous procedure presented in this paper is based on the fact that, for any view of the nine-dot pattern, the outer dots of the pattern belong to an ellipse E which alters as the camera moves (see Fig. 2). The Hough transform algorithm is used to obtain all possible ellipses in the image, and the landmark is then identified by means of the size and relative position relationships between the dots in the pattern. The geometric analysis of E and the location of the dots in it allow the angular parameters swing (ψ), tilt (φ) and pan (θ) to be extracted, along with the distance D′ between the origin of S0 and the image plane of the camera. From here on, we shall denote the dots (or the centres of the dots) as Pi, i = 1, 2, ..., 9. See Fig. 1 for the pattern reference system and the parameters.

Changes in the user's position cause changes in ellipse E. Variations in ψ thus cause the rotation of the major axis of the ellipse in the image; changes in parameter φ imply changes in the ellipse eccentricity; when θ ≠ 0, dots {P1, P3, P5, P7} are located outside the axes of the ellipse; and, finally, a variation in D′ makes the length of the major axis of the ellipse change, following a quasi-linear relationship.

Let (Oo, Xo, Yo, Zo) be the pattern reference system (previously called S0) and let (Oe, Xe, Ye, Ze) be the image reference system. Origin Oe is located in the centre of the image plane – where the CCD matrix is placed – whereas Ye is aligned with the optical axis. If we assume that the optical axis passes through Oo, then we call Oc the optical centre, D the distance OcOo, f the focal length of the camera (which is the distance OcOe) and D′ the distance OeO0.

The centres of the nine spots are labelled as {P1, P2, P3, P4, P5, P6, P7, P8, P9}. Points P1 to P8 are the centres of black or coloured spots (depending on the application) of the same size, whereas P9 is the centre of a double-sized black spot. Note that in the pattern, P1 is on the Z0 axis and P9 and O0 are coincident.

The computer reference system (Os, Xs, Zs) gives the coordinates in pixels of the points in the digital image. We assume that the pattern is centred on the image and that the optical axis is therefore aligned with the line O0Oe. As will be explained in Section 6, this requirement is satisfied in our systems. In the application with mobile robots, a visual-servoing procedure is continuously executed until the pattern is centred on the image; in the AR application it is possible for users themselves to centre the image.

Fig. 3. Set of transformations between reference systems S0 and Se.

Fig. 4. Relationship between the camera-pattern distance and the major axis of the ellipse E.

The camera model presented in Fig. 1 is used to establish the transformation between the coordinates of a generic point P for reference systems S0 and Se. Formally:

$$P_e = M\,P_o \quad (1)$$

$$M = R_{Y_0'''}(\psi)\cdot T(D')\cdot R_{Y_0''}(\pi/2)\cdot R_{X_0'}(\phi)\cdot R_{Y_0}(-\theta) \quad (2)$$

where the Euler rotations are performed over the axes $Y_0$, $X_0'$, $Y_0''$ and $Y_0'''$ of the successively rotated frames, and $T$ corresponds to a translation along axis $Y_0'''$. Fig. 3 illustrates the set of transformations of Eq. (2). The following section explains how the parameters ψ, φ, θ and D′ are obtained from a geometric analysis of ellipse E in the image.
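As an illustration, the composite transformation of Eq. (2) can be assembled from standard 4×4 homogeneous rotation and translation matrices. The sketch below simply mirrors the factor order of Eq. (2); the helper names and the exact axis conventions of the primed frames are our assumptions, not the authors' implementation.

```python
import numpy as np

def rot_x(angle):
    """Homogeneous rotation about the X axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[1, 0, 0, 0],
                     [0, c, -s, 0],
                     [0, s, c, 0],
                     [0, 0, 0, 1.0]])

def rot_y(angle):
    """Homogeneous rotation about the Y axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, 0, s, 0],
                     [0, 1, 0, 0],
                     [-s, 0, c, 0],
                     [0, 0, 0, 1.0]])

def trans_y(d):
    """Homogeneous translation along the Y axis."""
    T = np.eye(4)
    T[1, 3] = d
    return T

def pose_matrix(psi, phi, theta, d_prime):
    """Composite transformation M of Eq. (2), applied as P_e = M @ P_o."""
    return rot_y(psi) @ trans_y(d_prime) @ rot_y(np.pi / 2) @ rot_x(phi) @ rot_y(-theta)
```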

3. Pose solution

3.1. Camera-landmark distance

In this section a set of geometrical relationships is established in which the camera-pattern distance D, the focal length of the camera f and the major axis a appear. Fig. 4 illustrates how the major axis of the ellipse projected in the image plane corresponds to the projection of the segment AB. This segment is obtained after intersecting the hypothetic circle C, which passes through all the eight external dots of the pattern, with the plane P, which is parallel to the image plane and passes through O0. Note that triangles (O0AOc) and (OeA′Oc) are then equivalent, and Eq. (3) is verified. According to this equation the major axis of the projected ellipse depends solely on the distance from the camera to P9, and the camera-pattern distance can consequently be calculated through the focal length and the major axis:

$$D = \frac{Rf}{a} \quad (3)$$

Note that, like D, f and R, a is also measured in millimetres. This expression proves that the camera-pattern distance does not depend on the orientation of the camera.

Since we know the position of the CCD matrix in the camera, the distance D′ can be measured experimentally. Therefore, in practice, D′ rather than D will be used as a camera viewing parameter.

Bearing in mind that in practice D ≫ d, d being the distance from the optical centre of the lens to the image plane, f is a good approximation for d, and the next equation is verified:

$$D' \simeq D + f \quad (4)$$

Eqs. (3) and (4) can thus be used to estimate D′ as follows:

$$D' \simeq \frac{f}{a}R + f \quad (5)$$

In order to calculate the major axis a (which is in millimetres in Eq. (5)), we first establish a relationship between the distances in the computer and image coordinate systems. We initially model this relationship through the parameter $K_{mm}$, this being $a = a_s K_{mm}$. Consequently,

$$D' \simeq \frac{f}{a_s K_{mm}}R + f \quad (6)$$

The conversion factor $K_{mm}$ is empirically established by taking a set of images of the patterns from m known positions $D'_i$, i = 1, ..., m. For each image, we obtain the major axis $a_{s_i}$ (in pixels) and extract the conversion factor $k_{mm_i}$ using Eq. (7), which is obtained directly from Eq. (6). The mean value is eventually taken as the definitive conversion factor $K_{mm}$:

$$k_{mm_i} = \frac{fR}{(D'_i - f)\,a_{s_i}}, \qquad K_{mm} = \frac{1}{m}\sum_{i=1}^{m} k_{mm_i} \quad (7)$$
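The calibration of K_mm in Eq. (7) reduces to averaging one estimate per calibration image. A short sketch, with NumPy as an assumed dependency and placeholder argument names:

```python
import numpy as np

def calibrate_kmm(distances_mm, major_axes_px, f_mm, R_mm):
    """Estimate the pixel-to-mm conversion factor K_mm (Eq. (7)).

    distances_mm  : known camera-pattern distances D'_i for the m calibration shots
    major_axes_px : major axis a_s_i (in pixels) of the ellipse fitted in each shot
    f_mm, R_mm    : focal length and pattern radius in millimetres
    """
    distances_mm = np.asarray(distances_mm, dtype=float)
    major_axes_px = np.asarray(major_axes_px, dtype=float)
    k = f_mm * R_mm / ((distances_mm - f_mm) * major_axes_px)   # k_mm_i per image
    return k.mean()                                             # K_mm as the mean value
```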

The major axis has different notations in Eqs. (5), (6) and (7) because the significance of the major axis is different in each case. In Eq. (5), a is the major axis in millimetres in the image plane of the camera; in Eq. (6), $a_s$ signifies the major axis in pixels in the image (computer coordinate system (Ss)); and in Eq. (7), $a_{s_i}$ signifies the major axis extracted in the i-th image (again in the computer reference system) during the process in which the parameter $K_{mm}$ is calculated.

Owing to the distortion of the ellipse with the vision angle φ, the relationship between D′ and $a_s$ is further modelled by using a non-linear function F(D′) which depends on D′. Formally:

$$D' = \frac{f}{a_s F(D')} R + f \quad (8)$$

where the function F(D′) is empirically adjusted to an exponential function:

$$F(D') = p_1 + p_2 D'^{\,p_3} \quad (9)$$

Upon grouping these two expressions, we find

$$D' = \frac{f}{a_s\left(p_1 + p_2 D'^{\,p_3}\right)} R + f \quad (10)$$

Since this equation is not analytically solvable in D′, an iterative algorithm is run by imposing a convergence threshold ε and using the parameter $K_{mm}$ as an initial solution. The final algorithm which calculates D′ is as follows:

e = ε + 1
D′_old = (f / (a_s K_mm)) R + f
while e > ε:
    F(D′_old) = p1 + p2 (D′_old)^p3
    D′_new = (f / (a_s F(D′_old))) R + f
    e = |D′_old − D′_new|
    D′_old = D′_new
end
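A minimal Python sketch of this fixed-point iteration is given below; f, R, K_mm and the coefficients p1–p3 are assumed to come from the calibration steps described above.

```python
def estimate_distance(a_s, f, R, K_mm, p1, p2, p3, eps=1e-3):
    """Iteratively estimate the camera-pattern distance D' (Eq. (10)).

    a_s  : major axis of the fitted ellipse in pixels
    f    : focal length in mm
    R    : radius of the circle through the outer dots, in mm
    K_mm : pixel-to-mm conversion factor used as the initial solution
    p1, p2, p3 : empirically fitted coefficients of F(D') = p1 + p2 * D'**p3
    """
    d_old = f / (a_s * K_mm) * R + f          # initial solution from Eq. (6)
    while True:
        F = p1 + p2 * d_old ** p3             # non-linear correction, Eq. (9)
        d_new = f / (a_s * F) * R + f         # refined estimate, Eq. (10)
        if abs(d_old - d_new) < eps:          # convergence threshold
            return d_new
        d_old = d_new
```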

3.2. Angular parameters

Camera location and orientation are known after calculating the angular parameters ψ, φ, θ and the distance D′. If we assume that all the dots in the pattern are viewed in the image and that {P1, P2, P3, P4, P5, P6, P7, P8, P9} are the centroids of the dots, the coefficients A, B, C, D, E (taking coefficient F = 1) of the general equation of a conic can be calculated by solving the overdetermined system:

$$Z H = I \quad (11)$$

where

$$H = \begin{pmatrix} A & B & C & D & E \end{pmatrix}^{T} \quad (12)$$

$$Z = \begin{pmatrix} x_{s,i}^2 & x_{s,i} z_{s,i} & z_{s,i}^2 & x_{s,i} & z_{s,i} \end{pmatrix},\quad i = 1, 2, \ldots, 8 \quad (13)$$

$(x_s, z_s)$ being the computer coordinates and I being the 8×1 unit vector.

Once Eq. (11) has been solved, the swing angle, which corresponds to the angle ψ between the major axis of the ellipse and the vertical reference axis Zs, can be calculated from Eqs. (14) and (15):

$$\text{If } A \neq C,\qquad \psi = \frac{1}{2}\,\mathrm{atan}\!\left(\frac{B}{A-C}\right) \quad (14)$$

$$\text{If } A = C,\qquad \psi = 45^{\circ} \quad (15)$$

The coordinates of the centre of the fitted ellipse in the ψ-rotated reference system and the major and minor axes can then be obtained. If we assume the ψ-rotated reference system, the general equation of the ellipse is now

$$A'x'^2 + C'z'^2 + D'x' + E'z' + F' = 0 \quad (16)$$

in which

$$\begin{aligned}
A' &= A\cos^2\psi + B\sin\psi\cos\psi + C\sin^2\psi\\
B' &= B\cos 2\psi - (A-C)\sin 2\psi\\
C' &= A\sin^2\psi - B\sin\psi\cos\psi + C\cos^2\psi\\
D' &= D\cos\psi + E\sin\psi\\
E' &= E\cos\psi - D\sin\psi\\
F' &= F
\end{aligned} \quad (17)$$

If A′C′ > 0 and A′ ≠ C′, the normal equation of the ellipse is

$$\frac{(x' + D'/2A')^2}{M/A'} + \frac{(z' + E'/2C')^2}{M/C'} = 1,\qquad M = \frac{D'^2}{4A'} + \frac{E'^2}{4C'} - F' \quad (18)$$

The computer coordinates of the centre and the major and minor ellipse axes are therefore

$$x'_s = -\frac{D'}{2A'},\qquad z'_s = -\frac{E'}{2C'} \quad (19)$$

$$a = \sqrt{\frac{M}{A'}},\qquad b = \sqrt{\frac{M}{C'}}$$
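The chain from the dot centres to ψ, the ellipse centre and the axes a and b (Eqs. (11)–(19)) can be sketched as a small least-squares fit. The snippet below is a sketch rather than the authors' implementation: the sign convention of the conic is chosen so that the fit is self-consistent with F = 1, and arctan2 is used to resolve the quadrant of ψ.

```python
import numpy as np

def fit_ellipse_pose_params(pts):
    """Fit a conic to the outer dot centres and return swing angle, centre and axes.

    pts : array of shape (N, 2) with columns (x_s, z_s) in computer coordinates,
          N >= 5 (eight outer dots in the paper). Follows Eqs. (11)-(19).
    """
    xs, zs = pts[:, 0], pts[:, 1]
    # Overdetermined system Z H = I with F = 1 (Eqs. (11)-(13))
    Z = np.column_stack([xs**2, xs * zs, zs**2, xs, zs])
    rhs = -np.ones(len(pts))                    # A x^2 + B xz + C z^2 + D x + E z = -F
    A, B, C, D, E = np.linalg.lstsq(Z, rhs, rcond=None)[0]
    # Swing angle (Eqs. (14)-(15))
    psi = np.pi / 4 if np.isclose(A, C) else 0.5 * np.arctan2(B, A - C)
    # Coefficients in the psi-rotated frame (Eq. (17))
    c, s = np.cos(psi), np.sin(psi)
    A2 = A * c**2 + B * s * c + C * s**2
    C2 = A * s**2 - B * s * c + C * c**2
    D2 = D * c + E * s
    E2 = E * c - D * s
    F2 = 1.0
    # Centre and semi-axes (Eqs. (18)-(19))
    M = D2**2 / (4 * A2) + E2**2 / (4 * C2) - F2
    centre = (-D2 / (2 * A2), -E2 / (2 * C2))
    a, b = np.sqrt(M / A2), np.sqrt(M / C2)
    return psi, centre, a, b
```

The tilt then follows from Eq. (21) as φ = arccos(b/a), and the distance D′ from Eq. (5) using the major axis a.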

Fig. 5 shows the relationship between the tilt angle and the minor axis of the ellipse fitted to the dots:

$$\frac{b}{f} = \frac{R\cos\phi}{D - R\sin\phi} = \frac{\cos\phi}{(D/R) - \sin\phi} \quad (20)$$

If we assume that the term D/R ≫ sin φ, and take Eq. (3) into consideration, we obtain

$$\phi = \mathrm{acos}\!\left(\frac{Db}{Rf}\right) = \mathrm{acos}\!\left(\frac{b}{a}\right) \quad (21)$$

Note that the tilt angle is therefore related to the eccentricity of the ellipse, e. Values of φ are in the interval [0°, 90°]. When e = 0, φ = 0 and the points are fitted to a circle, whereas when e = 1, φ = 90° and the ellipse is converted into a segment. If the tilt angle does not vary, the ellipse will keep the same shape although the size and orientation may change (see Fig. 5b). Fig. 5c illustrates several frames of the pattern from different positions, in which φ and the values of eccentricity are maintained in each case.

Finally, the pan parameter is obtained from the computer coordinates of dots P1, P5, P3 or P7 in the ellipse coordinate system. Table 2 summarizes all the information and equations needed to calculate the pan parameter. The pattern homogeneous coordinates are shown in column S0. Column Se corresponds to the xe and ze image coordinates, which are obtained from the transformation M between the Se and S0 systems (see Eqs. (1) and (2)). Bearing in mind the relationship xe/ze = xs/zs between image and computer coordinates, we eventually obtain the expression of parameter θ, which is shown in the last column.

Note that when ψ ≠ 0, the non-rotated coordinates x′s and z′s must be substituted in Table 2. Bear in mind that all the equations in the last column yield indeterminate values of θ. This problem can be solved by knowing the quadrant (in the system S0) in which the camera is placed. This quadrant can be found through the sign of ψ and the position of the point P9 on the minor axis.

It can be proved that, for projective reasons, P9 is displaced with regard to the theoretical ellipse centre. The distances d1 and d2 from P9 to the ellipse in the minor axis direction are consequently different. We can therefore infer whether the pattern is viewed from the left (case d1 > d2) or from the right (case d1 < d2). Fig. 6 illustrates the appearance of the ellipse depending on the quadrant and the applied correction of parameter θ.

Fig. 5. (a) Relationship between the tilt angle and the minor axis of the ellipse. (b) Appearance of E when φ varies. (c) Maintaining the value of angle φ from several camera positions (two examples). The eccentricity of the ellipse is included in the image.

Table 2. Obtaining parameter θ from the positions of P1, P5, P3 and P7 in the ellipse.

Dot | S0 | Se | θ
P1 | (0, 0, −R, 1)_S0 | x_e1 = −R cos θ, z_e1 = −R cos φ sin θ | θ = atan(z_s1 / (x_s1 cos φ))  (22)
P5 | (0, 0, R, 1)_S0 | x_e5 = R cos θ, z_e5 = R cos φ sin θ | θ = atan(z_s5 / (x_s5 cos φ))  (23)
P3 | (R, 0, 0, 1)_S0 | x_e3 = R sin θ, z_e3 = −R cos φ cos θ | θ = atan(−x_s3 cos φ / z_s3)  (24)
P7 | (−R, 0, 0, 1)_S0 | x_e7 = −R sin θ, z_e7 = R cos φ cos θ | θ = atan(−x_s7 cos φ / z_s7)  (25)
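As an illustration of the equations in Table 2, the following sketch computes θ from any one of the reference dots; the quadrant correction of Fig. 6 is not included, and the argument conventions are our assumptions.

```python
import math

def pan_from_dot(dot_id, x_s, z_s, phi):
    """Pan angle theta from one of the reference dots P1, P3, P5, P7 (Eqs. (22)-(25)).

    x_s, z_s : computer coordinates of the dot relative to the ellipse centre,
               expressed in the non-rotated (psi-corrected) frame when psi != 0.
    phi      : tilt angle in radians.
    The quadrant disambiguation (sign of psi and position of P9 on the minor axis)
    still has to be applied afterwards, as described in the text.
    """
    if dot_id in (1, 5):                     # P1 and P5 lie on the Z0 axis
        return math.atan(z_s / (x_s * math.cos(phi)))
    if dot_id in (3, 7):                     # P3 and P7 lie on the X0 axis
        return math.atan(-x_s * math.cos(phi) / z_s)
    raise ValueError("theta can only be computed from P1, P3, P5 or P7")
```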

4. Optical axis misalignment

As was mentioned in Section 2, we assume that the pattern is centred on the image. However, although in applications with mobile robots a visual-servoing algorithm is executed to centre the pattern on the image, and with the AR application users can centre the image themselves, in practice there is usually a small misalignment of the optical axis. A discussion concerning how to evaluate and correct the effects of this misalignment in the pose method is presented in the following sub-sections.

4.1. Deviation angle

As was mentioned previously, the requirement that the pattern must be centred on the image and that the optical axis is therefore aligned with the line O0Oe is satisfied in the systems in which our positioning approach has been tested. Nevertheless, in this section we present a solution with which to quantify the deviation of the optical axis with regard to the origin of the pattern coordinates. As will be shown below, the optical axis misalignment does not meaningfully affect the pose parameters, particularly the camera-pattern distance value. However, for mobile robot applications, knowledge of the on-board camera orientation is a valuable aspect. In order to make the approach more understandable, here we present the solution for a 2D case, in which a single angular deviation measure is obtained from axis X0. This case corresponds solely to a camera on board a mobile robot which moves in a room and where the third coordinate is constant. The camera orientation is thus obtained through the angle γ1 = ∠(X0, Xe). The solution can be easily extended to the 3D case by taking two angular parameters from the X0 and Z0 axes, respectively.

In the particular case in which the landmark reference system (S0) and the camera reference system (Sc) are coplanar, the transformation between (Sc) and (S0) (assuming that P9 is in the centre of the image) consists of a rotation φ about the Ze axis and a translation. In a real case, for example with mobile-robot applications, it can be stated that the camera and the pattern would be positioned at the same height. The intention, therefore, is to obtain a 2D positioning in which the component Zw is known.

In this case, the positioning of the camera is determined by the position t_e = (t_e,x, t_e,y, 0) and the orientation angle γ, where now γ1 ≠ φ. This case is illustrated in Fig. 7. Note that the condition in which S0 and Se are coplanar, according to the plane Ze = Z0 = 0, indicates that P9 should theoretically be on the Zs axis in the image. In practice, the robot will expect to find P9 inside a central strip of the image.

Fig. 6. (a) Correction of parameter θ and aspect of the fitted ellipse depending on the quadrant in which the camera captures the image. (b) Detail of distances d1 and d2 from P9 to the ellipse E.

Fig. 7. Optical axis misalignment in the bidimensional case. Deviation angle calculation.

The position of the camera is calculated from the parameter φ and the distance D′ (Eqs. (21) and (8)). The camera coordinates are:

$$t_{e,x} = D' \sin\phi, \qquad t_{e,y} = D' \cos\phi \quad (26)$$

As was previously mentioned, the orientation of the camera is obtained from the angle γ1 = ∠(X0, Xe). This is accomplished by measuring the segment m_image (in pixels) in the image, and the deviation angle of the optical axis δ is then calculated. The relationship is (see Fig. 7b):

$$\tan\delta = \frac{m}{f} = \frac{m_{image}\, L_{x\_sensor}}{f\, L_{x\_image}} \quad (27)$$

where m is the deviation in millimetres on the CCD sensor, $L_{x\_sensor}$ is the x dimension of the sensor in millimetres and $L_{x\_image}$ is the x dimension of the image in pixels.

According to Fig. 7a, we finally conclude that

$$\gamma_1 = \frac{\pi}{2} - \lambda = \frac{\pi}{2} - (\phi - \delta) \quad (28)$$
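The 2D positioning described by Eqs. (26)–(28) can be condensed into a few lines. The sketch below assumes the P9 offset is measured from the image centre; all parameter names are placeholders of ours.

```python
import math

def camera_pose_2d(d_prime, phi, p9_x_pixels, image_width_px, sensor_width_mm, f_mm):
    """2D camera position and orientation from Eqs. (26)-(28).

    d_prime         : camera-pattern distance D' (Eq. (10))
    phi             : tilt angle in radians (Eq. (21))
    p9_x_pixels     : horizontal offset of P9 from the image centre (segment m_image)
    image_width_px  : horizontal image size L_x_image in pixels
    sensor_width_mm : horizontal CCD size L_x_sensor in millimetres
    f_mm            : focal length in millimetres
    """
    # Camera coordinates in the pattern plane (Eq. (26))
    t_x = d_prime * math.sin(phi)
    t_y = d_prime * math.cos(phi)
    # Optical-axis deviation angle (Eq. (27))
    m_mm = p9_x_pixels * sensor_width_mm / image_width_px
    delta = math.atan(m_mm / f_mm)
    # Camera orientation with respect to X0 (Eq. (28))
    gamma1 = math.pi / 2 - (phi - delta)
    return (t_x, t_y), gamma1
```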

In this paragraph we estimate the maximum relative error in parameter D which is caused by the misalignment of the optical axis. Let us assume that the pattern is located on the borderline of the image. The maximum deviation occurs when the computer coordinates of P9 are (Xmax/2 − a/2, Ymax − b/2), Xmax and Ymax being the image dimensions. The system then yields a distance D2 which is longer than the distance D1 that the system would yield if the optical axis were centred. In the case of maximum misalignment, we can establish that the optical axis deviation angle is always less than half of the angular aperture of the camera, $\varphi$. Therefore, $\delta_{max} \le \varphi/2$. Fig. 8a illustrates the variation of the major axis when a misalignment occurs. Note that

$$a_1 = a_2 \cos\delta \quad (29)$$

From Eqs. (3) and (29), we obtain that the relative error of parameter D is

$$\left(\frac{\Delta D}{D}\right) = \frac{a_2 - a_1}{a_1} = 1 - \cos\delta \quad (30)$$

In practice the deviations are below 2°, which implies errors of less than 0.06%. The maximum error can be limited by considering $\delta_{max} \le \varphi/2$, $\varphi$ being the angular aperture of the camera. Consequently

$$\left(\frac{\Delta D}{D}\right)_{max} < 1 - \cos(\varphi/2) \quad (31)$$

In our case, the angular aperture is below 15°, so that in the worst case the relative error would be 0.9% at the maximum camera-pattern distance. In order to minimize the errors caused by the optical axis misalignment, the following section tackles the correction of the projective distortion of the pattern.

Fig. 8. (a) Illustration of the major axis variation when the optical axis is misaligned with the line O0Oe. (b) Distortion of the fitted ellipse owing to the off-centre position of the pattern.

4.2. Correction to the projective distortion of the pattern

As is well known, several distortion sources modify the ideal "pin-hole" model in real cameras. The distortion of our positioning system is determined by two key factors. Firstly, we have the parameters concerning the intrinsic calibration of the camera: specifically, the relationship of the horizontal/vertical scale in the projected image, the focal distance of the camera, the transformation from millimetres on the image plane to pixels in the computer image, the displacement of the optical centre in the image and the optical distortions. This source of distortion has been corrected through a typical calibration procedure.

The second distortion component corresponds to the deformation of the ellipse when point P9 of the image is not located in the centre of the image plane. In this case, the ellipse that adjusts the projected points P1–P8 in the image is distorted, depending on the viewpoint of each camera. This kind of distortion is exclusive to the method presented here.

If the pattern is not centred in the image, the ellipse E that adjusts the points {P1, P2, P3, P4, P5, P6, P7, P8} does not coincide with what would be obtained if it were centred. Fig. 8b illustrates the view of the pattern in both cases. Here, the different perspective of the pattern is responsible for this variation. In order to recover the accurate values of the major and minor axes, a correction is made to the positions of the points by using the deviation of the P9 coordinates.

Let us assume a non-centred orientation of the camera, where the coordinates of {P1, P2, P3, P4, P5, P6, P7, P8, P9} have already been corrected of optic distortions by following the intrinsic calibration, and let $(u_{P_i}, v_{P_i})$, i = 1, 2, ..., 9 be the corresponding coordinates. Since the viewpoint is not centred on the pattern, $u_{P_9} \neq 0$ and/or $v_{P_9} \neq 0$. Under these circumstances, the perspective correction of the coordinates in the image is modelled by the expressions:

$$u'_{P_i} = (u_{P_i} - u_{P_9})\left(1 + c_u u_{P_9}^2 + c_u u_{P_9}^4\right) + u_{P_9},\quad i = 1, 2, \ldots, 9 \quad (32)$$

$$v'_{P_i} = (v_{P_i} - v_{P_9})\left(1 + c_v v_{P_9}^2 + c_v v_{P_9}^4\right) + v_{P_9},\quad i = 1, 2, \ldots, 9 \quad (33)$$

The distortion coefficients $c_u$ and $c_v$ are calculated in an off-line session where, for a fixed camera position, a pattern-centred picture is taken along with t off-centre shots. If $(u_{P_i\_o}, v_{P_i\_o})$, i = 1, 2, ..., 9 are the coordinates of the points in the pattern when it is centred in the image (so that $u_{P_9\_o} = 0$ and $v_{P_9\_o} = 0$), the parameters $c_u$ and $c_v$ are obtained by a least-squares minimization technique:

$$\arg\min_{c_u,\,c_v}\ \sum_{t}\left\{\sqrt{\sum_{i=1}^{9}\left(u'_{P_i} - u_{P_9} - u_{P_{i\_o}}\right)^2 + \left(v'_{P_i} - v_{P_9} - v_{P_{i\_o}}\right)^2}\right\} \quad (34)$$
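The off-line fitting of c_u and c_v can be sketched as a small numerical minimization. The snippet below is an illustrative sketch of Eq. (34), assuming SciPy's general-purpose minimizer is available; the data structures and the Nelder-Mead choice are our assumptions, not the paper's.

```python
import numpy as np
from scipy.optimize import minimize

def fit_distortion_coeffs(shots, centred_ref):
    """Fit c_u, c_v of Eqs. (32)-(34) from t off-centre shots of the pattern.

    shots       : list of (9, 2) arrays with the (u, v) dot coordinates of each off-centre shot
    centred_ref : (9, 2) array with the dot coordinates of the centred reference picture
    """
    def cost(params):
        c_u, c_v = params
        total = 0.0
        for pts in shots:
            u9, v9 = pts[8]
            u_corr = (pts[:, 0] - u9) * (1 + c_u * u9**2 + c_u * u9**4) + u9
            v_corr = (pts[:, 1] - v9) * (1 + c_v * v9**2 + c_v * v9**4) + v9
            # residual against the centred reference, shifted by the P9 offset (Eq. (34))
            total += np.sqrt(np.sum((u_corr - u9 - centred_ref[:, 0])**2 +
                                    (v_corr - v9 - centred_ref[:, 1])**2))
        return total
    return minimize(cost, x0=np.zeros(2), method='Nelder-Mead').x
```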

By adopting the homogeneous transformation nomenclature, Eqs. (32) and (33) can be expressed as

$$\begin{pmatrix} u'_{P_i} \\ v'_{P_i} \\ 1 \end{pmatrix} = \begin{pmatrix} A & 0 & -u_{P_9}(A-1) \\ 0 & B & -v_{P_9}(B-1) \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} u_{P_i} \\ v_{P_i} \\ 1 \end{pmatrix},\quad i = 1, 2, \ldots, 9 \quad (35)$$

where $A = 1 + c_u u_{P_9}^2 + c_u u_{P_9}^4$ and $B = 1 + c_v v_{P_9}^2 + c_v v_{P_9}^4$.

The perspective correction matrix $S_d$ is thus finally defined by means of

$$S_d = \begin{pmatrix} A & 0 & -u_{P_9}(A-1) \\ 0 & B & -v_{P_9}(B-1) \\ 0 & 0 & 1 \end{pmatrix} \quad (36)$$
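Once c_u and c_v are known, applying the correction of Eqs. (32)–(36) is a single homogeneous matrix product. A minimal sketch follows; the array layout, with P9 stored as the last row, is an assumption of ours.

```python
import numpy as np

def perspective_correction(points_uv, c_u, c_v):
    """Apply the perspective correction S_d of Eq. (36) to the nine dot centres.

    points_uv : array of shape (9, 2) with the (u, v) coordinates of P1..P9
                after the intrinsic (optical) distortion correction; P9 is the last row.
    c_u, c_v  : distortion coefficients fitted in the off-line session (Eq. (34)).
    """
    u9, v9 = points_uv[8]                     # P9 gives the off-centre offset
    A = 1 + c_u * u9**2 + c_u * u9**4
    B = 1 + c_v * v9**2 + c_v * v9**4
    # Perspective correction matrix S_d (Eq. (36)) in homogeneous form
    S_d = np.array([[A, 0.0, -u9 * (A - 1)],
                    [0.0, B, -v9 * (B - 1)],
                    [0.0, 0.0, 1.0]])
    homog = np.column_stack([points_uv, np.ones(len(points_uv))])
    corrected = (S_d @ homog.T).T
    return corrected[:, :2]
```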

5. Pose algorithm under occlusion

As mentioned in Section 1, the majority of authors do not make any reference to the performance of their method when the landmark is occluded. In order to show the adaptability of our pose approach, in this section we present the solution and results of the 9-Dots method under occlusion.

In practice, the pattern may be occluded when people or objects obstruct the viewpoint of the camera or when the camera is not centred on the pattern. In these cases, a part of the pattern is viewed by the user. The first case is common when the camera is far away, and there is consequently enough space for an external occlusion. The second case frequently occurs at short distances, in which the field of vision is reduced. Here, small movements by the user (in the case of a portable AR system) may take several dots out of the image.

This pattern has been designed to be used over a wide distance range using the same algorithm. It is therefore formed of small circles which in turn belong to an outer circle. For long distances the pattern therefore looks like a set of tiny spots belonging to a circle, whereas for short distances the pattern is seen as a set of circles. Nevertheless, the pose algorithm with occlusion is established depending on the number of missing dots. We distinguish between several levels of occlusion.

When one or two outer dots of the pattern are missing in the image, occlusion is categorized as level I. In this circumstance, the solution proposed in Sections 3 and 4 can be used by making minor changes. Eq. (11) is maintained and all the parameters of the ellipse which best fits the dots can be calculated. Consequently, ψ, φ and D′ are calculated. As regards parameter θ, one or two of the four dots in Table 2 might be missing in the image, so it is always possible to take the remaining dots to calculate θ.

Occlusion level II occurs when more than two outer dots are occluded but the internal dot P9 remains in the image. In this case, Eq. (11) does not converge and a new strategy must be implemented. This frequently occurs in the case of short user-pattern distances, in which the view angle of the camera is reduced and small head movements made by the user may cause a loss of the dots in the image. Since any visible circular dot can be viewed as an ellipse, the pose strategy is adapted for dot P9. In other words, the contour of P9 can be considered as an ellipse which has the same eccentricity as the ellipse that fits all the external dots. The introduction of a scale factor therefore allows us to calculate the ellipse that fits the visible dots, and it is then possible to follow the same procedure.

Finally, occlusion level III occurs when the internal dot P9 is occluded. In this case, when the number of outer points is sufficient, Eq. (11) converges. This rarely occurs when several outer dots are occluded and the internal dot P9 is not clearly observed, or when the image of the pattern is placed on a border of the image. In practice, the pattern can be occluded by obstacles or by people walking in front of the camera. Several occlusion circumstances also occur as a result of the proximity of the pattern. Fig. 9a illustrates examples of all occlusion levels.
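The selection of the occlusion level from what is actually detected in the image can be sketched as a simple dispatch; the thresholds below merely restate the definitions given above.

```python
def occlusion_level(n_outer_visible, p9_visible):
    """Classify the occlusion level used to select the pose strategy (Section 5).

    n_outer_visible : number of outer dots (P1..P8) detected in the image
    p9_visible      : True if the central dot P9 is detected
    Returns 'none', 'I', 'II' or 'III' following the categories in the text.
    """
    n_missing = 8 - n_outer_visible
    if not p9_visible:
        return 'III'           # P9 occluded: use the remaining outer dots if sufficient
    if n_missing == 0:
        return 'none'          # full pattern: standard solution of Sections 3 and 4
    if n_missing <= 2:
        return 'I'             # ellipse fit still converges; theta from remaining dots
    return 'II'                # fit an ellipse to P9's contour and rescale it
```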

Fig. 9. (a) AR navigation system under various types of occlusion. (b) Examples of occluded patterns in the experimental test.

In order to prove the effectiveness of the method under occlusion, we present the following experimental test in which the three occlusion levels explained above are considered.

Different images of the pattern were taken from the four quadrants and the results with and without occlusion were then compared. First, we captured a frame of the pattern from the first quadrant, without occlusion. Up to five occlusions of each level were then generated manually, and the pose of the camera was calculated according to the approaches explained above. Finally, the same procedure was carried out for the second, third and fourth quadrants. The pose algorithm has thus been calculated in 64 cases, 60 with occlusion and 4 without. This experimentation was carried out using the colour camera on a head-mounted display with a constant focal length of 5.4 mm. In this experiment the camera-pattern distance was 90 cm for all the quadrants. Fig. 9b shows examples of several frames with occlusion. Table 3 shows the variation of the calculated coordinates and user-pattern distances for the four quadrants corresponding to each occlusion with regard to the non-occluded case. Absolute and relative errors are presented in two sub-tables. The average, standard deviation, and greatest and smallest variations are also presented for each case. Note that for level II, the results for all occlusion cases remain constant owing to the fact that, in this case, the calculated ellipse depends solely on P9, and this is the same for all the examples in this occlusion level. As Table 3 shows, coordinate variation owing to occlusion is very low, which confirms the validity of our method in occlusion circumstances. Note that the highest errors occur when several adjacent spots are occluded. For example, in level III occlusion, frames number 3 and 4 have a higher error than frame number 1.

Table 3. Absolute and relative variations between poses with and without occlusion for the three occlusion levels.

Absolute variation (cm):
             Level I                      Level II                     Level III
             Δx     Δy     Δz     ΔD     Δx     Δy     Δz     ΔD     Δx     Δy     Δz     ΔD
Average      0.84   1.00   0.69   1.00   1.50   0.86   0.50   1.02   0.88   0.76   0.69   0.87
Std. dev.    1.07   1.01   0.72   1.28   1.13   0.65   0.18   0.57   1.12   6.61   0.72   1.05
Greatest     3.50   3.32   2.29   4.45   0.01   1.95   0.68   1.87   3.50   2.16   2.29   2.67
Smallest     0.07   0.01   0.02   0.01   0.01   0.44   0.22   0.30   0.01   0.01   0.01   0.04

Relative variation (%):
Average      1.15   2.05   2.14   1.34   0.17   1.17   1.43   1.38   1.11   1.43   2.26   1.12
Std. dev.    2.17   2.04   7.11   1.71   3.40   1.17   3.62   0.78   2.28   1.42   7.30   1.43
Greatest     6.07   7.25  10.68   6.25  16.06   3.72   2.72   2.62   6.07   8.04  16.50   3.75
Smallest     0.15   0.01   0.07   0.01   0.20   0.95   1.01   0.43   0.02   0.02   0.06   0.05

In conclusion, we can state that the advantage of our method is that, even in the case of some of the dots being partially occluded, the parameters of the circumference which fits the dots' centres are almost invariant. Each dot in the image appears as a small circle which is detected by using the Hough algorithm. The Hough transform is implemented in the majority of computer vision libraries (we used the CV libraries). If a dot is partially occluded, the dot's centre might vary slightly, but the influence of such a variation on the fitted circle is insignificant. This is shown in the second column of Fig. 16a, in which one or two dots are incomplete. In the case of several dots being totally occluded, the 9-points method can manage the three kinds of occlusion described above.

6. Experimentation of the method

6.1. Implementation details

The pose algorithm has been tested on several kinds of devices: b/w, colour, zoom and pan/tilt controlled cameras. The essential phases in the entire pose processing method are as follows (a brief code sketch of this pipeline is given after the list):

1) The image is processed in order to find a set of candidate ellipses. Each spot in the image is recognized as a small ellipse. The Hough algorithm, which is available in the majority of computer vision libraries, was used for this purpose (we used the CV libraries).
2) Recognition of the pattern. The relative position and size of the candidate ellipses are used as the basis from which to recognise the pattern.
3) Assigning spot coordinates in the image. The centres of the small ellipses which contain the spots are taken as the coordinates of points Pi.
4) Correction of the projective distortion of the pattern. The coordinates of points Pi are updated by using the correction matrix in Eq. (36).
5) Calculation of the dot ID. In the case of a b/w pattern, P9 is identified as the centre of the biggest ellipse and P1 as the point with the lowest coordinate zs.
6) Calculation of the pose and deviation angle.
7) Calculation of the pattern ID (solely in the case of coloured patterns).
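A compressed sketch of steps 1–3 and 5 of this pipeline, using OpenCV's Hough circle detector, is given below. The Hough parameters are placeholders, the clockwise numbering assumes the usual image convention with the y axis pointing downwards, and the real system additionally uses the relative sizes and positions of the candidate ellipses and, for coloured patterns, the colour code.

```python
import cv2
import numpy as np

def detect_pattern_dots(gray):
    """Detect candidate dots in a grayscale image and label them P1..P9.

    Simplified sketch: P9 is taken as the biggest dot, P1 as the lowest point in
    the image, and the remaining outer dots are numbered clockwise from P1.
    """
    circles = cv2.HoughCircles(gray, cv2.HOUGH_GRADIENT, dp=1, minDist=10,
                               param1=100, param2=20, minRadius=3, maxRadius=60)
    if circles is None or circles.shape[1] < 9:
        return None                              # pattern not (fully) visible
    c = circles[0]                               # rows of (x, y, radius)
    c = c[np.argsort(-c[:, 2])[:9]]              # keep the nine largest candidates
    p9 = c[0, :2]                                # biggest dot is P9
    outer = c[1:, :2]
    p1 = outer[np.argmax(outer[:, 1])]           # lowest point in the image (largest row)
    # number the remaining outer dots clockwise starting from P1
    angles = np.arctan2(outer[:, 1] - p9[1], outer[:, 0] - p9[0])
    start = np.arctan2(p1[1] - p9[1], p1[0] - p9[0])
    order = np.argsort((angles - start) % (2 * np.pi))
    return {'P9': p9, **{f'P{i + 1}': outer[j] for i, j in enumerate(order)}}
```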

The algorithm takes into account the relative size and positions of a set of dots (detected through the use of the Hough algorithm in a previous step) in the image. The recognition algorithm eventually yields the ID of the points in the pattern. When the pattern uses the same colour for all the spots, P1 is identified as the lowest point in the image and P9 as the biggest dot. The remaining dots are numbered in a clockwise direction. If the pattern uses different colours, the colour code itself permits the easy identification of each dot. Suffice it to say that P1 maintains the lowest position since the camera/user does not shake excessively, and axis Xc is therefore almost parallel to the plane Z0 = 0. In our experience, this is a normal situation. This circumstance could of course alter in the case of occlusion, but even in this case, the occlusion type and the ID of the missing dots are obtained.

The idea of considering 9 points in the pattern rather than a single black circle is based on two reasons. With a single circle pattern, parameter θ cannot be obtained and the pose would be restricted to 2D cases. In other words, the method could only work if the camera were placed in the plane Z0 = 0. Note that parameter θ is calculated because the coordinates of any of the points P1, P3, P5, P7 in the ellipse coordinate system can be extracted in the image, and Eqs. (22)–(25) can then be evaluated. The second reason is that by using colour information for a discrete number of points, we can establish a colour code in the pattern and use a multitude of patterns, which is useful for building navigation purposes and AR applications. Note that several colour patterns are shown throughout the paper (see Figs. 2 and 10). Moreover, the number of dots must be restricted (in our case, 8 dots). This is owing to the fact that the dots tend to be close together in the image for oblique views. A pattern with many dots is consequently inefficient because it will hardly be segmented.

Fig. 10 illustrates several examples in which the pose algorithm was run. The colour pattern was designed with a concentric set of spots with a diameter of 127 mm (P1–P8) and 254 mm (P9). Black–R–G–B colour coding was sufficiently robust for our diffuse lighting environments. The colour code is defined according to the number of patterns per room and the number of rooms in the navigation test.

If the optical axis deviates from the pattern's centre, then the ellipse alters a little with regard to the centred ellipse for reasons of perspective. If this occurs, the ellipse is corrected by using the perspective correction matrix given in Eq. (36). As was stated in Section 1, for applications with pan/tilt controlled cameras, a visual-servoing algorithm is executed to centre the pattern in the image, and in applications in which a user carries the camera (i.e. AR applications), the users themselves are capable of roughly centring the image. However, in practice a small error always exists.

For rotations around axis Y0 (assuming that the pattern is centred in the image), the ellipse's eccentricity does not change whatever the rotation angle is, but for any other rotation axis, parameters a and b may change. We have attempted to clarify this question in Fig. 5 by showing the appearance of the ellipse from different positions. Note that, following Eq. (3), the major axis of the ellipse depends solely on D (the camera–pattern distance), while the minor axis b depends explicitly on angle φ in Eq. (20). Therefore, if we maintain parameter φ and vary θ, a and b will remain constant and the fitted ellipse will have the same axes, but the positions of the outer points of the pattern will be different.
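For intuition only, under a simplified pinhole model a circular landmark of radius R at distance D projects to an ellipse whose major semi-axis is roughly a ≈ fR/D and whose minor semi-axis is b ≈ a cos φ; D and φ can thus be read off the fitted ellipse, while θ still requires the individual dot positions. The sketch below uses this approximation and hypothetical values; it is not a reproduction of Eqs. (3) and (20).

import math

def distance_and_tilt(a_px, b_px, focal_px, radius_mm):
    # Simplified pinhole approximation (illustrative, not the paper's exact equations):
    #   a ~ focal * R / D    -> the major semi-axis fixes the distance D
    #   b ~ a * cos(phi)     -> the minor semi-axis fixes the tilt angle phi
    D_mm = focal_px * radius_mm / a_px
    phi_deg = math.degrees(math.acos(min(1.0, b_px / a_px)))
    return D_mm, phi_deg

# Hypothetical values: 900 px focal length, 127 mm landmark radius.
print(distance_and_tilt(a_px=80.0, b_px=40.0, focal_px=900.0, radius_mm=127.0))   # (1428.75, 60.0)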

The proposed method has been implemented in two environments: mobile robots and augmented reality. An initial experimentation on mobile robots under ideal conditions can be found in [21]. Thus, in that earlier version, the system worked under important restrictions: without occlusion, noise or real-time conditions, assuming the optical centre alignment and under static requirements in which the robot stopped before taking a single shot of the scene. The experiments were carried out on a Pioneer multisensor mobile robot carrying two cameras of different characteristics: the KC-N600PH CENTER camera, which is a B/W auto-iris camera focused on security applications, and the SONY EVI D-31 controlled zoom camera. A distance range from 30 to 700 cm was achieved with the latter camera, using a focal range from 5.4 to 26 mm. In some of the experiments the true position of the robot was obtained by using a Leica DISTO™ A6 laser tape measure. The second experimental environment corresponds to AR applications. The pose algorithm was implemented in an autonomous augmented reality system which consists of a Trivisio ARvision-3D HMD binocular head-mounted display with a colour camera and a Quantum3D Thermite portable computer. Section 6.2 provides an extended report of the AR test.

6.2. Test in augmented reality environments

Virtual reality users and professionals currently demand growing quality and a higher degree of realism in developments and applications. Of these technologies, augmented reality systems stand out as a result of their complexity and possibilities.


Fig. 10. (a) The pattern in a non-structured environment. From top to bottom: detection of the pattern in the image, dot labelling, and ellipse fitted in the zoomed image and pose calculation. (b) Example of poses from different camera positions. (c) 3D model of the interior and visualization of a simple path. The red path is the ground-truth, whereas the blue path corresponds with the calculated poses.


The final objective here is to extend and complete the real visual information perceived by the user, superimposing three-dimensional information synthesized with a high degree of realism. In this section, we prove the applicability of our method in an autonomous AR system.

As we know, the augmented reality process has three basic components. The first is an image capture module (camera) with which to provide images from the real environment. The second, a visualization module (stereo projection systems or augmented reality glasses), solves the problem of making up and displaying the real world augmented with unreal information. The third component, which is not always solved by current technology, must provide the user's position and orientation in the real environment. Although positioning and tracking systems exist, they are only effective in controlled environments; for autonomous systems, positioning must be solved with new solutions.

Fig. 11 shows an explanatory chart of our AR process. We first obtain realistic models through reverse engineering using a Minolta VIVID 910 laser scanner, which supplies the geometric and colour information from the scene. A complete and unique colour model is obtained by using all the range images and partial colour images taken from different viewpoints. The complete model is then ready to be inserted in a graphics system. In the augmented reality stage, the user wears an HMD device with two cameras. The user captures the real scene, maintaining the landmark in the camera field, and the camera position with regard to the landmark is calculated by using the algorithm proposed. The synthetic model can then be virtually inserted into the real image by using the graphics resources. Finally, the user perceives the coexistence of real and virtual information in the same space and at the same time.
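The per-frame logic of this stage can be summarised with the following sketch; every object and method name is a placeholder rather than the system's actual interface.

def augmented_reality_frame(camera, pose_estimator, renderer, model):
    # One iteration of the AR loop described above (illustrative only).
    image = camera.capture()                          # 1. grab the real scene
    dots = pose_estimator.detect_landmark(image)      # 2. find the 9-dot landmark
    if dots is None:
        return image                                  # landmark not visible: show the raw image
    pose = pose_estimator.compute_pose(dots)          # 3. camera pose w.r.t. the landmark
    overlay = renderer.render(model, pose)            # 4. draw the 3D model from that pose
    return renderer.compose(image, overlay)           # 5. merge real and virtual imagery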

The pose algorithm presented in this paper was implemented in an autonomous augmented reality system which consists of a Trivisio ARvision-3D HMD binocular head-mounted display with two colour cameras and a Quantum3D Thermite portable computer. Fig. 12 shows several experiments which were carried out in the lab. The 3D models that were inserted correspond to a valuable collection of archaeological objects. The user, wearing the portable AR system, seeks the pattern when s/he wishes to discover the current position in the world coordinate system. The HMD then allows the user to see the image from the camera superimposed with 3D virtual models in real time. Note that the observer can explore the piece from any viewpoint.

The system works with 640 × 480 images and takes 30 ms to capture one frame, process the image and calculate the pose. The average rate of performance is therefore 30 fps for superimposed virtual models of around 3000 triangles.


Fig. 11. Components of the augmented reality process in which the 9-points pose solution has been integrated.

Fig. 12. Inserting virtual models in reality. (a) First row: recognition of the landmark and observer pose calculation; second row: insertion of the virtual models in the image (funeral altar, second century A.D., with 40,000 triangles). (b) Several frames inserting an Almohad jar in the image (twelfth century, with 40,000 triangles).


Note that, since a virtual model (which is composed of thousands of patches) must be inserted in the image, the visualization rate decreases a little. Two different environments, indoors and outdoors, have been tested while imposing occasional occlusions.

A second set of experiments is shown in Fig. 13. In this case, the user can see and compare in real time the object, which is captured by the cameras on board the HMD, with the 3D virtual models superimposed on the image. As in the previous experiments, the system can calculate the pose with regard to the 9-points pattern in the image and then calculate the appropriate pose of the 3D model in the graphics image coordinates so that they can be seen and compared. The chart in Fig. 13a presents the set of coordinate transformations between the different components of the system (real camera, graphics camera, real world and virtual world) when an augmented reality process takes place.
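The chain in Fig. 13a can be viewed as a composition of 4 × 4 homogeneous transforms. The sketch below uses generic names (T_a_from_b maps points expressed in frame b into frame a) and is not the paper's notation.

import numpy as np

def model_in_camera_frame(T_cam_from_landmark, T_landmark_from_world, T_world_from_model):
    # Compose homogeneous transforms so that the virtual model is expressed in the
    # (graphics) camera frame; the graphics camera is given the same pose as the
    # real camera, so the rendered model lines up with the live image.
    return T_cam_from_landmark @ T_landmark_from_world @ T_world_from_model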


Fig. 13. (a) Augmented reality process: chart with the set of coordinate systems involved. (b) Inserting and comparing reality and 3D virtual models in the museum. 3D models are superimposed onto the on-line image. Colour, scale and position of the virtual models can be manipulated in the image to allow the spectator to visualize them better and compare them.

Fig. 14. Location of the camera with regard to the 9-Dots landmark in the comparison test.


The user can explore cultural heritage pieces, which have been digitized with high quality requirements and inserted into the reality. The user can of course move freely around the scene, looking at the synthetic model that is overlapped onto the image. Several examples are shown in Fig. 13b. Note that the colour, scale and position of the virtual models are manipulated in the image to allow the spectator to visualize them better and compare them. This experimentation was carried out in a public museum (The Merida National Roman Art Museum, Spain). The final goal of this work is to create small virtual spaces for pieces that are temporarily absent from the museum; the piece can then be virtually viewed in the same place and pose in which it was positioned.

6.3. Experimental comparison with other monocular pose methods under occlusion and noise

ARToolkit [13,23] is one of the most commonly used positioning methods based on markers, and is particularly used in augmented reality applications. In order to demonstrate the goodness of our method, it has been extensively compared with ARToolkit under the same set-up and conditions. Both fiducials were built with similar dimensions, placed under the same light conditions and separated by 305 mm.

Fig. 15. Comparison of absolute and relative error in the test without occlusion.

Fig. 16. (a) Top-bottom, left-right and diagonal occlusions for ARToolkit and 9-Dots landmarks. Both landmarks are separated by 305 mm. (b) Comparison of absolute error in the test under occlusion.

The test was carried out after processing single frames in different static camera positions. In other words, contrary to dynamic cases, in which a tracking procedure is used, the pose algorithm did not use any pose information from previous frames. This allowed us to obtain the ground-truth for each camera position. The ground-truth was set by using a Leica DISTO™ A6 laser tape measure. Fig. 14a shows the location of the landmark in 20 different camera positions from four different quadrants. Two images, with and without occlusion, were captured in each position. We used the SONY EVI D-31 camera with focal f = 5.4 mm, and the distance range between the camera and the landmark was set from 58 to 500 cm. Although the original ARToolkit coordinate system is different to the 9-Dots coordinate system, it was adapted in order to make an understandable comparison of the results. The experimental comparison was carried out in four aspects as follows.

Firstly, we compared the performance of the methods without occlusion. Fig. 15 shows the absolute and relative pose errors for both techniques. Our method gave similar results to those of ARToolkit. The absolute and relative average errors in coordinates X, Y and Z were 3.45 cm and 8.93%, 1.48 cm and 2.02%, and 3.13 cm and 8.3% for 9-Dots, and 5.29 cm and 11.8%, 1.25 cm and 11.8%, and 4.84 cm and 18.49% for ARToolkit.

We then carried out the test under occlusion. In this case, the surface of each landmark was successively occluded by taking increasing occlusion percentages until each method was unable to compute the pose of the camera. We took occlusions from three different directions: left-right, top-bottom and diagonal. Fig. 16a illustrates several examples of each occlusion sequence.

As in the non-occlusion case, promising results were obtained. For horizontal and vertical occlusion directions, the ARToolkit method was unable to compute the pose of the camera for occlusions above 10%, and it did not work for any type of diagonal occlusion. This is owing to the fact that, when the vertices of the landmark are missing in the image, the algorithm cannot properly calculate the pose. Our system, however, works when some of the dots on the landmark are missing, and is able to fit the ellipse to the remaining dots and then calculate an approximate pose. Our method therefore worked for occlusions of up to 40%, yielding acceptable relative errors. Fig. 16b and c show the average errors for horizontal/vertical and diagonal occlusions for both techniques.
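This robustness stems from the fact that a conic can still be fitted once at least five dot centres remain visible. A minimal direct least-squares conic fit is sketched below; it is a generic formulation and not necessarily the estimator used in the paper.

import numpy as np

def fit_conic(points):
    # Least-squares fit of A*x^2 + B*x*y + C*y^2 + D*x + E*y + F = 0 to the
    # visible dot centres (at least five (x, y) points are needed).
    pts = np.asarray(points, dtype=float)
    x, y = pts[:, 0], pts[:, 1]
    M = np.column_stack([x * x, x * y, y * y, x, y, np.ones_like(x)])
    _, _, vt = np.linalg.svd(M)
    return vt[-1]    # conic coefficients (A, B, C, D, E, F), defined up to scale

# The ellipse centre, axes and orientation follow in closed form from these
# coefficients, after which the pose is computed exactly as in the unoccluded case.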

In the third experiment we attempted to compare the respective distance ranges. The camera was now moved away from the landmarks at successive intervals of 20 cm, and this process was repeated in five different directions. The initial position was taken 100 cm away from the landmarks. Fig. 17 shows the pose error versus the landmark-camera distance. As we can see, the slope of the error function corresponding to ARToolkit is higher than that of 9-Dots. We can conclude that ARToolkit worked for distances below 300 cm, whereas 9-Dots worked at up to 440 cm.

The fourth comparison test was carried out by injecting random noise into the image. Instead of adding artificial (Gaussian or salt-and-pepper) noise, we generated natural noise sources: a piece of plastic was located in front of the lens (see Fig. 18a). The first conclusion is that it is easier to recognize the 9-Dots pattern than that of ARToolkit, since 9-Dots was recognized in the image on 100% of the occasions on which it was tested, as opposed to the 33% achieved by ARToolkit. The second conclusion is that the pose accuracy of 9-Dots surpasses that of ARToolkit, owing to the fact that the dot centres are less sensitive to noise than the positions of the vertices of the rectangle that ARToolkit uses. A summary of the results for both tests can be seen in Fig. 18b. Absolute errors of D and of coordinates XYZ are plotted. Note that 9-Dots is quite robust to noise whereas, in most cases, ARToolkit is unable to calculate the pose.

In summary, we can state that although ARToolkit's rectangle marker is apparently very easy to detect, even in a cluttered environment, it has two serious limitations. Firstly, for long distances the accuracy of the pose decreases meaningfully because of the imprecision of the vertex coordinates in the image, and ARToolkit's method therefore yields worse results than the method presented here. The comparison of errors in the distance range test proves this assessment. Secondly, the method does not work when a single vertex of the rectangle is occluded, even if the occlusion percentage is very low. Thus, for horizontal and vertical occlusion directions, ARToolkit was unable to compute the pose for occlusions of above 10%. Finally, it has been proved that our method is more robust under noise conditions.

Fig. 17. (a) Images of the pattern from different distances. (b) Comparison of absolute and relative errors in the distance range test.


Fig. 18. (a) Experimental setup with a piece of plastic located in front of the camera. (b) Four noisy images of the pattern. (c) Comparison of absolute and relative errors in the random noise test.


7. Contributions and conclusions

This paper proposes a new method with which to determine the camera pose in autonomous navigation circumstances. Unlike most of the reports published on this issue, this document provides complete information regarding the performance of the method in different environments and circumstances (several cameras, performance under occlusion and noise), along with quantitative data (accuracy, distance range, image size and performance) and other interesting aspects (adaptability, colour codes for different patterns, flexibility at different distances, easy recognition of the landmark, etc.). After evaluating all this information, we can conclude that the method proposed here provides evident advantages in comparison to other well-established geometrical methods. We believe that our work makes an original contribution in several respects.

Firstly, the method provides better results than some previous pose solutions. Instead of the usual quadratic fiducials, we have designed a simpler landmark which requires less image processing, can be used over large distance ranges and, more importantly, is able to work under occlusion and noise circumstances. In order to prove the goodness of our approach, an experimental comparison with one of the most widely used methods has been presented in this paper.

An important property of the system concerns its real-time response. Because of its simplicity, the landmark is quickly found in the image and the pose calculation time is negligible. This allows us to work at up to 35 fps in mobile robot applications and 30 fps in augmented reality applications. In this respect, the method yields comparable results to other similar approaches.

Another remarkable aspect to bear in mind is that this approach has been tested on different applications, yielding notable results. Many pose solutions have been published in the last few years; however, few of them have been used in augmented reality applications. Our contribution should therefore be considered as an extension of pose solutions which also works in real time and under occlusion circumstances. In other words, it is suitable for augmented reality based navigation, a field in which few solutions can currently be found.

Our future work is directed towards increasing the effectiveness of the approach along a variety of lines. On the one hand, the system's robustness under changing lighting conditions must be improved. This problem could be alleviated by using a lighting system, but this might cause additional problems, particularly in AR applications. A computational and algorithmic refinement of the method is of course essential if future goals are to be achieved. We also aim to implement this technique in robot service applications, such as positioning guide-robots in museums, providing vigilance-robots in car parks and developing real applications in AR autonomous systems in the future.

References

[1] D. Cobzas, M. Jagersand, P. Sturm, 3D SSD tracking with estimated 3D planes, Journal of Image and Vision Computing 27 (2009) 69–79.

[2] F. Duan, F. Wu, Z. Hu, Pose determination and plane measurement using a trapezium, Pattern Recognition Letters 29 (3) (2008) 223–231.

[3] W. Feng, Y. Liu, Z. Cao, Omnidirectional vision tracking and positioning for vehicles, in: Proceedings of the Fourth International Conference on Natural Computation (ICNC '08), vol. 6, 2008, pp. 183–187.


[4] M. Fiala, Linear markers for robots navigation with panoramic vision, in: Proceedings of the First Canadian Conference on Computer and Robot Vision, 2004, pp. 145–154.

[5] G. Jang, et al., Metric localization using a single artificial landmark for indoor mobile robots, in: Proceedings of the International Conference on Intelligent Robots and Systems (IROS 2005), 2005, pp. 2857–2862.

[6] K. Josephson, et al., Image-based localization using hybrid feature correspondences, in: IEEE Conference on Computer Vision and Pattern Recognition, 2007, pp. 1–8.

[7] U. Neumann, et al., Augmented reality tracking in natural environments, in: Proceedings of the International Symposium on Mixed Reality (ISMR 99), 1999, pp. 101–130.

[8] K. Xu, K.W. Chia, A.D. Cheok, Real-time camera tracking for marker-less and unprepared augmented reality environments, Image and Vision Computing 26 (5) (2008) 673–689.

[9] S. Se, D. Lowe, J. Little, Mobile robot localization and mapping with uncertainty using scale-invariant visual landmarks, The International Journal of Robotics Research 21 (8) (2002) 735–757.

[10] L. Vachetti, V. Lepetit, P. Fua, Combining edge and texture information for real-time accurate 3D camera tracking, in: Proceedings of the Third IEEE and ACM International Symposium on Mixed and Augmented Reality (ISMAR 2004), 2004, pp. 48–56.

[11] L. Vachetti, V. Lepetit, P. Fua, Stable real-time 3D tracking using online and offline information, IEEE Transactions on Pattern Analysis and Machine Intelligence 26 (10) (2004) 1385–1391.

[12] A.J. Briggs, et al., Mobile robot navigation using self-similar landmarks, in: Proceedings of the IEEE International Conference on Robotics and Automation (ICRA '00), vol. 2, 2000, pp. 1428–1434.

[13] H. Kato, et al., Virtual object manipulation on a table-top AR environment, in: Proceedings of the IEEE and ACM International Symposium on Augmented Reality (ISAR 2000), 2000, pp. 111–119.

[14] D. Koller, et al., Real-time vision-based camera tracking for augmented reality applications, in: Proceedings of the ACM Symposium on Virtual Reality Software and Technology, 1997, pp. 87–94.

[15] A.J. Davison, I. Reid, N. Molton, O. Stasse, MonoSLAM: real-time single camera SLAM, IEEE Transactions on Pattern Analysis and Machine Intelligence 29 (6) (2007) 1052–1067.

[16] V. Lepetit, P. Lagger, P. Fua, Randomized trees for real-time keypoint recognition, in: Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '05), vol. 2, 2005, pp. 775–781.

[17] E. Foxlin, L. Naimark, VIS-Tracker: a wearable vision-inertial self-tracker, in: Proceedings of IEEE Virtual Reality, 2003, pp. 199–206.

[18] G.D. Hager, P.N. Belhumeur, Efficient region tracking with parametric models of geometry and illumination, IEEE Transactions on Pattern Analysis and Machine Intelligence 20 (10) (1998) 1025–1039.

[19] M. Ozuysal, P. Fua, V. Lepetit, Fast keypoint recognition in ten lines of code, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '07), 2007, pp. 1–8.

[20] D. Wagner, G. Reitmayr, A. Mulloni, T. Drummond, D. Schmalstieg, Pose tracking from natural features on mobile phones, in: Proceedings of the Seventh IEEE/ACM International Symposium on Mixed and Augmented Reality (ISMAR 2008), 2008, pp. 125–134.

[21] A. Adan, A. Martın, R. Chacon, V. Domınguez, Monocular model-based 3D location for autonomous robots, Lecture Notes in Computer Science 5317 (2008) 594–604.

[22] R.M. Haralick, C.N. Lee, K. Ottenberg, M. Nolle, Review and analysis of solutions of the three point perspective pose estimation problem, International Journal of Computer Vision 13 (3) (1994) 331–356.

[23] http://www.hitl.washington.edu/artoolkit/.

[24] X.S. Gao, X.R. Hou, J. Tang, H.F. Cheng, Complete solution classification for the perspective-three-point problem, IEEE Transactions on Pattern Analysis and Machine Intelligence 25 (8) (2003).

[25] L. Quan, Z. Lan, Linear N-point camera pose determination, IEEE Transactions on Pattern Analysis and Machine Intelligence 21 (8) (1999).

[26] D. DeMenthon, L. Davis, Exact and approximate solutions of the perspective-three-point problem, IEEE Transactions on Pattern Analysis and Machine Intelligence 14 (11) (1992).

[27] D. Lowe, Fitting parameterized three-dimensional models to images, IEEE Transactions on Pattern Analysis and Machine Intelligence 13 (5) (1991) 441–450.

[28] J.S.C. Yuan, A general photogrammetric solution for determining object position and orientation, IEEE Transactions on Robotics and Automation 5 (2) (1989) 129–142.

[29] D. DeMenthon, L. Davis, Model-based object pose in 25 lines of code, International Journal of Computer Vision 15 (1) (1995) 123–141.

Alberto Martin received the Technical Engineering in Computer Management degree in 2003 and the Engineering in Computer Science degree in 2006, both from the University of Castilla-La Mancha. Currently, he is finishing his Ph.D. thesis at this institution. His research interests include camera calibration, pose computation and augmented reality.

Antonio Adan received the M.Sc. degree in Physics from both the Universidad Complutense of Madrid and the Universidad Nacional de Educacion a Distancia (UNED), Spain, in 1983 and 1990, respectively. He received the Ph.D. degree with honours in Industrial Engineering. Since 1990, he has been an Associate Professor at Castilla-La Mancha University (UCLM) and leader of the 3D Visual Computing Group. His research interests are in Pattern Recognition, 3D Object Representation, 3D Segmentation, 3D Sensors and Robot Interaction on Complex Scenes. During this time he has made more than 100 international technical contributions in prestigious journals and conferences. From 2009 to 2010, he was a Visiting Faculty member at the Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, USA. Dr. Adan was awarded the Twenty-Eighth Annual Pattern Recognition Society Award for the best paper published in the Pattern Recognition journal during 2001. Dr. Adan is a member of the Institute of Electrical and Electronics Engineers (IEEE).