trust your imu: consequences of ignoring the imu drift

Trust Your IMU: Consequences of Ignoring the IMU Drift

Marcus Valtonen Ornhag1 Patrik Persson1 Marten Wadenback2 Kalle Astrom1 Anders Heyden1

1Centre for Mathematical SciencesLund University

2Department of Electrical EngineeringLinkoping University

marcus.valtonen [email protected]

Abstract

In this paper, we argue that modern pre-integrationmethods for inertial measurement units (IMUs) are accu-rate enough to ignore the drift for short time intervals. Thisallows us to consider a simplified camera model, which inturn admits further intrinsic calibration. We develop thefirst-ever solver to jointly solve the relative pose problemwith unknown and equal focal length and radial distor-tion profile while utilizing the IMU data. Furthermore, weshow significant speed-up compared to state-of-the-art al-gorithms, with small or negligible loss in accuracy for par-tially calibrated setups.

The proposed algorithms are tested on both synthetic andreal data, where the latter is focused on navigation usingunmanned aerial vehicles (UAVs). We evaluate the pro-posed solvers on different commercially available low-costUAVs, and demonstrate that the novel assumption on IMUdrift is feasible in real-life applications. The extended in-trinsic auto-calibration enables us to use distorted input im-ages, making tedious calibration processes obsolete, com-pared to current state-of-the-art methods.1

1. IntroductionA popular approach in Simultaneous Localization and

Mapping (SLAM) is to fuse various sensor data to increasethe performance of the system. A common pair of sen-sors to combine is a camera and an IMU. Systems of thiskind are labeled as visual-inertial odometry, and this spe-cific sensor combination is often found on consumer de-vices, such as smartphones and UAVs. As is well-known,the projective relationship between two cameras manifestsitself in the fundamental matrix, independent of the scenegeometry. When auxiliary data are known (e.g. IMU data)the number of degrees of freedom decreases and the corre-sponding fundamental matrix is constrained, which enables

1This work was supported by the strategic research projects ELLIITand eSSENCE, the Swedish Foundation for Strategic Research project,Semantic Mapping and Visual Navigation for Smart Robots (grant no.RIT15-0038), and Wallenberg AI, Autonomous Systems and SoftwareProgram (WASP) funded by Knut and Alice Wallenberg Foundation.Code available at: https://github.com/marcusvaltonen/DronePoseLib.

Figure 1. The proposed 4-point solver is able to accurately per-form radial distortion auto-calibration for focal length and motionparameters. This is a novel case in the literature and is primar-ily made feasible by a clever assumption—to use the completerotation estimate from pre-integrated IMU data. We perform ex-periments with two different UAVs in various difficult scenarios,demonstrating that this assumption comes with a vast array of ben-efits, e.g. simpler equations, faster solvers, and little to no loss inaccuracy compared to other state-of-the-art methods.

one to compute it with fewer point correspondences. Thispotentially reduces the impact of noise; however, such alge-braic constraints can be highly non-trivial to incorporate ina solver. Popular methods to handle the resulting polyno-mial systems of equations use theory from algebraic geom-etry, e.g. the action matrix method [3], turning the probleminto a generalized eigenvalue problem (GEP) [18, 20], andresultant based methods [1]. Regardless of which methodis used, there is still much work in terms of parameteriz-ing the original problem, as different approaches may yieldcompletely different results [23]. For the resulting solver tobe of any practical use, it must also be numerically stable,which adds further considerations to the design.

1

arX

iv:2

103.

0828

6v2

[cs

.CV

] 1

6 M

ar 2

021

https://github.com/marcusvaltonen/DronePoseLib

https://github.com/marcusvaltonen/DronePoseLib

Since modern image sensors often have square-shapedpixels and the lens is sufficiently aligned such that the prin-cipal point coincides with the optical center, a feasible as-sumption is to use partially calibrated cameras, where theonly unknown intrinsic parameter is the focal length. Thisspecific assumption has proven useful in several differentreal scenarios including relative pose estimation [13, 17, 22,5, 11] and absolute pose estimation [41, 25].

Although it often comes at the cost of introducing dis-tortion, having a wide field of view is desirable in manyapplications. When working with visual odometry, it is,therefore, a standard procedure to correct for these unde-sirable distortion artifacts, which often requires a specificcalibration setup, typically involving a checkerboard pat-tern. By incorporating a distortion model, as well as focallength, together with the motion model, one may omit suchprocedures altogether; however, due to the difficulty of theproblem, no fast and robust minimal solver has yet beenproposed. The main contributions of this paper are:

• We take advantage of IMU data to estimate the full 3Dorientation. Under the assumption that the IMU drift isnegligible for short time intervals, the resulting poly-nomial systems of equations are significantly easier tosolve.

• By using this approach, we are able to treat the par-tially calibrated case with unknown radial distortionprofile while incorporating the IMU data, resulting ina fast and reliable solver. This is the only solver tohandle this case to date.

• Furthermore, we show a considerable speed-up com-pared to other state-of-the-art methods, with small orinsignificant loss in accuracy, when exploiting the as-sumption of negligible IMU drift. This benefits low-cost and embedded devices, which constitute the ma-jority of consumer devices where these algorithms areused in practice.

2. Previous Work2.1. Visual-Inertial Odometry

The calibrated visual-inertial problem of relative pose iswell-studied [27, 8, 37, 32] and efficient solvers exist. If weassume that the gravity direction is aligned with the y-axis,the corresponding essential matrix (after alignment) is givenby Ey ∼ [t]×Ry , or explicitly,

Ey =

[−ty sinφ −tz ty cosφ

tz cosφ+tx sinφ 0 tz sinφ−tx cosφ−ty cosφ tx −ty sinφ

], (1)

where

Ry =

cosφ 0 sinφ0 1 0

− sinφ 0 cosφ

, (2)

Pose 2

Pose 1gravity

y

zx y

zx

t

Figure 2. Assume the IMU measurements are accurate, i.e. theaccelerometer and gyroscope data can be used to accurately esti-mate the relative orientation between two consecutive views. Thenthe only unknown extrinsic parameter is the translation vector be-tween the poses.

and t = (tx, ty, tz). This makes it possible to use a param-eterization with six elements,

Ey =

e1 e2 e3e4 0 e5−e3 e6 e1

. (3)

Since we have four degrees of freedom (three translationelements and one angle), the elements ei of (3) are not in-dependent. In fact, one can check that they must obey the(modified) Demazure equations, also known as the traceconstraint, 2EyE

Ty Ey − tr(EyE

Ty )Ey = 0, or explicitly,

e22 − e24 − e25 + e26 = 0,

e1e2e6 + e1e4e5 + e3e25 − e3e26 = 0,

e1e24 − e1e26 − e2e3e6 + e3e4e5 = 0,

(4)

as well as the rank constraint det(Ey) = 0,

e1e2e4 + e1e5e6 + e2e3e5 − e3e4e6 = 0 . (5)

These constraints were used in [8] to build a minimal solverfor the calibrated case2. We also note an easy decom-position into rotation and translation components, givenby e21 + e23 = t2y . In [11] it was shown that the minimalcase can be solved using a single affine correspondence.

The problem becomes more difficult when adding an un-known focal length. Without any IMU data available, but

2In [8] they align the z-axis with the gravity instead.

2

still considering the partially calibrated case with only un-known focal length, the corresponding fundamental matrixhas six degrees of freedom. This problem, therefore, re-quires a minimal case of six point correspondences, withthe current state-of-the-art solver by Kukelova et al. [22].Ding et al. [5] proposed a minimal solver for two partiallycalibrated cases while incorporating the IMU data. Thiswas done by explicitly parameterizing the rotation about thegravity direction, and turning the problem into a generalizedeigenvalue problem (GEP).

When assuming Ry = I , see Figure 2, the essential ma-trix is E = [t]×, which makes the governing equations sig-nificantly easier. The minimal calibrated case requires onlytwo point correspondences, and the epipolar constraint fora single pair of point correspondences x↔ x′, is given by

x′TEx = 0⇔ (x× x′)T t = 0 . (6)

In [9] it was also shown that the non-minimal case can besolved with global optimality guarantees.

2.2. Relative Pose with Unknown Distortion Profile

When constructing minimal solvers, it is often desirableto use as few parameters as possible. This increases ro-bustness in RANSAC-like frameworks, as fewer iterationsare needed in order to select a sample free from outliers.The one-parameter division model [6], has therefore beenfrequently used, as it performs well with only a single pa-rameter for a large variety of different lenses. In this model,the radially distorted image point x = (x, y, 1) is assumedto be mapped to its corrected counterpart x through the fol-lowing parametric relation

x = f(x, λ) =

xy

1 + λ(x2 + y2)

, (7)

where λ controls the level of distortion. It has been usedsuccessfully in a number of applications [21, 33, 22, 34, 25,39].

The case of relative pose with unknown focal length andunknown distortion parameter is known to be hard. Thetwo-sided problem, i.e. equal and unknown focal length andradial distortion parameter, was first studied in [13]; how-ever, by today’s standards, one cannot say that the proposedsolver has much practical use: the elimination template sizeis very large, 886 × 1011, with 68 putative solutions, anda reported runtime of 400 ms. In [23] the elimination tem-plate size was reduced to 581 × 862 using their proposedreduction step; however, no analysis of the numerical sta-bility was performed. Regardless, it remains impractical forreal-life applications, as the size is still exceedingly large.There has been some theoretical work on the problem, andmore generally on distortion varieties [15]; however, no vi-able real-time solver for the case exists.

The one-sided case, i.e. with one calibrated camera andone camera with unknown focal length and radial distortionparameter, has been studied further. The first solver wasintroduced in [17], but was not numerically stable and theelimination template size was quite large, 200× 231. It haslater been improved in [22], and is now both numericallystable and fast, with an elimination template size of 51 ×70. The one-sided case, however, is mostly artificial, as itassumes one of the cameras to be calibrated, which limitsthe applicability of the method severely.

To the best of our knowledge, the relative pose problemwith unknown and equal focal length and radial distortionparameter incorporating IMU data has not been solved. Wewill solve this case and show that the resulting solver is ex-tremely fast compared to the methods discussed in this sec-tion, with an elimination template size of merely 10 × 21.This is done using a special assumption, which we shall dis-cuss next.

3. Why Ignore the IMU Drift?When measurements from the accelerometer and gyro-

scope are combined in an orientation filter [35, 35, 16, 28,38] the gravity direction is preserved; however, the yawangle begins to drift. Because of this, most visual-inertialmodels try to incorporate an unknown angle about the grav-ity direction [27, 8, 37, 32, 4, 5, 11, 40]. Already in the cal-ibrated case the governing equations (4)–(5) are non-trivialand quadratic or cubic in nature. Parameterizing the rotationmatrix will also result in at least second order equations.

Instead of filtering, another approach is to simply in-tegrate the gyroscopic data to obtain the relative orienta-tion [7]. This eliminates error sources that potentially areintroduced while fusing the gyroscopic data with the ac-celerometer data, e.g. Coriolis forces—when the IMU isnot in the center of rotation—or acceleration due to mo-tion. Both corrupt the measurement of the gravity direc-tion. Over time integration errors, sensor bias and sensornoise will lead to drift in all axes; however, for short timeintervals, this drift is very small. Additionally, if the IMUsensor has been stationary at some point in time—which isa reasonable assumption for UAVs that start from a groundposition—the gyroscopic bias can be observed. The biaschanges very slowly, even in low-cost consumer-availableIMUs, and the observed bias can therefore be used to com-pensate for it in the remaining part of the sequence. In sum-mary, we obtain very accurate relative rotations by simplyintegrating the gyroscopic data between camera pairs.

Valtonen Ornhag et al. [39] proposed using orientationfilters to estimate the camera rotation and thereby bene-fit from a relaxed problem. They showed that this al-lowed them to perform radial distortion correction, whilestill maintaining speed and accuracy. The method, however,relied on a homography-based approach, requiring planar

3

10−20 10−15 10−10 10−5 10

500

1,000

1,500

Fundamental matrix error

Freq

uenc

y

10−20 10−15 10−10 10−5 10

500

1,000

1,500

Focal length error

Freq

uenc

y

Our (3 pt) Our (4 pt) Ding et al. [5] Kukelova et al. [22] Jiang et al. [13]

Figure 3. Error histogram for 10,000 randomly generated problem instances. The proposed 4-point method and the 7-point method [13]also solve for an unknown radial distortion parameter.

objects in the scene geometry, thus limiting the applicabil-ity of the method.

Temporarily losing the gravity direction is not a majorconcern for the relative pose problem. In a complete SLAMframework, one would typically perform a visual-inertialinitialization step [29, 14, 31, 2] which recovers the grav-ity direction as well as metric scale and bias. By trustingthe IMU data we note the following:

(i) we remove one degree of freedom from the camera pa-rameterization,

(ii) the relative pose problem becomes linear in the un-known translation, according to (6),

(iii) we open up the possibility for further intrinsic calibra-tion and still perform fast and accurate in real-time ap-plications.

In the next section, we will show how this is done in prac-tice.

4. Consequences of Ignoring the IMU Drift

We construct two solvers based on our simplifying as-sumption. Note that the derivations are quite short, whichis mainly due to the linear dependence on the translationvector.

4.1. Unknown and Equal Focal Length (3-point)

For the case of unknown and equal focal length, the fun-damental matrix is given by F = K−1[t]×RK−1, whereR = R

(2)imuR

(1)Timu is the relative orientation. By parameter-

izing K−1 = diag(1, 1, w), we have four unknowns—thetranslation t and w. The epipolar constraints x

′Ti Fxi = 0,

for i = 1, 2, 3, yield three equations, which are linear in t.Therefore, the resulting system can be written as

M(w)t = 0, (8)

where M(w) depends only on w. Even with non-degenerate configurations, the matrix M ∈ R3×3 cannothave full rank, hence det(M) = 0. This yields a singlequartic equation in the unknown w, which can be solvedefficiently using the quartic root finding formula. Conse-quently, we have four putative solutions, from which thetranslation t can be retrieved by finding the null space ofthe 3 × 3 matrix M(w∗), where w∗ is a putative solution.For the 3×3 case one can do this without resulting to costlySVD computations [30, 10].

4.2. Unknown and Equal Focal Length and RadialDistortion Parameter (4-point)

Assuming the one-dimensional division model (7), the(modified) epipolar constraint is given by

f(x′i, λ)TK−1[t]×RK−1f(xi, λ) = 0, (9)

for a pair of corresponding xi ↔ x′i, where K−1 =diag(1, 1, w), with w 6= 0, and λ is the unknown distor-tion parameter. As in the previous case, we may utilize thefact that (9) is linear in t, hence

M(w, λ)t = 0, (10)

where M ∈ R4×3. This can be viewed as seeking thenon-trivial nullspace of M , which, in turn, implies that all3 × 3 subdeterminants of M must vanish. Since there arefour such subdeterminants, we can reduce the problem tofour polynomial equations in two unknowns, w and λ. Fur-thermore, we need to exclude non-physical solutions corre-sponding to w = 0, as it turns out that there are infinitelymany solutions if we allow these. This can be accomplishedby saturating the corresponding ideal and can be done usingthe automatic Grobner basis generator proposed in [24]. Itturns out that the remaining system has eleven solutions ingeneral; however, in practice, the most common case is that4–6 solutions are real-valued. By using the action matrixmethod [3], we are able to construct an elimination tem-plate of size 10× 21, by using the basis heuristic proposedin [26].

4

0.1◦ 0.2◦ 0.3◦ 0.4◦ 0.5◦10−6

10−5

10−4

10−3

10−2

10−1

1

IMU error

Fund

amen

talm

atri

xer

ror

0.1◦ 0.2◦ 0.3◦ 0.4◦ 0.5◦10−5

10−4

10−3

10−2

10−1

1

10

IMU error

Foca

llen

gth

erro

r

Our (3 pt) Our (4 pt) Ding et al. [5] Kukelova et al. [22] Jiang et al. [13]

Figure 4. Error for various IMU noise levels. A total of 1,000 randomly generated problem instances are used per noise level.

5. Time Complexity

To compare timings in a fair and accurate way all solversare implemented in C++ using the Eigen [12] library, withthe same compilation flags and setup. The only excep-tion is that the original solver by Jiang et al. [13] is in na-tive MATLAB, hence significantly slower. To make a faircomparison we use the slightly faster (but numerically un-stable) solver proposed in [23] which is available in C++.Note, therefore, that the real execution time for the origi-nal method would be even larger than reported. The fastersolver is only used for timing, and the original solver is usedfor all other experiments.

To simulate a realistic environment we use a RaspberryPi 4 to measure the execution time, as it is a fair approxi-mation of hardware you can expect on an embedded devicerunning these algorithms. The results are shown in Table 1.

Table 1. Mean execution time on a Raspberry Pi 4 for 10,000 ran-domly generated problem instances in C++. We also show thenumber of solutions for each problem. These will have to be eval-uated (or at least a subset) in a RANSAC-like system, hence affectthe total execution time.

Author Time (µs) No. SolutionsOur (3-point) 6 4Ding et al. [5] 4815† 20Kukelova et al. [22] 363 15Our (4-point) 1290 11Jiang et al. [13] 1, 260, 700‡ 68

†: C++ implementation received from the authors of [5].‡: Based on the smaller template reported in [23] which is numericallyunstable. The original solver would be even slower.

Comparing the proposed 3-point solver to the state-of-the-art solver by Ding et al. [5]—essentially solving thesame problem, with the exception that we ignore the po-

tential IMU drift—our solver is more than 800× faster. Inaddition, the proposed 4-point solver including focal lengthand radial distortion correction is a factor 3.7× faster thanthe solver by Ding et al. [5], and significantly faster than thesolver by [13]; in fact, it is roughly 1000× faster, bridgingthe gap from what was considered a theoretically interestingcase to something that can be applied in practice.

Let us emphasize the practical implications of simultane-ously estimating the distortion parameter: the added intrin-sic calibration liberates the user from time-consuming cali-bration procedures. This allows UAV operators (and thoseof other visual-inertial systems) to change optics out in thefield, with no intermediate setup procedures or specific re-quirements needed.

6. Synthetic Experiments

In this section, we test the numerical stability andnoise sensitivity of our proposed methods compared tothe current state-of-the-art. The competing methodsare the 4-point solver by Ding et al. [5], the 6-pointsolver by Kukelova et al. [22] and the 7-point solver byJiang et al. [13]. We found that the solver proposed in [23],which was reported to have a smaller template size, wasnumerically unstable; hence we use the original method in-stead.

In order to get a realistic setup, random synthetic scenepoints with a positive depth in front of the cameras weregenerated. Specifically, the scene points (X, Y, Z) wereuniformly distributed with X,Z ∈ [−3, 3], the depth Y ∈[3, 8], and focal length f ∈ [300, 3000]. This mimics thesetup used in [5]. Furthermore, the orientations are ran-dom and facing the scene. The image points are then ob-tained by projecting the scene points through the cameras,and the orientations of the cameras are used as input tothe visual-inertial solvers. To increase numerical stability,

5

Basement Carpet Indoor Outdoor

Bicycle lane Building Corridor Office

Figure 5. (Top row): Example images from the dataset [40]. (Bottom row): Images from the new dataset using the Crazyflie 2.0. Note thatall images suffer from radial distortion to some extent.

the image point correspondences were normalized; this wasdone in the same way for all solvers. The error distributionfor noise-free data is shown in Figure 3. Here, all meth-ods perform well, with a slight advantage for our 3-pointmethod. Note that we include radial distortion for the pro-posed 4-point solver and the 7-point solver in this case aswell; however, the error histograms are similar and there-fore omitted.

We proceed by analyzing noise sensitivity in various sit-uations. We test all methods on synthetic data without ra-dial distortion (including the 4-point and 7-point solvers),and add a pixel noise relative to focal length (Gaussiannoise with zero mean and standard deviation 1080/f pix-els, where f is the ground truth focal length). Furthermore,we add noise to the IMU measurements—here we add noiseon all angles, including the yaw angle. The noise interval ismeant to cover the precision of a low-cost IMU, with a max-imal error of approximately 0.5◦, as reported in [19]. Theresults are presented in Figure 4. Note that both our meth-ods perform better than the competing methods in terms offundamental matrix recovery for smaller noise levels, andthe state-of-the-art method [5] only performs slightly betterfor larger noise levels. This is primarily since it is capableof correcting for the error about the gravity direction. Wehave, however, found empirically on real-data that the lowernoise levels are dominating the input data under certain con-ditions. This holds true even for low-cost devices, hence donot pose a practical issue.

7. Real DataTo demonstrate the applicability of our assumption, and

the solvers based on it, we have used the challengingdataset [40] consisting of various indoor and outdoor sceneswith predominantly planar surfaces. The scenes were cap-

tured using a mid-sized UAV (500 g, 170 × 240 × 40mm) equipped with a monochrome global shutter camera(OV9281) recorded with resolution 640 × 400, equippedwith an onboard IMU of model MPU-9250.

In addition, we have recorded a new dataset with a differ-ent UAV. The scenes are typically demanding, e.g. an indoorsequence containing forward motion in a corridor, whichis known to be hard. We show example images from thescenes in Figure 5. This dataset uses a lightweight (27 g,92× 92× 29 mm) and low-cost nano quadcopter availableunder the name Crazyflie 2.0, captured in 640 × 480 reso-lution with an RGB camera (RunCam Nano). The groundtruth was obtained using a complete SLAM system, wherethe reprojection error and IMU error were jointly minimizedover both camera parameters and scene points, to create aglobally consistent solution in metric scale. In Figure 1 weshow an image of the Crazyflie 2.0, as well as the outputfrom the proposed 4-point algorithm.

The main reason to use different UAVs with differentcomponents is to show the versatility and robustness of theproposed solvers, as different setups might perform differ-ently depending on which IMU filtering technique or pre-integration method is used. Furthermore, there are caseswhen the distortion profile of the camera optics may notbe accurately approximated by the one-parameter divisionmodel.

In the experiments we use a simple pre-integration tech-nique [7] to obtain the estimated relative rotation. As dis-cussed in Section 3, the direction of the gravity may drift,hence the 4-point methods by Ding et al. [5] will not per-form optimally, as one would have to select a filtering tech-nique that preserves the gravity direction. To remedy thissituation, we use the ground truth poses to align the direc-tion for this method every frame, and apply the estimated

6

Figure 6. (Left:) The mid-sized UAV used in the dataset [40] and one of the input images from the Indoor sequence. (Right:) The output ofthe proposed 4-point algorithm, where the distortion parameter was estimated using histogram voting. Note that the checkerboard patternon the floor is a quadrilateral box in real-life; however, it is significantly distorted (red) in the input image. The rectified image, on the otherhand, displays a quadrilateral box (green). This indicates that lines are mapped to lines and that the pinhole camera model is applicable.This is strong evidence that the distortion profile has been accurately estimated, and that the one-parameter division model (7) is sufficientfor the optics used on the UAV.

relative orientation. Note that this gives the exact same rel-ative error per input and frame as for the other solvers, butguarantees that the input to the competing 4-point solver isaligned with the gravity direction. Although these circum-stances are not plausible in real-life using pre-integration,this assures that the result is not skewed by the choice ofIMU filtering technique. The reported statistics for the 4-point method [5], however, are optimistic, as as the first in-put orientation is perfectly aligned in the experiments. Inreal-life situations they might perform worse.

Another important aspect to note is that we use con-secutive frames, as we want to minimize potential drift.This means that the translation between frames might notbe very long—typically not more than 30 cm, causing thebaseline to be short. In [5] the authors only used frames{{1, 11}, {2, 12}, . . .} to avoid this situation; however, wewant to utilize this data, as it is important for real-life appli-cations concerning UAV positioning, e.g. moving obstacleavoidance.

We use the following error metrics to measure the qualityof the pose reconstruction

eR = arccos

(tr(RGTR

Test)− 1

2

), (11)

et = arccos

(tTGTtest

‖tGT‖‖test‖

), (12)

ef =|fGT − fest|

fGT, (13)

which have been used in a number of works [36, 4, 5, 40,39]. The distortion profile is not as simple to measure, andwe will discuss this in Section 7.2.

7.1. Pose Estimation with Rectified Images

In this section, we use the rectified images for the meth-ods that do not compensate for radial distortion (the pro-posed 3-point solver, the 4-point solver [5] and the 6-pointsolver [22]), while the proposed 4-point solver is giventhe distorted images as input. We do not use the 7-pointsolver [13] in this comparison, since it is not a feasiblecompetitor in real situations due to its computational com-plexity. Each method is given 1,000 RANSAC iterationswith the same reprojection threshold, and no extra local op-timization techniques are applied. The results are shownin Table 2.

From the result, we note that there is a slight advantagein terms of accuracy in favor of the method by Ding et al. [5]compared to the proposed 3-point solver for the datasetfrom [40] generated with the mid-sized UAV; however, theroles are reversed when looking at the new dataset with theCrazyflie 2.0 UAV. Note that the rotation error for the pro-posed 3-point (and 4-point, since they are identical) are gen-erally larger for the mid-sized UAV (in the range 0.4–0.6 de-grees) whereas the error input to the Crazyflie 2.0 is slightlysmaller (roughly 0.2 degrees on average). This couldbe explained by the internal calibration between the IMUand the camera being more accurate on the Crazyflie 2.0.Furthermore, we would like to emphasize that the differ-

7

Table 2. Pose estimation error for the two datasets with rectified input images to all but the proposed 4-point method. Note that the methodby [5] is given the first ground truth rotation matrix as input to ensure the assumed alignment with the y-axis. Therefore, it is likely toperform worse in real-life.

Basement Carpet Indoor OutdoorOur (3 pt) Ding Kukelova Our (4 pt) Our (3 pt) Ding Kukelova Our (4 pt) Our (3 pt) Ding Kukelova Our (4 pt) Our (3 pt) Ding Kukelova Our (4 pt)

Mean 0.435 0.277 1.336 0.435 0.628 0.455 5.315 0.628 0.607 0.329 5.582 0.607 0.423 0.325 4.391 0.423Rot. Error (deg.) Median 0.378 0.233 0.765 0.378 0.566 0.345 2.197 0.566 0.496 0.308 2.169 0.496 0.345 0.284 1.090 0.345Mean 5.236 4.525 11.649 6.127 3.296 3.129 11.450 4.871 4.889 4.179 20.648 6.093 5.345 7.138 25.872 6.446Trans. Error (deg.) Median 3.755 2.741 7.457 4.458 2.243 1.732 8.414 2.929 2.689 2.286 14.811 2.722 3.576 4.002 17.090 4.499Mean 24.804 20.977 72.814 27.315 6.921 6.462 63.394 46.546 7.595 9.011 69.189 16.190 12.914 27.351 88.031 39.547Focal. Error (perc.) Median 6.336 4.887 18.681 12.293 3.597 2.526 23.606 7.761 4.582 3.322 39.160 7.546 7.223 8.923 39.310 9.584

Bicycle lane Building Corridor OfficeOur (3 pt) Ding Kukelova Our (4 pt) Our (3 pt) Ding Kukelova Our (4 pt) Our (3 pt) Ding Kukelova Our (4 pt) Our (3 pt) Ding Kukelova Our (4 pt)


Table 3. Pose estimation error with unrectified input images to all methods. The proposed 4-point method is the only method capable ofcorrecting for radial distortion artifacts.

Basement Carpet Indoor OutdoorOur (3 pt) Ding Kukelova Our (4 pt) Our (3 pt) Ding Kukelova Our (4 pt) Our (3 pt) Ding Kukelova Our (4 pt) Our (3 pt) Ding Kukelova Our (4 pt)


Bicycle lane Building Corridor OfficeOur (3 pt) Ding Kukelova Our (4 pt) Our (3 pt) Ding Kukelova Our (4 pt) Our (3 pt) Ding Kukelova Our (4 pt) Our (3 pt) Ding Kukelova Our (4 pt)


ence between the proposed 3-point and competing 4-pointmethod [5] is not very large even for the mid-sized UAVand that in a real scenario one might want to choose theproposed method, as it is 800× faster. Another interestingaspect is that the visual-only method by [22] is performingsignificantly worse than the competing methods, which wasalso noted in [5].

7.2. Pose Estimation with Raw Images

We now turn our attention to using distorted input im-ages. This scenario is interesting for UAV operators whowish to change optics out in the field without intermedi-ate calibration procedures. The same input sequences asin Section 7.1 are used; however, they are not rectified priorto estimating the image point correspondences. The resultsare shown in Table 3. Unsurprisingly, our 4-point methodoutperforms the other methods that cannot correct for dis-tortion artifacts. What is perhaps more interesting is thatthe performance, in general, is better than the visual-only6-point method [22] on rectified data. These observationssuggest that the radial distortion auto-calibration approachis practically feasible using the proposed solver.

As optics, in general, are not perfectly approximated bythe one-parameter division model, it is non-trivial to expressthe performance of the radial distortion correction. Instead,we rely on an ocular inspection of the estimated radial dis-tortion parameter for two sequences. In Figure 1 and Fig-

ure 6 we show the rectifications, using the estimated radialdistortion parameter obtained from histogram voting of therespective sequence. In the latter case, we get a clear vi-sual confirmation of the successful estimation of the radialdistortion parameter, in the form of a quadrilateral checker-board pattern visible on the floor.

8. ConclusionsIn this paper, we have investigated an assumption of ig-

noring the IMU drift for short time intervals. We showedthat modern pre-integration methods perform well and thatthe relative pose problem can be solved accurately and satis-factorily using this assumption. What is most important, isthat the resulting equations are significantly easier to solve,opening up the possibility to tackle problems that werepreviously considered extremely hard and not suitable forreal-time applications. We proposed the first-ever minimalsolver for simultaneously estimating the focal length, dis-tortion profile, and motion parameters while incorporatingthe IMU data. Furthermore, we showed a speed-up of 800×compared to the current state-of-the-art for the partially cal-ibrated case with unknown and equal focal length, with littleto no loss in accuracy. The methods have been thoroughlytested on different UAVs with different components, in sev-eral challenging indoor and outdoor environments, demon-strating excellent performance.

8

References[1] Snehal Bhayani, Zuzana Kukelova, and Janne Heikkila. A

sparse resultant based method for efficient minimal solvers.In Proceedings of the IEEE/CVF Conference on ComputerVision and Pattern Recognition (CVPR), June 2020. 1

[2] C. Campos, J. M. M. Montiel, and J. D. Tardos. Inertial-only optimization for visual-inertial initialization. In 2020IEEE International Conference on Robotics and Automation(ICRA), pages 51–57, 2020. 4

[3] D. A. Cox, J. Little, and D. O’Shea. Using Algebraic Geom-etry. Graduate Texts in Mathematics. Springer New York,2005. 1, 4

[4] Yaqing Ding, Jian Yang, Jean Ponce, and Hui Kong. An ef-ficient solution to the homography-based relative pose prob-lem with a common reference direction. In The IEEE Inter-national Conference on Computer Vision (ICCV), October2019. 3, 7

[5] Yaqing Ding, Jian Yang, Jean Ponce, and Hui Kong. Minimalsolutions to relative pose estimation from two views sharinga common direction with unknown focal length. In Proceed-ings of the IEEE/CVF Conference on Computer Vision andPattern Recognition (CVPR), June 2020. 2, 3, 4, 5, 6, 7, 8

[6] A. W. Fitzgibbon. Simultaneous linear estimation of multipleview geometry and lens distortion. In Conference on Com-puter Vision and Pattern Recognition (CVPR), Dec 2001. 3

[7] C. Forster, L. Carlone, F. Dellaert, and D. Scaramuzza. On-manifold preintegration for real-time visual–inertial odome-try. IEEE Transactions on Robotics, 33(1):1–21, 2017. 3,6

[8] Friedrich Fraundorfer, Petri Tanskanen, and Marc Pollefeys.A minimal case solution to the calibrated relative pose prob-lem for the case of two known orientation angles. In Euro-pean Conference on Computer Vision (ECCV), pages 269–282. Springer Berlin Heidelberg, 2010. 2, 3

[9] Johan Fredriksson, Viktor Larsson, Carl Olsson, and FredrikKahl. Optimal relative pose with unknown correspondences.In IEEE Conference on Computer Vision and Pattern Recog-nition (CVPR), pages 1728–1736, 2016. 3

[10] Theodore F. Gast, C. Fu, Chenfanfu Jiang, and J. Teran.Implicit-shifted symmetric qr singular value decompositionof 3x3 matrices. Technical report, University of California,2016. 4

[11] Banglei Guan, Ji Zhao, Zhang Li, Fang Sun, and FriedrichFraundorfer. Minimal solutions for relative pose with a sin-gle affine correspondence. In Proceedings of the IEEE/CVFConference on Computer Vision and Pattern Recognition(CVPR), June 2020. 2, 3

[12] Gael Guennebaud, Benoıt Jacob, et al. Eigen v3.http://eigen.tuxfamily.org, 2010. 5

[13] Fangyuan Jiang, Yubin Kuang, Jan Erik Solem, and KalleAstrom. A minimal solution to relative pose with unknownfocal length and radial distortion. In Asian Conference onComputer Vision (ACCV), pages 443–456, 2014. 2, 3, 4, 5, 7

[14] J. Kaiser, A. Martinelli, F. Fontana, and D. Scaramuzza. Si-multaneous state initialization and gyroscope bias calibrationin visual inertial aided navigation. IEEE Robotics and Au-tomation Letters, 2(1):18–25, 2017. 4

[15] Joe Kileel, Zuzana Kukelova, Tomas Pajdla, and BerndSturmfels. Distortion varieties. Foundations of Computa-tional Mathematics, 18(4):1043–1071, 2018. 3

[16] E. Kraft. A quaternion-based unscented kalman filter for ori-entation tracking. In Sixth International Conference of Infor-mation Fusion, 2003. Proceedings of the, volume 1, pages47–54, 2003. 3

[17] Y. Kuang, J. E. Solem, F. Kahl, and K. Astrom. MinimalSolvers for Relative Pose with a Single Unknown Radial Dis-tortion. In 2014 IEEE Conference on Computer Vision andPattern Recognition, pages 33–40, 6 2014. 2, 3

[18] Z. Kukelova, M. Bujnak, and T. Pajdla. Polynomial eigen-value solutions to the 5-pt and 6-pt relative pose problems.In British Machine Vision Conference (BMVC), 01 2008. 1

[19] Zuzana Kukelova, Martin Bujnak, and Tomas Pajdla.Closed-Form Solutions to Minimal Absolute Pose Problemswith Known Vertical Direction. In Asian Conference onComputer Vision (ACCV), Queenstown, New Zealand, 112010. 6

[20] Z. Kukelova, M. Bujnak, and T. Pajdla. Polynomial eigen-value solutions to minimal problems in computer vision.IEEE Transactions on Pattern Analysis and Machine Intel-ligence, 34(7):1381–1393, 2012. 1

[21] Z. Kukelova, J. Heller, M. Bujnak, and T. Pajdla. Radialdistortion homography. In Conference on Computer Visionand Pattern Recognition (CVPR), pages 639–647, June 2015.3

[22] Zuzana Kukelova, Joe Kileel, Bernd Sturmfels, and TomasPajdla. A clever elimination strategy for efficient minimalsolvers. Conference on Computer Vision and Pattern Recog-nition (CVPR), pages 3605–3614, 2017. 2, 3, 4, 5, 7, 8

[23] V. Larsson, K. Astrom, and M. Oskarsson. Efficient solversfor minimal problems by syzygy-based reduction. ComputerVision and Pattern Recognition (CVPR), pages 2383–2392,July 2017. 1, 3, 5

[24] V. Larsson, K. Astrom, and M. Oskarsson. Polynomialsolvers for saturated ideals. International Conference onComputer Vision (ICCV), pages 2307–2316, October 2017.4

[25] Viktor Larsson, Zuzana Kukelova, and Yinqiang Zheng.Camera pose estimation with unknown principal point. InProceedings of the IEEE Conference on Computer Visionand Pattern Recognition (CVPR), June 2018. 2, 3

[26] V. Larsson, M. Oskarsson, K. Astrom, A. Wallis, Z.Kukelova, and T. Pajdla. Beyond grobner bases: Basis se-lection for minimal solvers. Computer Vision and PatternRecognition (CVPR), pages 3945–3954, 2018. 4

[27] B. Li, L. Heng, G. H. Lee, and M. Pollefeys. A 4-point algo-rithm for relative pose estimation of a calibrated camera witha known relative rotation angle. In 2013 IEEE/RSJ Interna-tional Conference on Intelligent Robots and Systems, pages1595–1601, 2013. 2, 3

[28] S. O. H. Madgwick, A. J. L. Harrison, and R. Vaidyanathan.Estimation of imu and marg orientation using a gradient de-scent algorithm. In 2011 IEEE International Conference onRehabilitation Robotics, pages 1–7, 2011. 3

9

[29] Agostino Martinelli. Closed-form solution of visual-inertialstructure from motion. International Journal of ComputerVision, 106(2):138–152, Jan 2014. 4

[30] Aleka McAdams, A. Selle, Rasmus Tamstorf, J. Teran, andEftychios Sifakis. Computing the singular value decomposi-tion of 3x3 matrices with minimal branching and elementaryfloating point operations. Technical Report 1690, Universityof Wisconsin, 2011. 4

[31] R. Mur-Artal and J. D. Tardos. Visual-inertial monocularslam with map reuse. IEEE Robotics and Automation Let-ters, 2(2):796–803, 2017. 4

[32] O. Naroditsky, X. S. Zhou, J. Gallier, S. I. Roumeliotis, andK. Daniilidis. Two efficient solutions for visual odometryusing directional correspondence. IEEE Transactions onPattern Analysis and Machine Intelligence, 34(4):818–824,2012. 2, 3

[33] James Pritts, Zuzana Kukelova, Viktor Larsson, and OndrejChum. Radially-distorted conjugate translations. In Confer-ence on Computer Vision and Pattern Recognition (CVPR),2018. 3

[34] James Pritts, Zuzana Kukelova, Viktor Larsson, and OndrejChum. Rectification from radially-distorted scales. In AsianConference of Computer Vision (ACCV), pages 36–52, 2018.3

[35] A. M. Sabatini. Quaternion-based extended kalman filterfor determining orientation by inertial and magnetic sensing.IEEE Transactions on Biomedical Engineering, 53(7):1346–1356, 2006. 3

[36] O. Saurer, P. Vasseur, R. Boutteau, C. Demonceaux, M.Pollefeys, and F. Fraundorfer. Homography based egomotionestimation with a common direction. IEEE Transactions onPattern Analysis and Machine Intelligence, 39(2):327–341,Feb 2017. 7

[37] C. Sweeney, J. Flynn, and M. Turk. Solving for relativepose with a partially known rotation is a quadratic eigenvalueproblem. In International Conference on 3D Vision (3DV),volume 1, pages 483–490, 2014. 2, 3

[38] Roberto Valenti, Ivan Dryanovski, and Jizhong Xiao. Keep-ing a good attitude: A quaternion-based orientation filter forimus and margs. Sensors, 15:19302–19330, 08 2015. 3

[39] Marcus Valtonen Ornhag, Patrik Persson, MartenWadenback, Kalle Astrom, and Anders Heyden. Effi-cient real-time radial distortion correction for uavs. InProceedings of the IEEE/CVF Winter Conference on Ap-plications of Computer Vision (WACV), pages 1751–1760,January 2021. 3, 7

[40] Marcus Valtonen Ornhag, Patrik Persson, MartenWadenback, Kalle Astrom, and Anders Heyden. Mini-mal solvers for indoor uav positioning. In Proceedings of theInternational Conference on Pattern Recognition (ICPR),pages 1136–1143, January 2021. 3, 6, 7

[41] Changchang Wu. P3.5p: Pose estimation with unknown fo-cal length. In Proceedings of the IEEE Conference on Com-puter Vision and Pattern Recognition (CVPR), June 2015. 2

10

trust your imu: consequences of ignoring the imu drift

Documents