

Calibration of RGB Camera With Velodyne LiDAR

Martin Velas, Michal Spanel, Zdenek Materna, Adam Herout
Department of Computer Graphics and Multimedia
Faculty of Information Technology, Brno University of Technology
Bozetechova 2, 612 00, Brno, Czech Republic
{ivelas, spanel, imaterna, herout}@fit.vutbr.cz

ABSTRACT

Calibration of the LiDAR sensor with an RGB camera finds its usage in many application fields, from enhancing image classification to environment perception and mapping. This paper presents a pipeline for mutual pose and orientation estimation of the mentioned sensors using a coarse-to-fine approach. Previously published methods use multiple views of a known chessboard marker for computing the calibration parameters, or they are limited to the calibration of sensors with a small mutual displacement only. Our approach introduces a novel 3D marker for coarse calibration which can be robustly detected in both the camera image and the LiDAR scan. It also requires only a single pair of camera-LiDAR frames for estimating a large displacement of the sensors. A subsequent refinement step then searches for a more accurate calibration in a small subspace of the calibration parameters. The paper also presents a novel way of evaluating the calibration precision using a projection error.

Keywords

camera, LiDAR, Velodyne, calibration, marker

1 INTRODUCTION

This paper deals with an automatic calibration of an RGB camera with the Velodyne LiDAR (Light Detection And Ranging) sensor, also called a laser radar or scanner. The Velodyne sensor scans the whole area around its rotating beam using 32 or 64 laser rays and finds its use in many applications, such as the autonomously driving Google car. Using the techniques proposed in the following chapters, the extrinsic parameters – the position and orientation of the LiDAR relative to the camera – can be computed, as shown in Figure 2. This problem is commonly described as a search for 6 degrees of freedom (6DoF): rotation and translation along all three axes of 3D space.

When solving such a calibration problem, the most challenging tasks are the ability to deal with a large sensor displacement and minimal requirements on a specific scene setup, while preserving the accuracy of the resulting calibration.

In order to fulfill these goals, the proposed camera-LiDAR calibration pipeline performs the calibration in two consecutive steps.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

(a) (b)

(c)

Figure 1: Using the calibration for fusion of the camera image (a) and the Velodyne scan (b) to obtain a coloured point cloud (c).

First, a large sensor displacement is estimated using a special, but still simple, 3D marker, and then a refinement is applied in order to obtain a more accurate calibration. Using the proposed projection error, it will be shown that the coarse calibration reaches similar or even better accuracy than the manual one (where the operator sets and visually verifies the calibration parameters) in the task of sensor displacement computation. The subsequent refinement process further improves the calibration accuracy and exhibits 5% better results than the previously proposed method by Levinson and Thrun [LT13].

Fusion of the aligned camera with the LiDAR sensor has recently been used in many computer vision tasks in order to enhance their performance. Douillard et al. [DBR09] use a 3D classifier based on fused Velodyne data and monocular colour imagery for modelling the semantic content of scenes. Colour point clouds obtained through the camera-LiDAR fusion proved to be useful in velocity estimation for precise vehicle tracking and autonomous driving [HLT13]. Figure 1 demonstrates the results of the Velodyne point cloud colouring. A combination of the aligned LiDAR with the camera was also successfully used in the process of building 3D geological maps [NMV10] or for preserving cultural heritage by 3D modelling of historical buildings [BJCK11].

The method described in this work extends the solution proposed by Levinson and Thrun [LT13]. The biggest asset of the mentioned paper is the experimentally validated assumption that edges can be robustly detected in both the camera image and the LiDAR scan.

The novelty of our method is the two-level calibration scheme dividing the problem into two consecutive parts. Large displacements of the sensors' positions are computed using camera-LiDAR correspondences found via a fully automatic 3D marker detection. The novel marker also allows performing the coarse calibration from a single camera image and the corresponding LiDAR scan, which is an advantage over traditional "multi-frame" techniques. Finally, the obtained calibration is refined using a simple exhaustive dense search in a small subspace of the calibration parameters. Both the marker detection and the refinement are based on the robust edge detection proposed by Levinson and Thrun [LT13].

In its experimental part, the paper also introduces a new way in which calibration methods can be evaluated automatically using image segmentation. The process assumes that, in the calibration scene, the foreground and the background can be easily separated in both the camera image and the LiDAR point cloud. As the error metric, the ratio of 3D points projected onto the wrong image segment is used.

Using this metric, it will be shown that our modification of the refinement process reaches a 4% lower miscalibration error than the basic solution.

2 RELATED WORK

Methods of the automatic camera calibration with the LiDAR sensor can be divided into several groups.

Figure 2: The problem of the camera–Velodyne calibration. The solution of the problem are the vectors t and R describing the translation and the rotation of the Velodyne LiDAR relative to the RGB camera.

The first group of methods [KP10, PMSE10, GMCS12] requires a chessboard-like marker for the automatic calibration. Other methods [TN13, LT13] assume an arbitrary markerless scene for automatic calibration, but their capability is reduced to a refinement of small differences in position and orientation between the sensors. Some of the recently used methods [BCK13, LN09, PPRJ13] use a sequence of captured frames and compute the calibration parameters from the motion obtained through feature tracking or dedicated IMU units.

An automatic alignment of a camera with a 2D laser rangefinder, where a typical checkerboard marker must be observed from multiple views, was proposed by Zhang and Pless [ZP04]. Constraints based on these observations are solved by minimizing an algebraic error, followed by a nonlinear refinement. The external calibration of a 3D laser scanner with an omnidirectional camera system [PMSE10] also requires a planar checkerboard pattern to be observed simultaneously by the laser scanner and the camera system from a minimum of 3 views. Normals of the planar surface form a non-linear optimization problem that is solved for the extrinsic calibration parameters. All the mentioned techniques suffer from the necessity of the presence of a known marker in the scene. Moreover, the marker must be observed from multiple points of view. A toolbox for the automatic camera-Velodyne calibration presented by Geiger et al. [GMCS12] recovers intrinsic and extrinsic camera parameters using a single view of multiple chessboard instances placed in the scene, which is still quite impractical.

A structure-from-motion technique [BCK13] for camera-laser fusion, which uses a 2D laser sensor and multiple cameras, was also presented. The laser points are projected onto the images and tracked by the KLT algorithm [ST94] into other frames in order to be used as 3D-2D correspondences.


The paper by Núñez et al. [PPRJ13] proposes a method to perform the extrinsic calibration between a camera and a 3D-LRF observing a checkerboard pattern with the aid of motion obtained by an Inertial Measurement Unit (IMU). The limitation of these methods is the need for multiple frames and for this special type of hardware.

A "golden" calibration method which would require no marker, only a single observation of the scene and no further HW requirements, while still providing sufficient results, is still missing. Recently, techniques have appeared which work with markerless arbitrary scenes, assuming some property shared by both the camera image and the LiDAR scan. The method of [TN13] is based on the hypothesis that horizontally oriented planes are displayed in the camera image with a higher intensity. The search for the proper calibration parameters is performed as a particle swarm optimization [KE95] with a cost function based on mutual information between the image and the projected point cloud, also used by Pandey et al. [PMSE12]. However, these techniques come with another limitation, which is the need for a small displacement of the sensors in both their orientation and position.

A real-time miscalibration detection and transformation adjustment introduced by Levinson and Thrun [LT13] enables online and automatic camera-laser calibration in arbitrary environments. In the mentioned work, the miscalibration detection is performed by probabilistic monitoring. Subsequently, a continuous calibration optimizer adjusts the transform offsets if a miscalibration was previously detected. This solution is based on the assumption, which proved to be right, that edges can be robustly and reliably detected in both the camera image and the LiDAR scan. The objective function, based on matching the edges detected in both modalities and used for the optimization, is convex only in a small subspace around the global optimum. Thus the calibration task is limited to only small variations in the position and the orientation between the camera and the LiDAR.

The main drawback of the two previously mentioned methods [LT13, TN13] – the limitation of the position and the orientation between the sensors to a small subspace of the 6DoF parameter space – is relaxed in this paper using a special type of 3D marker which allows a coarse-to-fine calibration of the sensors using just a single frame captured by the camera and the LiDAR from a single point of view, with a single instance of the marker in the scene. Our method of marker detection is based on the validated assumption [LT13] that edges can be robustly detected in both the camera image and the LiDAR point cloud. The final step of the sensor alignment proposed in this paper is the calibration refinement by means of searching in a small subspace of the calibration parameters.

3 CALIBRATION OF CAMERA WITH LIDAR

Our proposed solution consists of two consecutive steps. First, an approximate coarse calibration is estimated. In this step, we assume that the translation of the laser sensor relative to the camera is much more significant than the rotation. For the coarse calibration step, a special "3D marker" which can be easily found in both the camera and the Velodyne data was designed. The found correspondences are further used for a deterministic computation of the translation between the camera and the Velodyne.

In the next step, the calibration of the sensors, including the rotation, is refined. This fine calibration is based on the solution proposed by Levinson and Thrun [LT13] with further simplifications which bring a computation speed-up and a higher reliability of the obtained results.

To be more specific, the coarse-to-fine calibration pipeline works in the following steps:

1. Coarse calibration

(a) edge detection in the Velodyne point cloud and the camera image,

(b) 3D marker detection (circles' centres and the radius),

i. in the camera image, using the Hough transform [YPIK90],

ii. in the point cloud, using our detection algorithm based on RANSAC [FB81], which is described below in detail,

(c) estimation of the translation between both the sensors in the 3D space.

2. Calibration refinement

(a) initialization using the estimated coarse calibration parameters,

(b) dense search in the small subspace of the calibration parameters:

i. projection of the point cloud with detected edges onto the image plane,

ii. edge detection and the inverse distance transform of the camera image,

iii. cross correlation of these two images,

(c) pick the calibration solution based on the mentioned cross correlation criteria.

Sensors Description

Calibration techniques presented in this work were developed and tested with sensors mounted on our robotic platform shown in Figure 3a.


(a) (b)

Figure 3: Our robotic platform (a) and an example of the Velodyne scan (b).

The source of the camera image is the Kinect©¹ sensor mounted on our robot. Besides the RGB frames, it provides a depth map, which is not used in the calibration process.

The source of the laser radar scan is the Velodyne©² LiDAR sensor – the HDL-32E model with 32 lasers having a field of view of 360° horizontally and 40° vertically. The sensor rotates 10 times per second and captures 700,000 points per second, which are organized into 32 rings (one per laser). An example of the scan provided by this sensor is displayed in Figure 3b.

The robot runs ROS Hydro for unified control and access to the sensor data. We assume that the sensors themselves are already calibrated and that the ROS topic interfaces provide auxiliary data such as the intrinsic camera parameters (focal length and the principal point) in the form of the projection matrix.
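For illustration, a minimal ROS 1 (rospy) sketch of reading these intrinsics from a CameraInfo topic could look as follows; the topic name is an assumption and has to be adapted to the actual camera driver:

    # Sketch (assumption): reading f, o_x, o_y from a ROS 1 CameraInfo message.
    import rospy
    from sensor_msgs.msg import CameraInfo

    def camera_info_cb(msg):
        # P is the row-major 3x4 projection matrix published by the driver.
        f = msg.P[0]    # focal length in pixels (fx; fy is msg.P[5])
        ox = msg.P[2]   # principal point x
        oy = msg.P[6]   # principal point y
        rospy.loginfo("f=%.1f ox=%.1f oy=%.1f", f, ox, oy)

    rospy.init_node("camera_info_reader")
    rospy.Subscriber("/camera/rgb/camera_info", CameraInfo, camera_info_cb)
    rospy.spin()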

Coarse Calibration

The initial step of the calibration estimates the coarse calibration parameters using the novel 3D marker. This estimation allows a large positional displacement of the sensors and it is further refined in the subsequent step.

Marker Description

Usage of a marker for calibration purposes is nothing new. A typical example of a marker used for calibration is the chessboard pattern, which was also successfully used for the camera-LiDAR calibration [GMCS12, PMSE10, KP10]. The biggest drawback of this marker is the need for multiple observations of the marker from different view points.

The design of our marker follows the condition that it has to be easily detectable in both the camera image and the Velodyne data, and that only one observation of the marker is sufficient for the estimation of the calibration parameters. The pre-processing of the camera and the LiDAR data includes an edge detection which will be described in the next sections.

¹ http://www.xbox.com/en-US/kinect
² http://velodyneLiDAR.com/

Figure 4b shows how edges of various geometrical shapes are captured by the Velodyne LiDAR. One can observe that the vertical position of shapes with horizontally oriented edges cannot be determined precisely with a Velodyne-like sensor, which scans the surrounding environment in horizontal rings.

(a) (b)

(c) (d)

Figure 4: Planar marker with holes of different geometrical shapes (a) and edges detected in its LiDAR scan (b). Our proposed marker (c) and the detected edges (d).

Figure 4c shows the final version of our planar marker with circular holes, and Figure 4d the edges detected in its scan. Thus, both the horizontal and the vertical position of the circles can be clearly determined, visually as well as automatically.

Detection in Velodyne Data

Edge detection in Velodyne data is based on the assumption that depth discontinuities in laser data represent edges in the real scene [LT13, BCES11].

The frame of the Velodyne LiDAR is processed ring by ring, as this is the natural way for this type of sensor. It is possible to assign a value to each measured point which depends on its depth measurement and the depths of its direct neighbours:

    X_i = max(P^r_{i−1} − P^r_i, P^r_i − P^r_{i+1}, 0)^γ    (1)

where P^r_i represents the range of the i-th point of the point cloud gained from the Velodyne and γ is a constant (γ = 0.5 was used in our experiments). The lower γ is, the higher the values assigned to points with a small depth discontinuity relative to the points with a large one. Afterwards, the X_i values are normalized into the range (0, 1).

For efficiency, the points with an assigned value under a certain threshold (0.1) are discarded. This leads to a reduction of the number of points (about 90% of the points are removed) while preserving the detection accuracy. The threshold value was derived from the work of Levinson and Thrun [LT13]. Our experiments confirmed that the suggested value is reasonably restrictive (values 0.01 or 0.03 might result in a loss of significant information, while higher values 0.3 or 0.5 do not bring any detection improvement and slow down the computation).
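A minimal NumPy sketch of this per-ring edge scoring could read as follows; the function name and interface are ours, not part of the original implementation:

    import numpy as np

    def ring_edge_scores(ranges, gamma=0.5, threshold=0.1):
        """Edge scores X_i (Eq. 1) for one Velodyne ring.

        ranges: 1-D array of range readings P^r_i ordered along the ring.
        Returns the indices of the kept edge candidates and their scores.
        """
        r = np.asarray(ranges, dtype=float)
        prev_diff = np.zeros_like(r)
        next_diff = np.zeros_like(r)
        prev_diff[1:] = r[:-1] - r[1:]    # P^r_{i-1} - P^r_i
        next_diff[:-1] = r[:-1] - r[1:]   # P^r_i - P^r_{i+1}
        x = np.maximum(np.maximum(prev_diff, next_diff), 0.0) ** gamma
        if x.max() > 0:
            x /= x.max()                  # normalize into (0, 1)
        keep = np.nonzero(x > threshold)[0]   # discards roughly 90% of the points
        return keep, x[keep]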

For the detection itself, we assume that the size, number and mutual position of the circles in the marker are known. Another assumption is that the marker is visible to both the Velodyne and the camera, and that the circular holes in the marker can be considered as a planar object – an intersection of a plane with spheres. These assumptions are used for the detection verification.

The marker detection in the Velodyne point cloud repeats three major phases – detection, verification and point cloud pruning:

1. The marker is roughly located as the plane detected by the RANSAC algorithm [FB81] with a plane model. The outliers are discarded and the rest of the points are further processed.

2. Borders of the marker are removed using the RANSAC algorithm for line detection in 3D space. These borders would cause inaccuracies in the circle detection process, because the Velodyne data are sparse and only a few (15–20) points represent each circular hole. Figure 5a shows a situation where the borders were not discarded and the detection failed.

(a) (b)

Figure 5: A situation where the circular holes were found incorrectly (a) because of the lack of data and the preserved vertical borders of the marker. A correct detection of the 4 marker holes after the marker borders are discarded (b). Verification was done by checking the distances of the circle centres.

3. The algorithm detects the 4 best candidates for the marker holes, also using RANSAC, with a sphere model (the intersections of the spheres with the plane are the circles the algorithm is looking for).

4. After the detection of the given number of spheres, their mutual position is verified – the distances of the centres (blue lines in Figure 5b) are checked (a sketch of this check is given below). If the verification succeeds, the whole marker is detected and the algorithm ends.

5. Otherwise, the algorithm continues with a phase of point cloud pruning for the next iteration. The input of this step is the same set of points representing the planar marker (see Step 1). As we know the mutual distances of the circle centres in the marker, we can say for each detection where the 3D points of the other circles should be located – the coloured areas in Figure 6a. The algorithm preserves only the points which belong to such an area of at least one of the detections found in Step 3. If at least one of the detections was correct, this step preserves all of the points belonging to the circular holes of the marker. The algorithm then continues with Step 3.

(a) (b)

Figure 6: Process of the point cloud pruning (a). Each detection (purple circle) yields the coloured areas (green for the neighbouring circles and blue for the circle lying on the diagonal) where the points of the other marker circles may be located. These points are preserved and the process is repeated for each detection. The remaining points are discarded. The detection continues from Step 3 using the resulting pruned point cloud (b).

The progress of the described algorithm can be seen in Figure 7, where the marker was found after two iterations.
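The distance check of Step 4 can be sketched as follows, assuming – as an illustration only, since the exact marker dimensions are not stated here – that the four holes are laid out in a square with a known centre-to-centre spacing:

    import numpy as np
    from itertools import combinations

    def verify_marker(centres, side, tolerance=0.05):
        """Step 4: check the mutual distances of the 4 detected circle centres.

        centres:   (4, 3) candidate circle centres [m].
        side:      expected spacing of neighbouring holes [m] (assumed square
                   layout, so the two diagonals measure side * sqrt(2)).
        tolerance: allowed deviation [m] before the candidate set is rejected.
        """
        pts = np.asarray(centres, dtype=float)
        dists = sorted(np.linalg.norm(a - b) for a, b in combinations(pts, 2))
        expected = [side] * 4 + [side * np.sqrt(2.0)] * 2   # 4 sides + 2 diagonals
        return all(abs(d - e) <= tolerance for d, e in zip(dists, expected))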

Detection in the Camera Image

The detection process of the circles in the RGB camera image starts with edge detection using the Sobel operator, which produces less noisy outputs than the detector previously proposed by Levinson and Thrun [LT13]. The circles are then detected using the Hough transform for circles [YPIK90]. This detection proved to be robust enough and no further validation is necessary.
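As an illustration, circle candidates can be obtained with OpenCV's Hough transform roughly as follows; note that cv2.HoughCircles runs its own internal gradient-based edge detection, so this sketch is not a literal reproduction of the Sobel-based pipeline above, and all parameter values are assumptions:

    import cv2

    def detect_circles(image_bgr):
        """Detect circular marker holes in the RGB frame (illustrative parameters)."""
        gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
        gray = cv2.medianBlur(gray, 5)
        circles = cv2.HoughCircles(gray, cv2.HOUGH_GRADIENT, dp=1, minDist=40,
                                   param1=100, param2=30,
                                   minRadius=10, maxRadius=120)
        # Each row is (x, y, r) in pixels - the 2D side of the correspondences
        # used later in Equations (5)-(7).
        return [] if circles is None else circles[0]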

Translation Computing

In the most general case of the calibration, all six degrees of freedom describing the translation and the rotation must be considered.


(a) (b)

(c) (d)

Figure 7: Progress of the marker detection in the point cloud (a). After the first detection (b), the validation fails and a new point cloud is generated (c). In the second iteration, the detection succeeds (d).

As we assume that the difference in position between the camera and the LiDAR is much more significant than the rotation itself, we reduce the potential geometrical transformation to the translation only.

The calibration process transforming the homogeneous coordinates of a 3D point [X, Y, Z, 1]^T to a point [x/w, y/w]^T in the 2D image plane can be described by the following equation:

    [x, y, w]^T = P · C · [X, Y, Z, 1]^T    (2)

where P is the known camera projection matrix and C is the calibration matrix describing the geometrical transformation between the sensors. Both the projection matrix P and the calibration matrix C, reduced to three translation parameters, are presented in the following equations:

        [ f   0   o_x   0 ]
    P = [ 0   f   o_y   0 ]    (3)
        [ 0   0    1    0 ]

            [ 1   0   0   t_x ]
    C = T = [ 0   1   0   t_y ]    (4)
            [ 0   0   1   t_z ]
            [ 0   0   0    1  ]

where f is the camera focal length and [o_x, o_y] are the coordinates of the principal point in pixels – i.e. the known intrinsic camera parameters. (The third row of P makes w equal to the depth of the point in the camera frame, which is consistent with Equations (5)–(7) below.) The only unknown variables we need to compute are the components of the translation vector [t_x, t_y, t_z]. Remember that the rotation between the sensors is omitted in the process of the coarse calibration and these degrees of freedom are computed during the calibration refinement.
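For reference, the two matrices can be assembled directly from these parameters; the helper names below are ours and serve only as a sketch:

    import numpy as np

    def projection_matrix(f, ox, oy):
        """P from Eq. (3); the third row makes w equal to the depth Z."""
        return np.array([[f, 0.0, ox, 0.0],
                         [0.0, f, oy, 0.0],
                         [0.0, 0.0, 1.0, 0.0]])

    def translation_matrix(tx, ty, tz):
        """C = T from Eq. (4): pure translation, rotation handled by the refinement."""
        T = np.eye(4)
        T[:3, 3] = [tx, ty, tz]
        return T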

The translation vector is estimated from the found correspondences of the circles (centres and radii) gained through the detection of the marker in both the Velodyne scan and the camera image, described in the previous sections.

The first unknown component of the translation which needs to be solved is the depth difference between the sensors, t_z. Assuming the marker is planar and neglecting the influence of the perspective projection (the marker stands approximately straight in front of the robot), it is possible to compute t_z just from the corresponding radii of the circles detected in the image (r_2D) and the circles detected in the point cloud (r_3D). Equation (5) can be derived from Equation (2) considering the mentioned assumptions, where Z is the depth coordinate of the circle centre:

    t_z = (r_3D · f) / r_2D − Z    (5)

The other components of the translation vector can be roughly estimated using Equations (6) and (7), also derived from (2):

    t_x = (x − o_x) · (Z + t_z) / f − X    (6)

    t_y = (y − o_y) · (Z + t_z) / f − Y    (7)

Equations (6) and (7) indicate that a single correspondence can be used to estimate the translation parameters.

In this work, the translation is estimated for each circle correspondence independently, and the obtained translations are then combined using simple averaging.
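A compact sketch of this estimation follows; the correspondence format is our own choice for illustration:

    import numpy as np

    def estimate_translation(correspondences, f, ox, oy):
        """Coarse translation [tx, ty, tz] via Eqs. (5)-(7), averaged over circles.

        correspondences: list of ((x, y, r2d), (X, Y, Z, r3d)) pairs, where
        (x, y, r2d) is a circle detected in the image [px] and (X, Y, Z, r3d)
        the matching circle detected in the LiDAR point cloud [m].
        """
        estimates = []
        for (x, y, r2d), (X, Y, Z, r3d) in correspondences:
            tz = (r3d * f) / r2d - Z                 # Eq. (5)
            tx = (x - ox) * (Z + tz) / f - X         # Eq. (6)
            ty = (y - oy) * (Z + tz) / f - Y         # Eq. (7)
            estimates.append((tx, ty, tz))
        return np.mean(np.array(estimates), axis=0)  # simple averaging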

Fine Calibration

After the approximate coarse calibration is found using the 3D marker, a more precise calibration is estimated by searching for all the 6 degrees of freedom (the translation and the rotation) in a small parameter subspace.

This search is performed using a dense sampling of the neighbourhood of the approximate calibration vector [t_x, t_y, t_z, 0, 0, 0] obtained in the previous process of the coarse calibration, where the rotation was omitted (hence the zero initial rotation parameters in the vector).


During the search for the calibration parameters, the ability to evaluate each set of parameters is essential. This means that we need a cost function f (Equation (8)) which assigns to each set of 6DoF parameters between the sensors a value proportional to the quality of the calibration:

    f : R^6 → R    (8)

Cost Function Based on the Edges

We adopt the error metric used by Levinson and Thrun in their online miscalibration detection system [LT13]. Their error metric is based on the assumption that edges can be robustly found in the camera image and matched with the range discontinuities in the LiDAR point cloud.

Processing of the data during the fine calibration is similar to the processing during the 3D marker detection. First, the edges are found in the Velodyne point cloud. During this process, the value X_i described in Equation (1) is assigned to each point and is treated as the intensity value of that point. Then the point cloud is transformed using the parameters we want to evaluate. Finally, these points are projected onto the image plane using the projection matrix (Equation (3)) and can be compared against the preprocessed camera image in 2D space using a traditional similarity criterion.

The camera image preprocessing consists of edge detection – in the 2D case using a simple Sobel operator creating the edge image E. In order to make the cost function smoother, we apply the Inverse Distance Transform (IDT) [LT13] using the L1 norm to the edge image (Equation (9)). The IDT assigns to each pixel at coordinates [i, j] a value D_{i,j} proportional to the distance and the strength of the edge E_{x,y} at a pixel in its neighbourhood at coordinates [x, y]:

    D_{i,j} = α · E_{i,j} + (1 − α) · max_{x,y} { E_{x,y} · β^{max(|x−i|, |y−j|)} }    (9)

The factor α balances the strength of the pixel's own edge against the impact of the neighbouring edges (α = 0.33 was used in our experiments) and the factor β enlarges the area an edge effectively influences (β = 0.98 was used in our experiments). The output of the IDT applied to image 1a is shown in Figure 8. Thanks to this transform, the closer the calibration parameters are to the ideal ones, the higher the value of the cost function that is reached. Finally, the cross-correlation similarity criterion (Equation (10)) is applied to the projected point cloud I_V and the processed camera frame I_C in order to get the value of the cost function:

    S_E = Σ_x Σ_y I_C(x, y) · I_V(x, y)    (10)

Figure 8: Application of the Sobel operator and the IDT to the image in Figure 1a.
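The preprocessing and the cost evaluation can be sketched as follows; the iterative max-filter reproduces the decayed maximum of Eq. (9) up to the chosen radius, and the parameter choices as well as the helper names are our assumptions:

    import cv2
    import numpy as np
    from scipy.ndimage import maximum_filter

    def edge_image(gray):
        """Sobel edge magnitude of the camera image, normalized to [0, 1]."""
        gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0)
        gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1)
        mag = cv2.magnitude(gx, gy)
        return mag / mag.max() if mag.max() > 0 else mag

    def inverse_distance_transform(E, alpha=0.33, beta=0.98, radius=100):
        """Eq. (9): D = alpha*E + (1-alpha)*max_{x,y} E[x,y] * beta^dist.

        Each 3x3 max-filter pass propagates edge strength one more pixel of
        Chebyshev distance while decaying it by beta, so 'radius' passes
        reproduce the max term of Eq. (9) up to that distance.
        """
        prop = E.astype(float).copy()
        for _ in range(radius):
            prop = np.maximum(prop, beta * maximum_filter(prop, size=3))
        return alpha * E + (1.0 - alpha) * prop

    def edge_cost(idt_image, projected_edges):
        """Eq. (10): correlate the IDT image I_C with the image I_V of projected
        LiDAR edge points (same size, edge strengths X_i used as intensities)."""
        return float(np.sum(idt_image * projected_edges))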

Optimization

Previous works [LT13, TN13] showed that optimization criteria such as those based on edge detection are highly non-convex. As a consequence, traditional optimization techniques based on gradient descent/ascent or Newton's method would fail here.

In our solution, the refinement of the previously established coarse calibration is performed by regular sampling of a small subspace of the calibration parameters around the coarse calibration point. This idea follows the regular grid search proposed by Levinson and Thrun [LT13] for their miscalibration detection.

Each point of the calibration parameter space can be evaluated by the criterion (Equation (10)), and this value reflects the fitness of the calibration. The final calibration parameters are then chosen either as the point of the parameter space with the maximal cost function value, or as the point obtained by averaging multiple points with a fitness value above some threshold.
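A simplified grid-search sketch showing both variants of picking the result follows; the step sizes and grid extent are illustrative assumptions:

    import itertools
    import numpy as np

    def refine_calibration(coarse, evaluate, step=0.02, angle_step=0.01, n=2):
        """Dense sampling of a small 6DoF neighbourhood around the coarse
        calibration [tx, ty, tz, 0, 0, 0].

        evaluate:          callable mapping a 6-vector of calibration parameters
                           to the cost S_E (Eq. 10); higher means better alignment.
        step, angle_step:  grid resolution in metres and radians (assumed values).
        n:                 number of grid steps on each side of the coarse estimate.
        """
        coarse = np.asarray(coarse, dtype=float)
        offsets = [i * step for i in range(-n, n + 1)]
        angles = [i * angle_step for i in range(-n, n + 1)]
        scored = []
        for d in itertools.product(offsets, offsets, offsets, angles, angles, angles):
            candidate = coarse + np.asarray(d)
            scored.append((evaluate(candidate), candidate))
        best = max(scored, key=lambda sc: sc[0])[1]              # maximal cost variant
        initial_score = evaluate(coarse)
        better = [c for s, c in scored if s > initial_score]
        averaged = np.mean(better, axis=0) if better else best   # averaging variant
        return best, averaged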

4 EXPERIMENTAL RESULTS

In our experiments, real data from the sensors mounted on the robotic platform (see Figure 3a) were used. The RGB images were captured using the Kinect sensor (43° vertical and 57° horizontal field of view), while the 3D point cloud was scanned using the Velodyne HDL-32E LiDAR producing 32 horizontal scans that cover 360° horizontally and 40° vertically. The depth information provided by the Kinect mounted on our robot was used neither in the calibration process nor in the experiments.

Miscalibration Error

A novel criterion to experimentally evaluate the quality of the calibration is proposed in this paper. The 3D points obtained by the Velodyne LiDAR are projected onto the camera image, where each point belonging to a particular object should fall into the image segment occupied by that object. The projection is performed according to Equation (2), using the projection matrix P (the known intrinsic camera parameters) and the calibration matrix C we want to evaluate.


This type of criterion requires a prepared scene, so that the foreground can be easily distinguished from the background in both the Velodyne point cloud and the camera image using image segmentation. The more 3D points are projected onto a wrong image segment, the higher the error obtained.

In this work, a dark 3D marker (Figure 4c) positioned against a white wall was used to evaluate the precision. The segmentation of the camera image was performed using adaptive intensity thresholding. The foreground and the background segments in the Velodyne scan were obtained using Otsu thresholding [Ots79] based on the range information of the points.

The value of the projection error E_P is computed as the number of incorrectly projected points E (i.e. foreground points projected onto the background segment and vice versa) divided by the total number of points P in the point cloud:

    E_P = E / P    (11)
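A direct sketch of this error computation, with the segmentation masks supplied by the adaptive and Otsu thresholding described above, might read:

    import numpy as np

    def miscalibration_error(points, fg_mask_3d, P, C, camera_fg_mask):
        """Projection error E_P (Eq. 11).

        points:          (N, 3) LiDAR points.
        fg_mask_3d:      (N,) bool, True for points segmented as marker foreground.
        P, C:            3x4 projection and 4x4 calibration matrices (Eq. 2).
        camera_fg_mask:  (H, W) bool image, True for foreground pixels.
        """
        pts = np.asarray(points, dtype=float)
        h, w = camera_fg_mask.shape
        homog = np.hstack([pts, np.ones((len(pts), 1))])      # (N, 4)
        proj = (P @ C @ homog.T).T                            # (N, 3)
        z = proj[:, 2]
        valid = z > 1e-6                                      # points in front of the camera
        u = np.round(proj[valid, 0] / z[valid]).astype(int)
        v = np.round(proj[valid, 1] / z[valid]).astype(int)
        inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
        wrong = fg_mask_3d[valid][inside] != camera_fg_mask[v[inside], u[inside]]
        return wrong.sum() / float(len(pts))                  # E / P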

Coarse Calibration Results

To evaluate the robustness and the accuracy of the proposed calibration scheme, we collected data of the 3D marker in various rotated poses in front of a wall while adding additional clutter (chairs, shelves, ...) in the background.

Table 1 shows the ratio of correctly detected markers, rotated around the X, Y and Z axes, to the total number of detections performed. A marker is correctly detected if the verification phase of the detection algorithm succeeds. For each marker pose, 5 pairs of Velodyne scans and RGB images were retrieved. The marker detection algorithm proved to be robust enough to handle these rotations and provides reliable detections for the subsequent coarse calibration computation.

                      Without verification    Our iterative algorithm
    X axis rotation          0.84                     0.96
    Y axis rotation          0.72                     1.00
    Z axis rotation          0.76                     0.93

Table 1: Marker detection rate (recall) using only RANSAC without verification versus our iterative algorithm with the verification and point cloud pruning. The marker was rotated around all 3 axes in the range 0–20° for this evaluation.

Table 1 shows that the verification and pruning steps of our algorithm bring a significant improvement compared to the basic approach using only the detection of the 4 best candidates for the marker's circular holes.

Ground truth for the captured data was manually annotated using our tool, which enables a user to change the translation between the sensors. The tool displays the 3D points projected onto the camera image as an overlay in a different colour channel, so the operator can visually verify the calibration.

Evaluation of the coarse calibration was performed using the miscalibration error E_P, which was computed both for the manual calibration and for the coarse calibration estimated using the rotated marker.

[Plot: miscalibration error [%] as a function of the angle of marker rotation [deg]; curves for the manual calibration and for the X, Y and Z axis rotations.]

Figure 9: Miscalibration error (ratio of incorrectly projected points) of the coarse calibration compared with the manual one, as a function of the rotation of the marker around the X, Y and Z axes.

Figure 9 presents the miscalibration errors E_P of the parameters obtained through the coarse calibration process, compared to the error of the manually annotated parameters. The results show that the camera-LiDAR pose estimation using the 3D marker yields a miscalibration error similar to that of the manual calibration.

In comparison to the previously used calibration techniques based on chessboard markers [KP10, PMSE10, GMCS12], our approach with the 3D marker brings a practical improvement – only a single frame from the camera and the LiDAR, with a single marker appearance, is needed for the sensor calibration. In contrast, the chessboard marker techniques require observations of the marker from various view points, or a single observation of multiple marker instances placed in the scene.

Fine Calibration Results

After the initial coarse estimation of the calibration parameters, the refinement process follows in order to increase the calibration accuracy. The goal of this experiment is to evaluate the impact of the refinement process on the precision of the calibration.

Figure 10 shows the different solutions evaluated by the edge cost function S_E (Equation (10)) proposed by Levinson and Thrun [LT13], and also by our novel miscalibration error E_P (Equation (11)), which may be considered an objective factor for evaluating each approach.

The solutions compared in Figure 10 use different cost functions for evaluating the calibration parameters during the search.


[Bar charts: (a) edge cost function S_E · 10^6 and (b) miscalibration error E_P [%], each shown for the Manual, [LT13], Averaging and Error-as-cost calibrations.]

Figure 10: Refinement of the calibration results for the different evaluation criteria, compared with the manual calibration. [LT13] maximizes the edge cost function S_E; the Averaging method computes the fine calibration as an average of all calibration vectors with a higher S_E than the initial one; the miscalibration error E_P is minimized directly in the search process of the last approach.

Figure 10 also shows that the edge cost function S_E proposed by Levinson and Thrun [LT13] and used for the refinement process does not correlate with the miscalibration error E_P, although we would expect that the higher the cost of a solution, the lower the error reached.

According to Figure 10, the best results were obtained using our projection error E_P as the cost function for the calibration parameters. The drawback of using this error as a cost function is the requirement of a scene which can be segmented into the foreground and the background in both the LiDAR scan and the camera image.

The modification of the search process that averages all the found calibration vectors better than the initial coarse one (see the Averaging method in Figure 10) seems to be a reasonable compromise between the calibration accuracy and the absence of any requirement for scene segmentation.

Figure 11 shows that the calibration with the lower projection error (see Figure 10) also yields better visual results in the task of point cloud colouring. This also speaks in favour of the miscalibration error, rather than the edge cost function, as the more objective measure for the calibration evaluation.

5 CONCLUSION

This paper presents a pipeline for the calibration of an RGB camera with the Velodyne LiDAR. The first step of the calibration process estimates a coarse calibration using our novel 3D marker, which allows estimating the calibration from a single pair of image and point cloud data, even when the displacement between the sensors is large (tens of centimetres).

(a) (b)

(c) (d)

Figure 11: Coloured point cloud of our 3D marker when using the coarse calibration (a), the result of the refinement with the edge cost function (b) (from [LT13]), our modification that averages all the parameters better than the initial ones (c), and the refinement computed using our projection error criterion as the cost function (d). The red ellipses highlight the largest errors caused by inaccuracies of the calibration.

The subsequent step refines the coarse calibration using a dense search in a small 6DoF calibration parameter subspace. The proposed modification of the calibration refinement process decreases the projection error by 5%.

The paper also introduces a novel miscalibration error metric for the calibration evaluation. This criterion comes from the basic calibration requirement that the points of a particular object should be projected onto the image segment occupied by this object. The objectivity of this evaluation function was also supported by its correlation with the visual quality of the point cloud colouring.

6 ACKNOWLEDGMENTS

The research leading to these results was funded by the TACR project V3C (no. TE01020415) and the IT4Innovations Centre of Excellence, grant no. CZ.1.05/1.1.00/02.0070, supported by the Operational Programme Research and Development for Innovations funded by the Structural Funds of the European Union and the state budget of the Czech Republic.

7 REFERENCES

[BCES11] Lionel Baboud, Martin Cadik, Elmar Eisemann, and Hans-Peter Seidel. Automatic photo-to-terrain alignment for the annotation of mountain pictures. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2011.

[BCK13] Yunsu Bok, Dong-Geol Choi, and In So Kweon. Generalized laser three-point algorithm for motion estimation of camera-laser fusion system. In Robotics and Automation (ICRA), 2013 IEEE International Conference on, pages 2880–2887, May 2013.

[BJCK11] Yunsu Bok, Yekeun Jeong, Dong-Geol Choi, and In So Kweon. Capturing village-level heritages with a hand-held camera-laser fusion sensor. International Journal of Computer Vision, 94(1):36–53, 2011.

[DBR09] B. Douillard, A. Brooks, and F. Ramos. A 3D laser and vision based classifier. In Intelligent Sensors, Sensor Networks and Information Processing (ISSNIP), 2009 5th International Conference on, pages 295–300, Dec 2009.

[FB81] Martin A. Fischler and Robert C. Bolles. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM, 24(6):381–395, June 1981.

[GMCS12] A. Geiger, F. Moosmann, O. Car, and B. Schuster. Automatic camera and range sensor calibration using a single shot. In Robotics and Automation (ICRA), 2012 IEEE International Conference on, pages 3936–3943, May 2012.

[HLT13] D. Held, J. Levinson, and S. Thrun. Precision tracking with sparse 3D and dense color 2D data. In Robotics and Automation (ICRA), 2013 IEEE International Conference on, pages 1138–1145, May 2013.

[KE95] J. Kennedy and R. Eberhart. Particle swarm optimization. In Neural Networks, 1995. Proceedings., IEEE International Conference on, volume 4, pages 1942–1948, Nov 1995.

[KP10] A. Kassir and T. Peynot. Reliable automatic camera-laser calibration. In Proceedings of the 2010 Australasian Conference on Robotics & Automation, ACRA 2010, Dec 2010.

[LN09] Quoc V. Le and Andrew Y. Ng. Joint calibration of multiple sensors. In Proceedings of the 2009 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS'09, pages 3651–3658, Piscataway, NJ, USA, 2009. IEEE Press.

[LT13] Jesse Levinson and Sebastian Thrun. Automatic online calibration of cameras and lasers. In Robotics: Science and Systems, 2013.

[NMV10] J. I. Nieto, S. T. Monteiro, and D. Viejo. 3D geological modelling using laser and hyperspectral data. In Geoscience and Remote Sensing Symposium (IGARSS), 2010 IEEE International, pages 4568–4571, July 2010.

[Ots79] Nobuyuki Otsu. A threshold selection method from gray-level histograms. IEEE Transactions on Systems, Man and Cybernetics, 9(1):62–66, 1979.

[PMSE10] G. Pandey, J. McBride, S. Savarese, and R. Eustice. Extrinsic calibration of a 3D laser scanner and an omnidirectional camera. In 7th IFAC Symposium on Intelligent Autonomous Vehicles, pages 336–341, 2010.

[PMSE12] Gaurav Pandey, James R. McBride, Silvio Savarese, and Ryan M. Eustice. Automatic targetless extrinsic calibration of a 3D lidar and camera by maximizing mutual information. In Proceedings of the AAAI National Conference on Artificial Intelligence, pages 2053–2059, Toronto, July 2012.

[PPRJ13] P. Núñez, P. Drews, R. Rocha, and J. Dias. Data fusion calibration for a 3D laser range finder and a camera using inertial data. In Proc. of the 4th European Conference on Mobile Robots, pages 31–36, 2013.

[ST94] J. Shi and C. Tomasi. Good features to track. In Computer Vision and Pattern Recognition, 1994. Proceedings CVPR '94., 1994 IEEE Computer Society Conference on, pages 593–600, Jun 1994.

[TN13] Zachary Taylor and Juan Nieto. Automatic calibration of lidar and camera images using normalized mutual information. In Robotics and Automation (ICRA), 2013 IEEE International Conference on, May 2013.

[YPIK90] H. K. Yuen, J. Princen, J. Illingworth, and J. Kittler. Comparative study of Hough transform methods for circle finding. Image and Vision Computing, 8(1):71–77, February 1990.

[ZP04] Q. Zhang and R. Pless. Extrinsic calibration of a camera and laser range finder (improves camera calibration). In Intelligent Robots and Systems (IROS 2004), Proceedings, 2004 IEEE/RSJ International Conference on, volume 3, pages 2301–2306, Sept 2004.