
3D Object Localization Using 2D Estimates for Computer Vision Applications

Taha Hasan Masood Siddique∗ and Muhammad Usman†

∗Faculty of Engineering Sciences and Technology (FEST), Iqra University, Defence View, Karachi-75500, Pakistan. Email: [email protected]

†Department of Computer Engineering, Chosun University, Gwangju-61452, Republic of Korea. Email: [email protected]

Abstract—A technique for object localization based on pose estimation and camera calibration is presented. The 3-dimensional (3D) coordinates are estimated by collecting multiple 2-dimensional (2D) images of the object and are utilized for the calibration of the camera. The calibration steps, involving the calculation of a number of parameters including the intrinsic and extrinsic parameters for the removal of lens distortion, the computation of the object's size and the calculation of the camera's position, are discussed. A transformation strategy to estimate the 3D pose using the 2D images is presented. The proposed method is implemented in MATLAB and validation experiments are carried out for both pose estimation and camera calibration.

Index Terms—Camera calibration, pose estimation, camera localization, checkerboard detection

I. INTRODUCTION

With the advancement of technology, computer vision plays a vital role in many fields such as artificial intelligence, machine learning and image processing. These applications require the spatial information of objects for the determination of their physical axes and coordinates. Certain applications, including interactive gaming devices and 3D printing, model real-world objects and measure distances to provide virtual reality control [1], [2]. For a highly precise and accurate experience, expensive 3D cameras with complicated configurations are mostly used [3]. One off-the-shelf alternative is 2D imaging, in which 2D images are captured from different angles and transformed into 3D [4]. This method is considered a low-cost way to accomplish the same task; however, its performance is greatly dependent on the configuration and calibration of the camera [5].

Camera calibration is indispensable in computer vision applications, as most computer vision systems are highly affected by the precision of the calibration. Calibration involves principal developments during 3D reconstruction, including the restoration of the camera's geometry [6] and the extraction and analysis of 3D information and measurements with respect to the 3D world [7]. This technique is also employed to estimate the 3D location and rotation of the camera through the extrinsic and intrinsic parameters, which provide information about the 3D world coordinates and represent the optical characteristics of the camera, respectively [8]. Determining the parameters of the function that maps the position of a point in 3D coordinates to the location of a point on the image plane is one of the primary objectives of camera calibration [9], [10].

The design objective of any 3D computer vision or machine vision system is to extract image information from the camera, estimate position and shape information, and recognize 3D objects in the 3D world coordinates [11]. The geometric parameters, also known as camera parameters, empirically determine the relationship between the camera position and the 3D world coordinates. Checkerboards are frequently used in camera calibration, allowing the camera parameters to extract more precise information from images [12]. In [13], a flexible technique is used that requires the camera to observe the object's pattern from only a few angles. Furthermore, an improved method for calibrating omni-directional imaging systems is presented, which helps to decrease the number of calibration steps. Current camera calibration techniques are classified into two categories: (i) target-based calibration methods, which generally relate known target information to the camera parameters and determine the 2D-to-3D relationship [14], and (ii) self-calibration methods, which do not require any targets and instead use mathematical modeling to determine the camera parameters [15].

Mapping the camera position from a 2D image to the 3D world coordinates is a common issue in image processing. When the 3D coordinate information is acquired by computer vision, the distortion coefficients and camera parameters, e.g., the focal length, need to be calculated beforehand. However, in [16], the camera calibration is performed by developing the 3D points of the scanned object using the structure-from-motion technique, without using any specific camera model. Similarly, checkerboard corners are intelligently used to estimate the pose of a camera in [17]. An approach for pose estimation with known camera parameters is presented in [18], which estimates the position of a multi-camera system using fixed orientations and translations of the cameras. Another method for multi-camera calibration is presented in [19], which resolves checkerboard corner related problems using binary patterns across time and different cameras.

In this paper, a single-camera calibration method is used in which the camera projection matrix measures the 3D coordinates of viewed points. The projection matrix is composed of the extrinsic and intrinsic parameters and converts the 3D object coordinates into 2D image coordinates. The software implementation of the proposed approach is carried out using MATLAB. The rest of the paper is organized as follows: Section II presents the proposed approaches for calibrating the camera and estimating the pose, experimental results are presented in Section III, and the paper is concluded in the final section.

II. METHODOLOGY

In this section, we explain our calibration method in detail. The method consists of three main parts: the camera model and its parameters are discussed in Subsection II-A, the calibration method in Subsection II-B, and camera calibration and pose estimation in Subsection II-C.

A. Camera Model

It is necessary to choose a suitable camera model and examine its internal and external parameters. Internal parameters provide the geometry and optical characteristics of a camera, including the focal length, image center and lens distortion, whereas the external parameters provide the 3D orientation and position of a camera relative to the world coordinates. These parameters are commonly used with the pinhole camera, which has no lens and contains a single small aperture. When light rays enter the aperture, an inverted image of the scene is formed on the opposite side of the camera. This phenomenon is shown in Figure 1.

Fig. 1: Pin Hole Camera Model

The camera parameters are collected in a 4 × 3 matrix termed the camera matrix. The camera matrix transforms the 3D coordinates into 2D image points by using the extrinsic and intrinsic parameters. The extrinsic parameters contain the location of the camera, whereas the intrinsic parameters hold the focal length of the camera in 3D world units, as shown in the equations below.

w [a b 1] = [A B C 1] P    (1)

where w is a scaling factor, [a b 1] is the 2D image point and [A B C 1] is the 3D world point, both in homogeneous coordinates.

P = [R; t] K    (2)

where P represents the camera matrix, R and t are the rotation and translation elements of the extrinsic matrix respectively, while K represents the intrinsic matrix.

The extrinsic parameters transform the 3D world coordinates into camera coordinates, and the intrinsic parameters then project the camera coordinates onto the image plane, as shown in Figure 2; a small numeric example follows the figure.

Fig. 2: Mapping of Camera Coordinates
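To make Eqs. (1) and (2) concrete, the following minimal MATLAB sketch projects a single 3D world point onto the image plane using the row-vector convention implied above. The values of R, t and K here are illustrative assumptions, not the calibrated parameters reported later.

% Minimal sketch of the projection in Eqs. (1)-(2); R, t and K are
% assumed example values for illustration only.
R = eye(3);                         % extrinsic rotation (3x3)
t = [0 0 500];                      % extrinsic translation (1x3), world units
K = [800   0   0;                   % intrinsic matrix: focal lengths on the
       0 800   0;                   % diagonal, principal point in the last
     320 240   1];                  % row (row-vector convention)
P = [R; t] * K;                     % 4x3 camera matrix, Eq. (2)
X = [100 50 0 1];                   % homogeneous 3D world point [A B C 1]
x = X * P;                          % Eq. (1): w*[a b 1]
ab = x(1:2) / x(3);                 % divide by the scaling factor w -> [a b]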

B. Calibration Method

We present the calibration technique used to calibrate an off-the-shelf Logitech HD C270 USB webcam. The camera calibration is carried out on an HP Core i5 computer with the MATLAB USB webcam support package [20]. The Camera Calibrator application is used to determine the camera's intrinsic, extrinsic and lens distortion parameters. These parameters can be used for various computer vision applications, including removing lens distortion from an image, measuring an object's size, or reconstructing 3D scenes from multiple positions. Once the camera is connected to the computer, a set of checkerboard test images must be added. The calibration algorithm detects the corners of the checkerboard to ensure that the test images meet the calibrator requirements. When the test images satisfy these requirements, the calibration accuracy can be evaluated and improved by analyzing the re-projection errors and the camera's extrinsic parameters, and by viewing the undistorted image. After successful calibration, the exported camera parameters can be used for many computer vision tasks. Several methods based on Scale Invariant Feature Transform (SIFT) [21] and Speeded Up Robust Features (SURF) [22] have been proposed that use checkerboard corner detection to extract the square corners at the checkerboard crossings. The camera calibration workflow is shown in Figure 3, and a scripted sketch of the same steps follows the figure.

Fig. 3: Camera Calibration Workflow
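The same workflow can also be scripted with Computer Vision Toolbox functions instead of the interactive application. The sketch below follows the workflow of Figure 3; the file names calib01.jpg ... calib20.jpg are hypothetical placeholders for the captured checkerboard images.

% Scripted calibration sketch, assuming 20 checkerboard images with
% hypothetical file names calib01.jpg ... calib20.jpg.
files = arrayfun(@(k) sprintf('calib%02d.jpg', k), 1:20, ...
                 'UniformOutput', false);
[imagePoints, boardSize] = detectCheckerboardPoints(files);  % corner detection
squareSize = 23;                                             % square size in mm
worldPoints = generateCheckerboardPoints(boardSize, squareSize);
cameraParams = estimateCameraParameters(imagePoints, worldPoints);
I = imread(files{1});
J = undistortImage(I, cameraParams);          % remove lens distortion
showReprojectionErrors(cameraParams);         % evaluate calibration accuracy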


C. Camera Calibration and Pose Estimation

An image of a 10 × 7 checkerboard is used as a reference to obtain the features of interest of the target image. The checkerboard pattern must contain the same number of black squares along one side and the same number of white squares on the opposite side; this convention determines the orientation of the checkerboard pattern during calibration. A checkerboard square size of 23 mm is used for calibration, a template of which is shown in Figure 4.

Fig. 4: Calibration Template

Initially, the input images are processed through the MATLAB Camera Calibrator application. For better results and evaluation of the corners, 20 images of the calibration pattern are used, as shown in Figure 5. The default configuration calculates both sets of camera parameters, i.e., the intrinsic and extrinsic parameters, depending on the type of camera, and then computes the projection error. Once the calibration process is completed, the camera pose can be estimated with respect to the 3D object location, as depicted in the calibration process flow in Figure 6; a brief code sketch of this pose step follows Figure 5. The detected corner points of the checkerboard and the projection error of the detected checkerboard squares are shown in Figure 7 and Figure 8 respectively.

Fig. 5: Input Images in Calibration Process
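The pose step can be sketched as follows, assuming the cameraParams object produced by the calibration above and a new checkerboard view in a hypothetical file newView.jpg.

% Estimate the camera pose relative to the 23 mm checkerboard, assuming
% cameraParams from the calibration step; newView.jpg is hypothetical.
I = undistortImage(imread('newView.jpg'), cameraParams);
[imagePoints, boardSize] = detectCheckerboardPoints(I);
worldPoints = generateCheckerboardPoints(boardSize, 23);
% World-to-camera extrinsics, with the checkerboard plane at z = 0
[R, t] = extrinsics(imagePoints, worldPoints, cameraParams);
% Equivalent camera orientation and location in world coordinates
[orientation, location] = extrinsicsToCameraPose(R, t);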

III. EXPERIMENTAL RESULTS

A. Simulated Experiment

From the simulation results, the camera parameters, including the focal length, principal point, radial distortion, mean projection error and intrinsic parameter matrix, are calculated. The focal length, principal point and radial distortion are each stored in a 2 × 1 vector, the intrinsic parameters are stored in a 3 × 3 matrix, and the mean projection error is computed. These parameters are listed in Table I. The calibration accuracy is examined through an undistorted image, the camera extrinsics and the re-projection errors. Figure 9a shows the input image taken from the USB webcam. After removing the lens distortion, the undistorted image is obtained, as shown in Figure 9b.

Fig. 6: Calibration Process Flow

Fig. 7: Detected Corner Points of Checkerboard

Fig. 8: Projection Error of Detecting Square Box


The re-projection errors are the distances, in pixels, between the detected corner points of the checkerboard and the re-projected points. Generally, a mean re-projection error of less than one pixel is acceptable [23]. The mean re-projection error per image, in pixels, and the overall mean error of the selected images are shown in Figure 10; a minimal check of this criterion is sketched below.
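The acceptance criterion can be checked directly on the calibration output; a minimal sketch, assuming the cameraParams object produced by the calibration step:

% Mean re-projection error in pixels, averaged over all images
meanError = cameraParams.MeanReprojectionError;
showReprojectionErrors(cameraParams);   % per-image bar chart (cf. Figure 10)
assert(meanError < 1, 'Re-calibrate: mean re-projection error >= 1 pixel');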

TABLE I: Camera Parameters

Image size (pixels)          [480 640]
Focal length (pixels)        [839.3458 ± 3.6694, 839.5573 ± 3.7166]
Principal point (pixels)     [332.3661 ± 1.4489, 259.5099 ± 1.5829]
Radial distortion            [0.0101 ± 0.0167, -0.1883 ± 0.1895]
Mean projection error        0.363945706962709
Intrinsic parameter matrix   [839.345758    0.000000  0.000000;
                                0.000000  839.557331  0.000000;
                              332.366095  259.509924  1.000000]

(a) Input Image (b) Undistorted Image

Fig. 9: Removal of Distortion

Fig. 10: Re-projection Errors

The 3D extrinsic parameter plot provides pattern view and camera view visualizations. The pattern view visualization is useful when the pattern is held at a fixed position, while the camera view visualization is useful when the images are captured with the camera placed at a fixed position. The pattern view visualization is shown in Figure 11 and the camera view visualization in Figure 12; both can be generated directly from the calibration output, as sketched after Figure 12.

Fig. 11: Pattern View Visualization

Fig. 12: Camera View Visualization
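A brief sketch of both visualizations, assuming the same cameraParams object as before:

% Pattern-centric view: pattern held fixed, camera poses shown (cf. Figure 11)
figure; showExtrinsics(cameraParams, 'PatternCentric');
% Camera-centric view: camera held fixed, pattern poses shown (cf. Figure 12)
figure; showExtrinsics(cameraParams, 'CameraCentric');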

B. 3D Reconstruction using Structure from Motion (SFM)

Structure from Motion (SFM) is a method for estimating the 3D scene of an object from multiple 2D images captured by a camera [24]. It is used in many applications such as 3D scanning [25], augmented reality [26], robot mapping [27], and autonomous driving [28]. Using SFM, the pose of the calibrated camera is estimated from a set of 2D images, and this information is applied to reconstruct the 3D point cloud of an unknown object. The algorithm consists of two parts: (i) camera pose estimation and (ii) 3D dense reconstruction. In the first part, the algorithm uses the SURF technique to detect interest points and features in the set of 2D images and compares pairwise matches to estimate the camera pose of the current view relative to the previous view. In the second part, SFM requires point correspondences across multiple images, called track points. These track points serve as input to triangulation across the multiple images to compute 3D points, which are refined by bundle adjustment and used for the 3D reconstruction. The experiment is performed on a cubic box to generate the 3D point cloud of the known object. Figure 13 shows the process flow of structure from motion, the experimental setup is depicted in Figure 14a, and the 3D point cloud of the cubic box is illustrated in Figure 14b; a condensed two-view code sketch follows Figure 14.

Fig. 13: Process Flow of Structure from Motion

(a) Experimental Setup (b) 3D Point Cloud

Fig. 14: Image to 3D Point Cloud Generation

CONCLUSION

Object localization and camera calibration are of utmost importance and have several commercial and industrial applications. Techniques for object localization based on pose estimation and camera calibration have been presented. The 3-dimensional (3D) coordinates are estimated by collecting multiple 2-dimensional (2D) images of the object and are utilized for the calibration of the camera. An automatic camera calibration technique based on the features of the checkerboard has been presented. The location of the camera in the 3D coordinate system is estimated by utilizing the information extracted from 2D images of the checkerboard taken from several angles. The camera parameters and localization results essential for 3D reconstruction have been presented. The proposed technique achieves a mean re-projection error of less than one pixel, implying successful calibration of the Logitech HD C270 camera.

REFERENCES

[1] V. Wiley and T. Lucas, "Computer vision and image processing: a paper review," International Journal of Artificial Intelligence Research, vol. 2, no. 1, pp. 29–36, 2018.

[2] T. H. M. Siddique, I. Sami, M. Z. Nisar, M. Naeem, A. Karim, and M. Usman, "Low cost 3d printing for rapid prototyping and its application," in 2019 Second International Conference on Latest trends in Electrical Engineering and Computing Technologies (INTELLECT), 2019, pp. 1–5.

[3] S. Giancola, M. Valenti, and R. Sala, A survey on 3D cameras: Metrological comparison of time-of-flight, structured-light and active stereoscopy technologies. Springer, 2018.

[4] S. Byun, K. Jung, S. Im, and M. Chang, "Registration of 3d scan data using image reprojection," International Journal of Precision Engineering and Manufacturing, vol. 18, no. 9, pp. 1221–1229, 2017.

[5] R. Usamentiaga, D. Garcia, C. Ibarra-Castanedo, and X. Maldague, "Highly accurate geometric calibration for infrared cameras using inexpensive calibration targets," Measurement, vol. 112, pp. 105–116, 2017.

[6] Q. Sun, X. Wang, J. Xu, L. Wang, H. Zhang, J. Yu, T. Su, and X. Zhang, "Camera self-calibration with lens distortion," Optik, vol. 127, no. 10, pp. 4506–4513, 2016.

[7] L. Meng, J. Chen, F. Tung, J. J. Little, J. Valentin, and C. W. de Silva, "Backtracking regression forests for accurate camera relocalization," in 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2017, pp. 6886–6893.

[8] Z. Liu, Q. Wu, S. Wu, and X. Pan, "Flexible and accurate camera calibration using grid spherical images," Optics Express, vol. 25, no. 13, pp. 15269–15285, 2017.

[9] C. Tomasi, "Camera calibration," 2017.

[10] A. Fetic, D. Juric, and D. Osmankovic, "The procedure of a camera calibration using camera calibration toolbox for matlab," in 2012 Proceedings of the 35th International Convention MIPRO. IEEE, 2012, pp. 1752–1757.

[11] S. Huang, Y. Chen, T. Yuan, S. Qi, Y. Zhu, and S.-C. Zhu, "Perspectivenet: 3d object detection from a single rgb image via perspective points," in Advances in Neural Information Processing Systems, 2019, pp. 8905–8917.

[12] A. Duda and U. Frese, "Accurate detection and localization of checkerboard corners for calibration," in BMVC, 2018, p. 126.

[13] S. Urban, J. Leitloff, and S. Hinz, "Improved wide-angle, fisheye and omnidirectional camera calibration," ISPRS Journal of Photogrammetry and Remote Sensing, vol. 108, pp. 72–79, 2015.

[14] Z. Zhang, R. Zhao, E. Liu, K. Yan, and Y. Ma, "A single-image linear calibration method for camera," Measurement, vol. 130, pp. 298–305, 2018.

[15] H. Chang and F. Tsai, "Vanishing point extraction and refinement for robust camera calibration," Sensors, vol. 18, no. 1, p. 63, 2018.

[16] Y. Rehman, H. M. A. Uddin, T. H. M. Siddique, S. R. U. N. Jafri, A. Ahmed et al., "Comparison of camera and laser scanner based 3d point cloud," in 2019 4th International Conference on Emerging Trends in Engineering, Sciences and Technology (ICEEST). IEEE, 2019, pp. 1–5.

[17] F. Shi, X. Zhang, and Y. Liu, "A new method of camera pose estimation using 2d–3d corner correspondence," Pattern Recognition Letters, vol. 25, no. 10, pp. 1155–1163, 2004.

[18] J.-M. Frahm, K. Koser, and R. Koch, "Pose estimation for multi-camera systems," in Joint Pattern Recognition Symposium. Springer, 2004, pp. 286–293.

[19] B. Li, L. Heng, K. Koser, and M. Pollefeys, "A multiple-camera system calibration toolbox using a feature descriptor-based calibration pattern," in 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, 2013, pp. 1301–1307.

[20] https://www.mathworks.com/hardware-support/matlab-webcam.html.

[21] D. G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, vol. 60, no. 2, pp. 91–110, 2004.

[22] H. Bay, T. Tuytelaars, and L. Van Gool, "Surf: Speeded up robust features," in European Conference on Computer Vision. Springer, 2006, pp. 404–417.

[23] https://www.mathworks.com/help/vision/ug/single-camera-calibrator-app.html.

[24] J. L. Schonberger and J.-M. Frahm, "Structure-from-motion revisited," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 4104–4113.

[25] M. Daneshmand, A. Helmi, E. Avots, F. Noroozi, F. Alisinanoglu, H. S. Arslan, J. Gorbova, R. E. Haamer, C. Ozcinar, and G. Anbarjafari, "3d scanning: A comprehensive survey," arXiv preprint arXiv:1801.08863, 2018.

[26] H. Bae, M. Walker, J. White, Y. Pan, Y. Sun, and M. Golparvar-Fard, "Fast and scalable structure-from-motion based localization for high-precision mobile augmented reality systems," mUX: The Journal of Mobile User Experience, vol. 5, no. 1, p. 4, 2016.

[27] M. R. U. Saputra, A. Markham, and N. Trigoni, "Visual slam and structure from motion in dynamic environments: A survey," ACM Computing Surveys (CSUR), vol. 51, no. 2, pp. 1–36, 2018.

[28] R. Fan, J. Jiao, H. Ye, Y. Yu, I. Pitas, and M. Liu, "Key ingredients of self-driving cars," arXiv preprint arXiv:1906.02939, 2019.