( e ( e - do, engineering - unknown

7/29/2019 ( E ( E - Do, Engineering - Unknown

1/4

Application of Neural Networks for Stereo-Camera CalibrationYongtae DoSchool ofComputer and Com munication Engineering, Taegu University,Kyungsan-City, Kyungpook,712-714, Korea

ytdo@,biho h e m . c.kr

AbstractTheposition of a world point can be measured by the useof calibrated stereo cameras. Although simple linearmethodrrfor the calibration assuming an ideal projectionmodel are available, their solutions are usually notaccurate since most of f-th e-shelf enses used in machinevision applications sustain considerable amount ofnonlinear distortio n. Recent research efforts on theproblem have been thus concentrated on the modeling oflens distortion and its correction techniques. However; thetypes of lens distortion are various and the equationsderived are more complicated if more precise model isemployed fo r higher accuracy . In this paper; methodrr fo rcalibrating stereo vision systems with neural networks aredescribed . Diflerent approaches are tested under variousconditions and their results are compared.

1.IntroductionThe calibration process is an important prerequisite formost computational vision tasks. Generally, a visionsystem is calibrated for largely two purposes; forcomputing the image coordinates from given worldcoordinates brojection) or for estimating the 3D positionof a world point from its stereo image points (back-projection). Projection is primarily used in computergraphics, where an ideal camera model is often used. Onthe other hand, the purpose of back-projection is mainly tomake position measurements for 3D applicationsincluding dimensional inspection and roboticmanipulation. When cameras are used for back-projectionapplications, high accuracy in calibration is thus ofimportance.Earlier camera calibration techniques usually employed aperfect pinhole camera model and the processes aresimple and fast when applied [l]. However, due to theignorance of nonlinearity that inherently exists in theimaging process, high accuracy is difficult to be obtained.A large portion of recent research work thushas been withthe development of more precise cameramodels involvingcorrection of lens distortion, which is identified as the

major source of system nonlinearity. Tsai [2], for example,proposed an efficient technique which uses a series ofequations to determine camera parameters in twocalibration stages with a simplified model of radial lensdistortion.It should be noted, however, that techniques based onexplicit models of physical vision system have a couple ofpractical disadvantages: (a) Optical features of camerasare different one another and a method that is proved to beeffective for one system may be inefficient for others. Itwas reported that Tsaismethod can be worse than even asimple linear method if lens distortion is relatively low[3]. (b) Practically no model is capable of describing avision system perfectly. Toget more accurateresults,moreelaborate modeling is required and this will bring morecomplicated mathematical equations. In Wengs model[4], for example, two additional types of lens distortionare considered besides radial distortion.By the reasons stated above, techniques do not rely onexplicit camera model may be usell in some applications.Stereoscopic 3D metrology is one of the examples, wherephysical camera parameters are not required to beidentified if a 3D point observed by stereo cameras can beaccurately determined by the use of some intermediatepanmeters. The two-plane method, originally proposedfor back-projection purpose by Martins [ 5] and laterdeveloped further for even projection by Wei [6] ,may bereferred to as the most widely known implicit technique.Wen [7], on the other hand, Utilized a neural network todescribe the part remained outside of the conventionalexplicit projection model. Considering a 3D coordinatecan be represented in a straightforward manner from itsstereo image coordinates, back-projection also isdescribable using a network [8]. Neural networks thus canbe regarded as another effective way for implicitlycalibrating a vision system. In this paper, methods forapplying neural networks to stereoscopic metrology arestudied and the results are comparatively analyzed.

0-7803-5529-6/99/$10.001999 EEE 2719


2/4

2. Calibrating a Stereoscopic3DMeasurement SystemUsingNeural NetsNeural networks have several meaningful features forcamera calibration. First, a network is made up of aninterconnection of nonlinear neurons. Therefore, neuralnetworks have the potential for learning the nonlinearimaging process. Second, the basic philosophies behindsupervised neural net learning and camera calibration arethe same. Both use a set of known data to find systemparameters and apply the parameters later for the dataunseen during the training stage. Third, using a neuralnetwork for a certain problem can be considered as amodel-free solution for the problem, which is in commonwith the nature of implicit camera calibration techniques.It is well known that multilayer neural networks canapproximate any arbitrary continuous function to anydesired degree of accuracy [9]. Thus, if we considerprojection and back-projection as the mapping betweenworld and image, the function identifying the mapping canbe approximated without complicated mathematicalmodeling. In this section, different techniques forapproximating projection and back-projection mappingfunctionsusing neural networks are described.2.1. Direct mapping by neural networksIn stereoscopic back-projection, a 3D point is determinedat the position where lines of sight ray from stereo imagepoints intersect. A coordinate,x,y, orz,of 3Dpoint canbedescribed uniquely with two corresponding image points,i l J l and izj?, by a functionx which determines the back-projection of the stereo,i.e.,The functionfcannot be represented in'a closed-form asthere is redundancy - only three of the four imagecoordinates are enough to compute the three coordinatesof a world point [8]. In addition, the function is nonlineardue mainly to image distortion introduced by the lensused. Therefore, the calibration of a stereoscopic 3Dmeasurement system includes correction of nonlinear lensdistortion effect, determination of geometric and opticalcamera parameters and least squares of redundantmeasurements.In this paper, the function of stereoscopic back-projectionis approximated by neural network. The feasibility of theapproach is based on he fact that multilayer feedforwardnetworks are capable of approximating an arbitrarycontinuous nonlinear function [9] and solving leastsquares problem [IO]. As a world coordinate p isobviously continuous, its images on the stereo are alsocontinuous. So the mapping function between them is

continuous too. Although, in reality, the position of animage on sensor plane is represented as a discrete value, itcan be assumed to be continuous if precise measurementis made in a subpixel accuracy. For a two-layer network,an approximate realization of the function f can berepresented by g liken 4g = (E hal (E hquq - h -)

h=l q=lwhere uI nd ut are activation functions, uq is q inputnode value for a network having n number of hidden layernodes. The goal of approximation is finding networkparameters, w and 0, satisfying the following for all inputpatterns and an arbitrary small positive valueE;Figure 1shows the schematic diagram of the networkdesigned to approximate the function of stereoscopicback-projection. Although stereo image positions areassigned to the input nodes in the figure, others such ashigher orders of image coordinates also can be consideredasadditional input terms.

VCjl J l 3 2 J 2 ) - I J l ,i 2 J 2 ) F E (3)

The concept of neural approximation of back-projectioncan be modified into the approximation problem ofprojection. In this case, a network should be constructedfor each camera separately and input and output of thenetwork are 3D world coordinates and their 2D imagecoordinates respectively.There are two practical difficulties in neuralapproximation of projection or back-projection. First, thedetermination of proper network architecture for theproblem is ambiguous like other neural net applications. Itis difficult, for example, to find the optimal number ofnodes in hidden layer. It is, however, proved that thenumber of hidden nodes for a two-layer network shouldequal m-I or perfectly learningm raining data [1 ]. Thiscan be upper bounds in determining the size of hiddenlayer; to avoid over-learning, the number of hidden nodesshould be much less than the number of training data.Second, it generally takes too long time to achieveaccurate approximation by training a neural network. Theaccuracy obtainable after reasonable training time cannotbe comparable even with that of simple linear method.This problem can be tackled with the method described inthe next subsection.

Figure 1.Approximation of stereoscopic back-projectionusing neural network

2720


3/4

2.2. Learning the error of linear methodCertainly traditional calibration techniques based on dealpinhole camera model have a practically importantadvantage; they are fast and simple as only linearequations in closed-form are needed to be solved. On theother hand, the disadvantage of the methods is also clear;they may not be accurate because the lens distortion effectis ignored.From this observation, a linear method is used for 3Dposition estimation and its error is corrected by neuralnetwork. This approach is inspired by the work of Wenand Schweizer[7],where a neural netwxk is employed todescribe the unknown part remained after explicitcalibration for projection. In this paper, however, the roleassigned to neural network is the approximation of theerror due mainly to lens distortion. It is expected that, withthis approach, we can maintain the major advantage oflinear methods and obtain improved accuracy without anycomplicated mathematical modeling process thank tononlinear learning capability of neural network. As shownin figure 2, the network input consists of stereo imagecoordinates and the network is trained for e, 3D positionalerror of linear method. After training, the 3D position canbe computed by summing output of linear method and theerror correction term estimated by the network aspppL+eN.The error e is due to the image error, say e,,,for a set of image coordinates u=(iljl,izJz). On the otherhand, e,, can be regarded as the sumof deterministic error4 ue mainly to lens distortion and random error e~ dueto system uncertainty and electrical noise likewhere% for an image coordinate can bemodeled as [4]sl(i"+I 2 ) +3d1i12+dlj" +2d J' .'+ki'(iI2+ I 2 )for i coordinate, ( 5 4s2(i"+j")+2d1i''+d2i2+3d27+kj'(i"+I 2 )forj coordinate. (5.b)In the equations, ( i ' J 3 are ideal image coordinatesmeasurable from optical image centre, s1,s2 are thin prismdistortion parameters, d1,dz are decentering distortionparameters, k s a radial distortion parameter. This error isdeterministic and can be estimated by neural networkwhile random error is difficult to be attacked withnetwork.

(4), = e D+e,

This concept can be applied to the projection processeither. In this case, a neural network can be employed totransform the distorted image to ideal image by trainingthem to learn the error between real (distorted) imagecoordinates and ideal image coordinates. The correctedstereo images can be used for 3D position measurementasshown in Figure 3. Both input and output layers have

only two nodes and the size of network can be smallerthan that of the network built in Figure 2.But each cameraof the stereoneeds ts own correction networkNeural networks used for back-projection and projectionshown in figure 2 and 3 can be regarded as thepostprocessor and preprocessor for the linear methodrespectively. The role of network is limited to a specificjob while a network is used for approximating the entireback-projection (or projection) process in the directmapping network scheme shown in figure 1.

JFigure2.Linear method corrected by neural network

F i _ m 3. Neural nets used for correcting image distortion

3. Simulation ResultsTo evaluate the performance of techniques described inthis paper, simulations with synthetic data are carried out.Two cameras are assumed to be positioned at (500, 260,1000) and (550, 300, 1OOO) with z-y-x Euler angles of(180, O", 5" ) and (180, O", -5") respectively. All lengthsare expressed in millimeters unless stated otherwise.Points at FO and 100 are used for calibration while pointsat A 0 re used for performance test. Optical parameters

2721


4/4

assumed for both cameras are 15mm focal length,11.25x15.OOpm sized pixel, 5 1 2 0 x 4 8 0 0 pixels perimage plane, and optical centre at (250,230). Threedifferent data sets are assumed; data affected mainly byradial distortion, data affected by radial and considerabletangential distortion, and data corrupted by high level ofnoise in addition to radial distortion. The variance ofGaussian noise added is S2/12 because the uniformquantization noise for pixel space 6 equals S2/12 inhorizontal or vertical direction [4]. The noise level can bereduced if images are assumed to be measured in asubpixel accuracy. Specific parameters used forsynthesizing he data sets are as he following:Data type I: k=O.0003, ~1=~2=dl=d2=0,dditive noise by1/5 subpixel accuracy measurement,Data type 11: H.0003, sl=0.0002, ~=0.0005,dl=0.0002;d2=0.0001, additive noise by 1/5 subpixel accuracymeasurement,Data type 111: M.0003, s l=s2=dl=d2=OY additive noise byquantization without subpixel accuracy measurement.

Method IMethod I1

For each data set, calibration methods described in theprevious section are applied and the performance isobserved. Table 1shows the results, where method I islinear method, method I1 is direct back-projection neuralnetwork, method I11 is linear back-projection corrected byneural network, and method IV is linear back-projection&er transformation from real image to ideal image byneural network The average distance between real 3Dpoints and computed points is used for the accuracyevaluation criterion. All the neural networks have 16nodes at hidden layer and the results are obtained after30000 training iterations by back-propagation algorithm.60 data are used for calibration and 30 points are used fortest.

Data type I Data type I1 Data type III1.52186 1.81813 5.329477.08586 12.31885 15.15561

From the table, it canbe found that using a neural net withlinear method is effective for the distorted data. Theaccuracy is improved by the factor of two approximately.However, by the reason described in section 2.2, theaccuracy improvement is not significant for the datacorrupted by significant random noise. Direct mapping byneural network, on the other hand, shows only roughestimation capability.

~ ~ ~~.~-Method I11Method IV

Table 1. Average 3D position measurement errors byvarious methodson different data sets

... .~. ~ __ .._~.~..._0.88862 I 0.90571 I 4.426100.79186 I 0.95434 I 4.24230

4.SummaryTechniques for stereoscopic 3D position measurementusing neural networks are studied. Several differentapproaches are studied and simulated with various datasets. Networks used for correcting errors of linear back-projection method and for transforming distorted image toideal image are found to be useful for accuratemeasurement. Direct mapping by neural network providesthe most straightforward and simple solution for thestereoscopic back-projection problem. However, highaccuracy seems to be difficult to be obtained in reasonabletraining time. So the application of the direct mappingnetwork may be limited to tasks where simplicity is moreimportant than accuracy.

References[13Y.Yakimovsky and RCunningham, A system for extractingthree-dimensional measurements from a stereo pair of TVcameras, Computer Graphics and Image Processing, Vo1.7,[2] RYTsai, A versatile camera calibration technique for highaccuracy 3d machine vision metrology using off-the-shelf TVcamerasand lenses, IEEE J.Robotics &Automation, Vol. RA-3,[3] S.-W.Shih, Y.-P.Hung, and W.-S.Lin, When should considerlens distortion in camera calibration, Pattern Recognition,Vo1.28, No.3, pp.447461, 1995.[4] J.Weng, P.Cohen, and M.Herniou, Camera calibration withdistortionmodels and accuracy evaluation, IEEE Trans. PatternAnalysis and Machine Intelligence, Vo1.14, N0.10, pp.965-980,1992.[5] H.A.Martins, J.RBirk, and RB.Kelley, Camera modelsbased on data fkom two calibration planes, Computer Graphicsand Image Processing, Vo1.17, pp.173-180,1981.[6] G.-Q.Wei and S.D.Ma, Implicit and explicit cameracalibration: theory and experiments, IEEE Trans. PatternAnalysis and Machine Intelligence, Vo1.16, No.5, pp.469480,1994.[A J.Wen and GSchweitzer, Hybrid calibration of CCDcameras using artificial neural nets, Int. Joint Cod. NeuralNetworks, pp.337-342,1991.[8] YDo, S.-H.Yo0, and D.-S.Lee, Direct calibrationmethodology for stereo cameras, SPIE Conf. Vol. 3521:Machine vision systems for inspection and metrology W~p.54-65~1998.[9] IC-I.Fmahashi, On the approximate realization ofcontinuous mapping by neural networks, Neural Networks,[lo] A.Cichocki and RUnIxhauen, Simplified neural networksfor solvinglinear least s qua r e s and total least squares problemsin real tim, IEEE Trans. Neural Networks, Vo1.5, N0.6,[ll]M.A.Sartori andP.J.Antsaklis, A simple method to derivebounds on th e size and to train multilayer neural networks,IEEETrans.Neural Networks, V01.2, No.4, pp.467471, 1991.

pp. 195-210,1978.

N0.4, pp.323-344,1987.

V01.2, pp.183-192,1989.

pp.910-923,1994.

2722

( e ( e - do, engineering - unknown

Documents