
Fast 3D Stabilization and Mosaic Construction

Carlos Morimoto    Rama Chellappa

Computer Vision Laboratory, Center for Automation Research
University of Maryland, College Park, MD 20742-3275

    [email protected]

Abstract

We present a fast electronic image stabilization system that compensates for 3D rotation. The extended Kalman filter framework is employed to estimate the rotation between frames, which is represented using unit quaternions. A small set of automatically selected and tracked feature points are used as measurements. The effectiveness of this technique is also demonstrated by constructing mosaic images from the motion estimates, and comparing them to mosaics built from 2D stabilization algorithms. Two different stabilization schemes are presented. The first, implemented in a real-time platform based on a Datacube MV200 board, estimates the motion between two consecutive frames and is able to process gray level images of resolution 128x120 at 10 Hz. The second scheme estimates the motion between the current frame and an inverse mosaic; this allows better estimation without the need for indexing the new image frames. Experimental results for both schemes using real and synthetic image sequences are presented.

1. Introduction

Camera motion estimation is an integral part of any computer vision or robotic system that has to navigate in a dynamic environment. Whenever part of the camera motion is not necessary (unwanted or unintended), image stabilization can be applied as a useful preprocessing step before further analysis of the image sequence. In this paper we use an iterated extended Kalman filter (IEKF) to estimate the 3D motion of the camera. Stabilization is achieved by derotating the input sequence [2, 3, 9].

We present two different stabilization algorithms. The first estimates the motion between two consecutive frames and was implemented in a real-time platform (a Datacube

The support of ARPA and ARO under contract DAAH04-93-G-0419, and of the Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq), are gratefully acknowledged.

MV200 connected to a Sun Sparc 20). The second algorithm computes the motion of the current frame relative to an inverse mosaic. Computing the frame-to-mosaic motion requires indexing (finding the approximate position of the new image in the mosaic, in order to facilitate the motion estimation process). We show in Section 4.1 how the use of inverse mosaics can solve the indexing problem. For an image sequence with n frames, a mosaic image can be constructed by placing the initial frame f0 at the center of the mosaic and then warping every new frame to its correct position up to frame fn. To build an inverse mosaic one can first warp frame fn using the inverse motion parameters, and then warp every previous frame down to frame f0. The inverse mosaic can also be computed directly from the mosaic image, using a single global inverse transformation.
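The forward and inverse mosaic constructions described above can be sketched with a toy translation-only motion model (hypothetical function names; real frames are warped images, reduced here to 2D offsets purely for illustration):

```python
# Toy sketch of mosaic vs. inverse mosaic placement using a
# translation-only motion model. Each interframe motion t_k maps
# frame f_k into the coordinate system of frame f_{k-1}.

def mosaic_positions(motions):
    """Place f0 at the origin; each later frame is positioned by the
    accumulated forward motion (a plain running sum here)."""
    positions = [(0.0, 0.0)]
    cx, cy = 0.0, 0.0
    for tx, ty in motions:
        cx, cy = cx + tx, cy + ty
        positions.append((cx, cy))
    return positions

def inverse_mosaic_positions(motions):
    """Keep the newest frame f_n at the origin instead: apply the
    inverse global motion, i.e. subtract the total displacement."""
    fwd = mosaic_positions(motions)
    gx, gy = fwd[-1]
    return [(x - gx, y - gy) for x, y in fwd]

motions = [(3.0, 0.0), (2.0, 1.0)]
assert mosaic_positions(motions)[-1] == (5.0, 1.0)           # f_n in the mosaic
assert inverse_mosaic_positions(motions)[-1] == (0.0, 0.0)   # f_n held at center
```

As the last line shows, the inverse mosaic is just the mosaic composed with a single global inverse transformation, matching the observation above.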

Most of the current image stabilization algorithms implemented in real-time use 2D models due to their simplicity [4, 8]. Feature-based motion estimation or image registration algorithms are used by these methods in order to bring all the images in the sequence into alignment. For 3D models under perspective projection, the displacement of each image pixel also depends on the structure of the scene, or more precisely, on the depth of the corresponding 3D point. It is possible to parameterize these models so that only the translational components carry structural information, while the rotational components are depth-independent. In this paper, stabilization is defined as the process of compensating for 3D rotation; this definition is also used in [2, 3, 9]. Compensating for the camera rotation makes the image sequence look mechanically stabilized, as if the camera were mounted on a gyroscopic platform. The fast implementation of the 3D stabilization system can process about 10 frames per second for 8-bit gray level images of resolution 128x120 (with maximum interframe displacement of 15 pixels), demonstrating that IEKFs are efficient tools for 3D motion estimation and real-time image analysis applications.

The rest of this paper is organized as follows: Section 2 describes the camera motion model used for parameter estimation, which is the subject of Section 3. The details of the two stabilization algorithms are presented in Section 4, and experimental results from both algorithms using real and synthetic image sequences are shown in Section 5. Section 6 concludes the paper.

1063-6919/97 $10.00 © 1997 IEEE    660

Authorized licensed use limited to: BIBLIOTECA D'AREA SCIENTIFICO TECNOLOGICA ROMA 3. Downloaded on October 8, 2009 at 04:12 from IEEE Xplore. Restrictions apply.

2. Camera Motion Model

Let P = (X, Y, Z)^T be a 3D point in a Cartesian coordinate system fixed on the camera, and let p = (x, y)^T be the position of the image of P. The image plane defined by the x and y axes is perpendicular to the Z axis and contains the principal point (0, 0, f). Thus the perspective projection of a point P = (X, Y, Z)^T onto the image plane is

x = f X / Z,    y = f Y / Z    (1)

where f is the camera focal length.

Under a rigid motion assumption, an arbitrary motion of
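A minimal sketch of this projection (the helper name and focal length value are illustrative):

```python
# Perspective projection of a camera-frame 3D point onto the image
# plane: x = f*X/Z, y = f*Y/Z. Helper name and f are illustrative.

def project(point, f):
    """Project a camera-frame 3D point (X, Y, Z) to image coords (x, y)."""
    X, Y, Z = point
    if Z <= 0:
        raise ValueError("point must lie in front of the camera (Z > 0)")
    return (f * X / Z, f * Y / Z)

# A point twice as far away projects to half the image displacement:
assert project((1.0, 2.0, 10.0), 500.0) == (50.0, 100.0)
assert project((1.0, 2.0, 20.0), 500.0) == (25.0, 50.0)
```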

the camera can be described by a translation T and a rotation R, so that a point P0 in the camera coordinate system at time t0 is described in the new camera position at t1 by

P1 = R P0 + T    (2)

where R is a 3 x 3 orthonormal matrix. For distant points (Z >> Tx, Ty, and Tz), the displacement is primarily due to rotation, and the projection of P1 can be obtained by

x1 = f (r11 x0 + r12 y0 + r13 f) / (r31 x0 + r32 y0 + r33 f)
y1 = f (r21 x0 + r22 y0 + r23 f) / (r31 x0 + r32 y0 + r33 f)    (3)

where rij are the elements of R.

The use of distant features for motion estimation and image stabilization has been addressed in [2, 3, 9]. Such features constitute very strong visual cues that are present in almost all outdoor scenes, although it may be hard to guarantee that all the features are distant. In this paper we estimate the three parameters that describe the rotation of the camera using an iterated extended Kalman filter (IEKF).

2.1. Quaternions and Unit Quaternions

In this section we introduce a few basic properties of quaternions. More detailed descriptions can be found in [5, 6]. Quaternions are commonly used to represent 3D rotation, and are composed of a scalar and a vector part, as in q̂ = q0 + q, where q = (qx, qy, qz)^T. The dot product operator for quaternions is defined as p̂·q̂ = p0 q0 + p·q, and the squared norm of a quaternion is given by |q̂|² = q̂·q̂ = q0² + q·q. A unit quaternion is defined as a quaternion with unit norm.

A unit quaternion can be interpreted as a rotation by an angle θ around a unit vector ω through the equation q̂ = cos(θ/2) + ω sin(θ/2). Note that q̂ and -q̂ correspond to the same rotation, since a rotation of θ around a vector ω is equivalent to a rotation of -θ around the vector -ω.
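This axis-angle interpretation can be sketched in a few lines (pure Python; the helper name is illustrative):

```python
import math

# Unit quaternion (q0, qx, qy, qz) from a rotation of theta about a
# unit axis w: q = cos(theta/2) + w*sin(theta/2). Name is illustrative.

def from_axis_angle(axis, theta):
    wx, wy, wz = axis
    s, c = math.sin(theta / 2.0), math.cos(theta / 2.0)
    return (c, wx * s, wy * s, wz * s)

q = from_axis_angle((0.0, 0.0, 1.0), math.pi / 2.0)    # 90 deg about Z
assert abs(sum(v * v for v in q) - 1.0) < 1e-12        # unit norm

# Adding a full turn (theta + 2*pi) is the same physical rotation but
# yields -q, illustrating that q and -q encode the same rotation.
q2 = from_axis_angle((0.0, 0.0, 1.0), math.pi / 2.0 + 2.0 * math.pi)
assert all(abs(a + b) < 1e-9 for a, b in zip(q, q2))
```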

Let the conjugate of a quaternion be defined as q̂* = q0 - q, and the product of two quaternions as

p̂q̂ = p0 q0 - p·q + p0 q + q0 p + p × q    (4)

Thus, the conjugate of q̂ is also its inverse, since q̂*q̂ = q̂q̂* = |q̂|² = 1. It is useful to have the product of quaternions expanded in matrix form as follows:

        [ p0  -px  -py  -pz ] [ q0 ]
p̂q̂ =   [ px   p0  -pz   py ] [ qx ]    (5)
        [ py   pz   p0  -px ] [ qy ]
        [ pz  -py   px   p0 ] [ qz ]

Note that the multiplication of quaternions is associative but it is not commutative, i.e., in general p̂q̂ is not the same as q̂p̂.

The rotation of a vector or point P to a vector or point P' can be represented by a quaternion q̂ according to

(0 + P') = q̂ (0 + P) q̂*    (6)

Composition of rotations can be performed by multiplication of quaternions, since

(0 + P'') = p̂ (0 + P') p̂* = p̂ (q̂ (0 + P) q̂*) p̂* = (p̂q̂) (0 + P) (p̂q̂)*    (7)

where it is easy to verify that (q̂* p̂*) is equivalent to (p̂q̂)*.

The nine components of the orthonormal rotation matrix R in (2) can be represented by the parameters of a unit quaternion simply by expanding (6) using (5), so that

     [ 1 - 2qy² - 2qz²    2(qx qy - q0 qz)    2(q0 qy + qx qz) ]
R =  [ 2(qx qy + q0 qz)   1 - 2qx² - 2qz²    2(qy qz - q0 qx) ]    (8)
     [ 2(qx qz - q0 qy)   2(q0 qx + qy qz)   1 - 2qx² - 2qy² ]
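Equations (4)-(8) can be exercised with a small pure-Python sketch (illustrative names, not the paper's implementation):

```python
import math

# Quaternion algebra sketch: product (4), conjugate, rotation of a
# point (6), and the equivalent rotation matrix (8).
# Quaternions are tuples (q0, qx, qy, qz).

def qmul(p, q):
    """Quaternion product: scalar p0*q0 - p.q, vector p0*q + q0*p + p x q."""
    p0, px, py, pz = p
    q0, qx, qy, qz = q
    return (p0*q0 - px*qx - py*qy - pz*qz,
            p0*qx + q0*px + py*qz - pz*qy,
            p0*qy + q0*py + pz*qx - px*qz,
            p0*qz + q0*pz + px*qy - py*qx)

def qconj(q):
    return (q[0], -q[1], -q[2], -q[3])

def qrotate(q, point):
    """Rotate P by q via (0 + P') = q (0 + P) q*."""
    r = qmul(qmul(q, (0.0,) + tuple(point)), qconj(q))
    return r[1:]

def qmatrix(q):
    """Rotation matrix of a unit quaternion, as in eq. (8)."""
    q0, qx, qy, qz = q
    return [[1-2*qy*qy-2*qz*qz, 2*(qx*qy-q0*qz),   2*(q0*qy+qx*qz)],
            [2*(qx*qy+q0*qz),   1-2*qx*qx-2*qz*qz, 2*(qy*qz-q0*qx)],
            [2*(qx*qz-q0*qy),   2*(q0*qx+qy*qz),   1-2*qx*qx-2*qy*qy]]

# 90 degrees about Z maps (1, 0, 0) to (0, 1, 0); the conjugate inverts.
c = s = math.sqrt(0.5)
q = (c, 0.0, 0.0, s)
x, y, z = qrotate(q, (1.0, 0.0, 0.0))
assert abs(x) < 1e-12 and abs(y - 1.0) < 1e-12 and abs(z) < 1e-12
M = qmatrix(q)
assert abs(M[0][0]) < 1e-12 and abs(M[1][0] - 1.0) < 1e-12
```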

3. Motion Estimation

The dynamics of the camera are described as the evolution of a unit quaternion. An IEKF is used to estimate the interframe rotation q̂. EKFs have been extensively used for the problem of motion estimation from a sequence of images [1, 9, 10]. In order to achieve real-time performance, this framework is simplified here to compute only the rotational parameters from distant feature points.

A unit quaternion has only three degrees of freedom due to its unit norm constraint, so that it can be represented using only the vector parameters. The remaining scalar parameter


is computed from q0 = (1 - qx² - qy² - qz²)^(1/2). Only nonnegative values of q0 are considered, so that (8) can be rewritten using (qx, qy, qz) only.

The state vector x and plant equations are defined as follows:

x(ti) = q̄(ti),    x(ti) = x(ti-1) + n(ti-1)    (9)

where q̄ is a quaternion represented by its vector part only, and n is the associated plant noise.

The following measurement equations are derived from (3):

z(ti) = h(i|i-1)[x(ti)] + η(ti)    (10)

where h is a nonlinear function which relates the current state to the measurement vector z(ti), and η is the measurement noise. After tracking a set of N feature points, a two-step EKF algorithm is used to estimate the total rotation. The first step is to compute the state and covariance predictions at time ti-1, before incorporating the information from z(ti), by

x̂(ti-) = x̂(ti-1+)
Σ(ti-) = Σ(ti-1+) + Σn(ti-1)    (11)

where x̂(ti-1+) is the estimate of x(ti-1) and Σ(ti-1+) is its associated covariance, obtained based on the information up to time ti-1; x̂(ti-) and Σ(ti-) are the predicted estimates before the incorporation of the ith measurements; and Σn(ti-1) is the covariance of the plant noise n(ti-1).

The update step follows the previous prediction step. When z(ti) becomes available, the state and covariance estimates are updated by

x̂(ti+) = x̂(ti-) + K(ti) [z(ti) - h(i|i-1)(x̂(ti-))]
Σ(ti+) = [I - K(ti) H(i|i-1)] Σ(ti-)
K(ti) = Σ(ti-) H(i|i-1)^T [H(i|i-1) Σ(ti-) H(i|i-1)^T + Ση(ti)]^(-1)    (12)

where K(ti) is a 3 x N matrix which corresponds to the Kalman gain, Ση(ti) is the covariance of η(ti), and I is a 3 x 3 identity matrix. H(i|i-1) is the linearized approximation of h(i|i-1), defined as

H(i|i-1) = ∂h(i|i-1)[x] / ∂x, evaluated at x = x̂(ti-)    (13)

A batch process using a least-squares estimate of the rotational parameters can be used to initialize the algorithm, as in [9], but for the experiments shown in Section 5 we simply assume zero rotation as the initial estimate.

To speed up the process and reduce the amount of computation that is required to achieve real-time performance, the measurement vectors are sorted according to their fits to the previous estimate, so that only the best M points (M < N) are actually used by the EKF. The solution is refined iteratively by using the new estimate x̂ to evaluate H and h for a few iterations. More detailed derivations of the Kalman filter equations can be found in [7].
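The predict/update/iterate structure of eqs. (9)-(13) can be sketched for a one-dimensional state (a deliberate simplification of the paper's three-parameter quaternion state; the function names and toy measurement model are illustrative):

```python
# Minimal iterated EKF sketch (scalar state): random-walk plant,
# nonlinear measurement z = h(x) + noise, and a few re-linearizations
# per update, mirroring the predict / update / iterate structure.

def iekf_update(x_prev, P_prev, z, Q, R, h, dh, iters=3):
    # Prediction: random-walk plant x_i = x_{i-1} + n_{i-1},
    # so the covariance grows by the plant noise Q.
    x_pred, P_pred = x_prev, P_prev + Q
    x = x_pred
    for _ in range(iters):
        H = dh(x)                                   # re-linearize h at x
        K = P_pred * H / (H * P_pred * H + R)       # Kalman gain
        # Relinearized residual: z - h(x) - H*(x_pred - x)
        x = x_pred + K * (z - h(x) - H * (x_pred - x))
    P = (1.0 - K * H) * P_pred                      # covariance update
    return x, P

# Toy measurement h(x) = x^2 with true state 2.0 and a confident
# measurement z = 4.0: the iterated update converges near 2.0.
x, P = iekf_update(1.5, 1.0, 4.0, 0.01, 1e-4, lambda v: v * v, lambda v: 2.0 * v)
assert abs(x - 2.0) < 0.01
assert 0.0 < P < 1.0
```

The inner loop is what makes the filter "iterated": H and h are re-evaluated at each refined estimate, as described above.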

4. Image Stabilization

Real-time image stabilization systems have been described in [4, 8]. They are based on 2D similarity or affine motion models, and use multi-resolution to compensate for large image displacements. Each algorithm was optimized for a different image processing platform. Part of the system described in this paper is based on the system presented in [8]. As in [2, 3, 9], image stabilization is defined as the process of compensating for the 3D camera rotation, i.e., stabilization is achieved by derotating the image sequence. Rotation is estimated as described above, using the tracked feature positions as measurements.

    Figure 1 shows the block diagram of the image stabi-lization system, which was implemented in a real-time im-age processing platform based on a Datacube MV200 board.The Datacube board is based on a parallel architecture com-posed of several image processing elements which can beconnected into pipelines in order to accomplish image pro-cessing tasks. Image acquisition, pyramid construction, andimage warping are performed by the Datacube. The hostcomputer performs tracking, estimates the rotation based onthe algorithm presented in Section 3, and finally combinesthe interframe estimates to determine the global transforma-tion used to stabilize the current image frame. The host isalso responsible for the user interface module, which is notshown in the block diagram.

Figure 1. Block diagram of the real-time image stabilization system (camera input, Laplacian pyramid, feature detection, tracking, motion estimation, motion compensation, and warping).

    A small set of feature points are detected and hierarchi-cally tracked with subpixel accuracy between two consecu-tive frames, using the method described in [8]. The featuredisplacements are then used by the IEKF to update the rota-tion estimate. The interframe estimates must be combined


in order to obtain the global transformation which stabilizes the current frame relative to the reference frame. If p̂(i-1) is the previous global rotation, and q̂i is the interframe motion between frames i and i-1, the new global rotation p̂i is obtained by multiplying two quaternions: p̂i = q̂i p̂(i-1). Finally, the new global estimate is used by the image warper to generate the stabilized sequence.
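The accumulation p̂i = q̂i p̂(i-1) can be sketched as follows (illustrative names; renormalization is added here to counter floating-point drift):

```python
import math

# Folding per-frame rotations into a global rotation p_i = q_i * p_{i-1}.
# Quaternions are tuples (q0, qx, qy, qz); helper names are illustrative.

def qmul(p, q):
    p0, px, py, pz = p
    q0, qx, qy, qz = q
    return (p0*q0 - px*qx - py*qy - pz*qz,
            p0*qx + q0*px + py*qz - pz*qy,
            p0*qy + q0*py + pz*qx - px*qz,
            p0*qz + q0*pz + px*qy - py*qx)

def qnormalize(q):
    n = math.sqrt(sum(v * v for v in q))
    return tuple(v / n for v in q)

def accumulate(interframe):
    """Global rotation relative to the reference frame."""
    p = (1.0, 0.0, 0.0, 0.0)            # identity: the reference frame
    for q in interframe:
        p = qnormalize(qmul(q, p))      # p_i = q_i * p_{i-1}
    return p

# Ten 9-degree rotations about Z compose to a 90-degree rotation.
step = (math.cos(math.radians(4.5)), 0.0, 0.0, math.sin(math.radians(4.5)))
p = accumulate([step] * 10)
assert abs(p[0] - math.cos(math.radians(45))) < 1e-9
assert abs(p[3] - math.sin(math.radians(45))) < 1e-9
```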

The simplicity of this method allows its implementation on our real-time platform. This system can be modified to construct panoramic views of the scene (mosaics), which can also be used to facilitate the motion estimation process. The construction and registration of the current frame to a mosaic image can reduce the estimation errors when the camera motion predominantly consists of lateral translation or pan and tilt.

4.1. Registration to an Inverse Mosaic

Building a mosaic image from 2D affine motion parameters or 3D rotation parameters can be done by appropriately warping each pixel from the new frame into its corresponding mosaic position, but in general, better results are achieved when the target image (i.e., the mosaic image) is scanned and the closest pixel value from the new frame (or the result of a small-neighborhood interpolation around that pixel) is inserted into the mosaic.
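This target-scan strategy (scan the mosaic, pull pixels from the new frame) can be sketched as a backward warp with nearest-neighbor sampling (a toy transform and hypothetical names; a real implementation would interpolate a small neighborhood instead of taking the single nearest pixel):

```python
# Backward warping sketch: scan every target (mosaic) pixel, map it
# through the inverse transform into the source frame, and take the
# nearest source pixel. The transform here is a toy translation.

def backward_warp(src, width, height, inv_map):
    """src: dict {(x, y): value}; inv_map: target coords -> source coords."""
    out = {}
    for ty in range(height):
        for tx in range(width):
            sx, sy = inv_map(tx, ty)
            # nearest-neighbor: round to the closest source pixel
            key = (int(round(sx)), int(round(sy)))
            if key in src:
                out[(tx, ty)] = src[key]
    return out

src = {(x, y): 10 * y + x for x in range(4) for y in range(4)}
# target pixel (tx, ty) samples source pixel near (tx + 1.2, ty)
warped = backward_warp(src, 4, 4, lambda tx, ty: (tx + 1.2, ty))
assert warped[(0, 0)] == src[(1, 0)]
assert (3, 0) not in warped   # maps outside the source frame
```

Scanning the target guarantees every mosaic pixel is assigned at most once, which is why it tends to give better results than forward-splatting source pixels.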

To register a new frame to the mosaic, it is desirable to have a rough estimate of its correct placement. Finding this initial estimate is known as indexing. Hansen et al. [4] solve the indexing problem by correlating small representative landmark regions of the new frame with the mosaic. Depending on the warping of the current frame, though, this technique could fail. To increase the robustness of this scheme, it is possible to pre-warp the current frame based on the previous estimate; this brings the current frame close to its true position in most cases, but it still may be difficult to register the warped image to the mosaic. The use of inverse mosaics avoids the indexing problem by keeping the previous frame always at a fixed position, e.g., at the center of the mosaic, and without any distortions due to warping, which increases the probability of high correlation of overlapping areas.

Figure 2 shows the block diagram of the stabilization system based on inverse mosaics. In order to track features at different image resolutions, a mosaic pyramid is built from the Gaussian pyramid of the current frame and the global motion estimate. The inverse mosaic pyramid is obtained by warping the mosaic pyramid using the inverse global motion estimate. Features detected in the new frame are hierarchically tracked in the inverse mosaic pyramid, and the estimation procedure used in the frame-to-frame algorithm is applied. The new estimate is then used to update the mosaic and inverse mosaic pyramids for the next iteration. The process has to be reinitialized with the creation of a new mosaic pyramid (or new tiles [4]) whenever a frame is warped outside the mosaic.

Figure 2. Block diagram of the stabilization system using inverse mosaics (Gaussian pyramid, inverse mosaic, tracking, motion estimation, and motion compensation).

Besides the extra computational cost, one drawback of this method is that in general warping requires interpolation, so that the inverse mosaic may be over-smoothed. In the experiments presented in the following section, we compensate for the over-smoothing effect by using larger correlation windows (instead of changing the warping scheme to nearest-neighbor selection, which avoids interpolation but creates some artifacts).

5. Experimental Results

In this section we present experimental results for both of the algorithms described above. We are forced here to show only single frames from the stabilized and mosaic sequences, where video sequences would be much more appropriate. We invite the reader to look at these videos, available in MPEG format, at the following WWW address: http://www.cfar.umd.edu/~carlos/cvpr97.html. In the following sections, a file at this address named mosaic will be referenced by http:[mosaic].

    Figure 3 is an example of the results obtained from thefast 3D derotation system using an off-road sequence pro-vided by NIST. The camera is rigidly mounted on the ve-hicle and is moving forward. The top row shows the 5thframe of the input sequence (left) and its corresponding sta-bilized frame (right). The original and stabilized sequencesare available at http:[seq-1.mpg]. The difference betweenthe top-left image and the 10th input frame is shown onthe bottom left, and the difference between the correspond-ing stabilized frames on the bottom right. The darkness ofa spot on the bottom images is proportional to the differ-ence of intensity between the corresponding spots in eachframe. Since stabilization has to compensate for the mo-tion between frames, the difference images can be consid-ered as error measurements which stabilization attempts to


    minimize. Observe that the regions around the horizon lineare very light due to stabilization. Errors are bigger aroundobjects that are closer to the camera (darker regions), sincethey have large translational components which are not com-pensated by our method.
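The difference-image error measure can be summarized numerically as a mean absolute difference between frames (a simplification: the paper visualizes per-pixel differences rather than reporting a single number; the helper name and toy frames are illustrative):

```python
# Mean absolute frame difference as a simple stabilization error
# measure: smaller residual differences between (stabilized) frames
# indicate better rotation compensation. Frames are lists of rows.

def mean_abs_diff(frame_a, frame_b):
    total, count = 0, 0
    for row_a, row_b in zip(frame_a, frame_b):
        for a, b in zip(row_a, row_b):
            total += abs(a - b)
            count += 1
    return total / count

f0 = [[10, 20], [30, 40]]
shifted = [[20, 10], [40, 30]]     # unstabilized: pixels displaced
stabilized = [[10, 21], [30, 40]]  # only a small residual error left
assert mean_abs_diff(f0, shifted) == 10.0
assert mean_abs_diff(f0, stabilized) == 0.25
```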

Figure 3. 3D image stabilization results.

The real-time implementation typically detects and tracks 9 feature points between frames, selects the best 7 based on the correlation results, and then uses the 4 points which best approximate the current rotation estimate. The maximum feature displacement tolerated by the tracker is set to 15 pixels. Under these settings, the system is able to process approximately 9.8 frames per second.

Figure 4 shows the result of the frame-to-mosaic algorithm presented in Section 4.1 for 30 frames of a synthetic sequence which simulates dominant lateral translation with small rotations around the optical axis. The sequence was generated by warping and cutting regions (of size 128x128) from a bigger image (of size 512x512). The maximum translation between frames was set to 10 pixels and the maximum rotation to 3 degrees. Figures 4a and 4b show results for the 15th and 26th frames respectively. Each figure shows the inverse mosaic on top; the regular mosaic (left) and current input frame (right) are shown at the bottom. The box in the center of the inverse mosaic corresponds to the last frame put into the regular mosaic and also serves as the initial estimate (index) of the region to be registered to the next frame. After frame 16, the camera starts to move back to its original position, and the overlap of the current frame with the mosaic is almost 100%. For this particular sequence, the real-time frame-to-frame algorithm produced

Figure 4. Registration to an inverse mosaic.

    a very similar result, except for a few artifacts due to the ac-cumulation of error.

    The last example compares the real-time 3D algorithmwith the real-time similarity model-based algorithm pre-sented in [8]. They both use the same set of 11 feature pointsfor motion estimation. The 2D algorithm uses all of them tofit a similarity model using least-squares, and the 3D modeluses only the four points which best approximate the currentrotation estimate.

The original sequence is composed of 200 frames and the dominant motion is left-to-right panning. To help in comparison and visualization, the reference frame was assumed to be the 100th frame, and appears at the center of the mosaics. The first column of Figure 5 shows the 50th, 100th, and 200th frames from top to bottom. The second column shows the corresponding mosaic images constructed from the 2D estimates, and the third column shows the corresponding mosaics constructed using the 3D estimates. Since the camera calibration is unknown, we guessed the camera FOV to be 3 x 4 focal lengths. The mosaic from the 2D estimates (http:[seq-3.mpg]) does a good job locally, but the 3D


Figure 5. Mosaics from 200 frames of a left-to-right panning sequence.

mosaic (http:[seq-4.mpg]) looks much more natural, as if it were a panoramic picture taken using a fish-eye lens. The original sequence and the 3D stabilized output can be seen at http:[seq-5.mpg].

6. Conclusion

We have presented a 3D model-based real-time stabilization system that estimates the motion of the camera using an IEKF. Stabilization is achieved by derotating the input camera sequence. Rotations are represented using unit quaternions, whose good numerical properties contribute to the overall performance of the system. The system was implemented in a real-time image processing platform (a Datacube MV200 connected to a Sun Sparc 20) and is able to process 128x120 images at approximately 10 Hz.

The system has been successfully tested under a variety of situations that include dominant forward and lateral translations with small rotations, as well as panning. We have built mosaic images for these last two cases, and for the particular case of panning, we compared the results of the 3D algorithm with those of a 2D similarity model-based algorithm to show that 3D mosaicking provides more natural panoramic views.

We are currently extending the motion models to include translation parameters, which we believe will contribute to making the system more robust and able to generate more realistic mosaics. We are also investigating the applicability of these motion estimation, image stabilization, and mosaicking techniques to video coding.

References

[1] T. Broida and R. Chellappa. Estimation of object motion parameters from noisy images. IEEE Trans. Pattern Analysis and Machine Intelligence, 8(1):90-99, January 1986.
[2] L. Davis, R. Bajcsy, R. Nelson, and M. Herman. RSTA on the move. In Proc. DARPA Image Understanding Workshop, pages 435-456, Monterey, CA, November 1994.
[3] Z. Durić and A. Rosenfeld. Stabilization of image sequences. Technical Report CAR-TR-778, Center for Automation Research, University of Maryland, College Park, 1995.
[4] M. Hansen, P. Anandan, K. Dana, G. van der Wal, and P. Burt. Real-time scene stabilization and mosaic construction. In Proc. DARPA Image Understanding Workshop, pages 457-465, Monterey, CA, November 1994.
[5] B. K. P. Horn. Closed-form solution of absolute orientation using unit quaternions. Journal of the Optical Society of America A, 4(4):629-642, April 1987.
[6] K. Kanatani. Group-Theoretical Methods in Image Understanding. Springer-Verlag, Berlin, Germany, 1990.
[7] P. Maybeck. Stochastic Models, Estimation and Control. Academic Press, New York, NY, 1982.
[8] C. Morimoto and R. Chellappa. Fast electronic digital image stabilization. In Proc. International Conference on Pattern Recognition, Vienna, Austria, August 1996.
[9] Y. Yao, P. Burlina, and R. Chellappa. Electronic image stabilization using multiple visual cues. In Proc. International Conference on Image Processing, pages 191-194, Washington, D.C., October 1995.
[10] G.-S. J. Young and R. Chellappa. 3-D motion estimation using a sequence of noisy stereo images: models, estimation, and uniqueness results. IEEE Trans. Pattern Analysis and Machine Intelligence, 12(8):735-759, August 1990.
