report documentation page · scientists in this group include neurophysiologists, psy- examples are...

21
UNC LASSIFIED MASTER COPY - FOR REPRODUCTION PURPOSES SECURITY CLASSIFICATION OF THIS PAGE REPORT DOCUMENTATION PAGE j rf - l. REPORT SECURITY CLASSIFICATION lb. RESTRICTIVE MARKING- Uncblasi fl pd 3. DiSTRIBUTION/AVAILAISILITY OF REPORT AD-A211 479 , Approved for public release; distribution unlimited. ..- - - ... ... R(S) S. MONITORING ORGANIZATION REPORT NUMBER(S) A00 __ _ _:ol.L-PH 6S. NAME OF PERFORMING ORGANIZATION 6b. OFFICE SYMBOL 7a. NAME OF MONITORING ORGANIZATION T-he University of Texas (If applicable) at Austin U. S. Army Research Office 6L ADDRESS (City, State, and ZIPCode) 7b. ADDRESS (City, State, and ZIP Code) Austin, Texas 78712-1084 P. 0. Box 12211 Research Triangle Park, NC 27709-2211 Ba. NAME OF FUNDING/SPONSORING T 8b. OFFICE SYMBOL 9. PROCUREMENT INSTRUMENT IDENTIFICATION NUMBER ORGANIZATION (If applicable) U. S. Army Research Office .P/f(.03 -97"k - O? Bc ADDRESS (City, State, and ZIP Code) 10. SOURCE OF FUNDING NUMBERS P. 0. Box 12211 PROGRAM PROJECT TASK WORK UNIT ELEMENT NO. NO. NO. ACCESSION NO Research Triangle Park, NC 27709-2211 11 TITLE (Include Securty Classification) On the Computation of Motion from Sequences of Images -A Review 12 PERSONAL AUTHOR(S) J. K. Aggarwal and N. Nandhakumar 13a. TYPE OF REPORT ' 13b. TIME COVERED 14. DATE OF REPORT (Year, Month, Day) jS. PAGE COUNT Reprint FROM TO 16. SUPPLEMENTARY NOTATION The view, opinions and/or findings contained in this report are those of lhe authQr($) and shyuld not be constued as an official Department of the Army position, 17. COSATI CODES 18. SUBJECT TERMS (Continue on revorse if necessary and identify by block number) FIELD GROUP SUB-GROUP sequence of images; computation of motion; optical flow; motion via features 19 ABSTRACT (Continue on revirse if necessary and identify by block number) This paper reviews recent developments in the computation of motion and structure of objects in a scene from a sequence of images. We highlight two distinct paradigms: i) the feature-based approach and ii) optical flow based approach. The comparative merits/demerits of these approaches are discussed. The current status of research in these areas is reviewed and future research directions are indicated. ' ' -. I . . ~E 20, DISTRIBUTION/AVAILABILITY OF ABSTRACT 21. ABSTRACT SECURITY CLASSIFICATION' " 0 UNCLASSFIEOAJNLIMITED C SAME AS RPT. C DTIC USERS Unclassified 22a NAME OF RESPONSIBLE INDIVIDUAL 22b. TELEPHONE (Include Are Co) 22c. OFFICE SYMBOL Ic FOPRM 1473, 94 MAR 83 APR edition may be used until exhausted. SECURITY CLASSIFICATION OF THIS PAGE A i otlii editions are obsolete AU otUNC!_.^ SSIFIED

Upload: others

Post on 01-Aug-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: REPORT DOCUMENTATION PAGE · scientists in this group include neurophysiologists, psy- examples are indicative of the diversity of applications chophysicists, and physicians. The

UNC LASSIFIED MASTER COPY - FOR REPRODUCTION PURPOSESSECURITY CLASSIFICATION OF THIS PAGE

REPORT DOCUMENTATION PAGE j rf -l. REPORT SECURITY CLASSIFICATION lb. RESTRICTIVE MARKING-

Uncblasi fl pd3. DiSTRIBUTION/AVAILAISILITY

OF REPORT

AD-A211 479 , Approved for public release;distribution unlimited.

..- - - ... ... R(S) S. MONITORING ORGANIZATION REPORT NUMBER(S)

A00 __ _ _:ol.L-PH6S. NAME OF PERFORMING ORGANIZATION 6b. OFFICE SYMBOL 7a. NAME OF MONITORING ORGANIZATION

T-he University of Texas (If applicable)at Austin U. S. Army Research Office

6L ADDRESS (City, State, and ZIPCode) 7b. ADDRESS (City, State, and ZIP Code)

Austin, Texas 78712-1084 P. 0. Box 12211Research Triangle Park, NC 27709-2211

Ba. NAME OF FUNDING/SPONSORING T 8b. OFFICE SYMBOL 9. PROCUREMENT INSTRUMENT IDENTIFICATION NUMBERORGANIZATION (If applicable)U. S. Army Research Office .P/f(.03 -97"k - O?

Bc ADDRESS (City, State, and ZIP Code) 10. SOURCE OF FUNDING NUMBERS

P. 0. Box 12211 PROGRAM PROJECT TASK WORK UNITELEMENT NO. NO. NO. ACCESSION NOResearch Triangle Park, NC 27709-2211

11 TITLE (Include Securty Classification)On the Computation of Motion from Sequences of Images - A Review

12 PERSONAL AUTHOR(S)J. K. Aggarwal and N. Nandhakumar

13a. TYPE OF REPORT ' 13b. TIME COVERED 14. DATE OF REPORT (Year, Month, Day) jS. PAGE COUNTReprint FROM TO

16. SUPPLEMENTARY NOTATION The view, opinions and/or findings contained in this report are thoseof lhe authQr($) and shyuld not be constued as an official Department of the Army position,

17. COSATI CODES 18. SUBJECT TERMS (Continue on revorse if necessary and identify by block number)

FIELD GROUP SUB-GROUP sequence of images; computation of motion;optical flow; motion via features

19 ABSTRACT (Continue on revirse if necessary and identify by block number)

This paper reviews recent developments in the computation of motion andstructure of objects in a scene from a sequence of images. We highlighttwo distinct paradigms: i) the feature-based approach and ii) opticalflow based approach. The comparative merits/demerits of these approachesare discussed. The current status of research in these areas is reviewedand future research directions are indicated.

' ' -. I . .~E

20, DISTRIBUTION/AVAILABILITY OF ABSTRACT 21. ABSTRACT SECURITY CLASSIFICATION'"

0 UNCLASSFIEOAJNLIMITED C SAME AS RPT. C DTIC USERS Unclassified22a NAME OF RESPONSIBLE INDIVIDUAL 22b. TELEPHONE (Include Are Co) 22c. OFFICE SYMBOL

Ic FOPRM 1473, 94 MAR 83 APR edition may be used until exhausted. SECURITY CLASSIFICATION OF THIS PAGEA i otlii editions are obsoleteAU otUNC!_.^ SSIFIED

Page 2: REPORT DOCUMENTATION PAGE · scientists in this group include neurophysiologists, psy- examples are indicative of the diversity of applications chophysicists, and physicians. The

PROC 76 8 22791

On the Computation of Motion from Sequences of

Images-A Review

4~,

J. K. Aggarwal 0r"d0-4

N. Nandhakumnar

lecessioni For

VTIS GRAk.T

DTIC TAB

I ~I i'~ycodes

IDIst j

Reprinted fromPROCEEDINGS OF THE IEEE

Vol. 76, No. 8, August 1988

Page 3: REPORT DOCUMENTATION PAGE · scientists in this group include neurophysiologists, psy- examples are indicative of the diversity of applications chophysicists, and physicians. The

On the Computation of Motion fromSequences of Images-A Review

J. K. AGGARWAL, FELLOW, IEEE, AND N. NANDHAKUMAR, MEMBER, IEEE

Invited Paper

The present paper reviews recent developments in de compu- ducting research in computer vision with the objective oftation of motion and structure of objects in a scene from a developing vision systems. Vision systems with the abilitysequence u images. We highlight two distinct paradigms: i; thefeature-based approach and ii) optical flow based approach. The to navigate, recognize, and track oblects and estimate theirc-rmparative merits'dements of these approaches are discussed, speed and direction are the ultimate goals of the latterThe current status of research in these areas is reviewed and future research. The knowledge and results of research in neu-research directions are indicated. -. rophysiology and psychophysics have influenced the

design of vision systems by engineers and scientists. At the1. INiR)ODU(tION same time, results in computer vision have provided a

The ahilit, to, discei i objects, ZLel rain their motion, and framework for modeling biological vision. Such cross-fer-

navigate in three-dimensional space through the use of tilization of ideas will continue to yield better models for

vision is almost universal among animals. Incorporating biological and machine vision systems.

such vision in machines is ostensibly a straightforward task There is a long list of applications motivating a strong

given the widespread availability of microcomputers, dig- interest in sensing, interpretation, and description of

itizing cards, and solid-state cameras. Although it is fairly motion from a sequence or a collection of images. The auto

easy and inexpensive to assemble a computer vision sys- matic tracking and possible ticketing of speeding vehicles

tem, it has proved surprisingly difficult to achieve a vision on a highway is of interest to traffic engineers and law

capability in machines, even to a limited degree. This is not enforcement officers. The automatic recognition, tracking,to imply that we are not using all sorts of vision systems and and possible destruction of targets is of immense interest

motion detectors in a variety of applications. However, the to the department of defense of every country. The com-

easewith which humansdetect motion and navigate around putation, characterization, and understanding of human

objects, and the difficulty of duplicating these capabilities motion in dancing, athletics, and pilot training are impor-

in machines have recently led to major efforts by computer tant to several diverse disciplines. The analysis of scinti-

engineers and scientists to understand vision in man and graphic image sequences of the human heart is of interest

machine. These efforts are in addition to and perhaps com- in assessing motility of the heart in diagnosis and super-

plement current and earlier endeavors at understanding vision of patiets after heart surgery. Satellite imgcry pro-human vision and motion by psychologists and physiolo- vides an opportunity for interpretation and rrediction of

gists. atmospheric processes through the estimation of shape and

Broadly speaking, there are two groups of scientists motion parameters of atmospheric disturbances for the

studying vision. One group is studying human/animal vision meteorologist. The bandwidth reduction achievable

with the goal of understanding the operation of biological through the estimation of motion allows for compression

vision systems including their limitations and diversity. The of image sequences for efficient transmission. The above

scientists in this group include neurophysiologists, psy- examples are indicative of the diversity of applicationschophysicists, and physicians. The second group of sci- where the computation of motion from a sequence of

entists includes computer scientists and engineers con- images is of critical importance.This broad interest in the interpretation of motion from

a sequence of images has been evident since the first work-Manuscript received December 18,1987; revised Marc h It, 1988. shop on motion in Philadelphia in 1979 11]. Since that work-

The research was supported in part by a grant from the Army shop, several additional meetings and special issue; of va--Research Offi(e under Contract DAAL03-87-K-0089, and a grant fromthe National Science Foundation under Contiact NSF/DCR-8.517583. ious journals have contributed to the exchange of ideas and

The authors are with the Computer and Vision Research Center the dissemination of results. In addition, there have beenCollege of Engineering, The University of Texas at Austin, Austin, several sessions on motion and related issues at meetingsTX 78712, USA. such as the IEEE Computer Society Computer Vision and

IEEE Log Number 8822791. Pattern Recognition Conference and conferences of other

0018-92 19/88/0800-0917$01.00 1988 IEEE

PROCE[DINGs OF iH IEEE, VOL. 76, NO. 8, AUGUST 1988 917

Page 4: REPORT DOCUMENTATION PAGE · scientists in this group include neurophysiologists, psy- examples are indicative of the diversity of applications chophysicists, and physicians. The

societies interested in vision. Thelistofworkshopsandspe- formulated based on assumptions such as rigid bodycial issues devoted exclusively to motion and time-varying motion, i.e., the 3-D distance between two features on aimagery include three special issues [21-[4[, two books [51, rigid body remains the same after object/camera motion.[6], a NATOAdvanced Study Institute[71, anACM workshop Such constraints usually result in a system of nonlinear[8], a European meeting on time-varying imagery [91, and a equations. Theobserved displacementofthe2-Dimagefea-host of survery papers [10]-[15]. The extent of the breadth tures are used to solve these equations leading ultimatelyand depth of interest is provided by the table of contents to the computation of motion parameters of objects in theof the book published to document the proceedings of the scene.NATO-ASI [16]. However, this list is incomplete at best. The The other approach is based on computing the optic flowIEEE Computer Society workshop at Kiawah Island [17] and or the two-dimensional field of instantaneous velocities ofSecond International Conference in Italy [18] are indica- brightness values (gray levels) in the image plane. Insteadtions of the broad interest in motion at this time. The recent of considering temporal changes in image brightness val-two-volume collection of papers in the reprint series [19] ues in computing the optic flow field, it is possible to alsopublished by IEEE Computer Society includes a section on consider temporal changes in values that are the result ofImage Sequence Analysis containing nine papers. The applying various local operators such as contrast, entropy,recent book edited by Martin and Aggarwal entitled Motion and spatial derivatives to the image brightness values. InUnderstanding: Robot and Human Vision [20] gives eleven eithercase, a relativelydense flow field isestimated,usuallypapers detailing recent developments in this area. at every pixel in the image. The optic flow is then used in

The above brief chronology documents the contribu- conjuction with added constraints or information regard-tions from a computer vision perspective. It is not the inten- ing the scene to compute the actual three-dimensional rel-tion of the present review to slight the earlier pioneering ative velocities between scene objects and camera.works of psychologists and other scientists. In particular, A task that is closely related to the estimation of motionthe kinetic depth effect demonstrated by Wallach and is thetask of estimation of the structureof the imaged scene.O'Connell [21] through the use of wire frame objects, and In the case of the optic flow method, this consists of group-similar effects shown by Gibson [22] in his translucent sheet ing pixels corresponding to distinct objects into separateexperiments, UlIman [23] in his rotating cylinders experi- regions, i.e.,segmentingtheoptic flowmap, and then com-ment, and Joahannson [24]-[28] are important contribu- puting the three-dimensional coordinates of surface pointstions in the area of psychophysics of motion perception. in the scence corresponding to each pixel in the image atIn the same vein, the contributions of Hubel and Wiesel [29] which [ie flow is computed. In thecaseof the feature-basedin demonstrating the existence of specialized cortical cells analysis, computing structure corresponds to formingtuned to the detection of motion are seminal contributions groups of image features for each object in the scene andin neurophysiology. The present review, howeve-, is only then computing the 3-D coordinates of each object featureaimed at the computer vision inspired contributions to the associated with each image feature.study of motion. A more balanced review of the recent con- Although structure may be computed independent oftributions in both psychophysics of vision and machine motion, e.g., via stereopsis, the former process can benefitvision is found in [20]. by the estimated motion. Knowledge of motion parameters

In this paper we do not present an exhaustive compen- for features/regions can aid segmentation of image fea-

dium of recent research in the computation of motion and tures/regions corresponding to distinct objects. In ste-structure from sequences of images; instead we list some reopsis, knowledge of object motion can facilitate estab-of the important work done and provide a flavor of the lishment of feature correspondence within a pair of stereoapproaches that have been developed, images, thus aiding the determination of structure. Image

regions with different apparent 2-D motio.is can be con-sidered to correspond to distinct objects. Psychological

research has collected enough evidence to support theThe relative motion between objects in a scene and a belief that the process of establishing correspondence and

camera, gives rise to the apparent motion of objects in a the process of estim i, structure and motion are closelysequence of images. This motion may be characterized by interwoven in the hi,- -r visual mechanism. Indeed, UII-observing the apparent motion of a discrete set of features man has shown that ap, nt motion is a clue used by theor brightness patterns in the images. The objective of the human visual system for computing scene structure [6]. Thisanalysis of a sequence of images is the derivation of the close relationship between the estimation of structure andmotion of the objects in the scene through the analysis of the estimation of motion has prompted many researchersthe motion of features or brightness patterns associated to address both tasks as a combined problem. In this paperwith objects in the sequence of images. we discuss the combined task of computing structure and

Two distinct approaches have been developed for the motion from image sequences.computation of motion from image sequences. The first of In the following sections we discuss in greater detail thethese is based on extracting a set of relatively sparse, but fundamental principles underlying the two distinct meth-highly discriminatory, two-dimensional features in the odologies forcomputing 3-D motion from apparent motion.images corresponding to three-dimensional object fea- The basic mathematical iormulations are introduced andtures in the scence, such as corners, occluding, houndaries discussed. In Section III we discuss the feature-basedof surfaces, and boundaries demarcating changes in sur- method for estimation of motion from a sequence of mon-face reflectivity. Such points, lines and/or curves are ocular images. In Section IV we discuss the optic flowextracted from each image. Inter-frame correspondence is method for sequences of monocular images. Section V dis-then established between these features. Constraints are cusses the relative merits and demerits of these two

918 PROCEEDINGS OF THf fiE, VOi. 76, NO. 8, AUGUST 1988

Page 5: REPORT DOCUMENTATION PAGE · scientists in this group include neurophysiologists, psy- examples are indicative of the diversity of applications chophysicists, and physicians. The

approaches. The two approaches outlin-d above allow for The unit vector ui lies on li which is spanned by (pi, qi),the estimation of motion without requiring that scene hencestructure be known apriori. The use of stereopsis allows for uii = UPi + ,qi, where u-, +the estimation of depth, i.e., the distance from the sensor +e

to the objects. The additional information available greatly The unit vector u~i also lies on Hj which is spanned byreduces the complexity of motion estimation. The variety (p, qi), henceof ways in which stereopsis can be used to facilitate thecomputation of motion is outlined in Section VI. Finally, ui = "1,P, + 6,,q i, where y + = 1.Section VII concludes this paper with a few closing remarks. From the latter two equations we obtain

Ill. F--IURE-B-\sti Li.Io N EsiI,-A\ION FRO M '1 MONOCULAR (Xipi + j,1qi = -yupi + 6uqiIMAGE Si QtL-A I and taking the scalar product of this equation with a,b, and

In this section, :ve discuss the feature-based approach to c we get:

estimate motion from a sequence of images gathered by a a,,x,, + j3,1y., = "h'x., + 6,Jy,,single camera. A mathematical formulation is presented andvariations of this formulation are discussed. The discussion Qx,, + ,y ,, x1, +focuses on the estimation of both motion and structure. No aIx, + i3ly,, = ,,x,distinction is made between the situations where a) the These equations are linearly independent [6] and possesscamera is moving and imaged scene is stationary, b) camera the s hat are eal indepn den t a ve po -is stationary while the imaged objects are in motion, or ) t sins hat are eu in magnt e ve opoboth camera and imaged objects are in motion. What is sie sin C h e ofstes soluions, the1vctorscomputed is the relative position and motion between the aredetermnd lheditnu u - u2ae npd2 Whnncamera and the imaged scene. In the following liscussion U21 J, and d3 u - are then computed. When noit is assumed that image features, such as points and lines, two vectors u1 are equal, then di # 0 and a unique triangle

with sides dl, d2, and d3 is specified. Consider the tetra-have been extracted from each image and inter-frame hedron formed by this triangle and the origin 0, with thefeatures vertices of the triangle being placed at unit distance from

the origin 0. From the projections of A, B, and C on theWe presencblw threqpproaches. Toe featurese three planes (images) a unique 3-D configuration is easilyanalysis of monocular image sequences. The first of these coptdInheegerecaie.wntoofheu

is the direct formulation in which rigid body motion is computed. In the degenerate case, i.e. when two of the u iassumed. In this formulation the rigidity constraint is man- are identical, straightforward trigonometric considerations

ifest in there being single rotation and translation matrices provide recovery of the structure and motion of the bodyforall observables. In the second approach rigidity is explic- [23.

itly invoked with the formulation being based on preserv- Although the parallel projection model is adequate in

ing rigidity, e.g., preserving the angle between two inter- some situations it is not appropriate for most real-world

secting 3-D lines lying on a rigid object. These two schemes applications which mandate the use of perspective pro-

use two or three views to estimate structure and motion. jection. Theuseof perspective transformation substantiallyA third approach consists of using a long sequence of mon- increases the complexity of the problem. Roach and Aggar-ocular images. A brief description of the salient featurms of wal [30], [31] were among the first to compute structure andeach approach is presentedi motion from images via the perspective imaging transfor-

mation. A scenario consisting of a static scene and a movingA. Direct Formulations camera was assumed. The goal was to investigate whether

it would be possible to determine the position of the pointsAn orthographic imaging model was used by Ullman [6], in space and the movement (translation and rotation) of the

[23] to estimate the structure and motion of an object camera.undergoing rigid motion. The position and motion of four The equations that relate the three-dimensional coor-noncoplanar points in space were recovered from three dis- dinates of a point (X, Y, Z) and its image plane coordinatestinct orthographic projections of these points. The for- (x, y) aremulation is as follows. Let 0, A, B, and C be the four points.The orthographic projection of these points in three dis- x = Ftinct planes H., 112, [13 are given and the 3-D configuration a (X - X,) + a1,(Y - Yo) + a, Z - Z,)of these points is to be determined. A fixed coordinate sys- a, (X - X,,) + a2Y - Y,0 + a, V - Z,,)tem with origin at 0 is chosen. Let a, b, c, be the vectors from y = F -a.(X X) + a:(Y - Y.) + a, 1(Z - Z.)o to A, B and C, respectively. Let each image have a coor- ai(X - 4) + a,(Y Y) + a,(Z - Z,)"

dinate system with its origin at the projection of 0, and its Here F is the focal length, (X, Y, Z,) is the projection cen-axes along the directionspi, q,. Note thatpiand qiare orthog- ter and a, , * • , a,, are functions of (0, 4, 4'), the ori-onal unit vectors on Ili. Let the image coordinates of (A, B, entation of the camera with respect to the global referenceC) on li be (x,, y,,, Xb,Yh,, x ,y,,), and let ui be the unit vector system.along the intersection of Hi and Ill. Roach and Aggarwal showed that five points in two views

The image coordinates are given by the dot products are needed to recover these parameters (30], [31]. Theyrelated the number of points and the number of equations

A, api, y,,, = a'q, xh, = b)p, available for the solution of 3-D coordinates and motion

v, = b'qj, x,, =c-p, y, c q,. paia,i ei , ,s follnw\.: The g 4i 0rdinat' t eih poit

AGGARWAL AND NANDHAKUMAR: COMPUTATION OF MOTION Of IMACAs qi9

Page 6: REPORT DOCUMENTATION PAGE · scientists in this group include neurophysiologists, psy- examples are indicative of the diversity of applications chophysicists, and physicians. The

are unknown so the five points produce 15 variables. The planar surface patch AX + BY + CZ = 1, where A, B, andcamera position and orientation parameters (X, Yo, Z1, 0, C are the structure parameters. The mapping from theCand *,) in two views contribute another 12variablesvield- (x, Y) space to the (x, y') space (from one image to the nexting a total ot 27 variables. Fach 3-D point produ( es two pro- image) is given bylection equations per camera position thus forming a total aix+ay+a, ' +ary+a,ot 20 nonlinear equations. To make the number of equa- x- y I

tions equal the number of unknowns, seven variables must a-X + ajy + 1 a.x + aty + 1

be known or specitied a priori. This is achieved by (hoosing where, a, through a, are the eight "pure parameters" andthe si\ camera parameters ot the first view to be zero and can be expressed in terms of the focal length, the structuresetting the Z-component of one of the five points to an arbi- parameters (A, B, C), and the motion parameters Nx, N, Nz,trarv positive constant to fix the scaling factor. The reason 0, T , T) and T(N spe-ifies the rotation axis, 0 is the rota-tor tixing one variable as the scaling constant is that under tional angle, and T is the translational vector). For a partic-the given camera/object constraints the information ular set of pure parameters, the above equation representsembedded in every image sequence is inherently insuffi- a mapping from (x, y) space to lx', y') space. A set of linearcient tor determining the correct scale. For example, the equations is solved fu. these eight pure parameters.observed projected motion of an object moving in space After the eight pure parameters are obtained, the struc-can be reproduced by another object which is twice as large, ture and motion parameters can be determined. Here, thetwice far away from the camera, translating twice as fast, Z component of the translation vector is arbitrarily chosenand rotating with thesame speed around an axisof the same to fix the scale. After a series of manipulations, it .: possibl_orientation as the former object. In general, the informa- to get a sixth-order polynomial equation in terms of onlytion of the absolute distance of the object from the viewer one of the variables T', = T\/T/. T' is solved first and thenis usually lost in the image formation process. Therefore, all the remaining structure and motion parameters can bearbitrarily setting the scale is not unreasonable in finding easily obtained. Although potentially six real roots maythe solution for the structure and motion parameters. result from solving a sixth-order polynomial, the authors

An iterative finite difference Levenberg-Marquardt algo- reported that aside from a scale factor for the translationrithm was used to solve these 18 nonlinear equations (after parameters, the number of real solutions never exceededfixingthe scale factortwoof the20 nonlinearequations have two in their simulation.no unknown variables in them). For noise-free simulations, Later, Tsai and Huang [36] investigated the problem of athe methods typically converged to the correct answer curved surface patch in motion. Two main results werewithin 15 seconds on a Cvber 170/50 and hence are rea- established concerning the existence and uniqueness ofsonablv efficient. If noise is introduced into the point posi- the solutions. An E matrix was specified as E = TR, wheretions in the image plane, a considerably overdetermined T is the translation and R is the rotation. Given the imagesystem of equations is needed to attain good accuracy of correspondences of eight object points in general posi-the results. Two views of 12 or even 15 points, or three views tions, the E matrix can be determined uniquely by solvingof seven or eight points are usuall\ needed in the noisy eight linear equations. Furthermore, the actual 3-D motioncases. parameters can be determined uniquely given E, and can

Unlike Roach and Aggarwal [30], [31] who solved the be computed by taking the singular value decompositionmotion parameters through a single system of equations of Ewithout having to solve nonlinear equations. Detailedthus creating a large search space, Nagel [32] proposed a proofs of these claims are presented by the authors [36].technique which reduces the dimension of the search space Although the approach results in the solution of a sct ofthrough the elimination of unknown variables ine impor- linear equations, the system is highly sensitive to noise andtant observation made by Nagel was that the translation especially to perturbations of image coordinates. Longuet-vector can be eliminated and the rotiion matrix can be Higgins [37], [38] worked independently to obtain results

solved separately. A rotation matrix is completely specified similar to those described above. He derived the E matrixby three parameters-namely the orientation of the rota- and presented a method to recover R and T from E usingtion axis and the rotation angle around this axis. It is shown tensor and vector analysis.that if measurements of five points in two views are avail- Extensions of the above approaches were proposed byable, then three equations can bewritten and the three rota- several researchers [39]-[43]. One limitation of thetion parameters can be solved for separately from the trans- approaches developed by Tsai and Huang [36] and Longuet-lation parameters. The distance of the configuration of Higgins [37] is the requirement of a priori knowledgepoints from the viewer is arbitrarily fixed and the translation regarding nonzero translation. Zhuang and Haralick [39]-vector can then be determined. [41] have developed an algorithm which overcomes this lim-

Tsai and Huang [33]-[35] proposed a method to find the itation. Zhuang and Haralick do require that the observedmotion ofa planar surface patch from 2-D perspective views, object points do not lie on a specific quadratic surface pass-The algorithms consists of two steps: First, a set of eight ing through the origin. Faugeras, Lustman and Toscani [42]"pure parameters" is defined. These parameters can be and Nagel [43] reformulated the problem in more robustdetermined uniquely from two successive image frames by manners as least-mean-squared error minimization prob-solving a set of linear equations. Then, the actual motion lems.parameters are determined from these eight "pure param- The above approaches used 3-D points and their projec-eters" by solving a sixth-order polynomial. tions on the image planes as observables in formulating the

By exploiting the constraints of projective geometry and problem. An alternative approach is to use 3-D lines andrigid motion, equations can be written to relate the coor- their projections as observables. When lines are used asdinates of image points in the two frames for points on a features, two views are no longer sufficient and a minimum

920 PROCEEDINGS Of THE ItEE, VOL. 76, NO. 8, AUGUST 1988

I A

Page 7: REPORT DOCUMENTATION PAGE · scientists in this group include neurophysiologists, psy- examples are indicative of the diversity of applications chophysicists, and physicians. The

of three views are required. This is due to the fact that 3-D point projecting to the image origin, these circles projectlines possess an additional degree of freedom when com- into ellipses. The structure of the rigid object can bepared to 3-D points. In other words, one can slide a 3-D line recovered to within a reflection by finding the equationsalong itself and obtain the same line. We presen below an describing the ellipses. Furthermore, it is shown that theoverview of some techniques that use lines as features in lengthsofthelongand short axesofan ellipsearefunctionsthe estimation of structure and motion, of the position of the point in space. The position of each

Yen and Huang [441, [45] have proposed an iterative pointinspace(uptoareflectionabouttheimageplane)canmethod 5ased on spherical projection and on the obser- then be recovered provided that the fixed axis o; rotationvation of seven line correspondences in three views for the is not parallel or perpendicular to the image plane.case of general motion between views. Liu and I luang [46], A jointed object is an object made, up of a number of rigid[47] have used line correspondences in formulations anal- parts which cannot bend or twist. If the jointed object stillogous to the methods outlined above. They decompose moves in a way such that the fixed-axis assumption holdsrigid body motion into first a rotation around an axis for each rigid part, then the motion and structure of thethrough the origin and then a translation. For the case of jointed object can be recovered. It is assumed that the rigidpure rotation, two line correspondences over two frames parts are connected by joints identified since they satisfyare sufficient to determine the rotation matrix. The result- two sets of motion constraints. If the joints are not visible,ing nunlinear equations are solved iteratively. For the case they can be found by solving a system of linear equations.of pure translation, five line correspondences over three The joints can then be used to eliminate some reflectionstrames produce a system of linear equations which can be and thus the number of possible interpretations of struc-solved to determine the translat:on. For the general case, ture is reduced. Finally, the 3-D motion of each object isLiu and Huang use six line correspondences in three frames. reconstructed.The iotation matrix is first determined and then the trans-lation matrix is computed. Simulations of the iterative algo- B. Explicit Use of Rigidityrithm on synthesized data show that the approach is highlysensitive to noise and initial estimates. Moreover, esti- The assumption of rigid body was implicitly used in themation of the translation vector is very sensitive to errors above formulations. Weoutline below a typical formulationin estimation of rotation. The algorithm has not been tested in which the constraint of rigid body motion is explicitlyon real data. invoked [49]. We discuss the case where five points in two

A more robust formulation of motion estimation using views are used as the observables. As in the above discus-

line correspondences, which incorporates the effect of sion, the relative positions of the cameras are unknown,noise, is due to Faugeras, Lustman and Torscani [42]. An and the correspondence between points in the two viewsextended Kalman filtering approach is followed in solving is assumed known.the nonlinear equations for a"best"estimateof the motion The two central projection imaging systems are shownparameters. The "best" estimate is defined to be one that in Fig. 1. C and C, are the centers of projection and I andminimizes an expression that involves the measurables, theunknowns, and partial derivatives of the nonlinear equa-tion that relates the unknowns to the measurables. Themeasurables for each 3-D line consist of three vectors, onefor each of the three image planes. Each vector correspondsto the unit normal ot the plane containing the projectionof the 3-D line and the center of projection for that image o1plane. The unknowns consist of the rotation parameters xthat relate the positions of the three image planes. After s1 s, -solving for the rotation, the translation is computed via lin- 0

ear equations. The structure of the object can then be com- Zputed via either a least-squares technique or via the Kalmanfiltering approach. Significant improvement was reported Fig. 1. Imaging geometry tor the two views. P, is the 3-Din sensitivity to noise and initial estimates. point, p, and q, are the images ot Pon the two image planes.

Implicit in the above discussion was the assumption thatthe scene contained a single rigid object. Feature-basedmotion analysis has also been applied to scenes containing 12 are the image planes. A point P, in space with coordinatesmultiple rigid and jointed objects. Webb and Aggarwal [481 X, Y,, Z,,) in S and (U,, V,, W) in S2 is imaged asp, on 1 andhave presented a method for recovering the 3-D structure q, on /2. The objective of the analysis is to derive the struc-of such s(enes under orthographic projection. The fixed- tureof the points and the transfori~ation between the coor-axis assumption is adopted to interpret images of moving dinate systems, given the image coordinates oftheobservedobjects. The fixed-axis assumption asserts that every rigid points in the two imaging ( oordinate systems.object movement consists of a translation plus a rotation Because P, is on line C,p, (refer to Fig. 1), there exists a realabout an axis which is fixed in direction for a short period number X, > 1 such thatof time. It is shown that, under the fixed-axis assumption, X, = Y, Z, Fselecting any point on a rigid moving object as the originof a coordinate system causes the other points to trace out where (x,, y,) are the coordinates of p, in the /,-image coor-circles in planes normal to the fixed-axis within that coor- dinate system, and f, is the distance from C, to the imagedinate system. Under parallel projection, with the selected plane. Similarly, P, is on line C.,q, and if (u,, v) are the coor-

AGGARWAL AND NANt)HAKUMAR: COMPUTATION OF MOTION OF IMAGES 921

Page 8: REPORT DOCUMENTATION PAGE · scientists in this group include neurophysiologists, psy- examples are indicative of the diversity of applications chophysicists, and physicians. The

dinates of q, in the I, coordinate system then there exists then one can show that-% > 1 such that cos 0 = (a , + a5 4- a, - 1)/2; sin 0 = (a. - a8)12n,

U; = -Y =I,, VV, = (1 n, = /(a, - cos 0)/(1 - cos 0)

The squared distance between points P, and P expressed n, = (a, + a4) (1 - cos 0)2n,in S, is therefore

d"(SI) = (X, - X)2 + (Y, - Y,)2 + (Z, -Z n,=(a1+a 7 )(1 - cos0)12n.

The algorithm has been shown to perform well on bothor real and synthetic data, and these results are presented in

d (Sl = (X,x, - Xx,) ( Xy y,)" + ( , -[49]. The use of lines as observables in an approach similar to

Similarly, the squared distance between P and Pexpressed theoneoutlined above hasalso been attempted byMitiche,

in S, is Seida and Aggarwal [50] who used the principle of angularinvariance between 3-D lines on a rigid body undergoing

d (S,) = (ru, - "1 u,)2

+ ( v, - - (', - y,)2 f. motion. In their method the orientation of lines is firstrecovered, then the rotational component is computed, and

Now, the principle of conservation of distance allows us to finally, the translation is recovered. The observation of fourwrite (assuming, of course, identical units of measurement lines in three views allows for the determination of struc-in S, and So2 ture and motion parameters.

The use of line correspondences has the advantage overd,(S,) = dS. the use of point correspondences in that extraction of lines

or in images is less sensitive to noise than extraction of points.Also, it is easier to match line segments between images

(X,x, - Xx)" +i (Xvy, - \y)-2 + IX, - Xl)f 2 than it is to match points.It is possible to use both lines and points concomitantly

= (-t,u, - ")uY ' + ("v, - " + (, - -',)"f . (3.1) in formulating the task. In the case of combined point and

It may be seen that each point P, contributes two line correspondences, four points and a line in two views

unknowns, X, and -,, and each pair of points (P,, P,) gives are sufficient to compute the structure of the scene as well

one second order equation (3.1). Therefore, 5 points yield as the displacement between views as described by Aggar-

10 equations and 10 unknowns. Again, fixing the scale arbi- wal and Wang [51].

trarily, we end up with a system of 10 equations in 9 The following observations may be made based on the

unknowns. Note that each equation involves only 4 of the current literature:

unknowns. Since distances between points define struc- 1) Using points or lines, or combination of points andture only up to a reflection in space, the solution of system lines for the computation of structure and motion(3.1) based on these distances is also subject to this uncer- usually gives rise to nonlinear equations.tainty. System (3.1), although simple, is nevertheless non- 2) The computation based upon minimum number oflinear. Experimental results using existing iterative numer- point, or lines is usually more sensitive to noise per-ical methods do indicate, however, that the solution is well turbations.behaved [49]. 3) In general, alternate formulations may give rise to dif-

When the position of the points has been computed, ferent sufficiency conditions regarding minimumdetermining the relative position of the cameras becomes numberof pointsand lines required for solving struc-a simple matter. Indeed, take 4 noncoplanar points (from ture and motion.the 5 observed points in space) and call A, and A2 the matri-ces of homogeneous coordinates of these in 5, and S2, C. Using Extended Sequences of Monocular Imagesrespectively. Then if M is the transformation matrix (inhomogeneous coordinate form) that takes Sr onto S we The approaches outlined above attempt to recover struc-

have ture and motion from a limited number of views of thescene, typically 3 or 4 views. We discuss below some tech-

A, = AIM. (3.2) niques that use long sequences of monocular images torecover structure and motion.

Since the 4 points are not coplanar, (3.2) can be solved The first of these is the incremental approach whichfor M. Now if we decompose motion M into i) a rotation allows for deviations from rigid body motion. This differsthrough angle O about anaxisthrough theoriginwithdirec- from the approaches outlined above which assumed thattion cosines n1, n2, n1, followed by ii) a translation (t,, t), ti) the object being imaged undergoes rigid body motion. Psy-and if it is written as chophysical studies have shown that the human visual sys-1 temcancopewith lessthan strict rigidity [52],[26],[27]. These

a, a2 a! 0 studies prompted UlIman to devise an algorithm that

a4 aj a. 0 recovers the 3-D structure of viewed objects in an incre-

M =mental manner using several views of an object in motiona7 a aq 0 [52]. The performance of the algorithm is argued to be com-

t' t' t' 1 parable to that of the human visual system because it pos-sesses the following characteristics [52]:

922 PROCEDINGS OF THE IEEE, VO[. 76, NO. 8, AUGUST 1988

Page 9: REPORT DOCUMENTATION PAGE · scientists in this group include neurophysiologists, psy- examples are indicative of the diversity of applications chophysicists, and physicians. The

1) At each instant there exists an estimate of the 3-D Kalman filter is employed for recursive estimation of thestructure of the viewed object. The internal model motion parameters. The object is assumed to be transpar-M(t) of the viewed structure at time t may be initially ent so that feature points are always visible and corre-crudeand inaccurate,and may be influenced by static spondence is assumed to have been established a priori.sources of 3-D information. The unknown model parameters are represented as a vec-

2) 1 he recovery process prefers rigid transformations. tor:3) It is able to integrate information from an extended [xc xc zc zc pl p2 w]7

viewing period.4) The recovery process tolerates deviations from ri- where, (xc, zc) is the location of the center of mass of the

gidity. object, (xc, zc) is the object translational motion, pi and p25) It eventuallv reccvcrs the correct 3-D structure, or a are unknown phase angles of moment arms ri and :2 that

close approximation to it. connect the two feature points to the center of mass. Hererl and r2/rl is ass' 'med known. The differential equation

A parallel projection system is used. M(t) consists of a set describing unforced motion is written in terms of the aboveof 3-D coordinates (X,, Y,, Z,) where (X,, Z,) are the observed vector as:image plane coordinates of a point and Y, is the depth. Theestimation of structure therefore consists of finding Y,. An X(t) = [xc 0 xc 0 w w 0]initial set of values is chosen for the Y,. Consider the sit- with arbitrary initial conditions xc(t), zc(t), pl(t), and p2(t).uation at time t'. Let (x,, y,, z,) be the new structure of the This system yields the following state equation:corresponding points. The task is to find y, while minimiz-ing deviations from rigidity. The deviation from rigidity is x(k + 1) = F(k) x(k)defined as follows. Let L, denote the distance between wherepoints i and at time t. Let L, denote the distance betweenpoints i and / at time t'. Under rigid motion L, should be xIk) = [xc(k) xc(k) zclk) zc k) pl(k) p2(k) w(k)]' andequal to L,,. The deviation in rigidity is expressed as -1 7 0 0 0 0 0

E D, where D,-= L) 0 1 0 0 0 0 0E= D, whr D,- L, 0 0 1 T 0 0 0

and the summation is for all i,/. F(k) = 0 0 0 1 0 0 0Two modifications to the basic scheme were explored

[52]. These included using different metrics for measuring 0 0 0 0 1 0 7the deviation from rigidity and allowing for a correction inthe initial model M(t). Simulations using synthetic datawereconducted. Results indicate that tij. model does arrive at 0 0 0 0 0 0 1a good approximation to the 3-D structure after severalviews, but does not converge to the exact solution. Also, Here, m is the time interval between successive images. Thethe solution is unique upto a mirror reflection. The mod- measurement model is given byification involving aflexiblemodelquicklyarrivedatagood X1 = L[xc + rl cos (pl)]l[zc + rl sin (pl)] = hl[x(k)]approximation with a few views but with additional viewsthe estimated structure oscillated about the correct solu-tion. An analysis of the convergence properties of this algo- where X1 and X2 are the images of the two feature pointsrithm has also been carried out by Hildreth and Grzywacz and L is the focal length of the sensor. The vector repre-[53]. They have also suggested a continuous formulation of sentation is given bythe above approach wherein instantaneous velocities of the X(k) [X1(k) X2(k)]= h[x(k)] + n(k)points are used instead of point positions.

Although it is argued that such a formulation is warranted where h[x] = [hl (x) h2(x)] and n(k) is the term correspondingwhen arbitrarily close frames are used, the results of Hil- to zero mean, Gaussian, spatially correlated, and tempo-dreth and Grzywacz indicate that local velocity information rally white noise.is insufficient to solve the problem, even when the object The above formulation is then used to design an iteratedis viewed over an extended period. The major limitation of extended linear Kalman filter that solves for the state vari-the incremental approach discussed above is that it per- ables-in this case the translation and rotation parameters.forms well only when objects rotate about a fixed axis. In The performance of the algorithms on Monte Carlo sim-addition, orthographic projection is not generally valid. The ulations are discussed in [54], while extensions of thisapproach does however illustratethe importanceof motion approach are presented in [551.in the perception of structure. Weng, Huang and Ahuja [56] have proposed a method of

BroidaandChellappa[54]considerthecaseofarigidbody characterizing rigid body motion from long monocularundergoing constant translational and rotational motion. image sequences, i.e., over extended viewing periods. TheirThis assumption allows for a formulation in which the num- approach involves first extracting structure and motionber of unknown model parameters does not increase with parameters with two views of 8 points [33]-[36] and thenthe increase in the number of image frames. A two-dimen- computing the trajectory of the rotation center which is thesional object undergoing one-dimensional motion is center of mass or some fixed point of the object. Theyassumed. They also assume that the object structure is assume that the angular momentum of the object is locallyknown and attempt to recover the motion parameters. A constant and the object possesses an axis of symmetry. They

AGGARWAL AND NANDHAKUMAR: COMPUTATION OF MOTION Of IMAGES 923

Page 10: REPORT DOCUMENTATION PAGE · scientists in this group include neurophysiologists, psy- examples are indicative of the diversity of applications chophysicists, and physicians. The

argue that if motion is smooth and the time interval covered Fang and Huang [59] have presented experimental resultsby the model is relatively short, then the trajectory of the of motion parameter estimation using a modified versionrotation center can be approximated by a polynomial. The ot an algorithm initially developed by Ranade and Rosen-developed model is applied to subsequences of images to feld [60]. The relaxation algorithm is modified by incor-estimate the trajectory and predict the new locations of porating different scales to allow for large scale change, inobject points. I he main characteristic of interest is the exis- the images (due to large translations in depth). Anothertence of precessional motion and the parameters thereof, relaxation technique for establishing correspondence isA least-squares method is adopted to compdte the param- due to Kim and Aggarwal [61]-[6-] who have applied their,ters. The authors present a detailed analsis of the rela- technique to matching features in stereo imagery as welltionship between the parameter,; of precessional motion as for matching 3-D features in depth maps. Barnard andand discrete two-view motion. The simulations discussed, Thompson [64) have proposed an iterative relaxation label-however, deal only with 3-D point sets and no testing has ing technique for matching features in stereo imagery basedbeen conducted using real data extracted from nonocular on smoothness in change ot depth. This method may beimage sequences. applied to matching features in two monocular images

based on smoothness in spatial displacement of image lea-

D. The Correspondence Problem tures. Prager and Arbib 165] des(ribe a technique similar to

In the above discussions it is repeatedly assumed that Barnard and Thompson but have in(luded an additional

correspondence %%as available between features extrac ted temporal constraint on feature displa ements Many otherapproaches to matching image features (an be tound in

from one image in a sequence of images and those emtra( ted recent literature, for example see [661-[681.from the next image. The task of establishing adIn this sectin we discussed the teaure-based -xtractiontaining such correspondence is, however, nontrivial. The of motion from mono(ular image sequenes. It \,asambiguity is aggravated bv the effects ot occlusion whir h assumed that image features, such as points and lines, hadcause features to appear or disappear and also give rise to been extrar ted trom each imae and inter-trame corre-

"false" features. The development of robust techniques to spondence had been established between them. Three

solve the correspondence problem is an active area ot apoadece he be ered discssed the. refo

research that is still in its infancy. We present a brief approahes to the problem were discussed: the dire I fir-

description of a few of the appro uaches developed. The formulation in which rigidity is expli itly invoked,

problem of finding correspondence is common to other athrduappoa ing log seneo monokula

areas of computer vision such as stereoscopy and optic flow. images.

Some of the techniques developed for solving the corre-

spondence problem in these other areas can be applied to IV. Ofm'i Fion% B-\fi Moi -, Ei to..,imii .

the feature-based analysis of monocular images as well, andvice versa. In this section we present approaches in which the

Aggarwal et al. [571 have classified correspondence pro- instantaneous changes in brightness values in the imagecesses into two categories: those that are based on iconic areanalyzedtoyield adensevelo(ity map called image flowmodels and those that are based on structural models. The or optic flow. The three-dimensional motion and structureformer class consist of templates extracted from the first parameters are then coml-uted based on various assump-frame which are then detected in the second and subse- tions and/or additional information. No correspondencequent frames. The second approach consists of extracting between features in successive images is required.The optictokens with a number of attributes from the first image, and flow techniques rely on local spatial and temporal deriv-using domain constraints and structural models to match atives of image brightness values. This approach, as will bethese tokens with those extracted from the second and sub- evident from the following discussion, is distinct from thesequent images. The latter approach is computationally feature-based analysis of monocular image sequences dis-

more expensive but also more robust than the former, cussed in the previous section where 1) a relatively sparseSethi and Jain [58] descrihe a method for finding corre- set of two-dimensional features is extracted from the

spondence and maintaining correspondence between fea- images, 2) inter-frame correspondence is establishedture points extracted from a long sequence of monocular between these features, 3)constraints are formulated basedimages. They present algorithms based on preserving the on assumptions such as rigid body motion, and 4) thesmoothness of velocity changes. The iterative optimization oLerved displacement of the 2-D image features are usedalgorithmssearchforanoptimumsetoftrajectoriesforfea- to solve these equations to produce 3-D structure andture points in a sequence of images based on constraints motion estimates.on the direction and magnitude of change in motion. A The relative motion of a scene with respect to the viewerhypothesize and test approach is also proposed to handle gives rise to a distribution of velocities on the image plane.occlusion. This method hypothesizes occlusion if the num- This phenomenon manifests itself as temporal change in

ber of feature points detected in a frame is less than that brightness values (gray levels) in the image plane. The imagedetected in two or more preceeding or succeeding frames. velocities are, in general, functions of the motion of viewedInterpolating the missing point position using the pre- objects relative to the camera, objects' locations in 3-Dceeding two frames and testing this with the subsequent space, and 3-D structure of the objects. The recovery of thetwo frames verifies the existence of occlusion. Experiments 3-D motion and structure information from the sequencewith manually extracted features illustrate that the approach of monocular images can be decomposed into two steps:is able to deal with limited occlusion. The problem of auto- 1) compute image plane velocities from changes in imagemated extraction of features, however, has not been intensity values, and 2) use optic flow to compute 3-Daddressed by the authors. motion and structure. We discuss below some basic for-

924 PROCEEDINGS OF THE IEEE, VOL. 76, NO. 8, AUGUST 1988

Page 11: REPORT DOCUMENTATION PAGE · scientists in this group include neurophysiologists, psy- examples are indicative of the diversity of applications chophysicists, and physicians. The

mulation; of these two problems and outline the salient Equation (4.6) may be solved iteratively, i.e., obtain u(t), v(t)features io solutions to these two tasks. using u3,,,t - 1), va,,(t - 1).

Horn and Schunck show that the iterative method con-A. Computing Optic Flow verges when the optic flow is static i.e., when the velocity

Let g(, Y, 1) be the image intensity at point (.x, y) in the vectors do not change with time, e.g., a sphere rotating

image at time t. With the assumption that the intensity is aboutastationaryaxis. When thiscondition isviolated, e.g.,

tho samie at time t -.- it for the point Ax - A, .1 v) ot the when an object translates in front of a stationary back-

image, we have , ground, there exist boundaries where local smoothness ofoptic flow will not hold. If the boundaries can be detected

go As\, As, t t = gAt , ) . t) (4.1) thenthetechniquemaybelimitedtosmoothregions. Somewhere At. A,, and -\ are small. Appro\imating the lett-hand techniques for determining such boundaries are discussed

side h, a Ta,,or se-ies by Schunck [731.The first-order approximation of (4.2) is unsatisfactory for

'4, ,. t -It - g ,,, '. to - g, A edgesandcornersintheimage[741. First-and second-orderderivatives of the Taylor series expansion of (4.2) were used

,i,'. AtV higher order terms. (4.2) by Snyder et al. [75] who obtained a single nonlinear equa-Ignoring the higher order terms in 14.2), using (4.1) in (4.2) tion in the two unknowns u and v. Prazdny [76] used theand taking the Iim!t as -it -- 0 approach suggested by Snyder et al. [75] to solve the prob-

lem ,where only pure translation of the sensor was involved.,u - g, g. - 0.4 Prazdny further assumes that the Focus of Expansion (FOE)

In thi equation, the partial derivatives g,, g.. and g, are o, image flow is assumed known and then solves for thee,timated trom the image. u - dUt and d( tare the magnitude of the image flow.

eloc it omponents in thedirections and ,respe(tivel,,. Yachida [77] extended Horn and Schum ks iterativea,,,,o ated sith the point i\. i). The( ollection ot such veloc- method discussed above [69] for computing optic flow. Theits \.e( tors tor the entire image (onstitutes the optic flow smoothness constraint (onsidered not only a spatial neigh-ter the image borhood within the frame but also a temporal neighbor-

Equation t4. 1i embodies two unknowns u and %, and is hood, i.e., areas in the pre(eeding and succeeding frames.not suthti ent by itselt to spe( if, the optical tlo,, uniquely. In order to devise additional constraints to solve theIt doe,, c o)ntrain the solution. It is possible to compute op'.- image flow equation (4.3) Nagel [74], [78] has posed specific(al flow tor image, using the optical flow constraint eq,,a- conditions on local gray value distributions and has pre-tio(n together with additional assumptions. Popular sented an operator (gray value corner detector) that detectsassumptions include one o the following: locations in the image that satisfy these conditions. He

develops the Taylor series of (4.2) up to second-order terms.a) optical Iloss is smooth and neighboring points have Minimizing an error functional results in a system of two

similar velocities, nonlinear equations in u and v. These yield a closed formb optical flow i,, onstant over an entire segment of the solution fot the optic flow at the image locations detected

image, by the corner detector. Nagel and Enkelmann [79] use theseoptical flow is the result of restricted motion, for values as initial estimates in an iterative algorithm thatexample, planar motion. extends the solution of the nonlinear system of equations

One such constraint is the smoothness constraint, i.e., into image areas surrounding the gray value corner. Nagelmotion field varies smoothly in most parts of the image [69]- [80] has also proposed a modification of Horn and Schunck's[72]. Horn and Schunc k [691 imposed this constraint by min- smoothness criterion to take into consideration occludingimizing the error in optic flow expressed as: edges Nagel introduced a weight matrix which depends onE,(x, v) = (error in i4.3)) + X (deviation from smoothness) gray level changes in such a way that smoothness require-

ment is retained only for the optical flow component which= (g,u g )- _ , (2 u2 + u + (+ + v-)[ is perpendicular to strong gray value transitions.

Haralick and Lee [81] use (4.3) in conjunction with therequirement that the first derivatives of the gray value struc-

where X is a constant. The task is to find u and v so as to ture that has been displaced in the image due to objectminimize R in the following motion must remain the same. This vi.ds three additional

equations:

R = Ifg,u + gv + g,2 g,u + g,,v + g,, = 0

4 Xslfu- + u -) + v ][ dx dy. (4.5) g ,u + gv + g,, = 0

The integral equation may be solved by methods of cal-culus of variation. Differentiating (4.5) with respect to uand g1,u + g,,v + g, = 0. (4.7)v and equating aR/au and aR/v to zero (for minimum error Equations (4.7) and (4.3) form an overdetermined systemR2, and writing =u - u) u - u.. and (v + v,) = v - v of four linear equations in u and v. Tretiak and Pastor [82]

we get the following:- gx P/D, v = v, - gP/D (4.6) 'The Focus of Expansion (FOE) is defined as the intersection of

u = U theaxisofcameratranslationwiththeimageplane, whentheinter-

where section occurs on the positive half of the axis. When this inter-section lies on the negative half of the axis of translation, it is termed

P = (,u.uae + gi ,V, + gd, and D = X' + g+ g'Y. the Focus of Contraction (FOC).

AGGARWAL AND NANDHAKUMAR: COMPUTATION OF MOTION OF IMAGES 92S

Page 12: REPORT DOCUMENTATION PAGE · scientists in this group include neurophysiologists, psy- examples are indicative of the diversity of applications chophysicists, and physicians. The

also independently arrived at a similar tormulation. The above schemes ot Horn and Schunk [69], Haralick and Leesolution ot thesStem of equations isette ted hv thepseu- [81], fretiak and Pastor J821, Nagel [80], and Hildreth [83]doinserse tornialism [78], 182]. using a mathematii.al formalism developed by him and has

Hildreth [83] has le,,eloped a s heine tor (roneputing shown the ielationship between these approaches.%mg veloitv \( tor aIong (oitour,, ornied by detec(ting The above approa( hes deal with images at a single scale

zero-(rt)ssi ng, ot the Laplat ian It Gau'sian l.O(G) filtered ot resolution, i.e., the tinest resolution available from thei mage t)7]. This approach is based on Marr , theory that inaging sensor. Several hierar( hical schemes have beeninitial mtion measurement . b% the hLn-aIl u .aIsual systen developed [841-[8,7]. 1nkehmann [84] (reates a Guassian low-are made Ol, at I(it ation,s Ot sigfitic ant intensit Changes. pass pvram id for eat h image. Processing begins at a coarseThe ts,.n-dimensional \eloo. t tield along the (ontour is level wherein the initial fi,,pla (ement ve( tors ai e set to zero.des, rihed h\ the ve( tor fiunetion V(,sI., here s denotes Ihese \ec tors are projectedto tiner levels via bi-linear inter-ar length. V(s an i ldet onipoed into ((ilpoents I s, polation. Within each level, the velo( itv field is computedand ,si that are perpenditularand tanential, respe(ti\el,, via Nagel's approach [801 which embodies the orientedto the iilntoLil. smoothness criterion. A tinite ditteren e anproa,_h yields

V-i I l ,sI I ',u fs, (4.8) a large sparse system ot linear equations which is solvedusing a multi-resolution relaxation approach. Glazer's

v here u i and u si art' unit Ve tor, in the dire( ti ins per- approach [85] uses Horn and S hunk's i riteria [69]. Glazerperif11 ular and tani.tntial t) the ( u rse. \n orthographi( uses a Gaussian p ,rantid ,.,ith quad-tree (onne(tivit, toprot ' t in geonietrs is ued. sollt-ion-, to (4.8) t(ir the sima- propagate velo( it\ vecto( r,, triim (oarse ti ire levels. (Glazerpl. , a-( e )t ( (,n.start %eh) it\ and rti,,id tioti on in image uses a tinite diftelen(e approa(h and a (omplex muhi-levelpl,1 are di',, u-,ed. Ihe appl l( .1iiM Wit a tore general (on- relaxation approach which involks es ds nanic sw"it(hing-traint l, th .n c1-,( u-.-ed. i.e., the a1,sufption that velic it; between levels. Anandan [871 uses a Lapla(ian pvramid

mw -rt, )thk ,it 0 he (i()it ur. Toi It'a'shrt total vari- \.hich provides a set of bandpass titters (as opposed to theit. ,n .Tn the 'elhh It\ tiel!d tht tollosing (Wiitiliuos tun(- low-pass tilters provided bv Gaussian pyramids. A coarset pod p i )-'t to tine ( ontrol ,trategy is also employed via an "overlapped

prolection scheme" that allows, for multiple choices in theS- r ds. (4.9 propagation of velocity \e(tors. Anandan's technique isis,, based on establishing matches between image events :n

Thi-t,, ornired v, ith the ( (instraint that the perpendicular successive frames. The mat(h criterion used is the min-( wimpimnnt o th, (mputed \eloc it, tield V • u" must be imization of a Gaussian weighted sum-ot-squared-difer-h h}-.' t 1 the mea,,Lire~d p,,rpendic Kuar c onponent 1 to form ences (SSDt in a 5 x 5 window and a confidence measure

tho ToIf ',,sing tint tional based on the distribution of the SSD values. A smoothnessconstraint similar to that of Glazer is used. The minimi-V. V , zation problem is solved via a finite-elenient method that

as takes into consideration known discontinuities in the dis-

placement field.I ! V u - ds 14.10) Another method, called the multi-constraint method, is

emerging with promise. In this method one considers sev-, hre .J is a v..eighting tat tor. A dit,( rete torn ot the above eral fun(tions f, f,. - , f, such that each of them satisfy

run( tional is spec itier the (constraint equation. In particular,

- + 2 - v + - = 0, i 1 I, ' -• n. (4.14)4).~~~a a, t(. V : V ,,.:

V. Candidate functions include directional derivatives. How-

ever, the results based upon these functions have not beenpromising. Other candidate functions include g = 0(f)

where 0 is an operator like the contrast, entropy, average,4', - , - [V u7 V. n - "'} (4.13) etc. Mitiche, Wang and Aggarwal [88] have reported pre-

liminary success in the computation of optical flow usingwhere k is the number of points in the contour. In order multi-constraint meth ods.to find the velocities tV, V, ) whic h minimize 4), a3DlaV, and Fleet and Jepson [89] and Tsotsos et al. [90] have inves-a(VaV, are equated to zero. This yields 2k linear equations tigated the extraction of motion information using Fourierwhic h are solved via the conjugate gradient algorithm [83]. techniques. They proposed a hierarchical computationalExperimental results using real data have been conducted framework for early processing in the human visual systemwhere the initial perpendicular components of velocity which involves the use of spatiotemporal linear filters tunedwere computed from the time derivative between two LOG to specific frequencies corresponding to specific imagefiltered images, and the gradient along the zero-crossing velocities. A cascaded configuration of orientation specificcontours of the first filtered image. Exneriments on syn- filters followed by speed specific filters was proposed.thetic data show that the smoothness criterion does : at Recently, Heeger [91]demonstrated that a family of motion-guaranteeaccurateestimatesof image flow. It isargued that sensitive Gabor filters can be used to compute optic flow.the velocity field, even though incorrect, is perceptually He used 3-D (space-time) Gabor filters tuned to differentvalid. spatiotemporal-frequency bands and described a method

Nagel [78] has presented a comparative analysis of the for combining the outputs of the filters to compute local

926 PROCE[DINGS OF THF IELE, VOI. 76, NO. 8. AUGUsT 1988

Page 13: REPORT DOCUMENTATION PAGE · scientists in this group include neurophysiologists, psy- examples are indicative of the diversity of applications chophysicists, and physicians. The

velocity vectors. He has further suggested a parallel imple- be written asmentation and has illustrated the performance of his v 1 Xapproach with snthetic as well as real data. u = x + (xyf%- (- W + yQZ)

The determination of optical flow for a scene consisting Zof several moin g objects has also been attempted. (4.17a)Research ha, tocused on segmenting the optic flow intoregions corresponding to distinct objects that undergo dif- (V Y , I Q oV Vterent motion. Murray and Buxton [92] use a Bayesian v Z Z / Xy) x[).approach to itornulate the segmentation problem. The optic (4.17b)floN ield is modeled a. spatial and temporal Markov ran-dom tields. The seart h for the globally optimal segmen- The estimation of structure and motion is based on thetation is perfornled using simulated annealing. Thompson key assumptions that I) the optic flow varies smoothly and[9,] combines optical flow and contrast information in a ii) the surface of the object is smooth. Assumption i)allowsregion growing scheme that segments images into regions theoptic flow in asmall image neighborhood around imagecorresponding to surta(es moving with different velocities, location (x, y) to be specified by a Taylor series as:Thompson et,d. [941detect low boundaries using an algo- u(x, y) = u" + ux + uv + U,,xIrithm patterned after the Marr-Hildreth zero-crossingdete(tor. O'Rourke proposed a method to group rotating + u,, x y + u,, y2 + O(x, y) (4.18a)random dot patterns [931. Fennema and Thompson extractmw,.ng regions h, (olle(ting similar optical flow vectors vex, y) = v, + Vx + V, y + v,, x

[96]. Adiv segmented an opti( flow field using a grouping + v" x + V" %2 ± 0 k, y) (4.8b)n'thod based on a Hough %oting approach [97]. Webb and

Ai\gars.,al [48] analszd relati,.e motion between multi- where the partial derivatives (an be computed from theijointed parts () (bier ts. More recently, Tsukune and Aggar- optic flow. Assumption ii) allows a small surface patch Z(X,%%al198] de,, ribea method for extracting multiple rotational Y) around the line of sight to be described as:flo%,. tield,, in the Hough spa(e for orthographically pro- Z = Z0 + ZXX + Z, Y + 1 Z"Xle( ted i-) \ elot it% c t(t)r fields.

B. ( imputn stru(ture and 3-D Flow + Z 2XY + ZI Y2 + O(X, Y) (4.19)

Ha ig c(in)ufte( o)ptl(ial tlo%,., there still remains the for Z0 > 0 is the distance of the surface patch along the lineproblem ft ( i)mputing the nmotion and the structure of the of sight. Substituting the relation (4.15) for Z in (4.19) in aiife fin three-dinensional s.ace. A mathematical for- recursive manner it is possible to further approximate themulatlion W the bai problem is first presented. The for- surface in terms of image plane coordinates as:

Z(x, v) Z° (4.20)(1 - Zxx - Zvy - / Z~x2

- Z.Vxy - 2 Zyy 2 - Ox, y))

mulation is that u,,ed( by Prazdny [99], [100], Longuet-Hig- where Z_ = ZoZxx, Z, = 41417 Z1 = ZoZxv.gins and Pradn, [101, Waxman et al. [102], (103], and Further, the scaled translational velocities are denoted asSubbarao 11041, [105], among others. follows:

A (amera (entered Cartesian coordinate system (X, Y, Z) Vx V- Vz

is used. The Z axis is directed along the viewing direction. V' = - VY = - V' = -, for Zo > 0. (4.21)The image plane is normal to the Z axis and is at unit dis- Z' Z' Z0

tance from the origin. The image coordinate system (x, y) From (4.17), (4.18), (4.20), and (4.21) it is possible to derivehas its origin at (0, 0, 1). rhe x and y axes are parallel to the the following relations [101]-[103], [105] assuming rigid uni-X and Y axes, respectively. In the perspective projection form motion:geometry, the image of a point(X, Y,Z) is formed bydrawing U-_ -V' - = -V+ 1'a line from it to (0, 0, 0) which intersects the image plane Uo - (' Vo +2t (x, y). Therefore u, = -V, + VZx V, = V7 + vy4

x = X/Z and y = Y'Z. (4.15) Ur = flz + V Zv v, = + V Z

The camera is assumed to be in motion, with V - (Vx, vy, u,, = -2 VzZ, + vXz,, - 2Y u, = -VZv + V'Z'yV') being the translational velocity arn . -, fpx. fly, -z)being the rotational velocity. [he insta, .,eous velocity of + ua point R = X. Y, Z) is given by(X, , -(V+ f x R) = VxZrt =

as follows:

--V1 - QYZ + i', VXy= -VZZx + VYZy fl y = -2 VzZ. + V5 Zrr

' - _ + -'Z + 2 Qx. (4.22)

-V,/ - f2 X Y + aYX. (4.16) The system of equations (4.22) relates the optic flow (u,v) and its first- and second-order spatial derivatives to the

From this the instantaneous image velocity (u, v) = (x, ,) can 3-D structure and motion parameters. The geometric struc-

AGGARWAL AND NANDHAKUMAR: COMUTATION OF MOTION OF IMAGES 927

Page 14: REPORT DOCUMENTATION PAGE · scientists in this group include neurophysiologists, psy- examples are indicative of the diversity of applications chophysicists, and physicians. The

ture for the smooth surtace is specified locally by the sur- tureiscomputedin unitsof relativedepth, i.e., ratio of depthface slopes and curvatures, i.e., Z , Z , Z_ Z,,, and Z,. The to change in depth. The technique allows for the segmen-three-dimensional motion parameters are the components tation of objects at different depth.of V and Q. The system (4.221 comprises twelve nonlinear Rieger and Lawton [110] have devised a method for deter-equations in eleven unknowns and is thus overdetermined. mining the instantaneous axis of translation for a cameraThe optic tlow and its derivatives are available using any of undergoing general motion. Their method is based on thethe methods outlined in the previous subsection. The over- observation of Longuet-Higgins and Prazdny [101] that twodetermined system (4.22) may. hence, be solved to yield the surface points which lie on the same ray of projection butstructure and motion parameter,. at different depthswill have image velocities that differonly

,Manv interesting observations may be made regarding by the difference in the translational components of theirthe above equations. Note from (4.21) and (4.22) that Z, is 3-D velocity. Difference vectors are computed at optic flownot recoverable and only scaled translational velocity and discontinuities and the intersection of thesedifference vec-curvatures may be computed. Every nonlinear term in (4.22) tors are estimated via an optimization technique similar tois a product ot a structural parameter and a translational that used in [1091. The translational axis is specified by this\elocitv component. Every curvature parameter in (4.22) is procedure and the computation of camera rotation aridmultiplied by a component ot translational velocity (V' or translation is simplified.V') which is parallel to the imagt plane. Hence, if there is Prazdny[1111proposed an approach inwhichthevelocityno translation parallel to the image plane, surface curva- field is decomposed into rotational and translational com-tures cannot be determined. ponents. The rotational motion is hypothesized and the FOE

The nonlinear overdetermined system (4.22) may or may is identified for the resultant translational field. An errornot yield a unique solution. Many situations give rise to function of three parameters is used to evaluate the esti-dependent equations in (4.22) engendering multiple solu- mated motion. Minimization of the error yields the besttions. A detailed analysis of numerous cases has been pre- estimate. The algorithm has been tested on data generated,ented by subbarao [104],[1051 and Waxman eta). [i1021, [ 103] by simulated planar surfaces in motion.%ho have derived closed form solutions for these cases. Bruss and Horn [112] and Horn [71] discuss the formu-Subbarao shows that in general the solution is unique, and lation of an iterative least-mean-squared error approach toat most four soluzions are possible in certain situations, the estimation ot --D motion from optical flow. They makeNegahdaripour [106] also addressed the ambiguity in inter- no a priori assumptions about the motion. They derive apreting optic flow produced by curved surfaces in motion. system of seven equations, three of which are linear in V',He argues that the ambiguity is at most three-fold for the V', and V', and four which are solved via a numericalcase of certain hyperboloids of one sheet viewed by an method. No experimental results, however, have beenobserver moving parallel to the image. The ambiguities shown. Horn and Weldon [113] have proposed methods forinherent in interpreting noisy flow fields are discussed by computing purely translational or purely rotational 3-DAdiv [107]. motion directly from brightness gradients without com-

An overview of some of the approaches for computing putingoptical flow. Theyemployontyfirstderivativesofthestructure and motion parameters from optic flow is given image gray levels, and analyticity of the surface is notbelow. The approaches typically involve restricting the required. Negahdaripour and Horn (114] discuss the recov-nature of motion to be purely translatory or rotational and/ ery of motion of a camera relative to a planar surface. Theyor restricting the imaged surface to be planar. These also do not compute optic flow, and use instead the spatialassumptions significantly reduce the complexity of the sys- and temporal derivatives of brightness values directly. Theytem of equations (4.22). presen* iterative schemes for solving nine non-linear equa-

Williams [108] considered the computation of the struc- tions based on a least-squares formulation, and also pre-ture of imaged scene components for the situation where sent a closed form solution.the sensor was involved in purely translatory motion. The Chou and Kanatani [115] use a scheme in which objectFocus of Expansion (FOE) of image flow is assumed known motion is initially hypothesized and iteratively refined. Theyand the scene is considered to consist of planar surfaces. extract features from the images obtained before and afterA height and position is hypothesized tor each segmented motion. They do not require that feature correspondenceregion. An image is generated for the known camera motion be established a priori. [hey transform the first set of fea-and compared with the actual image. Error in the hypoth- tures and evaluate the discrepancy between the estimatedesized structure is computed from the difference between feature positions and the true feature positions (in the sec-these two images and appropriate corrections are made to ond image) aftcr motion. Assuming infinitesimal motion,the hypothesized scene structure. This procedure is they relate the discrepancy to optic flow parameters. Theyrepeated until the error falls belo v a threshold. This use a numerical least-squares technique to solve the linearapproach has also been suggested for detecting the FOE. constraints for a better estimate of the motion. This process

An approach for determining scene structure from a is repeated until the estimated motion produces featuresequence of images acquired by a translating camera is positionsthataresufficientlyclosetothetrueonesobtainedcredited to Lawton [109]. In this method, features are in the second image after motion.extracted from each image. Several directions of camera In this section we have presented the optic flow approachmotion are hypothesized. Each corresponds to a unique for the estimation of motion parameters from a sequenceFOE or FOC. Image featuredisplacementsarecomputed for of monocular images. We discussed the basic formulationeach motion and compared with actual displacements. The of the problem and outlined some of the recently devel-motion corresponding to minimum error in feature dis- oped techniques for computing the optic flow. The aboveplacements is chosen to be the best estimate. Scene struc- discussion included the problem of inferring 3-D structure

928 PROCEEtDINGS OF THE IM[t, VOL. 76, NO. 8, AUGUST 1988

Page 15: REPORT DOCUMENTATION PAGE · scientists in this group include neurophysiologists, psy- examples are indicative of the diversity of applications chophysicists, and physicians. The

and motion from optic flow and overviews ot some of the Sensitivity to noise is also a problem with the featuresolutions to this problem. based techniques though to a lesser degree. The tech-

niques reported in the literature have all been only mar-ginallytolerant tonoise. One methodof decreasing the sen-sitivity to noise has been to use more than the required

In the preceeding se(tions ,,e dis(cu,,,ed two distinct minimum number of features in an iterative least-squaresapproaches tor the estimation of motion trom monocular technique. Although this usually has a smoothing effect, itimage sequent es i.e., feature-based analvsis and optical can cause additional complications. For example, if all theflow, methods. In this section we compare the two additional points chosen are coplanar, then all that has beenapproaches and discuss some ot the advantages and dis- achieved is a significant increase in the computation timeadvantages associated with each of the methods, and probable instability of the solution. The establishment

Feature-based approaches require that correspondence of correspondence also becomes computationally expen-be established bet,,'een a sparse set of features extracted sive.from one image with those extracted from the next image Recently, Verri and Poggio [1181 argued that theoptic flowin the sequence. Altho)ugh several methods have been dis- does not correspond to the 2-D velocity field unless very

cussed tor extracting and establishing feature correspon- special conditions are satisfied. They argue against the useden( e, the task is difficult and only partial solutions suitable of optic flow for quantitative estimates of 3-D motion. Theyfor simplistic situations have been developed. In general, apply the theory of stability of dynamical systems to thethe process is complicated by occlusion which may cause optic flow formulation and (onclude that theoptic flow mayfeatures to be hidden, false features to be generated and provide stable qualitative information such as the Focus ofhidden teatures to reappear. Much more work needs to be Expansion and motion discontinuities.(lone in this area before the advent of one or more general When numerical techniques are used tor the solution oftechniques that (-an be reliably applied to real imagery. In structure and motion using either approach one must con-comparison, the optic flow approach, in general, does not sider the many caveats involved in such a solution. A dis-require any weature correspondence to be established. cussion of these caveats would be inappropriate in this

The computation of the optical flow as well as the inter- paper and the reader is directed to the literature in numer-pretation of motion and structure from optic flow requires ical analysis for possible pitfalls and remedies.the evaluation of first and second partial derivatives of Much attention has been devoted recently by the com-image brightness values and also of the optic flow. Real puter vision community to the use of regularization tech-images are, in general, noisy. The evaluation of derivatives niques in many vision tasks including both feature-basedis a noise enhancing operation. The higher the order, the formulations and the optic flow approach for motion andmore noise sensitive is the derivative. Hence, even in cases structure estimation [119]-[123]. This technique is used towhere closed form solutions for the 3-D structure and reformulate certain ill-posed problems into well-posedmotion exist, the optical flow techniques do not produce problems. The ill-posed problems are those forwhich eitherusable results because of the sensitivity to noise [71]. Also, 1) the solution exists but is not unique, or 2) the solutionthere are discontinuities in theoptical flowdepending upon does not depend continuously on the input data. Regular-occlusion, and these regions must be detected reliably oth- ization is typically formulated as an error minimization anderwise violations of the continuity assumption will have involves a stabilizing functional that is applied to the inputadverse and global effects on the estimate of optical flow. data and perhaps an additional smoothing parameter. Due

In contrast to the method of global minimization, another to the seemingly infinite choiceof possible stabilizing func-approach depends upon solving a set of constraints in a tions and smoothness parameters it is difficult to specify asmall neighborhood. However, the local and global meth- best regularizing algorithm for an application.ods rely on similar assumptions of smoothness of opticalflow field. The common weakness of both methods is the VI. COMPUTING MOTION FROM A SEQUENCE OF STEREO

inaccurate estimates at points where the flow changes IMAGFSsharply or is discontinuous. The global method propagatesthe errors across the eiiht c image, w-hile the neighborhood The technique described in the previous sections deter-size limits the propagation in local methods. Schunck [701 minethemotionand structureofanobjectgiven a sequenceand Kearney et al. [116], [117] address these difficulties in of monocular images of the scene. It was seen that in bothdetail. Kearney et al. present a detailed analysis of the the feature-based methods as well as in the optic flow tech-sources of errors in local optimization techniques for com- niques, the solutions for structure and motion remainputing optical flow [116]. They identify three main sources ambiguous with respect to absolute value of distanceof error: between the camera and the scene. In other words, struc-

ture and motion parameters are unique only up to a scaling1) Poor estimation of brightness gradients in highly tex- factor. The use of stereoscopy can provide this additional

tured image regions.The problem is especiallysevere parameter to uniquely determine depth and hence abso-for temporal gradients in moving regions. lute values for the structure and motion parameters.

2) Variations in optic flow across the image violate the The fusion of stereo and motion may be effected withassumption of locally constant flow. Significant error different objectives in mind. Stereoscopic processing mayarises t discontinuities in the flow field. be used to aid motion recovery, or con ',rsely, motion anal-

3) Insufficient local variation in the orientation of the ysis may be used to help establish feature correspondencebrightness gradient which causes error propagation in stereo image pairs. The fusion of these two processingin the ill-conditioned system. modules in human and other biological visual systems has

AGGARWAL AND NANDHAKUMAR: COMPUTATION OF MOTION OF IMAGES 929

Page 16: REPORT DOCUMENTATION PAGE · scientists in this group include neurophysiologists, psy- examples are indicative of the diversity of applications chophysicists, and physicians. The

been detected via neurobiological and psychophysiologi- parameters that yields the lowest error is chosen. Althoughcal investigation [124], [125]. Recent research in both the the solution of the system ot linear equations is easy, thefeature-based and optic flow based approaches has estimation of large sets of motion parameters and espe-addrL.,t,, t ie tusion or ,tereosccpic analysis and motion cially the search for the best set of motion parameters isestimation. We outline the salient features of such effort. computationally intensive.

An alternative approach has been to use a least-mean-squared error analysis [127], [129]. The underlying principle

A. Feature Based Anah'sis here is again the invariance of distance between points onThe overall analysis consists of the following steps: i) From an object subjected to rigid motion. The formulation is anal-

the sequences of stereo images, the depth map tor each ogous to the approach followed by Magee and Aggarwalstereo pair is determined, ii) the correspondence between [1301, [131] for determining motion parameters fromthree-dimensional features in successive depth maps is sequences of range images. While the direct method ofestablished, and iii) the motion of the objects is computed solution is adopted in [130], [131], a two-part iterativebased upon the matched features. This formulation of approach is adopted in [127]. The displacement betweenmotion analysis based on sequences of stereo images has the centroids of two sets of registered 3-D points is usedseveral advantages and disadvantages which are briefly dis- to determine the translation vector. The rotation matrix iscussed below, decomposed into three factors corresponding to rotations

Kim and Aggarwal discuss the estimation of motion aboutthez, x, andyaxes. Eachoftheseisindividuallysolvedparameters trom a sequence of depth maps extracted from for while the other two are fixed. This is repeated in a cyclicstereo images [63]. The depth map for each stereo pair is manner until a least mean squared error criterion is sat-computed using an edge-based stereo algorithm. 3-D fea- isfied. The advantage of the above decomposition is thattures (consisting of lines and points) are extracted from each the 3-D estimation problem reduces to a set of 2-D problemsdepth map. These features are matched between succes- which are more tractable.sive depth maps using a two pass relaxation process [61], The above approaches consider the determination of[62]. In the process of extraction, search and matching, the structure and motioii as separate issues. Hence, if structuresearch space is limited to the areaofthe motion in the image is first computed (as is usually the case for stereo imagery)by an image differencing technique. then errors accrued due to quantization of disparity will

In general, correspondences between two 3-D lines continue to plague the estimation of motion. To alleviateextracted from one depth map and those from another may this problem a new approach has been developed by Kiang,be used to determine the motionofa rigid object, assuming Chou and Aggarwal [132] based on iterative refinement ofthat the motion is small. Here, a three-dimensional line is both structure and motion estimates. The approach is basedspecified by a three-dimensional direction and a point on on a 1-D model for triangulation error in stereoscopy. Thethe line. The same method can be used for three-dimen- strategy for modifying structure and motion estimates issional point correspondences since two points determine based on the structural relationship between the corre-a line. In general, three point correspondences, or one line sponding uncertainty polyhedra in successive depth maps.correspondence and one point correspondence are suffi- Experimental results using synthetic as well as real datacient to determine the three-dimensional motion pararn- demonstrate significant improvement in the estimation ofetersof a movingobject. In the formercase, the three points both structure and motion when cempared to the conven-should not be collinear, and in the latter case, the point tional techniques based on reducing least-mean-squaredshould not lie on the same line. A system of linear equations error in motion alone.is derived and the solution is straightforward. A system Aloimonos and Rigoutsos [133] have developed a schemebased upon these observations has been implemented to for computing 3-D motion parameters from a sequence ofderive the structure and the displacement of the objects stereo imagery which does not require a priori establish-between the views. In this study the motion of simple toy ment of correspondences. The features extracted from theobjects was estimated with excellent results [63]. left and right images are assumed to lie on a planar surface

Although it is theoretically quite easy to estimate the Z = pX + qY + c. Perspective imaging geometry ismotion parameters given the correspondence between two assumed. The image planes are parallel to the X - Y plane.sets of 3-D points, practical considerations complicate the The parameters p, q, and c are acquired by solving a set ofimplementation of the system. In stereo imagery, the range linear equations in which the coefficient of each of thevalues estimated are subject to a great deal of uncertainty unknowns consistsof a function of a sum of the imagecoor-due primarily to quantization of disparity. More robust for- dinates. The solution of the linear equation provides themulations of the problem of motion estimation using structure of the scene. Applying this process before andsequences of stereo images have been proposed [126-[128]. after the planar surface undergoes motion allows for theOne approach has been to estimate motion parameters via estimation of the motion parameters. The method devel-a system of linear equations using 3 points in each depth oped was not as robust as was expected and was modifiedmap [126]. Several sets of 3 points are chosen from the large by including a third camera. The performance of the algo-number of available points and the motion parameters are rithm in presence of noise is described in [133].computed for each set. For each set of computed motion Another technique forestimating 3-D motion parametersparameters, all available points in the first depth map are from two 3-D point sets without establishing correspon-subjected to the estimated motion. The discrepancy dence has been presented by Lin et al. [134]. The algorithmbetween the points in the second depth map and trans- is based on the property that a function and its Fourierformed points from the first depth map is computed via a transform must experience the same rotation. The trans-simple distance measure. The set of estimated motion lation is first determined from the displacement of the cen-

930 PROCEEDINGS OF THE IEEE, VOL. 76, NO. 8, AUGUST 1988

930

Page 17: REPORT DOCUMENTATION PAGE · scientists in this group include neurophysiologists, psy- examples are indicative of the diversity of applications chophysicists, and physicians. The

troid. Two functions are defined on the feature set. A cor- its anu demerits of both approaches were discussed. Anrelation between the Foui'er transforms of these functions overview of the fusion of stereoscopy and motion analysis, -

is determined. The rotation axis and angle are computed especially for aiding the estimation of motion, was pre-based on this procedure. Some simulation results have been sented.presented [134]. The optic flow approach consists of computing the two-

the above techniiques are representative of the dimensional field of instantaneous velocities of brightnessapproaches wherein stereopsis aids the recovery of motion, values (gray levels) in the image plane. Instead of consid-There exist many reports in recent literature discussing the ering temporal changec it, image brightness values in com-use of motion in recovering structure, e.g., Jenkin [135] used puting the optic flow field, it is possible to also considerinstantaneous velocities at feature points to aid the estab- temporal changes in values that are the result of applyinglishment of stereo correspondence, Nevatia [136], Mutch various focal operators such as contrast, entropy, and spa-[137], Xu et at. (1381, and lain [1391, among others, used tial derivatives to the image brightness values. Ineithercase,known motion parameters to simulate stereo. We feel that a relatively dense flow field is estimated, usually at everyalthough this approach is related to the estimation of pixel in the image. The optic flow is then used in conjunc-motion, it is a separate field in itself. Hence, we do not pur- tion with added constraints or information regarding thesue any further the discussion of the use of known motion scene to compute the actual three-dimensional relativeto aid stereopsis, and we limit our discussion to the use of velocities between scene objects and camera.stereoscopy for estimation of motion. The feature-based approach is based on extracting a set

of relatively sparse but highly discriminatory set of two-B. Multiple Optic Flow Fields dimensional features in the images corresponding to three-

dimensional object features in the scene such as corners,In Section IV we discussed the interpretation of optic flow occluding boundar'es of surfaces, and boundaries demar-

fields obtained from a sequence of monocular images. cating changes in surface reflectivity. Such points, lines and/Another approach has been to compute multiple optic f:-)w or curves are extracted from each image. Inter-frame cor-fields from different views, to establish correspondence respondence is then established between these features.between them and reconstruct 3-D velocity vector fields. Constraints are formulated based on assumptions such as

Mitiche (140] assumes that optic flow is computed for each rigid body motion, e.g., the 3-D distance between two fea-view in a stereoscopic imaging system for which the ste- tures on a rigid body remains the same after object/camerareoscopy parameters are known. He further assumes that motion. Such constraints usually result in a system of non-correspondence between points in the two images are linear equations. The observed displacement of the 2-Davailable which allows for the estimation of depth. Mitiche image features are used to solve these equations leadingshows that given this information it is posssible to compute ultimately to the computation of motion parameters ofthe 3-D motion parameters in a straightforward manner. objects in the scene.Waxman and Sinha [141] have used a similar approach. In In the feature-based approach, the main problemsaddition, they have filtered the optic flow field to minimize encountered are seen to be: 1) establishing and maintainingthe effects of noise. Nagel [142] has also attempted such correspondence between the image plane features, 2)stereo-motion fusion techniques and has devised an robust formulation of the problem which is usually basedapproach based on the minimization of an error function. on the assumption that the viewed object undergoes rigidTsukune and Aggarwal [981 have used this approach for motion, and 3) developing appropriate iterative algorithmsreconstructing 3-D velocity fields for a scene containing which are stable and accurate. The optic flow basedmultiple objects in motion. approach suffers from a different set of drawbacks, i.e., 1)

Richards [143] demonstrated that the relative rate of it is highly noise sensitive due to its dependence on spatio-change of disparity (ratio between temporal rate of change temporal gradients, 2) it requires that motion be smoothof disparity and disparity) due to object/camera motion is and small thus requiring a high rate of image acquisition,a useful aid in establishing feature correspondence within and 3) it requires that motion vary continuously over thea pair of stero images. Waxman and Duncan [144] used the image. Both approaches also are affected by object occlu-ratio between relative flow and disparity to aid the estab- sion and choice of initial/boundary conditions. The use oflishment of stereo correspondence. The relative flow is sequences of stereoscopic images provides three-dimen-defined to be the differ -rce between the optic flow at a sional points and lines which somewhat simplify the prob-point in the left image and that at the corresponding point lem of estimating motion.in the right image. Waxman and Duncan show that their Agreat deal of future research effort is warranted toover-ratio is identical to the one devised by Richards [143]. come the obstacles mentioned above. The significant con-

tributions made by various researchers in this area duringVII. CONCLUSION the recent past is to be noted and this trend may be expected

to continue in the future. Two workshops, one in EuropeIn this paper we have reviewed recently developed tech- [145] and one in the USA [146] are planned in the near future

niquesfor estimating structureand motion from sequences to engender progress in this challenging area.of monocular and stereoscopic images. We discussed twodistinct approaches: feature-based analysis and optic flowtechniques. We described some of the different mathe-

matical formulations that have been developed for each of The authors wish to thank the reviewers, the editor Dr.these tasks. A comparison of the feature-based and optic H. Li, and Dr. W. Martin for their comments, which helpedflow methods was then presented in which the relative mer- in improving this manuscript.

AGGARWAL AND NANDHAKUMAR: COMPUTATION OF MOTION OF IMAGES 931

Page 18: REPORT DOCUMENTATION PAGE · scientists in this group include neurophysiologists, psy- examples are indicative of the diversity of applications chophysicists, and physicians. The

REFERENCES Physiology, R. Held, H. W. Leibowitz, and H.-L. Teuber, Eds.Berlin, West Germany: Springer-Verlag, 1978.

[1] 1.K. Aggarwal and A. I. Badler, eds., Abstracts for the Work- [27] G. Jansson and G. Johansson, "Visual perception of bendingshop on Computer Analysis of Time-Varying Imagery, Uni- motion," Perception, vol. 2, pp. 321-326, 1973.versity of Pennsylvania, Moore School of Electrical Engi- [28] G. Johansson, "Visual motion perc'otion," Sci. Amer., vol.neering, Philadelphia, PA, Apr. 1979. 232, pp. 76-88, 1975.

[2] -, "Special is~aie on motion and time-varying imagery," [29] D. H. Hubel and T. N. Wiesel, "Receptive fields and func-IEEE Trans. Pattern Anal. Machine Intell., vol. PAMI-2, no. 6, tional architecture in two non-striate visual areas (18 and 19)Nov. 1980. of the cat," J. Neurophysiol., vol. 28, pp. 229-289, 1965.

[3] W. E. Snyder, (Guest Ed.), "Special issue on computer anal- [30] J. W. Roach and J. K. Aggarwal, "Computer tracking ofysis of tirne-varying images," Computer, vol. 14, no. 8, Aug. objects moving in space," IEEE Trans. Pattern Anal. Machine1981. Intell., vol. PAMI-1, no. 2, pp. 127-135, Apr. 1979.

[4] J.K.Aggarwal,(Guest Ed.),"Specialissueon motion andtime [31] -, "Determining the movement of objects from avarying imagery," Computer Vision Graphics and Image sequence of images," IEEE Trans. Pattern Anal. MachineProcessing, vol. 21, nos. 1 and 2, Jan., Feb. 1983. Intell., vol. PAMI-2, no. 6, pp. 554-562, Nov. 1980.

[5] T. S. Huang, Ed., Image Sequence Analysis. New York, NY: [32] H. H. Nagel, "Representation of moving rigid objects basedSpringer-Verlag, 1981. on visual observations," Computer, pp. 29-39, Aug. 1981.

[6] S. UlIman, The lnterpretation of Visual Motion. Cambridge, [33] R. Y. Tsai and T. S. Huang, "Estimating 3-D motion param-MA: MIT Press, 1979. eters of a rigid planar patch, I," IEEE Trans. Acoust. Speech

[7] NATO Advanced Study Institute on Image Sequence Pro- Signal Processing, vol. ASSP-29, no. 6, pp. 1147-1152, Dec.cessing and Dynamic Scene Analysis, Advance Abstracts of 1981.Invited and Contributory Papers, Braunlage, West Germany, [34] R. Y. Tsai, T. S. Huang, and W. L. Zhu, "Estimating three-June 21-July 2, 1982. dimensional motion parameters of a rigid planar patch, II:

[8] Siggraph/Siggart Interdisciplinary Workshop on Motion: Singular value decomposition," IEEE Trans. Acoust. SpeechRepresentation and Perception, Toronto, Canada, April 4- Signal Processing, vol. ASSP-30, pp. 525-534, Aug. 1982.6, 1983, and Computer Graphics, vol. 18, no. 1, Jan. 1984. [35] T.S. Huang and R. Y. Tsai, "Image sequence analysis: Motion

[9] International Workshop on Time-Varying Image Processing Estimation," in Image Sequence Processing and Dynamicand Moving Object Recognition, Florence, Italy, May 1982. Scene Analysis, T. S. Huang, Ed. New York, NY: Springer-

[10] W. N. Martin and J. K. Aggarwal, "Dynamic scene analysis: Verlag, 1981.A survey," Computer Graphics and Image Processing, vol. [36] R. Y. Tsai and T. S. Huang, "Uniqueness and estimation of7, pp. 356-374, 1978. three-dimensional motion parameters of rigid objects with

[11] H.-H. Nagel, "Analysis techniques for image sequences," in curved surface," IEEE Trans. Pattern Anal. Machine Intell.,Proc. IJCPR-78, (Kyoto, Japan), pp. 186-211, Nov. 1978. vol. PAMI-6, no. 1, pp. 13-26, Jan. 1984.

[12] J. K. Aggarwal and W. N. Martin, "Dynamic scene analysis," [37] H. C. Longuet-Higgins, "A computer algorithm for recon-in Image Sequence Processing and Dynamic Scene Analy- structing a scene from two projectinnr," Ntiire, vol. 293, ppsis, T. S. Huang, Ed. New York, NY: Springei-Verlag, 1983, 133-135, Sept. 1981.pp. 40-74. [38] -, "The reconstruction of a scene from two projections-

[13] J. K. Aggarwal, "Three-dimensional description of objects configurations that defeat the 8-point algorithm," in Pro-and dynamic scene analysis," in Digital Image Analysis, S. ceedings of the First Conf. on Artificial Intelligence Appli-Levialdi, Ed. New York, NY: Pitman Books, Ltd., 1984, pp. cation, (Denver, CO), pp. 395-397, Dec. 5-7, 1984.29-46. [39] X. Zhuang, T. S. Huang, and R. M. Haralick, "Two-view

[14] H.-H. Nagel, "What can we learn from applications?," in motion analysis: A unified algorithm," 1. Opt. Soc. Amer.,Image Sequence Analysis, T. S. Huang, Ed. New York, NY: vol. 3, no. 9, pp. 1492-1500, Sept. 1986.Springer-Verlag, 198-, pp. 19-228. [40] X. Zhuang and R. M. Haralick, "Two view motion analysis-

[15] -, "Overview on image sequence analysis," in Image Theory and algorithm," in Proc. ICASSP, Mar. 1985.Sequence Processing and Dynamic Scene Analysis, T. S. [41] -, "Two view motion analysis," in Proc. IEEE Conf. onHuang, Ed. New York, NY: Springer-Verlag, 1983, pp. 2-39. Computer Vision and Pattern Recognition, pp. 686-690, June

[16] T. S. Huang, Ed., Image Sequence Processing and Dynamic 1985.Scene Analysis. Berlin, West Germany: Springer-Verlag, [42] 0. D. Faugeras, F. Lustman, and G. Toscani, "Motion andProceedings of NATO Advanced Study Institute at Braun- structure from motion from point and line matches," in Proc.lage, West Germany, 1983. 1st Int. Conf. Computer Vision, (London, England), pp. 25-

[17] IEEE Computer Society Workshop on Motion: Representa- 34, June 1987.tion and Analysis, Kiawah Island, SC, May 1986. [43] H.-H. Hagel, "Image sequences-Ten (octal) years-From

[18] The 2nd International Workshop on Time-Varying Image phenomenologytowards atheoretical foundation," in Proc.Processing and Moving Object Recognition, Florence, Italy, Int. Conf. of Pattern Recognition, pp. 1174-1185, Oct. 1986.September 1986. [44] B. L. Yen and T. S. Huang, "Determining 3-D motion and

[19] R. Chellappa and A. A. Sawchuk, Eds., Digital Image Pro- structure of a rigid body using straight line correspon-cessing and Analysis: Volume 2: Digital Image Analysis. dences," in Image Sequence Processing and Dynamic SceneNew York, NY: IEEE Computer Society Press, 1985. Analysis, T. S. Huang, Ed. New York, NY: Springer Verlag,

[20] W. Martin and J. K. Aggarwal, Eds., Motion Understanding: 1983.Robot and Human Vision. Norwell, MA: Kluwer Academic [45] -,"Determining 3-D motion/structureof a rigid bodyoverPublishers, 1988. 3 frames using straight line correspondences," in Proceed-

[21] H. Wallach and D. N. O'Connell, "The kinetic depth effect," ings of the IEEE Computer Society Conf. on Computer VisionI. Exp. Psychol., vol. 45, pp. 205-217, 1953. and Pattern Recognition, (Washington, DC), pp. 267-272,

[22] E. J. Gibson, J. J. Gibson, 0. W. Smith, and H. Flock, "Motion June 19-23, 1983.parallax as a determinant of perceived depth," J. Exp. Psy- (46] Y. C. Liu and T. S. Huang, "Estimation of rigid body motionchol., vol. 8, pp. 40-51, 1959. using straight line correspondences," in Proc. IEEE Com-

[23] S. UlIman, "The interpretation of structure from motion," puter Society Workshop on Motion: Representation andin Proc. R. Soc. London, 8203, pp. 405-426, 1979. Analysis, pp. 47-52, May 1986.

(241 G. Johansson, "Visual perception of biological motion and [47] -, "Estimation of rigid body motion using straight line cnr-a model for its analysis," Perception and Psychophysics, vol. respondences: Further results," in Proc. Int. Conf. of Pattern14, pp. 201-211, 1973. Recognition, pp. 306-309, Oct. 1986.

[25] -, "Spatio-temporal differentiation and integration in (48] J.A. Webb and J. K. Aggarwal,"Structureand motion of rigidvisual motion perception," Psych. Res., vol. 38, pp. 379-383, and jointed objects," Artificial Intelligence, No. 19, pp. 107-1976. 130, 1982.

[26] -, "Visual event perception," in Handbook of Sensory [49] A. Mitiche, S. Seida, and 1. K. Aggarwal, "Determining posi-

932 PROCEEDINGS OF THE IEEE, VOL. 76, NO. 8, AUGUST 1908

Page 19: REPORT DOCUMENTATION PAGE · scientists in this group include neurophysiologists, psy- examples are indicative of the diversity of applications chophysicists, and physicians. The

tion and displacement in spa e from images," in Proceed- [73] B.G. Schunck,"lmageflow:Fundamentalsandalgorithms,"ings of IEEE Compuer Sociey Conf. on Computer Vision in Motion Understanding: Robot and Human Vision, W. N.and Pattern Recognition, (Sar. Francisco, CA), pp. 504-509, Ma'tin and J. K. Aggarwal, Eds. Norwell, MA: Kluwer Aca-June 19 23, 1985. demic Publishers, 1988.

[50] -, "Line-based computation of structureand motion using [74] H.-H. Nagel, "Displacement vectors derived from second-angular invariance," in Proi IEEE Computer Society Work- order intensity variations in image sequences," Computershop on Motion, pp. 173-180, May 1986. Vision, Graphics and Image Processing, vol. 21, pp. 85-117,

[51] J. K. Aggarwal and Y. F. Wang, "Analysis of a sequence of 1983.images using point and line correspondences," in Proceed- [751 W. E. Snyder, S. A. Rajala, and G. Hirzinger, "Image mod-ings of the 1987 IEEE International Conf. on Robotics and eling, the continuity assumption and tracking," in Proc. nt.Automation, (Raleigh, NC), L)p. 1275-1280, Mar. 31-Apr. 3, Conf. of Pattern Recognition, pp. 1111-1114, 1980.1987. [761 K. Prazdny, "A simple method for recovering relative depth

[52] S. UlIman, "Maximizing rigidity: The incremental recovery map in the case of a translating sensor," in Proc. of Int. Jointof 3D structure trom rigid and non-rigid motior " Percep- Conf. on Artificial Intelligence, pp. 698-699, 1981.tion, vol. 13, pp. 255-274, 1984. [77] M. Yachida,"Determiningvelocitymapby3-Diterativeesti-

[53] E. C. Hildreth and N. M. Grzywacz, "The incremental recov- mator," in Proc. of Int. Joint Conf. on Artificial Intelligence,erv of structure from motion: Position vs. velocity based for- pp. 716-718, 1981.mulations," in Proc. IEEE Computer Society, Workshop on [78] H.-H. Nagel, "On the estimation of optical flow: RelationsMotion, pp. 137-143, Ma' 1986. between different approaches and some new results," Arti-

[54] T. J. Broida and R. Chellappa, "Estimation of object motion ficial Intelligence, vol. 33, pp. 299-324, 1987.parameters from noisy images," IEEE Trans. Pattern Anal. [79] H.-H. Nagel and W. Enkelmann, "Investigation of second-Machine Intell., vol. PAMI-8, no. 1, pp. 90-99, Jan. 1986. order gray value variations to estimate corner point dis-

[55] -, "Kinematics and structure of a rigid object from a placements," in Proc. Int. Conf. of Pattern Recognition,sequence of noisy images," in Proc. IEEE Computer Society (Munich, W. Germany), pp. 768-773, 1982.Workshop on Motion, pp. 95-100, May 1986. [80] H.-H. Nagel, "Constraints for the estimation of displace-

[56] I. Weng, T. S. Huang, and N. Ahuja, "3-D motion estimation, ment vector fields from image sequences," in Proc. of Int.understanding and prediction from noisy image Joint Conf. on Artificial Intelligence, pp. 945-951, 1983.sequences," IEEE Trans. Pattern Anal. Machine Intell., vol. [81] R. M. Haralick and J. S. Lee, "The facet approach to opticPAMI-9, no. 3, pp. 370-389, May 1987. flow," in Proceedings of Image Understanding Workshop,

[57] 1. K. Aggarwal, L. S. Davis, and W. N. Martin, "Correspon- (Arlington, VA), pp. 84-93, 1983.dence processes iii dynamic scene analysis," Proc. IEEE, vol. [82] 0. Tretiak and L. Pastor, "Velocity estimation from image69, no. 6, pp. 562-572, May 1981. sequences with second order differential operators," Proc.

[58] S. K. Sethi and R. Jain, "Finding trajectories of feature points Int. Conf. of Pattern Recognition, pp. 16-19, 1984.in a monocular image sequence," IEEE Trans. Pattern Anal. [83] E.C. Hildreth,"Computations underlyingthe measurementMachine intelf., voi. PAMI-9, no. 1, pp. 56-73, ar,. 1987. of visual motion," Artificial Intelligence, vol. 23, pp. 309-354,

[59] J.-Q. Fang and T. S. Huang, "Some experiments on esti- 1984.mating the 3-D motion parameters of a rigid body from two [84] W. Enkelmann, "Investigations of multigrid algorithms fromconsecutive image frames," IEEE Trans. Pattern Anal. the estimation of optical flow fields in image sequences,"Machine Intell., vol. PAMI-6, pp. 545-554, Sept. 1984. in Proc. IEEE Computer Society Workshop on Motion: Rep-

[60] S. Ranade and A. Rosenfeld, "Point pattern matching by resentation and Analysis, pp. 81-87, May 1986.relaxation," Pattern Recognition, vol. 12, pp. 269-275, 1980. [85] F. Glazer, "Hierarchical motion detection," Ph.D. Disser-

[611 Y. C. Kim and J. K. Aggarwal, "Finding range from stereo tation, COINS Department, University of Massachusetts,images," in Proceeding IEEE Conf. Computer Vision and Pat- Amherst, MA, Feb. 1987.tern Recognition, (San Francisco, CA), pp. 289-294, June [86] P. Anandan, "A unified perspective on computational tech-1985. niques for the measurement of visual motion," in Proc. 1st

[62] Y. C. Kim, "Structure and motion of objects from stereo Int. Conf. Computer Vision, (London, England), pp. 219-230,images," Ph.D. Dissertation, Department of Electrical and June 1987.Computer Engineering, The University of Texas at Austin, [87] -, "Computing dense displacement fields with confi-May 1986. dence measures in scenes containing occlusion," in Proc.

[63] -, "Determining object motion in a sequence of stereo DARPA Image Understanding Workshop, (New Orleans, LA),images," IEEE]. Robotics Automat., vol. RA-3, no. 6, pp. 599- pp. 236-246, 1984.614, Dec. 1987. [88] A. Mitiche, Y. F. Wang, and J. K. Aggarwal, "Experiments in

[64] S. T. Barnard and W. B. Thompson, "Disparity analysis of computing optical flow with the gradient-based, multicon-images," IEEE Trans. Pattern Anal. Machine IntelL, vol. PAMI- straint method," Pattern Recognition, vol. 20, no. 2, pp. 173-2, pp. 333-340, July 1980. 179, 1987.

[65] .M. Pragerand M. A. Arbib, "Computing the optic flow: The [89] D. J. Fleet and A. D. Jepson, "Velocity extraction withoutMATCH algorithm and prediction," Computer Vision, form interpretation," in Proceedings of the Third WorkshopGraphics and Image Processing, vol. 24, pp. 271-304, 1983. on Computer Vision: Representation and Control, (Bellaire,

[66] M. Jenkin and J. K. Tsotsos, "Applying temporal constraints MI), pp. 179-185, Oct. 1985.to the dynamic stereo problem," Computer Vision, Graphics [90] J. K. Tsotsos, D. J. Fleet, and A. D. Jepson, "Towards a theoryand Image Processing, vol. 33, pp. 16-32, 1986. of motion understanding in man and machine," in Motion

[67] D. Marr, Vision. New York, NY: Freeman, 1982. Understanding: Robot and Human Vision, W. N. Martin and[68] L. Dreschler and H.-H. Nagel, "Volumetric model and 3-D J. K. Aggarwal, Eds. Norwell, MA: Kluwer Academic Pub-

trajectory of a moving car derived from Monocular TV frame lishers, 1988.sequence of a street scene," Computer Graphics and Image [91] D. J. Heeger, "Optical flow from spatiotemporal filters," inProcessing, vol. 20, pp. 199-228, 1982. Proc. 1st Int. Conf. Computer Vision, (London, England), pp.

[69] B. K. P. Horn and B. G. Schunck, "Determining optical flow," 181-190, June 1987.Artificial Intelligence, vol. 17, pp. 185-203, 1981. [92] D. W. Murray and B. F. Buxton, "Scene segmentation from

[70] B. G. Schunck, "Image flow: Fundamentals and future visual motion usingglobaloptimization,"IEEETrans. Patternresearch," in Proc. of IEEE Conf. on Pattern Recognition and Anal. Machine Intell., vol. PAMI-9, no. 2, pp. 220-228, Mar.Image Processing, pp. 560-571, 1985. 1987.

[711 B. K. P. Horn, Robot Vision. Cambridge, MA: MIT Press, [93] W. B. Thompson, "Combining motion and contrast for seg-1986. mentation," IEEE Trans. Pattern Anal. Machine Intell., vol.

[721 D. H. Ballard and 0. A. Kimball, "Rigid body motion from PAMI-2, pp. 543-549, 1980.depth and optical flow," Computer Vision, Graphics and [94] W. B. Thompson, K. M. Mutch, and V. A. Berzins, "DynamicImage Processing, vol. 22, pp. 95-115, 1983. occlusion analysis in optical flow fields," IEEE Trans. Pattern

AGGARWAL AND NANDHAKUMAR: COMPUTATION OF MOTION OF IMAGES 933

Page 20: REPORT DOCUMENTATION PAGE · scientists in this group include neurophysiologists, psy- examples are indicative of the diversity of applications chophysicists, and physicians. The

Anal. Machine Intell., vol. PAMI-7, no. 4, pp. 374-383, July [116] 1.K. Kearney, W. B. Thompson, and D. L. Boley, "Optical flow1985. estimation: An error analysis of gradient-based methods with

[95] ). O'Rourke, "Motion detection using Hough techniques," local optimization," IEEE Trans. Pattern Anal. Machine Intell.,in Pattern Recognition and Image Processing Conference, vol. PAMI-9, no. 2, pp. 229-244, Mar. 1987.(Dallas, TX), pp. 82-87, Aug. 3-4, 1981. [117] J. K. Kearney, "Gradient-based estimation of optical flow,"

[96) C. L. Fennema and W. B. Thompson, "Velocity determi- Ph.D. Dissertation, University of Minnesota, 1983.nation in scenes containing several moving objects," Com- [118] A. Verri and T. Poggio, "Against quantitative optical flow,"puter Graphics and Image Processing, vol. 9, pp. 301-305, in Proc. 1st Int. Conf. Computer Vision, (London, England),Apr. 1979. pp. 171-180, June 1987.

[97] G. Adiv, "Determining three-dimensional motion and struc- [119] Y. Yasumoto and G. Medioni, "Robust estimation of three-ture from optical flow generated by several moving objects," dimensional motion parameters from a sequence of imageIEEE Trans. Pattern Anal. Machine Intell., vol. 7, no. 4), pp. frames using regularization," IEEE Trans. Pattern Anal.384-401, 1985. Machine Intell., vol. PAMI-8, no. 4, pp. 464-471, July 1986.

[98] H. Tsukune and 1. K. Aggarwal, "Analyzing orthographic [120] T. E. Boult, "What is regular in regularization?," in Proc. 1stprojection of multiple 3-D velocity vector fields in optical Int. Conf. Computer Vision, (London, England), pp. 457-462,flow," Computer i, ;on, Graphics and Image Processing, in June 1987.press. [1211 T. Poggio, V. Torre, and C. Koch, "Computational vision and

[99[ K. Prazdns, "'Motion and structure from optical flow," in regularization theory," Nature, vol. 317, no. 6035, pp. 314-Proc. of Int. Joint Con. on Artificial Intelligence, pp. 702- 319, 1985.704. 1979. [122] D. Terzopoulos, "Regularization of inverse visual problems

[100] -, "Egomotion and relative depth map from optical flow," involving discontinuities," IEEE Trans. Pattern Anal. MachineBiological Cybernetics, vol. 36, pp. 87-102, 1980. Intell., vol. PAMI-8, no. 4, pp. 413-424, July 1986.

[1011 H. C. Longuet-Higgins and K. Prazdny, "The interpretation [123] A. Rougee, B. C. Levy, and A. S. Willsky, "Reconstruction ofof a moving retinal image," Proc. Roy. Soc. London, vol. B208, two-dimensional velocity fields as a linear estimation prob-pp. 385-397, 1980. lem," in Proc. 1st Int. Conf. Computer Vision, (London,

[102] A. M. Waxman, B. Kamgar-Parsi, and M. Subbarao, "Closed England), pp. 646-650, June 1987.form solutions to image flow equations," in Proceedings of [124] G. F. Poggio and T. Poggio, "The analysis of stereopsis," Ann.the First Conf. on Artificial Intelligence, (Denver, CO), pp. Rev. Neurosci., vol. 7, pp. 379-412, 1984.12-23, 1984. [125] D. Regan and K. I. Beverly, "Binocular and monocular stim-

[1031 A. M. Waxman and K. Wohn, "Image flow theory: A frame- uli for motion in depth: Changing-disparity and changing-work for 3-D inference from time-varying imagery," in size feed the same motion-in-depth stage," Vision Research,Advances in Computer Vision, C. Brown, Ed. Hillside, NJ: vol. 19, pp. 1331-1342, 1979.Erlbaum Publishers, 1987. [126] T. S. Huang and S. D. Blostein, "Robust algorithms for motion

[104] M. Subbarao, "Solution and uniqueness of image flow equa- estimation based on two sequential stereo image pairs," intions for rigid curved surfaces in motion," in Proc. 1st Int. Proceedings of IEEE Conf. on Computer Vision and PatternConf. Computer vision, (London, England), pp. 687-692, June Recognition, (San Francisco, CA), pp. 518-523, June 19-23,1987. 1985.

[105] -, "Interpretation of image motion fields: Rigid curved [127] Z. C. Lin, T. S. Huang, S. D. Blostein, H. Lee, and E. A. Mar-surfaces in motion," Tech. Rep. CAR-TR-199, Center for gerum, "Motion estimation from 3-D point sets with andAutomation Research, Univ. of Maryland, College Park, MD, without correspondences," in Proceedings of IEEE Conf. onApril. 1986. Computer Vision and Pattern Recognition, (Miami Beach, FL),

[106] S. Negahdaripour and B. K. P. Horn, "Determining 3-D pp. 194-201, June 22-26, 1986.motion of planar objects from image brightness measure- [128] Z. Lin, H. Lee, and T. S. Huang, "Finding 3-D point corre-ments," in Proceedings of the International Joint Conf. on spondences in motion estimation," in Proc. 8th Int. Conf.Artificial Intelligence, (Los Angeles, CA), pp. 898-901, Aug. Pattern Recognition, (Paris, France), pp. 303-305, Oct. 1986.18-23, 1985. (129] K. S. Arun, T. S. Huang, and S. D. Blostein, "Least squares

[107] G. Adiv, "Inherent ambiguities in recovering3-D motion and fitting of two 3-D point sets," IEEE Trans. Pattern Anal.structure from a noisy flow field," in Proc. oflEEE Computer Machine Intell., vol. PAMI-9, no. 5, pp. 698-700, Sept. 1987.Vision and Pattern Recognition Conf., (San Francisco, CA), [130] M. J. Magee and J. K. Aggarwal, "Determining motionpp. 70-77, June 1985. parameters using intensity guided range sensing," in Pro.

[108] T. D. Williams, "Depth from camera motion in a real world ceedings of the 7th International Conf. on Pattern Recogni.scene," IEEE Trans. Pattern Anal. Machine Intell., vol. PAMI- tion, (Montreal, Canada), pp. 538-541, 1984.2, no. 6, pp. 511-516, Nov. 1980. [131] 1. K. Aggarwal and M. J. Magee, "Determining motion

[109] D. T. Lawton, "Processing translational motion sequences," parameters using intensity guided range sensing," PatternComputer Vision, Graphics and Image Processing, vol. 22, Recognition, vol. 19, no. 2, pp. 169-180, 1986.pp. 116-144, 1983. [132] S. M. Kiang, R. J. Chou, and J. K Aggarwal, "Triangulation

[110] J. H. Rieger and D. T. Lawton, "Determining the instanta- errors in stereo algorithms," in IEEE Computer Vision Work-neous axis of translation from optic flow generated by arbi- shop, (Miami Beach, FL), pp. 72-78, Dec. 1-2, 1987.trary sensor motion," in Proc. ACM Interdisc. Workshop [133] J. Aloimonos and I. Rigoutsos, "Determining the 3-D motionMotion, (Toronto, Ont., Canada), pp. 33-41, 1983. of a rigid planar patch without correspondence, under per-

[111] K. Prazdny, "Determining the instantaneous direction of spective projection," in Proc. IEEE Computer Society Work-motion from optical flow generated by a curvilinearly mov- shop on Motion: Representation and Analysis, pp. 167-174,ing observer," Computer Graphics and Image Processing, May 1986.vol. 17, pp. 238-248, 1981. [134] Z.-Ch. Lin, H. Lee, and T. S. Huang, "A frequency domain

[112] A. R. Bruss and B. K. P. Horn, "Passive navigation," Com- algorithm for determining motion of a rigid object fromputer Vision, Graphics and Image Processing, vol. 21, no. 1, range data without correspondences," in Proc. IEEE Conf.pp. 3-20, Jan. 1983. on Computer Vision and Pattern Recognition, pp. 194-201,

[113] B. K. P. Horn and E. J. Weldon, "Computationally efficient 1986.methods for recovering translational motion," in Proceed- [135] M. R. M. Jenkin, "The stereopsis of time-varying images,"ings of the First International Conf. on Computer Vison, Tech. Rep. RBCV-TR-84-3, Dept. of Computer Science, Univ.(London, England), pp. 2-11, June 8-11, 1987. of Toronto, Ontario, Canada, Sept. 1984.

[114] S. Negahdaripour and B. K. P. Horn, "Direct passive navi- (136] R. Nevatia, "Depth Measurement by motion stereo," Com-gation," IEEE Trans. Pattern Anal. Machine Intell., vol. PAMI- puter Graphics and Image Processing, vol. 5, pp. 203-214,9, no. 1, pp. 168-176, Jan. 1987. 1976.

(115] T.-C. Chou and K. Kanatani, "Recovering 3-D rigid motions [137] K. M. Mutch, "Determining object translation informationwithout correspondence," in Proc. 1st Int. Conf. Computer using stereoscopic motion," IEEE Trans. Pattern Anal.Vision, (London, England), pp. 534-538, June 1987. Machine Intell., vol. PAMI-8, no. 6, pp. 751-755, Nov. 1986.

934 PROCEEDINGS OF THE IEEE, VOL. 76, NO. 8, AUGUST 1988

Page 21: REPORT DOCUMENTATION PAGE · scientists in this group include neurophysiologists, psy- examples are indicative of the diversity of applications chophysicists, and physicians. The

1138] G. Xu, S. Tsuji, and M. Asada, "A motion stereo method based versity of California, Berkeley, during 1969-1970. He has publishedon coarse-to-fine control strategy," IEEE Trans. Pattern Anal. numerous technical papers and several books, Notes on NonlinearMachine Intell., vol. PAMI-9, no. 2, pp. 332-336, Mar. 1987. Systems (1972), Nonlinear Systems: Stability Analysis (1977), Com-

[139] R. lain, S. L. Bartlett, and N. O'Brien, "Motion stere) using puter A!uhods in Image Analysis (1977), Digital Signal Processingego-motion complex logarithmic mapping," IEEE Trans. Pat- (1979), and Deconvolution of Seismic Data (1982). His currenttern Anal. Machine Intell., vol. PAMI-9, no. 3, pp. 356-369, research interests are image processing, computer vision, and par-May 1987. allel processing of images. He was co-Editor of the special issues

[140] A. Mitiche, "On combining stereopsis and kineopsis for on Digital Filtering ano Image Processing (IEEE TRANSACTIONSONCIR-

space perception," in Proceedings of the First Conf. on Arti- CULlS S SIEMS, March 1975) and on Motion and Time Varyingficial Intelligence, (Denver, CO). pp. 156-160, Dec. 1984. Imagery (IEEE TRANSACIiON' ON PATTERN ANALYSIS AND MACHINE INTEL-

[141] A. M. Waxman and S. Sinha, "Dynamic stereo: Passive rang- IiCENLL, November 1980) and editorof the two volume special issueing to moving objects from relative image flows," IEEE Trans. of Computer Vision, Graphics and Image Processing on MotionPattern Anal. Machine Intell., vol. PAMI-8, no. 4, pp. 406-412, (CVGIP, January and February 1983). Currently he is an AssociateJuly 1986. Editor of the journals Pattern Recognition, Image and Vision Com-

[ 142] H.-H. Nagel, "Dynamic stereo vision in a robot feedback loop puting, and Computer Vision, Graphics and Image Processing andbased on the evaluation of multiple interconnected dis- IEEE EXPERT. Further, he is a member of the Editorial board of IEEEplacement vector fields," in Proceedings of the Interna- Press and the Editor of the IEEE Selected Reprint Series. Hewas thetional Symp. on Robotics Research, pp. 200-206, 198S. General Chairman for the IEEE Computer Society Conference on

[143] W. Richards, "Structure from stereo and motion," /. Opt. Pattern Recognition and Image Processing, Dallas, TX, 1981, andSoc. Amer., vol. 2, pp. 343-349, Feb. 1985. was the Program Chairman for the First Conference on Artificial

[144] A. M. Waxman and J. H. Duncan, "Binocular image flows: Intelligence Applications sponsored by IEEE Computer SocietyandSteps toward stereo-motion fusion," IEEE Trans. Pattern Anal. AAAI held in Denver, CO, 1984. In June 1987, he began his termMachine Intell., vol. PAMI-8, no. 6, pp. 715-729, Nov. 1986. as the Chairman of the Pattern Analysis Machine IntelligenceTech-

[145] Third International Workshop on Time-Varying Processing nical Committee of the IEEE Computer Society.and Moving Object Recognition, Florence, Italy, May 29-31, Dr. Aggarwal is an active member of IEEE, IEEE Computer Society,1989, ACM, AAAI, The International Society for Optical Engineering, Pat-

[146] IEEE Computer Society Workshop on Visual Motion, Irvine, tern Recognition Society, and Eta Kappa Nu.CA, Mar. 20-22, 1989.

N. Nandhakumar (Member, IEEE) receivedJ. K. Aggarwal (Fellow, IEEE) received the B.S. the B.E. (Hons) degree in electronics anddegree in mathematics and physics from communication engineering from thethe University of Bombay, in 1956, the B.Eng. P.S.G. College of Technology, University ofdegree from the University of Liverpool, Madras, India, in 1981, the M.S.E. degree inEngland, in 1960, and the M.S. and Ph.D. . computer, information and control engi-degrees from the University of Illinois, neering from the University of Michigan,Urbana, in 1961 and 1964, respectively. Ann Aror, in 1983, and the Ph.D. degree

He joined The University of Texas in 1964 in electrical engineering from The Univer-as an Assistant Professor and has since held sity of Texas at Austin in 1987.positions as Associate Professor (1968) and He is currently a Research Associate atProfessor (1972). Currently, he is the John the Computr and 'vision Research Center at The University of

J. McKetta Energy Professor of Electrical and Computer Engineer- Texas at Austin and conducts research in the integration of diverseing and Computer Sciences at The University of Texas at Austin. sensing modalities for computer vision. He is an active memberFurther, he was a Visiting Assistant Professor at Brown University, of the IEEE, and as a student he held offices in the student chaptersProvidence, RI (1968), and a Visiting Associate Professor at the Uni- of the IEEE and the IE.

AGGARWAL AND NANDHAKUMAR: COMPUTATION OF MOTION OF IMAGES 935