
Matching techniques to compute image motion

A. Giachetti*

CRS4, Sesta Strada Ovest, Zona Ind. Macchiareddu, 09010 Uta (Ca), Italy

Image and Vision Computing 18 (2000) 247–260

Received 18 March 1999; received in revised form 7 June 1999; accepted 25 June 1999

* Corresponding author. Tel.: +39-070-2796231. E-mail address: [email protected] (A. Giachetti).

    Abstract

This paper describes a thorough analysis of the pattern matching techniques used to compute image motion from a sequence of two or more images. Several correlation/distance measures are tested, and problems in displacement estimation are investigated. As a byproduct of this analysis, several novel techniques are presented which improve the accuracy of flow vector estimation and reduce the computational cost by using filters, a multi-scale approach and mask sub-sampling. Further, new algorithms to obtain a sub-pixel accuracy of the flow are proposed. A large number of experimental tests has been performed to compare all the techniques proposed, in order to understand which are the most useful for practical applications, and the results obtained are very accurate, showing that correlation-based flow computation is suitable for practical and real-time applications. © 2000 Elsevier Science Ltd. All rights reserved.

    Keywords: Optical flow; Correlation; Distance; Computational cost; Accuracy

    1. Introduction

Window-matching or correlation-based techniques are the most intuitive and perhaps also the most widely applied techniques to compute the optical flow from an image sequence, i.e. to estimate the 2D motion projected on the image plane by the objects moving in the 3D scene [1–6]. Optical flow estimation has many practical and industrial applications, e.g. for object tracking, assisted driving or surveillance systems, obstacle detection, image stabilisation or video compression [7–11]. In spite of this fact, few works analysing the performance and the possible enhancements of these algorithms have been presented [1,2,6], so a more detailed analysis of this simple and widely used optical flow technique seemed to us necessary. The aim of this paper is to give a clear overview of window matching algorithms and to suggest new solutions to improve on their shortcomings (such as the computational cost, the pixel precision, and so on).

The paper is organised as follows: Section 2 gives an overview of correlation-based techniques, discussing advantages and drawbacks; Section 3 introduces the distance or similarity measures we applied to our algorithms; Section 4 discusses the matching error due to high frequencies and search space quantisation; Section 5 introduces several techniques designed to obtain the best results in matching, reducing the complexity, increasing the accuracy and also providing a sub-pixel motion estimation; Section 6 presents the experimental results, with comparisons of algorithms on well-known test image sequences.

    2. Overview: advantages and drawbacks

Correlation-based methods are based on the analysis of the gray level pattern around the point of interest and on the search for the most similar pattern in the successive image. In a few words, having defined a window $W(\vec{x})$ around the point $\vec{x}$, we consider similar windows $W'(x+i, y+j)$ shifted by the possible integer values in pixels in a search space $S$ composed by the $(i, j)$ such that $-D < i < D$ and $-D < j < D$. The optical flow, i.e. the estimated image displacement, is taken as the shift corresponding to the minimum of a distance function (or maximum of a correlation measure) between the intensity patterns in the two corresponding windows:

$$\vec{d}(\vec{x}) = \arg\min_{(i,j) \in S} f\big(W(\vec{x}),\, W'(x+i,\, y+j)\big) \qquad (1)$$

The basic implicit assumptions are that the gray level pattern is approximately constant between successive frames (no perspective effects) and that local texture contains sufficient unambiguous information.
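As an illustration of this scheme (a minimal sketch, not the implementation used in the paper), the following Python/NumPy fragment estimates the integer displacement of a single point by exhaustive search; the window half-size M, the search radius D and the SAD measure are illustrative choices that can be replaced by any of the measures discussed in Section 3.

import numpy as np

def match_point(I1, I2, x, y, M=12, D=4):
    """Integer displacement at (x, y) by exhaustive search over [-D, D]^2.

    I1, I2 : two consecutive gray-level frames (2D float arrays)
    M      : window half-size, i.e. the window is (2M+1) x (2M+1) pixels
    D      : search radius; the point is assumed far enough from the border
    """
    w1 = I1[y - M:y + M + 1, x - M:x + M + 1]
    best_dist, best_shift = np.inf, (0, 0)
    for dy in range(-D, D + 1):
        for dx in range(-D, D + 1):
            w2 = I2[y + dy - M:y + dy + M + 1, x + dx - M:x + dx + M + 1]
            dist = np.abs(w1 - w2).sum()   # SAD; any distance/correlation measure fits here
            if dist < best_dist:
                best_dist, best_shift = dist, (dx, dy)
    return best_shift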

Many applications of similar algorithms are found in the literature, but only a few works have investigated how to obtain the best results from them.


In the well-known comparison of optical flow techniques by Barron et al. [2], only two of the algorithms analysed were based on correlation, specifically on the comparison of image windows with the sum of squared differences (SSD) measure. The first, by Anandan [1], reaches a sub-pixel precision by locally approximating the difference function with a quadratic surface; the other, by Singh [6], reaches the same goal by performing a weighted sum of displacements.

A comparison among several correlation/distance measures, albeit limited to synthetic images, has been proposed by Aschwanden and Guggenbuhl [12].

Optical flow estimators based on correlation are less sensitive to noise than derivative-based ones. They usually perform better when the texture is poor and in the case of large inter-frame displacements, which cause the aliasing problem [13] in the derivative estimation. The main drawbacks are the computational weight and the quantisation of the computed values. In the following sections we discuss methods to partially overcome these problems. First of all we analyse the different similarity measures that can be used.

3. Distance/similarity measures

Many ways of measuring the difference or similarity between gray-level patterns can be used. In our work we compare square windows of N × N pixels and compute the motion between a window centred in (x, y) in the image I1 and a window shifted by (i, j) in the image I2. The most used distance measures are reported in Table 1. The widely used sum of absolute differences (SAD) and SSD can be modified to consider the effect of global gray-level variations, setting the average gray level difference equal to 0 (ZSSD, ZSAD) or locally scaling the intensity (LSAD, LSSD).

Distance minimisation can be replaced by the maximisation of a correlation measure (see Table 2). The standard cross-correlation (CC) is too sensitive to noise and is usually replaced by the normalised one (NCC) or by the zero-mean normalised version (ZNCC).
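A sketch of how the zero-mean, locally scaled and normalised variants differ from plain SSD and CC, written for two already extracted windows w1 and w2 of equal size; this only illustrates the definitions of Tables 1 and 2, not the paper's code.

import numpy as np

def ssd(w1, w2):
    return ((w1 - w2) ** 2).sum()

def zssd(w1, w2):
    # remove the mean gray level of each window before comparing
    return (((w1 - w1.mean()) - (w2 - w2.mean())) ** 2).sum()

def lssd(w1, w2):
    # locally scale the second window so that the average intensities match
    return ((w1 - (w1.mean() / w2.mean()) * w2) ** 2).sum()

def ncc(w1, w2):
    return (w1 * w2).sum() / np.sqrt((w1 ** 2).sum() * (w2 ** 2).sum())

def zncc(w1, w2):
    a, b = w1 - w1.mean(), w2 - w2.mean()
    return (a * b).sum() / np.sqrt((a ** 2).sum() * (b ** 2).sum())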

These measures are all based on computations made on the local gray level values. An analysis of their robustness against several types of noise and image distortion on synthetic images can be found in [12]. Another possible way to perform the comparison is to reduce the amount of information by extracting local features of the images and limiting the comparison to those features. Some authors proposed to match extracted edges using the Hamming distance or the Hausdorff fraction [14] as difference measure. The Hamming distance is simply the number of bits in the opposite state (0/1). The Hausdorff fraction, used by Huttenlocher and others [14] to compare binary maps, is the fraction of pixels in the state 1 in the original pattern that have distance less than a threshold from a pixel in the same state in the shifted patch of the successive image. In our experiments the threshold was fixed to the value of 1 pixel. Zabih and Woodfill [15] have introduced two image transforms, called the Rank transform and the Census transform, to be performed before the comparison. In the first case the transformed images, compared with the SSD distance, are given by:

$$R(\vec{x}) = \left|\{\vec{x}' \in N(\vec{x}) : I(\vec{x}') < I(\vec{x})\}\right| \qquad (2)$$

i.e. for each location, the number of neighbouring pixels with a gray level smaller than the central value.


Table 1
Definitions of the most common difference measures for square patterns of pixels (Ī1 and Ī2 denote the average gray levels of the two windows)

$$SAD(\vec{x}, \vec{d}) = \sum_{i,j=-N/2}^{N/2} \left|I_1(x+i, y+j) - I_2(x+i+d_x, y+j+d_y)\right|$$

$$SSD(\vec{x}, \vec{d}) = \sum_{i,j=-N/2}^{N/2} \left[I_1(x+i, y+j) - I_2(x+i+d_x, y+j+d_y)\right]^2$$

$$ZSAD(\vec{x}, \vec{d}) = \sum_{i,j=-N/2}^{N/2} \left|I_1(x+i, y+j) - \bar{I}_1 - I_2(x+i+d_x, y+j+d_y) + \bar{I}_2\right|$$

$$ZSSD(\vec{x}, \vec{d}) = \sum_{i,j=-N/2}^{N/2} \left[I_1(x+i, y+j) - \bar{I}_1 - I_2(x+i+d_x, y+j+d_y) + \bar{I}_2\right]^2$$

$$LSAD(\vec{x}, \vec{d}) = \sum_{i,j=-N/2}^{N/2} \left|I_1(x+i, y+j) - \frac{\bar{I}_1}{\bar{I}_2}\, I_2(x+i+d_x, y+j+d_y)\right|$$

$$LSSD(\vec{x}, \vec{d}) = \sum_{i,j=-N/2}^{N/2} \left[I_1(x+i, y+j) - \frac{\bar{I}_1}{\bar{I}_2}\, I_2(x+i+d_x, y+j+d_y)\right]^2$$

Table 2
Definitions of the most common correlation measures for square patterns of pixels

$$CC(\vec{x}, \vec{d}) = \sum_{i,j=-N/2}^{N/2} I_1(x+i, y+j)\, I_2(x+i+d_x, y+j+d_y)$$

$$NCC(\vec{x}, \vec{d}) = \frac{\sum_{i,j=-N/2}^{N/2} I_1(x+i, y+j)\, I_2(x+i+d_x, y+j+d_y)}{\sqrt{\sum_{i,j=-N/2}^{N/2} I_1(x+i, y+j)^2 \; \sum_{i,j=-N/2}^{N/2} I_2(x+i+d_x, y+j+d_y)^2}}$$

$$ZNCC(\vec{x}, \vec{d}) = \frac{\sum_{i,j=-N/2}^{N/2} \left[I_1(x+i, y+j) - \bar{I}_1\right]\left[I_2(x+i+d_x, y+j+d_y) - \bar{I}_2\right]}{\sqrt{\sum_{i,j=-N/2}^{N/2} \left[I_1(x+i, y+j) - \bar{I}_1\right]^2 \; \sum_{i,j=-N/2}^{N/2} \left[I_2(x+i+d_x, y+j+d_y) - \bar{I}_2\right]^2}}$$

The Census transform consists of defining for each pixel a binary matrix with the value 1 in the neighbouring points where the gray level is above the central value and 0 otherwise. The local matrices are then compared using the Hamming distance.

These techniques reduce the amount of information in the patches to be compared, which means that the results obtained are good only for very simple images.
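For concreteness, a small, unoptimised sketch of the two transforms on 3 × 3 neighbourhoods and of the Hamming comparison of census codes; the neighbourhood size and the bit ordering are illustrative choices, not taken from the paper.

import numpy as np

def rank_transform(I):
    """Rank transform: number of 3x3 neighbours darker than the centre pixel."""
    H, W = I.shape
    R = np.zeros((H, W), dtype=np.int32)
    for y in range(1, H - 1):
        for x in range(1, W - 1):
            nb = I[y - 1:y + 2, x - 1:x + 2]
            R[y, x] = int((nb < I[y, x]).sum())
    return R

def census_transform(I):
    """Census transform: 8-bit code, bit set where the neighbour exceeds the centre."""
    H, W = I.shape
    C = np.zeros((H, W), dtype=np.uint8)
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]
    for y in range(1, H - 1):
        for x in range(1, W - 1):
            bits = 0
            for k, (dy, dx) in enumerate(offsets):
                bits |= int(I[y + dy, x + dx] > I[y, x]) << k
            C[y, x] = bits
    return C

def hamming(c1, c2):
    """Hamming distance between two windows of census codes (uint8 arrays)."""
    return int(np.unpackbits(np.bitwise_xor(c1, c2)).sum())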

    4. High frequencies and quantisation

Even if correlation-based techniques are not affected by the aliasing problem as differential ones are, signal quantisation introduces errors in the flow computation due to high frequencies. If the frequency of the signal has the same order of magnitude as the sampling frequency and the displacements to be computed are not exactly integer (i.e. a multiple of the sampling step), correlation may lead to completely wrong results as well. Let us show it with a simple example: we consider a 1D sinusoidal pattern translating, as in Fig. 1. The SAD similarity measure is given by:

$$\sum_{i=-M}^{M} \left|\sin\!\big(\omega(x+i)\delta x\big) - \sin\!\big(\omega(x+i)\delta x + \omega(v-d)\delta x\big)\right| \qquad (3)$$

where M is the mask half size, v the speed, \delta x the sampling step and d the tried displacement. Applying simple trigonometric formulas, we obtain:

$$\sum_{i=-M}^{M} \left|\sin\!\big(\omega(x+i)\delta x\big)\big(1 - \cos(\omega(v-d)\delta x)\big) - \cos\!\big(\omega(x+i)\delta x\big)\sin\!\big(\omega(v-d)\delta x\big)\right| \qquad (4)$$

where (v - d)\delta x represents the difference between the true value of the image motion and the integer tentative value. It is evident that, if \omega\delta x is not small, the difference can be relevant even when d\delta x is close to v\delta x, and negligible when \omega(v-d)\delta x \simeq k\pi.

It is therefore useful to filter the images before the distance computation. We usually apply a Gaussian filter with σ = 1.5. This is sufficient to avoid errors and to obtain an estimated value close to the real displacement if the signal has low-frequency components. We can show this fact more clearly with another example.

Consider a 1D signal such as that in Fig. 2a, which is roughly the superposition of a low frequency and a frequency higher than the reciprocal of the sampling step. If the profile is moved by half a sampling step and the signal is re-sampled, we obtain the sample values represented in Fig. 2b. If we compute the SAD distance for tentative displacements in the range (−2, 2), the distance is minimum for a displacement d = −2 (see Fig. 2c), a completely wrong motion estimate. If the sampled signal is filtered with a simple


Fig. 1. Correlation on a sinusoidal pattern in 1D: signal quantisation creates problems if the sampling frequency is small compared with the frequency of the signal.

Fig. 2. (a) and (b): sampling at t and t + 1 of a superposition of a low and a high frequency translating to the left. (c) The shift corresponding to the minimum of the SAD distance does not approximate the real displacement as expected.

3-point low-pass mask like (0.25, 0.5, 0.25), the samples to be compared become those represented in Fig. 3a and b. The SAD distances measured are those in Fig. 3c, and the minimum values correspond to the integer displacements closest to the real value.
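The qualitative effect of Figs. 2 and 3 can be reproduced numerically with a short sketch: a 1D signal made of a low and a high frequency is shifted by half a sample, and the SAD over integer shifts is computed before and after the 3-point low-pass mask. The particular frequencies and window parameters below are illustrative, not those used for the figures.

import numpy as np

def profile(s):
    # low-frequency component plus a component close to the sampling frequency
    return np.sin(0.3 * s) + 0.8 * np.sin(2.9 * s)

def sad_1d(a, b, x0=30, M=8, D=2):
    # SAD between a window of `a` centred at x0 and the same window of `b`
    # shifted by each tentative integer displacement in [-D, D]
    return {d: float(np.abs(a[x0 - M:x0 + M + 1] - b[x0 + d - M:x0 + d + M + 1]).sum())
            for d in range(-D, D + 1)}

t = np.arange(64, dtype=float)
a, b = profile(t), profile(t + 0.5)              # true displacement: half a sampling step
lp = np.array([0.25, 0.5, 0.25])
a_f, b_f = (np.convolve(s, lp, mode='same') for s in (a, b))

raw, filtered = sad_1d(a, b), sad_1d(a_f, b_f)
# the unfiltered minimum can land far from the true shift, the filtered one close to it
print(min(raw, key=raw.get), min(filtered, key=filtered.get))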

    5. Improving the method

The previous sections have shown several features of correlation algorithms, also pointing out some problems of the method; in this section we introduce several algorithms to solve these problems and to improve the performance of the method.

    5.1. Optimal window size

What is the ideal size of the windows used to perform the matching? If small windows are used, the amount of information inside the window is small and the estimate is not reliable. If the windows are too large, the hypothesis of negligible deformation of the pattern inside the window fails and the estimate can be wrong. Further, the computational complexity is greatly increased. The ideal size depends on the texture inside the window. For our experiments we considered windows of 25 × 25 pixels, a good tradeoff as shown by our experiments. However, following an idea already applied in the case of stereo matching [16], we considered the opportunity of using an adaptive window size; but the algorithm we implemented, which enlarges the window if the information inside it is small, has not given good results, because a small accuracy improvement required a relevant speed reduction. As the window information measure, we used the determinant of the first-order derivatives matrix inside the window, divided by the number of pixels. The algorithm starts from a fixed 11 × 11 size and then increases it until a threshold is reached or a maximum size is obtained. Better accuracy would be obtained by using an iterative solution also involving flow regularity in the window adaptation, but this would make the algorithm slower.

5.2. Complexity reduction: fast minimum/maximum search

Correlation-based optical flow algorithms are extremely complex and time-consuming. They require repeated comparisons, each one needing N² operations for each velocity value in the search space. If the distance (or correlation) used for the comparison is obtained from a certain number of operations for each mask point, the complexity as a function of these operations is N²(2M + 1)² for each point where the flow is computed, and the execution time is high even on fast machines.

It is possible to introduce techniques capable of considerably reducing the computation times, even if they can reduce the accuracy of the estimates (Fig. 4).


Fig. 3. (a) and (b) The signals of Fig. 2(a) and (b) filtered with a 3-point low-pass mask. (c) The SAD distances, with minima near the correct value.

Fig. 4. Reducing the complexity by a factor of 16 through window sub-sampling does not affect the accuracy of the estimate.

In order to reduce the complexity of the method for a single point we propose the following solutions:

- Reducing the number of operations for each tentative displacement (window sub-sampling).
- Changing the search strategy.
- Replacing arithmetic operations with the use of look-up tables.

If we want to compute a dense flow, there are other possibilities to reduce the complexity:

- Storing partial results to avoid repeated calculations.
- Computing the flow at a reduced density and then filling the gaps in the flow field, computing in the other points the correlations over a reduced search space limited to values close to those obtained in the neighbouring points.

Here are a few details on the methods.

5.2.1. Mask sub-sampling

Tests performed on several images have shown that windows at least 15 pixels wide are necessary to obtain good matching results, but they have also shown that a small sub-set of the window points can be used to compute the differences without affecting the results too much. If we use, for example, 25 × 25 windows, no error is introduced by computing the SSD differences sampling the windows with a step of 4 (i.e. calculating the value only for 1 pixel every 4 × 4), with the complexity reduced by a factor of 16. This is due to the strong correlation between the gray levels of neighbouring points, especially after the spatial filtering.

In the general case, if the sub-sampling rate is s, the complexity changes from N²(2M + 1)² for each displacement estimate to N²(2M + 1)²/s².
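A sketch of the sub-sampled comparison; the step s and the SAD measure are illustrative, and the function can replace the full-window distance in the matching loop shown earlier.

import numpy as np

def sad_subsampled(I1, I2, x, y, dx, dy, M=12, s=4):
    """SAD over a (2M+1) x (2M+1) window using only one pixel every s x s,
    reducing the cost of each comparison by a factor of about s^2."""
    w1 = I1[y - M:y + M + 1:s, x - M:x + M + 1:s]
    w2 = I2[y + dy - M:y + dy + M + 1:s, x + dx - M:x + dx + M + 1:s]
    return np.abs(w1 - w2).sum()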

5.2.2. Fast minimum/maximum search

The search for the minimum of the distance can be speeded up with different search strategies. A simple method, effective only in very simple cases, is the 1D + 1D method proposed by Ancona and Poggio [7], which searches for the minimum by first moving the window in one direction and assuming that the motion component in that direction is the one corresponding to this minimum. The procedure is then repeated independently for the other direction (Fig. 5).

It is clear that this technique will provide good results only if the minimum of the distance function is well defined. The complexity is, of course, drastically reduced and goes from the N²(2M + 1)² operations per point of the full search to N² · 2(2M + 1).

5.2.3. Coarse-to-fine minimisation

A better way to reduce the complexity consists of introducing a coarse-to-fine minimisation. The search space is first quantised with a large step (2^K pixels); when the minimum is found at this resolution, a new search with step 2^(K−1) is performed around the found minimum, and the procedure is then repeated. When step 1 is reached, a vector with pixel precision is obtained, with a complexity of only [(N/2^K)² + 8K](2M + 1)² steps.
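A sketch of the coarse-to-fine minimisation of the search space; again `distance(dx, dy)` is any window measure, and D and K are free parameters.

def coarse_to_fine_search(distance, D=8, K=3):
    """Quantise the search space with step 2^K, then halve the step around the
    current minimum until integer (step 1) precision is reached."""
    step = 2 ** K
    candidates = [(dx, dy) for dx in range(-D, D + 1, step)
                           for dy in range(-D, D + 1, step)]
    best = min(candidates, key=lambda d: distance(*d))
    while step > 1:
        step //= 2
        candidates = [(best[0] + i * step, best[1] + j * step)
                      for i in (-1, 0, 1) for j in (-1, 0, 1)
                      if abs(best[0] + i * step) <= D and abs(best[1] + j * step) <= D]
        best = min(candidates, key=lambda d: distance(*d))
    return best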

5.2.4. Look-up tables

If the distance function is a sum of differences or products of the gray levels of two points, it is convenient to avoid the computation by generating and storing a look-up table associating the result of the operation with the values of the two gray levels. If the SSD correlation is used and the number of gray levels is 256, the following table is generated:

$$\mathrm{table}(i, j) = (i - j)^2 \qquad (5)$$

and the program computes the distance as:

$$SSD(\vec{x}, \vec{d}) = \sum_{i,j=-N/2}^{N/2} \mathrm{table}\big(I_1(x+i, y+j),\, I_2(x+i+d_x, y+j+d_y)\big) \qquad (6)$$

The time saved depends on the operations replaced by the table access. The problem with this method is that it requires the allocation of a large amount of memory for the table.

5.2.5. Increasing density algorithm

If the user wants to estimate a dense flow field, it is possible to exploit the linear dependence of the estimates of neighbouring points to reduce the complexity. To compute a dense flow on an X × Y image region the complexity is XY N²(2M + 1)² times the basic operation. But many of these operations are repeated when computing the flow in neighbouring points, so there is redundancy, and the estimates of neighbouring points are strongly correlated. We found that there is no accuracy loss in computing the flow only at a reduced density, X/2^S × Y/2^S, and then adding the missing estimates at the immediately finer resolution, X/2^(S−1) × Y/2^(S−1), limiting the search space to the range of the displacement values computed in the neighbouring points at the coarser resolution. The complexity reduction


Fig. 5. 2D and 1D + 1D minimisation.

depends on the flow variations, but is usually relevant due to strong flow continuity.

5.2.6. Avoiding repeated operations

As suggested before, when the flow is computed at a density such that overlapping windows are used to compute the flow at different points, some operations are repeated if the usual algorithm is applied to each pixel. It is however possible to avoid this waste of time with a fast algorithm that eliminates all the repeated operations by computing partial sums for each tentative displacement and storing them in memory. The theoretical complexity is thus drastically reduced. With the correlation estimate repeated for each point, we usually have

$$(2d+1)^2\, N^2 (2M+1)^2$$

calls to the LUT and an equal number of additions (here N is the image side, 2M + 1 the window size and d the search radius). The fast algorithm works as follows: first, partial sums over horizontal segments of the window x-size are computed, simply by adding the entering pixel and subtracting the leaving one. The second step consists of repeating the procedure vertically, adding the partial sums over segments of the window size 2M + 1.

These two steps are repeated for each tentative displacement, and all the distances are then computed with

$$(2d+1)^2\, N^2$$

calls to the LUT and

$$(2d+1)^2 \big(2N(2M+1) + 2N^2\big)$$

additions or subtractions. When N ≫ M the algorithm should be faster by a factor of the order of the squared window size, which is usually about 10². In the experiments the time saving is not as large (it is a factor of 10–20 for N = 256 and M = 12) due to the memory management of the machine.
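A sketch of the same idea for dense SSD maps (one map per tentative displacement): the per-pixel squared differences are accumulated with cumulative sums, so every window total costs a constant number of additions instead of (2M+1)² of them. For brevity the sketch uses a 2D cumulative-sum (integral image) formulation rather than the explicit add-one-pixel/subtract-one-pixel recursion described above; values closer than M + D to the border are not meaningful, and the function name and parameters are illustrative.

import numpy as np

def dense_ssd_maps(I1, I2, D=4, M=12):
    """SSD of the (2M+1) x (2M+1) window at every pixel, for every shift in [-D, D]^2."""
    H, W = I1.shape
    maps = {}
    for dy in range(-D, D + 1):
        for dx in range(-D, D + 1):
            # per-pixel squared difference for this displacement (wrap-around at borders)
            diff2 = (I1 - np.roll(I2, (-dy, -dx), axis=(0, 1))) ** 2
            # 2D cumulative sum, padded so that S[y, x] = sum of diff2[:y, :x]
            S = np.pad(diff2.cumsum(axis=0).cumsum(axis=1), ((1, 0), (1, 0)))
            ssd = np.full((H, W), np.inf)
            k = 2 * M + 1
            ssd[M:H - M, M:W - M] = S[k:, k:] - S[:-k, k:] - S[k:, :-k] + S[:-k, :-k]
            maps[(dx, dy)] = ssd
    return maps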

    5.3. Sub-pixel precision

The motion to be estimated is, for most image sequences, small and not integer. On the contrary, the motion estimated with correlation is quantised. It is therefore useful to add to the algorithms some procedure to obtain a precision not limited by the pixel dimension. Techniques to obtain this result are used in the correlation algorithms tested in the work of Barron et al. [2]. We propose similar and new methods that we have tested in our experiments.

5.3.1. Anandan's algorithm

Anandan [1] used SSD correlation to compute the flow with a multi-scale approach and very small windows (3 × 3). He then approximated the surface of the distance function with a quadratic surface, generating a potential where the continuous SSD approximation is added to another term depending on velocity smoothness. This approach then requires an iterative minimisation of the potential and is therefore computationally heavy.

5.3.2. Weighted average of displacements

Another possibility consists of computing the non-integer displacement as a weighted sum of the displacements for which the distance measure has been computed, using the distance values to calculate the weights. Singh's algorithm uses the SSD correlation on 7 × 7 windows, computing the correlation on three consecutive images im_{-1}, im_0, im_{+1} and minimising the distance:

$$SSD(\vec{x}, \vec{d};\, im_{-1}, im_0, im_{+1}) = SSD(\vec{x}, -\vec{d};\, im_{-1}, im_0) + SSD(\vec{x}, \vec{d};\, im_0, im_{+1}) \qquad (7)$$

Then a weight function is built:

$$R(\vec{x}, \vec{d}) = e^{-k\, SSD(\vec{x}, \vec{d})} \qquad (8)$$

where k = −ln(0.95)/min(SSD(im_{-1}, im_0, im_{+1})), and the sub-pixel displacement \vec{v}(\vec{x}) = (u(\vec{x}), v(\vec{x})) is given by:

$$u(\vec{x}) = \frac{\sum_{\vec{d}} R(\vec{d})\, d_x}{\sum_{\vec{d}} R(\vec{d})} \qquad (9)$$

$$v(\vec{x}) = \frac{\sum_{\vec{d}} R(\vec{d})\, d_y}{\sum_{\vec{d}} R(\vec{d})} \qquad (10)$$

This method, like other similar ones, gives good results, even if it is not theoretically well founded and is computationally heavy. We implemented a simplified technique of this kind by performing a similar weighted sum of the displacements in a ±1 neighbourhood N of the displacement corresponding to the minimum SSD:

$$u(\vec{x}) = \frac{\sum_{\vec{d} \in N} R(\vec{d})\, d_x}{\sum_{\vec{d} \in N} R(\vec{d})} \qquad (11)$$

$$v(\vec{x}) = \frac{\sum_{\vec{d} \in N} R(\vec{d})\, d_y}{\sum_{\vec{d} \in N} R(\vec{d})} \qquad (12)$$
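A sketch of this simplified refinement (Eqs. (8), (11) and (12)); `ssd_of(dx, dy)` returns the window distance for one tentative shift, d0 is the integer minimum found by the matching, and the default choice of k follows the definition given above.

import numpy as np

def subpixel_weighted(ssd_of, d0, k=None):
    """Weighted average of the displacements in the 3 x 3 neighbourhood of d0."""
    neigh = [(d0[0] + i, d0[1] + j) for i in (-1, 0, 1) for j in (-1, 0, 1)]
    vals = np.array([ssd_of(dx, dy) for dx, dy in neigh], dtype=float)
    if k is None:
        k = -np.log(0.95) / max(vals.min(), 1e-12)   # k = -ln(0.95) / min(SSD)
    R = np.exp(-k * vals)                            # weights, Eq. (8)
    dxs = np.array([d[0] for d in neigh], dtype=float)
    dys = np.array([d[1] for d in neigh], dtype=float)
    return float((R * dxs).sum() / R.sum()), float((R * dys).sum() / R.sum())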

5.3.3. Interpolation

Another typical method to obtain sub-pixel precision is to interpolate the signal in order to have an image value also at non-integer pixel positions. We have developed an algorithm that introduces gray level values at non-integer coordinates by interpolating neighbouring values, and then corrects the best integer value by searching for the best match of the finer image around that value.


5.3.4. The mixed algorithm

The last method we propose is completely new and consists of a combination of the classical integer matching and the Lucas–Kanade differential technique. It consists of computing the integer part of the motion vector with the correlation method, and then computing corrections to this value by using the differential method on the locally warped sequence, obtained by shifting the neighbourhood of the point in the previous and successive images by the computed integer motion. In detail, let us call $\vec{V}(\vec{x}) = (U(\vec{x}), V(\vec{x}))$ the computed integer vector and $W(\vec{x})$ a small window (e.g. of 9 × 9 pixels) around the considered point $\vec{x}$. The non-integer correction $\vec{c}(\vec{x}) = (c_x(\vec{x}), c_y(\vec{x}))$ is computed by solving the over-constrained system:

$$E_x(i, j)\, c_x(\vec{x}) + E_y(i, j)\, c_y(\vec{x}) + E'_t(i, j) = 0, \qquad (i, j) \in W \qquad (13)$$

where E'_t is the shifted temporal derivative:

$$E'_t = \frac{E\big(x + U(\vec{x}),\, y + V(\vec{x}),\, t+1\big) - E\big(x - U(\vec{x}),\, y - V(\vec{x}),\, t-1\big)}{2} \qquad (14)$$

If the least-squares solution is considered reliable, i.e. if the residual of the least-squares fit

$$Q_W(\vec{x}) = \sum_{(i,j) \in W(\vec{x})} \big[E_t(i, j) + E_x(i, j)\, c_x(\vec{x}, t) + E_y(i, j)\, c_y(\vec{x}, t)\big]^2 / N_W \qquad (15)$$

(where N_W is the number of pixels inside the window) is low, the integer vector is corrected and the best velocity estimate becomes:

$$\vec{v} = \vec{V} + \vec{c} \qquad (16)$$

This method is effective because the differential technique is fast and gives good estimates for corrections of less than one pixel (differential techniques are not reliable for large inter-frame motions because of aliasing [13]).
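A sketch of the correction step (Eqs. (13)–(16)), under the assumption that the integer vector (U, V) is already available and that the spatial derivatives are taken on the average of the two warped windows; the window half-size and the residual threshold are illustrative choices.

import numpy as np

def differential_correction(Iprev, Inext, x, y, U, V, M=4, max_res=1.0):
    """Sub-pixel correction of the integer vector (U, V) at (x, y) by a
    Lucas-Kanade style least-squares fit on the locally warped pair."""
    # window in the previous frame shifted backwards and in the next frame shifted
    # forwards by the integer motion, so that only the residual motion remains
    wp = Iprev[y - V - M:y - V + M + 1, x - U - M:x - U + M + 1].astype(float)
    wn = Inext[y + V - M:y + V + M + 1, x + U - M:x + U + M + 1].astype(float)
    Et = (wn - wp) / 2.0                              # shifted temporal derivative, Eq. (14)
    Ex = np.gradient((wp + wn) / 2.0, axis=1)         # spatial derivatives of the mean window
    Ey = np.gradient((wp + wn) / 2.0, axis=0)
    A = np.stack([Ex.ravel(), Ey.ravel()], axis=1)
    b = -Et.ravel()
    c, _, _, _ = np.linalg.lstsq(A, b, rcond=None)    # least-squares solution of Eq. (13)
    Q = ((A @ c - b) ** 2).sum() / b.size             # residual per pixel, Eq. (15)
    if Q > max_res:
        return float(U), float(V)                     # correction not reliable: keep integer vector
    return U + c[0], V + c[1]                         # Eq. (16)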

    5.4. Post-processing

Other post-processing techniques can be useful to improve the flow accuracy. If a reliable confidence measure is provided by the optical flow algorithm, a non-linear filtering able to correct bad estimates can be introduced. As a confidence measure for correlation we consider the ratio between the minimum distance value (or the reciprocal of the correlation value) and the average distance value in the search space S(\vec{x}):

$$Q(\vec{x}) = \frac{\min_{\vec{d} \in S(\vec{x})} D(\vec{x}, \vec{d})}{\big\langle D(\vec{x}, \vec{d}) \big\rangle_{\vec{d} \in S(\vec{x})}} \qquad (17)$$

Using this function it is possible to implement, for example, a multi-window filter ([13,17]) or regularising filters performing weighted averages and possibly preserving the velocity edges ([13]).

    6. Experimental results

To test the algorithms, we computed optical flows on synthetic or calibrated image sequences with the true displacements known at every pixel location. We measured the average difference between the computed flow \vec{v} = (u, v) and the true motion \vec{v}' = (u', v') using the angular distance introduced by Barron et al. [2]:

$$\mathrm{dist}(\vec{v}, \vec{v}\,') = \arccos\!\left(\frac{u u' + v v' + 1}{\sqrt{(|\vec{v}|^2 + 1)(|\vec{v}\,'|^2 + 1)}}\right) \qquad (18)$$
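A sketch of this error measure (Eq. (18)), returning degrees, which is how the tables below report it.

import numpy as np

def angular_error_deg(u, v, u0, v0):
    """Barron angular error, Eq. (18), between the estimated flow (u, v) and the
    true flow (u0, v0); inputs may be scalars or arrays of the same shape."""
    num = u * u0 + v * v0 + 1.0
    den = np.sqrt((u ** 2 + v ** 2 + 1.0) * (u0 ** 2 + v0 ** 2 + 1.0))
    return np.degrees(np.arccos(np.clip(num / den, -1.0, 1.0)))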

    6.1. Comparison between similarity measures

Even if matching algorithms based on different similarity/distance measures are widely used both for motion estimation and disparity computation, few works analysing their performance can be found in the literature. Further, those


Table 3
Comparison between distance measures over the MJ sequence. Legend: SSD, sum of squared differences; ZSSD, zero-mean sum of squared differences; LSSD, locally scaled sum of squared differences; SAD, sum of absolute differences; ZSAD, zero-mean sum of absolute differences; LSAD, locally scaled sum of absolute differences; NCC, normalised cross-correlation; ZNCC, zero-mean normalised cross-correlation; RANK, rank transform; CENSUS, census transform; EDG(HAMMING), Hamming distance on the binary edge image; EDG(HAUSD.), Hausdorff distance on the binary edge image

Measure        Average error   Standard deviation   Time/time(RANK)
SSD            6.0             20.5                 2.5
ZSSD           6.1             20.6                 4.1
LSSD           6.1             20.5                 4.5
SAD            3.6             17.6                 2.5
ZSAD           4.4             17.3                 4.5
LSAD           4.3             17.9                 4.8
NCC            6.1             20.5                 2.7
ZNCC           6.2             20.6                 4.9
RANK           4.1             19.0                 1.0
CENSUS         16.8            24.7                 4.6
EDG(HAMMING)   3.7             17.7                 1.9
EDG(HAUSD.)    3.7             17.7                 1.9

works often present results obtained only on simple or synthetic images. As a first experimental test, we have therefore compared the accuracy of the displacements estimated with different measures on several image sequences. In the case of rich texture and integer displacements, the results are, as expected, accurate with all the considered measures. The only interesting comparison can then be carried out on the execution times. A good analysis of measure performance obtained by adding controlled noise to similar images is presented in [15]. However, real images are corrupted by other noise sources and present other problems due to perspective effects and motion discontinuities.

In order to verify the execution time and at the same time analyse the accuracy near discontinuities, we have generated a synthetic image sequence with integer inter-frame displacements. The MJ sequence represents the superposition of a textured circle over a differently textured background. The circle translates with constant speed (2, 3) pixels/frame, while the background translates with speed (−1, 0) pixels/frame. Poor texture and discontinuities create problems even if there is no added noise (Table 3, Fig. 6).

In this case no speeding-up algorithms are applied and the (relative) time values are approximate. All the distances


Fig. 6. The MJ sequence: (a) central frame with the true motion superimposed; (b) image motion estimated with SAD correlation, the classical algorithm giving the best results; (c) edge map extracted from the central image; (d) optical flow computed on the edge images with the Hamming distance (only one arrow every 12 × 12 pixels is shown for clear visualisation; zero-length arrows are not represented).

Table 4
Comparison between different correlation/distance measures on the 256 × 256 Marbled Block sequence (see the caption of Table 3 for the distance definitions)

Distance       Average error   Standard deviation
SSD            20.7            10.8
ZSSD           20.7            11.1
LSSD           20.7            11.1
SAD            20.6            10.7
ZSAD           20.2            10.7
LSAD           20.2            10.7
NCC            20.7            11.0
ZNCC           20.9            11.1
RANK           21.4            11.7
CENS           25.2            26.2
EDG(HAMMING)   41.0            28.2
EDG(HAUSD.)    41.0            28.2

Table 5
Comparison between different correlation/distance measures on the Yosemite Valley sequence (see the caption of Table 3 for the distance definitions)

Distance   Average error   Standard deviation
SAD        12.6            10.1
ZSAD       10.5            9.3
LSAD       10.3            10.1
SSD        12.7            9.3
ZSSD       10.9            8.3
LSSD       10.5            9.0
NCC        10.4            8.8
ZNCC       10.0            8.5
RANK       18.6            19.1
CENS       25.2            26.2

Fig. 7. The main difference between the classical measures is found when global variations of brightness are present: (a) SSD cannot provide good results for the sky of the Yosemite Valley sequence; (b) NCC gives a correct estimate also in that region (see Table 5; only one arrow every 8 × 8 pixels is shown for clear visualisation; zero-length arrows are not represented).

provide good results with the exception of a few points near the motion discontinuity. Algorithms using reduced information are accurate too and, as expected, faster. But when the sequence becomes more realistic, these last techniques fail. The Reduced Marbled Block sequence represents a real motion of objects with rich texture. The results obtained are shown in Table 4. In this case algorithms based on reduced information are not accurate. The best one, based on the Rank transform, provides results that are worse than those obtained with the worst classical method.

Yosemite Valley is a synthetic sequence widely used to analyse the accuracy of optical flow estimators over realistic images. In fact it presents many problems such as perspective effects, non-integer displacements and global variations of brightness. Table 5 shows the results obtained computing the flow at a reduced density (1 pixel every 4 × 4) with 25 × 25 masks, sub-sampled by a factor of 4 (Figs. 7 and 8).

We can conclude that, for practical applications, standard measures based on gray levels (SAD, SSD, ZSAD, ZSSD, LSAD, LSSD, NCC, ZNCC) are the best choice and their performances are similar. Only in the case of global brightness variations, as in the sky of the Yosemite Valley sequence, does the performance of the non-normalised measures become bad. Algorithms based on normalised measures, on the other hand, require a large number of additional operations and are therefore slower. SAD and SSD, depending only on local gray level values, can be computed more efficiently by using look-up tables.

    6.2. Window size

Table 6 shows the results obtained on the Yosemite Valley sequence changing the window size and keeping the other parameters fixed (SSD distance, Gaussian filtering with σ = 1.5, density 4). We compared the average angular distances and the execution times. It seems that a window size of 25 × 25 pixels is a good tradeoff between accuracy and speed. Where not explicitly indicated otherwise, we have always used windows of this size.

The window size can also be made adaptive in order to use more pixels where the local information is poor and fewer where it is rich. The results we obtained, however, do not seem good enough to compensate for the increased computation time. We used the determinant of the matrix of the gray level derivatives as a measure of the local information. Starting from an initial window size of 15 × 15, we increased the size of the window until the information measure reached a previously fixed threshold. The accuracy obtained on our test images was not better than that obtained with the fixed 21 × 21 windows, but the computation time was higher (Table 7).

6.3. Image filtering

In order to verify the improvement in flow accuracy due to image filtering before processing, let us analyse the results obtained on the same sequence keeping the other parameters fixed (25 × 25 mask, sub-sampled with step 4, density 4, search space (−4, 4), SSD distance) and changing only the value of σ in the Gaussian filter (Table 8).

We therefore used σ = 1.5 as the default choice, a value that seems to give optimal results.

    6.4. Speeding up the computation

6.4.1. Mask sub-sampling

Mask sub-sampling has already been introduced in the previous section, where it was stated that it does not strongly affect the accuracy. We now demonstrate this fact by analysing the quality deterioration as a function of the sub-sampling step. Fixing the other parameters, we computed the difference between true and estimated


Fig. 8. (a) and (b): flows obtained with 9 × 9 and 41 × 41 windows (SSD). In both cases the accuracy is lower than that obtained with 25 × 25 windows (see Table 6; only one arrow every 12 × 12 pixels is shown for clear visualisation; zero-length arrows are not represented).

Table 6
Comparison of results obtained on the Yosemite Valley sequence using differently sized windows

Window size   Average error   Standard deviation   Time/time(9 × 9)
9 × 9         17.58           19.05                1.0
15 × 15       13.09           13.58                2.6
21 × 21       11.80           7.89                 4.2
25 × 25       11.36           7.67                 5.7
33 × 33       11.79           7.61                 10.1
41 × 41       12.51           9.07                 13.6

displacements on the Yosemite sequence (25 × 25 mask, Gaussian filtering with σ = 1.5, density 4), changing the sampling rate of the windows. The average errors obtained are in Table 9. It is evident that the computation of the best match can be performed faster by computing the distance only for a sub-set of the mask points: reducing the operations by a factor of 16 does not affect the accuracy of the results.

6.4.2. Fast minimisation

Still using ZNCC, we tested the effectiveness of our fast search strategies, evaluating the average error introduced as a function of the time saving, computed simply as the average running time of the program on the same hardware. For the Yosemite Valley sequence, the 1D + 1D method, which gives good results for very simple images [7], introduces too large an error, while the coarse-to-fine search strategy is effective in reducing the time without introducing a large error.

Table 10 includes the results (SSD, σ = 1.5, 25 × 25 mask, sub-sampled with step 4, density 4).

6.4.3. Dense flows: multi-resolution

The effectiveness of the multi-scale minimisation depends on the extent of the local variations of the flow. Discontinuities, however, are usually found at few image locations, so the time saving is considerable. Table 11 reports the time decrease as a function of the number of scales used for the minimisation, using the SSD distance, 25 × 25 windows sub-sampled with step 4, final flow density equal to 1, and Gaussian filtering of the images with σ = 1.5.

The average error on the flow does not change at all, but the time is drastically reduced.

6.5. Look-up tables and repeated operations

Replacing the computation of the squared differences of SSD with the retrieval of a look-up table value yields, for 25 × 25 windows sub-sampled by a factor of 4, a time saving of a factor of 2.5. When the algorithm that eliminates all the repeated operations is introduced, the time is reduced by a factor of 15 for a 256 × 256 image and a 25 × 25 non-sub-sampled window. Of course, in this case there is no loss in accuracy.

The time saving is not as large as expected from the theoretical analysis because of memory management.

    6.6. Non-integer correction

Finally, we tested the performance of the three algorithms presented in Section 5 to refine the precision of the flow to non-integer values. We used the Barron angular distance to compare the performance of the weighted sum, interpolation and differential correction. Table 12 reports the results obtained on the Translating Tree and Diverging Tree sequences. The flow density is always equal to 100%. The differential algorithm gives the best results both in accuracy and time (Figs. 9 and 10).

In the case of the Yosemite Valley we used the NCC distance instead of the usual SSD to obtain better performance in the sky region, where the global brightness changes. The results are reported in Table 13.

The accuracy obtained is very good, especially considering that the flow density is equal to 100% and that the sequence used includes the clouds.


Table 7
The adaptive window algorithm we tested did not yield good results

Window     Average error   Standard deviation   Time/time(15 × 15)
Variable   12.8            13.5                 3.2
15 × 15    13.1            13.6                 1.0
21 × 21    11.8            7.9                  4.2

Table 8
Changes in accuracy due to variations of the standard deviation of the Gaussian filter used for pre-processing. The values are the average errors of estimates on the Yosemite Valley sequence

σ     Average error   Standard deviation
0     14.9            14.9
0.5   14.8            14.4
1.0   13.3            11.7
1.5   12.8            10.1
2.0   12.8            10.0
2.5   13.8            11.7

Table 9
Effect of mask sub-sampling on the average precision and the computational speed

Step   Average error   Standard deviation   Time/time(step 8)
8      14.9            14.5                 1
6      11.9            7.8                  1.2
4      11.7            7.8                  1.8
3      11.8            8.1                  2.5
2      11.3            7.7                  4.7
1      11.4            7.7                  17.2

Table 10
Results obtained with different search strategies: the 1D + 1D method by Ancona and Poggio provides bad results; a 2D multi-grid strategy is slightly slower but much more precise

Algorithm   Average error   Standard deviation   Time/time(1D + 1D)
1D + 1D     30.2            28.0                 1
3 scales    15.6            18.2                 1.4
2 scales    13.4            10.9                 1.8
Full        12.8            10.1                 3.2

6.7. Real world sequences

In order to show the robustness of the proposed algorithms, we applied the multi-scale, corrected and fast flow estimation to real world sequences. We have chosen examples where the usual differential algorithms perform badly due to noise or large displacements. Fig. 11 shows an image from a sequence taken by a camera mounted on a car. The optical flow superimposed on the image is computed on 32 × 32 windows with two resolutions and differential correction. The flow vectors computed are often very good even if the texture is poor. The computed flow can be effectively used to estimate the car speed and to detect obstacles and other vehicles on the road, as pointed out in [18], and this means that the estimate of the optical flow is precise. With the same parameters, we have computed the flow on a sequence of METEOSAT images. The result, shown in Fig. 12 superimposed on the corresponding image, shows a good behaviour, and it is possible to think of applications of these algorithms in weather forecasting, computing the future position and deformation of clouds from the flow values in the past. Also on a very noisy ultrasound medical image it is possible to obtain a good estimate of the optical flow. Fig. 13 shows the flow computed on a sequence showing the left ventricle in the diastolic phase. The optical flow seems reasonable, and it is therefore not surprising that we used the fast correlation algorithm to help the contour tracking of the left ventricle described in Ref. [19].

    6.8. Flowtool

All the algorithms implemented can be executed from a user-friendly interface (Fig. 14), Flowtool, that we have developed during the tests. All the options described in this paper for the correlation can be selected with the


Table 11
Speeding up the flow estimate by using a multi-scale minimisation does not introduce a relevant error for the Yosemite Valley sequence

Density   Average error   Standard deviation   Time/time(5)
5         11.7            9.1                  1
4         11.7            9.2                  1.2
3         11.7            9.2                  1.6
2         11.7            9.2                  3.5
1         11.7            9.2                  11.3

Table 12
Precision of the non-integer flows computed with different correction algorithms on the Translating Tree (top) and Diverging Tree (bottom) sequences: the differential correction is clearly the best one

Algorithm       Average error   Standard deviation
Integer         1.12            0.67
Weighted sum    1.05            0.54
Interpolation   1.15            1.24
Differential    0.44            0.45
Diff (reg.)     0.34            0.39

Algorithm       Average error   Standard deviation
Integer         18.28           8.32
Weighted sum    15.29           7.91
Interpolation   10.39           5.36
Differential    3.65            3.27
Diff (reg.)     2.24            2.05

Fig. 9. Results obtained with different techniques for non-integer approximation on the Diverging Tree sequence (only one arrow every 8 × 8 pixels is shown for clear visualisation; zero-length arrows are not represented). (a) Integer estimate (SSD). (b) Weighted sum. (c) Interpolation. (d) Differential.

Fig. 10. Optical flows obtained on the Yosemite Valley sequence (only one arrow every 8 × 8 pixels is displayed for clear visualisation). (a) ZNCC + interpolation; (b) ZNCC + differential correction.

appropriate menu, and the user can also compute the flow with differential algorithms, compute flow differences, and display and print images and flows. It has been developed using the X Window System and the Sun XView libraries, and executable code for Sun SparcStation is available on the net from the web page http://www.crs4.it/~giach.

    7. Discussion

The use of techniques based on pattern matching performed on image regions at different times is common in practical applications, but only a few algorithms of this kind are considered in the literature reviews [2]. In this paper several variations of this kind of algorithm and some tricks to reduce the major drawbacks of the method (i.e. the computational complexity and the integer values of the estimates) have been presented. All the solutions proposed have been tested on the test images used by the optical flow research community and the results obtained are quite interesting. The accuracy of the flow estimated with the best correlation-based techniques, especially the one obtained with the new mixed technique proposed, seems to be extremely good even when compared with the outputs of the best differential or energy-based algorithms presented in Ref. [2]. The research and the tests performed also provided other interesting information, showing clearly, for example, that the usual distance measures are better than non-parametric ones for complex images, that the gray-level normalisation of the distance is useful only when global variations of brightness are present, and that the computational cost can be reduced without effects on the flow accuracy. Further,


Table 13
Results obtained on the Yosemite Valley sequence: notice that the flow density is 100% and the clouds are not removed

Algorithm       Average error   Standard deviation
Integer (NCC)   13.54           13.35
Weighted sum    14.15           13.40
Interpolation   8.70            10.91
Differential    5.73            9.43
Diff (reg.)     4.86            10.22

Fig. 11. Optical flow computed on the car sequence, superimposed on the corresponding image. Only one arrow every 8 × 8 pixels is shown for clear visualisation; zero-length arrows are not represented.

Fig. 12. Optical flow computed on the Meteosat sequence, superimposed on the corresponding image. Only one arrow every 8 × 8 pixels is shown for clear visualisation; zero-length arrows and low confidence estimates are not represented.

Fig. 13. Optical flow computed on the ultrasound heart sequence, superimposed on the corresponding image. Only one arrow every 8 × 8 pixels is shown for clear visualisation; zero-length arrows and low confidence estimates are not represented.

results obtained with our algorithms on real world sequences are presented in order to demonstrate that, when the texture is poor, the inter-frame motion is relevant or the time frequency is high, the use of correlation-based algorithms provides results that are more accurate than those obtained with differential techniques and makes practical applications on cluttered images possible [9–11,18,19].

All the algorithms tested have been included in the Flowtool X-Window-based toolkit, available on the Internet (http://www.crs4.it/~giach) as Sun SparcStation executable code.

    Acknowledgements

Special thanks to Pietro Parodi for reading the manuscript and for useful hints, and to Marco Cappello and Marco Campani for technical help. Thanks also to Vincent Torre and Alessandro Verri for useful discussions.

    References

[1] P. Anandan, A computational framework and an algorithm for the measurement of visual motion, Int. J. Comput. Vision 3 (1989) 283–310.
[2] J.L. Barron, D.J. Fleet, S.S. Beauchemin, Performance of optical flow techniques, Int. J. Comput. Vision 12 (1) (1994) 43–77.
[3] B.K.P. Horn, B.G. Schunck, Determining optical flow, Artificial Intelligence 17 (1981) 185–203.
[4] B. Lucas, T. Kanade, An iterative image registration technique with an application to stereo vision, Proc. DARPA Image Understanding Workshop (1981) 121–130.
[5] H. Nagel, Displacement vectors derived from second-order intensity variations in image sequences, Comput. Vision Graph. Image Process. 21 (1983) 85–117.
[6] A. Singh, An estimation-theoretic framework for image flow computation, Proc. Third ICCV, Osaka, 1990, pp. 168–177.
[7] N. Ancona, T. Poggio, Optical flow from 1D correlation: application to a simple time-to-crash detector, Int. J. Comput. Vision 14 (1995) 2.
[8] W. Enkelmann, Obstacle detection by evaluation of optical flow fields from image sequences, Image and Vision Computing 9 (1991) 160–168.
[9] A. Giachetti, M. Campani, V. Torre, The use of optical flow for autonomous navigation, Proc. Eur. Conf. Comp. Vision 3 (1994) 1.
[10] A. Giachetti, M. Campani, V. Torre, The use of optical flow for intelligent cruise control, Proc. Intelligent Vehicles, Paris, 1994.
[11] A. Giachetti, G. Gigli, V. Torre, Computer assisted analysis of echocardiographic image sequences, Proc. CVRMed, Nice, 1995, pp. 267–271.
[12] P. Aschwanden, W. Guggenbuhl, Experimental results from a comparative study on correlation-type registration algorithms, in: Förstner, Ruwiedel (Eds.), Robust Computer Vision, Wichmann, 1992, pp. 268–287.
[13] A. Giachetti, V. Torre, Refinement of optical flow estimation and detection of motion edges, Proc. ECCV'96, Cambridge, UK, April 1996.


    Fig. 14. The user-friendly tool realized for motion analysis.

[14] D.P. Huttenlocher, E.W. Jaquith, Computing visual correspondence: incorporating the probability of a false match, Proc. ICCV'95, Cambridge, MA, 1995, pp. 515–522.
[15] R. Zabih, J. Woodfill, Non-parametric local transforms for computing visual correspondence, Proc. ECCV'94, Stockholm, vol. 2, 1994, pp. 151–158.
[16] T. Kanade, M. Okutomi, A stereo matching algorithm with an adaptive window: theory and experiment, IEEE Trans. PAMI 16 (9) (1994) 920–932.
[17] F. Bartolini, V. Cappellini, C. Colombo, A. Mecocci, Multiwindow least-squares approach to the estimation of optical flow with discontinuities, Opt. Engng 32 (6) (1993) 1250–1256.
[18] A. Giachetti, M. Campani, V. Torre, The use of optical flow for road navigation, IEEE Trans. Robotics Automat. 14 (1998) 34–48.
[19] A. Giachetti, On-line analysis of echocardiographic image sequences, Med. Image Anal. 2 (3) (1998) 261–284.
