Total variation regularization of local-global optical flow

Marius Drulea, Sergiu Nedevschi

Abstract— Additional data fidelity terms in variational optical flow methods improve the robustness of the estimation. A robust and anisotropic smoother enhances the fill-in process. This work presents a combined local-global (CLG) approach with total variation regularization. The combination of bilateral filtering and anisotropic (image-driven) regularization is used to control the propagation phenomena. The resulting method, CLG-TV, is able to compute larger displacements in a reasonable time. The numerical scheme is highly parallelizable and runs in real time on current-generation graphics processing units.

I. INTRODUCTION

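Motion estimation plays an important role in computer vision applications. Two of the best-known and simplest to analyze methods were developed by B.K.P. Horn and B.G. Schunck [1] and by B. D. Lucas and T. Kanade [2]. The Horn-Schunck (HS) model proposes a variational approach to optical flow estimation:

$$E_{HS} = \int_\Omega \left( \lambda \cdot r^2(u,v) + |\nabla u|^2 + |\nabla v|^2 \right) d\mathbf{x} \tag{1}$$

$E_{HS}$ is the error functional to be minimized, $\lambda$ is the data fidelity factor and $r(u,v)$ is the residual between the images, defined below:

$$r(u,v) := I_2(\mathbf{x}+\mathbf{u}) - I_1(\mathbf{x}) \approx I_2(\mathbf{x}+\mathbf{u}_0) + \nabla I_2(\mathbf{x}+\mathbf{u}_0)^T \cdot (\mathbf{u}-\mathbf{u}_0) - I_1(\mathbf{x}) = I_t + I_{2w}^x \cdot (u-u_0) + I_{2w}^y \cdot (v-v_0) \tag{2}$$

where $\mathbf{x} = (x,y)$, $\mathbf{u} = (u,v)$ and $\mathbf{u}_0 = (u_0,v_0)$ is an initial estimate. $I_{2w}(\mathbf{x}) := I_2(\mathbf{x}+\mathbf{u}_0)$ denotes the second image warped by the initial estimate, $I_{2w}^x$ and $I_{2w}^y$ are its derivatives, and $I_t := I_{2w} - I_1$.

The terms of second order and higher are ignored. The above approximation is therefore valid only for small displacements relative to the initial estimate. The minimizer of $E_{HS}$ is calculated using the Euler-Lagrange equations and an iterative scheme such as Jacobi (other solvers can be used).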
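The limited validity of the linearization in (2) can be illustrated on a 1D signal. The sketch below is only an illustration: the sinusoidal frames, the value of `TRUE_SHIFT` and all function names are assumptions that do not come from the paper. It compares the exact residual with its first-order approximation around an initial estimate `u0`.

```python
import math

TRUE_SHIFT = 0.3  # assumed displacement between the two frames


def image1(x):
    # First frame: a smooth 1D intensity profile (illustrative choice).
    return math.sin(x)


def image2(x):
    # Second frame: the first frame translated by TRUE_SHIFT.
    return math.sin(x - TRUE_SHIFT)


def image2_dx(x):
    # Analytic spatial derivative of the second frame.
    return math.cos(x - TRUE_SHIFT)


def exact_residual(x, u):
    # r(u) = I2(x + u) - I1(x), without any approximation.
    return image2(x + u) - image1(x)


def linearized_residual(x, u, u0):
    # First-order Taylor expansion of I2 around x + u0, as in (2).
    return image2(x + u0) + image2_dx(x + u0) * (u - u0) - image1(x)


def linearization_error(x, u, u0):
    return abs(exact_residual(x, u) - linearized_residual(x, u, u0))


if __name__ == "__main__":
    for u in (0.05, 0.3, 2.0):
        print(f"u = {u:.2f}  error = {linearization_error(1.0, u, 0.0):.4f}")
```

The error grows quickly once `u` moves away from `u0`, which is why the implementation later re-linearizes around updated estimates instead of relying on a single expansion.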

Manuscript received April 10, 2011. This work was partly supported by the research project INTERSAFE-2 and partly by CNCSIS – UEFISCSU, project number PNII – IDEI 1522/2008. INTERSAFE-2 is part of the 7th Framework Programme, funded by the European Commission. The partners of INTERSAFE-2 thank the European Commission for supporting the work of this project.

M. Drulea is with the Computer Science Department, Technical University of Cluj-Napoca, Memorandumului Str. 28, 400114 Cluj-Napoca (e-mail: marius.drulea@cs.utcluj.ro).

S. Nedevschi is with the Computer Science Department, Technical University of Cluj-Napoca, Memorandumului Str. 28, 400114 Cluj-Napoca (phone: +40-264-401219; e-mail: [email protected]).

Lucas-Kanade (LK) is a local method that assumes that the pixels of a small region share the same flow. It is natural to minimize the total error over that region, $E_{LK}(u,v) = \sum_{region} w \cdot r^2(u,v)$, which is a least-squares problem. Here, $w$ represents the weighting factor. In order to find the minimizers of $E_{LK}$, the derivatives are set to zero. The resulting system is 2x2 with the following determinant:

$$\begin{vmatrix} \sum_{region} w \cdot (I_{2w}^x)^2 & \sum_{region} w \cdot I_{2w}^x \cdot I_{2w}^y \\ \sum_{region} w \cdot I_{2w}^x \cdot I_{2w}^y & \sum_{region} w \cdot (I_{2w}^y)^2 \end{vmatrix} \tag{3}$$

The system is singular or nearly singular (the determinant is zero or close to zero) if the image derivatives in the corresponding region are close to zero. Where the system is non-singular, i.e. the pixel is a relatively strong feature, the method performs well.
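As an illustration of the behavior described above, the following sketch solves the 2x2 Lucas-Kanade system for a single window with Cramer's rule and rejects windows whose determinant (3) is close to zero. The function name, the threshold `eps`, the zero initial estimate and the sample derivative values are assumptions chosen for the example, not part of the paper.

```python
def lucas_kanade_flow(ix, iy, it, weights, eps=1e-9):
    """Solve the weighted least-squares problem for one window.

    ix, iy, it are the spatial and temporal derivatives sampled in the
    window (initial estimate u0 = 0), weights are the factors w.
    Returns (u, v), or None when the 2x2 system is (nearly) singular.
    """
    a = sum(w * gx * gx for w, gx in zip(weights, ix))
    b = sum(w * gx * gy for w, gx, gy in zip(weights, ix, iy))
    c = sum(w * gy * gy for w, gy in zip(weights, iy))
    rhs_u = -sum(w * gx * gt for w, gx, gt in zip(weights, ix, it))
    rhs_v = -sum(w * gy * gt for w, gy, gt in zip(weights, iy, it))

    det = a * c - b * b  # the determinant (3)
    if abs(det) < eps:   # eps is an assumed threshold
        return None
    # Cramer's rule for [[a, b], [b, c]] [u, v]^T = [rhs_u, rhs_v]^T.
    u = (c * rhs_u - b * rhs_v) / det
    v = (a * rhs_v - b * rhs_u) / det
    return u, v


# Textured window: the gradients point in different directions.
ix = [1.0, 0.0, 1.0, 2.0]
iy = [0.0, 1.0, 1.0, -1.0]
true_u, true_v = 1.0, -0.5
# Temporal derivatives consistent with the linearized residual being zero.
it = [-(gx * true_u + gy * true_v) for gx, gy in zip(ix, iy)]
weights = [0.25] * 4
textured = lucas_kanade_flow(ix, iy, it, weights)

# Homogeneous window: all derivatives vanish, so the flow is undetermined.
flat = lucas_kanade_flow([0.0] * 4, [0.0] * 4, [0.0] * 4, weights)
```

In the textured window the synthetic flow is recovered, while the homogeneous window is reported as undetermined. This is the gap that the global smoothness term is meant to fill.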

A new variational approach results from combining LK and HS:

$$E_{CLG-HS} = \int_\Omega \left( \lambda \cdot \sum_{region} w \cdot r^2(u,v) + |\nabla u|^2 + |\nabla v|^2 \right) d\mathbf{x} \tag{4}$$

The combination aims to improve the robustness of the HS model [3]. The main reason to do this is to enhance the flow accuracy for very large displacements. Such situations occur in traffic scenarios, where the cars' speeds can be high. Moreover, the camera system may itself move in the opposite direction, so that the displacement of the cars in the images becomes even larger. Integrating LK into the HS model increases the number of data fidelity terms, which improves the behavior in large homogeneous areas with large flows.

The HS and CLG-HS models use the squared L2 norm and therefore have several disadvantages: they are sensitive to noise, they over-propagate the flow, they are not directionally selective and they do not preserve motion boundaries. In order to significantly reduce these problems, a new variational model is proposed and a parallel numerical scheme is presented. The next section presents the new model and its minimization procedure.

II. THE PROPOSED MODEL

The L1 norm is a better choice for modeling real problems involving discrete signals. Rudin, Osher and Fatemi originally suggested this and introduced total variation into the image processing domain [9]. The L2 norm is easy to analyze mathematically, but is sensitive to outliers. The L1 norm is robust in the presence of outliers, but is very difficult to analyze, especially in the case of functionals ([4],[9]). The Huber penalty function is also a good choice for robust estimation. In [6] an alternative to the HS model is proposed:

$$E_{TV-L1} = \int_\Omega \left( \lambda \cdot |r(u,v)| + |\nabla u| + |\nabla v| \right) d\mathbf{x} \tag{5}$$

(TV-L1 = total variation, L1). Note that the data term uses the L1 norm and the smoothness term represents the isotropic total variation. Since the functions composing the functional are not differentiable, the minimization is not trivial. The experiments reveal clearly more accurate results than HS.


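The above functional is decomposed into three parts, in order to match the minimization scheme proposed in [4]:

$$E_{TV-1} = \int_\Omega \left( \lambda \cdot |r(\hat{u},\hat{v})| + \frac{1}{2\theta}(u-\hat{u})^2 + \frac{1}{2\theta}(v-\hat{v})^2 \right) d\mathbf{x} \tag{6}$$

$$E_{TV-u} = \int_\Omega \left( \frac{1}{2\theta}(u-\hat{u})^2 + |\nabla u| \right) d\mathbf{x} \tag{7}$$

$$E_{TV-v} = \int_\Omega \left( \frac{1}{2\theta}(v-\hat{v})^2 + |\nabla v| \right) d\mathbf{x} \tag{8}$$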

While TV-1 is easily solved point-wise, the minimization of TV-u and TV-v is a difficult task. The Euler-Lagrange equations can be written for the last two functionals, but the resulting system is highly nonlinear. Their solution can be computed using the numerical scheme proposed in [4]. That scheme was originally developed to solve a denoising problem and belongs to the field of convex optimization ([4],[10]).

In order to increase the robustness of this model, a local-global combination is proposed:

$$E_{CLG-TV} = \int_\Omega \left( \lambda \cdot \sum_{region} w \cdot r^2(u,v) + |\nabla u| + |\nabla v| \right) d\mathbf{x} \tag{9}$$

As in CLG-HS, the idea is to use more data fidelity terms in order to increase the robustness. The data terms are squared and the smoothness terms are not. Therefore, the model uses a robust regularization. The main reason to keep the data fidelity terms squared is to keep the minimization scheme as simple as possible and to achieve real-time performance. The CLG-TV model developed so far is isotropic: it propagates the flow in all directions, regardless of local image properties. To improve the propagation, an anisotropic diffusion filter is considered. The effect of this filter is to reduce the propagation of the flow across image features (corners, edges) and to allow a stronger “fill-in” effect in untextured areas. The diffusion coefficient, called the diffusion tensor, should therefore depend on the image derivatives. Perona and Malik [11] pioneered the idea of anisotropic diffusion and proposed two functions for the diffusion coefficient:

$$D(|\nabla I|) = \frac{1}{1 + (|\nabla I| / K)^2} \quad \text{and} \quad D(|\nabla I|) = e^{-(|\nabla I| / K)^2},$$

where the constant $K$ controls the sensitivity to edges and is usually chosen experimentally. For better control, the following diffusion tensor is used to alter the CLG-TV model: $D(\nabla I) = e^{-\alpha \cdot |\nabla I|^{\beta}}$. The altered model now reads:

$$E_{CLG-TV} = \int_\Omega \left( \lambda \cdot \sum_{region} w \cdot r^2(u,v) + D \cdot |\nabla u| + D \cdot |\nabla v| \right) d\mathbf{x} \tag{10}$$
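The diffusion functions above can be evaluated per pixel from the image gradient. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the defaults for `alpha` and `beta` are the values the implementation section reports as good choices, and the gradient samples assume intensities in [0,1].

```python
import math


def perona_malik_rational(grad_mag, k):
    # First Perona-Malik function: 1 / (1 + (|grad I| / K)^2).
    return 1.0 / (1.0 + (grad_mag / k) ** 2)


def perona_malik_exponential(grad_mag, k):
    # Second Perona-Malik function: exp(-(|grad I| / K)^2).
    return math.exp(-((grad_mag / k) ** 2))


def diffusion_weight(grad_x, grad_y, alpha=5.0, beta=0.5):
    # Diffusion coefficient used by the altered model: exp(-alpha * |grad I|^beta).
    grad_mag = math.hypot(grad_x, grad_y)
    return math.exp(-alpha * grad_mag ** beta)


# Illustrative gradient values (assumed, intensities in [0, 1]).
flat_area = diffusion_weight(0.0, 0.0)
weak_edge = diffusion_weight(0.05, 0.0)
strong_edge = diffusion_weight(0.5, 0.5)
```

The weight equals 1 in flat areas, so the flow is propagated freely there, and decays towards 0 on strong edges, where propagation across the edge is suppressed.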

The local window representing the data fidelity is another source of over-propagation. Fig. 1 illustrates this phenomenon. In Fig. 1 a), the local window is centered in the moving region. In this case, the Lucas-Kanade method correctly computes the pixel’s flow. In Fig. 1 b), the center of the window is placed in the static region, so the pixel has no motion, but the top of the window lies in the moving region. Since the method tries to match the moving part of the window along with the static one, the pixel appears to have a motion, which is not true. Traditional methods use a Gaussian filter to weight the importance of the pixels in the local window, but this is not satisfactory in all cases, including the scenario shown in Fig. 1. The most elegant method to avoid such behavior is to use a bilateral filter.

a) The window is centered in the moving region; the pixel’s flow is correctly determined

b) The window is centered in the static region; the pixel’s flow is incorrectly determined

Fig. 1. Two effects of the local window

The bilateral filter ([13], [14]) combines two types of filters: a Gaussian (space) filter and a range (color, intensity) filter. The weight of a pixel decreases as its distance from the center increases. In addition, the weight decreases as the intensity of the pixel differs more from the center’s intensity.

$$BF[I]_p = \frac{1}{W_p} \sum_{q \in region} G_{\sigma_s}(\|p-q\|) \cdot G_{\sigma_r}(|I_p - I_q|) \cdot I_q \tag{11}$$

where $G_{\sigma_s}$ denotes the spatial Gaussian filter, $G_{\sigma_r}$ denotes the range (intensity) Gaussian filter and $1/W_p$ is the normalization factor. The product of the two Gaussian coefficients represents the weight of the pixel $q$ of the region. The expression is then normalized in order to have a “fair” filter: the coefficients should sum to 1. Fig. 2 shows the effect of the bilateral filtering. Fig. 2 c) shows the range filter for a pixel situated at the border of two regions with different intensities. Fig. 2 d) shows the combined kernel. The pixels with the same intensity as the center receive larger weights than the others. The output in Fig. 2 e) keeps the edge of the initial image in Fig. 2 a).
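The weighting in (11) can be sketched for a single pixel as follows. This is a minimal illustration with assumptions: the Gaussians are taken in the usual form exp(-x^2 / (2 sigma^2)), the function names are hypothetical, the defaults for `sigma_s` and `sigma_r` are the values quoted in the implementation section, and the 3x3 patch is made up to place the center next to an intensity edge.

```python
import math


def bilateral_weights(patch, sigma_s=5.0 / 6.0, sigma_r=0.1):
    """Normalized bilateral weights for the center pixel of a square patch."""
    radius = len(patch) // 2
    center = patch[radius][radius]
    raw = []
    for row in range(len(patch)):
        for col in range(len(patch)):
            dist2 = (row - radius) ** 2 + (col - radius) ** 2
            diff2 = (patch[row][col] - center) ** 2
            space = math.exp(-dist2 / (2.0 * sigma_s ** 2))  # spatial Gaussian
            rng = math.exp(-diff2 / (2.0 * sigma_r ** 2))    # range Gaussian
            raw.append(space * rng)
    total = sum(raw)  # W_p; never zero because the center contributes 1
    return [w / total for w in raw]


def bilateral_filter_center(patch, sigma_s=5.0 / 6.0, sigma_r=0.1):
    """Filtered value BF[I]_p for the center pixel of the patch."""
    weights = bilateral_weights(patch, sigma_s, sigma_r)
    values = [value for row in patch for value in row]
    return sum(w * value for w, value in zip(weights, values))


# The center pixel lies in the dark region, right next to a bright one.
patch = [
    [0.2, 0.2, 0.9],
    [0.2, 0.2, 0.9],
    [0.2, 0.2, 0.9],
]
filtered = bilateral_filter_center(patch)
plain_mean = sum(sum(row) for row in patch) / 9.0
```

The bright column receives almost no weight, so the filtered value stays at the dark intensity, whereas a plain average would mix both regions. The same weights, computed on the first image, are what keeps a data window from borrowing constraints from a differently moving neighbor.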

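The proposed CLG-TV model includes the bilateral weighting. Since the weights do not depend on the unknowns, and for simplicity, the theoretical CLG-TV model is kept in the form (10). The CLG-TV error functional is decomposed into three parts:

$$E_{CLG-TV-1} = \int_\Omega \left( \lambda \cdot \sum_{region} bfw \cdot r^2(\hat{u},\hat{v}) + \frac{1}{2\theta}(u-\hat{u})^2 + \frac{1}{2\theta}(v-\hat{v})^2 \right) d\mathbf{x} \tag{12}$$

$$E_{TV-u} = \int_\Omega \left( \frac{1}{2\theta}(u-\hat{u})^2 + D \cdot |\nabla u| \right) d\mathbf{x} \tag{13}$$

$$E_{TV-v} = \int_\Omega \left( \frac{1}{2\theta}(v-\hat{v})^2 + D \cdot |\nabla v| \right) d\mathbf{x} \tag{14}$$

where $bfw$ denotes the bilateral filter weight.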

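For CLG-TV-1, $u$ and $v$ are considered fixed and $\hat{u}, \hat{v}$ are the unknowns. Since the functional does not depend on the derivatives of the unknowns, a point-wise minimization is employed. Setting the derivatives to zero, the following linear system of equations results:

$$\begin{bmatrix} 1 + 2\lambda\theta \sum_{region} bfw \cdot (I_{2w}^x)^2 & 2\lambda\theta \sum_{region} bfw \cdot I_{2w}^x \cdot I_{2w}^y \\ 2\lambda\theta \sum_{region} bfw \cdot I_{2w}^x \cdot I_{2w}^y & 1 + 2\lambda\theta \sum_{region} bfw \cdot (I_{2w}^y)^2 \end{bmatrix} \cdot \begin{bmatrix} \hat{u} \\ \hat{v} \end{bmatrix} = \begin{bmatrix} u - 2\lambda\theta \sum_{region} bfw \cdot I_{2w}^x \cdot r_0 \\ v - 2\lambda\theta \sum_{region} bfw \cdot I_{2w}^y \cdot r_0 \end{bmatrix} \tag{15}$$

where $r_0 := I_t - u_0 \cdot I_{2w}^x - v_0 \cdot I_{2w}^y$. The main determinant of (15) is always non-zero. For TV-u and TV-v, $u$ and $v$ are the unknowns. Their solutions are found by adapting the algorithm presented in [4]. Since the problems are similar, the numerical schemes are expected to be similar. The following steps lead to the desired numerical scheme. The Euler-Lagrange equation for TV-u is:

$$\operatorname{div}\left( D \cdot \frac{\nabla u}{|\nabla u|} \right) - \frac{1}{\theta}(u - \hat{u}) = 0 \tag{16}$$

Let $p_u$ be the dual variable defined as:

$$p_u := \frac{\nabla u}{|\nabla u|} \;\Leftrightarrow\; |\nabla u| \cdot p_u - \nabla u = 0, \quad |p_u| \le 1 \tag{17}$$

The relation (16) is rewritten in the form:

$$\operatorname{div}(D \cdot p_u) - \frac{1}{\theta}(u - \hat{u}) = 0 \;\Leftrightarrow\; u = \hat{u} + \theta \cdot \operatorname{div}(D \cdot p_u) \tag{18}$$

Substituting $u$ in (17) and dividing by $\theta$ results in:

$$\left| \nabla\left( \operatorname{div}(D \cdot p_u) + \hat{u}/\theta \right) \right| \cdot p_u - \nabla\left( \operatorname{div}(D \cdot p_u) + \hat{u}/\theta \right) = 0 \tag{19}$$

which can be solved using a fixed-point iteration scheme:

$$p_u^{n+1} = \frac{p_u^n + \tau \cdot \nabla\left( \operatorname{div}(D \cdot p_u^n) + \hat{u}/\theta \right)}{1 + \tau \cdot \left| \nabla\left( \operatorname{div}(D \cdot p_u^n) + \hat{u}/\theta \right) \right|} \tag{20}$$

where $\tau = 1/4$ is the step width. Setting $D = 1$, the numerical procedure matches the one developed in [4]. The resulting numerical scheme is highly parallelizable.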
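To make the fixed-point scheme concrete, the sketch below applies the updates (18) and (20) to a 1D signal. It is a simplified illustration, not the authors' implementation: the signal is one-dimensional, the function names, the iteration count and the step signal are assumptions, and the boundary handling (zero gradient at the last sample) is an assumed convention.

```python
def gradient(u):
    # Forward differences; the last entry is zero (assumed Neumann boundary).
    return [u[i + 1] - u[i] for i in range(len(u) - 1)] + [0.0]


def divergence(p):
    # Backward differences, the negative adjoint of gradient().
    n = len(p)
    div = [p[0]]
    div += [p[i] - p[i - 1] for i in range(1, n - 1)]
    div.append(-p[n - 2])
    return div


def primal_from_dual(u_hat, d, p, theta):
    # Relation (18): u = u_hat + theta * div(D * p).
    dp = [di * pi for di, pi in zip(d, p)]
    return [uh + theta * dv for uh, dv in zip(u_hat, divergence(dp))]


def weighted_tv_denoise(u_hat, d, theta=0.5, tau=0.25, iterations=300):
    """Minimize (1/(2*theta)) * (u - u_hat)^2 + d * |grad u| on a 1D signal."""
    p = [0.0] * len(u_hat)
    for _ in range(iterations):
        u = primal_from_dual(u_hat, d, p, theta)
        # grad(u) / theta equals grad(div(D * p) + u_hat / theta).
        g = [gi / theta for gi in gradient(u)]
        # Fixed-point update (20); it keeps |p| <= 1.
        p = [(pi + tau * gi) / (1.0 + tau * abs(gi)) for pi, gi in zip(p, g)]
    return primal_from_dual(u_hat, d, p, theta), p


step = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
uniform = [1.0] * 6                         # D = 1: isotropic case
edge_aware = [1.0, 1.0, 0.01, 1.0, 1.0, 1.0]  # small D on the edge

u_iso, p_iso = weighted_tv_denoise(step, uniform)
u_aniso, p_aniso = weighted_tv_denoise(step, edge_aware)
jump_iso = u_iso[3] - u_iso[2]
jump_aniso = u_aniso[3] - u_aniso[2]
```

With `D = 1` the step is visibly flattened, while a small `D` at the edge location leaves the discontinuity almost intact. Every element of `p` is updated from its immediate neighbors only, which is what makes the scheme easy to parallelize.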

III. IMPLEMENTATION

Since the approximation (2) is valid only for small displacements, a coarse-to-fine approach is employed. Besides enabling the estimation of large displacements, the coarse-to-fine approach improves the propagation from textured areas into weakly textured ones. For each level of the pyramid, a warping technique is used. For each warp the optical flow is computed using the equations (15), (18) and (20) in an iterative manner. The matrices $u, v$, $p_u, p_v$ are prolongated from each coarser level to the finer one. They are set to zero at the coarsest level. The following sequence gives the pseudo-code of the method:

1) Set up the pyramids $I_1, I_2$ and their derivatives at each level.
2) Compute the diffusion tensor $D$ for each level of the pyramid of $I_1$.
3) For each level, starting with the coarsest (below, each referenced variable is at the current level):
   a) If this is the coarsest level, initialize $u, v$, $p_u, p_v$ to zero.
   b) If not, upsample $u, v$, $p_u, p_v$ from the previous level to the current level.
   c) For i = 1 to warps (apply the warping technique):
      i) Apply a median filter to $u, v$ in order to avoid strong outliers.
      ii) Warp $I_2, I_2^x, I_2^y$ using bilinear interpolation.
      iii) For k = 1 to eq_iterations:
         A. Compute $\hat{u}, \hat{v}$ using (15). This step includes the computation of the coefficients using the bilateral filter. $I_1$ is used for the range Gaussian.
         B. Update $u = \hat{u} + \theta \cdot \operatorname{div}(D \cdot p_u)$ (18), and $v$ analogously.
         C. Update $p_u, p_v$ using (20).
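Step A of the inner loop solves the 2x2 system (15) independently at every pixel. A minimal sketch for one pixel is given below. The function name and the sample window are assumptions, `r0` is passed in as precomputed samples of $r_0$ over the window, and the defaults for `lam` and `theta` are the values quoted for the data fidelity factor and $\theta$ in this section.

```python
def solve_data_step(u, v, ix, iy, r0, bfw, lam=1000.0, theta=0.5):
    """Point-wise solution of the 2x2 system (15) for one pixel.

    ix, iy: derivatives of the warped second image over the local window.
    r0: samples of I_t - u0 * Ix - v0 * Iy over the same window.
    bfw: bilateral filter weights of the window (non-negative).
    """
    k = 2.0 * lam * theta
    a = 1.0 + k * sum(w * gx * gx for w, gx in zip(bfw, ix))
    b = k * sum(w * gx * gy for w, gx, gy in zip(bfw, ix, iy))
    c = 1.0 + k * sum(w * gy * gy for w, gy in zip(bfw, iy))
    rhs_u = u - k * sum(w * gx * r for w, gx, r in zip(bfw, ix, r0))
    rhs_v = v - k * sum(w * gy * r for w, gy, r in zip(bfw, iy, r0))

    # For non-negative weights, the Cauchy-Schwarz inequality gives det >= 1.
    det = a * c - b * b
    u_hat = (c * rhs_u - b * rhs_v) / det
    v_hat = (a * rhs_v - b * rhs_u) / det
    return u_hat, v_hat, det


# Textured window whose data is consistent with the flow (1.0, -0.5), u0 = v0 = 0.
ix = [1.0, 0.0, 1.0, 2.0]
iy = [0.0, 1.0, 1.0, -1.0]
r0 = [-(gx * 1.0 + gy * -0.5) for gx, gy in zip(ix, iy)]
bfw = [0.25] * 4
textured = solve_data_step(0.0, 0.0, ix, iy, r0, bfw)

# Homogeneous window: the data term is silent and (u_hat, v_hat) stays at (u, v).
flat = solve_data_step(0.3, -0.2, [0.0] * 4, [0.0] * 4, [0.0] * 4, bfw)
```

Unlike the pure Lucas-Kanade system, this one needs no singularity test: the two unit terms on the diagonal, which come from the coupling with $u$ and $v$, keep the determinant away from zero even in homogeneous windows.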

Typical values for the parameters are warps = 5, eq_iterations = 10. The pyramid factor is 0.5 and the number of levels depends on the image size. For 512x383 images, the number of levels can be six. For the diffusion filter $D(\nabla I) = e^{-\alpha \cdot |\nabla I|^{\beta}}$, $\alpha = 5$, $\beta = 1/2$ are very good values. The bilateral filter uses the first image $I_1$ to compute the range filter. In order to cope with large displacements, the implementation uses a 5x5 bilateral filter with $\sigma_s = 5/6$ for the space Gaussian and $\sigma_r = 0.1$ (image intensities are in $[0,1]$) for the range Gaussian. A 3x3 median filter is enough for step 3)c)i). The data fidelity factor $\lambda$ can be 1000 and $\theta = 0.5$. The derivative operator ($\nabla$) is computed using forward differences, while the divergence operator ($\operatorname{div}$) uses backward differences. The derivatives of the images are computed using central differences. The five-point mask $[1, -8, 0, 8, -1]/12$ gives very good results.
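The difference operators mentioned above can be sketched in 1D as follows. The sketch assumes that the five-point mask is applied as a correlation (coefficient 1 at offset -2, through -1 at offset +2) with replicated borders, and that the gradient is zero at the last sample. These conventions and the function names are assumptions made for the illustration.

```python
def five_point_derivative(samples):
    """Central derivative with the mask [1, -8, 0, 8, -1] / 12."""
    n = len(samples)

    def at(i):
        # Replicate the border samples (assumed boundary handling).
        return samples[min(max(i, 0), n - 1)]

    return [
        (at(i - 2) - 8.0 * at(i - 1) + 8.0 * at(i + 1) - at(i + 2)) / 12.0
        for i in range(n)
    ]


def forward_gradient(u):
    # Forward differences with a zero at the last sample.
    return [u[i + 1] - u[i] for i in range(len(u) - 1)] + [0.0]


def backward_divergence(p):
    # Backward differences, chosen so that <grad u, p> = -<u, div p>.
    n = len(p)
    return [p[0]] + [p[i] - p[i - 1] for i in range(1, n - 1)] + [-p[n - 2]]


# The mask differentiates a cubic exactly away from the borders.
cubic = [float(x) ** 3 for x in range(7)]
d_cubic = five_point_derivative(cubic)

# Numerical check of the adjoint relation between the two operators.
u = [0.0, 2.0, 1.0, 4.0, 3.0]
p = [0.5, -1.0, 0.25, 0.75, 0.0]
lhs = sum(g * q for g, q in zip(forward_gradient(u), p))
rhs = -sum(a * d for a, d in zip(u, backward_divergence(p)))
```

Pairing forward differences for the gradient with backward differences for the divergence makes the two discrete operators negative adjoints of each other, which is the property the dual formulation relies on.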

a) input image $I$ b) spatial kernel $G_{\sigma_s}$ c) range kernel $G_{\sigma_r}$ d) combined kernel $G_{\sigma_s} \cdot G_{\sigma_r}$ e) output $BF[I]$

Fig. 2. The bilateral filter (the images are taken from [13])

The proposed numerical solver of the CLG-TV model is highly parallelizable. The implementation of the current method using CUDA technology is straightforward. However, the GPU implementation uses a separable median filtering for better performance. Generally, the median filter is not separable, but the separable version is enough for the goal of this step: to remove the strong outliers from the flow.

IV. RESULTS AND COMPARISONS

The CLG-TV method has two evaluations: one for synthetic and semi-synthetic images and the other for real images. The distinction is necessary because the CLG (combined local-global) methods are more suitable for real scenarios ([3],[15]). CLG-TV has been ranked on the Middlebury optical flow site (http://vision.middlebury.edu/flow/) and outperforms its CLG competitors both for endpoint error and for angular error. The proposed method is also compared with TV-L1 ([6]) and its improved version, Aniso. Huber-L1 ([7]), because all of them use [4] and [5] to derive a solver, are highly parallelizable and have approximately the same running time.

Fig. 4 shows the average end-point and angular errors. In order to reach high accuracy for the Middlebury images, many warps and equation iterations should be used. The values warps=35 and eq_iterations=5 were used for this evaluation. The bilateral filter uses a window size of 3x3 instead of 5x5, since the images do not present very large displacements. The pyramid factor was 0.8. Also, the median filtering step is performed after each equation iteration (as a step D. inside the last “for” cycle).

a) Army b) Mequon c) Schefflera d) Grove

Fig. 3. Optical flow results of the CLG-TV method, on a part of the semi-synthetic Middlebury images

Fig. 4. Average end-point and angular errors of the selected methods

The Middlebury benchmark [12] provides four real sequences for evaluation (Fig. 4). The interpolation quality measures the quality of the estimation. At the time of submission, CLG-TV was ranked 4th for interpolation error and 2nd for normalized interpolation error (Fig. 5). This result confirms the high quality of the CLG-TV estimation for real scenarios. As seen in Fig. 5, Aniso. Huber-L1 and CLG-TV have nearly the same interpolation quality. However, these sequences do not have large displacements.

Fig. 6 presents common traffic scenarios. The cars have high speeds and they cause large displacements and large occlusions. In such situations, CLG-TV is clearly superior; Aniso. Huber-L1 fails to compute the correct flow. For Fig. 6 c) the flow might look correct for both methods, but an analysis of the horizontal flow for Aniso. Huber-L1 (in Fig. 6 d)2)) shows the contrary. The car moves about 50 pixels, but the method shows a motion of 30 pixels for its center. The backside flow is correctly determined. On the other hand, CLG-TV shows a motion of about 50 pixels for the
complete car surface. The explanation of this outcome is simple: the local window of the CLG-TV model spans better over the untextured zones and links them to the textured ones. Thus, besides representing the data fidelity, the local window contributes to the flow propagation (enhances the regularization). For the images in Fig. 6, the default settings were used (enabling real-time processing on the GPU). Aniso. Huber-L1 can solve some of these problems using different settings (like high accuracy), but these affect the running time.

The CUDA GPU implementation of the proposed numerical solver runs in 29 ms using the default settings. The images are 512x383 grayscale. The implementation was tested on an Intel Core 2 Duo machine running at 2.6 GHz, equipped with an NVIDIA GeForce GTX 285 graphics card.

a) Backyard b) Basketball c) Dumptruck d) Evergreen

Fig. 4. Optical flow results of the CLG-TV method, on the Middlebury real images

Fig. 5. Top five methods for the interpolation and normalized interpolation errors

Fig. 6. CLG-TV vs Aniso. Huber-L1 on traffic scenes with large displacements. The default settings were used (real-time). Columns: first image, second image, CLG-TV, Aniso. Huber-L1. Rows: a), b), c), d). Additional panels: d)1) horizontal flow for CLG-TV, c); d)2) horizontal flow for Aniso. Huber-L1, c).

V. CONCLUSIONS

The CLG (combined local-global) methods increase the robustness for scenes with large displacements, often encountered in traffic scenarios. In order to increase the accuracy, the CLG-TV method uses a robust regularization based on total variation minimization. In addition, the proposed model is anisotropic and uses a bilateral filtering technique, a combination that significantly improves the flow propagation. It reduces the propagation on the background and improves the fill-in process from textured areas to untextured ones. Overall, CLG-TV is a fast, accurate and robust optical flow estimator.

REFERENCES

[1] B.K.P. Horn and B.G. Schunck, “Determining optical flow,” Artificial Intelligence, 17:185–203, 1981.

[2] B. D. Lucas and T. Kanade, “An iterative image registration technique with an application to stereo vision,” In Proceedings of the 7th International Joint Conference on Artificial Intelligence, pages 674–679, April 1981.

[3] A. Bruhn, J. Weickert, and C. Schnorr, “Lucas/Kanade Meets Horn/Schunck: Combining Local and Global Optic Flow Methods,” In International Journal of Computer Vision 61(3), 211–231, 2005.

[4] A. Chambolle, “An algorithm for total variation minimization and applications”, Journal of Mathematical Imaging and Vision, 20(1–2):89–97, 2004.

[5] A. Chambolle, T. Pock, “A first-order primal-dual algorithm for convex problems with applications to imaging”, Journal of Mathematical Imaging and Vision (21 December 2010), pp. 1-26.

[6] C. Zach, T. Pock, and H. Bischof, “A duality based approach for realtime TV-L1 optical flow,” In Pattern Recognition, volume 4713 of LNCS, pages 214–223, 2007.

[7] M. Werlberger, W. Trobin, T. Pock, A. Wedel, D. Cremers, and H. Bischof, “Anisotropic Huber-L1 optical flow,” In British Machine Vision Conference (BMVC), 2009.

[8] H.-H. Nagel and W. Enkelmann, “An investigation of smoothness constraints for the estimation of displacement vector fields from image sequences”, IEEE Transactions on Pattern Analysis and Machine Intelligence, 8:565–593, 1986.

[9] L. I. Rudin, S. Osher, and E. Fatemi, “Nonlinear total variation based noise removal algorithms,” Physica D, 60:259–268, 1992.

[10] R. T. Rockafellar. Convex analysis. Princeton Landmarks in Mathematics. Princeton University Press, Princeton, NJ, 1997. ISBN 0-691-01586-4. Reprint of the 1970 original, Princeton Paperbacks.

[11] P. Perona, J. Malik, “Scale space and edge detection using anisotropic diffusion,” Proc. IEEE Computer Soc. Workshop on Computer Vision (1987), 16–22.

[12] S. Baker, S. Roth, D. Scharstein, M. Black, J. Lewis, and R. Szeliski, “A database and evaluation methodology for optical flow,” in International Conference Computer Vision, 2007, pp. 1–8.

[13] Frédo Durand and Julie Dorsey, “Fast bilateral filtering for the display of high-dynamic-range images”, In SIGGRAPH ’02: Proceedings of the 29th annual conference on Computer graphics and interactive techniques, pages 257–266, New York, NY, USA, 2002. ACM.

[14] C. Tomasi and R. Manduchi, “Bilateral filtering for gray and color images”, In Proceedings of the Sixth International Conference on Computer Vision, ICCV ’98, Washington, DC, USA, 1998. IEEE Computer Society.

[15] Marius Drulea, Ioan Radu Peter, Sergiu Nedevschi, “Optical flow. A combined local-global approach using L1 norm,” ICCP, pp.217-222, Proceedings of the 2010 IEEE 6th International Conference on Intelligent Computer Communication and Processing, 2010.
