total variation regularization of local-global optical...
TRANSCRIPT
Abstract— More data fidelity terms in variational optical flow methods improve the estimation’s robustness. A robust and anisotropic smoother enhances the specific fill-in process. This work presents a combined local-global (CLG) approach with total variation regularization. The combination of bilateral filtering and anisotropic (image driven) regularization is used to control the propagation phenomena. The resulted method, CLG-TV, is able to compute larger displacements in a reasonable time. The numerical scheme is highly parallelizable and runs in real-time on current generation graphic processing units.
I. INTRODUCTION
OTION estimation plays an important role in computer vision applications. Two of the most known
and simple to analyze methods were developed by K.P. Horn, B.G. Schunck in [1] and B. D. Lucas, T. Kanade in [2]. The Horn-Schunck (HS) model proposes a variational approach to optical flow estimation:
2 22( ,· ( ))HSE r u v u v
(1).
HSE is the functional error to be minimized and ( , )r u v is
the residual between the images, defined below: 2
1 0 0 2 0 0 2 0
.
2 2
2
1 0
2( ) · ( )
( , )
( ) (
:
) ( )
( ) · 0
( )w
t
T
defT
w
I
I
w
I x I x u x u u u x u u u I x u
r u v u uI I
I
I
I
(2),
where ( , )x x y , ( , )u u v
and 0u is an initial estimate.
The terms of order higher than two are ignored. The above approximation is valid only for small values of the displacements u and v. The optimal solution of HSE is
calculated using the Euler-Lagrange equations and a Jacobi (not mandatory) iterative scheme.
Lucas-Kanade (LK) is a local method and assumes that small regions of pixels have the same flow. It is natural to consider a minimization procedure for the total error of that region: 2( , ) (· , )LK
region
E u v w r u v , which is a least-square
problem. Here, w represents the weighting factor. In order to
Manuscript received April 10, 2011. This work was partly supported by
the research project INTERSAFE-2 and partly by CNCSIS – UEFISCSU, project number PNII – IDEI 1522/2008. INTERSAFE-2 is part of the 7th Framework Programme, funded by the European Commission. The partners of INTERSAFE-2 thank the European Commission for supporting the work of this project.
M. Drulea is with the Computer Science Department, Technical University of Cluj-Napoca, Memorandumului Str. 28, 400114 Cluj-Napoca (e-mail: marius.drulea@ cs.utcluj.ro).
S. Nedevschi is with the Computer Science Department, Technical University of Cluj-Napoca, Memorandumului Str. 28, 400114 Cluj-Napoca (phone: +40-264-401219; e-mail: [email protected]).
find the minimizers of LKE , the derivatives are set to zero.
The resulting system is 2x2 with the following determinant:
2
2 2 2
2
2 2 2
· · ·
· · ·
x x yw w w
xw
yyw w
w w
w w
I I I
I I I
(3)
The determinant can be singular or nearly singular if the image derivatives for the corresponding region are close to zero. Where the determinant is non-singular, i.e. the pixel is a relatively strong feature, the method performs well.
A new variational approach results after combining LK and HS:
2 22· · ( , ) ( )regio
HSn
CLG w r u vE u v
(4).
The combination aims to improve the robustness of the HS model [3]. The main reason to do this is to enhance the flow accuracy for very large displacements. Such situations occur in traffic scenarios where the cars’ speeds can be high. On the other hand, the system camera(s) may move in the opposite direction and so the cars displacement in the images becomes even higher. The integration of LK into HS model increases the number of the data fidelity terms and the effect is an improved behavior for large homogenous areas with large flows.
The HS and CLG-HS models use the squared L2 norm and therefore they have some disadvantages like sensitivity to noise, over-propagation and the facts that they are not directionally selective and do not preserve motion boundaries. In order to significantly reduce these problems, a new variational model is proposed and a parallel numerical scheme is presented. The next section presents the new model and its minimization procedure.
II. THE PROPOSED MODEL
L1-norm is a better choice for modeling real problems involving discrete signals. Rudin, Osher and Fatemi originally suggested this and introduced the total variation in image processing domain [9]. L2-norm is easy to analyze mathematically, but is sensitive to outliers. L1-norm is robust in the presence of outliers, but is very difficult to analyze, especially in the case of functionals ([4],[9]). The Huber penalty function is also a good choice for a robust estimation. In [6] an alternative to the HS model is proposed:
1 ( , )· ( )TV LE r u v u v
(5). (TV-L1 = total
variation, L1). As an observation, the data term is in L1 and the smoothness term represents the isotropic total variation. Since the functions composing the functional are not differentiable, the minimization is not trivial. The experiments reveal clearly more accurate results than HS.
Total variation regularization of local-global optical flow
Marius Drulea, Sergiu Nedevschi
M
The above functional is decomposed into three parts, in order to match the minimization scheme proposed in [4]:
12 21 1
ˆ ˆ ˆ ˆ·| ( ) ( )2·
(2
, ) |·TVE r u v u u v v
(6),
21ˆ( )
2·TV u u u uE
(7),
21ˆ( )
2·TV v v v vE
(8).
While TV-1 is easily solved point-wise, the minimization of TV-u and TV-v is a difficult task. The last two functionals are under the Euler-Lagrange conditions, but the resulting system is highly nonlinear. Their solution can be computed using the numerical scheme proposed in [4]. The solution was originally developed to solve a denoising problem and it is subject of convex optimization ([4],[10]).
In order to increase the robustness of this model a local-global combination is proposed:
2( , )· · ( )CLG TVregion
E w r vv uu
(9).
As in CLG-HS, the idea is to use more data fidelity terms in order to increase the robustness. The data terms are squared and the smoothness terms are not. Therefore, the model uses a robust regularization. The main reason to keep the data fidelity terms squared is to keep the minimization scheme as simple as possible and to achieve real-time performance. The CLG-TV model developed so far is isotropic, it propagates the flow in all directions, regardless of local properties. To enhance the propagation, an anisotropic diffusion filter is considered. The effect of this filter is to reduce the propagation of the flow for image features (corners, edges) and to allow a higher “fill-in” effect for un-textured areas. Obviously, the diffusion coefficient, called diffusion tensor, should depend on image derivatives. Perona and Malik [11] pioneered the idea of anisotropic diffusion and proposed two functions for the diffusion coefficient:
2
1( )
1 ( / )I
KD
I
and 2( / )( ) I KID e ,
where the constant K controls the sensitivity to edges and is usually chosen experimentally. For a better control, the following diffusion tensor is used to alter the CLG-TV
model: ·( ) ID eI . The altered model now reads:
2· · ( )( , ) · ·CLG TVregion
E w r u v DD u v
(10).
The local window representing the data fidelity is another source of over-propagation. The pictures below illustrate this phenomenon. In Fig.1 a), the local window is centered into the moving region. In this case, the Lucas-Kanade method correctly computes the pixel’s flow. In Fig.1 b), the center of the window is placed into the static region, so the pixel has no motion. The top of the window is in the moving region. Since the method tries to match the moving part (beside the static one) of the window, the pixel appears to have a
motion, which is not true. Traditional methods use a Gaussian filter to weight the importance of the pixels in the local window, but this is not satisfactory for all cases, including the scenario shown in Fig. 1. The most elegant method to avoid such a behavior is to use a bilateral filter.
a)The window is centered in the moving region; the pixel’s flow is correctly determined
b)The window is centered in the static region; the pixel’s flow is incorrectly determined
Fig. 1. Two effects of the local window
The bilateral filter ([13], [14]) combines two types of
filters: a Gaussian (space) filter and a range (color, intensity) filter. The weight of a pixel decreases as the distance from the center increases. In addition, the weight decreases if the intensity of the pixel is different from the center’s intensity.
1·[ ·] ·
s r
q
normalizationfactor rangespace
the weight of
p p q qq re
I
gionp
BF I G G I Ip q IW
(11),
wheres
G denotes the spatial Gaussian filter and r
G
denotes the range (intensity) Gaussian filter. The product of these coefficients represents the weight for the pixel q of the region. The expression is then normalized in order to have a “fair” filter: the coefficients should sum to 1. Fig. 2 shows the effect of the bilateral filtering. Fig.2 c) shows the range filter for a pixel situated at the border of two regions with different intensities. Fig.2 d) shows the combined kernel. The pixels with the same intensity as the center weight more than the others. The output in Fig.2 e) keeps the edge of the initial image Fig.2 a).
The proposed CLG-TV model includes the bilateral weighting. Since the weights does not depend on the unknowns and for simplicity, the theoretical CLG-TV model is kept in the form (10). The CLG-TV functional error is decomposed into three parts:
12 2 21 1
ˆ ˆ ˆ ˆ· · ( ) ( )2
( , )· 2·CLG TV
region
E bfwr u v u u v v
(12),
21ˆ( ) ·
2·TV u u u D uE
(13),
21ˆ( ) ·
2·TV v v v D vE
(14),
where bfw denotes the bilateral filter weight.
For CLG-TV-1, u and v are considered fixed and ˆ ˆ,u v are
the unknowns. Since the functional does not depend on image derivatives, a point-wise minimization is employed.
Setting the derivatives to zero, the following linear system of equations results:
2
2 2 2 2 0
22 02 2 2
1 · · ·2 2
2 1
2 · ·ˆ·
ˆ 2 · ·· · ·2
x x y xw w w w
x yww w w
yy
bfw bfw bfw
bfwbfw bfw
I I I u I ru
v v I rI I I
(15), where 0 0 2 0 2: · ·x y
t w wI u vr I I . The main determinant of
(15) is always non-zero. For TV-u and TV-v, u and v are the unknowns. Their solutions are found adapting the algorithm presented in [4]. Since the problems are similar, the numerical schemes are expected to be similar. The following steps leads to the desired numerical scheme: The Euler-Lagrange equation for TV-u is :
1ˆ· ·( ) 0
uD u ud
uiv
(16).
Let up
be the dual variable defined as:
· 0, 1u u u
up p u u p
u
(17).
The relation (16) is rewritten in the form:
1ˆ ˆ· ·( ) 0 · ·u udiv D p u u u div D p u
(18).
Substituting u in (17) and dividing by results to:
ˆ ˆ· · / · · / 0u u up div D p u div D p u
(19),
which can be solved using a fixed-point iteration scheme:
1ˆ· · /
· /1 ˆ·
n nu u
nu
nu
p div D up
div
p
pD u
(20), where 1
4
is the step width. Setting 1D , the numerical procedure match the one developed in [4]. The resulted numerical scheme is highly parallelizable.
III. IMPLEMENTATION
Since the approximation (2) is valid only for small displacements, a coarse-to-fine approach is employed. Beside large optical flow estimation, the coarse-to-fine approach improves the propagation from textured areas into weakly textured ones. For each level of the pyramid, a warping technique is used. For each warp the optical flow is
computed using the equations (15), (18) and (20) in an
iterative manner. The matrices ,u v , ,u vp p
are prolongated
from each coarser level to the finer one. They are set to zero at the coarsest level. The next sequence depicts the pseudo-code of the method: 1) Set up pyramids 1 2,I I and the their derivatives at each level
2) Computes the diffusion tensor D for each level of the pyr. 1I .
3) For each level starting with the coarsest (below, each referred variable is at the current level)
a) If the coarsest level, then initialize ,u v , ,u vp p
to zero.
b) If not, upsample ,u v , ,u vp p
from the previous level to
the current level. c) for i = 1 to warps (Apply the warping technique)
i) Apply a median filter to ,u v in order to avoid strong
outliers. ii) Warp
1 22 2, , , yxI I I I using bilinear interpolation
iii) For k=1 to eq_iterations A. Computes ˆ ˆ,u v using (15). This step includes
the computations of the coefficients using the bilateral filter.
1I is used for the range
gaussian. B. Update ˆ· · uu div D p u
)(18)
C. Update ,u vp p
using (20)
Typical values for the parameters are warps = 5, eq_iterations = 10. The pyramid factor is 0.5 and the number of levels depends on the image size. For 512x383 images, the number of levels can be six. For the diffusion
filter ·( ) ID eI , 5, 1/ 2 are very good
values. The bilateral filter uses the first image 1I to compute
the range filter. In order to cope with large displacements, the implementation uses a 5x5 bilateral filter with 5 / 6s for the space gaussian and 0.1r (image intensities are in
[0,1] ) for the range gaussian. A 3x3 median filter is enough
for the step 3)c)i). The data fidelity factor can be 1000 and 0.5 . The derivative operator ( ) is computed using forward differences, while the divergence operator ( div ) uses backward differences. The derivatives of the images are computed using central differences. The five points mask [1, 8,0,8, 1] /12 gives very good results.
a) input image I b) spatial kernel s
G c) range kernel r
G d) combined kernel
·s r
G G
e) output [ ]BF I
Fig. 2. The bilateral filter (the images are taken from [13])
hmHffth
sim(sMaacHath
ow
The proposed
highly parallelmethod usingHowever, the Gfiltering for a filter is not sephe step goal: to
IV
The CLG-TVsynthetic and smages. The dcombined loca
scenarios ([3],Middlebury o. and outperformand for angulcompared with Huber-L1([7]), a solver, are hhe same runninFig. 4. shows
order to reach warps and equ
a) Army Fig. 3. Optical flow
Fig. 4. Average en
d numerical solizable. The ig CUDA tecGPU implemen
better performparable, but theo remove the st
V. RESULTS A
V method hasemi-synthetic distinction is al-global) meth,[15]). CLG-Tflow site (http
ms its CLG comlar error. TTV-L1 ([6]) abecause all of
highly paralleling time. s the average ehigh accuracy
uation iteration
w results of the CLG
nd‐point and angu
olver of the Cimplementationchnology is ntation uses a mance. Genere separate verstrong outliers f
AND COMPARISO
as two evaluimages and tnecessary be
hods are moreTV has been p://vision.middmpetitors both fThe proposed and its improvef them use [4] izable and hav
end-point and ay for Middlebuns should be u
b) MeG‐TV method, on a
lar errors of the se
CLG-TV modeln of the curr
straightforwaseparable med
rally, the medsion is enough from the flow.
ONS
uations: one the other for recause the CLe suitable for r
ranked on dlebury.edu/flowfor endpoint ermethod is a
ed version, Aniand [5] to der
ve approximat
angular errors.ury images, maused. The valu
quon a part of the semi‐
elected methods
l is rent ard. dian dian
for
for real LG real the w/) rror also iso. rive tely
. In any ues
warps=evaluatiinstead displacemedian iteration
The sequencmeasuresubmissand 2ndresult cmethodL1 and Howeve
Fig. 6high spocclusioAniso. H6 c) theanalysis6 d)2)) but the The bahand, C
c) ‐synthetic Middleb
=35 and eq_ion. The bilateof 5x5, since
ements. Thefiltering step
n (as a step D. Middlebury b
ces for evaluates the qualitysion, CLG-TVd for normaliconfirms a higd for real scena
CLG-TV haveer, these seque6 presents compeeds and theyons. In such siHuber-L1 failse flow might s of the horizoshows the conmethod show
ackside flow iCLG-TV show
Schefflerabury images
_iterations=5 eral filter usesthe images dpyramid factp is performeinside the last benchmark [1tion (Fig. 4). Ty of the estim
V was ranked 4ized interpolatgh quality estiarios. As seen e nearly the sa
ences do not hammon traffic sy cause large ituations, CLGs to compute thlook correct f
ontal flow for Antrary. The car
ws a motion of s correctly de
ws a motion of
d)
were used s a window si
do not present or was 0.8. ed after each“for” cycle).
12] provides The interpolatimation. At th4th for interpoltion error (Figimation of thein Fig. 5, Ani
ame interpolatiave large displascenarios. The displacements
G-TV is clearlyhe correct flow
for both methoAniso. Huber-Lr moves about
f 30 pixels for etermined. Onf about 50 pix
Grove
for this ize of 3x3 very large Also, the
h equation
four real ion quality
he time of lation error g. 5). This e CLG-TV iso. Huber-on quality.
acements. cars have
and large y superior; w. For Fig. ods, but an L1 (in Fig. t 50 pixels,
its center. n the other xels for the
csoowrwAdr
nTte
reabp
F
F
complete car ssimple: the locaover the untexones. Thus, bewindow contribregulatization).were used (enAniso. Huber-Ldifferent settingrunning time.
The CUDA numerical solvThe images areested on an In
equipped with a
The CLG (crobustness forencountered inaccuracy, the Cbased on totaproposed mode
a) BackyaFig. 4. Optical flow
Fig. 5. Top five me
surface. The eal window of t
xtured zones aeside represenbutes to the fl For images
nabling real-tiL1 can solve gs (like high a
GPU implever runs in 29 e 512x383 grayntel Core 2 Dua NVidia GeFo
V. CON
ombined localr scenes withn traffic scenaCLG-TV methal variation mel is anisotropi
rd
w results of the CLG
ethods for the inte
explanation ofthe CLG-TV mand links themnting the data fflow propagatioin Fig. 6, theme processingsome of theseaccuracy), but
ementation oms using the
yscale. The impo machine runorce GTX 285
NCLUSIONS
l-global) methh large displarios. In orderhod use a robuminimization. ic and uses a
b) BaskG‐TV method, on t
erpolation and nor
f this outcomemodel spans betm to the textufidelity, the loon (enhances e default setting on the GPUe problems us
these affects
f the propoe default settinplementation w
nning at 2.6 GHgraphics card.
hods increase lacements, ofr to increase ust regularizatIn addition, bilateral filter
ketballthe Middlebury re
rmalized interpola
e is tter
ured ocal the ngs U). ing the
sed ngs. was Hz,
the ften the
tion the
ring
techniqupropagaand imuntexturobust o
[1] B.KInte
[2] B. DwithInte679
[3] A. HorInte
[4] A. appl2):8
[5] A. convMat
[6] C. ZrealLNC
c) Dal images
tion errors
ue, combinatioation. It reduc
mproves the fiured ones. Oveoptical flow est
K.P. Horn and B.Gelligence, 17:185–2
D. Lucas and T. Kh an application ernational Joint Co, April 1981.
Bruhn, J. Weickrn/Schunck: Combernational Journal o
Chambolle, “An lications”, Journa
89–97, 2004.
Chambolle, T. Povex problems wthematical Imaging
Zach, T. Pock, antime TV-L1 opticaCS, pages 214–223
umptruck
on that significes the propagaill-in process erall, CLG-TVtimator.
REFERENCES
. Schunck, “Deter203, 1981.
anade, “An iterativto stereo vision,
onference on Arti
kert, and C. Scbining Local and Gof Computer Visio
algorithm for total of Mathematica
ock, “A first-orwith applicationsg and Vision (21 D
nd H. Bischof, “al flow,” In Patter3, 2007.
d)
cantly improveation on the bfrom textured
V is a fast, ac
S rmining optical flo
ve image registrat,” In Proceedingificial Intelligence
chnorr, “Lucas/KaGlobal Optic Flow on 61(3), 211–231
tal variation minial Imaging and V
rder primal-dual as to imaging”, December 2010), p
“A duality based rn Recognition, vo
Evergreen
es the flow background d areas to curate and
ow,” Artificial
ion technique s of the 7th
e, pages 674–
anade Meets Methods,” In , 2005.
mization and Vision, 20(1–
algorithm for Journal of
pp. 1-26.
approach for olume 4713 of
[7
[
[9
[
[
[
[
[
7] M. WerlbergBischof, “AnVision Confe
8] H.-H. Nagel constraints fimage sequeMachine Inte
9] L. I. Rudin, Snoise remova
10] R. T. RockMathematics.0-691-01586-
11] P.Perona, J. Mdiffusion,” Pr(1987), 16–22
12] S. Baker, S. R“A databaseInternational
13] Frédo Durandof high-dynathe 29th anntechniques, p
14] C. Tomasi animages”, In Computer ViComputer So
First ima)
b)
c)
d)
d)1) HoFig. 6. CLG‐TV
er, W. Trobin, T.
nisotropic Huber-Lerence (BMVC), 20
and W. Enkelmafor the estimationences”, IEEE Trlligence, 8:565–59
S. Osher, and E. Fal algorithms,” Phy
kafellar. Convex . Princeton Univer-4. Reprint of the 1
Malik, “Scale spacroc. IEEE Comput2.
Roth, D. Scharsteie and evaluation Conference Comp
d and Julie Dorseymic-range images
nual conference oages 257–266, Ne
nd R. Manduchi, Proceedings of tision, ICCV ’98, ciety.
age
rizontal flow for Cvs Aniso. Huber‐L
. Pock, A. Wedel,L1 optical flow,”009.
ann, “An investign of displacemenransactions on P93, 1986.
Fatemi, “Nonlinearysica D, 60:259–26
analysis. Princrsity Press, Prince1970 original, Prin
ce and edge detectter Soc. Workshop
in, M. Black, J. Lmethodology fo
puter Vision, 2007
y, “Fast bilateral fis”, In SIGGRAPHon Computer grapew York, NY, USA
“Bilateral filteringthe Sixth Internat
Washington, DC
Second im
CLG‐TV, c) 1 on traffic scenes
, D. Cremers, and” In British Mach
gation of smoothnnt vector fields fPattern Analysis
r total variation ba68, 1992.
ceton Landmarks eton, NJ, 1997. ISnceton Paperbacks
tion using anisotrop on Computer Vis
ewis, and R. Szelior optical flow,”7, pp. 1–8.
iltering for the dispH ’02: Proceedingphics and interac
A, 2002. ACM.
g for gray and ctional ConferenceC, USA, 1998. IE
mage
s with large displa
d H. hine
ness from and
ased
in SBN s.
opic sion
iski, ” in
play s of
ctive
olor e on EEE
[15] MarcomProcInte
CLG‐TV
d)2) Horizcements. The defa
rius Drulea, Ioan mbined local-globaceedings of the
elligent Computer
zontal flow for Aniault setting were u
Radu Peter, Sergial approach using
2010 IEEE 6th Communication an
Aniso. Hu
so. Huber‐L1, c) used (real‐time)
iu Nedevschi, "OpL1 norm," ICCPInternational Co
nd Processing, 20
uber‐L1
ptical flow. A , pp.217-222, onference on 10.