

UNCORRECTED PROOF

Optimal projection of 2-D displacements for 3-D translational

motion estimation

Christophe Garcia, Georgios Tziritas*

Department of Computer Science, University of Crete, P.O. Box 2208, Heraklion, Greece

Abstract

Recovering 3-D motion parameters from 2-D displacements is a difficult task, given the influence of noise contained in these data, which

correspond at best to a crude approximation of the real motion field. Stability for the system of equations to solve is therefore essential. In this

paper, we present a novel method based on an unbiased estimator that aims at enhancing this stability and strongly reduces the influence of

noise contamination. Experimental results using synthetic and real optical flows are presented to demonstrate the effectiveness of our method

in comparison to a set of selected methods. © 2002 Published by Elsevier Science B.V.

Keywords: Motion estimation; Optical flow; Unbiased estimator

1. Introduction

The estimation of 3-D motion parameters from a

sequence of images is a fundamental task in computer

vision research with numerous applications, such as

egomotion and time-to-contact estimation for mobile robots

[12], video segmentation [4], depth layering [11], or more

generally 3-D scene reconstruction. Most methods for 3-D

motion analysis begin by extracting two-dimensional

motion information. Many algorithms have been proposed

for extracting 3-D motion parameters from optical flow. A

detailed review is proposed by Heeger and Jepson in Ref.

[9]. The pioneering work of Prazdny [15] assumes that

surfaces in the viewed scene are smooth and solves for

rotation, at high computational cost, using a set of nonlinear

equations that are independent of translation. Bruss and

Horn [5] propose a global approach that combines

information in the entire visual field to choose the 3-D

motion and structure that fits the flow field best in the least

squares sense. Adiv [1] minimizes the same residual function as Bruss and Horn, but locally in patches under the assumption of planarity. Heeger and Jepson [9] also minimize the same residual function, but eliminate the depth and rotation parameters in order to obtain a measure of error as a function of translation alone, which is then analyzed to select the correct translation. Lobo and Tsotsos

[12] propose a voting scheme based on triplets of points

using the Collinear Point Constraint for cancelling rotation

and finding the focus of expansion. Daniilidis [6] makes use

of fixation on a scene point and projection of the spherical

motion field on two latitudinal directions to decouple the

motion parameter space, searching then along meridians of

the image sphere.

One main problem in correctly estimating the camera

motion parameters is the fact that the 2-D motion field

usually contains a set of noisy and partially incorrect data

(outliers), making most of the above mentioned methods

unstable. The set of incorrect data can be even larger if

independent motions exist throughout the image sequence.

The negative effects of this set of outliers on motion

estimation increase with the complexity of the motion

model which is used to describe the camera motion.

Komodakis and Tziritas [11] proposed a robust estimation

method to cope with the set of outliers and the use of a

hierarchy of motion models, where simplest models were

first tested, and then more complex models were considered.

In this paper, we focus on improving the motion parameter

estimation in the case of translational motion. Section 2

describes the equations linking the projected 2-D motions

and 3-D motions inside the image sequence, which yields an

overdetermined system of linear equations in the translation

case. In Section 3, we propose a survey of the methods

devised to solve these overdetermined systems, and select

some of them according to criteria of processing times, for

comparison to the method proposed in this paper. Section 4

presents our approach, based on the projection of the

equations’ coefficients into a different space, chosen

appropriately in order to reduce the influence of noise

0262-8856/02/$ - see front matter © 2002 Published by Elsevier Science B.V.

PII: S0262-8856(02)00088-4

Image and Vision Computing xx (0000) xxx–xxx

www.elsevier.com/locate/imavis

* Corresponding author. Tel.: +30-810-39-3136; fax: +30-810-39-3501.

E-mail addresses: [email protected] (G. Tziritas), [email protected] (C. Garcia).


contamination. To do so, a noise model of optical flow is

proposed according to previous works. In Section 5,

experimental results using synthetic noisy optical flows

are analyzed in order to compare the selected methods and

to show the superiority of our approach. Experimental

results using real optical flow are also presented. Finally,

conclusions are drawn.

2. 3-D motion parameters from 2-D displacements

2.1. Optical flow

Consider a 3-D coordinate system $O(X, Y, Z)$ at the optical center of a pinhole camera of focal length $f$, such that the axis $OZ$ coincides with the optical axis, as shown in Fig. 1.

Suppose that the camera is moving rigidly with respect to its static 3-D environment, with simultaneous 3-D translational motion $(T_x, T_y, T_z)$ and 3-D rotational motion $(\Omega_x, \Omega_y, \Omega_z)$. For a point $P(X, Y, Z)$, the velocity components are given by:

$$X' = \frac{dX}{dt} = -T_x - Z\,\Omega_y + Y\,\Omega_z \qquad (1)$$

$$Y' = \frac{dY}{dt} = -T_y + Z\,\Omega_x - X\,\Omega_z \qquad (2)$$

$$Z' = \frac{dZ}{dt} = -T_z - Y\,\Omega_x + X\,\Omega_y \qquad (3)$$

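Under perspective projection, a point $P(X, Y, Z)$ is projected at $p(x, y)$ onto the camera retina with:

$$x = \frac{X f}{Z} \quad \text{and} \quad y = \frac{Y f}{Z} \qquad (4)$$

Therefore, the 2-D retinal velocity field or optical flow $(u, v)$ is:

$$u = \frac{dx}{dt} = \frac{X' f}{Z} - \frac{X f Z'}{Z^2} = \frac{X' f}{Z} - x\,\frac{Z'}{Z} \qquad (5)$$

$$v = \frac{dy}{dt} = \frac{Y' f}{Z} - \frac{Y f Z'}{Z^2} = \frac{Y' f}{Z} - y\,\frac{Z'}{Z} \qquad (6)$$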

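Substituting Eqs. (1)–(4) into Eqs. (5) and (6) yields:

$$u = \frac{-T_x f + x T_z}{Z} + \Omega_x \frac{x y}{f} - \Omega_y \left( \frac{x^2}{f} + f \right) + \Omega_z\, y \qquad (7)$$

$$v = \frac{-T_y f + y T_z}{Z} + \Omega_x \left( \frac{y^2}{f} + f \right) - \Omega_y \frac{x y}{f} - \Omega_z\, x \qquad (8)$$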

Eqs. (7) and (8) describe a 2-D velocity field that relates the 3-D motion of points to their projected 2-D motion on the image plane. These two equations show that (i) the effects of the translational and rotational components are separable, (ii) the vectors defined by the translational components lie on lines passing through the point $(T_x f/T_z,\ T_y f/T_z)$, which is called the focus of expansion (FOE), and (iii) the rotational component of the motion is independent of scene structure, since the depth $Z$ influences the translational component only.
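As a minimal sketch of the flow model, the function below evaluates the motion field obtained by substituting Eqs. (1)–(4) into Eqs. (5) and (6). The function names and argument order are illustrative assumptions, not part of the original method.

```python
def optical_flow(x, y, Z, T, omega, f=1.0):
    """Return the flow (u, v) at image point (x, y) for a scene point at depth Z."""
    Tx, Ty, Tz = T
    wx, wy, wz = omega  # rotational velocity (Omega_x, Omega_y, Omega_z)
    # Translational part: the only part that depends on the depth Z.
    u_t = (-Tx * f + x * Tz) / Z
    v_t = (-Ty * f + y * Tz) / Z
    # Rotational part: independent of the scene structure.
    u_r = wx * x * y / f - wy * (x * x / f + f) + wz * y
    v_r = wx * (y * y / f + f) - wy * x * y / f - wz * x
    return u_t + u_r, v_t + v_r


def focus_of_expansion(T, f=1.0):
    """FOE (Tx f / Tz, Ty f / Tz); only defined when Tz is nonzero."""
    Tx, Ty, Tz = T
    return Tx * f / Tz, Ty * f / Tz


if __name__ == "__main__":
    T, f = (1.0, 2.0, 4.0), 2.0
    a, b = focus_of_expansion(T, f)
    u, v = optical_flow(3.0, -1.0, 5.0, T, (0.0, 0.0, 0.0), f)
    # For pure translation the flow vector is aligned with (x - a, y - b),
    # so this cross product is numerically zero.
    print(u * (-1.0 - b) - v * (3.0 - a))
```

With zero translation the returned flow does not change with `Z`, which is property (iii) above.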

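By eliminating $Z$ from the motion field Eqs. (7) and (8) and after some algebra, we obtain:

$$(T_y\Omega_y + T_z\Omega_z)\,x^2 + (T_x\Omega_x + T_z\Omega_z)\,y^2 - (T_x\Omega_z + T_z\Omega_x)\,x f - (T_y\Omega_z + T_z\Omega_y)\,y f - (T_x\Omega_y + T_y\Omega_x)\,x y + (T_x\Omega_x + T_y\Omega_y)\,f^2 + T_y u f - T_x v f + T_z (x v - y u) = 0 \qquad (9)$$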

Eq. (9) is difficult to solve in the general case, because it contains products of terms from $(T_x, T_y, T_z)$ with terms from $(\Omega_x, \Omega_y, \Omega_z)$. Some authors, for instance Gupta et al. [8], differentiate this equation with respect to $x$ and $y$ and solve for subsets of the basic motion parameters using Least Squares methods. This involves flow derivatives, which makes the method even more sensitive to the original noise contained in the optical flow. If the camera motion is considered to be only translational, i.e. $(\Omega_x, \Omega_y, \Omega_z) = (0, 0, 0)$, Eq. (9) may be rewritten as:

$$-T_x v f + T_y u f + T_z (x v - y u) = 0 \qquad (10)$$

By writing Eq. (10) in matrix form for $n$ points ($n \gg 3$) where the optical flow is defined, we obtain a homogeneous system of $n$ linear equations in 3 variables. Because the system is homogeneous, the 3-D translation vector can only be recovered up to scale, i.e. only the ratios of the 3-D translation components can be estimated. Ratios over $T_z$ are considered if $T_z$ is nonzero. The particular case of $T_z = 0$ will be considered later on.

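In the case of $T_z \neq 0$, introducing the notation $(\alpha = T_x f/T_z,\ \beta = T_y f/T_z)$, Eq. (10) becomes:

$$\underbrace{\begin{bmatrix} -v_1 & u_1 & x_1 v_1 - y_1 u_1 \\ \vdots & \vdots & \vdots \\ -v_i & u_i & x_i v_i - y_i u_i \\ \vdots & \vdots & \vdots \\ -v_n & u_n & x_n v_n - y_n u_n \end{bmatrix}}_{M_1} \underbrace{\begin{bmatrix} \alpha \\ \beta \\ 1 \end{bmatrix}}_{B_1} = 0 \qquad (11)$$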

The point $(\alpha, \beta)$ is the FOE: it is the point of intersection of the lines supporting the motion vectors defined by the translational components. This may be observed in the optical flow shown in the first line of Fig. 3. This case of 3-D translation will be referred to as full translation.
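To make the structure of the system concrete, the sketch below builds one row of the measurement matrix for a flow vector and evaluates the constraint at a candidate FOE. The helper names are hypothetical. The test flow uses the translational part of the motion field, $u = T_z (x - \alpha)/Z$ and $v = T_z (y - \beta)/Z$, which follows from the flow equations with zero rotation.

```python
def constraint_row(x, y, u, v):
    """One row of the measurement matrix: (-v, u, x*v - y*u)."""
    return (-v, u, x * v - y * u)


def residual(row, alpha, beta):
    """Value of row . (alpha, beta, 1); zero for noise-free translational flow."""
    return row[0] * alpha + row[1] * beta + row[2]


if __name__ == "__main__":
    alpha, beta, Tz = 0.5, 1.0, 1.0
    for x, y, Z in [(3.0, -1.0, 5.0), (-2.0, 4.0, 8.0), (1.0, 5.0, 3.0)]:
        u, v = Tz * (x - alpha) / Z, Tz * (y - beta) / Z
        # Each noise-free point satisfies its equation up to rounding error.
        print(residual(constraint_row(x, y, u, v), alpha, beta))
```

Any error in `u` or `v` changes all three coefficients of the row, which is why every entry of the matrix is noisy in practice.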

In the case of $T_z = 0$, the FOE is at infinity. Only the direction of translation may be recovered. This direction is defined by the ratio $\gamma = T_y/T_x$ (or $T_x/T_y$). This case of 3-D motion will be referred to as fronto-parallel translation. In that case, Eq. (10) gives rise to the following system of


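equations:

$$\underbrace{\begin{bmatrix} u_1 & -v_1 \\ \vdots & \vdots \\ u_i & -v_i \\ \vdots & \vdots \\ u_n & -v_n \end{bmatrix}}_{M_1} \underbrace{\begin{bmatrix} \gamma \\ 1 \end{bmatrix}}_{B_1} = 0 \qquad (12)$$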

In both cases, the estimation of solution B1 consists in

solving the corresponding overdetermined system, i.e. Eq.

(11) or (12), where all coefficients of M1 are noisy, given

that they depend on u and v. Indeed, the observed optical

flow field is a very crude approximation of the motion field,

whatever method for computing it is used. An interesting

review of optical flow techniques including performance

analysis is presented in Ref. [3].

2.2. Point correspondences

Considering the discrete case where point correspondences have been obtained, let $(x', y')$ be, at time $t'$, the 2-D point corresponding to $(x, y)$ at time $t$.

Given that, in 3-D space,

$$\begin{bmatrix} X' \\ Y' \\ Z' \end{bmatrix} = \begin{bmatrix} X \\ Y \\ Z \end{bmatrix} - \begin{bmatrix} T_X \\ T_Y \\ T_Z \end{bmatrix}, \qquad (13)$$

we obtain the relations of image point coordinates

$$x' = \frac{x Z - f T_X}{Z - T_Z}, \qquad y' = \frac{y Z - f T_Y}{Z - T_Z}. \qquad (14)$$

By eliminating $Z$ from the above two correspondence equations, if $T_Z \neq 0$, we obtain:

$$\frac{x' - x}{y' - y} = \frac{x' - \alpha}{y' - \beta}, \qquad (15)$$

where $(\alpha, \beta)$ can again be interpreted as the FOE of the 2-D displacement vector field. By symmetry we can also write,

$$\frac{x - x'}{y - y'} = \frac{\alpha - x}{\beta - y}. \qquad (16)$$

Finally, we obtain one linear equation for each point correspondence, which is quite similar to Eq. (10) obtained with the optical flow vector:

$$-(y' - y)\,\alpha + (x' - x)\,\beta = x' y - x y' = (x' - x)\,y - (y' - y)\,x. \qquad (17)$$

If we denote $u = x' - x$ and $v = y' - y$, Eqs. (10) and (17) are identical.

When the 3-D translation is parallel to the image plane, we obtain

$$(x' - x)\,\gamma = y' - y, \qquad (18)$$

where $\gamma$ is again the ratio of the two translation components.
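The correspondence relations can be checked numerically with a short sketch: a point is displaced according to Eq. (14), and the resulting pair is plugged into the linear constraint of Eq. (17). Function names and the sample values are illustrative assumptions.

```python
def displaced_point(x, y, Z, T, f=1.0):
    """Image position after a pure 3-D translation T = (TX, TY, TZ), as in Eq. (14)."""
    TX, TY, TZ = T
    return (x * Z - f * TX) / (Z - TZ), (y * Z - f * TY) / (Z - TZ)


def correspondence_residual(x, y, x2, y2, alpha, beta):
    """Left side minus right side of Eq. (17) for the pair (x, y) -> (x2, y2)."""
    return -(y2 - y) * alpha + (x2 - x) * beta - (x2 * y - x * y2)
```

For example, with `T = (1.0, 2.0, 4.0)` and `f = 2.0` the FOE is `(0.5, 1.0)`, and the residual of any displaced point evaluated at that FOE vanishes up to rounding error.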

3. Existing methods for solving overdetermined systems

Several main techniques have been proposed for solving

overdetermined linear systems. In the following paragraphs,

we will give an overview of these methods and select some

of them according to processing-time criteria as a basis of

comparison with the proposed approach.

3.1. Least squares

The most popular methods are the error-minimizing techniques, which formulate a quadratic error function to be minimized. The simplest and therefore most often used error-minimizing technique is Least Squares (LS). The goal is to find $B_1$ which minimizes the norm $\|M_1 B_1\|^2$. The problem reduces to solving the linear system $M_2 B_2 = A$ for $B_2$ such that $\|M_2 B_2 - A\|^2$ is minimized, with $M_1 = [M_2 \mid -A]$ and $B_1^T = [B_2^T \;\, 1]$. The classical least squares solution is given by $B_2 = (M_2^T M_2)^{-1} M_2^T A$. Note that if the noise is Gaussian and affects only $A$, this solution is also the maximum likelihood estimate of $B_2$.
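A minimal sketch of the LS solution for the full-translation system follows, assuming rows $(-v_i, u_i)$ for $M_2$ and entries $y_i u_i - x_i v_i$ for $A$. The 2×2 normal equations are solved in closed form; the function name and the degeneracy threshold are assumptions made for illustration.

```python
def least_squares_foe(points, flows):
    """Solve the normal equations (M2^T M2) B2 = M2^T A for B2 = (alpha, beta)."""
    s11 = s12 = s22 = t1 = t2 = 0.0
    for (x, y), (u, v) in zip(points, flows):
        m1, m2 = -v, u          # row of M2
        a = y * u - x * v       # entry of A (minus the third column of M1)
        s11 += m1 * m1
        s12 += m1 * m2
        s22 += m2 * m2
        t1 += m1 * a
        t2 += m2 * a
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        raise ValueError("flow vectors are (nearly) parallel; FOE is not determined")
    # Closed-form inverse of the symmetric 2x2 matrix M2^T M2.
    return (s22 * t1 - s12 * t2) / det, (s11 * t2 - s12 * t1) / det
```

On noise-free translational flow the function returns the exact FOE. With noisy flow both `M2` and `A` are perturbed, so the estimate is biased.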

3.2. Total least squares

Although it offers a simple technique for solving the

Fig. 1. The camera coordinate system.


problem, LS provides an unbiased estimate only if $M_2$ is noise free and all the errors are in $A$. In our case, measurement errors clearly affect both $M_2$ and $A$. When the errors are equally distributed through the entire measurement matrix $M_1$, the total least squares (TLS) algorithm solves the overdetermined system of equations by finding $(\Delta M_2, \Delta A)$ such that $(M_2 - \Delta M_2) B_2 = A - \Delta A$ has an exact solution and $\|[\Delta M_2, \Delta A]\|^2$ is minimized. This is performed via classical eigenanalysis based on the singular value decomposition (SVD). From the theory of SVD, the solution $B_1$ is known to be, up to scale, the eigenvector of the matrix $M_1^T M_1$ corresponding to its smallest eigenvalue [8,13].

As this method involves the solution of an eigensystem

problem, the stability in the computation of the eigenvalues

and the eigenvectors must be considered. The smallest

eigenvalue is selected and the solution depends on the

corresponding eigenvector. When there are multiple small eigenvalues, the TLS solution becomes unstable, because it is difficult to select the small eigenvalue associated with the expected eigenvector.

To solve this problem of instability, in some cases, a so-

called equilibration technique may be performed, which

consists in equilibrating the errors in different terms of the

data matrix M1 [8].
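As an illustration of the TLS eigenvector solution, the sketch below treats the smaller fronto-parallel system, whose rows are $(u_i, -v_i)$ and whose unknown is $(\gamma, 1)$. In that case $M_1^T M_1$ is a 2×2 matrix and its smallest eigenpair has a closed form, so no SVD routine is needed. This is a simplification chosen for brevity; the three-unknown full-translation system would require a general eigen-solver.

```python
import math


def tls_direction_ratio(flows):
    """TLS estimate of gamma from rows (u_i, -v_i) and unknown B1 = (gamma, 1)."""
    a = sum(u * u for u, v in flows)      # entries of the 2x2 matrix M1^T M1
    b = -sum(u * v for u, v in flows)
    d = sum(v * v for u, v in flows)
    # Smallest eigenvalue of the symmetric matrix [[a, b], [b, d]].
    lam = 0.5 * (a + d) - math.hypot(0.5 * (a - d), b)
    # Two algebraically equivalent eigenvectors; keep the better-conditioned one.
    w1, w2 = (b, lam - a), (lam - d, b)
    w = w1 if abs(w1[0]) + abs(w1[1]) >= abs(w2[0]) + abs(w2[1]) else w2
    if w[1] == 0.0:
        raise ValueError("last component is zero; estimate Tx/Ty instead")
    return w[0] / w[1]   # rescale so that the last component of B1 is 1
```

The final division enforces the normalization of the last component of $B_1$ to 1, which is also where the method becomes fragile when that component is close to zero.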

3.3. High positive-breakdown methods

Least-squares-based estimators may be completely

perturbed by a few bad leverage points or vertical outliers

as defined in Ref. [17]. The goal of positive-breakdown

methods is robustness against the possibility of several

unannounced outliers that may have occurred anywhere in

the data. There are several types of high-breakdown robust

methods, in particular the least median of squares (LMedS)

and the M-estimators. An interesting review is given in Ref.

[20].

The least-median-of-squares (LMedS) method of Rousseeuw [16] estimates the parameters by solving the nonlinear minimization problem $\min \operatorname{med}_i\, r_i^2$, where $r_i$ is the residual error of datum $i$. That is, the estimator must yield the smallest value for the median of squared residuals computed for the entire data set. The LMedS method attains the highest possible break-down value, which approaches 50%. For least squares, the break-down value is $1/n \to 0\%$, which means that a single outlier may contaminate the solution. The LMedS algorithm considers a trial subset of a selected number of observations and computes the linear fit passing through them. This procedure is repeated many times, and the fit with the lowest median of squared residuals is retained. For small data sets, it is possible to consider all subsets, whereas for larger data sets many subsets are to be drawn at random. As we have to deal with very large data sets, this procedure would be very time-consuming and is not selected in our set of methods.
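For reference, the random-subset procedure described above can be sketched for the FOE problem as follows. Each trial fits the two unknowns exactly to two randomly drawn equations and scores the fit by the median squared residual over all equations. The number of trials, the seed handling and the function name are assumptions; this is not one of the methods retained for comparison.

```python
import random
import statistics


def lmeds_foe(points, flows, trials=200, seed=0):
    """Random-subset LMedS estimate of the FOE (alpha, beta)."""
    rng = random.Random(seed)
    # Each row (m1, m2, c) encodes the equation m1*alpha + m2*beta + c = 0.
    rows = [(-v, u, x * v - y * u) for (x, y), (u, v) in zip(points, flows)]
    best = None
    for _ in range(trials):
        r1, r2 = rng.sample(rows, 2)
        det = r1[0] * r2[1] - r1[1] * r2[0]
        if abs(det) < 1e-12:
            continue  # the two equations are (nearly) dependent
        # Exact solution of the two selected equations (Cramer's rule).
        alpha = (-r1[2] * r2[1] + r1[1] * r2[2]) / det
        beta = (-r1[0] * r2[2] + r1[2] * r2[0]) / det
        med = statistics.median((m1 * alpha + m2 * beta + c) ** 2 for m1, m2, c in rows)
        if best is None or med < best[0]:
            best = (med, alpha, beta)
    if best is None:
        raise ValueError("no usable pair of equations was drawn")
    return best[1], best[2]
```

The cost grows with the number of trials times the number of equations, which illustrates why the procedure is expensive on dense flow fields.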

Another popular robust technique is the family of so-called M-estimators [10], which, unlike the LMedS method, can be reduced to a weighted least-squares problem. It is used by Komodakis and Tziritas in Ref. [11]. The M-estimation problem can be expressed as follows: given a set of data samples $Y_i$ and $X_i$, where $Y_i = f(X_i, \theta) + r_i$, estimate the vector of parameters $\theta$, $r_i$ being the residual error of datum $i$. The only underlying assumption is that the noise is symmetric, independent and identically distributed. M-estimators try to reduce the effect of outliers by replacing the squared residuals $r_i^2$ used in LS by another function of the residuals. The M-estimate $\hat{\theta}$ is defined as the minimizer of a global error function:

$$\hat{\theta} = \arg\min_{\theta} \sum_i \rho(r_i) \qquad (19)$$

where $\rho$ is a symmetric, positive-definite function with a unique minimum at zero, chosen to be subquadratic in $r$. Instead of directly solving this problem, we can implement it as the following iteratively reweighted least-squares problem:

$$\min \sum_i \omega\!\left(r_i^{(k-1)}\right) r_i^2 \qquad (20)$$

where the superscript $k$ indicates the iteration number. The weight $\omega\!\left(r_i^{(k-1)}\right)$ should be recomputed after each iteration in order to be used in the next iteration.

In least squares regression, all data points are weighted equally with $\omega(r_i) = 1$. In robust M-estimation, the function $\omega(r_i) = \Psi(r_i)/r_i$ provides adaptive weighting, where $\Psi(x) = d\rho(x)/dx$ is called the influence function, measuring the influence of a datum on the value of the parameter estimate. There are several commonly used influence functions defining M-estimators which provide solutions for reducing the influence of 'gross errors', like the Huber, the Cauchy, the Geman–McClure, the Welsch and the Tukey M-estimators.

In our approach, we selected the Tukey (or biweight) estimator, which has the advantage of suppressing the outliers entirely [11]. Tukey's M-estimator has the following weighting function:

$$\omega_c(r) = \begin{cases} \left( 1 - \left( \dfrac{r}{c} \right)^2 \right)^2 & |r| \le c \\ 0 & |r| > c \end{cases} \qquad (21)$$

The parameter $c$ in the above function is a scale parameter, which plays a crucial role in the success of the M-estimator. A 95% asymptotic efficiency on the standard normal distribution is obtained for Tukey's biweight estimator with the tuning constant $c = 4.6851$ [20]. In order to handle gross errors with respect to the data, we chose the parameter as $c = c_0\,\operatorname{median}(|r_i|)$, where $c_0$ is a normalizing constant in the range between 3 and 6 [11].

Among these high positive-breakdown methods, we decided to retain the M-estimator method described above, which will be referred to as RLS.
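The reweighting loop can be sketched as follows for the FOE problem, using the Tukey weights and a scale set to a constant times the median absolute residual. The default constant of 4, the iteration count, the stopping threshold and all function names are assumptions made for the sake of a runnable example; they are not taken from Ref. [11].

```python
import statistics


def tukey_weight(r, c):
    """Tukey biweight: (1 - (r/c)^2)^2 inside [-c, c], zero outside."""
    if abs(r) > c:
        return 0.0
    t = 1.0 - (r / c) ** 2
    return t * t


def weighted_ls(rows, weights):
    """Weighted normal equations for rows (m1, m2, a): m1*alpha + m2*beta = a."""
    s11 = s12 = s22 = t1 = t2 = 0.0
    for (m1, m2, a), w in zip(rows, weights):
        s11 += w * m1 * m1
        s12 += w * m1 * m2
        s22 += w * m2 * m2
        t1 += w * m1 * a
        t2 += w * m2 * a
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        raise ValueError("degenerate weighted system")
    return (s22 * t1 - s12 * t2) / det, (s11 * t2 - s12 * t1) / det


def irls_foe(points, flows, c0=4.0, iterations=20):
    """Iteratively reweighted LS estimate of the FOE with Tukey weights."""
    rows = [(-v, u, y * u - x * v) for (x, y), (u, v) in zip(points, flows)]
    alpha, beta = weighted_ls(rows, [1.0] * len(rows))  # plain LS start
    for _ in range(iterations):
        residuals = [m1 * alpha + m2 * beta - a for m1, m2, a in rows]
        c = c0 * statistics.median(abs(r) for r in residuals)
        if c < 1e-12:  # residuals already negligible, nothing to reweight
            break
        weights = [tukey_weight(r, c) for r in residuals]
        alpha, beta = weighted_ls(rows, weights)
    return alpha, beta
```

Points whose residual exceeds the scale receive zero weight and drop out of the next solve, which is how the biweight suppresses outliers rather than merely damping them.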


4. The proposed approach

All the previous methods, i.e. LS, TLS, RLS, try to solve

for the motion parameters using a large set of equations

where the coefficients are very unstable, given the noise

affecting the optical flow vectors u and v. This is the key

observation that has given rise to our method. Therefore, we

aim at building a minimal set of equations where the

parameters are required to be much more stable. The

coefficients of these equations are searched as optimal

according to criteria derived from the supposed noise model

of the optical flow. They are obtained by first projecting the vector of coefficients of each original equation into a low-dimensional space with a chosen basis of vectors. This scheme greatly reduces the influence of

noise contamination in the new set of equation coefficients.

Moreover, unlike all the above-mentioned methods, our

estimator is designed to be unbiased.

4.1. Noise in optical flow observations

The proposed method is based on the model of the noise

affecting the optical flow data. Two noise models are

considered, both with zero-mean distribution. The case of

mean deviation, or biased motion vector estimates, is

considered separately in a subsequent paragraph, where a

robust technique is introduced. We suppose that the two

components of the motion field u and v are perturbed by

additive zero-mean Gaussian noise. The two noise processes

are assumed to be independent, and each of them is assumed

to be spatially uncorrelated. This last property is not

necessary for obtaining an unbiased estimator, but it is

included for simplifying the variance expressions.

The variance of the noise is supposed to be either

constant or proportional to the squared value of the

corresponding component. This model seems compatible

with the probability distribution of optical flow proposed in

Ref. [18] and the observations made in the review of optical

flow techniques by Barron et al. [3]. Similar noise models

are used in Refs. [7,8,12]. However, it should be noticed that

typical optical flow techniques on real sequences produce

results with an error distribution which has a substantial

number of outliers. We will focus on dealing with this issue.

Considering the proposed noise model, we have:

$$u(i) = \mu(i) + N_1(i) \qquad (22)$$

$$v(i) = \nu(i) + N_2(i) \qquad (23)$$

where $i$ indexes the image points where an optical flow vector is defined, and $\mu(i)$ and $\nu(i)$ are the ideal optical flow components at point $i$. When the 'proportional' model is used, the noise processes $N_1$ and $N_2$ are such that:

$$E\{N_1(i)\} = E\{N_2(i)\} = 0$$

$$\forall i \neq i', \quad E\{N_k(i)\,N_k(i')\} = 0, \quad \text{for } k = 1, 2$$

$$\forall i, \forall i', \quad E\{N_1(i)\,N_2(i')\} = 0$$

$$E\{N_1^2(i)\} = \sigma^2 \mu^2(i)$$

$$E\{N_2^2(i)\} = \sigma^2 \nu^2(i)$$
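A minimal sketch of how synthetic observations could be drawn under the two noise models: zero-mean Gaussian noise, independent between the two components and between points, with a standard deviation either proportional to the magnitude of the ideal component or constant. The function name and the use of a caller-supplied `random.Random` generator are assumptions.

```python
import random


def noisy_flow(mu, nu, sigma, rng, proportional=True):
    """Perturb the ideal flow (mu, nu) with independent zero-mean Gaussian noise."""
    # Proportional model: standard deviation sigma * |ideal component|.
    # Constant model: the same standard deviation sigma for every component.
    std_u = sigma * abs(mu) if proportional else sigma
    std_v = sigma * abs(nu) if proportional else sigma
    # Separate draws for the two components; each call is a fresh draw,
    # so the noise is uncorrelated from one point to the next.
    return mu + rng.gauss(0.0, std_u), nu + rng.gauss(0.0, std_v)
```

Under the proportional model a zero ideal component stays exactly zero, whereas the constant model perturbs it like any other.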

We will describe our method first in the case of a 3-D

translation parallel to the image plane (fronto-parallel

translation) and then in the general case of full 3-D

translation.

4.2. Translation parallel to the image plane

We consider the case where the translational motion along the optical axis is null, i.e. $T_z = 0$. According to Eqs. (7) and (8), we can write:

$$\mu(i) = -\frac{T_x f}{Z(i)} \quad \text{and} \quad \nu(i) = -\frac{T_y f}{Z(i)} \qquad (24)$$

Given that the depth $Z(i)$ is unknown, we can only solve for either $\gamma = T_y/T_x$ or $\gamma = T_x/T_y$. This parameter is related to the direction of the translation in the image plane, whose angle to the horizontal axis is given by $\arctan(T_y/T_x)$. We estimate this parameter by projecting the observed process on a deterministic process $e(i)$ that is to be specified later. This projection yields:

$$u_1 = \sum_i u(i)\,e(i) = \sum_i \mu(i)\,e(i) + \sum_i N_1(i)\,e(i) \qquad (25)$$

$$v_1 = \sum_i v(i)\,e(i) = \sum_i \nu(i)\,e(i) + \sum_i N_2(i)\,e(i) \qquad (26)$$

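As a consequence of the above assumptions, the mean values of variables $u_1$ and $v_1$ are:

$$E\{u_1\} = -T_x f \sum_i \frac{e(i)}{Z(i)} \quad \text{and} \quad E\{v_1\} = -T_y f \sum_i \frac{e(i)}{Z(i)} \qquad (27)$$

Their variances are given by:

$$\operatorname{var}\{u_1\} = \sigma^2 \sum_i \mu^2(i)\,e^2(i) \quad \text{and} \quad \operatorname{var}\{v_1\} = \sigma^2 \sum_i \nu^2(i)\,e^2(i) \qquad (28)$$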

We propose to estimate $\gamma = T_y/T_x$ if $u_1 > v_1$, or $\gamma = T_x/T_y$ otherwise. Without loss of generality, we consider the first case, and the estimate will be $\hat{\gamma} = v_1/u_1$.

We will now consider the choice of the axis of projection $\{e(i)\}$. A possible criterion is the maximization of the signal-to-noise ratio of the denominator variable, i.e. $u_1$:

$$\rho = \frac{\left( \sum_i \dfrac{e(i)}{Z(i)} \right)^2}{\sum_i \dfrac{e^2(i)}{Z^2(i)}}$$

Note that the criterion may be computed for the numerator as well. This ratio is maximized if $e(i) = \lambda Z(i)$. As $Z(i)$ is unknown but always positive, we propose to choose the simplest one, that is, $e(i) = 1/K$, where $K$ is the number of


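points. The estimate is then given by:

$$\hat{\gamma} = \frac{\sum_i v(i)}{\sum_i u(i)} \qquad (29)$$

Let us now consider the ideal choice $e(i) = Z(i)/K$. We obtain:

$$E\{u_1\} = -T_x f, \qquad E\{v_1\} = -T_y f \qquad (30)$$

$$\operatorname{var}\{u_1\} = \frac{\sigma^2}{K}\,T_x^2 f^2, \qquad \operatorname{var}\{v_1\} = \frac{\sigma^2}{K}\,T_y^2 f^2 \qquad (31)$$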
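The signal-to-noise criterion used to select the projection axis can be evaluated directly. The sketch below computes the ratio $\left(\sum_i e(i)/Z(i)\right)^2 / \sum_i e^2(i)/Z^2(i)$ for a given projection and depth list. By the Cauchy–Schwarz inequality the ratio is at most $K$, with equality when $e(i)$ is proportional to $Z(i)$. The function name and the sample depths are illustrative assumptions.

```python
def snr_criterion(e, Z):
    """Ratio (sum e/Z)^2 / sum (e/Z)^2 for projection weights e and depths Z."""
    ratios = [ei / zi for ei, zi in zip(e, Z)]
    return sum(ratios) ** 2 / sum(r * r for r in ratios)


if __name__ == "__main__":
    Z = [2.0, 4.0, 8.0, 5.0]
    K = len(Z)
    ideal = snr_criterion([z / K for z in Z], Z)      # e(i) = Z(i)/K
    uniform = snr_criterion([1.0 / K] * K, Z)         # e(i) = 1/K
    # The ideal projection reaches the upper bound K; the uniform one is lower.
    print(ideal, uniform)
```

The gap between the two values grows with the spread of the depths, which is the price paid for not knowing $Z(i)$.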

The last equations show the very important reduction of the noise disturbance in estimating $\gamma$ in this ideal case. Indeed, it is known that under the above conditions the estimator is unbiased and efficient, with a variance equal to $\sigma^2/K$. In our case, by selecting $e(i) = 1/K$, the estimator is still unbiased, but with a variance proportional to $\sigma^2/K$ with a factor of

$$\left( 1 + \frac{\operatorname{var}\!\left( \dfrac{1}{Z} \right)}{\left( \dfrac{1}{Z_0} \right)^2} \right),$$

where $Z_0$ is the mean depth of the scene. Thus, the efficiency of the estimator depends on the variation of the depth of the scene with respect to its mean value. If the noise is spatially correlated, another factor increases the estimate variance. The stronger the correlation coefficient is, the greater the value of this factor will be. A very important property is that our estimator is unbiased and the associated error is proportional to $\sigma^2/K$.

For a constant noise model, in accordance with the

previous approach, it will be necessary to weight the

measurements in order to obtain at each point approxi-

mately the same signal-to-noise ratio. As the signal-to-noise

ratio is now proportional to the real motion vector

magnitude, we suggest that the weight of the measurements

should be the measured motion vector magnitude itself.

Therefore we propose the following estimate:

$$\hat{g} = \frac{\sum_i v(i)\sqrt{u^2(i) + v^2(i)}}{\sum_i u(i)\sqrt{u^2(i) + v^2(i)}} \tag{32}$$
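As a minimal sketch, the two fronto-parallel estimates of Eqs. (29) and (32) can be written in a few lines of Python with NumPy. The function names and the array-based interface are assumptions made for illustration and do not come from the paper.

```python
import numpy as np


def direction_ratio_uniform(u, v):
    """Eq. (29): ratio of the summed flow components.

    With e(i) = 1/K the factor 1/K cancels between numerator and
    denominator, so plain sums are sufficient.
    """
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return v.sum() / u.sum()


def direction_ratio_weighted(u, v):
    """Eq. (32): each measurement is weighted by its measured magnitude."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    w = np.hypot(u, v)  # sqrt(u^2 + v^2) at every point
    return (v * w).sum() / (u * w).sum()
```

For a noise-free fronto-parallel field, in which every vector is parallel to $(-T_x, -T_y)$, both functions return $T_y/T_x$.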

4.3. Translation non parallel to the image plane

We consider the general case where the 3-D translation is not parallel to the image plane ($T_z \neq 0$). We aim at estimating the FOE, which is the point $(a, b) = (T_x f/T_z,\; T_y f/T_z)$ in the image plane. According to Eq. (10), we can write:

$$n(i)\,a - m(i)\,b = n(i)\,x(i) - m(i)\,y(i)$$

From the overdetermined set of equations with noisy coefficients computed from the motion field, we propose to obtain two equations by projecting onto two deterministic fields $e_1(i)$ and $e_2(i)$. These two equations are:

$$v_1 a - u_1 b = w_1$$

$$v_2 a - u_2 b = w_2$$

where, for $k = 1, 2$:

$$u_k = \sum_i u(i)\,e_k(i), \qquad v_k = \sum_i v(i)\,e_k(i), \qquad w_k = \sum_i \bigl(v(i)\,x(i) - u(i)\,y(i)\bigr)\,e_k(i).$$

We therefore obtain the estimate of the position of the FOE:

$$(\hat{a}, \hat{b}) = \left( \frac{u_1 w_2 - u_2 w_1}{u_1 v_2 - u_2 v_1},\; \frac{v_2 w_1 - v_1 w_2}{u_1 v_2 - u_2 v_1} \right) \tag{33}$$

According to the assumptions on the noise model, we have:

$$E\{u_1 v_2 - u_2 v_1\} = \Bigl(\sum_i m(i)\,e_1(i)\Bigr)\Bigl(\sum_i n(i)\,e_2(i)\Bigr) - \Bigl(\sum_i m(i)\,e_2(i)\Bigr)\Bigl(\sum_i n(i)\,e_1(i)\Bigr)$$

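Let us consider the mean values of the two numerators of Eq. (33):

$$E\{u_1 w_2 - u_2 w_1\} = \Bigl(\sum_i m(i)\,e_1(i)\Bigr) \times \Bigl(\sum_i \bigl(n(i)\,x(i) - m(i)\,y(i)\bigr)\,e_2(i)\Bigr) - \Bigl(\sum_i m(i)\,e_2(i)\Bigr) \times \Bigl(\sum_i \bigl(n(i)\,x(i) - m(i)\,y(i)\bigr)\,e_1(i)\Bigr)$$

and

$$E\{v_2 w_1 - v_1 w_2\} = \Bigl(\sum_i n(i)\,e_2(i)\Bigr) \times \Bigl(\sum_i \bigl(n(i)\,x(i) - m(i)\,y(i)\bigr)\,e_1(i)\Bigr) - \Bigl(\sum_i n(i)\,e_1(i)\Bigr) \times \Bigl(\sum_i \bigl(n(i)\,x(i) - m(i)\,y(i)\bigr)\,e_2(i)\Bigr)$$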

We suppose $\sum_i x(i) = \sum_i y(i) = \sum_i x(i)\,y(i) = 0$. If this is not the case, $x(i)$ and $y(i)$ are expressed in a new coordinate system centered at their centroid and whose orthonormal axes are the first and second principal axes of the distribution of the points. Principal component analysis is used to obtain these axes. Therefore, if we set


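$e_1(i) = \lambda\,x(i)\,Z(i)$ and $e_2(i) = \lambda\,y(i)\,Z(i)$, we have:

$$E\{u_1 v_2 - u_2 v_1\} = (\lambda T_z f)^2 \sum_i x^2(i) \sum_i y^2(i) \tag{34}$$

$$E\{u_1 w_2 - u_2 w_1\} = (\lambda T_z f)^2\, a \sum_i x^2(i) \sum_i y^2(i) \tag{35}$$

$$E\{v_2 w_1 - v_1 w_2\} = (\lambda T_z f)^2\, b \sum_i x^2(i) \sum_i y^2(i) \tag{36}$$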

These relations prove that the proposed estimators are unbiased. Indeed, the quotient (35)/(34) is $a$ and the quotient (36)/(34) is $b$. We may also prove that Eqs. (34) and (35) (likewise Eqs. (36) and (34)) are decorrelated and that the signal-to-noise ratio for both numerator and denominator is approximately $K$, the number of points. As the depth $Z(i)$ is unknown, we propose to choose as basis $e_1(i) = x(i)$ and $e_2(i) = y(i)$. As in the fronto-parallel translation case, the effectiveness of this choice depends on the variation of the depth with respect to its mean value, and also on the spatial noise correlation.
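The projection step of this section can be sketched as follows, assuming the flow is given as NumPy arrays. The function name `estimate_foe` is hypothetical, and rather than using a closed-form expression the sketch solves the two projected equations $v_k a - u_k b = w_k$ numerically. The points and the flow vectors are first expressed in the centroid-centered principal-axis frame, so that the sums of $x$, $y$ and $xy$ vanish, and the result is mapped back to image coordinates.

```python
import numpy as np


def estimate_foe(x, y, u, v):
    """Estimate the FOE by projecting the flow onto e1 = x and e2 = y."""
    pts = np.column_stack([x, y]).astype(float)
    flow = np.column_stack([u, v]).astype(float)

    # Move to the centroid-centered principal-axis frame (PCA).
    centroid = pts.mean(axis=0)
    centered = pts - centroid
    _, axes = np.linalg.eigh(centered.T @ centered)  # columns: orthonormal axes
    xp, yp = (centered @ axes).T  # point coordinates in the new frame
    up, vp = (flow @ axes).T      # flow components in the new frame

    cross = vp * xp - up * yp     # v(i) x(i) - u(i) y(i)
    rows, rhs = [], []
    for e_k in (xp, yp):          # projection fields e1 and e2
        u_k = np.sum(up * e_k)
        v_k = np.sum(vp * e_k)
        w_k = np.sum(cross * e_k)
        rows.append([v_k, -u_k])  # v_k * a - u_k * b = w_k
        rhs.append(w_k)

    foe_p = np.linalg.solve(np.array(rows), np.array(rhs))
    return axes @ foe_p + centroid  # back to image coordinates
```

On a noise-free radial field, every point satisfies the constraint exactly, so the two projected equations recover the true FOE whatever the depths are.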

4.4. Robustness against mean deviations

The estimated 2-D motion or optical flow field could be

affected by some systematic errors, that is, errors on the

mean value of the field. In other words, the disturbing noise

may not be zero-mean. A similar effect occurs if the two

noise components are correlated, in which case the

estimates might be biased. In the following we propose a

technique for limiting the effects of mean deviation, or

‘correcting’ the bias of the estimator.

As shown in Sections 4.1–4.3, under some assumptions, the proposed estimators are unbiased. Therefore, if the obtained estimate does not satisfy this property, we can conclude that the noise model was not suitable. The test should rely on measurements that require neither knowledge of the real parameter values nor knowledge of the depth. We propose to measure and test the angle between the direction of the motion field and the direction suggested by the estimated FOE. If the noise model was as assumed, the parameter estimators should be unbiased, and the average angle should be near zero. If a significant difference is observed, we can conclude that the noise might not be zero-mean, or that its two components might be correlated.

Next we illustrate a way of limiting this kind of noise effect. We propose to ‘correct’ the motion field by rotating it according to the observed average deviation. In addition, as a percentage of outliers might also exist, we propose an iterative algorithm for correcting the motion field and rejecting the outliers. At each iteration, we first estimate the amount of correction and the scale of acceptable inliers, and then solve, for the points selected and the corrected motion vectors, the equations presented in Sections 4.1–4.3. A detailed description of the proposed procedure follows.

Let us denote by $m$ the iteration index. The average angle deviation for the full translation case is given by

$$\theta_m = \frac{1}{n} \sum_{i=1}^{n} \arctan \frac{(x(i) - a_{m-1})\,v_{m-1}(i) - (y(i) - b_{m-1})\,u_{m-1}(i)}{\sqrt{\bigl((x(i) - a_{m-1})^2 + (y(i) - b_{m-1})^2\bigr)\bigl(u_{m-1}^2(i) + v_{m-1}^2(i)\bigr)}}$$

The correcting angle is a filtered version of the estimated deviation:

$$\phi_m = \lambda \tau^m \theta_m + (1 - \lambda \tau^m)\,\phi_{m-1}, \qquad \phi_0 = 0, \qquad 0 < \tau < 1.$$
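The two quantities above can be sketched in Python as follows. The sketch reads the deviation at each point as the arctangent of the normalized 2-D cross product between the radial direction $(x - a, y - b)$ and the flow $(u, v)$, which vanishes for a perfectly radial field. That reading, the function names, and the default values of `lam` and `tau` are assumptions made for illustration only.

```python
import numpy as np


def mean_angle_deviation(x, y, u, v, a, b):
    """Average angle between the flow and the direction radiating from (a, b)."""
    rx, ry = x - a, y - b
    cross = rx * v - ry * u  # zero when the flow is exactly radial
    norm = np.sqrt((rx**2 + ry**2) * (u**2 + v**2))
    return float(np.mean(np.arctan(cross / norm)))


def filtered_correction(theta_m, phi_prev, m, lam=1.0, tau=0.5):
    """Recursive filter phi_m = g * theta_m + (1 - g) * phi_{m-1}, g = lam * tau**m."""
    gain = lam * tau**m  # hypothetical defaults; 0 < tau < 1 as in the text
    return gain * theta_m + (1.0 - gain) * phi_prev
```

Because the gain `lam * tau**m` decays with the iteration index, later deviations contribute less and less to the correcting angle.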

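The motion or optical flow vectors are rotated by $\phi_m$:

$$\begin{bmatrix} u_m(i) \\ v_m(i) \end{bmatrix} = \begin{bmatrix} \cos\phi_m & \sin\phi_m \\ -\sin\phi_m & \cos\phi_m \end{bmatrix} \begin{bmatrix} u(i) \\ v(i) \end{bmatrix}.$$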

The rejection of outliers is based on the residuals of the depth deviation:

$$r_m(i) = \frac{x(i) - a_m}{u_m(i)} - \frac{y(i) - b_m}{v_m(i)}.$$

The scale of the residuals is estimated by

$$c_m = 1.5\, \operatorname{median}_i \{ |r_m(i)| \}.$$

Therefore points for which $|r_m(i)| > c_m$ are rejected as outliers.
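The rotation of the flow and the residual-based rejection can be sketched as below. The function names are hypothetical. Treating points with a zero flow component as outliers is an assumption of this sketch, made because their residual is undefined; the paper does not discuss that case.

```python
import numpy as np


def rotate_flow(u, v, phi):
    """Rotate every flow vector with the matrix [[cos, sin], [-sin, cos]]."""
    c, s = np.cos(phi), np.sin(phi)
    return c * u + s * v, -s * u + c * v


def inlier_mask(x, y, u_m, v_m, a_m, b_m, factor=1.5):
    """Keep points whose depth-deviation residual is within factor * median."""
    with np.errstate(divide="ignore", invalid="ignore"):
        r = (x - a_m) / u_m - (y - b_m) / v_m
        finite = np.isfinite(r)  # zero flow components give inf or nan
        c_m = factor * np.median(np.abs(r[finite]))
        keep = finite & (np.abs(r) <= c_m)
    return keep, r, c_m
```

Both terms of the residual estimate the same quantity $Z/T_z$ when the point follows the model, so a large difference flags a vector that is inconsistent with the current FOE.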

The resulting iterative algorithm is obvious. At each step,

the average angle deviation and the scale of the residuals are

calculated. Then the outliers are rejected and the motion

field is rotated. Finally the 3-D motion parameters are

estimated according to the equations of Section 4.3. The

procedure stops when the actual average angle deviation is

less than the candidate angle correction. Indeed, the angle

correction is designed to give an increasing sequence of

values which are smaller in absolute value than the

calculated average angle deviation.

5. Experimental results

5.1. Results from simulated realistic data

In order to compare the different methods and to study

the effect of noise on their accuracy, we use synthetic optical

flow fields which are contaminated by different amounts of

noise. The simulated optical flow fields are generated using

range images from the MSU/WSU Range Image Database,

available online at http://www.eexs.wsu.edu/IRL/RID/.

Given appropriate values for the intrinsic parameters of

the simulated camera (focal length and principal point), the

dimensions of the retina and the true parameters for


translation, the optical flow is generated by using Eqs. (7)

and (8), and the depth value Z of each point of the range

image.

To simulate a realistic flow field, noise is introduced into the synthetic optical flow vectors. In Ref. [8], various noise models and levels were tried, such as uniform noise in a range, Gaussian noise with zero mean and specified variance, and Gaussian noise with zero mean and specified fractional variance. We chose the following model, which is similar to the one used in Ref. [12] and is compatible with the basic assumptions described in Section 4:

$$u = m + N(0, b) \cdot 0.01 \cdot m, \qquad v = n + N(0, b) \cdot 0.01 \cdot n \tag{37}$$

where $N(0, b)$ is a Gaussian random variable with mean 0 and standard deviation $b$. This noise will be referred to as ‘Gaussian noise with mean 0 and standard deviation $b$%’. As noted in Ref. [12], this error model provides the ability to synthesize errors whose range is similar to that produced by optical flow estimation techniques. A parameter is also used to control the percentage of optical flow points affected by the selected noise.
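A minimal sketch of this noise model in Python is given below. The function name, the use of independent draws for the two components, and the random selection of the affected points with probability $p/100$ are assumptions of the sketch; the text only states that a parameter controls the percentage of affected points.

```python
import numpy as np


def add_flow_noise(m, n, b, p=100.0, rng=None):
    """Perturb the ideal flow (m, n) following Eq. (37).

    b is the noise standard deviation in percent of the component value,
    p is the percentage of points that receive noise.
    """
    rng = np.random.default_rng() if rng is None else rng
    m = np.asarray(m, dtype=float)
    n = np.asarray(n, dtype=float)
    affected = rng.random(m.shape) < p / 100.0
    noise_u = rng.normal(0.0, b, m.shape) * 0.01 * m
    noise_v = rng.normal(0.0, b, n.shape) * 0.01 * n
    u = np.where(affected, m + noise_u, m)
    v = np.where(affected, n + noise_v, n)
    return u, v
```

With `b = 50`, for example, each affected component is perturbed by a zero-mean Gaussian whose standard deviation is half of the component's true value.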

We performed different experiments using the range image ‘Blocks’ extracted from the MSU/WSU range image database, which is shown in Fig. 2. The image size is 128 × 128 pixels, the principal point being assumed to be in the center of the image. We chose a value of 96 for the focal length of the virtual camera, giving a field of view (FOV) of 67.4°. All depths in the range image have been uniformly scaled to the range [2500, 7000] in order to be viewed by the virtual camera and produce meaningful optical flow.

The chosen motion parameters are $(T_x, T_y, T_z) = (45, -35, 100)$ in pixels/frame for the full translation case and $(T_x, T_y, T_z) = (45, -35, 0)$ for the fronto-parallel translation case. In the case of full translation, the maximum absolute horizontal velocity is 5.34 with an average of 1.40. Similar values of 4.10 and 1.35 are found for the absolute vertical velocity. The true FOE is $(43.2, -33.6)$. In the case of fronto-parallel translation, the maximum absolute horizontal velocity is 2.16 with an average of 1.31. Similar values of 1.68 and 1.03 are found for the absolute vertical velocity. The true fronto-parallel translation direction angle is $g = -19.3°$, i.e. $-19.3 + 180 = 160.7°$ in the image coordinate system.
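The field of view and the true FOE quoted above can be checked with a short calculation. The sketch below assumes a pinhole camera whose focal length is expressed in pixels, and uses the FOE definition $(T_x f/T_z, T_y f/T_z)$ of Section 4.3; the function names are illustrative.

```python
import math


def field_of_view_deg(image_width, focal_length):
    """Full field of view of a pinhole camera, in degrees."""
    return 2.0 * math.degrees(math.atan((image_width / 2.0) / focal_length))


def focus_of_expansion(tx, ty, tz, focal_length):
    """FOE (a, b) = (Tx f / Tz, Ty f / Tz) for a translation with Tz != 0."""
    return (tx * focal_length / tz, ty * focal_length / tz)


fov = field_of_view_deg(128, 96)            # about 67.4 degrees
foe = focus_of_expansion(45, -35, 100, 96)  # (43.2, -33.6)
```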

Fig. 3 shows the scaled and subsampled ideal synthesized optical flow and some noisy versions of it. The first row corresponds to the case of full translation whereas the second row corresponds to the case of fronto-parallel translation. The ideal optical flow is displayed in the first column. The second and third columns show perturbed optical flows with noise $N(0, 50)$ and $N(0, 100)$ with a percentage $p = 100\%$ of affected pixels.

In the case of full translation, the criterion that we use for assessing noise tolerance is the error in the FOE estimation. This error is defined as the angle in degrees between the vectors $(a, b, f)$ and $(\hat{a}, \hat{b}, f)$, given by:

$$\mathrm{Err}_{FOE} = \arccos\left( \frac{a\hat{a} + b\hat{b} + f^2}{\sqrt{(a^2 + b^2 + f^2)(\hat{a}^2 + \hat{b}^2 + f^2)}} \right) \tag{38}$$

where $(a, b)$ is the true FOE and $(\hat{a}, \hat{b})$ is the estimated one.

In the case of fronto-parallel translation, the criterion is the error in degrees in the direction of the translation in the image plane. This error is defined as:

$$\mathrm{Err}_g = |\arctan(\hat{g}) - \arctan(g)| \tag{39}$$
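Both error criteria are simple to compute. The following sketch implements Eqs. (38) and (39) in Python; the function names are illustrative, and the clamping of the cosine to $[-1, 1]$ is a numerical safeguard added here, not part of the definitions.

```python
import math


def err_foe_deg(true_foe, est_foe, f):
    """Eq. (38): angle in degrees between (a, b, f) and the estimated vector."""
    a, b = true_foe
    a_hat, b_hat = est_foe
    num = a * a_hat + b * b_hat + f * f
    den = math.sqrt((a * a + b * b + f * f) * (a_hat**2 + b_hat**2 + f * f))
    cos_angle = max(-1.0, min(1.0, num / den))  # guard against rounding
    return math.degrees(math.acos(cos_angle))


def err_g_deg(g_true, g_est):
    """Eq. (39): absolute difference of the direction angles, in degrees."""
    return abs(math.degrees(math.atan(g_est)) - math.degrees(math.atan(g_true)))
```

Measuring the FOE error as an angle between 3-D vectors keeps the criterion bounded even when the FOE lies far outside the image.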

The methods which have been implemented for comparison are Least Squares (LS), Total Least Squares (TLS), M-estimators reweighted least squares with $c_0 = 4.5$ (RLS-4.5) and the proposed method (PROJ).

These different methods are compared for noise tolerance by computing $\mathrm{Err}_{FOE}$ or $\mathrm{Err}_g$, depending on the type of translation, versus the noise standard deviation $b$, varying from 0 to 100% for $p = 100\%$ of affected pixels. For each noise level, the computed values are average values over 50 runs. Fig. 4(a) plots the values of $\mathrm{Err}_{FOE}$ in the full translation case and Fig. 4(b) plots the values of $\mathrm{Err}_g$ in the fronto-parallel translation case. It can be seen from these two graphs that our method is far more efficient than the other three, giving maximum errors of 0.67° in the full translation case and 0.24° in the fronto-parallel translation case. As expected, LS is the least tolerant to noise with respective maximum errors of 24.61° and 16.57°. RLS-4.5 is much more efficient than TLS in the case of full translation. It is slightly less efficient than TLS in the fronto-parallel translation case. RLS-4.5 achieves good performance by

rejecting outliers, up to 50% of the data with $c_0 = 1$. In the present case, with $c_0 = 4.5$, approximately 20% of outliers were rejected. On the other hand, being iterative, this method is more time-consuming.

Fig. 2. The range image Blocks used for synthesizing flow field (intensity is proportional to depth, the closest points being brighter).

The proposed method proves to be very tolerant to the noise model we applied, which has been found to be close to the one affecting real optical flows. Moreover, it offers the advantage of being very fast and easily implemented, since it consists primarily of projection and summation. In this respect, TLS is more computationally expensive, since it performs a singular value decomposition. Eigenanalysis is also known to be unstable when the matrix is ill-conditioned, which may happen if the amount of noise is large.

5.2. Results from real data

The algorithms were applied to the well-known ‘marbled block’ and ‘flower garden’ sequences, with known ground truth values. The marbled block sequence was captured by a robot arm moving in full translation over a textured floor [14]. Four dark blocks lying on the floor are stationary while a white block in the middle of the scene is moving independently. The standard flower garden sequence was shot from a camera placed on a moving car and corresponds to a fronto-parallel translation along the horizontal axis $T_x$ of the camera. The scene contains a tree in the foreground, a textured garden, and a house in the background. The marbled block sequence contains many sharp discontinuities in depth, and the flower garden sequence presents some non-textured areas that cause problems for the optical flow computation, giving rise to a large number of outliers. Optical flows are computed using the algorithm of Anandan [2]. They are displayed in Fig. 5.

It may be noticed that these real cases differ from the synthetic cases because of the presence of strong outliers. Our method, in its original form (PROJ), has not been designed explicitly to be optimal in that case. To better handle the outliers and the failures of the model, the technique presented in Section 4.4 is applied. The robust version of our method is referred to as ROB-PROJ.

Table 1 gives the results of the different algorithms on these two sequences. The proposed method is the most efficient of the set on these real examples as well, especially in the fronto-parallel translation case. These results tend to show that the assumptions made on the noise model and on the criteria of selection of the projection bases were generally valid. As another source for comparison, Daniilidis reported a result with $\mathrm{Err}_{FOE} = 7.24°$ for the marbled block sequence [6].

Fig. 3. Ideal and noisy synthesized optical flows in the two cases of translation.

Fig. 4. Comparison of the four methods for noise tolerance in the cases of (a) full translation and (b) fronto-parallel translation.

6. Conclusion

In this paper, we have presented a novel method for estimating the parameters of translational motion from optical flow. Our results on synthetic and real optical flows are more accurate than those of the other tested methods. This is due to the fact that our scheme, unlike the other methods, is based on an unbiased estimator that strongly reduces the influence of noise contamination in the data. Moreover, computational requirements are low, making this method very attractive for fast 3-D translational motion parameter estimation. To handle outliers, an iterative scheme gives even better results within a few iterations. We are currently working on the extension of this method to the general case of 3-D motion.

Fig. 5. Original frames and optical flows for (a) marbled block and (b) flower garden.

7. Uncited reference

[19].

References

[1] G. Adiv, Determining three-dimensional motion and structure from optical flow generated by several moving objects, IEEE Transactions on Pattern Analysis and Machine Intelligence 7 (1985) 384–401.

[2] P. Anandan, A computational framework and an algorithm for the

measurement of visual motion generated by several moving objects,

International Journal of Computer Vision 2 (1989) 283–310.

[3] J.L. Barron, D.J. Fleet, S.S. Beauchemin, Performance of optical flow

techniques, International Journal of Computer Vision 12 (1994)

43–77.

[4] P. Bouthemy, E. Francois, Motion segmentation and qualitative

dynamics scene analysis from an image sequence, International

Journal of Computer Vision 10 (1993) 157–182.

[5] A.R. Bruss, B.K.P. Horn, Passive navigation, Computer Vision,

Graphics and Image Processing 21 (1983) 3–20.

[6] K. Daniilidis, Fixation simplifies 3-D motion estimation, Computer

Vision and Image Understanding 68 (1997) 158–169.

[7] A.M. Earnshaw, S. Blostein, The performance of camera translation

direction estimators from optical flow, comparison, and theoretical

limits, IEEE Transactions on Pattern Analysis and Machine

Intelligence 18 (1996) 927–932.

[8] N. Gupta, L. Kanal, 3-D motion estimation from motion field,

Artificial Intelligence 78 (1995) 45–86.

[9] D.J. Heeger, A.D. Jepson, Subspace methods for recovering rigid motion I: algorithm and implementation, International Journal of Computer Vision 7 (1992) 95–117.

[10] P. Huber, Robust Statistics, Wiley, New York, 1981.

[11] N. Komodakis, G. Tziritas, Robust 3-D motion estimation and depth

layering, Proceedings of International Conference on Digital Signal

Processing, Santorini, Greece (1997) 425–428.

[12] N. da Vitoria Lobo, J.K. Tsotsos, Computing egomotion and detecting independent motion from image motion using collinear points, Computer Vision and Image Processing 64 (1996) 21–52.

[13] M. Muhlich, R. Mester, The role of total least squares in motion

analysis, Proceedings of European Conference on Computer Vision,

Freiburg, Germany (1998) 305–321.

[14] M. Otte, H. Nagel, Estimation of optical flow based on higher-order

spatiotemporal derivatives in interlaced and non-interlaced image

sequences, Artificial Intelligence 78 (1) (1995) 5–43.

[15] K. Prazdny, Estimation and relative depth from optical flow,

Biological Cybernetics 36 (1980) 87–102.

[16] P.J. Rousseeuw, Least median of squares regression, Journal of American Statistical Association 84 (1984) 871–880.

[17] P.J. Rousseeuw, A.M. Leroy, Robust Regression and Outlier

Detection, Wiley-Interscience, New York, 1987.

[18] E. Simoncelli, E. Adelson, D. Heeger, Probability distributions of

optical flow, Proceedings of Computer Vision and Pattern Recog-

nition (1991) 310–315.

[19] G. Tziritas, C. Labit, Motion analysis for image sequence coding,

Elsevier, Amsterdam, 1994.

[20] Z. Zhang, Parameter estimation techniques: a tutorial with application to conic fitting, Image and Vision Computing 15 (1997) 59–76.

Table 1
Comparative results on real optical flows

Sequence     Marbled block               Flower garden
Type         Full translation            Fronto-parallel translation
Truth        (a, b) = (-777.0, 95.6)     g = 0°
LS           ErrFOE = 7.58°              Errg = 2.73°
TLS          ErrFOE = 5.25°              Errg = 2.67°
RLS-4.5      ErrFOE = 5.42°              Errg = 2.63°
PROJ         ErrFOE = 4.94°              Errg = 1.42°
ROB-PROJ     ErrFOE = 3.65°              Errg = 0.92°
