
Depth estimation from Multi-View sources based on full search and Total Variation regularization

Carlos Vázquez, Wa James Tam
Advanced Video Systems, Broadcasting Technologies
Communications Research Centre Canada (CRC)

International Workshop on Computer Vision and Its Application to Image Media Processing (WCVIM'09)
Tokyo, Japan

Outline

1 Introduction

2 Depth information for 3D-TV

3 Depth from Multi-View sources
  Algorithm overview
  Error volume generation
  First depth approximation
  Depth refining

4 Experimental results
  Application: Multi-View image coding

5 Conclusions


Introduction

3D-TV is on the way!
The next step in television broadcasting.

1 More content available in 3D:
  ◮ 3D cinema (IMAX, RealD)
  ◮ Live 3D (U2-3D, sport events)
  ◮ Video games (3D at home)

2 Availability of 3D displays:
  ◮ Stereoscopic (with glasses)
  ◮ Auto-stereoscopic (no glasses)

3 Ongoing work to develop coding standards:
  ◮ Stereo extension to MPEG
  ◮ Depth coding extension to MPEG (2D+Depth)
  ◮ Multi-View coding standard (JMVM)
  ◮ 3D@Home consortium


Depth information for 3D-TV

Depth information in 3D-TV broadcasting
An essential piece of information

Large variety of viewers and viewing devices:
  ◮ Need to adjust the amount of depth perceived.
  ◮ Need to adjust the depth to the size of the display.
  ◮ Coding of multi-view or stereoscopic sources.

How to fulfill these requirements?
  ◮ Generation of new views from the ones available:
    ⋆ Depth-Image-Based Rendering.
    ⋆ Intermediate View Reconstruction.
  ◮ Predictive coding of 3D sources.

⇒ Knowledge of depth becomes essential for 3D-TV.
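The slides do not detail the Depth-Image-Based Rendering step mentioned above. As a rough illustration only, the sketch below forward-warps a reference view into a nearby virtual view by shifting each pixel by a disparity derived from its depth; the function name, the rectified-camera assumption and the disparity model (disparity = f · B / z) are assumptions, not the authors' implementation.

```python
import numpy as np

def dibr_forward_warp(color, depth, f, baseline):
    """Minimal DIBR sketch (assumed rectified, pinhole model):
    shift each pixel horizontally by disparity = f * baseline / depth."""
    h, w, _ = color.shape
    out = np.zeros_like(color)
    z_buf = np.full((h, w), np.inf)                   # keep the closest surface per target pixel
    disparity = f * baseline / np.maximum(depth, 1e-6)
    for y in range(h):
        for x in range(w):
            xt = int(round(x - disparity[y, x]))      # target column in the virtual view
            if 0 <= xt < w and depth[y, x] < z_buf[y, xt]:
                z_buf[y, xt] = depth[y, x]
                out[y, xt] = color[y, x]
    return out  # disocclusion holes remain empty and would need in-painting
```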

Depth information in 3D-TV broadcasting
Depth is embedded in Multi-View sources

[Figure: a scene point P = (X, Y, Z) captured by cameras 1 to N of a Multi-View source projects to positions x1, x2, ..., xN in the individual views; with focal length f and camera baseline B, the shift between views encodes the depth z, so the Multi-View source implicitly contains a 2D+Depth (2D+D) representation.]

Problem statement
Recover the depth information from a Multi-View source to be used in the transmission, processing and coding of the Multi-View video content.
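For the parallel camera arrangement in the figure, depth and inter-view disparity are linked by z = f·B/d. The small helpers below make that relationship explicit; the quantization of metric depth to an 8-bit 2D+Depth map between a near and a far plane is a common convention assumed here, not taken from the slides.

```python
import numpy as np

def disparity_to_depth(disparity, f, baseline):
    """z = f * B / d for a rectified, parallel multi-view rig (assumed geometry)."""
    return f * baseline / np.maximum(disparity, 1e-6)

def depth_to_8bit(depth, z_near, z_far):
    """Quantize metric depth to the 0..255 range of a 2D+Depth map
    (inverse-depth mapping between assumed near and far planes)."""
    d = (1.0 / depth - 1.0 / z_far) / (1.0 / z_near - 1.0 / z_far)
    return np.clip(255.0 * d, 0.0, 255.0).astype(np.uint8)
```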


Depth from Multi-View sources

Depth estimation from Multi-View sources
Proposed algorithm overview

Depth estimation from Multi-View sources with TV regularization:
Full scan of possible depth values and subsequent refinement of the depth with Total-Variation regularization combined with edge correspondence and visibility consistency.

1 Pre-processing of the Multi-View source (sketched below)
  ◮ Noise reduction: a general noise-removal step is applied.
  ◮ Gradient computation: the gradient information \nabla I_o is added as two new 'color' channels to the color image.
  ◮ Edge extraction: image edges are used in the depth estimation process. Edge map \epsilon_o = \delta_c(I_o).

2 Error volume generation

3 First depth approximation
  ◮ Median filter

4 Depth refining
  ◮ TV regularization
  ◮ Edge correspondence
  ◮ Visibility consistency
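As a rough sketch of the pre-processing step (the slides only state that noise removal, the gradient \nabla I_o and an edge map \epsilon_o = \delta_c(I_o) are computed), the code below denoises the reference view, appends the two gradient components as extra channels and extracts an edge map. The choice of Gaussian blur, Sobel gradients and a Canny detector is an assumption for illustration.

```python
import cv2
import numpy as np

def preprocess_view(bgr):
    """Pre-processing sketch (assumed operators): denoise, add gradient
    channels, and extract an edge map for the reference view I_o."""
    denoised = cv2.GaussianBlur(bgr, (5, 5), 1.0)          # generic noise removal
    gray = cv2.cvtColor(denoised, cv2.COLOR_BGR2GRAY)
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3)        # horizontal gradient
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=3)        # vertical gradient
    # Append the two gradient components as extra 'color' channels.
    channels = np.dstack([denoised.astype(np.float32), gx, gy])
    edges = cv2.Canny(gray, 50, 150)                       # edge map epsilon_o
    return channels, edges
```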

Error volume generation
Overview

[Figure: a pixel X in the central view V is projected into the surrounding views v1 ... v5 for each candidate depth d1 ... d5.]

Motivation
For each pixel in the central view and each candidate depth value, a similarity measure is evaluated over the corresponding pixels in all views. The depth with the best similarity measure is accepted as the best estimate.

Error volume generation
Equations

Mean square error across 'colors':
E_v(x, d) = \frac{1}{C} \sum_{c=1}^{C} \left( I_v(T_{o,v}(x, d), c) - I_o(x, c) \right)^2

Mean error across 'views':
E(x, d) = \frac{1}{N(x, d)} \sum_{v \in R_m(x, d)} E_v(x, d)

Matched views:
R_m = \{ v : E_v(x, d) < T_m \}

Number of matched views:
N(x, d) = \sum_{v \in V(x, d)} \mathbf{1}\left( E_v(x, d) < T_m \right)
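A minimal NumPy sketch of the error-volume computation follows, assuming rectified views so that the mapping T_{o,v}(x, d) reduces to a horizontal shift proportional to the candidate depth; the shift model, array layout and threshold handling are illustrative assumptions.

```python
import numpy as np

def error_volume(center, views, offsets, depths, t_match):
    """Error-volume sketch for rectified views: T_{o,v}(x, d) is assumed to be
    a horizontal shift of offsets[v] * d pixels.  center and each views[v]
    have shape (H, W, C); returns E and N with shape (n_d, H, W)."""
    h, w, _ = center.shape
    n_d, n_v = len(depths), len(views)
    c = center.astype(np.float32)
    ev = np.empty((n_d, n_v, h, w), dtype=np.float32)
    for di, d in enumerate(depths):
        for vi, (view, off) in enumerate(zip(views, offsets)):
            shift = int(round(off * d))
            warped = np.roll(view.astype(np.float32), shift, axis=1)  # crude warp toward the center view
            ev[di, vi] = np.mean((warped - c) ** 2, axis=2)           # E_v: MSE across 'color' channels
    matched = ev < t_match                                            # R_m(x, d): views below the threshold T_m
    n_match = matched.sum(axis=1)                                     # N(x, d): number of matched views
    e = np.where(matched, ev, 0.0).sum(axis=1) / np.maximum(n_match, 1)
    e[n_match == 0] = 1e6                                             # no matching view: very poor fit
    return e, n_match
```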

Error volume generation
Error volume and visibility: Example

[Figure: depth-x slices of the error volume and of the number of matching views for one image row.]

First depth approximation
Direct minimization of the error measure

1 Minimize the error while penalizing depth values with fewer matching views:
D^{(0)}(x) = \arg\min_d \, E(x, d) \left( \frac{V(x, d)}{N(x, d)} \right)^2

2 Apply a median filter to remove noise from the estimated depth map:
D^{(1)} = H_M(D^{(0)})
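Continuing the sketch above, the first approximation is a penalized arg-min over the depth axis followed by a median filter; the 5x5 median window is an assumed parameter, not taken from the slides.

```python
import numpy as np
from scipy.ndimage import median_filter

def first_depth_approximation(e, n_match, depths, n_views):
    """D^(0)(x) = argmin_d E(x, d) * (V / N(x, d))^2, followed by the
    median filter H_M; e and n_match have shape (n_d, H, W)."""
    penalty = (n_views / np.maximum(n_match, 1)) ** 2     # penalize depths matched in few views
    cost = e * penalty
    d0 = np.asarray(depths, dtype=float)[np.argmin(cost, axis=0)]  # per-pixel best depth
    return median_filter(d0, size=5)                      # D^(1) = H_M(D^(0))
```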

Depth refining
Total Variation regularization

Depth as the function that minimizes a two-term global energy:
D(x) = \arg\min_D \left( G_d(D, E) + \lambda G_r(D) \right)

Data term:
G_d(D, E) = \frac{1}{2} \sum_{x \in \Lambda_o} \| E(x, D[x]) \|^2

Regularization term:
G_r(D) = \int_{W_o} \| \nabla_x D^{(n)} \| \, dW_o

Level set minimization:
D^{(n+1)} = D^{(n)} + \Delta T \left( \lambda \kappa \| \nabla_x D^{(n)} \| - \frac{\partial E}{\partial d} E(D^{(n)}) \right)
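A minimal sketch of one level-set iteration follows, assuming a finite-difference curvature term and a central-difference approximation of the data term along the depth axis of the error volume (the slide's \frac{\partial E}{\partial d} E(D^{(n)}) factor is simplified to the depth derivative of the error); \lambda, \Delta T and the sampling scheme are assumptions.

```python
import numpy as np

def curvature_term(d):
    """kappa * |grad D| via finite differences (assumed discretization)."""
    dy, dx = np.gradient(d)
    mag = np.sqrt(dx ** 2 + dy ** 2) + 1e-8
    nyy, _ = np.gradient(dy / mag)     # d/dy of the y-component of the unit normal
    _, nxx = np.gradient(dx / mag)     # d/dx of the x-component of the unit normal
    return (nxx + nyy) * mag           # div(grad D / |grad D|) * |grad D|

def tv_refine_step(d, e, depths, lam=0.2, dt=0.1):
    """One level-set iteration: D <- D + dt * (lam * kappa * |grad D| - dE/dd),
    with dE/dd taken as a central difference along the sorted depth axis of e."""
    depths = np.asarray(depths, dtype=float)
    idx = np.clip(np.searchsorted(depths, d), 1, len(depths) - 2)
    rows, cols = np.indices(d.shape)
    de_dd = (e[idx + 1, rows, cols] - e[idx - 1, rows, cols]) / (depths[idx + 1] - depths[idx - 1])
    return d + dt * (lam * curvature_term(d) - de_dd)
```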

Depth refining
Edge correspondence

1 Image edges

2 Distance to image edges:
F(x) = \max\left( \mathrm{dist}(x, \epsilon_o), F_M \right)

3 Depth edges:
\eta^{(n)} = \delta_c(D^{(n)})

4 Edge correction term:
\phi(x) = \eta^{(n)}(x) \, F(x) \, \mathrm{sign}\left( \nabla D^{(n)}(x) \cdot \nabla F(x) \right)
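The edge-correspondence term can be sketched as follows: a distance transform to the image edges gives F(x), depth edges \eta are detected on the current estimate, and their product with the sign of \nabla D \cdot \nabla F forms the correction. The Euclidean distance transform and the Sobel-magnitude depth-edge detector are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt, sobel

def edge_correction(d, image_edges, f_m):
    """Edge correction sketch: phi(x) = eta(x) * F(x) * sign(grad D . grad F)."""
    # F(x) = max(dist(x, eps_o), F_M), as written in the slide.
    f = np.maximum(distance_transform_edt(~image_edges.astype(bool)), f_m)
    # eta: depth edges of the current estimate (assumed Sobel-magnitude detector).
    eta = np.hypot(sobel(d, axis=0), sobel(d, axis=1))
    dy, dx = np.gradient(d)
    fy, fx = np.gradient(f)
    return eta * f * np.sign(dx * fx + dy * fy)
```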

Depth refining
Visibility consistency

Estimated visibility vs. matching visibility
Compare the visibility resulting from the estimated depth map with the visibility suggested by the number of matching views.

Estimated visibility:
Q(x) = \frac{ V(x, D^{(n)}(x)) - \sum_{v=1}^{L} \mathbf{1}\left( O_v(x_v) \neq x_v \right) }{ V(x, D^{(n)}(x)) }

Matching visibility:
S(x) = \frac{N(x)}{V(x)}

Occluded and occluding regions:
B_a = \{ x \mid (Q(x) < 1) \wedge (S(x) > Q(x)) \}
J_a = \{ x = O_v(u) \mid Q(x) = 1 \}

Conflict:
B = \{ y \in B_a \mid x \in J_a \}
J = \{ x \in J_a \mid S(x) < 1 \}

Correction:
B ⇒ pushed to the foreground
J ⇒ pushed to the background
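The slides do not spell out the semantics of the occlusion map O_v, so the following is only a loose sketch of the conflict masks, assuming the estimated visibility Q, the matching visibility S = N/V and two occlusion-related boolean maps are already available; every input name and its interpretation is an assumption.

```python
import numpy as np

def visibility_conflicts(q, s, occluder_in_ja, occludes_others):
    """Loose sketch of the conflict sets (semantics assumed, not from the slides):
    q: estimated visibility Q(x); s: matching visibility S(x) = N(x)/V(x);
    occluder_in_ja: True where the pixel occluding x is itself fully visible;
    occludes_others: True where x = O_v(u) hides a pixel u in another view."""
    b_a = (q < 1.0) & (s > q)             # occluded according to the depth map, but well matched
    j_a = occludes_others & (q == 1.0)    # fully visible pixels that occlude others
    b = b_a & occluder_in_ja              # B: pixels to be pushed to the foreground
    j = j_a & (s < 1.0)                   # J: pixels to be pushed to the background
    return b, j
```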

Depth refining
Final evolution equation

Level set evolution equation:
D^{(n+1)} = D^{(n)} + \Delta T \left( \lambda \kappa \| \nabla_x D^{(n)} \| - \frac{\partial E}{\partial d} E(D^{(n)}) + \mu \Phi + \beta (B - J) \right)

1 Total Variation regularization
2 Minimization of the Multi-View matching error
3 Correspondence between image and depth edges
4 Occlusion correction by visibility check
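Putting the earlier sketches together, one refinement iteration could look like the following; tv_refine_step, edge_correction and visibility_conflicts are the hypothetical helpers sketched above, and the weights \lambda, \mu, \beta and \Delta T are assumed values, not the authors' settings.

```python
def refine_depth(d, e, depths, image_edges, q, s, occluder_in_ja, occludes_others,
                 lam=0.2, mu=0.05, beta=1.0, dt=0.1, f_m=10.0):
    """One refinement iteration combining the TV/data update, the edge
    correspondence term and the visibility correction (all weights assumed)."""
    base = tv_refine_step(d, e, depths, lam=lam, dt=dt)   # lam*kappa*|grad D| - dE/dd
    phi = edge_correction(d, image_edges, f_m)            # Phi
    b, j = visibility_conflicts(q, s, occluder_in_ja, occludes_others)
    return base + dt * (mu * phi + beta * (b.astype(float) - j.astype(float)))
```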


Experimental results

Test images and depth maps
[Figure: original color images (View 2) and original ground-truth depth images (View 2) of the test set.]

Resulting depth maps and error
[Figure: estimated depth image (View 2) and the error with respect to ground truth (1-pixel differences).]

Error with respect to ground truth

Image        Venus   Teddy   Cones   Art     Bowling2
PSNR (dB)    51.96   44.02   44.76   36.72   36.26
E > 1 (%)     6.93   10.96    8.01   18.99   17.80
E > 2 (%)     2.19    6.49    4.13   11.88   10.46

1 The PSNR values indicate that the results are close to the ground truth.
2 The percentage of errors larger than 1 pixel is still sizeable.
3 The percentage of errors larger than 2 pixels drops significantly.
4 A 2-pixel error is manageable in the intended application.
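The metrics in the table can be reproduced with a few lines; the 8-bit peak value of 255 and the 1- and 2-pixel thresholds follow the usual convention for this kind of evaluation and are assumptions here.

```python
import numpy as np

def depth_metrics(estimated, ground_truth, peak=255.0):
    """PSNR and bad-pixel percentages between an estimated and a ground-truth
    depth/disparity map (both assumed to be 8-bit-scaled arrays)."""
    err = estimated.astype(np.float64) - ground_truth.astype(np.float64)
    mse = np.mean(err ** 2)
    psnr = 10.0 * np.log10(peak ** 2 / mse) if mse > 0 else float('inf')
    bad1 = 100.0 * np.mean(np.abs(err) > 1.0)   # E > 1 (%)
    bad2 = 100.0 * np.mean(np.abs(err) > 2.0)   # E > 2 (%)
    return psnr, bad1, bad2
```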

Application: Multi-View image coding
2D+Depth+Occlusions Multi-View coding system

[Figure: block diagram of the coding system. Depth estimation on the N input views produces a 2D+Depth (2D+D) representation; disocclusion data for views 1..N, masks and edges are wavelet transformed, encoded, embedded and transmitted, then decoded at the receiver.]

Application: Multi-View image coding
Decoded images (PSNR of the decoded views)

Depth map used   Venus      Teddy      Cones
Estimated        32.19 dB   31.40 dB   30.84 dB
Real             35.96 dB   31.93 dB   31.81 dB


Conclusions

High-quality depth estimation from Multi-View sources.

Occlusion processing by analysis of visibility consistency.

Total-Variation regularization ensures smooth depth maps with sharp edges.

Application to Multi-View image coding.

Outlook:
  ◮ Improve the visibility consistency step.
  ◮ Speed up the algorithm execution.
  ◮ Integration into an MPEG-2 standard stream.