
Depth estimation from Multi-View sources based on full search and Total Variation regularization

Carlos Vázquez, Wa James Tam
Advanced Video Systems, Broadcasting Technologies
Communications Research Centre Canada (CRC)

International Workshop on Computer Vision and Its Application to Image Media Processing (WCVIM'09)
Tokyo, Japan

Outline

1 Introduction

2 Depth information for 3D-TV

3 Depth from Multi-View sources
  Algorithm overview
  Error volume generation
  First depth approximation
  Depth refining

4 Experimental results
  Application: Multi-View image coding

5 Conclusions


Introduction

3D-TV is on the way!
The next step in television broadcasting.

1 More content available in 3D:
  ◮ 3D cinema (IMAX, RealD)
  ◮ Live 3D (U2-3D, sport events)
  ◮ Video games (3D at home)

2 Availability of 3D displays:
  ◮ Stereoscopic (with glasses)
  ◮ Auto-stereoscopic (no glasses)

3 Ongoing work to develop coding standards:
  ◮ Stereo extension to MPEG
  ◮ Depth coding extension to MPEG (2D+Depth)
  ◮ Multi-View coding standard (JMVM)
  ◮ 3D@Home consortium


Depth information for 3D-TV

Depth information in 3D-TV broadcasting
An essential piece of information

Large variety of viewers and viewing devices:
  ◮ Need to adjust the amount of depth perceived.
  ◮ Need to adjust the depth to the size of the display.
  ◮ Coding of multi-view or stereoscopic sources.

How to fulfill these requirements?
  ◮ Generation of new views from the ones available:
    ⋆ Depth-Image-Based Rendering.
    ⋆ Intermediate View Reconstruction.
  ◮ Predictive coding of 3D sources.

⇒ Knowledge of depth becomes essential for 3D-TV.
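The slides do not detail the Depth-Image-Based Rendering step mentioned above. As a rough illustration only, the sketch below forward-warps a reference view into a nearby virtual view by shifting each pixel by a disparity derived from its depth; the function name, the rectified-camera assumption and the disparity model (disparity = f · B / z) are assumptions, not the authors' implementation.

```python
import numpy as np

def dibr_forward_warp(color, depth, f, baseline):
    """Minimal DIBR sketch (assumed rectified, pinhole model):
    shift each pixel horizontally by disparity = f * baseline / depth."""
    h, w, _ = color.shape
    out = np.zeros_like(color)
    z_buf = np.full((h, w), np.inf)                   # keep the closest surface per target pixel
    disparity = f * baseline / np.maximum(depth, 1e-6)
    for y in range(h):
        for x in range(w):
            xt = int(round(x - disparity[y, x]))      # target column in the virtual view
            if 0 <= xt < w and depth[y, x] < z_buf[y, xt]:
                z_buf[y, xt] = depth[y, x]
                out[y, xt] = color[y, x]
    return out  # disocclusion holes remain empty and would need in-painting
```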

Depth information in 3D-TV broadcasting
Depth is embedded in Multi-View sources

[Figure: a scene point P = (X, Y, Z) captured by cameras 1 to N of a Multi-View source projects to positions x1, x2, ..., xN in the individual views; with focal length f and camera baseline B, the shift between views encodes the depth z, so the Multi-View source implicitly contains a 2D+Depth (2D+D) representation.]

Problem statement
Recover the depth information from a Multi-View source to be used in the transmission, processing and coding of the Multi-View video content.
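For the parallel camera arrangement in the figure, depth and inter-view disparity are linked by z = f·B/d. The small helpers below make that relationship explicit; the quantization of metric depth to an 8-bit 2D+Depth map between a near and a far plane is a common convention assumed here, not taken from the slides.

```python
import numpy as np

def disparity_to_depth(disparity, f, baseline):
    """z = f * B / d for a rectified, parallel multi-view rig (assumed geometry)."""
    return f * baseline / np.maximum(disparity, 1e-6)

def depth_to_8bit(depth, z_near, z_far):
    """Quantize metric depth to the 0..255 range of a 2D+Depth map
    (inverse-depth mapping between assumed near and far planes)."""
    d = (1.0 / depth - 1.0 / z_far) / (1.0 / z_near - 1.0 / z_far)
    return np.clip(255.0 * d, 0.0, 255.0).astype(np.uint8)
```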


Depth from Multi-View sources

Depth estimation from Multi-View sources
Proposed algorithm overview

Depth estimation from Multi-View sources with TV regularization:
Full scan of possible depth values and subsequent refinement of the depth with Total-Variation regularization combined with edge correspondence and visibility consistency.

1 Pre-processing of the Multi-View source (sketched below)
  ◮ Noise reduction: a general noise-removal step is applied.
  ◮ Gradient computation: the gradient information \nabla I_o is added as two new 'color' channels to the color image.
  ◮ Edge extraction: image edges are used in the depth estimation process. Edge map \epsilon_o = \delta_c(I_o).

2 Error volume generation

3 First depth approximation
  ◮ Median filter

4 Depth refining
  ◮ TV regularization
  ◮ Edge correspondence
  ◮ Visibility consistency
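As a rough sketch of the pre-processing step (the slides only state that noise removal, the gradient \nabla I_o and an edge map \epsilon_o = \delta_c(I_o) are computed), the code below denoises the reference view, appends the two gradient components as extra channels and extracts an edge map. The choice of Gaussian blur, Sobel gradients and a Canny detector is an assumption for illustration.

```python
import cv2
import numpy as np

def preprocess_view(bgr):
    """Pre-processing sketch (assumed operators): denoise, add gradient
    channels, and extract an edge map for the reference view I_o."""
    denoised = cv2.GaussianBlur(bgr, (5, 5), 1.0)          # generic noise removal
    gray = cv2.cvtColor(denoised, cv2.COLOR_BGR2GRAY)
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3)        # horizontal gradient
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=3)        # vertical gradient
    # Append the two gradient components as extra 'color' channels.
    channels = np.dstack([denoised.astype(np.float32), gx, gy])
    edges = cv2.Canny(gray, 50, 150)                       # edge map epsilon_o
    return channels, edges
```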

Error volume generation
Overview

[Figure: a pixel X in the central view V is projected into the surrounding views v1 ... v5 for each candidate depth d1 ... d5.]

Motivation
For each pixel in the central view and each candidate depth value, a similarity measure is evaluated over the corresponding pixels in all views. The depth with the best similarity measure is accepted as the best estimate.

Error volume generation
Equations

Mean square error across 'colors':
E_v(x, d) = \frac{1}{C} \sum_{c=1}^{C} \left( I_v(T_{o,v}(x, d), c) - I_o(x, c) \right)^2

Mean error across 'views':
E(x, d) = \frac{1}{N(x, d)} \sum_{v \in R_m(x, d)} E_v(x, d)

Matched views:
R_m = \{ v : E_v(x, d) < T_m \}

Number of matched views:
N(x, d) = \sum_{v \in V(x, d)} \mathbf{1}\left( E_v(x, d) < T_m \right)
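A minimal NumPy sketch of the error-volume computation follows, assuming rectified views so that the mapping T_{o,v}(x, d) reduces to a horizontal shift proportional to the candidate depth; the shift model, array layout and threshold handling are illustrative assumptions.

```python
import numpy as np

def error_volume(center, views, offsets, depths, t_match):
    """Error-volume sketch for rectified views: T_{o,v}(x, d) is assumed to be
    a horizontal shift of offsets[v] * d pixels.  center and each views[v]
    have shape (H, W, C); returns E and N with shape (n_d, H, W)."""
    h, w, _ = center.shape
    n_d, n_v = len(depths), len(views)
    c = center.astype(np.float32)
    ev = np.empty((n_d, n_v, h, w), dtype=np.float32)
    for di, d in enumerate(depths):
        for vi, (view, off) in enumerate(zip(views, offsets)):
            shift = int(round(off * d))
            warped = np.roll(view.astype(np.float32), shift, axis=1)  # crude warp toward the center view
            ev[di, vi] = np.mean((warped - c) ** 2, axis=2)           # E_v: MSE across 'color' channels
    matched = ev < t_match                                            # R_m(x, d): views below the threshold T_m
    n_match = matched.sum(axis=1)                                     # N(x, d): number of matched views
    e = np.where(matched, ev, 0.0).sum(axis=1) / np.maximum(n_match, 1)
    e[n_match == 0] = 1e6                                             # no matching view: very poor fit
    return e, n_match
```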

Error volume generation
Error volume and visibility: Example

[Figure: depth-x slices of the error volume and of the number of matching views for one image row.]

First depth approximation
Direct minimization of the error measure

1 Minimize the error while penalizing depth values with fewer matching views:
D^{(0)}(x) = \arg\min_d \, E(x, d) \left( \frac{V(x, d)}{N(x, d)} \right)^2

2 Apply a median filter to remove noise from the estimated depth map:
D^{(1)} = H_M(D^{(0)})
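Continuing the sketch above, the first approximation is a penalized arg-min over the depth axis followed by a median filter; the 5x5 median window is an assumed parameter, not taken from the slides.

```python
import numpy as np
from scipy.ndimage import median_filter

def first_depth_approximation(e, n_match, depths, n_views):
    """D^(0)(x) = argmin_d E(x, d) * (V / N(x, d))^2, followed by the
    median filter H_M; e and n_match have shape (n_d, H, W)."""
    penalty = (n_views / np.maximum(n_match, 1)) ** 2     # penalize depths matched in few views
    cost = e * penalty
    d0 = np.asarray(depths, dtype=float)[np.argmin(cost, axis=0)]  # per-pixel best depth
    return median_filter(d0, size=5)                      # D^(1) = H_M(D^(0))
```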

Depth refining
Total Variation regularization

Depth as the function that minimizes a two-term global energy:
D(x) = \arg\min_D \left( G_d(D, E) + \lambda G_r(D) \right)

Data term:
G_d(D, E) = \frac{1}{2} \sum_{x \in \Lambda_o} \| E(x, D[x]) \|^2

Regularization term:
G_r(D) = \int_{W_o} \| \nabla_x D^{(n)} \| \, dW_o

Level set minimization:
D^{(n+1)} = D^{(n)} + \Delta T \left( \lambda \kappa \| \nabla_x D^{(n)} \| - \frac{\partial E}{\partial d} E(D^{(n)}) \right)
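A minimal sketch of one level-set iteration follows, assuming a finite-difference curvature term and a central-difference approximation of the data term along the depth axis of the error volume (the slide's \frac{\partial E}{\partial d} E(D^{(n)}) factor is simplified to the depth derivative of the error); \lambda, \Delta T and the sampling scheme are assumptions.

```python
import numpy as np

def curvature_term(d):
    """kappa * |grad D| via finite differences (assumed discretization)."""
    dy, dx = np.gradient(d)
    mag = np.sqrt(dx ** 2 + dy ** 2) + 1e-8
    nyy, _ = np.gradient(dy / mag)     # d/dy of the y-component of the unit normal
    _, nxx = np.gradient(dx / mag)     # d/dx of the x-component of the unit normal
    return (nxx + nyy) * mag           # div(grad D / |grad D|) * |grad D|

def tv_refine_step(d, e, depths, lam=0.2, dt=0.1):
    """One level-set iteration: D <- D + dt * (lam * kappa * |grad D| - dE/dd),
    with dE/dd taken as a central difference along the sorted depth axis of e."""
    depths = np.asarray(depths, dtype=float)
    idx = np.clip(np.searchsorted(depths, d), 1, len(depths) - 2)
    rows, cols = np.indices(d.shape)
    de_dd = (e[idx + 1, rows, cols] - e[idx - 1, rows, cols]) / (depths[idx + 1] - depths[idx - 1])
    return d + dt * (lam * curvature_term(d) - de_dd)
```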

Depth refining
Edge correspondence

1 Image edges

2 Distance to image edges:
F(x) = \max\left( \mathrm{dist}(x, \epsilon_o), F_M \right)

3 Depth edges:
\eta^{(n)} = \delta_c(D^{(n)})

4 Edge correction term:
\phi(x) = \eta^{(n)}(x) \, F(x) \, \mathrm{sign}\left( \nabla D^{(n)}(x) \cdot \nabla F(x) \right)
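The edge-correspondence term can be sketched as follows: a distance transform to the image edges gives F(x), depth edges \eta are detected on the current estimate, and their product with the sign of \nabla D \cdot \nabla F forms the correction. The Euclidean distance transform and the Sobel-magnitude depth-edge detector are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt, sobel

def edge_correction(d, image_edges, f_m):
    """Edge correction sketch: phi(x) = eta(x) * F(x) * sign(grad D . grad F)."""
    # F(x) = max(dist(x, eps_o), F_M), as written in the slide.
    f = np.maximum(distance_transform_edt(~image_edges.astype(bool)), f_m)
    # eta: depth edges of the current estimate (assumed Sobel-magnitude detector).
    eta = np.hypot(sobel(d, axis=0), sobel(d, axis=1))
    dy, dx = np.gradient(d)
    fy, fx = np.gradient(f)
    return eta * f * np.sign(dx * fx + dy * fy)
```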

Depth refining
Visibility consistency

Estimated visibility vs. matching visibility
Compare the visibility resulting from the estimated depth map with the visibility suggested by the number of matching views.

Estimated visibility:
Q(x) = \frac{ V(x, D^{(n)}(x)) - \sum_{v=1}^{L} \mathbf{1}\left( O_v(x_v) \neq x_v \right) }{ V(x, D^{(n)}(x)) }

Matching visibility:
S(x) = \frac{N(x)}{V(x)}

Occluded and occluding regions:
B_a = \{ x \mid (Q(x) < 1) \wedge (S(x) > Q(x)) \}
J_a = \{ x = O_v(u) \mid Q(x) = 1 \}

Conflict:
B = \{ y \in B_a \mid x \in J_a \}
J = \{ x \in J_a \mid S(x) < 1 \}

Correction:
B ⇒ pushed to the foreground
J ⇒ pushed to the background
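The slides do not spell out the semantics of the occlusion map O_v, so the following is only a loose sketch of the conflict masks, assuming the estimated visibility Q, the matching visibility S = N/V and two occlusion-related boolean maps are already available; every input name and its interpretation is an assumption.

```python
import numpy as np

def visibility_conflicts(q, s, occluder_in_ja, occludes_others):
    """Loose sketch of the conflict sets (semantics assumed, not from the slides):
    q: estimated visibility Q(x); s: matching visibility S(x) = N(x)/V(x);
    occluder_in_ja: True where the pixel occluding x is itself fully visible;
    occludes_others: True where x = O_v(u) hides a pixel u in another view."""
    b_a = (q < 1.0) & (s > q)             # occluded according to the depth map, but well matched
    j_a = occludes_others & (q == 1.0)    # fully visible pixels that occlude others
    b = b_a & occluder_in_ja              # B: pixels to be pushed to the foreground
    j = j_a & (s < 1.0)                   # J: pixels to be pushed to the background
    return b, j
```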

Depth refining
Final evolution equation

Level set evolution equation:
D^{(n+1)} = D^{(n)} + \Delta T \left( \lambda \kappa \| \nabla_x D^{(n)} \| - \frac{\partial E}{\partial d} E(D^{(n)}) + \mu \Phi + \beta (B - J) \right)

1 Total Variation regularization
2 Minimization of the Multi-View matching error
3 Correspondence between image and depth edges
4 Occlusion correction by visibility check
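Putting the earlier sketches together, one refinement iteration could look like the following; tv_refine_step, edge_correction and visibility_conflicts are the hypothetical helpers sketched above, and the weights \lambda, \mu, \beta and \Delta T are assumed values, not the authors' settings.

```python
def refine_depth(d, e, depths, image_edges, q, s, occluder_in_ja, occludes_others,
                 lam=0.2, mu=0.05, beta=1.0, dt=0.1, f_m=10.0):
    """One refinement iteration combining the TV/data update, the edge
    correspondence term and the visibility correction (all weights assumed)."""
    base = tv_refine_step(d, e, depths, lam=lam, dt=dt)   # lam*kappa*|grad D| - dE/dd
    phi = edge_correction(d, image_edges, f_m)            # Phi
    b, j = visibility_conflicts(q, s, occluder_in_ja, occludes_others)
    return base + dt * (mu * phi + beta * (b.astype(float) - j.astype(float)))
```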


Experimental results

Test images and depth maps
[Figure: original color images (View 2) and original ground-truth depth images (View 2) of the test set.]

Resulting depth maps and error
[Figure: estimated depth image (View 2) and the error with respect to ground truth (1-pixel differences).]

Error with respect to ground truth

Image        Venus   Teddy   Cones   Art     Bowling2
PSNR (dB)    51.96   44.02   44.76   36.72   36.26
E > 1 (%)     6.93   10.96    8.01   18.99   17.80
E > 2 (%)     2.19    6.49    4.13   11.88   10.46

1 The PSNR values indicate that the results are close to the ground truth.
2 The percentage of errors larger than 1 pixel is still sizeable.
3 The percentage of errors larger than 2 pixels drops significantly.
4 A 2-pixel error is manageable in the intended application.
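The metrics in the table can be reproduced with a few lines; the 8-bit peak value of 255 and the 1- and 2-pixel thresholds follow the usual convention for this kind of evaluation and are assumptions here.

```python
import numpy as np

def depth_metrics(estimated, ground_truth, peak=255.0):
    """PSNR and bad-pixel percentages between an estimated and a ground-truth
    depth/disparity map (both assumed to be 8-bit-scaled arrays)."""
    err = estimated.astype(np.float64) - ground_truth.astype(np.float64)
    mse = np.mean(err ** 2)
    psnr = 10.0 * np.log10(peak ** 2 / mse) if mse > 0 else float('inf')
    bad1 = 100.0 * np.mean(np.abs(err) > 1.0)   # E > 1 (%)
    bad2 = 100.0 * np.mean(np.abs(err) > 2.0)   # E > 2 (%)
    return psnr, bad1, bad2
```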

Application: Multi-View image coding
2D+Depth+Occlusions Multi-View coding system

[Figure: block diagram of the coding system. Depth estimation on the N input views produces a 2D+Depth (2D+D) representation; disocclusion data for views 1..N, masks and edges are wavelet transformed, encoded, embedded and transmitted, then decoded at the receiver.]

Application: Multi-View image coding
Decoded images (PSNR of the decoded views)

Depth map used   Venus      Teddy      Cones
Estimated        32.19 dB   31.40 dB   30.84 dB
Real             35.96 dB   31.93 dB   31.81 dB


Conclusions

High-quality depth estimation from Multi-View sources.

Occlusion processing by analysis of visibility consistency.

Total-Variation regularization ensures smooth depth maps with sharp edges.

Application to Multi-View image coding.

Outlook:
  ◮ Improve the visibility consistency step.
  ◮ Speed up the algorithm execution.
  ◮ Integration into an MPEG-2 standard stream.