Local Stereo Matching Using Motion Cue and Modified Census in Video Disparity Estimation
Zucheul Lee, Ramsin Khoshabeh, Jason Juang and Truong Q. Nguyen (UCSD)
20th European Signal Processing Conference (EUSIPCO 2012)



Outline
- Introduction
- Framework
- Proposed Algorithm
- Experimental Results
- Conclusion

Introduction

Background
- Disparity estimation has been thoroughly studied, but the focus has been strictly on images.
- Video disparity estimation faces two problems:
  (1) a lack of video datasets with ground-truth disparity maps, and
  (2) temporal inconsistency: flickering results from simply applying image-based algorithms to video.

Background
- Fundamental attributes that group objects together locally: proximity, similarity, and motion.
- Objects grouped by these attributes are most likely to have the same depth.

- In image disparity estimation, these cues are important for accurate depth estimation near the edges of moving objects.

Objective
- Propose a more accurate and noise-tolerant method for video disparity estimation, one that is more accurate than other methods on edges and in flat (textureless) areas, by using:
  - motion cues (for edges),
  - a modified census transform (for flat areas), and
  - spatio-temporal consistency (for refinement).

Related Work
- Adaptive support weight [6]
- Cost-volume filtering [7] with a guided filter
- Spatio-temporal consistency [3]
- Limitation: these approaches do not provide a reliable solution for disparity estimation in textureless (flat) areas.

[6] K.-J. Yoon and I.-S. Kweon, "Adaptive Support-Weight Approach for Correspondence Search," IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 4, pp. 650-656, 2006.
[7] C. Rhemann, A. Hosni, M. Bleyer, C. Rother, and M. Gelautz, "Fast Cost-Volume Filtering for Visual Correspondence and Beyond," in Proc. IEEE Intl. Conf. Comput. Vis. Pattern Recognit. (CVPR), pp. 3017-3024, 2011.
[3] R. Khoshabeh, S. H. Chan, and T. Q. Nguyen, "Spatio-Temporal Consistency in Video Disparity Estimation," in Proc. ICASSP, pp. 885-888, 2011.

Framework

Proposed Algorithm

Support Weight Using Correlated Color and Motion
- Let c_c and c_q be the color coordinates of the center pixel c and a neighboring pixel q in the CIELab color space.
  Color difference: the distance between c_c and c_q in CIELab.
- Let v_c and v_q be the optical flow vectors [10] of pixel c and neighboring pixel q.
  Truncated motion difference: the distance between v_c and v_q, truncated at a truncation value τ.
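As a rough illustration of how these distances can be combined into a support weight (this is not the authors' exact formulation: the exponential combination and the gamma constants are borrowed from the adaptive support-weight idea [6], and the parameter values below are placeholders), in Python/NumPy:

    import numpy as np

    def support_weight(lab, flow, c, q, gamma_c=7.0, gamma_p=36.0, gamma_m=2.0, tau=2.0):
        # lab  : H x W x 3 image in the CIELab color space
        # flow : H x W x 2 optical flow field (e.g. computed with the method of [10])
        # c, q : (row, col) coordinates of the center pixel and a neighboring pixel
        d_color  = np.linalg.norm(lab[c] - lab[q])              # CIELab color difference
        d_prox   = np.hypot(c[0] - q[0], c[1] - q[1])           # spatial proximity
        d_motion = min(np.linalg.norm(flow[c] - flow[q]), tau)  # truncated motion difference
        return np.exp(-(d_color / gamma_c + d_prox / gamma_p + d_motion / gamma_m))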

[10] D. Sun, S. Roth, and M. J. Black, "Secrets of Optical Flow Estimation and Their Principles," in Proc. CVPR, pp. 2432-2439, 2010.

Benefits of a Motion Cue
- The car video frames (480x270, 15 disparity levels).
- [Figure: disparity maps computed with proximity only; proximity + similarity; proximity + similarity + motion]

Modified Census Transform
- Problem: it is difficult to find correct correspondences in flat areas, because the census matching cost is extremely sensitive to image noise when all pixels in a flat area have similar intensities.
- Solution: a three-mode census transform with a noise buffer.

Modified Census Transform
- Two bits implement the three modes, using a noise buffer threshold ε:
  - set 10 if (neighbor pixel intensity) - (center pixel intensity) > ε
  - set 01 if (neighbor pixel intensity) - (center pixel intensity) < -ε
  - set 00 otherwise
- Intensity quantization: 0~50 -> 0, 50~100 -> 1, 100~150 -> 2, 150~200 -> 3, 200~255 -> 4.

Modified Census Transform
- [Figure]
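A minimal NumPy sketch of the three-mode census with a noise buffer described above (the window size, the value of eps, and the Hamming-style cost are assumptions made for illustration):

    import numpy as np

    def modified_census(img, window=7, eps=5):
        # Three-mode census with a noise buffer eps:
        #   code 2 ('10') if neighbor - center >  eps
        #   code 1 ('01') if neighbor - center < -eps
        #   code 0 ('00') otherwise (difference lies inside the noise buffer)
        # Returns an H x W x (window*window - 1) array of codes in {0, 1, 2}.
        h, w = img.shape
        r = window // 2
        img = img.astype(np.int32)
        codes = []
        for dy in range(-r, r + 1):
            for dx in range(-r, r + 1):
                if dy == 0 and dx == 0:
                    continue
                # shift the image so each neighbor aligns with its center pixel
                neighbor = np.roll(np.roll(img, -dy, axis=0), -dx, axis=1)
                diff = neighbor - img
                code = np.zeros((h, w), dtype=np.uint8)
                code[diff > eps] = 2
                code[diff < -eps] = 1
                codes.append(code)
        return np.stack(codes, axis=-1)

    def census_cost(desc_left, desc_right):
        # Hamming-style raw matching cost: number of neighbors whose codes differ.
        return np.count_nonzero(desc_left != desc_right, axis=-1)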

Aggregation and Disparity Computation
- Aggregated matching cost: the raw matching costs are aggregated over the support window, weighted by the support weights.
- Winner-take-all (WTA): for each pixel, select the disparity with the minimum aggregated cost.
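The aggregation and WTA formulas appeared as images on the slide; the following is only an approximate sketch of the two steps (using a single per-pixel weight volume in place of the combined left/right support weights, which is a simplifying assumption):

    import numpy as np

    def aggregate_costs(raw_cost, weights):
        # raw_cost : H x W x D raw matching costs (e.g. modified-census costs per disparity)
        # weights  : H x W x K x K per-pixel support weights over a K x K window
        h, w, d = raw_cost.shape
        k = weights.shape[-1]
        r = k // 2
        padded = np.pad(raw_cost, ((r, r), (r, r), (0, 0)), mode="edge")
        agg = np.zeros_like(raw_cost, dtype=np.float64)
        for dy in range(k):
            for dx in range(k):
                agg += weights[:, :, dy, dx, None] * padded[dy:dy + h, dx:dx + w, :]
        norm = weights.reshape(h, w, -1).sum(axis=-1)
        return agg / norm[:, :, None]   # normalized weighted sum per pixel and disparity

    def wta_disparity(aggregated_cost):
        # Winner-take-all: pick, per pixel, the disparity with minimum aggregated cost.
        return np.argmin(aggregated_cost, axis=-1)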

Aggregation and Disparity Computation
- [Figure: left view; original census; modified census (without the intensity difference); modified census]

Spatio-temporal Consistency [3]
- Problem: simply applying image-based algorithms to individual frames is temporally inconsistent (even for the best methods).
- Consider the sequence of disparity maps as a space-time volume: a three-dimensional function f(x,y,t), where (x,y) are the spatial coordinates and t is the temporal coordinate.
- Solution: a piecewise-smooth solution that has less temporal noise and preserves the disparity information as much as possible.

Spatio-temporal Consistency [3]
- l1 minimization problem:
  - f : unknown disparity map
  - g : initial disparity map from the previous step
  - D : forward difference operator
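The objective itself was shown as an image; a plausible reconstruction, following the total-variation formulation of [3] (the constants μ, βx, βy, βt and the exact notation are assumptions, not a verbatim copy of the slide):

    % l1 / total-variation problem over the space-time volume (reconstruction, not verbatim)
    \min_{f}\; \frac{\mu}{2}\,\| f - g \|_2^2 \;+\; \| D f \|_1,
    \qquad
    D f = (\beta_x D_x f,\ \beta_y D_y f,\ \beta_t D_t f)^T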

Spatio-temporal Consistency [3]
- l1 minimization problem:
  - f : unknown disparity map
  - f(x,y,t) : each frame of the video has M rows and N columns, and there are K frames in total; the entries of f(x,y,t) are stacked into a column vector of size MNK x 1.
- [Figure: the space-time volume with axes x (M rows), y (N columns), t (K frames)]
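For concreteness, the stacking amounts to a simple reshape (the dimensions below are illustrative; the ordering convention used in [3] may differ):

    import numpy as np

    M, N, K = 270, 480, 60           # example dimensions: rows, columns, frames
    f_volume = np.zeros((M, N, K))   # the space-time volume f(x, y, t)
    f_vec = f_volume.reshape(-1, 1)  # column vector of size MNK x 1
    assert f_vec.shape == (M * N * K, 1)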

Spatio-temporal Consistency [3]
- l1 minimization problem:
  - D : forward difference operator
  - βx, βy, βt : parameters (constants) weighting the spatial and temporal differences

Spatio-temporal Consistency [3]
- Solve with the augmented Lagrangian method of [1]: solve the f, u, and r sub-problems iteratively.

[1] S. H. Chan, R. Khoshabeh, K. B. Gibson, P. E. Gill, and T. Q. Nguyen, "An Augmented Lagrangian Method for Total Variation Video Restoration," in Proc. ICASSP, May 2011.
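Roughly, the sub-problems arise from a variable splitting of the objective above; the splitting below is only a sketch consistent with the f, u, r naming on the slide, not a reproduction of the exact form used in [1]:

    % Splitting (sketch): u stands in for Df and r for f - g
    \min_{f,\,u,\,r}\; \frac{\mu}{2}\,\| r \|_2^2 + \| u \|_1
    \quad\text{s.t.}\quad u = D f,\;\; r = f - g
    % Each iteration minimizes the augmented Lagrangian over f, u and r in turn,
    % then updates the Lagrange multipliers.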

Experimental Results

[14] C. Richardt, D. Orr, I. Davies, A. Criminisi, and N. A. Dodgson, "Real-time Spatiotemporal Stereo Matching Using the Dual-Cross-Bilateral Grid," in Proc. ECCV, 2010.

Experimental Results
- Jamie1 from the Microsoft i2i database


Experimental Results
- Ilkay from the Microsoft i2i database


Experimental Results
- Tunnel


Experimental Results
- Performance comparison of methods


- Metric: the average percentage of bad pixels (error threshold of 1); a small sketch of this metric follows below.

Experimental Results
- It takes 19 s to compute one disparity map; the method could be adopted in a real-time application by using a GPU.
- Refinement using the TV method [3] reduces errors in the background (both spatial noise and temporal inconsistencies).
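A small NumPy sketch of the bad-pixel metric mentioned above (the optional validity mask and the exact averaging convention are assumptions):

    import numpy as np

    def bad_pixel_rate(disparity, ground_truth, threshold=1.0, mask=None):
        # Percentage of pixels whose estimated disparity differs from the
        # ground truth by more than the threshold (1 pixel here, as on the slide).
        err = np.abs(disparity.astype(np.float64) - ground_truth.astype(np.float64))
        if mask is None:
            mask = np.ones_like(err, dtype=bool)   # evaluate all pixels by default
        return 100.0 * np.count_nonzero(err[mask] > threshold) / np.count_nonzero(mask)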

Experimental Results
- [Figures over three slides: results of the spatio-temporal consistency refinement [3]]

Conclusion
- We propose an accurate local stereo matching method for video disparity estimation:
  - a motion cue, to obtain more accurate support weights;
  - a modified census transform, to obtain more reliable raw matching costs in flat areas;
  - a spatio-temporal volume, to improve spatial and temporal consistency.
- This demonstrates the possibility of directly extending current image-based disparity algorithms to the video domain.