stereo matching, part 11 - university of aucklandrklette/ccv-cimat/pdfs/b18...stereo matchinggeneric...

Stereo Matching Generic Model Data Cost Matrix Data-Cost Functions Global to Local Testing Data Cost Functions

Stereo Matching, Part 11

Lecture 18

See Section 8.1 inReinhard Klette: Concise Computer Vision

Springer-Verlag, London, 2014

1See last slide for copyright information.1 / 42


How to Detect q?

p pq

2 / 42


Agenda

1 Stereo Matching

2 Generic Model

3 3D Data Cost Matrix

4 Data-Cost Functions

5 Global to Local

6 Testing Data Cost Functions

3 / 42


Outline

We considered

1 two-camera systems

2 their calibration and geometric rectification

3 stereo-vision geometry

4 the use of detected corresponding pixels for 3D surfacereconstruction, assuming a stereo pair in canonical stereogeometry

What remains is to answer the question:

How to detect pairs of corresponding pixels in a stereo pair?

We assume stereo pairs which are geometrically rectified forstandard stereo geometry

4 / 42


Goal: Confident Results

Visualization of disparity map using a color keyblack = low confidence

5 / 42


Stereo Matching is an Example for Labeling

Labeling function f assigns a label fp ∈ L, a disparity, to each pixellocation p ∈ Ω

Applies error functions Edata and Esmooth

E (f ) =∑p∈Ω

Edata(p, fp) +∑

q∈A(p)

Esmooth(fp, fq)

Accumulated costs of label fp at p are data plus adjacentsmoothness error values

Adata(p, fp) = Edata(p, fp) +∑

q∈A(p)

Esmooth(fp, fq)

6 / 42


Agenda

1 Stereo Matching

2 Generic Model



5 Global to Local


7 / 42


Generic Model for Matching

Left and right image are denoted by L and R

One is the base image, and the other one the match image,denoted by B and M

For pixel (x , y ,B(x , y)) we search corresponding pixel(x + d , y ,M(x + d , y))

Both on the same epipolar line identified by row y

Two pixels are corresponding if they are projections of the same3D point P = (X ,Y ,Z )

8 / 42


Search Intervals

pqp

Wp Wq Wp

B M

Case B = L, thus M = RInitiate a search by p = (x , y) in BSearch interval for q = (x + d , y) in M withmaxx − dmax, 1 ≤ x + d ≤ x , or 0 ≤ −d ≤ mindmax, x − 1

Example: p = (1, y) in B then only d = 0 and P “at infinity”

p = (Ncols , y) and Ncols > dmax then −d ≤ dmax and searchinterval stops at Ncols − dmax

9 / 42


B = R

If B = R and M = L, sign of d will swap from negative to positive

0 ≤ d ≤ mindmax,Ncols − x

Initiate search by p = (x , y) in B = R, search q = (x + d , y) inM = L with x ≤ x + d ≤ minNcols , x + dmax

BM

pq

Wq Wp

p

Wp

10 / 42


Label Set for Stereo Matching

Right-to-left matching then

L = −1, 0, 1, . . . , dmax

where −1 denotes “no disparity assigned”

Left-to-right matching then

L = +1, 0,−1, . . . ,−dmax

Using intrinsic and extrinsic camera parameters, value dmax can beestimated by closest objects in scene “still of interest”

Example: analyzing a scene in front of a driving car, everythingcloser than, say, 2 m is out of interest, and the threshold 2 midentifies dmax

11 / 42


Conventions for Correspondence Search

Straightforward idea: compare neighborhoods (rectangularwindows, for simplicity) around pixels p and q

Windows W l ,kp (I ) of size (2l + 1)× (2k + 1)

Window W 0,kp (I ) is only along one image row.

Only consider gray-level images with values between 0 and Gmax

Always one image row y at a time, thus

just Bx for B(x , y), or W l ,kx (B) for W l ,k

p (B), and Mx+d for

M(x + d , y), or W l ,kx+d(M) for W l ,k

q (M)

Pixel location x in B and pixel location x + d in M

12 / 42


SSD, SAD, AD Data Cost Functions

Data in both windows around p = (x , y) and aroundq = (x + d , y) are identical iff

ESSD(p, d) =l∑

i=−l

k∑j=−k

[B(x + i , y + j)−M(x + d + i , y + j)]2

results in value 0

SSD stands for sum of squared differences

Sum of absolute differences (SAD)

ESAD(p, d) =l∑

i=−l

k∑j=−k

|B(x + i , y + j)−M(x + d + i , y + j)|

Extreme case l = k = 0 defines absolute difference (AD)

EAD(p, d) = |B(x , y)−M(x + d , y)|13 / 42


Five Reasons Why Just SSD or SAD Will Not Work

1 Invalidity of ICA. Intensity values at corresponding pixels, andin their neighborhoods, typically impacted by lightingvariations, or just by image noise

2 Local reflectance differences. Due to different viewing angles,P and its neighborhood reflect light differently to camerasrecording B and M

3 Differences in cameras. Different gains or offsets in camerasused result in high SAD or SSD errors

4 Perspective distortion. 3D point P = (X ,Y ,Z ) is on a slopedsurface; local neighborhood around P on this surface isdifferently projected into images B and M

5 No unique minimum. There might be several pixel locations qdefining the same minimum

14 / 42


Agenda

1 Stereo Matching

2 Generic Model



5 Global to Local


15 / 42


3D Data Cost Matrix

Data cost measure Edata(p, l), with p ∈ Ω and l ∈ L \ −1,defines data cost matrix Cdata of size Ncols × Nrows × (dmax + 1),with elements (assuming that B = R)

Cdata(x , y , d) =

Edata(x , y , d), if 0 ≤ d ≤ mindmax,Ncols − x

−1 otherwise

Example: SSD and SAD define CSSD and CSAD

16 / 42


Layers of Matrix

0 1 2 3 dmax

x

y

x

y

x

d

-1

Constraineddisparity range

Left: A position contains either value Edata(x , y , d) or −1Right: Top view assuming Ncols > dmax and B = R; columns (iny -direction) in dark gray have value −1; rows in light gray areconstrained 17 / 42


Interpretation of Figure

3D data-cost matrix is a comprehensive representation of all theinvolved data costs, and it can be used for further optimization(e.g. by using the smoothness-cost function) in the controlstructure of the stereo matcher

Each disparity defines one layer of the 3D matrix

In case of B = R, columns indicated by dark gray do have value−1 in all of their positions

Rows in light gray indicate cases of x-coordinates where theavailable range of disparities is constrained, not allowing to go to amaximum of dmax

Notation: C (x , d) excludes current row y or cost function

18 / 42


Agenda

1 Stereo Matching

2 Generic Model



5 Global to Local


19 / 42


Data-Cost Functions

Stereo matcher defined by used data and smoothness-cost terms,and by a control structure how those terms are applied forminimizing the total error of the calculated labeling function f

Smoothness terms are generically defined

We present control structures later

Now: Data-cost calculation as the “core component” of a stereomatcher

20 / 42


Zero-Mean Version

Calculate mean Bx of a used window W l ,kx (B), and mean Mx+d of

window W l ,kx+d(M), subtract Bx from all values in W l ,k

x (B), and

Mx+d from all values in W l ,kx+d(M), calculate this way the

data-cost function in its zero-mean version

An option for reducing impact of lighting artifacts (i.e. notdepending on ICA)

Indicated by starting subscript of data-cost function with a Z

Example: EZSSD or EZSAD are zero-mean SSD or zero-mean SADdata-cost functions

EZSSD(x , d) =l∑

i=−l

k∑j=−k

[(Bx+i ,y+j − Bx)− (Mx+i+d ,y+j −Mx+d)

]2EZSAD(x , d) =

l∑i=−l

k∑j=−k

∣∣[Bx+i ,y+j − Bx ]− [Mx+d+i ,y+j −M i+d ]∣∣

21 / 42


NCC Data Cost

Normalized cross correlation (NCC) already used for comparingtwo images

Already defined by zero-mean normalization, but we add Z to theindex for uniformity in notation; let EZNCC (x , d) =

1−∑l

i=−l∑k

j=−k[Bx+i ,y+j − Bx

] [Mx+d+i ,y+j −Mx+d

]√σ2B,x · σ2

M,x+d

where

σ2B,x =

l∑i=−l

k∑j=−k

[Bx+i ,y+j − Bx

]2σ2M,x+d =

l∑i=−l

k∑j=−k

[Mx+d+i ,y+j −Mx+d

]222 / 42


Census Data-Cost Function

The zero-mean normalized census cost function

EZCEN(x , d) =l∑

i=−l

k∑j=−k

ρ(x + i , y + j , d)

with

ρ(u, v , d) =

0 Buv ⊥ Bx and Mu+d ,v ⊥ Mx+d

1 otherwise

where ⊥ either < or >

By using Bx instead of Bx , and Mx+d instead of Mx+d , we havethe census data-cost function ECEN

23 / 42


Example for Census Data Cost

Windows Wx(B) and Wx+d(M)

2 1 61 2 42 1 3

5 5 97 6 75 4 6

Have Bx ≈ 2.44 and Mx+d ≈ 6.11

i = j = −1 results in u = x − 1 and v = y − 1Bx−1,y−1 = 2 < 2.44 and Mx−1+d ,y−1 = 5 < 6.11Thus ρ(x − 1, y − 1, d) = 0

i = j = +1 results in u = x + 1 and v = y + 1Bx+1,y+1 = 3 > 2.44 but Mx+1+d ,y+1 = 6 < 6.11Thus ρ(x + 1, y + 1, d) = 1

i = j = −1: values in the same relation with respect to the meani = j = +1: opposite relationships

24 / 42


Result for Example

For the given example: EZCEN = 2

Spatial distribution of ρ-values

0 0 01 0 00 0 1

Vector cx ,d lists these ρ-values in a left-to-right, top-to-bottomorder:

[0, 0, 0, 1, 0, 0, 0, 0, 1]>

25 / 42


Hamming Distance

Let bx be the vector listing results sgn(Bx+i ,y+j − Bx) in aleft-to-right, top-to-bottom order, where sgn is the signumfunction

Similarly, mx+d lists values sgn(Mx+i+d ,y+j −Mx+d)

For the values in previous example

bx = [−1,−1,+1,−1,−1,+1,−1,−1,+1]>

mx+d = [−1,−1,+1,+1,−1,+1,−1,−1,−1]

cx ,d = [ 0 , 0 , 0 , 1 , 0 , 0 , 0 , 0 , 1 ]>

Vector cx ,d shows positions where bx and mx+d differ; the numberof positions where two vectors differ is known as Hamming distance

26 / 42


Efficient Calculation

Observation The zero-mean normalized census data costEZCEN(x , d) equals the Hamming distance between vectors bx andmx+d

By replacing values “-1” by “0” in vectors bx and mx+d , Hammingdistance for resulting binary vectors can be calculated verytime-efficiently

27 / 42


Agenda

1 Stereo Matching

2 Generic Model



5 Global to Local


28 / 42


From Global to Local Matching

Data-cost and smoothness-cost function together define theminimization problem for the total error

The smoothness term uses an adjacency set A(p), typicallyspecified by 4-adjacency or an even larger adjacency set

“Growing” 4-adjacency defines a global dependency

pqq

qq

rq

q

r

r

qr

rqr

rrr

Dependency from 4-adjacent pixels; dependency of 4-adjacentpixels from their 4-adjacent pixels; etc.

29 / 42


Growing Dependencies

Labels at pixels q also depend, according to the smoothnessconstraint, on labels at all the 4-adjacent pixels r of those pixels q

Labels at pixels r depend now again on labels at all 4-adjacentpixels of those pixels r

By continuing the process, the label at pixel p depends on labels atall the pixels in carrier Ω

If A is a larger set than 4-adjacency, then data dependencies coversall Ω even faster

Data term defines local value dependency

Smoothness term defines global dependency (when usedrepeatedly)

30 / 42


Global Matching and Its Time Complexity.

Global matching (GM) is too costly

Example 1: extremely large smoothness penalties, e.g.Esmooth(fp, fq) = Gmax · Ncols · Nrows whenever fp 6= fq; then a(common) data term does not matter anymore

Any constant disparity map for all pixels in B solves thisoptimization problem

Example 2: Smoothness penalties are insignificant; decisions canbe based on data cost only

“The Winner Takes It All” has a time-efficient and accuratesolution to the minimization problem

Observation. Data term, adjacency set, and smoothness terminfluence the time needed to solve the minimization problem

31 / 42


Time Estimate for GM

B and M of size Ncols × Nrows

GM: each pixel in B needs to communicate with any other pixel inB, say via 4-adjacency

Longest path of length Ncols + Nrows between pixels in corners of Ω

Communication paths have length (Ncols + Nrows)/2 in average

Each of the Ncols · Nrows pixels needs to communicate with theother Ncols · Nrows − 1 pixels

Asymptotic run time for evaluating globally one labeling f

tone = O((Ncols + Nrows)N2

cols · N2rows

)Set of possible labelings has cardinality

call = |L||Ω| ∈ O(

dNcols ·Nrowsmax

)Exhaustive testing of all labelings would require time of call · tone

32 / 42


Reduced Search Space

p p

Search space of 16 rays running from image border to p

Search space defined by including eight times repeatedly also4-adjacent pixels into the decisions

33 / 42


Area of Influence

When deciding about a label fp at p then control structure ofstereo matcher consults pixels in set p + S via the smoothnessconstraint

Set S is the area of influence

Example 1: defined by the intersection of digital rays with ΩDefines an area of influence for (ray-based) semi-global matching(SGM)

Example 2: defined by repeatedly expanding into 4-adjacent pixelsaround the previous area of influenceNumber of expansions defines the radius of the created 4-discDefines an area of influence as used in belief-propagation matching(BPM)

34 / 42


Local Versus Semi-Global Matcher

Input image, disparity map for a local matcher, and when usingiSGM, a semi-global stereo matcher

35 / 42


Agenda

1 Stereo Matching

2 Generic Model



5 Global to Local


36 / 42


Testing Data Cost Functions

Testing the accuracy of different data cost functions with respectto identifying the correct corresponding pixel in M

We need to have stereo pairs with ground truth

Apply a stereo matcher which is not impacted by a designedcontrol structure, and not by a chosen smoothness term

The Winner Takes It All

Reduce minimization of total error to minimization of

E (f ) =∑p∈Ω

Edata(p, fp)

37 / 42


Use of 3D Data Cost Matrix

Generate 3D data cost matrix Cdata

Select pixel location p = (x , y), go through all layers d = 0, d = 1,to d = dmax, and compare all values Cdata(x , y , d) ≥ 0 for findingthe disparity

fp = argmin0≤d≤dmaxCdata(x , y , d) ≥ 0

which defines the minimum

38 / 42


The Winner Takes It All

1: Let d = dmin = 0 and Emin = Edata(p, 0) ;2: while d < x do3: Let d = d + 1 ;4: Compute E = Edata(p, d);5: if E < Emin then6: Emin = E and dmin = d ;7: end if8: end while

Applying left-to-right matching

This stereo matcher will create depth artifacts

39 / 42


Ground Truth for Measuring Accuracy of Stereo Matching

Assume that a measurement method provided (fairly) accuratedisparity values at pixel locations, for a set or sequence of stereopairs B and M

Might be defined by synthetic stereo pairs generated for a 3Dworld model and the use of a 3D rendering program for calculatingthe stereo pairs

Original image from EISATS and ground truth disparity map40 / 42


Bad Pixel

Location p ∈ Ω defines a bad pixel in B iff the disparity calculatedat p differs by (say) more than 1 from the value provided asground truth disparity

Percentage of non-bad pixels is an accuracy measure for stereomatching

Experiments can then be as follows:

Select your set of test data where ground truth disparities areavailable (e.g. borrow a LIDAR for generating), or simulate bygenerating synthetic data as on slide before

Select cost data functions and parameters you are interested in forcomparison

Do a statistical analysis of the percentage of bad pixels whenapplying “The winner takes all”

41 / 42


Copyright Information

This slide show was prepared by Reinhard Klettewith kind permission from Springer Science+Business Media B.V.

The slide show can be used freely for presentations.However, all the material is copyrighted.

R. Klette. Concise Computer Vision.c©Springer-Verlag, London, 2014.

In case of citation: just cite the book, that’s fine.

42 / 42

http://www.cs.auckland.ac.nz/~rklette

stereo matching, part 11 - university of aucklandrklette/ccv-cimat/pdfs/b18...stereo matchinggeneric...

Documents