Computer Vision: sparse and dense matching between two images

Martin de La Gorce
[email protected]

February 2015

Source: imagine.enpc.fr/~de-la-gm/cours/UPEM/cours_image_5.pdf

An important problem

The problem of matching subparts of two different images is a core computer vision problem with many applications, including:

stereo reconstruction

motion estimation

tracking

medical image registration


An important problem

Panorama created by stitching images:


An important problem

Depth reconstructed from disparity of matching points:


An important problem

Object recognition by matching images:


Local vs Global

We can classify image matching algorithms into:

local methods that look for matches only in a limited region of the image

global methods that look for potential matches through the entire image


Sparse vs Dense

We can classify image matching algorithms into:

sparse matching, which finds reliable matches only for a subset of key feature points of the image corresponding to salient points

dense matching, which matches all the points in the overlapping region of the two images on a dense grid in the input


Sparse vs Dense

For efficiency, most methods are either

global but sparse

dense but local

We can combine the two approaches to get a dense and global matching method by

starting with a global and sparse method, and then

densifying the matching by propagation to neighboring pixels


The aperture problem

It is difficult to estimate the motion of a line along the line direction if we do not see its extremities:

Sparse matching addresses the problem by considering only reliable points (corners)

Dense matching addresses the problem by propagating the information from reliable points to the surrounding regions, by imposing some regularity or rigidity of the transformation across neighboring pixels


A naive brute force patch matching

We measure the difference between two N-by-N patches from image A and image B, centered respectively at locations (i, j) and (k, l), using the sum of squared differences (SSD) between corresponding pixels:

D(i, j, k, l) = Σ_{m=−r}^{r} Σ_{n=−r}^{r} (A(i + m, j + n) − B(k + m, l + n))²

with r = (N − 1)/2 the "radius" of the patch.

Denoting by P_A(i, j) and P_B(k, l) the two N-by-N matrices of pixel intensities extracted respectively from A around (i, j) and from B around (k, l), we get

D(i, j, k, l) = ‖P_A(i, j) − P_B(k, l)‖²
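As an illustration (not from the slides), the SSD distance above can be sketched in NumPy; the function name `ssd` and the integer test images are assumptions:

```python
import numpy as np

def ssd(A, B, i, j, k, l, r):
    """Sum of squared differences between the (2r+1)x(2r+1) patch of A
    centered at (i, j) and the patch of B centered at (k, l)."""
    pa = A[i - r:i + r + 1, j - r:j + r + 1].astype(float)
    pb = B[k - r:k + r + 1, l - r:l + r + 1].astype(float)
    return float(np.sum((pa - pb) ** 2))

# Tiny example: identical patches have distance 0.
rng = np.random.default_rng(0)
A = rng.integers(0, 255, size=(10, 10))
B = A.copy()
print(ssd(A, B, 4, 4, 4, 4, 1))  # 0.0
```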


A naive brute force patch matching

For each patch from image A we look for the most similar patch in image B and obtain a nearest-neighbor field M : ℕ² → ℕ²

M(i, j) = argmin_{(k, l)} D(i, j, k, l)

Denoting by (H, W) the size of the images, this method's complexity is of order H²W²N², which is generally prohibitive.

Accelerations are possible using approximations and kd-trees¹

¹Computing Nearest-Neighbor Fields via Propagation-Assisted KD-Trees. Kaiming He, Jian Sun. CVPR 2012


A naive brute force patch matching

Limitations of the brute force patch matching:

computationally expensive

does not cope with changes in scale and local rotation. We could search across various scales and rotation angles, but that would increase the computational cost even more

many wrong matches for edges due to the aperture problem

many wrong matches in uniform regions

sensitive to viewing direction and pose

sensitive to illumination conditions


A naive brute force patch matching

Solutions

We can get robustness to illumination changes using better patch similarity measures.

We can partially address the computational cost problem by extracting all patches in image B and putting them in a structure that allows fast nearest-neighbor search (kd-tree, FLANN).

We can address the aperture problem by using only a subset of the patches that are likely to have reliable matches, selected with a corner detector. We sacrifice density but gain robustness and speed.


normalized correlation

In order to be robust to changes in contrast we can use the normalized cross-correlation between the two patches. Let X and Y denote two patches of size N by N:

NCC(X, Y) = Σ_i Σ_j (X[i, j] − μ_X)(Y[i, j] − μ_Y) / (‖X − μ_X‖ × ‖Y − μ_Y‖)   (1)

with μ_X and μ_Y the mean intensities in X and Y,

μ_X = (1/N²) Σ_{ij} X[i, j],   μ_Y = (1/N²) Σ_{ij} Y[i, j]

and

‖X − μ_X‖ = √( Σ_{ij} (X[i, j] − μ_X)² )   (2)

‖Y − μ_Y‖ = √( Σ_{ij} (Y[i, j] − μ_Y)² )   (3)


normalized correlation

We have a list of patches X_1, …, X_m that we want to compare to patches Y_1, …, Y_n. In order to reduce computation we first compute the centered patches:

X̄_k[i, j] = X_k[i, j] − μ_{X_k}

Ȳ_l[i, j] = Y_l[i, j] − μ_{Y_l}

then the centered and normalized patches:

X̂_k[i, j] = X̄_k[i, j] / ‖X̄_k‖

Ŷ_l[i, j] = Ȳ_l[i, j] / ‖Ȳ_l‖

We then get

NCC(X_k, Y_l) = Σ_{ij} X̂_k[i, j] Ŷ_l[i, j]
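A minimal NumPy sketch of the NCC and its invariance to affine intensity changes (the helper name `ncc` is an assumption, not from the slides):

```python
import numpy as np

def ncc(X, Y):
    """Normalized cross-correlation between two equally sized patches."""
    Xc = X.astype(float) - X.mean()   # center
    Yc = Y.astype(float) - Y.mean()
    return float(np.sum(Xc * Yc) / (np.linalg.norm(Xc) * np.linalg.norm(Yc)))

rng = np.random.default_rng(1)
X = rng.random((5, 5))
print(ncc(X, X))            # 1.0 (up to rounding)
print(ncc(X, 2.0 * X + 3))  # still 1.0: invariant to an affine intensity change
```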


Corner detection

A patch is unlikely to be reliably matched if it looks like its surrounding patches.

flat region: no change

edge: change in one direction

corner: change in both directions


Corner detection

A patch is unlikely to be reliably matched if it looks like its surrounding patches.

The approach proposed by Moravec tests each pixel in the image to see if a corner is present, by testing whether a patch centered on the pixel is similar to nearby, largely overlapping patches.

The similarity is measured by taking the sum of squared differences (SSD) between the two patches. A lower number indicates more similarity.


Corner detection

Given an image I and a pixel location (i, j), we compute a surface that measures the self-similarity for small displacements (u, v):

E_ij(u, v) = Σ_{x,y} w(x − i, y − j) (I(x + u, y + v) − I(x, y))²

with w a binary window function, w(x, y) = [max(|x|, |y|) < c].

[Figure: an image patch and the corresponding surface E_ij(u, v)]


Moravec’s Corner detection

We measure the dissimilarity of a patch with its 8 neighbors using the score

S_ij = min_{(u,v) ∈ T} E_ij(u, v)

with T the set of four tested shifts, T = {(1, 0), (1, 1), (0, 1), (−1, 1)}

All patches with a low S_ij score in either image A or image B are discarded from the matching, as they would probably match several patches in the other image.
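A small sketch of Moravec's score under these definitions (the function name and the mapping of the shifts onto array axes are assumptions):

```python
import numpy as np

def moravec_score(I, i, j, c=2):
    """Moravec corner score: minimum SSD between the window centered at
    (i, j) and the same window under each of the four tested shifts."""
    shifts = [(1, 0), (1, 1), (0, 1), (-1, 1)]
    win = I[i - c:i + c + 1, j - c:j + c + 1].astype(float)
    best = np.inf
    for du, dv in shifts:
        shifted = I[i - c + du:i + c + 1 + du,
                    j - c + dv:j + c + 1 + dv].astype(float)
        best = min(best, float(np.sum((shifted - win) ** 2)))
    return best

# A flat image scores zero; an isolated bright dot scores high,
# because every shift moves the dot off its original position.
I = np.zeros((11, 11))
print(moravec_score(I, 5, 5))  # 0.0
I[5, 5] = 1.0
print(moravec_score(I, 5, 5))  # 2.0
```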


Moravec’s Corner detection

This approach from Moravec has several limitations:

Noisy response due to a binary window function

Only a set of shifts at every 45 degrees is considered


Harris corner detector

To address these limitations Harris proposed to

use a Gaussian window function to get less noisy responses:

w(x, y) = exp(−(x² + y²) / (2σ²))

consider all small shifts using a Taylor expansion:

E_ij(u, v) = Σ_{x,y} w(x − i, y − j) (I(x + u, y + v) − I(x, y))²
           ≈ Σ_{x,y} w(x − i, y − j) (I_x u + I_y v)²

with I_x the derivative of the image in the x direction and I_y the derivative in the y direction.


Harris corner detector

E_ij(u, v) ≈ Σ_{x,y} w(x − i, y − j) (I_x u + I_y v)²   (4)
           ≈ Σ_{x,y} w(x − i, y − j) (I_x² u² + I_y² v² + 2 I_x I_y u v)   (5)

We can rewrite this in matrix form:

E_ij(u, v) ≈ [u, v] M_ij [u; v]

with

M_ij = Σ_{x,y} w(x − i, y − j) [ I_x²     I_x I_y ]
                               [ I_x I_y  I_y²    ]


Harris corner detector

E_ij(u, v) ≈ [u, v] M_ij [u; v]

E_ij(u, v) is a positive semi-definite quadratic form, and the level set {(u, v) | E_ij(u, v) = 1} is an ellipse with

main axis directions aligned with the eigenvectors of M_ij

main axis lengths proportional to the inverse square roots of the two eigenvalues of M_ij


Harris

We want E_ij(u, v) to be large over the entire unit circle C = {(u, v) | u² + v² = 1}.

Using the Taylor expansion approximation we have

min_{(u,v) ∈ C} E_ij(u, v) ≈ min_{(u,v) ∈ C} [u, v] M_ij [u; v] = λ_min

with λ_min the smallest eigenvalue of the matrix M_ij.

We can compute M_ij's eigenvalues and use a threshold κ, keeping points with λ_min > κ, to get the Kanade-Tomasi corner detector.


Harris

Computing the eigenvalues requires a square root. Instead, Harris proposed to use a corner score R defined by

R_ij = det(M_ij) − k (trace(M_ij))²

with k ≈ 0.06. Denoting by λ₁ and λ₂ the two eigenvalues we get

R = λ₁λ₂ − k(λ₁ + λ₂)²

[Contour plot of the Harris score R as a function of λ₁ and λ₂ over [0, 1]²]
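A possible NumPy implementation of the Harris response, assuming a hand-built separable Gaussian window; the function names and the test image are illustrative, not from the slides:

```python
import numpy as np

def harris_response(I, sigma=1.0, k=0.06):
    """Harris corner score R = det(M) - k * trace(M)^2 at every pixel,
    with the structure tensor M smoothed by a Gaussian window."""
    Iy, Ix = np.gradient(I.astype(float))          # image derivatives
    # Separable Gaussian smoothing via two 1-D convolutions.
    r = int(3 * sigma)
    t = np.arange(-r, r + 1)
    g = np.exp(-t**2 / (2 * sigma**2))
    g /= g.sum()
    def blur(F):
        F = np.apply_along_axis(lambda v: np.convolve(v, g, mode="same"), 0, F)
        return np.apply_along_axis(lambda v: np.convolve(v, g, mode="same"), 1, F)
    Sxx, Syy, Sxy = blur(Ix * Ix), blur(Iy * Iy), blur(Ix * Iy)
    det = Sxx * Syy - Sxy**2
    trace = Sxx + Syy
    return det - k * trace**2

# A white square on black: the response peaks near the four corners.
I = np.zeros((32, 32))
I[8:24, 8:24] = 1.0
R = harris_response(I)
print(np.unravel_index(np.argmax(R), R.shape))
```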


Harris

Classification of pixels given the two eigenvalues of M_ij:


Harris

input images:


Harris

Harris score image, with value R_ij at location (i, j)


Harris

Thresholded Harris scores: R_ij > τ at location (i, j)


Harris

Local maxima of the Harris scores


Harris

detected corners:


Harris

without filtering / with filtering


Harris

Once we have found a set of corners in both images A and B, we compare each pair of points using the sum of squared differences or the normalized cross-correlation.

Given a point in A, we keep a matching point in B only if the matching score is much better than the score of the second best matching patch in B. We keep the match (i, j) → (k, l) only if

∀(m, n) ≠ (k, l) : D(i, j, k, l) < 0.8 · D(i, j, m, n)
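This ratio test can be sketched as follows, assuming descriptors are stored as rows of a matrix and compared with SSD (names are illustrative):

```python
import numpy as np

def ratio_test_match(desc_a, desc_b, ratio=0.8):
    """Match descriptors from A to B, keeping (i -> k) only when the best
    SSD is below `ratio` times the second-best SSD."""
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.sum((desc_b - d) ** 2, axis=1)
        k, m = np.argsort(dists)[:2]        # best and second-best candidates
        if dists[k] < ratio * dists[m]:
            matches.append((i, int(k)))
    return matches

# A distinctive descriptor is kept; an ambiguous one (two near-identical
# candidates in B) is rejected by the ratio test.
A = np.array([[0.0, 0.0], [5.0, 5.0]])
B = np.array([[0.1, 0.0], [4.0, 4.1], [4.0, 4.0]])
print(ratio_test_match(A, B))  # [(0, 0)]
```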


Harris: filtering matches


Multiscale

In many situations, detecting features at the finest scale possible may not be appropriate. For example, when matching images with little high-frequency detail (e.g., clouds), fine-scale features may not exist. One solution to the problem is to extract features at a variety of scales, e.g., by performing the same operations at multiple resolutions in a pyramid and then matching features at the same level. This kind of approach is suitable when the images being matched do not undergo large scale changes.


Descriptors

If we relax the assumption that we have only a translation, by adding changes in scale and rotation, the search space for patch matching becomes big.

Instead of using the raw patch pixel intensities, we create local descriptors that are scale invariant and rotation invariant.


Rotation invariance

We compute the histogram of the gradient directions

keep all directions within 80% of the dominant one

when comparing two patches we test only the set of rotations that align these main directions.


Scale invariance

Instead of the Harris corner detector we need a salient point detector that works at various scales and prov[…]

keep all directions within 80% of the dominant one

when comparing two patches we test only the set of rotations that align these main directions.


dense matching

Suppose we have two images A and B. We look for the displacement field u : ℝ² → ℝ², u(x, y) = (u_x(x, y), u_y(x, y)), such that:

Forward formulation:

A(x, y) = B(x + u_x(x, y), y + u_y(x, y))

Backward formulation:

B(x, y) = A(x − u_x(x, y), y − u_y(x, y))

[Figure: an image pair and the components u_x, u_y of the displacement field]


dense matching as a minimization

Because of the noise and the discretization there is no exact solution; instead we minimize the cost function

E(u) = ∫_Ω ‖B(x, y) − A(x − u_x(x, y), y − u_y(x, y))‖² dx dy   (6)

The problem is ill posed: because of the aperture problem there are many equally good solutions.

We need either to parameterize u (for example considering only rigid transformations) or to impose some smoothness on u.


dense matching

Adding a smoothness term, we minimize

E(u) = ∫_Ω ‖B(x, y) − A(x − u_x(x, y), y − u_y(x, y))‖² dx dy
     + ∫_Ω (‖∇u_x‖² + ‖∇u_y‖²) dx dy   (7)

The smoothness term will diffuse the motion information from the corners to the edges and uniform regions.

We will minimize this cost function with respect to u using a Gauss-Newton approach.


1D case

For simplicity we derive the method in 1D in a discrete setting (A and B continuous and u discretized into a vector):

E(u) = E_data(u) + E_smooth(u)   (8)

with

E_data(u) = Σ_x ‖B(x) − A(x − u(x))‖²   (9)

E_smooth(u) = Σ_x (u(x + 1) − u(x))²   (10)


1D case

E_data(u) = Σ_x ‖B(x) − A(x − u(x))‖²

Following the Gauss-Newton method, we obtain a linear least squares approximation of E_data(u) around a current estimate u_t by linearizing A(x − u(x)) around u_t(x):

A(x − u(x)) ≈ A(x − u_t(x)) − (∂A/∂x)(x − u_t(x)) × (u(x) − u_t(x))   (11)

E_data(u) ≈ Σ_x ‖B(x) − A(x − u_t(x)) + (∂A/∂x)(x − u_t(x)) × (u(x) − u_t(x))‖²

E_data(u) ≈ ‖r_t − M_t(u − u_t)‖²

with M_t the diagonal matrix M_t(x, x) = −(∂A/∂x)(x − u_t(x)) and r_t the vector of residuals, r_t(x) = B(x) − A(x − u_t(x)).


1D case

E_smooth(u) rewrites as E_smooth(u) = ‖Du‖² with D the following Toeplitz matrix of size (n − 1) × n:

D = [ −1  1  0  ⋯  0 ]
    [  0  ⋱  ⋱  ⋱  ⋮ ]
    [  ⋮     ⋱  ⋱  0 ]
    [  0  ⋯  0 −1  1 ]

We get

E(u) ≈ ‖M_t(u − u_t) − r_t‖² + ‖Du‖²

At each iteration the new displacement field u_{t+1} is estimated as the minimizer of this least squares problem:

u_{t+1} = (DᵀD + M_tᵀM_t)⁻¹ M_tᵀ(M_t u_t + r_t)

We iterate until convergence.
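The 1D update above can be sketched end-to-end in NumPy; the smoothness weight `lam` and the use of linear interpolation for the warp A(x − u(x)) are assumptions not stated in the slides:

```python
import numpy as np

def gauss_newton_1d(A, B, n_iter=50, lam=1.0):
    """Estimate a smooth 1-D displacement u with B(x) ~= A(x - u(x))
    by iterating the regularized Gauss-Newton update from the text."""
    n = len(A)
    x = np.arange(n, dtype=float)
    u = np.zeros(n)
    D = np.diff(np.eye(n), axis=0)       # (n-1) x n forward-difference matrix
    for _ in range(n_iter):
        Aw = np.interp(x - u, x, A)                  # A warped by the current u
        dA = np.interp(x - u, x, np.gradient(A))     # dA/dx sampled at x - u(x)
        M = np.diag(-dA)                             # M_t(x, x) = -dA/dx
        r = B - Aw                                   # residuals r_t
        u = np.linalg.solve(lam * D.T @ D + M.T @ M, M.T @ (M @ u + r))
    return u

# Recover a constant 2-pixel shift of a smooth signal.
x = np.arange(64, dtype=float)
A = np.exp(-(x - 32) ** 2 / 50.0)
B = np.exp(-(x - 34) ** 2 / 50.0)        # A shifted right by 2
u = gauss_newton_1d(A, B, lam=0.01)
print(float(u[32]))
```

With the smoothness term, the constant field u ≡ 2 is a fixed point of the iteration (D annihilates constants), and the estimate near the informative region converges to it.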

Page 45: Imagine - Computer Visionimagine.enpc.fr/~de-la-gm/cours/UPEM/cours_image_5.pdfmatching score is much better that than the score of the second best matching patch in B. we keep matching

2D case

In the 2D case, denoting by u_xv, u_yv, u_xvt, u_yvt the vectors obtained from u_x, u_y, u_xt and u_yt by taking elements in row-major order, we get

E_data(u) ≈ ‖r_t − M_xt(u_xv − u_xvt) − M_yt(u_yv − u_yvt)‖²

with

M_xt the diagonal matrix whose diagonal corresponds to the derivatives −(∂A/∂x)(x − u_xt(x, y), y − u_yt(x, y)) in row-major order

M_yt the diagonal matrix whose diagonal corresponds to the derivatives −(∂A/∂y)(x − u_xt(x, y), y − u_yt(x, y)) in row-major order

r_t the vector containing the residuals B(x, y) − A(x − u_xt(x, y), y − u_yt(x, y)) in row-major order


2D case

In the 2D case we get

E_smooth(u) = Σ_{x,y} (u_x(x + 1, y) − u_x(x, y))²
            + Σ_{x,y} (u_y(x + 1, y) − u_y(x, y))²
            + Σ_{x,y} (u_x(x, y + 1) − u_x(x, y))²
            + Σ_{x,y} (u_y(x, y + 1) − u_y(x, y))²

Denoting by u_xv and u_yv the two vectors obtained from u_x and u_y by taking elements in row-major order, we define sparse matrices D_x and D_y to get

E_smooth(u) = ‖D_x u_xv‖² + ‖D_y u_xv‖² + ‖D_x u_yv‖² + ‖D_y u_yv‖²


2D case

We need to take

D_x = I_H ⊗ D_W

D_y = D_H ⊗ I_W

with (H, W) the size of the image, ⊗ the Kronecker product of two matrices, I_k the identity matrix of size k × k, and D_k the bidiagonal matrix of size (k − 1) × k with D_{i,i} = −1 and D_{i,i+1} = 1.
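A sketch checking this construction with `np.kron`, assuming square identities I_H and I_W (a standard variant of the slides' definition):

```python
import numpy as np

def difference_operators(H, W):
    """Finite-difference operators acting on a row-major flattened
    H x W field: Dx = I_H kron D_W, Dy = D_H kron I_W."""
    def D(k):                          # (k-1) x k bidiagonal difference matrix
        return np.diff(np.eye(k), axis=0)
    Dx = np.kron(np.eye(H), D(W))      # differences along x (within each row)
    Dy = np.kron(D(H), np.eye(W))      # differences along y (across rows)
    return Dx, Dy

# On the field u(x, y) = x, the x-differences are 1 and the y-differences 0.
H, W = 4, 5
u = np.tile(np.arange(W, dtype=float), (H, 1)).ravel()   # row-major flatten
Dx, Dy = difference_operators(H, W)
print(Dx @ u)   # all ones
print(Dy @ u)   # all zeros
```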


2D case

E(u) = E_data(u) + E_smooth(u)

E_data(u) ≈ ‖r_t − M_xt(u_xv − u_xvt) − M_yt(u_yv − u_yvt)‖²

E_smooth(u) = ‖D_x u_xv‖² + ‖D_y u_xv‖² + ‖D_x u_yv‖² + ‖D_y u_yv‖²

We can minimize alternately with respect to u_xv and u_yv:

for t odd:
u_xvt+1 = (D_xᵀD_x + D_yᵀD_y + M_xtᵀM_xt)⁻¹ M_xtᵀ(M_xt u_xvt + r_t)

for t even:
u_yvt+1 = (D_xᵀD_x + D_yᵀD_y + M_ytᵀM_yt)⁻¹ M_ytᵀ(M_yt u_yvt + r_t)


2D case

We can use a single iteration to update both u_x and u_y using

E_data(u) ≈ ‖r_t − M_t(u_v − u_vt)‖²

E_smooth(u) = ‖D_x2 u_v‖² + ‖D_y2 u_v‖²

with u_v = [u_xv ; u_yv], M_t = [M_xt, M_yt], D_x2 = I_2 ⊗ D_x and D_y2 = I_2 ⊗ D_y.

We can minimize with respect to u_v using

u_vt+1 = (D_x2ᵀD_x2 + D_y2ᵀD_y2 + M_tᵀM_t)⁻¹ M_tᵀ(M_t u_vt + r_t)


Optical Flow

A well-known method to estimate a displacement field for small inter-frame displacements in a video sequence is called optical flow.

A particular form of optical flow computation, introduced by Horn and Schunck, corresponds to a single iteration of the Gauss-Newton method we just presented, where the linear system is solved using the Jacobi method.

Instead of deriving the optical flow the classic way, I chose to present a Gauss-Newton optimization interpretation of this problem that I believe allows a more general understanding of the method and its possible extensions (TV regularization to allow discontinuities, iterative refinement, etc.)


stereo

A special case of displacement field estimation is the stereo reconstruction problem.

From a sparse set of correspondences between the two images it is possible to find a pair of homographic transformations that deform each of the two images such that pairs of corresponding points lie on the same horizontal line of pixels; this is called rectification.


stereo

After rectification the problem then consists in estimating a purely horizontal displacement field (u_x in our previous formulation), referred to as the disparity map.

Efficient dense and global methods based on graph cuts or other discrete optimization methods exist to estimate the disparity map.


Links and references

Darko Zikic, Ali Kamen, and Nassir Navab. Revisiting Horn and Schunck: Interpretation as Gauss-Newton Optimisation. BMVC 2010. http://ar.in.tum.de/pub/zikic2010revisiting/zikic2010revisiting.pdf

Horn-Schunck Optical Flow with a Multi-Scale Strategy (with demo): http://www.ipol.im/pub/art/2013/20/
