
Filtering Images Without Normalization

Peyman Milanfar
Google Research

milanfar@google.com

Abstract

When applying a filter to an image, it often makes practical sense to maintain the local brightness level from input to output image. This is achieved by normalizing the filter coefficients so that they sum to one. This concept is generally taken for granted, but is particularly important where nonlinear filters such as the bilateral and non-local means are concerned, where the effect on local brightness and contrast can be complex. Here we present a method for achieving the same level of control over the local filter behavior without the need for this normalization. Namely, we show how to closely approximate any normalized filter without in fact needing this normalization step. We derive a closed-form expression for the approximating filter and analyze its behavior, showing it to be easily controlled for quality and nearness to the exact filter with a single parameter.

1. Introduction and Background

Edge-aware filters are constructed using kernels that are computed from the given image. The adaptation of the filters to local variations in the image is what endows them with the power and flexibility to treat different parts of the image differently. This adaptability, however, cannot be arbitrary. In particular, the local brightness of the image must often be maintained in order to yield a reasonable global appearance. The standard way this is achieved is by normalizing the filter coefficients pointing to each pixel, so that they sum to 1. In this paper we propose a new and different way. Namely, we present a general method to approximate any normalized filter with one that does not require normalization. This produces a rather simpler filter structure, but with essentially the same functionality.

Approximation ideas centered around nonlinear filters are not new. In particular, the bilateral filter has been subject to various interesting algorithmic attacks [5] which have resulted in significantly improved computational complexity with almost no loss in quality. Current state-of-the-art algorithms are able to do bilateral filtering efficiently enough that cost is roughly proportional to image resolution, especially when the support of the filter kernel is very large¹ [1, 2]. Very recently, the idea of using an un-normalized form of the bilateral filter specifically was used in [3] to produce fast local Laplacian filters. As will become clear, [3] is a special case of the broader concept we propose here.

¹These approaches, however, do not generalize easily to kernels beyond the bilateral.

What we propose here is not another approximation to the bilateral filter. Our treatment works equally well for any normalized filter that has a well-defined kernel; the bilateral, non-local means, etc. are just a few popular examples. Our filter avoids local normalization of the filter coefficients, while remaining close in its effect to the base filter. This approximation has several interesting properties. First, the fidelity of the approximation is guaranteed since it is derived from an optimality criterion; furthermore, this fidelity can be controlled easily with a single parameter regardless of the form of the base filter (e.g. bilateral, non-local means, etc.). Second, the approximate filter is guaranteed to maintain the average gray level just as the base filter would, regardless of tuning. Finally, the approximate filter is easy to analyze and provides intuitively pleasing structure for understanding the behavior of general image-dependent filters. By way of practical motivation, the approximation allows us to start with an arbitrary (normalized) base filter and generate a one-parameter family of simpler nearby filters, which can locally modulate the effect of the base filter. This is different, and more flexible, than the typical approach where the base filters (bilateral, NLM, etc.) are controlled with global smoothing parameters. For various applications such as texture-cartoon decomposition, guided filtering, local (e.g. Laplacian) tone mapping, and even noise suppression, the additional flexibility afforded can be very useful.

Before moving forward, we establish our notation. Consider an image Y of size √n × √n as the input, and the image Z as the output of the filtering process. We will scan these images into vectors with, say, a column-stacking order and denote these images respectively as n × 1 vectors y and z.

The general construction of a filter begins by specifying a symmetric positive semi-definite (SPD) kernel kij ≥ 0 that measures the similarity, or affinity, between individual or groups of pixels. This affinity can be measured as a function of both the distance between the spatial variables (denoted by x) and, more importantly, the gray or color value (denoted by y). While the results of this paper extend to any filter with an SPD kernel, some popular examples commonly used in the image processing, computer vision, and graphics literature are as follows:

Gaussian Filters [11, 7] Measuring only the Euclidean (spatial) distance between pixels, the classical Gaussian kernel is

$$k_{ij} = \exp\left( \frac{-\|x_i - x_j\|^2}{h_x^2} \right).$$

These kernels lead to the classical and well-worn Gaussian filters (including shift-varying versions).

Bilateral (BL) [10, 6] and Non-local Means (NLM) [4, 8] These filters take into account both the spatial and value distances between two pixels, generally in a separable fashion. For BL we have:

$$k_{ij} = \exp\left( \frac{-\|x_i - x_j\|^2}{h_x^2} \right) \exp\left( \frac{-(y_i - y_j)^2}{h_y^2} \right) \qquad (1)$$

As seen in the overall exponent, the similarity metric here is a weighted Euclidean distance between the concatenated vectors (xi, yi) and (xj, yj).

The NLM kernel is a generalization of the bilateral kernel in which the value distance term in (1) is measured patch-wise instead of point-wise:

$$k_{ij} = \exp\left( \frac{-\|x_i - x_j\|^2}{h_x^2} \right) \exp\left( \frac{-\|y_i - y_j\|^2}{h_y^2} \right), \qquad (2)$$

where yi and yj now refer to subsets of samples (i.e. patches) in y.
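To make the definitions concrete, here is a minimal brute-force sketch (in Python/NumPy, not the paper's code) of the bilateral affinity (1) on a small grayscale image. The function name, the bandwidths h_x and h_y, and the toy image are illustrative choices; the O(n²) construction is practical only for tiny images.

```python
import numpy as np

def bilateral_affinity(y, h_x=3.0, h_y=0.1):
    """Return the n x n affinity matrix K with entries k_ij as in Eq. (1)."""
    rows, cols = y.shape
    # Spatial coordinates x_i for every pixel, column-stacked to match the vectorized image
    cc, rr = np.meshgrid(np.arange(cols), np.arange(rows))
    x = np.stack([rr.flatten(order='F'), cc.flatten(order='F')], axis=1).astype(float)
    v = y.flatten(order='F').astype(float)

    # Pairwise squared spatial and value distances between all pixel pairs
    d_spatial = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1)
    d_value = (v[:, None] - v[None, :]) ** 2

    return np.exp(-d_spatial / h_x**2) * np.exp(-d_value / h_y**2)

# Tiny 8 x 8 test image with values in [0, 1]
rng = np.random.default_rng(0)
y = rng.random((8, 8))
K = bilateral_affinity(y)
print(K.shape)               # (64, 64)
print(np.allclose(K, K.T))   # True: K is symmetric
```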

These affinities are not used directly to filter the images; instead, in order to maintain the local average brightness, they are normalized so that the resulting weights pointing to each pixel sum to one. More specifically,

$$w_{ij} = \frac{k_{ij}}{\sum_{j=1}^{n} k_{ij}}, \qquad (3)$$

where each element of the filtered signal z is then given by

$$z_i = \sum_{j=1}^{n} w_{ij}\, y_j.$$

It is worth noting that the denominator in (3) can be computed by simply applying the filter (without normalization) to an image of all 1's.
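Continuing the hypothetical sketch above, the normalization (3), the filtered output, and the all-ones observation can be written in a few lines (again illustrative, not the author's implementation):

```python
# Normalized weights (3) and the filtered output; the denominators d_i are obtained
# by applying the un-normalized filter (i.e., K) to an image of all 1's.
d = K @ np.ones(K.shape[0])        # d_i = sum_j k_ij
W = K / d[:, None]                 # w_ij = k_ij / d_i, so each row of W sums to one
z = W @ y.flatten(order='F')       # z_i = sum_j w_ij y_j

print(np.allclose(W.sum(axis=1), 1.0))   # True
print(np.allclose(d, K.sum(axis=1)))     # True: same denominators either way
```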

In matrix notation, the collection of the weights used to produce the i-th output pixel is the vector [wi1, · · · , win]; and this can in turn be placed as the i-th row of a filter matrix W so that

$$\mathbf{z} = \mathbf{W}\mathbf{y}.$$

We note again that due to the normalization of the weights, the rows of the matrix W sum to one. That is, for each 1 ≤ i ≤ n,

$$\sum_{j=1}^{n} w_{ij} = 1.$$

Viewed another way, the filter matrix W is a normalized version of the symmetric positive definite affinity matrix K constructed from the unnormalized affinities kij, 1 ≤ i, j ≤ n. As a result, W can be written as a product of two matrices

$$\mathbf{W} = \mathbf{D}^{-1}\mathbf{K}, \qquad (4)$$

where D is a diagonal matrix with diagonal elements $[\mathbf{D}]_{ii} = \sum_{j=1}^{n} k_{ij} = d_i$. To avoid the normalization, we will replace the filter W with an approximation W̃ that only involves D rather than its inverse. More specifically,

$$\widetilde{\mathbf{W}} = \mathbf{I} + \alpha(\mathbf{K} - \mathbf{D}). \qquad (5)$$

But why is this a good idea? In what follows, we will motivate and derive this approximation from first principles, while also providing an analytically sound and numerically tractable choice for the scalar α > 0 that gives the best approximation to W in the least-squares sense. Before doing so, it is worth noting some of the key properties and advantages of this approximate filter which are evident from the above expression (5).

• Regardless of the value of α, the rows of W̃ always sum to one. That is, like its counterpart W constructed with D⁻¹, the approximation W̃, constructed with only D, is automatically normalized. This can be easily seen by multiplying W̃ on the right by a vector of ones, and observing that it returns the same vector back regardless of α (illustrated numerically in the sketch following this list).

• While the filter W is not symmetric due to the multiplicative normalization (see Eq. 4), the approximant W̃ is always symmetric, again regardless of α. The advantages of having a symmetric filter matrix are many, as documented in the recent work [9].

• The SPD affinity matrix K is typically also non-negative valued, leading to filter weights in W which are also in turn non-negative valued. The elements in W̃, however, can be negative valued due to the term K − D. This means that the behavior of the approximate filter may differ from its reference value, and must be carefully studied and controlled. We will do this in this paper.
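A small numerical sketch of these first two properties, continuing the toy example above (the value of α used here is just a placeholder; its principled choice is the subject of Section 2.1):

```python
# The normalization-free filter (5): W~ = I + alpha * (K - D).
n = K.shape[0]
D = np.diag(d)
alpha = 1.0 / d.mean()                         # simple placeholder choice; see Sec. 2.1
W_tilde = np.eye(n) + alpha * (K - D)

print(np.allclose(W_tilde.sum(axis=1), 1.0))   # rows sum to one for any alpha
print(np.allclose(W_tilde, W_tilde.T))         # W~ is symmetric, unlike W = D^{-1} K
print((W_tilde < 0).any())                     # some self-weights may be negative (K - D term)

z_tilde = W_tilde @ y.flatten(order='F')       # output of the approximate filter
```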

2. The Normalization-free Filter W̃

To derive the approximation promised in the previous section, we first note that the standard filter can be written as:

$$\mathbf{W} = \mathbf{I} + \mathbf{D}^{-1}(\mathbf{K} - \mathbf{D}) \qquad (6)$$

Comparing this form to the one presented earlier in (5), we note that the approximation is replacing the matrix inverse (on the right-hand side) with a scalar multiple of the identity:

$$\mathbf{D}^{-1} \approx \alpha\, \mathbf{I}$$

As an illustration, an image containing the normalization terms dj (which comprise the diagonal elements of D) for the photo in Fig. 4 is shown in Fig. 2. The proposal, as we elaborate below, is to replace these normalization constants in (6) with a single constant.

The motivation for this approximation is a Taylor series in terms of D for the filter matrix. In particular, let's consider the first few terms in the series around a nominal D0:

$$\mathbf{D}^{-1}\mathbf{K} \approx \mathbf{I} + \mathbf{D}_0^{-1}(\mathbf{K} - \mathbf{D}) - \mathbf{D}_0^{-2}(\mathbf{D} - \mathbf{D}_0)(\mathbf{K} - \mathbf{D}) \qquad (7)$$

The series expresses the filter as a perturbation of the identity, where the second and third terms are linear and quadratic in D. For simplicity, we can elect to retain only the linear term, arriving at the approximation

$$\mathbf{D}^{-1}\mathbf{K} \approx \mathbf{I} + \mathbf{D}_0^{-1}(\mathbf{K} - \mathbf{D}). \qquad (8)$$

Letting D0 = α⁻¹I, we arrive at the suggested approximation in (5).

2.1. Choosing the best α

A direct approach to optimizing the value of the parameter α is to minimize the following cost function using the matrix Frobenius norm:

$$\min_{\alpha} \|\mathbf{W} - \widetilde{\mathbf{W}}(\alpha)\|^2 \qquad (9)$$

We can write the above difference as

$$J(\alpha) = \|\mathbf{W} - \widetilde{\mathbf{W}}(\alpha)\|^2 = \|\mathbf{D}^{-1}\mathbf{K} - \mathbf{I} - \alpha(\mathbf{K} - \mathbf{D})\|^2$$

This is a quadratic function in α. Upon differentiating and setting to zero, we are led to the global minimum solution:

$$\alpha = \frac{\mathrm{tr}(\mathbf{K}\mathbf{D}^{-1}\mathbf{K}) - 2\,\mathrm{tr}(\mathbf{K}) + \mathrm{tr}(\mathbf{D})}{\mathrm{tr}(\mathbf{K}^2) - 2\,\mathrm{tr}(\mathbf{K}\mathbf{D}) + \mathrm{tr}(\mathbf{D}^2)} \qquad (10)$$

For sufficiently large n, the terms tr(D) and tr(D²) dominate the numerator and the denominator, respectively. Hence,

$$\alpha \approx \frac{\mathrm{tr}(\mathbf{D})}{\mathrm{tr}(\mathbf{D}^2)} = \frac{s_1}{s_2}, \qquad (11)$$

where

$$s_1 = \sum_{i=1}^{n} d_i, \quad \text{and} \quad s_2 = \sum_{i=1}^{n} d_i^2. \qquad (12)$$

This ratio is in fact bounded as

$$\frac{1}{n} \le \frac{s_1}{s_2} \le \frac{1}{\bar{d}},$$

which for large n justifies a further approximation:

$$\alpha \approx \frac{1}{\bar{d}} \qquad (13)$$

where d̄ = mean(dj).
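In the running toy example, the three estimates of α can be compared directly; this is an illustrative check only, not a recipe for production use:

```python
# Three estimates of alpha: the exact least-squares solution (10), the trace
# ratio (11)-(12), and the simple mean-based value (13).
num = np.trace(K @ np.diag(1.0 / d) @ K) - 2 * np.trace(K) + np.trace(D)
den = np.trace(K @ K) - 2 * np.trace(K @ D) + np.trace(D @ D)
alpha_exact = num / den            # Eq. (10)

s1, s2 = d.sum(), (d ** 2).sum()
alpha_trace = s1 / s2              # Eq. (11)

alpha_mean = 1.0 / d.mean()        # Eq. (13)

print(alpha_exact, alpha_trace, alpha_mean)   # typically in the same ballpark
```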

3. Properties of W̃

3.1. Pixel- and Frequency-Domain Behavior

As a matter of practical importance, we look at how the approximation changes the weights applied for computing any given pixel of the output image. To construct the pixel zi of the output from the input image, the exact weights used (in the i-th row of W) are:

$$z_i = \mathbf{w}_i^T \mathbf{y} = \frac{1}{d_i}\,[k_{i1}, \cdots, k_{ii}, \cdots, k_{in}]\,\mathbf{y}$$

In contrast to this, the approximate filter uses the weights

$$\tilde{z}_i = \widetilde{\mathbf{w}}_i^T \mathbf{y} = \alpha\,\left[k_{i1}, \cdots, \alpha^{-1} + k_{ii} - d_i, \cdots, k_{in}\right]\mathbf{y}$$

Note that the center (self) weight corresponding to the position of interest i has been changed most prominently, and the other weights are pushed in the opposite direction from the change in this weight in order to maintain the sum as 1. The center weight in fact can become negative, while the other weights must remain positive.

Another way to make the comparison is more illustrative. Define the shifted Dirac delta vector δi = [0, 0, · · · , 0, 1, 0, 0, · · · , 0] where the subscript i indicates that the value 1 occurs in the i-th position. We have

$$\widetilde{\mathbf{w}}_i^T = \boldsymbol{\delta}_i + \alpha\left([k_{i1}, \cdots, k_{ii}, \cdots, k_{in}] - d_i\,\boldsymbol{\delta}_i\right) \qquad (14)$$

$$= \boldsymbol{\delta}_i + \alpha\, d_i \left(\mathbf{w}_i^T - \boldsymbol{\delta}_i\right) \qquad (15)$$

Rewriting this last expression, we have a rather simple relationship between the exact and approximated filter coefficients:

$$\widetilde{\mathbf{w}}_i^T - \boldsymbol{\delta}_i = \alpha\, d_i \left(\mathbf{w}_i^T - \boldsymbol{\delta}_i\right) \qquad (16)$$

So if we subtract 1 from the self-weight, then the resulting two filters are simply scaled by the coefficient αdi, which controls the difference between the exact and the approximated filter. In particular, if at pixel location i the normalization factor di is close to the mean d̄, then at that pixel the approximation is nearly perfect. More generally, gathering all terms like (16), the respective filter matrices are related as

$$(\widetilde{\mathbf{W}} - \mathbf{I}) = \mathbf{R}\,(\mathbf{W} - \mathbf{I}) \qquad (17)$$

where R = αD.

Canonically, if we consider the two filters W and W̃ as edge-aware low-pass (or smoothing) filters, then their counterparts W − I and W̃ − I are high-pass filters. In fact, these filters are directly related to different (but well-established) definitions of the graph Laplacian operator emerging from the same affinity matrix K. Namely, L = I − D⁻¹K = I − W is known as the random walk Laplacian [9], whereas L = D − K = (I − W̃)/α is known as the un-normalized graph Laplacian [9].

So what does all this tell us about how approximating the filter distorts the output image? As (17) makes clear, the distortion is concentrated in the higher-frequency components of the output. Furthermore, the degree of distortion is given pixel-wise by the ratio dj/d̄. The overall distortion is small when the coefficients dj are tightly concentrated around the mean d̄.

4. Two Interpretations of Approximate Filter

The approximate filter lends itself to various interpretations which can help us better understand its effect. The derivations follow directly from the definition.

• First, consider

$$\widetilde{\mathbf{W}} = \mathbf{I} + \alpha(\mathbf{K} - \mathbf{D}) = \mathbf{I} + \alpha\mathbf{D}(\mathbf{W} - \mathbf{I}) \qquad (18)$$

Applying this filter to the image y, we have

$$\widetilde{\mathbf{W}}\mathbf{y} = \mathbf{y} + \alpha\mathbf{D}(\mathbf{W}\mathbf{y} - \mathbf{y}) \qquad (19)$$

Moving y to the left-hand side, we have

$$\widetilde{\mathbf{W}}\mathbf{y} - \mathbf{y} = \alpha\mathbf{D}(\mathbf{W}\mathbf{y} - \mathbf{y}) \qquad (20)$$
$$\tilde{\mathbf{z}} - \mathbf{y} = \alpha\mathbf{D}(\mathbf{z} - \mathbf{y}) \qquad (21)$$

This shows that the residuals of the two filtering processes (exact vs. approximate) are simply pixel-wise scaled versions of one another, by the diagonal matrix αD.

• A slightly different manipulation yields

$$\widetilde{\mathbf{W}}\mathbf{y} = (\mathbf{I} - \alpha\mathbf{D})\mathbf{y} + \alpha\mathbf{D}\,\mathbf{W}\mathbf{y} \qquad (22)$$
$$\tilde{\mathbf{z}} = (\mathbf{I} - \alpha\mathbf{D})\mathbf{y} + \alpha\mathbf{D}\,\mathbf{z} \qquad (23)$$

This shows that the un-normalized filter is a pixel-wise blending of the input image and the output of the normalized filter. The coefficients of this blend are 1 − αdi and αdi, for each pixel. It is tempting to think of this as a convex combination. But while the numbers αdi are clustered around 1, they are not restricted to the range [0, 1] (see the short numerical sketch after this list).
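Continuing the toy example, the blending form (23) can be verified against applying W̃ directly (an illustrative check under the hypothetical variables defined earlier):

```python
# Blending interpretation (23): the approximate output is a pixel-wise mix of the
# input y and the exact output z, with weights (1 - alpha*d_i) and alpha*d_i.
y_vec = y.flatten(order='F')
rho = alpha * d                              # per-pixel blending factors rho_i = alpha * d_i
z_blend = (1.0 - rho) * y_vec + rho * z      # Eq. (23), computed pixel-wise

print(np.allclose(z_blend, z_tilde))         # True: matches applying W~ directly
print(rho.min(), rho.max())                  # clustered near 1, but not confined to [0, 1]
```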

5. Effect of Approximation on Local Variance

So far, our concern has been with maintaining the mean brightness value in the output pixels of the filter. But we also expect that the approximate filter should affect the variance of the output pixels. Here we characterize this effect. Recall the pixel-wise expressions for the exact and approximate filter, respectively:

$$z_i = \sum_{j=1}^{n} w_{ij}\, y_j, \qquad \tilde{z}_i = \sum_{j=1}^{n} \tilde{w}_{ij}\, y_j$$

The variance in the output pixel, in terms of the variance in the input pixel, is given by the sum of squares of the filter weights. That is,

$$\mathrm{var}(z_i) = \left( \sum_{j=1}^{n} w_{ij}^2 \right) \mathrm{var}(y_i) = \nu_i\, \mathrm{var}(y_i) \qquad (24)$$

$$\mathrm{var}(\tilde{z}_i) = \left( \sum_{j=1}^{n} \tilde{w}_{ij}^2 \right) \mathrm{var}(y_i) = \tilde{\nu}_i\, \mathrm{var}(y_i) \qquad (25)$$

It is of interest to establish a relationship between the factors νi and ν̃i. We proceed as follows:

$$\tilde{\nu}_i = \widetilde{\mathbf{w}}_i^T \widetilde{\mathbf{w}}_i \qquad (26)$$
$$= \left( \boldsymbol{\delta}_i + \alpha d_i(\mathbf{w}_i - \boldsymbol{\delta}_i) \right)^T \left( \boldsymbol{\delta}_i + \alpha d_i(\mathbf{w}_i - \boldsymbol{\delta}_i) \right) \qquad (27)$$
$$= \alpha^2 d_i^2\, \nu_i + \left( \alpha^2 d_i^2 - 2\alpha(1 + \alpha)d_i + 1 + 2\alpha \right) \qquad (28)$$
$$\approx (\alpha d_i)^2\, \nu_i + (\alpha d_i - 1)^2 \qquad (29)$$

where the last approximation assumes α ≪ 1. Denoting ρi = αdi, we recall that this number typically hovers around 1. We note that 1/n ≤ νi ≤ 1, but ν̃i can grow larger than 1 if ρi > 1. To summarize, the two variance factors are linearly related as follows:

$$\tilde{\nu}_i \approx \rho_i^2\, \nu_i + (\rho_i - 1)^2 \qquad (30)$$

The contour plot in Fig. 1 shows the values of ν̃i as a function of ρi and νi.
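A quick numerical check of (30) in the toy example: compare the exact factors ν̃i = w̃iᵀw̃i with the right-hand side of (30). The gap comes from the α ≪ 1 assumption behind (29):

```python
# Compare the exact variance factors nu~_i = ||w~_i||^2 against the prediction (30).
nu = (W ** 2).sum(axis=1)                       # nu_i for the exact filter, Eq. (24)
nu_tilde = (W_tilde ** 2).sum(axis=1)           # nu~_i for the approximate filter, Eq. (25)

nu_tilde_pred = rho**2 * nu + (rho - 1.0)**2    # Eq. (30)
print(np.abs(nu_tilde - nu_tilde_pred).max())   # discrepancy shrinks as alpha gets small
```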

Figure 1. Values of ν̃ as a function of ρ (horizontal axis) and ν (vertical axis).

6. Experiments

As an illustration of the concepts, we refer the reader to Fig. 4. At left, an image of an old man is shown (size 505 × 578) along with two (bilateral-) filtered versions. At center is the exact bilateral-filtered image, whereas on the right is the output of the approximated filter. Visually, we note the approximated filter produces an image that is less aggressively smoothed. To better see this difference, the (luma channel) difference between the two output images is shown at left in Fig. 5. In the center and right panels of the same figure, the residuals of each filtered image against the input are shown for reference. For this example, the estimated α = 0.0108, corresponding to d̄ = 92.88. In Figure 2, we can observe the values of dj for this image, where the vast majority are in fact close to the mean value. The more "rare" pixels (i.e. those with small total affinities dj) appear as blue, and indeed correctly predict the locations where the largest differences between the exact and approximated filter occur (see left panel of Fig. 5). The weaker smoothing effect of the approximate filter is not always desirable. For instance, consider "cartoonizing" the image, where we explicitly need to suppress nearly all small details. We carried this out for the old man image using the method in [12]. In Fig. 6, the cartoonized image using the exact BL filter is shown in the left panel. The approximated filter with the optimal α is shown in the middle panel, where too many details are left behind to produce a good result. In order to get a better result, we can boost the value of α, as shown in the right panel. This shows both a weakness of the approximation, but also its flexibility, since the shortcoming was relatively easy to overcome by tuning the single parameter α.

A second example illustrates application to a noisy image. The noisy "Laundromat" image (of size 400 × 600) is shown in Fig. 7. In the top panel is the original image, and the left and right panels contain the exact filter and the approximation, respectively. In this case we used a non-local means (NLM) filter. For this image α = 0.0132, corresponding to d̄ = 76.02. The presence of noise gives a smaller average affinity. Figure 3 contains the values of dj for this image. In this case we see that the affinity values are less well-concentrated about the mean. But the overall picture still correctly predicts differences between the exact and approximated filter (see left panel of Fig. 8).

In terms of accuracy of the approximation, as shown in Figs. 5 and 8 (left panels), the residuals between the base-filtered images and the approximate-filtered images are small, and also highly localized. Taken on average, the RGB values across these two images differ by 2/255 in the old man image and 3.5/255 in the laundry image. These translate to PSNR values between the base-filtered and approximate-filtered images of 42.1 dB and 37.2 dB, respectively.

Figure 2. Values of dj for the old man photo. Large values, shown in red, indicate pixels that have many "nearest neighbors" in the metric implied by the bilateral kernel.

Figure 3. Values of dj for the Laundromat photo. Large values, shown in red, indicate pixels that have many "nearest neighbors" in the metric implied by the NLM kernel.

7. Conclusions and Future Work

We presented a conceptually simple method for approximating a class of normalized linear or non-linear filters with ones that avoid pixel-wise normalization. The approximate filters are easy to construct, and surprisingly accurate. We studied the behavior of the approximated filter and showed how it can be controlled with a single parameter for both quality and nearness to the base filter.

Here we focused narrowly on the conceptual side of developing the new framework and its properties. However, there may also exist computational savings to be realized by avoiding pixel-wise normalization. But this will likely depend on the instruction set and the hardware platform used. Speculatively, on standard floating-point arithmetic, the savings will be modest. There is also the possibility of computing normalized filters on integer-valued images without having to convert to floats or using integer division at all. In any case, the concept does have the advantage of being completely general: it can be applied to any base filter where conventional normalization is employed. Whether any computational advantages broadly exist is work that still needs to be carried out.

To quantify the behavior of the approximate filter, we illustrated that the majority of the distortion it can cause is concentrated at high spatial frequencies and around two types of pixels. The first are very "rare" pixels that do not enjoy the affinity of many similar pixels (i.e. have few near neighbors as measured by the affinity kernel). The second are very common pixels that have far more similar pixels than average; these typically correspond to large constant or slowly changing areas of the image, and hence the distortions caused will generally not be visible. In either case, the locations where these distortions occur can be conveniently predicted by computing the sum of the (unnormalized) weights pointing to each pixel and comparing this to the average across the image.

Admittedly, our story is not complete in these few pages; some interesting questions remain open:

• Are there significant computational savings to be had by using the approximated filter?

• It would be interesting to characterize the set of images for which these approximations are most (and least) well suited. One would suspect the presence or absence of texture would play a role.

• The computation of the factors di can be carried out efficiently by applying the filter, without normalization, to an image of all 1's. Still, our proposal does not get rid of the computation of the di's altogether. What it does is bind together the per-pixel normalization factors into one number. Perhaps there is a simpler way to compute α that altogether circumvents the explicit computation of the intermediate values di. We conjecture that fixed values of α would be sufficiently good for large classes of appropriately similar images.

References

[1] A. Adams, J. Baek, and M. A. Davis. Fast high-dimensional filtering using the permutohedral lattice. Computer Graphics Forum, 29(2):753–762, 2010.

[2] A. Adams, N. Gelfand, J. Dolson, and M. Levoy. Gaussian kd-trees for fast high-dimensional filtering. ACM Trans. Graph., 28(3):21:1–21:12, July 2009.

[3] M. Aubry, S. Paris, S. W. Hasinoff, J. Kautz, and F. Durand. Fast local Laplacian filters: Theory and applications. ACM Transactions on Graphics (TOG), 33(5):167, 2014.

[4] A. Buades, B. Coll, and J. M. Morel. A review of image denoising algorithms, with a new one. Multiscale Modeling and Simulation (SIAM Interdisciplinary Journal), 4(2):490–530, 2005.

[5] F. Durand and J. Dorsey. Fast bilateral filtering for the display of high-dynamic-range images. In ACM Transactions on Graphics (TOG), volume 21, pages 257–266. ACM, 2002.

[6] M. Elad. On the origin of the bilateral filter and ways to improve it. IEEE Transactions on Image Processing, 11(10):1141–1150, October 2002.

[7] W. Härdle. Applied Nonparametric Regression. Cambridge University Press, Cambridge, England; New York, 1990.

[8] C. Kervrann and J. Boulanger. Optimal spatial adaptation for patch-based image denoising. IEEE Transactions on Image Processing, 15(10):2866–2878, October 2006.

[9] P. Milanfar. A tour of modern image filtering. IEEE Signal Processing Magazine, 30(1):106–128, 2013.

[10] C. Tomasi and R. Manduchi. Bilateral filtering for gray and color images. ICCV, Bombay, India, pages 836–846, January 1998.

[11] M. P. Wand and M. C. Jones. Kernel Smoothing. Monographs on Statistics and Applied Probability. Chapman and Hall, London; New York, 1995.

[12] H. Winnemöller, S. C. Olsen, and B. Gooch. Real-time video abstraction. ACM Trans. Graph., 25(3):1221–1226, July 2006.


Figure 4. (Left) Input y; (center) exact BL filter z; (right) approximate BL filter z̃.

Figure 5. Luma channel residuals shown on a common grayscale range [−35, 35]: (left) z − z̃; (center) y − z; (right) y − z̃.

Figure 6. Cartoonized using the method of [12] with the exact filter (left), the approximated filter with α = 0.01 (center), and with α = 0.03 (right).


Figure 7. (Top) Input y; (left) exact NLM filter z; (right) approximate NLM filter z̃.

Figure 8. Luma channel residuals shown on a common grayscale range [−35, 35]: (top) z − z̃; (left) y − z; (right) y − z̃.
