asymptotically optimal blind estimation of multichannel images



992 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 15, NO. 4, APRIL 2006

Asymptotically Optimal Blind Estimation of Multichannel Images

Ian Atkinson, Student Member, IEEE, Farzad Kamalabadi, Satish Mohan, and Douglas L. Jones

Abstract: Optimal estimation of a two-dimensional (2-D) multichannel signal ideally decorrelates the data in both channel and space and weights the resulting coefficients according to their SNR. Many scenarios exist in which the required second-order signal and noise statistics are not known or in which the decorrelation is difficult or expensive to calculate. The asymptotically optimal estimation scheme proposed here uses a 2-D discrete wavelet transform to approximately decorrelate the signal in space and the discrete Fourier transform to decorrelate between channels. The coefficient weighting is replaced with a wavelet-domain thresholding operation, resulting in an efficient estimation scheme for both stationary and nonstationary signals. In contrast to optimal estimation, this new scheme does not require second-order signal statistics, making it well suited to many applications. In addition to providing vastly improved visual quality, the new estimator typically yields signal-to-noise ratio gains of 12 dB or higher for hyperspectral imagery and functional magnetic resonance images.

Index Terms: Estimation, image processing, multidimensional signal processing, wavelet transforms.

I. INTRODUCTION

MULTICHANNEL signals exist in many types, including color images, signals from spatial sensor arrays, and hyperspectral imagery. These signals are multichannel in the strictest sense, as they consist of multiple, correlated channels rather than a single channel. In addition, certain signals that are not multichannel by nature can be cast into a multichannel form. Examples of this type of multichannel signal include frames of a video sequence or images from a functional magnetic resonance imaging scan. Applications involving multichannel signals span several areas, including communications, remote sensing, astronomy, and medical imaging. These applications prompt the need for reliable estimation of a signal from its noisy multichannel observation. In this paper, we consider the blind estimation of a two-dimensional (2-D)¹ multichannel signal from a noisy observation of the true signal.

Statistical estimation techniques that incorporate a priori information or assume a statistical signal model have proven to provide attractive performance in terms of signal-to-noise ratio (SNR) gain. The minimum mean-square-error (MSE) estimate of a signal from a noisy observation is achieved by the Wiener filter, which uses second-order statistics to optimally decorrelate the data by transforming it into the Karhunen–Loève (KL) domain. In the KL domain, signal and noise are easily discerned, allowing noise to be attenuated while simultaneously retaining most of the desired signal. Although the results provided by the Wiener filter are mathematically desirable, this optimal estimator suffers from several drawbacks that often make its use impractical. Most notable is the fact that calculation of the Wiener filter for a length-$M$ signal requires inversion of an $M \times M$ correlation matrix with elements depending on the second-order signal statistics.

Manuscript received June 25, 2004; revised February 25, 2005. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Giovanni Ramponi.

The authors are with the Department of Electrical and Computer Engineering and the Coordinated Science Laboratory, University of Illinois at Urbana-Champaign, Urbana, IL 61801 USA.

Digital Object Identifier 10.1109/TIP.2005.863024

¹ Here, the dimensionality of the multichannel signal refers to its spatial dimensions.

Multichannel signal estimation requires estimating each channel of the signal. One approach to this problem would be to process each channel separately, effectively treating each channel as though it were an independent, single-channel signal. A major shortcoming of this approach is that it fails to utilize any information that can be derived from the interchannel relationships present in the signal. A more sophisticated technique would attempt to recover the original signal by accounting not only for the information contained in individual channels, but also for the correlation among them. Exploiting this interchannel correlation can greatly improve estimation quality compared to processing each channel individually.

The more general problem of multichannel image restoration, which allows for both a linear degradation operator and corrupting noise, has been well researched over the past two decades [1]–[4]. In [1], Hunt and Kübler assume the intrachannel and interchannel signal correlations are separable. This allows the optimal decorrelating transform (i.e., the KL transform) used by the Wiener filter to be decomposed into two transforms, which decorrelate within and between channels, respectively. Although this greatly simplifies the problem from a computational aspect, second-order signal statistics must still be known or estimated in order to compute the two decorrelating transforms. Galatsanos and Chin have proposed using a 2-D discrete Fourier transform (DFT) to transform the block Toeplitz correlation matrix into a matrix that is asymptotically block diagonal [3]. The inverse of this asymptotically block diagonal matrix, which can be computed iteratively, can be used to calculate the minimum MSE estimate of the multichannel signal. However, when the channel count and/or spatial dimensions of the signal are large, this technique may be difficult to implement since the correlation matrix is a square matrix with dimension equal to the total number of samples. A least-squares solution to the image restoration problem has been given by Galatsanos et al. in [2]. This method permits inclusion of a priori information in the reconstruction process via a regularization operator and is proposed as an alternative to Hunt and Kübler's method when accurate multichannel statistics are unavailable. Schultz and Stevenson have proposed a Gibbs prior for the multichannel image coefficients that is used to compute the maximum a posteriori estimate of the original image [4]. This prior depends on three parameters, which must be assigned values based on a priori information or calculated from the data.

1057-7149/$20.00 © 2006 IEEE

It is important to note that the multichannel Wiener filter yields the lowest-MSE estimate among all linear estimators. A nonlinear estimator may therefore provide an improvement over the Wiener filter. This is particularly true when the corrupting noise is not additive Gaussian, in which case linear filtering often performs poorly [5]. Although a comprehensive discussion of nonlinear filtering techniques is beyond the scope of this work, several adaptive nonlinear filters based on order statistics have been applied successfully to multichannel data [5]. Specifically, adaptive nonlinear multichannel filters have been shown to perform better than their linear counterparts when applied to restoration of a color image corrupted by Gaussian plus impulsive noise. Many of these techniques, however, require a noise-free reference signal for steering purposes, which is generally unavailable, as its presence would remove the need to estimate the signal in the first place.

Rao and Jones have presented a DFT and wavelet-based technique for estimating one-dimensional (1-D) multichannel signals gathered from a linear array of evenly spaced sensors [6], [7]. This type of multichannel signal has a spatial component corresponding to signal variation due to sensor separation and a temporal component corresponding to the underlying signal itself. Data gathered from a linear array of evenly spaced sensors often exhibit high spatial correlation, especially when the sensor spacing is small and/or propagation effects are minimal. This technique exploits both the intrachannel and interchannel signal correlations using the DFT and discrete wavelet transform (DWT) to efficiently approximate the optimal data decorrelation achieved using second-order signal statistics.

In this paper, we focus on the creation of a near-optimal estimation scheme for a 2-D multichannel signal gathered from an arbitrary multichannel source. As our primary interest in this paper is with 2-D multichannel signals, we shall refer to them simply as multichannel signals for brevity. The estimator applies to a very general class of signals and is thus not restricted to, or biased toward, a specific source signal type. An imposed constraint for asymptotic optimality, however, is that the correlation between any pair of channels is constant over the entire spatial extent of the signal. This requirement, which will be discussed in more detail later in the paper, is equivalent to assuming that the spectral-spatial correlation of the signal is separable, as Hunt and Kübler did in [1]. However, rather than estimate the second-order statistics from the observed data, we approximate the optimal spatial decorrelating transform with a 2-D DWT. This allows our proposed method to draw from the large body of wavelet-based estimation techniques such as those discussed in [8]–[12].

This paper is organized as follows. A brief review of wavelet-based estimation of single-channel 2-D signals is provided in Section II. Benefits of using an overcomplete wavelet expansion and adaptive wavelet-domain thresholding are discussed, as these will play an important role in the development of our multichannel estimator. Section III discusses multichannel signals and develops the corresponding notation used throughout the paper. Section IV discusses the multichannel version of the Wiener filter and its associated shortcomings before we develop our near-optimal estimation scheme in Section V, which overcomes these limitations. Numerical examples are provided in Section VI to illustrate the performance of the proposed estimation framework.

II. WAVELET-BASED ESTIMATION OF 2-D SIGNALS

The use of wavelets as a signal processing tool has grown in recent years, with wavelets being applied to a variety of applications including compression, denoising, analysis, and estimation [8], [11]–[13]. Results from wavelet-based estimation often rival those from Wiener filtering in both MSE and, in the case of images in particular, perceived signal quality. In this section, we review the basic process for wavelet-based estimation of 2-D single-channel signals.

Let $y(i,j) = x(i,j) + n(i,j)$, for $i = 1, \ldots, N$, $j = 1, \ldots, N$, denote the $N \times N$ (with $N$ an integer power of 2) piecewise smooth signal² $x$ that has been degraded by zero-mean Gaussian noise $n$ with variance $\sigma_n^2$. Let $\mathbf{Y}$, $\mathbf{X}$, and $\mathbf{N}$ be the $N \times N$ matrices containing the samples of the observation, signal, and noise, respectively. Furthermore, let $\mathcal{W}$ and $\mathcal{W}^{-1}$ denote the 2-D orthonormal forward and inverse wavelet operators with $J$ decomposition levels [13], [14]. At each scale $j$, where $j = 1, \ldots, J$, the wavelet coefficients are grouped into three detail subbands, which we denote $HH_j$, $HL_j$, and $LH_j$. At the coarsest scale $J$, there is the additional subband $LL_J$ that contains the low-resolution residual of the original signal.

In the wavelet domain, the signal, noise, and observation are related by

$$\mathcal{W}(\mathbf{Y}) = \mathcal{W}(\mathbf{X}) + \mathcal{W}(\mathbf{N}). \tag{1}$$

Since the corrupting noise is zero-mean Gaussian with variance $\sigma_n^2$ and an orthonormal wavelet is being used, $\mathcal{W}(\mathbf{N})$ will likewise be zero-mean Gaussian with variance $\sigma_n^2$. Furthermore, we know that $\mathcal{W}(\mathbf{X})$ will be sparse since $x$ is piecewise smooth [13], [15]. The joint implication of these facts is that $\mathcal{W}(\mathbf{Y})$ will be composed of a few large-magnitude coefficients (signal) and many small-magnitude coefficients (noise). Thus, we can equivalently state that the DWT approximates the KL transform for $x$.

All estimation techniques have the common goal of suppressing the noise while minimally distorting the signal. In many cases, removal of noise in the spatial domain is difficult as every coefficient contains both signal and noise. Transformation into the wavelet domain compacts the signal into relatively few coefficients while leaving the noise statistically unchanged. In wavelet-based estimation, the wavelet-domain coefficients that represent mostly noise are set to zero while the coefficients that represent mostly signal are retained. Since signal coefficients are assumed to be large compared to noise coefficients, this corresponds to thresholding the wavelet-domain coefficients.
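As a rough illustration of the transform, zero, and invert procedure described above, the following Python sketch applies a one-level orthonormal 2-D Haar DWT to a small noisy image, zeroes the small detail coefficients, and inverts the transform. The Haar wavelet, the single decomposition level, the 8 × 8 piecewise-constant test image, the threshold value, and every function name are assumptions chosen for this example; they are not taken from the paper.

```python
import math
import random

S = 1.0 / math.sqrt(2.0)  # orthonormal Haar scaling factor

def haar_rows(m):
    """One-level Haar step on every row: [averages | differences]."""
    out = []
    for row in m:
        avg = [(row[i] + row[i + 1]) * S for i in range(0, len(row), 2)]
        dif = [(row[i] - row[i + 1]) * S for i in range(0, len(row), 2)]
        out.append(avg + dif)
    return out

def inv_haar_rows(m):
    """Inverse of haar_rows."""
    out = []
    for row in m:
        half = len(row) // 2
        rec = []
        for a, d in zip(row[:half], row[half:]):
            rec += [(a + d) * S, (a - d) * S]
        out.append(rec)
    return out

def transpose(m):
    return [list(col) for col in zip(*m)]

def dwt2(m):
    """Transform the rows first, then the columns."""
    return transpose(haar_rows(transpose(haar_rows(m))))

def idwt2(c):
    """Undo the column step, then the row step."""
    return inv_haar_rows(transpose(inv_haar_rows(transpose(c))))

def denoise(image, t):
    """Zero detail coefficients with magnitude below t; keep the LL quadrant."""
    coeffs = dwt2(image)
    half = len(image) // 2
    for i, row in enumerate(coeffs):
        for j, w in enumerate(row):
            if (i >= half or j >= half) and abs(w) < t:
                row[j] = 0.0
    return idwt2(coeffs)

def mse(a, b):
    n = len(a) * len(a[0])
    return sum((p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb)) / n

random.seed(0)
clean = [[10.0] * 4 + [50.0] * 4 for _ in range(8)]      # piecewise-constant image
noisy = [[v + random.gauss(0.0, 1.0) for v in row] for row in clean]
estimate = denoise(noisy, t=math.sqrt(2.0 * math.log(64)))  # sigma = 1, 64 samples
print("MSE before:", mse(noisy, clean), "MSE after:", mse(estimate, clean))
```

Because the clean test image has no detail energy at this scale, every zeroed coefficient removes noise only, so the error after thresholding cannot exceed the error before it. Real images are not this favorable, which is why the choice of threshold matters.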

² The assumption that the signal has equal, integer-power-of-two dimensions, though not required, makes the results of this section follow more easily.


Therefore, we can estimate a 2-D signal using wavelet techniques via

$$\hat{\mathbf{X}} = \mathcal{W}^{-1}\big( T( \mathcal{W}(\mathbf{Y}) ) \big) \tag{2}$$

where $T(\cdot)$ represents the thresholding operator, which is applied to the detail subbands only. The soft threshold, which will be one of the thresholds used in this paper, takes the following form:

$$T(w) = \begin{cases} \dfrac{w}{|w|}\,\big(|w| - t\big), & |w| > t \\ 0, & \text{else} \end{cases} \tag{3}$$

where $t$ is the threshold level and $w$ is a wavelet-domain coefficient. Notice that should the wavelet coefficient be complex, the soft threshold shrinks the real and imaginary parts simultaneously to preserve the phase [16].
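The soft threshold can be sketched in a few lines of Python. This is a minimal illustration, assuming the usual magnitude-shrinking definition (the magnitude is reduced by the threshold level and the sign or phase is kept); the function name `soft_threshold` is hypothetical.

```python
def soft_threshold(w, t):
    """Soft-threshold one coefficient w (real or complex) at level t >= 0."""
    magnitude = abs(w)
    if magnitude <= t:
        return 0j if isinstance(w, complex) else 0.0
    # Shrink the magnitude by t; dividing by |w| keeps the sign or phase.
    return w / magnitude * (magnitude - t)

for w in (3.0, -3.0, 0.5, 3 + 4j):
    print(w, "->", soft_threshold(w, 1.0))
```

For a complex coefficient, scaling by a real factor shrinks the real and imaginary parts by the same ratio, which is what leaves the phase unchanged.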

Choice of the threshold level is critical to estimation quality, as it determines which coefficients will be retained and which will be discarded. Good results, in terms of perceived quality and SNR, are often achieved using a threshold of $t = \sigma_n \sqrt{2 \log N^2}$. It can be shown [17] that this threshold will, with high probability, remove the largest noise coefficient. In practical situations where the noise variance $\sigma_n^2$ is unknown, it can be estimated reliably from the finest-scale wavelet coefficients using the robust median estimator [8], [12].
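The two quantities mentioned above can be sketched as follows. The sketch assumes the widely used universal threshold, sigma * sqrt(2 ln M) for M samples, and the common median rule sigma ≈ median(|w|) / 0.6745 applied to the finest-scale detail coefficients; both forms are stated here as assumptions about the cited estimators, and the function names and the synthetic coefficients are hypothetical.

```python
import math
import random
import statistics

def estimate_noise_sigma(finest_detail_coeffs):
    """Median-based estimate of the noise standard deviation.

    0.6745 is the 0.75 quantile of the standard Gaussian, so
    median(|w|) / 0.6745 is consistent for pure Gaussian noise.
    """
    return statistics.median([abs(w) for w in finest_detail_coeffs]) / 0.6745

def universal_threshold(sigma, num_samples):
    """sigma * sqrt(2 ln M) for M samples."""
    return sigma * math.sqrt(2.0 * math.log(num_samples))

random.seed(1)
# Stand-in for a finest-scale detail subband that contains only noise.
coeffs = [random.gauss(0.0, 2.0) for _ in range(20000)]
sigma_hat = estimate_noise_sigma(coeffs)
print("estimated sigma:", sigma_hat)
print("threshold for a 256 x 256 image:", universal_threshold(sigma_hat, 256 * 256))
```

The median is used rather than the sample standard deviation because the few large signal coefficients that do appear at the finest scale barely move the median, whereas they would inflate a variance estimate.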

A. Wavelet-Based Signal Estimation Using an Overcomplete Wavelet Expansion

Despite its widespread use and celebrated results, the wavelet-based estimation scheme outlined above is not without limitations. Most notable is the shift-varying nature of the 2-D DWT due to the downsampling operations inherent to the DWT. The algorithme à trous can be used to compute an overcomplete, undecimated, shift-invariant DWT [13]. Estimation and denoising using an overcomplete wavelet expansion have been found to provide slightly improved results³ that correspond to averaging the estimates of all possible shifts of the signal using a critically sampled DWT [18].

In order to preserve the signal power when transforming the signal into the wavelet domain using an overcomplete wavelet expansion, the analysis and synthesis filters must each be scaled by a factor of $1/\sqrt{2}$. This filter scaling causes the magnitude of the coefficients at the $j$th decomposition level to be scaled by $2^{-j}$, which means we must adjust the threshold accordingly as well. The threshold applied to the coefficients of the $j$th decomposition level of an overcomplete wavelet expansion should be

$$t_j = 2^{-j}\, t. \tag{4}$$

Neglecting this threshold scaling will result in a larger number of coefficients being zeroed by the thresholding operation than desired. It is important to realize that although we are scaling the threshold for each specific decomposition level, the effect is that of a constant threshold being applied to all subbands.

Removing the downsampling operations from the DWT means that the subbands no longer decrease in size with each decomposition level. Therefore, if the original signal is $N \times N$, each subband of an overcomplete expansion will likewise be $N \times N$. When several levels of decomposition are desired for even a moderately sized dataset, an overcomplete expansion may be impractical due to computational and/or storage issues.

³ Additional 1- to 2-dB SNR gain over wavelet-based estimation using a standard critically sampled DWT.

B. BayesShrink Adaptive Wavelet Threshold

A drawback of wavelet-based estimation, and even its overcomplete extension, is that the same threshold is applied to all subbands. While this fixed threshold may be ideal when all subbands are simultaneously considered, it will likely be suboptimal for individual subbands. An obvious improvement to this single-threshold approach is to adapt the threshold for a given subband based on the data and/or some a priori information. Recently, a subband-adaptive threshold termed BayesShrink was proposed [10] for 2-D signals with detail subband coefficients having a generalized Gaussian distribution. While this method only slightly increases the computational complexity⁴ of the estimation process, it has been found to provide significant improvements over both standard and overcomplete wavelet-based estimation.

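It has been experimentally shown and used in several applications [10], [14], [19] that for a large class of images, the wavelet coefficients of the detail subbands (HH, HL, LH) obey a generalized Gaussian distribution for all decomposition levels. Each subband can be thought of as a set of independent, identically distributed generalized Gaussian random variables. The probability density function of a generalized Gaussian random variable $x$ is defined as

$$p(x) = C(\sigma_x, \beta)\, \exp\!\left\{ -\left[ \alpha(\sigma_x, \beta)\, |x| \right]^{\beta} \right\} \tag{5}$$

where $\alpha(\sigma_x, \beta) = \sigma_x^{-1} \left[ \Gamma(3/\beta) / \Gamma(1/\beta) \right]^{1/2}$, $C(\sigma_x, \beta) = \beta\, \alpha(\sigma_x, \beta) / \left[ 2\, \Gamma(1/\beta) \right]$, and $\Gamma(\cdot)$ is the gamma function. The two parameters of the generalized Gaussian density, $\sigma_x$ and $\beta$, control the overall form of the distribution and may vary from subband to subband.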

It has been shown [10] that for a range of $\beta$ values between 0.5 and 4, the optimal soft threshold for a given detail subband is well approximated by a soft threshold inversely proportional to the standard deviation of the wavelet-domain coefficients for that subband of the signal. Specifically, the BayesShrink threshold

$$t_B = \frac{\sigma_n^2}{\sigma_x} \tag{6}$$

is within 5% of $t^{*}$, where $t^{*}$ denotes the optimal soft threshold and $\sigma_x$ is the standard deviation of the wavelet coefficients in the subband of interest for the signal. Since each detail subband can have a different generalized Gaussian distribution and therefore a different variance, a separate threshold is calculated for each detail subband using only the data from that subband and the noise variance, which is constant across all subbands. This leads to a data-driven threshold that is adaptive on a per-subband level.
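A per-subband threshold of this kind might be computed as in the sketch below. It assumes the threshold is the noise variance divided by the signal standard deviation, and it assumes the signal standard deviation is estimated by subtracting the noise variance from the mean square of the noisy coefficients (clipped at zero), as is commonly done for BayesShrink. The function name and the fallback for an all-noise subband are hypothetical choices for this example.

```python
import math

def bayes_shrink_threshold(subband, noise_sigma):
    """Per-subband threshold noise_sigma**2 / sigma_x (sketch)."""
    # Detail coefficients are modeled as zero mean, so the mean square
    # estimates the variance of the noisy coefficients.
    observed_var = sum(w * w for w in subband) / len(subband)
    # Signal and noise are assumed independent, so their variances add.
    signal_var = max(observed_var - noise_sigma ** 2, 0.0)
    if signal_var == 0.0:
        # No detectable signal: return a level that zeroes the whole subband.
        return max(abs(w) for w in subband)
    return noise_sigma ** 2 / math.sqrt(signal_var)

strong_subband = [3.0, -3.0, 3.0, -3.0]   # signal dominates the noise
weak_subband = [0.5, -0.5]                # indistinguishable from noise
print(bayes_shrink_threshold(strong_subband, 1.0))
print(bayes_shrink_threshold(weak_subband, 1.0))
```

The behavior matches the intuition in the text: a subband with strong signal content gets a small threshold, so little is removed, while a subband dominated by noise gets a large one.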

III. MULTICHANNEL SIGNALS AND THEIR NOTATION

A multichannel signal can be thought of as a generalization of a 2-D signal to multiple correlated signals considered simultaneously. It is therefore reasonable to think of a multichannel signal as a three-dimensional (3-D) dataset. Although visually and intuitively pleasing, this view does not lend itself well to the multichannel Wiener filter or to the development of our proposed estimation scheme. A vector form of the signal, on the other hand, provides a convenient signal representation.

⁴ The additional computations grow in a log-like manner with the number of decomposition levels to a maximum value of one per input sample.

A single observed channel of an $N \times N$ multichannel signal with $K$ channels is composed of the combination of the desired signal channel and an undesired, corrupting noise component. For simplicity, we assume that all channels have been spatially co-registered such that position $(i, j)$ corresponds to the same element in all channels of the signal. In addition, we assume that $N$ is an integer power of two and that the noise may be adequately modeled as additive zero-mean Gaussian with variance $\sigma_n^2$ and is uncorrelated in both channel and space. Therefore, if we let $\mathbf{Y}_k$, $\mathbf{X}_k$, and $\mathbf{N}_k$ be the $N \times N$ matrices containing the observed, desired, and noise samples for the $k$th channel, respectively, then the observed channel expression becomes

$$\mathbf{Y}_k = \mathbf{X}_k + \mathbf{N}_k, \qquad k = 1, \ldots, K. \tag{7}$$

Following [1]–[4], we order the data lexicographically, which allows $\mathbf{Y}_k$, $\mathbf{X}_k$, and $\mathbf{N}_k$ to be represented as the length-$N^2$ vectors $\mathbf{y}_k$, $\mathbf{x}_k$, and $\mathbf{n}_k$, respectively. To complete the vector representation, the $K$ channel vectors are stacked to create the length-$KN^2$ vector that contains all the data for the multichannel signal

$$\mathbf{y} = \begin{bmatrix} \mathbf{y}_1^T & \mathbf{y}_2^T & \cdots & \mathbf{y}_K^T \end{bmatrix}^T. \tag{8}$$

It should be noted that ordering schemes other than lexicographic have been introduced and utilized for multichannel data [5], [20], [21].

This multichannel signal representation results in the data for each channel being in a vector form as opposed to a matrix. Therefore, it will prove useful to have a vector-equivalent form of the wavelet-based estimation outlined in Section II. To accomplish this, we note that a possible, although nonstandard, method of computing a 2-D DWT is to first compute a complete $J$-level 1-D DWT for each row of the data and next compute a complete $J$-level 1-D DWT of each resulting column. If $\mathbf{W}$ denotes the $N \times N$ 1-D DWT matrix with $J$ decomposition levels, then we will temporarily define the 2-D DWT operator to be

$$\mathcal{W}(\mathbf{X}) = \mathbf{W} \mathbf{X} \mathbf{W}^T \tag{9}$$

to allow for a more straightforward derivation of our estimator. Note that, here, we have assumed a wavelet with purely real filter coefficients. Although it will not be a factor in the derivation, it should be noted that using this nonstandard row-column form of the 2-D DWT in (9) produces a different subband tiling than the standard 2-D DWT. Later, we will show that the standard 2-D DWT can be used without loss of generality.

Using the row-column version of the 2-D DWT, and the same definitions for signal matrices $\mathbf{Y}$, $\mathbf{X}$, $\mathbf{N}$, and $\hat{\mathbf{X}}$ as in Section II, we can determine a vector-equivalent form of (2). If $\mathbf{x}$ denotes the length-$N^2$ vector that contains the lexicographically ordered elements of $\mathbf{X}$, then using basic Kronecker product properties [22], the vector form of the row-column 2-D DWT (9) is

$$\mathbf{x}_w = (\mathbf{W} \otimes \mathbf{W})\, \mathbf{x} \tag{10}$$

where $\otimes$ denotes a Kronecker product, and $\mathbf{x}_w$ is the length-$N^2$ vector with elements that are the lexicographically ordered elements of $\mathcal{W}(\mathbf{X})$. By similar reasoning, the vector form of (2) is

$$\hat{\mathbf{x}} = (\mathbf{W} \otimes \mathbf{W})^T\, T\big( (\mathbf{W} \otimes \mathbf{W})\, \mathbf{y} \big) \tag{11}$$

where $\hat{\mathbf{x}}$ is the vector containing the lexicographically ordered elements of the estimated signal and $T(\cdot)$ is the earlier mentioned thresholding operator.
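The equivalence between the row-column matrix form and the Kronecker-product vector form can be checked numerically on a tiny case. The sketch below assumes row-by-row (lexicographic) ordering and uses a one-level 2 × 2 Haar matrix as a stand-in for the 1-D DWT matrix; the helper names and the test matrix are hypothetical.

```python
import math

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b]
            for row_a in a for row_b in b]

def vec(m):
    """Lexicographic (row-by-row) ordering of a matrix into a flat list."""
    return [v for row in m for v in row]

def matvec(a, x):
    return [sum(r * v for r, v in zip(row, x)) for row in a]

s = 1.0 / math.sqrt(2.0)
W = [[s, s], [s, -s]]             # real, orthonormal 2 x 2 Haar matrix
X = [[1.0, 2.0], [3.0, 4.0]]

matrix_form = vec(matmul(matmul(W, X), transpose(W)))    # ordered W X W^T
vector_form = matvec(kron(W, W), vec(X))                 # (W kron W) x
roundtrip = matvec(transpose(kron(W, W)), vector_form)   # inverse = transpose
print(matrix_form)
print(vector_form)
print(roundtrip)
```

Because the matrix is real and orthonormal, the inverse of the Kronecker product is its transpose, which is why the vector-form estimator can apply the transpose after thresholding.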

IV. TWO-DIMENSIONAL MULTICHANNEL WIENER FILTER

Optimal estimation of 2-D multichannel signals is achieved using a multichannel Wiener filter, which yields the minimum MSE estimate of a signal given its noisy multichannel observation [1]. In this section, we review the multichannel Wiener filter to gain an understanding of the components that contribute to its optimal performance. A KL-domain examination will reveal that the two key attributes of its performance are the ability to optimally decorrelate the data in both channel and space and to ideally weight the resulting coefficients.

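Consider an $N \times N$ zero-mean multichannel signal with $K$ spatially co-registered channels and its noisy observation. Let $\mathbf{x}$ and $\mathbf{y}$ be the vector-form representations of the desired and observed signals as defined in Section III, respectively. By definition [23], the signal correlation matrix is the $KN^2 \times KN^2$ matrix

$$\mathbf{R}_x = E[\mathbf{x}\mathbf{x}^T] = \begin{bmatrix} \mathbf{R}_{11} & \cdots & \mathbf{R}_{1K} \\ \vdots & \ddots & \vdots \\ \mathbf{R}_{K1} & \cdots & \mathbf{R}_{KK} \end{bmatrix} \tag{12}$$

where

$$\mathbf{R}_{kl} = E[\mathbf{x}_k \mathbf{x}_l^T] \tag{13}$$

is an $N^2 \times N^2$ cross-correlation matrix.

Recalling that the noise is assumed to be uncorrelated in both channel and space, it is straightforward to see that the noise correlation matrix is

$$\mathbf{R}_n = \sigma_n^2 \mathbf{I} \tag{14}$$

where $\mathbf{I}$ is a $KN^2 \times KN^2$ identity matrix. Thus, the multichannel Wiener filter has the familiar form of

$$\mathbf{G} = \mathbf{R}_x \left( \mathbf{R}_x + \sigma_n^2 \mathbf{I} \right)^{-1}. \tag{15}$$

The eigenexpansion of $\mathbf{G}$ can be easily obtained as

$$\mathbf{G} = \sum_{i=1}^{r} \frac{\lambda_i}{\lambda_i + \sigma_n^2}\, \mathbf{u}_i \mathbf{u}_i^T \tag{16}$$

where $r$, $\mathbf{u}_i$, and $\lambda_i$ are the rank, eigenvectors, and eigenvalues of $\mathbf{R}_x$, respectively, and

$$\mathbf{U} = \begin{bmatrix} \mathbf{u}_1 & \mathbf{u}_2 & \cdots & \mathbf{u}_r \end{bmatrix}, \qquad \boldsymbol{\Lambda} = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_r). \tag{17}$$

Note that $\mathbf{R}_x$ can be eigenexpanded as $\mathbf{U} \boldsymbol{\Lambda} \mathbf{U}^T$, which implies that $\mathbf{U}^T$ will optimally decorrelate the data vector $\mathbf{y}$. Since $\mathbf{y}$ contains the entire multichannel signal in vector form, this decorrelation corresponds to decorrelating the signal in both channel and space. Hence, $\mathbf{U}^T$ is termed the channel-spatial KL transform. It can be seen from (16) that the Wiener filter creates an estimate by decorrelating the signal, weighting the decorrelated coefficients, and finally recorrelating the signal. The decorrelating, weighting, and recorrelating steps are statistically optimal, but are so at the cost of requiring the second-order signal and noise statistics.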
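The decorrelate, weight, recorrelate structure can be verified on a toy example. The sketch below assumes the standard Wiener filter for additive white noise, G = R_x (R_x + sigma_n^2 I)^(-1), builds a 2 × 2 signal correlation matrix from chosen eigenvectors and eigenvalues, and checks that the eigen-domain construction satisfies G (R_x + sigma_n^2 I) = R_x. The rotation angle, eigenvalues, noise variance, and helper names are all made up for illustration.

```python
import math

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def diag(values):
    n = len(values)
    return [[values[i] if i == j else 0.0 for j in range(n)] for i in range(n)]

def wiener_from_eigen(u, eigenvalues, noise_var):
    """G = U diag(lambda_i / (lambda_i + noise_var)) U^T."""
    weights = [lam / (lam + noise_var) for lam in eigenvalues]
    return matmul(matmul(u, diag(weights)), transpose(u))

theta = 0.3
U = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta), math.cos(theta)]]   # columns are orthonormal eigenvectors
lams = [4.0, 1.0]                           # eigenvalues of the signal correlation
noise_var = 1.0

R = matmul(matmul(U, diag(lams)), transpose(U))   # R_x = U Lambda U^T
G = wiener_from_eigen(U, lams, noise_var)
R_plus_noise = [[R[i][j] + (noise_var if i == j else 0.0) for j in range(2)]
                for i in range(2)]
check = matmul(G, R_plus_noise)                   # should reproduce R
print(G)
print(check)
print(R)
```

Each weight lambda_i / (lambda_i + sigma_n^2) lies between 0 and 1 and approaches 1 only where the signal eigenvalue dominates the noise variance, which is the SNR-based weighting that the thresholding operation later replaces.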

V. NEAR-OPTIMAL WAVELET-BASED MULTICHANNEL SIGNAL ESTIMATION

Despite its mathematically optimal results, the Wiener filter suffers from several shortcomings that often make it impractical or undesirable for use. First, requiring the second-order statistical description of the signal and corrupting noise makes the Wiener filter inherently signal-dependent. Second, for large datasets, calculation of the Wiener filter requires the inversion of a large matrix, which is generally a nontrivial task. Finally, provided we can calculate the Wiener filter, its structure for non-wide-sense-stationary signals will not lead to an efficient implementation. These points motivate our search for an alternative to the Wiener filter for estimating a multichannel signal from its noisy observation. In this section, we create a wavelet-based technique for estimating multichannel signals that aims to approximate the Wiener filter in performance while overcoming its drawbacks.

In the previous section, we investigated the Wiener filter in the eigen-domain and observed that the power of the Wiener filter originates from its ability to optimally decorrelate the data by transforming it into the KL domain, and to ideally weight the KL-domain coefficients. Next, we show how, under minimal assumptions, the channel-spatial KL transform can be decomposed into two KL transforms, one that decorrelates the data in channel and a second that decorrelates the data in space. In [1], Hunt and Kübler propose this separation of the optimal decorrelating transform to simplify the multichannel Wiener filter; Wernick et al. employ this separability on regularization operators to efficiently compute regularized positron emission tomography (PET) image sequences [24]. This separability will be vital to our proposed estimation framework, as it allows approximations to be made independently to the spatial and channel decorrelating transforms.

A. Decomposing the Channel-Spatial KL Transform

Calculation of the channel-spatial KL transform requires second-order statistics for both the multichannel signal and corrupting noise or, alternatively, second-order statistical descriptions of the signal, noise, and channels. The channel correlation, or equivalently coherence, is influenced by a number of factors that depend on both the signal and the imaging modality. For example, channels corresponding to closely positioned sensors in a sensor array may exhibit high coherence due to the relatively small distance separating the sensors. Coherence of distant sensors, however, will often be significantly lower due to the large sensor separation and the consequent increase in propagation effects. Channel coherence for multispectral and hyperspectral signals is more complex and depends heavily on the nature of the signal itself, or specifically, the emission or reflectance characteristics of the signal at various wavelengths.

In general, the correlation between any two samples of a multichannel signal depends on both the spatial positions of the samples and the channels involved. That is

(18)

where denotes the sample of the channel and is a function dependent on the true multichannel signal vector.

Accurate modeling of this four-parameter function is difficult. Therefore, we will ignore the interchannel correlation variations and assume that the correlation between any two channels can be adequately represented by a single scalar coherence coefficient. With this approximation made, the correlation between any two channel vectors can now be expressed as

(19)

where is the underlying signal correlation matrix of size . Writing (12) using (19) gives

(20)

where is the channel coherence matrix, whose elements are the scalar coherence coefficients. Thus, we can separate the desired signal correlation matrix into two matrices, which contain the channel coherence and spatial correlation, respectively. The validity of this separability assumption is discussed by Hunt and Kübler in [1].

The eigenexpansions of the channel coherence matrix and the spatial correlation matrix are given by

(21)

(22)

From these expansions, we can see that the two eigenvector matrices diagonalize the channel coherence matrix and the spatial correlation matrix, and will, therefore, optimally decorrelate the data in channel and space, respectively. For this reason, they are termed the channel KL transform and the spatial KL transform. Applying a basic property of Kronecker products [22], it can be shown that

(23)

The significance of (23) is that the channel-spatial KL transform, which optimally decorrelates the signal in both channel and space, can be decomposed into a channel KL transform, which decorrelates the data in channel, and a spatial KL transform, which decorrelates the data in space.
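The decomposition can be checked numerically on a toy problem. The following is a minimal sketch, not the authors' code: the sizes `M` and `N`, the helper `random_correlation`, and the matrices `C` and `R` are hypothetical stand-ins for a channel coherence matrix and a spatial correlation matrix. It verifies that the Kronecker product of the two small eigenvector matrices diagonalizes the Kronecker product of the two correlation matrices.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_correlation(n):
    """Return a random symmetric positive-definite n x n matrix (toy stand-in)."""
    a = rng.standard_normal((n, n))
    return a @ a.T + n * np.eye(n)

M, N = 3, 4                      # toy sizes: M channels, N samples per channel
C = random_correlation(M)        # stand-in for the channel coherence matrix
R = random_correlation(N)        # stand-in for the spatial correlation matrix

# Eigenexpansions of the two small matrices.
lam_c, U_c = np.linalg.eigh(C)
lam_s, U_s = np.linalg.eigh(R)

# Candidate channel-spatial transform and eigenvalues built from Kronecker products.
U = np.kron(U_c, U_s)
lam = np.kron(lam_c, lam_s)

# U should diagonalize the full MN x MN matrix kron(C, R).
D = U.T @ np.kron(C, R) @ U
max_off_diag = np.abs(D - np.diag(np.diag(D))).max()
diag_matches = np.allclose(np.diag(D), lam)
print(max_off_diag, diag_matches)
```

Only two small eigenproblems (of sizes `M` and `N`) are solved here; the full `MN x MN` matrix is formed solely to check the result.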

B. Wavelet-Based Multichannel Signal Estimator

Decomposition of the channel-spatial KL transform into a channel KL transform and spatial KL transform allows for the channel coherence and spatial correlation of the signal to be handled independently. In many cases, the signal, or more importantly its correlation structure, is unknown and may be difficult to estimate. Furthermore, both the spatial KL transform and optimal weighting values are undesirably dependent on signal statistics. To overcome these limitations, we approximate the spatial KL transform and optimal weighting values with a transform and weighting values that preserve performance without requiring second-order signal statistics.

The role of the spatial KL transform is to spatially decorrelate the signal. The approximation to the spatial KL transform should therefore provide near-optimal spatial decorrelation of the signal without requiring signal knowledge, statistical or otherwise. Section II discussed how wavelets may be utilized in single-channel 2-D signal estimation. An important point brought forth was that for a wide class of signals a wavelet basis forms an approximate KL basis, which implies the DWT decorrelates signals effectively. Since the DWT approximates a KL transform for many signals and is a predefined transform well suited to efficient implementation, we will approximate the spatial KL transform with a 2-D DWT.

The second approximation we must make in our estimation framework is for the optimal weighting values. Approximating the spatial KL transform with a 2-D DWT means the signal will only be approximately decorrelated in space. However, optimal channel decorrelation is still achieved by the channel KL transform. Therefore, after decorrelating the signal in channel, no residual channel correlation will remain and the resulting channels⁵ can be processed independently without degrading the estimation results. These resulting channels can be estimated using the wavelet-based techniques described in Section II by approximating the ideal weighting values with a wavelet-domain threshold.

Straightforward implementation of an estimator using these approximations would prove challenging. The Wiener filter first decorrelates the data by transforming it into the KL domain. Our near-optimal scheme approximates this signal-dependent transform with a 2-D DWT to perform the spatial decorrelation. Thus, the data vector is decorrelated by the approximate channel-spatial KL transform, which is a square matrix whose dimension equals the total number of samples in all channels. Here, we have used the row-column version of the 2-D DWT defined in (9). Implementation of this decorrelation for moderate to large channel counts and channel sizes will be difficult at best.

To solve this problem, we recall that and use basic Kronecker product properties [22] to deduce that

is equivalent to the lexicographically ordered elements of

where . Using the same reasoning

⁵ The resulting channels refers to the channels after the optimal channel decorrelation but before the approximate spatial decorrelation.

is equivalent to the lexicographically ordered elements of

This last equivalence allows us to again use the standard 2-D orthonormal wavelet operator as opposed to the row-column version that was used in the development. This equivalent representation allows the proposed framework to be efficiently implemented with far reduced computational complexity compared to a straightforward implementation. The complete estimation scheme is outlined in Algorithm 1.
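The Kronecker product property behind this equivalence can be illustrated with a small numerical check. This is a sketch under stated assumptions: `A` and `B` are arbitrary toy matrices standing in for a channel transform and a spatial transform, `X` holds one channel per row, and the data vector is formed by stacking the channels one after another (NumPy's row-major ordering), which may differ from the ordering convention used in the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
M, N = 4, 6                          # toy sizes: M channels, N samples per channel
A = rng.standard_normal((M, M))      # stand-in for a channel transform
B = rng.standard_normal((N, N))      # stand-in for a spatial transform
X = rng.standard_normal((M, N))      # row m holds the samples of channel m

# Direct route: build the MN x MN operator and apply it to the stacked data.
y_direct = np.kron(A, B) @ X.reshape(-1)

# Structured route: transform across channels and within channels separately.
y_fast = (A @ X @ B.T).reshape(-1)

same = np.allclose(y_direct, y_fast)
print(same)
```

The structured route never forms the large operator, which is what makes a practical implementation possible when the number of samples is large.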

Algorithm 1 Wavelet-Based Multichannel Signal Estimation

1) Decorrelate the data across channels using the channel KL transform.

2) Decorrelate each resulting channel spatially using the 2-D DWT.

3) Threshold each decorrelated channel using the threshold operator.

4) Recorrelate each thresholded channel spatially using the inverse 2-D DWT.

5) Recorrelate the data across channels using the inverse channel KL transform.
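The five steps can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' implementation, and it makes several substitutions that should be read as assumptions: the unitary DFT stands in for the channel KL transform, a one-level Haar DWT (hypothetical helpers `haar2` and `ihaar2`) stands in for the multi-level Daubechies transform used in the experiments, a fixed threshold of three noise standard deviations is used, and complex coefficients are thresholded by shrinking their magnitude.

```python
import numpy as np

def haar2(x):
    """One-level orthonormal 2-D Haar DWT over the last two axes (even sizes)."""
    s = np.sqrt(2.0)
    a = (x[..., 0::2, :] + x[..., 1::2, :]) / s      # sums over adjacent row pairs
    d = (x[..., 0::2, :] - x[..., 1::2, :]) / s      # differences over adjacent row pairs
    ll = (a[..., 0::2] + a[..., 1::2]) / s
    lh = (a[..., 0::2] - a[..., 1::2]) / s
    hl = (d[..., 0::2] + d[..., 1::2]) / s
    hh = (d[..., 0::2] - d[..., 1::2]) / s
    return ll, (lh, hl, hh)

def ihaar2(ll, details):
    """Inverse of haar2."""
    lh, hl, hh = details
    s = np.sqrt(2.0)
    a = np.empty(ll.shape[:-1] + (2 * ll.shape[-1],), dtype=ll.dtype)
    d = np.empty_like(a)
    a[..., 0::2], a[..., 1::2] = (ll + lh) / s, (ll - lh) / s
    d[..., 0::2], d[..., 1::2] = (hl + hh) / s, (hl - hh) / s
    x = np.empty(a.shape[:-2] + (2 * a.shape[-2], a.shape[-1]), dtype=ll.dtype)
    x[..., 0::2, :], x[..., 1::2, :] = (a + d) / s, (a - d) / s
    return x

def soft(c, t):
    """Soft threshold; for complex coefficients the magnitude is shrunk."""
    mag = np.abs(c)
    return c * np.maximum(mag - t, 0.0) / np.maximum(mag, 1e-12)

def estimate(y, t):
    """y: noisy data of shape (channels, rows, cols); t: threshold level."""
    z = np.fft.fft(y, axis=0, norm="ortho")            # 1) decorrelate across channels
    ll, details = haar2(z)                              # 2) decorrelate spatially
    details = tuple(soft(c, t) for c in details)        # 3) threshold details, keep LL
    z_hat = ihaar2(ll, details)                         # 4) recorrelate spatially
    return np.fft.ifft(z_hat, axis=0, norm="ortho").real  # 5) recorrelate across channels

# Synthetic test signal: a smooth image with a slowly varying gain per channel.
rng = np.random.default_rng(0)
M, H, W = 16, 32, 32
base = 5.0 * np.outer(np.hanning(H), np.hanning(W))
x = (1.0 + 0.1 * np.arange(M) / M)[:, None, None] * base
sigma = 0.5
y = x + sigma * rng.standard_normal(x.shape)

x_hat = estimate(y, t=3.0 * sigma)
mse_noisy = np.mean((y - x) ** 2)
mse_est = np.mean((x_hat - x) ** 2)
print(mse_noisy, mse_est)
```

Because the magnitude-based threshold preserves the conjugate symmetry of the channel DFT of real data, the imaginary part discarded in the last step is only round-off.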

Using the approximations to the spatial KL transform and ideal weighting values, our estimator closely approximates the multichannel Wiener filter in performance while not suffering from the realization difficulties. Specifically, it

1) is signal independent and relies only on a model for, or a priori information of, channel coherence;

2) can easily be calculated as it requires no difficult inversion;

3) does not require signal or noise statistics other than a simple estimate of the noise variance as described in Section II.

Furthermore, this scheme is well suited to a structured implementation as there exist fast algorithms for calculating the forward and inverse 2-D DWT. A possible concern is that the channel coherence matrix may not be Toeplitz, making efficient computation of the channel KL transform impossible. While this concern is valid, the number of channels will often be small enough compared to the size of the signal within each channel that the effect is minor.

Although we have assumed a zero-mean multichannel signal, it can be shown [17] that in contrast to the multichannel Wiener filter, the proposed estimation framework is equally well suited to nonzero-mean signals with no modification. This hinges on the fact that the wavelet-domain threshold is not applied to the coarse approximation subband.

Often, the exact channel KL transform will be unavailable due to the channel coherence matrix being unknown. When it is computationally feasible, one can estimate the channel coherence matrix from the noisy data (e.g., see [1]) and use the resulting estimated transform in place of the true one. However, this estimate, like the multichannel Wiener filter, will directly depend on the data and, in general, will not be well suited to efficient implementation. In the next section, we discuss an approximation to the channel KL transform that can be made for specific classes of signals, which is both data independent and allows for an efficient implementation.

Fig. 1. Comparison of SNR values for noisy and estimated versions of the hyperspectral dataset LunarLake with σ = 1500. In each of the 224 channels, the estimated version of the channel has substantially higher SNR than the noisy version.

C. DFT Approximation to the Channel KL Transform

In several scenarios, the channel coherence matrix is unknown and difficult to estimate or model. This is particularly true in problems that have many channels, such as a large sensor array. The difficulty arising from an unknown coherence matrix is that the transform to decorrelate the channels cannot be computed. However, if channel coherence depends (or approximately depends) only on channel separation, so that (18) can be written as

(24)

then the channel coherence matrix is symmetric Toeplitz, and is asymptotically diagonalized by the DFT. With this observation, when the channel correlation matrix is (approximately) symmetric Toeplitz, we may approximate the channel KL transform, which optimally decorrelates the data in channel, with the normalized DFT matrix, which decorrelates the data in channel in an asymptotically optimal fashion.⁶ In addition to asymptotically identical performance, this estimator has the added attribute of allowing efficient implementation via the FFT and fast 2-D DWT, and requires no channel knowledge.

One can argue that in the absence of any channel coherence information, assuming that (24) holds is reasonable in many applications, especially when the (physical, spectral, etc.) distance between adjacent channels is small. Experimental results supporting this claim have been obtained by assuming (24) holds when channel coherence information is not available (see Section VI).
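The asymptotic behavior of the DFT on Toeplitz matrices can be observed on a toy model. The sketch below is an assumption-laden illustration: the coherence model `rho ** |i - j|` with `rho = 0.5`, the helper names, and the channel counts are all hypothetical choices, not values from the experiments. It measures how much energy remains off the diagonal after a unitary DFT change of basis, for a symmetric Toeplitz matrix and for its circulant counterpart.

```python
import numpy as np

def offdiag_energy_fraction(C):
    """Fraction of energy left off the diagonal after a unitary DFT change of basis."""
    M = C.shape[0]
    F = np.fft.fft(np.eye(M)) / np.sqrt(M)       # unitary DFT matrix
    D = F @ C @ F.conj().T
    off = D - np.diag(np.diag(D))
    return float(np.sum(np.abs(off) ** 2) / np.sum(np.abs(D) ** 2))

def toeplitz_coherence(M, rho=0.5):
    """Symmetric Toeplitz model: coherence depends only on channel separation."""
    idx = np.arange(M)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def circulant_coherence(M, rho=0.5):
    """Circulant counterpart: separation is measured with wrap-around."""
    idx = np.arange(M)
    d = np.abs(idx[:, None] - idx[None, :])
    return rho ** np.minimum(d, M - d)

# Residual off-diagonal energy for increasing channel counts.
fractions = {M: offdiag_energy_fraction(toeplitz_coherence(M)) for M in (8, 64, 512)}
circulant_fraction = offdiag_energy_fraction(circulant_coherence(64))
print(fractions, circulant_fraction)
```

For this model the residual fraction shrinks as the channel count grows, while the circulant matrix is diagonalized exactly up to round-off, which is consistent with the DFT being optimal only when the coherence matrix is circulant.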

⁶ The normalization by √M is done to make F/√M a unitary transform.

TABLE I
AVERAGE AND SELECTED CHANNEL SNR VALUES IN DECIBELS FOR OBSERVED AND ESTIMATED VERSIONS OF THE HYPERSPECTRAL DATASET LUNARLAKE WITH σ = 1500

D. Multichannel Signal Estimation Using an Overcomplete Wavelet Expansion

As discussed in Section II-A, wavelet-based image estimation using an overcomplete wavelet expansion can improve estimation quality over using an orthonormal wavelet expansion. Since the overcomplete wavelet expansion is a straightforward extension of the critically sampled wavelet expansion, we can replace the 2-D DWT in Algorithm 1 with an overcomplete wavelet expansion to gain similar benefits in multichannel signal estimation.

An overcomplete 2-D wavelet expansion can be calculated using a 2-D version of the Algorithme à Trous, an efficient implementation of which requires operations per input sample [13].⁷ Here, is the length of the filters used to compute the 2-D DWT; the length of all filters will be the same since we only consider orthogonal wavelets. In contrast to the overcomplete 2-D DWT, the standard 2-D DWT requires only operations per input sample [13].

⁷ An efficient implementation of the Algorithme à Trous accounts for the zeros in the upsampled filters to reduce the total required computations.

Fig. 2. Comparison of original, noisy, and estimated versions of channel 50 of the hyperspectral dataset LunarLake with σ = 1500. Several major and minor features are recovered from the noisy signal and are readily identifiable in the Soft and BayesShrink estimates that are not visible in the observed signal or the wiener2 estimate. SNR values are available in Table I. The Watson metric values for the images are Noisy = 3.338, wiener2 = 1.539, Soft = 0.424, and BayesShrink = 0.388, respectively.

Furthermore, thresholding one channel of an overcomplete expansion requires thresholding coefficients, as opposed to only coefficients for a critically sampled 2-D DWT.⁸ Therefore, neglecting the channel decorrelation/recorrelation, the total operational cost of wavelet-based multichannel signal estimation using an overcomplete wavelet expansion is

compared to only

for a critically sampled wavelet expansion. This drastic increase in computations may limit the practicality of using an overcomplete wavelet expansion for all but small multichannel datasets.

E. Multichannel Signal Estimation Using an Adaptive Wavelet Threshold

In this section, we consider use of the BayesShrink adaptive thresholding scheme introduced in Section II-B in multichannel signal estimation. The BayesShrink method of threshold selection assumes that the coefficients of each detail subband obey a generalized Gaussian distribution. We must, therefore, verify that the subbands of the decorrelated channels of a multichannel signal will indeed obey a generalized Gaussian distribution.

The channel and spatial decorrelation are both linear operations and are applied to different dimensions of the signal and therefore commute. That is, decorrelating in channel and then in space is equivalent to decorrelating in space and then in channel. This means that the subbands of each fully decorrelated channel are simply linear combinations of the subbands of the original channels. And, since a linear combination of generalized Gaussian random variables is also a generalized Gaussian random variable, we can conclude that if all the detail subbands of all the original channels obey a generalized Gaussian distribution, then so will the detail subbands of all of the fully decorrelated channels, allowing the BayesShrink adaptive threshold to be used as the thresholding operation in Algorithm 1.⁹

⁸ Here, we assume that the threshold is not applied to the coarse approximation subband LL.
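The commutation argument can be checked directly. In this minimal sketch, `A` and `S` are arbitrary random matrices used as hypothetical stand-ins for a channel transform and a spatial transform acting on image rows; any pair of linear operators that act on different axes of the data array would behave the same way.

```python
import numpy as np

rng = np.random.default_rng(2)
M, H, W = 5, 8, 8
x = rng.standard_normal((M, H, W))   # toy multichannel signal: (channels, rows, cols)
A = rng.standard_normal((M, M))      # any linear transform across channels
S = rng.standard_normal((H, H))      # any linear transform along the row axis

def channel_op(v):
    """Mix the channels of v with A, leaving the spatial axes untouched."""
    return np.einsum("km,mhw->khw", A, v)

def spatial_op(v):
    """Transform each channel of v along its row axis with S."""
    return np.einsum("gh,mhw->mgw", S, v)

commutes = np.allclose(channel_op(spatial_op(x)), spatial_op(channel_op(x)))
print(commutes)
```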

It can be shown that the added computational cost per sample of using the BayesShrink adaptive threshold rather than a standard soft threshold is

where the parameter is again the number of decomposition levels used [17]. The additional computational cost of using the BayesShrink adaptive threshold is minimal, having as a worst case only one additional operation per input sample.

The use of the BayesShrink adaptive threshold is merely to illustrate that adaptive thresholding can be applied with success to multichannel signal estimation and is in no way meant to imply that the BayesShrink threshold is optimal for this application.
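For concreteness, a subband-adaptive threshold of the BayesShrink type can be sketched as follows. This follows the commonly cited form of the rule (noise variance divided by an estimate of the signal standard deviation in the subband); since Section II-B is not reproduced here, the exact variant used in the experiments is an assumption, and the function name `bayes_shrink_threshold` is hypothetical.

```python
import numpy as np

def bayes_shrink_threshold(subband, sigma):
    """Threshold for one detail subband, given the noise standard deviation sigma."""
    var_y = np.mean(np.abs(subband) ** 2)            # observed subband variance
    sigma_x = np.sqrt(max(var_y - sigma ** 2, 0.0))  # estimated signal std in the subband
    if sigma_x == 0.0:
        # Subband looks like pure noise: threshold away every coefficient.
        return float(np.max(np.abs(subband)))
    return float(sigma ** 2 / sigma_x)

# Toy subbands with known variances.
t_signal = bayes_shrink_threshold(np.array([5.0, -5.0]), sigma=3.0)
t_noise = bayes_shrink_threshold(np.array([1.0, -1.0, 1.0, -1.0]), sigma=2.0)
print(t_signal, t_noise)
```

The threshold is small where the subband carries strong signal and large where it is dominated by noise, which is the adaptivity that a single fixed threshold lacks.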

VI. NUMERICAL EXAMPLES

To illustrate the performance of our proposed multichannel signal estimation framework, we present simulation results from hyperspectral and multispectral imaging, as well as a medical imaging application. Datasets were processed in the following general manner. First, the test signal was degraded by zero-

9Here, we have assumed that the � 2 (0; 4) for the decorrelated channels asthis is the valid range for BayesShrink [10].

Fig. 3. Performance comparison, in average SNR gain per channel, of estimation schemes on a 256 × 256 × 224 subset of the hyperspectral dataset LunarLake for various input SNR levels. Graph is of average SNR gain for all channels in the dataset.

mean additive Gaussian noise. Estimation was performed using the algorithm outlined in Algorithm 1 with a Daubechies length-4 wavelet and a fixed threshold of or an adaptive threshold. Unless otherwise noted, noise variance was estimated from the noise-degraded signal using the robust median estimator and the DFT was used to approximate the channel KL transform. Although suboptimal (when the channel coherence matrix is not circulant), use of the DFT to decorrelate in channel rather than estimating the channel KL transform allows for the (channel and spatial) decorrelation to be performed efficiently and independent of the observed data or a priori information. The performance implications of this choice are explored in Section VI-A3.
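A robust median estimate of the noise level is commonly computed from the finest diagonal detail subband as the median absolute coefficient divided by 0.6745. The sketch below assumes that form and uses a one-level Haar transform (hypothetical helper `haar_hh`) in place of the Daubechies length-4 wavelet; the test image and noise level are made up for illustration.

```python
import numpy as np

def haar_hh(img):
    """Finest diagonal (HH) subband of a one-level orthonormal 2-D Haar DWT."""
    return (img[0::2, 0::2] - img[0::2, 1::2] - img[1::2, 0::2] + img[1::2, 1::2]) / 2.0

def estimate_sigma(noisy):
    """Median-based estimate of the noise standard deviation."""
    return float(np.median(np.abs(haar_hh(noisy))) / 0.6745)

rng = np.random.default_rng(3)
# Smooth synthetic image: its HH subband is zero, so only the noise contributes.
clean = np.add.outer(np.linspace(0.0, 50.0, 256), np.linspace(0.0, 50.0, 256))
sigma_true = 2.0
sigma_hat = estimate_sigma(clean + sigma_true * rng.standard_normal(clean.shape))
print(sigma_hat)
```

The median is used rather than the sample standard deviation because the few large coefficients produced by image edges would otherwise inflate the estimate.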

For simplicity, we will denote the estimators by the thresholding scheme and wavelet expansion employed. Therefore, the Soft Estimator, which yields the Soft Estimate, processes the data according to Algorithm 1 using a soft threshold. Likewise, there is also the BayesShrink Estimator and BayesShrink Estimate. The Soft-Overcomplete Estimator is identical to the Soft Estimator with the exception that it uses an overcomplete wavelet expansion rather than a critically sampled wavelet expansion. For a base performance comparison, the interchannel signal correlation of the dataset was ignored and each channel was independently estimated with the MATLAB routine wiener2,¹⁰ yielding the wiener2 estimate, or the wavelet-based estimation scheme outlined in Section II, which we denote the Independent Wavelet estimate.

Estimator performance is measured in terms of SNR in dB, which is calculated as

and visual quality. Visual quality is measured both subjectively and by the Watson metric [26].
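As an illustration of an SNR computation in decibels, the sketch below uses the ratio of signal energy to error energy. This particular definition is an assumption made for the example (variance-based definitions are also common), and the function names `snr_db` and `snr_gain_db` are hypothetical.

```python
import numpy as np

def snr_db(reference, estimate):
    """SNR in dB: reference energy over error energy (assumed definition)."""
    reference = np.asarray(reference, dtype=float)
    error = reference - np.asarray(estimate, dtype=float)
    return float(10.0 * np.log10(np.sum(reference ** 2) / np.sum(error ** 2)))

def snr_gain_db(reference, noisy, estimate):
    """Improvement of the estimate over the noisy observation, in dB."""
    return snr_db(reference, estimate) - snr_db(reference, noisy)

x = np.ones(100)
snr_example = snr_db(x, x + 0.1)                  # error energy is 1% of signal energy
gain_example = snr_gain_db(x, x + 0.1, x + 0.01)  # error amplitude reduced tenfold
print(snr_example, gain_example)
```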

¹⁰ The MATLAB routine wiener2 implements the adaptive Wiener filter algorithm described by Lim in [25].

A. LunarLake Simulation Results

LunarLake¹¹ is a 12-bit hyperspectral dataset acquired from AVIRIS¹² having 224 frequency bands (channels), each with a spatial resolution of 1432 × 614, between wavelengths of 400 and 2500 nm. As is often the case with multichannel datasets, LunarLake is large, making estimation by any means a substantial task in terms of computations. To address this, we work with this dataset in two forms, the complete dataset and a sub-dataset created from the complete dataset that contains only a 256 × 256 spatial window of samples in all channels.

The complete dataset was degraded by noise, resulting in an initial average channel SNR of approximately 5 dB. Estimates were generated from this observation using the Soft, BayesShrink, and wiener2 estimators. Fig. 1 shows the SNR for each of the 224 channels of the noisy and estimated signals. Specific SNR values for select channels and the entire dataset are given in Table I. Original, noisy, and estimated versions of channel 50 are shown in Fig. 2. As can be seen, the estimated versions of the signal contain much of the detail from the original noise-free signal that is not detectable in the noisy version, and the estimates from our proposed framework have noticeably less residual noise than the wiener2 estimate. The perceptual error, as measured by the Watson metric, is lowest for the BayesShrink estimate (0.338), followed by the Soft estimate (0.424), and finally by the wiener2 estimate (1.539). This indicates that each of the proposed estimators yields substantially better visual quality than the wiener2 estimator, a fact that is readily apparent in Fig. 2. On average, SNR gains were largest, over 15 dB, using the BayesShrink estimator; by

¹¹ Provided through the courtesy of Jet Propulsion Laboratory, California Institute of Technology, Pasadena, CA.

¹² Airborne Visible/Infrared Imaging Spectrometer operated by NASA. See http://aviris.jpl.nasa.gov for additional information.

TABLE II
AVERAGE CHANNEL SNR VALUES IN DECIBELS FOR OBSERVED AND ESTIMATED VERSIONS OF THE SUBSET OF THE HYPERSPECTRAL DATASET LUNARLAKE AT VARIOUS NOISE VARIANCES. THE BEST-PERFORMING ESTIMATOR IS BAYESSHRINK FOR LOW TO MODERATE NOISE LEVELS (INPUT SNR > 8 dB) AND SOFT-OVERCOMPLETE FOR HIGH NOISE LEVELS (INPUT SNR < 8 dB). WHILE THE SOFT ESTIMATOR YIELDS HIGHER SNR THAN wiener2 AND INDEPENDENT WAVELET, IT CANNOT MATCH THE PERFORMANCE OF THE MORE SOPHISTICATED SOFT-OVERCOMPLETE AND BAYESSHRINK ESTIMATORS

exploiting the interchannel correlation, our framework is able to yield over twice the SNR gain (in dB) of wiener2.

1) Effect of Noise Variance on Estimator Performance: To quantify the performance of the proposed framework as a function of input SNR, observations were generated from the sub-dataset for various noise levels ranging from low noise to high noise. Each resulting observation was estimated using the Soft, Soft-Overcomplete, BayesShrink, Independent Wavelet, and wiener2 estimators. The average SNR gain for each estimation method at the various noise levels is shown in Fig. 3. Estimators from the proposed framework yield notably higher average channel SNR gains than either of the schemes that ignore interchannel correlation.

For low to moderate noise levels, BayesShrink provides the highest SNR gain of all estimators tested. We can see from Fig. 3 that the additional gain it yields falls off with noise level. The reason for this is that the threshold levels used by the BayesShrink and Soft Estimators are both functions of the noise standard deviation and therefore increase with noise level. Eventually, both thresholds will be larger than all wavelet-domain coefficients, and both estimates will become identical. At low noise levels, the additional gain from using the BayesShrink estimator is particularly impressive. At an input SNR of 17.70 dB, the SNR improves by almost 11 dB to 28.85 dB using the BayesShrink Estimator, compared with the Soft Estimator, which has an SNR gain of 8.85 dB.

In contrast to those of the BayesShrink estimator, the additional gains provided by the Soft-Overcomplete estimator are more constant across the noise variances tested. We can see from Fig. 3 and Table II that using an overcomplete wavelet expansion results in an additional 1 to 1.5 dB of SNR compared to using a critically sampled 2-D DWT. These additional gains, although desirable, require the use of a computationally expensive overcomplete wavelet expansion, which may be impractical for many applications due to the generally large size of multichannel datasets. Furthermore, the Soft-Overcomplete estimator only outperforms the BayesShrink estimator for high noise levels, at which the estimate will generally be of poor quality regardless of the estimation method due to the saturation of the desired data with noise.

TABLE III
SNR AND WATSON METRIC VALUES FOR CHANNEL 50 OF THE OBSERVED AND ESTIMATED VERSIONS OF THE SUBSET HYPERSPECTRAL DATASET LUNARLAKE WITH σ = 800

Fig. 4 shows channel 50 of the original, noisy, and estimated versions of the sub-dataset at the noise level of Table III. The BayesShrink estimate for this channel is clearly superior to the other estimates and is a very good match to the original channel. This is further illustrated by the values in Table III, which show that the BayesShrink estimator provides the highest SNR and lowest Watson metric by fairly substantial margins.

2) Effect of Channel Count on Estimator Performance: For a finite but large number of channels, the DFT provides only approximate channel decorrelation. Therefore, it is of interest to investigate the effect of the number of signal channels on the performance of the proposed framework. We accomplish this by considering a range of possible channel counts at a fixed noise variance for the sub-dataset. For a specific channel count, the desired signal was created from the original 224-channel dataset by selecting the most separated channels. Thus, for a channel count of three, channels 1, 113, and 224 would constitute the desired signal. Observed signals were processed with the Soft, Soft-Overcomplete, BayesShrink, wiener2, and Independent Wavelet estimators.

At very low channel counts, several of the methods appear to improve with fewer channels. This is likely due to the construction of the reduced-channel dataset by selecting the most separated channels and the fact that some channels (i.e., 160 and 110) have very low average intensity compared to the rest of the dataset. Thus, when these channels are included and there are few total channels, the impact is significant. One would expect that if all channels had similar intensity, the performance trends would continue at low channel counts as well.

Fig. 4. Channel 50 of the original, observed, and estimated LunarLake sub-dataset with σ = 800. As can be seen from the SNR and Watson metric values listed in Table III, the BayesShrink estimator provides the best estimate of the original channel. Although the Independent Wavelet SNR indicates that it too is an improvement over the wiener2 estimate and the noisy observation, the Watson metric suggests that it is of similar quality to the wiener2 estimate. A visual inspection reveals that the Independent Wavelet estimate is overly smoothed and a poor estimate of the original. The Soft-Overcomplete estimate (not shown) is approximately visually equal to the Soft estimate.

3) Using the DFT to Approximate the Channel KL Transform: A natural question that might arise is how well the DFT approximates the channel KL transform. To address this, we again consider estimating the observation of the sub-dataset at various noise levels and compare using an estimate of the channel KL transform and the DFT to decorrelate between channels. For a given noise level, the channel KL transform is estimated from the noisy observation as described in [1]. Estimators using this estimated channel KL transform rather than the DFT for channel decorrelation are denoted by the prefix U (e.g., U-Soft). In addition, we may determine the maximum performance that can be achieved using an estimate of the channel KL transform by estimating the transform from the original noise-free data. Estimators using this optimal channel KL transform are labeled U-Soft (optimal) and U-BayesShrink (optimal), respectively. Note that since the dataset being used in this example has 256 × 256 samples per channel, the estimated channel correlation matrix (and thus the resulting channel KL transform) should be a good estimate of the true value even when calculated using the observed data.

The results are summarized in Fig. 5, which plots the average SNR gain as a function of channel count for the various estimates. Select SNR values are listed in Table IV for each of the signals. Although the overall SNR increases as channels are added, the additional gain provided by an overcomplete wavelet expansion or the BayesShrink adaptive threshold remains approximately constant. This is due to the fact that after decorrelating the channels via the DFT, we assume there is no residual channel correlation and process the resulting channels independently. Any additional SNR provided by BayesShrink or Soft-Overcomplete is therefore due to the additional SNR that the respective estimator yields on each decorrelated channel and not the number of channels.

Fig. 6 compares the performance of the Soft and BayesShrink estimators, which use the DFT to decorrelate between channels, with the U-Soft, U-Soft (optimal), U-BayesShrink, and U-BayesShrink (optimal) estimators, which use the estimated channel KL transform to decorrelate between channels. The performance difference between using the DFT and an estimated channel KL transform is most significant for low noise levels and slightly more pronounced when the BayesShrink threshold is used. However, as the noise level increases, the performance difference shrinks to where it is substantially smaller than the SNR achieved by the DFT method. From Fig. 6, we see that as the noise level increases, the performance achieved using the DFT to decorrelate in channel approaches that of the optimally estimated channel KL transform. Thus, for low input SNRs, the DFT-based approach achieves near-optimal performance. In addition, as can be seen in Fig. 7, the estimates are visually

Fig. 5. Performance comparison, in average SNR gain per channel, of estimation schemes on a 256 × 256 × 224 subset of the hyperspectral dataset LunarLake for various numbers of signal channels. Graph is of average SNR gain for all channels in the dataset.

TABLE IV
COMPARISON OF SOFT, SOFT-OVERCOMPLETE, BAYESSHRINK, wiener2, AND INDEPENDENT WAVELET ESTIMATES OF THE SUBSET OF THE HYPERSPECTRAL DATASET LUNARLAKE USING VARIOUS NUMBERS OF CHANNELS AT A FIXED NOISE VARIANCE OF σ = 200. TABLE IS OF AVERAGE SNR GAIN IN DECIBELS FOR ALL TESTED CHANNELS IN THE DATASET. THE BAYESSHRINK ESTIMATOR PERFORMS BEST OF ALL METHODS, FOLLOWED BY THE SOFT-OVERCOMPLETE ESTIMATOR AND FINALLY THE SOFT ESTIMATOR

quite similar regardless of which channel KL transform is used. This is also reflected in Table V, which lists the SNR and Watson metric values for each estimate.

For any dataset, using the estimated channel KL transform will produce results equal to, or better than, results obtained using the DFT for channel decorrelation. Achieving these improved results, however, requires the added computations to estimate the channel coherence matrix and calculate the channel KL transform, and forces the estimator to be data dependent. The DFT, though in general suboptimal, suffers from neither of these constraints and is well suited to efficient implementation.

B. fMRI Simulation Results

To demonstrate the generality of the proposed estimation framework, we consider an entirely different imaging modality, namely functional magnetic resonance imaging (fMRI). The fMRI dataset¹³ contains 100 complete volumes of a subject's brain with 22 slices per volume and a spatial resolution of 128 × 128 in each slice, which we cast into a multichannel signal. An observation was generated by the addition of noise to yield an initial average channel SNR of 13.65 dB. This noisy observation was then estimated using the Soft, Soft-Overcomplete, BayesShrink, wiener2, and Independent Wavelet estimators. A typical channel of the dataset is shown in Fig. 8, in which it can easily be seen that the incorporation of interchannel information into the estimation scheme results in substantially improved estimates. It is interesting to note that the Soft, Soft-Overcomplete, and BayesShrink estimators applied to the LunarLake dataset and this dataset are identical except for the threshold level used, which is calculated from data-derived values.

¹³ Provided through the courtesy of Dr. K. Thulborn, Center for Magnetic Resonance Research at the University of Illinois at Chicago, as funded by NIH grant PO1 NS35949.

Fig. 6. Performance comparison, in average SNR gain per channel, of Soft and BayesShrink estimation schemes with U-Soft and U-BayesShrink estimation schemes, which estimated the channel coherence matrix from the observed data. Graph is of average SNR gain for all channels in a 256 × 256 × 224 subset of the hyperspectral dataset LunarLake for various input SNR levels. Note that virtually identical performance is achieved regardless of whether the channel KL transform is estimated using observed (noisy) or noise-free data.

Fig. 7. Channel 50 of the original, observed, and estimated LunarLake sub-dataset with σ = 500. As can be seen from the images, or the corresponding SNR and Watson metric values listed in Table V, the performance difference between using the DFT and estimating the channel KL transform is not substantial at this noise level. This is particularly true for the better performing BayesShrink estimator.

VII. CONCLUSION

Near-optimal estimation of arbitrary 2-D multichannel signals is accomplished by the proposed wavelet-based framework. This scheme was designed to rival the multichannel Wiener filter in performance while also overcoming the issues that often render the Wiener filter impractical. Computing the multichannel Wiener filter requires second-order statistics of the data being estimated, which makes this estimator depend heavily on that data. The actual calculation of the Wiener filter requires a matrix inversion, which is intractable for large multichannel datasets. A final shortcoming of this optimal linear estimator is that, once calculated, it generally takes a form that is inefficient to implement. Our proposed framework overcomes all three of these limitations.
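To give a sense of scale for the matrix inversion discussed in this paragraph, the following back-of-the-envelope sketch counts the entries of the covariance matrix that a direct multichannel Wiener filter would have to handle for the 256 × 256 × 224 LunarLake subset. Dense storage at 8 bytes per entry is an illustrative assumption, not a figure taken from the paper.

```python
# Size of the covariance matrix a direct multichannel Wiener filter would invert.
# Assumption (not from the paper): dense storage, 8 bytes per matrix entry.
rows, cols, channels = 256, 256, 224   # LunarLake subset dimensions

n = rows * cols * channels             # length of the stacked signal vector
entries = n * n                        # the covariance matrix is n x n
bytes_needed = 8 * entries             # double-precision storage

print(f"signal length:      {n:,}")
print(f"covariance entries: {entries:.3e}")
print(f"dense storage:      {bytes_needed / 1e15:.2f} PB")
```

Even before any inversion cost is considered, simply holding such a matrix is out of reach, which is why a transform-domain approximation is attractive.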

Fig. 8. Typical original, observed, and estimated channels of the fMRI dataset with σ = 400. Virtually none of the fine details present in the original signal are recovered by the wiener2 and Independent Wavelet estimators. In contrast, estimators from the proposed framework, especially BayesShrink, reconstruct visually accurate estimates of the original data. The Soft-Overcomplete estimate is not shown as it is approximately visually equal to the Soft estimate.

TABLE V. COMPARISON OF SNR AND WATSON METRIC VALUES FOR CHANNEL 50 OF THE OBSERVED AND ESTIMATED VERSIONS OF THE SUBSET HYPERSPECTRAL DATASET LUNARLAKE WITH σ = 500. USING A DATA-DERIVED ESTIMATE OF THE CHANNEL KL TRANSFORM FOR CHANNEL DECORRELATION [U-SOFT, U-SOFT (OPTIMAL), U-BAYESSHRINK, AND U-BAYESSHRINK (OPTIMAL)] RESULTS IN SLIGHTLY BETTER PERFORMANCE THAN USING THE DFT. THE PERFORMANCE INCREASE, HOWEVER, IS RELATIVELY SMALL COMPARED TO THE DIFFERENCE BETWEEN NOISY AND ESTIMATED

Analysis of the multichannel Wiener filter allowed for the isolation of the components that are responsible for the spatial and channel decorrelation as well as the ideal coefficient weighting values. Approximating the optimal spatial decorrelation and coefficient weighting with a 2-D discrete wavelet transform and a wavelet-domain thresholding operation is central to our scheme’s ability to overcome the limitations of the multichannel Wiener filter without sacrificing performance. Additionally, use of the 2-D DWT permits efficient implementation of our framework for both stationary and nonstationary signals. When channel coherence depends only on channel separation and not on absolute channel values, a blind estimator can be realized that utilizes only the FFT, 2-D DWT, and a wavelet-domain thresholding operation. This version of the proposed estimator further improves implementation efficiency and requires no knowledge of the signal, channel type, channel coherence, or corrupting noise, making it well suited to many applications. Alternatively, one may estimate the optimal channel decorrelating transform from the observed data. Doing so will improve the resulting estimate when the channel coherence does not depend solely on channel separation, but results in an estimator that in general cannot be implemented as efficiently (as using the DFT) and is directly dependent on the observed data.
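The three stages of the blind estimator named above (a DFT across channels, a 2-D DWT in space, and a wavelet-domain threshold) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a single-level orthonormal Haar wavelet, a direct DFT rather than an FFT, a user-supplied threshold `t` rather than a data-derived one, and magnitude-based soft thresholding of the complex coefficients. All function names are hypothetical.

```python
import cmath

def dft_channels(x, inverse=False):
    """Unitary DFT along the channel axis of x[c][i][j] (direct sum, not an FFT)."""
    K, n, m = len(x), len(x[0]), len(x[0][0])
    sign = 1 if inverse else -1
    out = [[[0j] * m for _ in range(n)] for _ in range(K)]
    for k in range(K):
        for c in range(K):
            w = cmath.exp(sign * 2j * cmath.pi * k * c / K) / K ** 0.5
            for i in range(n):
                for j in range(m):
                    out[k][i][j] += w * x[c][i][j]
    return out

def haar2d(img):
    """Single-level orthonormal 2-D Haar DWT; image sides must be even."""
    n2, m2 = len(img) // 2, len(img[0]) // 2
    bands = [[[0j] * m2 for _ in range(n2)] for _ in range(4)]
    for i in range(n2):
        for j in range(m2):
            a, b = img[2 * i][2 * j], img[2 * i][2 * j + 1]
            c, d = img[2 * i + 1][2 * j], img[2 * i + 1][2 * j + 1]
            bands[0][i][j] = (a + b + c + d) / 2   # approximation
            bands[1][i][j] = (a - b + c - d) / 2   # detail 1
            bands[2][i][j] = (a + b - c - d) / 2   # detail 2
            bands[3][i][j] = (a - b - c + d) / 2   # detail 3
    return bands

def ihaar2d(bands):
    """Inverse of haar2d (the 4x4 Haar butterfly is its own inverse)."""
    n2, m2 = len(bands[0]), len(bands[0][0])
    img = [[0j] * (2 * m2) for _ in range(2 * n2)]
    for i in range(n2):
        for j in range(m2):
            p, q, r, s = (bands[b][i][j] for b in range(4))
            img[2 * i][2 * j] = (p + q + r + s) / 2
            img[2 * i][2 * j + 1] = (p - q + r - s) / 2
            img[2 * i + 1][2 * j] = (p + q - r - s) / 2
            img[2 * i + 1][2 * j + 1] = (p - q - r + s) / 2
    return img

def soft(w, t):
    """Soft threshold on the magnitude of a (possibly complex) coefficient."""
    mag = abs(w)
    return w * (1 - t / mag) if mag > t else 0j

def estimate(y, t):
    """DFT across channels, Haar DWT in space, threshold details, invert both."""
    denoised = []
    for plane in dft_channels(y):
        bands = haar2d(plane)
        for b in (1, 2, 3):  # leave the approximation band untouched
            bands[b] = [[soft(w, t) for w in row] for row in bands[b]]
        denoised.append(ihaar2d(bands))
    out = dft_channels(denoised, inverse=True)
    return [[[v.real for v in row] for row in ch] for ch in out]

# Two channels of a 2 x 2 image, purely illustrative values.
y = [[[1.0, 2.0], [3.0, 4.0]], [[1.5, 2.5], [2.5, 4.5]]]
x_hat = estimate(y, t=0.5)
```

With `t = 0` the chain reduces to the identity, since both transforms are orthonormal; any positive threshold can only shrink coefficient magnitudes.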

The utility of the proposed framework in multiple imaging modalities is apparent from its performance on hyperspectral and functional MRI datasets. Average channel SNR gains of approximately 16 dB were demonstrated for hyperspectral imagery and individual channel gains of 12 dB were shown for the fMRI dataset. This significantly surpasses the gains provided by individual channel estimation schemes, which yield SNR gains of roughly 7 dB in each case. In addition to an increased SNR, our estimator provides substantially improved visual quality in all simulation examples. Contrary to the multichannel Wiener filter, the same estimator can be used to achieve the gains for both the hyperspectral and functional MRI datasets, as all data-dependent parameters are extracted from the noisy data.
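For readers who want to reproduce figures of this kind, an SNR gain can be computed as the difference between the SNR of the estimate and the SNR of the noisy observation, both measured against the noise-free signal. The sketch below assumes the common definition of SNR as signal energy over error energy; the exact definition used in the paper is not restated in this section, so treat that as an assumption. The function names are hypothetical.

```python
import math

def snr_db(reference, estimate):
    """SNR in dB of `estimate` measured against the noise-free `reference`."""
    signal = sum(r * r for r in reference)
    error = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return 10 * math.log10(signal / error)

def snr_gain_db(reference, observed, estimated):
    """Improvement in SNR (dB) of the estimate over the noisy observation."""
    return snr_db(reference, estimated) - snr_db(reference, observed)

# Toy one-channel example with flattened pixel values (illustrative only).
reference = [1.0, 2.0, 3.0, 4.0]
observed = [1.5, 1.5, 3.5, 3.5]    # error energy 1.0
estimated = [1.1, 1.9, 3.1, 3.9]   # error energy 0.04
gain = snr_gain_db(reference, observed, estimated)
print(f"SNR gain: {gain:.2f} dB")
```

An average channel gain, as quoted above, would then be the mean of this quantity over all channels.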

Since the proposed framework utilizes a 2-D DWT, estimation results can be improved using an overcomplete wavelet expansion. SNR improvements of 1 to 1.5 dB were demonstrated for a hyperspectral dataset using an overcomplete wavelet expansion versus an orthonormal wavelet expansion. These gains were fairly constant across various input SNR levels and numbers of signal channels.
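One common way to obtain the shift-invariant behavior of an overcomplete expansion is cycle spinning: denoise circularly shifted copies of the signal with the ordinary orthonormal transform, undo the shifts, and average. The 1-D sketch below uses this as a stand-in to illustrate the idea; it is not claimed to be the overcomplete transform used in the paper, and the single-level Haar wavelet and function names are assumptions made for brevity.

```python
import math

def haar_denoise(x, t):
    """Single-level orthonormal Haar DWT, soft-threshold details, inverse DWT."""
    s = math.sqrt(2.0)
    out = []
    for a, b in zip(x[0::2], x[1::2]):
        approx, detail = (a + b) / s, (a - b) / s
        detail = math.copysign(max(abs(detail) - t, 0.0), detail)
        out += [(approx + detail) / s, (approx - detail) / s]
    return out

def cycle_spin_denoise(x, t):
    """Average Haar denoising over both alignments of the sample pairs.

    For one decomposition level only shifts 0 and 1 give distinct pairings.
    The signal length must be even; shifts are circular.
    """
    n = len(x)
    total = [0.0] * n
    for shift in (0, 1):
        shifted = x[shift:] + x[:shift]            # circular left shift
        den = haar_denoise(shifted, t)
        if shift:
            den = den[-shift:] + den[:-shift]      # undo the shift
        for i in range(n):
            total[i] += den[i] / 2
    return total

x = [4.0, 4.2, 3.9, 4.1, 0.1, -0.2, 0.0, 0.2]
smoothed = cycle_spin_denoise(x, t=0.3)
```

The averaging is what costs the extra computation: each additional shift requires another full transform, threshold, and inverse.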

Using the BayesShrink adaptive wavelet threshold, rather than a simple soft threshold, to denoise the decorrelated coefficients improved the estimate SNR of the hyperspectral dataset by more than 2 dB for low noise levels. This 2 dB is in addition to the nearly 8.85-dB SNR gain achieved by the Soft estimator. The additional gain over the soft threshold provided by the BayesShrink threshold falls off with increased noise level, but remains approximately constant regardless of the number of channels. It was argued that the BayesShrink adaptive threshold is applicable to multichannel signal estimation provided that the detail subbands of the original signal obey a generalized Gaussian distribution.
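As a reminder of how a BayesShrink-style threshold adapts to each subband, the sketch below follows the rule as it is commonly stated: the threshold is the noise variance divided by the estimated standard deviation of the noise-free coefficients, with the noise level estimated from the median absolute value of the finest diagonal subband. It operates on real-valued coefficients for simplicity, whereas the coefficients in the proposed framework are complex after the channel DFT, so this is a simplified illustration and the function names are hypothetical.

```python
import statistics

def mad_noise_sigma(finest_diagonal):
    """Robust noise estimate: median(|w|) / 0.6745 on the finest diagonal subband."""
    return statistics.median([abs(w) for w in finest_diagonal]) / 0.6745

def bayes_shrink_threshold(subband, noise_sigma):
    """Subband-adaptive threshold T = sigma_n^2 / sigma_x."""
    var_y = sum(w * w for w in subband) / len(subband)   # coefficients assumed zero-mean
    var_x = max(var_y - noise_sigma ** 2, 0.0)           # estimated signal variance
    if var_x == 0.0:
        # Noise dominates: pick a threshold that zeroes the whole subband.
        return max(abs(w) for w in subband)
    return noise_sigma ** 2 / var_x ** 0.5

# Illustrative subband with strong signal content relative to the noise.
subband = [3.0, -3.0, 3.0, -3.0]
t_low_noise = bayes_shrink_threshold(subband, noise_sigma=1.0)
t_high_noise = bayes_shrink_threshold(subband, noise_sigma=5.0)
```

Subbands dominated by signal receive a small threshold and are largely preserved, while subbands dominated by noise are suppressed entirely.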

Although it was not specifically explored in this paper, one would expect that a combination of an overcomplete wavelet expansion and an adaptive wavelet-domain threshold would yield results superior to either alone. While this would require using the computationally costly overcomplete expansion, the potentially improved results may be worth these added costs. This is particularly true in applications such as medical imaging, where estimation quality is of extreme importance.

REFERENCES

[1] B. R. Hunt and O. Kübler, “Karhunen–Loeve multispectral imagerestoration, part I: theory,” IEEE Trans. Acoust., Speech, SignalProcess., vol. ASSP-32, no. 3, pp. 592–600, Jun. 1984.

[2] N. P. Galatsanos, A. K. Katsaggelos, R. T. Chin, and A. D. Hillery,“Least squares restoration of multichannel images,” IEEE Trans. SignalProcess., vol. 39, no. 10, pp. 2222–2236, Oct. 1991.

[3] N. P. Galatsanos and R. T. Chin, “Digital restoration of multichannelimages,” IEEE Trans. Acoust., Speech, Signal Process., vol. 37, no. 3,pp. 415–421, Mar. 1989.

[4] R. R. Schultz and R. L. Stevenson, “Stochastic modeling and estimationof multispectral image data,” in Proc. IEEE Int. Conf. Speech, Acoustics,Signal Processing, 1994, pp. V/373–V/367.

[5] C. Kotropoulos and I. Pitas, Nonlinear Model-Based Image/Video Pro-cessing and Analysis. New York: Wiley, 2001.

[6] A. Rao and D. Jones, “A denoising approach to multichannel signal esti-mation,” in Proc. IEEE Int. Conf. Speech, Acoustics, Signal Processing,vol. 5, 1999, pp. 2869–2872.

[7] , “A denoising approach to multisensor signal estimation,” IEEETrans. Signal Process., vol. 48, no. 5, pp. 1225–1234, May 2000.

[8] S. G. Mallat, A Wavelet Tour of Signal Processing. San Diego, CA:Academic, 1999.

[9] S. G. Chang, B. Yu, and M. Vetterli, “Spatially adaptive wavelet thresh-olding with context modeling for image denoising,” IEEE Trans. ImageProcess., vol. 9, no. 9, pp. 1522–1531, Sep. 2000.

[10] , “Adaptive wavelet thresholding for image denoising and compres-sion,” IEEE Trans. Image Process., vol. 9, no. 9, pp. 1532–1546, Sep.2000.

[11] D. Wei and A. C. Bovik, Wavelet Denoising for Image Enhance-ment. San Diego, CA: Academic, 2000, pp. 117–123.

[12] D. L. Donoho, “De-noising by soft-thresholding,” IEEE Trans. Inf.Theory, vol. 41, no. 3, pp. 613–627, May 1995.

[13] M. Vetterli and J. Kovacevic, Wavelets and Subband Coding. UpperSaddle River: Prentice-Hall, 1995.

[14] S. G. Mallat, “A theory for multiresolutional signal decomposition: thewavelet representation,” IEEE Trans. Pattern Anal. Mach. Intell., vol.11, no. 7, pp. 674–693, Jul. 1989.

[15] D. L. Donoho, “Unconditional bases are optimal bases for data com-pression and for statistical estimation,” Appl. Comput. Harmon. Anal.,pp. 100–115, Dec. 1993.

[16] S. Sardy, “Minimax threshold for denoising complex signals,” IEEETrans. Signal Process., vol. 48, no. 4, pp. 1023–1028, Apr. 2000.

[17] I. Atkinson, “Wavelet-based near-optimal estimation of multichanneltwo-dimensional signals,” M.S. thesis, Univ. Illinois Urbana-Cham-paign, Urbana, 2003.

[18] M. Lang, H. Guo, J. Odegard, C. Burrus, and R. Wells, “Noise reductionusing an undecimated discrete wavelet transform,” IEEE Signal Process.Lett., vol. 3, no. 1, pp. 10–12, Jan. 1996.

[19] M. N. Do and M. Vetterli, “Wavelet-based texture retrieval using gen-eralized Gaussian density and Kullback-Leibler distance,” IEEE Trans.Image Process., vol. 11, no. 2, pp. 146–158, Feb. 2002.

[20] V. Barnett, “The ordering of multivariate data,” J. Roy. Statist. Soc. A,vol. 139, no. III, pp. 318–355, 1976.

[21] C. Kotropoulos and I. Pitas, “Multichannel L filters based on marginaldata ordering,” IEEE Trans. Signal Process., vol. 42, no. 10, pp.2581–2595, Oct. 1994.

[22] J. W. Brewer, “Kronecker products and matrix calculus in systemtheory,” IEEE Trans. Circuits Syst., vol. CAS-25, no. 9, pp. 772–781,Sep. 1978.

[23] H. Stark and J. Woods, Probability and Random Processes With Appli-cations to Signal Processing, 3rd ed. Upper Saddle River, NJ: Pren-tice-Hall, 2002.

[24] M. N. Wernick, E. J. Infusino, and M. Milosevic, “Fast spatio-temporalimage reconstruction for dynamic PET,” IEEE Trans. Med. Imag., vol.18, no. 3, pp. 185–195, Mar. 1999.

[25] J. S. Lim, Two-Dimensional Signal and Image Processing. EnglewoodCliffs, NJ: Prentice-Hall, 1990.

[26] A. B. Watson, “Perceptual optimization of DCT color quantization ma-trices,” in Proc. IEEE Int. Conf. Image Processing, 1994, pp. 100–104.

Ian Atkinson (S’02) received the B.S. degree in computer engineering from the Milwaukee School of Engineering, Milwaukee, WI, in 2001, and the M.S.E.E. degree from the University of Illinois at Urbana-Champaign, Urbana, in 2003, where he is currently pursuing the Ph.D. degree in the Electrical and Computer Engineering Department.

His research interests include multidimensional signal processing, statistical signal processing, signal estimation, and applications of wavelets.

Farzad Kamalabadi received the B.S. degree in computer systems engineering from the University of Massachusetts, Amherst, in 1991, and the M.S. and Ph.D. degrees in electrical engineering from Boston University, Boston, MA, in 1994 and 2000, respectively.

Since 2000, he has been an Assistant Professor in the Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, Urbana. In Fall 2002, he was a Visiting Fellow with SRI International, and in Summer 2003, he was a National Aeronautics and Space Administration (NASA) Faculty Fellow with the Jet Propulsion Laboratory, California Institute of Technology, Pasadena. His research interests include multidimensional and statistical signal processing; inverse problems in image formation, sensor array processing, and tomography; and the development of optical and radio sensing and imaging techniques and their applications to space physics, astronomy, and medicine.

Dr. Kamalabadi received a National Science Foundation (NSF) Fellowship in 1994, a Best Student Paper Award in 1997, a NASA fellowship in 1998, and an NSF Faculty Early Career Development (CAREER) Award in 2002. He has served in various organizational capacities, including session organizer and chair for the International Union of Radio Science special sessions on ionospheric imaging in 2003 and 2004.


Satish Mohan received the B.S. degree in computer engineering and the M.S. degree in electrical engineering from the University of Illinois at Urbana-Champaign (UIUC), Urbana, in 2001 and 2003, respectively, where he is currently pursuing the Ph.D. degree in electrical engineering.

He is currently a Graduate Research Assistant with the Department of Electrical and Computer Engineering and the Beckman Institute, UIUC. His research interests include beamforming, direction-finding, and hearing aids.

Douglas L. Jones received the B.S.E.E., M.S.E.E., and Ph.D. degrees from Rice University, Houston, TX, in 1983, 1985, and 1987, respectively.

During the 1987 to 1988 academic year, he was with the University of Erlangen-Nuremberg, Germany, on a Fulbright postdoctoral fellowship. Since 1988, he has been with the University of Illinois at Urbana-Champaign, Urbana, where he is currently a Professor of electrical and computer engineering in the Coordinated Science Laboratory and the Beckman Institute. He was on sabbatical leave from the University of Washington, Seattle, in Spring 1995, and the University of California, Berkeley, in Spring 2002. During the 1999 spring semester, he served as the Texas Instruments Visiting Professor at Rice University, Houston, TX. He is an author of two DSP laboratory textbooks. His research interests are in digital signal processing and communications, including nonstationary signal analysis, adaptive processing, multisensor data processing, OFDM, and various applications such as advanced hearing aids.

Dr. Jones was selected as the 2003 Connexions Author of the Year. He served on the Board of Governors of the IEEE Signal Processing Society from 2002 to 2004.