
4374 IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, VOL. 9, NO. 9, SEPTEMBER 2016

A Dissimilarity-Weighted Sparse Self-Representation Method for Band Selection in Hyperspectral Imagery Classification

Weiwei Sun, Liangpei Zhang, Senior Member, IEEE, Lefei Zhang, Member, IEEE, and Yenming Mark Lai

Abstract—A new dissimilarity-weighted sparse self-representation (DWSSR) method is presented to select a proper band subset for hyperspectral imagery (HSI) classification. The DWSSR assumes that all bands can be represented by the selected band subset, and it formulates the sparse representation of all bands as a sparse self-representation (SSR) model with a row-sparsity constraint on the coefficient matrix. Furthermore, the DWSSR integrates a dissimilarity-weighted regularization term into the SSR model to avoid the problem of too-close bands encountered in the SSR. The regularization term accounts for the cost of encoding all bands with the representative bands, and a new composite dissimilarity measure that combines spectral information divergence with intraband correlation is implemented to estimate the encoding weights. The DWSSR program is solved within the alternating direction method of multipliers (ADMM) framework, and the representative bands are finally selected according to the norm rankings of the nonzero rows in the estimated coefficient matrix. Five groups of experiments on three popular HSI datasets are designed to test the performance of DWSSR in band selection, and five state-of-the-art methods are used for comparison. The results show that the DWSSR performs almost best among all six methods, in both computational time and classification accuracy.

Index Terms—Band selection, classification, dissimilarity-weighted sparse self-representation (DWSSR), hyperspectral imagery (HSI).

Manuscript received October 20, 2015; revised December 29, 2015; accepted January 20, 2016. Date of publication April 12, 2016; date of current version September 30, 2016. This work was supported in part by the National Natural Science Foundation under Grants 41401389 and 91338111, in part by the 57th Chinese Postdoctoral Science Foundation under Grant 2015M570668, in part by the Public Projects of Zhejiang Province under Grant 2016C33021, in part by the Ningbo Social Science and Technology Project under Grant 2014C50067, in part by the Key Laboratory of Mapping from Space, National Administration of Surveying, Mapping and Geoinformation, under Grant K201505, and in part by the K. C. Wong Magna Fund of Ningbo University.

W. Sun is with the State Key Laboratory of Information Engineering in Surveying, Mapping, and Remote Sensing, Wuhan University, Wuhan 430079, China, and with the College of Architectural Engineering, Civil Engineering and Environment, Ningbo University, Ningbo 315211, China (e-mail: [email protected]).

L. Zhang is with the State Key Laboratory of Information Engineering in Surveying, Mapping, and Remote Sensing, Wuhan University, Wuhan 430079, China.

L. Zhang is with the Department of Computing, The Hong Kong Polytechnic University, Kowloon, Hong Kong, China.

Y. M. Lai is with Applied Mathematics, Statistics, and Scientific Computation, University of Maryland, College Park, College Park, MD 20742 USA.

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/JSTARS.2016.2539981

I. INTRODUCTION

DURING the past few decades, the hyperspectral imaging technique has attracted much attention from scholars and manufacturers worldwide because of its powerful performance in collecting both spectral reflectance and images of ground objects on the earth's surface [1]–[3]. The classification map of hyperspectral imagery (HSI) plays significant roles in environmental monitoring [4], [5], geological exploration [6], [7], precision farming [8], [9], and national defense [10]. However, the numerous bands, along with strong intraband correlations, also create big problems for classification operations [11]. In particular, the "Hughes" problem demands far more training samples of ground objects to guarantee high classification accuracies, whereas collecting more training samples is prohibitive and time-consuming [12]. Therefore, dimensionality reduction is an alternative way to overcome the above problems [13].

Dimensionality reduction methods can be classified into two groups: 1) band selection and 2) feature extraction [14], [15]. Band selection selects an appropriate subset from the original band set of the HSI data, while feature extraction transforms the spectral vectors into a low-dimensional feature space while preserving significant spectral information. In this paper, we focus on band selection, because the selected band subset inherits the original spectral interpretation of the HSI data, in contrast to feature extraction.

Generally, scholars select an appropriate band subset using two main schemes: the maximum information or minimum correlation (MIMC) scheme and the maximum interclass separability (MIS) scheme [3], [16]. MIMC chooses an appropriate band subset with three main criteria: the entropy criterion, the intraband correlation criterion, and the cluster criterion. The entropy criterion works by maximizing the overall amount of information using entropy-like measurements [17], [18]. The band subset from the intraband correlation criterion has minimal intraband redundancy; examples include the joint band-prioritization and band-decorrelation method [19], the constrained energy minimization (CEM)-based method [20], and the maximum discrimination and information-based semisupervised algorithm [21]. The cluster criterion considers intraband correlations and obtains a band subset through band clustering; examples include the hierarchical clustering-based method [22], the ranking-based clustering method [23], and the column subset selection-based method [24].

The MIS scheme maximizes the separability between different classes of ground objects using the following criteria: the distance measurement criterion, the feature transformation criterion, and the realistic application criterion. The distance measurement criterion is implemented with a distance-like measurement such as the Euclidean distance (ED), spectral information divergence (SID), or Mahalanobis distance (MD) [25], [26]. The feature transformation criterion maximizes the interclass separability of ground objects in a transformed low-dimensional feature space; examples include the particle swarm optimization-based algorithm [27], the complex network algorithm [28], and the evolutionary multiobjective optimization algorithm [29]. The realistic application criterion optimizes the objective function defined by a certain application; typical examples are the progressive method for spectral unmixing [30], the dominant set extraction method [31], and the minimum estimated abundance covariance method for visualization [32].

With the recent popularity of compressive sensing, sparsity-based methods have been proposed to investigate the band selection problem. Sparsity theory states that each band can be sparsely represented using only a few nonzero coefficients in a suitable basis or dictionary [33], [34]. The sparse representation of a band vector reveals certain underlying structures within the HSI data and drastically lowers the computational burden of HSI data processing [35]. The sparse representation-based method selects the most frequent bands in the histogram of the sparse coefficient matrix to constitute the aimed band subset [36]. The sparse nonnegative matrix factorization (SNMF)-based methods factorize the HSI data matrix into a dictionary matrix and a sparse coefficient matrix, and clustering on the columns of the coefficient matrix helps estimate a proper band subset [37], [38]. The collaborative sparse model refines the bands preselected by the combination of N-FINDR and linear prediction [39]. The sparse support vector machine (SVM) method picks important bands using a clear gap between zero and nonzero weights in the model [40]. The improved sparse subspace clustering (ISSC) method finds a band subset by spectral clustering on the similarity matrix constituted of sparse coefficient vectors [41]. The sparse CEM method regularizes the regular CEM operator with a sparse constraint and solves a convex quadratic programming problem to select the important bands [42]. Further sparsity-based algorithms include the least absolute shrinkage and selection operator-based method [43] and the discriminative sparse multimodal learning-based method [44].

In this paper, we present a dissimilarity-weighted sparse self-representation (DWSSR) method to solve the band selection problem. In particular, our motivation is to present the sparse self-representation (SSR) model of HSI band vectors, to regularize the SSR model with a dissimilarity regularization term that probes the dissimilarity among all band vectors, and to apply the integrated DWSSR model to the band selection problem. Compared with current sparsity-based methods, our DWSSR method offers three main contributions.

First, the DWSSR improves on the SSR model and makes a more flexible assumption about the HSI dataset than many current methods such as SNMF and ISSC. The DWSSR only requires that the size of the selected band subset be much smaller than the number of HSI bands; it does not require the HSI dataset to be low rank or to be sampled from several underlying independent subspaces.

Second, the DWSSR proposes a composite dissimilarity measurement (CDM) to measure the dissimilarity between pairwise bands and integrates the dissimilarity-weighted regularization term with the SSR model. The presented CDM quantifies the divergences in both information amount and intraband correlation among all bands, and it has a significantly positive effect on the performance of DWSSR for band selection.

Finally, the DWSSR optimizes the sparse coefficient matrix by solving a nonlinear optimization problem via the alternating direction method of multipliers (ADMM) framework and directly estimates the exemplar bands from the nonzero row vectors of the sparse coefficient matrix. The DWSSR does not involve other complex procedures, such as clustering or frequency counting in the histogram of the coefficient matrix, that are essential in many other methods; it is therefore easy to implement and computationally efficient.

This paper is organized as follows. Section II briefly reviews the theory of the SSR model. Section III presents the band selection method using the DWSSR model. Section IV analyzes the performance of DWSSR for classification on three widely used HSI datasets. Section V concludes and outlines our future work.

II. BRIEF REVIEW OF SSR MODEL

In this section, we briefly review the theory of the SSR model. Consider a noise-free high-dimensional dataset as a collection of vectors $Y = \{y_j\}_{j=1}^{N} \in \mathbb{R}^{D \times N}$, where $D$ is the dimensionality of the high-dimensional feature space and $N$ is the number of data points, with $N \ll D$. The SSR model states that the data matrix $Y$ can be represented by itself with a sparse coefficient matrix. The formulation of SSR is defined as follows [45]:

$$Y_{D \times N} = Y_{D \times N}\, Z_{N \times N}, \quad \text{s.t. } |S| \le k, \; \operatorname{diag}(Z) = 0 \tag{1}$$

where $Z = \{z_j\}_{j=1}^{N}$ is the sparse coefficient matrix, $S = \operatorname{supp}(Z) = \{1 \le i \le N : z^i \ne 0\}$, where $z^i$ is the $i$th row vector of $Z$, and $\operatorname{diag}(Z)$ is the matrix constituted by the diagonal entries of $Z$. The support constraint $|S| \le k$ restricts the number of rows of $Z$ containing nonzero entries. The constraint $\operatorname{diag}(Z) = 0$ eliminates the trivial solution in which each data point is simply a linear combination of itself. Each pairwise column $y_j$ and $z_j$ from $Y$ and $Z$ in (1) formulates a single measurement vector problem $y_j = Y z_j$, which states that each data point $y_j$ can be reconstructed from a sparse coefficient vector $z_j$ using the common dictionary $Y$. The solution of $Z$ in (1) is usually obtained by minimizing the following objective function:

$$\arg\min \|Z\|_q = \min \left\| \left( \sum_{j=1}^{N} z_{ij} \right)_{i \in \Omega} \right\|_q, \quad \Omega = \{1, 2, \ldots, N\}, \quad \text{s.t. } Y = YZ, \; \operatorname{diag}(Z) = 0 \tag{2}$$


where $\|Z\|_q = \left\| \left( \sum_{j=1}^{N} z_{ij} \right)_{i \in \Omega} \right\|_q$ denotes the $\ell_q$-norm of the sparse matrix $Z$, with the available choices of $q = 0, 1, 2$, or mixed forms between 0 and 2. When $q = 0$, the resulting $\ell_0$-norm optimization counts the number of nonzero rows of $Z$. Considering the high computational complexity and nonconvexity of $\ell_0$-norm optimization, the problem is usually relaxed into alternative convex programming problems using the $\ell_1$-, $\ell_{1,2}$-, and $\ell_2$-norms, and nonlinear optimization algorithms are accordingly utilized to help estimate the optimal solution $Z$. Typical examples are the interior point method using the $\ell_1$-norm [46], the spectral projected gradient algorithm using the $\ell_{1,2}$-norm [47], the generalized subspace pursuit using the $\ell_2$-norm [48], and the sparse randomized Kaczmarz algorithm using the $\ell_2$-norm [49].
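To make the row-wise norms above concrete, the following minimal sketch (our own NumPy illustration, not code from the paper; the function names are hypothetical) evaluates the mixed $\ell_{1,2}$-norm as a sum of row $\ell_2$-norms and counts the row support $|S|$ that the $\ell_0$ formulation measures.

```python
import numpy as np

def row_mixed_norm(Z, q=2):
    """Sum of per-row lq-norms of Z; q = 2 gives the l1,2-norm used later in (4)."""
    return float(np.sum(np.linalg.norm(Z, ord=q, axis=1)))

def row_support_size(Z, tol=1e-10):
    """|S|: the number of rows of Z that contain nonzero entries (l0 row count)."""
    return int(np.sum(np.linalg.norm(Z, axis=1) > tol))

Z = np.zeros((6, 6))
Z[1] = [0.2, 0.0, 0.3, 0.1, 0.0, 0.4]   # two nonzero rows, so |S| = 2
Z[4] = [0.1, 0.5, 0.0, 0.2, 0.1, 0.1]
print(row_mixed_norm(Z), row_support_size(Z))
```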

III. PROPOSED DWSSR METHOD FOR BAND SELECTION

In this section, the proposed band selection method using DWSSR is described. Section III-A presents the SSR of all band vectors. Section III-B describes the dissimilarity-weighted regularization term using the CDM between pairwise band vectors. Section III-C explains the DWSSR model and its solution. Section III-D summarizes the procedure of our proposed DWSSR for band selection.

A. SSR Model for All Band Vectors

Assume the HSI dataset is a collection of band vectors $Y = \{y_j\}_{j=1}^{N} \in \mathbb{R}^{D \times N}$, where $D$ is the dimension of the high-dimensional feature space and equals the number of pixels in the image scene, and $N$ is the number of bands, with $N \ll D$. In the hyperspectral field, selecting a band subset can be regarded as finding representatives, or exemplars, in the original band set, such that each band can be described as a linear combination of a few of the representatives. Accordingly, the problem is formulated as an SSR model in which the band matrix $Y$ is sparsely represented by itself with a coefficient matrix $Z$ that has few rows containing nonzero entries. The collection of representatives $Y_S = \{y_t\}_{t=1}^{k} \in \mathbb{R}^{D \times k}$ corresponding to the indices of the nonzero rows of $Z$ constitutes the aimed band subset, where $k$ is the size of the band subset, with $k \ll N$. Considering that band vectors are contaminated with Gaussian noise, the SSR model for all band vectors is shown in the following equation:

$$Y = YZ + E, \quad \text{s.t. } |S| \le k, \; Z \ge 0, \; \operatorname{diag}(Z) = 0, \; \sum_{i=1}^{N} z_{ij} = 1 \;\; \forall j \tag{3}$$

where $Z \in \mathbb{R}^{N \times N}$ and $E \in \mathbb{R}^{D \times N}$ are the sparse coefficient matrix and the error term of all band vectors, respectively. The error matrix $E$ results from approximation errors in the representation by the representative bands and from Gaussian noise in the band vectors. The support operation $|S| = \operatorname{supp}(Z) = \|Z\|_0 = \sum_{i=1}^{N} \operatorname{ind}(\|z^i\|_0 > 0)$ counts the number of rows of the coefficient matrix $Z$ containing nonzero entries, where $z^i$ is an arbitrary row vector of $Z$ and $\operatorname{ind}(\cdot)$ is the indicator function. The constraint $\operatorname{diag}(Z) = 0$ eliminates the trivial solution in which each band is simply a representation of itself. The nonnegativity constraint $Z \ge 0$ ensures that each nonzero entry of $z_j$ represents the probability of the representative bands in reconstructing an arbitrary $j$th band vector. The sum-to-one constraint $\sum_{i=1}^{N} z_{ij} = 1$ guarantees that the probability of an arbitrary $j$th band being reconstructed by the other bands equals one, and it also guarantees that the selected bands are invariant with respect to a global translation of the HSI data. Moreover, the nonnegativity and sum-to-one constraints have the benefit that the representative bands geometrically correspond to vertices of the convex hull of the band dataset (i.e., each band is regarded as a high-dimensional point).
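As a quick sanity check of model (3), the sketch below (ours; the helper name is hypothetical) verifies the nonnegativity, zero-diagonal, and sum-to-one constraints on a candidate coefficient matrix and recovers the error term E.

```python
import numpy as np

def check_ssr_constraints(Y, Z, tol=1e-8):
    """Verify the constraints of model (3) and return the residual E = Y - Y Z."""
    assert np.all(Z >= -tol), "Z must be nonnegative"
    assert np.allclose(np.diag(Z), 0.0, atol=tol), "diag(Z) must be zero"
    assert np.allclose(Z.sum(axis=0), 1.0, atol=tol), "columns must sum to one"
    return Y - Y @ Z   # E: approximation error plus Gaussian noise
```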

Considering the high computational complexity of $\operatorname{supp}(Z)$, or $\|Z\|_0$, the solution of (3) can be relaxed into optimizing the mixed norm $\|Z\|_{1,2}$, as shown in the following equation:

$$\arg\min_{Z} \|Z\|_{1,2}, \quad \text{s.t. } Y = YZ + E, \; Z \ge 0, \; \operatorname{diag}(Z) = 0, \; \mathbf{1}^{T} Z = \mathbf{1}^{T} \tag{4}$$

where $\|Z\|_{1,2} = \sum_{i=1}^{N} \|z^i\|_2$ is the sum of the $\ell_2$-norms of the row vectors $z^i$ and promotes sparsity among the rows of $Z$; the constraint $\mathbf{1}^{T} Z = \mathbf{1}^{T}$ is equivalent to $\sum_{i=1}^{N} z_{ij} = 1$, where $\mathbf{1} \in \mathbb{R}^{N \times 1}$ is a column vector with all entries equal to one. Furthermore, using Lagrange multipliers, program (4) can be written as follows:

$$\arg\min_{Z} \; \lambda \|Z\|_{1,2} + \frac{1}{2}\|Y - YZ\|_{F}^{2}, \quad \text{s.t. } Z \ge 0, \; \operatorname{diag}(Z) = 0, \; \mathbf{1}^{T} Z = \mathbf{1}^{T} \tag{5}$$

where $\lambda > 0$ is the regularization parameter that controls the effect of the sparsity of the coefficient matrix, and $\|\cdot\|_F$ is the Frobenius norm.
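The objective of (5) is simple to evaluate for a candidate Z, which is useful for monitoring a solver; the sketch below is our own illustration.

```python
import numpy as np

def ssr_objective(Y, Z, lam):
    """Objective of (5): lam * ||Z||_{1,2} + 0.5 * ||Y - Y Z||_F^2."""
    row_sparsity = np.sum(np.linalg.norm(Z, axis=1))      # l1,2-norm
    fit = 0.5 * np.linalg.norm(Y - Y @ Z, 'fro') ** 2     # Frobenius data fit
    return lam * row_sparsity + fit
```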

B. Dissimilarity-Weighted Regularization Model of All Band Vectors

In the hyperspectral field, the dissimilarity between pairwise bands reflects how well they approximate each other: a smaller dissimilarity value denotes a higher probability that one band can represent the other when selecting bands. The purpose of band selection is to determine an informative but distinctive band subset, in which each representative band carries a large amount of information and has small intraband correlations with the others. Therefore, dissimilarity information among all bands that integrates information divergence and intraband correlation is significant for selecting representative bands. In the above section, (4) obtains an optimal solution by minimizing both the mixed norm $\|Z\|_{1,2}$ and the approximation error, and the nonzero row vectors in $Z$ indicate the exemplar bands. However, the global optimization scheme in (5) places no specific constraint on too-close bands. Accordingly, the estimated representative bands may be overconcentrated in certain similar bands.

Fig. 1. Image of the coefficient matrix from formulation (5) on the Indian Pines dataset.

Fig. 2. Plots of three adjacent row vectors from the sparse coefficient matrix on the Indian Pines dataset.

Take the Indian Pines dataset in Section IV-A as an example. Fig. 1 shows the image of an estimated sparse coefficient matrix $Z$ from the optimization program in (5), where the black background denotes zero row vectors of the matrix and white corresponds to nonzero row vectors. The figure shows that the three adjacent bands 189–191 are chosen as candidates for the band subset. Furthermore, the coincidence of the three plots of the corresponding row vectors of bands 189–191 in Fig. 2 illustrates that the three adjacent bands have almost the same capacity for representing all the other bands. However, the three adjacent bands collect highly similar spectral responses of ground objects, with wavelengths within 2375.58–2405.33 nm. This brings information redundancy into the selected representative bands and runs contrary to the purpose of band selection. Therefore, dissimilarity information among all bands should be added to function (5) to prune too-close bands and improve the quality of the representative bands.

Many dissimilarity measurements are available to quantify the differences between pairwise bands, such as the SID, the reciprocal of the correlation coefficient (CC) (i.e., 1/CC), the square of SID (SID²), the spectral angle distance (SAD), the ED, and the MD. In some sense, the selection of a dissimilarity measurement is empirical or ad hoc. In this paper, we present a CDM to formulate a dissimilarity-weighted regularization model and to quantify the cost of encoding all bands with the representative bands. The first reason for implementing the CDM is that our trial results show that DWSSR with the CDM behaves better in estimating a proper band subset than with the other measures listed above. Moreover, the CDM integrates the information divergence with the intraband correlation and can therefore better represent the dissimilarity information between two bands. The SID is implemented to measure the differences in information amount between pairwise bands. We use SID rather than SAD, ED, or MD because SID originates from the concept of divergence in information theory and is widely used to characterize the statistics of a band vector; its advantage as an information-theoretic criterion is more prominent than any deterministic metric such as SAD, ED, or MD [50]. The SID considers each band vector as a random variable and quantifies the discrepancy of probabilistic behaviors between pairwise band vectors. Meanwhile, considering its simplicity and effectiveness, the CC is implemented to measure the correlation between pairwise band vectors. For two band vectors $y_i = (a_{i1}, \ldots, a_{il}, \ldots, a_{iD})^{T} \in \mathbb{R}^{D \times 1}$ and $y_j = (a_{j1}, \ldots, a_{jl}, \ldots, a_{jD})^{T} \in \mathbb{R}^{D \times 1}$, where $a_{il}$ and $a_{jl}$ are the spectral responses of pixel $l$ in the $i$th and $j$th band images, respectively, the CDM is defined as follows:

$$\mathrm{CDM}_{ij} = \frac{\mathrm{SID}(y_i, y_j)}{R(y_i, y_j)} \tag{6}$$

$$\mathrm{SID}(y_i, y_j) = \mathrm{KLD}(y_i \,\|\, y_j) + \mathrm{KLD}(y_j \,\|\, y_i) \tag{7}$$

$$R(y_i, y_j) = \frac{\sum_{l=1}^{D} (a_{il} - \bar{y}_i)(a_{jl} - \bar{y}_j)}{\sqrt{\sum_{l=1}^{D} (a_{il} - \bar{y}_i)^2 \sum_{l=1}^{D} (a_{jl} - \bar{y}_j)^2}} \tag{8}$$

where $\mathrm{KLD}(y_i \,\|\, y_j) = \sum_{l=1}^{D} \hat{a}_{il} \log(\hat{a}_{il} / \hat{a}_{jl})$ is the relative entropy of $y_i$ with respect to $y_j$, also called the Kullback–Leibler divergence, in which $\hat{a}_{il}$ is the spectral response normalized between 0 and 1 via $\hat{a}_{il} = a_{il} / \sum_{l=1}^{D} a_{il}$; $\bar{y}_i$ and $\bar{y}_j$ are the means of the $i$th and $j$th band vectors $y_i$ and $y_j$, respectively. In (6), a higher CDM indicates that a pair of bands has larger information divergence and smaller correlation, and thus better represents the dissimilarity between the two bands.
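A minimal sketch of the CDM computation in (6)–(8) follows (our own NumPy illustration; the small eps guard against division by zero and log of zero is our addition, and it assumes nonnegative spectral responses as in radiance data).

```python
import numpy as np

def cdm(yi, yj, eps=1e-12):
    """Composite dissimilarity (6): symmetric KL divergence (SID, (7))
    divided by the Pearson correlation coefficient R, (8)."""
    pi = yi / (yi.sum() + eps)          # normalize responses to sum to one
    pj = yj / (yj.sum() + eps)
    sid = np.sum(pi * np.log((pi + eps) / (pj + eps))) \
        + np.sum(pj * np.log((pj + eps) / (pi + eps)))
    r = np.corrcoef(yi, yj)[0, 1]       # correlation coefficient (8)
    return sid / r
```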

In the coefficient matrix $Z$, each entry $z_{ij}$ can be regarded as the probability that band $y_i$ is chosen as a representative of $y_j$. Weighted by the dissimilarity measure, the cost of encoding $y_j$ with $y_i$ is $\mathrm{CDM}_{ij} z_{ij}$, and the total cost of encoding $y_j$ with all its representatives is $\sum_{i=1}^{N} \mathrm{CDM}_{ij} z_{ij}$. The dissimilarity-weighted regularization term is then constructed by minimizing the total cost of encoding the band set $Y$ with the representative bands using the following optimization function:

$$\arg\min \|D \odot Z\| = \arg\min \operatorname{tr}(D^{T} Z) \tag{9}$$


TABLE I: PROCEDURE OF SOLVING THE SPARSE COEFFICIENT MATRIX WITH THE ADMM FRAMEWORK

where $D = \{\mathrm{CDM}_{ij}\} \in \mathbb{R}^{N \times N}$ is the dissimilarity-weighted matrix, and $\odot$ is the component-wise (i.e., Hadamard) product between two variables. The CDM in function (9) not only weights the dissimilarity in the sparse coefficients between pairwise bands but also enhances the sparsity of the coefficient matrix $Z$. The predefined dissimilarity measurement is independent of the formulation of the optimization program (9), and interested readers can try other dissimilarity measurements in the program.
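The sketch below (ours) assembles D from pairwise CDM values, reusing the cdm sketch above, and evaluates the regularizer (9) via the identity that the entrywise sum of D ⊙ Z equals tr(D^T Z).

```python
import numpy as np

def dissimilarity_matrix(Y, cdm):
    """Build D with D[i, j] = CDM between bands i and j (columns of Y)."""
    N = Y.shape[1]
    D = np.zeros((N, N))
    for i in range(N):
        for j in range(N):
            if i != j:
                D[i, j] = cdm(Y[:, i], Y[:, j])
    return D

def encoding_cost(D, Z):
    """Regularizer (9): sum of the entries of D .* Z, i.e., tr(D^T Z)."""
    return float(np.sum(D * Z))   # equals np.trace(D.T @ Z)
```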

C. Solution of Proposed DWSSR Model

Combining the dissimilarity-weighted regularization term (9) with the SSR model (5), the new Lagrange multipliers formulation is obtained in the following equation:

$$\arg\min_{Z} \; \lambda \|Z\|_{1,2} + \mu \operatorname{tr}(D^{T} Z) + \frac{1}{2}\|Y - YZ\|_{F}^{2}, \quad \text{s.t. } Z \ge 0, \; \operatorname{diag}(Z) = 0, \; \mathbf{1}^{T} Z = \mathbf{1}^{T} \tag{10}$$

where $\lambda > 0$ and $\mu > 0$ are the regularization parameters that balance the effects of the sparsity of the coefficient matrix and the dissimilarity-weighted regularization term, respectively. In this paper, we adopt the ADMM framework [51] to solve the optimization problem (10). The idea of ADMM is to introduce proper auxiliary variables to enforce the constraints of the objective function (10), iteratively minimizing the Lagrangian with respect to the primal variables and maximizing it with respect to the Lagrange multipliers [52]. Table I shows the procedure of implementing the ADMM framework to obtain an optimal sparse coefficient matrix $Z$.

We first introduce an auxiliary matrix $A = Z - \operatorname{diag}(Z)$ to obtain efficient updates of the optimization variables, and function (10) is transformed into the following equation:

$$\arg\min_{Z, A} \; \lambda \|Z\|_{1,2} + \mu \operatorname{tr}(D^{T} A) + \frac{1}{2}\|Y - YA\|_{F}^{2}, \quad \text{s.t. } \mathbf{1}^{T} A = \mathbf{1}^{T}, \; A = Z - \operatorname{diag}(Z), \; Z \ge 0. \tag{11}$$

After that, two penalty terms for $\mathbf{1}^{T} A = \mathbf{1}^{T}$ and $A = Z - \operatorname{diag}(Z)$ are added to (11) using a defined parameter $\rho > 0$, and program (11) is accordingly transformed into the following equation:

$$\arg\min_{Z, A} \; \lambda \|Z\|_{1,2} + \mu \operatorname{tr}(D^{T} A) + \frac{1}{2}\|Y - YA\|_{F}^{2} + \frac{\rho}{2}\left\|A^{T}\mathbf{1} - \mathbf{1}\right\|_{2}^{2} + \frac{\rho}{2}\left\|A - (Z - \operatorname{diag}(Z))\right\|_{2}^{2}, \quad \text{s.t. } \mathbf{1}^{T} A = \mathbf{1}^{T}, \; A = Z - \operatorname{diag}(Z), \; Z \ge 0. \tag{12}$$

The two penalty terms enhance the convexity of the objective function and do not change its optimal solution [52]. Furthermore, an auxiliary vector $\delta \in \mathbb{R}^{N \times 1}$ and an auxiliary matrix $\Lambda \in \mathbb{R}^{N \times N}$ of Lagrange multipliers are adopted for the two equality constraints in (12), and the Lagrange function is rewritten as follows [53]:

$$L(Z, A, \delta, \Lambda) = \lambda \|Z\|_{1,2} + \mu \operatorname{tr}(D^{T} A) + \frac{1}{2}\|Y - YA\|_{2}^{2} + \frac{\rho}{2}\left\|A^{T}\mathbf{1} - \mathbf{1}\right\|_{2}^{2} + \frac{\rho}{2}\left\|A - (Z - \operatorname{diag}(Z))\right\|_{2}^{2} + \delta^{T}\left(A^{T}\mathbf{1} - \mathbf{1}\right) + \operatorname{tr}\!\left(\Lambda^{T}\left(A - Z + \operatorname{diag}(Z)\right)\right). \tag{13}$$


The ADMM algorithm optimizes $Z$, $A$, $\delta$, and $\Lambda$ with iterative procedures and updates each variable at iteration $n+1$ using the following schemes. The derivative of $L(Z, A, \delta, \Lambda)$ with respect to $A$ is computed while fixing $Z^{(n)}$, $\delta^{(n)}$, and $\Lambda^{(n)}$ at the $n$th iteration. Setting the derivative to zero, $A^{(n+1)}$ is obtained by solving the following linear equation:

$$\left(Y^{T}Y + \rho I + \rho \mathbf{1}\mathbf{1}^{T}\right) A^{(n+1)} = Y^{T}Y + \rho\left(\mathbf{1}\mathbf{1}^{T} + Z^{(n)}\right) - \mathbf{1}\,\delta^{(n)T} - \Lambda^{(n)} - \mu D. \tag{14}$$

On the other hand, function (13) can be transformed into the following equation:

$$L(Z, A, \delta, \Lambda) = \lambda \|Z\|_{1,2} + \frac{\rho}{2}\left\|\left(A - (Z - \operatorname{diag}(Z))\right) + \frac{\Lambda}{\rho}\right\|_{2}^{2} + \mu \operatorname{tr}(D^{T} A) + \frac{1}{2}\|Y - YA\|_{F}^{2} + \frac{\rho}{2}\left\|A^{T}\mathbf{1} - \mathbf{1}\right\|_{2}^{2} + \delta^{T}\left(A^{T}\mathbf{1} - \mathbf{1}\right) - \frac{1}{2\rho}\|\Lambda\|_{2}^{2} \tag{15}$$

$$= \lambda \|Z\|_{1,2} + \frac{\rho}{2}\left\|Z - \left(A + \frac{\Lambda}{\rho} + \operatorname{diag}(Z)\right)\right\|_{2}^{2} + h(A, \delta, \Lambda)$$

where $h(A, \delta, \Lambda) = \mu \operatorname{tr}(D^{T} A) + \frac{1}{2}\|Y - YA\|_{F}^{2} + \frac{\rho}{2}\|A^{T}\mathbf{1} - \mathbf{1}\|_{2}^{2} + \delta^{T}(A^{T}\mathbf{1} - \mathbf{1}) - \frac{1}{2\rho}\|\Lambda\|_{2}^{2}$ changes only with the variables $A$, $\delta$, and $\Lambda$ and does not depend on $Z$. Therefore, $Z^{(n+1)}$ can be updated by optimizing the following simplified objective function:

$$Z^{(n+1)} = \arg\min_{Z} \; \lambda \|Z\|_{1,2} + \frac{\rho}{2}\left\|Z - \left(A + \frac{\Lambda}{\rho} + \operatorname{diag}(Z)\right)\right\|_{2}^{2}, \quad \text{s.t. } Z \ge 0. \tag{16}$$

Furthermore, with $Z^{(n+1)}$ and $A^{(n+1)}$ fixed, $\delta^{(n+1)}$ and $\Lambda^{(n+1)}$ are updated using a gradient ascent scheme with a step size of $\rho$ on the Lagrange multipliers as follows:

$$\delta^{(n+1)} = \delta^{(n)} + \rho\left(A^{(n+1)T}\mathbf{1} - \mathbf{1}\right) \tag{17}$$

$$\Lambda^{(n+1)} = \Lambda^{(n)} + \rho\left(A^{(n+1)} - Z^{(n+1)}\right). \tag{18}$$

The above iterations for $A^{(n+1)}$, $Z^{(n+1)}$, $\delta^{(n+1)}$, and $\Lambda^{(n+1)}$ are repeated until the convergence conditions are satisfied or the number of iterations exceeds the predefined maximum. The convergence conditions are $\|A^{(n+1)T}\mathbf{1} - \mathbf{1}\|_{\infty} \le \varepsilon$, $\|A^{(n+1)} - Z^{(n+1)}\|_{\infty} \le \varepsilon$, and $\|A^{(n+1)} - A^{(n)}\|_{\infty} \le \varepsilon$, where $\varepsilon$ is the defined error tolerance for the primal and dual residuals. The choice of the convergence threshold $\varepsilon$ affects the minimization of program (10) and thus the optimization result of the sparse coefficient matrix $Z$. The matrix $Z^{(n+1)}$ at the stopping iteration is taken as the optimal sparse coefficient matrix $Z$.
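For concreteness, the following sketch (our own NumPy reconstruction, not the authors' released code) wires updates (14), (17), and (18) and the convergence checks together. The exact Z-update (16) couples Z with its own diagonal; we approximate it here with a nonnegative projection followed by row-wise block soft-thresholding of V = A + Λ/ρ with the diagonal zeroed, which is a common simplification rather than the paper's exact step.

```python
import numpy as np

def dwssr_admm(Y, D, lam=0.5, mu=0.05, rho=1.0, eps=1e-4, max_iter=500):
    """Sketch of the ADMM iterations for program (10)."""
    _, N = Y.shape
    ones = np.ones((N, 1))
    G = Y.T @ Y
    lhs = G + rho * np.eye(N) + rho * (ones @ ones.T)   # constant matrix in (14)
    Z = np.zeros((N, N))
    delta = np.zeros((N, 1))
    Lam = np.zeros((N, N))
    A_prev = np.zeros((N, N))
    for _ in range(max_iter):
        # A-update, equation (14): solve a linear system.
        rhs = G + rho * (ones @ ones.T + Z) - ones @ delta.T - Lam - mu * D
        A = np.linalg.solve(lhs, rhs)
        # Z-update: approximate prox of lam*||Z||_{1,2} with Z >= 0 (cf. (16)).
        V = np.maximum(A + Lam / rho, 0.0)
        np.fill_diagonal(V, 0.0)
        row_norms = np.linalg.norm(V, axis=1, keepdims=True)
        Z = np.maximum(1.0 - (lam / rho) / np.maximum(row_norms, 1e-12), 0.0) * V
        # Dual ascent, equations (17) and (18).
        delta = delta + rho * (A.T @ ones - ones)
        Lam = Lam + rho * (A - Z)
        # Convergence checks on the primal and dual residuals.
        if (np.max(np.abs(A.T @ ones - ones)) <= eps
                and np.max(np.abs(A - Z)) <= eps
                and np.max(np.abs(A - A_prev)) <= eps):
            break
        A_prev = A
    return Z
```

Since the left-hand matrix of (14) never changes across iterations, it could also be factorized once (e.g., by Cholesky) instead of being solved from scratch in every pass.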

Within the estimated sparse coefficient matrix, the norm rankings of the nonzero row vectors show that the corresponding bands have divergent capacities in representing the other bands. An exemplar band with a lower ranking has less effect on the representation of the other bands in the HSI data, and hence its corresponding row vector in the coefficient matrix has few nonzero elements, with smaller values. Thus, we rank the representative bands in descending order by the $\ell_1$-norms of their nonzero row vectors and select the first $k$ bands with the highest rankings to constitute the final selected subset $Y_S$.

Fig. 3. Sketch map of band selection with DWSSR.

Fig. 4. Image of the Indian Pines dataset.
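Once Z is estimated, the selection step reduces to a simple ranking, as in this sketch (ours).

```python
import numpy as np

def select_bands(Z, k):
    """Rank the rows of Z by their l1-norms and return the indices of the
    k top-ranked rows as the selected band subset."""
    scores = np.sum(np.abs(Z), axis=1)      # l1-norm of each row
    return np.argsort(scores)[::-1][:k]     # descending order, first k indices
```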

D. Summary of Band Selection Using DWSSR

Our DWSSR method stands on the SSR model of all the band vectors and aims to select the exemplar bands that correspond to nonzero rows in the coefficient matrix. Considering the problem of too-close bands in the regular SSR formulation, we integrate the SSR model with the proposed dissimilarity regularization term to formulate the DWSSR model. The dissimilarity-weighted regularization term minimizes the cost of encoding all the bands with the representative bands, and the new CDM measure, which combines information divergence with intraband correlations, is used to estimate the weights in the encoding procedure. The DWSSR model is transformed into a Lagrange multipliers formulation, and the program is optimized with the ADMM algorithm. The representative bands whose corresponding nonzero rows in the coefficient matrix have the highest norm rankings constitute the final band subset. A sketch map of band selection using DWSSR is shown in Fig. 3.

TABLE II: GROUND TRUTH OF TRAINING AND TESTING SAMPLES IN EACH CLASS FOR THE INDIAN PINES DATASET

IV. EXPERIMENTAL RESULTS AND ANALYSIS

In this section, we use three famous HSI datasets, Indian Pines, Urban, and Pavia University (PaviaU), to verify the band-selection performance of our proposed DWSSR method when applied to classification. Section IV-A describes the relevant information of the three HSI datasets. Section IV-B lists the detailed results of five groups of experiments, and Section IV-C analyzes and discusses the experimental results from Section IV-B.

A. Descriptions of Three HSI Datasets

The Indian Pines dataset was obtained from the Multispectral Image Data Analysis System group at Purdue University (https://engineering.purdue.edu/~biehl/MultiSpec/aviris_documentation.html) [54]. The dataset was acquired by NASA on June 12, 1992, using the AVIRIS sensor from JPL. It has a 20-m spatial resolution and a 10-nm spectral resolution covering a spectral range of 400–2500 nm. A subset of the image scene of size 145 × 145 pixels is used in our experiments; it covers an area 6 miles west of West Lafayette, Indiana. The dataset was preprocessed with radiometric corrections and bad-band removal, leaving 200 bands with calibrated data values proportional to radiance. Sixteen classes of ground objects exist in the image scene of Fig. 4, and the ground truth for training and testing samples in each class is listed in Table II.

The Urban dataset was acquired from the website of the US Army Geospatial Center (http://www.erdc.usace.army.mil/Media/FactSheets/FactSheetArticleView/tabid/9254/Article/610433/hypercube.aspx) [55]. It was collected by the HYDICE sensor, which has a 10-nm spectral resolution and a 2-m spatial resolution. The low signal-to-noise ratio (SNR) bands [1–4, 76, 87, 101–111, 136–153, 198–210] were removed from the initial 210 bands, leaving 162 bands. A smaller image subset of size 307 × 307 pixels was selected from the larger image; it covers an area at Copperas Cove near Fort Hood, TX. The image scene in Fig. 5 contains 22 classes of ground objects, and Table III shows the ground-truth information for the training and testing samples in each class.

Fig. 5. Image of the Urban dataset.

TABLE III: GROUND TRUTH OF TRAINING AND TESTING SAMPLES IN EACH CLASS FOR THE URBAN DATASET

The PaviaU dataset was taken from the Computational Intelligence Group at the University of the Basque Country (http://www.ehu.es/ccwintco/index.php/Hyperspectral_Remote_Sensing_Scenes). It was acquired by the ROSIS sensor, which has a 1.3-m spatial resolution and 115 bands. After removing low-SNR bands, 103 bands were used in our experiments. The smaller subset of the larger dataset shown in Fig. 6 contains 350 × 340 pixels and covers the area of Pavia University. The image scene has nine classes of ground objects, including shadows, and the ground-truth information of the training and testing samples is listed in Table IV.

Fig. 6. Image of the PaviaU dataset.

TABLE IV: GROUND TRUTH OF TRAINING AND TESTING SAMPLES IN EACH CLASS FOR THE PAVIAU DATASET

B. Experimental Results

In the following, five groups of experiments on the three HSI datasets are conducted to test our DWSSR method in selecting a proper band subset for classification. We utilize five state-of-the-art methods for holistic comparison with our method: maximum-variance principal component analysis (MVPCA) [19], the sparse-based band selection (SpaBS) method [36], SNMF [38], ISSC [41], and SSR [45]. First, we quantify the band-selection performance of DWSSR and compare the results with those of SSR. This experiment estimates the impact of the dissimilarity regularization term in promoting the performance of the regular SSR in band selection for classification. Second, we compare the classification accuracies of DWSSR against those of the five other methods (MVPCA, SpaBS, SNMF, ISSC, and SSR). Two popular classifiers are utilized in the experiment: SVM [56] and K-nearest neighbor (KNN) [57]. The overall classification accuracy (OCA) and average classification accuracy (ACA) are used to measure the classification performance of all the above methods. The SVM classifier works with the radial basis function (RBF) kernel, whose variance parameter and penalization factor are estimated via cross-validation, and the ED is used in the KNN classifier. For each dataset, the training and testing samples are randomly re-drawn ten times. Third, we compare the computational complexity and computational time of DWSSR against the five other methods by varying the number of selected bands k. This experiment evaluates the computational performance of DWSSR. Finally, we investigate the relations between the two regularization parameters λ and μ in DWSSR and the classification accuracies of the selected bands. These two experiments help determine proper parameters when applying DWSSR to realistic applications of HSI classification. Unless otherwise noted, the following experimental results are averages over ten different and independent runs.
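As an illustration of this evaluation protocol, the sketch below (ours, assuming scikit-learn; the parameter grid is hypothetical, standing in for the paper's cross-validated RBF-SVM parameters) computes OCA as overall accuracy and ACA as the mean of per-class accuracies for an SVM and a 3-NN classifier on a selected band subset.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, balanced_accuracy_score

def evaluate_band_subset(X_train, y_train, X_test, y_test, band_idx):
    """Report OCA (overall accuracy) and ACA (mean per-class accuracy)
    for an RBF-SVM and a 3-NN classifier on the selected bands."""
    Xtr, Xte = X_train[:, band_idx], X_test[:, band_idx]
    svm = GridSearchCV(SVC(kernel='rbf'),
                       {'C': [1, 10, 100], 'gamma': ['scale', 0.1, 1.0]}, cv=5)
    svm.fit(Xtr, y_train)
    knn = KNeighborsClassifier(n_neighbors=3).fit(Xtr, y_train)  # Euclidean by default
    results = {}
    for name, clf in [('SVM', svm), ('KNN', knn)]:
        pred = clf.predict(Xte)
        results[name] = (accuracy_score(y_test, pred),           # OCA
                         balanced_accuracy_score(y_test, pred))  # ACA
    return results
```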

1) Effect From the Dissimilarity-Weighted Regularization Term on DWSSR for Band Selection: This experiment estimates the impact of the dissimilarity regularization term in improving the band-selection performance of the regular SSR method for classification. The SSR method estimates the sparse coefficient matrix from program (5), clusters the row vectors of the coefficient matrix, and finally selects the representative band from each cluster. Three quantitative measures, the average information entropy (AIE), the average CC (ACC), and the average relative entropy (ARE), are implemented to estimate the richness of spectral information, the intraband correlations, and the interclass separabilities of the selected band subset, respectively. The reason for implementing AIE, ACC, and ARE is that a proper band subset should have a higher information amount, low intraband correlations, and higher interclass separabilities. Meanwhile, we compare the ACAs and OCAs of DWSSR with those of SSR using the KNN and SVM classifiers. The neighbor size in the KNN classifier is manually set to 3, and the threshold of total distortion in the SVM classifier is set to 0.01. In the experiment, the proper size of the band subset k is manually estimated and is then set as the size of the band subsets for both methods. The values of k for the Indian Pines, Urban, and PaviaU datasets are 12, 20, and 10, respectively. Using cross-validation, the regularization parameters λ and μ are 0.3 and 0.05 for Indian Pines, 0.6 and 0.01 for Urban, and 1.1 and 0.1 for PaviaU, respectively. The regularization parameter λ1 of SSR on the Indian Pines, Urban, and PaviaU datasets is set to 100, 85, and 90 via cross-validation, respectively. Table V lists detailed information about the parameters of both methods on the three HSI datasets.
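Two of the three quantitative measures can be sketched directly (our own illustration; the paper's exact definitions may differ in binning and normalization, and ARE is omitted because it requires class labels).

```python
import numpy as np

def average_information_entropy(Y_sub, bins=256):
    """AIE: mean Shannon entropy of the histogram of each selected band."""
    entropies = []
    for b in range(Y_sub.shape[1]):
        hist, _ = np.histogram(Y_sub[:, b], bins=bins)
        p = hist / hist.sum()
        p = p[p > 0]
        entropies.append(-np.sum(p * np.log2(p)))
    return float(np.mean(entropies))

def average_correlation(Y_sub):
    """ACC: mean absolute pairwise correlation between the selected bands."""
    C = np.corrcoef(Y_sub.T)
    iu = np.triu_indices_from(C, k=1)
    return float(np.mean(np.abs(C[iu])))
```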

TABLE V: LISTS OF PARAMETERS IN ALL THE EXPERIMENTS ON THE THREE HSI DATASETS

TABLE VI: CONTRAST IN QUANTITATIVE EVALUATIONS AND CLASSIFICATION ACCURACIES BETWEEN THE SSR AND DWSSR METHODS

Table VI lists the quantitative evaluation results, the selected band subsets, and the classification accuracies of both the SSR and DWSSR methods, with the best results highlighted in bold and underlined fonts. For the Indian Pines dataset, the ARE of DWSSR is over twice that of SSR, whereas its ACC is less than one half. Accordingly, the DWSSR has higher AIE and ARE but lower ACC than SSR. The evaluation results on the Urban and PaviaU datasets support the same observations. On the other hand, for all three HSI datasets, the classification accuracies of DWSSR clearly outperform those of SSR. In particular, the OCAs and ACAs of DWSSR on Indian Pines are over 8% higher than those of SSR with either the SVM or the KNN classifier. Therefore, DWSSR behaves better than SSR in both classification accuracy and the quantitative evaluations.

2) Classification Performance of Our DWSSR Method: This experiment estimates the classification performance of the DWSSR band subset. Our purpose is to make a holistic evaluation of classification performance by varying the size of the band subset k and the proportions of training samples, rather than using a fixed predefined band number or training sample size. In the experiment, the iteration number t for the dictionary learning in SpaBS is manually set to 5 for all three datasets. Using cross-validation, α and γ in SNMF are chosen as 3.0 and 0.1 for the PaviaU dataset, 3.5 and 0.05 for the Urban dataset, and 4.0 and 1.5 for the Indian Pines dataset, respectively. The regularization parameter β of ISSC on the Indian Pines, Urban, and PaviaU datasets is determined as 0.1, 0.05, and 0.001, respectively, via cross-validation. Other parameters not mentioned are the same as their counterparts in the previous experiment 1).


Fig. 7. OCA curves of all the six methods on the three HSI datasets with the changing sizes of band subset. (a), (c), and (e) SVM. (b), (d), and (f) KNN.

Fig. 8. OCA curves of all the six methods on the three datasets with the changing percentages of training samples. (a), (c), and (e) SVM. (b), (d), and (f) KNN.

Table V details the parameter configurations of all six methods involved.

Fig. 7 plots the OCA results of all six methods on the three datasets as the size of the band subset changes. The size k of the band subset on the Indian Pines dataset changes from 2 to 44 with a step of 2; on the Urban dataset it varies between 2 and 46 with a step of 2; and on the PaviaU dataset it changes from 2 to 50 with a step of 2. We do not list the ACA results because of their similarity to the OCA results on the same dataset. Among all the OCA curves from both classifiers, MVPCA and SpaBS perform worse than the other four methods, with the SpaBS curves being the most unstable. The SSR curves are similar to the SNMF curves, whereas both are inferior to those of ISSC and DWSSR. The DWSSR curves achieve OCAs better than or comparable to those of the ISSC method, and superior to those of the other four methods (MVPCA, SSR, SNMF, and SpaBS).

Fig. 8 shows the OCA results of all six methods when changing the percentage of training samples per class listed in Tables II–IV. The proportions of training samples in the total ground-truth set are set as [0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.4, 0.5]. The figure shows that the OCA curves of DWSSR perform better than those of all the other five methods, and the SpaBS curves behave worst among all the methods. Meanwhile, Figs. 9–11 illustrate the classification maps of all six methods on the three datasets using SVM. The sizes of the band subsets on the Indian Pines, Urban, and PaviaU datasets are manually chosen to be 30, 40, and 20, respectively; the reason, observed from Fig. 7, is that a band subset size of about twice the number of main ground-object classes guarantees better classification accuracies for all the methods than their counterparts in experiment 1).

Fig. 9. SVM classification maps of all six methods on the Indian Pines dataset. (a) Ground truth. (b) MVPCA. (c) SpaBS. (d) SNMF. (e) ISSC. (f) DWSSR. (g) SSR.

Fig. 10. SVM classification maps of all six methods on the Urban dataset. (a) Ground truth. (b) MVPCA. (c) SpaBS. (d) SNMF. (e) ISSC. (f) DWSSR. (g) SSR.

Fig. 11. SVM classification maps of all six methods on the PaviaU dataset. (a) Ground truth. (b) MVPCA. (c) SpaBS. (d) SNMF. (e) ISSC. (f) DWSSR. (g) SSR.

TABLE VII: CONTRAST IN COMPUTATIONAL COMPLEXITY BETWEEN DWSSR AND FIVE OTHER BAND SELECTION METHODS

3) Computational Performance of Our DWSSR: This experiment estimates the computational complexity and speed of DWSSR when varying the size of the band subset k. For the DWSSR method, the computational complexity of updating the auxiliary matrix A is O(N³), and the computational complexities of updating the coefficient matrix Z, the auxiliary matrix Λ, and the auxiliary vector δ are each O(ND). Therefore, the total complexity of the DWSSR method is O(N³n + 3NDn), in which n is the number of ADMM iterations, and N and D are the number of bands and the dimension of the band vectors, respectively. Table VII compares the computational complexity of DWSSR and the five other methods, where K denotes the sparsity level of SpaBS and k is the number of selected bands. Considering the fact that N << N² < D, the computational complexity of DWSSR is smaller than O(4NDn). The computational complexity of ISSC is lower than that of MVPCA and SpaBS because K < k << N << D. Among all the methods, SpaBS has the highest computational complexity, and ISSC and DWSSR have the smallest.

Furthermore, we compare the computational speeds of all the above methods on a Windows 7 computer with an Intel i5-4570 Quad Core Processor and 8 GB of RAM. The DWSSR and the five other methods are implemented in MATLAB 2014a. For the three datasets, the sizes of the band subset k are set between 10 and 50 with a step of 10. Other parameters of all six methods are the same as their counterparts in the previous experiments, and all the parameter configurations are listed in Table V. The computational times of all six methods on the three datasets are listed in Table VIII.

TABLE VIII: COMPUTATIONAL TIMES OF SIX BAND SELECTION METHODS USING DIFFERENT SIZES OF BAND SUBSET

Among all the methods, SpaBS and SSR take the longest and second-longest computational times on the three datasets. ISSC has shorter computational times than MVPCA and SNMF, and the computational times of MVPCA are shorter than those of SNMF. The computational speed of DWSSR does not change noticeably as k grows, and it has shorter computational times than ISSC for larger band-subset sizes.

Fig. 12. Relations between the regularization parameter λ and the OCAs on the three datasets. (a) Indian Pines. (b) Urban. (c) PaviaU.

4) Effect From the Regularization Parameter λ on the Sensitivity of Classification Accuracy: This experiment investigates the effect of the regularization parameter λ in DWSSR on the OCAs and ACAs of the three HSI datasets as λ changes from smaller to larger values. In the experiment, the range of λ on the Indian Pines dataset is set as [0.001, 0.01, 0.1, 0.3, 0.4, 0.5, 0.8, 1, 3, 5, 10, 30, 50]; the range of λ on the Urban dataset is set as [0.001, 0.01, 0.1, 0.3, 0.6, 0.8, 1, 2, 5, 8, 10]; and the range of λ on the PaviaU dataset is set as [0.001, 0.01, 0.1, 0.3, 0.5, 0.8, 1.0, 1.1, 1.5, 2, 5, 10, 20, 50]. We did not investigate the impact of λ on classification with a fixed step interval, because the candidate interval is too large for such a detailed analysis. The size of the band subset k is 12 for the Indian Pines dataset, 20 for the Urban dataset, and 10 for the PaviaU dataset. Other parameter configurations of DWSSR in the experiment are the same as their counterparts in the previous experiments and are listed in Table V.

Fig. 12 shows the OCA plots of DWSSR from the KNN and SVM classifiers on the three HSI datasets. For the Indian Pines dataset, the OCA curves rise continually as the regularization parameter λ increases from 0.001 and reach their peak value at a moderate λ between 0.3 and 0.4. After that, both curves gradually drop with the continued increase of λ. Similar observations hold for the OCA curves of the Urban and PaviaU datasets: the OCA curves from the KNN and SVM classifiers on both datasets first rise, then reach a peak value, and finally fall as λ changes from 0.001 to the end of its range. This implies that the choice of the regularization parameter λ has a great effect on the classification accuracies of the DWSSR band subset.

5) Effect From the Regularization Parameter μ on the Sensitivity of Classification Accuracy: This experiment investigates the effect of the regularization parameter μ in DWSSR on the OCAs and ACAs of the three HSI datasets as μ changes from smaller to larger values. Similar to experiment 4), we did not investigate the impact of μ on classification with a fixed step interval. In the experiment, the range of μ on all three datasets is set as [0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 0.8, 1, 2, 5, 10, 20, 50, 100]. Other parameter configurations of DWSSR in the experiment are the same as their counterparts in the previous experiments and are listed in Table V.

Fig. 13. Relations between the regularization parameter μ and the OCAs on the three datasets. (a) Indian Pines. (b) Urban. (c) PaviaU.

Fig. 13 shows the OCA curves of DWSSR from the KNN and SVM classifiers on the three HSI datasets. For all datasets, the OCAs rise as the regularization parameter μ increases from 0.001. With the continued increase in μ, every OCA curve first reaches a peak value and then falls until the end of the parameter range. The OCA curves of Indian Pines peak when μ lies between 0.01 and 0.05; the Urban OCAs are maximized at μ = 0.005; and the PaviaU curves crest at μ = 0.1. All the curves on the three datasets indicate that the choice of the regularization parameter μ has a significant effect on the classification performance of the DWSSR method.
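Experiments 4) and 5) amount to a one-dimensional parameter sweep; a minimal sketch (ours, reusing the hypothetical dwssr_admm and select_bands sketches from Section III) is:

```python
import numpy as np

def sweep_lambda(Y, D, lam_grid, mu, k, evaluate_oca):
    """Re-run DWSSR for each candidate lambda and record the resulting OCA.
    `evaluate_oca` is any callable mapping a band-index array to an accuracy."""
    ocas = []
    for lam in lam_grid:
        Z = dwssr_admm(Y, D, lam=lam, mu=mu)   # ADMM sketch from Section III-C
        bands = select_bands(Z, k)
        ocas.append(evaluate_oca(bands))
    best = lam_grid[int(np.argmax(ocas))]
    return best, ocas
```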

C. Analysis and Discussions

The above five groups of experiments on the three HSI datasets investigate the performance of the DWSSR method. The DWSSR is compared against five popular band selection methods: MVPCA, SNMF, SpaBS, ISSC, and SSR. The contrast in the three quantitative measures and the classification accuracies between SSR and DWSSR shows that the DWSSR band subset has a higher information amount, higher interclass separabilities, lower intraband correlations, and better classification results. This means that the dissimilarity-weighted regularization term successfully reduces the similarity and redundancy of the row vectors in the coefficient matrix and promotes the performance of SSR in band selection. In contrast to SSR, DWSSR is a more appropriate method for band selection in HSI classification.

The experiments further compare the classification performance of DWSSR with that of the five other methods (MVPCA, SpaBS, SNMF, ISSC, and SSR). DWSSR achieves comparable or better classification accuracies than ISSC, and it clearly outperforms the other four methods. SSR obtains classification performance better than or comparable to SNMF. The MVPCA and SpaBS band subsets perform worst in classification. On the other hand, the computational experiments show that DWSSR has better computational speed than MVPCA, SNMF, SpaBS, and SSR, and shorter computational times than ISSC for larger band-subset sizes. The low convergence speed of ADMM in SSR brings about its higher computational times. In contrast, interestingly, the shorter computational times of DWSSR indicate that the dissimilarity regularization term improves the convergence conditions of ADMM and greatly accelerates its convergence. The longer computational times of MVPCA result from the principal component analysis (PCA) transformation of the HSI data. The lowest computational speed of SpaBS results from the huge computational complexity of dictionary learning using the K-SVD algorithm. Considering that a larger band subset is more feasible in realistic applications, the DWSSR performs almost best among all six methods, in both classification accuracy and computational time.

Finally, the experiments on the effects of the two regularization parameters λ and μ on the classification sensitivity of DWSSR show that both parameters have significant impacts on the classification performance of the DWSSR band subset. The results illustrate that a moderate λ between 0.3 and 1.1 is proper for DWSSR in band selection, while a smaller μ between 0.005 and 0.1 guarantees good classification accuracies for the DWSSR band subset.

V. CONCLUSION AND FUTURE WORK

This paper proposes the DWSSR method to help select a proper band subset for HSI classification. The DWSSR integrates the regular SSR model with a new dissimilarity-weighted regularization term and identifies the representative bands from the nonzero row vectors of the sparse coefficient matrix estimated by the method. The DWSSR selects a band subset that best represents the majority of all the bands. The dissimilarity-weighted regularization term enhances the sparsity of the coefficient matrix and helps to avoid selecting overly similar representative bands, but it cannot guarantee that all the distinctive bands are found. Five groups of experiments are implemented to verify the performance of DWSSR in selecting the representative bands, and the classification and computational results are compared against those of MVPCA, SpaBS, SNMF, ISSC, and SSR. The results show that DWSSR performs almost best among all six methods, in both classification accuracies and computational speeds. In particular, the contrast between DWSSR and SSR shows that the dissimilarity-weighted regularization term improves the convergence speed of ADMM and reduces the redundancy among the row vectors of the coefficient matrix, so that DWSSR clearly outperforms SSR. Finally, the investigation into the choice of the two regularization parameters λ and μ shows that a moderate λ between 0.3 and 1.1 and a smaller μ between 0.005 and 0.1 lead to good classification accuracy of the DWSSR band subset. Unfortunately, we can only provide the above reference intervals for both parameters and are not able to give an accurate estimate for either of them. In future work, we will study intelligent schemes for parameter estimation to obtain accurate regularization parameters λ and μ. Moreover, we will compare the performance of the CDM in DWSSR with that of more dissimilarity measures and develop new measures to improve the performance of DWSSR. In addition, we will test the DWSSR method on more HSI datasets to further promote its real-world applications.
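As a small illustration of the final selection step, the sketch below ranks the rows of an estimated coefficient matrix by their l2 norms and keeps the top-k indices as representative bands, consistent with the norm-ranking rule described above; the function name and the choice of the l2 norm for ranking are our assumptions.

import numpy as np

def select_bands(C, k):
    """Pick k representative bands from the coefficient matrix C by
    ranking its rows by their l2 norms (larger norm = more representative)."""
    row_norms = np.linalg.norm(C, axis=1)
    # Indices of the k top-ranked rows, i.e., the selected bands.
    return np.argsort(row_norms)[::-1][:k]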



ACKNOWLEDGMENT

The authors would like to thank the editor and referees for their suggestions to improve this paper.


Weiwei Sun received the B.S. degree in surveying and mapping and the Ph.D. degree in cartography and geographic information engineering from Tongji University, Shanghai, China, in 2007 and 2013, respectively.

From 2011 to 2012, he was a Visiting Scholar with the Department of Applied Mathematics, University of Maryland, College Park, College Park, MD, USA, where he studied the dimensionality reduction of hyperspectral images. He is currently an Associate Professor with Ningbo University, Ningbo, China, and a Postdoc with the State Key Laboratory for Information Engineering in Surveying, Mapping, and Remote Sensing (LIESMARS), Wuhan University, Wuhan, China. He has authored more than 20 journal papers. His research interests include hyperspectral image processing with manifold learning, and anomaly detection and target recognition of remote sensing imagery using compressive sensing.

Liangpei Zhang (M'06–SM'08) received the B.S. degree in physics from Hunan Normal University, Changsha, China, in 1982, the M.S. degree in optics from the Xi'an Institute of Optics and Precision Mechanics, Chinese Academy of Sciences, Xi'an, China, in 1988, and the Ph.D. degree in photogrammetry and remote sensing from Wuhan University, Wuhan, China, in 1998.

He is currently the Head of the Remote Sensing Division, State Key Laboratory of Information Engineering in Surveying, Mapping, and Remote Sensing, Wuhan University. He is also a "Chang-Jiang Scholar" Chair Professor appointed by the Ministry of Education of China. He is currently a Principal Scientist for the China State Key Basic Research Project (2011–2016) appointed by the Ministry of National Science and Technology of China to lead the remote sensing program in China. He has authored more than 310 research papers. He is the holder of five patents. His research interests include hyperspectral remote sensing, high-resolution remote sensing, image processing, and artificial intelligence.

Dr. Zhang is an Executive Member (Board of Governor) of the China National Committee of International Geosphere–Biosphere Programme, the China Society of Image and Graphics, etc. He regularly serves as a Co-Chair of the series SPIE Conferences on Multispectral Image Processing and Pattern Recognition, the Conference on Asia Remote Sensing, and many other conferences. He was the Editor of several conference proceedings, issues, and geoinformatics symposiums. He also serves as an Associate Editor of the IEEE Transactions on Geoscience and Remote Sensing, International Journal of Ambient Computing and Intelligence, International Journal of Image and Graphics, International Journal of Digital Multimedia Broadcasting, Journal of Geo-spatial Information Science, and Journal of Remote Sensing.

Lefei Zhang (S'11–M'14) received the B.S. and Ph.D. degrees in photogrammetry and remote sensing from Wuhan University, Wuhan, China, in 2008 and 2013, respectively.

From August 2013 to July 2015, he was with the School of Computer, Wuhan University, as a Postdoctoral Researcher, and was a Visiting Scholar at the CAD & CG Laboratory, Zhejiang University, Hangzhou, China, in 2015. He is currently a Lecturer with the School of Computer, Wuhan University, and also a Hong Kong Scholar with the Department of Computing, The Hong Kong Polytechnic University, Hung Hom, Hong Kong. His research interests include pattern recognition, image processing, and remote sensing.

Dr. Zhang is a Reviewer of more than 20 international journals, including the IEEE TIP, TNNLS, TMM, and TGRS.

Yenming Mark Lai received the B.S. degree in mathematics from Rice University, Houston, TX, USA. He is currently pursuing the Ph.D. degree in mathematics at the University of Maryland, College Park, College Park, MD, USA.

His research interests include manifold learning of hyperspectral imagery and compressive sensing.