[ieee 2012 7th iapr workshop on pattern recognition in remote sensing (prrs) - tsukuba science city,...

Hyperspectral Feature Extraction Using Contourlet Transform

Zhiling Long, Qian Du, and Nicolas H. Younan
Department of Electrical and Computer Engineering
Mississippi State University, MS 39762, USA
{long, du, younan}@ece.msstate.edu

Abstract

In this paper, we explore hyperspectral feature extraction using the contourlet transform (CT), a promising multiresolution analysis technique that has emerged in recent years. Hyperspectral imagery is first processed in the spectral domain with decorrelation techniques. The nonsubsampled CT (NSCT) is then applied in the spatial domain, and the resulting NSCT coefficients are used as features for hyperspectral analysis. The spectral processing techniques explored include the one-dimensional discrete wavelet transform, principal component analysis, and band selection. The extracted features are tested in classification using a support vector machine, which yields promising results.

1. Introduction

The contourlet transform (CT) is a two-dimensional (2-D) transform technique that provides a multiscale, multidirectional representation of images [1]. As a multiresolution analysis technique resembling the standard wavelet transform (WT), CT improves on WT mainly through its directionality and flexible structure. It is better at representing the directional details typically observed in images of natural scenes and structures. Since its emergence, CT has been adopted in image processing applications with excellent outcomes. However, the application of CT to hyperspectral image analysis has not yet been adequately studied. In this paper, we explore CT-based feature extraction for hyperspectral imagery. Specifically, the CT features are tested in classification.

Hyperspectral imagery contains both spectral and spatial information. Ideally, an effective feature set should combine both types of information. CT processes images only in the spatial domain. To incorporate spectral characteristics, processing in the spectral domain is performed first. Typical techniques for spectral processing include the one-dimensional (1-D) discrete wavelet transform (DWT) [2] and principal component analysis (PCA) [3]. These techniques usually aim to decorrelate the spectral channels and reduce the dimensionality at the same time. They are also capable of extracting major spectral characteristics. The extracted information is then embedded into the features by applying CT to the spectrally processed data. In this work, we also examine CT in combination with band selection (BS) [4]. Although BS does not involve any spectral transform, it is widely adopted for hyperspectral dimensionality reduction. Figure 1 illustrates our approach for CT-based feature extraction.

[Figure: Hyperspectral bands → spectral processing (spectral domain) → spectrally processed bands → NSCT (spatial domain) → NSCT subbands → analysis]

Fig. 1. CT-based feature extraction.

2. Decorrelation/Dimensionality Reduction in the Spectral Domain

2.1. 1-D wavelet transform

The wavelet transform is the standard multiresolution analysis technique. For spectral-domain processing, the 1-D discrete wavelet transform (DWT) is adopted [2]. With 1-D DWT, a spectral signal can be decomposed into multiple scale levels. At each scale, a high-pass filter is applied to generate a detail signal (downsampled by 2). Similarly, a low-pass filter is applied to obtain a downsampled approximation signal. The approximation at scale n is passed to scale n+1 as the signal to be decomposed at that coarser scale. Figure 2 gives an example of a 2-level decomposition using DWT.
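The scale-by-scale recursion just described can be sketched with the Haar ('db1') wavelet, the mother wavelet later used for WTzCTxy. This is a minimal illustration under assumptions, not the authors' implementation: the function name `haar_dwt_1d` is hypothetical, sign conventions may differ from toolbox implementations, and the output ordering (coarsest approximation first, then details from coarse to fine) is one way to realize the spectral rearrangement of Figure 4. For a 200-band signature and 3 levels, the components have lengths 25, 25, 50, and 100, so taking the first 25, 50, 100, or 200 entries gives nested subsets.

```python
import numpy as np


def haar_dwt_1d(signal, levels):
    """Multi-level 1-D Haar ('db1') DWT of one spectral signature.

    Returns [A_levels, D_levels, ..., D_1]: the coarsest approximation first,
    followed by the detail signals from the coarsest to the finest scale.
    """
    approx = np.asarray(signal, dtype=float)
    details = []
    for _ in range(levels):
        if approx.size % 2:
            raise ValueError("signal length must be divisible by 2**levels")
        even, odd = approx[0::2], approx[1::2]
        # High-pass branch: detail signal, downsampled by 2
        details.append((even - odd) / np.sqrt(2.0))
        # Low-pass branch: downsampled approximation, which becomes the input to the next scale
        approx = (even + odd) / np.sqrt(2.0)
    return [approx] + details[::-1]


# A 200-band signature decomposed to 3 levels
spectrum = np.linspace(0.1, 0.9, 200)
components = haar_dwt_1d(spectrum, levels=3)
print([c.size for c in components])  # [25, 25, 50, 100]
rearranged = np.concatenate(components)  # rearranged spectral dimensions, length 200
```

Because the Haar filters are orthonormal here, the total energy of the components equals that of the input signature.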

[Figure: 2-level 1-D DWT decomposition tree; outputs include the Level 2 Approx. and Level 2 Detail signals]

Fig. 2. 1-D DWT with a 2-level decomposition.

2.2. Principal component analysis

PCA transforms a given data set into a new space spanned by a set of uncorrelated principal components (PCs) [3]. The PCs are ranked by data variance, with the first PC corresponding to the greatest variance. Consider a data set $\{\mathbf{x}_n, n=1,2,\ldots,N\}$, where $\mathbf{x}_n$ is an $L\times 1$ vector. Denote the sample mean as $\mathbf{m}$ and the covariance matrix as $\mathbf{\Sigma}$. To perform PCA, an eigenanalysis is applied to $\mathbf{\Sigma}$. The eigenvalues are then sorted in descending order, and the eigenvectors are rearranged according to the sorted eigenvalues. After this rearrangement, the eigenvectors are the PCs, and the eigenvalues are the associated variances. Let $\mathbf{V}=[\mathbf{v}_1,\mathbf{v}_2,\ldots,\mathbf{v}_d]$ be the set of eigenvectors, each of size $L\times 1$, and let $\mathbf{\Lambda}=\mathrm{diag}\{\lambda_1,\lambda_2,\ldots,\lambda_d\}$ be the set of eigenvalues. The original data set is then transformed as follows:

$$\mathbf{x}_n^{\mathrm{PCA}} = \mathbf{\Lambda}^{-1/2}\mathbf{V}^{T}(\mathbf{x}_n - \mathbf{m}) \qquad (1)$$
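The eigenanalysis and the transform in Eq. (1) can be sketched with NumPy as follows. This is an illustrative sketch, not the authors' code: the function name `pca_transform` and the optional `d` argument (number of leading PCs kept) are assumptions. Pixels are stored as rows, so the per-pixel product $\mathbf{\Lambda}^{-1/2}\mathbf{V}^{T}(\mathbf{x}_n-\mathbf{m})$ becomes a matrix product followed by a column-wise division.

```python
import numpy as np


def pca_transform(X, d=None):
    """Whitened PCA of Eq. (1): x_pca = Lambda^{-1/2} V^T (x - m).

    X: (N, L) array with one L-dimensional spectral pixel per row.
    d: number of leading PCs to keep (all L when None).
    """
    X = np.asarray(X, dtype=float)
    m = X.mean(axis=0)                          # sample mean
    sigma = np.cov(X, rowvar=False)             # L x L sample covariance
    eigvals, eigvecs = np.linalg.eigh(sigma)    # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]           # re-sort in descending order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    if d is not None:
        eigvals, eigvecs = eigvals[:d], eigvecs[:, :d]
    # Row-wise Lambda^{-1/2} V^T (x_n - m)
    return (X - m) @ eigvecs / np.sqrt(eigvals)


rng = np.random.default_rng(0)
pixels = rng.normal(size=(500, 5)) @ rng.normal(size=(5, 5))  # correlated toy "bands"
pcs = pca_transform(pixels)
print(pcs.shape)  # (500, 5)
```

The $\mathbf{\Lambda}^{-1/2}$ factor whitens the output, so the transformed data have identity sample covariance; in practice, eigenvalues close to zero would need to be discarded or regularized before the division.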

2.3. Band selection


In this work, we employ the similarity-based unsupervised BS scheme in [4]. The most distinctive and informative bands are selected by identifying the most dissimilar bands. The selection starts from an initial set $\Phi = \{B_1, B_2\}$, where $B_1$ and $B_2$ are the most dissimilar pair among all the bands. Suppose $B'$ is the linear prediction (LP) estimate of a band $B$ using $B_1$ and $B_2$; then

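$$B' = a_0 + a_1 B_1 + a_2 B_2 \qquad (2)$$

where

$$\mathbf{a} = [a_0, a_1, a_2]^{T} = (\mathbf{X}^{T}\mathbf{X})^{-1}\mathbf{X}^{T}\mathbf{y} \qquad (3)$$

Here $\mathbf{X}$ is an $N\times 3$ matrix whose first column is all 1's, whose second column consists of all pixels in $B_1$, and whose third column consists of all pixels in $B_2$; $\mathbf{y}$ is an $N\times 1$ vector formed with all pixels from $B$. The LP error is then determined by

$$\mathrm{Error} = \|B - B'\| \qquad (4)$$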

Searching through all the remaining bands, the band with the maximum LP error is identified as a third band $B_3$ and joined to the initial pair to form an updated set $\Phi = \{B_1, B_2, B_3\}$. This procedure is repeated for the remaining bands until the number of selected bands reaches the required number. The procedure for determining the initial band pair can be found in [4].
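The greedy LP search can be sketched as below. This is a minimal sketch under stated assumptions: the initial dissimilar pair is taken as given (its selection follows [4] and is not reproduced), the function name `lp_band_selection` is hypothetical, and after the first step the predictor is assumed to use every band already in $\Phi$ (so $\mathbf{X}$ gains one column per selected band). The least-squares solve computes the coefficients of Eq. (3) without forming $(\mathbf{X}^T\mathbf{X})^{-1}$ explicitly.

```python
import numpy as np


def lp_band_selection(data, initial_pair, n_select):
    """Greedy LP-based band selection (sketch).

    data: (N, L) array of N pixels by L bands.
    initial_pair: indices of the most dissimilar band pair (assumed already known).
    n_select: total number of bands to select.
    """
    data = np.asarray(data, dtype=float)
    n_pixels, n_bands = data.shape
    selected = list(initial_pair)
    while len(selected) < n_select:
        # Design matrix X: a column of ones plus every band currently in Phi
        X = np.column_stack([np.ones(n_pixels)] + [data[:, k] for k in selected])
        best_band, best_err = None, -1.0
        for b in range(n_bands):
            if b in selected:
                continue
            y = data[:, b]
            # Least-squares coefficients a (Eq. (3)), prediction B' = X a (Eq. (2))
            a, *_ = np.linalg.lstsq(X, y, rcond=None)
            err = np.linalg.norm(y - X @ a)  # Error = ||B - B'|| (Eq. (4))
            if err > best_err:
                best_band, best_err = b, err
        selected.append(best_band)  # the least predictable band is the most dissimilar
    return selected


rng = np.random.default_rng(1)
b0, b1, b3 = rng.normal(size=(3, 1000))
b2 = 0.5 + 2.0 * b0 - b1  # exactly predictable from b0 and b1
bands = np.column_stack([b0, b1, b2, b3])
print(lp_band_selection(bands, (0, 1), 3))  # [0, 1, 3]
```

In the toy data, band 2 is a linear combination of the initial pair and has essentially zero LP error, so the independent band 3 is selected first.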

3. Contourlet Transform-based Feature Extraction in the Spatial Domain

CT is also known as the pyramidal directional filter bank [1]. It consists of two filter banks. First, a filter bank known as the Laplacian pyramid is used to generate a multiscale representation of an input image. Subsequently, the subband images from the multiscale decomposition are processed by a directional filter bank to reveal directional details at each scale level. The output values of the directional filter bank are the contourlet coefficients. CT subsamples each subband before applying the directional filter bank, whereas a variant of CT, the nonsubsampled contourlet transform (NSCT), does not include this subsampling [5], as shown in Figure 3. In this work, NSCT (instead of regular CT) is used for feature extraction.

CT provides a multiscale directional representation of images. Its directional filter bank is easily adjusted to have any number ($2^n$) of directions for detecting fine details in nearly any orientation. Also, its basis functions have elongated supports rather than the square supports of 2-D wavelets, which makes it more efficient in describing curvature details along smooth contours. Further, decoupling the multiscale decomposition from the directional decomposition provides a flexible structure, because the number of directional subbands can vary across scale levels. This flexibility is a major difference between CT and other multiscale directional techniques.
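The two-stage, nonsubsampled structure can be illustrated with a simplified stand-in, shown below. This is not the NSCT of [5], which uses specifically designed nonsubsampled pyramid and directional filter banks. Here, as an assumption for illustration only, a dilated ("à trous") B3-spline low-pass kernel plays the role of the nonsubsampled pyramid, and ideal angular wedges in the 2-D DFT domain play the role of the directional filter bank. All function names are hypothetical. What the sketch does share with NSCT is that no stage subsamples, so every subband has the size of the input, and one decomposition yields one low-pass subband plus n_dir directional subbands per scale.

```python
import numpy as np


def _atrous_smooth(img, level):
    """Separable B3-spline low-pass filter dilated by 2**level (no subsampling)."""
    base = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0
    step = 2 ** level
    kernel = np.zeros((base.size - 1) * step + 1)
    kernel[::step] = base  # insert zeros ("holes") between taps
    r = kernel.size // 2
    out = img
    for axis in (0, 1):
        pad = [(r, r) if a == axis else (0, 0) for a in (0, 1)]
        padded = np.pad(out, pad, mode="symmetric")
        # 'valid' convolution of the padded rows/columns restores the original size
        out = np.apply_along_axis(np.convolve, axis, padded, kernel, mode="valid")
    return out


def _directional_split(band, n_dir):
    """Split a band into n_dir orientation subbands with ideal angular wedges in the DFT domain."""
    h, w = band.shape
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    theta = np.mod(np.arctan2(fy, fx), np.pi)  # frequency orientation in [0, pi)
    wedge = np.minimum((theta / (np.pi / n_dir)).astype(int), n_dir - 1)
    spectrum = np.fft.fft2(band)
    # The wedges partition the frequency plane, so the subbands sum back to the band
    return [np.fft.ifft2(spectrum * (wedge == k)).real for k in range(n_dir)]


def nsct_like(img, n_scale, n_dir):
    """Return [low-pass, directional subbands of each scale]; all have img's shape."""
    current = np.asarray(img, dtype=float)
    subbands = []
    for j in range(n_scale):
        low = _atrous_smooth(current, j)
        subbands.extend(_directional_split(current - low, n_dir))  # band-pass -> n_dir subbands
        current = low
    return [current] + subbands


image = np.random.default_rng(0).normal(size=(32, 32))
coeffs = nsct_like(image, n_scale=3, n_dir=2)
print(len(coeffs))  # 7
```

With n-scale = 3 and n-dir = 2, this gives 1 + 3 × 2 = 7 subbands of size 32×32, and summing all subbands recovers the input image exactly.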

Combining NSCT with each of the three spectral processing techniques (1-D DWT, PCA, and BS), we obtain three feature extraction schemes, denoted WTzCTxy, PCAzCTxy, and BSzCTxy, respectively. For WTzCTxy, after DWT along the spectral axis (z-axis), NSCT is performed in the x-y plane for each selected spectral dimension, yielding a group of subbands with the same spatial resolution. The NSCT coefficients from all NSCT subbands are then used to form the feature vector. PCAzCTxy and BSzCTxy features are extracted following similar procedures.
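Assembling the per-pixel feature vectors amounts to decomposing each selected spectral dimension and stacking every subband coefficient at a pixel location. The sketch below assumes a hypothetical `decompose` callable (for example, an NSCT implementation) that maps an H×W image to a list of H×W subbands; the column ordering is an arbitrary choice. With the s-3-d-2 setting (7 subbands per decomposition) and 200 selected dimensions, each pixel would receive 200 × 7 = 1400 features; in the experiments, only the labeled pixels would then be passed to the classifier.

```python
import numpy as np


def build_features(cube, decompose, n_dims):
    """Stack the spatial subbands of the first n_dims spectral dimensions per pixel.

    cube: (H, W, D) spectrally processed data (DWT, PCA, or band-selection output),
          ordered so that the first n_dims dimensions are the ones to keep.
    decompose: hypothetical callable mapping an (H, W) image to a list of (H, W) subbands.
    Returns an (H*W, n_dims * n_subbands) feature matrix, one row per pixel.
    """
    H, W, _ = cube.shape
    columns = []
    for k in range(n_dims):
        for subband in decompose(cube[:, :, k]):
            columns.append(subband.reshape(H * W))  # row-major pixel order
    return np.stack(columns, axis=1)


# Toy stand-in decomposition producing 3 "subbands" per image
toy_decompose = lambda img: [img, 2.0 * img, 3.0 * img]
toy_cube = np.arange(4 * 5 * 6, dtype=float).reshape(4, 5, 6)
features = build_features(toy_cube, toy_decompose, n_dims=2)
print(features.shape)  # (20, 6)
```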

4. Experiment

We test the CT-based features in hyperspectral classification. The data used is a 220-band hyperspectral image of the Indian Pine Test Site 3, acquired by the AVIRIS sensor in 1992. Each spectral image is 145×145 pixels. The ground truth includes 16 classes. In our experiments, we use the version from an online database [6], which has 200 spectral bands, with bands 104-108, 150-163, and 220 removed as water absorption bands.

For the classification experiments, we use all the labeled pixels in the image, 10249 in total (49% of all pixels). We split them evenly into two sets and perform two-fold cross-validation. We use a support vector machine (SVM) for classification, as implemented in the LIBSVM software [7].
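The two-fold protocol can be sketched with scikit-learn, whose `SVC` class is built on LIBSVM. The kernel (RBF), the value of `C`, and the stratified shuffled split are assumptions made for illustration; the paper does not report its SVM parameters or how the even split was drawn. Because every labeled pixel is tested exactly once, the overall accuracy is the total number of correct predictions divided by the number of labeled pixels.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC


def two_fold_svm_accuracy(features, labels, C=100.0, seed=0):
    """Overall accuracy (%) of an RBF-kernel SVM under two-fold cross-validation."""
    folds = StratifiedKFold(n_splits=2, shuffle=True, random_state=seed)
    correct = 0
    for train_idx, test_idx in folds.split(features, labels):
        clf = SVC(kernel="rbf", C=C, gamma="scale")  # assumed parameters
        clf.fit(features[train_idx], labels[train_idx])
        correct += int(np.sum(clf.predict(features[test_idx]) == labels[test_idx]))
    return 100.0 * correct / len(labels)


# Toy example: two well-separated classes
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (100, 4)), rng.normal(8.0, 1.0, (100, 4))])
y = np.array([0] * 100 + [1] * 100)
print(two_fold_svm_accuracy(X, y))
```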

NSCT parameters include the number of scales (denoted here as "n-scale"), the number of directions at each scale ("n-dir"), the type of filter for the Laplacian pyramid ("LP-filter"), and the filter for the directional filter bank ("D-filter"). In general, the choice of n-scale and n-dir has a greater impact than that of LP-filter or D-filter. Thus, in this study, we investigate only the former two parameters.

For feature scheme WTzCTxy, the 'db1' mother wavelet is used for a 3-level decomposition of the 200-band data. Figure 4 shows the spectral rearrangement after the decomposition. In our experiments, we select the first 25, 50, 100, and 200 dimensions (after spectral transform or band selection), respectively, to form features covering different levels of detail. For each selected dimension, we apply NSCT with various choices of n-scale and n-dir. Table 1 presents the overall classification accuracy for WTzCTxy features. In this table, an NSCT scheme "s-n-d-m" refers to an NSCT with n-scale = n and n-dir = m at each scale. NSCT can be configured with different n-dir values at different scales, which we do not explore further here.
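The "s-n-d-m" labels translate directly into subband counts and feature lengths. The helper names below are hypothetical, and the count assumes, as the subband numbers in Table 5 indicate, that each decomposition contributes one low-pass subband plus n-dir directional subbands at each of the n-scale levels.

```python
import re


def parse_nsct_scheme(scheme):
    """Parse an 's-n-d-m' label into (n_scale, n_dir)."""
    match = re.fullmatch(r"s-(\d+)-d-(\d+)", scheme)
    if match is None:
        raise ValueError(f"not an 's-n-d-m' scheme: {scheme!r}")
    return int(match.group(1)), int(match.group(2))


def n_subbands(scheme):
    """Subbands per NSCT: one low-pass subband plus n_dir directional subbands per scale."""
    n_scale, n_dir = parse_nsct_scheme(scheme)
    return 1 + n_scale * n_dir


def feature_length(scheme, n_dims):
    """Per-pixel feature length when each of n_dims spectral dimensions is decomposed."""
    return n_dims * n_subbands(scheme)


print(n_subbands("s-3-d-2"))           # 7
print(feature_length("s-3-d-8", 200))  # 5000
```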

[Figure: spectral components after rearrangement, of lengths N=25, N=25, N=50, and N=100]

Figure 4. Illustration of spectral components after 1-D DWT (3-level).

Table 1. Classification accuracy (%) for WTzCTxy.

NSCT scheme   Selected spectral dimension
              25      50      100     200
s-1-d-2       92.08   92.44   91.02   89.45
s-1-d-4       89.55   89.15   86.62   83.15
s-1-d-8       85.75   84.33   80.53   75.43
s-2-d-2       95.66   96.33   96.94   97.74
s-2-d-4       94.00   95.29   95.31   95.66
s-2-d-8       92.74   93.12   92.62   91.26
s-3-d-2       97.59   98.50   98.79   99.26
s-3-d-4       97.33   98.04   98.33   98.87
s-3-d-8       97.09   97.79   98.11   98.38

According to Table 1, the performance of WTzCTxy increases significantly as n-scale increases from 1 to 3. For a fixed n-scale, the accuracy decreases as n-dir increases, and the drop is marked for 1- and 2-scale NSCTs. The impact of the selected spectral dimension is mixed. For a 1-scale NSCT, a greater dimension is generally associated with reduced accuracy; in contrast, for a 3-scale NSCT, a greater dimension enhances the classification performance. For each selected dimension, the best performance is always achieved with the 3-scale 2-direction NSCT. The overall best accuracy (99.26%) occurs with a dimension of 200, which means that all data are included.
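The observations above can be checked mechanically against Table 1. The snippet below copies the Table 1 accuracies verbatim; the helper names are hypothetical. It finds the best scheme for each selected dimension, verifies that accuracy strictly decreases with n-dir at every n-scale and dimension, and locates the overall best entry.

```python
# Overall accuracy (%) copied from Table 1; columns follow DIMS
DIMS = [25, 50, 100, 200]
TABLE1 = {
    "s-1-d-2": [92.08, 92.44, 91.02, 89.45],
    "s-1-d-4": [89.55, 89.15, 86.62, 83.15],
    "s-1-d-8": [85.75, 84.33, 80.53, 75.43],
    "s-2-d-2": [95.66, 96.33, 96.94, 97.74],
    "s-2-d-4": [94.00, 95.29, 95.31, 95.66],
    "s-2-d-8": [92.74, 93.12, 92.62, 91.26],
    "s-3-d-2": [97.59, 98.50, 98.79, 99.26],
    "s-3-d-4": [97.33, 98.04, 98.33, 98.87],
    "s-3-d-8": [97.09, 97.79, 98.11, 98.38],
}


def best_scheme_per_dim(table, dims):
    """Map each selected dimension to the scheme with the highest accuracy."""
    return {dim: max(table, key=lambda s: table[s][i]) for i, dim in enumerate(dims)}


def accuracy_drops_with_n_dir(table, n_scale, dir_values=(2, 4, 8)):
    """True if accuracy strictly decreases as n-dir grows, for every dimension."""
    rows = [table[f"s-{n_scale}-d-{m}"] for m in dir_values]
    return all(a > b for upper, lower in zip(rows, rows[1:]) for a, b in zip(upper, lower))


overall_best = max((acc, s, dim) for s, row in TABLE1.items() for dim, acc in zip(DIMS, row))
print(best_scheme_per_dim(TABLE1, DIMS))
print(overall_best)  # (99.26, 's-3-d-2', 200)
```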

Tables 2 and 3 provide the results for feature schemes PCAzCTxy and BSzCTxy, respectively. For consistency, the experiments are also performed with the first 25, 50, 100, and 200 dimensions for each type of feature. For both schemes, the trends with respect to n-scale and n-dir are the same as for WTzCTxy. However, the impact of the selected dimension is different. For PCAzCTxy, the accuracy increases with the dimension, except for 1-scale NSCT with 200 dimensions. For BSzCTxy, the performance consistently improves as the dimension increases in all cases. Again, the 3-scale 2-direction NSCT provides the best performance for both feature schemes.

Table 4 compares the three CT-based feature extraction schemes, each with its best NSCT setting. They are also compared with the counterpart designs that do not involve CT (WTz, PCAz, and BSz). The CT-based features perform significantly better than the non-CT ones. Among the CT-based features, PCAzCTxy performs best, probably because PCA fully decorrelates the spectral dimensions.

Table 2. Classification accuracy for PCAzCTxy.



Table 3. Classification accuracy for BSzCTxy.



Table 4. Comparison of classification accuracy (%) between feature schemes.

Feature scheme   Selected spectral dimension
                 25      50      100     200
WTzCTxy          97.59   98.50   98.79   99.26
PCAzCTxy         98.89   99.56   99.77   99.94
BSzCTxy          97.50   97.85   98.15   98.28
WTz              92.04   91.48   89.89   89.14
PCAz             84.67   85.33   87.97   87.95
BSz              80.61   82.86   87.17   91.71

Table 5. Comparison of classification accuracy (%) between PCAzCTxy and PCAzWTxy.

NSCT/WT scheme   # of subbands   Selected dimension
                                 25      50      100     200
CT: s-1-d-2      3               87.01   88.85   92.29   91.91
WT: s-1          4               85.39   86.78   90.00   87.82
CT: s-2-d-2      5               94.91   96.50   98.32   99.06
WT: s-2          7               93.30   95.25   97.47   98.32
CT: s-3-d-2      7               98.92   99.54   99.75   99.94
WT: s-3          10              98.94   99.31   99.55   99.91

Finally, we compare CT with WT, the standard multiresolution analysis technique, in spatial-domain feature extraction. We apply the 2-D nonsubsampled (or undecimated) WT to the PCA-transformed spectral data to form a PCAzWTxy feature analogous to the best PCAzCTxy scheme (s-3-d-2). For the 2-D WT, the number of detail subbands at each scale is fixed at three. Including the coarsest-level approximation (as the CT feature does), the number of WT subbands obtained from each decomposition is 1 + 3 × n-scale. Table 5 presents the comparison. When n-scale is low (1 or 2), PCAzCTxy yields better accuracies than PCAzWTxy. For the 3-scale decomposition, PCAzCTxy is only slightly better in accuracy (and marginally worse at 25 dimensions), but it uses fewer subbands (7 versus 10). Note that for the experiments in Table 5, we reduce the original data size from 145×145×200 to 144×144×200 by removing the last row and column of each channel, so that the data meet the requirement of the MATLAB function for the 2-D nonsubsampled WT ("swt2"). The removed row and column do not contain any labeled samples.
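The cropping step and the WT subband count can be sketched as follows. The sketch assumes that the size constraint of the stationary (undecimated) 2-D WT is that each spatial dimension be divisible by 2^level; under that assumption, 145 is cropped to 144 for a 3-level decomposition. The function names are hypothetical, and the cropping simply drops trailing rows and columns, as described above.

```python
import numpy as np


def crop_for_swt(cube, level):
    """Drop trailing rows/columns so both spatial sizes are multiples of 2**level (assumed constraint)."""
    block = 2 ** level
    h, w = cube.shape[:2]
    return cube[: h - h % block, : w - w % block]


def n_wt_subbands(n_scale):
    """2-D WT subbands per decomposition: coarsest approximation + 3 details per scale."""
    return 1 + 3 * n_scale


cube = np.zeros((145, 145, 2))  # stand-in for the 145x145x200 data (2 channels here)
print(crop_for_swt(cube, level=3).shape)       # (144, 144, 2)
print([n_wt_subbands(n) for n in (1, 2, 3)])   # [4, 7, 10]
```

The subband counts 4, 7, and 10 match the WT rows of Table 5.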

5. Conclusion

In this paper, we explored contourlet-based feature extraction for hyperspectral imagery. By combining spectral processing with NSCT in the spatial domain, we obtained useful features for hyperspectral classification. Our study indicated that increasing the number of scales of the contourlet decomposition improves classification accuracy. In our experiments, NSCT worked best when preceded by PCA for spectral-domain decorrelation. The study also showed that, within the presented processing framework, NSCT outperforms the widely used 2-D WT in spatial feature extraction.

References

[1] M. N. Do and M. Vetterli, "The contourlet transform: an efficient directional multiresolution image representation," IEEE Trans. Image Proc., 14(12):2091-2106, Dec. 2005.

[2] L. M. Bruce, C. H. Koger, and J. Li, "Dimensionality reduction of hyperspectral data using discrete wavelet transform feature extraction," IEEE Trans. Geo. Rem. Sens., 40(10):2331-2338, Oct. 2002.

[3] C.-I Chang and Q. Du, "Interference and noise-adjusted principal components analysis," IEEE Trans. Geo. Rem. Sens., 37(5):2387-2396, Sep. 1999.

[4] Q. Du and H. Yang, "Similarity-based unsupervised band selection for hyperspectral image analysis," IEEE Geo. Rem. Sens. Let., 5(4):564-568, Oct. 2008.

[5] A. L. Cunha, J. Zhou, and M. N. Do, "The nonsubsampled contourlet transform: theory, design, and applications," IEEE Trans. Image Proc., 15(10):3089-3101, Oct. 2006.

[6] "Hyperspectral remote sensing scenes," database at http://www.ehu.es/ccwintco/index.php/Hyperspectral_Remote_Sensing_Scenes.

[7] C.-C. Chang and C.-J. Lin, "LIBSVM: a library for support vector machines," ACM Trans. Intel. Sys. Tech., 2:27:1-27:27, 2011. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
