[ieee 2012 7th iapr workshop on pattern recognition in remote sensing (prrs) - tsukuba science city,...



Hierarchical Band Clustering for Hyperspectral Image Analysis

Hongjun Su 1,2, Peijun Du 2, Qian Du 3

1 School of Earth Sciences and Engineering, Hohai University, China

2 Department of Geographical Information Science, Nanjing University, China

3 Department of Electrical and Computer Engineering, Mississippi State University, USA

Email: [email protected]

Abstract

Band clustering is applied to dimensionality reduction of hyperspectral imagery. The proposed method is based on a hierarchical clustering structure, which aims to group bands using an information or similarity measure. Specifically, the distance based on orthogonal projection divergence (OPD) is used as the criterion for clustering. Moreover, unlike unsupervised clustering, which uses all the pixels, or supervised clustering, which requires labeled pixels, the proposed semi-supervised band clustering needs class spectral signatures only. The experimental results show that the proposed algorithm can significantly outperform other existing methods in pixel-based classification tasks.

1. Introduction

Hyperspectral remote sensing provides fine and detailed information through contiguous and narrow spectral channels. Its abundant spectral information offers the potential for accurate object identification. However, its vast data volume causes problems in data transmission and storage. As a commonly used technique for hyperspectral imagery analysis, dimensionality reduction can alleviate these problems. Dimensionality reduction can be achieved by band selection, whose objective is to find a small subset of bands containing important data information. It can also be achieved by a transformation method, where the original high-dimensional data is projected onto a low-dimensional space according to a certain criterion. For instance, the objective of principal component analysis (PCA) is to maximize the variance of the transformed data, whereas that of Fisher's linear discriminant analysis (LDA) is to maximize the class separability given that training samples are known.

Dimensionality reduction methods can be divided into two categories: supervised and unsupervised. Supervised methods aim to preserve the desired object information, which is known a priori, whereas unsupervised methods do not assume any object information. For example, linear prediction and semi-supervised band clustering were employed for band selection in [1-2]; distance and information metrics (e.g., divergence, transformed divergence, mutual information) and eigenanalysis (e.g., PCA) have been applied for band selection [3-5]. Although supervised techniques clearly aim at selecting bands that include important object information, and the selected bands can provide better detection or classification than those from unsupervised techniques, the required prior knowledge may be unavailable in practice.

Unsupervised approaches, which do not need training samples, have also been studied (e.g., band grouping (BG) or band clustering). For instance, adjacent bands can be grouped together, and a representative of each group can be selected to participate in the subsequent data analysis. Intuitively, adjacent bands can be partitioned uniformly (denoted as BG(U)) or based on the spectral correlation coefficient (denoted as BG(CC)) [2]. Clustering algorithms have been applied to hyperspectral remote sensing data analysis. The typical implementation is to cluster pixels based on their spectral signatures so as to spatially segment an image scene into many sub-regions. Another type of implementation operates in the spectral domain; in other words, bands are clustered into several groups based on their similarity. In [6], two clustering methods, Ward's linkage strategy using mutual information (WaLuMI) and Ward's linkage strategy using divergence (WaLuDi), were developed.

Although it may be difficult to obtain enough training samples for each class in practice, it may be possible to have a spectral signature for each class. For instance, bands are clustered using the spectral information divergence (SID) of target signatures in [7] for target detection. In this paper, we investigate semi-supervised clustering that uses class signatures only for the classification task. Here, a class signature is the representative spectrum of a class.

2. Proposed Method

2.1. Hierarchical-based Band Clustering

Hierarchical clustering was applied to band clustering with no labeled information in [6] for hyperspectral imagery; it is performed in a similarity space that is defined among bands. However, the similarity space can also be defined using class signatures among bands; in [7], SID was used to measure the similarity space using hyperspectral image data and spectral signatures of the targets. In this paper, we employ the orthogonal projection divergence (OPD) as the metric. Therefore, the band clustering technique proposed here is based on a hierarchical clustering process that is performed with a similarity metric defined using class signatures.

Hierarchical clustering can group data over a variety of scales by creating a cluster tree or dendrogram, in which different linkage strategies create different tree structures. In this paper, an agglomerative strategy (Ward's linkage method) is adopted for hierarchical clustering; thus, the number of groups is reduced one by one. Hierarchical clustering-based dimensionality reduction can be detailed as follows.

1) With the known class signatures, conduct similarity measurement between each pair of bands.

2) Group the bands into binary clusters by Ward's linkage method; the newly formed clusters are grouped into larger clusters until a hierarchical tree is formed.

3) After creating the hierarchical tree of binary clusters, prune the tree to partition the data into a specified or arbitrary number of clusters.

For Ward's linkage method, suppose that clusters $C_r$ and $C_s$ are merged. The distance between the new cluster $C_{new} = (C_r, C_s)$ and any other cluster $C_k$ can be defined as [7]:

$$D(C_k, C_{new}) = \alpha \, D(C_k, C_r) + \beta \, D(C_k, C_s) + \gamma \, D(C_r, C_s) + \delta \, \left| D(C_k, C_r) - D(C_k, C_s) \right| \qquad (1)$$

where $\alpha$, $\beta$, $\gamma$, and $\delta$ are the merging coefficients. Ward's intercluster distance results from the following coefficients:

$$\alpha = \frac{n_r + n_k}{n_r + n_s + n_k}, \quad \beta = \frac{n_s + n_k}{n_r + n_s + n_k}, \quad \gamma = \frac{-n_k}{n_r + n_s + n_k}, \quad \delta = 0 \qquad (2)$$

where $n_i$ is the number of instances in group $i$. Ward's linkage has the property of producing minimum-variance partitions, so it can form groups without losing important information.
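As a rough illustration of the merging rule in (1) and (2), the following Python sketch runs agglomerative clustering on a precomputed band-to-band distance matrix. The function names (`ward_coefficients`, `lance_williams`, `agglomerate`) are hypothetical, and the values used for alpha and beta are the standard Ward coefficients, which is an assumption wherever the text above is not explicit. The sketch stops merging once the requested number of clusters is reached, which yields the same partition as building the full tree and cutting it at that number of clusters. Note also that Ward's update is strictly justified for squared Euclidean distances; applying it to another dissimilarity is a heuristic generalization.

```python
def ward_coefficients(n_r, n_s, n_k):
    """Merging coefficients (alpha, beta, gamma, delta) for Ward's linkage."""
    total = n_r + n_s + n_k
    return (n_r + n_k) / total, (n_s + n_k) / total, -n_k / total, 0.0


def lance_williams(d_kr, d_ks, d_rs, n_r, n_s, n_k):
    """Distance between cluster k and the merger of clusters r and s."""
    alpha, beta, gamma, delta = ward_coefficients(n_r, n_s, n_k)
    return (alpha * d_kr + beta * d_ks + gamma * d_rs
            + delta * abs(d_kr - d_ks))


def agglomerate(dist, num_clusters):
    """Merge bands until num_clusters remain.

    dist is a symmetric L x L list of lists of pairwise band distances.
    Returns a sorted list of clusters, each a sorted list of band indices.
    """
    if num_clusters < 1:
        raise ValueError("num_clusters must be at least 1")
    size = len(dist)
    clusters = {i: [i] for i in range(size)}
    # Pairwise distances keyed by (smaller id, larger id).
    d = {(i, j): dist[i][j] for i in range(size) for j in range(i + 1, size)}
    next_id = size

    def get(a, b):
        return d[(a, b)] if a < b else d[(b, a)]

    while len(clusters) > num_clusters:
        r, s = min(d, key=d.get)          # closest pair of clusters
        d_rs = d[(r, s)]
        n_r, n_s = len(clusters[r]), len(clusters[s])
        updated = {}
        for k in clusters:
            if k == r or k == s:
                continue
            updated[k] = lance_williams(get(k, r), get(k, s), d_rs,
                                        n_r, n_s, len(clusters[k]))
        # Drop every distance that involves the two merged clusters.
        d = {pair: v for pair, v in d.items()
             if r not in pair and s not in pair}
        for k, v in updated.items():
            d[(k, next_id)] = v           # existing ids are always < next_id
        clusters[next_id] = sorted(clusters.pop(r) + clusters.pop(s))
        next_id += 1
    return sorted(clusters.values())
```

For example, with four one-dimensional points 0, 1, 10, 11 and squared differences as distances, `agglomerate(dist, 2)` groups the first two and the last two points together.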

2.2. Similarity Measure

The main objective of dimensionality reduction is a significant reduction of the redundant information while keeping a high accuracy in classification tasks. For this purpose, we can use information measures that quantify how well a given random variable can predict another one. It should be noted that several distance metrics can be adopted for hierarchical clustering, including Euclidean distance (L2), transformed divergence, Jeffreys-Matusita distance, and spectral cosine (spectral angle). In this paper, we use OPD [8] as the distance criterion (the OPD-based hierarchical clustering is named HCOPD). Let $c_i$ and $c_j$ denote the $i$-th and $j$-th band, respectively. Their OPD value is defined as

$$\mathrm{OPD}(c_i, c_j) = \left( c_i^T P_{c_j}^{\perp} c_i + c_j^T P_{c_i}^{\perp} c_j \right)^{1/2} \qquad (3)$$

where $P_{c_k}^{\perp} = I - c_k (c_k^T c_k)^{-1} c_k^T$ for $k = i, j$, and $I$ is an identity matrix. $P_{c_j}^{\perp}$ is the projector onto the orthogonal subspace of $c_j$, and $c_i^T P_{c_j}^{\perp} c_i$ is the squared norm of the projection of $c_i$ onto $P_{c_j}^{\perp}$. Similarly, $c_j^T P_{c_i}^{\perp} c_j$ is the squared norm of the projection of $c_j$ onto $P_{c_i}^{\perp}$. Two bands with a small OPD value are relatively similar and should be grouped together.
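A minimal Python sketch of the OPD computation follows. It assumes that each band is represented by the vector of its values across the S class signatures, which is an interpretation consistent with the class-signature setting described here rather than something stated explicitly. Because each band is a single vector, the projection term simplifies to c_i^T c_i - (c_i^T c_j)^2 / (c_j^T c_j), so no matrix inverse is needed. The names `opd` and `opd_matrix` are hypothetical.

```python
import math


def opd(c_i, c_j):
    """Orthogonal projection divergence between two band vectors."""
    dot = sum(a * b for a, b in zip(c_i, c_j))
    norm_i = sum(a * a for a in c_i)
    norm_j = sum(b * b for b in c_j)
    if norm_i == 0 or norm_j == 0:
        raise ValueError("band vectors must be non-zero")
    # Squared norm of each vector's residual after projecting onto the other.
    residual_i = norm_i - dot * dot / norm_j
    residual_j = norm_j - dot * dot / norm_i
    # Clamp tiny negative values caused by floating-point rounding.
    return math.sqrt(max(residual_i, 0.0) + max(residual_j, 0.0))


def opd_matrix(signatures):
    """Pairwise OPD between bands.

    signatures is a list of S class signatures, each a list of L band values
    (assumed layout). Returns a symmetric L x L list of lists.
    """
    num_bands = len(signatures[0])
    bands = [[sig[b] for sig in signatures] for b in range(num_bands)]
    return [[0.0 if i == j else opd(bands[i], bands[j])
             for j in range(num_bands)] for i in range(num_bands)]
```

Two bands whose class-signature vectors are proportional get an OPD of zero, while orthogonal unit vectors get the square root of 2.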

2.3. Computational Complexity and Methods for Comparison

The major advantage of the proposed HCOPD method is its low computational cost. Table I lists the computational complexity of different methods during the band clustering process. For the proposed HCOPD, it is about O(LS + L^2 S) + O(L^3), compared to O(LN + L^2 G) + O(L^3) for WaLuDi, where L is the number of bands, S is the number of classes, N is the number of pixels, and G is the number of gray levels. Since S ≪ N, the complexity of HCOPD is much lower than those of WaLuMI and WaLuDi because it uses class signatures only. As for SIDSA, its complexity is the same as that of HCOPD.

Table I Computational complexity of band clustering methods

Methods    Number of Multiplications
HCOPD      O(LS + L^2 S) + O(L^3)
SIDSA      O(LS + L^2 S) + O(L^3)
WaLuMI     O(L^2 N) + O(L^3)
WaLuDi     O(LN + L^2 G) + O(L^3)

* L - the number of bands, G - the number of gray levels, S - the number of classes, and N - the number of pixels.
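To get a feel for the gap between the class-signature and all-pixel methods, the sketch below plugs the Lunar Lake dimensions reported in Section 3.1 (L = 158 bands, S = 6 classes, N = 200 x 200 pixels) into the leading terms for HCOPD and WaLuMI. These are order-of-growth terms with constants dropped, not actual operation counts, so the ratio is only indicative and should not be expected to match measured running times. The function name `term_counts` is hypothetical.

```python
def term_counts(num_bands, num_classes, num_pixels):
    """Leading complexity terms for HCOPD and WaLuMI (constants ignored)."""
    L, S, N = num_bands, num_classes, num_pixels
    hcopd = L * S + L ** 2 * S + L ** 3
    walumi = L ** 2 * N + L ** 3
    return hcopd, walumi


hcopd_terms, walumi_terms = term_counts(158, 6, 200 * 200)
print(hcopd_terms, walumi_terms, round(walumi_terms / hcopd_terms, 1))
```

For HCOPD the L^3 clustering term dominates, whereas for WaLuMI the L^2 N term that depends on the number of pixels dominates.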

For comparison, WaLuMI, WaLuDi, and SIDSA were implemented. In addition, uniform band grouping (BG(U)) and correlation-coefficient band grouping (BG(CC)) were included in the experiments. BG(U), which is based on uniform partition, is one of the subspace methods [1], and it can be considered a simple clustering method. BG(CC) partitions the bands into groups based on their correlation coefficients. In this way, the bands in each group are consecutive bands within the same spectral range.

It should be noted that, instead of using a single band selected from each cluster as in [6-7], the cluster center participates in the following data analysis which has shown better performance in [2].
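As a small illustration of using cluster centers instead of a single selected band, the sketch below reduces one pixel spectrum to one value per cluster. It assumes that the cluster center is the mean of the pixel's values over the bands in that cluster; this definition is an assumption for illustration, and the function name `cluster_centers` is hypothetical.

```python
def cluster_centers(pixel, clusters):
    """Reduce a pixel spectrum to one value per band cluster.

    pixel is a list of L band values; clusters is a list of lists of band
    indices. The center is taken as the mean over the cluster's bands
    (an assumed definition).
    """
    return [sum(pixel[b] for b in bands) / len(bands) for bands in clusters]


# A 4-band pixel reduced to 2 features using two band clusters.
reduced = cluster_centers([1.0, 3.0, 10.0, 20.0], [[0, 1], [2, 3]])
print(reduced)
```

Applying the same reduction to every pixel gives the lower-dimensional image that is passed to the classifier.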

3. Experiments

Two real-data experiments were conducted. Band clustering performance was evaluated in terms of classification accuracy. When training and test samples are available, a support vector machine (SVM) can be applied. If only class signatures are available, then a method that does not require a training process, such as orthogonal subspace projection (OSP), can be used [9]. In the latter case, the classification maps are compared with those obtained using all the original bands, and the similarity is assessed with the spatial correlation coefficient; a larger value of the average spatial correlation coefficient means better performance.
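The evaluation by spatial correlation can be sketched as follows. The sketch assumes that the spatial correlation coefficient is the Pearson correlation between two classification maps of the same class, computed over all pixels, and that the average is taken over classes; both points are assumptions, since the exact definition is not spelled out here. The function names are hypothetical.

```python
import math


def spatial_correlation(map_a, map_b):
    """Pearson correlation between two 2-D maps given as lists of rows."""
    a = [v for row in map_a for v in row]
    b = [v for row in map_b for v in row]
    if not a or len(a) != len(b):
        raise ValueError("maps must be non-empty and of equal size")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant map")
    return cov / math.sqrt(var_a * var_b)


def average_spatial_correlation(maps_all_bands, maps_reduced):
    """Mean per-class correlation between full-band and reduced-band maps."""
    scores = [spatial_correlation(full, reduced)
              for full, reduced in zip(maps_all_bands, maps_reduced)]
    return sum(scores) / len(scores)
```

A value close to 1 indicates that the maps produced from the clustered bands closely match those produced from all the original bands.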

3.1. AVIRIS Lunar Lake Experiment

The Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) data used in this experiment was taken from the Lunar Crater Volcanic Field in Northern Nye County, Nevada. After the water absorption and low-SNR bands were removed, 158 bands were left. The spatial resolution is about 20 m. According to prior information, there are six classes: cinder, playa, rhyolite, shade, vegetation, and an anomaly class. The image scene of size 200 x 200 pixels is shown in Fig. 1.

Fig. 1 An AVIRIS Lunar Lake image scene.

[Plot: x-axis, Cluster Numbers (5 to 15); y-axis ticks from 0.1 to 0.9; curves for HCOPD, WaLuMI, WaLuDi, SIDSA, BG(U), and BG(CC).]

Fig. 2 Comparison with different methods in the Lunar Lake experiment.

Because no training samples are available, OSP was used for classification, and the spatial correlation coefficient with the classification maps obtained using all the original bands was taken as the accuracy measure. As shown in Fig. 2, HCOPD provided the best result, which was better than that of SIDSA using the clusters for classification. With MI and DI as criteria, WaLuMI and WaLuDi were worse than HCOPD, especially when the number of clusters is large. The results of BG(U) fluctuate because of its uniform partitioning.

3.2. Washington DC Mall Experiment

A subscene of the Washington DC Mall collected by the HYDICE sensor was used in this experiment. It has 210 bands and about 3.5 m spatial resolution. After bad band removal, only 191 bands are used. There are seven classes: roof, tree, grass, water, road, path, and shadow. Since training and test samples are known, the SVM-based classification accuracy was computed for evaluation. However, only the class sample means were used for band clustering.

As shown in Fig. 4, HCOPD provided the best result among all the band clustering algorithms. WaLuMI and WaLuDi performed quite similarly, while BG(U) yielded the worst result. Once again, HCOPD outperformed BG(U), BG(CC), SIDSA, WaLuMI, and WaLuDi.

Fig. 3 The Washington DC Mall image scene.

[Plot: x-axis, Cluster Numbers (5 to 15); y-axis ticks from 0.85 to 0.885; curves for HCOPD, WaLuMI, WaLuDi, SIDSA, BG(U), and BG(CC).]

Fig. 4 Comparison with different methods in the DC Mall experiment.

3.3. Computing Time

To further compare the computational complexity beyond Table I, the computing times of the algorithms on a personal computer with a 2.26 GHz CPU and 4.0 GB of memory were recorded and are listed in Table II. HCOPD and SIDSA, which use class signatures, save a significant amount of time compared to WaLuMI and WaLuDi, which use all the pixels. Note that the running time of these algorithms is essentially independent of the number of clusters.

Table II Computing time of different algorithms with the number of clusters k (in seconds)

Data         Method    k = 5     k = 10    k = 15
Lunar Lake   HCOPD     24.69     24.30     26.27
Lunar Lake   SIDSA     24.82     24.50     24.67
Lunar Lake   WaLuMI    182.06    183.50    185.72
Lunar Lake   WaLuDi    177.77    179.90    188.87
DC Mall      HCOPD     78.75     74.66     75.99
DC Mall      SIDSA     76.29     77.40     75.89
DC Mall      WaLuMI    523.21    503.08    475.30
DC Mall      WaLuDi    552.16    485.15    494.46

4. Conclusion

Band clustering has been investigated for hyperspectral dimensionality reduction. The proposed method uses a hierarchical clustering strategy to group nonadjacent bands together, and its performance is better than that of methods grouping adjacent bands only. Unlike unsupervised clustering, which uses all the pixels, the proposed semi-supervised band clustering needs class spectral signatures only, thereby significantly reducing the computational cost. For hierarchical clustering, OPD based on class signatures can be a good choice of similarity metric. The experimental results also showed that the proposed HCOPD method can outperform other similar methods in the classification task.

Acknowledgment

This paper was partially supported by National Natural Science Foundation of China (nos. 40871195, 41171323), the Fundamental Research Funds for the Central Universities (no.2012BOI614).

References

[1] Q. Du and H. Yang, "Similarity-based unsupervised band selection for hyperspectral image analysis," IEEE Geosci. Remote Sens. Lett., 5(4), 564-568, 2008.

[2] H. Su, H. Yang, Q. Du, and Y. Sheng, "Semi-supervised band clustering for dimensionality reduction of hyperspectral imagery," IEEE Geosci. Remote Sens. Lett., 8(6), 1135-1139, 2011.

[3] A. Ifarraguerri, "Visual method for spectral band selection," IEEE Geosci. Remote Sens. Lett., 1(2), 101-106, 2004.

[4] C.-I Chang, Q. Du, T. Sun, and M. L. G. Althouse, "A joint band prioritization and band-decorrelation approach to band selection for hyperspectral image classification," IEEE Trans. Geosci. Remote Sens., 37(6), 2631-2641, 1999.

[5] C.-I Chang, Hyperspectral Imaging: Signal Processing Algorithm Design and Analysis, John Wiley & Sons, 2009.

[6] A. Martínez-Usó, F. Pla, J. M. Sotoca, and P. García-Sevilla, "Clustering-based hyperspectral band selection using information measures," IEEE Trans. Geosci. Remote Sens., 45(12), 4158-4171, 2007.

[7] I. ul Haq, X. Xu, and A. Shahzad, "Band clustering and selection and decision fusion for target detection in hyperspectral imagery," in Proc. ICASSP, 1101-1104, 2009.

[8] C.-I Chang, Hyperspectral Imaging: Techniques for Spectral Detection and Classification, Kluwer Academic/Plenum Publishers, 2003.

[9] J. C. Harsanyi and C.-I Chang, "Hyperspectral image classification and dimensionality reduction: an orthogonal subspace projection approach," IEEE Trans. Geosci. Remote Sens., 32(4), 779-785, 1994.