research article deep convolutional neural networks for hyperspectral image...
Post on 28-May-2020
4 Views
Preview:
TRANSCRIPT
Research ArticleDeep Convolutional Neural Networks forHyperspectral Image Classification
Wei Hu1 Yangyu Huang1 Li Wei1 Fan Zhang1 and Hengchao Li23
1College of Information Science and Technology Beijing University of Chemical Technology Beijing 10029 China2Sichuan Provincial Key Laboratory of Information Coding and Transmission Southwest Jiaotong University Chengdu 610031 China3Department of Aerospace Engineering Sciences University of Colorado Boulder CO 80309 USA
Correspondence should be addressed to Wei Hu thuweihuqqcom
Received 23 November 2014 Accepted 22 January 2015
Academic Editor Tianfu Wu
Copyright copy 2015 Wei Hu et al This is an open access article distributed under the Creative Commons Attribution License whichpermits unrestricted use distribution and reproduction in any medium provided the original work is properly cited
Recently convolutional neural networks have demonstrated excellent performance on various visual tasks including theclassification of common two-dimensional images In this paper deep convolutional neural networks are employed to classifyhyperspectral images directly in spectral domain More specifically the architecture of the proposed classifier contains five layerswith weights which are the input layer the convolutional layer the max pooling layer the full connection layer and the outputlayer These five layers are implemented on each spectral signature to discriminate against others Experimental results based onseveral hyperspectral image data sets demonstrate that the proposed method can achieve better classification performance thansome traditional methods such as support vector machines and the conventional deep learning-based methods
1 Introduction
The hyperspectral imagery (HSI) [1] is acquired by remotesensers which are characterized in hundreds of observa-tion channels with high spectral resolution Taking advan-tages of the rich spectral information numerous traditionalclassification methods such as 119896-nearest-neighbors (119896-nn)minimum distance and logistic regression [2] have beendeveloped Recently some more effective feature extractionmethods as well as advanced classifiers were proposed suchas spectral-spatial classification [3] and local Fisher discrim-inant analysis [4] In the current literatures support vectormachine (SVM) [5 6] has been viewed as an efficient andstablemethod for hyperspectral classification tasks especiallyfor the small training sample sizes SVM seeks to separatetwo-class data by learning an optimal decision hyperplanewhich best separates the training samples in a kernel-included high-dimensional feature space Some extensions ofSVM in hyperspectral image classification were presented toimprove the classification performance [3 7 8]
Neural networks (NN) such as multilayer perceptron(MLP) [9] and radial basis function (RBF) [10] neural net-works have already been investigated for classification ofremote sensing data In [11] the authors proposed a semi-supervised neural network framework for large-scale HSIclassification Actually in remote sensing classification tasksSVM is superior to the traditional NN in terms of classi-fication accuracy as well as computational cost In [12] adeeper architecture of NN has been considered a powerfulmodel for classification whose classification performance iscompetitive to SVM
Deep learning-based methods achieve promising perfor-mance in many fields In deep learning the convolutionalneural networks (CNNs) [12] play a dominant role for pro-cessing visual-related problems CNNs are biologically-inspired and multilayer classes of deep learning models thatuse a single neural network trained end to end from rawimage pixel values to classifier outputs The idea of CNNswas firstly introduced in [13] improved in [14] and refinedand simplified in [15 16] With the large-scale sources
Hindawi Publishing CorporationJournal of SensorsVolume 2015 Article ID 258619 12 pageshttpdxdoiorg1011552015258619
2 Journal of Sensors
of training data and efficient implementation on GPUsCNNs have recently outperformed some other conventionalmethods even human performance [17] on many vision-related tasks including image classification [18 19] objectdetection [20] scene labeling [21] house number digitclassification [22] and face recognition [23] Besides visiontasks CNNs have been also applied to other areas such asspeech recognition [24 25] The technique has been verifiedas an effective class of models for understanding visualimage content giving some state-of-the-art results on visualimage classification and other visual-related problems In[26] the authors presented DNN for HSI classification inwhich stacked autoencoders (SAEs) were employed to extractdiscriminative features
CNNs have been demonstrated to provide even betterclassification performance than the traditional SVM classi-fiers [27] and the conventional deep neural networks (DNNs)[18] in visual-related area However since CNNs have beenonly considered on visual-related problems there are rareliteratures on the technique with multiple layers for HSIclassification In this paper we have found that CNNs canbe effectively employed to classify hyperspectral data afterbuilding appropriate layer architecture According to ourexperiments we observe that the typical CNNs such asLeNet-5 [14] with two convolutional layers are actually notapplicable for hyperspectral data Alternatively we present asimple but effective CNN architecture containing five layerswith weights for supervised HSI classification Several exper-iments demonstrate excellent performance of our proposedmethod compared to the classic SVM and the conventionaldeep learning architecture As far as we know it is the firsttime to employ the CNN with multiple layers for HSI classi-fication
The paper is organized as follows In Section 2 we givea brief introduction to CNNs In Section 3 the typicalCNN architecture and the corresponding training processare presented In Section 4 we experimentally compare theperformance of our method with SVM and some neuralnetworks with different architectures Finally we conclude bysummarizing our results in Section 5
2 CNNs
CNNs represent feed-forward neural networks which consistof various combinations of the convolutional layers maxpooling layers and fully connected layers and exploit spatiallylocal correlation by enforcing a local connectivity patternbetween neurons of adjacent layers Convolutional layersalternate with max pooling layers mimicking the nature ofcomplex and simple cells in mammalian visual cortex [28]A CNN consists of one or more pairs of convolution andmaxpooling layers and finally ends with a fully connected neuralnetworks A typical convolutional network architecture isshown in Figure 1 [24]
In ordinary deep neural networks a neuron is connectedto all neurons in the next layer CNNs are different fromordinary NN in that neurons in convolutional layer are onlysparsely connected to the neurons in the next layer basedon their relative location That is to say in a fully connected
Filter size
Pooling size
Input
Shared weights
Convolutional layer
Max pooling layer
Fully connected layer
h1 h2 h3 h4 h5 h6
W
V 1 2 3 4 5 6
Figure 1 A typical CNN architecture consisting of a convolutionallayer a max pooling layer and a fully connected layer
DNNs each hidden activation ℎ119894is computed by multiplying
the entire input V by weights W in that layer However inCNNs each hidden activation is computed by multiplyinga small local input against the weights W The weights Ware then shared across the entire input space as shown inFigure 1 Neurons that belong to the same layer share the sameweights Weight sharing is a critical principle in CNNs sinceit helps reduce the total number of trainable parameters andleads to more efficient training and more effective model Aconvolutional layer is usually followed by amax pooling layer
Due to the replication of weights in a CNN a feature maybe detected across the input data If an input image is shiftedthe neuron detecting the feature is shifted as much Pooling isused to make the features invariant from the location and itsummarizes the output of multiple neurons in convolutionallayers through a pooling function Typical pooling functionis maximum A max pooling function basically returns themaximum value from the input Max pooling partitions theinput data into a set of nonoverlapping windows and outputsthe maximum value for each subregion and reduces thecomputational complexity for upper layers and provides aform of translation invariance To be used for classificationthe computation chain of a CNN ends in a fully connectednetwork that integrates information across all locations in allthe feature maps of the layer below
Most of CNNs working in image recognition have thelower layers composed to alternate convolutional and maxpooling layers while the upper layers are fully connectedtraditional MLP NNs For example LeNet-5 is such a CNNarchitecture presented for handwritten digit recognition [14]firstly and then it is successfully used for solving other visual-related problems However LeNet-5 might not be directlyemployed for HSI classification especially for small-size datasets according to our experiments in Section 4 In this paperwe will explore what is the suitable architecture and strategyfor CNN-based HSI classification
Journal of Sensors 3
600050004000300020001000
00 50 100
(a) Asphalt
600050004000300020001000
00 50 100
(b) Meadows
600050004000300020001000
00 50 100
(c) Gravel
600050004000300020001000
00 50 100
(d) Trees
600050004000300020001000
00 50 100
(e) Sheets
600050004000300020001000
00 50 100
(f) Bare soil
600050004000300020001000
00 50 100
(g) Bitumen
600050004000300020001000
00 50 100
(h) Bricks
600050004000300020001000
00 50 100
(i) Shadows
Figure 2 Spectral signatures of the 9 classes selected from University of Pavia data set with 103 channelsspectral bands
3 CNN-Based HSI Classification
31 Applying CNNs to HSI Classification The hierarchicalarchitecture of CNNs is gradually proved to be the mostefficient and successful way to learn visual representationsThe fundamental challenge in such visual tasks is to modelthe intraclass appearance and shape variation of objects Thehyperspectral data with hundreds of spectral channels canbe illustrated as 2D curves (1D array) as shown in Figure 2(9 classes are selected from the University of Pavia data set)We can see that the curve of each class has its own visualshape which is different from other classes although it isrelatively difficult to distinguish some classes with human eye(eg gravel and self-blocking bricks) We know that CNNscan achieve competitive and even better performance thanhuman being in some visual problems and its capabilityinspires us to study the possibility of applying CNNs for HSIclassification using the spectral signatures
32 Architecture of the Proposed CNN Classifier The CNNvaries in how the convolutional and max pooling layersare realized and how the nets are trained As illustrated in
C1 feature maps M2 feature maps OutputF3
ConvolutionMax pooling Full connection Full connection
Input
Sharing same weights
1n1 times 1 20n2 times 1 20n3 times 1 n4 n5
middot middot middot
middot middot middot
Figure 3The architecture of the proposedCNNclassifierThe inputrepresents a pixel spectral vector followed by a convolution layerand a max pooling layer in turns to compute a set of 20 feature mapsclassified with a fully connected network
Figure 3 the net contains five layers with weights includingthe input layer the convolutional layer C1 the max poolinglayer M2 the full connection layer F3 and the output layerAssuming 120579 represents all the trainable parameters (weight
4 Journal of Sensors
values) 120579 = 120579119894 and 119894 = 1 2 3 4 where 120579
119894is the parameter
set between the (119894 minus 1)th and the 119894th layerIn HSI each HSI pixel sample can be regarded as a 2D
image whose height is equal to 1 (as 1D audio inputs inspeech recognition) Therefore the size of the input layer isjust (119899
1 1) and 119899
1is the number of bands The first hidden
convolutional layer C1 filters the 1198991times 1 input data with 20
kernels of size 1198961times1 Layer C1 contains 20times119899
2times1 nodes and
1198992= 1198991minus 1198961+ 1 There are 20 times (119896
1+ 1) trainable parameters
between layer C1 and the input layer The max pooling layerM2 is the second hidden layer and the kernel size is (119896
2 1)
Layer M2 contains 20 times 1198993times 1 nodes and 119899
3= 11989921198962
There is no parameter in this layer The fully connected layerF3 has 119899
4nodes and there are (20 times 119899
3+ 1) times 119899
4trainable
parameters between this layer and layer M2The output layerhas 1198995nodes and there are (119899
4+ 1) times 119899
5trainable parameters
between this layer and layer F3Therefore the architecture ofour proposed CNN classifier totally has 20 times (119896
1+ 1) + (20 times
1198993+ 1) times 119899
4+ (1198994+ 1) times 119899
5trainable parameters
Classifying a specified HSI pixel requires the correspond-ing CNN with the aforementioned parameters where 119899
1and
1198995are the spectral channel size and the number of output
classes of the data set respectively In our experiments 1198961is
better to be lceil11989919rceil and 119899
2= 1198991minus1198961+1 1198993can be any number
between 30 and 40 and 1198962= lceil11989921198993rceil 1198994is set to be 100These
choices might not be the best but are effective for general HSIdata
In our architecture layer C1 and M2 can be viewed as atrainable feature extractor to the input HSI data and layerF3 is a trainable classifier to the feature extractor The outputof subsampling is the actual feature of the original data Inour proposed CNN structure 20 features can be extractedfrom each original hyperspectral and each feature has 119899
3
dimensionsOur architecture has some similarities to architectures
that applied CNN for frequency domain signal in speechrecognition [24 25] We think it is caused by the similaritybetween 1D input of speech spectrum and hyperspectral dataDifferent from [24 25] our network varies according to thespectral channel size and the number of output classes ofinput HSI data
33 Training Strategies Here we introduce how to learnthe parameter space of the proposed CNN classifier All thetrainable parameters in our CNN should be initialized tobe a random value between minus005 and 005 The trainingprocess contains two steps forward propagation and backpropagation The forward propagation aims to compute theactual classification result of the input data with currentparameters The back propagation is employed to updatethe trainable parameters in order to make the discrepancybetween the actual classification output and the desiredclassification output as small as possible
331 Forward Propagation Our (119871 + 1)-layer CNN network(119871 = 4 in this work) consists of 119899
1input units in layer
INPUT 1198995output units in layer OUTPUT and several so-
called hidden units in layers C2 M3 and F4 Assuming x119894is
the input of the 119894th layer and the output of the (119897 minus 1)th layerthen we can compute x
119894+1as
x119894+1
= 119891119894(u119894) (1)
where
u119894= W119879119894x119894+ b119894 (2)
and W119879119894is a weight matrix of the 119894th layer acting on the
input data and b119894is an additive bias vector for the 119894th
layer 119891119894(sdot) is the activation function of the 119894th layer In
our designed architecture we choose the hyperbolic tangentfunction tanh(u) as the activation function in layer C1 andlayer F3The maximum function max(u) is used in layer M2Since the proposed CNN classifier is a multiclass classifierthe output of layer F3 is fed to 119899
5way softmax function
which produces a distribution over the 1198995class labels and the
softmax regression model is defined as
y = 1
sum1198995
119896=1119890W119879119871119896
x119871+b119871119896
[[[[[[[[
[
119890W1198791198711
x119871+b1198711
119890W1198791198712
x119871+b1198712
119890W1198791198711198995
x119871+b1198711198995
]]]]]]]]
]
(3)
The output vector y = x119871+1
of the layer OUTPUT denotesthe final probability of all the classes in the current iteration
332 Back Propagation In the back propagation stage thetrainable parameters are updated by using the gradientdescent method It is realized by minimizing a cost functionand computing the partial derivative of the cost functionwithrespect to each trainable parameter [29] The loss functionused in this work is defined as
119869 (120579) = minus1
119898
119898
sum
119894=1
1198995
sum
119895=1
1 119895 = Y(119894) log (y(119894)119895) (4)
where 119898 is the number of training samples Y is the desiredoutput y(119894)
119895is the 119895th value of the actual output y(119894) (see (3)) of
the 119894th training sample and is a vector whose size is 1198995 In the
desired output 119884(119894) of the 119894th sample the probability value ofthe labeled class is 1 and the probability values of other classesare 0 1119895 = Y(119894)means if 119895 is equal to the desired label of the119894th training sample its value is 1 otherwise its value is 0 Weadd a minus sign to the front of 119869(120579) in order to make thecomputation more convenient
The derivative of the loss function with respect to u119894is
120575119894=
120597119869
120597u119894
=
minus (Y minus y) ∘ 1198911015840 (u119894) 119894 = 119871
(W119879119894120575119894+1) ∘ 1198911015840(u119894) 119894 lt 119871
(5)
where ∘ denotes element-wise multiplication 1198911015840(u119894) can be
easily represented as
1198911015840(u119894) =
(1 minus 119891 (u119894)) ∘ (1 + 119891 (u
119894)) 119894 = 1 3
null 119894 = 2
119891 (u119894) ∘ (1 minus 119891 (u
119894)) 119894 = 4
(6)
Journal of Sensors 5
⊳ Constructing the CNNModelfunction INITCNNMODEL (120579 [119899
1ndash5])layerType = [convolution max-pooling fully-connected fully-connected]layerActivation = [tanh() max() tanh() softmax()]model = new Model()for 119894 = 1 to 4 dolayer = new Layer()layertype = layerType[119894]layerinputSize = 119899
119894
layerneurons = new Neuron [119899119894+1]
layerparams = 120579119894
modeladdLayer(layer)end forreturnmodel
end function⊳ Training the CNNModelInitialize learning rate 120572 number of max iteration ITERmax min error ERRmin trainingbatchs BATCHEStraining bach size SIZEbatch and so onCompute 119899
2 1198993 1198994 1198961 1198962 according to 119899
1and 119899
5
Generate random weights 120579 of the CNNcnnModel = InitCNNModel(120579 [119899
1ndash5])iter = 0 err = +infwhile err gt ERRmin and iter lt ITERmax doerr = 0for bach = 1 to BATCHEStraining do[nabla120579119869(120579) 119869(120579)] = cnnModeltrain (TrainingDatas TrainingLabels) as (4) and (8)
Update 120579 using (7)err = err + mean(119869(120579))
end forerr = errBATCHEStrainingiter++
end whileSave parameters 120579 of the CNN
Algorithm 1 Our CNN-based method
Therefore on each iteration we would perform the update
120579 = 120579 minus 120572 sdot nabla120579119869 (120579) (7)
for adjusting the trainable parameters where 120572 is the learningfactor (120572 = 001 in our implementation) and
nabla120579119869 (120579) =
120597119869
1205971205791
120597119869
1205971205792
120597119869
120597120579119871
(8)
We know that 120579119894containsW
119894and b
119894 and
120597119869
120597120579119894
= 120597119869
120597W119894
120597119869
120597b119894
(9)
where
120597119869
120597W119894
=120597119869
120597u119894
∘120597u119894
120597W119894
=120597119869
120597u119894
∘ x119894= 120575119894∘ x119894
120597119869
120597b119894
=120597119869
120597u119894
∘120597u119894
120597b119894
=120597119869
120597u119894
= 120575119894
(10)
With an increasing number of training iteration thereturn of the cost function is smaller which indicates thatthe actual output is closer to the desired outputThe iterationstops when the discrepancy between them is small enoughWe use average sum of squares to represent the discrepancyFinally the trained CNN is ready for HSI classification Thesummary of the proposed algorithm is shown in Algorithm 1
34 Classification Since the architecture and all correspond-ing trainable parameters are specified we can build theCNN classifier and reload saved parameters for classifyingHSI data The classification process is just like the forwardpropagation step in which we can compute the classificationresult as (3)
4 Experiments
All the programs are implemented using Python languageand Theano [30] library Theano is a Python library thatmakes us easily define optimize and evaluate mathematicalexpressions involvingmultidimensional arrays efficiently andconveniently on GPUs The results are generated on a PC
6 Journal of Sensors
Table 1 Number of training and test samples used in the IndianPines data set
Number Class Training Test1 Corn-notill 200 12282 Corn-mintill 200 6303 Grass-pasture 200 2834 Hay-windrowed 200 2785 Soybean-notill 200 7726 Soybean-mintill 200 22557 Soybean-clean 200 3938 Woods 200 1065
Total 1600 6904
equipped with an Intel Core i7 with 28GHz and NvidiaGeForce GTX 465 graphics card
41TheData Sets Threehyperspectral data including IndianPines Salinas and University of Pavia scenes are employedto evaluate the effectiveness of the proposed method Forall the data we randomly select 200 labeled pixels per classfor training and all other pixels in the ground truth map fortest Development data are derived from the available trainingdata by further dividing them into training and testingsamples for tuning the parameters of the proposed CNNclassifier Furthermore each pixel is scaled to [minus10 +10]
uniformlyThe Indian Pines data set was gathered by Airborne
VisibleInfrared Imaging Spectrometer (AVIRIS) sensor innorthwestern Indiana There are 220 spectral channels in the04 to 245 120583m region of the visible and infrared spectrumwith a spatial resolution of 20m From the statistical view-point we discard some classes which only have few labeledsamples and select 8 classes for which the numbers of trainingand testing samples are listed in Table 1The layer parametersof this data set in the proposed CNN classifier are set asfollows 119899
1= 220 119896
1= 24 119899
2= 197 119896
2= 5 119899
3= 40 119899
4=
100 and 1198995= 8 and the number of total trainable parameters
in the data set is 81408The second data employed was also collected by the
AVIRIS sensor capturing an area over Salinas ValleyCalifornia with a spatial resolution of 37m The imagecomprises 512times217 pixels with 220 bands It mainly containsvegetables bare soils and vineyard fields (httpwwwehuesccwintcoindexphpHyperspectral Remote Sensing Scenes)There are also 16 different classes and the numbers of trainingand testing samples are listed in Table 2The layer parametersof this data set in our CNN are set to be 119899
1= 224 119896
1= 24
1198992= 201 119896
2= 5 119899
3= 40 119899
4= 100 and 119899
5= 16 and the
number of total trainable parameters in the data set is 82216The University of Pavia data set was collected by the
Reflective Optics System Imaging Spectrometer (ROSIS)sensor The image scene with a spatial coverage of 610 times 340
pixels covering the city of Pavia Italy was collected underthe HySens project managed by DLR (the German AerospaceAgency) The data set has 103 spectral bands prior to water
Table 2 Number of training and test samples used in the Salinasdata set
Number Class Training Test1 Broccoli green weeds 1 200 18092 Broccoli green weeds 2 200 35263 Fallow 200 17764 Fallow rough plow 200 11945 Fallow smooth 200 24786 Stubble 200 37597 Celery 200 33798 Grapes untrained 200 110719 Soil vineyard develop 200 600310 Corn senesced green weeds 200 307811 Lettuce romaine 4 wk 200 86812 Lettuce romaine 5 wk 200 172713 Lettuce romaine 6 wk 200 71614 Lettuce romaine 7 wk 200 87015 Vineyard untrained 200 706816 Vineyard vertical trellis 200 1607
Total 3200 50929
Table 3 Number of training and test samples used in University ofPavia data set
Number Class Training Test1 Asphalt 200 64312 Meadows 200 184493 Gravel 200 18994 Trees 200 28645 Sheets 200 11456 Bare soil 200 48297 Bitumen 200 11308 Bricks 200 34829 Shadows 200 747
Total 1800 40976
band removal It has a spectral coverage from 043 to 086 120583mand a spatial resolution of 13m Approximately 42776 labeledpixels with 9 classes are from the ground truth map and thenumbers of training and testing samples are shown in Table 3The layer parameters of this data set in our CNN are set to be1198991= 103 119896
1= 11 119899
2= 93 119896
2= 3 119899
3= 30 119899
4= 100 and
1198995= 9 and the number of total trainable parameters in the
data set is 61249
42 Results and Comparisons Table 4 provides the com-parison of classification performance between the pro-posed method and the traditional SVM classifier SVMwith RBF kernel is implemented using the libsvm package(httpwwwcsientuedutwsimcjlinlibsvm) cross validationis also employed to determine the related parameters andall optimal ones are used in following experiments It isobvious that our proposed method has better performance
Journal of Sensors 7
Corn-notill Corn-mintill Grass-pastureSoybean-notill Soybean-mintill Soybean-clean Woods
Hay-windrowed
Figure 4 RGB composition maps resulting from classification for the Indian Pines data set From left to right ground truth RBF-SVM andthe proposed method
FallowFallow rough plowFallow smoothStubbleCeleryGrapes untrainedSoil vineyard developCorn senesced green weeds
Vineyard untrainedVineyard vertical trellis
Broccoli green weeds 1Broccoli green weeds 2
Lettuce romaine 4wkLettuce romaine 5wkLettuce romaine 6wkLettuce romaine 7wk
Figure 5 RGB composition maps resulting from classification for the Salinas data set From left to right ground truth RBF-SVM and theproposed method
Asphalt Meadows Gravel Trees SheetsBare soil Bitumen Bricks Shadows
Figure 6Thematicmaps resulting from classification forUniversityof Pavia data set From left to right ground truth RBF-SVM and theproposed method
(approximate 2 gain) than SVM classifier using all the threedata sets Figures 4 5 and 6 illustrate the correspondingclassification maps obtained with our proposed methodand RBF-SVM classifier Furthermore compared with RBF-SVM the proposed CNN classifier has higher classificationaccuracy not only for the overall data set but also for almostall the specific classes as shown in Figure 7
Figure 8 further illustrates the relationship between clas-sification accuracies and the training time (the test time is
Table 4 Comparison of results between the proposed CNN andRBF-SVM using three data sets
Data set The proposed CNN RBF-SVMIndian Pines 9016 8760Salinas 9260 9166University of Pavia 9256 9052
Table 5 Results of comparison with different neural networks onthe Indian Pines data set
Method Training time Testing time AccuracyTwo-layer NN 2800 s 165 s 8649DNN 6500 s 321 s 8793LeNet-5 5100 s 234 s 8827Our CNN 4300 s 198 s 9016
also included) for three experimental data sets With theincreasing of training time the classification accuracy ofeach data can reach over 90 We must admit that thetraining process is relatively time-consuming to achieve goodperformance however the proposed CNN classifier sharesthe same advantages (eg fast on testing) of deep learningalgorithms (see Table 5) Moreover our implementation ofCNN could be improved greatly on efficiency or we can useother CNN frameworks such as Caffe [31] to reduce training
8 Journal of Sensors
75
80
85
90
95
100
1 2 3 4 5 6 7 8
Accu
racy
()
Class number
CNNSVM
(a)
70
75
80
85
90
95
100
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
Accu
racy
()
Class number
CNNSVM
(b)
75
80
85
90
95
100
1 2 3 4 5 6 7 8 9
Accu
racy
()
Class number
CNNSVM
(c)
Figure 7 Classification accuracies of all the classes for experimental data sets From (a) to (c) Indian Pines Salinas and University of PaviaThe class number is corresponding to the first column in Tables 1 2 and 3
and test time According to our experiments it takes only 5minutes to achieve 90 accuracy on MNIST dataset [32] byusing Caffe compared to more than 120 minutes by using ourimplemented framework
Figure 9 illustrates the relationship between cost value(see (4)) and the training time for the University of Paviadata set The value of the loss function is reduced with anincreasing number of training iteration which demonstrates
Journal of Sensors 9
0
10
20
30
40
50
60
70
80
90
100
0 10 20 30 40 50 60 70
Ove
rall
accu
racy
()
Training time (min)
(a)
0
10
20
30
40
50
60
70
80
90
100
0 10 20 30 40 50 60 70 80 90
Ove
rall
accu
racy
()
Training time (min)
(b)
0
10
20
30
40
50
60
70
80
90
100
0 1 2 3 4 5 6 7 8 9
Ove
rall
accu
racy
()
Training time (min)
(c)
Figure 8 Classification accuracies versus the training time for experimental data sets From (a) to (c) Indian Pines Salinas and Universityof Pavia Note that the test time is also included in the training time
the convergence of our network with only 200 trainingsamples for each classMoreover the cost value is still reducedafter 5-minute training but the corresponding test accuracyis relatively stable (see Figure 8(a)) which indicates theoverfitting problem in this network
To further verify that the proposed classifier is suitablefor classifying data sets with limited training samples we alsocompare our CNN with RBF-SVM under different trainingsizes on the University of Pavia data set as shown in Figure 10It is obvious that our proposed CNN consistently provides
higher accuracy than SVM However although the conven-tional deep learning-based method [26] can outperform theSVM classifier it requires plenty of training samples forconstructing autoencoders
To demonstrate the relationship between classificationaccuracies and the visual differences of curve shapes (seeFigure 2) we present the detailed accuracies of our proposedCNN classifiers for the University of Pavia data set in Table 6In the table the cell in the 119894th row 119895th column means thepercentage of the 119894th class samples (according to ground
10 Journal of Sensors
Table 6 The detailed classification accuracies of all the classes for University of Pavia data set
Asphalt Meadows Gravel Trees Sheets Bare soil Bitumen Bricks ShadowsAsphalt 8734 026 232 000 019 037 625 325 002Meadows 000 9463 002 126 000 403 000 006 000Gravel 053 047 8647 000 000 000 005 1243 005Trees 000 267 000 9629 003 101 000 000 000Sheets 000 009 000 000 9965 026 000 000 000Bare soil 012 615 000 010 008 9323 000 031 000Bitumen 637 000 035 000 009 000 9319 000 000Bricks 190 020 1048 000 006 060 034 8642 000Shadows 000 000 000 000 000 000 000 000 10000
0
02
04
06
08
1
12
14
16
18
2
0 1 2 3 4 5 6 7 8 9
Cos
t val
ue
Training time (min)
Figure 9 Cost value versus the training time for University of Paviadata sets
truth) which is classified to the 119895th class For example 8734of class Asphalt samples are classified correctly but 625 ofclass Asphalt samples are wrongly classified to class BitumenThe percentages on diagonal line are just the classificationaccuracies of corresponding classes As for one class themore unique the corresponding curve shape is the higheraccuracy the proposed CNN classifier can achieve (checkthe class Shadow and class Sheets in Figure 2 and Table 6)The more similar two curves are the higher opportunitythey are wrongly classified to each other (check the classGravel and class Bricks in Figure 2 and Table 6) Further-more the excellent performance verifies that the proposedCNN classifier has discriminative capability to extract subtlevisual features which is even superior to human vision forclassifying complex curve shapes
Finally we also implement three other types of neuralnetwork architectures for the Indian Pines data set usingthe same training and test samples The first one is a simplearchitecture with only two fully connected layers beside theinput layerThe second one is LeNet-5 which is a classic CNNarchitecture with two convolutional layers The third one is
83
84
85
86
87
88
89
90
91
92
93
94
60 80 100 120 140 160 180 200
Ove
rall
accu
racy
()
Number of training samples of each class
SVMCNN
Figure 10 Classification accuracies versus numbers of trainingsamples (each class) for the University of Pavia data sets
a conventional deep neural networks (DNNs) with 3 hiddenfully connected layers (a 220-60-40-20-8 architecture as sug-gested in [26])The classification performance is summarizedin Table 5 From Table 5 we can see that our CNN classifierachieves the highest accuracy with competitive training andtesting computational cost LeNet-5 and DNNs cost moretime to train models due to their complex architecturebut limited training samples restrict their capabilities ofclassification (only 20 samples selected for testing in [26]compared with 95 in our experiment) Another reason forthe difficulty that deeper CNNs and DNNs face to achievehigher accuracies could be that the HSI lacks the type ofhigh frequency signal commonly seen in the computer visiondomain (see Figure 2)
5 Conclusion and Future Work
In this paper we proposed a novel CNN-based methodfor HSI classification inspired by our observation thatHSI classification can be implemented via human vision
Journal of Sensors 11
Compared with SVM-based classifier and conventionalDNN-based classifier the proposed method could achievehigher accuracy using all the experimental data sets evenwith a small number of training samples
Our work is an exploration of using CNNs for HSIclassification and has excellent performanceThe architectureof our proposed CNN classifier only contains one convo-lutional layer and one fully connected layer due to thesmall number of training samples In the future a networkarchitecture called a Siamese Network [33] might be usedwhich has been proved to be robust in the situation wherethe number of training samples per category is small Sometechniques such as Dropout [34] can also be used toalleviate the overfitting problem caused by limited trainingsamples Furthermore recent researches in deep learninghave indicated that unsupervised learning can be employedto train CNNs reducing the requirement of labeled samplessignificantly Deep learning especially deep CNNs shouldhave great potentiality for HSI classification in the futureMoreover in the current work we do not consider the spatialcorrelation and only concentrate on the spectral signaturesWe believe that some spatial-spectral techniques also can beapplied to further improve the CNN-based classification Atlast we plan to employ efficient deep CNN frameworks suchas Caffe to improve our computing performance
Conflict of Interests
The authors declare that there is no conflict of interestsregarding the publication of this paper
Acknowledgments
This work was supported jointly by the National NaturalScience Foundation of China (nos 61371165 and 61302164)the 973 Program of China (no 2011CB706900) the Programfor New Century Excellent Talents in University underGrant no NCET-11-0711 and the Interdisciplinary ResearchProject in Beijing University of Chemical Technology WeiHu and Fan Zhang are also supported by the Beijing HigherEducation Young Elite Teacher Project under Grant nosYETP0501 and YETP0500 respectively
References
[1] D Landgrebe ldquoHyperspectral image data analysisrdquo IEEE SignalProcessing Magazine vol 19 no 1 pp 17ndash28 2002
[2] G M Foody and A Mathur ldquoA relative evaluation of multiclassimage classification by support vectormachinesrdquo IEEE Transac-tions on Geoscience and Remote Sensing vol 42 no 6 pp 1335ndash1343 2004
[3] Y Tarabalka J A Benediktsson and J Chanussot ldquoSpectral-spatial classification of hyperspectral imagery based on parti-tional clustering techniquesrdquo IEEE Transactions on Geoscienceand Remote Sensing vol 47 no 8 pp 2973ndash2987 2009
[4] W Li S Prasad F James and B Lour ldquoLocality-preservingdimensionality reduction and classification for hyperspectralimage analysisrdquo IEEE Transactions on Geoscience and RemoteSensing vol 50 no 4 pp 1185ndash1198 2012
[5] F Melgani and L Bruzzone ldquoClassification of hyperspectralremote sensing images with support vector machinesrdquo IEEETransactions on Geoscience and Remote Sensing vol 42 no 8pp 1778ndash1790 2004
[6] J A Gualtieri and S Chettri ldquoSupport vector machines for clas-sification of hyperspectral datardquo inProceedings of the IEEE Inter-national Geoscience and Remote Sensing Symposium (IGARSSrsquo00) vol 2 pp 813ndash815 IEEE July 2000
[7] G Mountrakis J Im and C Ogole ldquoSupport vector machinesin remote sensing a reviewrdquo ISPRS Journal of Photogrammetryand Remote Sensing vol 66 no 3 pp 247ndash259 2011
[8] J Li J M Bioucas-Dias and A Plaza ldquoSpectral-spatial classifi-cation of hyperspectral data using loopy belief propagation andactive learningrdquo IEEE Transactions on Geoscience and RemoteSensing vol 51 no 2 pp 844ndash856 2013
[9] P M Atkinson and A R L Tatnall ldquoIntroduction neuralnetworks in remote sensingrdquo International Journal of RemoteSensing vol 18 no 4 pp 699ndash709 1997
[10] L Bruzzone and D F Prieto ldquoA technique for the selectionof kernel-function parameters in RBF neural networks forclassification of remote-sensing imagesrdquo IEEE Transactions onGeoscience andRemote Sensing vol 37 no 2 pp 1179ndash1184 1999
[11] F Ratle G Camps-Valls and JWeston ldquoSemisupervised neuralnetworks for efficient hyperspectral image classificationrdquo IEEETransactions on Geoscience and Remote Sensing vol 48 no 5pp 2271ndash2282 2010
[12] G E Hinton and R R Salakhutdinov ldquoReducing the dimen-sionality of data with neural networksrdquo Science vol 313 no5786 pp 504ndash507 2006
[13] K Fukushima ldquoNeocognitron a hierarchical neural networkcapable of visual pattern recognitionrdquo Neural Networks vol 1no 2 pp 119ndash130 1988
[14] Y LeCun L Bottou Y Bengio and P Haffner ldquoGradient-basedlearning applied to document recognitionrdquo Proceedings of theIEEE vol 86 no 11 pp 2278ndash2323 1998
[15] D C Ciresan U Meier J Masci L M Gambardella andJ Schmidhuber ldquoFlexible high performance convolutionalneural networks for image classificationrdquo in Proceedings of the22nd International Joint Conference on Artificial Intelligence(IJCAI rsquo11) vol 22 pp 1237ndash1242 July 2011
[16] P Y Simard D Steinkraus and J C Platt ldquoBest practicesfor convolutional neural networks applied to visual documentanalysisrdquo in Proceedings of the 7th International Conference onDocument Analysis and Recognition vol 2 pp 958ndash963 IEEEComputer Society Edinburgh UK August 2003
[17] P Sermanet and Y LeCun ldquoTraffic sign recognition with multi-scale convolutional networksrdquo in Proceedings of the Interna-tional Joint Conference onNeuralNetwork (IJCNN rsquo11) pp 2809ndash2813 IEEE August 2011
[18] A Krizhevsky I Sutskever andG EHinton ldquoImagenet classifi-cation with deep convolutional neural networksrdquo in Proceedingsof the Advances in Neural Information Processing Systems 25(NIPS 12) pp 1097ndash1105 2012
[19] D Ciregan U Meier and J Schmidhuber ldquoMulti-column deepneural networks for image classificationrdquo in Proceedings of theIEEE Conference on Computer Vision and Pattern Recognition(CVPR rsquo12) pp 3642ndash3649 IEEE June 2012
[20] R Girshick J Donahue T Darrell and J Malik ldquoRich featurehierarchies for accurate object detection and semantic segmen-tationrdquo in Proceedings of the IEEE Conference on ComputerVision and Pattern Recognition (CVPR rsquo14) pp 580ndash587 IEEEJune 2014
12 Journal of Sensors
[21] C Farabet C Couprie L Najman and Y LeCun ldquoLearninghierarchical features for scene labelingrdquo IEEE Transactions onPattern Analysis and Machine Intelligence vol 35 no 8 pp1915ndash1929 2013
[22] P Sermanet S Chintala and Y LeCun ldquoConvolutional neuralnetworks applied to house numbers digit classificationrdquo inProceedings of the 21st International Conference on PatternRecognition (ICPR rsquo12) pp 3288ndash3291 IEEE November 2012
[23] Y TaigmanM YangM Ranzato and LWolf ldquoDeepFace clos-ing the gap to human-level performance in face verificationrdquoin Proceedings of the IEEE Conference on Computer Vision andPattern Recognition (CVPR rsquo14) pp 1701ndash1708 Columbus OhioUSA June 2014
[24] T N Sainath A-R Mohamed B Kingsbury and B Ram-abhadran ldquoDeep convolutional neural networks for LVCSRrdquoin Proceedings of the 38th IEEE International Conference onAcoustics Speech and Signal Processing (ICASSP rsquo13) pp 8614ndash8618 IEEE Vancouver Canada May 2013
[25] O Abdel-Hamid A-R Mohamed H Jiang and G PennldquoApplying convolutional neural networks concepts to hybridNN-HMMmodel for speech recognitionrdquo in Proceedings of theIEEE International Conference on Acoustics Speech and SignalProcessing (ICASSP rsquo12) pp 4277ndash4280 IEEE March 2012
[26] Y Chen Z Lin X Zhao G Wang and Y Gu ldquoDeep learning-based classification of hyperspectral datardquo IEEE Journal ofSelected Topics in Applied Earth Observations and RemoteSensing vol 7 no 6 pp 2094ndash2107 2014
[27] I Sutskever and G E Hinton ldquoDeep narrow sigmoid beliefnetworks are universal approximatorsrdquo Neural Computationvol 20 no 11 pp 2629ndash2636 2008
[28] D H Hubel and T N Wiesel ldquoReceptive fields and functionalarchitecture ofmonkey striate cortexrdquoThe Journal of Physiologyvol 195 no 1 pp 215ndash243 1968
[29] Y A LeCun L Bottou G B Orr and K-R Muller ldquoEfficientbackproprdquo in Neural Networks Tricks of the Trade pp 9ndash48Springer Berlin Germany 2012
[30] J Bergstra F Bastien O Breuleux et al ldquoTheano deep learningon GPUs with pythonrdquo in Proceedings of the NIPS 2011 BigLearning Workshop pp 712ndash721 Granada Spain December2011
[31] Y Jia E Shelhamer J Donahue et al ldquoCaffe convolutionalarchitecture for fast feature embeddingrdquo in Proceedings of theACM International Conference on Multimedia pp 675ndash678ACM Orlando Fla USA November 2014
[32] Y LeCun C Cortes and C J Burges ldquoThe MNIST database ofhandwritten digitsrdquo 1998 httpyannlecuncomexdbmnist
[33] S Chopra R Hadsell and Y LeCun ldquoLearning a similaritymetric discriminatively with application to face verificationrdquoin Proceedings of the IEEE Computer Society Conference onComputer Vision and Pattern Recognition (CVPR rsquo05) vol 1 pp539ndash546 IEEE June 2005
[34] N Srivastava G Hinton A Krizhevsky I Sutskever and RSalakhutdinov ldquoDropout a simple way to prevent neural net-works from overfittingrdquo The Journal of Machine LearningResearch vol 15 no 1 pp 1929ndash1958 2014
International Journal of
AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Active and Passive Electronic Components
Control Scienceand Engineering
Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
RotatingMachinery
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation httpwwwhindawicom
Journal ofEngineeringVolume 2014
Submit your manuscripts athttpwwwhindawicom
VLSI Design
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Shock and Vibration
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Civil EngineeringAdvances in
Acoustics and VibrationAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Advances inOptoElectronics
Hindawi Publishing Corporation httpwwwhindawicom
Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
SensorsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Chemical EngineeringInternational Journal of Antennas and
Propagation
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Navigation and Observation
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
DistributedSensor Networks
International Journal of
2 Journal of Sensors
of training data and efficient implementation on GPUsCNNs have recently outperformed some other conventionalmethods even human performance [17] on many vision-related tasks including image classification [18 19] objectdetection [20] scene labeling [21] house number digitclassification [22] and face recognition [23] Besides visiontasks CNNs have been also applied to other areas such asspeech recognition [24 25] The technique has been verifiedas an effective class of models for understanding visualimage content giving some state-of-the-art results on visualimage classification and other visual-related problems In[26] the authors presented DNN for HSI classification inwhich stacked autoencoders (SAEs) were employed to extractdiscriminative features
CNNs have been demonstrated to provide even betterclassification performance than the traditional SVM classi-fiers [27] and the conventional deep neural networks (DNNs)[18] in visual-related area However since CNNs have beenonly considered on visual-related problems there are rareliteratures on the technique with multiple layers for HSIclassification In this paper we have found that CNNs canbe effectively employed to classify hyperspectral data afterbuilding appropriate layer architecture According to ourexperiments we observe that the typical CNNs such asLeNet-5 [14] with two convolutional layers are actually notapplicable for hyperspectral data Alternatively we present asimple but effective CNN architecture containing five layerswith weights for supervised HSI classification Several exper-iments demonstrate excellent performance of our proposedmethod compared to the classic SVM and the conventionaldeep learning architecture As far as we know it is the firsttime to employ the CNN with multiple layers for HSI classi-fication
The paper is organized as follows In Section 2 we givea brief introduction to CNNs In Section 3 the typicalCNN architecture and the corresponding training processare presented In Section 4 we experimentally compare theperformance of our method with SVM and some neuralnetworks with different architectures Finally we conclude bysummarizing our results in Section 5
2 CNNs
CNNs represent feed-forward neural networks which consistof various combinations of the convolutional layers maxpooling layers and fully connected layers and exploit spatiallylocal correlation by enforcing a local connectivity patternbetween neurons of adjacent layers Convolutional layersalternate with max pooling layers mimicking the nature ofcomplex and simple cells in mammalian visual cortex [28]A CNN consists of one or more pairs of convolution andmaxpooling layers and finally ends with a fully connected neuralnetworks A typical convolutional network architecture isshown in Figure 1 [24]
In ordinary deep neural networks a neuron is connectedto all neurons in the next layer CNNs are different fromordinary NN in that neurons in convolutional layer are onlysparsely connected to the neurons in the next layer basedon their relative location That is to say in a fully connected
Filter size
Pooling size
Input
Shared weights
Convolutional layer
Max pooling layer
Fully connected layer
h1 h2 h3 h4 h5 h6
W
V 1 2 3 4 5 6
Figure 1 A typical CNN architecture consisting of a convolutionallayer a max pooling layer and a fully connected layer
DNNs each hidden activation ℎ119894is computed by multiplying
the entire input V by weights W in that layer However inCNNs each hidden activation is computed by multiplyinga small local input against the weights W The weights Ware then shared across the entire input space as shown inFigure 1 Neurons that belong to the same layer share the sameweights Weight sharing is a critical principle in CNNs sinceit helps reduce the total number of trainable parameters andleads to more efficient training and more effective model Aconvolutional layer is usually followed by amax pooling layer
Due to the replication of weights in a CNN a feature maybe detected across the input data If an input image is shiftedthe neuron detecting the feature is shifted as much Pooling isused to make the features invariant from the location and itsummarizes the output of multiple neurons in convolutionallayers through a pooling function Typical pooling functionis maximum A max pooling function basically returns themaximum value from the input Max pooling partitions theinput data into a set of nonoverlapping windows and outputsthe maximum value for each subregion and reduces thecomputational complexity for upper layers and provides aform of translation invariance To be used for classificationthe computation chain of a CNN ends in a fully connectednetwork that integrates information across all locations in allthe feature maps of the layer below
Most of CNNs working in image recognition have thelower layers composed to alternate convolutional and maxpooling layers while the upper layers are fully connectedtraditional MLP NNs For example LeNet-5 is such a CNNarchitecture presented for handwritten digit recognition [14]firstly and then it is successfully used for solving other visual-related problems However LeNet-5 might not be directlyemployed for HSI classification especially for small-size datasets according to our experiments in Section 4 In this paperwe will explore what is the suitable architecture and strategyfor CNN-based HSI classification
Journal of Sensors 3
600050004000300020001000
00 50 100
(a) Asphalt
600050004000300020001000
00 50 100
(b) Meadows
600050004000300020001000
00 50 100
(c) Gravel
600050004000300020001000
00 50 100
(d) Trees
600050004000300020001000
00 50 100
(e) Sheets
600050004000300020001000
00 50 100
(f) Bare soil
600050004000300020001000
00 50 100
(g) Bitumen
600050004000300020001000
00 50 100
(h) Bricks
600050004000300020001000
00 50 100
(i) Shadows
Figure 2 Spectral signatures of the 9 classes selected from University of Pavia data set with 103 channelsspectral bands
3 CNN-Based HSI Classification
31 Applying CNNs to HSI Classification The hierarchicalarchitecture of CNNs is gradually proved to be the mostefficient and successful way to learn visual representationsThe fundamental challenge in such visual tasks is to modelthe intraclass appearance and shape variation of objects Thehyperspectral data with hundreds of spectral channels canbe illustrated as 2D curves (1D array) as shown in Figure 2(9 classes are selected from the University of Pavia data set)We can see that the curve of each class has its own visualshape which is different from other classes although it isrelatively difficult to distinguish some classes with human eye(eg gravel and self-blocking bricks) We know that CNNscan achieve competitive and even better performance thanhuman being in some visual problems and its capabilityinspires us to study the possibility of applying CNNs for HSIclassification using the spectral signatures
32 Architecture of the Proposed CNN Classifier The CNNvaries in how the convolutional and max pooling layersare realized and how the nets are trained As illustrated in
C1 feature maps M2 feature maps OutputF3
ConvolutionMax pooling Full connection Full connection
Input
Sharing same weights
1n1 times 1 20n2 times 1 20n3 times 1 n4 n5
middot middot middot
middot middot middot
Figure 3The architecture of the proposedCNNclassifierThe inputrepresents a pixel spectral vector followed by a convolution layerand a max pooling layer in turns to compute a set of 20 feature mapsclassified with a fully connected network
Figure 3 the net contains five layers with weights includingthe input layer the convolutional layer C1 the max poolinglayer M2 the full connection layer F3 and the output layerAssuming 120579 represents all the trainable parameters (weight
4 Journal of Sensors
values) 120579 = 120579119894 and 119894 = 1 2 3 4 where 120579
119894is the parameter
set between the (119894 minus 1)th and the 119894th layerIn HSI each HSI pixel sample can be regarded as a 2D
image whose height is equal to 1 (as 1D audio inputs inspeech recognition) Therefore the size of the input layer isjust (119899
1 1) and 119899
1is the number of bands The first hidden
convolutional layer C1 filters the 1198991times 1 input data with 20
kernels of size 1198961times1 Layer C1 contains 20times119899
2times1 nodes and
1198992= 1198991minus 1198961+ 1 There are 20 times (119896
1+ 1) trainable parameters
between layer C1 and the input layer The max pooling layerM2 is the second hidden layer and the kernel size is (119896
2 1)
Layer M2 contains 20 times 1198993times 1 nodes and 119899
3= 11989921198962
There is no parameter in this layer The fully connected layerF3 has 119899
4nodes and there are (20 times 119899
3+ 1) times 119899
4trainable
parameters between this layer and layer M2The output layerhas 1198995nodes and there are (119899
4+ 1) times 119899
5trainable parameters
between this layer and layer F3Therefore the architecture ofour proposed CNN classifier totally has 20 times (119896
1+ 1) + (20 times
1198993+ 1) times 119899
4+ (1198994+ 1) times 119899
5trainable parameters
Classifying a specified HSI pixel requires the correspond-ing CNN with the aforementioned parameters where 119899
1and
1198995are the spectral channel size and the number of output
classes of the data set respectively In our experiments 1198961is
better to be lceil11989919rceil and 119899
2= 1198991minus1198961+1 1198993can be any number
between 30 and 40 and 1198962= lceil11989921198993rceil 1198994is set to be 100These
choices might not be the best but are effective for general HSIdata
In our architecture layer C1 and M2 can be viewed as atrainable feature extractor to the input HSI data and layerF3 is a trainable classifier to the feature extractor The outputof subsampling is the actual feature of the original data Inour proposed CNN structure 20 features can be extractedfrom each original hyperspectral and each feature has 119899
3
dimensionsOur architecture has some similarities to architectures
that applied CNN for frequency domain signal in speechrecognition [24 25] We think it is caused by the similaritybetween 1D input of speech spectrum and hyperspectral dataDifferent from [24 25] our network varies according to thespectral channel size and the number of output classes ofinput HSI data
33 Training Strategies Here we introduce how to learnthe parameter space of the proposed CNN classifier All thetrainable parameters in our CNN should be initialized tobe a random value between minus005 and 005 The trainingprocess contains two steps forward propagation and backpropagation The forward propagation aims to compute theactual classification result of the input data with currentparameters The back propagation is employed to updatethe trainable parameters in order to make the discrepancybetween the actual classification output and the desiredclassification output as small as possible
331 Forward Propagation Our (119871 + 1)-layer CNN network(119871 = 4 in this work) consists of 119899
1input units in layer
INPUT 1198995output units in layer OUTPUT and several so-
called hidden units in layers C2 M3 and F4 Assuming x119894is
the input of the 119894th layer and the output of the (119897 minus 1)th layerthen we can compute x
119894+1as
x119894+1
= 119891119894(u119894) (1)
where
u119894= W119879119894x119894+ b119894 (2)
and W119879119894is a weight matrix of the 119894th layer acting on the
input data and b119894is an additive bias vector for the 119894th
layer 119891119894(sdot) is the activation function of the 119894th layer In
our designed architecture we choose the hyperbolic tangentfunction tanh(u) as the activation function in layer C1 andlayer F3The maximum function max(u) is used in layer M2Since the proposed CNN classifier is a multiclass classifierthe output of layer F3 is fed to 119899
5way softmax function
which produces a distribution over the 1198995class labels and the
softmax regression model is defined as
y = 1
sum1198995
119896=1119890W119879119871119896
x119871+b119871119896
[[[[[[[[
[
119890W1198791198711
x119871+b1198711
119890W1198791198712
x119871+b1198712
119890W1198791198711198995
x119871+b1198711198995
]]]]]]]]
]
(3)
The output vector y = x119871+1
of the layer OUTPUT denotesthe final probability of all the classes in the current iteration
332 Back Propagation In the back propagation stage thetrainable parameters are updated by using the gradientdescent method It is realized by minimizing a cost functionand computing the partial derivative of the cost functionwithrespect to each trainable parameter [29] The loss functionused in this work is defined as
119869 (120579) = minus1
119898
119898
sum
119894=1
1198995
sum
119895=1
1 119895 = Y(119894) log (y(119894)119895) (4)
where 119898 is the number of training samples Y is the desiredoutput y(119894)
119895is the 119895th value of the actual output y(119894) (see (3)) of
the 119894th training sample and is a vector whose size is 1198995 In the
desired output 119884(119894) of the 119894th sample the probability value ofthe labeled class is 1 and the probability values of other classesare 0 1119895 = Y(119894)means if 119895 is equal to the desired label of the119894th training sample its value is 1 otherwise its value is 0 Weadd a minus sign to the front of 119869(120579) in order to make thecomputation more convenient
The derivative of the loss function with respect to u119894is
120575119894=
120597119869
120597u119894
=
minus (Y minus y) ∘ 1198911015840 (u119894) 119894 = 119871
(W119879119894120575119894+1) ∘ 1198911015840(u119894) 119894 lt 119871
(5)
where ∘ denotes element-wise multiplication 1198911015840(u119894) can be
easily represented as
1198911015840(u119894) =
(1 minus 119891 (u119894)) ∘ (1 + 119891 (u
119894)) 119894 = 1 3
null 119894 = 2
119891 (u119894) ∘ (1 minus 119891 (u
119894)) 119894 = 4
(6)
Journal of Sensors 5
⊳ Constructing the CNNModelfunction INITCNNMODEL (120579 [119899
1ndash5])layerType = [convolution max-pooling fully-connected fully-connected]layerActivation = [tanh() max() tanh() softmax()]model = new Model()for 119894 = 1 to 4 dolayer = new Layer()layertype = layerType[119894]layerinputSize = 119899
119894
layerneurons = new Neuron [119899119894+1]
layerparams = 120579119894
modeladdLayer(layer)end forreturnmodel
end function⊳ Training the CNNModelInitialize learning rate 120572 number of max iteration ITERmax min error ERRmin trainingbatchs BATCHEStraining bach size SIZEbatch and so onCompute 119899
2 1198993 1198994 1198961 1198962 according to 119899
1and 119899
5
Generate random weights 120579 of the CNNcnnModel = InitCNNModel(120579 [119899
1ndash5])iter = 0 err = +infwhile err gt ERRmin and iter lt ITERmax doerr = 0for bach = 1 to BATCHEStraining do[nabla120579119869(120579) 119869(120579)] = cnnModeltrain (TrainingDatas TrainingLabels) as (4) and (8)
Update 120579 using (7)err = err + mean(119869(120579))
end forerr = errBATCHEStrainingiter++
end whileSave parameters 120579 of the CNN
Algorithm 1 Our CNN-based method
Therefore on each iteration we would perform the update
120579 = 120579 minus 120572 sdot nabla120579119869 (120579) (7)
for adjusting the trainable parameters where 120572 is the learningfactor (120572 = 001 in our implementation) and
nabla120579119869 (120579) =
120597119869
1205971205791
120597119869
1205971205792
120597119869
120597120579119871
(8)
We know that 120579119894containsW
119894and b
119894 and
120597119869
120597120579119894
= 120597119869
120597W119894
120597119869
120597b119894
(9)
where
120597119869
120597W119894
=120597119869
120597u119894
∘120597u119894
120597W119894
=120597119869
120597u119894
∘ x119894= 120575119894∘ x119894
120597119869
120597b119894
=120597119869
120597u119894
∘120597u119894
120597b119894
=120597119869
120597u119894
= 120575119894
(10)
With an increasing number of training iteration thereturn of the cost function is smaller which indicates thatthe actual output is closer to the desired outputThe iterationstops when the discrepancy between them is small enoughWe use average sum of squares to represent the discrepancyFinally the trained CNN is ready for HSI classification Thesummary of the proposed algorithm is shown in Algorithm 1
34 Classification Since the architecture and all correspond-ing trainable parameters are specified we can build theCNN classifier and reload saved parameters for classifyingHSI data The classification process is just like the forwardpropagation step in which we can compute the classificationresult as (3)
4 Experiments
All the programs are implemented using Python languageand Theano [30] library Theano is a Python library thatmakes us easily define optimize and evaluate mathematicalexpressions involvingmultidimensional arrays efficiently andconveniently on GPUs The results are generated on a PC
6 Journal of Sensors
Table 1 Number of training and test samples used in the IndianPines data set
Number Class Training Test1 Corn-notill 200 12282 Corn-mintill 200 6303 Grass-pasture 200 2834 Hay-windrowed 200 2785 Soybean-notill 200 7726 Soybean-mintill 200 22557 Soybean-clean 200 3938 Woods 200 1065
Total 1600 6904
equipped with an Intel Core i7 with 28GHz and NvidiaGeForce GTX 465 graphics card
41TheData Sets Threehyperspectral data including IndianPines Salinas and University of Pavia scenes are employedto evaluate the effectiveness of the proposed method Forall the data we randomly select 200 labeled pixels per classfor training and all other pixels in the ground truth map fortest Development data are derived from the available trainingdata by further dividing them into training and testingsamples for tuning the parameters of the proposed CNNclassifier Furthermore each pixel is scaled to [minus10 +10]
uniformlyThe Indian Pines data set was gathered by Airborne
VisibleInfrared Imaging Spectrometer (AVIRIS) sensor innorthwestern Indiana There are 220 spectral channels in the04 to 245 120583m region of the visible and infrared spectrumwith a spatial resolution of 20m From the statistical view-point we discard some classes which only have few labeledsamples and select 8 classes for which the numbers of trainingand testing samples are listed in Table 1The layer parametersof this data set in the proposed CNN classifier are set asfollows 119899
1= 220 119896
1= 24 119899
2= 197 119896
2= 5 119899
3= 40 119899
4=
100 and 1198995= 8 and the number of total trainable parameters
in the data set is 81408The second data employed was also collected by the
AVIRIS sensor capturing an area over Salinas ValleyCalifornia with a spatial resolution of 37m The imagecomprises 512times217 pixels with 220 bands It mainly containsvegetables bare soils and vineyard fields (httpwwwehuesccwintcoindexphpHyperspectral Remote Sensing Scenes)There are also 16 different classes and the numbers of trainingand testing samples are listed in Table 2The layer parametersof this data set in our CNN are set to be 119899
1= 224 119896
1= 24
1198992= 201 119896
2= 5 119899
3= 40 119899
4= 100 and 119899
5= 16 and the
number of total trainable parameters in the data set is 82216The University of Pavia data set was collected by the
Reflective Optics System Imaging Spectrometer (ROSIS)sensor The image scene with a spatial coverage of 610 times 340
pixels covering the city of Pavia Italy was collected underthe HySens project managed by DLR (the German AerospaceAgency) The data set has 103 spectral bands prior to water
Table 2 Number of training and test samples used in the Salinasdata set
Number Class Training Test1 Broccoli green weeds 1 200 18092 Broccoli green weeds 2 200 35263 Fallow 200 17764 Fallow rough plow 200 11945 Fallow smooth 200 24786 Stubble 200 37597 Celery 200 33798 Grapes untrained 200 110719 Soil vineyard develop 200 600310 Corn senesced green weeds 200 307811 Lettuce romaine 4 wk 200 86812 Lettuce romaine 5 wk 200 172713 Lettuce romaine 6 wk 200 71614 Lettuce romaine 7 wk 200 87015 Vineyard untrained 200 706816 Vineyard vertical trellis 200 1607
Total 3200 50929
Table 3 Number of training and test samples used in University ofPavia data set
Number Class Training Test1 Asphalt 200 64312 Meadows 200 184493 Gravel 200 18994 Trees 200 28645 Sheets 200 11456 Bare soil 200 48297 Bitumen 200 11308 Bricks 200 34829 Shadows 200 747
Total 1800 40976
band removal It has a spectral coverage from 043 to 086 120583mand a spatial resolution of 13m Approximately 42776 labeledpixels with 9 classes are from the ground truth map and thenumbers of training and testing samples are shown in Table 3The layer parameters of this data set in our CNN are set to be1198991= 103 119896
1= 11 119899
2= 93 119896
2= 3 119899
3= 30 119899
4= 100 and
1198995= 9 and the number of total trainable parameters in the
data set is 61249
42 Results and Comparisons Table 4 provides the com-parison of classification performance between the pro-posed method and the traditional SVM classifier SVMwith RBF kernel is implemented using the libsvm package(httpwwwcsientuedutwsimcjlinlibsvm) cross validationis also employed to determine the related parameters andall optimal ones are used in following experiments It isobvious that our proposed method has better performance
Journal of Sensors 7
Corn-notill Corn-mintill Grass-pastureSoybean-notill Soybean-mintill Soybean-clean Woods
Hay-windrowed
Figure 4 RGB composition maps resulting from classification for the Indian Pines data set From left to right ground truth RBF-SVM andthe proposed method
FallowFallow rough plowFallow smoothStubbleCeleryGrapes untrainedSoil vineyard developCorn senesced green weeds
Vineyard untrainedVineyard vertical trellis
Broccoli green weeds 1Broccoli green weeds 2
Lettuce romaine 4wkLettuce romaine 5wkLettuce romaine 6wkLettuce romaine 7wk
Figure 5 RGB composition maps resulting from classification for the Salinas data set From left to right ground truth RBF-SVM and theproposed method
Asphalt Meadows Gravel Trees SheetsBare soil Bitumen Bricks Shadows
Figure 6Thematicmaps resulting from classification forUniversityof Pavia data set From left to right ground truth RBF-SVM and theproposed method
(approximate 2 gain) than SVM classifier using all the threedata sets Figures 4 5 and 6 illustrate the correspondingclassification maps obtained with our proposed methodand RBF-SVM classifier Furthermore compared with RBF-SVM the proposed CNN classifier has higher classificationaccuracy not only for the overall data set but also for almostall the specific classes as shown in Figure 7
Figure 8 further illustrates the relationship between clas-sification accuracies and the training time (the test time is
Table 4 Comparison of results between the proposed CNN andRBF-SVM using three data sets
Data set The proposed CNN RBF-SVMIndian Pines 9016 8760Salinas 9260 9166University of Pavia 9256 9052
Table 5 Results of comparison with different neural networks onthe Indian Pines data set
Method Training time Testing time AccuracyTwo-layer NN 2800 s 165 s 8649DNN 6500 s 321 s 8793LeNet-5 5100 s 234 s 8827Our CNN 4300 s 198 s 9016
also included) for three experimental data sets With theincreasing of training time the classification accuracy ofeach data can reach over 90 We must admit that thetraining process is relatively time-consuming to achieve goodperformance however the proposed CNN classifier sharesthe same advantages (eg fast on testing) of deep learningalgorithms (see Table 5) Moreover our implementation ofCNN could be improved greatly on efficiency or we can useother CNN frameworks such as Caffe [31] to reduce training
8 Journal of Sensors
75
80
85
90
95
100
1 2 3 4 5 6 7 8
Accu
racy
()
Class number
CNNSVM
(a)
70
75
80
85
90
95
100
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
Accu
racy
()
Class number
CNNSVM
(b)
75
80
85
90
95
100
1 2 3 4 5 6 7 8 9
Accu
racy
()
Class number
CNNSVM
(c)
Figure 7 Classification accuracies of all the classes for experimental data sets From (a) to (c) Indian Pines Salinas and University of PaviaThe class number is corresponding to the first column in Tables 1 2 and 3
and test time According to our experiments it takes only 5minutes to achieve 90 accuracy on MNIST dataset [32] byusing Caffe compared to more than 120 minutes by using ourimplemented framework
Figure 9 illustrates the relationship between cost value(see (4)) and the training time for the University of Paviadata set The value of the loss function is reduced with anincreasing number of training iteration which demonstrates
Journal of Sensors 9
0
10
20
30
40
50
60
70
80
90
100
0 10 20 30 40 50 60 70
Ove
rall
accu
racy
()
Training time (min)
(a)
0
10
20
30
40
50
60
70
80
90
100
0 10 20 30 40 50 60 70 80 90
Ove
rall
accu
racy
()
Training time (min)
(b)
0
10
20
30
40
50
60
70
80
90
100
0 1 2 3 4 5 6 7 8 9
Ove
rall
accu
racy
()
Training time (min)
(c)
Figure 8 Classification accuracies versus the training time for experimental data sets From (a) to (c) Indian Pines Salinas and Universityof Pavia Note that the test time is also included in the training time
the convergence of our network with only 200 trainingsamples for each classMoreover the cost value is still reducedafter 5-minute training but the corresponding test accuracyis relatively stable (see Figure 8(a)) which indicates theoverfitting problem in this network
To further verify that the proposed classifier is suitablefor classifying data sets with limited training samples we alsocompare our CNN with RBF-SVM under different trainingsizes on the University of Pavia data set as shown in Figure 10It is obvious that our proposed CNN consistently provides
higher accuracy than SVM However although the conven-tional deep learning-based method [26] can outperform theSVM classifier it requires plenty of training samples forconstructing autoencoders
To demonstrate the relationship between classificationaccuracies and the visual differences of curve shapes (seeFigure 2) we present the detailed accuracies of our proposedCNN classifiers for the University of Pavia data set in Table 6In the table the cell in the 119894th row 119895th column means thepercentage of the 119894th class samples (according to ground
10 Journal of Sensors
Table 6 The detailed classification accuracies of all the classes for University of Pavia data set
Asphalt Meadows Gravel Trees Sheets Bare soil Bitumen Bricks ShadowsAsphalt 8734 026 232 000 019 037 625 325 002Meadows 000 9463 002 126 000 403 000 006 000Gravel 053 047 8647 000 000 000 005 1243 005Trees 000 267 000 9629 003 101 000 000 000Sheets 000 009 000 000 9965 026 000 000 000Bare soil 012 615 000 010 008 9323 000 031 000Bitumen 637 000 035 000 009 000 9319 000 000Bricks 190 020 1048 000 006 060 034 8642 000Shadows 000 000 000 000 000 000 000 000 10000
0
02
04
06
08
1
12
14
16
18
2
0 1 2 3 4 5 6 7 8 9
Cos
t val
ue
Training time (min)
Figure 9 Cost value versus the training time for University of Paviadata sets
truth) which is classified to the 119895th class For example 8734of class Asphalt samples are classified correctly but 625 ofclass Asphalt samples are wrongly classified to class BitumenThe percentages on diagonal line are just the classificationaccuracies of corresponding classes As for one class themore unique the corresponding curve shape is the higheraccuracy the proposed CNN classifier can achieve (checkthe class Shadow and class Sheets in Figure 2 and Table 6)The more similar two curves are the higher opportunitythey are wrongly classified to each other (check the classGravel and class Bricks in Figure 2 and Table 6) Further-more the excellent performance verifies that the proposedCNN classifier has discriminative capability to extract subtlevisual features which is even superior to human vision forclassifying complex curve shapes
Finally we also implement three other types of neuralnetwork architectures for the Indian Pines data set usingthe same training and test samples The first one is a simplearchitecture with only two fully connected layers beside theinput layerThe second one is LeNet-5 which is a classic CNNarchitecture with two convolutional layers The third one is
83
84
85
86
87
88
89
90
91
92
93
94
60 80 100 120 140 160 180 200
Ove
rall
accu
racy
()
Number of training samples of each class
SVMCNN
Figure 10 Classification accuracies versus numbers of trainingsamples (each class) for the University of Pavia data sets
a conventional deep neural networks (DNNs) with 3 hiddenfully connected layers (a 220-60-40-20-8 architecture as sug-gested in [26])The classification performance is summarizedin Table 5 From Table 5 we can see that our CNN classifierachieves the highest accuracy with competitive training andtesting computational cost LeNet-5 and DNNs cost moretime to train models due to their complex architecturebut limited training samples restrict their capabilities ofclassification (only 20 samples selected for testing in [26]compared with 95 in our experiment) Another reason forthe difficulty that deeper CNNs and DNNs face to achievehigher accuracies could be that the HSI lacks the type ofhigh frequency signal commonly seen in the computer visiondomain (see Figure 2)
5 Conclusion and Future Work
In this paper we proposed a novel CNN-based methodfor HSI classification inspired by our observation thatHSI classification can be implemented via human vision
Journal of Sensors 11
Compared with SVM-based classifier and conventionalDNN-based classifier the proposed method could achievehigher accuracy using all the experimental data sets evenwith a small number of training samples
Our work is an exploration of using CNNs for HSIclassification and has excellent performanceThe architectureof our proposed CNN classifier only contains one convo-lutional layer and one fully connected layer due to thesmall number of training samples In the future a networkarchitecture called a Siamese Network [33] might be usedwhich has been proved to be robust in the situation wherethe number of training samples per category is small Sometechniques such as Dropout [34] can also be used toalleviate the overfitting problem caused by limited trainingsamples Furthermore recent researches in deep learninghave indicated that unsupervised learning can be employedto train CNNs reducing the requirement of labeled samplessignificantly Deep learning especially deep CNNs shouldhave great potentiality for HSI classification in the futureMoreover in the current work we do not consider the spatialcorrelation and only concentrate on the spectral signaturesWe believe that some spatial-spectral techniques also can beapplied to further improve the CNN-based classification Atlast we plan to employ efficient deep CNN frameworks suchas Caffe to improve our computing performance
Conflict of Interests
The authors declare that there is no conflict of interestsregarding the publication of this paper
Acknowledgments
This work was supported jointly by the National NaturalScience Foundation of China (nos 61371165 and 61302164)the 973 Program of China (no 2011CB706900) the Programfor New Century Excellent Talents in University underGrant no NCET-11-0711 and the Interdisciplinary ResearchProject in Beijing University of Chemical Technology WeiHu and Fan Zhang are also supported by the Beijing HigherEducation Young Elite Teacher Project under Grant nosYETP0501 and YETP0500 respectively
References
[1] D Landgrebe ldquoHyperspectral image data analysisrdquo IEEE SignalProcessing Magazine vol 19 no 1 pp 17ndash28 2002
[2] G M Foody and A Mathur ldquoA relative evaluation of multiclassimage classification by support vectormachinesrdquo IEEE Transac-tions on Geoscience and Remote Sensing vol 42 no 6 pp 1335ndash1343 2004
[3] Y Tarabalka J A Benediktsson and J Chanussot ldquoSpectral-spatial classification of hyperspectral imagery based on parti-tional clustering techniquesrdquo IEEE Transactions on Geoscienceand Remote Sensing vol 47 no 8 pp 2973ndash2987 2009
[4] W Li S Prasad F James and B Lour ldquoLocality-preservingdimensionality reduction and classification for hyperspectralimage analysisrdquo IEEE Transactions on Geoscience and RemoteSensing vol 50 no 4 pp 1185ndash1198 2012
[5] F Melgani and L Bruzzone ldquoClassification of hyperspectralremote sensing images with support vector machinesrdquo IEEETransactions on Geoscience and Remote Sensing vol 42 no 8pp 1778ndash1790 2004
[6] J A Gualtieri and S Chettri ldquoSupport vector machines for clas-sification of hyperspectral datardquo inProceedings of the IEEE Inter-national Geoscience and Remote Sensing Symposium (IGARSSrsquo00) vol 2 pp 813ndash815 IEEE July 2000
[7] G Mountrakis J Im and C Ogole ldquoSupport vector machinesin remote sensing a reviewrdquo ISPRS Journal of Photogrammetryand Remote Sensing vol 66 no 3 pp 247ndash259 2011
[8] J Li J M Bioucas-Dias and A Plaza ldquoSpectral-spatial classifi-cation of hyperspectral data using loopy belief propagation andactive learningrdquo IEEE Transactions on Geoscience and RemoteSensing vol 51 no 2 pp 844ndash856 2013
[9] P M Atkinson and A R L Tatnall ldquoIntroduction neuralnetworks in remote sensingrdquo International Journal of RemoteSensing vol 18 no 4 pp 699ndash709 1997
[10] L Bruzzone and D F Prieto ldquoA technique for the selectionof kernel-function parameters in RBF neural networks forclassification of remote-sensing imagesrdquo IEEE Transactions onGeoscience andRemote Sensing vol 37 no 2 pp 1179ndash1184 1999
[11] F Ratle G Camps-Valls and JWeston ldquoSemisupervised neuralnetworks for efficient hyperspectral image classificationrdquo IEEETransactions on Geoscience and Remote Sensing vol 48 no 5pp 2271ndash2282 2010
[12] G E Hinton and R R Salakhutdinov ldquoReducing the dimen-sionality of data with neural networksrdquo Science vol 313 no5786 pp 504ndash507 2006
[13] K Fukushima ldquoNeocognitron a hierarchical neural networkcapable of visual pattern recognitionrdquo Neural Networks vol 1no 2 pp 119ndash130 1988
[14] Y LeCun L Bottou Y Bengio and P Haffner ldquoGradient-basedlearning applied to document recognitionrdquo Proceedings of theIEEE vol 86 no 11 pp 2278ndash2323 1998
[15] D C Ciresan U Meier J Masci L M Gambardella andJ Schmidhuber ldquoFlexible high performance convolutionalneural networks for image classificationrdquo in Proceedings of the22nd International Joint Conference on Artificial Intelligence(IJCAI rsquo11) vol 22 pp 1237ndash1242 July 2011
[16] P Y Simard D Steinkraus and J C Platt ldquoBest practicesfor convolutional neural networks applied to visual documentanalysisrdquo in Proceedings of the 7th International Conference onDocument Analysis and Recognition vol 2 pp 958ndash963 IEEEComputer Society Edinburgh UK August 2003
[17] P Sermanet and Y LeCun ldquoTraffic sign recognition with multi-scale convolutional networksrdquo in Proceedings of the Interna-tional Joint Conference onNeuralNetwork (IJCNN rsquo11) pp 2809ndash2813 IEEE August 2011
[18] A Krizhevsky I Sutskever andG EHinton ldquoImagenet classifi-cation with deep convolutional neural networksrdquo in Proceedingsof the Advances in Neural Information Processing Systems 25(NIPS 12) pp 1097ndash1105 2012
[19] D Ciregan U Meier and J Schmidhuber ldquoMulti-column deepneural networks for image classificationrdquo in Proceedings of theIEEE Conference on Computer Vision and Pattern Recognition(CVPR rsquo12) pp 3642ndash3649 IEEE June 2012
[20] R Girshick J Donahue T Darrell and J Malik ldquoRich featurehierarchies for accurate object detection and semantic segmen-tationrdquo in Proceedings of the IEEE Conference on ComputerVision and Pattern Recognition (CVPR rsquo14) pp 580ndash587 IEEEJune 2014
12 Journal of Sensors
[21] C Farabet C Couprie L Najman and Y LeCun ldquoLearninghierarchical features for scene labelingrdquo IEEE Transactions onPattern Analysis and Machine Intelligence vol 35 no 8 pp1915ndash1929 2013
[22] P Sermanet S Chintala and Y LeCun ldquoConvolutional neuralnetworks applied to house numbers digit classificationrdquo inProceedings of the 21st International Conference on PatternRecognition (ICPR rsquo12) pp 3288ndash3291 IEEE November 2012
[23] Y TaigmanM YangM Ranzato and LWolf ldquoDeepFace clos-ing the gap to human-level performance in face verificationrdquoin Proceedings of the IEEE Conference on Computer Vision andPattern Recognition (CVPR rsquo14) pp 1701ndash1708 Columbus OhioUSA June 2014
[24] T N Sainath A-R Mohamed B Kingsbury and B Ram-abhadran ldquoDeep convolutional neural networks for LVCSRrdquoin Proceedings of the 38th IEEE International Conference onAcoustics Speech and Signal Processing (ICASSP rsquo13) pp 8614ndash8618 IEEE Vancouver Canada May 2013
[25] O Abdel-Hamid A-R Mohamed H Jiang and G PennldquoApplying convolutional neural networks concepts to hybridNN-HMMmodel for speech recognitionrdquo in Proceedings of theIEEE International Conference on Acoustics Speech and SignalProcessing (ICASSP rsquo12) pp 4277ndash4280 IEEE March 2012
[26] Y Chen Z Lin X Zhao G Wang and Y Gu ldquoDeep learning-based classification of hyperspectral datardquo IEEE Journal ofSelected Topics in Applied Earth Observations and RemoteSensing vol 7 no 6 pp 2094ndash2107 2014
[27] I Sutskever and G E Hinton ldquoDeep narrow sigmoid beliefnetworks are universal approximatorsrdquo Neural Computationvol 20 no 11 pp 2629ndash2636 2008
[28] D H Hubel and T N Wiesel ldquoReceptive fields and functionalarchitecture ofmonkey striate cortexrdquoThe Journal of Physiologyvol 195 no 1 pp 215ndash243 1968
[29] Y A LeCun L Bottou G B Orr and K-R Muller ldquoEfficientbackproprdquo in Neural Networks Tricks of the Trade pp 9ndash48Springer Berlin Germany 2012
[30] J Bergstra F Bastien O Breuleux et al ldquoTheano deep learningon GPUs with pythonrdquo in Proceedings of the NIPS 2011 BigLearning Workshop pp 712ndash721 Granada Spain December2011
[31] Y Jia E Shelhamer J Donahue et al ldquoCaffe convolutionalarchitecture for fast feature embeddingrdquo in Proceedings of theACM International Conference on Multimedia pp 675ndash678ACM Orlando Fla USA November 2014
[32] Y LeCun C Cortes and C J Burges ldquoThe MNIST database ofhandwritten digitsrdquo 1998 httpyannlecuncomexdbmnist
[33] S Chopra R Hadsell and Y LeCun ldquoLearning a similaritymetric discriminatively with application to face verificationrdquoin Proceedings of the IEEE Computer Society Conference onComputer Vision and Pattern Recognition (CVPR rsquo05) vol 1 pp539ndash546 IEEE June 2005
[34] N Srivastava G Hinton A Krizhevsky I Sutskever and RSalakhutdinov ldquoDropout a simple way to prevent neural net-works from overfittingrdquo The Journal of Machine LearningResearch vol 15 no 1 pp 1929ndash1958 2014
International Journal of
AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Active and Passive Electronic Components
Control Scienceand Engineering
Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
RotatingMachinery
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation httpwwwhindawicom
Journal ofEngineeringVolume 2014
Submit your manuscripts athttpwwwhindawicom
VLSI Design
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Shock and Vibration
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Civil EngineeringAdvances in
Acoustics and VibrationAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Advances inOptoElectronics
Hindawi Publishing Corporation httpwwwhindawicom
Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
SensorsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Chemical EngineeringInternational Journal of Antennas and
Propagation
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Navigation and Observation
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
DistributedSensor Networks
International Journal of
Journal of Sensors 3
600050004000300020001000
00 50 100
(a) Asphalt
600050004000300020001000
00 50 100
(b) Meadows
600050004000300020001000
00 50 100
(c) Gravel
600050004000300020001000
00 50 100
(d) Trees
600050004000300020001000
00 50 100
(e) Sheets
600050004000300020001000
00 50 100
(f) Bare soil
600050004000300020001000
00 50 100
(g) Bitumen
600050004000300020001000
00 50 100
(h) Bricks
600050004000300020001000
00 50 100
(i) Shadows
Figure 2 Spectral signatures of the 9 classes selected from University of Pavia data set with 103 channelsspectral bands
3 CNN-Based HSI Classification
31 Applying CNNs to HSI Classification The hierarchicalarchitecture of CNNs is gradually proved to be the mostefficient and successful way to learn visual representationsThe fundamental challenge in such visual tasks is to modelthe intraclass appearance and shape variation of objects Thehyperspectral data with hundreds of spectral channels canbe illustrated as 2D curves (1D array) as shown in Figure 2(9 classes are selected from the University of Pavia data set)We can see that the curve of each class has its own visualshape which is different from other classes although it isrelatively difficult to distinguish some classes with human eye(eg gravel and self-blocking bricks) We know that CNNscan achieve competitive and even better performance thanhuman being in some visual problems and its capabilityinspires us to study the possibility of applying CNNs for HSIclassification using the spectral signatures
32 Architecture of the Proposed CNN Classifier The CNNvaries in how the convolutional and max pooling layersare realized and how the nets are trained As illustrated in
C1 feature maps M2 feature maps OutputF3
ConvolutionMax pooling Full connection Full connection
Input
Sharing same weights
1n1 times 1 20n2 times 1 20n3 times 1 n4 n5
middot middot middot
middot middot middot
Figure 3The architecture of the proposedCNNclassifierThe inputrepresents a pixel spectral vector followed by a convolution layerand a max pooling layer in turns to compute a set of 20 feature mapsclassified with a fully connected network
Figure 3 the net contains five layers with weights includingthe input layer the convolutional layer C1 the max poolinglayer M2 the full connection layer F3 and the output layerAssuming 120579 represents all the trainable parameters (weight
4 Journal of Sensors
values) 120579 = 120579119894 and 119894 = 1 2 3 4 where 120579
119894is the parameter
set between the (119894 minus 1)th and the 119894th layerIn HSI each HSI pixel sample can be regarded as a 2D
image whose height is equal to 1 (as 1D audio inputs inspeech recognition) Therefore the size of the input layer isjust (119899
1 1) and 119899
1is the number of bands The first hidden
convolutional layer C1 filters the 1198991times 1 input data with 20
kernels of size 1198961times1 Layer C1 contains 20times119899
2times1 nodes and
1198992= 1198991minus 1198961+ 1 There are 20 times (119896
1+ 1) trainable parameters
between layer C1 and the input layer The max pooling layerM2 is the second hidden layer and the kernel size is (119896
2 1)
Layer M2 contains 20 times 1198993times 1 nodes and 119899
3= 11989921198962
There is no parameter in this layer The fully connected layerF3 has 119899
4nodes and there are (20 times 119899
3+ 1) times 119899
4trainable
parameters between this layer and layer M2The output layerhas 1198995nodes and there are (119899
4+ 1) times 119899
5trainable parameters
between this layer and layer F3Therefore the architecture ofour proposed CNN classifier totally has 20 times (119896
1+ 1) + (20 times
1198993+ 1) times 119899
4+ (1198994+ 1) times 119899
5trainable parameters
Classifying a specified HSI pixel requires the correspond-ing CNN with the aforementioned parameters where 119899
1and
1198995are the spectral channel size and the number of output
classes of the data set respectively In our experiments 1198961is
better to be lceil11989919rceil and 119899
2= 1198991minus1198961+1 1198993can be any number
between 30 and 40 and 1198962= lceil11989921198993rceil 1198994is set to be 100These
choices might not be the best but are effective for general HSIdata
In our architecture layer C1 and M2 can be viewed as atrainable feature extractor to the input HSI data and layerF3 is a trainable classifier to the feature extractor The outputof subsampling is the actual feature of the original data Inour proposed CNN structure 20 features can be extractedfrom each original hyperspectral and each feature has 119899
3
dimensionsOur architecture has some similarities to architectures
that applied CNN for frequency domain signal in speechrecognition [24 25] We think it is caused by the similaritybetween 1D input of speech spectrum and hyperspectral dataDifferent from [24 25] our network varies according to thespectral channel size and the number of output classes ofinput HSI data
33 Training Strategies Here we introduce how to learnthe parameter space of the proposed CNN classifier All thetrainable parameters in our CNN should be initialized tobe a random value between minus005 and 005 The trainingprocess contains two steps forward propagation and backpropagation The forward propagation aims to compute theactual classification result of the input data with currentparameters The back propagation is employed to updatethe trainable parameters in order to make the discrepancybetween the actual classification output and the desiredclassification output as small as possible
331 Forward Propagation Our (119871 + 1)-layer CNN network(119871 = 4 in this work) consists of 119899
1input units in layer
INPUT 1198995output units in layer OUTPUT and several so-
called hidden units in layers C2 M3 and F4 Assuming x119894is
the input of the 119894th layer and the output of the (119897 minus 1)th layerthen we can compute x
119894+1as
x119894+1
= 119891119894(u119894) (1)
where
u119894= W119879119894x119894+ b119894 (2)
and W119879119894is a weight matrix of the 119894th layer acting on the
input data and b119894is an additive bias vector for the 119894th
layer 119891119894(sdot) is the activation function of the 119894th layer In
our designed architecture we choose the hyperbolic tangentfunction tanh(u) as the activation function in layer C1 andlayer F3The maximum function max(u) is used in layer M2Since the proposed CNN classifier is a multiclass classifierthe output of layer F3 is fed to 119899
5way softmax function
which produces a distribution over the 1198995class labels and the
softmax regression model is defined as
y = 1
sum1198995
119896=1119890W119879119871119896
x119871+b119871119896
[[[[[[[[
[
119890W1198791198711
x119871+b1198711
119890W1198791198712
x119871+b1198712
119890W1198791198711198995
x119871+b1198711198995
]]]]]]]]
]
(3)
The output vector y = x119871+1
of the layer OUTPUT denotesthe final probability of all the classes in the current iteration
332 Back Propagation In the back propagation stage thetrainable parameters are updated by using the gradientdescent method It is realized by minimizing a cost functionand computing the partial derivative of the cost functionwithrespect to each trainable parameter [29] The loss functionused in this work is defined as
119869 (120579) = minus1
119898
119898
sum
119894=1
1198995
sum
119895=1
1 119895 = Y(119894) log (y(119894)119895) (4)
where 119898 is the number of training samples Y is the desiredoutput y(119894)
119895is the 119895th value of the actual output y(119894) (see (3)) of
the 119894th training sample and is a vector whose size is 1198995 In the
desired output 119884(119894) of the 119894th sample the probability value ofthe labeled class is 1 and the probability values of other classesare 0 1119895 = Y(119894)means if 119895 is equal to the desired label of the119894th training sample its value is 1 otherwise its value is 0 Weadd a minus sign to the front of 119869(120579) in order to make thecomputation more convenient
The derivative of the loss function with respect to u119894is
120575119894=
120597119869
120597u119894
=
minus (Y minus y) ∘ 1198911015840 (u119894) 119894 = 119871
(W119879119894120575119894+1) ∘ 1198911015840(u119894) 119894 lt 119871
(5)
where ∘ denotes element-wise multiplication 1198911015840(u119894) can be
easily represented as
1198911015840(u119894) =
(1 minus 119891 (u119894)) ∘ (1 + 119891 (u
119894)) 119894 = 1 3
null 119894 = 2
119891 (u119894) ∘ (1 minus 119891 (u
119894)) 119894 = 4
(6)
Journal of Sensors 5
⊳ Constructing the CNNModelfunction INITCNNMODEL (120579 [119899
1ndash5])layerType = [convolution max-pooling fully-connected fully-connected]layerActivation = [tanh() max() tanh() softmax()]model = new Model()for 119894 = 1 to 4 dolayer = new Layer()layertype = layerType[119894]layerinputSize = 119899
119894
layerneurons = new Neuron [119899119894+1]
layerparams = 120579119894
modeladdLayer(layer)end forreturnmodel
end function⊳ Training the CNNModelInitialize learning rate 120572 number of max iteration ITERmax min error ERRmin trainingbatchs BATCHEStraining bach size SIZEbatch and so onCompute 119899
2 1198993 1198994 1198961 1198962 according to 119899
1and 119899
5
Generate random weights 120579 of the CNNcnnModel = InitCNNModel(120579 [119899
1ndash5])iter = 0 err = +infwhile err gt ERRmin and iter lt ITERmax doerr = 0for bach = 1 to BATCHEStraining do[nabla120579119869(120579) 119869(120579)] = cnnModeltrain (TrainingDatas TrainingLabels) as (4) and (8)
Update 120579 using (7)err = err + mean(119869(120579))
end forerr = errBATCHEStrainingiter++
end whileSave parameters 120579 of the CNN
Algorithm 1 Our CNN-based method
Therefore on each iteration we would perform the update
120579 = 120579 minus 120572 sdot nabla120579119869 (120579) (7)
for adjusting the trainable parameters where 120572 is the learningfactor (120572 = 001 in our implementation) and
nabla120579119869 (120579) =
120597119869
1205971205791
120597119869
1205971205792
120597119869
120597120579119871
(8)
We know that 120579119894containsW
119894and b
119894 and
120597119869
120597120579119894
= 120597119869
120597W119894
120597119869
120597b119894
(9)
where
120597119869
120597W119894
=120597119869
120597u119894
∘120597u119894
120597W119894
=120597119869
120597u119894
∘ x119894= 120575119894∘ x119894
120597119869
120597b119894
=120597119869
120597u119894
∘120597u119894
120597b119894
=120597119869
120597u119894
= 120575119894
(10)
With an increasing number of training iteration thereturn of the cost function is smaller which indicates thatthe actual output is closer to the desired outputThe iterationstops when the discrepancy between them is small enoughWe use average sum of squares to represent the discrepancyFinally the trained CNN is ready for HSI classification Thesummary of the proposed algorithm is shown in Algorithm 1
34 Classification Since the architecture and all correspond-ing trainable parameters are specified we can build theCNN classifier and reload saved parameters for classifyingHSI data The classification process is just like the forwardpropagation step in which we can compute the classificationresult as (3)
4 Experiments
All the programs are implemented using Python languageand Theano [30] library Theano is a Python library thatmakes us easily define optimize and evaluate mathematicalexpressions involvingmultidimensional arrays efficiently andconveniently on GPUs The results are generated on a PC
6 Journal of Sensors
Table 1 Number of training and test samples used in the IndianPines data set
Number Class Training Test1 Corn-notill 200 12282 Corn-mintill 200 6303 Grass-pasture 200 2834 Hay-windrowed 200 2785 Soybean-notill 200 7726 Soybean-mintill 200 22557 Soybean-clean 200 3938 Woods 200 1065
Total 1600 6904
equipped with an Intel Core i7 with 28GHz and NvidiaGeForce GTX 465 graphics card
41TheData Sets Threehyperspectral data including IndianPines Salinas and University of Pavia scenes are employedto evaluate the effectiveness of the proposed method Forall the data we randomly select 200 labeled pixels per classfor training and all other pixels in the ground truth map fortest Development data are derived from the available trainingdata by further dividing them into training and testingsamples for tuning the parameters of the proposed CNNclassifier Furthermore each pixel is scaled to [minus10 +10]
uniformlyThe Indian Pines data set was gathered by Airborne
VisibleInfrared Imaging Spectrometer (AVIRIS) sensor innorthwestern Indiana There are 220 spectral channels in the04 to 245 120583m region of the visible and infrared spectrumwith a spatial resolution of 20m From the statistical view-point we discard some classes which only have few labeledsamples and select 8 classes for which the numbers of trainingand testing samples are listed in Table 1The layer parametersof this data set in the proposed CNN classifier are set asfollows 119899
1= 220 119896
1= 24 119899
2= 197 119896
2= 5 119899
3= 40 119899
4=
100 and 1198995= 8 and the number of total trainable parameters
in the data set is 81408The second data employed was also collected by the
AVIRIS sensor capturing an area over Salinas ValleyCalifornia with a spatial resolution of 37m The imagecomprises 512times217 pixels with 220 bands It mainly containsvegetables bare soils and vineyard fields (httpwwwehuesccwintcoindexphpHyperspectral Remote Sensing Scenes)There are also 16 different classes and the numbers of trainingand testing samples are listed in Table 2The layer parametersof this data set in our CNN are set to be 119899
1= 224 119896
1= 24
1198992= 201 119896
2= 5 119899
3= 40 119899
4= 100 and 119899
5= 16 and the
number of total trainable parameters in the data set is 82216The University of Pavia data set was collected by the
Reflective Optics System Imaging Spectrometer (ROSIS)sensor The image scene with a spatial coverage of 610 times 340
pixels covering the city of Pavia Italy was collected underthe HySens project managed by DLR (the German AerospaceAgency) The data set has 103 spectral bands prior to water
Table 2 Number of training and test samples used in the Salinasdata set
Number Class Training Test1 Broccoli green weeds 1 200 18092 Broccoli green weeds 2 200 35263 Fallow 200 17764 Fallow rough plow 200 11945 Fallow smooth 200 24786 Stubble 200 37597 Celery 200 33798 Grapes untrained 200 110719 Soil vineyard develop 200 600310 Corn senesced green weeds 200 307811 Lettuce romaine 4 wk 200 86812 Lettuce romaine 5 wk 200 172713 Lettuce romaine 6 wk 200 71614 Lettuce romaine 7 wk 200 87015 Vineyard untrained 200 706816 Vineyard vertical trellis 200 1607
Total 3200 50929
Table 3 Number of training and test samples used in University ofPavia data set
Number Class Training Test1 Asphalt 200 64312 Meadows 200 184493 Gravel 200 18994 Trees 200 28645 Sheets 200 11456 Bare soil 200 48297 Bitumen 200 11308 Bricks 200 34829 Shadows 200 747
Total 1800 40976
band removal It has a spectral coverage from 043 to 086 120583mand a spatial resolution of 13m Approximately 42776 labeledpixels with 9 classes are from the ground truth map and thenumbers of training and testing samples are shown in Table 3The layer parameters of this data set in our CNN are set to be1198991= 103 119896
1= 11 119899
2= 93 119896
2= 3 119899
3= 30 119899
4= 100 and
1198995= 9 and the number of total trainable parameters in the
data set is 61249
42 Results and Comparisons Table 4 provides the com-parison of classification performance between the pro-posed method and the traditional SVM classifier SVMwith RBF kernel is implemented using the libsvm package(httpwwwcsientuedutwsimcjlinlibsvm) cross validationis also employed to determine the related parameters andall optimal ones are used in following experiments It isobvious that our proposed method has better performance
Journal of Sensors 7
Corn-notill Corn-mintill Grass-pastureSoybean-notill Soybean-mintill Soybean-clean Woods
Hay-windrowed
Figure 4 RGB composition maps resulting from classification for the Indian Pines data set From left to right ground truth RBF-SVM andthe proposed method
FallowFallow rough plowFallow smoothStubbleCeleryGrapes untrainedSoil vineyard developCorn senesced green weeds
Vineyard untrainedVineyard vertical trellis
Broccoli green weeds 1Broccoli green weeds 2
Lettuce romaine 4wkLettuce romaine 5wkLettuce romaine 6wkLettuce romaine 7wk
Figure 5 RGB composition maps resulting from classification for the Salinas data set From left to right ground truth RBF-SVM and theproposed method
Asphalt Meadows Gravel Trees SheetsBare soil Bitumen Bricks Shadows
Figure 6Thematicmaps resulting from classification forUniversityof Pavia data set From left to right ground truth RBF-SVM and theproposed method
(approximate 2 gain) than SVM classifier using all the threedata sets Figures 4 5 and 6 illustrate the correspondingclassification maps obtained with our proposed methodand RBF-SVM classifier Furthermore compared with RBF-SVM the proposed CNN classifier has higher classificationaccuracy not only for the overall data set but also for almostall the specific classes as shown in Figure 7
Figure 8 further illustrates the relationship between clas-sification accuracies and the training time (the test time is
Table 4 Comparison of results between the proposed CNN andRBF-SVM using three data sets
Data set The proposed CNN RBF-SVMIndian Pines 9016 8760Salinas 9260 9166University of Pavia 9256 9052
Table 5 Results of comparison with different neural networks onthe Indian Pines data set
Method Training time Testing time AccuracyTwo-layer NN 2800 s 165 s 8649DNN 6500 s 321 s 8793LeNet-5 5100 s 234 s 8827Our CNN 4300 s 198 s 9016
also included) for three experimental data sets With theincreasing of training time the classification accuracy ofeach data can reach over 90 We must admit that thetraining process is relatively time-consuming to achieve goodperformance however the proposed CNN classifier sharesthe same advantages (eg fast on testing) of deep learningalgorithms (see Table 5) Moreover our implementation ofCNN could be improved greatly on efficiency or we can useother CNN frameworks such as Caffe [31] to reduce training
8 Journal of Sensors
75
80
85
90
95
100
1 2 3 4 5 6 7 8
Accu
racy
()
Class number
CNNSVM
(a)
70
75
80
85
90
95
100
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
Accu
racy
()
Class number
CNNSVM
(b)
75
80
85
90
95
100
1 2 3 4 5 6 7 8 9
Accu
racy
()
Class number
CNNSVM
(c)
Figure 7 Classification accuracies of all the classes for experimental data sets From (a) to (c) Indian Pines Salinas and University of PaviaThe class number is corresponding to the first column in Tables 1 2 and 3
and test time According to our experiments it takes only 5minutes to achieve 90 accuracy on MNIST dataset [32] byusing Caffe compared to more than 120 minutes by using ourimplemented framework
Figure 9 illustrates the relationship between cost value(see (4)) and the training time for the University of Paviadata set The value of the loss function is reduced with anincreasing number of training iteration which demonstrates
Journal of Sensors 9
0
10
20
30
40
50
60
70
80
90
100
0 10 20 30 40 50 60 70
Ove
rall
accu
racy
()
Training time (min)
(a)
0
10
20
30
40
50
60
70
80
90
100
0 10 20 30 40 50 60 70 80 90
Ove
rall
accu
racy
()
Training time (min)
(b)
0
10
20
30
40
50
60
70
80
90
100
0 1 2 3 4 5 6 7 8 9
Ove
rall
accu
racy
()
Training time (min)
(c)
Figure 8 Classification accuracies versus the training time for experimental data sets From (a) to (c) Indian Pines Salinas and Universityof Pavia Note that the test time is also included in the training time
the convergence of our network with only 200 trainingsamples for each classMoreover the cost value is still reducedafter 5-minute training but the corresponding test accuracyis relatively stable (see Figure 8(a)) which indicates theoverfitting problem in this network
To further verify that the proposed classifier is suitablefor classifying data sets with limited training samples we alsocompare our CNN with RBF-SVM under different trainingsizes on the University of Pavia data set as shown in Figure 10It is obvious that our proposed CNN consistently provides
higher accuracy than SVM However although the conven-tional deep learning-based method [26] can outperform theSVM classifier it requires plenty of training samples forconstructing autoencoders
To demonstrate the relationship between classificationaccuracies and the visual differences of curve shapes (seeFigure 2) we present the detailed accuracies of our proposedCNN classifiers for the University of Pavia data set in Table 6In the table the cell in the 119894th row 119895th column means thepercentage of the 119894th class samples (according to ground
10 Journal of Sensors
Table 6 The detailed classification accuracies of all the classes for University of Pavia data set
Asphalt Meadows Gravel Trees Sheets Bare soil Bitumen Bricks ShadowsAsphalt 8734 026 232 000 019 037 625 325 002Meadows 000 9463 002 126 000 403 000 006 000Gravel 053 047 8647 000 000 000 005 1243 005Trees 000 267 000 9629 003 101 000 000 000Sheets 000 009 000 000 9965 026 000 000 000Bare soil 012 615 000 010 008 9323 000 031 000Bitumen 637 000 035 000 009 000 9319 000 000Bricks 190 020 1048 000 006 060 034 8642 000Shadows 000 000 000 000 000 000 000 000 10000
0
02
04
06
08
1
12
14
16
18
2
0 1 2 3 4 5 6 7 8 9
Cos
t val
ue
Training time (min)
Figure 9 Cost value versus the training time for University of Paviadata sets
truth) which is classified to the 119895th class For example 8734of class Asphalt samples are classified correctly but 625 ofclass Asphalt samples are wrongly classified to class BitumenThe percentages on diagonal line are just the classificationaccuracies of corresponding classes As for one class themore unique the corresponding curve shape is the higheraccuracy the proposed CNN classifier can achieve (checkthe class Shadow and class Sheets in Figure 2 and Table 6)The more similar two curves are the higher opportunitythey are wrongly classified to each other (check the classGravel and class Bricks in Figure 2 and Table 6) Further-more the excellent performance verifies that the proposedCNN classifier has discriminative capability to extract subtlevisual features which is even superior to human vision forclassifying complex curve shapes
Finally we also implement three other types of neuralnetwork architectures for the Indian Pines data set usingthe same training and test samples The first one is a simplearchitecture with only two fully connected layers beside theinput layerThe second one is LeNet-5 which is a classic CNNarchitecture with two convolutional layers The third one is
83
84
85
86
87
88
89
90
91
92
93
94
60 80 100 120 140 160 180 200
Ove
rall
accu
racy
()
Number of training samples of each class
SVMCNN
Figure 10 Classification accuracies versus numbers of trainingsamples (each class) for the University of Pavia data sets
a conventional deep neural networks (DNNs) with 3 hiddenfully connected layers (a 220-60-40-20-8 architecture as sug-gested in [26])The classification performance is summarizedin Table 5 From Table 5 we can see that our CNN classifierachieves the highest accuracy with competitive training andtesting computational cost LeNet-5 and DNNs cost moretime to train models due to their complex architecturebut limited training samples restrict their capabilities ofclassification (only 20 samples selected for testing in [26]compared with 95 in our experiment) Another reason forthe difficulty that deeper CNNs and DNNs face to achievehigher accuracies could be that the HSI lacks the type ofhigh frequency signal commonly seen in the computer visiondomain (see Figure 2)
5 Conclusion and Future Work
In this paper we proposed a novel CNN-based methodfor HSI classification inspired by our observation thatHSI classification can be implemented via human vision
Journal of Sensors 11
Compared with SVM-based classifier and conventionalDNN-based classifier the proposed method could achievehigher accuracy using all the experimental data sets evenwith a small number of training samples
Our work is an exploration of using CNNs for HSIclassification and has excellent performanceThe architectureof our proposed CNN classifier only contains one convo-lutional layer and one fully connected layer due to thesmall number of training samples In the future a networkarchitecture called a Siamese Network [33] might be usedwhich has been proved to be robust in the situation wherethe number of training samples per category is small Sometechniques such as Dropout [34] can also be used toalleviate the overfitting problem caused by limited trainingsamples Furthermore recent researches in deep learninghave indicated that unsupervised learning can be employedto train CNNs reducing the requirement of labeled samplessignificantly Deep learning especially deep CNNs shouldhave great potentiality for HSI classification in the futureMoreover in the current work we do not consider the spatialcorrelation and only concentrate on the spectral signaturesWe believe that some spatial-spectral techniques also can beapplied to further improve the CNN-based classification Atlast we plan to employ efficient deep CNN frameworks suchas Caffe to improve our computing performance
Conflict of Interests
The authors declare that there is no conflict of interestsregarding the publication of this paper
Acknowledgments
This work was supported jointly by the National NaturalScience Foundation of China (nos 61371165 and 61302164)the 973 Program of China (no 2011CB706900) the Programfor New Century Excellent Talents in University underGrant no NCET-11-0711 and the Interdisciplinary ResearchProject in Beijing University of Chemical Technology WeiHu and Fan Zhang are also supported by the Beijing HigherEducation Young Elite Teacher Project under Grant nosYETP0501 and YETP0500 respectively
References
[1] D Landgrebe ldquoHyperspectral image data analysisrdquo IEEE SignalProcessing Magazine vol 19 no 1 pp 17ndash28 2002
[2] G M Foody and A Mathur ldquoA relative evaluation of multiclassimage classification by support vectormachinesrdquo IEEE Transac-tions on Geoscience and Remote Sensing vol 42 no 6 pp 1335ndash1343 2004
[3] Y Tarabalka J A Benediktsson and J Chanussot ldquoSpectral-spatial classification of hyperspectral imagery based on parti-tional clustering techniquesrdquo IEEE Transactions on Geoscienceand Remote Sensing vol 47 no 8 pp 2973ndash2987 2009
[4] W Li S Prasad F James and B Lour ldquoLocality-preservingdimensionality reduction and classification for hyperspectralimage analysisrdquo IEEE Transactions on Geoscience and RemoteSensing vol 50 no 4 pp 1185ndash1198 2012
[5] F Melgani and L Bruzzone ldquoClassification of hyperspectralremote sensing images with support vector machinesrdquo IEEETransactions on Geoscience and Remote Sensing vol 42 no 8pp 1778ndash1790 2004
[6] J A Gualtieri and S Chettri ldquoSupport vector machines for clas-sification of hyperspectral datardquo inProceedings of the IEEE Inter-national Geoscience and Remote Sensing Symposium (IGARSSrsquo00) vol 2 pp 813ndash815 IEEE July 2000
[7] G Mountrakis J Im and C Ogole ldquoSupport vector machinesin remote sensing a reviewrdquo ISPRS Journal of Photogrammetryand Remote Sensing vol 66 no 3 pp 247ndash259 2011
[8] J Li J M Bioucas-Dias and A Plaza ldquoSpectral-spatial classifi-cation of hyperspectral data using loopy belief propagation andactive learningrdquo IEEE Transactions on Geoscience and RemoteSensing vol 51 no 2 pp 844ndash856 2013
[9] P M Atkinson and A R L Tatnall ldquoIntroduction neuralnetworks in remote sensingrdquo International Journal of RemoteSensing vol 18 no 4 pp 699ndash709 1997
[10] L Bruzzone and D F Prieto ldquoA technique for the selectionof kernel-function parameters in RBF neural networks forclassification of remote-sensing imagesrdquo IEEE Transactions onGeoscience andRemote Sensing vol 37 no 2 pp 1179ndash1184 1999
[11] F Ratle G Camps-Valls and JWeston ldquoSemisupervised neuralnetworks for efficient hyperspectral image classificationrdquo IEEETransactions on Geoscience and Remote Sensing vol 48 no 5pp 2271ndash2282 2010
[12] G E Hinton and R R Salakhutdinov ldquoReducing the dimen-sionality of data with neural networksrdquo Science vol 313 no5786 pp 504ndash507 2006
[13] K Fukushima ldquoNeocognitron a hierarchical neural networkcapable of visual pattern recognitionrdquo Neural Networks vol 1no 2 pp 119ndash130 1988
[14] Y LeCun L Bottou Y Bengio and P Haffner ldquoGradient-basedlearning applied to document recognitionrdquo Proceedings of theIEEE vol 86 no 11 pp 2278ndash2323 1998
[15] D C Ciresan U Meier J Masci L M Gambardella andJ Schmidhuber ldquoFlexible high performance convolutionalneural networks for image classificationrdquo in Proceedings of the22nd International Joint Conference on Artificial Intelligence(IJCAI rsquo11) vol 22 pp 1237ndash1242 July 2011
[16] P Y Simard D Steinkraus and J C Platt ldquoBest practicesfor convolutional neural networks applied to visual documentanalysisrdquo in Proceedings of the 7th International Conference onDocument Analysis and Recognition vol 2 pp 958ndash963 IEEEComputer Society Edinburgh UK August 2003
[17] P Sermanet and Y LeCun ldquoTraffic sign recognition with multi-scale convolutional networksrdquo in Proceedings of the Interna-tional Joint Conference onNeuralNetwork (IJCNN rsquo11) pp 2809ndash2813 IEEE August 2011
[18] A Krizhevsky I Sutskever andG EHinton ldquoImagenet classifi-cation with deep convolutional neural networksrdquo in Proceedingsof the Advances in Neural Information Processing Systems 25(NIPS 12) pp 1097ndash1105 2012
[19] D Ciregan U Meier and J Schmidhuber ldquoMulti-column deepneural networks for image classificationrdquo in Proceedings of theIEEE Conference on Computer Vision and Pattern Recognition(CVPR rsquo12) pp 3642ndash3649 IEEE June 2012
[20] R Girshick J Donahue T Darrell and J Malik ldquoRich featurehierarchies for accurate object detection and semantic segmen-tationrdquo in Proceedings of the IEEE Conference on ComputerVision and Pattern Recognition (CVPR rsquo14) pp 580ndash587 IEEEJune 2014
12 Journal of Sensors
[21] C Farabet C Couprie L Najman and Y LeCun ldquoLearninghierarchical features for scene labelingrdquo IEEE Transactions onPattern Analysis and Machine Intelligence vol 35 no 8 pp1915ndash1929 2013
[22] P Sermanet S Chintala and Y LeCun ldquoConvolutional neuralnetworks applied to house numbers digit classificationrdquo inProceedings of the 21st International Conference on PatternRecognition (ICPR rsquo12) pp 3288ndash3291 IEEE November 2012
[23] Y TaigmanM YangM Ranzato and LWolf ldquoDeepFace clos-ing the gap to human-level performance in face verificationrdquoin Proceedings of the IEEE Conference on Computer Vision andPattern Recognition (CVPR rsquo14) pp 1701ndash1708 Columbus OhioUSA June 2014
[24] T N Sainath A-R Mohamed B Kingsbury and B Ram-abhadran ldquoDeep convolutional neural networks for LVCSRrdquoin Proceedings of the 38th IEEE International Conference onAcoustics Speech and Signal Processing (ICASSP rsquo13) pp 8614ndash8618 IEEE Vancouver Canada May 2013
[25] O Abdel-Hamid A-R Mohamed H Jiang and G PennldquoApplying convolutional neural networks concepts to hybridNN-HMMmodel for speech recognitionrdquo in Proceedings of theIEEE International Conference on Acoustics Speech and SignalProcessing (ICASSP rsquo12) pp 4277ndash4280 IEEE March 2012
[26] Y Chen Z Lin X Zhao G Wang and Y Gu ldquoDeep learning-based classification of hyperspectral datardquo IEEE Journal ofSelected Topics in Applied Earth Observations and RemoteSensing vol 7 no 6 pp 2094ndash2107 2014
[27] I Sutskever and G E Hinton ldquoDeep narrow sigmoid beliefnetworks are universal approximatorsrdquo Neural Computationvol 20 no 11 pp 2629ndash2636 2008
[28] D H Hubel and T N Wiesel ldquoReceptive fields and functionalarchitecture ofmonkey striate cortexrdquoThe Journal of Physiologyvol 195 no 1 pp 215ndash243 1968
[29] Y A LeCun L Bottou G B Orr and K-R Muller ldquoEfficientbackproprdquo in Neural Networks Tricks of the Trade pp 9ndash48Springer Berlin Germany 2012
[30] J Bergstra F Bastien O Breuleux et al ldquoTheano deep learningon GPUs with pythonrdquo in Proceedings of the NIPS 2011 BigLearning Workshop pp 712ndash721 Granada Spain December2011
[31] Y Jia E Shelhamer J Donahue et al ldquoCaffe convolutionalarchitecture for fast feature embeddingrdquo in Proceedings of theACM International Conference on Multimedia pp 675ndash678ACM Orlando Fla USA November 2014
[32] Y LeCun C Cortes and C J Burges ldquoThe MNIST database ofhandwritten digitsrdquo 1998 httpyannlecuncomexdbmnist
[33] S Chopra R Hadsell and Y LeCun ldquoLearning a similaritymetric discriminatively with application to face verificationrdquoin Proceedings of the IEEE Computer Society Conference onComputer Vision and Pattern Recognition (CVPR rsquo05) vol 1 pp539ndash546 IEEE June 2005
[34] N Srivastava G Hinton A Krizhevsky I Sutskever and RSalakhutdinov ldquoDropout a simple way to prevent neural net-works from overfittingrdquo The Journal of Machine LearningResearch vol 15 no 1 pp 1929ndash1958 2014
International Journal of
AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Active and Passive Electronic Components
Control Scienceand Engineering
Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
RotatingMachinery
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation httpwwwhindawicom
Journal ofEngineeringVolume 2014
Submit your manuscripts athttpwwwhindawicom
VLSI Design
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Shock and Vibration
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Civil EngineeringAdvances in
Acoustics and VibrationAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Advances inOptoElectronics
Hindawi Publishing Corporation httpwwwhindawicom
Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
SensorsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Chemical EngineeringInternational Journal of Antennas and
Propagation
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Navigation and Observation
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
DistributedSensor Networks
International Journal of
4 Journal of Sensors
values) 120579 = 120579119894 and 119894 = 1 2 3 4 where 120579
119894is the parameter
set between the (119894 minus 1)th and the 119894th layerIn HSI each HSI pixel sample can be regarded as a 2D
image whose height is equal to 1 (as 1D audio inputs inspeech recognition) Therefore the size of the input layer isjust (119899
1 1) and 119899
1is the number of bands The first hidden
convolutional layer C1 filters the 1198991times 1 input data with 20
kernels of size 1198961times1 Layer C1 contains 20times119899
2times1 nodes and
1198992= 1198991minus 1198961+ 1 There are 20 times (119896
1+ 1) trainable parameters
between layer C1 and the input layer The max pooling layerM2 is the second hidden layer and the kernel size is (119896
2 1)
Layer M2 contains 20 times 1198993times 1 nodes and 119899
3= 11989921198962
There is no parameter in this layer The fully connected layerF3 has 119899
4nodes and there are (20 times 119899
3+ 1) times 119899
4trainable
parameters between this layer and layer M2The output layerhas 1198995nodes and there are (119899
4+ 1) times 119899
5trainable parameters
between this layer and layer F3Therefore the architecture ofour proposed CNN classifier totally has 20 times (119896
1+ 1) + (20 times
1198993+ 1) times 119899
4+ (1198994+ 1) times 119899
5trainable parameters
Classifying a specified HSI pixel requires the correspond-ing CNN with the aforementioned parameters where 119899
1and
1198995are the spectral channel size and the number of output
classes of the data set respectively In our experiments 1198961is
better to be lceil11989919rceil and 119899
2= 1198991minus1198961+1 1198993can be any number
between 30 and 40 and 1198962= lceil11989921198993rceil 1198994is set to be 100These
choices might not be the best but are effective for general HSIdata
In our architecture layer C1 and M2 can be viewed as atrainable feature extractor to the input HSI data and layerF3 is a trainable classifier to the feature extractor The outputof subsampling is the actual feature of the original data Inour proposed CNN structure 20 features can be extractedfrom each original hyperspectral and each feature has 119899
3
dimensionsOur architecture has some similarities to architectures
that applied CNN for frequency domain signal in speechrecognition [24 25] We think it is caused by the similaritybetween 1D input of speech spectrum and hyperspectral dataDifferent from [24 25] our network varies according to thespectral channel size and the number of output classes ofinput HSI data
33 Training Strategies Here we introduce how to learnthe parameter space of the proposed CNN classifier All thetrainable parameters in our CNN should be initialized tobe a random value between minus005 and 005 The trainingprocess contains two steps forward propagation and backpropagation The forward propagation aims to compute theactual classification result of the input data with currentparameters The back propagation is employed to updatethe trainable parameters in order to make the discrepancybetween the actual classification output and the desiredclassification output as small as possible
331 Forward Propagation Our (119871 + 1)-layer CNN network(119871 = 4 in this work) consists of 119899
1input units in layer
INPUT 1198995output units in layer OUTPUT and several so-
called hidden units in layers C2 M3 and F4 Assuming x119894is
the input of the 119894th layer and the output of the (119897 minus 1)th layerthen we can compute x
119894+1as
x119894+1
= 119891119894(u119894) (1)
where
u119894= W119879119894x119894+ b119894 (2)
and W119879119894is a weight matrix of the 119894th layer acting on the
input data and b119894is an additive bias vector for the 119894th
layer 119891119894(sdot) is the activation function of the 119894th layer In
our designed architecture we choose the hyperbolic tangentfunction tanh(u) as the activation function in layer C1 andlayer F3The maximum function max(u) is used in layer M2Since the proposed CNN classifier is a multiclass classifierthe output of layer F3 is fed to 119899
5way softmax function
which produces a distribution over the 1198995class labels and the
softmax regression model is defined as
y = 1
sum1198995
119896=1119890W119879119871119896
x119871+b119871119896
[[[[[[[[
[
119890W1198791198711
x119871+b1198711
119890W1198791198712
x119871+b1198712
119890W1198791198711198995
x119871+b1198711198995
]]]]]]]]
]
(3)
The output vector y = x119871+1
of the layer OUTPUT denotesthe final probability of all the classes in the current iteration
332 Back Propagation In the back propagation stage thetrainable parameters are updated by using the gradientdescent method It is realized by minimizing a cost functionand computing the partial derivative of the cost functionwithrespect to each trainable parameter [29] The loss functionused in this work is defined as
119869 (120579) = minus1
119898
119898
sum
119894=1
1198995
sum
119895=1
1 119895 = Y(119894) log (y(119894)119895) (4)
where 119898 is the number of training samples Y is the desiredoutput y(119894)
119895is the 119895th value of the actual output y(119894) (see (3)) of
the 119894th training sample and is a vector whose size is 1198995 In the
desired output 119884(119894) of the 119894th sample the probability value ofthe labeled class is 1 and the probability values of other classesare 0 1119895 = Y(119894)means if 119895 is equal to the desired label of the119894th training sample its value is 1 otherwise its value is 0 Weadd a minus sign to the front of 119869(120579) in order to make thecomputation more convenient
The derivative of the loss function with respect to u119894is
120575119894=
120597119869
120597u119894
=
minus (Y minus y) ∘ 1198911015840 (u119894) 119894 = 119871
(W119879119894120575119894+1) ∘ 1198911015840(u119894) 119894 lt 119871
(5)
where ∘ denotes element-wise multiplication 1198911015840(u119894) can be
easily represented as
1198911015840(u119894) =
(1 minus 119891 (u119894)) ∘ (1 + 119891 (u
119894)) 119894 = 1 3
null 119894 = 2
119891 (u119894) ∘ (1 minus 119891 (u
119894)) 119894 = 4
(6)
Journal of Sensors 5
⊳ Constructing the CNNModelfunction INITCNNMODEL (120579 [119899
1ndash5])layerType = [convolution max-pooling fully-connected fully-connected]layerActivation = [tanh() max() tanh() softmax()]model = new Model()for 119894 = 1 to 4 dolayer = new Layer()layertype = layerType[119894]layerinputSize = 119899
119894
layerneurons = new Neuron [119899119894+1]
layerparams = 120579119894
modeladdLayer(layer)end forreturnmodel
end function⊳ Training the CNNModelInitialize learning rate 120572 number of max iteration ITERmax min error ERRmin trainingbatchs BATCHEStraining bach size SIZEbatch and so onCompute 119899
2 1198993 1198994 1198961 1198962 according to 119899
1and 119899
5
Generate random weights 120579 of the CNNcnnModel = InitCNNModel(120579 [119899
1ndash5])iter = 0 err = +infwhile err gt ERRmin and iter lt ITERmax doerr = 0for bach = 1 to BATCHEStraining do[nabla120579119869(120579) 119869(120579)] = cnnModeltrain (TrainingDatas TrainingLabels) as (4) and (8)
Update 120579 using (7)err = err + mean(119869(120579))
end forerr = errBATCHEStrainingiter++
end whileSave parameters 120579 of the CNN
Algorithm 1 Our CNN-based method
Therefore on each iteration we would perform the update
120579 = 120579 minus 120572 sdot nabla120579119869 (120579) (7)
for adjusting the trainable parameters where 120572 is the learningfactor (120572 = 001 in our implementation) and
nabla120579119869 (120579) =
120597119869
1205971205791
120597119869
1205971205792
120597119869
120597120579119871
(8)
We know that 120579119894containsW
119894and b
119894 and
120597119869
120597120579119894
= 120597119869
120597W119894
120597119869
120597b119894
(9)
where
120597119869
120597W119894
=120597119869
120597u119894
∘120597u119894
120597W119894
=120597119869
120597u119894
∘ x119894= 120575119894∘ x119894
120597119869
120597b119894
=120597119869
120597u119894
∘120597u119894
120597b119894
=120597119869
120597u119894
= 120575119894
(10)
With an increasing number of training iteration thereturn of the cost function is smaller which indicates thatthe actual output is closer to the desired outputThe iterationstops when the discrepancy between them is small enoughWe use average sum of squares to represent the discrepancyFinally the trained CNN is ready for HSI classification Thesummary of the proposed algorithm is shown in Algorithm 1
34 Classification Since the architecture and all correspond-ing trainable parameters are specified we can build theCNN classifier and reload saved parameters for classifyingHSI data The classification process is just like the forwardpropagation step in which we can compute the classificationresult as (3)
4 Experiments
All the programs are implemented using Python languageand Theano [30] library Theano is a Python library thatmakes us easily define optimize and evaluate mathematicalexpressions involvingmultidimensional arrays efficiently andconveniently on GPUs The results are generated on a PC
6 Journal of Sensors
Table 1 Number of training and test samples used in the IndianPines data set
Number Class Training Test1 Corn-notill 200 12282 Corn-mintill 200 6303 Grass-pasture 200 2834 Hay-windrowed 200 2785 Soybean-notill 200 7726 Soybean-mintill 200 22557 Soybean-clean 200 3938 Woods 200 1065
Total 1600 6904
equipped with an Intel Core i7 with 28GHz and NvidiaGeForce GTX 465 graphics card
41TheData Sets Threehyperspectral data including IndianPines Salinas and University of Pavia scenes are employedto evaluate the effectiveness of the proposed method Forall the data we randomly select 200 labeled pixels per classfor training and all other pixels in the ground truth map fortest Development data are derived from the available trainingdata by further dividing them into training and testingsamples for tuning the parameters of the proposed CNNclassifier Furthermore each pixel is scaled to [minus10 +10]
uniformlyThe Indian Pines data set was gathered by Airborne
VisibleInfrared Imaging Spectrometer (AVIRIS) sensor innorthwestern Indiana There are 220 spectral channels in the04 to 245 120583m region of the visible and infrared spectrumwith a spatial resolution of 20m From the statistical view-point we discard some classes which only have few labeledsamples and select 8 classes for which the numbers of trainingand testing samples are listed in Table 1The layer parametersof this data set in the proposed CNN classifier are set asfollows 119899
1= 220 119896
1= 24 119899
2= 197 119896
2= 5 119899
3= 40 119899
4=
100 and 1198995= 8 and the number of total trainable parameters
in the data set is 81408The second data employed was also collected by the
AVIRIS sensor capturing an area over Salinas ValleyCalifornia with a spatial resolution of 37m The imagecomprises 512times217 pixels with 220 bands It mainly containsvegetables bare soils and vineyard fields (httpwwwehuesccwintcoindexphpHyperspectral Remote Sensing Scenes)There are also 16 different classes and the numbers of trainingand testing samples are listed in Table 2The layer parametersof this data set in our CNN are set to be 119899
1= 224 119896
1= 24
1198992= 201 119896
2= 5 119899
3= 40 119899
4= 100 and 119899
5= 16 and the
number of total trainable parameters in the data set is 82216The University of Pavia data set was collected by the
Reflective Optics System Imaging Spectrometer (ROSIS)sensor The image scene with a spatial coverage of 610 times 340
pixels covering the city of Pavia Italy was collected underthe HySens project managed by DLR (the German AerospaceAgency) The data set has 103 spectral bands prior to water
Table 2 Number of training and test samples used in the Salinasdata set
Number Class Training Test1 Broccoli green weeds 1 200 18092 Broccoli green weeds 2 200 35263 Fallow 200 17764 Fallow rough plow 200 11945 Fallow smooth 200 24786 Stubble 200 37597 Celery 200 33798 Grapes untrained 200 110719 Soil vineyard develop 200 600310 Corn senesced green weeds 200 307811 Lettuce romaine 4 wk 200 86812 Lettuce romaine 5 wk 200 172713 Lettuce romaine 6 wk 200 71614 Lettuce romaine 7 wk 200 87015 Vineyard untrained 200 706816 Vineyard vertical trellis 200 1607
Total 3200 50929
Table 3 Number of training and test samples used in University ofPavia data set
Number Class Training Test1 Asphalt 200 64312 Meadows 200 184493 Gravel 200 18994 Trees 200 28645 Sheets 200 11456 Bare soil 200 48297 Bitumen 200 11308 Bricks 200 34829 Shadows 200 747
Total 1800 40976
band removal It has a spectral coverage from 043 to 086 120583mand a spatial resolution of 13m Approximately 42776 labeledpixels with 9 classes are from the ground truth map and thenumbers of training and testing samples are shown in Table 3The layer parameters of this data set in our CNN are set to be1198991= 103 119896
1= 11 119899
2= 93 119896
2= 3 119899
3= 30 119899
4= 100 and
1198995= 9 and the number of total trainable parameters in the
data set is 61249
42 Results and Comparisons Table 4 provides the com-parison of classification performance between the pro-posed method and the traditional SVM classifier SVMwith RBF kernel is implemented using the libsvm package(httpwwwcsientuedutwsimcjlinlibsvm) cross validationis also employed to determine the related parameters andall optimal ones are used in following experiments It isobvious that our proposed method has better performance
Journal of Sensors 7
Corn-notill Corn-mintill Grass-pastureSoybean-notill Soybean-mintill Soybean-clean Woods
Hay-windrowed
Figure 4 RGB composition maps resulting from classification for the Indian Pines data set From left to right ground truth RBF-SVM andthe proposed method
FallowFallow rough plowFallow smoothStubbleCeleryGrapes untrainedSoil vineyard developCorn senesced green weeds
Vineyard untrainedVineyard vertical trellis
Broccoli green weeds 1Broccoli green weeds 2
Lettuce romaine 4wkLettuce romaine 5wkLettuce romaine 6wkLettuce romaine 7wk
Figure 5 RGB composition maps resulting from classification for the Salinas data set From left to right ground truth RBF-SVM and theproposed method
Asphalt Meadows Gravel Trees SheetsBare soil Bitumen Bricks Shadows
Figure 6Thematicmaps resulting from classification forUniversityof Pavia data set From left to right ground truth RBF-SVM and theproposed method
(approximate 2 gain) than SVM classifier using all the threedata sets Figures 4 5 and 6 illustrate the correspondingclassification maps obtained with our proposed methodand RBF-SVM classifier Furthermore compared with RBF-SVM the proposed CNN classifier has higher classificationaccuracy not only for the overall data set but also for almostall the specific classes as shown in Figure 7
Figure 8 further illustrates the relationship between clas-sification accuracies and the training time (the test time is
Table 4 Comparison of results between the proposed CNN andRBF-SVM using three data sets
Data set The proposed CNN RBF-SVMIndian Pines 9016 8760Salinas 9260 9166University of Pavia 9256 9052
Table 5 Results of comparison with different neural networks onthe Indian Pines data set
Method Training time Testing time AccuracyTwo-layer NN 2800 s 165 s 8649DNN 6500 s 321 s 8793LeNet-5 5100 s 234 s 8827Our CNN 4300 s 198 s 9016
also included) for three experimental data sets With theincreasing of training time the classification accuracy ofeach data can reach over 90 We must admit that thetraining process is relatively time-consuming to achieve goodperformance however the proposed CNN classifier sharesthe same advantages (eg fast on testing) of deep learningalgorithms (see Table 5) Moreover our implementation ofCNN could be improved greatly on efficiency or we can useother CNN frameworks such as Caffe [31] to reduce training
8 Journal of Sensors
75
80
85
90
95
100
1 2 3 4 5 6 7 8
Accu
racy
()
Class number
CNNSVM
(a)
70
75
80
85
90
95
100
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
Accu
racy
()
Class number
CNNSVM
(b)
75
80
85
90
95
100
1 2 3 4 5 6 7 8 9
Accu
racy
()
Class number
CNNSVM
(c)
Figure 7 Classification accuracies of all the classes for experimental data sets From (a) to (c) Indian Pines Salinas and University of PaviaThe class number is corresponding to the first column in Tables 1 2 and 3
and test time According to our experiments it takes only 5minutes to achieve 90 accuracy on MNIST dataset [32] byusing Caffe compared to more than 120 minutes by using ourimplemented framework
Figure 9 illustrates the relationship between cost value(see (4)) and the training time for the University of Paviadata set The value of the loss function is reduced with anincreasing number of training iteration which demonstrates
Journal of Sensors 9
0
10
20
30
40
50
60
70
80
90
100
0 10 20 30 40 50 60 70
Ove
rall
accu
racy
()
Training time (min)
(a)
0
10
20
30
40
50
60
70
80
90
100
0 10 20 30 40 50 60 70 80 90
Ove
rall
accu
racy
()
Training time (min)
(b)
0
10
20
30
40
50
60
70
80
90
100
0 1 2 3 4 5 6 7 8 9
Ove
rall
accu
racy
()
Training time (min)
(c)
Figure 8 Classification accuracies versus the training time for experimental data sets From (a) to (c) Indian Pines Salinas and Universityof Pavia Note that the test time is also included in the training time
the convergence of our network with only 200 trainingsamples for each classMoreover the cost value is still reducedafter 5-minute training but the corresponding test accuracyis relatively stable (see Figure 8(a)) which indicates theoverfitting problem in this network
To further verify that the proposed classifier is suitablefor classifying data sets with limited training samples we alsocompare our CNN with RBF-SVM under different trainingsizes on the University of Pavia data set as shown in Figure 10It is obvious that our proposed CNN consistently provides
higher accuracy than SVM However although the conven-tional deep learning-based method [26] can outperform theSVM classifier it requires plenty of training samples forconstructing autoencoders
To demonstrate the relationship between classificationaccuracies and the visual differences of curve shapes (seeFigure 2) we present the detailed accuracies of our proposedCNN classifiers for the University of Pavia data set in Table 6In the table the cell in the 119894th row 119895th column means thepercentage of the 119894th class samples (according to ground
10 Journal of Sensors
Table 6 The detailed classification accuracies of all the classes for University of Pavia data set
Asphalt Meadows Gravel Trees Sheets Bare soil Bitumen Bricks ShadowsAsphalt 8734 026 232 000 019 037 625 325 002Meadows 000 9463 002 126 000 403 000 006 000Gravel 053 047 8647 000 000 000 005 1243 005Trees 000 267 000 9629 003 101 000 000 000Sheets 000 009 000 000 9965 026 000 000 000Bare soil 012 615 000 010 008 9323 000 031 000Bitumen 637 000 035 000 009 000 9319 000 000Bricks 190 020 1048 000 006 060 034 8642 000Shadows 000 000 000 000 000 000 000 000 10000
0
02
04
06
08
1
12
14
16
18
2
0 1 2 3 4 5 6 7 8 9
Cos
t val
ue
Training time (min)
Figure 9 Cost value versus the training time for University of Paviadata sets
truth) which is classified to the 119895th class For example 8734of class Asphalt samples are classified correctly but 625 ofclass Asphalt samples are wrongly classified to class BitumenThe percentages on diagonal line are just the classificationaccuracies of corresponding classes As for one class themore unique the corresponding curve shape is the higheraccuracy the proposed CNN classifier can achieve (checkthe class Shadow and class Sheets in Figure 2 and Table 6)The more similar two curves are the higher opportunitythey are wrongly classified to each other (check the classGravel and class Bricks in Figure 2 and Table 6) Further-more the excellent performance verifies that the proposedCNN classifier has discriminative capability to extract subtlevisual features which is even superior to human vision forclassifying complex curve shapes
Finally we also implement three other types of neuralnetwork architectures for the Indian Pines data set usingthe same training and test samples The first one is a simplearchitecture with only two fully connected layers beside theinput layerThe second one is LeNet-5 which is a classic CNNarchitecture with two convolutional layers The third one is
83
84
85
86
87
88
89
90
91
92
93
94
60 80 100 120 140 160 180 200
Ove
rall
accu
racy
()
Number of training samples of each class
SVMCNN
Figure 10 Classification accuracies versus numbers of trainingsamples (each class) for the University of Pavia data sets
a conventional deep neural networks (DNNs) with 3 hiddenfully connected layers (a 220-60-40-20-8 architecture as sug-gested in [26])The classification performance is summarizedin Table 5 From Table 5 we can see that our CNN classifierachieves the highest accuracy with competitive training andtesting computational cost LeNet-5 and DNNs cost moretime to train models due to their complex architecturebut limited training samples restrict their capabilities ofclassification (only 20 samples selected for testing in [26]compared with 95 in our experiment) Another reason forthe difficulty that deeper CNNs and DNNs face to achievehigher accuracies could be that the HSI lacks the type ofhigh frequency signal commonly seen in the computer visiondomain (see Figure 2)
5 Conclusion and Future Work
In this paper we proposed a novel CNN-based methodfor HSI classification inspired by our observation thatHSI classification can be implemented via human vision
Journal of Sensors 11
Compared with SVM-based classifier and conventionalDNN-based classifier the proposed method could achievehigher accuracy using all the experimental data sets evenwith a small number of training samples
Our work is an exploration of using CNNs for HSIclassification and has excellent performanceThe architectureof our proposed CNN classifier only contains one convo-lutional layer and one fully connected layer due to thesmall number of training samples In the future a networkarchitecture called a Siamese Network [33] might be usedwhich has been proved to be robust in the situation wherethe number of training samples per category is small Sometechniques such as Dropout [34] can also be used toalleviate the overfitting problem caused by limited trainingsamples Furthermore recent researches in deep learninghave indicated that unsupervised learning can be employedto train CNNs reducing the requirement of labeled samplessignificantly Deep learning especially deep CNNs shouldhave great potentiality for HSI classification in the futureMoreover in the current work we do not consider the spatialcorrelation and only concentrate on the spectral signaturesWe believe that some spatial-spectral techniques also can beapplied to further improve the CNN-based classification Atlast we plan to employ efficient deep CNN frameworks suchas Caffe to improve our computing performance
Conflict of Interests
The authors declare that there is no conflict of interestsregarding the publication of this paper
Acknowledgments
This work was supported jointly by the National NaturalScience Foundation of China (nos 61371165 and 61302164)the 973 Program of China (no 2011CB706900) the Programfor New Century Excellent Talents in University underGrant no NCET-11-0711 and the Interdisciplinary ResearchProject in Beijing University of Chemical Technology WeiHu and Fan Zhang are also supported by the Beijing HigherEducation Young Elite Teacher Project under Grant nosYETP0501 and YETP0500 respectively
References
[1] D Landgrebe ldquoHyperspectral image data analysisrdquo IEEE SignalProcessing Magazine vol 19 no 1 pp 17ndash28 2002
[2] G M Foody and A Mathur ldquoA relative evaluation of multiclassimage classification by support vectormachinesrdquo IEEE Transac-tions on Geoscience and Remote Sensing vol 42 no 6 pp 1335ndash1343 2004
[3] Y Tarabalka J A Benediktsson and J Chanussot ldquoSpectral-spatial classification of hyperspectral imagery based on parti-tional clustering techniquesrdquo IEEE Transactions on Geoscienceand Remote Sensing vol 47 no 8 pp 2973ndash2987 2009
[4] W Li S Prasad F James and B Lour ldquoLocality-preservingdimensionality reduction and classification for hyperspectralimage analysisrdquo IEEE Transactions on Geoscience and RemoteSensing vol 50 no 4 pp 1185ndash1198 2012
[5] F Melgani and L Bruzzone ldquoClassification of hyperspectralremote sensing images with support vector machinesrdquo IEEETransactions on Geoscience and Remote Sensing vol 42 no 8pp 1778ndash1790 2004
[6] J A Gualtieri and S Chettri ldquoSupport vector machines for clas-sification of hyperspectral datardquo inProceedings of the IEEE Inter-national Geoscience and Remote Sensing Symposium (IGARSSrsquo00) vol 2 pp 813ndash815 IEEE July 2000
[7] G Mountrakis J Im and C Ogole ldquoSupport vector machinesin remote sensing a reviewrdquo ISPRS Journal of Photogrammetryand Remote Sensing vol 66 no 3 pp 247ndash259 2011
[8] J Li J M Bioucas-Dias and A Plaza ldquoSpectral-spatial classifi-cation of hyperspectral data using loopy belief propagation andactive learningrdquo IEEE Transactions on Geoscience and RemoteSensing vol 51 no 2 pp 844ndash856 2013
[9] P M Atkinson and A R L Tatnall ldquoIntroduction neuralnetworks in remote sensingrdquo International Journal of RemoteSensing vol 18 no 4 pp 699ndash709 1997
[10] L Bruzzone and D F Prieto ldquoA technique for the selectionof kernel-function parameters in RBF neural networks forclassification of remote-sensing imagesrdquo IEEE Transactions onGeoscience andRemote Sensing vol 37 no 2 pp 1179ndash1184 1999
[11] F Ratle G Camps-Valls and JWeston ldquoSemisupervised neuralnetworks for efficient hyperspectral image classificationrdquo IEEETransactions on Geoscience and Remote Sensing vol 48 no 5pp 2271ndash2282 2010
[12] G E Hinton and R R Salakhutdinov ldquoReducing the dimen-sionality of data with neural networksrdquo Science vol 313 no5786 pp 504ndash507 2006
[13] K Fukushima ldquoNeocognitron a hierarchical neural networkcapable of visual pattern recognitionrdquo Neural Networks vol 1no 2 pp 119ndash130 1988
[14] Y LeCun L Bottou Y Bengio and P Haffner ldquoGradient-basedlearning applied to document recognitionrdquo Proceedings of theIEEE vol 86 no 11 pp 2278ndash2323 1998
[15] D C Ciresan U Meier J Masci L M Gambardella andJ Schmidhuber ldquoFlexible high performance convolutionalneural networks for image classificationrdquo in Proceedings of the22nd International Joint Conference on Artificial Intelligence(IJCAI rsquo11) vol 22 pp 1237ndash1242 July 2011
[16] P Y Simard D Steinkraus and J C Platt ldquoBest practicesfor convolutional neural networks applied to visual documentanalysisrdquo in Proceedings of the 7th International Conference onDocument Analysis and Recognition vol 2 pp 958ndash963 IEEEComputer Society Edinburgh UK August 2003
[17] P Sermanet and Y LeCun ldquoTraffic sign recognition with multi-scale convolutional networksrdquo in Proceedings of the Interna-tional Joint Conference onNeuralNetwork (IJCNN rsquo11) pp 2809ndash2813 IEEE August 2011
[18] A Krizhevsky I Sutskever andG EHinton ldquoImagenet classifi-cation with deep convolutional neural networksrdquo in Proceedingsof the Advances in Neural Information Processing Systems 25(NIPS 12) pp 1097ndash1105 2012
[19] D Ciregan U Meier and J Schmidhuber ldquoMulti-column deepneural networks for image classificationrdquo in Proceedings of theIEEE Conference on Computer Vision and Pattern Recognition(CVPR rsquo12) pp 3642ndash3649 IEEE June 2012
[20] R Girshick J Donahue T Darrell and J Malik ldquoRich featurehierarchies for accurate object detection and semantic segmen-tationrdquo in Proceedings of the IEEE Conference on ComputerVision and Pattern Recognition (CVPR rsquo14) pp 580ndash587 IEEEJune 2014
12 Journal of Sensors
[21] C Farabet C Couprie L Najman and Y LeCun ldquoLearninghierarchical features for scene labelingrdquo IEEE Transactions onPattern Analysis and Machine Intelligence vol 35 no 8 pp1915ndash1929 2013
[22] P Sermanet S Chintala and Y LeCun ldquoConvolutional neuralnetworks applied to house numbers digit classificationrdquo inProceedings of the 21st International Conference on PatternRecognition (ICPR rsquo12) pp 3288ndash3291 IEEE November 2012
[23] Y TaigmanM YangM Ranzato and LWolf ldquoDeepFace clos-ing the gap to human-level performance in face verificationrdquoin Proceedings of the IEEE Conference on Computer Vision andPattern Recognition (CVPR rsquo14) pp 1701ndash1708 Columbus OhioUSA June 2014
[24] T N Sainath A-R Mohamed B Kingsbury and B Ram-abhadran ldquoDeep convolutional neural networks for LVCSRrdquoin Proceedings of the 38th IEEE International Conference onAcoustics Speech and Signal Processing (ICASSP rsquo13) pp 8614ndash8618 IEEE Vancouver Canada May 2013
[25] O Abdel-Hamid A-R Mohamed H Jiang and G PennldquoApplying convolutional neural networks concepts to hybridNN-HMMmodel for speech recognitionrdquo in Proceedings of theIEEE International Conference on Acoustics Speech and SignalProcessing (ICASSP rsquo12) pp 4277ndash4280 IEEE March 2012
[26] Y Chen Z Lin X Zhao G Wang and Y Gu ldquoDeep learning-based classification of hyperspectral datardquo IEEE Journal ofSelected Topics in Applied Earth Observations and RemoteSensing vol 7 no 6 pp 2094ndash2107 2014
[27] I Sutskever and G E Hinton ldquoDeep narrow sigmoid beliefnetworks are universal approximatorsrdquo Neural Computationvol 20 no 11 pp 2629ndash2636 2008
[28] D H Hubel and T N Wiesel ldquoReceptive fields and functionalarchitecture ofmonkey striate cortexrdquoThe Journal of Physiologyvol 195 no 1 pp 215ndash243 1968
[29] Y A LeCun L Bottou G B Orr and K-R Muller ldquoEfficientbackproprdquo in Neural Networks Tricks of the Trade pp 9ndash48Springer Berlin Germany 2012
[30] J Bergstra F Bastien O Breuleux et al ldquoTheano deep learningon GPUs with pythonrdquo in Proceedings of the NIPS 2011 BigLearning Workshop pp 712ndash721 Granada Spain December2011
[31] Y Jia E Shelhamer J Donahue et al ldquoCaffe convolutionalarchitecture for fast feature embeddingrdquo in Proceedings of theACM International Conference on Multimedia pp 675ndash678ACM Orlando Fla USA November 2014
[32] Y LeCun C Cortes and C J Burges ldquoThe MNIST database ofhandwritten digitsrdquo 1998 httpyannlecuncomexdbmnist
[33] S Chopra R Hadsell and Y LeCun ldquoLearning a similaritymetric discriminatively with application to face verificationrdquoin Proceedings of the IEEE Computer Society Conference onComputer Vision and Pattern Recognition (CVPR rsquo05) vol 1 pp539ndash546 IEEE June 2005
[34] N Srivastava G Hinton A Krizhevsky I Sutskever and RSalakhutdinov ldquoDropout a simple way to prevent neural net-works from overfittingrdquo The Journal of Machine LearningResearch vol 15 no 1 pp 1929ndash1958 2014
International Journal of
AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Active and Passive Electronic Components
Control Scienceand Engineering
Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
RotatingMachinery
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation httpwwwhindawicom
Journal ofEngineeringVolume 2014
Submit your manuscripts athttpwwwhindawicom
VLSI Design
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Shock and Vibration
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Civil EngineeringAdvances in
Acoustics and VibrationAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Advances inOptoElectronics
Hindawi Publishing Corporation httpwwwhindawicom
Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
SensorsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Chemical EngineeringInternational Journal of Antennas and
Propagation
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Navigation and Observation
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
DistributedSensor Networks
International Journal of
Journal of Sensors 5
⊳ Constructing the CNNModelfunction INITCNNMODEL (120579 [119899
1ndash5])layerType = [convolution max-pooling fully-connected fully-connected]layerActivation = [tanh() max() tanh() softmax()]model = new Model()for 119894 = 1 to 4 dolayer = new Layer()layertype = layerType[119894]layerinputSize = 119899
119894
layerneurons = new Neuron [119899119894+1]
layerparams = 120579119894
modeladdLayer(layer)end forreturnmodel
end function⊳ Training the CNNModelInitialize learning rate 120572 number of max iteration ITERmax min error ERRmin trainingbatchs BATCHEStraining bach size SIZEbatch and so onCompute 119899
2 1198993 1198994 1198961 1198962 according to 119899
1and 119899
5
Generate random weights 120579 of the CNNcnnModel = InitCNNModel(120579 [119899
1ndash5])iter = 0 err = +infwhile err gt ERRmin and iter lt ITERmax doerr = 0for bach = 1 to BATCHEStraining do[nabla120579119869(120579) 119869(120579)] = cnnModeltrain (TrainingDatas TrainingLabels) as (4) and (8)
Update 120579 using (7)err = err + mean(119869(120579))
end forerr = errBATCHEStrainingiter++
end whileSave parameters 120579 of the CNN
Algorithm 1 Our CNN-based method
Therefore on each iteration we would perform the update
120579 = 120579 minus 120572 sdot nabla120579119869 (120579) (7)
for adjusting the trainable parameters where 120572 is the learningfactor (120572 = 001 in our implementation) and
nabla120579119869 (120579) =
120597119869
1205971205791
120597119869
1205971205792
120597119869
120597120579119871
(8)
We know that 120579119894containsW
119894and b
119894 and
120597119869
120597120579119894
= 120597119869
120597W119894
120597119869
120597b119894
(9)
where
120597119869
120597W119894
=120597119869
120597u119894
∘120597u119894
120597W119894
=120597119869
120597u119894
∘ x119894= 120575119894∘ x119894
120597119869
120597b119894
=120597119869
120597u119894
∘120597u119894
120597b119894
=120597119869
120597u119894
= 120575119894
(10)
With an increasing number of training iteration thereturn of the cost function is smaller which indicates thatthe actual output is closer to the desired outputThe iterationstops when the discrepancy between them is small enoughWe use average sum of squares to represent the discrepancyFinally the trained CNN is ready for HSI classification Thesummary of the proposed algorithm is shown in Algorithm 1
34 Classification Since the architecture and all correspond-ing trainable parameters are specified we can build theCNN classifier and reload saved parameters for classifyingHSI data The classification process is just like the forwardpropagation step in which we can compute the classificationresult as (3)
4 Experiments
All the programs are implemented using Python languageand Theano [30] library Theano is a Python library thatmakes us easily define optimize and evaluate mathematicalexpressions involvingmultidimensional arrays efficiently andconveniently on GPUs The results are generated on a PC
6 Journal of Sensors
Table 1 Number of training and test samples used in the IndianPines data set
Number Class Training Test1 Corn-notill 200 12282 Corn-mintill 200 6303 Grass-pasture 200 2834 Hay-windrowed 200 2785 Soybean-notill 200 7726 Soybean-mintill 200 22557 Soybean-clean 200 3938 Woods 200 1065
Total 1600 6904
equipped with an Intel Core i7 with 28GHz and NvidiaGeForce GTX 465 graphics card
41TheData Sets Threehyperspectral data including IndianPines Salinas and University of Pavia scenes are employedto evaluate the effectiveness of the proposed method Forall the data we randomly select 200 labeled pixels per classfor training and all other pixels in the ground truth map fortest Development data are derived from the available trainingdata by further dividing them into training and testingsamples for tuning the parameters of the proposed CNNclassifier Furthermore each pixel is scaled to [minus10 +10]
uniformlyThe Indian Pines data set was gathered by Airborne
VisibleInfrared Imaging Spectrometer (AVIRIS) sensor innorthwestern Indiana There are 220 spectral channels in the04 to 245 120583m region of the visible and infrared spectrumwith a spatial resolution of 20m From the statistical view-point we discard some classes which only have few labeledsamples and select 8 classes for which the numbers of trainingand testing samples are listed in Table 1The layer parametersof this data set in the proposed CNN classifier are set asfollows 119899
1= 220 119896
1= 24 119899
2= 197 119896
2= 5 119899
3= 40 119899
4=
100 and 1198995= 8 and the number of total trainable parameters
in the data set is 81408The second data employed was also collected by the
AVIRIS sensor capturing an area over Salinas ValleyCalifornia with a spatial resolution of 37m The imagecomprises 512times217 pixels with 220 bands It mainly containsvegetables bare soils and vineyard fields (httpwwwehuesccwintcoindexphpHyperspectral Remote Sensing Scenes)There are also 16 different classes and the numbers of trainingand testing samples are listed in Table 2The layer parametersof this data set in our CNN are set to be 119899
1= 224 119896
1= 24
1198992= 201 119896
2= 5 119899
3= 40 119899
4= 100 and 119899
5= 16 and the
number of total trainable parameters in the data set is 82216The University of Pavia data set was collected by the
Reflective Optics System Imaging Spectrometer (ROSIS)sensor The image scene with a spatial coverage of 610 times 340
pixels covering the city of Pavia Italy was collected underthe HySens project managed by DLR (the German AerospaceAgency) The data set has 103 spectral bands prior to water
Table 2 Number of training and test samples used in the Salinasdata set
Number Class Training Test1 Broccoli green weeds 1 200 18092 Broccoli green weeds 2 200 35263 Fallow 200 17764 Fallow rough plow 200 11945 Fallow smooth 200 24786 Stubble 200 37597 Celery 200 33798 Grapes untrained 200 110719 Soil vineyard develop 200 600310 Corn senesced green weeds 200 307811 Lettuce romaine 4 wk 200 86812 Lettuce romaine 5 wk 200 172713 Lettuce romaine 6 wk 200 71614 Lettuce romaine 7 wk 200 87015 Vineyard untrained 200 706816 Vineyard vertical trellis 200 1607
Total 3200 50929
Table 3 Number of training and test samples used in University ofPavia data set
Number Class Training Test1 Asphalt 200 64312 Meadows 200 184493 Gravel 200 18994 Trees 200 28645 Sheets 200 11456 Bare soil 200 48297 Bitumen 200 11308 Bricks 200 34829 Shadows 200 747
Total 1800 40976
band removal It has a spectral coverage from 043 to 086 120583mand a spatial resolution of 13m Approximately 42776 labeledpixels with 9 classes are from the ground truth map and thenumbers of training and testing samples are shown in Table 3The layer parameters of this data set in our CNN are set to be1198991= 103 119896
1= 11 119899
2= 93 119896
2= 3 119899
3= 30 119899
4= 100 and
1198995= 9 and the number of total trainable parameters in the
data set is 61249
42 Results and Comparisons Table 4 provides the com-parison of classification performance between the pro-posed method and the traditional SVM classifier SVMwith RBF kernel is implemented using the libsvm package(httpwwwcsientuedutwsimcjlinlibsvm) cross validationis also employed to determine the related parameters andall optimal ones are used in following experiments It isobvious that our proposed method has better performance
Journal of Sensors 7
Corn-notill Corn-mintill Grass-pastureSoybean-notill Soybean-mintill Soybean-clean Woods
Hay-windrowed
Figure 4 RGB composition maps resulting from classification for the Indian Pines data set From left to right ground truth RBF-SVM andthe proposed method
FallowFallow rough plowFallow smoothStubbleCeleryGrapes untrainedSoil vineyard developCorn senesced green weeds
Vineyard untrainedVineyard vertical trellis
Broccoli green weeds 1Broccoli green weeds 2
Lettuce romaine 4wkLettuce romaine 5wkLettuce romaine 6wkLettuce romaine 7wk
Figure 5 RGB composition maps resulting from classification for the Salinas data set From left to right ground truth RBF-SVM and theproposed method
Asphalt Meadows Gravel Trees SheetsBare soil Bitumen Bricks Shadows
Figure 6Thematicmaps resulting from classification forUniversityof Pavia data set From left to right ground truth RBF-SVM and theproposed method
(approximate 2 gain) than SVM classifier using all the threedata sets Figures 4 5 and 6 illustrate the correspondingclassification maps obtained with our proposed methodand RBF-SVM classifier Furthermore compared with RBF-SVM the proposed CNN classifier has higher classificationaccuracy not only for the overall data set but also for almostall the specific classes as shown in Figure 7
Figure 8 further illustrates the relationship between clas-sification accuracies and the training time (the test time is
Table 4 Comparison of results between the proposed CNN andRBF-SVM using three data sets
Data set The proposed CNN RBF-SVMIndian Pines 9016 8760Salinas 9260 9166University of Pavia 9256 9052
Table 5 Results of comparison with different neural networks onthe Indian Pines data set
Method Training time Testing time AccuracyTwo-layer NN 2800 s 165 s 8649DNN 6500 s 321 s 8793LeNet-5 5100 s 234 s 8827Our CNN 4300 s 198 s 9016
also included) for three experimental data sets With theincreasing of training time the classification accuracy ofeach data can reach over 90 We must admit that thetraining process is relatively time-consuming to achieve goodperformance however the proposed CNN classifier sharesthe same advantages (eg fast on testing) of deep learningalgorithms (see Table 5) Moreover our implementation ofCNN could be improved greatly on efficiency or we can useother CNN frameworks such as Caffe [31] to reduce training
8 Journal of Sensors
75
80
85
90
95
100
1 2 3 4 5 6 7 8
Accu
racy
()
Class number
CNNSVM
(a)
70
75
80
85
90
95
100
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
Accu
racy
()
Class number
CNNSVM
(b)
75
80
85
90
95
100
1 2 3 4 5 6 7 8 9
Accu
racy
()
Class number
CNNSVM
(c)
Figure 7 Classification accuracies of all the classes for experimental data sets From (a) to (c) Indian Pines Salinas and University of PaviaThe class number is corresponding to the first column in Tables 1 2 and 3
and test time According to our experiments it takes only 5minutes to achieve 90 accuracy on MNIST dataset [32] byusing Caffe compared to more than 120 minutes by using ourimplemented framework
Figure 9 illustrates the relationship between cost value(see (4)) and the training time for the University of Paviadata set The value of the loss function is reduced with anincreasing number of training iteration which demonstrates
Journal of Sensors 9
0
10
20
30
40
50
60
70
80
90
100
0 10 20 30 40 50 60 70
Ove
rall
accu
racy
()
Training time (min)
(a)
0
10
20
30
40
50
60
70
80
90
100
0 10 20 30 40 50 60 70 80 90
Ove
rall
accu
racy
()
Training time (min)
(b)
0
10
20
30
40
50
60
70
80
90
100
0 1 2 3 4 5 6 7 8 9
Ove
rall
accu
racy
()
Training time (min)
(c)
Figure 8 Classification accuracies versus the training time for experimental data sets From (a) to (c) Indian Pines Salinas and Universityof Pavia Note that the test time is also included in the training time
the convergence of our network with only 200 trainingsamples for each classMoreover the cost value is still reducedafter 5-minute training but the corresponding test accuracyis relatively stable (see Figure 8(a)) which indicates theoverfitting problem in this network
To further verify that the proposed classifier is suitablefor classifying data sets with limited training samples we alsocompare our CNN with RBF-SVM under different trainingsizes on the University of Pavia data set as shown in Figure 10It is obvious that our proposed CNN consistently provides
higher accuracy than SVM However although the conven-tional deep learning-based method [26] can outperform theSVM classifier it requires plenty of training samples forconstructing autoencoders
To demonstrate the relationship between classificationaccuracies and the visual differences of curve shapes (seeFigure 2) we present the detailed accuracies of our proposedCNN classifiers for the University of Pavia data set in Table 6In the table the cell in the 119894th row 119895th column means thepercentage of the 119894th class samples (according to ground
10 Journal of Sensors
Table 6 The detailed classification accuracies of all the classes for University of Pavia data set
Asphalt Meadows Gravel Trees Sheets Bare soil Bitumen Bricks ShadowsAsphalt 8734 026 232 000 019 037 625 325 002Meadows 000 9463 002 126 000 403 000 006 000Gravel 053 047 8647 000 000 000 005 1243 005Trees 000 267 000 9629 003 101 000 000 000Sheets 000 009 000 000 9965 026 000 000 000Bare soil 012 615 000 010 008 9323 000 031 000Bitumen 637 000 035 000 009 000 9319 000 000Bricks 190 020 1048 000 006 060 034 8642 000Shadows 000 000 000 000 000 000 000 000 10000
0
02
04
06
08
1
12
14
16
18
2
0 1 2 3 4 5 6 7 8 9
Cos
t val
ue
Training time (min)
Figure 9 Cost value versus the training time for University of Paviadata sets
truth) which is classified to the 119895th class For example 8734of class Asphalt samples are classified correctly but 625 ofclass Asphalt samples are wrongly classified to class BitumenThe percentages on diagonal line are just the classificationaccuracies of corresponding classes As for one class themore unique the corresponding curve shape is the higheraccuracy the proposed CNN classifier can achieve (checkthe class Shadow and class Sheets in Figure 2 and Table 6)The more similar two curves are the higher opportunitythey are wrongly classified to each other (check the classGravel and class Bricks in Figure 2 and Table 6) Further-more the excellent performance verifies that the proposedCNN classifier has discriminative capability to extract subtlevisual features which is even superior to human vision forclassifying complex curve shapes
Finally we also implement three other types of neuralnetwork architectures for the Indian Pines data set usingthe same training and test samples The first one is a simplearchitecture with only two fully connected layers beside theinput layerThe second one is LeNet-5 which is a classic CNNarchitecture with two convolutional layers The third one is
83
84
85
86
87
88
89
90
91
92
93
94
60 80 100 120 140 160 180 200
Ove
rall
accu
racy
()
Number of training samples of each class
SVMCNN
Figure 10 Classification accuracies versus numbers of trainingsamples (each class) for the University of Pavia data sets
a conventional deep neural networks (DNNs) with 3 hiddenfully connected layers (a 220-60-40-20-8 architecture as sug-gested in [26])The classification performance is summarizedin Table 5 From Table 5 we can see that our CNN classifierachieves the highest accuracy with competitive training andtesting computational cost LeNet-5 and DNNs cost moretime to train models due to their complex architecturebut limited training samples restrict their capabilities ofclassification (only 20 samples selected for testing in [26]compared with 95 in our experiment) Another reason forthe difficulty that deeper CNNs and DNNs face to achievehigher accuracies could be that the HSI lacks the type ofhigh frequency signal commonly seen in the computer visiondomain (see Figure 2)
5 Conclusion and Future Work
In this paper we proposed a novel CNN-based methodfor HSI classification inspired by our observation thatHSI classification can be implemented via human vision
Journal of Sensors 11
Compared with SVM-based classifier and conventionalDNN-based classifier the proposed method could achievehigher accuracy using all the experimental data sets evenwith a small number of training samples
Our work is an exploration of using CNNs for HSIclassification and has excellent performanceThe architectureof our proposed CNN classifier only contains one convo-lutional layer and one fully connected layer due to thesmall number of training samples In the future a networkarchitecture called a Siamese Network [33] might be usedwhich has been proved to be robust in the situation wherethe number of training samples per category is small Sometechniques such as Dropout [34] can also be used toalleviate the overfitting problem caused by limited trainingsamples Furthermore recent researches in deep learninghave indicated that unsupervised learning can be employedto train CNNs reducing the requirement of labeled samplessignificantly Deep learning especially deep CNNs shouldhave great potentiality for HSI classification in the futureMoreover in the current work we do not consider the spatialcorrelation and only concentrate on the spectral signaturesWe believe that some spatial-spectral techniques also can beapplied to further improve the CNN-based classification Atlast we plan to employ efficient deep CNN frameworks suchas Caffe to improve our computing performance
Conflict of Interests
The authors declare that there is no conflict of interestsregarding the publication of this paper
Acknowledgments
This work was supported jointly by the National NaturalScience Foundation of China (nos 61371165 and 61302164)the 973 Program of China (no 2011CB706900) the Programfor New Century Excellent Talents in University underGrant no NCET-11-0711 and the Interdisciplinary ResearchProject in Beijing University of Chemical Technology WeiHu and Fan Zhang are also supported by the Beijing HigherEducation Young Elite Teacher Project under Grant nosYETP0501 and YETP0500 respectively
References
[1] D Landgrebe ldquoHyperspectral image data analysisrdquo IEEE SignalProcessing Magazine vol 19 no 1 pp 17ndash28 2002
[2] G M Foody and A Mathur ldquoA relative evaluation of multiclassimage classification by support vectormachinesrdquo IEEE Transac-tions on Geoscience and Remote Sensing vol 42 no 6 pp 1335ndash1343 2004
[3] Y Tarabalka J A Benediktsson and J Chanussot ldquoSpectral-spatial classification of hyperspectral imagery based on parti-tional clustering techniquesrdquo IEEE Transactions on Geoscienceand Remote Sensing vol 47 no 8 pp 2973ndash2987 2009
[4] W Li S Prasad F James and B Lour ldquoLocality-preservingdimensionality reduction and classification for hyperspectralimage analysisrdquo IEEE Transactions on Geoscience and RemoteSensing vol 50 no 4 pp 1185ndash1198 2012
[5] F Melgani and L Bruzzone ldquoClassification of hyperspectralremote sensing images with support vector machinesrdquo IEEETransactions on Geoscience and Remote Sensing vol 42 no 8pp 1778ndash1790 2004
[6] J A Gualtieri and S Chettri ldquoSupport vector machines for clas-sification of hyperspectral datardquo inProceedings of the IEEE Inter-national Geoscience and Remote Sensing Symposium (IGARSSrsquo00) vol 2 pp 813ndash815 IEEE July 2000
[7] G Mountrakis J Im and C Ogole ldquoSupport vector machinesin remote sensing a reviewrdquo ISPRS Journal of Photogrammetryand Remote Sensing vol 66 no 3 pp 247ndash259 2011
[8] J Li J M Bioucas-Dias and A Plaza ldquoSpectral-spatial classifi-cation of hyperspectral data using loopy belief propagation andactive learningrdquo IEEE Transactions on Geoscience and RemoteSensing vol 51 no 2 pp 844ndash856 2013
[9] P M Atkinson and A R L Tatnall ldquoIntroduction neuralnetworks in remote sensingrdquo International Journal of RemoteSensing vol 18 no 4 pp 699ndash709 1997
[10] L Bruzzone and D F Prieto ldquoA technique for the selectionof kernel-function parameters in RBF neural networks forclassification of remote-sensing imagesrdquo IEEE Transactions onGeoscience andRemote Sensing vol 37 no 2 pp 1179ndash1184 1999
[11] F Ratle G Camps-Valls and JWeston ldquoSemisupervised neuralnetworks for efficient hyperspectral image classificationrdquo IEEETransactions on Geoscience and Remote Sensing vol 48 no 5pp 2271ndash2282 2010
[12] G E Hinton and R R Salakhutdinov ldquoReducing the dimen-sionality of data with neural networksrdquo Science vol 313 no5786 pp 504ndash507 2006
[13] K Fukushima ldquoNeocognitron a hierarchical neural networkcapable of visual pattern recognitionrdquo Neural Networks vol 1no 2 pp 119ndash130 1988
[14] Y LeCun L Bottou Y Bengio and P Haffner ldquoGradient-basedlearning applied to document recognitionrdquo Proceedings of theIEEE vol 86 no 11 pp 2278ndash2323 1998
[15] D C Ciresan U Meier J Masci L M Gambardella andJ Schmidhuber ldquoFlexible high performance convolutionalneural networks for image classificationrdquo in Proceedings of the22nd International Joint Conference on Artificial Intelligence(IJCAI rsquo11) vol 22 pp 1237ndash1242 July 2011
[16] P Y Simard D Steinkraus and J C Platt ldquoBest practicesfor convolutional neural networks applied to visual documentanalysisrdquo in Proceedings of the 7th International Conference onDocument Analysis and Recognition vol 2 pp 958ndash963 IEEEComputer Society Edinburgh UK August 2003
[17] P Sermanet and Y LeCun ldquoTraffic sign recognition with multi-scale convolutional networksrdquo in Proceedings of the Interna-tional Joint Conference onNeuralNetwork (IJCNN rsquo11) pp 2809ndash2813 IEEE August 2011
[18] A Krizhevsky I Sutskever andG EHinton ldquoImagenet classifi-cation with deep convolutional neural networksrdquo in Proceedingsof the Advances in Neural Information Processing Systems 25(NIPS 12) pp 1097ndash1105 2012
[19] D Ciregan U Meier and J Schmidhuber ldquoMulti-column deepneural networks for image classificationrdquo in Proceedings of theIEEE Conference on Computer Vision and Pattern Recognition(CVPR rsquo12) pp 3642ndash3649 IEEE June 2012
[20] R Girshick J Donahue T Darrell and J Malik ldquoRich featurehierarchies for accurate object detection and semantic segmen-tationrdquo in Proceedings of the IEEE Conference on ComputerVision and Pattern Recognition (CVPR rsquo14) pp 580ndash587 IEEEJune 2014
12 Journal of Sensors
[21] C Farabet C Couprie L Najman and Y LeCun ldquoLearninghierarchical features for scene labelingrdquo IEEE Transactions onPattern Analysis and Machine Intelligence vol 35 no 8 pp1915ndash1929 2013
[22] P Sermanet S Chintala and Y LeCun ldquoConvolutional neuralnetworks applied to house numbers digit classificationrdquo inProceedings of the 21st International Conference on PatternRecognition (ICPR rsquo12) pp 3288ndash3291 IEEE November 2012
[23] Y TaigmanM YangM Ranzato and LWolf ldquoDeepFace clos-ing the gap to human-level performance in face verificationrdquoin Proceedings of the IEEE Conference on Computer Vision andPattern Recognition (CVPR rsquo14) pp 1701ndash1708 Columbus OhioUSA June 2014
[24] T N Sainath A-R Mohamed B Kingsbury and B Ram-abhadran ldquoDeep convolutional neural networks for LVCSRrdquoin Proceedings of the 38th IEEE International Conference onAcoustics Speech and Signal Processing (ICASSP rsquo13) pp 8614ndash8618 IEEE Vancouver Canada May 2013
[25] O Abdel-Hamid A-R Mohamed H Jiang and G PennldquoApplying convolutional neural networks concepts to hybridNN-HMMmodel for speech recognitionrdquo in Proceedings of theIEEE International Conference on Acoustics Speech and SignalProcessing (ICASSP rsquo12) pp 4277ndash4280 IEEE March 2012
[26] Y Chen Z Lin X Zhao G Wang and Y Gu ldquoDeep learning-based classification of hyperspectral datardquo IEEE Journal ofSelected Topics in Applied Earth Observations and RemoteSensing vol 7 no 6 pp 2094ndash2107 2014
[27] I Sutskever and G E Hinton ldquoDeep narrow sigmoid beliefnetworks are universal approximatorsrdquo Neural Computationvol 20 no 11 pp 2629ndash2636 2008
[28] D H Hubel and T N Wiesel ldquoReceptive fields and functionalarchitecture ofmonkey striate cortexrdquoThe Journal of Physiologyvol 195 no 1 pp 215ndash243 1968
[29] Y A LeCun L Bottou G B Orr and K-R Muller ldquoEfficientbackproprdquo in Neural Networks Tricks of the Trade pp 9ndash48Springer Berlin Germany 2012
[30] J Bergstra F Bastien O Breuleux et al ldquoTheano deep learningon GPUs with pythonrdquo in Proceedings of the NIPS 2011 BigLearning Workshop pp 712ndash721 Granada Spain December2011
[31] Y Jia E Shelhamer J Donahue et al ldquoCaffe convolutionalarchitecture for fast feature embeddingrdquo in Proceedings of theACM International Conference on Multimedia pp 675ndash678ACM Orlando Fla USA November 2014
[32] Y LeCun C Cortes and C J Burges ldquoThe MNIST database ofhandwritten digitsrdquo 1998 httpyannlecuncomexdbmnist
[33] S Chopra R Hadsell and Y LeCun ldquoLearning a similaritymetric discriminatively with application to face verificationrdquoin Proceedings of the IEEE Computer Society Conference onComputer Vision and Pattern Recognition (CVPR rsquo05) vol 1 pp539ndash546 IEEE June 2005
[34] N Srivastava G Hinton A Krizhevsky I Sutskever and RSalakhutdinov ldquoDropout a simple way to prevent neural net-works from overfittingrdquo The Journal of Machine LearningResearch vol 15 no 1 pp 1929ndash1958 2014
International Journal of
AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Active and Passive Electronic Components
Control Scienceand Engineering
Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
RotatingMachinery
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation httpwwwhindawicom
Journal ofEngineeringVolume 2014
Submit your manuscripts athttpwwwhindawicom
VLSI Design
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Shock and Vibration
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Civil EngineeringAdvances in
Acoustics and VibrationAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Advances inOptoElectronics
Hindawi Publishing Corporation httpwwwhindawicom
Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
SensorsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Chemical EngineeringInternational Journal of Antennas and
Propagation
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Navigation and Observation
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
DistributedSensor Networks
International Journal of
6 Journal of Sensors
Table 1 Number of training and test samples used in the IndianPines data set
Number Class Training Test1 Corn-notill 200 12282 Corn-mintill 200 6303 Grass-pasture 200 2834 Hay-windrowed 200 2785 Soybean-notill 200 7726 Soybean-mintill 200 22557 Soybean-clean 200 3938 Woods 200 1065
Total 1600 6904
equipped with an Intel Core i7 with 28GHz and NvidiaGeForce GTX 465 graphics card
41TheData Sets Threehyperspectral data including IndianPines Salinas and University of Pavia scenes are employedto evaluate the effectiveness of the proposed method Forall the data we randomly select 200 labeled pixels per classfor training and all other pixels in the ground truth map fortest Development data are derived from the available trainingdata by further dividing them into training and testingsamples for tuning the parameters of the proposed CNNclassifier Furthermore each pixel is scaled to [minus10 +10]
uniformlyThe Indian Pines data set was gathered by Airborne
VisibleInfrared Imaging Spectrometer (AVIRIS) sensor innorthwestern Indiana There are 220 spectral channels in the04 to 245 120583m region of the visible and infrared spectrumwith a spatial resolution of 20m From the statistical view-point we discard some classes which only have few labeledsamples and select 8 classes for which the numbers of trainingand testing samples are listed in Table 1The layer parametersof this data set in the proposed CNN classifier are set asfollows 119899
1= 220 119896
1= 24 119899
2= 197 119896
2= 5 119899
3= 40 119899
4=
100 and 1198995= 8 and the number of total trainable parameters
in the data set is 81408The second data employed was also collected by the
AVIRIS sensor capturing an area over Salinas ValleyCalifornia with a spatial resolution of 37m The imagecomprises 512times217 pixels with 220 bands It mainly containsvegetables bare soils and vineyard fields (httpwwwehuesccwintcoindexphpHyperspectral Remote Sensing Scenes)There are also 16 different classes and the numbers of trainingand testing samples are listed in Table 2The layer parametersof this data set in our CNN are set to be 119899
1= 224 119896
1= 24
1198992= 201 119896
2= 5 119899
3= 40 119899
4= 100 and 119899
5= 16 and the
number of total trainable parameters in the data set is 82216The University of Pavia data set was collected by the
Reflective Optics System Imaging Spectrometer (ROSIS)sensor The image scene with a spatial coverage of 610 times 340
pixels covering the city of Pavia Italy was collected underthe HySens project managed by DLR (the German AerospaceAgency) The data set has 103 spectral bands prior to water
Table 2 Number of training and test samples used in the Salinasdata set
Number Class Training Test1 Broccoli green weeds 1 200 18092 Broccoli green weeds 2 200 35263 Fallow 200 17764 Fallow rough plow 200 11945 Fallow smooth 200 24786 Stubble 200 37597 Celery 200 33798 Grapes untrained 200 110719 Soil vineyard develop 200 600310 Corn senesced green weeds 200 307811 Lettuce romaine 4 wk 200 86812 Lettuce romaine 5 wk 200 172713 Lettuce romaine 6 wk 200 71614 Lettuce romaine 7 wk 200 87015 Vineyard untrained 200 706816 Vineyard vertical trellis 200 1607
Total 3200 50929
Table 3 Number of training and test samples used in University ofPavia data set
Number Class Training Test1 Asphalt 200 64312 Meadows 200 184493 Gravel 200 18994 Trees 200 28645 Sheets 200 11456 Bare soil 200 48297 Bitumen 200 11308 Bricks 200 34829 Shadows 200 747
Total 1800 40976
band removal It has a spectral coverage from 043 to 086 120583mand a spatial resolution of 13m Approximately 42776 labeledpixels with 9 classes are from the ground truth map and thenumbers of training and testing samples are shown in Table 3The layer parameters of this data set in our CNN are set to be1198991= 103 119896
1= 11 119899
2= 93 119896
2= 3 119899
3= 30 119899
4= 100 and
1198995= 9 and the number of total trainable parameters in the
data set is 61249
42 Results and Comparisons Table 4 provides the com-parison of classification performance between the pro-posed method and the traditional SVM classifier SVMwith RBF kernel is implemented using the libsvm package(httpwwwcsientuedutwsimcjlinlibsvm) cross validationis also employed to determine the related parameters andall optimal ones are used in following experiments It isobvious that our proposed method has better performance
Journal of Sensors 7
Corn-notill Corn-mintill Grass-pastureSoybean-notill Soybean-mintill Soybean-clean Woods
Hay-windrowed
Figure 4 RGB composition maps resulting from classification for the Indian Pines data set From left to right ground truth RBF-SVM andthe proposed method
FallowFallow rough plowFallow smoothStubbleCeleryGrapes untrainedSoil vineyard developCorn senesced green weeds
Vineyard untrainedVineyard vertical trellis
Broccoli green weeds 1Broccoli green weeds 2
Lettuce romaine 4wkLettuce romaine 5wkLettuce romaine 6wkLettuce romaine 7wk
Figure 5 RGB composition maps resulting from classification for the Salinas data set From left to right ground truth RBF-SVM and theproposed method
Asphalt Meadows Gravel Trees SheetsBare soil Bitumen Bricks Shadows
Figure 6Thematicmaps resulting from classification forUniversityof Pavia data set From left to right ground truth RBF-SVM and theproposed method
(approximate 2 gain) than SVM classifier using all the threedata sets Figures 4 5 and 6 illustrate the correspondingclassification maps obtained with our proposed methodand RBF-SVM classifier Furthermore compared with RBF-SVM the proposed CNN classifier has higher classificationaccuracy not only for the overall data set but also for almostall the specific classes as shown in Figure 7
Figure 8 further illustrates the relationship between clas-sification accuracies and the training time (the test time is
Table 4 Comparison of results between the proposed CNN andRBF-SVM using three data sets
Data set The proposed CNN RBF-SVMIndian Pines 9016 8760Salinas 9260 9166University of Pavia 9256 9052
Table 5 Results of comparison with different neural networks onthe Indian Pines data set
Method Training time Testing time AccuracyTwo-layer NN 2800 s 165 s 8649DNN 6500 s 321 s 8793LeNet-5 5100 s 234 s 8827Our CNN 4300 s 198 s 9016
also included) for three experimental data sets With theincreasing of training time the classification accuracy ofeach data can reach over 90 We must admit that thetraining process is relatively time-consuming to achieve goodperformance however the proposed CNN classifier sharesthe same advantages (eg fast on testing) of deep learningalgorithms (see Table 5) Moreover our implementation ofCNN could be improved greatly on efficiency or we can useother CNN frameworks such as Caffe [31] to reduce training
8 Journal of Sensors
75
80
85
90
95
100
1 2 3 4 5 6 7 8
Accu
racy
()
Class number
CNNSVM
(a)
70
75
80
85
90
95
100
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
Accu
racy
()
Class number
CNNSVM
(b)
75
80
85
90
95
100
1 2 3 4 5 6 7 8 9
Accu
racy
()
Class number
CNNSVM
(c)
Figure 7 Classification accuracies of all the classes for experimental data sets From (a) to (c) Indian Pines Salinas and University of PaviaThe class number is corresponding to the first column in Tables 1 2 and 3
and test time According to our experiments it takes only 5minutes to achieve 90 accuracy on MNIST dataset [32] byusing Caffe compared to more than 120 minutes by using ourimplemented framework
Figure 9 illustrates the relationship between cost value(see (4)) and the training time for the University of Paviadata set The value of the loss function is reduced with anincreasing number of training iteration which demonstrates
Journal of Sensors 9
0
10
20
30
40
50
60
70
80
90
100
0 10 20 30 40 50 60 70
Ove
rall
accu
racy
()
Training time (min)
(a)
0
10
20
30
40
50
60
70
80
90
100
0 10 20 30 40 50 60 70 80 90
Ove
rall
accu
racy
()
Training time (min)
(b)
0
10
20
30
40
50
60
70
80
90
100
0 1 2 3 4 5 6 7 8 9
Ove
rall
accu
racy
()
Training time (min)
(c)
Figure 8 Classification accuracies versus the training time for experimental data sets From (a) to (c) Indian Pines Salinas and Universityof Pavia Note that the test time is also included in the training time
the convergence of our network with only 200 trainingsamples for each classMoreover the cost value is still reducedafter 5-minute training but the corresponding test accuracyis relatively stable (see Figure 8(a)) which indicates theoverfitting problem in this network
To further verify that the proposed classifier is suitablefor classifying data sets with limited training samples we alsocompare our CNN with RBF-SVM under different trainingsizes on the University of Pavia data set as shown in Figure 10It is obvious that our proposed CNN consistently provides
higher accuracy than SVM However although the conven-tional deep learning-based method [26] can outperform theSVM classifier it requires plenty of training samples forconstructing autoencoders
To demonstrate the relationship between classificationaccuracies and the visual differences of curve shapes (seeFigure 2) we present the detailed accuracies of our proposedCNN classifiers for the University of Pavia data set in Table 6In the table the cell in the 119894th row 119895th column means thepercentage of the 119894th class samples (according to ground
10 Journal of Sensors
Table 6 The detailed classification accuracies of all the classes for University of Pavia data set
Asphalt Meadows Gravel Trees Sheets Bare soil Bitumen Bricks ShadowsAsphalt 8734 026 232 000 019 037 625 325 002Meadows 000 9463 002 126 000 403 000 006 000Gravel 053 047 8647 000 000 000 005 1243 005Trees 000 267 000 9629 003 101 000 000 000Sheets 000 009 000 000 9965 026 000 000 000Bare soil 012 615 000 010 008 9323 000 031 000Bitumen 637 000 035 000 009 000 9319 000 000Bricks 190 020 1048 000 006 060 034 8642 000Shadows 000 000 000 000 000 000 000 000 10000
0
02
04
06
08
1
12
14
16
18
2
0 1 2 3 4 5 6 7 8 9
Cos
t val
ue
Training time (min)
Figure 9 Cost value versus the training time for University of Paviadata sets
truth) which is classified to the 119895th class For example 8734of class Asphalt samples are classified correctly but 625 ofclass Asphalt samples are wrongly classified to class BitumenThe percentages on diagonal line are just the classificationaccuracies of corresponding classes As for one class themore unique the corresponding curve shape is the higheraccuracy the proposed CNN classifier can achieve (checkthe class Shadow and class Sheets in Figure 2 and Table 6)The more similar two curves are the higher opportunitythey are wrongly classified to each other (check the classGravel and class Bricks in Figure 2 and Table 6) Further-more the excellent performance verifies that the proposedCNN classifier has discriminative capability to extract subtlevisual features which is even superior to human vision forclassifying complex curve shapes
Finally we also implement three other types of neuralnetwork architectures for the Indian Pines data set usingthe same training and test samples The first one is a simplearchitecture with only two fully connected layers beside theinput layerThe second one is LeNet-5 which is a classic CNNarchitecture with two convolutional layers The third one is
83
84
85
86
87
88
89
90
91
92
93
94
60 80 100 120 140 160 180 200
Ove
rall
accu
racy
()
Number of training samples of each class
SVMCNN
Figure 10 Classification accuracies versus numbers of trainingsamples (each class) for the University of Pavia data sets
a conventional deep neural networks (DNNs) with 3 hiddenfully connected layers (a 220-60-40-20-8 architecture as sug-gested in [26])The classification performance is summarizedin Table 5 From Table 5 we can see that our CNN classifierachieves the highest accuracy with competitive training andtesting computational cost LeNet-5 and DNNs cost moretime to train models due to their complex architecturebut limited training samples restrict their capabilities ofclassification (only 20 samples selected for testing in [26]compared with 95 in our experiment) Another reason forthe difficulty that deeper CNNs and DNNs face to achievehigher accuracies could be that the HSI lacks the type ofhigh frequency signal commonly seen in the computer visiondomain (see Figure 2)
5 Conclusion and Future Work
In this paper we proposed a novel CNN-based methodfor HSI classification inspired by our observation thatHSI classification can be implemented via human vision
Journal of Sensors 11
Compared with SVM-based classifier and conventionalDNN-based classifier the proposed method could achievehigher accuracy using all the experimental data sets evenwith a small number of training samples
Our work is an exploration of using CNNs for HSIclassification and has excellent performanceThe architectureof our proposed CNN classifier only contains one convo-lutional layer and one fully connected layer due to thesmall number of training samples In the future a networkarchitecture called a Siamese Network [33] might be usedwhich has been proved to be robust in the situation wherethe number of training samples per category is small Sometechniques such as Dropout [34] can also be used toalleviate the overfitting problem caused by limited trainingsamples Furthermore recent researches in deep learninghave indicated that unsupervised learning can be employedto train CNNs reducing the requirement of labeled samplessignificantly Deep learning especially deep CNNs shouldhave great potentiality for HSI classification in the futureMoreover in the current work we do not consider the spatialcorrelation and only concentrate on the spectral signaturesWe believe that some spatial-spectral techniques also can beapplied to further improve the CNN-based classification Atlast we plan to employ efficient deep CNN frameworks suchas Caffe to improve our computing performance
Conflict of Interests
The authors declare that there is no conflict of interestsregarding the publication of this paper
Acknowledgments
This work was supported jointly by the National NaturalScience Foundation of China (nos 61371165 and 61302164)the 973 Program of China (no 2011CB706900) the Programfor New Century Excellent Talents in University underGrant no NCET-11-0711 and the Interdisciplinary ResearchProject in Beijing University of Chemical Technology WeiHu and Fan Zhang are also supported by the Beijing HigherEducation Young Elite Teacher Project under Grant nosYETP0501 and YETP0500 respectively
References
[1] D Landgrebe ldquoHyperspectral image data analysisrdquo IEEE SignalProcessing Magazine vol 19 no 1 pp 17ndash28 2002
[2] G M Foody and A Mathur ldquoA relative evaluation of multiclassimage classification by support vectormachinesrdquo IEEE Transac-tions on Geoscience and Remote Sensing vol 42 no 6 pp 1335ndash1343 2004
[3] Y Tarabalka J A Benediktsson and J Chanussot ldquoSpectral-spatial classification of hyperspectral imagery based on parti-tional clustering techniquesrdquo IEEE Transactions on Geoscienceand Remote Sensing vol 47 no 8 pp 2973ndash2987 2009
[4] W Li S Prasad F James and B Lour ldquoLocality-preservingdimensionality reduction and classification for hyperspectralimage analysisrdquo IEEE Transactions on Geoscience and RemoteSensing vol 50 no 4 pp 1185ndash1198 2012
[5] F Melgani and L Bruzzone ldquoClassification of hyperspectralremote sensing images with support vector machinesrdquo IEEETransactions on Geoscience and Remote Sensing vol 42 no 8pp 1778ndash1790 2004
[6] J A Gualtieri and S Chettri ldquoSupport vector machines for clas-sification of hyperspectral datardquo inProceedings of the IEEE Inter-national Geoscience and Remote Sensing Symposium (IGARSSrsquo00) vol 2 pp 813ndash815 IEEE July 2000
[7] G Mountrakis J Im and C Ogole ldquoSupport vector machinesin remote sensing a reviewrdquo ISPRS Journal of Photogrammetryand Remote Sensing vol 66 no 3 pp 247ndash259 2011
[8] J Li J M Bioucas-Dias and A Plaza ldquoSpectral-spatial classifi-cation of hyperspectral data using loopy belief propagation andactive learningrdquo IEEE Transactions on Geoscience and RemoteSensing vol 51 no 2 pp 844ndash856 2013
[9] P M Atkinson and A R L Tatnall ldquoIntroduction neuralnetworks in remote sensingrdquo International Journal of RemoteSensing vol 18 no 4 pp 699ndash709 1997
[10] L Bruzzone and D F Prieto ldquoA technique for the selectionof kernel-function parameters in RBF neural networks forclassification of remote-sensing imagesrdquo IEEE Transactions onGeoscience andRemote Sensing vol 37 no 2 pp 1179ndash1184 1999
[11] F Ratle G Camps-Valls and JWeston ldquoSemisupervised neuralnetworks for efficient hyperspectral image classificationrdquo IEEETransactions on Geoscience and Remote Sensing vol 48 no 5pp 2271ndash2282 2010
[12] G E Hinton and R R Salakhutdinov ldquoReducing the dimen-sionality of data with neural networksrdquo Science vol 313 no5786 pp 504ndash507 2006
[13] K Fukushima ldquoNeocognitron a hierarchical neural networkcapable of visual pattern recognitionrdquo Neural Networks vol 1no 2 pp 119ndash130 1988
[14] Y LeCun L Bottou Y Bengio and P Haffner ldquoGradient-basedlearning applied to document recognitionrdquo Proceedings of theIEEE vol 86 no 11 pp 2278ndash2323 1998
[15] D C Ciresan U Meier J Masci L M Gambardella andJ Schmidhuber ldquoFlexible high performance convolutionalneural networks for image classificationrdquo in Proceedings of the22nd International Joint Conference on Artificial Intelligence(IJCAI rsquo11) vol 22 pp 1237ndash1242 July 2011
[16] P Y Simard D Steinkraus and J C Platt ldquoBest practicesfor convolutional neural networks applied to visual documentanalysisrdquo in Proceedings of the 7th International Conference onDocument Analysis and Recognition vol 2 pp 958ndash963 IEEEComputer Society Edinburgh UK August 2003
[17] P Sermanet and Y LeCun ldquoTraffic sign recognition with multi-scale convolutional networksrdquo in Proceedings of the Interna-tional Joint Conference onNeuralNetwork (IJCNN rsquo11) pp 2809ndash2813 IEEE August 2011
[18] A Krizhevsky I Sutskever andG EHinton ldquoImagenet classifi-cation with deep convolutional neural networksrdquo in Proceedingsof the Advances in Neural Information Processing Systems 25(NIPS 12) pp 1097ndash1105 2012
[19] D Ciregan U Meier and J Schmidhuber ldquoMulti-column deepneural networks for image classificationrdquo in Proceedings of theIEEE Conference on Computer Vision and Pattern Recognition(CVPR rsquo12) pp 3642ndash3649 IEEE June 2012
[20] R Girshick J Donahue T Darrell and J Malik ldquoRich featurehierarchies for accurate object detection and semantic segmen-tationrdquo in Proceedings of the IEEE Conference on ComputerVision and Pattern Recognition (CVPR rsquo14) pp 580ndash587 IEEEJune 2014
12 Journal of Sensors
[21] C Farabet C Couprie L Najman and Y LeCun ldquoLearninghierarchical features for scene labelingrdquo IEEE Transactions onPattern Analysis and Machine Intelligence vol 35 no 8 pp1915ndash1929 2013
[22] P Sermanet S Chintala and Y LeCun ldquoConvolutional neuralnetworks applied to house numbers digit classificationrdquo inProceedings of the 21st International Conference on PatternRecognition (ICPR rsquo12) pp 3288ndash3291 IEEE November 2012
[23] Y TaigmanM YangM Ranzato and LWolf ldquoDeepFace clos-ing the gap to human-level performance in face verificationrdquoin Proceedings of the IEEE Conference on Computer Vision andPattern Recognition (CVPR rsquo14) pp 1701ndash1708 Columbus OhioUSA June 2014
[24] T N Sainath A-R Mohamed B Kingsbury and B Ram-abhadran ldquoDeep convolutional neural networks for LVCSRrdquoin Proceedings of the 38th IEEE International Conference onAcoustics Speech and Signal Processing (ICASSP rsquo13) pp 8614ndash8618 IEEE Vancouver Canada May 2013
[25] O Abdel-Hamid A-R Mohamed H Jiang and G PennldquoApplying convolutional neural networks concepts to hybridNN-HMMmodel for speech recognitionrdquo in Proceedings of theIEEE International Conference on Acoustics Speech and SignalProcessing (ICASSP rsquo12) pp 4277ndash4280 IEEE March 2012
[26] Y Chen Z Lin X Zhao G Wang and Y Gu ldquoDeep learning-based classification of hyperspectral datardquo IEEE Journal ofSelected Topics in Applied Earth Observations and RemoteSensing vol 7 no 6 pp 2094ndash2107 2014
[27] I Sutskever and G E Hinton ldquoDeep narrow sigmoid beliefnetworks are universal approximatorsrdquo Neural Computationvol 20 no 11 pp 2629ndash2636 2008
[28] D H Hubel and T N Wiesel ldquoReceptive fields and functionalarchitecture ofmonkey striate cortexrdquoThe Journal of Physiologyvol 195 no 1 pp 215ndash243 1968
[29] Y A LeCun L Bottou G B Orr and K-R Muller ldquoEfficientbackproprdquo in Neural Networks Tricks of the Trade pp 9ndash48Springer Berlin Germany 2012
[30] J Bergstra F Bastien O Breuleux et al ldquoTheano deep learningon GPUs with pythonrdquo in Proceedings of the NIPS 2011 BigLearning Workshop pp 712ndash721 Granada Spain December2011
[31] Y Jia E Shelhamer J Donahue et al ldquoCaffe convolutionalarchitecture for fast feature embeddingrdquo in Proceedings of theACM International Conference on Multimedia pp 675ndash678ACM Orlando Fla USA November 2014
[32] Y LeCun C Cortes and C J Burges ldquoThe MNIST database ofhandwritten digitsrdquo 1998 httpyannlecuncomexdbmnist
[33] S Chopra R Hadsell and Y LeCun ldquoLearning a similaritymetric discriminatively with application to face verificationrdquoin Proceedings of the IEEE Computer Society Conference onComputer Vision and Pattern Recognition (CVPR rsquo05) vol 1 pp539ndash546 IEEE June 2005
[34] N Srivastava G Hinton A Krizhevsky I Sutskever and RSalakhutdinov ldquoDropout a simple way to prevent neural net-works from overfittingrdquo The Journal of Machine LearningResearch vol 15 no 1 pp 1929ndash1958 2014
International Journal of
AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Active and Passive Electronic Components
Control Scienceand Engineering
Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
RotatingMachinery
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation httpwwwhindawicom
Journal ofEngineeringVolume 2014
Submit your manuscripts athttpwwwhindawicom
VLSI Design
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Shock and Vibration
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Civil EngineeringAdvances in
Acoustics and VibrationAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Advances inOptoElectronics
Hindawi Publishing Corporation httpwwwhindawicom
Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
SensorsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Chemical EngineeringInternational Journal of Antennas and
Propagation
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Navigation and Observation
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
DistributedSensor Networks
International Journal of
Journal of Sensors 7
Corn-notill Corn-mintill Grass-pastureSoybean-notill Soybean-mintill Soybean-clean Woods
Hay-windrowed
Figure 4 RGB composition maps resulting from classification for the Indian Pines data set From left to right ground truth RBF-SVM andthe proposed method
FallowFallow rough plowFallow smoothStubbleCeleryGrapes untrainedSoil vineyard developCorn senesced green weeds
Vineyard untrainedVineyard vertical trellis
Broccoli green weeds 1Broccoli green weeds 2
Lettuce romaine 4wkLettuce romaine 5wkLettuce romaine 6wkLettuce romaine 7wk
Figure 5 RGB composition maps resulting from classification for the Salinas data set From left to right ground truth RBF-SVM and theproposed method
Asphalt Meadows Gravel Trees SheetsBare soil Bitumen Bricks Shadows
Figure 6Thematicmaps resulting from classification forUniversityof Pavia data set From left to right ground truth RBF-SVM and theproposed method
(approximate 2 gain) than SVM classifier using all the threedata sets Figures 4 5 and 6 illustrate the correspondingclassification maps obtained with our proposed methodand RBF-SVM classifier Furthermore compared with RBF-SVM the proposed CNN classifier has higher classificationaccuracy not only for the overall data set but also for almostall the specific classes as shown in Figure 7
Figure 8 further illustrates the relationship between clas-sification accuracies and the training time (the test time is
Table 4 Comparison of results between the proposed CNN andRBF-SVM using three data sets
Data set The proposed CNN RBF-SVMIndian Pines 9016 8760Salinas 9260 9166University of Pavia 9256 9052
Table 5 Results of comparison with different neural networks onthe Indian Pines data set
Method Training time Testing time AccuracyTwo-layer NN 2800 s 165 s 8649DNN 6500 s 321 s 8793LeNet-5 5100 s 234 s 8827Our CNN 4300 s 198 s 9016
also included) for three experimental data sets With theincreasing of training time the classification accuracy ofeach data can reach over 90 We must admit that thetraining process is relatively time-consuming to achieve goodperformance however the proposed CNN classifier sharesthe same advantages (eg fast on testing) of deep learningalgorithms (see Table 5) Moreover our implementation ofCNN could be improved greatly on efficiency or we can useother CNN frameworks such as Caffe [31] to reduce training
8 Journal of Sensors
75
80
85
90
95
100
1 2 3 4 5 6 7 8
Accu
racy
()
Class number
CNNSVM
(a)
70
75
80
85
90
95
100
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
Accu
racy
()
Class number
CNNSVM
(b)
75
80
85
90
95
100
1 2 3 4 5 6 7 8 9
Accu
racy
()
Class number
CNNSVM
(c)
Figure 7 Classification accuracies of all the classes for experimental data sets From (a) to (c) Indian Pines Salinas and University of PaviaThe class number is corresponding to the first column in Tables 1 2 and 3
and test time According to our experiments it takes only 5minutes to achieve 90 accuracy on MNIST dataset [32] byusing Caffe compared to more than 120 minutes by using ourimplemented framework
Figure 9 illustrates the relationship between cost value(see (4)) and the training time for the University of Paviadata set The value of the loss function is reduced with anincreasing number of training iteration which demonstrates
Journal of Sensors 9
0
10
20
30
40
50
60
70
80
90
100
0 10 20 30 40 50 60 70
Ove
rall
accu
racy
()
Training time (min)
(a)
0
10
20
30
40
50
60
70
80
90
100
0 10 20 30 40 50 60 70 80 90
Ove
rall
accu
racy
()
Training time (min)
(b)
0
10
20
30
40
50
60
70
80
90
100
0 1 2 3 4 5 6 7 8 9
Ove
rall
accu
racy
()
Training time (min)
(c)
Figure 8 Classification accuracies versus the training time for experimental data sets From (a) to (c) Indian Pines Salinas and Universityof Pavia Note that the test time is also included in the training time
the convergence of our network with only 200 trainingsamples for each classMoreover the cost value is still reducedafter 5-minute training but the corresponding test accuracyis relatively stable (see Figure 8(a)) which indicates theoverfitting problem in this network
To further verify that the proposed classifier is suitablefor classifying data sets with limited training samples we alsocompare our CNN with RBF-SVM under different trainingsizes on the University of Pavia data set as shown in Figure 10It is obvious that our proposed CNN consistently provides
higher accuracy than SVM However although the conven-tional deep learning-based method [26] can outperform theSVM classifier it requires plenty of training samples forconstructing autoencoders
To demonstrate the relationship between classificationaccuracies and the visual differences of curve shapes (seeFigure 2) we present the detailed accuracies of our proposedCNN classifiers for the University of Pavia data set in Table 6In the table the cell in the 119894th row 119895th column means thepercentage of the 119894th class samples (according to ground
10 Journal of Sensors
Table 6 The detailed classification accuracies of all the classes for University of Pavia data set
Asphalt Meadows Gravel Trees Sheets Bare soil Bitumen Bricks ShadowsAsphalt 8734 026 232 000 019 037 625 325 002Meadows 000 9463 002 126 000 403 000 006 000Gravel 053 047 8647 000 000 000 005 1243 005Trees 000 267 000 9629 003 101 000 000 000Sheets 000 009 000 000 9965 026 000 000 000Bare soil 012 615 000 010 008 9323 000 031 000Bitumen 637 000 035 000 009 000 9319 000 000Bricks 190 020 1048 000 006 060 034 8642 000Shadows 000 000 000 000 000 000 000 000 10000
0
02
04
06
08
1
12
14
16
18
2
0 1 2 3 4 5 6 7 8 9
Cos
t val
ue
Training time (min)
Figure 9 Cost value versus the training time for University of Paviadata sets
truth) which is classified to the 119895th class For example 8734of class Asphalt samples are classified correctly but 625 ofclass Asphalt samples are wrongly classified to class BitumenThe percentages on diagonal line are just the classificationaccuracies of corresponding classes As for one class themore unique the corresponding curve shape is the higheraccuracy the proposed CNN classifier can achieve (checkthe class Shadow and class Sheets in Figure 2 and Table 6)The more similar two curves are the higher opportunitythey are wrongly classified to each other (check the classGravel and class Bricks in Figure 2 and Table 6) Further-more the excellent performance verifies that the proposedCNN classifier has discriminative capability to extract subtlevisual features which is even superior to human vision forclassifying complex curve shapes
Finally we also implement three other types of neuralnetwork architectures for the Indian Pines data set usingthe same training and test samples The first one is a simplearchitecture with only two fully connected layers beside theinput layerThe second one is LeNet-5 which is a classic CNNarchitecture with two convolutional layers The third one is
83
84
85
86
87
88
89
90
91
92
93
94
60 80 100 120 140 160 180 200
Ove
rall
accu
racy
()
Number of training samples of each class
SVMCNN
Figure 10 Classification accuracies versus numbers of trainingsamples (each class) for the University of Pavia data sets
a conventional deep neural networks (DNNs) with 3 hiddenfully connected layers (a 220-60-40-20-8 architecture as sug-gested in [26])The classification performance is summarizedin Table 5 From Table 5 we can see that our CNN classifierachieves the highest accuracy with competitive training andtesting computational cost LeNet-5 and DNNs cost moretime to train models due to their complex architecturebut limited training samples restrict their capabilities ofclassification (only 20 samples selected for testing in [26]compared with 95 in our experiment) Another reason forthe difficulty that deeper CNNs and DNNs face to achievehigher accuracies could be that the HSI lacks the type ofhigh frequency signal commonly seen in the computer visiondomain (see Figure 2)
5 Conclusion and Future Work
In this paper we proposed a novel CNN-based methodfor HSI classification inspired by our observation thatHSI classification can be implemented via human vision
Journal of Sensors 11
Compared with SVM-based classifier and conventionalDNN-based classifier the proposed method could achievehigher accuracy using all the experimental data sets evenwith a small number of training samples
Our work is an exploration of using CNNs for HSIclassification and has excellent performanceThe architectureof our proposed CNN classifier only contains one convo-lutional layer and one fully connected layer due to thesmall number of training samples In the future a networkarchitecture called a Siamese Network [33] might be usedwhich has been proved to be robust in the situation wherethe number of training samples per category is small Sometechniques such as Dropout [34] can also be used toalleviate the overfitting problem caused by limited trainingsamples Furthermore recent researches in deep learninghave indicated that unsupervised learning can be employedto train CNNs reducing the requirement of labeled samplessignificantly Deep learning especially deep CNNs shouldhave great potentiality for HSI classification in the futureMoreover in the current work we do not consider the spatialcorrelation and only concentrate on the spectral signaturesWe believe that some spatial-spectral techniques also can beapplied to further improve the CNN-based classification Atlast we plan to employ efficient deep CNN frameworks suchas Caffe to improve our computing performance
Conflict of Interests
The authors declare that there is no conflict of interestsregarding the publication of this paper
Acknowledgments
This work was supported jointly by the National NaturalScience Foundation of China (nos 61371165 and 61302164)the 973 Program of China (no 2011CB706900) the Programfor New Century Excellent Talents in University underGrant no NCET-11-0711 and the Interdisciplinary ResearchProject in Beijing University of Chemical Technology WeiHu and Fan Zhang are also supported by the Beijing HigherEducation Young Elite Teacher Project under Grant nosYETP0501 and YETP0500 respectively
References
[1] D Landgrebe ldquoHyperspectral image data analysisrdquo IEEE SignalProcessing Magazine vol 19 no 1 pp 17ndash28 2002
[2] G M Foody and A Mathur ldquoA relative evaluation of multiclassimage classification by support vectormachinesrdquo IEEE Transac-tions on Geoscience and Remote Sensing vol 42 no 6 pp 1335ndash1343 2004
[3] Y Tarabalka J A Benediktsson and J Chanussot ldquoSpectral-spatial classification of hyperspectral imagery based on parti-tional clustering techniquesrdquo IEEE Transactions on Geoscienceand Remote Sensing vol 47 no 8 pp 2973ndash2987 2009
[4] W Li S Prasad F James and B Lour ldquoLocality-preservingdimensionality reduction and classification for hyperspectralimage analysisrdquo IEEE Transactions on Geoscience and RemoteSensing vol 50 no 4 pp 1185ndash1198 2012
[5] F Melgani and L Bruzzone ldquoClassification of hyperspectralremote sensing images with support vector machinesrdquo IEEETransactions on Geoscience and Remote Sensing vol 42 no 8pp 1778ndash1790 2004
[6] J A Gualtieri and S Chettri ldquoSupport vector machines for clas-sification of hyperspectral datardquo inProceedings of the IEEE Inter-national Geoscience and Remote Sensing Symposium (IGARSSrsquo00) vol 2 pp 813ndash815 IEEE July 2000
[7] G Mountrakis J Im and C Ogole ldquoSupport vector machinesin remote sensing a reviewrdquo ISPRS Journal of Photogrammetryand Remote Sensing vol 66 no 3 pp 247ndash259 2011
[8] J Li J M Bioucas-Dias and A Plaza ldquoSpectral-spatial classifi-cation of hyperspectral data using loopy belief propagation andactive learningrdquo IEEE Transactions on Geoscience and RemoteSensing vol 51 no 2 pp 844ndash856 2013
[9] P M Atkinson and A R L Tatnall ldquoIntroduction neuralnetworks in remote sensingrdquo International Journal of RemoteSensing vol 18 no 4 pp 699ndash709 1997
[10] L Bruzzone and D F Prieto ldquoA technique for the selectionof kernel-function parameters in RBF neural networks forclassification of remote-sensing imagesrdquo IEEE Transactions onGeoscience andRemote Sensing vol 37 no 2 pp 1179ndash1184 1999
[11] F Ratle G Camps-Valls and JWeston ldquoSemisupervised neuralnetworks for efficient hyperspectral image classificationrdquo IEEETransactions on Geoscience and Remote Sensing vol 48 no 5pp 2271ndash2282 2010
[12] G E Hinton and R R Salakhutdinov ldquoReducing the dimen-sionality of data with neural networksrdquo Science vol 313 no5786 pp 504ndash507 2006
[13] K Fukushima ldquoNeocognitron a hierarchical neural networkcapable of visual pattern recognitionrdquo Neural Networks vol 1no 2 pp 119ndash130 1988
[14] Y LeCun L Bottou Y Bengio and P Haffner ldquoGradient-basedlearning applied to document recognitionrdquo Proceedings of theIEEE vol 86 no 11 pp 2278ndash2323 1998
[15] D C Ciresan U Meier J Masci L M Gambardella andJ Schmidhuber ldquoFlexible high performance convolutionalneural networks for image classificationrdquo in Proceedings of the22nd International Joint Conference on Artificial Intelligence(IJCAI rsquo11) vol 22 pp 1237ndash1242 July 2011
[16] P Y Simard D Steinkraus and J C Platt ldquoBest practicesfor convolutional neural networks applied to visual documentanalysisrdquo in Proceedings of the 7th International Conference onDocument Analysis and Recognition vol 2 pp 958ndash963 IEEEComputer Society Edinburgh UK August 2003
[17] P Sermanet and Y LeCun ldquoTraffic sign recognition with multi-scale convolutional networksrdquo in Proceedings of the Interna-tional Joint Conference onNeuralNetwork (IJCNN rsquo11) pp 2809ndash2813 IEEE August 2011
[18] A Krizhevsky I Sutskever andG EHinton ldquoImagenet classifi-cation with deep convolutional neural networksrdquo in Proceedingsof the Advances in Neural Information Processing Systems 25(NIPS 12) pp 1097ndash1105 2012
[19] D Ciregan U Meier and J Schmidhuber ldquoMulti-column deepneural networks for image classificationrdquo in Proceedings of theIEEE Conference on Computer Vision and Pattern Recognition(CVPR rsquo12) pp 3642ndash3649 IEEE June 2012
[20] R Girshick J Donahue T Darrell and J Malik ldquoRich featurehierarchies for accurate object detection and semantic segmen-tationrdquo in Proceedings of the IEEE Conference on ComputerVision and Pattern Recognition (CVPR rsquo14) pp 580ndash587 IEEEJune 2014
12 Journal of Sensors
[21] C Farabet C Couprie L Najman and Y LeCun ldquoLearninghierarchical features for scene labelingrdquo IEEE Transactions onPattern Analysis and Machine Intelligence vol 35 no 8 pp1915ndash1929 2013
[22] P Sermanet S Chintala and Y LeCun ldquoConvolutional neuralnetworks applied to house numbers digit classificationrdquo inProceedings of the 21st International Conference on PatternRecognition (ICPR rsquo12) pp 3288ndash3291 IEEE November 2012
[23] Y TaigmanM YangM Ranzato and LWolf ldquoDeepFace clos-ing the gap to human-level performance in face verificationrdquoin Proceedings of the IEEE Conference on Computer Vision andPattern Recognition (CVPR rsquo14) pp 1701ndash1708 Columbus OhioUSA June 2014
[24] T N Sainath A-R Mohamed B Kingsbury and B Ram-abhadran ldquoDeep convolutional neural networks for LVCSRrdquoin Proceedings of the 38th IEEE International Conference onAcoustics Speech and Signal Processing (ICASSP rsquo13) pp 8614ndash8618 IEEE Vancouver Canada May 2013
[25] O Abdel-Hamid A-R Mohamed H Jiang and G PennldquoApplying convolutional neural networks concepts to hybridNN-HMMmodel for speech recognitionrdquo in Proceedings of theIEEE International Conference on Acoustics Speech and SignalProcessing (ICASSP rsquo12) pp 4277ndash4280 IEEE March 2012
[26] Y Chen Z Lin X Zhao G Wang and Y Gu ldquoDeep learning-based classification of hyperspectral datardquo IEEE Journal ofSelected Topics in Applied Earth Observations and RemoteSensing vol 7 no 6 pp 2094ndash2107 2014
[27] I Sutskever and G E Hinton ldquoDeep narrow sigmoid beliefnetworks are universal approximatorsrdquo Neural Computationvol 20 no 11 pp 2629ndash2636 2008
[28] D H Hubel and T N Wiesel ldquoReceptive fields and functionalarchitecture ofmonkey striate cortexrdquoThe Journal of Physiologyvol 195 no 1 pp 215ndash243 1968
[29] Y A LeCun L Bottou G B Orr and K-R Muller ldquoEfficientbackproprdquo in Neural Networks Tricks of the Trade pp 9ndash48Springer Berlin Germany 2012
[30] J Bergstra F Bastien O Breuleux et al ldquoTheano deep learningon GPUs with pythonrdquo in Proceedings of the NIPS 2011 BigLearning Workshop pp 712ndash721 Granada Spain December2011
[31] Y Jia E Shelhamer J Donahue et al ldquoCaffe convolutionalarchitecture for fast feature embeddingrdquo in Proceedings of theACM International Conference on Multimedia pp 675ndash678ACM Orlando Fla USA November 2014
[32] Y LeCun C Cortes and C J Burges ldquoThe MNIST database ofhandwritten digitsrdquo 1998 httpyannlecuncomexdbmnist
[33] S Chopra R Hadsell and Y LeCun ldquoLearning a similaritymetric discriminatively with application to face verificationrdquoin Proceedings of the IEEE Computer Society Conference onComputer Vision and Pattern Recognition (CVPR rsquo05) vol 1 pp539ndash546 IEEE June 2005
[34] N Srivastava G Hinton A Krizhevsky I Sutskever and RSalakhutdinov ldquoDropout a simple way to prevent neural net-works from overfittingrdquo The Journal of Machine LearningResearch vol 15 no 1 pp 1929ndash1958 2014
International Journal of
AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Active and Passive Electronic Components
Control Scienceand Engineering
Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
RotatingMachinery
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation httpwwwhindawicom
Journal ofEngineeringVolume 2014
Submit your manuscripts athttpwwwhindawicom
VLSI Design
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Shock and Vibration
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Civil EngineeringAdvances in
Acoustics and VibrationAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Advances inOptoElectronics
Hindawi Publishing Corporation httpwwwhindawicom
Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
SensorsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Chemical EngineeringInternational Journal of Antennas and
Propagation
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Navigation and Observation
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
DistributedSensor Networks
International Journal of
8 Journal of Sensors
75
80
85
90
95
100
1 2 3 4 5 6 7 8
Accu
racy
()
Class number
CNNSVM
(a)
70
75
80
85
90
95
100
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
Accu
racy
()
Class number
CNNSVM
(b)
75
80
85
90
95
100
1 2 3 4 5 6 7 8 9
Accu
racy
()
Class number
CNNSVM
(c)
Figure 7 Classification accuracies of all the classes for experimental data sets From (a) to (c) Indian Pines Salinas and University of PaviaThe class number is corresponding to the first column in Tables 1 2 and 3
and test time According to our experiments it takes only 5minutes to achieve 90 accuracy on MNIST dataset [32] byusing Caffe compared to more than 120 minutes by using ourimplemented framework
Figure 9 illustrates the relationship between cost value(see (4)) and the training time for the University of Paviadata set The value of the loss function is reduced with anincreasing number of training iteration which demonstrates
Journal of Sensors 9
0
10
20
30
40
50
60
70
80
90
100
0 10 20 30 40 50 60 70
Ove
rall
accu
racy
()
Training time (min)
(a)
0
10
20
30
40
50
60
70
80
90
100
0 10 20 30 40 50 60 70 80 90
Ove
rall
accu
racy
()
Training time (min)
(b)
0
10
20
30
40
50
60
70
80
90
100
0 1 2 3 4 5 6 7 8 9
Ove
rall
accu
racy
()
Training time (min)
(c)
Figure 8 Classification accuracies versus the training time for experimental data sets From (a) to (c) Indian Pines Salinas and Universityof Pavia Note that the test time is also included in the training time
the convergence of our network with only 200 trainingsamples for each classMoreover the cost value is still reducedafter 5-minute training but the corresponding test accuracyis relatively stable (see Figure 8(a)) which indicates theoverfitting problem in this network
To further verify that the proposed classifier is suitablefor classifying data sets with limited training samples we alsocompare our CNN with RBF-SVM under different trainingsizes on the University of Pavia data set as shown in Figure 10It is obvious that our proposed CNN consistently provides
higher accuracy than SVM However although the conven-tional deep learning-based method [26] can outperform theSVM classifier it requires plenty of training samples forconstructing autoencoders
To demonstrate the relationship between classificationaccuracies and the visual differences of curve shapes (seeFigure 2) we present the detailed accuracies of our proposedCNN classifiers for the University of Pavia data set in Table 6In the table the cell in the 119894th row 119895th column means thepercentage of the 119894th class samples (according to ground
10 Journal of Sensors
Table 6 The detailed classification accuracies of all the classes for University of Pavia data set
Asphalt Meadows Gravel Trees Sheets Bare soil Bitumen Bricks ShadowsAsphalt 8734 026 232 000 019 037 625 325 002Meadows 000 9463 002 126 000 403 000 006 000Gravel 053 047 8647 000 000 000 005 1243 005Trees 000 267 000 9629 003 101 000 000 000Sheets 000 009 000 000 9965 026 000 000 000Bare soil 012 615 000 010 008 9323 000 031 000Bitumen 637 000 035 000 009 000 9319 000 000Bricks 190 020 1048 000 006 060 034 8642 000Shadows 000 000 000 000 000 000 000 000 10000
0
02
04
06
08
1
12
14
16
18
2
0 1 2 3 4 5 6 7 8 9
Cos
t val
ue
Training time (min)
Figure 9 Cost value versus the training time for University of Paviadata sets
truth) which is classified to the 119895th class For example 8734of class Asphalt samples are classified correctly but 625 ofclass Asphalt samples are wrongly classified to class BitumenThe percentages on diagonal line are just the classificationaccuracies of corresponding classes As for one class themore unique the corresponding curve shape is the higheraccuracy the proposed CNN classifier can achieve (checkthe class Shadow and class Sheets in Figure 2 and Table 6)The more similar two curves are the higher opportunitythey are wrongly classified to each other (check the classGravel and class Bricks in Figure 2 and Table 6) Further-more the excellent performance verifies that the proposedCNN classifier has discriminative capability to extract subtlevisual features which is even superior to human vision forclassifying complex curve shapes
Finally we also implement three other types of neuralnetwork architectures for the Indian Pines data set usingthe same training and test samples The first one is a simplearchitecture with only two fully connected layers beside theinput layerThe second one is LeNet-5 which is a classic CNNarchitecture with two convolutional layers The third one is
83
84
85
86
87
88
89
90
91
92
93
94
60 80 100 120 140 160 180 200
Ove
rall
accu
racy
()
Number of training samples of each class
SVMCNN
Figure 10 Classification accuracies versus numbers of trainingsamples (each class) for the University of Pavia data sets
a conventional deep neural networks (DNNs) with 3 hiddenfully connected layers (a 220-60-40-20-8 architecture as sug-gested in [26])The classification performance is summarizedin Table 5 From Table 5 we can see that our CNN classifierachieves the highest accuracy with competitive training andtesting computational cost LeNet-5 and DNNs cost moretime to train models due to their complex architecturebut limited training samples restrict their capabilities ofclassification (only 20 samples selected for testing in [26]compared with 95 in our experiment) Another reason forthe difficulty that deeper CNNs and DNNs face to achievehigher accuracies could be that the HSI lacks the type ofhigh frequency signal commonly seen in the computer visiondomain (see Figure 2)
5 Conclusion and Future Work
In this paper we proposed a novel CNN-based methodfor HSI classification inspired by our observation thatHSI classification can be implemented via human vision
Journal of Sensors 11
Compared with SVM-based classifier and conventionalDNN-based classifier the proposed method could achievehigher accuracy using all the experimental data sets evenwith a small number of training samples
Our work is an exploration of using CNNs for HSIclassification and has excellent performanceThe architectureof our proposed CNN classifier only contains one convo-lutional layer and one fully connected layer due to thesmall number of training samples In the future a networkarchitecture called a Siamese Network [33] might be usedwhich has been proved to be robust in the situation wherethe number of training samples per category is small Sometechniques such as Dropout [34] can also be used toalleviate the overfitting problem caused by limited trainingsamples Furthermore recent researches in deep learninghave indicated that unsupervised learning can be employedto train CNNs reducing the requirement of labeled samplessignificantly Deep learning especially deep CNNs shouldhave great potentiality for HSI classification in the futureMoreover in the current work we do not consider the spatialcorrelation and only concentrate on the spectral signaturesWe believe that some spatial-spectral techniques also can beapplied to further improve the CNN-based classification Atlast we plan to employ efficient deep CNN frameworks suchas Caffe to improve our computing performance
Conflict of Interests
The authors declare that there is no conflict of interestsregarding the publication of this paper
Acknowledgments
This work was supported jointly by the National NaturalScience Foundation of China (nos 61371165 and 61302164)the 973 Program of China (no 2011CB706900) the Programfor New Century Excellent Talents in University underGrant no NCET-11-0711 and the Interdisciplinary ResearchProject in Beijing University of Chemical Technology WeiHu and Fan Zhang are also supported by the Beijing HigherEducation Young Elite Teacher Project under Grant nosYETP0501 and YETP0500 respectively
References
[1] D Landgrebe ldquoHyperspectral image data analysisrdquo IEEE SignalProcessing Magazine vol 19 no 1 pp 17ndash28 2002
[2] G M Foody and A Mathur ldquoA relative evaluation of multiclassimage classification by support vectormachinesrdquo IEEE Transac-tions on Geoscience and Remote Sensing vol 42 no 6 pp 1335ndash1343 2004
[3] Y Tarabalka J A Benediktsson and J Chanussot ldquoSpectral-spatial classification of hyperspectral imagery based on parti-tional clustering techniquesrdquo IEEE Transactions on Geoscienceand Remote Sensing vol 47 no 8 pp 2973ndash2987 2009
[4] W Li S Prasad F James and B Lour ldquoLocality-preservingdimensionality reduction and classification for hyperspectralimage analysisrdquo IEEE Transactions on Geoscience and RemoteSensing vol 50 no 4 pp 1185ndash1198 2012
[5] F Melgani and L Bruzzone ldquoClassification of hyperspectralremote sensing images with support vector machinesrdquo IEEETransactions on Geoscience and Remote Sensing vol 42 no 8pp 1778ndash1790 2004
[6] J A Gualtieri and S Chettri ldquoSupport vector machines for clas-sification of hyperspectral datardquo inProceedings of the IEEE Inter-national Geoscience and Remote Sensing Symposium (IGARSSrsquo00) vol 2 pp 813ndash815 IEEE July 2000
[7] G Mountrakis J Im and C Ogole ldquoSupport vector machinesin remote sensing a reviewrdquo ISPRS Journal of Photogrammetryand Remote Sensing vol 66 no 3 pp 247ndash259 2011
[8] J Li J M Bioucas-Dias and A Plaza ldquoSpectral-spatial classifi-cation of hyperspectral data using loopy belief propagation andactive learningrdquo IEEE Transactions on Geoscience and RemoteSensing vol 51 no 2 pp 844ndash856 2013
[9] P M Atkinson and A R L Tatnall ldquoIntroduction neuralnetworks in remote sensingrdquo International Journal of RemoteSensing vol 18 no 4 pp 699ndash709 1997
[10] L Bruzzone and D F Prieto ldquoA technique for the selectionof kernel-function parameters in RBF neural networks forclassification of remote-sensing imagesrdquo IEEE Transactions onGeoscience andRemote Sensing vol 37 no 2 pp 1179ndash1184 1999
[11] F Ratle G Camps-Valls and JWeston ldquoSemisupervised neuralnetworks for efficient hyperspectral image classificationrdquo IEEETransactions on Geoscience and Remote Sensing vol 48 no 5pp 2271ndash2282 2010
[12] G E Hinton and R R Salakhutdinov ldquoReducing the dimen-sionality of data with neural networksrdquo Science vol 313 no5786 pp 504ndash507 2006
[13] K Fukushima ldquoNeocognitron a hierarchical neural networkcapable of visual pattern recognitionrdquo Neural Networks vol 1no 2 pp 119ndash130 1988
[14] Y LeCun L Bottou Y Bengio and P Haffner ldquoGradient-basedlearning applied to document recognitionrdquo Proceedings of theIEEE vol 86 no 11 pp 2278ndash2323 1998
[15] D C Ciresan U Meier J Masci L M Gambardella andJ Schmidhuber ldquoFlexible high performance convolutionalneural networks for image classificationrdquo in Proceedings of the22nd International Joint Conference on Artificial Intelligence(IJCAI rsquo11) vol 22 pp 1237ndash1242 July 2011
[16] P Y Simard D Steinkraus and J C Platt ldquoBest practicesfor convolutional neural networks applied to visual documentanalysisrdquo in Proceedings of the 7th International Conference onDocument Analysis and Recognition vol 2 pp 958ndash963 IEEEComputer Society Edinburgh UK August 2003
[17] P Sermanet and Y LeCun ldquoTraffic sign recognition with multi-scale convolutional networksrdquo in Proceedings of the Interna-tional Joint Conference onNeuralNetwork (IJCNN rsquo11) pp 2809ndash2813 IEEE August 2011
[18] A Krizhevsky I Sutskever andG EHinton ldquoImagenet classifi-cation with deep convolutional neural networksrdquo in Proceedingsof the Advances in Neural Information Processing Systems 25(NIPS 12) pp 1097ndash1105 2012
[19] D Ciregan U Meier and J Schmidhuber ldquoMulti-column deepneural networks for image classificationrdquo in Proceedings of theIEEE Conference on Computer Vision and Pattern Recognition(CVPR rsquo12) pp 3642ndash3649 IEEE June 2012
[20] R Girshick J Donahue T Darrell and J Malik ldquoRich featurehierarchies for accurate object detection and semantic segmen-tationrdquo in Proceedings of the IEEE Conference on ComputerVision and Pattern Recognition (CVPR rsquo14) pp 580ndash587 IEEEJune 2014
12 Journal of Sensors
[21] C Farabet C Couprie L Najman and Y LeCun ldquoLearninghierarchical features for scene labelingrdquo IEEE Transactions onPattern Analysis and Machine Intelligence vol 35 no 8 pp1915ndash1929 2013
[22] P Sermanet S Chintala and Y LeCun ldquoConvolutional neuralnetworks applied to house numbers digit classificationrdquo inProceedings of the 21st International Conference on PatternRecognition (ICPR rsquo12) pp 3288ndash3291 IEEE November 2012
[23] Y TaigmanM YangM Ranzato and LWolf ldquoDeepFace clos-ing the gap to human-level performance in face verificationrdquoin Proceedings of the IEEE Conference on Computer Vision andPattern Recognition (CVPR rsquo14) pp 1701ndash1708 Columbus OhioUSA June 2014
[24] T N Sainath A-R Mohamed B Kingsbury and B Ram-abhadran ldquoDeep convolutional neural networks for LVCSRrdquoin Proceedings of the 38th IEEE International Conference onAcoustics Speech and Signal Processing (ICASSP rsquo13) pp 8614ndash8618 IEEE Vancouver Canada May 2013
[25] O Abdel-Hamid A-R Mohamed H Jiang and G PennldquoApplying convolutional neural networks concepts to hybridNN-HMMmodel for speech recognitionrdquo in Proceedings of theIEEE International Conference on Acoustics Speech and SignalProcessing (ICASSP rsquo12) pp 4277ndash4280 IEEE March 2012
[26] Y Chen Z Lin X Zhao G Wang and Y Gu ldquoDeep learning-based classification of hyperspectral datardquo IEEE Journal ofSelected Topics in Applied Earth Observations and RemoteSensing vol 7 no 6 pp 2094ndash2107 2014
[27] I Sutskever and G E Hinton ldquoDeep narrow sigmoid beliefnetworks are universal approximatorsrdquo Neural Computationvol 20 no 11 pp 2629ndash2636 2008
[28] D H Hubel and T N Wiesel ldquoReceptive fields and functionalarchitecture ofmonkey striate cortexrdquoThe Journal of Physiologyvol 195 no 1 pp 215ndash243 1968
[29] Y A LeCun L Bottou G B Orr and K-R Muller ldquoEfficientbackproprdquo in Neural Networks Tricks of the Trade pp 9ndash48Springer Berlin Germany 2012
[30] J Bergstra F Bastien O Breuleux et al ldquoTheano deep learningon GPUs with pythonrdquo in Proceedings of the NIPS 2011 BigLearning Workshop pp 712ndash721 Granada Spain December2011
[31] Y Jia E Shelhamer J Donahue et al ldquoCaffe convolutionalarchitecture for fast feature embeddingrdquo in Proceedings of theACM International Conference on Multimedia pp 675ndash678ACM Orlando Fla USA November 2014
[32] Y LeCun C Cortes and C J Burges ldquoThe MNIST database ofhandwritten digitsrdquo 1998 httpyannlecuncomexdbmnist
[33] S Chopra R Hadsell and Y LeCun ldquoLearning a similaritymetric discriminatively with application to face verificationrdquoin Proceedings of the IEEE Computer Society Conference onComputer Vision and Pattern Recognition (CVPR rsquo05) vol 1 pp539ndash546 IEEE June 2005
[34] N Srivastava G Hinton A Krizhevsky I Sutskever and RSalakhutdinov ldquoDropout a simple way to prevent neural net-works from overfittingrdquo The Journal of Machine LearningResearch vol 15 no 1 pp 1929ndash1958 2014
International Journal of
AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Active and Passive Electronic Components
Control Scienceand Engineering
Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
RotatingMachinery
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation httpwwwhindawicom
Journal ofEngineeringVolume 2014
Submit your manuscripts athttpwwwhindawicom
VLSI Design
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Shock and Vibration
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Civil EngineeringAdvances in
Acoustics and VibrationAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Advances inOptoElectronics
Hindawi Publishing Corporation httpwwwhindawicom
Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
SensorsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Chemical EngineeringInternational Journal of Antennas and
Propagation
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Navigation and Observation
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
DistributedSensor Networks
International Journal of
Journal of Sensors 9
0
10
20
30
40
50
60
70
80
90
100
0 10 20 30 40 50 60 70
Ove
rall
accu
racy
()
Training time (min)
(a)
0
10
20
30
40
50
60
70
80
90
100
0 10 20 30 40 50 60 70 80 90
Ove
rall
accu
racy
()
Training time (min)
(b)
0
10
20
30
40
50
60
70
80
90
100
0 1 2 3 4 5 6 7 8 9
Ove
rall
accu
racy
()
Training time (min)
(c)
Figure 8 Classification accuracies versus the training time for experimental data sets From (a) to (c) Indian Pines Salinas and Universityof Pavia Note that the test time is also included in the training time
the convergence of our network with only 200 trainingsamples for each classMoreover the cost value is still reducedafter 5-minute training but the corresponding test accuracyis relatively stable (see Figure 8(a)) which indicates theoverfitting problem in this network
To further verify that the proposed classifier is suitablefor classifying data sets with limited training samples we alsocompare our CNN with RBF-SVM under different trainingsizes on the University of Pavia data set as shown in Figure 10It is obvious that our proposed CNN consistently provides
higher accuracy than SVM However although the conven-tional deep learning-based method [26] can outperform theSVM classifier it requires plenty of training samples forconstructing autoencoders
To demonstrate the relationship between classificationaccuracies and the visual differences of curve shapes (seeFigure 2) we present the detailed accuracies of our proposedCNN classifiers for the University of Pavia data set in Table 6In the table the cell in the 119894th row 119895th column means thepercentage of the 119894th class samples (according to ground
10 Journal of Sensors
Table 6 The detailed classification accuracies of all the classes for University of Pavia data set
Asphalt Meadows Gravel Trees Sheets Bare soil Bitumen Bricks ShadowsAsphalt 8734 026 232 000 019 037 625 325 002Meadows 000 9463 002 126 000 403 000 006 000Gravel 053 047 8647 000 000 000 005 1243 005Trees 000 267 000 9629 003 101 000 000 000Sheets 000 009 000 000 9965 026 000 000 000Bare soil 012 615 000 010 008 9323 000 031 000Bitumen 637 000 035 000 009 000 9319 000 000Bricks 190 020 1048 000 006 060 034 8642 000Shadows 000 000 000 000 000 000 000 000 10000
0
02
04
06
08
1
12
14
16
18
2
0 1 2 3 4 5 6 7 8 9
Cos
t val
ue
Training time (min)
Figure 9 Cost value versus the training time for University of Paviadata sets
truth) which is classified to the 119895th class For example 8734of class Asphalt samples are classified correctly but 625 ofclass Asphalt samples are wrongly classified to class BitumenThe percentages on diagonal line are just the classificationaccuracies of corresponding classes As for one class themore unique the corresponding curve shape is the higheraccuracy the proposed CNN classifier can achieve (checkthe class Shadow and class Sheets in Figure 2 and Table 6)The more similar two curves are the higher opportunitythey are wrongly classified to each other (check the classGravel and class Bricks in Figure 2 and Table 6) Further-more the excellent performance verifies that the proposedCNN classifier has discriminative capability to extract subtlevisual features which is even superior to human vision forclassifying complex curve shapes
Finally we also implement three other types of neuralnetwork architectures for the Indian Pines data set usingthe same training and test samples The first one is a simplearchitecture with only two fully connected layers beside theinput layerThe second one is LeNet-5 which is a classic CNNarchitecture with two convolutional layers The third one is
83
84
85
86
87
88
89
90
91
92
93
94
60 80 100 120 140 160 180 200
Ove
rall
accu
racy
()
Number of training samples of each class
SVMCNN
Figure 10 Classification accuracies versus numbers of trainingsamples (each class) for the University of Pavia data sets
a conventional deep neural networks (DNNs) with 3 hiddenfully connected layers (a 220-60-40-20-8 architecture as sug-gested in [26])The classification performance is summarizedin Table 5 From Table 5 we can see that our CNN classifierachieves the highest accuracy with competitive training andtesting computational cost LeNet-5 and DNNs cost moretime to train models due to their complex architecturebut limited training samples restrict their capabilities ofclassification (only 20 samples selected for testing in [26]compared with 95 in our experiment) Another reason forthe difficulty that deeper CNNs and DNNs face to achievehigher accuracies could be that the HSI lacks the type ofhigh frequency signal commonly seen in the computer visiondomain (see Figure 2)
5 Conclusion and Future Work
In this paper we proposed a novel CNN-based methodfor HSI classification inspired by our observation thatHSI classification can be implemented via human vision
Journal of Sensors 11
Compared with SVM-based classifier and conventionalDNN-based classifier the proposed method could achievehigher accuracy using all the experimental data sets evenwith a small number of training samples
Our work is an exploration of using CNNs for HSIclassification and has excellent performanceThe architectureof our proposed CNN classifier only contains one convo-lutional layer and one fully connected layer due to thesmall number of training samples In the future a networkarchitecture called a Siamese Network [33] might be usedwhich has been proved to be robust in the situation wherethe number of training samples per category is small Sometechniques such as Dropout [34] can also be used toalleviate the overfitting problem caused by limited trainingsamples Furthermore recent researches in deep learninghave indicated that unsupervised learning can be employedto train CNNs reducing the requirement of labeled samplessignificantly Deep learning especially deep CNNs shouldhave great potentiality for HSI classification in the futureMoreover in the current work we do not consider the spatialcorrelation and only concentrate on the spectral signaturesWe believe that some spatial-spectral techniques also can beapplied to further improve the CNN-based classification Atlast we plan to employ efficient deep CNN frameworks suchas Caffe to improve our computing performance
Conflict of Interests
The authors declare that there is no conflict of interestsregarding the publication of this paper
Acknowledgments
This work was supported jointly by the National NaturalScience Foundation of China (nos 61371165 and 61302164)the 973 Program of China (no 2011CB706900) the Programfor New Century Excellent Talents in University underGrant no NCET-11-0711 and the Interdisciplinary ResearchProject in Beijing University of Chemical Technology WeiHu and Fan Zhang are also supported by the Beijing HigherEducation Young Elite Teacher Project under Grant nosYETP0501 and YETP0500 respectively
References
[1] D Landgrebe ldquoHyperspectral image data analysisrdquo IEEE SignalProcessing Magazine vol 19 no 1 pp 17ndash28 2002
[2] G M Foody and A Mathur ldquoA relative evaluation of multiclassimage classification by support vectormachinesrdquo IEEE Transac-tions on Geoscience and Remote Sensing vol 42 no 6 pp 1335ndash1343 2004
[3] Y Tarabalka J A Benediktsson and J Chanussot ldquoSpectral-spatial classification of hyperspectral imagery based on parti-tional clustering techniquesrdquo IEEE Transactions on Geoscienceand Remote Sensing vol 47 no 8 pp 2973ndash2987 2009
[4] W Li S Prasad F James and B Lour ldquoLocality-preservingdimensionality reduction and classification for hyperspectralimage analysisrdquo IEEE Transactions on Geoscience and RemoteSensing vol 50 no 4 pp 1185ndash1198 2012
[5] F Melgani and L Bruzzone ldquoClassification of hyperspectralremote sensing images with support vector machinesrdquo IEEETransactions on Geoscience and Remote Sensing vol 42 no 8pp 1778ndash1790 2004
[6] J A Gualtieri and S Chettri ldquoSupport vector machines for clas-sification of hyperspectral datardquo inProceedings of the IEEE Inter-national Geoscience and Remote Sensing Symposium (IGARSSrsquo00) vol 2 pp 813ndash815 IEEE July 2000
[7] G Mountrakis J Im and C Ogole ldquoSupport vector machinesin remote sensing a reviewrdquo ISPRS Journal of Photogrammetryand Remote Sensing vol 66 no 3 pp 247ndash259 2011
[8] J Li J M Bioucas-Dias and A Plaza ldquoSpectral-spatial classifi-cation of hyperspectral data using loopy belief propagation andactive learningrdquo IEEE Transactions on Geoscience and RemoteSensing vol 51 no 2 pp 844ndash856 2013
[9] P M Atkinson and A R L Tatnall ldquoIntroduction neuralnetworks in remote sensingrdquo International Journal of RemoteSensing vol 18 no 4 pp 699ndash709 1997
[10] L Bruzzone and D F Prieto ldquoA technique for the selectionof kernel-function parameters in RBF neural networks forclassification of remote-sensing imagesrdquo IEEE Transactions onGeoscience andRemote Sensing vol 37 no 2 pp 1179ndash1184 1999
[11] F Ratle G Camps-Valls and JWeston ldquoSemisupervised neuralnetworks for efficient hyperspectral image classificationrdquo IEEETransactions on Geoscience and Remote Sensing vol 48 no 5pp 2271ndash2282 2010
[12] G E Hinton and R R Salakhutdinov ldquoReducing the dimen-sionality of data with neural networksrdquo Science vol 313 no5786 pp 504ndash507 2006
[13] K Fukushima ldquoNeocognitron a hierarchical neural networkcapable of visual pattern recognitionrdquo Neural Networks vol 1no 2 pp 119ndash130 1988
[14] Y LeCun L Bottou Y Bengio and P Haffner ldquoGradient-basedlearning applied to document recognitionrdquo Proceedings of theIEEE vol 86 no 11 pp 2278ndash2323 1998
[15] D C Ciresan U Meier J Masci L M Gambardella andJ Schmidhuber ldquoFlexible high performance convolutionalneural networks for image classificationrdquo in Proceedings of the22nd International Joint Conference on Artificial Intelligence(IJCAI rsquo11) vol 22 pp 1237ndash1242 July 2011
[16] P Y Simard D Steinkraus and J C Platt ldquoBest practicesfor convolutional neural networks applied to visual documentanalysisrdquo in Proceedings of the 7th International Conference onDocument Analysis and Recognition vol 2 pp 958ndash963 IEEEComputer Society Edinburgh UK August 2003
[17] P Sermanet and Y LeCun ldquoTraffic sign recognition with multi-scale convolutional networksrdquo in Proceedings of the Interna-tional Joint Conference onNeuralNetwork (IJCNN rsquo11) pp 2809ndash2813 IEEE August 2011
[18] A Krizhevsky I Sutskever andG EHinton ldquoImagenet classifi-cation with deep convolutional neural networksrdquo in Proceedingsof the Advances in Neural Information Processing Systems 25(NIPS 12) pp 1097ndash1105 2012
[19] D Ciregan U Meier and J Schmidhuber ldquoMulti-column deepneural networks for image classificationrdquo in Proceedings of theIEEE Conference on Computer Vision and Pattern Recognition(CVPR rsquo12) pp 3642ndash3649 IEEE June 2012
[20] R Girshick J Donahue T Darrell and J Malik ldquoRich featurehierarchies for accurate object detection and semantic segmen-tationrdquo in Proceedings of the IEEE Conference on ComputerVision and Pattern Recognition (CVPR rsquo14) pp 580ndash587 IEEEJune 2014
12 Journal of Sensors
[21] C Farabet C Couprie L Najman and Y LeCun ldquoLearninghierarchical features for scene labelingrdquo IEEE Transactions onPattern Analysis and Machine Intelligence vol 35 no 8 pp1915ndash1929 2013
[22] P Sermanet S Chintala and Y LeCun ldquoConvolutional neuralnetworks applied to house numbers digit classificationrdquo inProceedings of the 21st International Conference on PatternRecognition (ICPR rsquo12) pp 3288ndash3291 IEEE November 2012
[23] Y TaigmanM YangM Ranzato and LWolf ldquoDeepFace clos-ing the gap to human-level performance in face verificationrdquoin Proceedings of the IEEE Conference on Computer Vision andPattern Recognition (CVPR rsquo14) pp 1701ndash1708 Columbus OhioUSA June 2014
[24] T N Sainath A-R Mohamed B Kingsbury and B Ram-abhadran ldquoDeep convolutional neural networks for LVCSRrdquoin Proceedings of the 38th IEEE International Conference onAcoustics Speech and Signal Processing (ICASSP rsquo13) pp 8614ndash8618 IEEE Vancouver Canada May 2013
[25] O Abdel-Hamid A-R Mohamed H Jiang and G PennldquoApplying convolutional neural networks concepts to hybridNN-HMMmodel for speech recognitionrdquo in Proceedings of theIEEE International Conference on Acoustics Speech and SignalProcessing (ICASSP rsquo12) pp 4277ndash4280 IEEE March 2012
[26] Y Chen Z Lin X Zhao G Wang and Y Gu ldquoDeep learning-based classification of hyperspectral datardquo IEEE Journal ofSelected Topics in Applied Earth Observations and RemoteSensing vol 7 no 6 pp 2094ndash2107 2014
[27] I Sutskever and G E Hinton ldquoDeep narrow sigmoid beliefnetworks are universal approximatorsrdquo Neural Computationvol 20 no 11 pp 2629ndash2636 2008
[28] D H Hubel and T N Wiesel ldquoReceptive fields and functionalarchitecture ofmonkey striate cortexrdquoThe Journal of Physiologyvol 195 no 1 pp 215ndash243 1968
[29] Y A LeCun L Bottou G B Orr and K-R Muller ldquoEfficientbackproprdquo in Neural Networks Tricks of the Trade pp 9ndash48Springer Berlin Germany 2012
[30] J Bergstra F Bastien O Breuleux et al ldquoTheano deep learningon GPUs with pythonrdquo in Proceedings of the NIPS 2011 BigLearning Workshop pp 712ndash721 Granada Spain December2011
[31] Y Jia E Shelhamer J Donahue et al ldquoCaffe convolutionalarchitecture for fast feature embeddingrdquo in Proceedings of theACM International Conference on Multimedia pp 675ndash678ACM Orlando Fla USA November 2014
[32] Y LeCun C Cortes and C J Burges ldquoThe MNIST database ofhandwritten digitsrdquo 1998 httpyannlecuncomexdbmnist
[33] S Chopra R Hadsell and Y LeCun ldquoLearning a similaritymetric discriminatively with application to face verificationrdquoin Proceedings of the IEEE Computer Society Conference onComputer Vision and Pattern Recognition (CVPR rsquo05) vol 1 pp539ndash546 IEEE June 2005
[34] N Srivastava G Hinton A Krizhevsky I Sutskever and RSalakhutdinov ldquoDropout a simple way to prevent neural net-works from overfittingrdquo The Journal of Machine LearningResearch vol 15 no 1 pp 1929ndash1958 2014
International Journal of
AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Active and Passive Electronic Components
Control Scienceand Engineering
Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
RotatingMachinery
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation httpwwwhindawicom
Journal ofEngineeringVolume 2014
Submit your manuscripts athttpwwwhindawicom
VLSI Design
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Shock and Vibration
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Civil EngineeringAdvances in
Acoustics and VibrationAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Advances inOptoElectronics
Hindawi Publishing Corporation httpwwwhindawicom
Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
SensorsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Chemical EngineeringInternational Journal of Antennas and
Propagation
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Navigation and Observation
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
DistributedSensor Networks
International Journal of
10 Journal of Sensors
Table 6 The detailed classification accuracies of all the classes for University of Pavia data set
Asphalt Meadows Gravel Trees Sheets Bare soil Bitumen Bricks ShadowsAsphalt 8734 026 232 000 019 037 625 325 002Meadows 000 9463 002 126 000 403 000 006 000Gravel 053 047 8647 000 000 000 005 1243 005Trees 000 267 000 9629 003 101 000 000 000Sheets 000 009 000 000 9965 026 000 000 000Bare soil 012 615 000 010 008 9323 000 031 000Bitumen 637 000 035 000 009 000 9319 000 000Bricks 190 020 1048 000 006 060 034 8642 000Shadows 000 000 000 000 000 000 000 000 10000
0
02
04
06
08
1
12
14
16
18
2
0 1 2 3 4 5 6 7 8 9
Cos
t val
ue
Training time (min)
Figure 9 Cost value versus the training time for University of Paviadata sets
truth) which is classified to the 119895th class For example 8734of class Asphalt samples are classified correctly but 625 ofclass Asphalt samples are wrongly classified to class BitumenThe percentages on diagonal line are just the classificationaccuracies of corresponding classes As for one class themore unique the corresponding curve shape is the higheraccuracy the proposed CNN classifier can achieve (checkthe class Shadow and class Sheets in Figure 2 and Table 6)The more similar two curves are the higher opportunitythey are wrongly classified to each other (check the classGravel and class Bricks in Figure 2 and Table 6) Further-more the excellent performance verifies that the proposedCNN classifier has discriminative capability to extract subtlevisual features which is even superior to human vision forclassifying complex curve shapes
Finally we also implement three other types of neuralnetwork architectures for the Indian Pines data set usingthe same training and test samples The first one is a simplearchitecture with only two fully connected layers beside theinput layerThe second one is LeNet-5 which is a classic CNNarchitecture with two convolutional layers The third one is
83
84
85
86
87
88
89
90
91
92
93
94
60 80 100 120 140 160 180 200
Ove
rall
accu
racy
()
Number of training samples of each class
SVMCNN
Figure 10 Classification accuracies versus numbers of trainingsamples (each class) for the University of Pavia data sets
a conventional deep neural networks (DNNs) with 3 hiddenfully connected layers (a 220-60-40-20-8 architecture as sug-gested in [26])The classification performance is summarizedin Table 5 From Table 5 we can see that our CNN classifierachieves the highest accuracy with competitive training andtesting computational cost LeNet-5 and DNNs cost moretime to train models due to their complex architecturebut limited training samples restrict their capabilities ofclassification (only 20 samples selected for testing in [26]compared with 95 in our experiment) Another reason forthe difficulty that deeper CNNs and DNNs face to achievehigher accuracies could be that the HSI lacks the type ofhigh frequency signal commonly seen in the computer visiondomain (see Figure 2)
5 Conclusion and Future Work
In this paper we proposed a novel CNN-based methodfor HSI classification inspired by our observation thatHSI classification can be implemented via human vision
Journal of Sensors 11
Compared with SVM-based classifier and conventionalDNN-based classifier the proposed method could achievehigher accuracy using all the experimental data sets evenwith a small number of training samples
Our work is an exploration of using CNNs for HSIclassification and has excellent performanceThe architectureof our proposed CNN classifier only contains one convo-lutional layer and one fully connected layer due to thesmall number of training samples In the future a networkarchitecture called a Siamese Network [33] might be usedwhich has been proved to be robust in the situation wherethe number of training samples per category is small Sometechniques such as Dropout [34] can also be used toalleviate the overfitting problem caused by limited trainingsamples Furthermore recent researches in deep learninghave indicated that unsupervised learning can be employedto train CNNs reducing the requirement of labeled samplessignificantly Deep learning especially deep CNNs shouldhave great potentiality for HSI classification in the futureMoreover in the current work we do not consider the spatialcorrelation and only concentrate on the spectral signaturesWe believe that some spatial-spectral techniques also can beapplied to further improve the CNN-based classification Atlast we plan to employ efficient deep CNN frameworks suchas Caffe to improve our computing performance
Conflict of Interests
The authors declare that there is no conflict of interestsregarding the publication of this paper
Acknowledgments
This work was supported jointly by the National NaturalScience Foundation of China (nos 61371165 and 61302164)the 973 Program of China (no 2011CB706900) the Programfor New Century Excellent Talents in University underGrant no NCET-11-0711 and the Interdisciplinary ResearchProject in Beijing University of Chemical Technology WeiHu and Fan Zhang are also supported by the Beijing HigherEducation Young Elite Teacher Project under Grant nosYETP0501 and YETP0500 respectively
References
[1] D Landgrebe ldquoHyperspectral image data analysisrdquo IEEE SignalProcessing Magazine vol 19 no 1 pp 17ndash28 2002
[2] G M Foody and A Mathur ldquoA relative evaluation of multiclassimage classification by support vectormachinesrdquo IEEE Transac-tions on Geoscience and Remote Sensing vol 42 no 6 pp 1335ndash1343 2004
[3] Y Tarabalka J A Benediktsson and J Chanussot ldquoSpectral-spatial classification of hyperspectral imagery based on parti-tional clustering techniquesrdquo IEEE Transactions on Geoscienceand Remote Sensing vol 47 no 8 pp 2973ndash2987 2009
[4] W Li S Prasad F James and B Lour ldquoLocality-preservingdimensionality reduction and classification for hyperspectralimage analysisrdquo IEEE Transactions on Geoscience and RemoteSensing vol 50 no 4 pp 1185ndash1198 2012
[5] F Melgani and L Bruzzone ldquoClassification of hyperspectralremote sensing images with support vector machinesrdquo IEEETransactions on Geoscience and Remote Sensing vol 42 no 8pp 1778ndash1790 2004
[6] J A Gualtieri and S Chettri ldquoSupport vector machines for clas-sification of hyperspectral datardquo inProceedings of the IEEE Inter-national Geoscience and Remote Sensing Symposium (IGARSSrsquo00) vol 2 pp 813ndash815 IEEE July 2000
[7] G Mountrakis J Im and C Ogole ldquoSupport vector machinesin remote sensing a reviewrdquo ISPRS Journal of Photogrammetryand Remote Sensing vol 66 no 3 pp 247ndash259 2011
[8] J Li J M Bioucas-Dias and A Plaza ldquoSpectral-spatial classifi-cation of hyperspectral data using loopy belief propagation andactive learningrdquo IEEE Transactions on Geoscience and RemoteSensing vol 51 no 2 pp 844ndash856 2013
[9] P M Atkinson and A R L Tatnall ldquoIntroduction neuralnetworks in remote sensingrdquo International Journal of RemoteSensing vol 18 no 4 pp 699ndash709 1997
[10] L Bruzzone and D F Prieto ldquoA technique for the selectionof kernel-function parameters in RBF neural networks forclassification of remote-sensing imagesrdquo IEEE Transactions onGeoscience andRemote Sensing vol 37 no 2 pp 1179ndash1184 1999
[11] F Ratle G Camps-Valls and JWeston ldquoSemisupervised neuralnetworks for efficient hyperspectral image classificationrdquo IEEETransactions on Geoscience and Remote Sensing vol 48 no 5pp 2271ndash2282 2010
[12] G E Hinton and R R Salakhutdinov ldquoReducing the dimen-sionality of data with neural networksrdquo Science vol 313 no5786 pp 504ndash507 2006
[13] K Fukushima ldquoNeocognitron a hierarchical neural networkcapable of visual pattern recognitionrdquo Neural Networks vol 1no 2 pp 119ndash130 1988
[14] Y LeCun L Bottou Y Bengio and P Haffner ldquoGradient-basedlearning applied to document recognitionrdquo Proceedings of theIEEE vol 86 no 11 pp 2278ndash2323 1998
[15] D C Ciresan U Meier J Masci L M Gambardella andJ Schmidhuber ldquoFlexible high performance convolutionalneural networks for image classificationrdquo in Proceedings of the22nd International Joint Conference on Artificial Intelligence(IJCAI rsquo11) vol 22 pp 1237ndash1242 July 2011
[16] P Y Simard D Steinkraus and J C Platt ldquoBest practicesfor convolutional neural networks applied to visual documentanalysisrdquo in Proceedings of the 7th International Conference onDocument Analysis and Recognition vol 2 pp 958ndash963 IEEEComputer Society Edinburgh UK August 2003
[17] P Sermanet and Y LeCun ldquoTraffic sign recognition with multi-scale convolutional networksrdquo in Proceedings of the Interna-tional Joint Conference onNeuralNetwork (IJCNN rsquo11) pp 2809ndash2813 IEEE August 2011
[18] A Krizhevsky I Sutskever andG EHinton ldquoImagenet classifi-cation with deep convolutional neural networksrdquo in Proceedingsof the Advances in Neural Information Processing Systems 25(NIPS 12) pp 1097ndash1105 2012
[19] D Ciregan U Meier and J Schmidhuber ldquoMulti-column deepneural networks for image classificationrdquo in Proceedings of theIEEE Conference on Computer Vision and Pattern Recognition(CVPR rsquo12) pp 3642ndash3649 IEEE June 2012
[20] R Girshick J Donahue T Darrell and J Malik ldquoRich featurehierarchies for accurate object detection and semantic segmen-tationrdquo in Proceedings of the IEEE Conference on ComputerVision and Pattern Recognition (CVPR rsquo14) pp 580ndash587 IEEEJune 2014
12 Journal of Sensors
[21] C Farabet C Couprie L Najman and Y LeCun ldquoLearninghierarchical features for scene labelingrdquo IEEE Transactions onPattern Analysis and Machine Intelligence vol 35 no 8 pp1915ndash1929 2013
[22] P Sermanet S Chintala and Y LeCun ldquoConvolutional neuralnetworks applied to house numbers digit classificationrdquo inProceedings of the 21st International Conference on PatternRecognition (ICPR rsquo12) pp 3288ndash3291 IEEE November 2012
[23] Y TaigmanM YangM Ranzato and LWolf ldquoDeepFace clos-ing the gap to human-level performance in face verificationrdquoin Proceedings of the IEEE Conference on Computer Vision andPattern Recognition (CVPR rsquo14) pp 1701ndash1708 Columbus OhioUSA June 2014
[24] T N Sainath A-R Mohamed B Kingsbury and B Ram-abhadran ldquoDeep convolutional neural networks for LVCSRrdquoin Proceedings of the 38th IEEE International Conference onAcoustics Speech and Signal Processing (ICASSP rsquo13) pp 8614ndash8618 IEEE Vancouver Canada May 2013
[25] O Abdel-Hamid A-R Mohamed H Jiang and G PennldquoApplying convolutional neural networks concepts to hybridNN-HMMmodel for speech recognitionrdquo in Proceedings of theIEEE International Conference on Acoustics Speech and SignalProcessing (ICASSP rsquo12) pp 4277ndash4280 IEEE March 2012
[26] Y Chen Z Lin X Zhao G Wang and Y Gu ldquoDeep learning-based classification of hyperspectral datardquo IEEE Journal ofSelected Topics in Applied Earth Observations and RemoteSensing vol 7 no 6 pp 2094ndash2107 2014
[27] I Sutskever and G E Hinton ldquoDeep narrow sigmoid beliefnetworks are universal approximatorsrdquo Neural Computationvol 20 no 11 pp 2629ndash2636 2008
[28] D H Hubel and T N Wiesel ldquoReceptive fields and functionalarchitecture ofmonkey striate cortexrdquoThe Journal of Physiologyvol 195 no 1 pp 215ndash243 1968
[29] Y A LeCun L Bottou G B Orr and K-R Muller ldquoEfficientbackproprdquo in Neural Networks Tricks of the Trade pp 9ndash48Springer Berlin Germany 2012
[30] J Bergstra F Bastien O Breuleux et al ldquoTheano deep learningon GPUs with pythonrdquo in Proceedings of the NIPS 2011 BigLearning Workshop pp 712ndash721 Granada Spain December2011
[31] Y Jia E Shelhamer J Donahue et al ldquoCaffe convolutionalarchitecture for fast feature embeddingrdquo in Proceedings of theACM International Conference on Multimedia pp 675ndash678ACM Orlando Fla USA November 2014
[32] Y LeCun C Cortes and C J Burges ldquoThe MNIST database ofhandwritten digitsrdquo 1998 httpyannlecuncomexdbmnist
[33] S Chopra R Hadsell and Y LeCun ldquoLearning a similaritymetric discriminatively with application to face verificationrdquoin Proceedings of the IEEE Computer Society Conference onComputer Vision and Pattern Recognition (CVPR rsquo05) vol 1 pp539ndash546 IEEE June 2005
[34] N Srivastava G Hinton A Krizhevsky I Sutskever and RSalakhutdinov ldquoDropout a simple way to prevent neural net-works from overfittingrdquo The Journal of Machine LearningResearch vol 15 no 1 pp 1929ndash1958 2014
International Journal of
AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Active and Passive Electronic Components
Control Scienceand Engineering
Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
RotatingMachinery
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation httpwwwhindawicom
Journal ofEngineeringVolume 2014
Submit your manuscripts athttpwwwhindawicom
VLSI Design
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Shock and Vibration
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Civil EngineeringAdvances in
Acoustics and VibrationAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Advances inOptoElectronics
Hindawi Publishing Corporation httpwwwhindawicom
Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
SensorsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Chemical EngineeringInternational Journal of Antennas and
Propagation
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Navigation and Observation
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
DistributedSensor Networks
International Journal of
Journal of Sensors 11
Compared with SVM-based classifier and conventionalDNN-based classifier the proposed method could achievehigher accuracy using all the experimental data sets evenwith a small number of training samples
Our work is an exploration of using CNNs for HSIclassification and has excellent performanceThe architectureof our proposed CNN classifier only contains one convo-lutional layer and one fully connected layer due to thesmall number of training samples In the future a networkarchitecture called a Siamese Network [33] might be usedwhich has been proved to be robust in the situation wherethe number of training samples per category is small Sometechniques such as Dropout [34] can also be used toalleviate the overfitting problem caused by limited trainingsamples Furthermore recent researches in deep learninghave indicated that unsupervised learning can be employedto train CNNs reducing the requirement of labeled samplessignificantly Deep learning especially deep CNNs shouldhave great potentiality for HSI classification in the futureMoreover in the current work we do not consider the spatialcorrelation and only concentrate on the spectral signaturesWe believe that some spatial-spectral techniques also can beapplied to further improve the CNN-based classification Atlast we plan to employ efficient deep CNN frameworks suchas Caffe to improve our computing performance
Conflict of Interests
The authors declare that there is no conflict of interestsregarding the publication of this paper
Acknowledgments
This work was supported jointly by the National NaturalScience Foundation of China (nos 61371165 and 61302164)the 973 Program of China (no 2011CB706900) the Programfor New Century Excellent Talents in University underGrant no NCET-11-0711 and the Interdisciplinary ResearchProject in Beijing University of Chemical Technology WeiHu and Fan Zhang are also supported by the Beijing HigherEducation Young Elite Teacher Project under Grant nosYETP0501 and YETP0500 respectively
References
[1] D Landgrebe ldquoHyperspectral image data analysisrdquo IEEE SignalProcessing Magazine vol 19 no 1 pp 17ndash28 2002
[2] G M Foody and A Mathur ldquoA relative evaluation of multiclassimage classification by support vectormachinesrdquo IEEE Transac-tions on Geoscience and Remote Sensing vol 42 no 6 pp 1335ndash1343 2004
[3] Y Tarabalka J A Benediktsson and J Chanussot ldquoSpectral-spatial classification of hyperspectral imagery based on parti-tional clustering techniquesrdquo IEEE Transactions on Geoscienceand Remote Sensing vol 47 no 8 pp 2973ndash2987 2009
[4] W Li S Prasad F James and B Lour ldquoLocality-preservingdimensionality reduction and classification for hyperspectralimage analysisrdquo IEEE Transactions on Geoscience and RemoteSensing vol 50 no 4 pp 1185ndash1198 2012
[5] F Melgani and L Bruzzone ldquoClassification of hyperspectralremote sensing images with support vector machinesrdquo IEEETransactions on Geoscience and Remote Sensing vol 42 no 8pp 1778ndash1790 2004
[6] J A Gualtieri and S Chettri ldquoSupport vector machines for clas-sification of hyperspectral datardquo inProceedings of the IEEE Inter-national Geoscience and Remote Sensing Symposium (IGARSSrsquo00) vol 2 pp 813ndash815 IEEE July 2000
[7] G Mountrakis J Im and C Ogole ldquoSupport vector machinesin remote sensing a reviewrdquo ISPRS Journal of Photogrammetryand Remote Sensing vol 66 no 3 pp 247ndash259 2011
[8] J Li J M Bioucas-Dias and A Plaza ldquoSpectral-spatial classifi-cation of hyperspectral data using loopy belief propagation andactive learningrdquo IEEE Transactions on Geoscience and RemoteSensing vol 51 no 2 pp 844ndash856 2013
[9] P M Atkinson and A R L Tatnall ldquoIntroduction neuralnetworks in remote sensingrdquo International Journal of RemoteSensing vol 18 no 4 pp 699ndash709 1997
[10] L Bruzzone and D F Prieto ldquoA technique for the selectionof kernel-function parameters in RBF neural networks forclassification of remote-sensing imagesrdquo IEEE Transactions onGeoscience andRemote Sensing vol 37 no 2 pp 1179ndash1184 1999
[11] F Ratle G Camps-Valls and JWeston ldquoSemisupervised neuralnetworks for efficient hyperspectral image classificationrdquo IEEETransactions on Geoscience and Remote Sensing vol 48 no 5pp 2271ndash2282 2010
[12] G E Hinton and R R Salakhutdinov ldquoReducing the dimen-sionality of data with neural networksrdquo Science vol 313 no5786 pp 504ndash507 2006
[13] K Fukushima ldquoNeocognitron a hierarchical neural networkcapable of visual pattern recognitionrdquo Neural Networks vol 1no 2 pp 119ndash130 1988
[14] Y LeCun L Bottou Y Bengio and P Haffner ldquoGradient-basedlearning applied to document recognitionrdquo Proceedings of theIEEE vol 86 no 11 pp 2278ndash2323 1998
[15] D C Ciresan U Meier J Masci L M Gambardella andJ Schmidhuber ldquoFlexible high performance convolutionalneural networks for image classificationrdquo in Proceedings of the22nd International Joint Conference on Artificial Intelligence(IJCAI rsquo11) vol 22 pp 1237ndash1242 July 2011
[16] P Y Simard D Steinkraus and J C Platt ldquoBest practicesfor convolutional neural networks applied to visual documentanalysisrdquo in Proceedings of the 7th International Conference onDocument Analysis and Recognition vol 2 pp 958ndash963 IEEEComputer Society Edinburgh UK August 2003
[17] P Sermanet and Y LeCun ldquoTraffic sign recognition with multi-scale convolutional networksrdquo in Proceedings of the Interna-tional Joint Conference onNeuralNetwork (IJCNN rsquo11) pp 2809ndash2813 IEEE August 2011
[18] A Krizhevsky I Sutskever andG EHinton ldquoImagenet classifi-cation with deep convolutional neural networksrdquo in Proceedingsof the Advances in Neural Information Processing Systems 25(NIPS 12) pp 1097ndash1105 2012
[19] D Ciregan U Meier and J Schmidhuber ldquoMulti-column deepneural networks for image classificationrdquo in Proceedings of theIEEE Conference on Computer Vision and Pattern Recognition(CVPR rsquo12) pp 3642ndash3649 IEEE June 2012
[20] R Girshick J Donahue T Darrell and J Malik ldquoRich featurehierarchies for accurate object detection and semantic segmen-tationrdquo in Proceedings of the IEEE Conference on ComputerVision and Pattern Recognition (CVPR rsquo14) pp 580ndash587 IEEEJune 2014
12 Journal of Sensors
[21] C Farabet C Couprie L Najman and Y LeCun ldquoLearninghierarchical features for scene labelingrdquo IEEE Transactions onPattern Analysis and Machine Intelligence vol 35 no 8 pp1915ndash1929 2013
[22] P Sermanet S Chintala and Y LeCun ldquoConvolutional neuralnetworks applied to house numbers digit classificationrdquo inProceedings of the 21st International Conference on PatternRecognition (ICPR rsquo12) pp 3288ndash3291 IEEE November 2012
[23] Y TaigmanM YangM Ranzato and LWolf ldquoDeepFace clos-ing the gap to human-level performance in face verificationrdquoin Proceedings of the IEEE Conference on Computer Vision andPattern Recognition (CVPR rsquo14) pp 1701ndash1708 Columbus OhioUSA June 2014
[24] T N Sainath A-R Mohamed B Kingsbury and B Ram-abhadran ldquoDeep convolutional neural networks for LVCSRrdquoin Proceedings of the 38th IEEE International Conference onAcoustics Speech and Signal Processing (ICASSP rsquo13) pp 8614ndash8618 IEEE Vancouver Canada May 2013
[25] O Abdel-Hamid A-R Mohamed H Jiang and G PennldquoApplying convolutional neural networks concepts to hybridNN-HMMmodel for speech recognitionrdquo in Proceedings of theIEEE International Conference on Acoustics Speech and SignalProcessing (ICASSP rsquo12) pp 4277ndash4280 IEEE March 2012
[26] Y Chen Z Lin X Zhao G Wang and Y Gu ldquoDeep learning-based classification of hyperspectral datardquo IEEE Journal ofSelected Topics in Applied Earth Observations and RemoteSensing vol 7 no 6 pp 2094ndash2107 2014
[27] I Sutskever and G E Hinton ldquoDeep narrow sigmoid beliefnetworks are universal approximatorsrdquo Neural Computationvol 20 no 11 pp 2629ndash2636 2008
[28] D H Hubel and T N Wiesel ldquoReceptive fields and functionalarchitecture ofmonkey striate cortexrdquoThe Journal of Physiologyvol 195 no 1 pp 215ndash243 1968
[29] Y A LeCun L Bottou G B Orr and K-R Muller ldquoEfficientbackproprdquo in Neural Networks Tricks of the Trade pp 9ndash48Springer Berlin Germany 2012
[30] J Bergstra F Bastien O Breuleux et al ldquoTheano deep learningon GPUs with pythonrdquo in Proceedings of the NIPS 2011 BigLearning Workshop pp 712ndash721 Granada Spain December2011
[31] Y Jia E Shelhamer J Donahue et al ldquoCaffe convolutionalarchitecture for fast feature embeddingrdquo in Proceedings of theACM International Conference on Multimedia pp 675ndash678ACM Orlando Fla USA November 2014
[32] Y LeCun C Cortes and C J Burges ldquoThe MNIST database ofhandwritten digitsrdquo 1998 httpyannlecuncomexdbmnist
[33] S Chopra R Hadsell and Y LeCun ldquoLearning a similaritymetric discriminatively with application to face verificationrdquoin Proceedings of the IEEE Computer Society Conference onComputer Vision and Pattern Recognition (CVPR rsquo05) vol 1 pp539ndash546 IEEE June 2005
[34] N Srivastava G Hinton A Krizhevsky I Sutskever and RSalakhutdinov ldquoDropout a simple way to prevent neural net-works from overfittingrdquo The Journal of Machine LearningResearch vol 15 no 1 pp 1929ndash1958 2014
International Journal of
AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Active and Passive Electronic Components
Control Scienceand Engineering
Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
RotatingMachinery
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation httpwwwhindawicom
Journal ofEngineeringVolume 2014
Submit your manuscripts athttpwwwhindawicom
VLSI Design
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Shock and Vibration
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Civil EngineeringAdvances in
Acoustics and VibrationAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Advances inOptoElectronics
Hindawi Publishing Corporation httpwwwhindawicom
Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
SensorsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Chemical EngineeringInternational Journal of Antennas and
Propagation
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Navigation and Observation
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
DistributedSensor Networks
International Journal of
12 Journal of Sensors
[21] C Farabet C Couprie L Najman and Y LeCun ldquoLearninghierarchical features for scene labelingrdquo IEEE Transactions onPattern Analysis and Machine Intelligence vol 35 no 8 pp1915ndash1929 2013
[22] P Sermanet S Chintala and Y LeCun ldquoConvolutional neuralnetworks applied to house numbers digit classificationrdquo inProceedings of the 21st International Conference on PatternRecognition (ICPR rsquo12) pp 3288ndash3291 IEEE November 2012
[23] Y TaigmanM YangM Ranzato and LWolf ldquoDeepFace clos-ing the gap to human-level performance in face verificationrdquoin Proceedings of the IEEE Conference on Computer Vision andPattern Recognition (CVPR rsquo14) pp 1701ndash1708 Columbus OhioUSA June 2014
[24] T N Sainath A-R Mohamed B Kingsbury and B Ram-abhadran ldquoDeep convolutional neural networks for LVCSRrdquoin Proceedings of the 38th IEEE International Conference onAcoustics Speech and Signal Processing (ICASSP rsquo13) pp 8614ndash8618 IEEE Vancouver Canada May 2013
[25] O Abdel-Hamid A-R Mohamed H Jiang and G PennldquoApplying convolutional neural networks concepts to hybridNN-HMMmodel for speech recognitionrdquo in Proceedings of theIEEE International Conference on Acoustics Speech and SignalProcessing (ICASSP rsquo12) pp 4277ndash4280 IEEE March 2012
[26] Y Chen Z Lin X Zhao G Wang and Y Gu ldquoDeep learning-based classification of hyperspectral datardquo IEEE Journal ofSelected Topics in Applied Earth Observations and RemoteSensing vol 7 no 6 pp 2094ndash2107 2014
[27] I Sutskever and G E Hinton ldquoDeep narrow sigmoid beliefnetworks are universal approximatorsrdquo Neural Computationvol 20 no 11 pp 2629ndash2636 2008
[28] D H Hubel and T N Wiesel ldquoReceptive fields and functionalarchitecture ofmonkey striate cortexrdquoThe Journal of Physiologyvol 195 no 1 pp 215ndash243 1968
[29] Y A LeCun L Bottou G B Orr and K-R Muller ldquoEfficientbackproprdquo in Neural Networks Tricks of the Trade pp 9ndash48Springer Berlin Germany 2012
[30] J Bergstra F Bastien O Breuleux et al ldquoTheano deep learningon GPUs with pythonrdquo in Proceedings of the NIPS 2011 BigLearning Workshop pp 712ndash721 Granada Spain December2011
[31] Y Jia E Shelhamer J Donahue et al ldquoCaffe convolutionalarchitecture for fast feature embeddingrdquo in Proceedings of theACM International Conference on Multimedia pp 675ndash678ACM Orlando Fla USA November 2014
[32] Y LeCun C Cortes and C J Burges ldquoThe MNIST database ofhandwritten digitsrdquo 1998 httpyannlecuncomexdbmnist
[33] S Chopra R Hadsell and Y LeCun ldquoLearning a similaritymetric discriminatively with application to face verificationrdquoin Proceedings of the IEEE Computer Society Conference onComputer Vision and Pattern Recognition (CVPR rsquo05) vol 1 pp539ndash546 IEEE June 2005
[34] N Srivastava G Hinton A Krizhevsky I Sutskever and RSalakhutdinov ldquoDropout a simple way to prevent neural net-works from overfittingrdquo The Journal of Machine LearningResearch vol 15 no 1 pp 1929ndash1958 2014
International Journal of
AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Active and Passive Electronic Components
Control Scienceand Engineering
Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
RotatingMachinery
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation httpwwwhindawicom
Journal ofEngineeringVolume 2014
Submit your manuscripts athttpwwwhindawicom
VLSI Design
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Shock and Vibration
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Civil EngineeringAdvances in
Acoustics and VibrationAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Advances inOptoElectronics
Hindawi Publishing Corporation httpwwwhindawicom
Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
SensorsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Chemical EngineeringInternational Journal of Antennas and
Propagation
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Navigation and Observation
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
DistributedSensor Networks
International Journal of
International Journal of
AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Active and Passive Electronic Components
Control Scienceand Engineering
Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
RotatingMachinery
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation httpwwwhindawicom
Journal ofEngineeringVolume 2014
Submit your manuscripts athttpwwwhindawicom
VLSI Design
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Shock and Vibration
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Civil EngineeringAdvances in
Acoustics and VibrationAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Advances inOptoElectronics
Hindawi Publishing Corporation httpwwwhindawicom
Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
SensorsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Chemical EngineeringInternational Journal of Antennas and
Propagation
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Navigation and Observation
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
DistributedSensor Networks
International Journal of
top related