

Research Article
Cascade Convolutional Neural Network Based on Transfer-Learning for Aircraft Detection on High-Resolution Remote Sensing Images

Bin Pan,1 Jianhao Tai,1 Qi Zheng,2 and Shanshan Zhao1

1School of Remote Sensing and Information Engineering, Wuhan University, Wuhan, Hubei, China
2School of Computer Science, Wuhan University, Wuhan, Hubei, China

Correspondence should be addressed to Jianhao Tai; taijianhao@whu.edu.cn

Received 7 March 2017; Revised 1 June 2017; Accepted 11 June 2017; Published 27 July 2017

Academic Editor: María Guijarro

Copyright © 2017 Bin Pan et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Aircraft detection from high-resolution remote sensing images is important for civil and military applications. Recently, detection methods based on deep learning have rapidly advanced. However, they require numerous samples to train the detection model and cannot be directly used to efficiently handle large-area remote sensing images. A weakly supervised learning method (WSLM) can detect a target with few samples. However, it cannot extract an adequate number of features, and the detection accuracy requires improvement. We propose a cascade convolutional neural network (CCNN) framework based on transfer-learning and geometric feature constraints (GFC) for aircraft detection. It achieves high accuracy and efficient detection with relatively few samples. A high-accuracy detection model is first obtained using transfer-learning to fine-tune pretrained models with few samples. Then, a GFC region proposal filtering method improves detection efficiency. The CCNN framework completes the aircraft detection for large-area remote sensing images. The framework's first-level network is an image classifier, which filters the entire image, excluding most areas with no aircraft. The second-level network is an object detector, which rapidly detects aircraft from the first-level network output. Compared with WSLM, detection accuracy increased by 3.66%, false detection decreased by 64%, and missed detection decreased by 23.1%.

1. Introduction

Aircraft detection from remote sensing images is a type of small-target recognition over a wide area. It faces two problems: one is the efficiency of large-area image detection; the other is aircraft feature extraction and expression in complex environments. Nevertheless, the increase of high-resolution remote sensing images has advanced research in this area [1]. The key point of target detection is to find stable target features. Traditional aircraft detection methods mainly focus on feature description, feature selection, feature extraction, and other algorithms [2–5]. Adjustment and optimization of the algorithm can improve the detection accuracy and efficiency [6]. However, these features are common image attributes; thus, it is difficult to fundamentally distinguish between the target and background. Moreover, in a complex environment, the accuracy of such methods is poor, as with the histogram of oriented gradients (HOG) [7] and the scale-invariant feature transform (SIFT) [3]. Some studies matched the test image by designing a standard sample of the target [2, 8]. However, this method applies only to special scenes; it is not very versatile. In practice, a multifeature fusion method is often used to comprehensively describe the target [9]. Nonetheless, it increases the algorithm complexity and reduces the detection efficiency to some extent [7, 10].

The deep learning method has a strong feature-extraction ability. Through a multilayer neural network and a large number of samples, it extracts the multilevel features of objects. The method has advanced considerably in the field of natural image processing [11, 12]. Furthermore, numerous high-performance algorithms [13, 14], such as the Region Convolutional Neural Network (R-CNN) and Fast and Faster R-CNN [15, 16], have been proposed. However, these methods require many samples to train the network model. When

Hindawi, Journal of Sensors, Volume 2017, Article ID 1796728, 14 pages. https://doi.org/10.1155/2017/1796728


processing a large-area image, many invalid region proposals must be detected, and this is inefficient. Some target detection methods using remote sensing images based on deep learning have been presented. Weakly supervised and semisupervised feature learning methods have been proposed [17–19]. Zhang et al. proposed an aircraft detection method based on weakly supervised learning coupled with a CNN [20]. Han et al. proposed an iterative detector approach [21], by which the detector is iteratively trained using refined annotations until the model converges. Nevertheless, a nonconvergence situation may occur, which reduces the detection accuracy. Weakly supervised learning extracts features with a small amount of sample data. However, owing to a lack of samples, the features are not sufficient. Therefore, the detection accuracy is limited.

There are three key problems in aircraft detection from remote sensing images by the deep learning method. First, the number of training samples is limited; a means of using a small number of samples to train a high-accuracy model is an important point. Second, aircraft in remote sensing images have obvious features, and some stable features can be selected as constraint conditions. Combining these features with deep learning gives the detection method a better targeting ability. Third, a remote sensing image covers a large area, its scale is not uniform, and the sensors are miscellaneous. The existing deep learning models and network structures are not directly suitable for remote sensing image aircraft detection [22]. The network structure and detection algorithm must be modified to improve the efficiency and accuracy.

In view of the above problems, a cascade CNN (CCNN) architecture is proposed to improve the accuracy and efficiency. It contains four main parts. (1) A two-level CCNN is designed to rapidly process remote sensing image aircraft detection. The first-level network quickly classifies the scene and eliminates the areas that do not contain aircraft. The second-level network identifies and locates aircraft in the areas not filtered out in the previous step. (2) A transfer-learning method is used to fine-tune the parameters of pretrained classification and object detection models with the samples. (3) A region proposal filtering method with a geometric feature constraint (GFC), based on the geometric features of aircraft, is proposed. By using these features to filter the region proposals, numerous nonaircraft region proposals are eliminated. (4) An aircraft sample library and a remote sensing image-scene sample library are established. They are used as the data source for transfer-learning.
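The two-level flow in part (1) can be outlined as follows. This is an illustrative sketch, not the authors' implementation; `classify_window` and `detect_aircraft` are hypothetical stand-ins for the trained first- and second-level networks.

```python
def cascade_detect(windows, classify_window, detect_aircraft):
    """Run the expensive detector only on windows that the cheap
    first-level scene classifier accepts."""
    detections = []
    for window in windows:
        # Level 1: coarse scene classification; discard non-aircraft areas.
        if not classify_window(window):
            continue
        # Level 2: fine-grained aircraft detection on the surviving window.
        detections.extend(detect_aircraft(window))
    return detections
```

Because most windows contain no aircraft, the first-level test removes the bulk of the work before the detector ever runs.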

2. Related Work

In this section, we review related work that has utilized deep learning for scene classification and target detection. Remote sensing scene classification plays a key role in a wide range of applications and has received remarkable attention from researchers. Significant efforts have been made to develop varied datasets and present a variety of approaches to scene classification from remote sensing images. However, there has yet to be a systematic review of the literature concerning datasets and scene classification methods. In addition, almost all existing datasets have several limitations, including the small scale of scene classes and image numbers, the lack of image variation and diversity, and accuracy saturation. In response to this problem, Cheng et al. [23] proposed a large-scale dataset called "NWPU-RESISC45," which is a publicly available benchmark for remote sensing image scene classification (RESISC). This dataset contains 31,500 images covering 45 scene classes, with 700 images in each class, which provides strong support for RESISC. Paisitkriangkrai et al. [24] designed a multiresolution convolutional neural network (CNN) model for the semantic labeling of aerial imagery. To avoid overfitting from training a CNN with limited data, [25] investigated the use of a CNN model pretrained on general computer vision datasets for classifying remote sensing scenes. However, the CNN model in [25] was not further fine-tuned and was directly used for land-use classification. Yao et al. [26] proposed a unified annotation framework combining discriminative high-level feature learning and weakly supervised feature transfer. Specifically, they first employ an efficient stacked discriminative sparse autoencoder (SDSAE) to learn high-level features on an auxiliary satellite image dataset for a land-use classification task.

Another way to improve the efficiency of aircraft detection is to first detect the airport range and then detect the aircraft within the airport. Airport detection from remote sensing images has gained increasing research attention [21, 27, 28] in recent years due to its strategic importance for both civil and military applications; however, it is a challenging problem because of variations in airport appearance, the presence of complex cluttered backgrounds, and variations in satellite image resolution. Yao et al. [29] proposed a hierarchical detection model with a coarse and a fine layer. At the coarse layer, a target-oriented saliency model is built by combining contrast and line density cues to rapidly localize airport candidate areas. At the fine layer, a learned conditional random field (CRF) model is applied to each candidate area to perform fine detection of the airport target. CNNs have demonstrated their strong feature representation power for computer vision. Despite the progress made on natural scene images, it is problematic to directly use CNN features for object detection in optical remote sensing images, because it is difficult to effectively manage variations in object rotation. To address this problem, Cheng et al. [30] proposed a novel and effective approach to learning a rotation-invariant CNN (RICNN) model that advances object detection performance, achieved by introducing and learning a new rotation-invariant layer based on existing CNN architectures. Cosaliency detection, the goal of which is to discover the common and salient objects in a given image group, has received tremendous research interest in recent years. However, most existing cosaliency detection methods assume that all images in an image group should contain cosalient objects in only one category and are thus impractical for large-scale image sets obtained from the Internet. Yao et al. [31] improved cosaliency detection and advanced its development. Their results can outperform state-of-the-art cosaliency detection methods performed on manually separated image subgroups.


Figure 1: Architecture of the two-level CCNN for rapid aircraft detection in large-area remote sensing images. The target detection process consists of two parts. First, the image is downsampled. The first-level CNN structure is used to classify the scene with a sliding window on the downsampled image, block by block. Nontarget windows are excluded, and the index positions of target windows are reserved and passed to the next level of the network. The second level receives the target windows and performs aircraft detection on the original image. An RPN is used to get region proposals, and the GFC model is used to filter them, excluding the region proposals that do not satisfy the geometric characteristics of aircraft. Then, the Faster R-CNN classification model is used to classify the remaining region proposals and generate the target areas. Finally, using overlap constraints, redundant target areas are deleted to get the final detection results.

3. Methods

3.1. Cascade Network for Aircraft Detection. In general, the image used for aircraft detection includes the entire airport and surrounding areas; most image areas do not contain aircraft. This situation reduces the detection accuracy and efficiency. We propose a two-level CCNN method that combines the trained models with the GFC method to solve this problem. As shown in Figure 1, the first level is a coarse filter. A new image with lower resolution is obtained through twofold downsampling of the original image. The new image is traversed using a sliding window, and the CNN image classifier is used to classify each sliding window. Most of the negative areas are filtered out, and the positive areas are fed into the second-level network for aircraft detection.

The second-level network is a target detector. In this level, Faster R-CNN is used to perform aircraft detection for the respective area. The output of the first-level network is the input of the second-level network. The corresponding region on the original image is sent to the second network, and the window size is 800 × 800 pixels. The region proposals are generated by the region proposal network (RPN). The main difference between Faster R-CNN and Fast R-CNN is the generation of the region proposals; that is, the former uses the efficient RPN method, whereas the latter uses the less efficient selective search (SS) method. In the RPN, each image produces approximately 20,000 region proposals of three ratios and three scales. Then, according to the region proposal classification score, the top 300 region proposals are selected as the Faster R-CNN input for target detection. More than 95% of the 20,000 region proposals are background. Scoring and sorting such a large number of region proposals is obviously time-intensive. The GFC is used to delete the invalid region proposals. The final detection for the remaining region proposals is performed by the aircraft detector.

The general sliding window method uses a small window size, such as 12 × 12, 24 × 24, or 48 × 48. Small windows and small steps require considerable time to traverse the entire image, which reduces the detection efficiency. In an experiment, we used a large sliding window. In the first level, the window size was 400 × 400, which was suitable for image classification. According to the training samples, the downsampled images had a higher classification accuracy; moreover, the speed of traversing the entire image was improved. In the second level, the window size was 800 × 800. The range of this window was equal to the downsampling range. It required almost the same amount of time to detect a 400 × 400 image and an 800 × 800 image; thus, it was not necessary to decompose the original image into four 400 × 400 smaller images. The window size and sliding step size were determined by the image resolution. They had to satisfy the condition that at least one window could contain a complete aircraft of maximum size.
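The window layout described above can be sketched as follows, assuming (for illustration only) an 800 × 800 second-level window and a 300-pixel maximum aircraft size at this resolution; adjacent windows are stepped so their overlap always exceeds the largest aircraft.

```python
def window_origins(img_w, img_h, win=800, max_aircraft=300):
    """Top-left corners of sliding windows; adjacent windows overlap by
    max_aircraft pixels, so every aircraft fits wholly in some window."""
    step = win - max_aircraft
    xs = list(range(0, max(img_w - win, 0) + 1, step))
    ys = list(range(0, max(img_h - win, 0) + 1, step))
    # cover the right and bottom edges if the stride leaves a gap
    if xs[-1] + win < img_w:
        xs.append(img_w - win)
    if ys[-1] + win < img_h:
        ys.append(img_h - win)
    return [(x, y) for y in ys for x in xs]
```

With a large window the number of classifier calls stays small, which is the efficiency argument made above.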

Because of the overlap between adjacent windows, some aircraft may be detected several times. The repetitive detection areas should be optimized: the redundant objects are deleted, and only one optimal result is retained. The regions for the same aircraft must meet the following conditions. (1) Every region should satisfy the GFC condition; that is, the aspect ratio, width, and height must meet the thresholds of (r1, r2), (w1, w2), and (h1, h2), respectively. (2) The overlap between two regions should be larger than 30%, as defined in (1). (3) The area is the largest, as defined in (2), in which area(r_i) means the area of the given region. The process is


Figure 2: Process of deleting duplicate detected regions and overlapping regions between sliding windows. (a) Result before deleting the duplicates. (b) Result after deleting the duplicates.

Figure 3: Structure of transfer-learning for scene classification and target detection. Source tasks are the original deep learning methods of classification and detection, and the knowledge base is comprised of the trained networks. The learning system involves transfer-learning and fine-tuning, and the target tasks are remote sensing image scene classification and aircraft detection.

shown in Figure 2, in which the two aircraft at the upper left are in overlapping windows. The incomplete regions are deleted, and the most appropriate one is preserved.

area(r_i ∩ r_j) ≥ 30%,  (1)

Region = max area(r_i).  (2)
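A minimal sketch of this duplicate removal, with boxes as (x1, y1, x2, y2) tuples. The paper does not specify how the 30% overlap in (1) is normalized; here we assume it is measured against the smaller region's area.

```python
def area(b):
    return max(0, b[2] - b[0]) * max(0, b[3] - b[1])

def inter_area(a, b):
    # area of the intersection rectangle of two boxes (0 if disjoint)
    return area((max(a[0], b[0]), max(a[1], b[1]),
                 min(a[2], b[2]), min(a[3], b[3])))

def dedup(regions, min_overlap=0.30):
    """Keep only the largest region among those overlapping by more than
    30% (Eqs. (1) and (2)); overlap is taken relative to the smaller box,
    which is our assumption, not stated in the paper."""
    kept = []
    for r in sorted(regions, key=area, reverse=True):  # largest first
        if all(inter_area(r, k) <= min_overlap * min(area(r), area(k))
               for k in kept):
            kept.append(r)
    return kept
```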

3.2. Transfer-Learning and Fine-Tuning the Model Parameters. Transfer-learning refers to a model training method in which the parameters of a pretrained network model are fine-tuned with a small number of samples to prompt the original model to adapt to a new task [32, 33]. The deep learning object detection method can achieve very high detection accuracy. One reason is that a large number of samples are used to train the detection model, which therefore contains a sufficient number of target features. However, for remote sensing images, no large sample library such as ImageNet exists that can be used to train a high-accuracy model. Moreover, training a new high-accuracy object detection model consumes considerable time and computing resources. Transfer-learning can solve the problem of lacking samples and significantly shorten the training time [34].

In this paper, the transfer-learning method is respectively used to fine-tune the pretrained image classification model and object detection model. As shown in Figure 3, the source models are the respective classification and object detection models. Both include the VGG16 network structure [35] and are trained with ImageNet. They were then fine-tuned with the samples given in Section 4.1. The parameters of the original models were adjusted by the learning system to complete the tasks of scene classification and aircraft detection.

The CNN structure comprises many layers. The bottom layers are used to extract generic features, such as structure, texture, and color. The top layers contain specific features associated with the target detection task. Using transfer-learning, the top-layer parameters are adjusted with a small number of aircraft samples to change the target detection task. Through a backpropagation algorithm, the bottom-layer parameters are adjusted to extract aircraft features. In general, we use softmax as the supervisory loss function, which is a classic cross-entropy classifier. It is defined in (3), where f(x_i) is the activation of the previous layer's node pushed through a softmax function. In (4), the target output y_i is a 1-of-K vector, where K is the number of outputs. In addition, L is the number of layers, and λ is a regularization term. The goal is to minimize the loss. Finally, stochastic gradient descent


is used to fine-tune the model, and backpropagation is used to optimize the function.

f(x_i) = softmax(x_i) = e^{x_i} / Σ_{j=1}^{N} e^{x_j},  (3)

loss = −Σ_{i=1}^{m} y_i log f(x_i) + λ Σ_{l=1}^{L} Σ ‖w_l‖².  (4)
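Equations (3) and (4) can be checked numerically with a small plain-Python sketch (the per-layer weight sums of (4) are flattened into one list of lists here for brevity):

```python
import math

def softmax(xs):
    m = max(xs)                          # shift for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def supervised_loss(logits, targets, layer_weights, lam):
    """Cross-entropy of the softmax outputs plus an L2 penalty on the
    layer weights, mirroring Eqs. (3) and (4); targets are one-hot
    (1-of-K) vectors."""
    ce = 0.0
    for x, y in zip(logits, targets):
        p = softmax(x)
        ce -= sum(yi * math.log(pi) for yi, pi in zip(y, p))
    reg = lam * sum(w * w for layer in layer_weights for w in layer)
    return ce + reg
```

For two equal logits the softmax is (0.5, 0.5), so the cross-entropy term is log 2, matching (3) and (4) directly.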

Another important parameter is the learning rate. The parameters of the pretrained model are already fine; thus, the network-layer learning rate should be set to a small number. A small learning rate ensures that the features of the new samples can be learned while the good parameters of the pretrained model are maintained. Taking Faster R-CNN as an example, the parameters that must be set are the following. (1) Initialize the training network and select the pretrained model: "train_net" is set to VGG16.v2.caffemodel. (2) Set the number of categories. In this paper, all aircraft types are considered the same type; thus, there are only two categories: aircraft and background. In the data layer, num_classes is set to two; in the cls_score layer, num_output is set to two; and in the bbox_pred layer, num_output is set to eight. (3) Set the learning rate: the base learning rate base_lr is set to 0.001. The learning rate lr_mult of the cls_score layer and the bbox_pred layer is set to five and ten, respectively. The learning rates of the other convolutional layers are not modified. (4) Set the number of iterations of the backpropagation function: max_iters is set to 40,000.
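The settings in steps (1)–(4) can be collected in one place as below. This is only an illustration of the hyperparameters listed above; the dictionary keys are ours and do not correspond to an actual Caffe configuration file.

```python
# Hypothetical summary of the fine-tuning settings described in the text.
finetune_config = {
    "pretrained_model": "VGG16.v2.caffemodel",  # train_net initialization
    "num_classes": 2,              # aircraft and background only
    "cls_score_num_output": 2,     # one score per class
    "bbox_pred_num_output": 8,     # 4 box coordinates per class
    "base_lr": 0.001,              # small rate preserves pretrained weights
    "lr_mult": {"cls_score": 5, "bbox_pred": 10},  # other layers unchanged
    "max_iters": 40000,
}
```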

3.3. Geometric Feature Constraint. In a remote sensing image, the aircraft covers only a small area; the remaining area is background. The region proposals are thus mostly background, not aircraft. Reducing the number of region proposals can make object detection more targeted, which helps improve efficiency [16]. The special imaging method of remote sensing determines that only the top features of the aircraft can be viewed in the image [36]. Thus, the geometric features of aircraft in the image have high stability [37]. This property can be exploited when designing a detection method. When the resolution of the image is fixed, the geometric parameters of each aircraft are constant. Therefore, the geometric features can be used as important conditions for aircraft detection.

The geometric features mentioned in this paper mainly refer to the external contour sizes of aircraft in remote sensing images. It is possible to use geometric feature constraints for detection because the following three conditions are met. First, conventional remote sensing imaging employs nadir images; that is, the sensor acquires an image from the sky, perpendicular (or near-perpendicular) to the ground. Thus, the top contour features of aircraft are seen in these remote sensing images; these features (length, width, and aspect ratio) are relatively stable constants. Although the length and width are affected by image resolution, they can be calculated from the resolution of the image, whereas the aspect ratio is not affected by resolution and is more fixed. Second, the length, width, wingspan, and body-length ratio of different types of aircraft are all within a certain range. These geometric features give us an exact basis for calculating region proposals. Third, statistical methods are used to count the geometric features of many aircraft samples. Samples have been collected from large, medium, and small airports in the world's major countries, covering almost all types of aircraft, including large passenger aircraft, fighters, small private aircraft, and rotor helicopters. These statistical results are used as a reference for the geometric feature constraints.

The image resolution and the results of the aircraft geometry statistics are used to calculate the size of the sliding window. The conditions that need to be met are as follows. The minimum overlap between the windows must be greater than the maximum size of an aircraft to ensure that no aircraft will be missed. The number of aircraft in a sliding window is very small: a window contains about 2000–3000 region proposals, but aircraft are probably present in only a few. The number of target region proposals is much smaller than the number of nontarget region proposals, and having many nontarget areas increases the overall time consumption. With the GFCs, deleting candidate boxes that do not satisfy the conditions can greatly improve the time efficiency of target detection.

Taking the ground truth of the aircraft in the samples as an example, their lengths and widths are within a fixed range. The original 2511 sample images were used to analyze these parameters, and the results are shown in Figure 4.

Figure 4(a) shows that the aspect ratio of the circumscribed rectangle is primarily around 1.0, with a range of (0.6, 1.6). Figure 4(b) shows that the width and height of the circumscribed rectangle are mainly around 140 pixels, with a range of (50, 300). These values should be appropriately enlarged during aircraft detection to ensure that no aircraft are filtered out on account of statistical errors. The geometric features of aircraft in remote sensing images have rotation invariance. The aspect ratio is within a fixed range even at different resolutions; thus, it can be used as a constraint to roughly distinguish between target and background. (r1, r2), (w1, w2), and (h1, h2) represent the ranges of aspect ratio, width, and height, respectively.

For the same image, reducing the number of nontarget region proposals has no effect on the target detection accuracy (in deep learning, this accuracy means the classification score). However, it can improve the detection efficiency. Based on the GFC, a region proposal filtering method is proposed. (r1, r2), (w1, w2), and (h1, h2) are set as thresholds to filter the region proposals. The main algorithm is shown in (5), where Region Proposal = 1 means the region proposal is preserved and Region Proposal = 0 means it is deleted. ratio represents the target aspect ratio, width represents the target width, height denotes the target height, and α is the image resolution coefficient, which is used to adjust the thresholds at different resolutions.

Region Proposal =
    1,  r1 ≤ ratio ≤ r2
    0,  ratio < r1 or ratio > r2

Figure 4: Geometric features of all the sample aircraft. (a) Aspect ratio of each sample aircraft: approximately 1.0, with a range of (0.6, 1.6). The color represents the value of the aspect ratio, transitioning gradually from blue (smallest) to red (largest). (b) Size of each sample aircraft: red is the width and green is the height, approximately 140 pixels, with a range of (50, 300).

Region Proposal =
    1,  α·w1 ≤ width ≤ α·w2
    0,  width < α·w1 or width > α·w2

Region Proposal =
    1,  α·h1 ≤ height ≤ α·h2
    0,  height < α·h1 or height > α·h2
  (5)

4. Materials

In this study, we constructed three kinds of datasets: scene, aircraft sample, and test datasets. All data were obtained from Google Earth. The scene dataset was from the 18th level, and its spatial resolution was 1.2 m. The remaining two datasets were from the 20th level, with a spatial resolution of 0.3 m. From a professional point of view, compared with real remote sensing images, the data obtained from Google Earth contain insufficient information. However, the deep learning method differs from traditional remote sensing image processing methods in that it can obtain rich information from this image type. We advocate the use of real remote sensing images; nonetheless, one could attempt to use Google Earth images.

4.1. Training Sample Datasets. Creating training samples is a key step in object detection in the field of deep learning. The sample quality directly affects the test results. In the CCNN structure, the first-level network uses the CNN image classification model, and the second-level network uses the Faster R-CNN object detection model. The two networks have different structures and functions; therefore, we created two sample datasets, one for scene classification and the other for aircraft detection.

The first dataset was the remote sensing image scene dataset, which was used to train the CNN classification model. We used the "UC Merced Land Use Dataset" as the basic dataset [38]. It contained 21 scene categories and included many typical background categories, such as agriculture, airplanes, beaches, buildings, bushes, and parking lots. This dataset was suitable for scene classification; however, there were only 100 images in each category. To obtain a high-accuracy training model, it was necessary to increase the sample numbers of airports and of other scenes that are easily confused with airports. These scenes include industrial areas, residential areas, harbors, overpasses, parking lots, highways, runways, and storage tanks. Another 300 samples of airport images and 700 samples of other scene categories were added to the dataset. The number of scene categories remained 21. The sample number could be expanded to eight times the original by image transformation. Finally, the total number of samples was 24,800. Some added images are shown in Figure 5, where (a) depicts an airport scene and (b) depicts scenes that are easily confused with airports.

The second dataset was an aircraft sample library of remote sensing images, which was used to train the Faster R-CNN network model. It contained 2511 original aircraft images, some of which are shown in Figure 6. The samples contained not only all aircraft types but also as much background as possible. The diversity of the samples could help the network

Journal of Sensors 7


Figure 5: Sample images of different scenes. (a) Airport scene, including a parking apron, runway, and general terminal. (b) Scenes easily confused with an airport, including industrial areas, residential areas, harbors, overpasses, parking lots, highways, runways, and storage tanks.


Figure 6: Training sample dataset of aircraft in remote sensing images. (a) Original training sample images. (b) Corresponding labeled ground-truth images.

learn different features and better distinguish the target from the background. As shown in Figure 6(a), the samples cover almost all types of aircraft found in existing remote sensing images, including airliners, small planes, military aircraft, and helicopters.

We obtained the images from the zenith point to the nadir point. Thus, owing to this unique imaging perspective, the aircraft in the remote sensing images showed a fixed geometric form; we could view only the top of the aircraft. Therefore, the direction and displacement of the aircraft were two important factors that affected the expression of the target. To ensure that the target had a higher recognition rate in different directions and displacements, we applied transformations to the samples, such as a horizontal flip, vertical flip, and rotations of 90°, 180°, and 270°. These transformations not only increased the number of samples but also adjusted the direction and structure of the target, increasing the sample diversity and making the model more robust. After transformation, the sample set contained a total of 20088 images. These images were divided into three groups of different sizes: Plane_S contained 2511 images, Plane_M contained 10044 images, and Plane_L contained 20088 images.
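The eightfold expansion described above (flips plus rotations) corresponds to the eight symmetries of a square. A minimal sketch, using NumPy and a function name of our own choosing, is:

```python
import numpy as np

def augment_eightfold(img: np.ndarray) -> list[np.ndarray]:
    """Return the 8 dihedral variants of a sample: rotations of 0, 90, 180,
    and 270 degrees, each with and without a horizontal flip."""
    variants = []
    for k in range(4):                   # k quarter-turns
        rot = np.rot90(img, k)
        variants.append(rot)
        variants.append(np.fliplr(rot))  # horizontally flipped copy
    return variants

sample = np.arange(6).reshape(2, 3)      # stand-in for an aircraft image chip
aug = augment_eightfold(sample)
assert len(aug) == 8                     # 2511 originals -> 2511 * 8 = 20088
```

With the 2511 original images, this yields the 20088 images of Plane_L; Plane_M (10044 images) corresponds to keeping four variants per original.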

Labeling the images is an important part of creating a training sample dataset; the labels are the basis for distinguishing between the aircraft and the background. The conventional sample-producing method is to create positive samples and negative samples independently. In this paper, we produce samples using a different approach: the positive and negative samples are included in the same image. The advantage of this approach is that the target is better distinguished from the background, which can improve the detection accuracy. For example, the trestle that connects the airplane and terminal is prone to false detection. If the airplane and trestle are contained in the same image, the target detection network can better distinguish their differences. As shown in Figure 6(b), the aircraft positions are manually


Table 1: Detailed information of the three airports.

Test image | Image size (pixels) | Aircraft number | Covered area (km²)
Berlin Tegel Airport | 22016 × 9728 | 35 | 19.1
Sydney International Airport | 13568 × 22272 | 78 | 26.9
Tokyo Haneda Airport | 20480 × 17920 | 94 | 32.7

Table 2: Results of transfer-learning with different training networks and different sample sizes.

Training network | Dataset | Time (min) | Per iteration (s) | Loss_bbox | Loss_cls
CaffeNet | Plane_S | 64 | 0.061 | 0.055 | 0.087
CaffeNet | Plane_M | 241 | 0.061 | 0.051 | 0.076
CaffeNet | Plane_L | 837 | 0.062 | 0.047 | 0.025
VGG16 | Plane_S | 320 | 0.446 | 0.032 | 0.084
VGG16 | Plane_M | 497 | 0.446 | 0.029 | 0.044
VGG16 | Plane_L | 976 | 0.453 | 0.019 | 0.028

marked in the images. An XML annotation file is generated, in which basic information about the image and the aircraft is automatically saved.
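The annotation step can be illustrated as follows. The paper does not give the exact XML schema; PASCAL VOC-style tags, which Faster R-CNN tooling commonly consumes, are assumed here, and the file name and box coordinates are invented:

```python
import xml.etree.ElementTree as ET

def make_annotation(filename: str, width: int, height: int, boxes) -> str:
    """Build a VOC-style XML annotation; boxes holds (xmin, ymin, xmax, ymax)
    rectangles for the hand-marked aircraft."""
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = filename
    size = ET.SubElement(root, "size")
    ET.SubElement(size, "width").text = str(width)
    ET.SubElement(size, "height").text = str(height)
    for xmin, ymin, xmax, ymax in boxes:
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = "aircraft"
        bndbox = ET.SubElement(obj, "bndbox")
        for tag, value in zip(("xmin", "ymin", "xmax", "ymax"),
                              (xmin, ymin, xmax, ymax)):
            ET.SubElement(bndbox, tag).text = str(value)
    return ET.tostring(root, encoding="unicode")

xml_str = make_annotation("plane_0001.jpg", 512, 512, [(30, 40, 120, 110)])
```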

4.2. Test Dataset. Three large-area images were employed for the test. Each image contained a complete airport, and the covered areas were 19.1 km², 26.9 km², and 32.7 km², respectively. The test images had the characteristics of a large area, a diversity of ground cover types, and complex backgrounds. In addition to the airports, there were residential areas, harbors, industrial areas, woodlands, and grasslands in the images. The types of objects in the images were very rich, and the environment was complex. Details of the three images are shown in Figure 7 and outlined in Table 1.

Aircraft and airport detection are two different research topics [39]. The focus of this paper is aircraft detection in high-resolution remote sensing images. Our objective is not to detect the airport; rather, it is to directly detect the aircraft. The test images were of three airports. Nevertheless, the experimental data did not cover only the airport area; in fact, when the images were enlarged, it was evident that they contained harbors, industrial zones, and residential areas.

5. Results and Discussion

In this paper, the full implementation was based on the open-source deep learning library Caffe [40]. The experiments were run using an Nvidia Titan X 12GD5 graphics processing unit (GPU). The network models used in the experiments were VGG16 pretrained with ImageNet.

5.1. Training Detection Model. The two VGG16 pretrained models were fine-tuned using the two sample datasets presented in Section 4.1. The evaluation training results are shown in Figure 8. As shown in Figure 8(a), the image classification accuracy quickly improves during the first 10000 iterations and reaches a steady value of 0.9 at 40000 iterations. Figure 8(b) depicts the total loss of the object detection model, which reaches 0.1 at 40000 iterations. Figures 8(c) and 8(d) show the location and classification losses, respectively, with both reaching values of less than 0.1. These excellent transfer-learning results provided a good experimental basis and were used for the next step.

The transfer-learning results are directly related to the accuracy of the subsequent aircraft detection process. Using transfer-learning, a high-accuracy aircraft detection model could be obtained with a small number of training samples and a brief training time. Here, we use CaffeNet and VGG16 as examples. The number of iterations was 40000. Four indicators were used to analyze the results: total training time (Time), single-iteration time (Per iteration), position accuracy (Loss_bbox), and classification accuracy (Loss_cls). The transfer-learning results are shown in Table 2. The results show that each combination of transfer-learning has good training results. The model with more training samples has higher accuracy than the model with fewer samples, and the complex model, VGG16, has higher accuracy than the simple model, CaffeNet.
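The mechanics of fine-tuning can be sketched with a toy update rule: transferred layers keep their pretrained weights and move slowly, while task-specific layers are given a larger per-layer multiplier (Caffe exposes this as `lr_mult`). The two-layer "model" and all numbers below are illustrative, not the authors' configuration:

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-ins for pretrained weights: a transferred layer and a new output layer.
params = {"conv": rng.normal(size=(3, 3)), "fc": rng.normal(size=(3, 2))}

def fine_tune_step(params, grads, lr_mults, base_lr=0.001):
    """One SGD step with per-layer learning-rate multipliers."""
    return {name: params[name] - base_lr * lr_mults[name] * grads[name]
            for name in params}

grads = {name: np.ones_like(w) for name, w in params.items()}
# The transferred layer barely moves; the replaced output layer adapts faster.
updated = fine_tune_step(params, grads, {"conv": 1.0, "fc": 10.0})
```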

Table 2 shows that the accuracy of target classification (Loss_cls) increases with the number of samples; Plane_L improves by approximately 60% compared to Plane_S. The position accuracy (Loss_bbox) follows the same trend, although less obviously. This result demonstrates that the method of producing samples proposed in Section 4.1 is effective. The image transformation method can increase the sample number and enhance the robustness of the samples. It can additionally improve the ability of transfer-learning and enhance the robustness and accuracy of the detection model. However, the target detection accuracy remains affected by the number of samples: when the number of samples is large, the target detection accuracy is relatively high, and the training time is correspondingly increased.

CaffeNet is a small network, whereas VGG16 is a large network whose structure is more complex than that of CaffeNet. As the complexity of the network increases, the position accuracy also increases; the large network improves by approximately 40% over the small network. Within the same network, the training time increases with the number of samples and the network complexity. The results show that transfer-learning is an effective, high-accuracy model training method, and it can


Table 3: Large-area image aircraft detection results.

Test image | True aircraft | Detected aircraft | Missing detection | False detection
Berlin Tegel Airport | 35 | 35 | 0 | 2
Sydney International Airport | 78 | 76 | 2 | 5
Tokyo Haneda Airport | 94 | 91 | 3 | 4

Table 4: Detailed evaluation criteria and comparison of the detection results.

Test image | Method | FDR (%) | MR (%) | AC (%) | ER (%)
Berlin Tegel Airport | CCNN | 5.71 | 0 | 100 | 5.71
Berlin Tegel Airport | LOGNet-C | 7.69 | 3.23 | 96.77 | 10.92
Sydney International Airport | CCNN | 6.58 | 2.56 | 97.44 | 9.14
Sydney International Airport | LOGNet-C | 48.54 | 10.87 | 89.13 | 59.41
Tokyo Haneda Airport | CCNN | 4.39 | 3.19 | 96.81 | 7.58
Tokyo Haneda Airport | LOGNet-C | 20.80 | 1.54 | 98.46 | 22.34

realize the task of training a high-accuracy target detection model with fewer samples.

5.2. Aircraft Detection. The three large-area test images were used to verify the aircraft detection accuracy. The results are shown in Figure 9. Based on the global and local details in the figure, the overall detection result is good. Most of the aircraft in the airport areas were detected. The detection-missed aircraft were small aircraft; there was no missed detection of large aircraft. Moreover, there were no falsely detected aircraft in industrial areas, residential areas, woodlands, grasslands, or other areas beyond the airports. However, there were individual falsely detected targets within the airports, primarily airport trestles and airport terminals. The quantitative indicators are shown in Table 3, in which "True" indicates the number of real aircraft in the original image, "Detected" indicates the number of aircraft detected, "Missing" is the number of detection-missed aircraft, and "False" is the number of falsely detected aircraft.

The two-level CCNN aircraft detection structure could rapidly process large-area remote sensing images with high accuracy; however, commission and omission errors remained. In a complex environment, many objects have features similar to those of aircraft, especially long-shaped buildings and facilities. As listed in Table 3, the false detection rate of each airport is higher than the missing detection rate. Through analysis, we determined that the detection-missed aircraft were not actually lost. In the detection procedure, the classification threshold was set to 0.6. When the classification score of a target was less than 0.6, the target was not marked; the missing detections then appeared. When the threshold was lowered to 0.5, the detection-missed targets were marked, and the detection miss rate was zero. Nevertheless, there were more falsely detected targets: airport terminals and several buildings had features similar to those of aircraft, and these were considered aircraft and marked. When the threshold was raised, the false detection rate decreased while the detection miss rate increased; when the threshold was lowered, the false detection rate increased while the detection miss rate decreased. According to the needs of the detection task, it is a better choice to set a suitable threshold and to balance the false detection rate and detection miss rate.
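The threshold trade-off described above can be made concrete with a toy score list (scores and ground-truth flags invented for illustration):

```python
# Each candidate detection carries a classification score; is_aircraft is
# the (hypothetical) ground truth.
detections = [
    {"score": 0.92, "is_aircraft": True},
    {"score": 0.55, "is_aircraft": True},    # small aircraft, low score
    {"score": 0.58, "is_aircraft": False},   # trestle with an aircraft-like shape
    {"score": 0.75, "is_aircraft": True},
]

def count_errors(dets, threshold):
    """Return (false detections, missed detections) at a given threshold."""
    kept = [d for d in dets if d["score"] >= threshold]
    false_det = sum(not d["is_aircraft"] for d in kept)
    missed = (sum(d["is_aircraft"] for d in dets)
              - sum(d["is_aircraft"] for d in kept))
    return false_det, missed

# Raising the threshold trades false detections for missed detections.
assert count_errors(detections, 0.6) == (0, 1)   # misses the 0.55 aircraft
assert count_errors(detections, 0.5) == (1, 0)   # marks everything, one false hit
```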

The experimental results in Table 3 show that the proposed CCNN method is feasible. A high-accuracy aircraft detection model can be obtained by using the transfer-learning method to fine-tune a pretrained model. The GFC method greatly reduces the number of region proposals to directly improve the detection efficiency. The two-level cascade network architecture can detect aircraft in large-area remote sensing images with high accuracy and high efficiency. However, some shortcomings remain. A detailed analysis of transfer-learning and detection accuracy is outlined below.

The aircraft detection method performance was evaluated by four criteria: falsely detected ratio (FDR), miss ratio (MR), accuracy (AC), and error ratio (ER). Their meanings are given in (6). A remote sensing image aircraft detection method named LOGNet-C, based on weakly supervised learning, was proposed in [20]. The results of the CCNN method were compared with those of the LOGNet-C method. The four evaluation indicators and the comparison results are shown in Table 4.

FDR = (Number of falsely detected aircraft / Number of detected aircraft) × 100%,

MR = (Number of missing aircraft / Number of aircraft) × 100%,

AC = (Number of detected aircraft / Number of aircraft) × 100%,

ER = FDR + MR.  (6)
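Applied to the Berlin Tegel row of Table 3 (35 aircraft, 35 detected, 0 missed, 2 false detections), the criteria in (6) reproduce the CCNN values reported in Table 4:

```python
def evaluate(n_aircraft, n_detected, n_missed, n_false):
    """Compute FDR, MR, AC, and ER (all in percent) as defined in (6)."""
    fdr = n_false / n_detected * 100     # falsely detected ratio
    mr = n_missed / n_aircraft * 100     # miss ratio
    ac = n_detected / n_aircraft * 100   # accuracy
    return fdr, mr, ac, fdr + mr         # ER = FDR + MR

fdr, mr, ac, er = evaluate(35, 35, 0, 2)   # Berlin Tegel Airport
assert round(fdr, 2) == 5.71 and mr == 0.0 and ac == 100.0
```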

The comparison results show that the false detection rate of the CCNN method is, on average, 6.4% lower than that of the LOGNet-C method, and the missing detection rate decreases by an average of 2.31%. Overall, the detection accuracy increases by an average of 3.66%.

The test results showed that small aircraft are more prone to missed detection, while long-shaped buildings and facilities are more prone to false detection, as shown in Figure 10. This phenomenon is related to the choice of training



Figure 7: Test remote sensing images for aircraft detection. (a) Berlin Tegel Airport. (b) Sydney International Airport. (c) Tokyo Haneda Airport.

samples. The small aircraft in the sample set are only a minority; in the transfer-learning stage, the detection model did not learn enough features of small aircraft. Long-shaped buildings were detected as aircraft for two reasons. The first is that these buildings had contour features similar to those of aircraft. The second is that there were an inadequate number of similar buildings in the negative samples; consequently, the features contained in the negative samples were insufficient. When



Figure 8: Accuracy and loss changes of the two models during the fine-tuning process for image classification and object detection. (a) Accuracy of the image classification model. (b) Total loss of the object detection model, which was the weighted sum of loss_cls and loss_bbox. (c) Location loss of the object detection model. (d) Classification loss of the object detection model.

the number of training samples of these types was increased and the model was retrained, the target detection accuracy improved. The analysis shows that when producing the training sample dataset, it should not only contain all types of targets, but the number of each target type should also be balanced. The negative samples should contain as much background information as possible, especially for objects that have a texture, color, geometry, or other features similar to those of aircraft.

All false detections occurred within the airports. Building-intensive areas were prone to false detection; however, no such detections appeared there. This was due to the contribution of the first-level cascade network, which filtered out nontarget areas and did not feed them into the aircraft detection network. Missed detections primarily occurred in the airport interior, and no aircraft were filtered out by the first-level network. The main cause of the missed detections was that the classification score was not high enough, because the network did not extract an adequate number of features.

6. Conclusions

In the framework of deep learning, a rapid large-area remote sensing image aircraft detection method, CCNN, was proposed. Using the transfer-learning method, the parameters of an existing target detection model were fine-tuned with a small number of samples, and a high-accuracy aircraft detection model was obtained. Based on the geometric feature invariance of aircraft in remote sensing images, a region proposal processing algorithm, GFC, was proposed to improve the efficiency. Based on the above methods, a cascade network architecture was designed for aircraft detection in large-area remote sensing images. This method not only can directly handle large-area remote sensing images for



Figure 9: Aircraft detection results for the three test images, including the global images and locally enlarged images. Detection results for (a) Berlin Tegel Airport, (b) Sydney International Airport, and (c) Tokyo Haneda Airport.


Figure 10: Commission and omission errors. The purple rectangles mark correctly detected aircraft, the green rectangles mark falsely detected aircraft, and aircraft without a rectangle are missed detections. (a) and (b) show falsely detected targets, predominantly long-shaped buildings and facilities; (c) and (d) show detection-missed aircraft, predominantly small aircraft.


aircraft detection, but it also overcomes the difficulty of training a high-accuracy model with a small number of samples.

The experimental results showed that the detection accuracy increased by an average of 3.66%, the false detection rate decreased by an average of 6.4%, and the missed detection rate decreased by an average of 2.31%. However, some of the algorithms warrant improvement. The target detection threshold is set empirically and has some limitations. In future work, we intend to address the needs of the detection task by making the system set the threshold value automatically. In the present work, we focused on aircraft detection. If the targets are extended to other typical targets, such as ships, tanks, and vehicles, this method will have a larger application scope.

The failures in target detection in this experiment may have occurred for the following reasons. First, the number of small aircraft in our samples was insufficient, and the sample numbers of the various aircraft types were not balanced, which resulted in inadequate learning of the features of small aircraft during training. Second, the choice of target threshold must be appropriate. When detection is nearly complete, each region proposal is scored; when this score is greater than the threshold, the target is marked and displayed, and when the score is less than the threshold, the target is not marked and is instead hidden as background. There is a close relationship between the threshold value and the detection accuracy, especially the false positives and false negatives: when the threshold value is too large, false positives decrease but false negatives increase, and when it is too small, the opposite occurs. Currently, we set our threshold value empirically and plan to adopt an adaptive threshold method in the future.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

The authors would like to thank Dr. Changjian Qiao for his excellent technical support. Without his instructive guidance, impressive kindness, and patience, they could not have completed this work. They would also like to thank Dr. Huijin Yang and Yuan Zhao for critically reviewing the manuscript.

References

[1] G. Cheng and J. Han, "A survey on object detection in optical remote sensing images," ISPRS Journal of Photogrammetry and Remote Sensing, vol. 117, pp. 11–28, 2016.

[2] Y. Li, X. Sun, H. Wang, H. Sun, and X. Li, "Automatic target detection in high-resolution remote sensing images using a contour-based spatial model," IEEE Geoscience and Remote Sensing Letters, vol. 9, no. 5, pp. 886–890, 2012.

[3] X. Bai, H. Zhang, and J. Zhou, "VHR object detection based on structural feature extraction and query expansion," IEEE Transactions on Geoscience and Remote Sensing, vol. 52, no. 10, pp. 6508–6520, 2014.

[4] L. Gao, B. Yang, Q. Du, and B. Zhang, "Adjusted spectral matched filter for target detection in hyperspectral imagery," Remote Sensing, vol. 7, no. 6, pp. 6611–6634, 2015.

[5] E. Fakiris, G. Papatheodorou, M. Geraga, and G. Ferentinos, "An automatic target detection algorithm for swath sonar backscatter imagery using image texture and independent component analysis," Remote Sensing, vol. 8, no. 5, article 373, 2016.

[6] W. Zhang, X. Sun, K. Fu, C. Wang, and H. Wang, "Object detection in high-resolution remote sensing images using rotation invariant parts based model," IEEE Geoscience and Remote Sensing Letters, vol. 11, no. 1, pp. 74–78, 2014.

[7] H. Sun, X. Sun, H. Wang, Y. Li, and X. Li, "Automatic target detection in high-resolution remote sensing images using spatial sparse coding bag-of-words model," IEEE Geoscience and Remote Sensing Letters, vol. 9, no. 1, pp. 109–113, 2012.

[8] X. Sun, H. Wang, and K. Fu, "Automatic detection of geospatial objects using taxonomic semantics," IEEE Geoscience and Remote Sensing Letters, vol. 7, no. 1, pp. 23–27, 2010.

[9] R. M. Willett, M. F. Duarte, M. A. Davenport, and R. G. Baraniuk, "Sparsity and structure in hyperspectral imaging: sensing, reconstruction, and target detection," IEEE Signal Processing Magazine, vol. 31, no. 1, pp. 116–126, 2014.

[10] F. Gao, Q. Xu, and B. Li, "Aircraft detection from VHR images based on circle-frequency filter and multilevel features," The Scientific World Journal, vol. 2013, Article ID 917928, 7 pages, 2013.

[11] I. Sevo and A. Avramovic, "Convolutional neural network based automatic object detection on aerial images," IEEE Geoscience and Remote Sensing Letters, vol. 13, no. 5, pp. 740–744, 2016.

[12] L. Zhang, G.-S. Xia, T. Wu, L. Lin, and X. C. Tai, "Deep learning for remote sensing image understanding," Journal of Sensors, vol. 2016, Article ID 7954154, 2 pages, 2016.

[13] R. Girshick, J. Donahue, T. Darrell, and J. Malik, "Region-based convolutional networks for accurate object detection and segmentation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, no. 1, pp. 142–158, 2016.

[14] E. Pasolli, F. Melgani, N. Alajlan, and N. Conci, "Optical image classification: a ground-truth design framework," IEEE Transactions on Geoscience and Remote Sensing, vol. 51, no. 6, pp. 3580–3597, 2013.

[15] R. Girshick, "Fast R-CNN," in Proceedings of the 15th IEEE International Conference on Computer Vision (ICCV '15), pp. 1440–1448, Santiago, Chile, December 2015.

[16] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: towards real-time object detection with region proposal networks," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 6, pp. 1137–1149, 2017.

[17] W. Liao, A. Pizurica, P. Scheunders, W. Philips, and Y. Pi, "Semisupervised local discriminant analysis for feature extraction in hyperspectral images," IEEE Transactions on Geoscience and Remote Sensing, vol. 51, no. 1, pp. 184–198, 2013.

[18] D. Zhang, J. Han, G. Cheng, Z. Liu, S. Bu, and L. Guo, "Weakly supervised learning for target detection in remote sensing images," IEEE Geoscience and Remote Sensing Letters, vol. 12, no. 4, pp. 701–705, 2014.

[19] T. Deselaers, B. Alexe, and V. Ferrari, "Weakly supervised localization and learning with generic knowledge," International Journal of Computer Vision, vol. 100, no. 3, pp. 275–293, 2012.

[20] F. Zhang, B. Du, L. Zhang, and M. Xu, "Weakly supervised learning based on coupled convolutional neural networks for aircraft detection," IEEE Transactions on Geoscience and Remote Sensing, vol. 54, no. 9, pp. 5553–5563, 2016.

[21] J. Han, D. Zhang, G. Cheng, L. Guo, and J. Ren, "Object detection in optical remote sensing images based on weakly supervised learning and high-level feature learning," IEEE Transactions on Geoscience and Remote Sensing, vol. 53, no. 6, pp. 3325–3337, 2015.

[22] X. Chen, S. Xiang, C.-L. Liu, and C.-H. Pan, "Vehicle detection in satellite images by hybrid deep convolutional neural networks," IEEE Geoscience and Remote Sensing Letters, vol. 11, no. 10, pp. 1797–1801, 2014.

[23] G. Cheng, J. Han, and X. Lu, "Remote sensing image scene classification: benchmark and state of the art," Proceedings of the IEEE, vol. 99, pp. 1–19, 2017.

[24] S. Paisitkriangkrai, J. Sherrah, P. Janney, and A. Van-Den Hengel, "Effective semantic pixel labelling with convolutional networks and conditional random fields," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW '15), pp. 36–43, June 2015.

[25] O. A. B. Penatti, K. Nogueira, and J. A. Dos Santos, "Do deep features generalize from everyday objects to remote sensing and aerial scenes domains?" in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW '15), pp. 44–51, June 2015.

[26] X. Yao, J. Han, G. Cheng, X. Qian, and L. Guo, "Semantic annotation of high-resolution satellite images via weakly supervised learning," IEEE Transactions on Geoscience and Remote Sensing, vol. 54, no. 6, pp. 3660–3671, 2016.

[27] C. Tao, Y. Tan, H. Cai, and J. Tian, "Airport detection from large IKONOS images using clustered SIFT keypoints and region information," IEEE Geoscience and Remote Sensing Letters, vol. 8, no. 1, pp. 128–132, 2011.

[28] O. Aytekin, U. Zongur, and U. Halici, "Texture-based airport runway detection," IEEE Geoscience and Remote Sensing Letters, vol. 10, no. 3, pp. 471–475, 2013.

[29] X. Yao, J. Han, L. Guo, S. Bu, and Z. Liu, "A coarse-to-fine model for airport detection from remote sensing images using target-oriented visual saliency and CRF," Neurocomputing, vol. 164, pp. 162–172, 2015.

[30] G. Cheng, P. Zhou, and J. Han, "Learning rotation-invariant convolutional neural networks for object detection in VHR optical remote sensing images," IEEE Transactions on Geoscience and Remote Sensing, vol. 54, no. 12, pp. 7405–7415, 2016.

[31] X. Yao, J. Han, D. Zhang, and F. Nie, "Revisiting co-saliency detection: a novel approach based on two-stage multi-view spectral rotation co-clustering," IEEE Transactions on Image Processing, vol. 26, no. 7, pp. 3196–3209, 2017.

[32] L. Cao, C. Wang, and J. Li, "Vehicle detection from highway satellite images via transfer learning," Information Sciences, vol. 366, pp. 177–187, 2016.

[33] S. Gupta, S. Rana, B. Saha, D. Phung, and S. Venkatesh, "A new transfer learning framework with application to model-agnostic multi-task learning," Knowledge and Information Systems, vol. 49, no. 3, pp. 933–973, 2016.

[34] A. Van Opbroek, M. A. Ikram, M. W. Vernooij, and M. De Bruijne, "Transfer learning improves supervised image segmentation across imaging protocols," IEEE Transactions on Medical Imaging, vol. 34, no. 5, pp. 1018–1030, 2015.

[35] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," Computer Science, 2014.

[36] W. Zhao, Z. Guo, J. Yue, X. Zhang, and L. Luo, "On combining multiscale deep learning features for the classification of hyperspectral remote sensing imagery," International Journal of Remote Sensing, vol. 36, no. 13, pp. 3368–3379, 2015.

[37] Q. Zou, L. Ni, T. Zhang, and Q. Wang, "Deep learning based feature selection for remote sensing scene classification," IEEE Geoscience and Remote Sensing Letters, vol. 12, no. 11, pp. 2321–2325, 2015.

[38] Y. Yang and S. Newsam, "Bag-of-visual-words and spatial extensions for land-use classification," in Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems (GIS '10), pp. 270–279, ACM, San Jose, Calif, USA, November 2010.

[39] X. Wang, Q. Lv, B. Wang, and L. Zhang, "Airport detection in remote sensing images: a method based on saliency map," Cognitive Neurodynamics, vol. 7, no. 2, pp. 143–154, 2013.

[40] Y. Jia, E. Shelhamer, J. Donahue et al., "Caffe: convolutional architecture for fast feature embedding," in Proceedings of the ACM International Conference on Multimedia, pp. 675–678, ACM, Orlando, Fla, USA, November 2014.




processing a large-area image, many invalid region proposals must be examined, which is inefficient. Some target detection methods for remote sensing images based on deep learning have been presented. Weakly supervised and semisupervised feature learning methods have been proposed [17–19]. Zhang et al. proposed an aircraft detection method based on weakly supervised learning coupled with a CNN [20]. Han et al. proposed an iterative detector approach [21], by which the detector is iteratively trained using refined annotations until the model converges. Nevertheless, a nonconvergence situation may occur, which reduces the detection accuracy. Weakly supervised learning extracts features from a small amount of sample data; however, owing to the lack of samples, the features are not sufficient, and the detection accuracy is therefore limited.

There are three key problems in aircraft detection from remote sensing images by deep learning methods. First, the number of training samples is limited; a means of using a small number of samples to train a high-accuracy model is an important point. Second, aircraft in remote sensing images have obvious features, and some stable features can be selected as constraint conditions; combining these features with deep learning gives the detection method a better targeting ability. Third, a remote sensing image covers a large area, its scale is not uniform, and the sensors are miscellaneous. Existing deep learning models and network structures are not directly suitable for remote sensing image aircraft detection [22]; the network structure and detection algorithm must be modified to improve the efficiency and accuracy.

In view of the above problems, a cascade CNN (CCNN) architecture is proposed to improve the accuracy and efficiency. It contains four main parts. (1) A two-level CCNN is designed to rapidly process remote sensing image aircraft detection. The first-level network quickly classifies the scene and eliminates the areas that do not contain aircraft; the second-level network identifies and locates aircraft in the areas not filtered out in the previous step. (2) A transfer-learning method is used to fine-tune the parameters of pretrained classification and object detection models with the samples. (3) A region proposal filtering method with a geometric feature constraint (GFC), based on the geometric features of aircraft, is proposed; by using these features to filter the region proposals, numerous nonaircraft region proposals are eliminated. (4) An aircraft sample library and a remote sensing image-scene sample library are established; they are used as the data source for transfer-learning.
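The four parts above combine into a simple control flow. A minimal sketch, where `classify_scene`, `detect_aircraft`, and `passes_gfc` are placeholders for the trained first-level classifier, the fine-tuned Faster R-CNN detector, and the GFC filter:

```python
def cascade_detect(blocks, classify_scene, detect_aircraft, passes_gfc):
    """Two-level cascade: filter scene blocks first, then detect and apply GFC."""
    results = []
    for block in blocks:                         # tiles cut from the large image
        if classify_scene(block) != "airport":
            continue                             # level 1: discard aircraft-free areas
        for proposal in detect_aircraft(block):  # level 2: locate candidates
            if passes_gfc(proposal):             # keep geometrically plausible boxes
                results.append(proposal)
    return results

# Stub run with invented tiles, detections, and a rough aspect-ratio constraint.
found = cascade_detect(
    ["airport_tile", "forest_tile"],
    classify_scene=lambda b: "airport" if b.startswith("airport") else "other",
    detect_aircraft=lambda b: [{"w": 30, "h": 28}, {"w": 90, "h": 12}],
    passes_gfc=lambda p: 0.5 < p["w"] / p["h"] < 2.0,
)
assert len(found) == 1
```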

2. Related Work

In this section, we review related work that has utilized deep learning for scene classification and target detection. Remote sensing scene classification plays a key role in a wide range of applications and has received remarkable attention from researchers. Significant efforts have been made to develop varied datasets and to present a variety of approaches to scene classification from remote sensing images. However, there has yet to be a systematic review of the literature concerning datasets and scene classification methods. In addition, almost all existing datasets have several limitations, including the small scale of scene classes and image numbers, the lack of image variation and diversity, and accuracy saturation. In response to this problem, Cheng et al. [23] proposed a large-scale dataset called "NWPU-RESISC45", which is a publicly available benchmark for remote sensing image scene classification (RESISC). This dataset contains 31500 images covering 45 scene classes, with 700 images in each class, which provides strong support for RESISC. Paisitkriangkrai et al. [24] designed a multiresolution convolutional neural network (CNN) model for the semantic labeling of aerial imagery. To avoid overfitting when training a CNN with limited data, [25] investigated the use of a CNN model pretrained on general computer vision datasets for classifying remote sensing scenes; however, the CNN model in [25] was not further fine-tuned and was directly used for land-use classification. Yao et al. [26] proposed a unified annotation framework combining discriminative high-level feature learning and weakly supervised feature transfer. Specifically, they first employ an efficient stacked discriminative sparse autoencoder (SDSAE) to learn high-level features on an auxiliary satellite image dataset for a land-use classification task.

Another way to improve the efficiency of aircraft detection is to first detect the airport range and then detect the aircraft within the airport. Airport detection from remote sensing images has gained increasing research attention [21, 27, 28] in recent years due to its strategic importance for both civil and military applications; however, it is a challenging problem because of variations in airport appearance, the presence of complex cluttered backgrounds, and variations in satellite image resolution. Yao et al. [29] proposed a hierarchical detection model with a coarse and a fine layer. At the coarse layer, a target-oriented saliency model is built by combining contrast and line density cues to rapidly localize airport candidate areas. At the fine layer, a learned conditional random field (CRF) model is applied to each candidate area to perform fine detection of the airport target. CNNs have demonstrated their strong feature representation power for computer vision. Despite the progress made on natural scene images, it is problematic to directly use CNN features for object detection in optical remote sensing images because it is difficult to effectively manage the problem of object rotation variation. To address this problem, Cheng et al. [30] proposed a novel and effective approach to learning a rotation-invariant CNN (RICNN) model that advances object detection performance, achieved by introducing and learning a new rotation-invariant layer based on existing CNN architectures. Cosaliency detection, the goal of which is to discover the common and salient objects in a given image group, has received tremendous research interest in recent years. However, most existing cosaliency detection methods assume that all images in an image group contain cosalient objects of only one category and are thus impractical for large-scale image sets obtained from the Internet. Yao et al. [31] improved cosaliency detection and advanced its development; their results outperform the state of the art.

Another way to improve the efficiency of aircraft detection is to first detect the airport range and then detect the aircraft within the airport. Airport detection from remote sensing images has gained increasing research attention [21, 27, 28] in recent years due to its strategic importance for both civil and military applications; however, it is a challenging problem because of variations in airport appearance, the presence of complex cluttered backgrounds, and variations in satellite image resolution. Yao et al. [29] proposed a hierarchical detection model with a coarse and a fine layer. At the coarse layer, a target-oriented saliency model is built by combining contrast and line density cues to rapidly localize airport candidate areas. At the fine layer, a learned conditional random field (CRF) model is applied to each candidate area to perform fine detection of the airport target. CNNs have demonstrated their strong feature representation power for computer vision. Despite the progress made on natural scene images, it is problematic to directly use CNN features for object detection in optical remote sensing images because it is difficult to effectively manage variations in object rotation. To address this problem, Cheng et al. [30] proposed a novel and effective approach to learning a rotation-invariant CNN (RICNN) model that advances object detection performance, achieved by introducing and learning a new rotation-invariant layer based on existing CNN architectures. Cosaliency detection, the goal of which is to discover the common and salient objects in a given image group, has received tremendous research interest in recent years. However, most existing cosaliency detection methods assume that all images in an image group should contain cosalient objects in only one category and are thus impractical for large-scale image sets obtained from the Internet. Yao et al. [31] improved cosaliency detection and advanced its development; their results outperform state-of-the-art cosaliency detection methods performed on manually separated image subgroups.

Journal of Sensors 3


Figure 1: Architecture of the two-level CCNN for large-area remote sensing rapid aircraft detection. The target detection process consists of two parts. First, the image is downsampled. The first-level CNN structure is used to classify the scene with a sliding window on the downsampled image block-by-block. The nontarget window image is excluded, and the index position of the target window is reserved and passed to the next level of the network. The second level receives the target window and performs aircraft target detection on the original image. An RPN is used to get region proposals, and the GFC model is used to filter multiple region proposals and exclude the region proposals that do not satisfy the geometric characteristics of aircraft. Then, the Faster R-CNN classification model is used to classify the remaining region proposals to generate the target area. Finally, using overlap constraints, the redundant target areas are deleted to get the final detection results.

3. Methods

3.1. Cascade Network for Aircraft Detection. In general, the image used for aircraft detection includes the entire airport and surrounding areas; most image areas do not contain aircraft. This situation reduces the detection accuracy and efficiency. We propose a two-level CCNN method that combines the trained models and the GFC method to solve this problem. As shown in Figure 1, the first level is a coarse filter. A new image with lower resolution is obtained through twofold downsampling of the original image. The new image is traversed using a sliding window, and the CNN image classifier is used to classify each sliding window. Most of the negative areas are filtered out, and the positive areas are fed into the second-level network for aircraft detection.
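The coarse-filtering pass above can be sketched as follows; `classify_window` is a hypothetical stand-in for the fine-tuned CNN scene classifier, and the step size of 200 is an illustrative choice, not a value taken from the paper:

```python
# Sketch of the first-level coarse filter: slide a 400x400 window over the
# 2x-downsampled image and keep only windows the scene classifier marks as
# containing aircraft scenes. Kept windows are mapped back to the original
# resolution (factor 2), giving the 800x800 regions used by the second level.

def coarse_filter(image_h, image_w, classify_window, win=400, step=200):
    """Return (x, y, size) of positive windows in original-image coordinates."""
    kept = []
    for y in range(0, max(image_h - win, 0) + 1, step):
        for x in range(0, max(image_w - win, 0) + 1, step):
            if classify_window(x, y, win):       # CNN scene classifier stub
                kept.append((2 * x, 2 * y, 2 * win))
    return kept
```

The factor of two in the returned coordinates reflects the twofold downsampling described above: a 400 × 400 window on the downsampled image corresponds to an 800 × 800 region on the original.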

The second-level network is a target detector. In this level, Faster R-CNN is used to perform aircraft detection for the respective area. The output of the first-level network is the input of the second-level network. The corresponding region on the original image is sent to the second network, and the window size is 800 × 800 pixels. The region proposals are generated by the region proposal network (RPN). The main difference between Faster R-CNN and Fast R-CNN is the generation of the region proposals: the former uses the efficient RPN method, whereas the latter uses the less efficient selective search (SS) method. In the RPN, each image produces approximately 20,000 region proposals of three ratios and three scales. Then, according to the region proposal classification score, the top 300 region proposals are selected as the Faster R-CNN input for target detection. More than 95% of the 20,000 region proposals are background. Scoring and sorting such a large number of region proposals is obviously time-intensive. The GFC is used to delete the invalid region proposals. The final detection for the remaining region proposals is performed by the aircraft detector.

The general sliding window method uses a small window size, such as 12 × 12, 24 × 24, or 48 × 48. Small windows and small steps require considerable time to traverse the entire image, which reduces detection efficiency. In our experiment, we used a large sliding window. In the first level, the window size was 400 × 400, which was suitable for image classification. According to the training samples, the downsampled images had a higher classification accuracy; moreover, the speed of traversing the entire image was improved. In the second level, the window size was 800 × 800. The range of this window was equal to the downsampling range. It required almost the same amount of time to detect a 400 × 400 image and an 800 × 800 image; thus, it was not necessary to decompose the original image into four smaller 400 × 400 images. The window size and sliding step size were determined by the image resolution. They had to satisfy the condition that at least one window could contain a complete maximum-size aircraft.

Because of the overlap between adjacent windows, some aircraft may be detected several times. The repetitive detection areas should be optimized: the redundant objects are deleted and only one optimal result is retained. The regions for the same aircraft must meet the following conditions. (1) Every region should satisfy the GFC; that is, the aspect ratio, width, and height must meet the thresholds of (r1, r2), (w1, w2), and (h1, h2), respectively. (2) The overlap between two regions should be larger than 30%, as defined in (1). (3) The area is the largest, as defined in (2), in which area(r_i) means the area of the given region. The process is



Figure 2: Process of deleting duplicate detected regions and overlapping regions between sliding windows. (a) Result before deleting the duplicates. (b) Result after deleting the duplicates.


Figure 3: Structure of transfer-learning for scene classification and target detection. Source tasks are the original deep learning methods of classification and detection, and the knowledge base is comprised of the trained networks. The learning system involves transfer-learning and fine-tuning, and the target tasks are remote sensing image scene classification and aircraft detection.

shown in Figure 2, in which the two aircraft at the upper left are in overlapping windows. The incomplete regions are deleted, and the most appropriate one is preserved.

area(r_i ∩ r_j) ≥ 30% (1)

Region = max area(r_i) (2)
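A minimal sketch of this duplicate-removal rule, assuming boxes are given as (x1, y1, x2, y2) tuples; the choice of the smaller region's area as the denominator of the overlap ratio is our assumption, since the text only states that the overlap should exceed 30%:

```python
# Sketch of duplicate-region removal between overlapping sliding windows:
# among detections of the same aircraft (pairwise overlap >= 30%), only the
# largest region is kept, per conditions (2) and (3) above.

def area(b):
    return max(0, b[2] - b[0]) * max(0, b[3] - b[1])

def overlap_ratio(a, b):
    """Intersection area relative to the smaller box (assumed denominator)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    return inter / min(area(a), area(b))

def remove_duplicates(boxes, thr=0.30):
    boxes = sorted(boxes, key=area, reverse=True)   # largest-area first
    kept = []
    for b in boxes:
        if all(overlap_ratio(b, k) < thr for k in kept):
            kept.append(b)
    return kept
```

Sorting by area first guarantees that, within each group of mutually overlapping detections, the largest region is the one retained.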

3.2. Transfer-Learning and Fine-Tuning the Model Parameters. Transfer-learning refers to a model training method in which the parameters of a pretrained network model are fine-tuned with a small number of samples to prompt the original model to adapt to a new task [32, 33]. The deep learning object detection method can achieve very high detection accuracy. One reason is that a large number of samples are used to train the detection model, which therefore contains a sufficient number of target features. However, for remote sensing images, no large sample library such as ImageNet exists that can be used to train a high-accuracy model. Moreover, training a new high-accuracy object detection model consumes considerable time and computing resources. Transfer-learning can solve the problem of lacking samples and significantly shorten the training time [34].

In this paper, the transfer-learning method is respectively used to fine-tune the pretrained image classification model and object detection model. As shown in Figure 3, the source models are the respective classification and object detection models. Both include the VGG16 network structure [35] and were pretrained with ImageNet. They were then fine-tuned with the samples given in Section 4.1. The parameters of the original models were adjusted by the learning system to complete the tasks of scene classification and aircraft detection.

The CNN structure comprises many layers. The bottom layers are used to extract generic features such as structure, texture, and color. The top layers contain specific features associated with the target detection task. Using transfer-learning, the top-layer parameters are adjusted with a small number of aircraft samples to change the target detection task. Through a backpropagation algorithm, the bottom-layer parameters are adjusted to extract aircraft features. In general, we use softmax as the supervisory loss function, which is a classic cross-entropy classifier. It is defined in (3), where f(x_i) is the activation of the previous layer node pushed through a softmax function. In (4), the target output y_i is a 1-of-K vector, and K is the number of outputs. In addition, L is the number of layers, and λ is a regularization term. The goal is to minimize the loss. Finally, stochastic gradient descent is used to fine-tune the model, and backpropagation is used to optimize the function:

f(x_i) = softmax(x_i) = e^{x_i} / Σ_{j=1}^{N} e^{x_j} (3)

loss = −Σ_{i=1}^{m} y_i log f(x_i) + λ Σ_{l=1}^{L} Σ ‖w_l‖² (4)
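The loss in (3) and (4) can be illustrated numerically; this is a plain NumPy sketch for clarity, not the Caffe implementation used in the paper:

```python
# Numerical sketch of (3) and (4): softmax over the previous layer's
# activations, cross-entropy against a one-hot target, plus an L2 weight
# penalty weighted by lambda.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())            # shift for numerical stability
    return e / e.sum()

def loss(x, y, weights, lam=1e-4):
    """x: activations, y: one-hot target vector, weights: list of arrays."""
    f = softmax(x)
    cross_entropy = -np.sum(y * np.log(f))
    l2 = lam * sum(np.sum(w ** 2) for w in weights)
    return cross_entropy + l2
```

For two equal activations the softmax output is uniform, so the cross-entropy against a one-hot target equals log 2, matching the definition in (4) with the regularization term set to zero.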

Another important parameter is the learning rate. The parameters of the pretrained model are already fine; thus, the network layer learning rate should be set to a small number. A small learning rate ensures both that the features of the new samples can be learned and that the excellent parameters of the pretrained model can be maintained. Taking Faster R-CNN as an example, the parameters that must be set are the following. (1) Initialize the training network and select the pretrained model: the "train_net" is set to VGG16.v2.caffemodel. (2) Set the number of categories. In this paper, all aircraft types are considered the same type; thus, there are only two categories: aircraft and background. In the data layer, num_classes is set to two; in the cls_score layer, num_output is set to two; and in the bbox_pred layer, num_output is set to eight. (3) Set the learning rate: the base learning rate base_lr is set to 0.001. The learning rate lr_mult of the cls_score layer and the bbox_pred layer is set to five and ten, respectively. The learning rates of the other convolutional layers are not modified. (4) Set the number of iterations of the backpropagation function: max_iters is set to 40,000.
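The settings above can be collected as plain data for reference; the dictionary structure below is our own, though the parameter names follow the Caffe/Faster R-CNN conventions named in the text, and bbox_pred regresses four box coordinates per class:

```python
# Summary of the fine-tuning settings described above (values from the text;
# the dictionary layout itself is illustrative, not a Caffe file format).

NUM_CLASSES = 2                         # aircraft + background

finetune_cfg = {
    "pretrained_weights": "VGG16.v2.caffemodel",
    "base_lr": 0.001,
    "lr_mult": {"cls_score": 5, "bbox_pred": 10},   # other layers unchanged
    "num_output": {"cls_score": NUM_CLASSES,
                   "bbox_pred": 4 * NUM_CLASSES},   # 4 coordinates per class
    "max_iters": 40000,
}
```

Note how num_output = 8 for bbox_pred follows directly from the two-class setup: four regression targets per class times two classes.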

3.3. Geometric Feature Constraint. In a remote sensing image, the aircraft covers only a small area; the remaining area is the background. The region proposals are mostly background, not aircraft. Reducing the number of region proposals can make object detection more targeted, which helps improve efficiency [16]. The special imaging method of remote sensing determines that only the top view of the aircraft can be seen in the image [36]; thus, the geometric features of aircraft in the image have high stability [37]. This property can be exploited when designing a detection method. When the resolution of the image is fixed, the geometric parameters of each aircraft are constant. Therefore, the geometric features can be used as important conditions for aircraft detection.

The geometric features mentioned in this paper mainly refer to the external contour sizes of aircraft in remote sensing images. It is possible to use geometric feature constraints for detection because the following three conditions are met. First, conventional remote sensing imaging employs nadir images; that is, the sensor acquires an image from the sky perpendicular (or near-perpendicular) to the ground. Thus, the top contour features of aircraft are seen in these remote sensing images, and these features (length, width, and aspect ratio) are relatively stable constants. Although the length and width are affected by image resolution, they can be calculated from the resolution of the image, whereas the aspect ratio is not affected by resolution and is more fixed. Second, the length, width, wingspan, and body length ratio of different types of aircraft are all within a certain range. These geometric features give us an exact basis for calculating region proposals. Third, statistical methods are used to count the geometric features of many aircraft samples. Samples have been collected from large, medium, and small airports in the world's major countries, covering almost all types of aircraft, including large passenger aircraft, fighters, small private aircraft, and rotor helicopters. These statistical results are used as a reference for the geometric feature constraints.

The image resolution and the results of the aircraft geometry statistics are used to calculate the size of the sliding window. The conditions that need to be met are as follows. The minimum overlap between the windows must be greater than the maximum size of the aircraft to ensure that no aircraft will be missed. The number of aircraft in a sliding window is very small: a window contains about 2000–3000 region proposals, but aircraft are probably present in only a few. The number of target region proposals is much smaller than the number of nontarget region proposals, and having many nontarget areas results in increased overall time consumption. With the GFC, deleting candidate boxes that do not satisfy the condition can greatly improve the time efficiency of target detection.
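The overlap condition above implies an upper bound on the sliding step; a small sketch (the helper name and its interface are ours):

```python
# If adjacent windows must overlap by more than the largest aircraft size
# (in pixels at the given resolution), then step <= window - max_aircraft,
# so that at least one window fully contains every aircraft.

def max_step(window, max_aircraft_px):
    """Largest stride that still guarantees some window fully contains
    the largest aircraft."""
    if max_aircraft_px >= window:
        raise ValueError("window must be larger than the largest aircraft")
    return window - max_aircraft_px
```

For example, with the 800 × 800 second-level window and a maximum aircraft extent of 300 pixels (the upper end of the statistical range in Figure 4), the stride may be at most 500 pixels.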

Taking the ground truth of the aircraft in the samples as an example, their lengths and widths are within a fixed range. The original 2511 sample images were used to analyze these parameters, and the results are shown in Figure 4.

Figure 4(a) shows that the aspect ratio of the circumscribed rectangle is primarily around 1.0, and the range is (0.6, 1.6). Figure 4(b) shows that the width and height of the circumscribed rectangle are mainly around 140 pixels, and the range is (50, 300). These values should be appropriately enlarged during aircraft detection to ensure that no aircraft are filtered out on account of statistical errors. The geometric features of the aircraft in the remote sensing image have rotation invariance. The aspect ratio is within a fixed range even at different resolutions; thus, it can be used as a constraint to roughly distinguish between target and background. (r1, r2), (w1, w2), and (h1, h2) represent the ranges of aspect ratio, width, and height, respectively.

For the same image, reducing the number of nontarget region proposals has no effect on the target detection accuracy (in deep learning, this accuracy means the classification score). However, it can improve the detection efficiency. Based on the GFC, a region proposal filter method is proposed. (r1, r2), (w1, w2), and (h1, h2) are set as thresholds to filter the region proposals. The main algorithm is shown in (5), where Region Proposal = 1 means the region proposal is preserved and Region Proposal = 0 means it is deleted. ratio represents the target aspect ratio, width represents the target width, height denotes the target height, and α is the image resolution coefficient, which is used to adjust the thresholds at different resolutions.

Region Proposal = 1 if r1 ≤ ratio ≤ r2; 0 if ratio < r1 or ratio > r2


Figure 4: Geometric features of all the sample aircraft. (a) Aspect ratio of each sample aircraft; the values are approximately 1.0, and the range is (0.6, 1.6). The color in Figure 4(a) represents the value of the aspect ratio: blue is the smallest value, red is the largest, and the colors in between transition gradually. (b) Size of each sample aircraft; red is the width and green is the height. The values are approximately 140 pixels, and the range is (50, 300).

Region Proposal = 1 if α·w1 ≤ width ≤ α·w2; 0 if width < α·w1 or width > α·w2

Region Proposal = 1 if α·h1 ≤ height ≤ α·h2; 0 if height < α·h1 or height > α·h2 (5)
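The filter in (5) can be sketched as follows, with the threshold ranges taken from the Figure 4 statistics and α left as a parameter; the function names are ours:

```python
# Sketch of the GFC region-proposal filter in (5): a proposal survives only
# if its aspect ratio, width, and height all fall inside the statistical
# ranges, with width/height thresholds scaled by the resolution
# coefficient alpha. Range values follow the Figure 4 statistics.

def gfc_keep(w, h, alpha=1.0,
             ratio_rng=(0.6, 1.6), w_rng=(50, 300), h_rng=(50, 300)):
    ratio = w / h
    return (ratio_rng[0] <= ratio <= ratio_rng[1]
            and alpha * w_rng[0] <= w <= alpha * w_rng[1]
            and alpha * h_rng[0] <= h <= alpha * h_rng[1])

def filter_proposals(proposals, alpha=1.0):
    """proposals: iterable of (x1, y1, x2, y2) boxes."""
    return [p for p in proposals
            if gfc_keep(p[2] - p[0], p[3] - p[1], alpha)]
```

A 140 × 140 proposal (typical aircraft size per Figure 4) passes all three tests, while a 400 × 60 box fails both the ratio and the width test and is deleted before scoring.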

4. Materials

In this study, we constructed three kinds of datasets: scene, aircraft sample, and test datasets. All data were obtained from Google Earth. The scene dataset was from the 18th level, and its spatial resolution was 1.2 m. The remaining two datasets were from the 20th level, and the spatial resolution was 0.3 m. From a professional point of view, compared with real remote sensing images, the data obtained from Google Earth contain insufficient information. However, the deep learning method differs from traditional remote sensing image processing methods in that it can obtain rich information from this image type. We advocate the use of real remote sensing images; nonetheless, one could attempt to use Google Earth images.

4.1. Training Sample Datasets. Creating training samples is a key step in object detection in the field of deep learning. The sample quality directly affects the test results. In the CCNN structure, the first-level network uses the CNN image classification model, and the second-level network uses the Faster R-CNN object detection model. The two networks have different structures and functions; therefore, we created two sample datasets, one for scene classification and the other for aircraft detection.

The first dataset was the remote sensing image scene dataset, which was used to train the CNN classification model. We used the "UC Merced Land Use Dataset" as the basic dataset [38]. It contained 21 scene categories and included many typical background categories, such as agriculture, airplanes, beaches, buildings, bushes, and parking lots. This dataset was suitable for scene classification; however, there were only 100 images in each category. To obtain a high-accuracy training model, it was necessary to increase the sample number of airports and other scenes that are easily confused with airports. These scenes include industrial areas, residential areas, harbors, overpasses, parking lots, highways, runways, and storage tanks. Another 300 samples of airport images and 700 samples of other scene categories were added to the dataset. The number of scene categories remained 21. The sample number could be expanded to eight times the original by image transformation. Finally, the total number of samples was 24,800. Some added images are shown in Figure 5, where (a) depicts an airport scene and (b) depicts scenes that are easily confused with airports.
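The stated total can be checked directly from the counts given above:

```python
# Scene-dataset arithmetic: 21 categories x 100 images in UC Merced, plus
# 300 added airport images and 700 added confusable-scene images, expanded
# 8-fold by image transformation.
base = 21 * 100
added = 300 + 700
total = (base + added) * 8
print(total)  # 24800, matching the stated sample count
```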

The second dataset was an aircraft sample library of remote sensing images. It was used to train the Faster R-CNN network model. It contained 2511 original aircraft images, some of which are shown in Figure 6. The samples contained not only all aircraft types but also as much background as possible. The diversity of the samples could help the network



Figure 5: Sample images of different scenes. (a) Airport scene, including a parking apron, runway, and general terminal. (b) Scenes easily confused with an airport, including industrial areas, residential areas, harbors, overpasses, parking lots, highways, runways, and storage tanks.


Figure 6: Training sample dataset of aircraft in remote sensing images. (a) Original training sample images. (b) Corresponding labeled ground truth images.

learn different features and better distinguish the target and background. As shown in Figure 6(a), the samples cover almost all types of aircraft in existing remote sensing images, including airliners, small planes, military aircraft, and helicopters.

We obtained the image from the zenith point to the nadir point. Thus, because of this unique imaging perspective, the aircraft in the remote sensing images showed a fixed geometric form; we could only view the top of the aircraft. Therefore, the direction and displacement of the aircraft were two important factors that affected the expression of the target. To ensure that the target had a higher recognition rate in different directions and displacements, we applied transformations to the samples, such as a horizontal flip, vertical flip, and rotations of 90°, 180°, and 270°. These transformations could not only increase the number of samples but also adjust the direction and structure of the target to increase the sample diversity and make the model more robust. After transformation, the sample set contained a total of 20,088 images. These images were divided into three groups with different numbers: Plane_S contained 2511 images, Plane_M contained 10,044 images, and Plane_L contained 20,088 images.
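The eight-fold expansion can be sketched with NumPy; the text names flips plus three rotations, so the exact set of eight variants below (the four rotations plus their mirror images) is an assumption made to complete the factor of eight:

```python
# Sketch of the 8-fold sample expansion: the four rotations (0/90/180/270
# degrees) of an image plus the horizontal mirror of each, giving eight
# distinct orientation variants per original sample.
import numpy as np

def augment(img):
    rots = [np.rot90(img, k) for k in range(4)]   # 0, 90, 180, 270 degrees
    flips = [np.fliplr(r) for r in rots]          # mirrored versions
    return rots + flips                           # 8 variants in total

samples = augment(np.arange(12).reshape(3, 4))
print(len(samples))  # 8
```

Applied to the 2511 original aircraft images, this eight-fold expansion yields exactly the 20,088 images of Plane_L.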

Labeling the image is an important part of creating a training sample dataset. The label is the basis for distinguishing between the aircraft and background. The conventional sample-producing method is to independently create a positive sample and a negative sample. In this paper, we produce samples in a different way: the positive and negative samples are included in the same image. The advantage of this approach is that the target is better distinguished from the background, which can improve the detection accuracy. For example, the trestle that connects the airplane and terminal is prone to false detection. If the airplane and trestle are contained in the same image, the target detection network can better distinguish their differences. As shown in Figure 6(b), the aircraft position is manually


Table 1: Detailed information of the three airports.

Test image | Image size (pixels) | Aircraft number | Covered area (km²)
Berlin Tegel Airport | 22016 × 9728 | 35 | 19.1
Sydney International Airport | 13568 × 22272 | 78 | 26.9
Tokyo Haneda Airport | 20480 × 17920 | 94 | 32.7

Table 2: Results of transfer-learning on different training networks and different sample sizes.

Training network | Dataset | Time (min) | Per iteration (s) | Loss_bbox | Loss_cls
CaffeNet | Plane_S | 64 | 0.061 | 0.055 | 0.087
CaffeNet | Plane_M | 241 | 0.061 | 0.051 | 0.076
CaffeNet | Plane_L | 837 | 0.062 | 0.047 | 0.025
VGG16 | Plane_S | 320 | 0.446 | 0.032 | 0.084
VGG16 | Plane_M | 497 | 0.446 | 0.029 | 0.044
VGG16 | Plane_L | 976 | 0.453 | 0.019 | 0.028

marked in the image, and an XML annotation file is generated in which basic information about the image and the aircraft is automatically saved.

4.2. Test Dataset. Three large-area images were employed for the test. Each image contained a complete airport, and the areas were 19.1 km², 26.9 km², and 32.7 km², respectively. The test images had the characteristics of a large area, a diversity of ground cover types, and complex backgrounds. In addition to the airport, there were residential areas, harbors, industrial areas, woodlands, and grasslands in the images. The types of objects in the images were very rich, and the environment was complex. Details of the three images are shown in Figure 7 and outlined in Table 1.

Aircraft and airport detection are two different research topics [39]. The focus of this paper is aircraft detection in high-resolution remote sensing images. Our objective is not to detect the airport; rather, it is to directly detect the aircraft. The test images were of three airports. Nevertheless, the experimental data did not cover only the airport area; in fact, when the images were enlarged, it was evident that they contained harbors, industrial zones, and residential areas.

5. Results and Discussion

In this paper, the full implementation was based on the open-source deep learning library Caffe [40]. The experiments were run using an Nvidia Titan X 12GD5 graphics-processing unit (GPU). The network models used in the experiments were VGG16 pretrained with ImageNet.

5.1. Training Detection Model. The two VGG16 pretrained models were fine-tuned using the two sample datasets presented in Section 4.1. The evaluation of the training results is shown in Figure 8. As shown in Figure 8(a), the image classification accuracy quickly improves during the first 10,000 iterations and reaches a steady value of 0.9 at 40,000 iterations. Figure 8(b) depicts the total loss of the object detection model, whereby it reaches 0.1 at 40,000 iterations. Figures 8(c) and 8(d) are the location and classification losses, respectively, with both reaching values of less than 0.1. The excellent transfer-learning results provided a good experimental basis and were used for the next step.

The transfer-learning results are directly related to the accuracy of the subsequent aircraft detection process. Using transfer-learning, a high-accuracy aircraft detection model could be obtained using a small number of training samples and a brief training time. Here, we use CaffeNet and VGG16 as examples. The number of iterations was 40,000. Four indicators were used to analyze the results: total training time (Time), single-iteration time (Per Iteration), position accuracy (Loss_bbox), and classification accuracy (Loss_cls). The transfer-learning results are shown in Table 2. The results show that each combination of transfer-learning has good training results. The model with more training samples has higher accuracy than the model with fewer samples, and the complex model VGG16 has higher accuracy than the simple model CaffeNet.

Table 2 shows that the accuracy of target classification (Loss_cls) improves with the number of samples; Plane_L improves by approximately 60% compared to Plane_S. The position accuracy (Loss_bbox) shows the same trend, although it is less pronounced. This result proves that the method of producing samples proposed in Section 4.1 is effective. The image transformation method can increase the sample number and enhance the robustness of the samples. It can additionally improve the ability of transfer-learning and enhance the robustness and accuracy of the detection model. However, the target detection accuracy remains affected by the number of samples. When the number of samples is large, the target detection accuracy is relatively high, and the training time is correspondingly increased.

CaffeNet is a small network, whereas VGG16 is a large network whose structure is more complex than CaffeNet. As the complexity of the network increases, the position accuracy also increases; large networks improve by approximately 40% over small networks. In the same network, the training time increases with the number of samples and the network complexity. The results show that transfer-learning is an effective high-accuracy model training method, and it can


Table 3: Large-area image aircraft detection results.

Test image | True aircraft | Detected aircraft | Missing detections | False detections
Berlin Tegel Airport | 35 | 35 | 0 | 2
Sydney International Airport | 78 | 76 | 2 | 5
Tokyo Haneda Airport | 94 | 91 | 3 | 4

Table 4: Comparison of the four evaluation criteria for the detection results.

Test image | Method | FDR (%) | MR (%) | AC (%) | ER (%)
Berlin Tegel Airport | CCNN | 5.71 | 0 | 100 | 5.71
Berlin Tegel Airport | LOGNet-C | 7.69 | 3.23 | 96.77 | 10.92
Sydney International Airport | CCNN | 6.58 | 2.56 | 97.44 | 9.14
Sydney International Airport | LOGNet-C | 48.54 | 10.87 | 89.13 | 59.41
Tokyo Haneda Airport | CCNN | 4.39 | 3.19 | 96.81 | 7.58
Tokyo Haneda Airport | LOGNet-C | 20.80 | 1.54 | 98.46 | 22.34

realize the task of training a high-accuracy target detection model with fewer samples.

5.2. Aircraft Detection. The three large-area test images were used to verify the aircraft detection accuracy. The results are shown in Figure 9. Based on the global and local details in the figure, the overall detection result is good. Most of the aircraft in the airport areas were detected. The detection-missed aircraft were small aircraft; there was no missed detection of large aircraft. Moreover, there were no falsely detected aircraft in industrial areas, residential areas, woodlands, grasslands, or other areas beyond the airports. However, there were individual falsely detected targets within the airports, primarily airport trestles and airport terminals. The quantitative indicators are shown in Table 3, in which "True" indicates the number of real aircraft in the original image, "Detected" indicates the number of aircraft detected, "Missing" is the number of detection-missed aircraft, and "False" is the number of falsely detected aircraft.

The two-level CCNN aircraft detection structure could rapidly process large-area remote sensing images with high accuracy; however, commission and omission errors remained. In a complex environment, many objects have features similar to those of aircraft, especially long-shaped buildings and facilities. As listed in Table 3, the false detection rate of each airport is higher than the missing detection rate. Through analysis, we determined that the detection-missed aircraft were not actually lost. In the detection procedure, the classification threshold was set to 0.6. When the classification score of a target was less than 0.6, the target was not marked; the missing detections then appeared. When the threshold was lowered to 0.5, the detection-missed targets were marked and the detection miss rate was zero. Nevertheless, there were more falsely detected targets: airport terminals and several buildings had features similar to those of aircraft, and these were considered aircraft and marked. When the threshold was raised, the false detection rate decreased while the detection miss rate increased; when the threshold was lowered, the false detection rate increased while the detection miss rate decreased. According to the needs of the detection task, it is a better choice to set a suitable threshold and balance the false detection rate against the detection miss rate.

The experimental results in Table 3 show that the proposed CCNN method is feasible. A high-accuracy aircraft detection model can be obtained by using the transfer-learning method to fine-tune a pretrained model. The GFC method greatly reduces the number of region proposals, directly improving the detection efficiency. The two-level cascade network architecture can detect aircraft in large-area remote sensing images with high accuracy and high efficiency. However, some shortcomings remain. A detailed analysis of transfer-learning and detection accuracy is outlined below.

The aircraft detection method performance was evaluated by four criteria: falsely detected ratio (FDR), miss ratio (MR), accuracy (AC), and error ratio (ER). Their meanings are given in (6). A remote sensing image aircraft detection method named LOGNet-C, based on weakly supervised learning, is proposed in [20]. The results of the CCNN method were compared with those of the LOGNet-C method. The four evaluation indicators and comparison results are shown in Table 4.

FDR = (Number of falsely detected aircraft / Number of detected aircraft) × 100%,

MR = (Number of missing aircraft / Number of aircraft) × 100%,

AC = (Number of detected aircraft / Number of aircraft) × 100%,

ER = FDR + MR.

(6)
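The four criteria in (6) can be computed directly from the counts reported in the tables. A minimal sketch follows; the counts used in the example are hypothetical, not the paper's Table 3/4 values.

```python
def detection_metrics(n_true, n_detected, n_missing, n_false):
    """Evaluation criteria from Eq. (6): FDR, MR, AC, and ER, all in percent."""
    fdr = n_false / n_detected * 100.0  # falsely detected ratio
    mr = n_missing / n_true * 100.0     # miss ratio
    ac = n_detected / n_true * 100.0    # accuracy
    er = fdr + mr                       # error ratio
    return fdr, mr, ac, er

# Hypothetical counts: 100 real aircraft, 95 marked detections of which
# 5 are false, and 10 aircraft missed.
fdr, mr, ac, er = detection_metrics(n_true=100, n_detected=95, n_missing=10, n_false=5)
print(f"FDR={fdr:.2f}%  MR={mr:.2f}%  AC={ac:.2f}%  ER={er:.2f}%")
```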

The comparison results show that the false detection rate of the CCNN method is reduced by an average of 6.4% compared with the LOGNet-C method. The missing detection rate decreases by an average of 23.1%. Overall, the detection accuracy increases by an average of 36.6%.

The test results showed that small aircraft are more prone to missed detection, while long-shaped buildings and facilities are more prone to false detection, as shown in Figure 10. This phenomenon is related to the choice of training

10 Journal of Sensors


Figure 7: Test remote sensing images for aircraft detection. (a) Berlin Tegel Airport. (b) Sydney International Airport. (c) Tokyo Haneda Airport.

samples. The small aircraft in the sample are only a minority; in the transfer-learning stage, the detection model did not learn enough features of small aircraft. Long-shaped buildings were detected as aircraft. The first reason is that these buildings had contour features similar to those of aircraft. The second reason is that there was an inadequate number of similar buildings in the negative sample; consequently, the features contained in the negative sample were insufficient.



Figure 8: Accuracy and loss changes of the two models during the fine-tuning process of image classification and object detection. (a) Accuracy of the image classification model. (b) Total loss of the object detection model, which was the weighted sum of loss_cls and loss_bbox. (c) Location loss of the object detection model. (d) Classification loss of the object detection model.

When the number of training samples of these types was increased and the model retrained, the target detection accuracy increased. The analysis results show that, when producing the training sample dataset, it should not only contain all types of targets, but the number of each target type should also be balanced. The negative samples should contain as much background information as possible, especially for objects that have a texture, color, geometry, or other features similar to those of aircraft.

All false detection occurred in the airports. Building-intensive areas were prone to false detection; however, such errors did not appear there. This was on account of the contribution of the first-level cascade network, which filtered out nontarget areas and did not feed these areas into the aircraft detection network. Missed detection primarily occurred in the airport interior, and no aircraft were filtered out by the first-level network. The main cause of the missed detection was that the classification score was not high and the network did not extract an adequate number of features.

6. Conclusions

In the framework of deep learning, a rapid large-area remote sensing image aircraft detection method, CCNN, was proposed. Using the transfer-learning method, the parameters of an existing target detection model were fine-tuned with a small number of samples, and a high-accuracy aircraft detection model was obtained. Based on the geometric feature invariance of aircraft in remote sensing images, a region proposal processing algorithm, GFC, was proposed to improve the efficiency. Based on the above methods, a cascade network architecture was designed for aircraft detection in large-area remote sensing images. This method not only can directly handle large-area remote sensing images for



Figure 9: Aircraft detection results for the three test images, including the global images and locally enlarged images. Detection results of (a) Berlin Tegel Airport, (b) Sydney International Airport, and (c) Tokyo Haneda Airport.


Figure 10: Commission and omission errors. The purple rectangles are correctly detected aircraft, the green rectangles are falsely detected aircraft, and aircraft without a rectangle are missed aircraft. (a) and (b) are falsely detected aircraft, predominantly long-shaped buildings and facilities; (c) and (d) are detection-missed aircraft, predominantly small aircraft.


aircraft detection, but it also overcomes the difficulty of training a high-accuracy model with a small number of samples.

The experimental results showed that the detection accuracy increased by an average of 36.6%, the false detection rate decreased by an average of 6.4%, and the missed detection rate decreased by an average of 23.1%. However, some of the algorithms warrant improvement. The target detection threshold is set empirically and has some limitations. In future work, we intend to address the needs of the detection task by making the system set the threshold value automatically. In the present work, we focused on aircraft detection. If the target is extended to other typical targets, such as ships, tanks, and vehicles, then this method will have a larger application scope.

The failures in target detection in this experiment may have occurred for the following reasons. First, the number of small aircraft in our sample is insufficient, and the sample number of the various types of aircraft is not balanced, which has resulted in inadequate learning of the features of small aircraft during training. Second, the choice of target threshold must be appropriate. When detection is nearly complete, each region proposal is scored. When this score is greater than the threshold, the target is marked and displayed; however, when the score is less than the threshold, the target is not marked or displayed and is instead hidden as background. There is a close relationship between the threshold value and the detection accuracy, especially the false positives and negatives: when the threshold value is too small, false positives increase; when the threshold value is too large, false negatives increase. Currently, we set our threshold value empirically and plan to adopt an adaptive threshold method in the future.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

The authors would like to thank Dr. Changjian Qiao for his excellent technical support. Without his instructive guidance, impressive kindness, and patience, they could not have completed this thesis. They would also like to thank Dr. Huijin Yang and Yuan Zhao for critically reviewing the manuscript.

References

[1] G. Cheng and J. Han, "A survey on object detection in optical remote sensing images," ISPRS Journal of Photogrammetry and Remote Sensing, vol. 117, pp. 11–28, 2016.

[2] Y. Li, X. Sun, H. Wang, H. Sun, and X. Li, "Automatic target detection in high-resolution remote sensing images using a contour-based spatial model," IEEE Geoscience and Remote Sensing Letters, vol. 9, no. 5, pp. 886–890, 2012.

[3] X. Bai, H. Zhang, and J. Zhou, "VHR object detection based on structural feature extraction and query expansion," IEEE Transactions on Geoscience and Remote Sensing, vol. 52, no. 10, pp. 6508–6520, 2014.

[4] L. Gao, B. Yang, Q. Du, and B. Zhang, "Adjusted spectral matched filter for target detection in hyperspectral imagery," Remote Sensing, vol. 7, no. 6, pp. 6611–6634, 2015.

[5] E. Fakiris, G. Papatheodorou, M. Geraga, and G. Ferentinos, "An automatic target detection algorithm for swath sonar backscatter imagery using image texture and independent component analysis," Remote Sensing, vol. 8, no. 5, article 373, 2016.

[6] W. Zhang, X. Sun, K. Fu, C. Wang, and H. Wang, "Object detection in high-resolution remote sensing images using rotation invariant parts based model," IEEE Geoscience and Remote Sensing Letters, vol. 11, no. 1, pp. 74–78, 2014.

[7] H. Sun, X. Sun, H. Wang, Y. Li, and X. Li, "Automatic target detection in high-resolution remote sensing images using spatial sparse coding bag-of-words model," IEEE Geoscience and Remote Sensing Letters, vol. 9, no. 1, pp. 109–113, 2012.

[8] X. Sun, H. Wang, and K. Fu, "Automatic detection of geospatial objects using taxonomic semantics," IEEE Geoscience and Remote Sensing Letters, vol. 7, no. 1, pp. 23–27, 2010.

[9] R. M. Willett, M. F. Duarte, M. A. Davenport, and R. G. Baraniuk, "Sparsity and structure in hyperspectral imaging: sensing, reconstruction, and target detection," IEEE Signal Processing Magazine, vol. 31, no. 1, pp. 116–126, 2014.

[10] F. Gao, Q. Xu, and B. Li, "Aircraft detection from VHR images based on circle-frequency filter and multilevel features," The Scientific World Journal, vol. 2013, Article ID 917928, 7 pages, 2013.

[11] I. Sevo and A. Avramovic, "Convolutional neural network based automatic object detection on aerial images," IEEE Geoscience and Remote Sensing Letters, vol. 13, no. 5, pp. 740–744, 2016.

[12] L. Zhang, G.-S. Xia, T. Wu, L. Lin, and X. C. Tai, "Deep learning for remote sensing image understanding," Journal of Sensors, vol. 2016, Article ID 7954154, 2 pages, 2016.

[13] R. Girshick, J. Donahue, T. Darrell, and J. Malik, "Region-based convolutional networks for accurate object detection and segmentation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, no. 1, pp. 142–158, 2016.

[14] E. Pasolli, F. Melgani, N. Alajlan, and N. Conci, "Optical image classification: a ground-truth design framework," IEEE Transactions on Geoscience and Remote Sensing, vol. 51, no. 6, pp. 3580–3597, 2013.

[15] R. Girshick, "Fast R-CNN," in Proceedings of the 15th IEEE International Conference on Computer Vision (ICCV '15), pp. 1440–1448, Santiago, Chile, December 2015.

[16] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: towards real-time object detection with region proposal networks," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 6, pp. 1137–1149, 2017.

[17] W. Liao, A. Pizurica, P. Scheunders, W. Philips, and Y. Pi, "Semisupervised local discriminant analysis for feature extraction in hyperspectral images," IEEE Transactions on Geoscience and Remote Sensing, vol. 51, no. 1, pp. 184–198, 2013.

[18] D. Zhang, J. Han, G. Cheng, Z. Liu, S. Bu, and L. Guo, "Weakly supervised learning for target detection in remote sensing images," IEEE Geoscience and Remote Sensing Letters, vol. 12, no. 4, pp. 701–705, 2014.

[19] T. Deselaers, B. Alexe, and V. Ferrari, "Weakly supervised localization and learning with generic knowledge," International Journal of Computer Vision, vol. 100, no. 3, pp. 275–293, 2012.

[20] F. Zhang, B. Du, L. Zhang, and M. Xu, "Weakly supervised learning based on coupled convolutional neural networks for aircraft detection," IEEE Transactions on Geoscience and Remote Sensing, vol. 54, no. 9, pp. 5553–5563, 2016.

[21] J. Han, D. Zhang, G. Cheng, L. Guo, and J. Ren, "Object detection in optical remote sensing images based on weakly supervised learning and high-level feature learning," IEEE Transactions on Geoscience and Remote Sensing, vol. 53, no. 6, pp. 3325–3337, 2015.

[22] X. Chen, S. Xiang, C.-L. Liu, and C.-H. Pan, "Vehicle detection in satellite images by hybrid deep convolutional neural networks," IEEE Geoscience and Remote Sensing Letters, vol. 11, no. 10, pp. 1797–1801, 2014.

[23] G. Cheng, J. Han, and X. Lu, "Remote sensing image scene classification: benchmark and state of the art," Proceedings of the IEEE, vol. 99, pp. 1–19, 2017.

[24] S. Paisitkriangkrai, J. Sherrah, P. Janney, and A. Van-Den Hengel, "Effective semantic pixel labelling with convolutional networks and conditional random fields," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW '15), pp. 36–43, June 2015.

[25] O. A. B. Penatti, K. Nogueira, and J. A. Dos Santos, "Do deep features generalize from everyday objects to remote sensing and aerial scenes domains?" in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW '15), pp. 44–51, June 2015.

[26] X. Yao, J. Han, G. Cheng, X. Qian, and L. Guo, "Semantic annotation of high-resolution satellite images via weakly supervised learning," IEEE Transactions on Geoscience and Remote Sensing, vol. 54, no. 6, pp. 3660–3671, 2016.

[27] C. Tao, Y. Tan, H. Cai, and J. Tian, "Airport detection from large IKONOS images using clustered SIFT keypoints and region information," IEEE Geoscience and Remote Sensing Letters, vol. 8, no. 1, pp. 128–132, 2011.

[28] O. Aytekin, U. Zongur, and U. Halici, "Texture-based airport runway detection," IEEE Geoscience and Remote Sensing Letters, vol. 10, no. 3, pp. 471–475, 2013.

[29] X. Yao, J. Han, L. Guo, S. Bu, and Z. Liu, "A coarse-to-fine model for airport detection from remote sensing images using target-oriented visual saliency and CRF," Neurocomputing, vol. 164, pp. 162–172, 2015.

[30] G. Cheng, P. Zhou, and J. Han, "Learning rotation-invariant convolutional neural networks for object detection in VHR optical remote sensing images," IEEE Transactions on Geoscience and Remote Sensing, vol. 54, no. 12, pp. 7405–7415, 2016.

[31] X. Yao, J. Han, D. Zhang, and F. Nie, "Revisiting co-saliency detection: a novel approach based on two-stage multi-view spectral rotation co-clustering," IEEE Transactions on Image Processing, vol. 26, no. 7, pp. 3196–3209, 2017.

[32] L. Cao, C. Wang, and J. Li, "Vehicle detection from highway satellite images via transfer learning," Information Sciences, vol. 366, pp. 177–187, 2016.

[33] S. Gupta, S. Rana, B. Saha, D. Phung, and S. Venkatesh, "A new transfer learning framework with application to model-agnostic multi-task learning," Knowledge and Information Systems, vol. 49, no. 3, pp. 933–973, 2016.

[34] A. Van Opbroek, M. A. Ikram, M. W. Vernooij, and M. De Bruijne, "Transfer learning improves supervised image segmentation across imaging protocols," IEEE Transactions on Medical Imaging, vol. 34, no. 5, pp. 1018–1030, 2015.

[35] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," Computer Science, 2014.

[36] W. Zhao, Z. Guo, J. Yue, X. Zhang, and L. Luo, "On combining multiscale deep learning features for the classification of hyperspectral remote sensing imagery," International Journal of Remote Sensing, vol. 36, no. 13, pp. 3368–3379, 2015.

[37] Q. Zou, L. Ni, T. Zhang, and Q. Wang, "Deep learning based feature selection for remote sensing scene classification," IEEE Geoscience and Remote Sensing Letters, vol. 12, no. 11, pp. 2321–2325, 2015.

[38] Y. Yang and S. Newsam, "Bag-of-visual-words and spatial extensions for land-use classification," in Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems (GIS '10), pp. 270–279, ACM, San Jose, Calif, USA, November 2010.

[39] X. Wang, Q. Lv, B. Wang, and L. Zhang, "Airport detection in remote sensing images: a method based on saliency map," Cognitive Neurodynamics, vol. 7, no. 2, pp. 143–154, 2013.

[40] Y. Jia, E. Shelhamer, J. Donahue et al., "Caffe: convolutional architecture for fast feature embedding," in Proceedings of the ACM International Conference on Multimedia, pp. 675–678, ACM, Orlando, Fla, USA, November 2014.





Figure 1: Architecture of the two-level CCNN for rapid aircraft detection in large-area remote sensing images. The target detection process consists of two parts. First, the image is downsampled. The first-level CNN structure is used to classify the scene with a sliding window on the downsampled image, block by block. Nontarget windows are excluded, and the index positions of the target windows are reserved and passed to the next level of the network. The second level receives the target windows and performs aircraft detection on the original image. An RPN is used to obtain region proposals, and the GFC model is used to filter the region proposals and exclude those that do not satisfy the geometric characteristics of aircraft. Then, the Faster R-CNN classification model is used to classify the remaining region proposals to generate the target areas. Finally, using overlap constraints, the redundant target areas are deleted to obtain the final detection results.
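The cascade control flow of Figure 1 can be sketched as a minimal skeleton. The real first level is a VGG16 scene classifier and the real second level is Faster R-CNN with the GFC filter; here both are hypothetical stand-in functions so that the control flow itself can run.

```python
def classify_window(block):
    """Stand-in for the first-level CNN scene classifier (hypothetical rule)."""
    return "airport" in block

def detect_aircraft(block):
    """Stand-in for the second-level Faster R-CNN + GFC aircraft detector."""
    return [("aircraft", 0.9)] if "aircraft" in block else []

def cascade_detect(blocks):
    detections = []
    for index, block in enumerate(blocks):
        # Level 1: exclude windows that contain no airport area at all.
        if not classify_window(block):
            continue
        # Level 2: run the expensive detector only on surviving windows.
        detections.extend((index, d) for d in detect_aircraft(block))
    return detections

# Three hypothetical windows: only the airport windows reach level 2.
windows = ["residential", "airport with aircraft", "airport apron"]
print(cascade_detect(windows))
```

The design point is that the cheap classifier prunes most of the image before the detector ever runs, which is where the framework's efficiency comes from.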

3. Methods

3.1. Cascade Network for Aircraft Detection. In general, the image used for aircraft detection includes the entire airport and the surrounding areas; most image areas do not contain aircraft. This situation reduces the detection accuracy and efficiency. We propose a two-level CCNN method that combines the trained models and the GFC method to solve this problem. As shown in Figure 1, the first level is a coarse filter. A new image with lower resolution is obtained through twofold downsampling of the original image. The new image is traversed using a sliding window, and the CNN image classifier is used to classify each sliding window. Most of the negative areas are filtered out, and the positive areas are fed into the second-level network for aircraft detection.

The second-level network is a target detector. In this level, Faster R-CNN is used to perform aircraft detection for the respective areas. The output of the first-level network is the input of the second-level network. The corresponding region on the original image is sent to the second network, and the window size is 800 × 800 pixels. The region proposals are generated by the region proposal network (RPN). The main difference between Faster R-CNN and Fast R-CNN is the generation of the region proposals; that is, the former uses the efficient RPN method, whereas the latter uses the less efficient selective search (SS) method. In the RPN, each image produces approximately 20,000 region proposals of three ratios and three scales. Then, according to the region proposal classification scores, the top 300 region proposals are selected as the Faster R-CNN input for target detection. More than 95% of the 20,000 region proposals are background. Scoring and sorting such a large number of region proposals is obviously

time-intensive. The GFC is used to delete the invalid region proposals. The final detection for the remaining region proposals is performed by the aircraft detector.

The general sliding window method uses a small window size, such as 12 × 12, 24 × 24, or 48 × 48. Small windows and small steps require considerable time to traverse the entire image, which reduces the detection efficiency. In our experiment, we used a large sliding window. In the first level, the window size was 400 × 400, which was suitable for image classification. According to the training samples, the downsampled images had a higher classification accuracy; moreover, the speed of traversing the entire image was improved. In the second level, the window size was 800 × 800. The range of this window was equal to the downsampling range. It required almost the same amount of time to detect a 400 × 400 image and an 800 × 800 image; thus, it was not necessary to decompose the original image into four 400 × 400 smaller images. The window size and sliding step size were determined by the image resolution. They had to satisfy the condition that at least one window could contain a complete aircraft of maximum size.
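The window-placement rule can be sketched in one dimension. The 800-pixel window follows the text, and the requirement that adjacent windows overlap by at least the maximum aircraft size follows Section 3.3; the 300-pixel maximum aircraft size used here is taken from the sample statistics in Figure 4 and is an assumption for this illustration.

```python
def sliding_windows(image_size, window=800, max_aircraft=300):
    """Generate 1-D window start positions such that adjacent windows overlap
    by at least the largest aircraft size, so no aircraft is split across
    every window that could contain it.
    """
    step = window - max_aircraft  # overlap = window - step >= max_aircraft
    starts = list(range(0, max(image_size - window, 0) + 1, step))
    if starts[-1] + window < image_size:  # make sure the far edge is covered
        starts.append(image_size - window)
    return starts

starts = sliding_windows(4000)
# Every adjacent pair of windows overlaps by at least 300 pixels.
assert all(s1 + 800 - s2 >= 300 for s1, s2 in zip(starts, starts[1:]))
print(starts)
```

The same generator applied in both axes yields the 2-D traversal grid.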

Because of the overlap between adjacent windows, some aircraft may be detected several times. The repetitive detection areas should be optimized: the redundant objects are deleted, and only one optimal result is retained. The regions for the same aircraft must meet the following conditions. (1) Every region should satisfy the GFC condition; that is, the aspect ratio, width, and height must meet the thresholds of (r1, r2), (w1, w2), and (h1, h2), respectively. (2) The overlap between two regions should be larger than 30%, as defined in (1). (3) The area is the largest, as defined in (2), in which area(r_i) means the area of the given region. The process is



Figure 2: Process of deleting duplicate detected regions and overlapping regions between sliding windows. (a) Result before deleting the duplicates. (b) Result after deleting the duplicates.


Figure 3: Structure of transfer-learning for scene classification and target detection. The source tasks are the original deep learning methods of classification and detection, and the knowledge base is comprised of the trained networks. The learning system involves transfer-learning and fine-tuning, and the target tasks are remote sensing image scene classification and aircraft detection.

shown in Figure 2, in which the two aircraft at the upper left are in overlapping windows. The incomplete regions are deleted, and the most appropriate one is preserved.

area(r_i ∩ r_j) ≥ 30%, (1)

Region = max area(r_i). (2)
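Conditions (2) and (3) amount to a greedy duplicate-removal pass over the detected regions, sketched below. Since (1) leaves the normalization of the overlap implicit, the sketch assumes the intersection is measured against the smaller of the two regions; the GFC check of condition (1) is handled separately and omitted here.

```python
def box_area(b):
    """Area of a box given as (x1, y1, x2, y2)."""
    return max(0, b[2] - b[0]) * max(0, b[3] - b[1])

def overlap_ratio(a, b):
    """Intersection area relative to the smaller box (one reading of Eq. (1))."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    smaller = min(box_area(a), box_area(b))
    return inter / smaller if smaller else 0.0

def remove_duplicates(boxes, threshold=0.3):
    """Keep only the largest box of every group overlapping >= 30% (Eqs. (1)-(2))."""
    kept = []
    for box in sorted(boxes, key=box_area, reverse=True):
        if all(overlap_ratio(box, k) < threshold for k in kept):
            kept.append(box)
    return kept

# Two detections of the same aircraft from adjacent windows plus one distinct aircraft.
boxes = [(0, 0, 100, 100), (10, 10, 100, 100), (300, 300, 380, 380)]
print(remove_duplicates(boxes))
```

Sorting by area first ensures that when two regions cover the same aircraft, the larger one survives, matching condition (3).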

3.2. Transfer-Learning and Fine-Tuning the Model Parameters. Transfer-learning refers to a model training method in which the parameters of a pretrained network model are fine-tuned with a small number of samples to prompt the original model to adapt to a new task [32, 33]. The deep learning object detection method can achieve very high detection accuracy. One reason is that a large number of samples are used to train the detection model, which therefore contains a sufficient number of target features. However, for remote sensing images, no large sample library such as ImageNet exists that can be used to train a high-accuracy model. Moreover, training a new high-accuracy object detection model consumes considerable time and computing resources. Transfer-learning can solve the problem of lacking samples and significantly shorten the training time [34].

In this paper, the transfer-learning method is respectively used to fine-tune the pretrained image classification model and object detection model. As shown in Figure 3, the source models are the respective classification and object detection models. Both include the VGG16 network structure [35] and are trained with ImageNet. They were then trained with the samples given in Section 4.1. The parameters of the original models were fine-tuned by the learning system to complete the tasks of scene classification and aircraft detection.

The CNN structure comprises many layers. The bottom layers are used to extract generic features such as structure, texture, and color. The top layers contain specific features associated with the target detection task. Using transfer-learning, the top-layer parameters are adjusted with a small number of aircraft samples to change the target detection task. Through a backpropagation algorithm, the bottom-layer parameters are adjusted to extract aircraft features. In general, we use softmax as the supervisory loss function, which is a classic cross-entropy classifier. It is defined in (3), where f(x_i) is the activation of the previous-layer node pushed through a softmax function. In (4), the target output y_i is a 1-of-K vector, and K is the number of outputs. In addition, L is the number of layers, and λ is a regularization term. The goal is to minimize the loss. Finally, stochastic gradient descent


is used to fine-tune the model, and backpropagation is used to optimize the function.

f(x_i) = softmax(x_i) = e^{x_i} / ∑_{i=1}^{N} e^{x_i}, (3)

loss = −∑_{i=1}^{m} y_i log f(x_i) + λ ∑_{l=1}^{L} ∑ ‖w_l‖². (4)

Another important parameter is the learning rate. The parameters of the pretrained model are already fine; thus, the network layer learning rate should be set to a small number. A small learning rate ensures both that the features of the new samples can be learned and that the excellent parameters of the pretrained model are maintained. Taking Faster R-CNN as an example, the parameters that must be set are the following. (1) Initialize the training network and select the pretrained model: "train_net" is set to VGG16.v2.caffemodel. (2) Set the number of categories. In this paper, all aircraft types are considered the same type; thus, there are only two categories, aircraft and background. In the data layer, num_classes is set to two; in the cls_score layer, num_output is set to two; and in the bbox_pred layer, num_output is set to eight. (3) Set the learning rate: the base learning rate base_lr is set to 0.001. The learning rate multiplier lr_mult of the cls_score layer and the bbox_pred layer is set to five and ten, respectively. The learning rates of the other convolutional layers are not modified. (4) Set the number of iterations of the backpropagation function: max_iters is set to 40,000.
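The per-layer schedule in step (3) reduces to an effective rate of base_lr × lr_mult for each layer. A minimal sketch using the values from the text; the dictionary form is illustrative, not the Caffe prototxt syntax.

```python
# base_lr and the per-layer multipliers follow the fine-tuning settings in
# the text: 0.001 base rate, 5x for cls_score, 10x for bbox_pred, 1x elsewhere.
BASE_LR = 0.001
LR_MULT = {"cls_score": 5, "bbox_pred": 10}

def layer_lr(layer_name):
    """Effective learning rate of a layer: base_lr * lr_mult (default 1)."""
    return BASE_LR * LR_MULT.get(layer_name, 1)

for name in ("conv1_1", "cls_score", "bbox_pred"):
    print(name, layer_lr(name))
```

Keeping the backbone multipliers at one preserves the pretrained convolutional features, while the larger multipliers let the re-initialized top layers adapt quickly to the aircraft task.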

3.3. Geometric Feature Constraint. In a remote sensing image, the aircraft cover only a small area; the remaining area is background. The region proposals are therefore mostly background, not aircraft. Reducing the number of region proposals can make object detection more targeted, which helps improve efficiency [16]. The special imaging method of remote sensing determines that only the top features of an aircraft can be viewed in the image [36]. Thus, the geometric features of aircraft in the image are highly stable [37]. This property can be exploited when designing a detection method. When the resolution of the image is fixed, the geometric parameters of each aircraft are constant. Therefore, the geometric features can be used as important conditions for aircraft detection.

The geometric features mentioned in this paper mainly refer to the external contour sizes of aircraft in remote sensing images. It is possible to use geometric feature constraints for detection because the following three conditions are met. First, conventional remote sensing imaging employs nadir images; that is, the sensor acquires an image from the sky perpendicular (or near-perpendicular) to the ground. Thus, the top contour features of aircraft are seen in these remote sensing images; these features (length, width, and aspect ratio) are relatively stable constants. Although the length and width are affected by the image resolution, they can be calculated from the resolution of the image, whereas the aspect ratio is not affected by resolution and is more fixed. Second, the length, width, wingspan, and body length ratio of different

types of aircraft are all within a certain range. These geometric features give us an exact basis for calculating region proposals. Third, statistical methods are used to count the geometric features of many aircraft samples. Samples have been collected from large, medium, and small airports in the world's major countries, covering almost all types of aircraft, including large passenger aircraft, fighters, small private aircraft, and rotor helicopters. These statistical results are used as a reference for the geometric feature constraints.

The image resolution and the aircraft geometry statistics are used to calculate the size of the sliding window. The conditions that need to be met are as follows. The minimum overlap between the windows must be greater than the maximum size of the aircraft to ensure that no aircraft will be missed. The number of aircraft in a sliding window is very small: a window contains about 2000–3000 region proposals, but aircraft are probably present in only a few. The number of target region proposals is much smaller than the number of nontarget region proposals, and having many nontarget areas results in increased overall time consumption. With the GFCs, deleting a candidate box that does not satisfy the conditions can greatly improve the time efficiency of target detection.

Taking the ground truth of the aircraft in the samples as an example, their lengths and widths are within a fixed range. The original 2511 sample images were used to analyze these parameters, and the results are shown in Figure 4.

Figure 4(a) shows that the aspect ratio of the circumscribed rectangle is primarily around 1.0, and the range is (0.6, 1.6). Figure 4(b) shows that the width and height of the circumscribed rectangle are mainly around 140 pixels, and the range is (50, 300). These values should be appropriately enlarged during aircraft detection to ensure that no aircraft are filtered out on account of statistical errors. The geometric features of aircraft in remote sensing images have rotation invariance. The aspect ratio is within a fixed range even at different resolutions; thus, it can be used as a constraint to roughly distinguish between target and background. (r1, r2), (w1, w2), and (h1, h2) represent the ranges of aspect ratio, width, and height, respectively.

For the same image, reducing the number of nontarget region proposals has no effect on the target detection accuracy (in deep learning, this accuracy means the classification score); however, it can improve the detection efficiency. Based on the GFC, a region proposal filtering method is proposed. (r1, r2), (w1, w2), and (h1, h2) are set as thresholds to filter the region proposals. The main algorithm is shown in (5), where Region Proposal = 1 means the region proposal is preserved and Region Proposal = 0 means it is deleted. Here, ratio represents the target aspect ratio, width represents the target width, height denotes the target height, and α is the image resolution coefficient, which is used to adjust the thresholds at different resolutions.

\[
\text{RegionProposal} = \begin{cases} 1, & r_1 \le \text{ratio} \le r_2 \\ 0, & \text{ratio} < r_1 \ \text{or} \ \text{ratio} > r_2 \end{cases}
\]
\[
\text{RegionProposal} = \begin{cases} 1, & \alpha w_1 \le \text{width} \le \alpha w_2 \\ 0, & \text{width} < \alpha w_1 \ \text{or} \ \text{width} > \alpha w_2 \end{cases}
\]
\[
\text{RegionProposal} = \begin{cases} 1, & \alpha h_1 \le \text{height} \le \alpha h_2 \\ 0, & \text{height} < \alpha h_1 \ \text{or} \ \text{height} > \alpha h_2 \end{cases} \tag{5}
\]

Figure 4: Geometric features of all the sample aircraft. (a) Aspect ratio of each sample aircraft; the values are approximately 1.0 and the range is (0.6, 1.6). The color in Figure 4(a) represents the value of the aspect ratio: blue is the smallest value, red is the largest value, and intermediate values transition gradually. (b) Size of each sample aircraft; red is the width and green is the height. The values are approximately 140 pixels and the range is (50, 300).
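A minimal Python sketch of the GFC filter defined in (5); the default thresholds follow the statistics in Figure 4, and the (x1, y1, x2, y2) box format is our assumption, not the paper's code.

```python
def gfc_keep(ratio, width, height,
             r=(0.6, 1.6), w=(50, 300), h=(50, 300), alpha=1.0):
    """Return True iff the proposal satisfies all three constraints of (5):
    aspect ratio in (r1, r2), width in (alpha*w1, alpha*w2), and height
    in (alpha*h1, alpha*h2)."""
    return (r[0] <= ratio <= r[1]
            and alpha * w[0] <= width <= alpha * w[1]
            and alpha * h[0] <= height <= alpha * h[1])

def filter_proposals(boxes, **thresholds):
    """Delete every region proposal (x1, y1, x2, y2) that violates a
    geometric constraint, before it ever reaches the detector."""
    kept = []
    for x1, y1, x2, y2 in boxes:
        bw, bh = x2 - x1, y2 - y1
        if bh > 0 and gfc_keep(bw / bh, bw, bh, **thresholds):
            kept.append((x1, y1, x2, y2))
    return kept
```

For example, `filter_proposals([(0, 0, 140, 140), (0, 0, 600, 80), (0, 0, 20, 20)])` keeps only the first box: the second fails the aspect-ratio constraint and the third the size constraint.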

4. Materials

In this study, we constructed three kinds of datasets: scene, aircraft sample, and test datasets. All data were obtained from Google Earth. The scene dataset was from the 18th level, with a spatial resolution of 1.2 m; the remaining two datasets were from the 20th level, with a spatial resolution of 0.3 m. From a professional point of view, the data obtained from Google Earth contain less information than real remote sensing images. However, the deep learning method differs from traditional remote sensing image processing methods in that it can obtain rich information from this image type. We advocate the use of real remote sensing images; nonetheless, Google Earth images can also be used.

4.1. Training Sample Datasets. Creating training samples is a key step in object detection in the field of deep learning, and the sample quality directly affects the test results. In the CCNN structure, the first-level network uses the CNN image classification model, and the second-level network uses the Faster R-CNN object detection model. The two networks have different structures and functions; therefore, we created two sample datasets, one for scene classification and the other for aircraft detection.

The first dataset was the remote sensing image scene dataset, which was used to train the CNN classification model. We used the "UC Merced Land Use Dataset" as the basic dataset [38]. It contained 21 scene categories and included many typical background categories, such as agriculture, airplanes, beaches, buildings, bushes, and parking lots. This dataset was suitable for scene classification; however, there were only 100 images in each category. To obtain a high-accuracy training model, it was necessary to increase the number of samples of airports and of other scenes that are easily confused with airports, including industrial areas, residential areas, harbors, overpasses, parking lots, highways, runways, and storage tanks. Another 300 samples of airport images and 700 samples of other scene categories were added to the dataset; the number of scene categories remained 21. The sample number could be expanded to eight times the original by image transformation, giving a total of 24,800 samples. Some added images are shown in Figure 5, where (a) depicts an airport scene and (b) depicts scenes that are easily confused with airports.

The second dataset was an aircraft sample library of remote sensing images, used to train the Faster R-CNN network model. It contained 2511 original aircraft images, some of which are shown in Figure 6. The samples contained not only all aircraft types but also as much background as possible. The diversity of the samples could help the network



Figure 5: Sample images of different scenes. (a) Airport scene, including a parking apron, runway, and general terminal. (b) Scenes easily confused with an airport, including industrial areas, residential areas, harbors, overpasses, parking lots, highways, runways, and storage tanks.


Figure 6: Training sample dataset of aircraft in remote sensing images. (a) Original training sample images. (b) Corresponding labeled ground truth images.

learn different features and better distinguish the target and background. As shown in Figure 6(a), the samples cover almost all types of aircraft found in existing remote sensing images, including airliners, small planes, military aircraft, and helicopters.

We obtained the images from the zenith point to the nadir point. Owing to this unique imaging perspective, the aircraft in the remote sensing images show a fixed geometric form: only the top of the aircraft is visible. Therefore, the direction and displacement of the aircraft were two important factors affecting the expression of the target. To ensure that the target had a high recognition rate under different directions and displacements, we applied transformations to the samples, such as a horizontal flip, a vertical flip, and rotations of 90°, 180°, and 270°. These transformations not only increased the number of samples but also adjusted the direction and structure of the target, increasing the sample diversity and making the model more robust. After transformation, the samples contained a total of 20,088 images. These images were divided into three groups of different sizes: Plane_S contained 2511 images, Plane_M contained 10,044 images, and Plane_L contained 20,088 images.
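The eightfold expansion (2511 → 20,088 images) matches the eight symmetries generated by flips and 90° rotations; a sketch assuming numpy-array images (the authors' exact transform set may differ slightly from this dihedral set):

```python
import numpy as np

def augment_eightfold(img):
    """Return the 8 dihedral variants of an image: the four rotations of
    the original and of its horizontal flip. This realizes the flips and
    90/180/270-degree rotations described in the text."""
    return [np.rot90(base, k) for base in (img, np.fliplr(img)) for k in range(4)]
```

Applied to the 2511 original samples, this yields 2511 × 8 = 20,088 images, the size of the Plane_L group.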

Labeling the images is an important part of creating a training sample dataset; the label is the basis for distinguishing between aircraft and background. The conventional sample-producing method is to create positive and negative samples independently. In this paper, we produce samples differently: the positive and negative samples are included in the same image. The advantage of this approach is that the target is better distinguished from the background, which can improve the detection accuracy. For example, the trestle that connects the airplane and the terminal is prone to false detection; if the airplane and trestle are contained in the same image, the target detection network can better learn their differences. As shown in Figure 6(b), the aircraft position is manually


Table 1: Detailed information of the three airports.

Test image                     Image size (pixels)   Aircraft number   Covered area (km²)
Berlin Tegel Airport           22016 × 9728          35                19.1
Sydney International Airport   13568 × 22272         78                26.9
Tokyo Haneda Airport           20480 × 17920         94                32.7

Table 2: Results of transfer-learning on different training networks and different sample sizes.

Training network   Dataset   Time (min)   Per iteration (s)   Loss_bbox   Loss_cls
CaffeNet           Plane_S   64           0.061               0.055       0.087
CaffeNet           Plane_M   241          0.061               0.051       0.076
CaffeNet           Plane_L   837          0.062               0.047       0.025
VGG16              Plane_S   320          0.446               0.032       0.084
VGG16              Plane_M   497          0.446               0.029       0.044
VGG16              Plane_L   976          0.453               0.019       0.028

marked in the image, and an XML annotation file is generated in which basic information about the image and the aircraft is automatically saved.
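A sketch of generating such an annotation in the Pascal VOC XML layout that Faster R-CNN training code conventionally consumes; the exact fields the authors saved are not specified, so treat this schema as an assumption.

```python
import xml.etree.ElementTree as ET

def make_annotation(filename, width, height, boxes, label="aircraft"):
    """Build a VOC-style XML annotation: image metadata plus one
    <object> element with a bounding box per marked aircraft."""
    ann = ET.Element("annotation")
    ET.SubElement(ann, "filename").text = filename
    size = ET.SubElement(ann, "size")
    for tag, val in (("width", width), ("height", height), ("depth", 3)):
        ET.SubElement(size, tag).text = str(val)
    for x1, y1, x2, y2 in boxes:
        obj = ET.SubElement(ann, "object")
        ET.SubElement(obj, "name").text = label
        bnd = ET.SubElement(obj, "bndbox")
        for tag, val in (("xmin", x1), ("ymin", y1), ("xmax", x2), ("ymax", y2)):
            ET.SubElement(bnd, tag).text = str(val)
    return ET.tostring(ann, encoding="unicode")
```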

4.2. Test Dataset. Three large-area images were employed for the test. Each image contained a complete airport, and the covered areas were 19.1 km², 26.9 km², and 32.7 km², respectively. The test images had the characteristics of a large area, a diversity of ground cover types, and complex backgrounds. In addition to the airport, there were residential areas, harbors, industrial areas, woodlands, and grasslands in the images; the types of objects were very rich and the environment was complex. Details of the three images are shown in Figure 7 and outlined in Table 1.
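The covered areas follow directly from the image sizes in Table 1 and the 0.3 m ground resolution stated at the start of Section 4; a quick check (the slight excess over the tabulated figures presumably reflects rounding of the surveyed area):

```python
def covered_area_km2(width_px, height_px, gsd_m=0.3):
    """Ground footprint of an image: pixel count times the squared
    ground sample distance, converted to square kilometres."""
    return width_px * height_px * gsd_m ** 2 / 1e6
```

`covered_area_km2(22016, 9728)` gives about 19.3 km² for the Berlin Tegel image, close to the 19.1 km² listed in Table 1.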

Aircraft detection and airport detection are two different research topics [39]. The focus of this paper is aircraft detection in high-resolution remote sensing images: our objective is not to detect the airport but to directly detect the aircraft. The test images were of three airports; nevertheless, the experimental data did not cover only the airport area. In fact, when the images were enlarged, it was evident that they also contained harbors, industrial zones, and residential areas.

5. Results and Discussion

In this paper, the full implementation was based on the open-source deep learning library Caffe [40]. The experiments were run using an Nvidia Titan X (12 GB GDDR5) graphics processing unit (GPU). The network models used in the experiments were VGG16 pretrained with ImageNet.

5.1. Training Detection Model. The two VGG16 pretrained models were fine-tuned using the two sample datasets presented in Section 4.1. The evaluation of the training results is shown in Figure 8. As shown in Figure 8(a), the image classification accuracy quickly improves during the first 10,000 iterations and reaches a steady value of 0.9 at 40,000 iterations. Figure 8(b) depicts the total loss of the object detection model, which reaches 0.1 at 40,000 iterations. Figures 8(c) and 8(d) show the location and classification losses, respectively, with both reaching a value of less than 0.1. These excellent transfer-learning results provided a good experimental basis and were used for the next step.

The transfer-learning results are directly related to the accuracy of the subsequent aircraft detection process. Using transfer-learning, a high-accuracy aircraft detection model could be obtained with a small number of training samples and a brief training time. Here, we use CaffeNet and VGG16 as examples; the number of iterations was 40,000. Four indicators were used to analyze the results: total training time (Time), single-iteration time (Per Iteration), position accuracy (Loss_bbox), and classification accuracy (Loss_cls). The transfer-learning results are shown in Table 2. The results show that each combination of transfer-learning trains well: the model with more training samples has higher accuracy than the model with fewer samples, and the complex model VGG16 has higher accuracy than the simple model CaffeNet.

Table 2 shows that the accuracy of target classification (Loss_cls) increases with the number of samples; Plane_L improves by approximately 60% compared to Plane_S. The position accuracy (Loss_bbox) shows the same trend, although less markedly. This result shows that the method of producing samples proposed in Section 4.1 is effective. The image transformation method can increase the sample number and enhance the robustness of the samples; it additionally improves the ability of transfer-learning and enhances the robustness and accuracy of the detection model. However, the target detection accuracy remains affected by the number of samples: when the number of samples is large, the target detection accuracy is relatively high, and the training time is correspondingly increased.

CaffeNet is a small network, whereas VGG16 is a large network whose structure is more complex than CaffeNet's. As the complexity of the network increases, the position accuracy also increases; the large network improves by approximately 40% over the small network. For the same network, the training time increases with the number of samples and the network complexity. The results show that transfer-learning is an effective high-accuracy model-training method, and it can


Table 3: Large-area image aircraft detection results.

Test image                     True aircraft   Detected aircraft   Missing detection   False detection
Berlin Tegel Airport           35              35                  0                   2
Sydney International Airport   78              76                  2                   5
Tokyo Haneda Airport           94              91                  3                   4

Table 4: Detailed evaluation criteria and comparison of the detection results.

Test image                     Method     FDR (%)   MR (%)   AC (%)   ER (%)
Berlin Tegel Airport           CCNN       5.71      0        100      5.71
                               LOGNet-C   7.69      3.23     96.77    10.92
Sydney International Airport   CCNN       6.58      2.56     97.44    9.14
                               LOGNet-C   48.54     10.87    89.13    59.41
Tokyo Haneda Airport           CCNN       4.39      3.19     96.81    7.58
                               LOGNet-C   20.80     1.54     98.46    22.34

realize the task of training a high-accuracy target detection model with fewer samples.

5.2. Aircraft Detection. The three large-area test images were used to verify the aircraft detection accuracy. The results are shown in Figure 9. Based on the global and local details in the figure, the overall detection result is good. Most of the aircraft in the airport areas were detected; the detection-missed aircraft were small aircraft, and there was no missed detection of large aircraft. Moreover, there were no falsely detected aircraft in industrial areas, residential areas, woodlands, grasslands, or other areas beyond the airports. However, there were individual falsely detected targets within the airports, primarily airport trestles and airport terminals. The quantitative indicators are shown in Table 3, in which "True" indicates the number of real aircraft in the original image, "Detected" indicates the number of aircraft detected, "Missing" is the number of detection-missed aircraft, and "False" is the number of falsely detected aircraft.

The two-level CCNN aircraft detection structure could rapidly process large-area remote sensing images with high accuracy; however, commission and omission errors remained. In a complex environment, many objects have features similar to those of aircraft, especially long-shaped buildings and facilities. As listed in Table 3, the false detection rate of each airport is higher than the missing detection rate. Through analysis, we determined that the detection-missed aircraft were not actually lost. In the detection procedure, the classification threshold was set to 0.6; when the classification score of a target was less than 0.6, the target was not marked, and a missing detection appeared. When the threshold was lowered to 0.5, the detection-missed targets were marked and the detection miss rate was zero. Nevertheless, there were then more falsely detected targets: airport terminals and several buildings had features similar to those of aircraft and were marked as such. Raising the threshold decreases the false detection rate but increases the detection miss rate; lowering it does the opposite. According to the needs of the detection task, it is better to set a suitable threshold that balances the false detection rate and detection miss rate.
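The thresholding step itself is trivial; the point is the trade-off. A sketch with a toy score list (0.6 is the paper's operating threshold; the example detections and scores are illustrative):

```python
def apply_threshold(scored_boxes, thresh=0.6):
    """Keep only detections whose classification score reaches the
    threshold; everything below it is hidden as background."""
    return [(box, s) for box, s in scored_boxes if s >= thresh]

# Toy detections: a confident airliner, a borderline small aircraft,
# and a low-scoring terminal building (false positive).
dets = [(("plane", 1), 0.95), (("small plane", 2), 0.55), (("terminal", 3), 0.42)]
```

At `thresh=0.6` only the first detection survives, so the small aircraft becomes a miss; at 0.5 the small aircraft is recovered, and lowering the threshold further would also admit the false terminal detection.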

The experimental results in Table 3 show that the proposed CCNN method is feasible. A high-accuracy aircraft detection model can be obtained by using the transfer-learning method to fine-tune a pretrained model, and the GFC method greatly reduces the number of region proposals, directly improving the detection efficiency. The two-level cascade network architecture can detect aircraft in large-area remote sensing images with high accuracy and high efficiency. However, some shortcomings remain; a detailed analysis of transfer-learning and detection accuracy is outlined below.

The aircraft detection method's performance was evaluated by four criteria: the falsely detected ratio (FDR), miss ratio (MR), accuracy (AC), and error ratio (ER), defined in (6). A remote sensing image aircraft detection method named LOGNet-C, based on weakly supervised learning, is proposed in [20]. The results of the CCNN method were compared with those of the LOGNet-C method; the four evaluation indicators and comparison results are shown in Table 4.

\[
\begin{aligned}
\text{FDR} &= \frac{\text{Number of falsely detected aircraft}}{\text{Number of detected aircraft}} \times 100\% \\
\text{MR} &= \frac{\text{Number of missing aircraft}}{\text{Number of aircraft}} \times 100\% \\
\text{AC} &= \frac{\text{Number of detected aircraft}}{\text{Number of aircraft}} \times 100\% \\
\text{ER} &= \text{FDR} + \text{MR}
\end{aligned}
\tag{6}
\]
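These criteria can be recomputed from the raw counts in Table 3; a sketch (function and argument names are ours):

```python
def detection_metrics(n_true, n_detected, n_missing, n_false):
    """Evaluation criteria of (6): FDR over the detected aircraft,
    MR and AC over the true aircraft count, ER = FDR + MR (all in %)."""
    fdr = 100.0 * n_false / n_detected
    mr = 100.0 * n_missing / n_true
    ac = 100.0 * n_detected / n_true
    return fdr, mr, ac, fdr + mr

# Sydney International Airport (Table 3): 78 true, 76 detected, 2 missed,
# 5 false -> FDR 6.58, MR 2.56, AC 97.44, ER 9.14, matching Table 4.
```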

The comparison results show that the false detection rate of the CCNN method is, on average, 6.4% lower than that of the LOGNet-C method, and the missing detection rate is on average 2.31% lower. Overall, the detection accuracy is higher by an average of 3.66%.

The test results showed that small aircraft are more prone to missed detection, while long-shaped buildings and facilities are more prone to false detection, as shown in Figure 10. This phenomenon is related to the choice of training



Figure 7: Test remote sensing images for aircraft detection. (a) Berlin Tegel Airport. (b) Sydney International Airport. (c) Tokyo Haneda Airport.

samples. The small aircraft in the samples are only a minority, so in the transfer-learning stage the detection model did not learn enough features of small aircraft. Long-shaped buildings were detected as aircraft for two reasons: first, these buildings had contour features similar to those of aircraft; second, there were too few similar buildings in the negative samples, so the features contained in the negative samples were insufficient. When



Figure 8: Accuracy and loss changes of the two models during the fine-tuning process of image classification and object detection. (a) Accuracy of the image classification model. (b) Total loss of the object detection model, which is the weighted sum of loss_cls and loss_bbox. (c) Location loss of the object detection model. (d) Classification loss of the object detection model.

increasing the number of training samples of these types and retraining the model, the target detection accuracy increased. The analysis shows that a training sample dataset should not only contain all types of targets but should also balance the number of each target type. The negative samples should contain as much background information as possible, especially objects that have a texture, color, geometry, or other features similar to those of aircraft.

All false detections occurred within the airports. Building-intensive areas are prone to false detection; however, none appeared there. This was due to the contribution of the first-level cascade network, which filtered out nontarget areas and did not feed them into the aircraft detection network. Missed detections primarily occurred in the airport interior, and no aircraft were filtered out by the first-level network. The main cause of the missed detections was that the classification scores were not high, because the network did not extract an adequate number of features.

6. Conclusions

In the framework of deep learning, a rapid large-area remote sensing image aircraft detection method, CCNN, was proposed. Using the transfer-learning method, the parameters of an existing target detection model were fine-tuned with a small number of samples, and a high-accuracy aircraft detection model was obtained. Based on the geometric feature invariance of aircraft in remote sensing images, a region proposal processing algorithm, GFC, was proposed to improve the efficiency. Based on the above methods, a cascade network architecture was designed for aircraft detection in large-area remote sensing images. This method not only can directly handle large-area remote sensing images for



Figure 9: Aircraft detection results for the three test images, including the global images and locally enlarged images. Detection results for (a) Berlin Tegel Airport, (b) Sydney International Airport, and (c) Tokyo Haneda Airport.


Figure 10: Commission and omission errors. The purple rectangles are correctly detected aircraft, the green rectangles are falsely detected aircraft, and aircraft without a rectangle are missed aircraft. (a) and (b) show falsely detected aircraft, predominantly long-shaped buildings and facilities; (c) and (d) show detection-missed aircraft, predominantly small aircraft.


aircraft detection but also overcomes the difficulty of training a high-accuracy model with a small number of samples.

The experimental results showed that the detection accuracy increased by an average of 3.66%, the false detection rate decreased by an average of 6.4%, and the missed detection rate decreased by an average of 2.31%. However, some of the algorithms warrant improvement. The target detection threshold is set empirically and has some limitations; in future work, we intend to address the needs of the detection task and make the system set the threshold value automatically. In the present work, we focused on aircraft detection. If the target set is extended to other typical targets, such as ships, tanks, and vehicles, this method will have a larger application scope.

The failures in target detection in this experiment may have occurred for the following reasons. First, the number of small aircraft in our sample is insufficient, and the sample numbers of the various types of aircraft are not balanced, which resulted in inadequate learning of the features of small aircraft during training. Second, the choice of target threshold must be appropriate. When detection is nearly complete, each region proposal is scored: when this score is greater than the threshold, the target is marked and displayed; when the score is less than the threshold, the target is not marked and is instead hidden as background. There is a close relationship between the threshold value and the detection accuracy, especially false positives and false negatives. When the threshold value is too large, false positives decrease but false negatives increase; when it is too small, false positives increase but false negatives decrease. Currently, we set our threshold value empirically and plan to adopt an adaptive threshold method in the future.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

The authors would like to thank Dr. Changjian Qiao for his excellent technical support; without his instructive guidance, impressive kindness, and patience, they could not have completed this work. They would also like to thank Dr. Huijin Yang and Yuan Zhao for critically reviewing the manuscript.

References

[1] G. Cheng and J. Han, "A survey on object detection in optical remote sensing images," ISPRS Journal of Photogrammetry and Remote Sensing, vol. 117, pp. 11–28, 2016.

[2] Y. Li, X. Sun, H. Wang, H. Sun, and X. Li, "Automatic target detection in high-resolution remote sensing images using a contour-based spatial model," IEEE Geoscience and Remote Sensing Letters, vol. 9, no. 5, pp. 886–890, 2012.

[3] X. Bai, H. Zhang, and J. Zhou, "VHR object detection based on structural feature extraction and query expansion," IEEE Transactions on Geoscience and Remote Sensing, vol. 52, no. 10, pp. 6508–6520, 2014.

[4] L. Gao, B. Yang, Q. Du, and B. Zhang, "Adjusted spectral matched filter for target detection in hyperspectral imagery," Remote Sensing, vol. 7, no. 6, pp. 6611–6634, 2015.

[5] E. Fakiris, G. Papatheodorou, M. Geraga, and G. Ferentinos, "An automatic target detection algorithm for swath sonar backscatter imagery using image texture and independent component analysis," Remote Sensing, vol. 8, no. 5, article 373, 2016.

[6] W. Zhang, X. Sun, K. Fu, C. Wang, and H. Wang, "Object detection in high-resolution remote sensing images using rotation invariant parts based model," IEEE Geoscience and Remote Sensing Letters, vol. 11, no. 1, pp. 74–78, 2014.

[7] H. Sun, X. Sun, H. Wang, Y. Li, and X. Li, "Automatic target detection in high-resolution remote sensing images using spatial sparse coding bag-of-words model," IEEE Geoscience and Remote Sensing Letters, vol. 9, no. 1, pp. 109–113, 2012.

[8] X. Sun, H. Wang, and K. Fu, "Automatic detection of geospatial objects using taxonomic semantics," IEEE Geoscience and Remote Sensing Letters, vol. 7, no. 1, pp. 23–27, 2010.

[9] R. M. Willett, M. F. Duarte, M. A. Davenport, and R. G. Baraniuk, "Sparsity and structure in hyperspectral imaging: sensing, reconstruction, and target detection," IEEE Signal Processing Magazine, vol. 31, no. 1, pp. 116–126, 2014.

[10] F. Gao, Q. Xu, and B. Li, "Aircraft detection from VHR images based on circle-frequency filter and multilevel features," The Scientific World Journal, vol. 2013, Article ID 917928, 7 pages, 2013.

[11] I. Ševo and A. Avramović, "Convolutional neural network based automatic object detection on aerial images," IEEE Geoscience and Remote Sensing Letters, vol. 13, no. 5, pp. 740–744, 2016.

[12] L. Zhang, G.-S. Xia, T. Wu, L. Lin, and X. C. Tai, "Deep learning for remote sensing image understanding," Journal of Sensors, vol. 2016, Article ID 7954154, 2 pages, 2016.

[13] R. Girshick, J. Donahue, T. Darrell, and J. Malik, "Region-based convolutional networks for accurate object detection and segmentation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, no. 1, pp. 142–158, 2016.

[14] E. Pasolli, F. Melgani, N. Alajlan, and N. Conci, "Optical image classification: a ground-truth design framework," IEEE Transactions on Geoscience and Remote Sensing, vol. 51, no. 6, pp. 3580–3597, 2013.

[15] R. Girshick, "Fast R-CNN," in Proceedings of the 15th IEEE International Conference on Computer Vision (ICCV '15), pp. 1440–1448, Santiago, Chile, December 2015.

[16] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: towards real-time object detection with region proposal networks," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 6, pp. 1137–1149, 2017.

[17] W. Liao, A. Pizurica, P. Scheunders, W. Philips, and Y. Pi, "Semisupervised local discriminant analysis for feature extraction in hyperspectral images," IEEE Transactions on Geoscience and Remote Sensing, vol. 51, no. 1, pp. 184–198, 2013.

[18] D. Zhang, J. Han, G. Cheng, Z. Liu, S. Bu, and L. Guo, "Weakly supervised learning for target detection in remote sensing images," IEEE Geoscience and Remote Sensing Letters, vol. 12, no. 4, pp. 701–705, 2014.

[19] T. Deselaers, B. Alexe, and V. Ferrari, "Weakly supervised localization and learning with generic knowledge," International Journal of Computer Vision, vol. 100, no. 3, pp. 275–293, 2012.

[20] F. Zhang, B. Du, L. Zhang, and M. Xu, "Weakly supervised learning based on coupled convolutional neural networks for aircraft detection," IEEE Transactions on Geoscience and Remote Sensing, vol. 54, no. 9, pp. 5553–5563, 2016.

[21] J. Han, D. Zhang, G. Cheng, L. Guo, and J. Ren, "Object detection in optical remote sensing images based on weakly supervised learning and high-level feature learning," IEEE Transactions on Geoscience and Remote Sensing, vol. 53, no. 6, pp. 3325–3337, 2015.

[22] X. Chen, S. Xiang, C.-L. Liu, and C.-H. Pan, "Vehicle detection in satellite images by hybrid deep convolutional neural networks," IEEE Geoscience and Remote Sensing Letters, vol. 11, no. 10, pp. 1797–1801, 2014.

[23] G. Cheng, J. Han, and X. Lu, "Remote sensing image scene classification: benchmark and state of the art," Proceedings of the IEEE, vol. 99, pp. 1–19, 2017.

[24] S. Paisitkriangkrai, J. Sherrah, P. Janney, and A. Van-Den Hengel, "Effective semantic pixel labelling with convolutional networks and conditional random fields," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW '15), pp. 36–43, June 2015.

[25] O. A. B. Penatti, K. Nogueira, and J. A. Dos Santos, "Do deep features generalize from everyday objects to remote sensing and aerial scenes domains?" in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW '15), pp. 44–51, June 2015.

[26] X. Yao, J. Han, G. Cheng, X. Qian, and L. Guo, "Semantic annotation of high-resolution satellite images via weakly supervised learning," IEEE Transactions on Geoscience and Remote Sensing, vol. 54, no. 6, pp. 3660–3671, 2016.

[27] C. Tao, Y. Tan, H. Cai, and J. Tian, "Airport detection from large IKONOS images using clustered SIFT keypoints and region information," IEEE Geoscience and Remote Sensing Letters, vol. 8, no. 1, pp. 128–132, 2011.

[28] O. Aytekin, U. Zongur, and U. Halici, "Texture-based airport runway detection," IEEE Geoscience and Remote Sensing Letters, vol. 10, no. 3, pp. 471–475, 2013.

[29] X. Yao, J. Han, L. Guo, S. Bu, and Z. Liu, "A coarse-to-fine model for airport detection from remote sensing images using target-oriented visual saliency and CRF," Neurocomputing, vol. 164, pp. 162–172, 2015.

[30] G. Cheng, P. Zhou, and J. Han, "Learning rotation-invariant convolutional neural networks for object detection in VHR optical remote sensing images," IEEE Transactions on Geoscience and Remote Sensing, vol. 54, no. 12, pp. 7405–7415, 2016.

[31] X. Yao, J. Han, D. Zhang, and F. Nie, "Revisiting co-saliency detection: a novel approach based on two-stage multi-view spectral rotation co-clustering," IEEE Transactions on Image Processing, vol. 26, no. 7, pp. 3196–3209, 2017.

[32] L. Cao, C. Wang, and J. Li, "Vehicle detection from highway satellite images via transfer learning," Information Sciences, vol. 366, pp. 177–187, 2016.

[33] S. Gupta, S. Rana, B. Saha, D. Phung, and S. Venkatesh, "A new transfer learning framework with application to model-agnostic multi-task learning," Knowledge and Information Systems, vol. 49, no. 3, pp. 933–973, 2016.

[34] A. Van Opbroek, M. A. Ikram, M. W. Vernooij, and M. De Bruijne, "Transfer learning improves supervised image segmentation across imaging protocols," IEEE Transactions on Medical Imaging, vol. 34, no. 5, pp. 1018–1030, 2015.

[35] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," Computer Science, 2014.

[36] W. Zhao, Z. Guo, J. Yue, X. Zhang, and L. Luo, "On combining multiscale deep learning features for the classification of hyperspectral remote sensing imagery," International Journal of Remote Sensing, vol. 36, no. 13, pp. 3368–3379, 2015.

[37] Q. Zou, L. Ni, T. Zhang, and Q. Wang, "Deep learning based feature selection for remote sensing scene classification," IEEE Geoscience and Remote Sensing Letters, vol. 12, no. 11, pp. 2321–2325, 2015.

[38] Y. Yang and S. Newsam, "Bag-of-visual-words and spatial extensions for land-use classification," in Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems (GIS '10), pp. 270–279, ACM, San Jose, Calif, USA, November 2010.

[39] X. Wang, Q. Lv, B. Wang, and L. Zhang, "Airport detection in remote sensing images: a method based on saliency map," Cognitive Neurodynamics, vol. 7, no. 2, pp. 143–154, 2013.

[40] Y. Jia, E. Shelhamer, J. Donahue et al., "Caffe: convolutional architecture for fast feature embedding," in Proceedings of the ACM International Conference on Multimedia, pp. 675–678, ACM, Orlando, Fla, USA, November 2014.


Page 4: Cascade Convolutional Neural Network Based on Transfer ...downloads.hindawi.com/journals/js/2017/1796728.pdf · ResearchArticle Cascade Convolutional Neural Network Based on Transfer-Learning

4 Journal of Sensors

Figure 2: Process of deleting duplicate detected regions and overlapping regions between sliding windows. (a) Result before deleting the duplicates. (b) Result after deleting the duplicates.

Figure 3: Structure of transfer-learning for scene classification and target detection. Source tasks are the original deep learning methods of classification and detection, and the knowledge base is comprised of the trained networks. The learning system involves transfer-learning and fine-tuning, and the target tasks are remote sensing image scene classification and aircraft detection.

shown in Figure 2, in which the two aircraft at the upper left lie in overlapping windows. The incomplete regions are deleted, and the most appropriate one is preserved:

$$\operatorname{area}(r_i \cap r_j) \ge 30\%, \tag{1}$$

$$\text{Region} = \max \operatorname{area}(r_i). \tag{2}$$
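The paper gives no pseudocode for this step, so the following is a minimal sketch of one plausible reading of (1)–(2): boxes are assumed to be (x, y, w, h) tuples, and the overlap in (1) is measured against the smaller region's area. The helper names are ours, not from the authors' code.

```python
def overlap_ratio(r1, r2):
    """Fraction of the smaller box covered by the intersection of two
    axis-aligned boxes given as (x, y, w, h)."""
    x1 = max(r1[0], r2[0])
    y1 = max(r1[1], r2[1])
    x2 = min(r1[0] + r1[2], r2[0] + r2[2])
    y2 = min(r1[1] + r1[3], r2[1] + r2[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    smaller = min(r1[2] * r1[3], r2[2] * r2[3])
    return inter / smaller if smaller else 0.0

def dedupe(regions, thresh=0.30):
    """Per Eqs. (1)-(2): of any group of regions overlapping by >= thresh,
    keep only the one with the largest area."""
    kept = []
    for r in sorted(regions, key=lambda r: r[2] * r[3], reverse=True):
        if all(overlap_ratio(r, k) < thresh for k in kept):
            kept.append(r)
    return kept
```
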

3.2. Transfer-Learning and Fine-Tuning the Model Parameters

Transfer-learning refers to a model training method in which the parameters of a pretrained network model are fine-tuned with a small number of samples to prompt the original model to adapt to a new task [32, 33]. Deep learning object detection methods can achieve very high detection accuracy. One reason is that a large number of samples are used to train the detection model, which therefore contains a sufficient number of target features. However, for remote sensing images, no large sample library such as ImageNet exists that can be used to train a high-accuracy model. Moreover, training a new high-accuracy object detection model consumes considerable time and computing resources. Transfer-learning can solve the problem of lacking samples and significantly shorten the training time [34].

In this paper, the transfer-learning method is used to fine-tune the pretrained image classification model and the pretrained object detection model, respectively. As shown in Figure 3, the source models are the respective classification and object detection models. Both use the VGG16 network structure [35] and were pretrained with ImageNet; they were then fine-tuned with the samples given in Section 4.1. The parameters of the original models were fine-tuned by the learning system to complete the tasks of scene classification and aircraft detection.

The CNN structure comprises many layers. The bottom layers extract generic features such as structure, texture, and color; the top layers contain specific features associated with the target detection task. Using transfer-learning, the top-layer parameters are adjusted with a small number of aircraft samples to change the target detection task, and through a backpropagation algorithm the bottom-layer parameters are adjusted to extract aircraft features. In general, we use softmax as the supervisory loss function, which is a classic cross-entropy classifier. It is defined in (3), where $f(x_i)$ is the activation of the previous-layer node pushed through a softmax function. In (4), the target output $y_i$ is a 1-of-$K$ vector, where $K$ is the number of outputs; $L$ is the number of layers, and $\lambda$ is a regularization term. The goal is to minimize the loss. Finally, stochastic gradient descent is used to fine-tune the model, and backpropagation is used to optimize the function:

$$f(x_i) = \operatorname{softmax}(x_i) = \frac{e^{x_i}}{\sum_{j=1}^{N} e^{x_j}}, \tag{3}$$

$$\text{loss} = -\sum_{i=1}^{m} y_i \log f(x_i) + \lambda \sum_{l=1}^{L} \sum \left\| w_l \right\|^2. \tag{4}$$
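Equations (3)–(4) can be written out numerically. This is a minimal NumPy sketch of the softmax cross-entropy with an L2 weight penalty; the function and variable names are ours, not from the paper's code.

```python
import numpy as np

def softmax(x):
    # Subtract the max for numerical stability; this does not change the result.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def loss(logits, y_onehot, weights, lam=1e-4):
    """Cross-entropy of the softmax outputs plus an L2 penalty over all layer
    weight arrays, mirroring Eqs. (3)-(4)."""
    f = softmax(logits)
    ce = -np.sum(y_onehot * np.log(f))
    l2 = lam * sum(np.sum(w ** 2) for w in weights)
    return ce + l2
```
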

Another important parameter is the learning rate. The parameters of the pretrained model are already well tuned; thus, the network-layer learning rate should be set to a small value. A small learning rate ensures both that the features of the new samples can be learned and that the good parameters of the pretrained model are maintained. Taking Faster R-CNN as an example, the parameters that must be set are the following. (1) Initialize the training network by selecting the pretrained model: "train_net" is set to VGG16.v2.caffemodel. (2) Set the number of categories. In this paper, all aircraft types are considered the same type; thus, there are only two categories, aircraft and background. In the data layer, num_classes is set to two; in the cls_score layer, num_output is set to two; and in the bbox_pred layer, num_output is set to eight. (3) Set the learning rate: the base learning rate base_lr is set to 0.001. The learning rate multiplier lr_mult of the cls_score layer and the bbox_pred layer is set to five and ten, respectively; the learning rates of the other convolutional layers are not modified. (4) Set the number of iterations of the backpropagation function: max_iters is set to 40000.
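The layer-wise learning-rate scheme in step (3) amounts to a simple multiplier table: each layer's effective rate is the base rate times its multiplier, as in Caffe's base_lr/lr_mult convention. A sketch using the values quoted above (the helper name is ours):

```python
# Per-layer learning-rate multipliers as described in Section 3.2. Layer names
# follow the Faster R-CNN/Caffe convention (cls_score, bbox_pred); any layer
# not listed keeps the default multiplier of 1.
BASE_LR = 0.001
LR_MULT = {"cls_score": 5, "bbox_pred": 10}

def effective_lr(layer_name):
    """Effective learning rate applied to a layer during fine-tuning."""
    return BASE_LR * LR_MULT.get(layer_name, 1)
```

The point of the multipliers is that the new, task-specific layers (cls_score, bbox_pred) move faster than the pretrained convolutional layers, which are only nudged.
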

3.3. Geometric Feature Constraint

In a remote sensing image, the aircraft cover only a small area; the remaining area is background. The region proposals are therefore mostly background, not aircraft. Reducing the number of region proposals makes object detection more targeted, which helps improve efficiency [16]. The special imaging geometry of remote sensing means that only the top features of an aircraft are visible in the image [36]; thus, the geometric features of aircraft in the image are highly stable [37]. This property can be exploited when designing a detection method. When the resolution of the image is fixed, the geometric parameters of each aircraft are constant. Therefore, the geometric features can be used as important conditions for aircraft detection.

The geometric features mentioned in this paper mainly refer to the external contour sizes of aircraft in remote sensing images. It is possible to use geometric signature constraints for detection because the following three conditions are met. First, conventional remote sensing imaging employs nadir images; that is, the sensor images the ground from a perpendicular (or near-perpendicular) viewpoint. Thus, the top contour features of aircraft are seen in these remote sensing images, and these features (length, width, and aspect ratio) are relatively stable constants. Although the length and width are affected by image resolution, they can be calculated from the resolution of the image, whereas the aspect ratio is not affected by resolution and is more fixed. Second, the length, width, wingspan, and body-length ratio of different types of aircraft are all within a certain range. These geometric features give us an exact basis for calculating region proposals. Third, statistical methods were used to count the geometric features of many aircraft samples. Samples were collected from large, medium, and small airports in the world's major countries, covering almost all types of aircraft, including large passenger aircraft, fighters, small private aircraft, and rotor helicopters. These statistical results are used as a reference for the geometric feature constraints.

The image resolution and the aircraft geometry statistics are used to calculate the size of the sliding window. The conditions that need to be met are as follows. The minimum overlap between the windows must be greater than the maximum size of the aircraft to ensure that no aircraft will be missed. The number of aircraft in a sliding window is very small: a window contains about 2000–3000 region proposals, but aircraft are probably present in only a few. The number of target region proposals is thus much smaller than the number of nontarget region proposals, and having many nontarget areas increases the overall time consumption. With GFCs, deleting candidate boxes that do not satisfy the constraints can greatly improve the time efficiency of target detection.
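The window-placement rule above (inter-window overlap greater than the largest aircraft extent) can be sketched as follows. The window size of 1000 px and overlap of 300 px are illustrative assumptions, not values from the paper; at 0.3 m resolution, 300 px comfortably exceeds the 300-pixel maximum aircraft dimension would require adjusting upward in practice.

```python
def windows(img_w, img_h, win=1000, overlap=300):
    """Top-left corners of sliding windows over an img_w x img_h image.
    The overlap between adjacent windows must exceed the largest aircraft
    extent so that every aircraft appears whole in at least one window."""
    step = win - overlap
    xs = list(range(0, max(img_w - win, 0) + 1, step))
    ys = list(range(0, max(img_h - win, 0) + 1, step))
    # Make sure the right and bottom edges of the image are covered.
    if xs[-1] + win < img_w:
        xs.append(img_w - win)
    if ys[-1] + win < img_h:
        ys.append(img_h - win)
    return [(x, y) for y in ys for x in xs]
```
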

Taking the ground truth of the aircraft in the samples as an example, their lengths and widths are within a fixed range. The original 2511 sample images were used to analyze these parameters, and the results are shown in Figure 4.

Figure 4(a) shows that the aspect ratio of the circumscribed rectangle is primarily around 1.0, with a range of (0.6, 1.6). Figure 4(b) shows that the width and height of the circumscribed rectangle are mainly around 140 pixels, with a range of (50, 300). These values should be appropriately enlarged during aircraft detection to ensure that no aircraft are filtered out on account of statistical errors. The geometric features of the aircraft in the remote sensing image have rotation invariance, and the aspect ratio stays within a fixed range even at different resolutions; thus, it can be used as a constraint to roughly distinguish between target and background. $(r_1, r_2)$, $(w_1, w_2)$, and $(h_1, h_2)$ represent the ranges of aspect ratio, width, and height, respectively.

For the same image, reducing the number of nontarget region proposals has no effect on the target detection accuracy (in deep learning, this accuracy means the classification score); however, it can improve the detection efficiency. Based on the GFC, a region proposal filtering method is proposed: $(r_1, r_2)$, $(w_1, w_2)$, and $(h_1, h_2)$ are set as thresholds to filter the region proposals. The main algorithm is shown in (5), where RegionProposal = 1 means the region proposal is preserved and RegionProposal = 0 means it is deleted. Here, ratio represents the target aspect ratio, width represents the target width, height denotes the target height, and $\alpha$ is the image resolution coefficient, which is used to adjust the thresholds at different resolutions:

$$\text{RegionProposal} = \begin{cases} 1, & r_1 \le \text{ratio} \le r_2 \\ 0, & \text{ratio} < r_1 \text{ or } \text{ratio} > r_2 \end{cases}$$


Figure 4: Geometric features of all of the sample aircraft. (a) Aspect ratio of each sample aircraft: the ratios are approximately 1.0, and the range is (0.6, 1.6). The color in Figure 4(a) represents the value of the aspect ratio: blue is the smallest value, red is the largest value, and the middle transitions gradually. (b) Size of each sample aircraft: red is the width, and green is the height. The sizes are approximately 140 pixels, and the range is (50, 300).

$$\text{RegionProposal} = \begin{cases} 1, & \alpha w_1 \le \text{width} \le \alpha w_2 \\ 0, & \text{width} < \alpha w_1 \text{ or } \text{width} > \alpha w_2 \end{cases}$$

$$\text{RegionProposal} = \begin{cases} 1, & \alpha h_1 \le \text{height} \le \alpha h_2 \\ 0, & \text{height} < \alpha h_1 \text{ or } \text{height} > \alpha h_2 \end{cases} \tag{5}$$
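A direct implementation of the Eq. (5) filter might look like the sketch below, using the statistical ranges from Figure 4 as defaults; the function names and the proposal tuple layout are our assumptions.

```python
def gfc_keep(w, h, alpha=1.0,
             ratio_rng=(0.6, 1.6), w_rng=(50, 300), h_rng=(50, 300)):
    """True if a proposal of width w and height h (pixels) satisfies all three
    geometric constraints of Eq. (5); alpha rescales the pixel ranges for a
    different image resolution."""
    ratio = w / h
    return (ratio_rng[0] <= ratio <= ratio_rng[1]
            and alpha * w_rng[0] <= w <= alpha * w_rng[1]
            and alpha * h_rng[0] <= h <= alpha * h_rng[1])

def gfc_filter(proposals, **kw):
    """proposals: iterable of (x, y, w, h); keep only those passing the GFC."""
    return [p for p in proposals if gfc_keep(p[2], p[3], **kw)]
```

Because the check is a few comparisons per box, it prunes thousands of background proposals per window at negligible cost before the detector scores them.
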

4. Materials

In this study, we constructed three kinds of datasets: scene, aircraft sample, and test datasets. All data were obtained from Google Earth. The scene dataset was from the 18th zoom level, with a spatial resolution of 1.2 m; the remaining two datasets were from the 20th level, with a spatial resolution of 0.3 m. From a professional point of view, compared with real remote sensing images, the data obtained from Google Earth contain insufficient information. However, the deep learning method differs from traditional remote sensing image processing methods in that it can obtain rich information from this image type. We advocate the use of real remote sensing images; nonetheless, one could attempt to use Google Earth images.

4.1. Training Sample Datasets

Creating training samples is a key step in object detection in the field of deep learning; the sample quality directly affects the test results. In the CCNN structure, the first-level network uses the CNN image classification model, and the second-level network uses the Faster R-CNN object detection model. The two networks have different structures and functions; therefore, we created two sample datasets, one for scene classification and the other for aircraft detection.

The first dataset was the remote sensing image scene dataset, which was used to train the CNN classification model. We used the "UC Merced Land Use Dataset" as the basic dataset [38]. It contains 21 scene categories and includes many typical background categories, such as agriculture, airplanes, beaches, buildings, bushes, and parking lots. This dataset was suitable for scene classification; however, there were only 100 images in each category. To obtain a high-accuracy training model, it was necessary to increase the sample number of airports and of other scenes that are easily confused with airports, including industrial areas, residential areas, harbors, overpasses, parking lots, highways, runways, and storage tanks. Another 300 samples of airport images and 700 samples of other scene categories were added to the dataset; the number of scene categories remained 21. The sample number could be expanded to eight times the original by image transformation. Finally, the total number of samples was 24800. Some added images are shown in Figure 5, where (a) depicts an airport scene and (b) depicts scenes that are easily confused with airports.

The second dataset was an aircraft sample library of remote sensing images, which was used to train the Faster R-CNN network model. It contained 2511 original aircraft images, some of which are shown in Figure 6. The samples contained not only all aircraft types but also as much background as possible. The diversity of the samples could help the network


Figure 5: Sample images of different scenes. (a) Airport scene, including a parking apron, runway, and general terminal. (b) Scenes easily confused with an airport, including industrial areas, residential areas, harbors, overpasses, parking lots, highways, runways, and storage tanks.

Figure 6: Training sample dataset of aircraft in remote sensing images. (a) Original training sample images. (b) Corresponding labeled ground truth images.

learn different features and better distinguish the target and background. As shown in Figure 6(a), the samples cover almost all types of aircraft in existing remote sensing images, including airliners, small planes, military aircraft, and helicopters.

We obtained the images from the zenith point to the nadir point; this unique imaging perspective means the aircraft in the remote sensing images show a fixed geometric form, and only the top of each aircraft is visible. Therefore, the direction and displacement of the aircraft were two important factors affecting the expression of the target. To ensure that the target had a higher recognition rate in different directions and displacements, we applied transformations to the samples, such as a horizontal flip, a vertical flip, and rotations of 90°, 180°, and 270°. These transformations not only increased the number of samples but also varied the direction and structure of the targets, increasing the sample diversity and making the model more robust. After transformation, the sample contained a total of 20088 images. These images were divided into three groups of different sizes: Plane_S contained 2511 images, Plane_M contained 10044 images, and Plane_L contained 20088 images.
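The 8× expansion corresponds to the eight flip/rotation symmetries of a square image (the dihedral group of order eight). A NumPy sketch of generating all eight variants; the helper is ours, not the authors' code:

```python
import numpy as np

def augment8(img):
    """Return the eight flip/rotation variants of an image array: the four
    90-degree rotations plus the horizontal mirror of each, matching the
    8x sample expansion described in the text."""
    out = []
    for k in range(4):              # rotations of 0, 90, 180, and 270 degrees
        r = np.rot90(img, k)
        out.append(r)
        out.append(np.fliplr(r))    # plus the left-right flip of each rotation
    return out
```

Note 2511 × 8 = 20088, which matches the Plane_L count, so the three groups plausibly correspond to using one, four, or all eight variants per original image.
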

Labeling the images is an important part of creating a training sample dataset: the label is the basis for distinguishing between aircraft and background. The conventional approach is to create positive and negative samples independently. In this paper, we produce samples differently: the positive and negative samples are included in the same image. The advantage of this approach is that the target is better distinguished from the background, which can improve the detection accuracy. For example, the trestle that connects the airplane and terminal is prone to false detection; if the airplane and trestle are contained in the same image, the target detection network can better learn their differences. As shown in Figure 6(b), the aircraft position is manually


Table 1: Detailed information of the three airports.

Test image                     Image size (pixels)   Aircraft number   Covered area (km²)
Berlin Tegel Airport           22016 × 9728          35                19.1
Sydney International Airport   13568 × 22272         78                26.9
Tokyo Haneda Airport           20480 × 17920         94                32.7

Table 2: Results of transfer-learning on different training networks and different sample sizes.

Training network   Dataset   Time (min)   Per iteration (s)   Loss_bbox   Loss_cls
CaffeNet           Plane_S   64           0.061               0.055       0.087
CaffeNet           Plane_M   241          0.061               0.051       0.076
CaffeNet           Plane_L   837          0.062               0.047       0.025
VGG16              Plane_S   320          0.446               0.032       0.084
VGG16              Plane_M   497          0.446               0.029       0.044
VGG16              Plane_L   976          0.453               0.019       0.028

marked in the image, and an XML annotation file is generated in which basic information about the image and the aircraft is automatically saved.
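As a sketch of this annotation step, a Pascal-VOC-style XML file (the format the public Faster R-CNN code consumes) could be generated as below. The paper does not specify the exact fields saved, so this layout and the field names are assumptions following the VOC convention.

```python
import xml.etree.ElementTree as ET

def make_annotation(filename, img_w, img_h, boxes):
    """Build a Pascal-VOC-style XML annotation string; boxes are
    (xmin, ymin, xmax, ymax) tuples for each marked aircraft."""
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = filename
    size = ET.SubElement(root, "size")
    ET.SubElement(size, "width").text = str(img_w)
    ET.SubElement(size, "height").text = str(img_h)
    for (xmin, ymin, xmax, ymax) in boxes:
        obj = ET.SubElement(root, "object")
        # All aircraft types share one class label, as described in Section 3.2.
        ET.SubElement(obj, "name").text = "aircraft"
        bb = ET.SubElement(obj, "bndbox")
        for tag, v in zip(("xmin", "ymin", "xmax", "ymax"),
                          (xmin, ymin, xmax, ymax)):
            ET.SubElement(bb, tag).text = str(v)
    return ET.tostring(root, encoding="unicode")
```
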

4.2. Test Dataset

Three large-area images were employed for the test. Each image contained a complete airport, and the covered areas were 19.1 km², 26.9 km², and 32.7 km², respectively. The test images had the characteristics of a large area, a diversity of ground cover types, and complex backgrounds. In addition to the airport, there were residential areas, harbors, industrial areas, woodlands, and grasslands in the images; the types of objects were very rich, and the environment was complex. Details of the three images are shown in Figure 7 and outlined in Table 1.

Aircraft detection and airport detection are two different research topics [39]. The focus of this paper is aircraft detection in high-resolution remote sensing images: our objective is not to detect the airport but to directly detect the aircraft. Although the test images each center on an airport, the experimental data did not cover only the airport area; in fact, when the images are enlarged, it is evident that they also contain harbors, industrial zones, and residential areas.

5. Results and Discussion

In this paper, the full implementation was based on the open-source deep learning library Caffe [40]. The experiments were run using an Nvidia Titan X (12 GB) graphics-processing unit (GPU). The network models used in the experiments were VGG16 models pretrained with ImageNet.

5.1. Training the Detection Model

The two VGG16 pretrained models were fine-tuned using the two sample datasets presented in Section 4.1. The evaluation of the training results is shown in Figure 8. As shown in Figure 8(a), the image classification accuracy quickly improves during the first 10000 iterations and reaches a steady value of 0.9 at 40000 iterations. Figure 8(b) depicts the total loss of the object detection model, which reaches 0.1 at 40000 iterations. Figures 8(c) and 8(d) show the location and classification losses, respectively, with both reaching values of less than 0.1. These excellent transfer-learning results provided a good experimental basis and were used for the next step.

The transfer-learning results are directly related to the accuracy of the subsequent aircraft detection process. Using transfer-learning, a high-accuracy aircraft detection model could be obtained with a small number of training samples and a brief training time. Here we use CaffeNet and VGG16 as examples; the number of iterations was 40000. Four indicators were used to analyze the results: total training time (Time), single-iteration time (Per iteration), position accuracy (Loss_bbox), and classification accuracy (Loss_cls). The transfer-learning results are shown in Table 2. Every combination produced good training results; the model with more training samples has higher accuracy than the model with fewer samples, and the complex model VGG16 has higher accuracy than the simple model CaffeNet.

Table 2 shows that the accuracy of target classification (Loss_cls) improves with the number of samples; Plane_L improves by approximately 60% compared to Plane_S. The position accuracy (Loss_bbox) shows the same trend, although less pronounced. This result proves that the method of producing samples proposed in Section 4.1 is effective: the image transformation method increases the sample number, enhances the robustness of the samples, improves the ability of transfer-learning, and enhances the robustness and accuracy of the detection model. However, the target detection accuracy remains affected by the number of samples; when the number of samples is large, the target detection accuracy is relatively high, but the training time is correspondingly longer.

CaffeNet is a small network, whereas VGG16 is a large network whose structure is more complex than CaffeNet's. As the complexity of the network increases, the position accuracy also increases; the large network improves by approximately 40% over the small network. Within the same network, the training time increases with the number of samples and with the network complexity. The results show that transfer-learning is an effective high-accuracy model training method, and it can


Table 3: Large-area image aircraft detection results.

Test image                     True aircraft   Detected aircraft   Missing detection   False detection
Berlin Tegel Airport           35              35                  0                   2
Sydney International Airport   78              76                  2                   5
Tokyo Haneda Airport           94              91                  3                   4

Table 4: Detailed evaluation criteria and comparison for the detection results.

Test image                     Method     FDR (%)   MR (%)   AC (%)   ER (%)
Berlin Tegel Airport           CCNN       5.71      0        100      5.71
Berlin Tegel Airport           LOGNet-C   7.69      3.23     96.77    10.92
Sydney International Airport   CCNN       6.58      2.56     97.44    9.14
Sydney International Airport   LOGNet-C   48.54     10.87    89.13    59.41
Tokyo Haneda Airport           CCNN       4.39      3.19     96.81    7.58
Tokyo Haneda Airport           LOGNet-C   20.80     1.54     98.46    22.34

realize the task of training a high-accuracy target detection model with fewer samples.

5.2. Aircraft Detection

The three large-area test images were used to verify the aircraft detection accuracy; the results are shown in Figure 9. Based on the global and local details in the figure, the overall detection result is good. Most of the aircraft in the airport areas were detected; the detection-missed aircraft were all small aircraft, with no missed detection of large aircraft. Moreover, there were no falsely detected aircraft in industrial areas, residential areas, woodlands, grasslands, or other areas beyond the airports. However, there were individual falsely detected targets within the airports, primarily airport trestles and terminals. The quantitative indicators are shown in Table 3, in which "True" indicates the number of real aircraft in the original image, "Detected" the number of aircraft detected, "Missing" the number of detection-missed aircraft, and "False" the number of falsely detected aircraft.

The two-level CCNN aircraft detection structure could rapidly process large-area remote sensing images with high accuracy; however, commission and omission errors remained. In a complex environment, many objects have features similar to those of aircraft, especially long-shaped buildings and facilities. As listed in Table 3, the false detection rate of each airport is higher than the missing detection rate. Through analysis, we determined that the detection-missed aircraft were not actually lost. In the detection procedure, the classification threshold was set to 0.6; when the classification score of a target was less than 0.6, the target was not marked, and a missing detection appeared. When the threshold was lowered to 0.5, the detection-missed targets were marked and the detection miss rate fell to zero; nevertheless, there were more falsely detected targets, since airport terminals and several buildings with aircraft-like features were then marked as aircraft. Raising the threshold decreases the false detection rate while increasing the detection miss rate; lowering it does the opposite. According to the needs of the detection task, the better choice is to set a suitable threshold that balances the false detection rate against the detection miss rate.

The experimental results in Table 3 show that the proposed CCNN method is feasible. A high-accuracy aircraft detection model can be obtained by using the transfer-learning method to fine-tune a pretrained model, and the GFC method greatly reduces the number of region proposals, directly improving the detection efficiency. The two-level cascade network architecture can detect aircraft in large-area remote sensing images with high accuracy and high efficiency. However, some shortcomings remain; a detailed analysis of the transfer-learning and detection accuracy is outlined below.

The aircraft detection method performance was evaluated by four criteria: falsely detected ratio (FDR), miss ratio (MR), accuracy (AC), and error ratio (ER). Their definitions are given in (6). A remote sensing image aircraft detection method named LOGNet-C, based on weakly supervised learning, is proposed in article [20]. The results of the CCNN method were compared with those of the LOGNet-C method; the four evaluation indicators and comparison results are shown in Table 4.

$$\text{FDR} = \frac{\text{Number of falsely detected aircraft}}{\text{Number of detected aircraft}} \times 100\%,$$

$$\text{MR} = \frac{\text{Number of missing aircraft}}{\text{Number of aircraft}} \times 100\%,$$

$$\text{AC} = \frac{\text{Number of detected aircraft}}{\text{Number of aircraft}} \times 100\%,$$

$$\text{ER} = \text{FDR} + \text{MR}. \tag{6}$$

The comparison results show that the false detection rate of the CCNN method is reduced by an average of 6.4% relative to the LOGNet-C method, and the missing detection rate decreases by an average of 2.31%. Overall, the detection accuracy increases by an average of 3.66%.

The test results showed that small aircraft are more prone to missed detection, while long-shaped buildings and facilities are more prone to false detection, as shown in Figure 10. This phenomenon is related to the choice of training


Figure 7: Test remote sensing images for aircraft detection. (a) Berlin Tegel Airport. (b) Sydney International Airport. (c) Tokyo Haneda Airport.

samples. The small aircraft in the samples are only a minority, so in the transfer-learning stage the detection model did not learn enough features of small aircraft. Long-shaped buildings were detected as aircraft for two reasons: first, these buildings have contour features similar to aircraft; second, there were an inadequate number of similar buildings in the negative samples, so the features contained in the negative samples were insufficient. When


Figure 8: Accuracy and loss changes of the two models during the fine-tuning process of image classification and object detection. (a) Accuracy of the image classification model. (b) Total loss of the object detection model, which was the weighted sum of loss_cls and loss_bbox. (c) Location loss of the object detection model. (d) Classification loss of the object detection model.

increasing the number of training samples of these types and retraining the model, the target detection accuracy increased. The analysis shows that a training sample dataset should not only contain all types of targets but also balance the number of samples of each type. The negative samples should contain as much background information as possible, especially objects whose texture, color, geometry, or other features resemble those of aircraft.

All false detections occurred within the airports. Building-intensive areas are prone to false detection, yet no false detections appeared there; this was on account of the first-level cascade network, which filtered out nontarget areas and did not feed them into the aircraft detection network. Missed detections primarily occurred in the airport interior, and no aircraft were filtered out by the first-level network. The main cause of the missed detections was that the classification scores were not high, because the network did not extract an adequate number of features.

6. Conclusions

Within the framework of deep learning, a rapid large-area remote sensing image aircraft detection method, CCNN, was proposed. Using the transfer-learning method, the parameters of an existing target detection model were fine-tuned with a small number of samples, and a high-accuracy aircraft detection model was obtained. Based on the geometric feature invariance of aircraft in remote sensing images, a region proposal processing algorithm, GFC, was proposed to improve the efficiency. Building on these methods, a cascade network architecture was designed for aircraft detection in large-area remote sensing images. This method not only can directly handle large-area remote sensing images for


Figure 9: Aircraft detection results of the three test images, including the global images and local enlarged images. Detection results of (a) Berlin Tegel Airport, (b) Sydney International Airport, and (c) Tokyo Haneda Airport.

Figure 10: Commission and omission errors. The purple rectangles mark correctly detected aircraft, the green rectangles mark falsely detected aircraft, and aircraft without a rectangle are missed aircraft. (a) and (b) show falsely detected targets, predominantly long-shaped buildings and facilities; (c) and (d) show detection-missed aircraft, predominantly small aircraft.


aircraft detection but also overcomes the difficulty of training a high-accuracy model with a small number of samples.

The experimental results showed that the detection accuracy increased by an average of 3.66%, the false detection rate decreased by an average of 6.4%, and the missed detection rate decreased by an average of 2.31%. However, some of the algorithms warrant improvement. The target detection threshold is set empirically and has some limitations; in future work, we intend to make the system set the threshold automatically according to the needs of the detection task. In the present work, we focused on aircraft detection; if the target set is extended to other typical targets, such as ships, tanks, and vehicles, this method will have a larger application scope.

The failures in target detection in this experiment may have occurred for the following reasons. First, the number of small aircraft in our sample is insufficient, and the sample numbers of the various aircraft types are unbalanced, which resulted in inadequate learning of the features of small aircraft during training. Second, the choice of target threshold must be appropriate. Near the end of detection, each region proposal is scored; when this score is greater than the threshold, the target is marked and displayed, whereas when it is less than the threshold, the target is not marked and is instead hidden as background. There is a close relationship between the threshold value and the detection accuracy, especially the false positives and false negatives: when the threshold is too small, false positives increase, and when it is too large, false negatives increase. Currently, we set our threshold empirically and plan to adopt an adaptive threshold method in the future.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

The authors would like to thank Dr. Changjian Qiao for his excellent technical support; without his instructive guidance, impressive kindness, and patience, they could not have completed this work. They would also like to thank Dr. Huijin Yang and Yuan Zhao for critically reviewing the manuscript.

References

[1] G. Cheng and J. Han, "A survey on object detection in optical remote sensing images," ISPRS Journal of Photogrammetry and Remote Sensing, vol. 117, pp. 11–28, 2016.

[2] Y. Li, X. Sun, H. Wang, H. Sun, and X. Li, "Automatic target detection in high-resolution remote sensing images using a contour-based spatial model," IEEE Geoscience and Remote Sensing Letters, vol. 9, no. 5, pp. 886–890, 2012.

[3] X. Bai, H. Zhang, and J. Zhou, "VHR object detection based on structural feature extraction and query expansion," IEEE Transactions on Geoscience and Remote Sensing, vol. 52, no. 10, pp. 6508–6520, 2014.

[4] L. Gao, B. Yang, Q. Du, and B. Zhang, "Adjusted spectral matched filter for target detection in hyperspectral imagery," Remote Sensing, vol. 7, no. 6, pp. 6611–6634, 2015.

[5] E. Fakiris, G. Papatheodorou, M. Geraga, and G. Ferentinos, "An automatic target detection algorithm for swath sonar backscatter imagery using image texture and independent component analysis," Remote Sensing, vol. 8, no. 5, article 373, 2016.

[6] W. Zhang, X. Sun, K. Fu, C. Wang, and H. Wang, "Object detection in high-resolution remote sensing images using rotation invariant parts based model," IEEE Geoscience and Remote Sensing Letters, vol. 11, no. 1, pp. 74–78, 2014.

[7] H. Sun, X. Sun, H. Wang, Y. Li, and X. Li, "Automatic target detection in high-resolution remote sensing images using spatial sparse coding bag-of-words model," IEEE Geoscience and Remote Sensing Letters, vol. 9, no. 1, pp. 109–113, 2012.

[8] X. Sun, H. Wang, and K. Fu, "Automatic detection of geospatial objects using taxonomic semantics," IEEE Geoscience and Remote Sensing Letters, vol. 7, no. 1, pp. 23–27, 2010.

[9] R. M. Willett, M. F. Duarte, M. A. Davenport, and R. G. Baraniuk, "Sparsity and structure in hyperspectral imaging: sensing, reconstruction, and target detection," IEEE Signal Processing Magazine, vol. 31, no. 1, pp. 116–126, 2014.

[10] F. Gao, Q. Xu, and B. Li, "Aircraft detection from VHR images based on circle-frequency filter and multilevel features," The Scientific World Journal, vol. 2013, Article ID 917928, 7 pages, 2013.

[11] I. Ševo and A. Avramović, "Convolutional neural network based automatic object detection on aerial images," IEEE Geoscience and Remote Sensing Letters, vol. 13, no. 5, pp. 740–744, 2016.

[12] L. Zhang, G.-S. Xia, T. Wu, L. Lin, and X. C. Tai, "Deep learning for remote sensing image understanding," Journal of Sensors, vol. 2016, Article ID 7954154, 2 pages, 2016.

[13] R. Girshick, J. Donahue, T. Darrell, and J. Malik, "Region-based convolutional networks for accurate object detection and segmentation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, no. 1, pp. 142–158, 2016.

[14] E. Pasolli, F. Melgani, N. Alajlan, and N. Conci, "Optical image classification: a ground-truth design framework," IEEE Transactions on Geoscience and Remote Sensing, vol. 51, no. 6, pp. 3580–3597, 2013.

[15] R. Girshick, "Fast R-CNN," in Proceedings of the 15th IEEE International Conference on Computer Vision (ICCV '15), pp. 1440–1448, Santiago, Chile, December 2015.

[16] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: towards real-time object detection with region proposal networks," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 6, pp. 1137–1149, 2017.

[17] W. Liao, A. Pizurica, P. Scheunders, W. Philips, and Y. Pi, "Semisupervised local discriminant analysis for feature extraction in hyperspectral images," IEEE Transactions on Geoscience and Remote Sensing, vol. 51, no. 1, pp. 184–198, 2013.

[18] D. Zhang, J. Han, G. Cheng, Z. Liu, S. Bu, and L. Guo, "Weakly supervised learning for target detection in remote sensing images," IEEE Geoscience and Remote Sensing Letters, vol. 12, no. 4, pp. 701–705, 2014.

[19] T. Deselaers, B. Alexe, and V. Ferrari, "Weakly supervised localization and learning with generic knowledge," International Journal of Computer Vision, vol. 100, no. 3, pp. 275–293, 2012.

[20] F. Zhang, B. Du, L. Zhang, and M. Xu, "Weakly supervised learning based on coupled convolutional neural networks for aircraft detection," IEEE Transactions on Geoscience and Remote Sensing, vol. 54, no. 9, pp. 5553–5563, 2016.

[21] J. Han, D. Zhang, G. Cheng, L. Guo, and J. Ren, "Object detection in optical remote sensing images based on weakly supervised learning and high-level feature learning," IEEE Transactions on Geoscience and Remote Sensing, vol. 53, no. 6, pp. 3325–3337, 2015.

[22] X. Chen, S. Xiang, C.-L. Liu, and C.-H. Pan, "Vehicle detection in satellite images by hybrid deep convolutional neural networks," IEEE Geoscience and Remote Sensing Letters, vol. 11, no. 10, pp. 1797–1801, 2014.

[23] G. Cheng, J. Han, and X. Lu, "Remote sensing image scene classification: benchmark and state of the art," Proceedings of the IEEE, vol. 99, pp. 1–19, 2017.

[24] S. Paisitkriangkrai, J. Sherrah, P. Janney, and A. Van-Den Hengel, "Effective semantic pixel labelling with convolutional networks and conditional random fields," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW '15), pp. 36–43, June 2015.

[25] O. A. B. Penatti, K. Nogueira, and J. A. Dos Santos, "Do deep features generalize from everyday objects to remote sensing and aerial scenes domains?" in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW '15), pp. 44–51, June 2015.

[26] X. Yao, J. Han, G. Cheng, X. Qian, and L. Guo, "Semantic annotation of high-resolution satellite images via weakly supervised learning," IEEE Transactions on Geoscience and Remote Sensing, vol. 54, no. 6, pp. 3660–3671, 2016.

[27] C. Tao, Y. Tan, H. Cai, and J. Tian, "Airport detection from large IKONOS images using clustered SIFT keypoints and region information," IEEE Geoscience and Remote Sensing Letters, vol. 8, no. 1, pp. 128–132, 2011.

[28] O. Aytekin, U. Zongur, and U. Halici, "Texture-based airport runway detection," IEEE Geoscience and Remote Sensing Letters, vol. 10, no. 3, pp. 471–475, 2013.

[29] X. Yao, J. Han, L. Guo, S. Bu, and Z. Liu, "A coarse-to-fine model for airport detection from remote sensing images using target-oriented visual saliency and CRF," Neurocomputing, vol. 164, pp. 162–172, 2015.

[30] G. Cheng, P. Zhou, and J. Han, "Learning rotation-invariant convolutional neural networks for object detection in VHR optical remote sensing images," IEEE Transactions on Geoscience and Remote Sensing, vol. 54, no. 12, pp. 7405–7415, 2016.

[31] X. Yao, J. Han, D. Zhang, and F. Nie, "Revisiting co-saliency detection: a novel approach based on two-stage multi-view spectral rotation co-clustering," IEEE Transactions on Image Processing, vol. 26, no. 7, pp. 3196–3209, 2017.

[32] L. Cao, C. Wang, and J. Li, "Vehicle detection from highway satellite images via transfer learning," Information Sciences, vol. 366, pp. 177–187, 2016.

[33] S. Gupta, S. Rana, B. Saha, D. Phung, and S. Venkatesh, "A new transfer learning framework with application to model-agnostic multi-task learning," Knowledge and Information Systems, vol. 49, no. 3, pp. 933–973, 2016.

[34] A. Van Opbroek, M. A. Ikram, M. W. Vernooij, and M. De Bruijne, "Transfer learning improves supervised image segmentation across imaging protocols," IEEE Transactions on Medical Imaging, vol. 34, no. 5, pp. 1018–1030, 2015.

[35] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556, 2014.

[36] W. Zhao, Z. Guo, J. Yue, X. Zhang, and L. Luo, "On combining multiscale deep learning features for the classification of hyperspectral remote sensing imagery," International Journal of Remote Sensing, vol. 36, no. 13, pp. 3368–3379, 2015.

[37] Q. Zou, L. Ni, T. Zhang, and Q. Wang, "Deep learning based feature selection for remote sensing scene classification," IEEE Geoscience and Remote Sensing Letters, vol. 12, no. 11, pp. 2321–2325, 2015.

[38] Y. Yang and S. Newsam, "Bag-of-visual-words and spatial extensions for land-use classification," in Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems (GIS '10), pp. 270–279, ACM, San Jose, Calif, USA, November 2010.

[39] X. Wang, Q. Lv, B. Wang, and L. Zhang, "Airport detection in remote sensing images: a method based on saliency map," Cognitive Neurodynamics, vol. 7, no. 2, pp. 143–154, 2013.

[40] Y. Jia, E. Shelhamer, J. Donahue et al., "Caffe: convolutional architecture for fast feature embedding," in Proceedings of the ACM International Conference on Multimedia, pp. 675–678, ACM, Orlando, Fla, USA, November 2014.


is used to fine-tune the model, and backpropagation is used to optimize the loss function:

\[
f(x_i) = \operatorname{softmax}(x_i) = \frac{e^{x_i}}{\sum_{i=1}^{N} e^{x_i}}, \tag{3}
\]

\[
\mathrm{loss} = -\sum_{i=1}^{m} y_i \log f(x_i) + \lambda \sum_{l=1}^{L} \sum \left\| w_l \right\|^2. \tag{4}
\]

Another important parameter is the learning rate. The parameters of the pretrained model are already well tuned; thus, the network layer learning rate should be set to a small number. A small learning rate ensures both that the features of the new samples can be learned and that the good parameters of the pretrained model are maintained. Taking Faster R-CNN as an example, the parameters that must be set are the following. (1) Initialize the training network and select the pretrained model: "train_net" is set to VGG16.v2.caffemodel. (2) Set the number of categories. In this paper, all aircraft types are considered the same type; thus, there are only two categories: aircraft and background. In the data layer, num_classes is set to two; in the cls_score layer, num_output is set to two; and in the bbox_pred layer, num_output is set to eight. (3) Set the learning rate: the base learning rate base_lr is set to 0.001. The learning rate multipliers (lr_mult) of the cls_score layer and the bbox_pred layer are set to five and ten, respectively. The learning rates of the other convolutional layers are not modified. (4) Set the number of backpropagation iterations: max_iters is set to 40,000.
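Collected together, these settings form a small configuration block. The sketch below is illustrative only: the names follow the Caffe/py-faster-rcnn conventions quoted in the text, not the authors' actual solver files.

```python
# Illustrative fine-tuning configuration for the Faster R-CNN stage,
# mirroring the parameter choices described in the text (assumed layout).
finetune_cfg = {
    "pretrained_model": "VGG16.v2.caffemodel",  # initialization weights
    "num_classes": 2,                           # aircraft + background
    "cls_score_num_output": 2,
    "bbox_pred_num_output": 2 * 4,              # (dx, dy, dw, dh) per class = 8
    "base_lr": 0.001,
    "lr_mult": {"cls_score": 5, "bbox_pred": 10},  # new layers learn faster
    "max_iters": 40000,
}

def effective_lr(cfg, layer):
    """Per-layer learning rate = base_lr x layer multiplier (1.0 if unchanged)."""
    return cfg["base_lr"] * cfg["lr_mult"].get(layer, 1.0)
```

Unlisted convolutional layers keep the multiplier 1.0, matching the statement that their learning rates are not modified.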

3.3. Geometric Feature Constraint. In a remote sensing image, the aircraft cover only a small area; the remaining area is background. The region proposals are therefore mostly background, not aircraft. Reducing the number of region proposals makes object detection more targeted, which helps improve efficiency [16]. The special imaging geometry of remote sensing means that only the top of an aircraft is visible in the image [36]. Thus, the geometric features of aircraft in the image are highly stable [37]. This property can be exploited when designing a detection method. When the resolution of the image is fixed, the geometric parameters of each aircraft are constant. Therefore, the geometric features can be used as important constraints for aircraft detection.

The geometric features discussed in this paper mainly refer to the external contour sizes of aircraft in remote sensing images. It is possible to use geometric feature constraints for detection because the following three conditions are met. First, conventional remote sensing imaging employs nadir images; that is, the sensor acquires an image from the sky perpendicular (or near-perpendicular) to the ground. Thus, the top contour features of aircraft are seen in these remote sensing images, and these features (length, width, and aspect ratio) are relatively stable. Although the length and width are affected by image resolution, they can be calculated from the resolution of the image, whereas the aspect ratio is not affected by resolution and is more fixed. Second, the length, width, wingspan, and body-length ratio of different types of aircraft are all within a certain range. These geometric features give us an exact basis for calculating region proposals. Third, statistical methods were used to measure the geometric features of many aircraft samples. Samples were collected from large, medium, and small airports in the world's major countries, covering almost all types of aircraft, including large passenger aircraft, fighters, small private aircraft, and rotor helicopters. These statistical results are used as a reference for the geometric feature constraints.

The image resolution and the aircraft geometry statistics are used to calculate the size of the sliding window. The conditions that need to be met are as follows. The minimum overlap between adjacent windows must be greater than the maximum size of an aircraft, to ensure that no aircraft is missed. The number of aircraft in a sliding window is very small: a window contains about 2000–3000 region proposals, but aircraft are probably present in only a few. The number of target region proposals is thus much smaller than the number of nontarget region proposals, and processing many nontarget areas increases the overall time consumption. With GFCs, deleting the candidate boxes that do not satisfy the constraints can greatly improve the time efficiency of target detection.
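The overlap condition above fixes the window stride: stride = window size minus maximum aircraft size. A minimal sketch of the resulting window placement along one axis (the window and aircraft sizes in the usage note are assumed values, not taken from the paper):

```python
def window_origins(extent, win, max_aircraft):
    """1-D origins of sliding windows along one image axis.

    Adjacent windows overlap by at least `max_aircraft` pixels, so any
    aircraft no larger than that fits wholly inside at least one window.
    """
    stride = win - max_aircraft  # overlap between neighbors >= max_aircraft
    if stride <= 0:
        raise ValueError("window must be larger than the largest aircraft")
    origins = list(range(0, max(extent - win, 0) + 1, stride))
    if origins[-1] + win < extent:   # make sure the far edge is covered
        origins.append(extent - win)
    return origins
```

For a 2-D image, the window grid is the Cartesian product of the origins along each axis, e.g. `[(x, y) for y in window_origins(h, win, m) for x in window_origins(w, win, m)]`.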

Taking the ground truth of the aircraft in the samples as an example, their lengths and widths are within a fixed range. The original 2511 sample images were used to analyze these parameters, and the results are shown in Figure 4.

Figure 4(a) shows that the aspect ratio of the circumscribed rectangle is primarily around 1.0, with a range of (0.6, 1.6). Figure 4(b) shows that the width and height of the circumscribed rectangle are mainly around 140 pixels, with a range of (50, 300). These values should be appropriately enlarged during aircraft detection to ensure that no aircraft is filtered out on account of statistical error. The geometric features of aircraft in remote sensing images have rotation invariance, and the aspect ratio is within a fixed range even at different resolutions; thus, it can be used as a constraint to roughly distinguish between target and background. (r1, r2), (w1, w2), and (h1, h2) represent the ranges of aspect ratio, width, and height, respectively.

For the same image, reducing the number of nontarget region proposals has no effect on the target detection accuracy (in deep learning, this accuracy means the classification score); however, it can improve the detection efficiency. Based on the GFC, a region proposal filtering method is proposed. (r1, r2), (w1, w2), and (h1, h2) are set as thresholds to filter the region proposals. The main algorithm is shown in (5), where Region Proposal = 1 means the region proposal is preserved and Region Proposal = 0 means it is deleted; "ratio" represents the target aspect ratio, "width" the target width, and "height" the target height; and α is the image resolution coefficient, used to adjust the thresholds at different resolutions:

\[
\text{Region Proposal} =
\begin{cases}
1, & r_1 \le \text{ratio} \le r_2, \\
0, & \text{ratio} < r_1 \ \text{or} \ \text{ratio} > r_2,
\end{cases}
\]


Figure 4: Geometric features of all the sample aircraft. (a) Aspect ratio of each sample aircraft: the values are approximately 1.0, with a range of (0.6, 1.6). The color in (a) represents the value of the aspect ratio: blue is the smallest value, red is the largest, and intermediate values transition gradually. (b) Size of each sample aircraft: red is the width and green is the height. The values are approximately 140 pixels, with a range of (50, 300).

\[
\text{Region Proposal} =
\begin{cases}
1, & \alpha w_1 \le \text{width} \le \alpha w_2, \\
0, & \text{width} < \alpha w_1 \ \text{or} \ \text{width} > \alpha w_2,
\end{cases}
\]

\[
\text{Region Proposal} =
\begin{cases}
1, & \alpha h_1 \le \text{height} \le \alpha h_2, \\
0, & \text{height} < \alpha h_1 \ \text{or} \ \text{height} > \alpha h_2. \tag{5}
\end{cases}
\]
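A direct implementation of the filter in (5) might look as follows. The threshold ranges in the usage example are the approximate values read from Figure 4; this is a sketch, not the authors' code.

```python
def gfc_filter(proposals, r, w, h, alpha=1.0):
    """Keep only region proposals whose geometry satisfies the GFC thresholds.

    proposals: list of (x1, y1, x2, y2) boxes.
    r, w, h:   (low, high) ranges for aspect ratio, width, and height.
    alpha:     image-resolution coefficient scaling the width/height limits.
    """
    kept = []
    for (x1, y1, x2, y2) in proposals:
        width, height = x2 - x1, y2 - y1
        if height == 0:
            continue  # degenerate box, discard
        ratio = width / height
        if (r[0] <= ratio <= r[1]
                and alpha * w[0] <= width <= alpha * w[1]
                and alpha * h[0] <= height <= alpha * h[1]):
            kept.append((x1, y1, x2, y2))
    return kept

# Example with the approximate Figure 4 ranges: a 140x140 box and a 60x60 box
# pass, while a 500x100 box (aspect ratio 5.0) is deleted.
boxes = [(0, 0, 140, 140), (0, 0, 500, 100), (0, 0, 60, 60)]
kept = gfc_filter(boxes, r=(0.6, 1.6), w=(50, 300), h=(50, 300))
```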

4. Materials

In this study, we constructed three kinds of datasets: scene, aircraft sample, and test datasets. All data were obtained from Google Earth. The scene dataset was from the 18th zoom level, with a spatial resolution of 1.2 m. The remaining two datasets were from the 20th level, with a spatial resolution of 0.3 m. From a professional point of view, compared with real remote sensing images, the data obtained from Google Earth contain less information. However, the deep learning method differs from traditional remote sensing image processing in that it can obtain rich information from this image type. We advocate the use of real remote sensing images; nonetheless, Google Earth images can also be used.

4.1. Training Sample Datasets. Creating training samples is a key step in object detection in the field of deep learning; the sample quality directly affects the test results. In the CCNN structure, the first-level network uses the CNN image classification model and the second-level network uses the Faster R-CNN object detection model. The two networks have different structures and functions; therefore, we created two sample datasets, one for scene classification and the other for aircraft detection.

The first dataset was the remote sensing image scene dataset, which was used to train the CNN classification model. We used the "UC Merced Land Use Dataset" as the basic dataset [38]. It contains 21 scene categories, including many typical background categories such as agriculture, airplanes, beaches, buildings, bushes, and parking lots. This dataset is suitable for scene classification; however, there are only 100 images in each category. To obtain a high-accuracy training model, it was necessary to increase the sample number of airports and of other scenes that are easily confused with airports. These scenes include industrial areas, residential areas, harbors, overpasses, parking lots, highways, runways, and storage tanks. Another 300 samples of airport images and 700 samples of other scene categories were added to the dataset; the number of scene categories remained 21. The sample number was expanded to eight times the original by image transformation. Finally, the total number of samples was 24,800. Some added images are shown in Figure 5, where (a) depicts an airport scene and (b) depicts scenes that are easily confused with airports.

The second dataset was an aircraft sample library of remote sensing images, used to train the Faster R-CNN network model. It contained 2511 original aircraft images, some of which are shown in Figure 6. The samples contained not only all aircraft types but also as much background as possible. The diversity of the samples helps the network


Figure 5: Sample images of different scenes. (a) Airport scene, including a parking apron, runway, and general terminal. (b) Scenes easily confused with an airport, including industrial areas, residential areas, harbors, overpasses, parking lots, highways, runways, and storage tanks.


Figure 6: Training sample dataset of aircraft in remote sensing images. (a) Original training sample images. (b) Corresponding labeled ground truth images.

learn different features and better distinguish the target and background. As shown in Figure 6(a), the samples cover almost all types of aircraft found in existing remote sensing images, including airliners, small planes, military aircraft, and helicopters.

We obtained the images looking from the zenith point toward the nadir point. Owing to this unique imaging perspective, the aircraft in the remote sensing images show a fixed geometric form: only the top of the aircraft is visible. Therefore, the direction and displacement of the aircraft are two important factors that affect the expression of the target. To ensure that the target has a high recognition rate in different directions and displacements, we applied transformations to the samples, such as horizontal flips, vertical flips, and rotations of 90°, 180°, and 270°. These transformations not only increase the number of samples but also vary the direction and structure of the target, increasing sample diversity and making the model more robust. After transformation, the samples contained a total of 20,088 images. These images were divided into three groups of different sizes: Plane_S contained 2511 images, Plane_M contained 10,044 images, and Plane_L contained 20,088 images.
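The text lists flips and rotations producing an eightfold expansion (2511 originals to 20,088 samples). One consistent reading, sketched below, is the eight dihedral transforms of each patch (the exact transform set beyond flips and 90°-step rotations is an assumption):

```python
import numpy as np

def eightfold_augment(img):
    """Return the eight dihedral variants of an image patch:
    rotations of 0/90/180/270 degrees, each with and without a
    horizontal flip. A vertical flip equals a 180-degree rotation
    plus a horizontal flip, so it is covered by this set."""
    variants = []
    for k in range(4):                   # 0, 90, 180, 270 degree rotations
        rot = np.rot90(img, k)
        variants.append(rot)
        variants.append(np.fliplr(rot))  # mirrored copy of each rotation
    return variants
```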

Labeling the images is an important part of creating a training sample dataset; the label is the basis for distinguishing between aircraft and background. The conventional approach is to create positive and negative samples independently. In this paper, we produce samples in a different way: the positive and negative samples are included in the same image. The advantage of this approach is that the target is better distinguished from the background, which can improve detection accuracy. For example, the trestle that connects the airplane and the terminal is prone to false detection. If the airplane and trestle are contained in the same image, the target detection network can better learn their differences. As shown in Figure 6(b), the aircraft position is manually


Table 1: Detailed information of the three airports.

Test image                     Image size (pixels)   Aircraft number   Covered area (km²)
Berlin Tegel Airport           22016 × 9728          35                19.1
Sydney International Airport   13568 × 22272         78                26.9
Tokyo Haneda Airport           20480 × 17920         94                32.7

Table 2: Results of transfer-learning on different training networks and different sample sizes.

Training network   Dataset   Time (min)   Per iteration (s)   Loss_bbox   Loss_cls
CaffeNet           Plane_S   64           0.061               0.055       0.087
CaffeNet           Plane_M   241          0.061               0.051       0.076
CaffeNet           Plane_L   837          0.062               0.047       0.025
VGG16              Plane_S   320          0.446               0.032       0.084
VGG16              Plane_M   497          0.446               0.029       0.044
VGG16              Plane_L   976          0.453               0.019       0.028

marked in the image. An XML annotation file is then generated, in which basic information about the image and the aircraft is automatically saved.
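A minimal example of generating such an annotation file is shown below. The paper does not give its exact schema, so this sketch assumes the common PASCAL VOC-style layout used with Faster R-CNN training tools:

```python
import xml.etree.ElementTree as ET

def make_annotation(filename, width, height, boxes):
    """Build a minimal XML annotation for one labeled image.

    `boxes` is a list of (xmin, ymin, xmax, ymax) ground-truth aircraft
    rectangles. The single object class is "aircraft", matching the
    two-category (aircraft/background) setup described in the text.
    """
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = filename
    size = ET.SubElement(root, "size")
    ET.SubElement(size, "width").text = str(width)
    ET.SubElement(size, "height").text = str(height)
    for (xmin, ymin, xmax, ymax) in boxes:
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = "aircraft"
        bb = ET.SubElement(obj, "bndbox")
        for tag, val in zip(("xmin", "ymin", "xmax", "ymax"),
                            (xmin, ymin, xmax, ymax)):
            ET.SubElement(bb, tag).text = str(val)
    return ET.tostring(root, encoding="unicode")
```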

4.2. Test Dataset. Three large-area images were employed for the test. Each image contained a complete airport, and the covered areas were 19.1 km², 26.9 km², and 32.7 km², respectively. The test images had the characteristics of a large area, diverse ground cover types, and complex backgrounds. In addition to the airport, there were residential areas, harbors, industrial areas, woodlands, and grasslands in the images. The types of objects in the images were very rich, and the environment was complex. Details of the three images are shown in Figure 7 and outlined in Table 1.

Aircraft and airport detection are two different research topics [39]. The focus of this paper is aircraft detection in high-resolution remote sensing images: our objective is not to detect the airport but to directly detect the aircraft. Although the test images contained three airports, the experimental data did not cover only the airport areas; when the images are enlarged, it is evident that they contain harbors, industrial zones, and residential areas.

5. Results and Discussion

In this paper, the full implementation was based on the open-source deep learning library Caffe [40]. The experiments were run on an NVIDIA Titan X (12 GB GDDR5) graphics processing unit (GPU). The network models used in the experiments were VGG16 models pretrained on ImageNet.

5.1. Training the Detection Model. The two VGG16 pretrained models were fine-tuned using the two sample datasets presented in Section 4.1. The training evaluation results are shown in Figure 8. As shown in Figure 8(a), the image classification accuracy improves quickly during the first 10,000 iterations and reaches a steady value of 0.9 at 40,000 iterations. Figure 8(b) depicts the total loss of the object detection model, which reaches 0.1 at 40,000 iterations. Figures 8(c) and 8(d) show the location and classification losses, respectively, both reaching values of less than 0.1. These excellent transfer-learning results provided a good experimental basis and were used for the next step.

The transfer-learning results are directly related to the accuracy of the subsequent aircraft detection process. Using transfer-learning, a high-accuracy aircraft detection model could be obtained with a small number of training samples and a brief training time. Here, we use CaffeNet and VGG16 as examples; the number of iterations was 40,000. Four indicators were used to analyze the results: total training time (Time), single-iteration time (Per Iteration), position accuracy (Loss_bbox), and classification accuracy (Loss_cls). The transfer-learning results are shown in Table 2. The results show that each combination of transfer-learning achieves good training results. The model with more training samples has higher accuracy than the model with fewer samples, and the complex model VGG16 has higher accuracy than the simpler model CaffeNet.

Table 2 shows that the target classification accuracy (Loss_cls) improves with the number of samples; Plane_L improves by approximately 60% compared with Plane_S. The position accuracy (Loss_bbox) shows the same trend, although less pronounced. This result proves that the sample production method proposed in Section 4.1 is effective. The image transformation method can increase the sample number and enhance the robustness of the samples; it can additionally improve the effectiveness of transfer-learning and enhance the robustness and accuracy of the detection model. However, the target detection accuracy remains affected by the number of samples: when the number of samples is large, the target detection accuracy is relatively high, and the training time is correspondingly increased.

CaffeNet is a small network, whereas VGG16 is a large network whose structure is more complex than CaffeNet's. As the complexity of the network increases, the position accuracy also increases; the large network improves by approximately 40% over the small network. Within the same network, the training time increases with the number of samples and with the network complexity. The results show that transfer-learning is an effective, high-accuracy model training method, and it can


Table 3: Large-area image aircraft detection results.

Test image                     True aircraft   Detected aircraft   Missing detection   False detection
Berlin Tegel Airport           35              35                  0                   2
Sydney International Airport   78              76                  2                   5
Tokyo Haneda Airport           94              91                  3                   4

Table 4: Detailed evaluation criteria and comparison for the detection results.

Test image                     Method     FDR (%)   MR (%)   AC (%)   ER (%)
Berlin Tegel Airport           CCNN       5.71      0        100      5.71
                               LOGNet-C   7.69      3.23     96.77    10.92
Sydney International Airport   CCNN       6.58      2.56     97.44    9.14
                               LOGNet-C   48.54     10.87    89.13    59.41
Tokyo Haneda Airport           CCNN       4.39      3.19     96.81    7.58
                               LOGNet-C   20.80     1.54     98.46    22.34

realize the task of training a high-accuracy target detection model with fewer samples.

5.2. Aircraft Detection. The three large-area test images were used to verify the aircraft detection accuracy. The results are shown in Figure 9. Based on the global view and local details in the figure, the overall detection result is good. Most of the aircraft in the airport areas were detected; the detection-missed aircraft were small aircraft, and no large aircraft were missed. Moreover, no aircraft were falsely detected in industrial areas, residential areas, woodlands, grasslands, or other areas beyond the airports. However, there were individual falsely detected targets within the airports, primarily airport trestles and terminals. The quantitative indicators are shown in Table 3, in which "True" indicates the number of real aircraft in the original image, "Detected" the number of aircraft detected, "Missing" the number of detection-missed aircraft, and "False" the number of falsely detected aircraft.

The two-level CCNN aircraft detection structure could rapidly process large-area remote sensing images with high accuracy; however, commission and omission errors remained. In a complex environment, many objects have features similar to those of aircraft, especially long-shaped buildings and facilities. As listed in Table 3, the false detection rate of each airport is higher than the missing detection rate. Through analysis, we determined that the detection-missed aircraft were not actually lost. In the detection procedure, the classification threshold was set to 0.6; when the classification score of a target was less than 0.6, the target was not marked, and a missing detection appeared. When the threshold was lowered to 0.5, the detection-missed targets were marked and the detection miss rate was zero; nevertheless, there were more falsely detected targets, because airport terminals and several buildings with features similar to those of aircraft were also marked. When the threshold is raised, the false detection rate decreases while the detection miss rate increases; when the threshold is lowered, the false detection rate increases while the detection miss rate decreases. According to the needs of the detection task, it is best to set a suitable threshold that balances the false detection rate and the detection miss rate.
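The thresholding step itself is simple; the sketch below illustrates the trade-off described above (the detection list and scores are invented for illustration, not taken from the experiments):

```python
def mark_targets(detections, threshold=0.6):
    """Keep detections whose classification score reaches the threshold.

    Lowering the threshold recovers missed targets (e.g. small aircraft
    scoring just below 0.6) but admits more false detections; raising it
    does the opposite. 0.6 is the value used in the text.
    """
    return [d for d in detections if d["score"] >= threshold]

# Hypothetical scored region proposals: a confident aircraft, a small
# aircraft scoring just below 0.6, and a background structure.
dets = [{"score": 0.95}, {"score": 0.55}, {"score": 0.30}]
```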

The experimental results in Table 3 show that the proposed CCNN method is feasible. A high-accuracy aircraft detection model can be obtained by using the transfer-learning method to fine-tune a pretrained model. The GFC method greatly reduces the number of region proposals, directly improving detection efficiency. The two-level cascade network architecture can detect aircraft in large-area remote sensing images with high accuracy and high efficiency. However, some shortcomings remain; a detailed analysis of transfer-learning and detection accuracy follows.

The aircraft detection performance was evaluated by four criteria: falsely detected ratio (FDR), miss ratio (MR), accuracy (AC), and error ratio (ER). Their definitions are given in (6). A remote sensing image aircraft detection method named LOGNet-C, based on weakly supervised learning, was proposed in [20]. The results of the CCNN method were compared with those of the LOGNet-C method. The four evaluation indicators and comparison results are shown in Table 4.

FDR = (Number of falsely detected aircraft / Number of detected aircraft) × 100%
MR = (Number of missing aircraft / Number of aircraft) × 100%
AC = (Number of detected aircraft / Number of aircraft) × 100%
ER = FDR + MR
(6)
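The four criteria in (6) are straightforward to compute from the per-airport counts. A minimal sketch (function and variable names are ours, not the paper's), checked against the Sydney row of Table 3:

```python
# The four evaluation criteria in (6), computed from the counts in Table 3.
# Function and variable names are our own, not from the paper's code.

def detection_metrics(n_aircraft, n_detected, n_missing, n_false):
    fdr = n_false / n_detected * 100    # falsely detected ratio (FDR)
    mr = n_missing / n_aircraft * 100   # miss ratio (MR)
    ac = n_detected / n_aircraft * 100  # accuracy (AC)
    er = fdr + mr                       # error ratio (ER)
    return fdr, mr, ac, er

# Sydney International Airport row of Table 3: 78 aircraft, 76 detected,
# 2 missed, 5 false detections.
fdr, mr, ac, er = detection_metrics(78, 76, 2, 5)
print(round(fdr, 2), round(mr, 2), round(ac, 2), round(er, 2))
# 6.58 2.56 97.44 9.14 (matching the CCNN row for Sydney in Table 4)
```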

The comparison results show that the false detection rate of the CCNN method is reduced by an average of 6.4% compared with the LOGNet-C method. The missing detection rate decreases by an average of 2.31%. Overall, the detection accuracy increases by an average of 3.66%.

The test results showed that small aircraft are more prone to missed detection, while long-shaped buildings and facilities are more prone to false detection, as shown in Figure 10. This phenomenon is related to the choice of training

Figure 7: Test remote sensing images for aircraft detection. (a) Berlin Tegel Airport. (b) Sydney International Airport. (c) Tokyo Haneda Airport.

samples. The small aircraft in the sample are only a minority. In the transfer-learning stage, the detection model did not learn more features of small aircraft. Long-shaped buildings were detected as aircraft. The first reason is that these buildings had contour features similar to those of aircraft. The second reason is that there were an inadequate number of similar buildings in the negative sample; consequently, the features contained in the negative sample were insufficient. When

Figure 8: Accuracy and loss changes of the two models during the fine-tuning process of image classification and object detection. (a) Accuracy of the image classification model. (b) Total loss of the object detection model, which was the weighted sum of loss_cls and loss_bbox. (c) Location loss of the object detection model. (d) Classification loss of the object detection model.

increasing the number of training samples of these types and retraining the model, the target detection accuracy was increased. The analysis results show that, when producing the training sample dataset, it should not only contain all types of targets but also have a balanced number of each target type. The negative samples should contain as much background information as possible, especially for objects that have a texture, color, geometry, or other features similar to those of aircraft.
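The retraining above relies on the paper's sample-expansion step, which enlarges the dataset eightfold via horizontal/vertical flips and 90°/180°/270° rotations. A minimal NumPy sketch of those eight variants (the dihedral-8 transforms; function names are ours):

```python
import numpy as np

# Sketch of the eightfold sample expansion used when rebalancing the training
# set: the four rotations of an image patch plus the four rotations of its
# horizontal flip. Illustrative only; not the authors' preprocessing code.

def expand_sample(patch):
    """Return the 8 flip/rotation variants of a 2-D image patch."""
    variants = []
    for base in (patch, np.fliplr(patch)):
        for k in range(4):  # rotations by 0, 90, 180, 270 degrees
            variants.append(np.rot90(base, k))
    return variants

patch = np.arange(9).reshape(3, 3)  # an asymmetric toy "image"
augmented = expand_sample(patch)
print(len(augmented))  # 8
# All 8 variants are distinct for an asymmetric patch:
print(len({v.tobytes() for v in augmented}))  # 8
```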

All false detections occurred within the airports. Building-intensive areas would seem prone to false detection; however, no false detections appeared there. This was on account of the contribution of the first-level cascade network, which filtered out nontarget areas and did not feed these areas into the aircraft detection network. Missed detections primarily occurred in the airport interior, and no aircraft were filtered out by the first-level network. The main cause of the missed detection was that the classification score was not high and the network did not extract an adequate number of features.
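The division of labor described above can be sketched as a two-stage pipeline: a first-level scene classifier screens image tiles, and only airport-like tiles reach the second-level aircraft detector. All names below are stand-ins, not the paper's API:

```python
# Minimal sketch of the two-level cascade: level 1 filters nontarget tiles,
# level 2 (a Faster R-CNN-style detector) runs only on the survivors.
# Function names and data shapes are illustrative assumptions.

def cascade_detect(tiles, scene_classifier, aircraft_detector, threshold=0.6):
    detections = []
    for tile in tiles:
        if not scene_classifier(tile):  # level 1: drop non-airport areas
            continue
        for box, score in aircraft_detector(tile):  # level 2: detect aircraft
            if score >= threshold:
                detections.append((tile["id"], box, score))
    return detections

# Toy stand-ins for the two networks:
tiles = [{"id": 0, "scene": "airport"}, {"id": 1, "scene": "harbor"}]
classifier = lambda t: t["scene"] == "airport"
detector = lambda t: [((10, 10, 150, 140), 0.9), ((40, 60, 260, 50), 0.4)]

print(cascade_detect(tiles, classifier, detector))
# [(0, (10, 10, 150, 140), 0.9)] -- the harbor tile never reaches the detector
```

Note how the harbor tile is discarded at level 1, which is why building-intensive areas outside the airport produced no false detections.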

6. Conclusions

In the framework of deep learning, a rapid large-area remote sensing image aircraft detection method, CCNN, was proposed. Using the transfer-learning method, the parameters of an existing target detection model were fine-tuned with a small number of samples, and a high-accuracy aircraft detection model was obtained. Based on the geometric feature invariance of aircraft in remote sensing images, a region proposal processing algorithm, GFC, was proposed to improve the efficiency. Based on the above methods, a cascade network architecture was designed for aircraft detection in large-area remote sensing images. This method not only can directly handle large-area remote sensing images for


Figure 9: Aircraft detection results of the three test images, including the global images and locally enlarged images. Detection results of (a) Berlin Tegel Airport, (b) Sydney International Airport, and (c) Tokyo Haneda Airport.

Figure 10: Commission and omission errors. A purple rectangle is a correctly detected aircraft, a green rectangle is a falsely detected aircraft, and aircraft without a rectangle are missed aircraft. (a) and (b) are falsely detected aircraft, predominantly long-shaped buildings and facilities; (c) and (d) are detection-missed aircraft, predominantly small aircraft.


aircraft detection but also overcomes the difficulty of training a high-accuracy model with a small number of samples.

The experimental results showed that the detection accuracy increased by an average of 3.66%, the false detection rate decreased by an average of 6.4%, and the missed detection rate decreased by an average of 2.31%. However, some of the algorithms warrant improvement. The target detection threshold is experientially set and has some limitations. In future work, we intend to address the needs of the detection task to make the system automatically set the threshold value. In the present work, we focused on aircraft detection. If the target is extended to other typical targets, such as ships, tanks, and vehicles, then this method will have a larger application scope.

The failures in target detection in this experiment may have occurred for the following reasons. First, the number of small aircraft in our sample is insufficient, and the sample number of the various types of aircraft is not balanced, which resulted in inadequate learning of the features of small aircraft during training. Second, the choice of target threshold must be appropriate. When detection is nearly complete, each region proposal is scored. When this score is greater than the threshold, the target is marked and displayed; however, when the score is less than the threshold, the target is not marked and is not displayed but is instead hidden as background. There is a close relationship between the threshold value and detection accuracy, especially false positives and false negatives. When the threshold value is too small, false positives increase; however, when the threshold value is too large, false negatives increase. Currently, we set our threshold value empirically and plan to adopt an adaptive threshold method in the future.
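One simple adaptive scheme of the kind the authors envision (this is our illustration, not their method) scans candidate thresholds on a validation set and keeps the one minimizing the error ratio ER = FDR + MR:

```python
# Illustrative adaptive-threshold selection: pick the classification threshold
# that minimizes ER = FDR + MR on a held-out validation set. This is a sketch
# of one possible approach, not the paper's algorithm.

def error_ratio(scored, n_aircraft, threshold):
    """scored: list of (score, is_true_aircraft) for every region proposal."""
    marked = [(s, ok) for s, ok in scored if s >= threshold]
    n_false = sum(1 for _, ok in marked if not ok)
    n_correct = sum(1 for _, ok in marked if ok)
    fdr = 100 * n_false / n_correct if n_correct else 0.0
    mr = 100 * (n_aircraft - n_correct) / n_aircraft
    return fdr + mr

def best_threshold(scored, n_aircraft, candidates=(0.4, 0.5, 0.6, 0.7)):
    return min(candidates, key=lambda t: error_ratio(scored, n_aircraft, t))

# Toy validation set: 3 real aircraft plus one building-like false proposal.
val = [(0.9, True), (0.7, True), (0.55, True), (0.45, False)]
print(best_threshold(val, n_aircraft=3))  # 0.5: marks all aircraft, no false
```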

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

The authors would like to thank Dr. Changjian Qiao for his excellent technical support; without his instructive guidance, impressive kindness, and patience, they could not have completed this work. They would also like to thank Dr. Huijin Yang and Yuan Zhao for critically reviewing the manuscript.

References

[1] G. Cheng and J. Han, "A survey on object detection in optical remote sensing images," ISPRS Journal of Photogrammetry and Remote Sensing, vol. 117, pp. 11–28, 2016.

[2] Y. Li, X. Sun, H. Wang, H. Sun, and X. Li, "Automatic target detection in high-resolution remote sensing images using a contour-based spatial model," IEEE Geoscience and Remote Sensing Letters, vol. 9, no. 5, pp. 886–890, 2012.

[3] X. Bai, H. Zhang, and J. Zhou, "VHR object detection based on structural feature extraction and query expansion," IEEE Transactions on Geoscience and Remote Sensing, vol. 52, no. 10, pp. 6508–6520, 2014.

[4] L. Gao, B. Yang, Q. Du, and B. Zhang, "Adjusted spectral matched filter for target detection in hyperspectral imagery," Remote Sensing, vol. 7, no. 6, pp. 6611–6634, 2015.

[5] E. Fakiris, G. Papatheodorou, M. Geraga, and G. Ferentinos, "An automatic target detection algorithm for swath sonar backscatter imagery using image texture and independent component analysis," Remote Sensing, vol. 8, no. 5, article 373, 2016.

[6] W. Zhang, X. Sun, K. Fu, C. Wang, and H. Wang, "Object detection in high-resolution remote sensing images using rotation invariant parts based model," IEEE Geoscience and Remote Sensing Letters, vol. 11, no. 1, pp. 74–78, 2014.

[7] H. Sun, X. Sun, H. Wang, Y. Li, and X. Li, "Automatic target detection in high-resolution remote sensing images using spatial sparse coding bag-of-words model," IEEE Geoscience and Remote Sensing Letters, vol. 9, no. 1, pp. 109–113, 2012.

[8] X. Sun, H. Wang, and K. Fu, "Automatic detection of geospatial objects using taxonomic semantics," IEEE Geoscience and Remote Sensing Letters, vol. 7, no. 1, pp. 23–27, 2010.

[9] R. M. Willett, M. F. Duarte, M. A. Davenport, and R. G. Baraniuk, "Sparsity and structure in hyperspectral imaging: sensing, reconstruction, and target detection," IEEE Signal Processing Magazine, vol. 31, no. 1, pp. 116–126, 2014.

[10] F. Gao, Q. Xu, and B. Li, "Aircraft detection from VHR images based on circle-frequency filter and multilevel features," The Scientific World Journal, vol. 2013, Article ID 917928, 7 pages, 2013.

[11] I. Sevo and A. Avramovic, "Convolutional neural network based automatic object detection on aerial images," IEEE Geoscience and Remote Sensing Letters, vol. 13, no. 5, pp. 740–744, 2016.

[12] L. Zhang, G.-S. Xia, T. Wu, L. Lin, and X. C. Tai, "Deep learning for remote sensing image understanding," Journal of Sensors, vol. 2016, Article ID 7954154, 2 pages, 2016.

[13] R. Girshick, J. Donahue, T. Darrell, and J. Malik, "Region-based convolutional networks for accurate object detection and segmentation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, no. 1, pp. 142–158, 2016.

[14] E. Pasolli, F. Melgani, N. Alajlan, and N. Conci, "Optical image classification: a ground-truth design framework," IEEE Transactions on Geoscience and Remote Sensing, vol. 51, no. 6, pp. 3580–3597, 2013.

[15] R. Girshick, "Fast R-CNN," in Proceedings of the 15th IEEE International Conference on Computer Vision (ICCV '15), pp. 1440–1448, Santiago, Chile, December 2015.

[16] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: towards real-time object detection with region proposal networks," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 6, pp. 1137–1149, 2017.

[17] W. Liao, A. Pizurica, P. Scheunders, W. Philips, and Y. Pi, "Semisupervised local discriminant analysis for feature extraction in hyperspectral images," IEEE Transactions on Geoscience and Remote Sensing, vol. 51, no. 1, pp. 184–198, 2013.

[18] D. Zhang, J. Han, G. Cheng, Z. Liu, S. Bu, and L. Guo, "Weakly supervised learning for target detection in remote sensing images," IEEE Geoscience and Remote Sensing Letters, vol. 12, no. 4, pp. 701–705, 2014.

[19] T. Deselaers, B. Alexe, and V. Ferrari, "Weakly supervised localization and learning with generic knowledge," International Journal of Computer Vision, vol. 100, no. 3, pp. 275–293, 2012.

[20] F. Zhang, B. Du, L. Zhang, and M. Xu, "Weakly supervised learning based on coupled convolutional neural networks for aircraft detection," IEEE Transactions on Geoscience and Remote Sensing, vol. 54, no. 9, pp. 5553–5563, 2016.

[21] J. Han, D. Zhang, G. Cheng, L. Guo, and J. Ren, "Object detection in optical remote sensing images based on weakly supervised learning and high-level feature learning," IEEE Transactions on Geoscience and Remote Sensing, vol. 53, no. 6, pp. 3325–3337, 2015.

[22] X. Chen, S. Xiang, C.-L. Liu, and C.-H. Pan, "Vehicle detection in satellite images by hybrid deep convolutional neural networks," IEEE Geoscience and Remote Sensing Letters, vol. 11, no. 10, pp. 1797–1801, 2014.

[23] G. Cheng, J. Han, and X. Lu, "Remote sensing image scene classification: benchmark and state of the art," Proceedings of the IEEE, vol. 99, pp. 1–19, 2017.

[24] S. Paisitkriangkrai, J. Sherrah, P. Janney, and A. Van-Den Hengel, "Effective semantic pixel labelling with convolutional networks and conditional random fields," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW '15), pp. 36–43, June 2015.

[25] O. A. B. Penatti, K. Nogueira, and J. A. Dos Santos, "Do deep features generalize from everyday objects to remote sensing and aerial scenes domains?" in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW '15), pp. 44–51, June 2015.

[26] X. Yao, J. Han, G. Cheng, X. Qian, and L. Guo, "Semantic annotation of high-resolution satellite images via weakly supervised learning," IEEE Transactions on Geoscience and Remote Sensing, vol. 54, no. 6, pp. 3660–3671, 2016.

[27] C. Tao, Y. Tan, H. Cai, and J. Tian, "Airport detection from large IKONOS images using clustered SIFT keypoints and region information," IEEE Geoscience and Remote Sensing Letters, vol. 8, no. 1, pp. 128–132, 2011.

[28] O. Aytekin, U. Zongur, and U. Halici, "Texture-based airport runway detection," IEEE Geoscience and Remote Sensing Letters, vol. 10, no. 3, pp. 471–475, 2013.

[29] X. Yao, J. Han, L. Guo, S. Bu, and Z. Liu, "A coarse-to-fine model for airport detection from remote sensing images using target-oriented visual saliency and CRF," Neurocomputing, vol. 164, pp. 162–172, 2015.

[30] G. Cheng, P. Zhou, and J. Han, "Learning rotation-invariant convolutional neural networks for object detection in VHR optical remote sensing images," IEEE Transactions on Geoscience and Remote Sensing, vol. 54, no. 12, pp. 7405–7415, 2016.

[31] X. Yao, J. Han, D. Zhang, and F. Nie, "Revisiting co-saliency detection: a novel approach based on two-stage multi-view spectral rotation co-clustering," IEEE Transactions on Image Processing, vol. 26, no. 7, pp. 3196–3209, 2017.

[32] L. Cao, C. Wang, and J. Li, "Vehicle detection from highway satellite images via transfer learning," Information Sciences, vol. 366, pp. 177–187, 2016.

[33] S. Gupta, S. Rana, B. Saha, D. Phung, and S. Venkatesh, "A new transfer learning framework with application to model-agnostic multi-task learning," Knowledge and Information Systems, vol. 49, no. 3, pp. 933–973, 2016.

[34] A. Van Opbroek, M. A. Ikram, M. W. Vernooij, and M. De Bruijne, "Transfer learning improves supervised image segmentation across imaging protocols," IEEE Transactions on Medical Imaging, vol. 34, no. 5, pp. 1018–1030, 2015.

[35] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," Computer Science, 2014.

[36] W. Zhao, Z. Guo, J. Yue, X. Zhang, and L. Luo, "On combining multiscale deep learning features for the classification of hyperspectral remote sensing imagery," International Journal of Remote Sensing, vol. 36, no. 13, pp. 3368–3379, 2015.

[37] Q. Zou, L. Ni, T. Zhang, and Q. Wang, "Deep learning based feature selection for remote sensing scene classification," IEEE Geoscience and Remote Sensing Letters, vol. 12, no. 11, pp. 2321–2325, 2015.

[38] Y. Yang and S. Newsam, "Bag-of-visual-words and spatial extensions for land-use classification," in Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems (GIS '10), pp. 270–279, ACM, San Jose, Calif, USA, November 2010.

[39] X. Wang, Q. Lv, B. Wang, and L. Zhang, "Airport detection in remote sensing images: a method based on saliency map," Cognitive Neurodynamics, vol. 7, no. 2, pp. 143–154, 2013.

[40] Y. Jia, E. Shelhamer, J. Donahue et al., "Caffe: convolutional architecture for fast feature embedding," in Proceedings of the ACM International Conference on Multimedia, pp. 675–678, ACM, Orlando, Fla, USA, November 2014.


6 Journal of Sensors

0 500 1000 1500 2000 250004

06

08

1

12

14

16

18

2

Samples

Asp

ect r

atio

Aspect ratioFitting curve of aspect ratio

(a)

0 500 1000 1500 2000 25000

50

100

150

200

250

300

350

400

Samples

(Pix

els)

WidthFitting curve of width

HeightFitting curve of height

(b)

Figure 4 Geometric features all of the sample aircraft (a) Aspect ratio of each sample aircraft They are approximately 10 and the range is(06 16) The color in Figure 4(a) represents the value of the aspect ratio the blue is the smallest value the red is the largest value and themiddle is gradually transitions (b) Size of each sample aircraft the red is the width and the green is the height They are approximately 140pixels and the range is (50 300)

Region Proposal

=

1 120572 times 1199081 le width le 120572 times 11990820 width lt 120572 times 1199081 or width gt 120572 times 1199082

Region Proposal

=

1 120572 times ℎ1 le height le 120572 times ℎ20 height lt 120572 times ℎ1 or height gt 120572 times ℎ2

(5)

4 Materials

In this study we constructed three kinds of datasets sceneaircraft sample and test datasets All data were obtained fromGoogle Earth The scene dataset was from the 18th level andits spatial resolution was 12m The remaining two datasetswere from the 20th level and the spatial resolution was03m From a professional point of view compared with thereal remote sensing images the data obtained from GoogleEarth contained insufficient information However the deeplearning method is different from the traditional remotesensing image processing method in that it can obtain richinformation from this image typeWe advocate the use of realremote sensing images nonetheless one could attempt use ofGoogle Earth images

41 Training Sample Datasets Creating training samples isa key step in object detection in the field of deep learningThe sample quality directly affects the test results In the

CCNN structure the first-level network uses the CNN imageclassification model and the second-level network uses theFaster R-CNN object detection model The two networkshave different structures and functions therefore we createdtwo sample datasets one for scene classification and the otherfor aircraft detection

The first dataset was the remote sensing image scenedataset which was used to train the CNN classificationmodel We used the ldquoUC Merced Land Use Datasetrdquo asthe basic dataset [38] It contained 21 scene categories andincluded many typical background categories such as agri-culture airplanes beaches buildings bushes and parkinglots This dataset was suitable for scene classification how-ever there were only 100 images in each category To obtaina high-accuracy training model it was necessary to increasethe sample number of airports and other scenes that are easilyconfused with airportsThese scenes include industrial areasresidential areas harbors overpasses parking lots highwaysrunways and storage tanks Another 300 samples of airportimages and 700 samples of other scene categories were addedto the dataset The number of scene categories remained 21The sample number could be expanded to be eight times theoriginal by image transformation Finally the total numberof samples was 24800 Some added images are shown inFigure 5 where (a) depicts an airport scene and (b) depictsscenes that are easily confused with airports

The second dataset was an aircraft sample library ofremote sensing images It was used to train the Fast R-CNNnetwork model It contained 2511 original aircraft imagessome of which are shown in Figure 6 The sample containednot only all aircraft types but also as much background aspossible The diversity of the samples could help the network

Journal of Sensors 7

(a) (b)

Figure 5 Sample images of different scenes (a) Airport scene including a parking apron runway and general terminal (b) Scenes easilyconfused with an airport including industrial areas residential areas harbors overpasses parking lots highways runways and storagetanks

(a) (b)

Figure 6 Training sample dataset of aircraft in remote sensing images (a) Original training sample images (b) Corresponding labeled groundtruth images

learn different features and better distinguish the target andbackground As shown in Figure 6(a) the sample coversalmost all types of aircraft in the existing remote sensingimage including an airliner small planemilitary aircraft andhelicopters

We obtained the image from the zenith point to the nadirpointThus the unique imaging perspective results of the air-craft in the remote sensing images showed a fixed geometricformWe could only view the top of the aircraftTherefore thedirection and displacement of the aircraftwere two importantfactors that affected the expression of the target To ensurethat the target had a higher recognition rate in differentdirections and displacements we created transformations onthe sample such as a horizontal flip vertical flip and rotationsof 90∘ 180∘ and 270∘ These transformations not only couldincrease the number of samples but also adjust the directionand structure of the target to increase the sample diversityand make the model more robust After transformation the

sample contained a total of 20088 imagesThese images weredivided into three groups with different numbers Plane_Scontained 2511 images Plane_M contained 10044 imagesand Plane_L contained 20088 images

Labeling the image is an important part of creating atraining sample dataset The label is the basis for distinguish-ing between the aircraft and background The conventionalsample producing method is to independently create a posi-tive sample and negative sample In this paper we produce asample in a different approach the difference is that thepositive and negative samples are included in the same imageThe advantage to this approach is that the target is betterdistinguished from the background which can improve thedetection accuracy For example the trestle that connectsthe airplane and terminal is prone to false detection If theairplane and trestle are contained in the same image the tar-get detection network can better distinguish their differencesAs shown in Figure 6(b) the aircraft position is manually

8 Journal of Sensors

Table 1 Detailed information of the three airports

Test image Image size (pixel) Aircraft number Covered area (km2)Berlin Tegel Airport 22016 times 9728 35 191Sydney International Airport 13568 times 22272 78 269Tokyo Haneda Airport 20480 times 17920 94 327

Table 2 Results of transfer-learning on different train networks and different sample sizes

Training network Datasets Time (min) Per Iteration (s) Loss_bbox Loss_cls

CaffeNetPlane_S 64 0061 0055 0087Plane_M 241 0061 0051 0076Plane_L 837 0062 0047 0025

VGG16Plane_S 320 0446 0032 0084Plane_M 497 0446 0029 0044Plane_L 976 0453 0019 0028

marked in the image An XML annotation file is generatedin which basic information about the image and the aircraftis automatically saved

42 Test Dataset Three large-area images were employed forthe test Each image contained a complete airport and thearea was 191 km2 269 km2 and 327 km2 respectively Thetest images had the characteristics of a large area a diversityof ground cover types and complex backgrounds In additionto the airport there were residential areas harbors industrialareas woodlands and grasslands in the image The types ofobjects in the image were very rich and the environment wascomplex Details of the three images are shown in Figure 7and outlined in Table 1

Aircraft and airport detection are two different researchtopics [39] The focus of this paper is aircraft detectionof high-resolution remote sensing images Our objective isnot to detect the airport rather it is to directly detect theaircraft The test images were three airports Neverthelessthe experimental data did not cover only the airport area Infact when the images were enlarged it was evident that theycontained harbors industrial zones and residential areas

5 Results and Discussion

In this paper the full implementation was based on the open-source deep learning libraryCaffe [40]The experimentswererun using an Nvidia Titan X 12GD5 graphics-processing unit(GPU) The network models used in the experiments wereVGG16 pretrained with ImageNet

51 Training Detection Model The two VGG16 pretrainedmodels were fine-tuned using the two sample datasets pre-sented in Section 41 The evaluation training results areshown in Figure 8 As shown in Figure 8(a) the imageclassification accuracy quickly improves during the first10000 iterations and reaches a steady value of 09 at 40000iterations Figure 8(b) depicts the total loss of the objectdetection model whereby it reaches 01 at 40000 itera-tions Figures 8(c) and 8(d) are the location and classification

losses respectively with both reaching a value of less than01 Excellent transfer-learning results provided a good exper-imental basis and were used for the next step

The transfer-learning results are directly related to theaccuracy of the next aircraftdetection processUsing transfer-learning a high-accuracy aircraft detection model could beobtained using a small number of training samples and byrequiring a brief training time Here we use CaffeNet andVGG16 as examples The number of iterations was 40000Four indicators were used to analyze the results total trainingtime (Time) a single iteration time (Per Iteration) positionaccuracy (Loss_bbox) and classification accuracy (Loss_cls)The transfer-learning results are shown in Table 2The resultsshow that each combination of transfer-learning has goodtraining results The model with more training samples hashigher accuracy than the model with fewer samples and thecomplex model VGG16 has higher accuracy than the simplemodel CaffeNet

Table 2 shows that the accuracy of target classification(Loss_cls) increases with the number of samples Plane2_Lincreases by approximately 60 compared to Plane2_S Theposition accuracy (Loss_bbox) has the same trend as theformer however it is not obvious This result proves thatthe method of producing samples proposed in Section 41 iseffective The image transformation method can increase thesample number and enhance the robustness of the samples Itcan additionally improve the ability of transfer-learning andenhance the robustness and accuracy of the detection modelHowever the target detection accuracy remains affected bythe number of samples When the number of samples islarge the target detection accuracy is relatively high and thetraining time is correspondingly increased

CaffeNet is a small network whereas VGG16 is a largenetwork and its structure is more complex than CaffeNetAs the complexity of the network increases the positionaccuracy also increases Large networks increase by approx-imately 40 over small networks In the same network thetraining time increases with the number of samples and thenetwork complexityThe results show that transfer-learning isan effective high-accuracy model training method and it can

Journal of Sensors 9

Table 3 Large-area image aircraft detection results

Test image True aircraft Detected aircraft Missing detection False detectionBerlin Tegel Airport 35 35 0 2Sydney International Airport 78 76 2 5Tokyo Haneda Airport 94 91 3 4

Table 4 Detailed evaluation criteria and comparison for the four detection results

Test image Method FDR () MR () AC () ER ()

Berlin Tegel Airport CCNN 571 0 100 571LOGNet-C 769 323 9677 1092

Sydney International Airport CCNN 658 256 9744 914LOGNet-C 4854 1087 8913 5941

Tokyo Haneda Airport CCNN 439 319 9681 758LOGNet-C 2080 154 9846 2234

realize the task of training a high-accuracy target detectionmodel with fewer samples

52 Aircraft Detection The three large-area test images wereused to verify the aircraft detection accuracy The results areshown in Figure 9 Based on the global and local details inthe figure the overall detection result is good Most of theaircraft in the airport area were detected The detection-missed aircraftwere small aircraft there was nomissed detec-tion of large aircraft Moreover there was no falsely detectedaircraft in industrial areas residential areas woodlandsgrasslands or other areas beyond the airports However therewere individual falsely detected targets within the airportsprimarily airport trestles and airport terminals The quan-titative indicators are shown in Table 3 in which ldquoTruerdquoindicates the number of real aircraft in the original imageldquoDetectedrdquo indicates the number of aircraft detected ldquoMiss-ingrdquo is the number of detection-missed aircraft and ldquoFalserdquo isthe number of falsely detected aircraft

The two-level CCNN aircraft detection structure couldrapidly process large-area remote sensing images withhigh accuracy however the commission errors and errorsremained In a complex environment many objects havefeatures similar to those of aircraft especially long-shapedbuildings and facilities As listed in Table 3 the false detectionrate of each airport is higher than the missing detection rateThrough analysis we determined that the detection-missedaircraft were not actually lost In the detection procedure theclassification threshold was set to 06When the classificationscore of a target was less than 06 the target was not markedThen the missing detection appeared When the thresholdwas lowered to 05 the detection-missed targets weremarkedand the detectionmiss rate was zero Nevertheless there weremore falsely detected targets Airport terminals and severalbuildings had features similar to those of aircraft These wereconsidered aircraft and marked When the threshold wasraised the false detection rate decreased while the detectionmiss rate increased When the threshold was lowered thefalse detection rate increased while the detection miss ratedecreased According to the needs of the detection task it is

a better choice to set a suitable threshold and to balance thefalse detection rate and detection miss rate

The experimental results in Table 3 show that the pro-posed method of CCNN is feasible A high-accuracy aircraftdetection model can be obtained by using the transfer-learning method to fine-tune a pretrained model The GFCmethod greatly reduces the number of region proposalsto directly improve the detection efficiency The two-levelcascade network architecture can detect aircraft in large-area remote sensing images with high accuracy and highefficiency However some shortcomings remain Detailedanalysis of transfer-learning and detection accuracy is out-lined below

The aircraft detection method performance was evaluated by four criteria: falsely detected ratio (FDR), miss ratio (MR), accuracy (AC), and error ratio (ER). Their definitions are given in (6). A remote sensing image aircraft detection method named LOGNet-C, based on weakly supervised learning, was proposed in [20]. The results of the CCNN method were compared with those of the LOGNet-C method. The four evaluation indicators and the comparison results are shown in Table 4.

\[
\begin{aligned}
\text{FDR} &= \frac{\text{Number of falsely detected aircraft}}{\text{Number of detected aircraft}} \times 100\%,\\
\text{MR} &= \frac{\text{Number of missing aircraft}}{\text{Number of aircraft}} \times 100\%,\\
\text{AC} &= \frac{\text{Number of detected aircraft}}{\text{Number of aircraft}} \times 100\%,\\
\text{ER} &= \text{FDR} + \text{MR}.
\end{aligned}
\tag{6}
\]
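The four criteria of (6) are straightforward to compute from the detection counts. The sketch below applies them to the Sydney International Airport counts from Table 3 and reproduces the corresponding CCNN row of Table 4.

```python
def evaluation_metrics(n_true, n_detected, n_missing, n_false):
    """FDR, MR, AC, and ER of equation (6), returned as percentages."""
    fdr = n_false / n_detected * 100
    mr = n_missing / n_true * 100
    ac = n_detected / n_true * 100
    er = fdr + mr
    return fdr, mr, ac, er

# Sydney International Airport counts from Table 3:
# 78 true aircraft, 76 detected, 2 missed, 5 false.
fdr, mr, ac, er = evaluation_metrics(78, 76, 2, 5)
print(round(fdr, 2), round(mr, 2), round(ac, 2), round(er, 2))
# 6.58 2.56 97.44 9.14, matching the CCNN row for Sydney in Table 4
```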

The comparison results show that the false detection rate of the CCNN method is reduced by an average of 6.4% relative to the LOGNet-C method, and the missing detection rate decreases by an average of 2.31%. Overall, the detection accuracy increases by an average of 3.66%.

The test results showed that small aircraft are more prone to missed detection, while long-shaped buildings and facilities are more prone to false detection, as shown in Figure 10. This phenomenon is related to the choice of training


Figure 7: Test remote sensing images for aircraft detection. (a) Berlin Tegel Airport. (b) Sydney International Airport. (c) Tokyo Haneda Airport.

samples. The small aircraft in the sample are only a minority; in the transfer-learning stage, the detection model did not learn enough features of small aircraft. Long-shaped buildings were detected as aircraft. The first reason is that these buildings had contour features similar to those of aircraft. The second reason is that there was an inadequate number of similar buildings in the negative sample; consequently, the features contained in the negative sample were insufficient. When



Figure 8: Accuracy and loss changes of the two models during the fine-tuning process for image classification and object detection. (a) Accuracy of the image classification model. (b) Total loss of the object detection model, which was the weighted sum of loss_cls and loss_bbox. (c) Location loss of the object detection model. (d) Classification loss of the object detection model.

increasing the number of training samples of these types and retraining the model, the target detection accuracy was increased. The analysis results show that, when producing the training sample dataset, it should not only contain all types of targets, but the number of each target type should also be balanced. The negative samples should contain as much background information as possible, especially for objects that have a texture, color, geometry, or other features similar to those of aircraft.

All false detections occurred within the airports. Building-intensive areas were prone to false detection; however, no false detections appeared there. This was on account of the contribution of the first-level cascade network, which filtered out nontarget areas and did not feed these areas into the aircraft detection network. Missed detection primarily occurred in the airport interior, and no aircraft were filtered out by the first-level network. The main cause of the missed detection was that the classification score was not high and the network did not extract an adequate number of features.
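The division of labor between the two levels can be sketched as a block-wise screening loop. The block size and the two stand-in functions below are placeholders for the paper's classification and detection CNNs; only the control flow is taken from the text.

```python
def cascade_detect(image, block_size, classify_block, detect_aircraft):
    """Two-level cascade: a cheap block classifier screens the large image,
    and only blocks flagged as containing potential targets are handed to
    the (expensive) aircraft detector."""
    h, w = len(image), len(image[0])
    detections = []
    for y in range(0, h, block_size):
        for x in range(0, w, block_size):
            block = [row[x:x + block_size] for row in image[y:y + block_size]]
            if classify_block(block):                    # level 1: screen
                for bx, by in detect_aircraft(block):    # level 2: detect
                    detections.append((x + bx, y + by))  # back to global coords
    return detections

# Toy stand-ins: "aircraft" are nonzero cells in a 4x4 "image".
image = [[0, 0, 1, 0],
         [0, 0, 0, 0],
         [0, 0, 0, 0],
         [0, 0, 0, 1]]
classify = lambda b: any(any(row) for row in b)
detect = lambda b: [(x, y) for y, row in enumerate(b) for x, v in enumerate(row) if v]
print(cascade_detect(image, 2, classify, detect))  # [(2, 0), (3, 3)]
```

Blocks rejected at level 1 never reach the detector, which is why building-intensive areas outside the airports produced no false alarms.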

6. Conclusions

In the framework of deep learning, a rapid large-area remote sensing image aircraft detection method, CCNN, was proposed. Using the transfer-learning method, the parameters of an existing target detection model were fine-tuned with a small number of samples, and a high-accuracy aircraft detection model was obtained. Based on the geometric feature invariance of aircraft in remote sensing images, a region proposal processing algorithm, GFC, was proposed to improve the efficiency. Based on the above methods, a cascade network architecture was designed for aircraft detection in large-area remote sensing images. This method not only can directly handle large-area remote sensing images for



Figure 9: Aircraft detection results for the three test images, including the global images and locally enlarged images. Detection results for (a) Berlin Tegel Airport, (b) Sydney International Airport, and (c) Tokyo Haneda Airport.


Figure 10: Commission and omission errors. The purple rectangles are correctly detected aircraft, the green rectangles are falsely detected aircraft, and aircraft without a rectangle are missed aircraft. (a) and (b) show falsely detected targets, predominantly long-shaped buildings and facilities; (c) and (d) show detection-missed aircraft, predominantly small aircraft.


aircraft detection, but it also overcomes the difficulty of training a high-accuracy model with a small number of samples.

The experimental results showed that the detection accuracy increased by an average of 3.66%, the false detection rate decreased by an average of 6.4%, and the missed detection rate decreased by an average of 2.31%. However, some of the algorithms warrant improvement. The target detection threshold is set empirically and has some limitations. In future work, we intend to address the needs of the detection task by making the system set the threshold value automatically. In the present work, we focused on aircraft detection. If the target is extended to other typical targets, such as ships, tanks, and vehicles, then this method will have a larger application scope.

The failures in target detection in this experiment may have occurred for the following reasons. First, the number of small aircraft in our sample is insufficient, and the sample numbers of the various types of aircraft are not balanced, which resulted in inadequate learning of the features of small aircraft during training. Second, the choice of target threshold must be appropriate. When detection is nearly complete, each region proposal is scored. When this score is greater than the threshold, the target is marked and displayed; when the score is less than the threshold, the target is not marked or displayed but is instead treated as background. There is a close relationship between the threshold value and the detection accuracy, especially the false positives and false negatives. When the threshold value is too small, false positives increase; when the threshold value is too large, false negatives increase. Currently, we set our threshold value empirically and plan to adopt an adaptive threshold method in the future.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

The authors would like to thank Dr. Changjian Qiao for his excellent technical support. Without his instructive guidance, impressive kindness, and patience, they could not have completed this work. They would also like to thank Dr. Huijin Yang and Yuan Zhao for critically reviewing the manuscript.

References

[1] G. Cheng and J. Han, "A survey on object detection in optical remote sensing images," ISPRS Journal of Photogrammetry and Remote Sensing, vol. 117, pp. 11–28, 2016.

[2] Y. Li, X. Sun, H. Wang, H. Sun, and X. Li, "Automatic target detection in high-resolution remote sensing images using a contour-based spatial model," IEEE Geoscience and Remote Sensing Letters, vol. 9, no. 5, pp. 886–890, 2012.

[3] X. Bai, H. Zhang, and J. Zhou, "VHR object detection based on structural feature extraction and query expansion," IEEE Transactions on Geoscience and Remote Sensing, vol. 52, no. 10, pp. 6508–6520, 2014.

[4] L. Gao, B. Yang, Q. Du, and B. Zhang, "Adjusted spectral matched filter for target detection in hyperspectral imagery," Remote Sensing, vol. 7, no. 6, pp. 6611–6634, 2015.

[5] E. Fakiris, G. Papatheodorou, M. Geraga, and G. Ferentinos, "An automatic target detection algorithm for swath sonar backscatter imagery, using image texture and independent component analysis," Remote Sensing, vol. 8, no. 5, article 373, 2016.

[6] W. Zhang, X. Sun, K. Fu, C. Wang, and H. Wang, "Object detection in high-resolution remote sensing images using rotation invariant parts based model," IEEE Geoscience and Remote Sensing Letters, vol. 11, no. 1, pp. 74–78, 2014.

[7] H. Sun, X. Sun, H. Wang, Y. Li, and X. Li, "Automatic target detection in high-resolution remote sensing images using spatial sparse coding bag-of-words model," IEEE Geoscience and Remote Sensing Letters, vol. 9, no. 1, pp. 109–113, 2012.

[8] X. Sun, H. Wang, and K. Fu, "Automatic detection of geospatial objects using taxonomic semantics," IEEE Geoscience and Remote Sensing Letters, vol. 7, no. 1, pp. 23–27, 2010.

[9] R. M. Willett, M. F. Duarte, M. A. Davenport, and R. G. Baraniuk, "Sparsity and structure in hyperspectral imaging: sensing, reconstruction, and target detection," IEEE Signal Processing Magazine, vol. 31, no. 1, pp. 116–126, 2014.

[10] F. Gao, Q. Xu, and B. Li, "Aircraft detection from VHR images based on circle-frequency filter and multilevel features," The Scientific World Journal, vol. 2013, Article ID 917928, 7 pages, 2013.

[11] I. Ševo and A. Avramović, "Convolutional neural network based automatic object detection on aerial images," IEEE Geoscience and Remote Sensing Letters, vol. 13, no. 5, pp. 740–744, 2016.

[12] L. Zhang, G.-S. Xia, T. Wu, L. Lin, and X. C. Tai, "Deep learning for remote sensing image understanding," Journal of Sensors, vol. 2016, Article ID 7954154, 2 pages, 2016.

[13] R. Girshick, J. Donahue, T. Darrell, and J. Malik, "Region-based convolutional networks for accurate object detection and segmentation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, no. 1, pp. 142–158, 2016.

[14] E. Pasolli, F. Melgani, N. Alajlan, and N. Conci, "Optical image classification: a ground-truth design framework," IEEE Transactions on Geoscience and Remote Sensing, vol. 51, no. 6, pp. 3580–3597, 2013.

[15] R. Girshick, "Fast R-CNN," in Proceedings of the 15th IEEE International Conference on Computer Vision (ICCV '15), pp. 1440–1448, Santiago, Chile, December 2015.

[16] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: towards real-time object detection with region proposal networks," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 6, pp. 1137–1149, 2017.

[17] W. Liao, A. Pizurica, P. Scheunders, W. Philips, and Y. Pi, "Semisupervised local discriminant analysis for feature extraction in hyperspectral images," IEEE Transactions on Geoscience and Remote Sensing, vol. 51, no. 1, pp. 184–198, 2013.

[18] D. Zhang, J. Han, G. Cheng, Z. Liu, S. Bu, and L. Guo, "Weakly supervised learning for target detection in remote sensing images," IEEE Geoscience and Remote Sensing Letters, vol. 12, no. 4, pp. 701–705, 2014.

[19] T. Deselaers, B. Alexe, and V. Ferrari, "Weakly supervised localization and learning with generic knowledge," International Journal of Computer Vision, vol. 100, no. 3, pp. 275–293, 2012.


[20] F. Zhang, B. Du, L. Zhang, and M. Xu, "Weakly supervised learning based on coupled convolutional neural networks for aircraft detection," IEEE Transactions on Geoscience and Remote Sensing, vol. 54, no. 9, pp. 5553–5563, 2016.

[21] J. Han, D. Zhang, G. Cheng, L. Guo, and J. Ren, "Object detection in optical remote sensing images based on weakly supervised learning and high-level feature learning," IEEE Transactions on Geoscience and Remote Sensing, vol. 53, no. 6, pp. 3325–3337, 2015.

[22] X. Chen, S. Xiang, C.-L. Liu, and C.-H. Pan, "Vehicle detection in satellite images by hybrid deep convolutional neural networks," IEEE Geoscience and Remote Sensing Letters, vol. 11, no. 10, pp. 1797–1801, 2014.

[23] G. Cheng, J. Han, and X. Lu, "Remote sensing image scene classification: benchmark and state of the art," Proceedings of the IEEE, vol. 99, pp. 1–19, 2017.

[24] S. Paisitkriangkrai, J. Sherrah, P. Janney, and A. Van-Den Hengel, "Effective semantic pixel labelling with convolutional networks and conditional random fields," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW '15), pp. 36–43, June 2015.

[25] O. A. B. Penatti, K. Nogueira, and J. A. Dos Santos, "Do deep features generalize from everyday objects to remote sensing and aerial scenes domains?" in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW '15), pp. 44–51, June 2015.

[26] X. Yao, J. Han, G. Cheng, X. Qian, and L. Guo, "Semantic annotation of high-resolution satellite images via weakly supervised learning," IEEE Transactions on Geoscience and Remote Sensing, vol. 54, no. 6, pp. 3660–3671, 2016.

[27] C. Tao, Y. Tan, H. Cai, and J. Tian, "Airport detection from large IKONOS images using clustered SIFT keypoints and region information," IEEE Geoscience and Remote Sensing Letters, vol. 8, no. 1, pp. 128–132, 2011.

[28] O. Aytekin, U. Zongur, and U. Halici, "Texture-based airport runway detection," IEEE Geoscience and Remote Sensing Letters, vol. 10, no. 3, pp. 471–475, 2013.

[29] X. Yao, J. Han, L. Guo, S. Bu, and Z. Liu, "A coarse-to-fine model for airport detection from remote sensing images using target-oriented visual saliency and CRF," Neurocomputing, vol. 164, pp. 162–172, 2015.

[30] G. Cheng, P. Zhou, and J. Han, "Learning rotation-invariant convolutional neural networks for object detection in VHR optical remote sensing images," IEEE Transactions on Geoscience and Remote Sensing, vol. 54, no. 12, pp. 7405–7415, 2016.

[31] X. Yao, J. Han, D. Zhang, and F. Nie, "Revisiting co-saliency detection: a novel approach based on two-stage multi-view spectral rotation co-clustering," IEEE Transactions on Image Processing, vol. 26, no. 7, pp. 3196–3209, 2017.

[32] L. Cao, C. Wang, and J. Li, "Vehicle detection from highway satellite images via transfer learning," Information Sciences, vol. 366, pp. 177–187, 2016.

[33] S. Gupta, S. Rana, B. Saha, D. Phung, and S. Venkatesh, "A new transfer learning framework with application to model-agnostic multi-task learning," Knowledge and Information Systems, vol. 49, no. 3, pp. 933–973, 2016.

[34] A. Van Opbroek, M. A. Ikram, M. W. Vernooij, and M. De Bruijne, "Transfer learning improves supervised image segmentation across imaging protocols," IEEE Transactions on Medical Imaging, vol. 34, no. 5, pp. 1018–1030, 2015.

[35] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint, arXiv:1409.1556, 2014.

[36] W. Zhao, Z. Guo, J. Yue, X. Zhang, and L. Luo, "On combining multiscale deep learning features for the classification of hyperspectral remote sensing imagery," International Journal of Remote Sensing, vol. 36, no. 13, pp. 3368–3379, 2015.

[37] Q. Zou, L. Ni, T. Zhang, and Q. Wang, "Deep learning based feature selection for remote sensing scene classification," IEEE Geoscience and Remote Sensing Letters, vol. 12, no. 11, pp. 2321–2325, 2015.

[38] Y. Yang and S. Newsam, "Bag-of-visual-words and spatial extensions for land-use classification," in Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems (GIS '10), pp. 270–279, ACM, San Jose, Calif, USA, November 2010.

[39] X. Wang, Q. Lv, B. Wang, and L. Zhang, "Airport detection in remote sensing images: a method based on saliency map," Cognitive Neurodynamics, vol. 7, no. 2, pp. 143–154, 2013.

[40] Y. Jia, E. Shelhamer, J. Donahue et al., "Caffe: convolutional architecture for fast feature embedding," in Proceedings of the ACM International Conference on Multimedia, pp. 675–678, ACM, Orlando, Fla, USA, November 2014.





Figure 5: Sample images of different scenes. (a) Airport scenes, including a parking apron, runway, and general terminal. (b) Scenes easily confused with an airport, including industrial areas, residential areas, harbors, overpasses, parking lots, highways, runways, and storage tanks.


Figure 6: Training sample dataset of aircraft in remote sensing images. (a) Original training sample images. (b) Corresponding labeled ground truth images.

learn different features and better distinguish the target and background. As shown in Figure 6(a), the samples cover almost all types of aircraft found in existing remote sensing images, including airliners, small planes, military aircraft, and helicopters.

We obtained the images from the zenith point to the nadir point. Thus, owing to this unique imaging perspective, the aircraft in the remote sensing images showed a fixed geometric form: we could only view the top of the aircraft. Therefore, the direction and displacement of the aircraft were two important factors that affected the expression of the target. To ensure that the target had a higher recognition rate in different directions and displacements, we applied transformations to the samples, such as a horizontal flip, a vertical flip, and rotations of 90°, 180°, and 270°. These transformations not only increase the number of samples but also adjust the direction and structure of the target, increasing the sample diversity and making the model more robust. After transformation, the sample contained a total of 20088 images. These images were divided into three groups of different sizes: Plane_S contained 2511 images, Plane_M contained 10044 images, and Plane_L contained 20088 images.
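The listed flips and rotations generate the eight axis-aligned orientations of each sample chip, which is consistent with 2511 × 8 = 20088. A pure-Python sketch of this augmentation, treating an image as a 2-D list:

```python
def rotate90(img):
    """Rotate a 2-D list image 90 degrees counter-clockwise."""
    return [list(row) for row in zip(*img)][::-1]

def hflip(img):
    """Mirror a 2-D list image left-right."""
    return [row[::-1] for row in img]

def dihedral_augment(img):
    """All 8 orientations: 4 rotations of the image and 4 of its mirror."""
    out = []
    for base in (img, hflip(img)):
        cur = base
        for _ in range(4):
            out.append(cur)
            cur = rotate90(cur)
    return out

chip = [[1, 2],
        [3, 4]]
variants = dihedral_augment(chip)
# 8 distinct orientations per chip; 2511 originals x 8 = 20088 training images.
```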

Labeling the image is an important part of creating a training sample dataset; the label is the basis for distinguishing between the aircraft and the background. The conventional sample-producing method is to independently create a positive sample and a negative sample. In this paper, we produce samples in a different way: the positive and negative samples are included in the same image. The advantage of this approach is that the target is better distinguished from the background, which can improve the detection accuracy. For example, the trestle that connects the airplane and the terminal is prone to false detection. If the airplane and trestle are contained in the same image, the target detection network can better distinguish their differences. As shown in Figure 6(b), the aircraft position is manually


Table 1: Detailed information of the three airports.

Test image                   | Image size (pixels) | Aircraft number | Covered area (km²)
Berlin Tegel Airport         | 22016 × 9728        | 35              | 19.1
Sydney International Airport | 13568 × 22272       | 78              | 26.9
Tokyo Haneda Airport         | 20480 × 17920       | 94              | 32.7

Table 2: Results of transfer-learning on different training networks and different sample sizes.

Training network | Dataset | Time (min) | Per iteration (s) | Loss_bbox | Loss_cls
CaffeNet         | Plane_S | 64         | 0.061             | 0.055     | 0.087
CaffeNet         | Plane_M | 241        | 0.061             | 0.051     | 0.076
CaffeNet         | Plane_L | 837        | 0.062             | 0.047     | 0.025
VGG16            | Plane_S | 320        | 0.446             | 0.032     | 0.084
VGG16            | Plane_M | 497        | 0.446             | 0.029     | 0.044
VGG16            | Plane_L | 976        | 0.453             | 0.019     | 0.028

marked in the image. An XML annotation file is generated, in which basic information about the image and the aircraft is automatically saved.
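Such a per-image XML file resembles the Pascal VOC annotation format commonly used with Caffe-based detectors. The writer below is a sketch under that assumption; the exact schema and field names the authors used are not given in the paper.

```python
import xml.etree.ElementTree as ET

def make_annotation(filename, width, height, boxes):
    """Build a Pascal-VOC-style XML annotation for one training image.

    `boxes` is a list of (name, xmin, ymin, xmax, ymax) tuples; the layout
    is an assumption modeled on the VOC format, not taken from the paper.
    """
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = filename
    size = ET.SubElement(root, "size")
    ET.SubElement(size, "width").text = str(width)
    ET.SubElement(size, "height").text = str(height)
    for name, xmin, ymin, xmax, ymax in boxes:
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = name
        bb = ET.SubElement(obj, "bndbox")
        for tag, val in zip(("xmin", "ymin", "xmax", "ymax"),
                            (xmin, ymin, xmax, ymax)):
            ET.SubElement(bb, tag).text = str(val)
    return ET.tostring(root, encoding="unicode")

# Hypothetical file name and box, for illustration only.
xml_text = make_annotation("plane_0001.jpg", 512, 512,
                           [("aircraft", 100, 120, 180, 190)])
```

Storing positives and the surrounding background in one annotated image, as the text describes, then needs only this one file per training chip.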

4.2. Test Dataset. Three large-area images were employed for the test. Each image contained a complete airport, and the covered areas were 19.1 km², 26.9 km², and 32.7 km², respectively. The test images had the characteristics of a large area, a diversity of ground cover types, and complex backgrounds. In addition to the airports, there were residential areas, harbors, industrial areas, woodlands, and grasslands in the images. The types of objects in the images were very rich, and the environment was complex. Details of the three images are shown in Figure 7 and outlined in Table 1.

Aircraft detection and airport detection are two different research topics [39]. The focus of this paper is aircraft detection in high-resolution remote sensing images. Our objective is not to detect the airport; rather, it is to directly detect the aircraft. The test images covered three airports. Nevertheless, the experimental data did not cover only the airport areas; in fact, when the images were enlarged, it was evident that they contained harbors, industrial zones, and residential areas.

5. Results and Discussion

In this paper, the full implementation was based on the open-source deep learning library Caffe [40]. The experiments were run using an Nvidia Titan X 12GD5 graphics-processing unit (GPU). The network models used in the experiments were VGG16 models pretrained with ImageNet.

5.1. Training Detection Model. The two VGG16 pretrained models were fine-tuned using the two sample datasets presented in Section 4.1. The evaluation training results are shown in Figure 8. As shown in Figure 8(a), the image classification accuracy quickly improves during the first 10000 iterations and reaches a steady value of 0.9 at 40000 iterations. Figure 8(b) depicts the total loss of the object detection model, which reaches 0.1 at 40000 iterations. Figures 8(c) and 8(d) show the location and classification losses, respectively, with both reaching values of less than 0.1. These excellent transfer-learning results provided a good experimental basis and were used for the next step.

The transfer-learning results are directly related to the accuracy of the subsequent aircraft detection process. Using transfer-learning, a high-accuracy aircraft detection model could be obtained using a small number of training samples and requiring only a brief training time. Here, we use CaffeNet and VGG16 as examples. The number of iterations was 40000. Four indicators were used to analyze the results: total training time (Time), single-iteration time (Per Iteration), position accuracy (Loss_bbox), and classification accuracy (Loss_cls). The transfer-learning results are shown in Table 2. The results show that each combination of transfer-learning has good training results. The model with more training samples has higher accuracy than the model with fewer samples, and the complex model, VGG16, has higher accuracy than the simple model, CaffeNet.

Table 2 shows that the accuracy of target classification (Loss_cls) increases with the number of samples; Plane_L improves by approximately 60% compared to Plane_S. The position accuracy (Loss_bbox) has the same trend as the former, although it is less pronounced. This result proves that the method of producing samples proposed in Section 4.1 is effective. The image transformation method can increase the sample number and enhance the robustness of the samples. It can additionally improve the effectiveness of transfer-learning and enhance the robustness and accuracy of the detection model. However, the target detection accuracy remains affected by the number of samples: when the number of samples is large, the target detection accuracy is relatively high, and the training time is correspondingly increased.

CaffeNet is a small network, whereas VGG16 is a large network whose structure is more complex than CaffeNet's. As the complexity of the network increases, the position accuracy also increases; the large network improves by approximately 40% over the small network. Within the same network, the training time increases with the number of samples and the network complexity. The results show that transfer-learning is an effective high-accuracy model training method, and it can


Table 3: Large-area image aircraft detection results.

Test image                   | True aircraft | Detected aircraft | Missing detection | False detection
Berlin Tegel Airport         | 35            | 35                | 0                 | 2
Sydney International Airport | 78            | 76                | 2                 | 5
Tokyo Haneda Airport         | 94            | 91                | 3                 | 4

Table 4: Detailed evaluation criteria and comparison for the detection results.

Test image                   | Method   | FDR (%) | MR (%) | AC (%) | ER (%)
Berlin Tegel Airport         | CCNN     | 5.71    | 0      | 100    | 5.71
Berlin Tegel Airport         | LOGNet-C | 7.69    | 3.23   | 96.77  | 10.92
Sydney International Airport | CCNN     | 6.58    | 2.56   | 97.44  | 9.14
Sydney International Airport | LOGNet-C | 48.54   | 10.87  | 89.13  | 59.41
Tokyo Haneda Airport         | CCNN     | 4.39    | 3.19   | 96.81  | 7.58
Tokyo Haneda Airport         | LOGNet-C | 20.80   | 1.54   | 98.46  | 22.34

realize the task of training a high-accuracy target detection model with fewer samples.
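The approximate gains quoted above can be sanity-checked against the Table 2 loss values (a lower loss corresponding to higher accuracy); the arithmetic below uses only the Table 2 numbers.

```python
# (Loss_bbox, Loss_cls) values copied from Table 2.
caffenet = {"Plane_S": (0.055, 0.087), "Plane_L": (0.047, 0.025)}
vgg16 = {"Plane_S": (0.032, 0.084), "Plane_L": (0.019, 0.028)}

# Classification-loss drop from the smallest to the largest dataset (VGG16):
cls_gain = (vgg16["Plane_S"][1] - vgg16["Plane_L"][1]) / vgg16["Plane_S"][1]
print(f"{cls_gain:.0%}")   # 67%, in line with the roughly 60% improvement cited

# Position-loss drop of the larger network versus the smaller one (Plane_S):
bbox_gain = (caffenet["Plane_S"][0] - vgg16["Plane_S"][0]) / caffenet["Plane_S"][0]
print(f"{bbox_gain:.0%}")  # 42%, in line with the roughly 40% gain cited
```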

52 Aircraft Detection The three large-area test images wereused to verify the aircraft detection accuracy The results areshown in Figure 9 Based on the global and local details inthe figure the overall detection result is good Most of theaircraft in the airport area were detected The detection-missed aircraftwere small aircraft there was nomissed detec-tion of large aircraft Moreover there was no falsely detectedaircraft in industrial areas residential areas woodlandsgrasslands or other areas beyond the airports However therewere individual falsely detected targets within the airportsprimarily airport trestles and airport terminals The quan-titative indicators are shown in Table 3 in which ldquoTruerdquoindicates the number of real aircraft in the original imageldquoDetectedrdquo indicates the number of aircraft detected ldquoMiss-ingrdquo is the number of detection-missed aircraft and ldquoFalserdquo isthe number of falsely detected aircraft

The two-level CCNN aircraft detection structure couldrapidly process large-area remote sensing images withhigh accuracy however the commission errors and errorsremained In a complex environment many objects havefeatures similar to those of aircraft especially long-shapedbuildings and facilities As listed in Table 3 the false detectionrate of each airport is higher than the missing detection rateThrough analysis we determined that the detection-missedaircraft were not actually lost In the detection procedure theclassification threshold was set to 06When the classificationscore of a target was less than 06 the target was not markedThen the missing detection appeared When the thresholdwas lowered to 05 the detection-missed targets weremarkedand the detectionmiss rate was zero Nevertheless there weremore falsely detected targets Airport terminals and severalbuildings had features similar to those of aircraft These wereconsidered aircraft and marked When the threshold wasraised the false detection rate decreased while the detectionmiss rate increased When the threshold was lowered thefalse detection rate increased while the detection miss ratedecreased According to the needs of the detection task it is

a better choice to set a suitable threshold and to balance thefalse detection rate and detection miss rate

The experimental results in Table 3 show that the pro-posed method of CCNN is feasible A high-accuracy aircraftdetection model can be obtained by using the transfer-learning method to fine-tune a pretrained model The GFCmethod greatly reduces the number of region proposalsto directly improve the detection efficiency The two-levelcascade network architecture can detect aircraft in large-area remote sensing images with high accuracy and highefficiency However some shortcomings remain Detailedanalysis of transfer-learning and detection accuracy is out-lined below

The aircraft detectionmethod performancewas evaluatedby four criteria falsely detected ratio (FDR) miss ratio (MR)accuracy (AC) and error ratio (ER)Theirmeanings are givenin (6) A remote sensing image aircraft detection methodnamed LOGNet-C based on weakly supervised learning isproposed in article [20] The results of the CCNN methodwere compared with those of the LOGNet-C method Thefour evaluation indicators and comparison results are shownin Table 4

FDR = Number of falsely detected aircraftNumber of detected aircraft

times 100

MR = Number of missing aircraftNumber of aircraft

times 100

AC = Number of detected aircraftNumber of aircraft

times 100

ER = FDR +MR

(6)

The comparison results show that the false detection rateof the CCNN method is reduced by an average of 64more than the method of LOGNet-C The missing detectionrate decreases by an average of 231 Overall the detectionaccuracy increases by an average of 366

The test results showed that small aircraft are moreprone to missed detection while long-shaped buildings andfacilities are more prone to false detection as shown in Fig-ure 10 This phenomenon is related to the choice of training

10 Journal of Sensors

0 1 2 305(km)

13∘15

㰀30

㰀㰀E 13∘16

㰀30

㰀㰀E 13∘17

㰀30

㰀㰀E 13∘18

㰀30

㰀㰀E

13∘15

㰀30

㰀㰀E 13∘16

㰀30

㰀㰀E 13∘17

㰀30

㰀㰀E 13∘18

㰀30

㰀㰀E

52∘33

㰀30

㰀㰀N 52∘33

㰀30

㰀㰀N

52∘33

㰀0㰀㰀N

N

(a)

0 1 2 305

151∘10

㰀0㰀㰀E 151

∘11

㰀0㰀㰀E

151∘10

㰀0㰀㰀E 151

∘11

㰀0㰀㰀E

33∘56

㰀0㰀㰀S

33∘57

㰀0㰀㰀S

33∘58

㰀0㰀㰀S

33∘56

㰀0㰀㰀S

33∘57

㰀0㰀㰀S

33∘58

㰀0㰀㰀S

(km)

N

(b)

0 1 2 305

35∘34

㰀0㰀㰀N

35∘33

㰀0㰀㰀N

35∘32

㰀0㰀㰀N

35∘34

㰀0㰀㰀N

35∘33

㰀0㰀㰀N

35∘32

㰀0㰀㰀N

139∘46

㰀0㰀㰀E 139

∘47

㰀0㰀㰀E 139

∘48

㰀0㰀㰀E

139∘46

㰀0㰀㰀E 139

∘47

㰀0㰀㰀E 139

∘48

㰀0㰀㰀E

N

(km)

(c)

Figure 7 Test remote sensing images for aircraft detection (a) Berlin Tegel Airport (b) Sydney International Airport (c) Tokyo HanedaAirport

samples The small aircraft in the sample are only a minor-ity In the transfer-learning stage the detection model didnot learn more features of small aircraft Long-shaped build-ings were detected as aircraft The first reason is that these

buildings had similar contour features as aircraftThe secondreason is that there were an inadequate number of similarbuildings in the negative sample Consequently the featurescontained in the negative sample were insufficient When

Journal of Sensors 11

0 10000 20000 30000 40000

02

04

06

08

10

Iterations

Accu

racy

(a)

0 10000 20000 30000 40000

02

04

06

08

10

Iterations

Loss

(b)

0 10000 20000 30000 40000

01

02

03

04

05

Iterations

Loss

_bbo

x

(c)

0 10000 20000 30000 40000

01

02

03

04

05

Iterations

Loss

_cls

(d)

Figure 8 Accuracy and loss changes of the two models during the fine-tuning process of image classification and object detection (a) Accuracyof the image classification model (b) Total loss of the object detection model which was the weighted sum of loss_cls and loss_bbox (c)Location loss of the object detection model (d) Classification loss of the object detection model

increasing the number of training samples of these types and retraining the model, the target detection accuracy was increased. The analysis results show that, when producing the training sample dataset, it should not only contain all types of targets, but the number of each target type should also be balanced. The negative samples should contain as much background information as possible, especially for objects that have a texture, color, geometry, or other features similar to those of aircraft.
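The balancing step described above can be sketched as a simple oversampling routine. The helper below is illustrative only (the paper does not publish its sampling code): minority target types, such as small aircraft, are duplicated until every type appears as often as the most frequent one.

```python
import random
from collections import Counter

def balance_by_type(samples, rng=random.Random(0)):
    """Oversample minority target types so each type appears equally often.

    `samples` is a list of (image_id, target_type) pairs; minority types
    (e.g., small aircraft) are duplicated until every type matches the
    largest type's count.
    """
    counts = Counter(t for _, t in samples)
    target = max(counts.values())
    balanced = list(samples)
    for t, n in counts.items():
        pool = [s for s in samples if s[1] == t]
        balanced.extend(rng.choice(pool) for _ in range(target - n))
    return balanced

# Toy dataset: many large aircraft, only one small one.
data = [("img%d" % i, "large") for i in range(8)] + [("img8", "small")]
balanced = balance_by_type(data)
print(Counter(t for _, t in balanced))  # each type now has 8 samples
```

In practice the duplicated samples would also pass through the image transformations described in Section 4.1, so the copies are not pixel-identical.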

All false detections occurred in the airports. Building-intensive areas were expected to be prone to false detection; however, no false detections appeared there. This was on account of the contribution of the first-level cascade network, which filtered out nontarget areas and did not feed them into the aircraft detection network. Missed detections primarily occurred in the airport interior; no aircraft were filtered out by the first-level network. The main cause of the missed detections was that the classification score was not high and the network did not extract an adequate number of features.

6. Conclusions

In the framework of deep learning, a rapid large-area remote sensing image aircraft detection method, CCNN, was proposed. Using the transfer-learning method, the parameters of an existing target detection model were fine-tuned with a small number of samples, and a high-accuracy aircraft detection model was obtained. Based on the geometric feature invariance of aircraft in remote sensing images, a region proposal processing algorithm, GFC, was proposed to improve the efficiency. Based on the above methods, a cascade network architecture was designed for aircraft detection in large-area remote sensing images. This method not only can directly handle large-area remote sensing images for



Figure 9: Aircraft detection results of the three test images, including the global images and locally enlarged images. Detection results of (a) Berlin Tegel Airport, (b) Sydney International Airport, and (c) Tokyo Haneda Airport.


Figure 10: Commission and omission errors. The purple rectangles are correctly detected aircraft, the green rectangles are falsely detected aircraft, and aircraft without a rectangle are missed aircraft. (a) and (b) are falsely detected aircraft, predominantly long-shaped buildings and facilities; (c) and (d) are detection-missed aircraft, predominantly small aircraft.


aircraft detection, but it also overcomes the difficulty of training a high-accuracy model with a small number of samples.

The experimental results showed that the detection accuracy increased by an average of 3.66%, the false detection rate decreased by an average of 6.4%, and the missed detection rate decreased by an average of 2.31%. However, some of the algorithms warrant improvement. The target detection threshold is set empirically and has some limitations. In future work, we intend to address the needs of the detection task to make the system set the threshold value automatically. In the present work, we focused on aircraft detection. If the target is extended to other typical targets, such as ships, tanks, and vehicles, then this method will have a larger application scope.

The failures in target detection in this experiment may have occurred for the following reasons. First, the number of small aircraft in our sample is insufficient, and the sample number of the various types of aircraft is not balanced, which has resulted in inadequate learning of the features of small aircraft during training. Second, the choice of target threshold must be appropriate. When detection is nearly complete, each region proposal is scored. When this score is greater than the threshold, the target is marked and displayed; however, when the score is less than the threshold, the target is not marked and is not displayed but is instead hidden as background. There is a close relationship between the threshold value and detection accuracy, especially the false positives and false negatives. When the threshold value is too small, false positives increase; however, when the threshold value is too large, false negatives increase. Currently, we set our threshold value empirically and plan to adopt an adaptive threshold method in the future.
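One simple form the planned adaptive threshold could take is a small validation sweep: score the proposals of a held-out image, compute the error ratio ER = FDR + MR (equation (6)) at each candidate threshold, and keep the candidate with the smallest ER. The sketch below is a hedged illustration under that assumption, not the authors' implementation; the toy scores are invented.

```python
def error_ratio(scored_boxes, n_true, threshold):
    """ER = FDR + MR (in %) for one threshold, on scored proposals.

    `scored_boxes` is a list of (score, is_true_aircraft) pairs from a
    validation image; `n_true` is the number of real aircraft in it.
    """
    detected = [(s, ok) for s, ok in scored_boxes if s >= threshold]
    if not detected:
        return 100.0  # every aircraft missed
    false_det = sum(1 for _, ok in detected if not ok)
    true_det = sum(1 for _, ok in detected if ok)
    fdr = false_det / len(detected) * 100
    mr = (n_true - true_det) / n_true * 100
    return fdr + mr

def adaptive_threshold(scored_boxes, n_true, candidates):
    """Return the candidate threshold with the smallest error ratio."""
    return min(candidates, key=lambda t: error_ratio(scored_boxes, n_true, t))

# Toy validation scores: aircraft score high, one building scores mid-range.
boxes = [(0.95, True), (0.90, True), (0.55, True), (0.62, False), (0.40, False)]
best = adaptive_threshold(boxes, n_true=3, candidates=[0.4, 0.5, 0.6, 0.7])
print(best)  # 0.5
```

On this toy data, 0.5 wins because it keeps all three aircraft while admitting only one false positive, mirroring the trade-off described in Section 5.2.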

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

The authors would like to thank Dr. Changjian Qiao for his excellent technical support. Without his instructive guidance, impressive kindness, and patience, they could not have completed this work. They would also like to thank Dr. Huijin Yang and Yuan Zhao for critically reviewing the manuscript.

References

[1] G. Cheng and J. Han, "A survey on object detection in optical remote sensing images," ISPRS Journal of Photogrammetry and Remote Sensing, vol. 117, pp. 11–28, 2016.

[2] Y. Li, X. Sun, H. Wang, H. Sun, and X. Li, "Automatic target detection in high-resolution remote sensing images using a contour-based spatial model," IEEE Geoscience and Remote Sensing Letters, vol. 9, no. 5, pp. 886–890, 2012.

[3] X. Bai, H. Zhang, and J. Zhou, "VHR object detection based on structural feature extraction and query expansion," IEEE Transactions on Geoscience and Remote Sensing, vol. 52, no. 10, pp. 6508–6520, 2014.

[4] L. Gao, B. Yang, Q. Du, and B. Zhang, "Adjusted spectral matched filter for target detection in hyperspectral imagery," Remote Sensing, vol. 7, no. 6, pp. 6611–6634, 2015.

[5] E. Fakiris, G. Papatheodorou, M. Geraga, and G. Ferentinos, "An automatic target detection algorithm for swath sonar backscatter imagery using image texture and independent component analysis," Remote Sensing, vol. 8, no. 5, article 373, 2016.

[6] W. Zhang, X. Sun, K. Fu, C. Wang, and H. Wang, "Object detection in high-resolution remote sensing images using rotation invariant parts based model," IEEE Geoscience and Remote Sensing Letters, vol. 11, no. 1, pp. 74–78, 2014.

[7] H. Sun, X. Sun, H. Wang, Y. Li, and X. Li, "Automatic target detection in high-resolution remote sensing images using spatial sparse coding bag-of-words model," IEEE Geoscience and Remote Sensing Letters, vol. 9, no. 1, pp. 109–113, 2012.

[8] X. Sun, H. Wang, and K. Fu, "Automatic detection of geospatial objects using taxonomic semantics," IEEE Geoscience and Remote Sensing Letters, vol. 7, no. 1, pp. 23–27, 2010.

[9] R. M. Willett, M. F. Duarte, M. A. Davenport, and R. G. Baraniuk, "Sparsity and structure in hyperspectral imaging: sensing, reconstruction, and target detection," IEEE Signal Processing Magazine, vol. 31, no. 1, pp. 116–126, 2014.

[10] F. Gao, Q. Xu, and B. Li, "Aircraft detection from VHR images based on circle-frequency filter and multilevel features," The Scientific World Journal, vol. 2013, Article ID 917928, 7 pages, 2013.

[11] I. Sevo and A. Avramovic, "Convolutional neural network based automatic object detection on aerial images," IEEE Geoscience and Remote Sensing Letters, vol. 13, no. 5, pp. 740–744, 2016.

[12] L. Zhang, G.-S. Xia, T. Wu, L. Lin, and X. C. Tai, "Deep learning for remote sensing image understanding," Journal of Sensors, vol. 2016, Article ID 7954154, 2 pages, 2016.

[13] R. Girshick, J. Donahue, T. Darrell, and J. Malik, "Region-based convolutional networks for accurate object detection and segmentation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, no. 1, pp. 142–158, 2016.

[14] E. Pasolli, F. Melgani, N. Alajlan, and N. Conci, "Optical image classification: a ground-truth design framework," IEEE Transactions on Geoscience and Remote Sensing, vol. 51, no. 6, pp. 3580–3597, 2013.

[15] R. Girshick, "Fast R-CNN," in Proceedings of the 15th IEEE International Conference on Computer Vision (ICCV '15), pp. 1440–1448, Santiago, Chile, December 2015.

[16] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: towards real-time object detection with region proposal networks," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 6, pp. 1137–1149, 2017.

[17] W. Liao, A. Pizurica, P. Scheunders, W. Philips, and Y. Pi, "Semisupervised local discriminant analysis for feature extraction in hyperspectral images," IEEE Transactions on Geoscience and Remote Sensing, vol. 51, no. 1, pp. 184–198, 2013.

[18] D. Zhang, J. Han, G. Cheng, Z. Liu, S. Bu, and L. Guo, "Weakly supervised learning for target detection in remote sensing images," IEEE Geoscience and Remote Sensing Letters, vol. 12, no. 4, pp. 701–705, 2014.

[19] T. Deselaers, B. Alexe, and V. Ferrari, "Weakly supervised localization and learning with generic knowledge," International Journal of Computer Vision, vol. 100, no. 3, pp. 275–293, 2012.

[20] F. Zhang, B. Du, L. Zhang, and M. Xu, "Weakly supervised learning based on coupled convolutional neural networks for aircraft detection," IEEE Transactions on Geoscience and Remote Sensing, vol. 54, no. 9, pp. 5553–5563, 2016.

[21] J. Han, D. Zhang, G. Cheng, L. Guo, and J. Ren, "Object detection in optical remote sensing images based on weakly supervised learning and high-level feature learning," IEEE Transactions on Geoscience and Remote Sensing, vol. 53, no. 6, pp. 3325–3337, 2015.

[22] X. Chen, S. Xiang, C.-L. Liu, and C.-H. Pan, "Vehicle detection in satellite images by hybrid deep convolutional neural networks," IEEE Geoscience and Remote Sensing Letters, vol. 11, no. 10, pp. 1797–1801, 2014.

[23] G. Cheng, J. Han, and X. Lu, "Remote sensing image scene classification: benchmark and state of the art," Proceedings of the IEEE, vol. 99, pp. 1–19, 2017.

[24] S. Paisitkriangkrai, J. Sherrah, P. Janney, and A. Van-Den Hengel, "Effective semantic pixel labelling with convolutional networks and conditional random fields," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW '15), pp. 36–43, June 2015.

[25] O. A. B. Penatti, K. Nogueira, and J. A. Dos Santos, "Do deep features generalize from everyday objects to remote sensing and aerial scenes domains?" in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW '15), pp. 44–51, June 2015.

[26] X. Yao, J. Han, G. Cheng, X. Qian, and L. Guo, "Semantic annotation of high-resolution satellite images via weakly supervised learning," IEEE Transactions on Geoscience and Remote Sensing, vol. 54, no. 6, pp. 3660–3671, 2016.

[27] C. Tao, Y. Tan, H. Cai, and J. Tian, "Airport detection from large IKONOS images using clustered SIFT keypoints and region information," IEEE Geoscience and Remote Sensing Letters, vol. 8, no. 1, pp. 128–132, 2011.

[28] O. Aytekin, U. Zongur, and U. Halici, "Texture-based airport runway detection," IEEE Geoscience and Remote Sensing Letters, vol. 10, no. 3, pp. 471–475, 2013.

[29] X. Yao, J. Han, L. Guo, S. Bu, and Z. Liu, "A coarse-to-fine model for airport detection from remote sensing images using target-oriented visual saliency and CRF," Neurocomputing, vol. 164, pp. 162–172, 2015.

[30] G. Cheng, P. Zhou, and J. Han, "Learning rotation-invariant convolutional neural networks for object detection in VHR optical remote sensing images," IEEE Transactions on Geoscience and Remote Sensing, vol. 54, no. 12, pp. 7405–7415, 2016.

[31] X. Yao, J. Han, D. Zhang, and F. Nie, "Revisiting co-saliency detection: a novel approach based on two-stage multi-view spectral rotation co-clustering," IEEE Transactions on Image Processing, vol. 26, no. 7, pp. 3196–3209, 2017.

[32] L. Cao, C. Wang, and J. Li, "Vehicle detection from highway satellite images via transfer learning," Information Sciences, vol. 366, pp. 177–187, 2016.

[33] S. Gupta, S. Rana, B. Saha, D. Phung, and S. Venkatesh, "A new transfer learning framework with application to model-agnostic multi-task learning," Knowledge and Information Systems, vol. 49, no. 3, pp. 933–973, 2016.

[34] A. Van Opbroek, M. A. Ikram, M. W. Vernooij, and M. De Bruijne, "Transfer learning improves supervised image segmentation across imaging protocols," IEEE Transactions on Medical Imaging, vol. 34, no. 5, pp. 1018–1030, 2015.

[35] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," Computer Science, 2014.

[36] W. Zhao, Z. Guo, J. Yue, X. Zhang, and L. Luo, "On combining multiscale deep learning features for the classification of hyperspectral remote sensing imagery," International Journal of Remote Sensing, vol. 36, no. 13, pp. 3368–3379, 2015.

[37] Q. Zou, L. Ni, T. Zhang, and Q. Wang, "Deep learning based feature selection for remote sensing scene classification," IEEE Geoscience and Remote Sensing Letters, vol. 12, no. 11, pp. 2321–2325, 2015.

[38] Y. Yang and S. Newsam, "Bag-of-visual-words and spatial extensions for land-use classification," in Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems (GIS '10), pp. 270–279, ACM, San Jose, Calif, USA, November 2010.

[39] X. Wang, Q. Lv, B. Wang, and L. Zhang, "Airport detection in remote sensing images: a method based on saliency map," Cognitive Neurodynamics, vol. 7, no. 2, pp. 143–154, 2013.

[40] Y. Jia, E. Shelhamer, J. Donahue et al., "Caffe: convolutional architecture for fast feature embedding," in Proceedings of the ACM International Conference on Multimedia, pp. 675–678, ACM, Orlando, Fla, USA, November 2014.



Table 1: Detailed information of the three airports.

Test image | Image size (pixels) | Aircraft number | Covered area (km²)
Berlin Tegel Airport | 22016 × 9728 | 35 | 19.1
Sydney International Airport | 13568 × 22272 | 78 | 26.9
Tokyo Haneda Airport | 20480 × 17920 | 94 | 32.7

Table 2: Results of transfer-learning on different training networks and different sample sizes.

Training network | Dataset | Time (min) | Per iteration (s) | Loss_bbox | Loss_cls
CaffeNet | Plane_S | 64 | 0.061 | 0.055 | 0.087
CaffeNet | Plane_M | 241 | 0.061 | 0.051 | 0.076
CaffeNet | Plane_L | 837 | 0.062 | 0.047 | 0.025
VGG16 | Plane_S | 320 | 0.446 | 0.032 | 0.084
VGG16 | Plane_M | 497 | 0.446 | 0.029 | 0.044
VGG16 | Plane_L | 976 | 0.453 | 0.019 | 0.028

marked in the image. An XML annotation file is generated, in which basic information about the image and the aircraft is automatically saved.
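The paper does not publish its annotation schema; a common choice when fine-tuning Faster R-CNN-style detectors is a Pascal VOC-style XML file per image. The sketch below shows that convention with the standard library only; the element names and the sample filename/box are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

def make_annotation(filename, width, height, boxes):
    """Build a Pascal VOC-style XML annotation string for one image.

    `boxes` is a list of (xmin, ymin, xmax, ymax) aircraft rectangles.
    The element names follow the VOC convention commonly used with
    Faster R-CNN; the paper's exact schema is not published.
    """
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = filename
    size = ET.SubElement(root, "size")
    ET.SubElement(size, "width").text = str(width)
    ET.SubElement(size, "height").text = str(height)
    for xmin, ymin, xmax, ymax in boxes:
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = "aircraft"
        bnd = ET.SubElement(obj, "bndbox")
        for tag, v in zip(("xmin", "ymin", "xmax", "ymax"), (xmin, ymin, xmax, ymax)):
            ET.SubElement(bnd, tag).text = str(v)
    return ET.tostring(root, encoding="unicode")

# One hypothetical sample image with a single annotated aircraft.
xml_text = make_annotation("plane_0001.jpg", 500, 375, [(48, 60, 180, 172)])
print(xml_text)
```

Writing the string to `plane_0001.xml` next to the image would reproduce the "one annotation file per sample" layout described above.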

4.2. Test Dataset. Three large-area images were employed for the test. Each image contained a complete airport, and the covered areas were 19.1 km², 26.9 km², and 32.7 km², respectively. The test images had the characteristics of a large area, a diversity of ground cover types, and complex backgrounds. In addition to the airport, there were residential areas, harbors, industrial areas, woodlands, and grasslands in the images. The types of objects in the images were very rich, and the environment was complex. Details of the three images are shown in Figure 7 and outlined in Table 1.

Aircraft and airport detection are two different research topics [39]. The focus of this paper is aircraft detection in high-resolution remote sensing images. Our objective is not to detect the airport; rather, it is to directly detect the aircraft. The test images covered three airports. Nevertheless, the experimental data did not cover only the airport area. In fact, when the images were enlarged, it was evident that they contained harbors, industrial zones, and residential areas.

5. Results and Discussion

In this paper, the full implementation was based on the open-source deep learning library Caffe [40]. The experiments were run using an Nvidia Titan X (12 GB GDDR5) graphics-processing unit (GPU). The network models used in the experiments were VGG16 pretrained with ImageNet.

5.1. Training Detection Model. The two VGG16 pretrained models were fine-tuned using the two sample datasets presented in Section 4.1. The evaluation training results are shown in Figure 8. As shown in Figure 8(a), the image classification accuracy quickly improves during the first 10000 iterations and reaches a steady value of 0.9 at 40000 iterations. Figure 8(b) depicts the total loss of the object detection model, which reaches 0.1 at 40000 iterations. Figures 8(c) and 8(d) are the location and classification losses, respectively, with both reaching a value of less than 0.1. Excellent transfer-learning results provided a good experimental basis and were used for the next step.
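The total loss tracked in Figure 8(b) is described as the weighted sum of loss_cls and loss_bbox, as in Faster R-CNN-style multi-task training. A minimal sketch, assuming the common default weight of 1.0 (the paper does not state its weight):

```python
def total_loss(loss_cls, loss_bbox, lam=1.0):
    """Total detection loss as a weighted sum of the classification loss
    and the bounding-box regression loss. lam = 1.0 is the common
    Faster R-CNN default; the paper's exact weight is an assumption."""
    return loss_cls + lam * loss_bbox

# Final values read off Table 2 for VGG16 fine-tuned on Plane_L:
print(total_loss(0.028, 0.019))  # ~0.047, i.e., well under 0.1
```

This matches the observation that the total loss settles near 0.1 while both components individually fall below 0.1.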

The transfer-learning results are directly related to the accuracy of the subsequent aircraft detection process. Using transfer-learning, a high-accuracy aircraft detection model could be obtained using a small number of training samples and only a brief training time. Here, we use CaffeNet and VGG16 as examples. The number of iterations was 40000. Four indicators were used to analyze the results: total training time (Time), single-iteration time (Per Iteration), position accuracy (Loss_bbox), and classification accuracy (Loss_cls). The transfer-learning results are shown in Table 2. The results show that each combination of transfer-learning has good training results. The model with more training samples has higher accuracy than the model with fewer samples, and the complex model, VGG16, has higher accuracy than the simple model, CaffeNet.

Table 2 shows that the accuracy of target classification (Loss_cls) increases with the number of samples; Plane2_L improves by approximately 60% compared to Plane2_S. The position accuracy (Loss_bbox) has the same trend as the former; however, it is less pronounced. This result proves that the method of producing samples proposed in Section 4.1 is effective. The image transformation method can increase the sample number and enhance the robustness of the samples. It can additionally improve the ability of transfer-learning and enhance the robustness and accuracy of the detection model. However, the target detection accuracy remains affected by the number of samples: when the number of samples is large, the target detection accuracy is relatively high, and the training time is correspondingly increased.

CaffeNet is a small network, whereas VGG16 is a large network whose structure is more complex than CaffeNet's. As the complexity of the network increases, the position accuracy also increases; the large network improves by approximately 40% over the small network. Within the same network, the training time increases with the number of samples and the network complexity. The results show that transfer-learning is an effective high-accuracy model training method, and it can


Table 3: Large-area image aircraft detection results.

Test image | True aircraft | Detected aircraft | Missing detection | False detection
Berlin Tegel Airport | 35 | 35 | 0 | 2
Sydney International Airport | 78 | 76 | 2 | 5
Tokyo Haneda Airport | 94 | 91 | 3 | 4

Table 4: Detailed evaluation criteria and comparison for the detection results.

Test image | Method | FDR (%) | MR (%) | AC (%) | ER (%)
Berlin Tegel Airport | CCNN | 5.71 | 0 | 100 | 5.71
Berlin Tegel Airport | LOGNet-C | 7.69 | 3.23 | 96.77 | 10.92
Sydney International Airport | CCNN | 6.58 | 2.56 | 97.44 | 9.14
Sydney International Airport | LOGNet-C | 48.54 | 10.87 | 89.13 | 59.41
Tokyo Haneda Airport | CCNN | 4.39 | 3.19 | 96.81 | 7.58
Tokyo Haneda Airport | LOGNet-C | 20.80 | 1.54 | 98.46 | 22.34

realize the task of training a high-accuracy target detection model with fewer samples.

5.2. Aircraft Detection. The three large-area test images were used to verify the aircraft detection accuracy. The results are shown in Figure 9. Based on the global and local details in the figure, the overall detection result is good. Most of the aircraft in the airport areas were detected. The detection-missed aircraft were small aircraft; there was no missed detection of large aircraft. Moreover, there were no falsely detected aircraft in industrial areas, residential areas, woodlands, grasslands, or other areas beyond the airports. However, there were individual falsely detected targets within the airports, primarily airport trestles and airport terminals. The quantitative indicators are shown in Table 3, in which "True" indicates the number of real aircraft in the original image, "Detected" indicates the number of aircraft detected, "Missing" is the number of detection-missed aircraft, and "False" is the number of falsely detected aircraft.

The two-level CCNN aircraft detection structure could rapidly process large-area remote sensing images with high accuracy; however, commission and omission errors remained. In a complex environment, many objects have features similar to those of aircraft, especially long-shaped buildings and facilities. As listed in Table 3, the false detection rate of each airport is higher than the missing detection rate. Through analysis, we determined that the detection-missed aircraft were not actually lost. In the detection procedure, the classification threshold was set to 0.6. When the classification score of a target was less than 0.6, the target was not marked, and the missing detection appeared. When the threshold was lowered to 0.5, the detection-missed targets were marked, and the detection miss rate was zero. Nevertheless, there were more falsely detected targets: airport terminals and several buildings had features similar to those of aircraft, and these were considered aircraft and marked. When the threshold was raised, the false detection rate decreased while the detection miss rate increased; when the threshold was lowered, the false detection rate increased while the detection miss rate decreased. According to the needs of the detection task, it is a better choice to set a suitable threshold and to balance the false detection rate and the detection miss rate.

The experimental results in Table 3 show that the proposed CCNN method is feasible. A high-accuracy aircraft detection model can be obtained by using the transfer-learning method to fine-tune a pretrained model. The GFC method greatly reduces the number of region proposals, directly improving the detection efficiency. The two-level cascade network architecture can detect aircraft in large-area remote sensing images with high accuracy and high efficiency. However, some shortcomings remain. A detailed analysis of transfer-learning and detection accuracy is outlined below.
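The GFC idea can be illustrated with a simple geometric screen on proposal boxes: at a known image resolution, an aircraft occupies a bounded range of side lengths and aspect ratios, so boxes outside those bounds can be discarded before the detector sees them. The bounds below are illustrative assumptions, not the authors' published constraint values.

```python
def gfc_filter(proposals, min_side=15, max_side=120, max_aspect=1.8):
    """Discard region proposals whose geometry cannot be an aircraft.

    Exploits the geometric invariance of aircraft at a known image
    resolution; the side-length and aspect-ratio bounds here are
    illustrative assumptions. `proposals` holds (xmin, ymin, xmax, ymax)
    boxes in pixels.
    """
    kept = []
    for xmin, ymin, xmax, ymax in proposals:
        w, h = xmax - xmin, ymax - ymin
        if w <= 0 or h <= 0:
            continue  # degenerate box
        aspect = max(w, h) / min(w, h)
        if min_side <= min(w, h) and max(w, h) <= max_side and aspect <= max_aspect:
            kept.append((xmin, ymin, xmax, ymax))
    return kept

boxes = [(0, 0, 60, 50),    # plausible aircraft-sized box: kept
         (0, 0, 400, 40),   # long-shaped building: rejected (size and aspect)
         (10, 10, 14, 14)]  # tiny noise box: rejected (too small)
print(gfc_filter(boxes))  # [(0, 0, 60, 50)]
```

Because this screen runs on box coordinates only, it is far cheaper than classifying every proposal, which is how such a filter cuts the proposal count before the detection network runs.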

The aircraft detectionmethod performancewas evaluatedby four criteria falsely detected ratio (FDR) miss ratio (MR)accuracy (AC) and error ratio (ER)Theirmeanings are givenin (6) A remote sensing image aircraft detection methodnamed LOGNet-C based on weakly supervised learning isproposed in article [20] The results of the CCNN methodwere compared with those of the LOGNet-C method Thefour evaluation indicators and comparison results are shownin Table 4

FDR = (Number of falsely detected aircraft / Number of detected aircraft) × 100%
MR = (Number of missing aircraft / Number of aircraft) × 100%
AC = (Number of detected aircraft / Number of aircraft) × 100%
ER = FDR + MR
(6)
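Equation (6) can be checked directly against the counts in Table 3. A minimal sketch:

```python
def detection_metrics(n_true, n_detected, n_missing, n_false):
    """FDR, MR, AC, and ER from equation (6), as percentages."""
    fdr = n_false / n_detected * 100
    mr = n_missing / n_true * 100
    ac = n_detected / n_true * 100
    return {"FDR": fdr, "MR": mr, "AC": ac, "ER": fdr + mr}

# Berlin Tegel row of Table 3: 35 true, 35 detected, 0 missed, 2 false.
m = detection_metrics(35, 35, 0, 2)
print({k: round(v, 2) for k, v in m.items()})
# {'FDR': 5.71, 'MR': 0.0, 'AC': 100.0, 'ER': 5.71} — matches the CCNN row of Table 4
```

Plugging in the Sydney and Tokyo rows reproduces the remaining CCNN entries of Table 4 to two decimal places.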

The comparison results show that the false detection rate of the CCNN method is reduced by an average of 6.4% compared with the LOGNet-C method, and the missing detection rate decreases by an average of 2.31%. Overall, the detection accuracy increases by an average of 3.66%.

The test results showed that small aircraft are moreprone to missed detection while long-shaped buildings andfacilities are more prone to false detection as shown in Fig-ure 10 This phenomenon is related to the choice of training

10 Journal of Sensors

0 1 2 305(km)

13∘15

㰀30

㰀㰀E 13∘16

㰀30

㰀㰀E 13∘17

㰀30

㰀㰀E 13∘18

㰀30

㰀㰀E

13∘15

㰀30

㰀㰀E 13∘16

㰀30

㰀㰀E 13∘17

㰀30

㰀㰀E 13∘18

㰀30

㰀㰀E

52∘33

㰀30

㰀㰀N 52∘33

㰀30

㰀㰀N

52∘33

㰀0㰀㰀N

N

(a)

0 1 2 305

151∘10

㰀0㰀㰀E 151

∘11

㰀0㰀㰀E

151∘10

㰀0㰀㰀E 151

∘11

㰀0㰀㰀E

33∘56

㰀0㰀㰀S

33∘57

㰀0㰀㰀S

33∘58

㰀0㰀㰀S

33∘56

㰀0㰀㰀S

33∘57

㰀0㰀㰀S

33∘58

㰀0㰀㰀S

(km)

N

(b)

0 1 2 305

35∘34

㰀0㰀㰀N

35∘33

㰀0㰀㰀N

35∘32

㰀0㰀㰀N

35∘34

㰀0㰀㰀N

35∘33

㰀0㰀㰀N

35∘32

㰀0㰀㰀N

139∘46

㰀0㰀㰀E 139

∘47

㰀0㰀㰀E 139

∘48

㰀0㰀㰀E

139∘46

㰀0㰀㰀E 139

∘47

㰀0㰀㰀E 139

∘48

㰀0㰀㰀E

N

(km)

(c)

Figure 7 Test remote sensing images for aircraft detection (a) Berlin Tegel Airport (b) Sydney International Airport (c) Tokyo HanedaAirport

samples The small aircraft in the sample are only a minor-ity In the transfer-learning stage the detection model didnot learn more features of small aircraft Long-shaped build-ings were detected as aircraft The first reason is that these

buildings had similar contour features as aircraftThe secondreason is that there were an inadequate number of similarbuildings in the negative sample Consequently the featurescontained in the negative sample were insufficient When

Journal of Sensors 11

0 10000 20000 30000 40000

02

04

06

08

10

Iterations

Accu

racy

(a)

0 10000 20000 30000 40000

02

04

06

08

10

Iterations

Loss

(b)

0 10000 20000 30000 40000

01

02

03

04

05

Iterations

Loss

_bbo

x

(c)

0 10000 20000 30000 40000

01

02

03

04

05

Iterations

Loss

_cls

(d)

Figure 8 Accuracy and loss changes of the two models during the fine-tuning process of image classification and object detection (a) Accuracyof the image classification model (b) Total loss of the object detection model which was the weighted sum of loss_cls and loss_bbox (c)Location loss of the object detection model (d) Classification loss of the object detection model

increasing the number of training samples of these typesand retraining the model the target detection accuracy wasincreased The analysis results show that when producingthe training sample dataset it should not only contain alltypes of targets but the number of each target type shouldbe balanced The negative samples should contain as muchbackground information as possible especially for objectsthat have a texture color geometry or other features similarto those of aircraft

All false detection occurred in the airport Building-intensive areas were prone to false detection however theydid not appear This was on account of the contribution ofthe first-level cascade network which filtered out nontargetareas and did not feed these areas into the aircraft detectionnetwork Missed detection primarily occurred in the airportinterior and no aircraft were filtered out by the first-levelnetwork The main cause of the missed detection was that

the classification score was not high and the network did notextract an adequate number of features

6 Conclusions

In the framework of deep learning a rapid large-area remotesensing image aircraft detection method of CCNN was pro-posed Using the transfer-learning method the parametersof the existing target detection model were fine-tuned witha small number of samples and a high-accuracy aircraftdetection model was obtained Based on the geometric fea-ture invariance of the aircraft in the remote sensing image aregion proposal processing algorithm GFC was proposed toimprove the efficiency Based on the abovemethods a cascadenetwork architecture was designed for large-area remotesensing images for aircraft detection This method not onlycan directly handle large-area remote sensing images for

12 Journal of Sensors

(a)

(b)

(c)

Figure 9 Aircraft detection results of the three test images including the global images and local enlarged images Detection results of (a) BerlinTegel Airport (b) Sydney International Airport (c) Tokyo Haneda Airport

(a) (b) (c) (d)

Figure 10 Commission and omission errors The purple rectangle is the correctly detected aircraft the green rectangle is the falsely detectedaircraft and aircraftwithout a rectangle aremissed detected aircraft (a) and (b) are falsely detected aircraft that is predominantly long-shapedbuildings and facilities (c) and (d) are detection-missed aircraft predominantly small aircraft

Journal of Sensors 13

aircraft detection but it also overcomes the difficulty oftraining a high-accuracy model with a small number ofsamples

The experimental results showed that the detection accu-racy increased by an average of 366The false detection ratedecreased by an average of 64 and the missed detectionrate decreased by an average of 231 However some ofthe algorithms warrant improvement The target detectionthreshold is experientially set and has some limitations Infuture work we intend to address the needs of the detectiontask tomake the system automatically set the threshold valueIn the present work we focused on aircraft detection If thetarget is extended to other typical targets such as ships tanksand vehicles then this method will have a larger applicationscope

The failures in target detection in this experiment mayhave occurred for the following reasons First the numberof small aircraft in our sample is insufficient and the samplenumber of various types of aircraft is not balanced whichhas resulted in inadequate learning of the features of smallaircraft during training Second the choice of target thresholdmust be appropriateWhen detection is nearly complete eachregion proposal is scored When this score is greater thanthe threshold the target is marked and displayed howeverwhen the score is less than the threshold the target is notmarked and is not displayed but is instead hidden as back-ground There is a close relationship between the thresholdvalue and detection accuracy especially false positives andnegatives When the threshold value is too small false posi-tives decreaseHowever when the threshold value is too largefalse negatives increase Currently we set our threshold valueempirically and plan to adopt an adaptive threshold methodin the future

Conflicts of Interest

The authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

The authors would like to thank Dr Changjian Qiao for hisexcellent technical supportWithout his instructive guidanceimpressive kindness and patience they could not have com-pleted this thesis They would also like to thank Dr HuijinYang and Yuan Zhao for critically reviewing the manuscript

References

[1] G. Cheng and J. Han, "A survey on object detection in optical remote sensing images," ISPRS Journal of Photogrammetry and Remote Sensing, vol. 117, pp. 11–28, 2016.

[2] Y. Li, X. Sun, H. Wang, H. Sun, and X. Li, "Automatic target detection in high-resolution remote sensing images using a contour-based spatial model," IEEE Geoscience and Remote Sensing Letters, vol. 9, no. 5, pp. 886–890, 2012.

[3] X. Bai, H. Zhang, and J. Zhou, "VHR object detection based on structural feature extraction and query expansion," IEEE Transactions on Geoscience and Remote Sensing, vol. 52, no. 10, pp. 6508–6520, 2014.

[4] L. Gao, B. Yang, Q. Du, and B. Zhang, "Adjusted spectral matched filter for target detection in hyperspectral imagery," Remote Sensing, vol. 7, no. 6, pp. 6611–6634, 2015.

[5] E. Fakiris, G. Papatheodorou, M. Geraga, and G. Ferentinos, "An automatic target detection algorithm for swath sonar backscatter imagery using image texture and independent component analysis," Remote Sensing, vol. 8, no. 5, article no. 373, 2016.

[6] W. Zhang, X. Sun, K. Fu, C. Wang, and H. Wang, "Object detection in high-resolution remote sensing images using rotation invariant parts based model," IEEE Geoscience and Remote Sensing Letters, vol. 11, no. 1, pp. 74–78, 2014.

[7] H. Sun, X. Sun, H. Wang, Y. Li, and X. Li, "Automatic target detection in high-resolution remote sensing images using spatial sparse coding bag-of-words model," IEEE Geoscience and Remote Sensing Letters, vol. 9, no. 1, pp. 109–113, 2012.

[8] X. Sun, H. Wang, and K. Fu, "Automatic detection of geospatial objects using taxonomic semantics," IEEE Geoscience and Remote Sensing Letters, vol. 7, no. 1, pp. 23–27, 2010.

[9] R. M. Willett, M. F. Duarte, M. A. Davenport, and R. G. Baraniuk, "Sparsity and structure in hyperspectral imaging: sensing, reconstruction, and target detection," IEEE Signal Processing Magazine, vol. 31, no. 1, pp. 116–126, 2014.

[10] F. Gao, Q. Xu, and B. Li, "Aircraft detection from VHR images based on circle-frequency filter and multilevel features," The Scientific World Journal, vol. 2013, Article ID 917928, 7 pages, 2013.

[11] I. Sevo and A. Avramovic, "Convolutional neural network based automatic object detection on aerial images," IEEE Geoscience and Remote Sensing Letters, vol. 13, no. 5, pp. 740–744, 2016.

[12] L. Zhang, G.-S. Xia, T. Wu, L. Lin, and X. C. Tai, "Deep learning for remote sensing image understanding," Journal of Sensors, vol. 2016, Article ID 7954154, 2 pages, 2016.

[13] R. Girshick, J. Donahue, T. Darrell, and J. Malik, "Region-based convolutional networks for accurate object detection and segmentation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, no. 1, pp. 142–158, 2016.

[14] E. Pasolli, F. Melgani, N. Alajlan, and N. Conci, "Optical image classification: a ground-truth design framework," IEEE Transactions on Geoscience and Remote Sensing, vol. 51, no. 6, pp. 3580–3597, 2013.

[15] R. Girshick, "Fast R-CNN," in Proceedings of the 15th IEEE International Conference on Computer Vision (ICCV '15), pp. 1440–1448, Santiago, Chile, December 2015.

[16] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: towards real-time object detection with region proposal networks," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 6, pp. 1137–1149, 2017.

[17] W. Liao, A. Pizurica, P. Scheunders, W. Philips, and Y. Pi, "Semisupervised local discriminant analysis for feature extraction in hyperspectral images," IEEE Transactions on Geoscience and Remote Sensing, vol. 51, no. 1, pp. 184–198, 2013.

[18] D. Zhang, J. Han, G. Cheng, Z. Liu, S. Bu, and L. Guo, "Weakly supervised learning for target detection in remote sensing images," IEEE Geoscience and Remote Sensing Letters, vol. 12, no. 4, pp. 701–705, 2014.

[19] T. Deselaers, B. Alexe, and V. Ferrari, "Weakly supervised localization and learning with generic knowledge," International Journal of Computer Vision, vol. 100, no. 3, pp. 275–293, 2012.

14 Journal of Sensors

[20] F. Zhang, B. Du, L. Zhang, and M. Xu, "Weakly supervised learning based on coupled convolutional neural networks for aircraft detection," IEEE Transactions on Geoscience and Remote Sensing, vol. 54, no. 9, pp. 5553–5563, 2016.

[21] J. Han, D. Zhang, G. Cheng, L. Guo, and J. Ren, "Object detection in optical remote sensing images based on weakly supervised learning and high-level feature learning," IEEE Transactions on Geoscience and Remote Sensing, vol. 53, no. 6, pp. 3325–3337, 2015.

[22] X. Chen, S. Xiang, C.-L. Liu, and C.-H. Pan, "Vehicle detection in satellite images by hybrid deep convolutional neural networks," IEEE Geoscience and Remote Sensing Letters, vol. 11, no. 10, pp. 1797–1801, 2014.

[23] G. Cheng, J. Han, and X. Lu, "Remote sensing image scene classification: benchmark and state of the art," Proceedings of the IEEE, vol. 99, pp. 1–19, 2017.

[24] S. Paisitkriangkrai, J. Sherrah, P. Janney, and A. Van-Den Hengel, "Effective semantic pixel labelling with convolutional networks and conditional random fields," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW '15), pp. 36–43, June 2015.

[25] O. A. B. Penatti, K. Nogueira, and J. A. Dos Santos, "Do deep features generalize from everyday objects to remote sensing and aerial scenes domains?" in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW '15), pp. 44–51, June 2015.

[26] X. Yao, J. Han, G. Cheng, X. Qian, and L. Guo, "Semantic annotation of high-resolution satellite images via weakly supervised learning," IEEE Transactions on Geoscience and Remote Sensing, vol. 54, no. 6, pp. 3660–3671, 2016.

[27] C. Tao, Y. Tan, H. Cai, and J. Tian, "Airport detection from large IKONOS images using clustered SIFT keypoints and region information," IEEE Geoscience and Remote Sensing Letters, vol. 8, no. 1, pp. 128–132, 2011.

[28] O. Aytekin, U. Zongur, and U. Halici, "Texture-based airport runway detection," IEEE Geoscience and Remote Sensing Letters, vol. 10, no. 3, pp. 471–475, 2013.

[29] X. Yao, J. Han, L. Guo, S. Bu, and Z. Liu, "A coarse-to-fine model for airport detection from remote sensing images using target-oriented visual saliency and CRF," Neurocomputing, vol. 164, pp. 162–172, 2015.

[30] G. Cheng, P. Zhou, and J. Han, "Learning rotation-invariant convolutional neural networks for object detection in VHR optical remote sensing images," IEEE Transactions on Geoscience and Remote Sensing, vol. 54, no. 12, pp. 7405–7415, 2016.

[31] X. Yao, J. Han, D. Zhang, and F. Nie, "Revisiting co-saliency detection: a novel approach based on two-stage multi-view spectral rotation co-clustering," IEEE Transactions on Image Processing, vol. 26, no. 7, pp. 3196–3209, 2017.

[32] L. Cao, C. Wang, and J. Li, "Vehicle detection from highway satellite images via transfer learning," Information Sciences, vol. 366, pp. 177–187, 2016.

[33] S. Gupta, S. Rana, B. Saha, D. Phung, and S. Venkatesh, "A new transfer learning framework with application to model-agnostic multi-task learning," Knowledge and Information Systems, vol. 49, no. 3, pp. 933–973, 2016.

[34] A. Van Opbroek, M. A. Ikram, M. W. Vernooij, and M. De Bruijne, "Transfer learning improves supervised image segmentation across imaging protocols," IEEE Transactions on Medical Imaging, vol. 34, no. 5, pp. 1018–1030, 2015.

[35] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," Computer Science, 2014.

[36] W. Zhao, Z. Guo, J. Yue, X. Zhang, and L. Luo, "On combining multiscale deep learning features for the classification of hyperspectral remote sensing imagery," International Journal of Remote Sensing, vol. 36, no. 13, pp. 3368–3379, 2015.

[37] Q. Zou, L. Ni, T. Zhang, and Q. Wang, "Deep learning based feature selection for remote sensing scene classification," IEEE Geoscience and Remote Sensing Letters, vol. 12, no. 11, pp. 2321–2325, 2015.

[38] Y. Yang and S. Newsam, "Bag-of-visual-words and spatial extensions for land-use classification," in Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems (GIS '10), pp. 270–279, ACM, San Jose, Calif, USA, November 2010.

[39] X. Wang, Q. Lv, B. Wang, and L. Zhang, "Airport detection in remote sensing images: a method based on saliency map," Cognitive Neurodynamics, vol. 7, no. 2, pp. 143–154, 2013.

[40] Y. Jia, E. Shelhamer, J. Donahue et al., "Caffe: convolutional architecture for fast feature embedding," in Proceedings of the ACM International Conference on Multimedia, pp. 675–678, ACM, Orlando, Fla, USA, November 2014.




Table 3: Large-area image aircraft detection results.

Test image                     True aircraft   Detected aircraft   Missing detection   False detection
Berlin Tegel Airport           35              35                  0                   2
Sydney International Airport   78              76                  2                   5
Tokyo Haneda Airport           94              91                  3                   4

Table 4: Detailed evaluation criteria and comparison for the four detection results.

Test image                     Method     FDR (%)   MR (%)   AC (%)   ER (%)
Berlin Tegel Airport           CCNN       5.71      0        100      5.71
                               LOGNet-C   7.69      3.23     96.77    10.92
Sydney International Airport   CCNN       6.58      2.56     97.44    9.14
                               LOGNet-C   48.54     10.87    89.13    59.41
Tokyo Haneda Airport           CCNN       4.39      3.19     96.81    7.58
                               LOGNet-C   20.80     1.54     98.46    22.34

realize the task of training a high-accuracy target detection model with fewer samples.

5.2. Aircraft Detection. The three large-area test images were used to verify the aircraft detection accuracy. The results are shown in Figure 9. Based on the global and local details in the figure, the overall detection result is good. Most of the aircraft in the airport areas were detected. The detection-missed aircraft were small aircraft; there was no missed detection of large aircraft. Moreover, there were no falsely detected aircraft in industrial areas, residential areas, woodlands, grasslands, or other areas beyond the airports. However, there were individual falsely detected targets within the airports, primarily airport trestles and airport terminals. The quantitative indicators are shown in Table 3, in which "True" indicates the number of real aircraft in the original image, "Detected" indicates the number of aircraft detected, "Missing" is the number of detection-missed aircraft, and "False" is the number of falsely detected aircraft.

The two-level CCNN aircraft detection structure could rapidly process large-area remote sensing images with high accuracy; however, commission and omission errors remained. In a complex environment, many objects have features similar to those of aircraft, especially long-shaped buildings and facilities. As listed in Table 3, the false detection rate of each airport is higher than the missing detection rate. Through analysis, we determined that the detection-missed aircraft were not actually lost. In the detection procedure, the classification threshold was set to 0.6. When the classification score of a target was less than 0.6, the target was not marked; the missing detection then appeared. When the threshold was lowered to 0.5, the detection-missed targets were marked and the detection miss rate was zero. Nevertheless, there were more falsely detected targets: airport terminals and several buildings had features similar to those of aircraft, and these were considered aircraft and marked. When the threshold was raised, the false detection rate decreased while the detection miss rate increased; when the threshold was lowered, the false detection rate increased while the detection miss rate decreased. According to the needs of the detection task, it is a better choice to set a suitable threshold and balance the false detection rate against the detection miss rate.

The experimental results in Table 3 show that the proposed CCNN method is feasible. A high-accuracy aircraft detection model can be obtained by using the transfer-learning method to fine-tune a pretrained model. The GFC method greatly reduces the number of region proposals, directly improving the detection efficiency. The two-level cascade network architecture can detect aircraft in large-area remote sensing images with high accuracy and high efficiency. However, some shortcomings remain. A detailed analysis of transfer-learning and detection accuracy is outlined below.
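The geometric feature constraint idea can be illustrated with a small filter over region proposals. The sketch below is a hedged approximation: the specific bounds (`min_side`, `max_side`, `max_aspect`) are invented placeholders, not the paper's actual GFC parameters, but they capture the principle that at a known ground resolution an aircraft's bounding box has bounded side lengths and a moderate aspect ratio.

```python
# Hedged sketch of a geometric-feature-constraint (GFC) style filter for
# region proposals. The numeric bounds are illustrative placeholders only.

def gfc_filter(proposals, min_side=8.0, max_side=120.0, max_aspect=3.0):
    """Keep (x, y, w, h) proposals whose geometry is plausible for aircraft."""
    kept = []
    for x, y, w, h in proposals:
        if w <= 0 or h <= 0:
            continue  # degenerate box
        aspect = max(w, h) / min(w, h)
        if min_side <= w <= max_side and min_side <= h <= max_side and aspect <= max_aspect:
            kept.append((x, y, w, h))
    return kept

proposals = [(0, 0, 40, 50),     # plausible aircraft-sized box: kept
             (10, 10, 300, 20),  # long, thin building-like box: rejected
             (5, 5, 2, 2)]       # far too small: rejected
kept = gfc_filter(proposals)
```

Discarding implausible boxes before the detector scores them is what reduces the region-proposal count and improves efficiency.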

The aircraft detection method performance was evaluated by four criteria: falsely detected ratio (FDR), miss ratio (MR), accuracy (AC), and error ratio (ER). Their meanings are given in (6). A remote sensing image aircraft detection method named LOGNet-C, based on weakly supervised learning, was proposed in article [20]. The results of the CCNN method were compared with those of the LOGNet-C method. The four evaluation indicators and comparison results are shown in Table 4.

FDR = (Number of falsely detected aircraft / Number of detected aircraft) × 100%,
MR = (Number of missing aircraft / Number of aircraft) × 100%,
AC = (Number of detected aircraft / Number of aircraft) × 100%,
ER = FDR + MR.                                                               (6)
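The criteria in (6) follow directly from the Table 3 counts; computing them reproduces the CCNN rows of Table 4 (values in percent, to rounding). This is a straightforward transcription of the formulas, not code from the paper.

```python
# Evaluation criteria of (6), computed from the Table 3 counts.

def criteria(true_n, detected, missing, false_n):
    """Return (FDR, MR, AC, ER) in percent."""
    fdr = 100.0 * false_n / detected   # falsely detected ratio
    mr = 100.0 * missing / true_n      # miss ratio
    ac = 100.0 * detected / true_n     # accuracy
    return fdr, mr, ac, fdr + mr       # ER = FDR + MR

# (true aircraft, detected aircraft, missing detection, false detection)
table3 = {"Berlin Tegel Airport": (35, 35, 0, 2),
          "Sydney International Airport": (78, 76, 2, 5),
          "Tokyo Haneda Airport": (94, 91, 3, 4)}

results = {name: criteria(*counts) for name, counts in table3.items()}
```

For example, Sydney gives FDR = 5/76 × 100% ≈ 6.58% and MR = 2/78 × 100% ≈ 2.56%, matching Table 4.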

The comparison results show that the false detection rate of the CCNN method is reduced by an average of 6.4% compared with the LOGNet-C method. The missing detection rate decreases by an average of 2.31%. Overall, the detection accuracy increases by an average of 3.66%.

The test results showed that small aircraft are more prone to missed detection, while long-shaped buildings and facilities are more prone to false detection, as shown in Figure 10. This phenomenon is related to the choice of training



Figure 7: Test remote sensing images for aircraft detection. (a) Berlin Tegel Airport. (b) Sydney International Airport. (c) Tokyo Haneda Airport.

samples. The small aircraft in the sample are only a minority; in the transfer-learning stage, the detection model did not learn enough features of small aircraft. Long-shaped buildings were detected as aircraft. The first reason is that these buildings had contour features similar to those of aircraft. The second reason is that there were an inadequate number of similar buildings in the negative sample; consequently, the features contained in the negative sample were insufficient. When



Figure 8: Accuracy and loss changes of the two models during the fine-tuning process of image classification and object detection. (a) Accuracy of the image classification model. (b) Total loss of the object detection model, which was the weighted sum of loss_cls and loss_bbox. (c) Location loss of the object detection model. (d) Classification loss of the object detection model.

increasing the number of training samples of these types and retraining the model, the target detection accuracy was increased. The analysis results show that, when producing the training sample dataset, it should not only contain all types of targets, but the number of each target type should also be balanced. The negative samples should contain as much background information as possible, especially for objects that have a texture, color, geometry, or other features similar to those of aircraft.
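The balancing recommendation above can be sketched as simple oversampling of minority target types. This is an illustrative sketch only; the class names and file names are hypothetical, and the paper does not specify how its dataset was rebalanced.

```python
# Illustrative balancing of a training list by oversampling minority
# target types until each type matches the largest type's count.
import random

def balance_by_type(samples):
    """samples: list of (image_path, target_type) pairs."""
    by_type = {}
    for path, t in samples:
        by_type.setdefault(t, []).append((path, t))
    target = max(len(group) for group in by_type.values())
    balanced = []
    for group in by_type.values():
        balanced.extend(group)                                 # originals
        balanced.extend(random.choices(group, k=target - len(group)))  # duplicates
    return balanced

# Hypothetical imbalanced set: 90 large aircraft vs. 10 small aircraft
samples = [("a1.jpg", "large_aircraft")] * 90 + [("s1.jpg", "small_aircraft")] * 10
balanced = balance_by_type(samples)
```

After balancing, each type contributes equally to training, which is one way to avoid the under-learning of small-aircraft features discussed above.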

All false detection occurred within the airports. Building-intensive areas were prone to false detection; however, false detections did not appear there. This was on account of the contribution of the first-level cascade network, which filtered out nontarget areas and did not feed these areas into the aircraft detection network. Missed detection primarily occurred in the airport interior, and no aircraft were filtered out by the first-level network. The main cause of the missed detection was that the classification score was not high and the network did not extract an adequate number of features.

6. Conclusions

In the framework of deep learning, a rapid large-area remote sensing image aircraft detection method, CCNN, was proposed. Using the transfer-learning method, the parameters of an existing target detection model were fine-tuned with a small number of samples, and a high-accuracy aircraft detection model was obtained. Based on the geometric feature invariance of aircraft in remote sensing images, a region proposal processing algorithm, GFC, was proposed to improve the efficiency. Based on the above methods, a cascade network architecture was designed for aircraft detection in large-area remote sensing images. This method not only can directly handle large-area remote sensing images for



Figure 9: Aircraft detection results of the three test images, including the global images and locally enlarged images. Detection results of (a) Berlin Tegel Airport, (b) Sydney International Airport, and (c) Tokyo Haneda Airport.


Figure 10: Commission and omission errors. The purple rectangles are correctly detected aircraft, the green rectangles are falsely detected aircraft, and aircraft without a rectangle are missed detections. (a) and (b) show falsely detected aircraft, predominantly long-shaped buildings and facilities; (c) and (d) show detection-missed aircraft, predominantly small aircraft.


aircraft detection, but it also overcomes the difficulty of training a high-accuracy model with a small number of samples.
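The two-level cascade described above can be sketched as a short pipeline. Here `classify_block` and `detect_aircraft` are placeholder callables standing in for the first-level image classifier and the second-level detector; they are not the paper's networks, and the toy stand-ins below exist only to exercise the control flow.

```python
# Minimal sketch of the two-level CCNN workflow: level 1 discards image
# blocks with no aircraft; level 2 runs the detector on surviving blocks
# and keeps detections above the score threshold.

def cascade_detect(image_blocks, classify_block, detect_aircraft, score_threshold=0.6):
    detections = []
    for block in image_blocks:
        if not classify_block(block):     # level 1: cheap whole-block filter
            continue                      # most of a large scene stops here
        for box, score in detect_aircraft(block):  # level 2: detector
            if score >= score_threshold:
                detections.append((block, box, score))
    return detections

# Toy stand-ins for a quick check
blocks = ["airport_block", "forest_block", "city_block"]
is_airport = lambda b: b == "airport_block"
fake_detector = lambda b: [((0, 0, 40, 40), 0.9), ((50, 50, 30, 30), 0.4)]
result = cascade_detect(blocks, is_airport, fake_detector)
```

Because only blocks passing level 1 reach the detector, nontarget areas such as woodland or residential blocks never produce false aircraft detections, consistent with the analysis above.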

The experimental results showed that the detection accuracy increased by an average of 3.66%, the false detection rate decreased by an average of 6.4%, and the missed detection rate decreased by an average of 2.31%. However, some of the algorithms warrant improvement. The target detection threshold is set empirically and has some limitations. In future work, we intend to address the needs of the detection task by making the system set the threshold value automatically. In the present work, we focused on aircraft detection. If the target is extended to other typical targets, such as ships, tanks, and vehicles, then this method will have a larger application scope.

The failures in target detection in this experiment mayhave occurred for the following reasons First the numberof small aircraft in our sample is insufficient and the samplenumber of various types of aircraft is not balanced whichhas resulted in inadequate learning of the features of smallaircraft during training Second the choice of target thresholdmust be appropriateWhen detection is nearly complete eachregion proposal is scored When this score is greater thanthe threshold the target is marked and displayed howeverwhen the score is less than the threshold the target is notmarked and is not displayed but is instead hidden as back-ground There is a close relationship between the thresholdvalue and detection accuracy especially false positives andnegatives When the threshold value is too small false posi-tives decreaseHowever when the threshold value is too largefalse negatives increase Currently we set our threshold valueempirically and plan to adopt an adaptive threshold methodin the future

Conflicts of Interest

The authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

The authors would like to thank Dr Changjian Qiao for hisexcellent technical supportWithout his instructive guidanceimpressive kindness and patience they could not have com-pleted this thesis They would also like to thank Dr HuijinYang and Yuan Zhao for critically reviewing the manuscript

References

[1] G Cheng and J Han ldquoA survey on object detection in opticalremote sensing imagesrdquo ISPRS Journal of Photogrammetry andRemote Sensing vol 117 pp 11ndash28 2016

[2] Y Li X Sun H Wang H Sun and X Li ldquoAutomatic targetdetection in high-resolution remote sensing images using acontour-based spatial modelrdquo IEEE Geoscience and RemoteSensing Letters vol 9 no 5 pp 886ndash890 2012

[3] X Bai H Zhang and J Zhou ldquoVHR object detection basedon structural feature extraction and query expansionrdquo IEEE

Transactions on Geoscience and Remote Sensing vol 52 no 10pp 6508ndash6520 2014

[4] L Gao B Yang Q Du and B Zhang ldquoAdjusted spectralmatched filter for target detection in hyperspectral imageryrdquoRemote Sensing vol 7 no 6 pp 6611ndash6634 2015

[5] E Fakiris G Papatheodorou M Geraga and G FerentinosldquoAn automatic target detection algorithm for swath Sonarbackscatter imagery using image texture and independentcomponent analysisrdquo Remote Sensing vol 8 no 5 article no373 2016

[6] W Zhang X Sun K Fu C Wang and H Wang ldquoObjectdetection in high-resolution remote sensing images using rota-tion invariant parts based modelrdquo IEEE Geoscience and RemoteSensing Letters vol 11 no 1 pp 74ndash78 2014

[7] H Sun X Sun H Wang Y Li and X Li ldquoAutomatic targetdetection in high-resolution remote sensing images using spa-tial sparse coding bag-of-words modelrdquo IEEE Geoscience andRemote Sensing Letters vol 9 no 1 pp 109ndash113 2012

[8] X Sun H Wang and K Fu ldquoAutomatic detection of geospa-tial objects using taxonomic semanticsrdquo IEEE Geoscience andRemote Sensing Letters vol 7 no 1 pp 23ndash27 2010

[9] R M Willett M F Duarte M A Davenport and R G Bara-niuk ldquoSparsity and structure in hyperspectral imaging sensingreconstruction and target detectionrdquo IEEE Signal ProcessingMagazine vol 31 no 1 pp 116ndash126 2014

[10] F Gao Q Xu and B Li ldquoAircraft detection from VHR imagesbased on circle-frequency filter and multilevel featuresrdquo TheScientific World Journal vol 2013 Article ID 917928 7 pages2013

[11] I Sevo andAAvramovic ldquoConvolutional neural network basedautomatic object detection on aerial imagesrdquo IEEE Geoscienceand Remote Sensing Letters vol 13 no 5 pp 740ndash744 2016

[12] L Zhang G-S Xia TWu L Lin and X C Tai ldquoDeep learningfor remote sensing image understandingrdquo Journal of Sensorsvol 2016 Article ID 7954154 2 pages 2016

[13] R Girshick J Donahue T Darrell and J Malik ldquoRegion-based convolutional networks for accurate object detectionand segmentationrdquo IEEE Transactions on Pattern Analysis andMachine Intelligence vol 38 no 1 pp 142ndash158 2016

[14] E Pasolli F Melgani N Alajlan and N Conci ldquoOpticalimage classification a ground-truth design frameworkrdquo IEEETransactions on Geoscience and Remote Sensing vol 51 no 6pp 3580ndash3597 2013

[15] R Girshick ldquoFast R-CNNrdquo in Proceedings of the 15th IEEEInternational Conference on Computer Vision (ICCV rsquo15) pp1440ndash1448 Santiago Chile December 2015

[16] S Ren K He R Girshick and J Sun ldquoFaster R-CNN TowardsReal-Time Object Detection with Region Proposal NetworksrdquoIEEE Transactions on Pattern Analysis andMachine Intelligencevol 39 no 6 pp 1137ndash1149 2017

[17] W Liao A Pizurica P Scheunders W Philips and Y PildquoSemisupervised local discriminant analysis for feature extrac-tion in hyperspectral imagesrdquo IEEE Transactions on Geoscienceand Remote Sensing vol 51 no 1 pp 184ndash198 2013

[18] D Zhang J Han G Cheng Z Liu S Bu and L Guo ldquoWeaklysupervised learning for target detection in remote sensingimagesrdquo IEEEGeoscience and Remote Sensing Letters vol 12 no4 pp 701ndash705 2014

[19] T Deselaers B Alexe and V Ferrari ldquoWeakly supervised local-ization and learning with generic knowledgerdquo InternationalJournal of Computer Vision vol 100 no 3 pp 275ndash293 2012

14 Journal of Sensors

[20] F Zhang B Du L Zhang and M Xu ldquoWeakly supervisedlearning based on coupled convolutional neural networks foraircraft detectionrdquo IEEE Transactions on Geoscience and RemoteSensing vol 54 no 9 pp 5553ndash5563 2016

[21] J Han D Zhang G Cheng L Guo and J Ren ldquoObjectdetection in optical remote sensing images based on weaklysupervised learning and high-level feature learningrdquo IEEETransactions on Geoscience and Remote Sensing vol 53 no 6pp 3325ndash3337 2015

[22] X Chen S Xiang C-L Liu and C-H Pan ldquoVehicle detectionin satellite images by hybrid deep convolutional neural net-worksrdquo IEEE Geoscience and Remote Sensing Letters vol 11 no10 pp 1797ndash1801 2014

[23] G Cheng J Han and X Lu ldquoRemote sensing image sceneclassification benchmark and state of the artrdquo Proceedings of theIEEE vol 99 pp 1ndash19 2017

[24] S Paisitkriangkrai J Sherrah P Janney and A Van-DenHengel ldquoEffective semantic pixel labelling with convolutionalnetworks and conditional random fieldsrdquo in Proceedings of theIEEE Conference on Computer Vision and Pattern RecognitionWorkshops (CVPRW rsquo15) pp 36ndash43 June 2015

[25] O A B Penatti K Nogueira and J A Dos Santos ldquoDo deepfeatures generalize from everyday objects to remote sensingand aerial scenes domainsrdquo in Proceedings of the IEEE Con-ference on Computer Vision and Pattern Recognition Workshops(CVPRW rsquo15) pp 44ndash51 June 2015

[26] X Yao J Han G Cheng X Qian and L Guo ldquoSemantic anno-tation of high-resolution satellite images via weakly supervisedlearningrdquo IEEE Transactions on Geoscience and Remote Sensingvol 54 no 6 pp 3660ndash3671 2016

[27] C Tao Y Tan H Cai and J Tian ldquoAirport detection fromlarge IKONOS images using clustered sift keypoints and regioninformationrdquo IEEE Geoscience and Remote Sensing Letters vol8 no 1 pp 128ndash132 2011

[28] O Aytekin U Zongur and U Halici ldquoTexture-based airportrunway detectionrdquo IEEE Geoscience and Remote Sensing Lettersvol 10 no 3 pp 471ndash475 2013

[29] X Yao J Han L Guo S Bu and Z Liu ldquoA coarse-to-finemodelfor airport detection from remote sensing images using target-oriented visual saliency and CRFrdquoNeurocomputing vol 164 pp162ndash172 2015

[30] G Cheng P Zhou and J Han ldquoLearning rotation-invariantconvolutional neural networks for object detection in VHRoptical remote sensing imagesrdquo IEEE Transactions on Geo-science and Remote Sensing vol 54 no 12 pp 7405ndash7415 2016

[31] X Yao J Han D Zhang and F Nie ldquoRevisiting co-saliencydetection a novel approach based on two-stage multi-viewspectral rotation co-clusteringrdquo IEEE Transactions on ImageProcessing vol 26 no 7 pp 3196ndash3209 2017

[32] L Cao C Wang and J Li ldquoVehicle detection from highwaysatellite images via transfer learningrdquo Information Sciences vol366 pp 177ndash187 2016

[33] S Gupta S Rana B Saha D Phung and S Venkatesh ldquoA newtransfer learning frameworkwith application tomodel-agnosticmulti-task learningrdquo Knowledge and Information Systems vol49 no 3 pp 933ndash973 2016

[34] A Van Opbroek M A Ikram M W Vernooij and M DeBruijne ldquoTransfer learning improves supervised image segmen-tation across imaging protocolsrdquo IEEE Transactions on MedicalImaging vol 34 no 5 pp 1018ndash1030 2015

[35] K Simonyan and A Zisserman ldquoVery deep convolutionalnetworks for large-scale image recognitionrdquo Computer Science2014

[36] W Zhao Z Guo J Yue X Zhang and L Luo ldquoOn combin-ing multiscale deep learning features for the classification ofhyperspectral remote sensing imageryrdquo International Journal ofRemote Sensing vol 36 no 13 pp 3368ndash3379 2015

[37] Q Zou L Ni T Zhang and Q Wang ldquoDeep learning basedfeature selection for remote sensing scene classificationrdquo IEEEGeoscience and Remote Sensing Letters vol 12 no 11 pp 2321ndash2325 2015

[38] Y Yang and S Newsam ldquoBag-of-visual-words and spatialextensions for land-use classificationrdquo in Proceedings of the 18thSIGSPATIAL International Conference on Advances in Geo-graphic Information Systems (GIS rsquo10) pp 270ndash279 ACM SanJose Calif USA November 2010

[39] X Wang Q Lv B Wang and L Zhang ldquoAirport detectionin remote sensing images a method based on saliency maprdquoCognitive Neurodynamics vol 7 no 2 pp 143ndash154 2013

[40] Y Jia E Shelhamer J Donahue et al ldquoCaffe convolutionalarchitecture for fast feature embeddingrdquo in Proceedings of theACM International Conference on Multimedia pp 675ndash678ACM Orlando Fla USA November 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal of

Volume 201

Submit your manuscripts athttpswwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 201

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of

Page 10: Cascade Convolutional Neural Network Based on Transfer ...downloads.hindawi.com/journals/js/2017/1796728.pdf · ResearchArticle Cascade Convolutional Neural Network Based on Transfer-Learning

10 Journal of Sensors

0 1 2 305(km)

13∘15

㰀30

㰀㰀E 13∘16

㰀30

㰀㰀E 13∘17

㰀30

㰀㰀E 13∘18

㰀30

㰀㰀E

13∘15

㰀30

㰀㰀E 13∘16

㰀30

㰀㰀E 13∘17

㰀30

㰀㰀E 13∘18

㰀30

㰀㰀E

52∘33

㰀30

㰀㰀N 52∘33

㰀30

㰀㰀N

52∘33

㰀0㰀㰀N

N

(a)

0 1 2 305

151∘10

㰀0㰀㰀E 151

∘11

㰀0㰀㰀E

151∘10

㰀0㰀㰀E 151

∘11

㰀0㰀㰀E

33∘56

㰀0㰀㰀S

33∘57

㰀0㰀㰀S

33∘58

㰀0㰀㰀S

33∘56

㰀0㰀㰀S

33∘57

㰀0㰀㰀S

33∘58

㰀0㰀㰀S

(km)

N

(b)

0 1 2 305

35∘34

㰀0㰀㰀N

35∘33

㰀0㰀㰀N

35∘32

㰀0㰀㰀N

35∘34

㰀0㰀㰀N

35∘33

㰀0㰀㰀N

35∘32

㰀0㰀㰀N

139∘46

㰀0㰀㰀E 139

∘47

㰀0㰀㰀E 139

∘48

㰀0㰀㰀E

139∘46

㰀0㰀㰀E 139

∘47

㰀0㰀㰀E 139

∘48

㰀0㰀㰀E

N

(km)

(c)

Figure 7 Test remote sensing images for aircraft detection (a) Berlin Tegel Airport (b) Sydney International Airport (c) Tokyo HanedaAirport

samples. Small aircraft constituted only a minority of the sample set, so in the transfer-learning stage the detection model did not learn enough features of small aircraft. Long-shaped buildings were detected as aircraft for two reasons: first, these buildings have contour features similar to those of aircraft; second, the negative samples contained too few similar buildings, so the features captured by the negative samples were insufficient. When



Figure 8: Accuracy and loss changes of the two models during the fine-tuning process of image classification and object detection. (a) Accuracy of the image classification model. (b) Total loss of the object detection model, which was the weighted sum of loss_cls and loss_bbox. (c) Location loss of the object detection model. (d) Classification loss of the object detection model.

increasing the number of training samples of these types and retraining the model, the target detection accuracy increased. The analysis shows that a training sample dataset should not only contain all target types but also balance the number of samples of each type. The negative samples should contain as much background information as possible, especially objects whose texture, color, geometry, or other features resemble those of aircraft.
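The balancing advice above can be sketched as a simple oversampling step when building the training set. The record layout and the `type` key are illustrative assumptions of this sketch, not the paper's actual annotation format:

```python
import random
from collections import Counter

def balance_by_oversampling(samples, key, seed=0):
    """Oversample minority classes so every class matches the largest one.

    `samples` is a list of annotation records; `key` extracts the class
    label (e.g. an aircraft-type string). Returns a new shuffled list in
    which every class has as many records as the most frequent class.
    """
    rng = random.Random(seed)
    counts = Counter(key(s) for s in samples)
    target = max(counts.values())
    by_class = {}
    for s in samples:
        by_class.setdefault(key(s), []).append(s)
    balanced = []
    for cls, items in by_class.items():
        # Repeat the whole class, then top up with random draws.
        reps, extra = divmod(target, len(items))
        balanced.extend(items * reps)
        balanced.extend(rng.sample(items, extra))
    rng.shuffle(balanced)
    return balanced
```

The same idea can instead be realized with per-class loss weights; oversampling is shown here only because it needs no change to the training code.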

All false detections occurred inside the airport. Building-intensive areas would normally be prone to false detections, yet none appeared there. This was due to the contribution of the first-level cascade network, which filtered out nontarget areas and never fed them into the aircraft detection network. Missed detections primarily occurred in the airport interior; no aircraft were filtered out by the first-level network. The main cause of the missed detections was that the classification score was not high enough, because the network did not extract an adequate number of features.
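The first-level filtering described above can be sketched as a two-stage loop over image tiles. The `classify_tile` and `detect_aircraft` callables stand in for the paper's CNN image classifier and its detection network; their signatures are assumptions of this sketch:

```python
def cascade_detect(image, tiles, classify_tile, detect_aircraft,
                   cls_threshold=0.5):
    """Two-level cascade over a large image split into rectangular tiles.

    `classify_tile(tile) -> float` is the first-level image classifier
    (probability that the tile contains aircraft); `detect_aircraft(tile)
    -> list of (x0, y0, x1, y1, score)` is the second-level detector.
    Tiles scoring below `cls_threshold` never reach the detector, which
    is where the cascade saves most of its runtime on large scenes.
    """
    detections = []
    for (x0, y0, x1, y1) in tiles:
        tile = [row[x0:x1] for row in image[y0:y1]]
        if classify_tile(tile) < cls_threshold:
            continue  # background tile: skip the expensive detector
        for (bx0, by0, bx1, by1, score) in detect_aircraft(tile):
            # Shift tile-local boxes back to whole-image coordinates.
            detections.append((bx0 + x0, by0 + y0, bx1 + x0, by1 + y0, score))
    return detections
```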

6. Conclusions

Within the framework of deep learning, a rapid CCNN aircraft detection method for large-area remote sensing images was proposed. Using transfer-learning, the parameters of an existing target detection model were fine-tuned with a small number of samples, yielding a high-accuracy aircraft detection model. Based on the geometric feature invariance of aircraft in remote sensing images, a region proposal processing algorithm, GFC, was proposed to improve efficiency. On top of these methods, a cascade network architecture was designed for aircraft detection in large-area remote sensing images. This method not only can directly handle large-area remote sensing images for


Figure 9: Aircraft detection results for the three test images, including the global images and locally enlarged images. Detection results for (a) Berlin Tegel Airport, (b) Sydney International Airport, and (c) Tokyo Haneda Airport.


Figure 10: Commission and omission errors. The purple rectangles are correctly detected aircraft, the green rectangles are falsely detected aircraft, and aircraft without a rectangle are missed aircraft. (a) and (b) show falsely detected aircraft, predominantly long-shaped buildings and facilities; (c) and (d) show missed aircraft, predominantly small aircraft.


aircraft detection, but it also overcomes the difficulty of training a high-accuracy model with a small number of samples.
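A minimal sketch of a GFC-style proposal filter follows, exploiting the size invariance mentioned above: at a known ground sample distance, the metric extent and aspect ratio of a real aircraft are bounded. The specific bounds used here are illustrative assumptions, not the paper's values:

```python
def gfc_filter(proposals, gsd_m, min_len_m=8.0, max_len_m=85.0,
               max_aspect=4.0):
    """Geometric-feature-constraint (GFC) style proposal filter (sketch).

    Real-world aircraft sizes are roughly invariant, so at a known
    ground sample distance `gsd_m` (metres/pixel) a region proposal
    whose metric extent or aspect ratio falls outside plausible aircraft
    bounds can be discarded before the detector ever scores it.
    """
    kept = []
    for (x0, y0, x1, y1) in proposals:
        w_m = (x1 - x0) * gsd_m
        h_m = (y1 - y0) * gsd_m
        if not (min_len_m <= max(w_m, h_m) <= max_len_m):
            continue  # too small or too large to be an aircraft at this GSD
        if max(w_m, h_m) / max(min(w_m, h_m), 1e-6) > max_aspect:
            continue  # long, thin proposals (runways, elongated buildings)
        kept.append((x0, y0, x1, y1))
    return kept
```

Because the filter is a handful of comparisons per proposal, it is far cheaper than running each proposal through the detection head, which is where the efficiency gain comes from.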

The experimental results showed that the detection accuracy increased by an average of 3.66%, the false detection rate decreased by an average of 6.4%, and the missed detection rate decreased by an average of 2.31%. However, some of the algorithms warrant improvement. The target detection threshold is set empirically and has limitations. In future work, we intend to make the system set the threshold automatically according to the needs of the detection task. In the present work, we focused on aircraft detection; if the method is extended to other typical targets such as ships, tanks, and vehicles, it will have a much larger application scope.

The failures in target detection in this experiment may have occurred for the following reasons. First, the number of small aircraft in our sample set is insufficient, and the sample numbers of the various aircraft types are not balanced, which resulted in inadequate learning of small-aircraft features during training. Second, the choice of detection threshold must be appropriate. When detection is nearly complete, each region proposal is scored; when the score is greater than the threshold, the target is marked and displayed, and when it is less than the threshold, the target is not marked but is instead hidden as background. The threshold value is closely related to detection accuracy, especially to false positives and false negatives: when the threshold is too small, false positives increase, and when it is too large, false negatives increase. Currently, we set the threshold empirically and plan to adopt an adaptive threshold method in the future.
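One simple adaptive alternative to an empirical threshold is to sweep the candidate thresholds on a labeled validation set and keep the one that maximizes F1. This is a sketch of that idea under the stated assumption, not the authors' method:

```python
def choose_threshold(scores, labels):
    """Pick a detection-score threshold by maximizing F1 on validation data.

    `scores` are detector confidences and `labels` are 1 for a true
    aircraft and 0 for background. Lowering the threshold trades missed
    detections (false negatives) for false alarms (false positives), so
    we sweep every distinct score and keep the best compromise.
    """
    best_t, best_f1 = 0.5, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        f1 = 2 * tp / max(2 * tp + fp + fn, 1)  # F1 = 2TP/(2TP+FP+FN)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t
```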

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

The authors would like to thank Dr. Changjian Qiao for his excellent technical support; without his instructive guidance, impressive kindness, and patience, they could not have completed this work. They would also like to thank Dr. Huijin Yang and Yuan Zhao for critically reviewing the manuscript.



RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal of

Volume 201

Submit your manuscripts athttpswwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 201

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of

Page 12: Cascade Convolutional Neural Network Based on Transfer ...downloads.hindawi.com/journals/js/2017/1796728.pdf · ResearchArticle Cascade Convolutional Neural Network Based on Transfer-Learning

12 Journal of Sensors

(a)

(b)

(c)

Figure 9 Aircraft detection results of the three test images including the global images and local enlarged images Detection results of (a) BerlinTegel Airport (b) Sydney International Airport (c) Tokyo Haneda Airport

(a) (b) (c) (d)

Figure 10 Commission and omission errors The purple rectangle is the correctly detected aircraft the green rectangle is the falsely detectedaircraft and aircraftwithout a rectangle aremissed detected aircraft (a) and (b) are falsely detected aircraft that is predominantly long-shapedbuildings and facilities (c) and (d) are detection-missed aircraft predominantly small aircraft


aircraft detection, but it also overcomes the difficulty of training a high-accuracy model with a small number of samples.

The experimental results showed that the detection accuracy increased by an average of 3.66%, the false detection rate decreased by an average of 6.4%, and the missed detection rate decreased by an average of 2.31%. However, some aspects of the algorithm warrant improvement. The target detection threshold is set empirically and has some limitations; in future work, we intend to address the needs of the detection task by making the system set the threshold value automatically. In the present work, we focused on aircraft detection. If the target set is extended to other typical targets, such as ships, tanks, and vehicles, this method will have a broader scope of application.

The failures in target detection in this experiment may have occurred for the following reasons. First, the number of small aircraft in our sample set is insufficient, and the sample numbers of the various aircraft types are not balanced, which resulted in inadequate learning of small-aircraft features during training. Second, the choice of target threshold must be appropriate. When detection is nearly complete, each region proposal is scored. When the score is greater than the threshold, the target is marked and displayed; when the score is less than the threshold, the proposal is not marked and is instead hidden as background. There is a close relationship between the threshold value and detection accuracy, particularly the rates of false positives and false negatives: when the threshold value is too small, false positives increase, whereas when the threshold value is too large, false negatives increase. Currently, we set our threshold value empirically and plan to adopt an adaptive threshold method in the future.
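The scoring-and-threshold step described above can be sketched as follows. This is an illustrative example only, not the paper's implementation: the proposal format, the score values, and the quantile-based adaptive rule are all assumptions introduced here to make the trade-off concrete.

```python
# Illustrative sketch (not the paper's code): filtering scored region
# proposals with a fixed, empirically chosen threshold, plus one simple
# adaptive alternative. Box coordinates and scores are hypothetical.

def filter_proposals(proposals, threshold=0.8):
    """Keep only proposals whose classifier score exceeds the threshold.

    proposals: list of (box, score) pairs, where box is (x, y, w, h)
    and score is the detector's confidence in [0, 1].
    """
    return [(box, score) for box, score in proposals if score > threshold]

def adaptive_threshold(proposals, quantile=0.9):
    """One possible adaptive rule: take a high quantile of the observed
    score distribution instead of a hand-tuned constant. This rule is an
    assumption for illustration, not the method proposed in the paper."""
    scores = sorted(score for _, score in proposals)
    if not scores:
        return 0.0
    idx = min(int(quantile * len(scores)), len(scores) - 1)
    return scores[idx]

proposals = [((10, 10, 50, 40), 0.95), ((80, 30, 45, 38), 0.55),
             ((200, 120, 60, 50), 0.88), ((5, 300, 30, 25), 0.12)]

detections = filter_proposals(proposals, threshold=0.8)
# Raising the threshold keeps fewer proposals (fewer false positives,
# more missed detections); lowering it does the opposite.
```

The example makes the threshold trade-off explicit: any proposal hidden as background by a large threshold becomes a missed detection if it was a true aircraft, which is exactly why an adaptive rule is attractive for future work.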

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

The authors would like to thank Dr. Changjian Qiao for his excellent technical support; without his instructive guidance, impressive kindness, and patience, they could not have completed this work. They would also like to thank Dr. Huijin Yang and Yuan Zhao for critically reviewing the manuscript.


