pretraining convolutional neural networks for image-based...

11
Research Article Pretraining Convolutional Neural Networks for Image-Based Vehicle Classification Yunfei Han , 1,2,3 Tonghai Jiang , 1,2,3 Yupeng Ma, 1,2,3 and Chunxiang Xu 1,2,3 1 e Xinjiang Technical Institute of Physics & Chemistry, Urumqi 830011, China 2 Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi 830011, China 3 University of Chinese Academy of Sciences, Beijing 100049, China Correspondence should be addressed to Tonghai Jiang; [email protected] Received 18 May 2018; Accepted 13 September 2018; Published 2 October 2018 Guest Editor: Zhijun Fang Copyright © 2018 Yunfei Han et al. is is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Vehicle detection and classification are very important for analysis of vehicle behavior in intelligent transportation system, urban computing, etc. In this paper, an approach based on convolutional neural networks (CNNs) has been applied for vehicle classification. In order to achieve a more accurate classification, we removed the unrelated background as much as possible based on a trained object detection model. In addition, an unsupervised pretraining approach has been introduced to better initialize CNNs parameters to enhance the classification performance. rough the data enhancement on manual labeled images, we got 2000 labeled images in each category of motorcycle, transporter, passenger, and others, with 1400 samples for training and 600 samples for testing. en, we got 17395 unlabeled images for layer-wise unsupervised pretraining convolutional layers. A remarkable accuracy of 93.50% is obtained, demonstrating the high classification potential of our approach. 1. Introduction Vehicle is one of the greatest inventions in human history. e vehicle has become an indispensable part of modern people’s life. e use of a huge large number of vehicles can reflect the population’s mobility, intimacy, economic, and so on, and the analysis of vehicle behavior is very meaningful for urban development and government decision-making. In order to collect refueling vehicle information, such as license plate, picture, time, location, volume, type, and so on, we have deployed data collecting equipment in many of refueling stations in Xinjiang, which is mainly responsible for safety supervision and analysis of refueling behavior. Till now, many vehicle profile information such as vehicle color and vehicle type is entered into the system by hand; this is inefficient and not uniform. e accurate, various, and volume of data are the key to dig the value of refueling data. Hence, it has become a problem to be solved urgently that how to obtain the vehicle profile information through the vehicle picture automatically. In this paper, we focus on how to get the vehicle type from the picture. is problem is regarded as image classification, which means we should classify the images containing vehicles into the right type by image processing. Due to the environment in which images are taken is quite varied and complex and the impact of irrelevant background, the vehicles in images are very difficult to recognize. anks to the success of deep learning, we present a com- bination of approaches for vehicle detection and classification based on convolutional neural networks in this paper. To detect the vehicle in the image more efficiently, a successful object detection approach is used to detect the objects in an image, then the target vehicle waiting for entering refueling station is filtered out. Next, we designed a convolutional neural networks which contains 4 convolutional layers, 3 max pooling layers, and 2 full connected layers for vehicle classification. We trained our model on labeled vehicles images dataset. Comparing it with other five state-of-the- art approaches verified our approach achieves the highest accuracy than others. In order to pursue better classifica- tion performance, we taken advantage of unsupervised pre- training to better initialize classification model parameters under the circumstance of a shortage of labeled images. e unsupervised pretraining method was implemented based on deconvolution. Aſter pretraining, the convolutional layers Hindawi Advances in Multimedia Volume 2018, Article ID 3138278, 10 pages https://doi.org/10.1155/2018/3138278

Upload: others

Post on 29-May-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Pretraining Convolutional Neural Networks for Image-Based ...downloads.hindawi.com/journals/am/2018/3138278.pdf · AdvancesinMultimedia 0.9 0 25 50 75 100 125 150 175 200 Epochs(×10)

Research ArticlePretraining Convolutional Neural Networks for Image-BasedVehicle Classification

Yunfei Han 123 Tonghai Jiang 123 YupengMa123 and Chunxiang Xu123

1The Xinjiang Technical Institute of Physics amp Chemistry Urumqi 830011 China2Xinjiang Laboratory of Minority Speech and Language Information Processing Urumqi 830011 China3University of Chinese Academy of Sciences Beijing 100049 China

Correspondence should be addressed to Tonghai Jiang jthmsxjbaccn

Received 18 May 2018 Accepted 13 September 2018 Published 2 October 2018

Guest Editor Zhijun Fang

Copyright copy 2018 Yunfei Han et al This is an open access article distributed under the Creative Commons Attribution Licensewhich permits unrestricted use distribution and reproduction in any medium provided the original work is properly cited

Vehicle detection and classification are very important for analysis of vehicle behavior in intelligent transportation systemurban computing etc In this paper an approach based on convolutional neural networks (CNNs) has been applied for vehicleclassification In order to achieve a more accurate classification we removed the unrelated background as much as possible basedon a trained object detection model In addition an unsupervised pretraining approach has been introduced to better initializeCNNs parameters to enhance the classification performance Through the data enhancement on manual labeled images we got2000 labeled images in each category of motorcycle transporter passenger and others with 1400 samples for training and 600samples for testingThenwe got 17395 unlabeled images for layer-wise unsupervised pretraining convolutional layers A remarkableaccuracy of 9350 is obtained demonstrating the high classification potential of our approach

1 Introduction

Vehicle is one of the greatest inventions in human historyThe vehicle has become an indispensable part of modernpeoplersquos life The use of a huge large number of vehicles canreflect the populationrsquos mobility intimacy economic and soon and the analysis of vehicle behavior is very meaningfulfor urban development and government decision-making Inorder to collect refueling vehicle information such as licenseplate picture time location volume type and so on wehave deployed data collecting equipment inmany of refuelingstations in Xinjiang which is mainly responsible for safetysupervision and analysis of refueling behavior Till nowmanyvehicle profile information such as vehicle color and vehicletype is entered into the system by hand this is inefficientand not uniform The accurate various and volume of dataare the key to dig the value of refueling data Hence it hasbecome a problem to be solved urgently that how to obtainthe vehicle profile information through the vehicle pictureautomatically In this paper we focus onhow to get the vehicletype from the picture This problem is regarded as imageclassification which means we should classify the images

containing vehicles into the right type by image processingDue to the environment in which images are taken is quitevaried and complex and the impact of irrelevant backgroundthe vehicles in images are very difficult to recognize

Thanks to the success of deep learning we present a com-bination of approaches for vehicle detection and classificationbased on convolutional neural networks in this paper Todetect the vehicle in the image more efficiently a successfulobject detection approach is used to detect the objects in animage then the target vehicle waiting for entering refuelingstation is filtered out Next we designed a convolutionalneural networks which contains 4 convolutional layers 3max pooling layers and 2 full connected layers for vehicleclassification We trained our model on labeled vehiclesimages dataset Comparing it with other five state-of-the-art approaches verified our approach achieves the highestaccuracy than others In order to pursue better classifica-tion performance we taken advantage of unsupervised pre-training to better initialize classification model parametersunder the circumstance of a shortage of labeled images Theunsupervised pretraining method was implemented basedon deconvolution After pretraining the convolutional layers

HindawiAdvances in MultimediaVolume 2018 Article ID 3138278 10 pageshttpsdoiorg10115520183138278

2 Advances in Multimedia

were initialized by the pretrained parameters and trainedthe model on our labeled images data set thus we got abetter classification performance than the previous withoutpretraining

This paper is organized as follows The related works areintroduced in Section 2 Vehicle detection and classificationbased on CNNs and a pretraining approach are described inSection 3 In Section 4 the vehicles data set is presented andwe evaluated the presented approaches on our data and theexperimental results and a performance evaluation are givenFinally Section 5 concludes the paper

2 Related Works

Existing methods use various types of signal for vehicledetection and classification including acoustic signal [2ndash5] radar signal [6 7] ultrasonic signal [8] infrared ther-mal signal [9] magnetic signal [10] 3D lidar signal [11]and imagevideo signal [12ndash16] Furthermore some of themethods can be combined with a variety of signals such asradarampvision signal [7] and audioampvision signal [17] Usuallythe detection and classification performance is excellent inthesemethods because of the precise signal data but there aremany hardware devices involved in these methods resultingin larger deployment cost and even higher failure rate

The evolution of image processing techniques togetherwith wide deployment of surveillance cameras facilitatesimage-based vehicle detection and classification Variousapproaches to image-based vehicle detection and classifica-tion have been proposed over the last few years Kazemi etal [13] used 3 different kinds of feature extractors Fouriertransform Wavelet transform and Curvelet transform torecognize and classify 5 models of vehicles k-nearest neigh-bor is used as classifier They compare the 3 proposedapproaches and find that the Curvelet transform can extractbetter features Chen et al [18] presented a system for vehicledetection tracking and classification from roadside closedcircuit television (CCTV) First a Kalman filter trackeda vehicle to enable classification by majority voting overseveral consecutive frames then they trained a support vectormachine (SVM) using a combination of a vehicle silhouetteand intensity-based pyramid histogram of oriented gradient(HOG) features extracted following background subtractionclassifying foreground blobs with majority voting Wen et al[19] used Haar-like feature pool on a 32lowast32 grayscale imagepatch to represent a vehiclersquos appearance and then proposed arapid incremental learning algorithmofAdaBoost to improvethe performance of AdaBoost Arrospide and Salgado [16]analyzed the individual performance of popular techniquesfor vehicle verification and found that classifiers based onGabor and HOG features achieve the best results and outper-form principal component analysis (PCA) and other classi-fiers based on features as symmetry and gradient Mishra andBanerjee [20] detected vehicle using background extractedHaar pyramidal histogram of oriented gradients shape andscale-invariant feature transform features designed a mul-tiple kernel classifier based on k-nearest neighbor to dividethe vehicles into 4 categories Tourani and Shahbahrami [21]

combined different imagevideo processing methods includ-ing object detection edge detection frame differentiationand Kalman filter to propose a method which resulted inabout 95 percent accuracy for classification and about 4percent error in vehicle detection targets In these methodsthe classification results are very good however there arestill some problems First the image features are limitedby the hand-crafted features algorithms to represent richinformation Second the hand-crafted features algorithmsrequire a lot of calculation so they are not suitable forreal-time applications especially for embedding in front-endcamera devices Third most of them are used in fixed scenesand background environments it is difficult for them to dealwith complex environment

More recently deep learning has become a hot topic inobject detection and object classification area Wang et al[22] proposed a novel deep learning based vehicle detec-tion algorithm with 2D deep belief network the 2D-DBNarchitecture uses second-order planes instead of first-ordervector as input and uses bilinear projection for retainingdiscriminative information so as to determine the size ofthe deep architecture which enhances the success rate ofvehicle detection Their algorithm preforms very good intheir datasets He et al [1] proposed a new efficient vehicledetection and classification approach based on convolutionalneural network the features extracted by this method outper-form those generated by traditional approaches Yi et al [23]proposed a deep convolution network based on pretrainedAlexNet model for deciding whether a certain image patchcontains a vehicle or not in Wide Area Motion Imagery(WAMI) imagery analysis Li et al [24] presented the 3Drange scan data in a 2D point map and used a single 2Dend-to-end fully convolutional network to predict the vehicleconfidence and the bounding boxes simultaneously and theygot the state-of-the-art performance on the KITTI dataset

Meantime object detection and classification based onConvolutional neural networks (CNNs) [25ndash27] are verysuccessful in the field of computer vision recently The firstwork about object detection and classification based ondeep learning has been done in 2013 Sermanet et al [28]present an integrated framework for using deep learningfor object detection localization and classification thisframework obtains very competitive results for the detectionand classifications tasks Up to now excellent object detectionand classification models based on deep learning include R-CNN [29] Fast R-CNN [30] YOLO [31] Faster R-CNN [32]SSD [33] and R-FCN [34] these models achieve state-of-the-art results on several data sets Before the YOLO manyapproaches on object detection for example R-CNN andFaster R-CNN repurpose classifiers to perform detectionInstead YOLO frame object detection is a regression problemto separated bounding boxes and associated class probabil-ities The YOLO framework uses a custom network basedon the Googlenet architecture using 852 billion operationsfor a forward pass However a more recent improved modelcalled YOLOv2 [35] achieves comparable results on standardtasks like PASCAL VOC and COCO In YOLOv2 network ituses a newmodel calledDarknet-19 and has 19 convolutionallayers and 5 maxpooling layers the model only requires 558

Advances in Multimedia 3

Classificationbased on themodel

Classificationresult

Vehicles detection based onYOLOv2

To labelimages

To pre-train model

To train thepre-trainedmodel

Image

Detection basedon YOLOv2

Images Dataset

Figure 1 Flowchart of vehicle classification

billion operations In conclusion YOLOv2 is a state-of-the-art detection system it is better faster and stronger thanothers and applied for object detection tasks in this work

At last unsupervised pretraining initializes the model toa point in the parameter space that somehow renders theoptimization process more effective in the sense of achievinga lower minimum of the empirical loss function [36] Muchrecent research has been devoted to learning algorithms fordeep architectures such as Deep Belief Networks [37 38] andstacks of autoencoder variants [39] After vehicle detectionwe can easily get a lot of unlabeled images of vehicles andoptimize the classification model parameters initialization byunsupervised pretraining

3 Methodology

In this section we will present the details of themethod basedon CNNs for image-based vehicle detection and vehicle clas-sification This section contains three parts vehicle detectionvehicle classification and pretraining approachThe relationsbetween each part and the overall framework of the entireidea is shown in Figure 1

31 Vehicle Detection Our images taken from static camerasin different refueling station contain front views of vehiclesor side views of vehicles at any point The vehicles in imagesare very indeterminacy this makes the vehicle detectionmore difficult in traditional methods based on hand-craftedfeatures

The YOLOv2 model is trained on COCO data sets it candetect 80 common objects in life such as person bicycle carbus train truck boat bird cat etc and therefore we canperform vehicle detection based on YOLOv2 In a picture ofthe vehicle which is waiting for entering the refueling stationshown in Figure 2 YOLOv2 can well detect many objects forexample the security guards drivers the vehicle and queuedvehicles and even vehicles on the side of road Here ourgoal is to pick up the vehicle which is waiting for enteringin picture

truck

truck

person

person

personpersperson

car car car

Figure 2 YOLOv2 detection Top truck Down car

Although trained YOLOv2 can detect the vehicle anddivide it into bicycle car motorbike bus and truck it doesnot meet our classification categories In order to solve thisproblem to fine-tune the YOLOv2 on our data could be asolution but this method needs amount of manual labeleddata and massive computation it is not a preferable methodfor us and then we presented a rule-based method todetect four categories vehicles more accurately First fromthe YOLOv2 detection results we select the objects whichare very similar to our targets such as car bus truck andmotorbike second according to the distance between thevehicle and the camera the closer the vehicle is the biggerthe target is we choose the most similar vehicle in pictureas the target vehicle entering the refueling station for furthervehicle behavior analysis

32 Vehicle Classification According to the function and sizeof vehicles vehicles will be divided into four categories ofmotorcycle transporter passenger and others Motorcycleincludes motorcycle and motor tricycle transporter includestruck and container car passenger includes sedan hatchbackcoupe van SUV and MPV others include vehicles used inagricultural production and infrastructure such as tractorand crane and other types of vehicles Figure 3 shows thesheared samples in four categories in each column As we cansee samples images are very different in shape color sizeand angle of camera even the samples images in the samecategory And Figure 3(b) bottom and Figure 3(c) bottom arenot in the same category but they are very similar especiallyon front face shape and color whichmakes the classificationbetween transporter and passenger more difficult

4 Advances in Multimedia

(a) (b) (c) (d)

Figure 3 Sheared vehicle image examples for four categories (a) Motorcycle (b) transporter (c) passenger (d) others

Table 1 The structure of C4M3F2

Layer TypeActivation SizeStride FiltersConvolutionalReLU 3times31 32Max Pooling 2times22ConvolutionalReLU 3times31 64ConvolutionalReLU 3times31 64Max Pooling 2times22ConvolutionalReLU 3times31 64Max Pooling 2times22Fully ConnectedReLU 1024Fully Connected 1024Softmax 4

To solve this difficult problem we presented a convo-lutional classification model which is effective and requireslittle amount of operations Our model called C4M3F2 has4 convolutional layers 3 max pooling layers and 2 fullyconnected layers

Each convolutional layer contains multiple (32 or 64)3lowast3 kernels and each kernel represents a filter connected tothe outputs of the previous layer Each max pooling layercontains multiple max pooling with 2times2 filters and stride2 it effectively reduces the feature dimension and avoidsoverfitting For fully connected layers each layer contains1024 neurons each neuron makes prediction from its allinput and it is connected to all the neurons in previous layersFor each sheared vehicle image detected from YOLOv2 ithas been resized to 48times48 and then passed into C4M3F2Eventually all the features are passed to softmax layer andwhat we need to do is just minimizing the cross entropy lossbetween softmax outputs and the input labels Table 1 showsthe structure of our model C4M3F2

33 Pretraining Approach With the purpose of achievinga satisfactory classification result we need more labeledimages for training our model but there is a shortage oflabeled images however there are plenty of images collectedeasily and how to use the plenty of unlabeled images foroptimization of our classification model has become themaincontent in this subsection

The motivation of this unsupervised pretraining ap-proach is to optimize the parameters in convolutional kernel

Input Code

Encoder

Reconstruction

Decoder

Error

Figure 4 Autoencoder

parameters Kernel training in C4M3F2 starts from a randominitialization and we hope that the kernel training procedurecan be optimized and accelerated using a good initial valueobtained by unsupervised pretraining In addition pretrain-ing initializes themodel to a point in the parameter space thatsomehow renders the optimization process more effective inthe sense of achieving a lower minimum of the loss function[36] Next we will explain how to greedy layer-wise pretrainthe convolution layers and the fully connected layers in theC4M3F2 model Max pooling layer function is subsamplingit is not included in layer-wise pretraining process

An autoencoder [40] neural network is an unsupervisedlearning algorithm that applies backpropagation setting thetarget values to be equal to the inputs It uses a set ofrecognition weights to convert the input into code and thenuses a set of generative weights to convert the code into anapproximate reconstruction of the input Autoencoder musttry to reconstruct the input which aims to minimize thereconstruction error as shown in Figure 4

According to the aim of autoencoder our approach forunsupervised pretrainingwill be explained In every convolu-tional layer convolution can be regarded as the encoder anddeconvolution [41] is taken as the decoder which is a veryunfortunate name and also called transposed convolutionAn input image is passed into the encoder then the outputcode getting from encoder is passed into the decoder toreconstruct the input image Here Euclidean distance whichmeans the reconstruct error is used to measure the similaritybetween the input image and reconstructed image so the aimof our approach is minimizing the Euclidean paradigm Forpretraining the next layer first we should drop the decoderand freeze the weights in the encoder of previous layers andthen take the output code of previous layer as the input in this

Advances in Multimedia 5

layer and do the same things as in previous layer Next howto use transposed convolution to construct and minimize theloss function in one convolutional layer will be described indetail as follows

The convolution of a feature maps and an image can bedefined as

119910119894119895 = 119908119894 oplus 119909119895 (1)

where oplus denotes the 2D convolution119910119894119895 isin 119877119903119909times119888119909 is theconvolution result and the padding is set to keep the input andoutput dimensions consistent 119908119894 isin 119877119903119908times119888119908 is the 119894th kerneland 119909119895 isin 119877119903119909times119888119909 denotes the 119895th training image

Then transform the convolution based on circulantmatrix in linear system A circulant matrix is a special kind ofToeplitz matrix where each row vector is rotated one elementto the right relative to the preceding row vector An 119899 times 119899circulant matrix C takes the form in

119862 =[[[[[[[[[

1198880 119888119899minus11198881 1198880 sdot sdot sdot 1198882 11988811198883 1198882 d119888119899minus2 119888119899minus3119888119899minus1 119888119899minus2 sdot sdot sdot 1198880 119888119899minus11198881 1198880

]]]]]]]]](2)

Let 119891119895 ℎ119894 isin 119877119903119890times119888119890 be the extension of 119909119895 119908119894 where 119903119890 =119903119909 + 119903119908 minus 1 119888119890 = 119888119909 + 119888119908 minus 1 And the method is as follows (3)and (4) where 119874 is zero matrix

119891119895 = [119909119895 119874119874 119874] (3)

ℎ119894 = [119908119894 119874119874 119874] (4)

Let 119891V119895 isin 119877119903119890119888119890times1 be 119891119895 in vectored form 119903119900119908119886 be a row

of ℎ119894 and 119903119900119908119886 = [1198990 1198991 1198992 sdot sdot sdot 119899119888119890minus1] To build circulant

matrices 11986701198671 1198672 sdot sdot sdot 119867119888119890minus1 by 119903119900119908119886 and by these circu-

lant matrices a block circulant matrix is defined as shown informula (5)

119867 =[[[[[[[[[

1198670 119867119888119890minus11198671 1198670 sdot sdot sdot 1198672 11986711198673 1198672 d

119867119888119890minus2 119867119888

119890minus3119867119888

119890minus1 119867119888

119890minus2

sdot sdot sdot 1198670 119867119888119890minus11198671 1198670

]]]]]]]]](5)

Here we can transform the convolution into (6)

119876 = 119867119891V119895 (6)

119876 is the vector form of result of convolution calculationand then to reshape 119876 into 1198761015840 isin 119877119903119890times119888119890 In this convolutionprocess the padding is dealing with filling 0 but in actualimplementation of our approach we keep the convolutionalinput and output dimensions consistent So we need to prune

the extra values to keep the input and output dimensionsconsistent So we intercept the matrix 1198761015840 then 119910119894119895 = 1198761015840[119903119908 minus1 119903119890 minus 119903119908 + 1 119888119908 minus 1 119888119890 minus 119888119908 + 1]

To simplify the calculation we extract the effective rowsof 119867 according to the effective row indexes which indicatesthe position of the elements of 119910119894119895 in 119876 and denote these rowsby119882119894 isin 119877(119903119890minus2119903119908+2)(119888119890minus2119888119908+2)times2119888119890

Now 119884119894119895 isin 119877(119903119890minus2119903119908+2)(119888119890minus2119888119908+2)times1 is vector form of 119910119894119895 sothe convolution can be rewritten as

119884119894119895 = 119882119894119891V119895 (7)

There are 119869 training vehicle images and 119870 kernels Let119883 = [119891V1 119891V2 119891V

119895 ] and 119882 = [11988211198822 119882119870]The convolution can be calculated as

119884 = W119883 (8)

And the deconvolution can be calculated

1198831015840 = 119882119879119884 (9)

So 1198831015840 is the X reconstruction Then loss function basedon Euclidean paradigm is defined in formula (10) being

loss (119882119883) = 10038171003817100381710038171003817119883 minus 1198831015840100381710038171003817100381710038172 = radic 119869sum119895=1

(119883119895 minus 1198831015840119895)2 (10)

Then we used adam optimizer which is an algorithm forfirst-order gradient-based optimization of stochastic objec-tive functions based on adaptive estimates of lower-ordermoments to solve the minimum optimization problem informula (10)

After the greedy layer-wise unsupervised pretraining weinitiate the parameters in every convolutional layer withthe pretrained values and run the supervised training forclassification according to themethod in previous subsection

4 Experiments and Discussions

We evaluated the presented algorithm on our data andcompared it with other four state-of-the-art methods

41 Datasets and Experiment Environment The vehiclesimages are taken by the static cameras in different refuelstations after being compressed they are sent to serversThe quality of images on servers is lower than that selectedby random and classified into four categories of motorcycletransporter passenger and others by hand We got 498motorcycle images 1109 transporter images 1238 passengerimages and 328 other images Due to the time-consumingand labor-intensive of manual labeling there is a shortage oflabeled images Image augmentation has been used to enrichthe data Keras the excellent high level neural network APIprovides the ImageDataGenerator for image data preparationand augmentation Shear range is set to -02 to 02 zoomrange is set to -02 to 02 rotation range is set to -7 to 7 sizeis set to 256times256 and the points outside the boundaries are

6 Advances in Multimedia

09

0 25 50 75 100 125 150 175 200

Epochs(times10)

OriginalSheared

02

03

04

05

06

Accu

racy 07

08

Figure 5 Comparison of training process between original andsheared

filled according to the nearest mode After configuration andtaking into account the balance of data we fitted it on ourdata and got 1400 samples for every categories on the trainingset and 600 samples for every categories on the testing set toassess our classification model

For vehicle classification the CNNs are under Tensorflowframework the SIFT is under OpenCV (httpsopencvorg)and other feature embedding methods are under scikit-image (httpscikit-imageorg) All the experiments wereconducted on a regular notebook PC (25-GHz 8-core CPU12-G RAM and Ubuntu 64-bit OS)

42 Vehicle Detection Experiment with YOLOv2 For theexperiment using original images the original training setand test set are used for training and testing for the exper-iment using sheared images we used the approach based ontrained YOLOv2 to detect the original training set and testset to get the sheared training set and sheared testing set fortraining and testing

To verify the importance of vehicle detection for vehicleclassification we designed two groups of vehicle classifica-tion experiments one using original images and the otherone using sheared images after vehicle detection then theC4M3F2model is used for vehicle classification experiments

We initialized the C4M3F2 model by truncated nor-mal distribution fitted the model on original training setand sheared training set for 2000 epochs respectively andrecorded the accuracy of our C4M3F2 model on differenttesting set the results are shown in Figure 5 As we expectedthe sheared images of vehicles more accurately representthe characteristics of vehicles while pruning more uselessinformation and facilitating the feature extraction and vehicleclassification As we can see in Figure 5 the accuracy ofC4M3F2 model using sheared data set is much better thanthe one using original data set in the previous training thecharacteristics of vehicles were extracted more accurately sothat the model quickly achieved a better classification resultsand a stable status

Finally the accuracy of C4M3F2 using sheared data set is9142 which is 453 higher than the 8689 of C4M3F2using original data set It can be concluded that the resultsof vehicle classification using sheared data set after vehicledetection based on YOLOv2 can be improved effectively

Table 2 Accuracy and FPS of different methods

Method Accuracy FPSHOG+SVM 6012 4DAISY+SVM 6904 2ORB+BoW+SVM 6407 7SIFT+BoW+SVM 7449 5DeCAF[1] 6620 13CNNs 9142 800

43 Compare Our Approach with Others There are manyother image classification methods To assess our classifi-cation model we compared our approach with other fivemethods

The five methods are based on the image features definedby the scholars in computer image processing Consideringthe comprehensive factors four kinds of image features anda convolutional method are selected they are histograms oforiented gradient (HOG) [42] DAISY [43] oriented FASTand rotated BRIEF (ORB)[44] scale-invariant feature trans-form (SIFT) [45] and DeCAF[1] respectively These methodsare excellent in target object detection in [1 42ndash45] HOGis based on computing and counting the gradient directionhistogram of local regions DAISY is a fast computing localimage feature descriptor for dense feature extraction andit is based on gradient orientation histograms similar tothe SIFT descriptor ORB uses an oriented FAST detectionmethod and the rotated BRIEF descriptors unlike BRIEFORB is comparatively scale and rotation invariant while stillemploying the very efficient Hamming distance metric formatching SIFT is the most widely used algorithm of keypoint detection and description It takes full advantage ofimage local information SIFT feature has a good effect inrotation scale and translation and it is robust to changes inangle of view and illumination these features are beneficialto the effective expression of targets information For HOGand DAISY image features regions are designed featuresare computed and sent into SVM classifier to be classifiedFor ORB and SIFT they do not have acquisition featuresregions and specified number of features we get the imagefeatures based on Bag-of-Words (BoW) model by treatingimage features as words In the first instance all featurespoints of all training build the visual vocabulary in the nextplace a feature vector of occurrence counts of the vocabularyis constructed from an image in the end the feature vectoris sent into SVM classifier to be classified DeCAF uses fiveconvolutional layers and two fully connected layers to extractfeatures and a SVM to classify the image into the right group[1]

Here we performed vehicle classification experiments onsheared data set Table 2 shows the accuracy and FPS of CNNsand other state-of-the-art methods in test other methods arevery slow because they take a lot of time to extract featuresIt can be observed that the results show the effectiveness ofCNNs on vehicle classification problem

From another point of view we demonstrated the clas-sification ability of each method by confusion matrix of the

Advances in Multimedia 7

Motorcycle

Transporter

Passenger

Other

Motorcycle

Transporte

r

Passen

gerOther

Motorcycle

Transporter

Passenger

Other

Motorcycle

Transporte

r

Passen

gerOther

Motorcycle

Transporter

Passenger

Other

Motorcycle

Transporte

r

Passen

gerOther

Motorcycle

Transporter

Passenger

Other

Motorcycle

Transporte

r

Passen

gerOther

Motorcycle

Transporter

Passenger

Other

HOG SIFT

DAISY DeCAF

ORB CNNs

Motorcycle

Transporte

r

Passen

gerOther

Motorcycle

Transporter

Passenger

Other

Motorcycle

Transporte

r

Passen

gerOther

08

07

06

05

04

03

02

01

07

06

05

04

03

02

01

07

06

05

04

03

02

01

08

07

06

05

04

03

02

01

00

08

06

04

02

08

06

04

02

Figure 6 Confusion matrix of classification of different methods

classification process from the five methods in Figure 6 Themain diagonal displays the high recognition accuracy Asshown in Figure 6 the top five comparative methods arebetter in recognition of motorcycle than other categoriesGenerally speaking ORB or SIFT combined with BoW andSVMmethod is a little better than the other two methods Alltaken into account our CNNs method is the best But theperformance of CNNs turns out not so satisfactory in viewthan the confusion of transporter and passenger

Next we will focus on the reason of confusion in CNNsAccording to the precision recall and f1-score of classifica-tion in Table 3 it shows that the identification of motorcycleis very good enough and the identification of transporter andpassenger is relatively poor

As shown in the examples in Figure 7 it can be seen thatthe wrongly recognized transporter and passenger imagesinclude vehicle face information mainly and very little vehiclebody information as far as themain vehicle face is concerned

8 Advances in Multimedia

Figure 7 Examples of wrongly recognized vehicles images Firstrow wrongly recognized as passenger which should be transporterSecond row wrongly recognized as transporter which should bepassenger

09

0 25 50 75 100 125 150 175 200

Epochs(times10)

WithoutPre-trained

02

03

04

05

06

Accu

racy 07

08

Figure 8 Comparison of training process between pretrained andwithout pretraining

Table 3 Classification precision recall and f1-score of CNNs

Type Precision Recall F1-scoreMotorcycle 097 095 096Transporter 087 085 086Passenger 090 091 090Other 093 093 093

these vehicles images are so similar in profile that it is still achallenge to recognize same images in Figure 7 manually

44 PretrainingApproachExperiment Weare eager for betterperformance of our classification model C4M3F2 Hereunsupervised pretraining has been used for optimization ofour classification model 17395 sheared vehicles images areobtained by shearing the unlabeled vehicles images

We pretrained the parameters of every convolutionallayer for 2000 epochs and then supervised training the modelon our sheared training set and tested it on our shearedtesting set the results is shown in Figure 8 In the processof training the conclusion is that the effect is more obviousin the previous epochs and the overall training process isstable relatively Ultimately the accuracy of pretrained CNNsis 9350 which is 208 higher than the 9142 of CNNswithout pretraining

Table 4 Classification precision recall and f1-score of pretrainedCNNs based on sheared dataset

Type Precision Recall F1-scoreMotorcycle 099 099 099Transporter 090 099 099Passenger 091 092 092Other 095 096 095

Table 5 Classification precision recall and f1-score of pretrainedCNNs based on original dataset

Type Precision Recall F1-scoreMotorcycle 099 099 099Transporter 079 082 081Passenger 084 079 081Other 090 094 092

Table 6 The classification accuracy under different strategies

Original ShearedCNNsWithout pre-training 8689 9142CNNs with pre-training 8829 935

By analyzing the classification performance of pretrainedCNNs shown in Table 4 we can draw a conclusion thatits performance is better than the one of CNNs withoutpretraining shown in Table 3 especially for the classificationof transporter and passenger In summary pretrained CNNis more effective in recognizing the vehicles categories and itis a state-of-the-art approach for vehicle classification

In the end to verify the effect of detection in the entiresystem we conducted ablation study by pretraining andtesting our model on the original dataset which has notbeen sheared by YOLOv2 and contains a large quantity ofirrelevant background Ultimately the accuracy of pretrainedCNNs on original dataset is 8829 which is 521 lowerthan the 935 of pretrained CNNs on sheared dataset andeven lower than the 9142 of CNNs without pretrainingon sheared dataset the classification performance is shownin Table 5 And according the classification accuracy inTable 6 we can conclude that this ablation study confirmsthe essentiality of detection virtually in the whole vehicleclassification system

5 Conclusions

A classification method based on CNNs has been detailedin this paper To improve the accuracy we used vehicledetection to removing the unrelated background for facili-tating the feature extraction and vehicle classification Thenan autoencoder-based layer-wise unsupervised pretraining isintroduced to improve the CNNs model by enhancing theclassification performance Several state-of-the-art methodshave been evaluated on our labeled data set containing fourcategories of motorcycle transporter passenger and others

Advances in Multimedia 9

Experimental results have demonstrated that the pretrainedCNNsmethod based on vehicle detection is themost effectivefor vehicle classification

In addition the success of our vehicle classification makesa vehicle color or logo recognition system possible in ourrefueling behavior analysis meanwhile it is a great help tourban computing intelligent transportation system etc

Data Availability

The data used to support the findings of this study areavailable from the corresponding author upon request

Conflicts of Interest

The authors declare that they have no conflicts of interest

Acknowledgments

This research is supported by Youth Innovation PromotionAssociation CAS (2015355) The authors gratefully acknowl-edge the invaluable contribution of Yupeng Ma and themembers of his laboratory during this collaboration

References

[1] DHe C Lang S Feng XDu andC Zhang ldquoVehicle detectionand classification based on convolutional neural networkrdquo inProceedings of the 7th International Conference on InternetMultimedia Computing and Service 2015

[2] J F Forren and D Jaarsma ldquoTraffic monitoring by tire noiserdquoComputer Standards amp Interfaces vol 20 pp 466-467 1999

[3] J George A Cyril B I Koshy and L Mary ldquoExploring SoundSignature for VehicleDetection and Classification Using ANNrdquoInternational Journal on Soft Computing vol 4 no 2 pp 29ndash362013

[4] J George L Mary and K S Riyas ldquoVehicle detection andclassification from acoustic signal using ANN and KNNrdquo inProceedings of the 2013 International Conference on ControlCommunication and Computing (ICCC rsquo13) pp 436ndash439Thiruvananthapuram India 2013

[5] KWang R Wang Y Feng et al ldquoVehicle recognition in acous-tic sensor networks via sparse representationrdquo in Proceedings ofthe 2014 IEEE International Conference onMultimedia and ExpoWorkshops (ICMEW rsquo14) pp 1ndash4 Chengdu China 2014

[6] A Duzdar and G Kompa ldquoApplications using a low-cost base-band pulsed microwave radar sensorrdquo in Proceedings of the 18thIEEE Instrumentation andMeasurement Technology Conference(IMTC rsquo01) vol 1 of Rediscovering Measurement in the Age ofInformatics pp 239ndash243 IEEE Budapest Hungary 2001

[7] H-T Kim and B Song ldquoVehicle recognition based on radarand vision sensor fusion for automatic emergency brakingrdquoin Proceedings of the 13th International Conference on ControlAutomation and Systems (ICCAS rsquo13) pp 1342ndash1346 GwangjuRepublic of Korea 2013

[8] Y Jo and I Jung ldquoAnalysis of vehicle detection with wsn-basedultrasonic sensorsrdquo Sensors vol 14 no 8 pp 14050ndash14069 2014

[9] Y Iwasaki MMisumi and T Nakamiya ldquoRobust vehicle detec-tion under various environments to realize road traffic flow

surveillance using an infrared thermal camerardquo The ScientificWorld Journal vol 2015 Article ID 947272 2015

[10] J Lan Y Xiang L Wang and Y Shi ldquoVehicle detection andclassification by measuring and processing magnetic signalrdquoMeasurement vol 44 no 1 pp 174ndash180 2011

[11] B Li T Zhang and T Xia ldquoVehicle Detection from 3DLidar Using Fully Convolutional Networkrdquo httpsarxivorgabs160807916 2016

[12] V Kastrinaki M Zervakis and K Kalaitzakis ldquoA survey ofvideo processing techniques for traffic applicationsrdquo Image andVision Computing vol 21 no 4 pp 359ndash381 2003

[13] F M Kazemi S Samadi H R Poorreza and M-R Akbar-zadeh-T ldquoVehicle recognition based on fourier wavelet andcurvelet transforms -A comparative studyrdquo inProceedings of the4th International Conference on Information Technology-NewGenerations ITNG rsquo07 pp 939-940 Las Vegas Nev USA 2007

[14] J Y Ng and Y H Tay ldquoImage-based Vehicle ClassificationSystemrdquo httpsarxivorgabs12042114 2012

[15] R A Hadi G Sulong and L E George ldquoVehicle Detectionand Tracking Techniques A Concise Reviewrdquo Signal amp ImageProcessing An International Journal vol 5 no 1 pp 1ndash12 2014

[16] J Arrospide and L Salgado ldquoA study of feature combinationfor vehicle detection based on image processingrdquoThe ScientificWorld Journal vol 2014 Article ID 196251 13 pages 2014

[17] P Piyush R Rajan L Mary and B I Koshy ldquoVehicle detectionand classification using audio-visual cuesrdquo in Proceedings of the3rd International Conference on Signal Processing and IntegratedNetworks (SPIN rsquo16) pp 726ndash730 Noida India 2016

[18] Z Chen T Ellis and S A Velastin ldquoVehicle detection trackingand classification in urban trafficrdquo in Proceedings of the 15thInternational IEEE Conference on Intelligent TransportationSystems ITSC rsquo12 pp 951ndash956 Anchorage Alaska USA 2012

[19] X Wen L Shao Y Xue and W Fang ldquoA rapid learning algo-rithm for vehicle classificationrdquo Information Sciences vol 295pp 395ndash406 2015

[20] P Mishra and B Banerjee ldquoMultiple Kernel based KNNClassifiers for Vehicle Classificationrdquo International Journal ofComputer Applications vol 71 no 6 pp 1ndash7 2013

[21] A Tourani and A Shahbahrami ldquoVehicle counting methodbased on digital image processing algorithmsrdquo in Proceedingsof the 2nd International Conference on Pattern Recognition andImage Analysis IPRIA rsquo15 pp 1ndash6 Rasht Iran 2015

[22] H Wang Y Cai and L Chen ldquoA vehicle detection algorithmbased on deep belief networkrdquoThe Scientific World Journal vol2014 Article ID 647380 7 pages 2014

[23] M Yi F Yang E Blashchet al ldquoVehicleClassification inWAMIImagery using Deep Networkrdquo in Proceedings of the SPIE 9838Sensors and Systems for Space Applicatioins IX 2016

[24] B Li T Zhang and T Xia ldquoVehicle detection from 3Dlidar using fully convolutional networkrdquo in Proceedings of theRobotics Science and Systems 2016

[25] Y Lecun B Boser J S Denker et al ldquoBackpropagation appliedto handwritten zip code recognitionrdquo Neural Computation vol1 no 4 pp 541ndash551 1989

[26] Y Lecun L Bottou Y Bengio and P Haffner ldquoGradient-basedlearning applied to document recognitionrdquo Proceedings of theIEEE vol 86 no 11 pp 2278ndash2323 1998

[27] A Krizhevsky I Sutskever and G E Hinton ldquoImageNetclassification with deep convolutional neural networksrdquo NeuralInformation Processing Systems pp 1097ndash1105 2012

10 Advances in Multimedia

[28] P Sermanet D Eigen X Zhang M Mathieu R Fergus andY Lecun ldquoOverFeat Integrated Recognition Localization andDetection using Convolutional Networksrdquo httpsarxivorgabs13126229 2013

[29] R Girshick J Donahue T Darrell and J Malik ldquoRich fea-ture hierarchies for accurate object detection and semanticsegmentationrdquo in Proceedings of the 27th IEEE Conference onComputer Vision and Pattern Recognition (CVPR rsquo14) pp 580ndash587 Columbus Ohio USA 2014

[30] R Girshick ldquoFast R-CNNrdquo in Proceedings of the 15th IEEEInternational Conference on Computer Vision (ICCV rsquo15) pp1440ndash1448 Santiago Chile 2015

[31] J Redmon S Divvala R Girshick and A Farhadi ldquoYou onlylook once Unified real-time object detectionrdquo in Proceedingsof the 2016 IEEE Conference on Computer Vision and PatternRecognition (CVPR rsquo16) pp 779ndash788 Las Vegas Nev USA2016

[32] S Ren K He R Girshick and J Sun ldquoFaster R-CNN TowardsReal-Time Object Detection with Region Proposal NetworksrdquoIEEE Transactions on Pattern Analysis andMachine Intelligencevol 39 no 6 pp 1137ndash1149 2017

[33] W Liu D Anguelov D Erhan et al ldquoSSD single shot multiboxdetectorrdquo in Proceedings of the Computer Vision ndash ECCV 2016vol 9905 of Lecture Notes in Computer Science pp 21ndash37 2016

[34] J Dai Y Li K He and J Sun ldquoR-FCN Object Detectionvia Region-based Fully Convolutional Networksrdquohttpsarxivorgabs160506409 2016

[35] J Redmon andA Farhadi ldquoYOLO9000 Better faster strongerrdquoin Proceedings of the 30th IEEE Conference on Computer Visionand Pattern Recognition (CVPR rsquo17) pp 6517ndash6525 HonoluluHawaii USA 2017

[36] D Erhan Y Bengio A Courville P-A Manzagol P Vincentand S Bengio ldquoWhy does unsupervised pre-training help deeplearningrdquo Journal of Machine Learning Research vol 11 pp625ndash660 2010

[37] G E Hinton S Osindero and Y Teh ldquoA fast learning algorithmfor deep belief netsrdquoNeural Computation vol 18 no 7 pp 1527ndash1554 2006

[38] Y Bengio P Lamblin D Popovici and H Larochelle ldquoGreedylayer-wise training of deep networksrdquo Neural Information Pro-cessing Systems pp 153ndash160 2007

[39] J Masci U Meier D Ciresan and J Schmidhuber ldquoStackedConvolutional Auto-Encoders for Hierarchical Feature Extrac-tionrdquo in Proceedings of the Artificial Neural Networks andMachine Learning - ICANN 2011 vol 6791 of Lecture Notesin Computer Science pp 52ndash59 Springer Berlin HeidelbergGermany 2011

[40] G E Hinton and R S Zemel ldquoAutoencoders minimum de-scription length and helmholtz free energyrdquoNeural InformationProcessing Systems pp 3ndash10 1994

[41] M D Zeiler D Krishnan GW Taylor and R Fergus ldquoDecon-volutional networksrdquo in Proceedings of the IEEE ComputerSociety Conference on Computer Vision and Pattern Recognition(CVPRrsquo 10) pp 2528ndash2535 San Francisco Calif USA 2010

[42] N Dalal and B Triggs ldquoHistograms of oriented gradients forhuman detectionrdquo in Proceedings of the IEEE Computer SocietyConference on Computer Vision and Pattern Recognition (CVPRrsquo05) vol 1 pp 886ndash893 2005

[43] E Tola V Lepetit and P Fua ldquoDAISY an efficient densedescriptor applied to wide-baseline stereordquo IEEE Transactionson Pattern Analysis and Machine Intelligence vol 32 no 5 pp815ndash830 2010

[44] E Rublee V Rabaud K Konolige and G Bradski ldquoORB anefficient alternative to SIFT or SURFrdquo in Proceedings of the IEEEInternational Conference on Computer Vision (ICCV rsquo11) pp2564ndash2571 Barcelona Spain 2011

[45] D G Lowe ldquoObject recognition from local scale-invariantfeaturesrdquo inProceedings of the 7th IEEE International Conferenceon Computer Vision (ICCV rsquo99) vol 2 pp 1150ndash1157 KerkyraGreece 1999

International Journal of

AerospaceEngineeringHindawiwwwhindawicom Volume 2018

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Active and Passive Electronic Components

VLSI Design

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Shock and Vibration

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawiwwwhindawicom

Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Control Scienceand Engineering

Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

SensorsJournal of

Hindawiwwwhindawicom Volume 2018

International Journal of

RotatingMachinery

Hindawiwwwhindawicom Volume 2018

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Navigation and Observation

International Journal of

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

Submit your manuscripts atwwwhindawicom

Page 2: Pretraining Convolutional Neural Networks for Image-Based ...downloads.hindawi.com/journals/am/2018/3138278.pdf · AdvancesinMultimedia 0.9 0 25 50 75 100 125 150 175 200 Epochs(×10)

2 Advances in Multimedia

were initialized by the pretrained parameters and trainedthe model on our labeled images data set thus we got abetter classification performance than the previous withoutpretraining

This paper is organized as follows The related works areintroduced in Section 2 Vehicle detection and classificationbased on CNNs and a pretraining approach are described inSection 3 In Section 4 the vehicles data set is presented andwe evaluated the presented approaches on our data and theexperimental results and a performance evaluation are givenFinally Section 5 concludes the paper

2 Related Works

Existing methods use various types of signal for vehicledetection and classification including acoustic signal [2ndash5] radar signal [6 7] ultrasonic signal [8] infrared ther-mal signal [9] magnetic signal [10] 3D lidar signal [11]and imagevideo signal [12ndash16] Furthermore some of themethods can be combined with a variety of signals such asradarampvision signal [7] and audioampvision signal [17] Usuallythe detection and classification performance is excellent inthesemethods because of the precise signal data but there aremany hardware devices involved in these methods resultingin larger deployment cost and even higher failure rate

The evolution of image processing techniques togetherwith wide deployment of surveillance cameras facilitatesimage-based vehicle detection and classification Variousapproaches to image-based vehicle detection and classifica-tion have been proposed over the last few years Kazemi etal [13] used 3 different kinds of feature extractors Fouriertransform Wavelet transform and Curvelet transform torecognize and classify 5 models of vehicles k-nearest neigh-bor is used as classifier They compare the 3 proposedapproaches and find that the Curvelet transform can extractbetter features Chen et al [18] presented a system for vehicledetection tracking and classification from roadside closedcircuit television (CCTV) First a Kalman filter trackeda vehicle to enable classification by majority voting overseveral consecutive frames then they trained a support vectormachine (SVM) using a combination of a vehicle silhouetteand intensity-based pyramid histogram of oriented gradient(HOG) features extracted following background subtractionclassifying foreground blobs with majority voting Wen et al[19] used Haar-like feature pool on a 32lowast32 grayscale imagepatch to represent a vehiclersquos appearance and then proposed arapid incremental learning algorithmofAdaBoost to improvethe performance of AdaBoost Arrospide and Salgado [16]analyzed the individual performance of popular techniquesfor vehicle verification and found that classifiers based onGabor and HOG features achieve the best results and outper-form principal component analysis (PCA) and other classi-fiers based on features as symmetry and gradient Mishra andBanerjee [20] detected vehicle using background extractedHaar pyramidal histogram of oriented gradients shape andscale-invariant feature transform features designed a mul-tiple kernel classifier based on k-nearest neighbor to dividethe vehicles into 4 categories Tourani and Shahbahrami [21]

combined different imagevideo processing methods includ-ing object detection edge detection frame differentiationand Kalman filter to propose a method which resulted inabout 95 percent accuracy for classification and about 4percent error in vehicle detection targets In these methodsthe classification results are very good however there arestill some problems First the image features are limitedby the hand-crafted features algorithms to represent richinformation Second the hand-crafted features algorithmsrequire a lot of calculation so they are not suitable forreal-time applications especially for embedding in front-endcamera devices Third most of them are used in fixed scenesand background environments it is difficult for them to dealwith complex environment

More recently deep learning has become a hot topic inobject detection and object classification area Wang et al[22] proposed a novel deep learning based vehicle detec-tion algorithm with 2D deep belief network the 2D-DBNarchitecture uses second-order planes instead of first-ordervector as input and uses bilinear projection for retainingdiscriminative information so as to determine the size ofthe deep architecture which enhances the success rate ofvehicle detection Their algorithm preforms very good intheir datasets He et al [1] proposed a new efficient vehicledetection and classification approach based on convolutionalneural network the features extracted by this method outper-form those generated by traditional approaches Yi et al [23]proposed a deep convolution network based on pretrainedAlexNet model for deciding whether a certain image patchcontains a vehicle or not in Wide Area Motion Imagery(WAMI) imagery analysis Li et al [24] presented the 3Drange scan data in a 2D point map and used a single 2Dend-to-end fully convolutional network to predict the vehicleconfidence and the bounding boxes simultaneously and theygot the state-of-the-art performance on the KITTI dataset

Meantime object detection and classification based onConvolutional neural networks (CNNs) [25ndash27] are verysuccessful in the field of computer vision recently The firstwork about object detection and classification based ondeep learning has been done in 2013 Sermanet et al [28]present an integrated framework for using deep learningfor object detection localization and classification thisframework obtains very competitive results for the detectionand classifications tasks Up to now excellent object detectionand classification models based on deep learning include R-CNN [29] Fast R-CNN [30] YOLO [31] Faster R-CNN [32]SSD [33] and R-FCN [34] these models achieve state-of-the-art results on several data sets Before the YOLO manyapproaches on object detection for example R-CNN andFaster R-CNN repurpose classifiers to perform detectionInstead YOLO frame object detection is a regression problemto separated bounding boxes and associated class probabil-ities The YOLO framework uses a custom network basedon the Googlenet architecture using 852 billion operationsfor a forward pass However a more recent improved modelcalled YOLOv2 [35] achieves comparable results on standardtasks like PASCAL VOC and COCO In YOLOv2 network ituses a newmodel calledDarknet-19 and has 19 convolutionallayers and 5 maxpooling layers the model only requires 558

Advances in Multimedia 3

Classificationbased on themodel

Classificationresult

Vehicles detection based onYOLOv2

To labelimages

To pre-train model

To train thepre-trainedmodel

Image

Detection basedon YOLOv2

Images Dataset

Figure 1 Flowchart of vehicle classification

billion operations In conclusion YOLOv2 is a state-of-the-art detection system it is better faster and stronger thanothers and applied for object detection tasks in this work

At last unsupervised pretraining initializes the model toa point in the parameter space that somehow renders theoptimization process more effective in the sense of achievinga lower minimum of the empirical loss function [36] Muchrecent research has been devoted to learning algorithms fordeep architectures such as Deep Belief Networks [37 38] andstacks of autoencoder variants [39] After vehicle detectionwe can easily get a lot of unlabeled images of vehicles andoptimize the classification model parameters initialization byunsupervised pretraining

3 Methodology

In this section we will present the details of themethod basedon CNNs for image-based vehicle detection and vehicle clas-sification This section contains three parts vehicle detectionvehicle classification and pretraining approachThe relationsbetween each part and the overall framework of the entireidea is shown in Figure 1

31 Vehicle Detection Our images taken from static camerasin different refueling station contain front views of vehiclesor side views of vehicles at any point The vehicles in imagesare very indeterminacy this makes the vehicle detectionmore difficult in traditional methods based on hand-craftedfeatures

The YOLOv2 model is trained on COCO data sets it candetect 80 common objects in life such as person bicycle carbus train truck boat bird cat etc and therefore we canperform vehicle detection based on YOLOv2 In a picture ofthe vehicle which is waiting for entering the refueling stationshown in Figure 2 YOLOv2 can well detect many objects forexample the security guards drivers the vehicle and queuedvehicles and even vehicles on the side of road Here ourgoal is to pick up the vehicle which is waiting for enteringin picture

truck

truck

person

person

personpersperson

car car car

Figure 2 YOLOv2 detection Top truck Down car

Although trained YOLOv2 can detect the vehicle anddivide it into bicycle car motorbike bus and truck it doesnot meet our classification categories In order to solve thisproblem to fine-tune the YOLOv2 on our data could be asolution but this method needs amount of manual labeleddata and massive computation it is not a preferable methodfor us and then we presented a rule-based method todetect four categories vehicles more accurately First fromthe YOLOv2 detection results we select the objects whichare very similar to our targets such as car bus truck andmotorbike second according to the distance between thevehicle and the camera the closer the vehicle is the biggerthe target is we choose the most similar vehicle in pictureas the target vehicle entering the refueling station for furthervehicle behavior analysis

32 Vehicle Classification According to the function and sizeof vehicles vehicles will be divided into four categories ofmotorcycle transporter passenger and others Motorcycleincludes motorcycle and motor tricycle transporter includestruck and container car passenger includes sedan hatchbackcoupe van SUV and MPV others include vehicles used inagricultural production and infrastructure such as tractorand crane and other types of vehicles Figure 3 shows thesheared samples in four categories in each column As we cansee samples images are very different in shape color sizeand angle of camera even the samples images in the samecategory And Figure 3(b) bottom and Figure 3(c) bottom arenot in the same category but they are very similar especiallyon front face shape and color whichmakes the classificationbetween transporter and passenger more difficult

4 Advances in Multimedia

(a) (b) (c) (d)

Figure 3 Sheared vehicle image examples for four categories (a) Motorcycle (b) transporter (c) passenger (d) others

Table 1 The structure of C4M3F2

Layer TypeActivation SizeStride FiltersConvolutionalReLU 3times31 32Max Pooling 2times22ConvolutionalReLU 3times31 64ConvolutionalReLU 3times31 64Max Pooling 2times22ConvolutionalReLU 3times31 64Max Pooling 2times22Fully ConnectedReLU 1024Fully Connected 1024Softmax 4

To solve this difficult problem we presented a convo-lutional classification model which is effective and requireslittle amount of operations Our model called C4M3F2 has4 convolutional layers 3 max pooling layers and 2 fullyconnected layers

Each convolutional layer contains multiple (32 or 64)3lowast3 kernels and each kernel represents a filter connected tothe outputs of the previous layer Each max pooling layercontains multiple max pooling with 2times2 filters and stride2 it effectively reduces the feature dimension and avoidsoverfitting For fully connected layers each layer contains1024 neurons each neuron makes prediction from its allinput and it is connected to all the neurons in previous layersFor each sheared vehicle image detected from YOLOv2 ithas been resized to 48times48 and then passed into C4M3F2Eventually all the features are passed to softmax layer andwhat we need to do is just minimizing the cross entropy lossbetween softmax outputs and the input labels Table 1 showsthe structure of our model C4M3F2

33 Pretraining Approach With the purpose of achievinga satisfactory classification result we need more labeledimages for training our model but there is a shortage oflabeled images however there are plenty of images collectedeasily and how to use the plenty of unlabeled images foroptimization of our classification model has become themaincontent in this subsection

The motivation of this unsupervised pretraining ap-proach is to optimize the parameters in convolutional kernel

Input Code

Encoder

Reconstruction

Decoder

Error

Figure 4 Autoencoder

parameters Kernel training in C4M3F2 starts from a randominitialization and we hope that the kernel training procedurecan be optimized and accelerated using a good initial valueobtained by unsupervised pretraining In addition pretrain-ing initializes themodel to a point in the parameter space thatsomehow renders the optimization process more effective inthe sense of achieving a lower minimum of the loss function[36] Next we will explain how to greedy layer-wise pretrainthe convolution layers and the fully connected layers in theC4M3F2 model Max pooling layer function is subsamplingit is not included in layer-wise pretraining process

An autoencoder [40] neural network is an unsupervisedlearning algorithm that applies backpropagation setting thetarget values to be equal to the inputs It uses a set ofrecognition weights to convert the input into code and thenuses a set of generative weights to convert the code into anapproximate reconstruction of the input Autoencoder musttry to reconstruct the input which aims to minimize thereconstruction error as shown in Figure 4

According to the aim of autoencoder our approach forunsupervised pretrainingwill be explained In every convolu-tional layer convolution can be regarded as the encoder anddeconvolution [41] is taken as the decoder which is a veryunfortunate name and also called transposed convolutionAn input image is passed into the encoder then the outputcode getting from encoder is passed into the decoder toreconstruct the input image Here Euclidean distance whichmeans the reconstruct error is used to measure the similaritybetween the input image and reconstructed image so the aimof our approach is minimizing the Euclidean paradigm Forpretraining the next layer first we should drop the decoderand freeze the weights in the encoder of previous layers andthen take the output code of previous layer as the input in this

Advances in Multimedia 5

layer and do the same things as in previous layer Next howto use transposed convolution to construct and minimize theloss function in one convolutional layer will be described indetail as follows

The convolution of a feature maps and an image can bedefined as

119910119894119895 = 119908119894 oplus 119909119895 (1)

where oplus denotes the 2D convolution119910119894119895 isin 119877119903119909times119888119909 is theconvolution result and the padding is set to keep the input andoutput dimensions consistent 119908119894 isin 119877119903119908times119888119908 is the 119894th kerneland 119909119895 isin 119877119903119909times119888119909 denotes the 119895th training image

Then transform the convolution based on circulantmatrix in linear system A circulant matrix is a special kind ofToeplitz matrix where each row vector is rotated one elementto the right relative to the preceding row vector An 119899 times 119899circulant matrix C takes the form in

119862 =[[[[[[[[[

1198880 119888119899minus11198881 1198880 sdot sdot sdot 1198882 11988811198883 1198882 d119888119899minus2 119888119899minus3119888119899minus1 119888119899minus2 sdot sdot sdot 1198880 119888119899minus11198881 1198880

]]]]]]]]](2)

Let 119891119895 ℎ119894 isin 119877119903119890times119888119890 be the extension of 119909119895 119908119894 where 119903119890 =119903119909 + 119903119908 minus 1 119888119890 = 119888119909 + 119888119908 minus 1 And the method is as follows (3)and (4) where 119874 is zero matrix

119891119895 = [119909119895 119874119874 119874] (3)

ℎ119894 = [119908119894 119874119874 119874] (4)

Let 119891V119895 isin 119877119903119890119888119890times1 be 119891119895 in vectored form 119903119900119908119886 be a row

of ℎ119894 and 119903119900119908119886 = [1198990 1198991 1198992 sdot sdot sdot 119899119888119890minus1] To build circulant

matrices 11986701198671 1198672 sdot sdot sdot 119867119888119890minus1 by 119903119900119908119886 and by these circu-

lant matrices a block circulant matrix is defined as shown informula (5)

119867 =[[[[[[[[[

1198670 119867119888119890minus11198671 1198670 sdot sdot sdot 1198672 11986711198673 1198672 d

119867119888119890minus2 119867119888

119890minus3119867119888

119890minus1 119867119888

119890minus2

sdot sdot sdot 1198670 119867119888119890minus11198671 1198670

]]]]]]]]](5)

Here we can transform the convolution into (6)

119876 = 119867119891V119895 (6)

119876 is the vector form of result of convolution calculationand then to reshape 119876 into 1198761015840 isin 119877119903119890times119888119890 In this convolutionprocess the padding is dealing with filling 0 but in actualimplementation of our approach we keep the convolutionalinput and output dimensions consistent So we need to prune

the extra values to keep the input and output dimensionsconsistent So we intercept the matrix 1198761015840 then 119910119894119895 = 1198761015840[119903119908 minus1 119903119890 minus 119903119908 + 1 119888119908 minus 1 119888119890 minus 119888119908 + 1]

To simplify the calculation we extract the effective rowsof 119867 according to the effective row indexes which indicatesthe position of the elements of 119910119894119895 in 119876 and denote these rowsby119882119894 isin 119877(119903119890minus2119903119908+2)(119888119890minus2119888119908+2)times2119888119890

Now 119884119894119895 isin 119877(119903119890minus2119903119908+2)(119888119890minus2119888119908+2)times1 is vector form of 119910119894119895 sothe convolution can be rewritten as

119884119894119895 = 119882119894119891V119895 (7)

There are 119869 training vehicle images and 119870 kernels Let119883 = [119891V1 119891V2 119891V

119895 ] and 119882 = [11988211198822 119882119870]The convolution can be calculated as

119884 = W119883 (8)

And the deconvolution can be calculated

1198831015840 = 119882119879119884 (9)

So 1198831015840 is the X reconstruction Then loss function basedon Euclidean paradigm is defined in formula (10) being

loss (119882119883) = 10038171003817100381710038171003817119883 minus 1198831015840100381710038171003817100381710038172 = radic 119869sum119895=1

(119883119895 minus 1198831015840119895)2 (10)

Then we used adam optimizer which is an algorithm forfirst-order gradient-based optimization of stochastic objec-tive functions based on adaptive estimates of lower-ordermoments to solve the minimum optimization problem informula (10)

After the greedy layer-wise unsupervised pretraining weinitiate the parameters in every convolutional layer withthe pretrained values and run the supervised training forclassification according to themethod in previous subsection

4 Experiments and Discussions

We evaluated the presented algorithm on our data andcompared it with other four state-of-the-art methods

41 Datasets and Experiment Environment The vehiclesimages are taken by the static cameras in different refuelstations after being compressed they are sent to serversThe quality of images on servers is lower than that selectedby random and classified into four categories of motorcycletransporter passenger and others by hand We got 498motorcycle images 1109 transporter images 1238 passengerimages and 328 other images Due to the time-consumingand labor-intensive of manual labeling there is a shortage oflabeled images Image augmentation has been used to enrichthe data Keras the excellent high level neural network APIprovides the ImageDataGenerator for image data preparationand augmentation Shear range is set to -02 to 02 zoomrange is set to -02 to 02 rotation range is set to -7 to 7 sizeis set to 256times256 and the points outside the boundaries are

6 Advances in Multimedia

09

0 25 50 75 100 125 150 175 200

Epochs(times10)

OriginalSheared

02

03

04

05

06

Accu

racy 07

08

Figure 5 Comparison of training process between original andsheared

filled according to the nearest mode After configuration andtaking into account the balance of data we fitted it on ourdata and got 1400 samples for every categories on the trainingset and 600 samples for every categories on the testing set toassess our classification model

For vehicle classification the CNNs are under Tensorflowframework the SIFT is under OpenCV (httpsopencvorg)and other feature embedding methods are under scikit-image (httpscikit-imageorg) All the experiments wereconducted on a regular notebook PC (25-GHz 8-core CPU12-G RAM and Ubuntu 64-bit OS)

42 Vehicle Detection Experiment with YOLOv2 For theexperiment using original images the original training setand test set are used for training and testing for the exper-iment using sheared images we used the approach based ontrained YOLOv2 to detect the original training set and testset to get the sheared training set and sheared testing set fortraining and testing

To verify the importance of vehicle detection for vehicleclassification we designed two groups of vehicle classifica-tion experiments one using original images and the otherone using sheared images after vehicle detection then theC4M3F2model is used for vehicle classification experiments

We initialized the C4M3F2 model by truncated nor-mal distribution fitted the model on original training setand sheared training set for 2000 epochs respectively andrecorded the accuracy of our C4M3F2 model on differenttesting set the results are shown in Figure 5 As we expectedthe sheared images of vehicles more accurately representthe characteristics of vehicles while pruning more uselessinformation and facilitating the feature extraction and vehicleclassification As we can see in Figure 5 the accuracy ofC4M3F2 model using sheared data set is much better thanthe one using original data set in the previous training thecharacteristics of vehicles were extracted more accurately sothat the model quickly achieved a better classification resultsand a stable status

Finally the accuracy of C4M3F2 using sheared data set is9142 which is 453 higher than the 8689 of C4M3F2using original data set It can be concluded that the resultsof vehicle classification using sheared data set after vehicledetection based on YOLOv2 can be improved effectively

Table 2 Accuracy and FPS of different methods

Method Accuracy FPSHOG+SVM 6012 4DAISY+SVM 6904 2ORB+BoW+SVM 6407 7SIFT+BoW+SVM 7449 5DeCAF[1] 6620 13CNNs 9142 800

43 Compare Our Approach with Others There are manyother image classification methods To assess our classifi-cation model we compared our approach with other fivemethods

The five methods are based on the image features definedby the scholars in computer image processing Consideringthe comprehensive factors four kinds of image features anda convolutional method are selected they are histograms oforiented gradient (HOG) [42] DAISY [43] oriented FASTand rotated BRIEF (ORB)[44] scale-invariant feature trans-form (SIFT) [45] and DeCAF[1] respectively These methodsare excellent in target object detection in [1 42ndash45] HOGis based on computing and counting the gradient directionhistogram of local regions DAISY is a fast computing localimage feature descriptor for dense feature extraction andit is based on gradient orientation histograms similar tothe SIFT descriptor ORB uses an oriented FAST detectionmethod and the rotated BRIEF descriptors unlike BRIEFORB is comparatively scale and rotation invariant while stillemploying the very efficient Hamming distance metric formatching SIFT is the most widely used algorithm of keypoint detection and description It takes full advantage ofimage local information SIFT feature has a good effect inrotation scale and translation and it is robust to changes inangle of view and illumination these features are beneficialto the effective expression of targets information For HOGand DAISY image features regions are designed featuresare computed and sent into SVM classifier to be classifiedFor ORB and SIFT they do not have acquisition featuresregions and specified number of features we get the imagefeatures based on Bag-of-Words (BoW) model by treatingimage features as words In the first instance all featurespoints of all training build the visual vocabulary in the nextplace a feature vector of occurrence counts of the vocabularyis constructed from an image in the end the feature vectoris sent into SVM classifier to be classified DeCAF uses fiveconvolutional layers and two fully connected layers to extractfeatures and a SVM to classify the image into the right group[1]

Here we performed vehicle classification experiments onsheared data set Table 2 shows the accuracy and FPS of CNNsand other state-of-the-art methods in test other methods arevery slow because they take a lot of time to extract featuresIt can be observed that the results show the effectiveness ofCNNs on vehicle classification problem

From another point of view we demonstrated the clas-sification ability of each method by confusion matrix of the

Advances in Multimedia 7

Motorcycle

Transporter

Passenger

Other

Motorcycle

Transporte

r

Passen

gerOther

Motorcycle

Transporter

Passenger

Other

Motorcycle

Transporte

r

Passen

gerOther

Motorcycle

Transporter

Passenger

Other

Motorcycle

Transporte

r

Passen

gerOther

Motorcycle

Transporter

Passenger

Other

Motorcycle

Transporte

r

Passen

gerOther

Motorcycle

Transporter

Passenger

Other

HOG SIFT

DAISY DeCAF

ORB CNNs

Motorcycle

Transporte

r

Passen

gerOther

Motorcycle

Transporter

Passenger

Other

Motorcycle

Transporte

r

Passen

gerOther

08

07

06

05

04

03

02

01

07

06

05

04

03

02

01

07

06

05

04

03

02

01

08

07

06

05

04

03

02

01

00

08

06

04

02

08

06

04

02

Figure 6 Confusion matrix of classification of different methods

classification process from the five methods in Figure 6 Themain diagonal displays the high recognition accuracy Asshown in Figure 6 the top five comparative methods arebetter in recognition of motorcycle than other categoriesGenerally speaking ORB or SIFT combined with BoW andSVMmethod is a little better than the other two methods Alltaken into account our CNNs method is the best But theperformance of CNNs turns out not so satisfactory in viewthan the confusion of transporter and passenger

Next we will focus on the reason of confusion in CNNsAccording to the precision recall and f1-score of classifica-tion in Table 3 it shows that the identification of motorcycleis very good enough and the identification of transporter andpassenger is relatively poor

As shown in the examples in Figure 7 it can be seen thatthe wrongly recognized transporter and passenger imagesinclude vehicle face information mainly and very little vehiclebody information as far as themain vehicle face is concerned

8 Advances in Multimedia

Figure 7 Examples of wrongly recognized vehicles images Firstrow wrongly recognized as passenger which should be transporterSecond row wrongly recognized as transporter which should bepassenger

09

0 25 50 75 100 125 150 175 200

Epochs(times10)

WithoutPre-trained

02

03

04

05

06

Accu

racy 07

08

Figure 8 Comparison of training process between pretrained andwithout pretraining

Table 3 Classification precision recall and f1-score of CNNs

Type Precision Recall F1-scoreMotorcycle 097 095 096Transporter 087 085 086Passenger 090 091 090Other 093 093 093

these vehicles images are so similar in profile that it is still achallenge to recognize same images in Figure 7 manually

44 PretrainingApproachExperiment Weare eager for betterperformance of our classification model C4M3F2 Hereunsupervised pretraining has been used for optimization ofour classification model 17395 sheared vehicles images areobtained by shearing the unlabeled vehicles images

We pretrained the parameters of every convolutionallayer for 2000 epochs and then supervised training the modelon our sheared training set and tested it on our shearedtesting set the results is shown in Figure 8 In the processof training the conclusion is that the effect is more obviousin the previous epochs and the overall training process isstable relatively Ultimately the accuracy of pretrained CNNsis 9350 which is 208 higher than the 9142 of CNNswithout pretraining

Table 4 Classification precision recall and f1-score of pretrainedCNNs based on sheared dataset

Type Precision Recall F1-scoreMotorcycle 099 099 099Transporter 090 099 099Passenger 091 092 092Other 095 096 095

Table 5 Classification precision recall and f1-score of pretrainedCNNs based on original dataset

Type Precision Recall F1-scoreMotorcycle 099 099 099Transporter 079 082 081Passenger 084 079 081Other 090 094 092

Table 6 The classification accuracy under different strategies

Original ShearedCNNsWithout pre-training 8689 9142CNNs with pre-training 8829 935

By analyzing the classification performance of pretrainedCNNs shown in Table 4 we can draw a conclusion thatits performance is better than the one of CNNs withoutpretraining shown in Table 3 especially for the classificationof transporter and passenger In summary pretrained CNNis more effective in recognizing the vehicles categories and itis a state-of-the-art approach for vehicle classification

In the end to verify the effect of detection in the entiresystem we conducted ablation study by pretraining andtesting our model on the original dataset which has notbeen sheared by YOLOv2 and contains a large quantity ofirrelevant background Ultimately the accuracy of pretrainedCNNs on original dataset is 8829 which is 521 lowerthan the 935 of pretrained CNNs on sheared dataset andeven lower than the 9142 of CNNs without pretrainingon sheared dataset the classification performance is shownin Table 5 And according the classification accuracy inTable 6 we can conclude that this ablation study confirmsthe essentiality of detection virtually in the whole vehicleclassification system

5 Conclusions

A classification method based on CNNs has been detailedin this paper To improve the accuracy we used vehicledetection to removing the unrelated background for facili-tating the feature extraction and vehicle classification Thenan autoencoder-based layer-wise unsupervised pretraining isintroduced to improve the CNNs model by enhancing theclassification performance Several state-of-the-art methodshave been evaluated on our labeled data set containing fourcategories of motorcycle transporter passenger and others

Advances in Multimedia 9

Experimental results have demonstrated that the pretrainedCNNsmethod based on vehicle detection is themost effectivefor vehicle classification

In addition the success of our vehicle classification makesa vehicle color or logo recognition system possible in ourrefueling behavior analysis meanwhile it is a great help tourban computing intelligent transportation system etc

Data Availability

The data used to support the findings of this study areavailable from the corresponding author upon request

Conflicts of Interest

The authors declare that they have no conflicts of interest

Acknowledgments

This research is supported by Youth Innovation PromotionAssociation CAS (2015355) The authors gratefully acknowl-edge the invaluable contribution of Yupeng Ma and themembers of his laboratory during this collaboration

References

[1] DHe C Lang S Feng XDu andC Zhang ldquoVehicle detectionand classification based on convolutional neural networkrdquo inProceedings of the 7th International Conference on InternetMultimedia Computing and Service 2015

[2] J F Forren and D Jaarsma ldquoTraffic monitoring by tire noiserdquoComputer Standards amp Interfaces vol 20 pp 466-467 1999

[3] J George A Cyril B I Koshy and L Mary ldquoExploring SoundSignature for VehicleDetection and Classification Using ANNrdquoInternational Journal on Soft Computing vol 4 no 2 pp 29ndash362013

[4] J George L Mary and K S Riyas ldquoVehicle detection andclassification from acoustic signal using ANN and KNNrdquo inProceedings of the 2013 International Conference on ControlCommunication and Computing (ICCC rsquo13) pp 436ndash439Thiruvananthapuram India 2013

[5] KWang R Wang Y Feng et al ldquoVehicle recognition in acous-tic sensor networks via sparse representationrdquo in Proceedings ofthe 2014 IEEE International Conference onMultimedia and ExpoWorkshops (ICMEW rsquo14) pp 1ndash4 Chengdu China 2014

[6] A Duzdar and G Kompa ldquoApplications using a low-cost base-band pulsed microwave radar sensorrdquo in Proceedings of the 18thIEEE Instrumentation andMeasurement Technology Conference(IMTC rsquo01) vol 1 of Rediscovering Measurement in the Age ofInformatics pp 239ndash243 IEEE Budapest Hungary 2001

[7] H-T Kim and B Song ldquoVehicle recognition based on radarand vision sensor fusion for automatic emergency brakingrdquoin Proceedings of the 13th International Conference on ControlAutomation and Systems (ICCAS rsquo13) pp 1342ndash1346 GwangjuRepublic of Korea 2013

[8] Y Jo and I Jung ldquoAnalysis of vehicle detection with wsn-basedultrasonic sensorsrdquo Sensors vol 14 no 8 pp 14050ndash14069 2014

[9] Y Iwasaki MMisumi and T Nakamiya ldquoRobust vehicle detec-tion under various environments to realize road traffic flow

surveillance using an infrared thermal camerardquo The ScientificWorld Journal vol 2015 Article ID 947272 2015

[10] J Lan Y Xiang L Wang and Y Shi ldquoVehicle detection andclassification by measuring and processing magnetic signalrdquoMeasurement vol 44 no 1 pp 174ndash180 2011

[11] B Li T Zhang and T Xia ldquoVehicle Detection from 3DLidar Using Fully Convolutional Networkrdquo httpsarxivorgabs160807916 2016

[12] V Kastrinaki M Zervakis and K Kalaitzakis ldquoA survey ofvideo processing techniques for traffic applicationsrdquo Image andVision Computing vol 21 no 4 pp 359ndash381 2003

[13] F M Kazemi S Samadi H R Poorreza and M-R Akbar-zadeh-T ldquoVehicle recognition based on fourier wavelet andcurvelet transforms -A comparative studyrdquo inProceedings of the4th International Conference on Information Technology-NewGenerations ITNG rsquo07 pp 939-940 Las Vegas Nev USA 2007

[14] J Y Ng and Y H Tay ldquoImage-based Vehicle ClassificationSystemrdquo httpsarxivorgabs12042114 2012

[15] R A Hadi G Sulong and L E George ldquoVehicle Detectionand Tracking Techniques A Concise Reviewrdquo Signal amp ImageProcessing An International Journal vol 5 no 1 pp 1ndash12 2014

[16] J Arrospide and L Salgado ldquoA study of feature combinationfor vehicle detection based on image processingrdquoThe ScientificWorld Journal vol 2014 Article ID 196251 13 pages 2014

[17] P Piyush R Rajan L Mary and B I Koshy ldquoVehicle detectionand classification using audio-visual cuesrdquo in Proceedings of the3rd International Conference on Signal Processing and IntegratedNetworks (SPIN rsquo16) pp 726ndash730 Noida India 2016

[18] Z Chen T Ellis and S A Velastin ldquoVehicle detection trackingand classification in urban trafficrdquo in Proceedings of the 15thInternational IEEE Conference on Intelligent TransportationSystems ITSC rsquo12 pp 951ndash956 Anchorage Alaska USA 2012

[19] X Wen L Shao Y Xue and W Fang ldquoA rapid learning algo-rithm for vehicle classificationrdquo Information Sciences vol 295pp 395ndash406 2015

[20] P Mishra and B Banerjee ldquoMultiple Kernel based KNNClassifiers for Vehicle Classificationrdquo International Journal ofComputer Applications vol 71 no 6 pp 1ndash7 2013

[21] A Tourani and A Shahbahrami ldquoVehicle counting methodbased on digital image processing algorithmsrdquo in Proceedingsof the 2nd International Conference on Pattern Recognition andImage Analysis IPRIA rsquo15 pp 1ndash6 Rasht Iran 2015

[22] H Wang Y Cai and L Chen ldquoA vehicle detection algorithmbased on deep belief networkrdquoThe Scientific World Journal vol2014 Article ID 647380 7 pages 2014

[23] M Yi F Yang E Blashchet al ldquoVehicleClassification inWAMIImagery using Deep Networkrdquo in Proceedings of the SPIE 9838Sensors and Systems for Space Applicatioins IX 2016

[24] B Li T Zhang and T Xia ldquoVehicle detection from 3Dlidar using fully convolutional networkrdquo in Proceedings of theRobotics Science and Systems 2016

[25] Y Lecun B Boser J S Denker et al ldquoBackpropagation appliedto handwritten zip code recognitionrdquo Neural Computation vol1 no 4 pp 541ndash551 1989

[26] Y Lecun L Bottou Y Bengio and P Haffner ldquoGradient-basedlearning applied to document recognitionrdquo Proceedings of theIEEE vol 86 no 11 pp 2278ndash2323 1998

[27] A Krizhevsky I Sutskever and G E Hinton ldquoImageNetclassification with deep convolutional neural networksrdquo NeuralInformation Processing Systems pp 1097ndash1105 2012

10 Advances in Multimedia

[28] P Sermanet D Eigen X Zhang M Mathieu R Fergus andY Lecun ldquoOverFeat Integrated Recognition Localization andDetection using Convolutional Networksrdquo httpsarxivorgabs13126229 2013

[29] R Girshick J Donahue T Darrell and J Malik ldquoRich fea-ture hierarchies for accurate object detection and semanticsegmentationrdquo in Proceedings of the 27th IEEE Conference onComputer Vision and Pattern Recognition (CVPR rsquo14) pp 580ndash587 Columbus Ohio USA 2014

[30] R Girshick ldquoFast R-CNNrdquo in Proceedings of the 15th IEEEInternational Conference on Computer Vision (ICCV rsquo15) pp1440ndash1448 Santiago Chile 2015

[31] J Redmon S Divvala R Girshick and A Farhadi ldquoYou onlylook once Unified real-time object detectionrdquo in Proceedingsof the 2016 IEEE Conference on Computer Vision and PatternRecognition (CVPR rsquo16) pp 779ndash788 Las Vegas Nev USA2016

[32] S Ren K He R Girshick and J Sun ldquoFaster R-CNN TowardsReal-Time Object Detection with Region Proposal NetworksrdquoIEEE Transactions on Pattern Analysis andMachine Intelligencevol 39 no 6 pp 1137ndash1149 2017

[33] W Liu D Anguelov D Erhan et al ldquoSSD single shot multiboxdetectorrdquo in Proceedings of the Computer Vision ndash ECCV 2016vol 9905 of Lecture Notes in Computer Science pp 21ndash37 2016

[34] J Dai Y Li K He and J Sun ldquoR-FCN Object Detectionvia Region-based Fully Convolutional Networksrdquohttpsarxivorgabs160506409 2016

[35] J Redmon andA Farhadi ldquoYOLO9000 Better faster strongerrdquoin Proceedings of the 30th IEEE Conference on Computer Visionand Pattern Recognition (CVPR rsquo17) pp 6517ndash6525 HonoluluHawaii USA 2017

[36] D Erhan Y Bengio A Courville P-A Manzagol P Vincentand S Bengio ldquoWhy does unsupervised pre-training help deeplearningrdquo Journal of Machine Learning Research vol 11 pp625ndash660 2010

[37] G E Hinton S Osindero and Y Teh ldquoA fast learning algorithmfor deep belief netsrdquoNeural Computation vol 18 no 7 pp 1527ndash1554 2006

[38] Y Bengio P Lamblin D Popovici and H Larochelle ldquoGreedylayer-wise training of deep networksrdquo Neural Information Pro-cessing Systems pp 153ndash160 2007

[39] J Masci U Meier D Ciresan and J Schmidhuber ldquoStackedConvolutional Auto-Encoders for Hierarchical Feature Extrac-tionrdquo in Proceedings of the Artificial Neural Networks andMachine Learning - ICANN 2011 vol 6791 of Lecture Notesin Computer Science pp 52ndash59 Springer Berlin HeidelbergGermany 2011

[40] G E Hinton and R S Zemel ldquoAutoencoders minimum de-scription length and helmholtz free energyrdquoNeural InformationProcessing Systems pp 3ndash10 1994

[41] M D Zeiler D Krishnan GW Taylor and R Fergus ldquoDecon-volutional networksrdquo in Proceedings of the IEEE ComputerSociety Conference on Computer Vision and Pattern Recognition(CVPRrsquo 10) pp 2528ndash2535 San Francisco Calif USA 2010

[42] N Dalal and B Triggs ldquoHistograms of oriented gradients forhuman detectionrdquo in Proceedings of the IEEE Computer SocietyConference on Computer Vision and Pattern Recognition (CVPRrsquo05) vol 1 pp 886ndash893 2005

[43] E Tola V Lepetit and P Fua ldquoDAISY an efficient densedescriptor applied to wide-baseline stereordquo IEEE Transactionson Pattern Analysis and Machine Intelligence vol 32 no 5 pp815ndash830 2010

[44] E Rublee V Rabaud K Konolige and G Bradski ldquoORB anefficient alternative to SIFT or SURFrdquo in Proceedings of the IEEEInternational Conference on Computer Vision (ICCV rsquo11) pp2564ndash2571 Barcelona Spain 2011

[45] D G Lowe ldquoObject recognition from local scale-invariantfeaturesrdquo inProceedings of the 7th IEEE International Conferenceon Computer Vision (ICCV rsquo99) vol 2 pp 1150ndash1157 KerkyraGreece 1999

International Journal of

AerospaceEngineeringHindawiwwwhindawicom Volume 2018

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Active and Passive Electronic Components

VLSI Design

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Shock and Vibration

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawiwwwhindawicom

Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Control Scienceand Engineering

Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

SensorsJournal of

Hindawiwwwhindawicom Volume 2018

International Journal of

RotatingMachinery

Hindawiwwwhindawicom Volume 2018

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Navigation and Observation

International Journal of

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

Submit your manuscripts atwwwhindawicom

Page 3: Pretraining Convolutional Neural Networks for Image-Based ...downloads.hindawi.com/journals/am/2018/3138278.pdf · AdvancesinMultimedia 0.9 0 25 50 75 100 125 150 175 200 Epochs(×10)

Advances in Multimedia 3

Classificationbased on themodel

Classificationresult

Vehicles detection based onYOLOv2

To labelimages

To pre-train model

To train thepre-trainedmodel

Image

Detection basedon YOLOv2

Images Dataset

Figure 1 Flowchart of vehicle classification

billion operations In conclusion YOLOv2 is a state-of-the-art detection system it is better faster and stronger thanothers and applied for object detection tasks in this work

At last unsupervised pretraining initializes the model toa point in the parameter space that somehow renders theoptimization process more effective in the sense of achievinga lower minimum of the empirical loss function [36] Muchrecent research has been devoted to learning algorithms fordeep architectures such as Deep Belief Networks [37 38] andstacks of autoencoder variants [39] After vehicle detectionwe can easily get a lot of unlabeled images of vehicles andoptimize the classification model parameters initialization byunsupervised pretraining

3 Methodology

In this section we will present the details of themethod basedon CNNs for image-based vehicle detection and vehicle clas-sification This section contains three parts vehicle detectionvehicle classification and pretraining approachThe relationsbetween each part and the overall framework of the entireidea is shown in Figure 1

31 Vehicle Detection Our images taken from static camerasin different refueling station contain front views of vehiclesor side views of vehicles at any point The vehicles in imagesare very indeterminacy this makes the vehicle detectionmore difficult in traditional methods based on hand-craftedfeatures

The YOLOv2 model is trained on COCO data sets it candetect 80 common objects in life such as person bicycle carbus train truck boat bird cat etc and therefore we canperform vehicle detection based on YOLOv2 In a picture ofthe vehicle which is waiting for entering the refueling stationshown in Figure 2 YOLOv2 can well detect many objects forexample the security guards drivers the vehicle and queuedvehicles and even vehicles on the side of road Here ourgoal is to pick up the vehicle which is waiting for enteringin picture

truck

truck

person

person

personpersperson

car car car

Figure 2 YOLOv2 detection Top truck Down car

Although trained YOLOv2 can detect the vehicle anddivide it into bicycle car motorbike bus and truck it doesnot meet our classification categories In order to solve thisproblem to fine-tune the YOLOv2 on our data could be asolution but this method needs amount of manual labeleddata and massive computation it is not a preferable methodfor us and then we presented a rule-based method todetect four categories vehicles more accurately First fromthe YOLOv2 detection results we select the objects whichare very similar to our targets such as car bus truck andmotorbike second according to the distance between thevehicle and the camera the closer the vehicle is the biggerthe target is we choose the most similar vehicle in pictureas the target vehicle entering the refueling station for furthervehicle behavior analysis

32 Vehicle Classification According to the function and sizeof vehicles vehicles will be divided into four categories ofmotorcycle transporter passenger and others Motorcycleincludes motorcycle and motor tricycle transporter includestruck and container car passenger includes sedan hatchbackcoupe van SUV and MPV others include vehicles used inagricultural production and infrastructure such as tractorand crane and other types of vehicles Figure 3 shows thesheared samples in four categories in each column As we cansee samples images are very different in shape color sizeand angle of camera even the samples images in the samecategory And Figure 3(b) bottom and Figure 3(c) bottom arenot in the same category but they are very similar especiallyon front face shape and color whichmakes the classificationbetween transporter and passenger more difficult

4 Advances in Multimedia

(a) (b) (c) (d)

Figure 3 Sheared vehicle image examples for four categories (a) Motorcycle (b) transporter (c) passenger (d) others

Table 1 The structure of C4M3F2

Layer TypeActivation SizeStride FiltersConvolutionalReLU 3times31 32Max Pooling 2times22ConvolutionalReLU 3times31 64ConvolutionalReLU 3times31 64Max Pooling 2times22ConvolutionalReLU 3times31 64Max Pooling 2times22Fully ConnectedReLU 1024Fully Connected 1024Softmax 4

To solve this difficult problem we presented a convo-lutional classification model which is effective and requireslittle amount of operations Our model called C4M3F2 has4 convolutional layers 3 max pooling layers and 2 fullyconnected layers

Each convolutional layer contains multiple (32 or 64)3lowast3 kernels and each kernel represents a filter connected tothe outputs of the previous layer Each max pooling layercontains multiple max pooling with 2times2 filters and stride2 it effectively reduces the feature dimension and avoidsoverfitting For fully connected layers each layer contains1024 neurons each neuron makes prediction from its allinput and it is connected to all the neurons in previous layersFor each sheared vehicle image detected from YOLOv2 ithas been resized to 48times48 and then passed into C4M3F2Eventually all the features are passed to softmax layer andwhat we need to do is just minimizing the cross entropy lossbetween softmax outputs and the input labels Table 1 showsthe structure of our model C4M3F2

33 Pretraining Approach With the purpose of achievinga satisfactory classification result we need more labeledimages for training our model but there is a shortage oflabeled images however there are plenty of images collectedeasily and how to use the plenty of unlabeled images foroptimization of our classification model has become themaincontent in this subsection

The motivation of this unsupervised pretraining ap-proach is to optimize the parameters in convolutional kernel

Input Code

Encoder

Reconstruction

Decoder

Error

Figure 4 Autoencoder

parameters Kernel training in C4M3F2 starts from a randominitialization and we hope that the kernel training procedurecan be optimized and accelerated using a good initial valueobtained by unsupervised pretraining In addition pretrain-ing initializes themodel to a point in the parameter space thatsomehow renders the optimization process more effective inthe sense of achieving a lower minimum of the loss function[36] Next we will explain how to greedy layer-wise pretrainthe convolution layers and the fully connected layers in theC4M3F2 model Max pooling layer function is subsamplingit is not included in layer-wise pretraining process

An autoencoder [40] neural network is an unsupervisedlearning algorithm that applies backpropagation setting thetarget values to be equal to the inputs It uses a set ofrecognition weights to convert the input into code and thenuses a set of generative weights to convert the code into anapproximate reconstruction of the input Autoencoder musttry to reconstruct the input which aims to minimize thereconstruction error as shown in Figure 4

According to the aim of autoencoder our approach forunsupervised pretrainingwill be explained In every convolu-tional layer convolution can be regarded as the encoder anddeconvolution [41] is taken as the decoder which is a veryunfortunate name and also called transposed convolutionAn input image is passed into the encoder then the outputcode getting from encoder is passed into the decoder toreconstruct the input image Here Euclidean distance whichmeans the reconstruct error is used to measure the similaritybetween the input image and reconstructed image so the aimof our approach is minimizing the Euclidean paradigm Forpretraining the next layer first we should drop the decoderand freeze the weights in the encoder of previous layers andthen take the output code of previous layer as the input in this

Advances in Multimedia 5

layer and do the same things as in previous layer Next howto use transposed convolution to construct and minimize theloss function in one convolutional layer will be described indetail as follows

The convolution of a feature maps and an image can bedefined as

119910119894119895 = 119908119894 oplus 119909119895 (1)

where oplus denotes the 2D convolution119910119894119895 isin 119877119903119909times119888119909 is theconvolution result and the padding is set to keep the input andoutput dimensions consistent 119908119894 isin 119877119903119908times119888119908 is the 119894th kerneland 119909119895 isin 119877119903119909times119888119909 denotes the 119895th training image

Then transform the convolution based on circulantmatrix in linear system A circulant matrix is a special kind ofToeplitz matrix where each row vector is rotated one elementto the right relative to the preceding row vector An 119899 times 119899circulant matrix C takes the form in

119862 =[[[[[[[[[

1198880 119888119899minus11198881 1198880 sdot sdot sdot 1198882 11988811198883 1198882 d119888119899minus2 119888119899minus3119888119899minus1 119888119899minus2 sdot sdot sdot 1198880 119888119899minus11198881 1198880

]]]]]]]]](2)

Let 119891119895 ℎ119894 isin 119877119903119890times119888119890 be the extension of 119909119895 119908119894 where 119903119890 =119903119909 + 119903119908 minus 1 119888119890 = 119888119909 + 119888119908 minus 1 And the method is as follows (3)and (4) where 119874 is zero matrix

119891119895 = [119909119895 119874119874 119874] (3)

ℎ119894 = [119908119894 119874119874 119874] (4)

Let 119891V119895 isin 119877119903119890119888119890times1 be 119891119895 in vectored form 119903119900119908119886 be a row

of ℎ119894 and 119903119900119908119886 = [1198990 1198991 1198992 sdot sdot sdot 119899119888119890minus1] To build circulant

matrices 11986701198671 1198672 sdot sdot sdot 119867119888119890minus1 by 119903119900119908119886 and by these circu-

lant matrices a block circulant matrix is defined as shown informula (5)

119867 =[[[[[[[[[

1198670 119867119888119890minus11198671 1198670 sdot sdot sdot 1198672 11986711198673 1198672 d

119867119888119890minus2 119867119888

119890minus3119867119888

119890minus1 119867119888

119890minus2

sdot sdot sdot 1198670 119867119888119890minus11198671 1198670

]]]]]]]]](5)

Here we can transform the convolution into (6)

119876 = 119867119891V119895 (6)

119876 is the vector form of result of convolution calculationand then to reshape 119876 into 1198761015840 isin 119877119903119890times119888119890 In this convolutionprocess the padding is dealing with filling 0 but in actualimplementation of our approach we keep the convolutionalinput and output dimensions consistent So we need to prune

the extra values to keep the input and output dimensionsconsistent So we intercept the matrix 1198761015840 then 119910119894119895 = 1198761015840[119903119908 minus1 119903119890 minus 119903119908 + 1 119888119908 minus 1 119888119890 minus 119888119908 + 1]

To simplify the calculation we extract the effective rowsof 119867 according to the effective row indexes which indicatesthe position of the elements of 119910119894119895 in 119876 and denote these rowsby119882119894 isin 119877(119903119890minus2119903119908+2)(119888119890minus2119888119908+2)times2119888119890

Now 119884119894119895 isin 119877(119903119890minus2119903119908+2)(119888119890minus2119888119908+2)times1 is vector form of 119910119894119895 sothe convolution can be rewritten as

119884119894119895 = 119882119894119891V119895 (7)

There are 119869 training vehicle images and 119870 kernels Let119883 = [119891V1 119891V2 119891V

119895 ] and 119882 = [11988211198822 119882119870]The convolution can be calculated as

119884 = W119883 (8)

And the deconvolution can be calculated

1198831015840 = 119882119879119884 (9)

So 1198831015840 is the X reconstruction Then loss function basedon Euclidean paradigm is defined in formula (10) being

loss (119882119883) = 10038171003817100381710038171003817119883 minus 1198831015840100381710038171003817100381710038172 = radic 119869sum119895=1

(119883119895 minus 1198831015840119895)2 (10)

Then we used adam optimizer which is an algorithm forfirst-order gradient-based optimization of stochastic objec-tive functions based on adaptive estimates of lower-ordermoments to solve the minimum optimization problem informula (10)

After the greedy layer-wise unsupervised pretraining weinitiate the parameters in every convolutional layer withthe pretrained values and run the supervised training forclassification according to themethod in previous subsection

4 Experiments and Discussions

We evaluated the presented algorithm on our data andcompared it with other four state-of-the-art methods

41 Datasets and Experiment Environment The vehiclesimages are taken by the static cameras in different refuelstations after being compressed they are sent to serversThe quality of images on servers is lower than that selectedby random and classified into four categories of motorcycletransporter passenger and others by hand We got 498motorcycle images 1109 transporter images 1238 passengerimages and 328 other images Due to the time-consumingand labor-intensive of manual labeling there is a shortage oflabeled images Image augmentation has been used to enrichthe data Keras the excellent high level neural network APIprovides the ImageDataGenerator for image data preparationand augmentation Shear range is set to -02 to 02 zoomrange is set to -02 to 02 rotation range is set to -7 to 7 sizeis set to 256times256 and the points outside the boundaries are

6 Advances in Multimedia

09

0 25 50 75 100 125 150 175 200

Epochs(times10)

OriginalSheared

02

03

04

05

06

Accu

racy 07

08

Figure 5 Comparison of training process between original andsheared

filled according to the nearest mode After configuration andtaking into account the balance of data we fitted it on ourdata and got 1400 samples for every categories on the trainingset and 600 samples for every categories on the testing set toassess our classification model

For vehicle classification the CNNs are under Tensorflowframework the SIFT is under OpenCV (httpsopencvorg)and other feature embedding methods are under scikit-image (httpscikit-imageorg) All the experiments wereconducted on a regular notebook PC (25-GHz 8-core CPU12-G RAM and Ubuntu 64-bit OS)

42 Vehicle Detection Experiment with YOLOv2 For theexperiment using original images the original training setand test set are used for training and testing for the exper-iment using sheared images we used the approach based ontrained YOLOv2 to detect the original training set and testset to get the sheared training set and sheared testing set fortraining and testing

To verify the importance of vehicle detection for vehicleclassification we designed two groups of vehicle classifica-tion experiments one using original images and the otherone using sheared images after vehicle detection then theC4M3F2model is used for vehicle classification experiments

We initialized the C4M3F2 model by truncated nor-mal distribution fitted the model on original training setand sheared training set for 2000 epochs respectively andrecorded the accuracy of our C4M3F2 model on differenttesting set the results are shown in Figure 5 As we expectedthe sheared images of vehicles more accurately representthe characteristics of vehicles while pruning more uselessinformation and facilitating the feature extraction and vehicleclassification As we can see in Figure 5 the accuracy ofC4M3F2 model using sheared data set is much better thanthe one using original data set in the previous training thecharacteristics of vehicles were extracted more accurately sothat the model quickly achieved a better classification resultsand a stable status

Finally the accuracy of C4M3F2 using sheared data set is9142 which is 453 higher than the 8689 of C4M3F2using original data set It can be concluded that the resultsof vehicle classification using sheared data set after vehicledetection based on YOLOv2 can be improved effectively

Table 2 Accuracy and FPS of different methods

Method Accuracy FPSHOG+SVM 6012 4DAISY+SVM 6904 2ORB+BoW+SVM 6407 7SIFT+BoW+SVM 7449 5DeCAF[1] 6620 13CNNs 9142 800

43 Compare Our Approach with Others There are manyother image classification methods To assess our classifi-cation model we compared our approach with other fivemethods

The five methods are based on the image features definedby the scholars in computer image processing Consideringthe comprehensive factors four kinds of image features anda convolutional method are selected they are histograms oforiented gradient (HOG) [42] DAISY [43] oriented FASTand rotated BRIEF (ORB)[44] scale-invariant feature trans-form (SIFT) [45] and DeCAF[1] respectively These methodsare excellent in target object detection in [1 42ndash45] HOGis based on computing and counting the gradient directionhistogram of local regions DAISY is a fast computing localimage feature descriptor for dense feature extraction andit is based on gradient orientation histograms similar tothe SIFT descriptor ORB uses an oriented FAST detectionmethod and the rotated BRIEF descriptors unlike BRIEFORB is comparatively scale and rotation invariant while stillemploying the very efficient Hamming distance metric formatching SIFT is the most widely used algorithm of keypoint detection and description It takes full advantage ofimage local information SIFT feature has a good effect inrotation scale and translation and it is robust to changes inangle of view and illumination these features are beneficialto the effective expression of targets information For HOGand DAISY image features regions are designed featuresare computed and sent into SVM classifier to be classifiedFor ORB and SIFT they do not have acquisition featuresregions and specified number of features we get the imagefeatures based on Bag-of-Words (BoW) model by treatingimage features as words In the first instance all featurespoints of all training build the visual vocabulary in the nextplace a feature vector of occurrence counts of the vocabularyis constructed from an image in the end the feature vectoris sent into SVM classifier to be classified DeCAF uses fiveconvolutional layers and two fully connected layers to extractfeatures and a SVM to classify the image into the right group[1]

Here we performed vehicle classification experiments onsheared data set Table 2 shows the accuracy and FPS of CNNsand other state-of-the-art methods in test other methods arevery slow because they take a lot of time to extract featuresIt can be observed that the results show the effectiveness ofCNNs on vehicle classification problem

From another point of view we demonstrated the clas-sification ability of each method by confusion matrix of the

Advances in Multimedia 7

Motorcycle

Transporter

Passenger

Other

Motorcycle

Transporte

r

Passen

gerOther

Motorcycle

Transporter

Passenger

Other

Motorcycle

Transporte

r

Passen

gerOther

Motorcycle

Transporter

Passenger

Other

Motorcycle

Transporte

r

Passen

gerOther

Motorcycle

Transporter

Passenger

Other

Motorcycle

Transporte

r

Passen

gerOther

Motorcycle

Transporter

Passenger

Other

HOG SIFT

DAISY DeCAF

ORB CNNs

Motorcycle

Transporte

r

Passen

gerOther

Motorcycle

Transporter

Passenger

Other

Motorcycle

Transporte

r

Passen

gerOther

08

07

06

05

04

03

02

01

07

06

05

04

03

02

01

07

06

05

04

03

02

01

08

07

06

05

04

03

02

01

00

08

06

04

02

08

06

04

02

Figure 6 Confusion matrix of classification of different methods

classification process from the five methods in Figure 6 Themain diagonal displays the high recognition accuracy Asshown in Figure 6 the top five comparative methods arebetter in recognition of motorcycle than other categoriesGenerally speaking ORB or SIFT combined with BoW andSVMmethod is a little better than the other two methods Alltaken into account our CNNs method is the best But theperformance of CNNs turns out not so satisfactory in viewthan the confusion of transporter and passenger

Next we will focus on the reason of confusion in CNNsAccording to the precision recall and f1-score of classifica-tion in Table 3 it shows that the identification of motorcycleis very good enough and the identification of transporter andpassenger is relatively poor

As shown in the examples in Figure 7 it can be seen thatthe wrongly recognized transporter and passenger imagesinclude vehicle face information mainly and very little vehiclebody information as far as themain vehicle face is concerned

8 Advances in Multimedia

Figure 7 Examples of wrongly recognized vehicles images Firstrow wrongly recognized as passenger which should be transporterSecond row wrongly recognized as transporter which should bepassenger

09

0 25 50 75 100 125 150 175 200

Epochs(times10)

WithoutPre-trained

02

03

04

05

06

Accu

racy 07

08

Figure 8 Comparison of training process between pretrained andwithout pretraining

Table 3 Classification precision recall and f1-score of CNNs

Type Precision Recall F1-scoreMotorcycle 097 095 096Transporter 087 085 086Passenger 090 091 090Other 093 093 093

these vehicles images are so similar in profile that it is still achallenge to recognize same images in Figure 7 manually

44 PretrainingApproachExperiment Weare eager for betterperformance of our classification model C4M3F2 Hereunsupervised pretraining has been used for optimization ofour classification model 17395 sheared vehicles images areobtained by shearing the unlabeled vehicles images

We pretrained the parameters of every convolutionallayer for 2000 epochs and then supervised training the modelon our sheared training set and tested it on our shearedtesting set the results is shown in Figure 8 In the processof training the conclusion is that the effect is more obviousin the previous epochs and the overall training process isstable relatively Ultimately the accuracy of pretrained CNNsis 9350 which is 208 higher than the 9142 of CNNswithout pretraining

Table 4 Classification precision recall and f1-score of pretrainedCNNs based on sheared dataset

Type Precision Recall F1-scoreMotorcycle 099 099 099Transporter 090 099 099Passenger 091 092 092Other 095 096 095

Table 5 Classification precision recall and f1-score of pretrainedCNNs based on original dataset

Type Precision Recall F1-scoreMotorcycle 099 099 099Transporter 079 082 081Passenger 084 079 081Other 090 094 092

Table 6 The classification accuracy under different strategies

Original ShearedCNNsWithout pre-training 8689 9142CNNs with pre-training 8829 935

By analyzing the classification performance of pretrainedCNNs shown in Table 4 we can draw a conclusion thatits performance is better than the one of CNNs withoutpretraining shown in Table 3 especially for the classificationof transporter and passenger In summary pretrained CNNis more effective in recognizing the vehicles categories and itis a state-of-the-art approach for vehicle classification

In the end to verify the effect of detection in the entiresystem we conducted ablation study by pretraining andtesting our model on the original dataset which has notbeen sheared by YOLOv2 and contains a large quantity ofirrelevant background Ultimately the accuracy of pretrainedCNNs on original dataset is 8829 which is 521 lowerthan the 935 of pretrained CNNs on sheared dataset andeven lower than the 9142 of CNNs without pretrainingon sheared dataset the classification performance is shownin Table 5 And according the classification accuracy inTable 6 we can conclude that this ablation study confirmsthe essentiality of detection virtually in the whole vehicleclassification system

5 Conclusions

A classification method based on CNNs has been detailedin this paper To improve the accuracy we used vehicledetection to removing the unrelated background for facili-tating the feature extraction and vehicle classification Thenan autoencoder-based layer-wise unsupervised pretraining isintroduced to improve the CNNs model by enhancing theclassification performance Several state-of-the-art methodshave been evaluated on our labeled data set containing fourcategories of motorcycle transporter passenger and others

Advances in Multimedia 9

Experimental results have demonstrated that the pretrainedCNNsmethod based on vehicle detection is themost effectivefor vehicle classification

In addition the success of our vehicle classification makesa vehicle color or logo recognition system possible in ourrefueling behavior analysis meanwhile it is a great help tourban computing intelligent transportation system etc

Data Availability

The data used to support the findings of this study areavailable from the corresponding author upon request

Conflicts of Interest

The authors declare that they have no conflicts of interest

Acknowledgments

This research is supported by Youth Innovation PromotionAssociation CAS (2015355) The authors gratefully acknowl-edge the invaluable contribution of Yupeng Ma and themembers of his laboratory during this collaboration

References

[1] DHe C Lang S Feng XDu andC Zhang ldquoVehicle detectionand classification based on convolutional neural networkrdquo inProceedings of the 7th International Conference on InternetMultimedia Computing and Service 2015

[2] J F Forren and D Jaarsma ldquoTraffic monitoring by tire noiserdquoComputer Standards amp Interfaces vol 20 pp 466-467 1999

[3] J George A Cyril B I Koshy and L Mary ldquoExploring SoundSignature for VehicleDetection and Classification Using ANNrdquoInternational Journal on Soft Computing vol 4 no 2 pp 29ndash362013

[4] J George L Mary and K S Riyas ldquoVehicle detection andclassification from acoustic signal using ANN and KNNrdquo inProceedings of the 2013 International Conference on ControlCommunication and Computing (ICCC rsquo13) pp 436ndash439Thiruvananthapuram India 2013

[5] KWang R Wang Y Feng et al ldquoVehicle recognition in acous-tic sensor networks via sparse representationrdquo in Proceedings ofthe 2014 IEEE International Conference onMultimedia and ExpoWorkshops (ICMEW rsquo14) pp 1ndash4 Chengdu China 2014

[6] A Duzdar and G Kompa ldquoApplications using a low-cost base-band pulsed microwave radar sensorrdquo in Proceedings of the 18thIEEE Instrumentation andMeasurement Technology Conference(IMTC rsquo01) vol 1 of Rediscovering Measurement in the Age ofInformatics pp 239ndash243 IEEE Budapest Hungary 2001

[7] H-T Kim and B Song ldquoVehicle recognition based on radarand vision sensor fusion for automatic emergency brakingrdquoin Proceedings of the 13th International Conference on ControlAutomation and Systems (ICCAS rsquo13) pp 1342ndash1346 GwangjuRepublic of Korea 2013

[8] Y Jo and I Jung ldquoAnalysis of vehicle detection with wsn-basedultrasonic sensorsrdquo Sensors vol 14 no 8 pp 14050ndash14069 2014

[9] Y Iwasaki MMisumi and T Nakamiya ldquoRobust vehicle detec-tion under various environments to realize road traffic flow

surveillance using an infrared thermal camerardquo The ScientificWorld Journal vol 2015 Article ID 947272 2015

[10] J Lan Y Xiang L Wang and Y Shi ldquoVehicle detection andclassification by measuring and processing magnetic signalrdquoMeasurement vol 44 no 1 pp 174ndash180 2011

[11] B Li T Zhang and T Xia ldquoVehicle Detection from 3DLidar Using Fully Convolutional Networkrdquo httpsarxivorgabs160807916 2016

[12] V Kastrinaki M Zervakis and K Kalaitzakis ldquoA survey ofvideo processing techniques for traffic applicationsrdquo Image andVision Computing vol 21 no 4 pp 359ndash381 2003

[13] F M Kazemi S Samadi H R Poorreza and M-R Akbar-zadeh-T ldquoVehicle recognition based on fourier wavelet andcurvelet transforms -A comparative studyrdquo inProceedings of the4th International Conference on Information Technology-NewGenerations ITNG rsquo07 pp 939-940 Las Vegas Nev USA 2007

[14] J Y Ng and Y H Tay ldquoImage-based Vehicle ClassificationSystemrdquo httpsarxivorgabs12042114 2012

[15] R A Hadi G Sulong and L E George ldquoVehicle Detectionand Tracking Techniques A Concise Reviewrdquo Signal amp ImageProcessing An International Journal vol 5 no 1 pp 1ndash12 2014

[16] J Arrospide and L Salgado ldquoA study of feature combinationfor vehicle detection based on image processingrdquoThe ScientificWorld Journal vol 2014 Article ID 196251 13 pages 2014

[17] P Piyush R Rajan L Mary and B I Koshy ldquoVehicle detectionand classification using audio-visual cuesrdquo in Proceedings of the3rd International Conference on Signal Processing and IntegratedNetworks (SPIN rsquo16) pp 726ndash730 Noida India 2016

[18] Z Chen T Ellis and S A Velastin ldquoVehicle detection trackingand classification in urban trafficrdquo in Proceedings of the 15thInternational IEEE Conference on Intelligent TransportationSystems ITSC rsquo12 pp 951ndash956 Anchorage Alaska USA 2012

[19] X Wen L Shao Y Xue and W Fang ldquoA rapid learning algo-rithm for vehicle classificationrdquo Information Sciences vol 295pp 395ndash406 2015

[20] P Mishra and B Banerjee ldquoMultiple Kernel based KNNClassifiers for Vehicle Classificationrdquo International Journal ofComputer Applications vol 71 no 6 pp 1ndash7 2013

[21] A Tourani and A Shahbahrami ldquoVehicle counting methodbased on digital image processing algorithmsrdquo in Proceedingsof the 2nd International Conference on Pattern Recognition andImage Analysis IPRIA rsquo15 pp 1ndash6 Rasht Iran 2015

[22] H Wang Y Cai and L Chen ldquoA vehicle detection algorithmbased on deep belief networkrdquoThe Scientific World Journal vol2014 Article ID 647380 7 pages 2014

[23] M Yi F Yang E Blashchet al ldquoVehicleClassification inWAMIImagery using Deep Networkrdquo in Proceedings of the SPIE 9838Sensors and Systems for Space Applicatioins IX 2016

[24] B Li T Zhang and T Xia ldquoVehicle detection from 3Dlidar using fully convolutional networkrdquo in Proceedings of theRobotics Science and Systems 2016

[25] Y Lecun B Boser J S Denker et al ldquoBackpropagation appliedto handwritten zip code recognitionrdquo Neural Computation vol1 no 4 pp 541ndash551 1989

[26] Y Lecun L Bottou Y Bengio and P Haffner ldquoGradient-basedlearning applied to document recognitionrdquo Proceedings of theIEEE vol 86 no 11 pp 2278ndash2323 1998

[27] A Krizhevsky I Sutskever and G E Hinton ldquoImageNetclassification with deep convolutional neural networksrdquo NeuralInformation Processing Systems pp 1097ndash1105 2012

10 Advances in Multimedia

[28] P Sermanet D Eigen X Zhang M Mathieu R Fergus andY Lecun ldquoOverFeat Integrated Recognition Localization andDetection using Convolutional Networksrdquo httpsarxivorgabs13126229 2013

[29] R Girshick J Donahue T Darrell and J Malik ldquoRich fea-ture hierarchies for accurate object detection and semanticsegmentationrdquo in Proceedings of the 27th IEEE Conference onComputer Vision and Pattern Recognition (CVPR rsquo14) pp 580ndash587 Columbus Ohio USA 2014

[30] R Girshick ldquoFast R-CNNrdquo in Proceedings of the 15th IEEEInternational Conference on Computer Vision (ICCV rsquo15) pp1440ndash1448 Santiago Chile 2015

[31] J Redmon S Divvala R Girshick and A Farhadi ldquoYou onlylook once Unified real-time object detectionrdquo in Proceedingsof the 2016 IEEE Conference on Computer Vision and PatternRecognition (CVPR rsquo16) pp 779ndash788 Las Vegas Nev USA2016

[32] S Ren K He R Girshick and J Sun ldquoFaster R-CNN TowardsReal-Time Object Detection with Region Proposal NetworksrdquoIEEE Transactions on Pattern Analysis andMachine Intelligencevol 39 no 6 pp 1137ndash1149 2017

[33] W Liu D Anguelov D Erhan et al ldquoSSD single shot multiboxdetectorrdquo in Proceedings of the Computer Vision ndash ECCV 2016vol 9905 of Lecture Notes in Computer Science pp 21ndash37 2016

[34] J Dai Y Li K He and J Sun ldquoR-FCN Object Detectionvia Region-based Fully Convolutional Networksrdquohttpsarxivorgabs160506409 2016

[35] J Redmon andA Farhadi ldquoYOLO9000 Better faster strongerrdquoin Proceedings of the 30th IEEE Conference on Computer Visionand Pattern Recognition (CVPR rsquo17) pp 6517ndash6525 HonoluluHawaii USA 2017

[36] D Erhan Y Bengio A Courville P-A Manzagol P Vincentand S Bengio ldquoWhy does unsupervised pre-training help deeplearningrdquo Journal of Machine Learning Research vol 11 pp625ndash660 2010

[37] G E Hinton S Osindero and Y Teh ldquoA fast learning algorithmfor deep belief netsrdquoNeural Computation vol 18 no 7 pp 1527ndash1554 2006

[38] Y Bengio P Lamblin D Popovici and H Larochelle ldquoGreedylayer-wise training of deep networksrdquo Neural Information Pro-cessing Systems pp 153ndash160 2007

[39] J Masci U Meier D Ciresan and J Schmidhuber ldquoStackedConvolutional Auto-Encoders for Hierarchical Feature Extrac-tionrdquo in Proceedings of the Artificial Neural Networks andMachine Learning - ICANN 2011 vol 6791 of Lecture Notesin Computer Science pp 52ndash59 Springer Berlin HeidelbergGermany 2011

[40] G E Hinton and R S Zemel ldquoAutoencoders minimum de-scription length and helmholtz free energyrdquoNeural InformationProcessing Systems pp 3ndash10 1994

[41] M D Zeiler D Krishnan GW Taylor and R Fergus ldquoDecon-volutional networksrdquo in Proceedings of the IEEE ComputerSociety Conference on Computer Vision and Pattern Recognition(CVPRrsquo 10) pp 2528ndash2535 San Francisco Calif USA 2010

[42] N Dalal and B Triggs ldquoHistograms of oriented gradients forhuman detectionrdquo in Proceedings of the IEEE Computer SocietyConference on Computer Vision and Pattern Recognition (CVPRrsquo05) vol 1 pp 886ndash893 2005

[43] E Tola V Lepetit and P Fua ldquoDAISY an efficient densedescriptor applied to wide-baseline stereordquo IEEE Transactionson Pattern Analysis and Machine Intelligence vol 32 no 5 pp815ndash830 2010

[44] E Rublee V Rabaud K Konolige and G Bradski ldquoORB anefficient alternative to SIFT or SURFrdquo in Proceedings of the IEEEInternational Conference on Computer Vision (ICCV rsquo11) pp2564ndash2571 Barcelona Spain 2011

[45] D G Lowe ldquoObject recognition from local scale-invariantfeaturesrdquo inProceedings of the 7th IEEE International Conferenceon Computer Vision (ICCV rsquo99) vol 2 pp 1150ndash1157 KerkyraGreece 1999

International Journal of

AerospaceEngineeringHindawiwwwhindawicom Volume 2018

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Active and Passive Electronic Components

VLSI Design

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Shock and Vibration

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawiwwwhindawicom

Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Control Scienceand Engineering

Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

SensorsJournal of

Hindawiwwwhindawicom Volume 2018

International Journal of

RotatingMachinery

Hindawiwwwhindawicom Volume 2018

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Navigation and Observation

International Journal of

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

Submit your manuscripts atwwwhindawicom

Page 4: Pretraining Convolutional Neural Networks for Image-Based ...downloads.hindawi.com/journals/am/2018/3138278.pdf · AdvancesinMultimedia 0.9 0 25 50 75 100 125 150 175 200 Epochs(×10)

4 Advances in Multimedia

(a) (b) (c) (d)

Figure 3 Sheared vehicle image examples for four categories (a) Motorcycle (b) transporter (c) passenger (d) others

Table 1 The structure of C4M3F2

Layer TypeActivation SizeStride FiltersConvolutionalReLU 3times31 32Max Pooling 2times22ConvolutionalReLU 3times31 64ConvolutionalReLU 3times31 64Max Pooling 2times22ConvolutionalReLU 3times31 64Max Pooling 2times22Fully ConnectedReLU 1024Fully Connected 1024Softmax 4

To solve this difficult problem we presented a convo-lutional classification model which is effective and requireslittle amount of operations Our model called C4M3F2 has4 convolutional layers 3 max pooling layers and 2 fullyconnected layers

Each convolutional layer contains multiple (32 or 64)3lowast3 kernels and each kernel represents a filter connected tothe outputs of the previous layer Each max pooling layercontains multiple max pooling with 2times2 filters and stride2 it effectively reduces the feature dimension and avoidsoverfitting For fully connected layers each layer contains1024 neurons each neuron makes prediction from its allinput and it is connected to all the neurons in previous layersFor each sheared vehicle image detected from YOLOv2 ithas been resized to 48times48 and then passed into C4M3F2Eventually all the features are passed to softmax layer andwhat we need to do is just minimizing the cross entropy lossbetween softmax outputs and the input labels Table 1 showsthe structure of our model C4M3F2

33 Pretraining Approach With the purpose of achievinga satisfactory classification result we need more labeledimages for training our model but there is a shortage oflabeled images however there are plenty of images collectedeasily and how to use the plenty of unlabeled images foroptimization of our classification model has become themaincontent in this subsection

The motivation of this unsupervised pretraining ap-proach is to optimize the parameters in convolutional kernel

Input Code

Encoder

Reconstruction

Decoder

Error

Figure 4 Autoencoder

parameters Kernel training in C4M3F2 starts from a randominitialization and we hope that the kernel training procedurecan be optimized and accelerated using a good initial valueobtained by unsupervised pretraining In addition pretrain-ing initializes themodel to a point in the parameter space thatsomehow renders the optimization process more effective inthe sense of achieving a lower minimum of the loss function[36] Next we will explain how to greedy layer-wise pretrainthe convolution layers and the fully connected layers in theC4M3F2 model Max pooling layer function is subsamplingit is not included in layer-wise pretraining process

An autoencoder [40] neural network is an unsupervisedlearning algorithm that applies backpropagation setting thetarget values to be equal to the inputs It uses a set ofrecognition weights to convert the input into code and thenuses a set of generative weights to convert the code into anapproximate reconstruction of the input Autoencoder musttry to reconstruct the input which aims to minimize thereconstruction error as shown in Figure 4

According to the aim of autoencoder our approach forunsupervised pretrainingwill be explained In every convolu-tional layer convolution can be regarded as the encoder anddeconvolution [41] is taken as the decoder which is a veryunfortunate name and also called transposed convolutionAn input image is passed into the encoder then the outputcode getting from encoder is passed into the decoder toreconstruct the input image Here Euclidean distance whichmeans the reconstruct error is used to measure the similaritybetween the input image and reconstructed image so the aimof our approach is minimizing the Euclidean paradigm Forpretraining the next layer first we should drop the decoderand freeze the weights in the encoder of previous layers andthen take the output code of previous layer as the input in this

Advances in Multimedia 5

layer and do the same things as in previous layer Next howto use transposed convolution to construct and minimize theloss function in one convolutional layer will be described indetail as follows

The convolution of a feature maps and an image can bedefined as

119910119894119895 = 119908119894 oplus 119909119895 (1)

where oplus denotes the 2D convolution119910119894119895 isin 119877119903119909times119888119909 is theconvolution result and the padding is set to keep the input andoutput dimensions consistent 119908119894 isin 119877119903119908times119888119908 is the 119894th kerneland 119909119895 isin 119877119903119909times119888119909 denotes the 119895th training image

Then transform the convolution based on circulantmatrix in linear system A circulant matrix is a special kind ofToeplitz matrix where each row vector is rotated one elementto the right relative to the preceding row vector An 119899 times 119899circulant matrix C takes the form in

119862 =[[[[[[[[[

1198880 119888119899minus11198881 1198880 sdot sdot sdot 1198882 11988811198883 1198882 d119888119899minus2 119888119899minus3119888119899minus1 119888119899minus2 sdot sdot sdot 1198880 119888119899minus11198881 1198880

]]]]]]]]](2)

Let 119891119895 ℎ119894 isin 119877119903119890times119888119890 be the extension of 119909119895 119908119894 where 119903119890 =119903119909 + 119903119908 minus 1 119888119890 = 119888119909 + 119888119908 minus 1 And the method is as follows (3)and (4) where 119874 is zero matrix

119891119895 = [119909119895 119874119874 119874] (3)

ℎ119894 = [119908119894 119874119874 119874] (4)

Let 119891V119895 isin 119877119903119890119888119890times1 be 119891119895 in vectored form 119903119900119908119886 be a row

of ℎ119894 and 119903119900119908119886 = [1198990 1198991 1198992 sdot sdot sdot 119899119888119890minus1] To build circulant

matrices 11986701198671 1198672 sdot sdot sdot 119867119888119890minus1 by 119903119900119908119886 and by these circu-

lant matrices a block circulant matrix is defined as shown informula (5)

119867 =[[[[[[[[[

1198670 119867119888119890minus11198671 1198670 sdot sdot sdot 1198672 11986711198673 1198672 d

119867119888119890minus2 119867119888

119890minus3119867119888

119890minus1 119867119888

119890minus2

sdot sdot sdot 1198670 119867119888119890minus11198671 1198670

]]]]]]]]](5)

Here we can transform the convolution into (6)

119876 = 119867119891V119895 (6)

119876 is the vector form of result of convolution calculationand then to reshape 119876 into 1198761015840 isin 119877119903119890times119888119890 In this convolutionprocess the padding is dealing with filling 0 but in actualimplementation of our approach we keep the convolutionalinput and output dimensions consistent So we need to prune

the extra values to keep the input and output dimensionsconsistent So we intercept the matrix 1198761015840 then 119910119894119895 = 1198761015840[119903119908 minus1 119903119890 minus 119903119908 + 1 119888119908 minus 1 119888119890 minus 119888119908 + 1]

To simplify the calculation we extract the effective rowsof 119867 according to the effective row indexes which indicatesthe position of the elements of 119910119894119895 in 119876 and denote these rowsby119882119894 isin 119877(119903119890minus2119903119908+2)(119888119890minus2119888119908+2)times2119888119890

Now 119884119894119895 isin 119877(119903119890minus2119903119908+2)(119888119890minus2119888119908+2)times1 is vector form of 119910119894119895 sothe convolution can be rewritten as

119884119894119895 = 119882119894119891V119895 (7)

There are 119869 training vehicle images and 119870 kernels Let119883 = [119891V1 119891V2 119891V

119895 ] and 119882 = [11988211198822 119882119870]The convolution can be calculated as

119884 = W119883 (8)

And the deconvolution can be calculated

1198831015840 = 119882119879119884 (9)

So 1198831015840 is the X reconstruction Then loss function basedon Euclidean paradigm is defined in formula (10) being

loss (119882119883) = 10038171003817100381710038171003817119883 minus 1198831015840100381710038171003817100381710038172 = radic 119869sum119895=1

(119883119895 minus 1198831015840119895)2 (10)

Then we used adam optimizer which is an algorithm forfirst-order gradient-based optimization of stochastic objec-tive functions based on adaptive estimates of lower-ordermoments to solve the minimum optimization problem informula (10)

After the greedy layer-wise unsupervised pretraining weinitiate the parameters in every convolutional layer withthe pretrained values and run the supervised training forclassification according to themethod in previous subsection

4 Experiments and Discussions

We evaluated the presented algorithm on our data andcompared it with other four state-of-the-art methods

41 Datasets and Experiment Environment The vehiclesimages are taken by the static cameras in different refuelstations after being compressed they are sent to serversThe quality of images on servers is lower than that selectedby random and classified into four categories of motorcycletransporter passenger and others by hand We got 498motorcycle images 1109 transporter images 1238 passengerimages and 328 other images Due to the time-consumingand labor-intensive of manual labeling there is a shortage oflabeled images Image augmentation has been used to enrichthe data Keras the excellent high level neural network APIprovides the ImageDataGenerator for image data preparationand augmentation Shear range is set to -02 to 02 zoomrange is set to -02 to 02 rotation range is set to -7 to 7 sizeis set to 256times256 and the points outside the boundaries are

6 Advances in Multimedia

09

0 25 50 75 100 125 150 175 200

Epochs(times10)

OriginalSheared

02

03

04

05

06

Accu

racy 07

08

Figure 5 Comparison of training process between original andsheared

filled according to the nearest mode After configuration andtaking into account the balance of data we fitted it on ourdata and got 1400 samples for every categories on the trainingset and 600 samples for every categories on the testing set toassess our classification model

For vehicle classification the CNNs are under Tensorflowframework the SIFT is under OpenCV (httpsopencvorg)and other feature embedding methods are under scikit-image (httpscikit-imageorg) All the experiments wereconducted on a regular notebook PC (25-GHz 8-core CPU12-G RAM and Ubuntu 64-bit OS)

42 Vehicle Detection Experiment with YOLOv2 For theexperiment using original images the original training setand test set are used for training and testing for the exper-iment using sheared images we used the approach based ontrained YOLOv2 to detect the original training set and testset to get the sheared training set and sheared testing set fortraining and testing

To verify the importance of vehicle detection for vehicleclassification we designed two groups of vehicle classifica-tion experiments one using original images and the otherone using sheared images after vehicle detection then theC4M3F2model is used for vehicle classification experiments

We initialized the C4M3F2 model by truncated nor-mal distribution fitted the model on original training setand sheared training set for 2000 epochs respectively andrecorded the accuracy of our C4M3F2 model on differenttesting set the results are shown in Figure 5 As we expectedthe sheared images of vehicles more accurately representthe characteristics of vehicles while pruning more uselessinformation and facilitating the feature extraction and vehicleclassification As we can see in Figure 5 the accuracy ofC4M3F2 model using sheared data set is much better thanthe one using original data set in the previous training thecharacteristics of vehicles were extracted more accurately sothat the model quickly achieved a better classification resultsand a stable status

Finally the accuracy of C4M3F2 using sheared data set is9142 which is 453 higher than the 8689 of C4M3F2using original data set It can be concluded that the resultsof vehicle classification using sheared data set after vehicledetection based on YOLOv2 can be improved effectively

Table 2 Accuracy and FPS of different methods

Method Accuracy FPSHOG+SVM 6012 4DAISY+SVM 6904 2ORB+BoW+SVM 6407 7SIFT+BoW+SVM 7449 5DeCAF[1] 6620 13CNNs 9142 800

43 Compare Our Approach with Others There are manyother image classification methods To assess our classifi-cation model we compared our approach with other fivemethods

The five methods are based on the image features definedby the scholars in computer image processing Consideringthe comprehensive factors four kinds of image features anda convolutional method are selected they are histograms oforiented gradient (HOG) [42] DAISY [43] oriented FASTand rotated BRIEF (ORB)[44] scale-invariant feature trans-form (SIFT) [45] and DeCAF[1] respectively These methodsare excellent in target object detection in [1 42ndash45] HOGis based on computing and counting the gradient directionhistogram of local regions DAISY is a fast computing localimage feature descriptor for dense feature extraction andit is based on gradient orientation histograms similar tothe SIFT descriptor ORB uses an oriented FAST detectionmethod and the rotated BRIEF descriptors unlike BRIEFORB is comparatively scale and rotation invariant while stillemploying the very efficient Hamming distance metric formatching SIFT is the most widely used algorithm of keypoint detection and description It takes full advantage ofimage local information SIFT feature has a good effect inrotation scale and translation and it is robust to changes inangle of view and illumination these features are beneficialto the effective expression of targets information For HOGand DAISY image features regions are designed featuresare computed and sent into SVM classifier to be classifiedFor ORB and SIFT they do not have acquisition featuresregions and specified number of features we get the imagefeatures based on Bag-of-Words (BoW) model by treatingimage features as words In the first instance all featurespoints of all training build the visual vocabulary in the nextplace a feature vector of occurrence counts of the vocabularyis constructed from an image in the end the feature vectoris sent into SVM classifier to be classified DeCAF uses fiveconvolutional layers and two fully connected layers to extractfeatures and a SVM to classify the image into the right group[1]

Here we performed vehicle classification experiments onsheared data set Table 2 shows the accuracy and FPS of CNNsand other state-of-the-art methods in test other methods arevery slow because they take a lot of time to extract featuresIt can be observed that the results show the effectiveness ofCNNs on vehicle classification problem

From another point of view we demonstrated the clas-sification ability of each method by confusion matrix of the

Advances in Multimedia 7

Motorcycle

Transporter

Passenger

Other

Motorcycle

Transporte

r

Passen

gerOther

Motorcycle

Transporter

Passenger

Other

Motorcycle

Transporte

r

Passen

gerOther

Motorcycle

Transporter

Passenger

Other

Motorcycle

Transporte

r

Passen

gerOther

Motorcycle

Transporter

Passenger

Other

Motorcycle

Transporte

r

Passen

gerOther

Motorcycle

Transporter

Passenger

Other

HOG SIFT

DAISY DeCAF

ORB CNNs

Motorcycle

Transporte

r

Passen

gerOther

Motorcycle

Transporter

Passenger

Other

Motorcycle

Transporte

r

Passen

gerOther

08

07

06

05

04

03

02

01

07

06

05

04

03

02

01

07

06

05

04

03

02

01

08

07

06

05

04

03

02

01

00

08

06

04

02

08

06

04

02

Figure 6 Confusion matrix of classification of different methods

classification process from the five methods in Figure 6 Themain diagonal displays the high recognition accuracy Asshown in Figure 6 the top five comparative methods arebetter in recognition of motorcycle than other categoriesGenerally speaking ORB or SIFT combined with BoW andSVMmethod is a little better than the other two methods Alltaken into account our CNNs method is the best But theperformance of CNNs turns out not so satisfactory in viewthan the confusion of transporter and passenger

Next we will focus on the reason of confusion in CNNsAccording to the precision recall and f1-score of classifica-tion in Table 3 it shows that the identification of motorcycleis very good enough and the identification of transporter andpassenger is relatively poor

As shown in the examples in Figure 7 it can be seen thatthe wrongly recognized transporter and passenger imagesinclude vehicle face information mainly and very little vehiclebody information as far as themain vehicle face is concerned

8 Advances in Multimedia

Figure 7 Examples of wrongly recognized vehicles images Firstrow wrongly recognized as passenger which should be transporterSecond row wrongly recognized as transporter which should bepassenger

09

0 25 50 75 100 125 150 175 200

Epochs(times10)

WithoutPre-trained

02

03

04

05

06

Accu

racy 07

08

Figure 8 Comparison of training process between pretrained andwithout pretraining

Table 3 Classification precision recall and f1-score of CNNs

Type Precision Recall F1-scoreMotorcycle 097 095 096Transporter 087 085 086Passenger 090 091 090Other 093 093 093

these vehicles images are so similar in profile that it is still achallenge to recognize same images in Figure 7 manually

44 PretrainingApproachExperiment Weare eager for betterperformance of our classification model C4M3F2 Hereunsupervised pretraining has been used for optimization ofour classification model 17395 sheared vehicles images areobtained by shearing the unlabeled vehicles images

We pretrained the parameters of every convolutionallayer for 2000 epochs and then supervised training the modelon our sheared training set and tested it on our shearedtesting set the results is shown in Figure 8 In the processof training the conclusion is that the effect is more obviousin the previous epochs and the overall training process isstable relatively Ultimately the accuracy of pretrained CNNsis 9350 which is 208 higher than the 9142 of CNNswithout pretraining

Table 4 Classification precision recall and f1-score of pretrainedCNNs based on sheared dataset

Type Precision Recall F1-scoreMotorcycle 099 099 099Transporter 090 099 099Passenger 091 092 092Other 095 096 095

Table 5 Classification precision recall and f1-score of pretrainedCNNs based on original dataset

Type Precision Recall F1-scoreMotorcycle 099 099 099Transporter 079 082 081Passenger 084 079 081Other 090 094 092

Table 6 The classification accuracy under different strategies

Original ShearedCNNsWithout pre-training 8689 9142CNNs with pre-training 8829 935

By analyzing the classification performance of pretrainedCNNs shown in Table 4 we can draw a conclusion thatits performance is better than the one of CNNs withoutpretraining shown in Table 3 especially for the classificationof transporter and passenger In summary pretrained CNNis more effective in recognizing the vehicles categories and itis a state-of-the-art approach for vehicle classification

In the end to verify the effect of detection in the entiresystem we conducted ablation study by pretraining andtesting our model on the original dataset which has notbeen sheared by YOLOv2 and contains a large quantity ofirrelevant background Ultimately the accuracy of pretrainedCNNs on original dataset is 8829 which is 521 lowerthan the 935 of pretrained CNNs on sheared dataset andeven lower than the 9142 of CNNs without pretrainingon sheared dataset the classification performance is shownin Table 5 And according the classification accuracy inTable 6 we can conclude that this ablation study confirmsthe essentiality of detection virtually in the whole vehicleclassification system

5 Conclusions

A classification method based on CNNs has been detailedin this paper To improve the accuracy we used vehicledetection to removing the unrelated background for facili-tating the feature extraction and vehicle classification Thenan autoencoder-based layer-wise unsupervised pretraining isintroduced to improve the CNNs model by enhancing theclassification performance Several state-of-the-art methodshave been evaluated on our labeled data set containing fourcategories of motorcycle transporter passenger and others

Advances in Multimedia 9

Experimental results have demonstrated that the pretrainedCNNsmethod based on vehicle detection is themost effectivefor vehicle classification

In addition the success of our vehicle classification makesa vehicle color or logo recognition system possible in ourrefueling behavior analysis meanwhile it is a great help tourban computing intelligent transportation system etc

Data Availability

The data used to support the findings of this study areavailable from the corresponding author upon request

Conflicts of Interest

The authors declare that they have no conflicts of interest

Acknowledgments

This research is supported by Youth Innovation PromotionAssociation CAS (2015355) The authors gratefully acknowl-edge the invaluable contribution of Yupeng Ma and themembers of his laboratory during this collaboration

References

[1] DHe C Lang S Feng XDu andC Zhang ldquoVehicle detectionand classification based on convolutional neural networkrdquo inProceedings of the 7th International Conference on InternetMultimedia Computing and Service 2015

[2] J F Forren and D Jaarsma ldquoTraffic monitoring by tire noiserdquoComputer Standards amp Interfaces vol 20 pp 466-467 1999

[3] J George A Cyril B I Koshy and L Mary ldquoExploring SoundSignature for VehicleDetection and Classification Using ANNrdquoInternational Journal on Soft Computing vol 4 no 2 pp 29ndash362013

[4] J George L Mary and K S Riyas ldquoVehicle detection andclassification from acoustic signal using ANN and KNNrdquo inProceedings of the 2013 International Conference on ControlCommunication and Computing (ICCC rsquo13) pp 436ndash439Thiruvananthapuram India 2013

[5] KWang R Wang Y Feng et al ldquoVehicle recognition in acous-tic sensor networks via sparse representationrdquo in Proceedings ofthe 2014 IEEE International Conference onMultimedia and ExpoWorkshops (ICMEW rsquo14) pp 1ndash4 Chengdu China 2014

[6] A Duzdar and G Kompa ldquoApplications using a low-cost base-band pulsed microwave radar sensorrdquo in Proceedings of the 18thIEEE Instrumentation andMeasurement Technology Conference(IMTC rsquo01) vol 1 of Rediscovering Measurement in the Age ofInformatics pp 239ndash243 IEEE Budapest Hungary 2001

[7] H-T Kim and B Song ldquoVehicle recognition based on radarand vision sensor fusion for automatic emergency brakingrdquoin Proceedings of the 13th International Conference on ControlAutomation and Systems (ICCAS rsquo13) pp 1342ndash1346 GwangjuRepublic of Korea 2013

[8] Y Jo and I Jung ldquoAnalysis of vehicle detection with wsn-basedultrasonic sensorsrdquo Sensors vol 14 no 8 pp 14050ndash14069 2014

[9] Y Iwasaki MMisumi and T Nakamiya ldquoRobust vehicle detec-tion under various environments to realize road traffic flow

surveillance using an infrared thermal camerardquo The ScientificWorld Journal vol 2015 Article ID 947272 2015

[10] J Lan Y Xiang L Wang and Y Shi ldquoVehicle detection andclassification by measuring and processing magnetic signalrdquoMeasurement vol 44 no 1 pp 174ndash180 2011

[11] B Li T Zhang and T Xia ldquoVehicle Detection from 3DLidar Using Fully Convolutional Networkrdquo httpsarxivorgabs160807916 2016

[12] V Kastrinaki M Zervakis and K Kalaitzakis ldquoA survey ofvideo processing techniques for traffic applicationsrdquo Image andVision Computing vol 21 no 4 pp 359ndash381 2003

[13] F M Kazemi S Samadi H R Poorreza and M-R Akbar-zadeh-T ldquoVehicle recognition based on fourier wavelet andcurvelet transforms -A comparative studyrdquo inProceedings of the4th International Conference on Information Technology-NewGenerations ITNG rsquo07 pp 939-940 Las Vegas Nev USA 2007

[14] J Y Ng and Y H Tay ldquoImage-based Vehicle ClassificationSystemrdquo httpsarxivorgabs12042114 2012

[15] R A Hadi G Sulong and L E George ldquoVehicle Detectionand Tracking Techniques A Concise Reviewrdquo Signal amp ImageProcessing An International Journal vol 5 no 1 pp 1ndash12 2014

[16] J Arrospide and L Salgado ldquoA study of feature combinationfor vehicle detection based on image processingrdquoThe ScientificWorld Journal vol 2014 Article ID 196251 13 pages 2014

[17] P Piyush R Rajan L Mary and B I Koshy ldquoVehicle detectionand classification using audio-visual cuesrdquo in Proceedings of the3rd International Conference on Signal Processing and IntegratedNetworks (SPIN rsquo16) pp 726ndash730 Noida India 2016

[18] Z Chen T Ellis and S A Velastin ldquoVehicle detection trackingand classification in urban trafficrdquo in Proceedings of the 15thInternational IEEE Conference on Intelligent TransportationSystems ITSC rsquo12 pp 951ndash956 Anchorage Alaska USA 2012

[19] X Wen L Shao Y Xue and W Fang ldquoA rapid learning algo-rithm for vehicle classificationrdquo Information Sciences vol 295pp 395ndash406 2015

[20] P Mishra and B Banerjee ldquoMultiple Kernel based KNNClassifiers for Vehicle Classificationrdquo International Journal ofComputer Applications vol 71 no 6 pp 1ndash7 2013

[21] A Tourani and A Shahbahrami ldquoVehicle counting methodbased on digital image processing algorithmsrdquo in Proceedingsof the 2nd International Conference on Pattern Recognition andImage Analysis IPRIA rsquo15 pp 1ndash6 Rasht Iran 2015

[22] H Wang Y Cai and L Chen ldquoA vehicle detection algorithmbased on deep belief networkrdquoThe Scientific World Journal vol2014 Article ID 647380 7 pages 2014

[23] M Yi F Yang E Blashchet al ldquoVehicleClassification inWAMIImagery using Deep Networkrdquo in Proceedings of the SPIE 9838Sensors and Systems for Space Applicatioins IX 2016

[24] B Li T Zhang and T Xia ldquoVehicle detection from 3Dlidar using fully convolutional networkrdquo in Proceedings of theRobotics Science and Systems 2016

[25] Y Lecun B Boser J S Denker et al ldquoBackpropagation appliedto handwritten zip code recognitionrdquo Neural Computation vol1 no 4 pp 541ndash551 1989

[26] Y Lecun L Bottou Y Bengio and P Haffner ldquoGradient-basedlearning applied to document recognitionrdquo Proceedings of theIEEE vol 86 no 11 pp 2278ndash2323 1998

[27] A Krizhevsky I Sutskever and G E Hinton ldquoImageNetclassification with deep convolutional neural networksrdquo NeuralInformation Processing Systems pp 1097ndash1105 2012

10 Advances in Multimedia

[28] P Sermanet D Eigen X Zhang M Mathieu R Fergus andY Lecun ldquoOverFeat Integrated Recognition Localization andDetection using Convolutional Networksrdquo httpsarxivorgabs13126229 2013

[29] R Girshick J Donahue T Darrell and J Malik ldquoRich fea-ture hierarchies for accurate object detection and semanticsegmentationrdquo in Proceedings of the 27th IEEE Conference onComputer Vision and Pattern Recognition (CVPR rsquo14) pp 580ndash587 Columbus Ohio USA 2014

[30] R Girshick ldquoFast R-CNNrdquo in Proceedings of the 15th IEEEInternational Conference on Computer Vision (ICCV rsquo15) pp1440ndash1448 Santiago Chile 2015

[31] J Redmon S Divvala R Girshick and A Farhadi ldquoYou onlylook once Unified real-time object detectionrdquo in Proceedingsof the 2016 IEEE Conference on Computer Vision and PatternRecognition (CVPR rsquo16) pp 779ndash788 Las Vegas Nev USA2016

[32] S Ren K He R Girshick and J Sun ldquoFaster R-CNN TowardsReal-Time Object Detection with Region Proposal NetworksrdquoIEEE Transactions on Pattern Analysis andMachine Intelligencevol 39 no 6 pp 1137ndash1149 2017

[33] W Liu D Anguelov D Erhan et al ldquoSSD single shot multiboxdetectorrdquo in Proceedings of the Computer Vision ndash ECCV 2016vol 9905 of Lecture Notes in Computer Science pp 21ndash37 2016

[34] J Dai Y Li K He and J Sun ldquoR-FCN Object Detectionvia Region-based Fully Convolutional Networksrdquohttpsarxivorgabs160506409 2016

[35] J Redmon andA Farhadi ldquoYOLO9000 Better faster strongerrdquoin Proceedings of the 30th IEEE Conference on Computer Visionand Pattern Recognition (CVPR rsquo17) pp 6517ndash6525 HonoluluHawaii USA 2017

[36] D Erhan Y Bengio A Courville P-A Manzagol P Vincentand S Bengio ldquoWhy does unsupervised pre-training help deeplearningrdquo Journal of Machine Learning Research vol 11 pp625ndash660 2010

[37] G E Hinton S Osindero and Y Teh ldquoA fast learning algorithmfor deep belief netsrdquoNeural Computation vol 18 no 7 pp 1527ndash1554 2006

[38] Y Bengio P Lamblin D Popovici and H Larochelle ldquoGreedylayer-wise training of deep networksrdquo Neural Information Pro-cessing Systems pp 153ndash160 2007

[39] J Masci U Meier D Ciresan and J Schmidhuber ldquoStackedConvolutional Auto-Encoders for Hierarchical Feature Extrac-tionrdquo in Proceedings of the Artificial Neural Networks andMachine Learning - ICANN 2011 vol 6791 of Lecture Notesin Computer Science pp 52ndash59 Springer Berlin HeidelbergGermany 2011

[40] G E Hinton and R S Zemel ldquoAutoencoders minimum de-scription length and helmholtz free energyrdquoNeural InformationProcessing Systems pp 3ndash10 1994

[41] M D Zeiler D Krishnan GW Taylor and R Fergus ldquoDecon-volutional networksrdquo in Proceedings of the IEEE ComputerSociety Conference on Computer Vision and Pattern Recognition(CVPRrsquo 10) pp 2528ndash2535 San Francisco Calif USA 2010

[42] N Dalal and B Triggs ldquoHistograms of oriented gradients forhuman detectionrdquo in Proceedings of the IEEE Computer SocietyConference on Computer Vision and Pattern Recognition (CVPRrsquo05) vol 1 pp 886ndash893 2005

[43] E Tola V Lepetit and P Fua ldquoDAISY an efficient densedescriptor applied to wide-baseline stereordquo IEEE Transactionson Pattern Analysis and Machine Intelligence vol 32 no 5 pp815ndash830 2010

[44] E Rublee V Rabaud K Konolige and G Bradski ldquoORB anefficient alternative to SIFT or SURFrdquo in Proceedings of the IEEEInternational Conference on Computer Vision (ICCV rsquo11) pp2564ndash2571 Barcelona Spain 2011

[45] D G Lowe ldquoObject recognition from local scale-invariantfeaturesrdquo inProceedings of the 7th IEEE International Conferenceon Computer Vision (ICCV rsquo99) vol 2 pp 1150ndash1157 KerkyraGreece 1999

International Journal of

AerospaceEngineeringHindawiwwwhindawicom Volume 2018

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Active and Passive Electronic Components

VLSI Design

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Shock and Vibration

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawiwwwhindawicom

Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Control Scienceand Engineering

Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

SensorsJournal of

Hindawiwwwhindawicom Volume 2018

International Journal of

RotatingMachinery

Hindawiwwwhindawicom Volume 2018

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Navigation and Observation

International Journal of

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

Submit your manuscripts atwwwhindawicom

Page 5: Pretraining Convolutional Neural Networks for Image-Based ...downloads.hindawi.com/journals/am/2018/3138278.pdf · AdvancesinMultimedia 0.9 0 25 50 75 100 125 150 175 200 Epochs(×10)

Advances in Multimedia 5

layer and do the same things as in previous layer Next howto use transposed convolution to construct and minimize theloss function in one convolutional layer will be described indetail as follows

The convolution of a feature maps and an image can bedefined as

119910119894119895 = 119908119894 oplus 119909119895 (1)

where oplus denotes the 2D convolution119910119894119895 isin 119877119903119909times119888119909 is theconvolution result and the padding is set to keep the input andoutput dimensions consistent 119908119894 isin 119877119903119908times119888119908 is the 119894th kerneland 119909119895 isin 119877119903119909times119888119909 denotes the 119895th training image

Then transform the convolution based on circulantmatrix in linear system A circulant matrix is a special kind ofToeplitz matrix where each row vector is rotated one elementto the right relative to the preceding row vector An 119899 times 119899circulant matrix C takes the form in

119862 =[[[[[[[[[

1198880 119888119899minus11198881 1198880 sdot sdot sdot 1198882 11988811198883 1198882 d119888119899minus2 119888119899minus3119888119899minus1 119888119899minus2 sdot sdot sdot 1198880 119888119899minus11198881 1198880

]]]]]]]]](2)

Let 119891119895 ℎ119894 isin 119877119903119890times119888119890 be the extension of 119909119895 119908119894 where 119903119890 =119903119909 + 119903119908 minus 1 119888119890 = 119888119909 + 119888119908 minus 1 And the method is as follows (3)and (4) where 119874 is zero matrix

119891119895 = [119909119895 119874119874 119874] (3)

ℎ119894 = [119908119894 119874119874 119874] (4)

Let 119891V119895 isin 119877119903119890119888119890times1 be 119891119895 in vectored form 119903119900119908119886 be a row

of ℎ119894 and 119903119900119908119886 = [1198990 1198991 1198992 sdot sdot sdot 119899119888119890minus1] To build circulant

matrices 11986701198671 1198672 sdot sdot sdot 119867119888119890minus1 by 119903119900119908119886 and by these circu-

lant matrices a block circulant matrix is defined as shown informula (5)

119867 =[[[[[[[[[

1198670 119867119888119890minus11198671 1198670 sdot sdot sdot 1198672 11986711198673 1198672 d

119867119888119890minus2 119867119888

119890minus3119867119888

119890minus1 119867119888

119890minus2

sdot sdot sdot 1198670 119867119888119890minus11198671 1198670

]]]]]]]]](5)

Here we can transform the convolution into (6)

119876 = 119867119891V119895 (6)

119876 is the vector form of result of convolution calculationand then to reshape 119876 into 1198761015840 isin 119877119903119890times119888119890 In this convolutionprocess the padding is dealing with filling 0 but in actualimplementation of our approach we keep the convolutionalinput and output dimensions consistent So we need to prune

the extra values to keep the input and output dimensionsconsistent So we intercept the matrix 1198761015840 then 119910119894119895 = 1198761015840[119903119908 minus1 119903119890 minus 119903119908 + 1 119888119908 minus 1 119888119890 minus 119888119908 + 1]

To simplify the calculation we extract the effective rowsof 119867 according to the effective row indexes which indicatesthe position of the elements of 119910119894119895 in 119876 and denote these rowsby119882119894 isin 119877(119903119890minus2119903119908+2)(119888119890minus2119888119908+2)times2119888119890

Now 119884119894119895 isin 119877(119903119890minus2119903119908+2)(119888119890minus2119888119908+2)times1 is vector form of 119910119894119895 sothe convolution can be rewritten as

119884119894119895 = 119882119894119891V119895 (7)

There are 119869 training vehicle images and 119870 kernels Let119883 = [119891V1 119891V2 119891V

119895 ] and 119882 = [11988211198822 119882119870]The convolution can be calculated as

119884 = W119883 (8)

And the deconvolution can be calculated

1198831015840 = 119882119879119884 (9)

So 1198831015840 is the X reconstruction Then loss function basedon Euclidean paradigm is defined in formula (10) being

loss (119882119883) = 10038171003817100381710038171003817119883 minus 1198831015840100381710038171003817100381710038172 = radic 119869sum119895=1

(119883119895 minus 1198831015840119895)2 (10)

Then we used adam optimizer which is an algorithm forfirst-order gradient-based optimization of stochastic objec-tive functions based on adaptive estimates of lower-ordermoments to solve the minimum optimization problem informula (10)

After the greedy layer-wise unsupervised pretraining weinitiate the parameters in every convolutional layer withthe pretrained values and run the supervised training forclassification according to themethod in previous subsection

4 Experiments and Discussions

We evaluated the presented algorithm on our data andcompared it with other four state-of-the-art methods

41 Datasets and Experiment Environment The vehiclesimages are taken by the static cameras in different refuelstations after being compressed they are sent to serversThe quality of images on servers is lower than that selectedby random and classified into four categories of motorcycletransporter passenger and others by hand We got 498motorcycle images 1109 transporter images 1238 passengerimages and 328 other images Due to the time-consumingand labor-intensive of manual labeling there is a shortage oflabeled images Image augmentation has been used to enrichthe data Keras the excellent high level neural network APIprovides the ImageDataGenerator for image data preparationand augmentation Shear range is set to -02 to 02 zoomrange is set to -02 to 02 rotation range is set to -7 to 7 sizeis set to 256times256 and the points outside the boundaries are

6 Advances in Multimedia

09

0 25 50 75 100 125 150 175 200

Epochs(times10)

OriginalSheared

02

03

04

05

06

Accu

racy 07

08

Figure 5 Comparison of training process between original andsheared

filled according to the nearest mode After configuration andtaking into account the balance of data we fitted it on ourdata and got 1400 samples for every categories on the trainingset and 600 samples for every categories on the testing set toassess our classification model

For vehicle classification the CNNs are under Tensorflowframework the SIFT is under OpenCV (httpsopencvorg)and other feature embedding methods are under scikit-image (httpscikit-imageorg) All the experiments wereconducted on a regular notebook PC (25-GHz 8-core CPU12-G RAM and Ubuntu 64-bit OS)

42 Vehicle Detection Experiment with YOLOv2 For theexperiment using original images the original training setand test set are used for training and testing for the exper-iment using sheared images we used the approach based ontrained YOLOv2 to detect the original training set and testset to get the sheared training set and sheared testing set fortraining and testing

To verify the importance of vehicle detection for vehicleclassification we designed two groups of vehicle classifica-tion experiments one using original images and the otherone using sheared images after vehicle detection then theC4M3F2model is used for vehicle classification experiments

We initialized the C4M3F2 model by truncated nor-mal distribution fitted the model on original training setand sheared training set for 2000 epochs respectively andrecorded the accuracy of our C4M3F2 model on differenttesting set the results are shown in Figure 5 As we expectedthe sheared images of vehicles more accurately representthe characteristics of vehicles while pruning more uselessinformation and facilitating the feature extraction and vehicleclassification As we can see in Figure 5 the accuracy ofC4M3F2 model using sheared data set is much better thanthe one using original data set in the previous training thecharacteristics of vehicles were extracted more accurately sothat the model quickly achieved a better classification resultsand a stable status

Finally the accuracy of C4M3F2 using sheared data set is9142 which is 453 higher than the 8689 of C4M3F2using original data set It can be concluded that the resultsof vehicle classification using sheared data set after vehicledetection based on YOLOv2 can be improved effectively

Table 2 Accuracy and FPS of different methods

Method Accuracy FPSHOG+SVM 6012 4DAISY+SVM 6904 2ORB+BoW+SVM 6407 7SIFT+BoW+SVM 7449 5DeCAF[1] 6620 13CNNs 9142 800

43 Compare Our Approach with Others There are manyother image classification methods To assess our classifi-cation model we compared our approach with other fivemethods

The five methods are based on the image features definedby the scholars in computer image processing Consideringthe comprehensive factors four kinds of image features anda convolutional method are selected they are histograms oforiented gradient (HOG) [42] DAISY [43] oriented FASTand rotated BRIEF (ORB)[44] scale-invariant feature trans-form (SIFT) [45] and DeCAF[1] respectively These methodsare excellent in target object detection in [1 42ndash45] HOGis based on computing and counting the gradient directionhistogram of local regions DAISY is a fast computing localimage feature descriptor for dense feature extraction andit is based on gradient orientation histograms similar tothe SIFT descriptor ORB uses an oriented FAST detectionmethod and the rotated BRIEF descriptors unlike BRIEFORB is comparatively scale and rotation invariant while stillemploying the very efficient Hamming distance metric formatching SIFT is the most widely used algorithm of keypoint detection and description It takes full advantage ofimage local information SIFT feature has a good effect inrotation scale and translation and it is robust to changes inangle of view and illumination these features are beneficialto the effective expression of targets information For HOGand DAISY image features regions are designed featuresare computed and sent into SVM classifier to be classifiedFor ORB and SIFT they do not have acquisition featuresregions and specified number of features we get the imagefeatures based on Bag-of-Words (BoW) model by treatingimage features as words In the first instance all featurespoints of all training build the visual vocabulary in the nextplace a feature vector of occurrence counts of the vocabularyis constructed from an image in the end the feature vectoris sent into SVM classifier to be classified DeCAF uses fiveconvolutional layers and two fully connected layers to extractfeatures and a SVM to classify the image into the right group[1]

Here we performed vehicle classification experiments onsheared data set Table 2 shows the accuracy and FPS of CNNsand other state-of-the-art methods in test other methods arevery slow because they take a lot of time to extract featuresIt can be observed that the results show the effectiveness ofCNNs on vehicle classification problem

From another point of view we demonstrated the clas-sification ability of each method by confusion matrix of the

Advances in Multimedia 7

Motorcycle

Transporter

Passenger

Other

Motorcycle

Transporte

r

Passen

gerOther

Motorcycle

Transporter

Passenger

Other

Motorcycle

Transporte

r

Passen

gerOther

Motorcycle

Transporter

Passenger

Other

Motorcycle

Transporte

r

Passen

gerOther

Motorcycle

Transporter

Passenger

Other

Motorcycle

Transporte

r

Passen

gerOther

Motorcycle

Transporter

Passenger

Other

HOG SIFT

DAISY DeCAF

ORB CNNs

Motorcycle

Transporte

r

Passen

gerOther

Motorcycle

Transporter

Passenger

Other

Motorcycle

Transporte

r

Passen

gerOther

08

07

06

05

04

03

02

01

07

06

05

04

03

02

01

07

06

05

04

03

02

01

08

07

06

05

04

03

02

01

00

08

06

04

02

08

06

04

02

Figure 6 Confusion matrix of classification of different methods

classification process from the five methods in Figure 6 Themain diagonal displays the high recognition accuracy Asshown in Figure 6 the top five comparative methods arebetter in recognition of motorcycle than other categoriesGenerally speaking ORB or SIFT combined with BoW andSVMmethod is a little better than the other two methods Alltaken into account our CNNs method is the best But theperformance of CNNs turns out not so satisfactory in viewthan the confusion of transporter and passenger

Next we will focus on the reason of confusion in CNNsAccording to the precision recall and f1-score of classifica-tion in Table 3 it shows that the identification of motorcycleis very good enough and the identification of transporter andpassenger is relatively poor

As shown in the examples in Figure 7 it can be seen thatthe wrongly recognized transporter and passenger imagesinclude vehicle face information mainly and very little vehiclebody information as far as themain vehicle face is concerned

8 Advances in Multimedia

Figure 7 Examples of wrongly recognized vehicles images Firstrow wrongly recognized as passenger which should be transporterSecond row wrongly recognized as transporter which should bepassenger

09

0 25 50 75 100 125 150 175 200

Epochs(times10)

WithoutPre-trained

02

03

04

05

06

Accu

racy 07

08

Figure 8 Comparison of training process between pretrained andwithout pretraining

Table 3 Classification precision recall and f1-score of CNNs

Type Precision Recall F1-scoreMotorcycle 097 095 096Transporter 087 085 086Passenger 090 091 090Other 093 093 093

these vehicles images are so similar in profile that it is still achallenge to recognize same images in Figure 7 manually

44 PretrainingApproachExperiment Weare eager for betterperformance of our classification model C4M3F2 Hereunsupervised pretraining has been used for optimization ofour classification model 17395 sheared vehicles images areobtained by shearing the unlabeled vehicles images

We pretrained the parameters of every convolutionallayer for 2000 epochs and then supervised training the modelon our sheared training set and tested it on our shearedtesting set the results is shown in Figure 8 In the processof training the conclusion is that the effect is more obviousin the previous epochs and the overall training process isstable relatively Ultimately the accuracy of pretrained CNNsis 9350 which is 208 higher than the 9142 of CNNswithout pretraining

Table 4 Classification precision recall and f1-score of pretrainedCNNs based on sheared dataset

Type Precision Recall F1-scoreMotorcycle 099 099 099Transporter 090 099 099Passenger 091 092 092Other 095 096 095

Table 5 Classification precision recall and f1-score of pretrainedCNNs based on original dataset

Type Precision Recall F1-scoreMotorcycle 099 099 099Transporter 079 082 081Passenger 084 079 081Other 090 094 092

Table 6 The classification accuracy under different strategies

Original ShearedCNNsWithout pre-training 8689 9142CNNs with pre-training 8829 935

By analyzing the classification performance of pretrainedCNNs shown in Table 4 we can draw a conclusion thatits performance is better than the one of CNNs withoutpretraining shown in Table 3 especially for the classificationof transporter and passenger In summary pretrained CNNis more effective in recognizing the vehicles categories and itis a state-of-the-art approach for vehicle classification

In the end to verify the effect of detection in the entiresystem we conducted ablation study by pretraining andtesting our model on the original dataset which has notbeen sheared by YOLOv2 and contains a large quantity ofirrelevant background Ultimately the accuracy of pretrainedCNNs on original dataset is 8829 which is 521 lowerthan the 935 of pretrained CNNs on sheared dataset andeven lower than the 9142 of CNNs without pretrainingon sheared dataset the classification performance is shownin Table 5 And according the classification accuracy inTable 6 we can conclude that this ablation study confirmsthe essentiality of detection virtually in the whole vehicleclassification system

5 Conclusions

A classification method based on CNNs has been detailedin this paper To improve the accuracy we used vehicledetection to removing the unrelated background for facili-tating the feature extraction and vehicle classification Thenan autoencoder-based layer-wise unsupervised pretraining isintroduced to improve the CNNs model by enhancing theclassification performance Several state-of-the-art methodshave been evaluated on our labeled data set containing fourcategories of motorcycle transporter passenger and others

Advances in Multimedia 9

Experimental results have demonstrated that the pretrainedCNNsmethod based on vehicle detection is themost effectivefor vehicle classification

In addition the success of our vehicle classification makesa vehicle color or logo recognition system possible in ourrefueling behavior analysis meanwhile it is a great help tourban computing intelligent transportation system etc

Data Availability

The data used to support the findings of this study areavailable from the corresponding author upon request

Conflicts of Interest

The authors declare that they have no conflicts of interest

Acknowledgments

This research is supported by Youth Innovation PromotionAssociation CAS (2015355) The authors gratefully acknowl-edge the invaluable contribution of Yupeng Ma and themembers of his laboratory during this collaboration

References

[1] DHe C Lang S Feng XDu andC Zhang ldquoVehicle detectionand classification based on convolutional neural networkrdquo inProceedings of the 7th International Conference on InternetMultimedia Computing and Service 2015

[2] J F Forren and D Jaarsma ldquoTraffic monitoring by tire noiserdquoComputer Standards amp Interfaces vol 20 pp 466-467 1999

[3] J George A Cyril B I Koshy and L Mary ldquoExploring SoundSignature for VehicleDetection and Classification Using ANNrdquoInternational Journal on Soft Computing vol 4 no 2 pp 29ndash362013

[4] J George L Mary and K S Riyas ldquoVehicle detection andclassification from acoustic signal using ANN and KNNrdquo inProceedings of the 2013 International Conference on ControlCommunication and Computing (ICCC rsquo13) pp 436ndash439Thiruvananthapuram India 2013

[5] KWang R Wang Y Feng et al ldquoVehicle recognition in acous-tic sensor networks via sparse representationrdquo in Proceedings ofthe 2014 IEEE International Conference onMultimedia and ExpoWorkshops (ICMEW rsquo14) pp 1ndash4 Chengdu China 2014

[6] A Duzdar and G Kompa ldquoApplications using a low-cost base-band pulsed microwave radar sensorrdquo in Proceedings of the 18thIEEE Instrumentation andMeasurement Technology Conference(IMTC rsquo01) vol 1 of Rediscovering Measurement in the Age ofInformatics pp 239ndash243 IEEE Budapest Hungary 2001

[7] H-T Kim and B Song ldquoVehicle recognition based on radarand vision sensor fusion for automatic emergency brakingrdquoin Proceedings of the 13th International Conference on ControlAutomation and Systems (ICCAS rsquo13) pp 1342ndash1346 GwangjuRepublic of Korea 2013

[8] Y Jo and I Jung ldquoAnalysis of vehicle detection with wsn-basedultrasonic sensorsrdquo Sensors vol 14 no 8 pp 14050ndash14069 2014

[9] Y Iwasaki MMisumi and T Nakamiya ldquoRobust vehicle detec-tion under various environments to realize road traffic flow

surveillance using an infrared thermal camerardquo The ScientificWorld Journal vol 2015 Article ID 947272 2015

[10] J Lan Y Xiang L Wang and Y Shi ldquoVehicle detection andclassification by measuring and processing magnetic signalrdquoMeasurement vol 44 no 1 pp 174ndash180 2011

[11] B Li T Zhang and T Xia ldquoVehicle Detection from 3DLidar Using Fully Convolutional Networkrdquo httpsarxivorgabs160807916 2016

[12] V Kastrinaki M Zervakis and K Kalaitzakis ldquoA survey ofvideo processing techniques for traffic applicationsrdquo Image andVision Computing vol 21 no 4 pp 359ndash381 2003

[13] F M Kazemi S Samadi H R Poorreza and M-R Akbar-zadeh-T ldquoVehicle recognition based on fourier wavelet andcurvelet transforms -A comparative studyrdquo inProceedings of the4th International Conference on Information Technology-NewGenerations ITNG rsquo07 pp 939-940 Las Vegas Nev USA 2007

[14] J Y Ng and Y H Tay ldquoImage-based Vehicle ClassificationSystemrdquo httpsarxivorgabs12042114 2012

[15] R A Hadi G Sulong and L E George ldquoVehicle Detectionand Tracking Techniques A Concise Reviewrdquo Signal amp ImageProcessing An International Journal vol 5 no 1 pp 1ndash12 2014

[16] J Arrospide and L Salgado ldquoA study of feature combinationfor vehicle detection based on image processingrdquoThe ScientificWorld Journal vol 2014 Article ID 196251 13 pages 2014

[17] P Piyush R Rajan L Mary and B I Koshy ldquoVehicle detectionand classification using audio-visual cuesrdquo in Proceedings of the3rd International Conference on Signal Processing and IntegratedNetworks (SPIN rsquo16) pp 726ndash730 Noida India 2016

[18] Z Chen T Ellis and S A Velastin ldquoVehicle detection trackingand classification in urban trafficrdquo in Proceedings of the 15thInternational IEEE Conference on Intelligent TransportationSystems ITSC rsquo12 pp 951ndash956 Anchorage Alaska USA 2012

[19] X Wen L Shao Y Xue and W Fang ldquoA rapid learning algo-rithm for vehicle classificationrdquo Information Sciences vol 295pp 395ndash406 2015

[20] P Mishra and B Banerjee ldquoMultiple Kernel based KNNClassifiers for Vehicle Classificationrdquo International Journal ofComputer Applications vol 71 no 6 pp 1ndash7 2013

[21] A Tourani and A Shahbahrami ldquoVehicle counting methodbased on digital image processing algorithmsrdquo in Proceedingsof the 2nd International Conference on Pattern Recognition andImage Analysis IPRIA rsquo15 pp 1ndash6 Rasht Iran 2015

[22] H Wang Y Cai and L Chen ldquoA vehicle detection algorithmbased on deep belief networkrdquoThe Scientific World Journal vol2014 Article ID 647380 7 pages 2014

[23] M Yi F Yang E Blashchet al ldquoVehicleClassification inWAMIImagery using Deep Networkrdquo in Proceedings of the SPIE 9838Sensors and Systems for Space Applicatioins IX 2016

[24] B Li T Zhang and T Xia ldquoVehicle detection from 3Dlidar using fully convolutional networkrdquo in Proceedings of theRobotics Science and Systems 2016

[25] Y Lecun B Boser J S Denker et al ldquoBackpropagation appliedto handwritten zip code recognitionrdquo Neural Computation vol1 no 4 pp 541ndash551 1989

[26] Y Lecun L Bottou Y Bengio and P Haffner ldquoGradient-basedlearning applied to document recognitionrdquo Proceedings of theIEEE vol 86 no 11 pp 2278ndash2323 1998

[27] A Krizhevsky I Sutskever and G E Hinton ldquoImageNetclassification with deep convolutional neural networksrdquo NeuralInformation Processing Systems pp 1097ndash1105 2012

10 Advances in Multimedia

[28] P Sermanet D Eigen X Zhang M Mathieu R Fergus andY Lecun ldquoOverFeat Integrated Recognition Localization andDetection using Convolutional Networksrdquo httpsarxivorgabs13126229 2013

[29] R Girshick J Donahue T Darrell and J Malik ldquoRich fea-ture hierarchies for accurate object detection and semanticsegmentationrdquo in Proceedings of the 27th IEEE Conference onComputer Vision and Pattern Recognition (CVPR rsquo14) pp 580ndash587 Columbus Ohio USA 2014

[30] R Girshick ldquoFast R-CNNrdquo in Proceedings of the 15th IEEEInternational Conference on Computer Vision (ICCV rsquo15) pp1440ndash1448 Santiago Chile 2015

[31] J Redmon S Divvala R Girshick and A Farhadi ldquoYou onlylook once Unified real-time object detectionrdquo in Proceedingsof the 2016 IEEE Conference on Computer Vision and PatternRecognition (CVPR rsquo16) pp 779ndash788 Las Vegas Nev USA2016

[32] S Ren K He R Girshick and J Sun ldquoFaster R-CNN TowardsReal-Time Object Detection with Region Proposal NetworksrdquoIEEE Transactions on Pattern Analysis andMachine Intelligencevol 39 no 6 pp 1137ndash1149 2017

[33] W Liu D Anguelov D Erhan et al ldquoSSD single shot multiboxdetectorrdquo in Proceedings of the Computer Vision ndash ECCV 2016vol 9905 of Lecture Notes in Computer Science pp 21ndash37 2016

[34] J Dai Y Li K He and J Sun ldquoR-FCN Object Detectionvia Region-based Fully Convolutional Networksrdquohttpsarxivorgabs160506409 2016

[35] J Redmon andA Farhadi ldquoYOLO9000 Better faster strongerrdquoin Proceedings of the 30th IEEE Conference on Computer Visionand Pattern Recognition (CVPR rsquo17) pp 6517ndash6525 HonoluluHawaii USA 2017

[36] D Erhan Y Bengio A Courville P-A Manzagol P Vincentand S Bengio ldquoWhy does unsupervised pre-training help deeplearningrdquo Journal of Machine Learning Research vol 11 pp625ndash660 2010

[37] G E Hinton S Osindero and Y Teh ldquoA fast learning algorithmfor deep belief netsrdquoNeural Computation vol 18 no 7 pp 1527ndash1554 2006

[38] Y Bengio P Lamblin D Popovici and H Larochelle ldquoGreedylayer-wise training of deep networksrdquo Neural Information Pro-cessing Systems pp 153ndash160 2007

[39] J Masci U Meier D Ciresan and J Schmidhuber ldquoStackedConvolutional Auto-Encoders for Hierarchical Feature Extrac-tionrdquo in Proceedings of the Artificial Neural Networks andMachine Learning - ICANN 2011 vol 6791 of Lecture Notesin Computer Science pp 52ndash59 Springer Berlin HeidelbergGermany 2011

[40] G E Hinton and R S Zemel ldquoAutoencoders minimum de-scription length and helmholtz free energyrdquoNeural InformationProcessing Systems pp 3ndash10 1994

[41] M D Zeiler D Krishnan GW Taylor and R Fergus ldquoDecon-volutional networksrdquo in Proceedings of the IEEE ComputerSociety Conference on Computer Vision and Pattern Recognition(CVPRrsquo 10) pp 2528ndash2535 San Francisco Calif USA 2010

[42] N Dalal and B Triggs ldquoHistograms of oriented gradients forhuman detectionrdquo in Proceedings of the IEEE Computer SocietyConference on Computer Vision and Pattern Recognition (CVPRrsquo05) vol 1 pp 886ndash893 2005

[43] E Tola V Lepetit and P Fua ldquoDAISY an efficient densedescriptor applied to wide-baseline stereordquo IEEE Transactionson Pattern Analysis and Machine Intelligence vol 32 no 5 pp815ndash830 2010

[44] E Rublee V Rabaud K Konolige and G Bradski ldquoORB anefficient alternative to SIFT or SURFrdquo in Proceedings of the IEEEInternational Conference on Computer Vision (ICCV rsquo11) pp2564ndash2571 Barcelona Spain 2011

[45] D G Lowe ldquoObject recognition from local scale-invariantfeaturesrdquo inProceedings of the 7th IEEE International Conferenceon Computer Vision (ICCV rsquo99) vol 2 pp 1150ndash1157 KerkyraGreece 1999

International Journal of

AerospaceEngineeringHindawiwwwhindawicom Volume 2018

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Active and Passive Electronic Components

VLSI Design

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Shock and Vibration

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawiwwwhindawicom

Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Control Scienceand Engineering

Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

SensorsJournal of

Hindawiwwwhindawicom Volume 2018

International Journal of

RotatingMachinery

Hindawiwwwhindawicom Volume 2018

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Navigation and Observation

International Journal of

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

Submit your manuscripts atwwwhindawicom

Page 6: Pretraining Convolutional Neural Networks for Image-Based ...downloads.hindawi.com/journals/am/2018/3138278.pdf · AdvancesinMultimedia 0.9 0 25 50 75 100 125 150 175 200 Epochs(×10)

6 Advances in Multimedia

09

0 25 50 75 100 125 150 175 200

Epochs(times10)

OriginalSheared

02

03

04

05

06

Accu

racy 07

08

Figure 5 Comparison of training process between original andsheared

filled according to the nearest mode After configuration andtaking into account the balance of data we fitted it on ourdata and got 1400 samples for every categories on the trainingset and 600 samples for every categories on the testing set toassess our classification model

For vehicle classification the CNNs are under Tensorflowframework the SIFT is under OpenCV (httpsopencvorg)and other feature embedding methods are under scikit-image (httpscikit-imageorg) All the experiments wereconducted on a regular notebook PC (25-GHz 8-core CPU12-G RAM and Ubuntu 64-bit OS)

42 Vehicle Detection Experiment with YOLOv2 For theexperiment using original images the original training setand test set are used for training and testing for the exper-iment using sheared images we used the approach based ontrained YOLOv2 to detect the original training set and testset to get the sheared training set and sheared testing set fortraining and testing

To verify the importance of vehicle detection for vehicleclassification we designed two groups of vehicle classifica-tion experiments one using original images and the otherone using sheared images after vehicle detection then theC4M3F2model is used for vehicle classification experiments

We initialized the C4M3F2 model by truncated nor-mal distribution fitted the model on original training setand sheared training set for 2000 epochs respectively andrecorded the accuracy of our C4M3F2 model on differenttesting set the results are shown in Figure 5 As we expectedthe sheared images of vehicles more accurately representthe characteristics of vehicles while pruning more uselessinformation and facilitating the feature extraction and vehicleclassification As we can see in Figure 5 the accuracy ofC4M3F2 model using sheared data set is much better thanthe one using original data set in the previous training thecharacteristics of vehicles were extracted more accurately sothat the model quickly achieved a better classification resultsand a stable status

Finally the accuracy of C4M3F2 using sheared data set is9142 which is 453 higher than the 8689 of C4M3F2using original data set It can be concluded that the resultsof vehicle classification using sheared data set after vehicledetection based on YOLOv2 can be improved effectively

Table 2 Accuracy and FPS of different methods

Method Accuracy FPSHOG+SVM 6012 4DAISY+SVM 6904 2ORB+BoW+SVM 6407 7SIFT+BoW+SVM 7449 5DeCAF[1] 6620 13CNNs 9142 800

43 Compare Our Approach with Others There are manyother image classification methods To assess our classifi-cation model we compared our approach with other fivemethods

The five methods are based on the image features definedby the scholars in computer image processing Consideringthe comprehensive factors four kinds of image features anda convolutional method are selected they are histograms oforiented gradient (HOG) [42] DAISY [43] oriented FASTand rotated BRIEF (ORB)[44] scale-invariant feature trans-form (SIFT) [45] and DeCAF[1] respectively These methodsare excellent in target object detection in [1 42ndash45] HOGis based on computing and counting the gradient directionhistogram of local regions DAISY is a fast computing localimage feature descriptor for dense feature extraction andit is based on gradient orientation histograms similar tothe SIFT descriptor ORB uses an oriented FAST detectionmethod and the rotated BRIEF descriptors unlike BRIEFORB is comparatively scale and rotation invariant while stillemploying the very efficient Hamming distance metric formatching SIFT is the most widely used algorithm of keypoint detection and description It takes full advantage ofimage local information SIFT feature has a good effect inrotation scale and translation and it is robust to changes inangle of view and illumination these features are beneficialto the effective expression of targets information For HOGand DAISY image features regions are designed featuresare computed and sent into SVM classifier to be classifiedFor ORB and SIFT they do not have acquisition featuresregions and specified number of features we get the imagefeatures based on Bag-of-Words (BoW) model by treatingimage features as words In the first instance all featurespoints of all training build the visual vocabulary in the nextplace a feature vector of occurrence counts of the vocabularyis constructed from an image in the end the feature vectoris sent into SVM classifier to be classified DeCAF uses fiveconvolutional layers and two fully connected layers to extractfeatures and a SVM to classify the image into the right group[1]

Here we performed vehicle classification experiments onsheared data set Table 2 shows the accuracy and FPS of CNNsand other state-of-the-art methods in test other methods arevery slow because they take a lot of time to extract featuresIt can be observed that the results show the effectiveness ofCNNs on vehicle classification problem

From another point of view we demonstrated the clas-sification ability of each method by confusion matrix of the

Advances in Multimedia 7

Motorcycle

Transporter

Passenger

Other

Motorcycle

Transporte

r

Passen

gerOther

Motorcycle

Transporter

Passenger

Other

Motorcycle

Transporte

r

Passen

gerOther

Motorcycle

Transporter

Passenger

Other

Motorcycle

Transporte

r

Passen

gerOther

Motorcycle

Transporter

Passenger

Other

Motorcycle

Transporte

r

Passen

gerOther

Motorcycle

Transporter

Passenger

Other

HOG SIFT

DAISY DeCAF

ORB CNNs

Motorcycle

Transporte

r

Passen

gerOther

Motorcycle

Transporter

Passenger

Other

Motorcycle

Transporte

r

Passen

gerOther

08

07

06

05

04

03

02

01

07

06

05

04

03

02

01

07

06

05

04

03

02

01

08

07

06

05

04

03

02

01

00

08

06

04

02

08

06

04

02

Figure 6 Confusion matrix of classification of different methods

classification process from the five methods in Figure 6 Themain diagonal displays the high recognition accuracy Asshown in Figure 6 the top five comparative methods arebetter in recognition of motorcycle than other categoriesGenerally speaking ORB or SIFT combined with BoW andSVMmethod is a little better than the other two methods Alltaken into account our CNNs method is the best But theperformance of CNNs turns out not so satisfactory in viewthan the confusion of transporter and passenger

Next we will focus on the reason of confusion in CNNsAccording to the precision recall and f1-score of classifica-tion in Table 3 it shows that the identification of motorcycleis very good enough and the identification of transporter andpassenger is relatively poor

As shown in the examples in Figure 7 it can be seen thatthe wrongly recognized transporter and passenger imagesinclude vehicle face information mainly and very little vehiclebody information as far as themain vehicle face is concerned

8 Advances in Multimedia

Figure 7 Examples of wrongly recognized vehicles images Firstrow wrongly recognized as passenger which should be transporterSecond row wrongly recognized as transporter which should bepassenger

09

0 25 50 75 100 125 150 175 200

Epochs(times10)

WithoutPre-trained

02

03

04

05

06

Accu

racy 07

08

Figure 8 Comparison of training process between pretrained andwithout pretraining

Table 3 Classification precision recall and f1-score of CNNs

Type Precision Recall F1-scoreMotorcycle 097 095 096Transporter 087 085 086Passenger 090 091 090Other 093 093 093

these vehicles images are so similar in profile that it is still achallenge to recognize same images in Figure 7 manually

44 PretrainingApproachExperiment Weare eager for betterperformance of our classification model C4M3F2 Hereunsupervised pretraining has been used for optimization ofour classification model 17395 sheared vehicles images areobtained by shearing the unlabeled vehicles images

We pretrained the parameters of every convolutionallayer for 2000 epochs and then supervised training the modelon our sheared training set and tested it on our shearedtesting set the results is shown in Figure 8 In the processof training the conclusion is that the effect is more obviousin the previous epochs and the overall training process isstable relatively Ultimately the accuracy of pretrained CNNsis 9350 which is 208 higher than the 9142 of CNNswithout pretraining

Table 4 Classification precision recall and f1-score of pretrainedCNNs based on sheared dataset

Type Precision Recall F1-scoreMotorcycle 099 099 099Transporter 090 099 099Passenger 091 092 092Other 095 096 095

Table 5 Classification precision recall and f1-score of pretrainedCNNs based on original dataset

Type Precision Recall F1-scoreMotorcycle 099 099 099Transporter 079 082 081Passenger 084 079 081Other 090 094 092

Table 6 The classification accuracy under different strategies

Original ShearedCNNsWithout pre-training 8689 9142CNNs with pre-training 8829 935

By analyzing the classification performance of pretrainedCNNs shown in Table 4 we can draw a conclusion thatits performance is better than the one of CNNs withoutpretraining shown in Table 3 especially for the classificationof transporter and passenger In summary pretrained CNNis more effective in recognizing the vehicles categories and itis a state-of-the-art approach for vehicle classification

In the end to verify the effect of detection in the entiresystem we conducted ablation study by pretraining andtesting our model on the original dataset which has notbeen sheared by YOLOv2 and contains a large quantity ofirrelevant background Ultimately the accuracy of pretrainedCNNs on original dataset is 8829 which is 521 lowerthan the 935 of pretrained CNNs on sheared dataset andeven lower than the 9142 of CNNs without pretrainingon sheared dataset the classification performance is shownin Table 5 And according the classification accuracy inTable 6 we can conclude that this ablation study confirmsthe essentiality of detection virtually in the whole vehicleclassification system

5 Conclusions

A classification method based on CNNs has been detailedin this paper To improve the accuracy we used vehicledetection to removing the unrelated background for facili-tating the feature extraction and vehicle classification Thenan autoencoder-based layer-wise unsupervised pretraining isintroduced to improve the CNNs model by enhancing theclassification performance Several state-of-the-art methodshave been evaluated on our labeled data set containing fourcategories of motorcycle transporter passenger and others

Advances in Multimedia 9

Experimental results have demonstrated that the pretrainedCNNsmethod based on vehicle detection is themost effectivefor vehicle classification

In addition the success of our vehicle classification makesa vehicle color or logo recognition system possible in ourrefueling behavior analysis meanwhile it is a great help tourban computing intelligent transportation system etc

Data Availability

The data used to support the findings of this study areavailable from the corresponding author upon request

Conflicts of Interest

The authors declare that they have no conflicts of interest

Acknowledgments

This research is supported by Youth Innovation PromotionAssociation CAS (2015355) The authors gratefully acknowl-edge the invaluable contribution of Yupeng Ma and themembers of his laboratory during this collaboration

References

[1] DHe C Lang S Feng XDu andC Zhang ldquoVehicle detectionand classification based on convolutional neural networkrdquo inProceedings of the 7th International Conference on InternetMultimedia Computing and Service 2015

[2] J F Forren and D Jaarsma ldquoTraffic monitoring by tire noiserdquoComputer Standards amp Interfaces vol 20 pp 466-467 1999

[3] J George A Cyril B I Koshy and L Mary ldquoExploring SoundSignature for VehicleDetection and Classification Using ANNrdquoInternational Journal on Soft Computing vol 4 no 2 pp 29ndash362013

[4] J George L Mary and K S Riyas ldquoVehicle detection andclassification from acoustic signal using ANN and KNNrdquo inProceedings of the 2013 International Conference on ControlCommunication and Computing (ICCC rsquo13) pp 436ndash439Thiruvananthapuram India 2013

[5] KWang R Wang Y Feng et al ldquoVehicle recognition in acous-tic sensor networks via sparse representationrdquo in Proceedings ofthe 2014 IEEE International Conference onMultimedia and ExpoWorkshops (ICMEW rsquo14) pp 1ndash4 Chengdu China 2014

[6] A Duzdar and G Kompa ldquoApplications using a low-cost base-band pulsed microwave radar sensorrdquo in Proceedings of the 18thIEEE Instrumentation andMeasurement Technology Conference(IMTC rsquo01) vol 1 of Rediscovering Measurement in the Age ofInformatics pp 239ndash243 IEEE Budapest Hungary 2001

[7] H-T Kim and B Song ldquoVehicle recognition based on radarand vision sensor fusion for automatic emergency brakingrdquoin Proceedings of the 13th International Conference on ControlAutomation and Systems (ICCAS rsquo13) pp 1342ndash1346 GwangjuRepublic of Korea 2013

[8] Y Jo and I Jung ldquoAnalysis of vehicle detection with wsn-basedultrasonic sensorsrdquo Sensors vol 14 no 8 pp 14050ndash14069 2014

[9] Y Iwasaki MMisumi and T Nakamiya ldquoRobust vehicle detec-tion under various environments to realize road traffic flow

surveillance using an infrared thermal camerardquo The ScientificWorld Journal vol 2015 Article ID 947272 2015

[10] J Lan Y Xiang L Wang and Y Shi ldquoVehicle detection andclassification by measuring and processing magnetic signalrdquoMeasurement vol 44 no 1 pp 174ndash180 2011

[11] B Li T Zhang and T Xia ldquoVehicle Detection from 3DLidar Using Fully Convolutional Networkrdquo httpsarxivorgabs160807916 2016

[12] V Kastrinaki M Zervakis and K Kalaitzakis ldquoA survey ofvideo processing techniques for traffic applicationsrdquo Image andVision Computing vol 21 no 4 pp 359ndash381 2003

[13] F M Kazemi S Samadi H R Poorreza and M-R Akbar-zadeh-T ldquoVehicle recognition based on fourier wavelet andcurvelet transforms -A comparative studyrdquo inProceedings of the4th International Conference on Information Technology-NewGenerations ITNG rsquo07 pp 939-940 Las Vegas Nev USA 2007

[14] J Y Ng and Y H Tay ldquoImage-based Vehicle ClassificationSystemrdquo httpsarxivorgabs12042114 2012

[15] R A Hadi G Sulong and L E George ldquoVehicle Detectionand Tracking Techniques A Concise Reviewrdquo Signal amp ImageProcessing An International Journal vol 5 no 1 pp 1ndash12 2014

[16] J Arrospide and L Salgado ldquoA study of feature combinationfor vehicle detection based on image processingrdquoThe ScientificWorld Journal vol 2014 Article ID 196251 13 pages 2014

[17] P Piyush R Rajan L Mary and B I Koshy ldquoVehicle detectionand classification using audio-visual cuesrdquo in Proceedings of the3rd International Conference on Signal Processing and IntegratedNetworks (SPIN rsquo16) pp 726ndash730 Noida India 2016

[18] Z Chen T Ellis and S A Velastin ldquoVehicle detection trackingand classification in urban trafficrdquo in Proceedings of the 15thInternational IEEE Conference on Intelligent TransportationSystems ITSC rsquo12 pp 951ndash956 Anchorage Alaska USA 2012

[19] X Wen L Shao Y Xue and W Fang ldquoA rapid learning algo-rithm for vehicle classificationrdquo Information Sciences vol 295pp 395ndash406 2015

[20] P Mishra and B Banerjee ldquoMultiple Kernel based KNNClassifiers for Vehicle Classificationrdquo International Journal ofComputer Applications vol 71 no 6 pp 1ndash7 2013

[21] A Tourani and A Shahbahrami ldquoVehicle counting methodbased on digital image processing algorithmsrdquo in Proceedingsof the 2nd International Conference on Pattern Recognition andImage Analysis IPRIA rsquo15 pp 1ndash6 Rasht Iran 2015

[22] H Wang Y Cai and L Chen ldquoA vehicle detection algorithmbased on deep belief networkrdquoThe Scientific World Journal vol2014 Article ID 647380 7 pages 2014

[23] M Yi F Yang E Blashchet al ldquoVehicleClassification inWAMIImagery using Deep Networkrdquo in Proceedings of the SPIE 9838Sensors and Systems for Space Applicatioins IX 2016

[24] B Li T Zhang and T Xia ldquoVehicle detection from 3Dlidar using fully convolutional networkrdquo in Proceedings of theRobotics Science and Systems 2016

[25] Y Lecun B Boser J S Denker et al ldquoBackpropagation appliedto handwritten zip code recognitionrdquo Neural Computation vol1 no 4 pp 541ndash551 1989

[26] Y Lecun L Bottou Y Bengio and P Haffner ldquoGradient-basedlearning applied to document recognitionrdquo Proceedings of theIEEE vol 86 no 11 pp 2278ndash2323 1998

[27] A Krizhevsky I Sutskever and G E Hinton ldquoImageNetclassification with deep convolutional neural networksrdquo NeuralInformation Processing Systems pp 1097ndash1105 2012

10 Advances in Multimedia

[28] P Sermanet D Eigen X Zhang M Mathieu R Fergus andY Lecun ldquoOverFeat Integrated Recognition Localization andDetection using Convolutional Networksrdquo httpsarxivorgabs13126229 2013

[29] R Girshick J Donahue T Darrell and J Malik ldquoRich fea-ture hierarchies for accurate object detection and semanticsegmentationrdquo in Proceedings of the 27th IEEE Conference onComputer Vision and Pattern Recognition (CVPR rsquo14) pp 580ndash587 Columbus Ohio USA 2014

[30] R Girshick ldquoFast R-CNNrdquo in Proceedings of the 15th IEEEInternational Conference on Computer Vision (ICCV rsquo15) pp1440ndash1448 Santiago Chile 2015

[31] J Redmon S Divvala R Girshick and A Farhadi ldquoYou onlylook once Unified real-time object detectionrdquo in Proceedingsof the 2016 IEEE Conference on Computer Vision and PatternRecognition (CVPR rsquo16) pp 779ndash788 Las Vegas Nev USA2016

[32] S Ren K He R Girshick and J Sun ldquoFaster R-CNN TowardsReal-Time Object Detection with Region Proposal NetworksrdquoIEEE Transactions on Pattern Analysis andMachine Intelligencevol 39 no 6 pp 1137ndash1149 2017

[33] W Liu D Anguelov D Erhan et al ldquoSSD single shot multiboxdetectorrdquo in Proceedings of the Computer Vision ndash ECCV 2016vol 9905 of Lecture Notes in Computer Science pp 21ndash37 2016

[34] J Dai Y Li K He and J Sun ldquoR-FCN Object Detectionvia Region-based Fully Convolutional Networksrdquohttpsarxivorgabs160506409 2016

[35] J Redmon andA Farhadi ldquoYOLO9000 Better faster strongerrdquoin Proceedings of the 30th IEEE Conference on Computer Visionand Pattern Recognition (CVPR rsquo17) pp 6517ndash6525 HonoluluHawaii USA 2017

[36] D Erhan Y Bengio A Courville P-A Manzagol P Vincentand S Bengio ldquoWhy does unsupervised pre-training help deeplearningrdquo Journal of Machine Learning Research vol 11 pp625ndash660 2010

[37] G E Hinton S Osindero and Y Teh ldquoA fast learning algorithmfor deep belief netsrdquoNeural Computation vol 18 no 7 pp 1527ndash1554 2006

[38] Y Bengio P Lamblin D Popovici and H Larochelle ldquoGreedylayer-wise training of deep networksrdquo Neural Information Pro-cessing Systems pp 153ndash160 2007

[39] J Masci U Meier D Ciresan and J Schmidhuber ldquoStackedConvolutional Auto-Encoders for Hierarchical Feature Extrac-tionrdquo in Proceedings of the Artificial Neural Networks andMachine Learning - ICANN 2011 vol 6791 of Lecture Notesin Computer Science pp 52ndash59 Springer Berlin HeidelbergGermany 2011

[40] G E Hinton and R S Zemel ldquoAutoencoders minimum de-scription length and helmholtz free energyrdquoNeural InformationProcessing Systems pp 3ndash10 1994

[41] M D Zeiler D Krishnan GW Taylor and R Fergus ldquoDecon-volutional networksrdquo in Proceedings of the IEEE ComputerSociety Conference on Computer Vision and Pattern Recognition(CVPRrsquo 10) pp 2528ndash2535 San Francisco Calif USA 2010

[42] N Dalal and B Triggs ldquoHistograms of oriented gradients forhuman detectionrdquo in Proceedings of the IEEE Computer SocietyConference on Computer Vision and Pattern Recognition (CVPRrsquo05) vol 1 pp 886ndash893 2005

[43] E Tola V Lepetit and P Fua ldquoDAISY an efficient densedescriptor applied to wide-baseline stereordquo IEEE Transactionson Pattern Analysis and Machine Intelligence vol 32 no 5 pp815ndash830 2010

[44] E Rublee V Rabaud K Konolige and G Bradski ldquoORB anefficient alternative to SIFT or SURFrdquo in Proceedings of the IEEEInternational Conference on Computer Vision (ICCV rsquo11) pp2564ndash2571 Barcelona Spain 2011

[45] D G Lowe ldquoObject recognition from local scale-invariantfeaturesrdquo inProceedings of the 7th IEEE International Conferenceon Computer Vision (ICCV rsquo99) vol 2 pp 1150ndash1157 KerkyraGreece 1999

International Journal of

AerospaceEngineeringHindawiwwwhindawicom Volume 2018

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Active and Passive Electronic Components

VLSI Design

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Shock and Vibration

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawiwwwhindawicom

Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Control Scienceand Engineering

Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

SensorsJournal of

Hindawiwwwhindawicom Volume 2018

International Journal of

RotatingMachinery

Hindawiwwwhindawicom Volume 2018

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Navigation and Observation

International Journal of

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

Submit your manuscripts atwwwhindawicom

Page 7: Pretraining Convolutional Neural Networks for Image-Based ...downloads.hindawi.com/journals/am/2018/3138278.pdf · AdvancesinMultimedia 0.9 0 25 50 75 100 125 150 175 200 Epochs(×10)

Advances in Multimedia 7

Motorcycle

Transporter

Passenger

Other

Motorcycle

Transporte

r

Passen

gerOther

Motorcycle

Transporter

Passenger

Other

Motorcycle

Transporte

r

Passen

gerOther

Motorcycle

Transporter

Passenger

Other

Motorcycle

Transporte

r

Passen

gerOther

Motorcycle

Transporter

Passenger

Other

Motorcycle

Transporte

r

Passen

gerOther

Motorcycle

Transporter

Passenger

Other

HOG SIFT

DAISY DeCAF

ORB CNNs

Motorcycle

Transporte

r

Passen

gerOther

Motorcycle

Transporter

Passenger

Other

Motorcycle

Transporte

r

Passen

gerOther

08

07

06

05

04

03

02

01

07

06

05

04

03

02

01

07

06

05

04

03

02

01

08

07

06

05

04

03

02

01

00

08

06

04

02

08

06

04

02

Figure 6 Confusion matrix of classification of different methods

classification process from the five methods in Figure 6 Themain diagonal displays the high recognition accuracy Asshown in Figure 6 the top five comparative methods arebetter in recognition of motorcycle than other categoriesGenerally speaking ORB or SIFT combined with BoW andSVMmethod is a little better than the other two methods Alltaken into account our CNNs method is the best But theperformance of CNNs turns out not so satisfactory in viewthan the confusion of transporter and passenger

Next we will focus on the reason of confusion in CNNsAccording to the precision recall and f1-score of classifica-tion in Table 3 it shows that the identification of motorcycleis very good enough and the identification of transporter andpassenger is relatively poor

As shown in the examples in Figure 7 it can be seen thatthe wrongly recognized transporter and passenger imagesinclude vehicle face information mainly and very little vehiclebody information as far as themain vehicle face is concerned

8 Advances in Multimedia

Figure 7 Examples of wrongly recognized vehicles images Firstrow wrongly recognized as passenger which should be transporterSecond row wrongly recognized as transporter which should bepassenger

09

0 25 50 75 100 125 150 175 200

Epochs(times10)

WithoutPre-trained

02

03

04

05

06

Accu

racy 07

08

Figure 8 Comparison of training process between pretrained andwithout pretraining

Table 3 Classification precision recall and f1-score of CNNs

Type Precision Recall F1-scoreMotorcycle 097 095 096Transporter 087 085 086Passenger 090 091 090Other 093 093 093

these vehicles images are so similar in profile that it is still achallenge to recognize same images in Figure 7 manually

44 PretrainingApproachExperiment Weare eager for betterperformance of our classification model C4M3F2 Hereunsupervised pretraining has been used for optimization ofour classification model 17395 sheared vehicles images areobtained by shearing the unlabeled vehicles images

We pretrained the parameters of every convolutionallayer for 2000 epochs and then supervised training the modelon our sheared training set and tested it on our shearedtesting set the results is shown in Figure 8 In the processof training the conclusion is that the effect is more obviousin the previous epochs and the overall training process isstable relatively Ultimately the accuracy of pretrained CNNsis 9350 which is 208 higher than the 9142 of CNNswithout pretraining

Table 4 Classification precision recall and f1-score of pretrainedCNNs based on sheared dataset

Type Precision Recall F1-scoreMotorcycle 099 099 099Transporter 090 099 099Passenger 091 092 092Other 095 096 095

Table 5 Classification precision recall and f1-score of pretrainedCNNs based on original dataset

Type Precision Recall F1-scoreMotorcycle 099 099 099Transporter 079 082 081Passenger 084 079 081Other 090 094 092

Table 6 The classification accuracy under different strategies

Original ShearedCNNsWithout pre-training 8689 9142CNNs with pre-training 8829 935

By analyzing the classification performance of pretrainedCNNs shown in Table 4 we can draw a conclusion thatits performance is better than the one of CNNs withoutpretraining shown in Table 3 especially for the classificationof transporter and passenger In summary pretrained CNNis more effective in recognizing the vehicles categories and itis a state-of-the-art approach for vehicle classification

In the end to verify the effect of detection in the entiresystem we conducted ablation study by pretraining andtesting our model on the original dataset which has notbeen sheared by YOLOv2 and contains a large quantity ofirrelevant background Ultimately the accuracy of pretrainedCNNs on original dataset is 8829 which is 521 lowerthan the 935 of pretrained CNNs on sheared dataset andeven lower than the 9142 of CNNs without pretrainingon sheared dataset the classification performance is shownin Table 5 And according the classification accuracy inTable 6 we can conclude that this ablation study confirmsthe essentiality of detection virtually in the whole vehicleclassification system

5 Conclusions

A classification method based on CNNs has been detailedin this paper To improve the accuracy we used vehicledetection to removing the unrelated background for facili-tating the feature extraction and vehicle classification Thenan autoencoder-based layer-wise unsupervised pretraining isintroduced to improve the CNNs model by enhancing theclassification performance Several state-of-the-art methodshave been evaluated on our labeled data set containing fourcategories of motorcycle transporter passenger and others

Advances in Multimedia 9

Experimental results have demonstrated that the pretrainedCNNsmethod based on vehicle detection is themost effectivefor vehicle classification

In addition the success of our vehicle classification makesa vehicle color or logo recognition system possible in ourrefueling behavior analysis meanwhile it is a great help tourban computing intelligent transportation system etc

Data Availability

The data used to support the findings of this study areavailable from the corresponding author upon request

Conflicts of Interest

The authors declare that they have no conflicts of interest

Acknowledgments

This research is supported by Youth Innovation PromotionAssociation CAS (2015355) The authors gratefully acknowl-edge the invaluable contribution of Yupeng Ma and themembers of his laboratory during this collaboration

References

[1] DHe C Lang S Feng XDu andC Zhang ldquoVehicle detectionand classification based on convolutional neural networkrdquo inProceedings of the 7th International Conference on InternetMultimedia Computing and Service 2015

[2] J F Forren and D Jaarsma ldquoTraffic monitoring by tire noiserdquoComputer Standards amp Interfaces vol 20 pp 466-467 1999

[3] J George A Cyril B I Koshy and L Mary ldquoExploring SoundSignature for VehicleDetection and Classification Using ANNrdquoInternational Journal on Soft Computing vol 4 no 2 pp 29ndash362013

[4] J George L Mary and K S Riyas ldquoVehicle detection andclassification from acoustic signal using ANN and KNNrdquo inProceedings of the 2013 International Conference on ControlCommunication and Computing (ICCC rsquo13) pp 436ndash439Thiruvananthapuram India 2013

[5] KWang R Wang Y Feng et al ldquoVehicle recognition in acous-tic sensor networks via sparse representationrdquo in Proceedings ofthe 2014 IEEE International Conference onMultimedia and ExpoWorkshops (ICMEW rsquo14) pp 1ndash4 Chengdu China 2014

[6] A Duzdar and G Kompa ldquoApplications using a low-cost base-band pulsed microwave radar sensorrdquo in Proceedings of the 18thIEEE Instrumentation andMeasurement Technology Conference(IMTC rsquo01) vol 1 of Rediscovering Measurement in the Age ofInformatics pp 239ndash243 IEEE Budapest Hungary 2001

[7] H-T Kim and B Song ldquoVehicle recognition based on radarand vision sensor fusion for automatic emergency brakingrdquoin Proceedings of the 13th International Conference on ControlAutomation and Systems (ICCAS rsquo13) pp 1342ndash1346 GwangjuRepublic of Korea 2013

[8] Y Jo and I Jung ldquoAnalysis of vehicle detection with wsn-basedultrasonic sensorsrdquo Sensors vol 14 no 8 pp 14050ndash14069 2014

[9] Y Iwasaki MMisumi and T Nakamiya ldquoRobust vehicle detec-tion under various environments to realize road traffic flow

surveillance using an infrared thermal camerardquo The ScientificWorld Journal vol 2015 Article ID 947272 2015

[10] J Lan Y Xiang L Wang and Y Shi ldquoVehicle detection andclassification by measuring and processing magnetic signalrdquoMeasurement vol 44 no 1 pp 174ndash180 2011

[11] B Li T Zhang and T Xia ldquoVehicle Detection from 3DLidar Using Fully Convolutional Networkrdquo httpsarxivorgabs160807916 2016

[12] V Kastrinaki M Zervakis and K Kalaitzakis ldquoA survey ofvideo processing techniques for traffic applicationsrdquo Image andVision Computing vol 21 no 4 pp 359ndash381 2003

[13] F M Kazemi S Samadi H R Poorreza and M-R Akbar-zadeh-T ldquoVehicle recognition based on fourier wavelet andcurvelet transforms -A comparative studyrdquo inProceedings of the4th International Conference on Information Technology-NewGenerations ITNG rsquo07 pp 939-940 Las Vegas Nev USA 2007

[14] J Y Ng and Y H Tay ldquoImage-based Vehicle ClassificationSystemrdquo httpsarxivorgabs12042114 2012

[15] R A Hadi G Sulong and L E George ldquoVehicle Detectionand Tracking Techniques A Concise Reviewrdquo Signal amp ImageProcessing An International Journal vol 5 no 1 pp 1ndash12 2014

[16] J Arrospide and L Salgado ldquoA study of feature combinationfor vehicle detection based on image processingrdquoThe ScientificWorld Journal vol 2014 Article ID 196251 13 pages 2014

[17] P Piyush R Rajan L Mary and B I Koshy ldquoVehicle detectionand classification using audio-visual cuesrdquo in Proceedings of the3rd International Conference on Signal Processing and IntegratedNetworks (SPIN rsquo16) pp 726ndash730 Noida India 2016

[18] Z Chen T Ellis and S A Velastin ldquoVehicle detection trackingand classification in urban trafficrdquo in Proceedings of the 15thInternational IEEE Conference on Intelligent TransportationSystems ITSC rsquo12 pp 951ndash956 Anchorage Alaska USA 2012

[19] X Wen L Shao Y Xue and W Fang ldquoA rapid learning algo-rithm for vehicle classificationrdquo Information Sciences vol 295pp 395ndash406 2015

[20] P Mishra and B Banerjee ldquoMultiple Kernel based KNNClassifiers for Vehicle Classificationrdquo International Journal ofComputer Applications vol 71 no 6 pp 1ndash7 2013

[21] A Tourani and A Shahbahrami ldquoVehicle counting methodbased on digital image processing algorithmsrdquo in Proceedingsof the 2nd International Conference on Pattern Recognition andImage Analysis IPRIA rsquo15 pp 1ndash6 Rasht Iran 2015

[22] H Wang Y Cai and L Chen ldquoA vehicle detection algorithmbased on deep belief networkrdquoThe Scientific World Journal vol2014 Article ID 647380 7 pages 2014

[23] M Yi F Yang E Blashchet al ldquoVehicleClassification inWAMIImagery using Deep Networkrdquo in Proceedings of the SPIE 9838Sensors and Systems for Space Applicatioins IX 2016

[24] B Li T Zhang and T Xia ldquoVehicle detection from 3Dlidar using fully convolutional networkrdquo in Proceedings of theRobotics Science and Systems 2016

[25] Y Lecun B Boser J S Denker et al ldquoBackpropagation appliedto handwritten zip code recognitionrdquo Neural Computation vol1 no 4 pp 541ndash551 1989

[26] Y Lecun L Bottou Y Bengio and P Haffner ldquoGradient-basedlearning applied to document recognitionrdquo Proceedings of theIEEE vol 86 no 11 pp 2278ndash2323 1998

[27] A Krizhevsky I Sutskever and G E Hinton ldquoImageNetclassification with deep convolutional neural networksrdquo NeuralInformation Processing Systems pp 1097ndash1105 2012

10 Advances in Multimedia

[28] P Sermanet D Eigen X Zhang M Mathieu R Fergus andY Lecun ldquoOverFeat Integrated Recognition Localization andDetection using Convolutional Networksrdquo httpsarxivorgabs13126229 2013

[29] R Girshick J Donahue T Darrell and J Malik ldquoRich fea-ture hierarchies for accurate object detection and semanticsegmentationrdquo in Proceedings of the 27th IEEE Conference onComputer Vision and Pattern Recognition (CVPR rsquo14) pp 580ndash587 Columbus Ohio USA 2014

[30] R Girshick ldquoFast R-CNNrdquo in Proceedings of the 15th IEEEInternational Conference on Computer Vision (ICCV rsquo15) pp1440ndash1448 Santiago Chile 2015

[31] J Redmon S Divvala R Girshick and A Farhadi ldquoYou onlylook once Unified real-time object detectionrdquo in Proceedingsof the 2016 IEEE Conference on Computer Vision and PatternRecognition (CVPR rsquo16) pp 779ndash788 Las Vegas Nev USA2016

[32] S Ren K He R Girshick and J Sun ldquoFaster R-CNN TowardsReal-Time Object Detection with Region Proposal NetworksrdquoIEEE Transactions on Pattern Analysis andMachine Intelligencevol 39 no 6 pp 1137ndash1149 2017

[33] W Liu D Anguelov D Erhan et al ldquoSSD single shot multiboxdetectorrdquo in Proceedings of the Computer Vision ndash ECCV 2016vol 9905 of Lecture Notes in Computer Science pp 21ndash37 2016

[34] J Dai Y Li K He and J Sun ldquoR-FCN Object Detectionvia Region-based Fully Convolutional Networksrdquohttpsarxivorgabs160506409 2016

[35] J Redmon andA Farhadi ldquoYOLO9000 Better faster strongerrdquoin Proceedings of the 30th IEEE Conference on Computer Visionand Pattern Recognition (CVPR rsquo17) pp 6517ndash6525 HonoluluHawaii USA 2017

[36] D Erhan Y Bengio A Courville P-A Manzagol P Vincentand S Bengio ldquoWhy does unsupervised pre-training help deeplearningrdquo Journal of Machine Learning Research vol 11 pp625ndash660 2010

[37] G E Hinton S Osindero and Y Teh ldquoA fast learning algorithmfor deep belief netsrdquoNeural Computation vol 18 no 7 pp 1527ndash1554 2006

[38] Y Bengio P Lamblin D Popovici and H Larochelle ldquoGreedylayer-wise training of deep networksrdquo Neural Information Pro-cessing Systems pp 153ndash160 2007

[39] J Masci U Meier D Ciresan and J Schmidhuber ldquoStackedConvolutional Auto-Encoders for Hierarchical Feature Extrac-tionrdquo in Proceedings of the Artificial Neural Networks andMachine Learning - ICANN 2011 vol 6791 of Lecture Notesin Computer Science pp 52ndash59 Springer Berlin HeidelbergGermany 2011

[40] G E Hinton and R S Zemel ldquoAutoencoders minimum de-scription length and helmholtz free energyrdquoNeural InformationProcessing Systems pp 3ndash10 1994

[41] M D Zeiler D Krishnan GW Taylor and R Fergus ldquoDecon-volutional networksrdquo in Proceedings of the IEEE ComputerSociety Conference on Computer Vision and Pattern Recognition(CVPRrsquo 10) pp 2528ndash2535 San Francisco Calif USA 2010

[42] N Dalal and B Triggs ldquoHistograms of oriented gradients forhuman detectionrdquo in Proceedings of the IEEE Computer SocietyConference on Computer Vision and Pattern Recognition (CVPRrsquo05) vol 1 pp 886ndash893 2005

[43] E Tola V Lepetit and P Fua ldquoDAISY an efficient densedescriptor applied to wide-baseline stereordquo IEEE Transactionson Pattern Analysis and Machine Intelligence vol 32 no 5 pp815ndash830 2010

[44] E Rublee V Rabaud K Konolige and G Bradski ldquoORB anefficient alternative to SIFT or SURFrdquo in Proceedings of the IEEEInternational Conference on Computer Vision (ICCV rsquo11) pp2564ndash2571 Barcelona Spain 2011

[45] D G Lowe ldquoObject recognition from local scale-invariantfeaturesrdquo inProceedings of the 7th IEEE International Conferenceon Computer Vision (ICCV rsquo99) vol 2 pp 1150ndash1157 KerkyraGreece 1999

International Journal of

AerospaceEngineeringHindawiwwwhindawicom Volume 2018

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Active and Passive Electronic Components

VLSI Design

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Shock and Vibration

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawiwwwhindawicom

Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Control Scienceand Engineering

Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

SensorsJournal of

Hindawiwwwhindawicom Volume 2018

International Journal of

RotatingMachinery

Hindawiwwwhindawicom Volume 2018

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Navigation and Observation

International Journal of

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

Submit your manuscripts atwwwhindawicom

Page 8: Pretraining Convolutional Neural Networks for Image-Based ...downloads.hindawi.com/journals/am/2018/3138278.pdf · AdvancesinMultimedia 0.9 0 25 50 75 100 125 150 175 200 Epochs(×10)

8 Advances in Multimedia

Figure 7 Examples of wrongly recognized vehicles images Firstrow wrongly recognized as passenger which should be transporterSecond row wrongly recognized as transporter which should bepassenger

09

0 25 50 75 100 125 150 175 200

Epochs(times10)

WithoutPre-trained

02

03

04

05

06

Accu

racy 07

08

Figure 8 Comparison of training process between pretrained andwithout pretraining

Table 3 Classification precision recall and f1-score of CNNs

Type Precision Recall F1-scoreMotorcycle 097 095 096Transporter 087 085 086Passenger 090 091 090Other 093 093 093

these vehicles images are so similar in profile that it is still achallenge to recognize same images in Figure 7 manually

44 PretrainingApproachExperiment Weare eager for betterperformance of our classification model C4M3F2 Hereunsupervised pretraining has been used for optimization ofour classification model 17395 sheared vehicles images areobtained by shearing the unlabeled vehicles images

We pretrained the parameters of every convolutionallayer for 2000 epochs and then supervised training the modelon our sheared training set and tested it on our shearedtesting set the results is shown in Figure 8 In the processof training the conclusion is that the effect is more obviousin the previous epochs and the overall training process isstable relatively Ultimately the accuracy of pretrained CNNsis 9350 which is 208 higher than the 9142 of CNNswithout pretraining

Table 4 Classification precision recall and f1-score of pretrainedCNNs based on sheared dataset

Type Precision Recall F1-scoreMotorcycle 099 099 099Transporter 090 099 099Passenger 091 092 092Other 095 096 095

Table 5 Classification precision recall and f1-score of pretrainedCNNs based on original dataset

Type Precision Recall F1-scoreMotorcycle 099 099 099Transporter 079 082 081Passenger 084 079 081Other 090 094 092

Table 6 The classification accuracy under different strategies

Original ShearedCNNsWithout pre-training 8689 9142CNNs with pre-training 8829 935

By analyzing the classification performance of pretrainedCNNs shown in Table 4 we can draw a conclusion thatits performance is better than the one of CNNs withoutpretraining shown in Table 3 especially for the classificationof transporter and passenger In summary pretrained CNNis more effective in recognizing the vehicles categories and itis a state-of-the-art approach for vehicle classification

In the end to verify the effect of detection in the entiresystem we conducted ablation study by pretraining andtesting our model on the original dataset which has notbeen sheared by YOLOv2 and contains a large quantity ofirrelevant background Ultimately the accuracy of pretrainedCNNs on original dataset is 8829 which is 521 lowerthan the 935 of pretrained CNNs on sheared dataset andeven lower than the 9142 of CNNs without pretrainingon sheared dataset the classification performance is shownin Table 5 And according the classification accuracy inTable 6 we can conclude that this ablation study confirmsthe essentiality of detection virtually in the whole vehicleclassification system

5 Conclusions

A classification method based on CNNs has been detailedin this paper To improve the accuracy we used vehicledetection to removing the unrelated background for facili-tating the feature extraction and vehicle classification Thenan autoencoder-based layer-wise unsupervised pretraining isintroduced to improve the CNNs model by enhancing theclassification performance Several state-of-the-art methodshave been evaluated on our labeled data set containing fourcategories of motorcycle transporter passenger and others

Advances in Multimedia 9

Experimental results have demonstrated that the pretrainedCNNsmethod based on vehicle detection is themost effectivefor vehicle classification

In addition the success of our vehicle classification makesa vehicle color or logo recognition system possible in ourrefueling behavior analysis meanwhile it is a great help tourban computing intelligent transportation system etc

Data Availability

The data used to support the findings of this study areavailable from the corresponding author upon request

Conflicts of Interest

The authors declare that they have no conflicts of interest

Acknowledgments

This research is supported by Youth Innovation PromotionAssociation CAS (2015355) The authors gratefully acknowl-edge the invaluable contribution of Yupeng Ma and themembers of his laboratory during this collaboration

References

[1] DHe C Lang S Feng XDu andC Zhang ldquoVehicle detectionand classification based on convolutional neural networkrdquo inProceedings of the 7th International Conference on InternetMultimedia Computing and Service 2015

[2] J F Forren and D Jaarsma ldquoTraffic monitoring by tire noiserdquoComputer Standards amp Interfaces vol 20 pp 466-467 1999

[3] J George A Cyril B I Koshy and L Mary ldquoExploring SoundSignature for VehicleDetection and Classification Using ANNrdquoInternational Journal on Soft Computing vol 4 no 2 pp 29ndash362013

[4] J George L Mary and K S Riyas ldquoVehicle detection andclassification from acoustic signal using ANN and KNNrdquo inProceedings of the 2013 International Conference on ControlCommunication and Computing (ICCC rsquo13) pp 436ndash439Thiruvananthapuram India 2013

[5] KWang R Wang Y Feng et al ldquoVehicle recognition in acous-tic sensor networks via sparse representationrdquo in Proceedings ofthe 2014 IEEE International Conference onMultimedia and ExpoWorkshops (ICMEW rsquo14) pp 1ndash4 Chengdu China 2014

[6] A Duzdar and G Kompa ldquoApplications using a low-cost base-band pulsed microwave radar sensorrdquo in Proceedings of the 18thIEEE Instrumentation andMeasurement Technology Conference(IMTC rsquo01) vol 1 of Rediscovering Measurement in the Age ofInformatics pp 239ndash243 IEEE Budapest Hungary 2001

[7] H-T Kim and B Song ldquoVehicle recognition based on radarand vision sensor fusion for automatic emergency brakingrdquoin Proceedings of the 13th International Conference on ControlAutomation and Systems (ICCAS rsquo13) pp 1342ndash1346 GwangjuRepublic of Korea 2013

[8] Y Jo and I Jung ldquoAnalysis of vehicle detection with wsn-basedultrasonic sensorsrdquo Sensors vol 14 no 8 pp 14050ndash14069 2014

[9] Y Iwasaki MMisumi and T Nakamiya ldquoRobust vehicle detec-tion under various environments to realize road traffic flow

surveillance using an infrared thermal camerardquo The ScientificWorld Journal vol 2015 Article ID 947272 2015

[10] J Lan Y Xiang L Wang and Y Shi ldquoVehicle detection andclassification by measuring and processing magnetic signalrdquoMeasurement vol 44 no 1 pp 174ndash180 2011

[11] B Li T Zhang and T Xia ldquoVehicle Detection from 3DLidar Using Fully Convolutional Networkrdquo httpsarxivorgabs160807916 2016

[12] V Kastrinaki M Zervakis and K Kalaitzakis ldquoA survey ofvideo processing techniques for traffic applicationsrdquo Image andVision Computing vol 21 no 4 pp 359ndash381 2003

[13] F M Kazemi S Samadi H R Poorreza and M-R Akbar-zadeh-T ldquoVehicle recognition based on fourier wavelet andcurvelet transforms -A comparative studyrdquo inProceedings of the4th International Conference on Information Technology-NewGenerations ITNG rsquo07 pp 939-940 Las Vegas Nev USA 2007

[14] J Y Ng and Y H Tay ldquoImage-based Vehicle ClassificationSystemrdquo httpsarxivorgabs12042114 2012

[15] R A Hadi G Sulong and L E George ldquoVehicle Detectionand Tracking Techniques A Concise Reviewrdquo Signal amp ImageProcessing An International Journal vol 5 no 1 pp 1ndash12 2014

[16] J Arrospide and L Salgado ldquoA study of feature combinationfor vehicle detection based on image processingrdquoThe ScientificWorld Journal vol 2014 Article ID 196251 13 pages 2014

[17] P Piyush R Rajan L Mary and B I Koshy ldquoVehicle detectionand classification using audio-visual cuesrdquo in Proceedings of the3rd International Conference on Signal Processing and IntegratedNetworks (SPIN rsquo16) pp 726ndash730 Noida India 2016

[18] Z Chen T Ellis and S A Velastin ldquoVehicle detection trackingand classification in urban trafficrdquo in Proceedings of the 15thInternational IEEE Conference on Intelligent TransportationSystems ITSC rsquo12 pp 951ndash956 Anchorage Alaska USA 2012

[19] X Wen L Shao Y Xue and W Fang ldquoA rapid learning algo-rithm for vehicle classificationrdquo Information Sciences vol 295pp 395ndash406 2015

[20] P Mishra and B Banerjee ldquoMultiple Kernel based KNNClassifiers for Vehicle Classificationrdquo International Journal ofComputer Applications vol 71 no 6 pp 1ndash7 2013

[21] A Tourani and A Shahbahrami ldquoVehicle counting methodbased on digital image processing algorithmsrdquo in Proceedingsof the 2nd International Conference on Pattern Recognition andImage Analysis IPRIA rsquo15 pp 1ndash6 Rasht Iran 2015

[22] H Wang Y Cai and L Chen ldquoA vehicle detection algorithmbased on deep belief networkrdquoThe Scientific World Journal vol2014 Article ID 647380 7 pages 2014

[23] M Yi F Yang E Blashchet al ldquoVehicleClassification inWAMIImagery using Deep Networkrdquo in Proceedings of the SPIE 9838Sensors and Systems for Space Applicatioins IX 2016

[24] B Li T Zhang and T Xia ldquoVehicle detection from 3Dlidar using fully convolutional networkrdquo in Proceedings of theRobotics Science and Systems 2016

[25] Y Lecun B Boser J S Denker et al ldquoBackpropagation appliedto handwritten zip code recognitionrdquo Neural Computation vol1 no 4 pp 541ndash551 1989

[26] Y Lecun L Bottou Y Bengio and P Haffner ldquoGradient-basedlearning applied to document recognitionrdquo Proceedings of theIEEE vol 86 no 11 pp 2278ndash2323 1998

[27] A Krizhevsky I Sutskever and G E Hinton ldquoImageNetclassification with deep convolutional neural networksrdquo NeuralInformation Processing Systems pp 1097ndash1105 2012

10 Advances in Multimedia

[28] P Sermanet D Eigen X Zhang M Mathieu R Fergus andY Lecun ldquoOverFeat Integrated Recognition Localization andDetection using Convolutional Networksrdquo httpsarxivorgabs13126229 2013

[29] R Girshick J Donahue T Darrell and J Malik ldquoRich fea-ture hierarchies for accurate object detection and semanticsegmentationrdquo in Proceedings of the 27th IEEE Conference onComputer Vision and Pattern Recognition (CVPR rsquo14) pp 580ndash587 Columbus Ohio USA 2014

[30] R Girshick ldquoFast R-CNNrdquo in Proceedings of the 15th IEEEInternational Conference on Computer Vision (ICCV rsquo15) pp1440ndash1448 Santiago Chile 2015

[31] J Redmon S Divvala R Girshick and A Farhadi ldquoYou onlylook once Unified real-time object detectionrdquo in Proceedingsof the 2016 IEEE Conference on Computer Vision and PatternRecognition (CVPR rsquo16) pp 779ndash788 Las Vegas Nev USA2016

[32] S Ren K He R Girshick and J Sun ldquoFaster R-CNN TowardsReal-Time Object Detection with Region Proposal NetworksrdquoIEEE Transactions on Pattern Analysis andMachine Intelligencevol 39 no 6 pp 1137ndash1149 2017

[33] W Liu D Anguelov D Erhan et al ldquoSSD single shot multiboxdetectorrdquo in Proceedings of the Computer Vision ndash ECCV 2016vol 9905 of Lecture Notes in Computer Science pp 21ndash37 2016

[34] J Dai Y Li K He and J Sun ldquoR-FCN Object Detectionvia Region-based Fully Convolutional Networksrdquohttpsarxivorgabs160506409 2016

[35] J Redmon andA Farhadi ldquoYOLO9000 Better faster strongerrdquoin Proceedings of the 30th IEEE Conference on Computer Visionand Pattern Recognition (CVPR rsquo17) pp 6517ndash6525 HonoluluHawaii USA 2017

[36] D Erhan Y Bengio A Courville P-A Manzagol P Vincentand S Bengio ldquoWhy does unsupervised pre-training help deeplearningrdquo Journal of Machine Learning Research vol 11 pp625ndash660 2010

[37] G E Hinton S Osindero and Y Teh ldquoA fast learning algorithmfor deep belief netsrdquoNeural Computation vol 18 no 7 pp 1527ndash1554 2006

[38] Y Bengio P Lamblin D Popovici and H Larochelle ldquoGreedylayer-wise training of deep networksrdquo Neural Information Pro-cessing Systems pp 153ndash160 2007

[39] J Masci U Meier D Ciresan and J Schmidhuber ldquoStackedConvolutional Auto-Encoders for Hierarchical Feature Extrac-tionrdquo in Proceedings of the Artificial Neural Networks andMachine Learning - ICANN 2011 vol 6791 of Lecture Notesin Computer Science pp 52ndash59 Springer Berlin HeidelbergGermany 2011

[40] G E Hinton and R S Zemel ldquoAutoencoders minimum de-scription length and helmholtz free energyrdquoNeural InformationProcessing Systems pp 3ndash10 1994

[41] M D Zeiler D Krishnan GW Taylor and R Fergus ldquoDecon-volutional networksrdquo in Proceedings of the IEEE ComputerSociety Conference on Computer Vision and Pattern Recognition(CVPRrsquo 10) pp 2528ndash2535 San Francisco Calif USA 2010

[42] N Dalal and B Triggs ldquoHistograms of oriented gradients forhuman detectionrdquo in Proceedings of the IEEE Computer SocietyConference on Computer Vision and Pattern Recognition (CVPRrsquo05) vol 1 pp 886ndash893 2005

[43] E Tola V Lepetit and P Fua ldquoDAISY an efficient densedescriptor applied to wide-baseline stereordquo IEEE Transactionson Pattern Analysis and Machine Intelligence vol 32 no 5 pp815ndash830 2010

[44] E Rublee V Rabaud K Konolige and G Bradski ldquoORB anefficient alternative to SIFT or SURFrdquo in Proceedings of the IEEEInternational Conference on Computer Vision (ICCV rsquo11) pp2564ndash2571 Barcelona Spain 2011

[45] D G Lowe ldquoObject recognition from local scale-invariantfeaturesrdquo inProceedings of the 7th IEEE International Conferenceon Computer Vision (ICCV rsquo99) vol 2 pp 1150ndash1157 KerkyraGreece 1999

International Journal of

AerospaceEngineeringHindawiwwwhindawicom Volume 2018

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Active and Passive Electronic Components

VLSI Design

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Shock and Vibration

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawiwwwhindawicom

Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Control Scienceand Engineering

Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

SensorsJournal of

Hindawiwwwhindawicom Volume 2018

International Journal of

RotatingMachinery

Hindawiwwwhindawicom Volume 2018

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Navigation and Observation

International Journal of

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

Submit your manuscripts atwwwhindawicom

Page 9: Pretraining Convolutional Neural Networks for Image-Based ...downloads.hindawi.com/journals/am/2018/3138278.pdf · AdvancesinMultimedia 0.9 0 25 50 75 100 125 150 175 200 Epochs(×10)

Advances in Multimedia 9

Experimental results have demonstrated that the pretrainedCNNsmethod based on vehicle detection is themost effectivefor vehicle classification

In addition the success of our vehicle classification makesa vehicle color or logo recognition system possible in ourrefueling behavior analysis meanwhile it is a great help tourban computing intelligent transportation system etc

Data Availability

The data used to support the findings of this study areavailable from the corresponding author upon request

Conflicts of Interest

The authors declare that they have no conflicts of interest

Acknowledgments

This research is supported by Youth Innovation PromotionAssociation CAS (2015355) The authors gratefully acknowl-edge the invaluable contribution of Yupeng Ma and themembers of his laboratory during this collaboration

References

[1] DHe C Lang S Feng XDu andC Zhang ldquoVehicle detectionand classification based on convolutional neural networkrdquo inProceedings of the 7th International Conference on InternetMultimedia Computing and Service 2015

[2] J F Forren and D Jaarsma ldquoTraffic monitoring by tire noiserdquoComputer Standards amp Interfaces vol 20 pp 466-467 1999

[3] J George A Cyril B I Koshy and L Mary ldquoExploring SoundSignature for VehicleDetection and Classification Using ANNrdquoInternational Journal on Soft Computing vol 4 no 2 pp 29ndash362013

[4] J George L Mary and K S Riyas ldquoVehicle detection andclassification from acoustic signal using ANN and KNNrdquo inProceedings of the 2013 International Conference on ControlCommunication and Computing (ICCC rsquo13) pp 436ndash439Thiruvananthapuram India 2013

[5] KWang R Wang Y Feng et al ldquoVehicle recognition in acous-tic sensor networks via sparse representationrdquo in Proceedings ofthe 2014 IEEE International Conference onMultimedia and ExpoWorkshops (ICMEW rsquo14) pp 1ndash4 Chengdu China 2014

[6] A Duzdar and G Kompa ldquoApplications using a low-cost base-band pulsed microwave radar sensorrdquo in Proceedings of the 18thIEEE Instrumentation andMeasurement Technology Conference(IMTC rsquo01) vol 1 of Rediscovering Measurement in the Age ofInformatics pp 239ndash243 IEEE Budapest Hungary 2001

[7] H-T Kim and B Song ldquoVehicle recognition based on radarand vision sensor fusion for automatic emergency brakingrdquoin Proceedings of the 13th International Conference on ControlAutomation and Systems (ICCAS rsquo13) pp 1342ndash1346 GwangjuRepublic of Korea 2013

[8] Y Jo and I Jung ldquoAnalysis of vehicle detection with wsn-basedultrasonic sensorsrdquo Sensors vol 14 no 8 pp 14050ndash14069 2014

[9] Y Iwasaki MMisumi and T Nakamiya ldquoRobust vehicle detec-tion under various environments to realize road traffic flow

surveillance using an infrared thermal camerardquo The ScientificWorld Journal vol 2015 Article ID 947272 2015

[10] J Lan Y Xiang L Wang and Y Shi ldquoVehicle detection andclassification by measuring and processing magnetic signalrdquoMeasurement vol 44 no 1 pp 174ndash180 2011

[11] B Li T Zhang and T Xia ldquoVehicle Detection from 3DLidar Using Fully Convolutional Networkrdquo httpsarxivorgabs160807916 2016

[12] V Kastrinaki M Zervakis and K Kalaitzakis ldquoA survey ofvideo processing techniques for traffic applicationsrdquo Image andVision Computing vol 21 no 4 pp 359ndash381 2003

[13] F M Kazemi S Samadi H R Poorreza and M-R Akbar-zadeh-T ldquoVehicle recognition based on fourier wavelet andcurvelet transforms -A comparative studyrdquo inProceedings of the4th International Conference on Information Technology-NewGenerations ITNG rsquo07 pp 939-940 Las Vegas Nev USA 2007

[14] J Y Ng and Y H Tay ldquoImage-based Vehicle ClassificationSystemrdquo httpsarxivorgabs12042114 2012

[15] R A Hadi G Sulong and L E George ldquoVehicle Detectionand Tracking Techniques A Concise Reviewrdquo Signal amp ImageProcessing An International Journal vol 5 no 1 pp 1ndash12 2014

[16] J Arrospide and L Salgado ldquoA study of feature combinationfor vehicle detection based on image processingrdquoThe ScientificWorld Journal vol 2014 Article ID 196251 13 pages 2014

[17] P Piyush R Rajan L Mary and B I Koshy ldquoVehicle detectionand classification using audio-visual cuesrdquo in Proceedings of the3rd International Conference on Signal Processing and IntegratedNetworks (SPIN rsquo16) pp 726ndash730 Noida India 2016

[18] Z Chen T Ellis and S A Velastin ldquoVehicle detection trackingand classification in urban trafficrdquo in Proceedings of the 15thInternational IEEE Conference on Intelligent TransportationSystems ITSC rsquo12 pp 951ndash956 Anchorage Alaska USA 2012

[19] X Wen L Shao Y Xue and W Fang ldquoA rapid learning algo-rithm for vehicle classificationrdquo Information Sciences vol 295pp 395ndash406 2015

[20] P Mishra and B Banerjee ldquoMultiple Kernel based KNNClassifiers for Vehicle Classificationrdquo International Journal ofComputer Applications vol 71 no 6 pp 1ndash7 2013

[21] A Tourani and A Shahbahrami ldquoVehicle counting methodbased on digital image processing algorithmsrdquo in Proceedingsof the 2nd International Conference on Pattern Recognition andImage Analysis IPRIA rsquo15 pp 1ndash6 Rasht Iran 2015

[22] H Wang Y Cai and L Chen ldquoA vehicle detection algorithmbased on deep belief networkrdquoThe Scientific World Journal vol2014 Article ID 647380 7 pages 2014

[23] M Yi F Yang E Blashchet al ldquoVehicleClassification inWAMIImagery using Deep Networkrdquo in Proceedings of the SPIE 9838Sensors and Systems for Space Applicatioins IX 2016

[24] B Li T Zhang and T Xia ldquoVehicle detection from 3Dlidar using fully convolutional networkrdquo in Proceedings of theRobotics Science and Systems 2016

[25] Y Lecun B Boser J S Denker et al ldquoBackpropagation appliedto handwritten zip code recognitionrdquo Neural Computation vol1 no 4 pp 541ndash551 1989

[26] Y Lecun L Bottou Y Bengio and P Haffner ldquoGradient-basedlearning applied to document recognitionrdquo Proceedings of theIEEE vol 86 no 11 pp 2278ndash2323 1998

[27] A Krizhevsky I Sutskever and G E Hinton ldquoImageNetclassification with deep convolutional neural networksrdquo NeuralInformation Processing Systems pp 1097ndash1105 2012

10 Advances in Multimedia

[28] P Sermanet D Eigen X Zhang M Mathieu R Fergus andY Lecun ldquoOverFeat Integrated Recognition Localization andDetection using Convolutional Networksrdquo httpsarxivorgabs13126229 2013

[29] R Girshick J Donahue T Darrell and J Malik ldquoRich fea-ture hierarchies for accurate object detection and semanticsegmentationrdquo in Proceedings of the 27th IEEE Conference onComputer Vision and Pattern Recognition (CVPR rsquo14) pp 580ndash587 Columbus Ohio USA 2014

[30] R Girshick ldquoFast R-CNNrdquo in Proceedings of the 15th IEEEInternational Conference on Computer Vision (ICCV rsquo15) pp1440ndash1448 Santiago Chile 2015

[31] J Redmon S Divvala R Girshick and A Farhadi ldquoYou onlylook once Unified real-time object detectionrdquo in Proceedingsof the 2016 IEEE Conference on Computer Vision and PatternRecognition (CVPR rsquo16) pp 779ndash788 Las Vegas Nev USA2016

[32] S Ren K He R Girshick and J Sun ldquoFaster R-CNN TowardsReal-Time Object Detection with Region Proposal NetworksrdquoIEEE Transactions on Pattern Analysis andMachine Intelligencevol 39 no 6 pp 1137ndash1149 2017

[33] W Liu D Anguelov D Erhan et al ldquoSSD single shot multiboxdetectorrdquo in Proceedings of the Computer Vision ndash ECCV 2016vol 9905 of Lecture Notes in Computer Science pp 21ndash37 2016

[34] J Dai Y Li K He and J Sun ldquoR-FCN Object Detectionvia Region-based Fully Convolutional Networksrdquohttpsarxivorgabs160506409 2016

[35] J Redmon andA Farhadi ldquoYOLO9000 Better faster strongerrdquoin Proceedings of the 30th IEEE Conference on Computer Visionand Pattern Recognition (CVPR rsquo17) pp 6517ndash6525 HonoluluHawaii USA 2017

[36] D Erhan Y Bengio A Courville P-A Manzagol P Vincentand S Bengio ldquoWhy does unsupervised pre-training help deeplearningrdquo Journal of Machine Learning Research vol 11 pp625ndash660 2010

[37] G E Hinton S Osindero and Y Teh ldquoA fast learning algorithmfor deep belief netsrdquoNeural Computation vol 18 no 7 pp 1527ndash1554 2006

[38] Y Bengio P Lamblin D Popovici and H Larochelle ldquoGreedylayer-wise training of deep networksrdquo Neural Information Pro-cessing Systems pp 153ndash160 2007

[39] J Masci U Meier D Ciresan and J Schmidhuber ldquoStackedConvolutional Auto-Encoders for Hierarchical Feature Extrac-tionrdquo in Proceedings of the Artificial Neural Networks andMachine Learning - ICANN 2011 vol 6791 of Lecture Notesin Computer Science pp 52ndash59 Springer Berlin HeidelbergGermany 2011

[40] G E Hinton and R S Zemel ldquoAutoencoders minimum de-scription length and helmholtz free energyrdquoNeural InformationProcessing Systems pp 3ndash10 1994

[41] M D Zeiler D Krishnan GW Taylor and R Fergus ldquoDecon-volutional networksrdquo in Proceedings of the IEEE ComputerSociety Conference on Computer Vision and Pattern Recognition(CVPRrsquo 10) pp 2528ndash2535 San Francisco Calif USA 2010

[42] N Dalal and B Triggs ldquoHistograms of oriented gradients forhuman detectionrdquo in Proceedings of the IEEE Computer SocietyConference on Computer Vision and Pattern Recognition (CVPRrsquo05) vol 1 pp 886ndash893 2005

[43] E Tola V Lepetit and P Fua ldquoDAISY an efficient densedescriptor applied to wide-baseline stereordquo IEEE Transactionson Pattern Analysis and Machine Intelligence vol 32 no 5 pp815ndash830 2010

[44] E Rublee V Rabaud K Konolige and G Bradski ldquoORB anefficient alternative to SIFT or SURFrdquo in Proceedings of the IEEEInternational Conference on Computer Vision (ICCV rsquo11) pp2564ndash2571 Barcelona Spain 2011

[45] D G Lowe ldquoObject recognition from local scale-invariantfeaturesrdquo inProceedings of the 7th IEEE International Conferenceon Computer Vision (ICCV rsquo99) vol 2 pp 1150ndash1157 KerkyraGreece 1999

International Journal of

AerospaceEngineeringHindawiwwwhindawicom Volume 2018

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Active and Passive Electronic Components

VLSI Design

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Shock and Vibration

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawiwwwhindawicom

Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Control Scienceand Engineering

Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

SensorsJournal of

Hindawiwwwhindawicom Volume 2018

International Journal of

RotatingMachinery

Hindawiwwwhindawicom Volume 2018

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Navigation and Observation

International Journal of

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

Submit your manuscripts atwwwhindawicom

Page 10: Pretraining Convolutional Neural Networks for Image-Based ...downloads.hindawi.com/journals/am/2018/3138278.pdf · AdvancesinMultimedia 0.9 0 25 50 75 100 125 150 175 200 Epochs(×10)

10 Advances in Multimedia

[28] P Sermanet D Eigen X Zhang M Mathieu R Fergus andY Lecun ldquoOverFeat Integrated Recognition Localization andDetection using Convolutional Networksrdquo httpsarxivorgabs13126229 2013

[29] R Girshick J Donahue T Darrell and J Malik ldquoRich fea-ture hierarchies for accurate object detection and semanticsegmentationrdquo in Proceedings of the 27th IEEE Conference onComputer Vision and Pattern Recognition (CVPR rsquo14) pp 580ndash587 Columbus Ohio USA 2014

[30] R Girshick ldquoFast R-CNNrdquo in Proceedings of the 15th IEEEInternational Conference on Computer Vision (ICCV rsquo15) pp1440ndash1448 Santiago Chile 2015

[31] J Redmon S Divvala R Girshick and A Farhadi ldquoYou onlylook once Unified real-time object detectionrdquo in Proceedingsof the 2016 IEEE Conference on Computer Vision and PatternRecognition (CVPR rsquo16) pp 779ndash788 Las Vegas Nev USA2016

[32] S Ren K He R Girshick and J Sun ldquoFaster R-CNN TowardsReal-Time Object Detection with Region Proposal NetworksrdquoIEEE Transactions on Pattern Analysis andMachine Intelligencevol 39 no 6 pp 1137ndash1149 2017

[33] W Liu D Anguelov D Erhan et al ldquoSSD single shot multiboxdetectorrdquo in Proceedings of the Computer Vision ndash ECCV 2016vol 9905 of Lecture Notes in Computer Science pp 21ndash37 2016

[34] J Dai Y Li K He and J Sun ldquoR-FCN Object Detectionvia Region-based Fully Convolutional Networksrdquohttpsarxivorgabs160506409 2016

[35] J Redmon andA Farhadi ldquoYOLO9000 Better faster strongerrdquoin Proceedings of the 30th IEEE Conference on Computer Visionand Pattern Recognition (CVPR rsquo17) pp 6517ndash6525 HonoluluHawaii USA 2017

[36] D Erhan Y Bengio A Courville P-A Manzagol P Vincentand S Bengio ldquoWhy does unsupervised pre-training help deeplearningrdquo Journal of Machine Learning Research vol 11 pp625ndash660 2010

[37] G E Hinton S Osindero and Y Teh ldquoA fast learning algorithmfor deep belief netsrdquoNeural Computation vol 18 no 7 pp 1527ndash1554 2006

[38] Y Bengio P Lamblin D Popovici and H Larochelle ldquoGreedylayer-wise training of deep networksrdquo Neural Information Pro-cessing Systems pp 153ndash160 2007

[39] J Masci U Meier D Ciresan and J Schmidhuber ldquoStackedConvolutional Auto-Encoders for Hierarchical Feature Extrac-tionrdquo in Proceedings of the Artificial Neural Networks andMachine Learning - ICANN 2011 vol 6791 of Lecture Notesin Computer Science pp 52ndash59 Springer Berlin HeidelbergGermany 2011

[40] G E Hinton and R S Zemel ldquoAutoencoders minimum de-scription length and helmholtz free energyrdquoNeural InformationProcessing Systems pp 3ndash10 1994

[41] M D Zeiler D Krishnan GW Taylor and R Fergus ldquoDecon-volutional networksrdquo in Proceedings of the IEEE ComputerSociety Conference on Computer Vision and Pattern Recognition(CVPRrsquo 10) pp 2528ndash2535 San Francisco Calif USA 2010

[42] N Dalal and B Triggs ldquoHistograms of oriented gradients forhuman detectionrdquo in Proceedings of the IEEE Computer SocietyConference on Computer Vision and Pattern Recognition (CVPRrsquo05) vol 1 pp 886ndash893 2005

[43] E Tola V Lepetit and P Fua ldquoDAISY an efficient densedescriptor applied to wide-baseline stereordquo IEEE Transactionson Pattern Analysis and Machine Intelligence vol 32 no 5 pp815ndash830 2010

[44] E Rublee V Rabaud K Konolige and G Bradski ldquoORB anefficient alternative to SIFT or SURFrdquo in Proceedings of the IEEEInternational Conference on Computer Vision (ICCV rsquo11) pp2564ndash2571 Barcelona Spain 2011

[45] D G Lowe ldquoObject recognition from local scale-invariantfeaturesrdquo inProceedings of the 7th IEEE International Conferenceon Computer Vision (ICCV rsquo99) vol 2 pp 1150ndash1157 KerkyraGreece 1999

International Journal of

AerospaceEngineeringHindawiwwwhindawicom Volume 2018

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Active and Passive Electronic Components

VLSI Design

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Shock and Vibration

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawiwwwhindawicom

Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Control Scienceand Engineering

Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

SensorsJournal of

Hindawiwwwhindawicom Volume 2018

International Journal of

RotatingMachinery

Hindawiwwwhindawicom Volume 2018

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Navigation and Observation

International Journal of

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

Submit your manuscripts atwwwhindawicom

Page 11: Pretraining Convolutional Neural Networks for Image-Based ...downloads.hindawi.com/journals/am/2018/3138278.pdf · AdvancesinMultimedia 0.9 0 25 50 75 100 125 150 175 200 Epochs(×10)

International Journal of

AerospaceEngineeringHindawiwwwhindawicom Volume 2018

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Active and Passive Electronic Components

VLSI Design

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Shock and Vibration

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawiwwwhindawicom

Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Control Scienceand Engineering

Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

SensorsJournal of

Hindawiwwwhindawicom Volume 2018

International Journal of

RotatingMachinery

Hindawiwwwhindawicom Volume 2018

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Navigation and Observation

International Journal of

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

Submit your manuscripts atwwwhindawicom