
Research Article
A Vehicle Detection Algorithm Based on Deep Belief Network

Hai Wang,1 Yingfeng Cai,2 and Long Chen2

1 School of Automotive and Traffic Engineering, Jiangsu University, Zhenjiang 212013, China
2 Automotive Engineering Research Institute, Jiangsu University, Zhenjiang 212013, China

Correspondence should be addressed to Yingfeng Cai; caicaixiao0304@126.com

Received 25 March 2014; Accepted 22 April 2014; Published 15 May 2014

Academic Editor: Yu-Bo Yuan

Copyright © 2014 Hai Wang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Vision based vehicle detection is a critical technology that plays an important role not only in vehicle active safety but also in road video surveillance applications. Traditional shallow-model based vehicle detection algorithms still cannot meet the requirement of accurate vehicle detection in these applications. In this work, a novel deep learning based vehicle detection algorithm with a 2D deep belief network (2D-DBN) is proposed. In the algorithm, the proposed 2D-DBN architecture uses second-order planes instead of first-order vectors as input and uses bilinear projection to retain discriminative information and to determine the size of the deep architecture, which enhances the success rate of vehicle detection. On-road experimental results demonstrate that the algorithm performs better than state-of-the-art vehicle detection algorithms on the testing data sets.

1. Introduction

Robust vision based vehicle detection on the road is to some extent a challenging problem, since highways and urban and city roads are dynamic environments in which the background and illumination are dynamic and time variant. Besides, the shape, color, size, and appearance of vehicles are highly variable. To make this task even more difficult, the ego vehicle and target vehicles are generally in motion, so the size and location of target vehicles mapped to the image are diverse.

Although deep learning for object recognition has been an area of great interest in the machine-learning community, no prior research study has been reported that uses deep learning to establish an on-road vehicle detection method. In this paper, a 2D-DBN based vehicle detection algorithm is proposed.

The main novelty and contributions of this work include the following. (1) A deep learning architecture of 2D-DBN which preserves discriminative information for vehicle detection is proposed. (2) A deep learning based on-road vehicle detection system has been implemented, and a thorough quantitative performance analysis has been presented.

The rest of this paper is organized as follows. Section 2 gives a brief overview of vision based vehicle detection tasks and deep learning for object recognition. Section 3 introduces in detail the proposed 2D-DBN architecture and the training methods for the vehicle detection task. The experiments and analysis are given in Section 4, and Section 5 concludes the paper.

2. Related Research

In this section, a brief overview of two categories of work that are relevant to our research is given. The first is about vision based vehicle detection, and the second focuses on deep learning for object recognition.

2.1. Vision Based Vehicle Detection. Since only monocular visual perception is used in our project, this section mainly refers to studies using monocular vision for on-road vehicle detection.

For monocular vision based vehicle detection, using vehicle appearance characteristics is the most common and effective approach. A variety of appearance features have been used in the field to detect vehicles. Some typical image features representing intuitive vehicle appearance information, such as local symmetry, edges, and cast shadows, have been used by many earlier works.



[Figure 1: Architecture of a deep belief network (DBN): an input layer $V^1$ and hidden layers $H^1, H^2, \ldots, H^{N-1}, H^N$ stacked as RBMs, with input data $\mathbf{x}$ and learning target $\mathbf{y}$.]

In recent years, there has been a transition from simpler image features to general and robust feature sets for vehicle detection. These feature sets, now common in the computer vision literature, allow for direct classification and detection of objects in images. For vehicle detection purposes, histogram of oriented gradients (HOG) features and Haar-like features are extremely well represented in the literature. Besides, Gabor features, scale-invariant feature transform (SIFT) features, speeded-up robust features (SURF), and some combined features are also applied for vehicle image representation.

Classification methods for appearance-based vehicle detection have also followed the general trends in the computer vision and machine learning literature. Compared to generative classifiers, discriminative classifiers, which learn a decision boundary between two classes (vehicles and non-vehicles), have been more widely used in vehicle detection applications. Support vector machines (SVM) and Adaboost are the two most common classifiers used for training vehicle detectors. In [1], SVM classification was used to classify Haar feature vectors. The combination of HOG features and SVM classification has also been used [2–4]. Adaboost [5] has also been widely used for classification, following Viola and Jones' contribution. The combination of Haar-like feature extraction and Adaboost classification has been used to detect rear faces of vehicles in [6–9]. Artificial neural network classifiers were also used for vehicle detection, but the training often failed due to local optima [10].

2.2. Deep Learning for Object Recognition. Classifiers such as SVM and Adaboost referred to in the last section are all indeed shallow learning models, because they can both be modeled as a structure with one input layer, one hidden layer, and one output layer. Deep learning refers to a class of machine learning techniques in which hierarchical architectures are exploited for representation learning and pattern classification. Different from those shallow models, deep learning has the ability to learn multiple levels of representation and abstraction that help make sense of image data. From another point of view, deep learning can be viewed as a kind of multilayer neural network with an added unsupervised pretraining process.

There are various subclasses of deep architecture. The deep belief network (DBN) model is a typical deep learning structure, first proposed by Hinton et al. [11]. The original DBN demonstrated its success in the simple image classification task of MNIST. In [12], a modified DBN was developed in which a Boltzmann machine is used on the top layer. This modified DBN was used in a 3D object recognition task.

The deep convolutional neural network (DCNN), with its ability to preserve spatial structure and its resistance to small variations in images, is used in image classification [13]. Recently, a DCNN achieved the best performance compared to other state-of-the-art methods in the 2012 ImageNet LSVRC contest, containing 1.2 million images with more than 1000 different classes. In this DCNN application, a very large architecture is built with more than 600,000 neurons and over 60 million weights.

DBN is a probabilistic model composed of multiple layers of stochastic hidden variables. The learning procedure of DBN can be divided into two stages: generative learning to abstract information layer by layer with unlabelled samples first, and then discriminative learning to fine-tune the whole deep network with labeled samples toward the ultimate learning target [11]. Figure 1 shows a typical DBN with one input layer $V^1$ and $N$ hidden layers $H^1, H^2, \ldots, H^N$, while $\mathbf{x}$ is the input data, which can be, for example, a vector, and $\mathbf{y}$ is the learning target, for example, class labels. In the unsupervised stage of the DBN training process, each pair of layers is grouped together to reconstruct the input of the layer from the output. In Figure 1, the layer-wise reconstruction happens between $V^1$ and $H^1$, $H^1$ and $H^2$, ..., and $H^{N-1}$ and $H^N$, respectively, which is implemented by a family of restricted Boltzmann machines (RBMs) [14]. After the greedy unsupervised learning of each pair of layers, the features are progressively combined from loose low-level representations into more compact high-level representations. In the supervised stage, the whole deep network is then refined using a contrastive version of the "wake-sleep" algorithm via a global gradient-based optimization strategy.


3. Deep Learning Based Vehicle Detection

In this section, a novel algorithm based on the deep belief network (DBN) is proposed. The traditional DBN for object classification has some shortcomings. First, the training samples are regularized to first-order vectors, which leads to the loss of spatial information contained in the image samples. This obviously leads to a decline in the detection rate in vehicle detection tasks. Secondly, the size of the layers (such as node number and layer number) in the traditional DBN is manually set, which is often large and leads to structural redundancy and increases the training and decision time of the classifier, while for a vehicle detection algorithm, which is usually used in real-time applications, decision time is a critical factor. The proposed 2D-DBN architecture for vehicle detection uses second-order planes instead of the first-order vectors of the 1D-DBN as input and uses bilinear projection to retain discriminative information and to determine the size of the deep architecture. The bilinear projection maps the original second-order output of the lower layer to a small bilinear space without reducing discriminative information, and the size of the upper layer is that of the bilinear space.

In Section 3.1, the overall architecture of our 2D-DBN for vehicle detection is introduced. In Section 3.2, the bilinear projection method for the lower layer output is given. In Sections 3.3 and 3.4, the training method of the whole 2D-DBN for vehicle detection is deduced.

3.1. 2D Deep Belief Network (2D-DBN) for Vehicle Detection. Let $X$ be the set of data samples, including vehicle images and nonvehicle images, and assume that $X$ consists of $K$ samples, as shown below:

$$X = \left[\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_k, \ldots, \mathbf{X}_K\right]. \qquad (1)$$

In $X$, $\mathbf{X}_k$ is a training sample in the image space $\mathbb{R}^{I \times J}$. Meanwhile, $Y$ denotes the labels corresponding to $X$, which can be written as

$$Y = \left[y_1, y_2, \ldots, y_k, \ldots, y_K\right]. \qquad (2)$$

In $Y$, $y_k$ is the label vector of $\mathbf{X}_k$. If $\mathbf{X}_k$ belongs to vehicles, $y_k = (1, 0)$; otherwise, $y_k = (0, 1)$.

The ultimate purpose of the vehicle detection task is to learn a mapping function from the training data $X$ to the label data $Y$ based on the given training set, so that this mapping function is able to classify unknown images as vehicle or nonvehicle.
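For concreteness, a minimal sketch (not from the paper) of how the sample set in (1) and the one-hot labels in (2) can be laid out as arrays, assuming grayscale image planes of identical size; the function name and the normalization to [0, 1] are illustrative assumptions:

```python
import numpy as np

def build_dataset(vehicle_imgs, nonvehicle_imgs):
    """Stack I x J image planes into X and build one-hot labels Y.

    vehicle_imgs / nonvehicle_imgs: lists of 2D uint8 arrays of equal size.
    Returns X with shape (K, I, J) and Y with shape (K, 2), where a vehicle
    sample is labeled (1, 0) and a nonvehicle sample (0, 1), as in (2).
    """
    X = np.stack(vehicle_imgs + nonvehicle_imgs).astype(np.float32) / 255.0
    Y = np.vstack([np.tile([1.0, 0.0], (len(vehicle_imgs), 1)),
                   np.tile([0.0, 1.0], (len(nonvehicle_imgs), 1))])
    return X, Y
```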

Based on the task described above, a novel 2D deep belief network (2D-DBN) is proposed to address this problem. Figure 2 shows the overall architecture of the 2D-DBN. This fully interconnected directed belief network includes one visible input layer $V^1$, $N$ hidden layers $H^1, \ldots, H^N$, and one visible label layer La at the top. The visible input layer $V^1$ contains $M \times N$ neurons, equal to the dimension of the training feature, which in this application is the original 2D image pixel values of the training samples. Since maximum discriminative ability should be preserved from layer to layer with nonredundant layer sizes to meet the real-time requirement of this application, the size of the $N$ hidden layers is dynamically decided with the so-called bilinear projection. At the top, the La layer has just two units, equal to the number of classes this application classifies. The problem is thus formulated as searching for the optimum parameter space $\theta$ of this 2D-DBN.

[Figure 2: Proposed 2D-DBN for vehicle detection: feature input $\mathbf{X}_k$, visible layer $V^1$, hidden layers $H^1, \ldots, H^{N-1}, H^N$, and label layer La with label units $y^1_k, y^2_k$.]

The main learning process of the proposed 2D-DBN has three steps.

(1) The bilinear projection is utilized to map the lower layer output data onto a subspace optimized to the optimum dimension while retaining discriminative information. The size of the upper layer is determined by this optimum dimension.

(2) When the size of the upper layer is determined, the parameters of the two adjacent layers are refined with the greedy layer-wise reconstruction method. Steps (1) and (2) are repeated until the parameters of all hidden layers are fixed. Here, steps (1) and (2) constitute the so-called pretraining process.

(3) Finally, the whole 2D-DBN is fine-tuned with the La layer information based on back propagation. Step (3) can be viewed as the supervised training step.

3.2. Bilinear Projection for Upper Layer Size Determination. In this section, following Zhong's contribution [15], bilinear projection is used in order to determine the size of every upper layer in adjacent layer groups. As mentioned in Section 3.1, with the labeled training data $\mathbf{X}_k \in \mathbb{R}^{I \times J}$ as the output of the visible layer $V^1$, bilinear projection maps the original data $\mathbf{X}_k$ onto a subspace, where it is represented by its latent form $\mathbf{LX}_k$. The bilinear projection is written as follows:

$$\mathbf{LX}_k = \mathbf{U}^T \mathbf{X}_k \mathbf{V}, \quad k = 1, 2, \ldots, K. \qquad (3)$$

Here, $\mathbf{U} \in \mathbb{R}^{M \times P}$ and $\mathbf{V} \in \mathbb{R}^{N \times Q}$ are projection matrices that map the original data $\mathbf{X}_k$ to its latent form $\mathbf{LX}_k$, with the constraints $\mathbf{U}^T\mathbf{U} = \mathbf{I}$ and $\mathbf{V}^T\mathbf{V} = \mathbf{I}$.
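As a small illustration of (3), the mapping of one sample plane to its latent form is just a two-sided matrix product; the orthonormal U and V in this sketch are random placeholders standing in for the projections learned later in this section:

```python
import numpy as np

def bilinear_project(Xk, U, V):
    """Map an I x J plane Xk to its P x Q latent form LXk = U^T Xk V, as in (3)."""
    return U.T @ Xk @ V

# Toy usage with random orthonormal projections (placeholders, not learned ones).
I, J, P, Q = 32, 32, 8, 6
U, _ = np.linalg.qr(np.random.randn(I, P))   # U^T U = identity (P x P)
V, _ = np.linalg.qr(np.random.randn(J, Q))   # V^T V = identity (Q x Q)
LXk = bilinear_project(np.random.randn(I, J), U, V)
print(LXk.shape)  # (8, 6)
```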

How to determine $\mathbf{U}$ and $\mathbf{V}$ so that the discriminative information of $\mathbf{X}_k$ is preserved is the issue that needs to be solved. For this, a specific objective function is built as follows:

$$\arg\max_{\mathbf{U},\mathbf{V}} J(\mathbf{U},\mathbf{V}) = \sum_{i,j} \left\| \mathbf{U}^T \left( \mathbf{X}_i - \mathbf{X}_j \right) \mathbf{V} \right\|^2 \left( \alpha B_{ij} - (1-\alpha) W_{ij} \right), \qquad (4)$$

in which $B_{ij}$ are the between-class weights, $W_{ij}$ are the within-class weights, and $\alpha \in [0, 1]$ is the balance parameter. $B_{ij}$ and $W_{ij}$ are calculated as follows [16]:

$$B_{ij} = \begin{cases} \dfrac{-n_{\bar{c}}}{\left(n_{\bar{c}} + n_c\right) n_c}, & \text{if } y^c_i = y^c_j = 1, \\[2mm] \dfrac{1}{n_{\bar{c}} + n_c}, & \text{else}, \end{cases} \qquad W_{ij} = \begin{cases} \dfrac{1}{n_c}, & \text{if } y^c_i = y^c_j = 1, \\[2mm] 0, & \text{else}. \end{cases} \qquad (5)$$

Here, $y^c_i$ is the class label of sample data $\mathbf{X}_i$, which is either $(1, 0)$ or $(0, 1)$; $n_c$ is the number of samples that belong to class $c$, and $n_{\bar{c}}$ is the number of samples not belonging to class $c$. Since vehicle detection is a binary classification problem, $c \in \{1, 2\}$ in this application.
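A brief sketch of how the weights in (5) can be computed from one-hot labels, under the reconstruction of (5) given above (with $n_c$ the size of class $c$ and $n_{\bar{c}}$ the number of remaining samples); the function and variable names are ours, not from the paper:

```python
import numpy as np

def graph_weights(Y):
    """Compute between-class (B) and within-class (W) weight matrices
    from one-hot labels Y of shape (K, 2), following (5)."""
    K = Y.shape[0]
    labels = Y.argmax(axis=1)                      # class index 0 or 1
    counts = np.bincount(labels, minlength=2)      # n_c for each class
    B = np.empty((K, K))
    W = np.zeros((K, K))
    for i in range(K):
        c = labels[i]
        n_c, n_cbar = counts[c], K - counts[c]
        for j in range(K):
            if labels[j] == c:                     # same class
                B[i, j] = -n_cbar / ((n_cbar + n_c) * n_c)
                W[i, j] = 1.0 / n_c
            else:                                  # different class
                B[i, j] = 1.0 / (n_cbar + n_c)     # equals 1 / K
    return B, W
```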

It can be seen that the purpose of the objective function is to simultaneously maximize the between-class distances and minimize the within-class distances. In other words, the objective function focuses on maximizing the discriminative information of all the sample data. However, optimizing $J(\mathbf{U},\mathbf{V})$ is a nonconvex optimization problem with two matrices $\mathbf{U}$ and $\mathbf{V}$. To deal with this, a strategy called alternative fixing (AF) is used: fix $\mathbf{U}$ (or $\mathbf{V}$) and optimize the objective function $J(\mathbf{U},\mathbf{V})$ with just the variable matrix $\mathbf{V}$ (or $\mathbf{U}$), and then fix $\mathbf{V}$ (or $\mathbf{U}$) and optimize $J(\mathbf{U},\mathbf{V})$ with just $\mathbf{U}$ (or $\mathbf{V}$). AF is implemented alternately until $J(\mathbf{U},\mathbf{V})$ reaches its upper bound.

After the optimization process, new $\mathbf{U}^*$ and $\mathbf{V}^*$ that maximize $J(\mathbf{U},\mathbf{V})$ are obtained and preserve the discriminative information of the original sample data $X$. Based on this, the size of the upper layer can then be determined by the number of positive eigenvalues associated with $\mathbf{U}^*$ and $\mathbf{V}^*$, which are $P$ and $Q$, respectively.
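The paper does not spell out the individual AF updates. One common way to realize them, assumed in the sketch below, is to note that for fixed $\mathbf{V}$ the objective becomes $J = \mathrm{tr}(\mathbf{U}^T \mathbf{S}_U \mathbf{U})$ with $\mathbf{S}_U = \sum_{i,j} w_{ij}(\mathbf{X}_i - \mathbf{X}_j)\mathbf{V}\mathbf{V}^T(\mathbf{X}_i - \mathbf{X}_j)^T$ and $w_{ij} = \alpha B_{ij} - (1-\alpha)W_{ij}$, so an orthonormal $\mathbf{U}$ maximizing it is spanned by the eigenvectors of $\mathbf{S}_U$ with positive eigenvalues (and symmetrically for $\mathbf{V}$), which also yields the layer sizes $P$ and $Q$. All names below are ours:

```python
import numpy as np

def positive_eig_basis(S):
    """Orthonormal eigenvectors of symmetric S with positive eigenvalues."""
    vals, vecs = np.linalg.eigh(S)
    keep = vals > 1e-10
    if not keep.any():                 # degenerate case: keep the largest eigenvalue
        keep = vals == vals.max()
    return vecs[:, keep]

def alternating_fixing(X, B, W, alpha=0.5, n_iter=5):
    """Alternately solve for U and V that maximize J(U, V) in (4).

    X: (K, I, J) samples; B, W: (K, K) weights from (5).
    Returns U (I x P) and V (J x Q); P and Q are the numbers of positive
    eigenvalues found, which fix the size of the upper layer.
    """
    K, I, J = X.shape
    w = alpha * B - (1.0 - alpha) * W              # combined pairwise weights
    V = np.eye(J)                                  # start with V fixed to identity
    for _ in range(n_iter):
        S_U = np.zeros((I, I))                     # fix V, solve for U
        for i in range(K):
            for j in range(K):
                D = X[i] - X[j]
                S_U += w[i, j] * D @ V @ V.T @ D.T
        U = positive_eig_basis(S_U)
        S_V = np.zeros((J, J))                     # fix U, solve for V
        for i in range(K):
            for j in range(K):
                D = X[i] - X[j]
                S_V += w[i, j] * D.T @ U @ U.T @ D
        V = positive_eig_basis(S_V)
    return U, V
```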

3.3. Pretraining with the Greedy Layer-Wise Reconstruction Method. In the last subsection, the size of the upper layer is determined to be $P \times Q$. In this subsection, the parameters of the two adjacent layers are refined with the greedy layer-wise reconstruction method proposed by Hinton et al. [11]. To illustrate this pretraining process, we take the visible input layer $V^1$ and the first hidden layer $H^1$ as an example.

The visible input layer $V^1$ and the first hidden layer $H^1$ construct a restricted Boltzmann machine (RBM); $I \times J$ is the number of neurons in $V^1$, and $P \times Q$ is that of $H^1$. The energy of the state $(\mathbf{v}^1, \mathbf{h}^1)$ in this RBM is

$$E\left(\mathbf{v}^1, \mathbf{h}^1; \theta^1\right) = -\left(\mathbf{v}^1 \mathbf{A}^1 \mathbf{h}^1 + \mathbf{b}^1 \mathbf{v}^1 + \mathbf{c}^1 \mathbf{h}^1\right) = -\sum_{i=1, j=1}^{i \le I,\, j \le J} \; \sum_{p=1, q=1}^{p \le P,\, q \le Q} v^1_{ij} A^1_{ijpq} h^1_{pq} - \sum_{i=1, j=1}^{i \le I,\, j \le J} b^1_{ij} v^1_{ij} - \sum_{p=1, q=1}^{p \le P,\, q \le Q} c^1_{pq} h^1_{pq}, \qquad (6)$$

in which $\theta^1 = (\mathbf{A}^1, \mathbf{b}^1, \mathbf{c}^1)$ are the parameters between the visible input layer $V^1$ and the first hidden layer $H^1$; $A^1_{ijpq}$ is the symmetric weight from input neuron $(i, j)$ in $V^1$ to hidden neuron $(p, q)$ in $H^1$; and $b^1_{ij}$ and $c^1_{pq}$ are the $(i, j)$th and $(p, q)$th biases of $V^1$ and $H^1$. So this RBM has the following joint distribution:

$$P\left(\mathbf{v}^1, \mathbf{h}^1; \theta^1\right) = \frac{1}{Z} e^{-E\left(\mathbf{v}^1, \mathbf{h}^1; \theta^1\right)} = \frac{e^{-E\left(\mathbf{v}^1, \mathbf{h}^1; \theta^1\right)}}{\sum_{\mathbf{v}^1} \sum_{\mathbf{h}^1} e^{-E\left(\mathbf{v}^1, \mathbf{h}^1; \theta^1\right)}}. \qquad (7)$$

Here, $Z$ is the normalization parameter, and the probability that $\mathbf{v}^1$ is assigned to $V^1$ under this model is

$$P\left(\mathbf{v}^1\right) = \frac{1}{Z} \sum_{\mathbf{h}^1} e^{-E\left(\mathbf{v}^1, \mathbf{h}^1; \theta^1\right)} = \frac{\sum_{\mathbf{h}^1} e^{-E\left(\mathbf{v}^1, \mathbf{h}^1; \theta^1\right)}}{\sum_{\mathbf{v}^1} \sum_{\mathbf{h}^1} e^{-E\left(\mathbf{v}^1, \mathbf{h}^1; \theta^1\right)}}. \qquad (8)$$

After that, the conditional distributions over the visible input state $\mathbf{v}^1$ in layer $V^1$ and the hidden state $\mathbf{h}^1$ in $H^1$ can be given by the logistic function, respectively:

$$p\left(\mathbf{h}^1 \mid \mathbf{v}^1\right) = \prod_{p,q} p\left(h^1_{pq} \mid \mathbf{v}^1\right), \qquad p\left(h^1_{pq} \mid \mathbf{v}^1\right) = \sigma\left( \sum_{i=1, j=1}^{i \le I,\, j \le J} v^1_{ij} A^1_{ijpq} + c^1_{pq} \right), \qquad (9)$$

$$p\left(\mathbf{v}^1 \mid \mathbf{h}^1\right) = \prod_{i,j} p\left(v^1_{ij} \mid \mathbf{h}^1\right), \qquad p\left(v^1_{ij} \mid \mathbf{h}^1\right) = \sigma\left( \sum_{p=1, q=1}^{p \le P,\, q \le Q} h^1_{pq} A^1_{ijpq} + b^1_{ij} \right). \qquad (10)$$

Here, $\sigma(x) = 1/(1 + \exp(-x))$.

[Figure 3: Some positive and negative training samples. (a) Positive samples. (b) Negative samples.]

At last, the weights and biases can be updated step by step from random Gaussian initial values $A^1_{ijpq}(0)$, $b^1_{ij}(0)$, and $c^1_{pq}(0)$ with the Contrastive Divergence algorithm [17], and the updating formulations are

$$A^1_{ijpq} = \vartheta A^1_{ijpq} + \varepsilon_A \left( \left\langle v^1_{ij}(0)\, h^1_{pq}(0) \right\rangle_{\text{data}} - \left\langle v^1_{ij}(t)\, h^1_{pq}(t) \right\rangle_{\text{recon}} \right),$$
$$b^1_{ij} = \vartheta b^1_{ij} + \varepsilon_b \left( v^1_{ij}(0) - v^1_{ij}(t) \right),$$
$$c^1_{pq} = \vartheta c^1_{pq} + \varepsilon_c \left( h^1_{pq}(0) - h^1_{pq}(t) \right), \qquad (11)$$

in which $\langle \cdot \rangle_{\text{data}}$ denotes the expectation with respect to the data distribution and $\langle \cdot \rangle_{\text{recon}}$ denotes the reconstruction distribution after one step. Meanwhile, $t$ is the step size, which is typically set to $t = 1$.
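A compact sketch of one CD-1 update for the RBM between $V^1$ and $H^1$, following (9)–(11); the fourth-order weight tensor $A^1_{ijpq}$ is stored as an $(I \cdot J) \times (P \cdot Q)$ matrix by flattening the planes, and the learning-rate and momentum parameter names are ours:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, A, b, c, momentum=0.8, lr=0.01, rng=np.random):
    """One Contrastive Divergence step (t = 1) for a planar RBM.

    v0: (I, J) visible plane; A: (I*J, P*Q) weights; b: (I*J,) visible bias;
    c: (P*Q,) hidden bias. Returns updated A, b, c.
    """
    v0f = v0.reshape(-1)                                  # flatten the I x J plane
    ph0 = sigmoid(v0f @ A + c)                            # p(h | v0), as in (9)
    h0 = (rng.random_sample(ph0.shape) < ph0).astype(float)
    v1f = sigmoid(A @ h0 + b)                             # reconstruction, as in (10)
    ph1 = sigmoid(v1f @ A + c)                            # hidden probabilities again
    # Parameter updates following (11): momentum term plus data/reconstruction gap.
    A = momentum * A + lr * (np.outer(v0f, ph0) - np.outer(v1f, ph1))
    b = momentum * b + lr * (v0f - v1f)
    c = momentum * c + lr * (ph0 - ph1)
    return A, b, c
```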

Above, the pretraining process is demonstrated by taking the visible input layer $V^1$ and the first hidden layer $H^1$ as an example. Indeed, the whole pretraining process is carried out from the low layer group $(V^1, H^1)$ to the upper layer groups $(H^{n-1}, H^n)$ one by one.

3.4. Global Fine-Tuning. In the above unsupervised pretraining process, the greedy layer-wise algorithm is used to learn the 2D-DBN parameters with the information added from the bilinear projection. In this subsection, a traditional back propagation algorithm is used to fine-tune the parameters $\theta = [\mathbf{A}, \mathbf{b}, \mathbf{c}]$ with the information of the label layer La.

Since a good parameter initialization has been obtained in the pretraining process, back propagation is just utilized to finely adjust the parameters so that locally optimal parameters $\theta^* = [\mathbf{A}^*, \mathbf{b}^*, \mathbf{c}^*]$ can be obtained. In this stage, the learning objective is to minimize the classification error $-\sum_{t} \mathbf{y}_t \log \hat{\mathbf{y}}_t$, where $\mathbf{y}_t$ and $\hat{\mathbf{y}}_t$ are the real label and the output label of data $\mathbf{X}_t$ in layer $N$.
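The fine-tuning objective is the cross-entropy between the real label and the network output; below is a minimal sketch of the gradient step at the label layer La, assuming a softmax output over the two classes (back propagation through the hidden layers is standard and omitted; all names are ours):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)        # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def label_layer_step(H_top, Y, W_la, b_la, lr=0.1):
    """One gradient step on the label-layer weights.

    H_top: (K, D) top hidden layer activations (flattened planes);
    Y: (K, 2) one-hot labels; W_la: (D, 2); b_la: (2,).
    Returns updated weights, bias, and the mean cross-entropy loss.
    """
    Y_hat = softmax(H_top @ W_la + b_la)
    loss = -np.sum(Y * np.log(Y_hat + 1e-12)) / len(Y)
    grad = (Y_hat - Y) / len(Y)                 # d loss / d logits for softmax + CE
    W_la -= lr * (H_top.T @ grad)
    b_la -= lr * grad.sum(axis=0)
    return W_la, b_la, loss
```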

4. Experiment and Analysis

This section presents experiments on vehicle datasets to demonstrate the performance of the proposed 2D-DBN. The dataset for training is the Caltech 1999 database, which includes images containing 126 rear-view vehicles. Besides, another 600 vehicles in images collected by our group from recorded road videos are used for training. Meanwhile, the negative samples are chosen from 500 images not containing vehicles, and the number of negative samples for training is 5000. Figure 3 shows some of these positive and negative training samples. The testing datasets are recorded road videos with 735 manually marked vehicles.

Table 1: Detection results of three different architectures of 2D-DBN.

Classifier type     Correct labeling    Correct rate
2D-DBN (1H)         689/735             93.74%
2D-DBN (2H)         706/735             96.05%
2D-DBN (3H)         695/735             94.56%

By using the proposed method, three different architectures of 2D-DBN are applied. They all contain one visible layer and one label layer, but with one, two, and three hidden layers, respectively. In training, the critical parameters of the proposed 2D-DBN are set as $\alpha = 0.5$ and $\vartheta = 0.8$, and the image samples for training are all resized to $32 \times 32$.
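The paper does not state how the samples are cropped and resized; an OpenCV-based snippet like the following is one plausible way, assumed here only for illustration, to produce the normalized 32 × 32 gray planes used as the 2D-DBN input:

```python
import cv2
import numpy as np

def prepare_sample(img_bgr, size=(32, 32)):
    """Convert a cropped vehicle/nonvehicle patch to a normalized 32 x 32 gray plane."""
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
    resized = cv2.resize(gray, size, interpolation=cv2.INTER_AREA)
    return resized.astype(np.float32) / 255.0
```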

The detection results of these three architectures of 2D-DBN are shown in Table 1. It can be seen that the 2D-DBN with two hidden layers achieves the highest detection rate.

The learned weights of the hidden layers are shown in Figure 4.

Then we compared the performance of our 2D-DBN with many other state-of-the-art classifiers, including support vector machine (SVM), k-nearest neighbor (KNN), neural networks (NN), 1D-DBN, and deep convolutional neural network (DCNN).

The detection results of these methods are shown in Table 2.

From the compared results, it can be concluded that classification methods with deep architectures, for example, 1D-DBN, DCNN, and 2D-DBN, are significantly better than those with shallow architectures, for example, SVM, KNN, and NN. Moreover, our proposed 2D-DBN is better than 1D-DBN and DCNN due to the 2D feature input and the bilinear projection.

[Figure 4: Learned weights of the first hidden layer and the second hidden layer of the 2D-DBN. (a) Weights of the first hidden layer. (b) Weights of the second hidden layer.]

Table 2: Detection results of multiple methods.

Classifier type     Correct labeling    Correct rate
SVM                 658/735             89.52%
KNN                 642/735             87.35%
NN                  619/735             84.21%
1D-DBN              684/735             93.06%
DCNN                697/735             94.83%
2D-DBN (2H)         706/735             96.05%

Finally, this 2D-DBN vehicle detection method is deployed in an on-road vehicle detection system, and some of the vehicle sensing results in real road situations are shown in Figure 5. The four rows of images are taken from a daylight highway, a rainy-day highway, a daylight urban road, and a night highway with road lamps, respectively. A solid green box denotes a detected vehicle, and a dotted red box denotes an undetected or falsely detected vehicle. The average vehicle detection time for one frame with 640 × 480 resolution is around 53 ms on our Advantech industrial computer.

Overall, most of the on-road vehicles can be sensed successfully, while misdetections and false detections sometimes occurred in adverse situations such as partial occlusion and bad weather.

5. Conclusion

In this work, a novel vehicle detection algorithm based on the 2D-DBN is proposed. In the algorithm, the proposed 2D-DBN architecture uses second-order planes instead of first-order vectors as input and uses bilinear projection to retain discriminative information and to determine the size of the deep architecture, which enhances the success rate of vehicle detection. On-road experimental results demonstrate that the system works well on different roads and under different weather and lighting conditions.

The future work of our research will focus on the situation in which a vehicle is partially occluded, within the deep architecture framework.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

[Figure 5: Some of the real road vehicle sensing results. First row: daylight highway situation; second row: rainy-day highway situation; third row: daylight urban situation; fourth row: night highway with road lamps.]

Acknowledgments

This research was supported in part by the National Natural Science Foundation of China under Grant no. 51305167, the Information Technology Research Program of the Transport Ministry of China under Grant no. 2013364836900, and the Jiangsu University Scientific Research Foundation for Senior Professionals under Grant no. 12JDG010.

References

[1] W. Liu, X. Wen, B. Duan, H. Yuan, and N. Wang, "Rear vehicle detection and tracking for lane change assist," in Proceedings of the IEEE Intelligent Vehicles Symposium (IV '07), pp. 252–257, June 2007.

[2] R. Miller, Z. Sun, and G. Bebis, "Monocular precrash vehicle detection: features and classifiers," IEEE Transactions on Image Processing, vol. 15, no. 7, pp. 2019–2034, 2006.

[3] S. Sivaraman and M. M. Trivedi, "Active learning for on-road vehicle detection: a comparative study," Machine Vision and Applications, pp. 1–13, 2011.

[4] S. Teoh and T. Bräunl, "Symmetry-based monocular vehicle detection system," Machine Vision and Applications, vol. 23, pp. 831–842, 2012.

[5] R. Sindoori, K. Ravichandran, and B. Santhi, "Adaboost technique for vehicle detection in aerial surveillance," International Journal of Engineering & Technology, vol. 5, no. 2, 2013.

[6] J. Cui, F. Liu, Z. Li, and Z. Jia, "Vehicle localisation using a single camera," in Proceedings of the IEEE Intelligent Vehicles Symposium (IV '10), pp. 871–876, June 2010.

[7] T. T. Son and S. Mita, "Car detection using multi-feature selection for varying poses," in Proceedings of the IEEE Intelligent Vehicles Symposium, pp. 507–512, June 2009.

[8] D. Acunzo, Y. Zhu, B. Xie, and G. Baratoff, "Context-adaptive approach for vehicle detection under varying lighting conditions," in Proceedings of the 10th International IEEE Conference on Intelligent Transportation Systems (ITSC '07), pp. 654–660, October 2007.

[9] C. T. Lin, S. C. Hsu, J. F. Lee et al., "Boosted vehicle detection using local and global features," Journal of Signal & Information Processing, vol. 4, no. 3, 2013.

[10] O. Ludwig Jr. and U. Nunes, "Improving the generalization properties of neural networks: an application to vehicle detection," in Proceedings of the 11th International IEEE Conference on Intelligent Transportation Systems (ITSC '08), pp. 310–315, December 2008.

[11] G. E. Hinton, S. Osindero, and Y.-W. Teh, "A fast learning algorithm for deep belief nets," Neural Computation, vol. 18, no. 7, pp. 1527–1554, 2006.

[12] V. Nair and G. E. Hinton, "3D object recognition with deep belief nets," in Proceedings of the 23rd Annual Conference on Neural Information Processing Systems (NIPS '09), pp. 1339–1347, December 2009.

[13] G. Taylor, R. Fergus, Y. L. Cun, and C. Bregler, "Convolutional learning of spatio-temporal features," in Computer Vision—ECCV 2010, vol. 6316 of Lecture Notes in Computer Science, pp. 140–153, 2010.

[14] C. X. Zhang, J. S. Zhang, N. N. Ji et al., "Learning ensemble classifiers via restricted Boltzmann machines," Pattern Recognition Letters, vol. 36, pp. 161–170, 2014.

[15] S.-H. Zhong, Y. Liu, and Y. Liu, "Bilinear deep learning for image classification," in Proceedings of the 19th ACM International Conference on Multimedia (ACM MM '11), December 2011.

[16] S. Yan, D. Xu, B. Zhang, H.-J. Zhang, Q. Yang, and S. Lin, "Graph embedding and extensions: a general framework for dimensionality reduction," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 1, pp. 40–51, 2007.

[17] F. Wood and G. E. Hinton, "Training products of experts by minimizing contrastive divergence," Tech. Rep., Brown University, 2012.

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 2: Research Article A Vehicle Detection Algorithm Based on ...downloads.hindawi.com/journals/tswj/2014/647380.pdf · Vision based vehicle detection is a critical technology that plays

2 The Scientific World Journal

Hidden layer

Hidden layer

Hidden layer

Input layer

RBM

RBM

RBM

x

y

H1

H2

H1

V1 V1V1

HN

HNminus1

H1

Figure 1 Architecture of deep belief network (DBN)

In recent years there has been a transition from sim-pler image features to general and robust feature sets forvehicle detection These feature sets now common in thecomputer vision literature allow for direct classification anddetection of objects in images For vehicle detection purposehistogram of oriented gradient (HOG) features and Haar-like features are extremely well represented in literatureBesides Gabor features scale-invariant feature transform(SIFT) features speeded up robust features (SURF) andsome combined features are also applied for vehicle imagerepresentation

Classification methods for appearance-based vehicledetection have also followed the general trends in thecomputer vision and machine learning literature Comparedto generative classifiers discriminative classifiers whichlearn a decision boundary between two classes (vehiclesand unvehicles) have been more widely used in vehicledetection application Support vector machines (SVM) andAdaboost are the most two common classifiers that are usedfor training vehicle detector In [1] SVM classification wasused to classify Haar feature vectors The combination ofHOG features and SVM classification has also been used[2ndash4] Adaboost [5] has also been widely used for classifi-cation for Viola and Jonesrsquo contribution The combinationof Haar-like feature extraction and Adaboost classificationhas been used to detect rear faces of vehicles in [6ndash9]Artificial neural network classifiers were also used for vehi-cle detection but the training often failed due to localoptimum [10]

22 Deep Learning for Object Recognition Classifiers such asSVM and Adaboost referred to in last section are all indeed ashallow learning model because they both can be modeled asstructure with one input layer one hidden layer and one out-put layer Deep learning refers to a class of machine learningtechniques where hierarchical architectures are exploited forrepresentation learning and pattern classification Differentfrom those shallow models deep learning has the ability oflearningmultiple levels of representation and abstraction thathelps to make sense of image data From another point ofview deep learning can be viewed as one kind of multilayer

neural networks with adding a novel unsurprised pretrainingprocess

There are various subclasses of deep architecture Deepbelief networks (DBN) modal is a typical deep learningstructure which is first proposed by Hinton et al [11] Theoriginal DBN has demonstrated its success in simple imageclassification tasks of MNIST In [12] a modified DBN isdeveloped in which a Boltzmann machine is used on the toplayer This modified DBN is used in a 3D object recognitiontask

Deep convolutional neural network (DCNN) with theability to preserve the space structure and resistance to smallvariations in the images is used in image classification [13]Recently DCNN achieved the best performance compared toother state-of-the-art methods in the 2012 ImageNet LSVRCcontest containing 12 million images with more than 1000different classes In this DCNN application a very largearchitecture is built withmore than 600000 neurons and over60 million weights

DBN is a probabilistic model composed ofmultiple layersof stochastic hidden variables The learning procedure ofDBN can be divided into two stages generative learning toabstract information layer by layer with unlabelled samplesfirstly and then discriminative learning to fine-tune thewholedeep network with labeled samples to the ultimate learningtarget [11] Figure 1 shows a typical DBN with one input layer1198811and N hidden layers 119867

1 1198672 119867

119873 while x is the input

data which can be for example a vector and y is the learningtarget for example class labels In the unsupervised stage ofDBN training processes each pair of layers grouped togetherto reconstruct the input of the layer from the output InFigure 1 the layer-wise reconstruction happens between 119881

1

and 1198671 1198671and 119867

2 119867

119873minus1and 119867

119899 respectively which is

implemented by a family of restricted Boltzmann machines(RBMs) [14] After the greedy unsupervised learning of eachpair of layers the features are progressively combined fromloose low-level representations into more compact high-level representations In the supervised stage the wholedeep network is then refined using a contrastive versionof the ldquowake-sleeprdquo algorithm via a global gradient-basedoptimization strategy

The Scientific World Journal 3

3 Deep Learning Based Vehicle Detection

In this section a novel algorithm based on deep beliefnetwork (DBN) is proposed Traditional DBN for objectclassification has some shortages First the training samplesare regularized to first-order vector which will lead tothe missing of spatial information contained by the imagesamples This will obviously lead to a decline in the detectionrate in vehicle detection tasks Secondly the size of layers(such as node number and layer number) in the traditionalDBN is manually set which is often big and will lead tostructural redundancy and increase the training and decisiontime of the classifier while for vehicle detection algorithmwhich is usually used in real-time application decision timeis a critical factor The proposed 2D-DBN architecture forvehicle detection uses second-order planes instead of first-order vector of 1D-DBN as input and uses bilinear projectionfor retaining discriminative information so as to determinethe size of the deep architectureThe bilinear projectionmapsoriginal second-order output of lower layer to a small bilinearspace without reducing discriminative information And thesize of the upper layer is that of the bilinear space

In Section 31 the overall architecture of our 2D-DBNfor vehicle detection will be introduced In Section 32 thebilinear projectionmethod of lower layer outputwill be givenIn Sections 33 and 34 the training method of the whole 2D-DBN for vehicle detection will be deduced

31 2119863 Deep Belief Network (2D-DBN) for Vehicle DetectionLet 119883 be the set of data samples including vehicle imagesand nonvehicle images assuming that 119883 is consisting withK samples which is shown below

119883 = [X1X2 X

119896 X

119870] (1)

In 119883 X119896is training samples and in the image space R119868times119869

Meanwhile 119884 means the labels corresponding to 119883 whichcan be written as

119884 = [1199101 1199102 119910

119896 119910

119870] (2)

In 119884 119910119896is the label vector of X

119896 If X119896belongs to vehicles

119910119896= (1 0) On the contrary 119910

119896= (0 1)

The ultimate purpose in vehicle detection task is to learna mapping function from training data 119883 to the label data119884 based on the given training set so that this mappingfunction is able to classify unknown images between vehicleand nonvehicle

Based on the task described above a novel 2D deep beliefnetwork (2D-DBN) is proposed to address this problemFigure 2 shows the overall architecture of 2D-DBN A fullyinterconnected directed belief network includes one visibleinput layer 1198811 119873 hidden layers 1198671 119867119873 and one visiblelabel layer La at the top The visible input layer 1198811 maintains119872times119873 neural and equal to the dimension of training featurewhich is the original 2D image pixel values of training sam-ples in this application Sincemaximumdiscriminative abilitywishes to be preserved from layer to layer with nonredundantlayer size in this application for real-time requirement the

y1

ky2

k

La

HN

HNminus1

H1

V1

Xk

Visible layer

Feature input

Label layer

Hidden layer

Figure 2 Proposed 2D-DBN for vehicle detection

size of119873 hidden layers is dynamically decided with so-calledbilinear projection In the top the La layer just has two unitswhich is equal to the classes this application would like toclassify Till now the problem is formulated to search for theoptimum parameter space 120579 of this 2D-DBN

The main learning process of the proposed 2D-DBN hasthree steps

(1) The bilinear projection is utilized to map the lowerlayer output data onto subspace and optimized tooptimum dimension as well as to retain discrimina-tive information The size of the upper layer will bedetermined by this optimum dimension

(2) When the size of the upper layer is determined theparameters of the two adjacent layers will be refinedwith the greedy-wise reconstruction method Repeatstep (1) and step (2) till all the parameters of hiddenlayers are fixedHere step (1) and step (2) are so calledpretraining process

(3) Finally the whole 2D-DBN will be fine-tuned withthe La layer information based on back propagationHere step (3) can be viewed as supervised trainingstep

32 Bilinear Projection for Upper Layer Size DeterminationIn this section followed with Zhongrsquos contribution [15]bilinear projection is used in order to determine the size ofevery upper layer in adjacent layer groups As mentioned inSection 31 with the labeled training data X

119896isin R119868times119869 as the

output of the visible layer 1198811 bilinear projection maps the

4 The Scientific World Journal

original data X119896onto a subspace and is represented by its

latent form LX119896 The bilinear projection is written as follows

LX119896= U119879X

119896V 119896 = 1 2 119870 (3)

Here U isin 119877119872times119875 and V isin 119877119873times119876 are projection matricesthat map the original data X

119896by its latent form LX

119896with the

constraint that U119879U = I and V119879V = IHow to determine the value of U and V so that the

discriminative information ofX119896can be preserved is the issue

that needs to be solved For this a specific objective functionis built as follows

argmaxUV

119869 (UV) = sum119894119895

10038171003817100381710038171003817U119879 (X

119894minus X119895)V10038171003817100381710038171003817

2

times (120572119861119894119895minus (1 minus 120572)119882

119894119895)

(4)

in which 119861119894119895is between-class weights 119882

119894119895is within class

weights and 120572 isin [0 1] is the balance parameter 119861119894119895and 119882

119894119895

are calculated as follows [16]

119861119894119895=

minus119899119899119888

(119899119899119888

+ 119899119888) 119899119888

if 119910119888119894= 119910119888119895= 1

1

119899119899119888

+ 119899119888

else

119882119894119895=

1

119899119888

if 119910119888119894= 119910119888119895= 1

0 else

(5)

Here 119910119888119894is the class label of sample data X

119894 which is either

(1 0) or (0 1) 119899119888is the number of samples that belong to

class 119888 and 119899119888is those not belonging to class 119888 Since vehicle

detection is a binary classification problem 119888 isin 1 2 in thisapplication

It can be seen that the purpose of the objective functionis to simultaneously maximize the between-class distancesand minimize the within-class distances In other words theobjective function focuses on maximizing the discriminativeinformation of all the sample data However optimizing119869(UV) is a nonconvex optimization problem with twomatrices U and V To deal with this trouble a strategy calledalternative fixing (AF) is used which is fixing U (or V) andoptimizing the objective function 119869(UV) with just variablematrix V (or U) and then fixing V (or U) and optimizing119869(UV) with just U (or V) AF will be implemented alterna-tively till 119869(UV) reaches its upper bound

After the optimum process new Ulowast and Vlowast that maxi-mize 119869(UV) are got and preserve the discriminative informa-tion of original sample data119883 Based on this then the size ofthe upper layer can be determined by the number of positiveeigenvalues of Ulowast and Vlowast which is 119875 and 119876 respectively

33 Pretraining with Greedy Layer-Wise ReconstructionMethod In last subsection the size of the upper layer isdetermined to be 119875times119876 In this subsection the parameters ofthe two adjacent layers will be refined with the greedy-wisereconstruction method proposed by Hinton et al [11] To

illustrate this pretraining process we take the visible inputlayer 1198811 and the first hidden layer 1198671 for example

The visible input layer 1198811 and the first hidden layer 1198671

contract a restrict Boltzmann machine (RBM) 119868 times 119869 is theneural number in 1198811 and 119875 times 119876 is that of 1198671 The energy ofthe state (V1ℎ1) in this RBM is

119864 (k1 h1 1205791) = minus (k1Ah1 + b1k1 + c1h1)

= minus

119894le119868119895le119869

sum119894=1119895=1

119901le119875119902le119876

sum119901=1119902=1

V11198941198951198601

119894119895119901119902ℎ1

119901119902

minus

119894le119868119895le119869

sum119894=1119895=1

1198871

119894119895V1119894119895minus

119901le119875119902le119876

sum119901=1119902=1

1198881

119901119902ℎ1

119901119902

(6)

in which 1205791 = (A1 b1 c1) are the parameters between thevisible input layer 1198811 and the first hidden layer 1198671 1198601

119894119895119901119902

is the symmetric weights from input neural (119894 119895) in 1198811 tothe hidden neural (119901 119902) in 119867

1 1198871119894119895and 1198881

119901119902are the (119894 119895)

119905ℎ and(119901 119902)119905ℎ bias of 1198811 and 1198671 So this RBM is with the joint

distribution as follows

119875 (k1 h1 1205791) =1

119885119890minus119864(k1 h11205791)

=119890minus119864(k

1h11205791)

sumV1 sumℎ1 119890minus119864(k1 h11205791)

(7)

Here Z is the normalization parameter and the probabilitythat k1 is assigned to 1198811 of this modal is

119875 (k1) =1

119885sum

ℎ1

119890minus119864(k1 h1 1205791)

=sumℎ1 119890minus119864(k1 h11205791)

sumV1 sumℎ1 119890minus119864(k1 h11205791) (8)

After that the conditional distributions over visible inputstate k1 in layer 1198811 and hidden state ℎ1 in 1198671 are able to begiven by the logistic function respectively

119901 (h1 | k1) = prod119901119902

119901 (ℎ1

119901119902| k1) 119901 (ℎ

1

119901119902| k1)

= 120590(

119894le119868119895le119869

sum119894=1119895=1

V11198941198951198601

119894119895119901119902+ 1198881

119901119902)

(9)

119901 (k1 | h1) = prod119894119895

119901 (k1119894119895| h1) 119901 (k1

119894119895| h1)

= 120590(

119901le119875119902le119876

sum119901=1119902=1

ℎ1

1199011199021198601

119894119895119901119902+ 1198871

119894119895)

(10)

Here 120590(119909) = 1(1 + exp(minus119909))At last the weights and biases are able to be updated

step by step from random Gaussian distribution values

The Scientific World Journal 5

(a) (b)

Figure 3 Some positive and negative training samples (a) Positive samples (b) Negative samples

1198601119894119895119901119902

(0) 1198871119894119895(0) and 1198881

119901119902(0)with Contrastive Divergence algo-

rithm [17] and the updating formulations are

1198601

119894119895119901119902= 1205991198601

119894119895119901119902

+ 120576119860(⟨V1119894119895(0) ℎ1

119894119895(0)⟩

dataminus ⟨V1119894119895(119905) ℎ1

119894119895(119905)⟩

recon)

1198871

119894119895= 1205991198871

119894119895+ 120576119887(V1119894119895(0) minus V1

119894119895(119905))

1198881

119901119902= 1205991198881

119901119902+ 120576119888(ℎ1

119901119902(0) minus ℎ

1

119901119902(119905))

(11)

in which ⟨sdot⟩data means the expectation with respect tothe data distribution and ⟨sdot⟩recon means the reconstructiondistribution after one step Meanwhile 119905 is step size which isset to 119905 = 1 typically

Above the pretraining process is demonstrated by takingthe visible input layer 119881

1 and the first hidden layer 1198671

for example Indeed the whole pretraining process will betaken from low layer groups (1198811 1198671) to up layer groups(119867119899minus1 119867119899) one by one

34 Global Fine-Tuning In the above unsurprised pretrain-ing process the greedy layer-wise algorithm is used tolearn the 2D-DBN parameters with the information addedfrom bilinear projection In this subsection a traditionalback propagation algorithm will be used to fine-tune theparameters 120579 = [A b c] with the information of label layerLa

Since good parameters initiation has been maintained inthe pretraining process back propagation is just utilized tofinely adjust the parameters so that local optimumparameters120579lowast = [Alowast blowast clowast] can be got In this stage the learning objec-tion is to minimize the classification error [minussum

119905y119905log y119905]

where y119905and y119905are the real label and output label of data X

119905

in layer 119873

4 Experiment and Analysis

This section will take experiments on vehicle datasets todemonstrate the performance of the proposed 2D-DBNThedatasets for training are Caltech1999 database which includesimages containing 126 rear view vehicles Besides another

Table 1 Detection results of three different architectures of 2D-DBN

Classifier types Correct labeling Correct rate2D-DBN (1H) 689735 93742D-DBN (2H) 706735 96052D-DBN (3H) 695735 9456

600 vehicles in images are collected by our groups in recordedroad videos for training Meanwhile the negative samplesare chosen from 500 images not containing vehicles and thenumber of negative samples for training is 5000 Figure 3shows some of these positive and negative training samplesThe testing datasets are recorded road videoswith 735manualmarked vehicles

By using the proposed method three different architec-tures of 2D-DBN are applied They all contain one visiblelayer and one label layer but with one two and three hiddenlayers respectively In training the critical parameters of theproposed 2D-DBN in experiments are set as 120572 = 05 and120599 = 08 and image samples for training are all resized to32 times 32

The detection results of these three architectures of 2D-DBN are shown in Table 1 It can be seen that 2D-DBN withtwo hidden layers maintains the highest detection rate

The learned weights of hidden layers are shownin Figure 4

Then we compared the performance of our 2D-DBNwithmany other state-of-the-art classifiers including supportvector machine (SVM) k-nearest neighbor (KNN) neuralnetworks 1D-DBN and deep convoluted neural network(DCNN)

The detection results of these methods are shown inTable 2

From the compared results it can be concluded thatclassification methods with deep architecture for example1D-DBN DCNN and 2D-DBN are significantly better thanthose of shallow architecture for example SVM KNN andNN Moreover our proposed 2D-DBN is better than 1D-DBN and DCNN due to 2D feature input and the bilinearprojection

Finally this 2D-DBN vehicle detectionmethod is utilizedon road vehicle detection system and some of the vehicle

6 The Scientific World Journal

Learned weights on DBN layer 1

(a)

Learned weights on DBN layer 2

(b)

Figure 4 Learned weights of first hidden layer and second hidden layer on 2D-DBN (a) weights of first hidden layer and (b) weights ofsecond hidden layer

Table 2 Detection results of multiple methods

Classifier types Correct labeling Correct rateSVM 658735 8952KNN 642735 8735NN 619735 84211D-DBN 684735 9306DCNN 697735 94832D-DBN (2H) 706735 9605

sensing results in real road situation are shown in Figure 5The four rows of images are picked in daylight highwayraining day highway daylight urban and night highway withroad lamp respectively The solid green box means detectedvehicles and the dotted red boxmeans undetected vehicles orfalse detected vehiclesThe average vehicle detection time forone frame with 640 times 480 resolution is around 53ms in ourAdvantech industrial computer

Overall most of the on-road vehicles can be sensed suc-cessfully while misdetection and false detection sometimesoccurred during adverse situations such as partial occlusionand bad weather

5 Conclusion

In thiswork a novel vehicle detection algorithmbased on 2D-DBN is proposed In the algorithm the proposed 2D-DBNarchitecture uses second-order planes instead of first-ordervector as input and uses bilinear projection for retainingdiscriminative information so as to determine the size of thedeep architecture which enhances the success rate of vehicledetectionOn-road experimental results demonstrate that thesystem works well in different roads weather and lightingconditions

The future work of our researchwill focus on the situationwhen a vehicle is partially occluded with deep architectureframework

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Figure 5 Some of the real road vehicle sensing results Firstrow daylight highway situation second row raining day highwaysituation third row daylight urban situation forth row nighthighway with road lamp

Acknowledgments

This research was funded partly and was supported inpart by the National Natural Science Foundation of Chinaunder Grant no 51305167 Information Technology ResearchProgram of Transport Ministry of China under Grant no2013364836900 and Jiangsu University Scientific ResearchFoundation for Senior Professionals under Grant no12JDG010

References

[1] W. Liu, X. Wen, B. Duan, H. Yuan, and N. Wang, "Rear vehicle detection and tracking for lane change assist," in Proceedings of the IEEE Intelligent Vehicles Symposium (IV '07), pp. 252–257, June 2007.

[2] R. Miller, Z. Sun, and G. Bebis, "Monocular precrash vehicle detection: features and classifiers," IEEE Transactions on Image Processing, vol. 15, no. 7, pp. 2019–2034, 2006.

[3] S. Sivaraman and M. M. Trivedi, "Active learning for on-road vehicle detection: a comparative study," Machine Vision and Applications, pp. 1–13, 2011.

[4] S. Teoh and T. Bräunl, "Symmetry-based monocular vehicle detection system," Machine Vision and Applications, vol. 23, pp. 831–842, 2012.

[5] R. Sindoori, K. Ravichandran, and B. Santhi, "Adaboost technique for vehicle detection in aerial surveillance," International Journal of Engineering & Technology, vol. 5, no. 2, 2013.

[6] J. Cui, F. Liu, Z. Li, and Z. Jia, "Vehicle localisation using a single camera," in Proceedings of the IEEE Intelligent Vehicles Symposium (IV '10), pp. 871–876, June 2010.

[7] T. T. Son and S. Mita, "Car detection using multi-feature selection for varying poses," in Proceedings of the IEEE Intelligent Vehicles Symposium, pp. 507–512, June 2009.

[8] D. Acunzo, Y. Zhu, B. Xie, and G. Baratoff, "Context-adaptive approach for vehicle detection under varying lighting conditions," in Proceedings of the 10th International IEEE Conference on Intelligent Transportation Systems (ITSC '07), pp. 654–660, October 2007.

[9] C. T. Lin, S. C. Hsu, J. F. Lee et al., "Boosted vehicle detection using local and global features," Journal of Signal & Information Processing, vol. 4, no. 3, 2013.

[10] O. Ludwig Jr. and U. Nunes, "Improving the generalization properties of neural networks: an application to vehicle detection," in Proceedings of the 11th International IEEE Conference on Intelligent Transportation Systems (ITSC '08), pp. 310–315, December 2008.

[11] G. E. Hinton, S. Osindero, and Y.-W. Teh, "A fast learning algorithm for deep belief nets," Neural Computation, vol. 18, no. 7, pp. 1527–1554, 2006.

[12] V. Nair and G. E. Hinton, "3D object recognition with deep belief nets," in Proceedings of the 23rd Annual Conference on Neural Information Processing Systems (NIPS '09), pp. 1339–1347, December 2009.

[13] G. Taylor, R. Fergus, Y. LeCun, and C. Bregler, "Convolutional learning of spatio-temporal features," in Computer Vision–ECCV 2010, vol. 6316 of Lecture Notes in Computer Science, pp. 140–153, 2010.

[14] C. X. Zhang, J. S. Zhang, N. N. Ji et al., "Learning ensemble classifiers via restricted Boltzmann machines," Pattern Recognition Letters, vol. 36, pp. 161–170, 2014.

[15] S.-H. Zhong, Y. Liu, and Y. Liu, "Bilinear deep learning for image classification," in Proceedings of the 19th ACM International Conference on Multimedia (ACM MM '11), December 2011.

[16] S. Yan, D. Xu, B. Zhang, H.-J. Zhang, Q. Yang, and S. Lin, "Graph embedding and extensions: a general framework for dimensionality reduction," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 1, pp. 40–51, 2007.

[17] F. Wood and G. E. Hinton, "Training products of experts by minimizing contrastive divergence," Tech. Rep., Brown University, 2012.



The learned weights of hidden layers are shownin Figure 4

Then we compared the performance of our 2D-DBNwithmany other state-of-the-art classifiers including supportvector machine (SVM) k-nearest neighbor (KNN) neuralnetworks 1D-DBN and deep convoluted neural network(DCNN)

The detection results of these methods are shown inTable 2

From the compared results it can be concluded thatclassification methods with deep architecture for example1D-DBN DCNN and 2D-DBN are significantly better thanthose of shallow architecture for example SVM KNN andNN Moreover our proposed 2D-DBN is better than 1D-DBN and DCNN due to 2D feature input and the bilinearprojection

Finally this 2D-DBN vehicle detectionmethod is utilizedon road vehicle detection system and some of the vehicle

6 The Scientific World Journal

Learned weights on DBN layer 1

(a)

Learned weights on DBN layer 2

(b)

Figure 4 Learned weights of first hidden layer and second hidden layer on 2D-DBN (a) weights of first hidden layer and (b) weights ofsecond hidden layer

Table 2 Detection results of multiple methods

Classifier types Correct labeling Correct rateSVM 658735 8952KNN 642735 8735NN 619735 84211D-DBN 684735 9306DCNN 697735 94832D-DBN (2H) 706735 9605

sensing results in real road situation are shown in Figure 5The four rows of images are picked in daylight highwayraining day highway daylight urban and night highway withroad lamp respectively The solid green box means detectedvehicles and the dotted red boxmeans undetected vehicles orfalse detected vehiclesThe average vehicle detection time forone frame with 640 times 480 resolution is around 53ms in ourAdvantech industrial computer

Overall most of the on-road vehicles can be sensed suc-cessfully while misdetection and false detection sometimesoccurred during adverse situations such as partial occlusionand bad weather

5 Conclusion

In thiswork a novel vehicle detection algorithmbased on 2D-DBN is proposed In the algorithm the proposed 2D-DBNarchitecture uses second-order planes instead of first-ordervector as input and uses bilinear projection for retainingdiscriminative information so as to determine the size of thedeep architecture which enhances the success rate of vehicledetectionOn-road experimental results demonstrate that thesystem works well in different roads weather and lightingconditions

The future work of our researchwill focus on the situationwhen a vehicle is partially occluded with deep architectureframework

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Figure 5 Some of the real road vehicle sensing results Firstrow daylight highway situation second row raining day highwaysituation third row daylight urban situation forth row nighthighway with road lamp

Acknowledgments

This research was funded partly and was supported inpart by the National Natural Science Foundation of Chinaunder Grant no 51305167 Information Technology ResearchProgram of Transport Ministry of China under Grant no2013364836900 and Jiangsu University Scientific ResearchFoundation for Senior Professionals under Grant no12JDG010

References

[1] W Liu X Wen B Duan H Yuan and N Wang ldquoRear vehicledetection and tracking for lane change assistrdquo in Proceedings ofthe IEEE Intelligent Vehicles Symposium (IV rsquo07) pp 252ndash257June 2007

[2] R Miller Z Sun and G Bebis ldquoMonocular precrash vehicledetection features and classifiersrdquo IEEE Transactions on ImageProcessing vol 15 no 7 pp 2019ndash2034 2006

The Scientific World Journal 7

[3] S Sivaraman and M M Trivedi ldquoActive learning for on-roadvehicle detection a comparative studyrdquo Machine Vision andApplications pp 1ndash13 2011

[4] S Teoh and T Brunl ldquoSymmetry-based monocular vehicledetection systemrdquoMachine Vision and Applications vol 23 pp831ndash842 2012

[5] R Sindoori K Ravichandran and B Santhi ldquoAdaboost tech-nique for vehicle detection in aerial surveillancerdquo InternationalJournal of Engineering amp Technology vol 5 no 2 2013

[6] J Cui F Liu Z Li and Z Jia ldquoVehicle localisation using asingle camerardquo in Proceedings of the IEEE Intelligent VehiclesSymposium (IV rsquo10) pp 871ndash876 June 2010

[7] T T Son and S Mita ldquoCar detection using multi-featureselection for varying posesrdquo inProceedings of the IEEE IntelligentVehicles Symposium pp 507ndash512 June 2009

[8] D Acunzo Y Zhu B Xie and G Baratoff ldquoContext-adaptiveapproach for vehicle detection under varying lighting condi-tionsrdquo in Proceedings of the 10th International IEEE Conferenceon Intelligent Transportation Systems (ITSC rsquo07) pp 654ndash660October 2007

[9] C T Lin S C Hsu J F Lee et al ldquoBoosted vehicle detectionusing local and global featuresrdquo Journal of Signal amp InformationProcessing vol 4 no 3 2013

[10] O Ludwig Jr and U Nunes ldquoImproving the generalizationproperties of neural networks an application to vehicle detec-tionrdquo in Proceedings of the 11th International IEEE Conferenceon Intelligent Transportation Systems (ITSC rsquo08) pp 310ndash315December 2008

[11] G E Hinton S Osindero and Y-W Teh ldquoA fast learningalgorithm for deep belief netsrdquoNeural Computation vol 18 no7 pp 1527ndash1554 2006

[12] V Nair and G E Hinton ldquo3D object recognition with deepbelief netsrdquo in Proceedings of the 23rd Annual Conference onNeural Information Processing Systems (NIPS rsquo09) pp 1339ndash1347December 2009

[13] G Taylor R Fergus Y L Cun and C Bregler ldquoConvolutionallearning of spatio-temporal featuresrdquo in Computer VisionmdashECCV 2010 vol 6316 of Lecture Notes in Computer Science pp140ndash153 2010

[14] C X Zhang J S Zhang N N Ji et al ldquoLearning ensemble clas-sifiers via restricted Boltzmann machinesrdquo Pattern RecognitionLetters vol 36 pp 161ndash170 2014

[15] S-H Zhong Y Liu and Y Liu ldquoBilinear deep learning forimage classificationrdquo in Proceedings of the 19th ACM Interna-tional Conference on Multimedia ACM Multimedia (SIG MMrsquo11) December 2011

[16] S YanD Xu B ZhangH-J ZhangQ Yang and S Lin ldquoGraphembedding and extensions a general framework for dimen-sionality reductionrdquo IEEE Transactions on Pattern Analysis andMachine Intelligence vol 29 no 1 pp 40ndash51 2007

[17] F Wood and G E Hinton ldquoTraining products of experts byminimizing contrastive divergencerdquo Tech Rep 2012 53 143Brown University 2012

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 4: Research Article A Vehicle Detection Algorithm Based on ...downloads.hindawi.com/journals/tswj/2014/647380.pdf · Vision based vehicle detection is a critical technology that plays

4 The Scientific World Journal

original data X119896onto a subspace and is represented by its

latent form LX119896 The bilinear projection is written as follows

LX119896= U119879X

119896V 119896 = 1 2 119870 (3)

Here U isin 119877119872times119875 and V isin 119877119873times119876 are projection matricesthat map the original data X

119896by its latent form LX

119896with the

constraint that U119879U = I and V119879V = IHow to determine the value of U and V so that the

discriminative information ofX119896can be preserved is the issue

that needs to be solved For this a specific objective functionis built as follows

argmaxUV

119869 (UV) = sum119894119895

10038171003817100381710038171003817U119879 (X

119894minus X119895)V10038171003817100381710038171003817

2

times (120572119861119894119895minus (1 minus 120572)119882

119894119895)

(4)

in which 119861119894119895is between-class weights 119882

119894119895is within class

weights and 120572 isin [0 1] is the balance parameter 119861119894119895and 119882

119894119895

are calculated as follows [16]

119861119894119895=

minus119899119899119888

(119899119899119888

+ 119899119888) 119899119888

if 119910119888119894= 119910119888119895= 1

1

119899119899119888

+ 119899119888

else

119882119894119895=

1

119899119888

if 119910119888119894= 119910119888119895= 1

0 else

(5)

Here 119910119888119894is the class label of sample data X

119894 which is either

(1 0) or (0 1) 119899119888is the number of samples that belong to

class 119888 and 119899119888is those not belonging to class 119888 Since vehicle

detection is a binary classification problem 119888 isin 1 2 in thisapplication

It can be seen that the purpose of the objective functionis to simultaneously maximize the between-class distancesand minimize the within-class distances In other words theobjective function focuses on maximizing the discriminativeinformation of all the sample data However optimizing119869(UV) is a nonconvex optimization problem with twomatrices U and V To deal with this trouble a strategy calledalternative fixing (AF) is used which is fixing U (or V) andoptimizing the objective function 119869(UV) with just variablematrix V (or U) and then fixing V (or U) and optimizing119869(UV) with just U (or V) AF will be implemented alterna-tively till 119869(UV) reaches its upper bound

After the optimum process new Ulowast and Vlowast that maxi-mize 119869(UV) are got and preserve the discriminative informa-tion of original sample data119883 Based on this then the size ofthe upper layer can be determined by the number of positiveeigenvalues of Ulowast and Vlowast which is 119875 and 119876 respectively

33 Pretraining with Greedy Layer-Wise ReconstructionMethod In last subsection the size of the upper layer isdetermined to be 119875times119876 In this subsection the parameters ofthe two adjacent layers will be refined with the greedy-wisereconstruction method proposed by Hinton et al [11] To

illustrate this pretraining process we take the visible inputlayer 1198811 and the first hidden layer 1198671 for example

The visible input layer 1198811 and the first hidden layer 1198671

contract a restrict Boltzmann machine (RBM) 119868 times 119869 is theneural number in 1198811 and 119875 times 119876 is that of 1198671 The energy ofthe state (V1ℎ1) in this RBM is

119864 (k1 h1 1205791) = minus (k1Ah1 + b1k1 + c1h1)

= minus

119894le119868119895le119869

sum119894=1119895=1

119901le119875119902le119876

sum119901=1119902=1

V11198941198951198601

119894119895119901119902ℎ1

119901119902

minus

119894le119868119895le119869

sum119894=1119895=1

1198871

119894119895V1119894119895minus

119901le119875119902le119876

sum119901=1119902=1

1198881

119901119902ℎ1

119901119902

(6)

in which 1205791 = (A1 b1 c1) are the parameters between thevisible input layer 1198811 and the first hidden layer 1198671 1198601

119894119895119901119902

is the symmetric weights from input neural (119894 119895) in 1198811 tothe hidden neural (119901 119902) in 119867

1 1198871119894119895and 1198881

119901119902are the (119894 119895)

119905ℎ and(119901 119902)119905ℎ bias of 1198811 and 1198671 So this RBM is with the joint

distribution as follows

119875 (k1 h1 1205791) =1

119885119890minus119864(k1 h11205791)

=119890minus119864(k

1h11205791)

sumV1 sumℎ1 119890minus119864(k1 h11205791)

(7)

Here Z is the normalization parameter and the probabilitythat k1 is assigned to 1198811 of this modal is

119875 (k1) =1

119885sum

ℎ1

119890minus119864(k1 h1 1205791)

=sumℎ1 119890minus119864(k1 h11205791)

sumV1 sumℎ1 119890minus119864(k1 h11205791) (8)

After that the conditional distributions over visible inputstate k1 in layer 1198811 and hidden state ℎ1 in 1198671 are able to begiven by the logistic function respectively

119901 (h1 | k1) = prod119901119902

119901 (ℎ1

119901119902| k1) 119901 (ℎ

1

119901119902| k1)

= 120590(

119894le119868119895le119869

sum119894=1119895=1

V11198941198951198601

119894119895119901119902+ 1198881

119901119902)

(9)

119901 (k1 | h1) = prod119894119895

119901 (k1119894119895| h1) 119901 (k1

119894119895| h1)

= 120590(

119901le119875119902le119876

sum119901=1119902=1

ℎ1

1199011199021198601

119894119895119901119902+ 1198871

119894119895)

(10)

Here 120590(119909) = 1(1 + exp(minus119909))At last the weights and biases are able to be updated

step by step from random Gaussian distribution values

The Scientific World Journal 5

(a) (b)

Figure 3 Some positive and negative training samples (a) Positive samples (b) Negative samples

1198601119894119895119901119902

(0) 1198871119894119895(0) and 1198881

119901119902(0)with Contrastive Divergence algo-

rithm [17] and the updating formulations are

1198601

119894119895119901119902= 1205991198601

119894119895119901119902

+ 120576119860(⟨V1119894119895(0) ℎ1

119894119895(0)⟩

dataminus ⟨V1119894119895(119905) ℎ1

119894119895(119905)⟩

recon)

1198871

119894119895= 1205991198871

119894119895+ 120576119887(V1119894119895(0) minus V1

119894119895(119905))

1198881

119901119902= 1205991198881

119901119902+ 120576119888(ℎ1

119901119902(0) minus ℎ

1

119901119902(119905))

(11)

in which ⟨sdot⟩data means the expectation with respect tothe data distribution and ⟨sdot⟩recon means the reconstructiondistribution after one step Meanwhile 119905 is step size which isset to 119905 = 1 typically

Above the pretraining process is demonstrated by takingthe visible input layer 119881

1 and the first hidden layer 1198671

for example Indeed the whole pretraining process will betaken from low layer groups (1198811 1198671) to up layer groups(119867119899minus1 119867119899) one by one

34 Global Fine-Tuning In the above unsurprised pretrain-ing process the greedy layer-wise algorithm is used tolearn the 2D-DBN parameters with the information addedfrom bilinear projection In this subsection a traditionalback propagation algorithm will be used to fine-tune theparameters 120579 = [A b c] with the information of label layerLa

Since good parameters initiation has been maintained inthe pretraining process back propagation is just utilized tofinely adjust the parameters so that local optimumparameters120579lowast = [Alowast blowast clowast] can be got In this stage the learning objec-tion is to minimize the classification error [minussum

119905y119905log y119905]

where y119905and y119905are the real label and output label of data X

119905

in layer 119873

4 Experiment and Analysis

This section will take experiments on vehicle datasets todemonstrate the performance of the proposed 2D-DBNThedatasets for training are Caltech1999 database which includesimages containing 126 rear view vehicles Besides another

Table 1 Detection results of three different architectures of 2D-DBN

Classifier types Correct labeling Correct rate2D-DBN (1H) 689735 93742D-DBN (2H) 706735 96052D-DBN (3H) 695735 9456

600 vehicles in images are collected by our groups in recordedroad videos for training Meanwhile the negative samplesare chosen from 500 images not containing vehicles and thenumber of negative samples for training is 5000 Figure 3shows some of these positive and negative training samplesThe testing datasets are recorded road videoswith 735manualmarked vehicles

By using the proposed method three different architec-tures of 2D-DBN are applied They all contain one visiblelayer and one label layer but with one two and three hiddenlayers respectively In training the critical parameters of theproposed 2D-DBN in experiments are set as 120572 = 05 and120599 = 08 and image samples for training are all resized to32 times 32

The detection results of these three architectures of 2D-DBN are shown in Table 1 It can be seen that 2D-DBN withtwo hidden layers maintains the highest detection rate

The learned weights of hidden layers are shownin Figure 4

Then we compared the performance of our 2D-DBNwithmany other state-of-the-art classifiers including supportvector machine (SVM) k-nearest neighbor (KNN) neuralnetworks 1D-DBN and deep convoluted neural network(DCNN)

The detection results of these methods are shown inTable 2

From the compared results it can be concluded thatclassification methods with deep architecture for example1D-DBN DCNN and 2D-DBN are significantly better thanthose of shallow architecture for example SVM KNN andNN Moreover our proposed 2D-DBN is better than 1D-DBN and DCNN due to 2D feature input and the bilinearprojection

Finally this 2D-DBN vehicle detectionmethod is utilizedon road vehicle detection system and some of the vehicle

6 The Scientific World Journal

Learned weights on DBN layer 1

(a)

Learned weights on DBN layer 2

(b)

Figure 4 Learned weights of first hidden layer and second hidden layer on 2D-DBN (a) weights of first hidden layer and (b) weights ofsecond hidden layer

Table 2 Detection results of multiple methods

Classifier types Correct labeling Correct rateSVM 658735 8952KNN 642735 8735NN 619735 84211D-DBN 684735 9306DCNN 697735 94832D-DBN (2H) 706735 9605

sensing results in real road situation are shown in Figure 5The four rows of images are picked in daylight highwayraining day highway daylight urban and night highway withroad lamp respectively The solid green box means detectedvehicles and the dotted red boxmeans undetected vehicles orfalse detected vehiclesThe average vehicle detection time forone frame with 640 times 480 resolution is around 53ms in ourAdvantech industrial computer

Overall most of the on-road vehicles can be sensed suc-cessfully while misdetection and false detection sometimesoccurred during adverse situations such as partial occlusionand bad weather

5 Conclusion

In thiswork a novel vehicle detection algorithmbased on 2D-DBN is proposed In the algorithm the proposed 2D-DBNarchitecture uses second-order planes instead of first-ordervector as input and uses bilinear projection for retainingdiscriminative information so as to determine the size of thedeep architecture which enhances the success rate of vehicledetectionOn-road experimental results demonstrate that thesystem works well in different roads weather and lightingconditions

The future work of our researchwill focus on the situationwhen a vehicle is partially occluded with deep architectureframework

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Figure 5 Some of the real road vehicle sensing results Firstrow daylight highway situation second row raining day highwaysituation third row daylight urban situation forth row nighthighway with road lamp

Acknowledgments

This research was funded partly and was supported inpart by the National Natural Science Foundation of Chinaunder Grant no 51305167 Information Technology ResearchProgram of Transport Ministry of China under Grant no2013364836900 and Jiangsu University Scientific ResearchFoundation for Senior Professionals under Grant no12JDG010

References

[1] W Liu X Wen B Duan H Yuan and N Wang ldquoRear vehicledetection and tracking for lane change assistrdquo in Proceedings ofthe IEEE Intelligent Vehicles Symposium (IV rsquo07) pp 252ndash257June 2007

[2] R Miller Z Sun and G Bebis ldquoMonocular precrash vehicledetection features and classifiersrdquo IEEE Transactions on ImageProcessing vol 15 no 7 pp 2019ndash2034 2006

The Scientific World Journal 7

[3] S Sivaraman and M M Trivedi ldquoActive learning for on-roadvehicle detection a comparative studyrdquo Machine Vision andApplications pp 1ndash13 2011

[4] S Teoh and T Brunl ldquoSymmetry-based monocular vehicledetection systemrdquoMachine Vision and Applications vol 23 pp831ndash842 2012

[5] R Sindoori K Ravichandran and B Santhi ldquoAdaboost tech-nique for vehicle detection in aerial surveillancerdquo InternationalJournal of Engineering amp Technology vol 5 no 2 2013

[6] J Cui F Liu Z Li and Z Jia ldquoVehicle localisation using asingle camerardquo in Proceedings of the IEEE Intelligent VehiclesSymposium (IV rsquo10) pp 871ndash876 June 2010

[7] T T Son and S Mita ldquoCar detection using multi-featureselection for varying posesrdquo inProceedings of the IEEE IntelligentVehicles Symposium pp 507ndash512 June 2009

[8] D Acunzo Y Zhu B Xie and G Baratoff ldquoContext-adaptiveapproach for vehicle detection under varying lighting condi-tionsrdquo in Proceedings of the 10th International IEEE Conferenceon Intelligent Transportation Systems (ITSC rsquo07) pp 654ndash660October 2007

[9] C T Lin S C Hsu J F Lee et al ldquoBoosted vehicle detectionusing local and global featuresrdquo Journal of Signal amp InformationProcessing vol 4 no 3 2013

[10] O Ludwig Jr and U Nunes ldquoImproving the generalizationproperties of neural networks an application to vehicle detec-tionrdquo in Proceedings of the 11th International IEEE Conferenceon Intelligent Transportation Systems (ITSC rsquo08) pp 310ndash315December 2008

[11] G E Hinton S Osindero and Y-W Teh ldquoA fast learningalgorithm for deep belief netsrdquoNeural Computation vol 18 no7 pp 1527ndash1554 2006

[12] V Nair and G E Hinton ldquo3D object recognition with deepbelief netsrdquo in Proceedings of the 23rd Annual Conference onNeural Information Processing Systems (NIPS rsquo09) pp 1339ndash1347December 2009

[13] G Taylor R Fergus Y L Cun and C Bregler ldquoConvolutionallearning of spatio-temporal featuresrdquo in Computer VisionmdashECCV 2010 vol 6316 of Lecture Notes in Computer Science pp140ndash153 2010

[14] C X Zhang J S Zhang N N Ji et al ldquoLearning ensemble clas-sifiers via restricted Boltzmann machinesrdquo Pattern RecognitionLetters vol 36 pp 161ndash170 2014

[15] S-H Zhong Y Liu and Y Liu ldquoBilinear deep learning forimage classificationrdquo in Proceedings of the 19th ACM Interna-tional Conference on Multimedia ACM Multimedia (SIG MMrsquo11) December 2011

[16] S YanD Xu B ZhangH-J ZhangQ Yang and S Lin ldquoGraphembedding and extensions a general framework for dimen-sionality reductionrdquo IEEE Transactions on Pattern Analysis andMachine Intelligence vol 29 no 1 pp 40ndash51 2007

[17] F Wood and G E Hinton ldquoTraining products of experts byminimizing contrastive divergencerdquo Tech Rep 2012 53 143Brown University 2012

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 5: Research Article A Vehicle Detection Algorithm Based on ...downloads.hindawi.com/journals/tswj/2014/647380.pdf · Vision based vehicle detection is a critical technology that plays

The Scientific World Journal 5

(a) (b)

Figure 3 Some positive and negative training samples (a) Positive samples (b) Negative samples

1198601119894119895119901119902

(0) 1198871119894119895(0) and 1198881

119901119902(0)with Contrastive Divergence algo-

rithm [17] and the updating formulations are

1198601

119894119895119901119902= 1205991198601

119894119895119901119902

+ 120576119860(⟨V1119894119895(0) ℎ1

119894119895(0)⟩

dataminus ⟨V1119894119895(119905) ℎ1

119894119895(119905)⟩

recon)

1198871

119894119895= 1205991198871

119894119895+ 120576119887(V1119894119895(0) minus V1

119894119895(119905))

1198881

119901119902= 1205991198881

119901119902+ 120576119888(ℎ1

119901119902(0) minus ℎ

1

119901119902(119905))

(11)

in which ⟨sdot⟩data means the expectation with respect tothe data distribution and ⟨sdot⟩recon means the reconstructiondistribution after one step Meanwhile 119905 is step size which isset to 119905 = 1 typically

Above the pretraining process is demonstrated by takingthe visible input layer 119881

1 and the first hidden layer 1198671

for example Indeed the whole pretraining process will betaken from low layer groups (1198811 1198671) to up layer groups(119867119899minus1 119867119899) one by one

34 Global Fine-Tuning In the above unsurprised pretrain-ing process the greedy layer-wise algorithm is used tolearn the 2D-DBN parameters with the information addedfrom bilinear projection In this subsection a traditionalback propagation algorithm will be used to fine-tune theparameters 120579 = [A b c] with the information of label layerLa

Since good parameters initiation has been maintained inthe pretraining process back propagation is just utilized tofinely adjust the parameters so that local optimumparameters120579lowast = [Alowast blowast clowast] can be got In this stage the learning objec-tion is to minimize the classification error [minussum

119905y119905log y119905]

where y119905and y119905are the real label and output label of data X

119905

in layer 119873

4 Experiment and Analysis

This section will take experiments on vehicle datasets todemonstrate the performance of the proposed 2D-DBNThedatasets for training are Caltech1999 database which includesimages containing 126 rear view vehicles Besides another

Table 1 Detection results of three different architectures of 2D-DBN

Classifier types Correct labeling Correct rate2D-DBN (1H) 689735 93742D-DBN (2H) 706735 96052D-DBN (3H) 695735 9456

600 vehicles in images are collected by our groups in recordedroad videos for training Meanwhile the negative samplesare chosen from 500 images not containing vehicles and thenumber of negative samples for training is 5000 Figure 3shows some of these positive and negative training samplesThe testing datasets are recorded road videoswith 735manualmarked vehicles

By using the proposed method three different architec-tures of 2D-DBN are applied They all contain one visiblelayer and one label layer but with one two and three hiddenlayers respectively In training the critical parameters of theproposed 2D-DBN in experiments are set as 120572 = 05 and120599 = 08 and image samples for training are all resized to32 times 32

The detection results of these three architectures of 2D-DBN are shown in Table 1 It can be seen that 2D-DBN withtwo hidden layers maintains the highest detection rate

The learned weights of hidden layers are shownin Figure 4

Then we compared the performance of our 2D-DBNwithmany other state-of-the-art classifiers including supportvector machine (SVM) k-nearest neighbor (KNN) neuralnetworks 1D-DBN and deep convoluted neural network(DCNN)

The detection results of these methods are shown inTable 2

From the compared results it can be concluded thatclassification methods with deep architecture for example1D-DBN DCNN and 2D-DBN are significantly better thanthose of shallow architecture for example SVM KNN andNN Moreover our proposed 2D-DBN is better than 1D-DBN and DCNN due to 2D feature input and the bilinearprojection

Finally this 2D-DBN vehicle detectionmethod is utilizedon road vehicle detection system and some of the vehicle

6 The Scientific World Journal

Learned weights on DBN layer 1

(a)

Learned weights on DBN layer 2

(b)

Figure 4 Learned weights of first hidden layer and second hidden layer on 2D-DBN (a) weights of first hidden layer and (b) weights ofsecond hidden layer

Table 2 Detection results of multiple methods

Classifier types Correct labeling Correct rateSVM 658735 8952KNN 642735 8735NN 619735 84211D-DBN 684735 9306DCNN 697735 94832D-DBN (2H) 706735 9605

sensing results in real road situation are shown in Figure 5The four rows of images are picked in daylight highwayraining day highway daylight urban and night highway withroad lamp respectively The solid green box means detectedvehicles and the dotted red boxmeans undetected vehicles orfalse detected vehiclesThe average vehicle detection time forone frame with 640 times 480 resolution is around 53ms in ourAdvantech industrial computer

Overall most of the on-road vehicles can be sensed suc-cessfully while misdetection and false detection sometimesoccurred during adverse situations such as partial occlusionand bad weather

5 Conclusion

In thiswork a novel vehicle detection algorithmbased on 2D-DBN is proposed In the algorithm the proposed 2D-DBNarchitecture uses second-order planes instead of first-ordervector as input and uses bilinear projection for retainingdiscriminative information so as to determine the size of thedeep architecture which enhances the success rate of vehicledetectionOn-road experimental results demonstrate that thesystem works well in different roads weather and lightingconditions

The future work of our researchwill focus on the situationwhen a vehicle is partially occluded with deep architectureframework

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Figure 5 Some of the real road vehicle sensing results Firstrow daylight highway situation second row raining day highwaysituation third row daylight urban situation forth row nighthighway with road lamp

Acknowledgments

This research was funded partly and was supported inpart by the National Natural Science Foundation of Chinaunder Grant no 51305167 Information Technology ResearchProgram of Transport Ministry of China under Grant no2013364836900 and Jiangsu University Scientific ResearchFoundation for Senior Professionals under Grant no12JDG010

References

[1] W Liu X Wen B Duan H Yuan and N Wang ldquoRear vehicledetection and tracking for lane change assistrdquo in Proceedings ofthe IEEE Intelligent Vehicles Symposium (IV rsquo07) pp 252ndash257June 2007

[2] R Miller Z Sun and G Bebis ldquoMonocular precrash vehicledetection features and classifiersrdquo IEEE Transactions on ImageProcessing vol 15 no 7 pp 2019ndash2034 2006

The Scientific World Journal 7

[3] S Sivaraman and M M Trivedi ldquoActive learning for on-roadvehicle detection a comparative studyrdquo Machine Vision andApplications pp 1ndash13 2011

[4] S Teoh and T Brunl ldquoSymmetry-based monocular vehicledetection systemrdquoMachine Vision and Applications vol 23 pp831ndash842 2012

[5] R Sindoori K Ravichandran and B Santhi ldquoAdaboost tech-nique for vehicle detection in aerial surveillancerdquo InternationalJournal of Engineering amp Technology vol 5 no 2 2013

[6] J Cui F Liu Z Li and Z Jia ldquoVehicle localisation using asingle camerardquo in Proceedings of the IEEE Intelligent VehiclesSymposium (IV rsquo10) pp 871ndash876 June 2010

[7] T T Son and S Mita ldquoCar detection using multi-featureselection for varying posesrdquo inProceedings of the IEEE IntelligentVehicles Symposium pp 507ndash512 June 2009

[8] D Acunzo Y Zhu B Xie and G Baratoff ldquoContext-adaptiveapproach for vehicle detection under varying lighting condi-tionsrdquo in Proceedings of the 10th International IEEE Conferenceon Intelligent Transportation Systems (ITSC rsquo07) pp 654ndash660October 2007

[9] C T Lin S C Hsu J F Lee et al ldquoBoosted vehicle detectionusing local and global featuresrdquo Journal of Signal amp InformationProcessing vol 4 no 3 2013

[10] O Ludwig Jr and U Nunes ldquoImproving the generalizationproperties of neural networks an application to vehicle detec-tionrdquo in Proceedings of the 11th International IEEE Conferenceon Intelligent Transportation Systems (ITSC rsquo08) pp 310ndash315December 2008

[11] G E Hinton S Osindero and Y-W Teh ldquoA fast learningalgorithm for deep belief netsrdquoNeural Computation vol 18 no7 pp 1527ndash1554 2006

[12] V Nair and G E Hinton ldquo3D object recognition with deepbelief netsrdquo in Proceedings of the 23rd Annual Conference onNeural Information Processing Systems (NIPS rsquo09) pp 1339ndash1347December 2009

[13] G Taylor R Fergus Y L Cun and C Bregler ldquoConvolutionallearning of spatio-temporal featuresrdquo in Computer VisionmdashECCV 2010 vol 6316 of Lecture Notes in Computer Science pp140ndash153 2010

[14] C X Zhang J S Zhang N N Ji et al ldquoLearning ensemble clas-sifiers via restricted Boltzmann machinesrdquo Pattern RecognitionLetters vol 36 pp 161ndash170 2014

[15] S-H Zhong Y Liu and Y Liu ldquoBilinear deep learning forimage classificationrdquo in Proceedings of the 19th ACM Interna-tional Conference on Multimedia ACM Multimedia (SIG MMrsquo11) December 2011

[16] S YanD Xu B ZhangH-J ZhangQ Yang and S Lin ldquoGraphembedding and extensions a general framework for dimen-sionality reductionrdquo IEEE Transactions on Pattern Analysis andMachine Intelligence vol 29 no 1 pp 40ndash51 2007

[17] F Wood and G E Hinton ldquoTraining products of experts byminimizing contrastive divergencerdquo Tech Rep 2012 53 143Brown University 2012

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 6: Research Article A Vehicle Detection Algorithm Based on ...downloads.hindawi.com/journals/tswj/2014/647380.pdf · Vision based vehicle detection is a critical technology that plays

6 The Scientific World Journal

