SPECIAL ISSUE PAPER
Real-time vehicle type classification with deep convolutional neural networks
Xinchen Wang1 • Weiwei Zhang1 • Xuncheng Wu1 • Lingyun Xiao2 •
Yubin Qian1 • Zhi Fang1
Received: 6 January 2017 / Accepted: 6 August 2017 / Published online: 22 August 2017
© Springer-Verlag GmbH Germany 2017
Abstract Vehicle type classification plays an important role in
intelligent transport systems. With the development of image
processing, pattern recognition and deep learning, vehicle type
classification based on deep learning has attracted increasing
attention. In the last few years, convolutional neural networks,
especially Faster Region-based convolutional neural networks
(Faster R-CNN), have shown great advantages in image classification
and object detection, outperforming traditional machine learning
methods by a large margin. In this paper, a vehicle type
classification system based on deep learning is proposed. The
system uses Faster R-CNN to solve the task. Experimental results
show that the method is not only fast, but also more robust and
more accurate. For cars and trucks, it reaches average precisions
of 90.65 and 90.51%, respectively. Finally, we test the system on
an NVIDIA Jetson TK1 board with 192 CUDA cores, which is envisioned
as a forerunner computational brain for computer vision, robotics
and self-driving cars. Experimental results show that, with the
network embedded on the NVIDIA Jetson TK1, it takes around 0.354 s
to process an image while maintaining high accuracy.
Keywords Convolutional neural network · Vehicle type classification · Deep learning · Intelligent transportation system · Object detection
1 Introduction
The intelligent transportation system can not only make effective
use of existing transportation facilities, but also reduce
environmental pollution, improve traffic safety and increase
transport efficiency. It comprises three parts: intelligent
vehicles, intelligent highway systems and intelligent drivers.
Research on vehicle type classification is of significant value to
the development of the intelligent transportation system and
intelligent automobiles.
Vehicle type classification has important applications in daily
life, such as intelligent monitoring systems, automatic toll
collection on highways and detection of illegal right-of-way
preemption. In earlier times, sensors laid under the road were the
main means of vehicle type classification: data collected from the
sensors were analyzed to obtain information about passing vehicles.
With the development of computer vision technology, vehicle type
detection through image processing and pattern recognition has been
widely used [1–7]. A vehicle classification system based on machine
vision can be embedded in existing traffic cameras. It has many
advantages, such as convenient installation, easy maintenance and a
small footprint, and the data obtained from the system can be
reused for other research and processing purposes. With the rapid
advancement of the graphics processing unit (GPU), image processing
capability has been greatly enhanced, which in turn has driven the
fast advancement of deep learning. Compared with traditional
feature extraction algorithms, deep learning has better
adaptability and wider applicability. In recent years, deep
learning has been successfully applied to segmentation, detection
and recognition in images and videos, such as face recognition and
pedestrian
& Weiwei Zhang
1 College of Automotive Engineering, Shanghai University of
Engineering Science, Shanghai, China
2 China National Institution of Standardization, Beijing, China
J Real-Time Image Proc (2019) 16:5–14
https://doi.org/10.1007/s11554-017-0712-5
detection [8, 9]. Over the last decades, vehicle type
classification based on machine vision has mainly adopted
traditional image processing methods for vehicle location and
recognition, such as Histograms of Oriented Gradients combined with
a Support Vector Machine (HOG + SVM). Recently, vehicle type
classification based on deep learning has become increasingly
popular among researchers. If deep learning can be successfully
applied to vehicle location and recognition in natural scenes, it
will be of great value for building intelligent traffic systems and
driverless systems.
This paper uses the deep learning library Caffe with GPU
acceleration, together with the Faster R-CNN object detection
framework. The backbone network is the ZF net, a deep convolutional
neural network with five shareable convolutional layers.
The rest of this paper is organized as follows. Section 2
introduces related work from two aspects: the background of vehicle
type classification and the development of convolutional neural
networks. Section 3 proposes the revised algorithm framework for
vehicle type classification and discusses the structure of vehicle
type classification based on Faster R-CNN and its application.
Section 4 evaluates and analyzes the proposed method
experimentally. Section 5 concludes the paper.
2 Related work
2.1 Background on vehicle type classification
During the last decade, various vehicle type classification methods
have been proposed and successfully applied in transportation and
military fields. Sarfraz et al. put forward a local shape histogram
feature based on the vehicle's frontal area, which was then
classified by a Bayesian prior model [10]. Ramnath et al. extracted
3D space curves of automobiles and classified the automobiles by
appearance [11]. The method can classify vehicles from images taken
from any angle, but its disadvantage is the large amount of
computation. Alonso et al. adopted multidimensional classification
to realize vehicle detection on the road [12]. Chang and Cho
proposed a novel vehicle detection method based on online boosting,
which addressed the difficult problem of vehicle detection in
different scenes [13]. Zhang presented a vehicle detection method
based on deep convolutional neural networks as a solution to
fine-grained vehicle recognition in natural scenes [14, 15].
2.2 Background on convolutional neural networks
A convolutional neural network (CNN) is a feed-forward neural
network inspired by the cognitive mechanism of biological vision.
In 1959, Hubel and Wiesel studied the neurons responsible for local
sensitivity and direction selection in the cat cortex. They found
that this unique network structure can effectively reduce the
complexity of feedback neural networks. In the 1980s, Fukushima
proposed the Neocognitron, the predecessor of the CNN [16]. In the
1990s, LeCun et al. [17] established the modern structure of the
CNN. They designed a multilayer artificial neural network named
LeNet-5, which classified handwritten digits and was applied to
reading the numbers on checks in America. With the development of
big data and GPU acceleration, Krizhevsky et al. [18] proposed the
classic CNN architecture AlexNet and won the ILSVRC 2012
competition.
In recent years, thanks to the success of region proposal
technology, object detection has developed rapidly. Object
detection systems such as R-CNN, SPP-net [19] and Fast R-CNN have
emerged. However, the computing time of region proposal has limited
the development of detection systems. In 2015, Ren et al. proposed
Faster R-CNN, a new object detection framework based on Fast R-CNN
that generates region proposals with a region proposal network
(RPN). Because the RPN shares convolutional layer parameters with
the detection network, region proposal takes only about 10 ms per
image. Faster R-CNN can simply be seen as a detection system that
combines the RPN with the Fast R-CNN framework [20], using the RPN
in place of selective search (SS). With a simple network (the ZF
net [21]), object detection reaches 17 FPS with an accuracy of
59.95% on the PASCAL VOC benchmark, while with a more complex
network (VGG16 Net) it reaches 5 FPS with an accuracy of 78.8% on
the PASCAL VOC benchmark [22].
3 Network architecture
The installation angle of the camera differs from one traffic
intersection to another, so the captured photos show the vehicles
from different angles. To address this problem, we adopt a data
augmentation method that synthetically creates new training
examples by applying transformations to the input data. The method
combines image flipping with image cropping. As shown in Fig. 1,
one picture is extended to 10 pictures by the flip and crop
operations.
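The flip-and-crop augmentation can be illustrated with a minimal sketch. The paper does not state which crops are taken, so the choice below (four corner crops plus a centre crop, each with its horizontal mirror, giving 10 views) and the `crop_frac` parameter are assumptions made only for illustration. For detection training, the ground-truth boxes would have to be transformed in the same way, which is not shown here.

```python
import numpy as np

def flip_and_crop(image, crop_frac=0.9):
    """Return 10 views of `image` (H x W x C): 5 crops plus their mirrors.

    The crop layout and crop_frac are illustrative assumptions,
    not values taken from the paper.
    """
    h, w = image.shape[:2]
    ch, cw = int(h * crop_frac), int(w * crop_frac)
    # Top-left corners of the four corner crops and the centre crop.
    offsets = [(0, 0), (0, w - cw), (h - ch, 0), (h - ch, w - cw),
               ((h - ch) // 2, (w - cw) // 2)]
    views = []
    for top, left in offsets:
        crop = image[top:top + ch, left:left + cw]
        views.append(crop)
        views.append(crop[:, ::-1])  # horizontal flip of the same crop
    return views
```

Calling `flip_and_crop` on a single image array returns a list of ten equally sized arrays, matching the one-to-ten expansion described for Fig. 1.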
The Faster R-CNN adopted in this paper is an advanced object
detection method. The structure of the vehicle type classification
algorithm is shown in Fig. 2. It includes two elements: the region
proposal network (RPN) and the detection network with five shared
convolutional layers.
3.1 Region proposal network for vehicle location
This paper adopts an efficient region proposal algorithm, the
region proposal network (RPN). The RPN shares its convolutional
layer parameters with the object detection network, which reduces
the computation time of region proposal. Faster R-CNN is developed
from the Fast R-CNN object detection system and replaces selective
search (SS) with the region proposal network.
Selective search (SS) is a typical region proposal technique. It
takes 2 s on average to process an image on a CPU. The EdgeBoxes
method takes only 0.2 s on average [22]. Although this is much
faster, it still consumes a lot of computing time. Unlike methods
based on image pyramids or filter pyramids, the RPN represents
region proposals at multiple scales and aspect ratios by anchor
boxes.
As shown in Fig. 2, the proposed method builds the RPN on top of
the last shareable convolutional layer (layer 5). A small network
slides over the feature map generated by the last shareable
convolutional layer, and the feature of each sliding window is
mapped to 256 dimensions (for the ZF net). After ReLU nonlinear
processing, the feature is fed to two sibling fully connected
layers, a bounding box-regression layer (reg layer) and a
box-classification layer (cls layer). The reg layer predicts the 4k
coordinates of k proposals. The cls layer outputs 2k scores, which
are the probabilities that each of the k proposals does or does not
contain an object. The k proposals are parameterized relative to k
anchor boxes. For the practical problem of vehicle type
classification, the proposed method uses a 3 × 3 convolutional
layer followed by two 1 × 1 convolutional layers (corresponding to
the reg and cls layers, respectively). The RPN can thus not only
predict the position of the vehicle, but also output the
two-category score of each proposal.
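The shape bookkeeping of the RPN head described above (a 3 × 3 convolution to 256 dimensions, ReLU, then two 1 × 1 convolutions producing 2k scores and 4k coordinates per position) can be sketched in NumPy. This is only an illustration: biases are omitted, zero padding of 1 is assumed so that the spatial size is preserved, and the function and argument names are hypothetical rather than taken from the authors' Caffe model.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def rpn_head(feat, w3, w_cls, w_reg):
    """Forward pass of a bias-free RPN head.

    feat:  (C, H, W) feature map from the last shared layer
    w3:    (256, C, 3, 3) weights of the 3 x 3 sliding-window conv
    w_cls: (2k, 256) weights of the 1 x 1 cls layer
    w_reg: (4k, 256) weights of the 1 x 1 reg layer
    """
    # Assumption: zero padding of 1 keeps the H x W resolution.
    padded = np.pad(feat, ((0, 0), (1, 1), (1, 1)))
    win = sliding_window_view(padded, (3, 3), axis=(1, 2))  # (C, H, W, 3, 3)
    hidden = np.maximum(0.0, np.einsum("chwij,ocij->ohw", win, w3))  # ReLU
    cls = np.einsum("ohw,ko->khw", hidden, w_cls)  # (2k, H, W) scores
    reg = np.einsum("ohw,ko->khw", hidden, w_reg)  # (4k, H, W) coordinates
    return cls, reg
```

With k = 6 anchors, each spatial position of the feature map therefore yields 12 objectness scores and 24 box coordinates.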
According to the actual aspect ratios of vehicle frontal-view
images, each sliding window uses three scales and two aspect ratios
in this paper. The aspect ratios of the anchor boxes are 0.9 and
0.6, and the three scales have areas of 100², 160² and 410² pixels,
so each sliding window has 6 anchor boxes (k = 6). The detailed
sizes of the anchors are shown in Table 1.
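The six anchors can be generated from the three areas and two aspect ratios as in the sketch below. The paper does not say whether the aspect ratio is height over width or the reverse; the code assumes height / width, and the function name `make_anchors` is hypothetical.

```python
import numpy as np

def make_anchors(scales=(100, 160, 410), ratios=(0.9, 0.6)):
    """Return a (k, 4) array of anchors (x1, y1, x2, y2) centred at the origin."""
    anchors = []
    for s in scales:
        area = float(s * s)
        for r in ratios:
            # Assumption: r = height / width, with width * height = area.
            w = np.sqrt(area / r)
            h = r * w
            anchors.append([-w / 2, -h / 2, w / 2, h / 2])
    return np.array(anchors)
```

The rows come out in the same order as the columns of Table 1 (100² with 0.9, 100² with 0.6, and so on). Shifting these boxes to every sliding-window position gives the full anchor set.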
Fig. 1 Method of data augmentation
Fig. 2 Vehicle type classification algorithm structure
3.2 Convolutional neural network for vehicle type
classification
In the proposed method, the detection network is realized by the
Fast R-CNN detection method. For the shareable convolutional layers
of the RPN and the detection network, an improved ZF net trained on
PASCAL VOC2012 is used as the backbone network. We improve the ZF
net by adding two new convolutional layers and a new max pooling
layer to the original network, giving a total of 7 shareable
convolutional layers. Increasing the depth of the network improves
its expressive ability. The detailed network structure is shown in
Fig. 3.
As shown in Fig. 3, the backbone network uses 96 convolution
kernels of size 5 × 5 in the first layer and 256 convolution
kernels of size 5 × 5 in the second layer. The convolutional stride
is 2, so that more information is captured in the first and second
convolutional layers. The third, fourth and seventh layers use 384
convolution kernels of size 3 × 3 with a stride of 1. The fifth and
sixth layers use 256 convolution kernels of size 3 × 3 with a
stride of 1. The first, second and fifth layers use max pooling
with a 3 × 3 sliding window and a stride of 2, which reduces the
data dimension and computation time and effectively helps avoid
overfitting. The detailed parameters of the convolutional layers
are shown in Table 2.
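As a worked example of how these strides combine, the sketch below multiplies the strides of the seven convolutions and three pooling layers. It assumes that each pooling layer directly follows the convolution it is listed with, and that padding keeps every layer's output at ceil(n / stride); neither detail is given in the text, and the 600 × 1000 input size in the usage note is purely hypothetical.

```python
import math

# (name, stride) in forward order; pooling follows conv 1, conv 2 and conv 5.
LAYERS = [
    ("conv1", 2), ("pool1", 2),
    ("conv2", 2), ("pool2", 2),
    ("conv3", 1), ("conv4", 1),
    ("conv5", 1), ("pool5", 2),
    ("conv6", 1), ("conv7", 1),
]

def total_stride(layers=LAYERS):
    """Product of all strides: input pixels per feature-map cell."""
    stride = 1
    for _, s in layers:
        stride *= s
    return stride

def approx_feature_size(height, width, layers=LAYERS):
    """Approximate output size, assuming ceil(n / stride) at every layer."""
    for _, s in layers:
        height, width = math.ceil(height / s), math.ceil(width / s)
    return height, width
```

Under these assumptions `total_stride()` is 32, and a hypothetical 600 × 1000 input would give a 19 × 32 feature map at the last shared layer.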
The training procedure is shown in Fig. 2. The feature map
generated by the last convolutional layer is used as the input of
both the RPN and the ROI pooling layer. The RPN generates
high-quality region proposals from the feature map, which are then
fed to the ROI pooling layer to train the detection network and the
RPN. In the end, the trained network can detect vehicle
frontal-view images over a wide range of scales and aspect ratios.
For each region proposal, it outputs a class label and a softmax
score between 0 and 1.
Each image is trained with a multi-task loss function according to
formula (1).
\[
L(\{P_i\}, \{t_i\}) = \frac{1}{N_{cls}} \sum_i L_{cls}(P_i, P_i^*) + \lambda \frac{1}{N_{reg}} \sum_i L_{reg}(t_i, t_i^*) \tag{1}
\]
where $i$ is the index of an anchor and $P_i$ is the predicted
probability that anchor $i$ is an object. $P_i^*$ is 1 if the
anchor is a positive sample and 0 otherwise. $t_i$ contains the 4
coordinates of the predicted bounding box, and $t_i^*$ is the label
associated with a positive anchor. $N_{cls}$ and $N_{reg}$ are two
normalization parameters. $\lambda$ is a balancing parameter, and
it is set to 10.
Formula (2) describes the bounding box-regression loss.
\[
L_{reg}(t_i, t_i^*) = \mathrm{smooth}_{L1}(t_i - t_i^*) \tag{2}
\]
where $\mathrm{smooth}_{L1}$ is a robust regression loss function,
as shown in formula (3).
\[
\mathrm{smooth}_{L1}(x) =
\begin{cases}
0.5x^2 & |x| < 1 \\
|x| - 0.5 & |x| \ge 1
\end{cases} \tag{3}
\]
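Formulas (1) to (3) can be written out as a short NumPy sketch. Two points are assumptions rather than statements from the paper: the classification loss is taken to be the two-class log loss, and the regression term is counted only for positive anchors, since the regression label is defined only for them (this follows the original Faster R-CNN formulation). The function names and the default normalizers are hypothetical.

```python
import numpy as np

def smooth_l1(x):
    """Formula (3): 0.5 x^2 if |x| < 1, otherwise |x| - 0.5 (element-wise)."""
    x = np.abs(x)
    return np.where(x < 1.0, 0.5 * x ** 2, x - 0.5)

def rpn_loss(p, p_star, t, t_star, lam=10.0, n_cls=None, n_reg=None):
    """Formula (1) for N anchors.

    p:       (N,) predicted object probabilities
    p_star:  (N,) labels, 1 for positive anchors and 0 otherwise
    t:       (N, 4) predicted box coordinates
    t_star:  (N, 4) target box coordinates
    """
    p = np.clip(p, 1e-7, 1 - 1e-7)  # avoid log(0)
    n_cls = len(p) if n_cls is None else n_cls
    n_reg = len(p) if n_reg is None else n_reg
    # Assumption: L_cls is the two-class log loss.
    l_cls = -(p_star * np.log(p) + (1 - p_star) * np.log(1 - p))
    # Assumption: regression counts only for positive anchors (p_star = 1).
    l_reg = p_star * smooth_l1(t - t_star).sum(axis=1)
    return l_cls.sum() / n_cls + lam * l_reg.sum() / n_reg
```

With the balancing parameter set to 10, a single positive anchor whose first coordinate is off by 2 contributes 10 × 1.5 to the loss, which shows how strongly the regression term is weighted relative to the classification term.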
4 Experiment and results
The vehicle type classification system was evaluated on the dataset
constructed for this work. The experiments were run on an Intel
Xeon E5-2630 v3 CPU at 2.40 GHz with 64 GB RAM and an NVIDIA GTX
1080 GPU, under a 64-bit Ubuntu 14.04 system.
Table 1 Sizes of the anchors
k             1          2          3          4          5          6
Anchor sizes  100², 0.9  100², 0.6  160², 0.9  160², 0.6  410², 0.9  410², 0.6
Fig. 3 Structure of the backbone network
4.1 Datasets
The original data in the dataset were collected from real images
taken at crossroads and labeled in the standard PASCAL VOC format,
yielding a standard dataset for studying vehicle location and
recognition. According to the vehicles that actually appear at the
crossroads, there are four major vehicle types: cars, minivans,
trucks and buses. The constructed dataset contains more than 60,000
labeled pictures, which differ in scale, illumination and angle.
A total of 37,578 sample pictures were selected for training:
15,000 images of cars, 13,698 images of trucks, 4805 images of
minivans and 4075 images of buses.
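The class balance of the training set follows directly from these counts. The short computation below only restates the figures given above; the variable names are illustrative.

```python
TRAIN_COUNTS = {"car": 15000, "truck": 13698, "minivan": 4805, "bus": 4075}

total = sum(TRAIN_COUNTS.values())  # 37,578 training images
shares = {name: count / total for name, count in TRAIN_COUNTS.items()}
for name, share in shares.items():
    print(f"{name:8s} {share:6.1%}")
```

Cars and trucks together make up roughly three quarters of the training images, while minivans and buses each account for a little over one tenth.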
4.2 Training of RPN and detection network
The RPN is trained using stochastic gradient descent (SGD). The new
layers of the RPN are randomly initialized from a zero-mean
Gaussian distribution with standard deviation 0.01. The other
layers are initialized from a model pre-trained on the PASCAL
VOC2012 benchmark. For the detection network, the method adjusts
the parameters of all layers. The implementation uses Caffe, an
advanced deep learning framework, with a momentum of 0.9, a weight
decay of 0.0005 and a mini-batch size of 256. Each mini-batch draws
multiple positive and negative anchor samples from each picture.
Table 2 Parameters of convolutional layers (the improved ZF net)
Layers        Input           Layer 1  Layer 2  Layer 3  Layer 4  Layer 5  Layer 6  Layer 7
Names         Original image  Conv 1   Conv 2   Conv 3   Conv 4   Conv 5   Conv 6   Conv 7
Kernel sizes  –               5 × 5    5 × 5    3 × 3    3 × 3    3 × 3    3 × 3    3 × 3
Strides       –               2        2        1        1        1        1        1
Channels      3               96       256      384      384      256      256      384
Fig. 4 Precision–recall curves of shared and unshared convolutional layers
Table 3 Classification results on the test set
Methods             Classes  Average precision (AP) (%)  mAP (%)
RPN + ZF, shared    Car      90.6560                     81.0553
                    Bus      66.3634
                    Minivan  76.6880
                    Truck    90.5138
RPN + ZF, unshared  Car      90.6096                     78.8245
                    Bus      60.3027
                    Minivan  74.0719
                    Truck    90.3138
Fig. 5 Selected examples of vehicle detection results on the test set using the proposed method
To eliminate redundant region proposals, the method applies
non-maximum suppression (NMS) to the region proposals according to
the scores generated by the cls layer. The NMS threshold is set to
0.7. The method uses 2000 region proposals in the training stage
and no more than 300 in the test stage. After non-maximum
suppression, the highest-scoring region proposals are selected to
detect objects.
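The suppression step can be illustrated with a generic greedy NMS in NumPy. This is a textbook sketch rather than the Caffe implementation used in the paper; the defaults mirror the IoU threshold of 0.7 and the test-time cap of 300 proposals mentioned above, and the function name is hypothetical.

```python
import numpy as np

def nms(boxes, scores, iou_threshold=0.7, max_keep=300):
    """Greedy non-maximum suppression.

    boxes:  (N, 4) array of (x1, y1, x2, y2)
    scores: (N,) array of cls-layer scores
    Returns the indices of the kept boxes, highest score first.
    """
    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1) * (y2 - y1)
    order = np.argsort(scores)[::-1]  # highest score first
    keep = []
    while order.size > 0 and len(keep) < max_keep:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # Intersection of the current box with all remaining boxes.
        iw = np.maximum(0.0, np.minimum(x2[i], x2[rest]) - np.maximum(x1[i], x1[rest]))
        ih = np.maximum(0.0, np.minimum(y2[i], y2[rest]) - np.maximum(y1[i], y1[rest]))
        inter = iw * ih
        iou = inter / (areas[i] + areas[rest] - inter)
        order = rest[iou <= iou_threshold]  # drop boxes that overlap too much
    return keep
```

For example, two boxes with an IoU of about 0.82 collapse to the higher-scoring one, while a distant third box is kept.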
If the RPN and the detection network are trained separately, their
convolutional layer parameters are modified in different ways. We
therefore adopt a training method that lets the two networks share
convolutional layers. In this paper, the four-step alternating
training method is adopted [23]. This method first trains the RPN
and then uses the proposals it generates to train the detection
network (Fast R-CNN). The detection network obtained in this step
is then used to initialize the parameters
Fig. 6 Incomplete vehicles shown in the pictures of detection results on the test set using the proposed method
Fig. 7 SS and RPN computation time distribution curves
Fig. 8 Vehicle type classification hardware equipment based on NVIDIA Jetson TK1
for training the RPN in the next step. This process is iterated.
4.3 Results analysis of training and test
In the experiment, a total of 42,578 pictures are used for training
and testing: 37,578 for training and 5000 for testing. Training
runs for 100,000 iterations in total and, with GPU acceleration,
finishes in 10 h. The precision–recall curves on the test dataset
with shared and unshared convolutional layers are shown in Fig. 4a,
b, respectively. The figures show that sharing convolutional layers
between the RPN and the detection network gives better results,
with a mAP of 81.0553%. The method achieves higher average
precision for cars and trucks, while the average precision for
minivans and buses is lower. This might be caused by the small
number of training samples of minivans and buses. The detailed
detection precision is shown in Table 3.
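The reported mAP values are the unweighted means of the four per-class average precisions in Table 3, which can be checked with a few lines. The dictionary layout and the helper name `mean_ap` are illustrative.

```python
AP = {
    "shared":   {"car": 90.6560, "bus": 66.3634, "minivan": 76.6880, "truck": 90.5138},
    "unshared": {"car": 90.6096, "bus": 60.3027, "minivan": 74.0719, "truck": 90.3138},
}

def mean_ap(per_class):
    """Unweighted mean of the per-class average precisions (in %)."""
    return sum(per_class.values()) / len(per_class)

for setting, per_class in AP.items():
    print(setting, round(mean_ap(per_class), 4))
```

The difference between the two settings comes almost entirely from buses and minivans; the car and truck APs change by less than 0.3 percentage points.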
Selected examples of vehicle detection results on the test set
using the proposed method are shown in Fig. 5a–h. The bounding
boxes shown are the region proposals closest to the ground-truth
box in each image. An NMS threshold of 0.7 was used to determine
correctness, and each output bounding box is associated with a
category label and a softmax score in [0, 1]. Only output boxes
with a softmax score of 0.7 or more are shown. The results in
Fig. 5a–h show that the proposed method detects vehicles accurately
both in daytime and at night.
The method also detects vehicles that appear only partially in the
picture with high accuracy, as shown in Fig. 6a, b. This
demonstrates an advantage of the proposed method: it remains stable
under translation, scaling and deformation of the image.
The method meets the precision requirement. Its detection time must
also be considered to determine whether it meets the requirement of
end-to-end detection. The detection time is compared for different
region proposal methods, and the results are shown in Fig. 7, where
Fig. 7a, b are the computation time distribution curves for SS and
RPN, respectively. The average detection time per image is 2.124 s
with SS but only 0.123 s with the RPN. The RPN therefore greatly
reduces the detection time per image, and the proposed method can
meet the requirements of real-time detection in engineering. In the
RPN test, normalization, the convolutional layers and region
proposal take 0.101 s on average, and NMS and region detection take
0.022 s on average. The RPN thus removes a large amount of the
computation time spent on region proposal.
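The timing figures above can be cross-checked with simple arithmetic. The snippet only restates the reported averages; the derived speed-up and frame rate are computed here, not quoted from the paper.

```python
t_ss, t_rpn = 2.124, 0.123           # average detection time per image (s)
t_proposal, t_detect = 0.101, 0.022  # breakdown of the RPN pipeline (s)

# The two stages add up to the reported total for the RPN pipeline.
assert abs((t_proposal + t_detect) - t_rpn) < 1e-9

speedup = t_ss / t_rpn  # about 17.3 times faster than SS
fps = 1.0 / t_rpn       # about 8.1 images per second on the GTX 1080
print(f"speedup {speedup:.1f}x, {fps:.1f} FPS")
```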
4.4 Realization based on NVIDIA Jetson TK1
After training, the Caffe-based network is deployed on the NVIDIA
Jetson TK1 and can then be run. The vehicle type classification
hardware based on the NVIDIA Jetson TK1 is shown in Fig. 8. After
the software required by the environment and the cameras is
installed, the system is used to detect vehicle position and type
at the crossroad. Selected examples of vehicle detection results
are shown in Fig. 9.
4.5 Detection results based on NVIDIA Jetson TK1
Figure 9a, b show selected examples of vehicle detection results
obtained at the crossroad. The experimental results indicate that
in a complex environment, even if there
Fig. 9 Selected examples of vehicle
are several cars and trucks in the picture at the same time, the
proposed method can still classify them accurately. In addition,
the system takes around 0.354 s on average to process an image.
According to the actual traffic flow, the passage time per vehicle
is about 0.89 s, so the system meets the requirement of real-time
classification.
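The real-time argument is a comparison of two time budgets, which the following lines make explicit. The values are the ones reported above; the number of frames per passing vehicle is derived here for illustration.

```python
t_image = 0.354    # average processing time per image on the Jetson TK1 (s)
t_vehicle = 0.89   # passage time per vehicle reported above (s)

meets_budget = t_image < t_vehicle        # one image fits within one passage
frames_per_vehicle = t_vehicle / t_image  # about 2.5 images per passing vehicle
print(meets_budget, round(frames_per_vehicle, 2))
```

On these figures the board can process each vehicle at least twice while it passes, which is the sense in which the system is real-time here.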
5 Conclusion
This paper proposes a vehicle type classification method based on
convolutional neural networks (CNN). The proposed method achieves
high accuracy, exceeding 90% for cars and trucks. It achieves
real-time classification in tests on an NVIDIA development board:
with the network embedded on the NVIDIA Jetson TK1, it takes around
0.354 s to process each image while maintaining high accuracy. In
future work, the bus and minivan training datasets will be enlarged
to improve detection precision, and the ability to detect vehicles
that occlude each other will be improved as well.
Acknowledgements This work was supported in part by National
Fund for Fundamental Research (No. 282017Y-5303), in part by the
Fund of National Automobile Accident In-depth Investigation System
(No. HT2016X-007), in part by National Natural Science Foundation
of China (No. 51675324), in part by Training and funding Program of
Shanghai College young teachers (No. ZZGCD15102), in part by
Scientific Research Project of Shanghai University of Engineering
Science (No. 2016-19) and in part by the Shanghai University of
Engineering Science Innovation Fund for Graduate Students (No.
16KY0602).
References
1. Hsieh, J.W., Chen, L.C., Chen, D.Y. et al.: Vehicle make and
model recognition using symmetrical SURF. In: 2013 10th IEEE
International Conference on Advanced Video and Signal Based
Surveillance (AVSS), pp. 472–477 (2013)
2. Dong, Z., Wu, Y., Pei, M. et al.: Vehicle type classification using
a semisupervised convolutional neural network. In: IEEE
Transactions on Intelligent Transportation Systems,
pp. 2247–2256 (2015)
3. Lai, A.H., Fung, G.S., Yung, N.H.: Vehicle type classification
from visual-based dimension estimation. In: Proceedings of the
IEEE Intelligent Transportation Systems Conference,
pp. 201–206 (2001)
4. Gupte, S., Masoud, O., Martin, R.F., et al.: Detection and
classification of vehicles. IEEE Trans. Intell. Transp. Syst. 3(1), 37–47 (2002)
5. Saravi, S., Edirisinghe, E.A.: Vehicle make and model recogni-
tion in CCTV footage. In: 2013 18th International Conference on
Digital Signal Processing (DSP), pp. 1–6 (2013)
6. Foresti, G.L., Murino, V., Regazzoni, C.: Vehicle recognition and
tracking from road image sequences. IEEE Trans. Veh. Technol.
48(1), 301–318 (1999)
7. Jang, D.M., Turk, M.: Car-rec: a real time car recognition system.
In: 2011 IEEE Workshop on Applications of Computer Vision
(WACV), Kona, HI, USA, pp. 599–605 (2011)
8. Tong, B., Fan, B., Wu, F.: Convolutional neural networks with
neural cascade classifier for pedestrian detection. In: Chinese
Conference on Pattern Recognition 2016, pp. 243–257. Springer
Nature Singapore Pte Ltd
9. Tome, D., Monti, F., Baroffo, L., et al.: Deep convolution neural
networks for pedestrian detection. Signal Process. Image Com-
mun. 47, 482–489 (2016)
10. Sarfraz, S.M., Saeed, A., Khan, M.H. et al.: Bayesian prior models
for vehicle make and model recognition. In: Proceedings of the
7th International Conference on Frontiers of Information Tech-
nology, pp. 35:1–35:6. ACM, New York (2009)
11. Ramnath, K., Hsiao, E. et al.: Car make and model recognition
using 3D curve alignment. In: Winter Conference on Applica-
tions of Computer Vision, pp. 285–292 (2014)
12. Alonso, D., Salgado, L. et al.: Robust vehicle detection through
multidimensional classification for on board video based systems.
In: IEEE International Conference on Image Processing, pp. 4:
IV–321–IV–324 (2007)
13. Chang, W.C., Cho, C.W.: Online boosting for vehicle detection.
IEEE Trans. Syst. Man Cybernet. Part B (Cybernetics) 40(3), 892–902 (2010)
14. Zhang, F.: Car Detection and Vehicle Type Classification Based
on Deep Learning. Jiangsu University, Jiangsu (2016)
15. Zhang, F., Xu, X., Qiao, Y.: Deep classification of vehicle makers
and models: the effectiveness of pre-training and data enhance-
ment. In: 2015 IEEE International Conference on Robotics and
Biomimetics (ROBIO), pp. 231–236 (2015)
16. Gu, J., Wang, Z., Kuen, J. et al.: Recent Advances in
Convolutional Neural Networks. arXiv preprint arXiv:1512.07108 [cs.CV]
(2016)
17. LeCun, Y., Boser, B., Denker, J.S., et al.: Backpropagation
applied to handwritten zip code recognition. Neural Comput.
1(4), 541–551 (1989)
18. Krizhevsky, A., Sutskever, I., Hinton, G.: Imagenet classification
with deep convolutional neural networks. In: Neural Information
Processing Systems (NIPS), pp. 1097–1105 (2012)
19. He, K., Zhang, X., Ren, S., Sun, J.: Spatial pyramid pooling in
deep convolutional networks for visual recognition. In: European
Conference on Computer Vision (ECCV) (2014)
20. Girshick, R.: Fast R-CNN. In: Proceedings of IEEE International
Conference on Computer Vision (ICCV) (2015)
21. Zeiler, M.D., Fergus, R.: Visualizing and understanding convo-
lutional networks. In: Proceedings of Computer vision-ECCV
2014. Springer, pp 818–833 (2014)
22. Ren, S., He, K., Girshick, R. et al.: Faster R-CNN: towards real-
time object detection with region proposal networks. In: Pro-
ceedings of Advances in Neural Information Processing Systems,
pp. 91–99 (2015)
23. Ren, S.: Efficient Object Detection with Feature Sharing.
University of Science and Technology of China, Hefei (2016)
Xinchen Wang is a postgraduate student in Shanghai University of
Engineering Science, Shanghai, China. His research direction focuses
on the technology of intelligent vehicle. His current research interests
include the technology of image processing and deep learning
technology.
Weiwei Zhang received Ph.D. degree in Mechanical Engineering in
Hunan University in 2015. Now he is a lecturer in Shanghai
University of Engineering Science. His research direction is the
technology of intelligent vehicle. His current research interests
include the technology of image processing, intelligent vehicle and
power train of vehicle. Now his team undertakes several major
projects from renowned Chinese companies.
Xuncheng Wu received Ph.D. degree in Mechanical Engineering in
Xi’an Jiaotong University in the year of 2000. Now he is a professor
in Shanghai University of Engineering Science. His team has
designed three new transmissions for Shanghai Automobile Gear
Works, China, since 2010. His current research interests include the
nonlinear dynamics of gear system, electric control shifting of AMT
and the technology of intelligent vehicle.